[ https://issues.apache.org/jira/browse/SPARK-42001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42001:
------------------------------------

    Assignee:     (was: Apache Spark)

> Unexpected schema set to DefaultSource plan (ReadwriterTests.test_save_and_load)
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-42001
>                 URL: https://issues.apache.org/jira/browse/SPARK-42001
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>
> {code}
> pyspark/sql/tests/test_readwriter.py:28 (ReadwriterParityTests.test_save_and_load)
> self = <pyspark.sql.tests.connect.test_parity_readwriter.ReadwriterParityTests testMethod=test_save_and_load>
>     def test_save_and_load(self):
>         df = self.df
>         tmpPath = tempfile.mkdtemp()
>         shutil.rmtree(tmpPath)
>         df.write.json(tmpPath)
>         actual = self.spark.read.json(tmpPath)
>         self.assertEqual(sorted(df.collect()), sorted(actual.collect()))
>     
>         schema = StructType([StructField("value", StringType(), True)])
>         actual = self.spark.read.json(tmpPath, schema)
> >       self.assertEqual(sorted(df.select("value").collect()), sorted(actual.collect()))
> ../test_readwriter.py:39: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> ../../connect/dataframe.py:1246: in collect
>     query = self._plan.to_proto(self._session.client)
> ../../connect/plan.py:93: in to_proto
>     plan.root.CopyFrom(self.plan(session))
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> self = <pyspark.sql.connect.plan.DataSource object at 0x7fe0d09c22b0>
> session = <pyspark.sql.connect.client.SparkConnectClient object at 0x7fe0d069b5b0>
>     def plan(self, session: "SparkConnectClient") -> proto.Relation:
>         plan = proto.Relation()
>         if self.format is not None:
>             plan.read.data_source.format = self.format
>         if self.schema is not None:
> >           plan.read.data_source.schema = self.schema
> E           TypeError: StructType([StructField('value', StringType(), True)]) has type StructType, but expected one of: bytes, unicode
> ../../connect/plan.py:246: TypeError
> {code}
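
The traceback shows `DataSource.plan` assigning the `StructType` object directly to `plan.read.data_source.schema`. The error text ("expected one of: bytes, unicode") indicates that this is a protobuf string field, and such a field rejects anything that is not text. Below is a minimal sketch of one way to avoid the error. It assumes the fix is to serialize the schema to a string before the assignment. pyspark's `StructType.json()` returns a JSON string, and a DDL string such as `"value STRING"` would also fit. `StringFieldMessage`, `schema_to_string`, and `FakeStructType` are hypothetical stand-ins that let the sketch run without Spark or protobuf. This is not the actual patch for this issue.

```python
from typing import Any, Union


class StringFieldMessage:
    """Hypothetical stand-in for a protobuf message with a `string schema` field.

    Like the generated protobuf class in the traceback, it accepts str or bytes
    and raises TypeError for anything else.
    """

    def __init__(self) -> None:
        self._schema = ""

    @property
    def schema(self) -> str:
        return self._schema

    @schema.setter
    def schema(self, value: Any) -> None:
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        if not isinstance(value, str):
            raise TypeError(
                f"{value!r} has type {type(value).__name__}, "
                "but expected one of: bytes, unicode"
            )
        self._schema = value


def schema_to_string(schema: Union[str, Any]) -> str:
    """Hypothetical helper: normalize a user-supplied schema to a string.

    Assumes `schema` is either already a string (e.g. a DDL string) or an
    object with a `json()` method, as pyspark's StructType has.
    """
    if isinstance(schema, str):
        return schema
    to_json = getattr(schema, "json", None)
    if callable(to_json):
        return to_json()
    raise TypeError(
        f"schema should be a StructType or str, got {type(schema).__name__}"
    )


class FakeStructType:
    """Stand-in for pyspark.sql.types.StructType so the sketch runs without Spark."""

    def __repr__(self) -> str:
        return "StructType([StructField('value', StringType(), True)])"

    def json(self) -> str:
        # Shaped like StructType.json() output for a single nullable string column.
        return ('{"fields":[{"metadata":{},"name":"value",'
                '"nullable":true,"type":"string"}],"type":"struct"}')


msg = StringFieldMessage()
schema = FakeStructType()
try:
    msg.schema = schema  # mirrors the failing assignment in plan.py
except TypeError as exc:
    print(exc)
msg.schema = schema_to_string(schema)  # the serialized form is accepted
```

With this design, the conversion happens once, where user input enters the client. Plan construction can then assume the schema is already a string.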



--
This message was sent by Atlassian Jira (v8.20.10#820010)
