[jira] [Updated] (SPARK-42666) Fix `createDataFrame` to work properly with rows and schema
[ https://issues.apache.org/jira/browse/SPARK-42666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haejoon Lee updated SPARK-42666:
    Summary: Fix `createDataFrame` to work properly with rows and schema  (was: Fix `createDataFrame` to work properly)

> Fix `createDataFrame` to work properly with rows and schema
> ---
>
> Key: SPARK-42666
> URL: https://issues.apache.org/jira/browse/SPARK-42666
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
>
> The code below does not work properly in Spark Connect:
> {code:java}
> >>> sdf = spark.range(10)
> >>> spark.createDataFrame(sdf.tail(5), sdf.schema)
> Traceback (most recent call last):
>   File "", line 1, in
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 94, in __repr__
>     return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes))
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 162, in dtypes
>     return [(str(f.name), f.dataType.simpleString()) for f in self.schema.fields]
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1346, in schema
>     self._schema = self._session.client.schema(query)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 614, in schema
>     proto_schema = self._analyze(method="schema", plan=plan).schema
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 755, in _analyze
>     self._handle_error(rpc_error)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 894, in _handle_error
>     raise convert_exception(info, status.message) from None
> pyspark.errors.exceptions.connect.AnalysisException: [NULLABLE_COLUMN_OR_FIELD] Column or field `id` is nullable while it's required to be non-nullable.
> {code}
> whereas it works properly in regular PySpark:
> {code:java}
> >>> sdf = spark.range(10)
> >>> spark.createDataFrame(sdf.tail(5), sdf.schema).show()
> +---+
> | id|
> +---+
> |  5|
> |  6|
> |  7|
> |  8|
> |  9|
> +---+
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

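For readers unfamiliar with the error class in the traceback above, here is a minimal plain-Python sketch of the rule the message describes: a column that may hold nulls cannot satisfy a field declared non-nullable. `Field`, `check_nullability` and `relax` are hypothetical names used only for illustration; they are not PySpark APIs, and the sketch does not reproduce Spark Connect's actual analysis code. It assumes that the `id` field of `spark.range(10).schema` is declared non-nullable, and it further assumes (the report does not confirm this) that the rows reach the server as a nullable column.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field: just a name and a nullable flag."""
    name: str
    nullable: bool


def check_nullability(actual: List[Field], required: List[Field]) -> None:
    """Raise if a nullable column is matched against a non-nullable field."""
    for have, want in zip(actual, required):
        if have.nullable and not want.nullable:
            raise ValueError(
                f"[NULLABLE_COLUMN_OR_FIELD] Column or field `{have.name}` is "
                "nullable while it's required to be non-nullable."
            )


def relax(schema: List[Field]) -> List[Field]:
    """Return a copy of the schema with every field marked nullable."""
    return [replace(f, nullable=True) for f in schema]


# Assumed shape of the schema passed to createDataFrame: `id` is non-nullable.
required = [Field("id", nullable=False)]
# Assumed shape of the incoming rows: the column is treated as nullable.
incoming = [Field("id", nullable=True)]

try:
    check_nullability(incoming, required)
    outcome = "accepted"
except ValueError as err:
    outcome = str(err)

print(outcome)

# With a relaxed (all-nullable) schema the same check passes without raising.
check_nullability(incoming, relax(required))
```

Under these assumptions, the strict schema is rejected while the relaxed one is accepted. Whether passing a nullable copy of `sdf.schema` actually avoids the failure in Spark Connect is not stated in the report and would need to be checked against a running server.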
[jira] [Updated] (SPARK-42666) Fix `createDataFrame` to work properly
[ https://issues.apache.org/jira/browse/SPARK-42666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haejoon Lee updated SPARK-42666:
    Summary: Fix `createDataFrame` to work properly  (was: Fix `tail` to work properly)