panbingkun commented on PR #40280: URL: https://github.com/apache/spark/pull/40280#issuecomment-1457349284
> Thanks @panbingkun for the nice fix! Btw, I think I found another `createDataFrame` bug: it does not work properly with a non-nullable schema, as below:
>
> ```python
> >>> from pyspark.sql.types import *
> >>> schema_false = StructType([StructField("id", IntegerType(), False)])
> >>> spark.createDataFrame([[1]], schema=schema_false)
> Traceback (most recent call last):
> ...
> pyspark.errors.exceptions.connect.AnalysisException: [NULLABLE_COLUMN_OR_FIELD] Column or field `id` is nullable while it's required to be non-nullable.
> ```
>
> whereas it works fine with a nullable schema, as below:
>
> ```python
> >>> schema_true = StructType([StructField("id", IntegerType(), True)])
> >>> spark.createDataFrame([[1]], schema=schema_true)
> DataFrame[id: int]
> ```
>
> Do you have any idea what might be causing this? Could you take a look at it if you're interested? I have filed an issue at [SPARK-42679](https://issues.apache.org/jira/browse/SPARK-42679).
>
> Also cc @hvanhovell as the original author of `createDataFrame`.

Let me try to investigate it.