[ https://issues.apache.org/jira/browse/SPARK-39262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xinrong Meng updated SPARK-39262:
---------------------------------
    Description:
Correct the behavior of creating DataFrame from an RDD **with `0` or an empty list as the first element**.

Before:
```py
>>> spark.createDataFrame(spark._sc.parallelize([0, 1]))
Traceback (most recent call last):
...
ValueError: The first row in RDD is empty, can not infer schema
>>> spark.createDataFrame(spark._sc.parallelize([[], []]))
Traceback (most recent call last):
...
ValueError: The first row in RDD is empty, can not infer schema
```

After:
```py
>>> spark.createDataFrame(spark._sc.parallelize([0, 1]))
Traceback (most recent call last):
...
TypeError: Can not infer schema for type: <class 'int'>
>>> spark.createDataFrame(spark._sc.parallelize([[], []]))
DataFrame[]
>>> spark.createDataFrame(spark._sc.parallelize([[], []])).show()
++
||
++
||
||
++
```

  was:
Correct error messages when creating DataFrame from an RDD with the first element `0`.
Previously, we raised a ValueError "The first row in RDD is empty, can not infer schema" in such a case. However, a TypeError "Can not infer schema for type: <class 'int'>" should be raised instead.

> Correct the behavior of creating DataFrame from an RDD
> ------------------------------------------------------
>
>                 Key: SPARK-39262
>                 URL: https://issues.apache.org/jira/browse/SPARK-39262
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: Xinrong Meng
>            Priority: Major
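
The "Before" errors are consistent with a truthiness test on the first element, since both `0` and `[]` are falsy in Python. The following is only a minimal sketch of that distinction in plain Python; it is not the PySpark implementation, and `check_by_truthiness`, `check_by_type`, and `outcome` are hypothetical names introduced for illustration.

```python
from collections.abc import Mapping


def check_by_truthiness(first):
    # Hypothetical "before"-style check: any falsy first element
    # (0, [], {}, "") is reported as an empty row.
    if not first:
        raise ValueError("The first row in RDD is empty, can not infer schema")
    return first


def check_by_type(first):
    # Hypothetical "after"-style check: reject values that are not
    # row-like, but accept an empty row such as [].
    if not isinstance(first, (list, tuple, Mapping)):
        raise TypeError("Can not infer schema for type: %s" % type(first))
    return first


def outcome(check, value):
    # Run a check and summarize the result as a string.
    try:
        check(value)
        return "ok"
    except (ValueError, TypeError) as exc:
        return "%s: %s" % (type(exc).__name__, exc)


for value in (0, []):
    print(repr(value), "|", outcome(check_by_truthiness, value),
          "|", outcome(check_by_type, value))
```

Under these assumptions, the truthiness check reports both `0` and `[]` as an empty row, while the type-based check raises a TypeError for `0` and accepts `[]`, matching the "After" behavior described in the issue.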