[ https://issues.apache.org/jira/browse/SPARK-41855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653917#comment-17653917 ]
Apache Spark commented on SPARK-41855:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39360

> `createDataFrame` doesn't handle None/NaN properly
> --------------------------------------------------
>
>                 Key: SPARK-41855
>                 URL: https://issues.apache.org/jira/browse/SPARK-41855
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Ruifeng Zheng
>            Priority: Major
>
> {code:python}
> from pyspark.sql import Row
>
> data = [Row(id=1, value=float("NaN")), Row(id=2, value=42.0), Row(id=3, value=None)]
> # Expected output:
> # +---+-----+
> # | id|value|
> # +---+-----+
> # |  1|  NaN|
> # |  2| 42.0|
> # |  3| null|
> # +---+-----+
> cdf = self.connect.createDataFrame(data)
> sdf = self.spark.createDataFrame(data)
> print()
> print(cdf._show_string(100, 100, False))
> print()
> print(cdf.schema)
> print()
> print(sdf._jdf.showString(100, 100, False))
> print()
> print(sdf.schema)
> self.compare_by_show(cdf, sdf)
> {code}
> The Connect DataFrame (shown first) turns the NaN into null, while the regular PySpark DataFrame (shown second) preserves it:
> {code:java}
> +---+-----+
> | id|value|
> +---+-----+
> |  1| null|
> |  2| 42.0|
> |  3| null|
> +---+-----+
>
> StructType([StructField('id', LongType(), True), StructField('value', DoubleType(), True)])
>
> +---+-----+
> | id|value|
> +---+-----+
> |  1|  NaN|
> |  2| 42.0|
> |  3| null|
> +---+-----+
>
> StructType([StructField('id', LongType(), True), StructField('value', DoubleType(), True)])
> {code}
> This issue arises because `createDataFrame` doesn't handle None/NaN properly:
> 1. In the conversion from local data to a pd.DataFrame, both None and NaN are automatically coerced to NaN, losing the distinction between the two.
> 2. Then, in the conversion from pd.DataFrame to pa.Table, NaN is always converted to null.
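For context, here is a minimal standalone sketch of the two lossy conversions described above, using only pandas and pyarrow. It mirrors the behavior the report describes, not Spark Connect's exact conversion code:

{code:python}
import pandas as pd
import pyarrow as pa

# Step 1: building a pd.DataFrame from local data coerces both None and
# float("NaN") to NaN in the resulting float64 column, so the distinction
# between "missing" and "not a number" is already lost at this point.
pdf = pd.DataFrame({"id": [1, 2, 3], "value": [float("NaN"), 42.0, None]})
print(pdf["value"].tolist())  # [nan, 42.0, nan]

# Step 2: pa.Table.from_pandas follows pandas' missing-value semantics and
# treats NaN in a float column as null, so both rows become null in Arrow.
tbl = pa.Table.from_pandas(pdf)
print(tbl.column("value"))  # the NaN row and the None row are both null
{code}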