[ https://issues.apache.org/jira/browse/SPARK-41814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653736#comment-17653736 ]
Ruifeng Zheng edited comment on SPARK-41814 at 1/3/23 3:06 AM:
---------------------------------------------------------------

This issue is caused by `createDataFrame` not handling NaN/None properly:
1. the conversion from rows to pd.DataFrame automatically converts null to NaN;
2. the subsequent conversion from pd.DataFrame to pa.Table converts NaN back to null.

A sketch of these two conversions in plain pandas/pyarrow follows the quoted issue description below.

> Column.eqNullSafe fails on NaN comparison
> -----------------------------------------
>
>                 Key: SPARK-41814
>                 URL: https://issues.apache.org/jira/browse/SPARK-41814
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/column.py", line 115, in pyspark.sql.connect.column.Column.eqNullSafe
> Failed example:
>     df2.select(
>         df2['value'].eqNullSafe(None),
>         df2['value'].eqNullSafe(float('NaN')),
>         df2['value'].eqNullSafe(42.0)
>     ).show()
> Expected:
>     +----------------+---------------+----------------+
>     |(value <=> NULL)|(value <=> NaN)|(value <=> 42.0)|
>     +----------------+---------------+----------------+
>     |           false|           true|           false|
>     |           false|          false|            true|
>     |            true|          false|           false|
>     +----------------+---------------+----------------+
> Got:
>     +----------------+---------------+----------------+
>     |(value <=> NULL)|(value <=> NaN)|(value <=> 42.0)|
>     +----------------+---------------+----------------+
>     |            true|          false|           false|
>     |           false|          false|            true|
>     |            true|          false|           false|
>     +----------------+---------------+----------------+
> {code}
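As a minimal sketch of the two lossy conversions described in the comment, using plain pandas and pyarrow outside of Spark (the column name `value` mirrors the example above and is illustrative only):

{code:python}
import pandas as pd
import pyarrow as pa

# Step 1: rows -> pd.DataFrame. pandas represents a missing float as NaN,
# so the original None and the genuine NaN become indistinguishable.
pdf = pd.DataFrame({"value": [float("nan"), 42.0, None]})
print(pdf["value"].tolist())  # [nan, 42.0, nan]

# Step 2: pd.DataFrame -> pa.Table. Arrow's from_pandas treats NaN in a
# float column as missing by default, so both entries arrive as nulls.
tbl = pa.Table.from_pandas(pdf)
print(tbl.column("value").null_count)  # 2 -- the NaN/null distinction is lost
{code}

This is consistent with the "Got" output above: the row that originally held NaN compares as null-equal, and the NaN comparison never matches.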