[jira] [Comment Edited] (SPARK-41814) Column.eqNullSafe fails on NaN comparison
[ https://issues.apache.org/jira/browse/SPARK-41814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653736#comment-17653736 ]

Ruifeng Zheng edited comment on SPARK-41814 at 1/3/23 3:12 AM:
---------------------------------------------------------------

This issue is caused by `createDataFrame` not handling NaN/None properly:
1. the conversion from rows to `pd.DataFrame`, which automatically converts None to NaN;
2. then the conversion from `pd.DataFrame` to `pa.Table`, which converts NaN to null.

was (Author: podongfeng):
This issue is caused by `createDataFrame` not handling NaN/None properly:
1. the conversion from rows to `pd.DataFrame`, which automatically converts null to NaN;
2. then the conversion from `pd.DataFrame` to `pa.Table`, which converts NaN to null.

> Column.eqNullSafe fails on NaN comparison
> -----------------------------------------
>
>                 Key: SPARK-41814
>                 URL: https://issues.apache.org/jira/browse/SPARK-41814
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/column.py", line 115, in pyspark.sql.connect.column.Column.eqNullSafe
> Failed example:
>     df2.select(
>         df2['value'].eqNullSafe(None),
>         df2['value'].eqNullSafe(float('NaN')),
>         df2['value'].eqNullSafe(42.0)
>     ).show()
> Expected:
>     +----------------+---------------+----------------+
>     |(value <=> NULL)|(value <=> NaN)|(value <=> 42.0)|
>     +----------------+---------------+----------------+
>     |           false|           true|           false|
>     |           false|          false|            true|
>     |            true|          false|           false|
>     +----------------+---------------+----------------+
> Got:
>     +----------------+---------------+----------------+
>     |(value <=> NULL)|(value <=> NaN)|(value <=> 42.0)|
>     +----------------+---------------+----------------+
>     |            true|          false|           false|
>     |           false|          false|            true|
>     |            true|          false|           false|
>     +----------------+---------------+----------------+
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
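The two lossy conversions in the comment can be reproduced without Spark. Below is a minimal, stdlib-only sketch that simulates them on the doctest's `value` column. It rests on three assumptions. First, pandas stores None in a float column as NaN. Second, the pandas-to-Arrow step uses pandas null semantics, as `pa.Table.from_pandas` does, so NaN becomes null. Third, Spark's `<=>` returns true when both sides are null, false when only one side is null, and otherwise compares values with NaN treated as equal to NaN. The helpers `to_pandas_like`, `to_arrow_like`, `eq_null_safe` and `evaluate` are hypothetical stand-ins written for illustration. They are not pandas, PyArrow or PySpark APIs.

```python
import math
from typing import List, Optional

# Hypothetical helpers that mimic (not call) pandas, PyArrow and Spark behavior.

# The "value" column of df2 in the doctest: NaN, 42.0, None.
ROWS: List[Optional[float]] = [float("nan"), 42.0, None]


def to_pandas_like(values: List[Optional[float]]) -> List[float]:
    """Step 1 (simulated): a float column in pandas cannot hold None,
    so None is stored as NaN and the two become indistinguishable."""
    return [float("nan") if v is None else v for v in values]


def to_arrow_like(values: List[float]) -> List[Optional[float]]:
    """Step 2 (simulated): with pandas null semantics, every NaN is
    treated as a missing value and becomes null (None here)."""
    return [None if math.isnan(v) else v for v in values]


def eq_null_safe(a: Optional[float], b: Optional[float]) -> bool:
    """Sketch of Spark's `<=>`: null <=> null is true, null <=> x is false,
    and for non-null floats NaN is treated as equal to NaN."""
    if a is None or b is None:
        return a is None and b is None
    if math.isnan(a) and math.isnan(b):
        return True
    return a == b


def evaluate(values: List[Optional[float]]) -> List[List[bool]]:
    """Compute the three doctest columns (<=> NULL, <=> NaN, <=> 42.0) per row."""
    return [
        [eq_null_safe(v, None), eq_null_safe(v, float("nan")), eq_null_safe(v, 42.0)]
        for v in values
    ]


expected = evaluate(ROWS)                                # values as written
got = evaluate(to_arrow_like(to_pandas_like(ROWS)))     # after both conversions

for label, table in (("Expected", expected), ("Got", got)):
    print(label)
    for row in table:
        print("  " + " ".join(str(x).lower() for x in row))
```

Running it prints the two truth tables from the doctest. Only the first row differs. After the round trip the NaN arrives as null, so `value <=> NULL` becomes true and `value <=> NaN` becomes false, which matches the "Got" output.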
[jira] [Comment Edited] (SPARK-41814) Column.eqNullSafe fails on NaN comparison
[ https://issues.apache.org/jira/browse/SPARK-41814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653736#comment-17653736 ]

Ruifeng Zheng edited comment on SPARK-41814 at 1/3/23 3:06 AM:
---------------------------------------------------------------

This issue is caused by `createDataFrame` not handling NaN/None properly:
1. the conversion from rows to `pd.DataFrame`, which automatically converts null to NaN;
2. then the conversion from `pd.DataFrame` to `pa.Table`, which converts NaN to null.

was (Author: podongfeng):
This issue is caused by:
1. the conversion from rows to `pd.DataFrame`, which automatically converts null to NaN;
2. then the conversion from `pd.DataFrame` to `pa.Table`, which converts NaN to null.