[ https://issues.apache.org/jira/browse/SPARK-21160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057722#comment-16057722 ]
Edoardo Vivo commented on SPARK-21160:
--------------------------------------

Thank you for your answer. I noticed the same happens in relational databases and in R too. Strangely enough, this is the first time I have come across this issue. However, I still hold the opinion that this should not be the default behavior. I understand you will probably close this issue, but I would like to suggest that issuing a warning in this case might be helpful (for naive users like me).

Thank you again

> Filtering rows with "not equal" operator yields unexpected result with null rows
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-21160
>                 URL: https://issues.apache.org/jira/browse/SPARK-21160
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Spark Core, SQL
>    Affects Versions: 2.0.2
>            Reporter: Edoardo Vivo
>            Priority: Minor
>
> ```
> from pyspark.sql.types import StructType, StructField, DoubleType
> # assumes `spark` is an active SparkSession, as in the pyspark shell
> schema = StructType([StructField("Test", DoubleType())])
> test2 = spark.createDataFrame([[1.0],[1.0],[2.0],[2.0],[None]], schema=schema)
> test2.where("Test != 1").show()
> ```
> This returns only the rows with the value 2; it does not return the null row.
> This should not be the expected behavior, IMO.
> Thank you.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
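The behavior reported above follows SQL's three-valued logic. `NULL != 1` evaluates to NULL (unknown), not to true, and `WHERE` keeps only rows whose condition is exactly true. The pure-Python sketch below mimics that rule and does not call Spark. It assumes `None` stands in for SQL NULL, and the helper names (`sql_ne`, `null_safe_eq`, `sql_or`, `where`) are hypothetical. It also shows two filters that do keep the null row. In Spark SQL these would be written `NOT (Test <=> 1)`, using the null-safe equality operator, or `Test != 1 OR Test IS NULL`.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical helpers: None plays the role of SQL NULL / UNKNOWN.

def sql_ne(a: Optional[float], b: Optional[float]) -> Optional[bool]:
    """Mimic SQL `a != b`: UNKNOWN (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a != b

def null_safe_eq(a: Optional[float], b: Optional[float]) -> bool:
    """Mimic SQL `a <=> b`: NULL <=> NULL is True, NULL <=> x is False."""
    if a is None or b is None:
        return a is None and b is None
    return a == b

def sql_or(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Three-valued OR: TRUE wins, otherwise UNKNOWN propagates."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def where(rows: Iterable[Optional[float]],
          predicate: Callable[[Optional[float]], Optional[bool]]) -> List[Optional[float]]:
    """Mimic a WHERE clause: keep a row only if the predicate is exactly True."""
    return [r for r in rows if predicate(r) is True]

# Same values as the `Test` column in the issue.
test2 = [1.0, 1.0, 2.0, 2.0, None]

# Like `Test != 1`: the NULL row evaluates to UNKNOWN and is dropped.
print(where(test2, lambda t: sql_ne(t, 1.0)))  # [2.0, 2.0]

# Like `NOT (Test <=> 1)`: the null-safe comparison keeps the NULL row.
print(where(test2, lambda t: not null_safe_eq(t, 1.0)))  # [2.0, 2.0, None]

# Like `Test != 1 OR Test IS NULL`: UNKNOWN OR TRUE is TRUE.
print(where(test2, lambda t: sql_or(sql_ne(t, 1.0), t is None)))  # [2.0, 2.0, None]
```

The explicit `IS NULL` form is the more portable of the two, because not every SQL dialect supports `<=>`.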