[ https://issues.apache.org/jira/browse/SPARK-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614627#comment-14614627 ]
Santiago M. Mola commented on SPARK-8636:
-----------------------------------------

[~davies] NULL values are grouped together when using a GROUP BY clause.

See https://en.wikipedia.org/wiki/Null_%28SQL%29#When_two_nulls_are_equal:_grouping.2C_sorting.2C_and_some_set_operations

{quote}
Because SQL:2003 defines all Null markers as being unequal to one another, a special definition was required in order to group Nulls together when performing certain operations. SQL defines "any two values that are equal to one another, or any two Nulls", as "not distinct". This definition of not distinct allows SQL to group and sort Nulls when the GROUP BY clause (and other keywords that perform grouping) are used.
{quote}

> CaseKeyWhen has incorrect NULL handling
> ---------------------------------------
>
>                 Key: SPARK-8636
>                 URL: https://issues.apache.org/jira/browse/SPARK-8636
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Santiago M. Mola
>              Labels: starter
>
> CaseKeyWhen implementation in Spark uses the following equals implementation:
> {code}
>   private def equalNullSafe(l: Any, r: Any) = {
>     if (l == null && r == null) {
>       true
>     } else if (l == null || r == null) {
>       false
>     } else {
>       l == r
>     }
>   }
> {code}
> Which is not correct, since in SQL, NULL is never equal to NULL (actually, it
> is not unequal either). In this case, a NULL value in a CASE WHEN expression
> should never match.
> For example, you can execute this in MySQL:
> {code}
> SELECT CASE NULL WHEN NULL THEN "NULL MATCHES" ELSE "NULL DOES NOT MATCH" END
> FROM DUAL;
> {code}
> And the result will be "NULL DOES NOT MATCH".
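
The distinction discussed above, between SQL equality (where NULL never matches) and the "not distinct" rule used by GROUP BY (where two NULLs fall into the same group), can be sketched in Scala roughly as follows. This is only an illustration and not Spark's actual code: the object name `NullSemantics` and the helpers `sqlEquals`, `notDistinct`, and `caseKeyMatches` are hypothetical names introduced here.

```scala
// Hypothetical helpers for illustration only; not Spark's implementation.
object NullSemantics {
  // SQL `=`: the result is unknown (modelled as None) if either operand is NULL.
  def sqlEquals(l: Any, r: Any): Option[Boolean] =
    if (l == null || r == null) None else Some(l == r)

  // "Not distinct", as used for grouping: two NULLs count as the same value.
  // This has the same behaviour as the equalNullSafe method quoted in the issue.
  def notDistinct(l: Any, r: Any): Boolean =
    if (l == null && r == null) true
    else if (l == null || r == null) false
    else l == r

  // CASE key WHEN candidate: a branch is taken only when `=` is definitely true,
  // so an unknown comparison (any NULL operand) never matches.
  def caseKeyMatches(key: Any, candidate: Any): Boolean =
    sqlEquals(key, candidate) == Some(true)
}
```

Under these assumptions, `NullSemantics.notDistinct(null, null)` is true, which is the behaviour wanted for grouping, while `NullSemantics.caseKeyMatches(null, null)` is false, which agrees with the MySQL result quoted in the issue description.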