Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22047

Let me post something I wrote recently. Could you add test cases to ensure that we do not break the "Ignore NULLs" policy?

> All the set/aggregate functions ignore NULLs. The typical built-in set/aggregate functions are AVG, COUNT, MAX, MIN, SUM, GROUPING.
> Note: COUNT(*) is actually equivalent to COUNT(1). Thus, it still counts rows containing NULLs.
> Tip: because of the "Ignore NULLs" policy, SUM(a) + SUM(b) is not the same as SUM(a + b).
> Note: although the set functions follow the "Ignore NULLs" policy, MIN, MAX, SUM, AVG, EVERY, ANY, and SOME return NULL if 1) every value is NULL, or 2) the SELECT returns no rows at all. COUNT never returns NULL.
> TODO: When a set function eliminates NULLs, Spark SQL does not follow other systems in issuing the warning SQLSTATE 01003, "null value eliminated in set function".
> TODO: Check whether all the expressions that extend AggregateFunction follow the "Ignore NULLs" policy. If not, we need more investigation to see whether we should correct them.
> TODO: When Spark SQL supports EVERY, ANY, and SOME, they should follow the same "Ignore NULLs" policy.
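The semantics in the notes above can be sketched outside Spark. The following is a minimal Python simulation of the policy (it is not Spark code; `None` stands in for SQL NULL, and the function names are illustrative):

```python
# Minimal sketch of the "Ignore NULLs" policy for set/aggregate functions.
# None plays the role of SQL NULL.

def sql_sum(values):
    """SUM ignores NULLs; returns NULL if every value is NULL or there are no rows."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None

def sql_count(values):
    """COUNT(col) counts only non-NULL values; it never returns NULL."""
    return sum(1 for v in values if v is not None)

def sql_count_star(rows):
    """COUNT(*) is equivalent to COUNT(1): every row counts, even all-NULL rows."""
    return len(rows)

rows = [(1, 10), (2, None), (None, 30)]
a = [r[0] for r in rows]
b = [r[1] for r in rows]

# SUM(a) + SUM(b): NULLs are dropped per column.
sum_a_plus_sum_b = sql_sum(a) + sql_sum(b)  # 3 + 40 = 43

# SUM(a + b): the row-wise sum is NULL whenever either operand is NULL,
# so only the (1, 10) row contributes.
sum_of_a_plus_b = sql_sum(
    [x + y if x is not None and y is not None else None for x, y in rows]
)  # 11
```

This makes the tip concrete: `SUM(a) + SUM(b)` yields 43 while `SUM(a + b)` yields 11, because the "Ignore NULLs" policy is applied after the row-wise addition has already produced NULLs.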