GitHub user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22047
  
    Let me post something I wrote recently. Could you add test cases to ensure 
that we do not break the "Ignore NULLs" policy?
    
    > All the set/aggregate functions ignore NULLs. The typical built-in 
set/aggregate functions are AVG, COUNT, MAX, MIN, SUM, and GROUPING.
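    
    For illustration, a minimal sketch with hypothetical inline data:
    
    ```sql
    SELECT AVG(a), COUNT(a) FROM VALUES (1), (2), (NULL) AS t(a);
    -- AVG(a) = 1.5: the NULL row is ignored, so the divisor is 2, not 3
    -- COUNT(a) = 2: only the non-NULL values of a are counted
    ```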
    
    > Note, COUNT(*) is actually equivalent to COUNT(1). Thus, it still 
counts rows that contain NULLs.
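    
    For example (again with hypothetical inline data):
    
    ```sql
    SELECT COUNT(*), COUNT(1), COUNT(a) FROM VALUES (1), (NULL) AS t(a);
    -- COUNT(*) = 2 and COUNT(1) = 2: both count every row
    -- COUNT(a) = 1: the row where a is NULL is ignored
    ```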
    
    > Tip: because of the "Ignore NULLs" policy, SUM(a) + SUM(b) is not the 
same as SUM(a + b).
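    
    A sketch of why they differ (hypothetical inline data):
    
    ```sql
    SELECT SUM(a) + SUM(b), SUM(a + b) FROM VALUES (1, 10), (NULL, 20) AS t(a, b);
    -- SUM(a) + SUM(b) = 1 + 30 = 31: each SUM ignores NULLs in its own column
    -- SUM(a + b) = 11: NULL + 20 is NULL, so the whole second row is ignored
    ```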
    
    > Note, although the set functions follow the "Ignore NULLs" policy, MIN, 
MAX, SUM, AVG, EVERY, ANY, and SOME return NULL if 1) every value is NULL or 2) 
the SELECT returns no rows at all. COUNT never returns NULL.
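    
    Both cases in one hypothetical example:
    
    ```sql
    -- 1) every value is NULL
    SELECT SUM(a), COUNT(a) FROM VALUES (CAST(NULL AS INT)) AS t(a);
    -- SUM(a) = NULL, COUNT(a) = 0
    
    -- 2) the SELECT returns no rows at all
    SELECT SUM(a), COUNT(a) FROM VALUES (1) AS t(a) WHERE a > 1;
    -- SUM(a) = NULL, COUNT(a) = 0 (COUNT never returns NULL)
    ```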
    
    > TODO: When a set function eliminates NULLs, Spark SQL does not follow 
other databases in issuing the warning SQLSTATE 01003 "null value eliminated 
in set function".
    
    > TODO: Check whether all the expressions that extend AggregateFunction 
follow the "Ignore NULLs" policy. If any do not, we need to investigate 
whether to correct them.
    
    > TODO: When Spark SQL supports EVERY, ANY, and SOME, they should follow 
the same "Ignore NULLs" policy.


