[ 
https://issues.apache.org/jira/browse/SPARK-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584448#comment-14584448
 ] 

Holman Lan commented on SPARK-5680:
-----------------------------------

Thanks. After a closer look at the statement in the description, I realized 
that it's different from our test cases.

Our test cases are:
    select sum(c2) from sum_test
    select c1, sum(c2) from sum_test group by c1

Where c1 is an int column with non-NULL values and c2 is an int column with all 
NULL values.

Spark 1.3.0 and 1.3.1 return NULL for sum(c2), whereas Spark 1.4.0 returns 0. 
Hive, Impala and SQL Server return NULL in both cases.
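
For reference, the two test queries above can be run against any SQL engine to 
see the NULL result. The sketch below uses Python's built-in sqlite3 module 
purely as a stand-in engine; it does not reproduce Spark's behavior. The table 
and column names come from the test cases, but the row values are made up for 
illustration.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in SQL engine (not Spark).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sum_test (c1 INTEGER NOT NULL, c2 INTEGER)")

# Illustrative rows (assumed): c1 is non-NULL, c2 is NULL in every row.
conn.executemany(
    "INSERT INTO sum_test (c1, c2) VALUES (?, ?)",
    [(1, None), (1, None), (2, None)],
)

# The two test queries from this comment.
total = conn.execute("SELECT sum(c2) FROM sum_test").fetchall()
grouped = conn.execute(
    "SELECT c1, sum(c2) FROM sum_test GROUP BY c1 ORDER BY c1"
).fetchall()

print(total)    # [(None,)] : sum over all-NULL input is NULL
print(grouped)  # [(1, None), (2, None)] : NULL for each group as well
```

SQLite here agrees with the NULL result reported for Hive, Impala and SQL 
Server; Spark 1.4.0 returning 0 is the difference in question.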

For the statement "select sum('a') from src", Hive does indeed return 0; my 
mistake. The title of this JIRA caught my attention. Could the change in 
behavior for sum over all-NULL values be related to the changes made for this 
JIRA?

> Sum function on all null values, should return zero
> ---------------------------------------------------
>
>                 Key: SPARK-5680
>                 URL: https://issues.apache.org/jira/browse/SPARK-5680
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Venkata Ramana G
>            Assignee: Venkata Ramana G
>            Priority: Minor
>             Fix For: 1.3.1, 1.4.0
>
>
> SELECT  sum('a'),  avg('a'),  variance('a'),  std('a') FROM src;
> Current output:
> NULL  NULL    NULL    NULL
> Expected output:
> 0.0   NULL    NULL    NULL
> This fixes hive udaf_number_format.q 


