[
https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843972#comment-13843972
]
Xuefu Zhang commented on HIVE-5996:
-----------------------------------
For information, the follow is the text from SQL-92[1] standard w.r.t SUM
function:
{quote}
b) If SUM is specified and DT is exact numeric with scale
S, then the data type of the result is exact numeric with
implementation-defined precision and scale S.
{quote}
For DT as long, currently Hive returns long, which doesn't violate the
standard. However, such implementation is problematic as demonstrated in this
JIRA. Plus, for decimal, Hive sum function accommodates at least 10 billion
rows of data. Letting sum(long) return long is not able to uphold that
assurance. Thus, we need to change the return type to make the function more
useful.
[1] http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
> Query for sum of a long column of a table with only two rows produces wrong
> result
> ----------------------------------------------------------------------------------
>
> Key: HIVE-5996
> URL: https://issues.apache.org/jira/browse/HIVE-5996
> Project: Hive
> Issue Type: Bug
> Components: UDF
> Affects Versions: 0.12.0
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
>
> {code}
> hive> desc test2;
> OK
> l bigint None
> hive> select * from test2;
> OK
> 6666666666666666666
> 5555555555555555555
> hive> select sum(l) from test2;
> OK
> -6224521851487329395
> {code}
> It's believed that a wrap-around error occurred. It's surprising that it
> happens only with two rows. Same query in MySQL returns:
> {code}
> mysql> select sum(l) from test;
> +----------------------+
> | sum(l) |
> +----------------------+
> | 12222222222222222221 |
> +----------------------+
> 1 row in set (0.00 sec)
> {code}
> Hive should accommodate large number of rows. Overflowing with only two rows
> is very unusable.
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)