[ 
https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845760#comment-13845760
 ] 

Xuefu Zhang commented on HIVE-5996:
-----------------------------------

{quote}
Having sum(bigint) return bigint is long standing behavior in Hive and is 
reasonable.

As a side note, SQL Server returns bigint for sum(bigint).

If users need more digits, they can cast the input to sum to a decimal.
{quote}

My concern is not about the number of digits that long can hold. Hive processes 
large number of rows that traditional DBs are shy of, and the chance of getting 
overflow error is bigger. With the proposed change, Hive can guarantee 10b (or 
certain number of) rows without worry about this problem. Without it, Hive has 
such guaranty, and two valid rows can overflow, as demonstrated in this JIRA.


> Query for sum of a long column of a table with only two rows produces wrong 
> result
> ----------------------------------------------------------------------------------
>
>                 Key: HIVE-5996
>                 URL: https://issues.apache.org/jira/browse/HIVE-5996
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 0.12.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>         Attachments: HIVE-5996.patch
>
>
> {code}
> hive> desc test2;
> OK
> l                     bigint                  None                
> hive> select * from test2;                                 
> OK
> 6666666666666666666
> 5555555555555555555
> hive> select sum(l) from test2;
> OK
> -6224521851487329395
> {code}
> It's believed that a wrap-around error occurred. It's surprising that it 
> happens only with two rows. Same query in MySQL returns:
> {code}
> mysql> select sum(l) from test;
> +----------------------+
> | sum(l)               |
> +----------------------+
> | 12222222222222222221 |
> +----------------------+
> 1 row in set (0.00 sec)
> {code}
> Hive should accommodate large number of rows. Overflowing with only two rows 
> is very unusable.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

Reply via email to