[ https://issues.apache.org/jira/browse/HIVE-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841534#comment-13841534 ]
Eric Hanson commented on HIVE-5878: ----------------------------------- To me, the issue is really about the importance of obeying the SQL standard (which would argue for using exact types for results in the situations the standard describes) vs. maintaining backward compatibility. I don't think that accuracy is the issue. E.g. avg(int) yields int in SQL Server which is less accurate than yielding double, but it does meet the SQL standard's requirements of being an exact type. I would not favor making Hive yield int for avg(int) to meet the SQL standard because that loses information compared to the previous Hive behavior (yielding double) and that would be perceived even more as a breaking change in existing applications than producing decimal. I agree that vectorization is an implementation detail. Vectorization can be extended to handle decimal. > Hive standard avg UDAF returns double as the return type for some exact input > types > ----------------------------------------------------------------------------------- > > Key: HIVE-5878 > URL: https://issues.apache.org/jira/browse/HIVE-5878 > Project: Hive > Issue Type: Bug > Components: Types, UDF > Affects Versions: 0.12.0 > Reporter: Xuefu Zhang > Assignee: Xuefu Zhang > Attachments: HIVE-5878.1.patch, HIVE-5878.patch > > > For standard, no-partial avg result, hive currently returns double as the > result type. > {code} > hive> desc test; > OK > d int None > Time taken: 0.051 seconds, Fetched: 1 row(s) > hive> explain select avg(`d`) from test; > ... > Reduce Operator Tree: > Group By Operator > aggregations: > expr: avg(VALUE._col0) > bucketGroup: false > mode: mergepartial > outputColumnNames: _col0 > Select Operator > expressions: > expr: _col0 > type: double > {code} > However, exact types including integers and decimal should yield exact type. > Here is what MySQL does: > {code} > mysql> desc test; > +-------+--------------+------+-----+---------+-------+ > | Field | Type | Null | Key | Default | Extra | > +-------+--------------+------+-----+---------+-------+ > | i | int(11) | YES | | NULL | | > | b | tinyint(1) | YES | | NULL | | > | d | double | YES | | NULL | | > | s | varchar(5) | YES | | NULL | | > | dd | decimal(5,2) | YES | | NULL | | > +-------+--------------+------+-----+---------+-------+ > mysql> create table test62 as select avg(i) from test; > mysql> desc test62; > +-------+---------------+------+-----+---------+-------+ > | Field | Type | Null | Key | Default | Extra | > +-------+---------------+------+-----+---------+-------+ > | avg(i) | decimal(14,4) | YES | | NULL | | > +-------+---------------+------+-----+---------+-------+ > 1 row in set (0.00 sec) > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)