[ https://issues.apache.org/jira/browse/HIVE-18421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351212#comment-16351212 ]

Vihang Karajgaonkar commented on HIVE-18421:
--------------------------------------------

Updated the Jira description appropriately. This patch doesn't add code to handle overflows in a "right" manner. It only makes vectorized code execution handle overflow in a "consistent" manner when compared to non-vectorized execution. I feel that handling overflows according to a configurable policy (wraparound, warn, error, set to null) would be a good addition and it can be taken up as a separate effort.

Attaching the first version of this patch. I reviewed almost all the vector expressions (phew ..;)) and found a subset of expressions which do not handle overflows in a consistent manner. As discussed above, it introduces a new config which enables usage of checked expressions when available.

> Vectorized execution handles overflows in a different manner than non-vectorized execution
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18421
>                 URL: https://issues.apache.org/jira/browse/HIVE-18421
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 2.1.1, 2.2.0, 3.0.0, 2.3.2
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>         Attachments: HIVE-18421.01.patch
>
>
> In vectorized execution arithmetic operations which cause integer overflows
> can give wrong results. Issue is reproducible in both Orc and parquet.
> Simple test case to reproduce this issue
> {noformat}
> set hive.vectorized.execution.enabled=true;
> create table parquettable (t1 tinyint, t2 tinyint) stored as parquet;
> insert into parquettable values (-104, 25), (-112, 24), (54, 9);
> select t1, t2, (t1-t2) as diff from parquettable where (t1-t2) < 50 order by diff desc;
> +-------+-----+-------+
> | t1    | t2  | diff  |
> +-------+-----+-------+
> | -104  | 25  | 127   |
> | -112  | 24  | 120   |
> | 54    | 9   | 45    |
> +-------+-----+-------+
> {noformat}
> When vectorization is turned off the same query produces only one row.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
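
The arithmetic in the reproduction above can be sketched in plain Java. This is only an assumed reading of the symptom, not a description of Hive's internals: it supposes that the non-vectorized path narrows the difference to a tinyint before the filter runs, while the vectorized path evaluates the filter on a wider value and narrows only when producing output. The class and method names (TinyintOverflowSketch, wrappedDiff, wideDiff) are hypothetical and do not exist in Hive.

```java
class TinyintOverflowSketch {
    // Assumed model of non-vectorized evaluation: the difference is
    // narrowed to a tinyint (Java byte), so out-of-range values wrap.
    static byte wrappedDiff(byte a, byte b) {
        return (byte) (a - b);
    }

    // Assumed model of the inconsistent vectorized evaluation: the
    // difference is kept in a long, so it never wraps.
    static long wideDiff(byte a, byte b) {
        return (long) a - (long) b;
    }

    // Counts rows passing "(t1 - t2) < 50" when the filter sees the wrapped value.
    static int rowsPassingWrapped(byte[][] rows) {
        int n = 0;
        for (byte[] r : rows) {
            if (wrappedDiff(r[0], r[1]) < 50) n++;
        }
        return n;
    }

    // Counts rows passing the same filter when it sees the wide value.
    static int rowsPassingWide(byte[][] rows) {
        int n = 0;
        for (byte[] r : rows) {
            if (wideDiff(r[0], r[1]) < 50) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Same three rows as the test case in the issue description.
        byte[][] rows = {{-104, 25}, {-112, 24}, {54, 9}};
        for (byte[] r : rows) {
            System.out.println(r[0] + " - " + r[1]
                + ": wide=" + wideDiff(r[0], r[1])
                + ", wrapped=" + wrappedDiff(r[0], r[1]));
        }
        System.out.println("rows passing filter on wrapped value: " + rowsPassingWrapped(rows));
        System.out.println("rows passing filter on wide value: " + rowsPassingWide(rows));
    }
}
```

Under these assumptions the wide differences are -129, -136 and 45, which all satisfy the filter, and narrowing them afterwards gives the 127, 120 and 45 shown in the table. Filtering on the wrapped values keeps only the (54, 9) row, which matches the single row reported with vectorization turned off.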