[ https://issues.apache.org/jira/browse/HIVE-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905871#comment-13905871 ]
Prasanth J commented on HIVE-5994:
----------------------------------
Puneeth,
This issue can also happen with large positive values. The reason is that when
a large number repeats >3 and <=10 times, SHORT_REPEAT encoding is used:
https://github.com/apache/hive/blob/branch-0.12/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriterV2.java#L35
SHORT_REPEAT zigzag-encodes the repeating value. So in your case, when
4703275633953830000L is zigzag-encoded, the MSB (64th bit) is set, and the
encoded value is then treated as negative, which triggers this bug.
I tested your test case with trunk and it works fine. Applying the patch
attached to this JIRA should also work.
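
To illustrate the point above, here is a minimal, self-contained Java sketch.
It is not the Hive code: the class name and both bit-counting helpers are
hypothetical, and the signed helper only assumes the general shape of the bug
(a bit counter that stops as soon as the value is not positive). The zigzag
formula is the standard 64-bit one.

```java
class ZigzagMsbDemo {
    // Standard zigzag mapping for 64-bit signed values:
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static long zigzagEncode(long v) {
        return (v << 1) ^ (v >> 63);
    }

    // Hypothetical bit counter that treats its input as signed.
    // It returns 0 for any value whose MSB is set, because the
    // loop condition fails immediately for negative longs.
    static int bitsTreatingAsSigned(long value) {
        int count = 0;
        while (value > 0) {
            count++;
            value >>= 1;
        }
        return count;
    }

    // Bit counter that treats the input as an unsigned 64-bit pattern.
    static int bitsTreatingAsUnsigned(long value) {
        return 64 - Long.numberOfLeadingZeros(value);
    }

    public static void main(String[] args) {
        long v = 4703275633953830000L;   // large positive value from the test case
        long z = zigzagEncode(v);        // doubling v overflows into bit 64

        System.out.println("zigzag (signed view):   " + z);
        System.out.println("zigzag (unsigned view): " + Long.toUnsignedString(z));
        System.out.println("bits, signed count:     " + bitsTreatingAsSigned(z));
        System.out.println("bits, unsigned count:   " + bitsTreatingAsUnsigned(z));
    }
}
```

The signed view of the encoded value is negative even though the input was
positive, so any width computation that relies on the sign undercounts the
bits needed.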
> ORC RLEv2 encodes wrongly for large negative BIGINTs (64 bits )
> ----------------------------------------------------------------
>
> Key: HIVE-5994
> URL: https://issues.apache.org/jira/browse/HIVE-5994
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.13.0
> Reporter: Prasanth J
> Assignee: Prasanth J
> Labels: orcfile
> Fix For: 0.13.0
>
> Attachments: HIVE-5994.1.patch
>
>
> For large negative BIGINTs, zigzag encoding yields a large 64-bit value
> with the MSB set to 1. This value is interpreted as a negative value in the
> SerializationUtils.findClosestNumBits(long value) function. As a result, the
> total number of bits required is computed incorrectly, which leads to wrong
> encoding/decoding of values.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)