[
https://issues.apache.org/jira/browse/ORC-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Panagiotis Garefalakis updated ORC-703:
---------------------------------------
Summary: [C++] RLE encoding bug on large negative integer (was: RLE
encoding bug on large negative integer)
> [C++] RLE encoding bug on large negative integer
> ------------------------------------------------
>
> Key: ORC-703
> URL: https://issues.apache.org/jira/browse/ORC-703
> Project: ORC
> Issue Type: Bug
> Reporter: lichaoyong
> Priority: Major
>
> ORC uses RLE to encode and decode integers.
> The RLE algorithm comprises four encoding types:
> Short Repeat : used for short repeating integer sequences.
> Direct : used for integer sequences whose values have a relatively constant
> bit width.
> Patched Base : used for integer sequences whose bit widths vary a lot.
> Delta : used for monotonically increasing or decreasing sequences.
> This bug occurs in the **Patched Base** type for large negative numbers.
> In Patched Base, the header uses [3 bits to store the base value
> width|https://orc.apache.org/specification/ORCv2/]; the base value itself is
> encoded using 1 to 8 bytes.
> If the base value is actually 8 bytes in length, the value for base width
> should be 7.
> Currently, this value can reach 8, which does not fit in 3 bits and can
> produce inconsistent data during encoding.
> In extreme cases, the encoding/decoding process can even crash with a core
> dump by accessing an illegal address.
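
The overflow described above can be illustrated with a minimal sketch. It is not the ORC C++ writer code: the helper names (`packBaseAndPatchWidth`, `unpackBaseBytes`, `isValidBaseBytes`) are hypothetical, and the byte layout (3-bit base width field followed by a 5-bit patch width code) is assumed from the linked specification. The sketch only shows what happens to a 3-bit field when the computed base length is 9 bytes instead of at most 8.

```cpp
#include <cstdint>

// Hypothetical helper, not part of the ORC API.
// Packs the 3-bit base width field and a 5-bit patch width code into one
// header byte. baseBytes is the byte length of the base value (valid: 1..8);
// the stored field is baseBytes - 1 (valid: 0..7).
inline uint8_t packBaseAndPatchWidth(uint32_t baseBytes, uint32_t patchWidthCode) {
    uint32_t field = baseBytes - 1;
    // If field is 8, (field << 5) is 256 and is lost when narrowed to 8 bits.
    return static_cast<uint8_t>((field << 5) | (patchWidthCode & 0x1F));
}

// Decoder side: recover the base byte length from the header byte.
inline uint32_t unpackBaseBytes(uint8_t headerByte) {
    return ((headerByte >> 5) & 0x07u) + 1;
}

// Hypothetical sanity check an encoder could run before writing the header.
inline bool isValidBaseBytes(uint32_t baseBytes) {
    return baseBytes >= 1 && baseBytes <= 8;
}
```

Under these assumptions, a base length of 8 round-trips correctly, while a base length of 9 is read back as 1. The decoder would then consume the wrong number of bytes for the base value, which is consistent with the inconsistent data and illegal memory access reported here.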
--
This message was sent by Atlassian Jira
(v8.3.4#803005)