Hi, Jorge.

> how we are expected to get a length of 4 from this 3

The short answer is that the stored run length is off by one: the header
stores the length minus 1, so you get 4 by adding 1 to 3.

Historically, HIVE-4123 added Integer RLE version 2 in the following
commit on Aug 12, 2013, and that code later became part of the Apache ORC
code base.

https://github.com/apache/hive/commit/69deabeaac020ba60b0f2156579f53e9fe46157a#diff-c00fea1863eaf0d6f047535e874274199020ffed3eb00deb897f513aa86f6b59R232-R236

    // extract the run length
    int len = (firstByte & 0x01) << 8;
    len |= input.read();
    // runs are one off
    len += 1;
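
To make this concrete for the spec example, here is a minimal sketch that
decodes the two header bytes 0x5e and 0x03. The class and method names are
hypothetical and only for illustration; they are not part of ORC or Hive.
Only the run-length arithmetic mirrors the snippet above.

```java
public class RleV2HeaderSketch {
    // Top 2 bits of the first header byte: the sub-encoding (1 = direct).
    static int encoding(int firstByte) {
        return (firstByte >>> 6) & 0x03;
    }

    // Next 5 bits: the encoded width code (15 maps to 16 bits in the spec's table).
    static int widthCode(int firstByte) {
        return (firstByte >>> 1) & 0x1f;
    }

    // Lowest bit of the first byte plus the whole second byte:
    // 9 bits that hold (run length - 1).
    static int runLength(int firstByte, int secondByte) {
        int len = (firstByte & 0x01) << 8;
        len |= secondByte & 0xff;
        // the stored value is one less than the actual run length
        return len + 1;
    }

    public static void main(String[] args) {
        int firstByte = 0x5e;   // 0b01011110
        int secondByte = 0x03;  // 0b00000011
        System.out.println("encoding   = " + encoding(firstByte));              // 1
        System.out.println("width code = " + widthCode(firstByte));             // 15
        System.out.println("run length = " + runLength(firstByte, secondByte)); // 4
    }
}
```

Since the 9 bits store the length minus 1, a single run can describe 1 to
512 values, and a run of length 0 is never encoded.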

Dongjoon.


On Fri, Jul 15, 2022 at 11:18 PM Jorge Cardoso Leitão
<jorgecarlei...@gmail.com> wrote:
>
> Hi,
>
> I am trying to follow the example in the spec:
>
> The unsigned sequence of [23713, 43806, 57005, 48879] would be serialized
> with direct encoding (1), a width of 16 bits (15), and length of 4 (3) as
> [0x5e, 0x03, 0x5c, 0xa1, 0xab, 0x1e, 0xde, 0xad, 0xbe, 0xef].
>
> The first two bytes in binary are [0b01011110, 0b00000011]. Splitting it
> according to the spec, I get:
>
> [0b01-01111-0, 0b00000011]
>
> The first 2 bits represent 1 -> it is "direct"
> The next 5 bits represent a 15 in uint8, which is mapped to 16 bit width
> using the table in the spec
> The next bit (0) and byte (00000011), represent a 3 when read into a uint16
> in big endian.
>
> I am not following how we are expected to get a length of 4 from this 3 -
> which mapping converts this 3 into a 4?
>
> Thanks,
> Jorge
