Hi, Jorge.

The short answer is that the run length is stored off by one: the header holds the length minus 1. So you get 4 by adding 1 to the stored 3.

> how we are expected to get a length of 4 from this 3
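
As a minimal sketch of that arithmetic, the two header bytes from the spec example (0x5e, 0x03) can be split as below. The class and method names are hypothetical and used only for illustration; they are not taken from the ORC or Hive sources.

```java
// Hypothetical helper, for illustration only: splits the two-byte header
// of an Integer RLE v2 "direct" run into its three fields.
class DirectHeaderSketch {

    // Top 2 bits of the first byte: the encoding type (1 means direct).
    public static int encoding(int firstByte) {
        return (firstByte >>> 6) & 0x03;
    }

    // Next 5 bits: the encoded width (15 maps to 16 bits in the spec table).
    public static int encodedWidth(int firstByte) {
        return (firstByte >>> 1) & 0x1f;
    }

    // Last bit of the first byte plus the whole second byte: 9 bits that
    // hold (run length - 1), so 1 is added back to get the real length.
    public static int runLength(int firstByte, int secondByte) {
        int len = (firstByte & 0x01) << 8; // high bit of the stored value
        len |= (secondByte & 0xff);        // low 8 bits of the stored value
        return len + 1;                    // runs are one off
    }

    public static void main(String[] args) {
        int first = 0x5e;
        int second = 0x03;
        System.out.println(encoding(first));          // 1
        System.out.println(encodedWidth(first));      // 15
        System.out.println(runLength(first, second)); // 4
    }
}
```

Because a run always holds at least one value, storing the length minus 1 means the stored value 0 is never wasted on an empty run.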

Historically, HIVE-4123 added Integer RLE version 2 in the following commit on Aug 12, 2013, and that code later became part of the Apache ORC code base.

https://github.com/apache/hive/commit/69deabeaac020ba60b0f2156579f53e9fe46157a#diff-c00fea1863eaf0d6f047535e874274199020ffed3eb00deb897f513aa86f6b59R232-R236

    // extract the run length
    int len = (firstByte & 0x01) << 8;
    len |= input.read();

    // runs are one off
    len += 1;

Dongjoon.

On Fri, Jul 15, 2022 at 11:18 PM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:
>
> Hi,
>
> I am trying to follow the example in the spec:
>
> The unsigned sequence of [23713, 43806, 57005, 48879] would be serialized
> with direct encoding (1), a width of 16 bits (15), and length of 4 (3) as
> [0x5e, 0x03, 0x5c, 0xa1, 0xab, 0x1e, 0xde, 0xad, 0xbe, 0xef].
>
> The first two bytes in binary are [0b01011110, 0b00000011]. Splitting it
> according to the spec, I get:
>
> [0b01-01111-0, 0b00000011]
>
> The first 2 bits represent 1 -> it is "direct"
> The next 5 bits represent a 15 in uint8, which is mapped to 16 bit width
> using the table in the spec
> The next bit (0) and byte (00000011), represent a 3 when read into a uint16
> in big endian.
>
> I am not following how we are expected to get a length of 4 from this 3 -
> which mapping converts this 3 into a 4?
>
> Thanks,
> Jorge