Hi Dongjoon,

Yes - I have been working on an implementation of Apache Arrow in Rust
[1], and have been adding interoperability with storage formats such as
JSON, Parquet and Avro. ORC is next.

The exercise started with going through the spec and implementing the
functionality to read compressed stripes, RLE, etc., which is what I have
been doing [2]. I am purposely keeping that code separate so that others
can use ORC without Arrow.
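
A minimal sketch of the byte RLE case, assuming the spec's rule that a
non-negative header byte stores the run length minus the minimum run
length (3) and a negative header byte stores the negated count of literal
bytes that follow. The names `decode_byte_rle` and `MIN_REPEAT` are
illustrative only and are not taken from [2].

```rust
/// Minimum run length in ORC byte RLE; a run header stores `length - MIN_REPEAT`.
const MIN_REPEAT: usize = 3;

/// Decodes an ORC byte-RLE stream into plain bytes.
/// Returns `None` if the input ends in the middle of a run or literal list.
fn decode_byte_rle(input: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < input.len() {
        // The header is interpreted as a signed byte.
        let header = input[pos] as i8;
        pos += 1;
        if header >= 0 {
            // Run: the stored value is (run length - 3), followed by the repeated byte.
            let len = header as usize + MIN_REPEAT;
            let value = *input.get(pos)?;
            pos += 1;
            out.extend(std::iter::repeat(value).take(len));
        } else {
            // Literal list: the header is the negated number of bytes that follow.
            let len = -(header as isize) as usize;
            let literals = input.get(pos..pos + len)?;
            pos += len;
            out.extend_from_slice(literals);
        }
    }
    Some(out)
}
```

With this sketch, the two bytes `[0x61, 0x00]` decode to one hundred zeros
(0x61 = 97, plus the implicit minimum of 3), and `[0xfe, 0x44, 0x45]`
decodes to the two literal bytes 0x44, 0x45.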

Thanks to everyone who has contributed to the spec - it is quite easy to
follow.

[1] https://github.com/jorgecarleitao/arrow2
[2] https://github.com/DataEngineeringLabs/orc-format

Best,
Jorge


On Tue, Jul 26, 2022 at 2:13 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
wrote:

> Thank you, Jorge.
>
> If you don't mind, may I ask about your usage of Apache ORC spec?
>
> I'm just wondering if you are trying to implement a new writer and
> reader from scratch by yourself?
>
> Dongjoon.
>
> On Mon, Jul 25, 2022 at 11:19 PM Jorge Cardoso Leitão
> <jorgecarlei...@gmail.com> wrote:
> >
> > Hi Dongjoon,
> >
> > Thank you for your answer. That was it. The rationale seems to be that
> > since runs in certain encodings have a minimum length, we do not even
> > store the minimum length and instead make it part of the decoding. Neat.
> >
> > Best,
> > Jorge
>
