Re: Haskell Implementation of ORC

Owen O'Malley Tue, 23 Aug 2022 13:34:39 -0700

Huw,
   Generally, we assign each ORC file writer implementation a unique writer
id so that we can determine which implementation wrote a given file. Would
you like a number assigned to your writer? We'd ask that your writer always
set its id in the Footer.writer field.


https://github.com/apache/orc/blob/75a8f5f2d938a5d13c62619024c1a2443489cce7/proto/orc_proto.proto#L364
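
For illustration, here is a minimal sketch of what setting that field amounts to on the wire. It assumes the field is declared as `optional uint32 writer = 9;` in the linked proto file, and the id 99 is a hypothetical placeholder, not an assigned number. The helper names are made up for this example. A real writer would normally set the field through its protobuf library rather than encode it by hand.

```python
# Assumption: Footer.writer is field number 9 with type uint32 in orc_proto.proto.
WRITER_FIELD_NUMBER = 9
VARINT_WIRE_TYPE = 0  # protobuf wire type used for uint32 values


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("varint encoding here only supports non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # last byte, continuation bit clear
            return bytes(out)


def encode_writer_field(writer_id: int) -> bytes:
    """Return the bytes for the writer field: tag varint, then value varint."""
    tag = (WRITER_FIELD_NUMBER << 3) | VARINT_WIRE_TYPE
    return encode_varint(tag) + encode_varint(writer_id)


# Hypothetical id; the real one would be assigned by the ORC project.
HYPOTHETICAL_WRITER_ID = 99
writer_bytes = encode_writer_field(HYPOTHETICAL_WRITER_ID)
```

Under these assumptions the field adds only a couple of bytes to the serialized Footer message, and a reader that finds it absent cannot tell which implementation wrote the file.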

.. Owen

On Tue, Aug 23, 2022 at 8:30 PM Dongjoon Hyun <[email protected]>
wrote:

> Thank you for sharing, Huw.
>
> Dongjoon.
>
> On Mon, Aug 22, 2022 at 10:27 PM Huw Campbell <[email protected]>
> wrote:
>
> > Hi all,
> >
> > In case you're interested in this. A while ago I wrote up a Haskell parser
> > and writer for ORC, which one can find here
> > <https://github.com/HuwCampbell/orc-haskell>. I use it in the day job a
> > fair bit, and it's come in quite handy for ad-hoc data generators and
> > parsing tasks.
> >
> > It's a "clean room" implementation, and was written almost entirely from
> > the specification instead of cribbing from the Java or C++ versions.
> >
> > It's also quite capable, being able to read any schemas for v0 and v1 files
> > with a few different compression codecs. It writes with v0 style RLEs.
> >
> > Lastly it's pretty compact, being only ~6000 lines of sparsely formatted
> > Haskell. I think it demonstrates how ORC works quite nicely.
> >
> > Kind regards,
> > Huw
> >
>
