Huw,

Generally, we assign each ORC file writer implementation a unique writer id so that we can determine which implementation wrote a given file. Would you like a number assigned to your writer? We'd ask that your writer always set its id in the Footer.writer field.
https://github.com/apache/orc/blob/75a8f5f2d938a5d13c62619024c1a2443489cce7/proto/orc_proto.proto#L364

.. Owen

On Tue, Aug 23, 2022 at 8:30 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> Thank you for sharing, Huw.
>
> Dongjoon.
>
> On Mon, Aug 22, 2022 at 10:27 PM Huw Campbell <huw.campb...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > In case you're interested in this. A while ago I wrote up a Haskell parser
> > and writer for ORC, which one can find here
> > <https://github.com/HuwCampbell/orc-haskell>. I use it in the day job a
> > fair bit, and it's come in quite handy for ad-hoc data generators and
> > parsing tasks.
> >
> > It's a "clean room" implementation, and was written almost entirely from
> > the specification instead of cribbing from the Java or C++ versions.
> >
> > It's also quite capable, being able to read any schemas for v0 and v1 files
> > with a few different compression codecs. It writes with v0 style RLEs.
> >
> > Lastly it's pretty compact, being only ~6000 lines of sparsely formatted
> > Haskell. I think it demonstrates how ORC works quite nicely.
> >
> > Kind regards,
> > Huw