Coordinating / scheduling C++ Parquet-Arrow nested data work (ARROW-1644 and others)

Wes McKinney Thu, 09 Jan 2020 08:17:03 -0800

hi folks,

I think we have reached a point where the incomplete C++ Parquet
nested data assembly/disassembly is harming the value of several
other parts of the project, for example the Datasets API. As another
example, it's possible to ingest nested data from JSON but not write
it to Parquet in general.


Implementing the nested data read and write path completely is a
difficult project requiring at least several weeks of dedicated work,
so it's not so surprising that it hasn't been accomplished yet. I know
that several people have expressed interest in working on it, but I
would like to see if anyone would be able to volunteer a commitment of
time and estimate a rough timeline for when this work could be done. It
seems to me that if this slips beyond 2020 it will significantly diminish
the value being created by other parts of the project.

Since I'm pretty familiar with all the Parquet code I'm one candidate
person to take on this project (and I can dedicate the time, but it
would come at the expense of other projects where I can also be
useful). But Micah and others expressed interest in working on it, so
I wanted to have a discussion about it to see what others think.

Thanks
Wes
