Update: the gist <https://gist.github.com/alkis/b2c78af23cb224671d7a8a77ac5f60b7> is now under the Apache 2.0 license and updated to the latest version I have so far.
Tangentially related, I have published a PR for a tool that extracts and scrubs footers from parquet files to aid customers donating footers to the foundation to build a benchmark database.

On Fri, Jun 7, 2024 at 9:58 AM Alkis Evlogimenos <[email protected]> wrote:

> Absolutely, when we are ready to move to a shared repo I will start the
> formal release process.
>
> On Thu, Jun 6, 2024 at 10:38 PM Micah Kornfield <[email protected]>
> wrote:
>
>> Hi Alkis,
>> This is great, I can try to find some time to try to make it work in CPP
>> if nobody else volunteers. I think one formality that should probably be
>> done before we iterate on it is changing the License on the top of the
>> gist to the Apache 2.0 license (if I am reading it correctly it appears
>> to be marked as proprietary currently).
>>
>> Thanks,
>> Micah
>>
>> On Thu, Jun 6, 2024 at 1:22 PM Alkis Evlogimenos
>> <[email protected]> wrote:
>>
>> > Hey folks.
>> >
>> > I have been asked to share the latest flatbuffer prototype.
>> >
>> > I will put the latest in this gist
>> > <https://gist.github.com/alkis/b2c78af23cb224671d7a8a77ac5f60b7> left
>> > with TODOs if folks want to collaborate.
>> >
>> > I am iterating in our internal C++ codebase, it would be nice if someone
>> > more knowledgeable with parquet-cpp can integrate this there so that we
>> > can do benchmarking/experimentation. Once setup I would be happy to
>> > contribute the scaffolding that converts from thrift to flatbuffers and
>> > take it from there.
>> >
>> > Other than the TODOs in the file, the following items are still missing:
>> > - optimize Statistics: this is by far the biggest payload
>> > - encryption is completely untouched/unthought
>> > - column indexes
>> > - bloom filters
>> >
>> > Some of the above might have to stay as is.
>> >
>> > The biggest blocker for me right now is collecting "interesting" footers
>> > from real tables (I very much dislike generated ones) and building a
>> > good repository with them to drive more design decisions.
