Update: the gist
<https://gist.github.com/alkis/b2c78af23cb224671d7a8a77ac5f60b7> is now under
the Apache 2.0 license and updated to the latest version I have so far.

Tangentially related, I have published a PR for a tool that extracts and
scrubs footers from parquet files to aid customers donating footers to the
foundation to build a benchmark database.

On Fri, Jun 7, 2024 at 9:58 AM Alkis Evlogimenos <
alkis.evlogime...@databricks.com> wrote:

> Absolutely, when we are ready to move to a shared repo I will start the
> formal release process.
>
>
> On Thu, Jun 6, 2024 at 10:38 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
>> Hi Alkis,
>> This is great, I can try to find some time to try to make it work in CPP
>> if nobody else volunteers.  I think one formality that should probably be
>> done before we iterate on it is changing the License on the top of the
>> gist to the Apache 2.0 license (if I am reading it correctly it appears to
>> be marked as proprietary currently).
>>
>>
>> Thanks,
>> Micah
>>
>>
>> On Thu, Jun 6, 2024 at 1:22 PM Alkis Evlogimenos
>> <alkis.evlogime...@databricks.com.invalid> wrote:
>>
>> > Hey folks.
>> >
>> > I have been asked to share the latest flatbuffer prototype.
>> >
>> > I will put the latest in this gist
>> > <https://gist.github.com/alkis/b2c78af23cb224671d7a8a77ac5f60b7> left with
>> > TODOs if folks want to collaborate.
>> >
>> > I am iterating in our internal C++ codebase; it would be nice if someone
>> > more knowledgeable with parquet-cpp could integrate this there so that we
>> > can do benchmarking/experimentation. Once set up I would be happy to
>> > contribute the scaffolding that converts from thrift to flatbuffers and
>> > take it from there.
>> >
>> > Other than the TODOs in the file, the following items are still missing:
>> > - optimize Statistics: this is by far the biggest payload
>> > - encryption is completely untouched/unthought
>> > - column indexes
>> > - bloom filters
>> >
>> > Some of the above might have to stay as is.
>> >
>> > The biggest blocker for me right now is collecting "interesting" footers
>> > from real tables (I very much dislike generated ones) and building a
>> > good repository with them to drive more design decisions.
>> >
>>
>
