Re: [DISCUSS] Format additions for encoding/compression (Was: [Discuss] Format additions to Arrow for sparse data and data integrity)

Micah Kornfield Thu, 25 Jul 2019 23:06:53 -0700

>
> It's not just computation libraries, it's any library peeking inside
> Arrow data.  Currently, the Arrow data types are simple, which makes it
> easy and non-intimidating to build data processing utilities around
> them.  If we start adding sophisticated encodings, we also raise the
> cost of supporting Arrow for third-party libraries.

This is another legitimate concern about complexity.

To try to limit complexity, I simplified the proposal PR [1] to include
only one buffer encoding scheme (FrameOfReferenceIntEncoding) and one array
encoding scheme (RLE), the two that I think will have the most benefit if
exploited properly.  Compression has been removed.
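
For anyone less familiar with the two schemes, here is a minimal sketch of
the general techniques in Python.  It is illustrative only: it is not the
buffer layout or metadata defined in the PR, and the function names
(for_encode, for_decode, rle_encode, rle_decode) are hypothetical names
chosen for this example.

```python
from typing import List, Tuple


def for_encode(values: List[int]) -> Tuple[int, List[int]]:
    """Frame of reference: keep one base value plus small offsets from it."""
    if not values:
        return 0, []
    reference = min(values)  # assumption: use the minimum as the reference
    return reference, [v - reference for v in values]


def for_decode(reference: int, offsets: List[int]) -> List[int]:
    """Rebuild the original integers by adding the reference back."""
    return [reference + o for o in offsets]


def rle_encode(values: List[int]) -> Tuple[List[int], List[int]]:
    """Run-length encoding: collapse consecutive repeats into value/length pairs."""
    run_values: List[int] = []
    run_lengths: List[int] = []
    for v in values:
        if run_values and run_values[-1] == v:
            run_lengths[-1] += 1  # extend the current run
        else:
            run_values.append(v)  # start a new run
            run_lengths.append(1)
    return run_values, run_lengths


def rle_decode(run_values: List[int], run_lengths: List[int]) -> List[int]:
    """Expand each run back into its repeated values."""
    out: List[int] = []
    for v, n in zip(run_values, run_lengths):
        out.extend([v] * n)
    return out
```

The offsets produced by for_encode fit in fewer bits than the original
values when the data is clustered, and rle_encode shrinks the array when
values repeat consecutively.  The cost raised in this thread is visible
here too: a consumer has to decode, or understand the encoding, before it
can read element i.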

I'd like to get closure on the proposal one way or another.  I think the
question to be answered now is whether we are willing to introduce the
additional complexity in exchange for the performance improvements these
encodings can yield.  Is there more data that people would like to see that
would influence their decision?

Thanks,
Micah

[1] https://github.com/apache/arrow/pull/4815

On Mon, Jul 22, 2019 at 8:59 AM Antoine Pitrou <solip...@pitrou.net> wrote:

> On Mon, 22 Jul 2019 08:40:08 -0700
> Brian Hulette <hulet...@gmail.com> wrote:
> > To me, the most important aspect of this proposal is the addition of
> > sparse encodings, and I'm curious if there are any more objections to
> > that specifically. So far I believe the only one is that it will make
> > computation libraries more complicated. This is absolutely true, but I
> > think it's worth that cost.
>
> It's not just computation libraries, it's any library peeking inside
> Arrow data.  Currently, the Arrow data types are simple, which makes it
> easy and non-intimidating to build data processing utilities around
> them.  If we start adding sophisticated encodings, we also raise the
> cost of supporting Arrow for third-party libraries.
>
> Regards
>
> Antoine.
>
>
>
