> It's not just computation libraries, it's any library peeking inside
> Arrow data. Currently, the Arrow data types are simple, which makes it
> easy and non-intimidating to build data processing utilities around
> them. If we start adding sophisticated encodings, we also raise the
> cost of supporting Arrow for third-party libraries.

This is another legitimate concern about complexity.

To try to limit complexity, I simplified the proposal PR [1] to have only
one buffer encoding scheme (FrameOfReferenceIntEncoding) and one array
encoding scheme (RLE), which I think will have the most benefit if
exploited properly. Compression is removed.

I'd like to get closure on the proposal one way or another. I think the
question to be answered now is whether we are willing to introduce the
additional complexity in exchange for the performance improvements these
encodings can yield. Is there more data that people would like to see that
would influence their decision?

Thanks,
Micah

[1] https://github.com/apache/arrow/pull/4815

On Mon, Jul 22, 2019 at 8:59 AM Antoine Pitrou <solip...@pitrou.net> wrote:

> On Mon, 22 Jul 2019 08:40:08 -0700
> Brian Hulette <hulet...@gmail.com> wrote:
> > To me, the most important aspect of this proposal is the addition of
> > sparse encodings, and I'm curious if there are any more objections to
> > that specifically. So far I believe the only one is that it will make
> > computation libraries more complicated. This is absolutely true, but I
> > think it's worth that cost.
>
> It's not just computation libraries, it's any library peeking inside
> Arrow data. Currently, the Arrow data types are simple, which makes it
> easy and non-intimidating to build data processing utilities around
> them. If we start adding sophisticated encodings, we also raise the
> cost of supporting Arrow for third-party libraries.
>
> Regards
>
> Antoine.