Re: [DISCUSS] Format additions for encoding/compression (Was: [Discuss] Format additions to Arrow for sparse data and data integrity)

Antoine Pitrou Fri, 12 Jul 2019 02:24:22 -0700


On 12/07/2019 at 10:08, Micah Kornfield wrote:
> OK, I've created a separate thread for data integrity/digests [1], and
> retitled this thread to continue the discussion on compression and
> encodings.  As a reminder the PR for the format additions [2] suggested a
> new SparseRecordBatch that would allow for the following features:
> 1.  Different data encodings at the Array (e.g. RLE) and Buffer levels
> (e.g. narrower bit-width integers)
> 2.  Compression at the buffer level
> 3.  Eliding all metadata and data for empty columns.


So the question is whether this really needs to be in the in-memory
format, i.e. is the intent to operate directly on this compressed
format, or is it solely for transport?

If the latter, I wonder why Parquet cannot simply be used instead of
reinventing something similar but different.

Regards

Antoine.
