On 12/07/2019 at 10:08, Micah Kornfield wrote:
> OK, I've created a separate thread for data integrity/digests [1], and
> retitled this thread to continue the discussion on compression and
> encodings. As a reminder the PR for the format additions [2] suggested a
> new SparseRecordBatch that would allow for the following features:
> 1. Different data encodings at the Array (e.g. RLE) and Buffer levels
> (e.g. narrower bit-width integers)
> 2. Compression at the buffer level
> 3. Eliding all metadata and data for empty columns.
So the question is whether this really needs to be in the in-memory format, i.e. is it desirable to operate directly on this compressed format, or is it solely for transport? If the latter, I wonder why Parquet cannot simply be used instead of reinventing something similar but different.

Regards,

Antoine.
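To make the in-memory vs. transport distinction concrete, here is a minimal Python sketch. The helper names (`rle_encode`, `rle_sum`, `narrow_int64_to_int8`) are hypothetical and for illustration only; they are not Arrow's API or the layout proposed in the SparseRecordBatch PR. It assumes zlib as a stand-in for whatever buffer-level codec would actually be used. An array-level encoding such as RLE can be computed on directly, a narrower bit-width buffer is still directly addressable, while a generically compressed buffer is opaque until decompressed.

```python
import struct
import zlib
from itertools import groupby

# Hypothetical helpers for illustration only; not Arrow's API or the
# layout proposed in the SparseRecordBatch PR.

def rle_encode(values):
    """Array-level encoding: collapse consecutive repeats into (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

def rle_sum(runs):
    """Operate directly on the encoded form: no decoding needed."""
    return sum(v * n for v, n in runs)

def narrow_int64_to_int8(values):
    """Buffer-level encoding: store int64 values as int8 when they all fit."""
    if not all(-128 <= v <= 127 for v in values):
        raise ValueError("values do not fit in int8")
    return struct.pack(f"<{len(values)}b", *values)

values = [7] * 1000 + [3] * 500 + [7] * 250

runs = rle_encode(values)
print(runs)  # [(7, 1000), (3, 500), (7, 250)]
print(rle_sum(runs) == sum(values))  # True

plain = struct.pack(f"<{len(values)}q", *values)  # 8 bytes per value
narrow = narrow_int64_to_int8(values)  # 1 byte per value
print(len(plain), len(narrow))  # 14000 1750

# Buffer-level compression (zlib here, as an assumed stand-in): opaque bytes
# that must be decompressed before any value can be read, i.e. useful for
# transport but not for direct in-memory processing.
compressed = zlib.compress(plain)
restored = struct.unpack(f"<{len(values)}q", zlib.decompress(compressed))
print(list(restored) == values)  # True
```

The first two cases could plausibly live in an in-memory format because kernels can work on them without a full decode; the third mainly saves bytes on the wire or on disk, which is the role Parquet already fills.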