On 01/03/2020 at 22:01, Wes McKinney wrote:
> In the context of a "next version of the Feather format" ARROW-5510
> (which is consumed only by Python and R at the moment), I have been
> looking at compressing buffers using fast compressors like ZSTD when
> writing the RecordBatch bodies. This could be handled privately as an
> implementation detail of the Feather file, but since ZSTD compression
> could improve throughput in Flight, for example, I thought I would
> bring it up for discussion.
>
> I can see two simple compression strategies:
>
> * Compress the entire message body in one-shot, writing the result out
>   with an 8-byte int64 prefix indicating the uncompressed size
> * Compress each non-zero-length constituent Buffer prior to writing to
>   the body (and using the same uncompressed-length-prefix when writing
>   the compressed buffer)
>
> The latter strategy is preferable for scenarios where we may project
> out only a few fields from a larger record batch (such as reading from
> a memory-mapped file).

Agreed. It may also allow using different compression strategies for
different kinds of buffers (for example a bytestream splitting strategy
for floats and doubles, or a delta encoding strategy for integers).

> Implementation could be accomplished by one of the following methods:
>
> * Setting a field in Message.custom_metadata
> * Adding a new field to Message

I think it has to be a new field in Message. Making it an ignorable
metadata field means non-supporting receivers will decode and interpret
the data wrongly.

Regards

Antoine.
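
For illustration only, the per-buffer strategy discussed above (compress each non-zero-length buffer and prefix it with its uncompressed size as an 8-byte int64) might look roughly like the sketch below. Several things here are assumptions and not part of the proposal: zlib from the Python standard library stands in for ZSTD, the prefix is written little-endian, the helper names (`compress_buffer`, `write_body`, `read_buffer`) are hypothetical, and Arrow's buffer alignment and padding are ignored.

```python
import struct
import zlib

# 8-byte signed int64 prefix; little-endian byte order is an assumption here.
PREFIX = struct.Struct("<q")


def compress_buffer(buf: bytes) -> bytes:
    """Compress one buffer, prefixed with its uncompressed length."""
    if len(buf) == 0:
        # Zero-length buffers are written as-is, with no prefix.
        return b""
    # zlib is a stand-in for a fast compressor such as ZSTD.
    return PREFIX.pack(len(buf)) + zlib.compress(buf)


def decompress_buffer(data: bytes) -> bytes:
    """Inverse of compress_buffer; checks the length prefix."""
    if len(data) == 0:
        return b""
    (size,) = PREFIX.unpack_from(data, 0)
    out = zlib.decompress(data[PREFIX.size:])
    if len(out) != size:
        raise ValueError("uncompressed size does not match the prefix")
    return out


def write_body(buffers):
    """Concatenate compressed buffers; return the body and (offset, length) pairs."""
    body = bytearray()
    layout = []
    for buf in buffers:
        chunk = compress_buffer(buf)
        layout.append((len(body), len(chunk)))
        body += chunk
    return bytes(body), layout


def read_buffer(body: bytes, layout, index: int) -> bytes:
    """Decompress a single buffer without touching the others."""
    offset, length = layout[index]
    return decompress_buffer(body[offset:offset + length])
```

Because each buffer is compressed independently, `read_buffer` only decompresses the bytes of the requested buffer, which is the property that makes this strategy attractive when projecting a few fields out of a larger record batch.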