Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

Antoine Pitrou Sun, 01 Mar 2020 13:14:11 -0800


On 01/03/2020 at 22:01, Wes McKinney wrote:
> In the context of a "next version of the Feather format" ARROW-5510
> (which is consumed only by Python and R at the moment), I have been
> looking at compressing buffers using fast compressors like ZSTD when
> writing the RecordBatch bodies. This could be handled privately as an
> implementation detail of the Feather file, but since ZSTD compression
> could improve throughput in Flight, for example, I thought I would
> bring it up for discussion.
> 
> I can see two simple compression strategies:
> 
> * Compress the entire message body in one-shot, writing the result out
> with an 8-byte int64 prefix indicating the uncompressed size
> * Compress each non-zero-length constituent Buffer prior to writing to
> the body (and using the same uncompressed-length-prefix when writing
> the compressed buffer)
> 
> The latter strategy is preferable for scenarios where we may project
> out only a few fields from a larger record batch (such as reading from
> a memory-mapped file).


Agreed.  It may also allow using different compression strategies for
different kinds of buffers (for example a bytestream splitting strategy
for floats and doubles, or a delta encoding strategy for integers).
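
To make the per-buffer strategy concrete, here is a minimal sketch of framing each non-empty buffer with its uncompressed length. It is only an illustration, not the Arrow implementation: zlib from the Python standard library stands in for ZSTD, the little-endian byte order of the int64 prefix is an assumption, the helper names (compress_buffer, write_body, read_buffer) are hypothetical, and buffer padding/alignment is left out.

```python
import struct
import zlib

# 8-byte signed int64 prefix; little-endian is an assumption of this sketch.
PREFIX = struct.Struct("<q")


def compress_buffer(buf: bytes) -> bytes:
    """Frame one buffer as [int64 uncompressed length][compressed bytes]."""
    if len(buf) == 0:
        # Zero-length buffers are written as-is, with no prefix.
        return b""
    return PREFIX.pack(len(buf)) + zlib.compress(buf)


def decompress_buffer(data: bytes) -> bytes:
    """Inverse of compress_buffer, checking the recorded length."""
    if len(data) == 0:
        return b""
    (size,) = PREFIX.unpack_from(data, 0)
    out = zlib.decompress(data[PREFIX.size:])
    if len(out) != size:
        raise ValueError("uncompressed size does not match prefix")
    return out


def write_body(buffers):
    """Concatenate framed buffers; return the body and (offset, length) pairs."""
    body = bytearray()
    layout = []
    for buf in buffers:
        chunk = compress_buffer(buf)
        layout.append((len(body), len(chunk)))
        body += chunk
    return bytes(body), layout


def read_buffer(body: bytes, layout, index: int) -> bytes:
    """Decompress a single buffer without touching the others."""
    offset, length = layout[index]
    return decompress_buffer(body[offset:offset + length])


buffers = [b"\x00" * 1000, b"", b"abc" * 100]
body, layout = write_body(buffers)
# Projecting out one field only pays for that field's decompression.
third = read_buffer(body, layout, 2)
```

Because every buffer is framed independently, a reader that needs only one column can skip the rest, and each buffer could in principle carry its own pre-filter or codec.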

> Implementation could be accomplished by one of the following methods:
> 
> * Setting a field in Message.custom_metadata
> * Adding a new field to Message

I think it has to be a new field in Message.  Making it an ignorable
metadata field means non-supporting receivers will decode and interpret
the data wrongly.
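
The failure mode can be sketched with a toy model. Nothing here is Arrow API: the message is a plain dict, the "compression" metadata key and the receiver functions are hypothetical, and zlib again stands in for ZSTD. The point is only that a receiver which skips the metadata reads compressed bytes as if they were raw values.

```python
import struct
import zlib


def sender(values):
    """Build a toy message whose body is a compressed int32 buffer."""
    raw = struct.pack(f"<{len(values)}i", *values)
    body = struct.pack("<q", len(raw)) + zlib.compress(raw)
    # Hypothetical: compression is signalled only via ignorable metadata.
    return {"custom_metadata": {"compression": "zlib"}, "body": body}


def legacy_receiver(message, count):
    """Receiver that predates the option and ignores custom_metadata."""
    return list(struct.unpack_from(f"<{count}i", message["body"], 0))


def aware_receiver(message, count):
    """Receiver that understands the hypothetical 'compression' key."""
    body = message["body"]
    if message["custom_metadata"].get("compression") == "zlib":
        (size,) = struct.unpack_from("<q", body, 0)
        body = zlib.decompress(body[8:])
        if len(body) != size:
            raise ValueError("uncompressed size does not match prefix")
    return list(struct.unpack_from(f"<{count}i", body, 0))


values = [1, 2, 3, 4]
message = sender(values)
good = aware_receiver(message, len(values))
bad = legacy_receiver(message, len(values))  # no error, just wrong values
```

The legacy receiver raises no error; it returns numbers taken from the length prefix and the compressed stream, which is why the signal should not be something a reader is free to ignore.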

Regards

Antoine.
