[ 
https://issues.apache.org/jira/browse/ARROW-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667895#comment-15667895
 ] 

Wes McKinney commented on ARROW-300:
------------------------------------

One issue with doing compression only at the transport level arises if people 
use the Arrow memory layout and metadata to create file formats for storing 
larger amounts of data. For example, I would like to deprecate the Feather 
metadata 
https://github.com/wesm/feather/blob/master/cpp/src/feather/metadata.fbs and 
use only the Arrow metadata. Without column/buffer-level compression, reading 
only a subset of the file would be expensive, since a reader would have to 
decompress data it does not need. You could argue that such data should be 
stored as Parquet instead, but an Arrow-based file format offers a flexibility 
that's really appealing (particularly since random access on memory-mapped 
Arrow-like data would be possible). 
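
To illustrate the point about reading a subset, here is a minimal sketch in 
Python. It is not the Arrow or Feather format: the JSON footer, the layout, 
and the names write_columns and read_column are all made up for this example. 
It only shows that when each buffer is compressed on its own and the footer 
records its offset and length, a reader can decompress one column without 
touching the others.

```python
import io
import json
import struct
import zlib


def write_columns(columns):
    """Compress each column buffer independently and append a footer index.

    Hypothetical layout: [compressed buffers][JSON footer][uint32 footer length].
    """
    out = io.BytesIO()
    index = {}
    for name, raw in columns.items():
        compressed = zlib.compress(raw)
        # Record where this column's compressed bytes live in the file.
        index[name] = {"offset": out.tell(), "length": len(compressed)}
        out.write(compressed)
    footer = json.dumps(index).encode("utf-8")
    out.write(footer)
    out.write(struct.pack("<I", len(footer)))
    return out.getvalue()


def read_column(blob, name):
    """Decompress a single column, using the footer to locate its bytes."""
    (footer_len,) = struct.unpack("<I", blob[-4:])
    footer = json.loads(blob[-4 - footer_len:-4])
    entry = footer[name]
    start = entry["offset"]
    return zlib.decompress(blob[start:start + entry["length"]])


columns = {"a": b"\x01\x00\x00\x00" * 1000, "b": b"xyz" * 1000}
blob = write_columns(columns)
column_b = read_column(blob, "b")  # only column "b" is decompressed
```

With compression applied to the whole stream instead, read_column would have 
to decompress everything before it could slice out one column.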

> [Format] Add buffer compression option to IPC file format
> ---------------------------------------------------------
>
>                 Key: ARROW-300
>                 URL: https://issues.apache.org/jira/browse/ARROW-300
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Format
>            Reporter: Wes McKinney
>
> If data is to be sent over the wire, it may be useful to compress the data 
> buffers themselves as they are being written in the file layout.
> I would propose that we keep this extremely simple with a global buffer 
> compression setting in the file Footer. Probably the only two compressors 
> worth supporting out of the box would be zlib (higher compression ratios) 
> and lz4 (better performance).
> What does everyone think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
