[ https://issues.apache.org/jira/browse/PARQUET-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579852#comment-16579852 ]
Wes McKinney commented on PARQUET-1241:
---------------------------------------

[~ee07b291] would you be able to contribute a patch to Apache Arrow to add support for the framed LZ4 format?

> Use LZ4 frame format
> --------------------
>
>                 Key: PARQUET-1241
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1241
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp, parquet-format
>            Reporter: Lawrence Chan
>            Priority: Major
>
> The parquet-format spec doesn't currently specify whether lz4-compressed data
> should be framed or not. We should choose one and make it explicit in the
> spec, as they are not inter-operable. After some discussions with others [1],
> we think it would be beneficial to use the framed format, which adds a small
> header in exchange for more self-contained decompression as well as a richer
> feature set (checksums, parallel decompression, etc).
>
> The current arrow implementation compresses using the lz4 block format, and
> this would need to be updated when we add the spec clarification.
>
> If backwards compatibility is a concern, I would suggest adding an additional
> LZ4_FRAMED compression type, but that may be more noise than anything.
>
> [1] https://github.com/dask/fastparquet/issues/314

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
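
For illustration only: the issue notes that framed and block LZ4 data are not interchangeable because the frame format prepends a small header. A minimal sketch of how a reader might tell the two apart, assuming the magic number 0x184D2204 (stored little-endian) from the LZ4 frame format description. The helper name looks_like_lz4_frame is hypothetical and is not part of parquet-cpp, Arrow, or the lz4 library. Raw block data carries no header at all, so this check is only a heuristic.

```python
import struct

# Magic number that opens an LZ4 frame, per the LZ4 frame format
# description. It is written little-endian, i.e. bytes 04 22 4D 18.
LZ4_FRAME_MAGIC = 0x184D2204


def looks_like_lz4_frame(buf: bytes) -> bool:
    """Return True if buf starts with the LZ4 frame magic number.

    Hypothetical helper for illustration. A False result does not prove
    the data is a valid raw LZ4 block; it only means no frame header
    was found at the start of the buffer.
    """
    if len(buf) < 4:
        return False
    # Read the first four bytes as an unsigned little-endian 32-bit int.
    (magic,) = struct.unpack_from("<I", buf, 0)
    return magic == LZ4_FRAME_MAGIC
```

A reader that has to cope with files from both kinds of writers could use such a check to pick a decoder, which is roughly the ambiguity a separate LZ4_FRAMED compression type would remove.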