Well, it depends how important speed is, but LZ4 has extremely fast decompression, even compared to Snappy: https://github.com/lz4/lz4#benchmarks

Regards

Antoine.


On 02/07/2020 at 19:47, Christian Hudon wrote:
> At least for us, the advantages of Parquet are speed and interoperability
> in the context of longer-term data storage, so I would tend to say
> "reasonably conservative".
>
> On Wed, 1 Jul 2020 at 09:32, Antoine Pitrou <solip...@pitrou.net> wrote:
>
>> I don't have a sense of how conservative Parquet users generally are.
>> Is it worth adding a LZ4_FRAMED compression option in the Parquet
>> format, or would people just not use it?
>>
>> Regards
>>
>> Antoine.
>>
>>
>> On Tue, 30 Jun 2020 14:33:17 +0200
>> "Uwe L. Korn" <uw...@xhochy.com> wrote:
>>> I'm also in favor of disabling support for now. Having to deal with
>>> broken files or the detection of various incompatible implementations
>>> in the long-term will harm more than not supporting LZ4 for a while.
>>> Snappy is generally more used than LZ4 in this category as it has been
>>> available since the inception of Parquet and thus should be considered
>>> as a viable alternative.
>>>
>>> Cheers
>>> Uwe
>>>
>>> On Mon, Jun 29, 2020, at 11:48 PM, Wes McKinney wrote:
>>>> On Thu, Jun 25, 2020 at 3:31 AM Antoine Pitrou <anto...@python.org> wrote:
>>>>>
>>>>> On 25/06/2020 at 00:02, Wes McKinney wrote:
>>>>>> hi folks,
>>>>>>
>>>>>> (cross-posting to dev@arrow and dev@parquet since there are
>>>>>> stakeholders in both places)
>>>>>>
>>>>>> It seems there are still problems at least with the C++ implementation
>>>>>> of LZ4 compression in Parquet files
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/PARQUET-1241
>>>>>> https://issues.apache.org/jira/browse/PARQUET-1878
>>>>>
>>>>> I don't have any particular opinion on how to solve the LZ4 issue, but
>>>>> I'd like to mention that LZ4 and ZStandard are the two most efficient
>>>>> compression algorithms available, and they span different parts of the
>>>>> speed/compression spectrum, so it would be a pity to disable one of them.
>>>>
>>>> It's true, however I think it's worse to write LZ4-compressed files
>>>> that cannot be read by other Parquet implementations (if that's what's
>>>> happening as I understand it?). If we are indeed shipping something
>>>> broken then we either should fix it or disable it until it can be
>>>> fixed.
>>>>
>>>>> Regards
>>>>>
>>>>> Antoine.