Well, it depends on how important speed is, but LZ4 has extremely fast
decompression, even compared to Snappy:
https://github.com/lz4/lz4#benchmarks

Regards

Antoine.


On 02/07/2020 at 19:47, Christian Hudon wrote:
> At least for us, the advantages of Parquet are speed and interoperability
> in the context of longer-term data storage, so I would tend to say
> "reasonably conservative".
> 
> On Wed, Jul 1, 2020, at 09:32, Antoine Pitrou <solip...@pitrou.net> wrote:
> 
>>
>> I don't have a sense of how conservative Parquet users generally are.
>> Is it worth adding an LZ4_FRAMED compression option in the Parquet
>> format, or would people just not use it?
>>
>> Regards
>>
>> Antoine.
>>
>>
>> On Tue, 30 Jun 2020 14:33:17 +0200
>> "Uwe L. Korn" <uw...@xhochy.com> wrote:
>>> I'm also in favor of disabling support for now. Having to deal with
>>> broken files or the detection of various incompatible implementations in
>>> the long-term will harm more than not supporting LZ4 for a while. Snappy is
>>> generally more used than LZ4 in this category as it has been available
>>> since the inception of Parquet and thus should be considered as a viable
>>> alternative.
>>>
>>> Cheers
>>> Uwe
>>>
>>> On Mon, Jun 29, 2020, at 11:48 PM, Wes McKinney wrote:
>>>> On Thu, Jun 25, 2020 at 3:31 AM Antoine Pitrou <anto...@python.org> wrote:
>>>>>
>>>>>
>>>>> On 25/06/2020 at 00:02, Wes McKinney wrote:
>>>>>> hi folks,
>>>>>>
>>>>>> (cross-posting to dev@arrow and dev@parquet since there are
>>>>>> stakeholders in both places)
>>>>>>
>>>>>> It seems there are still problems at least with the C++ implementation
>>>>>> of LZ4 compression in Parquet files
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/PARQUET-1241
>>>>>> https://issues.apache.org/jira/browse/PARQUET-1878
>>>>>
>>>>> I don't have any particular opinion on how to solve the LZ4 issue, but
>>>>> I'd like to mention that LZ4 and ZStandard are the two most efficient
>>>>> compression algorithms available, and they span different parts of the
>>>>> speed/compression spectrum, so it would be a pity to disable one of them.
>>>>
>>>> It's true, however I think it's worse to write LZ4-compressed files
>>>> that cannot be read by other Parquet implementations (if that's what's
>>>> happening as I understand it?). If we are indeed shipping something
>>>> broken then we either should fix it or disable it until it can be
>>>> fixed.
>>>>
>>>>> Regards
>>>>>
>>>>> Antoine.
>>>>
>>>
>>
>>
>>
>>
> 
