I'm also in favor of disabling LZ4 support for now. Having to deal with broken
files or to detect various incompatible implementations in the long term
will do more harm than not supporting LZ4 for a while. Snappy is generally more
widely used than LZ4 in this category, as it has been available since the
inception of Parquet, and thus should be considered a viable alternative.

Cheers
Uwe

On Mon, Jun 29, 2020, at 11:48 PM, Wes McKinney wrote:
> On Thu, Jun 25, 2020 at 3:31 AM Antoine Pitrou <anto...@python.org> wrote:
> >
> >
> > On 25/06/2020 at 00:02, Wes McKinney wrote:
> > > hi folks,
> > >
> > > (cross-posting to dev@arrow and dev@parquet since there are
> > > stakeholders in both places)
> > >
> > > It seems there are still problems, at least with the C++ implementation
> > > of LZ4 compression in Parquet files:
> > >
> > > https://issues.apache.org/jira/browse/PARQUET-1241
> > > https://issues.apache.org/jira/browse/PARQUET-1878
> >
> > I don't have any particular opinion on how to solve the LZ4 issue, but
> > I'd like to mention that LZ4 and ZStandard are the two most efficient
> > compression algorithms available, and they span different parts of the
> > speed/compression spectrum, so it would be a pity to disable one of them.
> 
> That's true; however, I think it's worse to write LZ4-compressed files
> that cannot be read by other Parquet implementations (if that's what's
> happening, as I understand it?). If we are indeed shipping something
> broken, then we should either fix it or disable it until it can be
> fixed.
> 
> > Regards
> >
> > Antoine.
>
