Re: parquet 2 incompatibility between 0.16 and 0.17?

Micah Kornfield Thu, 30 Apr 2020 10:30:25 -0700

This sounds like something we might want to do and issue a patch release.
It seems bad to default to a non-production version?


I can try to take a look tonight at a patch if no one gets to it before then.

Thanks,
Micah

On Wednesday, April 29, 2020, Wes McKinney <wesmck...@gmail.com> wrote:

> On Wed, Apr 29, 2020 at 6:15 PM Pierre Belzile <pierre.belz...@gmail.com>
> wrote:
> >
> > Wes,
> >
> > You used the words "forward compatible". Does this mean that 0.17 is able
> > to decode 0.16 datapagev2?
>
> 0.16 doesn't write DataPageV2 at all, the version flag only determines
> the type casting and metadata behavior I indicated in my email. The
> changes in
>
> https://github.com/apache/arrow/commit/809d40ab9518bd254705f35af01162a9da588516
>
> enabled the use of DataPageV2 and I/we didn't think about the forward
> compatibility issue (version=2.0 files written in 0.17.0 being
> unreadable in 0.16.0). We might actually want to revert this (just the
> toggle between DataPageV1/V2, not the whole patch).
>
>
>
> > Crossing my fingers...
> >
> > Pierre
> >
> > On Wed, Apr 29, 2020 at 7:05 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > > Ah, so we have a slight mess on our hands because the patch for
> > > PARQUET-458 enabled the use of DataPageV2, which is not forward
> > > compatible with older versions because the implementation was fixed
> > > (see the JIRA for more details)
> > >
> > >
> > > https://github.com/apache/arrow/commit/809d40ab9518bd254705f35af01162a9da588516
> > >
> > > Unfortunately, in Python the version='1.0' / version='2.0' flag is
> > > being used for two different purposes:
> > >
> > > * Expanded ConvertedType / LogicalType metadata, like unsigned types
> > > and nanosecond timestamps
> > > * DataPageV1 vs. DataPageV2 data pages
> > >
> > > I think we should separate these concepts and instead have a
> > > "compatibility mode" option regarding the ConvertedType/LogicalType
> > > annotations and the behavior around conversions when writing unsigned
> > > integers, nanosecond timestamps, and other types to Parquet V1 (which
> > > is the only "production" Parquet format).
> > >
> > > > On Wed, Apr 29, 2020 at 5:56 PM Pierre Belzile <pierre.belz...@gmail.com>
> > > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > > We've been using the parquet 2 format (mostly because of nanosecond
> > > > > resolution). I'm getting crashes in the C++ parquet decoder, arrow 0.16,
> > > > > when decoding a parquet 2 file created with pyarrow 0.17.0. Is this
> > > > > expected? Would 0.17 be able to decode a file written by 0.16?
> > > >
> > > > > If that's not expected, I can put the debugger on it and see what is
> > > > > happening. I suspect it's with string fields (regular, not large string).
> > > >
> > > > Cheers, Pierre
> > >
>
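
The separation Wes proposes in the thread (type/metadata behavior on one
switch, DataPageV1 vs. DataPageV2 on another) can be sketched roughly as
below. This is only an illustrative model under stated assumptions: the names
WriterSettings, from_legacy_version_flag, decoupled_settings and
readable_by_0_16 are hypothetical and are not part of pyarrow or the Parquet
C++ API. The only facts taken from the thread are that in 0.17.0 the single
version flag drives both behaviors, and that 0.16.0 cannot read the
DataPageV2 pages written by 0.17.0.

```python
from dataclasses import dataclass

DATA_PAGE_VERSIONS = ("V1", "V2")


@dataclass(frozen=True)
class WriterSettings:
    """Hypothetical model of the two concerns; not a pyarrow class."""
    data_page_version: str        # "V1" or "V2"
    expanded_type_metadata: bool  # unsigned types, nanosecond timestamps, ...


def from_legacy_version_flag(version: str) -> WriterSettings:
    """Coupled behavior described for 0.17.0: one flag sets both concerns."""
    if version not in ("1.0", "2.0"):
        raise ValueError(f"unsupported version flag: {version!r}")
    is_v2 = version == "2.0"
    return WriterSettings(
        data_page_version="V2" if is_v2 else "V1",
        expanded_type_metadata=is_v2,
    )


def decoupled_settings(expanded_type_metadata: bool,
                       data_page_version: str = "V1") -> WriterSettings:
    """Proposed shape: the two concerns are chosen independently.

    Defaulting to "V1" data pages is an assumption of this sketch, based on
    the remark that V1 is the only "production" Parquet format.
    """
    if data_page_version not in DATA_PAGE_VERSIONS:
        raise ValueError(f"unsupported data page version: {data_page_version!r}")
    return WriterSettings(data_page_version, expanded_type_metadata)


def readable_by_0_16(settings: WriterSettings) -> bool:
    """Per the thread, 0.16.0 cannot read DataPageV2 written by 0.17.0."""
    return settings.data_page_version == "V1"


if __name__ == "__main__":
    legacy = from_legacy_version_flag("2.0")
    proposed = decoupled_settings(expanded_type_metadata=True)
    print(legacy, readable_by_0_16(legacy))
    print(proposed, readable_by_0_16(proposed))
```

With the coupled flag, asking for nanosecond timestamps also switches to
DataPageV2; with the decoupled settings, a writer could keep the expanded
type metadata while staying on DataPageV1 pages that older readers handle.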
