At the sync today, the idea was to formally vote on these changes.

Antoine, I'm not clear if we've reached consensus or we need more
discussion.

Anybody else have concerns with the doc updates?  I'll try to start a vote
next week if no more issues come up.

> I agree that V2 is a bit meaningless on its own. I think it would be more
> valuable to establish a baseline of what's expected to be supported today,
> and then we can build from there.


I think this work has stalled for now, but it would be good to pick it up
again.  I like Antoine's ideas on presets
(https://github.com/apache/parquet-format/issues/384#issuecomment-3406653123)
as a solution here.  Hopefully we can formally restart this process now that
we are getting good coverage on the feature matrix (I think we are probably
missing a few key systems, like Spark, Iceberg and Trino, which at least
have their own readers).

Thanks,
Micah

On Tue, Dec 9, 2025 at 12:02 PM Fokko Driesprong <[email protected]> wrote:

> Thanks, Micah, for taking the lead here.
>
> I agree that V2 is a bit meaningless on its own. I think it would be more
> valuable to establish a baseline of what's expected to be supported today,
> and then we can build from there.
>
> Kind regards,
> Fokko
>
> Op ma 8 dec 2025 om 12:30 schreef Andrew Lamb <[email protected]>:
>
> > > So really "closing out 2.0" in my mind mostly makes any existing
> > > distinction between V1 and V2 disappear for downstream consumers of
> > > Parquet.
> >
> > I personally think stopping discussing "V2" vs "V1" would improve the
> > overall understanding of the state of Parquet implementations.
> >
> >
> >
> > On Mon, Dec 8, 2025 at 3:46 AM Micah Kornfield <[email protected]>
> > wrote:
> >
> > > Hi Antoine,
> > >
> > > > The parquet-format source tree is already versioned, do we really need
> > > > something else?
> > >
> > >
> > > At this point, I'm hoping not. But there have been prior attempts to
> > > define what V2 is (or at least core features [1]). I think two things
> > > have happened over the course of time:
> > >
> > > 1. We've de-emphasized versioning in general and are now trying to
> > > document feature support explicitly [2].
> > > 2. Over the past few years most OSS implementations we know about
> > > actually support most of the initial novelties introduced as part of
> > > the V2 effort.
> > >
> > >
> > > > I agree with the changes you propose, but I'd rather we refrain from
> > > > branding it as "V2".
> > >
> > >
> > > Agreed, I don't want to brand this as anything or make a big deal about
> > > it. I think the proposed changes try to de-emphasize Parquet V2/2.0.
> > > Please let me know if there are other places where you think we can
> > > improve this.
> > >
> > > So really "closing out 2.0" in my mind mostly makes any existing
> > > distinction between V1 and V2 disappear for downstream consumers of
> > > Parquet.
> > >
> > > Cheers,
> > > Micah
> > >
> > > [1] https://github.com/apache/parquet-format/pull/164
> > > [2] https://parquet.apache.org/docs/file-format/implementationstatus/
> > >
> > > On Mon, Dec 8, 2025 at 12:16 AM Antoine Pitrou <[email protected]>
> > > wrote:
> > >
> > > >
> > > > > The parquet-format source tree is already versioned, do we really
> > > > > need something else?
> > > >
> > > > I agree with the changes you propose, but I'd rather we refrain from
> > > > branding it as "V2".
> > > >
> > > > Regards
> > > >
> > > > Antoine.
> > > >
> > > >
> > > > On Fri, 5 Dec 2025 14:55:36 -0800
> > > > Micah Kornfield <[email protected]>
> > > > wrote:
> > > > > > There still appears to be a recurring question for what exactly
> > > > > > constitutes Parquet 2.0.
> > > > >
> > > > > > Given current implementation statuses, my suggestion is to not
> > > > > > mention 2.0 in general.  I've proposed changes
> > > > > > <https://github.com/apache/parquet-format/pull/535> [1] to this
> > > > > > effect in the parquet-format repo to try to give guidance that:
> > > > >
> > > > > > 1.  All encodings documented can now be used regardless of page
> > > > > > type.
> > > > > > 2.  DataPageHeaderV2 is now widely supported by readers.
> > > > > > 3.  Versions should be populated with "1", but readers should
> > > > > > accept "1" and "2".
> > > > >
> > > > > Thoughts?  Does this seem like a reasonable path forward?
> > > > >
> > > > > Thanks,
> > > > > Micah
> > > > >
> > > > >
> > > > > [1] https://github.com/apache/parquet-format/pull/535
> > > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
>
