Hi Antoine,

> The parquet-format source tree is already versioned, do we really need
> something else?

At this point, I'm hoping not. But there have been prior attempts to define
what V2 is (or at least core features [1]). I think two things have happened
over the course of time:

1. We've de-emphasized versioning in general and are now trying to document
   feature support explicitly [2].
2. Over the past few years most OSS implementations we know about actually
   support most of the initial novelties introduced as part of the V2 effort.

> I agree with the changes you propose, but I'd rather we refrain from
> branding it as "V2".

Agreed, I don't want to brand this as anything or make a big deal about it.
I think the proposed changes try to de-emphasize Parquet V2/2.0. Please let
me know if there are other places where you think we can improve this.

So really "closing out 2.0" in my mind mostly makes any existing distinction
between V1 and V2 disappear for downstream consumers of Parquet.

Cheers,
Micah

[1] https://github.com/apache/parquet-format/pull/164
[2] https://parquet.apache.org/docs/file-format/implementationstatus/

On Mon, Dec 8, 2025 at 12:16 AM Antoine Pitrou <[email protected]> wrote:

>
> The parquet-format source tree is already versioned, do we really need
> something else?
>
> I agree with the changes you propose, but I'd rather we refrain from
> branding it as "V2".
>
> Regards
>
> Antoine.
>
>
> On Fri, 5 Dec 2025 14:55:36 -0800
> Micah Kornfield <[email protected]> wrote:
> > There still appears to be a recurring question for what exactly
> > constitutes Parquet 2.0.
> >
> > Given current implementation statuses, my suggestion is to not mention
> > 2.0 in general. I've made a proposed changes
> > <https://github.com/apache/parquet-format/pull/535> [1] to this effect
> > in a parquet-format repo to try to give guidance that:
> >
> > 1. All encodings documented can now be used regardless of page type.
> > 2. DataPageHeaderV2 is now widely supported by readers
> > 3. Versions should be populated with "1", but readers should accept "1"
> > and "2".
> >
> > Thoughts? Does this seem like a reasonable path forward?
> >
> > Thanks,
> > Micah
> >
> >
> > [1] https://github.com/apache/parquet-format/pull/535
> >
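
To make point 3 of the quoted proposal concrete, here is a rough sketch of the version rule (writers populate "1", readers accept "1" and "2"). The constant and function names are hypothetical and not taken from any Parquet implementation. The thread does not say what a reader should do with other values, so the sketch only reports whether a value is in the accepted set.

```python
# Illustrative sketch only: names below are hypothetical, not a real Parquet API.

# Value a writer should put in the file metadata version field, per the proposal.
WRITER_VERSION = 1

# Values a reader should accept, per the proposal.
ACCEPTED_READER_VERSIONS = frozenset({1, 2})


def version_to_write() -> int:
    """Return the version a writer should populate."""
    return WRITER_VERSION


def reader_accepts(version: int) -> bool:
    """Return True if the version is one the proposal says readers should accept.

    Handling of any other value is not specified in the thread; callers
    decide what to do when this returns False.
    """
    return version in ACCEPTED_READER_VERSIONS


if __name__ == "__main__":
    for v in (1, 2, 3):
        print(v, reader_accepts(v))
```

The point of keeping the two sides asymmetric is that new files never advertise "2", while files already written with "2" remain readable.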
