Thanks, Micah, for taking the lead here. I agree that V2 is a bit meaningless on its own. I think it would be more valuable to establish a baseline of what's expected to be supported today, and then we can build from there.
Kind regards,
Fokko

On Mon, Dec 8, 2025 at 12:30 Andrew Lamb <[email protected]> wrote:

> > So really "closing out 2.0" in my mind mostly makes any existing
> > distinction between V1 and V2 disappear for downstream consumers of
> > Parquet.
>
> I personally think that no longer discussing "V2" vs "V1" would improve
> the overall understanding of the state of Parquet implementations.
>
> On Mon, Dec 8, 2025 at 3:46 AM Micah Kornfield <[email protected]> wrote:
>
> > Hi Antoine,
> >
> > > The parquet-format source tree is already versioned, do we really need
> > > something else?
> >
> > At this point, I'm hoping not. But there have been prior attempts to
> > define what V2 is (or at least its core features [1]). I think two things
> > have happened over the course of time:
> >
> > 1. We've de-emphasized versioning in general and are now trying to
> >    document feature support explicitly [2].
> > 2. Over the past few years, most OSS implementations we know about have
> >    come to support most of the features originally introduced as part of
> >    the V2 effort.
> >
> > > I agree with the changes you propose, but I'd rather we refrain from
> > > branding it as "V2".
> >
> > Agreed, I don't want to brand this as anything or make a big deal about
> > it. I think the proposed changes try to de-emphasize Parquet V2/2.0.
> > Please let me know if there are other places where you think we can
> > improve this.
> >
> > So really "closing out 2.0" in my mind mostly makes any existing
> > distinction between V1 and V2 disappear for downstream consumers of
> > Parquet.
> >
> > Cheers,
> > Micah
> >
> > [1] https://github.com/apache/parquet-format/pull/164
> > [2] https://parquet.apache.org/docs/file-format/implementationstatus/
> >
> > On Mon, Dec 8, 2025 at 12:16 AM Antoine Pitrou <[email protected]> wrote:
> >
> > > The parquet-format source tree is already versioned, do we really need
> > > something else?
> > >
> > > I agree with the changes you propose, but I'd rather we refrain from
> > > branding it as "V2".
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > > On Fri, 5 Dec 2025 14:55:36 -0800
> > > Micah Kornfield <[email protected]>
> > > wrote:
> > >
> > > > There still appears to be a recurring question about what exactly
> > > > constitutes Parquet 2.0.
> > > >
> > > > Given current implementation statuses, my suggestion is to not
> > > > mention 2.0 in general. I've proposed changes
> > > > <https://github.com/apache/parquet-format/pull/535> [1] to this
> > > > effect in the parquet-format repo to give guidance that:
> > > >
> > > > 1. All documented encodings can now be used regardless of page type.
> > > > 2. DataPageHeaderV2 is now widely supported by readers.
> > > > 3. Versions should be populated with "1", but readers should accept
> > > >    "1" and "2".
> > > >
> > > > Thoughts? Does this seem like a reasonable path forward?
> > > >
> > > > Thanks,
> > > > Micah
> > > >
> > > > [1] https://github.com/apache/parquet-format/pull/535
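For illustration, here is roughly what point 3 of Micah's proposal could look like on the writer and reader side. This is a minimal Python sketch that assumes the "version" in question is the `version` field of the Thrift `FileMetaData` struct. The constant and function names are hypothetical and not part of any Parquet library.

```python
# Hypothetical sketch of the proposed rule for the FileMetaData `version` field:
# writers always emit 1, while readers accept both 1 and 2.

WRITER_FILE_VERSION = 1                       # value a writer would populate
ACCEPTED_FILE_VERSIONS = frozenset({1, 2})    # values a reader should accept


def version_to_write() -> int:
    """Return the file version a writer should store in FileMetaData."""
    return WRITER_FILE_VERSION


def check_file_version(version: int) -> None:
    """Raise ValueError if a reader should reject this file version."""
    if version not in ACCEPTED_FILE_VERSIONS:
        raise ValueError(f"unsupported Parquet file version: {version}")


if __name__ == "__main__":
    for v in (1, 2, 3):
        try:
            check_file_version(v)
            print(f"version {v}: accepted")
        except ValueError as err:
            print(f"version {v}: {err}")
```

The point of the asymmetry is that writers stop emitting "2", while files already written with "2" stay readable.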
