I've commented on the PR, but its contents look good to me in
principle. Thanks a lot, Micah :-)
On 11 Dec 2025 at 09:10, Micah Kornfield wrote:
At the sync today, the idea was to formally vote on these changes.
Antoine, I'm not clear if we've reached consensus or we need more
discussion.
Anybody else have concerns with the doc updates? I'll try to start a vote
next week if no more issues come up.
I agree that V2 is a bit meaningless on its own. I think it would be more
valuable to establish a baseline of what's expected to be supported today,
and then we can build from there.
I think this work has stalled for now, but it would be good to pick it up
again. I like Antoine's ideas on presets:
https://github.com/apache/parquet-format/issues/384#issuecomment-3406653123 as
a solution here. Hopefully, we can restart this process formally, now that
we are getting good coverage on the feature matrix (I think we are probably
missing a few key systems, like Spark, Iceberg and Trino, which at least
have their own readers).
Thanks,
Micah
On Tue, Dec 9, 2025 at 12:02 PM Fokko Driesprong <[email protected]> wrote:
Thanks, Micah, for taking the lead here.
I agree that V2 is a bit meaningless on its own. I think it would be more
valuable to establish a baseline of what's expected to be supported today,
and then we can build from there.
Kind regards,
Fokko
On Mon, Dec 8, 2025 at 12:30, Andrew Lamb <[email protected]> wrote:
So really "closing out 2.0" in my mind mostly makes any existing
distinction between V1 and V2 disappear for downstream consumers of
Parquet.
I personally think stopping discussing "V2" vs "V1" would improve the
overall understanding of the state of Parquet implementations.
On Mon, Dec 8, 2025 at 3:46 AM Micah Kornfield <[email protected]> wrote:
Hi Antoine,
The parquet-format source tree is already versioned, do we really need
something else?
At this point, I'm hoping not. But there have been prior attempts to define
what V2 is (or at least core features [1]). I think two things have
happened over the course of time:
1. We've de-emphasized versioning in general and are now trying to document
feature support explicitly [2]
2. Over the past few years most OSS implementations we know about actually
support most of the initial novelties introduced as part of the V2 effort.
I agree with the changes you propose, but I'd rather we refrain from
branding it as "V2".
Agreed, I don't want to brand this as anything or make a big deal about it.
I think the proposed changes try to de-emphasize Parquet V2/2.0. Please let
me know if there are other places where you think we can improve this.
So really "closing out 2.0" in my mind mostly makes any existing
distinction between V1 and V2 disappear for downstream consumers of
Parquet.
Cheers,
Micah
[1] https://github.com/apache/parquet-format/pull/164
[2] https://parquet.apache.org/docs/file-format/implementationstatus/
On Mon, Dec 8, 2025 at 12:16 AM Antoine Pitrou <[email protected]> wrote:
The parquet-format source tree is already versioned, do we really need
something else?
I agree with the changes you propose, but I'd rather we refrain from
branding it as "V2".
Regards
Antoine.
On Fri, 5 Dec 2025 14:55:36 -0800 Micah Kornfield <[email protected]> wrote:
There still appears to be a recurring question of what exactly constitutes
Parquet 2.0.
Given current implementation statuses, my suggestion is to not mention 2.0
in general. I've proposed changes
<https://github.com/apache/parquet-format/pull/535> [1] to this effect in the
parquet-format repo to try to give guidance that:
1. All documented encodings can now be used regardless of page type.
2. DataPageHeaderV2 is now widely supported by readers.
3. Versions should be populated with "1", but readers should accept "1"
and "2".
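As a rough illustration of point 3 only, the write-"1" / accept-"1"-or-"2" rule could be sketched as below. This assumes "version" refers to the integer version field in the file footer metadata. All names here (WRITER_VERSION, ACCEPTED_READER_VERSIONS, is_acceptable_version, check_version) are hypothetical and not taken from any Parquet implementation or from the PR.

```python
# Hypothetical sketch: only the values 1 and 2 come from the proposal above.
WRITER_VERSION = 1                  # value a writer would populate
ACCEPTED_READER_VERSIONS = {1, 2}   # values a reader would tolerate


def is_acceptable_version(version: int) -> bool:
    """Return True if a reader following the proposal would accept this value."""
    return version in ACCEPTED_READER_VERSIONS


def check_version(version: int) -> int:
    """Return the version unchanged, or raise ValueError if it is not accepted."""
    if not is_acceptable_version(version):
        raise ValueError(f"unsupported Parquet file version: {version}")
    return version
```

How a real reader should treat any other value is not covered by the proposal. Rejecting it here is just one possible choice for the sketch.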
Thoughts? Does this seem like a reasonable path forward?
Thanks,
Micah
[1] https://github.com/apache/parquet-format/pull/535