Well, the features table still has columns "V1" and "V2". If we agree
that 2.0.0 was not special compared to other parquet-format releases (in
particular, it didn't break forwards compatibility), then why single it out?
On 10/06/2026 at 13:03, Andrew Lamb wrote:
Thanks for taking a look.
I think this is a nice doc page, except that it's inventing an a
posteriori meaning for "V1" and "V2".
I agree that earlier versions of the document did try to invent meaning,
which Ed and Jorris pointed out [1] and which I have tried to remove in
several updates such as [2] (and [3] this morning). I would be happy to
remove or reword any additional sections you think imply such a meaning.
The current proposed wording is this:
FileMetadata version field
Each Parquet file has a version field in the thrift FileMetadata. This
field has historically been used inconsistently: writers populate 1 or 2
without a consistent relationship to the features actually used. See the
note in parquet.thrift and this discussion for details.
parquet-format release versions
The Thrift definition is released independently of implementations such
as parquet-java or arrow-rs, following the Apache release process. Note
that release numbering DOES NOT FOLLOW semantic versioning:
1. The major version corresponds to the thrift FileMetadata version field.
2. Minor releases (e.g. 2.10.0 to 2.11.0) sometimes contain forward
incompatible features. The minor version is not recorded in the file itself.
Are there other parts of the document you feel incorrectly imply a meaning
for V1 and V2?
Thanks,
Andrew
[1]: https://github.com/apache/parquet-site/pull/186#discussion_r3380588765
[2]: https://github.com/apache/parquet-site/pull/186/commits/89159332dc770c64d88f48fcdeb24be53fc82161
[3]: https://github.com/apache/parquet-site/pull/186/commits/0b3a17f0e8cddc39eeccdc3ca2fbd7e2def0b077
On Wed, Jun 10, 2026 at 5:01 AM Antoine Pitrou <[email protected]> wrote:
Hi Andrew,
I think this is a nice doc page, except that it's inventing an a
posteriori meaning for "V1" and "V2". Why is it useful? Why single out
V2, a.k.a. parquet-format 2.0.0?
Regards
Antoine.
On 05/06/2026 at 16:18, Andrew Lamb wrote:
Dear Parquet Fans,
I have become convinced over the last few discussions that it is more
important than ever to document clearly what V1 and V2 mean (including
the messy reality).
Thus, I spent several hours documenting Parquet features and when each
was introduced [1]. I would love any feedback you may have.
Thank you,
Andrew
[1]: https://github.com/apache/parquet-site/pull/186