While working to document what features are forwards incompatible and what
parquet-format version they were introduced in[1], it occurs to me that we
**already have** a versioning scheme that is frequently released, time
based and clearly defines feature levels:

parquet-format version (e.g. 2.11, 2.12, etc.). <---- We already have this!

The only missing piece is that the parquet-format version is not recorded in
the metadata itself. However, we could add it in a backwards-compatible
way; I have opened two RFCs for discussion[2][3].
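As a rough sketch of what a recorded version would let a reader do, here is how a reader might use it. This assumes a hypothetical optional footer field holding a "major.minor" parquet-format version string; the real field name and encoding are whatever the RFCs end up defining, and `READER_FORMAT_VERSION` and `can_fully_read` are illustrative names only:

```python
from typing import Optional, Tuple

# Hypothetical: the newest parquet-format version this reader fully supports.
READER_FORMAT_VERSION = "2.11"


def parse_version(version: str) -> Tuple[int, int]:
    """Parse a 'major.minor' parquet-format version into a comparable tuple."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def can_fully_read(file_format_version: Optional[str],
                   reader_version: str = READER_FORMAT_VERSION) -> Optional[bool]:
    """Return True if the file was written against a format version no newer
    than the reader's, False if it is newer (it may use features the reader
    does not understand), and None if the file records no version at all,
    which is the case for every file written today."""
    if file_format_version is None:
        return None
    return parse_version(file_format_version) <= parse_version(reader_version)
```

Versions are compared numerically as (major, minor) tuples rather than as strings, since "2.9" sorts after "2.11" lexically. The `None` case matters because files written before such a field exists still have to be read without this signal.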

Andrew

ps. If you squint, I think the parquet-format versions look a lot like a
combination of "preset" and versions, which is also a nice property.

[1]: https://github.com/apache/parquet-site/pull/186
[2]: https://github.com/apache/parquet-format/pull/581
[3]: https://github.com/apache/parquet-format/pull/582


On Tue, Jun 9, 2026 at 6:01 AM Antoine Pitrou wrote:

>
> Hi Ryan,
>
> On 05/06/2026 at 21:56, Ryan Blue wrote:
> >
> > Before responding to the preset idea, I want to note that the version
> > numbers we are talking about aren’t editorializing. The idea isn’t a big
> > marketing splash around Parquet 3. Instead, think of it as a
> > compatibility identifier or epoch.
>
> Note that the term "epoch" hints towards the idea of calendar-based
> presets ;-)
>
> > The purpose of this is to more easily and clearly communicate
> > compatibility across projects. Downstream users want to know what
> > projects work together. They do that at a high level, not at the
> > granularity of individual changes. I would never expect a user to ask
> > “does this version of Trino support files that didn’t write
> > path_in_schema?”
>
> That's the point of presets as well.
>
> > As I understand it from the description in this thread, a preset gives a
> > snapshot of the things that are common to a subset of implementations,
> > say Java, Rust, and C++. It would also give users a way to ask some
> > questions, like “when was ALP supported?” in all 3.
>
> Right.
>
> > But that isn’t sufficient for checking compatibility when there are other
> > implementations. A big issue is that vendors have custom implementations
> > that aren’t covered. Other implementations could document their
> > compatibility with presets, but only after the time-based milestone
> > because we don’t know what’s in a preset until the time passes.
>
> Well, you also don't know what's in a version until the version gets
> decided upon. Unless we commit to a sufficiently frequent version decision
> process, versions won't really be an improvement in that regard.
>
> > A preset also doesn’t give useful control over compatibility. Let’s
> > assume I’m only using implementations that are part of the preset list.
> > If I have an engine that uses an older C++ version, how does a preset
> > help me maintain compatibility with it?
>
> I'm not sure what the problem is, or how it's different from "V2" vs.
> "V3". You just pass whichever preset matches your old C++ version.
> For example, if you're using a Parquet C++ release from February 2023,
> then you know you shouldn't be using a preset more recent than 2023-02.
>
> > If I understand correctly, the oldest
> > preset (that C++ library) would determine which write-side features I am
> > able to switch on. Do I need to go update a config file of flags
> > somewhere to do that?
>
> As usual, API choices would depend on the Parquet writer implementation.
> I'm also not sure how this concern differs from the one raised by the "V3"
> proposal, where you also have to pass that information to the Parquet
> writer implementation.
>
> But presumably, most Parquet writer implementations will let you pass
> this as a function argument.
>
> > But at this point, a preset as a bundle
> > of features is basically the same thing as a version number. It’s a
> > bundle of features you switch on as a group.
>
> It is exactly that, except that it's:
>
> 1) produced mechanistically instead of requiring periodic decisions by
> the community (which has, up to now, been unable to commit to making such
> periodic decisions)
>
> 2) explicitly labeled as a calendar-based compatibility marker, which
> makes it easier for the user to understand and decide upon than an
> opaque "V3" label.
>
> In Parquet C++ we already have a bunch of optional version toggles that
> users can pass, but even experts struggle to remember what they are
> about. I wouldn't like to expose *yet another* version toggle that
> nobody will dare specify.
>
> > Another drawback of presets is that they don’t require agreement.
>
> To me it's a feature, not a drawback :-) Instead of lengthy discussions
> about what's important, presets are based on concrete facts.
>
> > Versioning makes this simpler because we all get to determine what
> > gets added as we make changes, and finalizing a version lets us review
> > that list. The list is also constantly available so that all
> > implementations can keep up with it before it is finalized.
>
> I think it's a good argument in favor of versions. I also think it's the
> only one. And it is conditioned on our ability to decide quickly enough
> on new version numbers.
>
> Regards
>
> Antoine.
