While working to document which features are forwards incompatible and which parquet-format version introduced them [1], it occurred to me that we **already have** a versioning scheme that is frequently released, time-based, and clearly defines feature levels:

parquet-format version (e.g. 2.11, 2.12, etc.). <---- We already have this!

The only missing piece is that the parquet-format version is not recorded in the metadata itself. However, we could add it in a backwards-compatible way: I made two RFCs for discussion [2][3].

Andrew

P.S. If you squint, I think the parquet-format versions look a lot like a combination of "presets" and versions, which is also a nice property.

[1]: https://github.com/apache/parquet-site/pull/186
[2]: https://github.com/apache/parquet-format/pull/581
[3]: https://github.com/apache/parquet-format/pull/582

On Tue, Jun 9, 2026 at 6:01 AM Antoine Pitrou <[email protected]> wrote:

> Hi Ryan,
>
> On 05/06/2026 at 21:56, Ryan Blue wrote:
> >
> > Before responding to the preset idea, I want to note that the version
> > numbers we are talking about aren’t editorializing. The idea isn’t a big
> > marketing splash around Parquet 3. Instead, think of it as a
> > compatibility identifier or epoch.
>
> Note that the term "epoch" hints towards the idea of calendar-based
> presets ;-)
>
> > The purpose of this is to more easily and clearly communicate
> > compatibility across projects. Downstream users want to know what
> > projects work together. They do that at a high level, not at the
> > granularity of individual changes. I would never expect a user to ask
> > “does this version of Trino support files that didn’t write
> > path_in_schema?”
>
> That's the point of presets as well.
>
> > As I understand it from the description in this thread, a preset gives a
> > snapshot of the things that are common to a subset of implementations,
> > say Java, Rust, and C++. It would also give users a way to ask some
> > questions, like “when was ALP supported?” in all 3.
>
> Right.
>
> > But that isn’t sufficient for checking compatibility when there are other
> > implementations. A big issue is that vendors have custom implementations
> > that aren’t covered. Other implementations could document their
> > compatibility with presets, but only after the time-based milestone
> > because we don’t know what’s in a preset until the time passes.
>
> Well, you also don't know what's in a version until the version gets
> decided upon. Unless we commit to a frequent enough version decision
> process, versions won't really be an improvement in that regard.
>
> > A preset also doesn’t give useful control over compatibility. Let’s
> > assume I’m only using implementations that are part of the preset list.
> > If I have an engine that uses an older C++ version, how does a preset
> > help me maintain compatibility with it?
>
> I'm not sure what the problem is, or how it's different from "V2" vs.
> "V3". You just pass whichever preset matches your old C++ version.
> For example, if you're using a Parquet C++ release from February 2023,
> then you know you shouldn't be using a preset more recent than 2023-02.
>
> > If I understand correctly, the oldest
> > preset (that C++ library) would determine which write-side features I am
> > able to switch on. Do I need to go update a config file of flags
> > somewhere to do that?
>
> As usual, API choices would depend on the Parquet writer implementation.
> I'm also not sure how this is a different concern than with the "V3"
> proposal, where you also have to pass that information to the Parquet
> writer implementation.
>
> But presumably, most Parquet writer implementations will let you pass
> this as a function argument.
>
> > But at this point, a preset as a bundle
> > of features is basically the same thing as a version number. It’s a
> > bundle of features you switch on as a group.
>
> It is exactly that, except that it's:
>
> 1) produced mechanistically instead of requiring periodic decisions by
> the community (which has been, up to now, unable to commit to making such
> periodic decisions)
>
> 2) explicitly labeled as a calendar-based compatibility marker, which
> makes it easier to understand and decide upon for the user than an
> opaque "V3" label.
>
> In Parquet C++ we already have a bunch of optional version toggles that
> users can pass, but even experts struggle to remember what they are
> about. I wouldn't like to expose *yet another* version toggle that
> nobody will dare specify.
>
> > Another drawback of presets is that it doesn’t require agreement.
>
> To me it's a feature, not a drawback :-) Instead of lengthy discussions
> about what's important, presets are based on concrete facts.
>
> > Versioning makes this simpler because we all get to determine what
> > gets added as we make changes, and finalizing a version lets us review
> > that list. The list is also constantly available so that all
> > implementations can keep up with it before it is finalized.
>
> I think it's a good argument in favor of versions. I also think it's the
> only one. And it is conditioned on our ability to decide quickly enough
> on new version numbers.
>
> Regards
>
> Antoine.
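
As a rough illustration of the calendar-based rule quoted above (do not use a preset more recent than your oldest reader's release), here is a minimal Python sketch. Everything in it is hypothetical: the preset list, the function names, and the release dates are made up for the example and are not part of any Parquet implementation or of the RFCs.

```python
from datetime import date

# Hypothetical calendar-based presets, written as (year, month).
# No such list exists in Parquet today; it is invented for this sketch.
PRESETS = [(2022, 8), (2023, 2), (2023, 8), (2024, 2)]


def newest_safe_preset(reader_release_dates, presets=PRESETS):
    """Return the newest preset not more recent than the oldest reader release."""
    oldest = min(reader_release_dates)
    cutoff = (oldest.year, oldest.month)
    # Tuples compare lexicographically, so (year, month) orders by date.
    candidates = [p for p in presets if p <= cutoff]
    if not candidates:
        raise ValueError("no preset is old enough for the oldest reader")
    return max(candidates)


def format_preset(preset):
    """Render a (year, month) preset as 'YYYY-MM'."""
    year, month = preset
    return f"{year:04d}-{month:02d}"


# Example: one reader released in February 2023, another in May 2024.
readers = [date(2023, 2, 14), date(2024, 5, 1)]
chosen = format_preset(newest_safe_preset(readers))
```

With these made-up inputs the oldest reader dates from February 2023, so the writer would be limited to the 2023-02 preset. The same comparison would work with parquet-format version numbers in place of dates if the version were recorded in the file metadata.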
