Hi Ryan,

On 05/06/2026 at 21:56, Ryan Blue wrote:

Before responding to the preset idea, I want to note that the version
numbers we are talking about aren’t editorializing. The idea isn’t a big
marketing splash around Parquet 3. Instead, think of it as a compatibility
identifier or epoch.

Note that the term "epoch" hints towards the idea of calendar-based presets ;-)

The purpose of this is to more easily and clearly communicate compatibility
across projects. Downstream users want to know what projects work together.
They do that at a high level, not at the granularity of individual changes.
I would never expect a user to ask “does this version of Trino support
files that didn’t write path_in_schema?”

That's the point of presets as well.

As I understand it from the description in this thread, a preset gives a
snapshot of the things that are common to a subset of implementations, say
Java, Rust, and C++. It would also give users a way to ask some questions,
like “when was ALP supported?” in all 3.

Right.

But that isn’t sufficient for checking compatibility when there are other
implementations. A big issue is that vendors have custom implementations
that aren’t covered. Other implementations could document their
compatibility with presets, but only after the time-based milestone because
we don’t know what’s in a preset until the time passes.

Well, you also don't know what's in a version until the version gets decided upon. Unless we commit to a sufficiently frequent version decision process, versions won't really be an improvement in that regard.

A preset also doesn’t give useful control over compatibility. Let’s assume
I’m only using implementations that are part of the preset list. If I have
an engine that uses an older C++ version, how does a preset help me
maintain compatibility with it?

I'm not sure what the problem is, or how it's different from "V2" vs. "V3". You just pass whichever preset matches your old C++ version. For example, if you're using a Parquet C++ release from February 2023, then you know you shouldn't be using a preset more recent than 2023-02.
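
As a rough sketch of that rule, assuming presets are labeled "YYYY-MM" (the labels, dates, and the function name `newest_safe_preset` below are made up for illustration and are not an API of any Parquet implementation):

```python
from datetime import date


def _as_year_month(label):
    """Parse a hypothetical 'YYYY-MM' preset label into a (year, month) tuple."""
    year, month = label.split("-")
    return int(year), int(month)


def newest_safe_preset(presets, reader_release_dates):
    """Return the newest preset that is not more recent than the oldest reader.

    `presets` is an iterable of 'YYYY-MM' labels (an assumed naming scheme);
    `reader_release_dates` holds the release date of every reader that must
    be able to consume the files.
    """
    oldest = min(reader_release_dates)
    cutoff = (oldest.year, oldest.month)
    eligible = [p for p in presets if _as_year_month(p) <= cutoff]
    if not eligible:
        raise ValueError("no preset is old enough for the oldest reader")
    return max(eligible, key=_as_year_month)


# Hypothetical data: one reader released in February 2023, another in May 2024.
presets = ["2022-08", "2023-02", "2023-08", "2024-02"]
readers = [date(2023, 2, 14), date(2024, 5, 1)]
chosen = newest_safe_preset(presets, readers)
print(chosen)
```

With the made-up data above, the February 2023 reader caps the choice at the 2023-02 preset, regardless of how recent the other reader is.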

If I understand correctly, the oldest
preset (that C++ library) would determine which write-side features I am
able to switch on. Do I need to go update a config file of flags somewhere
to do that?

As usual, API choices would depend on the Parquet writer implementation. I'm also not sure how this is a different concern than with the "V3" proposal, where you also have to pass that information to the Parquet writer implementation.

But presumably, most Parquet writer implementations will let you pass this as a function argument.

But at this point, a preset as a bundle
of features is basically the same thing as a version number. It’s a bundle
of features you switch on as a group.

It is exactly that, except that it's:

1) produced mechanistically instead of requiring periodic decisions by the community (which has, up to now, been unable to commit to making such periodic decisions)

2) explicitly labeled as a calendar-based compatibility marker, which makes it easier to understand and decide upon for the user than an opaque "V3" label.
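
To illustrate what "produced mechanistically" could mean, here is a minimal sketch under stated assumptions: a feature belongs to a preset once every tracked implementation has shipped it by that month. The feature names, dates, and the function `preset_features` are hypothetical; only the Java/Rust/C++ grouping comes from this thread.

```python
# Implementations whose common feature set defines a preset (from the thread).
IMPLEMENTATIONS = ("java", "rust", "cpp")


def preset_features(first_supported, preset, implementations=IMPLEMENTATIONS):
    """Return the set of features supported by all implementations at `preset`.

    `first_supported` maps a feature name to a dict of
    {implementation: 'YYYY-MM' month of the first release supporting it}.
    Zero-padded 'YYYY-MM' strings sort chronologically, so plain string
    comparison is enough here.
    """
    included = set()
    for feature, by_impl in first_supported.items():
        if all(
            impl in by_impl and by_impl[impl] <= preset
            for impl in implementations
        ):
            included.add(feature)
    return included


# Entirely made-up support data, for illustration only.
first_supported = {
    "feature_a": {"java": "2022-11", "rust": "2023-01", "cpp": "2022-09"},
    "feature_b": {"java": "2023-06", "rust": "2023-03"},  # not yet in cpp
    "feature_c": {"java": "2023-05", "rust": "2023-07", "cpp": "2023-04"},
}

print(sorted(preset_features(first_supported, "2023-02")))
print(sorted(preset_features(first_supported, "2023-08")))
```

No discussion is needed to build the list: the content of each preset follows from release facts alone, which is the point being made above.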

In Parquet C++ we already have a bunch of optional version toggles that users can pass, but even experts struggle to remember what they are about. I wouldn't like to expose *yet another* version toggle that nobody will dare specify.

Another drawback of presets is that it doesn’t require agreement.

To me it's a feature, not a drawback :-) Instead of lengthy discussions about what's important, presets are based on concrete facts.

Versioning makes this simpler because we all get to determine what
gets added as we make changes, and finalizing a version lets us review that
list. The list is also constantly available so that all implementations can
keep up with it before it is finalized.

I think it's a good argument in favor of versions. I also think it's the only one. And it is conditional on our ability to decide quickly enough on new version numbers.

Regards

Antoine.

