To try to move the conversation forward I made two PRs:

1. parquet-format (https://github.com/apache/parquet-format/pull/588):
TL;DR:
        -  Changes the language to use the recommended specification version
as the mechanism for configuration.
        -  Commits to using SemVer for parquet-format releases going
forward (all forward-incompatible changes, including encodings, etc., will
bump the version number).
        -  Adds a proposal for a new PARX magic number that adds a new
fixed-length component to the file footer composed of (metadata_len, feature
bitmap, CRC for footer and 'PARX' trailer). This also unifies encrypted and
unencrypted parquet files.


2.  A POC in Rust showing how this could be implemented (
https://github.com/apache/arrow-rs/pull/10177), including its use with
path_in_footer.
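
To make the fixed-length footer component from item 1 a bit more concrete,
here is a minimal Python sketch of writing and reading such a trailer. It is
only an illustration: the field order, the field widths (u32 length, u64
bitmap, u32 CRC), little-endian encoding, the use of CRC-32 over the metadata
bytes only, and the names write_tail/read_tail are all assumptions of mine,
not the layout defined in the PR.

```python
import struct
import zlib
from typing import Tuple

# Hypothetical layout, NOT taken from the PR. Widths, order, endianness and
# the choice of CRC-32 are assumptions made for illustration only:
#   metadata_len   : u32, little-endian
#   feature_bitmap : u64, little-endian
#   footer_crc     : u32, little-endian, CRC-32 of the metadata bytes
#   magic          : 4 bytes, b"PARX"
TRAILER = struct.Struct("<IQI4s")
MAGIC = b"PARX"


def write_tail(metadata: bytes, feature_bitmap: int) -> bytes:
    """Return the metadata bytes followed by the fixed-length trailer."""
    crc = zlib.crc32(metadata)
    return metadata + TRAILER.pack(len(metadata), feature_bitmap, crc, MAGIC)


def read_tail(file_bytes: bytes) -> Tuple[bytes, int]:
    """Parse the trailer at the end of the file.

    Returns (metadata, feature_bitmap) or raises ValueError if the magic,
    length, or checksum does not match.
    """
    if len(file_bytes) < TRAILER.size:
        raise ValueError("file too short for trailer")
    trailer_start = len(file_bytes) - TRAILER.size
    metadata_len, feature_bitmap, crc, magic = TRAILER.unpack(
        file_bytes[trailer_start:]
    )
    if magic != MAGIC:
        raise ValueError("not a PARX file")
    metadata_start = trailer_start - metadata_len
    if metadata_start < 0:
        raise ValueError("metadata_len exceeds file size")
    metadata = file_bytes[metadata_start:trailer_start]
    if zlib.crc32(metadata) != crc:
        raise ValueError("footer CRC mismatch")
    return metadata, feature_bitmap
```

Because the trailer has a fixed size, a reader can fetch the last
TRAILER.size bytes of a file, check the magic and the feature bitmap, and
only then decide whether to fetch and decode the metadata.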

I think this meets the implicit requirements discussed in this thread.
Namely:

1.  It allows users to think about versions in a canonical way for feature
enablement.
2.  Keeps recommendations that won't push default versions too quickly.
3.  Allows for continuous and iterative releases of the specification.
4.  Gives readers the flexibility to determine at a granular level whether
they can properly read the file (and forces a single forward-incompatible
change, so we aren't relying on guesswork about when a writer feature can be
safely enabled).
5.  Limits the need for new magic numbers past a single new value.
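
As a rough sketch of the granular check in point 4, a reader could compare
the file's feature bitmap against the set of features it implements. The
feature names, bit positions, and function names below are hypothetical, and
the sketch assumes every set bit marks a feature the reader must understand
in order to read the file.

```python
# Hypothetical feature bits; names and positions are invented for
# illustration and are not assigned by the proposal.
FEATURE_A = 1 << 0
FEATURE_B = 1 << 1
FEATURE_C = 1 << 2


def unsupported_features(file_bitmap: int, reader_supports: int) -> int:
    """Return the bits set in the file that the reader does not implement."""
    return file_bitmap & ~reader_supports


def can_read(file_bitmap: int, reader_supports: int) -> bool:
    """A file is readable only if it uses no feature unknown to the reader."""
    return unsupported_features(file_bitmap, reader_supports) == 0
```

For example, a reader that supports FEATURE_A and FEATURE_B would accept a
file whose bitmap is FEATURE_A alone and reject one that also sets
FEATURE_C. Bits the reader has never heard of are rejected the same way,
which is what lets old readers fail cleanly instead of guessing.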

Please let me know if I missed something.  If we can gain consensus around
this, I can add a Java implementation so we can adopt the changes and vote
on them.

Cheers,
Micah


On Mon, Jun 15, 2026 at 7:45 AM Russell Spitzer wrote:

> I agree this is getting a bit too complicated. I feel like everyone here
> understands versions, as does the wider community. Why not just start there
> and add other techniques if that fails to work properly or be effective for
> communication?
>
>
> I think we are better off just choosing something simple and going forward
> rather than deliberating. The worst thing that happens is that we have to
> make a different choice later. I’m not sure that’s worse than sitting on
> what we currently have and not being able to make progress on new encodings
> or footers.
>
>
> On Mon, Jun 15, 2026 at 3:28 AM Micah Kornfield wrote:
>
> > Thank you for the feedback Andrew. Practically speaking, I wonder if we
> > should have two separate notions of feature bundling:
> >
> > 1.  Specification version (this would be primary and risks using features
> > that aren't widely adopted).
> > 2.  Presets - Gives users a different way of configuring things that
> > allows for better guarantees about compatibility in the ecosystem.
> >
> >
> >
> > On Friday, June 12, 2026, Andrew Bell wrote:
> >
> > > Hi,
> > >
> > > This discussion and the proposals seem to have gotten very complicated.
> > > People not on this mailing list or not doing regular development on
> > > Parquet would probably benefit from simplicity. Most people are used to
> > > version numbers without worrying about various types of compatibility --
> > > readers can simply state "I can read version 3-7", for example. Users
> > > understand this. People writing files can also easily understand "I want
> > > to write a version 6 file because version 6 supports feature X that I
> > > want." or "I want to write version 7 because it's the latest version."
> > >
> > > I don't really care about the details of a solution, but please keep in
> > > mind that a more simple solution probably increases accessibility for
> > > the widest range of people.
> > >
> > > --
> > > Andrew Bell
> > >
> >
>
