Following on from the discussion today:

   1. I can see the benefits in tagging it as optional.
   2. It would be a long time before the systems I field support calls over
   would stop generating it, because we don't know where the data would end up
   being used.
   3. For those people who are encountering major problems here, it would
   at least be possible to say "provided you intend to work only with versions
   of <product> dated 2027 or newer, all is good".

Making the field optional as soon as possible would bring forward the date at
which Parquet releases can actually stop writing the field.

Being able to tie it to a non-backwards-compatible database change (and I'm
thinking Iceberg v4 tables) would provide a clear way to scope that
incompatibility. Imagine if Iceberg was set up to turn the feature off when
generating files for v4 tables, knowing all applications which could read
the tables wouldn't need path_in_schema, *regardless of the language of
that implementation*.
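
As a rough sketch of that idea, and nothing more: every name below
(WriterOptions, write_path_in_schema, writer_options_for_table) is
hypothetical and does not correspond to any existing Parquet or Iceberg
API, and the v4 cut-over is only the assumption made above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriterOptions:
    # Hypothetical writer setting; not an existing Parquet or Iceberg option.
    write_path_in_schema: bool = True


# Assumed cut-over: the first table format version whose readers are all
# expected to cope with a missing path_in_schema.
FIRST_VERSION_WITHOUT_PATH = 4


def writer_options_for_table(format_version: int) -> WriterOptions:
    """Keep writing path_in_schema for older tables, drop it from v4 on."""
    return WriterOptions(
        write_path_in_schema=format_version < FIRST_VERSION_WITHOUT_PATH
    )
```

The point of keying the decision on the table format version is that the
incompatibility is scoped by the table, not by whichever library release
happened to write the file.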

steve

On Mon, 20 Apr 2026 at 09:34, Gang Wu <[email protected]> wrote:

> Thanks Ed for raising this!
>
> Overall I'm +1 to this. We need input from others since it is a slight
> breaking change.
>
> Best,
> Gang
>
> On Thu, Apr 9, 2026 at 9:41 PM Ed Seidl <[email protected]> wrote:
>
> > Hi All,
> >
> > Following a lively discussion on this list, I thought I’d take a stab at
> > addressing one pain point in the Parquet footer. I’ve put up a proposal [1]
> > and PR [2] to switch path_in_schema in the ColumnMetaData from “required”
> > to “optional”. I’ve also whipped up PoCs in Rust [3] and Java [4].
> >
> > Please take a look and let’s discuss in the PR.
> >
> > Thanks,
> > Ed
> >
> > [1] https://github.com/apache/parquet-format/issues/563
> > [2] https://github.com/apache/parquet-format/pull/564
> > [3] https://github.com/apache/arrow-rs/pull/9678
> > [4] https://github.com/apache/parquet-java/pull/3470
> >
>
