Hi all,
Quick update on this. A third PoC implementation, in arrow-cpp, has been
created [1], and a file without the path_in_schema field (written with
arrow-rs) has been submitted to parquet-testing [2]. I've confirmed that the
Java and C++ PoCs can properly read the file. I'll call a vote on this
proposal soon if no objections are raised here or in the PR [3].

Cheers,
Ed

[1] https://github.com/apache/arrow/pull/49707
[2] https://github.com/apache/parquet-testing/pull/108
[3] https://github.com/apache/parquet-format/pull/564

On 2026/04/22 20:58:46 Micah Kornfield wrote:
> I need to review the implementations more carefully, but I think this looks
> good. Maybe we should give people through next week to review, and then we
> can start a vote?
> 
> On Wed, Apr 22, 2026 at 1:45 PM Steve Loughran <[email protected]> wrote:
> 
> > following on from the discussion today
> >
> >
> >    1. I can see the benefits in tagging it as optional.
> >    2. It would be a long time before the systems I field support calls over
> >    would stop generating it, because we don't know where the data will end
> >    up being used.
> >    3. For those people who are encountering major problems here, it would
> >    at least be possible to say "provided you intend to only work with
> >    versions of <product> dated 2027 or newer, all is good."
> >
> > Making the field optional as soon as possible would bring forward the point
> > at which Parquet releases can actually stop adding the field.
> >
> > Being able to tie it to a non-backwards-compatible database change (I'm
> > thinking of Iceberg v4 tables) would provide a clear way to scope that
> > incompatibility. Imagine if Iceberg were set up to turn the feature off
> > when generating files for v4 tables, knowing that no application which
> > could read those tables would need path_in_schema, *regardless of the
> > language of that implementation*.
> >
> > steve
> >
> > On Mon, 20 Apr 2026 at 09:34, Gang Wu <[email protected]> wrote:
> >
> > > Thanks Ed for raising this!
> > >
> > > Overall I'm +1 to this. We need input from others since it is a slight
> > > breaking change.
> > >
> > > Best,
> > > Gang
> > >
> > > On Thu, Apr 9, 2026 at 9:41 PM Ed Seidl <[email protected]> wrote:
> > >
> > > > Hi All,
> > > >
> > > > Following a lively discussion on this list, I thought I’d take a stab at
> > > > addressing one pain point in the Parquet footer. I’ve put up a proposal [1]
> > > > and PR [2] to switch path_in_schema in the ColumnMetaData from “required”
> > > > to “optional”. I’ve also whipped up PoCs in Rust [3] and Java [4].
> > > >
> > > > Please take a look and let’s discuss in the PR.
> > > >
> > > > Thanks,
> > > > Ed
> > > >
> > > > [1] https://github.com/apache/parquet-format/issues/563
> > > > [2] https://github.com/apache/parquet-format/pull/564
> > > > [3] https://github.com/apache/arrow-rs/pull/9678
> > > > [4] https://github.com/apache/parquet-java/pull/3470
> > > >
> > >
> >
> 
