Hi all,

Quick update on this. A third PoC implementation, in arrow-cpp, has been created [1], and a file without the path_in_schema field (written with arrow-rs) has been submitted to parquet-testing [2]. I've confirmed that the Java and C++ PoCs can read the file correctly. I'll propose a vote on this proposal soon if no objections are raised here or in the PR [3].
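Here is why a reader can cope with the field being absent. The footer stores the schema as a depth-first flattened list of SchemaElements: the root comes first, and group nodes carry num_children. The column chunks in each row group appear in the same order as the schema's leaves, so a chunk's path can be recomputed from the schema alone. The Python sketch below shows that fallback. It assumes a hypothetical, simplified SchemaElement class, not any real Thrift binding, in which an unset num_children marks a leaf. It is only an illustration and is not code from any of the PoCs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SchemaElement:
    # Hypothetical stand-in for the Thrift SchemaElement: only the fields
    # needed to walk the tree are modeled. num_children is None for leaves.
    name: str
    num_children: Optional[int] = None


def leaf_paths(schema: List[SchemaElement]) -> List[List[str]]:
    """Return the path of every leaf column, in depth-first (column chunk) order.

    schema[0] is the root; its name is not part of any column path.
    """
    paths: List[List[str]] = []
    pos = 1  # next element to consume; skip the root

    def walk(prefix: List[str], count: int) -> None:
        nonlocal pos
        for _ in range(count):
            elem = schema[pos]
            pos += 1
            path = prefix + [elem.name]
            if elem.num_children is None:
                paths.append(path)  # leaf: one column chunk per leaf
            else:
                walk(path, elem.num_children)  # group: recurse into children

    walk([], schema[0].num_children or 0)
    return paths


def column_path(schema: List[SchemaElement], chunk_index: int,
                path_in_schema: Optional[List[str]] = None) -> List[str]:
    # Prefer path_in_schema when the writer still emits it; otherwise derive
    # the path from the chunk's position among the schema leaves.
    if path_in_schema is not None:
        return path_in_schema
    return leaf_paths(schema)[chunk_index]


# Example: root with leaf "a", group "b" (leaves "c" and "d"), and leaf "e".
example = [
    SchemaElement("schema", 3),
    SchemaElement("a"),
    SchemaElement("b", 2),
    SchemaElement("c"),
    SchemaElement("d"),
    SchemaElement("e"),
]
print(leaf_paths(example))  # [['a'], ['b', 'c'], ['b', 'd'], ['e']]
```

A real reader would compute the leaf paths once per file and index into them, rather than recomputing per column chunk as this sketch does.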
Cheers,
Ed

[1] https://github.com/apache/arrow/pull/49707
[2] https://github.com/apache/parquet-testing/pull/108
[3] https://github.com/apache/parquet-format/pull/564

On 2026/04/22 20:58:46 Micah Kornfield wrote:
> I need to review the implementations more carefully, but I think this looks
> good. Maybe we should give people through next week to review, and then we
> can start a vote?
>
> On Wed, Apr 22, 2026 at 1:45 PM Steve Loughran wrote:
>
> > following on from the discussion today
> >
> > 1. I can see the benefits in tagging it as optional.
> > 2. It would be a long time before the systems I field support calls over
> > would stop generating it, because we don't know where data would end up
> > being used.
> > 3. For those people who are encountering major problems here, it would
> > at least be possible to say "provided you intend to only work with
> > versions of <product> dated 2027 or newer, all is good."
> >
> > Making the field optional as soon as possible would increase the time at
> > which parquet releases can actually stop adding the field.
> >
> > Being able to tie it to a non-backwards-compatible database change (and I'm
> > thinking Iceberg v4 tables) would provide a clear way to scope that
> > incompatibility. Imagine if Iceberg was set up to turn the feature off when
> > generating files for v4 tables, knowing all applications which could read
> > the tables wouldn't need path_in_schema, *regardless of the language of
> > that implementation*.
> >
> > steve
> >
> > On Mon, 20 Apr 2026 at 09:34, Gang Wu wrote:
> >
> > > Thanks Ed for raising this!
> > >
> > > Overall I'm +1 to this. We need input from others since it is a slight
> > > breaking change.
> > >
> > > Best,
> > > Gang
> > >
> > > On Thu, Apr 9, 2026 at 9:41 PM Ed Seidl wrote:
> > >
> > > > Hi All,
> > > >
> > > > Following a lively discussion on this list, I thought I’d take a stab at
> > > > addressing one pain point in the Parquet footer. I’ve put up a proposal [1]
> > > > and PR [2] to switch path_in_schema in the ColumnMetaData from “required”
> > > > to “optional”. I’ve also whipped up PoCs in Rust [3] and Java [4].
> > > >
> > > > Please take a look and let’s discuss in the PR.
> > > >
> > > > Thanks,
> > > > Ed
> > > >
> > > > [1] https://github.com/apache/parquet-format/issues/563
> > > > [2] https://github.com/apache/parquet-format/pull/564
> > > > [3] https://github.com/apache/arrow-rs/pull/9678
> > > > [4] https://github.com/apache/parquet-java/pull/3470
