Following on from the discussion today:

1. I can see the benefits in tagging it as optional.
2. It would be a long time before the systems I field support calls over
   would stop generating it, because we don't know where data would end up
   being used.
3. For those people who are encountering major problems here, it would at
   least be possible to say "provided you intend to only work with versions
   of <product> dated 2027 or newer, all is good".

Making the field optional as soon as possible would bring forward the date
at which Parquet releases can actually stop adding the field.

Being able to tie it to a non-backwards-compatible database change (and I'm
thinking Iceberg v4 tables) would provide a clear way to scope that
incompatibility.

Imagine if Iceberg was set up to turn the feature off when generating files
for v4 tables, knowing all applications which could read the tables wouldn't
need path_in_schema, *regardless of the language of that implementation*.

steve

On Mon, 20 Apr 2026 at 09:34, Gang Wu <[email protected]> wrote:

> Thanks Ed for raising this!
>
> Overall I'm +1 to this. We need input from others since it is a slight
> breaking change.
>
> Best,
> Gang
>
> On Thu, Apr 9, 2026 at 9:41 PM Ed Seidl <[email protected]> wrote:
>
> > Hi All,
> >
> > Following a lively discussion on this list, I thought I’d take a stab at
> > addressing one pain point in the Parquet footer. I’ve put up a proposal [1]
> > and PR [2] to switch path_in_schema in the ColumnMetaData from “required”
> > to “optional”. I’ve also whipped up PoCs in Rust [3] and Java [4].
> >
> > Please take a look and let’s discuss in the PR.
> >
> > Thanks,
> > Ed
> >
> > [1] https://github.com/apache/parquet-format/issues/563
> > [2] https://github.com/apache/parquet-format/pull/564
> > [3] https://github.com/apache/arrow-rs/pull/9678
> > [4] https://github.com/apache/parquet-java/pull/3470
