I'd actually prefer that we don't deprecate this field (at least not immediately).

Although we've discussed separating column data into multiple files for over a
decade without any concrete implementation, there are emerging use cases that
may benefit from investing in this feature. Many of the past use cases were
misaligned (e.g., separating column data for security/encryption), and better
alternatives have since addressed those scenarios. However, there are ongoing
discussions around multi-modal workloads that could leverage this field, either
by separating out large columns (e.g., inline blobs) or by appending column
data without rewriting existing data. I don't think leaving the field in place
for now, while we explore those use cases, would add any confusion or
complexity.

-Dan

On Thu, Dec 4, 2025 at 9:04 AM Micah Kornfield <[email protected]> wrote:

> > What does "deprecated" entail here? Do we plan to remove this field
> > from the format? Otherwise, is it just documentation?
>
> I was imagining just documentation, since we don't want to break the
> "_metadata file" use case.
>
> On Thu, Dec 4, 2025 at 8:18 AM Antoine Pitrou <[email protected]> wrote:
>
> > What does "deprecated" entail here? Do we plan to remove this field
> > from the format? Otherwise, is it just documentation?
> >
> > On Mon, 1 Dec 2025 12:09:18 -0800
> > Micah Kornfield <[email protected]> wrote:
> > > This has come up a few times in the sync and other forums. I wanted to
> > > start the conversation about deprecating file_path [1] in the Parquet
> > > footer.
> > >
> > > Outside of the "_metadata" file index use case, I don't think this is
> > > used or implemented in any reader (it's effectively a poor man's table
> > > format).
> > >
> > > With the rise of table formats, it seems like a reasonable design choice
> > > to push the complexity of referencing columns across files to the table
> > > level and keep Parquet focused on single-file storage (encodings,
> > > indexing, etc.).
> > >
> > > Implementing this at the file level can also be challenging: would a
> > > reader know all the credentials it might need to read from different
> > > objects in object storage?
> > >
> > > Thoughts/Objections?
> > >
> > > Thanks,
> > > Micah
> > >
> > > [1]
> > > https://github.com/apache/parquet-format/blob/3ab52ff2e4e1cbe4c52a3e25c0512803e860c454/src/main/thrift/parquet.thrift#L962
