Hi Dan,

> However, there are ongoing discussions around multi-modal cases where
> either separating large columns (e.g. inline blobs) or appending column
> data without rewriting existing data may leverage this.

Do you have any design docs or mailing list discussions you can point to?

> I don't feel like leaving this for now while we explore those use cases
> would cause any additional confusion/complexity.

Agreed, it isn't urgent to clean this up. But having a more concrete
timeline would be helpful; this does seem to be a semi-regular source of
confusion for folks, so it would be nice to clean up the loose end.

Thanks,
Micah

On Fri, Dec 5, 2025 at 4:07 PM Daniel Weeks <[email protected]> wrote:

> I'd actually prefer that we don't deprecate this field (at least not
> immediately).
>
> Recognizing that we've discussed separating column data into multiple files
> for over a decade without any concrete implementations, there are emerging
> use cases that may benefit from investing in this feature.
>
> Many of the use cases in the past have been misaligned (e.g. separating
> column data for security/encryption) and better alternatives addressed
> those scenarios.
>
> However, there are ongoing discussions around multi-modal cases where
> either separating large columns (e.g. inline blobs) or appending column
> data without rewriting existing data may leverage this.
>
> I don't feel like leaving this for now while we explore those use cases
> would cause any additional confusion/complexity.
>
> -Dan
>
> On Thu, Dec 4, 2025 at 9:04 AM Micah Kornfield <[email protected]>
> wrote:
>
> > > What does "deprecated" entail here? Do we plan to remove this field
> > > from the format? Otherwise, is it just documentation?
> >
> > I was imagining just documentation, since we don't want to break the
> > "_metadata file" use case.
> >
> > On Thu, Dec 4, 2025 at 8:18 AM Antoine Pitrou <[email protected]>
> > wrote:
> >
> > >
> > > What does "deprecated" entail here? Do we plan to remove this field
> > > from the format? Otherwise, is it just documentation?
> > >
> > > On Mon, 1 Dec 2025 12:09:18 -0800
> > > Micah Kornfield <[email protected]>
> > > wrote:
> > > > This has come up a few times in the sync and other forums. I wanted to
> > > > start the conversation about deprecating file_path
> > > > <https://github.com/apache/parquet-format/blob/3ab52ff2e4e1cbe4c52a3e25c0512803e860c454/src/main/thrift/parquet.thrift#L962>
> > > > [1] in the parquet footer.
> > > >
> > > > Outside of the "_metadata" file index use-case I don't think this is used
> > > > or implemented in any reader (effectively a poor man's table format).
> > > >
> > > > With the rise of file formats, it seems like a reasonable design choice to
> > > > push complexity of referencing columns across files to the table level and
> > > > keep parquet focused on single file storage (encodings, indexing, etc).
> > > >
> > > > Implementing this at a file level also can be challenging in the context of
> > > > knowing all credentials one might need to read from different objects on
> > > > object storage?
> > > >
> > > > Thoughts/Objections?
> > > >
> > > > Thanks,
> > > > Micah
> > > >
> > > > [1]
> > > > https://github.com/apache/parquet-format/blob/3ab52ff2e4e1cbe4c52a3e25c0512803e860c454/src/main/thrift/parquet.thrift#L962
