Hi Dan,

> However, there are ongoing discussions around multi-modal cases where
> either separating large columns (e.g. inline blobs) or appending column
> data without rewriting existing data may leverage this.


Do you have any design docs or mailing list discussions you can point to?

> I don't feel like leaving this for now while we explore those use cases
> would cause any additional confusion/complexity.


Agreed, it isn't urgent to clean this up. But a more concrete timeline would
be helpful: this seems to be a semi-regular source of confusion for folks, so
it would be nice to tie up the loose end.

Thanks,
Micah

On Fri, Dec 5, 2025 at 4:07 PM Daniel Weeks <[email protected]> wrote:

> I'd actually prefer that we don't deprecate this field (at least not
> immediately).
>
> Although we've discussed separating column data into multiple files for
> over a decade without any concrete implementation, there are emerging use
> cases that may benefit from investing in this feature.
>
> Many of the use cases in the past have been misaligned (e.g. separating
> column data for security/encryption) and better alternatives addressed
> those scenarios.
>
> However, there are ongoing discussions around multi-modal cases where
> either separating large columns (e.g. inline blobs) or appending column
> data without rewriting existing data may leverage this.
>
> I don't feel like leaving this for now while we explore those use cases
> would cause any additional confusion/complexity.
>
> -Dan
>
> On Thu, Dec 4, 2025 at 9:04 AM Micah Kornfield <[email protected]> wrote:
>
> > > What does "deprecated" entail here? Do we plan to remove this field
> > > from the format? Otherwise, is it just documentation?
> >
> > I was imagining just documentation, since we don't want to break the
> > "_metadata file" use case.
> >
> > On Thu, Dec 4, 2025 at 8:18 AM Antoine Pitrou <[email protected]> wrote:
> >
> > >
> > > What does "deprecated" entail here? Do we plan to remove this field
> > > from the format? Otherwise, is it just documentation?
> > >
> > >
> > >
> > > On Mon, 1 Dec 2025 12:09:18 -0800
> > > Micah Kornfield <[email protected]>
> > > wrote:
> > > > This has come up a few times in the sync and other forums. I wanted to
> > > > start the conversation about deprecating file_path [1] in the Parquet
> > > > footer.
> > > >
> > > > Outside of the "_metadata" file index use case (effectively a poor man's
> > > > table format), I don't think this is used or implemented in any reader.
> > > >
> > > > With the rise of table formats, it seems like a reasonable design choice
> > > > to push the complexity of referencing columns across files to the table
> > > > level and keep Parquet focused on single-file storage (encodings,
> > > > indexing, etc.).
> > > >
> > > > Implementing this at the file level can also be challenging, since a
> > > > reader would need to know all the credentials required to read columns
> > > > stored in different objects on object storage.
> > > >
> > > > Thoughts/Objections?
> > > >
> > > > Thanks,
> > > > Micah
> > > >
> > > >
> > > > [1]
> > > > https://github.com/apache/parquet-format/blob/3ab52ff2e4e1cbe4c52a3e25c0512803e860c454/src/main/thrift/parquet.thrift#L962