Re: [DISCUSS] Deprecate file_path field in column chunk

Daniel Weeks Fri, 05 Dec 2025 16:07:48 -0800

I'd actually prefer that we don't deprecate this field (at least not
immediately).

Recognizing that we've discussed separating column data into multiple files
for over a decade without any concrete implementations, there are emerging
use cases that may benefit from investing in this feature.

Many of the use cases in the past have been misaligned (e.g. separating
column data for security/encryption) and better alternatives addressed
those scenarios.

However, there are ongoing discussions around multi-modal cases where
either separating large columns (e.g. inline blobs) or appending column
data without rewriting existing data may leverage this.

I don't think leaving the field in place for now, while we explore those
use cases, would cause any additional confusion or complexity.

-Dan

On Thu, Dec 4, 2025 at 9:04 AM Micah Kornfield <[email protected]> wrote:

> > What does "deprecated" entail here? Do we plan to remove this field
> > from the format? Otherwise, is it just documentation?
>
> I was imagining just documentation, since we don't want to break the
> "_metadata file" use case.
>
> On Thu, Dec 4, 2025 at 8:18 AM Antoine Pitrou <[email protected]> wrote:
>
> >
> > What does "deprecated" entail here? Do we plan to remove this field
> > from the format? Otherwise, is it just documentation?
> >
> >
> >
> > On Mon, 1 Dec 2025 12:09:18 -0800
> > Micah Kornfield <[email protected]>
> > wrote:
> > > This has come up a few times in the sync and other forums.  I wanted to
> > > start the conversation about deprecating file_path [1] in the parquet
> > > footer.
> > >
> > > Outside of the "_metadata" file index use-case I don't think this is used
> > > or implemented in any reader (effectively a poor man's table format).
> > >
> > > With the rise of file formats, it seems like a reasonable design choice to
> > > push complexity of referencing columns across files to the table level and
> > > keep parquet focused on single file storage (encodings, indexing, etc).
> > >
> > > Implementing this at a file level also can be challenging in the context of
> > > knowing all credentials one might need to read from different objects on
> > > object storage?
> > >
> > > Thoughts/Objections?
> > >
> > > Thanks,
> > > Micah
> > >
> > >
> > > [1]
> > > https://github.com/apache/parquet-format/blob/3ab52ff2e4e1cbe4c52a3e25c0512803e860c454/src/main/thrift/parquet.thrift#L962
> > >
> >
> >
> >
> >
>
