This has come up a few times in the sync and other forums.  I wanted to
start the conversation about deprecating file_path [1] in the parquet
footer.

Outside of the "_metadata" file index use case (effectively a poor man's
table format), I don't think this is used or implemented in any reader.

With the rise of table formats, it seems like a reasonable design choice to
push the complexity of referencing columns across files to the table level
and keep parquet focused on single-file storage (encodings, indexing, etc.).

Implementing this at the file level can also be challenging, since a reader
would need to know all the credentials required to read from different
objects on object storage.

Thoughts/Objections?

Thanks,
Micah


[1]
https://github.com/apache/parquet-format/blob/3ab52ff2e4e1cbe4c52a3e25c0512803e860c454/src/main/thrift/parquet.thrift#L962
