Re: [PR] [Parquet]: GH-563: Make `path_in_schema` optional [arrow-rs]

via GitHub Tue, 12 May 2026 07:22:11 -0700


alamb commented on PR #9678:
URL: https://github.com/apache/arrow-rs/pull/9678#issuecomment-4431469356


   > IIUC this would allow writing files that would fail parsing from other 
readers due to the field currently being required? If so, while it seems like 
there is general consensus on the parquet mailing list to transition to this, 
doing before it is adopted in parquet-format seems like a small risk of 
fragmenting the ecosystem?
   
   This is true, though the same argument can be applied to V2 data pages and 
other encodings like byte stream split that are not supported by other 
implementations. 
   
   In my mind there are basically two benefits to using the (Rust) 
implementation of parquet:
   1. Interoperability with other systems
   2. Reuse all the engineering that has gone into the implementation (even if 
you don't plan to share the files)
   
   I think there are a bunch of use cases where systems use parquet internally 
and either 
   * don't care about interoperability as the data never leaves their systems, 
or 
   * have to rewrite the files for interoperability anyway (e.g. convert 
nanosecond --> millisecond timestamps)
   
   Many of the early adopters of Vortex fall into this second category (as does 
InfluxData). 
   
   My goal with this setting is to cater to the second category and allow 
people to take advantage of all the engineering in the Rust implementation.
   
   


