alamb commented on PR #9678: URL: https://github.com/apache/arrow-rs/pull/9678#issuecomment-4431469356
> IIUC this would allow writing files that would fail parsing from other readers due to the field currently being required? If so, while it seems like there is general consensus on the parquet mailing list to transition to this, doing before it is adopted in parquet-format seems like a small risk of fragmenting the ecosystem?

This is true, though the same argument can be applied to V2 Data pages and other encodings, like byte stream split, that are not supported by other implementations.

In my mind there are basically two benefits to using the (Rust) implementation of parquet:

1. Interoperability with other systems
2. Reuse of all the engineering that has gone into the implementation (even if you don't plan to share the files)

I think there are a bunch of use cases where systems use parquet internally and either:

* don't care about interoperability, as the data never leaves their systems, or
* have to rewrite the files for interoperability anyway (e.g. convert nanosecond --> millisecond timestamps)

Many of the early adopters of Vortex fall into this second category (as does InfluxData). My goal with this setting is to cater to the second category and allow people to take advantage of all the engineering in the Rust implementation.
