zhuqi-lucas commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3041421377
> zhuqi-lucas#1

Thank you @JigaoLuo, merged your changes!

> Two small nitpicks I came across today:
>
> * "Footer" vs. "Metadata"?: Apologies for being pedantic, but I think we’re consistently referring to metadata here, not just the footer. Xiangpeng also corrected me on this elsewhere:
>
> > footer often refers to the last 8 byte of Parquet file
>
> * One small thing & question to consider: does support for user-defined indexes depend on a specific version of Parquet? If so, it might be helpful to add a brief note about that. I’m not sure of the answer myself, but it could be worth clarifying.

You’re absolutely right: what we're describing is the file-level metadata (the `key_value_metadata` field in the `FileMetaData` Thrift struct), not just the last 8 bytes of the file. In Parquet parlance, “footer” strictly refers to the file trailer (the 4-byte metadata length followed by the `PAR1` magic), whereas “metadata” covers everything in the `FileMetaData` block, including all custom key-value pairs. We should consistently say “metadata” throughout to avoid that confusion.

Custom (user-defined) metadata via `FileMetaData.key_value_metadata` has been part of the Parquet format since its earliest releases, so it does not depend on a specific Parquet version. Any reader or writer that implements the core format should be able to read and write arbitrary key-value entries.

However, our arrow-rs dependency needs to be >= 55.2.0 so that the writer's internal byte count stays consistent with what has actually been written. I suggest adding this note:

> **Prerequisite:** Requires **arrow-rs v55.2.0** or later, which includes the new “buffered write” API ([apache/arrow-rs#7714](https://github.com/apache/arrow-rs/pull/7714)).
> This API keeps the internal byte count in sync so you can append index bytes immediately after data pages.
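To make the footer vs. metadata distinction above concrete, here is a minimal Python sketch that splits a Parquet-shaped byte buffer into its trailer (last 8 bytes) and its metadata block. It relies only on the documented file layout: the file ends with a 4-byte little-endian metadata length followed by the `PAR1` magic. The function name `split_parquet_tail` and the placeholder byte strings are hypothetical, chosen for illustration; a real file holds encoded pages and Thrift-serialized `FileMetaData` in those positions, and this sketch does not decode Thrift or use arrow-rs.

```python
import struct

MAGIC = b"PAR1"


def split_parquet_tail(data):
    """Return (metadata_bytes, metadata_start_offset) for a Parquet-shaped buffer."""
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Parquet-shaped buffer")
    # Trailer ("footer" in the strict sense): 4-byte little-endian
    # metadata length, then the 4-byte magic.
    (meta_len,) = struct.unpack("<I", data[-8:-4])
    meta_end = len(data) - 8
    meta_start = meta_end - meta_len
    if meta_start < 4:
        raise ValueError("metadata length exceeds file size")
    return data[meta_start:meta_end], meta_start


# Hypothetical stand-ins (assumptions, not real Parquet content).
pages = b"<data pages>"
index = b"<custom index bytes>"              # appended right after the data pages
metadata = b"<FileMetaData thrift bytes>"    # would carry key_value_metadata

buf = MAGIC + pages + index + metadata + struct.pack("<I", len(metadata)) + MAGIC

meta, start = split_parquet_tail(buf)
print(len(buf) - 8, start, meta)
```

In this layout the custom index bytes end exactly where the metadata begins, which is why the writer's byte count has to be accurate: the offset recorded for the index is only useful if it matches the bytes actually written.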
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org