zhuqi-lucas commented on PR #79:
URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3041421377

   > zhuqi-lucas#1
   
   Thank you @JigaoLuo , merged your changes!
   
   
   
   > Two small nitpicks I came across today:
   > 
   > * "Footer" vs. "Metadata" ?: Apologies for being pedantic, but I think 
we’re consistently referring to metadata here, not just the footer. Xiangpeng 
also corrected me on this elsewhere:
   > 
   > > footer often refers to the last 8 byte of Parquet file
   > 
   > * One small thing&question to consider—does support for user-defined 
indexes depend on a specific version of Parquet? If so, it might be helpful to 
add a brief note about that. I’m not sure of the answer myself, but it could be 
worth clarifying.
   
   
   
   You’re absolutely right: what we're describing is the file‑level metadata 
(the key_value_metadata in the FileMetaData Thrift struct), not just the last 8 
bytes of the file. In Parquet parlance, “footer” technically refers to the file 
trailer (the 4‑byte metadata length followed by the 4‑byte PAR1 magic), whereas 
“metadata” covers everything in the FileMetaData block (including all custom 
key‑value pairs). We should consistently say “metadata” throughout to avoid 
that confusion.
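
   To make the distinction concrete, here is a minimal std-only sketch (not 
arrow-rs code) that reads the 8-byte trailer and uses it to locate the 
FileMetaData block in front of it. The function name `metadata_range` and the 
fake file contents are made up for illustration; only the trailer layout 
(4-byte little-endian length, then `PAR1`) comes from the Parquet format.

```rust
/// Magic bytes that open and close every Parquet file.
const PARQUET_MAGIC: &[u8; 4] = b"PAR1";

/// Returns the byte range of the serialized FileMetaData block, or `None`
/// if the buffer does not end in a plausible Parquet trailer.
/// (Hypothetical helper for illustration; not part of any library.)
fn metadata_range(file: &[u8]) -> Option<std::ops::Range<usize>> {
    // Need at least the leading magic (4 bytes) plus the trailer (8 bytes).
    if file.len() < 12 {
        return None;
    }
    // The "footer" in the narrow sense: the last 8 bytes of the file.
    let trailer = &file[file.len() - 8..];
    if trailer[4..] != PARQUET_MAGIC[..] {
        return None;
    }
    // First 4 trailer bytes: little-endian length of the metadata block.
    let len = u32::from_le_bytes([trailer[0], trailer[1], trailer[2], trailer[3]]) as usize;
    let end = file.len() - 8;
    let start = end.checked_sub(len)?;
    // The metadata must not overlap the leading magic.
    if start < 4 {
        return None;
    }
    Some(start..end)
}

fn main() {
    // Stand-in bytes; a real file would hold Thrift-encoded FileMetaData here.
    let fake_metadata = b"not-real-thrift";
    let mut file = Vec::new();
    file.extend_from_slice(PARQUET_MAGIC);
    file.extend_from_slice(b"data pages");
    file.extend_from_slice(fake_metadata);
    file.extend_from_slice(&(fake_metadata.len() as u32).to_le_bytes());
    file.extend_from_slice(PARQUET_MAGIC);

    let range = metadata_range(&file).expect("valid trailer");
    assert_eq!(&file[range], &fake_metadata[..]);
}
```

   The trailer only tells a reader where the metadata is; the custom 
key‑value pairs themselves live inside the metadata block.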
   
   
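   Custom (user‑defined) metadata via the FileMetaData.key_value_metadata field 
has been part of the Parquet format since its earliest releases, so storing a 
user‑defined index this way should not depend on a specific Parquet version. 
Any reader that implements basic Parquet can still read the file and will 
simply ignore metadata keys it doesn't recognize.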
   
   However, our arrow-rs dependency needs to be >= 55.2.0 so that the writer's 
internal byte count stays consistent with what has actually been written.
   
   > **Prerequisite:** Requires **arrow‑rs v55.2.0** or later, which includes 
the new “buffered write” API 
([apache/arrow-rs#7714](https://github.com/apache/arrow-rs/pull/7714)).  
   > This API keeps the internal byte count in sync so you can append index 
bytes immediately after data pages. 
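
   Why the byte count matters can be sketched without arrow-rs at all. The 
`TrackedWriter` below is a hypothetical stand-in, not the arrow-rs writer or 
its API: it just counts every byte that passes through it, so the offset 
recorded before appending the index really points at the index. Presumably 
that offset is what would then be stored under a custom key in the file 
metadata so a reader can seek to it.

```rust
use std::io::{self, Write};

/// Hypothetical wrapper that counts bytes written (NOT an arrow-rs type).
struct TrackedWriter<W: Write> {
    inner: W,
    bytes_written: usize,
}

impl<W: Write> TrackedWriter<W> {
    fn new(inner: W) -> Self {
        Self { inner, bytes_written: 0 }
    }

    /// Total bytes written so far, i.e. the offset of the next byte.
    fn bytes_written(&self) -> usize {
        self.bytes_written
    }

    fn into_inner(self) -> W {
        self.inner
    }
}

impl<W: Write> Write for TrackedWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.inner.write(buf)?;
        // Every write goes through here, so the count cannot drift.
        self.bytes_written += n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Writes the data pages, then the index right after them.
/// Returns the finished buffer and the offset where the index starts.
fn write_with_index(data_pages: &[u8], index: &[u8]) -> io::Result<(Vec<u8>, usize)> {
    let mut writer = TrackedWriter::new(Vec::new());
    writer.write_all(data_pages)?;
    // Record the offset before appending the index bytes.
    let index_offset = writer.bytes_written();
    writer.write_all(index)?;
    Ok((writer.into_inner(), index_offset))
}

fn main() -> io::Result<()> {
    let (file, offset) = write_with_index(b"data-pages", b"my-index")?;
    // The recorded offset points exactly at the start of the index.
    assert_eq!(&file[offset..], &b"my-index"[..]);
    Ok(())
}
```

   If some bytes bypassed the counter, `index_offset` would be too small and a 
reader seeking to it would land in the middle of the data pages.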
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

