HippoBaro commented on code in PR #9697:
URL: https://github.com/apache/arrow-rs/pull/9697#discussion_r3083355657
##########
parquet/src/file/metadata/mod.rs:
##########
@@ -713,6 +713,21 @@ impl RowGroupMetaData {
self.file_offset
}
+ /// Returns the byte offset just past the last column chunk in this row group.
Review Comment:
My reading of [the
spec](https://github.com/apache/parquet-format/blob/master/README.md) is that
contiguity is only guaranteed within a column chunk: "_Column chunks are
composed of pages written back to back._" There is no equivalent guarantee for
how column chunks are laid out within a row group. The spec explicitly says
"_There is no physical structure that is guaranteed for a row group._"
This is another reason, on top of your previous remark, and it puts the final
nail in the coffin of the watermark release mechanism.
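
To make the consequence concrete, here is a minimal sketch, not the PR's code. `ChunkRange`, `max_end_offset`, and `is_back_to_back` are hypothetical names, and the `(start, len)` pairs stand in for whatever byte range the column chunk metadata reports. Since the spec does not promise that chunks follow metadata order or sit back to back, a safe end offset has to be the maximum over all chunks, not the end of the last one listed:

```rust
/// Hypothetical stand-in for one column chunk's byte range in the file.
#[derive(Debug, Clone, Copy)]
struct ChunkRange {
    start: u64,
    len: u64,
}

/// Exclusive end offset of the chunk that reaches furthest into the file.
/// Returns `None` for a row group with no column chunks.
fn max_end_offset(chunks: &[ChunkRange]) -> Option<u64> {
    chunks.iter().map(|c| c.start + c.len).max()
}

/// True only if the chunks, taken in metadata order, are written back to back.
/// The spec does not guarantee this, so callers cannot assume it.
fn is_back_to_back(chunks: &[ChunkRange]) -> bool {
    chunks.windows(2).all(|w| w[0].start + w[0].len == w[1].start)
}
```

With chunks listed as `{ start: 4096, len: 1024 }` then `{ start: 4, len: 2048 }`, the last chunk in metadata order ends at 2052 while `max_end_offset` gives 5120, and `is_back_to_back` is false. A watermark derived from the last listed chunk would release bytes that are still needed.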