etseidl commented on PR #6582: URL: https://github.com/apache/arrow-rs/pull/6582#issuecomment-2422652824
Yes, as @jroddev found, at least part of this issue is tracked by #6447. The original reader for the offset index returns an empty vec if offset indexes are requested but are not actually present in the file.

https://github.com/apache/arrow-rs/blob/dd5a2294b8b28f768b991e0e89fe7686b296c4ec/parquet/src/file/page_index/index_reader.rs#L135

There is a test somewhere that actually expects this behavior (I'll search later for that). The async `MetadataLoader` instead leaves the offset index as `None` in that case.

https://github.com/apache/arrow-rs/blob/dd5a2294b8b28f768b991e0e89fe7686b296c4ec/parquet/src/arrow/async_reader/metadata.rs#L174

@alamb and I felt we couldn't reconcile the two until 54.0.0.

As to `ParquetMetadataWriter`, I'm honestly not sure what happened in the past when the offset index was `Some([])`, so I'll do some digging there. It's possible there was a behavior change there. I'll have more time this afternoon to dig into this and look over this PR.
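
To make the discrepancy between the two readers concrete, here is a minimal sketch. It uses stand-in types rather than the actual `parquet` crate API: `PageLocation`, `OffsetIndex`, and `normalize_offset_index` are hypothetical names chosen for illustration. It only shows why `Some([])` and `None` compare as different values, and one possible way a consumer might treat them as equivalent.

```rust
// Hypothetical stand-in for one page entry of an offset index;
// not the type used by the parquet crate.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct PageLocation {
    offset: i64,
    compressed_page_size: i32,
    first_row_index: i64,
}

// Assumed layout: one Vec per row group, each holding one Vec per column chunk.
type OffsetIndex = Vec<Vec<Vec<PageLocation>>>;

/// Hypothetical helper: map "requested but absent" (`Some(vec![])`)
/// to the same value as "not loaded" (`None`).
fn normalize_offset_index(index: Option<OffsetIndex>) -> Option<OffsetIndex> {
    index.filter(|row_groups| !row_groups.is_empty())
}

fn main() {
    // What the original (sync) reader is described as producing.
    let from_sync_reader: Option<OffsetIndex> = Some(vec![]);
    // What the async `MetadataLoader` is described as producing.
    let from_async_loader: Option<OffsetIndex> = None;

    // Without normalization the two results are distinguishable.
    assert_ne!(from_sync_reader, from_async_loader);

    // After normalization both mean "no offset index".
    assert_eq!(
        normalize_offset_index(from_sync_reader),
        normalize_offset_index(from_async_loader)
    );
}
```

Any code that checks `is_some()` on the offset index would see different answers from the two readers for the same file, which is the kind of mismatch a writer could trip over.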
