vigneshsiva11 commented on PR #9369: URL: https://github.com/apache/arrow-rs/pull/9369#issuecomment-3901843325
Thanks for the question. PR #9362 prevents the overflow by stopping the batch early inside the byte array decoder when the accumulated value bytes approach the 2 GiB limit of i32 offsets. This PR (#9369) works at a higher level, in `ParquetRecordBatchReader`. When accumulated binary offsets would overflow, the reader emits the current partial `RecordBatch` and continues with the remaining rows in subsequent batches.

So, in short:

- #9362 → fixes the issue at the decoder level (low-level safety)
- #9369 → handles safe batch splitting at the reader level (high-level batch management)

Both address the same root problem, but at different layers of the stack. This PR ensures correct batch semantics when overflow conditions occur.
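To illustrate the reader-level idea, here is a minimal sketch of splitting rows into batches so that no batch's total binary payload exceeds what i32 offsets can address. It is not the code from #9369. `split_batches` and `MAX_OFFSET` are hypothetical names, and the sketch works on plain per-row byte lengths instead of arrow-rs or parquet types. It assumes each batch is capped both by a requested row count and by a byte budget, and that a single value larger than the budget is reported as an error, because it cannot be split.

```rust
/// Largest total byte length addressable with i32 offsets (about 2 GiB).
const MAX_OFFSET: usize = i32::MAX as usize;

/// Hypothetical helper (not the arrow-rs API): partition rows of
/// variable-length binary values into batches of at most `batch_size` rows,
/// where each batch's summed value bytes stay within `max_bytes`.
/// Returns (start_row, row_count) pairs, or an error if one value alone
/// exceeds the byte budget.
fn split_batches(
    value_lens: &[usize],
    batch_size: usize,
    max_bytes: usize,
) -> Result<Vec<(usize, usize)>, String> {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut batches = Vec::new();
    let (mut start, mut rows, mut bytes) = (0usize, 0usize, 0usize);
    for (i, &len) in value_lens.iter().enumerate() {
        if len > max_bytes {
            return Err(format!("row {i} has {len} bytes, exceeding the limit of {max_bytes}"));
        }
        // Emit the current partial batch if this row would push the offsets
        // past the limit, or if the batch already holds `batch_size` rows.
        if rows > 0 && (bytes + len > max_bytes || rows == batch_size) {
            batches.push((start, rows));
            start = i;
            rows = 0;
            bytes = 0;
        }
        rows += 1;
        bytes += len;
    }
    // Flush whatever remains as the final (possibly short) batch.
    if rows > 0 {
        batches.push((start, rows));
    }
    Ok(batches)
}

fn main() {
    // A toy 7-byte limit stands in for MAX_OFFSET so the split is visible.
    println!("{:?}", split_batches(&[3, 3, 3, 3], 10, 7).unwrap()); // [(0, 2), (2, 2)]
    // With the real i32 limit, these small values fit into a single batch.
    println!("{:?}", split_batches(&[3, 3, 3, 3], 10, MAX_OFFSET).unwrap()); // [(0, 4)]
}
```

The remaining rows are not dropped when the byte budget is hit. They start the next batch, which mirrors the "emit the partial batch and continue" behaviour described above. A decoder-level guard in the spirit of #9362 would instead stop reading values inside the decoder itself.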
