JigaoLuo opened a new issue, #7910: URL: https://github.com/apache/arrow-rs/issues/7910
Hi, I'm raising this both as a question and a possible bug. I'd appreciate your feedback to determine whether this is a confirmed issue.

# The Question

https://github.com/apache/arrow-rs/blob/2be261b78b16a4aa7b5b9aece648bec663c0dbf1/parquet/src/file/serialized_reader.rs#L471-L472

My concern is about the type of `offset` in `SerializedPageReaderState`. Should it be `u64` instead of `usize`? If I understand correctly, this offset represents a global position within a Parquet file, which can easily exceed 4 GB. On 32-bit environments (e.g., WebAssembly), `usize` is limited to `u32`'s max, which could lead to problems with larger files.

# The Potential Bug

As a frequent user of [Parquet viewer](https://parquet-viewer.xiangpeng.systems/), I encountered an error with a file larger than 4 GB. The offset I read exceeded `u32`'s max, resulting in the following error:

```
Integer overflow: out of range integral type conversion attempted
```

I traced this to the line where the exception was triggered, and verified that the offset causing the issue is global and indeed exceeds `u32`'s max.

https://github.com/apache/arrow-rs/blob/2be261b78b16a4aa7b5b9aece648bec663c0dbf1/parquet/src/file/serialized_reader.rs#L604
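To illustrate the failure mode described above, here is a minimal sketch that can be run on any host. It is not the arrow-rs code: `narrow_offset` is a hypothetical helper, the 5 GiB offset is an invented value, and `u32` is used as a stand-in for `usize` on a 32-bit target such as wasm32. It shows that a checked conversion of a file offset beyond `u32::MAX` fails, and that the standard library's error text matches the second half of the message reported above.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Hypothetical helper: narrows a global file offset to 32 bits,
/// standing in for a `u64` -> `usize` conversion on a 32-bit target.
fn narrow_offset(offset: u64) -> Result<u32, TryFromIntError> {
    u32::try_from(offset)
}

fn main() {
    // An offset well inside the 32-bit range converts without error.
    let small: u64 = 1024;
    assert_eq!(narrow_offset(small), Ok(1024u32));

    // An invented offset 5 GiB into a file does not fit in 32 bits.
    let large: u64 = 5 * 1024 * 1024 * 1024;
    assert!(large > u64::from(u32::MAX));

    match narrow_offset(large) {
        Ok(value) => println!("converted to {}", value),
        // Prints the standard library's TryFromIntError message.
        Err(err) => println!("conversion failed: {}", err),
    }
}
```

If this reading is right, keeping the offset as `u64` in the reader state and converting only buffer lengths to `usize` would avoid the failing conversion for large files, though whether that fits the surrounding code is for the maintainers to confirm.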