JigaoLuo opened a new issue, #7910:
URL: https://github.com/apache/arrow-rs/issues/7910

   Hi, I'm raising this both as a question and a possible bug. I'd appreciate 
your feedback to determine whether this is a confirmed issue.
   
   # The Question
   
   
https://github.com/apache/arrow-rs/blob/2be261b78b16a4aa7b5b9aece648bec663c0dbf1/parquet/src/file/serialized_reader.rs#L471-L472
   
   My concern is about the type of `offset` in `SerializedPageReaderState`. 
Should it be `u64` instead of `usize`? If I understand correctly, this offset 
is an absolute byte position within the Parquet file, so it can easily exceed 
4 GiB. On 32-bit targets (e.g., WebAssembly), `usize` is only 32 bits wide and 
tops out at `u32::MAX`, so offsets into larger files cannot be represented.
   
   # The Potential Bug
   
   As a frequent user of [Parquet 
viewer](https://parquet-viewer.xiangpeng.systems/), I hit an error when opening 
a file larger than 4 GB. One of the offsets being read exceeded `u32::MAX`, 
which produced the following error:
   ```
   Integer overflow: out of range integral type conversion attempted
   ```
   I traced this to the line below, where the error is raised, and verified 
that the offending offset is a file-global position that does exceed 
`u32::MAX`.
   
   
https://github.com/apache/arrow-rs/blob/2be261b78b16a4aa7b5b9aece648bec663c0dbf1/parquet/src/file/serialized_reader.rs#L604
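
To illustrate the suspected failure mode, here is a minimal sketch that does not use the parquet crate at all. It assumes the problem is a checked `u64` to `usize` conversion on a 32-bit target, and models the 32-bit `usize` with `u32` so that it behaves the same on any host. The function name `to_usize32` is hypothetical and exists only for this example.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Hypothetical helper: converts a file-global byte offset to the width that
/// `usize` has on a 32-bit target (modelled here as `u32`).
fn to_usize32(offset: u64) -> Result<u32, TryFromIntError> {
    u32::try_from(offset)
}

fn main() {
    // An offset inside the first 4 GiB converts without trouble.
    let small: u64 = 1_000_000;
    println!("{:?}", to_usize32(small));

    // An offset 5 GiB into the file does not fit in 32 bits.
    let large: u64 = 5 * 1024 * 1024 * 1024;
    match to_usize32(large) {
        Ok(v) => println!("fits: {v}"),
        // The Display text of `TryFromIntError` matches the message above.
        Err(e) => println!("error: {e}"),
    }
}
```

If this is indeed what happens, keeping the offset as `u64` and converting only lengths that are known to fit in memory would avoid the failed conversion.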

