fallintoplace opened a new issue, #10002:
URL: https://github.com/apache/arrow-rs/issues/10002

   **Describe the bug**
   
   Converting Parquet column metadata statistics can panic when the file 
footer contains malformed INT96 min/max values.
   
   The footer metadata path checks that INT96 statistics are at least 12 bytes, 
but then asserts that they are exactly 12 bytes. As a result, an overlong INT96 
min or max statistic (13 bytes, for example) passes the length precheck and 
then panics instead of returning a `ParquetError`.
   
   **To Reproduce**
   
   Read Parquet metadata containing column chunk INT96 statistics where `min`, 
`max`, `min_value`, or `max_value` is longer than 12 bytes.
   
   The relevant path is `parquet/src/file/metadata/thrift/mod.rs`, where 
`convert_stats` calls `check_len(..., 12)` and later asserts the INT96 
statistic length is exactly 12.
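
The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not the arrow-rs source: `FooterError`, `check_min_len`, and `convert_int96_panicking` are hypothetical names, and the sketch only assumes the shape described in this report (a minimum-length check followed by an exact-length assertion).

```rust
/// Hypothetical stand-in for the crate's error type (not the real `ParquetError`).
#[derive(Debug, PartialEq)]
enum FooterError {
    General(String),
}

/// Hypothetical precheck: rejects only buffers that are shorter than `min_len`.
fn check_min_len(bytes: &[u8], min_len: usize) -> Result<(), FooterError> {
    if bytes.len() < min_len {
        return Err(FooterError::General(format!(
            "statistics too short: {} < {}",
            bytes.len(),
            min_len
        )));
    }
    Ok(())
}

/// Sketch of the problematic shape: a 13-byte input passes the precheck
/// and then trips the exact-length assertion, which panics.
fn convert_int96_panicking(bytes: &[u8]) -> Result<[u32; 3], FooterError> {
    check_min_len(bytes, 12)?;
    assert_eq!(bytes.len(), 12, "INT96 statistics must be 12 bytes");
    // Decode three little-endian 32-bit words from the 12 bytes.
    let mut words = [0u32; 3];
    for (i, chunk) in bytes.chunks_exact(4).enumerate() {
        words[i] = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    Ok(words)
}
```

In this sketch an 11-byte input yields an `Err`, a 12-byte input succeeds, and a 13-byte input panics, which mirrors the gap between the two checks.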
   
   **Expected behavior**
   
   Malformed INT96 column metadata statistics should return an error instead of 
panicking.
   
   This would be consistent with the project guidance that invalid input should 
generally become an error result, and with the page-statistics path, which 
already returns `Incorrect Int96 min statistics` / `Incorrect Int96 max 
statistics` for non-12-byte INT96 values.
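
One possible shape for the expected behavior is a single exact-length check that returns an error. The sketch below is an illustration under assumptions, not a proposed patch: `StatsError` and `int96_from_stat_bytes` are hypothetical names, and only the error message text is taken from the page-statistics path mentioned above.

```rust
/// Hypothetical stand-in for the crate's error type (not the real `ParquetError`).
#[derive(Debug, PartialEq)]
enum StatsError {
    General(String),
}

/// Converts a raw INT96 statistic into three little-endian 32-bit words.
/// `which` is "min" or "max" and is used only in the error message.
/// Any length other than 12 bytes is reported as an error, never a panic.
fn int96_from_stat_bytes(bytes: &[u8], which: &str) -> Result<[u32; 3], StatsError> {
    if bytes.len() != 12 {
        return Err(StatsError::General(format!(
            "Incorrect Int96 {} statistics",
            which
        )));
    }
    let mut words = [0u32; 3];
    for (i, chunk) in bytes.chunks_exact(4).enumerate() {
        words[i] = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    Ok(words)
}
```

With one exact-length check there is no window between a looser precheck and a stricter assertion, so both short and overlong inputs take the same error path.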
   
   **Additional context**
   
   A related closed issue, #8614, covered the page-statistics path in 
`parquet/src/file/statistics.rs`. The footer column metadata conversion path 
still has the panic.
   

