pchintar opened a new pull request, #9833:
URL: https://github.com/apache/arrow-rs/pull/9833
# Which issue does this PR close?
- Closes #9832.
# Rationale for this change
In `parquet/src/file/page_index/column_index.rs`, `ColumnIndex` decoding
assumes that page-aligned arrays (`null_pages`, `min_values`, `max_values`, and
optional arrays) have matching lengths, but this is not validated.
As a result, malformed metadata can trigger an out-of-bounds panic during
decoding instead of returning a `ParquetError`. Because Parquet files are
external input, a length mismatch should surface as an error, not a panic.
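To illustrate the failure mode, here is a minimal sketch that is not taken from the arrow-rs source. `RawIndex` and both helper functions are hypothetical stand-ins; the point is only that indexing one page-aligned array by a position derived from another panics when the lengths disagree, while a checked lookup can report the problem as an error.

```rust
/// Hypothetical, simplified stand-in for decoded column index data.
/// The real `ColumnIndex` types in arrow-rs carry more fields.
struct RawIndex {
    null_pages: Vec<bool>,
    min_values: Vec<i32>,
}

/// Unchecked lookup: indexes `min_values` by page number.
/// Panics if `min_values` is shorter than `null_pages`.
fn first_non_null_min_unchecked(idx: &RawIndex) -> Option<i32> {
    for (page, is_null) in idx.null_pages.iter().enumerate() {
        if !*is_null {
            // Out-of-bounds panic when the arrays are not the same length.
            return Some(idx.min_values[page]);
        }
    }
    None
}

/// Checked lookup: the same traversal, but a missing entry becomes an error.
fn first_non_null_min_checked(idx: &RawIndex) -> Result<Option<i32>, String> {
    for (page, is_null) in idx.null_pages.iter().enumerate() {
        if !*is_null {
            return idx
                .min_values
                .get(page)
                .copied()
                .map(Some)
                .ok_or_else(|| format!("min_values has no entry for page {page}"));
        }
    }
    Ok(None)
}
```

With `null_pages = [true, false]` and `min_values = [1]`, the unchecked version panics on page 1, whereas the checked version returns `Err`.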
# What changes are included in this PR?
* Added validation in:
  * `PrimitiveColumnIndex::try_new`
  * `ByteArrayColumnIndex::try_new`
* Ensures:
  * `min_values.len() == null_pages.len()`
  * `max_values.len() == null_pages.len()`
  * optional arrays (`null_counts`, histograms) are consistent with page count
* Returns `ParquetError` on mismatch instead of panicking
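The checks listed above might look roughly like the following sketch. It is not the code in this PR: `IndexError`, `check_len`, and `validate_page_aligned` are hypothetical names, the error type is a stand-in for `ParquetError`, and histogram validation is left out because its exact rule is not spelled out here.

```rust
use std::fmt;

/// Hypothetical stand-in for `ParquetError`.
#[derive(Debug, PartialEq)]
struct IndexError(String);

impl fmt::Display for IndexError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid column index: {}", self.0)
    }
}

/// Returns an error naming the field whose length differs from the page count.
fn check_len(name: &str, actual: usize, num_pages: usize) -> Result<(), IndexError> {
    if actual == num_pages {
        Ok(())
    } else {
        Err(IndexError(format!(
            "{name} has length {actual}, expected {num_pages}"
        )))
    }
}

/// Validates that every page-aligned array matches `null_pages` in length.
/// `null_counts_len` is `None` when the optional array is absent.
fn validate_page_aligned(
    null_pages_len: usize,
    min_values_len: usize,
    max_values_len: usize,
    null_counts_len: Option<usize>,
) -> Result<(), IndexError> {
    check_len("min_values", min_values_len, null_pages_len)?;
    check_len("max_values", max_values_len, null_pages_len)?;
    if let Some(len) = null_counts_len {
        check_len("null_counts", len, null_pages_len)?;
    }
    Ok(())
}
```

Running the validation once in the constructor means later code can index the arrays by page number without rechecking bounds at each access.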
# Are these changes tested?
Yes.
Added a unit test:
* `test_column_index_rejects_mismatched_min_max_lengths`
This constructs a `ColumnIndex` with mismatched lengths and verifies that
decoding returns an error instead of panicking.
# Are there any user-facing changes?
No.