pchintar opened a new pull request, #9833:
URL: https://github.com/apache/arrow-rs/pull/9833

   # Which issue does this PR close?
   
   - Closes #9832.
   
   # Rationale for this change
   
   In `parquet/src/file/page_index/column_index.rs`, `ColumnIndex` decoding 
assumes that the page-aligned arrays (`null_pages`, `min_values`, `max_values`, 
and the optional arrays) all have matching lengths, but never validates this.
   
   As a result, malformed metadata can trigger an out-of-bounds panic during 
decoding instead of returning a `ParquetError`. Since parquet files are 
external input, this should be handled safely.
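
To illustrate the failure mode, here is a minimal sketch of a per-page loop that indexes `min_values` by page number. It is not the crate's actual decoding code: the function name `page_minimums`, the `i64` element type, and the `String` error are assumptions made for this example only.

```rust
/// Collects per-page minimums; `None` marks an all-null page.
/// Hypothetical helper for illustration, not part of the parquet crate.
pub fn page_minimums(
    null_pages: &[bool],
    min_values: &[i64],
) -> Result<Vec<Option<i64>>, String> {
    let mut out = Vec::with_capacity(null_pages.len());
    for (page, &is_null) in null_pages.iter().enumerate() {
        if is_null {
            out.push(None);
            continue;
        }
        // Writing `min_values[page]` here would panic when `min_values`
        // is shorter than `null_pages`; `get` turns that into an error.
        let min = min_values
            .get(page)
            .copied()
            .ok_or_else(|| format!("min_values has no entry for page {}", page))?;
        out.push(Some(min));
    }
    Ok(out)
}
```

With two pages but a single minimum, `page_minimums(&[false, false], &[1])` returns an `Err` rather than aborting. Note that this lazy check does not catch arrays that are too long, which is one reason to validate lengths up front.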
   
   # What changes are included in this PR?
   
   * Added validation in:
   
     * `PrimitiveColumnIndex::try_new`
     * `ByteArrayColumnIndex::try_new`
   
   * Ensures:
   
     * `min_values.len() == null_pages.len()`
     * `max_values.len() == null_pages.len()`
     * optional arrays (`null_counts`, histograms) are consistent with the 
page count
   
   * Returns `ParquetError` on mismatch instead of panicking
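
The checks listed above can be sketched roughly as follows. This is an assumption-laden outline rather than the code in this PR: `IndexError` is a hypothetical stand-in for `ParquetError`, the helper names and `i64` element types are invented for the example, and histograms are left out because their expected length is not stated here.

```rust
/// Hypothetical stand-in for the crate's `ParquetError`, used only in this sketch.
#[derive(Debug, PartialEq)]
pub enum IndexError {
    General(String),
}

/// Checks that one page-aligned array has exactly one entry per page.
fn check_len(name: &str, actual: usize, num_pages: usize) -> Result<(), IndexError> {
    if actual != num_pages {
        return Err(IndexError::General(format!(
            "ColumnIndex {} has length {}, expected {}",
            name, actual, num_pages
        )));
    }
    Ok(())
}

/// Validates all array lengths against `null_pages.len()` before any indexing.
pub fn validate_page_aligned_lengths(
    null_pages: &[bool],
    min_values: &[i64],
    max_values: &[i64],
    null_counts: Option<&[i64]>,
) -> Result<(), IndexError> {
    let num_pages = null_pages.len();
    check_len("min_values", min_values.len(), num_pages)?;
    check_len("max_values", max_values.len(), num_pages)?;
    // Optional arrays are only checked when present.
    if let Some(counts) = null_counts {
        check_len("null_counts", counts.len(), num_pages)?;
    }
    Ok(())
}
```

Running the validation once at construction time means later per-page code can index the arrays without further bounds concerns.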
   
   # Are these changes tested?
   
   Yes.
   
   Added a unit test:
   
   * `test_column_index_rejects_mismatched_min_max_lengths`
   
   This constructs a `ColumnIndex` with mismatched lengths and verifies that 
decoding returns an error instead of panicking.
   
   # Are there any user-facing changes?
   
   No.


