pchintar opened a new pull request, #9952:
URL: https://github.com/apache/arrow-rs/pull/9952

   # Which issue does this PR close?
   
   - Closes #9950.
   
   # Rationale for this change
   
   The current IPC reader does not correctly handle duplicate projection 
indices.
   
   `Schema::project` and `RecordBatch::project` both allow duplicate indices 
such as:
   
   ```rust
   vec![1, 1]
   ```
   
   However, the IPC reader currently uses:
   
   ```rust
   projection.iter().position(|p| p == &idx)
   ```
   
   which only returns the first matching entry. As a result, only one column is 
decoded even though the projected schema contains multiple fields, leading to 
schema/column count mismatches when constructing the `RecordBatch`.
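   
   The mismatch can be reproduced without any Arrow types. The following is a minimal sketch, not the reader's actual code: `slots_via_position` is a hypothetical helper that applies the `position` lookup above to every field index and records which output slot each field would fill.
   
   ```rust
   /// Hypothetical helper: for each field index, find its output slot with
   /// `position`, which stops at the first matching projection entry.
   fn slots_via_position(projection: &[usize], num_fields: usize) -> Vec<Option<usize>> {
       (0..num_fields)
           .map(|idx| projection.iter().position(|p| p == &idx))
           .collect()
   }
   
   fn main() {
       let projection = vec![2, 0, 2];
       let slots = slots_via_position(&projection, 3);
   
       // Field 0 fills slot 1, field 1 is not projected, and field 2 fills
       // slot 0 only; the second request for field 2 (slot 2) is never seen.
       assert_eq!(slots, vec![Some(1), None, Some(0)]);
   
       // Two columns are produced, but the projected schema has three fields.
       let produced = slots.iter().flatten().count();
       assert_eq!(produced, 2);
       assert_eq!(projection.len(), 3);
   }
   ```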
   
   This also affects reordered duplicate projections such as:
   
   ```rust
   vec![2, 0, 2]
   ```
   
   # What changes are included in this PR?
   
   * Updated IPC projection handling in `arrow-ipc/src/reader.rs` to preserve 
all matching projection entries
   * Reused the decoded array for duplicate projection indices instead of 
decoding the same field multiple times
   * Preserved projection order for reordered duplicate projections
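   
   The approach in the list above can be sketched in plain Rust. This is an illustration under stated assumptions rather than the code in `arrow-ipc/src/reader.rs`: `Column`, `project_with_duplicates`, and the use of `Rc<Vec<i32>>` as a stand-in for a decoded array are all hypothetical.
   
   ```rust
   use std::rc::Rc;
   
   /// Hypothetical stand-in for a decoded, cheaply clonable array.
   type Column = Rc<Vec<i32>>;
   
   /// Returns the projected columns in projection order, plus the number of
   /// fields that were actually "decoded".
   fn project_with_duplicates(fields: &[Vec<i32>], projection: &[usize]) -> (Vec<Column>, usize) {
       let mut out: Vec<Option<Column>> = vec![None; projection.len()];
       let mut decode_count = 0;
   
       for (idx, field) in fields.iter().enumerate() {
           // Collect every output slot that requests this field, not just the first.
           let slots: Vec<usize> = projection
               .iter()
               .enumerate()
               .filter(|(_, p)| **p == idx)
               .map(|(slot, _)| slot)
               .collect();
           if slots.is_empty() {
               continue; // field is not projected, so it is skipped
           }
   
           // "Decode" once, then share the result across all matching slots.
           decode_count += 1;
           let decoded: Column = Rc::new(field.clone());
           for slot in slots {
               out[slot] = Some(Rc::clone(&decoded));
           }
       }
   
       let columns = out
           .into_iter()
           .map(|c| c.expect("projection index out of bounds"))
           .collect();
       (columns, decode_count)
   }
   
   fn main() {
       let fields = vec![vec![10, 11], vec![20, 21], vec![30, 31]];
       let (columns, decode_count) = project_with_duplicates(&fields, &[2, 0, 2]);
   
       // Three output columns in projection order, from only two decodes.
       assert_eq!(columns.len(), 3);
       assert_eq!(decode_count, 2);
       assert_eq!(*columns[1], vec![10, 11]);
       // Both copies of field 2 share the same underlying data.
       assert!(Rc::ptr_eq(&columns[0], &columns[2]));
   }
   ```
   
   Sharing one reference-counted value across slots mirrors the second bullet: the duplicate costs a pointer clone rather than a second decode.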
   
   # Are these changes tested?
   
   Yes.
   
   Added `test_projection_duplicate_indices`, which verifies:
   
   * duplicate projections (`vec![1, 1]`)
   * reordered duplicate projections (`vec![2, 0, 2]`)
   
   The test compares IPC projection results against `RecordBatch::project`.
   
   The test fails before the fix and passes after it.
   
   All existing `arrow-ipc` tests also pass (`cargo test -p arrow-ipc --lib`).
   
   # Are there any user-facing changes?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
