pchintar opened a new pull request, #9952: URL: https://github.com/apache/arrow-rs/pull/9952
# Which issue does this PR close?

- Closes #9950.

# Rationale for this change

The current IPC reader does not correctly handle duplicate projection indices.

`Schema::project` and `RecordBatch::project` both allow duplicate indices such as:

```rust
vec![1, 1]
```

However, the IPC reader currently uses:

```rust
projection.iter().position(|p| p == &idx)
```

which returns only the first matching entry. As a result, only one column is decoded even though the projected schema contains multiple fields, which leads to a schema/column count mismatch when constructing the `RecordBatch`.

This also affects reordered duplicate projections such as:

```rust
vec![2, 0, 2]
```

# What changes are included in this PR?

* Updated IPC projection handling in `arrow-ipc/src/reader.rs` to preserve all matching projection entries
* Reused the decoded array for duplicate projection indices instead of decoding the same field multiple times
* Preserved projection order for reordered duplicate projections

# Are these changes tested?

Yes. Added `test_projection_duplicate_indices`, which verifies:

* duplicate projections (`vec![1, 1]`)
* reordered duplicate projections (`vec![2, 0, 2]`)

The test compares IPC projection results against `RecordBatch::project`. It fails before the fix and passes after it.

All existing `arrow-ipc` tests also pass with `cargo test -p arrow-ipc --lib`.

# Are there any user-facing changes?

No.
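To illustrate the rationale, here is a minimal, std-only sketch of the difference between a first-match lookup and a duplicate-aware one. It is not the code in `arrow-ipc/src/reader.rs`: the function names `project_first_match` and `project_all_matches` are hypothetical, and plain strings stand in for decoded arrays.

```rust
/// First-match lookup (the pattern described in the rationale).
/// `position` stops at the first hit, so a duplicated index fills only one
/// output slot and the others stay `None`.
fn project_first_match(columns: &[&str], projection: &[usize]) -> Vec<Option<String>> {
    let mut out: Vec<Option<String>> = vec![None; projection.len()];
    for (idx, col) in columns.iter().enumerate() {
        if let Some(pos) = projection.iter().position(|p| *p == idx) {
            // Stand-in for decoding the column at `idx`.
            out[pos] = Some(col.to_string());
        }
    }
    out
}

/// Duplicate-aware lookup (hypothetical sketch of the idea in this PR).
/// The column is "decoded" at most once, then cloned into every output slot
/// whose projection entry matches, which preserves projection order.
fn project_all_matches(columns: &[&str], projection: &[usize]) -> Vec<Option<String>> {
    let mut out: Vec<Option<String>> = vec![None; projection.len()];
    for (idx, col) in columns.iter().enumerate() {
        let mut decoded: Option<String> = None;
        for (pos, p) in projection.iter().enumerate() {
            if *p == idx {
                // Decode lazily on the first match, reuse on later matches.
                let value = decoded.get_or_insert_with(|| col.to_string());
                out[pos] = Some(value.clone());
            }
        }
    }
    out
}
```

With columns `["a", "b", "c"]`, calling `project_first_match` with the projection `[1, 1]` leaves the second output slot empty, while `project_all_matches` fills both slots and also handles a reordered projection such as `[2, 0, 2]`. In the real reader the reused value would be a reference-counted array, so the clone is assumed to be cheap.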
