[I] IPC reader projection does not handle duplicate projection indices correctly [arrow-rs]

via GitHub Fri, 08 May 2026 12:16:27 -0700


pchintar opened a new issue, #9950:
URL: https://github.com/apache/arrow-rs/issues/9950


   ### Description
   
   When reading IPC data with column projection enabled, duplicate projection indices can produce an invalid `RecordBatch`.
   
   ---
   
   ### Root Cause
   
   In `arrow-ipc/src/reader.rs`, projected columns are matched using:
   
   ```rust
   projection.iter().position(|p| p == &idx)
   ```
   
   However, `position()` only returns the first matching entry.
   
   For example:
   
   ```rust
   let projection = vec![1, 1];
   ```
   
   Only a single column is decoded even though the projected schema contains two fields.
   
   `Schema::project` and `RecordBatch::project` both allow duplicate projection indices, so the IPC reader is inconsistent with the rest of Arrow.
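
   The difference between the two lookups can be shown with plain `std` iterators. This is a minimal sketch, not the reader's actual code: `first_slot` and `all_slots` are hypothetical helper names used only for illustration.

   ```rust
   /// Mirrors the lookup quoted above: returns only the first output slot
   /// whose projection entry equals `idx`.
   fn first_slot(projection: &[usize], idx: usize) -> Option<usize> {
       projection.iter().position(|p| p == &idx)
   }

   /// Returns every output slot whose projection entry equals `idx`,
   /// in projection order.
   fn all_slots(projection: &[usize], idx: usize) -> Vec<usize> {
       projection
           .iter()
           .enumerate()
           .filter(|(_, p)| **p == idx)
           .map(|(slot, _)| slot)
           .collect()
   }
   ```

   With `projection = [1, 1]` and `idx = 1`, `first_slot` yields `Some(0)` while `all_slots` yields `[0, 1]`, which matches the one-column versus two-field mismatch described above.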
   
   ---
   
   ### Impact
   
   Can lead to:
   
   * invalid `RecordBatch` construction
   * runtime errors due to schema/column count mismatch
   
   Occurs when:
   
   * projection contains duplicate indices
   * reading IPC data through `FileReader` or `StreamReader`
   
   ---
   
   ### Reproduction
   
   A minimal example:
   
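   ```rust
   use arrow_ipc::reader::FileReader;

   // `buf` is assumed to hold the bytes of an IPC file with at least two columns.
   let projection = vec![1, 1];
   
   let reader =
       FileReader::try_new(std::io::Cursor::new(buf), Some(projection))?;
   ```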
   
   Before fix:
   
   ```text
   InvalidArgumentError(
       "number of columns(1) must match number of fields(2) in schema"
   )
   ```
   
   ---
   
   ### Proposed Fix
   
   Update projection handling to preserve all matching projection entries while decoding each physical field only once.
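
   One possible shape for this, sketched independently of the real reader: walk the physical fields in order, decode a field at most once, and clone the result into every output slot that references it. `project_columns` and its `decode` callback are hypothetical names standing in for the reader's column decoding. Cloning should be cheap in the real reader because `ArrayRef` is an `Arc<dyn Array>`.

   ```rust
   /// Builds the projected column list, calling `decode` once per physical
   /// field that is referenced by `projection` (hypothetical helper).
   fn project_columns<T: Clone>(
       num_fields: usize,
       projection: &[usize],
       mut decode: impl FnMut(usize) -> T,
   ) -> Vec<T> {
       // One output slot per projection entry, filled as fields are decoded.
       let mut out: Vec<Option<T>> = vec![None; projection.len()];
       for idx in 0..num_fields {
           // All output slots that reference this physical field.
           let slots: Vec<usize> = projection
               .iter()
               .enumerate()
               .filter(|(_, p)| **p == idx)
               .map(|(slot, _)| slot)
               .collect();
           if slots.is_empty() {
               // Not projected; a real reader would skip this field's buffers.
               continue;
           }
           let column = decode(idx); // decoded once, even if projected twice
           for slot in slots {
               out[slot] = Some(column.clone());
           }
       }
       // Slots with an out-of-range index stay `None` and are dropped here;
       // a real implementation would report them as an error instead.
       out.into_iter().flatten().collect()
   }
   ```

   Under these assumptions, a projection of `[1, 1]` over three fields returns two columns from a single `decode(1)` call, so the column count matches the projected schema.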


