pchintar opened a new issue, #9828:
URL: https://github.com/apache/arrow-rs/issues/9828

   ### Description
   
   When reading IPC data with column projection enabled, skipping a `Union` 
column encoded with V4 metadata can lead to buffer misalignment and incorrect 
decoding of subsequent columns.
   
   ---
   
   ### Root Cause
   
   In `arrow-ipc/src/reader.rs`, `skip_field` does not correctly handle the 
buffer layout of `Union` types for V4.
   
   Current implementation:
   
   ```rust
   Union(fields, mode) => {
       self.skip_buffer(); // Nulls
   
       match mode {
           UnionMode::Dense => self.skip_buffer(),
           UnionMode::Sparse => {}
       };
   
       ...
   }
   ```
   
   However, under V4 metadata a `Union` column has the following buffers, in order:
   
   * null (validity) buffer
   * `type_ids` buffer
   * offsets buffer (dense mode only)
   
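   `create_array` consumes them correctly, in that order:
   
   ```rust
   // In V4, union types have a validity bitmap; in V5 and later they do not
   if self.version < MetadataVersion::V5 {
       self.next_buffer()?; // null
   }
   let type_ids = self.next_buffer()?; // type_ids
   let value_offsets = match mode {
       UnionMode::Dense => Some(self.next_buffer()?), // offsets (dense only)
       UnionMode::Sparse => None,
   };
   ```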
   
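   So for V4, the current `skip_field` skips one buffer fewer than `create_array` consumes: its single unconditional `skip_buffer()` (commented as `Nulls`) accounts for the null buffer, but no buffer is skipped for `type_ids`. Every buffer read after that point is therefore shifted by one. With V5 metadata, unions have no null buffer, so the same single skip effectively consumes `type_ids` and the count happens to be correct.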
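   To make the off-by-one concrete, here is a simplified, std-only model of the buffer cursor. It is a sketch under stated assumptions: `Version`, `Mode`, `skipped_by_current_skip_field`, and `consumed_by_create_array` are hypothetical names, not arrow-rs APIs, and the buffer list is the assumed V4 order for a sparse `Union` with one `Int32` child followed by the projected `Int32` column from the reproduction.

   ```rust
   // Hypothetical, simplified model of how the IPC reader walks the flat
   // buffer list; none of these names are arrow-rs APIs.
   #[derive(Clone, Copy, Debug, PartialEq, PartialOrd)]
   enum Version {
       V4,
       V5, // declaration order makes V4 < V5 via the derived PartialOrd
   }

   #[derive(Clone, Copy, Debug, PartialEq)]
   enum Mode {
       Sparse,
       Dense,
   }

   /// Union-level buffers (children excluded) skipped by the current
   /// `skip_field`: one unconditional skip plus offsets for dense unions.
   fn skipped_by_current_skip_field(mode: Mode) -> usize {
       1 + usize::from(mode == Mode::Dense)
   }

   /// Union-level buffers (children excluded) consumed by `create_array`:
   /// nulls (before V5 only) + type_ids + offsets (dense only).
   fn consumed_by_create_array(version: Version, mode: Mode) -> usize {
       usize::from(version < Version::V5) + 1 + usize::from(mode == Mode::Dense)
   }

   fn main() {
       // Assumed buffer order for the schema
       // [union: sparse Union<Int32> (skipped), values: Int32 (projected)].
       let buffers = [
           "union.nulls",
           "union.type_ids",
           "child.nulls",
           "child.data",
           "values.nulls",
           "values.data",
       ];
       let int32_buffers = 2; // validity + data of the union's Int32 child

       let current = skipped_by_current_skip_field(Mode::Sparse) + int32_buffers;
       let expected = consumed_by_create_array(Version::V4, Mode::Sparse) + int32_buffers;

       // The projected `values` column reads the next two buffers as (nulls, data).
       println!("current:  {:?}", &buffers[current..current + 2]);
       println!("expected: {:?}", &buffers[expected..expected + 2]);
   }
   ```

   Under these assumptions, the current logic stops one buffer short, so `values` would read the child's data buffer as its validity bitmap and its own 1-byte validity bitmap as its data, which is plausibly what the `got 1` error in the reproduction reflects. For V5 the two counts agree in both modes, since unions carry no validity buffer there.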
   
   ---
   
   ### Impact
   
   * Can lead to:
   
     * incorrect decoding of subsequent columns
     * runtime errors (e.g., invalid buffer sizes)
   * Occurs only when all of the following hold:
   
     * projection is enabled
     * a `Union` column is skipped
     * IPC metadata version is V4
   
   ---
   
   ### Reproduction
   
   A minimal test case:
   
   ```rust
   // Schema:
   // union: Union<Int32> (skipped)
   // values: Int32 (projected)
   
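   // alignment = 8, write_legacy_ipc_format = false, metadata_version = V4
   let options = IpcWriteOptions::try_new(8, false, MetadataVersion::V4)?;
   let mut writer = FileWriter::try_new_with_options(..., options)?;
   
   // Project only column 1 (`values`), so the reader must skip column 0 (`union`)
   let reader = FileReader::try_new(cursor, Some(vec![1]))?;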
   ```
   
   Before the fix, reading fails with:
   
   ```
   InvalidArgumentError("Need at least 12 bytes in buffers[0] in array of type Int32, but got 1")
   ```
   
   ---
   
   ### Proposed Fix
   
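   Update the `Union` arm of `skip_field` to skip exactly the buffers that `create_array` consumes: the null buffer (V4 only), the `type_ids` buffer, and the offsets buffer (dense mode only).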
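   A minimal sketch of the proposed change, written against a simplified, hypothetical model of the reader rather than the real arrow-rs types (`Reader`, `DataType`, and `buffers_skipped` are illustrative names; the real `skip_field` also advances field nodes and variadic buffer counts, which are omitted here). The `Union` arm mirrors `create_array`: skip the null buffer only before V5, always skip `type_ids`, and skip offsets only in dense mode.

   ```rust
   // Hypothetical, simplified model; not the arrow-rs reader.
   #[derive(Clone, Copy, Debug, PartialEq, PartialOrd)]
   enum MetadataVersion {
       V4,
       V5,
   }

   #[derive(Clone, Copy, Debug, PartialEq)]
   enum UnionMode {
       Sparse,
       Dense,
   }

   enum DataType {
       Int32,
       Union(Vec<DataType>, UnionMode),
   }

   struct Reader {
       version: MetadataVersion,
       buffer_index: usize, // index of the next unread buffer
   }

   impl Reader {
       fn skip_buffer(&mut self) {
           self.buffer_index += 1;
       }

       fn skip_field(&mut self, data_type: &DataType) {
           match data_type {
               DataType::Int32 => {
                   self.skip_buffer(); // nulls
                   self.skip_buffer(); // values
               }
               DataType::Union(children, mode) => {
                   // V4 unions carry a validity buffer; V5+ unions do not
                   if self.version < MetadataVersion::V5 {
                       self.skip_buffer(); // nulls
                   }
                   self.skip_buffer(); // type_ids
                   if *mode == UnionMode::Dense {
                       self.skip_buffer(); // offsets
                   }
                   for child in children {
                       self.skip_field(child);
                   }
               }
           }
       }
   }

   /// Total buffers (including children) skipped for one column.
   fn buffers_skipped(version: MetadataVersion, data_type: &DataType) -> usize {
       let mut reader = Reader { version, buffer_index: 0 };
       reader.skip_field(data_type);
       reader.buffer_index
   }

   fn main() {
       let sparse = DataType::Union(vec![DataType::Int32], UnionMode::Sparse);
       let dense = DataType::Union(vec![DataType::Int32], UnionMode::Dense);
       for version in [MetadataVersion::V4, MetadataVersion::V5] {
           // V4: sparse skips 4, dense skips 5
           // V5: sparse skips 3, dense skips 4
           println!(
               "{:?}: sparse skips {}, dense skips {}",
               version,
               buffers_skipped(version, &sparse),
               buffers_skipped(version, &dense)
           );
       }
   }
   ```

   Conditioning the extra skip on the metadata version keeps the V5 buffer count identical to the current behavior while adding the missing buffer for V4.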

