pchintar opened a new pull request, #9829:
URL: https://github.com/apache/arrow-rs/pull/9829
# Which issue does this PR close?
- Closes #9828.
# Rationale for this change
`skip_field` does not correctly handle the buffer layout of `Union` types
in V4 IPC data.
In V4:
* `Union` has a null (validity) buffer and a `type_ids` buffer, plus an offsets buffer for dense unions
In V5:
* `Union` has no null buffer, only a `type_ids` buffer, plus an offsets buffer for dense unions
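The difference can be summarized as the number of buffers a `Union` node itself owns in the IPC body, not counting its children. The sketch below uses hypothetical `Version` and `UnionMode` enums for illustration only; they are not the `arrow-ipc` or `arrow-schema` types.

```rust
/// Hypothetical stand-ins for the IPC metadata version and the union mode
/// (illustrative only, not the arrow-ipc / arrow-schema types).
#[derive(Clone, Copy, PartialEq, Debug)]
enum Version {
    V4,
    V5,
}

#[derive(Clone, Copy, PartialEq, Debug)]
enum UnionMode {
    Sparse,
    Dense,
}

/// Number of buffers a `Union` node contributes to the IPC body,
/// excluding the buffers of its child fields.
fn union_buffer_count(version: Version, mode: UnionMode) -> usize {
    // V4 writes a validity (null) buffer for unions; V5 does not.
    let validity = if version == Version::V4 { 1 } else { 0 };
    // Both versions always write the type_ids buffer.
    let type_ids = 1;
    // Only dense unions carry an offsets buffer.
    let offsets = if mode == UnionMode::Dense { 1 } else { 0 };
    validity + type_ids + offsets
}
```

Under this model a V4 sparse union owns 2 buffers and a V4 dense union owns 3, while the V5 counts are 1 and 2 respectively.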
`create_array` correctly handles this difference using a version check.
However, `skip_field` always skips a null buffer and never skips the
`type_ids` buffer. When a `Union` column is skipped in V4, the reader therefore
consumes one buffer too few, and every later column is read from misaligned buffers.
This can cause incorrect decoding or runtime errors for the projected columns.
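To make the misalignment concrete, here is a toy model of the pre-fix logic described above. It assumes the IPC body is a flat list of labeled buffers and that skipping advances a cursor by one per buffer. `Cursor`, `DataType`, `skip_field_old` and `V4_BUFFERS` are hypothetical names for illustration, not the `arrow-ipc` implementation. The modeled schema is a V4 sparse `Union` with one `Int32` child, followed by an `Int32` column.

```rust
// Toy model of skipping buffers in an IPC body (hypothetical types).
#[derive(Clone, Copy, PartialEq, Debug)]
enum UnionMode {
    Sparse,
    Dense,
}

enum DataType {
    // A primitive column: one validity buffer plus one values buffer.
    Int32,
    Union(UnionMode, Vec<DataType>),
}

struct Cursor<'a> {
    buffers: &'a [&'a str],
    pos: usize,
}

impl<'a> Cursor<'a> {
    fn skip_buffer(&mut self) {
        self.pos += 1;
    }

    fn next_buffer(&mut self) -> &'a str {
        let buf = self.buffers[self.pos];
        self.pos += 1;
        buf
    }

    /// Pre-fix behavior as described in the PR: always skip one "null"
    /// buffer, skip offsets for dense unions, but never skip type_ids.
    fn skip_field_old(&mut self, dt: &DataType) {
        match dt {
            DataType::Int32 => {
                self.skip_buffer(); // validity
                self.skip_buffer(); // values
            }
            DataType::Union(mode, children) => {
                self.skip_buffer(); // assumed null buffer
                if *mode == UnionMode::Dense {
                    self.skip_buffer(); // offsets
                }
                for child in children {
                    self.skip_field_old(child);
                }
            }
        }
    }
}

/// Modeled V4 body for [Union(Sparse, [Int32]), Int32], in buffer order.
const V4_BUFFERS: [&str; 6] = [
    "union_validity",
    "union_type_ids",
    "child_validity",
    "child_values",
    "col1_validity",
    "col1_values",
];
```

In this model, after `skip_field_old` skips column 0 the cursor sits on `child_values` instead of `col1_validity`, so column 1 would be decoded from the union child's buffers. For a V5 body, which has no union validity buffer, the single "null" skip happens to consume `type_ids` instead, which is consistent with the problem being specific to V4.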
# What changes are included in this PR?
* Updated `skip_field` in `arrow-ipc/src/reader.rs` to:
  * skip the null buffer only for V4
  * always skip the `type_ids` buffer
  * skip the offsets buffer only for dense unions
* Aligned `skip_field` behavior with `create_array` and the actual IPC layout
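The corrected skip order can be sketched with a similar toy model: skip the validity buffer only for V4, always skip `type_ids`, skip offsets only for dense unions, then recurse into the children. `Version`, `Cursor`, `DataType` and `skip_field` below are hypothetical illustrations assuming a flat list of labeled buffers, not the actual `arrow-ipc` code.

```rust
// Toy model of the fixed skip logic (hypothetical types, not arrow-ipc).
#[derive(Clone, Copy, PartialEq, Debug)]
enum Version {
    V4,
    V5,
}

#[derive(Clone, Copy, PartialEq, Debug)]
enum UnionMode {
    Sparse,
    Dense,
}

enum DataType {
    // A primitive column: one validity buffer plus one values buffer.
    Int32,
    Union(UnionMode, Vec<DataType>),
}

struct Cursor<'a> {
    buffers: &'a [&'a str],
    pos: usize,
    version: Version,
}

impl<'a> Cursor<'a> {
    fn skip_buffer(&mut self) {
        self.pos += 1;
    }

    fn next_buffer(&mut self) -> &'a str {
        let buf = self.buffers[self.pos];
        self.pos += 1;
        buf
    }

    /// Skip a field by mirroring the per-version buffer layout.
    fn skip_field(&mut self, dt: &DataType) {
        match dt {
            DataType::Int32 => {
                self.skip_buffer(); // validity
                self.skip_buffer(); // values
            }
            DataType::Union(mode, children) => {
                if self.version == Version::V4 {
                    self.skip_buffer(); // validity exists only before V5
                }
                self.skip_buffer(); // type_ids, always present
                if *mode == UnionMode::Dense {
                    self.skip_buffer(); // offsets, dense unions only
                }
                for child in children {
                    self.skip_field(child);
                }
            }
        }
    }
}
```

With this logic, skipping a dense `Union` with one `Int32` child leaves the cursor on the next column's validity buffer for both a V4 body (7 buffers in total) and a V5 body (6 buffers in total).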
# Are these changes tested?
Yes.
* Added test: `test_projection_skip_union_v4`
* The test:
  * writes IPC data using V4 metadata
  * includes a `Union` column followed by an `Int32` column
  * reads only the second column (skipping the `Union`)
  * verifies the output matches expected values
* The test fails before the fix and passes after
* All existing `arrow-ipc` tests pass (`cargo test -p arrow-ipc --lib`)
# Are there any user-facing changes?
No.