Ebraam-Ashraf opened a new pull request, #49443:
URL: https://github.com/apache/arrow/pull/49443
### Rationale for this change

`if_else` with a null scalar and a sliced BaseBinary array (offset != 0) produces invalid output. The ASA (array condition, scalar left, array right) and AAS (array condition, array left, scalar right) shortcut paths in `scalar_if_else.cc` copy offsets without adjusting for the slice offset. They also copy data from byte 0 instead of from `data + offsets[0]`. All existing BaseBinary tests build their arrays directly with `ArrayFromJSON`, whose offset is always 0, so the sliced case was never covered.

A proposed fix is ready. It adjusts offsets by `offsets[0]` and copies data from `data + offsets[0]` in both the ASA and AAS paths. It will be added in a follow-up commit once this test has been reviewed and confirmed correct.

### What changes are included in this PR?

This PR adds a regression test, `IfElseBaseBinarySlicedChunk`. It reproduces the bug across the `utf8`, `binary`, `large_utf8`, and `large_binary` types and covers the ASA path, the AAS path, and the full round-trip from the original issue report. The test currently fails:

```
[ RUN ] TestIfElseKernel.IfElseBaseBinarySlicedChunk
scalar_if_else_test.cc:3790: Failure
'result_asa.make_array()->ValidateFull()' failed with Invalid: Offset invariant failure: offset for slot 2 out of bounds: 3 > 2
[ FAILED ] TestIfElseKernel.IfElseBaseBinarySlicedChunk (4 ms)
```

### Are these changes tested?

Yes. The new test reproduces the bug, and the fix will follow in a separate commit.

**This PR contains a "Critical Fix".** The bug causes incorrect or invalid data to be produced when `if_else` is called with a null scalar and a sliced BaseBinary array.

* GitHub Issue: #49410
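The sketch below makes the offset arithmetic behind the bug and the proposed fix concrete. It is minimal, dependency-free C++ that copies a sliced binary array two ways: with rebasing and without. `SlicedBinary`, `CopiedBinary`, `CopySliceNaive`, `CopySliceRebased`, `IsValid`, and `ValueAt` are hypothetical names invented for illustration and are not Arrow APIs. The naive copy is only a loose model of the ASA/AAS behavior described in the rationale, not the actual kernel code, and `IsValid` is a simplified stand-in for Arrow's offset validation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a sliced utf8/binary array: the parent's full
// offsets and data buffers plus the slice's logical offset and length.
struct SlicedBinary {
  std::vector<int32_t> offsets;  // parent length + 1 entries
  std::string data;              // concatenated value bytes of the parent
  int64_t offset;                // slice start, in slots
  int64_t length;                // slice length, in slots
};

// Freshly allocated output buffers.
struct CopiedBinary {
  std::vector<int32_t> offsets;
  std::string data;
};

// Loose model of the bug: offsets are copied verbatim (still relative to the
// parent's data buffer) while data is copied starting at byte 0.
CopiedBinary CopySliceNaive(const SlicedBinary& in) {
  const int32_t* offsets = in.offsets.data() + in.offset;
  const int32_t first = offsets[0];
  const int32_t last = offsets[in.length];
  CopiedBinary out;
  out.offsets.assign(offsets, offsets + in.length + 1);
  out.data = in.data.substr(0, static_cast<std::size_t>(last - first));
  return out;
}

// Sketch of the proposed fix: subtract offsets[0] from every offset and copy
// data starting at data + offsets[0].
CopiedBinary CopySliceRebased(const SlicedBinary& in) {
  assert(in.offset >= 0 && in.length >= 0 &&
         static_cast<std::size_t>(in.offset + in.length) < in.offsets.size());
  const int32_t* offsets = in.offsets.data() + in.offset;
  const int32_t first = offsets[0];
  const int32_t last = offsets[in.length];
  CopiedBinary out;
  out.offsets.resize(static_cast<std::size_t>(in.length) + 1);
  for (int64_t i = 0; i <= in.length; ++i) {
    // Rebase so the first value of the slice starts at byte 0.
    out.offsets[static_cast<std::size_t>(i)] = offsets[i] - first;
  }
  // Copy only the bytes referenced by the slice.
  out.data = in.data.substr(static_cast<std::size_t>(first),
                            static_cast<std::size_t>(last - first));
  return out;
}

// Simplified validity check: offsets are non-negative and non-decreasing,
// and the last offset does not exceed the data size.
bool IsValid(const CopiedBinary& a) {
  if (a.offsets.empty() || a.offsets.front() < 0) return false;
  for (std::size_t i = 1; i < a.offsets.size(); ++i) {
    if (a.offsets[i] < a.offsets[i - 1]) return false;
  }
  return static_cast<std::size_t>(a.offsets.back()) <= a.data.size();
}

// Returns value i of a copied array.
std::string ValueAt(const CopiedBinary& a, std::size_t i) {
  return a.data.substr(static_cast<std::size_t>(a.offsets[i]),
                       static_cast<std::size_t>(a.offsets[i + 1] - a.offsets[i]));
}
```

Consider the values `["a", "bb", "ccc", "d"]` (offsets `[0, 1, 3, 6, 7]`, data `"abbcccd"`) sliced at offset 1 with length 2. The naive copy keeps offsets `[1, 3, 6]` over only 5 bytes of data, so the last offset is out of bounds. That is the same kind of offset-invariant violation that `ValidateFull()` reports in the failing test. The rebased copy yields offsets `[0, 2, 5]` over `"bbccc"`, which decodes back to `"bb"` and `"ccc"`.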
