Ebraam-Ashraf opened a new pull request, #49443:
URL: https://github.com/apache/arrow/pull/49443

   ### Rationale for this change
   `if_else` with a null scalar and a sliced BaseBinary array (offset != 0) 
produces invalid output. The ASA and AAS shortcut paths in `scalar_if_else.cc` 
copy offsets without adjusting for the slice offset, and copy data from byte 0 
instead of `data + offsets[0]`.
   
   All existing BaseBinary tests use arrays built directly from `ArrayFromJSON`, 
where the offset is always 0, so the sliced case was never exercised.
   
   A proposed fix (adjusting offsets by `offsets[0]` and copying data from 
`data + offsets[0]` in both the ASA and AAS paths) is ready and will be added 
in a follow-up commit once this test is reviewed and confirmed correct.
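
The adjustment described above can be sketched without Arrow itself. This is a minimal illustration, not the code from `scalar_if_else.cc`: `BinaryOutput` and `CopySlicedBinary` are hypothetical names, plain `std::vector` and `std::string` stand in for Arrow buffers, and 32-bit offsets are assumed (the `large_` types use 64-bit offsets).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a freshly allocated output array:
// an offsets buffer that starts at 0 plus its own data buffer.
struct BinaryOutput {
  std::vector<int32_t> offsets;
  std::string data;
};

// Copies `length` values starting at logical slot `slice_offset`,
// rebasing every offset by the first visible offset so the output
// is valid on its own.
BinaryOutput CopySlicedBinary(const std::vector<int32_t>& offsets,
                              const std::string& data,
                              int64_t slice_offset, int64_t length) {
  assert(slice_offset >= 0 && length >= 0);
  assert(static_cast<size_t>(slice_offset + length) < offsets.size());

  const int32_t* src = offsets.data() + slice_offset;
  const int32_t first = src[0];  // plays the role of offsets[0] of the slice

  BinaryOutput out;
  out.offsets.reserve(static_cast<size_t>(length) + 1);
  for (int64_t i = 0; i <= length; ++i) {
    out.offsets.push_back(src[i] - first);  // adjust for the slice offset
  }
  // Copy from data + first, not from byte 0.
  out.data = data.substr(static_cast<size_t>(first),
                         static_cast<size_t>(src[length] - first));
  return out;
}
```

Copying `src[i]` unchanged and taking the data from byte 0 would instead leave offsets that point past the end of the copied bytes, which is the kind of output the description above calls invalid.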
   
   ### What changes are included in this PR?
   A regression test `IfElseBaseBinarySlicedChunk` that reproduces the bug 
across `utf8`, `binary`, `large_utf8`, and `large_binary` types, covering the 
ASA path, AAS path, and the full round-trip from the original issue report.
   
   The test currently fails:
   ```
   [ RUN      ] TestIfElseKernel.IfElseBaseBinarySlicedChunk
   scalar_if_else_test.cc:3790: Failure
   'result_asa.make_array()->ValidateFull()' failed with Invalid: Offset invariant failure: offset for slot 2 out of bounds: 3 > 2
   [  FAILED  ] TestIfElseKernel.IfElseBaseBinarySlicedChunk (4 ms)
   ```
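
To make the failure message easier to read, here is a rough sketch of the kind of offsets invariant that `ValidateFull()` enforces for binary arrays. `CheckOffsets` is a hypothetical helper written for illustration; Arrow's real validation covers more cases, and the message wording here only imitates the log.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Returns an empty string when the offsets are consistent with a data
// buffer of `data_size` bytes, otherwise a description of the first
// offending slot. Hypothetical helper, not Arrow's implementation.
std::string CheckOffsets(const std::vector<int32_t>& offsets,
                         int64_t data_size) {
  assert(data_size >= 0);
  if (offsets.empty()) return "offsets buffer is empty";
  if (offsets[0] < 0) return "first offset is negative";
  for (size_t i = 1; i < offsets.size(); ++i) {
    // Offsets must never decrease from one slot to the next.
    if (offsets[i] < offsets[i - 1]) {
      return "offset for slot " + std::to_string(i) + " is decreasing";
    }
    // Every offset must stay within the data buffer.
    if (offsets[i] > data_size) {
      return "offset for slot " + std::to_string(i) + " out of bounds: " +
             std::to_string(offsets[i]) + " > " + std::to_string(data_size);
    }
  }
  return "";
}
```

For example, un-rebased offsets `{1, 2, 3}` paired with a 2-byte data buffer fail at slot 2 because 3 > 2, while rebased offsets `{0, 1, 2}` pass. These values are assumptions chosen to line up with the numbers in the log; the actual test data may differ.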
   ### Are these changes tested?
   Yes. The new test reproduces the bug. The fix will follow in a separate commit.
   
   **This PR contains a "Critical Fix".** The bug causes incorrect/invalid data 
to be produced when `if_else` is called with a null scalar and a sliced 
BaseBinary array.
   
   * GitHub Issue: #49410

