neilconway opened a new pull request, #20770:
URL: https://github.com/apache/datafusion/pull/20770

   ## Which issue does this PR close?
   
   - Closes #20769.
   
   ## Rationale for this change
   
   `array_positions` previously compared the needle against each row's sub-array individually. When the needle is a scalar (the common case), we can instead run a single bulk `arrow_ord::cmp::not_distinct` comparison against the entire flat values buffer and then walk the resulting bitmap row by row. This replaces one comparison per row with a single comparison over all values, which is significantly faster.
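
   A minimal sketch of the idea, using plain Rust vectors as a stand-in for Arrow's `ListArray` (a flat values buffer plus offsets) and for the `BooleanArray` returned by `arrow_ord::cmp::not_distinct`. The names `FlatList` and `array_positions_bulk` are hypothetical, SQL-style 1-based positions are assumed, and this is not the code in the PR:

   ```rust
   /// Hypothetical simplified list column: one flat `values` buffer shared by
   /// all rows; row `i` spans `values[offsets[i]..offsets[i + 1]]`.
   /// `None` models a null element. Stand-in for Arrow's `ListArray`.
   pub struct FlatList {
       pub values: Vec<Option<i64>>,
       pub offsets: Vec<usize>,
   }

   /// Returns the 1-based positions of `needle` within each row.
   pub fn array_positions_bulk(list: &FlatList, needle: Option<i64>) -> Vec<Vec<usize>> {
       // Step 1: one bulk comparison over the whole values buffer instead of
       // one comparison per row. `Option` equality treats two nulls as equal
       // and null vs. non-null as unequal, i.e. IS NOT DISTINCT FROM semantics.
       // In Arrow this result would be a BooleanArray (a bitmap).
       let matches: Vec<bool> = list.values.iter().map(|v| *v == needle).collect();

       // Step 2: walk the match bitmap row by row using the offsets,
       // converting absolute indices into 1-based positions within each row.
       list.offsets
           .windows(2)
           .map(|w| {
               let (start, end) = (w[0], w[1]);
               (start..end)
                   .filter(|&i| matches[i])
                   .map(|i| i - start + 1)
                   .collect::<Vec<usize>>()
           })
           .collect()
   }
   ```

   For a single row `[1, 2, 2, 3, 1, 4]` and needle `2`, the sketch returns `[[2, 3]]`. The per-row work is reduced to scanning precomputed booleans; all element comparisons happen in one pass.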
   
   The same pattern has already been applied to `array_position` (#20532), and previously to other array UDFs.
   
   ## What changes are included in this PR?
   
   - Add benchmarks for `array_positions`.
   - Implement the bulk-comparison optimization for scalar needles.
   - Slightly refactor `array_position`'s existing fast path for consistency.
   - Add unit tests for `array_positions` with sliced `ListArray`s, for peace of mind.
   - Add unit tests for sliced lists, and sliced lists with nulls, for the new `array_positions` fast path.
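
   Sliced lists are worth testing because a sliced `ListArray` shares its parent's values buffer, so its offsets need not start at zero. The sketch below is a hypothetical, simplified model (plain vectors instead of Arrow arrays; `SlicedList` and `array_positions_sliced` are illustrative names, and 1-based positions are assumed) showing one way a bulk comparison can be restricted to the slice's visible window of values and rebased, with null rows producing a null result. It is not the code in the PR:

   ```rust
   /// Hypothetical model of a sliced list column. `values` is the parent's
   /// full buffer, `offsets` (length = rows + 1, assumed non-empty) may start
   /// past zero, and `validity[i] == false` marks row `i` as null.
   pub struct SlicedList {
       pub values: Vec<Option<i64>>,
       pub offsets: Vec<usize>,
       pub validity: Vec<bool>,
   }

   pub fn array_positions_sliced(list: &SlicedList, needle: Option<i64>) -> Vec<Option<Vec<usize>>> {
       // Compare only the values this slice references; parent elements
       // outside [first, last) are never touched.
       let first = list.offsets[0];
       let last = *list.offsets.last().expect("offsets must be non-empty");
       let matches: Vec<bool> = list.values[first..last].iter().map(|v| *v == needle).collect();

       list.offsets
           .windows(2)
           .zip(list.validity.iter())
           .map(|(w, &valid)| {
               if !valid {
                   return None; // null row produces a null result
               }
               // Rebase absolute offsets onto the `matches` window.
               let (start, end) = (w[0] - first, w[1] - first);
               Some(
                   (start..end)
                       .filter(|&i| matches[i])
                       .map(|i| i - start + 1)
                       .collect::<Vec<usize>>(),
               )
           })
           .collect()
   }
   ```

   Restricting the comparison to `values[first..last]` avoids comparing elements that belong to the parent array but not to the slice, and subtracting `first` keeps the bitmap indices aligned with the slice's offsets.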
   
   ## Are these changes tested?
   
   Yes.
   
   ## Are there any user-facing changes?
   
   No.
   
   ## AI usage
   
   Multiple AI tools were used to iterate on this PR. I have reviewed and understand the resulting code.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

