Re: [PR] perf: Optimize `contains` for scalar search arg [datafusion]

via GitHub Sun, 28 Dec 2025 08:23:14 -0800


andygrove commented on PR #19514:
URL: https://github.com/apache/datafusion/pull/19514#issuecomment-3694864572


   > > Thanks @andygrove, the numbers make a lot of sense to me. What probably 
concerns me is that DF uses `make_scalar_function` in lots of places, so the 
same problem may be relevant for other functions.
   > > Would you mind adding more detail on the exact reason and on which 
scalar/array combinations cause the most trouble?
   > > This info might be important for how DF treats builtin functions now.
   > 
   > Basically, the `make_scalar_function` functions (there are three 
implementations, two of which are identical and one with slightly different 
behavior) convert scalar arguments into arrays. This adds unnecessary overhead 
in some cases because Arrow has specialized implementations of some kernels 
with fast paths for scalar arguments. For example, for `contains`, Arrow has 
`pub fn contains(left: &dyn Datum, right: &dyn Datum)` and `Datum` is 
implemented for arrays and scalars. The implementation has special handling for 
scalars.
   > 
   > The three implementations of `make_scalar_function` can be found in these 
files:
   > 
   >     * `datafusion/functions/src/utils.rs`
   > 
   >     * `datafusion/functions-nested/src/utils.rs`
   > 
   >     * `datafusion/spark/src/function/functions_nested_utils.rs`
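
To make the overhead described above concrete, here is a minimal std-only sketch. `ToyValue`, `contains_arrays`, and `contains_scalar` are hypothetical stand-ins written for this illustration; they are not the Arrow or DataFusion types, and the real kernels work on Arrow arrays rather than `Vec<String>`. The point is only that expanding a scalar allocates one copy per row, while a scalar-aware kernel borrows the value once.

```rust
// Toy stand-ins for illustration only; NOT the real Arrow/DataFusion types.
#[derive(Debug, Clone, PartialEq)]
enum ToyValue {
    Array(Vec<String>),
    Scalar(String),
}

impl ToyValue {
    /// Mimics scalar-to-array expansion: a scalar is repeated once per row.
    fn into_array(self, num_rows: usize) -> Vec<String> {
        match self {
            ToyValue::Array(a) => a,
            ToyValue::Scalar(s) => vec![s; num_rows],
        }
    }
}

/// Array-only kernel: both sides must be fully materialized.
fn contains_arrays(haystack: &[String], needle: &[String]) -> Vec<bool> {
    haystack
        .iter()
        .zip(needle)
        .map(|(h, n)| h.contains(n.as_str()))
        .collect()
}

/// Scalar-aware kernel: the needle is borrowed once, with no per-row copies.
fn contains_scalar(haystack: &[String], needle: &str) -> Vec<bool> {
    haystack.iter().map(|h| h.contains(needle)).collect()
}

fn main() {
    let haystack = vec![
        "datafusion".to_string(),
        "arrow".to_string(),
        "parquet".to_string(),
    ];
    let needle = ToyValue::Scalar("ar".to_string());

    // Expansion path: allocate one needle copy per row, then compare row by row.
    let expanded = needle.clone().into_array(haystack.len());
    let via_expansion = contains_arrays(&haystack, &expanded);

    // Fast path: use the scalar directly when there is one.
    let via_fast_path = match &needle {
        ToyValue::Scalar(s) => contains_scalar(&haystack, s),
        ToyValue::Array(a) => contains_arrays(&haystack, a),
    };

    // Both paths agree; only the amount of allocation differs.
    assert_eq!(via_expansion, via_fast_path);
    println!("{:?}", via_fast_path);
}
```

Both paths return the same result; the expansion path just pays for `num_rows` extra string allocations that the scalar-aware path avoids.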
   
   The issue is that `make_scalar_function` expects the wrapped function to take `ArrayRef`s:
   
   ```
   F: Fn(&[ArrayRef]) -> Result<ArrayRef>,
   ```
   
   It would perhaps be better if the wrapped function accepted `ColumnarValue`, 
which supports both arrays and scalars, although this is probably a large refactor.
   
   ```
   F: Fn(&[ColumnarValue]) -> Result<ArrayRef>,   
   ```
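
As a rough illustration of the two shapes, here is a std-only sketch. Everything in it is an assumption made for the example: `ToyColumnarValue`, `wrap_array_only`, `wrap_columnar`, and the two `contains_*` functions are hypothetical and only loosely modeled on the signatures above; they are not the actual DataFusion code. It shows that with the array-only bound the wrapper has to expand scalars, while with the columnar bound the inner function sees the scalar and can choose a fast path.

```rust
// Toy stand-ins for illustration only; NOT the real DataFusion types.
#[derive(Debug, Clone)]
enum ToyColumnarValue {
    Array(Vec<String>),
    Scalar(String),
}

type ToyResult<T> = Result<T, String>;

/// Array-only shape: the wrapper must expand every scalar before calling `inner`.
fn wrap_array_only<F>(inner: F) -> impl Fn(&[ToyColumnarValue]) -> ToyResult<Vec<bool>>
where
    F: Fn(&[Vec<String>]) -> ToyResult<Vec<bool>>,
{
    move |args: &[ToyColumnarValue]| {
        // Infer the row count from the first array argument (1 if all are scalars).
        let len = args
            .iter()
            .find_map(|a| match a {
                ToyColumnarValue::Array(v) => Some(v.len()),
                ToyColumnarValue::Scalar(_) => None,
            })
            .unwrap_or(1);
        let arrays: Vec<Vec<String>> = args
            .iter()
            .map(|a| match a {
                ToyColumnarValue::Array(v) => v.clone(),
                ToyColumnarValue::Scalar(s) => vec![s.clone(); len],
            })
            .collect();
        inner(&arrays)
    }
}

/// Columnar shape: `inner` receives the arguments untouched.
fn wrap_columnar<F>(inner: F) -> impl Fn(&[ToyColumnarValue]) -> ToyResult<Vec<bool>>
where
    F: Fn(&[ToyColumnarValue]) -> ToyResult<Vec<bool>>,
{
    move |args: &[ToyColumnarValue]| inner(args)
}

/// Inner function for the array-only shape.
fn contains_arrays(args: &[Vec<String>]) -> ToyResult<Vec<bool>> {
    match args {
        [h, n] if h.len() == n.len() => {
            Ok(h.iter().zip(n).map(|(h, n)| h.contains(n.as_str())).collect())
        }
        _ => Err("expected two arrays of equal length".to_string()),
    }
}

/// Inner function for the columnar shape, with a scalar fast path.
fn contains_columnar(args: &[ToyColumnarValue]) -> ToyResult<Vec<bool>> {
    use ToyColumnarValue::{Array, Scalar};
    match args {
        // Fast path: the scalar needle is borrowed and never expanded.
        [Array(h), Scalar(n)] => Ok(h.iter().map(|s| s.contains(n.as_str())).collect()),
        // General path: fall back to the array-only logic.
        [Array(h), Array(n)] => contains_arrays(&[h.clone(), n.clone()]),
        _ => Err("unsupported argument combination".to_string()),
    }
}

fn main() {
    let args = vec![
        ToyColumnarValue::Array(vec!["datafusion".into(), "arrow".into()]),
        ToyColumnarValue::Scalar("row".into()),
    ];
    let old = wrap_array_only(contains_arrays)(&args[..]);
    let new = wrap_columnar(contains_columnar)(&args[..]);
    assert_eq!(old, new);
    println!("{:?}", new);
}
```

The trade-off in this sketch is that every inner function now has to handle the argument combinations itself, which is part of why the change would likely be a large refactor.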


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


