alamb commented on issue #11212:
URL: https://github.com/apache/datafusion/issues/11212#issuecomment-2204379090

   > possibility that left might already be false and op is And, or that left 
might be true and op is Or.
   
   In general I think there is a tradeoff between doing short circuiting (what 
I think this ticket is describing) and having to check if each row should be 
evaluated
   
   So for a predicate like `(a = 5) AND (b = 10)` it is very likely faster to 
evaluate `(a = 5)` and `(b = 10)` with very tight loops / SIMD instructions and 
then `&&` the resulting boolean arrays together than it would be to evaluate 
`(a = 5)` and then have a loop that checks that result for each row when 
evaluating `b = 10`.
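
To make the two strategies concrete, here is a minimal sketch in plain Rust. It uses slices and `Vec<bool>` rather than Arrow arrays, and the function names `eval_vectorized` and `eval_row_checked` are made up for illustration; this is not DataFusion's actual code. The first function has two branch-free loops that a compiler can typically vectorize, while the second has a data-dependent branch on every row.

```rust
/// Evaluate `a = 5` and `b = 10` over whole columns, then AND the two masks.
/// Each loop is a tight, branch-free pass over the data.
fn eval_vectorized(a: &[i32], b: &[i32]) -> Vec<bool> {
    assert_eq!(a.len(), b.len());
    let left: Vec<bool> = a.iter().map(|&x| x == 5).collect();
    let right: Vec<bool> = b.iter().map(|&x| x == 10).collect();
    // Non-short-circuiting `&` keeps the combining loop branch-free too.
    left.iter().zip(&right).map(|(&l, &r)| l & r).collect()
}

/// Evaluate `a = 5` first, then consult that result for every row before
/// deciding whether to evaluate `b = 10` for that row.
fn eval_row_checked(a: &[i32], b: &[i32]) -> Vec<bool> {
    assert_eq!(a.len(), b.len());
    let left: Vec<bool> = a.iter().map(|&x| x == 5).collect();
    let mut out = vec![false; a.len()];
    for i in 0..a.len() {
        // Per-row branch: skips the right-hand comparison when left is false.
        if left[i] {
            out[i] = b[i] == 10;
        }
    }
    out
}
```

Both functions return the same mask; the only difference is how much per-row control flow is needed to get there.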
   
   However, for an example like `(a = 5) AND (very_expensive_udf(b) = 10)` it 
may well be faster to do short circuiting execution
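
As a rough illustration of why this case differs, the sketch below counts how often the expensive function runs under each strategy. It is plain Rust, not DataFusion code: `very_expensive_udf` here is a placeholder that just doubles its input, and `eval_eager` / `eval_short_circuit` are hypothetical names.

```rust
/// Placeholder for an expensive UDF (assumption: it simply doubles its input).
/// `calls` records how many times it was invoked.
fn very_expensive_udf(x: i64, calls: &mut usize) -> i64 {
    *calls += 1;
    x * 2
}

/// Evaluate both sides for every row, then AND the results.
/// Returns the result mask and the number of UDF calls.
fn eval_eager(a: &[i64], b: &[i64]) -> (Vec<bool>, usize) {
    assert_eq!(a.len(), b.len());
    let mut calls = 0;
    let left: Vec<bool> = a.iter().map(|&x| x == 5).collect();
    let right: Vec<bool> = b
        .iter()
        .map(|&x| very_expensive_udf(x, &mut calls) == 10)
        .collect();
    let out = left.iter().zip(&right).map(|(&l, &r)| l & r).collect();
    (out, calls)
}

/// Evaluate `a = 5` first and call the UDF only for rows where it is true.
fn eval_short_circuit(a: &[i64], b: &[i64]) -> (Vec<bool>, usize) {
    assert_eq!(a.len(), b.len());
    let mut calls = 0;
    let left: Vec<bool> = a.iter().map(|&x| x == 5).collect();
    let out: Vec<bool> = left
        .iter()
        .zip(b)
        // `&&` short-circuits, so the UDF is skipped when `l` is false.
        .map(|(&l, &bv)| l && very_expensive_udf(bv, &mut calls) == 10)
        .collect();
    (out, calls)
}
```

When `a = 5` is selective, the short-circuit version calls the UDF on only a fraction of the rows, which can outweigh the cost of the per-row check.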
   
   Note DataFusion already does short circuiting for evaluating `CASE`
   
   
https://github.com/apache/datafusion/blob/4f4cd81de72a858896ac37a51b0e354cb379307c/datafusion/physical-expr/src/expressions/case.rs#L125-L187
   
   So TL;DR: I think this would be an interesting optimization to explore and, 
as @Dandandan notes, finding some benchmarks where it matters is likely a good 
first step


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

