alamb commented on issue #11212: URL: https://github.com/apache/datafusion/issues/11212#issuecomment-2204379090
> possibility that left might already be false and op is And, or that left might be true and op is Or.

In general I think there is a tradeoff between doing short circuiting (what I think this ticket is describing) and having to check whether each row should be evaluated.

So for a predicate like `(a = 5) AND (b = 10)` it is very likely faster to evaluate `(a = 5)` and `(b = 10)` with very tight loops / SIMD instructions and then `&&` the resulting booleans together than it would be to evaluate `(a = 5)` and then have a loop that checks the result for each row when evaluating `b = 10`.

However, for an example like `(a = 5) AND (very_expensive_udf(b) = 10)` it may well be faster to do short-circuiting execution.

Note DataFusion already does short circuiting when evaluating `CASE`: https://github.com/apache/datafusion/blob/4f4cd81de72a858896ac37a51b0e354cb379307c/datafusion/physical-expr/src/expressions/case.rs#L125-L187

So TLDR: I think this would be an interesting optimization to explore and, as @Dandandan notes, finding some benchmarks where it matters is likely a good first step.
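
To make the tradeoff concrete, here is a minimal sketch in plain Rust. It is not DataFusion code: it uses slices instead of Arrow arrays, ignores NULL handling, and the names `and_eager` and `and_short_circuit` are hypothetical. Both functions evaluate `(a = 5) AND right(b)` and also report how many times the right-hand side was evaluated.

```rust
/// Eager strategy: evaluate both sides for every row in two tight loops,
/// then AND the two boolean masks. The right side runs once per row.
fn and_eager(a: &[i64], b: &[i64], right: impl Fn(i64) -> bool) -> (Vec<bool>, usize) {
    let left: Vec<bool> = a.iter().map(|&x| x == 5).collect();
    let right_mask: Vec<bool> = b.iter().map(|&y| right(y)).collect();
    let out: Vec<bool> = left
        .iter()
        .zip(&right_mask)
        .map(|(&l, &r)| l && r)
        .collect();
    (out, right_mask.len())
}

/// Short-circuit strategy: check the left result for each row and only
/// evaluate the right side where the left side is true.
fn and_short_circuit(a: &[i64], b: &[i64], right: impl Fn(i64) -> bool) -> (Vec<bool>, usize) {
    let mut right_calls = 0usize;
    let out: Vec<bool> = a
        .iter()
        .zip(b)
        .map(|(&x, &y)| {
            if x != 5 {
                // Left side is false, so the AND is false without touching `right`.
                return false;
            }
            right_calls += 1;
            right(y)
        })
        .collect();
    (out, right_calls)
}
```

Both functions return the same mask. The eager version always pays for one right-side evaluation per row but keeps each loop branch-free, while the short-circuit version adds a per-row branch and only pays for the right side on rows that pass `a = 5`. Which one wins depends on how expensive `right` is and how selective the left side is, which is why benchmarks are the sensible first step.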