pepijnve commented on PR #17813: URL: https://github.com/apache/datafusion/pull/17813#issuecomment-3350730015
@alamb thinking about this a bit more. I'm going to struggle expressing myself sufficiently clearly here, but I'll try to explain the idea behind what I'm doing here. Maybe that can help us figure out a better way to express the idea. What I'm trying to do is improve the accuracy of the predicate `is_nullable(expr) -> bool` specifically for the case where expr is a `CASE` expression. In the current code this predicate is implemented by checking (in pseudo code) `case_expr.when_then.any?((w_t) -> is_nullable(t)) || is_nullable(else)`. This results in quite a few `CASE` expressions being reported as nullable even though they're not. In particular there's one interesting case (pun not intended) which results from the `coalesce` simplification and that is `CASE WHEN x IS NOT NULL THEN x ELSE y`. If the expression `x` is nullable, `is_nullable` will currently report the entire case expression as nullable. The implementation does not take into account that there is a guard clause preventing the null value of `x` ever being returned. What I attempted to do in this PR is the to look at the more general form `CASE WHEN f(x) IS NOT NULL THEN x ELSE y` where `f(x)` is some arbitrary predicate that may or may not depend on the value of `x`. What the code is trying to do is to inject a binding of `x = NULL` and then determining if `f(null)` can be const evaluated to `false`. If that is so, then predicate `f` prevents this particular branch of the case expression from ever returning a null value and we can ignore the nullability of `x`. I tried to implement this in a cheap, but imprecise way. My rationale was that even though it's not perfect, it's an improvement in accuracy over the current code. A possible alternative would be to rewrite the expression corresponding to `f` using the binding `x = null` and then attempting to const evaluate using the existing const evaluation code. That introduces a dependency from the logical expression module to the optimiser though and would probably have a longer run time than the current crude approximation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
