pepijnve commented on issue #17801:
URL: https://github.com/apache/datafusion/issues/17801#issuecomment-3497492805

   To summarise for any newcomer, the root cause of this issue is that:
   1. `IFNULL(x, y)` gets simplified to `CASE WHEN x IS NOT NULL THEN x ELSE y END`
   2. `IFNULL(x, y)` reports itself as `nullable? false`
   3. `CASE WHEN x IS NOT NULL THEN x ELSE y END` reports itself as `nullable? true`
   
   After step 1, the logical schema of the query still has `nullable? false`
   for the column, even though the rewritten `CASE` expression now reports
   `nullable? true`.
   When translating from logical to physical, the physical expression reports
   `nullable? true` as well, and the planner errors out due to the mismatch
   between the nullability recorded in the logical schema and the physical
   nullability.
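
As a rough illustration of the mismatch, here is a toy model in Rust. The `Expr` enum, `is_nullable`, `simplify` and `nullability_before_and_after` are hypothetical names made up for this sketch and are not DataFusion's API. The nullability rules for `IFNULL` and `CASE` are assumptions chosen to reproduce the behaviour described above, with `x` nullable and `y` not nullable.

```rust
// Toy model for illustration; these types are hypothetical, not DataFusion's API.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// A column with a declared nullability.
    Column { nullable: bool },
    /// IFNULL(x, y)
    IfNull(Box<Expr>, Box<Expr>),
    /// x IS NOT NULL
    IsNotNull(Box<Expr>),
    /// CASE WHEN when THEN then ELSE otherwise END (single branch only)
    Case { when: Box<Expr>, then: Box<Expr>, otherwise: Box<Expr> },
}

fn is_nullable(expr: &Expr) -> bool {
    match expr {
        Expr::Column { nullable } => *nullable,
        // Assumed rule: the result is NULL only if both arguments can be NULL.
        Expr::IfNull(x, y) => is_nullable(x) && is_nullable(y),
        Expr::IsNotNull(_) => false,
        // Assumed rule: nullable if any branch value is nullable.
        // The WHEN condition is ignored, which is what causes the mismatch.
        Expr::Case { then, otherwise, .. } => is_nullable(then) || is_nullable(otherwise),
    }
}

/// Rewrites IFNULL(x, y) to CASE WHEN x IS NOT NULL THEN x ELSE y END.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::IfNull(x, y) => Expr::Case {
            when: Box::new(Expr::IsNotNull(x.clone())),
            then: x,
            otherwise: y,
        },
        other => other,
    }
}

/// Returns (nullability recorded before the rewrite, nullability after it).
fn nullability_before_and_after() -> (bool, bool) {
    let original = Expr::IfNull(
        Box::new(Expr::Column { nullable: true }),  // x
        Box::new(Expr::Column { nullable: false }), // y
    );
    let schema_nullable = is_nullable(&original);
    let rewritten_nullable = is_nullable(&simplify(original));
    (schema_nullable, rewritten_nullable)
}
```

In this model `nullability_before_and_after()` yields `(false, true)`: the schema computed before the rewrite says not nullable, while the rewritten expression says nullable.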
   
   There are a couple of ways to fix this:
   1. Assume that scalar UDF simplification is not allowed to change the schema
   in any way. The implication is that the `CASE` expression in this example
   must also return `nullable? false`. Adapt `is_nullable` for `CASE` to make
   this work.
   2. Allow scalar UDF simplification to change the logical schema. Adjust the
   code where the simplification happens so that the schema does not get out
   of sync with the actual expressions.
   
   Option 1 feels like the more correct approach, but is rather tricky to
   implement, particularly for the logical expression. It requires constant
   evaluation of the `WHEN` expressions of the `CASE` expression from code
   that's located in the `expr` crate. Constant evaluation uses physical
   expression evaluation, but that's not accessible from `expr`.
   The alternative is to emulate evaluation as best as possible. An attempt to
   implement this can be found in https://github.com/apache/datafusion/pull/17813.
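
To give a feel for what "emulating evaluation" could mean, here is a minimal sketch in Rust on a toy expression type. It is not the code from the linked PR and not DataFusion's API: `Expr`, `guard_proves_not_null`, `is_nullable` and `ifnull_as_case` are hypothetical names, and the sketch recognises only the single pattern `WHEN v IS NOT NULL THEN v`.

```rust
// Toy model for illustration; these types are hypothetical, not DataFusion's API.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// A named column with a declared nullability.
    Column { name: &'static str, nullable: bool },
    /// x IS NOT NULL
    IsNotNull(Box<Expr>),
    /// CASE WHEN when THEN then ELSE otherwise END (single branch only)
    Case { when: Box<Expr>, then: Box<Expr>, otherwise: Box<Expr> },
}

/// True when the guard is exactly `value IS NOT NULL`, so `value` cannot be
/// NULL in the branch the guard protects.
fn guard_proves_not_null(guard: &Expr, value: &Expr) -> bool {
    matches!(guard, Expr::IsNotNull(inner) if **inner == *value)
}

fn is_nullable(expr: &Expr) -> bool {
    match expr {
        Expr::Column { nullable, .. } => *nullable,
        Expr::IsNotNull(_) => false,
        Expr::Case { when, then, otherwise } => {
            // The THEN value only counts as nullable if the guard does not
            // already rule out NULL for it.
            let then_nullable = is_nullable(then) && !guard_proves_not_null(when, then);
            then_nullable || is_nullable(otherwise)
        }
    }
}

/// Builds CASE WHEN x IS NOT NULL THEN x ELSE y END.
fn ifnull_as_case(x: Expr, y: Expr) -> Expr {
    Expr::Case {
        when: Box::new(Expr::IsNotNull(Box::new(x.clone()))),
        then: Box::new(x),
        otherwise: Box::new(y),
    }
}
```

With a nullable `x` and a non-nullable `y`, `is_nullable(&ifnull_as_case(x, y))` is `false` in this model, which matches what `IFNULL(x, y)` reported before the rewrite. Guards of any other shape fall back to the conservative answer.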
   
   Option 2 is the easy quick fix, but doesn't feel correct. You wouldn't want 
an expression to change from not-nullable to nullable as part of optimisation. 
It might also invalidate assumptions that earlier optimisation passes made 
regarding nullability.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

