neilconway commented on PR #22037:
URL: https://github.com/apache/datafusion/pull/22037#issuecomment-4412797480

   Please let me know if I'm understanding this correctly:
   
   * The PR aims to address a situation where there is a schema mismatch 
between the anchor and recursive cases in a CTE
   * In particular, we might infer different nullability properties between the 
anchor vs the recursive query -- e.g., if we have `0` in the anchor and 
`min(...)` in the recursive case, `0` is non-nullable and `min(...)` is 
nullable (as an aside, the latter is conservative: `min(x)` without `FILTER` in 
a grouped query is non-nullable if `x` is non-nullable, but I suppose this is a 
separate planner shortcoming...)
   * The proposed behavior is to apply the anchor schema to the recursive CTE 
branch schemas. So we would effectively be requiring that a nullable `min` 
expression never returns a NULL, in the example above
   * If the recursive query *does* return a NULL, we produce a runtime error
   
   If that is accurate, then the proposed behavior would result in this query 
producing an error:
   
   ```sql
   SET datafusion.execution.enable_recursive_ctes = true;
   
     WITH RECURSIVE t AS (
       SELECT 0 AS n
       UNION ALL
       SELECT CAST(NULL AS INT) AS n FROM t WHERE n IS NOT NULL
     )
     SELECT * FROM t;
     ```
   
   (`Column 'n' is declared as non-nullable but contains null values`) -- but 
this query seems pretty reasonable to me and is allowed by other SQL 
implementations (e.g., Postgres, DuckDB, MariaDB, SQLite).
   
   Instead, shouldn't we be computing the CTE's logical schema by widening the 
anchor and the recursive schemas? This is conceptually similar to what we do 
for `UNION`. That is, if the anchor expression is non-nullable and the 
recursive expression is nullable, the output schema should be nullable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to