kosiew opened a new issue, #22034:
URL: https://github.com/apache/datafusion/issues/22034

   ## Summary
   
   #21912 introduced a SQL behavior change: an existing recursive CTE SLT had 
to be amended from `0 AS level` to `SUM(0) AS level` to pass.
   
   A follow-up fix is required because recursive CTE physical alignment must 
preserve the CTE's declared/static schema, including nullability, rather than 
widening the recursive CTE output schema and forcing SQL rewrites.
   
   ## Background
   
   Issue #21910 is about preventing recursive CTE batches from leaking 
recursive-term field names when the recursive term is planned independently 
from the anchor/static term.
   
   #21912 adds a reusable execution-layer schema helper and wires recursive CTE 
execution to avoid ad hoc schema rebinding. During review/testing, another 
problem appeared:
   
   - The anchor/static term can declare a non-null field, e.g. `0 AS level`.
   - The recursive term can compute the same column through an expression that 
is nullable, e.g. `MIN(rs.level) + 1`.
   - If `RecursiveQueryExec` declares a widened schema using `static_nullable 
|| recursive_nullable`, downstream planning sees a physical schema that no 
longer matches the logical/static CTE schema.
   - Changing SQL from `0 AS level` to `SUM(0) AS level` makes the logical 
anchor nullable too, but only hides the mismatch.
   
   ## Evidence
   
   The SQL behavior change is visible in the amended SLT test:
   
   ```diff
   diff --git a/datafusion/sqllogictest/test_files/cte.slt 
b/datafusion/sqllogictest/test_files/cte.slt
   index e9c1c0245..d13e0d4f0 100644
   --- a/datafusion/sqllogictest/test_files/cte.slt
   +++ b/datafusion/sqllogictest/test_files/cte.slt
   @@ -699,7 +699,7 @@ WITH RECURSIVE region_sales AS (
        SELECT
            s.salesperson_id AS salesperson_id,
            SUM(s.sale_amount) AS amount,
   -        0 as level
   +        SUM(0) as level
        FROM
            sales s
        GROUP BY
   ```
   
   Reverting this SLT query change from:
   
   ```sql
   SUM(0) AS level
   ```
   
   to the original:
   
   ```sql
   0 AS level
   ```
   
   fails with:
   
   ```text
   Physical input schema should be the same as the one converted from logical 
input schema.
   Differences:
     - field nullability at index 2 [level]: (physical) true vs (logical) false.
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to