kosiew opened a new pull request, #22037:
URL: https://github.com/apache/datafusion/pull/22037
## Which issue does this PR close?
* Closes #22034.
## Rationale for this change
`RecursiveQueryExec` widened recursive CTE output nullability by reconciling
the static and recursive term schemas. This caused the physical schema to
diverge from the logical/static CTE schema and forced valid SQL such as `0 AS
level` to be rewritten as nullable expressions like `SUM(0) AS level`.
This change preserves the declared recursive CTE schema by treating the
static/anchor term schema as authoritative and aligning the recursive term to
that schema during plan construction.
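The difference between the two policies can be sketched with simplified stand-in types. This is an illustrative model only, not DataFusion code: `Column`, `column`, `widened_schema`, and `authoritative_schema` are hypothetical names, and the "nullable if either term is nullable" rule is an assumption about what the previous reconciliation amounted to.

```rust
/// Hypothetical stand-in for a schema field: a name plus a nullability flag.
#[derive(Debug, Clone, PartialEq)]
pub struct Column {
    pub name: String,
    pub nullable: bool,
}

/// Convenience constructor used in the examples.
pub fn column(name: &str, nullable: bool) -> Column {
    Column { name: name.to_string(), nullable }
}

/// Assumed model of the previous behavior: an output column is nullable
/// if it is nullable in either the static or the recursive term.
pub fn widened_schema(static_term: &[Column], recursive_term: &[Column]) -> Vec<Column> {
    static_term
        .iter()
        .zip(recursive_term)
        .map(|(s, r)| Column {
            name: s.name.clone(),
            nullable: s.nullable || r.nullable,
        })
        .collect()
}

/// Model of the behavior described in this PR: the static (anchor) term
/// schema is taken as-is; the recursive term does not influence it.
pub fn authoritative_schema(static_term: &[Column], _recursive_term: &[Column]) -> Vec<Column> {
    static_term.to_vec()
}
```

With a non-null static `level` column and a nullable recursive `level` column, `widened_schema` reports `level` as nullable, while `authoritative_schema` keeps it non-null, which is the schema the logical plan already declares.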
## What changes are included in this PR?
* Added `align_plan_to_schema`, a higher-level plan-time schema alignment
helper that guarantees the resulting plan advertises the expected schema
exactly.
* Kept `project_plan_to_schema` as the narrower projection-based helper and
refactored shared validation into `validate_schema_alignment`.
* Added `SchemaAlignExec`, an execution-plan adapter that:
  * advertises the expected schema from plan properties
  * preserves positional column values
  * rebinds emitted `RecordBatch` schemas inside the adapter
  * validates column count, data types, field metadata, and schema metadata
* Updated `RecursiveQueryExec::try_new` to:
  * use the static term schema as the recursive CTE output schema
  * align the recursive term with `align_plan_to_schema`
  * remove recursive output schema widening logic
* Restored the recursive CTE SLT coverage from `SUM(0) AS level` back to `0
AS level`.
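The alignment decision described in this PR can be modeled roughly as follows. This is a minimal sketch over simplified stand-in types, not the PR's implementation: `Field`, `Schema`, `Alignment`, `choose_alignment`, `field`, and `schema` are hypothetical names, data types are plain strings, and routing every non-rename difference to the adapter is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a schema field.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: String,
    pub nullable: bool,
    pub metadata: BTreeMap<String, String>,
}

/// Hypothetical stand-in for a schema: ordered fields plus schema metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<Field>,
    pub metadata: BTreeMap<String, String>,
}

/// Which wrapper (if any) the plan would need to advertise `expected`.
#[derive(Debug, PartialEq)]
pub enum Alignment {
    Unchanged,   // schemas already match exactly
    Projection,  // only column names differ
    SchemaAlign, // assumption: any other difference (e.g. nullability)
}

pub fn field(name: &str, data_type: &str, nullable: bool) -> Field {
    Field {
        name: name.to_string(),
        data_type: data_type.to_string(),
        nullable,
        metadata: BTreeMap::new(),
    }
}

pub fn schema(fields: Vec<Field>) -> Schema {
    Schema { fields, metadata: BTreeMap::new() }
}

pub fn choose_alignment(input: &Schema, expected: &Schema) -> Result<Alignment, String> {
    if input == expected {
        return Ok(Alignment::Unchanged);
    }
    // Validation: column count, schema metadata, then per-field type and metadata.
    if input.fields.len() != expected.fields.len() {
        return Err(format!(
            "column count mismatch: {} vs {}",
            input.fields.len(),
            expected.fields.len()
        ));
    }
    if input.metadata != expected.metadata {
        return Err("schema metadata mismatch".to_string());
    }
    let mut nullability_differs = false;
    for (i, (inp, exp)) in input.fields.iter().zip(&expected.fields).enumerate() {
        if inp.data_type != exp.data_type {
            return Err(format!("type mismatch at column {i}"));
        }
        if inp.metadata != exp.metadata {
            return Err(format!("field metadata mismatch at column {i}"));
        }
        nullability_differs |= inp.nullable != exp.nullable;
    }
    // Names are positional here: a rename alone needs only a projection.
    if nullability_differs {
        Ok(Alignment::SchemaAlign)
    } else {
        Ok(Alignment::Projection)
    }
}
```

For example, aligning a nullable `level` input to a non-null `level` expectation yields `Alignment::SchemaAlign`, while an input that differs only by column name yields `Alignment::Projection`.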
## Are these changes tested?
Yes.
Added and updated tests covering:
* `align_plan_to_schema`:
  * exact schema returns unchanged plan
  * rename-only alignment uses `ProjectionExec`
  * nullable input to non-null expected schema uses `SchemaAlignExec`
  * column count mismatch errors
  * type mismatch errors
  * field metadata mismatch errors
  * schema metadata mismatch errors
* `project_plan_to_schema`:
  * schema match passthrough
  * nullability widening
  * nullability narrowing rejection
  * metadata mismatch validation
* `RecursiveQueryExec`:
  * recursive term projection alignment
  * preservation of the static nullability contract
  * recursive term schema matches the static schema after construction
* Restored SQL logic test coverage in `cte.slt` using `0 AS level`.
Validated with:
```bash
cargo test -p datafusion-physical-plan recursive_query_exec
cargo test -p datafusion-physical-plan project_plan_to_schema
cargo test -p datafusion-sqllogictest --test sqllogictests -- cte
```
## Are there any user-facing changes?
Yes.
Recursive CTEs now preserve the declared/static schema instead of widening
nullability based on recursive expressions. Existing valid SQL such as:
```sql
0 AS level
```
continues to work without requiring nullable rewrites like:
```sql
SUM(0) AS level
```
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content
has been manually reviewed and tested.