cloud-fan opened a new pull request, #55949:
URL: https://github.com/apache/spark/pull/55949
### What changes were proposed in this pull request?
Root-cause fix for SPARK-46625. Supersedes #55706.
Push `PlanWithUnresolvedIdentifier` into the identifier slot of write
commands at parse time (`InsertIntoStatement.table`,
`CreateTableAsSelect.name`, `ReplaceTableAsSelect.name`) instead of wrapping
the entire command. `CTESubstitution` then sees the real `CTEInChildren` and
places `WithCTE` on the command's children by construction, so the invalid
`WithCTE(InsertIntoStatement, ...)` / `WithCTE(CreateTableAsSelect, ...)` shape
never appears.
Specifically:
- `AstBuilder.withInsertInto`, `visitCreateTable`, `visitReplaceTable` build
the command directly, with a `PlanWithUnresolvedIdentifier` (or a plain
`UnresolvedRelation` / `UnresolvedIdentifier` when the identifier is a constant
string) in the name slot. New helper `buildWriteTableSlot` covers the INSERT
cases.
- `ResolveIdentifierClause` gets a parallel handler that materializes
placeholders living inside the `innerPlans` of a command (the non-child
`LogicalPlan` slots) — needed for `InsertIntoStatement.table`, which is not a
child (`child = query`).
- `LogicalPlan` gains a small `innerPlans` / `withNewInnerPlans` hook
(default `Nil` / no-op). `InsertIntoStatement` overrides it.
`LogicalPlan.getDefaultTreePatternBits` unions the inner plans' pattern bits,
and `BindParameters.bind` recurses into them. This is required so that
`containsPattern(PARAMETER)` and `BindParameters` reach a placeholder inside
`InsertIntoStatement.table` — without it, `INSERT ... IDENTIFIER(:p)` under
`spark.sql.legacy.parameterSubstitution.constantsOnly=true` fails to bind the
parameter.
- `CreateTableAsSelect.name` and `ReplaceTableAsSelect.name` are already
children via `V2CreateTableAsSelectPlan.childrenToAnalyze`, so no `innerPlans`
hook is needed for them.
- A narrow post-hoc `WithCTE(c: CTEInChildren, _)` collapse is retained in
`ResolveIdentifierClause` for the two `withIdentClause` call sites where the
identifier slot's type prevents pushing the placeholder in directly:
`OverwriteByExpression.table: NamedRelation` (INSERT REPLACE WHERE) and
`CacheTableAsSelect.tempViewName: String`.
### Why are the changes needed?
```sql
WITH t AS (...)
INSERT [INTO|OVERWRITE] TABLE IDENTIFIER('t')
SELECT * FROM t
```
previously produced an analysed plan with `WithCTE` wrapping the
`InsertIntoStatement` — a structurally invalid shape. Plug-in datasources that
re-analyse the `InsertIntoStatement`'s query subtree throw
`NoSuchElementException: key not found`. Zero-day issue since SPARK-46625.
\#55706 fixed the symptom by collapsing `WithCTE(c: CTEInChildren, defs)`
after the fact in `ResolveIdentifierClause`. That works but encodes an undo for
a placement that should never have happened. The review on that PR
(https://github.com/apache/spark/pull/55706#discussion_r3196249628) asked for
the root-cause fix; this PR implements it. Credit to @stevomitric for surfacing
in the original PR thread that the parser-level placement also has to work for
legacy parameter substitution — that's why this PR adds the `innerPlans`
traversal.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New tests in `ParametersSuite`:
- `WITH ... INSERT OVERWRITE TABLE IDENTIFIER(:p) SELECT ... FROM cte`
- `WITH ... INSERT INTO IDENTIFIER(:p) SELECT ... FROM cte`
- `CREATE TABLE IDENTIFIER(:p) AS WITH ... SELECT ... FROM cte`
- `INSERT IDENTIFIER(:p) under legacy parameter substitution` (covers
`spark.sql.legacy.parameterSubstitution.constantsOnly=true`, which exercises
the `innerPlans` hook on `LogicalPlan` and the corresponding recursion in
`BindParameters.bind`)
Each CTE test asserts that no `WithCTE(CTEInChildren, _)` shape leaks
through analysis.
Existing suites run clean: `ParametersSuite` (150),
`IdentifierClauseParserSuite` (15), `AnalysisSuite` / `PlanParserSuite` /
`AnalysisErrorSuite` and siblings (675), `DataSourceV2SQLSuiteV1Filter` (319),
`InsertSuite` (154), `CTEInlineSuite` / `CTEHintSuite` (156),
`CachedTableSuite` (104), `CreateTableAsSelectSuite` / `DDLParserSuite` (53).
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude (Claude Code, Opus 4.7)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]