cloud-fan commented on code in PR #56024:
URL: https://github.com/apache/spark/pull/56024#discussion_r3279105128
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -1535,6 +1535,32 @@ class Analyzer(
}
def doApply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
+ // `InsertIntoStatement.table` and `V2WriteCommand.table` are non-child
`LogicalPlan`
+ // slots (`child = query`), so the default `resolveOperatorsUp` +
`mapExpressions`
+ // traversal never resolves expressions placed inside them. For a
+ // `PlanWithUnresolvedIdentifier`, `identifierExpr` (e.g. an
`UnresolvedAttribute`
+ // referring to a SQL variable in `INSERT INTO IDENTIFIER(target_table)
...`) must
+ // be resolved here before `ResolveIdentifierClause` can materialize the
relation.
+ // Mirror the recursion that `BindParameters` and
`ResolveIdentifierClause` already
+ // do for the same shape (SPARK-46625). The `!identifierExpr.resolved`
guard makes
+ // the case idempotent under bottom-up traversal.
+ case i: InsertIntoStatement
+ if i.table.isInstanceOf[PlanWithUnresolvedIdentifier] &&
+
!i.table.asInstanceOf[PlanWithUnresolvedIdentifier].identifierExpr.resolved =>
+ val p = i.table.asInstanceOf[PlanWithUnresolvedIdentifier]
+ val resolvedExpr = resolveExpressionByPlanChildren(
+ p.identifierExpr, i, includeLastResort = true)
Review Comment:
Should this resolve against `p` rather than `i`? Passing `i` widens the
resolution scope to `i.children = Seq(query)`, so a nested
`UnresolvedAttribute` in `identifierExpr` can match a query output column. With
a colliding name like
```sql
DECLARE VARIABLE a STRING DEFAULT 't_shadow';
CREATE TABLE t_shadow (a INT) USING PARQUET;
INSERT INTO IDENTIFIER(a) SELECT 42 AS a;
```
the column wins and `IdentifierResolution.evalIdentifierExpr` then throws
`NOT_A_CONSTANT_STRING.NOT_CONSTANT` (since an `AttributeReference` is not
foldable). Resolving against `p` (whose `children` are `Nil` for the
INSERT/OverwriteByExpression path built by `buildWriteTableSlot`) mirrors how
`PlanWithUnresolvedIdentifier` is resolved in regular child slots via the
generic fallback at line 1865 — column resolution becomes a no-op and only the
last-resort variable resolution fires.
Same point applies to the `V2WriteCommand` case below (line 1561).
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -1535,6 +1535,32 @@ class Analyzer(
}
def doApply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
+ // `InsertIntoStatement.table` and `V2WriteCommand.table` are non-child
`LogicalPlan`
+ // slots (`child = query`), so the default `resolveOperatorsUp` +
`mapExpressions`
+ // traversal never resolves expressions placed inside them. For a
+ // `PlanWithUnresolvedIdentifier`, `identifierExpr` (e.g. an
`UnresolvedAttribute`
+ // referring to a SQL variable in `INSERT INTO IDENTIFIER(target_table)
...`) must
+ // be resolved here before `ResolveIdentifierClause` can materialize the
relation.
+ // Mirror the recursion that `BindParameters` and
`ResolveIdentifierClause` already
+ // do for the same shape (SPARK-46625). The `!identifierExpr.resolved`
guard makes
+ // the case idempotent under bottom-up traversal.
+ case i: InsertIntoStatement
+ if i.table.isInstanceOf[PlanWithUnresolvedIdentifier] &&
+
!i.table.asInstanceOf[PlanWithUnresolvedIdentifier].identifierExpr.resolved =>
+ val p = i.table.asInstanceOf[PlanWithUnresolvedIdentifier]
+ val resolvedExpr = resolveExpressionByPlanChildren(
+ p.identifierExpr, i, includeLastResort = true)
+ resolveColumnDefaultInCommandInputQuery(
Review Comment:
This wrap looks like a no-op in this iteration:
`ResolveColumnDefaultInCommandInputQuery.apply` short-circuits unless
`i.table.resolved`, and `i.table` here is still a
`PlanWithUnresolvedIdentifier` (just with the identifier expression now
resolved). The column-default work happens later via the existing `case i:
InsertIntoStatement => resolveColumnDefaultInCommandInputQuery(i)` once
`ResolveIdentifierClause` has materialized the table. The `V2WriteCommand` case
below doesn't wrap — consider dropping the wrap here too for symmetry:
```suggestion
i.copy(table = p.copy(identifierExpr = resolvedExpr))
```
##########
sql/core/src/test/scala/org/apache/spark/sql/ParametersSuite.scala:
##########
@@ -2586,6 +2586,28 @@ class ParametersSuite extends SharedSparkSession {
s"Expected :tname inside OverwriteByExpression.table to be bound,
got:\n$boundOverwrite")
}
+ // SPARK-46625 followup: `INSERT INTO IDENTIFIER(<sql-variable>) ...` places
a
+ // `PlanWithUnresolvedIdentifier` in `InsertIntoStatement.table`, whose
`identifierExpr`
+ // holds an `UnresolvedAttribute` for the variable name. That slot is a
non-child
+ // `LogicalPlan`, so the default `ResolveReferences` traversal never
resolves the
+ // attribute, `ResolveIdentifierClause` cannot fire (it waits on
`identifierExpr.resolved`),
+ // and analysis fails. Verify that the explicit `InsertIntoStatement` case
added to
+ // `ResolveReferences` rewrites the attribute to a `VariableReference` and
the insert
+ // completes end-to-end.
+ test("SPARK-46625: INSERT INTO IDENTIFIER(<sql-variable>) resolves variable
in table slot") {
+ withTable("t_var_insert") {
+ sql("CREATE TABLE t_var_insert (a INT) USING PARQUET")
+ sql("DECLARE OR REPLACE VARIABLE target_table STRING")
+ try {
+ sql("SET VAR target_table = 't_var_insert'")
+ sql("INSERT INTO IDENTIFIER(target_table) SELECT 42 AS a")
Review Comment:
Worth adding a companion test where the SQL variable name matches a query
output column (e.g. variable `a`, table `t_shadow(a INT)`, `INSERT INTO
IDENTIFIER(a) SELECT 42 AS a`) to lock in that the IDENTIFIER expression
doesn't see query output columns. With the resolution-scope question above,
that test would fail today on the colliding-name case; after narrowing the
scope it should succeed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]