cloud-fan commented on code in PR #56024:
URL: https://github.com/apache/spark/pull/56024#discussion_r3279105128


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -1535,6 +1535,32 @@ class Analyzer(
     }
 
     def doApply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
+      // `InsertIntoStatement.table` and `V2WriteCommand.table` are non-child 
`LogicalPlan`
+      // slots (`child = query`), so the default `resolveOperatorsUp` + 
`mapExpressions`
+      // traversal never resolves expressions placed inside them. For a
+      // `PlanWithUnresolvedIdentifier`, `identifierExpr` (e.g. an 
`UnresolvedAttribute`
+      // referring to a SQL variable in `INSERT INTO IDENTIFIER(target_table) 
...`) must
+      // be resolved here before `ResolveIdentifierClause` can materialize the 
relation.
+      // Mirror the recursion that `BindParameters` and 
`ResolveIdentifierClause` already
+      // do for the same shape (SPARK-46625). The `!identifierExpr.resolved` 
guard makes
+      // the case idempotent under bottom-up traversal.
+      case i: InsertIntoStatement
+          if i.table.isInstanceOf[PlanWithUnresolvedIdentifier] &&
+             
!i.table.asInstanceOf[PlanWithUnresolvedIdentifier].identifierExpr.resolved =>
+        val p = i.table.asInstanceOf[PlanWithUnresolvedIdentifier]
+        val resolvedExpr = resolveExpressionByPlanChildren(
+          p.identifierExpr, i, includeLastResort = true)

Review Comment:
   Should this resolve against `p` rather than `i`? Passing `i` widens the 
resolution scope to `i.children = Seq(query)`, so a nested 
`UnresolvedAttribute` in `identifierExpr` can match a query output column. With 
a colliding name like
   
   ```sql
   DECLARE VARIABLE a STRING DEFAULT 't_shadow';
   CREATE TABLE t_shadow (a INT) USING PARQUET;
   INSERT INTO IDENTIFIER(a) SELECT 42 AS a;
   ```
   
   the column wins and `IdentifierResolution.evalIdentifierExpr` then throws 
`NOT_A_CONSTANT_STRING.NOT_CONSTANT` (since an `AttributeReference` is not 
foldable). Resolving against `p` (whose `children` are `Nil` for the 
INSERT/OverwriteByExpression path built by `buildWriteTableSlot`) mirrors how 
`PlanWithUnresolvedIdentifier` is resolved in regular child slots via the 
generic fallback at line 1865 — column resolution becomes a no-op and only the 
last-resort variable resolution fires.
   
   Same point applies to the `V2WriteCommand` case below (line 1561).



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -1535,6 +1535,32 @@ class Analyzer(
     }
 
     def doApply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
+      // `InsertIntoStatement.table` and `V2WriteCommand.table` are non-child 
`LogicalPlan`
+      // slots (`child = query`), so the default `resolveOperatorsUp` + 
`mapExpressions`
+      // traversal never resolves expressions placed inside them. For a
+      // `PlanWithUnresolvedIdentifier`, `identifierExpr` (e.g. an 
`UnresolvedAttribute`
+      // referring to a SQL variable in `INSERT INTO IDENTIFIER(target_table) 
...`) must
+      // be resolved here before `ResolveIdentifierClause` can materialize the 
relation.
+      // Mirror the recursion that `BindParameters` and 
`ResolveIdentifierClause` already
+      // do for the same shape (SPARK-46625). The `!identifierExpr.resolved` 
guard makes
+      // the case idempotent under bottom-up traversal.
+      case i: InsertIntoStatement
+          if i.table.isInstanceOf[PlanWithUnresolvedIdentifier] &&
+             
!i.table.asInstanceOf[PlanWithUnresolvedIdentifier].identifierExpr.resolved =>
+        val p = i.table.asInstanceOf[PlanWithUnresolvedIdentifier]
+        val resolvedExpr = resolveExpressionByPlanChildren(
+          p.identifierExpr, i, includeLastResort = true)
+        resolveColumnDefaultInCommandInputQuery(

Review Comment:
   This wrap looks like a no-op in this iteration: 
`ResolveColumnDefaultInCommandInputQuery.apply` short-circuits unless 
`i.table.resolved`, and `i.table` here is still a 
`PlanWithUnresolvedIdentifier` (just with the identifier expression now 
resolved). The column-default work happens later via the existing `case i: 
InsertIntoStatement => resolveColumnDefaultInCommandInputQuery(i)` once 
`ResolveIdentifierClause` has materialized the table. The `V2WriteCommand` case 
below doesn't wrap — consider dropping the wrap here too for symmetry:
   
   ```suggestion
           i.copy(table = p.copy(identifierExpr = resolvedExpr))
   ```



##########
sql/core/src/test/scala/org/apache/spark/sql/ParametersSuite.scala:
##########
@@ -2586,6 +2586,28 @@ class ParametersSuite extends SharedSparkSession {
       s"Expected :tname inside OverwriteByExpression.table to be bound, 
got:\n$boundOverwrite")
   }
 
+  // SPARK-46625 followup: `INSERT INTO IDENTIFIER(<sql-variable>) ...` places 
a
+  // `PlanWithUnresolvedIdentifier` in `InsertIntoStatement.table`, whose 
`identifierExpr`
+  // holds an `UnresolvedAttribute` for the variable name. That slot is a 
non-child
+  // `LogicalPlan`, so the default `ResolveReferences` traversal never 
resolves the
+  // attribute, `ResolveIdentifierClause` cannot fire (it waits on 
`identifierExpr.resolved`),
+  // and analysis fails. Verify that the explicit `InsertIntoStatement` case 
added to
+  // `ResolveReferences` rewrites the attribute to a `VariableReference` and 
the insert
+  // completes end-to-end.
+  test("SPARK-46625: INSERT INTO IDENTIFIER(<sql-variable>) resolves variable 
in table slot") {
+    withTable("t_var_insert") {
+      sql("CREATE TABLE t_var_insert (a INT) USING PARQUET")
+      sql("DECLARE OR REPLACE VARIABLE target_table STRING")
+      try {
+        sql("SET VAR target_table = 't_var_insert'")
+        sql("INSERT INTO IDENTIFIER(target_table) SELECT 42 AS a")

Review Comment:
   Worth adding a companion test where the SQL variable name matches a query 
output column (e.g. variable `a`, table `t_shadow(a INT)`, `INSERT INTO 
IDENTIFIER(a) SELECT 42 AS a`) to lock in that the IDENTIFIER expression 
doesn't see query output columns. With the resolution-scope question above, 
that test would fail today on the colliding-name case; after narrowing the 
scope it should succeed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to