cloud-fan commented on code in PR #55706:
URL: https://github.com/apache/spark/pull/55706#discussion_r3196249652


##########
sql/core/src/test/scala/org/apache/spark/sql/ParametersSuite.scala:
##########
@@ -2460,4 +2460,53 @@ class ParametersSuite extends SharedSparkSession {
       spark.sql("SELECT 1", Array.empty[Any]),
       Row(1))
   }
+
+  test("WITH ... INSERT OVERWRITE TABLE IDENTIFIER(:p) SELECT ... FROM cte") {
+    withTable("t_cte_overwrite") {
+      sql("CREATE TABLE t_cte_overwrite (a INT) USING PARQUET")
+      sql("INSERT INTO t_cte_overwrite VALUES (10)")
+      spark.sql(
+        """WITH transformation AS (SELECT 1 AS a)
+          |INSERT OVERWRITE TABLE IDENTIFIER(:tname)
+          |SELECT * FROM transformation""".stripMargin,
+        Map("tname" -> "t_cte_overwrite"))
+      checkAnswer(spark.table("t_cte_overwrite"), Row(1))
+    }
+  }
+
+  test("WITH ... INSERT INTO IDENTIFIER(:p) SELECT ... FROM cte") {
+    withTable("t_cte_into") {
+      sql("CREATE TABLE t_cte_into (a INT) USING PARQUET")
+      spark.sql(
+        """WITH transformation AS (SELECT 7 AS a)
+          |INSERT INTO IDENTIFIER(:tname)
+          |SELECT * FROM transformation""".stripMargin,
+        Map("tname" -> "t_cte_into"))
+      checkAnswer(spark.table("t_cte_into"), Row(7))
+    }
+  }
+
+  test("Analyzed plan does not leave WithCTE wrapping a CTEInChildren " +

Review Comment:
   Once the fix moves into the parser per the comment on 
`ResolveIdentifierClause.scala`, please broaden the structural test to cover 
the other affected commands too — at minimum a `WITH t AS (...) CREATE TABLE 
IDENTIFIER(:p) AS SELECT * FROM t` variant, since CTAS goes through the same 
shape. Also worth knowing: the `INSERT INTO/OVERWRITE` smoke tests above pass 
even without the fix (eager command execution doesn't hit the re-analysis path 
that produces the `NoSuchElementException`); the structural assertion here is 
what actually anchors the regression.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala:
##########
@@ -82,6 +82,13 @@ class ResolveIdentifierClause(earlyBatches: 
Seq[RuleExecutor[LogicalPlan]#Batch]
               IdentifierResolution.evalIdentifierExpr(e.identifierExpr), 
e.otherExprs)
         }
     }
+    // When `PlanWithUnresolvedIdentifier` materializes into a `CTEInChildren` 
(e.g.
+    // `InsertIntoStatement`) inside an outer `WithCTE`, push the CTE defs 
into the command's
+    // children - restoring the invariant from `CTESubstitution.withCTEDefs`.
+    resolved.resolveOperatorsUpWithPruning(_.containsPattern(CTE)) {
+      case WithCTE(c: CTEInChildren, cteDefs) => c.withCTEDefs(cteDefs)

Review Comment:
   This patches the symptom rather than the root cause: `withIdentClause` lifts 
`PlanWithUnresolvedIdentifier` above the whole write/CTAS command, and we then 
have to undo the placement after materialization. Please change the parser to 
push the placeholder into the identifier slot of the produced plan (e.g. 
`InsertIntoStatement.table`, `CreateTableAsSelect.name`) instead of wrapping 
the entire command, and add a parallel handler in this rule — e.g.:
   
   ```scala
   case i @ InsertIntoStatement(p: PlanWithUnresolvedIdentifier, _, _, _, _, _, 
_, _, _)
       if p.identifierExpr.resolved =>
     i.copy(table = executor.execute(p.planBuilder.apply(
       IdentifierResolution.evalIdentifierExpr(p.identifierExpr), p.children)))
   ```
   
   Then `CTESubstitution` sees the actual `CTEInChildren` from the start and 
places `WithCTE` correctly — no post-hoc collapse needed, and the invariant is 
preserved by construction. The same shape applies to all `withIdentClause` call 
sites whose builder produces a `CTEInChildren` (`INSERT`, CTAS, RTAS, `CACHE 
TABLE AS` — `AstBuilder.scala:911-1002, 5640, 5724, 6502`). Downstream matchers 
like `case InsertIntoStatement(LogicalRelationWithTable(_), ...)` aren't 
affected because they run after this rule, by which point `table` is back to a 
normal resolved relation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to