[GitHub] [spark] aokolnychyi commented on a diff in pull request #40734: [SPARK-43088][SQL] Respect RequiresDistributionAndOrdering in CTAS/RTAS

via GitHub Tue, 11 Apr 2023 18:35:30 -0700


aokolnychyi commented on code in PR #40734:
URL: https://github.com/apache/spark/pull/40734#discussion_r1163475347



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala:
##########
@@ -389,6 +389,46 @@ case class WriteDelta(
   }
 }
 
+trait V2CreateTableAsSelectPlan extends Command with V2CreateTablePlan with KeepAnalyzedQuery {
+  def name: LogicalPlan
+  def query: LogicalPlan
+  def isQueryAnalyzed: Boolean
+
+  override lazy val resolved: Boolean = childrenResolved && {
+    // the table schema is created from the query schema, so the only resolution needed is to check
+    // that the columns referenced by the table's partitioning exist in the query schema
+    val references = partitioning.flatMap(_.references).toSet
+    references.map(_.fieldNames).forall(query.schema.findNestedField(_).isDefined)
+  }
+
+  override def children: Seq[LogicalPlan] = if (isQueryAnalyzed) Seq(name) else Seq(name, query)

Review Comment:
   The primary purpose of this trait is to share the common logic for hiding
`query` after analysis, so that the plan is not optimized twice. I also moved
other common logic into it.
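
   To illustrate why dropping `query` from `children` keeps it from being processed a second time, here is a minimal self-contained sketch. It is a toy model, not Spark's Catalyst classes: `Plan`, `Relation`, `Identifier`, `visited`, and `HideQueryDemo` are hypothetical names invented for this example, and only `name`, `query`, `isQueryAnalyzed`, and the `children` expression are taken from the diff. The sketch assumes that tree rules reach nodes only through `children`.

```scala
object HideQueryDemo {
  // Hypothetical stand-in for a logical plan node.
  sealed trait Plan {
    def children: Seq[Plan]

    // Visit this node and everything reachable through `children`.
    // Tree-rewriting rules are assumed to walk the plan the same way.
    def foreach(f: Plan => Unit): Unit = {
      f(this)
      children.foreach(child => child.foreach(f))
    }
  }

  // Hypothetical leaf nodes.
  case class Relation(table: String) extends Plan {
    def children: Seq[Plan] = Nil
  }

  case class Identifier(ident: String) extends Plan {
    def children: Seq[Plan] = Nil
  }

  // Mirrors the `children` override from the diff: once the query has been
  // analyzed, it is no longer exposed as a child of the command.
  case class CreateTableAsSelect(
      name: Plan,
      query: Plan,
      isQueryAnalyzed: Boolean) extends Plan {
    def children: Seq[Plan] = if (isQueryAnalyzed) Seq(name) else Seq(name, query)
  }

  // Collect every node a traversal would touch.
  def visited(plan: Plan): Seq[Plan] = {
    val buf = scala.collection.mutable.ArrayBuffer.empty[Plan]
    plan.foreach(p => buf += p)
    buf.toSeq
  }

  def main(args: Array[String]): Unit = {
    val query = Relation("source")
    val unanalyzed = CreateTableAsSelect(Identifier("target"), query, isQueryAnalyzed = false)
    val analyzed = unanalyzed.copy(isQueryAnalyzed = true)

    // Before analysis the query is reachable, so rules can resolve it.
    assert(visited(unanalyzed).contains(query))
    // After analysis the traversal skips it, but the field is still available.
    assert(!visited(analyzed).contains(query))
    assert(analyzed.query == query)
  }
}
```

   In this model the command still holds `query` as a field, so whatever plans the write later can read it directly. It is only hidden from the generic tree traversal.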


