Re: [PR] fix(spark): align CTAS partition fields by table partition order [hudi]

via GitHub Tue, 02 Jun 2026 12:38:07 -0700


hudi-agent commented on code in PR #18899:
URL: https://github.com/apache/hudi/pull/18899#discussion_r3343929451



##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala:
##########
@@ -432,6 +433,36 @@ case class ResolveImplementationsEarly(spark: SparkSession) extends Rule[Logical
       case _ => plan
     }
   }
+
+  private def alignCtasQueryByPartitionOrder(query: LogicalPlan, partitionColumns: Seq[String]): LogicalPlan = {
+    if (partitionColumns.isEmpty) {
+      query
+    } else {
+      val resolver = spark.sessionState.conf.resolver
+      val (dataAttrs, partitionAttrs) = query.output.partition { attr =>
+        !partitionColumns.exists(partition => resolver(partition, attr.name))
+      }
+
+      if (partitionAttrs.size != partitionColumns.size) {
+        throw new HoodieAnalysisException(s"Partition columns ${partitionColumns.mkString("[", ", ", "]")} " +
+          s"do not match query output ${query.output.map(_.name).mkString("[", ", ", "]")}")
+      }
+
+      val alreadyAligned = partitionColumns.zip(partitionAttrs).forall {

Review Comment:
   🤖 nit: could you add a short comment here explaining that this early-return 
exists to avoid inserting an unnecessary `Project` node into the query plan 
when columns are already in the right order? Without it, a future reader could 
easily think the `alreadyAligned` check is redundant (since 
`orderedPartitionAttrs` would produce the correct result either way) and remove 
it.
   
   - AI-generated; verify before applying. React 👍/👎 to flag quality.
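To make the reviewer's point concrete, here is a minimal, Spark-free Scala sketch of the reorder-or-skip pattern. `Attr`, `Plan`, `Relation`, `Project`, `CtasAlignmentSketch` and `alignByPartitionOrder` are hypothetical stand-ins, not Catalyst or Hudi APIs; the resolver is a plain case-insensitive comparison in the spirit of Spark's default (`spark.sql.caseSensitive=false`), and `require` stands in for the PR's `HoodieAnalysisException`. The PR's actual `alreadyAligned` predicate is cut off in the excerpt above, so the condition used here (the reordered output equals the current output) is an assumption.

```scala
// Hypothetical, Spark-free stand-ins for Catalyst's attribute and plan types.
final case class Attr(name: String)

sealed trait Plan { def output: Seq[Attr] }
final case class Relation(output: Seq[Attr]) extends Plan
// Hypothetical stand-in for a projection node that emits `projectList` over `child`.
final case class Project(projectList: Seq[Attr], child: Plan) extends Plan {
  def output: Seq[Attr] = projectList
}

object CtasAlignmentSketch {
  // Assumed case-insensitive name matching (similar to Spark's default resolver).
  val resolver: (String, String) => Boolean = _.equalsIgnoreCase(_)

  def alignByPartitionOrder(query: Plan, partitionColumns: Seq[String]): Plan =
    if (partitionColumns.isEmpty) query
    else {
      // Split the output into data and partition attributes,
      // keeping the relative order within each group.
      val (dataAttrs, partitionAttrs) = query.output.partition { attr =>
        !partitionColumns.exists(p => resolver(p, attr.name))
      }
      require(
        partitionAttrs.size == partitionColumns.size,
        s"Partition columns ${partitionColumns.mkString("[", ", ", "]")} " +
          s"do not match query output ${query.output.map(_.name).mkString("[", ", ", "]")}")

      // Re-order the partition attributes to follow the table's partition order.
      val orderedPartitionAttrs = partitionColumns.map { p =>
        partitionAttrs
          .find(attr => resolver(p, attr.name))
          .getOrElse(throw new IllegalArgumentException(s"No output column for partition $p"))
      }
      val target = dataAttrs ++ orderedPartitionAttrs

      // Early return: if the output already has the target order, a Project here
      // would be an identity projection, so hand back the original plan untouched.
      val alreadyAligned = target == query.output
      if (alreadyAligned) query else Project(target, query)
    }
}
```

When the check passes, the sketch returns the very same plan instance, so nothing is added to the tree; without it, an already-aligned query would be wrapped in an identity `Project`, which is the redundancy the requested code comment should document.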



-- 
This is an automated message from the Apache Git Service.