Re: [PR] feat(spark): Make Hudi's analyzed plans introspectable to lineage tooling [hudi]

via GitHub Wed, 13 May 2026 08:16:18 -0700


hudi-agent commented on code in PR #18726:
URL: https://github.com/apache/hudi/pull/18726#discussion_r3235391470



##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala:
##########
@@ -620,3 +624,48 @@ case class HoodiePostAnalysisRule(sparkSession: SparkSession) extends Rule[Logic
     }
   }
 }
+
+/**
+ * Stamps a synthesized [[CatalogTable]] (table name, base path, schema) onto path-based
+ * Hudi reads whose underlying file index is incremental or CDC. Without it, lineage and
+ * governance tooling sees `LogicalRelation.catalogTable = None` and falls back to the
+ * relation's class name as the dataset identifier -- useless for tracking which table
+ * an incremental query came from.
+ *
+ * Scope is intentionally limited to incremental and CDC reads:
+ *  - Catalog-registered reads already have `catalogTable` populated.
+ *  - Path-based snapshot reads have a working file-path-based fallback in existing
+ *    lineage tooling; changing their behavior is a separate decision.
+ */
+object HoodieIncrementalRelationIdentifier extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan =
+    AnalysisHelper.allowInvokingTransformsInAnalyzer {

Review Comment:
   🤖 The 4-arg constructor pattern `LogicalRelation(hfsr, _, None, _)` will fail to compile against Spark 4.0/4.1, where `LogicalRelation` is a 5-arg case class (see e.g. `BaseSpark4Adapter`, `Spark4HoodiePruneFileSourcePartitions`, `HoodieSpark40/41Analysis`, all of which use 5 args). Since `hudi-spark` is built per Spark profile, this would break the Spark 4 builds. Could you switch to a Spark-version-agnostic type pattern (e.g. `case lr: LogicalRelation if lr.catalogTable.isEmpty && lr.relation.isInstanceOf[HadoopFsRelation] && isIncrementalOrCDC(lr.relation.asInstanceOf[HadoopFsRelation].location) => ...`) to match the convention used by every other `LogicalRelation` reference in `hudi-spark`? @yihua
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag quality.</i></sub>
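
The difference between the two pattern styles can be illustrated without a Spark dependency. The sketch below is only a rough model: every type in it (`LogicalPlan`, `LogicalRelation`, `HadoopFsRelation`, `FileIndex`, `CatalogTable`) is a simplified stand-in rather than the real Spark class, the fifth field name `stream` is an assumption, and `isIncrementalOrCDC` and `stamp` are hypothetical helpers that may not match the PR. It shows that a type pattern with a guard never mentions the constructor arity, so it keeps compiling when a field is added.

```scala
object TypePatternSketch {
  // Stand-ins for Spark types; NOT the real classes.
  trait FileIndex
  case object SnapshotIndex extends FileIndex
  case object IncrementalIndex extends FileIndex

  trait BaseRelation
  final case class HadoopFsRelation(location: FileIndex) extends BaseRelation
  final case class CatalogTable(name: String)

  trait LogicalPlan
  case object OtherPlan extends LogicalPlan

  // Five fields, modelling the Spark 4 shape described in the review comment.
  // The name and type of the fifth field are assumptions for this sketch.
  final case class LogicalRelation(
      relation: BaseRelation,
      output: Seq[String],
      catalogTable: Option[CatalogTable],
      isStreaming: Boolean,
      stream: Option[String] = None) extends LogicalPlan

  // Hypothetical helper: treats only IncrementalIndex as incremental/CDC.
  def isIncrementalOrCDC(index: FileIndex): Boolean = index == IncrementalIndex

  // A 4-arg constructor pattern such as LogicalRelation(_, _, None, _) would
  // not compile against the 5-field class above. The type pattern below reads
  // fields by name, so it is independent of the number of fields.
  def stamp(plan: LogicalPlan): LogicalPlan = plan match {
    case lr: LogicalRelation if lr.catalogTable.isEmpty =>
      lr.relation match {
        case fs: HadoopFsRelation if isIncrementalOrCDC(fs.location) =>
          lr.copy(catalogTable = Some(CatalogTable("synthesized")))
        case _ => lr
      }
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val incremental =
      LogicalRelation(HadoopFsRelation(IncrementalIndex), Nil, None, isStreaming = false)
    val snapshot =
      LogicalRelation(HadoopFsRelation(SnapshotIndex), Nil, None, isStreaming = false)
    println(stamp(incremental)) // catalogTable is filled in
    println(stamp(snapshot))    // returned unchanged
  }
}
```

The nested match on `lr.relation` is one way to avoid the `isInstanceOf`/`asInstanceOf` pair from the suggestion above; either form stays independent of the case-class arity.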



##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala:
##########
@@ -620,3 +624,48 @@ case class HoodiePostAnalysisRule(sparkSession: SparkSession) extends Rule[Logic
     }
   }
 }
+
+/**
+ * Stamps a synthesized [[CatalogTable]] (table name, base path, schema) onto path-based
+ * Hudi reads whose underlying file index is incremental or CDC. Without it, lineage and
+ * governance tooling sees `LogicalRelation.catalogTable = None` and falls back to the
+ * relation's class name as the dataset identifier -- useless for tracking which table
+ * an incremental query came from.
+ *
+ * Scope is intentionally limited to incremental and CDC reads:
+ *  - Catalog-registered reads already have `catalogTable` populated.
+ *  - Path-based snapshot reads have a working file-path-based fallback in existing
+ *    lineage tooling; changing their behavior is a separate decision.
+ */
+object HoodieIncrementalRelationIdentifier extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan =
+    AnalysisHelper.allowInvokingTransformsInAnalyzer {
+      plan transform {
+        case lr @ LogicalRelation(hfsr: HadoopFsRelation, _, None, _)

Review Comment:
   🤖 nit: `hfsr` takes a moment to decode. Could you use `fsRelation` or just `rel` so the reader doesn't have to expand the abbreviation?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
