cloud-fan commented on a change in pull request #29585:
URL: https://github.com/apache/spark/pull/29585#discussion_r480156120



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
##########
@@ -203,3 +203,28 @@ abstract class BinaryNode extends LogicalPlan {
 abstract class OrderPreservingUnaryNode extends UnaryNode {
   override final def outputOrdering: Seq[SortOrder] = child.outputOrdering
 }
+
+object LogicalPlanIntegrity {
+
+  private def canGetOutputAttrs(p: LogicalPlan): Boolean = {
+    p.resolved && !p.expressions.exists { e =>
+      // Some plans cannot call `output` because their expressions have `Unevaluable`,
+      // e.g., `Join` having a `ExistenceJoin` type.
+      e.collectFirst { case _: Unevaluable => true }.isDefined
+    }
+  }
+
+  /**
+   * This method checks if the same expression ID, `ExprId`, refer to an unique attribute.
+   * Some plan transformers (e.g., `RemoveNoopOperators`) rewrite logical
+   * plans based on this assumption.
+   */
+  def hasUniqueExprIdsForAttributes(plan: LogicalPlan): Boolean = {
+    val allOutputAttrs = plan.collect { case p if canGetOutputAttrs(p) =>
+      p.output.filter(_.resolved).map(_.canonicalized.asInstanceOf[Attribute])
+    }
+    val groupedAttrsByExprId = allOutputAttrs
+      .flatten.groupBy(_.exprId).values.map(_.distinct)

Review comment:
   This reminds me of the hacks in `FoldablePropagation`.

   We do have attributes that share an exprId but are actually different attributes, and I don't think there is an easy way to detect that automatically. For example, with `a + 1 as a`, if I reuse the exprId of attribute `a` in the alias, how can we detect it? The name, data type, and nullability are all the same.
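
   To make the concern concrete, here is a minimal, self-contained sketch. It does not use Spark: `ExprId` and `Attr` below are hypothetical, simplified stand-ins for the Catalyst classes, and `hasUniqueExprIds` only mirrors the shape of the check in the diff. The quoted diff stops at `.map(_.distinct)`, so the final size comparison is my assumption about how it ends.

```scala
// Hypothetical stand-ins for Catalyst's ExprId and AttributeReference (not Spark's real classes).
final case class ExprId(id: Long)
final case class Attr(name: String, dataType: String, nullable: Boolean, exprId: ExprId)

object ExprIdReuseSketch {
  // Same shape as the check in the diff: group all output attributes by exprId and
  // require each group to collapse to one distinct attribute.
  // Assumption: the real method ends with a comparable size check.
  def hasUniqueExprIds(outputs: Seq[Seq[Attr]]): Boolean =
    outputs.flatten.groupBy(_.exprId).values.forall(_.distinct.size == 1)

  // Output of the child plan: attribute `a`.
  val a: Attr = Attr("a", "int", nullable = false, ExprId(1L))

  // Output of a hypothetical `Project(a + 1 AS a)` whose alias reuses the exprId of `a`.
  // It holds a different value, but every field the check can see is identical.
  val aliased: Attr = Attr("a", "int", nullable = false, ExprId(1L))

  // A reuse that also changes the data type; this one is visible to the check.
  val retyped: Attr = Attr("a", "string", nullable = false, ExprId(1L))

  def main(args: Array[String]): Unit = {
    // The problematic reuse passes the check, because the two attributes compare equal.
    println(hasUniqueExprIds(Seq(Seq(a), Seq(aliased))))
    // Only a reuse that changes a visible field is caught.
    println(hasUniqueExprIds(Seq(Seq(a), Seq(retyped))))
  }
}
```

   In this model the check passes for `a` and `aliased` and fails only for `retyped`, which is the gap I mean: comparing attributes cannot tell the original `a` from `a + 1 as a` when the alias reuses the exprId.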



