Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17330#discussion_r106840346

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala ---
    @@ -61,6 +63,37 @@ abstract class SubqueryExpression(
         }
       }

    +/**
    + * This expression is used to represent any form of subquery expression namely
    + * ListQuery, Exists and ScalarSubquery. This is only used to make sure the
    + * expression equality works properly when LogicalPlan.sameResult is called
    + * on plans containing SubqueryExpression(s). This is only a transient expression
    + * that only lives in the scope of sameResult function call. In other words, analyzer,
    + * optimizer or planner never sees this expression type during transformation of
    + * plans.
    + */
    +case class CanonicalizedSubqueryExpr(expr: SubqueryExpression)
    +  extends UnaryExpression with Unevaluable {
    +  override def dataType: DataType = expr.dataType
    +  override def nullable: Boolean = expr.nullable
    +  override def child: Expression = expr
    +  override def toString: String = s"CanonicalizedSubqueryExpr(${expr.toString})"
    +
    +  // Hashcode is generated conservatively for now i.e it does not include the
    +  // sub query plan. Doing so causes issue when we canonicalize expressions to
    +  // re-order them based on hashcode.
    +  // TODO : improve the hashcode generation by considering the plan info.
    +  override def hashCode(): Int = {
    +    val h = Objects.hashCode(expr.children)
    --- End diff --

    @cloud-fan Hi Wenchen, we are unable to use expr.semanticHash here; I did try that. When we canonicalize expressions, we re-order them based on their hash codes, and that is where the problem comes from. I have a test case with three subquery expressions (ScalarSubquery, ListQuery and Exists), and when they are re-ordered by a hash that includes the subquery plan, the resulting order becomes unpredictable. That is why I generate the hash code conservatively here, from expr.children only. Let me know what you think.
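
    To illustrate the ordering concern in the comment above, here is a minimal, self-contained sketch. It is not Spark code: ToySubquery, planId, planAwareHash, conservativeHash and reorder are all hypothetical names made up for this example, and planId only stands in for whatever plan details may differ between two otherwise equivalent plans. The sketch assumes a canonicalizer that sorts expressions by hash code, as the comment describes.

```scala
import java.util.Objects

// Hypothetical stand-in for a subquery expression (not a Spark class).
// `children` is the same for equivalent plans; `planId` mimics plan details
// that can differ between two otherwise equivalent plans.
final case class ToySubquery(kind: String, children: Seq[String], planId: Long)

object HashOrderingSketch {
  // Hash that also folds in plan details.
  def planAwareHash(e: ToySubquery): Int =
    Objects.hash(e.kind, e.children, Long.box(e.planId))

  // Conservative hash over the children only, in the spirit of the diff.
  def conservativeHash(e: ToySubquery): Int = Objects.hashCode(e.children)

  // Re-order expressions by a hash function, as a canonicalizer might.
  def reorder(exprs: Seq[ToySubquery], hash: ToySubquery => Int): Seq[String] =
    exprs.sortBy(hash).map(_.kind)

  // Two equivalent "plans" that differ only in their plan ids.
  val planA: Seq[ToySubquery] = Seq(
    ToySubquery("ScalarSubquery", Seq("a"), 1L),
    ToySubquery("ListQuery", Seq("b"), 2L),
    ToySubquery("Exists", Seq("c"), 3L))
  val planB: Seq[ToySubquery] = planA.map(e => e.copy(planId = e.planId + 100L))

  def main(args: Array[String]): Unit = {
    // The conservative hash ignores planId, so both plans sort identically.
    println(reorder(planA, conservativeHash) == reorder(planB, conservativeHash))
    // The plan-aware hash depends on planId, so there is no such guarantee.
    println(reorder(planA, planAwareHash) == reorder(planB, planAwareHash))
  }
}
```

    In this toy setup the first comparison always holds, because the sort key never looks at planId. The second comparison may or may not hold, which is the kind of unpredictability the comment refers to.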