cloud-fan commented on code in PR #39248: URL: https://github.com/apache/spark/pull/39248#discussion_r1058115431
########## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala: ########## @@ -117,10 +111,6 @@ object InterpretedMutableProjection { * Returns a [[MutableProjection]] for given sequence of bound Expressions. */ def createProjection(exprs: Seq[Expression]): MutableProjection = { - // We need to make sure that we do not reuse stateful expressions. - val cleanedExpressions = exprs.map(_.transform { - case s: Stateful => s.freshCopy() Review Comment: The old usage of `.transform` here contained a subtle bug related to how `fastEquals` works. Let's say that we have a tree which looks like this: > Outer(Middle(Stateful())) where `Outer` and `Middle` are non-Stateful expressions. When the `.transform` is applied to `Stateful()` and `.freshCopy()` is called, the returned value will be `==` to the original Stateful expression but will have a different object identity (because it's a fresh object). Internally, `.transform` will use `fastEquals` to check whether the transformation modified the node. `Stateful` overrides `fastEquals` so that it only considers object identity, so the transform will return the `freshCopy()` result. At the next level up, Middle will check whether any of its children have been changed in the recursive bottom-up transformation (see [childrenFastEquals() in withNewChildren()](https://github.com/apache/spark/blob/9305cc744d27daa6a746d3eb30e7639c63329072/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala#L455-L456), which is called from `mapChildren()`). It will detect that its children have changed, so the transform will return a new Middle node. Finally, at the top level, `Outer` will perform the same check to see if any of its children have changed. This time, however, it will be calling `Middle.fastEquals` instead of `Stateful.fastEquals`. Middle's fastEquals method is the regular implementation which also considers object equality. Both the original and new `Middle` nodes will be `==`, so `fastEquals` will be true and `Outer` will conclude that its children have not been changed by the transformation and the original `Outer` will be returned (losing the copy of the stateful expression). In other words, the old `.transform` and copying logic here was incorrect if the Stateful expression was nested more than a single level deep. In this PR I chose to fix this by adding a `freshCopyIfContainsStatefulExpression()` method to `Expression` which implements a custom tree traversal considers _only_ object identity when determining whether the transform has changed a node or a node's children. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org