cloud-fan commented on code in PR #39248:
URL: https://github.com/apache/spark/pull/39248#discussion_r1058115431


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala:
##########
@@ -117,10 +111,6 @@ object InterpretedMutableProjection {
    * Returns a [[MutableProjection]] for given sequence of bound Expressions.
    */
   def createProjection(exprs: Seq[Expression]): MutableProjection = {
-    // We need to make sure that we do not reuse stateful expressions.
-    val cleanedExpressions = exprs.map(_.transform {
-      case s: Stateful => s.freshCopy()

Review Comment:
   The old usage of `.transform` here contained a subtle bug related to how 
`fastEquals` works.
   
   Let's say that we have a tree which looks like this:
   
   > Outer(Middle(Stateful()))
   
   where `Outer` and `Middle` are non-Stateful expressions.
   
   When the `.transform` is applied to `Stateful()` and `.freshCopy()` is 
called, the returned value will be `==` to the original Stateful expression but 
will have a different object identity (because it's a fresh object). 
Internally, `.transform` will use `fastEquals` to check whether the 
transformation modified the node. `Stateful` overrides `fastEquals` so that it 
only considers object identity, so the transform will return the `freshCopy()` 
result.
   
   At the next level up, Middle will check whether any of its children have 
been changed in the recursive bottom-up transformation (see 
[childrenFastEquals() in 
withNewChildren()](https://github.com/apache/spark/blob/9305cc744d27daa6a746d3eb30e7639c63329072/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala#L455-L456),
 which is called from `mapChildren()`). It will detect that its children have 
changed, so the transform will return a new Middle node.
   
   Finally, at the top level, `Outer` will perform the same check to see if any 
of its children have changed. This time, however, it will be calling 
`Middle.fastEquals` instead of `Stateful.fastEquals`. Middle's fastEquals 
method is the regular implementation which also considers object equality. Both 
the original and new `Middle` nodes will be `==`, so `fastEquals` will be true 
and `Outer` will conclude that its children have not been changed by the 
transformation and the original `Outer` will be returned (losing the copy of 
the stateful expression).
   
   In other words, the old `.transform` and copying logic here was incorrect if 
the Stateful expression was nested more than a single level deep.
   
   In this PR I chose to fix this by adding a 
`freshCopyIfContainsStatefulExpression()` method to `Expression` which 
implements a custom tree traversal considers _only_ object identity when 
determining whether the transform has changed a node or a node's children.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to