HeartSaVioR commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r450042481
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -259,6 +269,9 @@ class SymmetricHashJoinStateManager(
           return null
         }

+        // Make a copy on value row, as below cleanup logic may update the value row silently.
+        currentValue = currentValue.copy(value = currentValue.value.copy())

Review comment:
Yes. The copy wasn't necessary for format V1, because the original row was stored into the state store, and the state store (strictly speaking, the HDFS state store provider implementation) makes sure these rows are copies.

In other places the row can propagate to callers outside of the state manager, and it looks like these callers don't need to copy the row. (It's super tricky for me to determine whether a copy is necessary when the code is not a simple loop or stream.)
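To make the hazard concrete, here is a minimal, Spark-free sketch of how a value row can change "silently" when it is not copied. Everything in it is a hypothetical stand-in: `MutableRow`, `ValueAndMatched`, and `ReusingStore` are not Spark classes, and the single reused buffer is only an assumption about one way such aliasing could arise. It is not the actual state manager logic.

```scala
// Hypothetical stand-in for a mutable row; NOT a Spark class.
final class MutableRow(var payload: Int) {
  def copy(): MutableRow = new MutableRow(payload)
}

// Hypothetical stand-in for the pair held in `currentValue`.
final case class ValueAndMatched(value: MutableRow, matched: Boolean)

object RowCopyDemo {
  // Assumption: the store hands out the same row buffer on every read.
  final class ReusingStore(data: Array[Int]) {
    private val buffer = new MutableRow(0)
    def get(i: Int): ValueAndMatched = {
      buffer.payload = data(i) // overwrites whatever an earlier caller still holds
      ValueAndMatched(buffer, matched = false)
    }
  }

  // Keeps a reference to the shared buffer, then a later read mutates it.
  def removeWithoutCopy(store: ReusingStore): Int = {
    val current = store.get(0)
    store.get(1) // stands in for the cleanup step touching another entry
    current.value.payload
  }

  // Same flow, but with the defensive copy from the diff above.
  def removeWithCopy(store: ReusingStore): Int = {
    var current = store.get(0)
    current = current.copy(value = current.value.copy())
    store.get(1)
    current.value.payload
  }

  def main(args: Array[String]): Unit = {
    println(removeWithoutCopy(new ReusingStore(Array(10, 20)))) // prints 20
    println(removeWithCopy(new ReusingStore(Array(10, 20))))    // prints 10
  }
}
```

Without the copy, the caller observes the value of the entry read second; with the copy, it keeps the value it originally asked for. Whether a given call site needs the copy depends on whether the row escapes before the next mutation, which is why it is hard to judge outside a simple loop or stream.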