[GitHub] [spark] HeartSaVioR commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

GitBox Mon, 06 Jul 2020 00:49:07 -0700


HeartSaVioR commented on a change in pull request #28975:
URL: https://github.com/apache/spark/pull/28975#discussion_r450042481




##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -259,6 +269,9 @@ class SymmetricHashJoinStateManager(
           return null
         }
 
+        // Make a copy on value row, as below cleanup logic may update the value row silently.
+        currentValue = currentValue.copy(value = currentValue.value.copy())

Review comment:
       Yes. The copy wasn't necessary for format V1, because the original row was 
stored in the state store, and the state store (strictly speaking, the HDFS 
state store provider implementation) makes sure the rows it holds are copies.
   
   In the other places, the row can propagate to callers outside of the state 
manager, and it looks like those callers don't need to copy the row. (It's super 
tricky for me to determine whether a copy is necessary when the code is not a 
simple loop or stream.)
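
   To illustrate the general hazard being discussed, here is a minimal sketch 
that does not use Spark at all. `ReusedRow` and `reusingIterator` are 
hypothetical stand-ins (assumptions for illustration, not Spark classes) for a 
mutable row object that a producer reuses across iterations. Keeping the 
reference without copying means later writes by the producer silently change 
what was already retained; copying at the point of retention avoids that.

```scala
// Hypothetical stand-in for a mutable, reused row. This is NOT Spark's UnsafeRow.
final class ReusedRow(var value: Int) {
  // Returns an independent snapshot of the current contents.
  def copy(): ReusedRow = new ReusedRow(value)
}

object ReusedRowSketch {
  // Yields the values 1, 2, 3, but always hands back the same mutable instance,
  // overwriting its contents on every step.
  def reusingIterator(): Iterator[ReusedRow] = {
    val shared = new ReusedRow(0)
    Iterator(1, 2, 3).map { v =>
      shared.value = v
      shared
    }
  }

  // Retains references without copying: every retained element is the same
  // object, so all of them show the last value written by the producer.
  def retainedWithoutCopy(): List[Int] =
    reusingIterator().toList.map(_.value)

  // Copies each row before retaining it, so each element keeps its own value.
  def retainedWithCopy(): List[Int] =
    reusingIterator().map(_.copy()).toList.map(_.value)

  def main(args: Array[String]): Unit = {
    println(retainedWithoutCopy()) // List(3, 3, 3)
    println(retainedWithCopy())    // List(1, 2, 3)
  }
}
```

   In a simple loop that consumes each row before asking for the next one, the 
copy is not needed. It only matters when a reference outlives the step that 
produced it, which is why the answer is hard to see in less linear code.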




