boyuanzz commented on a change in pull request #12241:
URL: https://github.com/apache/beam/pull/12241#discussion_r454504904



##########
File path: sdks/python/apache_beam/runners/common.py
##########
@@ -842,29 +847,37 @@ def _invoke_process_per_window(self,
 
   def try_split(self, fraction):
     # type: (...) -> Optional[Tuple[SplitResultPrimary, SplitResultResidual]]
-    if self.threadsafe_restriction_tracker and self.current_windowed_value:
+    if not self.is_splittable:
+      return None
+
+    with self.splitting_lock:
+      # Make a local reference to member variables that change references during
+      # processing under lock before attempting to split so we have a consistent
+      # view of all the references.
+      current_windowed_value = self.current_windowed_value

Review comment:
       Are you trying to deep-copy `current_windowed_value`, 
`threadsafe_restriction_tracker` and `threadsafe_watermark_estimator`? If so, 
we need to do it explicitly with `copy.deepcopy`; a plain assignment only 
creates another reference to the same object.
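
To illustrate the distinction raised above, here is a minimal sketch. `Holder` is a hypothetical stand-in, not a Beam class. It shows that a local name bound by plain assignment sees in-place mutations of the original object but is unaffected when the member is rebound, whereas a `copy.deepcopy` snapshot is independent of both.

```python
import copy


class Holder(object):
  """Hypothetical stand-in for an object whose members change during processing."""
  def __init__(self):
    self.current_windowed_value = {'value': 1}


holder = Holder()

# Plain assignment binds a second name to the same object; nothing is copied.
local_ref = holder.current_windowed_value
# copy.deepcopy builds an independent object.
snapshot = copy.deepcopy(holder.current_windowed_value)

# An in-place mutation is visible through the local reference, not the snapshot.
holder.current_windowed_value['value'] = 2
mutation_seen_by_ref = local_ref['value']
mutation_seen_by_snapshot = snapshot['value']

# Rebinding the member to a new object leaves the local reference untouched.
holder.current_windowed_value = {'value': 3}
value_after_rebind = local_ref['value']
```

So a local reference is enough if the concern is only that the member gets reassigned mid-split; a deep copy is needed only if the referenced object itself can be mutated concurrently.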

##########
File path: sdks/python/apache_beam/runners/common.py
##########
@@ -842,29 +847,37 @@ def _invoke_process_per_window(self,
 
   def try_split(self, fraction):
     # type: (...) -> Optional[Tuple[SplitResultPrimary, SplitResultResidual]]
-    if self.threadsafe_restriction_tracker and self.current_windowed_value:
+    if not self.is_splittable:
+      return None
+
+    with self.splitting_lock:

Review comment:
       If we are deep copying objects, it seems like we can use the lock to 
guard the copying logic only, instead of the entire split logic.
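
A rough sketch of what that narrower locking could look like, under assumptions: `SplittableState`, `start`, and the list-based `restriction` are all hypothetical simplifications, not the code in this PR. The lock is held only while the snapshot is taken, and the split itself is computed outside the lock.

```python
import copy
import threading


class SplittableState(object):
  """Hypothetical, simplified stand-in; not the Beam class under review."""
  def __init__(self):
    self.splitting_lock = threading.Lock()
    self.current_windowed_value = None
    self.restriction = None  # assumed here to be a [start, stop] pair

  def start(self, value, restriction):
    # The processing thread swaps in new state under the same lock.
    with self.splitting_lock:
      self.current_windowed_value = value
      self.restriction = restriction

  def try_split(self, fraction):
    # Hold the lock only long enough to take a consistent snapshot.
    with self.splitting_lock:
      value = copy.deepcopy(self.current_windowed_value)
      restriction = copy.deepcopy(self.restriction)

    # Everything below works on the snapshot, outside the lock.
    if value is None or restriction is None:
      return None
    start, stop = restriction
    split_point = start + int((stop - start) * fraction)
    return (value, (start, split_point)), (value, (split_point, stop))
```

Note that this toy computes the split from the snapshot and never updates the live state; whether that is acceptable for the real restriction tracker is not something this sketch settles.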

##########
File path: sdks/python/apache_beam/runners/common.py
##########
@@ -842,29 +847,37 @@ def _invoke_process_per_window(self,
 
   def try_split(self, fraction):
     # type: (...) -> Optional[Tuple[SplitResultPrimary, SplitResultResidual]]
-    if self.threadsafe_restriction_tracker and self.current_windowed_value:
+    if not self.is_splittable:
+      return None
+
+    with self.splitting_lock:
+      # Make a local reference to member variables that change references during
+      # processing under lock before attempting to split so we have a consistent
+      # view of all the references.
+      current_windowed_value = self.current_windowed_value
+      threadsafe_restriction_tracker = self.threadsafe_restriction_tracker
+      threadsafe_watermark_estimator = self.threadsafe_watermark_estimator
+
+    if threadsafe_restriction_tracker:

Review comment:
       Would it be better to check both `threadsafe_restriction_tracker` and 
`threadsafe_watermark_estimator` here, for readability?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

