robertwb commented on code in PR #31420:
URL: https://github.com/apache/beam/pull/31420#discussion_r1618006764


##########
sdks/java/core/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataInboundObserver.java:
##########
@@ -119,7 +127,12 @@ public void close() throws Exception {
   public void awaitCompletion() throws Exception {
     try {
       while (true) {
+        // The SDK is available to process data right before it is ready to take elements off the
+        // queue.
+        consumingReceivedData.set(false);
         BeamFnApi.Elements elements = queue.take();
+        // The SDK is now no longer available to receive more data, so we set it to false.

Review Comment:
   ...no longer blocked on receiving more data...



##########
model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto:
##########
@@ -1672,6 +1672,10 @@ message StandardProtocols {
     // during bundle processing. This is disabled by default and enabled with the
     // `enable_data_sampling` experiment.
     DATA_SAMPLING = 8 [(beam_urn) = "beam:protocol:data_sampling:v1"];
+
+    // Indicates if the SDK is currently consuming received data.

Review Comment:
   Indicates whether the SDK sets the `consuming_received_data` bit on progress 
response messages.



##########
model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto:
##########
@@ -503,6 +503,8 @@ message ProcessBundleProgressResponse {
   // as the MonitoringInfo could be reconstructed fully by overwriting its
   // payload field with the bytes specified here.
   map<string, bytes> monitoring_data = 5;
+  // Indicates if the SDK is consuming received data or not.

Review Comment:
   We should probably be more explicit about what this actually means than just 
paraphrasing its name. 
   
   For example, 
   
   Indicates that the SDK is still busy consuming the data that has already been 
received on the data channel. If this is set, a runner may abstain from sending 
further data on the data channel until this field becomes unset. 
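
   A minimal sketch of the runner-side behaviour this wording describes, written in Python purely for illustration. `ProgressResponse`, `batch_to_send`, and `max_batch` are hypothetical stand-ins, not Beam APIs; only the field name `consuming_received_data` comes from this PR.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ProgressResponse:
    """Hypothetical stand-in for ProcessBundleProgressResponse."""
    consuming_received_data: bool = False


def batch_to_send(pending, progress, max_batch=2):
    """Return the elements a runner might send after seeing `progress`."""
    # While the SDK is still working through data it already holds,
    # the runner may hold back and leave everything queued.
    if progress.consuming_received_data:
        return []
    count = min(max_batch, len(pending))
    return [pending.popleft() for _ in range(count)]


pending = deque(["a", "b", "c"])
held = batch_to_send(pending, ProgressResponse(consuming_received_data=True))
sent = batch_to_send(pending, ProgressResponse(consuming_received_data=False))
```

   Note that the suggested wording says a runner "may" abstain, so holding back is an optimisation a runner can choose, not something the SDK can rely on.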



##########
sdks/java/core/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataInboundObserver.java:
##########
@@ -119,7 +127,12 @@ public void close() throws Exception {
   public void awaitCompletion() throws Exception {
     try {
       while (true) {
+        // The SDK is available to process data right before it is ready to take elements off the

Review Comment:
   The SDK indicates it's consumed all received data before it attempts to take 
more elements off the queue.
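
   The ordering this comment describes can be sketched as follows, in Python rather than Java for brevity. `InboundObserver`, `_DONE`, and `handle` are hypothetical names, and setting the flag again right after the take is an assumption about the rest of the loop, which this hunk does not show.

```python
import queue
import threading

_DONE = object()  # hypothetical end-of-stream marker


class InboundObserver:
    """Hypothetical stand-in for the inbound observer's consume loop."""

    def __init__(self):
        self.inbound = queue.Queue()
        self.consuming_received_data = threading.Event()

    def await_completion(self, handle):
        while True:
            # Everything received so far has been consumed.
            self.consuming_received_data.clear()
            elements = self.inbound.get()  # blocks until data arrives
            if elements is _DONE:
                return
            # Busy again until this batch has been handled (assumed step).
            self.consuming_received_data.set()
            handle(elements)


observer = InboundObserver()
seen = []
for item in ("x", "y", _DONE):
    observer.inbound.put(item)
observer.await_completion(
    lambda e: seen.append((e, observer.consuming_received_data.is_set())))
```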



##########
sdks/go/pkg/beam/core/runtime/exec/datasource.go:
##########
@@ -151,6 +157,8 @@ func (n *DataSource) process(ctx context.Context, data func(bcr *byteCountReader
                        // io.EOF means the reader successfully drained.
                        // We're ready for a new buffer.
                case <-ctx.Done():
+                       // now that it is done processing received data, we set it to false.

Review Comment:
   Wouldn't this have already been set to false on entering the loop? (I assume 
the cases here are mutually exclusive...) Or is this just to ensure we never 
exit this function without resetting this value?
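
   If the intent is the second one, a single reset on every exit path is usually clearer than a reset in each case. A rough sketch of that pattern, in Python for illustration (in Go this would typically be a `defer`); `DataSource` here is a hypothetical stand-in and `fail_on_second` only simulates an error:

```python
class DataSource:
    """Hypothetical stand-in for the Go DataSource."""

    def __init__(self):
        self.consuming_received_data = False

    def process(self, batches, handle):
        try:
            for batch in batches:
                self.consuming_received_data = True
                handle(batch)
                self.consuming_received_data = False
        finally:
            # Runs on normal return, early return, and exceptions alike.
            self.consuming_received_data = False


def fail_on_second(batch):
    # Simulated processing error on the second batch.
    if batch == 2:
        raise RuntimeError("simulated failure")


source = DataSource()
try:
    source.process([1, 2, 3], fail_on_second)
except RuntimeError:
    failed = True
else:
    failed = False
```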



##########
sdks/python/apache_beam/runners/worker/bundle_processor.py:
##########
@@ -123,16 +123,17 @@
 class RunnerIOOperation(operations.Operation):
   """Common baseclass for runner harness IO operations."""
 
-  def __init__(self,
-               name_context,  # type: common.NameContext
-               step_name,  # type: Any
-               consumers,  # type: Mapping[Any, Iterable[operations.Operation]]
-               counter_factory,  # type: counters.CounterFactory
-               state_sampler,  # type: statesampler.StateSampler
-               windowed_coder,  # type: coders.Coder
-               transform_id,  # type: str
-               data_channel  # type: data_plane.DataChannel
-              ):
+  def __init__(

Review Comment:
   Undo this whitespace refactoring (or move to another PR)? 



##########
sdks/python/apache_beam/runners/worker/sdk_worker.py:
##########
@@ -745,18 +746,29 @@ def process_bundle_progress(
           instruction_id=instruction_id, error=traceback.format_exc())
     if processor:
       monitoring_infos = processor.monitoring_infos()
+      consuming_received_data = \

Review Comment:
   Don't manually break the line.



##########
sdks/python/apache_beam/transforms/environments.py:
##########
@@ -111,11 +111,12 @@ class Environment(object):
   _known_urns = {}  # type: Dict[str, Tuple[Optional[type], ConstructorFn]]
   _urn_to_env_cls = {}  # type: Dict[str, type]
 
-  def __init__(self,
+  def __init__(

Review Comment:
   Again, the huge number of irrelevant whitespace changes makes it hard to see 
what the actual (subtle) relevant bits are. 



##########
sdks/python/apache_beam/runners/worker/bundle_processor.py:
##########
@@ -1112,6 +1131,9 @@ def process_bundle(self, instruction_id):
           elif isinstance(element, beam_fn_api_pb2.Elements.Data):
             input_op_by_transform_id[element.transform_id].process_encoded(
                 element.data)
+          # Since we have processed this element, we are now ready to

Review Comment:
   this bundle of elements



##########
sdks/python/apache_beam/runners/worker/sdk_worker.py:
##########
@@ -745,18 +746,29 @@ def process_bundle_progress(
           instruction_id=instruction_id, error=traceback.format_exc())
     if processor:
       monitoring_infos = processor.monitoring_infos()
+      consuming_received_data = \
+        processor.consuming_received_data
+      return beam_fn_api_pb2.InstructionResponse(

Review Comment:
   Don't duplicate the whole return; just set `consuming_received_data = None` 
(or `False`) just as was done with monitoring infos. 
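
   Roughly the shape being suggested, as a self-contained sketch. `ProgressResponse` and `FakeProcessor` are hypothetical stand-ins for the real proto and bundle processor types, and the empty-list default for the monitoring infos is an assumption about the existing code:

```python
from dataclasses import dataclass, field


@dataclass
class ProgressResponse:
    """Hypothetical stand-in for the progress response proto."""
    monitoring_infos: list = field(default_factory=list)
    consuming_received_data: bool = False


@dataclass
class FakeProcessor:
    """Hypothetical stand-in for a bundle processor."""
    consuming_received_data: bool = True

    def monitoring_infos(self):
        return ["info"]


def process_bundle_progress(processor):
    # Pick values per branch, then build the response exactly once.
    if processor:
        monitoring_infos = processor.monitoring_infos()
        consuming_received_data = processor.consuming_received_data
    else:
        monitoring_infos = []
        consuming_received_data = False
    return ProgressResponse(
        monitoring_infos=monitoring_infos,
        consuming_received_data=consuming_received_data)


without_processor = process_bundle_progress(None)
with_processor = process_bundle_progress(FakeProcessor())
```

   The assignment also fits on one line here, so no backslash continuation is needed.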



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
