robertwb commented on code in PR #31420:
URL: https://github.com/apache/beam/pull/31420#discussion_r1618006764

##########
sdks/java/core/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataInboundObserver.java:
##########
@@ -119,7 +127,12 @@ public void close() throws Exception {
   public void awaitCompletion() throws Exception {
     try {
       while (true) {
+        // The SDK is available to process data right before it is ready to take elements off the
+        // queue.
+        consumingReceivedData.set(false);
         BeamFnApi.Elements elements = queue.take();
+        // The SDK is now no longer available to receive more data, so we set it to false.

Review Comment:
   ...no longer blocked on receiving more data...

##########
model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto:
##########
@@ -1672,6 +1672,10 @@ message StandardProtocols {
     // during bundle processing. This is disabled by default and enabled with the
     // `enable_data_sampling` experiment.
     DATA_SAMPLING = 8 [(beam_urn) = "beam:protocol:data_sampling:v1"];
+
+    // Indicates if the SDK is currently consuming received data.

Review Comment:
   Indicates whether the SDK sets the `consuming_received_data` bit on progress response messages.

##########
model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto:
##########
@@ -503,6 +503,8 @@ message ProcessBundleProgressResponse {
   // as the MonitoringInfo could be reconstructed fully by overwriting its
   // payload field with the bytes specified here.
   map<string, bytes> monitoring_data = 5;
+  // Indicates if the SDK is consuming received data or not.

Review Comment:
   We should probably be more explicit about what this actually means than just paraphrasing its name. For example:

   Indicates that the SDK is still busy consuming the data that has already been received on the data channel. If this is set, a runner may abstain from sending further data on the data channel until this field becomes unset.
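
To make the wording suggested for `consuming_received_data` concrete, here is a minimal sketch of those semantics. It is not Beam code: `InboundObserver`, `process_one`, and `runner_may_send` are hypothetical names, and resetting the flag in a `finally` block is an assumption about one reasonable way to guarantee it never stays set on exit. The flag is cleared right before blocking on the queue and set as soon as data is in hand; a runner that sees it set may hold off on sending more.

```python
import queue
import threading


class InboundObserver:
    """Hypothetical stand-in for an SDK-side data consumer (not Beam's class)."""

    def __init__(self):
        self._queue = queue.Queue()
        # Set while the SDK is still busy with data it has already received.
        self._consuming = threading.Event()

    @property
    def consuming_received_data(self):
        return self._consuming.is_set()

    def put(self, elements):
        self._queue.put(elements)

    def process_one(self, handler):
        # Everything received so far has been consumed, so clear the flag
        # before blocking on the queue for more elements.
        self._consuming.clear()
        elements = self._queue.get()
        # Data is in hand: report "busy" until the handler is done.
        self._consuming.set()
        try:
            handler(elements)
        finally:
            # Assumption: never leave this method with the flag still set.
            self._consuming.clear()


def runner_may_send(consuming_received_data):
    """Hypothetical runner-side check: hold back while the SDK is busy."""
    return not consuming_received_data
```

A progress request that arrives while `handler` is running would observe `consuming_received_data == True`; one that arrives while the consumer is blocked in `get()` would observe `False`.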

##########
sdks/java/core/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataInboundObserver.java:
##########
@@ -119,7 +127,12 @@ public void close() throws Exception {
   public void awaitCompletion() throws Exception {
     try {
       while (true) {
+        // The SDK is available to process data right before it is ready to take elements off the

Review Comment:
   The SDK indicates it's consumed all received data before it attempts to take more elements off the queue.

##########
sdks/go/pkg/beam/core/runtime/exec/datasource.go:
##########
@@ -151,6 +157,8 @@ func (n *DataSource) process(ctx context.Context, data func(bcr *byteCountReader
       // io.EOF means the reader successfully drained.
       // We're ready for a new buffer.
     case <-ctx.Done():
+      // now that it is done processing received data, we set it to false.

Review Comment:
   Wouldn't this have already been set to false on entering the loop? (I assume the cases here are mutually exclusive...) Or is this just to ensure we never exit this function without resetting this value?

##########
sdks/python/apache_beam/runners/worker/bundle_processor.py:
##########
@@ -123,16 +123,17 @@ class RunnerIOOperation(operations.Operation):
   """Common baseclass for runner harness IO operations."""
 
-  def __init__(self,
-               name_context,  # type: common.NameContext
-               step_name,  # type: Any
-               consumers,  # type: Mapping[Any, Iterable[operations.Operation]]
-               counter_factory,  # type: counters.CounterFactory
-               state_sampler,  # type: statesampler.StateSampler
-               windowed_coder,  # type: coders.Coder
-               transform_id,  # type: str
-               data_channel  # type: data_plane.DataChannel
-              ):
+  def __init__(

Review Comment:
   Undo this whitespace refactoring (or move to another PR)?

##########
sdks/python/apache_beam/runners/worker/sdk_worker.py:
##########
@@ -745,18 +746,29 @@ def process_bundle_progress(
             instruction_id=instruction_id, error=traceback.format_exc())
       if processor:
         monitoring_infos = processor.monitoring_infos()
+        consuming_received_data = \

Review Comment:
   Don't manually break the line.

##########
sdks/python/apache_beam/transforms/environments.py:
##########
@@ -111,11 +111,12 @@ class Environment(object):
   _known_urns = {}  # type: Dict[str, Tuple[Optional[type], ConstructorFn]]
   _urn_to_env_cls = {}  # type: Dict[str, type]
 
-  def __init__(self,
+  def __init__(

Review Comment:
   Again, the huge number of irrelevant whitespace changes makes it hard to see what the actual (subtle) relevant bits are.

##########
sdks/python/apache_beam/runners/worker/bundle_processor.py:
##########
@@ -1112,6 +1131,9 @@ def process_bundle(self, instruction_id):
           elif isinstance(element, beam_fn_api_pb2.Elements.Data):
             input_op_by_transform_id[element.transform_id].process_encoded(
                 element.data)
+        # Since we have processed this element, we are now ready to

Review Comment:
   this bundle of elements

##########
sdks/python/apache_beam/runners/worker/sdk_worker.py:
##########
@@ -745,18 +746,29 @@ def process_bundle_progress(
             instruction_id=instruction_id, error=traceback.format_exc())
       if processor:
         monitoring_infos = processor.monitoring_infos()
+        consuming_received_data = \
+            processor.consuming_received_data
+        return beam_fn_api_pb2.InstructionResponse(

Review Comment:
   Don't duplicate the whole return; just set `consuming_received_data = None` (or `False`) just as was done with monitoring infos.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
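
For the comment about not duplicating the return in `process_bundle_progress`, the shape being asked for might look roughly like the sketch below. `ProgressResponse`, `FakeProcessor`, and `build_progress_response` are hypothetical stand-ins used only for illustration, not the actual `beam_fn_api_pb2` messages or worker code. The point is that both branches only assign values, with a default for the no-processor case, and a single shared return builds the response.

```python
from dataclasses import dataclass, field


@dataclass
class ProgressResponse:
    """Hypothetical stand-in for the progress response message."""
    monitoring_infos: list = field(default_factory=list)
    consuming_received_data: bool = False


class FakeProcessor:
    """Hypothetical processor exposing the two values the response needs."""
    consuming_received_data = True

    def monitoring_infos(self):
        return ["info"]


def build_progress_response(processor):
    if processor:
        monitoring_infos = processor.monitoring_infos()
        consuming_received_data = processor.consuming_received_data
    else:
        # Defaults for the no-processor case, mirroring how the
        # monitoring infos are defaulted, so the return stays shared.
        monitoring_infos = []
        consuming_received_data = False
    # Single return path: nothing is duplicated across the branches.
    return ProgressResponse(
        monitoring_infos=monitoring_infos,
        consuming_received_data=consuming_received_data)
```

Whether the default should be `None` or `False` depends on how the real proto field is declared; the sketch uses `False` only because the stand-in field is a plain bool.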