robertwb commented on code in PR #31420:
URL: https://github.com/apache/beam/pull/31420#discussion_r1618006764
##########
sdks/java/core/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataInboundObserver.java:
##########
@@ -119,7 +127,12 @@ public void close() throws Exception {
public void awaitCompletion() throws Exception {
try {
while (true) {
+ // The SDK is available to process data right before it is ready to take elements off the
+ // queue.
+ consumingReceivedData.set(false);
BeamFnApi.Elements elements = queue.take();
+ // The SDK is now no longer available to receive more data, so we set it to false.
Review Comment:
...no longer blocked on receiving more data...
##########
model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto:
##########
@@ -1672,6 +1672,10 @@ message StandardProtocols {
// during bundle processing. This is disabled by default and enabled with the
// `enable_data_sampling` experiment.
DATA_SAMPLING = 8 [(beam_urn) = "beam:protocol:data_sampling:v1"];
+
+ // Indicates if the SDK is currently consuming received data.
Review Comment:
Indicates whether the SDK sets the `consuming_received_data` bit on progress
response messages.
##########
model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto:
##########
@@ -503,6 +503,8 @@ message ProcessBundleProgressResponse {
// as the MonitoringInfo could be reconstructed fully by overwriting its
// payload field with the bytes specified here.
map<string, bytes> monitoring_data = 5;
+ // Indicates if the SDK is consuming received data or not.
Review Comment:
We should probably be more explicit about what this actually means than just
paraphrasing its name.
For example,
Indicates that the SDK is still busy consuming the data that has already been
received on the data channel. If this is set, a runner may abstain from sending
further data on the data channel until this field becomes unset.
##########
sdks/java/core/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataInboundObserver.java:
##########
@@ -119,7 +127,12 @@ public void close() throws Exception {
public void awaitCompletion() throws Exception {
try {
while (true) {
+ // The SDK is available to process data right before it is ready to take elements off the
Review Comment:
The SDK indicates it's consumed all received data before it attempts to take
more elements off the queue.
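To make the intended ordering concrete, here is a minimal sketch in Python rather than the PR's Java. `InboundObserver`, its `put` method, and the `None` end-of-stream sentinel are hypothetical stand-ins, not Beam APIs. The flag is cleared just before blocking on the queue and set again as soon as a batch has been taken:

```python
import queue
import threading


class InboundObserver:
    """Hypothetical stand-in for the inbound observer; not Beam's actual API."""

    def __init__(self):
        self._queue = queue.Queue()
        self.consuming_received_data = threading.Event()
        # Records (batch, flag value while the batch was being handled).
        self.processed = []

    def put(self, elements):
        self._queue.put(elements)

    def await_completion(self):
        while True:
            # All data received so far has been consumed, so clear the flag
            # before attempting to take more elements off the queue.
            self.consuming_received_data.clear()
            elements = self._queue.get()
            if elements is None:  # hypothetical end-of-stream sentinel
                return
            # A batch is in hand: the SDK is busy consuming received data.
            self.consuming_received_data.set()
            self.processed.append(
                (elements, self.consuming_received_data.is_set()))
```

With this ordering the flag is false whenever the loop is blocked in `get()`, which is the state the comment should describe.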
##########
sdks/go/pkg/beam/core/runtime/exec/datasource.go:
##########
@@ -151,6 +157,8 @@ func (n *DataSource) process(ctx context.Context, data func(bcr *byteCountReader
// io.EOF means the reader successfully drained.
// We're ready for a new buffer.
case <-ctx.Done():
+ // now that it is done processing received data, we set it to false.
Review Comment:
Wouldn't this have already been set to false on entering the loop? (I assume
the cases here are mutually exclusive...) Or is this just to ensure we never
exit this function without resetting this value?
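If the goal is the latter, a single reset on exit may state the intent more directly than a reset inside one case. A rough sketch in Python (in Go this would presumably be a `defer`); the `DataSource` class and the `handle` callback here are hypothetical and only illustrate the shape:

```python
class DataSource:
    """Hypothetical stand-in; only illustrates resetting the flag on exit."""

    def __init__(self):
        self.consuming_received_data = False

    def process(self, buffers, handle):
        try:
            for buf in buffers:
                # Busy while a received buffer is being handled.
                self.consuming_received_data = True
                handle(buf)
                # Ready for a new buffer.
                self.consuming_received_data = False
        finally:
            # Runs on normal return, early exit, and error alike, so the
            # flag can never be left set once this function has exited.
            self.consuming_received_data = False
```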
##########
sdks/python/apache_beam/runners/worker/bundle_processor.py:
##########
@@ -123,16 +123,17 @@
class RunnerIOOperation(operations.Operation):
"""Common baseclass for runner harness IO operations."""
- def __init__(self,
- name_context, # type: common.NameContext
- step_name, # type: Any
- consumers, # type: Mapping[Any, Iterable[operations.Operation]]
- counter_factory, # type: counters.CounterFactory
- state_sampler, # type: statesampler.StateSampler
- windowed_coder, # type: coders.Coder
- transform_id, # type: str
- data_channel # type: data_plane.DataChannel
- ):
+ def __init__(
Review Comment:
Undo this whitespace refactoring (or move to another PR)?
##########
sdks/python/apache_beam/runners/worker/sdk_worker.py:
##########
@@ -745,18 +746,29 @@ def process_bundle_progress(
instruction_id=instruction_id, error=traceback.format_exc())
if processor:
monitoring_infos = processor.monitoring_infos()
+ consuming_received_data = \
Review Comment:
Don't manually break line.
##########
sdks/python/apache_beam/transforms/environments.py:
##########
@@ -111,11 +111,12 @@ class Environment(object):
_known_urns = {} # type: Dict[str, Tuple[Optional[type], ConstructorFn]]
_urn_to_env_cls = {} # type: Dict[str, type]
- def __init__(self,
+ def __init__(
Review Comment:
Again, the huge number of irrelevant whitespace changes makes it hard to see
what the actual (subtle) relevant bits are.
##########
sdks/python/apache_beam/runners/worker/bundle_processor.py:
##########
@@ -1112,6 +1131,9 @@ def process_bundle(self, instruction_id):
elif isinstance(element, beam_fn_api_pb2.Elements.Data):
input_op_by_transform_id[element.transform_id].process_encoded(
element.data)
+ # Since we have processed this element, we are now ready to
Review Comment:
this bundle of elements
##########
sdks/python/apache_beam/runners/worker/sdk_worker.py:
##########
@@ -745,18 +746,29 @@ def process_bundle_progress(
instruction_id=instruction_id, error=traceback.format_exc())
if processor:
monitoring_infos = processor.monitoring_infos()
+ consuming_received_data = \
+ processor.consuming_received_data
+ return beam_fn_api_pb2.InstructionResponse(
Review Comment:
Don't duplicate the whole return; just set `consuming_received_data = None`
(or `False`) just as was done with monitoring infos.
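Roughly the shape I have in mind, as a self-contained sketch. `ProgressResponse` and `build_progress_response` are hypothetical stand-ins for the real proto message and handler, and the default values are assumptions:

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class ProgressResponse:
    """Hypothetical stand-in for ProcessBundleProgressResponse."""
    monitoring_infos: List[Any] = field(default_factory=list)
    consuming_received_data: bool = False


def build_progress_response(processor: Optional[Any]) -> ProgressResponse:
    if processor:
        monitoring_infos = processor.monitoring_infos()
        consuming_received_data = processor.consuming_received_data
    else:
        # Defaults when there is no active processor (assumed values).
        monitoring_infos = []
        consuming_received_data = False
    # One return statement serves both branches.
    return ProgressResponse(
        monitoring_infos=monitoring_infos,
        consuming_received_data=consuming_received_data)
```

Both branches only assign locals, so the response is constructed in exactly one place.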