arunpandianp commented on code in PR #38988:
URL: https://github.com/apache/beam/pull/38988#discussion_r3431554183
##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/PartialGroupByKeyParDoFns.java:
##########
@@ -378,15 +394,29 @@ public void processElement(Object elem) throws Exception {
}
@Override
- public void processTimers() {}
+ public void processTimers() throws Exception {
+ if (!activeKey) {
+ onStartKey(null);
+ }
+ }
@Override
- public void finishKey(Object key) throws Exception {}
+ public void finishKey(Object key) throws Exception {
+ if (!activeKey) {
+ onStartKey((K) key);
+ }
+ sideInputFetcher.persist();
+ sideInputFetcher = null;
+ this.activeKey = false;
+ }
@Override
public void finishBundle() throws Exception {
groupingTable.flush(receiver);
- sideInputFetcher.persist();
+ if (sideInputFetcher != null) {
+ sideInputFetcher.persist();
+ }
+ this.activeKey = false;
Review Comment:
nit: `this.activeKey = false;` is redundant with the reset in `finishKey` and can be removed.
##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFnHelpers.java:
##########
@@ -103,6 +104,8 @@ class SimpleParDoFnHelpers<InputT, OutputT, W extends BoundedWindow> {
// This may additionally be null if it is not a real DoFn but an OldDoFn or
// GroupAlsoByWindowViaWindowSetDoFn
protected @Nullable DoFnSignature fnSignature;
+ boolean activeKey = false;
+ Consumer<K> onStartKey;
Review Comment:
Can these be made `private`? `onStartKey` can also be `final`.
##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingSideInputDoFnRunner.java:
##########
@@ -49,6 +50,10 @@ public StreamingSideInputDoFnRunner(
@Override
public void startBundle() {
simpleDoFnRunner.startBundle();
+ this.activeKey = false;
+ }
+
+ private void tryUnblockElements() {
sideInputProcessor.tryUnblockElements(
Review Comment:
Do we need to recreate `sideInputProcessor` for every key, as in the other classes?
##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFnHelpers.java:
##########
@@ -233,18 +243,29 @@ public <TagT> void output(TupleTag<TagT> tag,
WindowedValue<TagT> output) {
fnRunner.startBundle();
}
+ void finishKey(StreamingSideInputProcessor<?, ?> sideInputProcessor) {
+ if (!activeKey) {
+ // This means that there were no elements for this key. Try to unblock any queued elements.
+ onStartKey.accept((K) stepContext.stateInternals().getKey());
+ }
+ if (sideInputProcessor != null) {
+ sideInputProcessor.handleFinishKeyOrBundle();
+ }
+ this.activeKey = false;
+ }
+
void finishBundle(StreamingSideInputProcessor<?, ?> sideInputProcessor)
throws Exception {
if (fnRunner != null) {
fnRunner.finishBundle();
if (sideInputProcessor != null) {
Review Comment:
I think the `handleFinishKeyOrBundle` calls can be removed from `finishBundle`.
##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFnHelpers.java:
##########
@@ -145,6 +149,11 @@ class SimpleParDoFnHelpers<InputT, OutputT, W extends
BoundedWindow> {
this.outputsPerElementTracker = createOutputsPerElementTracker();
this.doFnSchemaInformation = doFnSchemaInformation;
this.sideInputMapping = sideInputMapping;
+ this.onStartKey =
+ k -> {
+ onStartKey.accept(k);
+ this.activeKey = false;
Review Comment:
Should this be `this.activeKey = true;`?
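A minimal sketch of the intent, assuming the wrapper is meant to mark the key as started. `ActiveKeyGuard`, `startKeyIfNeeded`, and `isActive` are hypothetical names for illustration only, not part of the PR or of Beam:

```java
import java.util.function.Consumer;

/** Hypothetical stand-in for the helper class; not the actual Beam code. */
final class ActiveKeyGuard<K> {
  private boolean activeKey = false;
  private final Consumer<K> onStartKey;

  ActiveKeyGuard(Consumer<K> delegate) {
    // Wrap the caller's callback so the flag becomes true once the key has started.
    this.onStartKey =
        k -> {
          delegate.accept(k);
          this.activeKey = true;
        };
  }

  /** Runs the start-key callback only if the current key has not been started yet. */
  void startKeyIfNeeded(K key) {
    if (!activeKey) {
      onStartKey.accept(key);
    }
  }

  /** Starts the key if it saw no elements, then resets the flag for the next key. */
  void finishKey(K key) {
    startKeyIfNeeded(key);
    activeKey = false;
  }

  boolean isActive() {
    return activeKey;
  }
}
```

With the flag set to `false` in the wrapper instead, every `if (!activeKey)` check would pass and the callback would run again for a key that is already active.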
##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/PartialGroupByKeyParDoFns.java:
##########
@@ -378,15 +394,29 @@ public void processElement(Object elem) throws Exception {
}
@Override
- public void processTimers() {}
+ public void processTimers() throws Exception {
+ if (!activeKey) {
+ onStartKey(null);
Review Comment:
Can we pass the key from `stepContext` instead of `null`?
##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingSideInputDoFnRunner.java:
##########
@@ -38,6 +38,7 @@ public class StreamingSideInputDoFnRunner<InputT, OutputT, W
extends BoundedWind
implements DoFnRunner<InputT, OutputT> {
private final DoFnRunner<InputT, OutputT> simpleDoFnRunner;
private final StreamingSideInputProcessor<InputT, W> sideInputProcessor;
+ boolean activeKey = false;
Review Comment:
`activeKey` can be made `private`.
##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingKeyedWorkItemSideInputParDoFn.java:
##########
@@ -208,7 +190,7 @@ public void abort() throws Exception {
}
protected void onProcessWindowedValue(WindowedValue<KeyedWorkItem<K, InputT>> elem) {
- // TODO: Get rid of this!
+ // TODO: Get rid of this once we know the current key.
final K key = elem.getValue().key();
Review Comment:
Can we remove `keyValue().write(key);`?
If it is kept for update compatibility, the comment needs to be updated, and maybe
add a TODO to remove it on newer jobs with the updateCompatibility flag.
##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFnHelpers.java:
##########
@@ -299,17 +338,15 @@ void processTimers(
TimerType mode,
DataflowExecutionContext.DataflowStepContext context,
Coder<BoundedWindow> windowCoder,
- Runnable startKey,
Supplier<StreamingSideInputProcessor<?, ?>> sideInputProcessor)
throws Exception {
TimerInternals.TimerData timer = context.getNextFiredTimer(windowCoder);
-
if (timer != null && fnRunner == null) {
// If we need to run reallyStartBundle in here, we need to make sure to switch the state
// sampler into the start state.
try (Closeable start = operationContext.enterStart()) {
reallyStartBundle();
- startKey.run();
+ this.onStartKey.accept((K) context.stateInternals().getKey());
Review Comment:
This should be under an `if (!activeKey) {` check.
In processElements, `onStartKey` is called outside
`operationContext.enterStart()`, but here it is called inside
`operationContext.enterStart()`. Move it outside the `enterStart` block?
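A rough sketch of the suggested ordering, under simplifying assumptions: `TimerStartSketch`, `Scope`, and the `events` log are hypothetical stand-ins, and the `fnRunner == null` condition from the real method is omitted. It only illustrates running the start-key callback after the start scope closes, guarded by the flag:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the actual SimpleParDoFnHelpers code. */
final class TimerStartSketch {
  /** Hypothetical scope type whose close() throws no checked exception. */
  interface Scope extends AutoCloseable {
    @Override
    void close();
  }

  final List<String> events = new ArrayList<>();
  private boolean activeKey = false;

  // Stand-in for operationContext.enterStart(): records entry and exit of the start state.
  private Scope enterStart() {
    events.add("enterStart");
    return () -> events.add("exitStart");
  }

  void processTimers(String key) {
    try (Scope start = enterStart()) {
      // Only bundle start-up work runs inside the start scope.
      events.add("reallyStartBundle");
    }
    // The start-key callback runs outside the scope, and only once per active key.
    if (!activeKey) {
      events.add("onStartKey:" + key);
      activeKey = true;
    }
  }
}
```

This matches the ordering described for processElements, where the callback is not attributed to the start state.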