arunpandianp commented on code in PR #38988:
URL: https://github.com/apache/beam/pull/38988#discussion_r3423000063
##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFn.java:
##########
Review Comment:
There could be keys with only timers and no elements. Do we need to call
`onStartKey` in `processTimers` as well?
##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingKeyedWorkItemSideInputParDoFn.java:
##########
@@ -119,7 +120,9 @@ protected void onStartKey() {
if (sideInputProcessor != null) {
boolean hasState = helpers.hasState();
- // TODO(relax): We should be able to get this without writing it to state!
+ // TODO(relax): We should be able to get this without writing it to state! To make this work,
Review Comment:
`finishKey` has the key. If we move `onStartKey` to `finishKey`, we should be able
to make this optimization.
##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFn.java:
##########
@@ -196,12 +199,15 @@ public void processTimers() throws Exception {
}
@Override
- public void finishKey(Object key) throws Exception {}
+ public void finishKey(Object key) throws Exception {
+ this.activeKey = false;
Review Comment:
Should we call `sideInputFetcher.persist()` in `finishKey`?
##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFn.java:
##########
@@ -196,12 +199,15 @@ public void processTimers() throws Exception {
}
@Override
- public void finishKey(Object key) throws Exception {}
+ public void finishKey(Object key) throws Exception {
+ this.activeKey = false;
Review Comment:
IIUC there could be keys with no elements or timers. Should we move the
`onStartKey` call here to handle such keys?
Instead of calling `onStartKey` in `startBundle` and `processElements`, we could
call it in `processElements`, `processTimers`, and `finishKey`.
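
To make the suggestion concrete, here is a minimal sketch of the lazy per-key start. It assumes an `activeKey` flag like the one in the diff. `LazyStartKeySketch`, `ensureKeyStarted`, and the `log` list are hypothetical names used only for illustration; this is not the real `SimpleParDoFn`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a keyed ParDoFn; not the real SimpleParDoFn. */
class LazyStartKeySketch {
  // Records lifecycle calls so the ordering can be inspected.
  final List<String> log = new ArrayList<>();
  private boolean activeKey = false;

  // Runs the per-key setup at most once per key (stands in for onStartKey).
  private void ensureKeyStarted() {
    if (!activeKey) {
      activeKey = true;
      log.add("onStartKey");
    }
  }

  void processElement(Object elem) {
    ensureKeyStarted();
    log.add("processElement");
  }

  void processTimers() {
    // Covers keys that have timers but no elements.
    ensureKeyStarted();
    log.add("processTimers");
  }

  void finishKey(Object key) {
    // Covers keys that had neither elements nor timers.
    ensureKeyStarted();
    log.add("finishKey");
    activeKey = false; // the next key triggers the setup again
  }

  public static void main(String[] args) {
    LazyStartKeySketch fn = new LazyStartKeySketch();
    fn.processTimers(); // timer-only key
    fn.finishKey("k1");
    fn.finishKey("k2"); // key with no elements or timers
    // prints [onStartKey, processTimers, finishKey, onStartKey, finishKey]
    System.out.println(fn.log);
  }
}
```

With this shape the setup runs exactly once per key regardless of which callback fires first, and `startBundle` no longer needs to call it.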