y1chi commented on code in PR #26085:
URL: https://github.com/apache/beam/pull/26085#discussion_r1210574213
##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/windmill/GrpcWindmillServer.java:
##########
@@ -969,6 +987,76 @@ protected void startThrottleTimer() {
getWorkThrottleTimer.start();
}
+ private class GetWorkTimingInfosTracker {
+ private final Map<State, Duration> getWorkStreamLatencies;
+
+ public GetWorkTimingInfosTracker() {
+ this.getWorkStreamLatencies = new EnumMap<>(State.class);
+ }
+
+ public void addTimingInfo(Collection<GetWorkStreamTimingInfo> infos) {
+ // We want to record duration for each stage and also be reflective on total work item
+ // processing time. It can be tricky because timings of different
+ // StreamingGetWorkResponseChunks can be interleaved. Current strategy is to record the
+ // maximum duration in each stage across different chunks, this will allow us to identify
+ // the slow stage, but note the sum duration of each slowest stages may be larger than the
+ // duration from first chunk creation to last chunk reception by user worker.
Review Comment:
Do you mind elaborating on how to scale them? Do you mean applying a multiplier?
From testing, it looks like GetWork creation and transmission are on the order
of 0 ~ 200 ms per work item (unless there is significant delay). They are
likely incomparable to the latencies measured on the user worker anyway, which
are most likely a few seconds to a few minutes.
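For reference, the "record the maximum duration per stage across interleaved chunks" strategy described in the diff comment can be sketched as below. This is a minimal, self-contained illustration, not the actual Beam code: the `Stage` enum and class names here are hypothetical stand-ins for Windmill's `GetWorkStreamTimingInfo` states.

```java
import java.time.Duration;
import java.util.EnumMap;
import java.util.Map;

public class MaxStageLatencyTracker {
  // Hypothetical stages, standing in for the States reported by
  // GetWorkStreamTimingInfo (e.g. work creation vs. transmission).
  enum Stage { GET_WORK_CREATION, GET_WORK_TRANSMISSION }

  // Per-stage maximum duration observed so far, across all chunks.
  private final Map<Stage, Duration> maxLatencies = new EnumMap<>(Stage.class);

  // Chunks for different work items may arrive interleaved, so instead of
  // summing we keep the maximum duration seen for each stage; this surfaces
  // the slowest stage, at the cost that the per-stage maxima may sum to more
  // than the true end-to-end time.
  public void record(Stage stage, Duration latency) {
    maxLatencies.merge(
        stage, latency, (a, b) -> a.compareTo(b) >= 0 ? a : b);
  }

  public Duration maxFor(Stage stage) {
    return maxLatencies.getOrDefault(stage, Duration.ZERO);
  }

  public static void main(String[] args) {
    MaxStageLatencyTracker tracker = new MaxStageLatencyTracker();
    tracker.record(Stage.GET_WORK_CREATION, Duration.ofMillis(50));
    // A later, slower chunk for the same stage raises the recorded maximum.
    tracker.record(Stage.GET_WORK_CREATION, Duration.ofMillis(120));
    System.out.println(tracker.maxFor(Stage.GET_WORK_CREATION).toMillis());
  }
}
```

Running `main` prints `120`: only the largest duration per stage is retained, which is what makes the slowest stage identifiable even when chunk timings overlap.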
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]