Github user zd-project commented on a diff in the pull request:
https://github.com/apache/storm/pull/2764#discussion_r208751049
--- Diff: storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java ---
@@ -2826,9 +2915,22 @@ public void launchServer() throws Exception {
.parallelStream()
.mapToDouble(SupervisorResources::getTotalCpu)
.sum());
-
+
StormMetricsRegistry.registerGauge("nimbus:longest-scheduling-time-ms", () -> {
+ Long currTime = Time.nanoTime();
+ Long startTime = schedulingStartTime.get();
+ //There could be race condition here but seems trivial, elapsed is
+ // guaranteed to be no longer than real elapsed time of scheduling
+ Long longest = longestSchedulingTime.get();
+ if (startTime != null) {
+ longest = currTime - startTime > longest ? currTime - startTime : longest;
--- End diff --
We would like to track both the distribution of scheduler latency and the
longest scheduling iteration. If the scheduler gets stuck in the middle of an
iteration, the histogram won't reflect it until that iteration ends, because
the timer only records the duration of a completed cycle. Hence I added this
gauge, which also accounts for the iteration currently in progress, so the
longest scheduling time is visible in real time.
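A minimal standalone sketch of the idea, assuming a hypothetical `SchedulingTimeTracker` class (not part of Storm): the gauge value is the maximum of the longest *completed* iteration and the elapsed time of the iteration still running, if any. Timestamps are passed in explicitly so the logic can be tested; in Nimbus they would come from `Time.nanoTime()` as in the diff.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper: longest scheduling iteration, including one still in progress. */
public class SchedulingTimeTracker {
    // Start time (ns) of the iteration currently running, or null when idle.
    private final AtomicReference<Long> schedulingStartTime = new AtomicReference<>(null);
    // Longest completed iteration observed so far (ns).
    private final AtomicLong longestSchedulingTime = new AtomicLong(0L);

    public void startIteration(long nowNanos) {
        schedulingStartTime.set(nowNanos);
    }

    public void endIteration(long nowNanos) {
        // Clear the in-flight marker first. A gauge read between these two steps may
        // briefly miss the just-finished iteration, so it can under-report but never
        // report more than the real elapsed time.
        Long start = schedulingStartTime.getAndSet(null);
        if (start != null) {
            longestSchedulingTime.accumulateAndGet(nowNanos - start, Math::max);
        }
    }

    /** Value a gauge would report: max of completed iterations and the in-flight one. */
    public long longestNanos(long nowNanos) {
        long longest = longestSchedulingTime.get();
        Long start = schedulingStartTime.get();
        if (start != null) {
            longest = Math.max(longest, nowNanos - start);
        }
        return longest;
    }
}
```

The gauge lambda would then reduce to something like `() -> tracker.longestNanos(Time.nanoTime())`. Since the metric is named `nimbus:longest-scheduling-time-ms` while `nanoTime()` returns nanoseconds, wrapping the result in `TimeUnit.NANOSECONDS.toMillis(...)` would keep the unit consistent with the name.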
---