jack-moseley commented on a change in pull request #3290:
URL: https://github.com/apache/gobblin/pull/3290#discussion_r640226670
##########
File path:
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java
##########
@@ -460,6 +463,10 @@ public synchronized void setActive(boolean active) {
       this.jobStatusPolledTimer =
           Optional.of(this.metricContext.timer(ServiceMetricNames.JOB_STATUS_POLLED_TIMER));
       ContextAwareGauge<Long> orchestrationDelayMetric =
           metricContext.newContextAwareGauge(ServiceMetricNames.FLOW_ORCHESTRATION_DELAY,
               () -> orchestrationDelay.get());
+      this.allSuccessfulMeter = metricContext.contextAwareMeter(
Review comment:
See
https://javadoc.io/doc/io.dropwizard.metrics/metrics-core/3.2.1/com/codahale/metrics/Meter.html
A Meter can return an "exponentially-weighted moving average rate" over the
past 1/5/15 minutes. That is not exactly a "number of flows": each time an
event occurs the meter's rate increases, and then it gradually decays back
toward 0 as time passes without new events.
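To make the decay behavior concrete, here is a rough, clock-free sketch of an exponentially weighted moving average rate. It is a simplified illustration and not the library's implementation: the class name `SimpleEwma`, the explicit `tick()` call, and the 5-second tick length are all assumptions chosen for the example.

```java
/** Simplified, clock-free sketch of an exponentially weighted moving average rate. */
public class SimpleEwma {
    private final double alpha;        // smoothing factor applied on every tick
    private final double tickSeconds;  // assumed length of one tick, in seconds
    private long uncounted = 0;        // events marked since the last tick
    private double rate = 0.0;         // current rate, in events per second
    private boolean initialized = false;

    public SimpleEwma(double windowMinutes, double tickSeconds) {
        this.tickSeconds = tickSeconds;
        // Standard EWMA smoothing factor for the given window and tick length.
        this.alpha = 1 - Math.exp(-tickSeconds / 60.0 / windowMinutes);
    }

    /** Record n events; they are folded into the rate on the next tick. */
    public void mark(long n) {
        uncounted += n;
    }

    /** Advance one tick: blend the rate seen during this tick into the average. */
    public void tick() {
        double instantRate = uncounted / tickSeconds;
        uncounted = 0;
        if (initialized) {
            rate += alpha * (instantRate - rate);
        } else {
            rate = instantRate;
            initialized = true;
        }
    }

    public double ratePerSecond() {
        return rate;
    }

    public static void main(String[] args) {
        SimpleEwma failures = new SimpleEwma(5.0, 5.0); // 5-minute window, 5-second ticks
        failures.tick();                                // idle tick, rate starts at 0
        failures.mark(20);                              // a burst of 20 failures
        failures.tick();
        System.out.printf("after burst: %.4f/s%n", failures.ratePerSecond());
        for (int i = 0; i < 60; i++) {
            failures.tick();                            // 60 idle ticks = 5 idle minutes
        }
        System.out.printf("5 min later: %.4f/s%n", failures.ratePerSecond());
    }
}
```

In this sketch, a burst shows up as a jump in the rate, and each idle tick multiplies the rate by a constant factor, so after one full idle window it has fallen to about 37% (e^-1) of its peak rather than to exactly 0.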
I thought it made more sense to use the existing meter concept rather than
try to build our own implementation of a meter out of a gauge that we reset
ourselves. And if we counted the number of flows as we discussed before, I
think the result would be confusing (if you see the number 20 on a graph, does
that mean 20 failures in the 5 minutes before that point, or 20 failures in a
fixed 5-minute interval?).
With this I think we can still look for spikes in failures on the graph, or
look at the ratio of success to failure meters to measure the health of the
system.
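As a rough sketch of the ratio idea, the helper below turns a success rate and a failure rate into a failure fraction. `FlowHealth` and `failureFraction` are hypothetical names, not part of this PR; in practice the two inputs could come from something like `Meter.getFiveMinuteRate()` on the success meter and on a corresponding failure meter.

```java
/** Hypothetical helper: share of recent flow executions that failed. */
public final class FlowHealth {
    private FlowHealth() {}

    /**
     * @param successRate moving-average rate of successful flows (events/sec)
     * @param failureRate moving-average rate of failed flows (events/sec)
     * @return failure fraction in [0, 1]; 0 when both rates are 0
     */
    public static double failureFraction(double successRate, double failureRate) {
        double total = successRate + failureRate;
        if (total <= 0.0) {
            return 0.0; // no recent activity, so there is nothing to alert on
        }
        return failureRate / total;
    }

    public static void main(String[] args) {
        // Example inputs only; real values would be read from the two meters.
        System.out.println(failureFraction(0.9, 0.1));
    }
}
```

Because both rates are averaged over the same window, the fraction stays comparable across quiet and busy periods, which a raw count would not.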