[GitHub] zookeeper issue #580: ZOOKEEPER-3098: Add additional server metrics

jtuple Fri, 07 Sep 2018 17:51:49 -0700

Github user jtuple commented on the issue:

    https://github.com/apache/zookeeper/pull/580
  
    Sorry for the delay, August was a super busy month.
    
    I've rebased onto latest master + added basic unit tests for 
`AvgMinMaxCounter` and `SimpleCounter`. I also resolved the findbugs issue and 
the `testMonitor` unit tests. Jenkin still seems unhappy, but I'm not sure 
what's up -- the linked test results show no failures.
    
    I've also addressed the majority of inline comments. Please let me know if 
there's anything else I overlooked.
    
    ---
    
    As mentioned in a prior comment, this pull request is just the first of 
many. In this PR we add the `AvgMinMaxCounter` and `SimpleCounter` metrics.
    
    We have additional metrics internally that we'll upstream in the future, 
including `AvgMinMaxPercentileCounter` as well as set variants: 
`AvgMinMaxCounterSet` and `AvgMinMaxPercentileCounterSet`. I talked about both 
of these in [a previous 
comment](https://github.com/apache/zookeeper/pull/580#issuecomment-410340039)
    
    Those are all the types we currently have internally. Other types like a 
Gauge could easily be done in a future pull-request, although our experience 
has been that gauge's are less useful in a polling based approach since you 
only see the most recent value. Something like an `AvgMinMaxCounter` metric 
gives you more information when there are multiple events, and trivially 
reduces to a gauge-equivalent when there's only a single value during the 
sampling interval.
    
    For things like commit processor queue sizes and the like, we simply query 
the queue size every time we enqueue an item and report that instantaneous size 
as a value in an `AvgMinMaxCounter`. When we periodically query metrics, we 
then know the min/max/avg queue size during the interval, and as well as the 
population count which is equivalent to "number of enqueues" during the 
interval.
    
    ---
    
    @hanm: For your example, you have a few options:
    
    1. Do as you mentioned and add a new metric to the `ServerMetric` enum for 
each request type.
    
    2. Once we land the set-variants, you could add a single metric to the 
`ServerMetric` enum and rely upon the set logic. Eg. add a `REQUEST_LATENCY(new 
AvgMinMaxCounterSet("request_latency"))` metric and then do 
`ServerMetrics.REQUEST_LATENCY.add("create_session", latency)`, 
`ServerMetrics.REQUEST_LATENCY.add("set_data", latency)`, etc. For your 
example, you wanted to track counts so you'd probably want a `SimpleCounterSet` 
which we don't have, but would be easy to create.
    
    3. Use something custom. For example, the majority of our internal metrics 
build on `ServerMetrics`, but we do have a handful that don't. One example that 
we'll be upstreaming soon is our so-called "request state" metrics that track 
the count for requests at different stages in the request pipeline. We maintain 
a count of requests queued, issued, and completed for different category of 
requests (all, read, write, create_session, close_session, sync, auth, 
set_watches, other). This matrix is also added to the output of `/metrics` but 
doesn't use any of the `ServerMetrics` logic.

---

[GitHub] zookeeper issue #580: ZOOKEEPER-3098: Add additional server metrics

Reply via email to