GitHub user jtuple opened a pull request:

    https://github.com/apache/zookeeper/pull/580

    ZOOKEEPER-3098: Add additional server metrics

    This patch adds several new server-side metrics and makes it easier to
    add new metrics in the future. It also includes a handful of other minor
    metrics-related changes.
    
    Here's a high-level summary of the changes.
    
    1. This patch extends the request latency tracked in `ServerStats` to
       track `read` and `update` latency separately. Updates are requests
       that must be voted on and can change data; reads are requests that
       can be handled locally and don't change data.
    
    2. This patch adds the `ServerMetrics` logic and the related
       `AvgMinMaxCounter` and `SimpleCounter` classes. This code is designed
       to make it easy to add new metrics: you add one line to `ServerMetrics`
       and then reference the new metric directly anywhere in the code base.
       The `ServerMetrics` logic handles creating the metric, adding it to the
       JSON output of the `/monitor` admin command, and resetting it when
       necessary.
    
       The motivation behind `ServerMetrics` is to make things easy enough
       that it encourages new metrics to be added liberally. Lack of in-depth
       metrics/visibility is a long-standing ZooKeeper weakness. At Facebook,
       most of our internal changes build on `ServerMetrics` and we have
       nearly 100 internal metrics at this time -- all of which we'll be
       upstreaming in the coming months as we publish more internal patches.
    
    3. This patch adds 20 new metrics, 14 of which are handled by `ServerMetrics`.
    
    4. This patch replaces some uses of `synchronized` in `ServerStats` with
       atomic operations.
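
To illustrate items 2 and 4 above, here is a minimal sketch of how lock-free counters and a one-line-per-metric registry could fit together. It is not the code from this patch: every type name ending in `Sketch`, the `MetricSketch` interface, and the `avg_`/`min_`/`max_` output keys are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical common shape for a metric; not taken from the patch.
interface MetricSketch {
    void add(long value);
    void reset();
    void dump(Map<String, Object> out);
}

// Counter backed by a single AtomicLong, so no synchronized block is needed.
final class SimpleCounterSketch implements MetricSketch {
    private final String name;
    private final AtomicLong count = new AtomicLong();

    SimpleCounterSketch(String name) { this.name = name; }

    @Override public void add(long delta) { count.addAndGet(delta); }
    @Override public void reset() { count.set(0); }
    @Override public void dump(Map<String, Object> out) { out.put(name, count.get()); }
}

// Tracks avg/min/max with independent atomics. Each field is updated
// atomically, but a concurrent dump may see a slightly inconsistent snapshot.
final class AvgMinMaxCounterSketch implements MetricSketch {
    private final String name;
    private final AtomicLong total = new AtomicLong();
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong min = new AtomicLong(Long.MAX_VALUE);
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    AvgMinMaxCounterSketch(String name) { this.name = name; }

    @Override public void add(long value) {
        total.addAndGet(value);
        count.incrementAndGet();
        min.accumulateAndGet(value, Math::min);
        max.accumulateAndGet(value, Math::max);
    }

    @Override public void reset() {
        total.set(0);
        count.set(0);
        min.set(Long.MAX_VALUE);
        max.set(Long.MIN_VALUE);
    }

    // Output key names are an assumption, not the patch's actual format.
    @Override public void dump(Map<String, Object> out) {
        long n = count.get();
        out.put("avg_" + name, n == 0 ? 0.0 : (double) total.get() / n);
        out.put("min_" + name, n == 0 ? 0L : min.get());
        out.put("max_" + name, n == 0 ? 0L : max.get());
    }
}

// Registry: adding a metric means adding one enum constant.
enum ServerMetricsSketch {
    FSYNC_TIME(new AvgMinMaxCounterSketch("fsynctime")),
    LOOKING_COUNT(new SimpleCounterSketch("looking_count"));

    private final MetricSketch metric;

    ServerMetricsSketch(MetricSketch metric) { this.metric = metric; }

    void add(long value) { metric.add(value); }

    static void resetAll() {
        for (ServerMetricsSketch m : values()) m.metric.reset();
    }

    static Map<String, Object> dumpAll() {
        Map<String, Object> out = new LinkedHashMap<>();
        for (ServerMetricsSketch m : values()) m.metric.dump(out);
        return out;
    }
}

final class ServerMetricsSketchDemo {
    public static void main(String[] args) {
        ServerMetricsSketch.FSYNC_TIME.add(4);
        ServerMetricsSketch.FSYNC_TIME.add(10);
        ServerMetricsSketch.LOOKING_COUNT.add(1);
        ServerMetricsSketch.LOOKING_COUNT.add(1);
        // prints {avg_fsynctime=7.0, min_fsynctime=4, max_fsynctime=10, looking_count=2}
        System.out.println(ServerMetricsSketch.dumpAll());
        ServerMetricsSketch.resetAll();
    }
}
```

In this sketch, a call site only needs `ServerMetricsSketch.FSYNC_TIME.add(elapsed)`; the registry takes care of dumping and resetting every declared metric in one place.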
    
    Here's a list of new metrics added in this patch:
    
    - `uptime`: time that a peer has been in a stable
      leading/following/observing state
    - `leader_uptime`: uptime for peer in leading state
    - `global_sessions`: count of global sessions
    - `local_sessions`: count of local sessions
    - `quorum_size`: configured ensemble size
    - `synced_observers`: similar to existing `synced_followers` but for
      observers
    - `fsynctime`: time to fsync transaction log (avg/min/max)
    - `snapshottime`: time to write a snapshot (avg/min/max)
    - `dbinittime`: time to reload database -- read snapshot + apply
      transactions (avg/min/max)
    - `readlatency`: read request latency (avg/min/max)
    - `updatelatency`: update request latency (avg/min/max)
    - `propagation_latency`: end-to-end latency for updates, from proposal on
      leader to committed-to-datatree on a given host (avg/min/max)
    - `follower_sync_time`: time for follower to sync with leader (avg/min/max)
    - `election_time`: time between entering and leaving election (avg/min/max)
    - `looking_count`: number of transitions into looking state
    - `diff_count`: number of diff syncs performed
    - `snap_count`: number of snap syncs performed
    - `commit_count`: number of commits performed on leader
    - `connection_request_count`: number of incoming client connection requests
    - `bytes_received_count`: similar to existing `packets_received` but tracks
      bytes
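
The `readlatency` and `updatelatency` entries in the list above depend on classifying each request as a read or an update. A rough sketch of that split follows, assuming a hypothetical `needsQuorumVote` flag supplied by the caller; the class and method names are illustrative and not part of the patch. For brevity it uses `java.util.LongSummaryStatistics` guarded by `synchronized` rather than the atomic approach described in the patch.

```java
import java.util.LongSummaryStatistics;

// Hypothetical sketch: routes each latency sample to a read or update bucket.
final class LatencySplitSketch {
    private final LongSummaryStatistics read = new LongSummaryStatistics();
    private final LongSummaryStatistics update = new LongSummaryStatistics();

    // needsQuorumVote is an assumed flag: true for requests that must be
    // voted on and can change data, false for requests handled locally.
    synchronized void record(boolean needsQuorumVote, long latencyMs) {
        (needsQuorumVote ? update : read).accept(latencyMs);
    }

    // Return copies so callers cannot mutate the internal state.
    synchronized LongSummaryStatistics readLatency() {
        LongSummaryStatistics copy = new LongSummaryStatistics();
        copy.combine(read);
        return copy;
    }

    synchronized LongSummaryStatistics updateLatency() {
        LongSummaryStatistics copy = new LongSummaryStatistics();
        copy.combine(update);
        return copy;
    }
}

final class LatencySplitSketchDemo {
    public static void main(String[] args) {
        LatencySplitSketch stats = new LatencySplitSketch();
        stats.record(false, 2);   // read handled locally
        stats.record(false, 4);   // another read
        stats.record(true, 30);   // update that went through a vote
        System.out.println("read:   " + stats.readLatency());
        System.out.println("update: " + stats.updateLatency());
    }
}
```

Keeping two separate accumulators means a burst of slow updates does not distort the read figures, which is the point of tracking the two latencies separately.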

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jtuple/zookeeper ZOOKEEPER-3098

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/580.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #580
    
----
commit e6935f8d99eace05d29c2d6659e68e8b90b9a633
Author: Joseph Blomstedt <jdb@...>
Date:   2018-07-19T19:47:15Z

    ZOOKEEPER-3098: Add additional server metrics
    

----

