GitHub user jtuple opened a pull request:
https://github.com/apache/zookeeper/pull/580
ZOOKEEPER-3098: Add additional server metrics
This patch adds several new server-side metrics as well as makes it easier
to add new metrics in the future. This patch also includes a handful of other
minor metrics-related changes.
Here's a high-level summary of the changes.
1. This patch extends the request latency tracked in `ServerStats` to
track `read` and `update` latency separately. Updates are any request
that must be voted on and can change data, reads are all requests that
can be handled locally and don't change data.
2. This patch adds the `ServerMetrics` logic and the related
`AvgMinMaxCounter`
and `SimpleCounter` classes. This code is designed to make it incredibly
easy to
add new metrics. To add a new metric you just add one line to
`ServerMetrics` and
then directly reference that new metric anywhere in the code base. The
`ServerMetrics`
logic handles creating the metric, properly adding the metric to the
JSON output of
the `/monitor` admin command, and properly resetting the metric when
necessary.
The motivation behind `ServerMetrics` is to make things easy enough that
it encourages
new metrics to be added liberally. Lack of in-depth metrics/visibility
is a long-standing
ZooKeeper weakness. At Facebook, most of our internal changes build on
`ServerMetrics` and
we have nearly 100 internal metrics at this time -- all of which we'll
be upstreaming
in the coming months as we publish more internal patches.
3. This patch adds 20 new metrics, 14 which are handled by `ServerMetrics`.
4. This patch replaces some uses of `synchronized` in `ServerStats` with
atomic operations.
Here's a list of new metrics added in this patch:
- `uptime`: time that a peer has been in a stable
leading/following/observing state
- `leader_uptime`: uptime for peer in leading state
- `global_sessions`: count of global sessions
- `local_sessions`: count of local sessions
- `quorum_size`: configured ensemble size
- `synced_observers`: similar to existing `synced_followers` but for
observers
- `fsynctime`: time to fsync transaction log (avg/min/max)
- `snapshottime`: time to write a snapshot (avg/min/max)
- `dbinittime`: time to reload database -- read snapshot + apply
transactions (avg/min/max)
- `readlatency`: read request latency (avg/min/max)
- `updatelatency`: update request latency (avg/min/max)
- `propagation_latency`: end-to-end latency for updates, from proposal on
leader to committed-to-datatree on a given host (avg/min/max)
- `follower_sync_time`: time for follower to sync with leader (avg/min/max)
- `election_time`: time between entering and leaving election (avg/min/max)
- `looking_count`: number of transitions into looking state
- `diff_count`: number of diff syncs performed
- `snap_count`: number of snap syncs performed
- `commit_count`: number of commits performed on leader
- `connection_request_count`: number of incoming client connection requests
- `bytes_received_count`: similar to existing `packets_received` but tracks
bytes
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jtuple/zookeeper ZOOKEEPER-3098
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zookeeper/pull/580.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #580
----
commit e6935f8d99eace05d29c2d6659e68e8b90b9a633
Author: Joseph Blomstedt <jdb@...>
Date: 2018-07-19T19:47:15Z
ZOOKEEPER-3098: Add additional server metrics
This patch adds several new server-side metrics as well as makes it easier
to add new metrics in the future. This patch also includes a handful of
other minor metrics-related changes.
Here's a high-level summary of the changes.
1. This patch extends the request latency tracked in `ServerStats` to
track `read` and `update` latency separately. Updates are any request
that must be voted on and can change data, reads are all requests that
can be handled locally and don't change data.
2. This patch adds the `ServerMetrics` logic and the related
`AvgMinMaxCounter`
and `SimpleCounter` classes. This code is designed to make it incredibly
easy to
add new metrics. To add a new metric you just add one line to
`ServerMetrics` and
then directly reference that new metric anywhere in the code base. The
`ServerMetrics`
logic handles creating the metric, properly adding the metric to the
JSON output of
the `/monitor` admin command, and properly resetting the metric when
necessary.
The motivation behind `ServerMetrics` is to make things easy enough that
it encourages
new metrics to be added liberally. Lack of in-depth metrics/visibility
is a long-standing
ZooKeeper weakness. At Facebook, most of our internal changes build on
`ServerMetrics` and
we have nearly 100 internal metrics at this time -- all of which we'll
be upstreaming
in the coming months as we publish more internal patches.
3. This patch adds 20 new metrics, 14 which are handled by `ServerMetrics`.
4. This patch replaces some uses of `synchronized` in `ServerStats` with
atomic operations.
Here's a list of new metrics added in this patch:
- `uptime`: time that a peer has been in a stable
leading/following/observing state
- `leader_uptime`: uptime for peer in leading state
- `global_sessions`: count of global sessions
- `local_sessions`: count of local sessions
- `quorum_size`: configured ensemble size
- `synced_observers`: similar to existing `synced_followers` but for
observers
- `fsynctime`: time to fsync transaction log (avg/min/max)
- `snapshottime`: time to write a snapshot (avg/min/max)
- `dbinittime`: time to reload database -- read snapshot + apply
transactions (avg/min/max)
- `readlatency`: read request latency (avg/min/max)
- `updatelatency`: update request latency (avg/min/max)
- `propagation_latency`: end-to-end latency for updates, from proposal on
leader to committed-to-datatree on a given host (avg/min/max)
- `follower_sync_time`: time for follower to sync with leader (avg/min/max)
- `election_time`: time between entering and leaving election (avg/min/max)
- `looking_count`: number of transitions into looking state
- `diff_count`: number of diff syncs performed
- `snap_count`: number of snap syncs performed
- `commit_count`: number of commits performed on leader
- `connection_request_count`: number of incoming client connection requests
- `bytes_received_count`: similar to existing `packets_received` but tracks
bytes
----
---