[
https://issues.apache.org/jira/browse/ZOOKEEPER-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joseph Blomstedt updated ZOOKEEPER-3098:
----------------------------------------
Description:
This patch adds several new server-side metrics as well as makes it easier to
add new metrics in the future. This patch also includes a handful of other
minor metrics-related changes.
Here's a high-level summary of the changes.
# This patch extends the request latency tracked in {{ServerStats}} to track
{{read}} and {{update}} latency separately. Updates are any request that must
be voted on and can change data, reads are all requests that can be handled
locally and don't change data.
# This patch adds the {{ServerMetrics}} logic and the related
{{AvgMinMaxCounter}} and {{SimpleCounter}} classes. This code is designed to
make it incredibly easy to add new metrics. To add a new metric you just add
one line to {{ServerMetrics}} and then directly reference that new metric
anywhere in the code base. The {{ServerMetrics}} logic handles creating the
metric, properly adding the metric to the JSON output of the {{/monitor}} admin
command, and properly resetting the metric when necessary. The motivation
behind {{ServerMetrics}} is to make things easy enough that it encourages new
metrics to be added liberally. Lack of in-depth metrics/visibility is a
long-standing ZooKeeper weakness. At Facebook, most of our internal changes
build on {{ServerMetrics}} and we have nearly 100 internal metrics at this time
– all of which we'll be upstreaming in the coming months as we publish more
internal patches.
# This patch adds 20 new metrics, 14 which are handled by {{ServerMetrics}}.
# This patch replaces some uses of {{synchronized}} in {{ServerStats}} with
atomic operations.
Here's a list of new metrics added in this patch:
- {{uptime}}: time that a peer has been in a stable
leading/following/observing state
- {{leader_uptime}}: uptime for peer in leading state
- {{global_sessions}}: count of global sessions
- {{local_sessions}}: count of local sessions
- {{quorum_size}}: configured ensemble size
- {{synced_observers}}: similar to existing `synced_followers` but for
observers
- {{fsynctime}}: time to fsync transaction log (avg/min/max)
- {{snapshottime}}: time to write a snapshot (avg/min/max)
- {{dbinittime}}: time to reload database – read snapshot + apply transactions
(avg/min/max)
- {{readlatency}}: read request latency (avg/min/max)
- {{updatelatency}}: update request latency (avg/min/max)
- {{propagation_latency}}: end-to-end latency for updates, from proposal on
leader to committed-to-datatree on a given host (avg/min/max)
- {{follower_sync_time}}: time for follower to sync with leader (avg/min/max)
- {{election_time}}: time between entering and leaving election (avg/min/max)
- {{looking_count}}: number of transitions into looking state
- {{diff_count}}: number of diff syncs performed
- {{snap_count}}: number of snap syncs performed
- {{commit_count}}: number of commits performed on leader
- {{connection_request_count}}: number of incoming client connection requests
- {{bytes_received_count}}: similar to existing `packets_received` but tracks
bytes
was:
This patch adds several new server-side metrics as well as makes it easier to
add new metrics in the future. This patch also includes a handful of other
minor metrics-related changes.
Here's a high-level summary of the changes.
# This patch extends the request latency tracked in {{ServerStats}} to track
{{read}} and {{update}} latency separately. Updates are any request that must
be voted on and can change data, reads are all requests that can be handled
locally and don't change data.
# This patch adds the {{ServerMetrics}} logic and the related
{{AvgMinMaxCounter}} and {{SimpleCounter}} classes. This code is designed to
make it incredibly easy to add new metrics. To add a new metric you just add
one line to {{ServerMetrics}} and then directly reference that new metric
anywhere in the code base. The {{ServerMetrics}} logic handles creating the
metric, properly adding the metric to the JSON output of the {{/monitor}} admin
command, and properly resetting the metric when necessary.
The motivation behind {{ServerMetrics}} is to make things easy enough that it
encourages new metrics to be added liberally. Lack of in-depth
metrics/visibility is a long-standing ZooKeeper weakness. At Facebook, most of
our internal changes build on {{ServerMetrics}} and we have nearly 100 internal
metrics at this time – all of which we'll be upstreaming in the coming months
as we publish more internal patches.
# This patch adds 20 new metrics, 14 which are handled by {{ServerMetrics}}.
# This patch replaces some uses of {{synchronized}} in {{ServerStats}} with
atomic operations.
Here's a list of new metrics added in this patch:
- {{uptime}}: time that a peer has been in a stable
leading/following/observing state
- {{leader_uptime}}: uptime for peer in leading state
- {{global_sessions}}: count of global sessions
- {{local_sessions}}: count of local sessions
- {{quorum_size}}: configured ensemble size
- {{synced_observers}}: similar to existing `synced_followers` but for
observers
- {{fsynctime}}: time to fsync transaction log (avg/min/max)
- {{snapshottime}}: time to write a snapshot (avg/min/max)
- {{dbinittime}}: time to reload database – read snapshot + apply transactions
(avg/min/max)
- {{readlatency}}: read request latency (avg/min/max)
- {{updatelatency}}: update request latency (avg/min/max)
- {{propagation_latency}}: end-to-end latency for updates, from proposal on
leader to committed-to-datatree on a given host (avg/min/max)
- {{follower_sync_time}}: time for follower to sync with leader (avg/min/max)
- {{election_time}}: time between entering and leaving election (avg/min/max)
- {{looking_count}}: number of transitions into looking state
- {{diff_count}}: number of diff syncs performed
- {{snap_count}}: number of snap syncs performed
- {{commit_count}}: number of commits performed on leader
- {{connection_request_count}}: number of incoming client connection requests
- {{bytes_received_count}}: similar to existing `packets_received` but tracks
bytes
> Add additional server metrics
> -----------------------------
>
> Key: ZOOKEEPER-3098
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3098
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.6.0
> Reporter: Joseph Blomstedt
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> This patch adds several new server-side metrics as well as makes it easier to
> add new metrics in the future. This patch also includes a handful of other
> minor metrics-related changes.
> Here's a high-level summary of the changes.
> # This patch extends the request latency tracked in {{ServerStats}} to track
> {{read}} and {{update}} latency separately. Updates are any request that must
> be voted on and can change data, reads are all requests that can be handled
> locally and don't change data.
> # This patch adds the {{ServerMetrics}} logic and the related
> {{AvgMinMaxCounter}} and {{SimpleCounter}} classes. This code is designed to
> make it incredibly easy to add new metrics. To add a new metric you just add
> one line to {{ServerMetrics}} and then directly reference that new metric
> anywhere in the code base. The {{ServerMetrics}} logic handles creating the
> metric, properly adding the metric to the JSON output of the {{/monitor}}
> admin command, and properly resetting the metric when necessary. The
> motivation behind {{ServerMetrics}} is to make things easy enough that it
> encourages new metrics to be added liberally. Lack of in-depth
> metrics/visibility is a long-standing ZooKeeper weakness. At Facebook, most
> of our internal changes build on {{ServerMetrics}} and we have nearly 100
> internal metrics at this time – all of which we'll be upstreaming in the
> coming months as we publish more internal patches.
> # This patch adds 20 new metrics, 14 which are handled by {{ServerMetrics}}.
> # This patch replaces some uses of {{synchronized}} in {{ServerStats}} with
> atomic operations.
> Here's a list of new metrics added in this patch:
> - {{uptime}}: time that a peer has been in a stable
> leading/following/observing state
> - {{leader_uptime}}: uptime for peer in leading state
> - {{global_sessions}}: count of global sessions
> - {{local_sessions}}: count of local sessions
> - {{quorum_size}}: configured ensemble size
> - {{synced_observers}}: similar to existing `synced_followers` but for
> observers
> - {{fsynctime}}: time to fsync transaction log (avg/min/max)
> - {{snapshottime}}: time to write a snapshot (avg/min/max)
> - {{dbinittime}}: time to reload database – read snapshot + apply
> transactions (avg/min/max)
> - {{readlatency}}: read request latency (avg/min/max)
> - {{updatelatency}}: update request latency (avg/min/max)
> - {{propagation_latency}}: end-to-end latency for updates, from proposal on
> leader to committed-to-datatree on a given host (avg/min/max)
> - {{follower_sync_time}}: time for follower to sync with leader (avg/min/max)
> - {{election_time}}: time between entering and leaving election (avg/min/max)
> - {{looking_count}}: number of transitions into looking state
> - {{diff_count}}: number of diff syncs performed
> - {{snap_count}}: number of snap syncs performed
> - {{commit_count}}: number of commits performed on leader
> - {{connection_request_count}}: number of incoming client connection requests
> - {{bytes_received_count}}: similar to existing `packets_received` but
> tracks bytes
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)