Github user jtuple commented on a diff in the pull request:
https://github.com/apache/zookeeper/pull/580#discussion_r216112301
--- Diff: src/java/main/org/apache/zookeeper/server/ServerMetrics.java ---
@@ -0,0 +1,103 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.server;
+
+import org.apache.zookeeper.server.metric.AvgMinMaxCounter;
+import org.apache.zookeeper.server.metric.Metric;
+import org.apache.zookeeper.server.metric.SimpleCounter;
+
+import java.util.LinkedHashMap;
+import java.util.Map;
+
+public enum ServerMetrics {
+ /**
+ * Txnlog fsync time
+ */
+ FSYNC_TIME(new AvgMinMaxCounter("fsynctime")),
+
+ /**
+ * Snapshot writing time
+ */
+ SNAPSHOT_TIME(new AvgMinMaxCounter("snapshottime")),
+
+ /**
+ * Db init time (snapshot loading + txnlog replay)
+ */
+ DB_INIT_TIME(new AvgMinMaxCounter("dbinittime")),
+
+ /**
+ * Stats for read request. The timing start from when the server see
the
+ * request until it leave final request processor.
+ */
+ READ_LATENCY(new AvgMinMaxCounter("readlatency")),
+
+ /**
+ * Stats for request that need quorum voting. Timing is the same as
read
+ * request. We only keep track of stats for request that originated
from
+ * this machine only.
+ */
+ UPDATE_LATENCY(new AvgMinMaxCounter("updatelatency")),
+
+ /**
+ * Stats for all quorum request. The timing start from when the leader
+ * see the request until it reach the learner.
+ */
+ PROPAGATION_LATENCY(new AvgMinMaxCounter("propagation_latency")),
+
+ FOLLOWER_SYNC_TIME(new AvgMinMaxCounter("follower_sync_time")),
+ ELECTION_TIME(new AvgMinMaxCounter("election_time")),
--- End diff --
If by gauge you're referring to a metric that only reports a single value,
I don't think that's a good fit for these metrics. If you're querying metrics
from Zookeeper every minute or so, you'd miss all but the last election that
occurred in that interval. With an avg/min/max counter, we can query
periodically and have statistics that summarize all elections that occurred in
that interval. Same logic applies to fsync, init, etc.
Tracking individual event times is probably better left for the "push
metrics to external system" approach that's orthogonal to this pull request.
However, if we wanted to provide statistics + bounded history internally,
we could consider adding an extended version of say `AvgMinMaxCounter` that
also keeps a fixed number of values around that could be queried, dropping the
oldest or doing some sort of reservoir sampling.
Definitely something we can do in the future. If we do, it's unclear if
we'd want to wire that up to `/metrics` admin command though vs exposing
through a different means.
---