Mathieu Gaudin created ZOOKEEPER-4358:
-----------------------------------------
Summary: Latency metrics showing surprising results for a Kerberos-enabled cluster
Key: ZOOKEEPER-4358
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4358
Project: ZooKeeper
Issue Type: Bug
Components: metric system
Affects Versions: 3.6.2
Reporter: Mathieu Gaudin
Attachments: image-2021-08-27-16-10-28-783.png, image-2021-08-27-16-37-50-112.png

Hi,

I'm trying to understand why the min/avg/max latency values are showing surprising results. The graph below shows the max latency value of a particular node for the last 7 days. The value increases gradually over time and only ever decreases when the node is restarted, as if the metric value gets reset.

[https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerStats.java#L226]

!image-2021-08-27-16-10-28-783.png|width=984,height=204!
 * 3 nodes
 * Kerberos enabled
 * TGT ticket cache enabled

I believe the min/avg/max latency values should show more realistic variation. It is very unlikely that the max latency value is expected to always increase while the node is running.

[https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerStats.java#L142]

_public void updateLatency(Request request, long currentTime) {_
_long latency = currentTime - request.createTime;_
_if (latency < 0) {_
_return;_
_}_
_*{color:#FF0000}requestLatency.addDataPoint(latency);{color}*_
_if (request.getHdr() != null) {_
_// Only quorum request should have header_
_ServerMetrics.getMetrics().UPDATE_LATENCY.add(latency);_
_} else {_
_// All read request should goes here_
_ServerMetrics.getMetrics().READ_LATENCY.add(latency);_
_}_
_}_

The method called makes me think that the max latency metric only gets set when the current value happens to be lower than the new one.
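To illustrate that reading, here is a minimal sketch of a running maximum updated by compare-and-set. The class `RunningMax` and its methods `add`, `get` and `reset` are hypothetical names made up for this example, not ZooKeeper's API; the only point is that a value maintained this way can never decrease unless something explicitly resets it.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration class; not part of ZooKeeper.
class RunningMax {
    // Starts at Long.MIN_VALUE so that any first sample becomes the maximum.
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    // Record a sample; the stored maximum changes only if the sample exceeds it.
    void add(long value) {
        long current;
        while (value > (current = max.get()) && !max.compareAndSet(current, value)) {
            // retry: another thread changed max between get() and compareAndSet()
        }
    }

    // Largest sample seen since construction or the last reset().
    long get() {
        return max.get();
    }

    // Without an explicit reset, the maximum never goes back down.
    void reset() {
        max.set(Long.MIN_VALUE);
    }
}
```

Feeding this sketch the samples 5, 120 and 7 leaves `get()` at 120; a later, smaller sample can only become the maximum after `reset()` has been called.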
[https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/metric/AvgMinMaxCounter.java#L51]

_private void setMax(long value) {_
*{color:#FF0000}_long current;_{color}*
*{color:#FF0000}_while (value > (current = max.get()) && !max.compareAndSet(current, value)) {_{color}*
_// no op_
_}_
_}_

Below is a graph for a particular node from a totally different cluster, covering the last 2 days. The node has not been restarted and all the data is from the same process. Here we can see more realistic variation in the max latency metric, as we would normally expect.

!image-2021-08-27-16-37-50-112.png|width=1084,height=222!

Thanks in advance for your time,
Math

--
This message was sent by Atlassian Jira
(v8.3.4#803005)