Mathieu Gaudin created ZOOKEEPER-4358:
-----------------------------------------
Summary: Latency metrics showing surprising results for a
keberos-enabled cluster
Key: ZOOKEEPER-4358
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4358
Project: ZooKeeper
Issue Type: Bug
Components: metric system
Affects Versions: 3.6.2
Reporter: Mathieu Gaudin
Attachments: image-2021-08-27-16-10-28-783.png,
image-2021-08-27-16-37-50-112.png
Hi,
I'm trying to understand why the values of min/avg/max latency are showing
surprising results. The graph below shows the max latency value of a particular
node for last 7 days. The value increases gradually over time and it only ever
decreases when the node gets restarted as if the metric value gets reset.
[https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerStats.java#L226]
!image-2021-08-27-16-10-28-783.png|width=984,height=204!
* 3 nodes
* Keberos enabled
* TGT ticket cashe enabled.
I believes the values of min/avg/max latency should show more realistic
variations. It's very unlikely that the max latency value is expected to always
increase while the node is running.
[https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerStats.java#L142]
_public void updateLatency(Request request, long currentTime) {_
_long latency = currentTime - request.createTime;_
_if (latency < 0) {_
_return;_
_}_
_*{color:#FF0000}requestLatency.addDataPoint(latency);{color}*_
_if (request.getHdr() != null) {_
_// Only quorum request should have header_
_ServerMetrics.getMetrics().UPDATE_LATENCY.add(latency);_
_} else {_
_// All read request should goes here_
_ServerMetrics.getMetrics().READ_LATENCY.add(latency);_
_}_
The method called let me think that the max latency metric gets set if the
current values happens to be lower. __
[https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/metric/AvgMinMaxCounter.java#L51]
_private void setMax(long value) {_
*{color:#FF0000}_long current;_{color}*
*{color:#FF0000}_while (value > (current = max.get()) &&
!max.compareAndSet(current, value)) {_{color}*
_// no op_
_}_
_}_
I put below a graph of a particular from a totally different cluster for last 2
days. The node has not been restarted and all the data is from the same
process. We can see a more realistic variations of the max latency metric as it
would normally.
!image-2021-08-27-16-37-50-112.png|width=1084,height=222!
Thanks for you time in advance,
Math
--
This message was sent by Atlassian Jira
(v8.3.4#803005)