Mathieu Gaudin created ZOOKEEPER-4358:
-----------------------------------------

             Summary: Latency metrics showing surprising results for a 
keberos-enabled cluster
                 Key: ZOOKEEPER-4358
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4358
             Project: ZooKeeper
          Issue Type: Bug
          Components: metric system
    Affects Versions: 3.6.2
            Reporter: Mathieu Gaudin
         Attachments: image-2021-08-27-16-10-28-783.png, 
image-2021-08-27-16-37-50-112.png

Hi,

I'm trying to understand why the values of min/avg/max latency are showing 
surprising results. The graph below shows the max latency value of a particular 
node for last 7 days. The value increases gradually over time and it only ever 
decreases when the node gets restarted as if the metric value gets reset.

[https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerStats.java#L226]

!image-2021-08-27-16-10-28-783.png|width=984,height=204!
 * 3 nodes
 * Keberos enabled
 * TGT ticket cashe enabled.

I believes the values of min/avg/max latency should show more realistic 
variations. It's very unlikely that the max latency value is expected to always 
increase while the node is running.

[https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerStats.java#L142]

 _public void updateLatency(Request request, long currentTime) {_
 _long latency = currentTime - request.createTime;_
 _if (latency < 0) {_
 _return;_
 _}_
 _*{color:#FF0000}requestLatency.addDataPoint(latency);{color}*_
 _if (request.getHdr() != null) {_
 _// Only quorum request should have header_
 _ServerMetrics.getMetrics().UPDATE_LATENCY.add(latency);_
 _} else {_
 _// All read request should goes here_
 _ServerMetrics.getMetrics().READ_LATENCY.add(latency);_
 _}_

The method called let me think that the max latency metric gets set if the 
current values happens to be lower. __ 

[https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/metric/AvgMinMaxCounter.java#L51]

 _private void setMax(long value) {_
 *{color:#FF0000}_long current;_{color}*
 *{color:#FF0000}_while (value > (current = max.get()) && 
!max.compareAndSet(current, value)) {_{color}*
 _// no op_
 _}_
 _}_

I put below a graph of a particular from a totally different cluster for last 2 
days. The node has not been restarted and all the data is from the same 
process. We can see a more realistic variations of the max latency metric as it 
would normally. 

!image-2021-08-27-16-37-50-112.png|width=1084,height=222!

Thanks for you time in advance,

Math



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to