Github user karanmehta93 commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/307#discussion_r187815218
  
    --- Diff: src/java/main/org/apache/zookeeper/server/ServerStats.java ---
    @@ -148,9 +174,46 @@ synchronized public void resetRequestCounters(){
             packetsReceived = 0;
             packetsSent = 0;
         }
    +    synchronized public void resetNumRequestsAboveThresholdTime() {
    +        numRequestsAboveThresholdTime = 0;
    +    }
         synchronized public void reset() {
             resetLatency();
             resetRequestCounters();
    +        resetNumRequestsAboveThresholdTime();
    +    }
    +
    +    public void checkLatency(final ZooKeeperServer zks, Request request) {
    +        long requestLatency = Time.currentElapsedTime() - 
request.createTime;
    +        boolean enabledAndAboveThreshold = (requestWarnThresholdMs == 0) ||
    +                (requestWarnThresholdMs > -1 && requestLatency > 
requestWarnThresholdMs);
    +        if (enabledAndAboveThreshold) {
    +            zks.serverStats().incNumRequestsAboveThresholdTime();
    +
    +            // Try acquiring lock only if not waiting
    +            boolean success = 
waitForLoggingWarnThresholdMsg.compareAndSet(Boolean.FALSE, Boolean.TRUE);
    +            if (success) {
    +                LOG.warn("Request {} exceeded threshold. Took {} ms", 
request, requestLatency);
    +                startCount = 
zks.serverStats().getNumRequestsAboveThresholdTime();
    +                timer.schedule(new TimerTask() {
    +                    @Override
    +                    public void run() {
    +                        long count = 
zks.serverStats().getNumRequestsAboveThresholdTime() - startCount;
    --- End diff --
    
    > If there is no slow requests coming in between the time the task is 
scheduled and the time the task is executed, this count will be 0. It will not 
reflect the actual number of requests we want to log. Maybe log the startCount 
instead and reset it and leave the task only to reset the barrier?
    
    The whole idea behind the schedule timer is to see the requests that 
exceeded threshold in the past rolling window of 60 seconds. Since the task is 
only scheduled once a high latency request is logged, it is fine to say that 0 
requests had longer times since the last bad request was logged. The total 
count can be seen using `stat` at any point of time. What do you suggest?
    
    > Also please use ScheduledService instead of Timer task
    
    Sure will update.



---

Reply via email to