[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221123#comment-16221123 ]
ASF GitHub Bot commented on ZOOKEEPER-2770:
-------------------------------------------

Github user karanmehta93 commented on the issue:

https://github.com/apache/zookeeper/pull/307

For some reason, Jenkins didn't build it automatically. I triggered a manual build with my latest commit SHA1, and the results are at https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1123/

All the Findbugs warnings have been resolved, and the single failing test appears to be unrelated to my change (it may be a flaky test, since it passes on my local machine). This commit is a squashed commit based on the earlier input provided by everyone. Thank you!

Currently, the default value for `requestWarnThresholdMs` is 10 seconds, which is high. Users can set it according to the average latencies they typically see in the server stats. The patch also rate-limits logging: if the threshold is set too low, the server prints one message every 60 seconds showing the number of high-latency requests in the past minute.

The motivation for this JIRA is explained in its description at https://issues.apache.org/jira/browse/ZOOKEEPER-2770. It would be great if you could review this PR. @hanm @tdunning @eribeiro @skamille

If this JIRA gets in, the future plan is to build on @tdunning's t-digest and use it to detect high-latency requests based on their specific request type. Thanks!

> ZooKeeper slow operation log
> ----------------------------
>
>                 Key: ZOOKEEPER-2770
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>         Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch,
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why
> any given read or write operation may become slow: a software bug, a protocol
> problem, a hardware issue with the commit log(s), a network issue.
> If the problem is constant, it is trivial to understand the cause.
> However, to diagnose intermittent problems we often don't know where,
> or when, to begin looking. We need some sort of timestamped indication of the
> problem. Although ZooKeeper is not a datastore, it does persist data and can
> suffer intermittent performance degradation, so it should consider implementing
> a 'slow query' log, a feature very common in services that persist
> information on behalf of clients that may be sensitive to latency while
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally
> processing the request, that the current time minus the arrival time of the
> request exceeds a configured threshold.
> Look at the HBase {{responseTooSlow}} feature for inspiration.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
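The two ideas discussed in this thread, a per-request check of "completion time minus arrival time" against a threshold and a rate limit so that a low threshold cannot flood the log, can be illustrated with a small Java sketch. This is NOT the code from PR #307: the class `SlowRequestLog`, its method `onRequestComplete`, and the message format are hypothetical. Only the name `requestWarnThresholdMs` comes from the comment above; the 60-second interval mirrors the behavior it describes. Timestamps are passed in explicitly (instead of reading a clock) to keep the sketch deterministic, and a `List<String>` stands in for a real logger.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names are illustrative, not ZooKeeper's actual API.
public class SlowRequestLog {
    private final long warnThresholdMs;      // plays the role of requestWarnThresholdMs
    private final long minLogIntervalMs;     // e.g. 60_000 for at most one line per minute
    private long lastLogMs = Long.MIN_VALUE; // MIN_VALUE means "nothing logged yet"
    private long suppressed = 0;             // slow requests not logged since the last line
    private final List<String> lines = new ArrayList<>(); // stands in for a real logger

    public SlowRequestLog(long warnThresholdMs, long minLogIntervalMs) {
        this.warnThresholdMs = warnThresholdMs;
        this.minLogIntervalMs = minLogIntervalMs;
    }

    /**
     * Call when the server finally finishes processing a request.
     * Returns true if the request exceeded the threshold.
     */
    public boolean onRequestComplete(String details, long arrivalMs, long nowMs) {
        long latencyMs = nowMs - arrivalMs;
        if (latencyMs <= warnThresholdMs) {
            return false; // fast enough: nothing to record
        }
        boolean mayLog = lastLogMs == Long.MIN_VALUE || nowMs - lastLogMs >= minLogIntervalMs;
        if (mayLog) {
            // Log the request details plus how many slow requests were skipped since the last line.
            lines.add(String.format(
                "Slow request: %s took %d ms (threshold %d ms); %d more slow request(s) since last report",
                details, latencyMs, warnThresholdMs, suppressed));
            lastLogMs = nowMs;
            suppressed = 0;
        } else {
            suppressed++; // count it, but stay quiet until the interval has passed
        }
        return true;
    }

    public List<String> lines() {
        return lines;
    }

    public static void main(String[] args) {
        SlowRequestLog log = new SlowRequestLog(100, 60_000);
        log.onRequestComplete("session 0x1 getData /a", 0, 50);         // fast: not logged
        log.onRequestComplete("session 0x2 setData /b", 0, 500);        // slow: logged
        log.onRequestComplete("session 0x3 create /c", 1_000, 1_300);   // slow: suppressed
        log.onRequestComplete("session 0x4 delete /d", 61_000, 61_200); // slow: logged, reports 1 suppressed
        log.lines().forEach(System.out::println);
    }
}
```

One design caveat of this sketch: suppressed slow requests are only reported when the next slow request is allowed to log, so a burst followed by silence goes unreported; an implementation that must emit a summary every interval would flush the counter from a timer instead.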