[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092593#comment-16092593 ]
ASF GitHub Bot commented on ZOOKEEPER-2770: ------------------------------------------- Github user karanmehta93 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/307#discussion_r128155180 --- Diff: src/java/test/org/apache/zookeeper/server/ZooKeeperServerMainTest.java --- @@ -138,14 +145,56 @@ void delete(File f) throws IOException { ServerCnxnFactory getCnxnFactory() { return main.getCnxnFactory(); } + } - public static class TestZKSMain extends ZooKeeperServerMain { + public static class TestZKSMain extends ZooKeeperServerMain { + + private ServerStats serverStats; + + @Override + public ZooKeeperServer getZooKeeperServer(FileTxnSnapLog txnLog, ServerConfig config, ZKDatabase zkDb) { + ZooKeeperServer zooKeeperServer = super.getZooKeeperServer(txnLog, config, zkDb); + serverStats = zooKeeperServer.serverStats(); + return zooKeeperServer; + } + + @Override public void shutdown() { super.shutdown(); } } + // Test for ZOOKEEPER-2770 ZooKeeper slow operation log + @Test + public void testRequestWarningThreshold() throws IOException, KeeperException, InterruptedException { + ClientBase.setupTestEnv(); + + final int CLIENT_PORT = PortAssignment.unique(); + + MainThread main = new MainThread(CLIENT_PORT, true, null, 0); + main.start(); + + Assert.assertTrue("waiting for server being up", + ClientBase.waitForServerUp("127.0.0.1:" + CLIENT_PORT, + CONNECTION_TIMEOUT)); + // Get the stats object from the ZooKeeperServer to keep track of high latency requests. + ServerStats stats = main.main.serverStats; + + ZooKeeper zk = new ZooKeeper("127.0.0.1:" + CLIENT_PORT, + ClientBase.CONNECTION_TIMEOUT, this); + + zk.create("/foo1", "foobar".getBytes(), Ids.OPEN_ACL_UNSAFE, + CreateMode.PERSISTENT); + + Assert.assertEquals(new String(zk.getData("/foo1", null, null)), "foobar"); + // It takes a while for the counter to get updated sometimes, this is added to reduce flakyness + Thread.sleep(1000); --- End diff -- > We can always increase the counter before we send a response back to client in FinalRequestProcessor. This will make test deterministic, Or, Not this approach. It will boil down to the same thing as `ServerStats` which we don't want in the first place. > In test, implement retry polling with a timeout and a smaller sleep interval (e..g 10ms instead of 1000ms). We use this pattern a lot in our test. In general we'd like to make deterministic tests as much as possible (in other words, avoid sleep at all cost.) though sometimes that is hard to do due to various reasons. I will add the polling mechanism with a smaller sleep interval. I would also love to make the test deterministic, however I don't think this will be possible here because of race condition involved between client side and server side processing thread. > ZooKeeper slow operation log > ---------------------------- > > Key: ZOOKEEPER-2770 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770 > Project: ZooKeeper > Issue Type: Improvement > Reporter: Karan Mehta > Assignee: Karan Mehta > Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, > ZOOKEEPER-2770.003.patch > > > ZooKeeper is a complex distributed application. There are many reasons why > any given read or write operation may become slow: a software bug, a protocol > problem, a hardware issue with the commit log(s), a network issue. If the > problem is constant it is trivial to come to an understanding of the cause. > However in order to diagnose intermittent problems we often don't know where, > or when, to begin looking. We need some sort of timestamped indication of the > problem. Although ZooKeeper is not a datastore, it does persist data, and can > suffer intermittent performance degradation, and should consider implementing > a 'slow query' log, a feature very common to services which persist > information on behalf of clients which may be sensitive to latency while > waiting for confirmation of successful persistence. > Log the client and request details if the server discovers, when finally > processing the request, that the current time minus arrival time of the > request is beyond a configured threshold. > Look at the HBase {{responseTooSlow}} feature for inspiration. -- This message was sent by Atlassian JIRA (v6.4.14#64029)