[
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089315#comment-16089315
]
ASF GitHub Bot commented on ZOOKEEPER-2770:
-------------------------------------------
Github user karanmehta93 commented on a diff in the pull request:
https://github.com/apache/zookeeper/pull/307#discussion_r127633278
--- Diff: src/java/test/org/apache/zookeeper/server/ZooKeeperServerMainTest.java ---
@@ -138,14 +145,56 @@ void delete(File f) throws IOException {
ServerCnxnFactory getCnxnFactory() {
return main.getCnxnFactory();
}
+
}
- public static class TestZKSMain extends ZooKeeperServerMain {
+ public static class TestZKSMain extends ZooKeeperServerMain {
+
+ private ServerStats serverStats;
+
+ @Override
+ public ZooKeeperServer getZooKeeperServer(FileTxnSnapLog txnLog, ServerConfig config, ZKDatabase zkDb) {
+ ZooKeeperServer zooKeeperServer = super.getZooKeeperServer(txnLog, config, zkDb);
+ serverStats = zooKeeperServer.serverStats();
+ return zooKeeperServer;
+ }
+
+ @Override
public void shutdown() {
super.shutdown();
}
}
+ // Test for ZOOKEEPER-2770 ZooKeeper slow operation log
+ @Test
+ public void testRequestWarningThreshold() throws IOException, KeeperException, InterruptedException {
+ ClientBase.setupTestEnv();
+
+ final int CLIENT_PORT = PortAssignment.unique();
+
+ MainThread main = new MainThread(CLIENT_PORT, true, null, 0);
+ main.start();
+
+ Assert.assertTrue("waiting for server being up",
+ ClientBase.waitForServerUp("127.0.0.1:" + CLIENT_PORT,
+ CONNECTION_TIMEOUT));
+ // Get the stats object from the ZooKeeperServer to keep track of high latency requests.
+ ServerStats stats = main.main.serverStats;
+
+ ZooKeeper zk = new ZooKeeper("127.0.0.1:" + CLIENT_PORT,
+ ClientBase.CONNECTION_TIMEOUT, this);
+
+ zk.create("/foo1", "foobar".getBytes(), Ids.OPEN_ACL_UNSAFE,
+ CreateMode.PERSISTENT);
+
+ Assert.assertEquals(new String(zk.getData("/foo1", null, null)), "foobar");
+ // It takes a while for the counter to get updated sometimes, this is added to reduce flakyness
+ Thread.sleep(1000);
--- End diff --
I didn't see it consistently. I ran the same test about 60-70 times
individually and a couple of times as a suite with all the other tests. The
final assertion failed only once, because the registered count was 2 instead
of 3, and that was while I was running it together with other tests from the
same class.
What do you suggest? Should I reduce the sleep or remove it altogether?
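One option besides tuning the fixed sleep would be to poll the counter until it
reaches the expected value, with an upper bound on the wait. The following is
only a minimal sketch of that idea, not code from the patch: the WaitFor class,
its waitFor method, and the parameter names are all hypothetical.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of ZooKeeper or of this patch.
final class WaitFor {
    private WaitFor() {}

    /**
     * Polls the condition every pollMs milliseconds until it holds or
     * timeoutMs milliseconds have elapsed. Returns true if the condition
     * was observed to hold, false on timeout or interruption.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs) {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the condition never became true in time
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }
}
```

In the test, Thread.sleep(1000) could then become something like
Assert.assertTrue(WaitFor.waitFor(() -> readCounter() == 3, 1000, 10)), where
readCounter() is a stand-in for whichever ServerStats accessor the patch
exposes. The test would return as soon as the counter catches up and would
still fail after one second if it never does.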
> ZooKeeper slow operation log
> ----------------------------
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
> Issue Type: Improvement
> Reporter: Karan Mehta
> Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch,
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why
> any given read or write operation may become slow: a software bug, a protocol
> problem, a hardware issue with the commit log(s), or a network issue. If the
> problem is constant, it is trivial to come to an understanding of the cause.
> However, to diagnose intermittent problems, we often don't know where or
> when to begin looking. We need some sort of timestamped indication of the
> problem. Although ZooKeeper is not a datastore, it does persist data and can
> suffer intermittent performance degradation, so it should consider
> implementing a 'slow query' log, a feature very common in services that
> persist information on behalf of clients, which may be sensitive to latency
> while waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally
> processing the request, that the current time minus arrival time of the
> request is beyond a configured threshold.
> Look at the HBase {{responseTooSlow}} feature for inspiration.
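The check described in the issue (current time minus arrival time compared
against a configured threshold) can be sketched roughly as below. This is an
illustration only and does not reflect how the patch implements it: the
SlowRequestCheck class, its method names, and the message format are all
assumptions made for the example.

```java
// Hypothetical illustration, not ZooKeeper code.
final class SlowRequestCheck {
    private SlowRequestCheck() {}

    /** True if the request took strictly longer than thresholdMs to be processed. */
    static boolean isSlow(long arrivalMs, long nowMs, long thresholdMs) {
        return nowMs - arrivalMs > thresholdMs;
    }

    /** Builds a log line with the client and request details (assumed format). */
    static String describe(String client, String operation,
                           long arrivalMs, long nowMs, long thresholdMs) {
        long elapsedMs = nowMs - arrivalMs;
        return "Slow request from " + client + " (" + operation + "): took "
                + elapsedMs + " ms, threshold " + thresholdMs + " ms";
    }
}
```

A server would call isSlow when it finally processes the request and, if it
returns true, write the describe output to the log along with a timestamp.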
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)