[jira] [Updated] (ZOOKEEPER-2614) Port ZOOKEEPER-1576 to branch3.4
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Han updated ZOOKEEPER-2614:
-----------------------------------
    Fix Version/s:     (was: 3.4.9)
                   3.4.11

> Port ZOOKEEPER-1576 to branch3.4
> --------------------------------
>
>                 Key: ZOOKEEPER-2614
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2614
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.9
>            Reporter: Vishal Khandelwal
>            Assignee: Vishal Khandelwal
>             Fix For: 3.4.11
>
>         Attachments: ZOOKEEPER-2614.branch-3.4.00.patch
>
>
> ZOOKEEPER-1576 handles UnknownHostException, and it is good to have this change
> for the 3.4 branch as well. Porting the changes to 3.4 after resolving the
> conflicts.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (ZOOKEEPER-2849) Quorum port binding needs exponential back-off retry
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101185#comment-16101185 ]

Michael Han commented on ZOOKEEPER-2849:
----------------------------------------

+1 on the idea. I think this will be a good improvement to the resilience of cloud deployments. [~brian.linin...@gmail.com]: are you interested in contributing a patch for this?

> Quorum port binding needs exponential back-off retry
> ----------------------------------------------------
>
>                 Key: ZOOKEEPER-2849
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2849
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: quorum
>    Affects Versions: 3.4.6, 3.5.3
>            Reporter: Brian Lininger
>            Priority: Minor
>
> Recently we upgraded the AWS instance type we use for running our ZooKeeper
> nodes, and by doing so we're intermittently hitting an issue where ZooKeeper
> cannot bind to the server election port because the IP is incorrect. This is
> due to name resolution in Route53 not being in sync when ZooKeeper starts on
> the more powerful EC2 instances. Currently in QuorumCnxManager.Listener, we
> only attempt to bind 3 times with a 1s sleep between retries, which is not
> long enough.
> I'm proposing to change this to follow an exponential back-off strategy
> where each failed attempt causes a longer sleep before the next retry. This
> would allow ZooKeeper to gracefully recover when the host is
> misconfigured, and subsequently corrected, without requiring the process to
> be restarted, while also minimizing the impact on the running instance.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
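The back-off proposed in this issue can be sketched as follows. This is a hypothetical illustration, not the actual QuorumCnxManager.Listener code; the class name, the 60 s cap, and the constants are invented for the example:

```java
// Hypothetical sketch of the proposal: instead of 3 bind attempts with a
// fixed 1 s sleep, double the sleep after each failed attempt, capped at
// an assumed maximum of 60 s so retries can continue indefinitely without
// the sleep growing unbounded.
public class BindBackoff {
    static final long INITIAL_SLEEP_MS = 1000;  // the current fixed sleep
    static final long MAX_SLEEP_MS = 60_000;    // assumed cap, not from the issue

    // Sleep duration before retry attempt n (n = 0 for the first retry).
    static long sleepForAttempt(int n) {
        long sleep = INITIAL_SLEEP_MS << Math.min(n, 10); // bound the shift
        return Math.min(sleep, MAX_SLEEP_MS);
    }

    public static void main(String[] args) {
        for (int n = 0; n < 8; n++) {
            System.out.println("attempt " + n + ": sleep " + sleepForAttempt(n) + " ms");
        }
    }
}
```

With a cap like this, the listener could keep retrying while DNS catches up rather than giving up after three seconds total.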
[jira] [Commented] (ZOOKEEPER-2856) ZooKeeperSaslClient#respondToServer should log exception message of SaslException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101183#comment-16101183 ]

Michael Han commented on ZOOKEEPER-2856:
----------------------------------------

[~panyuxuan] Thanks for the patch. It's a good improvement. Instead of uploading a patch, you can file a pull request; that is our recommended approach for new contributions. The benefit is that you will get GitHub contribution credit when the patch is merged. Please refer to https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute for more details. In this case, since the change is trivial, I can commit your patch directly; it's up to you. For the patch itself, can you use parameterized logging instead?

> ZooKeeperSaslClient#respondToServer should log exception message of
> SaslException
> ----------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2856
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2856
>             Project: ZooKeeper
>          Issue Type: Improvement
>    Affects Versions: 3.4.10, 3.5.3
>            Reporter: Pan Yuxuan
>            Assignee: Pan Yuxuan
>            Priority: Minor
>         Attachments: ZOOKEEPER-2856-1.patch
>
>
> When upstream projects like HBase call ZooKeeperSaslClient with security enabled, we
> sometimes get errors in HBase logs like:
> {noformat}
> SASL authentication failed using login context 'Client'.
> {noformat}
> This error occurs when a SaslException is caught in
> ZooKeeperSaslClient#respondToServer:
> {noformat}
> catch (SaslException e) {
>     LOG.error("SASL authentication failed using login context '" +
>         this.getLoginContext() + "'.");
>     saslState = SaslState.FAILED;
>     gotLastPacket = true;
> }
> {noformat}
> This error confuses users because it carries no explicit exception message. So I think
> we can add the exception message to the log.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
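The parameterized-logging form the review asks for would, with SLF4J, look something like the catch block below in comments; this is a guess at the fix, not the committed patch. The runnable demo underneath mimics the `{}` placeholder substitution (with an invented helper) to show what the rendered message looks like:

```java
// Hypothetical sketch of the suggested fix (not the committed patch).
// With SLF4J, parameterized logging avoids string concatenation, and
// passing the exception as the last argument logs its message and trace:
//
//   } catch (SaslException e) {
//       LOG.error("SASL authentication failed using login context '{}'.",
//                 this.getLoginContext(), e);
//       saslState = SaslState.FAILED;
//       gotLastPacket = true;
//   }
//
// Minimal self-contained demo of '{}' substitution (format() is an
// invented helper, not an SLF4J API):
public class ParamLogDemo {
    static String format(String pattern, Object arg) {
        int i = pattern.indexOf("{}");
        return i < 0 ? pattern
                     : pattern.substring(0, i) + arg + pattern.substring(i + 2);
    }

    public static void main(String[] args) {
        System.out.println(format(
            "SASL authentication failed using login context '{}'.", "Client"));
    }
}
```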
[jira] [Assigned] (ZOOKEEPER-2856) ZooKeeperSaslClient#respondToServer should log exception message of SaslException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Han reassigned ZOOKEEPER-2856:
--------------------------------------
    Assignee: Pan Yuxuan

> ZooKeeperSaslClient#respondToServer should log exception message of
> SaslException
> ----------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2856
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2856
>             Project: ZooKeeper
>          Issue Type: Improvement
>    Affects Versions: 3.4.10, 3.5.3
>            Reporter: Pan Yuxuan
>            Assignee: Pan Yuxuan
>            Priority: Minor
>         Attachments: ZOOKEEPER-2856-1.patch
>

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101161#comment-16101161 ]

Ted Dunning commented on ZOOKEEPER-2770:
----------------------------------------

Btw I note that there is no metering on this logging. That raises an obligatory question: is there a plausible circumstance where thousands of nearly identical messages might be logged?

> ZooKeeper slow operation log
> ----------------------------
>
>                 Key: ZOOKEEPER-2770
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>         Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch,
>                      ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why
> any given read or write operation may become slow: a software bug, a protocol
> problem, a hardware issue with the commit log(s), a network issue. If the
> problem is constant, it is trivial to come to an understanding of the cause.
> However, in order to diagnose intermittent problems we often don't know where,
> or when, to begin looking. We need some sort of timestamped indication of the
> problem. Although ZooKeeper is not a datastore, it does persist data, and can
> suffer intermittent performance degradation, so it should consider implementing
> a 'slow query' log, a feature very common to services which persist
> information on behalf of clients which may be sensitive to latency while
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally
> processing the request, that the current time minus the arrival time of the
> request is beyond a configured threshold.
> Look at the HBase {{responseTooSlow}} feature for inspiration.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
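The metering Ted asks about is usually a small rate limiter in front of the logger. A hypothetical sketch (class and method names invented, not from the patch): suppress a message if it fired within the last interval, and report how many occurrences were dropped the next time it fires.

```java
// Hypothetical sketch of log metering: at most one log line per interval,
// with a count of suppressed occurrences reported on the next emission.
public class LogRateLimiter {
    private final long intervalMs;
    private long lastLogTime = Long.MIN_VALUE; // sentinel: never logged yet
    private long suppressed = 0;

    LogRateLimiter(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    // Returns the number of suppressed messages (>= 0) if the caller should
    // log now, or -1 if this occurrence should be dropped.
    synchronized long shouldLog(long nowMs) {
        if (lastLogTime == Long.MIN_VALUE || nowMs - lastLogTime >= intervalMs) {
            long dropped = suppressed;
            suppressed = 0;
            lastLogTime = nowMs;
            return dropped;
        }
        suppressed++;
        return -1;
    }
}
```

A caller would log only when `shouldLog` returns a non-negative value, appending "(N similar messages suppressed)" when N > 0.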
[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101158#comment-16101158 ]

Ted Dunning commented on ZOOKEEPER-2770:
----------------------------------------

{quote}
With that said, is 300 ms a good value or even less is better?
{quote}

I would suggest that getting a real time-varying histogram is the right answer. I suggested that early on for just this kind of reason.

> ZooKeeper slow operation log
> ----------------------------
>
>                 Key: ZOOKEEPER-2770
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>         Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch,
>                      ZOOKEEPER-2770.003.patch
>

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
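A latency histogram of the kind Ted suggests is cheap to keep per-server: count operations into exponential buckets (&lt;1 ms, &lt;2 ms, &lt;4 ms, ...) and dump the counts periodically to see how the distribution shifts over time. A hypothetical sketch (the class and bucket scheme are invented, not from the patch):

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch of a latency histogram with exponential buckets.
// Bucket 0 holds latencies <= 0 ms; bucket b (b >= 1) holds latencies in
// [2^(b-1), 2^b - 1] ms; everything beyond 2^30 ms lands in bucket 31.
public class LatencyHistogram {
    private final AtomicLongArray buckets = new AtomicLongArray(32);

    void record(long latencyMs) {
        int b = latencyMs <= 0
                ? 0
                : Math.min(64 - Long.numberOfLeadingZeros(latencyMs), 31);
        buckets.incrementAndGet(b);
    }

    long count(int bucket) {
        return buckets.get(bucket);
    }
}
```

Resetting or snapshotting the array on a timer gives the "time-varying" view: one histogram per reporting interval instead of a single cumulative one.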
[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101114#comment-16101114 ]

Karan Mehta commented on ZOOKEEPER-2770:
----------------------------------------

bq. Operations over 100ms should be vanishingly rare, but I wouldn't leap up to find out what is happening. I would be fairly unhappy, though, and would start checking.

Let's take this as a motivation. :) With that said, is 300 ms a good value, or is even less better?

> ZooKeeper slow operation log
> ----------------------------
>
>                 Key: ZOOKEEPER-2770
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>         Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch,
>                      ZOOKEEPER-2770.003.patch
>

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
ZooKeeper-trunk - Build # 3476 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk/3476/

### ## LAST 60 LINES OF THE CONSOLE ###
[...truncated 64.61 MB...]
    [junit]     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
    [junit] 2017-07-25 23:26:04,269 [myid:] - INFO  [ProcessThread(sid:0 cport:19547)::PrepRequestProcessor@614] - Processed session termination for sessionid: 0x10068c16e94
    [junit] 2017-07-25 23:26:04,270 [myid:] - INFO  [SyncThread:0:MBeanRegistry@128] - Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port19547,name1=Connections,name2=127.0.0.1,name3=0x10068c16e94]
    [junit] 2017-07-25 23:26:04,270 [myid:] - INFO  [main:ZooKeeper@1329] - Session: 0x10068c16e94 closed
    [junit] 2017-07-25 23:26:04,270 [myid:] - INFO  [main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down for session: 0x10068c16e94
    [junit] 2017-07-25 23:26:04,270 [myid:] - INFO  [main:JUnit4ZKTestRunner$LoggedInvokeMethod@82] - Memory used 152360
    [junit] 2017-07-25 23:26:04,270 [myid:] - INFO  [main:JUnit4ZKTestRunner$LoggedInvokeMethod@87] - Number of threads 1648
    [junit] 2017-07-25 23:26:04,271 [myid:] - INFO  [main:JUnit4ZKTestRunner$LoggedInvokeMethod@102] - FINISHED TEST METHOD testWatcherAutoResetWithLocal
    [junit] 2017-07-25 23:26:04,271 [myid:] - INFO  [main:ClientBase@601] - tearDown starting
    [junit] 2017-07-25 23:26:04,271 [myid:] - INFO  [main:ClientBase@571] - STOPPING server
    [junit] 2017-07-25 23:26:04,271 [myid:] - INFO  [main:NettyServerCnxnFactory@464] - shutdown called 0.0.0.0/0.0.0.0:19547
    [junit] 2017-07-25 23:26:04,277 [myid:] - INFO  [main:ZooKeeperServer@541] - shutting down
    [junit] 2017-07-25 23:26:04,277 [myid:] - ERROR [main:ZooKeeperServer@505] - ZKShutdownHandler is not registered, so ZooKeeper server won't take any action on ERROR or SHUTDOWN server state changes
    [junit] 2017-07-25 23:26:04,277 [myid:] - INFO  [main:SessionTrackerImpl@232] - Shutting down
    [junit] 2017-07-25 23:26:04,279 [myid:] - INFO  [main:PrepRequestProcessor@1008] - Shutting down
    [junit] 2017-07-25 23:26:04,279 [myid:] - INFO  [main:SyncRequestProcessor@191] - Shutting down
    [junit] 2017-07-25 23:26:04,280 [myid:] - INFO  [ProcessThread(sid:0 cport:19547)::PrepRequestProcessor@155] - PrepRequestProcessor exited loop!
    [junit] 2017-07-25 23:26:04,280 [myid:] - INFO  [SyncThread:0:SyncRequestProcessor@169] - SyncRequestProcessor exited!
    [junit] 2017-07-25 23:26:04,280 [myid:] - INFO  [main:FinalRequestProcessor@481] - shutdown of request processor complete
    [junit] 2017-07-25 23:26:04,280 [myid:] - INFO  [main:MBeanRegistry@128] - Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port19547,name1=InMemoryDataTree]
    [junit] 2017-07-25 23:26:04,280 [myid:] - INFO  [main:MBeanRegistry@128] - Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port19547]
    [junit] 2017-07-25 23:26:04,281 [myid:] - INFO  [main:FourLetterWordMain@85] - connecting to 127.0.0.1 19547
    [junit] 2017-07-25 23:26:04,281 [myid:] - INFO  [main:JMXEnv@146] - ensureOnly:[]
    [junit] 2017-07-25 23:26:04,287 [myid:] - INFO  [main:ClientBase@626] - fdcount after test is: 4837 at start it was 4837
    [junit] 2017-07-25 23:26:04,288 [myid:] - INFO  [main:ZKTestCase$1@68] - SUCCEEDED testWatcherAutoResetWithLocal
    [junit] 2017-07-25 23:26:04,288 [myid:] - INFO  [main:ZKTestCase$1@63] - FINISHED testWatcherAutoResetWithLocal
    [junit] Tests run: 103, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 403.914 sec, Thread: 4, Class: org.apache.zookeeper.test.NioNettySuiteTest
    [junit] 2017-07-25 23:26:04,541 [myid:127.0.0.1:19430] - INFO  [main-SendThread(127.0.0.1:19430):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:19430. Will not attempt to authenticate using SASL (unknown error)
    [junit] 2017-07-25 23:26:04,542 [myid:127.0.0.1:19430] - WARN  [main-SendThread(127.0.0.1:19430):ClientCnxn$SendThread@1235] - Session 0x30068be339e for server 127.0.0.1/127.0.0.1:19430, unexpected error, closing socket connection and attempting reconnect
    [junit] java.net.ConnectException: Connection refused
    [junit]     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    [junit]     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    [junit]     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
    [junit]     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)

fail.build.on.test.failure:

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build.xml:1338: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build.xml:1219: The following error
[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100945#comment-16100945 ]

Ted Dunning commented on ZOOKEEPER-2770:
----------------------------------------

On second thought, I could imagine that startup transients could cause a long operation. Once you have your quorum in a groove, however, >1 second is very bad, especially if you don't have something like a quorum leader change happening.

> ZooKeeper slow operation log
> ----------------------------
>
>                 Key: ZOOKEEPER-2770
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>         Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch,
>                      ZOOKEEPER-2770.003.patch
>

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100942#comment-16100942 ]

Ted Dunning commented on ZOOKEEPER-2770:
----------------------------------------

To put some color on Camille's surprise, I would consider any operation over a second to be indicative of gross failure in the quorum. Operations over 100ms should be vanishingly rare, but I wouldn't leap up to find out what is happening. I would be fairly unhappy, though, and would start checking.

> ZooKeeper slow operation log
> ----------------------------
>
>                 Key: ZOOKEEPER-2770
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>         Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch,
>                      ZOOKEEPER-2770.003.patch
>

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100940#comment-16100940 ]

ASF GitHub Bot commented on ZOOKEEPER-2770:
-------------------------------------------

Github user karanmehta93 commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/307#discussion_r129450258

    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java ---
    @@ -61,6 +61,7 @@
         private static boolean standaloneEnabled = true;
         private static boolean reconfigEnabled = false;
    +    private static int requestWarnThresholdMs = 1;
    --- End diff --

    To be frank, I am a newbie and haven't debugged this in detail. This value is based purely on the 'stat' command output on our test cluster. @apurtell might be able to suggest more practical values. @skamille I would prefer turning this on by default, although the default value needs to be discussed. In my understanding, this helps in situations where we see timeouts at the application level; such a log might help narrow down the cause.

> ZooKeeper slow operation log
> ----------------------------
>
>                 Key: ZOOKEEPER-2770
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>         Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch,
>                      ZOOKEEPER-2770.003.patch
>

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
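The core check described in the issue (log the request when the current time minus its arrival time exceeds a configured threshold) is small. A hypothetical sketch, with invented names and not taken from the patch under review:

```java
// Hypothetical sketch of the slow-operation check: when a request is
// finally processed, compare elapsed queue+processing time against a
// configured threshold and decide whether to emit a warning.
public class SlowOpCheck {
    // True if the request took longer than thresholdMs from arrival to now.
    static boolean isSlow(long arrivalMs, long nowMs, long thresholdMs) {
        return nowMs - arrivalMs > thresholdMs;
    }

    public static void main(String[] args) {
        long arrival = System.currentTimeMillis() - 500;
        if (isSlow(arrival, System.currentTimeMillis(), 300)) {
            System.out.println("slow op: would log client and request details here");
        }
    }
}
```

In the real server the caller would also have the client address and request type in hand, which is what makes the log line actionable.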
[GitHub] zookeeper pull request #307: ZOOKEEPER-2770 ZooKeeper slow operation log
Github user karanmehta93 commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/307#discussion_r129450258

    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java ---
    @@ -61,6 +61,7 @@
         private static boolean standaloneEnabled = true;
         private static boolean reconfigEnabled = false;
    +    private static int requestWarnThresholdMs = 1;
    --- End diff --

    To be frank, I am a newbie and haven't debugged this in detail. This value is based purely on the 'stat' command output on our test cluster. @apurtell might be able to suggest more practical values. @skamille I would prefer turning this on by default, although the default value needs to be discussed. In my understanding, this helps in situations where we see timeouts at the application level; such a log might help narrow down the cause.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
ZooKeeper_branch34_jdk8 - Build # 1073 - Failure
See https://builds.apache.org/job/ZooKeeper_branch34_jdk8/1073/

### ## LAST 60 LINES OF THE CONSOLE ###
[...truncated 8.11 KB...]
 > git config remote.origin.url git://git.apache.org/zookeeper.git # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
No valid HEAD. Skipping the resetting
 > git clean -fdx # timeout=10
Fetching upstream changes from git://git.apache.org/zookeeper.git
 > git --version # timeout=10
 > git fetch --tags --progress git://git.apache.org/zookeeper.git +refs/heads/*:refs/remotes/origin/*
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from git://git.apache.org/zookeeper.git
	at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:812)
	at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1079)
	at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1110)
	at hudson.scm.SCM.checkout(SCM.java:495)
	at hudson.model.AbstractProject.checkout(AbstractProject.java:1276)
	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:560)
	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:485)
	at hudson.model.Run.execute(Run.java:1735)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
	at hudson.model.ResourceController.execute(ResourceController.java:97)
	at hudson.model.Executor.run(Executor.java:405)
Caused by: hudson.plugins.git.GitException: Command "git fetch --tags --progress git://git.apache.org/zookeeper.git +refs/heads/*:refs/remotes/origin/*" returned status code 128:
stdout:
stderr: fatal: unable to connect to git.apache.org: git.apache.org: Temporary failure in name resolution
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1903)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1622)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$300(CliGitAPIImpl.java:71)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$1.execute(CliGitAPIImpl.java:348)
	at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:153)
	at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:146)
	at hudson.remoting.UserRequest.perform(UserRequest.java:153)
	at hudson.remoting.UserRequest.perform(UserRequest.java:50)
	at hudson.remoting.Request$2.run(Request.java:336)
	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
	at ..remote call to cassandra11(Native Method)
	at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1545)
	at hudson.remoting.UserResponse.retrieve(UserRequest.java:253)
	at hudson.remoting.Channel.call(Channel.java:830)
	at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:146)
	at sun.reflect.GeneratedMethodAccessor864.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:132)
	at com.sun.proxy.$Proxy104.execute(Unknown Source)
	at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:810)
	... 11 more
ERROR: Error fetching remote repo 'origin'
Recording test results
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files were found. Configuration error?
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any

### ## FAILED TESTS (if any) ##
No tests ran.
[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100906#comment-16100906 ]

ASF GitHub Bot commented on ZOOKEEPER-2770:
-------------------------------------------

Github user skamille commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/307#discussion_r129444784

    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java ---
    @@ -61,6 +61,7 @@
         private static boolean standaloneEnabled = true;
         private static boolean reconfigEnabled = false;
    +    private static int requestWarnThresholdMs = 1;
    --- End diff --

    You've seen 2.3 seconds of latency within the ZK quorum operations? That seems worthy of posting to the mailing list along with some information about what was happening and why. I think it sounds like @hanm wants to turn this off by default, which makes this moot, and I'm supportive of that, so I'll let him make the call.

> ZooKeeper slow operation log
> ----------------------------
>
>                 Key: ZOOKEEPER-2770
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>         Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch,
>                      ZOOKEEPER-2770.003.patch
>

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[GitHub] zookeeper pull request #307: ZOOKEEPER-2770 ZooKeeper slow operation log
Github user skamille commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/307#discussion_r129444784

    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java ---
    @@ -61,6 +61,7 @@
         private static boolean standaloneEnabled = true;
         private static boolean reconfigEnabled = false;
    +    private static int requestWarnThresholdMs = 1;
    --- End diff --

    You've seen 2.3 seconds of latency within the ZK quorum operations? That seems worthy of posting to the mailing list along with some information about what was happening and why. I think it sounds like @hanm wants to turn this off by default, which makes this moot, and I'm supportive of that, so I'll let him make the call.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100881#comment-16100881 ]

Michael Han commented on ZOOKEEPER-2770:
----------------------------------------

By "hardcode" I meant the default value of "requestWarnThresholdMs" baked into the code.

> ZooKeeper slow operation log
> ----------------------------
>
>                 Key: ZOOKEEPER-2770
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>         Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch,
>                      ZOOKEEPER-2770.003.patch
>

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
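The distinction being drawn here (a default baked into code versus a hardcoded value) is the usual "configurable with a default" pattern. A hypothetical sketch; the property name follows ZooKeeper's `zookeeper.*` system-property convention but the class, default value, and exact key are invented for illustration:

```java
// Hypothetical sketch: a compile-time default that deployments can
// override via a system property, so the threshold is configurable
// rather than hardcoded.
public class SlowOpConfig {
    // Invented default; the real default was still under discussion.
    static final int DEFAULT_WARN_THRESHOLD_MS = 10_000;

    static int warnThresholdMs() {
        // Integer.getInteger reads the named system property, falling
        // back to the supplied default when unset or unparsable.
        return Integer.getInteger("zookeeper.requestWarnThresholdMs",
                                  DEFAULT_WARN_THRESHOLD_MS);
    }
}
```

A sentinel such as a negative value could additionally mean "feature disabled", which matches the discussion above about turning the log off by default.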
[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100852#comment-16100852 ] Andrew Purtell commented on ZOOKEEPER-2770: --- From the original patch, the warning threshold has been configurable, so calling it 'hardcoded' isn't correct. Maybe you meant a simple threshold only? That's true. It's better than nothing. FWIW I also like Ted's suggestion as a followup, and in fact would like to carry that over to HBase if it works out well here.
[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100846#comment-16100846 ] Michael Han commented on ZOOKEEPER-2770: A hardcoded default value in code is unlikely to work for everyone, and false negatives are possible if the value is too small. I am leaning towards making this an opt-in feature: default the value to -1 (disabled), and let those who want to use it tune the parameter for their deployment, but it has to be enabled explicitly.
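[Editor's note] The opt-in behavior proposed above could be sketched as follows: the threshold defaults to -1 (feature off) and only takes effect when the operator sets it explicitly. The property key and class name below are illustrative assumptions, not the final names.

```java
import java.util.Properties;

// Sketch of the opt-in default proposed in the comment above: absent an
// explicit setting, the threshold is -1 and the slow-op log stays off.
// The property key "requestWarnThresholdMs" is assumed for illustration.
public class SlowOpThreshold {
    public static final long DISABLED = -1L;

    public static long parse(Properties zkProps) {
        String v = zkProps.getProperty("requestWarnThresholdMs");
        if (v == null) {
            return DISABLED; // not configured: feature stays off
        }
        return Long.parseLong(v.trim());
    }
}
```

This keeps the default out of the "what is a reasonable latency for everyone" debate entirely: no value is baked in, and each deployment picks its own.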
[jira] [Commented] (ZOOKEEPER-2829) Interface usability / compatibility improvements through Java annotation.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100843#comment-16100843 ] ASF GitHub Bot commented on ZOOKEEPER-2829: --- Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/316 Yes, please split - that would make it easier to land the current patch, and I expect it will take some discussion to nail down the complete set of new APIs to be exposed. > Interface usability / compatibility improvements through Java annotation. > - > > Key: ZOOKEEPER-2829 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2829 > Project: ZooKeeper > Issue Type: Improvement > Components: java client, server >Affects Versions: 3.4.10, 3.5.3 >Reporter: Michael Han >Assignee: Abraham Fine > Labels: annotation > > Hadoop has interface classification regarding the interfaces' scope and > stability. ZK should do something similar, which not only provides the additional > benefit of making API compatibility checks easier between releases (or even > commits, by automating the checks via some tooling), but is also consistent with the > rest of the Hadoop ecosystem. > See HADOOP-5073 for more context.
[GitHub] zookeeper issue #316: ZOOKEEPER-2829: Interface usability / compatibility im...
Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/316 Yes, please split - that would make it easier to land the current patch, and I expect it will take some discussion to nail down the complete set of new APIs to be exposed.
[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100820#comment-16100820 ] ASF GitHub Bot commented on ZOOKEEPER-2770: --- Github user karanmehta93 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/307#discussion_r129431091 --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java --- @@ -61,6 +61,7 @@ private static boolean standaloneEnabled = true; private static boolean reconfigEnabled = false; +private static int requestWarnThresholdMs = 1; --- End diff -- Is 2 or 3 seconds reasonable? I have seen 2.3 seconds as max latency sometimes; however, I don't have much experience.
[GitHub] zookeeper pull request #307: ZOOKEEPER-2770 ZooKeeper slow operation log
Github user karanmehta93 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/307#discussion_r129431091 --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java --- @@ -61,6 +61,7 @@ private static boolean standaloneEnabled = true; private static boolean reconfigEnabled = false; +private static int requestWarnThresholdMs = 1; --- End diff -- Is 2 or 3 seconds reasonable? I have seen 2.3 seconds as max latency sometimes; however, I don't have much experience.
[GitHub] zookeeper issue #316: ZOOKEEPER-2829: Interface usability / compatibility im...
Github user afine commented on the issue: https://github.com/apache/zookeeper/pull/316 @hanm I am happy to split it up if you insist. My concern is that just adding the annotations to our "normal" java classes does not actually do much, since it is technically incomplete. I thought it would be a good idea to do the javadoc generation change here because it provides us a reasonably foolproof way of verifying that every class that should be labeled public has been labeled public. Otherwise it would be rather tedious to make sure that we have labeled all of our classes appropriately.
[jira] [Commented] (ZOOKEEPER-2829) Interface usability / compatibility improvements through Java annotation.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100797#comment-16100797 ] ASF GitHub Bot commented on ZOOKEEPER-2829: --- Github user afine commented on the issue: https://github.com/apache/zookeeper/pull/316 @hanm I am happy to split it up if you insist. My concern is that just adding the annotations to our "normal" java classes does not actually do much, since it is technically incomplete. I thought it would be a good idea to do the javadoc generation change here because it provides us a reasonably foolproof way of verifying that every class that should be labeled public has been labeled public. Otherwise it would be rather tedious to make sure that we have labeled all of our classes appropriately.
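[Editor's note] The Hadoop-style interface classification being discussed (HADOOP-5073) marks classes with audience and stability annotations so that tooling, including a javadoc doclet, can filter to public API only. Hadoop ships these as InterfaceAudience.Public / InterfaceStability.Evolving; the stripped-down declarations below are an illustrative sketch, not ZooKeeper's actual annotations.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical audience/stability markers in the spirit of Hadoop's
// InterfaceAudience and InterfaceStability annotations.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Public {}

@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Evolving {}

// A client-facing class would then be tagged like so; a javadoc doclet
// or compatibility checker can restrict itself to @Public types.
@Public
@Evolving
class ExampleClientApi {}
```

Because the markers are @Documented, they show up in the generated javadoc, which is what makes the "every public class is labeled" verification that the comment above describes mechanical rather than manual.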
[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100780#comment-16100780 ] ASF GitHub Bot commented on ZOOKEEPER-2770: --- Github user skamille commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/307#discussion_r129424302 --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java --- @@ -61,6 +61,7 @@ private static boolean standaloneEnabled = true; private static boolean reconfigEnabled = false; +private static int requestWarnThresholdMs = 1; --- End diff -- If we're going to implement this, let's at least put in some sort of realistic threshold. 10s is basically saying "don't enable this feature". Is that what we want?
[GitHub] zookeeper pull request #307: ZOOKEEPER-2770 ZooKeeper slow operation log
Github user skamille commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/307#discussion_r129424302 --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java --- @@ -61,6 +61,7 @@ private static boolean standaloneEnabled = true; private static boolean reconfigEnabled = false; +private static int requestWarnThresholdMs = 1; --- End diff -- If we're going to implement this, let's at least put in some sort of realistic threshold. 10s is basically saying "don't enable this feature". Is that what we want?
[jira] [Commented] (ZOOKEEPER-2829) Interface usability / compatibility improvements through Java annotation.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100737#comment-16100737 ] ASF GitHub Bot commented on ZOOKEEPER-2829: --- Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/316 >> The javadoc generated by this patch should be identical to our javadoc before with a few extra classes (that I think should have been included before anyway). I suggest we scope this JIRA so it only focuses on the first part: "The javadoc generated by this patch should be identical to our javadoc before". The remaining parts, such as whether or not to include jute and other new APIs, can be discussed on the dev list and done in a separate JIRA.
[GitHub] zookeeper issue #316: ZOOKEEPER-2829: Interface usability / compatibility im...
Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/316 >> The javadoc generated by this patch should be identical to our javadoc before with a few extra classes (that I think should have been included before anyway). I suggest we scope this JIRA so it only focuses on the first part: "The javadoc generated by this patch should be identical to our javadoc before". The remaining parts, such as whether or not to include jute and other new APIs, can be discussed on the dev list and done in a separate JIRA.
[jira] [Comment Edited] (ZOOKEEPER-2770) ZooKeeper slow operation log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100706#comment-16100706 ] Andrew Purtell edited comment on ZOOKEEPER-2770 at 7/25/17 8:26 PM: The originally proposed change is hardly complex. I don't understand that aspect of this discussion. Whether or not the metric is useful, on the other hand... ok. That is a matter of opinion. I think we'd like to know if any ZK op takes longer than a second to complete, and how often that might happen, and on what host(s)/quorum it is happening. We have a fleet of thousands of servers. We have tens of ZooKeeper installations, each on five servers. Hardware does funny things from time to time. We'd like to be proactive. Edit: More like 160 quorums, I think.
ZooKeeper-trunk-openjdk7 - Build # 1556 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk-openjdk7/1556/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 64.60 MB...] [junit] 2017-07-25 20:14:56,652 [myid:] - INFO [main:ClientBase@601] - tearDown starting [junit] 2017-07-25 20:14:56,652 [myid:] - INFO [main:ClientBase@571] - STOPPING server [junit] 2017-07-25 20:14:56,652 [myid:] - INFO [main:NettyServerCnxnFactory@464] - shutdown called 0.0.0.0/0.0.0.0:11468 [junit] 2017-07-25 20:14:56,658 [myid:] - INFO [main:ZooKeeperServer@541] - shutting down [junit] 2017-07-25 20:14:56,658 [myid:] - ERROR [main:ZooKeeperServer@505] - ZKShutdownHandler is not registered, so ZooKeeper server won't take any action on ERROR or SHUTDOWN server state changes [junit] 2017-07-25 20:14:56,658 [myid:] - INFO [main:SessionTrackerImpl@232] - Shutting down [junit] 2017-07-25 20:14:56,658 [myid:] - INFO [main:PrepRequestProcessor@1008] - Shutting down [junit] 2017-07-25 20:14:56,658 [myid:] - INFO [main:SyncRequestProcessor@191] - Shutting down [junit] 2017-07-25 20:14:56,658 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@169] - SyncRequestProcessor exited! [junit] 2017-07-25 20:14:56,658 [myid:] - INFO [ProcessThread(sid:0 cport:11468)::PrepRequestProcessor@155] - PrepRequestProcessor exited loop! 
[junit] 2017-07-25 20:14:56,658 [myid:] - INFO [main:FinalRequestProcessor@481] - shutdown of request processor complete [junit] 2017-07-25 20:14:56,659 [myid:] - INFO [main:MBeanRegistry@128] - Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port11468,name1=InMemoryDataTree] [junit] 2017-07-25 20:14:56,660 [myid:] - INFO [main:MBeanRegistry@128] - Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port11468] [junit] 2017-07-25 20:14:56,660 [myid:] - INFO [main:FourLetterWordMain@85] - connecting to 127.0.0.1 11468 [junit] 2017-07-25 20:14:56,661 [myid:] - INFO [main:JMXEnv@146] - ensureOnly:[] [junit] 2017-07-25 20:14:56,672 [myid:] - INFO [main:ClientBase@626] - fdcount after test is: 7141 at start it was 7141 [junit] 2017-07-25 20:14:56,672 [myid:] - INFO [main:ZKTestCase$1@68] - SUCCEEDED testWatcherAutoResetWithLocal [junit] 2017-07-25 20:14:56,672 [myid:] - INFO [main:ZKTestCase$1@63] - FINISHED testWatcherAutoResetWithLocal [junit] Tests run: 103, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 405.159 sec, Thread: 1, Class: org.apache.zookeeper.test.NioNettySuiteTest [junit] 2017-07-25 20:14:56,694 [myid:127.0.0.1:11351] - INFO [main-SendThread(127.0.0.1:11351):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:11351. 
Will not attempt to authenticate using SASL (unknown error) [junit] 2017-07-25 20:14:56,694 [myid:127.0.0.1:11351] - WARN [main-SendThread(127.0.0.1:11351):ClientCnxn$SendThread@1235] - Session 0x305a5ff5baa for server 127.0.0.1/127.0.0.1:11351, unexpected error, closing socket connection and attempting reconnect [junit] java.net.ConnectException: Connection refused [junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) [junit] at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) [junit] at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357) [junit] at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214) [junit] 2017-07-25 20:14:56,716 [myid:127.0.0.1:11222] - INFO [main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:11222. Will not attempt to authenticate using SASL (unknown error) [junit] 2017-07-25 20:14:56,717 [myid:127.0.0.1:11222] - WARN [main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1235] - Session 0x105a5fc70c2 for server 127.0.0.1/127.0.0.1:11222, unexpected error, closing socket connection and attempting reconnect [junit] java.net.ConnectException: Connection refused [junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) [junit] at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) [junit] at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357) [junit] at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214) [junit] 2017-07-25 20:14:56,892 [myid:127.0.0.1:11271] - INFO [main-SendThread(127.0.0.1:11271):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:11271. 
Will not attempt to authenticate using SASL (unknown error) [junit] 2017-07-25 20:14:56,892 [myid:127.0.0.1:11271] - WARN [main-SendThread(127.0.0.1:11271):ClientCnxn$SendThread@1235] - Session 0x105a5fcbee20001 for server 127.0.0.1/127.0.0.1:11271, unexpected error,
[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100640#comment-16100640 ] Camille Fournier commented on ZOOKEEPER-2770: - Are there really 10s long slow requests? It's defaults like this that make me skeptical about the usefulness of this particular implementation. If we have a request through ZK that takes 10s to process your whole system is completely effed. I don't think we should add complexity to the code base without suitable justification for the value of the new feature. With that in mind, I'd like to understand what, specifically, the circumstances we're trying to measure are. It looks like processing time for a request through the ZK quorum alone, correct? The only network time that might be captured would be, in the case of a write, the quorum voting time. I'm all for making ZK more operable and exposing metrics but I don't think exposing low-value metrics is worth the additional code complexity without justification. > ZooKeeper slow operation log > > > Key: ZOOKEEPER-2770 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770 > Project: ZooKeeper > Issue Type: Improvement >Reporter: Karan Mehta >Assignee: Karan Mehta > Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, > ZOOKEEPER-2770.003.patch > > > ZooKeeper is a complex distributed application. There are many reasons why > any given read or write operation may become slow: a software bug, a protocol > problem, a hardware issue with the commit log(s), a network issue. If the > problem is constant it is trivial to come to an understanding of the cause. > However in order to diagnose intermittent problems we often don't know where, > or when, to begin looking. We need some sort of timestamped indication of the > problem. 
Although ZooKeeper is not a datastore, it does persist data, and can > suffer intermittent performance degradation, and should consider implementing > a 'slow query' log, a feature very common to services which persist > information on behalf of clients which may be sensitive to latency while > waiting for confirmation of successful persistence. > Log the client and request details if the server discovers, when finally > processing the request, that the current time minus arrival time of the > request is beyond a configured threshold. > Look at the HBase {{responseTooSlow}} feature for inspiration. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
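The mechanism the description asks for is a simple threshold check when the server finally processes a request; a minimal sketch under assumed names (`SlowRequestLog` and `SLOW_THRESHOLD_MS` are illustrative, not ZooKeeper internals):

```java
// Illustrative sketch of the proposed slow-operation check: compare the
// request's arrival time against the time processing finishes.
// Class and constant names are assumptions, not ZooKeeper code.
public class SlowRequestLog {
    // The 10s default debated in this thread; a real patch would
    // presumably make this configurable.
    static final long SLOW_THRESHOLD_MS = 10_000;

    // True when the request spent longer than the threshold in the server.
    static boolean isSlow(long arrivalTimeMs, long nowMs) {
        return (nowMs - arrivalTimeMs) > SLOW_THRESHOLD_MS;
    }

    public static void main(String[] args) {
        // A request that took 12s would be logged; one at 2s would not.
        System.out.println(isSlow(0, 12_000)); // true
        System.out.println(isSlow(0, 2_000));  // false
    }
}
```

The interesting design questions raised in the thread are not this check itself but the threshold default and what request details get logged.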
Failed: ZOOKEEPER-2614 PreCommit Build #3639
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-2614 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3639/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 884 B...] > git rev-parse refs/remotes/origin/master^{commit} # timeout=10 > git rev-parse refs/remotes/origin/origin/master^{commit} # timeout=10 Checking out Revision f60928787a908f358a64763f802a6d0371ad4404 (refs/remotes/origin/master) Commit message: "ZOOKEEPER-2841: ZooKeeper public include files leak porting changes" > git config core.sparsecheckout # timeout=10 > git checkout -f f60928787a908f358a64763f802a6d0371ad4404 > git rev-list f60928787a908f358a64763f802a6d0371ad4404 # timeout=10 No emails were triggered. Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 [PreCommit-ZOOKEEPER-Build] $ /bin/bash /tmp/jenkins7698180655943185224.sh /home/jenkins/tools/java/latest1.7/bin/java java version "1.7.0_80" Java(TM) SE Runtime Environment (build 1.7.0_80-b15) Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode) core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 386177 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 6 pipe size(512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 10240 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/tools/ant/launch/Launcher : Unsupported major.minor version 52.0 at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:800) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at 
java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482) Build step 'Execute shell' marked build as failure Archiving artifacts Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Recording test results Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 ERROR: Step 'Publish JUnit test result report' failed: No test report files were found. Configuration error? Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 [description-setter] Description set: ZOOKEEPER-2841 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Email was triggered for: Failure - Any Sending email for trigger: Failure - Any Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 ### ## FAILED TESTS (if any) ## No tests ran.
[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100603#comment-16100603 ] Ted Dunning commented on ZOOKEEPER-2770: [~fournc], I am not so sure that *I* agree with me at this point. It is fair to say that on occasion there are slow operations in ZK and it would be good to know about them. This kind of problem is almost always due, in my own vicarious experience, to bad configuration. Often the bad configuration is simply collocation with a noisy neighbor on a deficient storage layer. There might be situations where an operation is slow due to the content of the query itself, but I cannot imagine what those situations might be. Writing a large value (but that is strictly limited in size), or even doing a huge multi-op (which has the same limited size in aggregate) should never take very long. As such, I would expect that the highest diagnostic value would not be something that dumped the contents of slow queries, but rather a capability that characterizes the entire distribution of query times. The frequency of slow queries is a diagnostic of sorts, but is one that could be inferred from the time-varying distributional information I was suggesting. That said, I don't think that a slow query log is a BAD thing (except a bit bad in terms of security if it logs the actual query). And I wouldn't want the BEST thing (a distribution log) to stop somebody contributing something. > ZooKeeper slow operation log > > > Key: ZOOKEEPER-2770 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770 > Project: ZooKeeper > Issue Type: Improvement >Reporter: Karan Mehta >Assignee: Karan Mehta > Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, > ZOOKEEPER-2770.003.patch -- This message was sent by Atlassian JIRA (v6.4.14#64029)
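The distribution-oriented alternative suggested in the comment above could be approximated with exponential latency buckets; a hedged sketch (class name and bucket layout are assumptions, not an existing ZooKeeper facility):

```java
// Sketch of the "distribution log" idea: record every request latency into
// power-of-two buckets so both typical and tail behaviour can be read off
// over time. Names and layout here are illustrative assumptions.
public class LatencyHistogram {
    // Bucket i counts latencies in [2^i, 2^(i+1)) microseconds.
    final long[] buckets = new long[32];

    void record(long micros) {
        int bin = 63 - Long.numberOfLeadingZeros(Math.max(micros, 1));
        buckets[Math.min(bin, buckets.length - 1)]++;
    }

    // Fraction of requests at or above bucket fromBin -- the tail that a
    // fixed slow-log threshold would have captured.
    double tailFraction(int fromBin) {
        long total = 0, tail = 0;
        for (int i = 0; i < buckets.length; i++) {
            total += buckets[i];
            if (i >= fromBin) tail += buckets[i];
        }
        return total == 0 ? 0.0 : (double) tail / total;
    }

    public static void main(String[] args) {
        LatencyHistogram h = new LatencyHistogram();
        h.record(100);      // a typical fast request (~100 us)
        h.record(150_000);  // a 150 ms outlier, lands in bucket 17
        System.out.println(h.tailFraction(17)); // 0.5
    }
}
```

From periodic snapshots of such buckets, both the frequency of slow requests and the overall shape of the latency distribution can be inferred, which is the point made in the comment.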
[jira] [Commented] (ZOOKEEPER-2614) Port ZOOKEEPER-1576 to branch3.4
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100569#comment-16100569 ] Thomas Schüttel commented on ZOOKEEPER-2614: I just tested the patch with my ZooKeeper ensemble running in Kubernetes. It works fine now. Previously, without the patch, my Kafka cluster failed as soon as one ZooKeeper node died, even though a healthy ensemble was still present. Please merge this patch and release it. > Port ZOOKEEPER-1576 to branch3.4 > > > Key: ZOOKEEPER-2614 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2614 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.4.9 >Reporter: Vishal Khandelwal >Assignee: Vishal Khandelwal > Fix For: 3.4.9 > > Attachments: ZOOKEEPER-2614.branch-3.4.00.patch > > > ZOOKEEPER-1576 handles UnknownHostException and it is good to have this change > for the 3.4 branch as well. Porting the changes to 3.4 after resolving the > conflicts -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100551#comment-16100551 ] Camille Fournier commented on ZOOKEEPER-2770: - I completely agree with [~tdunning]. I don't understand the motivation for this. Are we just timing the internal processing time for the request? ZK is not the same type of system as HBase, so I worry we're comparing apples to oranges by trying to cross-implement this feature. > ZooKeeper slow operation log > > > Key: ZOOKEEPER-2770 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770 > Project: ZooKeeper > Issue Type: Improvement >Reporter: Karan Mehta >Assignee: Karan Mehta > Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, > ZOOKEEPER-2770.003.patch -- This message was sent by Atlassian JIRA (v6.4.14#64029)
Failed: ZOOKEEPER-2856 PreCommit Build #3638
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-2856 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3638/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 1.25 KB...] > git rev-parse refs/remotes/origin/master^{commit} # timeout=10 > git rev-parse refs/remotes/origin/origin/master^{commit} # timeout=10 Checking out Revision f60928787a908f358a64763f802a6d0371ad4404 (refs/remotes/origin/master) Commit message: "ZOOKEEPER-2841: ZooKeeper public include files leak porting changes" > git config core.sparsecheckout # timeout=10 > git checkout -f f60928787a908f358a64763f802a6d0371ad4404 > git rev-list f60928787a908f358a64763f802a6d0371ad4404 # timeout=10 No emails were triggered. Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 [PreCommit-ZOOKEEPER-Build] $ /bin/bash /tmp/jenkins4918952795639117506.sh /home/jenkins/tools/java/latest1.7/bin/java java version "1.7.0_80" Java(TM) SE Runtime Environment (build 1.7.0_80-b15) Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode) core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 386172 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 6 pipe size(512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 10240 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/tools/ant/launch/Launcher : Unsupported major.minor version 52.0 at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:800) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at 
java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482) Build step 'Execute shell' marked build as failure Archiving artifacts Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Recording test results Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 ERROR: Step 'Publish JUnit test result report' failed: No test report files were found. Configuration error? Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 [description-setter] Description set: ZOOKEEPER-2841 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Email was triggered for: Failure - Any Sending email for trigger: Failure - Any Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 ### ## FAILED TESTS (if any) ## No tests ran.
Re: Is there a benchmark performance test to reveal how the disk's iops affect zookeeper's tps/qps?
For public benchmarks: * ZK has a systest in src/java/systest. * Check out https://coreos.com/blog/performance-of-etcd.html. There is a GitHub link to the benchmark. If you end up writing your own benchmark, please consider contributing it back to open source :) On Mon, Jul 24, 2017 at 10:35 PM, gp wrote: > As the document says > "incorrect placement of transaction log > The most performance critical part of ZooKeeper is the transaction log. > ZooKeeper syncs transactions to media before it returns a response. A > dedicated transaction log device is key to consistent good performance. > Putting the log on a busy device will adversely affect performance..." > However, is there a benchmark perf test to reveal how the disk's iops > affect ZooKeeper's tps/qps? -- Cheers Michael.
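Short of a full benchmark, average fsync latency on the log device already bounds write throughput, since ZooKeeper forces every transaction to media before replying. A minimal probe along those lines (a sketch, not part of ZooKeeper's systest; the file name, record size, and sync count are arbitrary choices):

```java
// Measures average fsync latency by appending small records to a file and
// forcing each one to the device, the same pattern ZooKeeper's transaction
// log follows. All parameters here are arbitrary choices for the sketch.
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class FsyncProbe {
    static long avgSyncMicros(File file, int syncs) throws IOException {
        byte[] record = new byte[512]; // a small txn-sized record
        long start = System.nanoTime();
        try (FileOutputStream out = new FileOutputStream(file)) {
            for (int i = 0; i < syncs; i++) {
                out.write(record);
                out.getFD().sync(); // block until the data reaches the device
            }
        }
        return (System.nanoTime() - start) / 1_000 / syncs;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("fsync-probe", ".log");
        f.deleteOnExit();
        long micros = avgSyncMicros(f, 100);
        // Sequential write tps is roughly bounded by 1e6 / micros;
        // batching (group commit) raises the effective qps above that.
        System.out.println("avg fsync latency: " + micros + " us");
    }
}
```

Running this on the intended log device versus a busy shared device should make the doc's warning about transaction-log placement concrete.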
[jira] [Updated] (ZOOKEEPER-2856) ZooKeeperSaslClient#respondToServer should log exception message of SaslException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pan Yuxuan updated ZOOKEEPER-2856: -- Attachment: ZOOKEEPER-2856-1.patch Attached a simple patch. > ZooKeeperSaslClient#respondToServer should log exception message of > SaslException > - > > Key: ZOOKEEPER-2856 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2856 > Project: ZooKeeper > Issue Type: Improvement >Affects Versions: 3.4.10, 3.5.3 >Reporter: Pan Yuxuan >Priority: Minor > Attachments: ZOOKEEPER-2856-1.patch > > > When an upstream system like HBase calls ZooKeeperSaslClient with security > enabled, we sometimes get errors in HBase logs like: > {noformat} > SASL authentication failed using login context 'Client'. > {noformat} > This error occurs when a SaslException is caught in > ZooKeeperSaslClient#respondToServer : > {noformat} > catch (SaslException e) { > LOG.error("SASL authentication failed using login context '" + > this.getLoginContext() + "'."); > saslState = SaslState.FAILED; > gotLastPacket = true; > } > {noformat} > This error confuses users because the underlying exception message is > missing, so I think we should add the exception message to the log. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
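The requested change is simply to include the SaslException's own message in the error line; a hedged sketch of the resulting log text (the formatting helper below is illustrative, not code from the actual ZOOKEEPER-2856 patch):

```java
// Illustrates the logging improvement requested above: append the
// SaslException's message to the otherwise opaque error line.
// This helper is an assumption, not the patch itself.
import javax.security.sasl.SaslException;

public class SaslErrorLine {
    static String format(String loginContext, SaslException e) {
        return "SASL authentication failed using login context '"
                + loginContext + "'. " + e.getMessage();
    }

    public static void main(String[] args) {
        SaslException cause =
                new SaslException("No common protection layer between client and server");
        // Without the appended message the operator only sees the first
        // sentence; with it, the root cause is visible in the log.
        System.out.println(format("Client", cause));
    }
}
```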
[jira] [Created] (ZOOKEEPER-2856) ZooKeeperSaslClient#respondToServer should log exception message of SaslException
Pan Yuxuan created ZOOKEEPER-2856: - Summary: ZooKeeperSaslClient#respondToServer should log exception message of SaslException Key: ZOOKEEPER-2856 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2856 Project: ZooKeeper Issue Type: Improvement Affects Versions: 3.5.3, 3.4.10 Reporter: Pan Yuxuan Priority: Minor When an upstream system like HBase calls ZooKeeperSaslClient with security enabled, we sometimes get errors in HBase logs like: {noformat} SASL authentication failed using login context 'Client'. {noformat} This error occurs when a SaslException is caught in ZooKeeperSaslClient#respondToServer : {noformat} catch (SaslException e) { LOG.error("SASL authentication failed using login context '" + this.getLoginContext() + "'."); saslState = SaslState.FAILED; gotLastPacket = true; } {noformat} This error confuses users because the underlying exception message is missing, so I think we should add the exception message to the log. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099714#comment-16099714 ] Hadoop QA commented on ZOOKEEPER-2770: -- -1 overall. GitHub Pull Request Build +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 3.0.1) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/900//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/900//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/900//console This message is automatically generated. > ZooKeeper slow operation log > > > Key: ZOOKEEPER-2770 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770 > Project: ZooKeeper > Issue Type: Improvement >Reporter: Karan Mehta >Assignee: Karan Mehta > Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, > ZOOKEEPER-2770.003.patch -- This message was sent by Atlassian JIRA (v6.4.14#64029)
Failed: ZOOKEEPER- PreCommit Build #900
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/900/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 70.91 MB...] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] -1 findbugs. The patch appears to introduce 1 new Findbugs (version 3.0.1) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] -1 core tests. The patch failed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/900//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/900//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/900//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] c45b420d4ece535ee9e0f92e46aaf8893d3b7735 logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. 
[exec] == [exec] == [exec] [exec] [exec] mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1642: exec returned: 2 Total time: 12 minutes 20 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Recording test results Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 [description-setter] Description set: ZOOKEEPER-2770 Putting comment on the pull request Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Email was triggered for: Failure - Any Sending email for trigger: Failure - Any Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 ### ## FAILED TESTS (if any) ## 4 tests failed. FAILED: org.apache.zookeeper.server.quorum.ReconfigDuringLeaderSyncTest.testDuringLeaderSync Error Message: zoo.cfg.dynamic.next is not deleted. Stack Trace: junit.framework.AssertionFailedError: zoo.cfg.dynamic.next is not deleted. 
at org.apache.zookeeper.server.quorum.ReconfigDuringLeaderSyncTest.testDuringLeaderSync(ReconfigDuringLeaderSyncTest.java:165) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79) FAILED: org.apache.zookeeper.test.FourLetterWordsTest.testValidateStatOutput Error Message: null Stack Trace: junit.framework.AssertionFailedError at org.apache.zookeeper.test.FourLetterWordsTest.testValidateStatOutput(FourLetterWordsTest.java:158) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.lang.Thread.run(Thread.java:745) FAILED: org.apache.zookeeper.test.FourLetterWordsTest.testValidateStatOutput Error Message: null Stack Trace: junit.framework.AssertionFailedError at org.apache.zookeeper.test.FourLetterWordsTest.testValidateStatOutput(FourLetterWordsTest.java:158) at
Is there a benchmark performance test to reveal how the disk's iops affect zookeeper's tps/qps?
As the document says "incorrect placement of transaction log The most performance critical part of ZooKeeper is the transaction log. ZooKeeper syncs transactions to media before it returns a response. A dedicated transaction log device is key to consistent good performance. Putting the log on a busy device will adversely affect performance..." However, is there a benchmark perf test to reveal how the disk's iops affect ZooKeeper's tps/qps?