[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124766#comment-15124766 ] Hadoop QA commented on ZOOKEEPER-2247: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12785335/ZOOKEEPER-2247-10.patch against trunk revision 1726354. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3024//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3024//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3024//console This message is automatically generated. > Zookeeper service becomes unavailable when leader fails to write transaction > log > > > Key: ZOOKEEPER-2247 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.5.0 >Reporter: Arshad Mohammad >Assignee: Arshad Mohammad >Priority: Critical > Fix For: 3.4.9, 3.5.2 > > Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, > ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, > ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch, > ZOOKEEPER-2247-10.patch > > > Zookeeper service becomes unavailable when leader fails to write transaction > log. Bellow are the exceptions > {code} > 2015-08-14 15:41:18,556 [myid:100] - ERROR > [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, > from thread : SyncThread:100 > java.io.IOException: Input/output error > at sun.nio.ch.FileDispatcherImpl.force0(Native Method) > at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76) > at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376) > at > org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380) > at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113) > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread > SyncThread:100 exits, error code 1 > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer@523] - shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:SessionTrackerImpl@232] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:LeaderRequestProcessor@77] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:PrepRequestProcessor@1035] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:ProposalRequestProcessor@88] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [SyncThread:100:CommitProcessor@356] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop! > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor > complete > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:SyncRequestProcessor@191] - Shutting down > 2015-08-14 15:41:18,563 [myid:100] - INFO [ProcessThread(sid:100 > cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop! > {code} > After this exception Leader server still remains leader. After this non > recoverable exception the leader should go down and let other followers > become leader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: ZOOKEEPER-2247 PreCommit Build #3024
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-2247 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3024/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 379396 lines...] [exec] [exec] +1 tests included. The patch appears to include 9 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] -1 core tests. The patch failed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3024//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3024//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3024//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] 8732ae556b518abcc61d9e4d11efa144fb67aeb9 logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. [exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1816: exec returned: 1 Total time: 8 minutes 53 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Compressed 543.84 KB of artifacts by 47.1% relative to #3019 Recording test results Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 [description-setter] Description set: ZOOKEEPER-2247 Email was triggered for: Failure - Any Sending email for trigger: Failure - Any Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 ### ## FAILED TESTS (if any) ## 1 tests failed. FAILED: org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentServersAreObserversInNextConfig Error Message: waiting for server 2 being up Stack Trace: junit.framework.AssertionFailedError: waiting for server 2 being up at org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentServersAreObserversInNextConfig(ReconfigRecoveryTest.java:217) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124741#comment-15124741 ] Rakesh R commented on ZOOKEEPER-2247: - Attached another patch addressing {{while (self.isRunning() && this.isRunning())}} comment. Also, added checks in the {{Leader}}, that was missed previously. > Zookeeper service becomes unavailable when leader fails to write transaction > log > > > Key: ZOOKEEPER-2247 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.5.0 >Reporter: Arshad Mohammad >Assignee: Arshad Mohammad >Priority: Critical > Fix For: 3.4.9, 3.5.2 > > Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, > ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, > ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch, > ZOOKEEPER-2247-10.patch > > > Zookeeper service becomes unavailable when leader fails to write transaction > log. Bellow are the exceptions > {code} > 2015-08-14 15:41:18,556 [myid:100] - ERROR > [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, > from thread : SyncThread:100 > java.io.IOException: Input/output error > at sun.nio.ch.FileDispatcherImpl.force0(Native Method) > at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76) > at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376) > at > org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380) > at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113) > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread > SyncThread:100 exits, error code 1 > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer@523] - shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:SessionTrackerImpl@232] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:LeaderRequestProcessor@77] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:PrepRequestProcessor@1035] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:ProposalRequestProcessor@88] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [SyncThread:100:CommitProcessor@356] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop! > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor > complete > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:SyncRequestProcessor@191] - Shutting down > 2015-08-14 15:41:18,563 [myid:100] - INFO [ProcessThread(sid:100 > cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop! > {code} > After this exception Leader server still remains leader. After this non > recoverable exception the leader should go down and let other followers > become leader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated ZOOKEEPER-2247: Attachment: ZOOKEEPER-2247-10.patch > Zookeeper service becomes unavailable when leader fails to write transaction > log > > > Key: ZOOKEEPER-2247 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.5.0 >Reporter: Arshad Mohammad >Assignee: Arshad Mohammad >Priority: Critical > Fix For: 3.4.9, 3.5.2 > > Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, > ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, > ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch, > ZOOKEEPER-2247-10.patch > > > Zookeeper service becomes unavailable when leader fails to write transaction > log. Bellow are the exceptions > {code} > 2015-08-14 15:41:18,556 [myid:100] - ERROR > [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, > from thread : SyncThread:100 > java.io.IOException: Input/output error > at sun.nio.ch.FileDispatcherImpl.force0(Native Method) > at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76) > at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376) > at > org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380) > at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113) > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread > SyncThread:100 exits, error code 1 > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer@523] - shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:SessionTrackerImpl@232] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:LeaderRequestProcessor@77] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:PrepRequestProcessor@1035] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:ProposalRequestProcessor@88] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [SyncThread:100:CommitProcessor@356] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop! > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor > complete > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:SyncRequestProcessor@191] - Shutting down > 2015-08-14 15:41:18,563 [myid:100] - INFO [ProcessThread(sid:100 > cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop! > {code} > After this exception Leader server still remains leader. After this non > recoverable exception the leader should go down and let other followers > become leader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124733#comment-15124733 ] Rakesh R commented on ZOOKEEPER-2247: - bq. Per discussion in the mailing list, lets punt this to 3.4.9. Thanks for the patience [~rgs] . If we get a good solution will include this, otw agreed to postpone. Thanks [~fpj] for the review comments. bq. Instead of doing this while (self.isRunning() && this.isRunning()), why don't you do this while (this.isRunning()) and check self.running() in this.isRunning()? Agreed, will change it. {quote} I don't think we need the health monitor thread. It is just shutting down the cnxn factories and you could do it immediately after the server loops. For example, in Follower.java, add the cnxn factory shutdown calls after the 'while (this.isRunning())'. Does it work?{quote} The solution works well with ensemble. IIUC, there is no server loop in the standalone server. He just waits on {{#join}} [ZooKeeperServerMain.java#L149|https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/server/ZooKeeperServerMain.java#L149]. I'm not finding a way to interrupt this without a watchdog in standalone mode. Could you please correct me if I'm missing anything. Thanks! > Zookeeper service becomes unavailable when leader fails to write transaction > log > > > Key: ZOOKEEPER-2247 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.5.0 >Reporter: Arshad Mohammad >Assignee: Arshad Mohammad >Priority: Critical > Fix For: 3.4.9, 3.5.2 > > Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, > ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, > ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch > > > Zookeeper service becomes unavailable when leader fails to write transaction > log. Bellow are the exceptions > {code} > 2015-08-14 15:41:18,556 [myid:100] - ERROR > [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, > from thread : SyncThread:100 > java.io.IOException: Input/output error > at sun.nio.ch.FileDispatcherImpl.force0(Native Method) > at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76) > at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376) > at > org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380) > at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113) > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread > SyncThread:100 exits, error code 1 > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer@523] - shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:SessionTrackerImpl@232] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:LeaderRequestProcessor@77] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:PrepRequestProcessor@1035] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:ProposalRequestProcessor@88] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [SyncThread:100:CommitProcessor@356] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop! > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor > complete > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:SyncRequestProcessor@191] - Shutting down > 2015-08-14 15:41:18,563 [myid:100] - INFO [ProcessThread(sid:100 > cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop! > {code} > After this exception Leader server still remains leader. After this non > recoverable exception the leader should go down and let other followers > become leader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124729#comment-15124729 ] Rakesh R commented on ZOOKEEPER-2247: - Thanks [~cnauroth] for the interest. bq. In the latest patch, I only see the new HeartbeatMonitor thread started in standalone mode, because it's coupled with ZooKeeperServerMain. In the case of a full ensemble, QuorumPeerMain won't start the thread, because it won't call ZooKeeperServerMain (unless it's the degenerate case of a single-node ensemble). Monitor thread is not required for the quorum as they already have a server loop where the {{internalError}} checks has been integrated. But in case of standalone there is no loops instead it has {{#join()}} like I mentioned earlier [ZooKeeperServerMain.java#L149|https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/server/ZooKeeperServerMain.java#L149]. Thats the reason I've added a watch dog there. > Zookeeper service becomes unavailable when leader fails to write transaction > log > > > Key: ZOOKEEPER-2247 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.5.0 >Reporter: Arshad Mohammad >Assignee: Arshad Mohammad >Priority: Critical > Fix For: 3.4.9, 3.5.2 > > Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, > ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, > ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch > > > Zookeeper service becomes unavailable when leader fails to write transaction > log. Bellow are the exceptions > {code} > 2015-08-14 15:41:18,556 [myid:100] - ERROR > [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, > from thread : SyncThread:100 > java.io.IOException: Input/output error > at sun.nio.ch.FileDispatcherImpl.force0(Native Method) > at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76) > at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376) > at > org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380) > at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113) > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread > SyncThread:100 exits, error code 1 > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer@523] - shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:SessionTrackerImpl@232] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:LeaderRequestProcessor@77] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:PrepRequestProcessor@1035] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:ProposalRequestProcessor@88] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [SyncThread:100:CommitProcessor@356] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop! > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor > complete > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:SyncRequestProcessor@191] - Shutting down > 2015-08-14 15:41:18,563 [myid:100] - INFO [ProcessThread(sid:100 > cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop! > {code} > After this exception Leader server still remains leader. After this non > recoverable exception the leader should go down and let other followers > become leader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124723#comment-15124723 ] Flavio Junqueira commented on ZOOKEEPER-2247: - [~rakesh_r] Thanks for updating the patch. There are two things I believe we can improve here: # Instead of doing this {{while (self.isRunning() && this.isRunning())}}, why don't you do this {{while (this.isRunning())}} and check {{self.running()}} in {{this.isRunning()}}? # I don't think we need the health monitor thread. It is just shutting down the cnxn factories and you could do it immediately after the server loops. For example, in Follower.java, add the cnxn factory shutdown calls after the {{while (this.isRunning()) {...} }}. Does it work? > Zookeeper service becomes unavailable when leader fails to write transaction > log > > > Key: ZOOKEEPER-2247 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.5.0 >Reporter: Arshad Mohammad >Assignee: Arshad Mohammad >Priority: Critical > Fix For: 3.4.9, 3.5.2 > > Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, > ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, > ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch > > > Zookeeper service becomes unavailable when leader fails to write transaction > log. Bellow are the exceptions > {code} > 2015-08-14 15:41:18,556 [myid:100] - ERROR > [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, > from thread : SyncThread:100 > java.io.IOException: Input/output error > at sun.nio.ch.FileDispatcherImpl.force0(Native Method) > at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76) > at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376) > at > org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380) > at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113) > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread > SyncThread:100 exits, error code 1 > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer@523] - shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:SessionTrackerImpl@232] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:LeaderRequestProcessor@77] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:PrepRequestProcessor@1035] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:ProposalRequestProcessor@88] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [SyncThread:100:CommitProcessor@356] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop! > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor > complete > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:SyncRequestProcessor@191] - Shutting down > 2015-08-14 15:41:18,563 [myid:100] - INFO [ProcessThread(sid:100 > cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop! > {code} > After this exception Leader server still remains leader. After this non > recoverable exception the leader should go down and let other followers > become leader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124549#comment-15124549 ] Chris Nauroth commented on ZOOKEEPER-2247: -- In the latest patch, I only see the new {{HeartbeatMonitor}} thread started in standalone mode, because it's coupled with {{ZooKeeperServerMain}}. In the case of a full ensemble, {{QuorumPeerMain}} won't start the thread, because it won't call {{ZooKeeperServerMain}} (unless it's the degenerate case of a single-node ensemble). I've been trying to look for a way to solve this without adding another watchdog thread, but unfortunately I don't have another proposal to offer right now. > Zookeeper service becomes unavailable when leader fails to write transaction > log > > > Key: ZOOKEEPER-2247 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.5.0 >Reporter: Arshad Mohammad >Assignee: Arshad Mohammad >Priority: Critical > Fix For: 3.4.9, 3.5.2 > > Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, > ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, > ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch > > > Zookeeper service becomes unavailable when leader fails to write transaction > log. Bellow are the exceptions > {code} > 2015-08-14 15:41:18,556 [myid:100] - ERROR > [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, > from thread : SyncThread:100 > java.io.IOException: Input/output error > at sun.nio.ch.FileDispatcherImpl.force0(Native Method) > at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76) > at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376) > at > org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380) > at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113) > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread > SyncThread:100 exits, error code 1 > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer@523] - shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:SessionTrackerImpl@232] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:LeaderRequestProcessor@77] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:PrepRequestProcessor@1035] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:ProposalRequestProcessor@88] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [SyncThread:100:CommitProcessor@356] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop! > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor > complete > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:SyncRequestProcessor@191] - Shutting down > 2015-08-14 15:41:18,563 [myid:100] - INFO [ProcessThread(sid:100 > cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop! > {code} > After this exception Leader server still remains leader. After this non > recoverable exception the leader should go down and let other followers > become leader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124117#comment-15124117 ] Chris Nauroth commented on ZOOKEEPER-1936: -- There was a test failure in {{WatcherTest#testWatcherAutoResetWithLocal}}, but I can't reproduce it. > Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Andrew Purtell >Priority: Minor > Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, > ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...] > Stack trace from JVM gives this: > "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable > [0x7f55d7dc7000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84) > at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68) > at > org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable > [0x7f55d7ed8000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84) > at > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103) > at > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) > at > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) > [...] > So it seems that when autopurge is used (as it is in our case), it might > happen at the same time as starting the server itself. In FileTxnSnapLog() it > will check if the directory exists and create it if not. These two tasks do > this at the same time, and mkdir fails and server exits the JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2355) Ephemeral node is never deleted if follower fails while reading the proposal packet
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124088#comment-15124088 ] Hadoop QA commented on ZOOKEEPER-2355: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12785227/ZOOKEEPER-2355-02.patch against trunk revision 1726354. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 5 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3023//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3023//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3023//console This message is automatically generated. > Ephemeral node is never deleted if follower fails while reading the proposal > packet > --- > > Key: ZOOKEEPER-2355 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2355 > Project: ZooKeeper > Issue Type: Bug > Components: quorum, server >Reporter: Arshad Mohammad >Assignee: Arshad Mohammad >Priority: Critical > Fix For: 3.4.9 > > Attachments: ZOOKEEPER-2355-01.patch, ZOOKEEPER-2355-02.patch > > > ZooKeeper ephemeral node is never deleted if follower fail while reading the > proposal packet > The scenario is as follows: > # Configure three node ZooKeeper cluster, lets say nodes are A, B and C, > start all, assume A is leader, B and C are follower > # Connect to any of the server and create ephemeral node /e1 > # Close the session, ephemeral node /e1 will go for deletion > # While receiving delete proposal make Follower B to fail with > {{SocketTimeoutException}}. This we need to do to reproduce the scenario > otherwise in production environment it happens because of network fault. > # Remove the fault, just check that faulted Follower is now connected with > quorum > # Connect to any of the server, create the same ephemeral node /e1, created > is success. > # Close the session, ephemeral node /e1 will go for deletion > # {color:red}/e1 is not deleted from the faulted Follower B, It should have > been deleted as it was again created with another session{color} > # {color:green}/e1 is deleted from Leader A and other Follower C{color} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: ZOOKEEPER-2355 PreCommit Build #3023
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-2355 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3023/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 377543 lines...] [exec] [exec] +1 tests included. The patch appears to include 5 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] -1 core tests. The patch failed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3023//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3023//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3023//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] a2b6ad36635c4a2034ac87c8c458c2b78c75e511 logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. [exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1816: exec returned: 1 Total time: 22 minutes 47 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Compressed 543.63 KB of artifacts by 23.5% relative to #3019 Recording test results Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 [description-setter] Description set: ZOOKEEPER-2355 Email was triggered for: Failure - Any Sending email for trigger: Failure - Any Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 ### ## FAILED TESTS (if any) ## 7 tests failed. FAILED: org.apache.zookeeper.server.ZxidRolloverTest.testRolloverThenFollowerRestart Error Message: KeeperErrorCode = ConnectionLoss for /foofoofoo-connected Stack Trace: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /foofoofoo-connected at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1643) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1671) at org.apache.zookeeper.server.ZxidRolloverTest.checkClientConnected(ZxidRolloverTest.java:119) at org.apache.zookeeper.server.ZxidRolloverTest.checkClientsConnected(ZxidRolloverTest.java:90) at org.apache.zookeeper.server.ZxidRolloverTest.start(ZxidRolloverTest.java:165) at org.apache.zookeeper.server.ZxidRolloverTest.testRolloverThenFollowerRestart(ZxidRolloverTest.java:345) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79) FAILED: org.apache.zookeeper.server.quorum.Zab1_0Test.testNormalFollowerRun Error Message: expected:<4294967297> but was:<4294967296> Stack Trace: junit.framework.AssertionFailedError: expected:<4294967297> but
[jira] [Commented] (ZOOKEEPER-2297) NPE is thrown while creating "key manager" and "trust manager"
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124061#comment-15124061 ] Arshad Mohammad commented on ZOOKEEPER-2297: Too many findbugs get induce this way, I think the best way is to create ZooKeeper instance level context and use it statically as it was done in ZOOKEEPER-2297-02.patch. [~rakeshr] can you have a look on the current patch and its findbugs and earlier patch ZOOKEEPER-2297-02.patch. I think ZOOKEEPER-2297-02.patch is the better way to handle this secenario any other opinion?? > NPE is thrown while creating "key manager" and "trust manager" > --- > > Key: ZOOKEEPER-2297 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2297 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.5.1 > Environment: Suse 11 sp 3 >Reporter: Anushri >Assignee: Arshad Mohammad >Priority: Blocker > Fix For: 3.5.2, 3.6.0 > > Attachments: ZOOKEEPER-2297-01.patch, ZOOKEEPER-2297-02.patch, > ZOOKEEPER-2297-03.patch, ZOOKEEPER-2297-04.patch, ZOOKEEPER-2355-05.patch > > > NPE is thrown while creating "key manager" and "trust manager" , even though > the zk setup is in non-secure mode > bq. 2015-10-19 12:54:12,278 [myid:2] - ERROR [ProcessThread(sid:2 > cport:-1)::X509AuthenticationProvider@78] - Failed to create key manager > bq. org.apache.zookeeper.common.X509Exception$KeyManagerException: > java.lang.NullPointerException > at org.apache.zookeeper.common.X509Util.createKeyManager(X509Util.java:129) > at > org.apache.zookeeper.server.auth.X509AuthenticationProvider.(X509AuthenticationProvider.java:75) > at > org.apache.zookeeper.server.auth.ProviderRegistry.initialize(ProviderRegistry.java:42) > at > org.apache.zookeeper.server.auth.ProviderRegistry.getProvider(ProviderRegistry.java:68) > at > org.apache.zookeeper.server.PrepRequestProcessor.fixupACL(PrepRequestProcessor.java:952) > at > org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:379) > at > org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:716) > at > org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:144) > Caused by: java.lang.NullPointerException > at org.apache.zookeeper.common.X509Util.createKeyManager(X509Util.java:113) > ... 7 more > bq. 2015-10-19 12:54:12,279 [myid:2] - ERROR [ProcessThread(sid:2 > cport:-1)::X509AuthenticationProvider@90] - Failed to create trust manager > bq. org.apache.zookeeper.common.X509Exception$TrustManagerException: > java.lang.NullPointerException > at org.apache.zookeeper.common.X509Util.createTrustManager(X509Util.java:158) > at > org.apache.zookeeper.server.auth.X509AuthenticationProvider.(X509AuthenticationProvider.java:87) > at > org.apache.zookeeper.server.auth.ProviderRegistry.initialize(ProviderRegistry.java:42) > at > org.apache.zookeeper.server.auth.ProviderRegistry.getProvider(ProviderRegistry.java:68) > at > org.apache.zookeeper.server.PrepRequestProcessor.fixupACL(PrepRequestProcessor.java:952) > at > org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:379) > at > org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:716) > at > org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:144) > Caused by: java.lang.NullPointerException > at org.apache.zookeeper.common.X509Util.createTrustManager(X509Util.java:143) > ... 7 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124054#comment-15124054 ] Hadoop QA commented on ZOOKEEPER-1936: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12785224/ZOOKEEPER-1936.v4.patch against trunk revision 1726354. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3022//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3022//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3022//console This message is automatically generated. > Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Andrew Purtell >Priority: Minor > Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, > ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...] > Stack trace from JVM gives this: > "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable > [0x7f55d7dc7000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84) > at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68) > at > org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable > [0x7f55d7ed8000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84) > at > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103) > at > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) > at > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) > [...] > So it seems that when autopurge is used (as it is in our case), it might > happen at the same time as starting the server itself. In FileTxnSnapLog() it > will check if the directory exists and create it if not. These two tasks do >
Failed: ZOOKEEPER-1936 PreCommit Build #3022
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3022/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 380756 lines...] [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] -1 core tests. The patch failed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3022//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3022//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3022//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] ab3a6d0298540a99f457fef29c3d1f2750bab4ec logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. [exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1816: exec returned: 2 Total time: 12 minutes 12 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Compressed 543.42 KB of artifacts by 29.4% relative to #3019 Recording test results Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 [description-setter] Description set: ZOOKEEPER-1936 Email was triggered for: Failure - Any Sending email for trigger: Failure - Any Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 ### ## FAILED TESTS (if any) ## 1 tests failed. FAILED: org.apache.zookeeper.test.WatcherTest.testWatcherAutoResetWithLocal Error Message: Did not connect Stack Trace: java.util.concurrent.TimeoutException: Did not connect at org.apache.zookeeper.test.ClientBase$CountdownWatcher.waitForConnected(ClientBase.java:132) at org.apache.zookeeper.test.WatcherTest.testWatcherAutoReset(WatcherTest.java:340) at org.apache.zookeeper.test.WatcherTest.testWatcherAutoResetWithLocal(WatcherTest.java:240) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
[jira] [Updated] (ZOOKEEPER-2355) Ephemeral node is never deleted if follower fails while reading the proposal packet
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arshad Mohammad updated ZOOKEEPER-2355: --- Attachment: ZOOKEEPER-2355-02.patch Thanks [~rgs], [~eribeiro] for your review comments on test code. Submitting the solution. [~fpj] please have a look > Ephemeral node is never deleted if follower fails while reading the proposal > packet > --- > > Key: ZOOKEEPER-2355 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2355 > Project: ZooKeeper > Issue Type: Bug > Components: quorum, server >Reporter: Arshad Mohammad >Assignee: Arshad Mohammad >Priority: Critical > Fix For: 3.4.9 > > Attachments: ZOOKEEPER-2355-01.patch, ZOOKEEPER-2355-02.patch > > > ZooKeeper ephemeral node is never deleted if follower fail while reading the > proposal packet > The scenario is as follows: > # Configure three node ZooKeeper cluster, lets say nodes are A, B and C, > start all, assume A is leader, B and C are follower > # Connect to any of the server and create ephemeral node /e1 > # Close the session, ephemeral node /e1 will go for deletion > # While receiving delete proposal make Follower B to fail with > {{SocketTimeoutException}}. This we need to do to reproduce the scenario > otherwise in production environment it happens because of network fault. > # Remove the fault, just check that faulted Follower is now connected with > quorum > # Connect to any of the server, create the same ephemeral node /e1, created > is success. > # Close the session, ephemeral node /e1 will go for deletion > # {color:red}/e1 is not deleted from the faulted Follower B, It should have > been deleted as it was again created with another session{color} > # {color:green}/e1 is deleted from Leader A and other Follower C{color} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-1936: -- Attachment: (was: ZOOKEEPER-1936.v4.patch) > Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Andrew Purtell >Priority: Minor > Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, > ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...] > Stack trace from JVM gives this: > "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable > [0x7f55d7dc7000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84) > at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68) > at > org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable > [0x7f55d7ed8000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84) > at > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103) > at > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) > at > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) > [...] > So it seems that when autopurge is used (as it is in our case), it might > happen at the same time as starting the server itself. In FileTxnSnapLog() it > will check if the directory exists and create it if not. These two tasks do > this at the same time, and mkdir fails and server exits the JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-1936: -- Attachment: ZOOKEEPER-1936.v4.patch > Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Andrew Purtell >Priority: Minor > Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, > ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...] > Stack trace from JVM gives this: > "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable > [0x7f55d7dc7000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84) > at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68) > at > org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable > [0x7f55d7ed8000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84) > at > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103) > at > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) > at > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) > [...] > So it seems that when autopurge is used (as it is in our case), it might > happen at the same time as starting the server itself. In FileTxnSnapLog() it > will check if the directory exists and create it if not. These two tasks do > this at the same time, and mkdir fails and server exits the JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124017#comment-15124017 ] Hadoop QA commented on ZOOKEEPER-1936: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12785223/ZOOKEEPER-1936.v4.patch against trunk revision 1726354. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3021//console This message is automatically generated. > Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Andrew Purtell >Priority: Minor > Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, > ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...] > Stack trace from JVM gives this: > "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable > [0x7f55d7dc7000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84) > at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68) > at > org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable > [0x7f55d7ed8000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84) > at > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103) > at > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) > at > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) > [...] > So it seems that when autopurge is used (as it is in our case), it might > happen at the same time as starting the server itself. In FileTxnSnapLog() it > will check if the directory exists and create it if not. These two tasks do > this at the same time, and mkdir fails and server exits the JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: ZOOKEEPER-1936 PreCommit Build #3021
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3021/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 1522 lines...] [exec] PATCH APPLICATION FAILED [exec] [exec] [exec] [exec] [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12785223/ZOOKEEPER-1936.v4.patch [exec] against trunk revision 1726354. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] -1 patch. The patch command could not apply the patch. [exec] [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3021//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] ba3d0a903aeb241d845950a7302417ae7a7510ae logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. [exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1816: exec returned: 1 Total time: 1 minute 15 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Recording test results Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 ERROR: Publisher 'Publish JUnit test result report' failed: No test report files were found. Configuration error? Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 [description-setter] Description set: ZOOKEEPER-1936 Email was triggered for: Failure - Any Sending email for trigger: Failure - Any Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 ### ## FAILED TESTS (if any) ## No tests ran.
Failed: ZOOKEEPER-2297 PreCommit Build #3020
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-2297 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3020/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 390765 lines...] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] -1 findbugs. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 core tests. The patch passed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3020//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3020//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3020//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] d6ad9365c4cd538cb96d5d61bec699643995be11 logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. [exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1816: exec returned: 2 Total time: 20 minutes 7 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Recording test results Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 [description-setter] Description set: ZOOKEEPER-2297 Email was triggered for: Failure - Any Sending email for trigger: Failure - Any Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-1936: -- Attachment: ZOOKEEPER-1936.v4.patch Patch v4 addresses Chris' comment above > Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Andrew Purtell >Priority: Minor > Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, > ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...] > Stack trace from JVM gives this: > "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable > [0x7f55d7dc7000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84) > at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68) > at > org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable > [0x7f55d7ed8000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84) > at > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103) > at > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) > at > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) > [...] > So it seems that when autopurge is used (as it is in our case), it might > happen at the same time as starting the server itself. In FileTxnSnapLog() it > will check if the directory exists and create it if not. These two tasks do > this at the same time, and mkdir fails and server exits the JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2297) NPE is thrown while creating "key manager" and "trust manager"
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124008#comment-15124008 ] Hadoop QA commented on ZOOKEEPER-2297: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12785213/ZOOKEEPER-2355-05.patch against trunk revision 1726354. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3020//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3020//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3020//console This message is automatically generated. > NPE is thrown while creating "key manager" and "trust manager" > --- > > Key: ZOOKEEPER-2297 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2297 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.5.1 > Environment: Suse 11 sp 3 >Reporter: Anushri >Assignee: Arshad Mohammad >Priority: Blocker > Fix For: 3.5.2, 3.6.0 > > Attachments: ZOOKEEPER-2297-01.patch, ZOOKEEPER-2297-02.patch, > ZOOKEEPER-2297-03.patch, ZOOKEEPER-2297-04.patch, ZOOKEEPER-2355-05.patch > > > NPE is thrown while creating "key manager" and "trust manager" , even though > the zk setup is in non-secure mode > bq. 2015-10-19 12:54:12,278 [myid:2] - ERROR [ProcessThread(sid:2 > cport:-1)::X509AuthenticationProvider@78] - Failed to create key manager > bq. org.apache.zookeeper.common.X509Exception$KeyManagerException: > java.lang.NullPointerException > at org.apache.zookeeper.common.X509Util.createKeyManager(X509Util.java:129) > at > org.apache.zookeeper.server.auth.X509AuthenticationProvider.(X509AuthenticationProvider.java:75) > at > org.apache.zookeeper.server.auth.ProviderRegistry.initialize(ProviderRegistry.java:42) > at > org.apache.zookeeper.server.auth.ProviderRegistry.getProvider(ProviderRegistry.java:68) > at > org.apache.zookeeper.server.PrepRequestProcessor.fixupACL(PrepRequestProcessor.java:952) > at > org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:379) > at > org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:716) > at > org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:144) > Caused by: java.lang.NullPointerException > at org.apache.zookeeper.common.X509Util.createKeyManager(X509Util.java:113) > ... 7 more > bq. 2015-10-19 12:54:12,279 [myid:2] - ERROR [ProcessThread(sid:2 > cport:-1)::X509AuthenticationProvider@90] - Failed to create trust manager > bq. org.apache.zookeeper.common.X509Exception$TrustManagerException: > java.lang.NullPointerException > at org.apache.zookeeper.common.X509Util.createTrustManager(X509Util.java:158) > at > org.apache.zookeeper.server.auth.X509AuthenticationProvider.(X509AuthenticationProvider.java:87) > at > org.apache.zookeeper.server.auth.ProviderRegistry.initialize(ProviderRegistry.java:42) > at > org.apache.zookeeper.server.auth.ProviderRegistry.getProvider(ProviderRegistry.java:68) > at > org.apache.zookeeper.server.PrepRequestProcessor.fixupACL(PrepRequestProcessor.java:952) > at > org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:379) > at > org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:716) > at > org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:144) > Caused by: java.lang.NullPointerException > at org.apache.zookeeper.common.X509Util.createTrustManager(X509Util.java:143) > ... 7 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123961#comment-15123961 ] Chris Nauroth commented on ZOOKEEPER-1936: -- It's a tough call then. I don't see a way to write a deterministic JUnit test to prove the fix. I'm reluctant to accept a code change without a test or a manual verification, at least on a stable maintenance line. Here is my take on it. Even without a consistent repro, I see the theoretical problem in the code. The logic change in the patch looks correct to me, even if there might have been something more happening when Ted reported it. Let's put a fix into trunk and branch-3.5, but not branch-3.4. In trunk and branch-3.5, we also can make the switch to [{{Files#createDirectories}}|http://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#createDirectories(java.nio.file.Path,%20java.nio.file.attribute.FileAttribute...)] that I mentioned earlier, because those branches are compiling to JDK 7. That way, if we see another repro, we'll get additional debugging information to help with any subsequent patches. Would some other committers like to comment on that plan? Thanks! > Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Andrew Purtell >Priority: Minor > Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, > ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...] > Stack trace from JVM gives this: > "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable > [0x7f55d7dc7000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84) > at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68) > at > org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable > [0x7f55d7ed8000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84) > at > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103) > at > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) > at > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) > [...] > So it seems that when autopurge is used (as it is in our case), it might > happen at the same time as starting the server itself. In FileTxnSnapLog() it > will check if the directory exists and create it if not. These two tasks do > this at the same time, and mkdir fails and server exits the JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2297) NPE is thrown while creating "key manager" and "trust manager"
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arshad Mohammad updated ZOOKEEPER-2297: --- Attachment: ZOOKEEPER-2355-05.patch submitting new patch > NPE is thrown while creating "key manager" and "trust manager" > --- > > Key: ZOOKEEPER-2297 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2297 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.5.1 > Environment: Suse 11 sp 3 >Reporter: Anushri >Assignee: Arshad Mohammad >Priority: Blocker > Fix For: 3.5.2, 3.6.0 > > Attachments: ZOOKEEPER-2297-01.patch, ZOOKEEPER-2297-02.patch, > ZOOKEEPER-2297-03.patch, ZOOKEEPER-2297-04.patch, ZOOKEEPER-2355-05.patch > > > NPE is thrown while creating "key manager" and "trust manager" , even though > the zk setup is in non-secure mode > bq. 2015-10-19 12:54:12,278 [myid:2] - ERROR [ProcessThread(sid:2 > cport:-1)::X509AuthenticationProvider@78] - Failed to create key manager > bq. org.apache.zookeeper.common.X509Exception$KeyManagerException: > java.lang.NullPointerException > at org.apache.zookeeper.common.X509Util.createKeyManager(X509Util.java:129) > at > org.apache.zookeeper.server.auth.X509AuthenticationProvider.(X509AuthenticationProvider.java:75) > at > org.apache.zookeeper.server.auth.ProviderRegistry.initialize(ProviderRegistry.java:42) > at > org.apache.zookeeper.server.auth.ProviderRegistry.getProvider(ProviderRegistry.java:68) > at > org.apache.zookeeper.server.PrepRequestProcessor.fixupACL(PrepRequestProcessor.java:952) > at > org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:379) > at > org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:716) > at > org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:144) > Caused by: java.lang.NullPointerException > at org.apache.zookeeper.common.X509Util.createKeyManager(X509Util.java:113) > ... 7 more > bq. 2015-10-19 12:54:12,279 [myid:2] - ERROR [ProcessThread(sid:2 > cport:-1)::X509AuthenticationProvider@90] - Failed to create trust manager > bq. org.apache.zookeeper.common.X509Exception$TrustManagerException: > java.lang.NullPointerException > at org.apache.zookeeper.common.X509Util.createTrustManager(X509Util.java:158) > at > org.apache.zookeeper.server.auth.X509AuthenticationProvider.(X509AuthenticationProvider.java:87) > at > org.apache.zookeeper.server.auth.ProviderRegistry.initialize(ProviderRegistry.java:42) > at > org.apache.zookeeper.server.auth.ProviderRegistry.getProvider(ProviderRegistry.java:68) > at > org.apache.zookeeper.server.PrepRequestProcessor.fixupACL(PrepRequestProcessor.java:952) > at > org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:379) > at > org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:716) > at > org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:144) > Caused by: java.lang.NullPointerException > at org.apache.zookeeper.common.X509Util.createTrustManager(X509Util.java:143) > ... 7 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123934#comment-15123934 ] Ted Yu commented on ZOOKEEPER-1936: --- Haven't got a chance to reproduce the bug. After some QE fix, hbase un-secure deployment works reliably. > Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Andrew Purtell >Priority: Minor > Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, > ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...] > Stack trace from JVM gives this: > "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable > [0x7f55d7dc7000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84) > at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68) > at > org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable > [0x7f55d7ed8000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84) > at > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103) > at > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) > at > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) > [...] > So it seems that when autopurge is used (as it is in our case), it might > happen at the same time as starting the server itself. In FileTxnSnapLog() it > will check if the directory exists and create it if not. These two tasks do > this at the same time, and mkdir fails and server exits the JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: 3.4.8 release
On 29 January 2016 at 10:23, Flavio Junqueira wrote: > Give me until the end of today to have a look at ZK-2247. Rakesh has put > the effort to prepare the patch, so I'd like to consider it. If it is not > ready, then let's push it to 3.4.9. Does it sound good, Raul? > Yeah that's reasonable. I actually won't be able to cut an RC until Sunday, so we have the whole weekend for that patch get more reviews :-) Thanks Flavio! -rgs > > -Flavio > > > On 29 Jan 2016, at 10:20, Raúl Gutiérrez Segalés > wrote: > > > > Ok - lets punt them to 3.4.9 then. > > > > > > -rgs > > > > On 29 January 2016 at 10:10, Chris Nauroth > wrote: > > > >> +1 for expediting the 3.4.8 release. Our goal was simply to correct the > >> critical bug discovered late in 3.4.7. Any patches more than that can > be > >> deferred to a 3.4.9 at a later date. > >> > >> --Chris Nauroth > >> > >> > >> > >> > >> On 1/29/16, 10:06 AM, "Flavio Junqueira" wrote: > >> > >>> +1 > >>> > On 29 Jan 2016, at 10:02, Ted Yu wrote: > > ZOOKEEPER-2355 is in Open state as of now. > > My two cents: > > 3.4.7 has been retracted. > It would be nice to get 3.4.8 out the door soon so that zookeeper > users > can > pick up bug fixes in between 3.4.6 and 3.4 branch. > > On Thu, Jan 28, 2016 at 8:57 AM, Raúl Gutiérrez Segalés > > wrote: > > > Hi, > > > > On 28 January 2016 at 07:07, Talluri, Chandra < > > chandra.tall...@fmr.com.invalid> wrote: > > > >> Thanks for the updates. > >> > >> When can we expect 3.4.8? > >> > > > > I think we need to decide if we want to include these patches: > > > > https://issues.apache.org/jira/browse/ZOOKEEPER-2355 > > https://issues.apache.org/jira/browse/ZOOKEEPER-2247 > > > > They seem to be almost ready, though there might be some subtleties > > left to > > be addressed. > > > > > >> What are the plans for 3.5.* stable version release? > >> > > > > The general plan for making 3.5 stable is here: > > > > http://markmail.org/message/ymxliy2rrwjc2pmo > > > > I believe Chris Nauroth will be the RM for the upcoming 3.5.2 > release. > > > > > > -rgs > > > >>> > >>> > >> > >> > >
Re: 3.4.8 release
Give me until the end of today to have a look at ZK-2247. Rakesh has put the effort to prepare the patch, so I'd like to consider it. If it is not ready, then let's push it to 3.4.9. Does it sound good, Raul? -Flavio > On 29 Jan 2016, at 10:20, Raúl Gutiérrez Segalés wrote: > > Ok - lets punt them to 3.4.9 then. > > > -rgs > > On 29 January 2016 at 10:10, Chris Nauroth wrote: > >> +1 for expediting the 3.4.8 release. Our goal was simply to correct the >> critical bug discovered late in 3.4.7. Any patches more than that can be >> deferred to a 3.4.9 at a later date. >> >> --Chris Nauroth >> >> >> >> >> On 1/29/16, 10:06 AM, "Flavio Junqueira" wrote: >> >>> +1 >>> On 29 Jan 2016, at 10:02, Ted Yu wrote: ZOOKEEPER-2355 is in Open state as of now. My two cents: 3.4.7 has been retracted. It would be nice to get 3.4.8 out the door soon so that zookeeper users can pick up bug fixes in between 3.4.6 and 3.4 branch. On Thu, Jan 28, 2016 at 8:57 AM, Raúl Gutiérrez Segalés wrote: > Hi, > > On 28 January 2016 at 07:07, Talluri, Chandra < > chandra.tall...@fmr.com.invalid> wrote: > >> Thanks for the updates. >> >> When can we expect 3.4.8? >> > > I think we need to decide if we want to include these patches: > > https://issues.apache.org/jira/browse/ZOOKEEPER-2355 > https://issues.apache.org/jira/browse/ZOOKEEPER-2247 > > They seem to be almost ready, though there might be some subtleties > left to > be addressed. > > >> What are the plans for 3.5.* stable version release? >> > > The general plan for making 3.5 stable is here: > > http://markmail.org/message/ymxliy2rrwjc2pmo > > I believe Chris Nauroth will be the RM for the upcoming 3.5.2 release. > > > -rgs > >>> >>> >> >>
[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123914#comment-15123914 ] Raul Gutierrez Segales commented on ZOOKEEPER-2247: --- Per discussion in the mailing list, lets punt this to 3.4.9. cc: [~fpj] > Zookeeper service becomes unavailable when leader fails to write transaction > log > > > Key: ZOOKEEPER-2247 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.5.0 >Reporter: Arshad Mohammad >Assignee: Arshad Mohammad >Priority: Critical > Fix For: 3.4.9, 3.5.2 > > Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, > ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, > ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch > > > Zookeeper service becomes unavailable when leader fails to write transaction > log. Bellow are the exceptions > {code} > 2015-08-14 15:41:18,556 [myid:100] - ERROR > [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, > from thread : SyncThread:100 > java.io.IOException: Input/output error > at sun.nio.ch.FileDispatcherImpl.force0(Native Method) > at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76) > at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376) > at > org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380) > at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113) > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread > SyncThread:100 exits, error code 1 > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer@523] - shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:SessionTrackerImpl@232] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:LeaderRequestProcessor@77] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:PrepRequestProcessor@1035] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:ProposalRequestProcessor@88] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [SyncThread:100:CommitProcessor@356] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop! > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor > complete > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:SyncRequestProcessor@191] - Shutting down > 2015-08-14 15:41:18,563 [myid:100] - INFO [ProcessThread(sid:100 > cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop! > {code} > After this exception Leader server still remains leader. After this non > recoverable exception the leader should go down and let other followers > become leader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2355) Ephemeral node is never deleted if follower fails while reading the proposal packet
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123912#comment-15123912 ] Raul Gutierrez Segales commented on ZOOKEEPER-2355: --- Per discussion in the mailing list, lets punt this to 3.4.9. cc: [~fpj] > Ephemeral node is never deleted if follower fails while reading the proposal > packet > --- > > Key: ZOOKEEPER-2355 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2355 > Project: ZooKeeper > Issue Type: Bug > Components: quorum, server >Reporter: Arshad Mohammad >Assignee: Arshad Mohammad >Priority: Critical > Fix For: 3.4.9 > > Attachments: ZOOKEEPER-2355-01.patch > > > ZooKeeper ephemeral node is never deleted if follower fail while reading the > proposal packet > The scenario is as follows: > # Configure three node ZooKeeper cluster, lets say nodes are A, B and C, > start all, assume A is leader, B and C are follower > # Connect to any of the server and create ephemeral node /e1 > # Close the session, ephemeral node /e1 will go for deletion > # While receiving delete proposal make Follower B to fail with > {{SocketTimeoutException}}. This we need to do to reproduce the scenario > otherwise in production environment it happens because of network fault. > # Remove the fault, just check that faulted Follower is now connected with > quorum > # Connect to any of the server, create the same ephemeral node /e1, created > is success. > # Close the session, ephemeral node /e1 will go for deletion > # {color:red}/e1 is not deleted from the faulted Follower B, It should have > been deleted as it was again created with another session{color} > # {color:green}/e1 is deleted from Leader A and other Follower C{color} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raul Gutierrez Segales updated ZOOKEEPER-2247: -- Fix Version/s: (was: 3.4.8) 3.4.9 > Zookeeper service becomes unavailable when leader fails to write transaction > log > > > Key: ZOOKEEPER-2247 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.5.0 >Reporter: Arshad Mohammad >Assignee: Arshad Mohammad >Priority: Critical > Fix For: 3.4.9, 3.5.2 > > Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, > ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, > ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch > > > Zookeeper service becomes unavailable when leader fails to write transaction > log. Bellow are the exceptions > {code} > 2015-08-14 15:41:18,556 [myid:100] - ERROR > [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, > from thread : SyncThread:100 > java.io.IOException: Input/output error > at sun.nio.ch.FileDispatcherImpl.force0(Native Method) > at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76) > at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376) > at > org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380) > at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113) > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread > SyncThread:100 exits, error code 1 > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer@523] - shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:SessionTrackerImpl@232] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:LeaderRequestProcessor@77] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:PrepRequestProcessor@1035] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:ProposalRequestProcessor@88] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [SyncThread:100:CommitProcessor@356] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop! > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor > complete > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:SyncRequestProcessor@191] - Shutting down > 2015-08-14 15:41:18,563 [myid:100] - INFO [ProcessThread(sid:100 > cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop! > {code} > After this exception Leader server still remains leader. After this non > recoverable exception the leader should go down and let other followers > become leader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: 3.4.8 release
Ok - lets punt them to 3.4.9 then. -rgs On 29 January 2016 at 10:10, Chris Nauroth wrote: > +1 for expediting the 3.4.8 release. Our goal was simply to correct the > critical bug discovered late in 3.4.7. Any patches more than that can be > deferred to a 3.4.9 at a later date. > > --Chris Nauroth > > > > > On 1/29/16, 10:06 AM, "Flavio Junqueira" wrote: > > >+1 > > > >> On 29 Jan 2016, at 10:02, Ted Yu wrote: > >> > >> ZOOKEEPER-2355 is in Open state as of now. > >> > >> My two cents: > >> > >> 3.4.7 has been retracted. > >> It would be nice to get 3.4.8 out the door soon so that zookeeper users > >>can > >> pick up bug fixes in between 3.4.6 and 3.4 branch. > >> > >> On Thu, Jan 28, 2016 at 8:57 AM, Raúl Gutiérrez Segalés > >> >>> wrote: > >> > >>> Hi, > >>> > >>> On 28 January 2016 at 07:07, Talluri, Chandra < > >>> chandra.tall...@fmr.com.invalid> wrote: > >>> > Thanks for the updates. > > When can we expect 3.4.8? > > >>> > >>> I think we need to decide if we want to include these patches: > >>> > >>> https://issues.apache.org/jira/browse/ZOOKEEPER-2355 > >>> https://issues.apache.org/jira/browse/ZOOKEEPER-2247 > >>> > >>> They seem to be almost ready, though there might be some subtleties > >>>left to > >>> be addressed. > >>> > >>> > What are the plans for 3.5.* stable version release? > > >>> > >>> The general plan for making 3.5 stable is here: > >>> > >>> http://markmail.org/message/ymxliy2rrwjc2pmo > >>> > >>> I believe Chris Nauroth will be the RM for the upcoming 3.5.2 release. > >>> > >>> > >>> -rgs > >>> > > > > > >
[jira] [Updated] (ZOOKEEPER-2355) Ephemeral node is never deleted if follower fails while reading the proposal packet
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raul Gutierrez Segales updated ZOOKEEPER-2355: -- Fix Version/s: 3.4.9 > Ephemeral node is never deleted if follower fails while reading the proposal > packet > --- > > Key: ZOOKEEPER-2355 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2355 > Project: ZooKeeper > Issue Type: Bug > Components: quorum, server >Reporter: Arshad Mohammad >Assignee: Arshad Mohammad >Priority: Critical > Fix For: 3.4.9 > > Attachments: ZOOKEEPER-2355-01.patch > > > ZooKeeper ephemeral node is never deleted if follower fail while reading the > proposal packet > The scenario is as follows: > # Configure three node ZooKeeper cluster, lets say nodes are A, B and C, > start all, assume A is leader, B and C are follower > # Connect to any of the server and create ephemeral node /e1 > # Close the session, ephemeral node /e1 will go for deletion > # While receiving delete proposal make Follower B to fail with > {{SocketTimeoutException}}. This we need to do to reproduce the scenario > otherwise in production environment it happens because of network fault. > # Remove the fault, just check that faulted Follower is now connected with > quorum > # Connect to any of the server, create the same ephemeral node /e1, created > is success. > # Close the session, ephemeral node /e1 will go for deletion > # {color:red}/e1 is not deleted from the faulted Follower B, It should have > been deleted as it was again created with another session{color} > # {color:green}/e1 is deleted from Leader A and other Follower C{color} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123906#comment-15123906 ] Chris Nauroth commented on ZOOKEEPER-1936: -- [~yuzhih...@gmail.com], my understanding is that you only can repro this in standalone mode, not when deploying a full ensemble. Is that correct? Did you get to a point where you had a consistent repro and were able to verify that this patch helped fix it? The logic change looks correct to me, but I'm trying to figure out if there is really something more going on, as suggested in earlier comments. Thanks! > Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Andrew Purtell >Priority: Minor > Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, > ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...] > Stack trace from JVM gives this: > "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable > [0x7f55d7dc7000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84) > at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68) > at > org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable > [0x7f55d7ed8000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84) > at > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103) > at > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) > at > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) > [...] > So it seems that when autopurge is used (as it is in our case), it might > happen at the same time as starting the server itself. In FileTxnSnapLog() it > will check if the directory exists and create it if not. These two tasks do > this at the same time, and mkdir fails and server exits the JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: 3.4.8 release
+1 for expediting the 3.4.8 release. Our goal was simply to correct the critical bug discovered late in 3.4.7. Any patches more than that can be deferred to a 3.4.9 at a later date. --Chris Nauroth On 1/29/16, 10:06 AM, "Flavio Junqueira" wrote: >+1 > >> On 29 Jan 2016, at 10:02, Ted Yu wrote: >> >> ZOOKEEPER-2355 is in Open state as of now. >> >> My two cents: >> >> 3.4.7 has been retracted. >> It would be nice to get 3.4.8 out the door soon so that zookeeper users >>can >> pick up bug fixes in between 3.4.6 and 3.4 branch. >> >> On Thu, Jan 28, 2016 at 8:57 AM, Raúl Gutiérrez Segalés wrote: >> >>> Hi, >>> >>> On 28 January 2016 at 07:07, Talluri, Chandra < >>> chandra.tall...@fmr.com.invalid> wrote: >>> Thanks for the updates. When can we expect 3.4.8? >>> >>> I think we need to decide if we want to include these patches: >>> >>> https://issues.apache.org/jira/browse/ZOOKEEPER-2355 >>> https://issues.apache.org/jira/browse/ZOOKEEPER-2247 >>> >>> They seem to be almost ready, though there might be some subtleties >>>left to >>> be addressed. >>> >>> What are the plans for 3.5.* stable version release? >>> >>> The general plan for making 3.5 stable is here: >>> >>> http://markmail.org/message/ymxliy2rrwjc2pmo >>> >>> I believe Chris Nauroth will be the RM for the upcoming 3.5.2 release. >>> >>> >>> -rgs >>> > >
Re: 3.4.8 release
+1 > On 29 Jan 2016, at 10:02, Ted Yu wrote: > > ZOOKEEPER-2355 is in Open state as of now. > > My two cents: > > 3.4.7 has been retracted. > It would be nice to get 3.4.8 out the door soon so that zookeeper users can > pick up bug fixes in between 3.4.6 and 3.4 branch. > > On Thu, Jan 28, 2016 at 8:57 AM, Raúl Gutiérrez Segalés > wrote: > >> Hi, >> >> On 28 January 2016 at 07:07, Talluri, Chandra < >> chandra.tall...@fmr.com.invalid> wrote: >> >>> Thanks for the updates. >>> >>> When can we expect 3.4.8? >>> >> >> I think we need to decide if we want to include these patches: >> >> https://issues.apache.org/jira/browse/ZOOKEEPER-2355 >> https://issues.apache.org/jira/browse/ZOOKEEPER-2247 >> >> They seem to be almost ready, though there might be some subtleties left to >> be addressed. >> >> >>> What are the plans for 3.5.* stable version release? >>> >> >> The general plan for making 3.5 stable is here: >> >> http://markmail.org/message/ymxliy2rrwjc2pmo >> >> I believe Chris Nauroth will be the RM for the upcoming 3.5.2 release. >> >> >> -rgs >>
Re: 3.4.8 release
ZOOKEEPER-2355 is in Open state as of now. My two cents: 3.4.7 has been retracted. It would be nice to get 3.4.8 out the door soon so that zookeeper users can pick up bug fixes in between 3.4.6 and 3.4 branch. On Thu, Jan 28, 2016 at 8:57 AM, Raúl Gutiérrez Segalés wrote: > Hi, > > On 28 January 2016 at 07:07, Talluri, Chandra < > chandra.tall...@fmr.com.invalid> wrote: > > > Thanks for the updates. > > > > When can we expect 3.4.8? > > > > I think we need to decide if we want to include these patches: > > https://issues.apache.org/jira/browse/ZOOKEEPER-2355 > https://issues.apache.org/jira/browse/ZOOKEEPER-2247 > > They seem to be almost ready, though there might be some subtleties left to > be addressed. > > > > What are the plans for 3.5.* stable version release? > > > > The general plan for making 3.5 stable is here: > > http://markmail.org/message/ymxliy2rrwjc2pmo > > I believe Chris Nauroth will be the RM for the upcoming 3.5.2 release. > > > -rgs >
[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123244#comment-15123244 ] Hadoop QA commented on ZOOKEEPER-2247: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12785133/ZOOKEEPER-2247-09.patch against trunk revision 1726354. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3019//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3019//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3019//console This message is automatically generated. > Zookeeper service becomes unavailable when leader fails to write transaction > log > > > Key: ZOOKEEPER-2247 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.5.0 >Reporter: Arshad Mohammad >Assignee: Arshad Mohammad >Priority: Critical > Fix For: 3.4.8, 3.5.2 > > Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, > ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, > ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch > > > Zookeeper service becomes unavailable when leader fails to write transaction > log. Bellow are the exceptions > {code} > 2015-08-14 15:41:18,556 [myid:100] - ERROR > [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, > from thread : SyncThread:100 > java.io.IOException: Input/output error > at sun.nio.ch.FileDispatcherImpl.force0(Native Method) > at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76) > at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376) > at > org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380) > at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113) > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread > SyncThread:100 exits, error code 1 > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer@523] - shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:SessionTrackerImpl@232] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:LeaderRequestProcessor@77] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:PrepRequestProcessor@1035] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:ProposalRequestProcessor@88] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [SyncThread:100:CommitProcessor@356] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop! > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor > complete > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:SyncRequestProcessor@191] - Shutting down > 2015-08-14 15:41:18,563 [myid:100] - INFO [ProcessThread(sid:100 > cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop! > {code} > After this exception Leader server still remains leader. After this non > recoverable exception the leader should go down and let other followers > become leader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Success: ZOOKEEPER-2247 PreCommit Build #3019
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-2247 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3019/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 384909 lines...] [exec] http://issues.apache.org/jira/secure/attachment/12785133/ZOOKEEPER-2247-09.patch [exec] against trunk revision 1726354. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 9 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 core tests. The patch passed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3019//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3019//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3019//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] daff8d3aeac2a3e58bfba738c5601033b1a22942 logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. [exec] == [exec] == [exec] [exec] BUILD SUCCESSFUL Total time: 16 minutes 56 seconds Archiving artifacts Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Recording test results Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 [description-setter] Description set: ZOOKEEPER-2247 Email was triggered for: Success Sending email for trigger: Success Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 Setting LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7 ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Updated] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated ZOOKEEPER-2247: Attachment: ZOOKEEPER-2247-09.patch > Zookeeper service becomes unavailable when leader fails to write transaction > log > > > Key: ZOOKEEPER-2247 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.5.0 >Reporter: Arshad Mohammad >Assignee: Arshad Mohammad >Priority: Critical > Fix For: 3.4.8, 3.5.2 > > Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, > ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, > ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch > > > Zookeeper service becomes unavailable when leader fails to write transaction > log. Bellow are the exceptions > {code} > 2015-08-14 15:41:18,556 [myid:100] - ERROR > [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, > from thread : SyncThread:100 > java.io.IOException: Input/output error > at sun.nio.ch.FileDispatcherImpl.force0(Native Method) > at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76) > at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376) > at > org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380) > at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113) > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread > SyncThread:100 exits, error code 1 > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer@523] - shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:SessionTrackerImpl@232] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:LeaderRequestProcessor@77] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:PrepRequestProcessor@1035] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:ProposalRequestProcessor@88] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [SyncThread:100:CommitProcessor@356] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop! > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor > complete > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:SyncRequestProcessor@191] - Shutting down > 2015-08-14 15:41:18,563 [myid:100] - INFO [ProcessThread(sid:100 > cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop! > {code} > After this exception Leader server still remains leader. After this non > recoverable exception the leader should go down and let other followers > become leader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated ZOOKEEPER-2247: Attachment: (was: ZOOKEEPER-2247-09.patch) > Zookeeper service becomes unavailable when leader fails to write transaction > log > > > Key: ZOOKEEPER-2247 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.5.0 >Reporter: Arshad Mohammad >Assignee: Arshad Mohammad >Priority: Critical > Fix For: 3.4.8, 3.5.2 > > Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, > ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, > ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch > > > Zookeeper service becomes unavailable when leader fails to write transaction > log. Bellow are the exceptions > {code} > 2015-08-14 15:41:18,556 [myid:100] - ERROR > [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, > from thread : SyncThread:100 > java.io.IOException: Input/output error > at sun.nio.ch.FileDispatcherImpl.force0(Native Method) > at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76) > at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376) > at > org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380) > at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113) > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread > SyncThread:100 exits, error code 1 > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer@523] - shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:SessionTrackerImpl@232] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:LeaderRequestProcessor@77] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:PrepRequestProcessor@1035] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:ProposalRequestProcessor@88] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [SyncThread:100:CommitProcessor@356] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop! > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor > complete > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:SyncRequestProcessor@191] - Shutting down > 2015-08-14 15:41:18,563 [myid:100] - INFO [ProcessThread(sid:100 > cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop! > {code} > After this exception Leader server still remains leader. After this non > recoverable exception the leader should go down and let other followers > become leader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated ZOOKEEPER-2247: Attachment: ZOOKEEPER-2247-09.patch > Zookeeper service becomes unavailable when leader fails to write transaction > log > > > Key: ZOOKEEPER-2247 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.5.0 >Reporter: Arshad Mohammad >Assignee: Arshad Mohammad >Priority: Critical > Fix For: 3.4.8, 3.5.2 > > Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, > ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, > ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch > > > Zookeeper service becomes unavailable when leader fails to write transaction > log. Bellow are the exceptions > {code} > 2015-08-14 15:41:18,556 [myid:100] - ERROR > [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, > from thread : SyncThread:100 > java.io.IOException: Input/output error > at sun.nio.ch.FileDispatcherImpl.force0(Native Method) > at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76) > at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376) > at > org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380) > at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113) > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread > SyncThread:100 exits, error code 1 > 2015-08-14 15:41:18,559 [myid:100] - INFO > [SyncThread:100:ZooKeeperServer@523] - shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:SessionTrackerImpl@232] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:LeaderRequestProcessor@77] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:PrepRequestProcessor@1035] - Shutting down > 2015-08-14 15:41:18,560 [myid:100] - INFO > [SyncThread:100:ProposalRequestProcessor@88] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [SyncThread:100:CommitProcessor@356] - Shutting down > 2015-08-14 15:41:18,561 [myid:100] - INFO > [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop! > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor > complete > 2015-08-14 15:41:18,562 [myid:100] - INFO > [SyncThread:100:SyncRequestProcessor@191] - Shutting down > 2015-08-14 15:41:18,563 [myid:100] - INFO [ProcessThread(sid:100 > cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop! > {code} > After this exception Leader server still remains leader. After this non > recoverable exception the leader should go down and let other followers > become leader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)