[jira] [Commented] (ZOOKEEPER-1440) Spurious log error messages when QuorumCnxManager is shutting down
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276523#comment-13276523 ] Michi Mutsuzaki commented on ZOOKEEPER-1440: Ah, sorry, I should've caught that. Jordan's new patch looks good to me. Pat, I'll wait for your +1 before checking in this time :) --Michi Spurious log error messages when QuorumCnxManager is shutting down -- Key: ZOOKEEPER-1440 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1440 Project: ZooKeeper Issue Type: Bug Components: quorum, server Affects Versions: 3.4.3 Reporter: Jordan Zimmerman Assignee: Jordan Zimmerman Priority: Minor Fix For: 3.5.0 Attachments: patch.txt, patch.txt When shutting down the QuorumPeer, the ZK server logs unnecessary errors. See QuorumCnxManager.Listener.run(): ss.accept() will throw an exception when the socket is closed, and the catch (IOException e) block will log errors. It should first check the shutdown field to see whether the Listener is being shut down. If it is, the exception is expected and no errors should be logged. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
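The suggested check can be sketched as follows. This is a simplified stand-in with hypothetical names, not the actual QuorumCnxManager.Listener code or the attached patch: a volatile shutdown flag is tested inside the catch block so that the expected exception from a closed accept socket is not logged as an error.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch of the fix described above: check a volatile
// "shutdown" field before logging the IOException thrown by ss.accept().
public class Listener extends Thread {
    private volatile boolean shutdown = false;
    private final ServerSocket ss;

    public Listener(ServerSocket ss) {
        this.ss = ss;
    }

    @Override
    public void run() {
        while (!shutdown) {
            try {
                Socket client = ss.accept();
                // hand the connection off for processing (omitted here)
                client.close();
            } catch (IOException e) {
                if (shutdown) {
                    // expected: accept() throws once the socket is closed
                    break;
                }
                // only a genuine failure reaches the error log
                System.err.println("Exception while listening: " + e);
            }
        }
    }

    public void halt() throws IOException {
        shutdown = true;
        ss.close(); // makes the blocked accept() throw, unblocking run()
    }
}
```

Closing the socket from halt() is what unblocks the accept() call, so the flag must be set before the close to make the check race-free.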
[jira] [Created] (ZOOKEEPER-1467) Server principal on client side is derived using hostname.
Laxman created ZOOKEEPER-1467: - Summary: Server principal on client side is derived using hostname. Key: ZOOKEEPER-1467 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1467 Project: ZooKeeper Issue Type: Bug Components: java client Affects Versions: 3.4.3, 3.4.4, 3.5.0, 4.0.0 Reporter: Laxman Priority: Blocker The server principal on the client side is derived from the hostname, in org.apache.zookeeper.ClientCnxn.SendThread.startConnect(): {code} try { zooKeeperSaslClient = new ZooKeeperSaslClient("zookeeper/" + addr.getHostName()); } {code} This causes problems when an admin wants a customized principal like zookeeper/cluste...@hadoop.com, where clusterid is a cluster identifier rather than the host name. IMO, the server principal should also be configurable, as it is in Hadoop. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
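As an illustration of the configurability being requested, here is a hedged sketch. The property name "zookeeper.server.principal" and the helper below are hypothetical, not an existing ZooKeeper 3.4 configuration key or API: the idea is simply to prefer an explicitly configured principal and fall back to the current hostname-based derivation.

```java
import java.net.InetSocketAddress;

// Hypothetical sketch only: "zookeeper.server.principal" is an illustrative
// property name, not an actual ZooKeeper configuration key.
public class ServerPrincipal {
    static String principalFor(InetSocketAddress addr) {
        // Prefer an explicitly configured principal, e.g. for deployments
        // that use zookeeper/<clusterid>@REALM rather than hostnames.
        String configured = System.getProperty("zookeeper.server.principal");
        if (configured != null) {
            return configured;
        }
        // Fall back to the current hostname-based derivation.
        return "zookeeper/" + addr.getHostName();
    }
}
```

This mirrors the Hadoop approach mentioned in the report, where the service principal is a configuration value rather than always being computed from the connection address.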
[jira] [Updated] (ZOOKEEPER-1437) Client uses session before SASL authentication complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koontz updated ZOOKEEPER-1437: - Attachment: ZOOKEEPER-1437.patch Use a CountDownLatch within ClientCnxn to control access to outgoing packet queue: non-SASL packets must wait until SASL authentication has completed. Client uses session before SASL authentication complete --- Key: ZOOKEEPER-1437 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1437 Project: ZooKeeper Issue Type: Bug Components: java client Affects Versions: 3.4.3 Reporter: Thomas Weise Assignee: Eugene Koontz Fix For: 3.4.4, 3.5.0 Attachments: ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch Found issue in the context of hbase region server startup, but can be reproduced w/ zkCli alone. getData may occur prior to SaslAuthenticated and fail with NoAuth. This is not expected behavior when the client is configured to use SASL. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
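The latch-based approach in the attachment comment can be sketched as follows. This is a minimal stand-in with assumed names, not the actual ClientCnxn code from the patch: ordinary packets block on a CountDownLatch until the SASL layer signals completion, while SASL packets themselves bypass the gate.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal sketch (assumed shape, not the actual ClientCnxn code) of gating
// non-SASL packets behind SASL authentication with a CountDownLatch.
public class SaslGate {
    private final CountDownLatch saslDone = new CountDownLatch(1);
    final LinkedBlockingQueue<String> outgoing = new LinkedBlockingQueue<>();

    // Called by the SASL layer once authentication has completed.
    void saslCompleted() {
        saslDone.countDown();
    }

    // Ordinary requests wait for authentication; SASL handshake packets
    // must bypass the gate or the handshake could never finish.
    void queuePacket(String packet, boolean isSasl) throws InterruptedException {
        if (!isSasl) {
            saslDone.await();
        }
        outgoing.put(packet);
    }
}
```

With this shape, a getData issued immediately after connect simply parks until saslCompleted() fires, instead of reaching the server unauthenticated and failing with NoAuth.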
[jira] [Commented] (ZOOKEEPER-1437) Client uses session before SASL authentication complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276988#comment-13276988 ] Eugene Koontz commented on ZOOKEEPER-1437: -- Excuse me, I meant ClientCnxn:queuePacket(), not ClientCnxn:queueSaslPacket(), in my 20:26 comment above.
Success: ZOOKEEPER-1437 PreCommit Build #1076
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1437 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1076/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 182317 lines...] [exec] BUILD SUCCESSFUL [exec] Total time: 0 seconds [exec] [exec] [exec] [exec] [exec] +1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12527672/ZOOKEEPER-1437.patch [exec] against trunk revision 1337029. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 15 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 core tests. The patch passed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1076//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1076//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1076//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] aI6vlNOI45 logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. 
[exec] == [exec] == [exec] [exec] BUILD SUCCESSFUL Total time: 26 minutes 42 seconds Archiving artifacts Recording test results Description set: ZOOKEEPER-1437 Email was triggered for: Success Sending email for trigger: Success ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (ZOOKEEPER-1437) Client uses session before SASL authentication complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277016#comment-13277016 ] Hadoop QA commented on ZOOKEEPER-1437: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12527672/ZOOKEEPER-1437.patch against trunk revision 1337029. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1076//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1076//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1076//console This message is automatically generated.
Re: Possible issue with cluster availability following new Leader Election - ZK 3.4
We have also encountered a problem where the newly elected leader sends an entire snapshot to a follower even though the follower is in sync with the leader. A closer look at the code shows the problem is in the logic where we decide to send a snapshot. The following scenario explains the problem in detail. Start a 3-node ZooKeeper ensemble where every quorum member has seen the same changes, zxid: 0x40004. 1. When a newly elected leader starts, it bumps up its zxid to the new epoch. Code snippet from Leader.java: long epoch = getEpochToPropose(self.getId(), self.getAcceptedEpoch()); zk.setZxid(ZxidUtils.makeZxid(epoch, 0)); synchronized(this){ lastProposed = zk.getZxid(); // 0x5 } 2. Now a follower tries to join the leader with its peerLastZxid = 0x40004. Note that the leader now has an in-memory committedLog list with maxCommittedLog = 0x40004. As committedLog doesn't have any new transactions with zxid > peerLastZxid, we check whether the leader and follower are in sync. Code snippet from LearnerHandler.java: leaderLastZxid = leader.startForwarding(this, updates); if (peerLastZxid == leaderLastZxid) { // 0x40004 == 0x5: false // We are in sync so we'll do an empty diff packetToSend = Leader.DIFF; zxidToSend = leaderLastZxid; } Note that leader.startForwarding() returns the lastProposed zxid, which the leader has already set to 0x5. So in this scenario we never send an empty diff even though the leader and follower are in sync, and we end up sending the entire snapshot in the code that follows the above check. A possible fix would be to keep a lastProcessedZxid in the leader, updated only when the leader processes a transaction. While syncing with a follower, if the peerLastZxid sent by the follower is the same as the leader's lastProcessedZxid, we can send an empty diff to the follower. This would avoid unnecessarily sending the entire snapshot when the leader and follower are already in sync. ZooKeeper developers, please share your views on the above issue.
- Vinayak On Mon, May 14, 2012 at 8:30 AM, Camille Fournier cami...@apache.org wrote: Thanks. I just ran a couple of tests to start the debugging. Mark, I don't see a long cluster settle with a mostly empty data set, so I think this might be two different problems. I do see a lot of snapshots being sent though so there is probably some overaggressiveness in the way that we evaluate when to send snapshots that should be evaluated. Adding the dev mailing list, as I may need ben or flavio to take a look as well. C On Thu, May 10, 2012 at 10:48 AM, alexandar.gvozdeno...@ubs.com wrote: Cheers - Raised https://issues.apache.org/jira/browse/ZOOKEEPER-1465 -Original Message- From: Camille Fournier [mailto:cami...@apache.org] Sent: 10 May 2012 14:58 To: u...@zookeeper.apache.org Subject: Re: Possible issue with cluster availability following new Leader Election - ZK 3.4 I will take a look at this soon, have you created a Jira for it? If not please do so. Thanks, C On Thu, May 10, 2012 at 7:20 AM, alexandar.gvozdeno...@ubs.com wrote: I think there may be a problem here with the 3.4 branch. I dropped the cluster back to 3.3.5 and the behaviour was much better. To summarize: 650mb of data 20k nodes of varied size 3 node cluster On 3.4.x (using latest branch build) - Takes 3-4 minutes to bring up a cluster from cold Takes 40-50 secs to recover from a leader failure Takes 10 secs for a new follower to join the cluster On 3.3.5 Takes 10-20 secs to bring up a cluster from cold Takes 10 secs to recover from a leader failure Takes 10 secs for a new follower to join the cluster Any views on this from the ZK devs? The differences in behaviour only start becoming apparent as the dataset gets bigger. I was hoping to use 3.4 for the transactional features it offered via the 'multi-update' operations, but this issue seems pretty serious... Visit our website at http://www.ubs.com This message contains confidential information and is intended only for the individual named.
If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mails are not encrypted and cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission. If verification is required please request a hard-copy version. This message is provided for informational purposes and should not be construed as a solicitation or offer to buy or sell any securities or related financial
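The sync decision described in the message above can be sketched as follows. This is an illustrative toy, not the actual LearnerHandler code, and it assumes the shorthand zxids in the email expand to full 64-bit values (epoch in the high 32 bits, counter in the low 32): the proposed fix compares peerLastZxid against a lastProcessedZxid instead of lastProposed, which has already been bumped to the new epoch.

```java
// Toy sketch of the proposed fix: send an empty DIFF when the follower is
// already at the leader's last *processed* zxid, rather than comparing
// against lastProposed (bumped to the new epoch before any transaction).
public class SyncDecision {
    static final int DIFF = 1, SNAP = 2;

    // zxid layout: (epoch << 32) | counter
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | counter;
    }

    static int packetToSend(long peerLastZxid, long lastProcessedZxid) {
        if (peerLastZxid == lastProcessedZxid) {
            return DIFF;   // follower already in sync: empty diff suffices
        }
        return SNAP;       // simplified: real code also consults committedLog
    }
}
```

With the scenario from the email, a follower at epoch 4 / counter 4 matches the leader's lastProcessedZxid and receives an empty diff, whereas comparing against the new epoch's lastProposed would always fail and force a full snapshot.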
[jira] [Updated] (ZOOKEEPER-1355) Add zk.updateServerList(newServerList)
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marshall McMullen updated ZOOKEEPER-1355: - Attachment: ZOOKEEPER-1355-ver12-4.patch Updated to work correctly against tip of trunk. All unit tests should pass now. Add zk.updateServerList(newServerList) --- Key: ZOOKEEPER-1355 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1355 Project: ZooKeeper Issue Type: New Feature Components: c client, java client Reporter: Alexander Shraer Assignee: Alexander Shraer Fix For: 3.5.0 Attachments: ZOOKEEPER-1355-ver10-1.patch, ZOOKEEPER-1355-ver10-2.patch, ZOOKEEPER-1355-ver10-3.patch, ZOOKEEPER-1355-ver10-4.patch, ZOOKEEPER-1355-ver10-4.patch, ZOOKEEPER-1355-ver10.patch, ZOOKEEPER-1355-ver11-1.patch, ZOOKEEPER-1355-ver11.patch, ZOOKEEPER-1355-ver12-1.patch, ZOOKEEPER-1355-ver12-2.patch, ZOOKEEPER-1355-ver12-4.patch, ZOOKEEPER-1355-ver12.patch, ZOOKEEPER-1355-ver2.patch, ZOOKEEPER-1355-ver4.patch, ZOOKEEPER-1355-ver5.patch, ZOOKEEPER-1355-ver6.patch, ZOOKEEPER-1355-ver7.patch, ZOOKEEPER-1355-ver8.patch, ZOOKEEPER-1355-ver9-1.patch, ZOOKEEPER-1355-ver9.patch, ZOOKEEPER=1355-ver3.patch, ZOOOKEEPER-1355-test.patch, ZOOOKEEPER-1355-ver1.patch, ZOOOKEEPER-1355.patch, loadbalancing-more-details.pdf, loadbalancing.pdf When the set of servers changes, we would like to update the server list stored by clients without restarting the clients. Moreover, assuming that the number of clients per server is the same (in expectation) in the old configuration (as guaranteed by the current list shuffling for example), we would like to re-balance client connections across the new set of servers in a way that a) the number of clients per server is the same for all servers (in expectation) and b) there is no excessive/unnecessary client migration. It is simple to achieve (a) without (b) - just re-shuffle the new list of servers at every client. But this would create unnecessary migration, which we'd like to avoid. 
We propose a simple probabilistic migration scheme that achieves (a) and (b) - each client locally decides whether and where to migrate when the list of servers changes. The attached document describes the scheme and shows an evaluation of it in Zookeeper. We also implemented re-balancing through a consistent-hashing scheme and show a comparison. We derived the probabilistic migration rules from a simple formula that we can also provide, if someone's interested in the proof. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
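The exact migration rules are defined in the attached loadbalancing PDFs; as a hedged illustration only, here is one simple rule of this general kind (my assumption, not necessarily the paper's exact scheme). When the list grows from oldN to newN servers, each client migrates with probability 1 - oldN/newN, and only to a newly added server, which keeps the expected per-server load uniform without mass migration. The sketch assumes the new list keeps the old servers at indices 0..oldN-1.

```java
import java.util.List;
import java.util.Random;

// Toy probabilistic re-balancing rule (illustrative assumption, not the
// actual scheme from the attached documents).
public class Rebalance {
    // Migrate with p = 1 - oldN/newN when servers are added: the clients
    // remaining on each old server and the clients arriving at each new
    // server then both average C/newN, for C total clients.
    static double migrationProbability(int oldN, int newN) {
        if (newN <= oldN) return 0.0;  // simplification for this sketch
        return 1.0 - (double) oldN / newN;
    }

    static String pickServer(String current, List<String> newServers,
                             int oldN, Random rnd) {
        if (rnd.nextDouble() < migrationProbability(oldN, newServers.size())) {
            // migrate, and only to one of the newly added servers
            return newServers.get(oldN + rnd.nextInt(newServers.size() - oldN));
        }
        return current;  // stay put: no unnecessary migration
    }
}
```

Because clients that stay never move and migrating clients only target the new servers, property (b) holds by construction, while the choice of p gives property (a) in expectation.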
Failed: ZOOKEEPER-1355 PreCommit Build #1077
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1355 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1077/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 174 lines...] [exec] Hunk #4 FAILED at 437. [exec] Hunk #5 FAILED at 575. [exec] 5 out of 5 hunks FAILED -- saving rejects to file src/gc/java/main/org/apache/zookeeper/ZooKeeper.java.rej [exec] patching file src/gc/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java [exec] Hunk #1 FAILED at 211. [exec] 1 out of 1 hunk FAILED -- saving rejects to file src/gc/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java.rej [exec] patching file src/gc/java/test/org/apache/zookeeper/test/StaticHostProviderTest.java [exec] Hunk #1 FAILED at 29. [exec] Hunk #2 FAILED at 85. [exec] 2 out of 2 hunks FAILED -- saving rejects to file src/gc/java/test/org/apache/zookeeper/test/StaticHostProviderTest.java.rej [exec] PATCH APPLICATION FAILED [exec] [exec] [exec] [exec] [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12527735/ZOOKEEPER-1355-ver12-4.patch [exec] against trunk revision 1337029. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 34 new or modified tests. [exec] [exec] -1 patch. The patch command could not apply the patch. [exec] [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1077//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] RqGMF8Y8I0 logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. 
[exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1568: exec returned: 1 Total time: 42 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Description set: ZOOKEEPER-1355 Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## No tests ran.
[jira] [Commented] (ZOOKEEPER-1355) Add zk.updateServerList(newServerList)
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277238#comment-13277238 ] Hadoop QA commented on ZOOKEEPER-1355: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12527735/ZOOKEEPER-1355-ver12-4.patch against trunk revision 1337029. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 34 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1077//console This message is automatically generated.
[jira] [Updated] (ZOOKEEPER-1355) Add zk.updateServerList(newServerList)
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marshall McMullen updated ZOOKEEPER-1355: - Attachment: ZOOKEEPER-1355-ver13.patch Bad patch last time, my apologies. Bumped the version on this one to 13 to avoid confusion.
[jira] [Commented] (ZOOKEEPER-1355) Add zk.updateServerList(newServerList)
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277285#comment-13277285 ] Hadoop QA commented on ZOOKEEPER-1355: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12527742/ZOOKEEPER-1355-ver13.patch against trunk revision 1337029. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 34 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1078//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1078//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1078//console This message is automatically generated.
Re: Possible issue with cluster availability following new Leader Election - ZK 3.4
This pretty much matches what I expect. It would be great if you wanted to try your hand at creating a patch and submitting it to the ticket that was created for this problem, but if not, please post this analysis to issue 1465 and we'll look at it ASAP. C On Wed, May 16, 2012 at 2:55 PM, Vinayak Khot vina...@nutanix.com wrote: We also have encountered a problem where the newly elected leader sends entire snapshot to a follower even though the follower is in sync with the leader. A closer look at the code shows the problem in the logic where we decide to send a snapshot. Following scenario explains the problem in details. Start a 3 node Zookeeper ensemble where every quorum member has seen same changes. zxid: *0x40004* 1. When a newly elected leader starts, it bumps up its zxid to the new epoch. Code snippet Leader.java long epoch = getEpochToPropose(self.getId(), self.getAcceptedEpoch()); zk.setZxid(ZxidUtils.makeZxid(epoch, 0)); synchronized(this){ lastProposed = zk.getZxid(); // *0x5* } 2. Now a follower tries to join the leader with its peerLastZxid = * 0x40004* Note that now the leader has in memory committedLog list with* * maxCommittedLog=*0x40004** * * * As committedLog don't have any new transactions which have zxid peerLastZxid, we check if the leader and follower are in sync. Code snippet from LearnerHandler.java leaderLastZxid = leader.startForwarding(this, updates); if (peerLastZxid == leaderLastZxid) { *0x40004 == **0x5* // We are in sync so we'll do an empty diff packetToSend = Leader.DIFF; zxidToSend = leaderLastZxid; } Note that the function *leader.startForwarding()* returns *lastProposed *zxid which is already set to *0x5 *by the leader. So in this scenario we never send empty diff even though the leader and follower are in sync, and we end up sending entire snapshot in the code that follows above check. A possible fix would be to keep *lastProcessedZxid* in the leader which will get updated only when the leader processes a transaction. 
While syncing with a follower, if the peerLastZxid sent by a follower is same as lastProcessedZxid of the leader we can send empty diff to the follower. This shall avoid unnecessarily sending entire snapshot when the leader and follower are already in sync. Zookeeper developers please share your views on above mentioned issue. - Vinayak On Mon, May 14, 2012 at 8:30 AM, Camille Fournier cami...@apache.orgwrote: Thanks. I just ran a couple of tests to start the debugging. Mark, I don't see a long cluster settle with a mostly empty data set, so I think this might be two different problems. I do see a lot of snapshots being sent though so there is probably some overaggressiveness in the way that we evaluate when to send snapshots that should be evaluated. Adding the dev mailing list, as I may need ben or flavio to take a look as well. C On Thu, May 10, 2012 at 10:48 AM, alexandar.gvozdeno...@ubs.com wrote: Cheers - Raised https://issues.apache.org/jira/browse/ZOOKEEPER-1465 -Original Message- From: Camille Fournier [mailto:cami...@apache.org] Sent: 10 May 2012 14:58 To: u...@zookeeper.apache.org Subject: Re: Possible issue with cluster availability following new Leader Election - ZK 3.4 I will take a look at this soon, have you created a Jira for it? If not please do so. Thanks, C On Thu, May 10, 2012 at 7:20 AM, alexandar.gvozdeno...@ubs.com wrote: I think there may be a problem here with the 3.4 branch. I dropped the cluster back to 3.3.5 and the behaviour was much better. To summarize: 650mb of data 20k nodes of varied size 3 node cluster On 3.4.x (using latest branch build) - Takes 3-4 minutes to bring up a cluster from cold Takes 40-50 secs to recover from a leader failure Takes 10 secs for a new follower to join the cluster On 3.3.5 Takes 10-20 secs to bring up a cluster from cold Takes 10 secs to recover from a leader failure Takes 10 secs for a new follower to join the cluster Any views on this from the ZK devs? 
The differences in behaviour only start becoming apparent as the dataset gets bigger. I was hoping to use 3.4 for the transactional features it offered via the 'multi-update' operations, but this issue seems pretty serious...
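Vinayak's analysis above hinges on the zxid layout: a zxid is a single 64-bit number carrying the election epoch in its high 32 bits and a per-epoch transaction counter in its low 32 bits, which is why makeZxid(epoch, 0) at election time makes lastProposed differ from a follower's last committed zxid even when their histories match. A self-contained sketch of that packing (it mirrors what ZxidUtils does, but is not ZooKeeper's actual source):

```java
// Sketch of ZooKeeper-style zxid packing: epoch in the high 32 bits,
// per-epoch transaction counter in the low 32 bits. Not ZooKeeper source,
// just the arithmetic the snapshot-vs-diff comparison depends on.
public class ZxidSketch {
    public static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    public static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    public static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    public static void main(String[] args) {
        long followerLast = makeZxid(4, 4);  // follower's last committed txn
        long leaderBumped = makeZxid(5, 0);  // leader right after bumping the epoch
        // Identical histories, yet the raw zxids differ, so a plain
        // peerLastZxid == leaderLastZxid check fails and a snapshot is sent:
        System.out.println(followerLast == leaderBumped); // false
        System.out.println(epochOf(leaderBumped));        // 5
    }
}
```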
[jira] [Commented] (ZOOKEEPER-1355) Add zk.updateServerList(newServerList)
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13277532#comment-13277532 ] Eugene Koontz commented on ZOOKEEPER-1355: -- Typo in src/docs/src/documentation/content/xdocs/zookeeperProgrammers.xml: jus should be just. The documentation is good - perhaps it could reference where the client's server-selection logic is implemented in the source code (StaticHostProvider::updateServerList()), i.e. provide a link to the source such as http://svn.apache.org/viewvc/zookeeper/trunk/src/java/main/org/apache/zookeeper/client/StaticHostProvider.java?view=markup or https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/client/StaticHostProvider.java The test cases cover a lot more cases than the documentation: it would be nice to have the doc's examples correspond to the test cases. Add zk.updateServerList(newServerList) --- Key: ZOOKEEPER-1355 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1355 Project: ZooKeeper Issue Type: New Feature Components: c client, java client Reporter: Alexander Shraer Assignee: Alexander Shraer Fix For: 3.5.0 Attachments: ZOOKEEPER-1355-ver10-1.patch, ZOOKEEPER-1355-ver10-2.patch, ZOOKEEPER-1355-ver10-3.patch, ZOOKEEPER-1355-ver10-4.patch, ZOOKEEPER-1355-ver10-4.patch, ZOOKEEPER-1355-ver10.patch, ZOOKEEPER-1355-ver11-1.patch, ZOOKEEPER-1355-ver11.patch, ZOOKEEPER-1355-ver12-1.patch, ZOOKEEPER-1355-ver12-2.patch, ZOOKEEPER-1355-ver12-4.patch, ZOOKEEPER-1355-ver12.patch, ZOOKEEPER-1355-ver13.patch, ZOOKEEPER-1355-ver2.patch, ZOOKEEPER-1355-ver4.patch, ZOOKEEPER-1355-ver5.patch, ZOOKEEPER-1355-ver6.patch, ZOOKEEPER-1355-ver7.patch, ZOOKEEPER-1355-ver8.patch, ZOOKEEPER-1355-ver9-1.patch, ZOOKEEPER-1355-ver9.patch, ZOOKEEPER=1355-ver3.patch, ZOOOKEEPER-1355-test.patch, ZOOOKEEPER-1355-ver1.patch, ZOOOKEEPER-1355.patch, loadbalancing-more-details.pdf, loadbalancing.pdf When the set of servers changes, we would like to
update the server list stored by clients without restarting the clients. Moreover, assuming that the number of clients per server is the same (in expectation) in the old configuration (as guaranteed by the current list shuffling for example), we would like to re-balance client connections across the new set of servers in a way that a) the number of clients per server is the same for all servers (in expectation) and b) there is no excessive/unnecessary client migration. It is simple to achieve (a) without (b) - just re-shuffle the new list of servers at every client. But this would create unnecessary migration, which we'd like to avoid. We propose a simple probabilistic migration scheme that achieves (a) and (b) - each client locally decides whether and where to migrate when the list of servers changes. The attached document describes the scheme and shows an evaluation of it in Zookeeper. We also implemented re-balancing through a consistent-hashing scheme and show a comparison. We derived the probabilistic migration rules from a simple formula that we can also provide, if someone's interested in the proof. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
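The scheme itself is specified in the attached loadbalancing.pdf; as a purely illustrative sketch (an assumption about its flavor, not the actual StaticHostProvider code), here is one probabilistic rule that satisfies both (a) and (b) for the growing-cluster case: each client migrates with probability (m - n)/m when the list grows from n to m servers, choosing uniformly among the added servers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Illustrative sketch only (NOT the actual StaticHostProvider patch): a
// probabilistic rebalancing rule with properties (a) and (b). When the
// ensemble grows from n to m servers, each client independently migrates
// with probability (m - n) / m, picking uniformly among the added servers.
// In expectation every server then hosts the same number of clients, and
// only the minimum necessary fraction of clients moves.
public class RebalanceSketch {
    private final Random rnd;

    public RebalanceSketch(Random rnd) {
        this.rnd = rnd;
    }

    /** Returns the server this client should use after the list update. */
    public String onServerListChange(String current, List<String> oldList, List<String> newList) {
        int n = oldList.size();
        int m = newList.size();
        if (!newList.contains(current)) {
            // Our server was removed; we must migrate somewhere, uniformly.
            return newList.get(rnd.nextInt(m));
        }
        if (m <= n) {
            return current; // no growth: stay put, avoiding needless migration
        }
        if (rnd.nextDouble() < (double) (m - n) / m) {
            List<String> added = new ArrayList<>(newList);
            added.removeAll(oldList);
            return added.get(rnd.nextInt(added.size())); // move to a new server
        }
        return current;
    }

    public static void main(String[] args) {
        RebalanceSketch r = new RebalanceSketch(new Random(1));
        List<String> oldL = Arrays.asList("s1", "s2");
        List<String> newL = Arrays.asList("s1", "s2", "s3", "s4");
        int moved = 0;
        for (int i = 0; i < 10000; i++) {
            if (!r.onServerListChange("s1", oldL, newL).equals("s1")) moved++;
        }
        System.out.println(moved); // roughly (m - n) / m = half of the clients
    }
}
```

Under such a rule the expected load per server after the change is C/m for all m servers (C clients total), while only a (m - n)/m fraction of clients migrates, which is the minimum needed.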
[jira] [Created] (BOOKKEEPER-262) Implement a meta store based hedwig metadata manager.
Sijie Guo created BOOKKEEPER-262: Summary: Implement a meta store based hedwig metadata manager. Key: BOOKKEEPER-262 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-262 Project: Bookkeeper Issue Type: Sub-task Components: hedwig-server Reporter: Sijie Guo Fix For: 4.2.0 A metadata manager interface was provided by BOOKKEEPER-250 and BOOKKEEPER-259. We need a metadata manager implementation that uses the meta store API. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (BOOKKEEPER-253) BKJM:Switch from standby to active fails and NN gets shut down due to delay in clearing of lock
[ https://issues.apache.org/jira/browse/BOOKKEEPER-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13276580#comment-13276580 ] Ivan Kelly commented on BOOKKEEPER-253: --- Ah yes, I had misunderstood the problem. I think the write permission node will work, but it needs a small modification to ensure that, in the time period between deleting/acquiring the write permission and creating/using the ledger, another node doesn't come in and do the same. I think it should work as follows. There is one znode, the write permission znode, /journal/writeLock. When a node wants to start writing, it must read the znode to see what the current inprogress_znode is. At this point it saves the version of the writeLock znode. It then recovers the inprogress_znode, which fences the ledger it is using. It creates its own ledger, and then writes the new inprogress_znode to writeLock, using the version it previously saved. If another node has tried to start writing before this, the version will have changed, so the write will fail. BKJM:Switch from standby to active fails and NN gets shut down due to delay in clearing of lock --- Key: BOOKKEEPER-253 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-253 Project: Bookkeeper Issue Type: Bug Components: bookkeeper-client Reporter: suja s Assignee: Uma Maheswara Rao G Priority: Blocker Normal switch fails. (BKjournalManager zk session timeout is 3000 and ZKFC session timeout is 5000. By the time control comes to acquire lock the previous lock is not released which leads to failure in lock acquisition by NN and NN gets shutdown.
Ideally it should have been done) = 2012-05-09 20:15:29,732 ERROR org.apache.hadoop.contrib.bkjournal.WriteLock: Failed to acquire lock with /ledgers/lock/lock-07, lock-06 already has it 2012-05-09 20:15:29,732 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager@412beeec, stream=null)) java.io.IOException: Could not acquire lock at org.apache.hadoop.contrib.bkjournal.WriteLock.acquire(WriteLock.java:107) at org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.recoverUnfinalizedSegments(BookKeeperJournalManager.java:406) at org.apache.hadoop.hdfs.server.namenode.JournalSet$6.apply(JournalSet.java:551) at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:322) at org.apache.hadoop.hdfs.server.namenode.JournalSet.recoverUnfinalizedSegments(JournalSet.java:548) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.recoverUnclosedStreams(FSEditLog.java:1134) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:598) at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1287) at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:63) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49) at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1219) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:978) at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107) at 
org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:3633) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:916) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686) 2012-05-09 20:15:29,736 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down NameNode at HOST-XX-XX-XX-XX/XX.XX.XX.XX Scenario: Start ZKFCS, NNs NN1 is active and NN2 is standby Stop NN1. NN2 tries to transition to active and gets
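The versioned-write scheme Ivan describes above is essentially a compare-and-swap on the writeLock znode's version. Below is a toy in-memory stand-in for that check (the VersionedNode class is hypothetical; real code would save the version from the Stat returned by ZooKeeper's getData and call setData(path, data, expectedVersion), which throws BadVersionException on a stale version):

```java
// Toy in-memory stand-in for a single znode, illustrating the versioned
// write scheme. Real code would read the znode with getData (the returned
// Stat carries the version) and then call ZooKeeper's
// setData(path, data, expectedVersion), which throws BadVersionException
// if another writer updated the znode in between.
public class VersionedNode {
    private String data;
    private int version = 0;

    public synchronized int getVersion() { return version; }

    public synchronized String getData() { return data; }

    /** Conditional write: succeeds only if the caller saw the latest version. */
    public synchronized boolean setData(String newData, int expectedVersion) {
        if (expectedVersion != version) {
            return false; // stale read: another node won the race
        }
        data = newData;
        version++;
        return true;
    }

    public static void main(String[] args) {
        VersionedNode writeLock = new VersionedNode();
        int seenByA = writeLock.getVersion(); // writer A reads writeLock
        int seenByB = writeLock.getVersion(); // writer B reads it too
        // B recovers the old segment and publishes its inprogress znode first:
        System.out.println(writeLock.setData("inprogress_B", seenByB)); // true
        // A's conditional write now fails, so A knows another writer got in:
        System.out.println(writeLock.setData("inprogress_A", seenByA)); // false
    }
}
```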
[jira] [Updated] (BOOKKEEPER-258) CompactionTest failed
[ https://issues.apache.org/jira/browse/BOOKKEEPER-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sijie Guo updated BOOKKEEPER-258: - Attachment: BOOKKEEPER-258.diff I set readTimeout to a large value to disable readTimeout during testing, and I ran while [ $? = 0 ]; do mvn test -Dtest=CompactionTest > compaction.log; done for several hours; it doesn't reproduce the issue. CompactionTest failed - Key: BOOKKEEPER-258 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-258 Project: Bookkeeper Issue Type: Bug Components: bookkeeper-server Reporter: Flavio Junqueira Assignee: Sijie Guo Priority: Blocker Fix For: 4.1.0 Attachments: BOOKKEEPER-258.diff {noformat} --- Test set: org.apache.bookkeeper.bookie.CompactionTest --- Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 32.557 sec FAILURE! testCompactionSmallEntryLogs(org.apache.bookkeeper.bookie.CompactionTest) Time elapsed: 6.507 sec ERROR! org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException at org.apache.bookkeeper.client.BKException.create(BKException.java:62) at org.apache.bookkeeper.client.LedgerHandle.readEntries(LedgerHandle.java:347) at org.apache.bookkeeper.bookie.CompactionTest.verifyLedger(CompactionTest.java:128) at org.apache.bookkeeper.bookie.CompactionTest.testCompactionSmallEntryLogs(CompactionTest.java:317) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at
junit.framework.TestSuite.runTest(TestSuite.java:232) at junit.framework.TestSuite.run(TestSuite.java:227) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83) at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110) at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:172) at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:78) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:70) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (BOOKKEEPER-258) CompactionTest failed
[ https://issues.apache.org/jira/browse/BOOKKEEPER-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13276607#comment-13276607 ] Sijie Guo commented on BOOKKEEPER-258: -- @Ivan To explain why readTimeout causes this issue, we have to clarify two things: 1) how readTimeout works, and 2) how the test runs.

For the first, from the Netty documentation (http://docs.jboss.org/netty/3.1/api/org/jboss/netty/handler/timeout/ReadTimeoutHandler.html): the timeout happens when no data was read within a certain period of time.

For the second, CompactionTest#testCompactionSmallEntryLogs runs as follows: 1) add several messages to bookkeeper (so the connection will be established to the bookie server); 2) delete ledgers and sleep to wait for GC. The sleep interval is {MajorCompactionInterval + GcWaitTime}, which is 5 seconds, equal to ReadTimeout (also 5 seconds). So during those 5 seconds there is no activity, and the channel might time out after the sleep interval. 3) read entries to verify them.

{code}
client.connectIfNeededAndDoOp(new GenericCallback<Void>() {
    @Override
    public void operationComplete(int rc, Void result) {
        if (rc != BKException.Code.OK) {
            cb.readEntryComplete(rc, ledgerId, entryId, null, ctx);
            return;
        }
        client.readEntry(ledgerId, entryId, cb, ctx);
    }
});
{code}

As the code above indicates, the channel is only checked when calling client.connectIfNeededAndDoOp. If the channel is not marked as disconnected, client.readEntry will be called to send requests. After client.readEntry puts the completion keys in the pending-completion queue, the channel timeout fires (there is no data read from the channel, and the idle time has reached 5 seconds due to the sleep), so all those requests are errored out. For continuous traffic this is OK because there is data to read on the channel. But if no traffic arrives within the readTimeout interval, there is no data to read for that interval, and the channel is closed due to readTimeout.
Moreover, the timeout callback is triggered by Netty, so we have no idea when the timeout callback will be triggered. It is therefore difficult to guarantee that readEntry/addEntry operations are executed atomically before/after the timeout callback.
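Sijie's timing argument above reduces to simple clock arithmetic: the test's 5-second idle sleep is at least as long as the 5-second read timeout, so the channel is closed before the reads in step 3 can get their responses. A minimal model of that check (a hypothetical stand-in, not Netty's actual ReadTimeoutHandler):

```java
// Minimal model of an idle-read timeout check like Netty's
// ReadTimeoutHandler: the channel is considered dead once no bytes have
// been read for at least readTimeoutMs. Times are plain longs so the
// scenario from the test (5s sleep vs 5s timeout) replays deterministically.
public class ReadTimeoutModel {
    private final long readTimeoutMs;
    private long lastReadMs;

    public ReadTimeoutModel(long readTimeoutMs, long nowMs) {
        this.readTimeoutMs = readTimeoutMs;
        this.lastReadMs = nowMs;
    }

    public void onBytesRead(long nowMs) { lastReadMs = nowMs; }

    public boolean timedOut(long nowMs) {
        return nowMs - lastReadMs >= readTimeoutMs;
    }

    public static void main(String[] args) {
        ReadTimeoutModel ch = new ReadTimeoutModel(5000, 0);
        ch.onBytesRead(1000); // responses from the initial adds arrive
        // The test then sleeps MajorCompactionInterval + GcWaitTime = 5s
        // with no traffic, so by t = 6000 the channel has been idle 5s:
        System.out.println(ch.timedOut(6000)); // true: readEntry requests error out
    }
}
```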
[jira] [Commented] (BOOKKEEPER-253) BKJM:Switch from standby to active fails and NN gets shut down due to delay in clearing of lock
[ https://issues.apache.org/jira/browse/BOOKKEEPER-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13276620#comment-13276620 ] Rakesh R commented on BOOKKEEPER-253: - @Ivan bq. but it needs a small modification to ensure that, in the time period between deleting/acquiring the write permission and creating/using the ledger, another node doesn't come in and do the same I hope you are pointing to the window between the 'delete' and 'create' operations and the chance of a race condition there. Can we use the ZooKeeper MultiTransactionRecord API, like Op.delete(...), Op.create(...), zk.multi(ops)? I feel this would resolve the race condition. What's your opinion? Also, I didn't fully understand the versioning concept you are proposing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (BOOKKEEPER-253) BKJM:Switch from standby to active fails and NN gets shut down due to delay in clearing of lock
[ https://issues.apache.org/jira/browse/BOOKKEEPER-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13276635#comment-13276635 ] Rakesh R commented on BOOKKEEPER-253: - @Ivan, Oh, you meant that recovering the inprogress_znode will release the write permission, and startLogSegment will again try acquiring the write permission. In that case, we cannot go with the multi() option, since these are two different calls. I also feel the logic based on the znode version would work. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (BOOKKEEPER-253) BKJM:Switch from standby to active fails and NN gets shut down due to delay in clearing of lock
[ https://issues.apache.org/jira/browse/BOOKKEEPER-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13276648#comment-13276648 ] Ivan Kelly commented on BOOKKEEPER-253: --- Yes, that's exactly what I mean. I've been trying to formulate a possible race for this for the last few hours, but I haven't been able to. Once I come up with one, I'll post it here. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (BOOKKEEPER-253) BKJM:Switch from standby to active fails and NN gets shut down due to delay in clearing of lock
[ https://issues.apache.org/jira/browse/BOOKKEEPER-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276651#comment-13276651 ] Ivan Kelly commented on BOOKKEEPER-253:
---

If the race doesn't exist, it would be possible to simply 'lock' using the inprogress znode.

BKJM: Switch from standby to active fails and NN gets shut down due to delay in clearing of lock
---
Key: BOOKKEEPER-253
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-253
Project: Bookkeeper
Issue Type: Bug
Components: bookkeeper-client
Reporter: suja s
Assignee: Uma Maheswara Rao G
Priority: Blocker

The normal switch fails. (The BKJournalManager ZK session timeout is 3000 ms and the ZKFC session timeout is 5000 ms. By the time control comes to acquire the lock, the previous lock has not yet been released, which leads to a failure in lock acquisition by the NN, and the NN gets shut down. Ideally the previous lock should have been cleared by then.)

{code}
2012-05-09 20:15:29,732 ERROR org.apache.hadoop.contrib.bkjournal.WriteLock: Failed to acquire lock with /ledgers/lock/lock-07, lock-06 already has it
2012-05-09 20:15:29,732 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager@412beeec, stream=null))
java.io.IOException: Could not acquire lock
    at org.apache.hadoop.contrib.bkjournal.WriteLock.acquire(WriteLock.java:107)
    at org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.recoverUnfinalizedSegments(BookKeeperJournalManager.java:406)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet$6.apply(JournalSet.java:551)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:322)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet.recoverUnfinalizedSegments(JournalSet.java:548)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.recoverUnclosedStreams(FSEditLog.java:1134)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:598)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1287)
    at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
    at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:63)
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1219)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:978)
    at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
    at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:3633)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:916)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)
2012-05-09 20:15:29,736 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down NameNode at HOST-XX-XX-XX-XX/XX.XX.XX.XX
{code}

Scenario:
Start ZKFCs and NNs. NN1 is active and NN2 is standby.
Stop NN1. NN2 tries to transition to active and gets shut down.

-- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
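One way to tolerate the race described in BOOKKEEPER-253 above is to retry lock acquisition for at least the previous holder's ZK session timeout rather than failing on the first attempt. The sketch below is a hypothetical helper, not the actual BKJM WriteLock API; the names and signature are assumptions made for illustration.

```java
import java.util.function.BooleanSupplier;

public class RetryingLock {
    /**
     * Hypothetical helper (not the actual BKJM WriteLock API): keep retrying
     * lock acquisition until maxWaitMs has elapsed, so a lock held by an
     * expired ZK session has time to be cleared before we give up.
     * maxWaitMs should cover the previous holder's ZK session timeout
     * (3000 ms in the scenario above).
     */
    public static boolean acquireWithRetry(BooleanSupplier tryAcquire,
                                           long maxWaitMs,
                                           long retryIntervalMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (true) {
            if (tryAcquire.getAsBoolean()) {
                return true;  // lock znode created, we own the lock
            }
            if (System.currentTimeMillis() >= deadline) {
                return false; // caller decides whether to abort the transition
            }
            Thread.sleep(retryIntervalMs);
        }
    }
}
```

With this pattern the NN would only shut down if the stale lock survived past the retry window, instead of on the very first failed attempt.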
[jira] [Commented] (BOOKKEEPER-253) BKJM:Switch from standby to active fails and NN gets shut down due to delay in clearing of lock
[ https://issues.apache.org/jira/browse/BOOKKEEPER-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276715#comment-13276715 ] Ivan Kelly commented on BOOKKEEPER-253:
---

@Uma This is what I was suggesting.
[jira] [Updated] (BOOKKEEPER-237) Automatic recovery of under-replicated ledgers and its entries
[ https://issues.apache.org/jira/browse/BOOKKEEPER-237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated BOOKKEEPER-237:
---
Attachment: Auto Recovery Detection - distributed chain approach.doc

bq. I'm getting to realize that the main difference between what you're proposing and my half-baked proposal is that I'm trying to get rid of master accountant election and have each bookie individually figuring out what it has to replicate in the case of a crash.

I believe that's the key difference.

bq. Also, should design multiple groups and pointers to withstand multiple crashes.

Instead, can we make it simple by choosing one guy for monitoring? I'm attaching (Auto Recovery Detection - distributed chain approach.doc) my thoughts on how the chaining-based distributed approach works. I hope you are thinking along the lines of a similar approach. Please review.

Automatic recovery of under-replicated ledgers and its entries
---
Key: BOOKKEEPER-237
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-237
Project: Bookkeeper
Issue Type: New Feature
Components: bookkeeper-client, bookkeeper-server
Affects Versions: 4.0.0
Reporter: Rakesh R
Assignee: Rakesh R
Attachments: Auto Recovery Detection - distributed chain approach.doc, Auto Recovery and Bookie sync-ups.pdf

As per the current design of BookKeeper, if one of the BookKeeper servers dies, there is no automatic mechanism to identify and recover the under-replicated ledgers and their corresponding entries. This can lead to losing successfully written entries, which would be a critical problem in sensitive systems. This document describes a few proposals to overcome these limitations.
[jira] [Updated] (BOOKKEEPER-146) TestConcurrentTopicAcquisition sometimes hangs
[ https://issues.apache.org/jira/browse/BOOKKEEPER-146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated BOOKKEEPER-146:
---
Attachment: BOOKKEEPER-146.diff

It's been running in a loop for 30 minutes now, and doesn't seem to be hanging. The main problem was that even after the Hedwig client was closed, a subscription request could still succeed and add a channel to the channel list, even though the client had already moved past the point at which it closed its channels.

TestConcurrentTopicAcquisition sometimes hangs
---
Key: BOOKKEEPER-146
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-146
Project: Bookkeeper
Issue Type: Bug
Reporter: Ivan Kelly
Assignee: Sijie Guo
Priority: Blocker
Fix For: 4.1.0
Attachments: BOOKKEEPER-146.diff

To reproduce:
{code}
while [ $? = 0 ]; do mvn test -Dtest=TestConcurrentTopicAcquisition; done
{code}

The stack trace where it hangs looks very like BOOKKEEPER-5:
{code}
main prio=5 tid=102801000 nid=0x100601000 waiting on condition [1005ff000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 7bd8e1090 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
    at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1253)
    at org.jboss.netty.util.internal.ExecutorUtil.terminate(ExecutorUtil.java:107)
    at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.releaseExternalResources(NioClientSocketChannelFactory.java:143)
    at org.apache.hedwig.client.netty.HedwigClientImpl.close(HedwigClientImpl.java:234)
    at org.apache.hedwig.client.HedwigClient.close(HedwigClient.java:70)
    at org.apache.hedwig.server.topics.TestConcurrentTopicAcquisition.tearDown(TestConcurrentTopicAcquisition.java:99)
    at junit.framework.TestCase.runBare(TestCase.java:140)
    at junit.framework.TestResult$1.protect(TestResult.java:110)
    at junit.framework.TestResult.runProtected(TestResult.java:128)
    at junit.framework.TestResult.run(TestResult.java:113)
    at junit.framework.TestCase.run(TestCase.java:124)
    at junit.framework.TestSuite.runTest(TestSuite.java:232)
    at junit.framework.TestSuite.run(TestSuite.java:227)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
{code}
Review Request: BOOKKEEPER-146 TestConcurrentTopicAcquisition sometimes hangs
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/5144/
---

Review request for bookkeeper.

Summary
---
It's been running in a loop for 30 minutes now, and doesn't seem to be hanging. The main problem was that even after the Hedwig client was closed, a subscription request could still succeed and add a channel to the channel list, even though the client had already moved past the point at which it closed its channels.

This addresses bug BOOKKEEPER-146.
https://issues.apache.org/jira/browse/BOOKKEEPER-146

Diffs
---
hedwig-client/src/main/java/org/apache/hedwig/client/netty/HedwigSubscriber.java 0c8634c
hedwig-client/src/main/java/org/apache/hedwig/client/netty/WriteCallback.java a8552f4
hedwig-client/src/main/java/org/apache/hedwig/client/netty/HedwigPublisher.java 603766c
hedwig-client/src/main/java/org/apache/hedwig/client/netty/ConnectCallback.java f5077b0
hedwig-client/src/main/java/org/apache/hedwig/client/netty/HedwigClientImpl.java 806cdef

Diff: https://reviews.apache.org/r/5144/diff

Testing
---

Thanks,
Ivan
[jira] [Commented] (BOOKKEEPER-146) TestConcurrentTopicAcquisition sometimes hangs
[ https://issues.apache.org/jira/browse/BOOKKEEPER-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276825#comment-13276825 ] jirapos...@reviews.apache.org commented on BOOKKEEPER-146:
---

(This comment mirrors the review request for BOOKKEEPER-146 above.)
[jira] [Updated] (BOOKKEEPER-251) Noise error message printed when scanning entry log files those have been garbage collected.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sijie Guo updated BOOKKEEPER-251:
---
Attachment: BK-251.diff_v2

Brought the patch up to the latest trunk.

Noise error message printed when scanning entry log files those have been garbage collected
---
Key: BOOKKEEPER-251
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-251
Project: Bookkeeper
Issue Type: Improvement
Components: bookkeeper-server
Affects Versions: 4.1.0
Reporter: Sijie Guo
Assignee: Sijie Guo
Fix For: 4.1.0
Attachments: BK-251.diff, BK-251.diff_v2

Currently, due to the messy scan mechanism used by the garbage collector thread, the following noisy error messages are printed when scanning entry log files that have already been garbage collected:

{quote}
2012-05-09 15:58:52,742 - INFO [GarbageCollectorThread:GarbageCollectorThread@466] - Extracting entry log meta from entryLogId: 0
2012-05-09 15:58:52,743 - WARN [GarbageCollectorThread:EntryLogger@386] - Failed to get channel to scan entry log: 0.log
2012-05-09 15:58:52,743 - WARN [GarbageCollectorThread:GarbageCollectorThread@473] - Premature exception when processing 0; recovery will take care of the problem
java.io.FileNotFoundException: No file for log 0
    at org.apache.bookkeeper.bookie.EntryLogger.findFile(EntryLogger.java:366)
    at org.apache.bookkeeper.bookie.EntryLogger.getChannelForLogId(EntryLogger.java:340)
    at org.apache.bookkeeper.bookie.EntryLogger.scanEntryLog(EntryLogger.java:384)
    at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLog(GarbageCollectorThread.java:485)
    at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:470)
    at org.apache.bookkeeper.bookie.GarbageCollectorThread.run(GarbageCollectorThread.java:189)
{quote}
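The fix direction implied by BOOKKEEPER-251 above can be sketched as: when the garbage collector scans an entry log that may have been deleted concurrently, treat a missing file as an expected condition rather than logging a WARN with a stack trace. The class and method names below are hypothetical, not EntryLogger's real API.

```java
import java.io.File;

// Illustrative sketch only: a missing entry log during a GC scan is an
// expected race (it was garbage collected between listing and scanning),
// so we skip it quietly instead of surfacing a FileNotFoundException.
public class EntryLogScanSketch {
    public enum Result { SCANNED, SKIPPED_MISSING }

    public static Result scanEntryLog(File logFile) {
        if (!logFile.exists()) {
            // Already garbage collected: not an error, just skip this log.
            return Result.SKIPPED_MISSING;
        }
        // ... a real implementation would open and scan the log here ...
        return Result.SCANNED;
    }
}
```

The caller can then log at DEBUG (or not at all) for SKIPPED_MISSING, reserving WARN for genuinely unexpected failures.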
Re: Review Request: BOOKKEEPER-146 TestConcurrentTopicAcquisition sometimes hangs
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/5144/#review7950
---

Thanks Ivan. The patch seems great, just some slight comments.

hedwig-client/src/main/java/org/apache/hedwig/client/netty/HedwigPublisher.java
https://reviews.apache.org/r/5144/#comment17288
It would be better to move the line 'closed = true;' to the top of close(), because you use closed to prevent a new channel from being stored via storeHost2ChannelMapping.

hedwig-client/src/main/java/org/apache/hedwig/client/netty/HedwigSubscriber.java
https://reviews.apache.org/r/5144/#comment17289
Do we need to put the closing logic in the closeLock synchronization block? If we have acquired closeLock and set closed to true, no channel can be put into topicSubscriber2Channel again.

- Sijie

On 2012-05-16 15:48:50, Ivan Kelly wrote:
(quoted review request for BOOKKEEPER-146, as above)
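The pattern Sijie's review comments describe, set the closed flag under the lock before tearing anything down, so a late connect callback cannot register a new channel, can be sketched as follows. The names (ChannelRegistry, register) are illustrative assumptions, not HedwigClientImpl's actual fields or methods.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the close/registration race fix discussed in the review.
public class ChannelRegistry {
    private final Object closeLock = new Object();
    private boolean closed = false;
    private final List<String> channels = new ArrayList<>();

    /** Called from a connect callback; refuses registration once closed. */
    public boolean register(String channel) {
        synchronized (closeLock) {
            if (closed) {
                return false; // caller should close the channel instead
            }
            channels.add(channel);
            return true;
        }
    }

    /** Sets closed first, so late callbacks cannot re-populate the list. */
    public void close() {
        synchronized (closeLock) {
            if (closed) {
                return;
            }
            closed = true;
        }
        // Teardown can run outside the lock: once closed is true, register()
        // can no longer add channels, so nothing leaks after this clear.
        channels.clear();
    }

    public int channelCount() {
        synchronized (closeLock) {
            return channels.size();
        }
    }
}
```

This mirrors the second review comment: once closed is set under closeLock, the expensive channel teardown does not need to hold the lock, because no new entry can appear in the map afterwards.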
[jira] [Commented] (BOOKKEEPER-146) TestConcurrentTopicAcquisition sometimes hangs
[ https://issues.apache.org/jira/browse/BOOKKEEPER-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277567#comment-13277567 ] jirapos...@reviews.apache.org commented on BOOKKEEPER-146:
---

(This comment mirrors the review of BOOKKEEPER-146 above.)
[jira] [Commented] (BOOKKEEPER-263) ZK ledgers root path is hard coded
[ https://issues.apache.org/jira/browse/BOOKKEEPER-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277576#comment-13277576 ] Sijie Guo commented on BOOKKEEPER-263:
---

Thanks Aniruddha. The patch seems good, just one comment: I see that AVAILABLE_NODE is spread over several files. Could we consider moving it to a common place (which could be shared by client and server), such as AbstractConfiguration, with a method getAvailableBookiesPath()? That is similar to what Hedwig did in ServerConfiguration to manage its znode paths.

ZK ledgers root path is hard coded
---
Key: BOOKKEEPER-263
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-263
Project: Bookkeeper
Issue Type: Bug
Components: bookkeeper-client, bookkeeper-server
Affects Versions: 4.1.0
Reporter: Aniruddha
Assignee: Aniruddha
Fix For: 4.1.0
Attachments: BK-263.patch

Currently the ZK ledgers root path is not picked up from the config file; it is hard coded. This patch fixes that.
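The refactoring suggested in the comment on BOOKKEEPER-263, deriving the available-bookies znode path from one configurable root instead of a hard-coded constant per file, could look roughly like this. The class name, setter names, and the "/ledgers" default are assumptions for illustration, not the actual BookKeeper AbstractConfiguration implementation.

```java
// Illustrative sketch: one shared accessor for the znode path instead of an
// AVAILABLE_NODE constant duplicated across client and server source files.
public class LedgerPathsConfig {
    private String zkLedgersRootPath = "/ledgers"; // assumed default

    public void setZkLedgersRootPath(String path) {
        this.zkLedgersRootPath = path;
    }

    public String getZkLedgersRootPath() {
        return zkLedgersRootPath;
    }

    // Every caller derives the available-bookies path from here, so changing
    // the configured root (e.g. per cluster) takes effect everywhere at once.
    public String getAvailableBookiesPath() {
        return zkLedgersRootPath + "/available";
    }
}
```

With this shape, an admin who sets a custom ledgers root in the config file gets consistent paths on both client and server without touching any constants.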