[ https://issues.apache.org/jira/browse/HDFS-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284805#comment-13284805 ]
Ivan Kelly commented on HDFS-3452:
----------------------------------

For #1, what you have is good.

> BKJM:Switch from standby to active fails and NN gets shut down due to delay in clearing of lock
> -----------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3452
>                 URL: https://issues.apache.org/jira/browse/HDFS-3452
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0-alpha
>            Reporter: suja s
>            Assignee: Uma Maheswara Rao G
>            Priority: Blocker
>         Attachments: BK-253-BKJM.patch, HDFS-3452-1.patch, HDFS-3452.patch, HDFS-3452.patch
>
>
> Normal switch fails.
> (BKjournalManager zk session timeout is 3000 and ZKFC session timeout is 5000. By the time control comes to acquire lock the previous lock is not released which leads to failure in lock acquisition by NN and NN gets shutdown. Ideally it should have been done)
> =============================================================================
> 2012-05-09 20:15:29,732 ERROR org.apache.hadoop.contrib.bkjournal.WriteLock: Failed to acquire lock with /ledgers/lock/lock-0000000007, lock-0000000006 already has it
> 2012-05-09 20:15:29,732 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager@412beeec, stream=null))
> java.io.IOException: Could not acquire lock
>     at org.apache.hadoop.contrib.bkjournal.WriteLock.acquire(WriteLock.java:107)
>     at org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.recoverUnfinalizedSegments(BookKeeperJournalManager.java:406)
>     at org.apache.hadoop.hdfs.server.namenode.JournalSet$6.apply(JournalSet.java:551)
>     at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:322)
>     at org.apache.hadoop.hdfs.server.namenode.JournalSet.recoverUnfinalizedSegments(JournalSet.java:548)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.recoverUnclosedStreams(FSEditLog.java:1134)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:598)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1287)
>     at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
>     at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:63)
>     at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1219)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:978)
>     at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
>     at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:3633)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:916)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)
> 2012-05-09 20:15:29,736 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at HOST-XX-XX-XX-XX/XX.XX.XX.XX
> Scenario:
> Start ZKFCS, NNs
> NN1 is active and NN2 is standby
> Stop NN1.
> NN2 tries to transition to active and gets shut down
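
The timing problem in the report (the previous holder's lock is still present when the standby NN tries to acquire it) can be illustrated with a small sketch. This is not the approach taken in the attached patches and does not use the BKJM or ZooKeeper APIs; the class LockRetrySketch, the method acquireWithRetry, and its parameters are hypothetical names chosen for illustration. It only shows one way a caller could poll for a bounded time, for example up to the previous holder's session timeout, instead of failing on the first attempt.

```java
import java.util.function.BooleanSupplier;

// Hypothetical illustration only: not part of BKJM, ZooKeeper, or the HDFS-3452 patches.
public class LockRetrySketch {

    /**
     * Calls tryAcquire repeatedly until it returns true or maxWaitMs has elapsed.
     * tryAcquire stands in for a single non-blocking lock attempt.
     * Returns true if the lock was obtained, false on timeout or interruption.
     */
    public static boolean acquireWithRetry(BooleanSupplier tryAcquire,
                                           long maxWaitMs,
                                           long pollMs) {
        long deadline = System.nanoTime() + maxWaitMs * 1_000_000L;
        while (true) {
            if (tryAcquire.getAsBoolean()) {
                return true; // lock obtained
            }
            if (System.nanoTime() >= deadline) {
                return false; // previous holder never released within the window
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulate a stale lock that disappears 300 ms after start, standing in
        // for the previous holder's session expiring (3000 ms in the report).
        long start = System.nanoTime();
        BooleanSupplier tryAcquire =
                () -> (System.nanoTime() - start) / 1_000_000L >= 300;

        boolean acquired = acquireWithRetry(tryAcquire, 1000, 50);
        System.out.println("acquired = " + acquired);
    }
}
```

Whether waiting is appropriate at all depends on how the lock is meant to be released during failover; the sketch only makes the race between session expiry and lock acquisition concrete.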