[ https://issues.apache.org/jira/browse/HDFS-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429684#comment-13429684 ]

liaowenrui commented on HDFS-3769:
----------------------------------

Active NN editlog:
158-1-132-18:/opt/namenodeHa/hadoop-2.0.1/hadoop-root/dfs/name/current # ll edits_000000000000000235
edits_0000000000000002354-0000000000000002354  
edits_0000000000000002355-0000000000000002356  
edits_0000000000000002357-0000000000000002358  
edits_0000000000000002359-0000000000000002360

Active NN fsimage file:
-rw-r--r-- 1 root root   37545 Aug  6 07:44 fsimage_0000000000000002351
-rw-r--r-- 1 root root      62 Aug  6 07:46 fsimage_0000000000000002351.md5
-rw-r--r-- 1 root root   37545 Aug  6 09:33 fsimage_0000000000000002353
-rw-r--r-- 1 root root      62 Aug  6 09:33 fsimage_0000000000000002353.md5


Standby NN editlog:
158-1-132-19:/opt/namenodeHa/hadoop-2.0.1/hadoop-root/dfs/name/current # ll edits_000000000000000235
edits_0000000000000002350-0000000000000002351  
edits_0000000000000002352-0000000000000002353

Standby NN fsimage file:
-rw-r--r-- 1 root root   37545 Aug  6 11:51 fsimage_0000000000000002351
-rw-r--r-- 1 root root      62 Aug  6 11:51 fsimage_0000000000000002351.md5
-rw-r--r-- 1 root root   37545 Aug  6 13:38 fsimage_0000000000000002353
-rw-r--r-- 1 root root      62 Aug  6 13:38 fsimage_0000000000000002353.md5
-rw-r--r-- 1 root root       5 Aug  6 11:47 seen_txid

share storage editlog:
[zk: localhost:2181(CONNECTED) 3] ls /hdfsEdit/ledgers/edits_00000000000000235

edits_000000000000002352_000000000000002353   
edits_000000000000002357_000000000000002358   
edits_000000000000002355_000000000000002356   
edits_000000000000002350_000000000000002351
edits_000000000000002359_000000000000002360
[zk: localhost:2181(CONNECTED) 2] get /hdfsEdit/maxtxid
2360
cZxid = 0x30000002d
ctime = Mon Jul 30 05:25:32 EDT 2012
mZxid = 0xb00000860
mtime = Mon Aug 06 15:09:36 EDT 2012
pZxid = 0x30000002d
cversion = 0
dataVersion = 681
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 4
numChildren = 0

Note that the file edits_0000000000000002354-0000000000000002354 exists only on the
active NN. When the standby NN transitions to active, it tries to start a new log
segment at txid 2354, but 2354 < 2360 (the maxtxid recorded on shared storage), so
the shared-storage journal rejects the segment; the NN then throws an exception and
shuts down.
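The failing check boils down to a simple guard: the journal manager refuses to start a segment whose first txid is not beyond the maxtxid already recorded on shared storage. A minimal sketch of that guard (illustrative only, not the actual BookKeeperJournalManager code; class and field names are made up):

```java
import java.io.IOException;

public class MaxTxIdGuard {
    // Highest transaction id recorded on shared storage
    // (in the report, the value 2360 stored under /hdfsEdit/maxtxid).
    private long maxTxId;

    public MaxTxIdGuard(long maxTxId) {
        this.maxTxId = maxTxId;
    }

    // Mirrors the failure above: the new active wants to start a
    // segment at 2354, but shared storage has already seen 2360.
    public void startLogSegment(long firstTxId) throws IOException {
        if (firstTxId <= maxTxId) {
            throw new IOException("We've already seen " + firstTxId
                + ". A new stream cannot be created with it");
        }
        maxTxId = firstTxId;
    }

    public static void main(String[] args) {
        MaxTxIdGuard guard = new MaxTxIdGuard(2360L);
        try {
            guard.startLogSegment(2354L); // 2354 <= 2360, so this is rejected
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This is why the NN aborts: the segment the failed-over node wants to write was never recorded on shared storage, so its starting txid falls behind the shared maxtxid.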


                
> Standby namenode fails to become active because starting a log segment fails
> on shared storage
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-3769
>                 URL: https://issues.apache.org/jira/browse/HDFS-3769
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.0.0-alpha
>         Environment: 3 datanode:158.1.132.18,158.1.132.19,160.161.0.143
> 2 namenode:158.1.131.18,158.1.132.19
> 3 zk:158.1.132.18,158.1.132.19,160.161.0.143
> 3 bookkeeper:158.1.132.18,158.1.132.19,160.161.0.143
> ensemble-size:2,quorum-size:2
>            Reporter: liaowenrui
>            Priority: Critical
>             Fix For: 2.1.0-alpha, 2.0.1-alpha
>
>
> 2012-08-06 15:09:46,264 ERROR 
> org.apache.hadoop.contrib.bkjournal.utils.RetryableZookeeper: Node 
> /ledgers/available already exists and this is not a retry
> 2012-08-06 15:09:46,264 INFO 
> org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager: Successfully 
> created bookie available path : /ledgers/available
> 2012-08-06 15:09:46,273 INFO 
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering 
> unfinalized segments in 
> /opt/namenodeHa/hadoop-2.0.1/hadoop-root/dfs/name/current
> 2012-08-06 15:09:46,277 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Catching up to latest 
> edits from old active before taking over writer role in edits logs.
> 2012-08-06 15:09:46,363 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Reprocessing replication 
> and invalidation queues...
> 2012-08-06 15:09:46,363 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Marking all 
> datandoes as stale
> 2012-08-06 15:09:46,383 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Total number of 
> blocks            = 239
> 2012-08-06 15:09:46,383 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of invalid 
> blocks          = 0
> 2012-08-06 15:09:46,383 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of 
> under-replicated blocks = 0
> 2012-08-06 15:09:46,383 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of  
> over-replicated blocks = 0
> 2012-08-06 15:09:46,383 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of blocks 
> being written    = 0
> 2012-08-06 15:09:46,383 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing 
> edit logs at txnid 2354
> 2012-08-06 15:09:46,471 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 2354
> 2012-08-06 15:09:46,472 FATAL 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: starting log segment 
> 2354 failed for required journal 
> (JournalAndStream(mgr=org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager@4eda1515,
>  stream=null))
> java.io.IOException: We've already seen 2354. A new stream cannot be created 
> with it
>         at 
> org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.startLogSegment(BookKeeperJournalManager.java:297)
>         at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.startLogSegment(JournalSet.java:86)
>         at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet$2.apply(JournalSet.java:182)
>         at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:319)
>         at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.startLogSegment(JournalSet.java:179)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.startLogSegment(FSEditLog.java:894)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:268)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:618)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1322)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:63)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1230)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:990)
>         at 
> org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
>         at 
> org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:3633)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira