[ 
https://issues.apache.org/jira/browse/HDFS-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chuanjie.duan updated HDFS-16349:
---------------------------------
    Attachment:     (was: HDFS-16349-branch-3.2.3.patch)

> FSEditLog checkForGaps break HDFS RollingUpgrade Rollback
> ---------------------------------------------------------
>
>                 Key: HDFS-16349
>                 URL: https://issues.apache.org/jira/browse/HDFS-16349
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.2.2, 3.2.3
>            Reporter: chuanjie.duan
>            Priority: Blocker
>
> 2021-11-22 20:36:44,440 INFO 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Using longest 
> log: 10.65.57.133:8485=segmentState {
>   startTxId: 3906965
>   endTxId: 3906965
>   isInProgress: false
> }
> lastWriterEpoch: 5
> lastCommittedTxId: 3906964
> 2021-11-22 20:36:44,457 INFO 
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering 
> unfinalized segments in /data12/data/flashHadoopU/namenode/current
> 2021-11-22 20:36:44,495 INFO 
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits 
> file 
> /data12/data/flashHadoopU/namenode/current/edits_inprogress_0000000000003898378
>  -> 
> /data12/data/flashHadoopU/namenode/current/edits_0000000000003898378-0000000000003898412
> 2021-11-22 20:36:44,657 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: Gap in transactions. Expected to be able to read up 
> until at least txid 2510934 but unable to find any edit logs containing txid 
> 2510933
>     at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1578)
>     at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1536)
>     at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:652)
>     at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:294)
>     at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:976)
>     at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
>     at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
>     at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
>     at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:812)
>     at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:796)
>     at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1493)
>     at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
> 2021-11-22 20:36:44,660 INFO org.mortbay.log: Stopped 
> HttpServer2$selectchannelconnectorwithsafestar...@pro-hadoop-dc01-057133.vm.dc01.hellocloud.tech:50070
> 2021-11-22 20:36:44,760 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics 
> system...
> 2021-11-22 20:36:44,761 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system 
> stopped.
> 2021-11-22 20:36:44,761 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system 
> shutdown complete.
> 2021-11-22 20:36:44,761 ERROR 
> org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
> Old version: 2.7.3
> New version: 3.2.2
> Steps to reproduce:
> Step 1: Start NN1 as active and NN2 as standby.
> Step 2: Run "hdfs dfsadmin -rollingUpgrade prepare".
> Step 3: Start NN2 as active and NN1 as standby with the rolling upgrade 
> started option.
> Step 4: Restart the DN in upgrade mode as well.
> Step 5: Restart the JournalNode with the new Hadoop version.
> Step 6: Wait a few days.
> Step 7: Bring down both NNs, the JournalNode, and the DN.
> Step 8: Start the JN with the old version.
> Step 9: Start NN1 with the rolling upgrade rollback option. The NN fails to 
> start with the error above, because the edit log containing txid 2510933 
> has already been deleted by the checkpoint mechanism.
>  
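
For illustration only, the failure in the log above can be reproduced with a small stand-alone model of a gap check. This is a rough sketch, not the actual FSEditLog.checkForGaps code: the class EditLogGapSketch, the Segment type, and firstMissingTxId are hypothetical names. The segment ranges and the two txids in the exception are taken from the log; treating 2510933 as the first txid the NameNode needs after loading the rollback image is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of an edit log gap check; not the Hadoop implementation. */
class EditLogGapSketch {

    /** A finalized edit log segment covering txids firstTxId..lastTxId inclusive. */
    static final class Segment {
        final long firstTxId;
        final long lastTxId;

        Segment(long firstTxId, long lastTxId) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
        }
    }

    /**
     * Returns the first txid in [fromTxId, toAtLeastTxId] that no segment
     * covers, or -1 if the whole range is covered without a hole.
     */
    static long firstMissingTxId(List<Segment> segments, long fromTxId, long toAtLeastTxId) {
        List<Segment> sorted = new ArrayList<>(segments);
        sorted.sort(Comparator.comparingLong((Segment s) -> s.firstTxId));

        long next = fromTxId; // next txid that still has to be found
        for (Segment s : sorted) {
            if (s.lastTxId < next) {
                continue; // segment ends before the range we need
            }
            if (s.firstTxId > next) {
                break; // hole: nothing covers 'next'
            }
            next = s.lastTxId + 1;
            if (next > toAtLeastTxId) {
                return -1; // required range fully covered
            }
        }
        return next > toAtLeastTxId ? -1 : next;
    }

    public static void main(String[] args) {
        // Segments visible in the log: the finalized local file and the
        // segment reported by QuorumJournalManager.
        List<Segment> available = new ArrayList<>();
        available.add(new Segment(3898378L, 3898412L));
        available.add(new Segment(3906965L, 3906965L));

        // Range named in the exception message (assumed to start right
        // after the rollback image).
        long missing = firstMissingTxId(available, 2510933L, 2510934L);
        System.out.println("first missing txid: " + missing); // prints 2510933
    }
}
```

In this model the oldest surviving segment starts at txid 3898378, so any check that starts from 2510933 reports a gap immediately, which matches the exception in the log.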



--
This message was sent by Atlassian Jira
(v8.20.1#820001)