[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated HDFS-13596: - Fix Version/s: 3.1.3 3.2.1 3.3.0 Hadoop Flags: Reviewed Target Version/s: 3.1.2, 3.2.0, 2.10.0, 3.3.0 (was: 2.10.0, 3.2.0, 3.1.2, 3.3.0) Resolution: Fixed Status: Resolved (was: Patch Available) > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Fei Hui >Priority: Blocker > Fix For: 3.3.0, 3.2.1, 3.1.3 > > Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, > HDFS-13596.003.patch, HDFS-13596.004.patch, HDFS-13596.005.patch, > HDFS-13596.006.patch, HDFS-13596.007.patch, HDFS-13596.008.patch, > HDFS-13596.009.patch, HDFS-13596.010.patch > > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) >
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-13596: --- Target Version/s: 3.1.2, 3.2.0, 2.10.0, 3.3.0 (was: 2.10.0) > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Fei Hui >Priority: Blocker > Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, > HDFS-13596.003.patch, HDFS-13596.004.patch, HDFS-13596.005.patch, > HDFS-13596.006.patch, HDFS-13596.007.patch, HDFS-13596.008.patch, > HDFS-13596.009.patch, HDFS-13596.010.patch > > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNo
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Hui updated HDFS-13596: --- Attachment: HDFS-13596.010.patch > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Fei Hui >Priority: Blocker > Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, > HDFS-13596.003.patch, HDFS-13596.004.patch, HDFS-13596.005.patch, > HDFS-13596.006.patch, HDFS-13596.007.patch, HDFS-13596.008.patch, > HDFS-13596.009.patch, HDFS-13596.010.patch > > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.se
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Hui updated HDFS-13596: --- Attachment: HDFS-13596.009.patch > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Fei Hui >Priority: Blocker > Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, > HDFS-13596.003.patch, HDFS-13596.004.patch, HDFS-13596.005.patch, > HDFS-13596.006.patch, HDFS-13596.007.patch, HDFS-13596.008.patch, > HDFS-13596.009.patch > > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HDFS-13596: --- Target Version/s: 2.10.0 > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Fei Hui >Priority: Blocker > Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, > HDFS-13596.003.patch, HDFS-13596.004.patch, HDFS-13596.005.patch, > HDFS-13596.006.patch, HDFS-13596.007.patch, HDFS-13596.008.patch > > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > a
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HDFS-13596: --- Priority: Blocker (was: Critical) > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Fei Hui >Priority: Blocker > Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, > HDFS-13596.003.patch, HDFS-13596.004.patch, HDFS-13596.005.patch, > HDFS-13596.006.patch, HDFS-13596.007.patch, HDFS-13596.008.patch > > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Hui updated HDFS-13596: --- Attachment: HDFS-13596.008.patch > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Fei Hui >Priority: Critical > Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, > HDFS-13596.003.patch, HDFS-13596.004.patch, HDFS-13596.005.patch, > HDFS-13596.006.patch, HDFS-13596.007.patch, HDFS-13596.008.patch > > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) >
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Hui updated HDFS-13596: --- Attachment: HDFS-13596.007.patch > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Fei Hui >Priority: Critical > Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, > HDFS-13596.003.patch, HDFS-13596.004.patch, HDFS-13596.005.patch, > HDFS-13596.006.patch, HDFS-13596.007.patch > > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.h
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Hui updated HDFS-13596: --- Attachment: HDFS-13596.006.patch > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Fei Hui >Priority: Critical > Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, > HDFS-13596.003.patch, HDFS-13596.004.patch, HDFS-13596.005.patch, > HDFS-13596.006.patch > > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.Na
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Hui updated HDFS-13596: --- Attachment: HDFS-13596.005.patch > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Fei Hui >Priority: Critical > Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, > HDFS-13596.003.patch, HDFS-13596.004.patch, HDFS-13596.005.patch > > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Hui updated HDFS-13596: --- Attachment: HDFS-13596.004.patch > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Fei Hui >Priority: Critical > Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, > HDFS-13596.003.patch, HDFS-13596.004.patch > > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Hui updated HDFS-13596: --- Attachment: HDFS-13596.003.patch > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Fei Hui >Priority: Critical > Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, > HDFS-13596.003.patch > > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.nam
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Hui updated HDFS-13596: --- Attachment: (was: HDFS-13596.003.patch) > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Fei Hui >Priority: Critical > Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch > > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Hui updated HDFS-13596: --- Attachment: HDFS-13596.003.patch > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Fei Hui >Priority: Critical > Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, > HDFS-13596.003.patch > > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.nam
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Hui updated HDFS-13596: --- Status: Patch Available (was: Open) > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Fei Hui >Priority: Critical > Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch > > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.create
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Hui updated HDFS-13596: --- Status: Open (was: Patch Available) > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Fei Hui >Priority: Critical > Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch > > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.create
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Hui updated HDFS-13596: --- Attachment: HDFS-13596.002.patch > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Fei Hui >Priority: Critical > Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch > > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createName
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Hui updated HDFS-13596: --- Status: Patch Available (was: In Progress) > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Fei Hui >Priority: Critical > Attachments: HDFS-13596.001.patch > > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNo
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Hui updated HDFS-13596: --- Attachment: HDFS-13596.001.patch > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Zsolt Venczel >Priority: Critical > Attachments: HDFS-13596.001.patch > > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.ja
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-13596: - Priority: Critical (was: Blocker) > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Zsolt Venczel >Priority: Critical > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-13596: - Target Version/s: (was: 3.2.0, 3.0.4, 3.1.2) > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Zsolt Venczel >Priority: Blocker > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.ap
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated HDFS-13596: -- Target Version/s: 3.2.0, 3.0.4, 3.1.2 (was: 3.2.0, 3.1.1, 3.0.4) > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Zsolt Venczel >Priority: Blocker > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-13596: -- Target Version/s: 3.2.0, 3.1.1, 3.0.4 > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Priority: Blocker > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameN
[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated HDFS-13596: - Priority: Blocker (was: Critical) > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Priority: Blocker > > After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to log, NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode