[jira] [Updated] (HDFS-9406) FSImage may get corrupted after deleting snapshot
[ https://issues.apache.org/jira/browse/HDFS-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yongjun Zhang updated HDFS-9406:
--------------------------------
    Fix Version/s: 2.8.0

> FSImage may get corrupted after deleting snapshot
> --------------------------------------------------
>
>                 Key: HDFS-9406
>                 URL: https://issues.apache.org/jira/browse/HDFS-9406
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>         Environment: CentOS 6 amd64, CDH 5.4.4-1
>                      2x CPU: Intel(R) Xeon(R) CPU E5-2640 v3
>                      Memory: 32GB
>                      Namenode blocks: ~700,000 blocks, no HA setup
>            Reporter: Stanislav Antic
>            Assignee: Yongjun Zhang
>             Fix For: 2.8.0, 2.7.3
>
>         Attachments: HDFS-9406.001.patch, HDFS-9406.002.patch, HDFS-9406.003.patch, HDFS-9406.branch-2.7.patch
>
>
> FSImage corruption happened after HDFS snapshots were taken. The cluster was not in use at that time.
> When the namenode restarted, it reported a NullPointerException:
> {code}
> 15/11/07 10:01:15 INFO namenode.FileJournalManager: Recovering unfinalized segments in /tmp/fsimage_checker_5857/fsimage/current
> 15/11/07 10:01:15 INFO namenode.FSImage: No edit log streams selected.
> 15/11/07 10:01:18 INFO namenode.FSImageFormatPBINode: Loading 1370277 INodes.
> 15/11/07 10:01:27 ERROR namenode.NameNode: Failed to start namenode.
> java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.addChild(INodeDirectory.java:531)
>         at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:252)
>         at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:202)
>         at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:261)
>         at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:180)
>         at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:929)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:913)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:732)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:668)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1061)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:765)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:643)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:810)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:794)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1487)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1553)
> 15/11/07 10:01:27 INFO util.ExitUtil: Exiting with status 1
> {code}
> Corruption happened after 07.11.2015 00:15, and after that time ~9300 blocks were invalidated that shouldn't have been.
> After recovering the FSImage I discovered that around 9300 blocks were missing.
> -I also attached the namenode log from before and after the corruption happened.-
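For context on the kind of operation sequence the report and the issue title describe, the sketch below is a minimal, purely illustrative Java client against the HDFS snapshot API: it creates snapshots, deletes one, and then forces a checkpoint so a new FSImage is written. The directory path, snapshot names, and the explicit saveNamespace step are assumptions for illustration, not the reporter's actual workload; the NullPointerException above surfaced only later, when the namenode tried to load an image written after such a sequence.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction;

public class SnapshotSequenceSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumes fs.defaultFS in the loaded configuration points at the affected cluster.
    DistributedFileSystem dfs =
        (DistributedFileSystem) new Path("/").getFileSystem(conf);

    Path dir = new Path("/data");        // hypothetical snapshottable directory
    dfs.allowSnapshot(dir);              // requires superuser privileges
    dfs.createSnapshot(dir, "s1");
    dfs.createSnapshot(dir, "s2");
    dfs.deleteSnapshot(dir, "s1");       // snapshot deletion is the suspected trigger

    // Persist the in-memory namespace to a new FSImage; in the report, the
    // NullPointerException appeared when such an image was loaded on restart.
    dfs.setSafeMode(SafeModeAction.SAFEMODE_ENTER);
    dfs.saveNamespace();
    dfs.setSafeMode(SafeModeAction.SAFEMODE_LEAVE);
  }
}
{code}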
[jira] [Updated] (HDFS-9406) FSImage may get corrupted after deleting snapshot
[ https://issues.apache.org/jira/browse/HDFS-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yongjun Zhang updated HDFS-9406:
--------------------------------
       Resolution: Fixed
     Hadoop Flags: Reviewed
    Fix Version/s: 2.7.3
           Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2, branch-2.8, branch-2.7.

{quote}
commit 34ab50ea92370cc7440a8f7649286b148c2fde65
Author: Yongjun Zhang
Date:   Mon Feb 1 11:23:44 2016 -0800

    HDFS-9406. FSImage may get corrupted after deleting snapshot. (Contributed by Jing Zhao, Stanislav Antic, Vinayakumar B, Yongjun Zhang)
{quote}

Many thanks to [~stanislav.an...@gmail.com], [~jingzhao], and [~vinayrpet] for the contribution, really nice community work!
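The fix versions noted above (2.7.3 and 2.8.0) are the first Apache releases carrying this commit. As a minimal sketch, assuming a plain "x.y.z"-style version string, the following hypothetical check compares the locally linked Hadoop build against those fix versions; vendor distributions may backport the fix independently of the Apache version number.

{code}
import org.apache.hadoop.util.VersionInfo;

public class Hdfs9406VersionCheck {
  public static void main(String[] args) {
    // e.g. "2.7.2" or "2.6.0-cdh5.4.4"; only the leading x.y.z is inspected.
    String version = VersionInfo.getVersion();
    String[] parts = version.split("[.-]");
    int major = Integer.parseInt(parts[0]);
    int minor = Integer.parseInt(parts[1]);
    int patch = parts.length > 2 ? Integer.parseInt(parts[2]) : 0;
    boolean hasFix = major > 2
        || (major == 2 && minor >= 8)
        || (major == 2 && minor == 7 && patch >= 3);
    System.out.println("Hadoop " + version
        + (hasFix ? " should include the HDFS-9406 fix"
                  : " predates the HDFS-9406 fix versions (2.7.3 / 2.8.0)"));
  }
}
{code}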
[jira] [Updated] (HDFS-9406) FSImage may get corrupted after deleting snapshot
[ https://issues.apache.org/jira/browse/HDFS-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yongjun Zhang updated HDFS-9406:
--------------------------------
    Attachment: HDFS-9406.branch-2.7.patch
[jira] [Updated] (HDFS-9406) FSImage may get corrupted after deleting snapshot
[ https://issues.apache.org/jira/browse/HDFS-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yongjun Zhang updated HDFS-9406:
--------------------------------
    Summary: FSImage may get corrupted after deleting snapshot  (was: FSImage corruption after taking snapshot)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)