[jira] [Commented] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999378#comment-16999378 ]

Hudson commented on HDFS-15012:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17773 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17773/])
HDFS-15012. NN fails to parse Edit logs after applying HDFS-13101. (shashikant: rev fdd96e46d1f89f0ecdb9b1836dc7fca9fbb954fd)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectoryWithSnapshotFeature.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestRenameWithSnapshots.java

> NN fails to parse Edit logs after applying HDFS-13101
> -----------------------------------------------------
>
>                 Key: HDFS-15012
>                 URL: https://issues.apache.org/jira/browse/HDFS-15012
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: nn
>            Reporter: Eric Lin
>            Assignee: Shashikant Banerjee
>            Priority: Blocker
>              Labels: release-blocker
>         Attachments: HDFS-15012.000.patch, HDFS-15012.001.patch
>
>
> After applying HDFS-13101, and deleting and creating large number of
> snapshots, SNN exited with below error:
>
> {code:sh}
> 2019-11-18 08:28:06,528 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file, snapshotName=distcp-3479-31-old, RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, RpcCallId=1]
> java.lang.AssertionError: Element already exists: element=partition_isactive=true, DELETED=[partition_isactive=true]
> at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
> at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
> at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332)
> at org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
>
[jira] [Commented] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998407#comment-16998407 ]

Arpit Agarwal commented on HDFS-15012:
--------------------------------------

+1 for the updated patch.

> NN fails to parse Edit logs after applying HDFS-13101
> -----------------------------------------------------
>
>                 Key: HDFS-15012
>                 URL: https://issues.apache.org/jira/browse/HDFS-15012
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: nn
>            Reporter: Eric Lin
>            Assignee: Shashikant Banerjee
>            Priority: Blocker
>              Labels: release-blocker
>         Attachments: HDFS-15012.000.patch, HDFS-15012.001.patch
>
>
> After applying HDFS-13101, and deleting and creating large number of
> snapshots, SNN exited with below error:
>
> {code:sh}
> 2019-11-18 08:28:06,528 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file, snapshotName=distcp-3479-31-old, RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, RpcCallId=1]
> java.lang.AssertionError: Element already exists: element=partition_isactive=true, DELETED=[partition_isactive=true]
> at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
> at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
> at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332)
> at org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
> {code}
> We confirmed that fsimage and edit files were NOT corrupted, as reverting
> HDFS-13101 fixed the issue. So the logic introduced in HDFS-13101 is broken
> and failed to parse edit log files.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For
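The `AssertionError` in the trace above is raised when a name is inserted into a diff's DELETED list that already contains it. The following is a minimal sketch of that kind of guard on a sorted list. `SortedDiffList` and its methods are hypothetical names chosen for illustration, and this is not the actual `org.apache.hadoop.hdfs.util.Diff` code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical, simplified stand-in for a sorted "deleted" list; not Hadoop code. */
public class SortedDiffList {
    private final List<String> deleted = new ArrayList<>();

    /** Records name as deleted, keeping the list sorted; fails on a duplicate. */
    public void delete(String name) {
        int i = Collections.binarySearch(deleted, name);
        if (i >= 0) {
            // Message shaped like the one in the log; exact wording is an assumption.
            throw new AssertionError("Element already exists: element=" + name
                + ", DELETED=" + deleted);
        }
        // binarySearch returns (-(insertion point) - 1) when the key is absent.
        deleted.add(-i - 1, name);
    }

    /** Read-only view of the sorted deleted list. */
    public List<String> getDeleted() {
        return Collections.unmodifiableList(deleted);
    }

    public static void main(String[] args) {
        SortedDiffList diff = new SortedDiffList();
        diff.delete("partition_isactive=true");
        try {
            // Recording the same name a second time trips the guard.
            diff.delete("partition_isactive=true");
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the failure only means the same name was recorded twice in one list. It says nothing about why the real diffs ended up in that state, which is what the patches on this issue address.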
[jira] [Commented] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994397#comment-16994397 ]

Hadoop QA commented on HDFS-15012:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 44s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 29s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 13s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 33s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}102m 24s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}164m 6s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:e573ea49085 |
| JIRA Issue | HDFS-15012 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12988642/HDFS-15012.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 5b7f69772808 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 93bb368 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-HDFS-Build/28513/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28513/testReport/ |
| Max. process+thread count | 2814 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28513/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically
[jira] [Commented] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994244#comment-16994244 ]

Shashikant Banerjee commented on HDFS-15012:
--------------------------------------------

Thanks [~szetszwo]. Patch v1 addresses the checkstyle issues.
[jira] [Commented] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991695#comment-16991695 ]

Tsz-wo Sze commented on HDFS-15012:
-----------------------------------

+1 the 000 patch looks good.
[jira] [Commented] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990206#comment-16990206 ]

Hadoop QA commented on HDFS-15012:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 4s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 53s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 141 unchanged - 0 fixed = 143 total (was 141) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 4s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}117m 33s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 47s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}187m 58s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes |
| | hadoop.hdfs.web.TestWebHDFSAcl |
| | hadoop.hdfs.web.TestWebHDFSForHA |
| | hadoop.hdfs.server.namenode.TestAuditLogs |
| | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
| | hadoop.hdfs.server.namenode.TestFileTruncate |
| | hadoop.hdfs.web.TestWebHDFS |
| | hadoop.hdfs.server.namenode.TestFSEditLogLoader |
| | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics |
| | hadoop.hdfs.server.namenode.TestLeaseManager |
| | hadoop.hdfs.TestFileChecksum |
| | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery |
| | hadoop.hdfs.server.namenode.TestPersistentStoragePolicySatisfier |
| | hadoop.hdfs.web.TestWebHDFSXAttr |
| | hadoop.hdfs.TestEncryptedTransfer |
| | hadoop.hdfs.web.TestWebHdfsTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-15012 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12987780/HDFS-15012.000.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux
[jira] [Commented] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987183#comment-16987183 ]

Ayush Saxena commented on HDFS-15012:
-------------------------------------

HDFS-13101 fixes FSImage corruption, which also seems to be a critical bug fix. Won't reverting it leave us vulnerable to that again? Moreover, it has already been released as part of lower versions. Reverting would solve this problem but would reopen the older one.
IMO reverting isn't going to give any big relief. Anyway, there is no release planned soon, so you can all take some more time to find the solution. If you have any more pointers on how to reproduce the problem, let us know and we can try to help too.
[jira] [Commented] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987164#comment-16987164 ]

Wei-Chiu Chuang commented on HDFS-15012:
----------------------------------------

We are actively working on this. However, given the nature of the problem, it is not easy to reproduce in a unit test.
To be on the safe side, I would suggest reverting HDFS-13101 for now.

Bumping the priority to Blocker and adding the release-blocker label to this jira.

> NN fails to parse Edit logs after applying HDFS-13101
> -----------------------------------------------------
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: nn
> Reporter: Eric Lin
> Assignee: Shashikant Banerjee
> Priority: Blocker
> Labels: release-blocker
>
> After applying HDFS-13101, and deleting and creating large number of
> snapshots, SNN exited with below error:
>
> {code:sh}
> 2019-11-18 08:28:06,528 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file, snapshotName=distcp-3479-31-old, RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, RpcCallId=1]
> java.lang.AssertionError: Element already exists: element=partition_isactive=true, DELETED=[partition_isactive=true]
>     at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
>     at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
>     at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
>     at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240)
>     at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250)
>     at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755)
>     at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
>     at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
>     at org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332)
>     at org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583)
>     at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760)
>     at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
>     at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
>     at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235)
>     at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259)
>     at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
> {code}
> We confirmed that fsimage and edit files were NOT corrupted, as reverting
> HDFS-13101 fixed the issue. So the logic introduced in HDFS-13101 is broken
> and failed to parse edit log files.
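A note on reading the trace in the issue description: the top two frames show Diff.delete calling Diff.insert, which throws the AssertionError because the element is already present in the DELETED list. The following is only a minimal sketch of that kind of check, not the Hadoop source. It assumes the list is kept sorted and probed with a binary search, and the class name DeletedListSketch and its single method are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical, simplified stand-in for a sorted "deleted" list; NOT the Hadoop Diff class. */
public class DeletedListSketch {
    // Assumption: entries are kept in sorted order so binary search can be used.
    private final List<String> deleted = new ArrayList<>();

    /** Records a deletion at its sorted position and rejects an element that is already recorded. */
    public void delete(String element) {
        // binarySearch returns an index >= 0 when the key is found,
        // and -(insertionPoint) - 1 when it is not.
        int i = Collections.binarySearch(deleted, element);
        if (i >= 0) {
            throw new AssertionError("Element already exists: element=" + element
                + ", DELETED=" + deleted);
        }
        deleted.add(-i - 1, element);
    }

    public static void main(String[] args) {
        DeletedListSketch diff = new DeletedListSketch();
        diff.delete("partition_isactive=true");
        try {
            // Recording the same deletion a second time trips the check.
            diff.delete("partition_isactive=true");
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the second delete of the same name fails with a message of the same shape as the one in the log. What it suggests, without proving anything about the real code path, is that during DeleteSnapshotOp replay the same child was recorded as deleted twice while diffs were being combined.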
[jira] [Commented] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985456#comment-16985456 ]

Eric Lin commented on HDFS-15012:
---------------------------------

[~surendrasingh], I am a support engineer, not a developer. We do have fsimage and edit files to reproduce the issue.

[~weichiu] & [~shashikant], can you please help to develop a patch to reproduce the issue?

Thanks

> NN fails to parse Edit logs after applying HDFS-13101
> -----------------------------------------------------
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: nn
> Reporter: Eric Lin
> Assignee: Shashikant Banerjee
> Priority: Critical

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982748#comment-16982748 ]

Surendra Singh Lilhore commented on HDFS-15012:
-----------------------------------------------

[~ericlin], can you attach a test patch to reproduce this issue?

> NN fails to parse Edit logs after applying HDFS-13101
> -----------------------------------------------------
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: nn
> Reporter: Eric Lin
> Priority: Critical

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982314#comment-16982314 ]

Eric Lin commented on HDFS-15012:
---------------------------------

Thanks [~weichiu], hopefully we can nail it down soon.

> NN fails to parse Edit logs after applying HDFS-13101
> -----------------------------------------------------
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: nn
> Reporter: Eric Lin
> Priority: Critical
[jira] [Commented] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982106#comment-16982106 ]

Wei-Chiu Chuang commented on HDFS-15012:
----------------------------------------

We recently received several reports of corrupt fsimage or edit logs that fail to apply due to snapshot-related operations. They happened on versions of HDFS that include HDFS-13101, which was supposed to kill snapshot bugs once and for all. However, it didn't.

We are actively investigating, but so far we have no solution. [~shashikant] [~sodonnell] FYI.

> NN fails to parse Edit logs after applying HDFS-13101
> -----------------------------------------------------
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: nn
> Reporter: Eric Lin
> Priority: Critical
[jira] [Commented] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982090#comment-16982090 ]

Ayush Saxena commented on HDFS-15012:
-------------------------------------

Thanks [~ericlin] for the report. Do you have a fix to propose for this?
[~shashikanth] [~weichiu] Please take a look.

> NN fails to parse Edit logs after applying HDFS-13101
> -----------------------------------------------------
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: nn
> Reporter: Eric Lin
> Priority: Major