[ https://issues.apache.org/jira/browse/HDFS-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381749#comment-14381749 ]
Hudson commented on HDFS-6353:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #878 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/878/])
HDFS-6353. Check and make checkpoint before stopping the NameNode. Contributed by Jing Zhao. (jing9: rev 5e21e4ca377f68e030f8f3436cd93fd7a74dc5e0)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshotBlocksMap.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/UpgradeUtilities.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestStartup.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSaveNamespace.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSAclBaseTest.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeFile.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFetchImage.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientNamenodeProtocol.proto
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNNStorageRetentionFunctional.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestParallelImageWrite.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java
* hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/contrib/bkjournal/TestBootstrapStandbyWithBKJM.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogRace.java
Move HDFS-6353 to the trunk section in CHANGES.txt (jing9: rev 97e2aa2551338c9430b20fdd839deee49f6ea3c9)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Check and make checkpoint before stopping the NameNode
> ------------------------------------------------------
>
>                 Key: HDFS-6353
>                 URL: https://issues.apache.org/jira/browse/HDFS-6353
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Suresh Srinivas
>            Assignee: Jing Zhao
>             Fix For: 3.0.0
>
>         Attachments: HDFS-6353.000.patch, HDFS-6353.001.patch,
> HDFS-6353.002.branch-2.patch, HDFS-6353.002.branch-2.patch,
> HDFS-6353.002.patch
>
>
> One of the failure patterns I have seen is, in some rare circumstances, due
> to some inconsistency the secondary or standby fails to consume editlog.
> The only solution when this happens is to save the namespace at the current
> active namenode. But sometimes when this happens, an unsuspecting admin might
> end up restarting the namenode, which then requires a more complicated
> solution to the problem (such as ignoring the editlog record that cannot be
> consumed).
> How about adding the following functionality:
> When the checkpointer (standby or secondary) fails to consume the editlog, it
> lets the active namenode know about this failure, based on a configurable
> flag (on/off). The active namenode can then enter safemode and save the
> namespace. While in this type of safemode, the namenode UI also shows
> information about the checkpoint failure and that it is saving the namespace.
> Once the namespace is saved, the namenode can come out of safemode.
> This means service unavailability (even in an HA cluster). But it might be
> worth it to avoid long startup times or the need for other manual fixes.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
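
For readers skimming the thread, the flow proposed in the issue description (checkpointer reports a failure, the active NameNode enters safemode, saves the namespace, then leaves safemode) can be sketched as a toy state model. This is only an illustration under stated assumptions: `ActiveNameNodeModel`, `saveOnCheckpointFailure`, `onCheckpointFailure`, and the event strings are hypothetical names invented here, not Hadoop classes or configuration keys, and the sketch does not reflect what the committed patch actually does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Toy model of the flow proposed in the issue description.
 * Every name in this class is hypothetical; none of it is Hadoop API.
 */
class ActiveNameNodeModel {
    // Hypothetical on/off flag: react to checkpoint failures or ignore them.
    private final boolean saveOnCheckpointFailure;
    private boolean inSafeMode = false;
    private long currentTxId = 0;    // last transaction written to the edit log
    private long lastSavedTxId = 0;  // transaction covered by the last saved image
    private final List<String> events = new ArrayList<>();

    ActiveNameNodeModel(boolean saveOnCheckpointFailure) {
        this.saveOnCheckpointFailure = saveOnCheckpointFailure;
    }

    /** Records one namespace edit; rejected while in safemode. */
    void logEdit() {
        if (inSafeMode) {
            throw new IllegalStateException("namespace is read-only in safemode");
        }
        currentTxId++;
    }

    /** Called when the checkpointer (standby or secondary) cannot consume edits. */
    void onCheckpointFailure(String reason) {
        events.add("checkpoint-failed: " + reason);
        if (!saveOnCheckpointFailure) {
            return; // flag is off: nothing changes on the active side
        }
        inSafeMode = true;
        events.add("enter-safemode");
        lastSavedTxId = currentTxId; // stand-in for writing a new image
        events.add("save-namespace@" + lastSavedTxId);
        inSafeMode = false;          // leave only after the save completed
        events.add("leave-safemode");
    }

    boolean isInSafeMode() { return inSafeMode; }
    long getLastSavedTxId() { return lastSavedTxId; }
    List<String> getEvents() { return Collections.unmodifiableList(events); }
}
```

With the flag on, three calls to `logEdit()` followed by `onCheckpointFailure(...)` leave the model out of safemode with `getLastSavedTxId()` equal to 3. With the flag off, the failure is recorded and nothing else changes. The window between `enter-safemode` and `leave-safemode` is the service unavailability that the description mentions.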