[ https://issues.apache.org/jira/browse/HDFS-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694490#comment-13694490 ]
Chris Nauroth commented on HDFS-4923:
-------------------------------------

{quote}
In general though, I'm in favor of giving the admin flexibility.
{quote}

Agreed. I do think some environments will continue to prefer checkpoint on startup (slow startup) over checkpoint on shutdown (slow shutdown). For example, I once worked in a virtualization infrastructure that would time out and kill any VM (virtually pulling the plug) that spent more than 5 minutes in normal shutdown. If namenode shutdown were tied to SysV init "service stop" scripts in an environment like this, then a checkpoint taking longer than 5 minutes on shutdown would not be helpful. The infrastructure would just kill the VM, and then we'd need to checkpoint on the next startup anyway. The final result would be a longer total restart time for that VM, because the time spent on the aborted shutdown checkpoint is wasted.

> Save namespace when the namenode is stopped
> -------------------------------------------
>
> Key: HDFS-4923
> URL: https://issues.apache.org/jira/browse/HDFS-4923
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.0.0
> Reporter: Suresh Srinivas
> Assignee: Suresh Srinivas
>
> In rare instances the namenode fails to load the editlog during startup
> due to corruption. The impact is more severe if the editlog segment to be
> checkpointed is corrupt, because checkpointing fails when the corrupt
> editlog cannot be consumed. If an administrator does not notice this and
> address it by saving the namespace, recovering the namespace would involve
> complex file editing, using previous backups, or losing the last set of
> modifications.
> The other issue, which also happens frequently, is that checkpointing fails
> and has not happened for a long time, resulting in long editlogs and even
> corrupt editlogs.
> To handle these issues, when the namenode is stopped, we can put it in
> safemode and save the namespace before the process is shut down. As an
> added benefit, the namenode restart would be faster, given there is no
> editlog to consume.
> What do folks think?