[ https://issues.apache.org/jira/browse/HDFS-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694490#comment-13694490 ]
Chris Nauroth commented on HDFS-4923:
-------------------------------------
{quote}
In general though, I'm in favor of giving the admin flexibility.
{quote}
Agreed. I do think some environments will continue to prefer checkpoint on
startup (slow startup) over checkpoint on shutdown (slow shutdown).
For example, I once worked in a virtualization infrastructure that would
time out and kill any VM (virtually pulling the plug) that spent more than 5
minutes in normal shutdown. If namenode shutdown were tied to SysV init
"service stop" scripts in an environment like this, then a checkpoint taking
longer than 5 minutes on shutdown would not be helpful. The infrastructure
would just kill the VM, and then we'd need to checkpoint on the next startup
anyway. The final result would be a longer total restart time for that VM.
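
To make the arithmetic concrete, here is a rough sketch of that scenario as a toy model. The function name and parameters are hypothetical, and it counts only time spent on checkpoint work, ignoring every other cost of stopping and starting the namenode.

```python
def total_restart_seconds(checkpoint_s, kill_timeout_s, checkpoint_on_shutdown):
    """Seconds spent on checkpoint work across one stop/start cycle (toy model)."""
    if not checkpoint_on_shutdown:
        # Checkpoint on startup: the work is done once, at start.
        return checkpoint_s
    if checkpoint_s <= kill_timeout_s:
        # The checkpoint finishes before the infrastructure's deadline.
        return checkpoint_s
    # Deadline hit: the VM is killed mid-checkpoint, the partial work is lost,
    # and the full checkpoint is repeated on the next startup.
    return kill_timeout_s + checkpoint_s


# A 7-minute checkpoint against a 5-minute kill timeout.
print(total_restart_seconds(420, 300, checkpoint_on_shutdown=True))
print(total_restart_seconds(420, 300, checkpoint_on_shutdown=False))
```

Under these assumptions, checkpoint on shutdown only costs more when the checkpoint outlasts the kill timeout, which is the case a configurable option would let the admin avoid.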
> Save namespace when the namenode is stopped
> -------------------------------------------
>
> Key: HDFS-4923
> URL: https://issues.apache.org/jira/browse/HDFS-4923
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.0.0
> Reporter: Suresh Srinivas
> Assignee: Suresh Srinivas
>
> In rare instances the namenode fails to load the editlog during startup
> because of corruption. The impact is more severe if the editlog segment to be
> checkpointed is corrupt, because checkpointing fails when the corrupt editlog
> cannot be consumed. If an administrator does not notice this and address it
> by saving the namespace, recovering the namespace would involve complex file
> editing, using previous backups, or losing the last set of modifications.
> Another issue that happens frequently is that checkpointing fails and then
> does not happen for a long time, resulting in long editlogs and even corrupt
> editlogs.
> To handle these issues, when the namenode is stopped, we can put it in
> safemode and save the namespace before the process is shut down. As an added
> benefit, the namenode restart would be faster, given there is no editlog to consume.
> What do folks think?
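
For reference, the manual equivalent of the proposed stop-time behavior can be sketched with the existing `hdfs dfsadmin -safemode enter` and `hdfs dfsadmin -saveNamespace` commands. This is not the patch for this issue. It is a minimal sketch of a wrapper that a stop script might call, and the function name, the 300-second default, and the injectable `run` parameter are all assumptions made for illustration.

```python
import subprocess

# Existing HDFS admin commands; saveNamespace requires the namenode to be in safemode.
STOP_SEQUENCE = [
    ["hdfs", "dfsadmin", "-safemode", "enter"],
    ["hdfs", "dfsadmin", "-saveNamespace"],
]


def save_namespace_before_stop(timeout_s=300, run=subprocess.run):
    """Run each step in order; return True only if every step finished in time.

    timeout_s applies to each command separately (hypothetical default).
    """
    for cmd in STOP_SEQUENCE:
        try:
            run(cmd, check=True, timeout=timeout_s)
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
            # Give up and let the stop proceed; edits are replayed at next startup.
            return False
    return True
```

Returning False instead of raising lets the caller continue with a normal stop when the checkpoint cannot finish within the time allowed.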