[ 
https://issues.apache.org/jira/browse/HDFS-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694490#comment-13694490
 ] 

Chris Nauroth commented on HDFS-4923:
-------------------------------------

{quote}
In general though, I'm in favor of giving the admin flexibility.
{quote}

Agreed.  I do think some environments will continue to prefer checkpoint on 
startup (slow startup) over checkpoint on shutdown (slow shutdown).

For example, I once worked in a virtualization infrastructure that would 
time out and kill any VM (virtually pulling the plug) that spent more than 5 
minutes in normal shutdown.  If namenode shutdown were tied to SysV init 
"service stop" scripts in an environment like this, then a checkpoint taking 
longer than 5 minutes on shutdown would not be helpful.  The infrastructure 
would just kill the VM, and then we'd need to checkpoint on the next startup 
anyway.  The final result would be a longer total restart time for that VM.
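
To make the arithmetic concrete, here is a toy model of one stop/start cycle.  
It is only a sketch: the function name and parameters are hypothetical, and it 
assumes that a checkpoint costs the same on shutdown as on startup and that a 
checkpoint killed partway through leaves nothing reusable.  Neither assumption 
is taken from the namenode code.

```python
def checkpoint_cost_seconds(checkpoint_s, kill_timeout_s, on_shutdown):
    """Seconds spent on checkpointing across one stop/start cycle (toy model).

    checkpoint_s   -- assumed time to complete one checkpoint
    kill_timeout_s -- how long the infrastructure waits before killing the VM
    on_shutdown    -- True to checkpoint at shutdown, False to do it at startup
    """
    if not on_shutdown:
        # Fast shutdown; the whole checkpoint is paid once, at startup.
        return checkpoint_s
    if checkpoint_s <= kill_timeout_s:
        # The checkpoint finishes before the deadline, so startup has
        # nothing left to do.
        return checkpoint_s
    # The VM is killed mid-checkpoint: the time up to the kill is wasted,
    # and the full checkpoint still has to run on the next startup.
    return kill_timeout_s + checkpoint_s
```

With a 7-minute checkpoint and a 5-minute kill timeout, checkpoint on shutdown 
costs 300 + 420 = 720 seconds in this model, versus 420 seconds for checkpoint 
on startup.  If the checkpoint fits inside the timeout, the two options cost 
the same and only differ in where the delay lands.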

                
> Save namespace when the namenode is stopped
> -------------------------------------------
>
>                 Key: HDFS-4923
>                 URL: https://issues.apache.org/jira/browse/HDFS-4923
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>
> In rare instances the namenode fails to load the editlog during startup due 
> to corruption. The impact is more severe if the editlog segment to be 
> checkpointed is corrupt, because checkpointing fails when the corrupt editlog 
> cannot be consumed. If an administrator does not notice this and address it 
> by saving the namespace, recovering the namespace would involve complex file 
> editing, using previous backups, or losing the last set of modifications.
> The other issue, which also happens frequently, is that checkpointing fails 
> and has not happened for a long time, resulting in long editlogs and even 
> corrupt editlogs.
> To handle these issues, when the namenode is stopped, we can put it in 
> safemode and save the namespace before the process is shut down. As an added 
> benefit, the namenode restart would be faster, given there is no editlog to 
> consume.
> What do folks think?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
