[ 
https://issues.apache.org/jira/browse/HDFS-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555856#comment-14555856
 ] 

Vinayakumar B commented on HDFS-7991:
-------------------------------------

You can never be sure the machine will be up when the admin wants to run the 
stop command and trigger the checkpoint. Also, AFAIK, on some real, large 
clusters running the stop command at all is very rare, especially in cases 
where the standby is not available.

What if the machine itself goes down suddenly after running for months or 
years, with many millions of edits and no checkpoint? I have also seen cases 
where, due to overuse of open files/connections, I could not even open an SSH 
terminal to run a command.
In that case a restart of the NN will still take hours or days, depending on 
load. Then all the effort spent on the discussion in this Jira would go to 
waste.

Instead of doing everything at the end while stopping, why not implement a 
periodic check inside the Active NameNode itself to see whether a checkpoint 
is needed? This would be similar to {{FSNamesystem#NameNodeEditLogRoller}}, 
which was added to roll edits after reaching a threshold to avoid large edit 
logs. In fact, we can re-use that thread to check for the checkpoint as well, 
with a different interval. The interval could be a multiple of the configured 
checkpoint interval.
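
A minimal sketch of the decision such a periodic check might make, assuming 
the threshold is a multiple of the configured checkpoint period and 
transaction count. The class, method, and parameter names are hypothetical 
and are not existing HDFS code. The comments assume the stock defaults of 
{{dfs.namenode.checkpoint.period}} (3600 seconds) and 
{{dfs.namenode.checkpoint.txns}} (1000000).

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical policy object, not part of HDFS. Decides whether the Active
 * NameNode has gone too long without a checkpoint.
 */
final class SelfCheckpointPolicy {
    private final long overdueMillis; // multiplier * checkpoint period
    private final long overdueTxns;   // multiplier * checkpoint txn count

    SelfCheckpointPolicy(long checkpointPeriodSeconds, long checkpointTxns,
                         int multiplier) {
        if (multiplier < 1) {
            throw new IllegalArgumentException("multiplier must be >= 1");
        }
        // e.g. 3600 s * 3 = 3 hours, 1,000,000 txns * 3 = 3,000,000 txns
        this.overdueMillis = Math.multiplyExact(
            TimeUnit.SECONDS.toMillis(checkpointPeriodSeconds), (long) multiplier);
        this.overdueTxns = Math.multiplyExact(checkpointTxns, (long) multiplier);
    }

    /**
     * Returns true when either the elapsed time or the number of
     * un-checkpointed transactions has reached its threshold, meaning no
     * Standby/Secondary has checkpointed for several normal intervals.
     */
    boolean isCheckpointOverdue(long millisSinceLastCheckpoint,
                                long txnsSinceLastCheckpoint) {
        return millisSinceLastCheckpoint >= overdueMillis
            || txnsSinceLastCheckpoint >= overdueTxns;
    }
}
```

A roller-style thread could call {{isCheckpointOverdue()}} on each wake-up 
and trigger the save only when it returns true, so a healthy Standby keeps 
the Active NameNode from ever checkpointing itself.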

Anyway, doing a *checkpoint* in the Active NameNode is not a big deal. It is 
just saving the FsImage to all available disks. No heavy loading of edits is 
involved, as the namespace is already up to date. So the NN can even do this 
by just acquiring {{writeLock()}} instead of entering and leaving safemode. 
The external {{saveNamespace()}} RPC can still retain its current behaviour.
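
The locking idea can be illustrated with a small self-contained sketch. It 
assumes a toy in-memory namespace guarded by a read/write lock, and all names 
in it are hypothetical. It does not use the real {{FSNamesystem}} lock or 
image-saving code. It only shows that holding the write lock is enough to get 
a consistent image, while mutating operations wait.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a namespace guarded by a read/write lock. */
final class LockedImageSaver {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    private final List<String> namespace = new ArrayList<>(); // toy in-memory state

    /** A client mutation; blocks while an image save holds the write lock. */
    void mutate(String entry) {
        lock.writeLock().lock();
        try {
            namespace.add(entry);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /**
     * Saves an image under the write lock only, with no safemode transition.
     * The returned copy stands in for the FsImage written to disk.
     */
    List<String> saveImage() {
        lock.writeLock().lock();
        try {
            return new ArrayList<>(namespace); // consistent: no writer can interleave
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The cost is the one noted in this comment: client operations wait for the 
duration of the save.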

Since this problem can happen only if the Standby/Secondary NameNode has been 
unavailable for a long time, I feel it's okay for client operations to wait 
for {{saveNamespace()}} to finish.

Any thoughts?

> Allow users to skip checkpoint when stopping NameNode
> -----------------------------------------------------
>
>                 Key: HDFS-7991
>                 URL: https://issues.apache.org/jira/browse/HDFS-7991
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-7991-shellpart.patch, HDFS-7991.000.patch, 
> HDFS-7991.001.patch, HDFS-7991.002.patch, HDFS-7991.003.patch, 
> HDFS-7991.004.patch
>
>
> This is a follow-up jira of HDFS-6353. HDFS-6353 adds the functionality to 
> check if saving namespace is necessary before stopping namenode. As [~kihwal] 
> pointed out in this 
> [comment|https://issues.apache.org/jira/browse/HDFS-6353?focusedCommentId=14380898&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14380898],
>  in a secured cluster this new functionality requires the user to be kinit'ed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
