[ 
https://issues.apache.org/jira/browse/HDFS-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556496#comment-14556496
 ] 

Allen Wittenauer commented on HDFS-7991:
----------------------------------------

bq. Ideally when 2NN or standby is working. But we have had many issues where 
checkpointing is not done by SNN or standby, for the following reasons:

OK, so these are not new issues at all and have been around for literally years 
(a decade now?). We had it happen at Y! back in 2007, and it's a story I often 
tell during talks. 

bq. We need a way to be able to save namespace. 

Then fix the NN<->2NN relationship to provide better alerting when stuff goes 
wrong.  Hacking the shell code (and, yes, the code in branch-2 and the code in 
trunk are clearly hacks.  Heck, the branch-2 version doesn't even trigger if 
you are running the NN in non-daemon mode...) is completely the wrong thing to 
do.

... and, as has been pointed out, this hack does NOTHING to help in the case of 
hardware failure, when you want it most.

bq. Today operators who understand this situation do save namespace manually 
before stopping the namenode.

I don't think I can put enough lol's in here to express how many laughs this 
statement got from around the office. No, operators who understand this issue 
monitor the size of the edits file and the health of the 2NN and then act 
appropriately.  We don't do safemode->checkpoint->shutdown every time we bring 
down an NN.


> Allow users to skip checkpoint when stopping NameNode
> -----------------------------------------------------
>
>                 Key: HDFS-7991
>                 URL: https://issues.apache.org/jira/browse/HDFS-7991
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-7991-shellpart.patch, HDFS-7991.000.patch, 
> HDFS-7991.001.patch, HDFS-7991.002.patch, HDFS-7991.003.patch, 
> HDFS-7991.004.patch
>
>
> This is a follow-up jira of HDFS-6353. HDFS-6353 adds the functionality to 
> check if saving namespace is necessary before stopping namenode. As [~kihwal] 
> pointed out in this 
> [comment|https://issues.apache.org/jira/browse/HDFS-6353?focusedCommentId=14380898&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14380898],
>  in a secured cluster this new functionality requires the user to be kinit'ed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
