[ 
https://issues.apache.org/jira/browse/HDFS-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555300#comment-14555300
 ] 

Allen Wittenauer commented on HDFS-7991:
----------------------------------------

bq. Then if we also let this java program send out the checkpoint check 
command, and considering our current RPC already has the capability to handle 
timeout and retry, I guess we can directly utilize the current saveNamespace 
RPC?

I would keep it simple:  shutdown also triggers the logic that decides whether 
a checkpoint is necessary.  There's zero value in "waiting" for the helper app 
to trigger it. This also means the helper app is extremely simple:  an 
unauthenticated call that does "is checkpoint still happening? Is checkpoint 
still happening? What about now? Are we down yet Papa Smurf?"  This way we also 
fix [~sureshms]'s issue:

bq. Blindly sending kill -9 is not an option in my opinion. 

That's why it's not blind.  The helper app's *sole* purpose should be to 
provide the hint to the shell code if things are so screwed up that kill -9 is 
the only way out.  This way all of the key, important logic is in Java code and 
the one thing the Java code probably shouldn't do (kill) is left to the shell 
code.
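
To make the split concrete, here is a rough sketch of what such a helper could look like. Everything in it is an assumption for illustration: the class name, the exit codes, and the {{stillBusy}} probe (a stand-in for the unauthenticated "is checkpoint still happening?" call) are hypothetical, not part of any patch on this issue.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: polls until the NameNode is down, or gives up and
// hands a hint back to the shell script. It never kills anything itself.
class ShutdownHint {
    static final int EXIT_CLEAN = 0;  // NameNode went down on its own
    static final int EXIT_STUCK = 1;  // hint to the shell: kill -9 may be the only way out

    // stillBusy stands in for the unauthenticated status call (assumption).
    static int waitForShutdown(BooleanSupplier stillBusy, int maxPolls, long sleepMillis) {
        for (int i = 0; i < maxPolls; i++) {
            if (!stillBusy.getAsBoolean()) {
                return EXIT_CLEAN;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Could not confirm a clean stop; restore the flag and report stuck.
                Thread.currentThread().interrupt();
                return EXIT_STUCK;
            }
        }
        return EXIT_STUCK;
    }

    public static void main(String[] args) {
        // Stand-in probe that reports "still busy" three times, then "down".
        int[] remaining = {3};
        int hint = waitForShutdown(() -> remaining[0]-- > 0, 10, 100L);
        System.out.println(hint == EXIT_CLEAN ? "clean stop" : "stuck, shell may escalate");
    }
}
```

In a real script the hint would presumably be the process exit status, and the shell code would decide on kill -9 only when it sees the "stuck" value.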

bq. Instead of emphasizing namenode stop functionality works, I would rather 
see save namespace work.

To the person who isn't looking at the code, these are effectively one and the 
same. If I'm stopping the namenode, I expect it to do what is necessary to come 
back up in a sane state.  Why should an admin have to make the decision here 
when the NN itself knows the state best?  Telling me to run save namespace is 
dumb:  "Why didn't you just do it yourself, you stupid program?" :D

bq.  Isn't there an environment variable that enables this functionality? For 
folks who want stop to not save namespace or a different behavior, it can be 
used to go back to the previous behavior, right?

The # of times this is going to be needed should approach zero... and in those 
cases, a Java property (or properties!) is *way* better.  Some clueless person 
is going to tell others "Hey, set this to make your system shut down faster."  
The Java apps can read the properties and do whatever is needed/desired.  This 
also means they can prompt to say "are you sure?" because this is the type of 
operation (shutdown w/out checkpoint) that sounds like it should never happen 
in an automated way.
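
As a rough illustration of the property-plus-prompt idea, assuming a made-up property name ({{example.stop.skip.checkpoint}} is hypothetical, as is the class), the Java side could look something like this. Without an interactive console there is nobody to answer the prompt, so the sketch falls back to checkpointing as usual.

```java
import java.io.Console;
import java.util.Properties;

// Hypothetical sketch: skip the checkpoint only if a Java property asks for it
// AND an operator explicitly confirms.
class SkipCheckpointOption {
    // Made-up property name, for illustration only.
    static final String SKIP_KEY = "example.stop.skip.checkpoint";

    static boolean shouldSkipCheckpoint(Properties props, String operatorAnswer) {
        boolean requested = Boolean.parseBoolean(props.getProperty(SKIP_KEY, "false"));
        if (!requested) {
            return false;
        }
        // No answer (e.g. no console, automated run) means "do not skip".
        return operatorAnswer != null && operatorAnswer.trim().equalsIgnoreCase("yes");
    }

    public static void main(String[] args) {
        Properties props = System.getProperties();
        Console console = System.console();  // null when not run interactively
        String answer = null;
        if (Boolean.parseBoolean(props.getProperty(SKIP_KEY, "false")) && console != null) {
            answer = console.readLine("Stop without a checkpoint. Are you sure? (yes/no) ");
        }
        System.out.println(shouldSkipCheckpoint(props, answer)
                ? "skipping checkpoint"
                : "checkpointing as usual");
    }
}
```

Run with {{-Dexample.stop.skip.checkpoint=true}} to see the prompt; an automated run without a console never gets to skip.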

> Allow users to skip checkpoint when stopping NameNode
> -----------------------------------------------------
>
>                 Key: HDFS-7991
>                 URL: https://issues.apache.org/jira/browse/HDFS-7991
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-7991.000.patch, HDFS-7991.001.patch, 
> HDFS-7991.002.patch, HDFS-7991.003.patch, HDFS-7991.004.patch
>
>
> This is a follow-up jira of HDFS-6353. HDFS-6353 adds the functionality to 
> check if saving namespace is necessary before stopping namenode. As [~kihwal] 
> pointed out in this 
> [comment|https://issues.apache.org/jira/browse/HDFS-6353?focusedCommentId=14380898&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14380898],
>  in a secured cluster this new functionality requires the user to be kinit'ed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
