[ https://issues.apache.org/jira/browse/HDFS-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555300#comment-14555300 ]
Allen Wittenauer commented on HDFS-7991:
----------------------------------------

bq. Then if we also let this java program send out the checkpoint check command, and considering our current RPC already has the capability to handle timeout and retry, I guess we can directly utilize the current saveNamespace RPC?

I would keep it simple: shutdown also triggers the logic that checks whether a checkpoint is necessary. There's zero value in "waiting" for the helper app to trigger it. This also means the helper app is extremely simple: an unauthenticated call that does "is checkpoint still happening? Is checkpoint still happening? What about now? Are we down yet Papa Smurf?"

This way we also fix [~sureshms]'s issue:

bq. Blindly sending kill -9 is not an option in my opinion.

That's why it's not blind. The helper app's *sole* purpose should be to provide the hint to the shell code if things are so screwed up that kill -9 is the only way out. This way all of the key, important logic is in Java code and the one thing the Java code probably shouldn't do (kill) is left to the shell code.

bq. Instead of emphasizing namenode stop functionality works, I would rather see save namespace work.

To the person who isn't looking at the code, these are effectively one and the same. If I'm stopping the namenode, I expect it to do what is necessary to come back up in a sane state. Why should an admin have to make the decision here when the NN itself knows the state best? Telling me to run save namespace is dumb: "Why didn't you just do it yourself, you stupid program?" :D

bq. Isn't there an environment variable that enables this functionality? For folks who want stop to not save namespace or a different behavior, it can be used to go back to the previous behavior, right?

The # of times this is going to be needed should approach zero... and in those cases, a Java property (or properties!) is *way* better. Some clueless person is going to tell others "Hey, set this to make your system shut down faster."
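
A rough sketch of the polling helper described above, under stated assumptions: the class name, the exit codes, and the `isStopped` probe are all hypothetical and not taken from any HDFS-7991 patch. The probe stands in for the unauthenticated "are we down yet?" call. The only thing the helper produces is an exit code that the shell code can treat as a hint.

```java
import java.util.function.BooleanSupplier;

/**
 * Sketch only: every name here is hypothetical, not part of any HDFS patch.
 */
final class StopHelperSketch {
    static final int EXIT_STOPPED = 0;     // NameNode went down on its own
    static final int EXIT_ESCALATE = 1;    // still up after all polls: hint that kill -9 may be needed
    static final int EXIT_INTERRUPTED = 2; // helper itself was interrupted; no hint either way

    /**
     * Polls the probe up to maxPolls times, sleeping between attempts.
     * The probe is a placeholder for the unauthenticated status call.
     */
    static int waitForStop(BooleanSupplier isStopped, int maxPolls, long sleepMillis) {
        for (int attempt = 0; attempt < maxPolls; attempt++) {
            if (isStopped.getAsBoolean()) {
                return EXIT_STOPPED;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return EXIT_INTERRUPTED;
            }
        }
        return EXIT_ESCALATE;
    }

    public static void main(String[] args) {
        // Simulated probe: reports "stopped" on the third poll.
        int[] polls = {0};
        int code = waitForStop(() -> ++polls[0] >= 3, 10, 100L);
        // A real helper would pass this value to System.exit for the shell to read.
        System.out.println("exit code hint: " + code);
    }
}
```

In this sketch the helper never kills anything itself. The shell code would read the exit status and decide whether to escalate, which keeps the kill out of the Java code.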
The Java apps can read the properties and do whatever is needed/desired. This also means they can prompt to say "are you sure?" because this is the type of operation (shutdown w/out checkpoint) that sounds like it should never happen in an automated way.

> Allow users to skip checkpoint when stopping NameNode
> -----------------------------------------------------
>
>                 Key: HDFS-7991
>                 URL: https://issues.apache.org/jira/browse/HDFS-7991
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-7991.000.patch, HDFS-7991.001.patch, HDFS-7991.002.patch, HDFS-7991.003.patch, HDFS-7991.004.patch
>
>
> This is a follow-up jira of HDFS-6353. HDFS-6353 adds the functionality to check if saving namespace is necessary before stopping namenode. As [~kihwal] pointed out in this [comment|https://issues.apache.org/jira/browse/HDFS-6353?focusedCommentId=14380898&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14380898], in a secured cluster this new functionality requires the user to be kinit'ed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)