[ https://issues.apache.org/jira/browse/HDFS-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554800#comment-14554800 ]
Jing Zhao commented on HDFS-7991:
---------------------------------

Thanks Allen. Yes, I also just realized that jmx may not be a good solution here.

bq. to do a REST or RPC call to ask the NN what it's doing

The same question here is what if this RPC/REST call fails (or times out)? Should we retry, and how? Or should we kill the NameNode? To me this is not fundamentally different from the "saveNamespace" solution:
# We're using kill to trigger the shutdown hook which does the checkpoint. This can be mapped to the step sending out a saveNamespace command to NN.
# We then keep polling the state of the NameNode using a REST/RPC call, just like waiting for the response from the saveNamespace RPC.
# Both solutions finally need to answer the same question: what if the REST/RPC call fails?

bq. This will almost certainly break init.d/rc.d/service/launchd/whatever scripts.

Yes, but I think if the checkpoint is necessary at this time, breaking these scripts may not be that bad compared with killing the namenode and then waiting hours for the namenode to load edits, or even fixing corrupted edits.

bq. currently does not require a Kerberos credential

Regarding the auth part, how about directly parsing hdfs-site.xml to get the namenode fsimage/edits directory location? Then we can directly check if the checkpoint is necessary by going through the fsimage/edits file names.

> Allow users to skip checkpoint when stopping NameNode
> -----------------------------------------------------
>
>                 Key: HDFS-7991
>                 URL: https://issues.apache.org/jira/browse/HDFS-7991
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-7991.000.patch, HDFS-7991.001.patch,
> HDFS-7991.002.patch, HDFS-7991.003.patch, HDFS-7991.004.patch
>
>
> This is a follow-up jira of HDFS-6353. HDFS-6353 adds the functionality to
> check if saving namespace is necessary before stopping namenode.
> As [~kihwal] pointed out in this
> [comment|https://issues.apache.org/jira/browse/HDFS-6353?focusedCommentId=14380898&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14380898],
> in a secured cluster this new functionality requires the user to be kinit'ed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
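
For illustration only, the hdfs-site.xml idea from the comment above could look roughly like the following Python sketch. It is not taken from any of the attached patches. It assumes the storage directory is configured through `dfs.namenode.name.dir`, that fsimage and edits files sit together in its `current/` subdirectory, and that the files follow the usual `fsimage_<txid>`, `edits_<start>-<end>` and `edits_inprogress_<start>` naming. The helper names and the `txn_threshold` parameter are hypothetical. The sketch does not expand `${...}` variables in the config value, and it ignores a separate `dfs.namenode.edits.dir` or shared edits in an HA setup.

```python
import os
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# File-name patterns assumed for a NameNode storage directory.
FSIMAGE_RE = re.compile(r"^fsimage_(\d+)$")
EDITS_RE = re.compile(r"^edits_(\d+)-(\d+)$")
INPROGRESS_RE = re.compile(r"^edits_inprogress_(\d+)$")


def read_property(conf_path, name):
    """Return the value of a <property> in a Hadoop XML config, or None."""
    root = ET.parse(conf_path).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None


def storage_dirs(value):
    """Split a comma-separated list of paths or file: URIs into local paths."""
    dirs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        dirs.append(urlparse(item).path if item.startswith("file:") else item)
    return dirs


def txid_lag(name_dir):
    """Return (latest fsimage txid, highest txid seen in edits file names)."""
    current = os.path.join(name_dir, "current")
    image_txid, edits_txid = -1, -1
    for fname in os.listdir(current):
        m = FSIMAGE_RE.match(fname)
        if m:
            image_txid = max(image_txid, int(m.group(1)))
            continue
        m = EDITS_RE.match(fname)
        if m:
            edits_txid = max(edits_txid, int(m.group(2)))
            continue
        m = INPROGRESS_RE.match(fname)
        if m:
            # Only the start txid is in the name, so this is a lower bound.
            edits_txid = max(edits_txid, int(m.group(1)))
    return image_txid, edits_txid


def checkpoint_needed(conf_path, txn_threshold=1_000_000):
    """Guess whether a checkpoint is worthwhile, from file names alone.

    txn_threshold is a hypothetical knob for this sketch; a real script
    would more likely read a checkpoint setting from the configuration.
    """
    value = read_property(conf_path, "dfs.namenode.name.dir")
    if not value:
        raise ValueError("dfs.namenode.name.dir is not set in " + conf_path)
    for name_dir in storage_dirs(value):
        image_txid, edits_txid = txid_lag(name_dir)
        if edits_txid - image_txid >= txn_threshold:
            return True
    return False
```

Since this only reads local files, it needs no Kerberos credential, but the user running the stop script must be able to read the NameNode storage directory. It also does not answer the retry/timeout question raised earlier in the comment.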