[ 
https://issues.apache.org/jira/browse/HDFS-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554800#comment-14554800
 ] 

Jing Zhao commented on HDFS-7991:
---------------------------------

Thanks Allen. Yes, I also just realized that JMX may not be a good solution 
here.

bq. to do a REST or RPC call to ask the NN what it's doing
The same question here is: what if this RPC/REST call fails (or times out)? 
Should we retry, and if so, how? Or should we kill the NameNode? To me this is 
not fundamentally different from the "saveNamespace" solution:
# We're using kill to trigger the shutdown hook, which does the checkpoint. 
This maps to the step of sending a saveNamespace command to the NN.
# We then keep polling the state of the NameNode using a REST/RPC call, just 
like waiting for the response from the saveNamespace RPC.
# Both solutions ultimately need to answer the same question: what if the 
REST/RPC call fails?
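
To make the failure question concrete, here is a minimal sketch of the polling 
step with an explicit policy for failures and timeouts. Everything in it is an 
assumption for illustration: check_state stands in for whatever REST/RPC call 
we would use, and the retry limits are arbitrary. It is not taken from any of 
the attached patches.

```python
import time


def poll_until_done(check_state, timeout_s=60.0, interval_s=1.0,
                    max_failures=3, sleep=time.sleep, clock=time.monotonic):
    """Poll a hypothetical state check until it reports completion.

    check_state is a placeholder callable: it returns True once the
    checkpoint is finished, False while it is still running, and raises
    OSError (which covers connection errors and timeouts) if the call fails.

    Returns "done", "timeout", or "unreachable" so the caller has to decide
    explicitly what to do in each failure case.
    """
    deadline = clock() + timeout_s
    failures = 0
    while clock() < deadline:
        try:
            if check_state():
                return "done"
            failures = 0  # a successful call resets the failure streak
        except OSError:
            failures += 1
            if failures >= max_failures:
                return "unreachable"
        sleep(interval_s)
    return "timeout"
```

Whichever approach we pick, the script still has to choose what "timeout" and 
"unreachable" mean: retry, give up, or kill the NameNode anyway.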

bq. This will almost certainly break init.d/rc.d/service/launchd/whatever 
scripts.
Yes, but I think if the checkpoint is necessary at this time, breaking these 
scripts may not be that bad compared with killing the namenode and then waiting 
hours for it to load edits, or even having to fix corrupted edits.

bq. currently does not require a Kerberos credential
Regarding the auth part, how about directly parsing hdfs-site.xml to get the 
namenode fsimage/edits directory location? Then we can check whether a 
checkpoint is necessary by going through the fsimage/edits file names.
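
A rough sketch of that idea, under some assumptions: the storage directories 
come from dfs.namenode.name.dir, a separate dfs.namenode.edits.dir is not 
handled, and the files under each directory's current/ subdirectory follow the 
fsimage_<txid>, edits_<start>-<end> and edits_inprogress_<start> naming. The 
function names are made up for illustration, and the threshold for "too many 
transactions" is left to the caller.

```python
import os
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

FSIMAGE_RE = re.compile(r"^fsimage_(\d+)$")            # skips fsimage_*.md5
EDITS_RE = re.compile(r"^edits_(\d+)-(\d+)$")          # finalized segment
INPROGRESS_RE = re.compile(r"^edits_inprogress_(\d+)$")


def name_dirs(hdfs_site_path):
    """Return the local paths listed under dfs.namenode.name.dir."""
    root = ET.parse(hdfs_site_path).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "dfs.namenode.name.dir":
            dirs = []
            for item in prop.findtext("value", "").split(","):
                item = item.strip()
                if item:
                    # Values may be plain paths or file: URIs.
                    is_uri = item.startswith("file:")
                    dirs.append(urlparse(item).path if is_uri else item)
            return dirs
    return []


def txns_since_checkpoint(current_dir):
    """Estimate how many transactions are newer than the latest fsimage.

    Returns None if no fsimage file is found. The in-progress segment only
    exposes its first txid, so the result is a lower bound.
    """
    last_image = -1
    last_edit = -1
    for fname in os.listdir(current_dir):
        m = FSIMAGE_RE.match(fname)
        if m:
            last_image = max(last_image, int(m.group(1)))
            continue
        m = EDITS_RE.match(fname)
        if m:
            last_edit = max(last_edit, int(m.group(2)))
            continue
        m = INPROGRESS_RE.match(fname)
        if m:
            last_edit = max(last_edit, int(m.group(1)))
    if last_image < 0:
        return None
    return max(0, last_edit - last_image)
```

The stop script could then compare txns_since_checkpoint(os.path.join(d, 
"current")) for each directory d against a chosen threshold, without needing a 
Kerberos credential, as long as it can read the local directories.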

> Allow users to skip checkpoint when stopping NameNode
> -----------------------------------------------------
>
>                 Key: HDFS-7991
>                 URL: https://issues.apache.org/jira/browse/HDFS-7991
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-7991.000.patch, HDFS-7991.001.patch, 
> HDFS-7991.002.patch, HDFS-7991.003.patch, HDFS-7991.004.patch
>
>
> This is a follow-up jira of HDFS-6353. HDFS-6353 adds the functionality to 
> check if saving namespace is necessary before stopping namenode. As [~kihwal] 
> pointed out in this 
> [comment|https://issues.apache.org/jira/browse/HDFS-6353?focusedCommentId=14380898&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14380898],
>  in a secured cluster this new functionality requires the user to be kinit'ed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
