[ 
https://issues.apache.org/jira/browse/HDFS-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643276#comment-13643276
 ] 

Aaron T. Myers commented on HDFS-4754:
--------------------------------------

Hi Nicolas, in general I'm a little leery of adding a client API which allows 
arbitrary clients to affect the perceived health of whole DNs, given the 
potential for abuse. The only mildly similar thing that currently exists that 
I'm aware of is the ClientProtocol#reportBadBlocks API, though that obviously 
works only with single replicas, not whole DNs.

That said, I won't block this change, especially if we make it possible to 
disable the feature on the server side.
                
> Add an API in the namenode to mark a datanode as stale
> ------------------------------------------------------
>
>                 Key: HDFS-4754
>                 URL: https://issues.apache.org/jira/browse/HDFS-4754
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, namenode
>            Reporter: Nicolas Liochon
>            Priority: Critical
>
> HDFS has detected stale datanodes since HDFS-3703, using a timeout that 
> defaults to 30s.
> There are two reasons to add an API to mark a node as stale even if the 
> timeout is not yet reached:
>  1) ZooKeeper can detect that a client is dead at any moment. So, for HBase, 
> we sometimes start the recovery before a node is marked stale (even with 
> reasonable settings such as stale: 20s; HBase ZK timeout: 30s).
>  2) Some third parties could detect that a node is dead before the timeout, 
> hence saving us the cost of retrying. An example of such hardware is Arista, 
> presented here by [~tsuna] 
> http://tsunanet.net/~tsuna/fsf-hbase-meetup-april13.pdf, and confirmed in 
> HBASE-6290.
> As usual, even if the node is dead, it can come back before the 10-minute 
> limit. So I would propose to set a time bound. The API would be
> namenode.markStale(String ipAddress, int port, long durationInMs);
> After durationInMs, the namenode would again rely only on its heartbeat to 
> decide.
> Thoughts?
> If there are no objections, and if nobody in the HDFS dev team has the time 
> to work on it, I will give it a try for branch 2 & 3.
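
For illustration only, here is a minimal sketch of how the time-bounded 
override described above could behave, including the server-side switch 
mentioned in the comment. Only the markStale(String, int, long) signature 
comes from the proposal; the StaleOverrides class, the isStale method, the 
enabled flag and the injectable clock are hypothetical names made up for this 
sketch and are not HDFS code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical helper (not part of HDFS): time-bounded "stale" overrides per datanode. */
final class StaleOverrides {
    private final boolean enabled;       // assumed server-side switch for the feature
    private final long staleIntervalMs;  // heartbeat-based stale timeout, e.g. 30s
    private final LongSupplier clock;    // current time in ms; injectable for tests
    private final Map<String, Long> expiryByNode = new ConcurrentHashMap<>();

    StaleOverrides(boolean enabled, long staleIntervalMs, LongSupplier clock) {
        this.enabled = enabled;
        this.staleIntervalMs = staleIntervalMs;
        this.clock = clock;
    }

    /** Same shape as the proposed API; returns false if the request is ignored. */
    boolean markStale(String ipAddress, int port, long durationInMs) {
        if (!enabled || durationInMs <= 0) {
            return false;
        }
        expiryByNode.put(key(ipAddress, port), clock.getAsLong() + durationInMs);
        return true;
    }

    /** An unexpired override wins; otherwise only the heartbeat age decides. */
    boolean isStale(String ipAddress, int port, long lastHeartbeatMs) {
        long now = clock.getAsLong();
        String k = key(ipAddress, port);
        Long expiry = expiryByNode.get(k);
        if (expiry != null) {
            if (now < expiry) {
                return true;
            }
            expiryByNode.remove(k, expiry); // override expired: drop it
        }
        return now - lastHeartbeatMs > staleIntervalMs;
    }

    private static String key(String ipAddress, int port) {
        return ipAddress + ":" + port;
    }

    public static void main(String[] args) {
        StaleOverrides overrides =
            new StaleOverrides(true, 30_000L, System::currentTimeMillis);
        overrides.markStale("10.0.0.1", 50010, 20_000L);
        // The heartbeat is fresh, but the override is still active.
        System.out.println(
            overrides.isStale("10.0.0.1", 50010, System.currentTimeMillis()));
    }
}
```

Because the override simply expires, a node that comes back needs no explicit 
"unmark" call, and with the switch off markStale is a no-op, so the heartbeat 
timeout remains the only source of truth.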

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
