[ 
https://issues.apache.org/jira/browse/HDFS-16261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425866#comment-17425866
 ] 

Bryan Beaudreault commented on HDFS-16261:
------------------------------------------

I've verified that setting "dfs.namenode.redundancy.interval.seconds" to, for 
example, 5 minutes and setting the DFSClient block location refresh to 10 
seconds (https://issues.apache.org/jira/browse/HDFS-16262) results in zero 
ReplicaNotFoundExceptions, even when the primary replicas of all blocks are 
shuffled to different hosts. With debug logging enabled for the refresh 
thread, I can see that while blocks are being shuffled, the refresh thread 
triggers for files whose blocks have moved; once all block moves have 
finished, the refresh thread settles down to 0 blocks refreshed.
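
For reference, the two settings from this test could be written out as an 
hdfs-site.xml style fragment roughly as follows. This is only a sketch, not 
the exact configuration used: the client key 
"dfs.client.refresh.read-block-locations.ms" is an assumption (the comment 
above only names the NameNode key), and render_properties is a hypothetical 
helper, so check both against your Hadoop version.

```python
import xml.etree.ElementTree as ET

# Values taken from the test described above.
# ASSUMPTION: the client-side key name is not stated in this comment;
# it is a guess at the refresh setting associated with HDFS-16262.
SETTINGS = {
    "dfs.namenode.redundancy.interval.seconds": "300",       # 5 minutes
    "dfs.client.refresh.read-block-locations.ms": "10000",   # 10 seconds
}


def render_properties(settings):
    """Render a dict as a Hadoop-style <configuration> XML string.

    Hypothetical helper for illustration; the output is not pretty-printed.
    """
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_properties(SETTINGS))
```

The NameNode property belongs in the NameNode's configuration, while the 
refresh interval is a client-side setting, so in practice the two would 
normally live in different hdfs-site.xml files.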

I'm going to dig more into the above comment tomorrow, but wanted to test the 
simple change just to prove the concept. That appears to have been a success.

> Configurable grace period around deletion of invalidated blocks
> ---------------------------------------------------------------
>
>                 Key: HDFS-16261
>                 URL: https://issues.apache.org/jira/browse/HDFS-16261
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Major
>
> When a block is moved with REPLACE_BLOCK, the new location is recorded in the 
> NameNode and the NameNode instructs the old host to invalidate the block 
> using DNA_INVALIDATE. As it stands today, this invalidation is async but 
> tends to happen relatively quickly.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). One 
> issue is that HBase tends to keep open long running DFSInputStreams and 
> moving blocks from under them causes lots of warnings in the RegionServer and 
> increases long tail latencies due to the necessary retries in the DFSClient.
> One way I'd like to fix this is to provide a configurable grace period on 
> async invalidations. This would give the DFSClient enough time to refresh 
> block locations before hitting any errors.
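
The timing argument behind the grace period can be sketched with a small 
model. This is an illustration only, not HDFS code: it assumes the client 
refreshes locations on a fixed schedule (at 0, R, 2R, ... seconds), that a 
refresh takes effect instantly, and that the old replica is deleted exactly 
grace_period_s seconds after the move. All function and parameter names are 
hypothetical.

```python
def first_refresh_after(move_time_s, refresh_interval_s):
    """Time of the first periodic refresh at or after the block move.

    Assumes refreshes fire at 0, R, 2R, ... (a simplification).
    """
    ticks = -(-move_time_s // refresh_interval_s)  # ceiling division
    return ticks * refresh_interval_s


def reads_stale_replica(move_time_s, refresh_interval_s, grace_period_s):
    """True if the old replica is deleted before the client next refreshes.

    In that window a long-lived input stream would still point at the old
    host and would have to fall back to a retry.
    """
    deleted_at = move_time_s + grace_period_s
    refreshed_at = first_refresh_after(move_time_s, refresh_interval_s)
    return deleted_at < refreshed_at


if __name__ == "__main__":
    # No grace period: a move at t=3s is deleted before the refresh at t=10s.
    print(reads_stale_replica(3, 10, 0))
    # A 5 minute grace period comfortably covers a 10 second refresh interval.
    print(reads_stale_replica(3, 10, 300))
```

Under these assumptions, any grace period at least as long as the refresh 
interval leaves no window in which the client still holds a location that 
has already been deleted.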


