Bryan Beaudreault created HBASE-26304:
-----------------------------------------

             Summary: Reflect out-of-band locality improvements in served requests
                 Key: HBASE-26304
                 URL: https://issues.apache.org/jira/browse/HBASE-26304
             Project: HBase
          Issue Type: Sub-task
            Reporter: Bryan Beaudreault
            Assignee: Bryan Beaudreault


Once the LocalityHealer has improved locality of a StoreFile (by moving blocks 
onto the correct host), the Reader's DFSInputStream and Region's localityIndex 
metric must be refreshed. Without refreshing the DFSInputStream, the improved 
locality will not improve latencies. In fact, the DFSInputStream may try to 
fetch blocks that have moved, resulting in a ReplicaNotFoundException. The read 
is automatically retried, but each retry adds to long-tail latency by an amount 
that depends on the configured backoff strategy.

See https://issues.apache.org/jira/browse/HDFS-16155 for an improvement to the 
backoff strategy that can greatly reduce the latency impact of the missing-block 
retry.

Even with that mitigation, a StoreFile is often made up of many blocks. Without 
some sort of intervention, we will continue to hit ReplicaNotFoundException 
over time as clients naturally request data from moved blocks.

In the original LocalityHealer design, I created a new 
RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list 
of region names and, for each region store, re-opens the underlying StoreFile 
if the locality has changed.
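
To make the refresh flow concrete, here is a minimal, self-contained sketch of 
the logic described above. It does not use the real HBase classes: 
RefreshSketch, StoreFileHandle, LocalitySource, and refresh are hypothetical 
names introduced only for illustration, and the per-region stores are flattened 
into a single list of files per region. The actual RPC in the PR may look quite 
different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class RefreshSketch {
    /** Hypothetical stand-in for an open StoreFile reader (not an HBase class). */
    static final class StoreFileHandle {
        final String path;
        float cachedLocality; // locality observed when the reader was last opened
        int reopenCount = 0;

        StoreFileHandle(String path, float cachedLocality) {
            this.path = path;
            this.cachedLocality = cachedLocality;
        }

        /** Simulates re-opening the file so that a fresh input stream is used. */
        void reopen(float currentLocality) {
            this.cachedLocality = currentLocality;
            this.reopenCount++;
        }
    }

    /** Hypothetical source of the current locality of a file. */
    interface LocalitySource {
        float currentLocality(String path);
    }

    /**
     * For each requested region hosted here, re-open every store file whose
     * locality differs from the cached value. Returns the re-opened paths.
     */
    static List<String> refresh(List<String> regionNames,
                                Map<String, List<StoreFileHandle>> filesByRegion,
                                LocalitySource source) {
        List<String> reopened = new ArrayList<>();
        for (String region : regionNames) {
            List<StoreFileHandle> files = filesByRegion.get(region);
            if (files == null) {
                continue; // region is not hosted on this server; nothing to do
            }
            for (StoreFileHandle file : files) {
                float now = source.currentLocality(file.path);
                if (Float.compare(now, file.cachedLocality) != 0) {
                    file.reopen(now);
                    reopened.add(file.path);
                }
            }
        }
        return reopened;
    }
}
```

Skipping files whose locality is unchanged keeps the call cheap when it is 
issued for regions that the healer has not touched.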

I will submit a PR with that implementation, but I am also investigating other 
avenues. For example, I noticed 
https://issues.apache.org/jira/browse/HDFS-15119, which doesn't seem ideal as is 
but could perhaps be improved to handle block moves automatically at a lower 
level.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
