[ 
https://issues.apache.org/jira/browse/HBASE-26304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440687#comment-17440687
 ] 

Huaxiang Sun commented on HBASE-26304:
--------------------------------------

Thanks [~bbeaudreault] . Will try to look at the patch in the following days 
and try out at my testing clusters as well.

> Reflect out-of-band locality improvements in served requests
> ------------------------------------------------------------
>
>                 Key: HBASE-26304
>                 URL: https://issues.apache.org/jira/browse/HBASE-26304
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Major
>
> Once the LocalityHealer has improved locality of a StoreFile (by moving 
> blocks onto the correct host), the Reader's DFSInputStream and Region's 
> localityIndex metric must be refreshed. Without refreshing the 
> DFSInputStream, the improved locality will not improve latencies. In fact, 
> the DFSInputStream may try to fetch blocks that have moved, resulting in a 
> ReplicaNotFoundException. This is automatically retried, but the retry will 
> temporarily increase long tail latencies relative to configured backoff 
> strategy.
> In the original LocalityHealer design, I created a new 
> RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list 
> of region names and, for each region store, re-opens the underlying StoreFile 
> if the locality has changed. This implementation was complicated both in 
> integrating callbacks into the HDFS Dispatcher and in terms of safely 
> re-opening StoreFiles without impacting reads or caches. 
> In working to port the LocalityHealer to the Apache projects, I'm taking a 
> different approach:
>  * The part of the LocalityHealer that moves blocks will be an HDFS project 
> contribution
>  * As such, the DFSClient should be able to more gracefully recover from 
> block moves.
>  * Additionally, HBase has some caches of block locations for locality 
> reporting and the balancer. Those need to be kept up-to-date.
> The DFSClient improvements are covered in 
> https://issues.apache.org/jira/browse/HDFS-16261. As such, this issue becomes 
> about updating HBase's block location caches.
> I considered a few different approaches, but the most elegant one I could 
> come up with was to tie the HDFSBlockDistribution metrics directly to the 
> underlying DFSInputStream of each StoreFile's initialReader. That way, our 
> locality metrics are identically representing the block allocations that our 
> reads are going through. This also means that our locality metrics will 
> naturally adjust as the DFSInputStream adjusts to block moves.
> Once we have accurate locality metrics on the regionserver, the Balancer's 
> cache can easily be invalidated via our usual heartbeat methods. 
> RegionServers report to the HMaster periodically, which keeps a 
> ClusterMetrics method up to date. Right before each balancer invocation, the 
> balancer is updated with the latest ClusterMetrics. At this time, we compare 
> the old ClusterMetrics to the new, and invalidate the caches for any regions 
> whose locality has changed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to