[ https://issues.apache.org/jira/browse/HBASE-26304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440687#comment-17440687 ]
Huaxiang Sun commented on HBASE-26304:
--------------------------------------

Thanks [~bbeaudreault]. Will try to look at the patch in the next few days and try it out on my test clusters as well.

> Reflect out-of-band locality improvements in served requests
> ------------------------------------------------------------
>
> Key: HBASE-26304
> URL: https://issues.apache.org/jira/browse/HBASE-26304
> Project: HBase
> Issue Type: Sub-task
> Reporter: Bryan Beaudreault
> Assignee: Bryan Beaudreault
> Priority: Major
>
> Once the LocalityHealer has improved locality of a StoreFile (by moving
> blocks onto the correct host), the Reader's DFSInputStream and Region's
> localityIndex metric must be refreshed. Without refreshing the
> DFSInputStream, the improved locality will not improve latencies. In fact,
> the DFSInputStream may try to fetch blocks that have moved, resulting in a
> ReplicaNotFoundException. This is automatically retried, but the retry will
> temporarily increase long tail latencies relative to the configured backoff
> strategy.
> In the original LocalityHealer design, I created a new
> RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list
> of region names and, for each region store, re-opens the underlying StoreFile
> if the locality has changed. This implementation was complicated both in
> integrating callbacks into the HDFS Dispatcher and in terms of safely
> re-opening StoreFiles without impacting reads or caches.
> In working to port the LocalityHealer to the Apache projects, I'm taking a
> different approach:
> * The part of the LocalityHealer that moves blocks will be an HDFS project
> contribution
> * As such, the DFSClient should be able to more gracefully recover from
> block moves.
> * Additionally, HBase has some caches of block locations for locality
> reporting and the balancer. Those need to be kept up-to-date.
> The DFSClient improvements are covered in
> https://issues.apache.org/jira/browse/HDFS-16261.
> As such, this issue becomes about updating HBase's block location caches.
> I considered a few different approaches, but the most elegant one I could
> come up with was to tie the HDFSBlockDistribution metrics directly to the
> underlying DFSInputStream of each StoreFile's initialReader. That way, our
> locality metrics are identically representing the block allocations that our
> reads are going through. This also means that our locality metrics will
> naturally adjust as the DFSInputStream adjusts to block moves.
> Once we have accurate locality metrics on the regionserver, the Balancer's
> cache can easily be invalidated via our usual heartbeat methods.
> RegionServers report to the HMaster periodically, which keeps a
> ClusterMetrics method up to date. Right before each balancer invocation, the
> balancer is updated with the latest ClusterMetrics. At this time, we compare
> the old ClusterMetrics to the new, and invalidate the caches for any regions
> whose locality has changed.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
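
For illustration only, the balancer cache invalidation step described at the end of the quoted issue could look roughly like the Java sketch below. It is not taken from the patch. The class name, the method name, the plain maps from region name to locality (standing in for whatever the ClusterMetrics snapshots actually carry), and the tolerance value are all assumptions made for this example.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of HBase: it only illustrates the
// "compare old metrics to new, invalidate what changed" idea.
public class LocalityCacheInvalidation {

    // Assumed tolerance so that float noise does not trigger invalidation.
    private static final float EPSILON = 1e-4f;

    /**
     * Returns the names of regions whose reported locality differs between
     * two snapshots. Regions that appear only in the new snapshot are also
     * returned (an assumption: there is no trusted cached value for them).
     * Regions missing from the new snapshot are ignored.
     */
    static Set<String> regionsWithChangedLocality(Map<String, Float> oldLocality,
                                                  Map<String, Float> newLocality) {
        Set<String> changed = new HashSet<>();
        for (Map.Entry<String, Float> entry : newLocality.entrySet()) {
            Float previous = oldLocality.get(entry.getKey());
            if (previous == null || Math.abs(previous - entry.getValue()) > EPSILON) {
                changed.add(entry.getKey());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Made-up region names and locality values, for demonstration only.
        Map<String, Float> before = Map.of("region-a", 0.50f, "region-b", 1.00f);
        Map<String, Float> after = Map.of("region-a", 0.95f, "region-b", 1.00f);

        // A caller would drop its cached block locations for these regions.
        System.out.println("Invalidate caches for: "
            + regionsWithChangedLocality(before, after));
    }
}
```

In this sketch the comparison runs once per balancer invocation, so its cost is linear in the number of regions reported, and only the regions it returns would lose their cached block locations.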