[jira] [Commented] (HBASE-26304) Reflect out-of-band locality improvements in served requests
[ https://issues.apache.org/jira/browse/HBASE-26304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502334#comment-17502334 ]

Hudson commented on HBASE-26304:

Results for branch master [build #528 on builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/528/]: -1 overall. Details (if available):
* +1 general checks -- For more information [see general report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/528/General_20Nightly_20Build_20Report/]
* -1 jdk11 hadoop3 checks -- For more information [see jdk11 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/528/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]
* +1 source release artifact -- See build output for details.
* +1 client integration test

> Reflect out-of-band locality improvements in served requests
>
> Key: HBASE-26304
> URL: https://issues.apache.org/jira/browse/HBASE-26304
> Project: HBase
> Issue Type: Sub-task
> Reporter: Bryan Beaudreault
> Assignee: Bryan Beaudreault
> Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2
>
> Edit: Description updated to avoid needing to read the full investigation laid out in the comments.
>
> Once the LocalityHealer has improved the locality of a StoreFile (by moving blocks onto the correct host), the Reader's DFSInputStream and the Region's localityIndex metric must be refreshed. Without refreshing the DFSInputStream, the improved locality will not improve latencies. In fact, the DFSInputStream may try to fetch blocks that have moved, resulting in a ReplicaNotFoundException. This is automatically retried, but the retry will temporarily increase long-tail latencies, depending on the configured backoff strategy.
>
> In the original LocalityHealer design, I created a new RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list of region names and, for each region store, re-opens the underlying StoreFile if the locality has changed. This implementation was complicated, both in integrating callbacks into the HDFS Dispatcher and in safely re-opening StoreFiles without impacting reads or caches.
>
> In working to port the LocalityHealer to the Apache projects, I'm taking a different approach:
> * The part of the LocalityHealer that moves blocks will be an HDFS project contribution.
> * As such, the DFSClient should be able to recover more gracefully from block moves.
> * Additionally, HBase has some caches of block locations for locality reporting and the balancer. Those need to be kept up to date.
>
> The DFSClient improvements are covered in HDFS-16261 and HDFS-16262. As such, this issue becomes about updating HBase's block location caches.
>
> I considered a few different approaches, but the most elegant one I could come up with was to tie the HDFSBlockDistribution metrics directly to the underlying DFSInputStream of each StoreFile's initialReader. That way, our locality metrics identically represent the block allocations that our reads go through. This also means that our locality metrics will naturally adjust as the DFSInputStream adjusts to block moves.
>
> Once we have accurate locality metrics on the RegionServer, the Balancer's cache can easily be invalidated via our usual heartbeat methods. RegionServers report to the HMaster periodically, which keeps a ClusterMetrics object up to date. Right before each balancer invocation, the balancer is updated with the latest ClusterMetrics. At this time, we compare the old ClusterMetrics to the new and invalidate the caches for any regions whose locality has changed.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
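The ClusterMetrics comparison described in the last paragraph can be sketched roughly as follows. This is an illustrative stand-in, not the real HBase ClusterMetrics or balancer API: `changedRegions` and the per-region locality maps are hypothetical simplifications of what the balancer-update hook would compare.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: diff per-region locality between the previous and
// latest ClusterMetrics snapshots, yielding the regions whose cached block
// distributions the balancer should invalidate. Types are simplified
// stand-ins for the actual HBase ClusterMetrics/RegionMetrics objects.
public class LocalityDiff {

    /** Returns region names whose reported locality changed between snapshots. */
    public static Set<String> changedRegions(Map<String, Float> oldLocality,
                                             Map<String, Float> newLocality) {
        Set<String> changed = new HashSet<>();
        for (Map.Entry<String, Float> e : newLocality.entrySet()) {
            Float old = oldLocality.get(e.getKey());
            // A region is stale if it is new to us or its locality moved.
            if (old == null || Math.abs(old - e.getValue()) > 0.0001f) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, Float> before = new HashMap<>();
        before.put("region-a", 0.50f);
        before.put("region-b", 1.00f);
        Map<String, Float> after = new HashMap<>();
        after.put("region-a", 1.00f); // healed out of band
        after.put("region-b", 1.00f); // unchanged
        System.out.println(changedRegions(before, after)); // prints [region-a]
    }
}
```

Only `region-a` is invalidated; unchanged regions keep their cached block distribution, so the periodic heartbeat stays cheap.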
[ https://issues.apache.org/jira/browse/HBASE-26304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17453495#comment-17453495 ]

Hudson commented on HBASE-26304:

Results for branch branch-2 [build #409 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/409/]: +1 overall. Details (if available):
* +1 general checks -- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/409/General_20Nightly_20Build_20Report/]
* +1 jdk8 hadoop2 checks -- For more information [see jdk8 (hadoop2) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/409/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]
* +1 jdk8 hadoop3 checks -- For more information [see jdk8 (hadoop3) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/409/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]
* +1 jdk11 hadoop3 checks -- For more information [see jdk11 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/409/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]
* +1 source release artifact -- See build output for details.
* +1 client integration test
[ https://issues.apache.org/jira/browse/HBASE-26304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17449916#comment-17449916 ]

Hudson commented on HBASE-26304:

Results for branch master [build #453 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/453/]: +1 overall. Details (if available):
* +1 general checks -- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/453/General_20Nightly_20Build_20Report/]
* +1 jdk8 hadoop3 checks -- For more information [see jdk8 (hadoop3) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/453/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]
* +1 jdk11 hadoop3 checks -- For more information [see jdk11 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/453/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]
* +1 source release artifact -- See build output for details.
* +1 client integration test
[ https://issues.apache.org/jira/browse/HBASE-26304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17449828#comment-17449828 ]

Bryan Beaudreault commented on HBASE-26304:

Will do on Monday, thanks for reviewing!
[ https://issues.apache.org/jira/browse/HBASE-26304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17449725#comment-17449725 ]

Duo Zhang commented on HBASE-26304:

[~bbeaudreault] Please open a PR for branch-2? The master patch can not be applied to branch-2 cleanly. Thanks.
[ https://issues.apache.org/jira/browse/HBASE-26304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440687#comment-17440687 ]

Huaxiang Sun commented on HBASE-26304:

Thanks [~bbeaudreault]. Will try to look at the patch in the following days and try it out on my testing clusters as well.
[ https://issues.apache.org/jira/browse/HBASE-26304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440580#comment-17440580 ]

Bryan Beaudreault commented on HBASE-26304:

This has been rolled out to ~80 clusters in our QA environment. Will likely start doing prod rollouts in the near future.
[ https://issues.apache.org/jira/browse/HBASE-26304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435597#comment-17435597 ]

Bryan Beaudreault commented on HBASE-26304:

I was able to get the 3rd (arguably ideal) option to work. I've been running it in one of our internal test clusters and it's been working great. I updated the original description and pushed an updated PR based on the final approach taken, so that people don't need to read the above wall of text :)
[ https://issues.apache.org/jira/browse/HBASE-26304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433138#comment-17433138 ]

Bryan Beaudreault commented on HBASE-26304:

Unfortunately there isn't an easy way to achieve the 3rd option. I could potentially add something to DFSInputStream, but we'd be stuck waiting for it to get backported to all supported hadoop versions. I also realized that it's very possible for blocks to move even without the locality healer, and we don't really reflect that at all today.

I ended up going with option 1 above, as the easiest option and also the one most off the critical path. I created a LocalityMetricsRefreshChore which periodically refreshes the HDFSBlockDistribution for all stores on the server. I thought about limiting it to only non-100%-locality stores, but decided against it so that we can also reflect locality _regressions_ due to datanodes dying, or if someone ran Balancer or Mover. For anyone monitoring or alerting on locality, it's just as important to know if something totally tanked locality as it is to know whether it was improved by something like the healer.

I'm doing some testing, but will push a PR with this new chore probably next week.

> Once the LocalityHealer has improved the locality of a StoreFile (by moving blocks onto the correct host), the Reader's DFSInputStream and the Region's localityIndex metric must be refreshed. Without refreshing the DFSInputStream, the improved locality will not improve latencies. In fact, the DFSInputStream may try to fetch blocks that have moved, resulting in a ReplicaNotFoundException. This is automatically retried, but the retry will increase long-tail latencies, depending on the configured backoff strategy.
>
> See https://issues.apache.org/jira/browse/HDFS-16155 for an improvement in backoff strategy which can greatly mitigate the latency impact of the missing-block retry.
>
> Even with that mitigation, a StoreFile is often made up of many blocks. Without some sort of intervention, we will continue to hit ReplicaNotFoundException over time as clients naturally request data from moved blocks.
>
> In the original LocalityHealer design, I created a new RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list of region names and, for each region store, re-opens the underlying StoreFile if the locality has changed.
>
> I will submit a PR with that implementation, but I am also investigating other avenues. For example, I noticed https://issues.apache.org/jira/browse/HDFS-15119, which doesn't seem ideal but maybe can be improved as an automatic lower-level handling of block moves.
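The chore described in this comment might look roughly like the following stand-in. The real implementation would extend HBase's ScheduledChore and walk the server's actual stores; `LocalityRefreshSketch`, its fields, and the `computeLocality` function here are illustrative assumptions. Note the refresh deliberately covers fully-local files too, so regressions are reflected as well as improvements.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Simplified stand-in for the LocalityMetricsRefreshChore idea: on a fixed
// period, recompute the cached locality value for every tracked store file,
// not just the non-local ones. All names are illustrative, not HBase code.
public class LocalityRefreshSketch {
    private final Map<String, Double> cachedLocality = new ConcurrentHashMap<>();
    // Stands in for computeHDFSBlocksDistribution() against the NameNode.
    private final Function<String, Double> computeLocality;

    public LocalityRefreshSketch(Function<String, Double> computeLocality) {
        this.computeLocality = computeLocality;
    }

    /** One chore tick: refresh every tracked file, including fully-local ones. */
    public void refreshAll() {
        for (String file : cachedLocality.keySet()) {
            cachedLocality.put(file, computeLocality.apply(file));
        }
    }

    public void track(String file) { cachedLocality.put(file, computeLocality.apply(file)); }

    public double locality(String file) { return cachedLocality.get(file); }

    /** Schedule the chore; the real version would use HBase's ChoreService. */
    public ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        pool.scheduleAtFixedRate(this::refreshAll, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return pool;
    }

    public static void main(String[] args) {
        AtomicReference<Double> hdfsReports = new AtomicReference<>(0.5);
        LocalityRefreshSketch sketch = new LocalityRefreshSketch(file -> hdfsReports.get());
        sketch.track("file-1");            // hypothetical store file name
        hdfsReports.set(1.0);              // blocks moved out of band
        System.out.println(sketch.locality("file-1")); // still the stale 0.5
        sketch.refreshAll();               // one chore tick
        System.out.println(sketch.locality("file-1")); // now 1.0
    }
}
```

Between ticks the served value is stale, which matches the comment's trade-off: the chore keeps recomputation off the read path at the cost of bounded staleness.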
[ https://issues.apache.org/jira/browse/HBASE-26304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432456#comment-17432456 ] Bryan Beaudreault commented on HBASE-26304: --- As mentioned above, I have implementations for the above 2 HDFS issues and it works great for ensuring HBase is able to take advantage of new locality improvements without any DFSClient warnings. Before pushing PRs for those, I'm now taking a look at the localityIndex reporting issue in case that affects the strategy. The core problem is that when a StoreFile is opened, a StoreFileInfo object is created. Initializing that StoreFileInfo calls computeHDFSBlocksDistribution and caches the result for the lifetime of the StoreFileInfo. The resulting value is available via the getHDFSBlockDistribution method. The getHDFSBlockDistribution has three usages: * RatioBasedCompactionPolicy and DateTieredCompactionPolicy uses it to force a major compaction on files whose BlockLocalityIndex is less than a threshold * The value is aggregated for all StoreFiles in an HRegion, and used to create RegionLoad objects. RegionLoads are created in a few ways: ** On demand, when loading RegionServer UI "Regions" section ** On demand, through HBaseAdmin.getRegionLoad(ServerName, TableName) ** Periodically, in reporting heartbeat to HMaster, by default 3s. The HMaster uses these in a few ways: *** Available to query via HBaseAdmin *** Used in HMaster UI, where you can see localityIndex when viewing table page *** Used in various load balancer functions (though not localityIndex, since the balancer computes that separately) * The value is aggregated for all StoreFiles in an HRegion, and used to report localityIndex metrics. ** This happens in a thread which executes on an interval, by default 5s. The resulting metrics are available in JMX, hbtop, and the "Server Metrics" section at the top of RegionServer UIs. All of these usages are non-time sensitive, i.e. 
not in a core read path or anything. As such, I think we could consider the StoreFileInfo hdfsBlockDistribution a cache which must be cleared. Previously it was a cache of a value that rarely changed; now we need more control over clearing it. I can think of 3 options for this:
* We could create a periodic chore which reloads the cached value for all store files. This could be filtered to only clear values which are not fully local.
* We could add a TTL on the cached value, which gets enforced at read time. In other words, when getHDFSBlockDistribution is called, re-compute if the TTL has expired. We could similarly limit this to only files which are not fully local.
* We could use some trigger from the DFSInputStream to intelligently refresh the HDFSBlockDistribution only if the underlying stream has been updated. I think this would have to happen at the HStoreFile level, which has a similar getHDFSBlockDistribution method that is the only caller of the StoreFileInfo method. The HStoreFile has access to the initialReader object, which can access the underlying FSDataInputStreamWrapper. We'd need to expose something in DFSInputStream that can be used to trigger the logic.

Of the options, I think the last one is most appealing because we could avoid yet another config (the refresh TTL/period). It is also the most involved and requires some investigation. My second preference would be the 2nd option above, because I'd like to avoid another chore. I don't think the minor latency hit of fetching block locations should be an issue for any of the use cases mentioned above. I'm going to do a little more investigation into what the 3rd option could look like.
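The TTL option could look roughly like the sketch below: a generic cached value that is re-computed at read time once a TTL has expired. All names here are illustrative, not the actual StoreFileInfo API; the Supplier stands in for a call like computeHDFSBlocksDistribution, and the "only files which are not fully local" filter would wrap this.

```java
import java.util.function.Supplier;

// Sketch of a read-time TTL cache. Illustrative only; not the real
// StoreFileInfo implementation.
class TtlCachedValue<T> {
  private final Supplier<T> compute; // stand-in for computeHDFSBlocksDistribution
  private final long ttlMillis;
  private T value;
  private long loadedAtMillis;

  TtlCachedValue(Supplier<T> compute, long ttlMillis) {
    this.compute = compute;
    this.ttlMillis = ttlMillis;
  }

  // Re-compute on the first read, or when the cached value is older than the TTL.
  synchronized T get() {
    long now = System.currentTimeMillis();
    if (value == null || now - loadedAtMillis >= ttlMillis) {
      value = compute.get();
      loadedAtMillis = now;
    }
    return value;
  }
}
```

The appeal is that there is no extra thread: callers like the metrics chore or the heartbeat simply pay the occasional re-computation cost on read, which is acceptable given none of the usages above are latency sensitive.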
> Reflect out-of-band locality improvements in served requests
>
> Key: HBASE-26304
> URL: https://issues.apache.org/jira/browse/HBASE-26304
> Project: HBase
> Issue Type: Sub-task
> Reporter: Bryan Beaudreault
> Assignee: Bryan Beaudreault
> Priority: Major
>
> Once the LocalityHealer has improved locality of a StoreFile (by moving blocks onto the correct host), the Reader's DFSInputStream and Region's localityIndex metric must be refreshed. Without refreshing the DFSInputStream, the improved locality will not improve latencies. In fact, the DFSInputStream may try to fetch blocks that have moved, resulting in a ReplicaNotFoundException. This is automatically retried, but the retry will increase long tail latencies relative to the configured backoff strategy.
> See https://issues.apache.org/jira/browse/HDFS-16155 for an improvement in backoff strategy which can greatly mitigate the latency impact of the missing block retry.
> Even with that mitigation, a StoreFile is often made up of many blocks. Without some sort of intervention, we will continue to hit ReplicaNotFoundException over time as clients naturally request data from moved blocks.
> In the original LocalityHealer design, I created a new RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list of region names and, for each region store, re-opens the underlying StoreFile if the locality has changed.
> I will submit a PR with that implementation, but I am also investigating other avenues. For example, I noticed https://issues.apache.org/jira/browse/HDFS-15119 which doesn't seem ideal but maybe can be improved as an automatic lower-level handling of block moves.
[jira] [Commented] (HBASE-26304) Reflect out-of-band locality improvements in served requests
[ https://issues.apache.org/jira/browse/HBASE-26304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425868#comment-17425868 ] Bryan Beaudreault commented on HBASE-26304: --- I have a proof of concept working with the above 2 HDFS issues in a test cluster. It works great, though as mentioned above I still need to figure out how to update localityIndex, i.e. how to trigger computeHDFSBlocksDistribution in StoreFileInfo. -- This message was sent by Atlassian Jira (v8.3.4#803005)
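For context on what re-triggering that computation would refresh: a block distribution boils down to per-host byte weights plus a total, from which a locality index (the fraction of a file's bytes with a replica on a given host) is derived. The sketch below is a simplified illustration with made-up names; HBase's real HDFSBlocksDistribution class is more involved.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of a block distribution. Illustrative only, not the
// actual HBase HDFSBlocksDistribution implementation.
class BlockDistributionSketch {
  private final Map<String, Long> weightByHost = new HashMap<>();
  private long totalWeight;

  // Record one block of `weight` bytes with replicas on the given hosts.
  void addBlock(long weight, String... replicaHosts) {
    totalWeight += weight;
    for (String host : replicaHosts) {
      weightByHost.merge(host, weight, Long::sum);
    }
  }

  // Fraction of the file's bytes that have a replica on `host`, in [0, 1].
  float getBlockLocalityIndex(String host) {
    return totalWeight == 0
        ? 0.0f
        : (float) weightByHost.getOrDefault(host, 0L) / totalWeight;
  }
}
```

Once blocks have been moved onto the RegionServer's host, re-running the computation raises that host's weight and hence its locality index, which is what the cached value in StoreFileInfo is failing to reflect today.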
[jira] [Commented] (HBASE-26304) Reflect out-of-band locality improvements in served requests
[ https://issues.apache.org/jira/browse/HBASE-26304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425263#comment-17425263 ] Bryan Beaudreault commented on HBASE-26304: --- Note: the above should drastically reduce the impact of block moves, but it will not fix our localityIndex metrics. I'm going to need to find some other way to expose, on the DFSInputStream, that locality has improved, which could trigger a recalculation of localityIndex. Alternatively, we could just do it as part of a chore in the RegionServer.
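The chore alternative mentioned here could look roughly like the following sketch: a periodic pass that re-computes locality only for files that are not yet fully local. All class and method names are hypothetical stand-ins, not actual HBase APIs; the ToDoubleFunction stands in for re-running the block distribution computation for one store file.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.ToDoubleFunction;

// Illustrative sketch of a RegionServer chore refreshing cached locality.
class LocalityRefreshChore {
  private final Map<String, Double> localityByFile = new ConcurrentHashMap<>();
  // Stand-in for re-computing the block distribution for one store file.
  private final ToDoubleFunction<String> recompute;

  LocalityRefreshChore(Map<String, Double> seed, ToDoubleFunction<String> recompute) {
    this.localityByFile.putAll(seed);
    this.recompute = recompute;
  }

  // One pass of the chore: files that are already fully local are skipped.
  void refreshOnce() {
    localityByFile.replaceAll((file, locality) ->
        locality >= 1.0d ? locality : recompute.applyAsDouble(file));
  }

  double localityOf(String file) {
    return localityByFile.get(file);
  }

  // Run the pass on a fixed interval, as a chore thread would.
  ScheduledExecutorService start(long periodSeconds) {
    ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
    pool.scheduleAtFixedRate(this::refreshOnce, periodSeconds, periodSeconds,
        TimeUnit.SECONDS);
    return pool;
  }
}
```

Skipping fully local files keeps the steady-state cost near zero, since a healthy cluster has few files below full locality at any time.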
[jira] [Commented] (HBASE-26304) Reflect out-of-band locality improvements in served requests
[ https://issues.apache.org/jira/browse/HBASE-26304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425216#comment-17425216 ] Bryan Beaudreault commented on HBASE-26304: --- I submitted the above PR, which straight-ported the approach I used internally for refreshing store files after locality had been healed. After thinking about it some more, I've decided to take this in a different direction: the LocalityHealer (which moves blocks to requested hosts) will itself end up being an HDFS project contribution. In the narrow scope of HBase, this issue is about ensuring a RegionServer can gracefully recover after blocks have been moved out from under it. Given the LocalityHealer will be an HDFS project contribution, I think ideally the DFSClient itself can gracefully recover from such an event. With that in mind, I'm going to try a somewhat different approach:
* HDFS-15119 added a basic invalidation of DFSInputStream cached LocatedBlocks. I'm going to expand upon that so that we can safely and reliably refresh block locations for DFSInputStreams lacking a local replica: https://issues.apache.org/jira/browse/HDFS-16262
* Additionally, I'm going to try to add a grace period to block invalidations in https://issues.apache.org/jira/browse/HDFS-16261. When a block is moved with REPLACE_BLOCK, the block is invalidated on the old host and asynchronously deleted. Adding a configurable grace period to the deletion will give the above refresh enough time to update cached locations and totally skip any pain related to moving blocks around.
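The graceful-recovery behaviour being aimed for has a simple shape: if a read fails because a replica moved, refresh the cached block locations and retry. The sketch below only illustrates that control flow; the method names are hypothetical and not the real DFSClient API, which handles this internally with its own retry and backoff machinery.

```java
import java.util.function.Supplier;

// Illustrative retry-after-refresh control flow; not the real DFSClient.
class RefreshOnFailure {
  // Attempt `read`; if it fails (e.g. because a replica moved), refresh the
  // cached block locations and retry once.
  static <T> T readWithRefresh(Supplier<T> read, Runnable refreshBlockLocations) {
    try {
      return read.get();
    } catch (RuntimeException replicaMoved) {
      // Analogous to dropping cached LocatedBlocks and re-asking the NameNode.
      refreshBlockLocations.run();
      return read.get();
    }
  }
}
```

The point of the HDFS-16261 grace period is to make even this fallback path rare: if the old replica survives long enough for the proactive refresh to land, readers never observe the failure at all.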