[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833435#comment-16833435 ] Andrew Purtell commented on HBASE-22301: Let me circle back here. As it turns out, the team was eventually able to observe the underlying cause almost as it happened. The latency-sensitive, high-ingest cluster spans on the order of a few hundred servers across several racks. Occasionally a handful of failing network links in this medium-scale installation would experience elevated error rates. When this happened, the default Linux TCP retransmission timeout of 200 milliseconds would need to elapse before the DFS pipeline could make further progress. Long trains of delays of 200 ms or more would effectively stall the writer pipelines transiting the affected links, and the stall would eventually propagate up to the application. The root cause has been addressed. This was more difficult for us to track down than it should have been because, although we were collecting metrics on TCP retransmissions, we were not utilizing them for alerting or problem investigation. That oversight will be rectified, and I offer it as general advice. The mitigation in this patch would have been useful had it been in place. We took other actions that in effect rolled writer pipelines until the pipeline was utilizing datanodes and network circuits not impacted by hardware issues. I’m confident a repeat of the circumstances of this incident will be less impactful with this change in place.
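The advice above about alerting on TCP retransmission metrics can be made concrete. The following is a minimal, hypothetical sketch (not part of this patch) that extracts the cumulative RetransSegs counter from the text format of Linux /proc/net/snmp; a monitoring agent would sample this counter periodically and alert on the delta between samples.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: parse the "Tcp:" section of /proc/net/snmp, which is a
// header line of field names followed by a matching line of values, and return
// the RetransSegs counter (total TCP segments retransmitted), or -1 if absent.
class TcpRetransMetric {
    static long parseRetransSegs(String procNetSnmp) {
        String[] lines = procNetSnmp.split("\n");
        for (int i = 0; i + 1 < lines.length; i++) {
            if (lines[i].startsWith("Tcp:") && lines[i + 1].startsWith("Tcp:")) {
                List<String> names = Arrays.asList(lines[i].trim().split("\\s+"));
                String[] values = lines[i + 1].trim().split("\\s+");
                int idx = names.indexOf("RetransSegs");
                if (idx >= 0 && idx < values.length) {
                    return Long.parseLong(values[idx]);
                }
            }
        }
        return -1L;
    }
}
```

On a real host one would read /proc/net/snmp on a timer and feed the counter delta into the alerting pipeline, which is exactly the gap the comment describes.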
> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
> Issue Type: Improvement
> Components: wal
> Reporter: Andrew Purtell
> Assignee: Andrew Purtell
> Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, HBASE-22301-branch-2.patch, HBASE-22301.patch
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering a gray failure, not an outright outage. HDFS operations, notably syncs, are abnormally slow on pipelines which include this subset of hosts. If the regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be consumed waiting for acks from the datanodes in the pipeline (recall that some of them are sick). Imagine a write-heavy application distributing load uniformly over the cluster at a fairly high rate. With the WAL subsystem slowed by HDFS-level issues, all handlers can be blocked waiting to append to the WAL. Once all handlers are blocked, the application will experience backpressure. All (HBase) clients eventually have too many outstanding writes and block.
>
> Because the application is distributing writes near uniformly in the keyspace, the probability any given service endpoint will dispatch a request to an impacted regionserver, even a single regionserver, approaches 1.0. So the probability that all service endpoints will be affected approaches 1.0.
>
> In order to break the logjam, we need to remove the slow datanodes. Although there are HDFS-level monitoring, mechanisms, and procedures for this, we should also attempt to take mitigating action at the HBase layer as soon as we find ourselves in trouble. It would be enough to remove the affected datanodes from the writer pipelines. A super simple strategy that can be effective is described below.
>
> This is with branch-1 code. I think branch-2's async WAL can mitigate but can still be susceptible. branch-2's sync WAL is susceptible.
>
> We already roll the WAL writer if the pipeline suffers the failure of a datanode and the replication factor on the pipeline is too low. We should also consider how much time it took for the write pipeline to complete a sync the last time we measured it, or the max over the interval from now to the last time we checked. If the sync time exceeds a configured threshold, roll the log writer then too. Fortunately we don't need to know which datanode is making the WAL write pipeline slow, only that syncs on the pipeline are too slow and exceeding a threshold. This is enough information to know when to roll it. Once we roll it, we will get three new randomly selected datanodes. On most clusters the probability the new pipeline includes the slow datanode will be low. (And if for some reason it does end up with a problematic datanode again, we roll again.) This is not a silver bullet but it can be a reasonably effective mitigation.
>
> Provide a metric for tracking when a log roll is requested (and for what reason).
>
> Emit a log line at log roll time that includes datanode pipeline details for further debugging and analysis, similar to the existing slow FSHLog sync log line.
>
> If we roll too many times within a short interval of time, this probably means there is a widespread problem with the fleet, so our mitigation is not helping and may be exacerbating those problems or operator difficulties. Ensure log roll requests triggered by this new feature happen infrequently enough to not cause difficulties under either normal or abnormal conditions. A very simple strategy that could work well under both normal and abnormal conditions is to define a fairly lengthy interval, default 5 minutes, and then ensure we do not roll more than once during this interval for this reason.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
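As a rough illustration of the heuristic described in the issue (a sketch, not the actual code committed with this patch), the roll decision can be modeled as: count syncs slower than a threshold within a sliding window, request a roll when the count exceeds a limit, and rate-limit roll requests to at most one per interval. All class and parameter names below are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the slow-sync roll heuristic. Not HBase source code.
class SlowSyncRollPolicy {
    private final long slowSyncMs;        // a sync slower than this is "slow"
    private final int rollThreshold;      // slow syncs in the window before rolling
    private final long windowMs;          // sliding window for counting slow syncs
    private final long minRollIntervalMs; // at most one slow-sync roll per this interval
    private final Deque<Long> slowSyncs = new ArrayDeque<>();
    private long lastRollMs = Long.MIN_VALUE / 2; // effectively "never rolled"

    SlowSyncRollPolicy(long slowSyncMs, int rollThreshold,
                       long windowMs, long minRollIntervalMs) {
        this.slowSyncMs = slowSyncMs;
        this.rollThreshold = rollThreshold;
        this.windowMs = windowMs;
        this.minRollIntervalMs = minRollIntervalMs;
    }

    // Called after every WAL sync; returns true when a roll should be requested.
    boolean onSync(long nowMs, long syncDurationMs) {
        if (syncDurationMs >= slowSyncMs) {
            slowSyncs.addLast(nowMs);
        }
        // Drop slow-sync records that have fallen out of the window.
        while (!slowSyncs.isEmpty() && nowMs - slowSyncs.peekFirst() > windowMs) {
            slowSyncs.removeFirst();
        }
        if (slowSyncs.size() >= rollThreshold
                && nowMs - lastRollMs >= minRollIntervalMs) {
            lastRollMs = nowMs;
            slowSyncs.clear(); // the new pipeline starts with a clean slate
            return true;
        }
        return false;
    }
}
```

Note that the policy needs no knowledge of which datanode is slow, exactly as the description argues: the roll itself re-randomizes the pipeline, and the once-per-interval guard prevents roll storms when the whole fleet is degraded.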
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831458#comment-16831458 ] Hudson commented on HBASE-22301: Results for branch branch-2 [build #1860 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1860/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1860//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1860//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1860//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831446#comment-16831446 ] Hudson commented on HBASE-22301: Results for branch master [build #976 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/976/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/976//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/976//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/976//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831415#comment-16831415 ] Hudson commented on HBASE-22301: Results for branch branch-1 [build #802 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/802/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/802//General_Nightly_Build_Report/] (x) {color:red}-1 jdk7 checks{color} -- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/802//JDK7_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/802//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 source release artifact{color} -- See build output for details.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831359#comment-16831359 ] Andrew Purtell commented on HBASE-22301: Ok, thank you. ROLL_ON_SYNC_TIME_MS should be hbase.regionserver.wal.roll.on.sync.ms; I will fix that on commit.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831337#comment-16831337 ] David Manning commented on HBASE-22301: --- Other {{hlog}} references in config settings were updated to {{wal}} across the branches as part of the previous patch, but the latest patch is now adding a {{hlog}} reference to branch-1 and branch-2 (because of the backport of HBASE-21806 from master). As long as that's the desired design, that's ok (as you say, they are synchronized across the branches), but it just looked weird to me given the other changes towards {{wal}}.
{code:java}
static final String SLOW_SYNC_TIME_MS = "hbase.regionserver.wal.slowsync.ms";
static final String ROLL_ON_SYNC_TIME_MS = "hbase.regionserver.hlog.roll.on.sync.ms";
static final String SLOW_SYNC_ROLL_THRESHOLD = "hbase.regionserver.wal.slowsync.roll.threshold";
static final String SLOW_SYNC_ROLL_INTERVAL_MS = "hbase.regionserver.wal.slowsync.roll.interval.ms";
{code}
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831267#comment-16831267 ] Andrew Purtell commented on HBASE-22301: All configs are synchronized across the branches, except in master where we don't do the fallback to deprecated ones. > Consider rolling the WAL if the HDFS write pipeline is slow > --- > > Key: HBASE-22301 > URL: https://issues.apache.org/jira/browse/HBASE-22301 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.3.0 > > Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, > HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, > HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, > HBASE-22301-branch-1.patch, HBASE-22301-branch-2.patch, HBASE-22301.patch > > > Consider the case when a subset of the HDFS fleet is unhealthy but suffering > a gray failure not an outright outage. HDFS operations, notably syncs, are > abnormally slow on pipelines which include this subset of hosts. If the > regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be > consumed waiting for acks from the datanodes in the pipeline (recall that > some of them are sick). Imagine a write heavy application distributing load > uniformly over the cluster at a fairly high rate. With the WAL subsystem > slowed by HDFS level issues, all handlers can be blocked waiting to append to > the WAL. Once all handlers are blocked, the application will experience > backpressure. All (HBase) clients eventually have too many outstanding writes > and block. > Because the application is distributing writes near uniformly in the > keyspace, the probability any given service endpoint will dispatch a request > to an impacted regionserver, even a single regionserver, approaches 1.0. So > the probability that all service endpoints will be affected approaches 1.0. 
> In order to break the logjam, we need to remove the slow datanodes. Although > there is HDFS level monitoring, mechanisms, and procedures for this, we > should also attempt to take mitigating action at the HBase layer as soon as > we find ourselves in trouble. It would be enough to remove the affected > datanodes from the writer pipelines. A super simple strategy that can be > effective is described below: > This is with branch-1 code. I think branch-2's async WAL can mitigate but > still can be susceptible. branch-2 sync WAL is susceptible. > We already roll the WAL writer if the pipeline suffers the failure of a > datanode and the replication factor on the pipeline is too low. We should > also consider how much time it took for the write pipeline to complete a sync > the last time we measured it, or the max over the interval from now to the > last time we checked. If the sync time exceeds a configured threshold, roll > the log writer then too. Fortunately we don't need to know which datanode is > making the WAL write pipeline slow, only that syncs on the pipeline are too > slow and exceeding a threshold. This is enough information to know when to > roll it. Once we roll it, we will get three new randomly selected datanodes. > On most clusters the probability the new pipeline includes the slow datanode > will be low. (And if for some reason it does end up with a problematic > datanode again, we roll again.) > This is not a silver bullet but this can be a reasonably effective mitigation. > Provide a metric for tracking when log roll is requested (and for what > reason). > Emit a log line at log roll time that includes datanode pipeline details for > further debugging and analysis, similar to the existing slow FSHLog sync log > line. > If we roll too many times within a short interval of time this probably means > there is a widespread problem with the fleet and so our mitigation is not > helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently enough not to cause difficulties under either normal or abnormal conditions. A very simple strategy that works well under both is to define a fairly lengthy interval, default 5 minutes, and ensure we do not roll more than once during this interval for this reason.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
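The threshold check plus roll-rate limit described in the issue can be sketched as a small policy object. This is only an illustration under stated assumptions: the class and method names are invented here, the real patch wires this logic into the WAL and log roller, and the threshold and interval values are constructor parameters rather than the patch's actual defaults.

```java
// Sketch of the proposed mitigation: request a WAL roll when an observed sync
// exceeds a threshold, but at most once per interval so a fleet-wide problem
// cannot trigger a roll storm. Names and structure are hypothetical.
class SlowSyncRollPolicy {
  private final long slowSyncThresholdMs; // roll if a sync takes longer than this
  private final long minRollIntervalMs;   // at most one roll per interval (5 minutes in the proposal)
  private boolean rolledBefore = false;
  private long lastRollTimeMs = 0L;

  SlowSyncRollPolicy(long slowSyncThresholdMs, long minRollIntervalMs) {
    this.slowSyncThresholdMs = slowSyncThresholdMs;
    this.minRollIntervalMs = minRollIntervalMs;
  }

  /** Called after each sync completes, with the measured sync duration. */
  boolean shouldRoll(long syncTimeMs, long nowMs) {
    if (syncTimeMs <= slowSyncThresholdMs) {
      return false; // pipeline healthy enough; no action
    }
    if (rolledBefore && nowMs - lastRollTimeMs < minRollIntervalMs) {
      return false; // rolled recently; avoid exacerbating a widespread problem
    }
    rolledBefore = true;
    lastRollTimeMs = nowMs;
    return true; // request a writer roll to get a fresh datanode pipeline
  }
}
```

Note that no datanode identification is needed, matching the observation above: the policy only sees sync latency, and rolling re-draws the pipeline.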
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831245#comment-16831245 ]

David Manning commented on HBASE-22301: Looks good to me. Regarding the earlier comment from [~busbey]: should {{hbase.regionserver.hlog.roll.on.sync.ms}} be changed to {{hbase.regionserver.wal.roll.on.sync.ms}} in branch-1, branch-2, and master?
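The usual compatibility pattern behind the rename question above is to read the new key and fall back to the deprecated one, which is also what the earlier comment about master dropping the fallback refers to. A minimal sketch, using a plain map in place of Hadoop's Configuration; the helper name and the default value passed by the caller are placeholders, not the patch's actual code or default:

```java
import java.util.Map;

// Hypothetical lookup helper: prefer the renamed key, fall back to the
// deprecated key, else use the supplied default. Key names are the ones
// discussed in this thread.
class WalRollConfigLookup {
  static final String KEY = "hbase.regionserver.wal.roll.on.sync.ms";
  static final String DEPRECATED_KEY = "hbase.regionserver.hlog.roll.on.sync.ms";

  static long rollOnSyncMs(Map<String, String> conf, long defaultMs) {
    String v = conf.get(KEY);
    if (v == null) {
      v = conf.get(DEPRECATED_KEY); // honor configs written against the old name
    }
    return v == null ? defaultMs : Long.parseLong(v);
  }
}
```

Dropping the fallback in master, as described, would amount to removing the `DEPRECATED_KEY` branch.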
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831120#comment-16831120 ]

Andrew Purtell commented on HBASE-22301: Precommit results look good. This checkstyle nit I won't be able to fix:

{code}TestLogRolling.java:106: @Test:3: Method length is 164 lines (max allowed is 150). [MethodLength]{code}

I have a +1 already for the branch-1 work. Unless there is an objection I am going to commit this to branch-1, branch-2, and master today, after running more local checks. If you have any concerns please post them now.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830834#comment-16830834 ]

HBase QA commented on HBASE-22301:

| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 26s | Docker mode activated. |
|| Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| master Compile Tests ||
| 0 | mvndep | 0m 26s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 2s | master passed |
| +1 | compile | 1m 18s | master passed |
| +1 | checkstyle | 1m 34s | master passed |
| +1 | shadedjars | 4m 23s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 3m 46s | master passed |
| +1 | javadoc | 0m 58s | master passed |
|| Patch Compile Tests ||
| 0 | mvndep | 0m 16s | Maven dependency ordering for patch |
| +1 | mvninstall | 3m 56s | the patch passed |
| +1 | compile | 1m 19s | the patch passed |
| +1 | javac | 1m 19s | the patch passed |
| -1 | checkstyle | 1m 8s | hbase-server: The patch generated 1 new + 24 unchanged - 0 fixed = 25 total (was 24) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 22s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 8m 12s | Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. |
| +1 | findbugs | 4m 44s | the patch passed |
| +1 | javadoc | 0m 56s | the patch passed |
|| Other Tests ||
| +1 | unit | 0m 28s | hbase-hadoop-compat in the patch passed. |
| +1 | unit | 0m 34s | hbase-hadoop2-compat in the patch passed. |
| +1 | unit | 140m 31s | hbase-server in the patch passed. |
| +1 | asflicense | 1m 15s | The patch does not generate ASF License warnings. |
| | | 186m 10s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/PreCommit-HBASE-Build/223/artifact/patchprocess/Dockerfile |
| JIRA Issue | HBASE-22301 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967533/HBASE-22301.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 50369a6b5d7f 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personal
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830754#comment-16830754 ]

Andrew Purtell commented on HBASE-22301: With regard to this functional area there are three distinct WAL implementations: one in the branch-1s, one in the branch-2s, and one in the master branch. Where possible I made small and careful refactors to make them more similar. The result is three patches: one for branch-1, one for branch-2, and one for master. TestLogRolling and its new unit test exercising these changes pass in all three versions. Let's see what precommit says, modulo possible unrelated failures; I am running the complete suite locally for these changes now.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830509#comment-16830509 ]

Andrew Purtell commented on HBASE-22301: I see the new checkstyle warnings; let me fix them for branch-1 and post a new patch for that, along with the forthcoming branch-2/master patch...
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830507#comment-16830507 ]

Andrew Purtell commented on HBASE-22301: I ran TestLogRolling 25 times in a loop and it passed every time. I don't think my additions will make it flaky, although the new test is expensive in terms of time, and overall this unit now takes too long. I will file a follow-up to break it up.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830477#comment-16830477 ] Sean Busbey commented on HBASE-22301: - should be in HFileSystem > Consider rolling the WAL if the HDFS write pipeline is slow > --- > > Key: HBASE-22301 > URL: https://issues.apache.org/jira/browse/HBASE-22301 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.3.0 > > Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, > HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, > HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch > > > Consider the case when a subset of the HDFS fleet is unhealthy but suffering > a gray failure not an outright outage. HDFS operations, notably syncs, are > abnormally slow on pipelines which include this subset of hosts. If the > regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be > consumed waiting for acks from the datanodes in the pipeline (recall that > some of them are sick). Imagine a write heavy application distributing load > uniformly over the cluster at a fairly high rate. With the WAL subsystem > slowed by HDFS level issues, all handlers can be blocked waiting to append to > the WAL. Once all handlers are blocked, the application will experience > backpressure. All (HBase) clients eventually have too many outstanding writes > and block. > Because the application is distributing writes near uniformly in the > keyspace, the probability any given service endpoint will dispatch a request > to an impacted regionserver, even a single regionserver, approaches 1.0. So > the probability that all service endpoints will be affected approaches 1.0. > In order to break the logjam, we need to remove the slow datanodes. 
Although > there is HDFS level monitoring, mechanisms, and procedures for this, we > should also attempt to take mitigating action at the HBase layer as soon as > we find ourselves in trouble. It would be enough to remove the affected > datanodes from the writer pipelines. A super simple strategy that can be > effective is described below: > This is with branch-1 code. I think branch-2's async WAL can mitigate but > still can be susceptible. branch-2 sync WAL is susceptible. > We already roll the WAL writer if the pipeline suffers the failure of a > datanode and the replication factor on the pipeline is too low. We should > also consider how much time it took for the write pipeline to complete a sync > the last time we measured it, or the max over the interval from now to the > last time we checked. If the sync time exceeds a configured threshold, roll > the log writer then too. Fortunately we don't need to know which datanode is > making the WAL write pipeline slow, only that syncs on the pipeline are too > slow and exceeding a threshold. This is enough information to know when to > roll it. Once we roll it, we will get three new randomly selected datanodes. > On most clusters the probability the new pipeline includes the slow datanode > will be low. (And if for some reason it does end up with a problematic > datanode again, we roll again.) > This is not a silver bullet but this can be a reasonably effective mitigation. > Provide a metric for tracking when log roll is requested (and for what > reason). > Emit a log line at log roll time that includes datanode pipeline details for > further debugging and analysis, similar to the existing slow FSHLog sync log > line. > If we roll too many times within a short interval of time this probably means > there is a widespread problem with the fleet and so our mitigation is not > helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently > enough to not cause difficulties under either normal or abnormal conditions. > A very simple strategy that could work well under both normal and abnormal > conditions is to define a fairly lengthy interval, default 5 minutes, and > then ensure we do not roll more than once during this interval for this > reason. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
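The threshold-plus-guard-interval strategy described in the issue can be sketched as follows. This is an illustrative sketch, not code from the patch; the class and method names are hypothetical:

```java
// Hypothetical sketch of the roll policy described above: request a WAL roll
// when the slowest sync observed since the last check exceeds a threshold,
// but never more than once per guard interval (default 5 minutes).
public final class SlowSyncRollPolicy {
  private final long rollThresholdMs;    // roll if a sync took longer than this
  private final long minRollIntervalMs;  // at most one slow-sync roll per interval
  private long maxSyncTimeMs;            // slowest sync since the last check
  private long lastRollTimeMs = Long.MIN_VALUE / 2;  // permit an immediate first roll

  public SlowSyncRollPolicy(long rollThresholdMs, long minRollIntervalMs) {
    this.rollThresholdMs = rollThresholdMs;
    this.minRollIntervalMs = minRollIntervalMs;
  }

  /** Record the latency of a completed pipeline sync. */
  public synchronized void onSyncCompleted(long syncTimeMs) {
    maxSyncTimeMs = Math.max(maxSyncTimeMs, syncTimeMs);
  }

  /** Called periodically; true means the caller should request a log roll. */
  public synchronized boolean shouldRollWriter(long nowMs) {
    long observedMax = maxSyncTimeMs;
    maxSyncTimeMs = 0;  // reset the observation window
    if (observedMax <= rollThresholdMs) {
      return false;  // pipeline syncs are healthy
    }
    if (nowMs - lastRollTimeMs < minRollIntervalMs) {
      return false;  // slow, but we rolled recently; avoid a roll storm
    }
    lastRollTimeMs = nowMs;
    return true;
  }

  public static void main(String[] args) {
    // 10 second sync threshold, 5 minute guard interval.
    SlowSyncRollPolicy policy = new SlowSyncRollPolicy(10_000L, 300_000L);
    policy.onSyncCompleted(50);
    System.out.println(policy.shouldRollWriter(1_000L));  // healthy: no roll
    policy.onSyncCompleted(15_000L);
    System.out.println(policy.shouldRollWriter(2_000L));  // slow: roll
    policy.onSyncCompleted(20_000L);
    System.out.println(policy.shouldRollWriter(3_000L));  // still slow, but rate limited
  }
}
```

Note the guard interval handles exactly the abnormal case the description warns about: if the whole fleet is sick, repeated rolls would not help, so the policy refuses to roll more than once per interval for this reason.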
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830463#comment-16830463 ] Sergey Shelukhin commented on HBASE-22301: -- It may do so anyway, due to HDFS-14387. What is the thing that prevents it from picking the local node? +1
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830224#comment-16830224 ] Sean Busbey commented on HBASE-22301: - The WAL should not be picking a local DN at all for WAL pipelines unless our mechanism for telling DFS not to do that is broken. If it is, that's a different problem from the one being solved here. +1 on the current union of approaches.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830031 ] chenxu commented on HBASE-22301: There is an earlier JIRA to do the same thing (HBASE-20902). I uploaded a patch there recently; hope you can review it for us. :)
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829981#comment-16829981 ] Andrew Purtell commented on HBASE-22301: bq. DFSClient will choose local DN first, if RS & DN deployed on the same node, the local slow DN will always be selected, so i think we should exclude the slow DN first. This is indeed intended as a mitigation while the slow DNs are found and removed. See the top post. It is not expected nor intended to be a total solution, but it is better than the current state of affairs (which does nothing).
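To put a rough number on the claim that a freshly rolled pipeline will usually dodge the slow datanodes, here is a back-of-the-envelope hypergeometric estimate. This is illustrative only (the class name is hypothetical), and it deliberately ignores rack awareness and the local-DN preference discussed in this thread:

```java
// Illustrative estimate, not code from the patch: if a cluster has n datanodes
// and s of them are slow, a 3-replica pipeline chosen uniformly at random
// avoids every slow datanode with probability C(n - s, 3) / C(n, 3).
public final class PipelineRollOdds {
  /** Number of 3-element subsets of n items, computed in double precision. */
  static double choose3(int n) {
    return (double) n * (n - 1) * (n - 2) / 6.0;
  }

  /** Probability a random 3-node pipeline contains none of the s slow nodes. */
  static double avoidsSlowNodes(int n, int s) {
    return choose3(n - s) / choose3(n);
  }

  public static void main(String[] args) {
    // 200 datanodes, 2 slow: a single re-roll avoids both about 97% of the time.
    System.out.printf("n=200, s=2: %.4f%n", avoidsSlowNodes(200, 2));
    // 20 datanodes, 2 slow: the odds drop, so repeated rolls may be needed.
    System.out.printf("n=20,  s=2: %.4f%n", avoidsSlowNodes(20, 2));
  }
}
```

This is consistent with the description's point: on a medium or large cluster a re-roll very likely lands on healthy datanodes, and on a small cluster the "roll again if still slow" fallback carries more of the weight.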
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829974#comment-16829974 ] chenxu commented on HBASE-22301: {quote}On most clusters the probability the new pipeline includes the slow datanode will be low.{quote} DFSClient will choose the local DN first; if the RS and DN are deployed on the same node, the local slow DN will always be selected, so I think we should exclude the slow DN first.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829907#comment-16829907 ] HBase QA commented on HBASE-22301: -- -1 overall
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
Prechecks:
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
branch-1 Compile Tests:
| 0 | mvndep | 0m 20s | Maven dependency ordering for branch |
| +1 | mvninstall | 1m 53s | branch-1 passed |
| +1 | compile | 0m 57s | branch-1 passed with JDK v1.8.0_212 |
| +1 | compile | 1m 5s | branch-1 passed with JDK v1.7.0_222 |
| +1 | checkstyle | 1m 50s | branch-1 passed |
| +1 | shadedjars | 2m 49s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 47s | branch-1 passed with JDK v1.8.0_212 |
| +1 | javadoc | 1m 1s | branch-1 passed with JDK v1.7.0_222 |
Patch Compile Tests:
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 38s | the patch passed |
| +1 | compile | 0m 55s | the patch passed with JDK v1.8.0_212 |
| +1 | javac | 0m 55s | the patch passed |
| +1 | compile | 1m 4s | the patch passed with JDK v1.7.0_222 |
| +1 | javac | 1m 4s | the patch passed |
| -1 | checkstyle | 1m 23s | hbase-server: The patch generated 3 new + 94 unchanged - 6 fixed = 97 total (was 100) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 2m 48s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 1m 41s | Patch does not cause any errors with Hadoop 2.7.4. |
| +1 | javadoc | 0m 50s | the patch passed with JDK v1.8.0_212 |
| +1 | javadoc | 1m 1s | the patch passed with JDK v1.7.0_222 |
Other Tests:
| +1 | unit | 0m 22s | hbase-hadoop-compat in the patch passed. |
| +1 | unit | 0m 32s | hbase-hadoop2-compat in the patch passed. |
| -1 | unit | 101m 30s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 45s | The patch does not generate ASF License warnings. |
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829867#comment-16829867 ] Andrew Purtell commented on HBASE-22301:

Assuming this is committed I will file a follow-up to break up TestLogRolling. Its running time for me is now over five minutes, about 350 seconds. Besides the additions from this patch, some of the other cases in this unit also run for a long time.

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
> Issue Type: Improvement
> Components: wal
> Reporter: Andrew Purtell
> Assignee: Andrew Purtell
> Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering a gray failure, not an outright outage. HDFS operations, notably syncs, are abnormally slow on pipelines which include this subset of hosts. If the regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be consumed waiting for acks from the datanodes in the pipeline (recall that some of them are sick). Imagine a write-heavy application distributing load uniformly over the cluster at a fairly high rate. With the WAL subsystem slowed by HDFS-level issues, all handlers can be blocked waiting to append to the WAL. Once all handlers are blocked, the application will experience backpressure. All (HBase) clients eventually have too many outstanding writes and block.
>
> Because the application is distributing writes near uniformly in the keyspace, the probability any given service endpoint will dispatch a request to an impacted regionserver, even a single regionserver, approaches 1.0. So the probability that all service endpoints will be affected approaches 1.0. In order to break the logjam, we need to remove the slow datanodes. Although there is HDFS-level monitoring, mechanisms, and procedures for this, we should also attempt to take mitigating action at the HBase layer as soon as we find ourselves in trouble. It would be enough to remove the affected datanodes from the writer pipelines. A super simple strategy that can be effective is described below.
>
> This is with branch-1 code. I think branch-2's async WAL can mitigate but still can be susceptible. branch-2 sync WAL is susceptible.
>
> We already roll the WAL writer if the pipeline suffers the failure of a datanode and the replication factor on the pipeline is too low. We should also consider how much time it took for the write pipeline to complete a sync the last time we measured it, or the max over the interval from now to the last time we checked. If the sync time exceeds a configured threshold, roll the log writer then too. Fortunately we don't need to know which datanode is making the WAL write pipeline slow, only that syncs on the pipeline are too slow and exceeding a threshold. This is enough information to know when to roll it. Once we roll it, we will get three new randomly selected datanodes. On most clusters the probability the new pipeline includes the slow datanode will be low. (And if for some reason it does end up with a problematic datanode again, we roll again.)
>
> This is not a silver bullet, but it can be a reasonably effective mitigation.
>
> Provide a metric for tracking when a log roll is requested (and for what reason).
>
> Emit a log line at log roll time that includes datanode pipeline details for further debugging and analysis, similar to the existing slow FSHLog sync log line.
>
> If we roll too many times within a short interval of time, this probably means there is a widespread problem with the fleet, and so our mitigation is not helping and may be exacerbating those problems or operator difficulties. Ensure log roll requests triggered by this new feature happen infrequently enough to not cause difficulties under either normal or abnormal conditions. A very simple strategy that could work well under both normal and abnormal conditions is to define a fairly lengthy interval, default 5 minutes, and then ensure we do not roll more than once during this interval for this reason.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
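The throttled strategy described in the issue (count slow syncs over a window, but request a roll at most once per lengthy interval) can be sketched roughly as below. The class, method, and parameter names are illustrative assumptions for this sketch, not the actual HBase implementation.

```java
// Hypothetical sketch of a throttled slow-sync roll policy. Not HBase code.
class SlowSyncRollPolicy {
    private final long slowSyncNs;        // a sync slower than this counts as "slow"
    private final int rollThreshold;      // slow syncs within the window before requesting a roll
    private final long checkWindowMs;     // window over which slow syncs are counted
    private final long minRollIntervalMs; // never request a roll more often than this (e.g. 5 min)

    private int slowSyncCount;
    private long windowStartMs;
    // Far enough in the past that the first roll is allowed, without long overflow.
    private long lastRollRequestMs = Long.MIN_VALUE / 2;

    SlowSyncRollPolicy(long slowSyncNs, int rollThreshold,
                       long checkWindowMs, long minRollIntervalMs) {
        this.slowSyncNs = slowSyncNs;
        this.rollThreshold = rollThreshold;
        this.checkWindowMs = checkWindowMs;
        this.minRollIntervalMs = minRollIntervalMs;
    }

    /** Record one sync; return true if a WAL roll should be requested now. */
    boolean onSync(long syncDurationNs, long nowMs) {
        if (nowMs - windowStartMs > checkWindowMs) {
            // Start a fresh counting window.
            windowStartMs = nowMs;
            slowSyncCount = 0;
        }
        if (syncDurationNs > slowSyncNs) {
            slowSyncCount++;
        }
        if (slowSyncCount >= rollThreshold
            && nowMs - lastRollRequestMs >= minRollIntervalMs) {
            lastRollRequestMs = nowMs;
            slowSyncCount = 0;
            return true; // caller rolls the writer; the new pipeline gets new datanodes
        }
        return false;
    }
}
```

A roll request only fires when a train of slow syncs accumulates, and the once-per-interval guard keeps the mitigation from thrashing during a fleet-wide problem.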
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829862#comment-16829862 ] Andrew Purtell commented on HBASE-22301:

Attached an updated patch that takes a union of the approaches, as discussed. I also added negative tests before and after the positive tests in the unit. Submitting for precommit, but let me also run this test in a loop to be as diligent as I can about not introducing a flake. If I see a problem I will report it; otherwise assume all is good.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829820#comment-16829820 ] Andrew Purtell commented on HBASE-22301:

Just to be clear, I'm not suggesting a revert of HBASE-21806 [~sershe]. That was our original approach. Based on data from an incident retrospective it wouldn't have been a sufficient mitigation, though, so we had to set it aside as too simplistic and prone to false positives at the latency threshold we need. I would not claim it is a bad thing to do, especially if it made a difference for you at the high threshold committed with that patch.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829810#comment-16829810 ] Andrew Purtell commented on HBASE-22301:

Maybe we can consider a future enhancement that does weighting or smoothing or even prediction (phi-accrual?), but we run the risk of spending a lot of effort over-engineering a mitigation-only feature that might not see much use nor prove useful enough. The patch here counts the number of slow sync warnings within an interval. If we trip a limit over that time, we will roll. I am working on a union patch that will also trigger a roll as soon as we see an outlier beyond the threshold introduced by HBASE-21806. So we will roll:
- If any single sync exceeds the HBASE-21806 threshold
- If the count of slow syncs within the current monitoring interval exceeds a threshold (the latest revs of the patch on this issue)
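The union of the two triggers described in the comment above can be sketched roughly as follows. The class name, constants, and the specific threshold values are illustrative assumptions for this sketch, not the actual patch code.

```java
// Illustrative sketch of the union trigger: roll immediately on one very slow
// sync (the HBASE-21806-style outlier), or on a train of moderately slow ones.
// All names and values here are hypothetical.
class UnionRollTrigger {
    static final long SLOW_SYNC_NS = 100_000_000L;          // 100 ms: counts toward the train
    static final long VERY_SLOW_SYNC_NS = 10_000_000_000L;  // 10 s: rolls on its own
    static final int SLOW_SYNC_ROLL_THRESHOLD = 100;        // train length before rolling

    private int slowSyncCount = 0;

    /** Returns true if this sync should trigger a log roll request. */
    boolean shouldRoll(long syncNs) {
        if (syncNs > VERY_SLOW_SYNC_NS) {
            return true;  // single-outlier trigger
        }
        if (syncNs > SLOW_SYNC_NS && ++slowSyncCount >= SLOW_SYNC_ROLL_THRESHOLD) {
            slowSyncCount = 0;
            return true;  // train-of-slow-syncs trigger
        }
        return false;
    }
}
```

Either condition alone requests a roll, so a single multi-second stall and a long run of sub-second stalls are both covered.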
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829803#comment-16829803 ] Sergey Shelukhin commented on HBASE-22301:

Well I meant
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829801#comment-16829801 ] Andrew Purtell commented on HBASE-22301:

bq. Should the rolling simply be based on a single value that is a total/weighted sync time accumulated over the N latest syncs, with no minimum threshold by count? That way it can accumulate the offending amount of sync over a single bad one or multiple somewhat-bad ones.

That is the approach we took. (smile) More or less. I don't see any harm with a union approach. Our approach here, which counts the small slow sync warnings and triggers on a train of them, will trigger rolls when we needed them in our incidents. The separate single high latency trigger that HBASE-21806 added will, I assume, trigger rolls when you needed them in your incident(s).
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829797#comment-16829797 ] Sergey Shelukhin commented on HBASE-22301:

Sorry about the master-only commit. I just started contributing to HBase again and was assuming that we should move forwards, not backwards ;) I see that branch-2 is still very much alive now, so I'm committing recent fixes there too.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829796#comment-16829796 ] Andrew Purtell commented on HBASE-22301:

I sent an email to dev@hbase titled "Trunk only commits are a waste of everyone's time". I am making this claim on that thread (smile). Let's take any response to that to the email thread. Back soon with a patch for branch-1 that takes the union of this approach and HBASE-21806.

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
> Issue Type: Improvement
> Components: wal
> Reporter: Andrew Purtell
> Assignee: Andrew Purtell
> Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering a gray failure, not an outright outage. HDFS operations, notably syncs, are abnormally slow on pipelines which include this subset of hosts. If the regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be consumed waiting for acks from the datanodes in the pipeline (recall that some of them are sick). Imagine a write-heavy application distributing load uniformly over the cluster at a fairly high rate. With the WAL subsystem slowed by HDFS-level issues, all handlers can be blocked waiting to append to the WAL. Once all handlers are blocked, the application will experience backpressure. All (HBase) clients eventually have too many outstanding writes and block.
> Because the application is distributing writes near uniformly in the keyspace, the probability that any given service endpoint will dispatch a request to an impacted regionserver, even a single regionserver, approaches 1.0. So the probability that all service endpoints will be affected approaches 1.0. In order to break the logjam, we need to remove the slow datanodes. Although there is HDFS-level monitoring, mechanisms, and procedures for this, we should also attempt to take mitigating action at the HBase layer as soon as we find ourselves in trouble. It would be enough to remove the affected datanodes from the writer pipelines. A super simple strategy that can be effective is described below.
> This is with branch-1 code. I think branch-2's async WAL can mitigate but still can be susceptible. branch-2's sync WAL is susceptible.
> We already roll the WAL writer if the pipeline suffers the failure of a datanode and the replication factor on the pipeline is too low. We should also consider how much time it took for the write pipeline to complete a sync the last time we measured it, or the max over the interval from now to the last time we checked. If the sync time exceeds a configured threshold, roll the log writer then too. Fortunately we don't need to know which datanode is making the WAL write pipeline slow, only that syncs on the pipeline are too slow and exceeding a threshold. This is enough information to know when to roll it. Once we roll it, we will get three new randomly selected datanodes. On most clusters the probability that the new pipeline includes the slow datanode will be low. (And if for some reason it does end up with a problematic datanode again, we roll again.)
> This is not a silver bullet, but it can be a reasonably effective mitigation.
> Provide a metric for tracking when a log roll is requested (and for what reason).
> Emit a log line at log roll time that includes datanode pipeline details for further debugging and analysis, similar to the existing slow FSHLog sync log line.
> If we roll too many times within a short interval of time, this probably means there is a widespread problem with the fleet, so our mitigation is not helping and may be exacerbating those problems or operator difficulties. Ensure log roll requests triggered by this new feature happen infrequently enough to not cause difficulties under either normal or abnormal conditions. A very simple strategy that could work well under both normal and abnormal conditions is to define a fairly lengthy interval, default 5 minutes, and then ensure we do not roll more than once during this interval for this reason.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
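The policy the issue describes (count a train of slow syncs above a threshold, then request a roll, rate-limited to roughly once per 5-minute interval) can be sketched as below. This is an illustrative sketch only, not the FSHLog implementation; the class and parameter names are invented for the example.

```java
// Hypothetical sketch of a slow-sync roll trigger with rate limiting.
// Not HBase code: names and defaults are illustrative assumptions.
class SlowSyncRollPolicy {
    private final long slowSyncThresholdMs;  // a sync slower than this counts as "slow"
    private final int slowSyncCountLimit;    // length of the slow-sync train that triggers a roll
    private final long minRollIntervalMs;    // roll at most once per this interval (e.g. 5 minutes)
    private int slowSyncCount = 0;
    private long lastRollRequestTimeMs;

    SlowSyncRollPolicy(long thresholdMs, int countLimit, long minIntervalMs) {
        this.slowSyncThresholdMs = thresholdMs;
        this.slowSyncCountLimit = countLimit;
        this.minRollIntervalMs = minIntervalMs;
        // Start far enough in the past that the very first roll is not rate-limited.
        this.lastRollRequestTimeMs = -minIntervalMs;
    }

    /** Record one completed sync; return true if a log roll should be requested now. */
    boolean onSyncCompleted(long syncTimeMs, long nowMs) {
        if (syncTimeMs > slowSyncThresholdMs) {
            slowSyncCount++;
        } else {
            slowSyncCount = 0; // a fast sync breaks the train of slow ones
        }
        if (slowSyncCount >= slowSyncCountLimit
                && nowMs - lastRollRequestTimeMs >= minRollIntervalMs) {
            lastRollRequestTimeMs = nowMs;
            slowSyncCount = 0; // reset once the roll is requested
            return true;
        }
        return false;
    }
}
```

The rate limit is what keeps the mitigation from making a fleet-wide problem worse: during a widespread incident the train condition would otherwise fire continuously, and each roll costs a pipeline setup.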
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829795#comment-16829795 ] Sergey Shelukhin commented on HBASE-22301:

In our case, though, the problem was that each slow sync would take tens of seconds, so with the current DEFAULT_SLOW_SYNC_ROLL_THRESHOLD, as far as I can tell from the patch, the condition would not trigger for a very long time. Should the rolling simply be based on a single value that is a total/weighted sync time accumulated over the N latest syncs, with no minimum threshold by count? That way it can accumulate the offending amount of sync time over a single bad sync or over multiple somewhat-bad ones.
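Sergey's suggestion, triggering on the total sync time accumulated over the last N syncs so that one catastrophic sync and many moderately slow ones are treated alike, could look roughly like the sketch below. This is a hypothetical illustration, not proposed patch code; the class name and parameters are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical: request a roll when the sum of the last N sync times exceeds a
// budget, regardless of how that total is distributed across individual syncs.
class AccumulatedSyncCost {
    private final int windowSize;  // N latest syncs to consider
    private final long budgetMs;   // total sync time tolerated within the window
    private final Deque<Long> window = new ArrayDeque<>();
    private long sumMs = 0;

    AccumulatedSyncCost(int windowSize, long budgetMs) {
        this.windowSize = windowSize;
        this.budgetMs = budgetMs;
    }

    /** Record one sync time; return true if the window's total exceeds the budget. */
    boolean onSync(long syncTimeMs) {
        window.addLast(syncTimeMs);
        sumMs += syncTimeMs;
        if (window.size() > windowSize) {
            sumMs -= window.removeFirst(); // slide the window forward
        }
        return sumMs > budgetMs;
    }
}
```

With this shape, a single 30-second sync blows the budget immediately, while a train of 150 ms syncs trips it only once enough of them accumulate, which addresses the "would not trigger for a very long time" concern.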
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829792#comment-16829792 ] Andrew Purtell commented on HBASE-22301:

For example, in the incident we discussed, and with the data discussed above, the 10s threshold of HBASE-21806 would never have been tripped. So instead we count trains of slow syncs with the lower default threshold and get an effective mitigation. I'd be happy to keep the hbase.regionserver.hlog.roll.on.sync.ms config from HBASE-21806 and its simple threshold trigger. I would have discovered these when trying to forward port. It sucks that there was no attempt to backport, so none of us who work with branch-1 or even branch-2 had any idea this existed. Folks committing trunk-only are doing a disservice to the community. Let me make a union of these mechanisms in the context of branch-1 and post it so you can see what I'm talking about before proceeding. I have just about exceeded the time I can spend on this and will have to move on soon. We have been going around and around on this for a week straight.
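The "union" of the two mechanisms described in this thread, HBASE-21806's single-sync hard cap alongside the lower-threshold train counter from this issue, might combine as in the following sketch. Names, thresholds, and structure are illustrative assumptions, not the actual patch.

```java
// Hypothetical union of two roll triggers:
//   1. any single sync slower than a hard cap (HBASE-21806 style), or
//   2. a run of consecutive syncs each slower than a lower threshold
//      (the approach from this issue).
class UnionRollTrigger {
    private final long hardCapMs;        // one sync this slow rolls immediately
    private final long slowThresholdMs;  // counts toward a "train" of slow syncs
    private final int trainLength;       // train length that triggers a roll
    private int train = 0;

    UnionRollTrigger(long hardCapMs, long slowThresholdMs, int trainLength) {
        this.hardCapMs = hardCapMs;
        this.slowThresholdMs = slowThresholdMs;
        this.trainLength = trainLength;
    }

    /** Record one sync time; return true if either trigger fires. */
    boolean onSync(long syncTimeMs) {
        if (syncTimeMs >= hardCapMs) {
            train = 0;
            return true; // single catastrophic sync: roll at once
        }
        if (syncTimeMs >= slowThresholdMs) {
            if (++train >= trainLength) {
                train = 0;
                return true; // sustained degradation: roll
            }
        } else {
            train = 0; // a fast sync resets the train
        }
        return false;
    }
}
```

The two triggers cover each other's blind spots: the hard cap reacts to one multi-second stall that a counter would take too long to notice, while the train counter catches a pipeline that is persistently slow without any single sync crossing the cap.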
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829789#comment-16829789 ] Andrew Purtell commented on HBASE-22301:

No, if you read above, we arrived at a different way of doing it. Basically the HBASE-21806 approach is too simplistic and prone to false positives. I'd prefer to replace HBASE-21806 with this approach.
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829784#comment-16829784 ] Sergey Shelukhin commented on HBASE-22301:

Should this augment, or be similar to, HBASE-21806?
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829676#comment-16829676 ] Andrew Purtell commented on HBASE-22301:

Great. To move forward, I'll make a patch for branch-2 and master for the sync WAL. Back soon.
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827428#comment-16827428 ] HBase QA commented on HBASE-22301:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 14m 13s | Docker mode activated. |
Prechecks:
| 0 | findbugs | 0m 1s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
branch-1 Compile Tests:
| 0 | mvndep | 0m 13s | Maven dependency ordering for branch |
| +1 | mvninstall | 1m 46s | branch-1 passed |
| +1 | compile | 0m 58s | branch-1 passed with JDK v1.8.0_212 |
| +1 | compile | 1m 5s | branch-1 passed with JDK v1.7.0_222 |
| +1 | checkstyle | 1m 46s | branch-1 passed |
| +1 | shadedjars | 2m 48s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 50s | branch-1 passed with JDK v1.8.0_212 |
| +1 | javadoc | 1m 1s | branch-1 passed with JDK v1.7.0_222 |
Patch Compile Tests:
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 42s | the patch passed |
| +1 | compile | 0m 59s | the patch passed with JDK v1.8.0_212 |
| +1 | javac | 0m 59s | the patch passed |
| +1 | compile | 1m 5s | the patch passed with JDK v1.7.0_222 |
| +1 | javac | 1m 5s | the patch passed |
| +1 | checkstyle | 0m 11s | The patch passed checkstyle in hbase-hadoop-compat |
| +1 | checkstyle | 0m 13s | The patch passed checkstyle in hbase-hadoop2-compat |
| +1 | checkstyle | 1m 23s | hbase-server: The patch generated 0 new + 94 unchanged - 6 fixed = 94 total (was 100) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 2m 54s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 1m 43s | Patch does not cause any errors with Hadoop 2.7.4. |
| +1 | javadoc | 0m 48s | the patch passed with JDK v1.8.0_212 |
| +1 | javadoc | 1m 1s | the patch passed with JDK v1.7.0_222 |
Other Tests:
| +1 | unit | 0m 22s | hbase-hadoop-compat in the patch passed. |
| +1 | unit | 0m 28s | hbase
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827426#comment-16827426 ] David Manning commented on HBASE-22301:
{quote}Because I always want to check this and update the related metric if we hit this condition. The other conditions can be exclusive with respect to each other.{quote}
It will now be theoretically possible for {{checkLogRoll}} to call {{requestLogRoll}} twice in a row. If that is acceptable, then sounds good to me. I couldn't immediately tell whether this would actually roll twice or whether the two requests would coalesce into one WAL roll. +1 from me, thank you for the fix!

> Consider rolling the WAL if the HDFS write pipeline is slow
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
> Issue Type: Improvement
> Components: wal
> Reporter: Andrew Purtell
> Assignee: Andrew Purtell
> Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering a gray failure, not an outright outage. HDFS operations, notably syncs, are abnormally slow on pipelines which include this subset of hosts. If the regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be consumed waiting for acks from the datanodes in the pipeline (recall that some of them are sick). Imagine a write-heavy application distributing load uniformly over the cluster at a fairly high rate. With the WAL subsystem slowed by HDFS-level issues, all handlers can be blocked waiting to append to the WAL. Once all handlers are blocked, the application will experience backpressure. All (HBase) clients eventually have too many outstanding writes and block.
> Because the application is distributing writes near uniformly in the keyspace, the probability that any given service endpoint will dispatch a request to an impacted regionserver, even a single regionserver, approaches 1.0. So the probability that all service endpoints will be affected approaches 1.0. In order to break the logjam, we need to remove the slow datanodes. Although there is HDFS-level monitoring, and mechanisms and procedures for this, we should also attempt to take mitigating action at the HBase layer as soon as we find ourselves in trouble. It would be enough to remove the affected datanodes from the writer pipelines. A very simple strategy that can be effective is described below.
> This is with branch-1 code. I think branch-2's async WAL can mitigate but can still be susceptible; branch-2's sync WAL is susceptible.
> We already roll the WAL writer if the pipeline suffers the failure of a datanode and the replication factor on the pipeline is too low. We should also consider how much time it took for the write pipeline to complete a sync the last time we measured it, or the max over the interval from now to the last time we checked. If the sync time exceeds a configured threshold, roll the log writer then too. Fortunately we don't need to know which datanode is making the WAL write pipeline slow, only that syncs on the pipeline are too slow and exceed a threshold. This is enough information to know when to roll it. Once we roll it, we will get three new randomly selected datanodes. On most clusters the probability that the new pipeline includes the slow datanode will be low. (And if for some reason it does end up with a problematic datanode again, we roll again.)
> This is not a silver bullet, but it can be a reasonably effective mitigation.
> Provide a metric for tracking when a log roll is requested (and for what reason).
> Emit a log line at log roll time that includes datanode pipeline details for further debugging and analysis, similar to the existing slow FSHLog sync log line.
> If we roll too many times within a short interval of time, this probably means there is a widespread problem with the fleet, so our mitigation is not helping and may be exacerbating those problems or operator difficulties. Ensure log roll requests triggered by this new feature happen infrequently enough to not cause difficulties under either normal or abnormal conditions. A very simple strategy that could work well under both normal and abnormal conditions is to define a fairly lengthy interval, default 5 minutes, and then ensure we do not roll more than once during this interval for this reason.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
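The detection idea in the description, counting syncs whose latency exceeds a configured threshold, can be sketched as below. This is an illustrative sketch only: the class and method names and the 100 ms threshold are assumptions, not the actual FSHLog implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: time each WAL pipeline sync and count the slow ones.
// SyncLatencyTracker, recordSync, and the 100 ms threshold are assumed names
// and values, not HBase's real code.
class SyncLatencyTracker {
    static final long SLOW_SYNC_NS = 100_000_000L; // assumed 100 ms threshold

    final AtomicInteger slowSyncCount = new AtomicInteger();

    // Record the duration of one pipeline sync; returns true if it was slow.
    boolean recordSync(long durationNs) {
        if (durationNs > SLOW_SYNC_NS) {
            slowSyncCount.incrementAndGet();
            return true;
        }
        return false;
    }
}
```

A roll policy would then compare {{slowSyncCount}} against a configured per-interval threshold, as discussed in the comments below on this issue.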
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827398#comment-16827398 ] Andrew Purtell commented on HBASE-22301:
Updated patch; here is the change:
{code:java}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
index 24a2271564..b562946671 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
@@ -734,6 +734,8 @@ public class FSHLog implements WAL {
     // NewPath could be equal to oldPath if replaceWriter fails.
     newPath = replaceWriter(oldPath, newPath, nextWriter, nextHdfsOut);
     tellListenersAboutPostLogRoll(oldPath, newPath);
+    // We got a new writer, so reset the slow sync count
+    slowSyncCount.set(0);
     // Can we delete any of the old log files?
     if (getNumRolledLogFiles() > 0) {
       cleanOldLogs();
{code}
It seemed better to me to reset the count after we switched the writer, and it was actually a bug that we didn't. Thanks for the catch [~dmanning]
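The ordering the patch establishes can be shown with a toy model: reset the slow sync counter only after the writer, and thus the datanode pipeline, has actually been replaced. Everything below is an illustrative stand-in for the FSHLog internals, not the real implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the fix: slow syncs observed against the old pipeline should
// not count against the fresh one, so the reset happens after the switch.
// Class, method, and pipeline names here are invented for illustration.
class WalRollSketch {
    final AtomicInteger slowSyncCount = new AtomicInteger();
    String currentPipeline = "dn1,dn2,dn3";

    // Stand-in for FSHLog.replaceWriter(); in the real code the resulting
    // path can equal the old one if replacement fails.
    void replaceWriter(String nextPipeline) {
        currentPipeline = nextPipeline;
    }

    void rollWriter(String nextPipeline) {
        replaceWriter(nextPipeline);
        // Reset only after the writer swap, mirroring the patch above.
        slowSyncCount.set(0);
    }
}
```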
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827394#comment-16827394 ] Andrew Purtell commented on HBASE-22301:
{quote}nit: why not else if checkSlowSync in {{checkLogRoll}}:{quote}
Because I always want to check this and update the related metric if we hit this condition. The other conditions can be exclusive with respect to each other.
{quote}it seems like {{slowSyncCount.set(0);}} should also be called in requestLogRoll() since we're going to get a new WAL pipeline at that point{quote}
Sure, new patch in just a sec.
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827391#comment-16827391 ] David Manning commented on HBASE-22301:
+1 for the current patch, but two comments below.
nit: why not else if checkSlowSync in {{checkLogRoll}}:
{code:java}
if (checkSlowSync()) {
{code}
Also, it seems like {{slowSyncCount.set(0);}} should also be called in requestLogRoll(), since we're going to get a new WAL pipeline at that point. If we had enough slow syncs during the interval, but then rolled the WAL for an unrelated reason before the end of the interval, then with the current code we will roll again once the interval is up. Both comments aim to limit extraneous roll requests, but seem unlikely to cause any major issues.
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827383#comment-16827383 ] HBase QA commented on HBASE-22301:
(/) *+1 overall*
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 13m 29s | Docker mode activated. |
|| || || || *Prechecks* ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || *branch-1 Compile Tests* ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for branch |
| +1 | mvninstall | 1m 41s | branch-1 passed |
| +1 | compile | 0m 56s | branch-1 passed with JDK v1.8.0_212 |
| +1 | compile | 1m 4s | branch-1 passed with JDK v1.7.0_222 |
| +1 | checkstyle | 1m 46s | branch-1 passed |
| +1 | shadedjars | 2m 44s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 46s | branch-1 passed with JDK v1.8.0_212 |
| +1 | javadoc | 1m 0s | branch-1 passed with JDK v1.7.0_222 |
|| || || || *Patch Compile Tests* ||
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 40s | the patch passed |
| +1 | compile | 0m 56s | the patch passed with JDK v1.8.0_212 |
| +1 | javac | 0m 56s | the patch passed |
| +1 | compile | 1m 5s | the patch passed with JDK v1.7.0_222 |
| +1 | javac | 1m 5s | the patch passed |
| +1 | checkstyle | 0m 11s | The patch passed checkstyle in hbase-hadoop-compat |
| +1 | checkstyle | 0m 12s | The patch passed checkstyle in hbase-hadoop2-compat |
| +1 | checkstyle | 1m 22s | hbase-server: The patch generated 0 new + 94 unchanged - 6 fixed = 94 total (was 100) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 2m 50s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 1m 43s | Patch does not cause any errors with Hadoop 2.7.4. |
| +1 | javadoc | 0m 49s | the patch passed with JDK v1.8.0_212 |
| +1 | javadoc | 1m 2s | the patch passed with JDK v1.7.0_222 |
|| || || || *Other Tests* ||
| +1 | unit | 0m 22s | hbase-hadoop-compat in the patch passed. |
| +1 | unit | 0m 29s | hba
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827341#comment-16827341 ] Andrew Purtell commented on HBASE-22301:
Updated patch for branch-1 with the improvements discussed.
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827141#comment-16827141 ] Andrew Purtell commented on HBASE-22301:
[~dmanning] It is possible for there to be no writes for a long time. Not very likely, but we can handle it: if, when the count finally goes over the threshold, we find that the difference between 'now' and the last time we triggered a roll is twice the monitoring interval, we can reset the count instead of requesting a roll. This prevents the corner case you describe.
Regarding the default thresholds in this patch: I picked 10 slow syncs in one minute as a totally arbitrary choice so I could complete the change and get a patch up for consideration. Now let's discuss what reasonable defaults would be. Based on your analysis of our fleet under normal operation, this change would result in:
- If the threshold is 10 slow syncs in 1 minute, we would request ~30,000 WAL rolls per day under normal operating conditions, over on the order of 100 clusters. (Load is distributed unevenly, so dividing this number evenly by the number of clusters doesn't make sense.) This is more than we would want, I think.
- If the threshold is 200 slow syncs in 1 minute, we would request ~475 WAL rolls per day under normal operating conditions, over on the order of 100 clusters. This would not be harmful.
- During the incident that inspired this change, we had in excess of 500 slow sync warnings in one minute.
As mentioned above, slow sync warnings can easily be false positives due to regionserver GC activity, which makes using them as a signal problematic, but not unreasonable if we set the thresholds to sufficiently discriminate abnormal conditions. Also, bear in mind that under steady state writes we will frequently roll the log upon reaching the file size roll threshold anyway. False positive slow-sync-based rolls will be noise among this activity if we set the threshold right.
Therefore, I think the next patch will have a default threshold of 100 slow syncs in one minute. Still somewhat arbitrary, as defaults tend to be, but given the particular example of our production fleet that would amount to ~950 rolls per day under normal operating conditions over 100 clusters; in trade, it would trigger even if a cluster is only under modest write load, and it would certainly have discriminated the HDFS-level issues we encountered during our incident.
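The corner-case handling described above, resetting the counter rather than rolling when the threshold is only crossed after a long quiet stretch, might look roughly like the sketch below. The class name, threshold, and interval values are illustrative assumptions, not the actual FSHLog code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of the threshold check with the long-gap guard discussed
// above. SlowSyncRollPolicy and its constants are invented for this example.
class SlowSyncRollPolicy {
    static final int SLOW_SYNC_ROLL_THRESHOLD = 100;   // slow syncs per interval
    static final long ROLL_CHECK_INTERVAL_MS = 60_000; // 1-minute window

    final AtomicInteger slowSyncCount = new AtomicInteger();
    long lastRollCheckMs;

    SlowSyncRollPolicy(long nowMs) {
        this.lastRollCheckMs = nowMs;
    }

    // Returns true if a WAL roll should be requested because syncs have been
    // persistently slow within the monitoring interval.
    boolean checkSlowSync(long nowMs) {
        if (slowSyncCount.get() < SLOW_SYNC_ROLL_THRESHOLD) {
            return false;
        }
        if (nowMs - lastRollCheckMs > 2 * ROLL_CHECK_INTERVAL_MS) {
            // Threshold crossed only after a long quiet stretch: the slow
            // syncs were spread too thin to indicate a sick pipeline, so
            // start over instead of rolling.
            slowSyncCount.set(0);
            lastRollCheckMs = nowMs;
            return false;
        }
        // Persistent slowness inside the window: request a roll and reset.
        slowSyncCount.set(0);
        lastRollCheckMs = nowMs;
        return true;
    }
}
```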
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826688#comment-16826688 ] David Manning commented on HBASE-22301: --- {quote}To be more precise what I’m thinking is if a long time had elapsed and then we are finally pushed over the threshold, we should set the counter to 1 and return false. We start over rather than trigger a roll. Seems reasonable to me. {quote} Ah yes, I misread the code. I thought {{checkSlowSync}} was being called as part of recognizing we had a slow sync, and not from {{checkLogRoll}} which is happening much more frequently. So it seems much less likely that we would be a long time past the monitoring interval, compared to what I was thinking. I'd also be okay with ignoring my previous comment - even if it is more correct, it seems to only matter in very lightly utilized servers that would not sync to the WAL for extended periods of time, unless I'm missing something. > Consider rolling the WAL if the HDFS write pipeline is slow > --- > > Key: HBASE-22301 > URL: https://issues.apache.org/jira/browse/HBASE-22301 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.3.0 > > Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, > HBASE-22301-branch-1.patch > > > Consider the case when a subset of the HDFS fleet is unhealthy but suffering > a gray failure not an outright outage. HDFS operations, notably syncs, are > abnormally slow on pipelines which include this subset of hosts. If the > regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be > consumed waiting for acks from the datanodes in the pipeline (recall that > some of them are sick). Imagine a write heavy application distributing load > uniformly over the cluster at a fairly high rate. 
With the WAL subsystem > slowed by HDFS level issues, all handlers can be blocked waiting to append to > the WAL. Once all handlers are blocked, the application will experience > backpressure. All (HBase) clients eventually have too many outstanding writes > and block. > Because the application is distributing writes near uniformly in the > keyspace, the probability any given service endpoint will dispatch a request > to an impacted regionserver, even a single regionserver, approaches 1.0. So > the probability that all service endpoints will be affected approaches 1.0. > In order to break the logjam, we need to remove the slow datanodes. Although > there is HDFS level monitoring, mechanisms, and procedures for this, we > should also attempt to take mitigating action at the HBase layer as soon as > we find ourselves in trouble. It would be enough to remove the affected > datanodes from the writer pipelines. A super simple strategy that can be > effective is described below: > This is with branch-1 code. I think branch-2's async WAL can mitigate but > still can be susceptible. branch-2 sync WAL is susceptible. > We already roll the WAL writer if the pipeline suffers the failure of a > datanode and the replication factor on the pipeline is too low. We should > also consider how much time it took for the write pipeline to complete a sync > the last time we measured it, or the max over the interval from now to the > last time we checked. If the sync time exceeds a configured threshold, roll > the log writer then too. Fortunately we don't need to know which datanode is > making the WAL write pipeline slow, only that syncs on the pipeline are too > slow and exceeding a threshold. This is enough information to know when to > roll it. Once we roll it, we will get three new randomly selected datanodes. > On most clusters the probability the new pipeline includes the slow datanode > will be low. 
(And if for some reason it does end up with a problematic > datanode again, we roll again.) > This is not a silver bullet but this can be a reasonably effective mitigation. > Provide a metric for tracking when log roll is requested (and for what > reason). > Emit a log line at log roll time that includes datanode pipeline details for > further debugging and analysis, similar to the existing slow FSHLog sync log > line. > If we roll too many times within a short interval of time this probably means > there is a widespread problem with the fleet and so our mitigation is not > helping and may be exacerbating those problems or operator difficulties. > Ensure log roll requests triggered by this new feature happen infrequently > enough to not cause difficulties under either normal or abnormal conditions. > A very simple strategy that could work well under both normal and abnormal > conditions is to define a fairly lengthy interval, default 5 minutes, and > then ensure we do not roll more than once during this interval for this > reason.
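The description above proposes limiting slow-sync-triggered rolls to at most one per fairly lengthy guard interval (elsewhere in this thread the suggested default is 5 minutes). A minimal sketch of that rate-limiting idea, assuming illustrative names (`SlowSyncRollGuard`, `maySlowSyncRoll`) that are not the actual HBase implementation:

```java
// Hypothetical sketch (not the committed HBase code) of the proposed guard:
// allow a slow-sync-triggered WAL roll at most once per guard interval.
public class SlowSyncRollGuard {
  private final long rollGuardIntervalMs; // e.g. 5 minutes per the proposal
  private long lastRollTimeMs;
  private boolean rolledBefore = false;

  public SlowSyncRollGuard(long rollGuardIntervalMs) {
    this.rollGuardIntervalMs = rollGuardIntervalMs;
  }

  /** Returns true if a roll for the slow-sync reason is allowed now. */
  public synchronized boolean maySlowSyncRoll(long nowMs) {
    if (rolledBefore && nowMs - lastRollTimeMs < rollGuardIntervalMs) {
      return false; // rolled too recently for this reason
    }
    rolledBefore = true;
    lastRollTimeMs = nowMs;
    return true;
  }
}
```

Under either normal or abnormal conditions this caps the roll rate, so a persistently sick pipeline still gets re-rolled, just not in a tight loop.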
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826674#comment-16826674 ] David Manning commented on HBASE-22301: --- I've spent hours looking at logs and assessing the background state of slow syncs throughout various clusters. With the current default of 10 slow syncs per 1 minute interval, we could expect the number of WAL rolls to increase by 15% on a heavily utilized cluster, or 5-7% on other clusters. In these cases, I would not expect improved performance from most of these added rolls, as these are not problematic WAL pipelines, but "normal" pipelines. Perhaps these clusters could be better tuned to avoid so many background slow syncs, but it seems reasonable to think others in the community will also have similar clusters. Based on these data, I would recommend a much higher default of 250 for {{slowSyncRollThreshold}}. For the default of 5 syncer threads, this is almost one slow sync per second if all syncer threads are utilized and waiting on the WAL writer. This would have remedied the problem in our incident, as we were seeing 500-800+ slow syncs reported per minute on affected pipelines (each sync taking 100-500ms+, each sync reported by all 5 threads). However, the threshold is also high enough to prevent sizable increases in WAL rolls in normally operating clusters. From my log investigations, in normal operation of a heavily utilized cluster, this would still add <1% new WAL rolls. It would add ~5% new WAL rolls during spiky traffic (some clusters with large multi requests or hotspotted servers that may be doing some GC thrashing). For normal operation of a normal cluster, it would add a negligible amount of WAL rolls (<0.01%). Unfortunately, such a high value would not detect a case of the WAL being so slow that it couldn't even perform 50 syncs per minute. If we want to do this, we'll need to be fancier with the logic.
Perhaps we would have to sum up all the {{timeInNanos}} in {{postSync}} over the {{slowSyncCheckInterval}}, and then check if we spent greater than X% of the interval in slow syncs. This could catch issues where we spent 5 slow syncs of 10 seconds each, or 100 slow syncs of 500ms each, and request a WAL roll in either case. FWIW, I like this approach, but realize that it adds complexity while we're striving for simplicity. {quote}We could divide the count by number of syncer threads. Or, multiply the threshold by number of threads. Or, simply set a higher threshold. {quote} If we stick with the count-based approach, I recommend 50 multiplied by the number of threads. If we don't want to include number of threads, then I recommend a threshold of 250 (50 times the default of 5 threads).
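The time-fraction idea sketched in the comment above (sum sync durations over the check interval and roll when they exceed X% of it) might look roughly like the following. All names here (`SlowSyncTimeTracker`, `recordSync`, `shouldRoll`) are illustrative assumptions, not part of the actual patch:

```java
// Hedged sketch of the percentage-of-interval approach: accumulate sync
// latency over the check interval and request a roll when accumulated sync
// time exceeds a configured fraction of the interval. This catches both
// "5 syncs of 10s each" and "100 syncs of 500ms each".
public class SlowSyncTimeTracker {
  private final long checkIntervalNanos;
  private final double slowFraction; // e.g. 0.25 = 25% of the interval
  private long intervalStartNanos;
  private long syncNanosInInterval;

  public SlowSyncTimeTracker(long checkIntervalNanos, double slowFraction, long nowNanos) {
    this.checkIntervalNanos = checkIntervalNanos;
    this.slowFraction = slowFraction;
    this.intervalStartNanos = nowNanos;
  }

  /** Would be called from something like postSync() with each sync's duration. */
  public synchronized void recordSync(long timeInNanos) {
    syncNanosInInterval += timeInNanos;
  }

  /** Called periodically; true means "request a WAL roll". */
  public synchronized boolean shouldRoll(long nowNanos) {
    if (nowNanos - intervalStartNanos < checkIntervalNanos) {
      return false; // interval not over yet
    }
    boolean roll = syncNanosInInterval > (long) (checkIntervalNanos * slowFraction);
    intervalStartNanos = nowNanos; // start a fresh window either way
    syncNanosInInterval = 0;
    return roll;
  }
}
```

As the comment notes, this trades the simplicity of a bare count for sensitivity to both very slow and merely frequent slow syncs; the fraction X would need tuning just like the count threshold.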
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826656#comment-16826656 ] Andrew Purtell commented on HBASE-22301: To be more precise what I’m thinking is if a long time had elapsed and then we are finally pushed over the threshold, we should set the counter to 1 and return false. We start over rather than trigger a roll. Seems reasonable to me. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
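The behavior Andrew describes above (when the threshold is finally crossed long after the window started, reset the counter to 1 and return false instead of rolling) could be sketched like this. Class and field names are illustrative, not the committed HBase code:

```java
// Hypothetical sketch: count slow syncs per check interval, but treat a
// stale window (more than 2x the interval since the last check) as a fresh
// start rather than a reason to roll.
public class SlowSyncCounter {
  private final long checkIntervalMs;
  private final int rollThreshold;
  private long lastCheckTimeMs;
  private int slowSyncCount;

  public SlowSyncCounter(long checkIntervalMs, int rollThreshold, long nowMs) {
    this.checkIntervalMs = checkIntervalMs;
    this.rollThreshold = rollThreshold;
    this.lastCheckTimeMs = nowMs;
  }

  public synchronized void onSlowSync() {
    slowSyncCount++;
  }

  /** Returns true when a WAL roll should be requested. */
  public synchronized boolean checkSlowSync(long nowMs) {
    long elapsed = nowMs - lastCheckTimeMs;
    if (elapsed < checkIntervalMs) {
      return false;
    }
    boolean roll = false;
    if (slowSyncCount >= rollThreshold && elapsed <= 2 * checkIntervalMs) {
      roll = true;       // threshold crossed within a fresh window
      slowSyncCount = 0;
    } else if (slowSyncCount >= rollThreshold) {
      slowSyncCount = 1; // stale window: count only the sync that woke us
    } else {
      slowSyncCount = 0; // below threshold: start a new window
    }
    lastCheckTimeMs = nowMs;
    return roll;
  }
}
```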
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826654#comment-16826654 ] Andrew Purtell commented on HBASE-22301: I like your first suggestion better too [~dmanning] and will put up a new patch with it incorporated tomorrow. Thanks for the idea.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826649#comment-16826649 ] David Manning commented on HBASE-22301: ---
{code:java}
private boolean checkSlowSync() {
  boolean result = false;
  long now = EnvironmentEdgeManager.currentTime();
  if (now - lastTimeCheckSlowSync >= slowSyncCheckInterval) {
    if (slowSyncCount.get() >= slowSyncRollThreshold) {
      LOG.warn("Requesting log roll because we exceeded slow sync threshold; count=" +
          slowSyncCount.get() + ", threshold=" + slowSyncRollThreshold +
          ", current pipeline: " + Arrays.toString(getPipeLine()));
      result = true;
    }
    lastTimeCheckSlowSync = now;
    slowSyncCount.set(0);
  }
  return result;
}
{code}
Assuming {{slowSyncCheckInterval}} is 60 seconds, and {{slowSyncRollThreshold}} is 10, what about the scenario where we get 20 slow syncs in 50 seconds, and then we don't get any more slow syncs for an hour? On the next slow sync an hour later, it looks like we will roll the WAL on the first slow sync. Can we check to see that it's not too long after the interval period? If it's been too long after the interval period, we can assume we should be resetting the counters because the previous situation corrected itself. It gets a little messy, but perhaps something like:
{code:java}
if (now - lastTimeCheckSlowSync >= slowSyncCheckInterval) {
  if (now - lastTimeCheckSlowSync <= 2 * slowSyncCheckInterval
      && slowSyncCount.get() >= slowSyncRollThreshold) {
{code}
Alternatively, resetting {{lastTimeCheckSlowSync}} and {{slowSyncCount}} could also be done in {{requestLogRoll}}. I like that approach less, but it also would make it less likely we would request a WAL roll from one rogue slow sync much later.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826614#comment-16826614 ] Sean Busbey commented on HBASE-22301: - I'm +1 on the current patch either as-is or with adjustments for handling the sync thread count and to defaults based on feedback from David.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826609#comment-16826609 ] HBase QA commented on HBASE-22301: -- *+1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 21s | Docker mode activated. |
| 0 | findbugs | 0m 1s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| 0 | mvndep | 0m 18s | Maven dependency ordering for branch |
| +1 | mvninstall | 2m 0s | branch-1 passed |
| +1 | compile | 1m 3s | branch-1 passed with JDK v1.8.0_212 |
| +1 | compile | 1m 13s | branch-1 passed with JDK v1.7.0_222 |
| +1 | checkstyle | 1m 53s | branch-1 passed |
| +1 | shadedjars | 2m 56s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 49s | branch-1 passed with JDK v1.8.0_212 |
| +1 | javadoc | 1m 3s | branch-1 passed with JDK v1.7.0_222 |
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 54s | the patch passed |
| +1 | compile | 1m 5s | the patch passed with JDK v1.8.0_212 |
| +1 | javac | 1m 5s | the patch passed |
| +1 | compile | 1m 5s | the patch passed with JDK v1.7.0_222 |
| +1 | javac | 1m 5s | the patch passed |
| +1 | checkstyle | 0m 11s | The patch passed checkstyle in hbase-hadoop-compat |
| +1 | checkstyle | 0m 12s | The patch passed checkstyle in hbase-hadoop2-compat |
| +1 | checkstyle | 1m 25s | hbase-server: The patch generated 0 new + 94 unchanged - 6 fixed = 94 total (was 100) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 2m 52s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 1m 41s | Patch does not cause any errors with Hadoop 2.7.4. |
| +1 | javadoc | 0m 44s | the patch passed with JDK v1.8.0_212 |
| +1 | javadoc | 0m 59s | the patch passed with JDK v1.7.0_222 |
| +1 | unit | 0m 22s | hbase-hadoop-compat in the patch passed. |
| +1 | unit | 0m 29s | hba
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826557#comment-16826557 ] Andrew Purtell commented on HBASE-22301: Updated patch implements the logging change requested by [~busbey]
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
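The issue description proposes making the feature self limiting: allow at most one slow-sync-triggered roll per fairly lengthy interval, default 5 minutes, so the mitigation cannot itself cause roll storms. That reduces to a simple rate limiter; a minimal sketch with hypothetical names, not the patch's actual code:

```java
// Sketch (hypothetical names) of the self-limiting behavior described in the
// issue: permit at most one slow-sync-triggered log roll per interval,
// default 5 minutes, so the mitigation cannot become a problem itself.
public class RollRateLimiter {
    private final long minIntervalMs;
    private long lastRollMs = Long.MIN_VALUE / 2;  // far in the past: allow the first roll

    public RollRateLimiter(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    // Returns true if a slow-sync roll is permitted now, and records it.
    public boolean tryRoll(long nowMs) {
        if (nowMs - lastRollMs >= minIntervalMs) {
            lastRollMs = nowMs;
            return true;
        }
        return false;  // rolled too recently for this reason; suppress
    }
}
```

Under this scheme repeated slow-sync signals inside the interval are simply ignored, which matches the "no more than once during this interval for this reason" wording above.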
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826232#comment-16826232 ] Andrew Purtell commented on HBASE-22301: {quote}Should the threshold factor in {{hbase.regionserver.hlog.syncer.count}}? {quote} We could divide the count by the number of syncer threads. Or, multiply the threshold by the number of threads. Or, simply set a higher threshold. The latter is simplest but I'd be interested in thoughts here.
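Either normalization is easy to express. A minimal sketch with hypothetical names (`shouldRoll`, `baseThreshold`), not the patch's actual code, exercised with the incident figures reported in this thread (~800 slow syncs per minute across 5 syncer threads, ~10 per minute background):

```java
// Sketch (hypothetical names): two ways to account for multiple syncer
// threads when comparing the observed slow sync count against the roll
// threshold, since one slow pipeline is reported once per waiting syncer.
public class SlowSyncThreshold {
    // Option 1: divide the observed count by the number of syncer threads.
    static boolean shouldRoll(int slowSyncCount, int baseThreshold, int syncerCount) {
        return (slowSyncCount / syncerCount) >= baseThreshold;
    }

    // Option 2: multiply the threshold by the number of syncer threads.
    // Equivalent to option 1 up to integer rounding.
    static boolean shouldRollScaledThreshold(int slowSyncCount, int baseThreshold, int syncerCount) {
        return slowSyncCount >= baseThreshold * syncerCount;
    }
}
```

With a per-thread threshold of 100, the incident level (800 with 5 syncers, i.e. 160 per thread) trips either variant while the ~10/minute background level does not. The third option, a single higher flat threshold, needs no code but silently changes meaning whenever {{hbase.regionserver.hlog.syncer.count}} is retuned.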
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826231#comment-16826231 ] Andrew Purtell commented on HBASE-22301: {quote}maybe the log message could go into checkSlowSync so that the count is still visible {quote} Sure, no problem.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826220#comment-16826220 ] Sean Busbey commented on HBASE-22301: - the approach makes sense. {code}
if (checkSlowSync()) {
  LOG.warn("Requesting log roll because we exceeded slow sync threshold; threshold=" +
    slowSyncRollThreshold + ", current pipeline: " + Arrays.toString(getPipeLine()));
  requestLogRoll(SLOW_SYNC);
}
{code} this log message doesn't have enough detail since it's just going to tell me e.g. "10" without saying how slow things had to be over what period of time, nor how many times we actually crossed that line. maybe the log message could go into checkSlowSync so that the count is still visible? or {{checkSlowSync}} could return it and treat "<= 0" to mean "don't request a roll"?
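The second suggestion, having {{checkSlowSync}} return the observed count so the warning can report it, might look something like the following sketch. The class and field names here are illustrative, not the actual FSHLog implementation:

```java
// Sketch of the reviewer's suggestion (hypothetical names): have the slow
// sync check return the observed count so the log line can include it,
// treating a non-positive return as "don't request a roll".
public class SlowSyncCheck {
    private final int slowSyncCount;          // slow syncs seen in the current window
    private final int slowSyncRollThreshold;  // count above which we request a roll

    SlowSyncCheck(int slowSyncCount, int slowSyncRollThreshold) {
        this.slowSyncCount = slowSyncCount;
        this.slowSyncRollThreshold = slowSyncRollThreshold;
    }

    // Returns the count that crossed the threshold, or 0 if no roll is needed.
    int checkSlowSync() {
        if (slowSyncCount >= slowSyncRollThreshold) {
            System.out.println("Requesting log roll; slowSyncCount=" + slowSyncCount
                + " exceeded threshold=" + slowSyncRollThreshold);
            return slowSyncCount;
        }
        return 0;
    }
}
```

The caller then logs and rolls only when the return value is positive, so the count that actually triggered the roll is always visible in the log line.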
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826216#comment-16826216 ] David Manning commented on HBASE-22301: --- I do like the count-based approach better, and think it may offer better results in either a default or a well-tuned state. Thank you for presenting that option. I'm trying to review the incident data and non-incident data to help inform the defaults, if possible. So far, I've seen in sample incident data that we had ~800 slow syncs per minute (160 per thread, with 5 syncer threads). Background level for that cluster, for hot nodes, ends up being around ~10 slow syncs per minute. So I could imagine having a higher default to avoid too much log rolling, but still be a useful default. Should the threshold factor in {{hbase.regionserver.hlog.syncer.count}}? A slow pipeline will be reported X times, where X is the number of syncer threads waiting on the pipeline. I will spend more time looking at data today, and see what I can find.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826205#comment-16826205 ] Andrew Purtell commented on HBASE-22301: No. The problem is that GC activity is indistinguishable from real slow syncs if you only examine a single data point, unless you set a very high threshold, and then we would probably not trigger enough to make a difference. Data from our incident shows a train of slow sync warnings, with a few peaks at 1-3 seconds. It is unlikely that triggering only on the rare peak outliers would have made a difference. The conservative 10s trigger in this patch would never have been reached. Instead, if we triggered on trains of smaller data points in the range of 200-600ms, the mitigation would have fired enough to make a difference, and these trains correlated to real problems, not GC activity. And by GC activity I mean that of the regionserver process. As you probably know, any one or a handful of slow sync warnings can be false positives due to GC rather than real latency on the pipeline. It makes things difficult here. We can try to avoid false positives either by setting a high latency threshold or by waiting for an unusual number to occur within some window of time. There are patches for review that take either approach. It would seem the high threshold approach may not offer enough mitigation in practice given the data on hand. At any rate the thresholds are tunable and can be experimented with in production to find the right trade-off, and the feature is self limiting so slow sync triggered log rolls do not become a problem themselves.
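The count-within-window approach described above can be sketched roughly as follows. This is an illustrative sketch with hypothetical names and parameter values, not the patch's actual implementation; the point is that a modest per-sync latency threshold (e.g. 200 ms) becomes usable because a single GC-induced outlier cannot trip the roll by itself:

```java
import java.util.ArrayDeque;

// Sketch of the count-within-window idea: count syncs slower than a latency
// threshold inside a rolling time window, and request a roll only when the
// count in the window is unusually high (a "train" of slow syncs).
public class SlowSyncWindow {
    private final long latencyThresholdMs;  // e.g. 200 ms per sync
    private final long windowMs;            // e.g. 60 s rolling window
    private final int rollCountThreshold;   // slow syncs per window that trigger a roll
    private final ArrayDeque<Long> slowSyncTimes = new ArrayDeque<>();

    public SlowSyncWindow(long latencyThresholdMs, long windowMs, int rollCountThreshold) {
        this.latencyThresholdMs = latencyThresholdMs;
        this.windowMs = windowMs;
        this.rollCountThreshold = rollCountThreshold;
    }

    // Record one completed sync; returns true if a log roll should be requested.
    public boolean recordSync(long nowMs, long syncDurationMs) {
        // Age out slow sync timestamps that have left the window.
        while (!slowSyncTimes.isEmpty() && nowMs - slowSyncTimes.peekFirst() > windowMs) {
            slowSyncTimes.pollFirst();
        }
        if (syncDurationMs >= latencyThresholdMs) {
            slowSyncTimes.addLast(nowMs);
        }
        return slowSyncTimes.size() >= rollCountThreshold;
    }
}
```

A lone 1-3 second GC outlier contributes only one entry to the window and never reaches the count threshold, while a sustained train of 200-600 ms syncs accumulates quickly and fires the mitigation.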
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826182#comment-16826182 ] Sean Busbey commented on HBASE-22301: - my next review cycle will probably be late friday or monday, just fyi. is the concern that *we* had a GC pause that crossed the 10s threshold or that one of the DNs in the pipeline did?
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825703#comment-16825703 ] HBase QA commented on HBASE-22301: -- -1 overall
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 27s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 2s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || branch-1 Compile Tests ||
| 0 | mvndep | 0m 18s | Maven dependency ordering for branch |
| +1 | mvninstall | 1m 46s | branch-1 passed |
| +1 | compile | 0m 58s | branch-1 passed with JDK v1.8.0_212 |
| +1 | compile | 1m 5s | branch-1 passed with JDK v1.7.0_222 |
| +1 | checkstyle | 1m 45s | branch-1 passed |
| +1 | shadedjars | 2m 37s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 48s | branch-1 passed with JDK v1.8.0_212 |
| +1 | javadoc | 1m 3s | branch-1 passed with JDK v1.7.0_222 |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 39s | the patch passed |
| +1 | compile | 0m 55s | the patch passed with JDK v1.8.0_212 |
| +1 | javac | 0m 55s | the patch passed |
| +1 | compile | 1m 4s | the patch passed with JDK v1.7.0_222 |
| +1 | javac | 1m 4s | the patch passed |
| +1 | checkstyle | 0m 12s | The patch passed checkstyle in hbase-hadoop-compat |
| +1 | checkstyle | 0m 13s | The patch passed checkstyle in hbase-hadoop2-compat |
| +1 | checkstyle | 1m 18s | hbase-server: The patch generated 0 new + 94 unchanged - 6 fixed = 94 total (was 100) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 2m 38s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 1m 38s | Patch does not cause any errors with Hadoop 2.7.4. |
| +1 | javadoc | 0m 49s | the patch passed with JDK v1.8.0_212 |
| +1 | javadoc | 1m 4s | the patch passed with JDK v1.7.0_222 |
|| || || || Other Tests ||
| +1 | unit | 0m 25s | hbase-hadoop-compat in the patch passed. |
| +1 | unit | 0m 33s | hbase
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825639#comment-16825639 ] Andrew Purtell commented on HBASE-22301: Currently there are two patches on this issue: one posted seven hours ago that got a +1, and one implementing the alternative approach described above. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825637#comment-16825637 ] Andrew Purtell commented on HBASE-22301: Thanks [~busbey]. [~dmanning] convinced me a count-based threshold might be better. Let me attach a new version of this patch; please let me know what you think. We could commit either option. The drawback to rolling on detection of a single latency outlier is that the threshold might have to be set very high to exclude "slow syncs" that are really GC activity, and we went back and forth on what threshold might avoid false positives yet still be effective. I still want to do something simple, so instead of creating a new latency threshold we reuse the existing slow sync warn threshold and count how many times it is exceeded within a longer, configurable interval. If we count enough warnings over that interval, we ask for a roll. We can only ask once per interval, so we retain a limit on pacing to prevent runaway roll requests during an outage or incident. This can be a better strategy because, while a single outlier may be GC activity, the probability that many data points are all false positives decreases as their number grows. The defaults in the new patch are an interval of one minute and a count-based threshold of ten slow sync warnings within that interval.
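The count-based strategy described in the comment above can be sketched roughly as follows. This is an illustrative standalone sketch, not the actual FSHLog change: the class and method names are invented for the example, and only the described behavior (reuse of the slow sync warn threshold, a one-minute counting interval, a count of ten, at most one roll request per interval) is taken from the comment.

```java
// Illustrative sketch (not the actual FSHLog patch): count syncs slower than
// the existing slow sync warn threshold within a configurable interval, and
// request at most one WAL roll per interval when the count is exceeded.
public class SlowSyncRollDecider {

    private final long slowSyncThresholdNs;  // reused slow sync warn threshold
    private final long checkIntervalMs;      // counting/pacing interval (default per comment: 1 minute)
    private final int rollCountThreshold;    // slow syncs needed to request a roll (default per comment: 10)

    private long intervalStartMs;
    private int slowSyncCount;
    private boolean rolledThisInterval;

    public SlowSyncRollDecider(long slowSyncThresholdNs, long checkIntervalMs,
            int rollCountThreshold) {
        this.slowSyncThresholdNs = slowSyncThresholdNs;
        this.checkIntervalMs = checkIntervalMs;
        this.rollCountThreshold = rollCountThreshold;
    }

    /**
     * Record one completed sync. Returns true if a WAL roll should be
     * requested now. At most one roll is requested per interval, which paces
     * roll requests during a widespread outage.
     */
    public boolean onSyncCompleted(long syncDurationNs, long nowMs) {
        if (nowMs - intervalStartMs > checkIntervalMs) {
            // New interval: reset the count and re-arm the single roll request.
            intervalStartMs = nowMs;
            slowSyncCount = 0;
            rolledThisInterval = false;
        }
        if (syncDurationNs >= slowSyncThresholdNs) {
            slowSyncCount++;
            if (slowSyncCount >= rollCountThreshold && !rolledThisInterval) {
                rolledThisInterval = true;
                return true;
            }
        }
        return false;
    }
}
```

A single slow data point could be a GC pause, but many slow syncs within one interval are unlikely to all be false positives, which is the rationale given in the comment for preferring a count over a one-shot latency threshold.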
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825621#comment-16825621 ] Sean Busbey commented on HBASE-22301: - +1 on the branch-1 patch
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825537#comment-16825537 ] Andrew Purtell commented on HBASE-22301: The latest precommit failures do not appear to be related to this patch. Just in case, I attempted a local reproduction and was unable to reproduce them.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825526#comment-16825526 ] HBase QA commented on HBASE-22301: | (x) -1 overall |
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 43s | Docker mode activated |
Prechecks:
| 0 | findbugs | 0m 0s | Findbugs executables are not available |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns |
| +1 | @author | 0m 0s | The patch does not contain any @author tags |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files |
branch-1 Compile Tests:
| 0 | mvndep | 0m 13s | Maven dependency ordering for branch |
| +1 | mvninstall | 1m 45s | branch-1 passed |
| +1 | compile | 0m 58s | branch-1 passed with JDK v1.8.0_202 |
| +1 | compile | 1m 4s | branch-1 passed with JDK v1.7.0_211 |
| +1 | checkstyle | 1m 47s | branch-1 passed |
| +1 | shadedjars | 2m 49s | branch has no errors when building our shaded downstream artifacts |
| +1 | javadoc | 0m 47s | branch-1 passed with JDK v1.8.0_202 |
| +1 | javadoc | 1m 1s | branch-1 passed with JDK v1.7.0_211 |
Patch Compile Tests:
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 40s | the patch passed |
| +1 | compile | 0m 56s | the patch passed with JDK v1.8.0_202 |
| +1 | javac | 0m 56s | the patch passed |
| +1 | compile | 1m 4s | the patch passed with JDK v1.7.0_211 |
| +1 | javac | 1m 4s | the patch passed |
| +1 | checkstyle | 0m 11s | The patch passed checkstyle in hbase-hadoop-compat |
| +1 | checkstyle | 0m 12s | The patch passed checkstyle in hbase-hadoop2-compat |
| +1 | checkstyle | 1m 28s | hbase-server: The patch generated 0 new + 94 unchanged - 6 fixed = 94 total (was 100) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues |
| +1 | shadedjars | 2m 51s | patch has no errors when building our shaded downstream artifacts |
| +1 | hadoopcheck | 1m 45s | Patch does not cause any errors with Hadoop 2.7.4 |
| +1 | javadoc | 0m 52s | the patch passed with JDK v1.8.0_202 |
| +1 | javadoc | 1m 3s | the patch passed with JDK v1.7.0_211 |
Other Tests:
| +1 | unit | 0m 22s | hbase-hadoop-compat in the patch passed |
| +1 | unit | 0m 28s | hbase
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825437#comment-16825437 ] Andrew Purtell commented on HBASE-22301: Updated patch. Fixes a checkstyle nit. I broke the test by updating the config constants in FSHLog per request in a fast pass without running the test, oops; that is fixed now. I cannot reproduce the TestFailedAppendAndSync failure; although it is flagged as a medium test, it completes almost instantly and never fails locally. I had also forgotten to update FSHLog's FIXED_OVERHEAD, and did so this time around.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825353#comment-16825353 ] Andrew Purtell commented on HBASE-22301: There is no default switch case by deliberate choice, but since it triggers checkstyle I'll change that. Whoops on the test timeout; I'll make it 10x. Agreed that the TestOfflineMetaRebuildBase failure is not related. Let me update the patch in a bit. We also need a patch for branch-2 and up, at least for the same code path in the sync WAL.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824992#comment-16824992 ] Sean Busbey commented on HBASE-22301:
bq. hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase
I don't think this one is related; it just started failing in the last nightly build.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824990#comment-16824990 ] Sean Busbey commented on HBASE-22301:
{code}
java.lang.AssertionError: The regionserver should have thrown an exception
	at org.apache.hadoop.hbase.regionserver.TestFailedAppendAndSync.testLockupAroundBadAssignSync(TestFailedAppendAndSync.java:258)
{code}
This looks like it might be related, since the changed code path is getting exercised, but I haven't dug in enough to figure out what's going on with it.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824989#comment-16824989 ] Sean Busbey commented on HBASE-22301:
{code}
./hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/MetricsWAL.java:77: switch (reason) {: switch without "default" clause. [MissingSwitchDefault]
{code}
This is a good find from checkstyle: without a default clause that fails, it would be easy for someone to add a value to the enum but forget to update the metrics.
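The defensive pattern Busbey is pointing at — a switch over the roll-reason enum whose default clause fails loudly — can be sketched like this. The enum values below are hypothetical stand-ins, not the actual names used in MetricsWAL.

```java
// Sketch of a metrics switch that fails fast if a new enum value is added
// without a corresponding counter. Enum values are hypothetical.
public class RollMetrics {
    public enum RollReason { ERROR, LOW_REPLICATION, SIZE, SLOW_SYNC }

    private long errorRolls, lowReplicationRolls, sizeRolls, slowSyncRolls;

    public void logRollRequested(RollReason reason) {
        switch (reason) {
            case ERROR:           errorRolls++; break;
            case LOW_REPLICATION: lowReplicationRolls++; break;
            case SIZE:            sizeRolls++; break;
            case SLOW_SYNC:       slowSyncRolls++; break;
            default:
                // Unreachable today, but if someone adds a RollReason without a
                // counter, tests hit this instead of silently dropping the metric.
                throw new AssertionError("Unhandled roll reason: " + reason);
        }
    }

    public long getSlowSyncRolls() { return slowSyncRolls; }
}
```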
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824988#comment-16824988 ] Sean Busbey commented on HBASE-22301:
{code}
junit.framework.AssertionFailedError: Waiting timed out after [1,000] msec
	at org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testSlowSyncLogRolling(TestLogRolling.java:321)
{code}
A per-test wait timeout of 1s isn't going to work for this test.
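The failure above is a classic too-short wait: a condition polled for only 1,000 ms before giving up. The general poll-until-true shape, roughly in the spirit of HBase's Waiter utility, looks like the sketch below — the class and method names here are illustrative, not the actual HBase test API.

```java
import java.util.function.BooleanSupplier;

// Illustrative poll-with-timeout helper; not the actual HBase Waiter API.
public class WaitUtil {
    /**
     * Polls {@code condition} every {@code intervalMs} until it returns true
     * or {@code timeoutMs} elapses. Returns whether the condition was met.
     */
    public static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(intervalMs);
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }
}
```

With this shape, the fix for the test is simply a timeout generous enough to cover the slow-sync detection interval rather than a hard 1 s.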
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824812#comment-16824812 ] HBase QA commented on HBASE-22301:
-1 overall. Precommit results for branch-1 (JDK v1.8.0_212 and v1.7.0_222; findbugs executables not available):
* +1 on hbaseanti, @author, mvninstall, compile, javac, whitespace, shadedjars, hadoopcheck (Hadoop 2.7.4), javadoc, and asflicense; the patch includes 2 new or modified test files; unit tests passed for hbase-hadoop-compat and hbase-hadoop2-compat.
* -1 checkstyle: hbase-server: the patch generated 1 new + 95 unchanged - 5 fixed = 96 total (was 100).
* -1 unit (108m 56s): hbase-server in the patch failed.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824805#comment-16824805 ] HBase QA commented on HBASE-22301:
-1 overall. Precommit results for branch-1 (JDK v1.8.0_212 and v1.7.0_222; findbugs executables not available):
* +1 on hbaseanti, @author, mvninstall, compile, javac, whitespace, shadedjars, hadoopcheck (Hadoop 2.7.4), javadoc, and asflicense; the patch includes 2 new or modified test files; unit tests passed for hbase-hadoop-compat and hbase-hadoop2-compat.
* -1 checkstyle: hbase-server: the patch generated 1 new + 95 unchanged - 5 fixed = 96 total (was 100).
* -1 unit (115m 1s): hbase-server in the patch failed.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824747#comment-16824747 ] HBase QA commented on HBASE-22301:
-1 overall. Precommit results for branch-1 (JDK v1.8.0_202 and v1.7.0_211; findbugs executables not available):
* +1 on hbaseanti, @author, mvninstall, compile, javac, whitespace, shadedjars, hadoopcheck (Hadoop 2.7.4), javadoc, and asflicense; the patch includes 2 new or modified test files; unit tests passed for hbase-hadoop-compat and hbase-hadoop2-compat.
* -1 checkstyle: hbase-server: the patch generated 1 new + 95 unchanged - 5 fixed = 96 total (was 100).
* -1 unit (104m 53s): hbase-server in the patch failed.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824742#comment-16824742 ] Andrew Purtell commented on HBASE-22301: {quote}Sorry, just realized {{hbase.regionserver.hlog.slowsync.ms}} was existing. {quote} I changed the names as requested but added code to fall back to the old config setting if the new one isn't found. No harm there. Only planning to put this in for new minors, also, esp. 1.5.0. > Consider rolling the WAL if the HDFS write pipeline is slow > --- > > Key: HBASE-22301 > URL: https://issues.apache.org/jira/browse/HBASE-22301 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.3.0 > > Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, > HBASE-22301-branch-1.patch > > > Consider the case when a subset of the HDFS fleet is unhealthy but suffering > a gray failure not an outright outage. HDFS operations, notably syncs, are > abnormally slow on pipelines which include this subset of hosts. If the > regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be > consumed waiting for acks from the datanodes in the pipeline (recall that > some of them are sick). Imagine a write heavy application distributing load > uniformly over the cluster at a fairly high rate. With the WAL subsystem > slowed by HDFS level issues, all handlers can be blocked waiting to append to > the WAL. Once all handlers are blocked, the application will experience > backpressure. All (HBase) clients eventually have too many outstanding writes > and block. > Because the application is distributing writes near uniformly in the > keyspace, the probability any given service endpoint will dispatch a request > to an impacted regionserver, even a single regionserver, approaches 1.0. So > the probability that all service endpoints will be affected approaches 1.0. 
> In order to break the logjam, we need to remove the slow datanodes. Although > there is HDFS level monitoring, mechanisms, and procedures for this, we > should also attempt to take mitigating action at the HBase layer as soon as > we find ourselves in trouble. It would be enough to remove the affected > datanodes from the writer pipelines. A super simple strategy that can be > effective is described below: > This is with branch-1 code. I think branch-2's async WAL can mitigate but > still can be susceptible. branch-2 sync WAL is susceptible. > We already roll the WAL writer if the pipeline suffers the failure of a > datanode and the replication factor on the pipeline is too low. We should > also consider how much time it took for the write pipeline to complete a sync > the last time we measured it, or the max over the interval from now to the > last time we checked. If the sync time exceeds a configured threshold, roll > the log writer then too. Fortunately we don't need to know which datanode is > making the WAL write pipeline slow, only that syncs on the pipeline are too > slow and exceeding a threshold. This is enough information to know when to > roll it. Once we roll it, we will get three new randomly selected datanodes. > On most clusters the probability the new pipeline includes the slow datanode > will be low. (And if for some reason it does end up with a problematic > datanode again, we roll again.) > This is not a silver bullet but this can be a reasonably effective mitigation. > Provide a metric for tracking when log roll is requested (and for what > reason). > Emit a log line at log roll time that includes datanode pipeline details for > further debugging and analysis, similar to the existing slow FSHLog sync log > line. > If we roll too many times within a short interval of time this probably means > there is a widespread problem with the fleet and so our mitigation is not > helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently > enough to not cause difficulties under either normal or abnormal conditions. > A very simple strategy that could work well under both normal and abnormal > conditions is to define a fairly lengthy interval, default 5 minutes, and > then ensure we do not roll more than once during this interval for this > reason. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
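The backwards-compatible config lookup Andrew describes above (read the renamed key, fall back to the legacy key if the new one isn't set) can be sketched as follows. This is a hedged illustration, not the actual patch: the new key name and the default value are assumptions, and a plain `Map` stands in for Hadoop's `Configuration` (where the idiomatic form would be `conf.getLong(NEW_KEY, conf.getLong(OLD_KEY, DEFAULT))`).

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: prefer a renamed config key, falling back to the legacy key. */
class SlowSyncConfig {
    // The old key exists in released versions; the new name and default are assumed here.
    static final String NEW_KEY = "hbase.regionserver.wal.slowsync.ms";  // hypothetical
    static final String OLD_KEY = "hbase.regionserver.hlog.slowsync.ms"; // pre-existing
    static final long DEFAULT_SLOW_SYNC_MS = 100L;                       // assumed default

    /** Returns the new key's value if present, else the old key's, else the default. */
    static long slowSyncMs(Map<String, String> conf) {
        String v = conf.getOrDefault(NEW_KEY, conf.get(OLD_KEY));
        return v == null ? DEFAULT_SLOW_SYNC_MS : Long.parseLong(v);
    }
}
```

Operators with the old setting in hbase-site.xml keep their tuning after an upgrade, which is the "no harm there" point made in the comment.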
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824740#comment-16824740 ] Andrew Purtell commented on HBASE-22301: Updated patch uses "wal" instead of "hlog" and "log" in config and metrics changes. Also, I lowered the default interval we wait between slow-sync-based roll requests from 5 minutes to 2 minutes. We want to be responsive yet not flake on transient issues nor exacerbate HDFS-layer instability. I think the new default is a better trade-off.
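The once-per-interval throttle discussed in this comment (default lowered from 5 minutes to 2) amounts to remembering when a slow-sync roll was last requested and refusing another until the interval elapses. A minimal sketch under that reading, with class and method names invented for illustration:

```java
/** Sketch: allow a slow-sync-triggered WAL roll at most once per interval. */
class SlowSyncRollThrottle {
    private final long minIntervalMs;              // e.g. 120_000 for the 2-minute default
    private long lastRequestMs = Long.MIN_VALUE;   // no request yet

    SlowSyncRollThrottle(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    /** Returns true (and records the request time) only if the interval has elapsed. */
    synchronized boolean tryRequestRoll(long nowMs) {
        if (lastRequestMs != Long.MIN_VALUE && nowMs - lastRequestMs < minIntervalMs) {
            // Too soon: a fleet-wide HDFS problem should not trigger a roll storm.
            return false;
        }
        lastRequestMs = nowMs;
        return true;
    }
}
```

The design point is the one from the issue description: if syncs are slow everywhere, rolling repeatedly cannot help, so the throttle caps how often the mitigation itself can churn pipelines.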
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824737#comment-16824737 ] Sean Busbey commented on HBASE-22301: Oof, yeah. Patch viewing on mobile is rough. Sorry, just realized {{hbase.regionserver.hlog.slowsync.ms}} was existing. Looks like configs on master still use "hlog" so let's leave them as-is. If that use in configs is worth changing, let's stick to a different jira and just do all of them for e.g. 3.0.0. Using WAL in the metrics and log messages will probably save me an explanation in the future though.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824735#comment-16824735 ] Andrew Purtell commented on HBASE-22301: Sure, we can use "wal" instead of "hlog" and "log" since things are changing anyway. I kept them similar to surrounding code is all.
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824734#comment-16824734 ] Sean Busbey commented on HBASE-22301: This is great! Two minor requests: Can the configs use "wal" instead of "hlog"? Can the metrics use "wal" instead of "log"? (So as to make clear they're not about the system log output from the region server process.)
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824733#comment-16824733 ] Andrew Purtell commented on HBASE-22301: Updated patch with a small improvement in FSHLog#checkLogRoll. While it's harmless to request a roll if any of the three implemented conditions are met, and an argument could be made that reflecting in the logs and metrics that more than one condition applied would be useful, I think that would be counter to what operators expect to see in the metrics. So let's stop when the first condition applies.
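The checkLogRoll change Andrew describes, stopping at the first matching condition so each roll is attributed to exactly one reason in logs and metrics, looks roughly like this. The condition names and reason enum are illustrative stand-ins, not the patch's actual code:

```java
/** Sketch: attribute a requested WAL roll to exactly one reason. */
class RollCheck {
    enum Reason { NONE, LOW_REPLICATION, SLOW_SYNC, SIZE }

    // Stand-ins for the real checks; each would consult WAL/pipeline state.
    boolean lowReplication, slowSync, oversized;

    /** First matching condition wins, so each roll increments exactly one metric. */
    Reason checkLogRoll() {
        if (lowReplication) return Reason.LOW_REPLICATION;
        if (slowSync) return Reason.SLOW_SYNC;
        if (oversized) return Reason.SIZE;
        return Reason.NONE;
    }
}
```

With early return, the per-reason roll counters sum to the total number of rolls, which matches what an operator dashboard would expect.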
[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824672#comment-16824672 ] Andrew Purtell commented on HBASE-22301: Already verified that the modified unit tests TestMetricsWAL and TestLogRolling pass.