[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-05-05 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833435#comment-16833435
 ] 

Andrew Purtell commented on HBASE-22301:


Let me circle back here. As it turns out, the team was eventually able to 
observe the underlying cause almost as it happened. The latency-sensitive, 
high-ingest cluster spans on the order of low hundreds of servers across several 
racks. Occasionally a handful of failing network links in this medium-scale 
installation would experience elevated error rates. When this happened, the 
default Linux TCP retransmission timeout of 200 milliseconds would need to 
elapse before the DFS pipeline could make further progress. Long trains of 
delays of 200 ms or more would effectively stall the writer pipelines transiting 
the affected links, and the stall would eventually propagate up to the 
application. The root cause has been addressed. This was more difficult for us 
to track down than it should have been because, although we were collecting 
metrics on TCP retransmissions, we were not using them for alerting or problem 
investigation. That oversight will be rectified, and I offer it as general 
advice. 
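
For illustration only (this is not part of the patch or of any tooling mentioned in this thread): a minimal sketch of how a host-level probe might read the kernel's cumulative TCP retransmission counter so it can be fed into whatever metrics and alerting pipeline is in use. The class and method names are hypothetical; /proc/net/snmp exposes a "Tcp:" header line of field names followed by a "Tcp:" line of values, and RetransSegs is one of those fields.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: sample the kernel's cumulative TCP retransmission
// counter (RetransSegs) for export to a metrics/alerting system.
public class TcpRetransProbe {
  public static long retransSegs() throws IOException {
    List<String> lines = Files.readAllLines(Paths.get("/proc/net/snmp"));
    String[] names = null;
    for (String line : lines) {
      if (!line.startsWith("Tcp:")) {
        continue;
      }
      String[] fields = line.split("\\s+");
      if (names == null) {
        names = fields; // first "Tcp:" line holds the field names
      } else {
        int idx = Arrays.asList(names).indexOf("RetransSegs");
        return Long.parseLong(fields[idx]); // second "Tcp:" line holds the values
      }
    }
    throw new IOException("RetransSegs not found in /proc/net/snmp");
  }

  public static void main(String[] args) throws IOException {
    // A real monitor would report the delta of this counter over time.
    System.out.println("TcpRetransSegs=" + retransSegs());
  }
}
{code}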

The mitigation in this patch would have been useful had it been in place. We 
took other actions that, in effect, rolled writer pipelines until the pipeline 
was using datanodes and network circuits not impacted by the hardware issues. 
I’m confident a repeat of the circumstances of this incident will be less 
impactful with this change in place. 

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-2.patch, HBASE-22301.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.

[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-05-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831458#comment-16831458
 ] 

Hudson commented on HBASE-22301:


Results for branch branch-2
[build #1860 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1860/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1860//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1860//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1860//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-2.patch, HBASE-22301.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties.

[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-05-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831446#comment-16831446
 ] 

Hudson commented on HBASE-22301:


Results for branch master
[build #976 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/976/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/976//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/976//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/976//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-2.patch, HBASE-22301.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties.

[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-05-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831415#comment-16831415
 ] 

Hudson commented on HBASE-22301:


Results for branch branch-1
[build #802 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/802/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/802//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/802//JDK7_Nightly_Build_Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/802//JDK8_Nightly_Build_Report_(Hadoop2)/]




(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-2.patch, HBASE-22301.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions.

[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-05-01 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831359#comment-16831359
 ] 

Andrew Purtell commented on HBASE-22301:


Ok, thank you. ROLL_ON_SYNC_TIME_MS should be 
hbase.regionserver.wal.roll.on.sync.ms; I will fix that on commit.

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-2.patch, HBASE-22301.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions. 
> A very simple strategy that could work well under both normal and abnormal 
> conditions is to define a fairly lengthy interval, default 5 minutes, and 
> then ensure we do not roll more than once during this interval for this 
> reason.
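
To make the quoted strategy concrete, here is a minimal, hypothetical sketch of the threshold-plus-interval check it describes. The configuration key names in the comments mirror those named elsewhere in this thread; the class, method signature, and the treatment of the slow-sync count as a consecutive-occurrence threshold are assumptions for illustration, not the committed implementation.

{code:java}
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a roll-on-slow-sync policy. Request a roll when a
// single sync exceeds the hard threshold, or when enough slow syncs have
// accumulated, but never more than once per interval for this reason.
public class SlowSyncRollPolicy {
  private final long slowSyncMs;            // cf. hbase.regionserver.wal.slowsync.ms
  private final int slowSyncRollThreshold;  // cf. hbase.regionserver.wal.slowsync.roll.threshold
  private final long rollOnSyncMs;          // cf. hbase.regionserver.wal.roll.on.sync.ms
  private final long rollIntervalMs;        // cf. hbase.regionserver.wal.slowsync.roll.interval.ms

  private final AtomicInteger slowSyncCount = new AtomicInteger();
  private final AtomicLong lastRollRequestMs = new AtomicLong();

  public SlowSyncRollPolicy(long slowSyncMs, int slowSyncRollThreshold,
      long rollOnSyncMs, long rollIntervalMs) {
    this.slowSyncMs = slowSyncMs;
    this.slowSyncRollThreshold = slowSyncRollThreshold;
    this.rollOnSyncMs = rollOnSyncMs;
    this.rollIntervalMs = rollIntervalMs;
  }

  /** Called after each WAL sync completes. Returns true if a log roll should be requested. */
  public boolean onSyncCompleted(long syncTimeMs, long nowMs) {
    if (syncTimeMs > slowSyncMs) {
      slowSyncCount.incrementAndGet();
    }
    boolean wantRoll = syncTimeMs > rollOnSyncMs
        || slowSyncCount.get() >= slowSyncRollThreshold;
    if (!wantRoll) {
      return false;
    }
    long last = lastRollRequestMs.get();
    if (nowMs - last < rollIntervalMs) {
      return false; // throttle: at most one roll per interval for this reason
    }
    if (lastRollRequestMs.compareAndSet(last, nowMs)) {
      slowSyncCount.set(0); // start counting fresh on the new pipeline
      return true;
    }
    return false; // another thread just requested the roll
  }
}
{code}

The interval throttle corresponds to the "do not roll more than once during this interval" guard in the description; requesting a roll yields a new writer and, with it, a freshly selected datanode pipeline.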





[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-05-01 Thread David Manning (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831337#comment-16831337
 ] 

David Manning commented on HBASE-22301:
---

Other {{hlog}} references in config settings were updated to {{wal}} across the 
branches as part of the previous patch, but the latest patch is now adding a 
{{hlog}} reference to branch-1 and branch-2 (because of the backport of 
HBASE-21806 from master). As long as that's the desired design, that's ok (as 
you say, they are synchronized across the branches), but it just looked weird 
to me given the other changes towards {{wal}}.
{code:java}
static final String SLOW_SYNC_TIME_MS = "hbase.regionserver.wal.slowsync.ms";
static final String ROLL_ON_SYNC_TIME_MS = "hbase.regionserver.hlog.roll.on.sync.ms";
static final String SLOW_SYNC_ROLL_THRESHOLD = "hbase.regionserver.wal.slowsync.roll.threshold";
static final String SLOW_SYNC_ROLL_INTERVAL_MS = "hbase.regionserver.wal.slowsync.roll.interval.ms";
{code}


> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-2.patch, HBASE-22301.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions.

[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-05-01 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831267#comment-16831267
 ] 

Andrew Purtell commented on HBASE-22301:


All configs are synchronized across the branches, except in master where we 
don't do the fallback to deprecated ones.
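
For readers unfamiliar with the fallback being referenced, a rough illustration follows. This is an assumed sketch, not the committed code: the class name and default value are made up; only the two configuration key names come from this thread. On master the deprecated lookup is simply omitted.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical illustration of a deprecated-key fallback.
public class RollOnSyncConfig {
  static final String ROLL_ON_SYNC_TIME_MS =
      "hbase.regionserver.wal.roll.on.sync.ms";
  static final String DEPRECATED_ROLL_ON_SYNC_TIME_MS =
      "hbase.regionserver.hlog.roll.on.sync.ms";
  static final long DEFAULT_ROLL_ON_SYNC_TIME_MS = 10000L; // assumed default

  static long rollOnSyncTimeMs(Configuration conf) {
    // The new key wins; the deprecated key is consulted only when the new one is unset.
    return conf.getLong(ROLL_ON_SYNC_TIME_MS,
        conf.getLong(DEPRECATED_ROLL_ON_SYNC_TIME_MS, DEFAULT_ROLL_ON_SYNC_TIME_MS));
  }
}
{code}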

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-2.patch, HBASE-22301.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions. 
> A very simple strategy that could work well under both normal and abnormal 
> conditions is to define a fairly lengthy interval, default 5 minutes, and 
> then ensure we do not roll more than once during this interval for this 
> reason.





[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-05-01 Thread David Manning (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831245#comment-16831245
 ] 

David Manning commented on HBASE-22301:
---

Looks good to me. Regarding the earlier comment from [~busbey]: should 
{{hbase.regionserver.hlog.roll.on.sync.ms}} be changed to 
{{hbase.regionserver.wal.roll.on.sync.ms}} in branch-1, branch-2, and master?

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-2.patch, HBASE-22301.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions. 
> A very simple strategy that could work well under both normal and abnormal 
> conditions is to define a fairly lengthy interval, default 5 minutes, and 
> then ensure we do not roll more than once during this interval for this 
> reason.





[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-05-01 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831120#comment-16831120
 ] 

Andrew Purtell commented on HBASE-22301:


Precommit results look good. There is one checkstyle nit I won't be able to fix:

{code}
TestLogRolling.java:106:  @Test:3: Method length is 164 lines (max allowed is 150). [MethodLength]
{code}

I already have a +1 for the branch-1 work. Unless there is an objection I am going to 
commit this to branch-1, branch-2, and master today, after running more local 
checks. If you have any concerns please post them now.


> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-2.patch, HBASE-22301.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions. 
> A very simple strategy that could work well under both normal and abnormal 
> conditions is to define a fairly lengthy interval, default 5 minutes, and 
> then ensure we do not roll more than once during this interval for this 
> reason.





[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-30 Thread HBase QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830834#comment-16830834
 ] 

HBase QA commented on HBASE-22301:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 26s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 23s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 8s{color} | {color:red} hbase-server: The patch generated 1 new + 24 unchanged - 0 fixed = 25 total (was 24) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 22s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 12s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 28s{color} | {color:green} hbase-hadoop-compat in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 34s{color} | {color:green} hbase-hadoop2-compat in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 140m 31s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}186m 10s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/PreCommit-HBASE-Build/223/artifact/patchprocess/Dockerfile |
| JIRA Issue | HBASE-22301 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967533/HBASE-22301.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 50369a6b5d7f 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personal

[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-30 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830754#comment-16830754
 ] 

Andrew Purtell commented on HBASE-22301:


With regard to this functional area there are three distinct WAL 
implementations: one in the branch-1s, one in the branch-2s, and one in the 
master branch. Where possible I made small and careful refactors to make them 
more similar. The result is three patches: one for branch-1, one for branch-2, 
and one for master. TestLogRolling, including its new unit test covering the 
changes in this patch, passes in all three versions. Let's see what precommit 
says, modulo possible unrelated failures, and I am running the complete suite 
locally for these changes now. 

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-2.patch, HBASE-22301.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions. 
> A very simple strategy that could work well under both normal and abnormal 
> conditions is to define a fairly lengthy interval, default 5 minutes, and 
> then ensure we do not roll more than once during this interval for this reason.

[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-30 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830509#comment-16830509
 ] 

Andrew Purtell commented on HBASE-22301:


I see the new checkstyle warnings; let me fix them for branch-1 and post a new 
patch for that, along with the forthcoming branch-2/master patches...

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions. 
> A very simple strategy that could work well under both normal and abnormal 
> conditions is to define a fairly lengthy interval (default 5 minutes) and 
> then ensure we do not roll more than once during this interval for this 
> reason.
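
For illustration only, here is a minimal sketch of the decision logic the 
description outlines: track the latest WAL sync latency, request a roll when it 
exceeds a configured threshold, and rate-limit roll requests to at most one per 
interval. This is not the committed patch; the class, fields, and defaults are 
hypothetical.

{code:java}
// Illustration only: a hypothetical sketch of the roll decision described above,
// not the actual HBase patch. Names and defaults are invented for the example.
public class SlowSyncRollPolicy {
  private final long slowSyncThresholdMs; // sync latency above this is considered "too slow"
  private final long minRollIntervalMs;   // request at most one roll per this interval
  private long lastRollRequestMs = 0L;    // wall-clock time of the last roll we requested

  public SlowSyncRollPolicy(long slowSyncThresholdMs, long minRollIntervalMs) {
    this.slowSyncThresholdMs = slowSyncThresholdMs;
    this.minRollIntervalMs = minRollIntervalMs;
  }

  /**
   * @param syncTimeMs how long the last WAL sync took
   * @param nowMs current time, e.g. System.currentTimeMillis()
   * @return true if a WAL roll should be requested now
   */
  public synchronized boolean shouldRoll(long syncTimeMs, long nowMs) {
    if (syncTimeMs < slowSyncThresholdMs) {
      return false; // pipeline looks healthy enough
    }
    if (nowMs - lastRollRequestMs < minRollIntervalMs) {
      return false; // already rolled recently for this reason; avoid thrashing
    }
    lastRollRequestMs = nowMs; // record the roll we are about to request
    return true;
  }
}
{code}

A sync handler could call something like shouldRoll(syncDurationMs, 
System.currentTimeMillis()) after each sync, bump a "roll requested due to slow 
sync" metric, and log the current pipeline when it returns true.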



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-30 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830507#comment-16830507
 ] 

Andrew Purtell commented on HBASE-22301:


I ran TestLogRolling 25 times in a loop and it passed every time. I don't think 
my additions will make it flaky, although the new test is expensive in terms of 
time, and overall this unit now takes too long. I will file a follow-up to 
break it up. 



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-30 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830477#comment-16830477
 ] 

Sean Busbey commented on HBASE-22301:
-

Should be in HFileSystem.



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-30 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830463#comment-16830463
 ] 

Sergey Shelukhin commented on HBASE-22301:
--

It may do so anyway, due to HDFS-14387.
What is the thing that prevents it from picking the local node?
+1



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-30 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830224#comment-16830224
 ] 

Sean Busbey commented on HBASE-22301:
-

The WAL should not be picking a local DN at all for WAL pipelines, unless our 
mechanism for telling DFS not to do that is broken. If it is, that's a 
different problem from the one being solved here.

+1 on the current union of approaches.



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-30 Thread chenxu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830031#comment-16830031
 ] 

chenxu commented on HBASE-22301:


There was already a JIRA to do the same thing (HBASE-20902); I uploaded a patch 
there recently, hope you can review it for us. :)



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-29 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829981#comment-16829981
 ] 

Andrew Purtell commented on HBASE-22301:


bq. DFSClient will choose local DN first, if RS & DN deployed on the same node, 
the local slow DN will always be selected, so i think we should exclude the 
slow DN first.

This is indeed supposed to be a mitigation while the slow DNs are found and 
removed. See the top post. It is not expected nor intended to be a total 
solution, but it is better than the current state of affairs (which does 
nothing).
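
For intuition, a rough back-of-the-envelope example (illustrative numbers, not 
from any particular cluster, and ignoring the local-DN preference discussed 
above): on a 200-node cluster with 3 slow datanodes, a freshly rolled pipeline 
of three randomly chosen datanodes avoids all of the slow nodes with 
probability about 197/200 x 196/199 x 195/198, roughly 0.96, so a single roll 
usually escapes the bad nodes, and repeated rolls make staying on a bad 
pipeline increasingly unlikely.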



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-29 Thread chenxu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829974#comment-16829974
 ] 

chenxu commented on HBASE-22301:


{quote}On most clusters the probability the new pipeline includes the slow 
datanode will be low.
{quote}
The DFSClient will choose the local DN first. If the RS and DN are deployed on 
the same node, the local slow DN will always be selected, so I think we should 
exclude the slow DN first.



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-29 Thread HBase QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829907#comment-16829907
 ] 

HBase QA commented on HBASE-22301:
--

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || branch-1 Compile Tests ||
| 0 | mvndep | 0m 20s | Maven dependency ordering for branch |
| +1 | mvninstall | 1m 53s | branch-1 passed |
| +1 | compile | 0m 57s | branch-1 passed with JDK v1.8.0_212 |
| +1 | compile | 1m 5s | branch-1 passed with JDK v1.7.0_222 |
| +1 | checkstyle | 1m 50s | branch-1 passed |
| +1 | shadedjars | 2m 49s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 47s | branch-1 passed with JDK v1.8.0_212 |
| +1 | javadoc | 1m 1s | branch-1 passed with JDK v1.7.0_222 |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 38s | the patch passed |
| +1 | compile | 0m 55s | the patch passed with JDK v1.8.0_212 |
| +1 | javac | 0m 55s | the patch passed |
| +1 | compile | 1m 4s | the patch passed with JDK v1.7.0_222 |
| +1 | javac | 1m 4s | the patch passed |
| -1 | checkstyle | 1m 23s | hbase-server: The patch generated 3 new + 94 unchanged - 6 fixed = 97 total (was 100) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 2m 48s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 1m 41s | Patch does not cause any errors with Hadoop 2.7.4. |
| +1 | javadoc | 0m 50s | the patch passed with JDK v1.8.0_212 |
| +1 | javadoc | 1m 1s | the patch passed with JDK v1.7.0_222 |
|| || || || Other Tests ||
| +1 | unit | 0m 22s | hbase-hadoop-compat in the patch passed. |
| +1 | unit | 0m 32s | hbase-hadoop2-compat in the patch passed. |
| -1 | unit | 101m 30s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 45s | The patch does not generate ASF License warnings. |

[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-29 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829867#comment-16829867
 ] 

Andrew Purtell commented on HBASE-22301:


Assuming this is committed, I will file a follow-up to break up TestLogRolling. 
Running time for me is now over five minutes, about 350 seconds. Some of the 
other cases within this unit run for a long time too, beyond the additions from 
this patch.



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-29 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829862#comment-16829862
 ] 

Andrew Purtell commented on HBASE-22301:


Attached an updated patch that takes a union of the approaches, as discussed.

I also added negative tests before and after the positive tests in the unit. 
Submitting for precommit, but let me also run this test in a loop to be as 
diligent as I can about not introducing a flake. If I see a problem I will 
report it; otherwise assume all is good.



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-29 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829820#comment-16829820
 ] 

Andrew Purtell commented on HBASE-22301:


Just to be clear, I'm not suggesting a revert of HBASE-21806, [~sershe]. This 
was our original approach. Based on data from an incident retrospective it 
wouldn't have been a sufficient mitigation, though, so we had to set it aside 
as too simplistic and prone to false positives at the latency threshold we 
need. I would not claim it is a bad thing to do, especially if it made a 
difference for you at the high threshold committed with that patch.



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-29 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829810#comment-16829810
 ] 

Andrew Purtell commented on HBASE-22301:


Maybe we can consider a future enhancement that does weighting or smoothing or 
even prediction (phi-accrual?), but we run the risk of spending a lot of effort 
over-engineering a mitigation-only feature that might not see much use nor 
prove useful enough. The patch here counts the number of slow sync warnings 
within an interval; if we trip a limit over that time, we will roll. I am 
working on a union patch that will also trigger a roll as soon as we see an 
outlier beyond the threshold introduced by HBASE-21806. So we will roll:
- If any single sync exceeds the HBASE-21806 threshold
- If the count of slow syncs within the current monitoring interval exceeds a 
threshold (the last revs of the patch on this issue); a rough sketch of this 
combined check follows.
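
Below is a minimal, illustrative sketch of that combined ("union") check, with 
invented names and thresholds; it is not the committed patch. A single extreme 
outlier requests a roll immediately, while moderately slow syncs only request a 
roll once enough of them accumulate within the monitoring interval. The 
once-per-interval rate limiting from the issue description would still apply on 
top of this.

{code:java}
// Illustration only: a hypothetical sketch of the combined check described above,
// not the actual patch. Names, defaults, and structure are invented.
public class UnionSlowSyncCheck {
  private final long slowSyncMs;       // a sync slower than this counts as "slow"
  private final long rollOutlierMs;    // a single sync slower than this triggers a roll by itself
  private final int slowSyncRollLimit; // this many slow syncs within one interval triggers a roll
  private final long intervalMs;       // length of the monitoring interval

  private long intervalStartMs = 0L;
  private int slowSyncCount = 0;

  public UnionSlowSyncCheck(long slowSyncMs, long rollOutlierMs, int slowSyncRollLimit,
      long intervalMs) {
    this.slowSyncMs = slowSyncMs;
    this.rollOutlierMs = rollOutlierMs;
    this.slowSyncRollLimit = slowSyncRollLimit;
    this.intervalMs = intervalMs;
  }

  /** Observe one sync; returns true if a WAL roll should be requested. */
  public synchronized boolean observeSync(long syncTimeMs, long nowMs) {
    if (nowMs - intervalStartMs > intervalMs) {
      intervalStartMs = nowMs; // start a fresh monitoring interval
      slowSyncCount = 0;
    }
    if (syncTimeMs >= rollOutlierMs) {
      return true; // one extreme outlier (the HBASE-21806-style threshold) is enough
    }
    if (syncTimeMs >= slowSyncMs) {
      slowSyncCount++;
      return slowSyncCount >= slowSyncRollLimit; // too many slow syncs in this interval
    }
    return false;
  }
}
{code}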



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-29 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829803#comment-16829803
 ] 

Sergey Shelukhin commented on HBASE-22301:
--

Well I meant





[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-29 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829801#comment-16829801
 ] 

Andrew Purtell commented on HBASE-22301:


bq. Should the roll decision simply be based on a single value, a total/weighted 
sync time accumulated over the N latest syncs, with no minimum threshold by 
count? That way it can accumulate the offending amount of sync time from a 
single bad sync or from multiple somewhat-bad ones.

That is the approach we took. (smile) More or less.

I don't see any harm in a union approach. Our approach here, which counts the 
smaller slow sync warnings and triggers on a train of them, will trigger rolls 
when we needed them in our incidents. The separate single high latency trigger 
that HBASE-21806 added will, I assume, trigger rolls when you needed them in 
your incident(s).



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-29 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829797#comment-16829797
 ] 

Sergey Shelukhin commented on HBASE-22301:
--

Sorry about the master-only commit. I just started contributing to HBase again 
and was assuming that we should move forwards, not backwards ;)
I saw that branch-2 is still very much alive, so I'm committing recent fixes 
there too.



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-29 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829796#comment-16829796
 ] 

Andrew Purtell commented on HBASE-22301:


I sent an email to dev@hbase titled "Trunk only commits are a waste of 
everyone's time". I am making this claim on that thread (smile). Let's take any 
response to that to the email thread. 

I'll be back soon with a patch for branch-1 that takes the union of this 
approach and HBASE-21806.



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-29 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829795#comment-16829795
 ] 

Sergey Shelukhin commented on HBASE-22301:
--

In our case, though, the problem was that each slow sync would take tens of 
seconds, so with the current DEFAULT_SLOW_SYNC_ROLL_THRESHOLD, as far as I can 
tell from the patch, the condition would not trigger for a very long time.
Should the roll decision simply be based on a single value, a total/weighted 
sync time accumulated over the N latest syncs, with no minimum threshold by 
count? That way it can accumulate the offending amount of sync time from a 
single bad sync or from multiple somewhat-bad ones.



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-29 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829792#comment-16829792
 ] 

Andrew Purtell commented on HBASE-22301:


For example, in the incident we discussed, and with the data discussed above, 
the 10 second threshold from HBASE-21806 would never have been tripped. So 
instead we count trains of slow syncs against the lower default threshold and 
get an effective mitigation. A purely hypothetical illustration of the 
difference follows below.
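
The numbers in this small example are made up, not taken from the incident; it 
only shows why a single-sync threshold can miss a train of moderately slow syncs 
that a count-based trigger catches.

{code:java}
// Made-up numbers, only to contrast the two triggers discussed in this thread.
public class SlowSyncTrainExample {
  public static void main(String[] args) {
    long singleSyncRollThresholdMs = 10_000L; // HBASE-21806 style: roll on one sync >= 10 s
    long slowSyncWarnThresholdMs = 100L;      // hypothetical "slow sync" warning threshold
    int slowSyncCountRollThreshold = 100;     // hypothetical count-based roll threshold

    long observedSyncMs = 400L;               // hypothetical per-sync latency during a gray failure
    int observedSlowSyncs = 250;              // hypothetical number of such syncs in the window

    boolean singleSyncTrigger = observedSyncMs >= singleSyncRollThresholdMs;
    boolean countTrigger = observedSyncMs >= slowSyncWarnThresholdMs
        && observedSlowSyncs >= slowSyncCountRollThreshold;

    System.out.println("single-sync trigger fires: " + singleSyncTrigger); // false
    System.out.println("count-based trigger fires: " + countTrigger);      // true
  }
}
{code}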

I'd be happy to keep the hbase.regionserver.hlog.roll.on.sync.ms config from 
HBASE-21806 and the simple threshold trigger. I would have discovered these 
when trying to forward port. It sucks that there was no attempt to backport, so 
none of us who work with branch-1 or even branch-2 had any idea this existed. 
Folks committing to trunk only are doing a disservice to the community.

Let me make a union of these mechanisms in the context of branch-1 and post it 
so you can see what I'm talking about before proceeding.

I have just about exceeded the time I can spend on this and will have to move 
on soon. We have been going around and around on this for a week straight.


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-29 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829789#comment-16829789
 ] 

Andrew Purtell commented on HBASE-22301:


No; if you read above, we arrived at a different way of doing it. Basically, 
the HBASE-21806 approach is too simplistic and prone to false positives. I'd 
prefer to replace HBASE-21806 with this approach.



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-29 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829784#comment-16829784
 ] 

Sergey Shelukhin commented on HBASE-22301:
--

Should this augment/be similar to HBASE-21806?



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-29 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829676#comment-16829676
 ] 

Andrew Purtell commented on HBASE-22301:


Great, to move forward I'll make a patch for branch-2 and master for the sync 
WAL. Back soon.



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-26 Thread HBase QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827428#comment-16827428
 ] 

HBase QA commented on HBASE-22301:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} branch-1 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 46s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} branch-1 passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s{color} | {color:green} branch-1 passed with JDK v1.7.0_222 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 46s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 2m 48s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} branch-1 passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s{color} | {color:green} branch-1 passed with JDK v1.7.0_222 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s{color} | {color:green} the patch passed with JDK v1.7.0_222 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} The patch passed checkstyle in hbase-hadoop-compat {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} The patch passed checkstyle in hbase-hadoop2-compat {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 23s{color} | {color:green} hbase-server: The patch generated 0 new + 94 unchanged - 6 fixed = 94 total (was 100) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 2m 54s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 1m 43s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s{color} | {color:green} the patch passed with JDK v1.7.0_222 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s{color} | {color:green} hbase-hadoop-compat in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 28s{color} | {color:green} hbase

[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-26 Thread David Manning (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827426#comment-16827426
 ] 

David Manning commented on HBASE-22301:
---

{quote}Because I always want to check this and update the related metric if we 
hit this condition. The other conditions can be exclusive with respect to each 
other.
{quote}
It will now be theoretically possible for {{checkLogRoll}} to call 
{{requestLogRoll}} twice in a row. If that is acceptable, then it sounds good 
to me. I couldn't immediately tell whether this would really roll twice or 
would coalesce into one WAL roll.

+1 from me, thank you for the fix!



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-26 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827398#comment-16827398
 ] 

Andrew Purtell commented on HBASE-22301:


Updated patch, here is the change:
{code:java}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
index 24a2271564..b562946671 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
@@ -734,6 +734,8 @@ public class FSHLog implements WAL {
     // NewPath could be equal to oldPath if replaceWriter fails.
     newPath = replaceWriter(oldPath, newPath, nextWriter, nextHdfsOut);
     tellListenersAboutPostLogRoll(oldPath, newPath);
+    // We got a new writer, so reset the slow sync count
+    slowSyncCount.set(0);
     // Can we delete any of the old log files?
     if (getNumRolledLogFiles() > 0) {
       cleanOldLogs();
{code}
It seemed better to me to reset the count after we have switched the writer, 
and it was actually a bug that we didn't. Thanks for the catch [~dmanning].


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-26 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827394#comment-16827394
 ] 

Andrew Purtell commented on HBASE-22301:


{quote}nit: why not else if checkSlowSync in {{checkLogRoll}}:
{quote}
Because I always want to check this and update the related metric if we hit 
this condition. The other conditions can be exclusive with respect to each 
other.
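For illustration only, here is a minimal self-contained sketch of that intent (the class and member names are assumptions, not the actual FSHLog members): the slow sync check is evaluated on every pass through the roll check rather than being short-circuited in an else-if chain, so its counter and metric are always updated.
{code:java}
import java.util.concurrent.atomic.AtomicInteger;

class LogRollCheckSketch {
  private final AtomicInteger slowSyncCount = new AtomicInteger();
  private final int slowSyncRollThreshold = 100; // assumed default

  private boolean checkLowReplication() {
    return false; // placeholder for the existing low replication check
  }

  private boolean checkSlowSync() {
    // Always consulted, so the counter is observed (and any related metric
    // would be updated) on every invocation of checkLogRoll().
    return slowSyncCount.get() >= slowSyncRollThreshold;
  }

  void checkLogRoll() {
    boolean lowReplication = checkLowReplication();
    boolean slowSync = checkSlowSync(); // deliberately not inside an else-if
    if (lowReplication || slowSync) {
      requestLogRoll();
    }
  }

  private void requestLogRoll() {
    // ask the log roller to replace the writer (and thus the HDFS pipeline)
  }
}
{code}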
{quote}it seems like {{slowSyncCount.set(0);}} should also be called in 
requestLogRoll() since we're going to get a new WAL pipeline at that point
{quote}
Sure, new patch in just a sec.



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-26 Thread David Manning (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827391#comment-16827391
 ] 

David Manning commented on HBASE-22301:
---

+1 for current patch, but 2 comments below.

nit: why not else if checkSlowSync in {{checkLogRoll}}:
{code:java}
  if (checkSlowSync()) {
{code}
Also, it seems like {{slowSyncCount.set(0);}} should also be called in 
requestLogRoll() since we're going to get a new WAL pipeline at that point. If 
we had enough slow syncs during the interval, but then we rolled the WAL for an 
unrelated reason before the end of the interval, then we will roll again once 
the interval is up given the current code.

Both comments are attempts to limit extraneous roll requests; neither seems likely to cause any major issues.


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-26 Thread HBase QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827383#comment-16827383
 ] 

HBase QA commented on HBASE-22301:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-1 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
41s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} branch-1 passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} branch-1 passed with JDK v1.7.0_222 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
46s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
44s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} branch-1 passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} branch-1 passed with JDK v1.7.0_222 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed with JDK v1.7.0_222 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} The patch passed checkstyle in hbase-hadoop-compat 
{color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} The patch passed checkstyle in hbase-hadoop2-compat 
{color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
22s{color} | {color:green} hbase-server: The patch generated 0 new + 94 
unchanged - 6 fixed = 94 total (was 100) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
50s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
1m 43s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed with JDK v1.7.0_222 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
22s{color} | {color:green} hbase-hadoop-compat in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
29s{color} | {color:green} hba

[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-26 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827341#comment-16827341
 ] 

Andrew Purtell commented on HBASE-22301:


Updated patch for branch-1 with improvements as discussed



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-26 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827141#comment-16827141
 ] 

Andrew Purtell commented on HBASE-22301:


[~dmanning] It is possible there could be no writes for a long time. Not very likely, but we can handle it. If we find that the difference between 'now' and the last time we triggered a roll is more than twice the monitoring interval when the count finally goes over the threshold, we can reset the count instead of requesting a roll. This will prevent the corner case you describe.
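A hedged sketch of that guard, reusing the field names from the earlier snippet (the actual patch may implement it differently):
{code:java}
import java.util.concurrent.atomic.AtomicInteger;

class SlowSyncWindowSketch {
  private final AtomicInteger slowSyncCount = new AtomicInteger();
  private final long slowSyncCheckInterval = 60 * 1000L; // one minute, assumed default
  private final int slowSyncRollThreshold = 100;         // assumed default
  private long lastTimeCheckSlowSync = System.currentTimeMillis();

  boolean checkSlowSync() {
    boolean result = false;
    long now = System.currentTimeMillis();
    long elapsed = now - lastTimeCheckSlowSync;
    if (elapsed >= slowSyncCheckInterval) {
      // Roll only if the slow syncs were counted within a reasonably fresh
      // window; a stale window (no check for twice the interval or longer)
      // says nothing about the current pipeline, so just start over.
      if (elapsed < 2 * slowSyncCheckInterval
          && slowSyncCount.get() >= slowSyncRollThreshold) {
        result = true; // caller would request a log roll
      }
      lastTimeCheckSlowSync = now;
      slowSyncCount.set(0);
    }
    return result;
  }
}
{code}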

Regarding the default thresholds in this patch: I picked 10 slow syncs in one minute as a totally arbitrary choice so I could complete the change and get a patch up for consideration. Now let's discuss what reasonable defaults should be.

Based on your analysis of our fleet under normal operation, this change would result in:
 - If the threshold is 10 slow syncs in 1 minute, we would request ~30,000 WAL rolls per day under normal operating conditions across on the order of 100 clusters. Load is distributed unevenly, so dividing this number evenly by the number of clusters doesn't make sense. This is more than we would want, I think.
 - If the threshold is 200 slow syncs in 1 minute, we would request ~475 WAL rolls per day under normal operating conditions across the same clusters. This would not be harmful.
 - During the incident that inspired this change, we had in excess of 500 slow sync warnings in one minute.

As mentioned above, slow sync warnings can easily be false positives due to regionserver GC activity, which makes using them as a signal problematic, but not unreasonable if we set the thresholds high enough to discriminate abnormal conditions.

Also, bear in mind that under steady state writes we will frequently roll the log upon reaching the file size roll threshold anyway. False-positive slow-sync-based rolls will be noise among this activity if we set the threshold right.

Therefore, I think the next patch will have a default threshold of 100 slow syncs in one minute. Still somewhat arbitrary, as defaults tend to be, but given the particular example of our production fleet that would amount to ~950 rolls per day under normal operating conditions across 100 clusters. In trade, it would trigger even if a cluster is only under modest write load, and it would certainly have discriminated the HDFS level issues we encountered during our incident.


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-25 Thread David Manning (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826688#comment-16826688
 ] 

David Manning commented on HBASE-22301:
---

{quote}To be more precise what I’m thinking is if a long time had elapsed and 
then we are finally pushed over the threshold, we should set the counter to 1 
and return false. We start over rather than trigger a roll. Seems reasonable to 
me.
{quote}
Ah yes, I misread the code. I thought {{checkSlowSync}} was being called as 
part of recognizing we had a slow sync, and not from {{checkLogRoll}} which is 
happening much more frequently. So it seems much less likely that we would be a 
long time past the monitoring interval, compared to what I was thinking. I'd 
also be okay with ignoring my previous comment - even if it is more correct, it 
seems to only matter in very lightly utilized servers that would not sync to 
the WAL for extended periods of time, unless I'm missing something.


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-25 Thread David Manning (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826674#comment-16826674
 ] 

David Manning commented on HBASE-22301:
---

I've spent hours looking at logs and assessing the background state of slow 
syncs throughout various clusters. With the current default of 10 slow syncs 
per 1 minute interval, we could expect the number of WAL rolls to increase by 
15% on a heavily utilized cluster, or 5-7% on other clusters. In these cases, I 
would not expect improved performance from most of these added rolls, as these 
are not problematic WAL pipelines, but "normal" pipelines. Perhaps these 
clusters could be better tuned to avoid so many background slow syncs, but it 
seems reasonable to think others in the community will also have similar 
clusters.

Based on these data, I would recommend a much higher default of 250 for 
{{slowSyncRollThreshold}}. For the default of 5 syncer threads, this is almost 
one slow sync per second if all syncer threads are utilized and waiting on the 
WAL writer. This would have remedied the problem in our incident, as we were 
seeing 500-800+ slow syncs reported per minute on affected pipelines (each sync 
taking 100-500ms+, each sync reported by all 5 threads). However, the threshold 
is also high enough to prevent sizable increases in WAL rolls in normally 
operating clusters. From my log investigations, in normal operation of a heavily utilized cluster, this would still add <1% new WAL rolls. It would add ~5% new WAL rolls during spiky traffic (some clusters with large multi requests or hotspotted servers that may be doing some GC thrashing). For normal operation of a normal cluster, it would add a negligible number of WAL rolls (<0.01%).

Unfortunately, such a high value would not detect a case where the WAL is so slow that it can't even perform 50 syncs per minute. If we want to catch that, we'll need to be fancier with the logic. Perhaps we would have to sum up all the {{timeInNanos}} in {{postSync}} over the {{slowSyncCheckInterval}}, and then check whether we spent greater than X% of the interval in slow syncs. This could catch issues where we had 5 slow syncs of 10 seconds each, or 100 slow syncs of 500ms each, and request a WAL roll in either case. FWIW, I like this approach, but realize that it adds complexity while we're striving for simplicity.
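To make that concrete, a rough sketch of the time-budget idea (this is not what the patch does; the class name, the 25% figure, and the 100 ms "slow" cutoff are illustrative assumptions):
{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class SlowSyncTimeBudgetSketch {
  private final AtomicLong slowSyncNanos = new AtomicLong();
  private final long checkIntervalMs = 60 * 1000L;  // one minute window, assumed
  private final long slowSyncThresholdMs = 100L;    // what counts as "slow", assumed
  private final double rollFraction = 0.25;         // roll if >25% of the window was spent in slow syncs

  // Record each measured sync latency, e.g. from postSync().
  void recordSync(long timeInNanos) {
    if (TimeUnit.NANOSECONDS.toMillis(timeInNanos) >= slowSyncThresholdMs) {
      slowSyncNanos.addAndGet(timeInNanos);
    }
  }

  // Evaluate once per check interval, e.g. from checkLogRoll().
  boolean shouldRoll() {
    long spentMs = TimeUnit.NANOSECONDS.toMillis(slowSyncNanos.getAndSet(0));
    // Five 10-second syncs or one hundred 500 ms syncs both blow a 15-second budget.
    return spentMs >= (long) (checkIntervalMs * rollFraction);
  }
}
{code}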
{quote}We could divide the count by number of syncer threads. Or, multiply the threshold by number of threads. Or, simply set a higher threshold.
{quote}
If we stick with the count-based approach, I recommend 50 multiplied by the 
number of threads. If we don't want to include number of threads, then I 
recommend a threshold of 250 (50 times the default of 5 threads).


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-25 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826656#comment-16826656
 ] 

Andrew Purtell commented on HBASE-22301:


To be more precise, what I’m thinking is: if a long time has elapsed and then we are finally pushed over the threshold, we should set the counter to 1 and return false. We start over rather than trigger a roll. Seems reasonable to me.



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-25 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826654#comment-16826654
 ] 

Andrew Purtell commented on HBASE-22301:


I like your first suggestion better too [~dmanning] and will put up a new patch 
with it incorporated tomorrow. Thanks for the idea. 



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-25 Thread David Manning (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826649#comment-16826649
 ] 

David Manning commented on HBASE-22301:
---

 
{code:java}
private boolean checkSlowSync() {
  boolean result = false;
  long now = EnvironmentEdgeManager.currentTime();
  if (now - lastTimeCheckSlowSync >= slowSyncCheckInterval) {
    if (slowSyncCount.get() >= slowSyncRollThreshold) {
      LOG.warn("Requesting log roll because we exceeded slow sync threshold; count=" +
        slowSyncCount.get() + ", threshold=" + slowSyncRollThreshold +
        ", current pipeline: " + Arrays.toString(getPipeLine()));
      result = true;
    }
    lastTimeCheckSlowSync = now;
    slowSyncCount.set(0);
  }
  return result;
}
{code}
Assuming {{slowSyncCheckInterval}} is one minute and {{slowSyncRollThreshold}} is 10, what about the scenario where we get 20 slow syncs in 50 seconds, and then we don't get any more slow syncs for an hour? On the next slow sync an hour later, it looks like we will roll the WAL on that first slow sync.

Can we check to see that it's not too long after the interval period? If it's 
been too long after the interval period, we can assume we should be resetting 
the counters because the previous situation corrected itself. It gets a little 
messy, but perhaps something like:
{code:java}
  if (now - lastTimeCheckSlowSync >= slowSyncCheckInterval) {
    if (now - lastTimeCheckSlowSync <= 2 * slowSyncCheckInterval &&
        slowSyncCount.get() >= slowSyncRollThreshold) {
{code}
Alternatively, resetting {{lastTimeCheckSlowSync}} and {{slowSyncCount}} could 
also be done in {{requestLogRoll}}. I like that approach less, but it also 
would make it less likely we would request a WAL roll from one rogue slow sync 
much later.


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-25 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826614#comment-16826614
 ] 

Sean Busbey commented on HBASE-22301:
-

I'm +1 on the current patch, either as-is or with adjustments to handle the sync thread count and to the defaults, based on feedback from David.



[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-25 Thread HBase QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826609#comment-16826609
 ] 

HBase QA commented on HBASE-22301:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-1 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 0s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} branch-1 passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
13s{color} | {color:green} branch-1 passed with JDK v1.7.0_222 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
53s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
56s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} branch-1 passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} branch-1 passed with JDK v1.7.0_222 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed with JDK v1.7.0_222 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} The patch passed checkstyle in hbase-hadoop-compat 
{color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} The patch passed checkstyle in hbase-hadoop2-compat 
{color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
25s{color} | {color:green} hbase-server: The patch generated 0 new + 94 
unchanged - 6 fixed = 94 total (was 100) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
52s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
1m 41s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed with JDK v1.7.0_222 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
22s{color} | {color:green} hbase-hadoop-compat in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
29s{color} | {color:green} hba

[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-25 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826557#comment-16826557
 ] 

Andrew Purtell commented on HBASE-22301:


The updated patch implements the logging change requested by [~busbey].

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions. 
> A very simple strategy that could work well under both normal and abnormal 
> conditions is to define a fairly lengthy interval, default 5 minutes, and 
> then insure we do not roll more than once during this interval for this 
> reason.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-25 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826232#comment-16826232
 ] 

Andrew Purtell commented on HBASE-22301:


{quote}Should the threshold factor in {{hbase.regionserver.hlog.syncer.count}}?
{quote}
We could divide the count by the number of syncer threads, multiply the threshold by the number of threads, or simply set a higher threshold.

The last option is the simplest, but I'd be interested in thoughts here.
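
For illustration, a rough sketch of how either normalization could look. This is not the committed patch; the helper and its names are hypothetical, and only {{hbase.regionserver.hlog.syncer.count}} is an actual configuration key discussed here.

{code}
// Hypothetical helper showing the two scaling options discussed above. Each
// syncer thread waiting on the same slow pipeline reports its own slow sync,
// so the raw count grows roughly with the syncer count.
final class SlowSyncThresholdScaling {

  // Option 1: multiply the configured threshold by the number of syncer threads.
  static int effectiveThreshold(int configuredThreshold, int syncerCount) {
    return configuredThreshold * Math.max(1, syncerCount);
  }

  // Option 2: divide the observed count by the number of syncer threads instead.
  static boolean shouldRequestRoll(int observedSlowSyncs, int configuredThreshold,
      int syncerCount) {
    return (observedSlowSyncs / Math.max(1, syncerCount)) >= configuredThreshold;
  }
}
{code}

Either way the per-regionserver behavior would be roughly independent of the syncer count; simply raising the default threshold achieves a similar effect with less machinery.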




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-25 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826231#comment-16826231
 ] 

Andrew Purtell commented on HBASE-22301:


{quote}maybe the log message could go into checkSlowSync so that the count is 
still visible
{quote}
Sure, no problem.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-25 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826220#comment-16826220
 ] 

Sean Busbey commented on HBASE-22301:
-

the approach makes sense.

{code}
if (checkSlowSync()) {
  LOG.warn("Requesting log roll because we exceeded slow sync threshold; threshold=" +
    slowSyncRollThreshold + ", current pipeline: " + Arrays.toString(getPipeLine()));
  requestLogRoll(SLOW_SYNC);
}
{code}

this log message doesn't have enough detail, since it's just going to tell me e.g. "10" without saying how slow things had to be, over what period of time, or how many times we actually crossed that line.

maybe the log message could go into checkSlowSync so that the count is still 
visible? or {{checkSlowSync}} could return it and treat "<= 0" to mean "don't 
request a roll"?
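
A rough sketch of that second variant, reusing the names from the snippet above. This is only the reviewer's suggestion, not the committed code, and the return-a-count signature is an assumption for illustration.

{code}
// Hypothetical variant: checkSlowSync() returns how many slow syncs were
// observed in the current check window, with a non-positive value meaning
// "don't request a roll". The caller can then log the actual count alongside
// the configured threshold and the pipeline.
int slowSyncCount = checkSlowSync();
if (slowSyncCount > 0) {
  LOG.warn("Requesting log roll because we exceeded slow sync threshold; count=" +
    slowSyncCount + ", threshold=" + slowSyncRollThreshold +
    ", current pipeline: " + Arrays.toString(getPipeLine()));
  requestLogRoll(SLOW_SYNC);
}
{code}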


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-25 Thread David Manning (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826216#comment-16826216
 ] 

David Manning commented on HBASE-22301:
---

I do like the count-based approach better, and think it may offer better results in either a default or a well-tuned state. Thank you for presenting that option. I'm trying to review the incident and non-incident data to help inform the defaults, if possible. So far, sample incident data shows ~800 slow syncs per minute (160 per thread, with 5 syncer threads). The background level for that cluster, on hot nodes, ends up being around 10 slow syncs per minute. So I could imagine a higher default that avoids too much log rolling but is still useful. Should the threshold factor in 
{{hbase.regionserver.hlog.syncer.count}}? A slow pipeline will be reported X 
times, where X is the number of syncer threads waiting on the pipeline.

I will spend more time looking at data today, and see what I can find.


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-25 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826205#comment-16826205
 ] 

Andrew Purtell commented on HBASE-22301:


No. The problem is that GC activity is indistinguishable from real slow syncs if you only examine a single data point, unless you set a very high threshold, and then we would probably not trigger often enough to make a difference. Data from our incident shows a train of slow sync warnings, with a few peaks at 1-3 seconds. It is unlikely that triggering only on those rare peak outliers would have made a difference; the conservative 10s trigger in this patch would never have been reached. Had we instead triggered on trains of smaller data points in the 200-600 ms range, the mitigation would have fired often enough to matter, and those trains correlated with real problems, not GC activity. And by GC activity I mean that of the regionserver process. As you probably know, any one or a handful of slow sync warnings can be a false positive due to GC rather than real latency on the pipeline, which makes things difficult here. We can try to avoid 
false positives either by setting a high latency threshold or by waiting for an 
unusual number to occur within some window of time. There are patches for 
review that take either approach. It would seem the high threshold approach may 
not offer enough mitigation in practice given the data on hand. At any rate, the thresholds are tunable and can be experimented with in production to find the right trade-off, and the feature is self-limiting, so slow-sync-triggered log rolls do not become a problem themselves.


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-25 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826182#comment-16826182
 ] 

Sean Busbey commented on HBASE-22301:
-

My next review cycle will probably be late Friday or Monday, just FYI.

Is the concern that *we* had a GC pause that crossed the 10s threshold, or that one of the DNs in the pipeline did?




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-24 Thread HBase QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825703#comment-16825703
 ] 

HBase QA commented on HBASE-22301:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
2s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-1 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
46s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} branch-1 passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green} branch-1 passed with JDK v1.7.0_222 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
45s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
37s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} branch-1 passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} branch-1 passed with JDK v1.7.0_222 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed with JDK v1.7.0_222 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} The patch passed checkstyle in hbase-hadoop-compat 
{color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} The patch passed checkstyle in hbase-hadoop2-compat 
{color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
18s{color} | {color:green} hbase-server: The patch generated 0 new + 94 
unchanged - 6 fixed = 94 total (was 100) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
38s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
1m 38s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed with JDK v1.7.0_222 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
25s{color} | {color:green} hbase-hadoop-compat in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
33s{color} | {color:green} hbase

[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-24 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825639#comment-16825639
 ] 

Andrew Purtell commented on HBASE-22301:


Currently there are two patches on this issue: one posted seven hours ago that 
got a +1, and the alternative approach described above.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-24 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825637#comment-16825637
 ] 

Andrew Purtell commented on HBASE-22301:


Thanks [~busbey]

[~dmanning] convinced me a count-based threshold might be better. Let me attach 
a new version of this patch. Please let me know what you think. We could commit 
either option.

The drawback of rolling on detection of a single latency outlier is that the threshold might have to be set very high to exclude "slow syncs" that are really GC activity. We went back and forth on what threshold might be appropriate to avoid false positives yet still be effective. I still want to do something simple, so instead of creating a new latency threshold we reuse the existing slow sync warn threshold and count how many times we exceed it within a longer, configurable interval. If we count enough warnings over that interval, we ask for a roll. We can ask only once per interval, which retains a pacing limit that prevents runaway roll requests during an outage or incident. This can be a better strategy because, while a single outlier may well be GC activity, the probability that many data points are all false positives decreases as their number grows. The defaults in the new patch are an interval of one minute and a count-based threshold of ten slow sync warnings within that interval.
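
A minimal, self-contained sketch of that counting strategy, for illustration only; the class, field, and method names are assumptions, not the names used in the attached patch.

{code}
import java.util.concurrent.atomic.AtomicInteger;

// Count syncs that exceed the existing slow sync warn threshold; if the count
// within the check interval crosses a configured limit, request at most one
// roll per interval.
class SlowSyncRollPolicy {
  private final long slowSyncWarnMs;     // existing slow sync warn threshold
  private final int rollCountThreshold;  // e.g. 10 slow syncs
  private final long checkIntervalMs;    // e.g. 60,000 ms (one minute)
  private final AtomicInteger slowSyncCount = new AtomicInteger();
  private volatile long intervalStartMs = System.currentTimeMillis();

  SlowSyncRollPolicy(long slowSyncWarnMs, int rollCountThreshold, long checkIntervalMs) {
    this.slowSyncWarnMs = slowSyncWarnMs;
    this.rollCountThreshold = rollCountThreshold;
    this.checkIntervalMs = checkIntervalMs;
  }

  // Called after each WAL sync with its observed latency.
  void onSyncCompleted(long syncTimeMs) {
    if (syncTimeMs > slowSyncWarnMs) {
      slowSyncCount.incrementAndGet();
    }
  }

  // Called periodically; returns true at most once per check interval.
  boolean shouldRequestRoll() {
    long now = System.currentTimeMillis();
    if (now - intervalStartMs < checkIntervalMs) {
      return false;
    }
    int count = slowSyncCount.getAndSet(0);  // reset the counter for the next interval
    intervalStartMs = now;
    return count >= rollCountThreshold;
  }
}
{code}

The pacing falls out of the interval check: even during a sustained incident, at most one slow-sync-triggered roll request can be produced per interval.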


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-24 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825621#comment-16825621
 ] 

Sean Busbey commented on HBASE-22301:
-

+1 on the branch-1 patch




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-24 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825537#comment-16825537
 ] 

Andrew Purtell commented on HBASE-22301:


The latest precommit failures do not appear to be related to this patch. Just in case, I tried a local reproduction and was unable to reproduce.

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions. 
> A very simple strategy that could work well under both normal and abnormal 
> conditions is to define a fairly lengthy interval, default 5 minutes, and 
> then ensure we do not roll more than once during this interval for this 
> reason.
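
To make the strategy quoted above concrete, here is a minimal, hypothetical sketch of the detection half: measure each WAL sync and request a roll when the duration crosses a configured threshold. The class name, constant, and requestLogRoll() hook are illustrative assumptions, not the actual FSHLog code in the attached patches.

{code}
// Hypothetical sketch only; the threshold and the requestLogRoll() hook are assumed.
public class SlowSyncRollSketch {
  // Roll the writer if a sync takes longer than this (illustrative value).
  static final long SLOW_SYNC_ROLL_THRESHOLD_MS = 10000L;

  // Called after each WAL sync completes with the measured duration.
  void postSync(long syncDurationMs) {
    if (syncDurationMs > SLOW_SYNC_ROLL_THRESHOLD_MS) {
      requestLogRoll();
    }
  }

  // Stand-in for the real log-roll request path.
  void requestLogRoll() {
    System.out.println("Requesting WAL roll due to slow sync");
  }
}
{code}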





[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-24 Thread HBase QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825526#comment-16825526
 ] 

HBase QA commented on HBASE-22301:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
43s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-1 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
45s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} branch-1 passed with JDK v1.8.0_202 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} branch-1 passed with JDK v1.7.0_211 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
47s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
49s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} branch-1 passed with JDK v1.8.0_202 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
1s{color} | {color:green} branch-1 passed with JDK v1.7.0_211 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed with JDK v1.8.0_202 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed with JDK v1.7.0_211 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} The patch passed checkstyle in hbase-hadoop-compat 
{color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} The patch passed checkstyle in hbase-hadoop2-compat 
{color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
28s{color} | {color:green} hbase-server: The patch generated 0 new + 94 
unchanged - 6 fixed = 94 total (was 100) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
51s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
1m 45s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed with JDK v1.8.0_202 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed with JDK v1.7.0_211 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
22s{color} | {color:green} hbase-hadoop-compat in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
28s{color} | {color:green} hbase-hadoop2-compat in the patch passed. {color} |

[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-24 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825437#comment-16825437
 ] 

Andrew Purtell commented on HBASE-22301:


Updated patch. Fixes the checkstyle nit. I broke the test by updating the config 
constants in FSHLog (per review request) in a quick pass without running the test; 
that is fixed now. I cannot reproduce the TestFailedAppendAndSync failure: although 
it is flagged as a medium test, it completes almost instantly and never fails 
locally. I had also forgotten to update FSHLog FIXED_OVERHEAD and did so this time 
around.
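
For context on the FIXED_OVERHEAD note above: classes that participate in HBase heap-size accounting expose a precomputed constant that the heap-size test compares against a reflection-based measurement, so adding fields means bumping the constant. A rough, hypothetical sketch of that pattern follows; the field counts are illustrative, not FSHLog's actual layout.

{code}
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.ClassSize;

// Hypothetical illustration of the FIXED_OVERHEAD pattern; the field counts are made up.
public class FixedOverheadSketch {
  // Suppose the class holds 5 object references and 2 long fields (illustrative only).
  public static final long FIXED_OVERHEAD = ClassSize.align(
      ClassSize.OBJECT + 5 * ClassSize.REFERENCE + 2 * Bytes.SIZEOF_LONG);
}
{code}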

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions. 
> A very simple strategy that could work well under both normal and abnormal 
> conditions is to define a fairly lengthy interval, default 5 minutes, and 
> then ensure we do not roll more than once during this interval for this 
> reason.





[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-24 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825353#comment-16825353
 ] 

Andrew Purtell commented on HBASE-22301:


There is no default switch case by deliberate choice, but since it triggers 
checkstyle I'll change that.

Whoops on the test timeout; I'll increase it 10x.

Agreed, the TestOfflineMetaRebuildBase failure is not related.

Let me update the patch in a bit. We also need one for branch-2 and up, at least 
for the same code path in the sync WAL.

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions. 
> A very simple strategy that could work well under both normal and abnormal 
> conditions is to define a fairly lengthy interval, default 5 minutes, and 
> then ensure we do not roll more than once during this interval for this 
> reason.





[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-24 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824992#comment-16824992
 ] 

Sean Busbey commented on HBASE-22301:
-

bq. hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase

I don't think this one is related; it just started to fail in the last nightly 
build.

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions. 
> A very simple strategy that could work well under both normal and abnormal 
> conditions is to define a fairly lengthy interval, default 5 minutes, and 
> then ensure we do not roll more than once during this interval for this 
> reason.





[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-24 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824990#comment-16824990
 ] 

Sean Busbey commented on HBASE-22301:
-

{code}
java.lang.AssertionError: The regionserver should have thrown an exception
at 
org.apache.hadoop.hbase.regionserver.TestFailedAppendAndSync.testLockupAroundBadAssignSync(TestFailedAppendAndSync.java:258)
{code}

This looks like it might be related, since the changed code path is getting 
exercised, but I haven't dug in enough to figure out what's going on with it.

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions. 
> A very simple strategy that could work well under both normal and abnormal 
> conditions is to define a fairly lengthy interval, default 5 minutes, and 
> then ensure we do not roll more than once during this interval for this 
> reason.





[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-24 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824989#comment-16824989
 ] 

Sean Busbey commented on HBASE-22301:
-

{code}
./hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/MetricsWAL.java:77:
switch (reason) {: switch without "default" clause. [MissingSwitchDefault]
{code}

This is a good find from checkstyle: without a default case that fails, it would 
be easy for someone to add a value to the enum but forget to update the metrics.
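
A minimal sketch of the point, using a hypothetical roll-reason enum and counters rather than the real MetricsWAL API: a default case that throws makes a forgotten enum value fail loudly instead of silently dropping the metric.

{code}
// Hypothetical sketch; the enum values and method names are illustrative, not MetricsWAL's API.
public class RollReasonMetricsSketch {
  enum RollReason { ERROR, LOW_REPLICATION, SLOW_SYNC, SIZE }

  private long errorRolls, lowReplicationRolls, slowSyncRolls, sizeRolls;

  void incrementLogRoll(RollReason reason) {
    switch (reason) {
      case ERROR:           errorRolls++;          break;
      case LOW_REPLICATION: lowReplicationRolls++; break;
      case SLOW_SYNC:       slowSyncRolls++;       break;
      case SIZE:            sizeRolls++;           break;
      default:
        // Fail fast so a newly added enum value cannot be silently ignored.
        throw new IllegalArgumentException("Unhandled roll reason: " + reason);
    }
  }
}
{code}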

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions. 
> A very simple strategy that could work well under both normal and abnormal 
> conditions is to define a fairly lengthy interval, default 5 minutes, and 
> then ensure we do not roll more than once during this interval for this 
> reason.





[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-24 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824988#comment-16824988
 ] 

Sean Busbey commented on HBASE-22301:
-

{code}
junit.framework.AssertionFailedError: Waiting timed out after [1,000] msec
at 
org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testSlowSyncLogRolling(TestLogRolling.java:321)

{code}

a per-test timeout of 1s isn't going to work for this test.
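
The failure message has the shape produced by HBase's Waiter utility, so the fix is presumably to give the waited-on condition a larger budget. A hedged sketch of that pattern, assuming the Waiter.waitFor(Configuration, long, Predicate) signature and a hypothetical predicate:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Waiter;

// Hypothetical sketch of widening a wait budget; the predicate body is illustrative.
public class WaitTimeoutSketch {
  static boolean rollHappened() {
    return true; // stand-in for "has the WAL rolled yet?"
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Give the condition 10x the original 1s budget.
    Waiter.waitFor(conf, 10000, new Waiter.Predicate<Exception>() {
      @Override
      public boolean evaluate() throws Exception {
        return rollHappened();
      }
    });
  }
}
{code}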

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions. 
> A very simple strategy that could work well under both normal and abnormal 
> conditions is to define a fairly lengthy interval, default 5 minutes, and 
> then ensure we do not roll more than once during this interval for this 
> reason.





[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-23 Thread HBase QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824812#comment-16824812
 ] 

HBase QA commented on HBASE-22301:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 1s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-1 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
35s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} branch-1 passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green} branch-1 passed with JDK v1.7.0_222 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
48s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
51s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} branch-1 passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} branch-1 passed with JDK v1.7.0_222 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed with JDK v1.7.0_222 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
21s{color} | {color:red} hbase-server: The patch generated 1 new + 95 unchanged 
- 5 fixed = 96 total (was 100) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
47s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
1m 41s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed with JDK v1.7.0_222 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
22s{color} | {color:green} hbase-hadoop-compat in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
28s{color} | {color:green} hbase-hadoop2-compat in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}108m 56s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
48s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |

[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-23 Thread HBase QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824805#comment-16824805
 ] 

HBase QA commented on HBASE-22301:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 
40s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-1 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
50s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} branch-1 passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} branch-1 passed with JDK v1.7.0_222 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
47s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
49s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} branch-1 passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
1s{color} | {color:green} branch-1 passed with JDK v1.7.0_222 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed with JDK v1.7.0_222 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
23s{color} | {color:red} hbase-server: The patch generated 1 new + 95 unchanged 
- 5 fixed = 96 total (was 100) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
49s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
1m 39s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed with JDK v1.7.0_222 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
22s{color} | {color:green} hbase-hadoop-compat in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
28s{color} | {color:green} hbase-hadoop2-compat in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}115m  1s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
48s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |

[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-23 Thread HBase QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824747#comment-16824747
 ] 

HBase QA commented on HBASE-22301:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
57s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
3s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-1 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
35s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
 6s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
37s{color} | {color:green} branch-1 passed with JDK v1.8.0_202 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
42s{color} | {color:green} branch-1 passed with JDK v1.7.0_211 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
59s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
56s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
30s{color} | {color:green} branch-1 passed with JDK v1.8.0_202 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
27s{color} | {color:green} branch-1 passed with JDK v1.7.0_211 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed with JDK v1.8.0_202 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed with JDK v1.7.0_211 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
30s{color} | {color:red} hbase-server: The patch generated 1 new + 95 unchanged 
- 5 fixed = 96 total (was 100) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
44s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
1m 40s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed with JDK v1.8.0_202 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed with JDK v1.7.0_211 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
22s{color} | {color:green} hbase-hadoop-compat in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
31s{color} | {color:green} hbase-hadoop2-compat in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}104m 53s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
47s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |

[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-23 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824742#comment-16824742
 ] 

Andrew Purtell commented on HBASE-22301:


{quote}Sorry, just realized {{hbase.regionserver.hlog.slowsync.ms}} was 
existing.
{quote}
I changed the names as requested but added code to fall back to the old config 
setting if the new one isn't found. No harm there. Also, I am only planning to put 
this into new minor releases, especially 1.5.0.
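
A small sketch of the fallback pattern described here, using Hadoop's Configuration. Only hbase.regionserver.hlog.slowsync.ms is confirmed in this thread as the pre-existing key; the new key name and the default are assumptions for illustration.

{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: read the new-style key, fall back to the legacy key, then to a default.
public class SlowSyncConfigSketch {
  static final String OLD_SLOW_SYNC_KEY = "hbase.regionserver.hlog.slowsync.ms"; // existing key
  static final String NEW_SLOW_SYNC_KEY = "hbase.regionserver.wal.slowsync.ms";  // assumed name
  static final long DEFAULT_SLOW_SYNC_MS = 100L; // illustrative default

  static long slowSyncMs(Configuration conf) {
    // If the new key is absent, the inner getLong supplies the old key's value (or the default).
    return conf.getLong(NEW_SLOW_SYNC_KEY,
        conf.getLong(OLD_SLOW_SYNC_KEY, DEFAULT_SLOW_SYNC_MS));
  }
}
{code}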

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions. 
> A very simple strategy that could work well under both normal and abnormal 
> conditions is to define a fairly lengthy interval, default 5 minutes, and 
> then ensure we do not roll more than once during this interval for this 
> reason.





[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-23 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824740#comment-16824740
 ] 

Andrew Purtell commented on HBASE-22301:


The updated patch uses "wal" instead of "hlog" and "log" in the config and metrics 
changes.

I also lowered the default interval we wait between slow-sync-based roll requests 
from 5 minutes to 2 minutes. We want to be responsive, yet not flake on transient 
issues nor exacerbate HDFS-layer instability. I think the new default is a better 
trade-off.
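
As a rough illustration of the once-per-interval guard with the new 2-minute default (hypothetical names, and plain System.currentTimeMillis() instead of HBase's clock utilities):

{code}
// Hypothetical sketch of limiting slow-sync roll requests to at most one per interval.
public class RollRateLimitSketch {
  static final long ROLL_ON_SLOW_SYNC_INTERVAL_MS = 2 * 60 * 1000L; // new 2-minute default

  private long lastSlowSyncRollTime = 0L;

  // Returns true at most once per interval, so transient blips cannot cause roll storms.
  synchronized boolean shouldRollForSlowSync() {
    long now = System.currentTimeMillis();
    if (now - lastSlowSyncRollTime >= ROLL_ON_SLOW_SYNC_INTERVAL_MS) {
      lastSlowSyncRollTime = now;
      return true;
    }
    return false;
  }
}
{code}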

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch, 
> HBASE-22301-branch-1.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions. 
> A very simple strategy that could work well under both normal and abnormal 
> conditions is to define a fairly lengthy interval, default 5 minutes, and 
> then ensure we do not roll more than once during this interval for this
> reason.
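
As a rough illustration of the threshold check described in the quoted proposal 
above: track the worst sync time observed since the last check and request a 
roll when it exceeds a configured limit. This is a sketch under assumed names, 
not the actual implementation:

{code:java}
// Sketch only: decide whether to request a WAL roll based on observed sync
// latency. The class and method names are illustrative assumptions, not the
// actual FSHLog code.
class SlowSyncRollCheck {
  private final long rollThresholdMs;
  private long maxSyncTimeMs; // worst sync observed since the last check

  SlowSyncRollCheck(long rollThresholdMs) {
    this.rollThresholdMs = rollThresholdMs;
  }

  synchronized void recordSyncTime(long syncTimeMs) {
    maxSyncTimeMs = Math.max(maxSyncTimeMs, syncTimeMs);
  }

  /** True if syncs since the last check were slow enough to justify rolling the writer. */
  synchronized boolean shouldRequestRoll() {
    long observed = maxSyncTimeMs;
    maxSyncTimeMs = 0; // reset the observation window
    return observed > rollThresholdMs;
  }
}
{code}

Keeping only the maximum since the last check matches the "max over the 
interval" variant mentioned in the description; no per-datanode information is 
needed.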



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-23 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824737#comment-16824737
 ] 

Sean Busbey commented on HBASE-22301:
-

Oof yeah. Patch viewing on mobile is rough. Sorry, I just realized 
{{hbase.regionserver.hlog.slowsync.ms}} already exists.

Looks like configs on master still use "hlog", so let's leave them as-is. If 
that usage in configs is worth changing, let's handle it in a different jira 
and do all of them at once, e.g. for 3.0.0.

Using WAL in the metrics and log messages will probably save me an explanation 
in the future, though.

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions. 
> A very simple strategy that could work well under both normal and abnormal 
> conditions is to define a fairly lengthy interval, default 5 minutes, and 
> then ensure we do not roll more than once during this interval for this
> reason.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-23 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824735#comment-16824735
 ] 

Andrew Purtell commented on HBASE-22301:


Sure we can use "wal" instead of "hlog" and "log" since things are changing 
anyway. I kept them similar to surrounding code is all.

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions. 
> A very simple strategy that could work well under both normal and abnormal 
> conditions is to define a fairly lengthy interval, default 5 minutes, and 
> then ensure we do not roll more than once during this interval for this
> reason.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-23 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824734#comment-16824734
 ] 

Sean Busbey commented on HBASE-22301:
-

This is great! Two minor requests:

Can the configs use "wal" instead of "hlog"?

Can the metrics use "wal" instead of "log"? (So as make clear they're not about 
the system log output from the region server process)

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions. 
> A very simple strategy that could work well under both normal and abnormal 
> conditions is to define a fairly lengthy interval, default 5 minutes, and 
> then ensure we do not roll more than once during this interval for this
> reason.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-23 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824733#comment-16824733
 ] 

Andrew Purtell commented on HBASE-22301:


Updated patch with a small improvement in FSHLog#checkLogRoll. While it's 
harmless to request a roll when more than one of the three implemented 
conditions is met, and an argument could be made that reflecting all applicable 
conditions in the logs and metrics would be useful, I think that would run 
counter to what operators expect to see in the metrics. So let's stop at the 
first condition that applies.
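
For clarity, a minimal sketch of the "stop at the first condition" structure; 
the reasons, their order, and the helper are illustrative assumptions rather 
than FSHLog#checkLogRoll itself:

{code:java}
// Sketch only: evaluate roll conditions in a fixed order and attribute the
// requested roll to exactly one reason. Names and ordering are assumptions.
enum RollReason { LOW_REPLICATION, SLOW_SYNC, SIZE }

final class LogRollCheck {
  private LogRollCheck() {}

  /** Returns the first reason that applies, or null if no roll should be requested. */
  static RollReason firstApplicableReason(boolean lowReplication,
                                          boolean slowSync,
                                          boolean sizeExceeded) {
    if (lowReplication) {
      return RollReason.LOW_REPLICATION; // stop here; don't also report slow sync or size
    }
    if (slowSync) {
      return RollReason.SLOW_SYNC;
    }
    if (sizeExceeded) {
      return RollReason.SIZE;
    }
    return null;
  }
}
{code}

Returning on the first match means each requested roll increments exactly one 
reason-specific metric, which is what an operator scanning dashboards would 
expect.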

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions. 
> A very simple strategy that could work well under both normal and abnormal 
> conditions is to define a fairly lengthy interval, default 5 minutes, and 
> then ensure we do not roll more than once during this interval for this
> reason.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow

2019-04-23 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824672#comment-16824672
 ] 

Andrew Purtell commented on HBASE-22301:


Already verified that the modified unit tests TestMetricsWAL and TestLogRolling pass.
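
For anyone reproducing this locally, those tests can typically be run with the 
usual Maven surefire invocation from the module containing them, e.g. something 
like {{mvn test -Dtest=TestMetricsWAL,TestLogRolling}} (assuming a standard 
build setup).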

> Consider rolling the WAL if the HDFS write pipeline is slow
> ---
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering 
> a gray failure not an outright outage. HDFS operations, notably syncs, are 
> abnormally slow on pipelines which include this subset of hosts. If the 
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be 
> consumed waiting for acks from the datanodes in the pipeline (recall that 
> some of them are sick). Imagine a write heavy application distributing load 
> uniformly over the cluster at a fairly high rate. With the WAL subsystem 
> slowed by HDFS level issues, all handlers can be blocked waiting to append to 
> the WAL. Once all handlers are blocked, the application will experience 
> backpressure. All (HBase) clients eventually have too many outstanding writes 
> and block.
> Because the application is distributing writes near uniformly in the 
> keyspace, the probability any given service endpoint will dispatch a request 
> to an impacted regionserver, even a single regionserver, approaches 1.0. So 
> the probability that all service endpoints will be affected approaches 1.0.
> In order to break the logjam, we need to remove the slow datanodes. Although 
> there is HDFS level monitoring, mechanisms, and procedures for this, we 
> should also attempt to take mitigating action at the HBase layer as soon as 
> we find ourselves in trouble. It would be enough to remove the affected 
> datanodes from the writer pipelines. A super simple strategy that can be 
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate but 
> still can be susceptible. branch-2 sync WAL is susceptible. 
> We already roll the WAL writer if the pipeline suffers the failure of a 
> datanode and the replication factor on the pipeline is too low. We should 
> also consider how much time it took for the write pipeline to complete a sync 
> the last time we measured it, or the max over the interval from now to the 
> last time we checked. If the sync time exceeds a configured threshold, roll 
> the log writer then too. Fortunately we don't need to know which datanode is 
> making the WAL write pipeline slow, only that syncs on the pipeline are too 
> slow and exceeding a threshold. This is enough information to know when to 
> roll it. Once we roll it, we will get three new randomly selected datanodes. 
> On most clusters the probability the new pipeline includes the slow datanode 
> will be low. (And if for some reason it does end up with a problematic 
> datanode again, we roll again.)
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what 
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for 
> further debugging and analysis, similar to the existing slow FSHLog sync log 
> line.
> If we roll too many times within a short interval of time this probably means 
> there is a widespread problem with the fleet and so our mitigation is not 
> helping and may be exacerbating those problems or operator difficulties. 
> Ensure log roll requests triggered by this new feature happen infrequently 
> enough to not cause difficulties under either normal or abnormal conditions. 
> A very simple strategy that could work well under both normal and abnormal 
> conditions is to define a fairly lengthy interval, default 5 minutes, and 
> then ensure we do not roll more than once during this interval for this
> reason.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)