[jira] [Commented] (HBASE-18549) Unclaimed replication queues can go undetected

Hudson (JIRA) Tue, 02 Oct 2018 05:12:27 -0700


    [ https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635365#comment-16635365 ]


Hudson commented on HBASE-18549:
--------------------------------

Results for branch branch-2.1
        [build #407 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/407/]: 
(/) *{color:green}+1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/407//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/407//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/407//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Unclaimed replication queues can go undetected
> ----------------------------------------------
>
>                 Key: HBASE-18549
>                 URL: https://issues.apache.org/jira/browse/HBASE-18549
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Ashu Pachauri
>            Assignee: Xu Cang
>            Priority: Critical
>             Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1
>
>         Attachments: HBASE-18549-.master.001.patch, 
> HBASE-18549-.master.002.patch, HBASE-18549-.master.003.patch, 
> HBASE-18549-.master.004.patch, HBASE-18549.branch-1.001.patch, 
> HBASE-18549.branch-1.001.patch
>
>
> We have repeatedly hit a situation where a ZooKeeper issue causes
> NodeFailoverWorker to silently fail to pick up the replication queue of a
> dead region server. One example is when the znode size for a particular
> queue exceeds the jute.maxbuffer value.
> Other situations may lead to the same result and simply go undetected.
> We need a metric for the number of unclaimed replication queues. This
> will help mitigate the problem by allowing alerts on the metric and by
> helping to identify the underlying issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
