[ https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635365#comment-16635365 ]
Hudson commented on HBASE-18549:
--------------------------------

Results for branch branch-2.1
	[build #407 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/407/]: (/) *{color:green}+1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/407//General_Nightly_Build_Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/407//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/407//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}


> Unclaimed replication queues can go undetected
> ----------------------------------------------
>
>                 Key: HBASE-18549
>                 URL: https://issues.apache.org/jira/browse/HBASE-18549
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Ashu Pachauri
>            Assignee: Xu Cang
>            Priority: Critical
>             Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1
>
>         Attachments: HBASE-18549-.master.001.patch,
> HBASE-18549-.master.002.patch, HBASE-18549-.master.003.patch,
> HBASE-18549-.master.004.patch, HBASE-18549.branch-1.001.patch,
> HBASE-18549.branch-1.001.patch
>
>
> We have repeatedly come across situations where a ZooKeeper issue
> causes NodeFailoverWorker to silently fail to pick up the replication queue
> of a dead region server. One example is when the znode size for a particular
> queue exceeds the jute.maxBuffer value.
> Other situations may lead to the same failure and likewise go undetected.
> We need a metric for the number of unclaimed replication queues. This
> will help mitigate the problem by allowing alerts on the metric and by
> helping to identify the underlying issues.
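
For illustration only, here is a minimal sketch of how a count of unclaimed replication queues could be derived. It is not the HBASE-18549 patch and does not use any HBase or ZooKeeper API. It assumes a hypothetical in-memory snapshot that maps each queue id to the region server owning it, plus a list of live region servers; the names `count_unclaimed_queues`, `queue_owners` and `live_servers` are invented for this example.

```python
from typing import Dict, Iterable, Set


def count_unclaimed_queues(queue_owners: Dict[str, str],
                           live_servers: Iterable[str]) -> int:
    """Count queues whose owning region server is not in the live set.

    queue_owners is a hypothetical snapshot: queue id -> owning server name.
    A queue still owned by a dead server is treated as unclaimed.
    """
    live: Set[str] = set(live_servers)
    return sum(1 for owner in queue_owners.values() if owner not in live)


# Hypothetical snapshot: rs2 has died and nobody has claimed its two queues.
queues = {"peer1-rs1": "rs1", "peer1-rs2": "rs2", "peer2-rs2": "rs2"}
unclaimed = count_unclaimed_queues(queues, live_servers=["rs1", "rs3"])
print(unclaimed)  # prints 2
```

A value that stays above zero for longer than a normal failover takes is the kind of condition an alert on such a metric could catch.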