[ https://issues.apache.org/jira/browse/HBASE-14802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15005720#comment-15005720 ]
Hudson commented on HBASE-14802:
--------------------------------

FAILURE: Integrated in HBase-Trunk_matrix #466 (See [https://builds.apache.org/job/HBase-Trunk_matrix/466/])
Revert "HBASE-14802 Replaying server crash recovery procedure after a (stack: rev bb6581345fd9ecac964e19cea2293477162801ca)
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/DeadServer.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestDeadServer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java


> Replaying server crash recovery procedure after a failover causes incorrect handling of deadservers
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14802
>                 URL: https://issues.apache.org/jira/browse/HBASE-14802
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 2.0.0, 1.2.0, 1.2.1
>            Reporter: Ashu Pachauri
>            Assignee: Ashu Pachauri
>             Fix For: 2.0.0, 1.2.0, 1.3.0
>
>         Attachments: HBASE-14802-1.patch, HBASE-14802-2.patch, HBASE-14802-3.patch, HBASE-14802.patch
>
>
> Dead servers are processed by launching a ServerCrashProcedure for a server
> after it has been added to the dead servers list.
> Every time a server is added to the dead list, a counter "numProcessing" is
> incremented, and it is decremented when a crash recovery procedure finishes.
> Since adding a dead server and recovering it are two separate events, the
> counter can become inconsistent.
> If a master failover occurs in the middle of the crash recovery, the
> numProcessing counter resets but the ServerCrashProcedure is replayed by the
> new master. This causes the counter to go negative and makes the master think
> that dead servers are still in the process of recovery.
> As a consequence, the balancer ceases to run after such a failover.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
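
For readers following the thread, the counter problem described in the issue can be illustrated with a minimal sketch. This is a simplified, hypothetical model and not the actual HBase DeadServer class: the class name DeadServerCounterModel and its methods are invented for illustration, and the "non-zero means recovery in progress" check is an assumption chosen to match the symptom the reporter describes.

```java
// Simplified, hypothetical model of the "numProcessing" counter from the issue.
// This is NOT the real org.apache.hadoop.hbase.master.DeadServer class.
class DeadServerCounterModel {
    // Held in memory only, so a newly started master begins again at zero.
    private int numProcessing = 0;

    // Called when a server is added to the dead servers list.
    synchronized void serverAdded() {
        numProcessing++;
    }

    // Called when a crash recovery procedure finishes.
    synchronized void recoveryFinished() {
        numProcessing--;
    }

    synchronized int getNumProcessing() {
        return numProcessing;
    }

    // Assumed check: any non-zero value is read as "recovery in progress".
    synchronized boolean recoveryInProgress() {
        return numProcessing != 0;
    }

    public static void main(String[] args) {
        // Old master: a server dies and the counter goes to 1,
        // then the master fails over before recovery completes.
        DeadServerCounterModel oldMaster = new DeadServerCounterModel();
        oldMaster.serverAdded();

        // New master: the counter starts at 0, but the replayed
        // procedure still reports completion and decrements it.
        DeadServerCounterModel newMaster = new DeadServerCounterModel();
        newMaster.recoveryFinished();

        System.out.println("numProcessing = " + newMaster.getNumProcessing());
        System.out.println("recoveryInProgress = " + newMaster.recoveryInProgress());
    }
}
```

Under these assumptions the new master ends up with numProcessing at -1 and keeps reporting a recovery in progress, which matches the reported effect of the balancer no longer running.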