[ https://issues.apache.org/jira/browse/HBASE-13605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584995#comment-14584995 ]
Hadoop QA commented on HBASE-13605:
-----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12739472/hbase-13605_v4-branch-1.1.patch
  against branch-1.1 branch at commit 682b8ab8a542a903e5807053282693e3a96bad2d.
  ATTACHMENT ID: 12739472

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.

    {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0)

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.

    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

    {color:red}-1 checkstyle{color}. The applied patch generated 3814 checkstyle errors (more than the master's current 3813 errors).

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

    {color:green}+1 site{color}. The mvn site goal succeeds with this patch.

    {color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14401//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14401//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14401//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14401//console

This message is automatically generated.

> RegionStates should not keep its list of dead servers
> -----------------------------------------------------
>
> Key: HBASE-13605
> URL: https://issues.apache.org/jira/browse/HBASE-13605
> Project: HBase
> Issue Type: Bug
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Priority: Critical
> Fix For: 2.0.0, 1.0.2, 1.1.1
>
> Attachments: hbase-13605_v1.patch, hbase-13605_v3-branch-1.1.patch, hbase-13605_v4-branch-1.1.patch
>
>
> As mentioned in
> https://issues.apache.org/jira/browse/HBASE-9514?focusedCommentId=13769761&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13769761
> and HBASE-12844, we should have only 1 source of cluster membership.
> The list of dead servers and RegionStates doing its own liveness check
> (ServerManager.isServerReachable()) have caused an assignment problem again in
> a test cluster, where RegionStates "thinks" that the server is dead and that
> SSH will handle the region assignment. However, the RS is not dead at all: it
> is living happily and never gets a zk expiry, a YouAreDeadException, or
> anything else. This leaves the regions unassigned in OFFLINE state.
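
To make the "only 1 source of cluster membership" point concrete, here is a minimal sketch. It is not the HBASE-13605 patch and does not use HBase classes; `ClusterMembership`, `RegionTracker`, and every method name below are hypothetical, chosen only for illustration. It assumes that a server is marked dead solely on an authoritative signal (such as session expiry) and that a failed reachability probe never changes membership.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class MembershipSketch {

    /** Hypothetical single source of truth for which servers are alive. */
    static final class ClusterMembership {
        private final Set<String> online = ConcurrentHashMap.newKeySet();
        private final Set<String> deadNotProcessed = ConcurrentHashMap.newKeySet();

        void serverJoined(String server) {
            deadNotProcessed.remove(server);
            online.add(server);
        }

        /** Assumed to be called only on an authoritative signal, never on a failed RPC. */
        void serverExpired(String server) {
            if (online.remove(server)) {
                deadNotProcessed.add(server);
            }
        }

        boolean isOnline(String server) {
            return online.contains(server);
        }

        boolean isDeadNotProcessed(String server) {
            return deadNotProcessed.contains(server);
        }
    }

    /** Hypothetical region tracker that keeps no dead-server list of its own. */
    static final class RegionTracker {
        private final ClusterMembership membership;

        RegionTracker(ClusterMembership membership) {
            this.membership = membership;
        }

        /** Liveness is answered only by the shared membership view. */
        boolean canAssignTo(String server) {
            return membership.isOnline(server);
        }
    }

    public static void main(String[] args) {
        ClusterMembership membership = new ClusterMembership();
        RegionTracker tracker = new RegionTracker(membership);
        String rs = "rs-example,16020,1"; // made-up server name

        membership.serverJoined(rs);
        System.out.println("assignable while online: " + tracker.canAssignTo(rs));

        membership.serverExpired(rs);
        System.out.println("assignable after expiry: " + tracker.canAssignTo(rs));
    }
}
```

With this shape, the tracker cannot disagree with the membership view about whether a server is dead, which is the kind of disagreement the description above reports.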
> master assigning the region:
> {code}
> 15-04-20 09:02:25,780 DEBUG [AM.ZK.Worker-pool3-t330] master.RegionStates: Onlined 77dddcd50c22e56bfff133c0e1f9165b on os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268 {ENCODED => 77dddcd50c
> {code}
> Master then disabled the table, and unassigned the region:
> {code}
> 2015-04-20 09:02:27,158 WARN [ProcedureExecutorThread-1] zookeeper.ZKTableStateManager: Moving table loadtest_d1 state from DISABLING to DISABLING
> Starting unassign of loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b. (offlining), current state: {77dddcd50c22e56bfff133c0e1f9165b state=OPEN, ts=1429520545780, server=os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268}
> bleProcedure$BulkDisabler-0] master.AssignmentManager: Sent CLOSE to os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268 for region loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b.
> 2015-04-20 09:02:27,414 INFO [AM.ZK.Worker-pool3-t316] master.RegionStates: Offlined 77dddcd50c22e56bfff133c0e1f9165b from os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268
> {code}
> On table re-enable, AM does not assign the region:
> {code}
> 2015-04-20 09:02:30,415 INFO [ProcedureExecutorThread-3] balancer.BaseLoadBalancer: Reassigned 25 regions. 25 retained the pre-restart assignment.
> 2015-04-20 09:02:30,415 INFO [ProcedureExecutorThread-3] procedure.EnableTableProcedure: Bulk assigning 25 region(s) across 5 server(s), retainAssignment=true
> l,16000,1429515659726-GeneralBulkAssigner-4] master.RegionStates: Couldn't reach online server os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268
> l,16000,1429515659726-GeneralBulkAssigner-4] master.AssignmentManager: Updating the state to OFFLINE to allow to be reassigned by SSH
> nmentManager: Skip assigning loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b., it is on a dead but not processed yet server: os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268
> {code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)