[ https://issues.apache.org/jira/browse/HBASE-9593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777260#comment-13777260 ]
Hadoop QA commented on HBASE-9593:
----------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12604967/HBASE-9593_v3.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.

    {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.

    {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.

    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

    {color:green}+1 site{color}. The mvn site goal succeeds with this patch.

    {color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7371//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7371//console

This message is automatically generated.

> Region server left in online servers list forever if it went down after registering to master and before creating ephemeral node
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9593
>                 URL: https://issues.apache.org/jira/browse/HBASE-9593
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.94.11
>            Reporter: rajeshbabu
>            Assignee: rajeshbabu
>             Fix For: 0.98.0, 0.94.13, 0.96.1
>
>         Attachments: HBASE-9593.patch, HBASE-9593_v2.patch, HBASE-9593_v3.patch
>
>
> In some of our tests we found that a region server was always shown as online in the master UI even though it was actually dead.
> If a region server goes down between the following two steps, it stays in the master's online servers list forever:
> 1) register to master
> 2) create ephemeral znode
> Since there is no notification from ZooKeeper, the master does not remove the expired server from the online servers list.
> Assignments will fail if the RS is selected as the destination server.
> In some cases ROOT or META also won't be assigned if the RS is randomly selected, and every attempt has to wait for the timeout.
> Here are the logs:
> 1) HOST-10-18-40-153 is registered to master
> {code}
> 2013-09-19 19:47:41,123 DEBUG org.apache.hadoop.hbase.master.ServerManager: STARTUP: Server HOST-10-18-40-153,61020,1379600260255 came back up, removed it from the dead servers list
> 2013-09-19 19:47:41,123 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=HOST-10-18-40-153,61020,1379600260255
> {code}
> {code}
> 2013-09-19 19:47:41,119 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at HOST-10-18-40-153/10.18.40.153:61000
> 2013-09-19 19:47:41,119 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at HOST-10-18-40-153,61000,1379600055284 that we are up with port=61020, startcode=1379600260255
> {code}
> 2) Terminated before creating ephemeral node.
> {code}
> Thu Sep 19 19:47:41 IST 2013 Terminating regionserver
> {code}
> 3) The RS can still be selected for assignment, and those assignments will fail.
> {code}
> 2013-09-19 19:47:54,049 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of -ROOT-,,0.70236052 to HOST-10-18-40-153,61020,1379600260255, trying to assign elsewhere instead; retry=0
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:390)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:436)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1127)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974)
>     at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
>     at $Proxy15.openRegion(Unknown Source)
>     at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:533)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1734)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1431)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1406)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1401)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:2374)
>     at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.verifyAndAssignRoot(MetaServerShutdownHandler.java:136)
>     at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.verifyAndAssignRootWithRetries(MetaServerShutdownHandler.java:160)
>     at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:82)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> 2013-09-19 19:47:54,050 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for -ROOT-,,0.70236052 destination server is HOST-10-18-40-153,61020,1379600260255
> 2013-09-19 19:47:54,050 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for -ROOT-,,0.70236052 so generated a random one; hri=-ROOT-,,0.70236052, src=, dest=HOST-10-18-40-153,61020,1379600260255; 1 (online=1, available=1) available servers
> 2013-09-19 19:47:54,050 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:61000-0x14135a277ff017d Creating (or updating) unassigned node for 70236052 with OFFLINE state
> 2013-09-19 19:47:54,070 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=HOST-10-18-40-153,61000,1379600055284, region=70236052/-ROOT-
> 2013-09-19 19:47:54,071 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for -ROOT-,,0.70236052 destination server is HOST-10-18-40-153,61020,1379600260255
> 2013-09-19 19:47:54,071 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region -ROOT-,,0.70236052; plan=hri=-ROOT-,,0.70236052, src=, dest=HOST-10-18-40-153,61020,1379600260255
> 2013-09-19 19:47:54,071 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region -ROOT-,,0.70236052 to HOST-10-18-40-153,61020,1379600260255
> 2013-09-19 19:47:54,072 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of -ROOT-,,0.70236052 to HOST-10-18-40-153,61020,1379600260255, trying to assign elsewhere instead; retry=1
> org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: HOST-10-18-40-153/10.18.40.153:61020
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1127)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974)
>     at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
>     at $Proxy15.openRegion(Unknown Source)
>     at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:533)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1734)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1431)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1406)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1401)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:2374)
>     at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.verifyAndAssignRoot(MetaServerShutdownHandler.java:136)
>     at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.verifyAndAssignRootWithRetries(MetaServerShutdownHandler.java:160)
>     at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:82)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> 2013-09-19 19:47:54,072 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for -ROOT-,,0.70236052 destination server is HOST-10-18-40-153,61020,1379600260255
> {code}
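
For readers who want the failure window from the issue description in a compact form, here is a toy model of it in plain Java. It is only a sketch: `StaleServerSketch`, its fields, and all of its methods are hypothetical names invented for illustration, not HBase or ZooKeeper APIs, and the `reconcile()` step is an assumed mitigation that is not necessarily what the attached patches do.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the race; none of these names are HBase or ZooKeeper APIs. */
class StaleServerSketch {
    /** Servers the master believes are online (filled at registration). */
    final Set<String> onlineServers = new TreeSet<>();
    /** Stand-in for the ephemeral znodes created by region servers. */
    final Set<String> ephemeralNodes = new HashSet<>();

    /** Step 1: the region server reports to the master. */
    void register(String server) {
        onlineServers.add(server);
    }

    /** Step 2: the region server creates its ephemeral znode. */
    void createEphemeralNode(String server) {
        ephemeralNodes.add(server);
    }

    /** The master is only notified when an existing znode disappears. */
    void sessionExpired(String server) {
        if (ephemeralNodes.remove(server)) {
            onlineServers.remove(server);
        }
    }

    /** Hypothetical mitigation: drop registered servers that have no znode. */
    Set<String> reconcile() {
        Set<String> stale = new TreeSet<>(onlineServers);
        stale.removeAll(ephemeralNodes);
        onlineServers.removeAll(stale);
        return stale;
    }

    public static void main(String[] args) {
        StaleServerSketch master = new StaleServerSketch();
        String rs = "HOST-10-18-40-153,61020,1379600260255";
        master.register(rs);        // step 1 succeeds
        master.sessionExpired(rs);  // process dies before step 2: no znode, no event
        System.out.println("online after crash: " + master.onlineServers);
        System.out.println("removed by reconcile: " + master.reconcile());
        System.out.println("online after reconcile: " + master.onlineServers);
    }
}
```

In this model the dead server stays in `onlineServers` after the crash because there was never a znode whose deletion could trigger the removal. A real reconciliation pass would also need some grace period, since a healthy server is briefly registered without a znode between steps 1 and 2.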