[ https://issues.apache.org/jira/browse/HBASE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeffrey Zhong updated HBASE-10895:
----------------------------------
    Status: Patch Available  (was: Open)

> unassign a region fails due to the hosting region server is in FailedServerList
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-10895
>                 URL: https://issues.apache.org/jira/browse/HBASE-10895
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 0.98.1, 0.96.1, 0.99.0
>            Reporter: Jeffrey Zhong
>            Assignee: Jeffrey Zhong
>         Attachments: hbase-10895.patch
>
>
> This issue is similar to HBASE-10833, which deals with the sendRegionOpen RPC, while this issue happens with sendRegionClose.
> Once an RS is in the failed server list due to a network hiccup, the AssignmentManager (AM) quickly exhausted all of its retries and later failed the whole region assignment. Below is a sample stack trace:
> {noformat}
> 2014-03-31 13:39:10,056 INFO  [AM.-pool1-t8] master.AssignmentManager: Server hor16n09.gq1.ygridcore.net,60020,1396270942046 returned org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: hor16n09.gq1.ygridcore.net/68.142.246.220:60020 for loadtest_d1,59999994,1396261861562.fcef8d691632e99948fbf876d24f907e., try=20 of 20
> org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: hor16n09.gq1.ygridcore.net/68.142.246.220:60020
>   at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:880)
>   at org.apache.hadoop.hbase.ipc.RpcClient$Connection.writeRequest(RpcClient.java:1065)
>   at org.apache.hadoop.hbase.ipc.RpcClient$Connection.tracedWriteRequest(RpcClient.java:1032)
>   at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1474)
>   at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1684)
>   at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1737)
>   at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.closeRegion(AdminProtos.java:20854)
>   at org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1656)
>   at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:693)
>   at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1685)
>   at org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1786)
>   at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1436)
>   at org.apache.hadoop.hbase.master.AssignCallable.call(AssignCallable.java:45)
> ....
> 2014-03-31 13:39:10,056 WARN  [AM.-pool1-t8] master.RegionStates: Failed to open/close fcef8d691632e99948fbf876d24f907e on hor16n09.gq1.ygridcore.net,60020,1396270942046, set to FAILED_CLOSE
> 2014-03-31 13:39:10,056 INFO  [AM.-pool1-t8] master.RegionStates: Transitioned {fcef8d691632e99948fbf876d24f907e state=PENDING_OPEN, ts=1396273149814, server=hor16n09.gq1.ygridcore.net,60020,1396270942046} to {fcef8d691632e99948fbf876d24f907e state=FAILED_CLOSE, ts=1396273150056, server=hor16n09.gq1.ygridcore.net,60020,1396270942046}
> 2014-03-31 13:39:10,056 INFO  [AM.-pool1-t8] master.AssignmentManager: Skip assigning {ENCODED => fcef8d691632e99948fbf876d24f907e, NAME => 'loadtest_d1,59999994,1396261861562.fcef8d691632e99948fbf876d24f907e.', STARTKEY => '59999994', ENDKEY => '66666660'}, we couldn't close it: {fcef8d691632e99948fbf876d24f907e state=FAILED_CLOSE, ts=1396273150056, server=hor16n09.gq1.ygridcore.net,60020,1396270942046}
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
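
For readers following along, the failure mode in the trace (all 20 tries used up while the server is still in the failed servers list) can be illustrated with a small standalone sketch. This is not the attached hbase-10895.patch and it uses no HBase classes: `CloseRetrySketch`, `CloseRpc`, `closeWithRetries`, and the nested `FailedServerException` are hypothetical stand-ins, and the idea of pausing before the next try is only one plausible way to avoid exhausting the retries, assumed here for illustration.

```java
import java.io.IOException;

public class CloseRetrySketch {
    /** Hypothetical stand-in for RpcClient$FailedServerException seen in the trace. */
    static class FailedServerException extends IOException {
        FailedServerException(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for the sendRegionClose RPC; returns true once the close is accepted. */
    interface CloseRpc {
        boolean send() throws IOException;
    }

    /**
     * Retries the close RPC up to maxAttempts times. When the failure is a
     * FailedServerException, the loop pauses before the next try so the
     * attempts are not all spent while the server is still blacklisted.
     * The pause length is an assumption of this sketch.
     */
    static boolean closeWithRetries(CloseRpc rpc, int maxAttempts, long pauseMs)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return rpc.send();
            } catch (FailedServerException e) {
                // The server is in the failed servers list: wait before retrying.
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMs);
                }
            } catch (IOException e) {
                // Other I/O failures are retried immediately in this sketch.
            }
        }
        return false; // the caller would treat this as a failed close
    }

    public static void main(String[] args) throws InterruptedException {
        final int[] calls = {0};
        // Fake RPC: rejected twice as "failed server", then accepted.
        CloseRpc rpc = () -> {
            if (++calls[0] < 3) {
                throw new FailedServerException("This server is in the failed servers list");
            }
            return true;
        };
        boolean closed = closeWithRetries(rpc, 20, 10L);
        System.out.println("closed=" + closed + " attempts=" + calls[0]);
    }
}
```

Without the pause, a loop like this would run through every attempt almost instantly, which matches the "try=20 of 20" line and the FAILED_CLOSE transition logged in the same millisecond.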