[ https://issues.apache.org/jira/browse/HBASE-18408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-18408:
--------------------------
    Priority: Blocker  (was: Major)

> AM consumes CPU and fills up the logs really fast when there is no RS to assign
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-18408
>                 URL: https://issues.apache.org/jira/browse/HBASE-18408
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Priority: Blocker
>             Fix For: 2.0.0-alpha-2
>
>
> I was testing something else when I discovered that when there is no RS to
> assign a region to (but the master is alive), the AM/LB creates GBs of logs.
> Logs like this:
> {code}
> 2017-07-18 16:40:00,712 WARN [AssignmentThread] balancer.BaseLoadBalancer: Wanted to do round robin assignment but no servers to assign to
> 2017-07-18 16:40:00,712 WARN [AssignmentThread] assignment.AssignmentManager: unable to round-robin assignment
> org.apache.hadoop.hbase.HBaseIOException: unable to compute plans for regions=1
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager.acceptPlan(AssignmentManager.java:1725)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager.processAssignQueue(AssignmentManager.java:1711)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager.access$300(AssignmentManager.java:108)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager$2.run(AssignmentManager.java:1587)
> 2017-07-18 16:40:00,865 WARN [AssignmentThread] balancer.BaseLoadBalancer: Wanted to do round robin assignment but no servers to assign to
> 2017-07-18 16:40:00,866 WARN [AssignmentThread] assignment.AssignmentManager: unable to round-robin assignment
> org.apache.hadoop.hbase.HBaseIOException: unable to compute plans for regions=1
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager.acceptPlan(AssignmentManager.java:1725)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager.processAssignQueue(AssignmentManager.java:1711)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager.access$300(AssignmentManager.java:108)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager$2.run(AssignmentManager.java:1587)
> 2017-07-18 16:40:01,019 WARN [AssignmentThread] balancer.BaseLoadBalancer: Wanted to do round robin assignment but no servers to assign to
> 2017-07-18 16:40:01,019 WARN [AssignmentThread] assignment.AssignmentManager: unable to round-robin assignment
> org.apache.hadoop.hbase.HBaseIOException: unable to compute plans for regions=1
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager.acceptPlan(AssignmentManager.java:1725)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager.processAssignQueue(AssignmentManager.java:1711)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager.access$300(AssignmentManager.java:108)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager$2.run(AssignmentManager.java:1587)
> 2017-07-18 16:40:01,173 WARN [AssignmentThread] balancer.BaseLoadBalancer: Wanted to do round robin assignment but no servers to assign to
> 2017-07-18 16:40:01,173 WARN [AssignmentThread] assignment.AssignmentManager: unable to round-robin assignment
> org.apache.hadoop.hbase.HBaseIOException: unable to compute plans for regions=1
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager.acceptPlan(AssignmentManager.java:1725)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager.processAssignQueue(AssignmentManager.java:1711)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager.access$300(AssignmentManager.java:108)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager$2.run(AssignmentManager.java:1587)
> {code}
> Reproduction is easy:
> - Start a pseudo-distributed cluster
> - Create a table
> - Kill the region server
>
> I have also noticed another case where we are just spinning the CPU,
> consuming 100-200% (but this is in a very old code base from master),
> in this cycle:
> {code}
> "ProcedureExecutor-0" #106 daemon prio=5 os_prio=0 tid=0x00007fab54851800 nid=0xcf1 runnable [0x00007fab4e7b0000]
>    java.lang.Thread.State: RUNNABLE
>     at java.lang.Object.hashCode(Native Method)
>     at java.util.concurrent.ConcurrentHashMap.replaceNode(ConcurrentHashMap.java:1106)
>     at java.util.concurrent.ConcurrentHashMap.remove(ConcurrentHashMap.java:1097)
>     at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.close(HRegion.java:6158)
>     - locked <0x00000000c4cb62e8> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
>     at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6829)
>     at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6790)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2125)
>     at org.apache.hadoop.hbase.client.HTable$1.call(HTable.java:425)
>     at org.apache.hadoop.hbase.client.HTable$1.call(HTable.java:416)
>     at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:102)
>     at org.apache.hadoop.hbase.client.HTable.get(HTable.java:433)
>     at org.apache.hadoop.hbase.client.HTable.get(HTable.java:399)
>     at org.apache.hadoop.hbase.MetaTableAccessor.getTableState(MetaTableAccessor.java:1084)
>     at org.apache.hadoop.hbase.master.TableStateManager.readMetaState(TableStateManager.java:188)
>     at org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:172)
>     at org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:131)
>     at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.processDeadRegion(ServerCrashProcedure.java:666)
>     at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.calcRegionsToAssign(ServerCrashProcedure.java:460)
>     at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:254)
>     at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:72)
>     at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:133)
>     at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:523)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1061)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:855)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:808)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$400(ProcedureExecutor.java:75)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.run(ProcedureExecutor.java:495)
> {code}
> I think this happens when meta is not hosted on the master.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
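To put "fills up the logs really fast" into numbers: the four `balancer.BaseLoadBalancer` WARN timestamps in the first log excerpt (16:40:00,712, 00,865, 01,019, 01,173) are 153 to 154 ms apart. The small Java sketch below parses those timestamps and extrapolates the retry rate, assuming the loop keeps the same pace. The class `RetryRate` and its methods are hypothetical helpers for illustration, not part of HBase, and the sketch says nothing about bytes written per retry.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of HBase: estimates how often the
// AssignmentThread retried, using the log timestamps quoted in this issue.
class RetryRate {
    // Matches the timestamp prefix of the log lines, e.g. "2017-07-18 16:40:00,712".
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Mean gap in milliseconds between consecutive timestamps (needs at least two). */
    static double meanIntervalMs(List<String> stamps) {
        if (stamps.size() < 2) {
            throw new IllegalArgumentException("need at least two timestamps");
        }
        long totalMs = 0;
        for (int i = 1; i < stamps.size(); i++) {
            LocalDateTime prev = LocalDateTime.parse(stamps.get(i - 1), FMT);
            LocalDateTime cur = LocalDateTime.parse(stamps.get(i), FMT);
            totalMs += Duration.between(prev, cur).toMillis();
        }
        return (double) totalMs / (stamps.size() - 1);
    }

    /** Retries per hour, assuming the loop keeps retrying at the given mean interval. */
    static double retriesPerHour(double meanIntervalMs) {
        return 3_600_000 / meanIntervalMs;
    }

    public static void main(String[] args) {
        // Timestamps of the four "Wanted to do round robin assignment" WARN lines.
        List<String> stamps = Arrays.asList(
                "2017-07-18 16:40:00,712",
                "2017-07-18 16:40:00,865",
                "2017-07-18 16:40:01,019",
                "2017-07-18 16:40:01,173");
        double mean = meanIntervalMs(stamps);
        System.out.println(String.format(Locale.ROOT,
                "mean interval: %.1f ms, ~%.0f retries/hour", mean, retriesPerHour(mean)));
    }
}
```

Run against the quoted timestamps, it reports a mean interval of about 153.7 ms, or roughly 23,400 failed assignment attempts per hour, each of which logs two WARN lines plus a stack trace.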