[ https://issues.apache.org/jira/browse/HBASE-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923748#action_12923748 ]
stack commented on HBASE-3139: ------------------------------ I tried to run w/ latest over on review board and got this: {code} 2010-10-22 03:36:56,957 FATAL org.apache.hadoop.hbase.master.HMaster: Failed assignment of regions to serverName=sv2borg181,60020,1287718591505, load=(requests=0, regions=0, usedHeap=29, maxHeap=987) org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.util.concurrent.RejectedExecutionException at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658) at org.apache.hadoop.hbase.executor.ExecutorService$Executor.submit(ExecutorService.java:285) at org.apache.hadoop.hbase.executor.ExecutorService.submit(ExecutorService.java:216) at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1944) at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegions(HRegionServer.java:1952) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:561) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1025) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:749) at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:255) at $Proxy1.openRegions(Unknown Source) at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:536) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:687) at org.apache.hadoop.hbase.master.AssignmentManager$SingleServerBulkAssigner.run(AssignmentManager.java:1111) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) 2010-10-22 03:36:56,959 INFO org.apache.hadoop.hbase.master.HMaster: Aborting {code] > Server shutdown processor stuck because meta not online > ------------------------------------------------------- > > Key: HBASE-3139 > URL: https://issues.apache.org/jira/browse/HBASE-3139 > Project: HBase > Issue Type: Bug > Reporter: stack > Fix For: 0.90.0 > > Attachments: ExecutorServiceUnitTest.patch > > > Playing with rolling restart I see that the server hosting root and meta can > go down close to each other. In below, note how we are processing server > hosting -ROOT- and part of its processing involves reading .META. content to > see what servers it was carrying. Well, note that .META. is offline at time > (our verification attempt failed because server had just been shutdown and > verification got ConnectException). So we pause the server shutdown > processing till .META. comes back online -- only it never does. > {code} > 2010-10-21 07:32:23,931 INFO > org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region > location in ZooKeeper > 2010-10-21 07:32:23,953 INFO > org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs > for sv2borg182,60020,1287645693959 > 2010-10-21 07:32:23,994 INFO > org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region > location in ZooKeeper > 2010-10-21 07:32:24,020 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > master:60000-0x12bcd5b344b0115 Creating (or updating) unassigned node for > 70236052 with OFFLINE state > 2010-10-21 07:32:24,045 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan > for -ROOT-,,0.70236052 so generated a random one; hri=-ROOT-,,0.70236052, > src=, dest=sv2borg181,60020,1287646329081; 8 (online=8, exclude=null) > available servers > > 2010-10-21 07:32:24,045 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Assigning region > -ROOT-,,0.70236052 to sv2borg181,60020,1287646329081 > 2010-10-21 07:32:24,048 > INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Failed verification of > .META.,,1; java.net.ConnectException: Connection refused > 2010-10-21 07:32:24,048 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: > Current cached META location is not valid, resetting > 2010-10-21 07:32:24,079 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling > transition=RS_ZK_REGION_OPENING, server=sv2borg181,60020,1287646329081, > region=70236052/-ROOT- 2010-10-21 > 07:32:24,162 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling > transition=RS_ZK_REGION_OPENING, server=sv2borg181,60020,1287646329081, > region=70236052/-ROOT- > 2010-10-21 07:32:24,212 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling > transition=RS_ZK_REGION_OPENED, server=sv2borg181,60020,1287646329081, > region=70236052/-ROOT- 2010-10-21 > 07:32:24,212 DEBUG > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED > event for 70236052; deleting unassigned node > 2010-10-21 07:32:24,212 DEBUG > org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x12bcd5b344b0115 > Deleting existing unassigned node for 70236052 that is in expected state > RS_ZK_REGION_OPENED 2010-10-21 07:32:24,238 > DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened > region -ROOT-,,0.70236052 > 2010-10-21 07:32:27,902 > INFO org.apache.hadoop.hbase.master.ServerManager: Registering > server=sv2borg183,60020,1287646347597, regionCount=0, userLoad=false > 2010-10-21 > 07:32:30,523 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: > RegionServer ephemeral node deleted, processing expiration > [sv2borg184,60020,1287645693960] > 2010-10-21 07:32:30,523 DEBUG > org.apache.hadoop.hbase.master.ServerManager: > Added=sv2borg184,60020,1287645693960 to dead servers, submitted shutdown > handler to be executed > 2010-10-21 07:32:36,254 INFO org.apache.hadoop.hbase.master.ServerManager: > Registering server=sv2borg184,60020,1287646355951, regionCount=0, > userLoad=false > 2010-10-21 07:32:39,567 INFO > org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral > node deleted, processing expiration [sv2borg185,60020,1287645693959] > 2010-10-21 07:32:39,567 DEBUG > org.apache.hadoop.hbase.master.ServerManager: > Added=sv2borg185,60020,1287645693959 to dead servers, submitted shutdown > handler to be executed > 2010-10-21 07:32:45,614 INFO org.apache.hadoop.hbase.master.ServerManager: > Registering server=sv2borg185,60020,1287646365304, regionCount=0, > userLoad=false > 2010-10-21 07:32:48,652 INFO > org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral > node deleted, processing expiration [sv2borg186,60020,1287645693962] > 2010-10-21 07:32:48,652 DEBUG > org.apache.hadoop.hbase.master.ServerManager: > Added=sv2borg186,60020,1287645693962 to dead servers, submitted shutdown > handler to be executed > 2010-10-21 07:32:50,097 INFO org.apache.hadoop.hbase.master.ServerManager: > regionservers=8, averageload=93.38, > deadservers=[sv2borg185,60020,1287645693959, sv2borg183,60020,1287645693959, > sv2borg182,60020,1287645693959, sv2borg184,60020,1287645693960, > sv2borg186,60020,1287645693962] > .... > {code} > We're supposed to have a thread of 5 executors to handle server shutdowns. I > see an executor stuck waiting on .META. but I dont see any others running. > Odd. Trying to figure why executors are 1 only. > {code} > "MASTER_SERVER_OPERATIONS-sv2borg180:60000-1" daemon prio=10 > tid=0x0000000041dc7000 nid=0x50a4 in Object.wait() [0x00007f285d537000] > java.lang.Thread.State: TIMED_WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:324) > - locked <0x00007f286d150ce8> (a > java.util.concurrent.atomic.AtomicBoolean) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:359) > at > org.apache.hadoop.hbase.catalog.MetaReader.getServerUserRegions(MetaReader.java:487) > at > org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:115) > at > org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:150) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:619) > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.