[ 
https://issues.apache.org/jira/browse/HBASE-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923748#action_12923748
 ] 

stack commented on HBASE-3139:
------------------------------

I tried to run w/ latest over on review board and got this:

{code}
2010-10-22 03:36:56,957 FATAL org.apache.hadoop.hbase.master.HMaster: Failed 
assignment of regions to serverName=sv2borg181,60020,1287718591505, 
load=(requests=0, regions=0,        usedHeap=29, maxHeap=987)
org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
java.util.concurrent.RejectedExecutionException
    at 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
    at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
    at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
    at 
org.apache.hadoop.hbase.executor.ExecutorService$Executor.submit(ExecutorService.java:285)
    at 
org.apache.hadoop.hbase.executor.ExecutorService.submit(ExecutorService.java:216)
    at 
org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1944)
    at 
org.apache.hadoop.hbase.regionserver.HRegionServer.openRegions(HRegionServer.java:1952)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:561)
    at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1025)

    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:749)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:255)
    at $Proxy1.openRegions(Unknown Source)
    at 
org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:536)
    at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:687)
    at 
org.apache.hadoop.hbase.master.AssignmentManager$SingleServerBulkAssigner.run(AssignmentManager.java:1111)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
2010-10-22 03:36:56,959 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
{code]

> Server shutdown processor stuck because meta not online
> -------------------------------------------------------
>
>                 Key: HBASE-3139
>                 URL: https://issues.apache.org/jira/browse/HBASE-3139
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>             Fix For: 0.90.0
>
>         Attachments: ExecutorServiceUnitTest.patch
>
>
> Playing with rolling restart I see that the server hosting root and meta can 
> go down close to each other.  In below, note how we are processing server 
> hosting -ROOT- and part of its processing involves reading .META. content to 
> see what servers it was carrying.  Well, note that .META. is offline at time 
> (our verification attempt failed because server had just been shutdown and 
> verification got ConnectException).  So we pause the server shutdown 
> processing till .META. comes back online -- only it never does.
> {code}
> 2010-10-21 07:32:23,931 INFO 
> org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region 
> location in ZooKeeper
> 2010-10-21 07:32:23,953 INFO 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs 
> for sv2borg182,60020,1287645693959                                            
>                                            2010-10-21 07:32:23,994 INFO 
> org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region 
> location in ZooKeeper
> 2010-10-21 07:32:24,020 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:60000-0x12bcd5b344b0115 Creating (or updating) unassigned node for 
> 70236052 with OFFLINE state                                                   
>  2010-10-21 07:32:24,045 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan 
> for -ROOT-,,0.70236052 so generated a random one; hri=-ROOT-,,0.70236052, 
> src=, dest=sv2borg181,60020,1287646329081; 8 (online=8, exclude=null) 
> available servers                                                             
>                                                                               
>                                               2010-10-21 07:32:24,045 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
> -ROOT-,,0.70236052 to sv2borg181,60020,1287646329081                          
>                                                     2010-10-21 07:32:24,048 
> INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Failed verification of 
> .META.,,1; java.net.ConnectException: Connection refused
> 2010-10-21 07:32:24,048 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: 
> Current cached META location is not valid, resetting
> 2010-10-21 07:32:24,079 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENING, server=sv2borg181,60020,1287646329081, 
> region=70236052/-ROOT-                                            2010-10-21 
> 07:32:24,162 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENING, server=sv2borg181,60020,1287646329081, 
> region=70236052/-ROOT-
> 2010-10-21 07:32:24,212 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENED, server=sv2borg181,60020,1287646329081, 
> region=70236052/-ROOT-                                             2010-10-21 
> 07:32:24,212 DEBUG 
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
> event for 70236052; deleting unassigned node                                  
>                                            2010-10-21 07:32:24,212 DEBUG 
> org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x12bcd5b344b0115 
> Deleting existing unassigned node for 70236052 that is in expected state 
> RS_ZK_REGION_OPENED                              2010-10-21 07:32:24,238 
> DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened 
> region -ROOT-,,0.70236052                                                     
>                                                     2010-10-21 07:32:27,902 
> INFO org.apache.hadoop.hbase.master.ServerManager: Registering 
> server=sv2borg183,60020,1287646347597, regionCount=0, userLoad=false          
>                                                               2010-10-21 
> 07:32:30,523 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: 
> RegionServer ephemeral node deleted, processing expiration 
> [sv2borg184,60020,1287645693960]                                              
>       2010-10-21 07:32:30,523 DEBUG 
> org.apache.hadoop.hbase.master.ServerManager: 
> Added=sv2borg184,60020,1287645693960 to dead servers, submitted shutdown 
> handler to be executed                                                        
> 2010-10-21 07:32:36,254 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Registering server=sv2borg184,60020,1287646355951, regionCount=0, 
> userLoad=false                                                                
>         2010-10-21 07:32:39,567 INFO 
> org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral 
> node deleted, processing expiration [sv2borg185,60020,1287645693959]          
>                                           2010-10-21 07:32:39,567 DEBUG 
> org.apache.hadoop.hbase.master.ServerManager: 
> Added=sv2borg185,60020,1287645693959 to dead servers, submitted shutdown 
> handler to be executed                                                        
> 2010-10-21 07:32:45,614 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Registering server=sv2borg185,60020,1287646365304, regionCount=0, 
> userLoad=false                                                                
>         2010-10-21 07:32:48,652 INFO 
> org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral 
> node deleted, processing expiration [sv2borg186,60020,1287645693962]          
>                                           2010-10-21 07:32:48,652 DEBUG 
> org.apache.hadoop.hbase.master.ServerManager: 
> Added=sv2borg186,60020,1287645693962 to dead servers, submitted shutdown 
> handler to be executed                                                        
> 2010-10-21 07:32:50,097 INFO org.apache.hadoop.hbase.master.ServerManager: 
> regionservers=8, averageload=93.38, 
> deadservers=[sv2borg185,60020,1287645693959, sv2borg183,60020,1287645693959, 
> sv2borg182,60020,1287645693959,        sv2borg184,60020,1287645693960, 
> sv2borg186,60020,1287645693962]
> ....
> {code}
> We're supposed to have a thread of 5 executors to handle server shutdowns.  I 
> see an executor stuck waiting on .META. but I dont see any others running.  
> Odd.  Trying to figure why executors are 1 only.
> {code}
> "MASTER_SERVER_OPERATIONS-sv2borg180:60000-1" daemon prio=10 
> tid=0x0000000041dc7000 nid=0x50a4 in Object.wait() [0x00007f285d537000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:324)
>     - locked <0x00007f286d150ce8> (a 
> java.util.concurrent.atomic.AtomicBoolean)
>     at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:359)
>     at 
> org.apache.hadoop.hbase.catalog.MetaReader.getServerUserRegions(MetaReader.java:487)
>     at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:115)
>     at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:150)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to