Server shutdown processor stuck because meta not online
-------------------------------------------------------

                 Key: HBASE-3139
                 URL: https://issues.apache.org/jira/browse/HBASE-3139
             Project: HBase
          Issue Type: Bug
            Reporter: stack
             Fix For: 0.90.0


Playing with rolling restart I see that the server hosting root and meta can go 
down close to each other.  In below, note how we are processing server hosting 
-ROOT- and part of its processing involves reading .META. content to see what 
servers it was carrying.  Well, note that .META. is offline at time (our 
verification attempt failed because server had just been shutdown and 
verification got ConnectException).  So we pause the server shutdown processing 
till .META. comes back online -- only it never does.

{code}
2010-10-21 07:32:23,931 INFO 
org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region 
location in ZooKeeper
2010-10-21 07:32:23,953 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs 
for sv2borg182,60020,1287645693959                                              
                                         2010-10-21 07:32:23,994 INFO 
org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region 
location in ZooKeeper
2010-10-21 07:32:24,020 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:60000-0x12bcd5b344b0115 Creating (or updating) unassigned node for 
70236052 with OFFLINE state                                                    
2010-10-21 07:32:24,045 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
No previous transition plan for -ROOT-,,0.70236052 so generated a random one; 
hri=-ROOT-,,0.70236052, src=, dest=sv2borg181,60020,1287646329081; 8 (online=8, 
exclude=null) available servers                                                 
                                                                                
                                                        2010-10-21 07:32:24,045 
DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
-ROOT-,,0.70236052 to sv2borg181,60020,1287646329081                            
                                                  2010-10-21 07:32:24,048 INFO 
org.apache.hadoop.hbase.catalog.CatalogTracker: Failed verification of 
.META.,,1; java.net.ConnectException: Connection refused
2010-10-21 07:32:24,048 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: 
Current cached META location is not valid, resetting
2010-10-21 07:32:24,079 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENING, 
server=sv2borg181,60020,1287646329081, region=70236052/-ROOT-                   
                         2010-10-21 07:32:24,162 DEBUG 
org.apache.hadoop.hbase.master.AssignmentManager: Handling 
transition=RS_ZK_REGION_OPENING, server=sv2borg181,60020,1287646329081, 
region=70236052/-ROOT-
2010-10-21 07:32:24,212 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENED, server=sv2borg181,60020,1287646329081, 
region=70236052/-ROOT-                                             2010-10-21 
07:32:24,212 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: 
Handling OPENED event for 70236052; deleting unassigned node                    
                                                         2010-10-21 
07:32:24,212 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:60000-0x12bcd5b344b0115 Deleting existing unassigned node for 70236052 
that is in expected state RS_ZK_REGION_OPENED                              
2010-10-21 07:32:24,238 DEBUG 
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
-ROOT-,,0.70236052                                                              
                                           2010-10-21 07:32:27,902 INFO 
org.apache.hadoop.hbase.master.ServerManager: Registering 
server=sv2borg183,60020,1287646347597, regionCount=0, userLoad=false            
                                                            2010-10-21 
07:32:30,523 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: 
RegionServer ephemeral node deleted, processing expiration 
[sv2borg184,60020,1287645693960]                                                
    2010-10-21 07:32:30,523 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
Added=sv2borg184,60020,1287645693960 to dead servers, submitted shutdown 
handler to be executed                                                        
2010-10-21 07:32:36,254 INFO org.apache.hadoop.hbase.master.ServerManager: 
Registering server=sv2borg184,60020,1287646355951, regionCount=0, 
userLoad=false                                                                  
      2010-10-21 07:32:39,567 INFO 
org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral 
node deleted, processing expiration [sv2borg185,60020,1287645693959]            
                                        2010-10-21 07:32:39,567 DEBUG 
org.apache.hadoop.hbase.master.ServerManager: 
Added=sv2borg185,60020,1287645693959 to dead servers, submitted shutdown 
handler to be executed                                                        
2010-10-21 07:32:45,614 INFO org.apache.hadoop.hbase.master.ServerManager: 
Registering server=sv2borg185,60020,1287646365304, regionCount=0, 
userLoad=false                                                                  
      2010-10-21 07:32:48,652 INFO 
org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral 
node deleted, processing expiration [sv2borg186,60020,1287645693962]            
                                        2010-10-21 07:32:48,652 DEBUG 
org.apache.hadoop.hbase.master.ServerManager: 
Added=sv2borg186,60020,1287645693962 to dead servers, submitted shutdown 
handler to be executed                                                        
2010-10-21 07:32:50,097 INFO org.apache.hadoop.hbase.master.ServerManager: 
regionservers=8, averageload=93.38, 
deadservers=[sv2borg185,60020,1287645693959, sv2borg183,60020,1287645693959, 
sv2borg182,60020,1287645693959,        sv2borg184,60020,1287645693960, 
sv2borg186,60020,1287645693962]
....
{code}

We're supposed to have a thread of 5 executors to handle server shutdowns.  I 
see an executor stuck waiting on .META. but I dont see any others running.  
Odd.  Trying to figure why executors are 1 only.

{code}
"MASTER_SERVER_OPERATIONS-sv2borg180:60000-1" daemon prio=10 
tid=0x0000000041dc7000 nid=0x50a4 in Object.wait() [0x00007f285d537000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at 
org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:324)
    - locked <0x00007f286d150ce8> (a java.util.concurrent.atomic.AtomicBoolean)
    at 
org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:359)
    at 
org.apache.hadoop.hbase.catalog.MetaReader.getServerUserRegions(MetaReader.java:487)
    at 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:115)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:150)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
{code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to