[
https://issues.apache.org/jira/browse/HBASE-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924137#action_12924137
]
HBase Review Board commented on HBASE-3139:
-------------------------------------------
Message from: [email protected]
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1064/
-----------------------------------------------------------
(Updated 2010-10-22 23:27:28.712219)
Review request for hbase and stack.
Changes
-------
Remove the priority queue -- not needed. Fix the naming format for threads. Make the
thread pool's maximum and core pool sizes the same so we have a fixed pool size.
Adjust the thread pool sizes back down; they were bumped up when we weren't getting
throughput because the ExecutorService was not actually running handlers in parallel,
though we thought it was.
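For illustration only, here is a minimal sketch of the fixed-size pool idea (not the actual patch; the class and method names below are made up for the example): core and maximum pool sizes are set to the same value, a plain FIFO queue stands in for the priority queue, and the thread factory names workers with the executor name plus a sequence number.
{code}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class FixedPoolSketch {
  /**
   * Builds a fixed-size pool: corePoolSize == maximumPoolSize, so the pool
   * really runs maxThreads workers even though the work queue is unbounded.
   */
  public static ThreadPoolExecutor newFixedPool(final String name, int maxThreads) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        maxThreads,                           // corePoolSize
        maxThreads,                           // maximumPoolSize, same as core
        60L, TimeUnit.SECONDS,                // keep-alive for idle threads
        new LinkedBlockingQueue<Runnable>()); // plain FIFO queue, no priority queue
    final AtomicInteger seq = new AtomicInteger(0);
    pool.setThreadFactory(new ThreadFactory() {
      @Override
      public Thread newThread(Runnable r) {
        // Naming format: <executor name>-<sequence>
        Thread t = new Thread(r, name + "-" + seq.getAndIncrement());
        t.setDaemon(true);
        return t;
      }
    });
    return pool;
  }
}
{code}
With an unbounded queue a ThreadPoolExecutor never grows past its core size, so setting core == max is what actually fixes the pool size.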
Summary
-------
See HBASE-3139
This addresses bug hbase-3139.
http://issues.apache.org/jira/browse/hbase-3139
Diffs (updated)
-----
trunk/src/main/java/org/apache/hadoop/hbase/executor/ExecutorService.java 1026564
trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1026564
trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1026564
trunk/src/test/java/org/apache/hadoop/hbase/executor/TestExecutorService.java 1026564
Diff: http://review.cloudera.org/r/1064/diff
Testing
-------
Thanks,
Jonathan
> Server shutdown processor stuck because meta not online
> -------------------------------------------------------
>
> Key: HBASE-3139
> URL: https://issues.apache.org/jira/browse/HBASE-3139
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Fix For: 0.90.0
>
> Attachments: 3139-v5.txt, ExecutorServiceUnitTest.patch
>
>
> Playing with rolling restart I see that the servers hosting root and meta can
> go down close to each other. In the log below, note how we are processing the
> server hosting -ROOT-, and part of that processing involves reading .META.
> content to see what regions the dead server was carrying. But .META. is offline
> at the time (our verification attempt failed because its server had just been
> shut down and verification got a ConnectException). So we pause the server
> shutdown processing until .META. comes back online -- only it never does.
> {code}
> 2010-10-21 07:32:23,931 INFO org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region location in ZooKeeper
> 2010-10-21 07:32:23,953 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for sv2borg182,60020,1287645693959
> 2010-10-21 07:32:23,994 INFO org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region location in ZooKeeper
> 2010-10-21 07:32:24,020 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x12bcd5b344b0115 Creating (or updating) unassigned node for 70236052 with OFFLINE state
> 2010-10-21 07:32:24,045 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan for -ROOT-,,0.70236052 so generated a random one; hri=-ROOT-,,0.70236052, src=, dest=sv2borg181,60020,1287646329081; 8 (online=8, exclude=null) available servers
> 2010-10-21 07:32:24,045 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region -ROOT-,,0.70236052 to sv2borg181,60020,1287646329081
> 2010-10-21 07:32:24,048 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Failed verification of .META.,,1; java.net.ConnectException: Connection refused
> 2010-10-21 07:32:24,048 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Current cached META location is not valid, resetting
> 2010-10-21 07:32:24,079 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=sv2borg181,60020,1287646329081, region=70236052/-ROOT-
> 2010-10-21 07:32:24,162 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=sv2borg181,60020,1287646329081, region=70236052/-ROOT-
> 2010-10-21 07:32:24,212 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=sv2borg181,60020,1287646329081, region=70236052/-ROOT-
> 2010-10-21 07:32:24,212 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 70236052; deleting unassigned node
> 2010-10-21 07:32:24,212 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x12bcd5b344b0115 Deleting existing unassigned node for 70236052 that is in expected state RS_ZK_REGION_OPENED
> 2010-10-21 07:32:24,238 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region -ROOT-,,0.70236052
> 2010-10-21 07:32:27,902 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sv2borg183,60020,1287646347597, regionCount=0, userLoad=false
> 2010-10-21 07:32:30,523 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [sv2borg184,60020,1287645693960]
> 2010-10-21 07:32:30,523 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=sv2borg184,60020,1287645693960 to dead servers, submitted shutdown handler to be executed
> 2010-10-21 07:32:36,254 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sv2borg184,60020,1287646355951, regionCount=0, userLoad=false
> 2010-10-21 07:32:39,567 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [sv2borg185,60020,1287645693959]
> 2010-10-21 07:32:39,567 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=sv2borg185,60020,1287645693959 to dead servers, submitted shutdown handler to be executed
> 2010-10-21 07:32:45,614 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sv2borg185,60020,1287646365304, regionCount=0, userLoad=false
> 2010-10-21 07:32:48,652 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [sv2borg186,60020,1287645693962]
> 2010-10-21 07:32:48,652 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=sv2borg186,60020,1287645693962 to dead servers, submitted shutdown handler to be executed
> 2010-10-21 07:32:50,097 INFO org.apache.hadoop.hbase.master.ServerManager: regionservers=8, averageload=93.38, deadservers=[sv2borg185,60020,1287645693959, sv2borg183,60020,1287645693959, sv2borg182,60020,1287645693959, sv2borg184,60020,1287645693960, sv2borg186,60020,1287645693962]
> ....
> {code}
> We're supposed to have a pool of 5 executor threads to handle server shutdowns. I
> see one executor stuck waiting on .META. but I don't see any others running.
> Odd. Trying to figure out why only one executor thread is running.
> {code}
> "MASTER_SERVER_OPERATIONS-sv2borg180:60000-1" daemon prio=10
> tid=0x0000000041dc7000 nid=0x50a4 in Object.wait() [0x00007f285d537000]
> java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at
> org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:324)
> - locked <0x00007f286d150ce8> (a
> java.util.concurrent.atomic.AtomicBoolean)
> at
> org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:359)
> at
> org.apache.hadoop.hbase.catalog.MetaReader.getServerUserRegions(MetaReader.java:487)
> at
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:115)
> at
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:150)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> {code}
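> A single busy worker is consistent with how java.util.concurrent.ThreadPoolExecutor behaves when the core size is smaller than the maximum and the work queue is unbounded: extra threads are only created when the queue rejects a task, and an unbounded LinkedBlockingQueue never does. A standalone sketch (not HBase code) that reproduces the effect:
> {code}
> import java.util.concurrent.CountDownLatch;
> import java.util.concurrent.LinkedBlockingQueue;
> import java.util.concurrent.ThreadPoolExecutor;
> import java.util.concurrent.TimeUnit;
>
> public class SingleWorkerDemo {
>   public static void main(String[] args) throws Exception {
>     // Looks like "up to 5 workers", but with an unbounded queue the pool
>     // never grows past the core size of 1.
>     ThreadPoolExecutor pool = new ThreadPoolExecutor(
>         1, 5, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
>     final CountDownLatch block = new CountDownLatch(1);
>     for (int i = 0; i < 5; i++) {
>       pool.execute(new Runnable() {
>         public void run() {
>           try { block.await(); } catch (InterruptedException ignored) { }
>         }
>       });
>     }
>     Thread.sleep(500);
>     System.out.println("pool size = " + pool.getPoolSize()); // prints 1
>     block.countDown();
>     pool.shutdown();
>   }
> }
> {code}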
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.