[
https://issues.apache.org/jira/browse/HBASE-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665177#action_12665177
]
Jim Kellerman commented on HBASE-1123:
--------------------------------------
A dead server is not removed from the deadServer list until
ProcessServerShutdown has split the logs and scanned all the meta regions
for regions that the dead server was serving.

If the dead server was serving the root region, the root region is
reassigned immediately after the log is split. If the dead server was
serving other regions, ProcessServerShutdown is requeued to wait for the
root region to come back online.

Once the root region is back online, it can be scanned for any meta regions
that the dead server was serving. If there are any, they are marked as
unassigned and ProcessServerShutdown is requeued to wait for all the meta
regions to be online.

Once all the meta regions are online, they can be scanned, and any regions
listed there that were being served by the dead server are marked as
unassigned. At this point, the dead server is removed from the dead server
list.
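To make the requeue sequence above easier to follow, here is a rough sketch of it as a small state machine. This is not the HBase source: ClusterState, ShutdownOp and ShutdownSketch are hypothetical names, the two boolean "online" flags stand in for real region assignment, and the final scan of the meta regions is not modeled. Only the logSplit and rootRescanned flag names echo the log output quoted in the issue.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the sequence described above.
// None of these classes or fields are taken from the HBase source.
class ClusterState {
    boolean rootOnline = true;     // root region is assigned and open
    boolean allMetaOnline = true;  // every meta region is assigned and open
    final Set<String> deadServers = new HashSet<>();
}

class ShutdownOp {
    private final String server;
    private final boolean servedRoot;  // dead server was hosting the root region
    private final boolean servedMeta;  // dead server was hosting a meta region
    private boolean logSplit;
    private boolean rootRescanned;

    ShutdownOp(String server, boolean servedRoot, boolean servedMeta) {
        this.server = server;
        this.servedRoot = servedRoot;
        this.servedMeta = servedMeta;
    }

    /** Runs one pass; returns true when finished, false when it must be requeued. */
    boolean process(ClusterState cluster) {
        if (!logSplit) {
            logSplit = true;                    // step 1: split the dead server's logs
            if (servedRoot) {
                cluster.rootOnline = false;     // root now awaits reassignment
            }
        }
        if (!cluster.rootOnline) {
            return false;                       // requeue until root is back online
        }
        if (!rootRescanned) {
            rootRescanned = true;               // step 2: scan root for meta regions
            if (servedMeta) {
                cluster.allMetaOnline = false;  // mark them unassigned
            }
        }
        if (!cluster.allMetaOnline) {
            return false;                       // requeue until all meta regions are online
        }
        // Step 3 (not modeled): scan the meta regions and mark the dead
        // server's regions as unassigned.
        cluster.deadServers.remove(server);     // only now does the server leave the list
        return true;
    }
}

class ShutdownSketch {
    public static void main(String[] args) {
        ClusterState cluster = new ClusterState();
        cluster.deadServers.add("rs-142:60020");
        ShutdownOp op = new ShutdownOp("rs-142:60020", true, true);
        int pass = 0;
        while (!op.process(cluster)) {
            pass++;
            // Stand-in for the master reassigning regions between passes.
            if (!cluster.rootOnline) {
                cluster.rootOnline = true;
            } else {
                cluster.allMetaOnline = true;
            }
            System.out.println("requeued after pass " + pass);
        }
        System.out.println("dead list: " + cluster.deadServers);
    }
}
```

In this model the server leaves the dead list only on the pass where the root region and all meta regions are online. If either flag never becomes true again, process() keeps returning false and the entry stays on the list.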
> Server never leaves the dead list though logs have all been processed if
> crashed server had -ROOT- (seemingly)
> --------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-1123
> URL: https://issues.apache.org/jira/browse/HBASE-1123
> Project: Hadoop HBase
> Issue Type: Bug
> Affects Versions: 0.19.0
> Reporter: stack
> Assignee: Jim Kellerman
> Fix For: 0.20.0
>
>
> Cluster is just hung after host that had -ROOT- completed splitting its
> logs... old server is just stuck on the dead list and never comes off it.
> {code}
> ..
> 2009-01-13 01:09:36,448 [HMaster] DEBUG
> org.apache.hadoop.hbase.regionserver.HLog: Splitting 6 of 6:
> hdfs://aa0-000-12.u.powerset.com:9000/hbasetrunk2/log_XX.XX.XX.142_1231717984112_60020/hlog.dat.1231718928939
> 2009-01-13 01:09:37,396 [IPC Server handler 4 on 60000] DEBUG
> org.apache.hadoop.hbase.master.ServerManager: Waiting on XX.XX.XX.142:60020
> removal from dead list before processing report-for-duty request
> 2009-01-13 01:09:38,591 [HMaster] DEBUG
> org.apache.hadoop.hbase.regionserver.HLog: Creating new log file writer for
> path
> hdfs://aa0-000-12.u.powerset.com:9000/hbasetrunk2/TestTable/712889985/oldlogfile.log
> and region TestTable,0040922294,1231559109829
> 2009-01-13 01:09:38,670 [HMaster] DEBUG
> org.apache.hadoop.hbase.regionserver.HLog: Creating new log file writer for
> path
> hdfs://aa0-000-12.u.powerset.com:9000/hbasetrunk2/TestTable/484208094/oldlogfile.log
> and region TestTable,0042007133,1231628296909
> 2009-01-13 01:09:45,096 [HMaster] INFO
> org.apache.hadoop.hbase.regionserver.HLog: log file splitting completed for
> hdfs://aa0-000-12.u.powerset.com:9000/hbasetrunk2/log_XX.XX.XX.142_1231717984112_60020
> 2009-01-13 01:09:47,317 [SocketListener0-2] DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit for
> row <> in tableName .META.: location server XX.XX.XX.142:60020, location
> region name .META.,,1
> 2009-01-13 01:09:47,416 [IPC Server handler 4 on 60000] DEBUG
> org.apache.hadoop.hbase.master.ServerManager: Waiting on XX.XX.XX.142:60020
> removal from dead list before processing report-for-duty request
> 2009-01-13 01:09:47,518 [IPC Server handler 3 on 60000] INFO
> org.apache.hadoop.hbase.master.RegionManager: assigning region -ROOT-,,0 to
> server XX.XX.XX.141:60020
> 2009-01-13 01:09:49,007 [IPC Server handler 6 on 60000] DEBUG
> org.apache.hadoop.hbase.master.ServerManager: Total Load: 430, Num Servers:
> 3, Avg Load: 144.0
> 2009-01-13 01:09:50,219 [SocketListener0-0] DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit for
> row <> in tableName .META.: location server XX.XX.XX.142:60020, location
> region name .META.,,1
> 2009-01-13 01:09:50,539 [IPC Server handler 2 on 60000] INFO
> org.apache.hadoop.hbase.master.ServerManager: Received
> MSG_REPORT_PROCESS_OPEN: -ROOT-,,0 from XX.XX.XX.141:60020
> 2009-01-13 01:09:50,539 [IPC Server handler 2 on 60000] INFO
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN:
> -ROOT-,,0 from 208.76.44.141:60020
> 2009-01-13 01:09:50,719 [SocketListener0-3] DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit for
> row <> in tableName .META.: location server XX.XX.XX.142:60020, location
> region name .META.,,1
> 2009-01-13 01:09:50,967 [SocketListener0-4] DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit for
> row <> in tableName .META.: location server XX.XX.XX.142:60020, location
> region name .META.,,1
> 2009-01-13 01:09:52,117 [SocketListener0-5] DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit for
> row <> in tableName .META.: location server XX.XX.XX.142:60020, location
> region name .META.,,1
> ....
> 2009-01-13 01:09:57,426 [IPC Server handler 4 on 60000] DEBUG
> org.apache.hadoop.hbase.master.ServerManager: Waiting on XX.XX.XX.142:60020
> removal from dead list before processing report-for-duty request
> ....
> 2009-01-13 01:10:45,156 [HMaster] DEBUG
> org.apache.hadoop.hbase.master.HMaster: Processing todo:
> ProcessServerShutdown of XX.XX.XX.142:60020
> 2009-01-13 01:10:45,156 [HMaster] INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of
> server XX.XX.XX.142:60020: logSplit: true, rootRescanned: false,
> numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
> 2009-01-13 01:10:45,156 [HMaster] DEBUG
> org.apache.hadoop.hbase.master.ProcessServerShutdown$ScanRootRegion: process
> server shutdown scanning root region on XX.XX.XX.141
> 2009-01-13 01:10:45,182 [HMaster] DEBUG
> org.apache.hadoop.hbase.master.RegionServerOperation: process server shutdown
> scanning root region on XX.XX.XX.141 finished HMaster
> 2009-01-13 01:10:45,183 [HMaster] DEBUG
> org.apache.hadoop.hbase.master.ProcessServerShutdown$ScanMetaRegions: process
> server shutdown scanning .META.,,1 on XX.XX.XX.142:60020
> 2009-01-13 01:10:47,496 [IPC Server handler 4 on 60000] DEBUG
> org.apache.hadoop.hbase.master.ServerManager: Waiting on XX.XX.XX.142:60020
> removal from dead list before processing report-for-duty request
> 2009-01-13 01:10:49,320 [IPC Server handler 8 on 60000] DEBUG
> org.apache.hadoop.hbase.master.ServerManager: Total Load: 431, Num Servers:
> 3, Avg Load: 144.0
> .....
> {code}