[
https://issues.apache.org/jira/browse/HBASE-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665296#action_12665296
]
Jim Kellerman commented on HBASE-1123:
--------------------------------------
I'd expect it to be similar to what happens when a region server does not
respond to a client because it is too busy.
Don't we get rpc retries in that case?
I don't have the region server log, so I can't tell exactly what happened in
this case. Is there a region server log still
available from when this happened?
> Server never leaves the dead list though logs have all been processed if
> crashed server had -ROOT- (seemingly)
> --------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-1123
> URL: https://issues.apache.org/jira/browse/HBASE-1123
> Project: Hadoop HBase
> Issue Type: Bug
> Affects Versions: 0.19.0
> Reporter: stack
> Assignee: Jim Kellerman
> Fix For: 0.20.0
>
> Attachments: 1123.patch
>
>
> Cluster is just hung after host that had -ROOT- completed splitting its
> logs... old server is just stuck on the dead list and never comes off it.
> {code}
> ..
> 2009-01-13 01:09:36,448 [HMaster] DEBUG
> org.apache.hadoop.hbase.regionserver.HLog: Splitting 6 of 6:
> hdfs://aa0-000-12.u.powerset.com:9000/hbasetrunk2/log_XX.XX.XX.142_1231717984112_60020/hlog.dat.1231718928939
> 2009-01-13 01:09:37,396 [IPC Server handler 4 on 60000] DEBUG
> org.apache.hadoop.hbase.master.ServerManager: Waiting on XX.XX.XX142:60020
> removal from dead list before processing report-for-duty request
> 2009-01-13 01:09:38,591 [HMaster] DEBUG
> org.apache.hadoop.hbase.regionserver.HLog: Creating new log file writer for
> path
> hdfs://aa0-000-12.u.powerset.com:9000/hbasetrunk2/TestTable/712889985/oldlogfile.log
> and region TestTable,0040922294,1231559109829
> 2009-01-13 01:09:38,670 [HMaster] DEBUG
> org.apache.hadoop.hbase.regionserver.HLog: Creating new log file writer for
> path
> hdfs://aa0-000-12.u.powerset.com:9000/hbasetrunk2/TestTable/484208094/oldlogfile.log
> and region TestTable,0042007133,1231628296909
> 2009-01-13 01:09:45,096 [HMaster] INFO
> org.apache.hadoop.hbase.regionserver.HLog: log file splitting completed for
> hdfs://aa0-000-12.u.powerset.com:9000/hbasetrunk2/log_XX.XX.XX.142_1231717984112_60020
> 2009-01-13 01:09:47,317 [SocketListener0-2] DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit for
> row <> in tableName .META.: location serverXX.XX.XX.142:60020, location
> region name .META.,,1
> 2009-01-13 01:09:47,416 [IPC Server handler 4 on 60000] DEBUG
> org.apache.hadoop.hbase.master.ServerManager: Waiting on XX.XX.XX142:60020
> removal from dead list before processing report-for-duty request
> 2009-01-13 01:09:47,518 [IPC Server handler 3 on 60000] INFO
> org.apache.hadoop.hbase.master.RegionManager: assigning region -ROOT-,,0 to
> server XX.XX.XX141:60020
> 2009-01-13 01:09:49,007 [IPC Server handler 6 on 60000] DEBUG
> org.apache.hadoop.hbase.master.ServerManager: Total Load: 430, Num Servers:
> 3, Avg Load: 144.0
> 2009-01-13 01:09:50,219 [SocketListener0-0] DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit for
> row <> in tableName .META.: location server XX.XX.XX.142:60020, location
> region name .META.,,1
> 2009-01-13 01:09:50,539 [IPC Server handler 2 on 60000] INFO
> org.apache.hadoop.hbase.master.ServerManager: Received
> MSG_REPORT_PROCESS_OPEN: -ROOT-,,0 from XX.XX.XX.141:60020
> 2009-01-13 01:09:50,539 [IPC Server handler 2 on 60000] INFO
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN:
> -ROOT-,,0 from 208.76.44.141:60020
> 2009-01-13 01:09:50,719 [SocketListener0-3] DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit for
> row <> in tableName .META.: location server XX.XX.XX.142:60020, location
> region name .META.,,1
> 2009-01-13 01:09:50,967 [SocketListener0-4] DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit for
> row <> in tableName .META.: location serverXX.XX.XX.142:60020, location
> region name .META.,,1
> 2009-01-13 01:09:52,117 [SocketListener0-5] DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit for
> row <> in tableName .META.: location server XX.XX.XX.142:60020, location
> region name .META.,,1
> ....
> 2009-01-13 01:09:57,426 [IPC Server handler 4 on 60000] DEBUG
> org.apache.hadoop.hbase.master.ServerManager: Waiting on XX.XX.XX.142:60020
> removal from dead list before processing report-for-duty request
> ....
> 2009-01-13 01:10:45,156 [HMaster] DEBUG
> org.apache.hadoop.hbase.master.HMaster: Processing todo:
> ProcessServerShutdown of XX.XX.XX142:60020
> 2009-01-13 01:10:45,156 [HMaster] INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of
> server XX.XX.XX.142:60020: logSplit: true, rootRescanned: false,
> numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
> 2009-01-13 01:10:45,156 [HMaster] DEBUG
> org.apache.hadoop.hbase.master.ProcessServerShutdown$ScanRootRegion: process
> server shutdown scanning root region on XX.XX.XX.141
> 2009-01-13 01:10:45,182 [HMaster] DEBUG
> org.apache.hadoop.hbase.master.RegionServerOperation: process server shutdown
> scanning root region on XX.XX.XX.141 finished HMaster
> 2009-01-13 01:10:45,183 [HMaster] DEBUG
> org.apache.hadoop.hbase.master.ProcessServerShutdown$ScanMetaRegions: process
> server shutdown scanning .META.,,1 on XX.XX.XX.142:60020
> 2009-01-13 01:10:47,496 [IPC Server handler 4 on 60000] DEBUG
> org.apache.hadoop.hbase.master.ServerManager: Waiting on XX.XX.XX.142:60020
> removal from dead list before processing report-for-duty request
> 2009-01-13 01:10:49,320 [IPC Server handler 8 on 60000] DEBUG
> org.apache.hadoop.hbase.master.ServerManager: Total Load: 431, Num Servers:
> 3, Avg Load: 144.0
> .....
> {code}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.