[ https://issues.apache.org/jira/browse/HBASE-611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman resolved HBASE-611.
---------------------------------
Resolution: Fixed
Added method isHealthy to HRegionServer. Reviewed by Stack. Committed.
> regionserver should do basic health check before reporting alls-well to the
> master
> ----------------------------------------------------------------------------------
>
> Key: HBASE-611
> URL: https://issues.apache.org/jira/browse/HBASE-611
> Project: Hadoop HBase
> Issue Type: Improvement
> Affects Versions: 0.1.2
> Reporter: stack
> Priority: Minor
> Fix For: 0.2.0
>
>
> On IRC this afternoon, a user killed a regionserver. It did something in
> HDFS. Another regionserver, one carrying the catalog tables, started to get
> exceptions out of HDFS. The last thing out of it was:
> {code}
> [15:55] <jgray> 2008-05-01 15:49:51,710 FATAL
> org.apache.hadoop.hbase.HRegionServer: Replay of hlog required. Forcing
> server restart
> [15:55] <jgray> org.apache.hadoop.hbase.DroppedSnapshotException: Could
> not get block locations. Aborting...
> {code}
> That's fine.
> Only it didn't go down. It was in a state where it continued to send the
> master pings as though nothing was wrong, so its lease never timed out, and
> the master was hosed because it couldn't get to the catalog tables.
> Regionservers should do a basic check that all is healthy before they ping the
> master. If critical threads have exited, or a flag saying HDFS has been found
> bad has been set, then the regionserver should stop reporting to the master so
> the master can deploy its load elsewhere.
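
The check described above might look roughly like the following. This is only a minimal sketch under stated assumptions, not the committed HRegionServer code: the class HealthCheckSketch, the fsOk flag, the criticalThreads list, and reportIfHealthy are all hypothetical names chosen for illustration. Only the method name isHealthy comes from the resolution note.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a regionserver; not the real HRegionServer.
class HealthCheckSketch {
    // Assumed flag: cleared once a filesystem (HDFS) error has been seen.
    private final AtomicBoolean fsOk = new AtomicBoolean(true);
    // Assumed list of threads the server cannot run without.
    private final List<Thread> criticalThreads;

    HealthCheckSketch(List<Thread> criticalThreads) {
        this.criticalThreads = criticalThreads;
    }

    // Called by whatever code path notices that the filesystem is bad.
    void markFilesystemBad() {
        fsOk.set(false);
    }

    // Healthy only if the filesystem flag is still good and every
    // critical thread is still running.
    boolean isHealthy() {
        if (!fsOk.get()) {
            return false;
        }
        for (Thread t : criticalThreads) {
            if (!t.isAlive()) {
                return false;
            }
        }
        return true;
    }

    // Sends the report only when healthy. Skipping the report lets the
    // lease on the master side time out. Returns true if a report was sent.
    boolean reportIfHealthy(Runnable sendReport) {
        if (!isHealthy()) {
            return false;
        }
        sendReport.run();
        return true;
    }
}
```

The point of gating the report, rather than only logging the failure, is that the master's lease timeout then does the rest: an unhealthy server simply goes quiet and its regions can be reassigned.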