On 28 April 2014 13:30, Jiangying (Jenny) <[email protected]> wrote:
> Nova can now detect that a host is unreachable, but it cannot distinguish
> between host isolation, a dead host, and a nova compute service that is
> down. When an unreachable host is reported, users have to work out the exact
> state themselves and then take the appropriate recovery action. We would
> therefore like to improve host detection in Nova.
>
> Currently the service group API factors out host detection as a set of
> abstract internal APIs with a pluggable backend implementation. The backend
> we designed is as follows:
>
> A central detection agent is introduced. When a member joins the service
> group, the member host starts sending network heartbeats to the central
> agent and periodically writes a timestamp to shared storage. When the
> central agent stops receiving network heartbeats from a member, it pings the
> member and checks the storage heartbeat before declaring the host failed.
>
> ------------------------------------------------------------------------------------------------------
> Network heartbeat | Network ping | Storage heartbeat | State               | Reason
> ------------------|--------------|-------------------|---------------------|---------------------------------------------
> OK                | -            | -                 | Running             | -
> Not OK            | Not OK       | Not OK            | Dead                | Hardware failure / abnormal host shutdown
> Not OK            | OK           | Not OK            | Service unreachable | Service process crashed
> Not OK            | Not OK       | OK                | Isolated            | Network unreachable
> ------------------------------------------------------------------------------------------------------
>
> Based on this state recognition table, Nova can determine the exact host
> state and the reason for it.
>
> Thoughts?
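The state recognition table in the proposal amounts to a small decision function. Below is a minimal Python sketch of it, assuming the central agent already has the three probe results as booleans (how it sends pings or reads shared storage is out of scope). The names `HostState`, `REASONS`, `classify_host`, and `heartbeat_is_fresh`, and the `timeout` parameter, are hypothetical and not part of Nova's service group API. The table leaves one combination undefined (network heartbeat lost while both ping and storage heartbeat are OK); the sketch returns `None` for it rather than guessing a state.

```python
from enum import Enum
from typing import Optional


class HostState(Enum):
    # Hypothetical enum mirroring the "State" column of the proposed table.
    RUNNING = "Running"
    DEAD = "Dead"
    SERVICE_UNREACHABLE = "Service unreachable"
    ISOLATED = "Isolated"


# Reasons copied from the "Reason" column of the proposed table.
REASONS = {
    HostState.RUNNING: None,
    HostState.DEAD: "hardware failure / abnormal host shutdown",
    HostState.SERVICE_UNREACHABLE: "service process crashed",
    HostState.ISOLATED: "network unreachable",
}


def heartbeat_is_fresh(last_timestamp: float, now: float, timeout: float) -> bool:
    """True if a heartbeat timestamp was written within `timeout` seconds of `now`.

    Assumption: the storage heartbeat counts as OK when the member's last
    timestamp in shared storage is recent enough.
    """
    return now - last_timestamp <= timeout


def classify_host(network_heartbeat_ok: bool,
                  ping_ok: bool,
                  storage_heartbeat_ok: bool) -> Optional[HostState]:
    """Map the three probe results to a host state per the table.

    Returns None for the combination the table does not cover.
    """
    if network_heartbeat_ok:
        # Ping and storage heartbeat are not consulted ("-" in the table).
        return HostState.RUNNING
    if not ping_ok and not storage_heartbeat_ok:
        return HostState.DEAD
    if ping_ok and not storage_heartbeat_ok:
        return HostState.SERVICE_UNREACHABLE
    if not ping_ok and storage_heartbeat_ok:
        return HostState.ISOLATED
    # Heartbeat lost but ping and storage both OK: not defined by the table.
    return None


if __name__ == "__main__":
    state = classify_host(network_heartbeat_ok=False, ping_ok=False,
                          storage_heartbeat_ok=True)
    print(state.value, "-", REASONS[state])  # Isolated - network unreachable
```

Checking the network heartbeat first follows the proposal's ordering: the agent only falls back to ping and the storage heartbeat once network heartbeats stop arriving.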
I don't think Nova should try to include functionality that re-implements
other good monitoring tools (Nagios, etc.).

Having said that, a new service group API that uses information from external
tools to decide whether a host is dead, and describes why, may be worth
considering.

John

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
