rhtyd commented on issue #1960: [4.11/Future] CLOUDSTACK-9782: Host HA and KVM HA provider
URL: https://github.com/apache/cloudstack/pull/1960#issuecomment-299110561

@koushik-das Host-HA is disabled by default; it must be explicitly configured and enabled before it is used. It also provides kill switches (explicit disable APIs) for hosts across partitions (zone/cluster/host). In addition, it has strict eligibility checks with a pluggable mechanism/implementation, which ensure that only eligible resources take part in the HA lifecycle.

The framework is written in an agnostic way to re-think our HA approach. Compared with the status quo, this work provides mechanisms for eligibility checks, mechanisms to degrade and recover an HA resource, and policies/mechanisms to conduct a configurable number of investigation rounds and to make decisions based on threshold ratios (granular/configurable). The new agnostic HA framework also has resource lifecycle management using bounded ephemeral queues for managing very large environments (10k+ hosts), improved management of background polling tasks against an FSM, reporting/alerting, and ownership management of resources (in case of multiple management servers).

Pluggability was another reason for the framework: it cleanly allows large-scale users to implement their own algorithms/logic for operations such as investigate, activity checks, degrade, recover, fence, etc.

For several reasons and requirements, oobm could not simply be put into the existing KVM-specific VM-HA. For this to work, VM-HA and Host-HA must work in tandem, and the cleanest way was to control VM-HA in a safe and backward-compatible way by overriding the status returned by the KVM investigator based on the Host-HA FSM states. This applies strictly only when Host-HA is enabled for a host; I'll tag you on the specific code where this happens so you can verify. Any regressions/issues (if they creep in) will not affect all HVs, since the injected override is specific to KVM only.
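To make the VM-HA/Host-HA coordination concrete, here is a minimal Java sketch of the override idea. It is not the code from this PR: the class, the enum values, and the particular state-to-status mapping are all hypothetical, chosen only to show that the investigator's own result passes through untouched unless Host-HA is enabled for the host.

```java
// Hypothetical sketch; names and the state-to-status mapping are assumptions,
// not the identifiers or logic used in the actual PR.
class HostHaOverrideSketch {
    // Assumed, simplified Host-HA FSM states.
    enum HaState { AVAILABLE, SUSPECT, CHECKING, DEGRADED, RECOVERING, FENCING, FENCED }

    // Assumed, simplified host statuses as seen by VM-HA.
    enum HostStatus { UP, DISCONNECTED, DOWN }

    /**
     * Returns the status VM-HA should act on.
     * When Host-HA is disabled (or has no state for the host), the KVM
     * investigator's own result is returned unchanged, which keeps the
     * pre-existing behaviour.
     */
    static HostStatus resolve(boolean hostHaEnabled, HaState haState, HostStatus investigatorStatus) {
        if (!hostHaEnabled || haState == null) {
            return investigatorStatus;
        }
        switch (haState) {
            case AVAILABLE:
                return HostStatus.UP;            // host considered healthy
            case FENCED:
                return HostStatus.DOWN;          // safe for VM-HA to restart VMs elsewhere
            default:
                return HostStatus.DISCONNECTED;  // Host-HA still deciding; VM-HA should wait
        }
    }

    public static void main(String[] args) {
        // Host-HA disabled: the investigator's answer is used as-is.
        System.out.println(resolve(false, HaState.FENCED, HostStatus.UP));
        // Host-HA enabled: the FSM state decides.
        System.out.println(resolve(true, HaState.FENCED, HostStatus.UP));
        System.out.println(resolve(true, HaState.RECOVERING, HostStatus.DOWN));
    }
}
```

In this sketch the disabled path is a pure pass-through, which is how an override of this kind stays backward compatible.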
See our FS, last section on VM-HA Host-HA coordination. Lastly, remember this should not cause any issues post-upgrade for general users as the feature itself is disabled by default.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
With regards, Apache Git Services