Currently, CloudStack deals with a primary storage (PS) failure by rebooting 
all hosts associated with the cluster. Selectively cleaning up the affected 
VMs would have been the best option, but since issues were seen with stopping 
VMs on the hypervisors (at least in XenServer 5.6 [1]), rebooting was the next 
best option. The downside of this approach is that if there is more than one 
PS in the cluster, healthy VMs are unnecessarily affected by the host reboots.

Recently I tried this scenario on both XS 6.1 and 6.2. On 6.1 I think the 
behaviour is similar to 5.6: if the PS is not available, any operation on the 
VM, such as shutdown, hangs (I waited for more than 30 minutes and the 
operation was still stuck). On 6.2, however, these scenarios appear to be 
handled more gracefully: on shutdown, the VM's power state changed to 
'halted', and it was then even possible to destroy the VM. Based on this, I 
think that at least for XS 6.2 we can do a selective VM cleanup if the PS is 
not available. For older XS versions the existing approach would still be used.
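
To make the proposal concrete, here is a minimal sketch of the version check 
in Java. It is an illustration only, not existing CloudStack code: the class 
PsFailurePolicy, the Action enum and the method chooseForXenServer are 
hypothetical names, and the 6.2 cut-off is an assumption based solely on the 
manual test described above. It also assumes the version is a plain dotted 
numeric string such as "6.2.0".

```java
// Hypothetical sketch: none of these names exist in CloudStack.
final class PsFailurePolicy {

    enum Action { SELECTIVE_VM_CLEANUP, REBOOT_HOST }

    // Assumption: 6.2 is the first XenServer release where stopping a VM
    // on an unavailable PS completes instead of hanging.
    private static final int[] MIN_SELECTIVE_VERSION = {6, 2};

    // Picks the recovery action for a XenServer host whose PS has failed.
    static Action chooseForXenServer(String version) {
        return atLeast(version, MIN_SELECTIVE_VERSION)
                ? Action.SELECTIVE_VM_CLEANUP
                : Action.REBOOT_HOST; // existing behaviour for older releases
    }

    // Compares a dotted numeric version against a minimum, component by
    // component; missing components are treated as 0.
    static boolean atLeast(String version, int[] min) {
        String[] parts = version.trim().split("\\.");
        for (int i = 0; i < min.length; i++) {
            int have = i < parts.length ? Integer.parseInt(parts[i]) : 0;
            if (have != min[i]) {
                return have > min[i];
            }
        }
        return true; // all compared components are equal
    }
}
```

With this check, 5.6 and 6.1 hosts would keep the existing reboot path, and 
only hosts reporting 6.2 or later would take the selective cleanup path.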

Thoughts/comments?

The same approach is also used for KVM. Can someone let me know whether newer 
versions of KVM handle primary storage failure better with respect to VM 
operations? If so, the behaviour can be changed for KVM as well.

For VMware, since it is an externally managed cluster, I don't think this 
issue exists.

Thanks,
Koushik

[1] https://issues.apache.org/jira/browse/CLOUDSTACK-3367
[2] http://comments.gmane.org/gmane.comp.apache.cloudstack.user/4254
