[ https://issues.apache.org/jira/browse/CLOUDSTACK-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohit Yadav reassigned CLOUDSTACK-8943:
---------------------------------------

    Assignee: Rohit Yadav

> KVM HA is broken, let's fix it
> ------------------------------
>
>                 Key: CLOUDSTACK-8943
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8943
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the
> default.)
>         Environment: Linux distros with KVM/libvirt
>            Reporter: Nux
>            Assignee: Rohit Yadav
>
> Currently KVM HA works by monitoring an NFS based heartbeat file and it can
> often fail whenever this network share becomes slower, causing the
> hypervisors to reboot.
> This can be particularly annoying when you have different kinds of primary
> storages in place which are working fine (people running CEPH etc).
> Having to wait for the affected HV which triggered this to come back and
> declare it's not running VMs is a bad idea; this HV could require hours or
> days of maintenance!
> This is embarrassing. How can we fix it? Ideas, suggestions? How are other
> hypervisors doing it?
> Let's discuss, test, implement. :)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)