On 10/10/2016 10:21 AM, Klaus Wenninger wrote:
> On 10/10/2016 04:54 PM, Ken Gaillot wrote:
>> On 10/10/2016 07:36 AM, Pavel Levshin wrote:
>>> 10.10.2016 15:11, Klaus Wenninger:
>>>> On 10/10/2016 02:00 PM, Pavel Levshin wrote:
>>>>> 10.10.2016 14:32, Klaus Wenninger:
>>>>>> Why are the order-constraints between libvirt & vms optional?
>>>>>
>>>>> If they were mandatory, then all the virtual machines would be restarted when libvirtd restarts. This is neither desired nor needed. When this happens, the node is fenced because it is unable to restart a VM in the absence of a working libvirtd.
>>>>
>>>> Was guessing something like that ...
>>>> So let me reformulate my question:
>>>> Why does libvirtd have to be restarted?
>>>> If it is because of config changes, making it reloadable might be a solution ...
>>>>
>>> Right, config changes come to my mind first of all. But sometimes a service, including libvirtd, may fail unexpectedly. In this case I would prefer to restart it without disturbing the VirtualDomains, which will fail eternally.
>>
>> I think the mandatory colocation of VMs with libvirtd negates your goal. If libvirtd stops, the VMs will have to stop anyway because they can't be colocated with libvirtd. Making the colocation optional should fix that.
>>
>>> The question is, why does the cluster not obey the optional constraint when both libvirtd and a VM stop in a single transition?
>>
>> If it truly is in the same transition, then it should be honored.
>>
>> You have *mandatory* constraints for DLM -> CLVMd -> cluster-config -> libvirtd, but only an *optional* constraint for libvirtd -> VMs. Therefore, libvirtd will generally have to wait longer than the VMs to be started.
>>
>> It might help to add mandatory constraints for cluster-config -> VMs. That way, they have the same requirements as libvirtd, and are more likely to start in the same transition.
>>
>> However, I'm sure there are still problematic situations. What you want is a simple idea, but a rather complex specification: "If rsc1 fails, block any instances of this other RA on the same node."
>>
>> It might be possible to come up with some node attribute magic to enforce this. You'd need some custom RAs. I imagine something like one RA that sets a node attribute, and another RA that checks it.
>>
>> The setter would be grouped with libvirtd. Anytime that libvirtd started, the setter would set a node attribute on the local node. Anytime that libvirtd stopped or failed, the setter would unset the attribute value.
>>
>> The checker would simply monitor the attribute, and fail if the attribute is unset. The group would have on-fail=block. So anytime the attribute was unset, the VM would not be started or stopped. (There would be no constraints between the two groups -- the checker RA would take the place of constraints.)
>
> How would that behave differently from just putting libvirtd into this on-fail=block group? (apart from, of course, the possibility of grouping the VMs into more than one group ...)
You could stop or restart libvirtd without stopping the VMs. It would cause a "failure" of the checker that would need to be cleaned up later, but the VMs wouldn't stop.

>>
>> I haven't thought through all possible scenarios, but it seems feasible to me.
>>
>>> In my eyes, these services are bound by a HARD, obvious colocation constraint: a VirtualDomain should never ever be touched in the absence of a working libvirtd. Unfortunately, I cannot figure out a way to reflect this constraint in the cluster.
>>>
>>> --
>>> Pavel Levshin
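
For what it's worth, here is a rough sketch of how the setter/checker arrangement could be configured with pcs (untested; all resource, group, and attribute names below are made up for illustration). Recent Pacemaker versions ship an ocf:pacemaker:attribute agent that sets a node attribute on start and clears it on stop, which could serve as the setter; the checker would have to be a small custom agent whose monitor fails while the attribute is unset (e.g. by querying it with attrd_updater -Q or crm_attribute).

  # libvirtd group: libvirtd itself plus the attribute "setter"
  pcs resource create libvirtd systemd:libvirtd op monitor interval=30s
  pcs resource create libvirtd-up ocf:pacemaker:attribute \
      name=libvirtd-up active_value=1 inactive_value=0
  pcs resource group add grp-libvirtd libvirtd libvirtd-up
  # (clone grp-libvirtd if libvirtd should run on every node, as in your setup)

  # VM group: the "checker" in front of the VirtualDomain, with on-fail=block
  # (attr-check is a hypothetical custom RA whose monitor fails while the
  #  libvirtd-up attribute is not set on the local node)
  pcs resource create vm1-check ocf:custom:attr-check name=libvirtd-up \
      op monitor interval=10s on-fail=block
  pcs resource create vm1 ocf:heartbeat:VirtualDomain \
      config=/etc/libvirt/qemu/vm1.xml \
      op monitor interval=30s on-fail=block
  pcs resource group add grp-vm1 vm1-check vm1

  # keep the dlm -> clvmd -> cluster-config -> grp-libvirtd ordering as before;
  # note there are no constraints between grp-libvirtd and grp-vm1

With something like that, a libvirtd stop or restart only clears the attribute: the checker's monitor fails, on-fail=block freezes grp-vm1 instead of stopping it, and once libvirtd is back (and the attribute is set again) you clean up the checker failure and normal management resumes.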