On 19/09/16 02:30 PM, Jan Pokorný wrote:
> On 18/09/16 15:37 -0400, Digimer wrote:
>> If, for example, a server's definition file is corrupted while the
>> server is running, rgmanager will put the server into a 'failed' state.
>> That's fine and fair.
>
> Please, be more precise. Is it "vm" resource agent that you are talking
> about, hence server is the particular virtual machine to be managed?
> Is the agent in the role of a service (defined at a top-level) or
> a standard resource (without special treatment, possibly with
> dependent services further in the group)?

In 'clustat', vm:foo reports 'failed' after vm.sh runs a status check
and gets a bad return code (because the foo.xml file was corrupted, for
example by a typo that breaks the XML). I'm not sure if that answers
your question, sorry.

>> The problem is that, once the file is fixed, there appears to be no
>> way to go failed -> started without disabling (and thus powering off)
>> the VM. This is troublesome because it forces an interruption when the
>> service could have been placed under resource management without a reboot.
>>
>> For example, doing 'clusvcadm -e <server>' when the service was
>> 'disabled' (say because of a manual boot of the server), rgmanager
>> detects that the server is running fine and simply marks the server as
>> 'started'. Is there no way to do something similar to go 'failed' ->
>> 'started' without the 'disable' step?
>
> In case it's a VM as a service, this could possibly be "exploited"
> (never tested that, though):
>
> # MANWIDTH=72 man rgmanager | col -b \
>   | sed -n '/^VIRTUAL MACHINE/{:a;p;n;/^\s*$/d;ba}'
>> VIRTUAL MACHINE FEATURES
>>   Apart from what is noted in the VM resource agent, rgmanager
>>   provides a few convenience features when dealing
>>   with virtual machines.
>>   * it will use live migration when transferring a virtual
>>     machine to a more-preferred host in the cluster as a
>>     consequence of failover domain operation
>>   * it will search the other instances of rgmanager in the
>>     cluster in the case that a user accidentally moves a
>>     virtual machine using other management tools
>>   * unlike services, adding a virtual machine to rgmanager’s
>>     configuration will not cause the virtual machine
>>     to be restarted
>>   * removing a virtual machine from rgmanager’s
>>     configuration will leave the virtual machine running.
>
> (see the last two items).

So a possible "recovery" would be to remove the VM from rgmanager, then
add it back? I can see that working, but it seems heavy-handed.
:)

>> I tried freezing the service, no luck. I also tried coalescing via
>> '-c', but that didn't help either.
>
> Any path from "failed" in the resource (group) life-cycle goes either
> through "disabled" or "stopped" if I am not mistaken, so would rather
> experiment with adding a new service and dropping the old one per
> the above description as a possible workaround (perhaps in the reverse
> order so as to retain the same name for the service, indeed unless
> rgmanager would actively prevent that anyway -- no idea).

This is my understanding as well, yes (that 'failed' must go through
'disabled' or 'stopped'). I'll try the remove/re-add option and report
back.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org