On Mon, Jun 27, 2011 at 12:40:19PM +0200, Dominik Klein wrote:
> >> With the agent before the mentioned patch, during probe of a newly
> >> configured resource, the cluster would have learned that the VM is not
> >> available on one of the nodes (ERR_INSTALLED), so it would never start
> >> the resource there.
> >
> > This is exactly the problem with shared storage setups, where
> > such an exit code can prevent a resource from ever being started on
> > a node which is otherwise perfectly capable of running that
> > resource.
>
> I see and understand that that, too, is a valid setup and concern.
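For readers following along, the probe behavior under discussion can be
sketched roughly like this (hypothetical shell, not the actual agent
code; the function name and the use of $OCF_RESKEY_config as the config
path are assumptions for illustration, following OCF conventions):

```shell
# Sketch only: how a VM agent's probe/monitor might report a missing
# config. OCF exit codes: 5 = "not installed", 7 = "not running".
OCF_NOT_RUNNING=7
OCF_ERR_INSTALLED=5

vm_monitor() {
    if [ ! -r "${OCF_RESKEY_config}" ]; then
        # Behavior before the patch: report "can never run here",
        # which the cluster remembers and excludes this node for good.
        return $OCF_ERR_INSTALLED
    fi
    # A real agent would now check whether the domain is active;
    # for this sketch we just report "not running".
    return $OCF_NOT_RUNNING
}
```

Returning OCF_NOT_RUNNING instead in that branch is what makes the
shared-storage case work, at the cost of the behavior Dominik relied on.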
In this case this resource wouldn't function at all. The worst would be
that the config is available on one node and the resource would be
started there, but there'll be no failover, because all other nodes
would report ERR_INSTALLED.

> > But really, if a resource can _never_ run on a node, then there
> > should be a negative location constraint or the cluster should be
> > set up as asymmetrical.
>
> There did not have to be a negative location constraint up to now,
> because the cluster took care of that.

Only because it didn't work correctly.

> > Now, I understand that in your case, it is
> > actually due to the administrator's fault.
>
> Yes, that's how I noticed the problem with the agent.
>
> > This particular setup is a special case of shared storage. The
> > images are on shared storage, but the configurations are local. I
> > think that you really need to make sure that the configurations
> > are present where they need to be. Best would be that the
> > configuration is kept on the storage along with the corresponding
> > VM image. Since you're using a raw device as image, that's
> > obviously not possible. Otherwise, use csync2 or similar to keep
> > files in sync.
>
> Actually, this is a wanted setup. It happened that VM configs were
> changed in ways that led to a VM not being startable any more. For that
> case, they wanted to be able to start the old config on the other node.

Wow! So, they can have different configurations at different nodes.

> I agree that the cases that led me to finding this change in the agent
> are cases that could have been solved with better configuration and that
> your suggestions make sense. Still, I feel that the change introduces a
> new way of doing things that might affect running and working setups in
> unintended ways. I refuse to believe that I am the only one doing HA VMs
> like this (although of course I might be wrong on that, too ...).
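For completeness, the two alternatives mentioned above could look
something like this with the crm shell (a sketch only; resource and
node names are made up):

```
# Ban the VM from a node that can never run it:
crm configure location loc-vm1-never-node3 vm1 -inf: node3

# Or make the whole cluster opt-in (asymmetric), so resources run only
# where a positive location constraint allows them:
crm configure property symmetric-cluster=false
crm configure location loc-vm1-node1 vm1 100: node1
```

Either way the cluster no longer depends on a probe result to learn
where the VM must not run.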
The only issue you may have with this cluster is if the administrator
erroneously removes a config on some node, right? And that then some
time afterwards the cluster does a probe on that node. And then again
the cluster wants to fail over this VM to that node. And that at this
point in time no other node can run this VM and that it is going to
repeatedly try to start and fail. And that "failed start is fatal"
isn't configured. No doubt that this could happen, but what's the
probability? And, finally, that doesn't look like a well maintained
cluster.

Thanks,

Dejan

> Regards
> Dominik
> _______________________________________________________
> Linux-HA-Dev: [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
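The "failed start is fatal" knob referred to above is Pacemaker's
start-failure-is-fatal cluster property. A sketch of configuring it
with the crm shell (verify the defaults against your Pacemaker
version):

```
# A single failed start bans the resource from that node until its
# failcount is cleared:
crm configure property start-failure-is-fatal=true

# Alternatively, allow a bounded number of start attempts per node:
crm configure rsc_defaults migration-threshold=1
```

With either setting in place, the repeated start-and-fail loop in the
scenario above would stop after the first attempt.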
