On Tue, Nov 6, 2012 at 10:30 PM, Gao,Yan <y...@suse.com> wrote:
> Hi,
>
> Currently, we can manage VMs via the VM agents, but the services
> running within VMs are not easy to monitor. If we could use
> nagios/icinga probes from the host to the guest, that would allow us
> to achieve this.
>
> Lars, Dejan and I have been discussing this for some time. There have
> been quite a few thoughts on how to implement it. Now we are inclined
> towards a proposal from Lars. Please let me introduce the idea here,
> and see what you think about it.
>
> First, we could add a resource agent class. The RAs belonging to this
> class wrap around nagios/icinga probes. They can be configured as
> special monitor operations for the VMs. The behavior should be as
> follows:
>
> 1. The special monitor operations start working after the VMs and the
> services inside are started.
>
> 2. Any failure of the monitor operations is treated as a failure of
> the VM, which triggers the recovery of the VM.
>
> Let me show an example:
>
> primitive db-vm ocf:heartbeat:VirtualDomain \
>     params config="db-vm" hypervisor="xen:///" \
>       ip="192.168.1.122" \
>     op monitor nagios:ftp interval="30s" params user="test"
>
> The "nagios:ftp" specifies which monitor agent is used to monitor the
> VM. It's an optional attribute group expressing "class/provider/type"
> of the monitor agent, which defaults to "ocf:heartbeat:VirtualDomain"
> for this VM (if so, the monitor would be a normal one like we usually
> configure). We can add more monitors like the "nagios:www" type and
> so on.

What do you propose the XML should look like?
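Something like this, perhaps? I'm only guessing here; the class/type
attributes on the <op> element are invented for the sake of discussion,
while the rest is ordinary CIB syntax:

  <primitive id="db-vm" class="ocf" provider="heartbeat"
             type="VirtualDomain">
    <instance_attributes id="db-vm-params">
      <nvpair id="db-vm-config" name="config" value="db-vm"/>
      <nvpair id="db-vm-hypervisor" name="hypervisor" value="xen:///"/>
      <nvpair id="db-vm-ip" name="ip" value="192.168.1.122"/>
    </instance_attributes>
    <operations>
      <!-- class/type here is the new, hypothetical part -->
      <op id="db-vm-monitor-ftp" name="monitor" interval="30s"
          class="nagios" type="ftp">
        <instance_attributes id="db-vm-monitor-ftp-params">
          <nvpair id="db-vm-monitor-ftp-user" name="user" value="test"/>
        </instance_attributes>
      </op>
    </operations>
  </primitive>

If that is the rough shape, then omitting class/type would give the
current behaviour, i.e. the monitor runs against
ocf:heartbeat:VirtualDomain itself, which matches your proposed default.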
> We can specify particular "params" for a monitor. And the "ip" is
> actually not a useful parameter for the VirtualDomain itself; we put
> it there for its monitor operations to inherit, so that we don't have
> to specify it for each monitor separately.

You plan to add 'ip' to the VirtualDomain metadata?

> Other issues:
> - As we can see, there's a time window after the VM is started but
> before the monitored service has started. A solution is adding a
> "first-failure" flag for the monitor operation, which could allow us
> to ignore the *first* failures of a monitor until it has returned
> healthy once, unless it times out. Ideally, it could be handled in
> the LRM.

What happens if there is never a first success? The cluster will never
find out.

> - A limitation is that we would have to specify different monitor
> interval values for the services within a VM. Probably we could fix
> that in some way eventually.
>
> Anyway, this is the most straightforward solution we can think of so
> far (please correct me if I'm missing anything). It's open for
> discussion. Any comments and suggestions are welcome and appreciated.

Doesn't look too bad. Some finer points to discuss but I'm sure we can
reach agreement. One more thought on the interval limitation below.

> Thanks,
>   Gao,Yan
> --
> Gao,Yan <y...@suse.com>
> Software Engineer
> China Server Team, SUSE.
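P.S. On the interval limitation: if I understand correctly, the issue
is only that Pacemaker identifies recurring operations by name and
interval, so two monitors on the same resource cannot share an
interval. Reusing your proposed syntax (still hypothetical, of course),
that would be something like:

  primitive db-vm ocf:heartbeat:VirtualDomain \
      params config="db-vm" hypervisor="xen:///" ip="192.168.1.122" \
      op monitor nagios:ftp interval="30s" params user="test" \
      op monitor nagios:www interval="31s"

where the "31s" exists only to keep the operation keys unique, which
seems more a cosmetic wart than a real blocker.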