> -----Original Message-----
> From: Wido den Hollander [mailto:w...@widodh.nl]
> Sent: Wednesday, August 07, 2013 10:53 AM
> To: dev@cloudstack.apache.org
> Cc: shadow...@gmail.com
> Subject: [KVM] Helper for agent during HA operations
>
> Hi,
>
> In our production setups we have seen some crashes of the KVM agent.

If we can make sure the KVM agent is restarted immediately after a crash, then you don't need another separate service running on your KVM host. I'm not sure whether jsvc can restart the agent automatically. I remember we have a small C daemon program in the 3.0.x source code which can monitor the agent.

> This could happen for all kinds of reasons, but that's not what I wanted to
> discuss.
>
> Also see this issue: https://issues.apache.org/jira/browse/CLOUDSTACK-3954
>
> What I've been writing for a PoC in our company is a small helper written in
> Python which runs on port 8251.
>
> The Investigator can query this webservice (attached) which will simply tell
> it which VMs are running on that host.
>
> It's online here: http://stack01.ceph.widodh.nl:8251/
>
> You can also do a query like this:
> http://stack01.ceph.widodh.nl:8251/ping/i-2-6570-VM
>
> This way we can more reliably verify if a specific VM is still running if the
> Agent stops responding for some reason. An ICMP echo-request isn't safe
> since the Security Groups could prevent ICMP from coming through.
>
> I'd rather not have the management server query libvirt directly, since that
> would open a potential security hole. This webservice is read-only and on
> my production setups I have libvirt listening on the private bridge only.
>
> What do you think?
>
> Wido
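
For anyone following along without the attachment, here is a minimal sketch of what such a read-only helper could look like. It is not Wido's script. It assumes Python 3's standard http.server module and gets the VM list from `virsh list --name --state-running`. The names list_running_vms, make_handler and main, the plain-text responses, and the 200/404 status codes for /ping/<vm> are all assumptions made for illustration.

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer


def list_running_vms():
    """Return the names of running domains (assumption: asked via virsh)."""
    out = subprocess.run(
        ["virsh", "list", "--name", "--state-running"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def make_handler(lister=list_running_vms):
    """Build a GET-only handler; `lister` is injectable so it can be stubbed."""

    class Handler(BaseHTTPRequestHandler):
        def _reply(self, status, body):
            data = body.encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def do_GET(self):
            try:
                vms = lister()
            except Exception:
                # Could not query the hypervisor; say so instead of guessing.
                self._reply(500, "error\n")
                return
            if self.path == "/":
                # One running VM name per line (hypothetical format).
                self._reply(200, "".join(name + "\n" for name in vms))
            elif self.path.startswith("/ping/"):
                name = self.path[len("/ping/"):]
                if name in vms:
                    self._reply(200, "running\n")
                else:
                    self._reply(404, "not running\n")
            else:
                self._reply(404, "unknown path\n")

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet

    return Handler


def main():
    # Port 8251 as in the mail; the bind address is a placeholder and would
    # normally be restricted to the private bridge.
    HTTPServer(("0.0.0.0", 8251), make_handler()).serve_forever()
```

Calling main() starts the listener. Only do_GET is defined, so http.server answers any other method with 501, which keeps the service read-only. An Investigator could then treat a 200 on /ping/i-2-6570-VM as "VM still running" and a 404 as "not running on this host".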