For me the most useful part is that I can take down the agent without
anything freaking out. It's fairly common to upgrade the agent (or
stop it for a minute or two) without putting the host into maintenance
and dealing with migrations. That's by far the most common reason for
the agent not running; I don't think I've seen it crash or become
unresponsive yet (knock on wood). This works fine now, but it
probably won't once the HA stuff is patched up, so something like this
would be nice. However, I assume the helper would be part of the agent
packaging, so whatever runs it would also be killed or upgraded
during such a procedure. I suppose it doesn't have to be like that,
but it seems like a bit of a juggling act to keep two services going
and make sure at least one is running at all times during upgrades.

On Wed, Aug 7, 2013 at 1:30 PM, Jörgen Maas <jorgen.m...@gmail.com> wrote:
> Why not use systemd and launchd facilities?
>
>
> On Wed, Aug 7, 2013 at 8:28 PM, Edison Su <edison...@citrix.com> wrote:
>>
>>
>>
>> > -----Original Message-----
>> > From: Wido den Hollander [mailto:w...@widodh.nl]
>> > Sent: Wednesday, August 07, 2013 10:53 AM
>> > To: dev@cloudstack.apache.org
>> > Cc: shadow...@gmail.com
>> > Subject: [KVM] Helper for agent during HA operations
>> >
>> > Hi,
>> >
>> > In our production setups we have seen some crashes of the KVM agent.
>>
>> If we can make sure the KVM agent is restarted immediately after a crash,
>> then you don't need another separate service running on your KVM host.
>> I'm not sure whether jsvc can automatically restart the agent. I remember
>> we have a small C daemon program in the 3.0.x source code, which can
>> monitor the agent.
>>
>> > This could happen for all kinds of reasons, but that's not what I wanted
>> > to discuss.
>> >
>> > Also see this issue:
>> > https://issues.apache.org/jira/browse/CLOUDSTACK-3954
>> >
>> > What I've been writing for a PoC in our company is a small helper
>> > written in Python which runs on port 8251.
>> >
>> > The Investigator can query this webservice (attached) which will simply
>> > tell it which VMs are running on that host.
>> >
>> > It's online here: http://stack01.ceph.widodh.nl:8251/
>> >
>> > You can also do a query like this:
>> > http://stack01.ceph.widodh.nl:8251/ping/i-2-6570-VM
>> >
>> > This way we can more reliably verify if a specific VM is still running
>> > if the Agent stops responding for some reason. An ICMP echo request
>> > isn't safe since the Security Groups could prevent ICMP from coming
>> > through.
>> >
>> > I'd rather not have the management server query libvirt directly, since
>> > that would open a potential security hole. This webservice is read-only
>> > and on my production setups I have libvirt listening on the private
>> > bridge only.
>> >
>> > What do you think?
>> >
>> > Wido
>
>
>
>
> --
> Grtz,
> Jörgen Maas
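
For readers without the attachment, here is a rough sketch of what a
read-only helper like the one Wido describes could look like. It is not
his code. The thread only gives the port (8251), the root listing and the
/ping/<vm-name> query; the JSON response format, the status codes, and the
names list_running_vms, handle_path, HelperHandler and make_server are all
assumptions made up for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def list_running_vms():
    """Hypothetical placeholder: return the names of VMs running on this host.

    A real helper would ask the local hypervisor; the thread does not
    show how, so this just returns a fixed example name.
    """
    return ["i-2-6570-VM"]


def handle_path(path, running_vms):
    """Map a request path to (HTTP status, JSON-serialisable body).

    The response shape is an assumption, not the original service's format.
    """
    parts = [p for p in path.split("/") if p]
    if not parts:
        # "/" lists every VM running on the host.
        return 200, {"running": sorted(running_vms)}
    if len(parts) == 2 and parts[0] == "ping":
        # "/ping/<vm-name>" reports whether one specific VM is running.
        name = parts[1]
        running = name in running_vms
        return (200 if running else 404), {"vm": name, "running": running}
    return 404, {"error": "unknown path"}


class HelperHandler(BaseHTTPRequestHandler):
    # Only GET is implemented, so the service stays read-only; the base
    # class answers other methods with 501.
    def do_GET(self):
        status, body = handle_path(self.path, list_running_vms())
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(host="127.0.0.1", port=8251):
    """Build the HTTP server; the bind address is the caller's choice."""
    return HTTPServer((host, port), HelperHandler)
```

Calling make_server().serve_forever() would start it. Keeping the path
handling in a plain function separate from the HTTP handler makes it easy
to check the routing without opening a socket. In a setup like the one
described above, the bind address would presumably be the host's private
interface rather than 127.0.0.1.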
