Good idea. If x and y and z are borked, initiate shutdown? More generally, it seems we need some form of in-VM automation that can coordinate with top-level orchestration.
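To make the "if x and y and z are borked" rule concrete, here is a rough Python sketch. Everything in it is hypothetical: `Policy`, `Action`, and `decide` are names made up for illustration, not existing CloudStack or monit code. It assumes the admin lists the services considered essential, and that something inside the VR already reports each service as up or down.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet


class Action(Enum):
    RESTART_SERVICE = "restart_service"  # handled locally inside the VR
    RESPIN_VR = "respin_vr"              # escalated to top-level orchestration


@dataclass(frozen=True)
class Policy:
    # Hypothetical admin-defined set of essential services. If every one
    # of them is down, the VR is treated as not worth repairing in place.
    critical: FrozenSet[str]


def decide(policy: Policy, status: Dict[str, bool]) -> Dict[str, Action]:
    """Return an action per failed service, or a single respin decision.

    `status` maps service name -> True if healthy, False if down.
    """
    down = {name for name, healthy in status.items() if not healthy}
    # "x and y and z are borked": all critical services are down together.
    if policy.critical and policy.critical <= down:
        return {"vr": Action.RESPIN_VR}
    # Otherwise restart each failed service individually.
    return {name: Action.RESTART_SERVICE for name in sorted(down)}
```

With `Policy(critical=frozenset({"dhcp", "dns", "haproxy"}))`, a VR where only DHCP is down gets a local restart, while one where all three are down is handed back to orchestration. How the respin decision actually reaches the management server is left open here.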
On 9/28/13 4:14 AM, "Daan Hoogland" <daan.hoogl...@gmail.com> wrote:
>Even when always restarting on every glitch we need to monitor the inside
>of the vr to know when to restart/respin a new vr. There is much
>functionality present on the vr an for us it is not possible to say for
>sure what is important to a customer installation so the admin should be
>able to define the minimal reqs that will stop us from spinning up a new
>vr. And there must be tools present for monitoring these reqs.
>
>makes sense?
>
>
>On Thu, Sep 26, 2013 at 10:01 PM, David Nalley <da...@gnsa.us> wrote:
>
>> For what it's worth we created an ACS-specific MIB (beneath the
>> org.apache MIB) so really this is just a matter of defining and
>> publishing it.
>>
>> But lets think about monit being used to restart services - with HA,
>> Redundant VR, are we sure that we want to inject yet another point of
>> control into things? Is it better to just respawn an instance since
>> they are essentially stateless? I don't know, but management server,
>> local daemons, and other SysVMs making decisions seems like we are
>> increasing complexity.
>>
>> --David
>>
>> On Thu, Sep 26, 2013 at 10:31 AM, Chiradeep Vittal
>> <chiradeep.vit...@citrix.com> wrote:
>> > In this case you would have to invent another enterprise MIB. Not too
>> > hard, but I'd argue that it needs to be proxied through some other
>> > service anyway and it represents a different integration point with
>> > ACS. Depends on whether you consider the system vm part of the ACS
>> > deployment, or an entity like a host.
>> >
>> > On 9/26/13 10:27 AM, "Alex Huang" <alex.hu...@citrix.com> wrote:
>> >
>> >>Using SNMP for alert notification is not a bad idea though. I don't
>> >>see why we can't do that instead of posting to the management server.
>> >>This is specifically referring to the second part of the proposal. Why
>> >>reinvent that part of it?
>> >>
>> >>--Alex
>> >>
>> >>> -----Original Message-----
>> >>> From: Chiradeep Vittal [mailto:chiradeep.vit...@citrix.com]
>> >>> Sent: Wednesday, September 25, 2013 10:28 PM
>> >>> To: dev@cloudstack.apache.org
>> >>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>> >>>
>> >>> SNMP wouldn't restart a failed process nor would it generate alerts.
>> >>> It is simply too generic for the requirements outlined here. The
>> >>> proposal does not talk about modifying monit, just using it. That
>> >>> wouldn't trigger the AGPL.
>> >>> I think the idea is to have a tight monitoring loop that scales: so
>> >>> executing the monitoring loop in-situ makes sense.
>> >>>
>> >>>
>> >>> On 9/25/13 9:53 PM, "David Nalley" <da...@gnsa.us> wrote:
>> >>>
>> >>> >On Wed, Sep 25, 2013 at 9:30 AM, Jayapal Reddy Uradi
>> >>> ><jayapalreddy.ur...@citrix.com> wrote:
>> >>> >> Hi,
>> >>> >>
>> >>> >> Currently in virtual router there is no way to recover and notify
>> >>> >> if some service goes down unexpectedly.
>> >>> >>
>> >>> >> This feature is about monitoring all the services rendered by the
>> >>> >> virtual router, ensure that the services are running through the
>> >>> >> life time of the VR.
>> >>> >>
>> >>> >> On service failure:
>> >>> >> 1. Generate an alert and event indicating failure
>> >>> >> 2. Restart the service
>> >>> >>
>> >>> >> Services to be monitored:
>> >>> >> DHCP, DNS, haproxy, password server etc.
>> >>> >>
>> >>> >> As part of monitoring there are two activities
>> >>> >>
>> >>> >> 1. One is monitoring the services in VR and log the events. Using
>> >>> >> monit for monitoring services
>> >>> >> 2. Second part is pushing alerts from router to MS server.
>> >>> >> Thinking on POST the logs to web server in MS.
>> >>> >>
>> >>> >> I will be updating more details and FS in this thread.
>> >>> >>
>> >>> >> I created enhancement bug for this.
>> >>> >> https://issues.apache.org/jira/browse/CLOUDSTACK-4736
>> >>> >>
>> >>> >> Thanks,
>> >>> >> Jayapal
>> >>> >
>> >>> >So several things - why not make this via SNMP? Query processes, and
>> >>> >many other things. This should be relatively simple, is well known,
>> >>> >can be locked down (or could be monitored for many other things by
>> >>> >external monitoring packages) and is the defacto standard for
>> >>> >monitoring hosts.
>> >>> >Second - monit is Affero GPL licensed - which is a cat-x license.
>> >>> >While I expect that we would merely use this and not do any hacking
>> >>> >on it - I think its inclusion might be a surprise (and forbidden in
>> >>> >many environments) to our users
>> >>> >
>> >>> >--David
>> >>
>> >
>>