This is a very useful feature. Can this be extended to the other system VMs (SSVM and CPVM)?
Based on the discussion I see that there is an assumption that restarting services/rebooting should fix the issues. Is that always true? What if the service fails to restart after repeated attempts? What is the fallback?

-Koushik

On 01-Oct-2013, at 3:15 AM, Chiradeep Vittal <[email protected]> wrote:

> Good idea. If x and y and z are borked, initiate shutdown?
>
> More generically, it seems we need some form of in-VM automation that can
> co-ordinate with top-level orchestration
>
> On 9/28/13 4:14 AM, "Daan Hoogland" <[email protected]> wrote:
>
>> Even when always restarting on every glitch we need to monitor the inside
>> of the vr to know when to restart/respin a new vr. There is much
>> functionality present on the vr and for us it is not possible to say for
>> sure what is important to a customer installation so the admin should be
>> able to define the minimal reqs that will stop us from spinning up a new
>> vr. And there must be tools present for monitoring these reqs.
>>
>> makes sense?
>>
>>
>> On Thu, Sep 26, 2013 at 10:01 PM, David Nalley <[email protected]> wrote:
>>
>>> For what it's worth we created an ACS-specific MIB (beneath the
>>> org.apache MIB) so really this is just a matter of defining and
>>> publishing it.
>>>
>>> But let's think about monit being used to restart services - with HA,
>>> Redundant VR, are we sure that we want to inject yet another point of
>>> control into things? Is it better to just respawn an instance since
>>> they are essentially stateless? I don't know, but management server,
>>> local daemons, and other SysVMs making decisions seems like we are
>>> increasing complexity.
>>>
>>> --David
>>>
>>> On Thu, Sep 26, 2013 at 10:31 AM, Chiradeep Vittal
>>> <[email protected]> wrote:
>>>> In this case you would have to invent another enterprise MIB. Not too
>>>> hard, but I'd argue that it needs to be proxied through some other service
>>>> anyway and it represents a different integration point with ACS. Depends
>>>> on whether you consider the system vm part of the ACS deployment, or an
>>>> entity like a host.
>>>>
>>>> On 9/26/13 10:27 AM, "Alex Huang" <[email protected]> wrote:
>>>>
>>>>> Using SNMP for alert notification is not a bad idea though. I don't see
>>>>> why we can't do that instead of posting to the management server. This
>>>>> is specifically referring to the second part of the proposal. Why
>>>>> reinvent that part of it?
>>>>>
>>>>> --Alex
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Chiradeep Vittal [mailto:[email protected]]
>>>>>> Sent: Wednesday, September 25, 2013 10:28 PM
>>>>>> To: [email protected]
>>>>>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>>>>>
>>>>>> SNMP wouldn't restart a failed process nor would it generate alerts. It
>>>>>> is simply too generic for the requirements outlined here. The proposal
>>>>>> does not talk about modifying monit, just using it. That wouldn't trigger
>>>>>> the AGPL.
>>>>>> I think the idea is to have a tight monitoring loop that scales: so
>>>>>> executing the monitoring loop in-situ makes sense.
>>>>>>
>>>>>>
>>>>>> On 9/25/13 9:53 PM, "David Nalley" <[email protected]> wrote:
>>>>>>
>>>>>>> On Wed, Sep 25, 2013 at 9:30 AM, Jayapal Reddy Uradi
>>>>>>> <[email protected]> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Currently in virtual router there is no way to recover and notify if
>>>>>>>> some service goes down unexpectedly.
>>>>>>>>
>>>>>>>> This feature is about monitoring all the services rendered by the
>>>>>>>> virtual router, ensure that the services are running through the
>>>>>>>> lifetime of the VR.
>>>>>>>>
>>>>>>>> On service failure:
>>>>>>>> 1. Generate an alert and event indicating failure
>>>>>>>> 2. Restart the service
>>>>>>>>
>>>>>>>> Services to be monitored:
>>>>>>>> DHCP, DNS, haproxy, password server etc.
>>>>>>>>
>>>>>>>> As part of monitoring there are two activities
>>>>>>>>
>>>>>>>> 1. One is monitoring the services in VR and log the events. Using
>>>>>>>> monit for monitoring services
>>>>>>>> 2. Second part is pushing alerts from router to MS server. Thinking on
>>>>>>>> POST the logs to web server in MS.
>>>>>>>>
>>>>>>>> I will be updating more details and FS in this thread.
>>>>>>>>
>>>>>>>> I created enhancement bug for this.
>>>>>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-4736
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jayapal
>>>>>>>
>>>>>>> So several things - why not make this via SNMP? Query processes, and
>>>>>>> many other things. This should be relatively simple, is well known, can
>>>>>>> be locked down (or could be monitored for many other things by external
>>>>>>> monitoring packages) and is the de facto standard for monitoring hosts.
>>>>>>> Second - monit is Affero GPL licensed - which is a cat-x license.
>>>>>>> While I expect that we would merely use this and not do any hacking on
>>>>>>> it - I think its inclusion might be a surprise (and forbidden in many
>>>>>>> environments) to our users
>>>>>>>
>>>>>>> --David
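
To make the fallback question at the top of this mail concrete, here is a minimal Python sketch of the policy being discussed: alert on failure, retry the restart a bounded number of times, and escalate if the service still does not come back. It is only an illustration, not how monit or the proposed feature works. The names ServiceMonitor, FlakyService, max_restarts and the "escalate" outcome are all hypothetical; what escalation should mean (respin the VR, fail over to the redundant VR, notify the MS) is exactly the open question in this thread.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ServiceMonitor:
    """Hypothetical in-VM monitor: restart a failed service, escalate after repeated failures."""
    name: str
    is_running: Callable[[], bool]   # health check supplied by the caller
    restart: Callable[[], None]      # restart action supplied by the caller
    max_restarts: int = 3            # assumed retry limit, not taken from the proposal
    events: List[str] = field(default_factory=list)

    def check(self) -> str:
        """Run one monitoring cycle and return 'ok', 'restarted' or 'escalate'."""
        if self.is_running():
            return "ok"
        self.events.append(f"ALERT: {self.name} is down")
        for attempt in range(1, self.max_restarts + 1):
            self.restart()
            if self.is_running():
                self.events.append(
                    f"EVENT: {self.name} restarted after {attempt} attempt(s)")
                return "restarted"
        # Restarting did not help: hand the decision to the orchestration layer.
        self.events.append(
            f"ALERT: {self.name} failed {self.max_restarts} restarts; escalating")
        return "escalate"


class FlakyService:
    """Stand-in for a real service: comes up only after `restarts_needed` restarts."""

    def __init__(self, restarts_needed: int) -> None:
        self.restarts_needed = restarts_needed
        self.restart_calls = 0

    def is_running(self) -> bool:
        return self.restart_calls >= self.restarts_needed

    def restart(self) -> None:
        self.restart_calls += 1


if __name__ == "__main__":
    svc = FlakyService(restarts_needed=5)  # never recovers within 3 attempts
    monitor = ServiceMonitor("haproxy", svc.is_running, svc.restart)
    print(monitor.check())
    for line in monitor.events:
        print(line)
```

The point of returning "escalate" instead of retrying forever is that the in-VM loop stays simple and the management server (or whatever sits above it) decides what the fallback is.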
