Jayapal, I have gone through the FS posted @ https://cwiki.apache.org/confluence/display/CLOUDSTACK/Monitoring+VR+services
Following are a few review comments:

1. The first line of the Introduction section says "Virtual router has running services which needs to run always until cloudsack disable it." What does "disabled by CloudStack" mean? If CloudStack disables a few services, how does the monitoring tool tell whether a service was disabled by the CloudStack admin or is down due to a failure?
2. Is monitoring of VR services optional, or will the services always be monitored? Is there any way to enable or disable this feature?
3. Is the service monitoring frequency configurable? If yes, how do we configure it? The FS says the default value is 5 secs.
4. The FS says monitoring VR services has two tasks:
   1. monitoring services in VR
   2. sending alerts from router to external receivers
   Which external receivers will we be supporting? Also, please specify all the ways in which the monitoring tool indicates a failure. Are we going to use the existing CloudStack Alerts and Events framework to indicate the failure?
5. If multiple instances of the same process are running, do we monitor all of them?
6. After how many restarts does the monitoring service decide that something is wrong with bringing the process up?
7. After N restarts, if the process is still not running, are we going to remove it from the list of monitored processes? If yes, how does the tool inform the admin that it is not able to restart the process? Or will it keep restarting the process forever?
8. Is there a way for the admin to tell the tool to monitor only particular services?
9. Apart from the dnsmasq, haproxy, sshd and apache webserver services, are we not monitoring the password service (socat)? The socat process is not mentioned in the Monitoring Services section of the FS.
10. Is this supported in the RVR case as well?
11. Please specify the hypervisors supported for this feature.
12. As per my understanding, this tool will be part of systemvm.iso. After an upgrade from a pre-4.3 release to 4.3, the ISO will be pushed to the hypervisors. So a stop and start of the VR is required for the existing VRs to get this service. Please confirm.
13. Please specify the expected date for confirming the scope of failure notifications. The scope is not clear from the "sending alerts from router" section in the FS.

Thanks,
Sanjeev

-----Original Message-----
From: John Kinsella [mailto:j...@stratosec.co]
Sent: Thursday, November 07, 2013 6:26 AM
To: <dev@cloudstack.apache.org>
Cc: <us...@cloudstack.apache.org>
Subject: Re: [PROPOSAL] Service monitoring tool in virtual router

Thx for putting this together, Jayapal. A few comments:

I'd really like to have a config flag to specify if things should be restarted automatically or not. Worst case, track the restarts - if a service is restarted more than X times in Y seconds, something's obviously wrong so stop tail-chasing[1]. Personally I'm much more interested in knowing there's a problem and then taking whatever happens to be the appropriate actions for our situation.

Regarding communicating with a monitoring system - what makes more sense to me is setting up a solid framework that provides folks flexibility to use various monitoring tools, from sending an email to contacting pager duty or whatever. So, to me there's 3 parts to that:

1) At VR creation, ACS calls a defined hook-script which knows how to contact the monitoring system to tell it about the system to monitor
2) At boot, VR sends an API query to which the mgmt server responds with a URL for an install script - VR runs that to download/setup the appropriate monitoring agent
3) VR has standardized scripts for the agent to call to find out what should be running, and then the agent can go check for itself.

With a setup like this, you can support SNMP, Opsview/Nagios, Monit, NSA, Zenoss, HPOV, Tivoli, etc etc etc. I'll happily write the Opsview/Nagios module (I'm thinking the module is hosted outside ACS, but I guess it could be a plugin - see earlier licensing points).

Thoughts? Just my 2c. Happy to tweak the wiki if folks lean towards this.
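
The restart-tracking idea in this message (stop restarting once a service has been restarted more than X times in Y seconds) could be sketched roughly as below. This is only an illustration and not part of the FS: the `RestartBudget` class, its parameter names, and the injectable `clock` are all hypothetical.

```python
from collections import deque
import time


class RestartBudget:
    """Allow at most max_restarts restarts within a sliding window_seconds window.

    Hypothetical helper for illustration; the name and API are assumptions.
    """

    def __init__(self, max_restarts, window_seconds, clock=time.monotonic):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the logic can be tested
        self._stamps = deque()      # times of the restarts still inside the window

    def allow_restart(self):
        now = self.clock()
        # Forget restarts that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_restarts:
            # Budget exhausted: the caller should stop restarting and alert instead.
            return False
        self._stamps.append(now)
        return True
```

A monitor loop would call `allow_restart()` before each restart attempt and raise an alert, rather than restart again, once it returns `False`.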
John

1: Aside - this applies to SSVM creation currently - that hamster[2] keeps trying to spin that create SSVM wheel..
2: Apache CloudHamster, CloudMonkey's furry monitoring friend?

On Nov 6, 2013, at 7:58 AM, Jayapal Reddy Uradi <jayapalreddy.ur...@citrix.com> wrote:

> Please find the updated FS below:
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Monitoring+VR+services
>
> Thanks,
> Jayapal
>
> On 05-Oct-2013, at 6:54 PM, Santhosh Edukulla <santhosh.eduku...@citrix.com>
> wrote:
>
>> A shell script can be used. A few thoughts below:
>>
>> 1. Collect the process IDs of all the daemons you want to monitor using
>> the "pidof" command, and then use the "kill" command to check whether
>> each PID you got is valid. Using kill we can send signal 0, then check
>> the status using echo $? . For sending a notification, use the Linux
>> syslog call (man 3 syslog) or the "logger" command to send to syslog. If
>> you want to send email, then you may also have to check that the
>> firewall is not blocking outbound SMTP port communication. The same
>> holds for SNMP (I mean, if firewall rules block it). Using syslog may be
>> good, as it exposes various debug log levels by default through its API
>> call.
>>
>> Now, to keep the monitor script always up and running: run the monitor
>> script through cron or at, at regular/scheduled intervals. This way,
>> even if the monitor script goes down, it is up again at the next
>> interval.
>>
>> There is a catch with this, though: we may get multiple PIDs for a given
>> daemon if multiple instances are spawned by the same or by multiple
>> applications. If this scenario is not common then it's OK; otherwise we
>> may have to track it differently, maintaining the state of each spawned
>> daemon and checking whether it exists. If multiple applications launch
>> the same daemon, you may also want to report which application got
>> killed. Example: A launched httpd and, in its exit logic, kills all the
>> daemons it launched; then you may want to report that A is not
>> available, rather than just that httpd is not available.
>>
>>
>> 2. Using the netstat command: check for available, listening and active
>> ports on the local host, provided all the daemons you want to monitor
>> are running on "standard" ports or we know the listening ports of the
>> daemons to be monitored. Again, this script can be scheduled through
>> cron/at to run every x units, so if it gets killed, the monitor script
>> is up again after the next x units.
>>
>> Also, there could be many other approaches as well.
>>
>>
>> Thanks!
>> Santhosh
>> ________________________________________
>> From: Jayapal Reddy Uradi [jayapalreddy.ur...@citrix.com]
>> Sent: Saturday, October 05, 2013 5:17 AM
>> To: <dev@cloudstack.apache.org>
>> Cc: <us...@cloudstack.apache.org>
>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>
>> Hi,
>>
>> +users list
>> If anyone is already using any tools for monitoring, then please share
>> your ideas.
>> Also share the cases where you experienced service crashes.
>>
>> Thanks,
>> Jayapal
>>
>> On 05-Oct-2013, at 4:12 AM, Chiradeep Vittal <chiradeep.vit...@citrix.com>
>> wrote:
>>
>>> Well just make sure that your script is resilient to its own crashes
>>> as well.
>>>
>>> On 10/4/13 1:59 AM, "Jayapal Reddy Uradi"
>>> <jayapalreddy.ur...@citrix.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am planning to write a script utility to monitor processes and
>>>> restart them in the event of failure. It will also log the events.
>>>>
>>>> Thanks,
>>>> Jayapal
>>>>
>>>> On 02-Oct-2013, at 3:25 AM, Simon Weller <swel...@ena.com> wrote:
>>>>
>>>>> supervisord maybe?
>>>>>
>>>>> ----- Original Message -----
>>>>>
>>>>> From: "Chiradeep Vittal" <chiradeep.vit...@citrix.com>
>>>>> To: dev@cloudstack.apache.org
>>>>> Sent: Tuesday, October 1, 2013 4:45:56 PM
>>>>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>>>>
>>>>> Got it. Any other OSS tool out there similar to monit?
>>>>>
>>>>> On 10/1/13 8:24 AM, "David Nalley" <da...@gnsa.us> wrote:
>>>>>
>>>>>> On Thu, Sep 26, 2013 at 1:27 AM, Chiradeep Vittal
>>>>>> <chiradeep.vit...@citrix.com> wrote:
>>>>>>> SNMP wouldn't restart a failed process nor would it generate alerts.
>>>>>>> It is simply too generic for the requirements outlined here. The
>>>>>>> proposal does not talk about modifying monit, just using it.
>>>>>>> That wouldn't trigger the AGPL.
>>>>>>
>>>>>> Let me restate my objection to anything AGPL.
>>>>>> People are largely comfortable with GPLv2 software - Linux is
>>>>>> ubiquitous. Many legal departments routinely prohibit GPLv3
>>>>>> software (we actually saw this when CS was GPLv3 licensed.) But
>>>>>> the Affero GPL license is anathema in many corporate
>>>>>> environments, and by forcing it on folks in the default System VM
>>>>>> I fear it will hurt adoption of CloudStack.
>>>>>>
>>>>>> --David
>>>>>
>>>>>
>>>>
>>>
>>
>
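
Santhosh's first suggestion in this thread (look up PIDs with pidof, probe each one with signal 0, report failures to syslog) might look roughly like the sketch below. It is written in Python rather than shell so the logic can be exercised directly, and it is not the FS implementation. It assumes a Linux system where the `pidof` command is available; the function names `pids_of`, `pid_alive`, `report_to_syslog` and `check_service` are hypothetical.

```python
import os
import subprocess
import syslog


def pids_of(name):
    """Return the PIDs printed by `pidof name` (empty list if none, or if pidof is missing)."""
    try:
        out = subprocess.run(["pidof", name], capture_output=True, text=True)
    except FileNotFoundError:
        return []
    return [int(p) for p in out.stdout.split()]


def pid_alive(pid):
    """Probe a PID with signal 0: nothing is delivered, only existence is checked."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False   # no such process
    except PermissionError:
        return True    # process exists but belongs to another user
    return True


def report_to_syslog(message):
    """Send a failure message to the local syslog at error level."""
    syslog.syslog(syslog.LOG_ERR, message)


def check_service(name, find_pids=pids_of, report=report_to_syslog):
    """Return the live PIDs for a service; report a failure if there are none."""
    live = [pid for pid in find_pids(name) if pid_alive(pid)]
    if not live:
        report(f"{name} is not running")
    return live
```

Running something like `check_service("dnsmasq")` from cron at a fixed interval would match the scheduling approach described in the thread. Note that this treats a service as up if any instance is alive, so the multiple-PID catch Santhosh mentions is not handled here.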