Hi Sanjeev,

Thanks for your comments. Please find my replies inline. I will also update the FS.

Thanks,
Jayapal

On 07-Nov-2013, at 11:55 AM, Sanjeev Neelarapu <sanjeev.neelar...@citrix.com> wrote:

> Jayapal,
>
> I have gone through the FS posted @
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Monitoring+VR+services
>
> Following are a few review comments:
>
> 1. The first line in the Introduction section says "Virtual router has running
> services which needs to run always until cloudsack disable it." What is the
> meaning of "disable by CloudStack"? If CloudStack disables a few services, how
> does the monitoring tool differentiate whether a service was disabled by the
> CloudStack admin or is down due to some failure?

It means the services should run until CloudStack instructs them to stop.
Services are enabled or disabled through the network offering. On VR boot,
the monitor configuration is updated with the new services. There are
default services as well.

> 2. Is monitoring of VR services optional, or will they always be monitored?
> Is there any way to set whether this feature is enabled or not?

Currently it is not configurable. By default, the default services such as
sshd and the web server are monitored.

> 3. Is the service monitoring frequency configurable? If yes, how do we
> configure it? The FS says the default value is 5 secs.

No.

> 4. The FS says monitoring VR services has two tasks:
>    1. monitoring services in VR
>    2. sending alerts from router to external receivers
> What external receivers will we be supporting? Also, please specify all
> the ways in which the monitoring tool indicates a failure. Are we going to use
> the existing CloudStack Alerts and Events framework to indicate the failure?

This item will be updated once the approach for sending alerts from the VR
is finalised.

> 5. If multiple instances of the same process are running, do we monitor
> all the instances of the same process?

It monitors the parent service, whose PID is in the pid file.

> 6. After how many restarts does the monitoring service decide that something
> is wrong with the process in bringing it up?

Five.

> 7. After N restarts, if the process is still not running, are we going
> to remove it from the list of monitored processes? If yes, how does the tool
> inform the admin that it is not able to restart the process? Or will it be
> restarting the process forever?

Unmonitoring a process after N retries is not implemented. The monitor logs
the service failure, so the admin can learn of it only from the logs. For
this release, sending alerts from the VR is not implemented.

> 8. Is there a way for the admin to tell the tool to monitor only
> particular services?

Currently the services are selected based on the network offering, plus the
default services from the DB. Configuring services from the API/UI is not
supported.

> 9. Apart from the dnsmasq, haproxy, sshd and Apache web server services, are
> we not monitoring the password service (socat)? The socat process is not
> mentioned in the Monitoring Services section in the FS.

socat is not monitored because it is automatically restarted by the
password server.

> 10. Is this supported in the RVR case as well?

No.

> 11. Specify the hypervisors supported for this feature?

Xen, KVM and VMware.

> 12. As per my understanding, this tool will be part of systemvm.iso. After an
> upgrade from a pre-4.3 release to 4.3, the iso will be pushed to the
> hypervisors. So a stop and start of the VR is required for the existing VRs
> to get this service. Please confirm.

Yes.

> 13. Please specify the expected date for confirming the scope for failure
> notifications. The scope is not clear from the "sending alerts from router"
> section in the FS.
>
> Thanks,
> Sanjeev
>
> -----Original Message-----
> From: John Kinsella [mailto:j...@stratosec.co]
> Sent: Thursday, November 07, 2013 6:26 AM
> To: <dev@cloudstack.apache.org>
> Cc: <us...@cloudstack.apache.org>
> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>
> Thx for putting this together, Jayapal. A few comments:
>
> I'd really like to have a config flag to specify if things should be
> restarted automatically or not.
> Worst case, track the restarts - if a service
> is restarted more than X times in Y seconds, something's obviously wrong so
> stop tail-chasing[1]. Personally I'm much more interested in knowing there's
> a problem and then taking whatever happens to be the appropriate actions for
> our situation.
>
> Regarding communicating with a monitoring system - what makes more sense to
> me is setting up a solid framework that provides folks flexibility to use
> various monitoring tools, from sending an email to contacting pager duty or
> whatever.
>
> So, to me there's 3 parts to that:
> 1) At VR creation, ACS calls defined hook-script which knows how to contact
> monitoring system to tell it about system to monitor
> 2) At boot, VR sends API query to which the mgmt server responds with a URL
> for an install script - VR runs that to download/setup appropriate monitoring
> agent
> 3) VR has standardized scripts for agent to call to find out what should be
> running, and then agent can go check for itself.
>
> With a setup like this, you can support SNMP, Opsview/Nagios, Monit, NSA,
> Zenoss, HPOV, Tivoli, etc etc etc. I'll happily write the Opsview/Nagios
> module (I'm thinking module is hosted outside ACS, but I guess it could be a
> plugin - see earlier licensing points).
>
> Thoughts?
>
> Just my 2c. Happy to tweak wiki if folks lean towards this.
>
> John
> 1: Aside - this applies to SSVM creation currently - that hamster[2] keeps
> trying to spin that create SSVM wheel..
> 2: Apache CloudHamster, CloudMonkey's furry monitoring friend?
>
> On Nov 6, 2013, at 7:58 AM, Jayapal Reddy Uradi
> <jayapalreddy.ur...@citrix.com> wrote:
>
>> Please find the updated FS below:
>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Monitoring+VR+services
>>
>> Thanks,
>> Jayapal
>>
>> On 05-Oct-2013, at 6:54 PM, Santhosh Edukulla <santhosh.eduku...@citrix.com>
>> wrote:
>>
>>> A shell script can be used. A few thoughts below:
>>>
>>> 1. Collect the process ids of all daemons you want to monitor using the
>>> "pidof" command and then use the "kill" command to check if the pid you got
>>> is valid. Using kill we can send signal 0, then check the status using
>>> echo $? . For sending a notification use the Linux syslog call (man 3 syslog)
>>> or the "logger" command to send to syslog. If you want to send email then you
>>> may also have to check that the firewall is not blocking outbound SMTP port
>>> communication. The same holds for SNMP (I mean if anything is blocked by
>>> firewall rules). Using syslog may be good as it by default exposes
>>> various debug log levels through its API call.
>>>
>>> Now, to keep the monitor script always up and running: run the monitor
>>> script continuously through cron or at, at regular/scheduled intervals.
>>> This way, even if the monitor script goes down, it is up again at the next
>>> interval.
>>>
>>> With this there is a catch though: we may get multiple pids for a given
>>> daemon if there are multiple daemons spawned by the same/multiple
>>> applications. If this scenario is not common then it's ok; otherwise we may
>>> have to track it differently, maintaining the state of each spawned daemon
>>> and seeing if it exists. If multiple applications launch the same daemon,
>>> you may also want to report which application got killed. EX: A launched
>>> httpd, and during its exit logic it kills all daemons it launched; then you
>>> may want to report that A is not available, rather than just that httpd is
>>> not available.
>>>
>>>
>>> 2. Using the netstat command: check for available, listening and active
>>> ports on the local host, provided all the daemons you want to monitor are
>>> running on "standard" ports or we know the listening ports of the
>>> daemons to be monitored. Again, this script can be scheduled through cron/at
>>> to run every x units; if it gets killed, the monitor script is up again
>>> after the next x units.
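
The two checks described above (probing a pid with signal 0, and checking that a port is listening) might be sketched as follows. This is only an illustration in Python rather than shell, not the script used in the VR; the function names, the pid-file layout and the loopback address are assumptions made for the example.

```python
import os
import socket


def pid_is_alive(pid: int) -> bool:
    """Probe a pid with signal 0: nothing is delivered, only existence is checked."""
    if pid <= 0:
        return False              # 0 and negative values address process groups
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False              # no such process
    except PermissionError:
        return True               # process exists but is owned by another user
    return True


def read_pid(pid_file: str):
    """Return the pid stored in pid_file, or None if it is missing or malformed."""
    try:
        with open(pid_file) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def service_is_up(pid_file: str) -> bool:
    """A service counts as up when its pid file names a live process."""
    pid = read_pid(pid_file)
    return pid is not None and pid_is_alive(pid)


def port_is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A scheduled job could call `service_is_up()` for each pid file it knows about and write a syslog message (for example through the `logger` command mentioned above) whenever it returns False.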
>>>
>>> Also, there could be many other approaches as well.
>>>
>>>
>>> Thanks!
>>> Santhosh
>>> ________________________________________
>>> From: Jayapal Reddy Uradi [jayapalreddy.ur...@citrix.com]
>>> Sent: Saturday, October 05, 2013 5:17 AM
>>> To: <dev@cloudstack.apache.org>
>>> Cc: <us...@cloudstack.apache.org>
>>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>>
>>> Hi,
>>>
>>> +users list
>>> If anyone is already using any tools for monitoring then please share your
>>> ideas.
>>> Also share the cases where you experienced service crashes.
>>>
>>> Thanks,
>>> Jayapal
>>>
>>> On 05-Oct-2013, at 4:12 AM, Chiradeep Vittal <chiradeep.vit...@citrix.com>
>>> wrote:
>>>
>>>> Well just make sure that your script is resilient to its own crashes
>>>> as well.
>>>>
>>>> On 10/4/13 1:59 AM, "Jayapal Reddy Uradi"
>>>> <jayapalreddy.ur...@citrix.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am planning to write a script utility to monitor processes and
>>>>> restart them in the event of failure. It will also log the events.
>>>>>
>>>>> Thanks,
>>>>> Jayapal
>>>>>
>>>>> On 02-Oct-2013, at 3:25 AM, Simon Weller <swel...@ena.com> wrote:
>>>>>
>>>>>> supervisord maybe?
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>
>>>>>> From: "Chiradeep Vittal" <chiradeep.vit...@citrix.com>
>>>>>> To: dev@cloudstack.apache.org
>>>>>> Sent: Tuesday, October 1, 2013 4:45:56 PM
>>>>>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>>>>>
>>>>>> Got it. Any other OSS tool out there similar to monit?
>>>>>>
>>>>>> On 10/1/13 8:24 AM, "David Nalley" <da...@gnsa.us> wrote:
>>>>>>
>>>>>>> On Thu, Sep 26, 2013 at 1:27 AM, Chiradeep Vittal
>>>>>>> <chiradeep.vit...@citrix.com> wrote:
>>>>>>>> SNMP wouldn't restart a failed process nor would it generate alerts.
>>>>>>>> It is simply too generic for the requirements outlined here. The
>>>>>>>> proposal does not talk about modifying monit, just using it.
>>>>>>>> That wouldn't trigger the AGPL.
>>>>>>>
>>>>>>> Let me restate my objection to anything AGPL.
>>>>>>> People are largely comfortable with GPLv2 software - Linux is
>>>>>>> ubiquitous. Many legal departments routinely prohibit GPLv3
>>>>>>> software (we actually saw this when CS was GPLv3 licensed.) But
>>>>>>> the Affero GPL license is anathema in many corporate
>>>>>>> environments, and by forcing it on folks in the default System VM
>>>>>>> I fear it will hurt adoption of CloudStack.
>>>>>>>
>>>>>>> --David
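
John's suggestion earlier in this thread (stop restarting once a service has been restarted more than X times in Y seconds) could be sketched roughly as below. This is a minimal illustration and not the behaviour of the VR monitor; the class name `RestartBudget`, its parameters and the 60-second window in the usage note are assumptions. Only the limit of five restarts comes from the replies above.

```python
import time
from collections import deque


class RestartBudget:
    """Allow at most max_restarts restarts within any window_seconds interval."""

    def __init__(self, max_restarts: int, window_seconds: float, clock=time.monotonic):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._clock = clock          # injectable so the logic can be tested without sleeping
        self._stamps = deque()       # times of the restarts still inside the window

    def allow_restart(self) -> bool:
        """Record a restart and return True, or return False if the budget is spent."""
        now = self._clock()
        # Forget restarts that have aged out of the sliding window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_restarts:
            return False             # too many recent restarts: stop and alert instead
        self._stamps.append(now)
        return True
```

A monitor loop could keep one `RestartBudget(5, 60)` per service and, when `allow_restart()` returns False, log the failure and leave the service down instead of restarting it again.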