Re: [PROPOSAL] Service monitoring tool in virtual router

Jayapal Reddy Uradi Thu, 07 Nov 2013 04:11:05 -0800

Hi Sanjeev,

Thanks for your comments.
Please find my replies inline.
I have also updated the FS.


Thanks,
Jayapal

On 07-Nov-2013, at 11:55 AM, Sanjeev Neelarapu <sanjeev.neelar...@citrix.com>
 wrote:

> Jayapal,
> 
> I have gone through the FS posted @ 
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Monitoring+VR+services
> 
> Following are the few review comments:
> 
> 1.    First line in the Introduction section says "Virtual router has running 
> services which needs to run always until cloudsack disable it." What is the 
> meaning of "disable by CloudStack"? If CloudStack disables a few services, 
> how does the monitoring tool tell whether a service was disabled by the 
> CloudStack admin or is down due to some failure?
It means the services should keep running until CloudStack instructs them to 
stop. Services are enabled or disabled through the network offering. On VR 
boot, the monitor configuration is updated with the new services. There are 
also default services.
> 2.    Is monitoring of VR services optional, or will they always be 
> monitored? Is there any way to enable or disable this feature?
Currently it is not configurable. By default it monitors the default services, 
such as sshd and the web server.
> 3.    Is service monitoring frequency configurable? If yes how do we 
> configure? FS says the default value is 5 secs.
No.
> 4.    FS says monitoring VR services has two tasks.
> 1.    monitoring services in VR
> 2.    sending alerts from router to external receivers
> Which external receivers will we support? Please also specify all the ways 
> the monitoring tool indicates a failure. Are we going to use the existing 
> CloudStack Alerts and Events framework to indicate the failure?
This item will be updated once the approach for sending alerts from the VR is 
finalised.
> 5.    If multiple instances of the same processes are running do we monitor 
> all the instances of the same process?
It monitors the parent service process, whose pid is recorded in the pid file.
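
A pid-file check of this kind could look roughly like the sketch below. It 
is only an illustration in Python, not the monitoring script itself: the 
function names are made up for the example, and it uses the "signal 0" 
existence test that Santhosh describes further down the thread.

```python
import os


def pid_from_file(pid_file):
    """Return the PID stored in pid_file, or None if it is missing or malformed."""
    try:
        with open(pid_file) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def is_running(pid_file):
    """Report whether the process named in pid_file currently exists."""
    pid = pid_from_file(pid_file)
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return False  # no such process: the pid file is stale
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

For example, is_running("/var/run/sshd.pid") would report on sshd, assuming 
that is where its pid file lives on the VR (the path is a guess). Only the 
process named in the pid file is checked, so other instances of the same 
daemon are not covered.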
> 6.    After how many restarts the monitoring service decides that something 
> is wrong with the process in bringing it up?
five
> 7.    After N no.of restarts if the process is still not running are we going 
> to remove it from the monitoring processes list? If yes how the tools informs 
> the admin that it is not able to restart the process? Or it will be 
> restarting the process forever?
Removing a process from monitoring after N retries is not implemented. 
The monitor logs the service failure, so the admin can learn of it only from 
the logs. 
Sending alerts from the VR is not implemented in this release.
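
For discussion, a restart limit could be tracked along the lines of the 
sketch below. It is not what the tool does today. It combines the limit of 
five from item 6 with John's "more than X times in Y seconds" suggestion 
quoted further down; the class name and the 60-second window are assumptions 
made up for the example.

```python
from collections import deque


class RestartTracker:
    """Allow at most max_restarts restarts within window_seconds (hypothetical helper)."""

    def __init__(self, max_restarts=5, window_seconds=60.0):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._attempts = deque()  # timestamps of recent restarts, oldest first

    def allow_restart(self, now):
        """Record a restart at time `now` and return True, or return False if over the limit."""
        # Forget attempts that have aged out of the window.
        while self._attempts and now - self._attempts[0] >= self.window_seconds:
            self._attempts.popleft()
        if len(self._attempts) >= self.max_restarts:
            return False  # too many recent restarts: stop and just report
        self._attempts.append(now)
        return True
```

The caller would pass a clock reading such as time.monotonic() as `now`, and 
would log the failure instead of restarting whenever allow_restart returns 
False.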
> 8.    Is there way for the admin to specify the tool to monitor only 
> particular services?
Currently the services are selected based on the network offering and the 
default services from the DB.
Configuring services through the API/UI is not supported.
> 9.    Apart from dnsmasq,haproxy,sshd,apache webserver services are we not 
> monitoring the password service(socat)? Socat process is not mentioned in the 
> Monitoring Services section in the FS
socat is not monitored because it is automatically restarted by the password 
server.
> 10.   Is this supported in RVR case as well?
No.
> 11.   Specify the hypervisors supported for this feature?
Xen, KVM, and VMware.
> 12.   As per my understanding this tool will be part of systemvm.iso. After 
> upgrade from pre 4.3 release to 4.3 iso will be pushed to the hypervisors. So 
> stop, start VR is required for the existing VRs to get this service. Please 
> confirm.
yes
> 13.   Please specify the expected date for confirming the scope for failure 
> notifications. Scope is not clear from "sending alerts from router" section 
> in FS
> 
> Thanks,
> Sanjeev
> 
> -----Original Message-----
> From: John Kinsella [mailto:j...@stratosec.co] 
> Sent: Thursday, November 07, 2013 6:26 AM
> To: <dev@cloudstack.apache.org>
> Cc: <us...@cloudstack.apache.org>
> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
> 
> Thx for putting this together, Jayapal. A few comments:
> 
> I'd really like to have a config flag to specify if things should be 
> restarted automatically or not. Worst case, track the restarts - if a service 
> is restarted more than X times in Y seconds, something's obviously wrong so 
> stop tail-chasing[1]. Personally I'm much more interested in knowing there's 
> a problem and then taking whatever happens to be the appropriate actions for 
> our situation.
> 
> Regarding communicating with a monitoring system - what makes more sense to 
> me is setting up a solid framework that provides folks flexibility to use 
> various monitoring tools, from sending an email to contacting pager duty or 
> whatever.
> 
> So, to me there's 3 parts to that:
> 1) At VR creation, ACS calls defined hook-script which knows how to contact 
> monitoring system to tell it about system to monitor
> 2) At boot, VR sends API query to which the mgmt server responds with a URL 
> for an install script - VR runs that to download/setup appropriate monitoring 
> agent
> 3) VR has standardized scripts for agent to call to find out what should be 
> running, and then agent can go check for itself.
> 
> With a setup like this, you can support SNMP, Opsview/Nagios, Monit, NSA, 
> Zenoss, HPOV, Tivoli, etc etc etc. I'll happily write the Opsview/Nagios 
> module (I'm thinking module is hosted outside ACS, but I guess it could be a 
> plugin - see earlier licensing points).
> 
> Thoughts?
> 
> Just my 2c. Happy to tweak wiki if folks lean towards this.
> 
> John
> 1: Aside - this applies to SSVM creation currently - that hamster[2] keeps 
> trying to spin that create SSVM wheel..
> 2: Apache CloudHamster, CloudMonkey's furry monitoring friend?
> 
> On Nov 6, 2013, at 7:58 AM, Jayapal Reddy Uradi 
> <jayapalreddy.ur...@citrix.com> wrote:
> 
>> Please find the updated FS below:
>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Monitoring+VR+services
>> 
>> Thanks,
>> Jayapal
>> 
>> On 05-Oct-2013, at 6:54 PM, Santhosh Edukulla <santhosh.eduku...@citrix.com> 
>> wrote:
>> 
>>> A shell script can be used. Few thoughts below:
>>> 
>>> 1. Collect the process IDs of all daemons you want to monitor using the 
>>> "pidof" command, then use the "kill" command to check whether each pid is 
>>> valid. With kill we can send signal 0 and check the exit status using 
>>> echo $? . To send a notification, use the Linux syslog call (man 3 syslog) 
>>> or the "logger" command to write to syslog. If you want to send email, you 
>>> may also have to check that the firewall is not blocking outbound SMTP 
>>> traffic. The same holds for SNMP (I mean, if firewall rules block it). 
>>> Using syslog may be good, as its API call exposes various log levels by 
>>> default.
>>> 
>>> Now, to keep the monitor script always up and running: run the monitor 
>>> script through cron or at, at regular scheduled intervals. This way, even 
>>> if the monitor script goes down, it is up again at the next interval. 
>>> 
>>> There is a catch with this, though: we may get multiple pids for a given 
>>> daemon if multiple instances are spawned by the same or by multiple 
>>> applications. If this scenario is not common then it's OK; otherwise we may 
>>> have to track it differently, maintaining the state of each spawned daemon 
>>> and checking whether it exists. If multiple applications launch the same 
>>> daemon, you may also want to report which application got killed. Example: 
>>> A launched httpd, and during its exit logic it kills all the daemons it 
>>> launched; then you may want to report that A is not available, rather than 
>>> just that httpd is not available. 
>>> 
>>> 
>>> 2. Using the netstat command: check for available, listening, and active 
>>> ports on the local host, provided all the daemons you want to monitor are 
>>> running on "standard" ports, or we know the listening ports of the daemons 
>>> to be monitored. Again, this script can be scheduled through cron or at to 
>>> run every x units; if it gets killed, the monitor script is up again after 
>>> the next x units. 
>>> 
>>> Also, there could be many other approaches as well.
>>> 
>>> 
>>> Thanks!
>>> Santhosh
>>> ________________________________________
>>> From: Jayapal Reddy Uradi [jayapalreddy.ur...@citrix.com]
>>> Sent: Saturday, October 05, 2013 5:17 AM
>>> To: <dev@cloudstack.apache.org>
>>> Cc: <us...@cloudstack.apache.org>
>>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>> 
>>> Hi,
>>> 
>>> +users list
>>> If any one is already using any tools for monitoring then please share your 
>>> ideas.
>>> Also share the cases where you experienced service crashes.
>>> 
>>> Thanks,
>>> Jayapal
>>> 
>>> On 05-Oct-2013, at 4:12 AM, Chiradeep Vittal <chiradeep.vit...@citrix.com> 
>>> wrote:
>>> 
>>>> Well just make sure that your script is resilient to its own crashes 
>>>> as well.
>>>> 
>>>> On 10/4/13 1:59 AM, "Jayapal Reddy Uradi" 
>>>> <jayapalreddy.ur...@citrix.com>
>>>> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I am planning to write a script utility to monitor processes and 
>>>>> restart them in the event of failure. It will also log the events.
>>>>> 
>>>>> Thanks,
>>>>> Jayapal
>>>>> 
>>>>> On 02-Oct-2013, at 3:25 AM, Simon Weller <swel...@ena.com> wrote:
>>>>> 
>>>>>> supervisord maybe?
>>>>>> 
>>>>>> ----- Original Message -----
>>>>>> 
>>>>>> From: "Chiradeep Vittal" <chiradeep.vit...@citrix.com>
>>>>>> To: dev@cloudstack.apache.org
>>>>>> Sent: Tuesday, October 1, 2013 4:45:56 PM
>>>>>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>>>>> 
>>>>>> Got it. Any other OSS tool out there similar to monit?
>>>>>> 
>>>>>> On 10/1/13 8:24 AM, "David Nalley" <da...@gnsa.us> wrote:
>>>>>> 
>>>>>>> On Thu, Sep 26, 2013 at 1:27 AM, Chiradeep Vittal 
>>>>>>> <chiradeep.vit...@citrix.com> wrote:
>>>>>>>> SNMP wouldn't restart a failed process nor would it generate alerts. 
>>>>>>>> It is simply too generic for the requirements outlined here. The 
>>>>>>>> proposal does not talk about modifying monit, just using it. 
>>>>>>>> That wouldn't trigger the AGPL.
>>>>>>> 
>>>>>>> Let me restate my objection to anything AGPL.
>>>>>>> People are largely comfortable with GPLv2 software - Linux is 
>>>>>>> ubiquitous. Many legal departments routinely prohibit GPLv3 
>>>>>>> software (we actually saw this when CS was GPLv3 licensed.) But 
>>>>>>> the Affero GPL license is anathema in many corporate 
>>>>>>> environments, and by forcing it on folks in the default System VM 
>>>>>>> I fear it will hurt adoption of CloudStack.
>>>>>>> 
>>>>>>> --David
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>
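
The port-based check that Santhosh suggests in the quoted thread (his 
approach 2) could be sketched as below. Note the substitution: instead of 
parsing netstat output, this version probes the port with a TCP connect. The 
function name, the default host and the timeout are assumptions chosen for 
illustration, and the check only covers services that listen on TCP.

```python
import socket


def is_listening(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0
```

For example, is_listening(22) would report whether sshd is accepting 
connections, assuming it runs on its standard port. Like the pid check, this 
would have to be run periodically, for instance from cron.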
