RE: [PROPOSAL] Service monitoring tool in virtual router

Sanjeev Neelarapu Wed, 06 Nov 2013 22:26:54 -0800

Jayapal,

I have gone through the FS posted @ 
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Monitoring+VR+services

Following are a few review comments:

1.      The first line in the Introduction section says "Virtual router has 
running services which needs to run always until cloudsack disable it." What 
does "disabled by CloudStack" mean? If CloudStack disables a few services, how 
does the monitoring tool tell whether a service was disabled by the CloudStack 
admin or is down due to some failure?
2.      Is monitoring of VR services optional, or will they always be 
monitored? Is there a way to enable or disable this feature?
3.      Is the service monitoring frequency configurable? If yes, how do we 
configure it? The FS says the default value is 5 secs.
4.      The FS says monitoring VR services has two tasks:
        1.      monitoring services in the VR
        2.      sending alerts from the router to external receivers
Which external receivers will we support? Also, please specify all the ways in 
which the monitoring tool indicates a failure. Are we going to use the existing 
CloudStack Alerts and Events framework to indicate the failure?
5.      If multiple instances of the same process are running, do we monitor 
all of them?
6.      After how many restarts does the monitoring service decide that 
something is wrong with bringing the process up?
7.      If the process is still not running after N restarts, are we going to 
remove it from the list of monitored processes? If yes, how does the tool 
inform the admin that it is not able to restart the process? Or will it keep 
restarting the process forever?
8.      Is there a way for the admin to tell the tool to monitor only 
particular services?
9.      Apart from the dnsmasq, haproxy, sshd and apache webserver services, 
are we not monitoring the password service (socat)? The socat process is not 
mentioned in the Monitoring Services section of the FS.
10.     Is this supported in the RVR case as well?
11.     Which hypervisors are supported for this feature?
12.     As per my understanding, this tool will be part of systemvm.iso. After 
an upgrade from a pre-4.3 release to 4.3, the iso will be pushed to the 
hypervisors, so a stop and start of the VR is required for the existing VRs to 
get this service. Please confirm.
13.     Please specify the expected date for confirming the scope of failure 
notifications. The scope is not clear from the "sending alerts from router" 
section in the FS.

Thanks,
Sanjeev

-----Original Message-----
From: John Kinsella [mailto:[email protected]] 
Sent: Thursday, November 07, 2013 6:26 AM
To: <[email protected]>
Cc: <[email protected]>
Subject: Re: [PROPOSAL] Service monitoring tool in virtual router

Thx for putting this together, Jayapal. A few comments:

I'd really like to have a config flag to specify if things should be restarted 
automatically or not. Worst case, track the restarts - if a service is 
restarted more than X times in Y seconds, something's obviously wrong so stop 
tail-chasing[1]. Personally I'm much more interested in knowing there's a 
problem and then taking whatever happens to be the appropriate actions for our 
situation.
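
The "more than X times in Y seconds" rule above could be sketched as a small 
sliding-window counter. This is only an illustration in Python, not something 
from the FS: the class name RestartLimiter and its parameters are hypothetical, 
and the caller is assumed to raise an alert once a restart is refused.

```python
import time
from collections import deque


class RestartLimiter:
    """Allow at most max_restarts restarts within any window_seconds span."""

    def __init__(self, max_restarts, window_seconds, clock=time.monotonic):
        self.max_restarts = max_restarts      # "X" in the text above
        self.window_seconds = window_seconds  # "Y" in the text above
        self.clock = clock                    # injectable, mainly for testing
        self.restarts = deque()               # timestamps of recent restarts

    def allow_restart(self):
        """Record a restart and return True, or return False if the limit is hit."""
        now = self.clock()
        # Forget restarts that have fallen out of the window.
        while self.restarts and now - self.restarts[0] >= self.window_seconds:
            self.restarts.popleft()
        if len(self.restarts) >= self.max_restarts:
            return False  # stop restarting; the caller should alert instead
        self.restarts.append(now)
        return True
```

A monitor loop would call allow_restart() before each restart attempt and 
switch to alert-only behaviour as soon as it returns False.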

Regarding communicating with a monitoring system - what makes more sense to me 
is setting up a solid framework that provides folks flexibility to use various 
monitoring tools, from sending an email to contacting pager duty or whatever.

So, to me there's 3 parts to that:
1) At VR creation, ACS calls defined hook-script which knows how to contact 
monitoring system to tell it about system to monitor
2) At boot, VR sends API query to which the mgmt server responds with a URL for 
an install script - VR runs that to download/setup appropriate monitoring agent
3) VR has standardized scripts for agent to call to find out what should be 
running, and then agent can go check for itself.

With a setup like this, you can support SNMP, Opsview/Nagios, Monit, NSA, 
Zenoss, HPOV, Tivoli, etc etc etc. I'll happily write the Opsview/Nagios module 
(I'm thinking module is hosted outside ACS, but I guess it could be a plugin - 
see earlier licensing points).

Thoughts?

Just my 2c. Happy to tweak wiki if folks lean towards this.

John
1: Aside - this applies to SSVM creation currently - that hamster[2] keeps 
trying to spin that create SSVM wheel..
2: Apache CloudHamster, CloudMonkey's furry monitoring friend?

On Nov 6, 2013, at 7:58 AM, Jayapal Reddy Uradi <[email protected]> 
wrote:

> Please find the updated FS below:
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Monitoring+VR+services
> 
> Thanks,
> Jayapal
> 
> On 05-Oct-2013, at 6:54 PM, Santhosh Edukulla <[email protected]> 
> wrote:
> 
>> A shell script can be used. A few thoughts below:
>> 
>> 1. Collect the process IDs of all the daemons you want to monitor using the 
>> "pidof" command, and then use the "kill" command to check whether each pid 
>> is valid. Using kill we can send signal 0, then check the status using 
>> echo $? . For sending a notification, use the Linux syslog call 
>> ( man 3 syslog ) or the "logger" command to send to syslog. If you want to 
>> send email, you may also have to check that the firewall is not blocking 
>> outbound SMTP port communication. The same holds for SNMP (I mean, if 
>> firewall rules block it). Using syslog may be good, as by default it 
>> exposes various log levels through its API call.
>> 
>> Now, to keep the monitor script always up and running: run the monitor 
>> script through cron or "at" at regular\scheduled intervals. This way, even 
>> if the monitor script goes down, it is up again at the next interval. 
>> 
>> There is a catch with this, though: we may get multiple pids for a given 
>> daemon if multiple daemons are spawned by the same or different 
>> applications. If this scenario is not common then it's OK; otherwise we may 
>> have to track it differently, maintaining the state of each spawned daemon 
>> and checking whether it exists. If multiple applications launch the same 
>> daemon, you may also want to report which application got killed. E.g.: A 
>> launched httpd, and during its exit logic it kills all the daemons it 
>> launched; then you may want to report that A is not available, rather than 
>> just that httpd is not available. 
>> 
>> 
>> 2.  Using the netstat command: check for available, listening and active 
>> ports on localhost, provided all the daemons you want to monitor are 
>> running on "standard" ports or we know the listening ports of the daemons 
>> to be monitored. Again, this script can be scheduled through cron\at to run 
>> every x units; if it gets killed, the monitor script is up again after the 
>> next x units. 
>> 
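The two checks described above (signal 0 against each pid, and a listening-port 
check) could be sketched roughly as follows. This is a minimal illustration in 
Python rather than shell, assuming a Linux guest with "pidof" installed; the 
function names are hypothetical, and the port check uses a plain TCP connect to 
localhost in place of parsing netstat output.

```python
import os
import socket
import subprocess
import syslog


def pids_of(name):
    """Return the PIDs reported by `pidof name` (empty list if none)."""
    result = subprocess.run(["pidof", name], capture_output=True, text=True)
    return [int(p) for p in result.stdout.split()]


def pid_is_alive(pid):
    """Check a PID with signal 0: nothing is delivered, only existence is tested."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # process exists but belongs to another user
    return True


def port_is_listening(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_service(name, port=None):
    """Return True if the service looks healthy; log to syslog otherwise."""
    alive = any(pid_is_alive(pid) for pid in pids_of(name))
    listening = port is None or port_is_listening(port)
    if not (alive and listening):
        syslog.syslog(syslog.LOG_ERR, "service %s is not running" % name)
    return alive and listening
```

Run from cron, a call such as check_service("dnsmasq", port=53) would cover 
both approaches, assuming the daemon accepts TCP connections on that port on 
localhost.
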
>> Also, there could be many other approaches as well.
>> 
>> 
>> Thanks!
>> Santhosh
>> ________________________________________
>> From: Jayapal Reddy Uradi [[email protected]]
>> Sent: Saturday, October 05, 2013 5:17 AM
>> To: <[email protected]>
>> Cc: <[email protected]>
>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>> 
>> Hi,
>> 
>> +users list
>> If anyone is already using any tools for monitoring, then please share 
>> your ideas.
>> Also share the cases where you experienced service crashes.
>> 
>> Thanks,
>> Jayapal
>> 
>> On 05-Oct-2013, at 4:12 AM, Chiradeep Vittal <[email protected]> 
>> wrote:
>> 
>>> Well just make sure that your script is resilient to its own crashes 
>>> as well.
>>> 
>>> On 10/4/13 1:59 AM, "Jayapal Reddy Uradi" 
>>> <[email protected]>
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I am planning to write a script utility to monitor processes and 
>>>> restart them in the event of failure. It will also log the events.
>>>> 
>>>> Thanks,
>>>> Jayapal
>>>> 
>>>> On 02-Oct-2013, at 3:25 AM, Simon Weller <[email protected]> wrote:
>>>> 
>>>>> supervisord maybe?
>>>>> 
>>>>> ----- Original Message -----
>>>>> 
>>>>> From: "Chiradeep Vittal" <[email protected]>
>>>>> To: [email protected]
>>>>> Sent: Tuesday, October 1, 2013 4:45:56 PM
>>>>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>>>> 
>>>>> Got it. Any other OSS tool out there similar to monit?
>>>>> 
>>>>> On 10/1/13 8:24 AM, "David Nalley" <[email protected]> wrote:
>>>>> 
>>>>>> On Thu, Sep 26, 2013 at 1:27 AM, Chiradeep Vittal 
>>>>>> <[email protected]> wrote:
>>>>>>> SNMP wouldn't restart a failed process nor would it generate alerts.
>>>>>>> It is simply too generic for the requirements outlined here. The 
>>>>>>> proposal does not talk about modifying monit, just using it. 
>>>>>>> That wouldn't trigger the AGPL.
>>>>>> 
>>>>>> Let me restate my objection to anything AGPL.
>>>>>> People are largely comfortable with GPLv2 software - Linux is 
>>>>>> ubiquitous. Many legal departments routinely prohibit GPLv3 
>>>>>> software (we actually saw this when CS was GPLv3 licensed.) But 
>>>>>> the Affero GPL license is anathema in many corporate 
>>>>>> environments, and by forcing it on folks in the default System VM 
>>>>>> I fear it will hurt adoption of CloudStack.
>>>>>> 
>>>>>> --David
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>
