Disable all alerting for 20 minutes

2006-12-13 Thread Ben Ragg
Hi there,

We often make changes to our network at 3am, and while every effort is
made to disable the appropriate services, quite often something will slip
through the cracks and wake someone up.

Is there an option to stop all alerts from being sent for 20 minutes,
and only display the status via the webpage (Failed, NoAlerts)?

Regards,
Ben

___
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon


Re: Disable all alerting for 20 minutes

2006-12-13 Thread Amias Channer
On Thu, 14 Dec 2006 00:20:40 +1030 (CST)
Ben Ragg <[EMAIL PROTECTED]> wrote:

> Hi there,
> 
> We often make changes to our network at 3am, and while every effort is
> made to disable the appropriate services, quite often something will slip
> through the cracks and wake someone up.
> 
> Is there an option to disable all alerts from being sent for 20 minutes,
> and only display via the webpage (Failed, NoAlerts)

I think you might need the depend and dep_behaviour options; you could
then make it depend on the time not being between 3am and 3:20am.

If dep_behaviour is set to A (alert), alerts will be suppressed
but the mon scripts will still run and the webpage will still report
normally.

I learnt this from the manpage but have not had great success
implementing it so would be interested to know how you get on.

-- 
Kind Regards

Amias Channer, Systems Administrator
Direct: +44 (0) 1225 731412 Mobile +44 (0)7989 301577
[EMAIL PROTECTED]  http://www.metacharge.com/




Re: Disable all alerting for 20 minutes

2006-12-13 Thread David Nolan


--On Thursday, December 14, 2006 00:20:40 +1030 Ben Ragg 
<[EMAIL PROTECTED]> wrote:

> Hi there,
>
> We often make changes to our network at 3am, and while every effort is
> made to disable the appropriate services, quite often something will slip
> through the cracks and wake someone up.
>
> Is there an option to disable all alerts from being sent for 20 minutes,
> and only display via the webpage (Failed, NoAlerts)
>

There are a few options right now for this.

If it's a regular occurrence you could configure an exclude period on the
services, or configure the alert periods themselves to exclude that time
frame.

If it's an irregular occurrence you can stop the mon scheduler via the web
interface (or from cron), and restart it when done. (The UI will see no
updates, because nothing will be tested.)

Finally the most "evil hack" style method, which I've used on occasion, is:
cd 
chmod -x *
... 
chmod +x *


You could also write a script that uses Mon::Client to disable all
hostgroups.  (This would show the status updates in the UI without
sending alerts, at least with the current Mon (CVS, 1.2.0rc1); I can't
remember whether 0.99.2 did that.)


-David





Re: Disable all alerting for 20 minutes

2006-12-13 Thread David Nolan
(Let's keep the discussion on the list.)

On 12/13/06, Aled Treharne <[EMAIL PROTECTED]> wrote:
> On 13/12/06, David Nolan <[EMAIL PROTECTED]> wrote:
> > If it's a regular occurrence you could configure an exclude period on the
> > services, or configure the alert periods themselves to exclude that time
> > frame.
>
> We have a similar problem, however the maintenance slots that we have
> aren't regular (mainly because we monitor some of our clients who host
> with a variety of hosting providers).

Are they regular *for that client*?


> I think what I'd like ideally is something similar to Nagios where you
> can put in scheduled maintenance and it won't alert if something fails
> during that maintenance. My perl knowledge is nowhere near good enough
> to implement this though. :(
>

The best option in Mon right now for scheduled maintenance is either to
use exclude_period or to craft your period definitions carefully.  If
I'm understanding you correctly, Nagios provides a way to enter a
one-time scheduled maintenance period via the interface?  I could see
adding that to Mon, but would you want it to be global, or would you
need a way to restrict it to a subset of the hostgroups?

-David



Re: Disable all alerting for 20 minutes

2006-12-13 Thread Jim Trocki
On Wed, 13 Dec 2006, David Nolan wrote:

> You could also do something like write a script that uses Mon::Client and
> disables all hostgroups.  (This would show the status updates in the UI
> without sending alerts, at least with the current (CVS, 1.2.0rc1) Mon it
> would, I can't remember whether 0.99.2 did that.)

it would probably require less effort to just add a "holdalerts" feature
to the server, or something of that nature.

i can imagine this could be done a few different ways:

 1. walk through the watch structure and disable each

 2. have a global "hold alerts" flag which leaves the
    watch structure alone but is respected by do_alert

i'd lean towards #2 because it wouldn't blow away any previously
disabled watches or services.

how's that sound?
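
A minimal sketch of the difference between the two approaches, written in
Python rather than mon's Perl. It is a toy model, not mon's code: the
Server class, its disabled map and its hold_alerts attribute are
hypothetical names, and only the name do_alert comes from the discussion
above.

```python
class Server:
    """Toy stand-in for the server state (hypothetical, not mon's code)."""

    def __init__(self, watches):
        # watch name -> True if an operator has already disabled it
        self.disabled = dict(watches)
        self.hold_alerts = False  # approach 2: one global flag
        self.sent = []            # alerts that were actually delivered

    def disable_all(self):
        # Approach 1: walk the structure and disable every watch.
        # The earlier per-watch state is overwritten and cannot be restored.
        for name in self.disabled:
            self.disabled[name] = True

    def do_alert(self, watch, message):
        # Approach 2: the flag is consulted here; per-watch state is untouched.
        if self.hold_alerts or self.disabled[watch]:
            return False
        self.sent.append((watch, message))
        return True
```

With the flag, clearing hold_alerts after the maintenance window leaves
every watch exactly as the operators had set it. After disable_all there
is no record of which watches were disabled beforehand.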



Re: Disable all alerting for 20 minutes

2006-12-13 Thread Jim Trocki
On Wed, 13 Dec 2006, David Nolan wrote:

> I'm understanding you correctly Nagios provides a way to enter a
> one-time scheduled maintenance period via the interface?  I could see
> adding that to Mon, but would you want it to be global, or would you
> need a way to restrict it to a subset of the hostgroups?

it would be best to implement it both ways. then, people could just pick
which behavior they want.



Re: Disable all alerting for 20 minutes

2006-12-13 Thread Aled Treharne
On 13/12/06, David Nolan <[EMAIL PROTECTED]> wrote:
> (lets keep the discussion on the list )

Sorry, my bad - I'm used to mailing lists where the reply-to is set to
the list. :)

> On 12/13/06, Aled Treharne <[EMAIL PROTECTED]> wrote:
> > On 13/12/06, David Nolan <[EMAIL PROTECTED]> wrote:
> > > If it's a regular occurrence you could configure an exclude period on the
> > > services, or configure the alert periods themselves to exclude that time
> > > frame.
> >
> > We have a similar problem, however the maintenance slots that we have
> > aren't regular (mainly because we monitor some of our clients who host
> > with a variety of hosting providers).
>
> Are they regular *for that client*?

No. None of our clients or their hosting providers have regular
maintenance. It's scheduled, but irregular.

> The best option in Mon right now for scheduled maintenance is to use
> either exclude_period or craft your period definitions carefully.  If
> I'm understanding you correctly Nagios provides a way to enter a
> one-time scheduled maintenance period via the interface?  I could see

That's right. We used it at my previous workplace - I haven't set it
up here. IIRC, you could enter a time window, add the appropriate
services and save a label for it too.

> adding that to Mon, but would you want it to be global, or would you
> need a way to restrict it to a subset of the hostgroups?

So long as it follows dependencies, I'd say hostgroups. That is, if
there's maintenance on the router at HostingProviderA, then we add the
HostingProviderA-rtr hostgroup to the maintenance slot, and mon also
ignores alerts from HostingProviderA-svr.

Does that make sense?
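
A rough sketch of that dependency-following rule, in Python for
illustration only. The function name, the depends_on map and the
maintenance set are all hypothetical, and the time window of the slot is
left out to keep the example short.

```python
def in_maintenance(group, maintenance, depends_on):
    """Return True if group, or anything it depends on, is in a slot.

    maintenance -- set of hostgroup names currently in a maintenance slot
    depends_on  -- dict mapping a hostgroup to the hostgroups it depends on
    """
    seen = set()
    stack = [group]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against dependency loops
        seen.add(current)
        if current in maintenance:
            return True
        stack.extend(depends_on.get(current, ()))
    return False


# The servers depend on the router; only the router is put in the slot.
depends_on = {"HostingProviderA-svr": ["HostingProviderA-rtr"]}
maintenance = {"HostingProviderA-rtr"}
```

Here in_maintenance("HostingProviderA-svr", maintenance, depends_on) is
true because the router the servers depend on is in the slot, so alerts
for the servers would be held as well.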

Cheers,
Aled.



Re: Disable all alerting for 20 minutes

2006-12-13 Thread Aled Treharne
On 13/12/06, Jim Trocki <[EMAIL PROTECTED]> wrote:
> it would probably require less effort to just add a "holdalerts" feature
> to the server, or something of that nature.
>
> i can imagine this could be done a few different ways:
>
>  1. walk through the watch structure and disable each
>
>  2. have a global "hold alerts" flag which leaves the
> watch structure alone but is respected by do_alert
>
> i'd lean towards #2 because it wouldn't blow away any previously
> disabled watches or services.
>
> how's that sound?

I'm definitely leaning towards #2 - we have alerts that are ack'd (e.g.
Hardware replacement on order, but takes a week to arrive) but not
disabled, and watches that are disabled (e.g. service offline for an
indeterminate time, but will eventually come back).

I'd also like to see it implemented so that you don't need to reload
mon to add a new maintenance slot.

Cheers,
Aled.



Re: Disable all alerting for 20 minutes

2006-12-13 Thread Ben Ragg
I'd be leaning towards #2 as well.

In our situation all alerts are sent via email (and we make use of an
email to SMS gateway for some alerts). The option we'll probably go for
here is to make the alert scripts check for the presence of a file
containing a timestamp and an email address. If an alert reaches the
alert script before that timestamp (with a sanity check that the
timestamp is less than four hours away), it is sent to the email address
in the file instead of the address given to the alert script.

i.e. something like:

/local/mon/etc/outage:
1166045104   [EMAIL PROTECTED]

If 1166045104 > now + 14400: alert someone (having mon away from its
default behaviour for this long is bad) and continue with normal
behaviour.
If now + 14400 > 1166045104 > now: redirect any alert to
[EMAIL PROTECTED].
If now > 1166045104: old outage window; ignore it and continue with
normal behaviour.
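
As a rough sketch, the three cases could look like this in Python. The
function name, its arguments and the example addresses are hypothetical;
a real alert script would read the line from /local/mon/etc/outage and
send the mail itself. A missing or malformed line is treated as "no
outage window", which is an assumption rather than part of the plan
above.

```python
import time

MAX_WINDOW = 14400  # four hours, in seconds


def route_alert(outage_line, default_rcpt, now=None):
    """Return (recipient, warning) for one alert.

    outage_line  -- contents of the outage file, "<timestamp> <address>",
                    or None if the file does not exist
    default_rcpt -- address mon passed to the alert script
    """
    now = time.time() if now is None else now
    try:
        stamp_text, outage_rcpt = outage_line.split()
        until = int(stamp_text)
    except (AttributeError, ValueError):
        # No file, or a line that is not "<int> <address>": normal behaviour.
        return default_rcpt, None
    if until > now + MAX_WINDOW:
        # Window ends too far in the future: warn and behave normally.
        return default_rcpt, "outage window ends more than four hours away"
    if until > now:
        # Inside the window: redirect the alert to the address in the file.
        return outage_rcpt, None
    # Old outage window: ignore it.
    return default_rcpt, None
```

For example, route_alert("1166045104 noc@example.org",
"oncall@example.org", now=1166045000) picks noc@example.org, because the
timestamp is 104 seconds in the future.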

Cheers,
Ben

Aled Treharne wrote:
> I'm definitely leaning towards #2 - we have alerts that are ack'd (e.g.
> Hardware replacement on order, but takes a week to arrive) but not
> disabled, and watches that are disabled (e.g. service offline for an
> indeterminate time, but will eventually come back).
>
> I'd also like to see it implemented so that you don't need to reload
> mon to add a new maintenance slot.
>
> Cheers,
> Aled.
>


-- 
Ben Ragg - Internode - Network Operations
150 Grenfell Street, Adelaide, SA, 5000
Phone: 13NODE Web: http://www.on.net
