Adding a new status like ATTENTION or CRITICAL or ... is a very big change
to how Servers Alive works.
It means that for various checks like diskspace/SNMP/ping/... (probably not
for NT service, process, URL or DB checks) you would need the possibility
to define different "numbers" for each of the statuses.
Something like "below 40% ATTENTION", "below 10% CRITICAL", "at 0% DOWN"
(talking about diskspace checks here). The alerting engine currently works on
DOWN/UP (and with the "always" option we don't even look at the status :-)),
so the alerting would become much more complicated too: you would need to
define not only the rules as they are now, but also which status is
associated with each rule. And you'll need some more logic/correlation too.
Again an example :-)
Diskspace check:
  below 40% -> ATTENTION
  below 10% -> CRITICAL
  at 0%     -> DOWN
Alert user_1 on 4x ATTENTION
Alert user_2 on 2x CRITICAL
Cycles 1, 2, 3: diskspace is at 11% -> ATTENTION, BUT no alert is
generated, since the alert should only be generated after 4 times ATTENTION.
Cycle 4: it's at 9% -> CRITICAL, BUT again no alert is generated, since
the CRITICAL alert is only sent after 2 times CRITICAL.
That's already 4 cycles in a bad condition, and still no alert.
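The scenario above can be sketched in a few lines of Python. This is only an illustration under a stated assumption: each alert rule (hypothetically) counts consecutive cycles at exactly its own status and resets as soon as the status changes. The names `classify`, `AlertRule` and `run_cycles` are made up for this sketch; they are not Servers Alive settings or APIs.

```python
from dataclasses import dataclass


def classify(free_pct: float) -> str:
    """Map free disk space (%) to a status using the example thresholds."""
    if free_pct <= 0:
        return "DOWN"
    if free_pct < 10:
        return "CRITICAL"
    if free_pct < 40:
        return "ATTENTION"
    return "UP"


@dataclass
class AlertRule:
    status: str      # status this rule watches
    count: int       # consecutive cycles required before alerting
    recipient: str
    streak: int = 0  # consecutive cycles seen at `status` so far

    def observe(self, status: str) -> bool:
        """Update the streak for this cycle; return True if the alert fires."""
        # Assumed semantics: only an exact status match extends the streak,
        # any other status resets it to zero.
        self.streak = self.streak + 1 if status == self.status else 0
        return self.streak == self.count


def run_cycles(readings, rules):
    """Return a list of (cycle, status, recipients alerted) tuples."""
    log = []
    for cycle, pct in enumerate(readings, start=1):
        status = classify(pct)
        alerted = [r.recipient for r in rules if r.observe(status)]
        log.append((cycle, status, alerted))
    return log


rules = [AlertRule("ATTENTION", 4, "user_1"), AlertRule("CRITICAL", 2, "user_2")]
for cycle, status, alerted in run_cycles([11, 11, 11, 9], rules):
    print(cycle, status, alerted or "no alert")
```

With the readings 11, 11, 11, 9 the loop reports ATTENTION three times and then CRITICAL, with no alert in any cycle: the ATTENTION streak is reset at 3 when the status flips, and the CRITICAL streak has only reached 1. Avoiding that gap (for example by letting a more severe status also count toward a less severe rule) is exactly the extra correlation logic mentioned above.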
As you can see, it's not as simple as it looks...
Dirk.
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of [EMAIL PROTECTED]
Sent: Wednesday, June 30, 2004 9:24 AM
To: [EMAIL PROTECTED]
Subject: Re: [SA-list] Possible Future Feature Request
Hi Andy,
I think what you are suggesting is to have two (or more) levels to indicate
how critical a failing check should be considered. Indeed, if a check doesn't
meet the criteria, that is not always a "DOWN" situation. I support the idea
of being able to apply a kind of scale to each check result, ranging from
"DOWN" to "ATTENTION" or some other level that can be defined by the systems
manager (myself)...
Have a nice day.
Igor Kerstges.
From: "Carroll, Andy" <[EMAIL PROTECTED]rgraph.com>
Sent by: [EMAIL PROTECTED]tone.nu
Date: 06/30/2004 08:59 AM
Please respond to: salive
To: [EMAIL PROTECTED]
cc:
Subject: [SA-list] Possible Future Feature Request
Dirk / Forum members,
There are a number of early-morning and late-evening system checks that we
perform daily, and I have set up a number of Servers Alive checks to alert me
to various situations that may only be relevant for a short period during
these times each day.
I have set up these checks, via their schedules, to run for only 1 hour
during the period when we perform these early-morning and late-evening
checks.
This occasionally causes me some difficulty, as it is not always possible
to complete these system checks within the scheduled time during which the
Servers Alive checks are 'Live', which means that I occasionally have to
check manually for the conditions that I have set the Servers Alive checks
to monitor.
Also, some of these checks fail, which is a valid condition, and we deal with
the problem. But once we have noted it, we do not want this DOWN condition
to keep alerting for the rest of its scheduled check time, or to keep
reporting DOWN on our HTML Frontend, which is used by various other
departments, including the helpdesk and senior management. At the same time,
we don't want to put the check into MAINTENANCE manually, as there is always
the possibility that we would forget to make the check 'Live' again before
the next scheduled check time.
Would it be possible to have an option to put a check into MAINTENANCE mode
until the next scheduled check time, as defined in the check's schedule tab?
This would create a sort of "DOWN condition noted" mode/status, which takes
the check out of the check cycle until the next scheduled period starts.
This would allow us to extend the schedule checking period without the
concern of down conditions being reported for longer than operationally
necessary.
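The requested behaviour can be sketched in Python to make the idea concrete. This is a sketch under assumptions that are not part of the request itself: each check has one or more daily windows (start time plus duration) that don't cross midnight, and "noting" a DOWN suppresses the check only until the next window start, instead of leaving it in MAINTENANCE indefinitely. `ScheduledCheck`, `note_down` and `should_run` are hypothetical names, not Servers Alive features.

```python
from datetime import datetime, time, timedelta


class ScheduledCheck:
    """Hypothetical check with a 'DOWN noted' mode that lasts until the next window."""

    def __init__(self, windows):
        # windows: list of (start time, duration) pairs; assumed not to cross midnight
        self.windows = windows
        self.noted_until = None  # when the 'DOWN noted' state expires

    def _window_starts(self, now):
        """Window start datetimes for today and tomorrow."""
        days = (now.date(), now.date() + timedelta(days=1))
        return [datetime.combine(d, start) for d in days for start, _ in self.windows]

    def in_window(self, now):
        """True if `now` falls inside one of today's scheduled windows."""
        today = now.date()
        return any(
            datetime.combine(today, start) <= now < datetime.combine(today, start) + length
            for start, length in self.windows
        )

    def note_down(self, now):
        """Acknowledge the failure: skip the check until the next window starts."""
        self.noted_until = min(s for s in self._window_starts(now) if s > now)

    def should_run(self, now):
        """Decide whether the check is part of the check cycle at `now`."""
        if self.noted_until is not None:
            if now < self.noted_until:
                return False
            self.noted_until = None  # the acknowledgement expires on its own
        return self.in_window(now)


check = ScheduledCheck([(time(6, 0), timedelta(hours=1)), (time(18, 0), timedelta(hours=1))])
check.note_down(datetime(2004, 6, 30, 6, 20))
print(check.should_run(datetime(2004, 6, 30, 6, 40)))  # False: DOWN already noted
print(check.should_run(datetime(2004, 6, 30, 18, 5)))  # True: evening window, note expired
```

Because the suppression clears itself when the next window begins, there is no manual step to forget, which is the risk with putting the check into MAINTENANCE by hand.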
Does anyone else see the benefit of this as an option, or even see what I am
trying to explain...?
Regards,
Andy
--------------
[This E-mail scanned for viruses by Declude Virus]
To unsubscribe from a list, send a mail message to [EMAIL PROTECTED]
With the following in the body of the message:
unsubscribe SAlive