RE: [SA-list] Possible Future Feature Request

Dirk Bulinckx Wed, 30 Jun 2004 05:00:03 -0700

Is setting it to maintenance something that the system does by itself, or is
this something manual?

Dirk.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
Of Carroll, Andy
Sent: Wednesday, June 30, 2004 10:45 AM
To: [EMAIL PROTECTED]
Subject: RE: [SA-list] Possible Future Feature Request

Dirk,

I am not suggesting something this complex, although I am sure that people
would be able to make use of it.

I believe in the simple view of UP / DOWN / MAINTENANCE, and if you want to
monitor different levels of failure on disk space, for instance, then
you should set up individual checks for each of these.

However, I have had other users put checks into MAINTENANCE because they
wished to avoid a DOWN condition being displayed on certain checks. They then
forgot to put these checks live again, and because the problem that caused
the DOWN condition was not resolved, the condition went unnoticed for more
than a week, as Servers Alive was not monitoring the check while it was
manually set to MAINTENANCE.

I am looking to enhance the usability rather than make the check criteria
more complex.

I don't see this as a complex new status like ATTENTION or CRITICAL that
requires significant configuration.

I am after the option of having the check ignored as if it was on
MAINTENANCE between the time that someone interacts with the Servers Alive
Interface and the next scheduled check time, without further manual
intervention to re-instate the check.
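
A minimal sketch of the behaviour described above, not anything Servers Alive
currently does. The `Check` class, `acknowledge` and `should_run` are
hypothetical names used only for illustration, and the next scheduled start
time is assumed to be supplied by the scheduler:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Check:
    """Hypothetical check record; not a Servers Alive data structure."""
    name: str
    # When set, the check is treated as if it were in MAINTENANCE until this time.
    suppressed_until: Optional[datetime] = None

    def acknowledge(self, next_scheduled_start: datetime) -> None:
        """Operator has noted the DOWN condition; skip the check until then."""
        self.suppressed_until = next_scheduled_start

    def should_run(self, now: datetime) -> bool:
        """Return True if the check should be executed at `now`."""
        if self.suppressed_until is not None and now < self.suppressed_until:
            return False
        # Suppression has expired (or was never set): the check is live again
        # without anyone having to reinstate it manually.
        self.suppressed_until = None
        return True


check = Check("evening batch check")
check.acknowledge(next_scheduled_start=datetime(2004, 7, 1, 6, 0))
print(check.should_run(datetime(2004, 6, 30, 22, 30)))  # False: still suppressed
print(check.should_run(datetime(2004, 7, 1, 6, 0)))     # True: live again
```

The point of the sketch is that the suppression carries its own expiry, so
nobody has to remember to make the check live again.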

Do you see the distinction?

Regards,

Andy

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Behalf Of Dirk Bulinckx
Sent: 30 June 2004 08:58
To: [EMAIL PROTECTED]
Subject: RE: [SA-list] Possible Future Feature Request

Adding a new status like ATTENTION or CRITICAL or .... is a very big change
to how Servers Alive works.
It means that for various checks like diskspace/snmp/ping/.... (probably not
for nt service or process or url or db checks) you would need the
possibility to define different "numbers" for each of the statuses.
Something like "below 40% ATTENTION", "below 10% CRITICAL", "at 0% DOWN"
(talking about diskspace checks here).  The alerting engine now works on the
DOWN/UP (and with the always option we don't even look at the status :-)),
so the alerting would become much more complicated too, since you would
need to define not only the rules as they are now, but also what status
is associated with each one.  And you'll need some more logic/correlation too.
Again an example :-)
        diskspace check:
                below 40% attention
                below 10% critical
                at 0% down

        alert on 4x attention user_1
        alert on 2x critical user_2

        cycle 1,2,3: diskspace is at 11% -> attention, BUT no alert is
        generated since the alert should only be generated after 4 times
        attention
        cycle 4: it's at 9% -> critical, BUT again no alert is generated
        since the critical alert is only sent after 2 times critical

        Already 4 cycles in a bad condition and still no alert

As you can see it's not as simple as it looks.....
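
The example can be replayed with a small sketch. It assumes that the "4x" and
"2x" counts mean consecutive cycles in the same status and that the counter
resets whenever the status changes; that is an interpretation, not how the
Servers Alive alerting engine is implemented. `classify`, `ALERT_RULES` and
`simulate` are hypothetical names:

```python
def classify(free_percent: float) -> str:
    """Map free disk space to a status using the thresholds from the example."""
    if free_percent <= 0:
        return "DOWN"
    if free_percent < 10:
        return "CRITICAL"
    if free_percent < 40:
        return "ATTENTION"
    return "UP"


# status -> (consecutive cycles required, recipient), as in the example
ALERT_RULES = {"ATTENTION": (4, "user_1"), "CRITICAL": (2, "user_2")}


def simulate(readings):
    """Return (cycle, status, recipient) for every alert that would fire."""
    alerts = []
    previous, streak = None, 0
    for cycle, free in enumerate(readings, start=1):
        status = classify(free)
        # Assumption: the streak restarts whenever the status changes.
        streak = streak + 1 if status == previous else 1
        previous = status
        rule = ALERT_RULES.get(status)
        if rule and streak == rule[0]:
            alerts.append((cycle, status, rule[1]))
    return alerts


print(simulate([11, 11, 11, 9]))     # [] : four bad cycles, no alert
print(simulate([11, 11, 11, 9, 9]))  # [(5, 'CRITICAL', 'user_2')]
```

Under these assumptions the first alert only goes out in cycle 5, which is
the kind of correlation problem the extra statuses would introduce.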

Dirk.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
Of [EMAIL PROTECTED]
Sent: Wednesday, June 30, 2004 9:24 AM
To: [EMAIL PROTECTED]
Subject: Re: [SA-list] Possible Future Feature Request

Hi Andy,
I think what you are suggesting is to have two (or more) levels to indicate
how critical a failing check should be considered. Indeed, if a check doesn't
meet the criteria, then it is not always a "DOWN" situation. I can support the
idea/concept of being able to apply a kind of scale to each check result,
ranging from "DOWN" to "ATTENTION" or something that can be defined by the
systems manager (myself)....

Have a nice day.
Igor Kerstges.

From: "Carroll, Andy" <[EMAIL PROTECTED]rgraph.com>
Sent by: [EMAIL PROTECTED]tone.nu
Sent: 06/30/2004 08:59 AM
Please respond to salive
To: [EMAIL PROTECTED]
cc:
Subject: [SA-list] Possible Future Feature Request

Dirk / Forum members,

There are a number of early morning and late evening system checks that we
perform daily, and I have a number of Servers Alive checks that I have set up
to alert me to various situations that may only be relevant for a short
period during these times each day.

I have set up these checks, via their schedules, to only check for 1 hour
during the relevant period that we are performing these early morning and
late evening checks.

This causes me some difficulties occasionally, as it is not always possible
to complete these system checks within the scheduled time that the Servers
Alive checks are 'Live', which means that occasionally I have to manually
check for the conditions that I have set the Servers Alive checks to
monitor.

Also, some of these checks fail, which is a valid condition, and we deal with
the problem. Once we have noted this, we do not want the DOWN condition to
continue to alert for the rest of its scheduled check time and report a DOWN
condition on our HTML Frontend, which is used by various other departments,
including helpdesk and senior management. However, we don't want to manually
put the check on MAINTENANCE, as there is always the possibility that we
would forget to make the check 'Live' again before the next scheduled check
time.

Would it be possible to have an option to put a check into MAINTENANCE mode
until the next scheduled check time, as defined in the check's schedule tab?
This would create a sort of "DOWN condition noted" mode/status which takes
the check out of the check cycle until the next scheduled period starts.
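
As a rough sketch of how "until the next scheduled period starts" could be
worked out, assume a schedule that is just a list of daily start times (for
example 06:00 and 22:00). The real schedule tab may be richer than that, and
`next_period_start` is a hypothetical helper, not a Servers Alive function:

```python
from datetime import datetime, time, timedelta


def next_period_start(now, window_starts):
    """Return the first scheduled start strictly after `now`.

    `window_starts` is a non-empty list of daily start times (assumed format).
    """
    candidates = []
    # Looking at today and tomorrow is enough for a schedule that repeats daily.
    for day_offset in (0, 1):
        day = (now + timedelta(days=day_offset)).date()
        for start in window_starts:
            candidate = datetime.combine(day, start)
            if candidate > now:
                candidates.append(candidate)
    return min(candidates)


starts = [time(6, 0), time(22, 0)]
# DOWN condition noted at 22:30 during the evening period:
print(next_period_start(datetime(2004, 6, 30, 22, 30), starts))  # 2004-07-01 06:00:00
```

The check would then be skipped until the returned time and go live again on
its own.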

This would allow us to extend the schedule checking period without the
concern of down conditions being reported for longer than operationally
necessary.

Does anyone else see the benefit of this as an option? Or even see what I
am trying to explain.....

Regards,

Andy

--------------

[This E-mail scanned for viruses by Declude Virus]

To unsubscribe from a list, send a mail message to [EMAIL PROTECTED]
With the following in the body of the message:
   unsubscribe SAlive

