avoid duplicated alerts in a multi-host/mon context

2007-10-17 Thread Jacques Klein
Hello,

I am using mon to monitor several services on multiple hosts in a network.

Each host runs a mon daemon with a mon.cf configured according to the
services to watch; this depends on the kinds of applications installed
on that host.
An application may need access to (a service on) another host, so there
is a watch checking for this remote service.
When such a "server host" goes down (or crashes, etc.), then several
alerts can (and will) be raised, one on each "client host".
This will, for example, result in duplicated and redundant email messages.

How can such behavior be avoided?

I am thinking about forwarding all alerts to a "master-mon" where some
filtering would happen before emitting an alert.
This "master-mon" must be dynamically elected in order to avoid a
single point of failure and to handle a failure resulting in a network
split, a mail-server failure, and so on.

Are there features that the current mon (1.2?) provides that can be
helpful to implement such a thing?
Or maybe there are better ways to achieve what I need?

Thanks for any hint.


___
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon


Re: avoid duplicated alerts in a multi-host/mon context

2007-10-17 Thread Augie Schwer
On 10/17/07, Jacques Klein <[EMAIL PROTECTED]> wrote:
> Each host runs a mon daemon with a mon.cf configured according to the
> services to watch; this depends on the kinds of applications installed
> on that host.
> An application may need access to (a service on) another host, so there
> is a watch checking for this remote service.
> When such a "server host" goes down (or crashes, etc.), then several
> alerts can (and will) be raised, one on each "client host".
> This will, for example, result in duplicated and redundant email messages.
> How can such behavior be avoided?

If I am reading you right, then you want the "depend" definition in
your watch group:

http://mon.wiki.kernel.org/index.php/Mon_Manual

depend dependexpression
  The depend keyword is used to specify a dependency expression,
  which evaluates to either true or false, in the boolean sense.
  Dependencies are actual Perl expressions, and must obey all
  syntactical rules. The expressions are evaluated in their own
  package space so as to not accidentally have some unwanted
  side-effect. If a syntax error is found when evaluating the
  expression, it is logged via syslog.

  Before evaluation, the following substitutions on the expression
  occur: phrases which look like "group:service" are substituted
  with the value of the current operational status of that
  specified service. These opstatus substitutions are computed
  recursively, so if service A depends upon service B, and service B
  depends upon service C, then service A depends upon service C.
  Successful operational statuses (which evaluate to "1") are
  "STAT_OK", "STAT_COLDSTART", "STAT_WARMSTART", and
  "STAT_UNKNOWN". The word "SELF" (in all caps) can be used for
  the group (e.g. "SELF:service"), and is an abbreviation for the
  current watch group.

  This feature can be used to control alerts for services which
  are dependent on other services, e.g. an SMTP test which is
  dependent upon the machine being ping-reachable.
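
To make the substitution step concrete, here is a rough Python model of
it. This is only a sketch, not mon's code: mon evaluates real Perl
expressions, whereas this toy version handles just &&, || and !. The
function names, the dictionary shapes, and the choice to treat a
missing service as failed are all assumptions made for illustration.

```python
import re

# Statuses the manual lists as successful (they substitute to "1").
OK_STATUSES = {"STAT_OK", "STAT_COLDSTART", "STAT_WARMSTART", "STAT_UNKNOWN"}

# Simplified "group:service" matcher (assumes names are plain word characters).
TOKEN = re.compile(r"\b(\w+):(\w+)\b")


def effective_ok(group, service, opstatus, depends, seen=frozenset()):
    """True if group:service is OK and its own depend expression holds.

    opstatus: {(group, service): "STAT_..."}   hypothetical input shape
    depends:  {(group, service): "expression"} hypothetical input shape
    """
    key = (group, service)
    if key in seen:
        return False  # circular dependency: treat as failed in this toy model
    if opstatus.get(key) not in OK_STATUSES:
        return False  # failed, or not known at all (an assumption)
    expr = depends.get(key)
    if expr is None:
        return True
    return eval_depend(expr, group, opstatus, depends, seen | {key})


def eval_depend(expr, current_group, opstatus, depends, seen=frozenset()):
    """Substitute every group:service with 1 or 0, then evaluate the result."""

    def substitute(match):
        grp, svc = match.group(1), match.group(2)
        if grp == "SELF":
            grp = current_group  # SELF abbreviates the current watch group
        return "1" if effective_ok(grp, svc, opstatus, depends, seen) else "0"

    text = TOKEN.sub(substitute, expr)
    # Toy translation of Perl boolean operators into Python ones.
    text = text.replace("&&", " and ").replace("||", " or ").replace("!", " not ")
    if not re.fullmatch(r"[\s()01andort]*", text):
        raise ValueError("unsupported expression: " + expr)
    return bool(eval(text.strip(), {"__builtins__": {}}, {}))
```

With smtp declared as depending on "SELF:ping", and ping in turn
depending on a router in another group, a failed router makes the smtp
expression false even though ping itself still reports OK. That is the
recursive behaviour the manual describes.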


-- 
Augie Schwer-[EMAIL PROTECTED]-http://schwer.us
Key fingerprint = 9815 AE19 AFD1 1FE7 5DEE 2AC3 CB99 2784 27B0 C072



Re: avoid duplicated alerts in a multi-host/mon context

2007-10-17 Thread Jacques Klein
Augie Schwer wrote:
> If I am reading you right, then you want the "depend" definition in
> your watch group:
>
> http://mon.wiki.kernel.org/index.php/Mon_Manual
>
> depend dependexpression
> The depend keyword is used to specify a dependency expression,
>
> This feature can be used to control alerts for services which
> are dependent on other services, e.g. an SMTP test which is
> dependent upon the machine being ping-reachable.
>
Well, not really, or not enough in fact.
If I understand "depend" correctly, it is a way to avoid multiple alerts
by specifying dependencies between services in ONE mon.
If I take this concept, then it would have to be extended to
dependencies between services in a GROUP of mons (one per host), which
is interesting but seems very complicated.




Re: avoid duplicated alerts in a multi-host/mon context

2007-10-17 Thread Jim Trocki
On Wed, 17 Oct 2007, Jacques Klein wrote:

> If I understand "depend" correctly, it is a way to avoid multiple alerts
> by specifying dependencies between services in ONE mon.
> If I take this concept, then it would have to be extended to
> dependencies between services in a GROUP of mons (one per host), which
> is interesting but seems very complicated.

Yes, one of the ways you could implement this functionality is by using
traps to feed the status to a mon server which uses this input to control
the alerts and implement the dependencies. You are on the right track in
what you said in your previous mail.
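
As a rough illustration of the filtering such a central server would
do, here is a small Python sketch. It is not mon code and uses no mon
API: the AlertFilter class, its report() method, and the host names in
the usage note are hypothetical. It also says nothing about how the
central server would be elected or made redundant.

```python
from dataclasses import dataclass, field


@dataclass
class AlertFilter:
    """Collapse failure reports from many observers into one alert per target."""

    # (target_host, service) -> set of observers currently reporting a failure
    failing: dict = field(default_factory=dict)

    def report(self, observer, target_host, service, ok):
        """Record one status report; return an alert string, or None."""
        key = (target_host, service)
        if ok:
            observers = self.failing.get(key)
            if observers is None:
                return None  # nothing was failing, so nothing to announce
            observers.discard(observer)
            if observers:
                return None  # other observers still see the failure
            del self.failing[key]
            return f"UP {target_host}:{service}"
        first_failure = key not in self.failing
        self.failing.setdefault(key, set()).add(observer)
        if first_failure:
            return f"DOWN {target_host}:{service} (first seen by {observer})"
        return None  # duplicate report of an already-known failure
```

If client1 and client2 both report that mysql on db1 is down, only the
first report produces a DOWN alert. A single UP alert follows once
every observer that reported the failure has seen the service recover.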



Re: avoid duplicated alerts in a multi-host/mon context

2007-10-17 Thread David Nolan
On 10/17/07, Jacques Klein <[EMAIL PROTECTED]> wrote:
> Well, not really, or not enough in fact.
> If I understand "depend" correctly, it is a way to avoid multiple alerts
> by specifying dependencies between services in ONE mon.
> If I take this concept, then it would have to be extended to
> dependencies between services in a GROUP of mons (one per host), which
> is interesting but seems very complicated.
>

If you configure each of your mon servers to send traps to all of the
others on status updates, then you can use dependencies on each server
based on state changes from other servers.

If they're all on one LAN you could probably even do that by sending
the status updates as broadcast packets. I've never tried that; it
might take minor coding in Mon to make it process broadcast packets.
Of course even better would be multicast, but that would definitely
require some code changes.

The best way to cause all status updates to get propagated is by using
the 'redistribute' config option.  From the manual:

redistribute alert [arg...]
  A service may have one redistribute option, which is a special
  form of an alert definition. This alert will be called on
  every service status update, even sequential success status
  updates. This can be used to integrate Mon with another
  monitoring system, or to link together multiple Mon servers via an
  alert script that generates Mon traps. See the "ALERT PROGRAMS"
  section above for a list of the parameters mon will pass
  automatically to alert programs.


Combine redistribute with trap.alert, define all your watches and
services on all servers, and then you can do lots of stuff with
dependencies.

-David



mon config testing

2007-10-17 Thread William Taylor
Do any better tools exist for parsing/validating mon.cf?
test_config() doesn't seem to catch a lot of errors like random
text entered on lines.


Thanks,
  William
