[Nagios-users] Service checks for redundant hosts

Ben Prew Tue, 30 Jul 2013 11:08:51 -0700

Hey,

I'm looking for some suggestions for implementing a service check on a
group of redundant hosts that all access a shared resource.


Here's our setup:

We have N hosts that use delayed_job to process a shared job queue
(MySQL/Redis). Some of our checks are host-specific (# of workers on
that host), but several others examine the shared job queue (# of
unprocessed jobs).
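For reference, here is a minimal sketch of what the shared-queue check might look like as a Nagios-style plugin, written in Python. It assumes the standard plugin conventions (exit codes 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, performance data after a `|`). The thresholds, the function names and the `count_pending_jobs()` helper are hypothetical, and the actual query against the MySQL/Redis queue is deliberately left out.

```python
"""Minimal sketch of a Nagios-style check on shared job queue depth.

Assumptions (not from the original setup): thresholds, function names,
and the count_pending_jobs() helper are hypothetical.
"""

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_queue_depth(pending, warn, crit):
    """Map a count of unprocessed jobs to (exit_code, status_line)."""
    if pending is None:
        return UNKNOWN, "UNKNOWN - could not read the job queue"
    if pending > crit:
        code = CRITICAL
    elif pending > warn:
        code = WARNING
    else:
        code = OK
    # Text before "|" appears in notifications; the rest is performance data.
    return code, (f"{LABELS[code]} - {pending} unprocessed jobs"
                  f" | pending={pending};{warn};{crit};0")


def count_pending_jobs():
    """Hypothetical: query the real queue backend (MySQL/Redis) here."""
    raise NotImplementedError("connect to the shared job queue")


def main(warn=500, crit=2000):
    """Print one status line and return the exit code for Nagios."""
    try:
        pending = count_pending_jobs()
    except Exception:
        pending = None  # any failure to read the queue is reported as UNKNOWN
    code, line = evaluate_queue_depth(pending, warn, crit)
    print(line)
    return code

# A deployed plugin would end with: import sys; sys.exit(main())
```

Whichever option below is chosen, the same script would be used; only the host(s) it is attached to changes.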

I'm considering three possible implementations:

============
1. Shared Job Queue check on a single processing host (current setup)

Pros:
* We only get notified once when the shared queue is high

Cons:
* If the single host goes down, we lose the shared queue check

============
2. Shared Job Queue check on all processing hosts

Pros:
* If a single processing host goes down, the shared queue check still
functions

Cons:
* One email per processing host when the shared check fails

============
3. Shared Job Queue check on the job queue host (i.e. the DB box)

Pros:
* If the DB goes down, you can't reach the queue anyway
* Single email on failure

Cons:
* The check requires application knowledge, so the app would have to be
deployed on the job queue host
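The trade-off between the three options can be made concrete with a toy model (Python, purely illustrative): for a backed-up queue, it counts how many copies of the alert go out under each placement, given which processing hosts are up. The placement names and the `notifications()` function are made up for this sketch, and it assumes the DB host is up unless stated otherwise.

```python
def notifications(placement, processing_hosts_up, db_up=True):
    """Return how many copies of the shared-queue alert would be sent.

    placement: "single" (option 1), "all" (option 2) or "db" (option 3).
    processing_hosts_up: list of booleans, one per processing host.
    Returns 0 when no running check can see the queue.
    """
    if placement == "single":
        # Option 1: only the first processing host runs the check.
        return 1 if processing_hosts_up[0] else 0
    if placement == "all":
        # Option 2: every live processing host runs (and alerts on) the check.
        return sum(processing_hosts_up)
    if placement == "db":
        # Option 3: the check lives on the queue host itself.
        return 1 if db_up else 0
    raise ValueError(f"unknown placement: {placement}")
```

With three processing hosts all up, this gives 1, 3 and 1 alerts for options 1, 2 and 3. With the first host down, option 1 drops to 0 (the check is lost) while option 2 still sends 2.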

How are others handling a check like this? Do you go with #2 and just
bite the bullet on multiple emails?

Thanks

_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
