We currently run OpsView with heartbeat and DRBD on two geographically diverse servers. We're monitoring 702 hosts and 2184 services, and DRBD replication uses a fairly consistent ~12 Mbit/s of bandwidth. We keep /usr/local/nagios/, /usr/local/opsview-web/, /usr/local/opsview-reports/, /etc/httpd/, and /var/lib/mysql/ on our DRBD mount point.
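For reference, a minimal DRBD resource definition for that mount point might look like the sketch below (this is not our actual config; the resource name, hostnames, devices, and addresses are all hypothetical, and protocol A is just one reasonable choice for an async WAN link):

```
# /etc/drbd.conf (DRBD 8.x syntax) -- illustrative only
resource opsview {
  protocol A;                 # asynchronous replication, tolerant of WAN latency
  on nagios1 {                # hypothetical node name
    device    /dev/drbd0;
    disk      /dev/sdb1;      # hypothetical backing device
    address   192.0.2.11:7789;
    meta-disk internal;
  }
  on nagios2 {                # hypothetical node name
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   198.51.100.11:7789;
    meta-disk internal;
  }
}
```

The /dev/drbd0 device is then formatted once and mounted at the shared mount point holding the Nagios, Opsview, Apache, and MySQL directories listed above.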
One trick I used was to mirror the relevant main routing table entries into the local table, with the source address set to the shared heartbeat IP; this is triggered by heartbeat through a script. The point is to make sure checks come from a predictable source address for ACL/config purposes. I think some OSs don't let you modify the local routing table, but CentOS 5 does. The reason for using the local table is that it has a higher preference than the main table, so you don't have to add routes on heartbeat takeover and remove them again on standby: when you lose the IP address, the routes you added to the local table are dropped automatically, and traffic falls back to the main table and your original routes.

Right now, our single points of failure are the locations themselves. We monitor hosts in public IP space through a default route at one location, and have a static route to the other physical location for monitoring hosts in private IP space. That's not the worst arrangement, because if we lose the location where the majority of the private IP space lives, we won't be able to monitor much of it anyway. :) The problem is that if we lose the location holding the default (and public) route, we unnecessarily cut off our ability to monitor hosts in public IP space. We're looking at more dynamic options for that.

Also, I understand that running heartbeat over this kind of distance is considered bad practice. I could write a script to take services down and bring them up manually, and just run it on each server for manual failover. But I like heartbeat scheduling things for me, so I set the timeout values really high (like a day or two) and wrote checks to alert me in the unlikely event that late heartbeats are detected, so I can head off a split-brain scenario before it happens.
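The local-table trick can be sketched as a small script run by heartbeat on takeover (a sketch, not my exact script; the shared IP, monitored subnet, and gateway below are hypothetical placeholders):

```
#!/bin/sh
# Sketch of a heartbeat takeover hook -- all values are hypothetical.
SHARED_IP=192.0.2.10         # the shared heartbeat IP
MONITORED_NET=10.20.0.0/16   # privately addressed hosts we monitor
GW=192.0.2.1                 # the same next hop the main table uses

# Mirror the main-table route into the "local" table with an explicit
# source address, so checks always originate from the shared IP.
# The local table is consulted before the main table, so nothing in the
# main table needs to change on takeover.
ip route add "$MONITORED_NET" via "$GW" src "$SHARED_IP" table local

# Nothing to undo on failover: when heartbeat removes $SHARED_IP, the
# kernel drops this route automatically and traffic falls back to the
# main table's original routes.
```

Because the route dies with the address, there's no cleanup script to run on the standby node, which is the whole appeal of using the local table rather than the main one.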
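The very-high-timeout approach translates into ha.cf roughly like this (illustrative values, not our exact config; node names are hypothetical):

```
# /etc/ha.d/ha.cf -- illustrative values only
keepalive 60        # send a heartbeat every 60 seconds
warntime 300        # log a warning on late heartbeats early, so a
                    # Nagios check on the logs can alert a human
deadtime 172800     # only declare the peer dead after ~2 days
auto_failback off
node nagios1 nagios2   # hypothetical node names
```

With deadtime this high, heartbeat effectively never fails over on its own; the warntime log entries plus a check watching for them give a human time to investigate (or force a failover) before any split brain can occur.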
Plus the obvious checks for the individual servers in the cluster, with checks for OpsView itself residing in the host that checks the shared IP(s).

-David

On Mar 17, 2010, at 9:03 AM, Simone Felici wrote:

> Hi,
>
> Are there some experiences on the community with OPSView Master in High
> Availability?
> I've read the documentation on
> http://docs.opsview.org/doku.php?id=opsview-community:hamaster but here it's
> used the old version of heartbeat.
> Meanwhile there is openais/corosync, drbd instead of NFS, and so forth.
> I ask only if someone has found a good and tested solution to follow and test :)
> In addiction some informations on how much hosts/services are monitored...
>
> Thanks a lot to all for the attention,
>
> Simon
> _______________________________________________
> Opsview-users mailing list
> [email protected]
> http://lists.opsview.org/lists/listinfo/opsview-users
