On 14 Dec 2009, at 17:34, Bill Campbell (NC) wrote:

I have Opsview set up in an HA configuration using heartbeat.  The two servers share the /usr/local/nagios partition, and only one server at a time runs Opsview.  It had all been working fine until a couple of days ago, when I noticed a heavy load on both servers and discovered about 20 nmis.pl processes running and using about 30% of the total capacity of 4 cores.  To make matters more confusing, the standby server also had nmis.pl processes running and using a significant portion of the available CPU cycles.  I commented out the nmis cron jobs on the standby server, but the nmis processes still come back.  So my two questions are:
 
1.)    Why are all of the nmis processes suddenly running and taking up so much CPU time?
2.)    How are these processes getting started on my standby server?

NMIS is run out of cron, not from a daemon.  When Opsview reloads, it puts all of the config onto the partition shared between the two hosts, so cron on the standby starts nmis with a valid nmis config.  As for why the processes are taking up so much CPU, I cannot tell without trawling through your system.  My guess is that both servers are trying to poll the same devices via SNMP and then trying to update the same data files, contending for the same locks, all at the same time.

I would suggest commenting out all cron jobs on the standby and ensuring nothing is running as the nagios user.
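
As a rough illustration of those two steps, here is a minimal Python sketch. Nothing in it comes from Opsview or NMIS: the function names are hypothetical, and it assumes you feed it crontab text and the output of `ps -eo user,pid,args` that you have captured yourself on the standby.

```python
def comment_out_cron_jobs(crontab_text):
    """Return the crontab text with every active line commented out.

    Assumption: blank lines and existing comments are left alone, and
    everything else (including variable assignments) gets a leading '#'.
    """
    out = []
    for line in crontab_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            out.append("#" + line)
        else:
            out.append(line)
    return "\n".join(out)


def processes_owned_by(ps_output, user="nagios"):
    """Parse `ps -eo user,pid,args` output and return (pid, command)
    tuples for every process owned by `user`."""
    found = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)  # user, pid, full command line
        if len(fields) == 3 and fields[0] == user:
            found.append((int(fields[1]), fields[2]))
    return found
```

An empty list from `processes_owned_by` on the standby would suggest nothing is left running as the nagios user. The rewritten crontab text would still have to be installed by hand, for example with `crontab` as that user.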

  Duncs
 
-- 
Duncan Ferguson
Senior Developer



Opsera Limited | Unit 69 Suttons Business Park
Reading | Berkshire | RG6 1AZ | UK

Phone:   +44 (0) 845 057 7887
Mobile:  +44 (0) 7968 148 748
Skype:   duncan_j_ferguson
Email:   [email protected]
www.opsera.com


