[opsview-users] Extreme slowdown in reload after upgrade to 3.7.0

Toni Van Remortel Wed, 19 May 2010 06:22:36 -0700

Hi,

I upgraded my Opsview to 3.7.0 yesterday.
After I made my changes to the contacts (merging the separate 'contacts' for 
each person into one contact with several profiles attached), a reload of 
the entire system went from 3 minutes 50 seconds to almost 30 minutes.


When I watch the processes on the master server, I see the nagconfgen.pl 
scripts run at full speed, followed by about a minute of 100% CPU usage by the 
Nagios process. Then the master server goes quiet, but the reload indicator 
still shows the reload as busy.

This is opsviewd.log:
[2010/05/19 14:59:54] [slave_node_event_handler] [INFO] Starting
[2010/05/19 14:59:54] [slave_node_event_handler] [INFO] Only running on HARD state change - currently SOFT
[2010/05/19 14:59:54] [slave_node_event_handler] [INFO] Finished
[2010/05/19 15:00:28] [slave_node_event_handler] [INFO] Starting
[2010/05/19 15:00:28] [slave_node_event_handler] [INFO] Only running on HARD state change - currently SOFT
[2010/05/19 15:00:28] [slave_node_event_handler] [INFO] Finished
[2010/05/19 15:00:52] [slave_node_event_handler] [INFO] Starting
[2010/05/19 15:00:52] [slave_node_event_handler] [INFO] Only running on HARD state change - currently SOFT
[2010/05/19 15:00:52] [slave_node_event_handler] [INFO] Finished
[2010/05/19 15:01:26] [slave_node_event_handler] [INFO] Starting
[2010/05/19 15:01:26] [slave_node_event_handler] [INFO] Only running on HARD state change - currently SOFT
[2010/05/19 15:01:26] [slave_node_event_handler] [INFO] Finished
[2010/05/19 15:01:54] [slave_node_event_handler] [INFO] Starting
[2010/05/19 15:01:54] [slave_node_event_handler] [INFO] Only running when OK - state is currently CRITICAL
[2010/05/19 15:01:54] [slave_node_event_handler] [INFO] Finished
[2010/05/19 15:02:16] [slave_node_event_handler] [INFO] Starting
[2010/05/19 15:02:16] [slave_node_event_handler] [INFO] Only running when OK - state is currently CRITICAL
[2010/05/19 15:02:16] [slave_node_event_handler] [INFO] Finished
[2010/05/19 15:04:02] [import_runtime] [INFO] Starting
[2010/05/19 15:04:02] [import_runtime] [INFO] Importing for 2010-05-19 12:00:00
[2010/05/19 15:04:02] [import_runtime] [INFO] Importing all results and performance data
[2010/05/19 15:04:24] [import_runtime] [INFO] Importing downtime starts
[2010/05/19 15:04:24] [import_runtime] [INFO] Importing downtime ends
[2010/05/19 15:04:24] [import_runtime] [INFO] Checking for incorrect downtimes
[2010/05/19 15:04:24] [import_runtime] [INFO] Caculating relevant downtimes
[2010/05/19 15:04:24] [import_runtime] [INFO] Importing notifications
[2010/05/19 15:04:24] [import_runtime] [INFO] Importing acknowledgements
[2010/05/19 15:04:24] [import_runtime] [INFO] Importing state history
[2010/05/19 15:04:25] [import_runtime] [INFO] Calculating hourly availability
[2010/05/19 15:04:31] [import_runtime] [INFO] Finished import for hour
[2010/05/19 15:04:32] [import_runtime] [INFO] Finished
[2010/05/19 15:14:06] [create_and_send_configs] [INFO] Ending overall with error=0

I guess this is because the config files for the contacts are now huge:
-rw-r----- 1 nagios nagios 2.2M 2010-05-19 14:47 contactgroups.cfg
-rw-r----- 1 nagios nagios  13M 2010-05-19 14:47 contacts.cfg

Yes, my slaves are reachable over slow lines; that's the whole idea of a slave.

How are these configs copied to the slaves? As a full copy with scp? Wouldn't 
rsync be much better? After all, most changes between reloads are small.
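
To put a rough number on why a full copy hurts, here is a back-of-the-envelope 
sketch. The file sizes are taken from the listing above; the line speeds are 
purely assumed values for illustration (I have not measured them), and the 
function name is made up. It ignores compression and protocol overhead.

```python
# Rough estimate of how long a full, uncompressed copy of the contact
# configs takes per slave. Sizes come from the ls listing above; the
# link speeds are ASSUMED example values, not measurements.

MIB = 1024 * 1024
# contacts.cfg (13M) + contactgroups.cfg (2.2M), as reported by ls
config_bytes = int((13 + 2.2) * MIB)


def transfer_seconds(size_bytes: int, link_kbit_per_s: float) -> float:
    """Seconds to push size_bytes over a link of link_kbit_per_s kbit/s,
    ignoring compression and protocol overhead."""
    return size_bytes * 8 / (link_kbit_per_s * 1000)


if __name__ == "__main__":
    for kbit in (256, 512, 1024):  # hypothetical line speeds
        minutes = transfer_seconds(config_bytes, kbit) / 60
        print(f"{kbit:>5} kbit/s: {minutes:.1f} min per slave")
```

If only a small part of those files changes between reloads, a delta transfer 
would in principle only need to send that part, which is why I am asking about 
rsync.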

Regards,
Toni Van Remortel
System Engineer @ Precision Operations N.V.
+32 3 451 92 20 - [email protected]
Satenrozen 2a, 2550 Kontich, Belgium

_______________________________________________
Opsview-users mailing list
[email protected]
http://lists.opsview.org/lists/listinfo/opsview-users
