On Thu, 6 Jan 2011, Christoph Maser wrote:
> Am Donnerstag, den 06.01.2011, 08:26 +0100 schrieb Tomas Macek:
>> Hi, I have an Intel server with two physical 3.00 GHz Xeon processors
>> (4 cores in total) and 16 GB RAM on a RAID 1 array. With it I monitor
>> about 1600 services and 1000 hosts. The services are mostly (maybe 95%)
>> check_ping and check_snmp checks.
>>
>> In my past experience, when the nagios/icinga load is too heavy, some
>> checks are somehow skipped. I can see it, for example, under "Host
>> problems" in the "last check" column: the last check is some hours
>> old when the host should be checked every 5 minutes. Forcing the check
>> always solves the problem.
>> For example, this morning a host was down and never recovered, although
>> the check_ping service on it was OK. Forcing the host check resolved
>> this.
>>
>> Do you think this hardware is enough for such a load? Or do you think
>> I'm doing something wrong? Thank you for sharing your experience.
>>
>> Regards, Tomas
>>
>
>
> Hi Tomas
>
> how high is the load on the system? Are periodic host checks globally
> enabled (execute_host_checks=1)? How does icingastats output look? Do
> you run any addons (idoutils) or other services on the same machine?
>
> I think your hardware should be good enough for the number of
> hosts/services unless you run the service checks in very short
> intervals.
>
> Chris
Hi Chris,
the load does not seem to be very high. Sometimes the 'load average' in
top shows something like '5, X, Y', which means that on average about 5
processes were running or waiting for the CPU during the previous
minute. But this happens only sometimes.
The load could be caused by our graphing, which is enabled on the
server. It is software that updates 10,000 RRD files of 100 kB each.
The graphing puts the disk system under heavy load every 5 minutes for
about 1 minute. Maybe this causes the load and could slow Icinga down a
little.
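As a rough back-of-the-envelope estimate of that graphing load, the
figures above can be plugged into a few lines of Python. This is only a
sketch: it assumes the worst case, where every RRD file is read or
rewritten in full on each update, which real RRD updates usually do not
do. The function name and its parameters are made up for illustration.

```python
# Worst-case estimate of the graphing I/O, using the figures from the mail.
# Assumption: each update touches the whole file, which overstates real RRD I/O.

def rrd_io_estimate(files=10_000, size_kb=100, period_s=300, busy_s=60):
    """Return (total MB per cycle, MB/s during the busy window, MB/s averaged)."""
    total_mb = files * size_kb / 1000      # 1 MB taken as 1000 kB
    burst_mb_s = total_mb / busy_s         # load while the graphing runs
    average_mb_s = total_mb / period_s     # load spread over the full cycle
    return total_mb, burst_mb_s, average_mb_s


total, burst, average = rrd_io_estimate()
print(f"{total:.0f} MB per cycle, {burst:.1f} MB/s burst, {average:.1f} MB/s average")
```

Under these assumptions the upper bound is about 1 GB touched per
5-minute cycle, concentrated in the one minute when the graphing runs.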
My system has "execute_host_checks=1". From the documentation I don't
understand exactly what it means for system performance.
I don't run any addons.
My icingastats output is here:
-------------------------------
Icinga Stats 1.2.1
Copyright (c) 2009 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 10-25-2010
License: GPL
CURRENT STATUS DATA
------------------------------------------------------
Status File: /var/icinga/status.dat
Status File Age: 0d 0h 0m 3s
Status File Version: 1.2.1
Program Running Time: 0d 18h 54m 22s
Icinga PID: 16634
Used/High/Total Command Buffers: 0 / 2 / 4096
Total Services: 1635
Services Checked: 1635
Services Scheduled: 1635
Services Actively Checked: 1635
Services Passively Checked: 0
Total Service State Change: 0.000 / 10.790 / 0.043 %
Active Service Latency: 0.003 / 3.086 / 0.362 sec
Active Service Execution Time: 0.014 / 13.349 / 2.318 sec
Active Service State Change: 0.000 / 10.790 / 0.043 %
Active Services Last 1/5/15/60 min: 564 / 1581 / 1633 / 1634
Passive Service Latency: 0.000 / 0.000 / 0.000 sec
Passive Service State Change: 0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit: 1628 / 1 / 2 / 4
Services Flapping: 0
Services In Downtime: 0
Total Hosts: 1006
Hosts Checked: 1006
Hosts Scheduled: 1006
Hosts Actively Checked: 1006
Host Passively Checked: 0
Total Host State Change: 0.000 / 10.260 / 0.028 %
Active Host Latency: 0.007 / 2.104 / 0.366 sec
Active Host Execution Time: 0.009 / 9.021 / 0.046 sec
Active Host State Change: 0.000 / 10.260 / 0.028 %
Active Hosts Last 1/5/15/60 min: 253 / 991 / 1006 / 1006
Passive Host Latency: 0.000 / 0.000 / 0.000 sec
Passive Host State Change: 0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0
Hosts Up/Down/Unreach: 1004 / 2 / 0
Hosts Flapping: 0
Hosts In Downtime: 0
Active Host Checks Last 1/5/15 min: 293 / 1058 / 3078
Scheduled: 291 / 1049 / 3052
On-demand: 2 / 9 / 26
Parallel: 291 / 1050 / 3054
Serial: 0 / 0 / 0
Cached: 2 / 8 / 24
Passive Host Checks Last 1/5/15 min: 0 / 0 / 0
Active Service Checks Last 1/5/15 min: 610 / 2449 / 7074
Scheduled: 610 / 2449 / 7074
On-demand: 0 / 0 / 0
Cached: 0 / 0 / 0
Passive Service Checks Last 1/5/15 min: 0 / 0 / 0
External Commands Last 1/5/15 min: 0 / 0 / 0
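
The aggregate numbers above do not show which individual hosts or
services have an outdated "last check". A minimal sketch for finding
them from the status file follows. It assumes the usual Nagios 3.x /
Icinga 1.x status.dat layout, with "hoststatus {" and "servicestatus {"
blocks containing key=value lines and "last_check" stored as a Unix
timestamp. The function names and the 900-second threshold are
hypothetical choices, not part of Icinga.

```python
import re

# Matches the opening line of a host or service status block.
BLOCK_RE = re.compile(r"^(hoststatus|servicestatus)\s*\{\s*$")


def parse_status(text):
    """Yield (block_type, fields) for each hoststatus/servicestatus block."""
    block_type, fields = None, {}
    for raw in text.splitlines():
        line = raw.strip()
        match = BLOCK_RE.match(line)
        if match:
            block_type, fields = match.group(1), {}
        elif line == "}":
            if block_type is not None:
                yield block_type, fields
            block_type = None
        elif block_type is not None and "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value


def stale_checks(text, now, max_age_s=900):
    """Return (name, age in seconds) for checks older than max_age_s, oldest first."""
    stale = []
    for block_type, fields in parse_status(text):
        age = now - int(fields.get("last_check", 0))
        if age > max_age_s:
            name = fields.get("host_name", "?")
            if block_type == "servicestatus":
                name += "/" + fields.get("service_description", "?")
            stale.append((name, age))
    return sorted(stale, key=lambda item: -item[1])
```

It could be run against the status file reported above, for example
with stale_checks(open("/var/icinga/status.dat").read(), time.time())
after importing the time module. With a 5-minute check interval, any
entry it returns would be a candidate for the skipped checks described
earlier in this thread.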
_______________________________________________
icinga-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/icinga-users