On Thu, 6 Jan 2011, Christoph Maser wrote:

> On Thursday, 06.01.2011 at 08:26 +0100, Tomas Macek wrote:
>> Hi, I have an Intel server with 2 physical Xeon 3.00 GHz processors (4 cores
>> in total), 16 GB RAM and a RAID 1 array. With it I monitor about 1600
>> services and 1000 hosts. The service monitoring consists mostly (maybe
>> 95%) of check_ping and check_snmp services.
>>
>> From past experience, when the Nagios/Icinga load is too heavy, some
>> checks are somehow skipped. I can see this, for example, under "Host
>> problems" in the "last check" column: the last check is some hours old
>> when the host should be checked every 5 minutes. Forcing the check
>> always solves the problem.
>> For example, this morning some host was down but never recovered, although
>> the check_ping service on it was OK. Forcing the host check resolved
>> this.
>>
>> Do you think this hardware is enough for such a load? Or am I doing
>> something wrong? Thank you for sharing your experience.
>>
>> Regards, Tomas
>>
>
>
> Hi Tomas
>
> how high is the load on the system? Are periodic host checks globally
> enabled (execute_host_checks=1)? How does icingastats output look? Do
> you run any addons (idoutils) or other services on the same machine?
>
> I think your hardware should be good enough for the number of
> hosts/services unless you run the service checks in very short
> intervals.
>
> Chris

Hi Chris,
the load does not seem to be very high. Sometimes the 'load average' in
top shows something like '5, X, Y', which means that on average about 5
processes were running or waiting for the CPU during the previous minute.
But this happens only occasionally.
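
As a rough sketch of how to read that figure: a load average is usually
compared with the number of cores, so a 1-minute value of 5 on the 4 cores
described earlier in this thread works out to about 1.25 per core. The
helper name load_per_core is made up for illustration; os.getloadavg() and
os.cpu_count() are standard Python calls on Unix-like systems. Note that on
Linux the load average also counts processes in uninterruptible sleep
(typically waiting for disk I/O), so a high value does not necessarily mean
the CPUs themselves are busy.

```python
import os


def load_per_core(load_1min, cores):
    """Return the 1-minute load average divided by the number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_1min / cores


# Figures from this thread: a load of about 5 on a 4-core machine.
print(load_per_core(5.0, 4))  # 1.25

# On the monitoring host itself (Unix only), the live value could be read as:
try:
    one_min, _, _ = os.getloadavg()
    print(load_per_core(one_min, os.cpu_count() or 1))
except (AttributeError, OSError):
    # getloadavg() is missing on some platforms or may be unobtainable.
    print("load average not available on this platform")
```

A value that stays well above 1.0 per core for long periods would suggest
the machine is saturated; an occasional spike like the one described here
usually would not.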

The load could be caused by our graphing, which is enabled on the server.
It is software that updates 10,000 RRD files of 100 kB each. The graphing
puts the disk system under heavy load every 5 minutes for about 1 minute.
Maybe this causes the load and could slow Icinga down a little.
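
To put a rough number on that: 10,000 files of 100 kB each amount to about
1 GB of RRD data in total. The sketch below only computes an upper bound,
under the pessimistic assumption (an assumption, not a measurement) that
every file is rewritten in full during the roughly 60-second burst; a real
RRD update usually touches only a small part of each file. The function
name and the 1 MB = 1000 kB convention are chosen for illustration.

```python
def rrd_burst_estimate(num_files, file_kb, burst_seconds):
    """Return (total_mb, mb_per_second) if every file were rewritten in full."""
    total_mb = num_files * file_kb / 1000.0  # kB -> MB, decimal units
    return total_mb, total_mb / burst_seconds


# Figures from this message: 10,000 files, 100 kB each, ~1 minute of disk load.
total_mb, rate = rrd_burst_estimate(num_files=10_000, file_kb=100, burst_seconds=60)
print(f"total RRD data: {total_mb:.0f} MB")
print(f"worst-case rate: {rate:.1f} MB/s")
```

Even this worst case is a moderate sequential rate; the real cost on a
RAID 1 array is more likely to come from the many small random writes,
which this estimate does not model.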

My system has "execute_host_checks=1". From the documentation I don't
understand exactly what that means for system performance.
I don't run any addons.

My icingastats output is here:
-------------------------------


Icinga Stats 1.2.1
Copyright (c) 2009 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 10-25-2010
License: GPL

CURRENT STATUS DATA
------------------------------------------------------
Status File:                            /var/icinga/status.dat
Status File Age:                        0d 0h 0m 3s
Status File Version:                    1.2.1

Program Running Time:                   0d 18h 54m 22s
Icinga PID:                             16634
Used/High/Total Command Buffers:        0 / 2 / 4096

Total Services:                         1635
Services Checked:                       1635
Services Scheduled:                     1635
Services Actively Checked:              1635
Services Passively Checked:             0
Total Service State Change:             0.000 / 10.790 / 0.043 %
Active Service Latency:                 0.003 / 3.086 / 0.362 sec
Active Service Execution Time:          0.014 / 13.349 / 2.318 sec
Active Service State Change:            0.000 / 10.790 / 0.043 %
Active Services Last 1/5/15/60 min:     564 / 1581 / 1633 / 1634
Passive Service Latency:                0.000 / 0.000 / 0.000 sec
Passive Service State Change:           0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min:    0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit:              1628 / 1 / 2 / 4
Services Flapping:                      0
Services In Downtime:                   0

Total Hosts:                            1006
Hosts Checked:                          1006
Hosts Scheduled:                        1006
Hosts Actively Checked:                 1006
Host Passively Checked:                 0
Total Host State Change:                0.000 / 10.260 / 0.028 %
Active Host Latency:                    0.007 / 2.104 / 0.366 sec
Active Host Execution Time:             0.009 / 9.021 / 0.046 sec
Active Host State Change:               0.000 / 10.260 / 0.028 %
Active Hosts Last 1/5/15/60 min:        253 / 991 / 1006 / 1006
Passive Host Latency:                   0.000 / 0.000 / 0.000 sec
Passive Host State Change:              0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min:       0 / 0 / 0 / 0
Hosts Up/Down/Unreach:                  1004 / 2 / 0
Hosts Flapping:                         0
Hosts In Downtime:                      0

Active Host Checks Last 1/5/15 min:     293 / 1058 / 3078
    Scheduled:                           291 / 1049 / 3052
    On-demand:                           2 / 9 / 26
    Parallel:                            291 / 1050 / 3054
    Serial:                              0 / 0 / 0
    Cached:                              2 / 8 / 24
Passive Host Checks Last 1/5/15 min:    0 / 0 / 0
Active Service Checks Last 1/5/15 min:  610 / 2449 / 7074
    Scheduled:                           610 / 2449 / 7074
    On-demand:                           0 / 0 / 0
    Cached:                              0 / 0 / 0
Passive Service Checks Last 1/5/15 min: 0 / 0 / 0

External Commands Last 1/5/15 min:      0 / 0 / 0
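
One way to read these numbers against the 5-minute check interval mentioned
earlier in the thread is to compare the "Last 5 min" counter with the total.
The sketch below does that for four lines copied from the output above. The
function name is hypothetical, the regular expressions assume exactly this
icingastats text layout, and it assumes every check is meant to run at
least once every 5 minutes, which may not hold for all services.

```python
import re


def last_five_minute_coverage(stats_text, kind):
    """Fraction of `kind` ('Service' or 'Host') objects actively checked in the last 5 minutes."""
    total = re.search(rf"^Total {kind}s:\s+(\d+)", stats_text, re.MULTILINE)
    recent = re.search(
        rf"^Active {kind}s Last 1/5/15/60 min:\s+(\d+) / (\d+) / (\d+) / (\d+)",
        stats_text,
        re.MULTILINE,
    )
    if total is None or recent is None:
        raise ValueError(f"could not find {kind} counters in the stats output")
    # group(2) is the 5-minute column of the 1/5/15/60 counters.
    return int(recent.group(2)) / int(total.group(1))


# Lines copied from the icingastats output in this message.
sample = """\
Total Services:                         1635
Active Services Last 1/5/15/60 min:     564 / 1581 / 1633 / 1634
Total Hosts:                            1006
Active Hosts Last 1/5/15/60 min:        253 / 991 / 1006 / 1006
"""

print(round(last_five_minute_coverage(sample, "Service"), 3))  # 0.967
print(round(last_five_minute_coverage(sample, "Host"), 3))     # 0.985
```

In this snapshot 1581 of 1635 services and 991 of 1006 hosts were checked
within the last 5 minutes, and nearly all of them within 15 minutes, so the
output by itself does not show checks that are hours overdue.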



_______________________________________________
icinga-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/icinga-users
