[Nagios-users] [SPAM] Large installation

2012-06-11 Thread Brandino Andreas
Hi all,

my Nagios installation currently has 400+ hosts and around 1400 checks.
As the server load grows, delays are appearing.

Is there any way to move some of the active checks to a second Nagios server?
And in that case, how would the two Nagios servers exchange data?
If this is feasible, can you point me to some documentation?

Thank you

<> ---  ---  --- <> 
Brandino Andreas
ampra...@gmail.com
<> ---  ---  --- <> 


--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


[Nagios-users] Large scale installation

2012-06-11 Thread Andreas Brandino
Hi all,

my Nagios installation currently has 400+ hosts and around 1400 checks.
As the server load grows, delays are appearing.

Is there any way to move some of the active checks to a second Nagios server?
And in that case, how would the two Nagios servers exchange data?
If this is feasible, can you point me to some documentation?

Thank you

[Nagios-users] Passive freshness checks go wrong

2012-06-11 Thread MAD
Hi,

I have a Nagios 3.2.3 box on Ubuntu Server 11.04 with about 930 hosts 
and 28500 services. For each host, I have 5 passive checks mapped to 
SNMP traps. To safeguard those checks, I set up freshness checks 
using check_dummy and a freshness threshold of 15 min, as I normally 
receive traps every 10 min.

When I added 5 hosts to Nagios, all my passive checks went progressively 
to a critical status saying "Return code of 127 is out of bounds - 
plugin may be missing". I didn't touch my commands.cfg file or my 
services files; I only added 5 files to my hosts configuration 
directory and reloaded Nagios.
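
For context, 127 is the status a POSIX shell returns when it cannot find the command it was asked to run, so this error usually points at the command line of the freshness check rather than at the passive results themselves. Below is a minimal sketch for classifying the exit status of a command line by hand. The function name check_plugin_cmd and the example path are hypothetical, not part of Nagios:

```shell
#!/bin/sh
# Hypothetical helper: run a command line through the shell, the way a check
# command would be run, and classify its exit status.
check_plugin_cmd() {
    sh -c "$1" >/dev/null 2>&1
    rc=$?
    case "$rc" in
        0|1|2|3) echo "exit $rc: valid plugin return code" ;;
        126)     echo "exit 126: command found but not executable" ;;
        127)     echo "exit 127: command not found (check the path or macro)" ;;
        *)       echo "exit $rc: out of bounds for a plugin" ;;
    esac
}

# Example with an assumed plugin path; substitute the expanded command line
# of the freshness check from your own configuration.
# check_plugin_cmd '/usr/lib/nagios/plugins/check_dummy 2 "No trap received"'
```

Running it as the nagios user against the fully expanded command line shows whether the shell can find the plugin at all.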

Has somebody already seen that kind of behaviour?

Thanks in advance,

Marc-André



Re: [Nagios-users] check_snmp_ibm_imm.sh Plugin Help

2012-06-11 Thread Peter . Shankland
When I run the following command to check the plugin as the nagios 
user:

su nagios -s /bin/bash -c "./check_snmp_ibm_imm.sh -H 172.29.13.16 -C 4H2KZNpX -T voltage"

I do get correct output:

Planar 3.3V = 3250
Planar 5V = 4900
Planar 12V = 11880
Planar VBAT = 2920
|Voltage1=3250 Voltage2=4900 Voltage3=11880 Voltage4=2920

Within Nagios, I am using the IP address of the IMM card and not a DNS 
name of either the IMM or physical server.

Any ideas?

Thanks.



On 04/06/2012 21:32, Jake Xu wrote:

In the HOSTNAME variable, you might want to use the IP of the IMM instead 
of the IP of the host. The IBM server itself doesn't support this IMM check; 
you should be running the check against the IBM IMM device.

Jake

On Fri, Jun 1, 2012 at 1:56 AM, Giles Coochey  wrote:
On 01/06/2012 00:18, Stuart Browne wrote: 
What happens when you run this check as the user Nagios runs as (usually 
'nagios')?
 
Stuart
 
From: peter.shankl...@ricoh-rpl.com [mailto:peter.shankl...@ricoh-rpl.com] 

Sent: Thursday, 31 May 2012 8:14 PM
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] check_snmp_ibm_imm.sh Plugin Help
 
Hi all, 

I am trying to use the check_snmp_ibm_imm.sh plugin from Nagios Exchange (
http://exchange.nagios.org/directory/Plugins/Hardware/Server-Hardware/IBM/check_snmp_ibm_imm-2Esh/details
) but am having issues with getting a status back through Nagios - command 
line seems to work fine. 

As an example, when I run "./check_snmp_ibm_imm.sh -H  -C 
 -T voltage" at the command line of the Nagios server I get 
output: 



So this shows that the plugin is working fine. However, when I then 
translate that into Nagios with the new command of: 
Not quite - I don't see a status of OK / WARNING or CRITICAL - and this 
doesn't show you the exit code of the check either.
I don't know this plugin, but are there any parameters for warning or 
critical ? You may need to put these in. At the moment it is only 
returning performance data to you.
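
The point about the exit code can be checked from the command line. Nagios takes the state from the plugin's exit status (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), not from the text it prints. A minimal sketch, where run_plugin is a hypothetical wrapper name and not something shipped with the plugin:

```shell
#!/bin/sh
# Hypothetical wrapper: run a plugin, show the first line of its output and
# the exit status that Nagios would use to decide the state.
run_plugin() {
    output=$("$@" 2>&1)
    rc=$?
    case "$rc" in
        0) state=OK ;;
        1) state=WARNING ;;
        2) state=CRITICAL ;;
        3) state=UNKNOWN ;;
        *) state=INVALID ;;
    esac
    printf '%s\n' "$output" | head -n 1
    echo "exit=$rc state=$state"
}

# Usage, with the host and community string filled in for your environment:
# run_plugin ./check_snmp_ibm_imm.sh -H <imm-ip> -C <community> -T voltage
```

If the last line reports INVALID, the problem lies in how the plugin exits rather than in the Nagios command definition.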





I just get the following: 



It is obviously something I am doing wrong in Nagios but could do with a 
push in the right direction :) 

Thanks. 
Pete. 




-- 
Regards,

Giles Coochey, CCNA, CCNAS
NetSecSpec Ltd
+44 (0) 7983 877438
http://www.coochey.net
http://www.netsecspec.co.uk
gi...@coochey.net



Re: [Nagios-users] Large scale installation

2012-06-11 Thread Daniel Wittenberg
Take a look at mod_gearman for distributing the checks:

http://labs.consol.de/lang/de/nagios/mod-gearman/

Dan



Re: [Nagios-users] Large scale installation

2012-06-11 Thread Assaf Flatto

See the documentation on distributed monitoring.

Also check out check_mk and mod_gearman.

Assaf



Re: [Nagios-users] Large scale installation

2012-06-11 Thread Randal, Phil
What's the spec of your nagios server?

We're checking around 500 hosts with 4500 active and 5000 passive services on 
Nagios 3.4.1 in a CentOS 5.8 VM with 2GB RAM and 4 vCPUs, without problems, with 
the help of check_mk / mk_livestatus (http://mathias-kettner.de/check_mk.html).

We also use pnp4nagios, rrdcached and a ramdisk for checkresults.

The large installation config tweaks and tuning the check result reaper frequency 
both help even out the load.

Cheers,

Phil
--
Phil Randal
Infrastructure Engineer
Hoople Ltd | Thorn Office Centre | Hereford HR2 6JT
Tel: 01432 260415 | Email: phil.ran...@hoopleltd.co.uk


Re: [Nagios-users] Large scale installation

2012-06-11 Thread Paul Weaver
Doesn't it depend on how often you're performing the checks too? 1000 checks 
every 10 seconds is harder than 10,000 checks every hour.

We have 589 hosts and 3619 services on a 2-CPU 2.8GHz Xeon with 1GB of RAM, which 
does other things too. The machine is about 8-10 years old.

The checks are scheduled every 4 minutes; however, only 45% have run in the last 
4 minutes. 95% have run in the last 15.

This is with Nagios 2, though, which has issues like blocking when hosts are down 
(32 currently are).
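
The comparison of check rates above can be made concrete with a little arithmetic. This is only a sketch: checks_per_second is a hypothetical helper name, and it assumes every check runs exactly once per interval.

```shell
#!/bin/sh
# checks_per_second CHECKS INTERVAL_SECONDS
# Average number of checks the scheduler must start per second, assuming
# each check runs once per interval.
checks_per_second() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.2f\n", n / s }'
}

checks_per_second 1000 10      # 1000 checks every 10 seconds  -> 100.00
checks_per_second 10000 3600   # 10,000 checks every hour      -> 2.78
checks_per_second 3619 240     # 3619 services every 4 minutes -> 15.08
```

By this measure the first workload is roughly 36 times heavier than the second, even though it has a tenth of the checks.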



Re: [Nagios-users] [SPAM] Large installation

2012-06-11 Thread Claudio Kuenzler
I think mod_gearman is what you're looking for:
http://labs.consol.de/lang/en/nagios/mod-gearman/

Also, 1400 checks is not that large a setup. You can set a longer
check_interval for checks that don't need to run very often (e.g. disk
space utilization, HDD SMART status or validity of SSL certificates).
This can help a lot to bring down the load on your Nagios server.


Re: [Nagios-users] Large scale installation

2012-06-11 Thread Giorgio Zarrelli
Hi,

there are some key factors involved in high delays:

iowait
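
As a rough way to put a number on iowait, the share of CPU time spent waiting on I/O can be read from the aggregate "cpu" line of /proc/stat. This sketch assumes the Linux /proc/stat field order (user, nice, system, idle, iowait, irq, softirq, steal); iowait_pct is a hypothetical helper name. The counters are cumulative since boot, so the result is a long-term average rather than a live reading:

```shell
#!/bin/sh
# iowait_pct: read /proc/stat-style input on stdin and print the percentage
# of CPU time accounted to iowait on the aggregate "cpu" line.
iowait_pct() {
    awk '$1 == "cpu" {
        total = 0
        # Fields 2..9 are user, nice, system, idle, iowait, irq, softirq, steal.
        for (i = 2; i <= 9; i++) total += $i
        printf "%.1f\n", 100 * $6 / total
    }'
}

# Usage on a Linux host:
# iowait_pct < /proc/stat
```

For a live view, tools such as iostat or top report the same figure per sampling interval.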


Ciao,

Giorgio



Re: [Nagios-users] Large scale installation

2012-06-11 Thread Giorgio Zarrelli
Hi,

I suggest reviewing your installation. Start with the large installation
tweaks: http://nagios.sourceforge.net/docs/3_0/largeinstalltweaks.html

Then, check whether you need all your checks at a 5-minute interval or
whether you can move some of them to a 10-minute interval.

Then, review your check plugins: Perl plugins eat more memory and CPU
cycles than compiled C checks. If they support EPN
(http://nagios.sourceforge.net/docs/3_0/embeddedperl.html), use it; it
makes your plugins faster and lighter.

Then, check your checks. Some checks return data more slowly than others.
SNMP checks, for example, are not lightning fast.

Then, check your graphs. Graphing perfdata takes CPU cycles and uses
memory. Do you need all your graphs?

Then, get rid of NDOUtils. It chokes under load and is inefficient,
clumsy, old and heavy. If you want to store your data in MySQL, use Merlin
instead.

Anyway, did you tune your MySQL? Is it causing too much I/O? Is it
munching too much RAM or CPU cycles?

Did you tune your Apache or http server? Does it cope with your needs? Is
it munching too much RAM or CPU cycles?

If you want live information about your hosts and services, for example
to use with NagVis, grab MK Livestatus: it's blazing fast and gives you
access to the core Nagios process.

Are you using a virtualized environment? If so, remember that the I/O
layer in virtualized environments performs poorly; use fast, real disks
and your I/O wait will drop dramatically.

Try moving status.dat to /dev/shm. The latter is a ready-to-use RAM disk,
and writing to RAM is faster than writing to disk.

Avoid logging too much; it increases I/O and takes CPU and RAM.

What are iotop and iostat telling you?

What do you see in top or htop?

If you can, compile everything from source; it can run faster on your
system.

You can use passive checks with NSCA or NRDP to reduce load, even though I
do not like them a lot.

These are just a few ideas that came to mind.


Let's talk about sharing load.

You can use different methods:

Merlin
(http://www.op5.org/community/plugin-inventory/op5-projects/merlin): gives
you load balancing and redundancy. I use it for Ninja; I have never used
it for load balancing or redundancy.

DNX (http://dnx.sourceforge.net/): something new that is gaining momentum
and is good for offloading checks. Worth a try.

Mod_gearman (http://labs.consol.de/lang/de/nagios/mod-gearman/): love at
first sight :-) Easy, powerful, load balancing and fault tolerant. Compile
gearmand with memcached support and all the check results will go directly
to RAM, avoiding disk I/O. It's really simple to set up, and if one of the
workers goes down, the others will share its work. Be careful: security is
a problem because there is no good authentication system, but using a VPN
will solve that. It is efficient: I use a virtual machine with 2 cores and
2 GB of RAM to run about 5K checks, and the load is not a concern. Need
more horsepower? Add a worker. Have some checks timing out due to poor
connections to the targets? Put a worker close to the target, but be
careful: timings such as the RTA of a ping will be measured from the
worker's perspective.

Well, hope it helps.












Re: [Nagios-users] Large scale installation

2012-06-11 Thread Jake Xu
Also, you might want to find out the performance of your service checks.

The nagios profiler is a very good tool to find execution time of
individual services.

http://exchange.nagios.org/directory/Plugins/Network-and-Systems-Management/Nagios/Profiler-to-check-plugin-execution-time/details


Re: [Nagios-users] Large scale installation

2012-06-11 Thread Ian Orszaczki
Great advice. Funny you should mention status.dat on a ramdisk, as we
hit a hiccup this morning that cost us our comments and downtimes.

We had moved status.dat to a ramdisk as recommended for large installations
(we monitor 3390 hosts with 18748 services from one server, with latencies
below 2 seconds and load under 2), but after the server ran out of open files
status.dat was zeroed.


As an extreme hack I ran a quick script across the output of -
# grep EXTERNAL nagios.log | grep ACK | cut -c57- > /tmp/acks.txt

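Script -
 #!/bin/sh
 # This is a sample shell script showing how you can submit the
 # ACKNOWLEDGE_HOST_PROBLEM command to Nagios by replaying the lines
 # saved in /tmp/acks.txt. Adjust variables to fit your environment.
 now=$(date +%s)
 commandfile='/app/nagios/var/rw/nagios.cmd'
 while IFS= read -r line
 do
     echo "$line"
     # Pass the line as an argument rather than inside the format string,
     # so any % characters in a comment are written literally.
     /usr/bin/printf '[%lu] %s\n' "$now" "$line" > "$commandfile"
 done < /tmp/acks.txt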

Therefore I am going to move status.dat back onto the local disk (luckily
SSD drives) so that we can at least restore from a recent backup. I will
probably also create a valid copy, along with retention.dat, every hour to
enable quick recovery. And yes, I have increased the process and open files
limits for the nagios user.

Am I missing anything obvious?
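
The hourly copy could look roughly like the sketch below. It is only an
illustration: the function name backup_state and the backup directory are
made up, and the location of status.dat and retention.dat under
/app/nagios/var is an assumption based on the command file path above. The
one idea it tries to show is that a missing or zero-length file is skipped,
so a zeroed status.dat never replaces the last good copy.

```shell
#!/bin/sh
# Sketch only: copy Nagios state files into a backup directory, skipping
# any file that is missing or empty. backup_state is a hypothetical helper.

backup_state() {
    # usage: backup_state BACKUP_DIR FILE...
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    rc=0
    for f in "$@"; do
        if [ -s "$f" ]; then
            # Copy under a temporary name, then rename, so the backup
            # directory never holds a half-written file.
            base=$(basename "$f")
            cp -p "$f" "$dest/$base.tmp" && mv "$dest/$base.tmp" "$dest/$base" || rc=1
        else
            echo "skipping $f: missing or empty" >&2
            rc=1
        fi
    done
    return $rc
}
```

From an hourly cron job it could be called like this (paths are assumptions):
 backup_state /var/backups/nagios /app/nagios/var/status.dat /app/nagios/var/retention.dat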


On Tue, Jun 12, 2012 at 5:40 AM, Giorgio Zarrelli  wrote:

> Hi,
>
> I suggest reviewing your installation. Start with the large installation
> tweaks: http://nagios.sourceforge.net/docs/3_0/largeinstalltweaks.html
>
> Then, check whether you need all your checks at a 5-minute interval or
> whether you can move some of them to 10 minutes.
>
> Then, review your check plugins: Perl plugins eat more memory and CPU
> cycles than compiled C checks. If they support EPN
> (http://nagios.sourceforge.net/docs/3_0/embeddedperl.html), use it; it
> makes your plugins faster and lighter.
>
> Then, check your checks. Some checks return data more slowly than others.
> SNMP checks, for example, are not lightning fast.
>
> Then, check your graphs. Graphing perfdata takes CPU cycles and uses
> memory. Do you need all your graphs?
>
> Then, get rid of NDOUtils. It chokes all the way: inefficient, clumsy,
> old and heavy. If you want to store your data in MySQL, use Merlin
> instead.
>
> Anyway, did you tune your MySQL? Is it causing too much I/O? Is it
> munching too much RAM or CPU cycles?
>
> Did you tune your Apache or other HTTP server? Does it cope with your
> needs? Is it munching too much RAM or CPU cycles?
>
> If you want live information about your hosts and services, let's say to
> use with NagVis, grab MK Livestatus: it's blazing fast and gives you
> access to the core Nagios process.
>
> Are you using a virtualized environment? If so, remember that the I/O
> layer in virtualized environments performs poorly; use fast, real disks
> and your I/O wait will drop dramatically.
>
> Try moving status.dat to /dev/shm. The latter is a RAM disk ready to use,
> and writing to RAM is always faster than writing to disk.
>
> Avoid logging too much; it increases I/O and takes CPU and RAM.
>
> What are iotop and iostat telling you?
>
> What do you see in top or htop?
>
> If you can or wish, compile everything from source; it will run faster on
> your system.
>
> You can use passive checks with NSCA or NRDP to reduce load, even though I
> do not like them a lot.
>
> These are just a few ideas that came to mind.
>
>
> Let's talk about sharing load.
>
> You can use different methods:
>
> Merlin
> (http://www.op5.org/community/plugin-inventory/op5-projects/merlin): gives
> you load balancing and redundancy. I use it for Ninja, but have never used
> it for load balancing and redundancy.
>
> DNX (http://dnx.sourceforge.net/): Something new, it's gaining momentum,
> good for offloading checks. Worth a try.
>
> Mod_gearman (http://labs.consol.de/lang/de/nagios/mod-gearman/): Love at
> first sight :-) Easy, powerful, load balancing and fault tolerant. Compile
> gearmand with memcached support and all the check results will go directly
> to RAM, avoiding disk I/O. It's really simple to set up; if one of the
> workers goes down, the others will share its work. Be careful: security is
> a problem, as there is no good auth system, but using a VPN will solve
> that. It is efficient: I use a virtual machine with 2 cores and 2 GB of RAM
> to run about 5K checks, and the load is not a concern. You need more
> horsepower? Add a worker. You have some checks timing out due to poor
> connections to the targets? Put a worker close to the target, but be
> careful: the timing, let's say the RTA of a ping, will be from the
> worker's perspective.
>
> Well, hope it helps.
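
On the /dev/shm suggestion quoted above: a minimal sketch, assuming the
status file location is set with the status_file directive in nagios.cfg,
for checking where a given config points it. The helper names
status_file_path and on_ramdisk are made up for this example, and the test
for a RAM disk is just a path prefix match on /dev/shm, nothing smarter.

```shell
#!/bin/sh
# Sketch only: find out where nagios.cfg points status_file and whether
# that path sits under /dev/shm. Both helpers are hypothetical.

status_file_path() {
    # usage: status_file_path NAGIOS_CFG
    # Prints the value of the last uncommented status_file= line.
    sed -n 's/^[[:space:]]*status_file[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

on_ramdisk() {
    # usage: on_ramdisk PATH
    # Succeeds when PATH is below /dev/shm (prefix match only).
    case $1 in
        /dev/shm/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Example use, with the config path being an assumption for this setup:
 p=$(status_file_path /app/nagios/etc/nagios.cfg)
 on_ramdisk "$p" && echo "$p is on the RAM disk, keep a copy on real disk too"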

Re: [Nagios-users] Large scale installation

2012-06-11 Thread Giorgio Zarrelli
Hi,

You are right, open files IS a major concern that I forgot to mention. A quick
and dirty way to solve it is to raise the open files limit by putting a
ulimit -n command followed by a high value in the Nagios startup script.

ulimit -a will show the limits that currently apply to your shell.

Lucky you, SSD disks are a good improvement!
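
As a rough sketch of what that could look like in a startup script (the
function name raise_nofile and the value 8192 are just examples, not
anything Nagios ships): it only ever raises the soft limit, and it caps
the request at the hard limit, since an unprivileged process cannot go
above that.

```shell
#!/bin/sh
# Sketch only: raise the soft open-files limit before starting a daemon.
# raise_nofile is a hypothetical helper; it prints the resulting soft limit.

raise_nofile() {
    # usage: raise_nofile WANTED
    wanted=$1
    hard=$(ulimit -H -n)
    soft=$(ulimit -S -n)
    # Cap the request at the hard limit unless that is "unlimited".
    if [ "$hard" != "unlimited" ] && [ "$wanted" -gt "$hard" ]; then
        wanted=$hard
    fi
    # Never lower a limit that is already high enough.
    if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$wanted" ]; then
        echo "$soft"
        return 0
    fi
    ulimit -S -n "$wanted" 2>/dev/null || return 1
    ulimit -S -n
}
```

In the init script this would go right before the daemon is launched, for
example: raise_nofile 8192 >/dev/null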

Ciao,

Giorgio

On 12 Jun 2012, at 03:59, Ian Orszaczki wrote:

> On Tue, Jun 12, 2012 at 5:40 AM, Giorgio Zarrelli  wrote:
> Mod_gearman (http://labs.consol.de/lang/de/nagios/mod-gearman/): Love at
> first sight :-) Easy, powerful, load balancing and fault tolerant. Compile
> gearmand with memcached support and all the check results will go directly
> to RAM, avoiding disk I/O. It's really simple to set up; if one of the
> workers goes down, the others will share its work. Be careful: security is
> a problem, as there is no good auth system, but using a VPN will solve
> that. It is efficient: I use a virtual machine with 2 cores and 2 GB of RAM
> to make about 5K checks. And the load is not a concern. You need more
> horse power? Add a worker. You have some checks timing out due to poor
> connections to