Re: [Nagios-users] Notifications severely impacting Nagios performance. Ideas?
Ryan Bowlby wrote:
> How does everyone handle this issue? Obviously the notifications are a
> bottleneck. Has anyone configured notifications to be dropped in a file or
> fifo to be processed by an external daemon? Something like OCP daemon if
> anyone is familiar with it.

I've experienced the exact same behavior on both 2.x and 3.x when remote sites go down. Since each passive result takes 10s to time out, the queue just becomes so backed up it stops being useful. OCP Daemon fixed it for me.

--
Sean McAfee
System Engineer

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
Re: [Nagios-users] Notifications severely impacting Nagios performance. Ideas?
Hello Ryan,

I do not know how your network is organized, but you could take a look at service/host dependencies. Depending on your network layout, I think they could help:

http://nagios.sourceforge.net/docs/2_0/dependencies.html

Their purpose is to suppress notifications when a "main" host is down; in that case you get only the notification about that problem. I believe you have some gateway/router, and when it has problems you get many notifications. I think it is rare for many hosts to have a local problem at the same time.

Hope this helps.

Nice regards,
Arlen

On Tue, Aug 25, 2009 at 4:43 AM, Ryan Bowlby wrote:
> Hello Everyone,
>
> We are experiencing issues with Nagios notifications. Apparently when a large
> amount of services become critical and trigger notifications the system all
> but stops processing new service check results (passive). The "last check
> times" in the web interface stop updating until notifications are disabled.
>
> Tailing the nagios.log file during this issue shows mostly notifications and
> hardly any check results being processed. As soon as I temporarily disable
> notifications - system wide - the passive check results literally fly off the
> screen, and all the stale results in the web interface clear up quickly.
>
> How does everyone handle this issue? Obviously the notifications are a
> bottleneck. Has anyone configured notifications to be dropped in a file or
> fifo to be processed by an external daemon? Something like OCP daemon if
> anyone is familiar with it.
>
> Obviously parent child relationships would prove useful in curbing the
> notifications, but implementation of relationships is a while off (5 D.C.s
> lot's a data to have to collect). Any suggestions would be greatly
> appreciated!
>
> -Ryan B.
>
> ==
>
> FreeBSD 7.0-RELEASE amd64, Nagios 3.0 w/ NDO2DB 1.4b7
> Total Hosts: 3741, Total Services: 9321
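The suppression idea behind dependencies can be illustrated with a small sketch. This is a simplified model written for illustration, not Nagios's actual implementation: the function name, the host names, and the rule "suppress a failed host if any host it depends on has also failed" are all assumptions.

```python
def hosts_to_notify(failed, depends_on):
    """Return the failed hosts whose notification should still go out.

    failed: set of host names currently failing their checks.
    depends_on: dict mapping a host to the hosts it depends on
                (hypothetical data, e.g. servers behind a gateway).
    A failed host is suppressed when any host it depends on has also failed.
    """
    return {
        host
        for host in failed
        if not any(master in failed for master in depends_on.get(host, ()))
    }


if __name__ == "__main__":
    # Hypothetical outage: the gateway of one site fails along with the
    # servers behind it, plus one unrelated server at another site.
    failed = {"gw-dc1", "web01", "web02", "db-dc2"}
    depends_on = {
        "web01": ["gw-dc1"],
        "web02": ["gw-dc1"],
        "db-dc2": ["gw-dc2"],
    }
    print(sorted(hosts_to_notify(failed, depends_on)))
```

In this model only the gateway and the unrelated server produce notifications; the servers behind the failed gateway stay quiet.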
Re: [Nagios-users] Notifications severely impacting Nagios performance. Ideas?
On Aug 24, 2009, at 9:43 PM, Ryan Bowlby wrote:
> Hello Everyone,
>
> We are experiencing issues with Nagios notifications. Apparently
> when a large amount of services become critical and trigger
> notifications the system all but stops processing new service check
> results (passive). The "last check times" in the web interface stop
> updating until notifications are disabled.
>
> Tailing the nagios.log file during this issue shows mostly
> notifications and hardly any check results being processed. As soon
> as I temporarily disable notifications - system wide - the passive
> check results literally fly off the screen, and all the stale
> results in the web interface clear up quickly.

I can confirm this behavior, at least under 2.x. I haven't had a major outage under 3.x yet to see if it still happens.

My previous speculation on the issue is at the link below. I don't have enough knowledge of the code to be certain that's what's really happening, though. Since then, I've come to suspect that the bottleneck might be the logic that determines who gets notified rather than the notification itself.

http://www.mail-archive.com/nagios-users@lists.sourceforge.net/msg10803.html

> How does everyone handle this issue?

I tried backgrounding the notification script, but that didn't help. We recognize that in the event of a major outage we don't want to receive those notifications anyway, so we disable notifications program-wide until the outage is over. Not a fix, per se, but a workable process for us. It just doesn't happen often enough to really troubleshoot or spend a lot of time on.

--
Marc
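Disabling notifications program-wide can be scripted through the Nagios external command file rather than the web interface. The sketch below assumes external commands are enabled; the command file path shown is only an assumed example and differs between installations, and the helper names are made up for illustration. DISABLE_NOTIFICATIONS and ENABLE_NOTIFICATIONS are standard external commands.

```python
import time


def format_external_command(command, now=None):
    """Format one external command line: "[<unix time>] <COMMAND>" plus newline."""
    timestamp = int(time.time() if now is None else now)
    return "[%d] %s\n" % (timestamp, command)


def send_external_command(command_file, command):
    """Write one command to the Nagios command file (a named pipe).

    Opening the pipe blocks until Nagios has it open for reading, so this
    should only be run while Nagios is up.
    """
    with open(command_file, "w") as pipe:
        pipe.write(format_external_command(command))


if __name__ == "__main__":
    # Assumed path; check the command_file setting in nagios.cfg.
    command_file = "/usr/local/nagios/var/rw/nagios.cmd"
    send_external_command(command_file, "DISABLE_NOTIFICATIONS")
    # Once the outage is over:
    # send_external_command(command_file, "ENABLE_NOTIFICATIONS")
```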
Re: [Nagios-users] Notifications severely impacting Nagios performance. Ideas?
2009/8/25 Ryan Bowlby:
> Hello Everyone,
>
> We are experiencing issues with Nagios notifications. Apparently when a large
> amount of services become critical and trigger notifications the system all
> but stops processing new service check results (passive). The "last check
> times" in the web interface stop updating until notifications are disabled.
>
> Tailing the nagios.log file during this issue shows mostly notifications and
> hardly any check results being processed. As soon as I temporarily disable
> notifications - system wide - the passive check results literally fly off the
> screen, and all the stale results in the web interface clear up quickly.
>
> How does everyone handle this issue? Obviously the notifications are a
> bottleneck. Has anyone configured notifications to be dropped in a file or
> fifo to be processed by an external daemon? Something like OCP daemon if
> anyone is familiar with it.
>
> Obviously parent child relationships would prove useful in curbing the
> notifications, but implementation of relationships is a while off (5 D.C.s
> lot's a data to have to collect). Any suggestions would be greatly
> appreciated!

If you have set retry_check_interval to a very low value compared with normal_check_interval, then inevitably Nagios will be working a lot harder if a lot of hosts are down at the same time.

hth (a bit),
Jim
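A rough back-of-the-envelope model shows how the interval ratio drives load. This is only a sketch: the 5-minute and 1-minute intervals and the count of failing services are assumed values, it treats every failing service as being rechecked at the retry interval, and it applies to actively scheduled checks only (the original poster's results are passive).

```python
def active_checks_per_minute(total, failing, normal_interval, retry_interval):
    """Estimate scheduled checks per minute.

    total: number of actively checked services.
    failing: how many of them are currently being retried.
    normal_interval, retry_interval: minutes between checks (assumed values).
    """
    healthy = total - failing
    return healthy / normal_interval + failing / retry_interval


if __name__ == "__main__":
    services = 9321  # service count from the original post
    print(active_checks_per_minute(services, 0, 5, 1))
    print(active_checks_per_minute(services, 3000, 5, 1))
```

Under these assumptions the rate goes from roughly 1864 checks per minute with everything healthy to roughly 4264 with 3000 services retrying.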
[Nagios-users] Notifications severely impacting Nagios performance. Ideas?
Hello Everyone,

We are experiencing issues with Nagios notifications. Apparently when a large number of services become critical and trigger notifications, the system all but stops processing new service check results (passive). The "last check times" in the web interface stop updating until notifications are disabled.

Tailing the nagios.log file during this issue shows mostly notifications and hardly any check results being processed. As soon as I temporarily disable notifications - system wide - the passive check results literally fly off the screen, and all the stale results in the web interface clear up quickly.

How does everyone handle this issue? Obviously the notifications are a bottleneck. Has anyone configured notifications to be dropped in a file or fifo to be processed by an external daemon? Something like OCP daemon, if anyone is familiar with it.

Obviously parent/child relationships would prove useful in curbing the notifications, but implementation of relationships is a while off (5 D.C.s, lots of data to have to collect). Any suggestions would be greatly appreciated!

-Ryan B.

==

FreeBSD 7.0-RELEASE amd64, Nagios 3.0 w/ NDO2DB 1.4b7
Total Hosts: 3741, Total Services: 9321
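The "drop notifications in a file for an external daemon" idea could look roughly like the sketch below. Everything here is an assumption made for illustration: the spool directory layout, the JSON record format, and the function names are not part of Nagios or of OCP daemon. The notification command would call spool_notification() and return at once, while a separate long-running process calls drain_spool() periodically. Note that one reply in this thread reports that simply backgrounding the notification script did not help, so whether this relieves the bottleneck is untested.

```python
import json
import os
import time
import uuid


def spool_notification(spool_dir, contact, subject, body):
    """Write one notification to spool_dir and return its path.

    The file is written under a temporary name and then renamed, so the
    consumer never reads a half-written record.
    """
    os.makedirs(spool_dir, exist_ok=True)
    record = {"time": time.time(), "contact": contact,
              "subject": subject, "body": body}
    name = uuid.uuid4().hex
    tmp_path = os.path.join(spool_dir, name + ".tmp")
    final_path = os.path.join(spool_dir, name + ".json")
    with open(tmp_path, "w") as handle:
        json.dump(record, handle)
    os.rename(tmp_path, final_path)
    return final_path


def drain_spool(spool_dir, deliver):
    """Pass every spooled record to deliver(), oldest first, then delete it.

    Returns the number of records delivered.
    """
    records = []
    for name in os.listdir(spool_dir):
        if not name.endswith(".json"):
            continue  # skip files that are still being written
        path = os.path.join(spool_dir, name)
        with open(path) as handle:
            records.append((json.load(handle), path))
    records.sort(key=lambda item: item[0]["time"])
    for record, path in records:
        deliver(record)
        os.remove(path)
    return len(records)
```

The deliver callback is where the slow work (mail, pager gateway) would happen, outside the Nagios process.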