Re: [Nagios-users] status.cgi very high cpu usage = Problem gone away
It is not slow because of many requests. It is slow because CPU execution is about 30 times slower than normal, so the box effectively runs like a 100 MHz i486. I benchmarked on the console only, with all network interfaces disconnected, so no, it is not because of Google :-)

Obviously VMware is doing something nasty. When the box is slow and under high load, the host load is still rather low.

Cheers,

On Fri, Feb 22, 2008 at 7:26 PM, Hugo van der Kooij [EMAIL PROTECTED] wrote:
> You wouldn't happen to own a Google indexing device now, would you?
> That could hammer down a web-based server like Nagios.
>
> Hugo.

-- 
Steve Kieu
Mob: (+64) 021 250 6437

- This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
Re: [Nagios-users] status.cgi very high cpu usage = Problem gone away
Steve Kieu wrote:
| The scariest thing is that, as suddenly as the problem came, it went
| away this morning. There were no changes to the Nagios system or to
| the VMware guest and hosts.
|
| I have been running a similar benchmark of status.cgi on a real host,
| using the same status.dat file as the host that has the problem. It
| takes 0.1 seconds to process there, while on the VMware box it took
| 3.9 seconds. I had a quick look at how status.cgi parses the text file
| and saw lots of memory compares (strcmp calls into libc) and moves. So
| my wild guess is that memory operations on the VM were terribly slow
| for some reason. And then, as if from nowhere, it became as fast as
| normal again.
|
| Does anyone have a bright idea of what is going on?

You wouldn't happen to own a Google indexing device now, would you? That could hammer down a web-based server like Nagios.

Hugo.

-- 
[EMAIL PROTECTED] http://hugo.vanderkooij.org/
PGP/GPG? Use: http://hugo.vanderkooij.org/0x58F19981.asc

A: Yes.
Q: Are you sure?
A: Because it reverses the logical flow of conversation.
Q: Why is top posting frowned upon?

Bored? Click on http://spamornot.org/ and rate those images.
Re: [Nagios-users] status.cgi very high cpu usage = Problem gone away
Hi,

The scariest thing is that, as suddenly as the problem came, it went away this morning. There were no changes to the Nagios system or to the VMware guest and hosts.

I have been running a similar benchmark of status.cgi on a real host, using the same status.dat file as the host that has the problem. It takes 0.1 seconds to process there, while on the VMware box it took 3.9 seconds. I had a quick look at how status.cgi parses the text file and saw lots of memory compares (strcmp calls into libc) and moves. So my wild guess is that memory operations on the VM were terribly slow for some reason. And then, as if from nowhere, it became as fast as normal again.

Does anyone have a bright idea of what is going on?

Thanks

-- 
Steve Kieu
Mob: (+64) 021 250 6437
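One way to test the guess above without involving Nagios at all is to time a fixed amount of CPU-bound string comparison on both the VM and the physical box. The following is only a minimal sketch: `cpu_probe` is a hypothetical helper, the two tokens are arbitrary strings, and the awk loop merely stands in for compare-heavy work. It does not reproduce status.cgi's parser.

```shell
# cpu_probe ITERATIONS
# Hypothetical helper: performs ITERATIONS string comparisons inside awk and
# prints how many of them matched. The work is purely CPU-bound (no disk or
# network), so its run time mainly reflects how fast the CPU executes.
cpu_probe() {
    awk -v n="$1" 'BEGIN {
        for (i = 0; i < n; i++) {
            # Alternate between two arbitrary tokens so the comparison
            # cannot be skipped as a constant.
            key = (i % 2) ? "host" : "service"
            if (key == "host") hits++
        }
        print hits + 0
    }'
}
```

Running something like `time cpu_probe 5000000` on the console of each machine gives two figures to compare. If the VM is many times slower here as well, the slowdown is not specific to status.cgi.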
Re: [Nagios-users] status.cgi very high cpu usage
Steve,

On Feb 18, 2008 10:51 PM, Steve Kieu [EMAIL PROTECTED] wrote:
> I have a problem with status.cgi taking up too much CPU, so the page
> is very slow to render. Is there any way to find out where the problem
> is? We have about 650 services monitored. The output of the nagios -s
> command is

Many of the Monitoring reports don't work well at volume; I've been asking users to only use the Unhandled reports. You may get a better response in Mozilla, but 'status.cgi' can kill Internet Explorer because of how it loads everything in one large list.

Nagios is at the point where it needs an SQL back end with a more modular look at how it stores site data. Perhaps rolling status up into summary reports that are queried to create reports, then going into host tables only when someone drills down into host information.

In production you'll want to be on a multi-core, multi-threaded machine; 2 cores won't do it if you'll have more than one user in the system. Until then, keep users in the Unhandled menus around {Service,Host} Problems.

Best,
Justin

-- 
Attention Sales And Marketing Professionals Who Serve B2B Executives
http://hittpublishingdirect.com/
Re: [Nagios-users] status.cgi very high cpu usage
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:nagios-users- [EMAIL PROTECTED] On Behalf Of Justin Hitt
Sent: Tuesday, February 19, 2008 9:15 AM
To: Steve Kieu
Cc: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] status.cgi very high cpu usage

> Steve,
>
> On Feb 18, 2008 10:51 PM, Steve Kieu [EMAIL PROTECTED] wrote:
> > I have a problem with status.cgi taking up too much CPU, so the page
> > is very slow to render. Is there any way to find out where the
> > problem is? We have about 650 services monitored. The output of the
> > nagios -s command is
>
> Many of the Monitoring reports don't work well at volume; I've been
> asking users to only use the Unhandled reports. You may get a better
> response in Mozilla, but 'status.cgi' can kill Internet Explorer
> because of how it loads everything in one large list.

This is a browser rendering issue; it has nothing to do with Nagios' speed at reading and parsing its status file. Getting the HTML that displays the status of all 3800 of my services takes under 1.5 seconds --

    $ time wget http://redacted/cgi-bin/status.cgi?host=all
    --12:59:50--  http://redacted/cgi-bin/status.cgi?host=all
               => `status.cgi?host=all'
    Resolving redacted
    Connecting to redacted|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: unspecified [text/html]

        [ <=> ] 4,540,227     23.92M/s

    12:59:51 (23.84 MB/s) - `status.cgi?host=all' saved [4540227]

    real    0m1.364s
    user    0m0.010s
    sys     0m0.000s

    $ wc -l status.cgi\?host\=all
    128601 status.cgi?host=all

> Nagios is at the point where it needs an SQL back end with a more
> modular look at how it stores site data. Perhaps, rolling status up

Perhaps, but not for speed/performance reasons (outside of long-duration archive reports), IMHO.

> into summary reports that are queried to create reports then go into
> host tables only when someone drills down into host information.

Isn't this already available? Status Summary, various links from Tactical Overview, Service Problems, etc.

> In production you'll want to be on a multi-core, multi-threaded
> machine; 2 cores won't do it if you'll have more than one user in the
> system. Until then, keep users in the Unhandled menus around
> {Service,Host} Problems

This is best from a workflow perspective, but saying that you need dual cores if you have more than one Nagios user is a rather dubious statement. My own experience is that the above test used less than 3% of a 3.1 GHz Xeon CPU for its duration, even when viewed with a browser. Granted, it's not a VMware box, but if VMware introduces that significant a performance impact, I'd abandon it without prejudice.

--
Marc
Re: [Nagios-users] status.cgi very high cpu usage
>> Many of the Monitoring reports don't work well at volume; I've been
>> asking users to only use the Unhandled reports. You may get a better
>> response in Mozilla, but 'status.cgi' can kill Internet Explorer
>> because of how it loads everything in one large list.
>
> This is a browser rendering issue; it has nothing to do with Nagios'
> speed at reading and parsing its status file. Getting the HTML that
> displays the status of all 3800 of my services takes under 1.5
> seconds --

In my case it is not a browser issue. The problem is that status.cgi does not return its data, not that the data it generates is so big. status.cgi also hogs CPU time on the monitoring host (not on the desktop viewing it in a browser).

I guess there is something wrong with status.cgi, and even extinfo.cgi takes a considerable amount of CPU as well.

Thanks

-- 
Steve Kieu
Mob: (+64) 021 250 6437
Re: [Nagios-users] status.cgi very high cpu usage
-----Original Message-----
From: Steve Kieu [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 19, 2008 2:32 PM
To: Marc Powell
Cc: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] status.cgi very high cpu usage

>>> Many of the Monitoring reports don't work well at volume; I've been
>>> asking users to only use the Unhandled reports. You may get a better
>>> response in Mozilla, but 'status.cgi' can kill Internet Explorer
>>> because of how it loads everything in one large list.
>>
>> This is a browser rendering issue; it has nothing to do with Nagios'
>> speed at reading and parsing its status file. Getting the HTML that
>> displays the status of all 3800 of my services takes under 1.5
>> seconds --
>
> In my case it is not a browser issue. The problem is that status.cgi
> does not return its data, not that the data it generates is so big.

4.4M of data in 1.39s --

    [EMAIL PROTECTED] nagios]$ export REQUEST_METHOD=GET; export QUERY_STRING=host=all; export REMOTE_USER=mpowell;
    [EMAIL PROTECTED] nagios]$ time ./sbin/status.cgi > foo

    real    0m1.390s
    user    0m1.300s
    sys     0m0.090s

    [EMAIL PROTECTED] nagios]$ du -sh foo
    4.4M    foo

> status.cgi also hogs CPU time on the monitoring host (not on the
> desktop viewing it in a browser).
>
> I guess there is something wrong with status.cgi, and even extinfo.cgi
> takes a considerable amount of CPU as well.

What status view specifically is the issue for you? What version of Nagios? Unless you're still using nagios-1.x, my gut feeling is that there is something outside of Nagios causing the issue (e.g. disk I/O, memory pressure, etc.). Nagios-2 should be able to easily handle the status data for 650 services. That was an area of significant focus and improvement with the current version.

--
Marc
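The command-line run in the message above can be wrapped in a small helper so that exactly the same measurement is repeated on every box being compared. This is a minimal sketch rather than anything shipped with Nagios: `run_cgi` and its argument order are hypothetical, and the only CGI environment variables it sets are the three exported in the message (REQUEST_METHOD, QUERY_STRING, REMOTE_USER).

```shell
# run_cgi CGI_PATH QUERY_STRING REMOTE_USER OUTPUT_FILE
# Hypothetical helper: runs a CGI program outside the web server, supplying the
# same three environment variables as the manual test, and saves the generated
# page to OUTPUT_FILE. Prints the size of the page in bytes.
run_cgi() {
    # The variables apply to this one command only; nothing is exported
    # into the calling shell session.
    REQUEST_METHOD=GET QUERY_STRING="$2" REMOTE_USER="$3" "$1" > "$4" || return 1

    # wc may pad its output with spaces; arithmetic expansion strips them.
    echo "bytes=$(( $(wc -c < "$4") ))"
}
```

In bash, where `time` can wrap a shell function, a run equivalent to the one above would be `time run_cgi ./sbin/status.cgi host=all mpowell foo`. Because no web server or browser is involved, the user and sys figures reflect only the CGI itself.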
Re: [Nagios-users] status.cgi very high cpu usage
Hello,

> [EMAIL PROTECTED] nagios]$ time ./sbin/status.cgi > foo
>
> real    0m1.390s
> user    0m1.300s
> sys     0m0.090s
>
> [EMAIL PROTECTED] nagios]$ du -sh foo
> 4.4M    foo

Similar benchmark in my case:

    nagtst01:/usr/local/nagios/sbin # time ./status.cgi testdata

    real    0m3.553s
    user    0m3.316s
    sys     0m0.044s

    nagtst01:/usr/local/nagios/sbin # ls -l ../var/status.dat
    -rw-rw-r-- 1 nagios nagios 849578 2008-02-20 11:02 ../var/status.dat

Compared with another box we have, this box is about 30 times slower, and every part (user, sys) is at least around 20 times slower. We suspect there is something wrong with the VMware config in this case. The other (good) box has around 460 services.

On the other box:

    illuminati:/usr/local/nagios/sbin # time ./status.cgi testdata

    real    0m0.364s
    user    0m0.104s
    sys     0m0.002s

    illuminati:/usr/local/nagios/sbin # ls -l ../var/status.dat
    -rw-rw-r-- 1 nagios nagcmd 622377 Feb 20 11:03 ../var/status.dat

> What status view specifically is the issue for you? What version of
> Nagios? Unless you're still using nagios-1.x, my gut feeling is that

Nagios 2.9, with some custom code I wrote that only changes some of the HTML tag output; it is trivial and certainly does not affect performance.

The view is requested by NagVis. I did not do the NagVis config, so I am not sure how many requests NagVis generates, but even for the main page of Nagios, status.cgi still takes too much CPU to run.

Thanks for your help.

Cheers

> there is something outside of Nagios causing the issue (e.g. disk I/O,
> memory pressure, etc.). Nagios-2 should be able to easily handle the
> status data for 650 services. That was an area of significant focus
> and improvement with the current version.
>
> --
> Marc

-- 
Steve Kieu
Mob: (+64) 021 250 6437
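The slowdown between the two boxes can be worked out directly from the two `time` outputs quoted in the message above. A small sketch using awk; the helper name `slowdown` is hypothetical, and the figures are the ones reported for nagtst01 and illuminati.

```shell
# slowdown SLOW_SECONDS FAST_SECONDS
# Hypothetical helper: prints how many times slower the first timing is
# than the second, rounded to one decimal place.
slowdown() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

slowdown 3.553 0.364   # real: nagtst01 vs illuminati
slowdown 3.316 0.104   # user
slowdown 0.044 0.002   # sys
```

This prints 9.8, 31.9 and 22.0. User time is therefore roughly 32 times higher and sys time 22 times higher on the slow box, even though its status.dat is only about a third larger (849578 versus 622377 bytes). On nagtst01 almost all of the elapsed time is user time (3.316 s of 3.553 s), which suggests the process is CPU-bound rather than waiting on disk.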
[Nagios-users] status.cgi very high cpu usage
Hello,

I have a problem with status.cgi taking up too much CPU, so the page is very slow to render. Is there any way to find out where the problem is? We have about 650 services monitored. The output of the nagios -s command is below:

    HOST SCHEDULING INFORMATION
    ---
    Total hosts:                      114
    Total scheduled hosts:            5
    Host inter-check delay method:    SMART
    Average host check interval:      300.00 sec
    Host inter-check delay:           60.00 sec
    Max host check spread:            15 min
    First scheduled check:            Tue Feb 19 16:44:01 2008
    Last scheduled check:             Tue Feb 19 16:48:01 2008

    SERVICE SCHEDULING INFORMATION
    ---
    Total services:                   645
    Total scheduled services:         596
    Service inter-check delay method: SMART
    Average service check interval:   877.85 sec
    Inter-check delay:                1.47 sec
    Interleave factor method:         SMART
    Average services per host:        5.66
    Service interleave factor:        6
    Max service check spread:         15 min
    First scheduled check:            Tue Feb 19 16:46:28 2008
    Last scheduled check:             Tue Feb 19 17:01:09 2008

    CHECK PROCESSING INFORMATION
    Service check reaper interval:    10 sec
    Max concurrent service checks:    Unlimited

    PERFORMANCE SUGGESTIONS
    ---
    I have no suggestions - things look okay.

It is running on a VMware host with 1 GB of RAM, and we just allocated one more CPU (3 GHz) without any improvement. The custom front end uses NagVis, but even if we do not access NagVis, accessing the normal Nagios services list still causes high CPU usage and responses so slow that it is unusable.

Please help. Thank you in advance.

-- 
Steve Kieu
Mob: (+64) 021 250 6437
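As a sanity check on the scheduling figures above, the SMART values can be recomputed from the other numbers in the same output. The sketch below assumes the SMART inter-check delay is the average check interval divided by the number of scheduled checks, and that the interleave factor is the average number of services per host rounded up. The reported values match those formulas, but the helper names `smart_delay` and `interleave_factor` are hypothetical and not part of Nagios.

```shell
# smart_delay AVG_INTERVAL_SEC SCHEDULED_CHECKS
# Assumed formula: average check interval divided by the number of
# scheduled checks, printed with two decimals.
smart_delay() {
    awk -v avg="$1" -v n="$2" 'BEGIN { printf "%.2f\n", avg / n }'
}

# interleave_factor TOTAL_SERVICES TOTAL_HOSTS
# Assumed formula: average services per host, rounded up to a whole number.
interleave_factor() {
    awk -v s="$1" -v h="$2" 'BEGIN {
        r = s / h
        f = int(r)
        if (r > f) f++
        print f
    }'
}

smart_delay 877.85 596      # service inter-check delay
smart_delay 300.00 5        # host inter-check delay
interleave_factor 645 114   # service interleave factor
```

The three results, 1.47, 60.00 and 6, agree with the "Inter-check delay", "Host inter-check delay" and "Service interleave factor" lines in the output. The scheduler itself looks unremarkable, which fits its own verdict of "things look okay".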