Re: [Nagios-users] status.cgi very high cpu usage = Problem gone away

2008-02-22 Thread Steve Kieu
It is not slow because of many requests; it is slow because CPU execution
time is about 30 times longer, effectively running like a 100 MHz i486 CPU
(benchmarked on the console only, with all network NICs disconnected). So
no, it is not because of Google :-)



Obviously VMware is doing something nasty. When the box is slow and under
high load, the load on the host is still rather low.
Cheers,


On Fri, Feb 22, 2008 at 7:26 PM, Hugo van der Kooij 
[EMAIL PROTECTED] wrote:


 Steve Kieu wrote:

 | The most scary thing is, as suddenly like when it came, it suddenly went
 | away this morning. Not any changes to the nagios system or vmware guest
 | and hosts.
 |
 | I have been doing the similar benchmark of status.cgi on the real host
 | with the same status.dat file with the host having problem. It takes 0.1
 | sec to process, and in the vmware host it took 3.9 second. I have a
 | quick look of how status.cgi parse the text file and see lots of  memory
 | cmp (strcmp call to libc) and move. So my wild guess is that memory
 | operation on the vm is terribly slow for some reason. And suddenly as
 | come from nowhere, it just become as fast as normal.
 |
 | Any one has a bright idea of what is going on ?

 You wouldn't happen to own a Google indexing device now, would you? That
 could hammer down a web-based server like Nagios.

 Hugo.


-- 
Steve Kieu
Mob: (+64) 021 250 6437
-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

Re: [Nagios-users] status.cgi very high cpu usage = Problem gone away

2008-02-21 Thread Hugo van der Kooij

Steve Kieu wrote:

| The most scary thing is, as suddenly like when it came, it suddenly went
| away this morning. Not any changes to the nagios system or vmware guest
| and hosts.
|
| I have been doing the similar benchmark of status.cgi on the real host
| with the same status.dat file with the host having problem. It takes 0.1
| sec to process, and in the vmware host it took 3.9 second. I have a
| quick look of how status.cgi parse the text file and see lots of  memory
| cmp (strcmp call to libc) and move. So my wild guess is that memory
| operation on the vm is terribly slow for some reason. And suddenly as
| come from nowhere, it just become as fast as normal.
|
| Any one has a bright idea of what is going on ?

You wouldn't happen to own a Google indexing device now, would you? That
could hammer down a web-based server like Nagios.

Hugo.

- --
[EMAIL PROTECTED]   http://hugo.vanderkooij.org/
PGP/GPG? Use: http://hugo.vanderkooij.org/0x58F19981.asc

A: Yes.
Q: Are you sure?
A: Because it reverses the logical flow of conversation.
Q: Why is top posting frowned upon?

Bored? Click on http://spamornot.org/ and rate those images.



Re: [Nagios-users] status.cgi very high cpu usage = Problem gone away

2008-02-20 Thread Steve Kieu
Hi,

The scariest thing is that the problem went away this morning as suddenly
as it came. Nothing was changed on the Nagios system or on the VMware guest
and hosts.

I ran a similar benchmark of status.cgi on a real host, using the same
status.dat file as the host that has the problem. It takes 0.1 sec to
process there, while on the VMware host it took 3.9 seconds. I had a quick
look at how status.cgi parses the text file and saw lots of memory compares
(strcmp calls into libc) and moves. So my wild guess is that memory
operations on the VM were terribly slow for some reason. And then, as if
from nowhere, it suddenly became as fast as normal.
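
One way to test the "slow memory operations" guess without involving Nagios at all is a tiny micro-benchmark run on both the VM and a physical box. The following is only a sketch under stated assumptions: the function name, buffer size and round count are made up for illustration, and it times generic byte comparison and buffer copying in Python, not the actual strcmp calls inside status.cgi.

```python
import time


def bench_compare_and_move(size=1 << 20, rounds=50):
    """Return (compare_seconds, move_seconds) for simple memory-bound loops.

    Hypothetical helper: a stand-in for the compare/move pattern,
    not a measurement of status.cgi itself.
    """
    a = bytes(size)
    b = bytes(size)  # same content as a, so each comparison scans the whole buffer

    start = time.perf_counter()
    for _ in range(rounds):
        _equal = (a == b)
    compare_s = time.perf_counter() - start

    src = bytearray(size)
    dst = bytearray(size)
    start = time.perf_counter()
    for _ in range(rounds):
        dst[:] = src  # bulk copy of the whole buffer
    move_s = time.perf_counter() - start

    return compare_s, move_s


if __name__ == "__main__":
    compare_s, move_s = bench_compare_and_move()
    print("compare: %.4fs  move: %.4fs" % (compare_s, move_s))
```

If the two machines differ by a similar factor here as they do for status.cgi, that would point at the virtual machine rather than at Nagios; if not, the guess is probably wrong.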

Does anyone have a bright idea of what is going on?

Thanks


-- 
Steve Kieu
Mob: (+64) 021 250 6437

Re: [Nagios-users] status.cgi very high cpu usage

2008-02-19 Thread Justin Hitt
Steve,

On Feb 18, 2008 10:51 PM, Steve Kieu [EMAIL PROTECTED] wrote:
 I have a problem with status.cgi taking up too much cpu so the page is very
 slow to  render. Is there any way to find out where the problem is?
 We have about 650 services monitored. The output os nagios -s command is

Many of the Monitoring reports don't work well at volume, so I've been
asking users to use only the Unhandled reports.  You may get a better
response in Mozilla, but 'status.cgi' can kill Internet Explorer
because it loads everything in one large list.

Nagios is at the point where it needs an SQL back end with a more
modular look at how it stores site data.  Perhaps status could be rolled
up into summary reports that are queried to create reports, and only go
into host tables when someone drills down into host information.

In production you'll want to be on a multi-core multi-threaded
machine; 2 cores won't do it if you'll have more than one user in the
system.  Until then, keep users in the Unhandled menus around
{Service,Host} Problems

Best,

Justin
-- 
Attention Sales And Marketing Professionals Who Serve B2B Executives
   http://hittpublishingdirect.com/



Re: [Nagios-users] status.cgi very high cpu usage

2008-02-19 Thread Marc Powell


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:nagios-users-
 [EMAIL PROTECTED] On Behalf Of Justin Hitt
 Sent: Tuesday, February 19, 2008 9:15 AM
 To: Steve Kieu
 Cc: nagios-users@lists.sourceforge.net
 Subject: Re: [Nagios-users] status.cgi very high cpu usage
 
 Steve,
 
 On Feb 18, 2008 10:51 PM, Steve Kieu [EMAIL PROTECTED] wrote:
  I have a problem with status.cgi taking up too much cpu so the page is
  very slow to  render. Is there any way to find out where the problem is?
  We have about 650 services monitored. The output os nagios -s command is
 
 Many of the Monitoring reports don't work well at volume, I've been
 asking users to only use Unhandled reports.  You may get better
 response in Mozilla, but 'status.cgi' can kill Internet Explorer
 because of how it's loading everything in one large list.

That is a browser rendering issue; it has nothing to do with Nagios' speed
at reading and parsing its status file. Fetching the HTML that displays
status for all 3800 of my services takes under 1.5 seconds --

$ time wget http://redacted/cgi-bin/status.cgi?host=all
--12:59:50--  http://redacted/cgi-bin/status.cgi?host=all
           => `status.cgi?host=all'
Resolving redacted
Connecting to redacted|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    [ <=>                              ] 4,540,227     23.92M/s

12:59:51 (23.84 MB/s) - `status.cgi?host=all' saved [4540227]


real    0m1.364s
user    0m0.010s
sys     0m0.000s
$ wc -l status.cgi\?host\=all
 128601 status.cgi?host=all

 
 Nagios is at the point where it needs an SQL back end with a more
 modular look at how it stores site data.  Perhaps, rolling status up

Perhaps, but not for speed/performance reasons (outside of long-duration
archive reports), IMHO.

 into summary reports that are queried to create reports then go into
 host tables only when someone drills down into host information.

Isn't this already available? Status Summary, various links from
Tactical Overview, Service Problems, etc. 

 In production you'll want to be on a multi-core multi-threaded
 machine; 2 cores won't do it if you'll have more than one user in the
 system.  Until then, keep users in the Unhandled menus around
 {Service,Host} Problems

This is best from a workflow perspective, but saying that you need
dual cores if you have more than one Nagios user is a bit of a dubious
statement. In my own experience, the above test used less than 3% of a
3.1 GHz Xeon CPU for its duration, even when viewed with a browser.
Granted, it's not a VMware box, but if VMware introduces that
significant a performance impact, I'd abandon it without prejudice.

--
Marc



Re: [Nagios-users] status.cgi very high cpu usage

2008-02-19 Thread Steve Kieu

 
  Many of the Monitoring reports don't work well at volume, I've been
  asking users to only use Unhandled reports.  You may get better
  response in Mozilla, but 'status.cgi' can kill Internet Explorer
  because of how it's loading everything in one large list.

 This is a browser rendering issue, nothing to do with the nagios' speed
 at reading and parsing its status file. To get the html to display
 status for all 3800 of my services takes under 1.5 seconds --



In my case it is not a browser issue. The problem is that status.cgi does
not return data, rather than that the data it generates is too big.

And status.cgi hogs CPU time on the monitoring host (not on the desktop
viewing it with a browser).

I guess there is something wrong with status.cgi, and even extinfo.cgi
takes a considerable amount of CPU as well.

Thanks

-- 
Steve Kieu
Mob: (+64) 021 250 6437

Re: [Nagios-users] status.cgi very high cpu usage

2008-02-19 Thread Marc Powell


 -Original Message-
 From: Steve Kieu [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, February 19, 2008 2:32 PM
 To: Marc Powell
 Cc: nagios-users@lists.sourceforge.net
 Subject: Re: [Nagios-users] status.cgi very high cpu usage
 
   
   Many of the Monitoring reports don't work well at volume, I've
   been asking users to only use Unhandled reports.  You may get
   better response in Mozilla, but 'status.cgi' can kill Internet
   Explorer because of how it's loading everything in one large list.


  This is a browser rendering issue, nothing to do with the nagios'
  speed at reading and parsing its status file. To get the html to
  display status for all 3800 of my services takes under 1.5 seconds --
 
 
 
 
 In my case it is not the browser issue. The problem is that status,cgi
 does not return data rather than the data it generates is so big.
 

4.4M of data in 1.39s --

[EMAIL PROTECTED] nagios]$ export REQUEST_METHOD=GET; export QUERY_STRING=host=all; export REMOTE_USER=mpowell;
[EMAIL PROTECTED] nagios]$ time ./sbin/status.cgi > foo

real    0m1.390s
user    0m1.300s
sys     0m0.090s
[EMAIL PROTECTED] nagios]$ du -sh foo
4.4M    foo
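
The same command-line test can be wrapped in a small script so it is easy to repeat on several boxes. This is a rough sketch, not part of Nagios: the function name `time_cgi` and the default user `nagiosadmin` are assumptions for illustration, and the only thing taken from the transcript above is the idea of setting REQUEST_METHOD, QUERY_STRING and REMOTE_USER before running the CGI directly.

```python
import os
import subprocess
import sys
import time


def time_cgi(command, query_string="host=all", remote_user="nagiosadmin"):
    """Run a CGI program outside the web server; return (seconds, output_bytes).

    remote_user is a hypothetical placeholder; use an account that is
    authorized in your own cgi.cfg.
    """
    env = dict(
        os.environ,
        REQUEST_METHOD="GET",
        QUERY_STRING=query_string,
        REMOTE_USER=remote_user,
    )
    start = time.perf_counter()
    result = subprocess.run(command, env=env, stdout=subprocess.PIPE, check=True)
    elapsed = time.perf_counter() - start
    return elapsed, len(result.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python time_cgi.py /usr/local/nagios/sbin/status.cgi
    elapsed, size = time_cgi([sys.argv[1]])
    print("%.3fs wall clock, %d bytes of output" % (elapsed, size))
```

This measures wall-clock time only. For the user/sys split shown above, the shell's `time` is still the simpler tool.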

 And status.cgi hog cpu time at the monitoring host (not the desktop
 viewing using a browser)
 
 I guess there are something wrong with status.cgi and even the
 extinfo.cgi take considerable amount of cpu as well

What status view specifically is the issue for you? What version of
nagios? Unless you're still using nagios-1.x, my gut feeling is that
there is something outside of nagios causing the issue (i.e. disk IO,
memory pressure, etc). Nagios-2 should be able to easily handle the
status data for 650 services. That was an area of significant focus and
improvement with the current version.

--
Marc



Re: [Nagios-users] status.cgi very high cpu usage

2008-02-19 Thread Steve Kieu
Hello,

 [EMAIL PROTECTED] nagios]$ time ./sbin/status.cgi > foo

 real    0m1.390s
 user    0m1.300s
 sys     0m0.090s
 [EMAIL PROTECTED] nagios]$ du -sh foo
 4.4M    foo



Similar benchmark in my case:
nagtst01:/usr/local/nagios/sbin # time ./status.cgi > testdata

real    0m3.553s
user    0m3.316s
sys     0m0.044s
nagtst01:/usr/local/nagios/sbin # ls -l ../var/status.dat
-rw-rw-r-- 1 nagios nagios 849578 2008-02-20 11:02 ../var/status.dat


Compared with another box we have, this one is about 30 times slower, and
every part (sys, usr) is at least around 20 times slower. We suspect there
is something wrong with the VMware config in this case. The other (good)
box has around 460 services.

On the other box:

illuminati:/usr/local/nagios/sbin # time ./status.cgi > testdata

real    0m0.364s
user    0m0.104s
sys     0m0.002s
illuminati:/usr/local/nagios/sbin #  ls -l ../var/status.dat
-rw-rw-r--  1 nagios nagcmd 622377 Feb 20 11:03 ../var/status.dat
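
For reference, the slowdown factors can be worked out directly from the two `time` outputs and the two status.dat sizes quoted above. The sketch below uses only those figures. The dictionary layout and the per-byte normalisation are my own illustration, on the assumption that parsing cost grows roughly with the size of status.dat.

```python
# Figures copied from the two transcripts above (seconds and bytes).
slow = {"real": 3.553, "user": 3.316, "sys": 0.044, "status_bytes": 849578}
fast = {"real": 0.364, "user": 0.104, "sys": 0.002, "status_bytes": 622377}


def slowdown(slow, fast):
    """Return slow/fast ratios for each timing, plus user time per byte."""
    ratios = {key: slow[key] / fast[key] for key in ("real", "user", "sys")}
    size_ratio = slow["status_bytes"] / fast["status_bytes"]
    # Assumption: normalise by file size, since the slow box parses a larger file.
    ratios["user_per_byte"] = ratios["user"] / size_ratio
    return ratios


if __name__ == "__main__":
    for name, value in slowdown(slow, fast).items():
        print("%-14s %.1fx" % (name, value))
```

User time comes out at roughly 32 times and sys time at 22 times, in line with the "30 times" and "20 times" mentioned, while wall-clock time differs by a factor of about 10. Allowing for the larger status.dat, the user-time gap is still more than 20 times.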



 What status view specifically is the issue for you? What version of
 nagios? Unless you're still using nagios-1.x, my gut feeling is that


Nagios 2.9, with some custom code I made, but it only changes some of the
HTML tag output; it is trivial and 100% does not affect performance. The
view is requested by NagVis. I did not do the NagVis config, so I am not
sure how many requests NagVis generates, but even for the main page of
Nagios, status.cgi still takes too much CPU to run.

Thanks for your help

Cheers





 there is something outside of nagios causing the issue (i.e. disk IO,
 memory pressure, etc). Nagios-2 should be able to easily handle the
 status data for 650 services. That was an area of significant focus and
 improvement with the current version.





-- 
Steve Kieu
Mob: (+64) 021 250 6437

[Nagios-users] status.cgi very high cpu usage

2008-02-18 Thread Steve Kieu
Hello,

I have a problem with status.cgi taking up too much cpu so the page is very
slow to  render. Is there any way to find out where the problem is?

We have about 650 services monitored. The output of the nagios -s command
is below:


HOST SCHEDULING INFORMATION
---
Total hosts: 114
Total scheduled hosts:   5
Host inter-check delay method:   SMART
Average host check interval: 300.00 sec
Host inter-check delay:  60.00 sec
Max host check spread:   15 min
First scheduled check:   Tue Feb 19 16:44:01 2008
Last scheduled check:Tue Feb 19 16:48:01 2008


SERVICE SCHEDULING INFORMATION
---
Total services: 645
Total scheduled services:   596
Service inter-check delay method:   SMART
Average service check interval: 877.85 sec
Inter-check delay:  1.47 sec
Interleave factor method:   SMART
Average services per host:  5.66
Service interleave factor:  6
Max service check spread:   15 min
First scheduled check:  Tue Feb 19 16:46:28 2008
Last scheduled check:   Tue Feb 19 17:01:09 2008


CHECK PROCESSING INFORMATION

Service check reaper interval:  10 sec
Max concurrent service checks:  Unlimited


PERFORMANCE SUGGESTIONS
---
I have no suggestions - things look okay.
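
As a sanity check on the scheduling figures, the SMART values in this output can be reproduced by hand. This sketch assumes the formulas described in the Nagios documentation: inter-check delay is the average check interval divided by the number of checks, and the interleave factor is the average number of services per host rounded up. Dividing by the scheduled counts (596 and 5) rather than the totals is my inference from the numbers matching, and the function names are made up for illustration.

```python
import math


def smart_inter_check_delay(avg_check_interval_s, scheduled_checks):
    """Spread the checks evenly over one average check interval."""
    return avg_check_interval_s / scheduled_checks


def smart_interleave_factor(total_services, total_hosts):
    """Average services per host, rounded up to a whole number."""
    return math.ceil(total_services / total_hosts)


# Figures taken from the nagios -s output above.
service_delay = smart_inter_check_delay(877.85, 596)   # reported as 1.47 sec
host_delay = smart_inter_check_delay(300.00, 5)        # reported as 60.00 sec
avg_services_per_host = 645 / 114                      # reported as 5.66
interleave = smart_interleave_factor(645, 114)         # reported as 6

if __name__ == "__main__":
    print("service inter-check delay: %.2f sec" % service_delay)
    print("host inter-check delay:    %.2f sec" % host_delay)
    print("services per host:         %.2f" % avg_services_per_host)
    print("interleave factor:         %d" % interleave)
```

All four values agree with what nagios -s printed, which is consistent with its own verdict that the scheduling side looks okay and the problem lies elsewhere.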


It is running on a VMware host with 1 GB of RAM, and we just allocated one
more CPU (3 GHz) without any improvement. The custom frontend uses NagVis,
but even if we do not access NagVis, accessing the normal Nagios services
list still causes high CPU usage and a response so slow that it is unusable.

Please help. Thank you in advance.



-- 
Steve Kieu
Mob: (+64) 021 250 6437