Re: monitoring load average

2003-01-12 Thread Christian Hammers
On Wed, Jan 08, 2003 at 08:45:58AM +0100, Javier wrote:
> I think that "vmstat 5 2" and getting the last line could give you a
> good result.

BTW: I started to keep
  vmstat 5 | logger -t vmstat:
  while true; do ps faxu | logger -t ps: ; sleep 15; done
running, and I log the output together with everything else to a separate
host that has logcheck and some other monitoring stuff installed.

The ps line is quite interesting if the server crashes: if, e.g., a server
starts eating up all memory, no check run once a minute (cron granularity)
is able to detect it.

bye,

-christian-
-- 
Christian Hammers    WESTEND GmbH - Aachen und Dueren       Tel 0241/701333-0
[EMAIL PROTECTED]    Internet & Security for Professionals  Fax 0241/911879
  WESTEND ist CISCO Systems Partner - Authorized Reseller


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]




Re: monitoring load average

2003-01-08 Thread Nate Campi
On Wed, Jan 08, 2003 at 07:08:29AM +0100, Russell Coker wrote:
> On Tue, 7 Jan 2003 20:15, Javier wrote:
> > Perhaps you can try with vmstat. It gives you the CPU idle time, so you
> > can easily program a script that returns (100 - idle time). Use the
> > netsaint_statd plugin to return to the netsaint server what your script
> > returns.
> 
> Thanks for the suggestion.  However I still need to have a separate script 
> running vmstat, as its results are wildly inaccurate if run as "vmstat"; you 
> need to run "vmstat 2" to get reliable results (and the first line won't be 
> the one you want).
> 
> I was thinking of having something like vmstat constantly running and 
> periodically writing its results to a file.
> 
> Another issue is that I don't want a load spike to trigger an alert.  So I 
> want to have an average over, say, a minute with "vmstat 60" (which makes it 
> impossible to run vmstat from the script; reading from an output file written 
> by a daemon process is the only real option).

I'd use SNMP. I graph the basic stuff you're looking for with RRDtool:
http://www.campin.net/perl/RRDsnmp.cgi?host=vpn-pat

I don't do any I/O stuff, but you could look for it in the MIB2 host MIB
or UCD enterprise MIBs - I'm sure there's something. If there isn't, do
what I do for DNS stat graphing and fire off a shell script to extend
it: http://www.campin.net/DNS/graph.html

A major benefit to using SNMP is that many other network monitoring and
management systems utilize it, so if you deploy one it'll be able to
work with your existing infrastructure.
-- 
Nate Campi   http://www.campin.net 

I have a spelling checker
It came with my PC;
It plainly marks four my revue
Mistakes I cannot sea.
I've run this poem threw it,
I'm sure your pleased too no,
Its letter perfect in it's weigh,
My checker tolled me sew. 
 -Janet Minor  
 
"Hardware: the parts of a computer that can be kicked."  -Jeff Pesis  






RE: monitoring load average

2003-01-07 Thread Javier
You're right,

I think that "vmstat 5 2" and getting the last line could give you a
good result.

Another solution (in the same direction) could be to run three tasks
from cron:

1.- Every five seconds: vmstat 5 2 > /tmp/output.last.five.seconds
2.- Every 60 seconds: vmstat 60 2 > /tmp/output.last.sixty.seconds
3.- Every 300 seconds: vmstat 300 2 > /tmp/output.last.five.mins

and modify the netsaint_statd plugin to make it return those three values.
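
Reading the value back out of one of those files could look roughly like
the sketch below. The function name is made up for illustration, and the
only thing it assumes about vmstat is that the header row labels the idle
column "id". Note that cron itself cannot schedule anything more often
than once a minute, so the five-second task would need a loop of its own;
this sketch covers only the reading side.

```shell
#!/bin/sh
# Hypothetical helper: print CPU busy percentage (100 - idle) from vmstat
# output read on stdin. The idle column is found by its "id" header name,
# since its position is not the same in every vmstat version.
cpu_busy_from_vmstat() {
    awk '
        # Rows starting with a number are samples; keep the newest one.
        $1 ~ /^[0-9]+$/ {
            if (col) { idle = $col; seen = 1 }
            next
        }
        # Any other row may be the header that names the columns.
        { for (i = 1; i <= NF; i++) if ($i == "id") col = i }
        END {
            if (!seen) exit 1
            print 100 - idle
        }
    '
}
```

It would be called as, for example,
"cpu_busy_from_vmstat < /tmp/output.last.sixty.seconds", and it exits
with a non-zero status if the file holds no sample yet.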

-----Original Message-----
From: Russell Coker [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 08 January 2003 7:08
To: Javier; 'Debian ISP'
Subject: Re: monitoring load average

On Tue, 7 Jan 2003 20:15, Javier wrote:
> Perhaps you can try with vmstat. It gives you the CPU idle time, so you
> can easily program a script that returns (100 - idle time). Use the
> netsaint_statd plugin to return to the netsaint server what your script
> returns.

Thanks for the suggestion.  However I still need to have a separate script
running vmstat, as its results are wildly inaccurate if run as "vmstat"; you
need to run "vmstat 2" to get reliable results (and the first line won't be
the one you want).

I was thinking of having something like vmstat constantly running and
periodically writing its results to a file.

Another issue is that I don't want a load spike to trigger an alert.  So I
want to have an average over, say, a minute with "vmstat 60" (which makes it
impossible to run vmstat from the script; reading from an output file written
by a daemon process is the only real option).

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page






Re: monitoring load average

2003-01-07 Thread Russell Coker
On Tue, 7 Jan 2003 20:15, Javier wrote:
> Perhaps you can try with vmstat. It gives you the CPU idle time, so you
> can easily program a script that returns (100 - idle time). Use the
> netsaint_statd plugin to return to the netsaint server what your script
> returns.

Thanks for the suggestion.  However I still need to have a separate script 
running vmstat, as its results are wildly inaccurate if run as "vmstat"; you 
need to run "vmstat 2" to get reliable results (and the first line won't be 
the one you want).

I was thinking of having something like vmstat constantly running and 
periodically writing its results to a file.

Another issue is that I don't want a load spike to trigger an alert.  So I 
want to have an average over, say, a minute with "vmstat 60" (which makes it 
impossible to run vmstat from the script; reading from an output file written 
by a daemon process is the only real option).
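
A rough sketch of that daemon side, under stated assumptions: the function
name and the output path in the usage note are invented for illustration.
It reads vmstat rows on stdin, drops the first sample (the one that is not
wanted), and replaces the output file by renaming a temporary file so that
a check script never reads a half-written row.

```shell
#!/bin/sh
# Hypothetical helper: keep only the newest vmstat data row in file "$1".
keep_latest_sample() {
    out=$1
    tmp=$out.tmp.$$
    first=1
    while read -r line; do
        case $line in
            [0-9]*)
                # The first data row is skipped, later rows are kept.
                if [ "$first" -eq 1 ]; then
                    first=0
                    continue
                fi
                # Write to a temporary file, then rename it into place.
                printf '%s\n' "$line" > "$tmp" && mv "$tmp" "$out"
                ;;
            *)
                # Header rows start with a letter and are ignored.
                ;;
        esac
    done
}
```

Something like "vmstat 60 | keep_latest_sample /var/run/vmstat.last &"
would then leave a one-minute average in the file for the check script to
read. Only the row is stored, without the header, so the reader has to
know which column it wants.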

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page






Re: monitoring load average

2003-01-07 Thread Corey Ralph
> Sorry, no advice on how to collect this from the network.


The check_by_ssh plugin works well for me.






Re: monitoring load average

2003-01-07 Thread Васил Колев
vmstat is great, but just one word of advice... I had some machines
running AOLserver (damn good, but I have since found something better and
faster), and it had about 1024+ threads. Everything that reads the
process information in /proc (ps, top, vmstat) skewed the results a lot,
because it took a lot of CPU (in the kernel, not in userspace). I
haven't checked whether that has changed with recent kernels (my last test
was on 2.4.4, afaik, and a lot has been done about threads in 2.5), but
whatever you use to monitor the system, be sure that it doesn't affect
it too much.
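
One possible way to keep the monitor itself cheap, offered only as a
sketch: take two copies of the aggregate "cpu" line from /proc/stat and
work out the busy share between them, which does not involve reading any
per-process entry. The function name is invented here. The field order
assumed is user, nice, system, idle, with newer kernels appending more
counters; nice time is left out of the busy figure.

```shell
#!/bin/sh
# Hypothetical helper: given two "cpu" summary lines from /proc/stat,
# print the percentage of time spent in user + system mode between them.
cpu_busy_between() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        # Sum at most eight counters after the "cpu" label (user..steal).
        # Guest counters come later and are already included in user.
        function total(   i, n, t) {
            n = (NF < 9) ? NF : 9
            for (i = 2; i <= n; i++) t += $i
            return t
        }
        NR == 1 { u = $2; s = $4; t0 = total() }
        NR == 2 {
            dt = total() - t0
            if (dt <= 0) exit 1
            printf "%d\n", (($2 - u) + ($4 - s)) * 100 / dt
        }
    '
}
```

Usage would be along the lines of: a=$(grep '^cpu ' /proc/stat); sleep 60;
b=$(grep '^cpu ' /proc/stat); cpu_busy_between "$a" "$b". Whether this is
noticeably lighter than vmstat on a given kernel is something to measure
rather than assume.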

On Tue, 2003-01-07 at 22:28, Adrian 'Dagurashibanipal' von Bidder wrote:
> On Tue, 2003-01-07 at 17:49, Russell Coker wrote:
> 
> > Any suggestions?
> 
> Monitoring vmstat output? I feel vmstat gives you all relevant data in
> one place: memory, disk, cpu.
> 
> Sorry, no advice on how to collect this from the network.
> 
> cheers
> -- vbi






Re: monitoring load average

2003-01-07 Thread Gavin Hamill
On Tuesday 07 January 2003 8:28 pm, Adrian 'Dagurashibanipal' von Bidder wrote:

> Monitoring vmstat output? I feel vmstat gives you all relevant data in
> one place: memory, disk, cpu.
>
> Sorry, no advice on how to collect this from the network.

inetd?

inetd.conf:
vmstat  stream  tcp nowait  root    /usr/bin/vmstat /usr/bin/vmstat

services:
vmstat  1551/tcp

then... 

gdh@lindesk:~$ telnet 10.0.0.1 1551
Trying 10.0.0.1...
Connected to 10.0.0.1.
Escape character is '^]'.
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0   1280  26380  22672 121520   0   0    13    19  178    41   2   1  97
Connection closed by foreign host.

The joys of UNIX - then just use hosts.allow to restrict access to this port 

=)

gdh






Re: monitoring load average

2003-01-07 Thread Adrian 'Dagurashibanipal' von Bidder
On Tue, 2003-01-07 at 17:49, Russell Coker wrote:

> Any suggestions?

Monitoring vmstat output? I feel vmstat gives you all relevant data in
one place: memory, disk, cpu.

Sorry, no advice on how to collect this from the network.

cheers
-- vbi

-- 
this email is protected by a digital signature: http://fortytwo.ch/gpg





RE: monitoring load average

2003-01-07 Thread Javier

Hi,

Perhaps you can try with vmstat. It gives you the CPU idle time, so you
can easily program a script that returns (100 - idle time). Use the
netsaint_statd plugin to return to the netsaint server what your script
returns.

I hope this helps.

Regards,
Javier.




-----Original Message-----
From: Russell Coker [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, 07 January 2003 17:50
To: Debian ISP
CC: [EMAIL PROTECTED]
Subject: monitoring load average

I am involved with setting up NetSaint monitoring of a medium size network.

One problem I have is determining suitable ways of monitoring system load.  A
machine with 100% usage of a resource by server processes will have request
queues that grow indefinitely (and performance will suck).

So the load average doesn't seem particularly useful.  If a machine has a
sustained load average of 3.0 from CPU operations and it has two CPUs
then that indicates a problem.  If it is from disk operations and there are
four disks in a RAID-5 array then it's equal to the number of non-parity
stripes and the load is probably at the limit of what it can handle.  If it's
half from CPU and half from disk then it shouldn't be a problem at all.

I think that perhaps a better way would be to have one test measure the
amount of CPU time used (the sum of the "user" and "system" percentages of
the CPU usage as reported by top would do - nice time doesn't matter).

Then I could have another test measure the disk utilization in terms of the
await, svctm, or %util fields as reported by iostat.

Any suggestions?

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page






monitoring load average

2003-01-07 Thread Russell Coker
I am involved with setting up NetSaint monitoring of a medium size network.

One problem I have is determining suitable ways of monitoring system load.  A 
machine with 100% usage of a resource by server processes will have request 
queues that grow indefinitely (and performance will suck).

So the load average doesn't seem particularly useful.  If a machine has a 
sustained load average of 3.0 from CPU operations and it has two CPUs 
then that indicates a problem.  If it is from disk operations and there are 
four disks in a RAID-5 array then it's equal to the number of non-parity 
stripes and the load is probably at the limit of what it can handle.  If it's 
half from CPU and half from disk then it shouldn't be a problem at all.

I think that perhaps a better way would be to have one test measure the 
amount of CPU time used (the sum of the "user" and "system" percentages of 
the CPU usage as reported by top would do - nice time doesn't matter).
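
As a minimal sketch of such a test, assuming the usual plugin convention
of one status line plus an exit status (0 for OK, 1 for WARNING, 2 for
CRITICAL): the function name, the message wording and the thresholds are
all invented for illustration, and the busy figure is expected as a whole
number that has already been measured elsewhere.

```shell
#!/bin/sh
# Hypothetical check: compare a CPU busy percentage (user + system) with
# a warning and a critical threshold, all given as whole numbers.
# Assumed status values: 0 = OK, 1 = WARNING, 2 = CRITICAL.
check_cpu_busy() {
    busy=$1
    warn=$2
    crit=$3
    if [ "$busy" -ge "$crit" ]; then
        echo "CPU CRITICAL - user+system ${busy}%"
        return 2
    elif [ "$busy" -ge "$warn" ]; then
        echo "CPU WARNING - user+system ${busy}%"
        return 1
    fi
    echo "CPU OK - user+system ${busy}%"
    return 0
}
```

For example, "check_cpu_busy 85 80 95" prints a WARNING line and returns
1. Feeding it a one-minute average rather than an instantaneous reading
is what keeps a short spike from raising an alert.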

Then I could have another test measure the disk utilization in terms of the 
await, svctm, or %util fields as reported by iostat.
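
For the disk side, a sketch that picks the busiest device out of extended
iostat output. It assumes the device table starts with a header row
beginning with "Device" and containing a "%util" label, and that each
report ends with a blank line; the function name is invented. Only the
last report on stdin counts, since the first report iostat prints covers
the time since boot.

```shell
#!/bin/sh
# Hypothetical helper: print the highest %util value in the last device
# report found in "iostat -x" output on stdin.
max_disk_util() {
    awk '
        # A blank line ends the current device table.
        NF == 0 { col = 0; next }
        # Header row: locate the %util column and start a fresh maximum.
        /^Device/ {
            col = 0; max = 0; seen = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        # Device rows inside a table.
        col && NF >= col {
            if ($col + 0 > max) max = $col + 0
            seen = 1
        }
        END {
            if (!seen) exit 1
            print max
        }
    '
}
```

A call such as "iostat -x 60 2 | max_disk_util" would give the peak
utilisation over the last minute, which could then be compared against
thresholds in the same way as the CPU figure.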

Any suggestions?

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page

