Re: Performance monitoring?

2002-12-20 Thread Ben Boulanger
On Fri, 2002-12-20 at 08:53, [EMAIL PROTECTED] wrote:
> The [k]bytes or blocks per second read or written metric suffers from 
> the same problem in that I can't correlate them to anything which was 
> actually requested. So, I have one set of metrics telling me that 
> something was requested, but not the size of the request, and another 
> metric telling me how much data was transferred, but nothing telling 
> me how much was requested.

Well, there's the disk I/O MIB in SNMP.  On my box it's
/usr/share/snmp/mibs/UCD-DISKIO-MIB.txt

Have you considered using SNMP already?

Using that with MRTG and maybe some outside tools (you could write a
script that returns a number for MRTG to graph) you may be able to
accomplish a fair amount of what you're going for.
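
For illustration, a minimal sketch of such a script.  It assumes MRTG's
external-program convention of four output lines (first value, second
value, uptime, target name); the function name and the numbers are
hypothetical placeholders, not real measurements.

```python
#!/usr/bin/env python3
"""Hypothetical MRTG helper: report two values for one disk."""

def mrtg_report(reads, writes, uptime="unknown", name="disk-io"):
    # Four lines, in this order: value 1, value 2, uptime, target name.
    return "\n".join([str(int(reads)), str(int(writes)), uptime, name])

if __name__ == "__main__":
    # Placeholder numbers; a real script would pull counters from
    # iostat, sar, or /proc instead.
    print(mrtg_report(1200, 340, uptime="3 days", name="sda"))
```

MRTG would call the script once per polling interval and graph the first
two lines.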

Ben


___
gnhlug-discuss mailing list
[EMAIL PROTECTED]
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss



Re: Performance monitoring?

2002-12-20 Thread Michael O'Donnell


>Does anyone know of any utilities which can dig into disk drive 
>performance?  I'm looking to discover the "disk busy" time, i.e., 
>what percentage of time is the disk off doing something, such that 
>requests to the disk are blocked.


I'm not sure I understand what you're trying to
measure, but note that you might find it difficult
to characterize certain behaviors of your disk units
if they have caches (pretty much all of them do)
that are enabled (pretty much all have them enabled
by default) because the caches will, to some extent,
hide the delays due to seek- and rotational-latencies.
You can always shut the caches off (using commands
like hdparm and scsiinfo) but I'm guessing that's
not representative of your normal operating mode,
so it might only serve to satisfy your curiosity
and slow the system throughput waayy down...




Re: Performance monitoring?

2002-12-20 Thread pll

In a message dated: 20 Dec 2002 09:30:15 EST
Ben Boulanger said:

>Have you considered using SNMP already?

Yes, but I don't believe it provides me with anything I don't already have.

>Using that with MRTG and maybe some outside tools (you could write a
>script that returns a number for MRTG to graph) you may be able to
>accomplish a fair amount of what you're going for.

I believe that the SNMP DiskIO MIB polls the information from iostat, 
and hence, would be nothing more than a middle-man for information I 
already have direct access to.

In addition, to use SNMP, I'd have to install apache, the mibs, the 
snmp tools, gnuplot, and who knows what else.  Unfortunately, with 
all this crap, I begin to affect the very performance I'm trying to 
measure :(

Of course, I'd love to be proven wrong and told that SNMP has some 
magical powers or otherwise 'all-access pass' into Disk IO 
metrics that iostat is incapable of providing, and therefore, it's 
worth going around and installing all this stuff on the 64+ nodes to 
get this information :)
-- 

Seeya,
Paul
--
Key fingerprint = 1660 FECC 5D21 D286 F853  E808 BB07 9239 53F1 28EE

It may look like I'm just sitting here doing nothing,
   but I'm really actively waiting for all my problems to go away.

 If you're not having fun, you're not doing it right!





Re: Performance monitoring?

2002-12-20 Thread pll

In a message dated: Fri, 20 Dec 2002 09:52:53 EST
Michael O'Donnell said:

>I'm not sure I understand what you're trying to measure,

I want to know what percentage of time the disk is busy doing stuff.
Or, conversely, what percentage of time the disk is not doing stuff.

Here's the problem set.

I/We have this box being used as a black-box storage device.  We are 
trying to characterize its overall performance, from CPU 
utilization to disk I/O, memory, swapping, etc.

Since most of the people I work with are hard-core storage engineers, 
one of the metrics they're used to is "disk busy time": the time 
the disk spends doing something and is therefore unable to 
take more requests.

The interesting thing is that iostat will tell you how many requests 
per second were made, and how much data per second was transferred, 
but there is no correlation between the two; i.e. I know how many 
requests were made, but not the size of those requests.  Therefore, I 
have no idea what percentage of requests is represented by the amount 
of data transferred.
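
One rough way to relate the two figures is to divide throughput by the
request rate, which gives an average request size for the sampling
interval.  A minimal sketch, assuming both numbers cover the same
interval; the function name and sample values are made up.

```python
def avg_request_kb(kb_per_sec, requests_per_sec):
    """Average size of one request, in KB, over the sampling interval."""
    if requests_per_sec == 0:
        return 0.0  # an idle interval has no meaningful request size
    return kb_per_sec / requests_per_sec

# Made-up sample: 2048 KB/s moved by 128 requests/s.
print(avg_request_kb(2048.0, 128.0))
```

This only yields an average; it says nothing about how individual
request sizes were distributed.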

  mod> you might find it difficult to characterize certain behaviors
  mod> of your disk units if they have caches (pretty much all of
  mod> them do) that are enabled (pretty much all have them enabled
  mod> by default) because the caches will, to some extent, hide the
  mod> delays due to seek- and rotational-latencies.

Well, yes, I understand that; however, at this level, I don't think 
it matters.  The OS should only care how fast its requests are 
answered.  If the OS makes a request, and it gets blocked, then the 
disk is "busy".  On the other hand, if it makes a request and gets 
an answer, the disk wasn't "busy".  Some answers are obtained faster 
than others, which might be attributable to cache hits, but the 
bottom line is, the disk wasn't busy when I made my request.

(btw, I too question the validity of such a number, but hey, that's 
what they asked me for, and they pay me, so I need to humor them :) 
-- 

Seeya,
Paul



Re: Performance monitoring?

2002-12-20 Thread Ben Boulanger
On Fri, 2002-12-20 at 10:07, [EMAIL PROTECTED] wrote:
> >Have you considered using SNMP already?
> 
> Yes, but I don't believe it provides me with anything I don't already have.

My thought was that it would provide the correlation that I thought you
were looking for.  If you've got bytes written, bytes read and accesses,
you could correlate that data onto one graph.

Ben





Re: Performance monitoring?

2002-12-20 Thread bscott
On Fri, 20 Dec 2002, at 9:52am, [EMAIL PROTECTED] wrote:
> I'm not sure I understand what you're trying to measure ...

  It sounds like he is trying to measure how big a bottleneck disk
performance is.  In other words, he wants to find out how much time the
system spends waiting for the disk drive.  Is that accurate, Paul?

> ... note that you might find it difficult to characterize certain
> behaviors of your disk units if they have caches ...

  Since the cache will normally be enabled, one would want it included in
whatever benchmark you are running.  Running the benchmark under load over
time should smooth out any irregularities, and give you a better picture of
what your real-world performance is.

-- 
Ben Scott <[EMAIL PROTECTED]>
| The opinions expressed in this message are those of the author and do not |
| necessarily represent the views or policy of any other person, entity or  |
| organization.  All information is provided without warranty of any kind.  |




Re: Performance monitoring?

2002-12-20 Thread pll

In a message dated: 20 Dec 2002 10:43:10 EST
Ben Boulanger said:

>On Fri, 2002-12-20 at 10:07, [EMAIL PROTECTED] wrote:
>> >Have you considered using SNMP already?
>> 
>> Yes, but I don't believe it provides me with anything I don't already have.
>
>My thought was that it would provide the correlation that I thought you
>were looking for.  If you've got bytes written, bytes read and accesses,
>you could correlate that data onto one graph.

Oh, yeah, that's a lot of hassle though, just for graphing, which I 
can accomplish with a little perl and a lot of gnuplot :)

Thanks though!
(oh, and if you're a fan of MRTG, check out NRG http://nrg.hep.wisc.edu/ )
-- 

Seeya,
Paul



Re: Performance monitoring?

2002-12-20 Thread tbuskey
> In a message dated: 20 Dec 2002 09:30:15 EST
> Ben Boulanger said:
> 
> >Have you considered using SNMP already?
> 
> Yes, but I don't believe it provides me with anything I don't already have.

SNMP makes it very easy to get the data into MRTG/RRDtool.

> 
> >Using that with MRTG and maybe some outside tools (you could write a
> >script that returns a number for MRTG to graph) you may be able to
> >accomplish a fair amount of what you're going for.
> 
> I believe that the SNMP DiskIO MIB polls the information from iostat, 
> and hence, would be nothing more than a middle-man for information I 
> already have direct access to.

That's not really the point.  SNMP makes it easy for MRTG to gather the info
into a central database and produce graphical reports.

> In addition, to use SNMP, I'd have to install apache, the mibs, the 
> snmp tools, gnuplot, and who knows what else.  Unfortunately, with 
> all this crap, I begin to affect the very performance I'm trying to 
> measure :(

Well...  You need SNMP agents on all the clients.  On the one system running
MRTG, you need perl, a bunch of perl packages, cron jobs and disk space.  You
can run a browser on the MRTG host pointed at the output directories, or run
apache to access them remotely.

MRTG is fairly low impact on the server and clients.  It's also a great way to
get running data averaged over a long time.

> 
> Of course, I'd love to be proven wrong and told that SNMP has some 
> magical powers or otherwise 'all-access pass' into Disk IO 
> metrics that iostat is incapable of providing, and therefore, it's 
> worth going around and installing all this stuff on the 64+ nodes to 
> get this information :)

It's not going to get you more info than you can get directly.  But it's *much*
easier to gather.  Especially if you want data on more than 2-3 systems.

When I was doing network admin, I had an MRTG graph of every switch and hub my
servers were attached to as well as the routers and our outbound connection. 
When users said the network was slow, I could point at the bandwidth hogs.  I
could also do some load balancing based on historical data; I swapped users
between hubs and switches.  I also graphed my servers.  NetApp has a MIB to work
with MRTG to graph its data btw.

It took about 2 minutes to add a router/switch port/server to the system.

Your choice: custom-written scripts to gather data, centralize it, generate
reports and graphs, and trim old data.  You're still going to need to put that
script on all your nodes.

Or install SNMP agents everywhere, set up a server to gather the data with MRTG
and create reports for you.  If you take your time, probably a day to set up for
all 64 nodes.  Management likes the pretty graphs too.  MRTG's database is
a fixed size too.

If your nodes are Solaris, Virtual Adrian is a good tool too.  I don't think
it's on anything else though.  MRTG will work with anything that has an SNMP
agent on it.



Re: Performance monitoring?

2002-12-20 Thread pll


SNMP is currently not an option due to very specific system 
environment constraints; i.e., I do not have network connectivity to 
all the systems which need to be monitored.

(oh, and all the mgmt weenies want Excel spreadsheets with the raw 
data so they can draw their own pretty graphs, but that's not a 
primary factor, well, not yet anyway :)

I need both the graphs and the raw data.
-- 

Seeya,
Paul



Re: Performance monitoring?

2002-12-20 Thread pll

In a message dated: Fri, 20 Dec 2002 10:04:17 EST
[EMAIL PROTECTED] said:

>On Fri, 20 Dec 2002, at 9:52am, [EMAIL PROTECTED] wrote:
>> I'm not sure I understand what you're trying to measure ...
>
>  It sounds like he is trying to measure how big a bottleneck disk
>performance is.  In other words, he wants to find out how much time the
>system spends waiting for the disk drive.  Is that accurate, Paul?

Absolutely.  Essentially, I'm looking for 'load average' numbers for 
a disk drive :)

>> ... note that you might find it difficult to characterize certain
>> behaviors of your disk units if they have caches ...
>
>  Since the cache will normally be enabled, one would want it included in
>whatever benchmark you are running.  Running the benchmark under load over
>time should smooth out any irregularities, and give you a better picture of
>what your real-world performance is.

Exactly.  I don't much care about cache hits (well, not today, anyway :)
-- 

Seeya,
Paul



Re: Performance monitoring?

2002-12-20 Thread pll

In a message dated: Fri, 20 Dec 2002 11:17:24 EST
Derek Martin said:

>Performance monitoring has always been the one area where I felt Linux
>was lacking...  I've heard rumors (from kernel hacker types) of people
>working on comprehensive performance monitoring tools, but they were
>never very specific, and I haven't found such beasts myself.  If
>someone does know of a quality tool for this on Linux, I'd definitely
>be interested myself.

Well, I've dug up a bunch of utilities I've found googling around for 
such things.  Overall, I'm very impressed with the amount of data 
one can get from sar, iostat, mpstat, etc.

SGI has open sourced their Performance Co-Pilot (aka PCP :)
which looks like a really cool tool, just overly complicated for what 
I've been asked to look at (and, I have a sneaking suspicion, would 
actually introduce extraneous load to the systems being monitored).

Most people seem disappointed when they find out there aren't any 
really cool GUI apps for this akin to NT's widget.  Fortunately, 
that's not at all what I'm looking for :)  I want text-based output 
I can parse with perl and dump into gnuplot in a very automated 
fashion :)
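
As a rough sketch of that parse-and-plot step (in Python here rather
than perl, purely for illustration): the function below pulls one named
column out of an 'iostat -x'-style report.  The sample text is made up
and trimmed to three columns, and the function name is hypothetical;
column positions are looked up from the header line because they vary
between sysstat versions.

```python
def extract_column(report, column="%util"):
    """Return (device, value) pairs for one named column of the report."""
    pairs = []
    index = None
    for line in report.splitlines():
        fields = line.split()
        if not fields:
            index = None  # a blank line ends the current device table
            continue
        if fields[0].startswith("Device"):
            index = fields.index(column)  # locate the column by name
            continue
        if index is not None and len(fields) > index:
            pairs.append((fields[0], float(fields[index])))
    return pairs

# Made-up sample, not real iostat output.
SAMPLE = """\
Device:  r/s   w/s   %util
hda      12.0  3.5   41.20
hdb      0.0   0.0   0.00
"""

for device, util in extract_column(SAMPLE):
    # Whitespace-separated columns, one record per line.
    print(device, util)
```

The printed lines are plain whitespace-separated records, so they can be
appended to a data file for later plotting.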
-- 

Seeya,
Paul



Re: Performance monitoring?

2002-12-20 Thread bscott
On Fri, 20 Dec 2002, at 11:17am, [EMAIL PROTECTED] wrote:
> Without all of that information, unless you can show that the system's I/O
> requests exceeded the system's capacity to provide I/O during some time
> period ...

  But is Paul really looking for that kind of detail?  It sounds to me like
Paul just wants to know how much time the *system* spends waiting for the
disk (or disks).  Compare that to total system time, and you have a fair
indication of whether the system is waiting on the disk a lot.  Maybe the
system isn't waiting on the disk a lot, in which case, you look elsewhere
for the cause of your performance problem.

  (Paul, if I'm wrong in my assumptions about what you want, please say so.)

  You could measure time spent waiting on the disk in the device driver
fairly easily.  First, the driver would have to note the time a request is
submitted to the disk.  Then, when the result comes back, the driver would
note the new time, take the difference, and add the difference to some
counter somewhere.

  Of course, I have no idea if Linux, or any of the disk device drivers,
actually *do* this, but I wouldn't think it would be *that* hard.  Of
course, I really don't have any idea what I am talking about, but that's
never stopped me before ;-)
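
  To make the idea concrete, here is a user-space sketch of that
bookkeeping.  It is not driver code; the class name and timestamps are
made up, and it assumes only one request is outstanding at a time.

```python
class BusyTimeCounter:
    """Accumulates the time requests spend outstanding.

    Simplification: one request in flight at a time.  Overlapping
    requests would need a count of outstanding requests instead.
    """

    def __init__(self):
        self.busy_total = 0.0
        self._submitted_at = None

    def submit(self, now):
        # Note the time the request is handed to the disk.
        self._submitted_at = now

    def complete(self, now):
        # Take the difference and add it to the running counter.
        self.busy_total += now - self._submitted_at
        self._submitted_at = None

    def busy_percent(self, elapsed):
        return 100.0 * self.busy_total / elapsed


counter = BusyTimeCounter()
# Made-up (submit, complete) timestamps in seconds.
for start, end in [(0.0, 0.5), (2.0, 2.5), (6.0, 7.0)]:
    counter.submit(start)
    counter.complete(end)
print(counter.busy_percent(elapsed=10.0))
```

  With these made-up timestamps the disk is outstanding for 2 of the 10
seconds.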

-- 
Ben Scott <[EMAIL PROTECTED]>



Re: Performance monitoring?

2002-12-20 Thread bscott
On Fri, 20 Dec 2002, at 1:47pm, [EMAIL PROTECTED] wrote:
> Well, if he's using DMA, then technically none (or at least
> very little).

  Well, okay, yah.  When I said "waiting for the disk", I didn't mean "doing
nothing else", but rather "doing something else (including running the idle
loop) while waiting for the disk".  Upon further consideration, I suppose
interrupt latency (i.e., disk completes request, but system is busy doing
something else, so the ISR does not get called immediately) would affect my
suggested algorithm.  But you could probably make up a good excuse for
including interrupt latency in the benchmark.  ;-)

> It sounded to me he was trying to get an idea of how often requests
> weren't being serviced because the disks were already busy.

  He specifically said he was after "disk busy time", which I presume to
mean he wants to know what percent of time the disk drive is not idle.  
Rather like the "CPU utilization" (percent of CPU time not spent in the
kernel idle loop).
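
  A sketch of how such a utilization figure can be derived from a
cumulative busy-time counter.  This assumes a kernel that exposes
/proc/diskstats, where the tenth statistic after the device name is the
number of milliseconds spent doing I/O; that interface may not exist on
the kernels discussed in this thread.  The sample lines and function
names are made up.

```python
def io_ticks_ms(diskstats_text, device):
    """Return the 'milliseconds spent doing I/O' counter for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Layout: major, minor, name, then the statistics; the busy-time
        # counter is the 10th statistic, i.e. fields[12].
        if len(fields) > 12 and fields[2] == device:
            return int(fields[12])
    raise KeyError(device)

def util_percent(before, after, device, interval_ms):
    """Percent of the interval during which the device had I/O in progress."""
    delta = io_ticks_ms(after, device) - io_ticks_ms(before, device)
    return 100.0 * delta / interval_ms

# Made-up samples taken 1000 ms apart.
BEFORE = "8 0 sda 100 0 800 50 200 0 1600 90 0 5000 140"
AFTER = "8 0 sda 110 0 880 55 230 0 1840 99 0 5250 149"
print(util_percent(BEFORE, AFTER, "sda", 1000))
```

  In real use the two samples would come from reading the file twice
with a sleep in between.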

-- 
Ben Scott <[EMAIL PROTECTED]>



Re: Performance monitoring?

2003-01-02 Thread pll


(sorry for the delay in answering, I've been away from e-mail since 
 before Christmas :)

In a message dated: Fri, 20 Dec 2002 13:25:07 EST
[EMAIL PROTECTED] said:

>On Fri, 20 Dec 2002, at 11:17am, [EMAIL PROTECTED] wrote:
>> Without all of that information, unless you can show that the system's I/O
>> requests exceeded the system's capacity to provide I/O during some time
>> period ...
>
>  But is Paul really looking for that kind of detail?  It sounds to me like
>Paul just wants to know how much time the *system* spends waiting for the
>disk (or disks).  Compare that to total system time, and you have a fair
>indication of whether the system is waiting on the disk a lot.  Maybe the
>system isn't waiting on the disk a lot, in which case, you look elsewhere
>for the cause of your performance problem.
>
>  (Paul, if I'm wrong in my assumptions about what you want, please say so.)

Nope, you're right on.

>  You could measure time spent waiting on the disk in the device driver
>fairly easily.

Yeah, but that would mean hacking kernel code, and I'd like to avoid 
that if possible. 

Ideally, you'd think that there would be some way to query the device 
and determine if it can respond to a request, somewhat like 'ping' 
for devices.

>  Of course, I have no idea if Linux, or any of the disk device drivers,
>actually *do* this, but I wouldn't think it would be *that* hard.  Of
>course, I really don't have any idea what I am talking about, but that's
>never stopped me before ;-)

Ahh, we won't hold that against you :)
-- 

Seeya,
Paul



Re: Performance monitoring?

2003-01-02 Thread pll

In a message dated: Mon, 23 Dec 2002 11:32:12 EST
mike ledoux said:

>Is this different somehow from the %util field in 'iostat -x'?

The %util field will probably suffice, however, it appears to require 
a kernel patch for it to work (note, RH stock kernels already have 
the patch applied) and I can't seem to find the actual patch (after 
only 30 seconds of looking in /pub/linux/sct/fs/profiling/ where the 
'sysstat' web site states it should be).  I'll find the patch and 
play with it.  It might suffice for the people who care/are pestering me :)

>> The interesting thing is that iostat will tell you how many requests 
>> per second were made, and how much data per second was transferred, 
>> but there is no correlation between the two; i.e. I know how many 
>> requests were made, but not the size of those requests.
>
>'iostat -x' gives you an average request size.  You don't actually want
>to know the exact size of each individual request, do you?

Well, no, average size might suffice. 

>> Therefore, I have no idea what percentage of requests is represented
>> by the amount of data transferred.
>
>100%.

100% of the requests made, or 100% of the requests serviced?  Of the 
number of requests made per second, how many were actually completed?

Obviously the amount of data transferred reflects only the requests 
that completed each second.  However, there may have been more requests made 
than were completed.  How do you measure that delta?
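
To pin down what I mean by that delta: if I could get per-interval
counts of requests made and requests completed, the backlog would just
be a running sum of the difference.  A minimal sketch with made-up
counts; whether any existing tool exposes both counts is exactly the
open question, and the function name is hypothetical.

```python
def backlog_over_time(submitted, completed):
    """Given per-interval counts of requests submitted and completed,
    return the number still outstanding at the end of each interval."""
    outstanding = 0
    result = []
    for made, done in zip(submitted, completed):
        outstanding += made - done
        result.append(outstanding)
    return result

# Made-up per-second counts.
print(backlog_over_time([100, 120, 80, 50], [90, 100, 100, 60]))
```

A backlog that keeps growing would mean requests are arriving faster
than the disk can service them.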
-- 

Seeya,
Paul



Re: Performance monitoring?

2003-01-02 Thread pll

In a message dated: Fri, 20 Dec 2002 13:47:44 EST
Derek Martin said:


>It sounded to me he was trying to get an idea of how often requests
>weren't being serviced because the disks were already busy.

Well, yeah.

>So, I dunno...  Paul, what /are/ you talking about?

I want to measure the amount of time a disk is busy doing "stuff" from 
an OS perspective.  I don't really care about it at the DMA level, 
but if the OS/kernel makes a request, which gets sent to the DMA 
controller, and is forced into a context switch while waiting for the 
DMA controller to come back with the answer, then technically, that 
time spent doing "other stuff" while waiting would count as 
"disk busy time".

Does that help?
-- 

Seeya,
Paul



Re: Performance monitoring?

2003-01-02 Thread pll

In a message dated: Fri, 20 Dec 2002 18:17:32 EST
[EMAIL PROTECTED] said:

>  He specifically said he was after "disk busy time", which I presume to
>mean he wants to know what percent of time the disk drive is not idle.  
>Rather like the "CPU utilization" (percent of CPU time not spent in the
>kernel idle loop).

EXACTLY :)
-- 

Seeya,
Paul



Re: Performance monitoring?

2003-01-02 Thread bscott
On Thu, 2 Jan 2003, at 10:59am, [EMAIL PROTECTED] wrote:
> The %util field will probably suffice, however, it appears to require a
> kernel patch for it to work (note, RH stock kernels already have the patch
> applied) and I can't seem to find the actual patch ...

  If all else fails, the source RPM from Red Hat will contain the patch.  
RPM's management of "pristine" sources, patches, and building binary
packages is one of the really nice things about RPM.

-- 
Ben Scott <[EMAIL PROTECTED]>