Re: Performance monitoring?
On Fri, 2002-12-20 at 08:53, [EMAIL PROTECTED] wrote: > The [k]bytes or blocks per second read or written metric suffers from > the same problem in that I can't correlate them to anything which was > actually requested. So, I have one set of metrics telling me that > something was requested, but not the size of the request, and another > metric telling me how much data was transferred, but nothing telling > me how much was requested. Well, there's the DISK-IO MIB in SNMP.. on my box it's /usr/share/snmp/mibs/UCD-DISKIO-MIB.txt Have you considered using SNMP already? Using that with MRTG and maybe some outside tools (you could write a script that returns a number for MRTG to graph) you may be able to accomplish a fair amount of what you're going for. Ben ___ gnhlug-discuss mailing list [EMAIL PROTECTED] http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
Re: Performance monitoring?
>Does anyone know of any utilities which can dig into disk drive >performance? I'm looking to discover the "disk busy" time, i.e., >what percentage of time is the disk off doing something, such that >requests to the disk are blocked. I'm not sure I understand what you're trying to measure, but note that you might find it difficult to characterize certain behaviors of your disk units if they have caches (pretty much all of them do) that are enabled (pretty much all have them enabled by default) because the caches will, to some extent, hide the delays due to seek- and rotational-latencies. You can always shut the caches off (using commands like hdparm and scsiinfo) but I'm guessing that's not representative of your normal operating mode, so it might only serve to satisfy your curiosity and slow the system throughput waayy down... ___ gnhlug-discuss mailing list [EMAIL PROTECTED] http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
Re: Performance monitoring?
In a message dated: 20 Dec 2002 09:30:15 EST Ben Boulanger said: >Have you considered using SNMP already? Yes, but I don't believe it provides me with anything I don't already have. >Using that with MRTG and maybe some outside tools (you could write a >script that returns a number for MRTG to graph) you may be able to >accomplish a fair amount of what you're going for. I believe that the SNMP DiskIO MIB polls the information from iostat, and hence, would be nothing more than a middle-man for information I already have direct access to. In addition, to use SNMP, I'd have to install apache, the mibs, the snmp tools, gnuplot, and who knows what else. Unfortunately, with all this crap, I begin to affect the very performance I'm trying to measure :( Of course, I'd love to be proven wrong and told that SNMP has some magical powers or otherwise 'all-access pass' into Disk IO metrics that iostat is incapable of providing, and therefore, it's worth going around and installing all this stuff on the 64+ nodes to get this information :) -- Seeya, Paul -- Key fingerprint = 1660 FECC 5D21 D286 F853 E808 BB07 9239 53F1 28EE It may look like I'm just sitting here doing nothing, but I'm really actively waiting for all my problems to go away. If you're not having fun, you're not doing it right! ___ gnhlug-discuss mailing list [EMAIL PROTECTED] http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
Re: Performance monitoring?
In a message dated: Fri, 20 Dec 2002 09:52:53 EST
Michael O'Donnell said:

>I'm not sure I understand what you're trying to measure,

I want to know what percentage of time the disk is busy doing stuff. Or, conversely, what percentage of time the disk is not doing stuff.

Here's the problem set. I/We have this box being used as a black-box storage device. We are trying to characterize its overall performance; from CPU utilization to disk IO, memory, swapping, etc. Since most of the people I work with are hard-core storage engineers, one of the metrics they're used to is "disk busy time": the time the disk spends doing something, during which it is unable to take more requests.

The interesting thing is that iostat will tell you how many requests per second were made, and how much data per second was transferred, but there is no correlation between the two; i.e. I know how many requests were made, but not the size of those requests. Therefore, I have no idea what percentage of requests is represented by the amount of data transferred.

mod> you might find it difficult to characterize certain behaviors
mod> of your disk units if they have caches (pretty much all of
mod> them do) that are enabled (pretty much all have them enabled
mod> by default) because the caches will, to some extent, hide the
mod> delays due to seek- and rotational-latencies.

Well, yes, I understand that; however, at this level, I don't think it matters. The OS should only care how fast its requests are answered. If the OS makes a request, and it gets blocked, then the disk is "busy". On the other hand, if it makes a request, and gets an answer, the disk wasn't "busy". Some answers are obtained faster than others, which might be attributable to cache hits, but the bottom line is, the disk wasn't busy when I made my request.

(btw, I too question the validity of such a number, but hey, that's what they asked me for, and they pay me, so I need to humor them :)

--
Seeya, Paul
--
Key fingerprint = 1660 FECC 5D21 D286 F853 E808 BB07 9239 53F1 28EE
It may look like I'm just sitting here doing nothing, but I'm really actively waiting for all my problems to go away.
If you're not having fun, you're not doing it right!
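Editorial sketch, not part of the original message: the two iostat figures described above (requests per second and data per second) can at least be combined into an average request size for the sampling interval. The function name and the sample numbers are hypothetical, and the result is only a mean; it says nothing about the size of any individual request.

```python
def average_request_kb(requests_per_sec, kb_per_sec):
    """Return the mean size in kB of one request over the sampling interval.

    Both arguments are rates taken over the same interval, so the interval
    length cancels out when one is divided by the other.
    """
    if requests_per_sec <= 0:
        # No requests were issued, so there is no meaningful average.
        return 0.0
    return kb_per_sec / requests_per_sec


# Hypothetical sample: 150 requests/s moving 1200 kB/s.
print(average_request_kb(150.0, 1200.0))
```

An average like this can hide a mix of very small and very large requests, so it is probably best read alongside the raw rates rather than instead of them.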
Re: Performance monitoring?
On Fri, 2002-12-20 at 10:07, [EMAIL PROTECTED] wrote: > >Have you considered using SNMP already? > > Yes, but I don't believe it provides me with anything I don't already have. My thought was that it would provide the correlation that I thought you were looking for. If you've got bytes written, bytes read and accesses, you could correlate that data onto one graph. Ben
Re: Performance monitoring?
On Fri, 20 Dec 2002, at 9:52am, [EMAIL PROTECTED] wrote:
> I'm not sure I understand what you're trying to measure ...

It sounds like he is trying to measure how big a bottleneck disk performance is. In other words, he wants to find out how much time the system spends waiting for the disk drive. Is that accurate, Paul?

> ... note that you might find it difficult to characterize certain
> behaviors of your disk units if they have caches ...

Since the cache will normally be enabled, one would want it included in whatever benchmark you are running. Running the benchmark under load over time should smooth out any irregularities, and give you a better picture of what your real-world performance is.

--
Ben Scott <[EMAIL PROTECTED]>
| The opinions expressed in this message are those of the author and do not |
| necessarily represent the views or policy of any other person, entity or |
| organization. All information is provided without warranty of any kind. |
Re: Performance monitoring?
In a message dated: 20 Dec 2002 10:43:10 EST Ben Boulanger said: >On Fri, 2002-12-20 at 10:07, [EMAIL PROTECTED] wrote: >> >Have you considered using SNMP already? >> >> Yes, but I don't believe it provides me with anything I don't already have. > >My thought was that it would provide the correlation that I thought you >were looking for. If you've got bytes written, bytes read and accesses, >you could correlate that data onto one graph. Oh, yeah, that's a lot of hassle though, just for graphing, which I can accomplish with a little perl and a lot of gnuplot :) Thanks though! (oh, and if you're a fan of MRTG, check out NRG http://nrg.hep.wisc.edu/ ) -- Seeya, Paul -- Key fingerprint = 1660 FECC 5D21 D286 F853 E808 BB07 9239 53F1 28EE It may look like I'm just sitting here doing nothing, but I'm really actively waiting for all my problems to go away. If you're not having fun, you're not doing it right! ___ gnhlug-discuss mailing list [EMAIL PROTECTED] http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
Re: Performance monitoring?
> In a message dated: 20 Dec 2002 09:30:15 EST
> Ben Boulanger said:
>
> >Have you considered using SNMP already?
>
> Yes, but I don't believe it provides me with anything I don't already have.

SNMP makes it very easy to get the data into MRTG/RRDtool.

> >Using that with MRTG and maybe some outside tools (you could write a
> >script that returns a number for MRTG to graph) you may be able to
> >accomplish a fair amount of what you're going for.
>
> I believe that the SNMP DiskIO MIB polls the information from iostat,
> and hence, would be nothing more than a middle-man for information I
> already have direct access to.

That's not really the point. SNMP makes it easy for MRTG to gather the info into a central database and produce graphical reports.

> In addition, to use SNMP, I'd have to install apache, the mibs, the
> snmp tools, gnuplot, and who knows what else. Unfortunately, with
> all this crap, I begin to affect the very performance I'm trying to
> measure :(

Well... You need SNMP agents on all the clients. On the one system running MRTG, you need perl, a bunch of perl packages, cron jobs and disk space. You can run a browser on the MRTG system pointing at the directories, or run apache to access it remotely. MRTG is fairly low impact on the server and clients. It's also a great way to get running data averaged over a long time.

> Of course, I'd love to be proven wrong and told that SNMP has some
> magical powers or otherwise 'all-access pass' into Disk IO
> metrics that iostat is incapable of providing, and therefore, it's
> worth going around and installing all this stuff on the 64+ nodes to
> get this information :)

It's not going to get you more info than you can get directly. But it's *much* easier to gather, especially if you want data on more than 2-3 systems.

When I was doing network admin, I had an MRTG graph of every switch and hub my servers were attached to as well as the routers and our outbound connection. When users said the network was slow, I could point at the bandwidth hogs. I could also do some load balancing based on historical data; I swapped users between hubs and switches. I also graphed my servers. NetApp has a MIB to work with MRTG to graph its data btw. It took about 2 minutes to add a router/switch port/server to the system.

Your choice:

Custom written scripts to gather data, centralize it, generate reports and graphs, and trim old data. You're still going to need to put that script on all your nodes.

Or install SNMP agents everywhere, set up a server to gather the data with MRTG and create reports for you. If you take your time, probably a day to set up for all 64 nodes. Management likes the pretty graphs too. MRTG's database stays a fixed size, too.

If your nodes are Solaris, Virtual Adrian is a good tool too. I don't think it's on anything else though. MRTG will work with anything that has an SNMP agent on it.
Re: Performance monitoring?
SNMP is currently not an option due to very specific system environment constraints. i.e. I do not have network connectivity to all the systems which need to be monitored. (oh, and all the mgmt weenies want Excel spreadsheets with the raw data so they can draw their own pretty graphs, but that's not a primary factor, well, not yet anyway :) I need both the graphs and the raw data. -- Seeya, Paul -- Key fingerprint = 1660 FECC 5D21 D286 F853 E808 BB07 9239 53F1 28EE It may look like I'm just sitting here doing nothing, but I'm really actively waiting for all my problems to go away. If you're not having fun, you're not doing it right!
Re: Performance monitoring?
In a message dated: Fri, 20 Dec 2002 10:04:17 EST
[EMAIL PROTECTED] said:

>On Fri, 20 Dec 2002, at 9:52am, [EMAIL PROTECTED] wrote:
>> I'm not sure I understand what you're trying to measure ...
>
> It sounds like he is trying to measure how big a bottleneck disk
>performance is. In other words, he wants to find out how much time the
>system spends waiting for the disk drive. Is that accurate, Paul?

Absolutely. Essentially, I'm looking for 'load average' numbers for a disk drive :)

>> ... note that you might find it difficult to characterize certain
>> behaviors of your disk units if they have caches ...
>
> Since the cache will normally be enabled, one would want it included in
>whatever benchmark you are running. Running the benchmark under load over
>time should smooth out any irregularities, and give you a better picture of
>what your real-world performance is.

Exactly. I don't much care about cache hits (well, not today, anyway :)

--
Seeya, Paul
--
Key fingerprint = 1660 FECC 5D21 D286 F853 E808 BB07 9239 53F1 28EE
It may look like I'm just sitting here doing nothing, but I'm really actively waiting for all my problems to go away.
If you're not having fun, you're not doing it right!
Re: Performance monitoring?
In a message dated: Fri, 20 Dec 2002 11:17:24 EST
Derek Martin said:

>Performance monitoring has always been the one area where I felt Linux
>was lacking... I've heard rumors (from kernel hacker types) of people
>working on comprehensive performance monitoring tools, but they were
>never very specific, and I haven't found such beasts myself. If
>someone does know of a quality tool for this on Linux, I'd definitely
>be interested myself.

Well, I've dug up a bunch of utilities I've found googling around for such things. Overall, I'm very impressed with the amount of data one can get from sar, iostat, mpstat, etc.

SGI has open sourced their Performance Co-Pilot (aka PCP :) which looks like a really cool tool, just overly complicated for what I've been asked to look at (and, I have a sneaking suspicion, would actually introduce extraneous load to the systems being monitored).

Most people seem disappointed when they find out there aren't any really cool GUI apps for this akin to NT's widget. Fortunately, that's not at all what I'm looking for :) I want text-based output I can parse with perl and dump into gnuplot in a very automated fashion :)

--
Seeya, Paul
--
Key fingerprint = 1660 FECC 5D21 D286 F853 E808 BB07 9239 53F1 28EE
It may look like I'm just sitting here doing nothing, but I'm really actively waiting for all my problems to go away.
If you're not having fun, you're not doing it right!
Re: Performance monitoring?
On Fri, 20 Dec 2002, at 11:17am, [EMAIL PROTECTED] wrote: > Without all of that information, unless you can show that the system's I/O > requests exceeded the system's capacity to provide I/O during some time > period ... But is Paul really looking for that kind of detail? It sounds to me like Paul just wants to know how much time the *system* spends waiting for the disk (or disks). Compare that to total system time, and you have a fair indication of whether the system is waiting on the disk a lot. Maybe the system isn't waiting on the disk a lot, in which case, you look elsewhere for the cause of your performance problem. (Paul, if I'm wrong in my assumptions about what you want, please say so.) You could measure time spent waiting on the disk in the device driver fairly easily. First, the driver would have to note the time a request is submitted to the disk. Then, when the result comes back, the driver would note the new time, take the difference, and add the difference to some counter somewhere. Of course, I have no idea if Linux, or any of the disk device drivers, actually *do* this, but I wouldn't think it would be *that* hard. Of course, I really don't have any idea what I am talking about, but that's never stopped me before ;-) -- Ben Scott <[EMAIL PROTECTED]> | The opinions expressed in this message are those of the author and do not | | necessarily represent the views or policy of any other person, entity or | | organization. All information is provided without warranty of any kind. | ___ gnhlug-discuss mailing list [EMAIL PROTECTED] http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
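Editorial sketch, not part of the original message: the bookkeeping described above (note the time at submission, note the time at completion, add the difference to a counter) can be modelled in user space as follows. This is only an illustration of the arithmetic; the class and method names are hypothetical, timestamps are passed in by the caller, and nothing here claims to reflect what any real Linux driver does.

```python
class WaitAccumulator:
    """Sum the time between submitting each request and seeing it complete."""

    def __init__(self):
        self.pending = {}      # request id -> timestamp at submission
        self.total_wait = 0.0  # running sum of (completion - submission)

    def submit(self, req_id, now):
        # Step 1: note the time the request is handed to the disk.
        self.pending[req_id] = now

    def complete(self, req_id, now):
        # Step 2: on completion, add the elapsed time to the counter.
        started = self.pending.pop(req_id)
        self.total_wait += now - started

    def fraction_of(self, elapsed):
        # Compare accumulated wait against total elapsed time.
        return self.total_wait / elapsed if elapsed > 0 else 0.0


# Hypothetical trace: two non-overlapping requests within one second.
acc = WaitAccumulator()
acc.submit("a", 0.0)
acc.complete("a", 0.25)
acc.submit("b", 0.5)
acc.complete("b", 0.75)
print(acc.fraction_of(1.0))
```

One caveat with this simple form: if requests overlap, the overlapping time is counted once per outstanding request, so the sum measures total wait rather than wall-clock time during which the disk was busy.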
Re: Performance monitoring?
On Fri, 20 Dec 2002, at 1:47pm, [EMAIL PROTECTED] wrote: > Well, if he's using DMA, then technically none (or at least > very little). Well, okay, yah. When I said "waiting for the disk", I didn't mean "doing nothing else", but rather "doing something else (including running the idle loop) while waiting for the disk". Upon further consideration, I suppose interrupt latency (i.e., disk completes request, but system is busy doing something else, so the ISR does not get called immediately) would affect my suggested algorithm. But you could probably make up a good excuse for including interrupt latency in the benchmark. ;-) > It sounded to me he was trying to get an idea of how often requests > weren't being serviced because the disks were already busy. He specifically said he was after "disk busy time", which I presume to mean he wants to know what percent of time the disk drive is not idle. Rather like the "CPU utilization" (percent of CPU time not spent in the kernel idle loop). -- Ben Scott <[EMAIL PROTECTED]> | The opinions expressed in this message are those of the author and do not | | necessarily represent the views or policy of any other person, entity or | | organization. All information is provided without warranty of any kind. | ___ gnhlug-discuss mailing list [EMAIL PROTECTED] http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
Re: Performance monitoring?
(sorry for the delay in answering, I've been away from e-mail since before Christmas :) In a message dated: Fri, 20 Dec 2002 13:25:07 EST [EMAIL PROTECTED] said: >On Fri, 20 Dec 2002, at 11:17am, [EMAIL PROTECTED] wrote: >> Without all of that information, unless you can show that the system's I/O >> requests exceeded the system's capacity to provide I/O during some time >> period ... > > But is Paul really looking for that kind of detail? It sounds to me like >Paul just wants to know how much time the *system* spends waiting for the >disk (or disks). Compare that to total system time, and you have a fair >indication of whether the system is waiting on the disk a lot. Maybe the >system isn't waiting on the disk a lot, in which case, you look elsewhere >for the cause of your performance problem. > > (Paul, if I'm wrong in my assumptions about what you want, please say so.) Nope, you're right on. > You could measure time spent waiting on the disk in the device driver >fairly easily. Yeah, but that would mean hacking kernel code, and I'd like to avoid that if possible. Ideally, you'd think that there would be some way to query the device and determine if it can respond to a request, some-what like 'ping' for devices. > Of course, I have no idea if Linux, or any of the disk device drivers, >actually *do* this, but I wouldn't think it would be *that* hard. Of >course, I really don't have any idea what I am talking about, but that's >never stopped me before ;-) Ahh, we won't hold that against you :) -- Seeya, Paul -- Key fingerprint = 1660 FECC 5D21 D286 F853 E808 BB07 9239 53F1 28EE It may look like I'm just sitting here doing nothing, but I'm really actively waiting for all my problems to go away. If you're not having fun, you're not doing it right! ___ gnhlug-discuss mailing list [EMAIL PROTECTED] http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
Re: Performance monitoring?
In a message dated: Mon, 23 Dec 2002 11:32:12 EST mike ledoux said: >Is this different somehow from the %util field in 'iostat -x'? The %util field will probably suffice, however, it appears to require a kernel patch for it to work (note, RH stock kernels already have the patch applied) and I can't seem to find the actual patch (after only 30 seconds of looking in /pub/linux/sct/fs/profiling/ where the 'sysstat' web site states it should be). I'll find the patch and play with it. It might suffice for the people who care/are pestering me :) >> The interesting thing is that iostat will tell you how many requests >> per second were made, and how much data per second was transferred, >> but there is no correlation between the two; i.e. I know how many >> requests were made, but not the size of those requests. > >'iostat -x' gives you an average request size. You don't actually want >to know the exact size of each individual request, do you? Well, no, average size might suffice. >> Therefore, I have no idea what percentage of requests is represented >> by the amount of data transferred. > >100%. 100% of the requests made, or 100% of the requests serviced? Of the number of requests made per second, how many were actually completed? Obviously the amount of data transferred == number of requests per second completed. However, there may have been more requests made, than were completed. How do you measure that delta? -- Seeya, Paul -- Key fingerprint = 1660 FECC 5D21 D286 F853 E808 BB07 9239 53F1 28EE It may look like I'm just sitting here doing nothing, but I'm really actively waiting for all my problems to go away. If you're not having fun, you're not doing it right! ___ gnhlug-discuss mailing list [EMAIL PROTECTED] http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
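Editorial sketch, not part of the original message: on Linux kernels that expose /proc/diskstats (an assumption here; the thread does not mention that file), the tenth statistic after the device name is the number of milliseconds the device has spent doing I/O. A utilization percentage of the kind discussed above can be derived from two samples of that counter. The sample lines below are made up for illustration, and the helper names are hypothetical.

```python
def io_ticks_ms(diskstats_line):
    """Return (device name, ms spent doing I/O) from one diskstats-style line.

    Layout assumed: major, minor, name, then the per-device statistics;
    the tenth statistic (index 12 after splitting) is time spent doing I/O.
    """
    fields = diskstats_line.split()
    return fields[2], int(fields[12])


def percent_util(before_line, after_line, interval_ms):
    """Percentage of the interval during which the device was doing I/O."""
    name_a, ticks_a = io_ticks_ms(before_line)
    name_b, ticks_b = io_ticks_ms(after_line)
    if name_a != name_b:
        raise ValueError("samples are for different devices")
    return 100.0 * (ticks_b - ticks_a) / interval_ms


# Made-up samples taken 1000 ms apart; real values come from the kernel.
BEFORE = "8 0 sda 1000 10 80000 5000 2000 20 160000 9000 0 7000 14000"
AFTER = "8 0 sda 1100 12 88000 5400 2150 25 172000 9700 1 7450 15100"
print(percent_util(BEFORE, AFTER, 1000))
```

Working from two snapshots keeps the measurement cheap: the counters are read twice and subtracted, with no per-request instrumentation on the monitored system.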
Re: Performance monitoring?
In a message dated: Fri, 20 Dec 2002 13:47:44 EST
Derek Martin said:

>It sounded to me he was trying to get an idea of how often requests
>weren't being serviced because the disks were already busy.

Well, yeah.

>So, I dunno... Paul, what /are/ you talking about?

I want to measure the amount of time a disk is busy doing "stuff" from an OS perspective. I don't really care about it at the DMA level, but if the OS/kernel makes a request, which gets sent to the DMA controller, and is forced into a context switch while waiting for the DMA controller to come back with the answer, then technically, that time spent doing "other stuff" while waiting would count as "disk busy time".

Does that help?

--
Seeya, Paul
--
Key fingerprint = 1660 FECC 5D21 D286 F853 E808 BB07 9239 53F1 28EE
It may look like I'm just sitting here doing nothing, but I'm really actively waiting for all my problems to go away.
If you're not having fun, you're not doing it right!
Re: Performance monitoring?
In a message dated: Fri, 20 Dec 2002 18:17:32 EST [EMAIL PROTECTED] said: > He specifically said he was after "disk busy time", which I presume to >mean he wants to know what percent of time the disk drive is not idle. >Rather like the "CPU utilization" (percent of CPU time not spent in the >kernel idle loop). EXACTLY :) -- Seeya, Paul -- Key fingerprint = 1660 FECC 5D21 D286 F853 E808 BB07 9239 53F1 28EE It may look like I'm just sitting here doing nothing, but I'm really actively waiting for all my problems to go away. If you're not having fun, you're not doing it right!
Re: Performance monitoring?
On Thu, 2 Jan 2003, at 10:59am, [EMAIL PROTECTED] wrote: > The %util field will probably suffice, however, it appears to require a > kernel patch for it to work (note, RH stock kernels already have the patch > applied) and I can't seem to find the actual patch ... If all else fails, the source RPM from Red Hat will contain the patch. RPM's management of "pristine" sources, patches, and building binary packages is one of the really nice things about RPM. -- Ben Scott <[EMAIL PROTECTED]> | The opinions expressed in this message are those of the author and do not | | necessarily represent the views or policy of any other person, entity or | | organization. All information is provided without warranty of any kind. |