Support for new slab allocator now in latest release
I just wanted to let people know that, as a result of a discussion on linux-mm, I've added support for the new slab allocator to my collectl utility, making it really easy to dynamically monitor allocations alongside all the other types of monitoring collectl does. I've also put together a webpage at http://collectl.sourceforge.net/SlabInfo.html to give a taste of how this all works and to show a few different types of output.

-mark
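For the curious, here is roughly what the underlying data looks like. The minimal C sketch below is only an illustration, not collectl's actual code: it reads /proc/slabinfo twice and reports any slabs whose active-object counts changed between the two samples, which is the same kind of "show me only what changed" filtering collectl's slab display offers. It assumes the slabinfo 2.x text format; field layout can vary across kernel versions, and reading the file may require root.

/* Illustrative sketch only (not collectl): diff two samples of
 * /proc/slabinfo and print slabs whose active-object count changed.
 * Assumes the slabinfo 2.x format: name active_objs num_objs ... */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_SLABS 1024

struct slab { char name[64]; unsigned long active, total; };

static int read_slabs(struct slab *s, int max)
{
	FILE *fp = fopen("/proc/slabinfo", "r");
	char line[512];
	int n = 0;

	if (!fp)
		return -1;
	while (n < max && fgets(line, sizeof(line), fp)) {
		/* skip the version banner and the '#' header line */
		if (line[0] == '#' || !strncmp(line, "slabinfo", 8))
			continue;
		if (sscanf(line, "%63s %lu %lu",
			   s[n].name, &s[n].active, &s[n].total) == 3)
			n++;
	}
	fclose(fp);
	return n;
}

int main(void)
{
	static struct slab a[MAX_SLABS], b[MAX_SLABS];
	int na = read_slabs(a, MAX_SLABS);
	int nb, i, j;

	sleep(1);			/* one-second sample interval */
	nb = read_slabs(b, MAX_SLABS);
	if (na < 0 || nb < 0)
		return 1;

	for (i = 0; i < nb; i++)
		for (j = 0; j < na; j++)
			if (!strcmp(b[i].name, a[j].name) &&
			    b[i].active != a[j].active) {
				printf("%-24s %lu -> %lu active objects\n",
				       b[i].name, a[j].active, b[i].active);
				break;
			}
	return 0;
}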
Update on collectl
Last summer I announced that I had released a performance monitoring tool called collectl. I just wanted to let people know I've since significantly improved the website at http://collectl.sourceforge.net/ to include examples, a block diagram, and even a couple of pages on some interesting kernel problems it helped identify, though they've since been addressed.

Perhaps one of the more interesting ones is that until fairly recently, and I'm really not sure when it actually got fixed, it was impossible to accurately measure network traffic at 1-second intervals; worse, you'd periodically see double the actual rate reported. Try it out on an older kernel and see for yourself! However, since collectl can monitor at subsecond intervals, you can monitor those older kernels at 0.9765-second intervals and see accurate data. Rather than trying to explain it all here, I'll point you at http://collectl.sourceforge.net/NetworkStats.html to read more.

A couple of other features I may not have said enough about are monitoring InfiniBand and Lustre performance, for which I don't believe there are any other good tools available. You can get IB data by asking the switch, but you can't easily get it from the local system. Lustre actually provides a wealth of information, but there have been no good tools to mine it. Now there is one. With collectl you can see a second-by-second (or any other interval you prefer) snapshot of just what is happening to these key resources, and you can even watch the load on CPU, memory and network at the same time.

If you prefer, and most people do, just run collectl as a service and it will maintain a set of compressed rolling logs containing 10-second samples (all customizable), all at under 0.1% system overhead.

Enough rambling already. Download it and see for yourselves...

-mark
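If you want to try the sub-second measurement trick yourself without installing anything, the minimal C sketch below (again an illustration, not collectl's code) samples the receive-byte counter for one interface from /proc/net/dev every 0.9765 seconds and prints per-interval rates. The interface name "eth0" is just an example.

/* Illustrative sketch: sample one interface's rx-byte counter from
 * /proc/net/dev at a 0.9765s interval and print the per-interval rate,
 * sidestepping the once-a-second counter update on older kernels. */
#include <stdio.h>
#include <string.h>
#include <time.h>

static long long rx_bytes(const char *iface)
{
	FILE *fp = fopen("/proc/net/dev", "r");
	char line[512];
	long long rx = -1;

	if (!fp)
		return -1;
	while (fgets(line, sizeof(line), fp)) {
		char *colon = strchr(line, ':');
		char *name = line;

		if (!colon)
			continue;	/* the two header lines have no ':' */
		*colon = '\0';
		while (*name == ' ')	/* names are right-justified */
			name++;
		if (!strcmp(name, iface)) {
			sscanf(colon + 1, "%lld", &rx);	/* field 1: rx bytes */
			break;
		}
	}
	fclose(fp);
	return rx;
}

int main(void)
{
	struct timespec ival = { 0, 976500000 };	/* 0.9765 seconds */
	long long prev = rx_bytes("eth0"), cur;

	while (prev >= 0) {
		nanosleep(&ival, NULL);
		cur = rx_bytes("eth0");
		if (cur < 0)
			break;
		printf("%.2f KB/sec\n", (cur - prev) / 0.9765 / 1024.0);
		prev = cur;
	}
	return 0;
}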
Patch for inconsistent recording of block device statistics
The read/write statistics for sectors and merges are recorded at the time requests first enter the request queue, but the remainder of the statistics, such as the number of reads/writes, are recorded at the time the I/O completes. As a result, one cannot accurately determine the data rates read or written at the actual time the I/O is performed.

This behavior is masked with smaller queue sizes but is very real, and it was very noticeable with earlier 2.6 kernels using the cfq scheduler, which had a default queue size of 8192; there the time difference between these two sets of counters could exceed 10 seconds for large file writes and small monitoring intervals such as 1 second. In that environment one would see extremely high bursts of I/O, sometimes exceeding 500 or even 1000 MB/sec, for the first second or two, which would then drop to 0 for a long time, while the 'number of operations' counters accurately reflected what was really happening.

The attached patch fixes this problem by simply accumulating the read/write sector/merge counts in temporary variables stored in the request queue entry and, when the I/O completes, copying those values to the disk statistics block.

-mark

diff -uprN -X dontdiff ../linux-2.6.11.4/drivers/block/ll_rw_blk.c ../linux-2.6.11.4-mjs/drivers/block/ll_rw_blk.c
--- ../linux-2.6.11.4/drivers/block/ll_rw_blk.c	2005-03-15 19:09:00.0 -0500
+++ ../linux-2.6.11.4-mjs/drivers/block/ll_rw_blk.c	2005-03-22 15:43:07.0 -0500
@@ -2107,13 +2107,13 @@ void drive_stat_acct(struct request *rq,
 		return;
 
 	if (rw == READ) {
-		__disk_stat_add(rq->rq_disk, read_sectors, nr_sectors);
+		rq->read_sectors_accum += nr_sectors;
 		if (!new_io)
-			__disk_stat_inc(rq->rq_disk, read_merges);
+			rq->read_merges_accum += 1;
 	} else if (rw == WRITE) {
-		__disk_stat_add(rq->rq_disk, write_sectors, nr_sectors);
+		rq->write_sectors_accum += nr_sectors;
 		if (!new_io)
-			__disk_stat_inc(rq->rq_disk, write_merges);
+			rq->write_merges_accum += 1;
 	}
 	if (new_io) {
 		disk_round_stats(rq->rq_disk);
@@ -2487,6 +2487,11 @@ get_rq:
 	req->rq_disk = bio->bi_bdev->bd_disk;
 	req->start_time = jiffies;
+	req->write_sectors_accum=0;
+	req->write_merges_accum=0;
+	req->read_sectors_accum=0;
+	req->read_merges_accum=0;
+
 	add_request(q, req);
 out:
 	if (freereq)
@@ -2989,10 +2994,14 @@ void end_that_request_last(struct reques
 		case WRITE:
 			__disk_stat_inc(disk, writes);
 			__disk_stat_add(disk, write_ticks, duration);
+			__disk_stat_add(disk, write_sectors, req->write_sectors_accum);
+			__disk_stat_add(disk, write_merges, req->write_merges_accum);
 			break;
 		case READ:
 			__disk_stat_inc(disk, reads);
 			__disk_stat_add(disk, read_ticks, duration);
+			__disk_stat_add(disk, read_sectors, req->read_sectors_accum);
+			__disk_stat_add(disk, read_merges, req->read_merges_accum);
 			break;
 	}
 	disk_round_stats(disk);
diff -uprN -X dontdiff ../linux-2.6.11.4/include/linux/blkdev.h ../linux-2.6.11.4-mjs/include/linux/blkdev.h
--- ../linux-2.6.11.4/include/linux/blkdev.h	2005-03-15 19:09:02.0 -0500
+++ ../linux-2.6.11.4-mjs/include/linux/blkdev.h	2005-03-22 15:42:47.0 -0500
@@ -176,6 +176,12 @@ struct request {
 	 * For Power Management requests
 	 */
 	struct request_pm_state *pm;
+
+	/*
+	 * accumulate intermediate stats
+	 */
+	unsigned long read_sectors_accum, write_sectors_accum;
+	unsigned long read_merges_accum, write_merges_accum;
 };
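A quick way to observe the skew described above, for anyone who wants to reproduce it: the short standalone C program below, which is not part of the patch, samples /proc/diskstats once a second and prints completed writes next to sectors written for one disk. On an affected kernel, the sector column bursts far ahead of the completion column during a large file write. The device name "sda" and the one-second interval are just example choices.

/* Not part of the patch: watch writes-completed vs. sectors-written
 * diverge for one disk by sampling /proc/diskstats once a second.
 * Sectors are 512 bytes, so sectors/2 gives KB. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	unsigned long long pw = 0, psec = 0;
	int first = 1;

	for (;;) {
		FILE *fp = fopen("/proc/diskstats", "r");
		char line[256], name[32];
		unsigned int major, minor;
		/* after the name: reads, rd_merges, rd_sectors, rd_ticks,
		 * writes, wr_merges, wr_sectors, ... */
		unsigned long long rd, rdm, rds, rdt, wr, wrm, wrs;

		if (!fp)
			return 1;
		while (fgets(line, sizeof(line), fp)) {
			if (sscanf(line,
				   "%u %u %31s %llu %llu %llu %llu %llu %llu %llu",
				   &major, &minor, name,
				   &rd, &rdm, &rds, &rdt, &wr, &wrm, &wrs) == 10 &&
			    !strcmp(name, "sda")) {
				if (!first)
					printf("writes/sec %llu  KB written/sec %llu\n",
					       wr - pw, (wrs - psec) / 2);
				pw = wr;
				psec = wrs;
				first = 0;
			}
		}
		fclose(fp);
		sleep(1);
	}
}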
Re: Patch for inconsistent recording of block device statistics
> I don't like this patch, it adds 4 * sizeof(unsigned long) to struct
> request when it can be solved without adding anything. The idea is
> sound, though, the current way the stats are done isn't very
> interesting.

Actually, I wasn't all that excited about using the extra variables myself. However, I wasn't entirely sure what was going on, and this at least allowed me to test the concept without doing anything harmful.

> How about accounting merges the way we currently do it, since that
> piece of the stats _is_ interesting at queueing time. And then account
> completion in __end_that_request_first(). Untested patch attached.

I also agree with your suggestion about keeping the merge counts where they are, and am copying the author of iostat to suggest the man page be updated to reflect the fact that merges are counted when requests are queued rather than 'issued to the device' as it currently states.

re: your patch - I did try it on both an Opteron and a Xeon box. It worked fine on the Opteron but reported 0 for all the sectors on the Xeon. If nothing immediately jumps to mind, could it have been something I did wrong? I'll try another build after I send this along, but I don't see how that will help, since I did the first one from a brand-new source kit.

The one thing that still jumps out at me about this patch is that the sectors are being counted in one routine and the number of I/Os in another. If the best place to update the sector counts is indeed where you suggest, is there any reason not to move the update code for all the disk stats from end_that_request_last() to that same place as well, for consistency and for better assurance that they are all updated as close to the same point in time as possible?

-mark
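Since the "untested patch" itself isn't reproduced above, here is only a rough sketch of the approach being described: leave merge accounting at queueing time in drive_stat_acct() and account sectors as they complete. The helper name, its exact placement, and the 2.6.11-era interfaces used are assumptions on my part, not Jens's actual code.

/* Rough, untested sketch of the described approach, NOT the attached
 * patch: a helper meant to be called from __end_that_request_first()
 * in drivers/block/ll_rw_blk.c with the number of bytes just
 * completed, so sector counts are charged at completion time. */
static void account_io_completion(struct request *req, int nr_bytes)
{
	/* only file-system requests against a real disk are accounted */
	if (blk_fs_request(req) && req->rq_disk) {
		int rw = rq_data_dir(req);

		if (rw == READ)
			__disk_stat_add(req->rq_disk, read_sectors,
					nr_bytes >> 9);	/* bytes -> sectors */
		else if (rw == WRITE)
			__disk_stat_add(req->rq_disk, write_sectors,
					nr_bytes >> 9);
	}
}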
Process level I/O stats?
Apologies if this has been discussed recently but I couldn't find anything. As I've seen others ask over the years, have there been any newer thoughts on when and how this capability might be added?

-mark
Re: Patch for inconsistent recording of block device statistics
> > re: your patch - I did try it on both an Opteron and a Xeon box. It
> > worked fine on the Opteron but reported 0 for all the sectors on the
> > Xeon. If nothing immediately jumps to mind, could it have been
> > something I did wrong? I'll try another build after I send this
> > along, but I don't see how that will help, since I did the first one
> > from a brand-new source kit.
>
> Sounds very strange, it is generic code so should work for all.
> Different storage?

Works fine now. Obviously I screwed up something, and I just wanted to let you know it was cockpit error on my end. Is your plan to move this into some future kernel? Do you need anything more from me at this point?

-mark
announcing collectl - a new performance monitoring tool
Just a quick plug for a utility I wrote a number of years ago and have recently open sourced. It's been around as an internal tool for about 4 years and so has been pretty well shaken out. There's a pretty good description and some example output at http://collectl.sourceforge.net/index.html and you can download it at http://sourceforge.net/projects/collectl.

What I believe makes this tool different from the already large number of 'stat' and other performance monitoring tools is that its goal is to be a one-stop shop for everything: it can collect data on most system counters as well as non-standard sources such as Lustre, InfiniBand and Quadrics, to name a few. The data of your choice (the default is cpu, disk and network) is displayed horizontally for easy reading, one line per sample, limited only by how wide you want to make your window. You can display summaries, such as aggregate cpu, disk, network, slab or even Lustre traffic, OR you can report detail-level data such as individual NIC traffic, disks, cpus, and in the case of Lustre, individual OSTs! How about only reporting those slabs that changed in size between polling intervals? Or how about seeing nfsstat output on a single line?

You can even choose fractional polling intervals. Did you know network stats are only updated once a second? Run collectl with a 0.1-second polling interval and see for yourself!

Another biggie is that collectl can generate data in space-separated format so you can easily plot it using gnuplot or OpenOffice, or if you're of that persuasion you can even use Excel. And the best news of all: collectl is very lightweight, using about 0.03% of the cpu on my AMD and Xeon boxes. Naturally your mileage may vary depending on how many processes or devices are on your system, but we run it continuously on most of our systems and don't even know it's there.

There's far too much to say about it so I won't try. Install it, read the FAQ, and check out the 'extended help' and all the man pages - yes, there are multiple ones. Feel free to download it and let me know what you think. Just be aware I'm getting ready to release a new version, so if you like what you see, check back in a day or so and get a newer one.

-mark