mike ledoux wrote:
> On Tue, Jan 15, 2008 at 04:57:02PM -0500, Bruce Dawson wrote:
>
>> It's been several years (and major kernel versions) since I've played
>> with iostat, but perhaps my statements here will goad someone with
>> more recent experience into injecting more accurate truths...
>>
>> * iostat used to "not work well" on SMP systems.
>
> That's unfortunate. Hopefully that's been fixed.
>
>> * Your "510kB/s average write on dm-5, but only 184.01kB/s average
>>   write on the enclosing PV?" observation may be due to "write
>>   behinds", caching, and seek/latency optimization.
>
> Caching was my first thought, but that doesn't apply to the 43+ day
> average numbers in play here. I don't think any of the other
> optimizations would have such a significant effect over periods this
> long, either. Eventually all of the data written to the LV needs to
> be written to the PV, right?

Sigh. That's what I used to think, until I discovered seek/latency
optimization (and optimization in general). But I have to prefix what
I'm about to say with: I haven't read the code, so I don't really know
what's going on.
Now, with that said, consider:

* On some file systems, when all-zero (or all-one) blocks are written
  (or read), the zeros and ones aren't actually written; instead, a
  pointer into a "ones" or "zeros" block list is updated. I don't think
  ext2 (or ext3) is smart enough to do this, but I believe some others
  are (ZFS?). To make matters even more complex, some drives handle
  this in the drive controller.

* With seek/latency optimization, data isn't always written to the
  disk until the physical head is near the sector. With further
  optimization, there may be several writes to the same block but only
  the last is physically written; previous writes to the same block
  are dropped. This happens a lot on swap partitions (if Linux
  actually swaps anymore). You'll also see it when, for instance,
  semaphores are managed in a disk file, with shared memory mapped
  files, with quorum disks, ... This would account for more data being
  written to the logical volume than to the physical volume.

* RAID volumes (especially striping) cause all sorts of confusion
  between physical and logical volumes. Mirroring is only slightly
  better, because you can *eventually* find the delayed write/read.

* Some file systems (especially AdvFS - is that on Linux yet?) can do
  a surprising amount of activity for a simple write. I remember an
  incident (admittedly during some pathological testing) where we
  wrote a single block and it took the filesystem something like 10K
  physical I/Os to find enough metadata, inodes, ... to commit it to
  disk.

When testing/debugging filesystems, I always try to turn on
synchronous writes so I can tell what's going on. Unfortunately,
that's counter to performance.

>> * iostat essentially just does arithmetic on the counters kept by
>>   the kernel.
>> * For long uptimes, counters can overflow and produce some *really
>>   strange* numbers. I would expect Linux to use 64-bit counters in
>>   recent kernels, though.
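The write-coalescing behavior in the second bullet above can be
sketched as a toy model (this is made-up illustration code, not actual
kernel or device-mapper code; the class and counter names are my own):

```python
# Toy write-back cache: repeated writes to the same block are coalesced,
# so the physical device sees fewer writes than the logical layer received.

class WriteBackCache:
    def __init__(self):
        self.dirty = {}           # block number -> latest pending data
        self.logical_writes = 0   # writes as seen at the logical volume
        self.physical_writes = 0  # writes that actually reach the physical volume

    def write(self, block, data):
        self.logical_writes += 1
        self.dirty[block] = data  # a later write to the same block replaces the earlier one

    def flush(self):
        # Visit dirty blocks in sorted order (loosely analogous to seek
        # optimization) and issue one physical write per block.
        for block in sorted(self.dirty):
            self.physical_writes += 1
        self.dirty.clear()

cache = WriteBackCache()
for i in range(100):
    cache.write(7, i)   # a hot block (think swap slot or on-disk semaphore)
cache.write(8, "x")
cache.flush()
print(cache.logical_writes, cache.physical_writes)  # prints: 101 2
```

One hundred and one logical writes become two physical writes, which
is the shape of the dm-5 vs. PV discrepancy described above.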
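To illustrate the counter arithmetic just quoted: a hypothetical
monitoring tool that samples a 32-bit kernel counter gets a wildly
wrong number from naive subtraction once the counter wraps, while
modular arithmetic recovers the true delta (a sketch of the general
technique; I haven't read iostat's actual code):

```python
# What happens when a 32-bit I/O counter wraps between two samples.

UINT32_MOD = 2**32

def rate(prev, curr, interval):
    """Counter units per second between two samples, tolerating one 32-bit wrap."""
    delta = (curr - prev) % UINT32_MOD  # modular subtraction handles a single wraparound
    return delta / interval

# Counter was 100 short of wrapping at the first sample, then wrapped
# and reached 50 by the second sample, 10 seconds later.
prev, curr = UINT32_MOD - 100, 50
print(curr - prev)           # prints: -4294967146  (the "really strange" number)
print(rate(prev, curr, 10))  # prints: 15.0
```

Note this only tolerates a single wrap per sampling interval; over a
43+ day averaging window a busy 32-bit counter can wrap many times,
and then no arithmetic can recover the true total.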
> I'd hope there would be some trap to reset all of the counters to 0
> if one overflows, but that may just be dreaming on my part. That may
> be what is happening, though, as the numbers look OK on 10 second
> intervals. I suppose I'll have to schedule a reboot to get decent
> numbers. *grumble*
>
>> Don't you just love documentation written by developers (I'm
>> referring to the iostat man page)?
>
> I like it, but that's just me.

Hmm. Sounds like the man page improved at some point. But I was
referring to all the information the developer doesn't put in there,
because they never anticipated the program being used "for that". As a
developer, I sympathize with them; as a user, I curse them.

--Bruce

_______________________________________________
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/