mike ledoux wrote:
> On Tue, Jan 15, 2008 at 04:57:02PM -0500, Bruce Dawson wrote:
>   
>> It's been several years (and major kernel versions) since I've played
>> with iostat, but perhaps my statements here will goad someone with more
>> recent experience to inject more accurate truths...
>>
>>     * iostat used to "not work well" on SMP systems.
>>     
>
> That's unfortunate.  Hopefully that's been fixed. 
>
>   
>>     * your "510kB/s average write on dm-5, but only 184.01kB/s average
>>       write on the enclosing PV?" observation may be due to "write
>>       behinds", caching, and seek/latency optimization.
>>     
>
> Caching issues were my first thought, but they don't apply to the 43+
> day average numbers in play here.  I don't think any of those other
> optimizations would have such a significant effect over periods this
> large, either.  Eventually all of the data written to the LV needs
> to be written to the PV, right?
>
>   
Sigh. That's what I used to think, too, until I discovered seek/latency
optimization and write optimization in general. But I have to prefix what
I'm about to say with a caveat: I haven't read the code, so I don't
really know what's going on.

Now, with that said, consider:

    * On some file systems, when all-zero or all-one blocks are written
      (or read), the data isn't actually written; the filesystem just
      updates a pointer to a shared "ones" or "zeros" block. I don't
      think ext2 (or ext3) is smart enough to do this, but I believe
      some others are (ZFS?). To make matters even more complex, some
      drives handle this in the drive controller. (There's a quick
      sparse-file illustration of the general idea right after this
      list.)
    * With seek/latency optimization, data isn't always written to the
      disk until the physical head is near the sector. With further
      optimization, there may be several writes to the same block but
      only the last is physically written; the earlier writes to that
      block are simply dropped. This happens a lot on swap partitions
      (if Linux actually swaps anymore). You'll also see it when, for
      instance, semaphores are managed on a disk file, shared memory
      mapped files, quorum disks, ... This would account for more data
      being written to the logical volume than to the physical volume
      (a quick way to compare the raw counters is sketched after this
      list).
    * RAID volumes (especially striping) cause all sorts of confusion
      between physical and logical volumes. Mirroring is only slightly
      better, because you can *eventually* find the delayed write/read.
    * Some file systems (especially AdvFS - is that on Linux yet?) can
      do a surprising amount of activity for a simple write. I remember
      an incident (admittedly during some pathological testing) where we
      wrote a single block and it took the filesystem something like 10K
      physical I/Os to find enough metadata, inodes, ... to commit it to
      disk.
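
For that first point, a quick sparse-file experiment shows the general
idea (it's not the same mechanism - a hole comes from seeking rather
than from the filesystem detecting zeros on write - but it does show
blocks that exist logically without ever being allocated; the path is
just an example):

    # write 1MB at a 100MB offset; the first 100MB is a hole
    dd if=/dev/zero of=/tmp/sparse.img bs=1M count=1 seek=100
    ls -lh /tmp/sparse.img    # logical size: ~101M
    du -h  /tmp/sparse.img    # space actually allocated: ~1M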

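As for the logical-vs-physical numbers themselves, you can skip iostat
and compare the kernel's raw counters directly. On the 2.6 kernels I've
looked at, field 10 of /proc/diskstats is sectors written (512-byte
units); dm-5 and sda below are just placeholders for whatever LV and PV
you're actually watching:

    awk '$3 == "dm-5" || $3 == "sda" \
           { printf "%-6s %.1f MB written\n", $3, $10*512/1048576 }' \
        /proc/diskstats

Run it twice a few seconds apart and subtract; short deltas also
sidestep any counter that wrapped somewhere in those 43+ days of
uptime.
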
When testing/debugging filesystems, I always try to turn on synchronous
writes so I can tell what's going on. Unfortunately, that's counter to
performance.
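
The blunt way to get synchronous writes for a test is the sync mount
option (the mount point here is just an example); opening files with
O_SYNC in the application is the finer-grained version:

    # remount an existing filesystem with synchronous writes
    mount -o remount,sync /mnt/test

    # ...and put it back to normal afterwards
    mount -o remount,async /mnt/test
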
>>     * iostat essentially just does arithmetic on the counters kept by
>>       the kernel.
>>     * For long uptimes, counters can overflow and produce some *really
>>       strange* numbers. I would expect Linux to use 64 bit counters in
>>       recent kernels though.
>>     
>
> I'd hope there would be some trap to reset all of the counters to 0
> if one overflows, but that may just be dreaming on my part.  That
> may be what is happening, though, as the numbers look OK on 10
> second intervals.  I suppose I'll have to schedule a reboot to get
> decent numbers. *grumble*
>
>   
>> Don't you just love documentation written by developers (I'm referring
>> to the iostat man page)?
>>     
>
> I like it, but that's just me.
>
>   
Hmm. Sounds like the man page improved at some point.

But I was referring to all the information the developer doesn't put in
there because they never anticipated the program being used "for that".
As a developer, I sympathize with them; as a user, I curse them.

--Bruce
