Re: SAR and Linux File system IO counts.

Jon Doyle Wed, 12 Dec 2001 14:10:09 -0800

Right,

Funny, I was just having this same issue. I found that there is a kernel
patch you need; however, from what I have heard from some folks at SuSE,
it hurts performance, and that, along with the burden of maintaining the
patch, is why it is not in the kernel. From further investigation, it
looks like there is a way to grab the info from /proc. Andrea wrote this:

-------------

The messiest part of those stats is the index.

You see there are 4 columns of numbers per line. Of course you can have
more than 4 hard disks. This means that some of these columns will be
shared by multiple hard disks.

They are shared between IDE and SCSI and the other drivers like the DAC960
and Compaq Smart Array. If your machine is only IDE based, the first column
means hda, the second hdb, the third hdc and the fourth hdd.

If you're SCSI based, they refer to the first four SCSI disks: sda, sdb,
sdc, sdd.

Same for DAC960 and Compaq Smart Array. This is the code that converts the
major/minor pair to the stat column (aka disk_index). Only indexes >= 0
and < 4 are considered:

        switch (major) {
                case DAC960_MAJOR+0:
                        disk_index = (minor & 0x00f8) >> 3;
                        break;
                case SCSI_DISK0_MAJOR:
                case COMPAQ_SMART2_MAJOR+0:
                case COMPAQ_SMART2_MAJOR+1:
                case COMPAQ_SMART2_MAJOR+2:
                case COMPAQ_SMART2_MAJOR+3:
                case COMPAQ_SMART2_MAJOR+4:
                case COMPAQ_SMART2_MAJOR+5:
                case COMPAQ_SMART2_MAJOR+6:
                case COMPAQ_SMART2_MAJOR+7:
                        disk_index = (minor & 0x00f0) >> 4;
                        break;
                case IDE0_MAJOR:        /* same as HD_MAJOR */
                case XT_DISK_MAJOR:
                        disk_index = (minor & 0x0040) >> 6;
                        break;
                case IDE1_MAJOR:
                        disk_index = ((minor & 0x0040) >> 6) + 2;
                        break;
                default:
                        disk_index = -1;
                        break;
        }
        if (disk_index >= 0 && disk_index < 4)
                drive_stat_acct(req->cmd, req->nr_sectors, disk_index);

Not all block devices are tracked by the stats; only the major/minor
pairs above are.

For example, if you have both IDE and SCSI, the first column is the
sum of the stats of hda and sda.
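
To make the column sharing concrete, here is a rough user-space sketch of
the same mapping. The function name stat_column is made up for
illustration, and the numeric major values are what I believe the 2.4
<linux/major.h> defines, so treat them as assumptions and check your own
headers.

```c
/* Major numbers as (I believe) defined in the 2.4 <linux/major.h>.
 * These values are assumptions; verify them against your kernel. */
enum {
    IDE0_MAJOR          = 3,
    SCSI_DISK0_MAJOR    = 8,
    XT_DISK_MAJOR       = 13,
    IDE1_MAJOR          = 22,
    DAC960_MAJOR        = 48,
    COMPAQ_SMART2_MAJOR = 72
};

/* Hypothetical helper: return the /proc/stat column (0..3) that a
 * device is accounted in, or -1 if the device is not tracked. */
int stat_column(int major, int minor)
{
    int idx;

    switch (major) {
    case DAC960_MAJOR:
        idx = (minor & 0x00f8) >> 3;        /* 8 minors per disk */
        break;
    case SCSI_DISK0_MAJOR:
    case COMPAQ_SMART2_MAJOR + 0:
    case COMPAQ_SMART2_MAJOR + 1:
    case COMPAQ_SMART2_MAJOR + 2:
    case COMPAQ_SMART2_MAJOR + 3:
    case COMPAQ_SMART2_MAJOR + 4:
    case COMPAQ_SMART2_MAJOR + 5:
    case COMPAQ_SMART2_MAJOR + 6:
    case COMPAQ_SMART2_MAJOR + 7:
        idx = (minor & 0x00f0) >> 4;        /* 16 minors per disk */
        break;
    case IDE0_MAJOR:
    case XT_DISK_MAJOR:
        idx = (minor & 0x0040) >> 6;        /* 64 minors per disk */
        break;
    case IDE1_MAJOR:
        idx = ((minor & 0x0040) >> 6) + 2;  /* second IDE channel */
        break;
    default:
        idx = -1;
        break;
    }
    /* Only the first four indexes are accounted. */
    return (idx >= 0 && idx < 4) ? idx : -1;
}
```

With those values, hda (3:0) and sda (8:0) both land in column 0, hdb
(3:64) and sdb (8:16) in column 1, and a fifth SCSI disk such as sde
(8:64) is not counted at all.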

In your example below you can see that you don't have any hdd/sdd/whatever
that would generate a disk_index == 3.

>disk 2603142 686350 140320 0

`disk' means how many requests have been sent to the device driver (it
doesn't matter whether they were reads or writes). The number of requests
in general corresponds to the number of DMA transactions done in
scatter-gather (in SCSI it could mean the number of SCSI commands sent to
the hardware). Of course a SCSI command can be larger than the blocksize
(with our kernel you can do up to 64k of I/O per SCSI command). I recently
raised this limit to 512k per SCSI command in 2.4.x btw (patch is pending
in Linus's mailbox :).

>disk_rio 1208895 169703 140320 0
>disk_wio 1394247 516647 0 0

`disk_rio' is exactly like `disk' above, _but_ it only includes the
_read_ requests. `disk_wio' only includes write requests.

This means `disk_rio + disk_wio' is always equal to `disk':

1208895+1394247 == 2603142

This also shows how badly designed this interface is (`disk' is completely
redundant and it's wasting CPU time and memory).
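
If you want to check that identity from a program, something along these
lines should do. It is only a sketch: parse_disk_line and
disk_counts_consistent are names invented here, and the line format
(a keyword followed by four counters) is assumed from the lines quoted in
this thread.

```c
#include <stdio.h>
#include <string.h>

#define NCOLS 4   /* four stat columns per line, as described above */

/* Hypothetical helper: parse one "name v0 v1 v2 v3" line.
 * Returns 0 if the line carries the expected name and four counters,
 * -1 otherwise. */
int parse_disk_line(const char *line, const char *name,
                    unsigned long out[NCOLS])
{
    char key[32];

    if (sscanf(line, "%31s %lu %lu %lu %lu",
               key, &out[0], &out[1], &out[2], &out[3]) != 5)
        return -1;
    return strcmp(key, name) == 0 ? 0 : -1;
}

/* Returns 1 if rio + wio equals total in every column, 0 otherwise. */
int disk_counts_consistent(const unsigned long total[NCOLS],
                           const unsigned long rio[NCOLS],
                           const unsigned long wio[NCOLS])
{
    int i;

    for (i = 0; i < NCOLS; i++)
        if (rio[i] + wio[i] != total[i])
            return 0;
    return 1;
}
```

Fed the three lines quoted above, the check holds for all four columns,
not just the first one.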

>disk_rblk 9671004 1357024 561094 0
>disk_wblk 11153976 4133176 0 0

disk_rblk says how many _sectors_ (not number of requests) have been sent
to the hardware for reads; disk_wblk is the same for writes.

A sector means 512 bytes (the real hard sector size doesn't matter, we
always consider it 512 bytes for the kernel internals).

So on your hda+sda+whatever-with-disk_index==0 you did
(9671004+11153976)/2 kbyte of I/O == about 9.9 gigabytes of I/O since you
booted the machine. (btw your hdc/sdc is probably a CDROM because it only
did read-I/O)
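
The arithmetic is simple enough, but spelled out as a sketch it looks like
this. The helper names are invented for the example; the only fact used
is the 512-byte sector mentioned above, so two sectors make one kbyte.

```c
/* Hypothetical helper: total I/O in kbytes for one column, given the
 * read and write sector counts. Sectors are always 512 bytes here,
 * so two sectors make one kbyte. */
unsigned long long sectors_to_kbytes(unsigned long long rblk,
                                     unsigned long long wblk)
{
    return (rblk + wblk) / 2;
}

/* Convert kbytes to gigabytes, taking 1 gigabyte as 1024 * 1024 kbytes. */
double kbytes_to_gbytes(unsigned long long kbytes)
{
    return (double)kbytes / (1024.0 * 1024.0);
}
```

For the first column above, sectors_to_kbytes(9671004, 11153976) gives
10412490 kbytes, which is a bit over 9.9 gigabytes.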

One nice feature that you could provide, by snapshotting
disk_rblk/disk_wblk/disk_rio/disk_wio at regular intervals, is to report
the mean size of the requests that are being sent to the hardware, to show
how well the request coalescing/merging is working. To do that you can
simply compute:

        disk_rblk/disk_rio
        disk_wblk/disk_wio

and you'll know the mean size of the read and write I/O requests in the
last X seconds.

The bigger the number of sectors per request, the better :).
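
As a sketch of that per-interval calculation: take two snapshots of the
same column and divide the difference in sectors by the difference in
requests. The function name mean_sectors_per_request is invented for this
example, and it assumes the counters did not wrap more than once between
the two snapshots.

```c
/* Hypothetical helper: mean sectors per request over one interval,
 * from two snapshots of a block counter (disk_rblk or disk_wblk) and
 * the matching request counter (disk_rio or disk_wio).
 * Returns 0.0 if no request was issued in the interval. */
double mean_sectors_per_request(unsigned long blk_then,
                                unsigned long blk_now,
                                unsigned long io_then,
                                unsigned long io_now)
{
    /* Unsigned subtraction still gives the right delta across a
     * single counter wraparound. */
    unsigned long dblk = blk_now - blk_then;
    unsigned long dio  = io_now - io_then;

    if (dio == 0)
        return 0.0;
    return (double)dblk / (double)dio;
}
```

Taking "since boot" as the interval, the write numbers quoted earlier
(11153976 sectors in 1394247 requests) come out at exactly 8 sectors, or
4 kbytes, per request.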

Andrea

---------------

Jon R. Doyle
Sendmail Inc.
6425 Christie Ave
Emeryville, Ca. 94608

                   (o_
       (o_   (o_   //\
       (/)_  (\)_  V_/_

On Wed, 12 Dec 2001, Eric Lawler wrote:

> Hello,
>
> We have SAR running on Linux guests under VM/ESA 2.4.0. We get the CPU,
> memory and swaps statistics, but nothing for the filesystem I/O as displayed
> by the SAR IOSTAT command. The README file states that SAR will only report
> on information present in the /proc filesystem so I think we are not
> recording IO statistics. How do we turn it on, if we can or is there
> something peculiar with having the EXT2 filesystem on VM CMS minidisks ?
>
> Many thanks.
>
> Eric.
>
>
>
>
> Eric Lawler,
> Infrastructure Support (Platforms),
> Barclays Service Provision,
> Ground Floor (A7), Block 10, Radbroke Hall, WA16 9EU (M).
> Phone: 7-2000-3729 (Internal)
> +44 (0)1565 613729 (External)
> Email: [EMAIL PROTECTED]
>
>
> Internet communications are not secure and therefore the Barclays Group
> does not accept legal responsibility for the contents of this message.
> Although the Barclays Group operates anti-virus programmes, it does not
> accept responsibility for any damage whatsoever that is caused by
> viruses being passed.  Any views or opinions presented are solely those
> of the author and do not necessarily represent those of the Barclays
> Group.  Replies to this email may be monitored by the Barclays Group
> for operational or business reasons.
>
