Just a quick update on where I got to regarding this.

I spent quite a few days and nights experimenting and was finally able to get
the following results (from 140 runs of bonnie++):

Block Write:
 Min     93.35 MB/s
 Avg     98.57 MB/s
 Max    103.22 MB/s

Block Read:
 Min    127.79 MB/s
 Avg    136.45 MB/s
 Max    142.27 MB/s

This is for the Xen Linux client reading and writing to a reiserfs
file-system on top of LVM on top of SW RAID1 across two iSCSI-exported
drives, each of which is an LV on top of SW RAID0 on its target.  One
target is on the Xen host machine; the other is reached over the real network.
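
For anyone trying to picture the stack, it was assembled roughly along
these lines.  This is only a sketch - the device names, sizes, chunk
values, IP address and IET target name here are illustrative, not my
exact commands or config:

  ## On each target: SW RAID0 across the local disks, LVM on top,
  ## then export the LV with IET.
  mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=128 /dev/sdc /dev/sdd
  pvcreate /dev/md0
  vgcreate vg_md0 /dev/md0
  lvcreate -L 100G -n iscsi_test vg_md0
  ## /etc/ietd.conf on the target, roughly:
  ##   Target iqn.2008-11.local:iscsi_test
  ##       Lun 0 Path=/dev/vg_md0/iscsi_test,Type=blockio

  ## On the Xen client (initiator): log in to both targets, RAID1 the
  ## two imported disks, then LVM and reiserfs on top.
  iscsiadm -m discovery -t sendtargets -p 192.168.0.1
  iscsiadm -m node --login
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
  pvcreate /dev/md0
  vgcreate vg_md0 /dev/md0
  lvcreate -L 100G -n iscsi_test vg_md0
  mkreiserfs /dev/vg_md0/iscsi_test
  mount /dev/vg_md0/iscsi_test /mnt/iscsi_test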

After reading the AoE alignment paper referenced from this mailing
list, I was a little confused as to where the realignment takes place.
I've assumed that you change these values on the exported drive and
not on the source drive.  I did experiment with changing the heads/sectors
values on the exported drives.  When I did this, creating a
partition with fdisk gave significantly slower writes and reads.  If
I changed the heads/sectors values but didn't create a partition, and
instead had SW RAID use the whole disk, I saw a 17% improvement in
write speed as measured by the RAID rebuild speed reported in /proc/mdstat.
The file-system tests with dd also showed a 13% improvement, but the raw
LV read test suffered a 22% decrease.
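
To be concrete about what I mean by "changing the heads/sectors value",
it is along these lines.  The 64/32 geometry below is only a placeholder
(the real values should come from the alignment paper), and the device
names match my client:

  ## Partitioning with an overridden geometry - this is the variant that
  ## gave me slower reads and writes:
  fdisk -H 64 -S 32 /dev/sdb
  ## versus skipping the partition table entirely and handing the whole
  ## disk to md - this is the variant that sped up writes:
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb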

The read and write values are the averages from 10 runs of dd, whereas
the RAID1 check/rebuild value is the average reported by /proc/mdstat during
one rebuild session.  This is all from the point of view of the Xen client
only.
                                        Without      With alignment
                                        alignment    on exported target
Read    /dev/sda                        143 MB/s     145 MB/s      1.38 % increase
Read    /dev/sdb                        103 MB/s     107 MB/s      3.74 % increase
Read    /dev/md0                        106 MB/s     106 MB/s      no change
Read    /dev/vg_md0/iscsi_test          145 MB/s     113 MB/s     22.07 % decrease
Write   /mnt/iscsi_test/iscsi_test.raw  106 MB/s     106 MB/s      no change
Read    /mnt/iscsi_test/iscsi_test.raw   92.9 MB/s   107 MB/s     13.18 % increase
RAID1 check/rebuild                      63 MB/s      76 MB/s     17.11 % increase
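
For reference, the dd runs above were along these lines.  The block size,
count and cache-drop step are just what I'd typically use, not a claim
about the one true way to benchmark:

  ## Raw-device read of a given layer (repeat ~10 times and average):
  dd if=/dev/vg_md0/iscsi_test of=/dev/null bs=1M count=4096
  ## File-system write, then read, through the mounted reiserfs;
  ## dropping caches between runs keeps the read number honest:
  dd if=/dev/zero of=/mnt/iscsi_test/iscsi_test.raw bs=1M count=4096 conv=fdatasync
  echo 3 > /proc/sys/vm/drop_caches
  dd if=/mnt/iscsi_test/iscsi_test.raw of=/dev/null bs=1M
  ## The RAID1 check/rebuild speed comes straight out of the kernel:
  cat /proc/mdstat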

Some observations:
* When running iSCSI across a Xen virtual bridged network you can get
almost raw disk speed.  (sda is exported from the Xen host, and
147 MB/s is the raw speed for the same test run within the host on the
raw disk.)
* Although some layers might benchmark low (e.g. md0 = 106 MB/s), the actual
usable file-system speed can be quite reasonable and faster than
expected (e.g. iscsi_test.raw = 107 MB/s).
* Therefore, it seems fairly pointless agonising over the speeds of the
individual layers - in the end only the file-system layer counts.
* By extension, you only need to optimise a couple of key layers to
improve the results, not all of them.  For example, I tried
blockdev --setra 65536 on every layer on both target and initiator and
the speeds dropped through the floor.  Exactly which layers should
have their read-ahead raised is yet to be determined.  At the moment it
appears that on the target the raw disks need it, as do the
iSCSI-exported disks on the initiator.  Read-ahead on md0 also seems
to be of some help, but it is a bit iffy - depending on the value it can
either drastically improve the speed or crash it through the floor.
The LVM auto read-ahead may also give better results than a blanket
blockdev in some cases (see the read-ahead sketch after this list).
* Changing the heads/sectors values for the exported disk does
improve speed.  Just don't create a partition on it, as that kills the
speed once the geometry change has been made - use the complete disk, at
least for SW RAID.
* On writes the network is 99% utilised (as per nettop), but on reads,
even at the speeds above, the network is only ~80% utilised.  I'm not
sure why there is such a big difference.  Maybe I have to change the target
from IET to something else, or maybe I have to tweak the network
optimisation settings.  But why it is better in one direction than the
other, given that most things are equal, is a mystery.  I'm now also
looking at what Xen interactions there might be.
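
The read-ahead sketch mentioned above is the kind of thing I mean.
Which layers to touch is exactly the open question, so treat the device
list and the md0 value here as guesses rather than a recipe (target-side
disk names are illustrative):

  ## Read-ahead is set in 512-byte sectors; 65536 = 32 MB.
  blockdev --getra /dev/sda                     # check the current value first
  ## Target side: the raw disks under the RAID0 seem to need it:
  blockdev --setra 65536 /dev/sdc /dev/sdd
  ## Initiator side: the iSCSI-imported disks:
  blockdev --setra 65536 /dev/sda /dev/sdb
  ## md0 is the iffy one - try smaller values first:
  blockdev --setra 16384 /dev/md0
  ## Or let LVM manage read-ahead per LV instead of a blanket setting:
  lvchange --readahead auto /dev/vg_md0/iscsi_test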

Areas for further effort:
* The raw disks have 512-byte sectors and use either 4K or 128K SW
RAID0 chunks on the targets***.  The Xen client RAID1 of the exported
disks uses a chunk size of 32K.  It would be interesting to see
what aligning all these chunk values would do for performance, or even
aligning the disk chunk values with the network MTU.
* Use LVM dm-mirror instead of the SW RAID1 code to see how it
performs on the Xen client across the iSCSI-exported drives.
* Try to work out which blockdev values to use on the different layers -
or which layers to optimise and which to leave alone - to get the best
file-system speeds.  Early tests have produced reads of 110 MB/s or better
in some situations, but the results are not yet consistent.
* Try to optimise the network further to get the full 117 MB/s
for a /dev/sdb read.  I can reach this speed using dd > netcat >
netcat > dd (see the sketch after this list), so it should be possible
with iSCSI, as the PCI bus doesn't seem to be the limiting factor.
* Work out why there is such a big difference in network utilisation
between read and write, and look into what effect Xen might have on this.
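
The dd > netcat > netcat > dd comparison was roughly the following.
The port number and receiver IP are arbitrary, and netcat option syntax
varies between versions (some want "nc -l 5000" instead):

  ## On the receiving box:
  nc -l -p 5000 | dd of=/dev/null bs=1M
  ## On the box with the disk:
  dd if=/dev/sdb bs=1M | nc 192.168.0.2 5000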

Using RAID1 on exported iSCSI disks looks like a reasonable
solution for redundancy, at least.  I've already encountered a situation
where an eSATA cable was dislodged from the Xen host and the client kept
running uninterrupted on the iSCSI disk exported across the real network,
without any noticeable effects.  In fact I didn't notice for a couple
of days.  Re-adding the dropped disk was easy and trouble-free, and almost
as fast as it would have been with a raw physical disk.  I might even look
at this as a solution for backups in the future (have RAID1 across 3 or
more exported disks, and to take a backup, break the mirror).
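
For completeness, both the recovery from the dislodged cable and the
break-the-mirror backup idea come down to standard mdadm operations.
A sketch only - the third leg /dev/sdc is hypothetical, and the first
device name matches my client:

  ## Once the dropped leg is reachable again, put it back in the mirror
  ## and let md resync it; /proc/mdstat shows the rebuild speed:
  mdadm /dev/md0 --re-add /dev/sdb
  cat /proc/mdstat

  ## Break-the-mirror backup: fail and remove one leg, back it up, re-add:
  mdadm /dev/md0 --fail /dev/sdc --remove /dev/sdc
  ## ... take the backup from /dev/sdc ...
  mdadm /dev/md0 --add /dev/sdc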

Thanks for people's advice and suggestions.

Adrian.

***Why the difference in SW RAID0 chunk sizes between targets?  I've
got a bug in my Xen kernel where, if I use SW RAID0 chunk sizes smaller
than 128K on the Xen host, I get file-system corruption on the Xen
clients.  I'm not sure why, as I haven't yet found anyone willing to help
me drill down into it.  Ext2/3 is worse than reiserfs: reiserfs
has survived for ~6 months, whereas ext3 will kill everything in
seconds.***

On 11/22/08, Bart Van Assche <[EMAIL PROTECTED]> wrote:
> On Sat, Nov 22, 2008 at 10:58 AM, Tracy Reed <[EMAIL PROTECTED]> wrote:
>> On Sat, Nov 22, 2008 at 09:44:09AM +0100, Bart Van Assche spake thusly:
>>> Maybe not the advice you are looking for, but did you already have a
>>> look at the SCST iSCSI target implementation ? It's faster than IET
>>> and better maintained. There is also a section about how to tune
>>
>> Does it do MC/S or Error Recovery Level 2 or 3?
>
> Not that I know of. Would you like to see these features implemented in SCST
> ?
>
> (CC'd Vladislav Bolkhovitin, SCST maintainer)
>
> Bart.
>
