Re: Slow Soft-RAID 5 performance

2007-07-19 Thread Rui Santos
koan wrote:
> Are you sure about that chunk size? In your initial posting you show
> /proc/mdstat reporting:
>
> "md2 : active raid5 sdc3[2] sda3[0] sdb3[1]
>  780083968 blocks level 5, 128k chunk, algorithm 2 [3/3] [UUU]"
>
> Which would seem to state a 128K chunk, and thus with a 4k block size
> you would need a stride of 32.

Hi Koan,

Yes, I'm sure. That 128K chunk was my initial setup, before the
enlightenment from http://tldp.org/HOWTO/Software-RAID-HOWTO-5.html
My reported test setup uses a 256K chunk.
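
A minimal sketch of how an array like md2 would be (re)created with that
256K chunk, using the partition names from the /proc/mdstat output quoted
above (recreating an array is destructive, so this is purely illustrative):

  # 3-disk RAID5 with a 256 KiB chunk
  mdadm --create /dev/md2 --level=5 --raid-devices=3 --chunk=256 \
        /dev/sda3 /dev/sdb3 /dev/sdc3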

>
>
>
> On 7/18/07, Rui Santos <[EMAIL PROTECTED]> wrote:
>> koan wrote:
>> > How did you create the ext3 filesystem?
>>
>> The chunk_size is at 256KB, ext3 block size is 4k. I believe the correct
>> option that should be passed through to --stride is 64.
>> Am I correct?
>>
>> I've also tested (after sending my first report) with XFS.
>> I've also increased the readahead to 65535 on all HDs.
>> I've also increased the stripe_cache_size to 16384.
>>
>> I can now get ~100MB/sec...
>>
>> >
>> > Did you use the appropriate --stride option as noted here:
>> > http://tldp.org/HOWTO/Software-RAID-HOWTO-5.html (#5.11)



Re: Slow Soft-RAID 5 performance

2007-07-18 Thread koan

Are you sure about that chunk size? In your initial posting you show
/proc/mdstat reporting:

"md2 : active raid5 sdc3[2] sda3[0] sdb3[1]
 780083968 blocks level 5, 128k chunk, algorithm 2 [3/3] [UUU]"

Which would seem to state a 128K chunk, and thus with a 4k block size
you would need a stride of 32.



On 7/18/07, Rui Santos <[EMAIL PROTECTED]> wrote:

koan wrote:
> How did you create the ext3 filesystem?

The chunk_size is at 256KB, ext3 block size is 4k. I believe the correct
option that should be passed through to --stride is 64.
Am I correct?

I've also tested (after sending my first report) with XFS.
I've also increased the readahead to 65535 on all HDs.
I've also increased the stripe_cache_size to 16384.

I can now get ~100MB/sec...

>
> Did you use the appropriate --stride option as noted here:
> http://tldp.org/HOWTO/Software-RAID-HOWTO-5.html (#5.11)





Re: Slow Soft-RAID 5 performance

2007-07-18 Thread Rui Santos


J.A. Magallón wrote:
> On Wed, 18 Jul 2007 10:56:11 +0100, Rui Santos <[EMAIL PROTECTED]> wrote:
>
>   
>> Hi,
>>
>> I'm getting a strange slow performance behavior on a recently installed
>> Server. Here are the details:
>>
>> 
> ...
>   
>> I can get a write throughput of 60 MB/sec on each HD by issuing the
>> command 'time `dd if=/dev/zero of=test.raw bs=4k count=$(( 1024 * 1024 /
>> 4 )); sync`'
>>
>> 
> ...
>   
>> The RAID device I'm testing on is /dev/md2. Now, by issuing the same
>> command 'dd if=/dev/zero of=test.raw bs=4k count=$(( 1024 * 1024 / 4 ));
>> sync`' on the raid device mount point, I get the following speeds:
>> With stripe_cache_size at default '256': 51 MB/sec
>> With stripe_cache_size at '8192': 73 MB/sec
>>
>> 
>
> I know many people consider this stupid, but can you post some hdparm -tT
> data?
>   

Of course. Here's the output:

NewServer-RD:~ # hdparm -tT /dev/md2

/dev/md2:
 Timing cached reads:   1738 MB in  2.00 seconds = 868.93 MB/sec
 Timing buffered disk reads:  444 MB in  3.01 seconds = 147.69 MB/sec


NewServer-RD:~ # hdparm --direct -tT /dev/md2

/dev/md2:
 Timing O_DIRECT cached reads:   290 MB in  2.01 seconds = 144.05 MB/sec
 Timing O_DIRECT disk reads:  396 MB in  3.01 seconds = 131.75 MB/sec

> The culprit can be the filesystem+pagecache, the md driver or the disk
> driver, so I think trying just hdparm will show if the disk or md is
> going nuts...
>
> In my case, I have a box with 2 raids, one with SCSI disks and one with
> IDE ones.
>
> Some results:
>
> lsscsi:
> [0:0:0:0]  disk    IBM       DDYS-T18350N      S96H  /dev/sda
> [2:0:0:0]  disk    SEAGATE   ST336807LW        0C01  /dev/sdb
> [2:0:1:0]  disk    SEAGATE   ST336807LW        0C01  /dev/sdc
> [2:0:2:0]  disk    SEAGATE   ST336807LW        0C01  /dev/sdd
> [2:0:3:0]  disk    SEAGATE   ST336807LW        0C01  /dev/sde
> [3:0:0:0]  disk    ATA       ST3120022A        3.06  /dev/sdf
> [3:0:1:0]  cd/dvd  HL-DT-ST  DVDRAM GSA-4040B  A300  /dev/sr0
> [4:0:0:0]  disk    ATA       ST3120022A        3.76  /dev/sdg
>
>
> /dev/md0:
> Version : 00.90.03
>   Creation Time : Mon Jun 18 13:40:57 2007
>  Raid Level : raid5
>  Array Size : 107522304 (102.54 GiB 110.10 GB)
>   Used Dev Size : 35840768 (34.18 GiB 36.70 GB)
>Raid Devices : 4
>   Total Devices : 4
> Preferred Minor : 0
> Persistence : Superblock is persistent
>
> Update Time : Wed Jul 18 13:31:22 2007
>   State : clean
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 0
>   Spare Devices : 0
>
>  Layout : left-symmetric
>  Chunk Size : 256K
>
>UUID : 51ad72a7:a4d20d15:0f3ea3a1:5ccb49a0
>  Events : 0.2
>
> Number   Major   Minor   RaidDevice State
>    0       8       17        0      active sync   /dev/sdb1
>    1       8       33        1      active sync   /dev/sdc1
>    2       8       49        2      active sync   /dev/sdd1
>    3       8       65        3      active sync   /dev/sde1
>
> That is: four SCSI disks on an Adaptec U320, doing RAID5:
>
> /dev/sdb:
>  Timing cached reads:   904 MB in  2.00 seconds = 451.84 MB/sec
>  Timing buffered disk reads:  228 MB in  3.00 seconds =  75.90 MB/sec
> /dev/sdc:
>  Timing buffered disk reads:  226 MB in  3.01 seconds =  75.01 MB/sec
> /dev/sdd:
>  Timing buffered disk reads:  228 MB in  3.00 seconds =  75.88 MB/sec
> /dev/sde:
>  Timing buffered disk reads:  226 MB in  3.00 seconds =  75.31 MB/sec
>
> /dev/md0:
>  Timing buffered disk reads:  562 MB in  3.01 seconds = 186.88 MB/sec
>
> Nearly 75x3 = 225 MB/s. And this looks like a small regression; I remember
> seeing about 200 MB/s on this setup with previous kernels.
> Performance is about 186/225 = 83%.
>
> And /dev/md1, raid0 on 2 IDE disks:
>
> /dev/sdf:
>  Timing buffered disk reads:  148 MB in  3.02 seconds =  48.93 MB/sec
> /dev/sdg:
>  Timing buffered disk reads:  124 MB in  3.00 seconds =  41.33 MB/sec
>
> /dev/md1:
>  Timing buffered disk reads:  204 MB in  3.01 seconds =  67.68 MB/sec
>
> Performance: 67 / 90 = 75%, more or less...not too good.
>
> Now that I have read the hdparm man page, perhaps it would be better to
> repeat the tests with hdparm --direct.
>
> --
> J.A. Magallon  \   Software is like sex:
>  \ It's better when it's free
> Mandriva Linux release 2008.0 (Cooker) for i586
> Linux 2.6.21-jam12 (gcc 4.2.1 20070704 (4.2.1-3mdv2008.0)) SMP PREEMPT
> 09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
>
>
>
>   
Thanks for your reply.
Rui Santos



Re: Slow Soft-RAID 5 performance

2007-07-18 Thread Rui Santos
koan wrote:
> How did you create the ext3 filesystem?

The chunk_size is at 256KB, ext3 block size is 4k. I believe the correct
option that should be passed through to --stride is 64.
Am I correct?

I've also tested (after sending my first report) with XFS.
I've also increased the readahead to 65535 on all HDs.
I've also increased the stripe_cache_size to 16384.

I can now get ~100MB/sec...
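
A minimal sketch of the arithmetic and the mkfs invocation being discussed
here, assuming the 256 KiB chunk and 4 KiB ext3 block size stated above
(newer mke2fs spells the option -E stride=, older releases use -R stride=):

  # stride = chunk size / filesystem block size = 256k / 4k = 64
  mkfs.ext3 -b 4096 -E stride=64 /dev/md2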

>
> Did you use the appropriate --stride option as noted here:
> http://tldp.org/HOWTO/Software-RAID-HOWTO-5.html (#5.11)



Re: Slow Soft-RAID 5 performance

2007-07-18 Thread J.A. Magallón
On Wed, 18 Jul 2007 10:56:11 +0100, Rui Santos <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> I'm getting a strange slow performance behavior on a recently installed
> Server. Here are the details:
> 
...
> 
> I can get a write throughput of 60 MB/sec on each HD by issuing the
> command 'time `dd if=/dev/zero of=test.raw bs=4k count=$(( 1024 * 1024 /
> 4 )); sync`'
> 
...
> 
> The RAID device I'm testing on is /dev/md2. Now, by issuing the same
> command 'dd if=/dev/zero of=test.raw bs=4k count=$(( 1024 * 1024 / 4 ));
> sync`' on the raid device mount point, I get the following speeds:
> With stripe_cache_size at default '256': 51 MB/sec
> With stripe_cache_size at '8192': 73 MB/sec
> 

I know many people consider this stupid, but can you post some hdparm -tT
data?
The culprit can be the filesystem+pagecache, the md driver or the disk
driver, so I think trying just hdparm will show if the disk or md is
going nuts...

In my case, I have a box with 2 raids, one with SCSI disks and one with
IDE ones.

Some results:

lsscsi:
[0:0:0:0]  disk    IBM       DDYS-T18350N      S96H  /dev/sda
[2:0:0:0]  disk    SEAGATE   ST336807LW        0C01  /dev/sdb
[2:0:1:0]  disk    SEAGATE   ST336807LW        0C01  /dev/sdc
[2:0:2:0]  disk    SEAGATE   ST336807LW        0C01  /dev/sdd
[2:0:3:0]  disk    SEAGATE   ST336807LW        0C01  /dev/sde
[3:0:0:0]  disk    ATA       ST3120022A        3.06  /dev/sdf
[3:0:1:0]  cd/dvd  HL-DT-ST  DVDRAM GSA-4040B  A300  /dev/sr0
[4:0:0:0]  disk    ATA       ST3120022A        3.76  /dev/sdg


/dev/md0:
Version : 00.90.03
  Creation Time : Mon Jun 18 13:40:57 2007
 Raid Level : raid5
 Array Size : 107522304 (102.54 GiB 110.10 GB)
  Used Dev Size : 35840768 (34.18 GiB 36.70 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Wed Jul 18 13:31:22 2007
  State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

 Layout : left-symmetric
 Chunk Size : 256K

   UUID : 51ad72a7:a4d20d15:0f3ea3a1:5ccb49a0
 Events : 0.2

Number   Major   Minor   RaidDevice State
   0       8       17        0      active sync   /dev/sdb1
   1       8       33        1      active sync   /dev/sdc1
   2       8       49        2      active sync   /dev/sdd1
   3       8       65        3      active sync   /dev/sde1

That is: four SCSI disks on an Adaptec U320, doing RAID5:

/dev/sdb:
 Timing cached reads:   904 MB in  2.00 seconds = 451.84 MB/sec
 Timing buffered disk reads:  228 MB in  3.00 seconds =  75.90 MB/sec
/dev/sdc:
 Timing buffered disk reads:  226 MB in  3.01 seconds =  75.01 MB/sec
/dev/sdd:
 Timing buffered disk reads:  228 MB in  3.00 seconds =  75.88 MB/sec
/dev/sde:
 Timing buffered disk reads:  226 MB in  3.00 seconds =  75.31 MB/sec

/dev/md0:
 Timing buffered disk reads:  562 MB in  3.01 seconds = 186.88 MB/sec

Nearly 75x3 = 225 MB/s. And this looks like a small regression; I remember
seeing about 200 MB/s on this setup with previous kernels.
Performance is about 186/225 = 83%.

And /dev/md1, raid0 on 2 IDE disks:

/dev/sdf:
 Timing buffered disk reads:  148 MB in  3.02 seconds =  48.93 MB/sec
/dev/sdg:
 Timing buffered disk reads:  124 MB in  3.00 seconds =  41.33 MB/sec

/dev/md1:
 Timing buffered disk reads:  204 MB in  3.01 seconds =  67.68 MB/sec

Performance: 67 / 90 = 75%, more or less...not too good.

Now that I have read the hdparm man page, perhaps it would be better to
repeat the tests with hdparm --direct.
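
Spelled out as a small sketch, using the devices from the listing above
(--direct bypasses the page cache, so it isolates md and the disks from
the filesystem/pagecache path):

  for dev in /dev/sd[b-e] /dev/md0 /dev/md1; do
      hdparm -tT "$dev"
      hdparm --direct -tT "$dev"
  done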

--
J.A. Magallon  \   Software is like sex:
 \ It's better when it's free
Mandriva Linux release 2008.0 (Cooker) for i586
Linux 2.6.21-jam12 (gcc 4.2.1 20070704 (4.2.1-3mdv2008.0)) SMP PREEMPT
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0


RE: Slow Soft-RAID 5 performance

2007-07-18 Thread koan

How did you create the ext3 filesystem?

Did you use the appropriate --stride option as noted here:
http://tldp.org/HOWTO/Software-RAID-HOWTO-5.html (#5.11)


Re: Slow Soft-RAID 5 performance

2007-07-18 Thread Justin Piszcz



On Wed, 18 Jul 2007, Rui Santos wrote:


Hi,

I'm getting a strange slow performance behavior on a recently installed
Server. Here are the details:

Server: Asus AS-TS500-E4A
Board: Asus DSBV-D (
http://uk.asus.com/products.aspx?l1=9&l2=39&l3=299&l4=0&model=1210&modelmenu=2
)
Hard Drives: 3x Seagate ST3400620AS (
http://www.seagate.com/ww/v/index.jsp?vgnextoid=8eff99f4fa74c010VgnVCM10dd04090aRCRD&locale=en-US
)
I'm using the AHCI driver, although with ata_piix, the behavior is the
same. Here's some info about the AHCI controller:



With three disks, if everything were perfect, yes, you would see 120MB/s
writes.  When I started out with 4 Raptors I was getting 164MB/s read and
write.  By default, with no optimizations, you will not get good speed.


With no optimizations and 10 Raptors I get 180-200MB/s; with
optimizations, 464MB/s write and 622MB/s read.


1. Use XFS if you want speed.
2. Use a 128k, 256k or 1MiB chunk size.
3. Use a stripe_cache_size of 8192 or 16384.
4. Use a readahead size of 65536.

These are only some of the optimizations I use.

Justin.
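
A rough sketch of applying those optimizations to the md2 array from this
thread; the device name and values are taken from the messages above, the
readahead is set with blockdev here (other interfaces exist), and the chunk
size can only be chosen when the array is created (mdadm --create --chunk=...):

  mkfs.xfs -f /dev/md2                              # 1. XFS instead of ext3
  echo 16384 > /sys/block/md2/md/stripe_cache_size  # 3. larger stripe cache
  blockdev --setra 65536 /dev/md2                   # 4. readahead, in 512-byte sectors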



Slow Soft-RAID 5 performance

2007-07-18 Thread Rui Santos
Hi,

I'm getting a strange slow performance behavior on a recently installed
Server. Here are the details:

Server: Asus AS-TS500-E4A
Board: Asus DSBV-D (
http://uk.asus.com/products.aspx?l1=9&l2=39&l3=299&l4=0&model=1210&modelmenu=2
)
Hard Drives: 3x Seagate ST3400620AS (
http://www.seagate.com/ww/v/index.jsp?vgnextoid=8eff99f4fa74c010VgnVCM10dd04090aRCRD&locale=en-US
)
I'm using the AHCI driver, although with ata_piix, the behavior is the
same. Here's some info about the AHCI controller:
   
00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA Storage
Controller AHCI (rev 09) (prog-if 01 [AHCI 1.0])
Subsystem: ASUSTeK Computer Inc. Unknown device 81dc
Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 19
I/O ports at 18c0 [size=8]
I/O ports at 1894 [size=4]
I/O ports at 1898 [size=8]
I/O ports at 1890 [size=4]
I/O ports at 18a0 [size=32]
Memory at c8000400 (32-bit, non-prefetchable) [size=1K]
Capabilities: [70] Power Management version 2
Capabilities: [a8] #12 [0010]


The Kernel boot log is attached as boot.msg

I can get a write throughput of 60 MB/sec on each HD by issuing the
command 'time `dd if=/dev/zero of=test.raw bs=4k count=$(( 1024 * 1024 /
4 )); sync`'

Until this point everything seems acceptable, IMHO. The problem starts
when I test the software-raid on all three HD's.

Configuration: output of 'sfdisk -l'

Disk /dev/sda: 48641 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sda1   *      0+     16      17-    136521   fd  Linux raid autodetect
/dev/sda2         17      82      66     530145   fd  Linux raid autodetect
/dev/sda3         83   48640   48558  390042135   fd  Linux raid autodetect
/dev/sda4          0       -       0          0    0  Empty

Disk /dev/sdb: 48641 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sdb1   *      0+     16      17-    136521   fd  Linux raid autodetect
/dev/sdb2         17      82      66     530145   fd  Linux raid autodetect
/dev/sdb3         83   48640   48558  390042135   fd  Linux raid autodetect
/dev/sdb4          0       -       0          0    0  Empty

Disk /dev/sdc: 48641 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sdc1   *      0+     16      17-    136521   fd  Linux raid autodetect
/dev/sdc2         17      82      66     530145   fd  Linux raid autodetect
/dev/sdc3         83   48640   48558  390042135   fd  Linux raid autodetect
/dev/sdc4          0       -       0          0    0  Empty

Configuration: output of 'cat /proc/mdstat'

Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [linear]
md0 : active raid1 sda1[0] sdc1[2] sdb1[1]
  136448 blocks [3/3] [UUU]

md1 : active raid5 sda2[0] sdc2[2] sdb2[1]
  1060096 blocks level 5, 128k chunk, algorithm 2 [3/3] [UUU]

md2 : active raid5 sdc3[2] sda3[0] sdb3[1]
  780083968 blocks level 5, 128k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>


The RAID device I'm testing on is /dev/md2. Now, by issuing the same
command 'dd if=/dev/zero of=test.raw bs=4k count=$(( 1024 * 1024 / 4 ));
sync`' on the raid device mount point, I get the following speeds:
With stripe_cache_size at default '256': 51 MB/sec
With stripe_cache_size at '8192': 73 MB/sec
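
For reference, the way those two runs would typically be driven, assuming
md2 is mounted at /mnt/md2 (a placeholder; the thread does not name the
actual mount point):

  cat /sys/block/md2/md/stripe_cache_size           # default is 256
  echo 8192 > /sys/block/md2/md/stripe_cache_size   # enlarge the stripe cache
  cd /mnt/md2
  dd if=/dev/zero of=test.raw bs=4k count=$(( 1024 * 1024 / 4 )); sync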


Extra notes:
- All HDs have queue_depth at '31', which means NCQ is on (see the sketch
after these notes). If I disable NCQ by setting the value to '1', the write
speed achieved is lower.
- Although I started from a fresh openSUSE 10.2 installation, I'm now
running a vanilla 2.6.22.1 kernel.
- Kernel is running with Generic-x86-64.
- Soft-RAID bitmap is disabled. If I enable it, the performance takes a
serious hit.
- The processor is an Intel Xeon Dual Core 5060 (Family 15) with
Hyper-Threading activated. If it is deactivated, the performance on this
specific test is the same.
- Filesystem is ext3
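
The NCQ toggle mentioned in the first note, as a small sketch (the
queue_depth values are the ones quoted above; '1' effectively disables NCQ):

  for d in sda sdb sdc; do
      cat /sys/block/$d/device/queue_depth          # currently 31
      echo 1  > /sys/block/$d/device/queue_depth    # disable NCQ
      echo 31 > /sys/block/$d/device/queue_depth    # restore it
  done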


Final question: shouldn't I, at least, be able to get write speeds of
120MB/sec instead of the current 73MB/sec? Is this a Soft-RAID problem,
could it be something else, or am I just missing something?

Thanks for your time,
Rui Santos

Inspecting /boot/System.map-2.6.22.1-default
Loaded 26530 symbols from /boot/System.map-2.6.22.1-default.
Symbols match kernel version 2.6.22.
No module symbols loaded - kernel modules not enabled.

klogd 1.4.1, log source = ksyslog started.
<5>Linux version 2.6.22.1-default ([EMAIL PROTECTED]) (gcc version 4.1.2 
20061115 (prerelease) (SUSE Linux)) #1 SMP Tue Jul 17 14:38:37 WEST 2007
<6>Command line: root=/dev/md2 vga=normal noresume splash=off showopts
<6>BIOS-provided physical RAM map:
<4> BIOS-e820:  - 0009cc00 (usable)
<4> BIOS-e820: 0009cc00 - 
