Re: Linux Software RAID 5 Performance Optimizations: 2.6.19.1: (211MB/s read & 195MB/s write)

2007-01-12 Thread Al Boldi
Justin Piszcz wrote:
> On Sat, 13 Jan 2007, Al Boldi wrote:
> > Justin Piszcz wrote:
> > > Btw, max_sectors_kb did improve my performance a little bit, but
> > > stripe_cache+read_ahead were the main optimizations that made
> > > everything go faster, by about 1.5x.  I have individual bonnie++
> > > benchmarks of [only] the max_sectors_kb tests as well; they improved
> > > the times from 8 min/bonnie run to 7 min 11 seconds or so.  See below,
> > > and after that is what you requested.
> >
> > Can you repeat with /dev/sda only?
>
> For sda (it's only a 74GB raptor), but ok.

Do you get the same results for the 150GB-raptor on sd{e,g,i,k}?
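
For reference, a sketch of the same sequential-read test run against each member disk individually (assuming the sd{e,g,i,k} raid5 members and the same dd parameters used above):

# for d in sde sdg sdi sdk; do echo 3 > /proc/sys/vm/drop_caches; dd if=/dev/"$d" of=/dev/null bs=1M count=10240; done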

> # uptime
>  16:25:38 up 1 min,  3 users,  load average: 0.23, 0.14, 0.05
> # cat /sys/block/sda/queue/max_sectors_kb
> 512
> # echo 3 > /proc/sys/vm/drop_caches
> # dd if=/dev/sda of=/dev/null bs=1M count=10240
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 150.891 seconds, 71.2 MB/s
> # echo 192 > /sys/block/sda/queue/max_sectors_kb
> # echo 3 > /proc/sys/vm/drop_caches
> # dd if=/dev/sda of=/dev/null bs=1M count=10240
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 150.192 seconds, 71.5 MB/s
> # echo 128 > /sys/block/sda/queue/max_sectors_kb
> # echo 3 > /proc/sys/vm/drop_caches
> # dd if=/dev/sda of=/dev/null bs=1M count=10240
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 150.15 seconds, 71.5 MB/s
>
>
> Does this show anything useful?

Probably a latency issue.  md is highly latency-sensitive.

What CPU type/speed do you have?  Bootlog/dmesg?
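
(For reference, one way to capture that information, as a sketch using standard tools only:)

$ grep 'model name' /proc/cpuinfo
$ dmesg > boot-dmesg.txt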


Thanks!

--
Al



Re: Linux Software RAID 5 Performance Optimizations: 2.6.19.1: (211MB/s read & 195MB/s write)

2007-01-12 Thread Justin Piszcz


On Sat, 13 Jan 2007, Al Boldi wrote:

> Justin Piszcz wrote:
> > Btw, max_sectors_kb did improve my performance a little bit, but
> > stripe_cache+read_ahead were the main optimizations that made everything
> > go faster, by about 1.5x.  I have individual bonnie++ benchmarks of
> > [only] the max_sectors_kb tests as well; they improved the times from
> > 8 min/bonnie run to 7 min 11 seconds or so.  See below, and after that
> > is what you requested.
> >
> > # echo 3 > /proc/sys/vm/drop_caches
> > # dd if=/dev/md3 of=/dev/null bs=1M count=10240
> > 10240+0 records in
> > 10240+0 records out
> > 10737418240 bytes (11 GB) copied, 399.352 seconds, 26.9 MB/s
> > # for i in sde sdg sdi sdk; do   echo 192 >
> > /sys/block/"$i"/queue/max_sectors_kb;   echo "Set
> > /sys/block/"$i"/queue/max_sectors_kb to 192kb"; done
> > Set /sys/block/sde/queue/max_sectors_kb to 192kb
> > Set /sys/block/sdg/queue/max_sectors_kb to 192kb
> > Set /sys/block/sdi/queue/max_sectors_kb to 192kb
> > Set /sys/block/sdk/queue/max_sectors_kb to 192kb
> > # echo 3 > /proc/sys/vm/drop_caches
> > # dd if=/dev/md3 of=/dev/null bs=1M count=10240
> > 10240+0 records in
> > 10240+0 records out
> > 10737418240 bytes (11 GB) copied, 398.069 seconds, 27.0 MB/s
> >
> > Awful performance with your suggested numbers/drop_caches settings!
> 
> Can you repeat with /dev/sda only?
> 
> With fresh reboot to shell, then:
> $ cat /sys/block/sda/queue/max_sectors_kb
> $ echo 3 > /proc/sys/vm/drop_caches
> $ dd if=/dev/sda of=/dev/null bs=1M count=10240
> 
> $ echo 192 > /sys/block/sda/queue/max_sectors_kb
> $ echo 3 > /proc/sys/vm/drop_caches
> $ dd if=/dev/sda of=/dev/null bs=1M count=10240
> 
> $ echo 128 > /sys/block/sda/queue/max_sectors_kb
> $ echo 3 > /proc/sys/vm/drop_caches
> $ dd if=/dev/sda of=/dev/null bs=1M count=10240
> 
> > What were your tests designed to show?
> 
> A problem with the block-io.
> 
> 
> Thanks!
> 
> --
> Al
> 

Here you go:

For sda (it's only a 74GB raptor), but ok.

# uptime
 16:25:38 up 1 min,  3 users,  load average: 0.23, 0.14, 0.05
# cat /sys/block/sda/queue/max_sectors_kb
512
# echo 3 > /proc/sys/vm/drop_caches
# dd if=/dev/sda of=/dev/null bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 150.891 seconds, 71.2 MB/s
# echo 192 > /sys/block/sda/queue/max_sectors_kb
# echo 3 > /proc/sys/vm/drop_caches
# dd if=/dev/sda of=/dev/null bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 150.192 seconds, 71.5 MB/s
# echo 128 > /sys/block/sda/queue/max_sectors_kb
# echo 3 > /proc/sys/vm/drop_caches
# dd if=/dev/sda of=/dev/null bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 150.15 seconds, 71.5 MB/s


Does this show anything useful?


Justin.


Re: Linux Software RAID 5 Performance Optimizations: 2.6.19.1: (211MB/s read & 195MB/s write)

2007-01-12 Thread Al Boldi
Justin Piszcz wrote:
> Btw, max_sectors_kb did improve my performance a little bit, but
> stripe_cache+read_ahead were the main optimizations that made everything
> go faster, by about 1.5x.  I have individual bonnie++ benchmarks of
> [only] the max_sectors_kb tests as well; they improved the times from
> 8 min/bonnie run to 7 min 11 seconds or so.  See below, and after that is
> what you requested.
>
> # echo 3 > /proc/sys/vm/drop_caches
> # dd if=/dev/md3 of=/dev/null bs=1M count=10240
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 399.352 seconds, 26.9 MB/s
> # for i in sde sdg sdi sdk; do   echo 192 >
> /sys/block/"$i"/queue/max_sectors_kb;   echo "Set
> /sys/block/"$i"/queue/max_sectors_kb to 192kb"; done
> Set /sys/block/sde/queue/max_sectors_kb to 192kb
> Set /sys/block/sdg/queue/max_sectors_kb to 192kb
> Set /sys/block/sdi/queue/max_sectors_kb to 192kb
> Set /sys/block/sdk/queue/max_sectors_kb to 192kb
> # echo 3 > /proc/sys/vm/drop_caches
> # dd if=/dev/md3 of=/dev/null bs=1M count=10240
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 398.069 seconds, 27.0 MB/s
>
> Awful performance with your suggested numbers/drop_caches settings!

Can you repeat with /dev/sda only?

With fresh reboot to shell, then:
$ cat /sys/block/sda/queue/max_sectors_kb
$ echo 3 > /proc/sys/vm/drop_caches
$ dd if=/dev/sda of=/dev/null bs=1M count=10240

$ echo 192 > /sys/block/sda/queue/max_sectors_kb
$ echo 3 > /proc/sys/vm/drop_caches
$ dd if=/dev/sda of=/dev/null bs=1M count=10240

$ echo 128 > /sys/block/sda/queue/max_sectors_kb
$ echo 3 > /proc/sys/vm/drop_caches
$ dd if=/dev/sda of=/dev/null bs=1M count=10240
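
A sketch that sweeps several max_sectors_kb values in one pass (same dd parameters as above; the list of values is only an example):

$ for kb in 512 192 128 64; do echo "$kb" > /sys/block/sda/queue/max_sectors_kb; echo 3 > /proc/sys/vm/drop_caches; echo "max_sectors_kb=$kb"; dd if=/dev/sda of=/dev/null bs=1M count=10240; done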

> What were your tests designed to show?

A problem with the block-io.


Thanks!

--
Al



Re: Linux Software RAID 5 Performance Optimizations: 2.6.19.1: (211MB/s read & 195MB/s write)

2007-01-12 Thread Bill Davidsen

Justin Piszcz wrote:

# echo 3 > /proc/sys/vm/drop_caches
# dd if=/dev/md3 of=/dev/null bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 399.352 seconds, 26.9 MB/s
# for i in sde sdg sdi sdk; do   echo 192 > 
/sys/block/"$i"/queue/max_sectors_kb;   echo "Set 
/sys/block/"$i"/queue/max_sectors_kb to 192kb"; done

Set /sys/block/sde/queue/max_sectors_kb to 192kb
Set /sys/block/sdg/queue/max_sectors_kb to 192kb
Set /sys/block/sdi/queue/max_sectors_kb to 192kb
Set /sys/block/sdk/queue/max_sectors_kb to 192kb
# echo 3 > /proc/sys/vm/drop_caches
# dd if=/dev/md3 of=/dev/null bs=1M count=10240 
10240+0 records in

10240+0 records out
10737418240 bytes (11 GB) copied, 398.069 seconds, 27.0 MB/s

Awful performance with your suggested numbers/drop_caches settings!

What were your tests designed to show?
  
To start, I expect them to show a change in write, not read... and IIRC (I 
didn't look it up) drop_caches just flushes the caches so you start with 
known memory contents: none.
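
(For reference, the usual pattern, as a sketch; the value written selects what gets dropped: 1 = pagecache, 2 = dentries and inodes, 3 = both.  drop_caches only frees clean cache, so a sync first makes the starting state more predictable.)

# sync
# echo 3 > /proc/sys/vm/drop_caches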


Justin.

On Fri, 12 Jan 2007, Justin Piszcz wrote:

  

On Fri, 12 Jan 2007, Al Boldi wrote:



Justin Piszcz wrote:
  

RAID 5 TWEAKED: 1:06.41 elapsed @ 60% CPU

This should be 1:14, not 1:06 (the 1:06 was with a similarly sized file, but not
the same one); the 1:14 is for the same file used with the other benchmarks.  To
get that I used 256mb read-ahead and a 16384 stripe size, plus 128
max_sectors_kb (the same size as my sw raid5 chunk size).

max_sectors_kb is probably your key. On my system I get twice the read 
performance by just reducing max_sectors_kb from default 512 to 192.


Can you do a fresh reboot to shell and then:
$ cat /sys/block/hda/queue/*
$ cat /proc/meminfo
$ echo 3 > /proc/sys/vm/drop_caches
$ dd if=/dev/hda of=/dev/null bs=1M count=10240
$ echo 192 > /sys/block/hda/queue/max_sectors_kb
$ echo 3 > /proc/sys/vm/drop_caches
$ dd if=/dev/hda of=/dev/null bs=1M count=10240

  


--
bill davidsen <[EMAIL PROTECTED]>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: raid5 software vs hardware: parity calculations?

2007-01-12 Thread James Ralston
On 2007-01-12 at 09:39-08 dean gaudet <[EMAIL PROTECTED]> wrote:

> On Thu, 11 Jan 2007, James Ralston wrote:
> 
> > I'm having a discussion with a coworker concerning the cost of
> > md's raid5 implementation versus hardware raid5 implementations.
> > 
> > Specifically, he states:
> > 
> > > The performance [of raid5 in hardware] is so much better with
> > > the write-back caching on the card and the offload of the
> > > parity, it seems to me that the minor increase in work of having
> > > to upgrade the firmware if there's a buggy one is a highly
> > > acceptable trade-off to the increased performance.  The md
> > > driver still commits you to longer run queues since IO calls to
> > > disk, parity calculator and the subsequent kflushd operations
> > > are non-interruptible in the CPU.  A RAID card with write-back
> > > cache releases the IO operation virtually instantaneously.
> > 
> > It would seem that his comments have merit, as there appears to be
> > work underway to move stripe operations outside of the spinlock:
> > 
> > http://lwn.net/Articles/184102/
> > 
> > What I'm curious about is this: for real-world situations, how
> > much does this matter?  In other words, how hard do you have to
> > push md raid5 before doing dedicated hardware raid5 becomes a real
> > win?
> 
> Hardware with a battery-backed write cache is going to beat the
> software at small-write traffic latency essentially all the time, but
> it's got nothing to do with the parity computation.

I'm not convinced that's true.  What my coworker is arguing is that the md
raid5 code holds a spinlock while it is performing this sequence of
operations:

1.  executing the write
2.  reading the blocks necessary for recalculating the parity
3.  recalculating the parity
4.  updating the parity block

My [admittedly cursory] read of the code, coupled with the link above,
leads me to believe that my coworker is correct, which is why I was
trolling for [informed] opinions about how much of a performance hit the
spinlock causes.
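
As a toy illustration of step 3 (a sketch in bash arithmetic with made-up byte values, not the md code): for a read-modify-write, the new parity is just the old parity XORed with the old and new data.

$ d_old=0x3c d_new=0x5a p_old=0x77
$ p_new=$(( p_old ^ d_old ^ d_new ))
$ printf 'new parity byte: 0x%02x\n' "$p_new"
new parity byte: 0x11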



Re: Linux Software RAID 5 Performance Optimizations: 2.6.19.1: (211MB/s read & 195MB/s write)

2007-01-12 Thread Justin Piszcz
Btw, max_sectors_kb did improve my performance a little bit, but 
stripe_cache+read_ahead were the main optimizations that made everything 
go faster, by about 1.5x.  I have individual bonnie++ benchmarks of 
[only] the max_sectors_kb tests as well; they improved the times from
8 min/bonnie run to 7 min 11 seconds or so.  See below, and after that is
what you requested.

# Options used:
# blockdev --setra 1536 /dev/md3 (back to default)
# cat /sys/block/sd{e,g,i,k}/queue/max_sectors_kb
# value: 512
# value: 512
# value: 512
# value: 512
# Test with the chunk size of the raid array (128)
# echo 128 > /sys/block/sde/queue/max_sectors_kb
# echo 128 > /sys/block/sdg/queue/max_sectors_kb
# echo 128 > /sys/block/sdi/queue/max_sectors_kb
# echo 128 > /sys/block/sdk/queue/max_sectors_kb

max_sectors_kb128_run1:max_sectors_kb128_run1,4000M,46522,98,109829,19,42776,12,46527,97,86206,14,647.7,1,16:10:16/64,874,9,29123,97,2778,16,852,9,25399,86,1396,10
max_sectors_kb128_run2:max_sectors_kb128_run2,4000M,44037,99,107971,19,42420,12,46385,97,85773,14,628.8,1,16:10:16/64,981,10,23006,77,3185,19,848,9,27891,94,1737,13
max_sectors_kb128_run3:max_sectors_kb128_run3,4000M,46501,98,108313,19,42558,12,46314,97,87697,15,617.0,1,16:10:16/64,864,9,29795,99,2744,16,897,9,29021,98,1439,10
max_sectors_kb128_run4:max_sectors_kb128_run4,4000M,40750,98,108959,19,42519,12,45027,97,86484,14,637.0,1,16:10:16/64,929,10,29641,98,2476,14,883,9,29529,99,1867,13
max_sectors_kb128_run5:max_sectors_kb128_run5,4000M,46664,98,108387,19,42801,12,46423,97,87379,14,642.5,0,16:10:16/64,925,10,29756,99,2759,16,915,10,28694,97,1215,8

162.54user 43.96system 7:12.02elapsed 47%CPU (0avgtext+0avgdata 
0maxresident)k
0inputs+0outputs (5major+1104minor)pagefaults 0swaps
168.75user 43.51system 7:14.49elapsed 48%CPU (0avgtext+0avgdata 
0maxresident)k
0inputs+0outputs (13major+1092minor)pagefaults 0swaps
162.76user 44.18system 7:12.26elapsed 47%CPU (0avgtext+0avgdata 
0maxresident)k
0inputs+0outputs (13major+1096minor)pagefaults 0swaps
178.91user 43.39system 7:24.39elapsed 50%CPU (0avgtext+0avgdata 
0maxresident)k
0inputs+0outputs (13major+1094minor)pagefaults 0swaps
162.45user 43.86system 7:11.26elapsed 47%CPU (0avgtext+0avgdata 
0maxresident)k
0inputs+0outputs (13major+1092minor)pagefaults 0swaps

---

# cat /sys/block/sd[abcdefghijk]/queue/*
cat: /sys/block/sda/queue/iosched: Is a directory
32767
512
128
128
noop [anticipatory] 
cat: /sys/block/sdb/queue/iosched: Is a directory
32767
512
128
128
noop [anticipatory] 
cat: /sys/block/sdc/queue/iosched: Is a directory
32767
128
128
128
noop [anticipatory] 
cat: /sys/block/sdd/queue/iosched: Is a directory
32767
128
128
128
noop [anticipatory] 
cat: /sys/block/sde/queue/iosched: Is a directory
32767
128
128
128
noop [anticipatory] 
cat: /sys/block/sdf/queue/iosched: Is a directory
32767
128
128
128
noop [anticipatory] 
cat: /sys/block/sdg/queue/iosched: Is a directory
32767
128
128
128
noop [anticipatory] 
cat: /sys/block/sdh/queue/iosched: Is a directory
32767
128
128
128
noop [anticipatory] 
cat: /sys/block/sdi/queue/iosched: Is a directory
32767
128
128
128
noop [anticipatory] 
cat: /sys/block/sdj/queue/iosched: Is a directory
32767
128
128
128
noop [anticipatory] 
cat: /sys/block/sdk/queue/iosched: Is a directory
32767
128
128
128
noop [anticipatory] 
# 

(Note: I am only using four of these drives, the raptors, in raid5 for md3.)

# cat /proc/meminfo
MemTotal:  2048904 kB
MemFree:   1299980 kB
Buffers:  1408 kB
Cached:  58032 kB
SwapCached:  0 kB
Active:  65012 kB
Inactive:33796 kB
HighTotal: 1153312 kB
HighFree:  1061792 kB
LowTotal:   895592 kB
LowFree:238188 kB
SwapTotal: 2200760 kB
SwapFree:  2200760 kB
Dirty:   8 kB
Writeback:   0 kB
AnonPages:   39332 kB
Mapped:  20248 kB
Slab:37116 kB
SReclaimable:10580 kB
SUnreclaim:  26536 kB
PageTables:   1284 kB
NFS_Unstable:0 kB
Bounce:  0 kB
CommitLimit:   3225212 kB
Committed_AS:   111056 kB
VmallocTotal:   114680 kB
VmallocUsed:  3828 kB
VmallocChunk:   110644 kB
# 

# echo 3 > /proc/sys/vm/drop_caches
# dd if=/dev/md3 of=/dev/null bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 399.352 seconds, 26.9 MB/s
# for i in sde sdg sdi sdk; do   echo 192 > 
/sys/block/"$i"/queue/max_sectors_kb;   echo "Set 
/sys/block/"$i"/queue/max_sectors_kb to 192kb"; done
Set /sys/block/sde/queue/max_sectors_kb to 192kb
Set /sys/block/sdg/queue/max_sectors_kb to 192kb
Set /sys/block/sdi/queue/max_sectors_kb to 192kb
Set /sys/block/sdk/queue/max_sectors_kb to 192kb
# echo 3 > /proc/sys/vm/drop_caches
# dd if=/dev/md3 of=/dev/null bs=1M count=10240 
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 398.069 seconds, 27.0 MB/s

Awful performance with your suggested numbers/drop_caches settings!

What were your tests designed to show?


Justin.

On Fri, 1

Re: Linux Software RAID 5 Performance Optimizations: 2.6.19.1: (211MB/s read & 195MB/s write)

2007-01-12 Thread Justin Piszcz


On Fri, 12 Jan 2007, Al Boldi wrote:

> Justin Piszcz wrote:
> > RAID 5 TWEAKED: 1:06.41 elapsed @ 60% CPU
> >
> > This should be 1:14, not 1:06 (the 1:06 was with a similarly sized file, but
> > not the same one); the 1:14 is for the same file used with the other
> > benchmarks.  To get that I used 256mb read-ahead and a 16384 stripe size,
> > plus 128 max_sectors_kb (the same size as my sw raid5 chunk size).
> 
> max_sectors_kb is probably your key. On my system I get twice the read 
> performance by just reducing max_sectors_kb from default 512 to 192.
> 
> Can you do a fresh reboot to shell and then:
> $ cat /sys/block/hda/queue/*
> $ cat /proc/meminfo
> $ echo 3 > /proc/sys/vm/drop_caches
> $ dd if=/dev/hda of=/dev/null bs=1M count=10240
> $ echo 192 > /sys/block/hda/queue/max_sectors_kb
> $ echo 3 > /proc/sys/vm/drop_caches
> $ dd if=/dev/hda of=/dev/null bs=1M count=10240
> 
> 
> Thanks!
> 
> --
> Al
> 

Ok. sec


Re: Linux Software RAID 5 Performance Optimizations: 2.6.19.1: (211MB/s read & 195MB/s write)

2007-01-12 Thread Al Boldi
Justin Piszcz wrote:
> RAID 5 TWEAKED: 1:06.41 elapsed @ 60% CPU
>
> This should be 1:14, not 1:06 (the 1:06 was with a similarly sized file, but
> not the same one); the 1:14 is for the same file used with the other
> benchmarks.  To get that I used 256mb read-ahead and a 16384 stripe size,
> plus 128 max_sectors_kb (the same size as my sw raid5 chunk size).

max_sectors_kb is probably your key. On my system I get twice the read 
performance by just reducing max_sectors_kb from default 512 to 192.

Can you do a fresh reboot to shell and then:
$ cat /sys/block/hda/queue/*
$ cat /proc/meminfo
$ echo 3 > /proc/sys/vm/drop_caches
$ dd if=/dev/hda of=/dev/null bs=1M count=10240
$ echo 192 > /sys/block/hda/queue/max_sectors_kb
$ echo 3 > /proc/sys/vm/drop_caches
$ dd if=/dev/hda of=/dev/null bs=1M count=10240


Thanks!

--
Al



Re: raid5 software vs hardware: parity calculations?

2007-01-12 Thread dean gaudet
On Thu, 11 Jan 2007, James Ralston wrote:

> I'm having a discussion with a coworker concerning the cost of md's
> raid5 implementation versus hardware raid5 implementations.
> 
> Specifically, he states:
> 
> > The performance [of raid5 in hardware] is so much better with the
> > write-back caching on the card and the offload of the parity, it
> > seems to me that the minor increase in work of having to upgrade the
> > firmware if there's a buggy one is a highly acceptable trade-off to
> > the increased performance.  The md driver still commits you to
> > longer run queues since IO calls to disk, parity calculator and the
> > subsequent kflushd operations are non-interruptible in the CPU.  A
> > RAID card with write-back cache releases the IO operation virtually
> > instantaneously.
> 
> It would seem that his comments have merit, as there appears to be
> work underway to move stripe operations outside of the spinlock:
> 
> http://lwn.net/Articles/184102/
> 
> What I'm curious about is this: for real-world situations, how much
> does this matter?  In other words, how hard do you have to push md
> raid5 before doing dedicated hardware raid5 becomes a real win?

Hardware with a battery-backed write cache is going to beat the software at 
small-write traffic latency essentially all the time, but it's got nothing 
to do with the parity computation.

-dean


Re: Linux Software RAID 5 Performance Optimizations: 2.6.19.1: (211MB/s read & 195MB/s write)

2007-01-12 Thread Justin Piszcz
RAID 5 TWEAKED: 1:06.41 elapsed @ 60% CPU

This should be 1:14, not 1:06 (the 1:06 was with a similarly sized file, but not 
the same one); the 1:14 is for the same file used with the other benchmarks.  To 
get that I used 256mb read-ahead and a 16384 stripe size, plus 128 
max_sectors_kb (the same size as my sw raid5 chunk size).

On Fri, 12 Jan 2007, Justin Piszcz wrote:

> 
> 
> On Fri, 12 Jan 2007, Michael Tokarev wrote:
> 
> > Justin Piszcz wrote:
> > > Using 4 raptor 150s:
> > > 
> > > Without the tweaks, I get 111MB/s write and 87MB/s read.
> > > With the tweaks, 195MB/s write and 211MB/s read.
> > > 
> > > Using kernel 2.6.19.1.
> > > 
> > > Without the tweaks and with the tweaks:
> > > 
> > > # Stripe tests:
> > > echo 8192 > /sys/block/md3/md/stripe_cache_size
> > > 
> > > # DD TESTS [WRITE]
> > > 
> > > DEFAULT: (512K)
> > > $ dd if=/dev/zero of=10gb.no.optimizations.out bs=1M count=10240
> > > 10240+0 records in
> > > 10240+0 records out
> > > 10737418240 bytes (11 GB) copied, 96.6988 seconds, 111 MB/s
> > []
> > > 8192K READ AHEAD
> > > $ dd if=10gb.16384k.stripe.out of=/dev/null bs=1M
> > > 10240+0 records in
> > > 10240+0 records out
> > > 10737418240 bytes (11 GB) copied, 64.9454 seconds, 165 MB/s
> > 
> > What exactly are you measuring?  Linear read/write, like copying one
> > device to another (or to /dev/null), in large chunks?
> Check bonnie benchmarks below.
> > 
> > I don't think it's an interesting test.  Hint: how many times a day
> > do you plan to perform such a copy?
> It is a measurement of raw performance.
> > 
> > (By the way, for a copy of one block device to another, try using
> > O_DIRECT, with two dd processes doing the copy - one reading, and
> > another writing - this way, you'll get best results without a huge
> > effect on other things running on the system.  Like this:
> > 
> >  dd if=/dev/onedev bs=1M iflag=direct |
> >  dd of=/dev/twodev bs=1M oflag=direct
> > )
> Interesting, I will take this into consideration; however, an untar test 
> shows a 2:1 improvement, see below.
> > 
> > /mjt
> > 
> 
> Decompress/unrar a DVD-sized file:
> 
> On the following RAID volumes with the same set of [4] 150GB raptors:
> 
> RAID  0] 1:13.16 elapsed @ 49% CPU
> RAID  4] 2:05.85 elapsed @ 30% CPU 
> RAID  5] 2:01.94 elapsed @ 32% CPU
> RAID  6] 2:39.34 elapsed @ 24% CPU
> RAID 10] 1:52.37 elapsed @ 32% CPU
> 
> RAID 5 Tweaked (8192 stripe_cache & 16384 setra/blockdev):
> 
> RAID 5 TWEAKED: 1:06.41 elapsed @ 60% CPU
> 
> I did not tweak raid 0, but seeing how RAID5 tweaked is faster than RAID0 
> is good enough for me :)
> 
> RAID0 did 278MB/s read and 317MB/s write (by the way)
> 
> Here are the bonnie results; the times alone speak for themselves, from 8 
> minutes down to 5 minutes 48-59 seconds.
> 
> # No optimizations:
> # Run Benchmarks
> Default Bonnie: 
> [nr_requests=128,max_sectors_kb=512,stripe_cache_size=256,read_ahead=1536]
> default_run1,4000M,42879,98,105436,19,41081,11,46277,96,87845,15,639.2,1,16:10:16/64,380,4,29642,99,2990,18,469,5,11784,40,1712,12
> default_run2,4000M,47145,99,108664,19,40931,11,46466,97,94158,16,634.8,0,16:10:16/64,377,4,16990,56,2850,17,431,4,21066,71,1800,13
> default_run3,4000M,43653,98,109063,19,40898,11,46447,97,97141,16,645.8,1,16:10:16/64,373,4,22302,75,2793,16,420,4,16708,56,1794,13
> default_run4,4000M,46485,98,110664,20,41102,11,46443,97,93616,16,631.3,1,16:10:16/64,363,3,14484,49,2802,17,388,4,25532,86,1604,12
> default_run5,4000M,43813,98,109800,19,41214,11,46457,97,92563,15,635.1,1,16:10:16/64,376,4,28990,95,2827,17,388,4,22874,76,1817,13
> 
> 169.88user 44.01system 8:02.98elapsed 44%CPU (0avgtext+0avgdata 
> 0maxresident)k
> 0inputs+0outputs (6major+1102minor)pagefaults 0swaps
> 161.60user 44.33system 7:53.14elapsed 43%CPU (0avgtext+0avgdata 
> 0maxresident)k
> 0inputs+0outputs (13major+1095minor)pagefaults 0swaps
> 166.64user 45.24system 8:00.07elapsed 44%CPU (0avgtext+0avgdata 
> 0maxresident)k
> 0inputs+0outputs (13major+1096minor)pagefaults 0swaps
> 161.90user 44.66system 8:00.85elapsed 42%CPU (0avgtext+0avgdata 
> 0maxresident)k
> 0inputs+0outputs (13major+1094minor)pagefaults 0swaps
> 167.61user 44.12system 8:03.26elapsed 43%CPU (0avgtext+0avgdata 
> 0maxresident)k
> 0inputs+0outputs (13major+1092minor)pagefaults 0swaps
> 
> 
> All optimizations [bonnie++] 
> 
> 168.08user 46.05system 5:55.13elapsed 60%CPU (0avgtext+0avgdata 
> 0maxresident)k
> 0inputs+0outputs (16major+1092minor)pagefaults 0swaps
> 162.65user 46.21system 5:48.47elapsed 59%CPU (0avgtext+0avgdata 
> 0maxresident)k
> 0inputs+0outputs (7major+1101minor)pagefaults 0swaps
> 168.06user 45.74system 5:59.84elapsed 59%CPU (0avgtext+0avgdata 
> 0maxresident)k
> 0inputs+0outputs (7major+1102minor)pagefaults 0swaps
> 168.00user 46.18system 5:58.77elapsed 59%CPU (0avgtext+0avgdata 
> 0maxresident)k
> 0inputs+0outputs (13major+1095minor)pagefaults 0swaps
> 167.98user 45.53system 5:56.49elapsed 59%CPU (0avgtext+0avgdata 
> 0maxresident)k
> 0inputs+0outputs (5major+1101minor)pagefaults 0swaps

Re: FailSpare event?

2007-01-12 Thread Ernst Herzberg
On Thursday 11 January 2007 23:23, Neil Brown wrote:
> On Thursday January 11, [EMAIL PROTECTED] wrote:
> > Can someone tell me what this means please? I just received this in
> > an email from one of my servers:
>
> 
>

Same problem here, on different machines, but only with mdadm 2.6; with 
mdadm 2.5.5 there were no problems.

The first machine sends this directly after starting mdadm in monitor mode
(kernel 2.6.20-rc3):
-
event=DeviceDisappeared
mddev=/dev/md1
device=Wrong-Level

Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4] 
md1 : active raid0 sdb2[1] sda2[0]
  3904704 blocks 16k chunks
  
md2 : active raid0 sdb3[1] sda3[0]
  153930112 blocks 16k chunks
  
md3 : active raid5 sdf1[3] sde1[2] sdd1[1] sdc1[0]
  732587712 blocks level 5, 16k chunk, algorithm 2 [4/4] [UUUU]
  
md0 : active raid1 sdb1[1] sda1[0]
  192640 blocks [2/2] [UU]
  
unused devices: <none>
---
and a second time for md2.
Then, about every 60 seconds, it sends the following 4 times:

event=SpareActive
mddev=/dev/md3

**

The second machine (kernel 2.6.19.2) sends 8 messages about every 60 seconds 
with:
--
event=SpareActive
mddev=/dev/md0
device=

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] 
md1 : active raid1 sdb1[1] sda1[0]
  979840 blocks [2/2] [UU]
  
md3 : active raid5 sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1] sdc1[0]
  4899200 blocks level 5, 8k chunk, algorithm 2 [6/6] [UUUUUU]
  
md2 : active raid5 sdh2[7] sdg2[6] sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1] 
sda2[0]
  6858880 blocks level 5, 4k chunk, algorithm 2 [8/8] [UUUUUUUU]
  
md0 : active raid5 sdh3[7] sdg3[6] sdf3[5] sde3[4] sdd3[3] sdc3[2] sdb3[1] 
sda3[0]
  235086656 blocks level 5, 16k chunk, algorithm 2 [8/8] [UUUUUUUU]
  
unused devices: <none>

--

Both machines have never seen any spare device, and there are no failing 
devices; everything works as expected.
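
(For reference, a typical way to start mdadm in monitor mode is shown below; this is only a sketch, and the mail address is a placeholder:)

# mdadm --monitor --scan --daemonise --delay=60 --mail=root@localhost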




Re: Linux Software RAID 5 Performance Optimizations: 2.6.19.1: (211MB/s read & 195MB/s write)

2007-01-12 Thread Justin Piszcz


On Fri, 12 Jan 2007, Michael Tokarev wrote:

> Justin Piszcz wrote:
> > Using 4 raptor 150s:
> > 
> > Without the tweaks, I get 111MB/s write and 87MB/s read.
> > With the tweaks, 195MB/s write and 211MB/s read.
> > 
> > Using kernel 2.6.19.1.
> > 
> > Without the tweaks and with the tweaks:
> > 
> > # Stripe tests:
> > echo 8192 > /sys/block/md3/md/stripe_cache_size
> > 
> > # DD TESTS [WRITE]
> > 
> > DEFAULT: (512K)
> > $ dd if=/dev/zero of=10gb.no.optimizations.out bs=1M count=10240
> > 10240+0 records in
> > 10240+0 records out
> > 10737418240 bytes (11 GB) copied, 96.6988 seconds, 111 MB/s
> []
> > 8192K READ AHEAD
> > $ dd if=10gb.16384k.stripe.out of=/dev/null bs=1M
> > 10240+0 records in
> > 10240+0 records out
> > 10737418240 bytes (11 GB) copied, 64.9454 seconds, 165 MB/s
> 
> What exactly are you measuring?  Linear read/write, like copying one
> device to another (or to /dev/null), in large chunks?
Check bonnie benchmarks below.
> 
> I don't think it's an interesting test.  Hint: how many times a day
> do you plan to perform such a copy?
It is a measurement of raw performance.
> 
> (By the way, for a copy of one block device to another, try using
> O_DIRECT, with two dd processes doing the copy - one reading, and
> another writing - this way, you'll get best results without a huge
> effect on other things running on the system.  Like this:
> 
>  dd if=/dev/onedev bs=1M iflag=direct |
>  dd of=/dev/twodev bs=1M oflag=direct
> )
Interesting, I will take this into consideration; however, an untar test 
shows a 2:1 improvement, see below.
> 
> /mjt
> 

Decompress/unrar a DVD-sized file:

On the following RAID volumes with the same set of [4] 150GB raptors:

RAID  0] 1:13.16 elapsed @ 49% CPU
RAID  4] 2:05.85 elapsed @ 30% CPU 
RAID  5] 2:01.94 elapsed @ 32% CPU
RAID  6] 2:39.34 elapsed @ 24% CPU
RAID 10] 1:52.37 elapsed @ 32% CPU

RAID 5 Tweaked (8192 stripe_cache & 16384 setra/blockdev):

RAID 5 TWEAKED: 1:06.41 elapsed @ 60% CPU

I did not tweak raid 0, but seeing how RAID5 tweaked is faster than RAID0 
is good enough for me :)

RAID0 did 278MB/s read and 317MB/s write (by the way)
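
For reference, a sketch of applying those tweaks in one place (device names and values as used in this thread; adjust for your own array):

# echo 8192 > /sys/block/md3/md/stripe_cache_size
# blockdev --setra 16384 /dev/md3
# for i in sde sdg sdi sdk; do echo 128 > /sys/block/"$i"/queue/max_sectors_kb; done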

Here are the bonnie results; the times alone speak for themselves, from 8 
minutes down to 5 minutes 48-59 seconds.

# No optimizations:
# Run Benchmarks
Default Bonnie: 
[nr_requests=128,max_sectors_kb=512,stripe_cache_size=256,read_ahead=1536]
default_run1,4000M,42879,98,105436,19,41081,11,46277,96,87845,15,639.2,1,16:10:16/64,380,4,29642,99,2990,18,469,5,11784,40,1712,12
default_run2,4000M,47145,99,108664,19,40931,11,46466,97,94158,16,634.8,0,16:10:16/64,377,4,16990,56,2850,17,431,4,21066,71,1800,13
default_run3,4000M,43653,98,109063,19,40898,11,46447,97,97141,16,645.8,1,16:10:16/64,373,4,22302,75,2793,16,420,4,16708,56,1794,13
default_run4,4000M,46485,98,110664,20,41102,11,46443,97,93616,16,631.3,1,16:10:16/64,363,3,14484,49,2802,17,388,4,25532,86,1604,12
default_run5,4000M,43813,98,109800,19,41214,11,46457,97,92563,15,635.1,1,16:10:16/64,376,4,28990,95,2827,17,388,4,22874,76,1817,13

169.88user 44.01system 8:02.98elapsed 44%CPU (0avgtext+0avgdata 
0maxresident)k
0inputs+0outputs (6major+1102minor)pagefaults 0swaps
161.60user 44.33system 7:53.14elapsed 43%CPU (0avgtext+0avgdata 
0maxresident)k
0inputs+0outputs (13major+1095minor)pagefaults 0swaps
166.64user 45.24system 8:00.07elapsed 44%CPU (0avgtext+0avgdata 
0maxresident)k
0inputs+0outputs (13major+1096minor)pagefaults 0swaps
161.90user 44.66system 8:00.85elapsed 42%CPU (0avgtext+0avgdata 
0maxresident)k
0inputs+0outputs (13major+1094minor)pagefaults 0swaps
167.61user 44.12system 8:03.26elapsed 43%CPU (0avgtext+0avgdata 
0maxresident)k
0inputs+0outputs (13major+1092minor)pagefaults 0swaps


All optimizations [bonnie++] 

168.08user 46.05system 5:55.13elapsed 60%CPU (0avgtext+0avgdata 
0maxresident)k
0inputs+0outputs (16major+1092minor)pagefaults 0swaps
162.65user 46.21system 5:48.47elapsed 59%CPU (0avgtext+0avgdata 
0maxresident)k
0inputs+0outputs (7major+1101minor)pagefaults 0swaps
168.06user 45.74system 5:59.84elapsed 59%CPU (0avgtext+0avgdata 
0maxresident)k
0inputs+0outputs (7major+1102minor)pagefaults 0swaps
168.00user 46.18system 5:58.77elapsed 59%CPU (0avgtext+0avgdata 
0maxresident)k
0inputs+0outputs (13major+1095minor)pagefaults 0swaps
167.98user 45.53system 5:56.49elapsed 59%CPU (0avgtext+0avgdata 
0maxresident)k
0inputs+0outputs (5major+1101minor)pagefaults 0swaps

c6300-optimized:4000M,43976,99,167209,29,73109,22,43471,91,208572,40,511.4,1,16:10:16/64,1109,12,26948,89,2469,14,1051,11,29037,97,2167,16
c6300-optimized:4000M,47455,99,190212,35,70402,21,43167,92,206290,40,503.3,1,16:10:16/64,1071,11,29893,99,2804,16,1059,12,24887,84,2090,16
c6300-optimized:4000M,43979,99,172543,29,71811,21,41760,87,201870,39,498.9,1,16:10:16/64,1042,11,30276,99,2800,16,1063,12,29491,99,2257,17
c6300-optimized:4000M,43824,98,164585,29,73470,22,43098,90,207003,40,489.1,1,16:10:16/64,1045,11,30288,98,2512,15,1018,11,27365,92,2097,16
c6300-optimiz

Re: Linux Software RAID 5 Performance Optimizations: 2.6.19.1: (211MB/s read & 195MB/s write)

2007-01-12 Thread Michael Tokarev
Justin Piszcz wrote:
> Using 4 raptor 150s:
> 
> Without the tweaks, I get 111MB/s write and 87MB/s read.
> With the tweaks, 195MB/s write and 211MB/s read.
> 
> Using kernel 2.6.19.1.
> 
> Without the tweaks and with the tweaks:
> 
> # Stripe tests:
> echo 8192 > /sys/block/md3/md/stripe_cache_size
> 
> # DD TESTS [WRITE]
> 
> DEFAULT: (512K)
> $ dd if=/dev/zero of=10gb.no.optimizations.out bs=1M count=10240
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 96.6988 seconds, 111 MB/s
[]
> 8192K READ AHEAD
> $ dd if=10gb.16384k.stripe.out of=/dev/null bs=1M
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 64.9454 seconds, 165 MB/s

What exactly are you measuring?  Linear read/write, like copying one
device to another (or to /dev/null), in large chunks?

I don't think it's an interesting test.  Hint: how many times a day
do you plan to perform such a copy?

(By the way, for a copy of one block device to another, try using
O_DIRECT, with two dd processes doing the copy - one reading, and
another writing - this way, you'll get best results without a huge
effect on other things running on the system.  Like this:

 dd if=/dev/onedev bs=1M iflag=direct |
 dd of=/dev/twodev bs=1M oflag=direct
)

/mjt