Re: [opensuse] Raid5/LVM2/XFS alignment

2008-01-29 Thread Greg Freemyer
On Jan 29, 2008 3:05 PM, Ciro Iriarte [EMAIL PROTECTED] wrote:
 2008/1/28, Greg Freemyer [EMAIL PROTECTED]:
  On Jan 28, 2008 6:41 PM, Ciro Iriarte [EMAIL PROTECTED] wrote:
  
  Ok, I guess you know reads are not significantly impacted by the
  tuning we're talking about.  This is mostly about tuning for raid5
  write performance.
 
  Anyway, are you planning to stripe together multiple md raid5 arrays via
  LVM?  I believe that is what --stripes and --stripesize are for.  (ie.
  If you have 8 drives, you could create 2 raid5 arrays, and use LVM to
  interleave them by using --stripes = 2.)  I've never used that
  feature.
 
  You need to worry about the vg extents.  I think vgcreate
  --physicalextentsize is what you need to tune.  I would make each
  extent a whole number of stripes in size.  ie. 768KB * N.  Maybe use
  N=10, so -s 7680K
 
  Assuming you're not using LVM stripes, and since this appears to be a new
  setup, I would also use -C or --contiguous to ensure all the data is
  sequential.  It may be overkill, but it will further ensure you _avoid_
  LV extents that don't end on a stripe boundary.  (a stripe == 3 raid5
  chunks for you).
 
  Then if you are going to use the snapshot feature, you need to set
  your chunksize efficiently.  If you are only going to have large
  files, then I would use a large LVM snapshot chunksize.  256KB seems
  like a good choice, but I have not benchmarked snapshot chunksizes.
 
  Greg
  --
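
To make the quoted advice concrete, the LVM commands might look roughly like
the following.  This is only a sketch with made-up sizes, reusing the vg/lv
names that appear later in the thread, not a tested recipe:

vgcreate -s 7680K data /dev/md2       # extent = 10 stripes of 768KB; note that
                                      # some LVM2 versions only accept
                                      # power-of-two extent sizes and would
                                      # reject this exact value
lvcreate -C y -L 200G -n test data    # -C/--contiguous keeps the LV sequential
lvcreate -s -c 256K -L 20G -n testsnap /dev/data/test   # large snapshot chunks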

 Just for the record: while dealing with a bug that made the raid hang, I
 found a workaround that also gave me a performance boost: echo 4096 >
 /sys/block/md2/md/stripe_cache_size

 Result:

 mainwks:~ # dd if=/dev/zero bs=1024k count=1000 of=/datos/test
 1000+0 records in
 1000+0 records out
 1048576000 bytes (1,0 GB) copied, 6,78341 s, 155 MB/s

 mainwks:~ # rm /datos/test

 mainwks:~ # dd if=/dev/zero bs=1024k count=20000 of=/datos/test
 20000+0 records in
 20000+0 records out
 20971520000 bytes (21 GB) copied, 199,135 s, 105 MB/s

 Ciro

Ciro,

105 MB/s seems strange to me.  I would have expected 75 MB/s or 225 MB/s.

ie. For normal non-full stripe i/o, it should be 75 MB/s * 4 / 4,
where 75 MB/sec is what I typically see for one drive, the first 4 is
the number of drives that can be doing parallel i/o, and the second 4
is the number of i/o's per write.

ie. When you do a non-full stripe write, the kernel has to read the
old checksum, read the old chunk data, recalc the checksum, write the
new chunk data, and write the checksum.

Out of curiosity, on the dd line, do you get better performance if you
set your blocksize to exactly one stripe?  ie. 3x 256KB = 768KB
stripe.  I've read that Linux's raid5 implementation is optimized to
handle full stripe writes.

ie. Writing 3 chunks produces: calc the new checksum from all the new
data, then write d1, d2, d3, p.  So to get 3 256KB data chunks to the
drives, the kernel ends up invoking 4 256KB writes.

Or 75 MB/s * 4 * 3 / 4 = 225 MB/sec.

If you have everything optimized, I think you should see the same
performance with a 2-stripe write, ie. 6x 256KB.  If your
optimization is wrong, you will see a speed improvement with the
bigger write: when the alignment between your writes and the stripes
is off, the larger write still guarantees at least one full stripe write.
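
For example, the two test writes might look like this (just a sketch; the
counts are only there to make each run roughly 1 GB, and /datos/test is the
file from your earlier runs):

dd if=/dev/zero bs=768k count=1365 of=/datos/test    # 1 stripe per write
dd if=/dev/zero bs=1536k count=682 of=/datos/test    # 2 stripes per write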

Thanks
Greg
-- 
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com
-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [opensuse] Raid5/LVM2/XFS alignment

2008-01-29 Thread Ciro Iriarte
2008/1/28, Greg Freemyer [EMAIL PROTECTED]:
 On Jan 28, 2008 6:41 PM, Ciro Iriarte [EMAIL PROTECTED] wrote:
 
 Ok, I guess you know reads are not significantly impacted by the
 tuning we're talking about.  This is mostly about tuning for raid5
 write performance.

 Anyway, are you planning to stripe together multiple md raid5 arrays via
 LVM?  I believe that is what --stripes and --stripesize are for.  (ie.
 If you have 8 drives, you could create 2 raid5 arrays, and use LVM to
 interleave them by using --stripes = 2.)  I've never used that
 feature.

 You need to worry about the vg extents.  I think vgcreate
 --physicalextentsize is what you need to tune.  I would make each
 extent a whole number of stripes in size.  ie. 768KB * N.  Maybe use
 N=10, so -s 7680K

 Assuming you're not using LVM stripes, and since this appears to be a new
 setup, I would also use -C or --contiguous to ensure all the data is
 sequential.  It may be overkill, but it will further ensure you _avoid_
 LV extents that don't end on a stripe boundary.  (a stripe == 3 raid5
 chunks for you).

 Then if you are going to use the snapshot feature, you need to set
 your chunksize efficiently.  If you are only going to have large
 files, then I would use a large LVM snapshot chunksize.  256KB seems
 like a good choice, but I have not benchmarked snapshot chunksizes.

 Greg
 --

Just for the record: while dealing with a bug that made the raid hang, I
found a workaround that also gave me a performance boost: echo 4096 >
/sys/block/md2/md/stripe_cache_size

Result:

mainwks:~ # dd if=/dev/zero bs=1024k count=1000 of=/datos/test
1000+0 records in
1000+0 records out
1048576000 bytes (1,0 GB) copied, 6,78341 s, 155 MB/s

mainwks:~ # rm /datos/test

mainwks:~ # dd if=/dev/zero bs=1024k count=20000 of=/datos/test
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 199,135 s, 105 MB/s
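
For reference, the value can be checked the same way it is set, and it does
not survive a reboot, so it has to be re-applied at boot time, for example
from a local init script such as /etc/init.d/boot.local on openSUSE.  A sketch:

cat /sys/block/md2/md/stripe_cache_size           # show the current value
echo 4096 > /sys/block/md2/md/stripe_cache_size   # raise it for /dev/md2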

Ciro
-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [opensuse] Raid5/LVM2/XFS alignment

2008-01-28 Thread Greg Freemyer
On Jan 28, 2008 11:25 AM, Ciro Iriarte [EMAIL PROTECTED] wrote:
 Hi, does anybody have some notes about tuning md raid5, lvm and xfs?  I'm
 getting 20 MB/s with dd and I think it can be improved. I'll add config
 parameters as soon as I get home. I'm using md raid5 on a motherboard
 with nvidia sata controller, 4x500gb samsung sata2 disks and lvm with
 OpenSUSE [EMAIL PROTECTED]

 Regards,
 Ciro
 --

I have not done any raid 5 perf. testing: 20 MB/sec seems pretty bad,
but not outrageous I suppose.  I can get about 4-5GB/min from new sata
drives.  So about 75 MB/sec from a single raw drive (ie. dd
if=/dev/zero of=/dev/sdb bs=4k)

You don't say how you're invoking dd.  The default bs is only 512 bytes
I think and that is totally inefficient with the linux kernel.

I typically use 4k which maps to what the kernel uses.  ie. dd
if=/dev/zero of=big-file bs=4k count=1000 should give you a simple but
meaningful test.

I think the default stride is 64k per drive, so if you're writing 3x 64K
at a time, you may get perfect alignment and miss the overhead of
having to recalculate the checksum all the time.

As another data point, I would bump that up to 30x 64K and see if you
continue to get speed improvements.

So tell us the write speed for
bs=512
bs=4k
bs=192k
bs=1920k

And the read speeds for the same.  ie.  dd if=big-file of=/dev/null bs=4k, etc.

I would expect the write speed to go up with each increase in bs, but
the read speed to be more or less constant.  Then you need to figure
out what sort of real world block sizes you're going to be using.  Once
you have a bs, or collection of bs sizes that match your needs, then
you can start tuning your stack.
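
Something like the following would cover that matrix; it is only a sketch, the
target path is an example, and the counts are chosen so each run writes
roughly 1 GB (sync or drop caches between runs for more honest numbers):

dd if=/dev/zero of=/datos/ddtest bs=512   count=2097152
dd if=/dev/zero of=/datos/ddtest bs=4k    count=262144
dd if=/dev/zero of=/datos/ddtest bs=192k  count=5461
dd if=/dev/zero of=/datos/ddtest bs=1920k count=546
dd if=/datos/ddtest of=/dev/null bs=4k    # read test on the same file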

Greg
-- 
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com
-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [opensuse] Raid5/LVM2/XFS alignment

2008-01-28 Thread Ciro Iriarte
2008/1/28, Greg Freemyer [EMAIL PROTECTED]:
 On Jan 28, 2008 11:25 AM, Ciro Iriarte [EMAIL PROTECTED] wrote:
   Hi, does anybody have some notes about tuning md raid5, lvm and xfs?  I'm
   getting 20 MB/s with dd and I think it can be improved. I'll add config
   parameters as soon as I get home. I'm using md raid5 on a motherboard
  with nvidia sata controller, 4x500gb samsung sata2 disks and lvm with
  OpenSUSE [EMAIL PROTECTED]
 
  Regards,
  Ciro
  --

 I have not done any raid 5 perf. testing: 20 mb/sec seems pretty bad,
 but not outrageous I suppose.  I can get about 4-5GB/min from new sata
 drives.  So about 75 MB/sec from a single raw drive (ie. dd
 if=/dev/zero of=/dev/sdb bs=4k)

 You don't say how you're invoking dd.  The default bs is only 512 bytes
 I think and that is totally inefficient with the linux kernel.

 I typically use 4k which maps to what the kernel uses.  ie. dd
 if=/dev/zero of=big-file bs=4k count=1000 should give you a simple but
 meaningful test.

 I think the default stride is 64k per drive, so if you're writing 3x 64K
 at a time, you may get perfect alignment and miss the overhead of
 having to recalculate the checksum all the time.

 As another data point, I would bump that up to 30x 64K and see if you
 continue to get speed improvements.

 So tell us the write speed for
 bs=512
 bs=4k
 bs=192k
 bs=1920k

 And the read speeds for the same.  ie.  dd if=big-file of=/dev/null bs=4k, 
 etc.

 I would expect the write speed to go up with each increase in bs, but
 the read speed to be more or less constant.  Then you need to figure
 out what sort of real world block sizes you're going to be using.  Once
 you have a bs, or collection of bs sizes that match your needs, then
 you can start tuning your stack.

 Greg

Hi, posted the first mail from my cell phone, so couldn't add more info

- I created the raid with chunk size= 256k.

mainwks:~ # mdadm --misc --detail /dev/md2
/dev/md2:
Version : 01.00.03
  Creation Time : Sun Jan 27 20:08:48 2008
 Raid Level : raid5
 Array Size : 1465151232 (1397.28 GiB 1500.31 GB)
  Used Dev Size : 976767488 (465.76 GiB 500.10 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 2
Persistence : Superblock is persistent

  Intent Bitmap : Internal

Update Time : Mon Jan 28 17:42:51 2008
  State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

 Layout : left-symmetric
 Chunk Size : 256K

   Name : 2
   UUID : 65cb16de:d89af60e:6cac47da:88828cfe
 Events : 12

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
       2       8       65        2      active sync   /dev/sde1
       4       8       81        3      active sync   /dev/sdf1

- Speed reported by hdparm:

mainwks:~ # hdparm -tT /dev/sdc

/dev/sdc:
 Timing cached reads:   1754 MB in  2.00 seconds = 877.60 MB/sec
 Timing buffered disk reads:  226 MB in  3.02 seconds =  74.76 MB/sec
mainwks:~ # hdparm -tT /dev/md2

/dev/md2:
 Timing cached reads:   1250 MB in  2.00 seconds = 624.82 MB/sec
 Timing buffered disk reads:  620 MB in  3.01 seconds = 206.09 MB/sec

- LVM:

mainwks:~ # vgdisplay data
  Incorrect metadata area header checksum
  --- Volume group ---
  VG Name   data
  System ID
  Format    lvm2
  Metadata Areas    1
  Metadata Sequence No  5
  VG Access read/write
  VG Status resizable
  MAX LV    0
  Cur LV    2
  Open LV   2
  Max PV    0
  Cur PV    1
  Act PV    1
  VG Size   1.36 TB
  PE Size   4.00 MB
  Total PE  357702
  Alloc PE / Size   51200 / 200.00 GB
  Free  PE / Size   306502 / 1.17 TB
  VG UUID   KpUAeN-mPjO-2K8t-hiLX-FF0C-93R2-IP3aFI

mainwks:~ # pvdisplay /dev/sdc1
  Incorrect metadata area header checksum
  --- Physical volume ---
  PV Name   /dev/md2
  VG Name   data
  PV Size   1.36 TB / not usable 3.75 MB
  Allocatable   yes
  PE Size (KByte)   4096
  Total PE  357702
  Free PE   306502
  Allocated PE  51200
  PV UUID   Axl2c0-RP95-WwO0-inHP-aJEF-6SYJ-Fqhnga

- XFS:

mainwks:~ # xfs_info /dev/data/test
meta-data=/dev/mapper/data-test  isize=256    agcount=16, agsize=1638400 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=16     swidth=48 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=16384, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
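
Side note: the sunit/swidth above work out to 64KB/192KB, which does not match
the 256KB-chunk, 3-data-disk array.  If the filesystem were recreated, the
stripe geometry could be passed explicitly.  A sketch only, and destructive,
so only an option for a new or empty LV:

mkfs.xfs -d su=256k,sw=3 /dev/data/test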

- The reported dd
mainwks:~ # dd if=/dev/zero 

[opensuse] Raid5/LVM2/XFS alignment

2008-01-28 Thread Ciro Iriarte
Hi, does anybody have some notes about tuning md raid5, lvm and xfs?  I'm
getting 20 MB/s with dd and I think it can be improved. I'll add config
parameters as soon as I get home. I'm using md raid5 on a motherboard
with nvidia sata controller, 4x500gb samsung sata2 disks and lvm with
OpenSUSE [EMAIL PROTECTED]

Regards,
Ciro
-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [opensuse] Raid5/LVM2/XFS alignment

2008-01-28 Thread Greg Freemyer
On Jan 28, 2008 3:51 PM, Ciro Iriarte [EMAIL PROTECTED] wrote:
 2008/1/28, Greg Freemyer [EMAIL PROTECTED]:

  On Jan 28, 2008 11:25 AM, Ciro Iriarte [EMAIL PROTECTED] wrote:
   Hi, does anybody have some notes about tuning md raid5, lvm and xfs?  I'm
   getting 20 MB/s with dd and I think it can be improved. I'll add config
   parameters as soon as I get home. I'm using md raid5 on a motherboard
   with nvidia sata controller, 4x500gb samsung sata2 disks and lvm with
   OpenSUSE [EMAIL PROTECTED]
  
   Regards,
   Ciro
   --
 
  I have not done any raid 5 perf. testing: 20 mb/sec seems pretty bad,
  but not outrageous I suppose.  I can get about 4-5GB/min from new sata
  drives.  So about 75 MB/sec from a single raw drive (ie. dd
  if=/dev/zero of=/dev/sdb bs=4k)
 
  You don't say how you're invoking dd.  The default bs is only 512 bytes
  I think and that is totally inefficient with the linux kernel.
 
  I typically use 4k which maps to what the kernel uses.  ie. dd
  if=/dev/zero of=big-file bs=4k count=1000 should give you a simple but
  meaningful test.
 
  I think the default stride is 64k per drive, so if you're writing 3x 64K
  at a time, you may get perfect alignment and miss the overhead of
  having to recalculate the checksum all the time.
 
  As another data point, I would bump that up to 30x 64K and see if you
  continue to get speed improvements.
 
  So tell us the write speed for
  bs=512
  bs=4k
  bs=192k
  bs=1920k
 
  And the read speeds for the same.  ie.  dd if=big-file of=/dev/null bs=4k, 
  etc.
 
  I would expect the write speed to go up with each increase in bs, but
  the read speed to be more or less constant.  Then you need to figure
  out what sort of real world block sizes you're going to be using.  Once
  you have a bs, or collection of bs sizes that match your needs, then
  you can start tuning your stack.
 
  Greg

 Hi, posted the first mail from my cell phone, so couldn't add more info

 - I created the raid with chunk size= 256k.

 mainwks:~ # mdadm --misc --detail /dev/md2
 /dev/md2:
 Version : 01.00.03
   Creation Time : Sun Jan 27 20:08:48 2008
  Raid Level : raid5
  Array Size : 1465151232 (1397.28 GiB 1500.31 GB)
   Used Dev Size : 976767488 (465.76 GiB 500.10 GB)
Raid Devices : 4
   Total Devices : 4
 Preferred Minor : 2
 Persistence : Superblock is persistent

   Intent Bitmap : Internal

 Update Time : Mon Jan 28 17:42:51 2008
   State : active
  Active Devices : 4
 Working Devices : 4
  Failed Devices : 0
   Spare Devices : 0

  Layout : left-symmetric
  Chunk Size : 256K

Name : 2
UUID : 65cb16de:d89af60e:6cac47da:88828cfe
  Events : 12

     Number   Major   Minor   RaidDevice State
        0       8       33        0      active sync   /dev/sdc1
        1       8       49        1      active sync   /dev/sdd1
        2       8       65        2      active sync   /dev/sde1
        4       8       81        3      active sync   /dev/sdf1

 - Speed reported by hdparm:

 mainwks:~ # hdparm -tT /dev/sdc

 /dev/sdc:
  Timing cached reads:   1754 MB in  2.00 seconds = 877.60 MB/sec
  Timing buffered disk reads:  226 MB in  3.02 seconds =  74.76 MB/sec
 mainwks:~ # hdparm -tT /dev/md2

 /dev/md2:
  Timing cached reads:   1250 MB in  2.00 seconds = 624.82 MB/sec
  Timing buffered disk reads:  620 MB in  3.01 seconds = 206.09 MB/sec

 - LVM:

 mainwks:~ # vgdisplay data
   Incorrect metadata area header checksum
   --- Volume group ---
   VG Name   data
   System ID
   Format    lvm2
   Metadata Areas    1
   Metadata Sequence No  5
   VG Access read/write
   VG Status resizable
   MAX LV    0
   Cur LV    2
   Open LV   2
   Max PV    0
   Cur PV    1
   Act PV    1
   VG Size   1.36 TB
   PE Size   4.00 MB
   Total PE  357702
   Alloc PE / Size   51200 / 200.00 GB
   Free  PE / Size   306502 / 1.17 TB
   VG UUID   KpUAeN-mPjO-2K8t-hiLX-FF0C-93R2-IP3aFI

 mainwks:~ # pvdisplay /dev/sdc1
   Incorrect metadata area header checksum
   --- Physical volume ---
   PV Name   /dev/md2
   VG Name   data
   PV Size   1.36 TB / not usable 3.75 MB
   Allocatable   yes
   PE Size (KByte)   4096
   Total PE  357702
   Free PE   306502
   Allocated PE  51200
   PV UUID   Axl2c0-RP95-WwO0-inHP-aJEF-6SYJ-Fqhnga

 - XFS:

 mainwks:~ # xfs_info /dev/data/test
 meta-data=/dev/mapper/data-test  isize=256    agcount=16, agsize=1638400 blks
          =                       sectsz=512   attr=0
 data     =                       bsize=4096   blocks=26214400, imaxpct=25
          =                       sunit=16     swidth=48 blks, unwritten=1
 naming   =version 2              bsize=4096
 log      =internal               bsize=4096   blocks=16384, version=1

Re: [opensuse] Raid5/LVM2/XFS alignment

2008-01-28 Thread Ciro Iriarte
2008/1/28, Greg Freemyer [EMAIL PROTECTED]:
 On Jan 28, 2008 3:51 PM, Ciro Iriarte [EMAIL PROTECTED] wrote:
  2008/1/28, Greg Freemyer [EMAIL PROTECTED]:
 
   On Jan 28, 2008 11:25 AM, Ciro Iriarte [EMAIL PROTECTED] wrote:
 Hi, does anybody have some notes about tuning md raid5, lvm and xfs?  I'm
 getting 20 MB/s with dd and I think it can be improved. I'll add config
 parameters as soon as I get home. I'm using md raid5 on a motherboard
with nvidia sata controller, 4x500gb samsung sata2 disks and lvm with
OpenSUSE [EMAIL PROTECTED]
   
Regards,
Ciro
--
  
   I have not done any raid 5 perf. testing: 20 mb/sec seems pretty bad,
   but not outrageous I suppose.  I can get about 4-5GB/min from new sata
   drives.  So about 75 MB/sec from a single raw drive (ie. dd
   if=/dev/zero of=/dev/sdb bs=4k)
  
    You don't say how you're invoking dd.  The default bs is only 512 bytes
   I think and that is totally inefficient with the linux kernel.
  
   I typically use 4k which maps to what the kernel uses.  ie. dd
   if=/dev/zero of=big-file bs=4k count=1000 should give you a simple but
    meaningful test.
  
    I think the default stride is 64k per drive, so if you're writing 3x 64K
   at a time, you may get perfect alignment and miss the overhead of
   having to recalculate the checksum all the time.
  
   As another data point, I would bump that up to 30x 64K and see if you
   continue to get speed improvements.
  
   So tell us the write speed for
   bs=512
   bs=4k
   bs=192k
   bs=1920k
  
   And the read speeds for the same.  ie.  dd if=big-file of=/dev/null 
   bs=4k, etc.
  
   I would expect the write speed to go up with each increase in bs, but
   the read speed to be more or less constant.  Then you need to figure
    out what sort of real world block sizes you're going to be using.  Once
   you have a bs, or collection of bs sizes that match your needs, then
   you can start tuning your stack.
  
   Greg
 
  Hi, posted the first mail from my cell phone, so couldn't add more info
 
  - I created the raid with chunk size= 256k.
 
  mainwks:~ # mdadm --misc --detail /dev/md2
  /dev/md2:
  Version : 01.00.03
Creation Time : Sun Jan 27 20:08:48 2008
   Raid Level : raid5
   Array Size : 1465151232 (1397.28 GiB 1500.31 GB)
Used Dev Size : 976767488 (465.76 GiB 500.10 GB)
 Raid Devices : 4
Total Devices : 4
  Preferred Minor : 2
  Persistence : Superblock is persistent
 
Intent Bitmap : Internal
 
  Update Time : Mon Jan 28 17:42:51 2008
State : active
   Active Devices : 4
  Working Devices : 4
   Failed Devices : 0
Spare Devices : 0
 
   Layout : left-symmetric
   Chunk Size : 256K
 
 Name : 2
 UUID : 65cb16de:d89af60e:6cac47da:88828cfe
   Events : 12
 
  Number   Major   Minor   RaidDevice State
     0       8       33        0      active sync   /dev/sdc1
     1       8       49        1      active sync   /dev/sdd1
     2       8       65        2      active sync   /dev/sde1
     4       8       81        3      active sync   /dev/sdf1
 
  - Speed reported by hdparm:
 
  mainwks:~ # hdparm -tT /dev/sdc
 
  /dev/sdc:
   Timing cached reads:   1754 MB in  2.00 seconds = 877.60 MB/sec
   Timing buffered disk reads:  226 MB in  3.02 seconds =  74.76 MB/sec
  mainwks:~ # hdparm -tT /dev/md2
 
  /dev/md2:
   Timing cached reads:   1250 MB in  2.00 seconds = 624.82 MB/sec
   Timing buffered disk reads:  620 MB in  3.01 seconds = 206.09 MB/sec
 
  - LVM:
 
  mainwks:~ # vgdisplay data
Incorrect metadata area header checksum
--- Volume group ---
VG Name   data
System ID
    Format    lvm2
    Metadata Areas    1
    Metadata Sequence No  5
    VG Access read/write
    VG Status resizable
    MAX LV    0
    Cur LV    2
    Open LV   2
    Max PV    0
    Cur PV    1
    Act PV    1
VG Size   1.36 TB
PE Size   4.00 MB
Total PE  357702
Alloc PE / Size   51200 / 200.00 GB
Free  PE / Size   306502 / 1.17 TB
VG UUID   KpUAeN-mPjO-2K8t-hiLX-FF0C-93R2-IP3aFI
 
  mainwks:~ # pvdisplay /dev/sdc1
Incorrect metadata area header checksum
--- Physical volume ---
PV Name   /dev/md2
VG Name   data
PV Size   1.36 TB / not usable 3.75 MB
Allocatable   yes
PE Size (KByte)   4096
Total PE  357702
Free PE   306502
Allocated PE  51200
PV UUID   Axl2c0-RP95-WwO0-inHP-aJEF-6SYJ-Fqhnga
 
  - XFS:
 
  mainwks:~ # xfs_info /dev/data/test
  meta-data=/dev/mapper/data-test  isize=256    agcount=16, agsize=1638400 blks
           =                       sectsz=512   attr=0
  data     =                       bsize=4096   blocks=26214400, 

Re: [opensuse] Raid5/LVM2/XFS alignment

2008-01-28 Thread Greg Freemyer
On Jan 28, 2008 6:41 PM, Ciro Iriarte [EMAIL PROTECTED] wrote:

 2008/1/28, Greg Freemyer [EMAIL PROTECTED]:
  On Jan 28, 2008 3:51 PM, Ciro Iriarte [EMAIL PROTECTED] wrote:
   2008/1/28, Greg Freemyer [EMAIL PROTECTED]:
  
On Jan 28, 2008 11:25 AM, Ciro Iriarte [EMAIL PROTECTED] wrote:
 Hi, does anybody have some notes about tuning md raid5, lvm and xfs?  I'm
 getting 20 MB/s with dd and I think it can be improved. I'll add config
 parameters as soon as I get home. I'm using md raid5 on a motherboard
 with nvidia sata controller, 4x500gb samsung sata2 disks and lvm with
 OpenSUSE [EMAIL PROTECTED]

 Regards,
 Ciro
 --
   
I have not done any raid 5 perf. testing: 20 mb/sec seems pretty bad,
but not outrageous I suppose.  I can get about 4-5GB/min from new sata
drives.  So about 75 MB/sec from a single raw drive (ie. dd
if=/dev/zero of=/dev/sdb bs=4k)
   
 You don't say how you're invoking dd.  The default bs is only 512 bytes
I think and that is totally inefficient with the linux kernel.
   
I typically use 4k which maps to what the kernel uses.  ie. dd
if=/dev/zero of=big-file bs=4k count=1000 should give you a simple but
 meaningful test.
   
 I think the default stride is 64k per drive, so if you're writing 3x 64K
at a time, you may get perfect alignment and miss the overhead of
having to recalculate the checksum all the time.
   
As another data point, I would bump that up to 30x 64K and see if you
continue to get speed improvements.
   
So tell us the write speed for
bs=512
bs=4k
bs=192k
bs=1920k
   
And the read speeds for the same.  ie.  dd if=big-file of=/dev/null 
bs=4k, etc.
   
I would expect the write speed to go up with each increase in bs, but
the read speed to be more or less constant.  Then you need to figure
 out what sort of real world block sizes you're going to be using.  Once
you have a bs, or collection of bs sizes that match your needs, then
you can start tuning your stack.
   
Greg
  
   Hi, posted the first mail from my cell phone, so couldn't add more 
   info
  
   - I created the raid with chunk size= 256k.
  
   mainwks:~ # mdadm --misc --detail /dev/md2
   /dev/md2:
   Version : 01.00.03
 Creation Time : Sun Jan 27 20:08:48 2008
Raid Level : raid5
Array Size : 1465151232 (1397.28 GiB 1500.31 GB)
 Used Dev Size : 976767488 (465.76 GiB 500.10 GB)
  Raid Devices : 4
 Total Devices : 4
   Preferred Minor : 2
   Persistence : Superblock is persistent
  
 Intent Bitmap : Internal
  
   Update Time : Mon Jan 28 17:42:51 2008
 State : active
Active Devices : 4
   Working Devices : 4
Failed Devices : 0
 Spare Devices : 0
  
Layout : left-symmetric
Chunk Size : 256K
  
  Name : 2
  UUID : 65cb16de:d89af60e:6cac47da:88828cfe
Events : 12
  
    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
       2       8       65        2      active sync   /dev/sde1
       4       8       81        3      active sync   /dev/sdf1
  
   - Speed reported by hdparm:
  
   mainwks:~ # hdparm -tT /dev/sdc
  
   /dev/sdc:
Timing cached reads:   1754 MB in  2.00 seconds = 877.60 MB/sec
Timing buffered disk reads:  226 MB in  3.02 seconds =  74.76 MB/sec
   mainwks:~ # hdparm -tT /dev/md2
  
   /dev/md2:
Timing cached reads:   1250 MB in  2.00 seconds = 624.82 MB/sec
Timing buffered disk reads:  620 MB in  3.01 seconds = 206.09 MB/sec
  
   - LVM:
  
   mainwks:~ # vgdisplay data
 Incorrect metadata area header checksum
 --- Volume group ---
 VG Name   data
 System ID
  Format    lvm2
  Metadata Areas    1
  Metadata Sequence No  5
  VG Access read/write
  VG Status resizable
  MAX LV    0
  Cur LV    2
  Open LV   2
  Max PV    0
  Cur PV    1
  Act PV    1
 VG Size   1.36 TB
 PE Size   4.00 MB
 Total PE  357702
 Alloc PE / Size   51200 / 200.00 GB
 Free  PE / Size   306502 / 1.17 TB
 VG UUID   KpUAeN-mPjO-2K8t-hiLX-FF0C-93R2-IP3aFI
  
   mainwks:~ # pvdisplay /dev/sdc1
 Incorrect metadata area header checksum
 --- Physical volume ---
 PV Name   /dev/md2
 VG Name   data
 PV Size   1.36 TB / not usable 3.75 MB
 Allocatable   yes
 PE Size (KByte)   4096
 Total PE  357702
 Free PE   306502
 Allocated PE  51200
 PV UUID   Axl2c0-RP95-WwO0-inHP-aJEF-6SYJ-Fqhnga
  
   - XFS:
  
   mainwks:~ # xfs_info /dev/data/test
  

Re: [opensuse] Raid5/LVM2/XFS alignment

2008-01-28 Thread Ciro Iriarte
2008/1/28, Greg Freemyer [EMAIL PROTECTED]:
 On Jan 28, 2008 6:41 PM, Ciro Iriarte [EMAIL PROTECTED] wrote:
 
  2008/1/28, Greg Freemyer [EMAIL PROTECTED]:
   On Jan 28, 2008 3:51 PM, Ciro Iriarte [EMAIL PROTECTED] wrote:
2008/1/28, Greg Freemyer [EMAIL PROTECTED]:
   
 On Jan 28, 2008 11:25 AM, Ciro Iriarte [EMAIL PROTECTED] wrote:
  Hi, does anybody have some notes about tuning md raid5, lvm and xfs?  I'm
  getting 20 MB/s with dd and I think it can be improved. I'll add
  config
  parameters as soon as I get home. I'm using md raid5 on a
  motherboard
  with nvidia sata controller, 4x500gb samsung sata2 disks and lvm 
  with
  OpenSUSE [EMAIL PROTECTED]
 
  Regards,
  Ciro
  --

 I have not done any raid 5 perf. testing: 20 mb/sec seems pretty bad,
 but not outrageous I suppose.  I can get about 4-5GB/min from new sata
 drives.  So about 75 MB/sec from a single raw drive (ie. dd
 if=/dev/zero of=/dev/sdb bs=4k)

 You don't say how you're invoking dd.  The default bs is only 512 bytes
 I think and that is totally inefficient with the linux kernel.

 I typically use 4k which maps to what the kernel uses.  ie. dd
 if=/dev/zero of=big-file bs=4k count=1000 should give you a simple but
 meaningful test.

 I think the default stride is 64k per drive, so if you're writing 3x 64K
 at a time, you may get perfect alignment and miss the overhead of
 having to recalculate the checksum all the time.

 As another data point, I would bump that up to 30x 64K and see if you
 continue to get speed improvements.

 So tell us the write speed for
 bs=512
 bs=4k
 bs=192k
 bs=1920k

 And the read speeds for the same.  ie.  dd if=big-file of=/dev/null 
 bs=4k, etc.

 I would expect the write speed to go up with each increase in bs, but
 the read speed to be more or less constant.  Then you need to figure
 out what sort of real world block sizes you're going to be using.  Once
 you have a bs, or collection of bs sizes that match your needs, then
 you can start tuning your stack.

 Greg
   
Hi, posted the first mail from my cell phone, so couldn't add more 
info
   
- I created the raid with chunk size= 256k.
   
mainwks:~ # mdadm --misc --detail /dev/md2
/dev/md2:
Version : 01.00.03
  Creation Time : Sun Jan 27 20:08:48 2008
 Raid Level : raid5
 Array Size : 1465151232 (1397.28 GiB 1500.31 GB)
  Used Dev Size : 976767488 (465.76 GiB 500.10 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 2
Persistence : Superblock is persistent
   
  Intent Bitmap : Internal
   
Update Time : Mon Jan 28 17:42:51 2008
  State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
   
 Layout : left-symmetric
 Chunk Size : 256K
   
   Name : 2
   UUID : 65cb16de:d89af60e:6cac47da:88828cfe
 Events : 12
   
 Number   Major   Minor   RaidDevice State
    0       8       33        0      active sync   /dev/sdc1
    1       8       49        1      active sync   /dev/sdd1
    2       8       65        2      active sync   /dev/sde1
    4       8       81        3      active sync   /dev/sdf1
   
- Speed reported by hdparm:
   
mainwks:~ # hdparm -tT /dev/sdc
   
/dev/sdc:
 Timing cached reads:   1754 MB in  2.00 seconds = 877.60 MB/sec
 Timing buffered disk reads:  226 MB in  3.02 seconds =  74.76 MB/sec
mainwks:~ # hdparm -tT /dev/md2
   
/dev/md2:
 Timing cached reads:   1250 MB in  2.00 seconds = 624.82 MB/sec
 Timing buffered disk reads:  620 MB in  3.01 seconds = 206.09 MB/sec
   
- LVM:
   
mainwks:~ # vgdisplay data
  Incorrect metadata area header checksum
  --- Volume group ---
  VG Name   data
  System ID
  Format    lvm2
  Metadata Areas    1
  Metadata Sequence No  5
  VG Access read/write
  VG Status resizable
  MAX LV    0
  Cur LV    2
  Open LV   2
  Max PV    0
  Cur PV    1
  Act PV    1
  VG Size   1.36 TB
  PE Size   4.00 MB
  Total PE  357702
  Alloc PE / Size   51200 / 200.00 GB
  Free  PE / Size   306502 / 1.17 TB
  VG UUID   KpUAeN-mPjO-2K8t-hiLX-FF0C-93R2-IP3aFI
   
mainwks:~ # pvdisplay /dev/sdc1
  Incorrect metadata area header checksum
  --- Physical volume ---
  PV Name   /dev/md2
  VG Name   data
  PV Size   1.36 TB / not usable 3.75 MB
  Allocatable   yes
  PE Size (KByte)   4096
  Total PE