stride / stripe alignment on LVM ?

2007-11-01 Thread Janek Kozicki
Hello,

I have raid5 /dev/md1, --chunk=128 --metadata=1.1. On it I have
created an LVM volume group called 'raid5', and finally a logical volume
'backup'.

Then I formatted it with command:

   mkfs.ext3 -b 4096 -E stride=32 -E resize=550292480 /dev/raid5/backup

And because LVM puts its own metadata on /dev/md1, the ext3
partition is shifted by some (unknown to me) number of bytes from
the beginning of /dev/md1.

I was wondering: how big is the shift, and would it hurt
performance/safety if the `ext3 stride=32` didn't align perfectly
with the physical stripes on the HDD?
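For reference, stride is the RAID chunk size expressed in filesystem blocks, so with --chunk=128 (KiB) and -b 4096 it comes out as 128 KiB / 4 KiB = 32, matching the command above. A minimal Python sketch of that arithmetic (the helper name `ext3_stride` is illustrative, not part of e2fsprogs). Separately, the mke2fs man page describes -E as taking a comma-separated list of extended options, so the documented spelling would be `-E stride=32,resize=550292480`; whether a second -E adds to or replaces the first is worth checking against your mke2fs version.

```python
# Sketch: the ext3 "stride" is the md chunk size measured in filesystem blocks.
KIB = 1024

def ext3_stride(chunk_bytes: int, block_bytes: int) -> int:
    """Return filesystem blocks per RAID chunk (hypothetical helper)."""
    if chunk_bytes % block_bytes:
        raise ValueError("chunk size must be a whole number of fs blocks")
    return chunk_bytes // block_bytes

# Values from this thread: mdadm --chunk=128 (KiB), mkfs.ext3 -b 4096
print(f"stride={ext3_stride(128 * KIB, 4096)}")  # prints "stride=32"
```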

PS: the resize option is to make sure that I can grow this fs
in the future.

PPS: I looked in the archive but didn't find this question asked
before. I'm sorry if it was.

-- 
Janek Kozicki |
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: stride / stripe alignment on LVM ?

2007-11-01 Thread Neil Brown
On Thursday November 1, [EMAIL PROTECTED] wrote:
> Hello,
> 
> I have raid5 /dev/md1, --chunk=128 --metadata=1.1. On it I have
> created LVM volume called 'raid5', and finally a logical volume
> 'backup'.
> 
> Then I formatted it with command:
> 
>mkfs.ext3 -b 4096 -E stride=32 -E resize=550292480 /dev/raid5/backup
> 
> And because LVM is putting its own metadata on /dev/md1, the ext3
> partition is shifted by some (unknown for me) amount of bytes from
> the beginning of /dev/md1.
> 
> I was wondering, how big is the shift, and would it hurt the
> performance/safety if the `ext3 stride=32` didn't align perfectly
> with the physical stripes on HDD?

It is probably better to ask this question on an ext3 list as people
there might know exactly what 'stride' does.

I *think* it causes the inode tables to be offset in different
block-groups so that they are not all on the same drive.  If that is
the case, then an offset caused by LVM isn't going to make any
difference at all.

NeilBrown


> 
> PS: the resize option is to make sure that I can grow this fs
> in the future.
> 
> PSS: I looked in the archive but didn't find this question asked
> before. I'm sorry if it really was asked.

Thanks for trying!


Re: stride / stripe alignment on LVM ?

2007-11-02 Thread Bill Davidsen

Neil Brown wrote:
> On Thursday November 1, [EMAIL PROTECTED] wrote:
>> Hello,
>>
>> I have raid5 /dev/md1, --chunk=128 --metadata=1.1. On it I have
>> created LVM volume called 'raid5', and finally a logical volume
>> 'backup'.
>>
>> Then I formatted it with command:
>>
>>    mkfs.ext3 -b 4096 -E stride=32 -E resize=550292480 /dev/raid5/backup
>>
>> And because LVM is putting its own metadata on /dev/md1, the ext3
>> partition is shifted by some (unknown for me) amount of bytes from
>> the beginning of /dev/md1.
>>
>> I was wondering, how big is the shift, and would it hurt the
>> performance/safety if the `ext3 stride=32` didn't align perfectly
>> with the physical stripes on HDD?
>
> It is probably better to ask this question on an ext3 list as people
> there might know exactly what 'stride' does.
>
> I *think* it causes the inode tables to be offset in different
> block-groups so that they are not all on the same drive.  If that is
> the case, then an offset causes by LVM isn't going to make any
> difference at all.

Actually, I think that all of the performance evil Doug was mentioning
will apply to LVM as well. If things are poorly aligned, they will be
poorly handled: a stripe-sized write will not go into a single stripe,
but will overlap chunks and cause the data from all of those chunks to
be read back for a new RAID-5 parity calculation.


So I would expect this to make a very large performance difference;
even if it works, it will do so slowly.
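Bill's point can be made concrete with a toy model. A minimal Python sketch, assuming the 3-disk, 128 KiB-chunk RAID-5 from this thread (256 KiB of data per stripe) and a hypothetical 64 KiB shift introduced by LVM: it counts how many stripes a write covers completely (parity computable from the new data alone) versus partially (md has to read existing data or parity first). It deliberately ignores md's stripe cache and its choice between read-modify-write and reconstruct-write.

```python
# Toy model: full vs. partial RAID-5 stripe coverage of a single write.
KIB = 1024
CHUNK = 128 * KIB             # --chunk=128
DATA_DISKS = 2                # 3-disk RAID-5: one chunk per stripe is parity
STRIPE_DATA = CHUNK * DATA_DISKS

def stripes_touched(offset, length):
    """Return (full, partial) stripe counts for a write of `length` bytes at `offset`."""
    first = offset // STRIPE_DATA
    last = (offset + length - 1) // STRIPE_DATA
    full = partial = 0
    for s in range(first, last + 1):
        start, end = s * STRIPE_DATA, (s + 1) * STRIPE_DATA
        if offset <= start and offset + length >= end:
            full += 1     # new parity can be computed from the new data alone
        else:
            partial += 1  # old data or parity must be read before writing
    return full, partial

print(stripes_touched(0, STRIPE_DATA))         # aligned write: (1, 0)
print(stripes_touched(64 * KIB, STRIPE_DATA))  # shifted by 64 KiB: (0, 2)
```

The same 256 KiB write goes from one cheap full-stripe update to two partial-stripe updates just by moving its start.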


--
bill davidsen <[EMAIL PROTECTED]>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: stride / stripe alignment on LVM ?

2007-11-02 Thread Michal Soltys

Janek Kozicki wrote:
> And because LVM is putting its own metadata on /dev/md1, the ext3
> partition is shifted by some (unknown for me) amount of bytes from
> the beginning of /dev/md1.

It seems to be a multiple of 64 KiB. You can specify it during pvcreate with
the --metadatasize option. It will be rounded to a multiple of 64 KiB, and LVM
will add another 64 KiB on its own. Extents follow directly after that. The 4
sectors mentioned in pvcreate's man page are covered by that option as well.


So, e.g., if you have a 1 MiB chunk, then pvcreate ... --metadatasize 960K ...
should give you chunk-aligned logical volumes, assuming you have the actual
extent size set appropriately as well. If you use the default chunk size, you
shouldn't need any extra options.


Verify that it really works this way after creating the PV, VG and first LV.
I found it experimentally, so YMMV.
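Michal's empirical rule can be written down in a few lines of Python. This is a sketch of the behaviour reported above, not of documented LVM internals; rounding *up* to 64 KiB is an assumption, and the helper names are hypothetical. It reproduces the 960K example: 960 KiB stays 960 KiB, plus 64 KiB gives a 1 MiB data start.

```python
# Sketch of the empirically observed rule (NOT documented LVM behaviour):
# --metadatasize is rounded to a multiple of 64 KiB (rounding up assumed),
# another 64 KiB is added, and the first extent starts right after that.
KIB = 1024
GRANULE = 64 * KIB

def predicted_pe_start(metadatasize: int) -> int:
    rounded = -(-metadatasize // GRANULE) * GRANULE  # ceil to a 64 KiB multiple
    return rounded + GRANULE

def is_aligned(offset: int, chunk: int) -> bool:
    return offset % chunk == 0

pe_start = predicted_pe_start(960 * KIB)
print(pe_start // KIB, is_aligned(pe_start, 1024 * KIB))  # prints "1024 True"
```

As Michal says, confirm the real value on your own system after creating the PV, VG and first LV.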



Re: stride / stripe alignment on LVM ?

2007-11-02 Thread Janek Kozicki
Bill Davidsen said: (by the date of Fri, 02 Nov 2007 09:01:05 -0400)

> So I would expect this to make a very large performance difference, so 
> even if it work it would do so slowly.

I was trying to find out the stripe layout for a few hours, using
hexedit and dd. And I'm baffled:

md1 : active raid5 hda3[0] sda3[1]
  969907968 blocks super 1.1 level 5, 128k chunk, algorithm 2 [3/2] [UU_]
  bitmap: 8/8 pages [32KB], 32768KB chunk

I fill md1 with random data:

# dd bs=128k count=64 if=/dev/urandom of=/dev/md1

# hexedit /dev/md1

I copy/paste (and remove the formatting of) the first 32 bytes of /dev/md1,
then search for those 32 bytes in /dev/hda3 and in /dev/sda3:

# hexedit /dev/hda3
# hexedit /dev/sda3

And no luck! I'd expect the first bytes of /dev/md1 to be at the
beginning of the first drive (hda3).

I pick the next 20 bytes from /dev/md1 and I can find them on /dev/hda3
starting just after address 0x1. The bytes before and after those
20 bytes are similar to those on /dev/md1. So now I hexedit /dev/md1
and write 32 bytes of 0xAA by hand. Then I look at address 0x1
on /dev/hda3, and there is no 0xAA at all.

Well... it's not critical for me, so you can just ignore my mumbling;
I was just wondering what obvious thing I missed. There seems to be more
XORing (or something else) involved than I expected.

Maybe the disc did not flush writes, and what I see on /dev/md1 is
not yet present on /dev/hda3 (how's that possible?)

Nevertheless, I think I will give up on LVM and just put ext3
on /dev/md1, to avoid this stripe misalignment. I wanted LVM here
only because I might want to use LVM snapshots, but I can live
without that. I can already grow /dev/md1 without LVM, by using
mdadm --grow.

best regards
-- 
Janek Kozicki |


Re: stride / stripe alignment on LVM ?

2007-11-03 Thread Doug Ledford
On Fri, 2007-11-02 at 23:16 +0100, Janek Kozicki wrote:
> Bill Davidsen said: (by the date of Fri, 02 Nov 2007 09:01:05 -0400)
> 
> > So I would expect this to make a very large performance difference, so 
> > even if it work it would do so slowly.
> 
> I was trying to find out the stripe layout for few hours, using
> hexedit and dd. And I'm baffled:
> 
> md1 : active raid5 hda3[0] sda3[1]
>   969907968 blocks super 1.1 level 5, 128k chunk, algorithm 2 [3/2] [UU_]
   ^^^

You have the raid superblock at the front of hda3 and sda3, as well as a
bitmap, I assume, which means that any data you write to md1 will
actually be written to sda3/hda3 *after* the superblock and bitmap.  If
you run mdadm -D /dev/md1 it will tell you the data offset (in sectors
IIRC).  When you hexedit hda3, you need to jump forward the same number
of sectors to get at the beginning of the actual md1 data.

Of course, being raid5 with one disk missing, the data may or may not be
present in its non-parity format depending on exactly which block you
are looking at.
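To locate md1's data on the members by hand you need both the data offset and the RAID-5 layout. A minimal Python sketch, assuming md's left-symmetric layout (the "algorithm 2" in /proc/mdstat, "Layout : left-symmetric" in mdadm -D): parity for stripe 0 sits on the last member and moves one member to the left per stripe, and the data chunks of a stripe follow its parity chunk. The function name is hypothetical, and the 264-sector data offset is only the example value from Doug's mdadm -E listing later in this thread, not Janek's array.

```python
# Sketch: md RAID-5 left-symmetric mapping (assumed layout, see above).
SECTOR = 512
KIB = 1024

def raid5_left_symmetric(md_offset, n_disks, chunk, data_offset_sectors):
    """Return (member index, byte offset on that member, parity member)."""
    data_disks = n_disks - 1
    chunk_no, within = divmod(md_offset, chunk)    # which data chunk, and where in it
    stripe, k = divmod(chunk_no, data_disks)       # stripe number, k-th data chunk in it
    parity_disk = data_disks - (stripe % n_disks)  # parity walks right-to-left
    disk = (parity_disk + 1 + k) % n_disks         # data starts just after parity
    member_offset = data_offset_sectors * SECTOR + stripe * chunk + within
    return disk, member_offset, parity_disk

# 3 members, 128 KiB chunks, 264-sector data offset (example value only)
for off in (0, 128 * KIB, 256 * KIB, 384 * KIB):
    disk, pos, parity = raid5_left_symmetric(off, 3, 128 * KIB, 264)
    print(f"md {off:#x} -> disk {disk} at {pos:#x} (parity on disk {parity})")
```

Under these assumptions md offsets 0 and 128 KiB land on disks 0 and 1 (hda3 and sda3), while 256 KiB lands on disk 2, which is the removed member in this array; that chunk then exists only implicitly through parity, which fits the remark about degraded data above.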

However, you don't really need to do anything to figure out the stripe
size on your array, you have it already.  All the talk about figuring
out stripe layouts is for external raid devices that hide the raid
layout from you.  When you are talking about your own raid device that
you created with mdadm, you specified the stripe layout when you created
the array.  In your case, the chunk size is 128K, and since you have a 3
disk raid5 array and one chunk in each stripe of a raid5 array is
parity, the amount of data stored per stripe is chunk size * (number of
disks - 1), so 256K in your case.  But you don't even have to align the
lvm to the stripe, just to a chunk, so you really only need to align the
lvm superblock so that data starts at 128K offset into the raid array.
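The arithmetic above as a minimal Python sketch (hypothetical helper names): data per stripe is chunk × (disks − 1), and the condition that actually matters for LVM is that the LV's first data byte falls on a chunk boundary. The 192 KiB value is only an illustrative misaligned offset, not a statement about LVM's default.

```python
# Sketch: RAID-5 data per stripe, and chunk alignment of the LV data start.
KIB = 1024

def data_per_stripe(chunk: int, n_disks: int) -> int:
    """One chunk per stripe holds parity, so data = chunk * (n_disks - 1)."""
    return chunk * (n_disks - 1)

def chunk_aligned(lv_data_start: int, chunk: int) -> bool:
    """True if the LV's first data byte sits on a chunk boundary of the array."""
    return lv_data_start % chunk == 0

chunk = 128 * KIB
print(data_per_stripe(chunk, 3) // KIB)  # prints 256
print(chunk_aligned(128 * KIB, chunk))   # prints True
print(chunk_aligned(192 * KIB, chunk))   # prints False (hypothetical 192 KiB offset)
```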

-- 
Doug Ledford <[EMAIL PROTECTED]>
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband




Re: stride / stripe alignment on LVM ?

2007-11-03 Thread Janek Kozicki
Doug Ledford said: (by the date of Sat, 03 Nov 2007 14:40:48 -0400)

> so you really only need to align the
> lvm superblock so that data starts at 128K offset into the raid array.

Sorry, I thought it would be easier to figure this out
experimentally: put LVM here or there, write 128k of data to the
disc (inside the LVM partition), then see (with hexedit) whether this data is
really split across several discs or not.

In fact I even managed to find where the LVM superblock starts inside
the RAID; the problem for me was that I wasn't sure where it ends and
where the actual data starts, and *THAT* data has to be aligned to a
128K offset. Now I know that I should simply look more carefully at the
LVM manuals, to see exactly what the size of the LVM superblock is.

So I was unable to do a simple 128k test like this:

# dd if=./128k_of_0xAA of=/dev/lvm_raid5/test

and then look for 128k (or 64k or 32k) of 0xAA on hda3 and sda3.
But most of the time was spent searching for the pattern
(scanning the disc). So my efficiency was low, and in fact I should
have simply used smaller test partitions (e.g. hda4, sda4 with
just 20MB), so scanning would be faster.
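Scanning a whole partition in hexedit is slow; a small script can do the search instead. A minimal Python sketch: it builds the 128 KiB 0xAA pattern used above and reports the first offset where the full pattern appears in a file or block device, reading 1 MiB at a time and keeping an overlap so a match spanning two reads is not missed. The function names are hypothetical, and reading a raw device normally needs root.

```python
# Sketch: write a 128 KiB 0xAA test pattern and find it on a member device.
PATTERN = b"\xAA" * (128 * 1024)

def write_pattern(path):
    """Create the pattern file used as dd's input (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(PATTERN)

def find_first(path, needle, block=1024 * 1024):
    """Return the byte offset of the first occurrence of needle in path, or -1."""
    overlap = len(needle) - 1
    with open(path, "rb") as f:
        buf = b""
        base = 0  # offset in the file that corresponds to buf[0]
        while True:
            data = f.read(block)
            if not data:
                return -1
            buf += data
            hit = buf.find(needle)
            if hit >= 0:
                return base + hit
            # keep only the tail that could still start a match in the next read
            keep = min(overlap, len(buf))
            base += len(buf) - keep
            buf = buf[-keep:] if keep else b""
```

For example, `find_first("/dev/hda3", PATTERN)` after the dd above. If the test write was split across two chunks, the full 128 KiB pattern will not be found on any single member, which is itself the answer; searching for a shorter piece such as `PATTERN[:32 * 1024]` then shows where the parts landed. A single repeated byte can also match one byte early if the data just before it happens to be 0xAA.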

With smaller test partitions perhaps I'd have had enough time to overcome
the main difficulty: dealing with a degraded array (and encoded data).

Possibly I'll try this next time, when I buy a fourth disc for the
array (next year), so I'll be able to have two degraded arrays
of two discs at the same time. Then I could use LVM again and
"dd" all data from the old array to the new one, then grow the new array
to use all 4 HDDs.

Currently I just formatted /dev/md1 with ext3, without LVM.

Thanks, I have to remember that in 1.1 the superblock is at the front.
And I shouldn't forget about the bitmap either :)

> If you run mdadm -D /dev/md1 it will tell you the data offset
> (in sectors IIRC).

Uh, I don't see it:

backup:~# mdadm -D /dev/md1
/dev/md1:
        Version : 01.01.03
  Creation Time : Fri Nov  2 23:35:37 2007
     Raid Level : raid5
     Array Size : 966807296 (922.02 GiB 990.01 GB)
    Device Size : 966807296 (461.01 GiB 495.01 GB)
   Raid Devices : 3
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Sat Nov  3 20:59:06 2007
          State : active, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

           Name : backup:1  (local to host backup)
           UUID : 22f22c35:99613d52:31d407a6:55bdeb84
         Events : 39975

    Number   Major   Minor   RaidDevice State
       0       3        3        0      active sync   /dev/hda3
       1       8        3        1      active sync   /dev/sda3
       2       0        0        2      removed


thanks again for all your helpful responses!
-- 
Janek Kozicki  |


Re: stride / stripe alignment on LVM ?

2007-11-03 Thread Doug Ledford
On Sat, 2007-11-03 at 21:21 +0100, Janek Kozicki wrote:

> > If you run mdadm -D /dev/md1 it will tell you the data offset
> > (in sectors IIRC).
> 
> Uh, I don't see it:

Sorry, it's part of mdadm -E instead:

[EMAIL PROTECTED] ~]# mdadm -E /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x1
     Array UUID : c746e4f5:b015ffac:7216dbbd:48d973a7
           Name : firewall:home:2
  Creation Time : Mon May 28 20:47:07 2007
     Raid Level : raid1
   Raid Devices : 2

  Used Dev Size : 625137018 (298.09 GiB 320.07 GB)
     Array Size : 625137018 (298.09 GiB 320.07 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 7efd05d5:dd921536:1d1a1750:6ba49303

Internal Bitmap : 8 sectors from superblock
    Update Time : Sat Nov  3 21:01:24 2007
       Checksum : 27b3958f - correct
         Events : 2


     Array Slot : 0 (0, 1)
    Array State : Uu
[EMAIL PROTECTED] ~]# 
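Data Offset is given in 512-byte sectors, so for this member md's data starts 264 × 512 = 135168 bytes (0x21000) into /dev/sdc1; that is where a hexedit of the member has to start to see the beginning of the array's data. A small Python sketch that pulls the value out of mdadm -E output (the function name is hypothetical, and the regular expression assumes the "Data Offset : N sectors" wording shown above).

```python
import re

SECTOR = 512

def data_offset_bytes(mdadm_examine_output: str) -> int:
    """Extract 'Data Offset : N sectors' from mdadm -E output, in bytes."""
    m = re.search(r"Data Offset\s*:\s*(\d+)\s+sectors", mdadm_examine_output)
    if m is None:
        raise ValueError("no 'Data Offset' line found")
    return int(m.group(1)) * SECTOR

sample = "    Data Offset : 264 sectors\n   Super Offset : 0 sectors\n"
off = data_offset_bytes(sample)
print(off, hex(off))  # prints "135168 0x21000"
```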





Re: stride / stripe alignment on LVM ?

2007-11-07 Thread Goswin von Brederlow
Neil Brown <[EMAIL PROTECTED]> writes:

> On Thursday November 1, [EMAIL PROTECTED] wrote:
>> Hello,
>> 
>> I have raid5 /dev/md1, --chunk=128 --metadata=1.1. On it I have
>> created LVM volume called 'raid5', and finally a logical volume
>> 'backup'.
>> 
>> Then I formatted it with command:
>> 
>>mkfs.ext3 -b 4096 -E stride=32 -E resize=550292480 /dev/raid5/backup
>> 
>> And because LVM is putting its own metadata on /dev/md1, the ext3
>> partition is shifted by some (unknown for me) amount of bytes from
>> the beginning of /dev/md1.
>> 
>> I was wondering, how big is the shift, and would it hurt the
>> performance/safety if the `ext3 stride=32` didn't align perfectly
>> with the physical stripes on HDD?
>
> It is probably better to ask this question on an ext3 list as people
> there might know exactly what 'stride' does.
>
> I *think* it causes the inode tables to be offset in different
> block-groups so that they are not all on the same drive.  If that is
> the case, then an offset causes by LVM isn't going to make any
> difference at all.
>
> NeilBrown

AFAIK that is true, and I have never found any significant speed difference
in ext3, no matter what stripe size I select. The natural speed
fluctuations of e.g. bonnie seem to be bigger than the difference the
stripe size option makes.

But then again, I test with large files, so inode operations are not that
common. You probably have to test creating lots of directories and small
files, and deleting files, to see any effect.
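A minimal Python sketch of that kind of metadata-heavy test: it creates many directories of small files on the filesystem under test, then deletes them, timing each phase. The counts, file size and function name are arbitrary choices, not an established benchmark, and since run-to-run noise is large, it is worth repeating each run several times per mkfs setting.

```python
import os
import shutil
import tempfile
import time

def _flush():
    # os.sync() exists on Unix; skip it where it is unavailable
    if hasattr(os, "sync"):
        os.sync()

def small_file_benchmark(root, n_dirs=20, files_per_dir=500, size=2048):
    """Time creating and then deleting many small files under `root`."""
    payload = b"x" * size
    work = tempfile.mkdtemp(dir=root)  # private scratch directory
    t0 = time.perf_counter()
    for d in range(n_dirs):
        path = os.path.join(work, f"d{d}")
        os.mkdir(path)
        for i in range(files_per_dir):
            with open(os.path.join(path, f"f{i}"), "wb") as f:
                f.write(payload)
    _flush()  # include writeback in the create phase
    t1 = time.perf_counter()
    shutil.rmtree(work)
    _flush()
    t2 = time.perf_counter()
    return {"files": n_dirs * files_per_dir,
            "create_s": t1 - t0,
            "delete_s": t2 - t1}
```

For example `small_file_benchmark("/mnt/test")`, where /mnt/test is a hypothetical mount point of the filesystem being compared.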

MfG
Goswin


Re: stride / stripe alignment on LVM ?

2007-11-07 Thread Goswin von Brederlow
Janek Kozicki <[EMAIL PROTECTED]> writes:

> Doug Ledford said: (by the date of Sat, 03 Nov 2007 14:40:48 -0400)
>
>> so you really only need to align the
>> lvm superblock so that data starts at 128K offset into the raid array.
>
> Sorry, I thought that it will be easier to figure this out
> experimentally - put LVM here or there, write 128k of data to the
> disc (inside LVM partition), then see (with hexedit) if this data is
> really split across several discs or not.
>
> In fact I even managed to find where LVM superblock starts inside
> RAID, the problem for me was that I wasn't sure where it ends, and
> where the actual data, starts, and *THAT* data has to be aligned on
> 128K offset. Now I know that I should simply look more carefully at
> LVM manuals, to see exactly what is the size of LVM superblock.

I would just check with "dmsetup table dev" which sector offsets the
logical volume gets mapped to.
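The linear target lines printed by dmsetup table have the form `<start> <length> linear <device> <offset>`, with start, length and offset in 512-byte sectors, so the offset column is where the LV begins on the PV. A minimal Python sketch that converts those lines to bytes and checks the offset against the 128 KiB chunk from this thread. The table line below is made up for illustration, and the parser expects the output for a single named device (for LVM typically `vg-lv`, e.g. raid5-backup), without the `name:` prefix dmsetup adds when listing all devices.

```python
SECTOR = 512
KIB = 1024

def linear_segments(table_text):
    """Yield (lv_start, length, device, dev_offset), in bytes, per linear line."""
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[2] == "linear":
            start, length, _, dev, offset = fields[:5]
            yield int(start) * SECTOR, int(length) * SECTOR, dev, int(offset) * SECTOR

def report(table_text, chunk=128 * KIB):
    for lv_start, length, dev, off in linear_segments(table_text):
        state = "aligned" if off % chunk == 0 else "NOT aligned"
        print(f"LV byte {lv_start} -> {dev} byte {off}: {state} to {chunk // KIB} KiB")

# Made-up table line: LV starts 384 sectors (192 KiB) into device 9:1
report("0 8388608 linear 9:1 384\n")
# prints "LV byte 0 -> 9:1 byte 196608: NOT aligned to 128 KiB"
```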

MfG
Goswin


Re: stride / stripe alignment on LVM ?

2007-11-11 Thread Alasdair G Kergon
On Wed, Nov 07, 2007 at 10:00:39AM +0100, Goswin von Brederlow wrote:
> I would just check with "dmsetup table dev" to which byte offsets the
> logical volume gets mapped to.
 
Or use:
  pvs -o+pe_start
(optionally with --units)
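For scripting, pvs can print pe_start as a plain number. A minimal Python sketch, assuming output from `pvs --noheadings --nosuffix --units b -o pv_name,pe_start` (one `name value` pair per line, integer bytes); the sample value is made up, and the 128 KiB chunk is the one from this thread.

```python
KIB = 1024

def pe_starts(pvs_output: str) -> dict:
    """Map PV name -> pe_start in bytes from 'pv_name pe_start' lines."""
    result = {}
    for line in pvs_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1].isdigit():
            result[fields[0]] = int(fields[1])
    return result

def check(pvs_output: str, chunk: int = 128 * KIB) -> None:
    for pv, start in pe_starts(pvs_output).items():
        state = "aligned" if start % chunk == 0 else "misaligned"
        print(f"{pv}: data starts at {start} bytes, {state}")

# Made-up sample in the assumed output shape
check("  /dev/md1 196608\n")  # prints "/dev/md1: data starts at 196608 bytes, misaligned"
```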

Alasdair
-- 
[EMAIL PROTECTED]