Re: Very slow LVM performance

2010-07-13 Thread Stan Hoeppner
Arcady Genkin put forth on 7/12/2010 10:49 PM:



> After dealing with all the idiosyncrasies of iSCSI and software RAID
> under Linux I am a bit skeptical whether what we are building is going
> to actually be better than a black-box fiber-attached RAID solution,
> but it surely is cheaper and more expandable.

I share your skepticism.  Cheaper in initial acquisition cost, yes, but maybe
not in long-term reliability and serviceability.  Have you performed manual
catastrophic iSCSI target node failure tests yet, and monitored the
node/disk/array reconstruction process to verify it all works as expected
without user interruption?  This is always the main concern with homegrown
storage systems of this nature, and it is where "black box" solutions typically
prove themselves more cost effective (at least in user good will $$) than
homebrew solutions.

I myself am a fan of Nexsan storage arrays.  They offer some of the least
expensive and most feature-rich, performant FC and iSCSI arrays on the
market.  Given what you've built, it would appear the SATABeast would fit your
needs: 42 SATA drives in a 4U chassis, dual controllers with 4 x 4Gb FC ports
and 4 x 1GbE iSCSI ports, 600MB/s sustained per controller, 1.2GB/s with both
controllers, up to 4GB read/write battery-backed cache per controller, and web
management/SNMP/email alerts via a dedicated 10/100 management Ethernet port,
etc.  The web management interface is particularly nice, making it almost too
easy to configure and manage arrays and LUN assignments.

http://www.nexsan.com/satabeast.php

One of these will run somewhere between $20-40k depending on disk qty/size/rpm
and whether you want/need both controllers.  They also offer a SAS version
with 15k rpm drives at higher cost.  I've installed a couple of the single
controller SATABeast models and the discontinued SATABlade model.  They've
performed flawlessly, with no drive failures to date.  Last I checked, Nexsan
still uses only Hitachi (formerly IBM) UltraStar drives.

Good product/solution all around.  If you end up in the market for a "black
box" storage solution after all, I'd recommend you start your search with
Nexsan.  I'm not selling here, just a very happy customer.

-- 
Stan


-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/4c3ce8e8.1010...@hardwarefreak.com



Re: Very slow LVM performance

2010-07-12 Thread Arcady Genkin
On Mon, Jul 12, 2010 at 22:28, Stan Hoeppner  wrote:
> I'm curious as to why you're (apparently) wasting 2/3 of your storage for
> redundancy.  Have you considered a straight RAID 10 across those 30
> disks/LUNs?

This is a very good question.  And the answer is: because Linux's MD
does not implement RAID10 the way we expected (as you have found out
for yourself).  We started out thinking exactly that: we'd have a
RAID10 stripe with a cardinality of 3, instead of the multi-layered MD
design.  But for us it's important to have full control over which
physical disks form the triplets (see below for discussion); MD's
so-called RAID10 only guarantees that there will be exactly N copies
of each chunk on N different drives, but makes no promise as to
*which* drives.

The reason the drive assignment is important to us is that we can
achieve more data redundancy if we form each triplet from iSCSI disks
that live on three different iSCSI targets (hosts).

Suppose that you have six iSCSI target hosts h0 through h5, and each
of them has five disks d0 through d4.  If you form the first triplet
as (h0:d0, h1:d0, h2:d0), and so forth up to (h3:d4, h4:d4, h5:d4),
then if any single iSCSI host goes down for whatever reason, all
triplets still stay up and are still redundant, only running on two
copies instead of three.

Linux's RAID10 implementation did not allow us to do this.  So we had
to layer: first create the RAID1 (or RAID10 with n=3) triplets, then
stripe over them in a higher layer.
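
To make that concrete, the first triplet would be created along these
lines (the /dev/sd names below are purely illustrative - they are
whatever the initiator happens to assign to h0:d0, h1:d0 and h2:d0):

mdadm --create /dev/md0 -v --raid-devices=3 --level=raid10 --layout=n3 \
  --chunk=1024 /dev/sdb /dev/sdh /dev/sdn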

> I'm also curious as to why you're running software RAID at all, given the fact
> that pretty much every iSCSI target is itself an array controller with built-in
> hardware RAID.  Can you tell us a little bit about your iSCSI target
> devices?

Our boss wanted us to only use commodity hardware to build this
solution, so we don't employ any fancy RAID controllers - all drives
are connected to on-board SATA ports.  Staying away from the "black
box" implementations as much as possible was also part of the wish
list.

After dealing with all the idiosyncrasies of iSCSI and software RAID
under Linux I am a bit skeptical whether what we are building is going
to actually be better than a black-box fiber-attached RAID solution,
but it surely is cheaper and more expandable.
-- 
Arcady Genkin


--
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/aanlktinwfqgcutlg6rm0c5pmgtgzwnvfxc-kbzsdq...@mail.gmail.com



Re: Very slow LVM performance

2010-07-12 Thread Aaron Toponce
On 07/12/2010 06:26 PM, Stan Hoeppner wrote:
> Now, you can argue what RAID 10 is from now until you are blue in the face,
> and the list is tired of hearing it.  But that won't change the industry
> definition of RAID 10.  It's been well documented for over 15 years and won't
> be changing any time soon.

Your lack of understanding of the content and subject matter is rather
unfortunate.

-- 
. O .   O . O   . . O   O . .   . O .
. . O   . O O   O . O   . O O   . . O
O O O   . O .   . O O   O O .   O O O



signature.asc
Description: OpenPGP digital signature


Re: Very slow LVM performance

2010-07-12 Thread Arcady Genkin
On Mon, Jul 12, 2010 at 20:06, Stan Hoeppner  wrote:

> I had the same reaction Mike.  Turns out mdadm actually performs RAID 1E with
> 3 disks when you specify RAID 10.  I'm not sure what, if any, benefit RAID 1E
> yields here--almost nobody uses it.

The people who are surprised to see us do RAID10 over three devices
probably overlooked that we do RAID10 with a cardinality of 3, which,
in combination with "--layout=n3", is almost equivalent to creating a
three-way RAID1 mirror.  I say "almost" because it is equivalent in as
much as each of the three disks is an exact copy of the others; the
difference is in performance.

We found out empirically (and then confirmed by reading a number of
posts on the 'net) that MD does not implement RAID1 in, let's say, the
most desirable way.  In particular, it does not make use of the data
redundancy for reading when you have only one process doing the
reading.  In other words, if you have a three-way RAID1 mirror and
only one reader process, MD will read from only one of the disks, so
you get no performance benefit from the mirror.  If you have more than
one large read, or more than one process reading, then MD does the
right thing and uses the disks in what seems to be a round-robin
fashion (I may be wrong about this).

When we tried using RAID10 with n=3 instead of RAID1, we saw much
better performance, and we verified that all three disks are
bit-to-bit exact copies.
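
For anyone who wants to reproduce the comparison, something like the
following should do it (device names are made up; the dd count matches
our usual 20 GB read):

mdadm --create /dev/md0 -v --raid-devices=3 --level=raid1 \
  /dev/sdX /dev/sdY /dev/sdZ
dd of=/dev/null bs=8K count=2500000 if=/dev/md0

mdadm --create /dev/md1 -v --raid-devices=3 --level=raid10 --layout=n3 \
  --chunk=1024 /dev/sdU /dev/sdV /dev/sdW
dd of=/dev/null bs=8K count=2500000 if=/dev/md1

# spot-check that the n3 members are identical copies (first 1 GB of the
# data area; the 0.90 superblocks at the end of each disk will differ)
cmp -n 1073741824 /dev/sdU /dev/sdV && cmp -n 1073741824 /dev/sdV /dev/sdW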

> I just hope the OP gets prompt and concise drive failure information the
> instant one goes down, and has a tested array rebuild procedure in place.
> Rebuilding a failed drive in this kind of setup may get a bit hairy.

Actually, it's the other way around, because you get quite a bit of
redundancy from the three-way mirroring.  You are still redundant if
you lose just one drive, and we are planning to have about four global
hot spares standing by in case a drive fails.
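
One way we could wire up such global spares is mdadm's spare-group
mechanism; a sketch of an mdadm.conf fragment (UUIDs omitted, names
illustrative):

ARRAY /dev/md0 UUID=... spare-group=global
ARRAY /dev/md1 UUID=... spare-group=global
...
# "mdadm --monitor --scan" will then move an idle spare to any array in
# the same spare-group when one of that array's members fails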
-- 
Arcady Genkin


--
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/aanlktil-nzmsyi8ubqnblrbhqkvdr4angpgsvi9uh...@mail.gmail.com



Re: Very slow LVM performance

2010-07-12 Thread Stan Hoeppner
Arcady Genkin put forth on 7/12/2010 12:45 PM:
> I just tried to use LVM for striping the RAID1 triplets together
> (instead of MD).  Using the following three commands to create the
> logical volume, I get 550 MB/s sequential read speed, which is quite
> a bit faster than before, but is still 10% slower than what a plain MD
> RAID0 stripe can do with the same disks (612 MB/s).
> 
>   pvcreate /dev/md{0,5,1,6,2,7,3,8,4,9}
>   vgcreate vg0 /dev/md{0,5,1,6,2,7,3,8,4,9}
>   lvcreate -i 10 -I 1024 -l 102390 vg0
> 
> test4:~# dd of=/dev/null bs=8K count=2500000 if=/dev/vg0/lvol0
> 2500000+0 records in
> 2500000+0 records out
> 20480000000 bytes (20 GB) copied, 37.2381 s, 550 MB/s
> 
> I would still like to know why LVM on top of RAID0 performs so poorly
> in our case.

I'm curious as to why you're (apparently) wasting 2/3 of your storage for
redundancy.  Have you considered a straight RAID 10 across those 30
disks/LUNs?  Performance should be enhanced by about 50% or more over your
current setup (assuming you're not currently hitting your ethernet b/w
limits), and you'd only be losing half your storage to fault tolerance
instead of 2/3rds of it.  RAID 10 has the highest fault tolerance of all
standard RAID levels and higher performance than anything but a straight stripe.

I'm guessing lvm wouldn't have any problems atop a straight mdadm RAID 10
across those 30 disks.  I'm also guessing the previous lvm problem you had was
probably due to running it atop nested mdadm RAID devices.  Straight mdadm
RAID 10 doesn't create or use nested devices.
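
Roughly what I have in mind, as a sketch only (the device names assume
your 30 iSCSI disks show up as sdb..sdz and sdaa..sdae; pick your own
chunk size):

mdadm --create /dev/md0 -v --level=raid10 --layout=n2 --chunk=1024 \
  --raid-devices=30 /dev/sd[b-z] /dev/sda[a-e]
pvcreate /dev/md0
vgcreate vg0 /dev/md0
lvcreate -l 100%FREE vg0    # or pass the exact extent count from vgdisplay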

I'm also curious as to why you're running software RAID at all, given the fact
that pretty much every iSCSI target is itself an array controller with built-in
hardware RAID.  Can you tell us a little bit about your iSCSI target devices?

-- 
Stan


-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/4c3bcf4a.6090...@hardwarefreak.com



Re: Very slow LVM performance

2010-07-12 Thread Stan Hoeppner
Aaron Toponce put forth on 7/12/2010 6:56 PM:

> The argument is not whether Linux software RAID 10 is standard or not,
> but the requirement of the number of disks that Linux software RAID
> supports. In this case, it supports 2+ disks, regardless of what its
> "effectiveness" is.

Yes, it is the argument.  The argument is ensuring _accurate_ information is
presented here for the benefit of others who will go searching for this
information.

The _accurate_ information is that Linux software md RAID 10 on anything less
than 4 disks, or using the md RAID 10 "F2" layout on any number of disks, is
not standard RAID 10.  That is a very important distinction to make, and
that's the reason I'm making it.  That's what the current "argument" is about.

I made the statement that you can't run RAID 10 on 3 disks, and I, and the
list, were told that the information I presented was "incorrect".  It wasn't
incorrect at all.  The information presented in rebuttal to it is what was
incorrect.  I'm setting the record straight.

Now, you can argue what RAID 10 is from now until you are blue in the face,
and the list is tired of hearing it.  But that won't change the industry
definition of RAID 10.  It's been well documented for over 15 years and won't
be changing any time soon.

-- 
Stan


-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/4c3bb29b.20...@hardwarefreak.com



Re: Very slow LVM performance

2010-07-12 Thread Stan Hoeppner
Roger Leigh put forth on 7/12/2010 5:45 PM:

> Have a closer look at lvcreate(8).  The last arguments are:
> 
>[-Z|--zero y|n] VolumeGroupName [PhysicalVolumePath[:PE[-PE]]...]

Good catch.  As I said, I've never used it before, so I wasn't exactly sure how
it all fits together.  It seemed logical that when he went from testing the
mdadm device to the lvm volume and lost almost exactly 10x, a striping issue
wrt lvm might be in play.

> AFAICT the striping options are entirely pointless when layered on
> RAID, and could be responsible for the performance issues if they
> can have a negative impact (such as thrashing the disks if you
> tell it to write multiple stripes to a single disc).

I would have thought so as well, but didn't understand the exact function of
-i at the time.  I thought it was more like the xfs "-d sw=" switch.

From another post it looks like the OP is making some good progress, although
there are still some minor questions unanswered.

-- 
Stan


-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/4c3bb03a.3050...@hardwarefreak.com



Re: Very slow LVM performance

2010-07-12 Thread Stan Hoeppner
Mike Bird put forth on 7/12/2010 4:00 PM:
> On Mon July 12 2010 12:45:57 Arcady Genkin wrote:
>> Creating the ten 3-way RAID1 triplets - for N in 0 through 9:
>> mdadm --create /dev/mdN -v --raid-devices=3 --level=raid10 \
>>  --layout=n3 --metadata=0 --bitmap=internal --bitmap-chunk=2048 \
>>  --chunk=1024 /dev/sdX /dev/sdY /dev/sdZ
> 
> RAID 10 with three devices?

I had the same reaction Mike.  Turns out mdadm actually performs RAID 1E with
3 disks when you specify RAID 10.  I'm not sure what, if any, benefit RAID 1E
yields here--almost nobody uses it.

RAID 0 over (10 * RAID 1E) over 6 iSCSI targets isn't something I've ever seen
anyone do.  Not saying it's bad, just...unique.

I just hope the OP gets prompt and concise drive failure information the
instant one goes down, and has a tested array rebuild procedure in place.
Rebuilding a failed drive in this kind of setup may get a bit hairy.
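
A dry run against a scratch triplet before going live would settle it.
Something along these lines (device names hypothetical):

mdadm /dev/md0 --fail /dev/sdX      # simulate losing one member
mdadm /dev/md0 --remove /dev/sdX
mdadm /dev/md0 --add /dev/sdX       # or add a hot spare instead
cat /proc/mdstat                    # watch the resync run to completion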

-- 
Stan



-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/4c3badff.9050...@hardwarefreak.com



Re: Very slow LVM performance

2010-07-12 Thread Aaron Toponce
On 7/12/2010 5:52 PM, Stan Hoeppner wrote:
> Aaron Toponce put forth on 7/12/2010 5:16 PM:
>> On 7/12/2010 4:13 PM, Stan Hoeppner wrote:
>>> Is that a typo, or are you turning those 3 disk mdadm sets into RAID10 as
>>> shown above, instead of the 3-way mirror sets you stated previously?  RAID 
>>> 10
>>> requires a minimum of 4 disks, you have 3.  Something isn't right here...
>>
>> Incorrect. The Linux RAID implementation can do level 10 across 3 disks.
>> In fact, it can even do it across 2 disks.
> 
> Only throw the bold "incorrect" or "correct" statements around when you really
> know the subject material.  You don't.  Linux md RAID 10 is not standard RAID
> 10 when used on 2 and 3 drives.  When used on 3 drives it's actually RAID 1E,
> and on two drives it's the same as RAID1.  Another Wikipedia article linked
> within the one you quoted demonstrates this.  Note the page title
> "Non-standard_RAID_levels".

The argument is not whether Linux software RAID 10 is standard or not,
but the requirement of the number of disks that Linux software RAID
supports. In this case, it supports 2+ disks, regardless of what its
"effectiveness" is.

Try again.

-- 
. O .   O . O   . . O   O . .   . O .
. . O   . O O   O . O   . O O   . . O
O O O   . O .   . O O   O O .   O O O



signature.asc
Description: OpenPGP digital signature


Re: Very slow LVM performance

2010-07-12 Thread Stan Hoeppner
Aaron Toponce put forth on 7/12/2010 5:16 PM:
> On 7/12/2010 4:13 PM, Stan Hoeppner wrote:
>> Is that a typo, or are you turning those 3 disk mdadm sets into RAID10 as
>> shown above, instead of the 3-way mirror sets you stated previously?  RAID 10
>> requires a minimum of 4 disks, you have 3.  Something isn't right here...
> 
> Incorrect. The Linux RAID implementation can do level 10 across 3 disks.
> In fact, it can even do it across 2 disks.

Only throw the bold "incorrect" or "correct" statements around when you really
know the subject material.  You don't.  Linux md RAID 10 is not standard RAID
10 when used on 2 and 3 drives.  When used on 3 drives it's actually RAID 1E,
and on two drives it's the same as RAID1.  Another Wikipedia article linked
within the one you quoted demonstrates this.  Note the page title
"Non-standard_RAID_levels".

http://en.wikipedia.org/wiki/Non-standard_RAID_levels
Linux MD RAID 10

The Linux kernel software RAID driver (called md, for "multiple device") can
be used to build a classic RAID 1+0 array, but also (since version 2.6.9) as a
single level[4] with some interesting extensions[5].

The standard "near" layout, where each chunk is repeated n times in a k-way
stripe array, is equivalent to the standard RAID-10 arrangement, but it does
not require that n divide k. For example an n2 layout on 2, 3 and 4 drives
would look like:

2 drives          3 drives            4 drives
--------          ----------          --------------
A1  A1            A1  A1  A2          A1  A1  A2  A2
A2  A2            A2  A3  A3          A3  A3  A4  A4
A3  A3            A4  A4  A5          A5  A5  A6  A6
A4  A4            A5  A6  A6          A7  A7  A8  A8
..  ..            ..  ..  ..          ..  ..  ..  ..

*The 4-drive example is identical to a standard RAID-1+0 array, while the
3-drive example is a software implementation of RAID-1E. The 2-drive example
is equivalent to RAID 1.*

-- 
Stan





-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/4c3baabe.2080...@hardwarefreak.com



Re: Very slow LVM performance

2010-07-12 Thread Mike Bird
On Mon July 12 2010 15:16:47 Aaron Toponce wrote:
> Incorrect. The Linux RAID implementation can do level 10 across 3 disks.
> In fact, it can even do it across 2 disks.
>
> http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10

Thanks, I learned something new today.

Now I guess the question is, does LVM understand the performance
implications of 10 RAID-1E PVs, or would the OP be better off
assigning his 30 devices as 15 RAID-1 PVs?
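
For reference, the latter would look roughly like this (device names and
the extent count are placeholders, not a tested recipe):

For N in 0 through 14:
mdadm --create /dev/mdN -v --raid-devices=2 --level=raid1 /dev/sdX /dev/sdY

pvcreate /dev/md{0..14}
vgcreate vg0 /dev/md{0..14}
lvcreate -i 15 -I 1024 -l <extents> vg0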

--Mike Bird


-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/201007121559.05684.mgb-deb...@yosemite.net



Re: Very slow LVM performance

2010-07-12 Thread Roger Leigh
On Mon, Jul 12, 2010 at 05:13:16PM -0500, Stan Hoeppner wrote:
> Arcady Genkin put forth on 7/12/2010 11:52 AM:
> > On Mon, Jul 12, 2010 at 02:05, Stan Hoeppner  wrote:
> > 
> >> lvcreate -i 10 -I [stripe_size] -l 102389 vg0
> >>
> >> I believe you're losing 10x performance because you have a 10 "disk" mdadm
> >> stripe but you didn't inform lvcreate about this fact.
> > 
> > Hi, Stan:
> > 
> > I believe that the -i and -I options are for using *LVM* to do the
> > striping, am I wrong?  
> 
> If this were the case, lvcreate would require the set of physical or pseudo
> (mdadm) device IDs to stripe across, wouldn't it?  There are no options in
> lvcreate to specify physical or pseudo devices.  The only input to lvcreate is
> a volume group ID.  Therefore, lvcreate is ignorant of the physical devices
> underlying it, is it not?

Have a closer look at lvcreate(8).  The last arguments are:

   [-Z|--zero y|n] VolumeGroupName [PhysicalVolumePath[:PE[-PE]]...]

So after the VG name, you can explicitly specify the exact PVs (and even
ranges of PEs within them) that the striping configured by -i/-I is
spread across.
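
For example (hypothetical sizes and PV names):

   lvcreate -i 2 -I 64 -L 10G vg0 /dev/md0 /dev/md1

would stripe a 10G LV across just those two PVs within vg0.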

I'm unsure why one would necessarily /want/ to do that.  I run LVM
on top of md RAID1.  Here, I have a single PV on top of the RAID
array, and I can't see that adding additional striping on top of
that would benefit performance in any way.  I can only assume it
makes sense if you /don't/ have underlying RAID and want to tell
LVM to stripe over multiple PVs on different physical discs,
which /would/ have some performance impact since you spread the
I/O over multiple discs.

AFAICT the striping options are entirely pointless when layered on
RAID, and could be responsible for the performance issues if they
can have a negative impact (such as thrashing the disks if you
tell it to write multiple stripes to a single disc).


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?   http://gutenprint.sourceforge.net/
   `-GPG Public Key: 0x25BFB848   Please GPG sign your mail.


signature.asc
Description: Digital signature


Re: Very slow LVM performance

2010-07-12 Thread Aaron Toponce
On 7/12/2010 4:13 PM, Stan Hoeppner wrote:
> Is that a typo, or are you turning those 3 disk mdadm sets into RAID10 as
> shown above, instead of the 3-way mirror sets you stated previously?  RAID 10
> requires a minimum of 4 disks, you have 3.  Something isn't right here...

Incorrect. The Linux RAID implementation can do level 10 across 3 disks.
In fact, it can even do it across 2 disks.

http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10

-- 
. O .   O . O   . . O   O . .   . O .
. . O   . O O   O . O   . O O   . . O
O O O   . O .   . O O   O O .   O O O



signature.asc
Description: OpenPGP digital signature


Re: Very slow LVM performance

2010-07-12 Thread Stan Hoeppner
Arcady Genkin put forth on 7/12/2010 11:52 AM:
> On Mon, Jul 12, 2010 at 02:05, Stan Hoeppner  wrote:
> 
>> lvcreate -i 10 -I [stripe_size] -l 102389 vg0
>>
>> I believe you're losing 10x performance because you have a 10 "disk" mdadm
>> stripe but you didn't inform lvcreate about this fact.
> 
> Hi, Stan:
> 
> I believe that the -i and -I options are for using *LVM* to do the
> striping, am I wrong?  

If this were the case, lvcreate would require the set of physical or pseudo
(mdadm) device IDs to stripe across, wouldn't it?  There are no options in
lvcreate to specify physical or pseudo devices.  The only input to lvcreate is
a volume group ID.  Therefore, lvcreate is ignorant of the physical devices
underlying it, is it not?

> In our case (when LVM sits on top of one RAID0
> MD stripe) the option -i does not seem to make sense:
> 
> test4:~# lvcreate -i 10 -I 1024 -l 102380 vg0
>   Number of stripes (10) must not exceed number of physical volumes (1)

It makes sense once you accept the fact that lvcreate is ignorant of the
underlying disk device count/configuration.  Once you accept that, you will
realize the -i option is what allows one to tell lvcreate that there are, in
your case, 10 devices underlying it across which one desires to stripe data.
I believe the -i option exists merely to educate lvcreate about the
underlying device structure.

> My understanding is that LVM should be agnostic of what's underlying
> it as the physical storage, so it should treat the MD stripe as one
> large disk, and thus let the MD device handle the load balancing
> (which it seems to be doing fine).

If lvcreate is agnostic of the underlying structure, why does it have stripe
width and stripe size options at all?  As a parallel example, filesystems such
as XFS are ignorant of the underlying disk structure as well.  mkfs.xfs has no
fewer than four sub-options to optimize its performance atop RAID stripes.
One of its options, sw, specifies stripe width, which is the number of
physical or logical devices in the RAID stripe.  In your case, if you use
xfs, this would be "-d sw=10".  These options in lvcreate serve the same
function as those in mkfs.xfs, which is to optimize their performance atop a
RAID stripe.
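
For your geometry (1024KiB chunks, 10-wide stripe) the xfs equivalent
would be something like this (illustrative only, not a suggestion to
switch filesystems):

mkfs.xfs -d su=1024k,sw=10 /dev/vg0/lvol0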

> Besides, the speed we are getting from the LVM volume is less than
> half of what an individual component of the RAID10 stripe can do.  Even
> if we assume that LVM somehow manages to distribute its data so that it
> always hits only one physical disk (a disk triplet in our case), there
> would still be the question of why it is doing it *that* slow.  It's 57
> MB/s vs 134 MB/s that an individual triplet can do:

Forget comparing performance to one of your single mdadm mirror sets.  What's
key here, and why I suggested "lvcreate -i 10 .." to begin with, is the fact
that your lvm performance is almost exactly 10 times lower than the underlying
mdadm device, which has exactly 10 physical stripes.  Isn't that more than
just a bit coincidental?  The 10x drop only occurs when talking to the lvm
device.  Put on your Sherlock Holmes hat for a minute.

> We are using chunk size of 1024 (i.e. 1MB) with the MD devices.  For
> the record, we used the following commands to create the md devices:
> 
> For N in 0 through 9:
> mdadm --create /dev/mdN -v --raid-devices=3 --level=raid10 \
>   --layout=n3 --metadata=0 --bitmap=internal --bitmap-chunk=2048 \
>   --chunk=1024 /dev/sdX /dev/sdY /dev/sdZ

Is that a typo, or are you turning those 3 disk mdadm sets into RAID10 as
shown above, instead of the 3-way mirror sets you stated previously?  RAID 10
requires a minimum of 4 disks, you have 3.  Something isn't right here...

> Then the big stripe:
> mdadm --create /dev/md10 -v --raid-devices=10 --level=stripe \
>   --metadata=1.0 --chunk=1024 /dev/md{0,5,1,6,2,7,3,8,4,9}

And I'm pretty sure this is the stripe lvcreate needs to know about to fix the
10x performance drop issue.  Create a new lvm test volume with the lvcreate
options I've mentioned, and see how it performs against the current 400GB test
volume that's running slow.

-- 
Stan


-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/4c3b937c.1080...@hardwarefreak.com



Re: Very slow LVM performance

2010-07-12 Thread Mike Bird
On Mon July 12 2010 12:45:57 Arcady Genkin wrote:
> Creating the ten 3-way RAID1 triplets - for N in 0 through 9:
> mdadm --create /dev/mdN -v --raid-devices=3 --level=raid10 \
>  --layout=n3 --metadata=0 --bitmap=internal --bitmap-chunk=2048 \
>  --chunk=1024 /dev/sdX /dev/sdY /dev/sdZ

RAID 10 with three devices?

--Mike Bird


-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/201007121400.42233.mgb-deb...@yosemite.net



Re: Very slow LVM performance

2010-07-12 Thread Aaron Toponce
On 7/12/2010 1:45 PM, Arcady Genkin wrote:
> Creating the ten 3-way RAID1 triplets - for N in 0 through 9:
> mdadm --create /dev/mdN -v --raid-devices=3 --level=raid10 \
>  --layout=n3 --metadata=0 --bitmap=internal --bitmap-chunk=2048 \
>  --chunk=1024 /dev/sdX /dev/sdY /dev/sdZ
> 
> Then the big stripe:
> mdadm --create /dev/md10 -v --raid-devices=10 --level=stripe \
>  --metadata=1.0 --chunk=1024 /dev/md{0,5,1,6,2,7,3,8,4,9}

I must admit that I haven't seen a software RAID implementation where
you create multiple devices from the same set of disks and then stripe
across those devices. As such, when using LVM, I'm not exactly sure how
the kernel will handle that - mostly whether it will see the appropriate
amount of disk, and which physical extents it will use to place the data.
So for me, this is uncharted territory.

But, your commands look sound. I might suggest changing the default PE
size from 4MB to 1MB. That might help. Worth testing anyway. The PE size
can be changed with 'vgcreate -s 1M'.
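
E.g. (untested sketch, reusing your device names):

vgcreate -s 1M vg0 /dev/md10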

However, do you really want --bitmap with your mdadm command? I
understand the benefits, but using 'internal' does come with a
performance hit.

> From the man page to 'lvcreate' it seems that the -c option sets the
> chunk size for something snapshot-related, so it should have no
> bearing on our performance testing, which involved no snapshots.  Am I
> misreading the man page?

Ah yes, you are correct. I should probably pull up the man page before
replying. :)


-- 
. O .   O . O   . . O   O . .   . O .
. . O   . O O   O . O   . O O   . . O
O O O   . O .   . O O   O O .   O O O



signature.asc
Description: OpenPGP digital signature


Re: Very slow LVM performance

2010-07-12 Thread Arcady Genkin
On Mon, Jul 12, 2010 at 14:54, Aaron Toponce  wrote:
> Can you provide the commands from start to finish when building the volume?
>
> fdisk ...
> mdadm ...
> pvcreate ...
> vgcreate ...
> lvcreate ...

Hi, Aaron, I already provided all of the above commands in earlier
messages (except for fdisk, since we are giving the entire disks to
MD, not partitions).  I'll repeat them here for your convenience:

Creating the ten 3-way RAID1 triplets - for N in 0 through 9:
mdadm --create /dev/mdN -v --raid-devices=3 --level=raid10 \
 --layout=n3 --metadata=0 --bitmap=internal --bitmap-chunk=2048 \
 --chunk=1024 /dev/sdX /dev/sdY /dev/sdZ

Then the big stripe:
mdadm --create /dev/md10 -v --raid-devices=10 --level=stripe \
 --metadata=1.0 --chunk=1024 /dev/md{0,5,1,6,2,7,3,8,4,9}

Then the LVM business:
pvcreate /dev/md10
vgcreate vg0 /dev/md10
lvcreate -l 102389 vg0

Note that the file system is not being created on top of LVM at this
point, and I ran the test by simply dd-ing /dev/vg0/lvol0.

> My experience has been that LVM will introduce about a 1-2% performance
> hit compared to not using it

This is what we were expecting; that's encouraging.

> On a side note, I've never seen any reason to increase or decrease the
> chunk size with software RAID. However, you may want to match your chunk
> size with '-c' for 'lvcreate'.

We have tested a variety of chunk sizes (from 64K to 4MB) with
bonnie++ and found that 1MB chunks worked best for our usage, which is
a general-purpose NFS server, so it's mainly small random reads.  In
this scenario it's best to tune the chunk size to increase the
probability that a small read from the stripe results in only one read
from one disk.  If the chunk size is too small, then a 1KB read has a
pretty high chance of being fragmented between two chunks and thus
requiring two I/Os to service instead of one (and, thus, most likely
two drive head seeks instead of just one).  Modern commodity drives
can only do about 100-120 seeks per second.  But this is a side note
for your side note. :))
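
(Back-of-the-envelope, assuming reads start at uniformly random offsets:
a 1KB read straddles a chunk boundary with probability (1024-1)/65536,
or about 1.6%, for 64KB chunks, versus (1024-1)/1048576, or about 0.1%,
for 1MB chunks - so the larger chunk makes the double-seek case roughly
16 times rarer.)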

From the man page to 'lvcreate' it seems that the -c option sets the
chunk size for something snapshot-related, so it should have no
bearing on our performance testing, which involved no snapshots.  Am I
misreading the man page?

Thanks!
-- 
Arcady Genkin


-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/aanlktild4umo3vaq7h2fokbsnt8xl2fyi-8vtnpfm...@mail.gmail.com



Re: Very slow LVM performance

2010-07-12 Thread Aaron Toponce
On 7/12/2010 11:45 AM, Arcady Genkin wrote:
> I would still like to know why LVM on top of RAID0 performs so poorly
> in our case.

Can you provide the commands from start to finish when building the volume?

fdisk ...
mdadm ...
pvcreate ...
vgcreate ...
lvcreate ...

etc.

My experience has been that LVM will introduce about a 1-2% performance
hit compared to not using it, in many different situations, whether it
be on top of software/hardware RAID or on plain disks/partitions.  So I'm
curious what command-line options you're passing to each of your
commands, how you partitioned/built your disks, and so forth.  It might
help troubleshoot why you're seeing such a hit.

On a side note, I've never seen any reason to increase or decrease the
chunk size with software RAID. However, you may want to match your chunk
size with '-c' for 'lvcreate'.

-- 
. O .   O . O   . . O   O . .   . O .
. . O   . O O   O . O   . O O   . . O
O O O   . O .   . O O   O O .   O O O



signature.asc
Description: OpenPGP digital signature


Re: Very slow LVM performance

2010-07-12 Thread Arcady Genkin
I just tried to use LVM for striping the RAID1 triplets together
(instead of MD).  Using the following three commands to create the
logical volume, I get 550 MB/s sequential read speed, which is quite
a bit faster than before, but is still 10% slower than what a plain MD
RAID0 stripe can do with the same disks (612 MB/s).

  pvcreate /dev/md{0,5,1,6,2,7,3,8,4,9}
  vgcreate vg0 /dev/md{0,5,1,6,2,7,3,8,4,9}
  lvcreate -i 10 -I 1024 -l 102390 vg0

test4:~# dd of=/dev/null bs=8K count=2500000 if=/dev/vg0/lvol0
2500000+0 records in
2500000+0 records out
20480000000 bytes (20 GB) copied, 37.2381 s, 550 MB/s

I would still like to know why LVM on top of RAID0 performs so poorly
in our case.
-- 
Arcady Genkin


-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/aanlktilcdxiuexhnmb7jf9cxz9k_2tkvi_2qsjtld...@mail.gmail.com



Re: Very slow LVM performance

2010-07-12 Thread Arcady Genkin
On Mon, Jul 12, 2010 at 02:05, Stan Hoeppner  wrote:

> lvcreate -i 10 -I [stripe_size] -l 102389 vg0
>
> I believe you're losing 10x performance because you have a 10 "disk" mdadm
> stripe but you didn't inform lvcreate about this fact.

Hi, Stan:

I believe that the -i and -I options are for using *LVM* to do the
striping, am I wrong?  In our case (when LVM sits on top of one RAID0
MD stripe) the option -i does not seem to make sense:

test4:~# lvcreate -i 10 -I 1024 -l 102380 vg0
  Number of stripes (10) must not exceed number of physical volumes (1)

My understanding is that LVM should be agnostic of what's underlying
it as the physical storage, so it should treat the MD stripe as one
large disk, and thus let the MD device handle the load balancing
(which it seems to be doing fine).

Besides, the speed we are getting from the LVM volume is less than
half of what an individual component of the RAID10 stripe can do.  Even
if we assume that LVM somehow manages to distribute its data so that it
always hits only one physical disk (a disk triplet in our case), there
would still be the question of why it is doing it *that* slow.  It's 57
MB/s vs 134 MB/s that an individual triplet can do:

test4:~# dd of=/dev/null bs=8K count=2500000 if=/dev/md0
2500000+0 records in
2500000+0 records out
20480000000 bytes (20 GB) copied, 153.084 s, 134 MB/s

> If you specified a chunk size when you created the mdadm RAID 0 stripe, then
> use that chunk size for the lvcreate stripe_size.  Again, if performance is
> still lacking, recreate with whatever chunk size you specified in mdadm and
> multiply that by 10.

We are using chunk size of 1024 (i.e. 1MB) with the MD devices.  For
the record, we used the following commands to create the md devices:

For N in 0 through 9:
mdadm --create /dev/mdN -v --raid-devices=3 --level=raid10 \
  --layout=n3 --metadata=0 --bitmap=internal --bitmap-chunk=2048 \
  --chunk=1024 /dev/sdX /dev/sdY /dev/sdZ

Then the big stripe:
mdadm --create /dev/md10 -v --raid-devices=10 --level=stripe \
  --metadata=1.0 --chunk=1024 /dev/md{0,5,1,6,2,7,3,8,4,9}

Thanks,
-- 
Arcady Genkin


--
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/aanlktilk5for3gq2w9kvajfe7vgzvqmagyjjbkvfl...@mail.gmail.com



Re: Very slow LVM performance

2010-07-11 Thread Stan Hoeppner
Arcady Genkin put forth on 7/11/2010 10:46 PM:

>  lvcreate -l 102389 vg0

Should be:

lvcreate -i 10 -I [stripe_size] -l 102389 vg0

I believe you're losing 10x performance because you have a 10 "disk" mdadm
stripe but you didn't inform lvcreate about this fact.  Delete the LV, and
then recreate it with the above command line, specifying 64 for the stripe
size (the mdadm default).  If performance is still lacking, recreate it again
with 640 for the stripe size.  (I'm not exactly sure of the relationship
between mdadm chunk size and lvm stripe size--it's either equal, or it's mdadm
stripe width * mdadm chunk size)

If you specified a chunk size when you created the mdadm RAID 0 stripe, then
use that chunk size for the lvcreate stripe_size.  Again, if performance is
still lacking, recreate with whatever chunk size you specified in mdadm and
multiply that by 10.

Hope this helps.  Let us know.

-- 
Stan


-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/4c3ab09a.4090...@hardwarefreak.com



Very slow LVM performance

2010-07-11 Thread Arcady Genkin
I'm seeing a 10-fold performance hit when using an LVM2 logical volume
that sits on top of a RAID0 stripe.  Using dd to read directly from
the stripe (i.e. a large sequential read) I get speeds over 600MB/s.
Reading from the logical volume using the same method only gives
around 57MB/s.  I am new to LVM, and I need it for the snapshots.
Would anyone suggest where to start looking for the problem?

The server runs the amd64 version of Lenny.  Most packages (including
lvm2) are stock from Lenny, but we had to upgrade the kernel to the
one from lenny-backports (2.6.32).

There are ten RAID1 triplets: md0 through md9 (that's 30 physical
disks arranged into ten 3-way mirrors), connected over iSCSI from six
targets.  The ten triplets are then striped together into a RAID0
stripe, /dev/md10.  I don't think we have any issues with the MD
layers, because each of them seems to perform fairly well; it's when
we add LVM into the soup that the speeds start getting slow.

test4:~# uname -a
Linux test4 2.6.32-bpo.4-amd64 #1 SMP Thu Apr 8 10:20:24 UTC 2010
x86_64 GNU/Linux

test4:~# dd of=/dev/null bs=8K count=2500000 if=/dev/md10
2500000+0 records in
2500000+0 records out
20480000000 bytes (20 GB) copied, 33.4619 s, 612 MB/s

test4:~# dd of=/dev/null bs=8K count=2500000 if=/dev/vg0/lvol0
2500000+0 records in
2500000+0 records out
20480000000 bytes (20 GB) copied, 354.951 s, 57.7 MB/s

I used the following commands to create the volume group:

 pvcreate /dev/md10
 vgcreate vg0 /dev/md10
 lvcreate -l 102389 vg0

Here's what LVM reports of its devices:

test4:~# pvdisplay
 --- Physical volume ---
 PV Name   /dev/md10
 VG Name   vg0
 PV Size   399.96 GB / not usable 4.00 MB
 Allocatable   yes (but full)
 PE Size (KByte)   4096
 Total PE  102389
 Free PE   0
 Allocated PE  102389
 PV UUID   ocIGdd-cqcy-GNQl-jxRo-FHmW-THMi-fqofbd

test4:~# vgdisplay
 --- Volume group ---
 VG Name   vg0
 System ID
  Format                lvm2
 Metadata Areas1
 Metadata Sequence No  2
 VG Access read/write
 VG Status resizable
 MAX LV0
 Cur LV1
 Open LV   0
 Max PV0
 Cur PV1
 Act PV1
 VG Size   399.96 GB
 PE Size   4.00 MB
 Total PE  102389
 Alloc PE / Size   102389 / 399.96 GB
 Free  PE / Size   0 / 0
 VG UUID   o2TeAm-gPmZ-VvJc-OSfU-quvW-OB3a-y1pQaB

test4:~# lvdisplay
 --- Logical volume ---
 LV Name/dev/vg0/lvol0
 VG Namevg0
 LV UUIDQ3nA6w-0jgw-ImWY-IYJK-kvMJ-aybW-GAdoOs
 LV Write Accessread/write
 LV Status  available
 # open 0
 LV Size399.96 GB
 Current LE 102389
 Segments   1
 Allocation inherit
 Read ahead sectors auto
 - currently set to 256
 Block device   254:0

Many thanks in advance for any pointers!
-- 
Arcady Genkin


-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/aanlktiksmhwitdv1_iji72tak_1irx9dxpj2mccah...@mail.gmail.com