Re: RAID1 and load-balancing during read

2007-09-10 Thread Iustin Pop
On Mon, Sep 10, 2007 at 10:51:37PM +0300, Dimitrios Apostolou wrote:
> On Monday 10 September 2007 22:35:30 Iustin Pop wrote:
> > On Mon, Sep 10, 2007 at 10:29:30PM +0300, Dimitrios Apostolou wrote:
> > > Hello list,
> > >
> > > I just created a RAID1 array consisting of two disks. After experiments
> > > with processes *reading* from the device (badblocks, dd) and the iostat
> > > program, I can see that only one disk is being utilised for reading. To
> > > be exact, every time I execute the command one of the two disks is being
> > > randomly used, but the other one has absolutely no activity.
> > >
> > > My question is: why isn't load balancing happening? Is there an option
> > > I'm missing? Until now I thought it was the default for all RAID1
> > > implementations.
> >
> > Did you read the archives of this list? This question has been answered,
> > like, 4 times already in the last few months.
> >
> > And yes, the driver does do load balancing. Just not as RAID0 does,
> > since it's not RAID0.
> 
> Of course I did a quick search in the archives but couldn't find anything. 
Hmm, it's true that searching the archives does not turn up an
easy-to-find answer.

> I'll search better, thanks anyway. Moreover, I think I found the answer in 
> the code after posting. There is a comment somewhere in read_balance() 
> saying "Don't change to another disk for sequential reads". I have to study 
> it a bit to figure out *why* you chose that approach.
Well, from what I understand, you cannot make a mirror behave like a
stripe, plain and simple. There is no simple algorithm that makes
sequential reads from a mirror any faster.

OTOH, random I/O and multiple reading threads do get sped up by RAID1.
And people have said on the list that using the raid10 module with only
two disks, (IIRC) in offset or far mode, will give better read
performance, albeit at reduced write performance.
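
For example, something along these lines should show both members busy in
iostat (just a sketch; /dev/md0, /dev/sda1 and /dev/sdb1 are example names):

  # two concurrent sequential readers on the same RAID1 device
  dd if=/dev/md0 of=/dev/null bs=1M count=2048 &
  dd if=/dev/md0 of=/dev/null bs=1M count=2048 skip=2048 &
  iostat -kx 5 2

  # and the raid10 alternative mentioned above: two disks in 'far 2' layout
  mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=2 \
        /dev/sda1 /dev/sdb1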

Hmmm, I think a patch is needed to md.4 in order to explain this right
at the source of the confusion.

thanks,
iustin


[PATCH] Explain the read-balancing algorithm for RAID1 better in md.4

2007-09-10 Thread Iustin Pop
There are many questions on the mailing list about the RAID1 read
performance profile. This patch adds a new paragraph to the RAID1
section in md.4 that details what kind of speed-up one should expect
from RAID1.

Signed-off-by: Iustin Pop <[EMAIL PROTECTED]>
---
this patch is against the git tree of mdadm.

 md.4 |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/md.4 b/md.4
index cf423cb..db39aba 100644
--- a/md.4
+++ b/md.4
@@ -168,6 +168,13 @@ All devices in a RAID1 array should be the same size.  If they are
 not, then only the amount of space available on the smallest device is
 used (any extra space on other devices is wasted).
 
+Note that the read balancing done by the driver does not make the RAID1
+performance profile be the same as for RAID0; a single stream of
+sequential input will not be accelerated (e.g. a single dd), but
+multiple sequential streams or a random workload will use more than one
+spindle. In theory, having an N-disk RAID1 will allow N sequential
+threads to read from all disks.
+
 .SS RAID4
 
 A RAID4 array is like a RAID0 array with an extra device for storing
-- 
1.5.3.1



unreasonable latencies under heavy write load

2007-09-10 Thread Jeffrey W. Baker
I'm having some trouble with the system below:

Linux inhale 2.6.18-4-amd64 #1 SMP Thu May 10 01:01:58 UTC 2007 x86_64 GNU/Linux

md2 is a RAID1 of sda3 and sdb3:

md2 : active raid1 sda3[0] sdb3[1]
  484359680 blocks [2/2] [UU]

sda and sdb are both

  Vendor: ATA   Model: ST3500630AS   Rev: 3.AA
  Type:   Direct-Access  ANSI SCSI revision: 05

The problem is _extreme_ latencies under a write load, and also weird
accounting as seen in iostat.  I know I've complained about this over on
linux-mm, but with the raid1 it seems even worse than usual.  Nothing
happens on the system for literally minutes at a stretch.  It takes half
an hour to unpack a 1GB .zip archive (into a 7.4GB directory).  During
the lengthy pauses, md2_raid1 is the only process that gets any time
(normally < 1% CPU).  iostat reads weirdly (iostat -kx 10):

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.05    0.00    3.55   51.07    0.00   43.33

Device:  rrqm/s   wrqm/s    r/s       w/s    rkB/s     wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda       12.30  3495.00   5.80     89.80   916.80  14119.20   314.56   143.68 1522.19  10.46 100.04
sdb        0.00  3494.20   0.10     87.50     0.40  13920.40   317.83    25.08  274.39  10.89  95.44
md0        0.00     0.00   0.00      0.00     0.00      0.00     0.00     0.00    0.00   0.00   0.00
md1        0.00     0.00   0.00      0.00     0.00      0.00     0.00     0.00    0.00   0.00   0.00
md2        0.00     0.00  18.10   2425.10   904.40   9700.40     8.68     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    5.10   49.38    0.00   45.53

Device:  rrqm/s   wrqm/s    r/s       w/s    rkB/s     wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda        0.00  2954.60   0.20     99.10     0.80  12481.20   251.40   143.49 1449.52  10.07 100.04
sdb        0.00  2954.60   0.10     99.00     0.40  12240.40   247.04    39.32  398.24  10.09 100.04
md0        0.00     0.00   0.00      0.00     0.00      0.00     0.00     0.00    0.00   0.00   0.00
md1        0.00     0.00   0.00      0.00     0.00      0.00     0.00     0.00    0.00   0.00   0.00
md2        0.00     0.00   0.30  16917.30     1.20  67669.20     8.00     0.00    0.00   0.00   0.00

That seems a little weird to me.  Why is sda + sdb != md2?  Is md2
really issuing 17000 writes per second over a period of ten seconds?  If
so, why?  Also:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.25   48.75    0.00   50.00

Device:  rrqm/s   wrqm/s    r/s       w/s    rkB/s     wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda        0.00  2805.69   0.00    104.69     0.00  11638.72   222.35    87.11  848.35   9.54  99.84
sdb        0.00  2806.29   0.00    102.50     0.00  11471.06   223.84   143.09 1400.85   9.74  99.84
md0        0.00     0.00   0.00      0.00     0.00      0.00     0.00     0.00    0.00   0.00   0.00
md1        0.00     0.00   0.00      0.00     0.00      0.00     0.00     0.00    0.00   0.00   0.00
md2        0.00     0.00   0.00      0.00     0.00      0.00     0.00     0.00    0.00   0.00   0.00

Is that normal?
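
One way to cross-check the accounting (just a sketch; it samples the same raw
counters that iostat derives its numbers from, assuming I'm reading the
standard /proc/diskstats layout right):

  grep -E ' (sda|sdb|md2) ' /proc/diskstats
  sleep 10
  grep -E ' (sda|sdb|md2) ' /proc/diskstats
  # field 8 should be writes completed and field 10 sectors written; the
  # deltas over the 10 seconds ought to roughly match what iostat reports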

-jwb



Re: RAID1 and load-balancing during read

2007-09-10 Thread Dimitrios Apostolou
On Monday 10 September 2007 22:35:30 Iustin Pop wrote:
> On Mon, Sep 10, 2007 at 10:29:30PM +0300, Dimitrios Apostolou wrote:
> > Hello list,
> >
> > I just created a RAID1 array consisting of two disks. After experiments
> > with processes *reading* from the device (badblocks, dd) and the iostat
> > program, I can see that only one disk is being utilised for reading. To
> > be exact, every time I execute the command one of the two disks is being
> > randomly used, but the other one has absolutely no activity.
> >
> > My question is: why isn't load balancing happening? Is there an option
> > I'm missing? Until now I thought it was the default for all RAID1
> > implementations.
>
> Did you read the archives of this list? This question has been answered,
> like, 4 times already in the last few months.
>
> And yes, the driver does do load balancing. Just not as RAID0 does,
> since it's not RAID0.

Of course I did a quick search in the archives but couldn't find anything. 
I'll search better, thanks anyway. Moreover, I think I found the answer in 
the code after posting. There is a comment somewhere in read_balance() 
saying "Don't change to another disk for sequential reads". I have to study 
it a bit to figure out *why* you chose that approach.


Thanks, 
Dimitris


Re: RAID1 and load-balancing during read

2007-09-10 Thread Iustin Pop
On Mon, Sep 10, 2007 at 10:29:30PM +0300, Dimitrios Apostolou wrote:
> Hello list, 
> 
> I just created a RAID1 array consisting of two disks. After experiments with 
> processes *reading* from the device (badblocks, dd) and the iostat program, I 
> can see that only one disk is being utilised for reading. To be exact, every 
> time I execute the command one of the two disks is being randomly used, but 
> the other one has absolutely no activity. 
> 
> My question is: why isn't load balancing happening? Is there an option I'm 
> missing? Until now I thought it was the default for all RAID1 implementations.

Did you read the archives of this list? This question has been answered,
like, 4 times already in the last few months.

And yes, the driver does do load balancing. Just not as RAID0 does,
since it's not RAID0.

regards,
iustin


RAID1 and load-balancing during read

2007-09-10 Thread Dimitrios Apostolou
Hello list, 

I just created a RAID1 array consisting of two disks. After experiments with 
processes *reading* from the device (badblocks, dd) and the iostat program, I 
can see that only one disk is being utilised for reading. To be exact, every 
time I execute the command one of the two disks is being randomly used, but 
the other one has absolutely no activity. 

My question is: why isn't load balancing happening? Is there an option I'm 
missing? Until now I thought it was the default for all RAID1 implementations.
Even the md man page mentions in the RAID1 section:

  The driver attempts to distribute read requests across all devices to 
maximise performance.


Thanks in advance, 
Dimitris


[PATCH] Expose the degraded status of an assembled array through sysfs

2007-09-10 Thread Iustin Pop
The 'degraded' attribute is useful to quickly determine whether the array is
degraded, instead of parsing 'mdadm -D' output or relying on other techniques
(comparing the number of working devices against the number of defined
devices, etc.). The md code already keeps track of this attribute, so it's
useful to export it.

Signed-off-by: Iustin Pop <[EMAIL PROTECTED]>
---
Note: I sent this back in January and people agreed it was a good idea.
However, it has not been picked up, so here I am resending it.

Patch is against 2.6.23-rc5
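
With this applied, checking for a degraded array becomes a one-liner (a quick
sketch, assuming the array is md0 and a kernel carrying this patch):

  cat /sys/block/md0/md/degraded
  # prints 0 for a fully redundant array, otherwise the number of missing
  # member devices (mddev->degraded is a count, not a flag)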

Thanks,
Iustin Pop

 drivers/md/md.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index f883b7e..3e3ad71 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2842,6 +2842,12 @@ sync_max_store(mddev_t *mddev, const char *buf, size_t 
len)
 static struct md_sysfs_entry md_sync_max =
 __ATTR(sync_speed_max, S_IRUGO|S_IWUSR, sync_max_show, sync_max_store);
 
+static ssize_t
+degraded_show(mddev_t *mddev, char *page)
+{
+   return sprintf(page, "%i\n", mddev->degraded);
+}
+static struct md_sysfs_entry md_degraded = __ATTR_RO(degraded);
 
 static ssize_t
 sync_speed_show(mddev_t *mddev, char *page)
@@ -2985,6 +2991,7 @@ static struct attribute *md_redundancy_attrs[] = {
&md_suspend_lo.attr,
&md_suspend_hi.attr,
&md_bitmap.attr,
+   &md_degraded.attr,
NULL,
 };
 static struct attribute_group md_redundancy_group = {
-- 
1.5.3.1



Re: reducing the number of disks a RAID1 expects

2007-09-10 Thread Iustin Pop
On Sun, Sep 09, 2007 at 09:31:54PM -1000, J. David Beutel wrote:
> [EMAIL PROTECTED] ~]# mdadm --grow /dev/md5 -n2
> mdadm: Cannot set device size/shape for /dev/md5: Device or resource busy
>
> mdadm - v1.6.0 - 4 June 2004
> Linux 2.6.12-1.1381_FC3 #1 Fri Oct 21 03:46:55 EDT 2005 i686 athlon i386 
> GNU/Linux

I'm not sure that such an old kernel supports reshaping an array. The
mdadm version should not be a problem, as that message is probably
generated by the kernel.

I'd recommend trying to boot with a newer kernel, even if only for the
duration of the reshape.
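
With a recent kernel, the same command should then just work (a sketch):

  mdadm --grow /dev/md5 --raid-devices=2
  cat /proc/mdstat        # should afterwards show md5 as [2/2] [UU]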

regards,
iustin
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: reducing the number of disks a RAID1 expects

2007-09-10 Thread J. David Beutel

Richard Scobie wrote:

> Have a look at the "Grow Mode" section of the mdadm man page.


Thanks!  I overlooked that, although I did look at the man page before 
posting.


> It looks as though you should just need to use the same command you
> used to grow it to 3 drives, except specify only 2 this time.


I think I hot-added it.  Anyway, --grow looks like what I need, but I'm 
having some difficulty with it.  The man page says, "Change the size or 
shape of an active array."  But I got:


[EMAIL PROTECTED] ~]# mdadm --grow /dev/md5 -n2
mdadm: Cannot set device size/shape for /dev/md5: Device or resource busy
[EMAIL PROTECTED] ~]# umount /dev/md5
[EMAIL PROTECTED] ~]# mdadm --grow /dev/md5 -n2
mdadm: Cannot set device size/shape for /dev/md5: Device or resource busy

So I tried stopping it, but got:

[EMAIL PROTECTED] ~]# mdadm --stop /dev/md5
[EMAIL PROTECTED] ~]# mdadm --grow /dev/md5 -n2
mdadm: Cannot get array information for /dev/md5: No such device
[EMAIL PROTECTED] ~]# mdadm --query /dev/md5 --scan
/dev/md5: is an md device which is not active
/dev/md5: is too small to be an md component.
[EMAIL PROTECTED] ~]# mdadm --grow /dev/md5 --scan -n2
mdadm: option s not valid in grow mode

Am I trying the right thing, but running into some limitation of my 
version of mdadm or the kernel?  Or am I overlooking something 
fundamental yet again?  md5 looked like this in /proc/mdstat before I 
stopped it:


md5 : active raid1 hdc8[2] hdg8[1]
 58604992 blocks [3/2] [_UU]

For -n the man page says, "This  number can only be changed using --grow 
for RAID1 arrays, and only on kernels which provide necessary support."


Grow mode says, "Various types of growth may be added during 2.6
development, possibly including restructuring a raid5 array to have more
active devices. Currently the only support available is to change the
"size" attribute for arrays with redundancy, and the raid-disks attribute
of RAID1 arrays.  ...  When reducing the number of devices in a RAID1
array, the slots which are to be removed from the array must already be
vacant.  That is, the devices which were in those slots must be failed
and removed."


I don't know how I overlooked all that the first time, but I can't see 
what I'm overlooking now.
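
In my case the slot to be removed already looks vacant; a sketch of how one
can double-check that (assuming mdadm --detail lists the empty slot as
"removed"):

  mdadm --detail /dev/md5
  # the device table at the end should show two active devices and one
  # slot marked "removed"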


mdadm - v1.6.0 - 4 June 2004
Linux 2.6.12-1.1381_FC3 #1 Fri Oct 21 03:46:55 EDT 2005 i686 athlon i386 
GNU/Linux


Cheers,
11011011