Re: idle array consuming cpu ??!!

2008-01-23 Thread Neil Brown
On Tuesday January 22, [EMAIL PROTECTED] wrote:
> Neil Brown ([EMAIL PROTECTED]) wrote on 21 January 2008 12:15:
>  >On Sunday January 20, [EMAIL PROTECTED] wrote:
>  >> A raid6 array with a spare and bitmap is idle: not mounted and with no
>  >> IO to it or any of its disks (obviously), as shown by iostat. However
 >  >> it's consuming cpu: since reboot it used about 11min in 24h, which is quite
>  >> a lot even for a busy array (the cpus are fast). The array was cleanly
>  >> shutdown so there's been no reconstruction/check or anything else.
>  >> 
>  >> How can this be? Kernel is 2.6.22.16 with the two patches for the
>  >> deadlock ("[PATCH 004 of 4] md: Fix an occasional deadlock in raid5 -
>  >> FIX") and the previous one.
>  >
>  >Maybe the bitmap code is waking up regularly to do nothing.
>  >
>  >Would you be happy to experiment?  Remove the bitmap with
>  >   mdadm --grow /dev/mdX --bitmap=none
>  >
>  >and see how that affects cpu usage?
> 
> Confirmed, removing the bitmap stopped cpu consumption.

Thanks.

This patch should substantially reduce CPU consumption on an idle
bitmap.

NeilBrown

--
Reduce CPU wastage on idle md array with a write-intent bitmap.

On an md array with a write-intent bitmap, a thread wakes up every few
seconds and scans the bitmap looking for work to do.  If the array is
idle, there will be no work to do, but a lot of scanning is done to
discover this.

So cache the fact that the bitmap is completely clean, and avoid
scanning the whole bitmap when the cache is known to be clean.

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

### Diffstat output
 ./drivers/md/bitmap.c         |   19 +++++++++++++++++--
 ./include/linux/raid/bitmap.h |    2 ++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
--- .prev/drivers/md/bitmap.c   2008-01-24 15:53:45.0 +1100
+++ ./drivers/md/bitmap.c   2008-01-24 15:54:29.0 +1100
@@ -1047,6 +1047,11 @@ void bitmap_daemon_work(struct bitmap *b
if (time_before(jiffies, bitmap->daemon_lastrun + bitmap->daemon_sleep*HZ))
return;
bitmap->daemon_lastrun = jiffies;
+   if (bitmap->allclean) {
+   bitmap->mddev->thread->timeout = MAX_SCHEDULE_TIMEOUT;
+   return;
+   }
+   bitmap->allclean = 1;
 
for (j = 0; j < bitmap->chunks; j++) {
bitmap_counter_t *bmc;
@@ -1068,8 +1073,10 @@ void bitmap_daemon_work(struct bitmap *b
clear_page_attr(bitmap, page, BITMAP_PAGE_NEEDWRITE);
 
spin_unlock_irqrestore(&bitmap->lock, flags);
-   if (need_write)
+   if (need_write) {
write_page(bitmap, page, 0);
+   bitmap->allclean = 0;
+   }
continue;
}
 
@@ -1098,6 +1105,9 @@ void bitmap_daemon_work(struct bitmap *b
 /*
   if (j < 100) printk("bitmap: j=%lu, *bmc = 0x%x\n", j, *bmc);
 */
+   if (*bmc)
+   bitmap->allclean = 0;
+
if (*bmc == 2) {
*bmc=1; /* maybe clear the bit next time */
set_page_attr(bitmap, page, BITMAP_PAGE_CLEAN);
@@ -1132,6 +1142,8 @@ void bitmap_daemon_work(struct bitmap *b
}
}
 
+   if (bitmap->allclean == 0)
+   bitmap->mddev->thread->timeout = bitmap->daemon_sleep * HZ;
 }
 
 static bitmap_counter_t *bitmap_get_counter(struct bitmap *bitmap,
@@ -1226,6 +1238,7 @@ int bitmap_startwrite(struct bitmap *bit
sectors -= blocks;
else sectors = 0;
}
+   bitmap->allclean = 0;
return 0;
 }
 
@@ -1296,6 +1309,7 @@ int bitmap_start_sync(struct bitmap *bit
}
}
spin_unlock_irq(&bitmap->lock);
+   bitmap->allclean = 0;
return rv;
 }
 
@@ -1332,6 +1346,7 @@ void bitmap_end_sync(struct bitmap *bitm
}
  unlock:
spin_unlock_irqrestore(&bitmap->lock, flags);
+   bitmap->allclean = 0;
 }
 
 void bitmap_close_sync(struct bitmap *bitmap)
@@ -1399,7 +1414,7 @@ static void bitmap_set_memory_bits(struc
set_page_attr(bitmap, page, BITMAP_PAGE_CLEAN);
}
spin_unlock_irq(&bitmap->lock);
-
+   bitmap->allclean = 0;
 }
 
 /* dirty the memory and file bits for bitmap chunks "s" to "e" */

diff .prev/include/linux/raid/bitmap.h ./include/linux/raid/bitmap.h
--- .prev/include/linux/raid/bitmap.h   2008-01-24 15:53:45.0 +1100
+++ ./include/linux/raid/bitmap.h   2008-01-24 15:54:29.0 +1100
@@ -235,6 +235,8 @@ struct bitmap {
 
unsigned long flags;
 
+   int allclean;
+
unsigned long max_write_behind; /* write-behind mode */
atomi

Re: [BUG] The kernel thread for md RAID1 could cause a md RAID1 array deadlock

2008-01-23 Thread Neil Brown
On Tuesday January 15, [EMAIL PROTECTED] wrote:
> 
> This message describes the details about md-RAID1 issue found by
> testing the md RAID1 using the SCSI fault injection framework.
> 
> Abstract:
> Both the error handler for md RAID1 and write access requests to the md RAID1
> use the raid1d kernel thread. The nr_pending flag could cause a race condition
> in raid1d, resulting in a raid1d deadlock.

Thanks for finding and reporting this.

I believe the following patch should fix the deadlock.

If you are able to repeat your test and confirm this I would
appreciate it.

Thanks,
NeilBrown



Fix deadlock in md/raid1 when handling a read error.

When handling a read error, we freeze the array to stop any other
IO while attempting to over-write with correct data.

This is done in the raid1d thread and must wait for all submitted IO
to complete (except for requests that failed and are sitting in the
retry queue - these are counted in ->nr_queued and will stay there during
a freeze).

However write requests need attention from raid1d as bitmap updates
might be required.  This can cause a deadlock as raid1 is waiting for
requests to finish that themselves need attention from raid1d.

So we create a new function 'flush_pending_writes' to give that attention,
and call it in freeze_array to be sure that we aren't waiting on raid1d.

Thanks to "K.Tanaka" <[EMAIL PROTECTED]> for finding and reporting
this problem.

Cc: "K.Tanaka" <[EMAIL PROTECTED]>
Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

### Diffstat output
 ./drivers/md/raid1.c |   66 ++-
 1 file changed, 45 insertions(+), 21 deletions(-)

diff .prev/drivers/md/raid1.c ./drivers/md/raid1.c
--- .prev/drivers/md/raid1.c2008-01-18 11:19:09.0 +1100
+++ ./drivers/md/raid1.c2008-01-24 14:21:55.0 +1100
@@ -592,6 +592,37 @@ static int raid1_congested(void *data, i
 }
 
 
+static int flush_pending_writes(conf_t *conf)
+{
+   /* Any writes that have been queued but are awaiting
+* bitmap updates get flushed here.
+* We return 1 if any requests were actually submitted.
+*/
+   int rv = 0;
+
+   spin_lock_irq(&conf->device_lock);
+
+   if (conf->pending_bio_list.head) {
+   struct bio *bio;
+   bio = bio_list_get(&conf->pending_bio_list);
+   blk_remove_plug(conf->mddev->queue);
+   spin_unlock_irq(&conf->device_lock);
+   /* flush any pending bitmap writes to
+* disk before proceeding w/ I/O */
+   bitmap_unplug(conf->mddev->bitmap);
+
+   while (bio) { /* submit pending writes */
+   struct bio *next = bio->bi_next;
+   bio->bi_next = NULL;
+   generic_make_request(bio);
+   bio = next;
+   }
+   rv = 1;
+   } else
+   spin_unlock_irq(&conf->device_lock);
+   return rv;
+}
+
 /* Barriers
  * Sometimes we need to suspend IO while we do something else,
  * either some resync/recovery, or reconfigure the array.
@@ -678,10 +709,14 @@ static void freeze_array(conf_t *conf)
spin_lock_irq(&conf->resync_lock);
conf->barrier++;
conf->nr_waiting++;
+   spin_unlock_irq(&conf->resync_lock);
+
+   spin_lock_irq(&conf->resync_lock);
wait_event_lock_irq(conf->wait_barrier,
conf->barrier+conf->nr_pending == conf->nr_queued+2,
conf->resync_lock,
-   raid1_unplug(conf->mddev->queue));
+   ({ flush_pending_writes(conf);
+  raid1_unplug(conf->mddev->queue); }));
spin_unlock_irq(&conf->resync_lock);
 }
 static void unfreeze_array(conf_t *conf)
@@ -907,6 +942,9 @@ static int make_request(struct request_q
blk_plug_device(mddev->queue);
spin_unlock_irqrestore(&conf->device_lock, flags);
 
+   /* In case raid1d snuck into freeze_array */
+   wake_up(&conf->wait_barrier);
+
if (do_sync)
md_wakeup_thread(mddev->thread);
 #if 0
@@ -1473,28 +1511,14 @@ static void raid1d(mddev_t *mddev)

for (;;) {
char b[BDEVNAME_SIZE];
-   spin_lock_irqsave(&conf->device_lock, flags);
-
-   if (conf->pending_bio_list.head) {
-   bio = bio_list_get(&conf->pending_bio_list);
-   blk_remove_plug(mddev->queue);
-   spin_unlock_irqrestore(&conf->device_lock, flags);
-   /* flush any pending bitmap writes to disk before proceeding w/ I/O */
-   bitmap_unplug(mddev->bitmap);
-
-   while (bio) { /* submit pending writes */
-   struct bio *next = bio->bi_next;
-   bio->bi_next = NULL;
-   generic_make_request(b

Re: idle array consuming cpu ??!!

2008-01-23 Thread Bill Davidsen

Carlos Carvalho wrote:

Bill Davidsen ([EMAIL PROTECTED]) wrote on 22 January 2008 17:53:
 >Carlos Carvalho wrote:
 >> Neil Brown ([EMAIL PROTECTED]) wrote on 21 January 2008 12:15:
 >>  >On Sunday January 20, [EMAIL PROTECTED] wrote:
 >>  >> A raid6 array with a spare and bitmap is idle: not mounted and with no
 >>  >> IO to it or any of its disks (obviously), as shown by iostat. However
 >>  >> it's consuming cpu: since reboot it used about 11min in 24h, which is quite
 >>  >> a lot even for a busy array (the cpus are fast). The array was cleanly
 >>  >> shutdown so there's been no reconstruction/check or anything else.
 >>  >> 
 >>  >> How can this be? Kernel is 2.6.22.16 with the two patches for the
 >>  >> deadlock ("[PATCH 004 of 4] md: Fix an occasional deadlock in raid5 -
 >>  >> FIX") and the previous one.
 >>  >
 >>  >Maybe the bitmap code is waking up regularly to do nothing.
 >>  >
 >>  >Would you be happy to experiment?  Remove the bitmap with
 >>  >   mdadm --grow /dev/mdX --bitmap=none
 >>  >
 >>  >and see how that affects cpu usage?
 >>
 >> Confirmed, removing the bitmap stopped cpu consumption.
 >
 >Looks like quite a bit of CPU going into idle arrays here, too.

I don't mind the cpu time (in the machines where we use it here), what
worries me is that it shouldn't happen when the disks are completely
idle. Looks like there's a bug somewhere.


That's my feeling. I have one array with an internal bitmap and one with 
no bitmap, and the one with the internal bitmap uses CPU even when the 
machine is idle. I have *not* tried an external bitmap.


--
Bill Davidsen <[EMAIL PROTECTED]>
 "Woe unto the statesman who makes war without a reason that will still
 be valid when the war is over..." Otto von Bismark 





Re: Fwd: Error on /dev/sda, but takes down RAID-1

2008-01-23 Thread Neil Brown
On Wednesday January 23, [EMAIL PROTECTED] wrote:
> Hi, 
> 
> I'm not sure this is completely linux-raid related, but I can't figure out 
> where to start: 
> 
> A few days ago, my server died. I was able to log in and salvage this content 
> of dmesg: 
> http://pastebin.com/m4af616df 

At line 194:

   end_request: I/O error, dev sdb, sector 80324865

then at line 384

   end_request: I/O error, dev sda, sector 80324865

> 
> I talked to my hosting-people and they said it was an io-error on /dev/sda, 
> and replaced that drive. 
> After this, I was able to boot into a PXE-image and re-build the two RAID-1 
> devices with no problems - indicating that sdb was fine. 
> 
> I expected RAID-1 to be able to stomach exactly this kind of error - one 
> drive dying. What did I do wrong? 

Trouble is it wasn't "one drive dying".  You got errors from two
drives, at almost exactly the same time.  So maybe the controller
died.  Or maybe when one drive died, the controller or the driver got
confused and couldn't work with the other drive any more.

Certainly the "blk: request botched" message (line 233 onwards)
suggest some confusion in the driver.

Maybe post to [EMAIL PROTECTED] - that is where issues with
SATA drivers and controllers can be discussed.

NeilBrown




performance of raid10,f2 on 4 disks

2008-01-23 Thread Keld Jørn Simonsen
Hi!

I have played around with raid10,f2 on a 2-disk array,
and I really liked the performance on sequential reads.
It looked like double the speed, about 173 MB/s
for two SATA-2 disks.

I then went on to look at my 4 new SATA-2 disks. To get
the same kind of performance I created the array with:

mdadm --create /dev/md3 --chunk=256 -R -l 10 -n 4 -p f2 /dev/sd[abcd]1

And my first tests showed a sequential read rate of 320 MB/s.
Impressive! I then tried it a few more times, but then I could not
get more than around 160 MB/s, which is less than what I got on 2 disks.
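
For reference, one way to make such a test more repeatable (a sketch; it
assumes root access, that /dev/md3 is the array created above, and that
nothing else is using the disks - direct I/O keeps the page cache from
inflating later runs):

  # drop cached data so repeated runs start from the same state
  echo 3 > /proc/sys/vm/drop_caches
  # sequential read of the first 8 GiB of the array, bypassing the cache
  dd if=/dev/md3 of=/dev/null bs=1M count=8192 iflag=direct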

Any ideas of what is going on?

Best regards
keld


Re: Fwd: Error on /dev/sda, but takes down RAID-1

2008-01-23 Thread Michael Tokarev
Martin Seebach wrote:
> Hi, 
> 
> I'm not sure this is completely linux-raid related, but I can't figure out 
> where to start: 
> 
> A few days ago, my server died. I was able to log in and salvage this content 
> of dmesg: 
> http://pastebin.com/m4af616df 
> 
> I talked to my hosting-people and they said it was an io-error on /dev/sda, 
> and replaced that drive. 
> After this, I was able to boot into a PXE-image and re-build the two RAID-1 
> devices with no problems - indicating that sdb was fine. 
> 
> I expected RAID-1 to be able to stomach exactly this kind of error - one 
> drive dying. What did I do wrong? 

From that pastebin page:

First, sdb has failed for whatever reason:

ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata2.00: revalidation failed (errno=-5)
ata2.00: disabled
ata2: EH complete
sd 1:0:0:0: SCSI error: return code = 0x0004
end_request: I/O error, dev sdb, sector 80324865
raid1: Disk failure on sdb1, disabling device.
Operation continuing on 1 devices
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:sda1
 disk 1, wo:1, o:0, dev:sdb1
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:sda1

At this time, it started to (re)sync other(?) arrays for
some reason:

md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 20 KB/sec) 
for reconstruction.
md: using 128k window, over a total of 40162432 blocks.
md: md0: sync done.
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:sda1
md: syncing RAID array md1
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 20 KB/sec) 
for reconstruction.
md: using 128k window, over a total of 100060736 blocks.

Note again, errors on sdb:

sd 1:0:0:0: SCSI error: return code = 0x0004
end_request: I/O error, dev sdb, sector 112455000
sd 1:0:0:0: SCSI error: return code = 0x0004
end_request: I/O error, dev sdb, sector 112455256
sd 1:0:0:0: SCSI error: return code = 0x0004
end_request: I/O error, dev sdb, sector 112455512
...

raid1: Disk failure on sdb3, disabling device.
Operation continuing on 1 devices

So another md array detected the sdb failure.  So we're
left with sda only.  And voila, sda fails too, some time
later:

ata1: EH complete
sd 0:0:0:0: SCSI error: return code = 0x0004
end_request: I/O error, dev sda, sector 80324865
sd 0:0:0:0: SCSI error: return code = 0x0004
end_request: I/O error, dev sda, sector 115481
...

At this point, the arrays are hosed - all disks
of each array have failed, and there's no data
left to read or write.

Since sda was later replaced, and sdb recovered
from the errors (it still contains valid superblocks,
just with somewhat stale information), everything
went OK afterwards.

But the original problem is that you had BOTH disks
fail, not only one.  What caused THIS is another
question - maybe overheating, a power supply problem,
or some such; I don't know.  But the md code did the
best it could here.

/mjt


Re: 2.6.24-rc6 reproducible raid5 hang

2008-01-23 Thread Carlos Carvalho
Tim Southerwood ([EMAIL PROTECTED]) wrote on 23 January 2008 13:37:
 >Sorry if this breaks threaded mail readers, I only just subscribed to 
 >the list so don't have the original post to reply to.
 >
 >I believe I'm having the same problem.
 >
 >Regarding XFS on a raid5 md array:
 >
 >Kernels 2.6.22-14 (Ubuntu Gutsy generic and server builds) *and* 
 >2.6.24-rc8 (pure build from virgin sources) compiled for amd64 arch.

This has already been fixed; install Neil's patches. They have worked for
several people under high stress, including us.


Re: AACRAID driver broken in 2.6.22.x (and beyond?) [WAS: Re: 2.6.22.16 MD raid1 doesn't mark removed disk faulty, MD thread goes UN]

2008-01-23 Thread Mike Snitzer
On Jan 23, 2008 9:28 AM, Salyzyn, Mark <[EMAIL PROTECTED]> wrote:
> At which version of the kernel did the aacraid driver allegedly first go 
> broken? At which version did it get fixed? (Since 1.1.5-2451 is older than 
> latest represented on kernel.org)

snitzer:
I don't know where the kernel.org aacraid driver first allegedly broke
relative to this drive pull test.  All I know is 1.1.5-2451 enables
the driver and raid1 layer to behave as expected at the system level.
That is:
1) the aacraid driver enables the pulled scsi device to be offlined
2) the raid1 layer gets a write failure back from the pulled drive and
marks that raid1 member faulty

The demonstration of this is as follows:
aacraid: Host adapter abort request (0,0,27,0)
aacraid: Host adapter abort request (0,0,14,0)
aacraid: Host adapter abort request (0,0,21,0)
aacraid: Host adapter abort request (0,0,25,0)
aacraid: Host adapter abort request (0,0,18,0)
aacraid: Host adapter abort request (0,0,8,0)
aacraid: Host adapter abort request (0,0,23,0)
aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter abort request (0,0,5,0)
aacraid: Host adapter abort request (0,0,1,0)
aacraid: Host adapter abort request (0,0,17,0)
aacraid: Host adapter abort request (0,0,12,0)
aacraid: Host adapter abort request (0,0,3,0)
aacraid: Host adapter abort request (0,0,4,0)
aacraid: Host adapter abort request (0,0,22,0)
aacraid: Host adapter abort request (0,0,11,0)
aacraid: Host adapter abort request (0,0,26,0)
aacraid: Host adapter abort request (0,0,20,0)
aacraid: Host adapter abort request (0,0,2,0)
aacraid: Host adapter abort request (0,0,6,0)
aacraid: Host adapter reset request. SCSI hang ?
AAC: Host adapter BLINK LED 0x7
AAC0: adapter kernel panic'd 7.
AAC0: Non-DASD support enabled.
AAC0: 64 Bit DAC enabled
sd 0:0:27:0: scsi: Device offlined - not ready after error recovery
sd 0:0:27:0: rejecting I/O to offline device
md: super_written gets error=-5, uptodate=0
raid1: Disk failure on sdab1, disabling device.
Operation continuing on 1 devices
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:0, dev:sdab1
 disk 1, wo:0, o:1, dev:nbd2
RAID1 conf printout:
 --- wd:1 rd:2
 disk 1, wo:0, o:1, dev:nbd2

Clearly the BlinkLED/firmware panic is _not_ good, but in the end the
system stays alive and functions as expected.

> How is the SATA disk'd arrayed on the aacraid controller? The controller is 
> limited to generating 24 arrays and since /dev/sdac is the 29th target, it 
> would appear we need more details on your array's topology inside the aacraid 
> controller. If you are using the driver with aacraid.physical=1 and thus 
> using the physical drives directly (in the case of a SATA disk, a SATr0.9 
> translation in the Firmware), this is not a supported configuration and was 
> added only to enable limited experimentation. If there is a problem in that 
> path in the driver, I will glad to fix it, but still unsupported.

snitzer:
I'm using the 5.2-0 (15206) firmware that is not limited to 24 arrays;
it supports up to 30 AFAIK.  All disks are being exported to Linux as
a 'Simple Volume'.  I'm not playing games with aacraid.physical=1

Is the 5.2-0 (15206) firmware unsupported on the Adaptec 3085?

I can try the same test with the most current 5.2-0 (15333) firmware
to see if the drive pull behaves any differently with both the
1.1.5-2451 and 2.6.22.16's 1.1-5[2437]-mh4.

> You may need to acquire a diagnostic dump from the controller (Adaptec 
> technical support can advise, it will depend on your application suite) and a 
> report of any error recovery actions reported by the driver in the system log 
> as initiated by the SCSI subsystem.

snitzer:
OK, I can engage Adaptec support on this.

> There are no changes in the I/O path for the aacraid driver. Due to the 
> simplicity of the I/O path to the processor based controller, it is unlikely 
> to be an issue in this path. There have been several changes in the driver to 
> deal with error recovery actions initiated by the SCSI subsystem. One likely 
> candidate was to extend the default SCSI layer timeout because it was shorter 
> than the adapter's firmware timeout. You can check if this is the issue by 
> manually increasing the timeout for the target(s) via sysfs. There were 
> recent patches to deal with orphaned commands resulting from devices being 
> taken offline by the SCSI layer. There has been changes in the driver to 
> reset the controller should it go into a BlinkLED (Firmware Assert) state. 
> The symptom also acts like a condition in the older drivers (pre 08/08/2006 
> on scsi-misc-2.6, showing up in 2.6.20.4) which did not reset the adapter 
> when it entered the BlinkLED state and merely allowed the system to lock, but 
> alas you are working with a driver with this reset fix in the version you 
> report. A BlinkLED condition generally indicates a serious hardware problem 
> or target incompatibility; and is generally rare as they are a result of 
> corner case conditio

Re: identifying failed disk/s in an array.

2008-01-23 Thread Nagilum

- Message from [EMAIL PROTECTED] -
Date: Wed, 23 Jan 2008 16:05:40 +1100
From: Michael Harris <[EMAIL PROTECTED]>
Reply-To: Michael Harris <[EMAIL PROTECTED]>
 Subject: identifying failed disk/s in an array.
  To: linux-raid@vger.kernel.org



Hi,

I have just built a RAID 5 array using mdadm, and while it is
running fine I have a question about identifying the order of the disks
in the array.


In the pre-SATA days you would connect your drives as follows:

Primary Master - HDA
Primary Slave - HDB
Secondary - Master - HDC
Secondary - Slave -HDD

So if disk HDC failed I would know it was the master disk on the
secondary controller and would replace that drive.


My current setup is as follows:

MB PATA Primary Master - Operating System

The array disks are attached to:

MB Sata port 1
MB Sata port 2
PCI card Sata port 1

When I set up the array the OS drive was SDA and the others were SDB, SDC and SDD.

Now the problem is that every time I reboot, the drives are sometimes
detected in a different order. Because I mount root via the UUID of the
OS disk and the kernel looks at the superblocks of the raided drives,
everything comes up fine, but I'm worried that if I move the array to
another machine and need to do an mdadm --assemble I won't know the
correct order of the disks. What is more worrying, if I have a disk fail,
say HDC for example, I won't know which disk HDC is, as it could be any
of the 5 disks in the PC. Is there any way to make it easier to identify
which disk is which?


- End message from [EMAIL PROTECTED] -

Try this:
mdadm -Q --detail /dev/md0
to see which disk is which in the raid.
To identify a disk you can examine it using:
mdadm -E /dev/sd[b-d]
and read your dmesg.
And finally you can use "blkid" to associate UUIDs with devices.
I hope this helps.
Kind regards,
Alex.


#_  __  _ __ http://www.nagilum.org/ \n icq://69646724 #
#   / |/ /__  _(_) /_  _  [EMAIL PROTECTED] \n +491776461165 #
#  // _ `/ _ `/ / / // /  ' \  Amiga (68k/PPC): AOS/NetBSD/Linux   #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux #
#   /___/ x86: FreeBSD/Linux/Solaris/Win2k  ARM9: EPOC EV6 #




cakebox.homeunix.net - all the machine one needs..


Fwd: Error on /dev/sda, but takes down RAID-1

2008-01-23 Thread Martin Seebach
Hi, 

I'm not sure this is completely linux-raid related, but I can't figure out 
where to start: 

A few days ago, my server died. I was able to log in and salvage this content 
of dmesg: 
http://pastebin.com/m4af616df 

I talked to my hosting-people and they said it was an io-error on /dev/sda, and 
replaced that drive. 
After this, I was able to boot into a PXE-image and re-build the two RAID-1 
devices with no problems - indicating that sdb was fine. 

I expected RAID-1 to be able to stomach exactly this kind of error - one drive 
dying. What did I do wrong? 

Regards, 
Martin Seebach 




RE: AACRAID driver broken in 2.6.22.x (and beyond?) [WAS: Re: 2.6.22.16 MD raid1 doesn't mark removed disk faulty, MD thread goes UN]

2008-01-23 Thread Salyzyn, Mark
At which version of the kernel did the aacraid driver allegedly first go 
broken? At which version did it get fixed? (Since 1.1.5-2451 is older than 
latest represented on kernel.org)

How is the SATA disk'd arrayed on the aacraid controller? The controller is 
limited to generating 24 arrays and since /dev/sdac is the 29th target, it 
would appear we need more details on your array's topology inside the aacraid 
controller. If you are using the driver with aacraid.physical=1 and thus using 
the physical drives directly (in the case of a SATA disk, a SATr0.9 translation 
in the Firmware), this is not a supported configuration and was added only to 
enable limited experimentation. If there is a problem in that path in the 
driver, I will be glad to fix it, but it is still unsupported.

You may need to acquire a diagnostic dump from the controller (Adaptec 
technical support can advise, it will depend on your application suite) and a 
report of any error recovery actions reported by the driver in the system log 
as initiated by the SCSI subsystem.

There are no changes in the I/O path for the aacraid driver. Due to the 
simplicity of the I/O path to the processor based controller, it is unlikely to 
be an issue in this path. There have been several changes in the driver to deal 
with error recovery actions initiated by the SCSI subsystem. One likely 
candidate was to extend the default SCSI layer timeout because it was shorter 
than the adapter's firmware timeout. You can check if this is the issue by 
manually increasing the timeout for the target(s) via sysfs. There were recent 
patches to deal with orphaned commands resulting from devices being taken 
offline by the SCSI layer. There have been changes in the driver to reset the 
controller should it go into a BlinkLED (Firmware Assert) state. The symptom 
also acts like a condition in the older drivers (pre 08/08/2006 on 
scsi-misc-2.6, showing up in 2.6.20.4) which did not reset the adapter when it 
entered the BlinkLED state and merely allowed the system to lock, but alas you 
are working with a driver with this reset fix in the version you report. A 
BlinkLED condition generally indicates a serious hardware problem or target 
incompatibility; and is generally rare as they are a result of corner case 
conditions within the Adapter Firmware. The diagnostic dump reported by the 
Adaptec utilities should be able to point to the fault you are experiencing if 
these appear to be the root causes.
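
As a rough illustration of the sysfs timeout tweak mentioned above (sdX
and the 90-second value are placeholders; pick a value longer than your
adapter's firmware timeout):

  cat /sys/block/sdX/device/timeout        # current SCSI command timeout, in seconds
  echo 90 > /sys/block/sdX/device/timeout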

Sincerely -- Mark Salyzyn

> -Original Message-
> From: Mike Snitzer [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 22, 2008 7:10 PM
> To: linux-raid@vger.kernel.org; NeilBrown
> Cc: [EMAIL PROTECTED]; K. Tanaka; AACRAID;
> [EMAIL PROTECTED]
> Subject: AACRAID driver broken in 2.6.22.x (and beyond?)
> [WAS: Re: 2.6.22.16 MD raid1 doesn't mark removed disk
> faulty, MD thread goes UN]
>
> On Jan 22, 2008 12:29 AM, Mike Snitzer <[EMAIL PROTECTED]> wrote:
> > cc'ing Tanaka-san given his recent raid1 BUG report:
> > http://lkml.org/lkml/2008/1/14/515
> >
> >
> > On Jan 21, 2008 6:04 PM, Mike Snitzer <[EMAIL PROTECTED]> wrote:
> > > Under 2.6.22.16, I physically pulled a SATA disk
> (/dev/sdac, connected to
> > > an aacraid controller) that was acting as the local raid1
> member of
> > > /dev/md30.
> > >
> > > Linux MD didn't see an /dev/sdac1 error until I tried
> forcing the issue by
> > > doing a read (with dd) from /dev/md30:
> 
> > The raid1d thread is locked at line 720 in raid1.c
> (raid1d+2437); aka
> > freeze_array:
> >
> > (gdb) l *0x2539
> > 0x2539 is in raid1d (drivers/md/raid1.c:720).
> > 715  * wait until barrier+nr_pending match nr_queued+2
> > 716  */
> > 717 spin_lock_irq(&conf->resync_lock);
> > 718 conf->barrier++;
> > 719 conf->nr_waiting++;
> > 720 wait_event_lock_irq(conf->wait_barrier,
> > 721
> conf->barrier+conf->nr_pending ==
> > conf->nr_queued+2,
> > 722 conf->resync_lock,
> > 723
> raid1_unplug(conf->mddev->queue));
> > 724 spin_unlock_irq(&conf->resync_lock);
> >
> > Given Tanaka-san's report against 2.6.23 and me hitting
> what seems to
> > be the same deadlock in 2.6.22.16; it stands to reason this affects
> > raid1 in 2.6.24-rcX too.
>
> Turns out that the aacraid driver in 2.6.22.x is HORRIBLY BROKEN (when
> you pull a drive); it responds to MD's write requests with uptodate=1
> (in raid1_end_write_request) for the drive that was pulled!  I've not
> looked to see if aacraid has been fixed in newer kernels... are others
> aware of any crucial aacraid fixes in 2.6.23.x or 2.6.24?
>
> After the drive was physically pulled, and small periodic writes
> continued to the associated MD device, the raid1 MD driver did _NOT_
> detect the pulled drive's writes as having failed (verified this with
> systemtap).  MD happily thought the write completed to both members
> (so MD had no reason to mark the pulled drive "fau

Re: 2.6.24-rc6 reproducible raid5 hang

2008-01-23 Thread Tim Southerwood
Sorry if this breaks threaded mail readers, I only just subscribed to 
the list so don't have the original post to reply to.


I believe I'm having the same problem.

Regarding XFS on a raid5 md array:

Kernels 2.6.22-14 (Ubuntu Gutsy generic and server builds) *and* 
2.6.24-rc8 (pure build from virgin sources) compiled for amd64 arch.


RAID 5 configured across 4 x 500GB SATA disks (Nforce nv_sata driver, 
Asus M2N-E mobo, Athlon X64, 4GB RAM).


MD Chunk size is 1024k. This is allocated to an LVM2 PV, then sliced up.
Taking one sample logical volume of 150GB I ran

mkfs.xfs -d su=1024k,sw=3 -L vol_linux /dev/vg00/vol_linux

I then found that putting a high write load on that filesystem caused a 
hang. High load could be as little as a single rsync of a mirror of 
Ubuntu Gutsy (many tens of GB) from my old server to here. The hang would 
typically happen within a few hours.


I could generate relatively quick hangs by running xfs_fsr (defragger) 
in parallel.


Trying the workaround of upping /sys/block/md1/md/stripe_cache_size to 
4096 seems (fingers crossed) to have helped. Been running the rsync 
again, plus xfs_fsr + a few dd's of 11 GB to the same filesystem.
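
For anyone wanting to try the same thing, a sketch of the workaround
(md1 and 4096 are the values from my setup; note the stripe cache costs
roughly stripe_cache_size x 4 KiB x number of member devices of RAM):

  cat /sys/block/md1/md/stripe_cache_size       # default is 256
  echo 4096 > /sys/block/md1/md/stripe_cache_size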


I did notice also that the write speed increased dramatically with a 
bigger stripe_cache_size.


A more detailed analysis of the problem indicated that, after the hang:

I could log in;

One CPU core was stuck in 100% IO wait.
The other core was usable, with care. So I managed to get a SysRq-T, and 
one place the system appeared blocked was via this path:


[ 2039.466258] xfs_fsr   D  0  7324   7308
[ 2039.466260]  810119399858 0082  
0046
[ 2039.466263]  810110d6c680 8101102ba998 8101102ba770 
8054e5e0
[ 2039.466265]  8101102ba998 00010014a1e6  
810110ddcb30

[ 2039.466268] Call Trace:
[ 2039.466277]  [] :raid456:get_active_stripe+0x1cb/0x610
[ 2039.466282]  [] default_wake_function+0x0/0x10
[ 2039.466289]  [] :raid456:make_request+0x1f8/0x610
[ 2039.466293]  [] autoremove_wake_function+0x0/0x30
[ 2039.466295]  [] __up_read+0x21/0xb0
[ 2039.466300]  [] generic_make_request+0x1d6/0x3d0
[ 2039.466303]  [] vm_normal_page+0x3d/0xc0
[ 2039.466307]  [] submit_bio+0x6f/0xf0
[ 2039.466311]  [] dio_bio_submit+0x5c/0x90
[ 2039.466313]  [] dio_send_cur_page+0x43/0xa0
[ 2039.466316]  [] submit_page_section+0x4e/0x150
[ 2039.466319]  [] __blockdev_direct_IO+0x742/0xb50
[ 2039.466342]  [] :xfs:xfs_vm_direct_IO+0x182/0x190
[ 2039.466357]  [] :xfs:xfs_get_blocks_direct+0x0/0x20
[ 2039.466370]  [] :xfs:xfs_end_io_direct+0x0/0x80
[ 2039.466375]  [] __wait_on_bit_lock+0x65/0x80
[ 2039.466380]  [] generic_file_direct_IO+0xe3/0x190
[ 2039.466385]  [] generic_file_direct_write+0x74/0x150
[ 2039.466402]  [] :xfs:xfs_write+0x492/0x8f0
[ 2039.466421]  [] :xfs:xfs_iunlock+0x2c/0xb0
[ 2039.466437]  [] :xfs:xfs_read+0x186/0x240
[ 2039.466443]  [] do_sync_write+0xd9/0x120
[ 2039.466448]  [] autoremove_wake_function+0x0/0x30
[ 2039.466457]  [] vfs_write+0xdd/0x190
[ 2039.466461]  [] sys_write+0x53/0x90
[ 2039.466465]  [] system_call+0x7e/0x83


However, I'm of the opinion that the system should not deadlock, even if 
tunable parameters are unfavourable. I'm happy with the workaround 
(indeed the system performs better).


However, it will take me a week's worth of testing before I'm willing to 
commission this as my new fileserver.


So, if there is anything anyone would like me to try, I'm happy to 
volunteer as a guinea pig :)


Yes, I can build and patch kernels. But I'm not hot at debugging kernels 
so if kernel core dumps or whatever are needed, please point me at the 
right document or hint as to which commands I need to read about.


Cheers

Tim


Re: identifying failed disk/s in an array.

2008-01-23 Thread Wolfgang Denk
In message <[EMAIL PROTECTED]> you wrote:
>
> And/or use smartctl to look up the make/model/serial number and look at the
> drive label. I always do this to make sure I'm pulling the right drive (also
> useful to RMA the drive)

Or, probably even faster, do an "ls -l /dev/disk/by-id" (assuming you
are using udev).

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: [EMAIL PROTECTED]
Command, n.:
Statement presented by a human and accepted by a computer
in such a manner as to make the human feel as if he is in control.


Re: identifying failed disk/s in an array.

2008-01-23 Thread David Greaves
Tomasz Chmielewski wrote:
> Michael Harris schrieb:
>> i have a disk fail say HDC for example, i wont know which disk HDC is
>> as it could be any of the 5 disks in the PC. Is there anyway to make
>> it easier to identify which disk is which?.
> 
> If the drives have any LEDs, the most reliable way would be:
> 
> dd if=/dev/drive of=/dev/null
> 
> Then look which LED is the one which blinks the most.

And/or use smartctl to look up the make/model/serial number and look at the
drive label. I always do this to make sure I'm pulling the right drive (also
useful to RMA the drive)
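
A sketch of that check (the md device and member names are placeholders):

  mdadm --detail /dev/md0 | grep -i faulty   # find the failed member, e.g. /dev/sdc1
  smartctl -i /dev/sdc | grep -i serial      # read its serial number
  # ...then match the serial against the label on the physical drive before pulling it.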


David