Re: idle array consuming cpu ??!!
On Tuesday January 22, [EMAIL PROTECTED] wrote:
> Neil Brown ([EMAIL PROTECTED]) wrote on 21 January 2008 12:15:
> >On Sunday January 20, [EMAIL PROTECTED] wrote:
> >> A raid6 array with a spare and bitmap is idle: not mounted and with no
> >> IO to it or any of its disks (obviously), as shown by iostat. However
> >> it's consuming cpu: since reboot it used about 11min in 24h, which is quite
> >> a lot even for a busy array (the cpus are fast). The array was cleanly
> >> shutdown so there's been no reconstruction/check or anything else.
> >>
> >> How can this be? Kernel is 2.6.22.16 with the two patches for the
> >> deadlock ("[PATCH 004 of 4] md: Fix an occasional deadlock in raid5 -
> >> FIX") and the previous one.
> >
> >Maybe the bitmap code is waking up regularly to do nothing.
> >
> >Would you be happy to experiment? Remove the bitmap with
> >   mdadm --grow /dev/mdX --bitmap=none
> >
> >and see how that affects cpu usage?
>
> Confirmed, removing the bitmap stopped cpu consumption.

Thanks.

This patch should substantially reduce cpu consumption on an idle bitmap.

NeilBrown

--
Reduce CPU wastage on idle md array with a write-intent bitmap.

On an md array with a write-intent bitmap, a thread wakes up every few
seconds and scans the bitmap looking for work to do.  If the array is
idle, there will be no work to do, but a lot of scanning is done to
discover this.

So cache the fact that the bitmap is completely clean, and avoid scanning
the whole bitmap when the cache is known to be clean.

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

### Diffstat output
 ./drivers/md/bitmap.c         |   19 +--
 ./include/linux/raid/bitmap.h |    2 ++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
--- .prev/drivers/md/bitmap.c	2008-01-24 15:53:45.0 +1100
+++ ./drivers/md/bitmap.c	2008-01-24 15:54:29.0 +1100
@@ -1047,6 +1047,11 @@ void bitmap_daemon_work(struct bitmap *b
 	if (time_before(jiffies, bitmap->daemon_lastrun + bitmap->daemon_sleep*HZ))
 		return;
 	bitmap->daemon_lastrun = jiffies;
+	if (bitmap->allclean) {
+		bitmap->mddev->thread->timeout = MAX_SCHEDULE_TIMEOUT;
+		return;
+	}
+	bitmap->allclean = 1;
 
 	for (j = 0; j < bitmap->chunks; j++) {
 		bitmap_counter_t *bmc;
@@ -1068,8 +1073,10 @@ void bitmap_daemon_work(struct bitmap *b
 				clear_page_attr(bitmap, page, BITMAP_PAGE_NEEDWRITE);
 				spin_unlock_irqrestore(&bitmap->lock, flags);
-				if (need_write)
+				if (need_write) {
 					write_page(bitmap, page, 0);
+					bitmap->allclean = 0;
+				}
 				continue;
 			}
@@ -1098,6 +1105,9 @@ void bitmap_daemon_work(struct bitmap *b
 /* if (j < 100) printk("bitmap: j=%lu, *bmc = 0x%x\n", j, *bmc); */
 
+			if (*bmc)
+				bitmap->allclean = 0;
+
 			if (*bmc == 2) {
 				*bmc=1; /* maybe clear the bit next time */
 				set_page_attr(bitmap, page, BITMAP_PAGE_CLEAN);
@@ -1132,6 +1142,8 @@ void bitmap_daemon_work(struct bitmap *b
 		}
 	}
 
+	if (bitmap->allclean == 0)
+		bitmap->mddev->thread->timeout = bitmap->daemon_sleep * HZ;
 }
 
 static bitmap_counter_t *bitmap_get_counter(struct bitmap *bitmap,
@@ -1226,6 +1238,7 @@ int bitmap_startwrite(struct bitmap *bit
 			sectors -= blocks;
 		else
 			sectors = 0;
 	}
+	bitmap->allclean = 0;
 	return 0;
 }
 
@@ -1296,6 +1309,7 @@ int bitmap_start_sync(struct bitmap *bit
 		}
 	}
 	spin_unlock_irq(&bitmap->lock);
+	bitmap->allclean = 0;
 	return rv;
 }
 
@@ -1332,6 +1346,7 @@ void bitmap_end_sync(struct bitmap *bitm
 	}
  unlock:
 	spin_unlock_irqrestore(&bitmap->lock, flags);
+	bitmap->allclean = 0;
 }
 
 void bitmap_close_sync(struct bitmap *bitmap)
@@ -1399,7 +1414,7 @@ static void bitmap_set_memory_bits(struc
 		set_page_attr(bitmap, page, BITMAP_PAGE_CLEAN);
 	}
 	spin_unlock_irq(&bitmap->lock);
-
+	bitmap->allclean = 0;
 }
 
 /* dirty the memory and file bits for bitmap chunks "s" to "e" */

diff .prev/include/linux/raid/bitmap.h ./include/linux/raid/bitmap.h
--- .prev/include/linux/raid/bitmap.h	2008-01-24 15:53:45.0 +1100
+++ ./include/linux/raid/bitmap.h	2008-01-24 15:54:29.0 +1100
@@ -235,6 +235,8 @@ struct bitmap {
 
 	unsigned long flags;
 
+	int allclean;
+
 	unsigned long max_write_behind; /* write-behind mode */
 	atomi
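[Editor's aside: for anyone who wants to repeat the diagnostic step Neil suggested above, the sequence is roughly the following. This is only a sketch - /dev/md0 is an example name, and the last command simply puts an internal bitmap back once the measurement is done.]

  # /proc/mdstat shows a "bitmap:" line for arrays that have one
  grep -A 3 '^md0' /proc/mdstat

  # remove the write-intent bitmap, then watch the mdX_raid* kernel thread's CPU usage
  mdadm --grow /dev/md0 --bitmap=none

  # put an internal bitmap back once the test is finished
  mdadm --grow /dev/md0 --bitmap=internal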
Re: [BUG] The kernel thread for md RAID1 could cause a md RAID1 array deadlock
On Tuesday January 15, [EMAIL PROTECTED] wrote:
>
> This message describes the details about the md-RAID1 issue found by
> testing the md RAID1 using the SCSI fault injection framework.
>
> Abstract:
> Both the error handler for md RAID1 and write access requests to the md RAID1
> use the raid1d kernel thread. The nr_pending flag could cause a race condition
> in raid1d, resulting in a raid1d deadlock.

Thanks for finding and reporting this.

I believe the following patch should fix the deadlock.  If you are able
to repeat your test and confirm this I would appreciate it.

Thanks,
NeilBrown

Fix deadlock in md/raid1 when handling a read error.

When handling a read error, we freeze the array to stop any other IO
while attempting to over-write with correct data.

This is done in the raid1d thread and must wait for all submitted IO to
complete (except for requests that failed and are sitting in the retry
queue - these are counted in ->nr_queued and will stay there during a
freeze).

However write requests need attention from raid1d as bitmap updates might
be required.  This can cause a deadlock as raid1 is waiting for requests
to finish that themselves need attention from raid1d.

So we create a new function 'flush_pending_writes' to give that attention,
and call it in freeze_array to be sure that we aren't waiting on raid1d.

Thanks to "K.Tanaka" <[EMAIL PROTECTED]> for finding and reporting this
problem.

Cc: "K.Tanaka" <[EMAIL PROTECTED]>
Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

### Diffstat output
 ./drivers/md/raid1.c |   66 ++-
 1 file changed, 45 insertions(+), 21 deletions(-)

diff .prev/drivers/md/raid1.c ./drivers/md/raid1.c
--- .prev/drivers/md/raid1.c	2008-01-18 11:19:09.0 +1100
+++ ./drivers/md/raid1.c	2008-01-24 14:21:55.0 +1100
@@ -592,6 +592,37 @@ static int raid1_congested(void *data, i
 }
 
+static int flush_pending_writes(conf_t *conf)
+{
+	/* Any writes that have been queued but are awaiting
+	 * bitmap updates get flushed here.
+	 * We return 1 if any requests were actually submitted.
+	 */
+	int rv = 0;
+
+	spin_lock_irq(&conf->device_lock);
+
+	if (conf->pending_bio_list.head) {
+		struct bio *bio;
+		bio = bio_list_get(&conf->pending_bio_list);
+		blk_remove_plug(conf->mddev->queue);
+		spin_unlock_irq(&conf->device_lock);
+		/* flush any pending bitmap writes to
+		 * disk before proceeding w/ I/O */
+		bitmap_unplug(conf->mddev->bitmap);
+
+		while (bio) { /* submit pending writes */
+			struct bio *next = bio->bi_next;
+			bio->bi_next = NULL;
+			generic_make_request(bio);
+			bio = next;
+		}
+		rv = 1;
+	} else
+		spin_unlock_irq(&conf->device_lock);
+	return rv;
+}
+
 /* Barriers
  * Sometimes we need to suspend IO while we do something else,
  * either some resync/recovery, or reconfigure the array.
@@ -678,10 +709,14 @@ static void freeze_array(conf_t *conf)
 	spin_lock_irq(&conf->resync_lock);
 	conf->barrier++;
 	conf->nr_waiting++;
+	spin_unlock_irq(&conf->resync_lock);
+
+	spin_lock_irq(&conf->resync_lock);
 	wait_event_lock_irq(conf->wait_barrier,
 			    conf->barrier+conf->nr_pending == conf->nr_queued+2,
 			    conf->resync_lock,
-			    raid1_unplug(conf->mddev->queue));
+			    ({ flush_pending_writes(conf);
+			       raid1_unplug(conf->mddev->queue); }));
 	spin_unlock_irq(&conf->resync_lock);
 }
 
 static void unfreeze_array(conf_t *conf)
@@ -907,6 +942,9 @@ static int make_request(struct request_q
 		blk_plug_device(mddev->queue);
 	spin_unlock_irqrestore(&conf->device_lock, flags);
 
+	/* In case raid1d snuck into freeze_array */
+	wake_up(&conf->wait_barrier);
+
 	if (do_sync)
 		md_wakeup_thread(mddev->thread);
 #if 0
@@ -1473,28 +1511,14 @@ static void raid1d(mddev_t *mddev)
 	for (;;) {
 		char b[BDEVNAME_SIZE];
 
-		spin_lock_irqsave(&conf->device_lock, flags);
-
-		if (conf->pending_bio_list.head) {
-			bio = bio_list_get(&conf->pending_bio_list);
-			blk_remove_plug(mddev->queue);
-			spin_unlock_irqrestore(&conf->device_lock, flags);
-			/* flush any pending bitmap writes to disk before proceeding w/ I/O */
-			bitmap_unplug(mddev->bitmap);
-
-			while (bio) { /* submit pending writes */
-				struct bio *next = bio->bi_next;
-				bio->bi_next = NULL;
-				generic_make_request(b
Re: idle array consuming cpu ??!!
Carlos Carvalho wrote:

Bill Davidsen ([EMAIL PROTECTED]) wrote on 22 January 2008 17:53:
>Carlos Carvalho wrote:
>> Neil Brown ([EMAIL PROTECTED]) wrote on 21 January 2008 12:15:
>> >On Sunday January 20, [EMAIL PROTECTED] wrote:
>> >> A raid6 array with a spare and bitmap is idle: not mounted and with no
>> >> IO to it or any of its disks (obviously), as shown by iostat. However
>> >> it's consuming cpu: since reboot it used about 11min in 24h, which is quite
>> >> a lot even for a busy array (the cpus are fast). The array was cleanly
>> >> shutdown so there's been no reconstruction/check or anything else.
>> >>
>> >> How can this be? Kernel is 2.6.22.16 with the two patches for the
>> >> deadlock ("[PATCH 004 of 4] md: Fix an occasional deadlock in raid5 -
>> >> FIX") and the previous one.
>> >
>> >Maybe the bitmap code is waking up regularly to do nothing.
>> >
>> >Would you be happy to experiment? Remove the bitmap with
>> >   mdadm --grow /dev/mdX --bitmap=none
>> >
>> >and see how that affects cpu usage?
>>
>> Confirmed, removing the bitmap stopped cpu consumption.
>
>Looks like quite a bit of CPU going into idle arrays here, too.

I don't mind the cpu time (in the machines where we use it here), what
worries me is that it shouldn't happen when the disks are completely
idle. Looks like there's a bug somewhere.

That's my feeling. I have one array with an internal bitmap and one with
no bitmap, and the internal bitmap uses CPU even when the machine is idle.
I have *not* tried an external bitmap.

-- 
Bill Davidsen <[EMAIL PROTECTED]>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark
Re: Fwd: Error on /dev/sda, but takes down RAID-1
On Wednesday January 23, [EMAIL PROTECTED] wrote:
> Hi,
>
> I'm not sure this is completely linux-raid related, but I can't figure out
> where to start:
>
> A few days ago, my server died. I was able to log in and salvage this content
> of dmesg:
> http://pastebin.com/m4af616df

At line 194:
  end_request: I/O error, dev sdb, sector 80324865
then at line 384:
  end_request: I/O error, dev sda, sector 80324865

> I talked to my hosting-people and they said it was an io-error on /dev/sda,
> and replaced that drive.
> After this, I was able to boot into a PXE-image and re-build the two RAID-1
> devices with no problems - indicating that sdb was fine.
>
> I expected RAID-1 to be able to stomach exactly this kind of error - one
> drive dying. What did I do wrong?

Trouble is it wasn't "one drive dying".  You got errors from two drives,
at almost exactly the same time.

So maybe the controller died.  Or maybe when one drive died, the controller
or the driver got confused and couldn't work with the other drive any more.

Certainly the "blk: request botched" message (line 233 onwards) suggests
some confusion in the driver.

Maybe post to [EMAIL PROTECTED] - that is where issues with SATA drivers
and controllers can be discussed.

NeilBrown
performance of raid10,f2 on 4 disks
Hi!

I have played around with raid10,f2 on a 2 disk array, and I really liked
the performance on sequential reads. It looked like double the speed,
about 173 MB/s for two SATA-2 disks.

I then went on to look at my 4 new SATA-2 disks. To get the same kind of
performance I made the array by:

  mdadm --create /dev/md3 --chunk=256 -R -l 10 -n 4 -p f2 /dev/sd[abcd]1

And my first tests showed a sequential read rate of 320 MB/s. Impressive!

I then tried it a few more times, but then I could not get more than
around 160 MB/s, which is less than what I got on 2 disks.

Any ideas of what is going on?

Best regards
keld
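[Editor's aside: one thing that can make repeated runs like this hard to compare is the page cache serving part of a later read from RAM. A rough way to level the field is sketched below; the device name and sizes are only examples.]

  # flush dirty pages and drop the page cache so the next run is not served from RAM
  sync
  echo 3 > /proc/sys/vm/drop_caches

  # sequential read of 4 GiB straight off the array; dd reports the throughput
  dd if=/dev/md3 of=/dev/null bs=1M count=4096

It is also worth glancing at /proc/mdstat first to be sure no resync or check is running while measuring.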
Re: Fwd: Error on /dev/sda, but takes down RAID-1
Martin Seebach wrote:
> Hi,
>
> I'm not sure this is completely linux-raid related, but I can't figure out
> where to start:
>
> A few days ago, my server died. I was able to log in and salvage this content
> of dmesg:
> http://pastebin.com/m4af616df
>
> I talked to my hosting-people and they said it was an io-error on /dev/sda,
> and replaced that drive.
> After this, I was able to boot into a PXE-image and re-build the two RAID-1
> devices with no problems - indicating that sdb was fine.
>
> I expected RAID-1 to be able to stomach exactly this kind of error - one
> drive dying. What did I do wrong?

From that pastebin page.  First, sdb has failed for whatever reason:

  ata2.00: qc timeout (cmd 0xec)
  ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
  ata2.00: revalidation failed (errno=-5)
  ata2.00: disabled
  ata2: EH complete
  sd 1:0:0:0: SCSI error: return code = 0x0004
  end_request: I/O error, dev sdb, sector 80324865
  raid1: Disk failure on sdb1, disabling device.
  	Operation continuing on 1 devices
  RAID1 conf printout:
   --- wd:1 rd:2
   disk 0, wo:0, o:1, dev:sda1
   disk 1, wo:1, o:0, dev:sdb1
  RAID1 conf printout:
   --- wd:1 rd:2
   disk 0, wo:0, o:1, dev:sda1

At this time, it started to (re)sync other(?) arrays for some reason:

  md: syncing RAID array md0
  md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
  md: using maximum available idle IO bandwidth (but not more than 20 KB/sec) for reconstruction.
  md: using 128k window, over a total of 40162432 blocks.
  md: md0: sync done.
  RAID1 conf printout:
   --- wd:1 rd:2
   disk 0, wo:0, o:1, dev:sda1
  md: syncing RAID array md1
  md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
  md: using maximum available idle IO bandwidth (but not more than 20 KB/sec) for reconstruction.
  md: using 128k window, over a total of 100060736 blocks.

Note again, errors on sdb:

  sd 1:0:0:0: SCSI error: return code = 0x0004
  end_request: I/O error, dev sdb, sector 112455000
  sd 1:0:0:0: SCSI error: return code = 0x0004
  end_request: I/O error, dev sdb, sector 112455256
  sd 1:0:0:0: SCSI error: return code = 0x0004
  end_request: I/O error, dev sdb, sector 112455512
  ...
  raid1: Disk failure on sdb3, disabling device.
  	Operation continuing on 1 devices

so another md array detected the sdb failure.  So we're with sda only.
And voila, sda fails too, some time later:

  ata1: EH complete
  sd 0:0:0:0: SCSI error: return code = 0x0004
  end_request: I/O error, dev sda, sector 80324865
  sd 0:0:0:0: SCSI error: return code = 0x0004
  end_request: I/O error, dev sda, sector 115481
  ...

At this point, the arrays are hosed - all disks of each array have failed,
and there's no data any more to read/write from/to.

Since sda was later replaced, and sdb recovered from the errors (it
contains still-valid superblocks but with somewhat stale information),
everything went ok.  But the original problem is that you had BOTH disks
fail, not only one.

What caused THIS problem is another question.  Maybe some overheating or
power unit problem or somesuch -- I don't know...  But the md code worked
the best it can here.

/mjt
Re: 2.6.24-rc6 reproducible raid5 hang
Tim Southerwood ([EMAIL PROTECTED]) wrote on 23 January 2008 13:37:
>Sorry if this breaks threaded mail readers, I only just subscribed to
>the list so don't have the original post to reply to.
>
>I believe I'm having the same problem.
>
>Regarding XFS on a raid5 md array:
>
>Kernels 2.6.22-14 (Ubuntu Gutsy generic and server builds) *and*
>2.6.24-rc8 (pure build from virgin sources) compiled for amd64 arch.

This has been corrected already; install Neil's patches. It worked for
several people under high stress, including us.
Re: AACRAID driver broken in 2.6.22.x (and beyond?) [WAS: Re: 2.6.22.16 MD raid1 doesn't mark removed disk faulty, MD thread goes UN]
On Jan 23, 2008 9:28 AM, Salyzyn, Mark <[EMAIL PROTECTED]> wrote:
> At which version of the kernel did the aacraid driver allegedly first go
> broken? At which version did it get fixed? (Since 1.1.5-2451 is older than
> latest represented on kernel.org)

snitzer: I don't know where the kernel.org aacraid driver first allegedly
broke relative to this drive pull test.  All I know is 1.1.5-2451 enables
the driver and the raid1 layer to behave as expected at the system level.
That is:
1) the aacraid driver enables the pulled scsi device to be offlined
2) the raid1 layer gets a write failure back from the pulled drive and
   marks that raid1 member faulty

The demonstration of this is as follows:

  aacraid: Host adapter abort request (0,0,27,0)
  aacraid: Host adapter abort request (0,0,14,0)
  aacraid: Host adapter abort request (0,0,21,0)
  aacraid: Host adapter abort request (0,0,25,0)
  aacraid: Host adapter abort request (0,0,18,0)
  aacraid: Host adapter abort request (0,0,8,0)
  aacraid: Host adapter abort request (0,0,23,0)
  aacraid: Host adapter abort request (0,0,0,0)
  aacraid: Host adapter abort request (0,0,5,0)
  aacraid: Host adapter abort request (0,0,1,0)
  aacraid: Host adapter abort request (0,0,17,0)
  aacraid: Host adapter abort request (0,0,12,0)
  aacraid: Host adapter abort request (0,0,3,0)
  aacraid: Host adapter abort request (0,0,4,0)
  aacraid: Host adapter abort request (0,0,22,0)
  aacraid: Host adapter abort request (0,0,11,0)
  aacraid: Host adapter abort request (0,0,26,0)
  aacraid: Host adapter abort request (0,0,20,0)
  aacraid: Host adapter abort request (0,0,2,0)
  aacraid: Host adapter abort request (0,0,6,0)
  aacraid: Host adapter reset request. SCSI hang ?
  AAC: Host adapter BLINK LED 0x7
  AAC0: adapter kernel panic'd 7.
  AAC0: Non-DASD support enabled.
  AAC0: 64 Bit DAC enabled
  sd 0:0:27:0: scsi: Device offlined - not ready after error recovery
  sd 0:0:27:0: rejecting I/O to offline device
  md: super_written gets error=-5, uptodate=0
  raid1: Disk failure on sdab1, disabling device.
  	Operation continuing on 1 devices
  RAID1 conf printout:
   --- wd:1 rd:2
   disk 0, wo:1, o:0, dev:sdab1
   disk 1, wo:0, o:1, dev:nbd2
  RAID1 conf printout:
   --- wd:1 rd:2
   disk 1, wo:0, o:1, dev:nbd2

Clearly the BlinkLED firmware panic is _not_ good, but in the end the
system stays alive and functions as expected.

> How is the SATA disk arrayed on the aacraid controller? The controller is
> limited to generating 24 arrays and since /dev/sdac is the 29th target, it
> would appear we need more details on your array's topology inside the
> aacraid controller. If you are using the driver with aacraid.physical=1
> and thus using the physical drives directly (in the case of a SATA disk,
> a SATr0.9 translation in the Firmware), this is not a supported
> configuration and was added only to enable limited experimentation. If
> there is a problem in that path in the driver, I will be glad to fix it,
> but it is still unsupported.

snitzer: I'm using the 5.2-0 (15206) firmware that is not limited to 24
arrays; it supports up to 30 AFAIK.  All disks are being exported to Linux
as a 'Simple Volume'.  I'm not playing games with aacraid.physical=1

Is the 5.2-0 (15206) firmware unsupported on the Adaptec 3085?  I can try
the same test with the most current 5.2-0 (15333) firmware to see if the
drive pull behaves any differently with both the 1.1.5-2451 and
2.6.22.16's 1.1-5[2437]-mh4.

> You may need to acquire a diagnostic dump from the controller (Adaptec
> technical support can advise, it will depend on your application suite)
> and a report of any error recovery actions reported by the driver in the
> system log as initiated by the SCSI subsystem.

snitzer: OK, I can engage Adaptec support on this.

> There are no changes in the I/O path for the aacraid driver. Due to the
> simplicity of the I/O path to the processor based controller, it is
> unlikely to be an issue in this path. There have been several changes in
> the driver to deal with error recovery actions initiated by the SCSI
> subsystem. One likely candidate was to extend the default SCSI layer
> timeout because it was shorter than the adapter's firmware timeout. You
> can check if this is the issue by manually increasing the timeout for the
> target(s) via sysfs. There were recent patches to deal with orphaned
> commands resulting from devices being taken offline by the SCSI layer.
> There have been changes in the driver to reset the controller should it
> go into a BlinkLED (Firmware Assert) state. The symptom also acts like a
> condition in the older drivers (pre 08/08/2006 on scsi-misc-2.6, showing
> up in 2.6.20.4) which did not reset the adapter when it entered the
> BlinkLED state and merely allowed the system to lock, but alas you are
> working with a driver that has this reset fix in the version you report.
> A BlinkLED condition generally indicates a serious hardware problem or
> target incompatibility, and is generally rare as they are a result of
> corner case conditio
Re: identifying failed disk/s in an array.
- Message from [EMAIL PROTECTED] -
    Date: Wed, 23 Jan 2008 16:05:40 +1100
    From: Michael Harris <[EMAIL PROTECTED]>
Reply-To: Michael Harris <[EMAIL PROTECTED]>
 Subject: identifying failed disk/s in an array.
      To: linux-raid@vger.kernel.org

Hi,

I have just built a Raid 5 array using mdadm, and while it is running fine
I have a question about identifying the order of disks in the array.

In the pre-SATA days you would connect your drives as follows:

  Primary Master   - HDA
  Primary Slave    - HDB
  Secondary Master - HDC
  Secondary Slave  - HDD

So if disk HDC failed I would know it was the primary disk on the
secondary controller and would replace that drive.

My current setup is as follows:

  MB Primary Master (PATA)
  Primary Master - Operating System

The array disks are attached to:

  MB Sata port 1
  MB Sata port 2
  PCI card Sata port 1

When I set up the array the OS drive was SDA and the others SDB, SDC, SDD.
The problem is that every time I reboot, the drives are sometimes detected
in a different order. Because I mount root via the UUID of the OS disk and
the kernel looks at the superblocks of the raided drives, everything comes
up fine, but I'm worried that if I move the array to another machine and
need to do a mdadm --assemble I won't know the correct order of the disks.
What is more worrying, if I have a disk fail, say HDC for example, I won't
know which disk HDC is as it could be any of the 5 disks in the PC. Is
there any way to make it easier to identify which disk is which?

- End message from [EMAIL PROTECTED] -

Try this:
  mdadm -Q --detail /dev/md0
to see which disk is which in the raid.

To identify a disk you can examine it using:
  mdadm -E /dev/sd[b-d]
and read your dmesg.

And finally you can use "blkid" to associate UUIDs with devices.

I hope this helps.

Kind regards,
Alex.
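[Editor's aside: building on the commands above, a small sketch that pairs each candidate disk with its model and serial number, so a failed member can be matched to the sticker on the drive before pulling it. The device names are examples only, and smartmontools must be installed.]

  # which partition sits in which raid slot right now
  mdadm -Q --detail /dev/md0

  # model and serial number for each candidate disk; compare with the drive label
  for dev in /dev/sd[a-d]; do
      echo "== $dev"
      smartctl -i "$dev" | grep -E 'Device Model|Serial Number'
  done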
Fwd: Error on /dev/sda, but takes down RAID-1
Hi,

I'm not sure this is completely linux-raid related, but I can't figure out
where to start:

A few days ago, my server died. I was able to log in and salvage this
content of dmesg:
http://pastebin.com/m4af616df

I talked to my hosting-people and they said it was an io-error on /dev/sda,
and replaced that drive.

After this, I was able to boot into a PXE-image and re-build the two RAID-1
devices with no problems - indicating that sdb was fine.

I expected RAID-1 to be able to stomach exactly this kind of error - one
drive dying. What did I do wrong?

Regards,
Martin Seebach
RE: AACRAID driver broken in 2.6.22.x (and beyond?) [WAS: Re: 2.6.22.16 MD raid1 doesn't mark removed disk faulty, MD thread goes UN]
At which version of the kernel did the aacraid driver allegedly first go
broken? At which version did it get fixed? (Since 1.1.5-2451 is older than
latest represented on kernel.org)

How is the SATA disk arrayed on the aacraid controller? The controller is
limited to generating 24 arrays, and since /dev/sdac is the 29th target, it
would appear we need more details on your array's topology inside the
aacraid controller. If you are using the driver with aacraid.physical=1 and
thus using the physical drives directly (in the case of a SATA disk, a
SATr0.9 translation in the Firmware), this is not a supported configuration
and was added only to enable limited experimentation. If there is a problem
in that path in the driver, I will be glad to fix it, but it is still
unsupported.

You may need to acquire a diagnostic dump from the controller (Adaptec
technical support can advise, it will depend on your application suite) and
a report of any error recovery actions reported by the driver in the system
log as initiated by the SCSI subsystem.

There are no changes in the I/O path for the aacraid driver. Due to the
simplicity of the I/O path to the processor based controller, it is
unlikely to be an issue in this path. There have been several changes in
the driver to deal with error recovery actions initiated by the SCSI
subsystem. One likely candidate was to extend the default SCSI layer
timeout because it was shorter than the adapter's firmware timeout. You can
check if this is the issue by manually increasing the timeout for the
target(s) via sysfs. There were recent patches to deal with orphaned
commands resulting from devices being taken offline by the SCSI layer.
There have been changes in the driver to reset the controller should it go
into a BlinkLED (Firmware Assert) state. The symptom also acts like a
condition in the older drivers (pre 08/08/2006 on scsi-misc-2.6, showing up
in 2.6.20.4) which did not reset the adapter when it entered the BlinkLED
state and merely allowed the system to lock, but alas you are working with
a driver that has this reset fix in the version you report. A BlinkLED
condition generally indicates a serious hardware problem or target
incompatibility, and is generally rare as they are a result of corner case
conditions within the Adapter Firmware.

The diagnostic dump reported by the Adaptec utilities should be able to
point to the fault you are experiencing if these appear to be the root
causes.

Sincerely -- Mark Salyzyn

> -----Original Message-----
> From: Mike Snitzer [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 22, 2008 7:10 PM
> To: linux-raid@vger.kernel.org; NeilBrown
> Cc: [EMAIL PROTECTED]; K. Tanaka; AACRAID; [EMAIL PROTECTED]
> Subject: AACRAID driver broken in 2.6.22.x (and beyond?)
> [WAS: Re: 2.6.22.16 MD raid1 doesn't mark removed disk
> faulty, MD thread goes UN]
>
> On Jan 22, 2008 12:29 AM, Mike Snitzer <[EMAIL PROTECTED]> wrote:
> > cc'ing Tanaka-san given his recent raid1 BUG report:
> > http://lkml.org/lkml/2008/1/14/515
> >
> >
> > On Jan 21, 2008 6:04 PM, Mike Snitzer <[EMAIL PROTECTED]> wrote:
> > > Under 2.6.22.16, I physically pulled a SATA disk (/dev/sdac, connected to
> > > an aacraid controller) that was acting as the local raid1 member of
> > > /dev/md30.
> > >
> > > Linux MD didn't see an /dev/sdac1 error until I tried forcing the issue by
> > > doing a read (with dd) from /dev/md30:
> >
> > The raid1d thread is locked at line 720 in raid1.c (raid1d+2437); aka
> > freeze_array:
> >
> > (gdb) l *0x2539
> > 0x2539 is in raid1d (drivers/md/raid1.c:720).
> > 715              * wait until barrier+nr_pending match nr_queued+2
> > 716              */
> > 717             spin_lock_irq(&conf->resync_lock);
> > 718             conf->barrier++;
> > 719             conf->nr_waiting++;
> > 720             wait_event_lock_irq(conf->wait_barrier,
> > 721                                 conf->barrier+conf->nr_pending == conf->nr_queued+2,
> > 722                                 conf->resync_lock,
> > 723                                 raid1_unplug(conf->mddev->queue));
> > 724             spin_unlock_irq(&conf->resync_lock);
> >
> > Given Tanaka-san's report against 2.6.23 and me hitting what seems to
> > be the same deadlock in 2.6.22.16; it stands to reason this affects
> > raid1 in 2.6.24-rcX too.
>
> Turns out that the aacraid driver in 2.6.22.x is HORRIBLY BROKEN (when
> you pull a drive); it responds to MD's write requests with uptodate=1
> (in raid1_end_write_request) for the drive that was pulled!  I've not
> looked to see if aacraid has been fixed in newer kernels... are others
> aware of any crucial aacraid fixes in 2.6.23.x or 2.6.24?
>
> After the drive was physically pulled, and small periodic writes
> continued to the associated MD device, the raid1 MD driver did _NOT_
> detect the pulled drive's writes as having failed (verified this with
> systemtap).  MD happily thought the write completed to both members
> (so MD had no reason to mark the pulled drive "fau
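[Editor's aside: the sysfs timeout check Mark suggests earlier in this thread looks roughly like the following; sdX and the 90-second value are placeholders, not values taken from this report.]

  # current SCSI command timeout for the target, in seconds
  cat /sys/block/sdX/device/timeout

  # raise it above the adapter firmware timeout for the duration of the test
  echo 90 > /sys/block/sdX/device/timeout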
Re: 2.6.24-rc6 reproducible raid5 hang
Sorry if this breaks threaded mail readers, I only just subscribed to the
list so don't have the original post to reply to.

I believe I'm having the same problem.

Regarding XFS on a raid5 md array:

Kernels 2.6.22-14 (Ubuntu Gutsy generic and server builds) *and*
2.6.24-rc8 (pure build from virgin sources) compiled for amd64 arch.

Raid 5 configured across 4 x 500GB SATA disks (Nforce nv_sata driver, Asus
M2N-E mobo, Athlon X64, 4GB RAM). MD chunk size is 1024k. This is allocated
to an LVM2 PV, then sliced up.

Taking one sample logical volume of 150GB I ran:

  mkfs.xfs -d su=1024k,sw=3 -L vol_linux /dev/vg00/vol_linux

I then found that putting high write load on that filesystem caused a hang.
High load could be as little as a single rsync of a mirror of Ubuntu Gutsy
(many 10's of GB) from my old server to here. A hang would typically happen
in a few hours. I could generate relatively quick hangs by running xfs_fsr
(the defragger) in parallel.

Trying the workaround of upping /sys/block/md1/md/stripe_cache_size to 4096
seems (fingers crossed) to have helped. I've been running the rsync again,
plus xfs_fsr plus a few dd's of 11 GB to the same filesystem. I did notice
also that the write speed increased dramatically with a bigger
stripe_cache_size.

A more detailed analysis of the problem indicated that, after the hang:

I could log in; one CPU core was stuck in 100% IO wait. The other core was
usable, with care. So I managed to get a SysRq-T, and one place the system
appeared blocked was via this path:

[ 2039.466258]  xfs_fsr       D  0  7324   7308
[ 2039.466260]  810119399858 0082 0046
[ 2039.466263]  810110d6c680 8101102ba998 8101102ba770 8054e5e0
[ 2039.466265]  8101102ba998 00010014a1e6 810110ddcb30
[ 2039.466268] Call Trace:
[ 2039.466277]  [] :raid456:get_active_stripe+0x1cb/0x610
[ 2039.466282]  [] default_wake_function+0x0/0x10
[ 2039.466289]  [] :raid456:make_request+0x1f8/0x610
[ 2039.466293]  [] autoremove_wake_function+0x0/0x30
[ 2039.466295]  [] __up_read+0x21/0xb0
[ 2039.466300]  [] generic_make_request+0x1d6/0x3d0
[ 2039.466303]  [] vm_normal_page+0x3d/0xc0
[ 2039.466307]  [] submit_bio+0x6f/0xf0
[ 2039.466311]  [] dio_bio_submit+0x5c/0x90
[ 2039.466313]  [] dio_send_cur_page+0x43/0xa0
[ 2039.466316]  [] submit_page_section+0x4e/0x150
[ 2039.466319]  [] __blockdev_direct_IO+0x742/0xb50
[ 2039.466342]  [] :xfs:xfs_vm_direct_IO+0x182/0x190
[ 2039.466357]  [] :xfs:xfs_get_blocks_direct+0x0/0x20
[ 2039.466370]  [] :xfs:xfs_end_io_direct+0x0/0x80
[ 2039.466375]  [] __wait_on_bit_lock+0x65/0x80
[ 2039.466380]  [] generic_file_direct_IO+0xe3/0x190
[ 2039.466385]  [] generic_file_direct_write+0x74/0x150
[ 2039.466402]  [] :xfs:xfs_write+0x492/0x8f0
[ 2039.466421]  [] :xfs:xfs_iunlock+0x2c/0xb0
[ 2039.466437]  [] :xfs:xfs_read+0x186/0x240
[ 2039.466443]  [] do_sync_write+0xd9/0x120
[ 2039.466448]  [] autoremove_wake_function+0x0/0x30
[ 2039.466457]  [] vfs_write+0xdd/0x190
[ 2039.466461]  [] sys_write+0x53/0x90
[ 2039.466465]  [] system_call+0x7e/0x83

However, I'm of the opinion that the system should not deadlock, even if
tunable parameters are unfavourable. I'm happy with the workaround (indeed
the system performs better). However, it will take me a week's worth of
testing before I'm willing to commission this as my new fileserver.

So, if there is anything anyone would like me to try, I'm happy to
volunteer as a guinea pig :) Yes, I can build and patch kernels. But I'm
not hot at debugging kernels, so if kernel core dumps or whatever are
needed, please point me at the right document or hint as to which commands
I need to read about.
Cheers

Tim
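[Editor's aside: for anyone else trying Tim's workaround, the knob lives in sysfs, is per-array, and does not survive a reboot; a rough sketch using the array and value from this thread.]

  # current size of the raid5/6 stripe cache, in stripes
  cat /sys/block/md1/md/stripe_cache_size

  # enlarge it for the test; each cached stripe pins one page per member device,
  # so bigger values cost more RAM
  echo 4096 > /sys/block/md1/md/stripe_cache_size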
Re: identifying failed disk/s in an array.
In message <[EMAIL PROTECTED]> you wrote:
>
> And/or use smartctl to look up the make/model/serial number and look at the
> drive label. I always do this to make sure I'm pulling the right drive (also
> useful to RMA the drive)

Or, probably even faster, do a "ls -l /dev/disk/by-id" (assuming you are
using udev).

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH,      MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr. 5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: [EMAIL PROTECTED]
Command, n.:
        Statement presented by a human and accepted by a computer in
        such a manner as to make the human feel as if he is in control.
Re: identifying failed disk/s in an array.
Tomasz Chmielewski wrote:
> Michael Harris schrieb:
>> i have a disk fail say HDC for example, i wont know which disk HDC is
>> as it could be any of the 5 disks in the PC. Is there anyway to make
>> it easier to identify which disk is which?.
>
> If the drives have any LEDs, the most reliable way would be:
>
>   dd if=/dev/drive of=/dev/null
>
> Then look which LED is the one which blinks the most.

And/or use smartctl to look up the make/model/serial number and look at
the drive label. I always do this to make sure I'm pulling the right drive
(also useful to RMA the drive).

David