Re: raid5 reshape/resync - BUGREPORT/PROBLEM
- Message from [EMAIL PROTECTED] -

> - Message from [EMAIL PROTECTED] -
> Nagilum said: (by the date of Tue, 18 Dec 2007 11:09:38 +0100)
>
> >> Ok, I've recreated the problem in the form of a semiautomatic testcase.
> >> All necessary files (plus the old xfs_repair output) are at:
> >>
> >> http://www.nagilum.de/md/
> >>
> >> After running the test.sh the created xfs filesystem on the raid
> >> device is broken and (at least in my case) cannot be mounted anymore.
> >
> > I think that you should file a bugreport
> - End message from [EMAIL PROTECTED] -
>
> > Where would I file this bug report? I thought this is the place?
> > I could also really use a way to fix that corruption. :(
>
> ouch. To be honest I subscribed here just a month ago, so I'm not sure.
> But I haven't seen other bugreports here so far. I was expecting that
> there is some bugzilla?

Not really, I'm afraid. At least I'm not aware of anything like that for
the vanilla kernel.

Anyway, I just verified the bug on 2.6.23.11 and 2.6.24-rc5-git4. I
originally came across the bug on amd64 and am now using a PPC750 machine
to verify it, so it is an architecture-independent bug (but that was to
be expected).

I also prepared a different version of the testcase, "v2_start.sh" and
"v2_test.sh". This one prints out all the wrong bytes (longs, to be
exact) plus their locations. It shows the data is still there, but
scattered. :(

Kind regards,
Alex.

- End message from [EMAIL PROTECTED] -
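The v2 test scripts themselves are only available at the URL above; as a
rough illustration of the kind of check they perform, a known pattern file
written before the reshape can be compared byte by byte against what is
read back afterwards (untested sketch, file names are placeholders):

  # list every differing byte: 1-based offset plus the octal value in each file
  cmp -l pattern.bin /mnt/md-test/pattern.bin | head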
Re: raid5 reshape/resync - BUGREPORT
> - Message from [EMAIL PROTECTED] -
> Nagilum said: (by the date of Tue, 18 Dec 2007 11:09:38 +0100)
>
> >> Ok, I've recreated the problem in the form of a semiautomatic testcase.
> >> All necessary files (plus the old xfs_repair output) are at:
> >>
> >> http://www.nagilum.de/md/
> >>
> >> After running the test.sh the created xfs filesystem on the raid
> >> device is broken and (at least in my case) cannot be mounted anymore.
> >
> > I think that you should file a bugreport
> - End message from [EMAIL PROTECTED] -
>
> Where would I file this bug report? I thought this is the place?
> I could also really use a way to fix that corruption. :(

ouch. To be honest I subscribed here just a month ago, so I'm not sure.
But I haven't seen other bugreports here so far. I was expecting that
there is some bugzilla?

--
Janek Kozicki |
Re: raid5 reshape/resync
- Message from [EMAIL PROTECTED] -
    Date: Sun, 16 Dec 2007 14:16:45 +0100
    From: Janek Kozicki <[EMAIL PROTECTED]>
Reply-To: Janek Kozicki <[EMAIL PROTECTED]>
 Subject: Re: raid5 reshape/resync
      To: Nagilum <[EMAIL PROTECTED]>
      Cc: linux-raid@vger.kernel.org

Nagilum said: (by the date of Tue, 11 Dec 2007 22:56:13 +0100)

  Ok, I've recreated the problem in the form of a semiautomatic testcase.
  All necessary files (plus the old xfs_repair output) are at:
  http://www.nagilum.de/md/
  After running the test.sh the created xfs filesystem on the raid
  device is broken and (at least in my case) cannot be mounted anymore.

I think that you should file a bugreport, and provide there the
explanations you have put in there. An automated test case that leads to
xfs corruption is a neat snack for bug squashers ;-)

I wonder however where to report this - the xfs or raid side? Perhaps
cross-report to both places and write in the bugreport that you are not
sure on which side the bug is.
- End message from [EMAIL PROTECTED] -

This is an md/mdadm problem. xfs is merely used as a vehicle to show the
problem, also amplified by luks.

Where would I file this bug report? I thought this is the place?
I could also really use a way to fix that corruption. :(

Thanks,
Alex.

PS: yesterday I verified this bug on 2.6.23.9, will do 2.6.23.11 today.
Re: raid5 reshape/resync
Nagilum said: (by the date of Tue, 11 Dec 2007 22:56:13 +0100)

> Ok, I've recreated the problem in the form of a semiautomatic testcase.
> All necessary files (plus the old xfs_repair output) are at:
> http://www.nagilum.de/md/
> After running the test.sh the created xfs filesystem on the raid
> device is broken and (at least in my case) cannot be mounted anymore.

I think that you should file a bugreport, and provide there the
explanations you have put in there. An automated test case that leads to
xfs corruption is a neat snack for bug squashers ;-)

I wonder however where to report this - the xfs or raid side? Perhaps
cross-report to both places and write in the bugreport that you are not
sure on which side the bug is.

best regards
--
Janek Kozicki |
Re: raid5 reshape/resync
- Message from [EMAIL PROTECTED] -
    Date: Sat, 01 Dec 2007 15:48:17 +0100
    From: Nagilum <[EMAIL PROTECTED]>
Reply-To: Nagilum <[EMAIL PROTECTED]>
 Subject: Re: raid5 reshape/resync
      To: Neil Brown <[EMAIL PROTECTED]>
      Cc: linux-raid@vger.kernel.org

I'm not sure how to reorder things so it will be ok again, I'll ponder
about that while I try to recreate the situation using files and losetup.
- End message from [EMAIL PROTECTED] -

Ok, I've recreated the problem in the form of a semiautomatic testcase.
All necessary files (plus the old xfs_repair output) are at:
http://www.nagilum.de/md/
I also added a readme: http://www.nagilum.de/md/readme.txt
After running the test.sh the created xfs filesystem on the raid device
is broken and (at least in my case) cannot be mounted anymore.
I hope this will help find the problem.

Kind regards,
Alex.
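test.sh itself (and the readme) live at the URL above; the general shape of
such a losetup-based reproduction is roughly the following - an untested
sketch with placeholder file names, sizes and device numbers, not the
actual script:

  for i in 0 1 2 3 4; do
      dd if=/dev/zero of=disk$i.img bs=1M count=200
      losetup /dev/loop$i disk$i.img
  done
  mdadm --create /dev/md1 --level=5 --raid-devices=4 /dev/loop[0-3]
  mkfs.xfs /dev/md1
  # ...mount, write test data, then grow onto the fifth device:
  mdadm --add /dev/md1 /dev/loop4
  mdadm --grow /dev/md1 --raid-devices=5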
Re: raid5 reshape/resync
- Message from [EMAIL PROTECTED] -
    Date: Thu, 29 Nov 2007 16:48:47 +1100
    From: Neil Brown <[EMAIL PROTECTED]>
Reply-To: Neil Brown <[EMAIL PROTECTED]>
 Subject: Re: raid5 reshape/resync
      To: Nagilum <[EMAIL PROTECTED]>
      Cc: linux-raid@vger.kernel.org

>> Hi,
>> I'm running 2.6.23.8 x86_64 using mdadm v2.6.4.
>> I was adding a disk (/dev/sdf) to an existing raid5 (/dev/sd[a-e] -> md0).
>> During that reshape (at around 4%) /dev/sdd reported read errors and
>> went offline.
>
> Sad.
>
>> I replaced /dev/sdd with a new drive and tried to reassemble the array
>> (/dev/sdd was shown as removed and now as spare).
>
> There must be a step missing here. Just because one drive goes offline,
> that doesn't mean that you need to reassemble the array. It should just
> continue with the reshape until that is finished.
> Did you shut the machine down, or did it crash, or what?
>
>> Assembly worked but it would not run unless I use --force.
>
> That suggests an unclean shutdown. Maybe it did crash?

I started the reshape and went out. When I came back the controller was
beeping (indicating the failing disk). I tried to log on but I could not
get in. The machine was responding to pings but that was about it (no
ssh or xdm login worked). So I hard rebooted.
I booted into a rescue root; the /etc/mdadm/mdadm.conf didn't yet include
the new disk, so the raid was missing one disk and not started. Since I
didn't know exactly what was going on I --re-added sdf (the new disk) and
tried to resume reshaping. A second into that the read failure on
/dev/sdd was reported. So I stopped md0 and shut down to verify the read
error with another controller. After I had verified that, I replaced
/dev/sdd with a new drive and put in the broken drive as /dev/sdg, just
in case.

>> Since I'm always reluctant to use force I put the bad disk back in,
>> this time as /dev/sdg. I re-added the drive and could run the array.
>> The array started to resync (since the disk can be read until 4%) and
>> then I marked the disk as failed. Now the array is "active, degraded,
>> recovering":
>
> It should have restarted the reshape from wherever it was up to, so it
> should have hit the read error almost immediately.
> Do you remember where it started the reshape from? If it restarted from
> the beginning that would be bad.

It must have continued where it left off, since the reshape position in
all superblocks was at about 4%.

> Did you just "--assemble" all the drives or did you do something else?

Sorry for being a bit imprecise here: I didn't actually have to use
--assemble. When booting into the rescue root the raid came up with
/dev/sdd and /dev/sdf removed. I just had to --re-add /dev/sdf.

>> unusually low which seems to indicate a lot of seeking as if two
>> operations are happening at the same time.
>
> Well, reshape is always slow as it has to read from one part of the
> drive and write to another part of the drive.

Actually it was resyncing at the minimum speed; I managed to crank the
speed up to >20MB/s by adjusting /sys/block/md0/md/sync_speed_min (a
short sketch of that follows below).

>> Can someone relieve my doubts as to whether md does the right thing here?
>> Thanks,
>
> I believe it is doing "the right thing".

>> Ok, so the reshape tried to continue without the failed drive and
>> after that resynced to the new spare.
>
> As I would expect.
>
>> Unfortunately the result is a mess. On top of the Raid5 I have
>> dm-crypt and LVM.
>
> Hmm. This I would not expect.
>
>> Although dm-crypt and LVM don't appear to have a problem, the
>> filesystems on top are a mess now.
> Can you be more specific about what sort of "mess" they are in?
- End message from [EMAIL PROTECTED] -

Sure. So here is the vg layout:

nas:~# lvdisplay vg01
  --- Logical volume ---
  LV Name                /dev/vg01/lv1
  VG Name                vg01
  LV UUID                4HmzU2-VQpO-vy5R-Wdys-PmwH-AuUg-W02CKS
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                512.00 MB
  Current LE             128
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:1

  --- Logical volume ---
  LV Name                /dev/vg01/lv2
  VG Name                vg01
  LV UUID                4e2ZB9-29Rb-dy4M-EzEY-cEIG-Nm1I-CPI0kk
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                7.81 GB
  Current LE             2000
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:2

  --- Logical volume ---
  LV Name                /dev/vg01/lv3
  VG Name                vg01
  LV UUID                YQRd0X-5hF8-2dd3-GG4v-wQLH-WGH0-ntGgug
  LV Write Access        read/write
  LV Status              available
  #
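A short sketch of the sync-speed adjustment mentioned above (the value is
only an example; the system-wide defaults live under /proc/sys/dev/raid):

  cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
  # raise the per-array minimum so the reshape is not throttled down
  echo 20000 > /sys/block/md0/md/sync_speed_min
  # current rate
  cat /sys/block/md0/md/sync_speed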
Re: raid5 reshape/resync
On Sunday November 25, [EMAIL PROTECTED] wrote:
> - Message from [EMAIL PROTECTED] -
>     Date: Sat, 24 Nov 2007 12:02:09 +0100
>     From: Nagilum <[EMAIL PROTECTED]>
> Reply-To: Nagilum <[EMAIL PROTECTED]>
>  Subject: raid5 reshape/resync
>       To: linux-raid@vger.kernel.org
>
> > Hi,
> > I'm running 2.6.23.8 x86_64 using mdadm v2.6.4.
> > I was adding a disk (/dev/sdf) to an existing raid5 (/dev/sd[a-e] -> md0).
> > During that reshape (at around 4%) /dev/sdd reported read errors and
> > went offline.

Sad.

> > I replaced /dev/sdd with a new drive and tried to reassemble the array
> > (/dev/sdd was shown as removed and now as spare).

There must be a step missing here. Just because one drive goes offline,
that doesn't mean that you need to reassemble the array. It should just
continue with the reshape until that is finished.
Did you shut the machine down, or did it crash, or what?

> > Assembly worked but it would not run unless I use --force.

That suggests an unclean shutdown. Maybe it did crash?

> > Since I'm always reluctant to use force I put the bad disk back in,
> > this time as /dev/sdg. I re-added the drive and could run the array.
> > The array started to resync (since the disk can be read until 4%) and
> > then I marked the disk as failed. Now the array is "active, degraded,
> > recovering":

It should have restarted the reshape from wherever it was up to, so it
should have hit the read error almost immediately.
Do you remember where it started the reshape from? If it restarted from
the beginning that would be bad.

Did you just "--assemble" all the drives or did you do something else?

> > What I find somewhat confusing/disturbing is that it does not appear to
> > utilize /dev/sdd. What I see here could be explained by md doing a
> > RAID5 resync from the 4 drives sd[a-c,e] to sd[a-c,e,f] but I would
> > have expected it to use the new spare sdd for that. Also the speed is

md cannot recover to a spare while a reshape is happening. It completes
the reshape, then does the recovery (as you discovered).

> > unusually low which seems to indicate a lot of seeking as if two
> > operations are happening at the same time.

Well, reshape is always slow as it has to read from one part of the
drive and write to another part of the drive.

> > Also when I look at the data rates it looks more like the reshape is
> > continuing even though one drive is missing (possible but risky).

Yes, that is happening.

> > Can someone relieve my doubts as to whether md does the right thing here?
> > Thanks,

I believe it is doing "the right thing".

> > - End message from [EMAIL PROTECTED] -
>
> Ok, so the reshape tried to continue without the failed drive and
> after that resynced to the new spare.

As I would expect.

> Unfortunately the result is a mess. On top of the Raid5 I have
> dm-crypt and LVM.

Hmm. This I would not expect.

> Although dm-crypt and LVM don't appear to have a problem, the filesystems
> on top are a mess now.

Can you be more specific about what sort of "mess" they are in?

NeilBrown

> I still have the failed drive, I can read the superblock from that
> drive and up to 4% from the beginning and probably backwards from the
> end towards that point.
> So in theory it could be possible to reorder the stripe blocks which
> appear to have been messed up. (?)
> Unfortunately I'm not sure what exactly went wrong or what I did
> wrong. Can someone please give me a hint?
> Thanks,
> Alex.
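A note on the question above about where the reshape restarted: the
current reshape position is recorded in each member's superblock, so it
can be read back after the fact. A rough sketch (the device name is only
an example, and the exact field labels vary between mdadm versions):

  mdadm --examine /dev/sde1 | egrep -i 'reshape|delta|position'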
Re: raid5 reshape/resync
- Message from [EMAIL PROTECTED] -
    Date: Sat, 24 Nov 2007 12:02:09 +0100
    From: Nagilum <[EMAIL PROTECTED]>
Reply-To: Nagilum <[EMAIL PROTECTED]>
 Subject: raid5 reshape/resync
      To: linux-raid@vger.kernel.org

Hi,
I'm running 2.6.23.8 x86_64 using mdadm v2.6.4.
I was adding a disk (/dev/sdf) to an existing raid5 (/dev/sd[a-e] -> md0).
During that reshape (at around 4%) /dev/sdd reported read errors and went
offline.
I replaced /dev/sdd with a new drive and tried to reassemble the array
(/dev/sdd was shown as removed and now as spare).
Assembly worked but it would not run unless I used --force.
Since I'm always reluctant to use force I put the bad disk back in, this
time as /dev/sdg. I re-added the drive and could run the array.
The array started to resync (since the disk can be read until 4%) and
then I marked the disk as failed. Now the array is "active, degraded,
recovering":

nas:~# mdadm -Q --detail /dev/md0
/dev/md0:
        Version : 00.91.03
  Creation Time : Sat Sep 15 21:11:41 2007
     Raid Level : raid5
     Array Size : 1953234688 (1862.75 GiB 2000.11 GB)
  Used Dev Size : 488308672 (465.69 GiB 500.03 GB)
   Raid Devices : 6
  Total Devices : 7
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sat Nov 24 10:10:46 2007
          State : active, degraded, recovering
 Active Devices : 5
Working Devices : 6
 Failed Devices : 1
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 16K

 Reshape Status : 19% complete
  Delta Devices : 1, (5->6)

           UUID : 25da80a6:d56eb9d6:0d7656f3:2f233380
         Events : 0.726347

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       16        1      active sync   /dev/sdb
       2       8       32        2      active sync   /dev/sdc
       6       8       96        3      faulty spare rebuilding   /dev/sdg
       4       8       64        4      active sync   /dev/sde
       5       8       80        5      active sync   /dev/sdf

       7       8       48        -      spare   /dev/sdd

iostat:
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             129.48      1498.01      1201.59       7520       6032
sdb             134.86      1498.01      1201.59       7520       6032
sdc             127.69      1498.01      1201.59       7520       6032
sdd               0.40         0.00         3.19          0         16
sde             111.55      1498.01      1201.59       7520       6032
sdf             117.73         0.00      1201.59          0       6032
sdg               0.00         0.00         0.00          0          0

What I find somewhat confusing/disturbing is that it does not appear to
utilize /dev/sdd. What I see here could be explained by md doing a RAID5
resync from the 4 drives sd[a-c,e] to sd[a-c,e,f], but I would have
expected it to use the new spare sdd for that. Also the speed is
unusually low, which seems to indicate a lot of seeking, as if two
operations are happening at the same time.
Also when I look at the data rates it looks more like the reshape is
continuing even though one drive is missing (possible but risky).
Can someone relieve my doubts as to whether md does the right thing here?
Thanks,
- End message from [EMAIL PROTECTED] -

Ok, so the reshape tried to continue without the failed drive and after
that resynced to the new spare.
Unfortunately the result is a mess. On top of the Raid5 I have dm-crypt
and LVM.
Although dm-crypt and LVM don't appear to have a problem, the filesystems
on top are a mess now.
I still have the failed drive; I can read the superblock from that drive
and up to 4% from the beginning, and probably backwards from the end
towards that point.
So in theory it could be possible to reorder the stripe blocks which
appear to have been messed up. (?)
Unfortunately I'm not sure what exactly went wrong or what I did wrong.
Can someone please give me a hint?
Thanks,
Alex.
RE: Raid5 Reshape gone wrong, please help
On Monday August 27, [EMAIL PROTECTED] wrote:
> > -    s.locked += handle_write_operations5(sh, 0, 1);
> > +    s.locked += handle_write_operations5(sh, 1, 1);
>
> How about for clarity:
>      s.locked += handle_write_operations5(sh, RECONSTRUCT_WRITE, 1);
>

Nope. That second argument is a boolean, not an enum.

If it was changed to 'writemode' (or similar) and the code in
handle_write_operations5 were changed to

    switch(writemode) {
    case RECONSTRUCT_WRITE:
    case READ_MODIFY_WRITE:
    }

Then it would make sense to use RECONSTRUCT_WRITE in the call - and the
code would probably be more readable on the whole.
But as it is, either 'true' or '1' should go there.

NeilBrown
Re: Raid5 Reshape gone wrong, please help
Greg Nicholson wrote:
> OK I've reproduced the original issue on a separate box.
> 2.6.23-rc3 does not like to grow Raid 5 arrays. MDadm 2.6.3

I have to say that trying something as critical as a reshape of live
data on an -rc kernel is a great way to have a learning experience.
Good that you found the problem, but also good that *you* found the
problem, not me. Thanks for testing. ;-)

> mdadm --add /dev/md0 /dev/sda1
> mdadm -G --backup-file=/root/backup.raid.file /dev/md0
>
> (Yes, I added the backup-file this time... just to be sure.)
>
> Mdadm began the grow, and stopped in the critical section, or right
> after creating the backup... Not sure which. Reboot.
>
> Refused to start the array. So...
>
> mdadm -A /dev/md0 /dev/sd[abdefg]1
>
> and we have in /proc/mdstat:
>
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sdg1[0] sda1[5] sdf1[4] sdd1[3] sdb1[2] sde1[1]
>       1953535488 blocks super 0.91 level 5, 128k chunk, algorithm 2
>       [6/6] [UUUUUU]
>       [>....................]  reshape =  0.0% (512/488383872)
>       finish=378469.4min speed=0K/sec
>
> unused devices: <none>
>
> And it's sat there without change for the past 2 hours. Now, I have a
> backup, so frankly, I'm about to blow away the array and just recreate
> it, but I thought you should know.
>
> I've got the stripe_cache_size at 8192... 256 and 1024 don't change
> anything.

--
bill davidsen <[EMAIL PROTECTED]>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979
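For reference, the stripe cache setting mentioned above is a sysfs knob; a
small sketch (8192 is just the value used here - each entry costs roughly
one page per member device, so large values use real memory):

  cat /sys/block/md0/md/stripe_cache_size
  echo 8192 > /sys/block/md0/md/stripe_cache_size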
RE: Raid5 Reshape gone wrong, please help
> From: Neil Brown [mailto:[EMAIL PROTECTED]
> On Thursday August 23, [EMAIL PROTECTED] wrote:
> >
> > OK I've reproduced the original issue on a separate box.
> > 2.6.23-rc3 does not like to grow Raid 5 arrays. MDadm 2.6.3
>
> No, you are right. It doesn't.
>
> Obviously insufficient testing and review - thanks for finding it for us.

Agreed - seconded.

> This patch seems to make it work - raid5 and raid6.
>
> Dan: Could you check it for me, particularly the moving of
> +        async_tx_ack(tx);
> +        dma_wait_for_async_tx(tx);
> outside of the loop.

Yes, this definitely needs to be outside the loop.

> Greg: could you please check it works for you too - it works for me,
> but double-testing never hurts.
>
> Thanks again,
>
> NeilBrown
>
> ------------------------------
> Fix some bugs with growing raid5/raid6 arrays.
>
> ### Diffstat output
>  ./drivers/md/raid5.c |   17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
> --- .prev/drivers/md/raid5.c    2007-08-24 16:36:22.000000000 +1000
> +++ ./drivers/md/raid5.c        2007-08-27 20:50:57.000000000 +1000
> @@ -2541,7 +2541,7 @@ static void handle_stripe_expansion(raid
>                 struct dma_async_tx_descriptor *tx = NULL;
>                 clear_bit(STRIPE_EXPAND_SOURCE, &sh->state);
>                 for (i = 0; i < sh->disks; i++)
> -                       if (i != sh->pd_idx && (r6s && i != r6s->qd_idx)) {
> +                       if (i != sh->pd_idx && (!r6s || i != r6s->qd_idx)) {
>                                 int dd_idx, pd_idx, j;
>                                 struct stripe_head *sh2;
>
> @@ -2574,7 +2574,8 @@ static void handle_stripe_expansion(raid
>                         set_bit(R5_UPTODATE, &sh2->dev[dd_idx].flags);
>                         for (j = 0; j < conf->raid_disks; j++)
>                                 if (j != sh2->pd_idx &&
> -                                   (r6s && j != r6s->qd_idx) &&
> +                                   (!r6s || j != raid6_next_disk(sh2->pd_idx,
> +                                                                 sh2->disks)) &&
>                                     !test_bit(R5_Expanded, &sh2->dev[j].flags))
>                                         break;
>                         if (j == conf->raid_disks) {
> @@ -2583,12 +2584,12 @@ static void handle_stripe_expansion(raid
>                         }
>                         release_stripe(sh2);
>
> -                       /* done submitting copies, wait for them to complete */
> -                       if (i + 1 >= sh->disks) {
> -                               async_tx_ack(tx);
> -                               dma_wait_for_async_tx(tx);
> -                       }
>                 }
> +       /* done submitting copies, wait for them to complete */
> +       if (tx) {
> +               async_tx_ack(tx);
> +               dma_wait_for_async_tx(tx);
> +       }
>  }
>
>  /*
> @@ -2855,7 +2856,7 @@ static void handle_stripe5(struct stripe
>                         sh->disks = conf->raid_disks;
>                         sh->pd_idx = stripe_to_pdidx(sh->sector, conf,
>                                 conf->raid_disks);
> -                       s.locked += handle_write_operations5(sh, 0, 1);
> +                       s.locked += handle_write_operations5(sh, 1, 1);

How about for clarity:
                        s.locked += handle_write_operations5(sh, RECONSTRUCT_WRITE, 1);

>                 } else if (s.expanded &&
>                         !test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending)) {
>                         clear_bit(STRIPE_EXPAND_READY, &sh->state);

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
Re: Raid5 Reshape gone wrong, please help
On Thursday August 23, [EMAIL PROTECTED] wrote:
>
> OK I've reproduced the original issue on a separate box.
> 2.6.23-rc3 does not like to grow Raid 5 arrays. MDadm 2.6.3

No, you are right. It doesn't.

Obviously insufficient testing and review - thanks for finding it for us.

This patch seems to make it work - raid5 and raid6.

Dan: Could you check it for me, particularly the moving of
+        async_tx_ack(tx);
+        dma_wait_for_async_tx(tx);
outside of the loop.

Greg: could you please check it works for you too - it works for me,
but double-testing never hurts.

Thanks again,

NeilBrown

------------------------------
Fix some bugs with growing raid5/raid6 arrays.

### Diffstat output
 ./drivers/md/raid5.c |   17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c    2007-08-24 16:36:22.000000000 +1000
+++ ./drivers/md/raid5.c        2007-08-27 20:50:57.000000000 +1000
@@ -2541,7 +2541,7 @@ static void handle_stripe_expansion(raid
                struct dma_async_tx_descriptor *tx = NULL;
                clear_bit(STRIPE_EXPAND_SOURCE, &sh->state);
                for (i = 0; i < sh->disks; i++)
-                       if (i != sh->pd_idx && (r6s && i != r6s->qd_idx)) {
+                       if (i != sh->pd_idx && (!r6s || i != r6s->qd_idx)) {
                                int dd_idx, pd_idx, j;
                                struct stripe_head *sh2;

@@ -2574,7 +2574,8 @@ static void handle_stripe_expansion(raid
                        set_bit(R5_UPTODATE, &sh2->dev[dd_idx].flags);
                        for (j = 0; j < conf->raid_disks; j++)
                                if (j != sh2->pd_idx &&
-                                   (r6s && j != r6s->qd_idx) &&
+                                   (!r6s || j != raid6_next_disk(sh2->pd_idx,
+                                                                 sh2->disks)) &&
                                    !test_bit(R5_Expanded, &sh2->dev[j].flags))
                                        break;
                        if (j == conf->raid_disks) {
@@ -2583,12 +2584,12 @@ static void handle_stripe_expansion(raid
                        }
                        release_stripe(sh2);

-                       /* done submitting copies, wait for them to complete */
-                       if (i + 1 >= sh->disks) {
-                               async_tx_ack(tx);
-                               dma_wait_for_async_tx(tx);
-                       }
                }
+       /* done submitting copies, wait for them to complete */
+       if (tx) {
+               async_tx_ack(tx);
+               dma_wait_for_async_tx(tx);
+       }
 }

 /*
@@ -2855,7 +2856,7 @@ static void handle_stripe5(struct stripe
                        sh->disks = conf->raid_disks;
                        sh->pd_idx = stripe_to_pdidx(sh->sector, conf,
                                conf->raid_disks);
-                       s.locked += handle_write_operations5(sh, 0, 1);
+                       s.locked += handle_write_operations5(sh, 1, 1);
                } else if (s.expanded &&
                        !test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending)) {
                        clear_bit(STRIPE_EXPAND_READY, &sh->state);
Re: Raid5 Reshape gone wrong, please help
On 8/23/07, Greg Nicholson <[EMAIL PROTECTED]> wrote:
>
> OK I've reproduced the original issue on a separate box.
> 2.6.23-rc3 does not like to grow Raid 5 arrays. MDadm 2.6.3
>
> mdadm --add /dev/md0 /dev/sda1
> mdadm -G --backup-file=/root/backup.raid.file /dev/md0
>
> (Yes, I added the backup-file this time... just to be sure.)
>
> Mdadm began the grow, and stopped in the critical section, or right
> after creating the backup... Not sure which. Reboot.
>
> Refused to start the array. So...
>
> mdadm -A /dev/md0 /dev/sd[abdefg]1
>
> and we have in /proc/mdstat:
>
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sdg1[0] sda1[5] sdf1[4] sdd1[3] sdb1[2] sde1[1]
>       1953535488 blocks super 0.91 level 5, 128k chunk, algorithm 2
>       [6/6] [UUUUUU]
>       [>....................]  reshape =  0.0% (512/488383872)
>       finish=378469.4min speed=0K/sec
>
> unused devices: <none>
>
> And it's sat there without change for the past 2 hours. Now, I have a
> backup, so frankly, I'm about to blow away the array and just recreate
> it, but I thought you should know.
>
> I've got the stripe_cache_size at 8192... 256 and 1024 don't change
> anything.
>

Forgot the DMESG output:

md: bind
md: bind
md: bind
md: bind
md: bind
md: bind
md: md0: raid array is not clean -- starting background reconstruction
raid5: reshape will continue
raid5: device sdg1 operational as raid disk 0
raid5: device sda1 operational as raid disk 5
raid5: device sdf1 operational as raid disk 4
raid5: device sdd1 operational as raid disk 3
raid5: device sdb1 operational as raid disk 2
raid5: device sde1 operational as raid disk 1
raid5: allocated 6293kB for md0
raid5: raid level 5 set md0 active with 6 out of 6 devices, algorithm 2
RAID5 conf printout:
 --- rd:6 wd:6
 disk 0, o:1, dev:sdg1
 disk 1, o:1, dev:sde1
 disk 2, o:1, dev:sdb1
 disk 3, o:1, dev:sdd1
 disk 4, o:1, dev:sdf1
 disk 5, o:1, dev:sda1
...ok start reshape thread
md: reshape of RAID array md0
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000
KB/sec) for reshape.
md: using 128k window, over a total of 488383872 blocks.

Looks good, but it doesn't actually do anything.
Re: Raid5 Reshape gone wrong, please help
OK I've reproduced the original issue on a separate box.
2.6.23-rc3 does not like to grow Raid 5 arrays. MDadm 2.6.3

mdadm --add /dev/md0 /dev/sda1
mdadm -G --backup-file=/root/backup.raid.file /dev/md0

(Yes, I added the backup-file this time... just to be sure.)

Mdadm began the grow, and stopped in the critical section, or right
after creating the backup... Not sure which. Reboot.

Refused to start the array. So...

mdadm -A /dev/md0 /dev/sd[abdefg]1

and we have in /proc/mdstat:

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdg1[0] sda1[5] sdf1[4] sdd1[3] sdb1[2] sde1[1]
      1953535488 blocks super 0.91 level 5, 128k chunk, algorithm 2
      [6/6] [UUUUUU]
      [>....................]  reshape =  0.0% (512/488383872)
      finish=378469.4min speed=0K/sec

unused devices: <none>

And it's sat there without change for the past 2 hours. Now, I have a
backup, so frankly, I'm about to blow away the array and just recreate
it, but I thought you should know.

I've got the stripe_cache_size at 8192... 256 and 1024 don't change
anything.
Re: Raid5 Reshape gone wrong, please help
On 8/19/07, Greg Nicholson <[EMAIL PROTECTED]> wrote:
> On 8/19/07, Greg Nicholson <[EMAIL PROTECTED]> wrote:
> > On 8/19/07, Neil Brown <[EMAIL PROTECTED]> wrote:
> > > On Saturday August 18, [EMAIL PROTECTED] wrote:
> > > >
> > > > That looks to me like the first 2 gig is completely empty on the
> > > > drive. I really don't think it actually started to do anything.
> > >
> > > The backup data is near the end of the device. If you look at the
> > > last 2 gig you should see something.
> > >
> >
> > I figured something like that after I started thinking about it...
> > That device is currently offline while I do some DD's to new devices.
> >
> > > >
> > > > Do you have further suggestions on where to go now?
> > >
> > > Maybe an 'strace' of "mdadm -A ..." might show me something.
> > >
> > > If you feel like following the code, Assemble (in Assemble.c) should
> > > call Grow_restart.
> > > This should look in /dev/sdb1 (which is already open in 'fdlist') by
> > > calling 'load_super'. It should then seek to 8 sectors before the
> > > superblock (or close to there) and read a secondary superblock which
> > > describes the backup data.
> > > If this looks good, it seeks to where the backup data is (which is
> > > towards the end of the device) and reads that. It uses this to
> > > restore the 'critical section', and then updates the superblock on
> > > all devices.
> > >
> > > As you aren't getting the messages 'restoring critical section',
> > > something is going wrong before there. It should fail:
> > >    /dev/md0: Failed to restore critical section for reshape, sorry.
> > > but I can see that there is a problem with the error return from
> > > 'Grow_restart'. I'll get that fixed.
> > >
> > > >
> > > > Oh, and thank you very much for your help. Most of the data on this
> > > > array I can stand to lose... It's not critical, but there are some
> > > > of my photographs on this that my backup is out of date on. I can
> > > > destroy it all and start over, but really want to try to recover
> > > > this if it's possible. For that matter, if it didn't actually start
> > > > rewriting the stripes, is there any way to push it back down to 4
> > > > disks to recover?
> > >
> > > You could always just recreate the array:
> > >
> > >   mdadm -C /dev/md0 -l5 -n4 -c256 --assume-clean /dev/sdf1 /dev/sde1 \
> > >       /dev/sdd1 /dev/sdc1
> > >
> > > and make sure the data looks good (which it should).
> > >
> > > I'd still like to know what the problem is though....
> > >
> > > Thanks,
> > > NeilBrown
> > >
> >
> > My current plan of attack, which I've been proceeding upon for the
> > last 24 hours... I'm DDing the original drives to new devices. Once I
> > have copies of the drives, I'm going to try to recreate the array as a
> > 4 device array. Hopefully, at that point, the raid will come up, LVM
> > will initialize, and it's time to saturate the GigE offloading
> > EVERYTHING.
> >
> > Assuming the above goes well, which will definitely take some time,
> > then I'll take the original drives, run the strace and try to get some
> > additional data for you. I'd love to know what's up with this as
> > well. If there is additional information I can get you to help, let
> > me know. I've grown several arrays before without any issue, which
> > frankly is why I didn't think this would have been an issue... thus,
> > my offload of the stuff I actually cared about wasn't up to date.
> >
> > At the end of the day (or more likely, week) I'll completely destroy
> > the existing raid, and rebuild the entire thing to make sure I'm
> > starting from a good base. At least at that point, I'll have
> > additional drives. Given that I have dual file-servers that will have
> > drives added, it seems likely that I'll be testing the code again
> > soon. Big difference being that this time, I won't make the assumption
> > that everything will be perfect. :)
> >
> > Thanks again for your help, I'll post on my results as well as try to
> > get you that strace. It's been quite a while since I dove into kernel
> > internals, or C for that matter, so it's unlikely I'm going to find
> > anything myself. But I'll definitely send results back if I can.
> >
>
> Ok, as an update. ORDER MATTERS. :)
>
> The above command didn't work. It built, but LVM didn't recognize it.
> So, after despair, I thought, that's not the way I built it. So, I
> redid it in alphabetical order... and it worked.
>
> I'm in the process of tarring and pulling everything off.
>
> Once that is done, I'll put the original drives back in, and try to
> understand what went wrong with the original grow/build.
>

And as a final update... I pulled all the data from the 4 disk array I
built from the copied disks. Everything looks to be intact. That is
definitely a better feeling for me.

I then put the original disks back in, and compiled 2.6.3 to see if it
did any better on the assemble. It appears that your update about the
critical section missing
Re: Raid5 Reshape gone wrong, please help
On 8/19/07, Greg Nicholson <[EMAIL PROTECTED]> wrote:
> On 8/19/07, Neil Brown <[EMAIL PROTECTED]> wrote:
> > On Saturday August 18, [EMAIL PROTECTED] wrote:
> > >
> > > That looks to me like the first 2 gig is completely empty on the
> > > drive. I really don't think it actually started to do anything.
> >
> > The backup data is near the end of the device. If you look at the
> > last 2 gig you should see something.
> >
>
> I figured something like that after I started thinking about it...
> That device is currently offline while I do some DD's to new devices.
>
> > >
> > > Do you have further suggestions on where to go now?
> >
> > Maybe an 'strace' of "mdadm -A ..." might show me something.
> >
> > If you feel like following the code, Assemble (in Assemble.c) should
> > call Grow_restart.
> > This should look in /dev/sdb1 (which is already open in 'fdlist') by
> > calling 'load_super'. It should then seek to 8 sectors before the
> > superblock (or close to there) and read a secondary superblock which
> > describes the backup data.
> > If this looks good, it seeks to where the backup data is (which is
> > towards the end of the device) and reads that. It uses this to
> > restore the 'critical section', and then updates the superblock on all
> > devices.
> >
> > As you aren't getting the messages 'restoring critical section',
> > something is going wrong before there. It should fail:
> >    /dev/md0: Failed to restore critical section for reshape, sorry.
> > but I can see that there is a problem with the error return from
> > 'Grow_restart'. I'll get that fixed.
> >
> > >
> > > Oh, and thank you very much for your help. Most of the data on this
> > > array I can stand to lose... It's not critical, but there are some of
> > > my photographs on this that my backup is out of date on. I can
> > > destroy it all and start over, but really want to try to recover this
> > > if it's possible. For that matter, if it didn't actually start
> > > rewriting the stripes, is there any way to push it back down to 4
> > > disks to recover?
> >
> > You could always just recreate the array:
> >
> >   mdadm -C /dev/md0 -l5 -n4 -c256 --assume-clean /dev/sdf1 /dev/sde1 \
> >       /dev/sdd1 /dev/sdc1
> >
> > and make sure the data looks good (which it should).
> >
> > I'd still like to know what the problem is though....
> >
> > Thanks,
> > NeilBrown
> >
>
> My current plan of attack, which I've been proceeding upon for the
> last 24 hours... I'm DDing the original drives to new devices. Once I
> have copies of the drives, I'm going to try to recreate the array as a
> 4 device array. Hopefully, at that point, the raid will come up, LVM
> will initialize, and it's time to saturate the GigE offloading
> EVERYTHING.
>
> Assuming the above goes well, which will definitely take some time,
> then I'll take the original drives, run the strace and try to get some
> additional data for you. I'd love to know what's up with this as
> well. If there is additional information I can get you to help, let
> me know. I've grown several arrays before without any issue, which
> frankly is why I didn't think this would have been an issue... thus,
> my offload of the stuff I actually cared about wasn't up to date.
>
> At the end of the day (or more likely, week) I'll completely destroy
> the existing raid, and rebuild the entire thing to make sure I'm
> starting from a good base. At least at that point, I'll have
> additional drives.
> Given that I have dual file-servers that will have drives added, it
> seems likely that I'll be testing the code again soon. Big difference
> being that this time, I won't make the assumption that everything will
> be perfect. :)
>
> Thanks again for your help, I'll post on my results as well as try to
> get you that strace. It's been quite a while since I dove into kernel
> internals, or C for that matter, so it's unlikely I'm going to find
> anything myself. But I'll definitely send results back if I can.
>

Ok, as an update. ORDER MATTERS. :)

The above command didn't work. It built, but LVM didn't recognize it.
So, after despair, I thought, that's not the way I built it. So, I
redid it in alphabetical order... and it worked.

I'm in the process of tarring and pulling everything off.

Once that is done, I'll put the original drives back in, and try to
understand what went wrong with the original grow/build.
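Since the order given to "mdadm -C ... --assume-clean" has to match the
original slot order, one way to recover it before recreating is to read
the slot numbers out of the member superblocks - a rough sketch (device
names are the ones from this thread; the exact output format depends on
the mdadm version):

  for d in /dev/sd[cdef]1; do
      echo "== $d"
      mdadm --examine $d | grep -iE 'raid devices|this'
  done
  # then list the devices on the mdadm -C command line in slot order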
Re: Raid5 Reshape gone wrong, please help
On 8/19/07, Neil Brown <[EMAIL PROTECTED]> wrote:
> On Saturday August 18, [EMAIL PROTECTED] wrote:
> >
> > That looks to me like the first 2 gig is completely empty on the
> > drive. I really don't think it actually started to do anything.
>
> The backup data is near the end of the device. If you look at the
> last 2 gig you should see something.
>

I figured something like that after I started thinking about it...
That device is currently offline while I do some DD's to new devices.

> >
> > Do you have further suggestions on where to go now?
>
> Maybe an 'strace' of "mdadm -A ..." might show me something.
>
> If you feel like following the code, Assemble (in Assemble.c) should
> call Grow_restart.
> This should look in /dev/sdb1 (which is already open in 'fdlist') by
> calling 'load_super'. It should then seek to 8 sectors before the
> superblock (or close to there) and read a secondary superblock which
> describes the backup data.
> If this looks good, it seeks to where the backup data is (which is
> towards the end of the device) and reads that. It uses this to
> restore the 'critical section', and then updates the superblock on all
> devices.
>
> As you aren't getting the messages 'restoring critical section',
> something is going wrong before there. It should fail:
>    /dev/md0: Failed to restore critical section for reshape, sorry.
> but I can see that there is a problem with the error return from
> 'Grow_restart'. I'll get that fixed.
>
> >
> > Oh, and thank you very much for your help. Most of the data on this
> > array I can stand to lose... It's not critical, but there are some of
> > my photographs on this that my backup is out of date on. I can
> > destroy it all and start over, but really want to try to recover this
> > if it's possible. For that matter, if it didn't actually start
> > rewriting the stripes, is there any way to push it back down to 4
> > disks to recover?
>
> You could always just recreate the array:
>
>   mdadm -C /dev/md0 -l5 -n4 -c256 --assume-clean /dev/sdf1 /dev/sde1 \
>       /dev/sdd1 /dev/sdc1
>
> and make sure the data looks good (which it should).
>
> I'd still like to know what the problem is though....
>
> Thanks,
> NeilBrown
>

My current plan of attack, which I've been proceeding upon for the last
24 hours... I'm DDing the original drives to new devices. Once I have
copies of the drives, I'm going to try to recreate the array as a 4
device array. Hopefully, at that point, the raid will come up, LVM will
initialize, and it's time to saturate the GigE offloading EVERYTHING.

Assuming the above goes well, which will definitely take some time, then
I'll take the original drives, run the strace and try to get some
additional data for you. I'd love to know what's up with this as well.
If there is additional information I can get you to help, let me know.
I've grown several arrays before without any issue, which frankly is why
I didn't think this would have been an issue... thus, my offload of the
stuff I actually cared about wasn't up to date.

At the end of the day (or more likely, week) I'll completely destroy the
existing raid, and rebuild the entire thing to make sure I'm starting
from a good base. At least at that point, I'll have additional drives.
Given that I have dual file-servers that will have drives added, it
seems likely that I'll be testing the code again soon. Big difference
being that this time, I won't make the assumption that everything will
be perfect. :)

Thanks again for your help, I'll post on my results as well as try to
get you that strace.
It's been quite a while since I dove into kernel internals, or C for
that matter, so it's unlikely I'm going to find anything myself. But
I'll definitely send results back if I can.
Re: Raid5 Reshape gone wrong, please help
On Saturday August 18, [EMAIL PROTECTED] wrote:
>
> That looks to me like the first 2 gig is completely empty on the
> drive. I really don't think it actually started to do anything.

The backup data is near the end of the device. If you look at the
last 2 gig you should see something.

>
> Do you have further suggestions on where to go now?

Maybe an 'strace' of "mdadm -A ..." might show me something.

If you feel like following the code, Assemble (in Assemble.c) should
call Grow_restart.
This should look in /dev/sdb1 (which is already open in 'fdlist') by
calling 'load_super'. It should then seek to 8 sectors before the
superblock (or close to there) and read a secondary superblock which
describes the backup data.
If this looks good, it seeks to where the backup data is (which is
towards the end of the device) and reads that. It uses this to restore
the 'critical section', and then updates the superblock on all devices.

As you aren't getting the messages 'restoring critical section',
something is going wrong before there. It should fail:
   /dev/md0: Failed to restore critical section for reshape, sorry.
but I can see that there is a problem with the error return from
'Grow_restart'. I'll get that fixed.

>
> Oh, and thank you very much for your help. Most of the data on this
> array I can stand to lose... It's not critical, but there are some of
> my photographs on this that my backup is out of date on. I can
> destroy it all and start over, but really want to try to recover this
> if it's possible. For that matter, if it didn't actually start
> rewriting the stripes, is there any way to push it back down to 4
> disks to recover?

You could always just recreate the array:

  mdadm -C /dev/md0 -l5 -n4 -c256 --assume-clean /dev/sdf1 /dev/sde1 \
      /dev/sdd1 /dev/sdc1

and make sure the data looks good (which it should).

I'd still like to know what the problem is though....

Thanks,
NeilBrown
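A sketch of the strace suggested above (the trace file name is just an
example):

  mdadm -S /dev/md0
  strace -f -o /tmp/mdadm-assemble.trace mdadm -A /dev/md0 /dev/sd[bcdef]1
  # find the fd /dev/sdb1 was opened on, then look at the lseek/read calls
  # on that fd around the superblock
  grep '/dev/sdb1' /tmp/mdadm-assemble.trace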
Re: Raid5 Reshape gone wrong, please help
On 8/18/07, Neil Brown <[EMAIL PROTECTED]> wrote:
> On Friday August 17, [EMAIL PROTECTED] wrote:
> > I was trying to resize a Raid 5 array of 4 500G drives to 5. Kernel
> > version 2.6.23-rc3 was the kernel I STARTED on this.
> >
> > I added the device to the array:
> > mdadm --add /dev/md0 /dev/sdb1
> >
> > Then I started to grow the array:
> > mdadm --grow /dev/md0 --raid-devices=5
> >
> > At this point the machine locked up. Not good.
>
> No, not good. But it shouldn't be fatal.

Well, that was my thought as well.

> >
> > I ended up having to hard reboot. Now, I have the following in dmesg:
> >
> > md: md0: raid array is not clean -- starting background reconstruction
> > raid5: reshape_position too early for auto-recovery - aborting.
> > md: pers->run() failed ...
>
> Looks like you crashed during the 'critical' period.
>
> >
> > /proc/mdstat is:
> >
> > Personalities : [raid6] [raid5] [raid4]
> > md0 : inactive sdf1[0] sdb1[4] sdc1[3] sdd1[2] sde1[1]
> >       2441918720 blocks super 0.91
> >
> > unused devices: <none>
> >
> > It doesn't look like it actually DID anything besides update the raid
> > count to 5 from 4. (I think.)
> >
> > How do I do a manual recovery on this?
>
> Simply use mdadm to assemble the array:
>
>   mdadm -A /dev/md0 /dev/sd[bcdef]1
>
> It should notice that the kernel needs help, and will provide that help.
> Specifically, when you started the 'grow', mdadm copied the first few
> stripes into unused space in the new device. When you re-assemble, it
> will copy those stripes back into the new layout, then let the kernel
> do the rest.
>
> Please let us know how it goes.
>
> NeilBrown
>

I had already tried to assemble it by hand, before I basically said...
WAIT. Ask for help. Don't screw up more. :)

But I tried again:

[EMAIL PROTECTED] { }$ mdadm -A /dev/md0 /dev/sd[bcdef]1
mdadm: device /dev/md0 already active - cannot assemble it
[EMAIL PROTECTED] { ~ }$ mdadm -S /dev/md0
mdadm: stopped /dev/md0
[EMAIL PROTECTED] { ~ }$ mdadm -A /dev/md0 /dev/sd[bcdef]1
mdadm: failed to RUN_ARRAY /dev/md0: Invalid argument

Dmesg shows:

md: md0 stopped.
md: unbind
md: export_rdev(sdf1)
md: unbind
md: export_rdev(sdb1)
md: unbind
md: export_rdev(sdc1)
md: unbind
md: export_rdev(sdd1)
md: unbind
md: export_rdev(sde1)
md: md0 stopped.
md: bind
md: bind
md: bind
md: bind
md: bind
md: md0: raid array is not clean -- starting background reconstruction
raid5: reshape_position too early for auto-recovery - aborting.
md: pers->run() failed ...
md: md0 stopped.
md: unbind
md: export_rdev(sdf1)
md: unbind
md: export_rdev(sdb1)
md: unbind
md: export_rdev(sdc1)
md: unbind
md: export_rdev(sdd1)
md: unbind
md: export_rdev(sde1)
md: md0 stopped.
md: bind
md: bind
md: bind
md: bind
md: bind
md: md0: raid array is not clean -- starting background reconstruction
raid5: reshape_position too early for auto-recovery - aborting.
md: pers->run() failed ...

And the raid stays in an inactive state. Using mdadm v2.6.2 and kernel
2.6.23-rc3, although I can push back to earlier versions easily if it
would help.

I know that sdb1 is the new device. When mdadm ran, it said the critical
section was 3920k (approximately). When I didn't get a response for five
minutes, and there wasn't ANY disk activity, I booted the box.

Based on your message and the man page, it sounds like mdadm should have
placed something on sdb1. So...
Trying to be non-destructive, but still gather information:

dd if=/dev/sdb1 of=/tmp/test bs=1024k count=1000
hexdump /tmp/test
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
3e800000

dd if=/dev/sdb1 of=/tmp/test bs=1024k count=1000 skip=999
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 35.0176 seconds, 29.9 MB/s
[EMAIL PROTECTED] { ~ }$ hexdump /tmp/test
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
3e800000

That looks to me like the first 2 gig is completely empty on the drive.
I really don't think it actually started to do anything.

Do you have further suggestions on where to go now?

Oh, and thank you very much for your help. Most of the data on this
array I can stand to lose... It's not critical, but there are some of my
photographs on this that my backup is out of date on. I can destroy it
all and start over, but really want to try to recover this if it's
possible. For that matter, if it didn't actually start rewriting the
stripes, is there any way to push it back down to 4 disks to recover?
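Since the backup is written near the end of the new device (per the
earlier message), a sketch for dumping the last part of it instead of the
first (sizes are examples):

  SECTORS=$(blockdev --getsz /dev/sdb1)   # device size in 512-byte sectors
  # read the final 1GiB (2097152 sectors) and look for non-zero data
  dd if=/dev/sdb1 bs=512 skip=$((SECTORS - 2097152)) count=2097152 2>/dev/null \
      | hexdump -C | head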
Re: Raid5 Reshape gone wrong, please help
On Friday August 17, [EMAIL PROTECTED] wrote:
> I was trying to resize a Raid 5 array of 4 500G drives to 5. Kernel
> version 2.6.23-rc3 was the kernel I STARTED on this.
>
> I added the device to the array:
> mdadm --add /dev/md0 /dev/sdb1
>
> Then I started to grow the array:
> mdadm --grow /dev/md0 --raid-devices=5
>
> At this point the machine locked up. Not good.

No, not good. But it shouldn't be fatal.

> I ended up having to hard reboot. Now, I have the following in dmesg:
>
> md: md0: raid array is not clean -- starting background reconstruction
> raid5: reshape_position too early for auto-recovery - aborting.
> md: pers->run() failed ...

Looks like you crashed during the 'critical' period.

> /proc/mdstat is:
>
> Personalities : [raid6] [raid5] [raid4]
> md0 : inactive sdf1[0] sdb1[4] sdc1[3] sdd1[2] sde1[1]
>       2441918720 blocks super 0.91
>
> unused devices: <none>
>
> It doesn't look like it actually DID anything besides update the raid
> count to 5 from 4. (I think.)
>
> How do I do a manual recovery on this?

Simply use mdadm to assemble the array:

  mdadm -A /dev/md0 /dev/sd[bcdef]1

It should notice that the kernel needs help, and will provide that help.
Specifically, when you started the 'grow', mdadm copied the first few
stripes into unused space in the new device. When you re-assemble, it
will copy those stripes back into the new layout, then let the kernel do
the rest.

Please let us know how it goes.

NeilBrown
Re: raid5 reshape bug with XFS
On Sunday November 5, [EMAIL PROTECTED] wrote:
> Neil Brown wrote:
> > On Saturday November 4, [EMAIL PROTECTED] wrote:
> >
> >> Hi,
> >>
> >> I'm setting up a raid 5 system and I ran across a bug when reshaping an
> >> array with a mounted XFS filesystem on it. This is under linux 2.6.18.2
> >> and mdadm 2.5.5
> >
> > You have CONFIG_LBD=n don't you?
>
> Yes,
>
> I have CONFIG_LBD=n
>
> ...and the patch fixed the problem.

Cool, thanks.

> Side Note: I just converted 2 raid0 drives into a 4 drive raid5 array
> in-place, with relative ease.
> I couldn't have done it without the work you (and I'm sure others) have
> done. Thanks.

And without bug reports like yours others would have more problems.
Thanks.

NeilBrown
Re: raid5 reshape bug with XFS
Neil Brown wrote:
> On Saturday November 4, [EMAIL PROTECTED] wrote:
>
>> Hi,
>>
>> I'm setting up a raid 5 system and I ran across a bug when reshaping an
>> array with a mounted XFS filesystem on it. This is under linux 2.6.18.2
>> and mdadm 2.5.5
>
> You have CONFIG_LBD=n don't you?

Yes,

I have CONFIG_LBD=n

...and the patch fixed the problem.

Side Note: I just converted 2 raid0 drives into a 4 drive raid5 array
in-place, with relative ease.
I couldn't have done it without the work you (and I'm sure others) have
done. Thanks.

-Bill
Re: raid5 reshape bug with XFS
On Saturday November 4, [EMAIL PROTECTED] wrote:
> Hi,
>
> I'm setting up a raid 5 system and I ran across a bug when reshaping an
> array with a mounted XFS filesystem on it. This is under linux 2.6.18.2
> and mdadm 2.5.5
...
> [EMAIL PROTECTED] $ mdadm --detail /dev/md4
> /dev/md4:
>         Version : 00.90.03
>   Creation Time : Sat Nov 4 18:58:59 2006
>      Raid Level : raid5
> >>
> >>   Array Size : 2086592 (2038.03 MiB 2136.67 MB)
> >>
>     Device Size : 10482240 (10.00 GiB 10.73 GB)
>    Raid Devices : 4
>   Total Devices : 4
> Preferred Minor : 4
>     Persistence : Superblock is persistent
>
> (2086592 != 31446720 -- Bad, much too small)

You have CONFIG_LBD=n don't you?

Thanks for the report. This should fix it. Please let me know if it does.

NeilBrown

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

### Diffstat output
 ./drivers/md/raid5.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c    2006-11-03 15:11:52.000000000 +1100
+++ ./drivers/md/raid5.c        2006-11-06 09:55:20.000000000 +1100
@@ -3909,7 +3909,7 @@ static void end_reshape(raid5_conf_t *co
 		bdev = bdget_disk(conf->mddev->gendisk, 0);
 		if (bdev) {
 			mutex_lock(&bdev->bd_inode->i_mutex);
-			i_size_write(bdev->bd_inode, conf->mddev->array_size << 10);
+			i_size_write(bdev->bd_inode, (loff_t)conf->mddev->array_size << 10);
 			mutex_unlock(&bdev->bd_inode->i_mutex);
 			bdput(bdev);
 		}
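For context on the fix: CONFIG_LBD ("support for large block devices") is
what makes sector_t 64 bits wide on 32-bit builds. With it disabled, the
array size in KB shifted left by 10 is computed in 32 bits and wraps:
31446720 KB * 1024 modulo 2^32 is about 2136.67 MB, which matches the bad
size reported above - hence the (loff_t) cast. A quick way to check how a
running kernel was configured (paths vary by distro, and /proc/config.gz
only exists with CONFIG_IKCONFIG_PROC):

  grep CONFIG_LBD /boot/config-$(uname -r)
  zgrep CONFIG_LBD /proc/config.gz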
Re: Raid5 Reshape Status + xfs_growfs = Success! (2.6.17.3)
On Jul 11 2006 12:03, Justin Piszcz wrote:
> Subject: Raid5 Reshape Status + xfs_growfs = Success! (2.6.17.3)

Now we just need shrink-reshaping and xfs_shrinkfs... :)

Jan Engelhardt
--
Re: Raid5 Reshape Status + xfs_growfs = Success! (2.6.17.3)
On Tuesday July 11, [EMAIL PROTECTED] wrote:
> Neil,
>
> It worked, echo'ing the 600 > to the stripe width in /sys, however, how
> come /dev/md3 says it is 0 MB when I type fdisk -l?
>
> Is this normal?

Yes. The 'cylinders' number is limited to 16 bits. For your 2.2TB array,
the number of 'cylinders' (given 2 heads and 4 sectors) would be about
500 million, which doesn't fit into 16 bits.

> Furthermore, the xfs_growfs worked beautifully!

Excellent!

NeilBrown
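When fdisk's CHS arithmetic breaks down like this, the device size can be
confirmed directly instead - a small sketch (md3 as in the message above):

  blockdev --getsize64 /dev/md3    # size in bytes
  grep md3 /proc/partitions        # size in 1K blocks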
Re: Raid5 reshape (Solved)
Neil - Well I did warn you that I was an idiot... :-) I have been attempting to work out exactly what I did and what happened. All I have learned is that I need to keep better notes. Yes, the 21 mounts is an fsck, nothing to do with raid. However it is still noteworthy that this took several hours to complete with the raid also reshaping, rather than the few minutes I have seen in the past. Some kind of interaction there. I think that the kernel I was using had both the fixes you had sent me in it, but I honestly can't be sure - Sorry. In the past, that bug caused it to fail immediately and the reshape to freeze. This appeared to occur after the reshape, maybe a problem at the end of the reshape process. Probably however I screwed up, and I have no way to retest. Finally, just a note to say that the system continues to work just fine and I am really impressed. Thanks again Nigel - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid5 reshape
On Tuesday June 20, [EMAIL PROTECTED] wrote: > Nigel J. Terry wrote: > > Well good news and bad news I'm afraid... > > Well I would like to be able to tell you that the time calculation now > works, but I can't. Here's why: Why I rebooted with the newly built > kernel, it decided to hit the magic 21 reboots and hence decided to > check the array for clean. The normally takes about 5-10 mins, but this > time took several hours, so I went to bed! I suspect that it was doing > the full reshape or something similar at boot time. > What "magic 21 reboots"?? md has no mechanism to automatically check the array after N reboots or anything like that. Or are you thinking of the 'fsck' that does a full check every so-often? > Now I am not sure that this makes good sense in a normal environment. > This could keep a server down for hours or days. I might suggest that if > such work was required, the clean check is postponed till next boot and > the reshape allowed to continue in the background. An fsck cannot tell if there is a reshape happening, but the reshape should notice the fsck and slow down to a crawl so the fsck can complete... > > Anyway the good news is that this morning, all is well, the array is > clean and grown as can be seen below. However, if you look further below > you will see the section from dmesg which still shows RIP errors, so I > guess there is still something wrong, even though it looks like it is > working. Let me know if i can provide any more information. > > Once again, many thanks. All I need to do now is grow the ext3 filesystem... . > ...ok start reshape thread > md: syncing RAID array md0 > md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc. > md: using maximum available idle IO bandwidth (but not more than 20 > KB/sec) for reconstruction. > md: using 128k window, over a total of 245111552 blocks. > Unable to handle kernel NULL pointer dereference at RIP: > <>{stext+2145382632} > PGD 7c3f9067 PUD 7cb9e067 PMD 0 > Process md0_reshape (pid: 1432, threadinfo 81007aa42000, task > 810037f497b0) > Stack: 803dce42 1d383600 > > > Call Trace: {md_do_sync+1307} > {thread_return+0} >{thread_return+94} > {keventd_create_kthread+0} >{md_thread+248} That looks very much like the bug that I already sent you a patch for! Are you sure that the new kernel still had this patch? I'm a bit confused by this NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid5 reshape
Nigel J. Terry wrote: Well good news and bad news I'm afraid... Well I would like to be able to tell you that the time calculation now works, but I can't. Here's why: Why I rebooted with the newly built kernel, it decided to hit the magic 21 reboots and hence decided to check the array for clean. The normally takes about 5-10 mins, but this time took several hours, so I went to bed! I suspect that it was doing the full reshape or something similar at boot time. Now I am not sure that this makes good sense in a normal environment. This could keep a server down for hours or days. I might suggest that if such work was required, the clean check is postponed till next boot and the reshape allowed to continue in the background. Anyway the good news is that this morning, all is well, the array is clean and grown as can be seen below. However, if you look further below you will see the section from dmesg which still shows RIP errors, so I guess there is still something wrong, even though it looks like it is working. Let me know if i can provide any more information. Once again, many thanks. All I need to do now is grow the ext3 filesystem... Nigel [EMAIL PROTECTED] ~]# mdadm --detail /dev/md0 /dev/md0: Version : 00.90.03 Creation Time : Tue Apr 18 17:44:34 2006 Raid Level : raid5 Array Size : 735334656 (701.27 GiB 752.98 GB) Device Size : 245111552 (233.76 GiB 250.99 GB) Raid Devices : 4 Total Devices : 4 Preferred Minor : 0 Persistence : Superblock is persistent Update Time : Tue Jun 20 06:27:49 2006 State : clean Active Devices : 4 Working Devices : 4 Failed Devices : 0 Spare Devices : 0 Layout : left-symmetric Chunk Size : 128K UUID : 50e3173e:b5d2bdb6:7db3576b:644409bb Events : 0.3366644 Number Major Minor RaidDevice State 0 810 active sync /dev/sda1 1 8 171 active sync /dev/sdb1 2 3 652 active sync /dev/hdb1 3 2213 active sync /dev/hdc1 [EMAIL PROTECTED] ~]# cat /proc/mdstat Personalities : [raid5] [raid4] md0 : active raid5 sdb1[1] sda1[0] hdc1[3] hdb1[2] 735334656 blocks level 5, 128k chunk, algorithm 2 [4/4] [] unused devices: [EMAIL PROTECTED] ~]# But from dmesg: md: Autodetecting RAID arrays. md: autorun ... md: considering sdb1 ... md: adding sdb1 ... md: adding sda1 ... md: adding hdc1 ... md: adding hdb1 ... md: created md0 md: bind md: bind md: bind md: bind md: running: raid5: automatically using best checksumming function: generic_sse generic_sse: 6795.000 MB/sec raid5: using function: generic_sse (6795.000 MB/sec) md: raid5 personality registered for level 5 md: raid4 personality registered for level 4 raid5: reshape will continue raid5: device sdb1 operational as raid disk 1 raid5: device sda1 operational as raid disk 0 raid5: device hdb1 operational as raid disk 2 raid5: allocated 4268kB for md0 raid5: raid level 5 set md0 active with 3 out of 4 devices, algorithm 2 RAID5 conf printout: --- rd:4 wd:3 fd:1 disk 0, o:1, dev:sda1 disk 1, o:1, dev:sdb1 disk 2, o:1, dev:hdb1 ...ok start reshape thread md: syncing RAID array md0 md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc. md: using maximum available idle IO bandwidth (but not more than 20 KB/sec) for reconstruction. md: using 128k window, over a total of 245111552 blocks. 
Unable to handle kernel NULL pointer dereference at RIP: <>{stext+2145382632} PGD 7c3f9067 PUD 7cb9e067 PMD 0 Oops: 0010 [1] SMP CPU 0 Modules linked in: raid5 xor usb_storage video button battery ac lp parport_pc parport floppy nvram snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ehci_hcd ohci1394 ieee1394 sg snd_pcm uhci_hcd i2c_nforce2 i2c_core forcedeth ohci_hcd snd_timer snd soundcore snd_page_alloc dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd sata_nv libata sd_mod scsi_mod Pid: 1432, comm: md0_reshape Not tainted 2.6.17-rc6 #1 RIP: 0010:[<>] <>{stext+2145382632} RSP: :81007aa43d60 EFLAGS: 00010246 RAX: 81007cf72f20 RBX: 81007c682000 RCX: 0006 RDX: RSI: RDI: 81007cf72f20 RBP: 02090900 R08: R09: 810037f497b0 R10: 000b44ffd564 R11: 8022c92a R12: R13: 0100 R14: R15: FS: 0066d870() GS:80611000() knlGS: CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b CR2: CR3: 7bebc000 CR4: 06e0 Process md0_reshape (pid: 1432, threadinfo 81007aa42000, task 810037f497b0) Stack: 803dce42 1d383600 0
Re: Raid5 reshape
Neil Brown wrote: On Monday June 19, [EMAIL PROTECTED] wrote: One comment - As I look at the rebuild, which is now over 20%, the time till finish makes no sense. It did make sense when the first reshape started. I guess your estimating / averaging algorithm doesn't work for a restarted reshape. A minor cosmetic issue - see below Nigel [EMAIL PROTECTED] ~]$ cat /proc/mdstat Personalities : [raid5] [raid4] md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2] 490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3] [UUU_] [>] reshape = 22.7% (55742816/245111552) finish=5.8min speed=542211K/sec Hmmm. I see. This should fix that, but I don't expect you to interrupt your reshape to try it. Thanks, NeilBrown I have nothing better to do, I'll give it a go and let you know... - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid5 reshape
Nigel J. Terry wrote: > One comment - As I look at the rebuild, which is now over 20%, the time > till finish makes no sense. It did make sense when the first reshape > started. I guess your estimating / averaging algorithm doesn't work for > a restarted reshape. A minor cosmetic issue - see below > > Nigel > [EMAIL PROTECTED] ~]$ cat /proc/mdstat > Personalities : [raid5] [raid4] > md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2] > 490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3] > [UUU_] > [>] reshape = 22.7% (55742816/245111552) > finish=5.8min speed=542211K/sec Unless something has changed recently, the parity-rebuild-interrupted / restarted-parity-rebuild case shows the same behavior. It's probably the same chunk of code (I haven't looked, bad hacker! bad!), but I thought I'd mention it in case Neil goes looking. The "speed" is truly impressive though. I'll almost be sorry to see it fixed :-) -Mike - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid5 reshape
Mike Hardy wrote: Unless something has changed recently the parity-rebuild-interrupted / restarted-parity-rebuild case shows the same behavior. It's probably the same chunk of code (I haven't looked, bad hacker! bad!), but I thought I'd mention it in case Neil goes looking The "speed" is truly impressive though. I'll almost be sorry to see it fixed :-) -Mike - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html I'd love to agree about the speed, but this has been the longest 5.8 minutes of my life... :-) - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid5 reshape
On Monday June 19, [EMAIL PROTECTED] wrote:
>
> One comment - As I look at the rebuild, which is now over 20%, the time
> till finish makes no sense. It did make sense when the first reshape
> started. I guess your estimating / averaging algorithm doesn't work for
> a restarted reshape. A minor cosmetic issue - see below
>
> Nigel
> [EMAIL PROTECTED] ~]$ cat /proc/mdstat
> Personalities : [raid5] [raid4]
> md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2]
>       490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3]
> [UUU_]
>       [>] reshape = 22.7% (55742816/245111552)
> finish=5.8min speed=542211K/sec

Hmmm.  I see.
This should fix that, but I don't expect you to interrupt your reshape
to try it.

Thanks,
NeilBrown

### Diffstat output
 ./drivers/md/md.c           |    8 +++++---
 ./include/linux/raid/md_k.h |    3 ++-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2006-06-19 11:52:55.0 +1000
+++ ./drivers/md/md.c	2006-06-20 09:30:57.0 +1000
@@ -2717,7 +2717,7 @@ static ssize_t
 sync_speed_show(mddev_t *mddev, char *page)
 {
 	unsigned long resync, dt, db;
-	resync = (mddev->curr_resync - atomic_read(&mddev->recovery_active));
+	resync = (mddev->curr_mark_cnt - atomic_read(&mddev->recovery_active));
 	dt = ((jiffies - mddev->resync_mark) / HZ);
 	if (!dt) dt++;
 	db = resync - (mddev->resync_mark_cnt);
@@ -4688,8 +4688,9 @@ static void status_resync(struct seq_fil
 	 */
 	dt = ((jiffies - mddev->resync_mark) / HZ);
 	if (!dt) dt++;
-	db = resync - (mddev->resync_mark_cnt/2);
-	rt = (dt * ((unsigned long)(max_blocks-resync) / (db/100+1)))/100;
+	db = (mddev->curr_mark_cnt - atomic_read(&mddev->recovery_active))
+		- mddev->resync_mark_cnt;
+	rt = (dt/2 * ((unsigned long)(max_blocks-resync) / (db/100+1)))/100;
 
 	seq_printf(seq, " finish=%lu.%lumin", rt / 60, (rt % 60)/6);
@@ -5204,6 +5205,7 @@ void md_do_sync(mddev_t *mddev)
 		j += sectors;
 		if (j>1) mddev->curr_resync = j;
+		mddev->curr_mark_cnt = io_sectors;
 		if (last_check == 0)
 			/* this is the earliers that rebuilt will be
 			 * visible in /proc/mdstat

diff .prev/include/linux/raid/md_k.h ./include/linux/raid/md_k.h
--- .prev/include/linux/raid/md_k.h	2006-06-20 09:31:22.0 +1000
+++ ./include/linux/raid/md_k.h	2006-06-20 09:31:58.0 +1000
@@ -148,9 +148,10 @@ struct mddev_s
 	struct mdk_thread_s		*thread;	/* management thread */
 	struct mdk_thread_s		*sync_thread;	/* doing resync or reconstruct */
-	sector_t			curr_resync;	/* blocks scheduled */
+	sector_t			curr_resync;	/* last block scheduled */
 	unsigned long			resync_mark;	/* a recent timestamp */
 	sector_t			resync_mark_cnt;/* blocks written at resync_mark */
+	sector_t			curr_mark_cnt;	/* blocks scheduled now */
 	sector_t			resync_max_sectors; /* may be set by personality */
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
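For anyone who does not want to read the diff, the finish/speed fields boil down to "blocks done since the last mark, divided by the seconds since that mark". Below is a simplified userspace rendering of that arithmetic, using the block counts from the quoted mdstat and an assumed 30-second mark interval; the real code keeps a rotating window of marks, so this is only the shape of the calculation, not the kernel expression itself:

#include <stdio.h>

int main(void)
{
	unsigned long max_blocks = 245111552;	/* total 1K blocks to reshape (from mdstat) */
	unsigned long resync     = 55742816;	/* current position, 22.7% (from mdstat)    */
	unsigned long mark_cnt   = 55000000;	/* position at the last mark - assumed      */
	unsigned long dt         = 30;		/* seconds since that mark - assumed        */

	unsigned long db    = resync - mark_cnt;		/* blocks done in the window */
	unsigned long speed = db / (dt ? dt : 1);		/* K/sec                     */
	unsigned long rt    = (max_blocks - resync) / (speed ? speed : 1);

	printf("speed=%luK/sec finish=%lu.%lumin\n", speed, rt / 60, (rt % 60) / 6);

	/* The bug the patch addresses: on a restarted reshape the two terms of
	 * 'db' came from different origins (an absolute array position versus a
	 * counter of blocks issued in this run only), so db was wildly inflated,
	 * the speed looked like 542211K/sec and the ETA collapsed to minutes. */
	return 0;
}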
Re: Raid5 reshape
Neil Brown wrote: On Monday June 19, [EMAIL PROTECTED] wrote: That seems to have fixed it. The reshape is now progressing and there are no apparent errors in dmesg. Details below. Great! I'll send another confirmation tomorrow when hopefully it has finished :-) Many thanks for a great product and great support. And thank you for being a patient beta-tester! NeilBrown Neil - I see myself more as being an "idiot-proof" tester than a beta-tester... One comment - As I look at the rebuild, which is now over 20%, the time till finish makes no sense. It did make sense when the first reshape started. I guess your estimating / averaging algorithm doesn't work for a restarted reshape. A minor cosmetic issue - see below Nigel [EMAIL PROTECTED] ~]$ cat /proc/mdstat Personalities : [raid5] [raid4] md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2] 490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3] [UUU_] [>] reshape = 22.7% (55742816/245111552) finish=5.8min speed=542211K/sec unused devices: [EMAIL PROTECTED] ~]$ - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid5 reshape
On Monday June 19, [EMAIL PROTECTED] wrote: > > That seems to have fixed it. The reshape is now progressing and > there are no apparent errors in dmesg. Details below. Great! > > I'll send another confirmation tomorrow when hopefully it has finished :-) > > Many thanks for a great product and great support. And thank you for being a patient beta-tester! NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid5 reshape
Neil Brown wrote: On Sunday June 18, [EMAIL PROTECTED] wrote: This from dmesg might help diagnose the problem: Yes, that helps a lot, thanks. The problem is that the reshape thread is restarting before the array is fully set-up, so it ends up dereferencing a NULL pointer. This patch should fix it. In fact, there is a small chance that next time you boot it will work without this patch, but the patch makes it more reliable. There definitely should be no data-loss due to this bug. Thanks, NeilBrown Neil That seems to have fixed it. The reshape is now progressing and there are no apparent errors in dmesg. Details below. I'll send another confirmation tomorrow when hopefully it has finished :-) Many thanks for a great product and great support. Nigel [EMAIL PROTECTED] ~]# cat /proc/mdstat Personalities : [raid5] [raid4] md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2] 490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3] [UUU_] [=>...] reshape = 7.9% (19588744/245111552) finish=6.4min speed=578718K/sec unused devices: [EMAIL PROTECTED] ~]# mdadm --detail /dev/md0 /dev/md0: Version : 00.91.03 Creation Time : Tue Apr 18 17:44:34 2006 Raid Level : raid5 Array Size : 490223104 (467.51 GiB 501.99 GB) Device Size : 245111552 (233.76 GiB 250.99 GB) Raid Devices : 4 Total Devices : 4 Preferred Minor : 0 Persistence : Superblock is persistent Update Time : Mon Jun 19 17:38:42 2006 State : clean, degraded, recovering Active Devices : 3 Working Devices : 4 Failed Devices : 0 Spare Devices : 1 Layout : left-symmetric Chunk Size : 128K Reshape Status : 8% complete Delta Devices : 1, (3->4) UUID : 50e3173e:b5d2bdb6:7db3576b:644409bb Events : 0.3287189 Number Major Minor RaidDevice State 0 810 active sync /dev/sda1 1 8 171 active sync /dev/sdb1 2 3 652 active sync /dev/hdb1 3 003 removed 4 221- spare /dev/hdc1 [EMAIL PROTECTED] ~]# - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid5 reshape
On Sunday June 18, [EMAIL PROTECTED] wrote:
> This from dmesg might help diagnose the problem:
>

Yes, that helps a lot, thanks.

The problem is that the reshape thread is restarting before the array
is fully set-up, so it ends up dereferencing a NULL pointer.

This patch should fix it.  In fact, there is a small chance that next
time you boot it will work without this patch, but the patch makes it
more reliable.

There definitely should be no data-loss due to this bug.

Thanks,
NeilBrown

### Diffstat output
 ./drivers/md/md.c    |    6 ++++--
 ./drivers/md/raid5.c |    3 ---
 2 files changed, 4 insertions(+), 5 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2006-05-30 15:07:14.0 +1000
+++ ./drivers/md/md.c	2006-06-19 12:01:47.0 +1000
@@ -2719,8 +2719,6 @@ static int do_md_run(mddev_t * mddev)
 	}
 
 	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-	md_wakeup_thread(mddev->thread);
-
 	if (mddev->sb_dirty)
 		md_update_sb(mddev);
@@ -2738,6 +2736,10 @@ static int do_md_run(mddev_t * mddev)
 	mddev->changed = 1;
 	md_new_event(mddev);
+
+	md_wakeup_thread(mddev->thread);
+	md_wakeup_thread(mddev->sync_thread);
+
 	return 0;
 }

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2006-06-19 11:56:41.0 +1000
+++ ./drivers/md/raid5.c	2006-06-19 11:56:44.0 +1000
@@ -2373,9 +2373,6 @@ static int run(mddev_t *mddev)
 		set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
 		mddev->sync_thread = md_register_thread(md_do_sync, mddev, "%s_reshape");
-		/* FIXME if md_register_thread fails?? */
-		md_wakeup_thread(mddev->sync_thread);
-
 	}
 
 	/* read-ahead size must cover two whole stripes, which is
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
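The patch itself is just a reordering, but the principle is worth spelling out: do not wake a worker until the state it will touch is fully set up. Here is a toy pthreads sketch of that ordering - purely an analogy, with invented names (array_state, reshape_worker); none of it is md code:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct array_state {
	pthread_mutex_t lock;
	pthread_cond_t  wake;
	int             ready;		/* set only once setup is complete */
	int            *progress;	/* allocated during setup          */
};

static void *reshape_worker(void *arg)
{
	struct array_state *st = arg;

	/* The worker refuses to touch shared state until told it is ready. */
	pthread_mutex_lock(&st->lock);
	while (!st->ready)
		pthread_cond_wait(&st->wake, &st->lock);
	pthread_mutex_unlock(&st->lock);

	*st->progress += 1;		/* safe: progress exists by now */
	printf("worker ran, progress=%d\n", *st->progress);
	return NULL;
}

int main(void)
{
	struct array_state st = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.wake = PTHREAD_COND_INITIALIZER,
	};
	pthread_t t;

	pthread_create(&t, NULL, reshape_worker, &st);

	/* Finish the setup first... */
	st.progress = calloc(1, sizeof(*st.progress));

	/* ...and only then wake the worker - the same reordering the patch
	 * applies to the md_wakeup_thread() calls in do_md_run(). */
	pthread_mutex_lock(&st.lock);
	st.ready = 1;
	pthread_cond_signal(&st.wake);
	pthread_mutex_unlock(&st.lock);

	pthread_join(t, NULL);
	free(st.progress);
	return 0;
}

(Build with "cc -pthread".)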
Re: Raid5 reshape
Nigel J. Terry wrote: Neil Brown wrote: OK, thanks for the extra details. I'll have a look and see what I can find, but it'll probably be a couple of days before I have anything useful for you. NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html This from dmesg might help diagnose the problem: md: Autodetecting RAID arrays. md: autorun ... md: considering sdb1 ... md: adding sdb1 ... md: adding sda1 ... md: adding hdc1 ... md: adding hdb1 ... md: created md0 md: bind md: bind md: bind md: bind md: running: raid5: automatically using best checksumming function: generic_sse generic_sse: 6795.000 MB/sec raid5: using function: generic_sse (6795.000 MB/sec) md: raid5 personality registered for level 5 md: raid4 personality registered for level 4 raid5: reshape will continue raid5: device sdb1 operational as raid disk 1 raid5: device sda1 operational as raid disk 0 raid5: device hdb1 operational as raid disk 2 raid5: allocated 4268kB for md0 raid5: raid level 5 set md0 active with 3 out of 4 devices, algorithm 2 RAID5 conf printout: --- rd:4 wd:3 fd:1 disk 0, o:1, dev:sda1 disk 1, o:1, dev:sdb1 disk 2, o:1, dev:hdb1 ...ok start reshape thread md: syncing RAID array md0 md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc. md: using maximum available idle IO bandwidth (but not more than 20 KB/sec) for reconstruction. md: using 128k window, over a total of 245111552 blocks. Unable to handle kernel NULL pointer dereference at RIP: <>{stext+2145382632} PGD 7c3f9067 PUD 7cb9e067 PMD 0 Oops: 0010 [1] SMP CPU 0 Modules linked in: raid5 xor usb_storage video button battery ac lp parport_pc parport floppy nvram snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ehci_hcd ohci1394 ieee1394 sg snd_pcm uhci_hcd i2c_nforce2 i2c_core forcedeth ohci_hcd snd_timer snd soundcore snd_page_alloc dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd sata_nv libata sd_mod scsi_mod Pid: 1432, comm: md0_reshape Not tainted 2.6.17-rc6 #1 RIP: 0010:[<>] <>{stext+2145382632} RSP: :81007aa43d60 EFLAGS: 00010246 RAX: 81007cf72f20 RBX: 81007c682000 RCX: 0006 RDX: RSI: RDI: 81007cf72f20 RBP: 02090900 R08: R09: 810037f497b0 R10: 000b44ffd564 R11: 8022c92a R12: R13: 0100 R14: R15: FS: 0066d870() GS:80611000() knlGS: CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b CR2: CR3: 7bebc000 CR4: 06e0 Process md0_reshape (pid: 1432, threadinfo 81007aa42000, task 810037f497b0) Stack: 803dce42 1d383600 Call Trace: {md_do_sync+1307} {thread_return+0} {thread_return+94} {keventd_create_kthread+0} {md_thread+248} {keventd_create_kthread+0} {md_thread+0} {kthread+254} {child_rip+8} {keventd_create_kthread+0} {thread_return+0} {kthread+0} {child_rip+0} Code: Bad RIP value. RIP <>{stext+2145382632} RSP CR2: <6>md: ... autorun DONE. - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid5 reshape
Neil Brown wrote: OK, thanks for the extra details. I'll have a look and see what I can find, but it'll probably be a couple of days before I have anything useful for you. NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html OK, I'll try and be patient :-) At least everything else is working. Let me know if you need to ssh to my machine. Nigel - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid5 reshape
OK, thanks for the extra details. I'll have a look and see what I can find, but it'll probably be a couple of days before I have anything useful for you. NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid5 reshape
Nigel J. Terry wrote: Neil Brown wrote: On Saturday June 17, [EMAIL PROTECTED] wrote: Any ideas what I should do next? Thanks Looks like you've probably hit a bug. I'll need a bit more info though. First: [EMAIL PROTECTED] ~]# cat /proc/mdstat Personalities : [raid5] [raid4] md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2] 490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3] [UUU_] [=>...] reshape = 6.9% (17073280/245111552) finish=86.3min speed=44003K/sec unused devices: This really makes it look like the reshape is progressing. How long after the reboot was this taken? How long after hdc1 has hot added (roughly)? What does it show now? What happens if you remove hdc1 again? Does the reshape keep going? What I would expect to happen in this case is that the array reshapes into a degraded array, then the missing disk is recovered onto hdc1. NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html I don't know how long the system was reshaping before the power went off, and then I had to restart when the power came back. It claimed it was going to take 430 minutes, so 6% would be about 25 minutes, which could make good sense, certainly it looked like it was working fine when I went out. Now nothing is happening, it shows: [EMAIL PROTECTED] ~]# cat /proc/mdstat Personalities : [raid5] [raid4] md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2] 490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3] [UUU_] [=>...] reshape = 6.9% (17073280/245111552) finish=2281.2min speed=1665K/sec unused devices: [EMAIL PROTECTED] ~]# so the only thing changing is the time till finish. I'll try removing and adding /dev/hdc1 again. Will it make any difference if the device is mounted or not? Nigel Tried remove and add, made no difference: [EMAIL PROTECTED] ~]# mdadm /dev/md0 --remove /dev/hdc1 mdadm: hot removed /dev/hdc1 [EMAIL PROTECTED] ~]# cat /proc/mdstat Personalities : [raid5] [raid4] md0 : active raid5 sdb1[1] sda1[0] hdb1[2] 490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3] [UUU_] [=>...] reshape = 6.9% (17073280/245111552) finish=2321.5min speed=1636K/sec unused devices: [EMAIL PROTECTED] ~]# mdadm /dev/md0 --add /dev/hdc1 mdadm: re-added /dev/hdc1 [EMAIL PROTECTED] ~]# cat /proc/mdstat Personalities : [raid5] [raid4] md0 : active raid5 hdc1[4](S) sdb1[1] sda1[0] hdb1[2] 490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3] [UUU_] [=>...] reshape = 6.9% (17073280/245111552) finish=2329.3min speed=1630K/sec unused devices: [EMAIL PROTECTED] ~]# - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid5 reshape
Neil Brown wrote: On Saturday June 17, [EMAIL PROTECTED] wrote: Any ideas what I should do next? Thanks Looks like you've probably hit a bug. I'll need a bit more info though. First: [EMAIL PROTECTED] ~]# cat /proc/mdstat Personalities : [raid5] [raid4] md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2] 490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3] [UUU_] [=>...] reshape = 6.9% (17073280/245111552) finish=86.3min speed=44003K/sec unused devices: This really makes it look like the reshape is progressing. How long after the reboot was this taken? How long after hdc1 has hot added (roughly)? What does it show now? What happens if you remove hdc1 again? Does the reshape keep going? What I would expect to happen in this case is that the array reshapes into a degraded array, then the missing disk is recovered onto hdc1. NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html I don't know how long the system was reshaping before the power went off, and then I had to restart when the power came back. It claimed it was going to take 430 minutes, so 6% would be about 25 minutes, which could make good sense, certainly it looked like it was working fine when I went out. Now nothing is happening, it shows: [EMAIL PROTECTED] ~]# cat /proc/mdstat Personalities : [raid5] [raid4] md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2] 490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3] [UUU_] [=>...] reshape = 6.9% (17073280/245111552) finish=2281.2min speed=1665K/sec unused devices: [EMAIL PROTECTED] ~]# so the only thing changing is the time till finish. I'll try removing and adding /dev/hdc1 again. Will it make any difference if the device is mounted or not? Nigel - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid5 reshape
On Saturday June 17, [EMAIL PROTECTED] wrote: > > Any ideas what I should do next? Thanks > Looks like you've probably hit a bug. I'll need a bit more info though. First: > [EMAIL PROTECTED] ~]# cat /proc/mdstat > Personalities : [raid5] [raid4] > md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2] > 490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3] > [UUU_] > [=>...] reshape = 6.9% (17073280/245111552) > finish=86.3min speed=44003K/sec > > unused devices: This really makes it look like the reshape is progressing. How long after the reboot was this taken? How long after hdc1 has hot added (roughly)? What does it show now? What happens if you remove hdc1 again? Does the reshape keep going? What I would expect to happen in this case is that the array reshapes into a degraded array, then the missing disk is recovered onto hdc1. NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid5 reshape
Neil Brown wrote: On Friday June 16, [EMAIL PROTECTED] wrote: Thanks for all the advice. One final question, what kernel and mdadm versions do I need? For resizing raid5: mdadm-2.4 or later linux-2.6.17-rc2 or later NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Ok, I tried and screwed up! I upgraded my kernel and mdadm. I set the grow going and all looked well, so as it said it was going to take 430 minutes, I went to Starbucks. When I came home there had been a power cut, but my UPS had shut the system down. When power returned I rebooted. Now I think I had failed to set the new partition on /dev/hdc1 to Raid Autodetect, so it didn't find it at reboot. I tried to hot add it, but now I seem to have a deadlock situation. Although --detail shows that it is degraded and recovering, /proc/mdstat shows it is reshaping. In truth there is no disk activity and the count in /proc/mdstat is not changing. I gues sthe only good news is that I can still mount the device and my data is fine. Please see below... Any ideas what I should do next? Thanks Nigel [EMAIL PROTECTED] ~]# uname -a Linux homepc.nigelterry.net 2.6.17-rc6 #1 SMP Sat Jun 17 11:05:52 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux [EMAIL PROTECTED] ~]# mdadm --version mdadm - v2.5.1 - 16 June 2006 [EMAIL PROTECTED] ~]# mdadm --detail /dev/md0 /dev/md0: Version : 00.91.03 Creation Time : Tue Apr 18 17:44:34 2006 Raid Level : raid5 Array Size : 490223104 (467.51 GiB 501.99 GB) Device Size : 245111552 (233.76 GiB 250.99 GB) Raid Devices : 4 Total Devices : 4 Preferred Minor : 0 Persistence : Superblock is persistent Update Time : Sat Jun 17 15:15:05 2006 State : clean, degraded, recovering Active Devices : 3 Working Devices : 4 Failed Devices : 0 Spare Devices : 1 Layout : left-symmetric Chunk Size : 128K Reshape Status : 6% complete Delta Devices : 1, (3->4) UUID : 50e3173e:b5d2bdb6:7db3576b:644409bb Events : 0.3211829 Number Major Minor RaidDevice State 0 810 active sync /dev/sda1 1 8 171 active sync /dev/sdb1 2 3 652 active sync /dev/hdb1 3 003 removed 4 221- spare /dev/hdc1 [EMAIL PROTECTED] ~]# cat /proc/mdstat Personalities : [raid5] [raid4] md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2] 490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3] [UUU_] [=>...] reshape = 6.9% (17073280/245111552) finish=86.3min speed=44003K/sec unused devices: [EMAIL PROTECTED] ~]# - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid5 reshape
On Friday June 16, [EMAIL PROTECTED] wrote: > Thanks for all the advice. One final question, what kernel and mdadm > versions do I need? For resizing raid5: mdadm-2.4 or later linux-2.6.17-rc2 or later NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid5 reshape
Neil Brown wrote: On Friday June 16, [EMAIL PROTECTED] wrote: You have to grow the ext3 fs separately. ext2resize /dev/mdX. Keep in mind this can only be done off-line. ext3 can be resized online. I think ext2resize in the latest release will "do the right thing" whether it is online or not. There is a limit to the amount of expansion that can be achieved on-line. This limit is set when making the filesystem. Depending on which version of ext2-utils you used to make the filesystem, it may or may not already be prepared for substantial expansion. So if you want to do it on-line, give it a try or ask on the ext3-users list for particular details on what versions you need and how to see if your fs can be expanded. NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Thanks for all the advice. One final question, what kernel and mdadm versions do I need? Nigel - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid5 reshape
On Friday June 16, [EMAIL PROTECTED] wrote: > You have to grow the ext3 fs separately. ext2resize /dev/mdX. Keep in > mind this can only be done off-line. > ext3 can be resized online. I think ext2resize in the latest release will "do the right thing" whether it is online or not. There is a limit to the amount of expansion that can be achieved on-line. This limit is set when making the filesystem. Depending on which version of ext2-utils you used to make the filesystem, it may or may not already be prepared for substantial expansion. So if you want to do it on-line, give it a try or ask on the ext3-users list for particular details on what versions you need and how to see if your fs can be expanded. NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid5 reshape
You have to grow the ext3 fs separately. ext2resize /dev/mdX. Keep in mind this can only be done off-line. -Tim Nigel J. Terry wrote: Neil Brown wrote: On Thursday June 15, [EMAIL PROTECTED] wrote: Hello all, I'm sorry if this is a silly question, but I've been digging around for a few days now and have not found a clear answer, so I'm tossing it out to those who know it best. I see that as of a few rc's ago, 2.6.17 has had the capability of adding additional drives to an active raid 5 array (w/ the proper ver of mdadm, of course). I cannot, however, for the life of me find out exactly how one goes about doing it! I would love if someone could give a step-by-step on what needs to be changed in, say, mdadm.conf (if anything), and what args you need to throw at mdadm to start the reshape process. As a point of reference, here's my current mdadm.conf: DEVICE /dev/sda1 DEVICE /dev/sdb1 DEVICE /dev/sdc1 ARRAY /dev/md0 devices=/dev/sda1,/dev/sdb1,/dev/sdc1 level=5 num-devices=3 May I suggest: DEVICE /dev/sd?1 ARRAY /dev/md0 UUID=whatever it would be a lot safer. I will be adding the devices /dev/sde1 and /dev/sdf1 (when I can find out how :) mdadm /dev/md0 --add /dev/sde1 /dev/sdf1 mdadm --grow /dev/md0 --raid-disks=5 NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html This might be an even sillier question, but I'll ask it anyway... If I add a drive to my RAID5 array, what happens to the ext3 filesystem on top of it? Does it grow automatically? Do I have to take some action to use the extra space? Thanks Nigel - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid5 reshape
Neil Brown wrote: On Thursday June 15, [EMAIL PROTECTED] wrote: Hello all, I'm sorry if this is a silly question, but I've been digging around for a few days now and have not found a clear answer, so I'm tossing it out to those who know it best. I see that as of a few rc's ago, 2.6.17 has had the capability of adding additional drives to an active raid 5 array (w/ the proper ver of mdadm, of course). I cannot, however, for the life of me find out exactly how one goes about doing it! I would love if someone could give a step-by-step on what needs to be changed in, say, mdadm.conf (if anything), and what args you need to throw at mdadm to start the reshape process. As a point of reference, here's my current mdadm.conf: DEVICE /dev/sda1 DEVICE /dev/sdb1 DEVICE /dev/sdc1 ARRAY /dev/md0 devices=/dev/sda1,/dev/sdb1,/dev/sdc1 level=5 num-devices=3 May I suggest: DEVICE /dev/sd?1 ARRAY /dev/md0 UUID=whatever it would be a lot safer. I will be adding the devices /dev/sde1 and /dev/sdf1 (when I can find out how :) mdadm /dev/md0 --add /dev/sde1 /dev/sdf1 mdadm --grow /dev/md0 --raid-disks=5 NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html This might be an even sillier question, but I'll ask it anyway... If I add a drive to my RAID5 array, what happens to the ext3 filesystem on top of it? Does it grow automatically? Do I have to take some action to use the extra space? Thanks Nigel - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid5 reshape
On Thursday June 15, [EMAIL PROTECTED] wrote:
> Hello all,
>
> I'm sorry if this is a silly question, but I've been digging around for
> a few days now and have not found a clear answer, so I'm tossing it out
> to those who know it best.
>
> I see that as of a few rc's ago, 2.6.17 has had the capability of adding
> additional drives to an active raid 5 array (w/ the proper ver of mdadm,
> of course). I cannot, however, for the life of me find out exactly how
> one goes about doing it! I would love if someone could give a
> step-by-step on what needs to be changed in, say, mdadm.conf (if
> anything), and what args you need to throw at mdadm to start the reshape
> process.
>
> As a point of reference, here's my current mdadm.conf:
>
> DEVICE /dev/sda1
> DEVICE /dev/sdb1
> DEVICE /dev/sdc1
> ARRAY /dev/md0 devices=/dev/sda1,/dev/sdb1,/dev/sdc1 level=5 num-devices=3
>

May I suggest:
  DEVICE /dev/sd?1
  ARRAY /dev/md0 UUID=whatever

it would be a lot safer.

> I will be adding the devices /dev/sde1 and /dev/sdf1 (when I can find
> out how :)

  mdadm /dev/md0 --add /dev/sde1 /dev/sdf1
  mdadm --grow /dev/md0 --raid-disks=5

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html