Re: Raid5 reshape (Solved)
Neil

Well, I did warn you that I was an idiot... :-) I have been attempting to work out exactly what I did and what happened. All I have learned is that I need to keep better notes.

Yes, the 21 mounts is an fsck, nothing to do with raid. However, it is still noteworthy that this took several hours to complete with the raid also reshaping, rather than the few minutes I have seen in the past. Some kind of interaction there.

I think that the kernel I was using had both the fixes you had sent me in it, but I honestly can't be sure - sorry. In the past, that bug caused it to fail immediately and the reshape to freeze. This appeared to occur after the reshape, maybe a problem at the end of the reshape process. Probably, however, I screwed up, and I have no way to retest.

Finally, just a note to say that the system continues to work just fine and I am really impressed. Thanks again

Nigel
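For reference, a minimal sketch of the remaining step (growing ext3 to fill the reshaped array). It assumes the filesystem sits directly on /dev/md0 and is mounted on /home as described earlier in the thread; if LVM sits in between, as on the older array, the PV and LV would have to be grown first (pvresize, lvextend).

# Offline resize: unmount, force a check (resize2fs insists on it), grow to the full device, remount
umount /home
e2fsck -f /dev/md0
resize2fs /dev/md0
mount /dev/md0 /home
# On FC-era systems whose ext3 was created with the resize_inode feature, ext2online
# can grow the filesystem while it stays mounted instead.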
Re: Raid5 reshape
Nigel J. Terry wrote:

Well, good news and bad news I'm afraid... I would like to be able to tell you that the time calculation now works, but I can't. Here's why: when I rebooted with the newly built kernel, it decided it had hit the magic 21 reboots and hence decided to check the array for clean. That normally takes about 5-10 mins, but this time took several hours, so I went to bed! I suspect that it was doing the full reshape or something similar at boot time. Now I am not sure that this makes good sense in a normal environment - it could keep a server down for hours or days. I might suggest that if such work is required, the clean check be postponed until the next boot and the reshape allowed to continue in the background.

Anyway, the good news is that this morning all is well; the array is clean and grown, as can be seen below. However, if you look further below you will see the section from dmesg which still shows RIP errors, so I guess there is still something wrong, even though it looks like it is working. Let me know if I can provide any more information. Once again, many thanks. All I need to do now is grow the ext3 filesystem...

Nigel

[EMAIL PROTECTED] ~]# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Tue Apr 18 17:44:34 2006
     Raid Level : raid5
     Array Size : 735334656 (701.27 GiB 752.98 GB)
    Device Size : 245111552 (233.76 GiB 250.99 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue Jun 20 06:27:49 2006
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

           UUID : 50e3173e:b5d2bdb6:7db3576b:644409bb
         Events : 0.3366644

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       3       65        2      active sync   /dev/hdb1
       3      22        1        3      active sync   /dev/hdc1
[EMAIL PROTECTED] ~]# cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 sdb1[1] sda1[0] hdc1[3] hdb1[2]
      735334656 blocks level 5, 128k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>
[EMAIL PROTECTED] ~]#

But from dmesg:

md: Autodetecting RAID arrays.
md: autorun ...
md: considering sdb1 ...
md:  adding sdb1 ...
md:  adding sda1 ...
md:  adding hdc1 ...
md:  adding hdb1 ...
md: created md0
md: bind<hdb1>
md: bind<hdc1>
md: bind<sda1>
md: bind<sdb1>
md: running: <sdb1><sda1><hdc1><hdb1>
raid5: automatically using best checksumming function: generic_sse
   generic_sse:  6795.000 MB/sec
raid5: using function: generic_sse (6795.000 MB/sec)
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
raid5: reshape will continue
raid5: device sdb1 operational as raid disk 1
raid5: device sda1 operational as raid disk 0
raid5: device hdb1 operational as raid disk 2
raid5: allocated 4268kB for md0
raid5: raid level 5 set md0 active with 3 out of 4 devices, algorithm 2
RAID5 conf printout:
 --- rd:4 wd:3 fd:1
 disk 0, o:1, dev:sda1
 disk 1, o:1, dev:sdb1
 disk 2, o:1, dev:hdb1
...ok start reshape thread
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
md: using 128k window, over a total of 245111552 blocks.
Unable to handle kernel NULL pointer dereference at RIP: {stext+2145382632}
PGD 7c3f9067 PUD 7cb9e067 PMD 0
Oops: 0010 [1] SMP
CPU 0
Modules linked in: raid5 xor usb_storage video button battery ac lp parport_pc parport floppy nvram snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ehci_hcd ohci1394 ieee1394 sg snd_pcm uhci_hcd i2c_nforce2 i2c_core forcedeth ohci_hcd snd_timer snd soundcore snd_page_alloc dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd sata_nv libata sd_mod scsi_mod
Pid: 1432, comm: md0_reshape Not tainted 2.6.17-rc6 #1
RIP: 0010:[] {stext+2145382632}
RSP: :81007aa43d60  EFLAGS: 00010246
RAX: 81007cf72f20 RBX: 81007c682000 RCX: 0006
RDX:  RSI:  RDI: 81007cf72f20
RBP: 02090900 R08:  R09: 810037f497b0
R10: 000b44ffd564 R11: 8022c92a R12:
R13: 0100 R14:  R15:
FS: 0066d870() GS:80611000() knlGS:
CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2:  CR3: 7bebc000 CR4: 06e0
Process md0_reshape (pid: 1432, threadinfo 81007aa42000, task 810037f497b0)
Stack: 803dce42 1d383600
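For reference, on the boot-time fsck colliding with the reshape: the periodic check is driven by ext3's mount-count and interval settings, which can be inspected and relaxed with tune2fs. A minimal sketch, assuming the filesystem lives on /dev/md0 as in this thread; the six-month interval is only an example.

# Show the current mount-count and interval triggers
tune2fs -l /dev/md0 | grep -iE 'mount count|check'

# Disable the mount-count trigger and rely on a time interval (or manual checks) instead
tune2fs -c 0 -i 6m /dev/md0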
Re: Raid5 reshape
Neil Brown wrote:
> On Sunday June 18, [EMAIL PROTECTED] wrote:
>> This from dmesg might help diagnose the problem:
>
> Yes, that helps a lot, thanks.
>
> The problem is that the reshape thread is restarting before the array is fully set up, so it ends up dereferencing a NULL pointer. This patch should fix it. In fact, there is a small chance that next time you boot it will work without this patch, but the patch makes it more reliable.
>
> There definitely should be no data loss due to this bug.
>
> Thanks,
> NeilBrown

Neil

That seems to have fixed it. The reshape is now progressing and there are no apparent errors in dmesg. Details below. I'll send another confirmation tomorrow when hopefully it has finished :-) Many thanks for a great product and great support.

Nigel

[EMAIL PROTECTED] ~]# cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2]
      490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3] [UUU_]
      [=>..................]  reshape =  7.9% (19588744/245111552) finish=6.4min speed=578718K/sec

unused devices: <none>
[EMAIL PROTECTED] ~]# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.91.03
  Creation Time : Tue Apr 18 17:44:34 2006
     Raid Level : raid5
     Array Size : 490223104 (467.51 GiB 501.99 GB)
    Device Size : 245111552 (233.76 GiB 250.99 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Jun 19 17:38:42 2006
          State : clean, degraded, recovering
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 128K

 Reshape Status : 8% complete
  Delta Devices : 1, (3->4)

           UUID : 50e3173e:b5d2bdb6:7db3576b:644409bb
         Events : 0.3287189

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       3       65        2      active sync   /dev/hdb1
       3       0        0        3      removed

       4      22        1        -      spare   /dev/hdc1
[EMAIL PROTECTED] ~]#
Re: Raid5 reshape
Neil Brown wrote:
> On Monday June 19, [EMAIL PROTECTED] wrote:
>> That seems to have fixed it. The reshape is now progressing and there are no apparent errors in dmesg. Details below.
>
> Great!
>
>> I'll send another confirmation tomorrow when hopefully it has finished :-) Many thanks for a great product and great support.
>
> And thank you for being a patient beta-tester!
>
> NeilBrown

Neil - I see myself more as an idiot-proof tester than a beta-tester...

One comment - as I look at the rebuild, which is now over 20%, the time till finish makes no sense. It did make sense when the first reshape started. I guess your estimating / averaging algorithm doesn't work for a restarted reshape. A minor cosmetic issue - see below.

Nigel

[EMAIL PROTECTED] ~]$ cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2]
      490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3] [UUU_]
      [====>...............]  reshape = 22.7% (55742816/245111552) finish=5.8min speed=542211K/sec

unused devices: <none>
[EMAIL PROTECTED] ~]$
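For reference, on the bogus finish figure: it looks as if the blocks completed before the reboot are being counted against the time elapsed since the restart, which inflates the reported speed and shrinks the ETA. A rough sanity check can be made by sampling /proc/mdstat twice and deriving the real rate by hand; this is only a sketch (and it will divide by zero if the reshape is genuinely stalled, as it was earlier in this thread).

# Sample the completed-block count twice, 60 seconds apart, and derive the real rate and ETA
a=$(grep reshape /proc/mdstat | sed 's/.*(\([0-9]*\)\/.*/\1/')
total=$(grep reshape /proc/mdstat | sed 's/.*\/\([0-9]*\)).*/\1/')
sleep 60
b=$(grep reshape /proc/mdstat | sed 's/.*(\([0-9]*\)\/.*/\1/')
echo "real speed: $(( (b - a) / 60 )) KB/sec, ETA: $(( (total - b) / ((b - a) / 60) / 60 )) min"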
Re: Raid5 reshape
Mike Hardy wrote:
> Unless something has changed recently, the parity-rebuild-interrupted / restarted-parity-rebuild case shows the same behavior. It's probably the same chunk of code (I haven't looked - bad hacker! bad!), but I thought I'd mention it in case Neil goes looking.
>
> The speed is truly impressive though. I'll almost be sorry to see it fixed :-)
>
> -Mike

I'd love to agree about the speed, but this has been the longest 5.8 minutes of my life... :-)
Re: Raid5 reshape
Nigel J. Terry wrote:

Neil Brown wrote:
> OK, thanks for the extra details. I'll have a look and see what I can find, but it'll probably be a couple of days before I have anything useful for you.
>
> NeilBrown

This from dmesg might help diagnose the problem:

md: Autodetecting RAID arrays.
md: autorun ...
md: considering sdb1 ...
md:  adding sdb1 ...
md:  adding sda1 ...
md:  adding hdc1 ...
md:  adding hdb1 ...
md: created md0
md: bind<hdb1>
md: bind<hdc1>
md: bind<sda1>
md: bind<sdb1>
md: running: <sdb1><sda1><hdc1><hdb1>
raid5: automatically using best checksumming function: generic_sse
   generic_sse:  6795.000 MB/sec
raid5: using function: generic_sse (6795.000 MB/sec)
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
raid5: reshape will continue
raid5: device sdb1 operational as raid disk 1
raid5: device sda1 operational as raid disk 0
raid5: device hdb1 operational as raid disk 2
raid5: allocated 4268kB for md0
raid5: raid level 5 set md0 active with 3 out of 4 devices, algorithm 2
RAID5 conf printout:
 --- rd:4 wd:3 fd:1
 disk 0, o:1, dev:sda1
 disk 1, o:1, dev:sdb1
 disk 2, o:1, dev:hdb1
...ok start reshape thread
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
md: using 128k window, over a total of 245111552 blocks.
Unable to handle kernel NULL pointer dereference at RIP: {stext+2145382632}
PGD 7c3f9067 PUD 7cb9e067 PMD 0
Oops: 0010 [1] SMP
CPU 0
Modules linked in: raid5 xor usb_storage video button battery ac lp parport_pc parport floppy nvram snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ehci_hcd ohci1394 ieee1394 sg snd_pcm uhci_hcd i2c_nforce2 i2c_core forcedeth ohci_hcd snd_timer snd soundcore snd_page_alloc dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd sata_nv libata sd_mod scsi_mod
Pid: 1432, comm: md0_reshape Not tainted 2.6.17-rc6 #1
RIP: 0010:[] {stext+2145382632}
RSP: :81007aa43d60  EFLAGS: 00010246
RAX: 81007cf72f20 RBX: 81007c682000 RCX: 0006
RDX:  RSI:  RDI: 81007cf72f20
RBP: 02090900 R08:  R09: 810037f497b0
R10: 000b44ffd564 R11: 8022c92a R12:
R13: 0100 R14:  R15:
FS: 0066d870() GS:80611000() knlGS:
CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2:  CR3: 7bebc000 CR4: 06e0
Process md0_reshape (pid: 1432, threadinfo 81007aa42000, task 810037f497b0)
Stack: 803dce42 1d383600
Call Trace:
 803dce42{md_do_sync+1307}
 802640c0{thread_return+0}
 8026411e{thread_return+94}
 8029925d{keventd_create_kthread+0}
 803dd3d9{md_thread+248}
 8029925d{keventd_create_kthread+0}
 803dd2e1{md_thread+0}
 80232cb1{kthread+254}
 8026051e{child_rip+8}
 8029925d{keventd_create_kthread+0}
 802640c0{thread_return+0}
 80232bb3{kthread+0}
 80260516{child_rip+0}

Code: Bad RIP value.
RIP {stext+2145382632} RSP 81007aa43d60
CR2:
<6>md: ... autorun DONE.
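For reference, the "minimum _guaranteed_ reconstruction speed" and the idle-bandwidth ceiling in that log are the md speed limits, which can be adjusted at runtime through sysctl; the values below are only examples.

# Current limits, in KB/sec per device
cat /proc/sys/dev/raid/speed_limit_min
cat /proc/sys/dev/raid/speed_limit_max

# Example: guarantee more bandwidth to the reshape even when the box is otherwise busy
sysctl -w dev.raid.speed_limit_min=5000
sysctl -w dev.raid.speed_limit_max=200000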
Re: Raid5 reshape
Neil Brown wrote:
> On Friday June 16, [EMAIL PROTECTED] wrote:
>> Thanks for all the advice. One final question, what kernel and mdadm versions do I need?
>
> For resizing raid5:
>   mdadm-2.4 or later
>   linux-2.6.17-rc2 or later
>
> NeilBrown

OK, I tried and screwed up! I upgraded my kernel and mdadm. I set the grow going and all looked well, so as it said it was going to take 430 minutes, I went to Starbucks. When I came home there had been a power cut, but my UPS had shut the system down. When power returned I rebooted. Now I think I had failed to set the new partition on /dev/hdc1 to Raid Autodetect, so it didn't find it at reboot. I tried to hot-add it, but now I seem to have a deadlock situation. Although --detail shows that it is degraded and recovering, /proc/mdstat shows it is reshaping. In truth there is no disk activity and the count in /proc/mdstat is not changing. I guess the only good news is that I can still mount the device and my data is fine. Please see below... Any ideas what I should do next?

Thanks

Nigel

[EMAIL PROTECTED] ~]# uname -a
Linux homepc.nigelterry.net 2.6.17-rc6 #1 SMP Sat Jun 17 11:05:52 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
[EMAIL PROTECTED] ~]# mdadm --version
mdadm - v2.5.1 - 16 June 2006
[EMAIL PROTECTED] ~]# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.91.03
  Creation Time : Tue Apr 18 17:44:34 2006
     Raid Level : raid5
     Array Size : 490223104 (467.51 GiB 501.99 GB)
    Device Size : 245111552 (233.76 GiB 250.99 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sat Jun 17 15:15:05 2006
          State : clean, degraded, recovering
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 128K

 Reshape Status : 6% complete
  Delta Devices : 1, (3->4)

           UUID : 50e3173e:b5d2bdb6:7db3576b:644409bb
         Events : 0.3211829

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       3       65        2      active sync   /dev/hdb1
       3       0        0        3      removed

       4      22        1        -      spare   /dev/hdc1
[EMAIL PROTECTED] ~]# cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2]
      490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3] [UUU_]
      [=>..................]  reshape =  6.9% (17073280/245111552) finish=86.3min speed=44003K/sec

unused devices: <none>
[EMAIL PROTECTED] ~]#
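For reference, on the "Raid Autodetect" point: the in-kernel autodetection used here (0.90 superblocks assembled at boot) only considers partitions whose type is 0xfd (Linux raid autodetect). A sketch of checking and fixing that for /dev/hdc1; older sfdisk accepts --print-id/--change-id for this (fdisk's 't' command does the same interactively), and the change only matters for subsequent boots.

# Print the type of partition 1 on /dev/hdc (should be "fd")
sfdisk --print-id /dev/hdc 1

# Change it to fd (Linux raid autodetect) if it is something else
sfdisk --change-id /dev/hdc 1 fd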
Re: Raid5 reshape
Nigel J. Terry wrote:

Neil Brown wrote:
> On Saturday June 17, [EMAIL PROTECTED] wrote:
>> Any ideas what I should do next? Thanks
>
> Looks like you've probably hit a bug. I'll need a bit more info though.
>
> First:
>
>   [EMAIL PROTECTED] ~]# cat /proc/mdstat
>   Personalities : [raid5] [raid4]
>   md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2]
>         490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3] [UUU_]
>         [=>..................]  reshape =  6.9% (17073280/245111552) finish=86.3min speed=44003K/sec
>
>   unused devices: <none>
>
> This really makes it look like the reshape is progressing. How long after the reboot was this taken? How long after hdc1 was hot-added (roughly)? What does it show now? What happens if you remove hdc1 again? Does the reshape keep going?
>
> What I would expect to happen in this case is that the array reshapes into a degraded array, then the missing disk is recovered onto hdc1.
>
> NeilBrown

I don't know how long the system was reshaping before the power went off, and then I had to restart when the power came back. It claimed it was going to take 430 minutes, so 6% would be about 25 minutes, which could make good sense; certainly it looked like it was working fine when I went out. Now nothing is happening. It shows:

[EMAIL PROTECTED] ~]# cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2]
      490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3] [UUU_]
      [=>..................]  reshape =  6.9% (17073280/245111552) finish=2281.2min speed=1665K/sec

unused devices: <none>
[EMAIL PROTECTED] ~]#

so the only thing changing is the time till finish. I'll try removing and adding /dev/hdc1 again. Will it make any difference if the device is mounted or not?

Nigel

Tried remove and add, made no difference:

[EMAIL PROTECTED] ~]# mdadm /dev/md0 --remove /dev/hdc1
mdadm: hot removed /dev/hdc1
[EMAIL PROTECTED] ~]# cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 sdb1[1] sda1[0] hdb1[2]
      490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3] [UUU_]
      [=>..................]  reshape =  6.9% (17073280/245111552) finish=2321.5min speed=1636K/sec

unused devices: <none>
[EMAIL PROTECTED] ~]# mdadm /dev/md0 --add /dev/hdc1
mdadm: re-added /dev/hdc1
[EMAIL PROTECTED] ~]# cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 hdc1[4](S) sdb1[1] sda1[0] hdb1[2]
      490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3] [UUU_]
      [=>..................]  reshape =  6.9% (17073280/245111552) finish=2329.3min speed=1630K/sec

unused devices: <none>
[EMAIL PROTECTED] ~]#
Re: Raid5 reshape
Neil Brown wrote:
> OK, thanks for the extra details. I'll have a look and see what I can find, but it'll probably be a couple of days before I have anything useful for you.
>
> NeilBrown

OK, I'll try and be patient :-) At least everything else is working. Let me know if you need to ssh to my machine.

Nigel
Re: Raid5 reshape
Neil Brown wrote:
> On Thursday June 15, [EMAIL PROTECTED] wrote:
>> Hello all, I'm sorry if this is a silly question, but I've been digging around for a few days now and have not found a clear answer, so I'm tossing it out to those who know it best. I see that as of a few rc's ago, 2.6.17 has had the capability of adding additional drives to an active raid 5 array (w/ the proper ver of mdadm, of course). I cannot, however, for the life of me find out exactly how one goes about doing it! I would love it if someone could give a step-by-step on what needs to be changed in, say, mdadm.conf (if anything), and what args you need to throw at mdadm to start the reshape process. As a point of reference, here's my current mdadm.conf:
>>
>> DEVICE /dev/sda1
>> DEVICE /dev/sdb1
>> DEVICE /dev/sdc1
>> ARRAY /dev/md0 devices=/dev/sda1,/dev/sdb1,/dev/sdc1 level=5 num-devices=3
>
> May I suggest:
>   DEVICE /dev/sd?1
>   ARRAY /dev/md0 UUID=whatever
> it would be a lot safer.
>
>> I will be adding the devices /dev/sde1 and /dev/sdf1 (when I can find out how :)
>
>   mdadm /dev/md0 --add /dev/sde1 /dev/sdf1
>   mdadm --grow /dev/md0 --raid-disks=5
>
> NeilBrown

This might be an even sillier question, but I'll ask it anyway... If I add a drive to my RAID5 array, what happens to the ext3 filesystem on top of it? Does it grow automatically? Do I have to take some action to use the extra space?

Thanks

Nigel
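For reference, a minimal sketch of generating the UUID-based config Neil suggests rather than typing it by hand; mdadm fills the UUID in from the running array, and the DEVICE line should match the actual member disks.

# Capture the running array's identity into mdadm.conf
echo 'DEVICE /dev/sd?1' > /etc/mdadm.conf
mdadm --detail --scan >> /etc/mdadm.conf

# The appended line looks something like:
# ARRAY /dev/md0 level=raid5 num-devices=3 UUID=...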
Re: Help needed - RAID5 recovery from Power-fail - SOLVED
Thanks for all the help. I am now up and running again and have been stable for over a day. I will now install my new drive and add it to give me an array of three drives. I'll also learn more about Raid, mdadm and smartd so that I am better prepared next time. Thanks again

Nigel

Neil Brown wrote:
> On Monday April 3, [EMAIL PROTECTED] wrote:
>> I wonder if you could help a Raid Newbie with a problem. I had a power fail, and now I can't access my RAID array. It has been working fine for months until I lost power... Being a fool, I don't have a full backup, so I really need to get this data back. I run FC4 (64bit). I have an array of two disks /dev/sda1 and /dev/sdb1 as a raid5 array /dev/md0, on top of which I run lvm and mount the whole lot as /home. My intention was always to add another disk to this array, and I purchased one yesterday.
>
> 2 devices in a raid5?? Doesn't seem a lot of point it being raid5 rather than raid1.
>
>> When I boot, I get:
>>   md0 is not clean
>>   Cannot start dirty degraded array
>>   failed to run raid set md0
>
> This tells us that the array is degraded. A dirty degraded array can have undetectable data corruption. That is why it won't start it for you. However with only two devices, data corruption from this cause isn't actually possible.
>
> The kernel parameter
>   md_mod.start_dirty_degraded=1
> will bypass this message and start the array anyway.
>
> Alternately:
>   mdadm -A --force /dev/md0 /dev/sd[ab]1
>
>> # mdadm --examine /dev/sda1
>> /dev/sda1:
>>           Magic : a92b4efc
>>         Version : 00.90.02
>>            UUID : c57d50aa:1b3bcabd:ab04d342:6049b3f1
>>   Creation Time : Thu Dec 15 15:29:36 2005
>>      Raid Level : raid5
>>    Raid Devices : 2
>>   Total Devices : 2
>> Preferred Minor : 0
>>     Update Time : Tue Mar 21 06:25:52 2006
>>           State : active
>>  Active Devices : 1
>
> So at 06:25:52, there was only one working device, while...
>
>> # mdadm --examine /dev/sdb1
>> /dev/sdb1:
>>           Magic : a92b4efc
>>         Version : 00.90.02
>>            UUID : c57d50aa:1b3bcabd:ab04d342:6049b3f1
>>   Creation Time : Thu Dec 15 15:29:36 2005
>>      Raid Level : raid5
>>    Raid Devices : 2
>>   Total Devices : 2
>> Preferred Minor : 0
>>     Update Time : Tue Mar 21 06:23:57 2006
>>           State : active
>>  Active Devices : 2
>
> at 06:23:57 there were two. It looks like you lost a drive a while ago. Did you notice?
>
> Anyway, the 'mdadm' command I gave above should get the array working again for you. Then you might want to
>   mdadm /dev/md0 -a /dev/sdb1
> if you trust /dev/sdb.
>
> NeilBrown
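For reference, since the thread closes with a plan to be better prepared next time: mdadm's monitor mode can mail out when a device fails or an array goes degraded, which would have flagged the drive that dropped out back in March. A minimal sketch; the address and the 300-second polling delay are only examples.

# Send a test alert for every array found in the config, then run the monitor as a daemon
mdadm --monitor --scan --oneshot --test --mail root@localhost
mdadm --monitor --scan --daemonise --delay 300 --mail root@localhost

# Or put "MAILADDR root@localhost" in /etc/mdadm.conf and enable the
# distribution's mdmonitor service, which reads it.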
Help needed - RAID5 recovery from Power-fail
I wonder if you could help a Raid Newbie with a problem. I had a power fail, and now I can't access my RAID array. It has been working fine for months until I lost power... Being a fool, I don't have a full backup, so I really need to get this data back.

I run FC4 (64bit). I have an array of two disks /dev/sda1 and /dev/sdb1 as a raid5 array /dev/md0, on top of which I run lvm and mount the whole lot as /home. My intention was always to add another disk to this array, and I purchased one yesterday.

When I boot, I get:

md0 is not clean
Cannot start dirty degraded array
failed to run raid set md0

I can provide the following extra information:

# cat /proc/mdstat
Personalities : [raid5]
unused devices: <none>

# mdadm --query /dev/md0
/dev/md0: is an md device which is not active

# mdadm --query /dev/md0
/dev/md0: is an md device which is not active
/dev/md0: is too small to be an md component.

# mdadm --query /dev/sda1
/dev/sda1: is not an md array
/dev/sda1: device 0 in 2 device undetected raid5 md0. Use mdadm --examine for more detail.

# mdadm --query /dev/sdb1
/dev/sdb1: is not an md array
/dev/sdb1: device 1 in 2 device undetected raid5 md0. Use mdadm --examine for more detail.

# mdadm --examine /dev/md0
mdadm: /dev/md0 is too small for md

# mdadm --examine /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : c57d50aa:1b3bcabd:ab04d342:6049b3f1
  Creation Time : Thu Dec 15 15:29:36 2005
     Raid Level : raid5
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Tue Mar 21 06:25:52 2006
          State : active
 Active Devices : 1
Working Devices : 1
 Failed Devices : 2
  Spare Devices : 0
       Checksum : 2ba99f09 - correct
         Events : 0.1498318

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       0        0        1      faulty removed

# mdadm --examine /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : c57d50aa:1b3bcabd:ab04d342:6049b3f1
  Creation Time : Thu Dec 15 15:29:36 2005
     Raid Level : raid5
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Tue Mar 21 06:23:57 2006
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 2ba99e95 - correct
         Events : 0.1498307

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     1       8       17        1      active sync   /dev/sdb1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1

It looks to me like there is no hardware problem, but maybe I am wrong. I cannot find any file /etc/mdadm.conf nor /etc/raidtab. How would you suggest I proceed? I'm wary of doing anything (assemble, build, create) until I am sure it won't reset everything.

Many Thanks

Nigel