Re: gmirror: replacing failed disks
On Wed, Jan 19, 2005 at 06:55:53AM +0100, Christian Hiris wrote:

> > GEOM_MIRROR[2]: Metadata on ad6 updated.
> > Jan 18 21:07:17 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
> > Jan 18 21:07:17 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
> > and on and on...
>
> Hi Doug,
>
> it seems to me that the data on ad4 were damaged when you pulled the drive. Thanks for trying this with ad4 as provider! Now simply try to simulate a replacement with a fresh disk, as you would do in real life. Please set 'sysctl -w kern.geom.mirror.debug=0' and remove the corresponding line from your /boot/loader.conf.
>
> The procedure below is similar to the example in the gmirror manpage; I just added steps 2 and 3 to make ad4 appear as a fresh disk and to re-create the slice ad4s1.
>
> 1. Let ad6 forget about all other providers of gm0s1
>    # gmirror forget gm0s1
> 2. Clean up ad4
> 2.1. Blank out the first few blocks of ad4
>    # dd if=/dev/zero of=/dev/ad4 bs=512 count=128
> 2.2. Blank out gmirror metadata on ad4
>    # dd if=/dev/zero of=/dev/ad4 bs=512 skip=156301400

This command took a long time and I didn't let it complete. I use tcsh and would occasionally hit Ctrl-T to track its progress. I didn't see it writing data to the disk, nor did I see disk activity. Did I not wait long enough?

> 3. Initialize ad4 and create slice ad4s1
>    # fdisk -v -B -I /dev/ad4
> 4. Add /dev/ad4s1 to mirror gm0s1
>    # gmirror insert gm0s1 /dev/ad4s1

Christian,

Other than the issue with 2.2 above, the procedure worked and the replacement drive is now synchronizing. Now I'll print out a transcript of this and tape it to the box. Then, in three years, when a drive dies, I'll remember what to do :)

Thanks again for all your help.

--
Regards,
Doug
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: gmirror: replacing failed disks
On Wednesday 19 January 2005 22:53, you wrote:
> On Wed, Jan 19, 2005 at 06:55:53AM +0100, Christian Hiris wrote:
> [...]
> > 2.2. Blank out gmirror metadata on ad4
> >    # dd if=/dev/zero of=/dev/ad4 bs=512 skip=156301400
>
> This command took a long time and I didn't let it complete. I use tcsh and would occasionally hit Ctrl-T to track its progress. I didn't see it writing data to the disk, nor did I see disk activity. Did I not wait long enough?

Oops, I just realized that the option 'skip=n' skips blocks of the *input* file. The correct option to skip blocks of the output file is 'seek=n'. It didn't damage anything in our case, but it wasted your time. Sorry about the mistaken option and the time you lost because of it. The command's only purpose is to make the drive look as if it had never been touched by the gmirror framework.

Just for the archives, here is the corrected (and double-checked) command:

2.2. Blank out gmirror metadata on ad4
   # dd if=/dev/zero of=/dev/ad4 bs=512 seek=n
   where n=(metadata_location_in_bytes/512)-1

[...]

> Christian,
>
> Other than the issue with 2.2 above, the procedure worked and the replacement drive is now synchronizing. Now I'll print out a transcript of this and tape it to the box. Then, in three years, when a drive dies, I'll remember what to do :)
>
> Thanks again for all your help.

Just in case the tape fails :) http://freebsd.rambler.ru
It's excellent!

Good luck,
ch

--
Christian Hiris [EMAIL PROTECTED] | OpenPGP KeyID 0x3BCA53BE
OpenPGP-Key at hkp://wwwkeys.eu.pgp.net and http://pgp.mit.edu
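One way to arrive at a value for n is sketched below. It rests on two assumptions that are not stated in the message: that the gmirror metadata sits in the last sector of the provider, and that the disk holds 156301488 sectors of 512 bytes (the figure implied by the [155061/16/63] geometry in the boot log from this thread). The helper name last_sector is made up for this illustration, and the real media size should be read from the system rather than taken from here. The script only prints the dd command; it does not run it.

```shell
#!/bin/sh
# Hypothetical helper: sector offset of the last sector of a device,
# given its media size in bytes ($1) and its sector size in bytes ($2).
last_sector() {
    echo $(( $1 / $2 - 1 ))
}

# Assumed media size: 156301488 sectors * 512 bytes (illustration only).
mediasize=80026361856
n=$(last_sector "$mediasize" 512)

# Print the command instead of executing it; writing to a raw disk is destructive.
# count=1 is an addition here so that exactly one sector would be written.
echo "dd if=/dev/zero of=/dev/ad4 bs=512 seek=$n count=1"
```

With seek=n, dd positions itself in the output file before writing, so only the sector at that offset is overwritten and the rest of the disk is left alone.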
Re: gmirror: replacing failed disks
> > GEOM_MIRROR: Component ad4 (device gm0s1) broken, skipping.
> > GEOM_MIRROR: Cannot add disk ad4 to gm0s1 (error=22).
>
> You can set 'kern.geom.mirror.debug=2' in /boot/loader.conf. This tells you more about what happens. I tested all this at a very early stage of development, so gmirror's behaviour might have changed. There is also a small chance that some bits in my brain got lost since I tested this :)

I set this while the OS was running. GEOM_MIRROR responds every 5 seconds:

GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as clean.
GEOM_MIRROR[2]: Metadata on ad6 updated.
GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
GEOM_MIRROR[2]: Metadata on ad6 updated.

Can I get gmirror to attempt to connect to ad4 again? I tried 'atacontrol reinit 2', but that didn't do it. I also tried 'gmirror rebuild gm0s1 ad4', but gmirror said: No such provider: ad4.

--
Regards,
Doug
Re: gmirror: replacing failed disks
On Tue, Jan 18, 2005 at 09:02:48AM -0600, Doug Poland wrote:
> > > GEOM_MIRROR: Component ad4 (device gm0s1) broken, skipping.
> > > GEOM_MIRROR: Cannot add disk ad4 to gm0s1 (error=22).
> >
> > You can set 'kern.geom.mirror.debug=2' in /boot/loader.conf. This tells you more about what happens. I tested all this at a very early stage of development, so gmirror's behaviour might have changed. There is also a small chance that some bits in my brain got lost since I tested this :)
>
> I set this while the OS was running. GEOM_MIRROR responds every 5 seconds:
>
> GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as clean.
> GEOM_MIRROR[2]: Metadata on ad6 updated.
> GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
> GEOM_MIRROR[2]: Metadata on ad6 updated.
>
> Can I get gmirror to attempt to connect to ad4 again? I tried 'atacontrol reinit 2', but that didn't do it. I also tried 'gmirror rebuild gm0s1 ad4', but gmirror said: No such provider: ad4.

Sorry to reply to my own post, but when I rebooted the box, this is what I saw in /var/log/messages concerning gmirror:

... snip ...
Jan 18 21:06:10 sgwww02 kernel: ad4: 76319MB WDC WD800JD-00HKA0/13.03G13 [155061/16/63] at ata2-master SATA150
Jan 18 21:06:10 sgwww02 kernel: ad6: 76319MB WDC WD800JD-00JNA0/05.01C05 [155061/16/63] at ata3-master SATA150
Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR: Device gm0s1 created (id=594613568).
Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR: Device gm0s1: provider ad4 detected.
Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR: Device gm0s1: provider ad6 detected.
Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR: Component ad4 (device gm0s1) broken, skipping.
Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR: Device gm0s1: provider ad6 activated.
Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR: Device gm0s1: provider mirror/gm0s1 launched.
Jan 18 21:06:10 sgwww02 kernel: Mounting root from ufs:/dev/mirror/gm0s1a
Jan 18 21:06:10 sgwww02 kernel: em0: Link is up 100 Mbps Full Duplex
Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as clean.
Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:10 sgwww02 kernel: em0: Link is up 100 Mbps Full Duplex
Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:17 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as clean.
Jan 18 21:06:17 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:17 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
Jan 18 21:06:17 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:22 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as clean.
Jan 18 21:06:22 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:22 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
Jan 18 21:06:22 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:32 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as clean.
Jan 18 21:06:32 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:32 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
Jan 18 21:06:32 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:46 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as clean.
Jan 18 21:06:46 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:46 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
Jan 18 21:06:46 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:52 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as clean.
Jan 18 21:06:52 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:52 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
Jan 18 21:06:52 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:07:02 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as clean.
Jan 18 21:07:02 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:07:02 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
Jan 18 21:07:02 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:07:12 sgwww02 kernel: GEOM_MIRROR[2]: Access request for mirror/gm0s1: r1w0e0.
Jan 18 21:07:12 sgwww02 kernel: GEOM_MIRROR[2]: Access request for mirror/gm0s1: r-1w0e0.
Jan 18 21:07:12 sgwww02 kernel: GEOM_MIRROR[2]: Access request for mirror/gm0s1: r1w0e0.
Jan 18 21:07:12 sgwww02 kernel: GEOM_MIRROR[2]: Access request for mirror/gm0s1: r-1w0e0.
Jan 18 21:07:12 sgwww02 kernel: GEOM_MIRROR[2]: Access request for mirror/gm0s1: r1w0e0.
Jan 18 21:07:12 sgwww02 kernel: GEOM_MIRROR[2]: Access request for mirror/gm0s1: r-1w0e0.
Jan 18 21:07:12 sgwww02 kernel: GEOM_MIRROR[2]: Access request for mirror/gm0s1: r1w0e0.
Jan 18 21:07:12 sgwww02 kernel:
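With debug level 2 the interesting lines (the "broken, skipping" message, for instance) are easily buried under the periodic clean/dirty/metadata chatter. A minimal sketch of a filter that keeps only the other GEOM_MIRROR lines is shown here; the function name gmirror_events is made up for this illustration, and the list of patterns to drop is a guess based only on the messages visible in the log above.

```shell
#!/bin/sh
# Hypothetical filter: read syslog text on stdin, keep GEOM_MIRROR lines,
# drop the periodic status chatter produced at debug level 2.
gmirror_events() {
    grep 'GEOM_MIRROR' |
        grep -v -e 'marked as clean' -e 'marked as dirty' \
                -e 'Metadata on' -e 'Access request'
}

# Demo on three lines taken from the log above.
printf '%s\n' \
    'Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR: Component ad4 (device gm0s1) broken, skipping.' \
    'Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as clean.' \
    'Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.' |
    gmirror_events
```

On a live system the same function could be fed from the log file, for example 'gmirror_events < /var/log/messages'.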
Re: gmirror: replacing failed disks
On Wednesday 19 January 2005 05:36, Doug Poland wrote:
> On Tue, Jan 18, 2005 at 09:02:48AM -0600, Doug Poland wrote:
> > > > GEOM_MIRROR: Component ad4 (device gm0s1) broken, skipping.
> > > > GEOM_MIRROR: Cannot add disk ad4 to gm0s1 (error=22).
> > >
> > > You can set 'kern.geom.mirror.debug=2' in /boot/loader.conf. This tells you more about what happens. I tested all this at a very early stage of development, so gmirror's behaviour might have changed. There is also a small chance that some bits in my brain got lost since I tested this :)
> >
> > I set this while the OS was running. GEOM_MIRROR responds every 5 seconds:
> >
> > GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as clean.
> > GEOM_MIRROR[2]: Metadata on ad6 updated.
> > GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
> > GEOM_MIRROR[2]: Metadata on ad6 updated.
> >
> > Can I get gmirror to attempt to connect to ad4 again? I tried 'atacontrol reinit 2', but that didn't do it. I also tried 'gmirror rebuild gm0s1 ad4', but gmirror said: No such provider: ad4.
>
> Sorry to reply to my own post, but when I rebooted the box, this is what I saw in /var/log/messages concerning gmirror:
>
> ... snip ...
> GEOM_MIRROR[2]: Metadata on ad6 updated.
> Jan 18 21:07:17 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
> Jan 18 21:07:17 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
> and on and on...
>
> Not sure what to do to get gmirror to recognize this disk.

Hi Doug,

it seems to me that the data on ad4 were damaged when you pulled the drive. Thanks for trying this with ad4 as provider! Now simply try to simulate a replacement with a fresh disk, as you would do in real life. Please set 'sysctl -w kern.geom.mirror.debug=0' and remove the corresponding line from your /boot/loader.conf.

The procedure below is similar to the example in the gmirror manpage; I just added steps 2 and 3 to make ad4 appear as a fresh disk and to re-create the slice ad4s1.

1. Let ad6 forget about all other providers of gm0s1
   # gmirror forget gm0s1
2. Clean up ad4
2.1. Blank out the first few blocks of ad4
   # dd if=/dev/zero of=/dev/ad4 bs=512 count=128
2.2. Blank out gmirror metadata on ad4
   # dd if=/dev/zero of=/dev/ad4 bs=512 skip=156301400
3. Initialize ad4 and create slice ad4s1
   # fdisk -v -B -I /dev/ad4
4. Add /dev/ad4s1 to mirror gm0s1
   # gmirror insert gm0s1 /dev/ad4s1

Cheers,
ch

--
Christian Hiris
Re: gmirror: replacing failed disks
On Sun, Jan 16, 2005 at 11:14:48PM +0100, Christian Hiris wrote:
> On Sunday 16 January 2005 21:14, Doug Poland wrote:
> > (My system has a provider gm0s1 with ad4 and ad6 as consumers, so I'll use those device names)
> >
> > Simulate ad4 failing:
> >   pull the drive
> >   put the drive back in, reboot if necessary to detect drive
>
> After you put the drive in, you can try to attach or reinit the controller channel it is connected to with the command 'atacontrol'.

Did that ('atacontrol reinit 2'), and the drive shows up, yee haw!

> If you put the same drive in, and you haven't zeroed the bootblocks and the slice table (on ad4), geom will recognize that the missing disk has been re-attached and will start rebuilding.

That's not happening; gmirror says:

GEOM_MIRROR: Component ad4 (device gm0s1) broken, skipping.
GEOM_MIRROR: Cannot add disk ad4 to gm0s1 (error=22).

> If you want to simulate insertion of a blank disk, run the 'gmirror forget'

I'll try this after I've got the above working :)

Hey, thanks for all your help so far, I really appreciate it.

--
Regards,
Doug
gmirror: replacing failed disks
Hello,

I've got a bootable gmirror running on identical SATA drives on 5.3-STABLE. The technique I used to build the gmirror can be found at http://people.freebsd.org/~rse/gmirror, under the heading "GEOM mirror Approach 2: Single Slice, Preferred, More Flexible".

Now I'd like to experiment with replacing a failed drive. This particular box has hot-swappable drives, so all I need to do is pull a drive out while the box is running. The man page states:

    One disk failed. Replace it with a brand new one:

        gmirror forget data
        gmirror insert data da1

(My system has a provider gm0s1 with ad4 and ad6 as consumers, so I'll use those device names)

Simulate ad4 failing:
  pull the drive
  put the drive back in, reboot if necessary to detect drive

# gmirror forget gm0s1
# dd if=/dev/zero of=/dev/ad4 bs=512 count=79
# size=`fdisk ad6 | grep ', size ' | head -1 | sed -e 's;^.*size \([0-9]*\).*$;\1;'`
(echo p 1 165 63 $size; echo a 1) | fdisk -v -B -f- -i /dev/ad4
OR
# fdisk -v -B -I /dev/ad4
# gmirror insert gm0s1 /dev/ad4s1
(Now wait two hours for the drives to synchronize)

That should work, yes? How does gmirror know about /dev/ad4s1 if that drive was previously unformatted or brand new?

--
Regards,
Doug
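The size= pipeline above picks the sector count of the first slice out of the fdisk output for the healthy disk, so that the same slice layout can be fed to fdisk for the replacement. The sketch below exercises only that text-processing step. The sample fdisk line and its numbers are invented for illustration, and first_slice_size is a made-up helper name; on the real system the input would come from 'fdisk ad6'. Nothing is written to any disk.

```shell
#!/bin/sh
# Hypothetical helper: read fdisk-style output on stdin and print the
# sector count that follows "size" on the first matching line.
first_slice_size() {
    grep ', size ' | head -1 | sed -e 's;^.*size \([0-9]*\).*$;\1;'
}

# Invented sample line in the style of fdisk output (illustration only).
sample='    start 63, size 156296322 (76316 Meg), flag 80 (active)'
size=$(printf '%s\n' "$sample" | first_slice_size)

# Print the two config lines that the original pipeline would feed to
# 'fdisk -f-': slice 1, type 165, start 63, the extracted size; then
# mark slice 1 active.
printf 'p 1 165 63 %s\na 1\n' "$size"
```

Because of 'head -1', only the first slice is considered, which matches the single-slice layout described in the message.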
Re: gmirror: replacing failed disks
On Sunday 16 January 2005 21:14, Doug Poland wrote:
> Hello,
>
> I've got a bootable gmirror running on identical SATA drives on 5.3-STABLE. The technique I used to build the gmirror can be found at http://people.freebsd.org/~rse/gmirror, under the heading "GEOM mirror Approach 2: Single Slice, Preferred, More Flexible".
>
> Now I'd like to experiment with replacing a failed drive. This particular box has hot-swappable drives, so all I need to do is pull a drive out while the box is running. The man page states:
>
>     One disk failed. Replace it with a brand new one:
>
>         gmirror forget data
>         gmirror insert data da1
>
> (My system has a provider gm0s1 with ad4 and ad6 as consumers, so I'll use those device names)
>
> Simulate ad4 failing:
>   pull the drive
>   put the drive back in, reboot if necessary to detect drive

After you put the drive in, you can try to attach or reinit the controller channel it is connected to with the command 'atacontrol'.

If you put the same drive in, and you haven't zeroed the bootblocks and the slice table (on ad4), geom will recognize that the missing disk has been re-attached and will start rebuilding.

If you want to simulate insertion of a blank disk, run the 'gmirror forget' command before you re-attach disk ad4. Then dd over the first few blocks and the last sector of the old slice, where the gmirror metadata are stored. You can do this with 'dd if=/dev/zero of=/dev/ad4 bs=512 count=1 skip=n', where n=number of sectors to be skipped.

In your case it's better to check where the metadata are stored. Maybe they are stored at the end of your disk. I think so because of your gmirror list output, where ad4 and ad6 are listed as consumers.

> # gmirror forget gm0s1
> # dd if=/dev/zero of=/dev/ad4 bs=512 count=79

You are missing the operator here.

> # size=`fdisk ad6 | grep ', size ' | head -1 | sed -e 's;^.*size \([0-9]*\).*$;\1;'`
> (echo p 1 165 63 $size; echo a 1) | fdisk -v -B
^^
> -f- -i /dev/ad4
> OR
> # fdisk -v -B -I /dev/ad4
> # gmirror insert gm0s1 /dev/ad4s1
> (Now wait two hours for the drives to synchronize)
>
> That should work, yes? How does gmirror know about /dev/ad4s1 if that drive was previously unformatted or brand new?

That should work if you create the slice with 'fdisk -v -B -I /dev/ad4'. On the other hand, it would be very interesting to see whether gmirror really handles the consumers as they are displayed by your gmirror list command. I would blank disk ad4 (as I described above) and see what happens when you issue the command 'gmirror insert gm0s1 /dev/ad4'. Maybe gmirror handles drives with one slice that covers the whole drive as disks (instead of slices)? I would give it a try. (If you try this, could you please post or pm me the 'gmirror list' output? Thank you!)

Cheers,
ch

--
Christian Hiris