Re: gmirror: replacing failed disks

2005-01-19 Thread Doug Poland
On Wed, Jan 19, 2005 at 06:55:53AM +0100, Christian Hiris wrote:
 
  GEOM_MIRROR[2]: Metadata on ad6 updated.
  Jan 18 21:07:17 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
  Jan 18 21:07:17 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
 
  and on and on...
 
 
 Hi Doug, it seems to me that the data on ad4 were damaged when you
 pulled the drive.  Thanks for trying this with ad4 as provider!
 
 Now simply try to simulate a replacement with a fresh disk, as you
 would do in real life. Please set 'sysctl -w kern.geom.mirror.debug=0'
 and remove the corresponding line from your /boot/loader.conf. The
 procedure below is similar to the example in the gmirror manpage; I
 just added steps 2. and 3. to make ad4 appear as a fresh disk and to
 re-create the slice ad4s1.
 
 1.   Let ad6 forget about all other providers of gm0s1
  # gmirror forget gm0s1
 
 2.   Clean up ad4  
 
 2.1. Blank out the first few blocks of ad4
  # dd if=/dev/zero of=/dev/ad4 bs=512 count=128 
 
 2.2. Blank out gmirror metadata on ad4 
  # dd if=/dev/zero of=/dev/ad4 bs=512 skip=156301400
 
This command took a long time and I didn't let it complete.  I use
tcsh and would occasionally hit Ctrl-T to track its progress.  I
didn't see it writing data to the disk, nor did I see disk activity.  Did
I not wait long enough?

 3.   Initialize ad4 and create slice ad4s1
  # fdisk -v -B -I /dev/ad4
 
 4.   Add /dev/ad4s1 to mirror gm0s1
  # gmirror insert gm0s1 /dev/ad4s1
 
Christian,

Other than the issue with 2.2 above, the procedure worked and the
replacement drive is now synchronizing.  Now I'll print out a
transcript of this and tape it to the box.  Then, in three years, when a
drive dies, I'll remember what to do :)

Thanks again for all your help.

-- 
Regards,
Doug

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: gmirror: replacing failed disks

2005-01-19 Thread Christian Hiris

On Wednesday 19 January 2005 22:53, you wrote:
 On Wed, Jan 19, 2005 at 06:55:53AM +0100, Christian Hiris wrote:

[...]

 
  2.2. Blank out gmirror metadata on ad4
   # dd if=/dev/zero of=/dev/ad4 bs=512 skip=156301400

 This command took a long time and I didn't let it complete.  I use
 tcsh and would occasionally hit Ctrl-T to track its progress.  I
 didn't see it writing data to the disk, nor did I see disk activity.  Did
 I not wait long enough?

Oops, I just realized that the option 'skip=n' skips blocks on the *input*
file. The correct option for skipping blocks on the output file is 'seek=n'.
It didn't damage anything in our case, but it wasted your time. Sorry about
the mistaken option and the time you lost because of it. The command's only
purpose is to make the drive look as if it had never been touched by the
gmirror framework.
   
Just for the archives, the corrected (and double-checked) command:

   2.2. Blank out gmirror metadata on ad4
    # dd if=/dev/zero of=/dev/ad4 bs=512 seek=n
    where n=(metadata_location_in_bytes/512)-1
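
For readers of the archive, the difference between 'skip=' and 'seek=' can be
tried safely on a scratch file instead of a disk. The sketch below is only an
illustration: the helper last_sector and the 8-sector file are made up for the
example, and it assumes the metadata occupies the final 512-byte sector of the
provider.

```shell
#!/bin/sh
# Sketch only: operates on a scratch file, never on a real disk.

# Index of the last sector, given a size in bytes and a sector size.
# (Hypothetical helper; assumes the metadata sits in the final sector.)
last_sector() {
    mediasize=$1
    sectorsize=${2:-512}
    echo $(( mediasize / sectorsize - 1 ))
}

# Build an 8-sector scratch "disk" filled with the letter x.
scratch=$(mktemp)
dd if=/dev/zero bs=512 count=8 2>/dev/null | tr '\000' 'x' > "$scratch"

# seek= positions within the OUTPUT, so only the final sector is zeroed.
# skip= would instead skip blocks of the INPUT (/dev/zero) and write at
# the start of the output.
# conv=notrunc keeps a regular file from being truncated.
n=$(last_sector 4096 512)
dd if=/dev/zero of="$scratch" bs=512 seek="$n" count=1 conv=notrunc 2>/dev/null

# Count the bytes that are still non-zero: 7 sectors * 512 = 3584.
nonzero=$(tr -d '\000' < "$scratch" | wc -c | tr -d ' ')
echo "last sector index: $n, non-zero bytes left: $nonzero"

rm -f "$scratch"
```

On a real disk the size in bytes would have to come from the system itself
(for example from diskinfo), and conv=notrunc would not be needed for a
device node.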

[...]

 Christian,

 Other than the issue with 2.2 above, the procedure worked and the
 replacement drive is now synchronizing.  Now I'll print out a
 transcript of this and tape it to the box.  Then, in three years, when a
 drive dies, I'll remember what to do :)

 Thanks again for all your help.

Just in case the tape fails :)

   http://freebsd.rambler.ru

It's excellent!

Good luck,
ch

-- 
Christian Hiris [EMAIL PROTECTED] | OpenPGP KeyID 0x3BCA53BE 
OpenPGP-Key at hkp://wwwkeys.eu.pgp.net and http://pgp.mit.edu


Re: gmirror: replacing failed disks

2005-01-18 Thread Doug Poland

 GEOM_MIRROR: Component ad4 (device gm0s1) broken, skipping.
 GEOM_MIRROR: Cannot add disk ad4 to gm0s1 (error=22).

 You can set 'kern.geom.mirror.debug=2' in /boot/loader.conf. This
 tells you more about what happens. I tested all this at a very early
 stage of development, so gmirror's behaviour might have changed.
 There is also a small chance that some bits in my brain got lost
 since I tested this :)

I set this while the OS was running.  GEOM_MIRROR now logs the following
every 5 seconds:

GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as clean.
GEOM_MIRROR[2]: Metadata on ad6 updated.
GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
GEOM_MIRROR[2]: Metadata on ad6 updated.

Can I get gmirror to attempt to connect to ad4 again?  I tried an
atacontrol reinit 2, but that didn't do it.  I also tried gmirror rebuild
gm0s1 ad4, but gmirror said: No such provider: ad4.

-- 
Regards,
Doug





Re: gmirror: replacing failed disks

2005-01-18 Thread Doug Poland
On Tue, Jan 18, 2005 at 09:02:48AM -0600, Doug Poland wrote:
 
  GEOM_MIRROR: Component ad4 (device gm0s1) broken, skipping.
  GEOM_MIRROR: Cannot add disk ad4 to gm0s1 (error=22).
 
  You can set 'kern.geom.mirror.debug=2' in /boot/loader.conf. This
  tells you more about what happens. I tested all this at a very early
  stage of development, so gmirror's behaviour might have changed.
  There is also a small chance that some bits in my brain got lost
  since I tested this :)
 
 I set this while the OS was running.  GEOM_MIRROR now logs the following
 every 5 seconds:
 
 GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as clean.
 GEOM_MIRROR[2]: Metadata on ad6 updated.
 GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
 GEOM_MIRROR[2]: Metadata on ad6 updated.
 
 Can I get gmirror to attempt to connect to ad4 again?  I tried an
 atacontrol reinit 2, but that didn't do it.  I also tried gmirror rebuild
 gm0s1 ad4, but gmirror said: No such provider: ad4.
 
Sorry to reply to my own post, but when I rebooted the box, this is what
I see in /var/log/messages concerning gmirror:

... snip ...
Jan 18 21:06:10 sgwww02 kernel: ad4: 76319MB WDC WD800JD-00HKA0/13.03G13 [155061/16/63] at ata2-master SATA150
Jan 18 21:06:10 sgwww02 kernel: ad6: 76319MB WDC WD800JD-00JNA0/05.01C05 [155061/16/63] at ata3-master SATA150
Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR: Device gm0s1 created (id=594613568).
Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR: Device gm0s1: provider ad4 detected.
Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR: Device gm0s1: provider ad6 detected.
Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR: Component ad4 (device gm0s1) broken, skipping.
Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR: Device gm0s1: provider ad6 activated.
Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR: Device gm0s1: provider mirror/gm0s1 launched.
Jan 18 21:06:10 sgwww02 kernel: Mounting root from ufs:/dev/mirror/gm0s1a
Jan 18 21:06:10 sgwww02 kernel: em0: Link is up 100 Mbps Full Duplex
Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as clean.
Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:10 sgwww02 kernel: em0: Link is up 100 Mbps Full Duplex
Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:17 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as clean.
Jan 18 21:06:17 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:17 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
Jan 18 21:06:17 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:22 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as clean.
Jan 18 21:06:22 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:22 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
Jan 18 21:06:22 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:32 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as clean.
Jan 18 21:06:32 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:32 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
Jan 18 21:06:32 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:46 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as clean.
Jan 18 21:06:46 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:46 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
Jan 18 21:06:46 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:52 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as clean.
Jan 18 21:06:52 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:52 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
Jan 18 21:06:52 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:07:02 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as clean.
Jan 18 21:07:02 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:07:02 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
Jan 18 21:07:02 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:07:12 sgwww02 kernel: GEOM_MIRROR[2]: Access request for mirror/gm0s1: r1w0e0.
Jan 18 21:07:12 sgwww02 kernel: GEOM_MIRROR[2]: Access request for mirror/gm0s1: r-1w0e0.
Jan 18 21:07:12 sgwww02 kernel: GEOM_MIRROR[2]: Access request for mirror/gm0s1: r1w0e0.
Jan 18 21:07:12 sgwww02 kernel: GEOM_MIRROR[2]: Access request for mirror/gm0s1: r-1w0e0.
Jan 18 21:07:12 sgwww02 kernel: GEOM_MIRROR[2]: Access request for mirror/gm0s1: r1w0e0.
Jan 18 21:07:12 sgwww02 kernel: GEOM_MIRROR[2]: Access request for mirror/gm0s1: r-1w0e0.
Jan 18 21:07:12 sgwww02 kernel: GEOM_MIRROR[2]: Access request for mirror/gm0s1: r1w0e0.
Jan 18 21:07:12 sgwww02 kernel: 
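
With the debug level raised, the one informative line ("broken, skipping") is
easy to lose among the periodic clean/dirty messages. A filter along the
following lines could cut the noise. This is a sketch: the function name
filter_gmirror is made up, and the sample lines are copied from the log above
only to make the example self-contained.

```shell
#!/bin/sh
# Sketch: keep GEOM_MIRROR lines but drop the periodic debug messages.
filter_gmirror() {
    grep 'GEOM_MIRROR' |
    grep -v -E 'marked as (clean|dirty)|Metadata on .* updated|Access request'
}

# Sample lines modelled on the log above (one message per line).
sample='Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR: Component ad4 (device gm0s1) broken, skipping.
Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as clean.
Jan 18 21:06:10 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.
Jan 18 21:06:10 sgwww02 kernel: em0: Link is up 100 Mbps Full Duplex'

# Only the "broken, skipping" line survives the filter.
printf '%s\n' "$sample" | filter_gmirror
```

Against the real log the same function would be fed from the file, for
example: filter_gmirror < /var/log/messages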

Re: gmirror: replacing failed disks

2005-01-18 Thread Christian Hiris

On Wednesday 19 January 2005 05:36, Doug Poland wrote:
 On Tue, Jan 18, 2005 at 09:02:48AM -0600, Doug Poland wrote:
   GEOM_MIRROR: Component ad4 (device gm0s1) broken, skipping.
   GEOM_MIRROR: Cannot add disk ad4 to gm0s1 (error=22).
  
   You can set 'kern.geom.mirror.debug=2' in /boot/loader.conf. This
   tells you more about what happens. I tested all this at a very early
   stage of development, so gmirror's behaviour might have changed.
   There is also a small chance that some bits in my brain got lost
   since I tested this :)
 
  I set this while the OS was running.  GEOM_MIRROR now logs the following
  every 5 seconds:
 
  GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as clean.
  GEOM_MIRROR[2]: Metadata on ad6 updated.
  GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
  GEOM_MIRROR[2]: Metadata on ad6 updated.
 
  Can I get gmirror to attempt to connect to ad4 again?  I tried an
  atacontrol reinit 2, but that didn't do it.  I also tried gmirror rebuild
  gm0s1 ad4, but gmirror said: No such provider: ad4.

 Sorry to reply to my own post, but when I rebooted the box, this is what
 I see in /var/log/messages concerning gmirror:


 ... snip ...

 GEOM_MIRROR[2]: Metadata on ad6 updated.
 Jan 18 21:07:17 sgwww02 kernel: GEOM_MIRROR[1]: Disk ad6 (device gm0s1) marked as dirty.
 Jan 18 21:07:17 sgwww02 kernel: GEOM_MIRROR[2]: Metadata on ad6 updated.

 and on and on...

 Not sure what to do to get gmirror to recognize this disk.

Hi Doug, it seems to me that the data on ad4 were damaged when you pulled the
drive. Thanks for trying this with ad4 as provider!

Now simply try to simulate a replacement with a fresh disk, as you would do in
real life. Please set 'sysctl -w kern.geom.mirror.debug=0' and remove the
corresponding line from your /boot/loader.conf. The procedure below is similar
to the example in the gmirror manpage; I just added steps 2. and 3. to make ad4
appear as a fresh disk and to re-create the slice ad4s1.

1.   Let ad6 forget about all other providers of gm0s1
 # gmirror forget gm0s1

2.   Clean up ad4  

2.1. Blank out the first few blocks of ad4
 # dd if=/dev/zero of=/dev/ad4 bs=512 count=128 

2.2. Blank out gmirror metadata on ad4 
 # dd if=/dev/zero of=/dev/ad4 bs=512 skip=156301400

3.   Initialize ad4 and create slice ad4s1
 # fdisk -v -B -I /dev/ad4

4.   Add /dev/ad4s1 to mirror gm0s1
 # gmirror insert gm0s1 /dev/ad4s1
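
The steps above could be collected into a dry-run script along these lines.
It is a sketch that only prints the commands and runs nothing. The variable
names MIRROR, DISK and LAST_SECTOR are made up for the example, step 2.2 uses
'seek=' as in the correction posted later in this thread, and the 'count=1'
on that step is an assumption that only the final sector needs to be zeroed.

```shell
#!/bin/sh
# Dry-run sketch: prints the replacement procedure, does not execute it.
# MIRROR, DISK and LAST_SECTOR are assumptions; adjust for the real system.
MIRROR=${MIRROR:-gm0s1}
DISK=${DISK:-ad4}
LAST_SECTOR=${LAST_SECTOR:-n}   # placeholder, must be worked out per disk

plan() {
    # 1. drop the missing provider from the mirror configuration
    echo "gmirror forget $MIRROR"
    # 2.1 blank the first blocks of the replacement disk
    echo "dd if=/dev/zero of=/dev/$DISK bs=512 count=128"
    # 2.2 blank old gmirror metadata (seek= addresses the output device)
    echo "dd if=/dev/zero of=/dev/$DISK bs=512 seek=$LAST_SECTOR count=1"
    # 3. one slice covering the whole disk, with boot code
    echo "fdisk -v -B -I /dev/$DISK"
    # 4. add the new slice to the mirror
    echo "gmirror insert $MIRROR /dev/${DISK}s1"
}

plan
```

Printing the plan first makes it easy to check device names before anything
destructive is typed.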

Cheers,
ch 

-- 
Christian Hiris [EMAIL PROTECTED] | OpenPGP KeyID 0x3BCA53BE 
OpenPGP-Key at hkp://wwwkeys.eu.pgp.net and http://pgp.mit.edu


Re: gmirror: replacing failed disks

2005-01-17 Thread Doug Poland
On Sun, Jan 16, 2005 at 11:14:48PM +0100, Christian Hiris wrote:
 On Sunday 16 January 2005 21:14, Doug Poland wrote:
 
  (My system has a provider gm0s1 with ad4 and ad6 as consumers, so
  I'll use those device names)
 
  Simulate ad4 failing:
 
  pull the drive
  put the drive back in, reboot if necessary to detect drive
 
 After you put the drive in, you can try to attach or reinit the
 controller channel it is connected to with the command
 'atacontrol'.
 
did that, atacontrol reinit 2, and the drive shows up, yee haw!

 If you put the same drive in, and you haven't zeroed the boot blocks
 and the slice table (on ad4), geom will recognize that the missing disk
 has been re-attached and will start rebuilding. 
 
That's not happening, gmirror says:

GEOM_MIRROR: Component ad4 (device gm0s1) broken, skipping.
GEOM_MIRROR: Cannot add disk ad4 to gm0s1 (error=22).

 If you want to simulate insertion of a blank disk, run the 'gmirror
 forget' 

I'll try this after I've got the above working :)

Hey, thanks for all your help so far, I really appreciate it.

-- 
Regards,
Doug


gmirror: replacing failed disks

2005-01-16 Thread Doug Poland
Hello,

I've got a bootable gmirror running on identical SATA drives on
5.3-STABLE.  The technique I've used to build the gmirror can be found
on http://people.freebsd.org/~rse/gmirror, under the heading GEOM
mirror Approach 2: Single Slice, Preferred, More Flexible.  Now I'd
like to experiment with replacing a failed drive.  This particular box
has hot-swappable drives, so all I need to do is pull a drive out while
the box is running.  

The man page states:

   One disk failed. Replace it with a brand new one:

   gmirror forget data
   gmirror insert data da1

(My system has a provider gm0s1 with ad4 and ad6 as consumers, so I'll
use those device names)

Simulate ad4 failing:

pull the drive
put the drive back in, reboot if necessary to detect drive

# gmirror forget gm0s1
# dd if=/dev/zero of=/dev/ad4 bs=512 count=79

# size=`fdisk ad6 | grep ', size ' | head -1 | sed -e 's;^.*size 
\([0-9]*\).*$;\1;'` (echo p 1 165 63 $size; echo a 1) | fdisk -v -B -f- -i 
/dev/ad4
  OR 
# fdisk -v -B -I /dev/ad4

# gmirror insert gm0s1 /dev/ad4s1

(Now wait two hours for the drives to synchronize)

That should work, yes?  How does gmirror know about /dev/ad4s1 if that
drive was previously unformatted or brand new?
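
The size-extraction one-liner above is easier to follow when the assignment
and the fdisk pipeline are written as two separate commands. The sketch below
runs the same sed expression against a made-up line of fdisk-style output
(the sample line and the helper name extract_size are assumptions) and prints
the config that would be piped into fdisk instead of applying it.

```shell
#!/bin/sh
# Sketch: pull the slice size out of fdisk-style output with the same
# grep/head/sed chain. The sample line is an assumption about the format.
extract_size() {
    grep ', size ' | head -1 | sed -e 's;^.*size \([0-9]*\).*$;\1;'
}

sample='    start 63, size 156296322 (76316 Meg), flag 80 (active)'
size=$(printf '%s\n' "$sample" | extract_size)

# First command: the assignment above. Second command: the config lines
# that would be fed to "fdisk -v -B -f- -i /dev/ad4", printed here only.
printf 'p 1 165 63 %s\na 1\n' "$size"
```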


-- 
Regards,
Doug


Re: gmirror: replacing failed disks

2005-01-16 Thread Christian Hiris

On Sunday 16 January 2005 21:14, Doug Poland wrote:
 Hello,

 I've got a bootable gmirror running on identical SATA drives on
 5.3-STABLE.  The technique I've used to build the gmirror can be found
 on http://people.freebsd.org/~rse/gmirror, under the heading GEOM
 mirror Approach 2: Single Slice, Preferred, More Flexible.  Now I'd
 like to experiment with replacing a failed drive.  This particular box
 has hot-swappable drives, so all I need to do is pull a drive out while
 the box is running.

 The man page states:

One disk failed. Replace it with a brand new one:

gmirror forget data
gmirror insert data da1

 (My system has a provider gm0s1 with ad4 and ad6 as consumers, so I'll
 use those device names)

 Simulate ad4 failing:

 pull the drive
 put the drive back in, reboot if necessary to detect drive

After you put the drive in, you can try to attach or reinit the controller
channel it is connected to with the command 'atacontrol'.

If you put the same drive in, and you haven't zeroed the boot blocks and the
slice table (on ad4), geom will recognize that the missing disk has been
re-attached and will start rebuilding.

If you want to simulate insertion of a blank disk, run the 'gmirror forget'
command before you re-attach disk ad4. Then zero the first few blocks and the
last sector of the old slice, where the gmirror metadata are stored. You can
do this with 'dd if=/dev/zero of=/dev/ad4 bs=512 count=1 skip=n', where
n=number of sectors to be skipped.

In your case it would be better to check where the metadata are stored. They
may be stored at the end of your disk. I suspect this because your gmirror
list output shows ad4 and ad6 as consumers.

 # gmirror forget gm0s1
 # dd if=/dev/zero of=/dev/ad4 bs=512 count=79

You are missing the  operator here.  

 # size=`fdisk ad6 | grep ', size ' | head -1 | sed -e 's;^.*size
 \([0-9]*\).*$;\1;'`  (echo p 1 165 63 $size; echo a 1) | fdisk -v -B  
  ^^
 -f- -i /dev/ad4  
 OR 
 # fdisk -v -B -I /dev/ad4

 # gmirror insert gm0s1 /dev/ad4s1

 (Now wait two hours for the drives synchronize)

 That should work, yes?  How does gmirror know about /dev/ad4s1 if that
 drive was previously unformatted or brand new?

That should work if you create the slice with 'fdisk -v -B -I /dev/ad4'. On
the other hand, it would be very interesting to see whether gmirror really
handles the consumers as they are displayed by your gmirror list command.
I would blank disk ad4 (as I described above) and see what happens when you
issue the command 'gmirror insert gm0s1 /dev/ad4'. Maybe gmirror treats
drives with one slice that covers the whole drive as disks (instead of
slices)? I would give it a try. (If you try this, could you please post or pm
me the 'gmirror list' output? Thank you!)

Cheers,
ch

-- 
Christian Hiris [EMAIL PROTECTED] | OpenPGP KeyID 0x3BCA53BE 
OpenPGP-Key at hkp://wwwkeys.eu.pgp.net and http://pgp.mit.edu