Re: Problems replacing failing drive in ZFS pool

2010-07-21 Thread Joshua Boyd
On Wed, Jul 21, 2010 at 1:57 AM, alan bryan alanbryan1...@yahoo.com wrote:



 --- On Mon, 7/19/10, Dan Langille d...@langille.org wrote:

  From: Dan Langille d...@langille.org
  Subject: Re: Problems replacing failing drive in ZFS pool
  To: Freddie Cash fjwc...@gmail.com
  Cc: freebsd-stable freebsd-stable@freebsd.org
  Date: Monday, July 19, 2010, 7:07 PM
  On 7/19/2010 12:15 PM, Freddie Cash
  wrote:
   On Mon, Jul 19, 2010 at 8:56 AM, Garrett Mooregarrettmo...@gmail.com
 
  wrote:
   So you think it's because when I switch from the
  old disk to the new disk,
   ZFS doesn't realize the disk has changed, and
  thinks the data is just
   corrupt now? Even if that happens, shouldn't the
  pool still be available,
   since it's RAIDZ1 and only one disk has gone
  away?
  
   I think it's because you pull the old drive, boot with
  the new drive,
   the controller re-numbers all the devices (ie da3 is
  now da2, da2 is
   now da1, da1 is now da0, da0 is now da6, etc), and ZFS
  thinks that all
   the drives have changed, thus corrupting the
  pool.  I've had this
   happen on our storage servers a couple of times before
  I started using
   glabel(8) on all our drives (dead drive on RAID
  controller, remove
   drive, reboot for whatever reason, all device nodes
  are renumbered,
   everything goes kablooey).
 
  Can you explain a bit about how you use glabel(8) in
  conjunction with ZFS?  If I can retrofit this into an
  existing ZFS array to make things easier in the future...
 
  8.0-STABLE #0: Fri Mar  5 00:46:11 EST 2010
 
  ]# zpool status
    pool: storage
   state: ONLINE
   scrub: none requested
  config:
 
          NAME        STATE     READ WRITE CKSUM
          storage     ONLINE       0     0     0
            raidz1    ONLINE       0     0     0
              ad8     ONLINE       0     0     0
              ad10    ONLINE       0     0     0
              ad12    ONLINE       0     0     0
              ad14    ONLINE       0     0     0
              ad16    ONLINE       0     0     0
 
   Of course, always have good backups.  ;)
 
  In my case, this ZFS array is the backup.  ;)
 
  But I'm setting up a tape library, real soon now
 
  -- Dan Langille - http://langille.org/
 
 

 Dan,

 Here's how to do it after the fact:


 http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2009-07/msg00623.html

 --Alan Bryan


[r...@foghornleghorn ~]# glabel label disk01 /dev/da0
glabel: Can't store metadata on /dev/da0: Operation not permitted.

Hrmph.












-- 
Joshua Boyd
JBipNet

E-mail: boy...@jbip.net

http://www.jbip.net
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-21 Thread Joshua Boyd
On Wed, Jul 21, 2010 at 2:09 AM, Joshua Boyd boy...@jbip.net wrote:

 On Wed, Jul 21, 2010 at 1:57 AM, alan bryan alanbryan1...@yahoo.comwrote:



 --- On Mon, 7/19/10, Dan Langille d...@langille.org wrote:

  From: Dan Langille d...@langille.org
  Subject: Re: Problems replacing failing drive in ZFS pool
  To: Freddie Cash fjwc...@gmail.com
  Cc: freebsd-stable freebsd-stable@freebsd.org
  Date: Monday, July 19, 2010, 7:07 PM
  On 7/19/2010 12:15 PM, Freddie Cash
  wrote:
   On Mon, Jul 19, 2010 at 8:56 AM, Garrett Mooregarrettmo...@gmail.com
 
  wrote:
   So you think it's because when I switch from the
  old disk to the new disk,
   ZFS doesn't realize the disk has changed, and
  thinks the data is just
   corrupt now? Even if that happens, shouldn't the
  pool still be available,
   since it's RAIDZ1 and only one disk has gone
  away?
  
   I think it's because you pull the old drive, boot with
  the new drive,
   the controller re-numbers all the devices (ie da3 is
  now da2, da2 is
   now da1, da1 is now da0, da0 is now da6, etc), and ZFS
  thinks that all
   the drives have changed, thus corrupting the
  pool.  I've had this
   happen on our storage servers a couple of times before
  I started using
   glabel(8) on all our drives (dead drive on RAID
  controller, remove
   drive, reboot for whatever reason, all device nodes
  are renumbered,
   everything goes kablooey).
 
  Can you explain a bit about how you use glabel(8) in
  conjunction with ZFS?  If I can retrofit this into an
  existing ZFS array to make things easier in the future...
 
  8.0-STABLE #0: Fri Mar  5 00:46:11 EST 2010
 
  ]# zpool status
    pool: storage
   state: ONLINE
   scrub: none requested
  config:
 
          NAME        STATE     READ WRITE CKSUM
          storage     ONLINE       0     0     0
            raidz1    ONLINE       0     0     0
              ad8     ONLINE       0     0     0
              ad10    ONLINE       0     0     0
              ad12    ONLINE       0     0     0
              ad14    ONLINE       0     0     0
              ad16    ONLINE       0     0     0
 
   Of course, always have good backups.  ;)
 
  In my case, this ZFS array is the backup.  ;)
 
  But I'm setting up a tape library, real soon now
 
  -- Dan Langille - http://langille.org/
 

 Dan,

 Here's how to do it after the fact:


 http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2009-07/msg00623.html

 --Alan Bryan


 [r...@foghornleghorn ~]# glabel label disk01 /dev/da0
 glabel: Can't store metadata on /dev/da0: Operation not permitted.

 Hrmph.


Never mind; setting sysctl kern.geom.debugflags=16 solves that problem, but
then you get this:

[r...@foghornleghorn ~]# zpool replace tank da0 label/disk01
cannot open 'label/disk01': no such GEOM provider
must be a full path or shorthand device name















 --
 Joshua Boyd
 JBipNet

 E-mail: boy...@jbip.net

 http://www.jbip.net




-- 
Joshua Boyd
JBipNet

E-mail: boy...@jbip.net

http://www.jbip.net
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-21 Thread jhell
On 07/21/2010 02:14, Joshua Boyd wrote:
 [r...@foghornleghorn ~]# zpool replace tank da0 label/disk01
 cannot open 'label/disk01': no such GEOM provider
 must be a full path or shorthand device name

Of course you can't. You have labeled a disk that is already in use, so
the label will never appear in /dev/label/*.

If you tried to resilver onto the same disk that is already in use, I
would expect the result (if it could be done at all) to be inconsistent
data and write errors all over the place.
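
For what it's worth, here is a rough sketch of the retrofit sequence I'd
expect to work, done one disk at a time with the disk taken out of use
first (pool, device, and label names below are placeholders, and glabel
steals the provider's last sector, so whether the replace succeeds
depends on how much slack ZFS left on that vdev):

zpool offline tank da0                 # stop using the disk first
glabel label disk01 da0                # now the metadata write is allowed
glabel status                          # label/disk01 should show up
zpool replace tank da0 label/disk01    # resilver onto the labeled provider
zpool status tank                      # wait for the resilver, then repeat for the next disk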


Regards,

-- 

 jhell,v

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-21 Thread Charles Sprickman

On Tue, 20 Jul 2010, alan bryan wrote:




--- On Mon, 7/19/10, Dan Langille d...@langille.org wrote:


From: Dan Langille d...@langille.org
Subject: Re: Problems replacing failing drive in ZFS pool
To: Freddie Cash fjwc...@gmail.com
Cc: freebsd-stable freebsd-stable@freebsd.org
Date: Monday, July 19, 2010, 7:07 PM
On 7/19/2010 12:15 PM, Freddie Cash
wrote:
 On Mon, Jul 19, 2010 at 8:56 AM, Garrett Mooregarrettmo...@gmail.com 
wrote:

 So you think it's because when I switch from the
old disk to the new disk,
 ZFS doesn't realize the disk has changed, and
thinks the data is just
 corrupt now? Even if that happens, shouldn't the
pool still be available,
 since it's RAIDZ1 and only one disk has gone
away?
 
 I think it's because you pull the old drive, boot with

the new drive,
 the controller re-numbers all the devices (ie da3 is
now da2, da2 is
 now da1, da1 is now da0, da0 is now da6, etc), and ZFS
thinks that all
 the drives have changed, thus corrupting the
pool.  I've had this
 happen on our storage servers a couple of times before
I started using
 glabel(8) on all our drives (dead drive on RAID
controller, remove
 drive, reboot for whatever reason, all device nodes
are renumbered,
 everything goes kablooey).

Can you explain a bit about how you use glabel(8) in
conjunction with ZFS?  If I can retrofit this into an
existing ZFS array to make things easier in the future...

8.0-STABLE #0: Fri Mar  5 00:46:11 EST 2010

]# zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
            ad14    ONLINE       0     0     0
            ad16    ONLINE       0     0     0

 Of course, always have good backups.  ;)

In my case, this ZFS array is the backup.  ;)

But I'm setting up a tape library, real soon now

-- Dan Langille - http://langille.org/



Dan,

Here's how to do it after the fact:

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2009-07/msg00623.html


Two things:

-What's the preferred labelling method for disks that will be used with 
zfs these days?  geom_label or gpt labels?  I've been using the latter and 
I find them a little simpler.


-I think that if you already are using gpt partitioning, you can add a 
gpt label after the fact (ie: gpart -i index# -l your_label adaX).  gpart 
list will give you a list of index numbers.


Charles


--Alan Bryan
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org

Re: Problems replacing failing drive in ZFS pool

2010-07-21 Thread Charles Sprickman

On Wed, 21 Jul 2010, Charles Sprickman wrote:


On Tue, 20 Jul 2010, alan bryan wrote:




--- On Mon, 7/19/10, Dan Langille d...@langille.org wrote:


From: Dan Langille d...@langille.org
Subject: Re: Problems replacing failing drive in ZFS pool
To: Freddie Cash fjwc...@gmail.com
Cc: freebsd-stable freebsd-stable@freebsd.org
Date: Monday, July 19, 2010, 7:07 PM
On 7/19/2010 12:15 PM, Freddie Cash
wrote:
 On Mon, Jul 19, 2010 at 8:56 AM, Garrett Mooregarrettmo...@gmail.com 
wrote:

 So you think it's because when I switch from the
old disk to the new disk,
 ZFS doesn't realize the disk has changed, and
thinks the data is just
 corrupt now? Even if that happens, shouldn't the
pool still be available,
 since it's RAIDZ1 and only one disk has gone
away?
  I think it's because you pull the old drive, boot with
the new drive,
 the controller re-numbers all the devices (ie da3 is
now da2, da2 is
 now da1, da1 is now da0, da0 is now da6, etc), and ZFS
thinks that all
 the drives have changed, thus corrupting the
pool.  I've had this
 happen on our storage servers a couple of times before
I started using
 glabel(8) on all our drives (dead drive on RAID
controller, remove
 drive, reboot for whatever reason, all device nodes
are renumbered,
 everything goes kablooey).

Can you explain a bit about how you use glabel(8) in
conjunction with ZFS?  If I can retrofit this into an
existing ZFS array to make things easier in the future...

8.0-STABLE #0: Fri Mar  5 00:46:11 EST 2010

]# zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
            ad14    ONLINE       0     0     0
            ad16    ONLINE       0     0     0

 Of course, always have good backups.  ;)

In my case, this ZFS array is the backup.  ;)

But I'm setting up a tape library, real soon now

-- Dan Langille - http://langille.org/



Dan,

Here's how to do it after the fact:

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2009-07/msg00623.html


Two things:

-What's the preferred labelling method for disks that will be used with zfs 
these days?  geom_label or gpt labels?  I've been using the latter and I find 
them a little simpler.


-I think that if you already are using gpt partitioning, you can add a gpt 
label after the fact (ie: gpart -i index# -l your_label adaX).  gpart list 
will give you a list of index numbers.


Oops.

That should be gpart modify -i index# -l your_label adaX.
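
As a concrete sketch (the partition index and label name are made up
here, assuming the ZFS data lives in partition 1 of ada3):

gpart list ada3                    # note the index of the partition holding the ZFS data
gpart modify -i 1 -l disk3 ada3    # attach the GPT label to that partition
gpart show -l ada3                 # the labeled provider should appear as /dev/gpt/disk3

An export/import of the pool is presumably still needed before zpool
status starts showing the gpt/disk3 name instead of ada3p1.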


Charles


--Alan Bryan
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org

Re: Problems replacing failing drive in ZFS pool

2010-07-21 Thread Dan Langille

On 7/21/2010 2:54 AM, Charles Sprickman wrote:

On Wed, 21 Jul 2010, Charles Sprickman wrote:


On Tue, 20 Jul 2010, alan bryan wrote:




--- On Mon, 7/19/10, Dan Langille d...@langille.org wrote:


From: Dan Langille d...@langille.org
Subject: Re: Problems replacing failing drive in ZFS pool
To: Freddie Cash fjwc...@gmail.com
Cc: freebsd-stable freebsd-stable@freebsd.org
Date: Monday, July 19, 2010, 7:07 PM
On 7/19/2010 12:15 PM, Freddie Cash
wrote:
 On Mon, Jul 19, 2010 at 8:56 AM, Garrett
Mooregarrettmo...@gmail.com wrote:
 So you think it's because when I switch from the
old disk to the new disk,
 ZFS doesn't realize the disk has changed, and
thinks the data is just
 corrupt now? Even if that happens, shouldn't the
pool still be available,
 since it's RAIDZ1 and only one disk has gone
away?
  I think it's because you pull the old drive, boot with
the new drive,
 the controller re-numbers all the devices (ie da3 is
now da2, da2 is
 now da1, da1 is now da0, da0 is now da6, etc), and ZFS
thinks that all
 the drives have changed, thus corrupting the
pool.  I've had this
 happen on our storage servers a couple of times before
I started using
 glabel(8) on all our drives (dead drive on RAID
controller, remove
 drive, reboot for whatever reason, all device nodes
are renumbered,
 everything goes kablooey).

Can you explain a bit about how you use glabel(8) in
conjunction with ZFS?  If I can retrofit this into an
existing ZFS array to make things easier in the future...

8.0-STABLE #0: Fri Mar  5 00:46:11 EST 2010

]# zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
            ad14    ONLINE       0     0     0
            ad16    ONLINE       0     0     0

 Of course, always have good backups.  ;)

In my case, this ZFS array is the backup.  ;)

But I'm setting up a tape library, real soon now

-- Dan Langille - http://langille.org/



Dan,

Here's how to do it after the fact:

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2009-07/msg00623.html



Two things:

-What's the preferred labelling method for disks that will be used
with zfs these days? geom_label or gpt labels? I've been using the
latter and I find them a little simpler.

-I think that if you already are using gpt partitioning, you can add a
gpt label after the fact (ie: gpart -i index# -l your_label adaX).
gpart list will give you a list of index numbers.


Oops.

That should be gpart modify -i index# -l your_label adaX.


I'm not using gpt partitioning.  I think I'd like to try that.  To do 
just that, I've ordered two more HDD.  They'll be arriving today.


--
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-21 Thread Freddie Cash
On Tue, Jul 20, 2010 at 11:43 PM, Charles Sprickman sp...@bway.net wrote:
 Two things:

 -What's the preferred labelling method for disks that will be used with zfs
 these days?  geom_label or gpt labels?  I've been using the latter and I
 find them a little simpler.

If the disks will only be used in FreeBSD systems, and you are using
the entire disk for ZFS (no partitioning), then glabel works well, and
is the easiest to use.

If you want to be able to move the disks between FreeBSD, OpenSolaris,
Solaris, ZFS-FUSE on Linux, etc, then you need to use GPT labels.
glabel is not portable.  I have not used gpart, so don't know if you
can label the disk or just partitions on the disk.
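
A minimal sketch of the whole-disk glabel approach when building a pool
from scratch (device and label names are placeholders):

glabel label disk0 da0
glabel label disk1 da1
glabel label disk2 da2
zpool create tank raidz label/disk0 label/disk1 label/disk2

After that, zpool status reports the label/* names, so it no longer
matters how the controller numbers the da devices.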

-- 
Freddie Cash
fjwc...@gmail.com
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-21 Thread Dan Langille

On 7/19/2010 10:50 PM, Adam Vande More wrote:

On Mon, Jul 19, 2010 at 9:07 PM, Dan Langilled...@langille.org  wrote:


I think it's because you pull the old drive, boot with the new drive,

the controller re-numbers all the devices (ie da3 is now da2, da2 is
now da1, da1 is now da0, da0 is now da6, etc), and ZFS thinks that all
the drives have changed, thus corrupting the pool.  I've had this
happen on our storage servers a couple of times before I started using
glabel(8) on all our drives (dead drive on RAID controller, remove
drive, reboot for whatever reason, all device nodes are renumbered,
everything goes kablooey).




Can you explain a bit about how you use glabel(8) in conjunction with ZFS?
If I can retrofit this into an existing ZFS array to make things easier in the
future...



If you've used whole disks in ZFS, you can't retrofit it if by retrofit you
mean an almost painless method of resolving this.  GEOM setup stuff
generally should happen BEFORE the file system is on it.

You would create your partition(s) slightly smaller than the disk, label it,
then use the resulting device as your zfs device when creating the pool.  If
you have an existing full disk install, that means restoring the data after
you've done those steps.  It works just as well with MBR style partitioning,
there's nothing saying you have to use GPT.  GPT is just better though in
terms of ease of use IMO among other things.


FYI, this is exactly what I'm going to do.  I have obtained additional HDDs
to serve as temporary storage.  I will also use them for practicing the
commands before destroying the original array.  I'll post my plan to the
list for review.


--
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-20 Thread Andriy Gapon
on 20/07/2010 01:04 Garrett Moore said the following:
 Well, hotswapping worked, but now I have a totally different problem. Just
 for reference:
 # zpool offline tank da3
 # camcontrol stop da3
 swap drive
 # camcontrol rescan all
 'da3 lost device, removing device entry'
 # camcontrol rescan all
 'da3 at mpt0 ...', so new drive was found! yay
 # zpool replace tank da3
 *cannot replace da3 with da3: device is too small*
 
 So I looked at the smartctl output for the old and new drive. Old:
  Device Model:     WDC WD15EADS-00P8B0
  Serial Number:    WD-WMAVU0087717
  Firmware Version: 01.00A01
  User Capacity:    1,500,301,910,016 bytes
 
  New:
  Device Model:     WDC WD15EADS-00R6B0
  Serial Number:    WD-WCAVY4770428
  Firmware Version: 01.00A01
  User Capacity:    1,500,300,828,160 bytes
 
 God damnit, Western Digital. What can I do now? It's such a small
 difference, is there a way I can work around this? My other replacement
 drive is the 00R6B0 drive model as well, with the slightly smaller
 capacity.

I second what others have said - crap.
But there could be some hope, not sure.
Can you check what is the actual size used by the pool on the disk?
It should be somewhere in zdb -C output (asize?).
If I remember correctly, that actual size should be a multiple of some rather
large power of two, so it could be that it is smaller than 'User Capacity' of 
both
old and new drives.
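
Something along these lines should show it (field names can vary a bit
between ZFS versions, so treat this as a sketch):

zdb -C tank | grep -E 'ashift|asize'

Compare the asize reported for the individual disk vdevs against the new
drive's capacity from smartctl.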

-- 
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-20 Thread Pawel Tyll
Hi guys,

 I second what others have said - crap.
 But there could be some hope, not sure.
 Can you check what is the actual size used by the pool on the disk?
 It should be somewhere in zdb -C output (asize?).
 If I remember correctly, that actual size should be a multiple of some rather
 large power of two, so it could be that it is smaller than 'User Capacity' of 
 both
 old and new drives.
Well, I see some possibilities for a creative solution here, using some
SSD (or a USB stick, or mdconfig as an act of desperation) and gconcat, but
it's asking for trouble and should probably be considered a temporary
hack.

What I personally would do is get a 2TB drive and use it instead, with
gpt and -l for the label, and replace it as gpt/something. Using 100 or so
MB less than the whole disk is also a good idea, as you can see ;)
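
For the record, the desperation variant would look roughly like this
(completely untested sketch; with a malloc-backed md the padding
evaporates on reboot, so this really is only a stopgap):

mdconfig -a -t malloc -s 8m -u 9      # tiny memory disk to make up the missing bytes
gconcat label -v pad3 da3 md9         # glue the new drive and the md together
zpool replace tank da3 concat/pad3    # replace using the concatenated provider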

Cheers and good luck.


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-20 Thread alan bryan


--- On Mon, 7/19/10, Dan Langille d...@langille.org wrote:

 From: Dan Langille d...@langille.org
 Subject: Re: Problems replacing failing drive in ZFS pool
 To: Freddie Cash fjwc...@gmail.com
 Cc: freebsd-stable freebsd-stable@freebsd.org
 Date: Monday, July 19, 2010, 7:07 PM
 On 7/19/2010 12:15 PM, Freddie Cash
 wrote:
  On Mon, Jul 19, 2010 at 8:56 AM, Garrett Mooregarrettmo...@gmail.com 
 wrote:
  So you think it's because when I switch from the
 old disk to the new disk,
  ZFS doesn't realize the disk has changed, and
 thinks the data is just
  corrupt now? Even if that happens, shouldn't the
 pool still be available,
  since it's RAIDZ1 and only one disk has gone
 away?
  
  I think it's because you pull the old drive, boot with
 the new drive,
  the controller re-numbers all the devices (ie da3 is
 now da2, da2 is
  now da1, da1 is now da0, da0 is now da6, etc), and ZFS
 thinks that all
  the drives have changed, thus corrupting the
 pool.  I've had this
  happen on our storage servers a couple of times before
 I started using
  glabel(8) on all our drives (dead drive on RAID
 controller, remove
  drive, reboot for whatever reason, all device nodes
 are renumbered,
  everything goes kablooey).
 
  Can you explain a bit about how you use glabel(8) in
  conjunction with ZFS?  If I can retrofit this into an
  existing ZFS array to make things easier in the future...
 
  8.0-STABLE #0: Fri Mar  5 00:46:11 EST 2010
 
  ]# zpool status
    pool: storage
   state: ONLINE
   scrub: none requested
  config:
 
          NAME        STATE     READ WRITE CKSUM
          storage     ONLINE       0     0     0
            raidz1    ONLINE       0     0     0
              ad8     ONLINE       0     0     0
              ad10    ONLINE       0     0     0
              ad12    ONLINE       0     0     0
              ad14    ONLINE       0     0     0
              ad16    ONLINE       0     0     0
 
  Of course, always have good backups.  ;)
 
 In my case, this ZFS array is the backup.  ;)
 
 But I'm setting up a tape library, real soon now
 
 -- Dan Langille - http://langille.org/
 

Dan,

Here's how to do it after the fact:

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2009-07/msg00623.html

--Alan Bryan






___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-19 Thread Jeremy Chadwick
On Mon, Jul 19, 2010 at 11:21:38AM -0400, Garrett Moore wrote:
 I have an 8-drive ZFS array consisting of WD15EADS drives. One of my disks
 has started to fail, so I got a replacement disk. I have replaced a disk
 before by:
 
  zpool offline tank /dev/da5
 shutting down, swapping from old disk to new disk
 booting
  zpool replace tank /dev/da5
 
 This worked fine.
 
 This time the failing disk was da3, and I tried the same thing:
  zpool offline tank /dev/da3
 zpool status showed da3 offline.
 shut down, swapped old disk to new disk.
 
 When I booted again, I got:
 
  NAME        STATE     READ WRITE CKSUM
  tank        UNAVAIL      0     0     0  insufficient replicas
    raidz1    UNAVAIL      0     0     0  corrupted data
      da0     ONLINE       0     0     0
      da1     ONLINE       0     0     0
      da2     ONLINE       0     0     0
      da3     ONLINE       0     0     0
      da4     ONLINE       0     0     0
      da5     ONLINE       0     0     0
      da6     ONLINE       0     0     0
      da7     ONLINE       0     0     0
 
 I switched back to the old disk and booted again and then I could access my
 data again, and da3 still showed as offline. I tried 'zpool online tank
 /dev/da3' and after a few seconds resilvering completed and all 8 drives are
 back online again, but with the 'dying' disk as da3 still.
 
 I tried shutting down WITHOUT first offlining /dev/da3, and swapping the
 disks, and when I booted I again got 'insufficient replicas'.
 
 Why am I getting this error, and how come it worked ok the last time I
 replaced a disk? And more importantly, how do I switch to my new replacement
 disk without losing data?

Can you please provide uname -a output?  Thanks.

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-19 Thread Garrett Moore
Oops - shouldn't have forgotten that, sorry.

FreeBSD leviathan 8.0-RELEASE FreeBSD 8.0-RELEASE #0: Sat Nov 21
15:02:08 UTC 2009
r...@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64



On Mon, Jul 19, 2010 at 11:24 AM, Jeremy Chadwick
free...@jdc.parodius.comwrote:

 On Mon, Jul 19, 2010 at 11:21:38AM -0400, Garrett Moore wrote:
  I have an 8-drive ZFS array consisting of WD15EADS drives. One of my
 disks
  has started to fail, so I got a replacement disk. I have replaced a disk
  before by:
 
   zpool offline tank /dev/da5
  shutting down, swapping from old disk to new disk
  booting
   zpool replace tank /dev/da5
 
  This worked fine.
 
  This time the failing disk was da3, and I tried the same thing:
   zpool offline tank /dev/da3
  zpool status showed da3 offline.
  shut down, swapped old disk to new disk.
 
  When I booted again, I got:
 
  NAME        STATE     READ WRITE CKSUM
  tank        UNAVAIL      0     0     0  insufficient replicas
    raidz1    UNAVAIL      0     0     0  corrupted data
      da0     ONLINE       0     0     0
      da1     ONLINE       0     0     0
      da2     ONLINE       0     0     0
      da3     ONLINE       0     0     0
      da4     ONLINE       0     0     0
      da5     ONLINE       0     0     0
      da6     ONLINE       0     0     0
      da7     ONLINE       0     0     0
 
  I switched back to the old disk and booted again and then I could access
 my
  data again, and da3 still showed as offline. I tried 'zpool online tank
  /dev/da3' and after a few seconds resilvering completed and all 8 drives
 are
  back online again, but with the 'dying' disk as da3 still.
 
  I tried shutting down WITHOUT first offlining /dev/da3, and swapping the
  disks, and when I booted I again got 'insufficient replicas'.
 
  Why am I getting this error, and how come it worked ok the last time I
  replaced a disk? And more importantly, how do I switch to my new
 replacement
  disk without losing data?

 Can you please provide uname -a output?  Thanks.

 --
 | Jeremy Chadwick   j...@parodius.com |
 | Parodius Networking   http://www.parodius.com/ |
 | UNIX Systems Administrator  Mountain View, CA, USA |
 | Making life hard for others since 1977.  PGP: 4BD6C0CB |


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-19 Thread Freddie Cash
On Mon, Jul 19, 2010 at 8:21 AM, Garrett Moore garrettmo...@gmail.com wrote:
 I have an 8-drive ZFS array consisting of WD15EADS drives. One of my disks
 has started to fail, so I got a replacement disk. I have replaced a disk
 before by:

  zpool offline tank /dev/da5
 shutting down, swapping from old disk to new disk
 booting
  zpool replace tank /dev/da5

 This worked fine.

 This time the failing disk was da3, and I tried the same thing:
  zpool offline tank /dev/da3
 zpool status showed da3 offline.
 shut down, swapped old disk to new disk.

For some reason, ZFS is getting confused by the device names, possibly
due to the controller renumbering device nodes?

Try the following:
  zpool offline tank /dev/da3
  zpool status tank             (to make sure it offlined the correct drive)
  zpool export tank             (might have to do this from single-user mode)
  reboot
  zpool import tank             (this forces ZFS to re-taste each drive to read the metadata)
  zpool replace tank /dev/da3   (this should force it to use the correct drive)

Note:  if you have / on ZFS, the above may not be doable.

-- 
Freddie Cash
fjwc...@gmail.com
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-19 Thread Garrett Moore
So you think it's because when I switch from the old disk to the new disk,
ZFS doesn't realize the disk has changed, and thinks the data is just
corrupt now? Even if that happens, shouldn't the pool still be available,
since it's RAIDZ1 and only one disk has gone away?

I don't have / on ZFS; I'm only using it as a 'data' partition, so I should
be able to try your suggestion. My only concern: is there any risk of
trashing my pool if I try your instructions? Everything I've done so far,
even when told insufficient replicas / corrupt data, has not cost me any
data as long as I switch back to the original (dying) drive. If I mix in
export/import statements which might 'touch' the pool, is there a chance it
will choke and trash my data?



On Mon, Jul 19, 2010 at 11:45 AM, Freddie Cash fjwc...@gmail.com wrote:

 On Mon, Jul 19, 2010 at 8:21 AM, Garrett Moore garrettmo...@gmail.com
 wrote:
  I have an 8-drive ZFS array consisting of WD15EADS drives. One of my
 disks
  has started to fail, so I got a replacement disk. I have replaced a disk
  before by:
 
   zpool offline tank /dev/da5
  shutting down, swapping from old disk to new disk
  booting
   zpool replace tank /dev/da5
 
  This worked fine.
 
  This time the failing disk was da3, and I tried the same thing:
   zpool offline tank /dev/da3
  zpool status showed da3 offline.
  shut down, swapped old disk to new disk.

 For some reason, ZFS is getting confused by the device names, possibly
 due to the controller renumbering device nodes?

 Try the following:
   zpool offline tank /dev/da3
   zpool status tank             (to make sure it offlined the correct drive)
   zpool export tank             (might have to do this from single-user mode)
   reboot
   zpool import tank             (this forces ZFS to re-taste each drive to read the metadata)
   zpool replace tank /dev/da3   (this should force it to use the correct drive)

 Note:  if you have / on ZFS, the above may not be doable.

 --
 Freddie Cash
 fjwc...@gmail.com

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-19 Thread Freddie Cash
On Mon, Jul 19, 2010 at 8:56 AM, Garrett Moore garrettmo...@gmail.com wrote:
 So you think it's because when I switch from the old disk to the new disk,
 ZFS doesn't realize the disk has changed, and thinks the data is just
 corrupt now? Even if that happens, shouldn't the pool still be available,
 since it's RAIDZ1 and only one disk has gone away?

I think it's because you pull the old drive, boot with the new drive,
the controller re-numbers all the devices (ie da3 is now da2, da2 is
now da1, da1 is now da0, da0 is now da6, etc), and ZFS thinks that all
the drives have changed, thus corrupting the pool.  I've had this
happen on our storage servers a couple of times before I started using
glabel(8) on all our drives (dead drive on RAID controller, remove
drive, reboot for whatever reason, all device nodes are renumbered,
everything goes kablooey).

Doing the export and import will force ZFS to re-read the metadata on
the drives (ZFS does its own labelling to say which drives belong
to which vdevs), and to pick things up correctly using the new device
nodes.

 I don't have / on ZFS; I'm only using it as a 'data' partition, so I should
 be able to try your suggestion. My only concern: is there any risk of
 trashing my pool if I try your instructions? Everything I've done so far,
 even when told insufficient replicas / corrupt data, has not cost me any
 data as long as I switch back to the original (dying) drive. If I mix in
 export/import statements which might 'touch' the pool, is there a chance it
 will choke and trash my data?

Well, there's always a chance things explode.  :)  But an
export/import is safe so long as all drives are connected at the time.
 I've recovered corrupted pools by doing the above.  (I've now
switched to labelling all my drives to prevent this from happening.)

Of course, always have good backups.  ;)

-- 
Freddie Cash
fjwc...@gmail.com
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-19 Thread Adam Vande More
On Mon, Jul 19, 2010 at 10:56 AM, Garrett Moore garrettmo...@gmail.comwrote:

 So you think it's because when I switch from the old disk to the new disk,
 ZFS doesn't realize the disk has changed, and thinks the data is just
 corrupt now? Even if that happens, shouldn't the pool still be available,
 since it's RAIDZ1 and only one disk has gone away?

 I don't have / on ZFS; I'm only using it as a 'data' partition, so I should
 be able to try your suggestion. My only concern: is there any risk of
 trashing my pool if I try your instructions? Everything I've done so far,
 even when told insufficient replicas / corrupt data, has not cost me any
 data as long as I switch back to the original (dying) drive. If I mix in
 export/import statements which might 'touch' the pool, is there a chance it
 will choke and trash my data?


I'm not sure what's going on in your case, but I have cron'd a zpool scrub
for my pool on a weekly basis to avoid this.  I run a / ZFS mirror, and one
day I could no longer boot and saw the dreaded 'insufficient replicas'.  I
eventually recovered it when the disk started to work again briefly: I did a
snapshot/send offsite, redid the system with a new install and disk, then
restored the data.  The export/import shouldn't hurt; I used that when
booting off an MFSBSD CD and imported the zpool to send from there.  Perhaps
you might want to consider RAIDZ2 with all those disks.

-- 
Adam Vande More
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-19 Thread Garrett Moore
The data on the disks is not irreplaceable, so if I lose the array it isn't
the end of the world, but I would prefer not to lose it, as it would be a
pain to get all of the data again.

Freddie's explanation is reasonable, but any ideas why it didn't happen when
I replaced my first dead drive (da5)? That replacement was completely
painless.

The system is in a Supermicro case with a hotswap backplane:
http://www.supermicro.com/products/chassis/4U/743/SC743TQ-865.cfm
The backplane ports are connected to a Supermicro AOC-USASLP-L8I LSI 1068E
8-PORT RAID 0/1/10 Uio SATA/SAS Controller.


By the way, Freddie, in your instructions in the 'reboot' step I assume that
is when I will be switching the physical drives, correct?



On Mon, Jul 19, 2010 at 12:18 PM, Adam Vande More amvandem...@gmail.comwrote:

 On Mon, Jul 19, 2010 at 10:56 AM, Garrett Moore garrettmo...@gmail.comwrote:

 So you think it's because when I switch from the old disk to the new disk,
 ZFS doesn't realize the disk has changed, and thinks the data is just
 corrupt now? Even if that happens, shouldn't the pool still be available,
 since it's RAIDZ1 and only one disk has gone away?

 I don't have / on ZFS; I'm only using it as a 'data' partition, so I
 should
 be able to try your suggestion. My only concern: is there any risk of
 trashing my pool if I try your instructions? Everything I've done so far,
 even when told insufficient replicas / corrupt data, has not cost me any
 data as long as I switch back to the original (dying) drive. If I mix in
 export/import statements which might 'touch' the pool, is there a chance
 it
 will choke and trash my data?


 I'm not sure what's going on in your case, but I have cron'd a zpool scrub
 for my pool on weekly basis to avoid this.  I run a / zfs mirror and one day
 I could no longer boot and saw the dread 'insufficient replicas'.  I
 eventually got it when disk started to work again briefly then did a
 snapshot/send offsite, redid system with new install  disk then restored
 data.  The export/import shouldn't hurt, I used that when booting off an
 MFSBSD cd and imported the zpool to send from there.  Perhaps you might want
 to consider RAIDZ2 with all those disks.

 --
 Adam Vande More

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-19 Thread Garrett Moore
I forgot to ask in the last email, is there a way to convert from Z1 to Z2
without losing data? I actually have far more storage than I need so I'd
consider going to Z2.


On Mon, Jul 19, 2010 at 12:18 PM, Adam Vande More amvandem...@gmail.comwrote:

 On Mon, Jul 19, 2010 at 10:56 AM, Garrett Moore garrettmo...@gmail.comwrote:

 So you think it's because when I switch from the old disk to the new disk,
 ZFS doesn't realize the disk has changed, and thinks the data is just
 corrupt now? Even if that happens, shouldn't the pool still be available,
 since it's RAIDZ1 and only one disk has gone away?

 I don't have / on ZFS; I'm only using it as a 'data' partition, so I
 should
 be able to try your suggestion. My only concern: is there any risk of
 trashing my pool if I try your instructions? Everything I've done so far,
 even when told insufficient replicas / corrupt data, has not cost me any
 data as long as I switch back to the original (dying) drive. If I mix in
 export/import statements which might 'touch' the pool, is there a chance
 it
 will choke and trash my data?


 I'm not sure what's going on in your case, but I have cron'd a zpool scrub
 for my pool on weekly basis to avoid this.  I run a / zfs mirror and one day
 I could no longer boot and saw the dread 'insufficient replicas'.  I
 eventually got it when disk started to work again briefly then did a
 snapshot/send offsite, redid system with new install  disk then restored
 data.  The export/import shouldn't hurt, I used that when booting off an
 MFSBSD cd and imported the zpool to send from there.  Perhaps you might want
 to consider RAIDZ2 with all those disks.

 --
 Adam Vande More

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-19 Thread Freddie Cash
On Mon, Jul 19, 2010 at 9:32 AM, Garrett Moore garrettmo...@gmail.com wrote:
 The data on the disks is not irreplaceable so if I lose the array it isn't
 the end of the world but I would prefer not to lose it as it would be a pain
 to get all of the data again.

 Freddie's explanation is reasonable, but any ideas why it didn't happen when
 I replaced my first dead drive (da5)? That replacement was completely
 painless.

 The system is in a Supermicro case with a hotswap backplane:
 http://www.supermicro.com/products/chassis/4U/743/SC743TQ-865.cfm
 The backplane ports are connected to a Supermicro AOC-USASLP-L8I LSI 1068E
 8-PORT RAID 0/1/10 Uio SATA/SAS Controller.

 By the way, Freddie, in your instructions in the 'reboot' step I assume that
 is when I will be switching the physical drives, correct?

Correct.  export, power off, swap drives, power on, import.

If it's a hot-swap backplane, though, have you tried doing it without
the reboot?

zpool offline tank da3
camcontrol something to turn off da3
swap drives
camcontrol rescan (or something like that)
zpool replace tank da3

Read the camcontrol man page to see what the commands are to turn
off the drive, and to rescan the controller for the new drive.  It's
possible the device number may change.

-- 
Freddie Cash
fjwc...@gmail.com
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-19 Thread Freddie Cash
On Mon, Jul 19, 2010 at 9:33 AM, Garrett Moore garrettmo...@gmail.com wrote:
 I forgot to ask in the last email, is there a way to convert from Z1 to Z2
 without losing data? I actually have far more storage than I need so I'd
 consider going to Z2.

No, unfortunately it's not currently possible to change vdev types in
ZFS.  This is waiting on the long-time-in-coming block-pointer
rewrite feature, which has not yet been added to ZFS in OpenSolaris
(and there's very little information on its progress).

The standard process is:
  - configure new server
  - configure pool using new vdev layout
  - zfs send/recv data to new server (rough sketch below)
  - decommission old server

Or:
  - take backups of server
  - destroy pool
  - configure new pool using new vdev layout
  - restore from backups
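
The send/recv step might look roughly like this (pool, snapshot, and host
names are placeholders; check zfs(8) on your version for the exact -R/-F
behaviour):

zfs snapshot -r tank@migrate
zfs send -R tank@migrate | ssh newhost zfs recv -d -F newtank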

-- 
Freddie Cash
fjwc...@gmail.com
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-19 Thread Garrett Moore
I'm nervous to trust the hotswap features and camcontrol to set things up
properly, but I guess I could try it. When I first set the system up before
I put data on the array I tried the hotswap functionality and drives
wouldn't always re-attach when reinserted, even if I fiddled with
camcontrol, but I can't remember exactly what I did then.


On Mon, Jul 19, 2010 at 12:36 PM, Freddie Cash fjwc...@gmail.com wrote:

 On Mon, Jul 19, 2010 at 9:32 AM, Garrett Moore garrettmo...@gmail.com
 wrote:
  The data on the disks is not irreplaceable so if I lose the array it
 isn't
  the end of the world but I would prefer not to lose it as it would be a
 pain
  to get all of the data again.
 
  Freddie's explanation is reasonable, but any ideas why it didn't happen
 when
  I replaced my first dead drive (da5)? That replacement was completely
  painless.
 
  The system is in a Supermicro case with a hotswap backplane:
  http://www.supermicro.com/products/chassis/4U/743/SC743TQ-865.cfm
  The backplane ports are connected to a Supermicro AOC-USASLP-L8I LSI
 1068E
  8-PORT RAID 0/1/10 Uio SATA/SAS Controller.
 
  By the way, Freddie, in your instructions in the 'reboot' step I assume
 that
  is when I will be switching the physical drives, correct?

 Correct.  export, power off, swap drives, power on, import.

 If it's a hot-swap backplane, though, have you tried doing it without
 the reboot?

 zpool offline tank da3
 camcontrol something to turn off da3
 swap drives
 camcontrol rescan (or something like that)
 zpool replace tank da3

 Read the camcontrol man page to see what the commands are to turn
 off the drive, and to rescan the controller for the new drive.  It's
 possible the device number may change.

 --
 Freddie Cash
 fjwc...@gmail.com

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-19 Thread John Hawkes-Reed

On 19/07/2010 17:52, Garrett Moore wrote:

I'm nervous to trust the hotswap features and camcontrol to set things up
properly, but I guess I could try it. When I first set the system up before
I put data on the array I tried the hotswap functionality and drives
wouldn't always re-attach when reinserted, even if I fiddled with
camcontrol, but I can't remember exactly what I did then.


We've a pair of medium-sized ZFS boxes with Supermicro boards (X8DTi, 
IIRC) in hotswap chassis. They've both got one hot-spare drive. Well, I 
say 'hot spare'. I mean 'Ought to be a hot-spare if my shoddy Perl works 
when triggered by devd'. What we've found to work is this:


Drive fails (thus far simulated by pulling the drive from the backplane)
ZFS error reported. Pool in degraded state.
'zpool replace pool da9 da23' (Where da23 is the hot spare and where 
this *should* happen automagically.)

Wait for resilvering.
Go on and swap the failed drive (da9 in this case)
'camcontrol rescan all' (new drive shows up in /var/log/messages)
'zpool replace da9'
Wait while resilvering happens.
Hot-swap drive returns to 'avail' status.

[ ... ]
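
The devd hook is roughly this shape (illustrative only; the script path
and match patterns are placeholders and may need tuning per controller):

notify 10 {
    match "system"    "DEVFS";
    match "subsystem" "CDEV";
    match "type"      "DESTROY";
    match "cdev"      "da[0-9]+";
    action "/usr/local/sbin/zfs-hotspare.pl $cdev";
};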


--
JH-R
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-19 Thread Garrett Moore
Well, hotswapping worked, but now I have a totally different problem. Just
for reference:
# zpool offline tank da3
# camcontrol stop da3
swap drive
# camcontrol rescan all
'da3 lost device, removing device entry'
# camcontrol rescan all
'da3 at mpt0 ...', so new drive was found! yay
# zpool replace tank da3
*cannot replace da3 with da3: device is too small*

So I looked at the smartctl output for the old and new drive. Old:
Device Model:     WDC WD15EADS-00P8B0
Serial Number:    WD-WMAVU0087717
Firmware Version: 01.00A01
User Capacity:    1,500,301,910,016 bytes

New:
Device Model:     WDC WD15EADS-00R6B0
Serial Number:    WD-WCAVY4770428
Firmware Version: 01.00A01
User Capacity:    1,500,300,828,160 bytes

God damnit, Western Digital. What can I do now? It's such a small
difference, is there a way I can work around this? My other replacement
drive is the 00R6B0 drive model as well, with the slightly smaller
capacity.



On Mon, Jul 19, 2010 at 4:09 PM, John Hawkes-Reed hi...@libeljournal.comwrote:

 On 19/07/2010 17:52, Garrett Moore wrote:

 I'm nervous to trust the hotswap features and camcontrol to set things up
 properly, but I guess I could try it. When I first set the system up
 before
 I put data on the array I tried the hotswap functionality and drives
 wouldn't always re-attach when reinserted, even if I fiddled with
 camcontrol, but I can't remember exactly what I did then.


 We've a pair of medium-sized ZFS boxes with Supermicro boards (X8DTi, IIRC)
 in hotswap chassis. They've both got one hot-spare drive. Well, I say 'hot
 spare'. I mean 'Ought to be a hot-spare if my shoddy Perl works when
 triggered by devd'. What we've found to work is this:

 Drive fails (thus far simulated by pulling the drive from the backplane)
 ZFS error reported. Pool in degraded state.
 'zpool replace pool da9 da23' (Where da23 is the hot spare and where this
 *should* happen automagically.)
 Wait for resilvering.
 Go on and swap the failed drive (da9 in this case)
 'camcontrol rescan all' (new drive shows up in /var/log/messages)
 'zpool replace da9'
 Wait while resilvering happens.
 Hot-swap drive returns to 'avail' status.

 [ ... ]


 --
 JH-R


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-19 Thread Freddie Cash
On Mon, Jul 19, 2010 at 3:04 PM, Garrett Moore garrettmo...@gmail.com wrote:
 Well, hotswapping worked, but now I have a totally different problem. Just

Yay.  :)

 for reference:
 # zpool offline tank da3
 # camcontrol stop da3
 swap drive
 # camcontrol rescan all
 'da3 lost device, removing device entry'
 # camcontrol rescan all
 'da3 at mpt0 ...', so new drive was found! yay
 # zpool replace tank da3
 *cannot replace da3 with da3: device is too small*

 So I looked at the smartctl output for the old and new drive. Old:
 Device Model:     WDC WD15EADS-00P8B0
 Serial Number:    WD-WMAVU0087717
 Firmware Version: 01.00A01
 User Capacity:    1,500,301,910,016 bytes

 New:
 Device Model:     WDC WD15EADS-00R6B0
 Serial Number:    WD-WCAVY4770428
 Firmware Version: 01.00A01
 User Capacity:    1,500,300,828,160 bytes

 God damnit, Western Digital. What can I do now? It's such a small
 difference, is there a way I can work around this? My other replacement
 drive is the 00R6B0 drive model as well, with the slightly smaller
 capacity.

Crap!

There's a version of ZFS that works around this (or maybe a version of
OSol?), although I don't recall the version off-hand, and I can't find
it in the list of versions on the OSol website.

You may be stuck until you get a drive with more sectors.  :(

-- 
Freddie Cash
fjwc...@gmail.com
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-19 Thread Adam Vande More
On Mon, Jul 19, 2010 at 5:04 PM, Garrett Moore garrettmo...@gmail.comwrote:

 Well, hotswapping worked, but now I have a totally different problem. Just
 for reference:
 # zpool offline tank da3
 # camcontrol stop da3
 swap drive
 # camcontrol rescan all
 'da3 lost device, removing device entry'
 # camcontrol rescan all
 'da3 at mpt0 ...', so new drive was found! yay
 # zpool replace tank da3
 *cannot replace da3 with da3: device is too small*

 So I looked at the smartctl output for the old and new drive. Old:
 Device Model: WDC WD15EADS-00P8B0
 Serial Number:WD-WMAVU0087717
 Firmware Version: 01.00A01
 User Capacity:1,500,301,910,016 bytes

 New:
 Device Model: WDC WD15EADS-00R6B0
 Serial Number:WD-WCAVY4770428
 Firmware Version: 01.00A01
 User Capacity:1,500,300,828,160 bytes

 God damnit, Western Digital. What can I do now? It's such a small
 difference, is there a way I can work around this?


Uff da, I'm pretty sure the answer is no; you would have had to use the
smallest device first when creating the pool, I think.


 My other replacement
 drive is the 00R6B0 drive model as well, with the slightly smaller
 capacity.




-- 
Adam Vande More
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-19 Thread Garrett Moore
Well, thank you very much Western Digital for your absolutely pathetic RMA
service sending me an inferior drive. I'll call tomorrow and see what can be
done; I'm going to insist on these 00R6B0 drives being sent back, and on being
given a drive of >= 1,500,301,910,016 bytes capacity.

At least now I learned how to hotswap :/


On Mon, Jul 19, 2010 at 6:22 PM, Adam Vande More amvandem...@gmail.comwrote:



 On Mon, Jul 19, 2010 at 5:04 PM, Garrett Moore garrettmo...@gmail.comwrote:

 Well, hotswapping worked, but now I have a totally different problem. Just
 for reference:
 # zpool offline tank da3
 # camcontrol stop da3
 swap drive
 # camcontrol rescan all
 'da3 lost device, removing device entry'
 # camcontrol rescan all
 'da3 at mpt0 ...', so new drive was found! yay
 # zpool replace tank da3
 *cannot replace da3 with da3: device is too small*

 So I looked at the smartctl output for the old and new drive. Old:
 Device Model: WDC WD15EADS-00P8B0
 Serial Number:WD-WMAVU0087717
 Firmware Version: 01.00A01
 User Capacity:1,500,301,910,016 bytes

 New:
 Device Model: WDC WD15EADS-00R6B0
 Serial Number:WD-WCAVY4770428
 Firmware Version: 01.00A01
 User Capacity:1,500,300,828,160 bytes

 God damnit, Western Digital. What can I do now? It's such a small
 difference, is there a way I can work around this?


 Uff da, I'm pretty sure the answer is no, you would have had to use the
 smallest device first when creating the pool I think.


 My other replacement
 drive is the 00R6B0 drive model as well, with the slightly smaller
 capacity.




 --
 Adam Vande More

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-19 Thread Clifton Royston
On Mon, Jul 19, 2010 at 06:28:16PM -0400, Garrett Moore wrote:
 Well thank you very much Western Digital for your absolutely pathetic RMA
 service sending me an inferior drive. I'll call tomorrow and see what can be
 done; I'm going to insist on these 00R6B0 drives being sent back, and being
 given a drive of >= 1,500,301,910,016 bytes capacity.
 
 At least now I learned how to hotswap :/

This is the wrong time to find it out, I know, but:

For geom mirror purposes I have taken to carving out HD partitions
which are exactly a round number of GiB, and at least 100 MiB less than
the claimed capacity.  I then label the partitions using geom label,
and use the partition label specification in setting up the mirror.

The first means that when an unannounced change in drive capacity like
this happens, I can still add the smaller drive into a RAID set of
larger drives painlessly, because the partition sizes will still match
exactly.  (It also means that if I can only find a larger drive, e.g.
if 1TB drives later drop off the market or are more expensive than
larger drives, I can easily add a larger drive into the set by making
its partition size match the rest.)

The second means that if the BIOS/probing sequence does its little
dance and the drives end up getting renumbered, it does not affect
which drive is which part of what mirror in the slightest.

The space sacrificed is trivial compared to the convenience and safety
net.
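
Per disk, the recipe is roughly the following (sizes and names below are
made up for illustration; pick a round GiB count comfortably under the
smallest drive you expect to see):

gpart create -s gpt ada2
gpart add -t freebsd-ufs -s 930G ada2    # round number of GiB, well under the claimed capacity
glabel label m2 ada2p1                   # the mirror references label/m2, not ada2p1
gmirror insert gm0 label/m2              # so renumbering of the ada devices no longer matters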

I think I got both those suggestions on this list, and I would hope
(assume?) that they have equivalents under ZFS. 

  -- Clifton

-- 
Clifton Royston  --  clift...@iandicomputing.com / clift...@lava.net
   President  - I and I Computing * http://www.iandicomputing.com/
 Custom programming, network design, systems and network consulting services
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-19 Thread Daniel O'Connor

On 20/07/2010, at 10:55, Clifton Royston wrote:
 The space sacrificed is trivial compared to the convenience and safety
 net.
 
 I think I got both those suggestions on this list, and I would hope
 (assume?) that they have equivalents under ZFS. 

I partitioned my ZFS disks using GPT so I could reference them using GPT UUIDs.

I also put 4Gb partitions on each one which I glued together using gmirror and 
used it for swap.
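
Roughly (label names and the 4G size are illustrative):

gpart add -t freebsd-swap -s 4G -l swap0 ada0    # one per disk: swap0, swap1, ...
gpart add -t freebsd-zfs -l disk0 ada0           # rest of the disk for ZFS, referenced as gpt/disk0
gmirror label -b prefer swap gpt/swap0           # create the swap mirror...
gmirror insert swap gpt/swap1                    # ...and add the second disk's swap partition
swapon /dev/mirror/swap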

I learnt this from many tales of woe :)

--
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
The nice thing about standards is that there
are so many of them to choose from.
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C






___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-19 Thread Dan Langille

On 7/19/2010 12:15 PM, Freddie Cash wrote:

On Mon, Jul 19, 2010 at 8:56 AM, Garrett Mooregarrettmo...@gmail.com  wrote:

So you think it's because when I switch from the old disk to the new disk,
ZFS doesn't realize the disk has changed, and thinks the data is just
corrupt now? Even if that happens, shouldn't the pool still be available,
since it's RAIDZ1 and only one disk has gone away?


I think it's because you pull the old drive, boot with the new drive,
the controller re-numbers all the devices (ie da3 is now da2, da2 is
now da1, da1 is now da0, da0 is now da6, etc), and ZFS thinks that all
the drives have changed, thus corrupting the pool.  I've had this
happen on our storage servers a couple of times before I started using
glabel(8) on all our drives (dead drive on RAID controller, remove
drive, reboot for whatever reason, all device nodes are renumbered,
everything goes kablooey).


Can you explain a bit about how you use glabel(8) in conjunction with 
ZFS?  If I can retrofit this into an existing ZFS array to make things 
easier in the future...


8.0-STABLE #0: Fri Mar  5 00:46:11 EST 2010

]# zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
            ad14    ONLINE       0     0     0
            ad16    ONLINE       0     0     0


Of course, always have good backups.  ;)


In my case, this ZFS array is the backup.  ;)

But I'm setting up a tape library, real soon now

--
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems replacing failing drive in ZFS pool

2010-07-19 Thread Adam Vande More
On Mon, Jul 19, 2010 at 9:07 PM, Dan Langille d...@langille.org wrote:

 I think it's because you pull the old drive, boot with the new drive,
 the controller re-numbers all the devices (ie da3 is now da2, da2 is
 now da1, da1 is now da0, da0 is now da6, etc), and ZFS thinks that all
 the drives have changed, thus corrupting the pool.  I've had this
 happen on our storage servers a couple of times before I started using
 glabel(8) on all our drives (dead drive on RAID controller, remove
 drive, reboot for whatever reason, all device nodes are renumbered,
 everything goes kablooey).



 Can you explain a bit about how you use glabel(8) in conjunction with ZFS?
  If I can retrofit this into an existing ZFS array to make things easier in
  the future...


If you've used whole disks in ZFS, you can't retrofit it if by retrofit you
mean an almost painless method of resolving this.  GEOM setup stuff
generally should happen BEFORE the file system is on it.

You would create your partition(s) slightly smaller than the disk, label it,
then use the resulting device as your zfs device when creating the pool.  If
you have an existing full disk install, that means restoring the data after
you've done those steps.  It works just as well with MBR style partitioning,
there's nothing saying you have to use GPT.  GPT is just better though in
terms of ease of use IMO among other things.
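
A minimal sketch of that, with made-up sizes and label names (GPT shown,
but as noted the same idea works with MBR):

gpart create -s gpt da0
gpart add -t freebsd-zfs -s 1390G -l disk0 da0    # a little under the drive's raw capacity
# ...repeat for the remaining disks, then build the pool from the labels:
zpool create tank raidz gpt/disk0 gpt/disk1 gpt/disk2 gpt/disk3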

-- 
Adam Vande More
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org