Re: Problems replacing failing drive in ZFS pool
On Wed, Jul 21, 2010 at 1:57 AM, alan bryan alanbryan1...@yahoo.com wrote:
> Dan,
>
> Here's how to do it after the fact:
> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2009-07/msg00623.html

[r...@foghornleghorn ~]# glabel label disk01 /dev/da0
glabel: Can't store metadata on /dev/da0: Operation not permitted.

Hrmph.

--
Joshua Boyd
JBipNet
E-mail: boy...@jbip.net
http://www.jbip.net
Re: Problems replacing failing drive in ZFS pool
On Wed, Jul 21, 2010 at 2:09 AM, Joshua Boyd boy...@jbip.net wrote:
> [r...@foghornleghorn ~]# glabel label disk01 /dev/da0
> glabel: Can't store metadata on /dev/da0: Operation not permitted.
>
> Hrmph.

Nevermind, sysctl kern.geom.debugflags=16 solves that problem, but then you get this:

[r...@foghornleghorn ~]# zpool replace tank da0 label/disk01
cannot open 'label/disk01': no such GEOM provider
must be a full path or shorthand device name

--
Joshua Boyd
JBipNet
E-mail: boy...@jbip.net
http://www.jbip.net
Re: Problems replacing failing drive in ZFS pool
On 07/21/2010 02:14, Joshua Boyd wrote:
> [r...@foghornleghorn ~]# zpool replace tank da0 label/disk01
> cannot open 'label/disk01': no such GEOM provider
> must be a full path or shorthand device name

Of course you can't. You have labeled a disk that is already in use, so the label will never appear in /dev/label/*. Even if you could resilver onto the same disk that is already in use, I would expect the result to be inconsistent data and write errors all over the place.

Regards,

--
jhell,v
Re: Problems replacing failing drive in ZFS pool
On Tue, 20 Jul 2010, alan bryan wrote:
> Dan,
>
> Here's how to do it after the fact:
> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2009-07/msg00623.html

Two things:

- What's the preferred labelling method for disks that will be used with zfs these days? geom_label or gpt labels? I've been using the latter and I find them a little simpler.

- I think that if you are already using gpt partitioning, you can add a gpt label after the fact (ie: gpart -i index# -l your_label adaX). gpart list will give you a list of index numbers.

Charles
Re: Problems replacing failing drive in ZFS pool
On Wed, 21 Jul 2010, Charles Sprickman wrote:
> - I think that if you are already using gpt partitioning, you can add a
> gpt label after the fact (ie: gpart -i index# -l your_label adaX). gpart
> list will give you a list of index numbers.

Oops. That should be: gpart modify -i index# -l your_label adaX.

Charles
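A sketch of that after-the-fact labelling, with hypothetical pool, disk, and label names; the export/import is what makes ZFS re-taste the devices under their new names:

  # gpart list ada2                     # note the index of the freebsd-zfs partition
  # gpart modify -i 1 -l disk02 ada2    # attach GPT label 'disk02' to partition 1
  # zpool export tank
  # zpool import -d /dev/gpt tank       # vdevs should now attach via /dev/gpt/* labels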
Re: Problems replacing failing drive in ZFS pool
On 7/21/2010 2:54 AM, Charles Sprickman wrote:
> Oops. That should be: gpart modify -i index# -l your_label adaX.

I'm not using gpt partitioning, but I think I'd like to try it. To do just that, I've ordered two more HDDs. They'll be arriving today.

--
Dan Langille - http://langille.org/
Re: Problems replacing failing drive in ZFS pool
On Tue, Jul 20, 2010 at 11:43 PM, Charles Sprickman sp...@bway.net wrote:
> What's the preferred labelling method for disks that will be used with
> zfs these days? geom_label or gpt labels? I've been using the latter and
> I find them a little simpler.

If the disks will only be used in FreeBSD systems, and you are using the entire disk for ZFS (no partitioning), then glabel works well and is the easiest to use.

If you want to be able to move the disks between FreeBSD, OpenSolaris, Solaris, ZFS-FUSE on Linux, etc., then you need to use GPT labels; glabel metadata is not portable.

I have not used gpart, so I don't know whether you can label the whole disk or just partitions on the disk.

--
Freddie Cash
fjwc...@gmail.com
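For the whole-disk case, a minimal sketch of the glabel approach, done before the pool is created (label and device names are illustrative):

  # glabel label disk00 da0     # writes glabel metadata to the disk's last sector
  # glabel status               # should now show label/disk00 covering da0
  # zpool create storage raidz1 label/disk00 label/disk01 label/disk02

Because glabel steals the last sector, label/disk00 is one sector smaller than da0, which is why this has to happen before ZFS ever writes to the raw device.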
Re: Problems replacing failing drive in ZFS pool
On 7/19/2010 10:50 PM, Adam Vande More wrote:
> You would create your partition(s) slightly smaller than the disk, label
> them, then use the resulting label device as your zfs device when
> creating the pool. If you have an existing whole-disk install, that means
> restoring the data after you've done those steps.

FYI, this is exactly what I'm going to do. I have obtained additional HDDs to serve as temporary storage. I will also use them for practicing the commands before destroying the original array. I'll post my plan to the list for review.

--
Dan Langille - http://langille.org/
Re: Problems replacing failing drive in ZFS pool
on 20/07/2010 01:04 Garrett Moore said the following:
> # zpool replace tank da3
> cannot replace da3 with da3: device is too small
>
> Old: WDC WD15EADS-00P8B0, User Capacity: 1,500,301,910,016 bytes
> New: WDC WD15EADS-00R6B0, User Capacity: 1,500,300,828,160 bytes
>
> God damnit, Western Digital. What can I do now? It's such a small
> difference, is there a way I can work around this?

I second what others have said: crap. But there could be some hope, not sure.

Can you check the actual size used by the pool on the disk? It should be somewhere in zdb -C output (asize?). If I remember correctly, that size should be a multiple of some rather large power of two, so it may be smaller than the 'User Capacity' of both the old and the new drives.

--
Andriy Gapon
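A sketch of the check Andriy suggests, assuming the pool is named tank:

  # zdb -C tank | grep asize

If the asize values reported in the cached config are no larger than the new drive's 1,500,300,828,160 bytes, there may be room for the replacement to fit; if they are larger, the new drive genuinely cannot hold the vdev.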
Re: Problems replacing failing drive in ZFS pool
Hi guys,

> Can you check the actual size used by the pool on the disk? It should be
> somewhere in zdb -C output (asize?).

Well, I see some possibilities for a creative solution here, using an SSD (or a USB stick, or mdconfig as an act of desperation) and gconcat, but that's asking for trouble and should probably be considered a temporary hack.

What I personally would do is get a 2TB drive and use it instead, partitioned with gpt and labelled with -l, and replace the failing disk with gpt/something. Using 100 or so MB less than the whole disk is also a good idea, as you can see ;)

Cheers and good luck.
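For what it's worth, a sketch of the gconcat act of desperation, entirely untested and with hypothetical names; note a malloc-backed md vanishes on reboot, which is exactly why this could only ever be a temporary hack:

  # mdconfig -a -t malloc -s 2m       # shortfall is ~1.03 MB, so 2 MB of padding (md0)
  # gconcat label -v padded da3 md0   # /dev/concat/padded = da3 + md0
  # zpool replace tank da3 concat/padded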
Re: Problems replacing failing drive in ZFS pool
--- On Mon, 7/19/10, Dan Langille d...@langille.org wrote:
> Can you explain a bit about how you use glabel(8) in conjunction with
> ZFS? If I can retrofit this into an existing ZFS array, it would make
> things easier in the future...

Dan,

Here's how to do it after the fact:
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2009-07/msg00623.html

--Alan Bryan
Re: Problems replacing failing drive in ZFS pool
On Mon, Jul 19, 2010 at 11:21:38AM -0400, Garrett Moore wrote:
> I have an 8-drive ZFS array consisting of WD15EADS drives. One of my
> disks has started to fail, so I got a replacement disk.
>
> I have replaced a disk before by:
> zpool offline tank /dev/da5
> shutting down, swapping from old disk to new disk
> booting
> zpool replace tank /dev/da5
>
> This worked fine. This time the failing disk was da3, and I tried the
> same thing:
> zpool offline tank /dev/da3
> zpool status showed da3 offline.
> shut down, swapped old disk to new disk.
>
> When I booted again, I got:
>
>         NAME        STATE     READ WRITE CKSUM
>         tank        UNAVAIL      0     0     0  insufficient replicas
>           raidz1    UNAVAIL      0     0     0  corrupted data
>             da0     ONLINE       0     0     0
>             da1     ONLINE       0     0     0
>             da2     ONLINE       0     0     0
>             da3     ONLINE       0     0     0
>             da4     ONLINE       0     0     0
>             da5     ONLINE       0     0     0
>             da6     ONLINE       0     0     0
>             da7     ONLINE       0     0     0
>
> I switched back to the old disk and booted again, and then I could
> access my data again, and da3 still showed as offline. I tried 'zpool
> online tank /dev/da3' and after a few seconds resilvering completed and
> all 8 drives were back online again, but with the 'dying' disk still as
> da3.
>
> I tried shutting down WITHOUT first offlining /dev/da3, and swapping the
> disks, and when I booted I again got 'insufficient replicas'.
>
> Why am I getting this error, and how come it worked OK the last time I
> replaced a disk? And more importantly, how do I switch to my new
> replacement disk without losing data?

Can you please provide uname -a output? Thanks.

--
| Jeremy Chadwick                                   j...@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
Re: Problems replacing failing drive in ZFS pool
Oops - shouldn't have forgotten that, sorry.

FreeBSD leviathan 8.0-RELEASE FreeBSD 8.0-RELEASE #0: Sat Nov 21 15:02:08 UTC 2009 r...@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64

On Mon, Jul 19, 2010 at 11:24 AM, Jeremy Chadwick free...@jdc.parodius.com wrote:
> Can you please provide uname -a output? Thanks.
Re: Problems replacing failing drive in ZFS pool
On Mon, Jul 19, 2010 at 8:21 AM, Garrett Moore garrettmo...@gmail.com wrote:
> This worked fine. This time the failing disk was da3, and I tried the
> same thing:
> zpool offline tank /dev/da3
> zpool status showed da3 offline.
> shut down, swapped old disk to new disk.

For some reason, ZFS is getting confused by the device names, possibly due to the controller renumbering device nodes? Try the following:

zpool offline tank /dev/da3
zpool status tank            <-- to make sure it offlined the correct drive
zpool export tank            <-- might have to do this from single-user mode
reboot                       <-- swap the physical drives here
zpool import tank            <-- this forces ZFS to re-taste each drive to read the metadata
zpool replace tank /dev/da3  <-- this should force it to use the correct drive

Note: if you have / on ZFS, the above may not be doable.

--
Freddie Cash
fjwc...@gmail.com
Re: Problems replacing failing drive in ZFS pool
So you think it's because when I switch from the old disk to the new disk, ZFS doesn't realize the disk has changed, and thinks the data is just corrupt now? Even if that happens, shouldn't the pool still be available, since it's RAIDZ1 and only one disk has gone away?

I don't have / on ZFS; I'm only using it as a 'data' partition, so I should be able to try your suggestion. My only concern: is there any risk of trashing my pool if I try your instructions? Everything I've done so far, even when told insufficient replicas / corrupt data, has not cost me any data as long as I switch back to the original (dying) drive. If I mix in export/import statements which might 'touch' the pool, is there a chance it will choke and trash my data?

On Mon, Jul 19, 2010 at 11:45 AM, Freddie Cash fjwc...@gmail.com wrote:
> For some reason, ZFS is getting confused by the device names, possibly
> due to the controller renumbering device nodes? Try the following:
> [ ... ]
Re: Problems replacing failing drive in ZFS pool
On Mon, Jul 19, 2010 at 8:56 AM, Garrett Moore garrettmo...@gmail.com wrote:
> So you think it's because when I switch from the old disk to the new
> disk, ZFS doesn't realize the disk has changed, and thinks the data is
> just corrupt now? Even if that happens, shouldn't the pool still be
> available, since it's RAIDZ1 and only one disk has gone away?

I think it's because you pull the old drive, boot with the new drive, the controller re-numbers all the devices (ie da3 is now da2, da2 is now da1, da1 is now da0, da0 is now da6, etc), and ZFS thinks that all the drives have changed, thus corrupting the pool.

I've had this happen on our storage servers a couple of times before I started using glabel(8) on all our drives (dead drive on RAID controller, remove drive, reboot for whatever reason, all device nodes are renumbered, everything goes kablooey).

Doing the export and import will force ZFS to re-read the metadata on the drives (ZFS does its own labelling to say which drives belong to which vdevs), and to pick things up correctly using the new device nodes.

> My only concern: is there any risk of trashing my pool if I try your
> instructions?

Well, there's always a chance things explode. :) But an export/import is safe so long as all drives are connected at the time. I've recovered corrupted pools by doing the above. (I've now switched to labelling all my drives to prevent this from happening.)

Of course, always have good backups. ;)

--
Freddie Cash
fjwc...@gmail.com
Re: Problems replacing failing drive in ZFS pool
On Mon, Jul 19, 2010 at 10:56 AM, Garrett Moore garrettmo...@gmail.com wrote:
> My only concern: is there any risk of trashing my pool if I try your
> instructions? [ ... ] If I mix in export/import statements which might
> 'touch' the pool, is there a chance it will choke and trash my data?

I'm not sure what's going on in your case, but I have cron'd a zpool scrub for my pool on a weekly basis to avoid this. I run a / zfs mirror, and one day I could no longer boot and saw the dreaded 'insufficient replicas'. I eventually recovered when the disk started to work again briefly: I did a snapshot/send offsite, redid the system with a new install disk, then restored the data.

The export/import shouldn't hurt; I used that when booting off an MFSBSD CD and imported the zpool to send from there.

Perhaps you might want to consider RAIDZ2 with all those disks.

--
Adam Vande More
Re: Problems replacing failing drive in ZFS pool
The data on the disks is not irreplaceable, so if I lose the array it isn't the end of the world, but I would prefer not to lose it, as it would be a pain to get all of the data again.

Freddie's explanation is reasonable, but any ideas why it didn't happen when I replaced my first dead drive (da5)? That replacement was completely painless.

The system is in a Supermicro case with a hotswap backplane:
http://www.supermicro.com/products/chassis/4U/743/SC743TQ-865.cfm
The backplane ports are connected to a Supermicro AOC-USASLP-L8I LSI 1068E 8-port UIO SATA/SAS controller.

By the way, Freddie, in your instructions I assume the 'reboot' step is when I will be switching the physical drives, correct?

On Mon, Jul 19, 2010 at 12:18 PM, Adam Vande More amvandem...@gmail.com wrote:
> [ ... ]
Re: Problems replacing failing drive in ZFS pool
I forgot to ask in the last email: is there a way to convert from RAIDZ1 to RAIDZ2 without losing data? I actually have far more storage than I need, so I'd consider going to Z2.

On Mon, Jul 19, 2010 at 12:18 PM, Adam Vande More amvandem...@gmail.com wrote:
> Perhaps you might want to consider RAIDZ2 with all those disks.
Re: Problems replacing failing drive in ZFS pool
On Mon, Jul 19, 2010 at 9:32 AM, Garrett Moore garrettmo...@gmail.com wrote:
> By the way, Freddie, in your instructions I assume the 'reboot' step is
> when I will be switching the physical drives, correct?

Correct. Export, power off, swap drives, power on, import.

If it's a hot-swap backplane, though, have you tried doing it without the reboot?

zpool offline tank da3
camcontrol <something to turn off da3>
<swap drives>
camcontrol rescan (or something like that)
zpool replace tank da3

Read the camcontrol man page to see what the commands are to turn off the drive and to rescan the controller for the new drive. It's possible the device number may change.

--
Freddie Cash
fjwc...@gmail.com
Re: Problems replacing failing drive in ZFS pool
On Mon, Jul 19, 2010 at 9:33 AM, Garrett Moore garrettmo...@gmail.com wrote:
> I forgot to ask in the last email: is there a way to convert from RAIDZ1
> to RAIDZ2 without losing data?

No, unfortunately it's not currently possible to change vdev types in ZFS. This is waiting on the long-time-in-coming block-pointer rewrite feature, which has not yet been added to ZFS in OpenSolaris (and there's very little information on its progress).

The standard process is:
 - configure new server
 - configure pool using new vdev layout
 - zfs send/recv data to new server (sketch below)
 - decommission old server

Or:
 - take backups of server
 - destroy pool
 - configure new pool using new vdev layout
 - restore from backups

--
Freddie Cash
fjwc...@gmail.com
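A minimal sketch of the send/recv step, assuming a ZFS version with recursive send (-R) and hypothetical pool names tank (old box) and newtank (new box):

  # zfs snapshot -r tank@migrate
  # zfs send -R tank@migrate | ssh newbox zfs recv -vdF newtank

On versions without send -R, each filesystem would have to be snapshotted and sent individually.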
Re: Problems replacing failing drive in ZFS pool
I'm nervous about trusting the hotswap features and camcontrol to set things up properly, but I guess I could try it. When I first set the system up, before I put data on the array, I tried the hotswap functionality and drives wouldn't always re-attach when reinserted, even if I fiddled with camcontrol, but I can't remember exactly what I did then.

On Mon, Jul 19, 2010 at 12:36 PM, Freddie Cash fjwc...@gmail.com wrote:
> If it's a hot-swap backplane, though, have you tried doing it without
> the reboot? [ ... ] Read the camcontrol man page to see what the
> commands are to turn off the drive and to rescan the controller for the
> new drive.
Re: Problems replacing failing drive in ZFS pool
On 19/07/2010 17:52, Garrett Moore wrote:
> I'm nervous about trusting the hotswap features and camcontrol to set
> things up properly, but I guess I could try it.

We've a pair of medium-sized ZFS boxes with Supermicro boards (X8DTi, IIRC) in hotswap chassis. They've both got one hot-spare drive.

Well, I say 'hot spare'. I mean 'ought to be a hot spare, if my shoddy Perl works when triggered by devd'.

What we've found to work is this:

Drive fails (thus far simulated by pulling the drive from the backplane).
ZFS error reported. Pool in degraded state.
'zpool replace pool da9 da23' (where da23 is the hot spare, and where this *should* happen automagically).
Wait for resilvering.
Go and swap the failed drive (da9 in this case).
'camcontrol rescan all' (new drive shows up in /var/log/messages).
'zpool replace pool da9'
Wait while resilvering happens. Hot-spare drive returns to 'avail' status.

[ ... ]

--
JH-R
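The 'automagic' part John alludes to would hang off devd; a minimal sketch of a devd.conf hook, where /usr/local/sbin/zfs-spare.sh is a hypothetical script that maps the vanished device to a pool member and runs the zpool replace:

  notify 10 {
          match "system"    "DEVFS";
          match "subsystem" "CDEV";
          match "type"      "DESTROY";
          match "cdev"      "da[0-9]+";
          # zfs-spare.sh is hypothetical: it would run
          # 'zpool replace <pool> <failed> <spare>' for $cdev
          action "/usr/local/sbin/zfs-spare.sh $cdev";
  };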
Re: Problems replacing failing drive in ZFS pool
Well, hotswapping worked, but now I have a totally different problem. Just for reference:

# zpool offline tank da3
# camcontrol stop da3
  (swap drive)
# camcontrol rescan all
  'da3 lost device, removing device entry'
# camcontrol rescan all
  'da3 at mpt0 ...', so new drive was found! yay
# zpool replace tank da3
  cannot replace da3 with da3: device is too small

So I looked at the smartctl output for the old and new drive.

Old:
Device Model:     WDC WD15EADS-00P8B0
Serial Number:    WD-WMAVU0087717
Firmware Version: 01.00A01
User Capacity:    1,500,301,910,016 bytes

New:
Device Model:     WDC WD15EADS-00R6B0
Serial Number:    WD-WCAVY4770428
Firmware Version: 01.00A01
User Capacity:    1,500,300,828,160 bytes

God damnit, Western Digital. What can I do now? It's such a small difference; is there a way I can work around this? My other replacement drive is the 00R6B0 model as well, with the slightly smaller capacity.

On Mon, Jul 19, 2010 at 4:09 PM, John Hawkes-Reed hi...@libeljournal.com wrote:
> [ ... ]
Re: Problems replacing failing drive in ZFS pool
On Mon, Jul 19, 2010 at 3:04 PM, Garrett Moore garrettmo...@gmail.com wrote:
> Well, hotswapping worked, but now I have a totally different problem.

Yay. :)

> # zpool replace tank da3
> cannot replace da3 with da3: device is too small
>
> God damnit, Western Digital. What can I do now? It's such a small
> difference; is there a way I can work around this?

Crap! There's a version of ZFS that works around this (or maybe a version of OSol?), although I don't recall the version off-hand, and I can't find it in the list of versions on the OSol website.

You may be stuck until you get a drive with more sectors. :(

--
Freddie Cash
fjwc...@gmail.com
Re: Problems replacing failing drive in ZFS pool
On Mon, Jul 19, 2010 at 5:04 PM, Garrett Moore garrettmo...@gmail.com wrote:
> God damnit, Western Digital. What can I do now? It's such a small
> difference; is there a way I can work around this?

Uff da. I'm pretty sure the answer is no; you would have had to use the smallest device first when creating the pool, I think.

--
Adam Vande More
Re: Problems replacing failing drive in ZFS pool
Well, thank you very much, Western Digital, for your absolutely pathetic RMA service sending me an inferior drive. I'll call tomorrow and see what can be done; I'm going to insist on these 00R6B0 drives being taken back, and on being given a drive of >= 1,500,301,910,016 bytes capacity.

At least now I learned how to hotswap :/

On Mon, Jul 19, 2010 at 6:22 PM, Adam Vande More amvandem...@gmail.com wrote:
> Uff da. I'm pretty sure the answer is no; you would have had to use the
> smallest device first when creating the pool, I think.
Re: Problems replacing failing drive in ZFS pool
On Mon, Jul 19, 2010 at 06:28:16PM -0400, Garrett Moore wrote:
> At least now I learned how to hotswap :/

This is the wrong time to find it out, I know, but:

For geom mirror purposes I have taken to carving out HD partitions which are exactly a round number of GiB, and at least 100 MiB less than the claimed capacity. I then label the partitions using geom label, and use the partition label in setting up the mirror.

The first means that when an unannounced change in drive capacity like this happens, I can still add the smaller drive into a RAID set of larger drives painlessly, because the partition sizes will still match exactly. (It also means that if I can only find a larger drive, e.g. if 1TB drives later drop off the market or are more expensive than larger drives, I can easily add a larger drive into the set by making its partition size match the rest.)

The second means that if the BIOS/probing sequence does its little dance and the drives end up getting renumbered, it does not affect which drive is which part of what mirror in the slightest.

The space sacrificed is trivial compared to the convenience and safety net. I think I got both of those suggestions on this list, and I would hope (assume?) that they have equivalents under ZFS.

-- Clifton

--
Clifton Royston  --  clift...@iandicomputing.com / clift...@lava.net
President - I and I Computing * http://www.iandicomputing.com/
Custom programming, network design, systems and network consulting services
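A sketch of that recipe with gpart and glabel: a 1.5 TB drive is roughly 1397.3 GiB, so a 1397 GiB partition leaves around 270 MiB of slack on both drive revisions (names and sizes are illustrative):

  # gpart create -s gpt da3
  # gpart add -t freebsd-zfs -s 1397G da3   # round GiB count, well under the raw capacity
  # glabel label disk03 da3p1               # label the partition, not the raw disk
  # zpool replace tank da3 label/disk03

With the same partition size on every drive, a marginally smaller replacement still fits, and the label survives any device renumbering.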
Re: Problems replacing failing drive in ZFS pool
On 20/07/2010, at 10:55, Clifton Royston wrote:
> The space sacrificed is trivial compared to the convenience and safety
> net. I think I got both of those suggestions on this list, and I would
> hope (assume?) that they have equivalents under ZFS.

I partitioned my ZFS disks using GPT so I could reference them using GPT UUIDs. I also put a 4GB partition on each one, which I glued together using gmirror and used for swap.

I learnt this from many tales of woe :)

--
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C
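A sketch of the mirrored-swap half of Daniel's layout, assuming partition 2 on each of two disks is reserved for swap (device and mirror names are hypothetical):

  # gpart add -t freebsd-swap -s 4G da0
  # gpart add -t freebsd-swap -s 4G da1
  # gmirror label -b prefer swap da0p2 da1p2   # creates /dev/mirror/swap
  # swapon /dev/mirror/swap
  (fstab: /dev/mirror/swap none swap sw 0 0)

Mirroring swap means a single dying disk can't take down a process that happens to have pages swapped onto it.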
Re: Problems replacing failing drive in ZFS pool
On 7/19/2010 12:15 PM, Freddie Cash wrote:
> I've had this happen on our storage servers a couple of times before I
> started using glabel(8) on all our drives (dead drive on RAID controller,
> remove drive, reboot for whatever reason, all device nodes are
> renumbered, everything goes kablooey).

Can you explain a bit about how you use glabel(8) in conjunction with ZFS? If I can retrofit this into an existing ZFS array, it would make things easier in the future...

8.0-STABLE #0: Fri Mar 5 00:46:11 EST 2010

]# zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
            ad14    ONLINE       0     0     0
            ad16    ONLINE       0     0     0

> Of course, always have good backups. ;)

In my case, this ZFS array is the backup. ;) But I'm setting up a tape library, real soon now...

--
Dan Langille - http://langille.org/
Re: Problems replacing failing drive in ZFS pool
On Mon, Jul 19, 2010 at 9:07 PM, Dan Langille d...@langille.org wrote:
> Can you explain a bit about how you use glabel(8) in conjunction with
> ZFS? If I can retrofit this into an existing ZFS array, it would make
> things easier in the future...

If you've used whole disks in ZFS, you can't retrofit it, if by "retrofit" you mean an almost painless method of resolving this. GEOM setup generally should happen BEFORE the file system is on the device.

You would create your partition(s) slightly smaller than the disk, label them, then use the resulting label device as your zfs device when creating the pool. If you have an existing whole-disk install, that means restoring the data after you've done those steps.

It works just as well with MBR-style partitioning; there's nothing saying you have to use GPT. GPT is just better in terms of ease of use, IMO, among other things.

--
Adam Vande More
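Putting Adam's sequence together, a sketch using GPT labels (pool, sizes, and label names are illustrative; the data has to be restored from elsewhere afterwards, and older gpart may also want an explicit -b start block):

  # gpart create -s gpt da0
  # gpart add -t freebsd-zfs -s 1397G -l disk00 da0   # a bit under the raw capacity
  (repeat for each disk)
  # zpool create storage raidz1 gpt/disk00 gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04
  (restore the data from the temporary disks)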