Re: [zfs-discuss] External SATA drive enclosures + ZFS?
On 2/25/2011 7:34 PM, Rich Teer wrote:
>
> One product that seems to fit the bill is the StarTech.com S352U2RER,
> an external dual SATA disk enclosure with USB and eSATA connectivity
> (I'd be using the USB port). Here's a link to the specific product
> I'm considering:
>
> http://ca.startech.com/product/S352U2RER-35in-eSATA-USB-Dual-SATA-Hot-Swap-External-RAID-Hard-Drive-Enclosure

I have had mixed results with their 4 bay version. When they work, they
are great, but we have had a number of DOA/almost DOA units. I have had
good luck with products from http://www.addonics.com/ (they ship to
Canada as well without issue).

Why use USB? You will get much better performance/throughput on eSATA
(if you have good drivers, of course). I use their sil3124 eSATA
controller on FreeBSD as well as a number of PM units and they work great.

	---Mike

--
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] multiple disk failure (solved?)
On 1/31/2011 4:19 PM, Mike Tancsa wrote:
> On 1/31/2011 3:14 PM, Cindy Swearingen wrote:
>> Hi Mike,
>>
>> Yes, this is looking much better.
>>
>> Some combination of removing corrupted files indicated in the zpool
>> status -v output, running zpool scrub and then zpool clear should
>> resolve the corruption, but it depends on how bad the corruption is.
>>
>> First, I would try the least destructive method: Try to remove the
>> files listed below by using the rm command.
>>
>> This entry probably means that the metadata is corrupted or some
>> other file (like a temp file) no longer exists:
>>
>> tank1/argus-data:<0xc6>
>
>
> Hi Cindy,
> I removed the files that were listed, and now I am left with
>
> errors: Permanent errors have been detected in the following files:
>
>         tank1/argus-data:<0xc5>
>         tank1/argus-data:<0xc6>
>         tank1/argus-data:<0xc7>
>
> I have started a scrub
>  scrub: scrub in progress for 0h48m, 10.90% done, 6h35m to go

Looks like that was it! The scrub finished in the time it estimated and
that was all I needed to do. I did not have to do zpool clear or any
other commands. Is there anything beyond scrub to check the integrity of
the pool?

0(offsite)# zpool status -v
  pool: tank1
 state: ONLINE
 scrub: scrub completed after 7h32m with 0 errors on Mon Jan 31 23:00:46 2011
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada6    ONLINE       0     0     0

errors: No known data errors
0(offsite)#

	---Mike
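On the question of checking the pool after a scrub: one small aid is to summarize the `zpool status -v` text so that a non-zero counter or a non-ONLINE device is not missed in a long listing. What follows is only a sketch in Python, not a ZFS tool. The function name `summarize_zpool_status` is made up for illustration, and the scrub pattern assumes the older wording shown in this thread ("scrub completed after ... with N errors"); other ZFS versions phrase that line differently.

```python
import re

# Device states that can appear in the STATE column of `zpool status`.
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def summarize_zpool_status(text):
    """Summarize `zpool status -v` output passed in as a string.

    Hypothetical helper: it only restates what the status text already
    says and performs no integrity check of its own.
    """
    summary = {
        "state": None,
        "scrub_errors": None,
        "no_known_data_errors": False,
        "suspect_devices": [],
    }
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "state:" and len(fields) > 1:
            summary["state"] = fields[1]
        elif fields[0] == "errors:":
            summary["no_known_data_errors"] = (
                line.strip() == "errors: No known data errors"
            )
        elif len(fields) >= 5 and fields[1] in STATES:
            # Config row: NAME STATE READ WRITE CKSUM [note...]
            if fields[1] != "ONLINE" or fields[2:5] != ["0", "0", "0"]:
                summary["suspect_devices"].append(fields[0])
    # Assumed wording, taken from the status output in this thread.
    match = re.search(r"scrub completed after \S+ with (\d+) errors", text)
    if match:
        summary["scrub_errors"] = int(match.group(1))
    return summary
```

The text would typically come from something like `subprocess.run(["zpool", "status", "-v", "tank1"], capture_output=True, text=True).stdout`. A clean summary means only that the last scrub and the current counters look fine.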
Re: [zfs-discuss] multiple disk failure (solved?)
On 1/31/2011 3:14 PM, Cindy Swearingen wrote:
> Hi Mike,
>
> Yes, this is looking much better.
>
> Some combination of removing corrupted files indicated in the zpool
> status -v output, running zpool scrub and then zpool clear should
> resolve the corruption, but it depends on how bad the corruption is.
>
> First, I would try the least destructive method: Try to remove the
> files listed below by using the rm command.
>
> This entry probably means that the metadata is corrupted or some
> other file (like a temp file) no longer exists:
>
> tank1/argus-data:<0xc6>

Hi Cindy,
	I removed the files that were listed, and now I am left with

errors: Permanent errors have been detected in the following files:

        tank1/argus-data:<0xc5>
        tank1/argus-data:<0xc6>
        tank1/argus-data:<0xc7>

I have started a scrub
 scrub: scrub in progress for 0h48m, 10.90% done, 6h35m to go

I will report back once the scrub is done!

	---Mike
Re: [zfs-discuss] multiple disk failure (solved?)
On 1/29/2011 6:18 PM, Richard Elling wrote:
>
> On Jan 29, 2011, at 12:58 PM, Mike Tancsa wrote:
>
>> On 1/29/2011 12:57 PM, Richard Elling wrote:
>>>> 0(offsite)# zpool status
>>>>   pool: tank1
>>>>  state: UNAVAIL
>>>> status: One or more devices could not be opened.  There are insufficient
>>>>         replicas for the pool to continue functioning.
>>>> action: Attach the missing device and online it using 'zpool online'.
>>>>    see: http://www.sun.com/msg/ZFS-8000-3C
>>>>  scrub: none requested
>>>> config:
>>>>
>>>>         NAME        STATE     READ WRITE CKSUM
>>>>         tank1       UNAVAIL      0     0     0  insufficient replicas
>>>>           raidz1    ONLINE       0     0     0
>>>>             ad0     ONLINE       0     0     0
>>>>             ad1     ONLINE       0     0     0
>>>>             ad4     ONLINE       0     0     0
>>>>             ad6     ONLINE       0     0     0
>>>>           raidz1    ONLINE       0     0     0
>>>>             ada4    ONLINE       0     0     0
>>>>             ada5    ONLINE       0     0     0
>>>>             ada6    ONLINE       0     0     0
>>>>             ada7    ONLINE       0     0     0
>>>>           raidz1    UNAVAIL      0     0     0  insufficient replicas
>>>>             ada0    UNAVAIL      0     0     0  cannot open
>>>>             ada1    UNAVAIL      0     0     0  cannot open
>>>>             ada2    UNAVAIL      0     0     0  cannot open
>>>>             ada3    UNAVAIL      0     0     0  cannot open
>>>> 0(offsite)#
>>>
>>> This is usually easily solved without data loss by making the
>>> disks available again.  Can you read anything from the disks using
>>> any program?
>>
>> That's the strange thing, the disks are readable.  The drive cage just
>> reset a couple of times prior to the crash. But they seem OK now.  Same
>> order as well.
>>
>> # camcontrol devlist
>> at scbus0 target 0 lun 0 (pass0,ada0)
>> at scbus0 target 1 lun 0 (pass1,ada1)
>> at scbus0 target 2 lun 0 (pass2,ada2)
>> at scbus0 target 3 lun 0 (pass3,ada3)
>>
>>
>> # dd if=/dev/ada2 of=/dev/null count=20 bs=1024
>> 20+0 records in
>> 20+0 records out
>> 20480 bytes transferred in 0.001634 secs (12534561 bytes/sec)
>> 0(offsite)#
>
> The next step is to run "zdb -l" and look for all 4 labels. Something like:
>     zdb -l /dev/ada2
>
> If all 4 labels exist for each drive and appear intact, then look more closely
> at how the OS locates the vdevs. If you can't solve the "UNAVAIL" problem,
> you won't be able to import the pool.
>  -- richard

On 1/29/2011 10:13 PM, James R. Van Artsdalen wrote:
> On 1/28/2011 4:46 PM, Mike Tancsa wrote:
>>
>> I had just added another set of disks to my zfs array. It looks like the
>> drive cage with the new drives is faulty.  I had added a couple of files
>> to the main pool, but not much.  Is there any way to restore the pool
>> below ? I have a lot of files on ad0,1,4,6 and ada4,5,6,7 and perhaps
>> one file on the new drives in the bad cage.
>
> Get another enclosure and verify it works OK.  Then move the disks from
> the suspect enclosure to the tested enclosure and try to import the pool.
>
> The problem may be cabling or the controller instead - you didn't
> specify how the disks were attached or which version of FreeBSD you're
> using.
>

First off, thanks to all who responded on and off list!

Good news (for me), it seems. With the new cage, everything seems to be
recognized correctly. The history is

...
2010-04-22.14:27:38 zpool add tank1 raidz /dev/ada4 /dev/ada5 /dev/ada6 /dev/ada7
2010-06-11.13:49:33 zfs create tank1/argus-data
2010-06-11.13:49:41 zfs create tank1/argus-data/previous
2010-06-11.13:50:38 zfs set compression=off tank1/argus-data
2010-08-06.12:20:59 zpool replace tank1 ad1 ad1
2010-09-16.10:17:51 zpool upgrade -a
2011-01-28.11:45:43 zpool add tank1 raidz /dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3

FreeBSD RELENG_8 from last week, 8G of RAM, amd64.

zpool status -v
  pool: tank1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       ONLINE       0     0     0
Re: [zfs-discuss] multiple disk failure
On 1/30/2011 12:39 AM, Richard Elling wrote:
>> Hmmm, doesn't look good on any of the drives.
>
> I'm not sure of the way BSD enumerates devices.  Some clever person thought
> that hiding the partition or slice would be useful. I don't find it useful.
> On a Solaris system, ZFS can show a disk something like c0t1d0, but that
> doesn't exist. The actual data is in slice 0, so you need to use c0t1d0s0
> as the argument to zdb.

I think it's the right syntax. On the older drives,

0(offsite)# zdb -l /dev/ada0
LABEL 0
failed to unpack label 0
LABEL 1
failed to unpack label 1
LABEL 2
failed to unpack label 2
LABEL 3
failed to unpack label 3
0(offsite)# zdb -l /dev/ada4
LABEL 0
    version=15
    name='tank1'
    state=0
    txg=44593174
    pool_guid=7336939736750289319
    hostid=3221266864
    hostname='offsite.sentex.ca'
    top_guid=6980939370923808328
    guid=16144392433229115618
    vdev_tree
        type='raidz'
        id=1
        guid=6980939370923808328
        nparity=1
        metaslab_array=38
        metaslab_shift=35
        ashift=9
        asize=4000799784960
        is_log=0
        children[0]
                type='disk'
                id=0
                guid=16144392433229115618
                path='/dev/ada4'
                whole_disk=0
                DTL=341
        children[1]
                type='disk'
                id=1
                guid=1210677308003674848
                path='/dev/ada5'
                whole_disk=0
                DTL=340
        children[2]
                type='disk'
                id=2
                guid=2517076601231706249
                path='/dev/ada6'
                whole_disk=0
                DTL=339
        children[3]
                type='disk'
                id=3
                guid=16621760039941477713
                path='/dev/ada7'
                whole_disk=0
                DTL=338
LABEL 1
    version=15
    name='tank1'
    state=0
    txg=44592523
    pool_guid=7336939736750289319
    hostid=3221266864
    hostname='offsite.sentex.ca'
    top_guid=6980939370923808328
    guid=16144392433229115618
    vdev_tree
        type='raidz'
        id=1
        guid=6980939370923808328
        nparity=1
        metaslab_array=38
        metaslab_shift=35
        ashift=9
        asize=4000799784960
        is_log=0
        children[0]
                type='disk'
                id=0
                guid=16144392433229115618
                path='/dev/ada4'
                whole_disk=0
                DTL=341
        children[1]
                type='disk'
                id=1
                guid=1210677308003674848
                path='/dev/ada5'
                whole_disk=0
                DTL=340
        children[2]
                type='disk'
                id=2
                guid=2517076601231706249
                path='/dev/ada6'
                whole_disk=0
                DTL=339
        children[3]
                type='disk'
                id=3
                guid=16621760039941477713
                path='/dev/ada7'
                whole_disk=0
                DTL=338
LABEL 2
    version=15
    name='tank1'
    state=0
    txg=44593174
    pool_guid=7336939736750289319
    hostid=3221266864
    hostname='offsite.sentex.ca'
    top_guid=6980939370923808328
    guid=16144392433229115618
    vdev_tree
        type='raidz'
        id=1
        guid=6980939370923808328
        nparity=1
        metaslab_array=38
        metaslab_shift=35
        ashift=9
        asize=4000799784960
        is_log=0
        children[0]
                type='disk'
                id=0
                guid=16144392433229115618
                path='/dev/ada4'
                whole_disk=0
                DTL=341
        children[1]
                type='disk'
                id=1
                guid=1210677308003674848
                path='/dev/ada5'
                whole_disk=0
                DTL=340
        children[2]
                type='disk'
                id=2
                guid=2517076601231706249
                path='/dev/ada6'
                whole_disk=0
                DTL=339
        children[3]
                type='disk'
                id=3
                guid=16621760039941477713
                path='/dev/ada7'
                whole_disk=0
                DTL=338
-
Re: [zfs-discuss] multiple disk failure
On 1/29/2011 6:18 PM, Richard Elling wrote:
>> 0(offsite)#
>
> The next step is to run "zdb -l" and look for all 4 labels. Something like:
>     zdb -l /dev/ada2
>
> If all 4 labels exist for each drive and appear intact, then look more closely
> at how the OS locates the vdevs. If you can't solve the "UNAVAIL" problem,
> you won't be able to import the pool.

Hmmm, doesn't look good on any of the drives. Before I give up, I will
try the drives in a different cage Monday. Unfortunately, it's 150km
away from me at our DR site.

# zdb -l /dev/ada0
LABEL 0
failed to unpack label 0
LABEL 1
failed to unpack label 1
LABEL 2
failed to unpack label 2
LABEL 3
failed to unpack label 3
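The "look for all 4 labels" check described above can be scripted when several drives need checking. The sketch below, in Python, only counts labels in `zdb -l` output supplied as text. The name `intact_labels` is hypothetical, and the parsing assumes the two line shapes seen in this thread: a `LABEL n` header on its own line, and `failed to unpack label n` when a label cannot be read.

```python
import re


def intact_labels(zdb_output):
    """Return the sorted label numbers that `zdb -l` was able to unpack.

    Assumes each label starts with a line of the form "LABEL n" and that
    an unreadable label is reported as "failed to unpack label n".
    """
    seen = {
        int(n)
        for n in re.findall(r"^\s*LABEL (\d+)\s*$", zdb_output, re.MULTILINE)
    }
    failed = {
        int(n) for n in re.findall(r"failed to unpack label (\d+)", zdb_output)
    }
    return sorted(seen - failed)
```

A drive that is healthy by this measure returns `[0, 1, 2, 3]`; the `/dev/ada0` output above would return an empty list. Labels that unpack can still be stale, so txg and guid values are worth comparing by eye as well.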
Re: [zfs-discuss] multiple disk failure
On 1/29/2011 11:38 AM, Edward Ned Harvey wrote:
>
> That is precisely the reason why you always want to spread your mirror/raidz
> devices across multiple controllers or chassis.  If you lose a controller or
> a whole chassis, you lose one device from each vdev, and you're able to
> continue production in a degraded state...

Thanks. These are backups of backups. It would be nice to restore them,
as it will take a while to sync up once again. But if I need to start
fresh, is there a resource you can point me to with the current best
practices for laying out large storage like this? It's just for backups
of backups in a DR site.

	---Mike
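The layout rule quoted above (lose a chassis, lose at most one device per vdev) can be checked on paper before building a pool. The following Python sketch is illustrative only: `vdevs_at_risk`, the vdev names, and the enclosure names are all made up, and the disk-to-enclosure mapping has to be supplied by hand since nothing in this thread shows how to discover it automatically.

```python
from collections import Counter


def vdevs_at_risk(vdevs, enclosure_of, parity=1):
    """Return names of vdevs that would fail if one enclosure dropped out.

    vdevs:        dict of vdev name -> list of disk names
    enclosure_of: dict of disk name -> enclosure (or controller) name
    parity:       devices the vdev can lose (1 for raidz1)
    """
    at_risk = []
    for name, disks in vdevs.items():
        per_enclosure = Counter(enclosure_of[d] for d in disks)
        # Losing the fullest enclosure removes this many disks at once.
        if max(per_enclosure.values()) > parity:
            at_risk.append(name)
    return at_risk


# Hypothetical layout resembling the failure in this thread: one raidz1
# whose four disks all sit in the same (new) cage.
single_cage = vdevs_at_risk(
    {"raidz1-new": ["ada0", "ada1", "ada2", "ada3"]},
    {"ada0": "cage-c", "ada1": "cage-c", "ada2": "cage-c", "ada3": "cage-c"},
)

# The same four disks spread over four hypothetical enclosures.
spread = vdevs_at_risk(
    {"raidz1-new": ["ada0", "ada1", "ada2", "ada3"]},
    {"ada0": "cage-a", "ada1": "cage-b", "ada2": "cage-c", "ada3": "cage-d"},
)
```

With a 4-disk raidz1, passing this check takes four separate enclosures or controllers, which may not be practical for a small DR box; the sketch just makes the trade-off visible.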
Re: [zfs-discuss] multiple disk failure
On 1/29/2011 12:57 PM, Richard Elling wrote:
>> 0(offsite)# zpool status
>>   pool: tank1
>>  state: UNAVAIL
>> status: One or more devices could not be opened.  There are insufficient
>>         replicas for the pool to continue functioning.
>> action: Attach the missing device and online it using 'zpool online'.
>>    see: http://www.sun.com/msg/ZFS-8000-3C
>>  scrub: none requested
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         tank1       UNAVAIL      0     0     0  insufficient replicas
>>           raidz1    ONLINE       0     0     0
>>             ad0     ONLINE       0     0     0
>>             ad1     ONLINE       0     0     0
>>             ad4     ONLINE       0     0     0
>>             ad6     ONLINE       0     0     0
>>           raidz1    ONLINE       0     0     0
>>             ada4    ONLINE       0     0     0
>>             ada5    ONLINE       0     0     0
>>             ada6    ONLINE       0     0     0
>>             ada7    ONLINE       0     0     0
>>           raidz1    UNAVAIL      0     0     0  insufficient replicas
>>             ada0    UNAVAIL      0     0     0  cannot open
>>             ada1    UNAVAIL      0     0     0  cannot open
>>             ada2    UNAVAIL      0     0     0  cannot open
>>             ada3    UNAVAIL      0     0     0  cannot open
>> 0(offsite)#
>
> This is usually easily solved without data loss by making the
> disks available again.  Can you read anything from the disks using
> any program?

That's the strange thing, the disks are readable. The drive cage just
reset a couple of times prior to the crash. But they seem OK now. Same
order as well.

# camcontrol devlist
at scbus0 target 0 lun 0 (pass0,ada0)
at scbus0 target 1 lun 0 (pass1,ada1)
at scbus0 target 2 lun 0 (pass2,ada2)
at scbus0 target 3 lun 0 (pass3,ada3)

# dd if=/dev/ada2 of=/dev/null count=20 bs=1024
20+0 records in
20+0 records out
20480 bytes transferred in 0.001634 secs (12534561 bytes/sec)
0(offsite)#

	---Mike
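The `dd` test above reads 20 blocks of 1024 bytes from the start of one device. To repeat the same read across all four drives, a rough Python equivalent might look like the sketch below. The function name `can_read` is hypothetical, and opening `/dev/ada*` this way is an assumption about the platform (it needs root, and raw devices may impose their own read-size rules).

```python
def can_read(path, block_size=1024, count=20):
    """Try to read block_size * count bytes from the start of path.

    Returns the number of bytes actually read, or None if the open or
    read fails. A short read is possible on some devices.
    """
    try:
        # buffering=0 gives an unbuffered raw file: one read() call.
        with open(path, "rb", buffering=0) as dev:
            return len(dev.read(block_size * count))
    except OSError:
        return None
```

For example, `{d: can_read("/dev/" + d) for d in ("ada0", "ada1", "ada2", "ada3")}` would report each drive in one pass. As the rest of the thread shows, a successful read of the first 20 KB says only that the device answers; it says nothing about whether the ZFS labels on it are intact.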
[zfs-discuss] multiple disk failure
Hi,
	I am using FreeBSD 8.2 and went to add 4 new disks today to expand my
offsite storage. All was working fine for about 20min and then the new
drive cage started to fail. Silly me for assuming new hardware would be
fine :(

When the cage failed, it hung the server and the box rebooted. After it
rebooted, the entire pool is gone and in the state below. I had only
written a few files to the new larger pool and I am not concerned about
restoring that data. However, is there a way to get back the original
pool data?

Going to http://www.sun.com/msg/ZFS-8000-3C gives a 503 error on the web
page listed, BTW.

0(offsite)# zpool status
  pool: tank1
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       UNAVAIL      0     0     0  insufficient replicas
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
          raidz1    UNAVAIL      0     0     0  insufficient replicas
            ada0    UNAVAIL      0     0     0  cannot open
            ada1    UNAVAIL      0     0     0  cannot open
            ada2    UNAVAIL      0     0     0  cannot open
            ada3    UNAVAIL      0     0     0  cannot open
0(offsite)#