ZFS RaidZ-2 problems
I've already posted this to freebsd-fs@ but still have no idea why the below has happened.

On 10/30/12 09:08, Paul Wootton wrote:

Hi,

I have had lots of bad luck with SATA drives and have had them fail on me far too often. I started with a 3 drive RAIDZ and lost 2 drives at the same time. I upgraded to a 6 drive RAIDZ and lost 2 drives within hours of each other, and finally had a 9 drive RAIDZ (1 parity) and lost another 2 drives (as luck would have it, this time I had a 90% backup on another machine so did not lose everything).

I finally decided that I should switch to a RAIDZ2 (my current setup). Now I have lost 1 drive and the pack is showing as faulted. I have tried exporting and reimporting, but that did not help either. Is this normal? Has anyone got any ideas as to what has happened and why?

The fault this time might be cabling, so I might not have lost the data, but my understanding was that with RAIDZ-2 you could lose 2 drives and still have a working pack.

I do know the fault could also be the power supply, controller etc. I can take care of all the hardware. The issue I have is that I have a 9 drive RAIDZ-2 pack with only 1 disk showing as offline, and the pack is showing as faulted. If the power supply was bouncing and a drive was giving bad data, I would expect ZFS to report that 2 drives were faulted (1 offline and 1 corrupt).

Is there a way with ZDB that I can see why the pool is showing as faulted? Can it tell me which drives it thinks are bad, or have bad data?

I do still have the 90% backup of the pool and nothing has really changed since that backup, so if someone wants me to try something and it blows the pack away, it's not the end of the world.

Cheers
Paul

  pool: storage
 state: FAULTED
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: resilvered 30K in 0h0m with 0 errors on Sun Oct 14 12:52:45 2012
config:

        NAME                STATE     READ WRITE CKSUM
        storage             FAULTED      0     0     1
        raidz2-0            FAULTED      0     0     6
        ada0                ONLINE       0     0     0
        ada1                ONLINE       0     0     0
        ada2                ONLINE       0     0     0
        1811927559723424    UNAVAIL      0     0     0  was /dev/ada3
        ada4                ONLINE       0     0     0
        ada5                ONLINE       0     0     0
        ada6                ONLINE       0     0     0
        ada7                ONLINE       0     0     0
        ada8                ONLINE       0     0     0
        ada10p4             ONLINE       0     0     0

root@filekeeper:/storage # zpool export storage
root@filekeeper:/storage # zpool import storage
cannot import 'storage': I/O error
        Destroy and re-create the pool from a backup source.

root@filekeeper:/usr/home/paul # uname -a
FreeBSD filekeeper.caspersworld.co.uk 10.0-CURRENT FreeBSD 10.0-CURRENT #0 r240967: Thu Sep 27 08:01:24 UTC 2012 r...@filekeeper.caspersworld.co.uk:/usr/obj/usr/src/sys/GENERIC amd64

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
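The question about zdb can be approached with read-only commands. What follows is a minimal sketch, not something proposed in the thread: it assumes the pool members are the device names shown in the status output, and the helper names run, dump_labels and probe_import are made up for illustration. zdb -l prints the ZFS labels stored on a device, a bare zpool import lists pools available for import, and zpool import -F -n reports whether a recovery-mode import could work without performing it. The script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only. Device names are copied from the zpool status output;
# ada3 is the member reported as UNAVAIL, so its label dump may fail.
POOL_DEVS="ada0 ada1 ada2 ada3 ada4 ada5 ada6 ada7 ada8 ada10p4"

# run CMD...: print the command when DRY_RUN is 1 (the default),
# otherwise execute it. Hypothetical helper, not part of ZFS.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# dump_labels DEV...: show the ZFS labels stored on each device.
dump_labels() {
    for dev in "$@"; do
        run zdb -l "/dev/$dev"
    done
}

# probe_import POOL: list importable pools, then ask whether a
# recovery-mode import (-F) would succeed, without doing it (-n).
probe_import() {
    run zpool import
    run zpool import -F -n "$1"
}
```

After sourcing the file, "dump_labels $POOL_DEVS" followed by "probe_import storage" prints the commands; setting DRY_RUN=0 and running as root executes them. Comparing the txg and guid fields across the label dumps is one way to spot a member whose labels disagree with the rest.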
Re: ZFS RaidZ-2 problems
Yes, RAIDZ2 should tolerate a 2 drive failure without the array faulting, so something strange is going on there somewhere.

Silly question: what size drives and what driver are you using?

Regards
Steve

----- Original Message -----
From: Paul Wootton paul-free...@fletchermoorland.co.uk
To: freeBSD-CURRENT Mailing List freebsd-current@freebsd.org
Sent: Monday, November 05, 2012 10:25 AM
Subject: ZFS RaidZ-2 problems
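The expectation stated in the reply is just the parity rule: a raidz vdev should stay usable as long as the number of missing members does not exceed its parity level. A small sketch of that check follows; the function names count_unavail and vdev_should_survive are invented for illustration, and the sketch only restates the rule. It does not explain why this particular pool faulted.

```shell
#!/bin/sh
# count_unavail: read zpool status text on stdin and count the lines
# whose STATE column (second field) is UNAVAIL.
count_unavail() {
    awk '$2 == "UNAVAIL" { n++ } END { print n + 0 }'
}

# vdev_should_survive PARITY MISSING: succeed (exit status 0) when the
# number of missing members is within the parity level
# (raidz1 = 1, raidz2 = 2, raidz3 = 3), fail otherwise.
vdev_should_survive() {
    parity=$1
    missing=$2
    [ "$missing" -le "$parity" ]
}
```

For the status output in this thread, piping it through count_unavail gives 1, and "vdev_should_survive 2 1" succeeds, which is why a FAULTED raidz2 with a single UNAVAIL member is unexpected.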
Re: ZFS RaidZ-2 problems
On 11/05/12 10:49, Steven Hartland wrote:
> Yes RAIDZ2 should enable a 2 drive failure without the array faulting so
> something strange is going on there somewhere.

That was my thought, but I don't know what or why.

> Silly question, what size drives and what driver are you using?

See below.

> Regards
> Steve
>
> ----- Original Message -----
> From: Paul Wootton paul-free...@fletchermoorland.co.uk
> To: freeBSD-CURRENT Mailing List freebsd-current@freebsd.org
> Sent: Monday, November 05, 2012 10:25 AM
> Subject: ZFS RaidZ-2 problems
>
>  state: FAULTED
> status: One or more devices could not be opened. There are insufficient
>         replicas for the pool to continue functioning.
> action: Attach the missing device and online it using 'zpool online'.
>    see: http://illumos.org/msg/ZFS-8000-3C
>   scan: resilvered 30K in 0h0m with 0 errors on Sun Oct 14 12:52:45 2012
> config:
>
>         NAME                STATE     READ WRITE CKSUM
>         storage             FAULTED      0     0     1
>         raidz2-0            FAULTED      0     0     6
>         ada0                ONLINE       0     0     0
>         ada1                ONLINE       0     0     0
>         ada2                ONLINE       0     0     0
>         1811927559723424    UNAVAIL      0     0     0  was /dev/ada3
>         ada4                ONLINE       0     0     0
>         ada5                ONLINE       0     0     0
>         ada6                ONLINE       0     0     0
>         ada7                ONLINE       0     0     0
>         ada8                ONLINE       0     0     0
>         ada10p4             ONLINE       0     0     0

ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: ST3500418AS CC37 ATA-8 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: WDC WD5000AACS-00D0B0 01.01B01 ATA-8 SATA 2.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad6
ada2 at ahcich2 bus 0 scbus3 target 0 lun 0
ada2: MAXTOR STM3500320AS MX15 ATA-8 SATA 2.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada2: Previously was known as ad10
ada3 at ahcich3 bus 0 scbus4 target 0 lun 0
ada3: ST3500410AS CC34 ATA-8 SATA 2.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada3: Previously was known as ad12
ada4 at ahcich5 bus 0 scbus7 target 0 lun 0
ada4: WDC WD5000AADS-00S9B0 01.00A01 ATA-8 SATA 2.x device
ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada4: Previously was known as ad18
ada5 at ahcich6 bus 0 scbus9 target 0 lun 0
ada5: WDC WD5000AADS-00S9B0 01.00A01 ATA-8 SATA 2.x device
ada5: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada5: Command Queueing enabled
ada5: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada5: Previously was known as ad22
ada6 at ahcich7 bus 0 scbus10 target 0 lun 0
ada6: WDC WD5000AADS-00M2B0 01.00A01 ATA-8 SATA 2.x device
ada6: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada6: Command Queueing enabled
ada6: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada6: Previously was known as ad24
ada7 at ahcich8 bus 0 scbus11 target 0 lun 0
ada7: WDC WD5000AADS-00M2B0 01.00A01 ATA-8 SATA 2.x device
ada7: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada7: Command Queueing enabled
ada7: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada7: Previously was known as ad26
ada8 at ahcich9 bus 0 scbus12 target 0 lun 0
ada8: WDC WD5000AADS-00M2B0 01.00A01 ATA-8 SATA 2.x device
ada8: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada8: Command Queueing enabled
ada8: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada8: Previously was known as ad28
ada9 at ahcich10 bus 0 scbus13 target 0 lun 0
ada9: MAXTOR STM3160215AS 4.AAB ATA-7 SATA 2.x device
ada9: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada9: Command Queueing enabled
ada9: 152627MB (312581808 512 byte sectors: 16H 63S/T 16383C)
ada9: Previously was known as ad30
ada10 at ahcich11 bus 0 scbus14 target 0 lun 0
ada10: ST31000528AS CC38 ATA-8 SATA 2.x device
ada10: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada10: Command Queueing enabled
ada10: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada10: Previously was known as ad32

root@filekeeper:/dev # gpart show ada10
=>        34  1953525101  ada10  GPT  (931G)
          34         256      1  freebsd-boot  (128k