ZFS RaidZ-2 problems

2012-11-05 Thread Paul Wootton
I've already posted this to freebsd-fs@ but still have no idea as to why 
the below has happened.



On 10/30/12 09:08, Paul Wootton wrote:

Hi,

I have had lots of bad luck with SATA drives and have had them fail on 
me far too often. I started with a 3-drive RAIDZ and lost 2 drives at 
the same time. I upgraded to a 6-drive RAIDZ and lost 2 drives within 
hours of each other, and finally had a 9-drive RAIDZ (1 parity) and 
lost another 2 drives (as luck would have it, this time I had a 90% 
backup on another machine, so I did not lose everything). I finally 
decided that I should switch to RAIDZ2 (my current setup).
Now I have lost 1 drive and the pack is showing as faulted. I have 
tried exporting and reimporting, but that did not help either.
Is this normal? Has anyone got any ideas as to what has happened and 
why?


The fault this time might be cabling, so I might not have lost the 
data, but my understanding was that with RAIDZ-2 you could lose 2 
drives and still have a working pack.
I do know the fault could also be the power supply, controller, etc. I 
can take care of all the hardware.
The issue I have is that I have a 9-drive RAIDZ-2 pack with only 1 disk 
showing as offline, and the pack is showing as faulted.
If the power supply was bouncing and a drive was giving bad data, I 
would expect ZFS to report that 2 drives were faulted (1 offline and 1 
corrupt).


Is there a way with ZDB that I can see why the pool is showing as 
faulted? Can it tell me which drives it thinks are bad, or have bad data?
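
One possible way to approach the ZDB question, offered as a rough sketch rather than a confirmed diagnosis: `zdb -l` dumps the vdev labels stored on a device, and each label records the transaction group (txg) it was last written at. The helper name `label_txgs` below is made up for illustration, and the exact text zdb prints may vary between ZFS versions, so the `txg:` match is an assumption about the label format.

```shell
#!/bin/sh
# label_txgs DEVICE...: for each device, print the distinct "txg:" values
# found in its ZFS labels. The function name is hypothetical; only
# `zdb -l` itself is a real command.
label_txgs() {
    for dev in "$@"; do
        # zdb -l dumps the vdev labels; keep each distinct txg value once.
        txgs=$(zdb -l "$dev" 2>/dev/null |
            awk '$1 == "txg:" && !seen[$2]++ { out = out (out ? " " : "") $2 }
                 END { print out }')
        # An empty result means zdb could not read any label on the device.
        printf '%s: %s\n' "$dev" "${txgs:-no readable label}"
    done
}
```

Running `label_txgs /dev/ada0 /dev/ada1` (and so on for each member) prints one line per device. A device whose txg lags behind the others, or that has no readable label at all, would be a candidate for the one ZFS no longer trusts.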


I do still have the 90% backup of the pool and nothing has really 
changed since that backup, so if someone wants me to try something and 
it blows the pack away, it's not the end of the world.



Cheers
Paul


  pool: storage
 state: FAULTED
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: resilvered 30K in 0h0m with 0 errors on Sun Oct 14 12:52:45 2012
config:

        NAME                  STATE     READ WRITE CKSUM
        storage               FAULTED      0     0     1
          raidz2-0            FAULTED      0     0     6
            ada0              ONLINE       0     0     0
            ada1              ONLINE       0     0     0
            ada2              ONLINE       0     0     0
            1811927559723424  UNAVAIL      0     0     0  was /dev/ada3
            ada4              ONLINE       0     0     0
            ada5              ONLINE       0     0     0
            ada6              ONLINE       0     0     0
            ada7              ONLINE       0     0     0
            ada8              ONLINE       0     0     0
            ada10p4           ONLINE       0     0     0

root@filekeeper:/storage # zpool export storage
root@filekeeper:/storage # zpool import storage
cannot import 'storage': I/O error
Destroy and re-create the pool from
a backup source.

root@filekeeper:/usr/home/paul # uname -a
FreeBSD filekeeper.caspersworld.co.uk 10.0-CURRENT FreeBSD 
10.0-CURRENT #0 r240967: Thu Sep 27 08:01:24 UTC 2012 
r...@filekeeper.caspersworld.co.uk:/usr/obj/usr/src/sys/GENERIC  amd64

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: ZFS RaidZ-2 problems

2012-11-05 Thread Steven Hartland

Yes, RAIDZ2 should tolerate a 2-drive failure without the array faulting, so
something strange is going on there somewhere.
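
That expectation can be written down as a simple check: count the leaf devices that are not ONLINE and compare the count with the parity level. The sketch below assumes a pool with a single raidz vdev and the usual column layout of the `zpool status` device table. The helper name `within_parity` is hypothetical. It only counts devices that ZFS reports as unavailable, so it cannot see bad data on a device that still shows as ONLINE.

```shell
#!/bin/sh
# within_parity POOL PARITY: read `zpool status` output on stdin and report
# whether the number of non-ONLINE leaf devices is within the redundancy of
# a single raidz vdev with PARITY parity drives. (Hypothetical helper.)
within_parity() {
    awk -v pool="$1" -v parity="$2" '
        # The device table starts after the NAME header row.
        $1 == "NAME"    { in_config = 1; next }
        # The trailing "errors:" line ends the device table.
        $1 == "errors:" { in_config = 0 }
        # Skip everything outside the table, plus the pool and raidz rows.
        !in_config || $1 == pool || $1 ~ /^raidz/ { next }
        # A device row has a name, a state and three error counters.
        NF >= 5 && $2 != "ONLINE" { bad++ }
        END {
            printf "%d unavailable, parity %d: %s\n", bad, parity,
                   (bad <= parity) ? "within redundancy" : "beyond redundancy"
        }'
}
```

It would be used as `zpool status storage | within_parity storage 2`. For the listing in this thread, one UNAVAIL device against a parity of 2 comes out as within redundancy, which is why the FAULTED state is unexpected.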

Silly question: what size are the drives, and what driver are you using?

   Regards
   Steve






Re: ZFS RaidZ-2 problems

2012-11-05 Thread Paul Wootton

On 11/05/12 10:49, Steven Hartland wrote:

Yes RAIDZ2 should enable a 2 drive failure without the array faulting so
something strange is going on there somewhere.

That was my thought, but I don't know what or why.


Silly question, what size drives and what driver are you using?


See below

ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: ST3500418AS CC37 ATA-8 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: WDC WD5000AACS-00D0B0 01.01B01 ATA-8 SATA 2.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad6
ada2 at ahcich2 bus 0 scbus3 target 0 lun 0
ada2: MAXTOR STM3500320AS MX15 ATA-8 SATA 2.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada2: Previously was known as ad10
ada3 at ahcich3 bus 0 scbus4 target 0 lun 0
ada3: ST3500410AS CC34 ATA-8 SATA 2.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada3: Previously was known as ad12
ada4 at ahcich5 bus 0 scbus7 target 0 lun 0
ada4: WDC WD5000AADS-00S9B0 01.00A01 ATA-8 SATA 2.x device
ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada4: Previously was known as ad18
ada5 at ahcich6 bus 0 scbus9 target 0 lun 0
ada5: WDC WD5000AADS-00S9B0 01.00A01 ATA-8 SATA 2.x device
ada5: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada5: Command Queueing enabled
ada5: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada5: Previously was known as ad22
ada6 at ahcich7 bus 0 scbus10 target 0 lun 0
ada6: WDC WD5000AADS-00M2B0 01.00A01 ATA-8 SATA 2.x device
ada6: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada6: Command Queueing enabled
ada6: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada6: Previously was known as ad24
ada7 at ahcich8 bus 0 scbus11 target 0 lun 0
ada7: WDC WD5000AADS-00M2B0 01.00A01 ATA-8 SATA 2.x device
ada7: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada7: Command Queueing enabled
ada7: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada7: Previously was known as ad26
ada8 at ahcich9 bus 0 scbus12 target 0 lun 0
ada8: WDC WD5000AADS-00M2B0 01.00A01 ATA-8 SATA 2.x device
ada8: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada8: Command Queueing enabled
ada8: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada8: Previously was known as ad28
ada9 at ahcich10 bus 0 scbus13 target 0 lun 0
ada9: MAXTOR STM3160215AS 4.AAB ATA-7 SATA 2.x device
ada9: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada9: Command Queueing enabled
ada9: 152627MB (312581808 512 byte sectors: 16H 63S/T 16383C)
ada9: Previously was known as ad30
ada10 at ahcich11 bus 0 scbus14 target 0 lun 0
ada10: ST31000528AS CC38 ATA-8 SATA 2.x device
ada10: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada10: Command Queueing enabled
ada10: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada10: Previously was known as ad32

root@filekeeper:/dev # gpart show ada10
=>  34  1953525101  ada10  GPT  (931G)
  34 256  1  freebsd-boot  (128k