Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
Brad Hill wrote:
>> I've seen reports of a recent Seagate firmware update bricking drives
>> again.
>>
>> What's the output of 'zpool import' from the LiveCD? It sounds like
>> more than 1 drive is dropping off.
>
> r...@opensolaris:~# zpool import
>   pool: tank
>     id: 16342816386332636568
>  state: FAULTED
> status: The pool was last accessed by another system.
> action: The pool cannot be imported due to damaged devices or data.
>         The pool may be active on another system, but can be imported using
>         the '-f' flag.
>    see: http://www.sun.com/msg/ZFS-8000-EY
> config:
>
>         tank        FAULTED  corrupted data
>           raidz1    DEGRADED
>             c6t0d0  ONLINE
>             c6t1d0  ONLINE
>             c6t2d0  ONLINE
>             c6t3d0  UNAVAIL  cannot open
>             c6t4d0  ONLINE
>
>   pool: rpool
>     id: 9891756864015178061
>  state: ONLINE
> status: The pool was last accessed by another system.
> action: The pool can be imported using its name or numeric identifier and
>         the '-f' flag.
>    see: http://www.sun.com/msg/ZFS-8000-EY
> config:
>
>         rpool     ONLINE
>           c3d0s0  ONLINE

1.) Here's a similar report from last summer from someone running ZFS on
FreeBSD. No resolution there either: "raidz vdev marked faulted with only
one faulted disk"
http://kerneltrap.org/index.php?q=mailarchive/freebsd-fs/2008/6/15/2132754

2.) This old thread from Dec 2007, about a different raidz1 problem and
titled 'Faulted raidz1 shows the same device twice', suggests trying these
commands (see the link for the context they were run under):
http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg13214.html

# zdb -l /dev/dsk/c18t0d0
# zpool export external
# zpool import external
# zpool clear external
# zpool scrub external
# zpool clear external

3.) Do you have ECC RAM? Have you verified that your memory, CPU, and
motherboard are reliable?

4.) 'Bad exchange descriptor' is mentioned very sparingly across the net,
mostly in system error tables. Also here:
http://opensolaris.org/jive/thread.jspa?threadID=88486&tstart=165

5.) More raidz setup caveats, at least on MacOS:
http://lists.macosforge.org/pipermail/zfs-discuss/2008-March/000346.html

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
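As a footnote to item 2.) above: since zdb itself can misbehave on a damaged label, a cruder sanity check for leftover ZFS label data is to scan the raw device for the uberblock magic number, 0x00bab10c. This is only a hedged sketch, run against a scratch image file rather than a real device (the image name and planted offset are made up for the demo; on real hardware you would read from e.g. /dev/dsk/c6t3d0 as root):

```shell
# Scratch image standing in for a disk; the planted offset is arbitrary.
IMG=scratch.img
dd if=/dev/zero of="$IMG" bs=1M count=1 2>/dev/null
# Plant the ZFS uberblock magic (0x00bab10c, stored little-endian on x86)
# so the scan below has something to find.
printf '\x0c\xb1\xba\x00' | dd of="$IMG" bs=1 seek=131072 conv=notrunc 2>/dev/null
# Scan for the magic bytes; any hit suggests ZFS label remnants are present.
od -A d -t x1 "$IMG" | grep -c '0c b1 ba 00'
# prints: 1
```

A count of zero on a supposedly blanked disk is a reasonable (though not conclusive) sign the old labels are gone.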
Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
Take the new disk out as well... a foreign/bad non-zero disk label may cause trouble too. I've experienced tool core dumps with a foreign disk (partition) label, which might be the case here if it is a recycled replacement disk. (In my case it was fixed by plugging the disk into a Linux desktop and "blanking" it by wiping the label with "dd if=/dev/zero of=/dev/sdc bs=512 count=4", where /dev/sdc was the device it got assigned; check with "fdisk -l".)
--
This message posted from opensolaris.org
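The dd command quoted above can be sketched out more fully. This demo runs against a scratch image file instead of a real device (disk.img is a stand-in; on real hardware you would point DEV at the device node, e.g. /dev/sdc, and getting that name wrong destroys data, so double-check with 'fdisk -l' first):

```shell
# Hedged sketch of "blanking" a recycled disk before reuse, practiced on
# a scratch image file standing in for the disk.
DEV=disk.img
dd if=/dev/zero of="$DEV" bs=1M count=10 2>/dev/null      # fake 10 MiB "disk"

# The quoted one-liner wipes only the first 4 sectors. ZFS actually keeps
# four 256 KiB labels: two at the front and two at the back of the device,
# so a thorough wipe clears both ends.
dd if=/dev/zero of="$DEV" bs=512K count=1 conv=notrunc 2>/dev/null
SIZE=$(stat -c %s "$DEV" 2>/dev/null || stat -f %z "$DEV")
dd if=/dev/zero of="$DEV" bs=512K count=1 conv=notrunc \
   seek=$(( SIZE / 524288 - 1 )) 2>/dev/null
echo "wiped front and back labels of $DEV"
```

Wiping only the front (as in the quoted command) is often enough to stop tools from tripping over a stale partition table, but the back labels are what let ZFS still recognize the disk as a pool member.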
Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
Yes. I have disconnected the bad disk and booted with nothing in the slot, and also with a known good replacement disk on the same SATA port. Doesn't change anything.

Running 2008.11 on the box and 2008.11 snv_101b_rc2 on the LiveCD. I'll give it a shot booting from the latest build and see if that makes any kind of difference.

Thanks for the suggestions.

Brad

> Just a thought, but have you physically disconnected the bad disk? It's
> not unheard of for a bad disk to cause problems with others.
>
> Failing that, it's the "corrupted data" bit that's worrying me. It sounds
> like you may have other corruption on the pool (always a risk with
> single-parity raid), but I'm worried that it's not giving you any more
> details as to what's wrong.
>
> Also, what version of OpenSolaris are you running? Could you maybe try
> booting off a CD of the latest build? There are often improvements in the
> way ZFS copes with errors, so it's worth a try. I don't think it's likely
> to help, but I wouldn't discount it.
Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
Just a thought, but have you physically disconnected the bad disk? It's not unheard of for a bad disk to cause problems with others.

Failing that, it's the "corrupted data" bit that's worrying me. It sounds like you may have other corruption on the pool (always a risk with single-parity raid), but I'm worried that it's not giving you any more details as to what's wrong.

Also, what version of OpenSolaris are you running? Could you maybe try booting off a CD of the latest build? There are often improvements in the way ZFS copes with errors, so it's worth a try. I don't think it's likely to help, but I wouldn't discount it.
Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
I do, thank you. The disk that went out sounds like it had a head crash or some such: loud clicking shortly after spin-up, then it spins down and gives me nothing. The BIOS doesn't even detect it properly to do a firmware update.

> Do you know the 7200.11 has firmware bugs?
>
> Go to the Seagate website to check.
Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
This is outside the scope of my knowledge/experience. Maybe there is now a core file you can examine? That might help you at least see what's going on.

On Tue, Jan 27, 2009 at 10:32 PM, Brad Hill wrote:
> r...@opensolaris:~# zpool import -f tank
> internal error: Bad exchange descriptor
> Abort (core dumped)
>
> Hoping someone has seen that before... the Google is seriously letting me
> down on that one.
>
>> I guess you could try 'zpool import -f'. This is a pretty odd status, I
>> think. I'm pretty sure raidz1 should survive a single disk failure.
>>
>> Perhaps a more knowledgeable list member can explain.
Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
r...@opensolaris:~# zpool import -f tank
internal error: Bad exchange descriptor
Abort (core dumped)

Hoping someone has seen that before... the Google is seriously letting me down on that one.

> I guess you could try 'zpool import -f'. This is a pretty odd status, I
> think. I'm pretty sure raidz1 should survive a single disk failure.
>
> Perhaps a more knowledgeable list member can explain.
Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
I guess you could try 'zpool import -f'. This is a pretty odd status, I think. I'm pretty sure raidz1 should survive a single disk failure.

Perhaps a more knowledgeable list member can explain.

On Sat, Jan 24, 2009 at 12:48 PM, Brad Hill wrote:
>> I've seen reports of a recent Seagate firmware update bricking drives
>> again.
>>
>> What's the output of 'zpool import' from the LiveCD? It sounds like
>> more than 1 drive is dropping off.
>
> r...@opensolaris:~# zpool import
>   pool: tank
>     id: 16342816386332636568
>  state: FAULTED
> status: The pool was last accessed by another system.
> action: The pool cannot be imported due to damaged devices or data.
>         The pool may be active on another system, but can be imported using
>         the '-f' flag.
>    see: http://www.sun.com/msg/ZFS-8000-EY
> config:
>
>         tank        FAULTED  corrupted data
>           raidz1    DEGRADED
>             c6t0d0  ONLINE
>             c6t1d0  ONLINE
>             c6t2d0  ONLINE
>             c6t3d0  UNAVAIL  cannot open
>             c6t4d0  ONLINE
>
>   pool: rpool
>     id: 9891756864015178061
>  state: ONLINE
> status: The pool was last accessed by another system.
> action: The pool can be imported using its name or numeric identifier and
>         the '-f' flag.
>    see: http://www.sun.com/msg/ZFS-8000-EY
> config:
>
>         rpool     ONLINE
>           c3d0s0  ONLINE
Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
Do you know the 7200.11 has firmware bugs?

Go to the Seagate website to check.
Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
Any ideas on this? It looks like a potential bug to me, or there is something that I'm not seeing. Thanks again!
Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
> I've seen reports of a recent Seagate firmware update bricking drives
> again.
>
> What's the output of 'zpool import' from the LiveCD? It sounds like
> more than 1 drive is dropping off.

r...@opensolaris:~# zpool import
  pool: tank
    id: 16342816386332636568
 state: FAULTED
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tank        FAULTED  corrupted data
          raidz1    DEGRADED
            c6t0d0  ONLINE
            c6t1d0  ONLINE
            c6t2d0  ONLINE
            c6t3d0  UNAVAIL  cannot open
            c6t4d0  ONLINE

  pool: rpool
    id: 9891756864015178061
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        rpool     ONLINE
          c3d0s0  ONLINE
Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
I've seen reports of a recent Seagate firmware update bricking drives again.

What's the output of 'zpool import' from the LiveCD? It sounds like more than 1 drive is dropping off.

On Thu, Jan 22, 2009 at 10:52 PM, Brad Hill wrote:
>> I would get a new 1.5 TB and make sure it has the new firmware and
>> replace c6t3d0 right away - even if someone here comes up with a magic
>> solution, you don't want to wait for another drive to fail.
>
> The replacement disk showed up today but I'm unable to replace the one
> marked UNAVAIL:
>
> r...@blitz:~# zpool replace tank c6t3d0
> cannot open 'tank': pool is unavailable
>
>> I would in this case also immediately export the pool (to prevent any
>> write attempts) and see about a firmware update for the failed drive
>> (probably need Windows for this).
>
> While I didn't export first, I did boot with a LiveCD and tried to force
> the import with that:
>
> r...@opensolaris:~# zpool import -f tank
> internal error: Bad exchange descriptor
> Abort (core dumped)
>
> Hopefully someone on this list understands what situation I am in and how
> to resolve it. Again, many thanks in advance for any suggestions you all
> have to offer.
Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
> I would get a new 1.5 TB and make sure it has the new firmware and
> replace c6t3d0 right away - even if someone here comes up with a magic
> solution, you don't want to wait for another drive to fail.

The replacement disk showed up today but I'm unable to replace the one marked UNAVAIL:

r...@blitz:~# zpool replace tank c6t3d0
cannot open 'tank': pool is unavailable

> I would in this case also immediately export the pool (to prevent any
> write attempts) and see about a firmware update for the failed drive
> (probably need Windows for this).

While I didn't export first, I did boot with a LiveCD and tried to force the import with that:

r...@opensolaris:~# zpool import -f tank
internal error: Bad exchange descriptor
Abort (core dumped)

Hopefully someone on this list understands what situation I am in and how to resolve it. Again, many thanks in advance for any suggestions you all have to offer.