Re: Panic on ZFS startup after crash
On Mon, Jul 21, 2008 at 06:18:10PM -0300, Nenhum_de_Nos wrote:
> > The ZFS code in 7.0 is the same as in HEAD, so no worries.
>
> I'm trying ZFS myself in a small environment at home, but for that I follow 7-STABLE. Is there no need to do that, based on the above statement?

There might be some small differences, but the patches I provide here will apply to 7.0-RELEASE, 7-STABLE and 8-CURRENT.

--
Pawel Jakub Dawidek                       http://www.wheel.pl
[EMAIL PROTECTED]                         http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
Re: Panic on ZFS startup after crash
On Tue, July 22, 2008 06:07, Pawel Jakub Dawidek wrote:
> On Mon, Jul 21, 2008 at 06:18:10PM -0300, Nenhum_de_Nos wrote:
> > > The ZFS code in 7.0 is the same as in HEAD, so no worries.
> >
> > I'm trying ZFS myself in a small environment at home, but for that I follow 7-STABLE. Is there no need to do that, based on the above statement?
>
> There might be some small differences, but the patches I provide here will apply to 7.0-RELEASE, 7-STABLE and 8-CURRENT.

OK, but what I mainly wondered was whether I would be running the newest ZFS code that is still stable :) Should I keep running 7-STABLE?

thanks,

matheus

--
We will call you cygnus,
The God of balance you shall be

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Panic on ZFS startup after crash
On Mon, Jul 21, 2008 at 12:29:54AM +0200, Daniel Eriksson wrote:
> Pawel Jakub Dawidek wrote:
> > Can you try this patch?
> >
> > http://people.freebsd.org/~pjd/patches/space_map.c.patch
>
> Now it panics (solaris assert) at line 431 in dmu.c. I'll try to get a backtrace in a day or two if it would help.

The backtrace won't help here. I'm afraid your pool's metadata is corrupted in a way that ZFS can't handle. I saw warnings in your first e-mail about ZFS not being able to replay the ZIL. Can you try disabling the ZIL? Something like:

    # zpool export name
    # kldunload zfs
    # kenv vfs.zfs.zil_disable=1
    # kldload zfs
    # zpool import name

Although I'm not sure whether disabling the ZIL will prevent replaying a previously prepared ZIL.

If that doesn't help, I'm afraid the last suggestion I can offer is to try the latest ZFS version (I can prepare a patch for you in a few days).

The panic you're seeing is in the dmu_write() function.

You could also try to import the pool read-only, but I just tried doing so with the 'zpool import -o ro name' command and it mounts the file systems read-write. Not sure why it doesn't work, but I'll try to fix it today.

--
Pawel Jakub Dawidek                       http://www.wheel.pl
[EMAIL PROTECTED]                         http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
Re: Panic on ZFS startup after crash
On Mon, Jul 21, 2008 at 11:02:36AM +0200, Pawel Jakub Dawidek wrote:
> On Mon, Jul 21, 2008 at 12:29:54AM +0200, Daniel Eriksson wrote:
> > Pawel Jakub Dawidek wrote:
> > > Can you try this patch?
> > >
> > > http://people.freebsd.org/~pjd/patches/space_map.c.patch
> >
> > Now it panics (solaris assert) at line 431 in dmu.c. I'll try to get a backtrace in a day or two if it would help.
>
> The backtrace won't help here. I'm afraid your pool's metadata is corrupted in a way that ZFS can't handle. I saw warnings in your first e-mail about ZFS not being able to replay the ZIL. Can you try disabling the ZIL? Something like:
>
>     # zpool export name
>     # kldunload zfs
>     # kenv vfs.zfs.zil_disable=1
>     # kldload zfs
>     # zpool import name
>
> Although I'm not sure whether disabling the ZIL will prevent replaying a previously prepared ZIL.
>
> If that doesn't help, I'm afraid the last suggestion I can offer is to try the latest ZFS version (I can prepare a patch for you in a few days).
>
> The panic you're seeing is in the dmu_write() function.
>
> You could also try to import the pool read-only, but I just tried doing so with the 'zpool import -o ro name' command and it mounts the file systems read-write. Not sure why it doesn't work, but I'll try to fix it today.

I fixed the 'zpool import -o ro' problem in HEAD, but you can also patch your 7.0 sources with this patch:

    http://people.freebsd.org/~pjd/patches/opensolaris_vfs.c.2.patch

With this patch applied and the ZIL disabled, try:

    # zpool import -o ro name

--
Pawel Jakub Dawidek                       http://www.wheel.pl
[EMAIL PROTECTED]                         http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
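For anyone following along, the sequence above (disable the ZIL, reload the module, import read-only) could be wrapped in a small script. This is only a sketch under a few assumptions: the pool name tank02 is taken from the panic output elsewhere in this thread, the read-only import is assumed to need the opensolaris_vfs.c patch mentioned above, and DRY_RUN, run() and reimport_readonly() are made-up helper names, not anything shipped with FreeBSD. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: re-import a pool read-only with the ZIL disabled.
# DRY_RUN, run() and reimport_readonly() are hypothetical helpers; the
# commands themselves are the ones quoted in the message above.

DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands (default)

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "# $*"
    else
        "$@"
    fi
}

reimport_readonly() {
    pool="$1"
    # The export may fail if the pool never got imported; keep going.
    run zpool export "$pool" ||
        echo "export failed (pool may not be imported), continuing" >&2
    run kldunload zfs &&
    run kenv vfs.zfs.zil_disable=1 &&
    run kldload zfs &&
    run zpool import -o ro "$pool"
}

reimport_readonly "${1:-tank02}"
```

Running it with DRY_RUN=0 as root would execute the commands for real, so it is worth reading the printed list first.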
RE: Panic on ZFS startup after crash
Pawel Jakub Dawidek wrote:
> I'm afraid your pool's metadata is corrupted in a way that ZFS can't handle.

Yes, that's my conclusion also. It looks like the intent log is messed up enough to trigger an assert while ZFS tries to parse/replay it.

> I saw warnings in your first e-mail about ZFS not being able to replay the ZIL. Can you try disabling the ZIL? Something like:

I've already tried this, and it made no difference. When the box crashed the ZIL was enabled, and for some reason garbage got written into the ZIL. Now whenever ZFS tries to import the pool it sees a non-empty ZIL and tries to parse/replay it. Is there an easy way to trick ZFS into thinking the ZIL is empty?

> Although I'm not sure whether disabling the ZIL will prevent replaying a previously prepared ZIL.

It won't, unfortunately.

> If that doesn't help, I'm afraid the last suggestion I can offer is to try the latest ZFS version (I can prepare a patch for you in a few days).

I could probably prepare a temporary install of 8-CURRENT on a spare drive and boot from that if it's easier for you to make a patch against CURRENT instead of RELENG_7_0.

> You could also try to import the pool read-only, but I just tried doing so with the 'zpool import -o ro name' command and it mounts the file systems read-write. Not sure why it doesn't work, but I'll try to fix it today.

I'll try that!

___
Daniel Eriksson (http://www.toomuchdata.com/)
Re: Panic on ZFS startup after crash
On Mon, Jul 21, 2008 at 03:49:24PM +0200, Daniel Eriksson wrote:
> Pawel Jakub Dawidek wrote:
> > I'm afraid your pool's metadata is corrupted in a way that ZFS can't handle.
>
> Yes, that's my conclusion also. It looks like the intent log is messed up enough to trigger an assert while ZFS tries to parse/replay it.
>
> > I saw warnings in your first e-mail about ZFS not being able to replay the ZIL. Can you try disabling the ZIL? Something like:
>
> I've already tried this, and it made no difference. When the box crashed the ZIL was enabled, and for some reason garbage got written into the ZIL. Now whenever ZFS tries to import the pool it sees a non-empty ZIL and tries to parse/replay it. Is there an easy way to trick ZFS into thinking the ZIL is empty?

I'll check that.

> > If that doesn't help, I'm afraid the last suggestion I can offer is to try the latest ZFS version (I can prepare a patch for you in a few days).
>
> I could probably prepare a temporary install of 8-CURRENT on a spare drive and boot from that if it's easier for you to make a patch against CURRENT instead of RELENG_7_0.

The ZFS code in 7.0 is the same as in HEAD, so no worries.

--
Pawel Jakub Dawidek                       http://www.wheel.pl
[EMAIL PROTECTED]                         http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
Re: Panic on ZFS startup after crash
On Mon, Jul 21, 2008 at 03:51:56PM +0200, Pawel Jakub Dawidek wrote:
> On Mon, Jul 21, 2008 at 03:49:24PM +0200, Daniel Eriksson wrote:
> > Pawel Jakub Dawidek wrote:
> > > I'm afraid your pool's metadata is corrupted in a way that ZFS can't handle.
> >
> > Yes, that's my conclusion also. It looks like the intent log is messed up enough to trigger an assert while ZFS tries to parse/replay it.
> >
> > > I saw warnings in your first e-mail about ZFS not being able to replay the ZIL. Can you try disabling the ZIL? Something like:
> >
> > I've already tried this, and it made no difference. When the box crashed the ZIL was enabled, and for some reason garbage got written into the ZIL. Now whenever ZFS tries to import the pool it sees a non-empty ZIL and tries to parse/replay it. Is there an easy way to trick ZFS into thinking the ZIL is empty?
>
> I'll check that.

OK. We can try not replaying the ZIL, leaving it in place, and see what happens. We can also try to destroy the ZIL without replaying it. What we do from now on can mess up your pool even further, so you may want to back up the entire disks first.

To skip replaying the ZIL, edit the /sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c file, find the zil_replay() function and make the head of it look like this:

    void
    zil_replay(objset_t *os, void *arg, uint64_t *txgp,
        zil_replay_func_t *replay_func[TX_MAX_TYPE])
    {
        zilog_t *zilog = dmu_objset_zil(os);
        const zil_header_t *zh = zilog->zl_header;
        zil_replay_arg_t zr;

        /* XXX: Try to skip the ZIL replay. */
        return;

        if (zil_empty(zilog)) {
            zil_destroy(zilog, B_TRUE);
            return;
        }
    [...]

If that doesn't work, we can try to destroy the ZIL without replaying it:

    void
    zil_replay(objset_t *os, void *arg, uint64_t *txgp,
        zil_replay_func_t *replay_func[TX_MAX_TYPE])
    {
        zilog_t *zilog = dmu_objset_zil(os);
        const zil_header_t *zh = zilog->zl_header;
        zil_replay_arg_t zr;

        /* XXX: Destroy the ZIL without replaying it. */
        zil_destroy(zilog, B_FALSE);
        return;

        if (zil_empty(zilog)) {
            zil_destroy(zilog, B_TRUE);
            return;
        }
    [...]

--
Pawel Jakub Dawidek                       http://www.wheel.pl
[EMAIL PROTECTED]                         http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
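An edit to zil.c like the ones above only takes effect once the zfs module has been rebuilt and reloaded. A rough sketch of that step follows, assuming ZFS is built as a module from /usr/src/sys/modules/zfs (the directory that appears in the panic message in this thread). DRY_RUN, MODDIR, run() and rebuild_zfs_module() are names invented for this example; by default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch only: rebuild and reload zfs.ko after hand-editing zil.c.
# MODDIR comes from the path in the panic message; DRY_RUN, run() and
# rebuild_zfs_module() are hypothetical names used for illustration.

MODDIR="${MODDIR:-/usr/src/sys/modules/zfs}"
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands (default)

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "# $*"
    else
        "$@"
    fi
}

rebuild_zfs_module() {
    run cd "$MODDIR" || return 1
    run make &&
    run make install || return 1
    # Unloading fails if the module is not loaded; that is not fatal here.
    run kldunload zfs ||
        echo "kldunload failed (module may not be loaded), continuing" >&2
    run kldload zfs
}

rebuild_zfs_module
```

Once the module is reloaded, the pool import can be retried as described earlier in the thread.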
Re: Panic on ZFS startup after crash
> The ZFS code in 7.0 is the same as in HEAD, so no worries.

I'm trying ZFS myself in a small environment at home, but for that I follow 7-STABLE. Is there no need to do that, based on the above statement?

thanks,

matheus

--
We will call you cygnus,
The God of balance you shall be
RE: Panic on ZFS startup after crash
Pawel Jakub Dawidek wrote:
> Can you try this patch?
>
> http://people.freebsd.org/~pjd/patches/space_map.c.patch

Now it panics (solaris assert) at line 431 in dmu.c. I'll try to get a backtrace in a day or two if it would help.

Any other suggestions, Pawel?

___
Daniel Eriksson (http://www.toomuchdata.com/)
Re: Panic on ZFS startup after crash
I've tried neither of these in your particular case, but they might be worth a try:

Just a suggestion, but try specifying vfs.zfs.zil_disable=1 as a kernel variable in the boot CLI.

You may want to try exporting and importing the pool and see how it likes it then.

--
Alex

On Sat, 2008-07-19 at 10:51 +0200, Daniel Eriksson wrote:
> I have a large ZFS pool that seems to be partially corrupt, causing a panic on ZFS startup. This is on a RELENG_7_0 machine.
>
> This is what happens when I try to start ZFS (written down by hand):
>
>     ZFS: WARNING: can't process intent log for tank02/home
>     ZFS: WARNING: can't process intent log for tank02
>     panic: solaris assert: dmu_read(os, smo->smo_object, offset, size, entry_map) == 0 (0x5 == 0x0), file: /usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/fs/zfs/space_map.c, line: 341
>
> The pool sits on top of a geli-encrypted hardware raid-array (Highpoint RocketRAID 2340, 8 x 500GB in RAID-5 config). Unfortunately the array broke (2 drives disconnected) due to a bad PSU, and this eventually crashed the box. When I restarted the box the above message showed up as soon as I started ZFS.
>
> It is my understanding that the intent log is emptied on clean shutdown, and if it is not empty during startup ZFS tries to replay the transactions recorded in it. I assume the initial crash left the intent log in an inconsistent state and that ZFS panics on startup due to badly formatted data in the intent log.
>
> Is there any way I can recover this pool?
>
> ___
> Daniel Eriksson (http://www.toomuchdata.com/)
RE: Panic on ZFS startup after crash
Alex Trull wrote:
> I've tried neither of these in your particular case, but they might be worth a try:
>
> Just a suggestion, but try specifying vfs.zfs.zil_disable=1 as a kernel variable in the boot CLI.
>
> You may want to try exporting and importing the pool and see how it likes it then.

I considered disabling the ZIL but didn't think it would make a difference (because ZFS will most likely check whether there is any data in the ZIL on startup anyway). I'll try it though!

Importing the pool results in the exact same problem. I couldn't export the live pool because the box panicked, but by making sure the vdev was unavailable on ZFS startup I could export the stale pool and enable the vdev (by doing a geli attach), and then I tried to import the pool. No luck, same panic message.

___
Daniel Eriksson (http://www.toomuchdata.com/)
RE: Panic on ZFS startup after crash
Alex Trull wrote:
> Just a suggestion, but try specifying vfs.zfs.zil_disable=1 as a kernel variable in the boot CLI.

I just tried this and unfortunately it didn't work. I got the exact same kernel panic.

I've been looking through the code to try to find a way to fool ZFS into thinking the intent log is empty. Is anyone familiar with the code who could point me in the right direction?

___
Daniel Eriksson (http://www.toomuchdata.com/)
RE: Panic on ZFS startup after crash
On Sat, 19 Jul 2008, Daniel Eriksson wrote:

DE> > Just a suggestion, but try specifying vfs.zfs.zil_disable=1
DE> > as a kernel variable in the boot CLI.
DE>
DE> I just tried this and unfortunately it didn't work. I got the exact same
DE> kernel panic.
DE>
DE> I've been looking through the code to try to find a way to fool ZFS into
DE> thinking the intent log is empty. Is anyone familiar with the code who
DE> could point me in the right direction?

You may find it useful to ask Pawel (pjd@) directly. I've CC'd him to simplify the process ;-)

Sincerely,
D.Marck                                   [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer:                      [EMAIL PROTECTED] ]
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- [EMAIL PROTECTED] ***
Re: Panic on ZFS startup after crash
On Sat, Jul 19, 2008 at 10:51:21AM +0200, Daniel Eriksson wrote:
> I have a large ZFS pool that seems to be partially corrupt, causing a panic on ZFS startup. This is on a RELENG_7_0 machine.
>
> This is what happens when I try to start ZFS (written down by hand):
>
>     ZFS: WARNING: can't process intent log for tank02/home
>     ZFS: WARNING: can't process intent log for tank02
>     panic: solaris assert: dmu_read(os, smo->smo_object, offset, size, entry_map) == 0 (0x5 == 0x0), file: /usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/fs/zfs/space_map.c, line: 341
>
> The pool sits on top of a geli-encrypted hardware raid-array (Highpoint RocketRAID 2340, 8 x 500GB in RAID-5 config). Unfortunately the array broke (2 drives disconnected) due to a bad PSU, and this eventually crashed the box. When I restarted the box the above message showed up as soon as I started ZFS.
>
> It is my understanding that the intent log is emptied on clean shutdown, and if it is not empty during startup ZFS tries to replay the transactions recorded in it. I assume the initial crash left the intent log in an inconsistent state and that ZFS panics on startup due to badly formatted data in the intent log.
>
> Is there any way I can recover this pool?

Can you try this patch?

    http://people.freebsd.org/~pjd/patches/space_map.c.patch

--
Pawel Jakub Dawidek                       http://www.wheel.pl
[EMAIL PROTECTED]                         http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
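A side note on reading the panic line quoted above: as far as I can tell, the "(0x5 == 0x0)" pair shows the value dmu_read() actually returned next to the expected 0, and errno 5 is EIO (an I/O error), which would fit an array that lost two drives. The following is a small sketch for pulling that value out of a hand-copied panic line. assert_errno() is a made-up helper, and the errno table covers only a handful of well-known values.

```shell
#!/bin/sh
# Sketch only: extract the failing value from a "solaris assert" panic
# line and name a few common errno values. assert_errno() is hypothetical.

assert_errno() {
    # Grab the left-hand side of the "(0x5 == 0x0)" pair.
    hex=$(printf '%s\n' "$1" |
        sed -n 's/.*(\(0x[0-9a-fA-F]*\) == 0x[0-9a-fA-F]*).*/\1/p')
    [ -n "$hex" ] || return 1
    num=$(($hex))
    case "$num" in
        1)  name=EPERM ;;
        2)  name=ENOENT ;;
        5)  name=EIO ;;
        6)  name=ENXIO ;;
        12) name=ENOMEM ;;
        *)  name=unknown ;;
    esac
    echo "$num $name"
}

# Prints "5 EIO" for the panic message from this thread.
assert_errno 'panic: solaris assert: dmu_read(os, smo->smo_object, offset, size, entry_map) == 0 (0x5 == 0x0)'
```

The function returns non-zero when the line contains no such pair, so it can be used in a pipeline over saved console output.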