Re: Panic on ZFS startup after crash

2008-07-22 Thread Pawel Jakub Dawidek
On Mon, Jul 21, 2008 at 06:18:10PM -0300, Nenhum_de_Nos wrote:
  The ZFS code in 7.0 is the same as in HEAD, so no worries.
 
 I'm trying zfs myself in a small environment at home, but for that I do
 follow 7-STABLE. Is there no need to do that, based on the above
 statement?

There might be some small differences, but the patches I provide here
will apply to 7.0-RELEASE, 7-STABLE and 8-CURRENT.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Panic on ZFS startup after crash

2008-07-22 Thread Nenhum_de_Nos

On Tue, July 22, 2008 06:07, Pawel Jakub Dawidek wrote:
 On Mon, Jul 21, 2008 at 06:18:10PM -0300, Nenhum_de_Nos wrote:
  The ZFS code in 7.0 is the same as in HEAD, so no worries.

 I'm trying zfs myself in a small environment at home, but for that I do
 follow 7-STABLE. Is there no need to do that, based on the above
 statement?

 There might be some small differences, but the patches I provide here
 will apply to 7.0-RELEASE, 7-STABLE and 8-CURRENT.

 --
 Pawel Jakub Dawidek   http://www.wheel.pl
 [EMAIL PROTECTED]   http://www.FreeBSD.org
 FreeBSD committer Am I Evil? Yes, I Am!

OK, but my main concern was running the newest ZFS code that is still
stable :)

Should I keep running 7-STABLE?

thanks,

matheus

-- 
We will call you cygnus,
The God of balance you shall be

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Panic on ZFS startup after crash

2008-07-21 Thread Pawel Jakub Dawidek
On Mon, Jul 21, 2008 at 12:29:54AM +0200, Daniel Eriksson wrote:
 Pawel Jakub Dawidek wrote:
 
  Can you try this patch?
  
  http://people.freebsd.org/~pjd/patches/space_map.c.patch
 
 Now it panics (solaris assert) at line 431 in dmu.c. I'll try to get a
 backtrace in a day or two if it would help.

The backtrace won't help here. I'm afraid your pool's metadata is
corrupted in a way that ZFS can't handle. I saw warnings in your
first e-mail about ZFS not being able to replay the ZIL. Can you try
disabling the ZIL? Something like:

# zpool export name
# kldunload zfs
# kenv vfs.zfs.zil_disable=1
# kldload zfs
# zpool import name

Although I'm not sure if disabling the ZIL will prevent replaying a
previously prepared ZIL. If that doesn't help, I'm afraid the last
suggestion I can provide is to try the latest ZFS version (I can prepare
a patch for you in a few days).

The panic you're seeing is in the dmu_write() function. You could also
try to import the pool read-only, but I just tried doing so with the
'zpool import -o ro name' command and it mounted the file systems
read-write. Not sure why it doesn't work, but I'll try to fix it today.
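
The ZIL-disable sequence suggested earlier in this message could be
wrapped in a small shell function. This is only a sketch: the names
run, reimport_without_zil and DRY_RUN are made up for illustration, and
only the five commands themselves come from the message. By default it
prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: re-import a pool with the ZIL disabled.
# Hypothetical helper names; DRY_RUN defaults to 1 (print only).

run() {
    # Echo the command; execute it only when DRY_RUN is explicitly 0.
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

reimport_without_zil() {
    pool="$1"
    if [ -z "$pool" ]; then
        echo "usage: reimport_without_zil <pool>" >&2
        return 2
    fi
    # Stop at the first step that fails.
    run zpool export "$pool" &&
    run kldunload zfs &&
    run kenv vfs.zfs.zil_disable=1 &&
    run kldload zfs &&
    run zpool import "$pool"
}
```

Running "reimport_without_zil tank02" only lists the steps; setting
DRY_RUN=0 as root would execute them for real.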

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Panic on ZFS startup after crash

2008-07-21 Thread Pawel Jakub Dawidek
On Mon, Jul 21, 2008 at 11:02:36AM +0200, Pawel Jakub Dawidek wrote:
 On Mon, Jul 21, 2008 at 12:29:54AM +0200, Daniel Eriksson wrote:
  Pawel Jakub Dawidek wrote:
  
   Can you try this patch?
   
 http://people.freebsd.org/~pjd/patches/space_map.c.patch
  
  Now it panics (solaris assert) at line 431 in dmu.c. I'll try to get a
  backtrace in a day or two if it would help.
 
 The backtrace won't help here. I'm afraid your pool's metadata is
 corrupted in a way that ZFS can't handle. I saw warnings in your
 first e-mail about ZFS not being able to replay the ZIL. Can you try
 disabling the ZIL? Something like:
 
   # zpool export name
   # kldunload zfs
   # kenv vfs.zfs.zil_disable=1
   # kldload zfs
   # zpool import name
 
 Although I'm not sure if disabling the ZIL will prevent replaying a
 previously prepared ZIL. If that doesn't help, I'm afraid the last
 suggestion I can provide is to try the latest ZFS version (I can prepare
 a patch for you in a few days).
 
 The panic you're seeing is in the dmu_write() function. You could also
 try to import the pool read-only, but I just tried doing so with the
 'zpool import -o ro name' command and it mounted the file systems
 read-write. Not sure why it doesn't work, but I'll try to fix it today.

I fixed the 'zpool import -o ro' problem in HEAD, but you can also patch
your 7.0 sources with this patch:

http://people.freebsd.org/~pjd/patches/opensolaris_vfs.c.2.patch

With this patch applied and the ZIL disabled, try:

# zpool import -o ro name
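
Since the earlier 'zpool import -o ro' attempt still mounted the file
systems read-write, it may be worth checking the result after the
import. A minimal sketch, assuming the usual FreeBSD mount(8) output
format "name on /path (zfs, local, read-only)"; the function name
zfs_rw_mounts is hypothetical.

```shell
#!/bin/sh
# Sketch: read `mount` output on stdin and print every ZFS file system
# that is NOT mounted read-only. Returns 1 if any was found, else 0.
# Assumes lines of the form "name on /path (zfs, local, read-only)".

zfs_rw_mounts() {
    awk '/\(zfs[,)]/ && !/read-only/ { print $1; found = 1 }
         END { exit (found ? 1 : 0) }'
}
```

Used as "mount | zfs_rw_mounts": no output and exit status 0 would
suggest that every mounted ZFS file system is read-only.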

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




RE: Panic on ZFS startup after crash

2008-07-21 Thread Daniel Eriksson
Pawel Jakub Dawidek wrote:

 I'm afraid your pool's metadata is
 corrupted in a way that ZFS can't handle.

Yes, that's my conclusion also. It looks like the intent log is messed
up enough to trigger an assert while ZFS tries to parse/replay it.

 I saw warnings in your
 first e-mail about ZFS not being able to replay the ZIL. Can you try
 disabling the ZIL? Something like:

I've already tried this, and it made no difference. When the box crashed
ZIL was enabled, and for some reason garbage got written into the ZIL.
Now whenever ZFS tries to import the pool it sees a non-empty ZIL and
tries to parse/replay it.

Is there an easy way to trick ZFS into thinking the ZIL is empty?

 Although I'm not sure if disabling the ZIL will prevent replaying a
 previously prepared ZIL.

It won't, unfortunately.

 If that doesn't help, I'm afraid the last suggestion I can
 provide is to try the latest ZFS version (I can prepare a
 patch for you in a few days).

I could probably prepare a temporary install of 8-CURRENT on a spare
drive and boot from that if it's easier for you to make a patch against
CURRENT instead of RELENG_7_0.

 You could also try
 to import the pool read-only, but I just tried doing so with the
 'zpool import -o ro name' command and it mounted the file systems
 read-write. Not sure why it doesn't work, but I'll try to fix
 it today.

I'll try that!

___
Daniel Eriksson (http://www.toomuchdata.com/)


Re: Panic on ZFS startup after crash

2008-07-21 Thread Pawel Jakub Dawidek
On Mon, Jul 21, 2008 at 03:49:24PM +0200, Daniel Eriksson wrote:
 Pawel Jakub Dawidek wrote:
 
  I'm afraid your pool's metadata is
  corrupted in a way that ZFS can't handle.
 
 Yes, that's my conclusion also. It looks like the intent log is messed
 up enough to trigger an assert while ZFS tries to parse/replay it.
 
  I saw warnings in your
  first e-mail about ZFS not being able to replay the ZIL. Can you try
  disabling the ZIL? Something like:
 
 I've already tried this, and it made no difference. When the box crashed
 ZIL was enabled, and for some reason garbage got written into the ZIL.
 Now whenever ZFS tries to import the pool it sees a non-empty ZIL and
 tries to parse/replay it.
 
 Is there an easy way to trick ZFS into thinking the ZIL is empty?

I'll check that.

  If that doesn't help, I'm afraid the last suggestion I can
  provide is to try the latest ZFS version (I can prepare a
  patch for you in a few days).
 
 I could probably prepare a temporary install of 8-CURRENT on a spare
 drive and boot from that if it's easier for you to make a patch against
 CURRENT instead of RELENG_7_0.

The ZFS code in 7.0 is the same as in HEAD, so no worries.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Panic on ZFS startup after crash

2008-07-21 Thread Pawel Jakub Dawidek
On Mon, Jul 21, 2008 at 03:51:56PM +0200, Pawel Jakub Dawidek wrote:
 On Mon, Jul 21, 2008 at 03:49:24PM +0200, Daniel Eriksson wrote:
  Pawel Jakub Dawidek wrote:
  
   I'm afraid your pool's metadata is
   corrupted in a way that ZFS can't handle.
  
  Yes, that's my conclusion also. It looks like the intent log is messed
  up enough to trigger an assert while ZFS tries to parse/replay it.
  
   I saw warnings in your
   first e-mail about ZFS not being able to replay the ZIL. Can you try
   disabling the ZIL? Something like:
  
  I've already tried this, and it made no difference. When the box crashed
  ZIL was enabled, and for some reason garbage got written into the ZIL.
  Now whenever ZFS tries to import the pool it sees a non-empty ZIL and
  tries to parse/replay it.
  
  Is there an easy way to trick ZFS into thinking the ZIL is empty?
 
 I'll check that.

Ok. We can try not replaying the ZIL, leaving it in place, and see what
happens. We can also try to destroy the ZIL without replaying it.

What we do from now on can mess up your pool even further, so you may
want to back up the entire disks first.
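
One possible way to take such a raw copy is dd(1). This is a sketch
only: the device and image paths are placeholders that do not come
from this thread, backup_device and DRY_RUN are made-up names, and the
destination needs at least as much free space as the source device.

```shell
#!/bin/sh
# Sketch: raw copy of a block device to an image file before risky
# recovery attempts. Paths are placeholders; DRY_RUN defaults to 1.
# "bs=1m" is the BSD dd spelling (GNU dd would want "bs=1M").

backup_device() {
    src="$1"
    dst="$2"
    if [ -z "$src" ] || [ -z "$dst" ]; then
        echo "usage: backup_device <source-device> <image-file>" >&2
        return 2
    fi
    # Never overwrite something that already exists.
    if [ -e "$dst" ]; then
        echo "refusing to overwrite $dst" >&2
        return 1
    fi
    echo "+ dd if=$src of=$dst bs=1m"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        dd if="$src" of="$dst" bs=1m
    fi
}
```

For example, "backup_device /dev/da0 /backup/da0.img" (both names
hypothetical) prints the dd command it would run; DRY_RUN=0 runs it.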

To skip replaying the ZIL you need to edit the
/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c file, find the
zil_replay() function and make the head of it look like this:

void
zil_replay(objset_t *os, void *arg, uint64_t *txgp,
    zil_replay_func_t *replay_func[TX_MAX_TYPE])
{
        zilog_t *zilog = dmu_objset_zil(os);
        const zil_header_t *zh = zilog->zl_header;
        zil_replay_arg_t zr;

        /* XXX: Try to skip the ZIL replay. */
        return;

        if (zil_empty(zilog)) {
                zil_destroy(zilog, B_TRUE);
                return;
        }
[...]

If that won't work, we can try to destroy the ZIL without replaying it:

void
zil_replay(objset_t *os, void *arg, uint64_t *txgp,
    zil_replay_func_t *replay_func[TX_MAX_TYPE])
{
        zilog_t *zilog = dmu_objset_zil(os);
        const zil_header_t *zh = zilog->zl_header;
        zil_replay_arg_t zr;

        /* XXX: Destroy the ZIL without replaying it. */
        zil_destroy(zilog, B_FALSE);
        return;

        if (zil_empty(zilog)) {
                zil_destroy(zilog, B_TRUE);
                return;
        }
[...]
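
After either edit the zfs module has to be rebuilt and reloaded before
the change takes effect. A sketch of the rebuild step, assuming the
module directory /usr/src/sys/modules/zfs that appears in the panic
message earlier in the thread and a source tree that matches the
running kernel; run_in, rebuild_zfs_module and DRY_RUN are
hypothetical names.

```shell
#!/bin/sh
# Sketch: rebuild and install the zfs kernel module after editing zil.c.
# Assumes sources under /usr/src; DRY_RUN defaults to 1 (print only).

run_in() {
    # Echo, and optionally run, a command inside the given directory.
    dir="$1"
    shift
    echo "+ (cd $dir && $*)"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        (cd "$dir" && "$@")
    fi
}

rebuild_zfs_module() {
    moddir="${1:-/usr/src/sys/modules/zfs}"
    run_in "$moddir" make &&
    run_in "$moddir" make install
}
```

Once the module is installed, the pool can be exported and the module
reloaded with the same export, kldunload, kldload and import sequence
quoted earlier in the thread.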

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Panic on ZFS startup after crash

2008-07-21 Thread Nenhum_de_Nos
 The ZFS code in 7.0 is the same as in HEAD, so no worries.

I'm trying zfs myself in a small environment at home, but for that I do
follow 7-STABLE. Is there no need to do that, based on the above
statement?

thanks,

matheus

-- 
We will call you cygnus,
The God of balance you shall be



RE: Panic on ZFS startup after crash

2008-07-20 Thread Daniel Eriksson
Pawel Jakub Dawidek wrote:

 Can you try this patch?
 
   http://people.freebsd.org/~pjd/patches/space_map.c.patch

Now it panics (solaris assert) at line 431 in dmu.c. I'll try to get a
backtrace in a day or two if it would help.

Any other suggestions, Pawel?

___
Daniel Eriksson (http://www.toomuchdata.com/)


Re: Panic on ZFS startup after crash

2008-07-19 Thread Alex Trull
I've tried neither of these in your particular case, but they might be
worth a try:

Just a suggestion, but try specifying vfs.zfs.zil_disable=1 or as a kernel
variable in the boot cli.

You may want to try exporting and importing the pool and see how it likes
it then.

--
Alex

On Sat, 2008-07-19 at 10:51 +0200, Daniel Eriksson wrote:
 I have a large ZFS pool that seems to be partially corrupt, causing a
 panic on ZFS startup. This is on a RELENG_7_0 machine.
 
 This is what happens when I try to start ZFS (written down by hand):
 
 ZFS: WARNING: can't process intent log for tank02/home
 ZFS: WARNING: can't process intent log for tank02
 panic: solaris assert: dmu_read(os, smo->smo_object, offset, size,
 entry_map) == 0 (0x5 == 0x0), file:
 /usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/fs/zfs/space_map.c,
 line: 341
 
 The pool sits on top of a geli-encrypted hardware raid-array (Highpoint
 RocketRAID 2340, 8 x 500GB in RAID-5 config). Unfortunately the array
 broke (2 drives disconnected) due to a bad PSU, and this eventually
 crashed the box. When I restarted the box the above message showed up as
 soon as I started ZFS.
 
 It is my understanding that the intent log is emptied on clean shutdown,
 and if it is not empty during startup ZFS tries to replay the
 transactions recorded in it. I assume the initial crash left the intent
 log in an inconsistent state and that ZFS panics on startup due to badly
 formatted data in the intent log.
 
 Is there any way I can recover this pool?
 
 
 ___
 Daniel Eriksson (http://www.toomuchdata.com/)




RE: Panic on ZFS startup after crash

2008-07-19 Thread Daniel Eriksson
Alex Trull wrote:

 I've tried neither of these in your particular case, but they might be
 worth a try:
 
 Just a suggestion, but try specifying vfs.zfs.zil_disable=1 or 
 as a kernel
 variable in the boot cli.
 
 You may want to try exporting and importing the pool and see how it likes
 it then.

I considered disabling the zil but didn't think it would make a
difference (because zfs will most likely check if there is any data in
zil on startup anyway). I'll try it though!

Importing the pool results in the exact same problem. I couldn't export
the live pool because the box panicked, but by making sure the vdev was
unavailable on zfs startup I could export the stale pool, enable the
vdev (by doing a geli attach), and then try to import the pool. No
luck, same panic message.

___
Daniel Eriksson (http://www.toomuchdata.com/)


RE: Panic on ZFS startup after crash

2008-07-19 Thread Daniel Eriksson
Alex Trull wrote:

 Just a suggestion, but try specifying vfs.zfs.zil_disable=1 or 
 as a kernel variable in the boot cli.

I just tried this and unfortunately it didn't work. I got the exact same
kernel panic.

I've been looking through the code to try to find a way to fool ZFS into
thinking the intent log is empty. Is anyone familiar with the code who
could point me in the right direction?

___
Daniel Eriksson (http://www.toomuchdata.com/)


RE: Panic on ZFS startup after crash

2008-07-19 Thread Dmitry Morozovsky
On Sat, 19 Jul 2008, Daniel Eriksson wrote:

DE  Just a suggestion, but try specifying vfs.zfs.zil_disable=1 or 
DE  as a kernel variable in the boot cli.
DE 
DE I just tried this and unfortunately it didn't work. I got the exact same
DE kernel panic.
DE 
DE I've been looking through the code to try to find a way to fool ZFS into
DE thinking the intent log is empty. Is anyone familiar with the code who
DE could point me in the right direction?

You may find it useful to ask Pawel (pjd@) directly. I've CC'd him to
simplify the process ;-)


Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: [EMAIL PROTECTED] ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- [EMAIL PROTECTED] ***



Re: Panic on ZFS startup after crash

2008-07-19 Thread Pawel Jakub Dawidek
On Sat, Jul 19, 2008 at 10:51:21AM +0200, Daniel Eriksson wrote:
 
 I have a large ZFS pool that seems to be partially corrupt, causing a
 panic on ZFS startup. This is on a RELENG_7_0 machine.
 
 This is what happens when I try to start ZFS (written down by hand):
 
 ZFS: WARNING: can't process intent log for tank02/home
 ZFS: WARNING: can't process intent log for tank02
 panic: solaris assert: dmu_read(os, smo->smo_object, offset, size,
 entry_map) == 0 (0x5 == 0x0), file:
 /usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/fs/zfs/space_map.c,
 line: 341
 
 The pool sits on top of a geli-encrypted hardware raid-array (Highpoint
 RocketRAID 2340, 8 x 500GB in RAID-5 config). Unfortunately the array
 broke (2 drives disconnected) due to a bad PSU, and this eventually
 crashed the box. When I restarted the box the above message showed up as
 soon as I started ZFS.
 
 It is my understanding that the intent log is emptied on clean shutdown,
 and if it is not empty during startup ZFS tries to replay the
 transactions recorded in it. I assume the initial crash left the intent
 log in an inconsistent state and that ZFS panics on startup due to badly
 formatted data in the intent log.
 
 Is there any way I can recover this pool?

Can you try this patch?

http://people.freebsd.org/~pjd/patches/space_map.c.patch

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!

