On 7/7/14, 3:33 AM, Jan Schmidt via illumos-zfs wrote:
On Wed, Jun 25, 2014 at 16:15 (+0200), Keith Wesolowski via illumos-zfs wrote:
On Wed, Jun 25, 2014 at 01:47:54PM +0200, Jan Schmidt via illumos-zfs wrote:

That patch looks somewhat promising, though I have not tried it yet. How did you
decide which of the overlapping space map ranges to drop? From my understanding,
either range could be the one that's currently correct, couldn't it?
It's actually worse than that, because there are a lot of different
cases, depending on whether the overlapping ranges are alloc or free,
whether there are overlapping sub-ranges within them, whether they're
partial or complete overlaps, etc.  And then there is the possibility of
subsequent ranges that partially overlap the previous bad ones.  You
didn't mention which form of corruption you're hitting or how severe it
is, so I don't know which cases might apply to you.  zdb is helpful in
getting a handle on that.
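The taxonomy of cases Keith describes can be made concrete with a small sketch (my own illustration, not code from any patch): each space map segment is a half-open [start, end) interval, matching the notation in the kernel warnings quoted later in the thread, and two segments can be disjoint, partially overlapping, or one can contain the other.

```python
# Hypothetical sketch (not from the patch): classifying how two
# space map segments relate. Segments are half-open [start, end)
# intervals, as in the zfs kernel warnings quoted in this thread.

def classify_overlap(a, b):
    """Describe how segment a relates to segment b."""
    a_start, a_end = a
    b_start, b_end = b
    if a_end <= b_start or b_end <= a_start:
        return "disjoint"
    if b_start <= a_start and a_end <= b_end:
        return "contained"   # a lies entirely within b (complete overlap)
    if a_start <= b_start and b_end <= a_end:
        return "contains"    # a fully covers b
    return "partial"         # the segments overlap only at one edge
```

The real repair logic must additionally know whether each record is an alloc or a free, which multiplies these geometric cases into the many situations Keith mentions.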

I have a different patch (George gets most of the credit, I take most of
the blame) that I used to recover spacemap corruption we had at Joyent
(albeit from a different cause, 4504).  It's intended for one-time use;
you boot it; it fixes the spacemaps by leaking ambiguous regions
(preferring to lose a little space rather than risk later overwriting of
data) and condenses them back out; then you reboot onto normal bits
again.  This covers a lot more cases; I tested many of them, but there
may yet be edge cases that aren't addressed.  I recommend building a
libzpool with this first and trying zdb with that before booting with
the zfs module.
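The "leak rather than risk" strategy amounts to coalescing conflicting entries and keeping their union in the allocated set forever. A minimal sketch of that idea (my own illustration under that assumption, not the patch's actual code):

```python
def coalesce(segments):
    """Merge overlapping or touching [start, end) segments into their union.

    Hypothetical sketch of the "leak ambiguous regions" idea: any region
    covered by conflicting space map entries stays allocated permanently
    (leaked), rather than risk handing the same blocks out twice.
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous segment: extend the union.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]
```

Losing the ambiguous bytes is strictly safer than guessing which record was correct, since a wrong guess could let the allocator overwrite live data.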
Thanks for the explanation. We recovered our data. Using the most recent illumos
code already helped in importing the pool read-only.

This comes with absolutely no warranty of any kind and should be used
only where dumping the data somewhere else (harder than you might think,
since you can't create snapshots in read-only mode) and recreating the
pool is not an option.  It's on you to understand what it does and why
and to satisfy yourself that it will solve your problem safely before
using it.  The comments might help a little, but you're really on your
own.

See
https://github.com/wesolows/illumos-joyent/commit/dc4d7e06c8e0af213619f0aa517d819172911005
After backing up all data, we applied this patch, and importing the pool
read-write no longer crashed, printing ...

Jul  1 11:00:44 hostname genunix: [ID 882369 kern.warning] WARNING: zfs: freeing
overlapping segments: [fba5d0cee00,fba5d0cfa00) existing segment
[fba5d05e600,fba5d0cf400)
Jul  1 11:02:59 hostname genunix: [ID 882369 kern.warning] WARNING: zfs: freeing
overlapping segments: [12cf8b202400,12cf8b203000) existing segment
[12cf8b1d0c00,12cf8b202a00)

... several times (roughly 10 times each). After that, a full scrub of the pool
succeeded without any messages.

Do you think it is safe to continue using the repaired pool, or would you still
recommend recreating it?
If all of the cases were frees, then you can continue using the pool; just be aware that the overlapping space has been leaked and will never be allocatable again. If the amount of space is significant, you may want to recreate the pool.
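For a rough lower bound on how much space such frees leak, one can sum the overlap of each warned pair of ranges. The values below are taken verbatim from the two warnings Jan quoted; the ~10 repetitions of each message may or may not involve distinct ranges, so this is only illustrative:

```python
def overlap_bytes(seg, existing):
    """Bytes shared by two half-open [start, end) ranges (0 if disjoint)."""
    return max(0, min(seg[1], existing[1]) - max(seg[0], existing[0]))

# The two distinct "freeing overlapping segments" warnings quoted above:
# each pair is (segment being freed, existing segment).
warnings = [
    ((0xfba5d0cee00, 0xfba5d0cfa00), (0xfba5d05e600, 0xfba5d0cf400)),
    ((0x12cf8b202400, 0x12cf8b203000), (0x12cf8b1d0c00, 0x12cf8b202a00)),
]
leaked = sum(overlap_bytes(seg, existing) for seg, existing in warnings)
# Each pair overlaps by 0x600 bytes, so these two warnings account for
# 3072 bytes -- negligible next to the cost of recreating the pool.
```

If the total came out to a meaningful fraction of the pool, recreating it would be the better trade, per George's advice.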

Thanks,
George
_______________________________________________
developer mailing list
developer@open-zfs.org
http://lists.open-zfs.org/mailman/listinfo/developer