Re: Oops when mounting btrfs partition

Arnd Bergmann Sat, 02 Feb 2013 09:58:38 -0800

On Saturday 02 February 2013 10:20:35 Chris Mason wrote:
> Hi Arnd,
> 
> First things first, nospace_cache is a safe thing to use.  It is slow
> because it's finding free extents, but it's just a cache and always safe
> to discard.  With your other errors, I'd just mount it readonly
> and then you won't waste time on atime updates.


Ok, I see. Thanks for taking a look so quickly.

> I'll take a look at the BUG you got during log recovery.  We've fixed a
> few of those during the 3.8 rc cycle.

Well, it happened on 3.8-rc4 and on 3.5 here, so I'd guess it's a
different one.

> > Feb  1 22:57:37 localhost kernel: [ 8561.599482] Kernel BUG at 
> > ffffffffa01fdcf7 [verbose debug info unavailable]
> 
> > Jan 14 19:18:42 localhost kernel: [1060055.746373] btrfs csum failed ino 
> > 15619835 off 454656 csum 2755731641 private 864823192
> > Jan 14 19:18:42 localhost kernel: [1060055.746381] btrfs: bdev /dev/sdb1 
> > errs: wr 0, rd 0, flush 0, corrupt 17, gen 0
> > ...
> > Jan 21 16:35:40 localhost kernel: [1655047.701147] parent transid verify 
> > failed on 17006399488 wanted 54700 found 54764
> 
> These aren't good.  With a few exceptions for really tight races in fsx
> use cases, csum errors are bad data from the disk.  The transid verify
> failed shows we wanted to find a metadata block from generation 54700
> but found 54764 instead:
> 
> 54700 = 0xD5AC
> 54764 = 0xD5EC
> 
> This same bad block comes up a few different times.

The machine has had problems with data consistency in the past, so
I'm not too surprised at getting a single-bit error, although this
is the first time in a year that I've seen problems, and I replaced
the faulty memory modules some time ago.
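
For reference, the single-bit reading of the two generation numbers quoted
above (wanted 54700, found 54764) can be checked by XORing them. This is
only a minimal sketch of that arithmetic; the helper name differing_bits is
made up for illustration and is not part of btrfs or any btrfs tool.

```python
def differing_bits(wanted: int, found: int) -> list[int]:
    """Return the positions (0 = least significant) of bits that differ."""
    diff = wanted ^ found  # XOR leaves a 1 exactly where the two values disagree
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


# Generation numbers taken from the "parent transid verify failed" log line.
wanted, found = 54700, 54764
bits = differing_bits(wanted, found)
print(f"{wanted:#06x} ^ {found:#06x} = {wanted ^ found:#x}, differing bits: {bits}")
# prints: 0xd5ac ^ 0xd5ec = 0x40, differing bits: [6]
```

Exactly one differing bit (bit 6, value 0x40) is consistent with a bit flip
in memory or on the way to disk, though on its own it does not prove where
the flip happened.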

Anyway, I already ordered a replacement box a few weeks ago, and that
one will have ECC memory besides being a modern Opteron system to replace
the aging Core 2.

> > Jan 21 16:35:40 localhost kernel: [1655047.752692] btrfs read error 
> > corrected: ino 1 off 17006399488 (dev /dev/sdb1 sector 64689288)
> 
> This shows we pulled from the second copy of this block and got the
> right answer, and then wrote the right answer to the duplicate.
> Inode 1 means it was metadata.
> 
> But for some reason it still aborted the transaction.  It could have been
> an EIO on the correction, but the auto correction code in 3.5 did work
> well.
> 
> I think your plan to pull the data off and reformat is a good one.  I'd
> also look hard at your ram since drives don't usually send back single bit
> errors.

Ok. I'll wait before reformatting though, in case you need to take
a look at the data later to find out why it crashed without fsck finding
a problem.

        Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
