On Tue, Jun 28, 2016 at 4:52 PM, Saint Germain <saint...@gmail.com> wrote:

> Well I made a ddrescue image of both drives (only one error on sdb
> during ddrescue copy) and started the computer again (after
> disconnecting the old drives).

What was the error? Any kernel message at the time of this error?
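If you still have them, something along these lines should show what the
kernel thought was going on at the time (device name is just my guess):

  dmesg | grep -i sdb
  journalctl -k

And if you kept the ddrescue mapfile, the failed area should be listed
there with a '-' (bad sector) status.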



> I don't know if I should continue trying to repair this RAID1 or if I
> should just cp/rsync to a new BTRFS volume and get done with it.

Well for sure you should already be preparing to lose this volume, so
whatever backup you need, do that yesterday.
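If it still mounts read-only, something like this is the simplest way to
get everything off first (mount point and destination are placeholders):

  mount -o ro /dev/sda1 /mnt
  rsync -aHAX --progress /mnt/ /path/to/backup/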

> On the other hand it seems interesting to repair instead of just giving
> up. It gives a good look at BTRFS resiliency/reliability.

On the one hand Btrfs shouldn't become inconsistent in the first
place, that's the design goal. On the other hand, I'm finding from the
problems reported on the list that Btrfs increasingly mounts at least
read only and allows getting data off, even when the file system isn't
fully functional or repairable.

In your case, once there are metadata problems, even with raid1, repair
is difficult at best. But once you have the backup you could try some
other things, provided the hardware isn't adding to the problems, which
I'm still not certain of.
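Since you already have smartctl output, it might be worth also kicking
off a long self-test on each drive and re-reading the results after it
finishes, roughly:

  smartctl -t long /dev/sdb
  smartctl -a /dev/sdb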



>
> Here is the log from the mount to the scrub aborting and the result
> from smartctl.
>
> Thanks for your precious help so far.
>
>
> BTRFS error (device sdb1): cleaner transaction attach returned -30

The Btrfs cleaner is used to remove snapshots, decrement extent
reference counts, and free the space once a count reaches 0. So why is
it running here? As for -30, that corresponds to -EROFS (read-only file
system), so it most likely just means the cleaner couldn't attach a
transaction because the volume was mounted read-only.
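For reference, that's straight out of the kernel's errno definitions
(include/uapi/asm-generic/errno-base.h):

  #define EROFS  30  /* Read-only file system */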


> BTRFS info (device sdb1): disk space caching is enabled
> BTRFS info (device sdb1): bdev /dev/sdb1 errs: wr 11881695, rd 14, flush 
> 7928, corrupt 1714507, gen 1335
> BTRFS info (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 
> 21622, gen 24

I missed something the first time around in these messages: the
generation error. Both drives have generation errors. A generation
error on a single drive means that drive was not successfully being
written to or was missing. For it to happen on both drives is bad. If
it happens to just one drive, once it reappears it will be passively
caught up to the other one as reads happen, but best practice for now
requires the user to run scrub or balance. If that doesn't happen and
a 2nd drive vanishes or has write errors that cause generation
mismatches, now both drives are simultaneously behind and ahead of
each other. Some commits went to one drive, some went to the other.
And right now Btrfs totally flips out and the volume ends up
irreparably corrupted.
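If you want to see how far apart the two copies actually are, each
device's superblock records its generation; depending on your
btrfs-progs version one of these should print it (read-only):

  btrfs-show-super /dev/sda1 | grep -i generation
  btrfs-show-super /dev/sdb1 | grep -i generation

(newer progs expose the same thing as 'btrfs inspect-internal
dump-super').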

So I have to ask: was this volume ever mounted degraded? If not, you
really need to look at logs and find out why the drives weren't being
written to. sdb shows lots of write, flush, corruption and generation
errors, so it seems like it was having a hardware issue. But then sda
has only corruption and generation errors, as if it wasn't even
connected or powered on.

Or another possibility is that one of the drives was previously cloned
(block copied), or snapshotted via LVM, and you ran into the block-level
copies gotcha:
https://btrfs.wiki.kernel.org/index.php/Gotchas



> BTRFS warning (device sdb1): checksum error at logical 93445255168 on dev 
> /dev/sdb1, sector 54528696, root 5, inode 3434831, offset 479232, length 
> 4096, links 1 (path: user/.local/share/zeitgeist/activity.sqlite-wal)

Some extent data and its checksum don't match, on sdb. So this file is
considered corrupt. Maybe the data is OK and the checksum is wrong?

> btrfs_dev_stat_print_on_error: 164 callbacks suppressed
> BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 11881695, rd 14, flush 
> 7928, corrupt 1714508, gen 1335
> scrub_handle_errored_block: 164 callbacks suppressed
> BTRFS error (device sdb1): unable to fixup (regular) error at logical 
> 93445255168 on dev /dev/sdb1

And it can't be fixed, because...

> BTRFS warning (device sdb1): checksum error at logical 93445255168 on dev 
> /dev/sda1, sector 77669048, root 5, inode 3434831, offset 479232, length 
> 4096, links 1 (path: user/.local/share/zeitgeist/activity.sqlite-wal)


The same block on sda also fails its checksum. So either both
checksums are wrong, or both copies of the data are wrong.

You can make these errors "go away" by using btrfs check --repair
--init-csum-tree, but what this does is totally paper over any real
corruption. You will have no idea whether the files are really corrupt
or not without checking them. It looks like most of the messages have
to do with file data rather than metadata, although I didn't look at
every single line.
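If you do go that route, it has to be run on the unmounted filesystem,
and only after the backup is safe; roughly:

  btrfs check --repair --init-csum-tree /dev/sdb1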

I think the generations on the two drives are too far apart for them
to be put back together again. But if the --init-csum-tree starts to
clean up the data-related errors, you could use rsync -c to compare the
files to a backup, see if they are the same, and further inspect
anything that differs to see whether it's actually corrupt.
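A checksum-based dry-run comparison would look something like this
(paths are hypothetical):

  rsync -acni /path/to/backup/ /mnt/

-c forces a full checksum comparison, -n keeps it a dry run, and -i
itemizes which files differ without copying anything.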

You definitely don't want corrupt files propagating into your future
backups. That's bad news.



-- 
Chris Murphy
