On Tue, Jan 15, 2019 at 5:04 AM David Sterba <dste...@suse.cz> wrote:
>
> On Tue, Jan 15, 2019 at 07:48:47PM +0800, Qu Wenruo wrote:
> > Super nice move, it shows the corruption and the cause.
> >
> >       item 66 key (1714119835648 METADATA_ITEM 0) itemoff 13325 itemsize 33
> >       item 67 key (10510212874240 METADATA_ITEM 0) itemoff 13283 itemsize 42
> >       item 68 key (1714119868416 METADATA_ITEM 0) itemoff 13250 itemsize 33
>
> The key order is the most frequent and also very reliable report of
> memory bitflips. I think we should add an unconditional check before a
> leaf or node is written so we catch such errors before the bad data hits
> the disk.
>
> This seems to happen way too often. I believe the check overhead would
> be acceptable, and it would at least give early warning.
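
For readers following along, here is a rough sketch of what such a key-order
check amounts to, using the three keys from the dump quoted above. It is
plain Python for illustration, not the kernel's code: the function names are
made up, the numeric key type 169 for METADATA_ITEM is an assumption, and
the single-bit search only shows that the bad key is consistent with a
bitflip, it does not prove one.

```python
# Keys from the quoted leaf dump as (objectid, type, offset) tuples.
# METADATA_ITEM = 169 is an assumed numeric value; all three types are
# equal here, so only the objectid decides the ordering.
METADATA_ITEM = 169

keys = [
    (1714119835648, METADATA_ITEM, 0),   # item 66
    (10510212874240, METADATA_ITEM, 0),  # item 67
    (1714119868416, METADATA_ITEM, 0),   # item 68
]


def first_bad_key(keys):
    """Return the index of the first key that is not strictly greater
    than its predecessor, or None if the keys are in order."""
    for i in range(1, len(keys)):
        if keys[i] <= keys[i - 1]:
            return i
    return None


def single_bit_repairs(prev, bad, nxt, bits=64):
    """List (bit, value) pairs where flipping one bit of `bad` yields a
    value strictly between `prev` and `nxt`."""
    candidates = []
    for bit in range(bits):
        value = bad ^ (1 << bit)
        if prev < value < nxt:
            candidates.append((bit, value))
    return candidates


bad_index = first_bad_key(keys)
repairs = single_bit_repairs(keys[0][0], keys[1][0], keys[2][0])
print(bad_index)
print(repairs)
```

Note that the order check trips at index 2 (item 68), one slot after the
key that actually looks wrong, because item 67 is still greater than item
66. Flipping bit 43 of item 67's objectid is the only single-bit change
that puts it back between its neighbours.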

What about out-of-tree or proprietary modules tainting the kernel? Or
other corruptions we see that aren't key-order related, like the
several recent "unable to find ref byte" reports? Are these memory
corruption related, or are they non-Btrfs bugs causing such
corruption? Does it make any sense for users who are running
proprietary or out-of-tree kernels to boot with slub_debug=F or even
slub_debug=FZP and possibly get a better idea of which category the
corruption is in?

I guess what I'm getting at is, users get a corrupt file system, they
can't repair it (honestly the tools are not good enough, and aren't
user friendly), so we tell them OK, just start over with a new file
system. It would be better if there were some additional advice to give
them to try and find out what caused the corruption to begin with,
rather than just starting over and maybe running into the same problem
again.


-- 
Chris Murphy
