On Thu, Oct 25, 2018 at 9:56 AM, Dmitry Katsubo <dm...@mail.ru> wrote:
> Dear btrfs community,
>
> My apologies for posting dumps from a rather old kernel (4.9.25);
> nevertheless, I would like your opinion on the kernel crashes reported
> below.
>
> As far as I understand the situation (correct me if I am wrong), some data
> block became corrupted, which resulted in the following kernel trace during
> boot:
>
> kernel BUG at /build/linux-fB36Cv/linux-4.9.25/fs/btrfs/extent_io.c:2318!
> invalid opcode: 0000 [#1] SMP
> Call Trace:
>  [<f8c63739>] ? end_bio_extent_readpage+0x4e9/0x680 [btrfs]
>  [<f8c951eb>] ? end_compressed_bio_read+0x3b/0x2d0 [btrfs]
>  [<f8c771de>] ? btrfs_scrubparity_helper+0xce/0x2d0 [btrfs]
>  [<de07ebb1>] ? process_one_work+0x141/0x380
>  [<de07ee31>] ? worker_thread+0x41/0x460
>  [<de0840e4>] ? kthread+0xb4/0xd0
>  [<de07edf0>] ? process_one_work+0x380/0x380
>  [<de084030>] ? kthread_park+0x50/0x50
>  [<de5aae03>] ? ret_from_fork+0x1b/0x28
>
> The problematic file turned out to be the one used by systemd-journald,
> /var/log/journal/c496cea41ebc4700a0dfaabf64a21be4/system.journal;
> journald was trying to read it (or append to it) during boot, and that was
> causing the system to crash (see attached bootN_dmesg.txt).
>
> I rebooted in safe mode and tried to copy the data from this partition to
> another location using btrfs-restore; however, the kernel crashed again with
> a slightly different symptom (see attached copyN_dmesg.txt):
>
> Call Trace:
>  [<f8b4c760>] ? lzo_decompress_biovec+0x1b0/0x2b0 [btrfs]
>  [<d71a8828>] ? vmalloc+0x38/0x40
>  [<f8b4d415>] ? end_compressed_bio_read+0x265/0x2d0 [btrfs]
>  [<f8b2f1de>] ? btrfs_scrubparity_helper+0xce/0x2d0 [btrfs]
>  [<d707ebb1>] ? process_one_work+0x141/0x380
>  [<d707ee31>] ? worker_thread+0x41/0x460
>  [<d70840e4>] ? kthread+0xb4/0xd0
>  [<d75aae03>] ? ret_from_fork+0x1b/0x28
>
> To sidestep the problem, I've removed this file and also removed the
> "compress=lzo" mount option.
>
> Have there been any updates or fixes in that area? Is the lzo option safe
> to use?


It should be safe even with that kernel. I'm not sure this is
compression related. There is a corruption bug related to inline
extents that had been fairly elusive, but I think it's fixed now.
I haven't run into it myself, though.

I would say the first step, no matter what, if you're using an older
kernel is to boot current Fedora or Arch live or install media,
mount the Btrfs and try to read the problem files to see if the
problem still happens. I can't even begin to estimate the tens of
thousands of lines changed since kernel 4.9.
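
For the "try to read the problem files" step, something like the sketch
below might help. It assumes the filesystem is already mounted from the
live media (read-only is the cautious choice). read_check is a made-up
helper name, not part of btrfs-progs. On a recent kernel I'd expect a
checksum failure to come back as an I/O error rather than a crash, so the
script just records the offsets where reads fail:

```python
import os
import sys


def read_check(path, chunk_size=1 << 20):
    """Read `path` sequentially and return (bytes_read, errors).

    `errors` is a list of (offset, message) tuples, one per chunk whose
    read raised OSError (for example EIO on a checksum mismatch).
    Hypothetical helper for illustration only.
    """
    errors = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError as exc:
                errors.append((offset, exc.strerror or str(exc)))
                offset += want  # skip the failed chunk and keep going
                continue
            if not data:
                break  # file shrank while we were reading it
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, errors


if __name__ == "__main__":
    # Usage: python3 read_check.py FILE [FILE...]
    for name in sys.argv[1:]:
        n, errs = read_check(name)
        print(f"{name}: read {n} bytes, {len(errs)} failed chunk(s)")
        for off, msg in errs:
            print(f"  offset {off}: {msg}")
```

Run it against the journal file (or whatever else was crashing), then check
dmesg for any btrfs messages that showed up during the read.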

What profile are you using for this Btrfs? Is this raid56? What do
you get for 'btrfs fi us <mountpoint>'?



-- 
Chris Murphy
