On Sun, May 13, 2018 at 10:08 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
> On 2018-05-12 13:08, james harvey wrote:
>> Hardware is fine.  Passes memtest86+ in SMP mode.  Works fine on all
>> other files.
>>
>>
>>
>> [  381.869940] BUG: unable to handle kernel paging request at
>> 0000000000390e50
>> [  381.870881] BTRFS: decompress failed
>> [  381.891775] IP: rebalance_domains+0x8a/0x2c0
>
> The interesting part here is, btrfs is not showing up the call trace,
> not even lzo code.
> (Despite of the "decompress failed" message).
> Maybe some corrupted data is screwing up some random kernel memory?

I've been surprised by this too.  I've seen a few "styles" of crashes
from this.

The fuller version of the one I posted in the original post:
https://bugzilla.kernel.org/attachment.cgi?id=275949

One that starts with a "general protection fault":
https://bugzilla.kernel.org/attachment.cgi?id=275951

And my most recent version, which starts with "BTRFS: decompress failed"
then "BUG: unable to handle kernel NULL pointer dereference at
0000000000000001":
https://bugzilla.kernel.org/attachment.cgi?id=275961

This latest one does have a call trace including btrfs.  The top of
the call trace is "end_compressed_bio_read+0x34e/0x3d0 [btrfs]", and
although it includes the word "compressed", I'm not sure it actually
has anything to do with lzo compression.  The call stack doesn't
scream that to me.

It seems that when the invalid decompression happens, that code itself
doesn't produce any kernel errors, but the rest of the kernel starts
misbehaving.

I've replicated this about 15 times now.  It only happens on these
files that have inconsistent mirrored data.

> Would you please get the inode number of that corrupted files, and throw
> it through btrfs-debug-tree?
>
> # btrfs-debug-tree -t <subvol_id> <device> | grep -A 50 \(<INO>
>
> This is the preferred method as it would provide all the details we
> need.  But since it could contain sensitive info like filename, please
> double check before posting it.

# ls -i system@00fa3c0596e64d2e84096520ca46f008-0000000000000001-00053cd2c1756577.journal
291489 system@00fa3c0596e64d2e84096520ca46f008-0000000000000001-00053cd2c1756577.journal
# ls -i user-1000@b70add0ef010457d933fec23a2afa48a-0000000000000495-00053b6b6e65e9cf.journal
72267 user-1000@b70add0ef010457d933fec23a2afa48a-0000000000000495-00053b6b6e65e9cf.journal

# btrfs-debug-tree -t 5 /dev/lvm/newMain1 | grep -A 50 \(291489 > debug.tree.291489
Available at: http://termbin.com/kegj

# btrfs-debug-tree -t 5 /dev/lvm/newMain1 | grep -A 50 \(72267 > debug.tree.72267
Available at: http://termbin.com/xhdc

> Or fiemap of that file could also help:
>
> # xfs_io -c "fiemap -v" <corrupted_file>
>
> This is completely safe, but I'm not 100% sure about if the info is enough.

# xfs_io -c "fiemap -v" system@00fa3c0596e64d2e84096520ca46f008-0000000000000001-00053cd2c1756577.journal
Available at: http://termbin.com/nsej

# xfs_io -c "fiemap -v" system@00fa3c0596e64d2e84096520ca46f008-0000000000000001-00053cd2c1756577.journal
Available at: http://termbin.com/4fiz
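
For anyone repeating these steps on other affected files, the inode lookup and the two commands could be generated with a small helper. This is only a sketch, not something from the thread: the function names `debug_tree_command` and `fiemap_command` are made up for illustration, the defaults (subvolume 5, 50 lines of context) just mirror the commands above, and it only builds the command strings rather than running them. It also uses `grep -F` with a single-quoted pattern instead of the unquoted `\(` form, which I assume matches the same lines.

```python
import os
import shlex


def debug_tree_command(path, device, subvol_id=5, context=50):
    """Return (without running) the btrfs-debug-tree | grep pipeline for one file."""
    ino = os.stat(path).st_ino  # the same number that `ls -i` prints
    # With -F, grep treats "(<ino>" as a fixed string, so the "(" needs no
    # backslash once the pattern is quoted for the shell.
    pattern = "(" + str(ino)
    return "btrfs-debug-tree -t {} {} | grep -F -A {} {}".format(
        subvol_id, shlex.quote(device), context, shlex.quote(pattern)
    )


def fiemap_command(path):
    """Return (without running) the xfs_io fiemap command for one file."""
    return "xfs_io -c {} {}".format(shlex.quote("fiemap -v"), shlex.quote(path))
```

Calling `debug_tree_command(journal_path, "/dev/lvm/newMain1")` and `fiemap_command(journal_path)` for each journal file gives strings that can be reviewed before being pasted into a root shell. The output should still be checked for filenames or other sensitive info before posting.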