On 2018/10/26 10:57 PM, Dmitry Katsubo wrote:
> On 2018-10-25 20:49, Chris Murphy wrote:
>> I would say the first step no matter what if you're using an older
>> kernel, is to boot a current Fedora or Arch live or install media,
>> mount the Btrfs and try to read the problem files and see if the
>> problem still happens. I can't even being to estimate the tens of
>> thousands of line changes since kernel 4.9.
>
> Good point Chris. Indeed booting a fresh kernel is never a problem.
> Actually I forgot to mention that I've seen the same problem with
> kernel 4.12.13 (attached).
>
>> What profile are you using for this Btrfs? Is this a raid56? What do
>> you get for 'btrfs fi us <mountpoint>' ?
>
> It is RAID1 volume for both metadata and data, but unfortunately I
> haven't recorded the actual output before the failure. The configuration
> was like this:
>
> # btrfs filesystem show /var/log
> Label: none  uuid: 5b45ac8e-fd8c-4759-854a-94e45069959d
>     Total devices 2 FS bytes used 11.13GiB
>     devid    3 size 50.00GiB used 14.03GiB path /dev/sda3
>     devid    4 size 50.00GiB used 14.03GiB path /dev/sdc1
>
> On 2018-10-25 20:49, Chris Murphy wrote:
>> It should be safe even with that kernel. I'm not sure this is
>> compression related. There is a corruption bug related to inline
>> extents and corruption that had been fairly elusive but I think it's
>> fixed now. I haven't run into it though.
>
> On 2018-10-26 02:09, Qu Wenruo wrote:
>>> Are there any updates / fixes done in that area? Is lzo option safe
>>> to use?
>>
>> Yes, we have commits to harden lzo decompress code in v4.18:
>>
>> de885e3ee281a88f52283c7e8994e762e3a5f6bd btrfs: lzo: Harden inline lzo
>> compressed extent decompression
>> 314bfa473b6b6d3efe68011899bd718b349f29d7 btrfs: lzo: Add header length
>> check to avoid potential out-of-bounds acc
>>
>> And for the root cause, it's compressed data without csum, then scrub
>> could make it corrupted.
>>
>> It's also fixed in v4.18:
>>
>> 665d4953cde6d9e75c62a07ec8f4f8fd7d396ade btrfs: scrub: Don't use inode
>> page cache in scrub_handle_errored_block()
>> ac0b4145d662a3b9e34085dea460fb06ede9b69b btrfs: scrub: Don't use inode
>> pages for device replace
>
> Thanks, Qu, for this information. Actually one time I've seen the binary
> crap (not zeros) in text log files (/var/log/*.log) and I was surprised
> that btrfs returned me data which is corrupted instead of signalling I/O
> error. Could it be because of "compressed data without csum" problem?

Yes, that is pretty much the case, especially for your RAID1 setup.

The root fix should have been backported to stable kernels after 4.0,
but the lzo decompression hardening part hasn't been sent to stable
kernels, so you may still hit such a problem.

Thanks,
Qu

>
> Thanks!
>
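
As a rough illustration of the version reasoning in this thread, here is a minimal sketch that checks whether a kernel release string is at or above v4.18, the mainline version named above for the lzo hardening commits. The function names and the constant are hypothetical, and the check only compares mainline version numbers: it cannot tell whether a distribution or stable kernel carries backported fixes.

```python
import re

# Assumption: v4.18 is the first mainline release carrying the lzo
# hardening commits, as stated in the thread. Backports are not detected.
LZO_HARDENING_VERSION = (4, 18)


def parse_kernel_release(release):
    """Return (major, minor) from a release string such as '4.12.13-300.fc26'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def has_lzo_hardening(release):
    """True if the mainline version in `release` is v4.18 or newer."""
    return parse_kernel_release(release) >= LZO_HARDENING_VERSION
```

For the kernels mentioned in this thread, passing "4.9.0" or "4.12.13" gives False, so both would predate the hardening; on a running system the argument would be the output of `uname -r`.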