On 2018-10-25 20:49, Chris Murphy wrote:
I would say the first step no matter what if you're using an older
kernel, is to boot a current Fedora or Arch live or install media,
mount the Btrfs and try to read the problem files and see if the
problem still happens. I can't even begin to estimate the tens of
thousands of line changes since kernel 4.9.
Good point, Chris. Indeed, booting a fresh kernel is never a problem.
Actually I forgot to mention that I've seen the same problem with
kernel 4.12.13 (attached).
What profile are you using for this Btrfs? Is this a raid56? What do
you get for 'btrfs fi us <mountpoint>' ?
It is a RAID1 volume for both metadata and data, but unfortunately I
did not record the actual output before the failure. The configuration
was like this:
# btrfs filesystem show /var/log
Label: none uuid: 5b45ac8e-fd8c-4759-854a-94e45069959d
Total devices 2 FS bytes used 11.13GiB
devid 3 size 50.00GiB used 14.03GiB path /dev/sda3
devid 4 size 50.00GiB used 14.03GiB path /dev/sdc1
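Note that 'btrfs filesystem show' lists devices and allocation but not the block group profile; that comes from 'btrfs fi us <mountpoint>'. As a minimal sketch, the per-device figures above could be extracted as follows. The helper name parse_devices and the GiB-only regular expression are assumptions made for illustration, not part of btrfs-progs. Equal allocation on both devices is consistent with RAID1 but does not prove it.

```python
import re

# Sample text copied from the 'btrfs filesystem show' output above.
SHOW_OUTPUT = """\
Label: none uuid: 5b45ac8e-fd8c-4759-854a-94e45069959d
Total devices 2 FS bytes used 11.13GiB
devid 3 size 50.00GiB used 14.03GiB path /dev/sda3
devid 4 size 50.00GiB used 14.03GiB path /dev/sdc1
"""

# Assumption: sizes are always printed in GiB, as in the sample.
DEVICE_RE = re.compile(
    r"devid\s+(?P<devid>\d+)\s+size\s+(?P<size>[\d.]+)GiB\s+"
    r"used\s+(?P<used>[\d.]+)GiB\s+path\s+(?P<path>\S+)"
)


def parse_devices(text):
    """Return one dict per 'devid' line (hypothetical helper)."""
    devices = []
    for match in DEVICE_RE.finditer(text):
        devices.append({
            "devid": int(match["devid"]),
            "size_gib": float(match["size"]),
            "used_gib": float(match["used"]),
            "path": match["path"],
        })
    return devices


devices = parse_devices(SHOW_OUTPUT)
for dev in devices:
    print(dev["path"], dev["used_gib"], "GiB allocated of", dev["size_gib"], "GiB")
```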
On 2018-10-25 20:49, Chris Murphy wrote:
It should be safe even with that kernel. I'm not sure this is
compression related. There is a corruption bug related to inline
extents that had been fairly elusive, but I think it's
fixed now. I haven't run into it though.
On 2018-10-26 02:09, Qu Wenruo wrote:
Are there any updates / fixes done in that area? Is lzo option safe to
use?
Yes, we have commits to harden lzo decompress code in v4.18:
de885e3ee281a88f52283c7e8994e762e3a5f6bd btrfs: lzo: Harden inline lzo
compressed extent decompression
314bfa473b6b6d3efe68011899bd718b349f29d7 btrfs: lzo: Add header length
check to avoid potential out-of-bounds acc
And for the root cause, it's compressed data without csum, which scrub
could then corrupt.
It's also fixed in v4.18:
665d4953cde6d9e75c62a07ec8f4f8fd7d396ade btrfs: scrub: Don't use inode
page cache in scrub_handle_errored_block()
ac0b4145d662a3b9e34085dea460fb06ede9b69b btrfs: scrub: Don't use inode
pages for device replace
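Since the fixes listed above landed in v4.18, a quick way to see whether a given kernel release predates them is to compare version tuples. This is only a rough sketch: the function names are made up for illustration, and distribution kernels may carry backports, so an older version number does not prove the fixes are absent.

```python
import re

FIXED_IN = (4, 18)  # release named in the message above


def kernel_version(release):
    """Extract (major, minor) from a release string such as '4.12.0-2-686'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def has_v418_fixes(release):
    """True if the release is v4.18 or newer (ignores possible backports)."""
    # Tuples compare numerically, so (4, 9) < (4, 18) even though the
    # string "4.9" would sort after "4.18".
    return kernel_version(release) >= FIXED_IN


# Release strings taken from this thread, plus v4.18 itself.
for release in ("4.9", "4.12.0-2-686", "4.18.0"):
    print(release, has_v418_fixes(release))
```

On a running Linux system the release string could be taken from 'uname -r' or from Python's platform.release().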
Thanks, Qu, for this information. Actually, I once saw binary garbage
(not zeros) in text log files (/var/log/*.log), and I was surprised
that btrfs returned corrupted data instead of signalling an I/O
error. Could that be because of the "compressed data without csum" problem?
Thanks!
--
With best regards,
Dmitry
[Sun Dec 3 19:39:55 2017] BUG: unable to handle kernel paging request at
f80a3000
[Sun Dec 3 19:39:55 2017] IP: memcpy+0x11/0x20
[Sun Dec 3 19:39:55 2017] *pde = 370bb067
[Sun Dec 3 19:39:55 2017] *pte = 00000000
[Sun Dec 3 19:39:55 2017] Oops: 0002 [#1] SMP
[Sun Dec 3 19:39:55 2017] Modules linked in: bridge stp llc arc4 iTCO_wdt
iTCO_vendor_support ppdev ath5k evdev ath mac80211 cfg80211 i915 coretemp
pcspkr rfkill snd_hda_codec_realtek serio_raw snd_hda_codec_generic video
snd_hda_intel drm_kms_helper snd_hda_codec lpc_ich drm snd_hda_core snd_hwdep
i2c_algo_bit snd_pcm_oss snd_mixer_oss fb_sys_fops sg snd_pcm syscopyarea
snd_timer sysfillrect rng_core snd sysimgblt soundcore parport_pc parport
shpchp button acpi_cpufreq binfmt_misc w83627hf hwmon_vid ip_tables x_tables
autofs4 ses enclosure scsi_transport_sas xfs libcrc32c hid_generic usbhid hid
btrfs crc32c_generic xor raid6_pq uas usb_storage sr_mod cdrom sd_mod
ata_generic ata_piix i2c_i801 libata scsi_mod firewire_ohci firewire_core
crc_itu_t ehci_pci e1000e ptp pps_core uhci_hcd ehci_hcd usbcore usb_common
[Sun Dec 3 19:39:55 2017] CPU: 1 PID: 100 Comm: kworker/u4:2 Tainted: G
W 4.12.0-2-686 #1 Debian 4.12.13-1
[Sun Dec 3 19:39:55 2017] Hardware name: AOpen i945GMx-IF/i945GMx-IF, BIOS
i945GMx-IF R1.01 Mar.02.2007 AOpen Inc. 03/02/2007
[Sun Dec 3 19:39:55 2017] Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
[Sun Dec 3 19:39:55 2017] task: f7337280 task.stack: f695c000
[Sun Dec 3 19:39:55 2017] EIP: memcpy+0x11/0x20
[Sun Dec 3 19:39:55 2017] EFLAGS: 00010206 CPU: 1
[Sun Dec 3 19:39:55 2017] EAX: f80a2ff8 EBX: 00001000 ECX: 000003fe EDX:
ff998000
[Sun Dec 3 19:39:55 2017] ESI: ff998008 EDI: f80a3000 EBP: 00000000 ESP:
f695de88
[Sun Dec 3 19:39:55 2017] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[Sun Dec 3 19:39:55 2017] CR0: 80050033 CR2: f9c00140 CR3: 36bc7000 CR4:
000006d0
[Sun Dec 3 19:39:55 2017] Call Trace:
[Sun Dec 3 19:39:55 2017] ? lzo_decompress_bio+0x19f/0x2b0 [btrfs]
[Sun Dec 3 19:39:55 2017] ? end_compressed_bio_read+0x28d/0x360 [btrfs]
[Sun Dec 3 19:39:55 2017] ? btrfs_scrubparity_helper+0xb6/0x2c0 [btrfs]
[Sun Dec 3 19:39:55 2017] ? process_one_work+0x135/0x2f0
[Sun Dec 3 19:39:55 2017] ? worker_thread+0x39/0x3a0
[Sun Dec 3 19:39:55 2017] ? kthread+0xd7/0x110
[Sun Dec 3 19:39:55 2017] ? process_one_work+0x2f0/0x2f0
[Sun Dec 3 19:39:55 2017] ? kthread_create_on_node+0x30/0x30
[Sun Dec 3 19:39:55 2017] ? ret_from_fork+0x19/0x24
[Sun Dec 3 19:39:55 2017] Code: 43 58 2b 43 50 88 43 4e 5b eb ed 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 3e 8d 74 26 00 57 89 c7 56 89 d6 53 89 cb c1
e9 02 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 5b 5e 5f c3 3e 8d 74 26 00 55
[Sun Dec 3 19:39:55 2017] EIP: memcpy+0x11/0x20 SS:ESP: 0068:f695de88
[Sun Dec 3 19:39:55 2017] CR2: 00000000f80a3000
[Sun Dec 3 19:39:55 2017] ---[ end trace a961d395687ad265 ]---
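
The register dump in the trace above can be cross-checked with a little arithmetic. The sketch below assumes, based on my reading of the Code: bytes (mov edi,eax; mov esi,edx; mov ebx,ecx; shr ecx,2; rep movs), that memcpy received its destination in EAX, its source in EDX and its byte count in ECX. The bit meanings used for "Oops: 0002" are the standard x86 page-fault error code bits. Under those assumptions, the copy was 4096 bytes long, started 8 bytes before the end of a page, and faulted on the first write into the next, unmapped page. This shows where the copy ran off the mapping, not why.

```python
# Values copied from the register dump above (hexadecimal).
EAX = 0xF80A2FF8  # memcpy destination at entry (assumed calling convention)
EDX = 0xFF998000  # memcpy source at entry (assumed calling convention)
EBX = 0x00001000  # byte count saved by "mov ebx,ecx"
ECX = 0x000003FE  # 32-bit words still to copy when the fault hit
ESI = 0xFF998008  # current source pointer
EDI = 0xF80A3000  # current destination pointer (the faulting address)
ERROR_CODE = 0x0002  # from "Oops: 0002"
PAGE_SIZE = 4096


def decode_page_fault(code):
    """Decode the three low bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 0x1),  # 0 means the page was not mapped
        "write": bool(code & 0x2),         # 1 means a write access
        "user_mode": bool(code & 0x4),     # 0 means kernel mode
    }


words_total = EBX >> 2              # "shr ecx,2" turns bytes into words
words_copied = words_total - ECX    # rep movs decrements ECX per word
bytes_copied = words_copied * 4

print("requested bytes:", EBX)
print("bytes copied before fault:", bytes_copied)
print("dest pointer consistent:", EAX + bytes_copied == EDI)
print("src pointer consistent:", EDX + bytes_copied == ESI)
print("fault at page boundary:", EDI % PAGE_SIZE == 0)
print("fault type:", decode_page_fault(ERROR_CODE))
```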