On 2018-02-15 13:32, Qu Wenruo wrote:
Is there any kernel message like kernel warning or backtrace?
I see there was this one:
Feb 13 13:53:32 lxd01 kernel: [9351710.878404] ------------[ cut here
]------------
Feb 13 13:53:32 lxd01 kernel: [9351710.878430] WARNING: CPU: 9 PID: 7780
at /home/kernel/COD/linux/fs/btrfs/tree-log.c:3361
log_dir_items+0x54b/0x560 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878431] Modules linked in:
nfnetlink_queue bluetooth ecdh_generic xt_nat xt_REDIRECT
nf_nat_redirect sunrpc cfg80211 tcp_diag inet_diag xt_NFLOG
nfnetlink_log nfnetlink xt_conntrack ipt_REJECT nf_reject_ipv4
binfmt_misc veth ebtable_filter ebtables ip6t_MASQUERADE
nf_nat_masquerade_ipv6 ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6
nf_nat_ipv6 xt_comment nf_log_ipv4 nf_log_common xt_LOG ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat ip_vs nf_conntrack ip6table_filter ip6_tables
iptable_filter xt_CHECKSUM xt_tcpudp iptable_mangle ip_tables x_tables
intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
irqbypass btrfs bridge stp llc crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel pcbc zstd_compress aesni_intel aes_x86_64
Feb 13 13:53:32 lxd01 kernel: [9351710.878460] crypto_simd glue_helper
cryptd input_leds intel_cstate ipmi_ssif intel_rapl_perf serio_raw
lpc_ich shpchp ipmi_devintf ipmi_msghandler tpm_infineon acpi_pad
mac_hid autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops igb drm
dca ahci ptp pps_core libahci i2c_algo_bit wmi
Feb 13 13:53:32 lxd01 kernel: [9351710.878484] CPU: 9 PID: 7780 Comm:
TaskSchedulerBa Tainted: G W 4.14.0-041400rc6-generic
#201710230731
Feb 13 13:53:32 lxd01 kernel: [9351710.878485] Hardware name: ASUSTeK
COMPUTER INC. Z10PA-U8 Series/Z10PA-U8 Series, BIOS 0601 06/26/2015
Feb 13 13:53:32 lxd01 kernel: [9351710.878486] task: ffff9454227d1700
task.stack: ffffabc6a810c000
Feb 13 13:53:32 lxd01 kernel: [9351710.878502] RIP:
0010:log_dir_items+0x54b/0x560 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878502] RSP:
0018:ffffabc6a810f980 EFLAGS: 00010202
Feb 13 13:53:32 lxd01 kernel: [9351710.878503] RAX: 0000000000000001
RBX: 000000000008b771 RCX: 0000000000000000
Feb 13 13:53:32 lxd01 kernel: [9351710.878504] RDX: 0000000000000000
RSI: 0000000000000000 RDI: 0000000000000000
Feb 13 13:53:32 lxd01 kernel: [9351710.878505] RBP: ffffabc6a810fa28
R08: ffff9491a8f05540 R09: 0000000000000008
Feb 13 13:53:32 lxd01 kernel: [9351710.878506] R10: 0000000000000000
R11: ffffabc6a810f934 R12: ffffabc6a810fe50
Feb 13 13:53:32 lxd01 kernel: [9351710.878506] R13: ffff94666d426000
R14: ffff9491a8f05540 R15: 0000000000000054
Feb 13 13:53:32 lxd01 kernel: [9351710.878508] FS:
00007f9936e22700(0000) GS:ffff9491bf440000(0000) knlGS:0000000000000000
Feb 13 13:53:32 lxd01 kernel: [9351710.878508] CS: 0010 DS: 0000 ES:
0000 CR0: 0000000080050033
Feb 13 13:53:32 lxd01 kernel: [9351710.878509] CR2: 00007f6abef4d7b0
CR3: 00000023ecaf7006 CR4: 00000000001606e0
Feb 13 13:53:32 lxd01 kernel: [9351710.878510] Call Trace:
Feb 13 13:53:32 lxd01 kernel: [9351710.878524] ?
btrfs_search_slot+0x81b/0x9c0 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878538]
log_directory_changes+0x83/0xd0 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878551]
btrfs_log_inode+0xa24/0x11a0 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878563] ?
generic_bin_search.constprop.37+0xe7/0x1f0 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878565] ? find_inode+0x59/0xb0
Feb 13 13:53:32 lxd01 kernel: [9351710.878567] ?
iget5_locked+0x9e/0x1e0
Feb 13 13:53:32 lxd01 kernel: [9351710.878582]
log_new_dir_dentries+0x203/0x4a7 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878595]
btrfs_log_inode_parent+0x6c2/0xa10 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878598] ?
pagevec_lookup_tag+0x21/0x30
Feb 13 13:53:32 lxd01 kernel: [9351710.878599] ?
__filemap_fdatawait_range+0x9a/0x170
Feb 13 13:53:32 lxd01 kernel: [9351710.878614] ?
wait_current_trans+0x33/0x110 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878627] ?
join_transaction+0x27/0x420 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878639]
btrfs_log_dentry_safe+0x60/0x80 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878658]
btrfs_sync_file+0x2d1/0x410 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878661]
vfs_fsync_range+0x4b/0xb0
Feb 13 13:53:32 lxd01 kernel: [9351710.878663] do_fsync+0x3d/0x70
Feb 13 13:53:32 lxd01 kernel: [9351710.878668] SyS_fdatasync+0x13/0x20
Feb 13 13:53:32 lxd01 kernel: [9351710.878670] do_syscall_64+0x61/0x120
Feb 13 13:53:32 lxd01 kernel: [9351710.878673]
entry_SYSCALL64_slow_path+0x25/0x25
Feb 13 13:53:32 lxd01 kernel: [9351710.878674] RIP: 0033:0x7f99461437dd
Feb 13 13:53:32 lxd01 kernel: [9351710.878675] RSP:
002b:00007f9936e20f10 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
Feb 13 13:53:32 lxd01 kernel: [9351710.878676] RAX: ffffffffffffffda
RBX: 0000307d6f5d1070 RCX: 00007f99461437dd
Feb 13 13:53:32 lxd01 kernel: [9351710.878677] RDX: 000000000000005c
RSI: 0000000000080000 RDI: 000000000000005c
Feb 13 13:53:32 lxd01 kernel: [9351710.878678] RBP: 0000000000000000
R08: 0000000000000000 R09: 0000000000000000
Feb 13 13:53:32 lxd01 kernel: [9351710.878679] R10: 00000000ffffffff
R11: 0000000000000293 R12: 0000000000001000
Feb 13 13:53:32 lxd01 kernel: [9351710.878679] R13: 0000307d6f550b00
R14: 0000000000000000 R15: 0000000000001000
Feb 13 13:53:32 lxd01 kernel: [9351710.878681] Code: 89 85 6c ff ff ff
4c 8b 95 70 ff ff ff 74 23 4c 89 f7 e8 a9 dc f8 ff 48 8b 7d 88 e8 a0 dc
f8 ff 8b 85 6c ff ff ff e9 d8 fb ff ff <0f> ff e9 35 fe ff ff 4c 89 55
18 e9 56 fc ff ff e8 60 65 61 eb
Feb 13 13:53:32 lxd01 kernel: [9351710.878707] ---[ end trace
81aeb3fb0c68ce00 ]---
BTW we've updated to the latest 4.15 kernel after that.
Not sure if the removal of 80G has anything to do with this, but it
seems that your metadata (along with data) is quite scattered.
It's strongly recommended to keep some unallocated device space, and one
way to do that is to use balance to free such scattered space from
data/metadata usage.
That's why a periodic balance is recommended for btrfs.
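For reference, a usage-filtered balance is the usual way to do this: it rewrites only the mostly-empty chunks, packing their extents into fewer chunks and returning the freed chunks to unallocated space. A rough sketch (the mount point /mnt and the 50% usage thresholds are illustrative, not a recommendation for this particular filesystem):

```shell
# Show allocated vs. unallocated device space
btrfs filesystem usage /mnt

# Rewrite only data and metadata chunks that are at most 50% full;
# their extents get compacted and the emptied chunks are freed
btrfs balance start -dusage=50 -musage=50 /mnt

# Check progress while it runs
btrfs balance status /mnt
```

Starting with a low threshold (e.g. -dusage=10) and raising it in steps keeps the IO impact smaller than a full, unfiltered balance.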
Balance might work on that server - it has less than 0.5 TB of SSD
storage. However, on multi-terabyte servers with terabytes of data on
HDDs, running a balance is not realistic. We have some servers where a
balance had been running for around two months and was not even 50%
done, and the IO load the balance was adding slowed things down a lot.
Tomasz Chmielewski
https://lxadm.com