Hi guys,

The problem happened again, and this time it was far more serious. I was doing a big Tumbleweed update (4680 packages) and hit ENOSPC during the update. To avoid being left with a broken system, as has already happened in the past, I unfortunately had to delete data that I really was not planning to delete. This is a disaster, because I have more than 1 TiB of **free space**.
After deleting 7 GiB of data, I could run a rebalance and the update finished successfully. However, the ENOSPC happened 3 more times (!) and each time I needed to run a rebalance to keep the update going. Sometimes, during the rebalance, I saw the following in the kernel log:

[28736.688266] BTRFS info (device sda6): relocating block group 389998968832 flags 34
[28737.376302] BTRFS info (device sda6): found 4 extents
[28737.712815] BTRFS info (device sda6): relocating block group 343760961536 flags 36
[28738.010030] BTRFS info (device sda6): relocating block group 343224090624 flags 36
[28738.343461] BTRFS info (device sda6): relocating block group 342687219712 flags 36
[28738.660023] BTRFS info (device sda6): relocating block group 342150348800 flags 36
[28738.665241] use_block_rsv: 11 callbacks suppressed
[28738.665247] ------------[ cut here ]------------
[28738.665290] WARNING: CPU: 10 PID: 639 at ../fs/btrfs/extent-tree.c:8097 btrfs_alloc_tree_block+0x3f1/0x4c0 [btrfs]
[28738.665292] BTRFS: block rsv returned -28
[28738.665295] Modules linked in: dm_mod fuse nf_log_ipv6 xt_pkttype nf_log_ipv4 nf_log_common xt_LOG xt_limit af_packet iscsi_ibft iscsi_boot_sysfs msr ip6t_REJECT nf_reject_ipv6 xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT nf_reject_ipv4 iptable_raw xt_CT snd_hda_codec_hdmi snd_hda_codec_realtek nvidia_drm(PO) snd_hda_codec_generic snd_hda_intel nvidia_modeset(PO) snd_hda_codec snd_hda_core snd_hwdep iptable_filter nvidia(PO) joydev drm_kms_helper intel_rapl drm fb_sys_fops iTCO_wdt mei_wdt syscopyarea snd_pcm snd_timer iTCO_vendor_support sysfillrect sb_edac snd i2c_i801 mei_me lpc_ich edac_core sysimgblt ip6table_mangle x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel soundcore mei aes_x86_64
[28738.665359] lrw gf128mul glue_helper ablk_helper cryptd e1000e hp_wmi ioatdma fjes nf_conntrack_netbios_ns ptp shpchp pps_core sparse_keymap pcspkr mfd_core nf_conntrack_broadcast rfkill tpm_infineon tpm_tis dca tpm nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables btrfs xor raid6_pq hid_generic usbhid crc32c_intel serio_raw xhci_pci ehci_pci sr_mod firewire_ohci xhci_hcd ehci_hcd cdrom firewire_core crc_itu_t usbcore isci usb_common libsas ata_generic mpt3sas raid_class scsi_transport_sas wmi button sg
[28738.665419] CPU: 10 PID: 639 Comm: systemd-journal Tainted: P W O 4.7.1-1-default #1
[28738.665421] Hardware name: Hewlett-Packard HP Z820 Workstation/158B, BIOS J63 v03.65 12/19/2013
[28738.665425] 0000000000000000 ffffffff81393104 ffff88080bc63a68 0000000000000000
[28738.665430] ffffffff8107ca1e ffff8804eaa73300 ffff88080bc63ab8 0000000000004000
[28738.665434] 0000000000000000 ffff88017be9a000 ffff880f51b31760 ffffffff8107ca8f
[28738.665438] Call Trace:
[28738.665464] [<ffffffff8102ed5e>] dump_trace+0x5e/0x320
[28738.665472] [<ffffffff8102f12c>] show_stack_log_lvl+0x10c/0x180
[28738.665478] [<ffffffff8102fe41>] show_stack+0x21/0x40
[28738.665486] [<ffffffff81393104>] dump_stack+0x5c/0x78
[28738.665496] [<ffffffff8107ca1e>] __warn+0xbe/0xe0
[28738.665503] [<ffffffff8107ca8f>] warn_slowpath_fmt+0x4f/0x60
[28738.665529] [<ffffffffa029d911>] btrfs_alloc_tree_block+0x3f1/0x4c0 [btrfs]
[28738.665560] [<ffffffffa02846a2>] btrfs_copy_root+0xf2/0x280 [btrfs]
[28738.665593] [<ffffffffa02fd141>] create_reloc_root+0x171/0x1e0 [btrfs]
[28738.665623] [<ffffffffa030316f>] btrfs_init_reloc_root+0x8f/0xa0 [btrfs]
[28738.665652] [<ffffffffa02ac992>] record_root_in_trans+0xb2/0x110 [btrfs]
[28738.665679] [<ffffffffa02adb11>] btrfs_record_root_in_trans+0x41/0x70 [btrfs]
[28738.665704] [<ffffffffa02afd00>] start_transaction+0xa0/0x4f0 [btrfs]
[28738.665732] [<ffffffffa02b6153>] btrfs_dirty_inode+0x33/0xc0 [btrfs]
[28738.665741] [<ffffffff8122aa59>] file_update_time+0x99/0xf0
[28738.665770] [<ffffffffa02c11a3>] btrfs_page_mkwrite+0xa3/0x450 [btrfs]
[28738.665779] [<ffffffff811bd2c9>] do_page_mkwrite+0x69/0xc0
[28738.665785] [<ffffffff811c00f4>] handle_pte_fault+0xf4/0x1760
[28738.665792] [<ffffffff811c1bfe>] handle_mm_fault+0x29e/0x5a0
[28738.665798] [<ffffffff81064fc0>] __do_page_fault+0x1e0/0x510
[28738.665809] [<ffffffff816bd608>] page_fault+0x28/0x30
[28738.669296] DWARF2 unwinder stuck at page_fault+0x28/0x30
[28738.669300] Leftover inexact backtrace:
[28738.669327] ---[ end trace 8ef9cfba38cc9bfc ]---

Look what happened to my METADATA during the update:

1) When the problem occurred:

# btrfs fi usage /
Overall:
    Device size:                   1.26TiB
    Device allocated:             63.07GiB
    Device unallocated:            1.20TiB
    Device missing:                  0.00B
    Used:                         50.21GiB
    Free (estimated):              1.20TiB      (min: 612.49GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              400.00MiB      (used: 0.00B)

Data,single: Size:48.01GiB, Used:47.91GiB
   /dev/sda6      48.01GiB

Metadata,DUP: Size:7.50GiB, Used:1.15GiB
   /dev/sda6      15.00GiB

System,DUP: Size:32.00MiB, Used:16.00KiB
   /dev/sda6      64.00MiB

Unallocated:
   /dev/sda6       1.20TiB

2) After deleting 7 GiB of data and running a rebalance:

# btrfs fi usage /
Overall:
    Device size:                   1.26TiB
    Device allocated:            133.07GiB
    Device unallocated:            1.13TiB
    Device missing:                  0.00B
    Used:                         43.16GiB
    Free (estimated):              1.13TiB      (min: 584.46GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              384.00MiB      (used: 0.00B)

Data,single: Size:48.01GiB, Used:40.94GiB
   /dev/sda6      48.01GiB

Metadata,DUP: Size:42.50GiB, Used:1.11GiB
   /dev/sda6      85.00GiB

System,DUP: Size:32.00MiB, Used:48.00KiB
   /dev/sda6      64.00MiB

Unallocated:
   /dev/sda6       1.13TiB

3) After another rebalance (I saw the ENOSPC again):

# btrfs fi usage /
Overall:
    Device size:                   1.26TiB
    Device allocated:            207.07GiB
    Device unallocated:            1.05TiB
    Device missing:                  0.00B
    Used:                         43.87GiB
    Free (estimated):              1.06TiB      (min: 540.83GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              400.00MiB      (used: 0.00B)

Data,single: Size:42.01GiB, Used:41.57GiB
   /dev/sda6      42.01GiB

Metadata,DUP: Size:82.50GiB, Used:1.15GiB
   /dev/sda6     165.00GiB

System,DUP: Size:32.00MiB, Used:48.00KiB
   /dev/sda6      64.00MiB

Unallocated:
   /dev/sda6       1.05TiB

4) After another rebalance (I saw the ENOSPC again):

# btrfs fi usage /
Overall:
    Device size:                   1.26TiB
    Device allocated:            344.07GiB
    Device unallocated:          943.79GiB
    Device missing:                  0.00B
    Used:                         44.69GiB
    Free (estimated):            944.45GiB      (min: 472.55GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              416.00MiB      (used: 0.00B)

Data,single: Size:43.01GiB, Used:42.34GiB
   /dev/sda6      43.01GiB

Metadata,DUP: Size:150.50GiB, Used:1.17GiB
   /dev/sda6     301.00GiB

System,DUP: Size:32.00MiB, Used:80.00KiB
   /dev/sda6      64.00MiB

Unallocated:
   /dev/sda6     943.79GiB

Yes, 150 GiB of METADATA, more than 3x the size of my actual data, with only 1.17 GiB of it in use.

This is really causing me trouble. I am starting to think that Tumbleweed, at least, should not choose BTRFS as the default file system, since this distribution is supposed to be stable. I think that BTRFS has some serious problems, at least in kernels 4.6 and 4.7.

I reported this problem more than a month ago, and so far nobody has been able to give me even a workaround so that I can keep working here. Unless somebody can help me fix or avoid this problem in the next few days, I think the best option will be to format this machine (**again**) and use EXT4 or XFS.

Best regards,
Ronan Arraes
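
P.S. To put the metadata figures from the four `btrfs fi usage` outputs side by side, here is a minimal Python sketch. It assumes the Metadata,DUP Size and Used values are copied by hand from those outputs; the names `SNAPSHOTS`, `utilization`, and `report` are only illustrative, and nothing here queries btrfs itself.

```python
# Metadata,DUP figures copied by hand from the four `btrfs fi usage`
# outputs in this mail, in GiB: (snapshot number, size, used).
SNAPSHOTS = [
    (1, 7.50, 1.15),
    (2, 42.50, 1.11),
    (3, 82.50, 1.15),
    (4, 150.50, 1.17),
]


def utilization(size_gib, used_gib):
    """Return the share of allocated metadata that is actually used, in percent."""
    return 100.0 * used_gib / size_gib


def report(snapshots):
    """Build one summary line per snapshot."""
    lines = []
    for number, size, used in snapshots:
        lines.append(
            f"{number}) metadata allocated: {size:.2f} GiB, "
            f"used: {used:.2f} GiB ({utilization(size, used):.1f}% in use)"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(report(SNAPSHOTS)))
```

Running it prints one line per snapshot. The used amount stays close to 1.1 GiB throughout while the allocation grows from 7.50 GiB to 150.50 GiB, so the share in use keeps shrinking after every rebalance.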