Dne 9.3.2018 v 20:03 Martin Svec napsal(a): > Dne 9.3.2018 v 17:36 Nikolay Borisov napsal(a): >> On 23.02.2018 16:28, Martin Svec wrote: >>> Hello, >>> >>> we have a btrfs-based backup system using btrfs snapshots and rsync. >>> Sometimes, >>> we hit ENOSPC bug and the filesystem is remounted read-only. However, >>> there's >>> still plenty of unallocated space according to "btrfs fi usage". So I think >>> this >>> isn't another edge condition when btrfs runs out of space due to fragmented >>> chunks, >>> but a bug in disk space allocation code. It suffices to umount the >>> filesystem and >>> remount it back and it works fine again. The frequency of ENOSPC seems to be >>> dependent on metadata chunks usage. When there's a lot of free space in >>> existing >>> metadata chunks, the bug doesn't happen for months. If most metadata chunks >>> are >>> above ~98%, we hit the bug every few days. Below are details regarding the >>> backup >>> server and btrfs. >>> >>> The backup works as follows: >>> >>> * Every night, we create a btrfs snapshot on the backup server and rsync >>> data >>> from a production server into it. This snapshot is then marked >>> read-only and >>> will be used as a base subvolume for the next backup snapshot. >>> * Every day, expired snapshots are removed and their space is freed. >>> Cleanup >>> is scheduled in such a way that it doesn't interfere with the backup >>> window. >>> * Multiple production servers are backed up in parallel to one backup >>> server. >>> * The backed up servers are mostly webhosting servers and mail servers, >>> i.e. >>> hundreds of billions of small files. (Yes, we push btrfs to the limits >>> :-)) >>> * Backup server contains ~1080 snapshots, Zlib compression is enabled. >>> * Rsync is configured to use whole file copying. >>> >>> System configuration: >>> >>> Debian Stretch, vanilla stable 4.14.20 kernel with one custom btrfs patch >>> (see below) and >>> Nikolay's patch 1b816c23e9 (btrfs: Add enospc_debug printing in >>> metadata_reserve_bytes) >>> >>> btrfs mount options: >>> noatime,compress=zlib,enospc_debug,space_cache=v2,commit=15 >>> >>> $ btrfs fi df /backup: >>> >>> Data, single: total=28.05TiB, used=26.37TiB >>> System, single: total=32.00MiB, used=3.53MiB >>> Metadata, single: total=255.00GiB, used=250.73GiB >>> GlobalReserve, single: total=512.00MiB, used=0.00B >>> >>> $ btrfs fi show /backup: >>> >>> Label: none uuid: a52501a9-651c-4712-a76b-7b4238cfff63 >>> Total devices 2 FS bytes used 26.62TiB >>> devid 1 size 416.62GiB used 255.03GiB path /dev/sdb >>> devid 2 size 36.38TiB used 28.05TiB path /dev/sdc >>> >>> $ btrfs fi usage /backup: >>> >>> Overall: >>> Device size: 36.79TiB >>> Device allocated: 28.30TiB >>> Device unallocated: 8.49TiB >>> Device missing: 0.00B >>> Used: 26.62TiB >>> Free (estimated): 10.17TiB (min: 10.17TiB) >>> Data ratio: 1.00 >>> Metadata ratio: 1.00 >>> Global reserve: 512.00MiB (used: 0.00B) >>> >>> Data,single: Size:28.05TiB, Used:26.37TiB >>> /dev/sdc 28.05TiB >>> >>> Metadata,single: Size:255.00GiB, Used:250.73GiB >>> /dev/sdb 255.00GiB >>> >>> System,single: Size:32.00MiB, Used:3.53MiB >>> /dev/sdb 32.00MiB >>> >>> Unallocated: >>> /dev/sdb 161.59GiB >>> /dev/sdc 8.33TiB >>> >>> Btrfs filesystem uses two logical drives in single mode, backed by >>> hardware RAID controller PERC H710; /dev/sdb is HW RAID1 consisting >>> of two SATA SSDs and /dev/sdc is HW RAID6 SATA volume. >>> >>> Please note that we have a simple custom patch in btrfs which ensures >>> that metadata chunks are allocated preferably on SSD volume and data >>> chunks are allocated only on SATA volume. The patch slightly modifies >>> __btrfs_alloc_chunk() so that its loop over devices ignores rotating >>> devices when a metadata chunk is requested and vice versa. However, >>> I'm quite sure that this patch doesn't cause the reported bug because >>> we log every call of the modified code and there're no __btrfs_alloc_chunk() >>> calls when ENOSPC is triggered. Moreover, we observed the same bug before >>> we developed the patch. (IIRC, Chris Mason mentioned that they work on >>> a similar feature in facebook, but I've found no official patches yet.) >>> >>> Dmesg dump: >>> >>> [285167.750763] use_block_rsv: 62468 callbacks suppressed >>> [285167.750764] BTRFS: block rsv returned -28 >>> [285167.750789] ------------[ cut here ]------------ >>> [285167.750822] WARNING: CPU: 5 PID: 443 at fs/btrfs/extent-tree.c:8463 >>> btrfs_alloc_tree_block+0x39b/0x4c0 [btrfs] >>> [285167.750823] Modules linked in: binfmt_misc xt_comment xt_tcpudp >>> iptable_filter nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack iptable_raw >>> ip6table_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 >>> nf_nat nf_conntr >>> [285167.750853] zstd_compress xxhash raid6_pq sd_mod crc32c_intel psmouse >>> uhci_hcd ehci_pci ehci_hcd megaraid_sas usbcore scsi_mod bnx2 >>> [285167.750861] CPU: 5 PID: 443 Comm: btrfs-transacti Tainted: G W I >>> 4.14.20-znr1+ #69 >>> [285167.750862] Hardware name: Dell Inc. PowerEdge R510/0DPRKF, BIOS 1.6.3 >>> 02/01/2011 >>> [285167.750863] task: ffff9c4a1740e280 task.stack: ffffba48c1ecc000 >>> [285167.750878] RIP: 0010:btrfs_alloc_tree_block+0x39b/0x4c0 [btrfs] >>> [285167.750879] RSP: 0018:ffffba48c1ecf958 EFLAGS: 00010282 >>> [285167.750880] RAX: 000000000000001d RBX: ffff9c4a1c2ce128 RCX: >>> 0000000000000006 >>> [285167.750881] RDX: 0000000000000000 RSI: 0000000000000096 RDI: >>> ffff9c4a2fd566f0 >>> [285167.750882] RBP: 0000000000004000 R08: 0000000000000001 R09: >>> 00000000000003dc >>> [285167.750883] R10: 0000000000000001 R11: 00000000000003dc R12: >>> ffff9c4a1c2ce000 >>> [285167.750883] R13: ffff9c4a17692800 R14: 0000000000000001 R15: >>> 00000000ffffffe4 >>> [285167.750885] FS: 0000000000000000(0000) GS:ffff9c4a2fd40000(0000) >>> knlGS:0000000000000000 >>> [285167.750885] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> [285167.750886] CR2: 000056250e55bfd0 CR3: 000000000ee0a003 CR4: >>> 00000000000206e0 >>> [285167.750887] Call Trace: >>> [285167.750903] __btrfs_cow_block+0x125/0x5c0 [btrfs] >>> [285167.750917] btrfs_cow_block+0xcb/0x1b0 [btrfs] >>> [285167.750929] btrfs_search_slot+0x1fd/0x9e0 [btrfs] >>> [285167.750943] lookup_inline_extent_backref+0x105/0x610 [btrfs] >>> [285167.750961] ? set_extent_bit+0x19/0x20 [btrfs] >>> [285167.750974] __btrfs_free_extent.isra.61+0xf5/0xd30 [btrfs] >>> [285167.750992] ? btrfs_merge_delayed_refs+0x63/0x560 [btrfs] >>> [285167.751006] __btrfs_run_delayed_refs+0x516/0x12a0 [btrfs] >>> [285167.751021] btrfs_run_delayed_refs+0x7a/0x270 [btrfs] >>> [285167.751037] btrfs_commit_transaction+0x3e1/0x950 [btrfs] >>> [285167.751053] ? start_transaction+0x89/0x410 [btrfs] >>> [285167.751068] transaction_kthread+0x195/0x1b0 [btrfs] >>> [285167.751071] kthread+0xfc/0x130 >>> [285167.751087] ? btrfs_cleanup_transaction+0x580/0x580 [btrfs] >>> [285167.751088] ? kthread_create_on_node+0x70/0x70 >>> [285167.751091] ret_from_fork+0x35/0x40 >>> [285167.751092] Code: ff 48 c7 c6 28 d7 44 c0 48 c7 c7 a0 21 4a c0 e8 3c a5 >>> 4b cb 85 c0 0f 84 1c fd ff ff 44 89 fe 48 c7 c7 c0 4c 45 c0 e8 80 fd f1 ca >>> <0f> ff e9 06 fd ff ff 4c 63 e8 31 d2 48 89 ee 48 89 df e8 4e eb >>> [285167.751114] ---[ end trace 8721883b5af677ec ]--- >>> [285169.096630] BTRFS info (device sdb): space_info 4 has >>> 18446744072120172544 free, is not full >>> [285169.096633] BTRFS info (device sdb): space_info total=273804165120, >>> used=269218267136, pinned=3459629056, reserved=52396032, >>> may_use=2663120896, readonly=131072 >>> [285169.096638] BTRFS: Transaction aborted (error -28) >>> [285169.096664] ------------[ cut here ]------------ >>> [285169.096691] WARNING: CPU: 7 PID: 443 at fs/btrfs/extent-tree.c:3089 >>> btrfs_run_delayed_refs+0x259/0x270 [btrfs] >>> [285169.096692] Modules linked in: binfmt_misc xt_comment xt_tcpudp >>> iptable_filter nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack iptable_raw >>> ip6table_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 >>> nf_nat nf_conntr >>> [285169.096722] zstd_compress xxhash raid6_pq sd_mod crc32c_intel psmouse >>> uhci_hcd ehci_pci ehci_hcd megaraid_sas usbcore scsi_mod bnx2 >>> [285169.096729] CPU: 7 PID: 443 Comm: btrfs-transacti Tainted: G W I >>> 4.14.20-znr1+ #69 >>> [285169.096730] Hardware name: Dell Inc. PowerEdge R510/0DPRKF, BIOS 1.6.3 >>> 02/01/2011 >>> [285169.096731] task: ffff9c4a1740e280 task.stack: ffffba48c1ecc000 >>> [285169.096745] RIP: 0010:btrfs_run_delayed_refs+0x259/0x270 [btrfs] >>> [285169.096746] RSP: 0018:ffffba48c1ecfde0 EFLAGS: 00010282 >>> [285169.096747] RAX: 0000000000000026 RBX: ffff9c47990c0780 RCX: >>> 0000000000000006 >>> [285169.096748] RDX: 0000000000000000 RSI: 0000000000000082 RDI: >>> ffff9c4a2fdd66f0 >>> [285169.096749] RBP: ffff9c493d509b68 R08: 0000000000000001 R09: >>> 0000000000000403 >>> [285169.096749] R10: ffff9c49731d6620 R11: 0000000000000403 R12: >>> ffff9c4a1c2ce000 >>> [285169.096750] R13: ffff9c4a1c2cf1f0 R14: ffffffffffffffff R15: >>> 0000000000000000 >>> [285169.096751] FS: 0000000000000000(0000) GS:ffff9c4a2fdc0000(0000) >>> knlGS:0000000000000000 >>> [285169.096752] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> [285169.096753] CR2: 000055e70555bfe0 CR3: 000000000ee0a005 CR4: >>> 00000000000206e0 >>> [285169.096754] Call Trace: >>> [285169.096774] btrfs_commit_transaction+0x3e1/0x950 [btrfs] >>> [285169.096790] ? start_transaction+0x89/0x410 [btrfs] >>> [285169.096806] transaction_kthread+0x195/0x1b0 [btrfs] >>> [285169.096809] kthread+0xfc/0x130 >>> [285169.096825] ? btrfs_cleanup_transaction+0x580/0x580 [btrfs] >>> [285169.096826] ? kthread_create_on_node+0x70/0x70 >>> [285169.096828] ret_from_fork+0x35/0x40 >>> [285169.096830] Code: c7 c6 20 d8 44 c0 48 89 df 44 89 04 24 e8 19 bb 09 00 >>> 44 8b 04 24 eb 86 44 89 c6 48 c7 c7 30 48 45 c0 44 89 04 24 e8 d2 40 f2 ca >>> <0f> ff 44 8b 04 24 eb c4 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 >>> [285169.096852] ---[ end trace 8721883b5af677ed ]--- >>> [285169.096918] BTRFS: error (device sdb) in btrfs_run_delayed_refs:3089: >>> errno=-28 No space left >>> [285169.096976] BTRFS info (device sdb): forced readonly >>> [285169.096979] BTRFS warning (device sdb): Skipping commit of aborted >>> transaction. >>> [285169.096981] BTRFS: error (device sdb) in cleanup_transaction:1873: >>> errno=-28 No space left >>> >>> >>> How can I help you to fix this issue? >> As a first step can you try and apply the following debug diff, recompile >> the kernel, mount again the fs with enospc_debug: >> >> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c >> index d227d8514b25..501043bd92d6 100644 >> --- a/fs/btrfs/extent-tree.c >> +++ b/fs/btrfs/extent-tree.c >> @@ -8472,6 +8472,21 @@ use_block_rsv(struct btrfs_trans_handle *trans, >> ret = block_rsv_use_bytes(global_rsv, blocksize); >> if (!ret) >> return global_rsv; >> + else { >> + u64 reserved = 0; >> + u64 size = 0; >> + spin_lock(&global_rsv->lock); >> + reserved = global_rsv->reserved; >> + size = global_rsv->size; >> + spin_unlock(&global_rsv->lock); >> + pr_info("%s: Couldn't allocate %d bytes from global >> rsv(reserved = %llu size = %llu)\n", >> + __func__, blocksize, reserved, size); >> + } >> + } else { >> + if (btrfs_test_opt(fs_info, ENOSPC_DEBUG)) { >> + pr_info("%s: Not using global blockrsv! Current >> blockrsv->type = %d blockrsv->space_info = %p global_rsv->space_info = %p\n", >> + __func__, block_rsv->type, >> block_rsv->space_info, global_rsv->space_info); >> + } >> } >> return ERR_PTR(ret); >> } >> >> >> And then report back on the output of the extra debug >> statements. >> >> Your global rsv is essentially unused, this means >> in the worst case the code should fallback to using the global rsv >> for satisfying the memory allocation for delayed refs. So we should >> figure out why this isn't' happening. > Patch applied. Thank you very much, Nikolay. I'll let you know as soon as we > hit ENOSPC again.
There is the output: [24672.573075] BTRFS info (device sdb): space_info 4 has 18446744072971649024 free, is not full [24672.573077] BTRFS info (device sdb): space_info total=308163903488, used=304593289216, pinned=2321940480, reserved=174800896, may_use=1811644416, readonly=131072 [24672.573079] use_block_rsv: Not using global blockrsv! Current blockrsv->type = 1 blockrsv->space_info = ffff999a57db7000 global_rsv->space_info = ffff999a57db7000 [24672.573083] BTRFS: Transaction aborted (error -28) Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html