On Mon, Jun 24, 2019 at 10:01:31AM +0000, Martin Raiber wrote:
> I've fixed the same problem(s) by increasing the global metadata size as
> well. Though I haven't encountered them since Josef Bacik's block rsv
> rework in 5.0.

Now _that_ is interesting--for me, this issue never happened in five
years _before_ 5.0.

> Another problem with increasing the global metadata size is, that I
> think it is the only way dirty metadata is throttled. If increased too
> much (as a percentage of RAM) the system goes OOM depending on work
> load. As far as I can see dirty metdata isn't included into the
> dirty_ratio calculation as well, causing issues on that front as well.
> Another thing that I think helps is to run with "nodatasum" -- probably
> because then less metadata needs to be changed when deleting.
> 
> On 23.06.2019 16:14 Zygo Blaxell wrote:
> > On Tue, Apr 23, 2019 at 07:06:51PM -0400, Zygo Blaxell wrote:
> >> I had a test filesystem that ran out of unallocated space, then ran
> >> out of metadata space during a snapshot delete, and forced readonly.
> >> The workload before the failure was a lot of rsync and bees dedupe
> >> combined with random snapshot creates and deletes.
> > Had this happen again on a production filesystem, this time on 5.1.11,
> > and it happened during orphan inode cleanup instead of snapshot delete:
> >
> >     [14303.076134][T20882] BTRFS: error (device dm-21) in 
> > add_to_free_space_tree:1037: errno=-28 No space left
> >     [14303.076144][T20882] BTRFS: error (device dm-21) in 
> > __btrfs_free_extent:7196: errno=-28 No space left
> >     [14303.076157][T20882] BTRFS: error (device dm-21) in 
> > btrfs_run_delayed_refs:3008: errno=-28 No space left
> >     [14303.076203][T20882] BTRFS error (device dm-21): Error removing 
> > orphan entry, stopping orphan cleanup
> >     [14303.076210][T20882] BTRFS error (device dm-21): could not do orphan 
> > cleanup -22
> >     [14303.076281][T20882] BTRFS error (device dm-21): commit super ret -30
> >     [14303.357337][T20882] BTRFS error (device dm-21): open_ctree failed
> >
> > Same fix:  I bumped the reserved size limit from 512M to 2G and mounted
> > normally.  (OK, technically, I booted my old 5.0.21 kernel--but my 5.0.21
> > kernel has the 2G reserved space patch below in it.)
> >
> > I've not been able to repeat this ENOSPC behavior under test conditions
> > in the last two months of trying, but it's now happened twice in different
> > places, so it has non-zero repeatability.
> >
> >> I tried the usual fix strategies:
> >>
> >>    1.  Immediately after mount, try to balance to free space for
> >>    metadata
> >>
> >>    2.  Immediately after mount, add additional disks to provide
> >>    unallocated space for metadata
> >>
> >>    3.  Mount -o nossd to increase metadata density
> >>
> >> #3 had no effect.  #1 failed consistently.
> >>
> >> #2 was successful, but the additional space was not used because
> >> btrfs couldn't allocate chunks for metadata because it ran out of
> >> metadata space for new metadata chunks.
> >>
> >> When btrfs-cleaner tried to remove the first pending deleted snapshot,
> >> it started a transaction that failed due to lack of metadata space.
> >> Since the transaction failed, the filesystem reverts to its earlier state,
> >> and exactly the same thing happens on the next mount.  The 'btrfs dev
> >> add' in #2 is successful only if it is executed immediately after mount,
> >> before the btrfs-cleaner thread wakes up.
> >>
> >> Here's what the kernel said during one of the attempts:
> >>
> >>    [41263.822252] BTRFS info (device dm-3): use zstd compression, level 0
> >>    [41263.825135] BTRFS info (device dm-3): using free space tree
> >>    [41263.827319] BTRFS info (device dm-3): has skinny extents
> >>    [42046.463356] ------------[ cut here ]------------
> >>    [42046.463387] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: 
> >> errno=-28 No space left
> >>    [42046.463404] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: 
> >> errno=-28 No space left
> >>    [42046.463407] BTRFS info (device dm-3): forced readonly
> >>    [42046.463414] BTRFS: error (device dm-3) in 
> >> btrfs_run_delayed_refs:3011: errno=-28 No space left
> >>    [42046.463429] BTRFS: error (device dm-3) in 
> >> btrfs_create_pending_block_groups:10517: errno=-28 No space left
> >>    [42046.463548] BTRFS: error (device dm-3) in 
> >> btrfs_create_pending_block_groups:10520: errno=-28 No space left
> >>    [42046.471363] BTRFS: error (device dm-3) in 
> >> btrfs_run_delayed_refs:3011: errno=-28 No space left
> >>    [42046.471475] BTRFS: error (device dm-3) in 
> >> btrfs_create_pending_block_groups:10517: errno=-28 No space left
> >>    [42046.471506] BTRFS: error (device dm-3) in 
> >> btrfs_create_pending_block_groups:10520: errno=-28 No space left
> >>    [42046.473672] BTRFS: error (device dm-3) in btrfs_drop_snapshot:9489: 
> >> errno=-28 No space left
> >>    [42046.475643] WARNING: CPU: 0 PID: 10187 at 
> >> fs/btrfs/extent-tree.c:7056 __btrfs_free_extent+0x364/0xf60
> >>    [42046.475645] Modules linked in: mq_deadline bfq dm_cache_smq dm_cache 
> >> dm_persistent_data dm_bio_prison dm_bufio joydev ppdev crct10dif_pclmul 
> >> crc32_pclmul crc32c_intel ghash_clmulni_intel dm_mod snd_pcm aesni_intel 
> >> aes_x86_64 snd_timer crypto_simd cryptd glue_helper sr_mod snd cdrom 
> >> psmouse sg soundcore input_leds pcspkr serio_raw ide_pci_generic i2c_piix4 
> >> bochs_drm parport_pc piix rtc_cmos floppy parport pcc_cpufreq ide_core 
> >> qemu_fw_cfg evbug evdev ip_tables x_tables ipv6 crc_ccitt autofs4
> >>    [42046.475677] CPU: 0 PID: 10187 Comm: btrfs-transacti Tainted: G    B  
> >>  W         5.0.8-zb64-10a85e8a1569+ #1
> >>    [42046.475678] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> >> BIOS 1.10.2-1 04/01/2014
> >>    [42046.475681] RIP: 0010:__btrfs_free_extent+0x364/0xf60
> >>    [42046.475684] Code: 50 f0 48 0f ba a8 90 22 00 00 02 72 1f 8b 85 88 fe 
> >> ff ff 83 f8 fb 0f 84 59 04 00 00 89 c6 48 c7 c7 00 85 be b8 e8 3c 1b 9b ff 
> >> <0f> 0b 8b 8d 88 fe ff ff 48 8b bd 90 fe ff ff ba 90 1b 00 00 48 c7
> >>    [42046.475685] RSP: 0018:ffff888103d8f8e0 EFLAGS: 00010282
> >>    [42046.475688] RAX: 0000000000000000 RBX: ffff88802d9ce370 RCX: 
> >> ffffffffb7224ca2
> >>    [42046.475689] RDX: 0000000000000001 RSI: 0000000000000008 RDI: 
> >> ffff88811960dfac
> >>    [42046.475691] RBP: ffff888103d8fa98 R08: ffffed10232c24c1 R09: 
> >> ffffed10232c24c0
> >>    [42046.475693] R10: ffff888025c94aeb R11: ffffed10232c24c1 R12: 
> >> 0000000000a5054e
> >>    [42046.475694] R13: 0000000000003000 R14: 000000000000214a R15: 
> >> ffff888103d8fa70
> >>    [42046.475696] FS:  0000000000000000(0000) GS:ffff888119400000(0000) 
> >> knlGS:0000000000000000
> >>    [42046.475698] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>    [42046.475700] CR2: 00007f1b84271718 CR3: 00000000a3636001 CR4: 
> >> 00000000001606f0
> >>    [42046.475702] Call Trace:
> >>    [42046.475707]  ? trace_hardirqs_off+0x24/0x120
> >>    [42046.475714]  ? update_block_group+0x6d0/0x6d0
> >>    [42046.475718]  ? __lock_acquire+0xf8/0x26e0
> >>    [42046.475721]  ? __kasan_slab_free+0x14d/0x230
> >>    [42046.475724]  ? btrfs_delayed_ref_lock+0x3e/0x100
> >>    [42046.475727]  ? __lock_acquire+0xf8/0x26e0
> >>    [42046.475730]  ? kthread+0x1a9/0x200
> >>    [42046.475733]  ? debug_show_all_locks+0x210/0x210
> >>    [42046.475737]  ? __btrfs_run_delayed_refs+0xce/0x210
> >>    [42046.475742]  ? debug_show_all_locks+0x210/0x210
> >>    [42046.475747]  run_delayed_data_ref+0x15a/0x400
> >>    [42046.475752]  ? alloc_reserved_file_extent+0x520/0x520
> >>    [42046.475756]  ? rb_next+0x25/0x90
> >>    [42046.475761]  run_one_delayed_ref+0x71/0xe0
> >>    [42046.475765]  btrfs_run_delayed_refs_for_head+0x284/0x580
> >>    [42046.475772]  __btrfs_run_delayed_refs+0xeb/0x210
> >>    [42046.475777]  ? btrfs_run_delayed_refs_for_head+0x580/0x580
> >>    [42046.475781]  ? debug_show_all_locks+0x210/0x210
> >>    [42046.475786]  btrfs_run_delayed_refs+0xc0/0x260
> >>    [42046.475792]  btrfs_commit_transaction+0xf0/0x10f0
> >>    [42046.475795]  ? do_raw_spin_unlock+0xa8/0x140
> >>    [42046.475799]  ? _raw_spin_unlock+0x27/0x40
> >>    [42046.475803]  ? btrfs_record_root_in_trans+0x24/0xb0
> >>    [42046.475807]  ? btrfs_apply_pending_changes+0xb0/0xb0
> >>    [42046.475809]  ? start_transaction+0x14a/0x760
> >>    [42046.475816]  transaction_kthread+0x218/0x250
> >>    [42046.475822]  kthread+0x1a9/0x200
> >>    [42046.475824]  ? btrfs_cleanup_transaction+0xb80/0xb80
> >>    [42046.475827]  ? kthread_park+0xb0/0xb0
> >>    [42046.475831]  ret_from_fork+0x3a/0x50
> >>    [42046.475837] irq event stamp: 0
> >>    [42046.475839] hardirqs last  enabled at (0): [<0000000000000000>]      
> >>      (null)
> >>    [42046.475843] hardirqs last disabled at (0): [<ffffffffb70efe95>] 
> >> copy_process.part.7+0xad5/0x3b80
> >>    [42046.475846] softirqs last  enabled at (0): [<ffffffffb70efe95>] 
> >> copy_process.part.7+0xad5/0x3b80
> >>    [42046.475847] softirqs last disabled at (0): [<0000000000000000>]      
> >>      (null)
> >>    [42046.475849] ---[ end trace b02d556137ea688c ]---
> >>    [42046.475852] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: 
> >> errno=-28 No space left
> >>    [42046.475859] BTRFS: error (device dm-3) in 
> >> btrfs_run_delayed_refs:3011: errno=-28 No space left
> >>    [42046.475878] BTRFS: error (device dm-3) in 
> >> btrfs_create_pending_block_groups:10517: errno=-28 No space left
> >>    [42046.475905] BTRFS: error (device dm-3) in 
> >> btrfs_create_pending_block_groups:10520: errno=-28 No space left
> >>
> >> I noticed that during the few minutes when the filesystem was running,
> >> the "global reserve" number in "btrfs fi usage" output kept going up to
> >> 503M or so, then the transaction failed.
> >>
> >> I then applied this patch to the kernel (5.0.8):
> >>
> >>    diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> >>    index 83791d13c204..59ba456e2314 100644
> >>    --- a/fs/btrfs/extent-tree.c
> >>    +++ b/fs/btrfs/extent-tree.c
> >>    @@ -5790,7 +5790,7 @@ static void update_global_block_rsv(struct 
> >> btrfs_fs_info *fs_info)
> >>            spin_lock(&sinfo->lock);
> >>            spin_lock(&block_rsv->lock);
> >>
> >>    -       block_rsv->size = min_t(u64, num_bytes, SZ_512M);
> >>    +       block_rsv->size = min_t(u64, num_bytes, SZ_1G);
> >>
> >>            if (block_rsv->reserved < block_rsv->size) {
> >>                    num_bytes = btrfs_space_info_used(sinfo, true);
> >>
> >> Then mounted the filesystem.
> >>
> >> With the larger reserved block size, btrfs-cleaner was able to get
> >> past its earlier failures, allocating 3GB of new metadata chunks in the
> >> first transaction after it started deleting snapshots.  After this, the
> >> filesystem deleted some more snapshots (there are currently 2470 deleted
> >> snapshots remaining on this filesystem).
> >>
> >> This isn't really a solution, though, for two reasons:
> >>
> >> 1.  The commit that adds the limit on block_rsv->size (fdf30d1c1b3
> >> "Btrfs: limit the global reserve to 512mb") points out:
> >>
> >>     A user reported a problem where he was getting early ENOSPC with
> >>     hundreds of gigs of free data space and 6 gigs of free metadata space.
> >>     This is because the global block reserve was taking up the entire
> >>     free metadata space.  This is ridiculous, we have infrastructure in
> >>     place to throttle if we start using too much of the global reserve,
> >>     so instead of letting it get this huge just limit it to 512mb so that
> >>     users can still get work done.  This allowed the user to complete
> >>     his rsync without issues. 
> >>
> >> In other words, something should be handling the case when reserved
> >> space runs out, and that something is failing.
> >>
> >> 2.  The 1G reserved limit still isn't sufficient.  The filesystem deletes
> >> a few dozen snapshots, then fails again later:
> >>
> >>    [ 3705.477036] CPU: 2 PID: 8388 Comm: btrfs-transacti Tainted: G    B   
> >>           5.0.9-zb64-b6eed6abc880+ #3
> >>    [ 3705.481510] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> >> BIOS 1.10.2-1 04/01/2014
> >>    [ 3705.484038] RIP: 0010:__btrfs_free_extent+0x364/0xf60
> >>    [ 3705.485061] Code: 50 f0 48 0f ba a8 90 22 00 00 02 72 1f 8b 85 88 fe 
> >> ff ff 83 f8 fb 0f 84 59 04 00 00 89 c6 48 c7 c7 80 85 be 8b e8 fc 1b 9b ff 
> >> <0f> 0b 8b 8d 88 fe ff ff 48 8b bd 90 fe ff ff ba 90 1b 00 00 48 c7
> >>    [ 3705.488046] RSP: 0018:ffff8881c29cf8e0 EFLAGS: 00010282
> >>    [ 3705.488868] RAX: 0000000000000000 RBX: ffff888121dbdf20 RCX: 
> >> ffffffff8a224a92
> >>    [ 3705.490129] RDX: 0000000000000001 RSI: 0000000000000008 RDI: 
> >> ffff8881f5e0dfac
> >>    [ 3705.491246] RBP: ffff8881c29cfa98 R08: ffffed103ebc24c1 R09: 
> >> ffffed103ebc24c0
> >>    [ 3705.492376] R10: ffff88819e6d430b R11: ffffed103ebc24c1 R12: 
> >> 00000000009ecb21
> >>    [ 3705.493501] R13: 0000000000040000 R14: 00000000000021b3 R15: 
> >> ffff8881c29cfa70
> >>    [ 3705.494619] FS:  0000000000000000(0000) GS:ffff8881f5c00000(0000) 
> >> knlGS:0000000000000000
> >>    [ 3705.495883] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>    [ 3705.496781] CR2: 0000556bc833ec70 CR3: 00000001ef9e0001 CR4: 
> >> 00000000001606e0
> >>    [ 3705.497872] Call Trace:
> >>    [ 3705.498279]  ? update_block_group+0x6d0/0x6d0
> >>    [ 3705.498950]  ? btrfs_run_delayed_refs_for_head+0x352/0x580
> >>    [ 3705.499816]  ? kthread+0x1a9/0x200
> >>    [ 3705.500379]  ? debug_show_all_locks+0x210/0x210
> >>    [ 3705.501078]  ? __btrfs_run_delayed_refs+0xce/0x210
> >>    [ 3705.501813]  ? debug_show_all_locks+0x210/0x210
> >>    [ 3705.502516]  run_delayed_data_ref+0x15a/0x400
> >>    [ 3705.503193]  ? alloc_reserved_file_extent+0x520/0x520
> >>    [ 3705.503968]  ? rb_next+0x3c/0x90
> >>    [ 3705.504479]  run_one_delayed_ref+0x71/0xe0
> >>    [ 3705.505116]  btrfs_run_delayed_refs_for_head+0x284/0x580
> >>    [ 3705.505935]  __btrfs_run_delayed_refs+0xeb/0x210
> >>    [ 3705.506649]  ? btrfs_run_delayed_refs_for_head+0x580/0x580
> >>    [ 3705.507495]  ? debug_show_all_locks+0x210/0x210
> >>    [ 3705.508201]  btrfs_run_delayed_refs+0xc0/0x260
> >>    [ 3705.508912]  btrfs_commit_transaction+0xf0/0x10f0
> >>    [ 3705.509654]  ? do_raw_spin_unlock+0xa8/0x140
> >>    [ 3705.510343]  ? _raw_spin_unlock+0x27/0x40
> >>    [ 3705.510979]  ? btrfs_record_root_in_trans+0x24/0xb0
> >>    [ 3705.511748]  ? btrfs_apply_pending_changes+0xb0/0xb0
> >>    [ 3705.512533]  ? start_transaction+0x14a/0x760
> >>    [ 3705.513216]  transaction_kthread+0x218/0x250
> >>    [ 3705.513900]  kthread+0x1a9/0x200
> >>    [ 3705.514425]  ? btrfs_cleanup_transaction+0xb80/0xb80
> >>    [ 3705.515233]  ? kthread_park+0xb0/0xb0
> >>    [ 3705.515829]  ret_from_fork+0x3a/0x50
> >>    [ 3705.516408] irq event stamp: 0
> >>    [ 3705.516898] hardirqs last  enabled at (0): [<0000000000000000>]      
> >>      (null)
> >>    [ 3705.518054] hardirqs last disabled at (0): [<ffffffff8a0efe65>] 
> >> copy_process.part.7+0xad5/0x3b80
> >>    [ 3705.519419] softirqs last  enabled at (0): [<ffffffff8a0efe65>] 
> >> copy_process.part.7+0xad5/0x3b80
> >>    [ 3705.520779] softirqs last disabled at (0): [<0000000000000000>]      
> >>      (null)
> >>    [ 3705.521944] ---[ end trace 3a5f720e889bc0d3 ]---
> >>    [ 3705.522713] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: 
> >> errno=-28 No space left
> >>    [ 3705.524069] BTRFS: error (device dm-3) in 
> >> btrfs_run_delayed_refs:3011: errno=-28 No space left
> >>
> >> If I increase the reserved limit again to 2G, btrfs-cleaner gets past
> >> this shapshot (so far).  It takes more than 10 minutes to delete some
> >> of the snapshots, so this will take a while to finish all 2470.
> >>
> >> The current btrfs fi usage output worries me:
> >>
> >>    Overall:
> >>        Device size:                   2.02TiB
> >>        Device allocated:              2.01TiB
> >>        Device unallocated:           10.03GiB
> >>        Device missing:                  0.00B
> >>        Used:                          1.70TiB
> >>        Free (estimated):            296.02GiB      (min: 291.01GiB)
> >>        Data ratio:                       1.00
> >>        Metadata ratio:                   2.00
> >>        Global reserve:                2.00GiB      (used: 864.00KiB)
> >>
> >>    Data,single: Size:1.81TiB, Used:1.53TiB
> >>       /dev/mapper/vgnebtest-tvdb    928.97GiB
> >>       /dev/mapper/vgnebtest-tvdc    929.00GiB
> >>
> >>    Metadata,RAID1: Size:99.99GiB, Used:82.29GiB
> >>       /dev/mapper/vgnebtest-lvol1     5.00GiB
> >>       /dev/mapper/vgnebtest-lvol2     5.00GiB
> >>       /dev/mapper/vgnebtest-tvdb     94.99GiB
> >>       /dev/mapper/vgnebtest-tvdc     94.99GiB
> >>
> >>    System,RAID1: Size:8.00MiB, Used:240.00KiB
> >>       /dev/mapper/vgnebtest-tvdb      8.00MiB
> >>       /dev/mapper/vgnebtest-tvdc      8.00MiB
> >>
> >>    Unallocated:
> >>       /dev/mapper/vgnebtest-lvol1     5.00GiB
> >>       /dev/mapper/vgnebtest-lvol2     5.00GiB
> >>       /dev/mapper/vgnebtest-tvdb     30.00MiB
> >>       /dev/mapper/vgnebtest-tvdc      1.00MiB
> >>
> >> Note "lvol1" and "lvol2" were added after the initial failure.  
> >>
> >> There is clearly a lot of metadata space available, but the filesystem 
> >> still
> >> intermittently hits ENOSPC while trying to free space.
> >
> 
> 

Attachment: signature.asc
Description: PGP signature

Reply via email to