On Mon, Mar 25, 2019 at 10:50:28PM -0400, Zygo Blaxell wrote:
> Running balance, rsync, and dedupe, I get kernel warnings every few
> minutes on 5.0.4.  No warnings on 5.0.3 under similar conditions.
> 
> Mount options are:  flushoncommit,space_cache=v2,compress=zstd.
> 
> There are two different stacks on the warnings.  This one comes from
> btrfs balance:
> 
>       [Mon Mar 25 18:22:42 2019] WARNING: CPU: 0 PID: 4786 at 
> fs/btrfs/delayed-ref.c:296 btrfs_merge_delayed_refs+0x3dc/0x410
>       [Mon Mar 25 18:22:42 2019] Modules linked in: ppdev joydev 
> crct10dif_pclmul crc32_pclmul dm_cache_smq crc32c_intel ghash_clmulni_intel 
> dm_cache dm_persistent_data dm_bio_prison dm_bufio dm_mod rtc_cmos floppy 
> parport_pc parport aesni_intel sr_mod cdrom aes_x86_64 crypto_simd cryptd 
> glue_helper sg ide_pci_generic piix ide_core evbug evdev bochs_drm i2c_piix4 
> input_leds psmouse serio_raw snd_pcm snd_timer snd soundcore pcspkr 
> qemu_fw_cfg ip_tables x_tables ipv6 crc_ccitt autofs4
>       [Mon Mar 25 18:22:42 2019] CPU: 0 PID: 4786 Comm: btrfs-balance 
> Tainted: G        W         5.0.4-zb64-303ce93b05c9+ #1
>       [Mon Mar 25 18:22:42 2019] Hardware name: QEMU Standard PC (i440FX + 
> PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>       [Mon Mar 25 18:22:42 2019] RIP: 
> 0010:btrfs_merge_delayed_refs+0x3dc/0x410
>       [Mon Mar 25 18:22:42 2019] Code: 7c 24 28 be ff ff ff ff e8 31 a0 c1 ff 
> 85 c0 0f 85 7c fe ff ff 0f 0b e9 75 fe ff ff 0f 0b e9 0a fe ff ff 0f 0b e9 1b 
> fe ff ff <0f> 0b e9 25 ff ff ff 0f 0b e9 b0 fe ff ff 48 c7 44 24 10 00 00 00
>       [Mon Mar 25 18:22:42 2019] RSP: 0018:ffffabb981377b70 EFLAGS: 00010246
>       [Mon Mar 25 18:22:42 2019] RAX: 00000000000000b6 RBX: 00000000ffffffff 
> RCX: 0000000000000001
>       [Mon Mar 25 18:22:42 2019] RDX: 0000000000000000 RSI: 00000000ffffffff 
> RDI: 0000000000000282
>       [Mon Mar 25 18:22:42 2019] RBP: ffff9392514bbb80 R08: ffff9392514bbc80 
> R09: 0000000000000000
>       [Mon Mar 25 18:22:42 2019] R10: 0000000000000000 R11: 0000000000000001 
> R12: ffff93928a5c8f08
>       [Mon Mar 25 18:22:42 2019] R13: ffff939269744e38 R14: ffff939269744e38 
> R15: ffff9392697449c0
>       [Mon Mar 25 18:22:42 2019] FS:  0000000000000000(0000) 
> GS:ffff939475c00000(0000) knlGS:0000000000000000
>       [Mon Mar 25 18:22:42 2019] CS:  0010 DS: 0000 ES: 0000 CR0: 
> 0000000080050033
>       [Mon Mar 25 18:22:42 2019] CR2: 00007f0882d38600 CR3: 0000000230eb6002 
> CR4: 00000000001606f0
>       [Mon Mar 25 18:22:42 2019] Call Trace:
>       [Mon Mar 25 18:22:42 2019]  __btrfs_run_delayed_refs+0x70/0x170
>       [Mon Mar 25 18:22:42 2019]  btrfs_run_delayed_refs+0x7d/0x1d0
>       [Mon Mar 25 18:22:42 2019]  btrfs_commit_transaction+0x50/0x9e0
>       [Mon Mar 25 18:22:42 2019]  ? start_transaction+0x91/0x4d0
>       [Mon Mar 25 18:22:42 2019]  relocate_block_group+0x1bd/0x600
>       [Mon Mar 25 18:22:42 2019]  btrfs_relocate_block_group+0x15b/0x260
>       [Mon Mar 25 18:22:42 2019]  btrfs_relocate_chunk+0x46/0xf0
>       [Mon Mar 25 18:22:42 2019]  btrfs_balance+0xa60/0x12b0
>       [Mon Mar 25 18:22:42 2019]  balance_kthread+0x36/0x50
>       [Mon Mar 25 18:22:42 2019]  kthread+0x106/0x140
>       [Mon Mar 25 18:22:42 2019]  ? btrfs_balance+0x12b0/0x12b0
>       [Mon Mar 25 18:22:42 2019]  ? kthread_park+0x90/0x90
>       [Mon Mar 25 18:22:42 2019]  ret_from_fork+0x3a/0x50
>       [Mon Mar 25 18:22:42 2019] irq event stamp: 81529004
>       [Mon Mar 25 18:22:42 2019] hardirqs last  enabled at (81529003): 
> [<ffffffffaa2be2b7>] kmem_cache_free+0x67/0x1e0
>       [Mon Mar 25 18:22:42 2019] hardirqs last disabled at (81529004): 
> [<ffffffffaa00379f>] trace_hardirqs_off_thunk+0x1a/0x1c
>       [Mon Mar 25 18:22:42 2019] softirqs last  enabled at (81527500): 
> [<ffffffffab0003a4>] __do_softirq+0x3a4/0x45f
>       [Mon Mar 25 18:22:42 2019] softirqs last disabled at (81527493): 
> [<ffffffffaa0a3d24>] irq_exit+0xe4/0xf0
>       [Mon Mar 25 18:22:42 2019] ---[ end trace 3d8cdfff7444099a ]---
> 
> This one comes from btrfs-transaction:
> 
>       [Mon Mar 25 18:27:58 2019] WARNING: CPU: 3 PID: 4137 at 
> fs/btrfs/delayed-ref.c:296 btrfs_merge_delayed_refs+0x3dc/0x410
>       [Mon Mar 25 18:27:58 2019] Modules linked in: ppdev joydev 
> crct10dif_pclmul crc32_pclmul dm_cache_smq crc32c_intel ghash_clmulni_intel 
> dm_cache dm_persistent_data dm_bio_prison dm_bufio dm_mod rtc_cmos floppy 
> parport_pc parport aesni_intel sr_mod cdrom aes_x86_64 crypto_simd cryptd 
> glue_helper sg ide_pci_generic piix ide_core evbug evdev bochs_drm i2c_piix4 
> input_leds psmouse serio_raw snd_pcm snd_timer snd soundcore pcspkr 
> qemu_fw_cfg ip_tables x_tables ipv6 crc_ccitt autofs4
>       [Mon Mar 25 18:27:58 2019] CPU: 3 PID: 4137 Comm: btrfs-transacti 
> Tainted: G        W         5.0.4-zb64-303ce93b05c9+ #1
>       [Mon Mar 25 18:27:58 2019] Hardware name: QEMU Standard PC (i440FX + 
> PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>       [Mon Mar 25 18:27:58 2019] RIP: 
> 0010:btrfs_merge_delayed_refs+0x3dc/0x410
>       [Mon Mar 25 18:27:58 2019] Code: 7c 24 28 be ff ff ff ff e8 31 a0 c1 ff 
> 85 c0 0f 85 7c fe ff ff 0f 0b e9 75 fe ff ff 0f 0b e9 0a fe ff ff 0f 0b e9 1b 
> fe ff ff <0f> 0b e9 25 ff ff ff 0f 0b e9 b0 fe ff ff 48 c7 44 24 10 00 00 00
>       [Mon Mar 25 18:27:58 2019] RSP: 0018:ffffabb9812f7d40 EFLAGS: 00010246
>       [Mon Mar 25 18:27:58 2019] RAX: 00000000000000b6 RBX: 00000000ffffffff 
> RCX: 0000000000000001
>       [Mon Mar 25 18:27:58 2019] RDX: 0000000000000000 RSI: 00000000ffffffff 
> RDI: 0000000000000286
>       [Mon Mar 25 18:27:58 2019] RBP: ffff93925b187ce0 R08: ffff93925b187de0 
> R09: 0000000000000000
>       [Mon Mar 25 18:27:58 2019] R10: 0000000000000000 R11: 0000000000000001 
> R12: ffff93938b59d308
>       [Mon Mar 25 18:27:58 2019] R13: ffff9393a5d09820 R14: ffff9393a5d09820 
> R15: ffff939261978820
>       [Mon Mar 25 18:27:58 2019] FS:  0000000000000000(0000) 
> GS:ffff939476800000(0000) knlGS:0000000000000000
>       [Mon Mar 25 18:27:58 2019] CS:  0010 DS: 0000 ES: 0000 CR0: 
> 0000000080050033
>       [Mon Mar 25 18:27:58 2019] CR2: 00007f91f6321c50 CR3: 00000001ef54e003 
> CR4: 00000000001606e0
>       [Mon Mar 25 18:27:58 2019] Call Trace:
>       [Mon Mar 25 18:27:58 2019]  __btrfs_run_delayed_refs+0x70/0x170
>       [Mon Mar 25 18:27:58 2019]  btrfs_run_delayed_refs+0x7d/0x1d0
>       [Mon Mar 25 18:27:58 2019]  btrfs_commit_transaction+0x50/0x9e0
>       [Mon Mar 25 18:27:58 2019]  ? start_transaction+0x91/0x4d0
>       [Mon Mar 25 18:27:58 2019]  transaction_kthread+0x146/0x180
>       [Mon Mar 25 18:27:58 2019]  kthread+0x106/0x140
>       [Mon Mar 25 18:27:58 2019]  ? btrfs_cleanup_transaction+0x620/0x620
>       [Mon Mar 25 18:27:58 2019]  ? kthread_park+0x90/0x90
>       [Mon Mar 25 18:27:58 2019]  ret_from_fork+0x3a/0x50
>       [Mon Mar 25 18:27:58 2019] irq event stamp: 81665152
>       [Mon Mar 25 18:27:58 2019] hardirqs last  enabled at (81665151): 
> [<ffffffffaa2be2b7>] kmem_cache_free+0x67/0x1e0
>       [Mon Mar 25 18:27:58 2019] hardirqs last disabled at (81665152): 
> [<ffffffffaa00379f>] trace_hardirqs_off_thunk+0x1a/0x1c
>       [Mon Mar 25 18:27:58 2019] softirqs last  enabled at (81664580): 
> [<ffffffffab0003a4>] __do_softirq+0x3a4/0x45f
>       [Mon Mar 25 18:27:58 2019] softirqs last disabled at (81664565): 
> [<ffffffffaa0a3d24>] irq_exit+0xe4/0xf0
>       [Mon Mar 25 18:27:58 2019] ---[ end trace 3d8cdfff7444099c ]---
>
> Pausing the balance makes the warnings stop.

After sorting out some unrelated issues with 5.0 kernels, these warnings
are now the top of my list of unresolved 5.0 kernel issues.  They are
popping up more often:  31804 times over two machine-days.  Too much
spam for the kernel logs.

This is the code surrounding the WARN_ON:

static bool merge_ref(...
{
        // ...
                drop_delayed_ref(trans, delayed_refs, head, next);
                ref->ref_mod += mod;
                if (ref->ref_mod == 0) {
                        drop_delayed_ref(trans, delayed_refs, head, ref);
                        done = true;
                } else {
                        /*
                         * Can't have multiples of the same ref on a tree block.
                         */
                        WARN_ON(ref->type == BTRFS_TREE_BLOCK_REF_KEY ||
                                ref->type == BTRFS_SHARED_BLOCK_REF_KEY);
                }
        // ...
}

Should the warning be removed?  If something bad was going to happen,
I would have expected it to happen some time in the last month of
stress-testing, but I can find no evidence of failure related to
this warning.

Attachment: signature.asc
Description: PGP signature

Reply via email to