Hello, I found current misc-next sometimes fails btrfs/152 when the number of cpu is >= 4 in my vm and git bisect points this commit. (btrfs/152 performs parallel send/receive.)
The failure is caused by _check_dmesg and sometimes also leads to inconsistent fs. dmesg looks like: [ 6.649213] WARNING: CPU: 0 PID: 2838 at fs/btrfs/transaction.c:303 record_root_in_trans+0x38/0xd0 [ 6.650807] CPU: 0 PID: 2838 Comm: btrfs Tainted: G W 4.17.0-rc6+ #113 [ 6.651998] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014 [ 6.653325] RIP: 0010:record_root_in_trans+0x38/0xd0 [ 6.654090] RSP: 0018:ffff90b781e23a68 EFLAGS: 00010206 [ 6.654882] RAX: ffff8f346a48b720 RBX: ffff8f3479ce86c0 RCX: 0000000000000000 [ 6.655965] RDX: 0000000000000000 RSI: ffff8f347c8ec000 RDI: ffff8f346a03a958 [ 6.657059] RBP: ffff90b781e23b80 R08: 0000000000025870 R09: ffffffffa8c112bb [ 6.658162] R10: ffff90b781e23ba0 R11: 0000000000000002 R12: ffff8f346a03a958 [ 6.659290] R13: ffff8f347c8d9000 R14: ffff8f347ba2a0b8 R15: 0000000000000000 [ 6.660374] FS: 00007f3b62ebc8c0(0000) GS:ffff8f347fc00000(0000) knlGS:0000000000000000 [ 6.661349] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6.661928] CR2: 00007f0a641995e0 CR3: 000000006af5c000 CR4: 00000000000006f0 [ 6.662651] Call Trace: [ 6.662909] create_pending_snapshot+0x1ab/0xd00 [ 6.663391] ? btrfs_delayed_inode_release_metadata+0xe0/0xf0 [ 6.663972] ? __btrfs_update_delayed_inode+0x1aa/0x210 [ 6.664526] ? __btrfs_release_delayed_node.part.18+0x8f/0x1d0 [ 6.665116] ? create_pending_snapshots+0x81/0xa0 [ 6.665597] create_pending_snapshots+0x81/0xa0 [ 6.666068] btrfs_commit_transaction+0x279/0x860 [ 6.666553] ? start_transaction+0x98/0x3c0 [ 6.666989] btrfs_mksubvol+0x589/0x5a0 [ 6.667390] ? btrfs_opendir+0x39/0x70 [ 6.667778] btrfs_ioctl_snap_create_transid+0x16a/0x1a0 [ 6.668327] btrfs_ioctl_snap_create_v2+0x121/0x180 [ 6.668827] btrfs_ioctl+0x56d/0x25a0 [ 6.669205] ? do_filp_open+0xaa/0x110 [ 6.669591] ? do_vfs_ioctl+0x9f/0x610 [ 6.669980] ? btrfs_ioctl_get_supported_features+0x20/0x20 [ 6.670550] do_vfs_ioctl+0x9f/0x610 [ 6.670922] ksys_ioctl+0x6b/0x80 [ 6.671261] __x64_sys_ioctl+0x11/0x20 [ 6.671650] do_syscall_64+0x49/0x100 [ 6.672040] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 6.672545] RIP: 0033:0x7f3b61cbddd7 [ 6.672899] RSP: 002b:00007ffcdbe7b318 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 6.673666] RAX: ffffffffffffffda RBX: 00007ffcdbe90610 RCX: 00007f3b61cbddd7 [ 6.674430] RDX: 00007ffcdbe7b360 RSI: 0000000050009417 RDI: 0000000000000003 [ 6.675154] RBP: 00005634403ec020 R08: 000000000000ffff R09: 0000000000001010 [ 6.675878] R10: 0000000000000000 R11: 0000000000000246 R12: 00005634403ed670 [ 6.676604] R13: 00007ffcdbe7c470 R14: 0000000000000053 R15: 00007ffcdbe7b360 [ 6.677330] Code: f0 01 00 00 a8 02 74 27 48 8b 07 48 39 86 40 03 00 00 73 1b 49 39 75 28 0f 84 91 00 00 00 85 d2 75 17 48 8b 06 48 39 46 08 74 0e <0f> 0b eb 0a 85 d2 74 65 49 3b 75 28 74 76 41 89 d4 48 89 f3 48 [ 6.679318] ---[ end trace b2378f91e69026c3 ]--- transaction.c:303: WARN_ON(!force && root->commit_root != root->node); And 152.out.full looks like: *** fsck.btrfs output *** root item for root 261, current bytenr 53870592, current gen 149, current level 0, new bytenr 54067200, new gen 149, new level 0 Found 1 roots with an outdated root item. Please run a filesystem check with the option --repair to fix them. ===== (My observation) Call chain is: btrfs_commit_transaction() create_pending_snapshots() create_pending_snapshot() run_delayed_item() <- removed qgroup_account_snapshot() run_delayed_item() <- added This commit removes run_delayed_item() in create_pending_snapshot() and instead calls it after create_pending_snapshots(). However, as qgroup_account_snapshot() updates commit_root, if there are more than one pending snapshots under the same subvolume, it may fail to update root item correctly. So, I think we cannot remove run_delayed_item() here. Thanks, Tomohiro Misono On 2018/02/13 23:16, Nikolay Borisov wrote: > Currently calls to btrfs_run_delayed_inode items have been scattered > around the transaction commit code with no real design argument when they > should be execute. > > We have one call, after transaction writers go to 0. Then we have the delayed > items running as part of creating a snapshot (once per pedning snapshot). > Finally, delayed items are run once more _after_ snapshots have been created. > All in all this amounts to 2 + N (N = number of snapshots slated for > creation). > In reality we need to flush delayed items once before > create_pending_snapshots is called to ensure snapshosts are consistent with > inode data and once after snapshots are created so that newly introduced > inode > items during snapshot creation process are correctly persisted on disk. This > patch brings the total executions of run_delayed_items to just 2. > > This survived multiple xfstest runs. > > Signed-off-by: Nikolay Borisov <nbori...@suse.com> > --- > fs/btrfs/transaction.c | 31 ++++++++++--------------------- > 1 file changed, 10 insertions(+), 21 deletions(-) > > diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c > index 02bc1e6212e6..b32d3136f36c 100644 > --- a/fs/btrfs/transaction.c > +++ b/fs/btrfs/transaction.c > @@ -1524,18 +1524,6 @@ static noinline int create_pending_snapshot(struct > btrfs_trans_handle *trans, > } > btrfs_release_path(path); > > - /* > - * pull in the delayed directory update > - * and the delayed inode item > - * otherwise we corrupt the FS during > - * snapshot > - */ > - ret = btrfs_run_delayed_items(trans); > - if (ret) { /* Transaction aborted */ > - btrfs_abort_transaction(trans, ret); > - goto fail; > - } > - > record_root_in_trans(trans, root, 0); > btrfs_set_root_last_snapshot(&root->root_item, trans->transid); > memcpy(new_root_item, &root->root_item, sizeof(*new_root_item)); > @@ -2069,10 +2057,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle > *trans) > wait_event(cur_trans->writer_wait, > extwriter_counter_read(cur_trans) == 0); > > - /* some pending stuffs might be added after the previous flush. */ > - ret = btrfs_run_delayed_items(trans); > - if (ret) > - goto cleanup_transaction; > > btrfs_wait_delalloc_flush(fs_info); > > @@ -2095,6 +2079,16 @@ int btrfs_commit_transaction(struct btrfs_trans_handle > *trans) > ret = cur_trans->aborted; > goto scrub_continue; > } > + > + /* > + * Run delayed items because we need them to be consistent on-disk > + * so that snapshots created in create_pending_snapshots don't corrupt > + * the filesystem. At this point we are the one doing transaction > + * commit and now one can come and introduce new delayed inode items > + */ > + ret = btrfs_run_delayed_items(trans); > + if (ret) > + goto scrub_continue; > /* > * the reloc mutex makes sure that we stop > * the balancing code from coming in and moving > @@ -2102,11 +2096,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle > *trans) > */ > mutex_lock(&fs_info->reloc_mutex); > > - /* > - * We needn't worry about the delayed items because we will > - * deal with them in create_pending_snapshot(), which is the > - * core function of the snapshot creation. > - */ > ret = create_pending_snapshots(trans); > if (ret) { > mutex_unlock(&fs_info->reloc_mutex); > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html