Re: [PATCH] xfstests: btrfs/012: add a regression test for deleting ext2_saved
On Tue, Oct 20, 2015 at 07:34:06PM +0800, Liu Bo wrote:
> Btrfs now has changed to delete subvolume/snapshot asynchronously,
> which means that after umount, if we've already deleted 'ext2_saved',
> rollback can still be completed, which should not.
>
> So this adds a regression test for this.
>
> Signed-off-by: Liu Bo

I'm not sure whether this belongs in a new test, but given that this test
has very similar steps to the existing ones, I think that's fine.

Reviewed-by: Eryu Guan

> ---
>  tests/btrfs/012 | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/tests/btrfs/012 b/tests/btrfs/012
> index d513759..b39dec0 100755
> --- a/tests/btrfs/012
> +++ b/tests/btrfs/012
> @@ -112,6 +112,18 @@ diff -r /lib/modules/`uname -r`/ $SCRATCH_MNT/`uname -r`/ 2>&1 | grep -vw "sourc
>  
>  _scratch_unmount
>  
> +# Convert it to btrfs, mount it and delete "ext2_saved"
> +$BTRFS_CONVERT_PROG $SCRATCH_DEV >> $seqres.full 2>&1 || \
> +	_fail "btrfs-convert failed"
> +_scratch_mount || _fail "Could not mount new btrfs fs"
> +$BTRFS_UTIL_PROG subvolume delete $SCRATCH_MNT/ext2_saved >> $seqres.full 2>&1 ||
> +	_fail "failed to delete ext2_saved subvolume"
> +_scratch_unmount
> +
> +# Now restore the ext4 device, expecting a failure
> +$BTRFS_CONVERT_PROG -r $SCRATCH_DEV >> $seqres.full 2>&1
> +[ $? -eq 1 ] || _fail "Failure is expected, but btrfs-convert returns with rollback complete"
> +
>  # success, all done
>  status=0
>  exit
> -- 
> 1.8.2.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 RESENT 2/2] btrfs: qgroup: Don't copy extent buffer to do qgroup rescan
On Thu, Oct 22, 2015 at 1:42 AM, Qu Wenruo wrote:
> Ancient qgroup code call memcpy() on a extent buffer and use it for leaf
> iteration.
>
> As extent buffer contains lock, pointers to pages, it's never sane to do
> such copy.
>
> The following bug may be caused by this insane operation:
> [92098.841309] general protection fault: [#1] SMP
> [92098.841338] Modules linked in: ...
> [92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted 4.3.0-rc1 #1
> [92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper [btrfs]
> [92098.842261] Call Trace:
> [92098.842277] [] ? read_extent_buffer+0xb8/0x110 [btrfs]
> [92098.842304] [] ? btrfs_find_all_roots+0x60/0x70 [btrfs]
> [92098.842329] [] btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]
>
> Where btrfs_qgroup_rescan_worker+0x28d is btrfs_disk_key_to_cpu(),
> called in reading key from the memcpied extent_buffer.
>
> This patch will read the whole leaf into memory, and use newly
> introduced stack function to do qgroup rescan.

Hi Qu,

Instead of introducing more new functions, why not clone the extent
buffer (btrfs_clone_extent_buffer) and then use it the regular/existing
functions?
Iow, the same as we do in backref walking, should make the change much
smaller than it is.

thanks

>
> Reported-by: Stephane Lesimple
> Signed-off-by: Qu Wenruo
> ---
> v2:
>   Follow the parameter change in previous patch.
> v3:
>   None
> ---
>  fs/btrfs/qgroup.c | 22 ++++++++++++----------
>  1 file changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index e9ace09..6a83a40 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2183,11 +2183,11 @@ void assert_qgroups_uptodate(struct btrfs_trans_handle *trans)
>   */
>  static int
>  qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
> -		   struct btrfs_trans_handle *trans,
> -		   struct extent_buffer *scratch_leaf)
> +		   struct btrfs_trans_handle *trans, char *stack_leaf)
>  {
>  	struct btrfs_key found;
>  	struct ulist *roots = NULL;
> +	struct btrfs_header *header;
>  	struct seq_list tree_mod_seq_elem = SEQ_LIST_INIT(tree_mod_seq_elem);
>  	u64 num_bytes;
>  	int slot;
> @@ -2224,13 +2224,15 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
>  	fs_info->qgroup_rescan_progress.objectid = found.objectid + 1;
>  
>  	btrfs_get_tree_mod_seq(fs_info, &tree_mod_seq_elem);
> -	memcpy(scratch_leaf, path->nodes[0], sizeof(*scratch_leaf));
> +	read_extent_buffer(path->nodes[0], stack_leaf, 0,
> +			   fs_info->extent_root->nodesize);
> +	header = (struct btrfs_header *)stack_leaf;
>  	slot = path->slots[0];
>  	btrfs_release_path(path);
>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>  
> -	for (; slot < btrfs_header_nritems(scratch_leaf); ++slot) {
> -		btrfs_item_key_to_cpu(scratch_leaf, &found, slot);
> +	for (; slot < btrfs_stack_header_nritems(header); ++slot) {
> +		btrfs_stack_item_key_to_cpu(header, &found, slot);
>  		if (found.type != BTRFS_EXTENT_ITEM_KEY &&
>  		    found.type != BTRFS_METADATA_ITEM_KEY)
>  			continue;
> @@ -2261,15 +2263,15 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>  						     qgroup_rescan_work);
>  	struct btrfs_path *path;
>  	struct btrfs_trans_handle *trans = NULL;
> -	struct extent_buffer *scratch_leaf = NULL;
> +	char *stack_leaf = NULL;
>  	int err = -ENOMEM;
>  	int ret = 0;
>  
>  	path = btrfs_alloc_path();
>  	if (!path)
>  		goto out;
> -	scratch_leaf = kmalloc(sizeof(*scratch_leaf), GFP_NOFS);
> -	if (!scratch_leaf)
> +	stack_leaf = kmalloc(fs_info->extent_root->nodesize, GFP_NOFS);
> +	if (!stack_leaf)
>  		goto out;
>  
>  	err = 0;
> @@ -2283,7 +2285,7 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>  			err = -EINTR;
>  		} else {
>  			err = qgroup_rescan_leaf(fs_info, path, trans,
> -						 scratch_leaf);
> +						 stack_leaf);
>  		}
>  		if (err > 0)
>  			btrfs_commit_transaction(trans, fs_info->fs_root);
> @@ -2292,7 +2294,7 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>  	}
>  
>  out:
> -	kfree(scratch_leaf);
> +	kfree(stack_leaf);
>  	btrfs_free_path(path);
>  
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
> -- 
> 2.6.1
[4.3-rc4] scrubbing aborts before finishing
Hi!

I get this:

merkaba:~> btrfs scrub status -d /
scrub status for […]
scrub device /dev/mapper/sata-debian (id 1) history
	scrub started at Thu Oct 22 10:05:49 2015 and was aborted after 00:00:00
	total bytes scrubbed: 0.00B with 0 errors
scrub device /dev/dm-2 (id 2) history
	scrub started at Thu Oct 22 10:05:49 2015 and was aborted after 00:01:30
	total bytes scrubbed: 23.81GiB with 0 errors

For /, scrub aborts immediately for the SATA SSD.

For /home, scrub aborts for both SSDs after some time.

merkaba:~> btrfs scrub status -d /home
scrub status for […]
scrub device /dev/mapper/msata-home (id 1) history
	scrub started at Thu Oct 22 10:09:37 2015 and was aborted after 00:01:31
	total bytes scrubbed: 22.03GiB with 0 errors
scrub device /dev/dm-3 (id 2) history
	scrub started at Thu Oct 22 10:09:37 2015 and was aborted after 00:03:34
	total bytes scrubbed: 53.30GiB with 0 errors

A single-device BTRFS is affected as well:

merkaba:~> btrfs scrub status /daten
scrub status for […]
	scrub started at Thu Oct 22 10:36:38 2015 and was aborted after 00:00:00
	total bytes scrubbed: 0.00B with 0 errors

No errors in dmesg, btrfs device stat or smartctl -a.

Is this a known issue?

Thanks,
-- 
Martin
Re: [PATCH v3 RESENT 2/2] btrfs: qgroup: Don't copy extent buffer to do qgroup rescan
Filipe Manana wrote on 2015/10/22 09:16 +0100:
> On Thu, Oct 22, 2015 at 1:42 AM, Qu Wenruo wrote:
>> Ancient qgroup code call memcpy() on a extent buffer and use it for leaf
>> iteration.
>>
>> As extent buffer contains lock, pointers to pages, it's never sane to do
>> such copy.
>>
>> [...]
>>
>> This patch will read the whole leaf into memory, and use newly
>> introduced stack function to do qgroup rescan.
>
> Hi Qu,
>
> Instead of introducing more new functions, why not clone the extent
> buffer (btrfs_clone_extent_buffer) and then use it the regular/existing
> functions?
> Iow, the same as we do in backref walking, should make the change much
> smaller than it is.
>
> thanks

Thanks Filipe,

I didn't know there is such a nice function.
And it's setting EXTENT_BUFFER_DUMMY, so it should be quite safe for the
use case.

Thanks for your advice a lot!
Qu

>>
>> Reported-by: Stephane Lesimple
>> Signed-off-by: Qu Wenruo
>> [...]
[PATCH] Btrfs: change the initialization point of fs_root in open_ctree()
Kernel panic occurred due to a NULL pointer dereference in can_overcommit(),
because btrfs_async_reclaim_metadata_space() passed a NULL pointer to
btrfs_calc_reclaim_metadata_size().

[ 3756.152833] BUG: unable to handle kernel NULL pointer dereference at 01f0
[ 3756.152882] IP: [] can_overcommit+0x21/0xf0 [btrfs]
[ 3756.152936] PGD 0
[ 3756.152949] Oops: [#1] SMP
[ 3756.152969] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_filter ebtable_broute bridge stp llc ebtable_nat ebtables ip6table_mangle ip6table_raw ip6table_security ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_filter ip6_tables iptable_mangle iptable_raw iptable_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack coretemp kvm_intel kvm crc32_pclmul iTCO_wdt iTCO_vendor_support microcode ipmi_si lpc_ich mfd_core pcspkr acpi_power_meter ipmi_msghandler i2c_i801 i7core_edac shpchp edac_core nfsd acpi_cpufreq auth_rpcgss nfs_acl lockd grace sunrpc sch_fq_codel btrfs xor raid6_pq usb_storage mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm igb ptp ata_generic pps_core pata_acpi crc32c_intel
[ 3756.153397]  dca megaraid_sas i2c_algo_bit ata_piix i2c_core
[ 3756.153433] CPU: 3 PID: 3004 Comm: kworker/u25:4 Tainted: G I 4.3.0-rc6 #1
[ 3756.153469] Hardware name: FUJITSU-SV PRIMERGY RX300 S6 /D2619, BIOS 6.00 Rev. 1.09.2619.N1 12/13/2010
[ 3756.153537] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
[ 3756.153571] task: 88023581a400 ti: 880234648000 task.ti: 880234648000
[ 3756.153604] RIP: 0010:[] [] can_overcommit+0x21/0xf0 [btrfs]
[ 3756.153655] RSP: 0018:88023464bda8 EFLAGS: 00010282
[ 3756.153679] RAX: 0100 RBX: 880431f68c00 RCX: 0002
[ 3756.153711] RDX: 00c0 RSI: RDI:
[ 3756.153742] RBP: 88023464bde0 R08: 0101 R09: 000c
[ 3756.153773] R10: 81d10060 R11: 81d10050 R12: 880431f68c00
[ 3756.153804] R13: R14: 880035f67070 R15: 00c0
[ 3756.153836] FS: () GS:880237cc() knlGS:
[ 3756.153871] CS: 0010 DS: ES: CR0: 8005003b
[ 3756.153897] CR2: 01f0 CR3: 01c08000 CR4: 06e0
[ 3756.153929] Stack:
[ 3756.153940] 8802 880237cd2940 880431f68c00
[ 3756.153979] 00c0 880035f67070 88023464be20
[ 3756.154016] a01e5404 880431f68c80 880234482240 8802378a1800
[ 3756.154054] Call Trace:
[ 3756.154081] [] btrfs_async_reclaim_metadata_space+0xb4/0x210 [btrfs]
[ 3756.154119] [] process_one_work+0x19e/0x3d0
[ 3756.154146] [] worker_thread+0x4e/0x450
[ 3756.154174] [] ? __schedule+0x2b9/0x930
[ 3756.154199] [] ? process_one_work+0x3d0/0x3d0
[ 3756.154227] [] ? process_one_work+0x3d0/0x3d0
[ 3756.154255] [] kthread+0xc9/0xe0
[ 3756.154279] [] ? kthread_worker_fn+0x160/0x160
[ 3756.154307] [] ret_from_fork+0x3f/0x70
[ 3756.154333] [] ? kthread_worker_fn+0x160/0x160
[ 3756.154361] Code: a5 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 f4 53 31 f6 49 89 fd 49 89 d7 48 83 ec 10 <4c> 8b b7 f0 01 00 00 89 4d cc 49 3b 7e 30 40 0f 95 c6 48 8d 74
[ 3756.156802] RIP [] can_overcommit+0x21/0xf0 [btrfs]
[ 3756.157995] RSP
[ 3756.159162] CR2: 01f0

fs_info->fs_root was referenced in btrfs_async_reclaim_metadata_space()
when mount kicked the kworker (btrfs_async_reclaim_metadata_space).
But at that point fs_info->fs_root had not been initialized yet, so a
NULL pointer was passed to btrfs_calc_reclaim_metadata_size().
PID: 3045  TASK: 8800bb06b000  CPU: 2  COMMAND: "mount"
 [exception RIP: queued_spin_lock_slowpath+350]
    RIP: 810be2de  RSP: 8800b9fdb738  RFLAGS: 0202
    RAX: 0101  RBX: 880431f68c00  RCX: 0001
    RDX: 0101  RSI: 0001  RDI: 880431f68c00
    RBP: 8800b9fdb738  R8: 0101  R9:
    R10: 4000  R11: 00018e58  R12: 0001
    R13: 8800b9fdb7c0  R14: 8800bb06b000  R15: 0001
    CS: 0010  SS: 0018
 #0 [8800b9fdb740] _raw_spin_lock at 81694ff0
 #1 [8800b9fdb750] reserve_metadata_bytes at a01e55cc [btrfs]
 #2 [8800b9fdb800] btrfs_block_rsv_add at a01e5a93 [btrfs]
 #3 [8800b9fdb828] btrfs_truncate_inode_items at a0202779 [btrfs]
 #4 [8800b9fdb920] btrfs_evict_inode at a02040ec [btrfs]
 #5 [8800b9fdb990] evict at 811ed6ea
 #6 [880
[PATCH] Btrfs: fix regression when running delayed references
From: Filipe Manana

In the kernel 4.2 merge window we had a refactoring/rework of the delayed
references implementation in order to fix certain problems with qgroups.
However that rework introduced one more regression that leads to the
following trace when running delayed references for metadata:

[35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832!
[35908.065201] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC
[35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc psmouse i2
[35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: G W 4.3.0-rc5-btrfs-next-17+ #1
[35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti: 88010c4c8000
[35908.065201] RIP: 0010:[] [] insert_inline_extent_backref+0x52/0xb1 [btrfs]
[35908.065201] RSP: 0018:88010c4cbb08 EFLAGS: 00010293
[35908.065201] RAX: RBX: 88008a661000 RCX:
[35908.065201] RDX: a04dd58f RSI: 0001 RDI:
[35908.065201] RBP: 88010c4cbb40 R08: 1000 R09: 88010c4cb9f8
[35908.065201] R10: R11: 002c R12:
[35908.065201] R13: 88020a74c578 R14: R15:
[35908.065201] FS: () GS:88023edc() knlGS:
[35908.065201] CS: 0010 DS: ES: CR0: 8005003b
[35908.065201] CR2: 015e8708 CR3: 000102185000 CR4: 06e0
[35908.065201] Stack:
[35908.065201] 88010c4cbb18 0f37 88020a74c578 88015a408000
[35908.065201] 880154a44000 0005 88010c4cbbd8
[35908.065201] a0492b9a 0005
[35908.065201] Call Trace:
[35908.065201] [] __btrfs_inc_extent_ref+0x8b/0x208 [btrfs]
[35908.065201] [] ? __btrfs_run_delayed_refs+0x4d4/0xd33 [btrfs]
[35908.065201] [] __btrfs_run_delayed_refs+0xafa/0xd33 [btrfs]
[35908.065201] [] ? join_transaction.isra.10+0x25/0x41f [btrfs]
[35908.065201] [] ? join_transaction.isra.10+0xa8/0x41f [btrfs]
[35908.065201] [] btrfs_run_delayed_refs+0x75/0x1dd [btrfs]
[35908.065201] [] delayed_ref_async_start+0x3c/0x7b [btrfs]
[35908.065201] [] normal_work_helper+0x14c/0x32a [btrfs]
[35908.065201] [] btrfs_extent_refs_helper+0x12/0x14 [btrfs]
[35908.065201] [] process_one_work+0x24a/0x4ac
[35908.065201] [] worker_thread+0x206/0x2c2
[35908.065201] [] ? rescuer_thread+0x2cb/0x2cb
[35908.065201] [] ? rescuer_thread+0x2cb/0x2cb
[35908.065201] [] kthread+0xef/0xf7
[35908.065201] [] ? kthread_parkme+0x24/0x24
[35908.065201] [] ret_from_fork+0x3f/0x70
[35908.065201] [] ? kthread_parkme+0x24/0x24
[35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48 8d 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02 <0f> 0b 4c 8b 45 30 8b 4d 28 45 31
[35908.065201] RIP [] insert_inline_extent_backref+0x52/0xb1 [btrfs]
[35908.065201] RSP
[35908.310885] ---[ end trace fe4299baf0666457 ]---

This happens because the new delayed references code no longer merges
delayed references that have different sequence values. The following
steps are an example sequence leading to this issue:

1) Transaction N starts, fs_info->tree_mod_seq has value 0;

2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for
   bytenr A is created, with a value of 1 and a seq value of 0;

3) fs_info->tree_mod_seq is incremented to 1;

4) Extent buffer A is deleted through btrfs_del_items(), which calls
   btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). The
   latter returns the metadata extent associated with extent buffer A to
   the free space cache (the range is not pinned), because the extent
   buffer was created in the current transaction (N) and writeback never
   happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set
   in the extent buffer).
   This creates the delayed reference Ref2 for bytenr A, with a value
   of -1 and a seq value of 1;

5) Delayed reference Ref2 is not merged with Ref1 when we create it,
   because they have different sequence numbers (decided at
   add_delayed_ref_tail_merge());

6) fs_info->tree_mod_seq is incremented to 2;

7) Some task attempts to allocate a new extent buffer (done at
   extent-tree.c:find_free_extent()), but due to heavy fragmentation
   and running low on metadata space the clustered allocation fails
   and we fall back to unclustered allocation, which finds the
   extent at offset A, so a new extent buffer at offset A is allocated.
   This creates delayed reference Ref3 for bytenr A, with a value of -1
Re: BTRFS BUG at insert_inline_extent_backref+0xe3/0xf0 while rebalancing
On Thu, Oct 22, 2015 at 6:32 AM, Erkki Seppala wrote:
> Hello,
>
> Recently I added daily rebalancing to my cron.d (after finding myself in
> the no-space-situation), and not long after that, I found my PC had
> crashed over night. Having no sign in the logs anywhere (not even over
> network even though there should be) I had nothing to go on, but this
> night it crashed again after starting the rebalance, and this time there
> was some information on the kernel log.
>
> Kernel version: 4.2.3 (package linux-image-4.2.0-1-amd64 version 4.2.3-1
> from Debian Unstable)
>
> The dump is available at:
>
> http://www.modeemi.fi/~flux/btrfs/btrfs-BUG-2015-10-55.txt
>
> The log is available as well (stripped some unrelated USB- and firewall
> logging, showing that last evening there was some kernel task hung for
> 120 seconds; but it's in another btrfs filesystem and is another story):
>
> http://www.modeemi.fi/~flux/btrfs/btrfs-2015-10-55.txt
>
> I'm not quite sure which of the btrfs balance commands caused the
> issue. But there is my script:
>
> #!/bin/sh
> fs="$1"
> if [ -z "$fs" ]; then
>   echo usage: btrfs-balance / 0 1 5 10 20 50
>   exit 1
> fi
> fs="$1"
> shift
> for usage in d m; do for a in "$@"; do date; /bin/btrfs balance start "$fs" -v -${usage}usage=$a; done; done
>
> And it was started at 07:30 with:
>
> /usr/local/sbin/btrfs-balance / 0 1 2 5 10 20 30 50 70
>
> I should add that the filesystem in question is backed by MD RAID10 and
> that is backed by four SSDs, so it's reasonably fast in IO, if that
> affects anything. There should have been no much competing IO at the
> time of the occurrence.
>
> Before Duncan asks ;-), I only have a moderate number of subvolumes and
> snapshots, ie. one subvolume for each of /, /var/log/journal and /home,
> 24 snapshots of / and /home plus <10 snapshots of /.
>
> Before that balance there was another balance on a another BTRFS RAID10,
> but given the time stamp I think I can easily say it wasn't the cause.
>
> I don't really have other 'solutions' than disabling the rebalancing for
> the time being, and only use it as-needed as I had earlier done..

Try this (just sent a few minutes ago):

https://patchwork.kernel.org/patch/7463161/

thanks

>
> Cheers,
>
> -- 
> _
> / __// /__ __ http://www.modeemi.fi/~flux/\ \
> / /_ / // // /\ \/ /\ /
>/_/ /_/ \___/ /_/\_\@modeemi.fi \/

-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
[PATCH v2] btrfs: add balance filters limits, stripes and usage to supported mask
Enable the extended 'limit' syntax (a range), the new 'stripes' and the
extended 'usage' syntax (a range) filters in the filters mask. The patch
comes separately and not within the series that introduced the new filters
because the patch adding the mask was merged in a late rc. The integration
branch was based on an older rc and could not merge the patch due to the
missing changes.

Prerequisites:
* btrfs: check unsupported filters in balance arguments
* btrfs: extend balance filter limit to take minimum and maximum
* btrfs: add balance filter for stripes
* btrfs: extend balance filter usage to take minimum and maximum

Signed-off-by: David Sterba
---
 fs/btrfs/volumes.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 90ef3e722b72..6abd2dd346b3 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -385,7 +385,10 @@ struct map_lookup {
 				 BTRFS_BALANCE_ARGS_DEVID |	\
 				 BTRFS_BALANCE_ARGS_DRANGE |	\
 				 BTRFS_BALANCE_ARGS_VRANGE |	\
-				 BTRFS_BALANCE_ARGS_LIMIT)
+				 BTRFS_BALANCE_ARGS_LIMIT |	\
+				 BTRFS_BALANCE_ARGS_LIMIT_RANGE |	\
+				 BTRFS_BALANCE_ARGS_STRIPES_RANGE |	\
+				 BTRFS_BALANCE_ARGS_USAGE_RANGE)
 
 /*
  * Profile changing flags.  When SOFT is set we won't relocate chunk if
-- 
2.6.2
Re: [PATCH] Btrfs: fix regression when running delayed references
wrote on 2015/10/22 09:47 +0100:
> From: Filipe Manana
>
> [...]
Re: [PATCH] Btrfs: fix regression when running delayed references
On Thu, Oct 22, 2015 at 10:32 AM, Qu Wenruo wrote: > > > wrote on 2015/10/22 09:47 +0100: >> >> From: Filipe Manana >> >> In the kernel 4.2 merge window we had a refactoring/rework of the delayed >> references implementation in order to fix certain problems with qgroups. >> However that rework introduced one more regression that leads to the >> following trace when running delayed references for metadata: >> >> [35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832! >> [35908.065201] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC >> [35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic >> xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache >> sunrpc loop fuse parport_pc psmouse i2 >> [35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: GW >> 4.3.0-rc5-btrfs-next-17+ #1 >> [35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >> rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 >> [35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper >> [btrfs] >> [35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti: >> 88010c4c8000 >> [35908.065201] RIP: 0010:[] [] >> insert_inline_extent_backref+0x52/0xb1 [btrfs] >> [35908.065201] RSP: 0018:88010c4cbb08 EFLAGS: 00010293 >> [35908.065201] RAX: RBX: 88008a661000 RCX: >> >> [35908.065201] RDX: a04dd58f RSI: 0001 RDI: >> >> [35908.065201] RBP: 88010c4cbb40 R08: 1000 R09: >> 88010c4cb9f8 >> [35908.065201] R10: R11: 002c R12: >> >> [35908.065201] R13: 88020a74c578 R14: R15: >> >> [35908.065201] FS: () GS:88023edc() >> knlGS: >> [35908.065201] CS: 0010 DS: ES: CR0: 8005003b >> [35908.065201] CR2: 015e8708 CR3: 000102185000 CR4: >> 06e0 >> [35908.065201] Stack: >> [35908.065201] 88010c4cbb18 0f37 88020a74c578 >> 88015a408000 >> [35908.065201] 880154a44000 0005 >> 88010c4cbbd8 >> [35908.065201] a0492b9a 0005 >> >> [35908.065201] Call Trace: >> [35908.065201] [] __btrfs_inc_extent_ref+0x8b/0x208 >> [btrfs] >> [35908.065201] 
[] ? >> __btrfs_run_delayed_refs+0x4d4/0xd33 [btrfs] >> [35908.065201] [] __btrfs_run_delayed_refs+0xafa/0xd33 >> [btrfs] >> [35908.065201] [] ? join_transaction.isra.10+0x25/0x41f >> [btrfs] >> [35908.065201] [] ? join_transaction.isra.10+0xa8/0x41f >> [btrfs] >> [35908.065201] [] btrfs_run_delayed_refs+0x75/0x1dd >> [btrfs] >> [35908.065201] [] delayed_ref_async_start+0x3c/0x7b >> [btrfs] >> [35908.065201] [] normal_work_helper+0x14c/0x32a >> [btrfs] >> [35908.065201] [] btrfs_extent_refs_helper+0x12/0x14 >> [btrfs] >> [35908.065201] [] process_one_work+0x24a/0x4ac >> [35908.065201] [] worker_thread+0x206/0x2c2 >> [35908.065201] [] ? rescuer_thread+0x2cb/0x2cb >> [35908.065201] [] ? rescuer_thread+0x2cb/0x2cb >> [35908.065201] [] kthread+0xef/0xf7 >> [35908.065201] [] ? kthread_parkme+0x24/0x24 >> [35908.065201] [] ret_from_fork+0x3f/0x70 >> [35908.065201] [] ? kthread_parkme+0x24/0x24 >> [35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48 >> 8d 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02 >> <0f> 0b 4c 8b 45 30 8b 4d 28 45 31 >> [35908.065201] RIP [] >> insert_inline_extent_backref+0x52/0xb1 [btrfs] >> [35908.065201] RSP >> [35908.310885] ---[ end trace fe4299baf0666457 ]--- >> >> This happens because the new delayed references code no longer merges >> delayed references that have different sequence values. The following >> steps are an example sequence leading to this issue: >> >> 1) Transaction N starts, fs_info->tree_mod_seq has value 0; >> >> 2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for >> bytenr A is created, with a value of 1 and a seq value of 0; >> >> 3) fs_info->tree_mod_seq is incremented to 1; >> >> 4) Extent buffer A is deleted through btrfs_del_items(), which calls >> btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). 
The >> latter returns the metadata extent associated to extent buffer A to >> the free space cache (the range is not pinned), because the extent >> buffer was created in the current transaction (N) and writeback never >> happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set >> in the extent buffer). >> This creates the delayed reference Ref2 for bytenr A, with a value >> of -1 and a seq value of 1; >> >> 5) Delayed reference Ref2 is not merged with Ref1 when we create it, >> because they have different sequence numbers (decided at >> add_delayed_ref_tail_merge()); >> >> 6) fs_info->tree_mod_seq is incremented to 2; >> >> 7) Some task
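The merge rule in steps 2 to 5 can be illustrated with a small user-space model. This is a minimal sketch, not kernel code: the names toy_ref, toy_head, toy_add_ref_tail_merge and toy_net_ref_mod are hypothetical, and the model captures only the one condition the patch description attributes to add_delayed_ref_tail_merge(), namely that a new reference is merged into the tail node only when both carry the same seq value. It shows that a +1 queued at seq 0 and a -1 queued at seq 1 for the same bytenr stay as two separate pending nodes, even though they cancel out arithmetically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for a delayed reference
 * node; this is NOT struct btrfs_delayed_ref_node. */
struct toy_ref {
	uint64_t bytenr;  /* extent the reference applies to */
	int ref_mod;      /* +1 adds a reference, -1 drops one */
	uint64_t seq;     /* tree_mod_seq value when the ref was queued */
};

#define TOY_MAX_REFS 16

/* Pending refs, kept in insertion order. */
struct toy_head {
	struct toy_ref refs[TOY_MAX_REFS];
	size_t nr;
};

/* Queue @ref, merging it into the tail node only if the tail is for the
 * same bytenr AND has the same seq (the condition the patch description
 * attributes to add_delayed_ref_tail_merge()). In this model a merged
 * node whose count drops to zero is discarded.
 * Returns 1 if merged, 0 if appended as a new node. */
int toy_add_ref_tail_merge(struct toy_head *head, struct toy_ref ref)
{
	if (head->nr > 0) {
		struct toy_ref *tail = &head->refs[head->nr - 1];

		if (tail->bytenr == ref.bytenr && tail->seq == ref.seq) {
			tail->ref_mod += ref.ref_mod;
			if (tail->ref_mod == 0)
				head->nr--; /* +1 and -1 cancelled out */
			return 1;
		}
	}
	assert(head->nr < TOY_MAX_REFS);
	head->refs[head->nr++] = ref;
	return 0;
}

/* Sum of all pending ref_mod values for @bytenr. */
int toy_net_ref_mod(const struct toy_head *head, uint64_t bytenr)
{
	int net = 0;

	for (size_t i = 0; i < head->nr; i++)
		if (head->refs[i].bytenr == bytenr)
			net += head->refs[i].ref_mod;
	return net;
}
```

With Ref1 = {A, +1, seq 0} and Ref2 = {A, -1, seq 1}, the second call returns 0 and two nodes remain whose net change is 0; queuing the same pair with equal seq values instead returns 1 and leaves nothing pending. What the remaining steps of the patch do with those leftover nodes is not modeled here.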
Re: [PATCH] Btrfs: fix regression when running delayed references
On Thu, Oct 22, 2015 at 10:43 AM, Filipe Manana wrote: > On Thu, Oct 22, 2015 at 10:32 AM, Qu Wenruo wrote: >> >> >> wrote on 2015/10/22 09:47 +0100: >>> >>> From: Filipe Manana >>> >>> In the kernel 4.2 merge window we had a refactoring/rework of the delayed >>> references implementation in order to fix certain problems with qgroups. >>> However that rework introduced one more regression that leads to the >>> following trace when running delayed references for metadata: >>> >>> [35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832! >>> [35908.065201] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC >>> [35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic >>> xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache >>> sunrpc loop fuse parport_pc psmouse i2 >>> [35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: GW >>> 4.3.0-rc5-btrfs-next-17+ #1 >>> [35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >>> rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 >>> [35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper >>> [btrfs] >>> [35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti: >>> 88010c4c8000 >>> [35908.065201] RIP: 0010:[] [] >>> insert_inline_extent_backref+0x52/0xb1 [btrfs] >>> [35908.065201] RSP: 0018:88010c4cbb08 EFLAGS: 00010293 >>> [35908.065201] RAX: RBX: 88008a661000 RCX: >>> >>> [35908.065201] RDX: a04dd58f RSI: 0001 RDI: >>> >>> [35908.065201] RBP: 88010c4cbb40 R08: 1000 R09: >>> 88010c4cb9f8 >>> [35908.065201] R10: R11: 002c R12: >>> >>> [35908.065201] R13: 88020a74c578 R14: R15: >>> >>> [35908.065201] FS: () GS:88023edc() >>> knlGS: >>> [35908.065201] CS: 0010 DS: ES: CR0: 8005003b >>> [35908.065201] CR2: 015e8708 CR3: 000102185000 CR4: >>> 06e0 >>> [35908.065201] Stack: >>> [35908.065201] 88010c4cbb18 0f37 88020a74c578 >>> 88015a408000 >>> [35908.065201] 880154a44000 0005 >>> 88010c4cbbd8 >>> [35908.065201] a0492b9a 0005 >>> >>> 
[35908.065201] Call Trace: >>> [35908.065201] [] __btrfs_inc_extent_ref+0x8b/0x208 >>> [btrfs] >>> [35908.065201] [] ? >>> __btrfs_run_delayed_refs+0x4d4/0xd33 [btrfs] >>> [35908.065201] [] __btrfs_run_delayed_refs+0xafa/0xd33 >>> [btrfs] >>> [35908.065201] [] ? join_transaction.isra.10+0x25/0x41f >>> [btrfs] >>> [35908.065201] [] ? join_transaction.isra.10+0xa8/0x41f >>> [btrfs] >>> [35908.065201] [] btrfs_run_delayed_refs+0x75/0x1dd >>> [btrfs] >>> [35908.065201] [] delayed_ref_async_start+0x3c/0x7b >>> [btrfs] >>> [35908.065201] [] normal_work_helper+0x14c/0x32a >>> [btrfs] >>> [35908.065201] [] btrfs_extent_refs_helper+0x12/0x14 >>> [btrfs] >>> [35908.065201] [] process_one_work+0x24a/0x4ac >>> [35908.065201] [] worker_thread+0x206/0x2c2 >>> [35908.065201] [] ? rescuer_thread+0x2cb/0x2cb >>> [35908.065201] [] ? rescuer_thread+0x2cb/0x2cb >>> [35908.065201] [] kthread+0xef/0xf7 >>> [35908.065201] [] ? kthread_parkme+0x24/0x24 >>> [35908.065201] [] ret_from_fork+0x3f/0x70 >>> [35908.065201] [] ? kthread_parkme+0x24/0x24 >>> [35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48 >>> 8d 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02 >>> <0f> 0b 4c 8b 45 30 8b 4d 28 45 31 >>> [35908.065201] RIP [] >>> insert_inline_extent_backref+0x52/0xb1 [btrfs] >>> [35908.065201] RSP >>> [35908.310885] ---[ end trace fe4299baf0666457 ]--- >>> >>> This happens because the new delayed references code no longer merges >>> delayed references that have different sequence values. 
The following >>> steps are an example sequence leading to this issue: >>> >>> 1) Transaction N starts, fs_info->tree_mod_seq has value 0; >>> >>> 2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for >>> bytenr A is created, with a value of 1 and a seq value of 0; >>> >>> 3) fs_info->tree_mod_seq is incremented to 1; >>> >>> 4) Extent buffer A is deleted through btrfs_del_items(), which calls >>> btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). The >>> latter returns the metadata extent associated to extent buffer A to >>> the free space cache (the range is not pinned), because the extent >>> buffer was created in the current transaction (N) and writeback never >>> happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set >>> in the extent buffer). >>> This creates the delayed reference Ref2 for bytenr A, with a value >>> of -1 and a seq value of 1; >>> >>> 5) Delayed reference Ref2 is not merged with Ref1 when we create it, >>> b
Re: [PATCH] Btrfs: fix regression when running delayed references
On 22-10-15 at 10:47, fdman...@kernel.org wrote: > From: Filipe Manana > > In the kernel 4.2 merge window we had a refactoring/rework of the delayed > references implementation in order to fix certain problems with qgroups. > However that rework introduced one more regression that leads to the > following trace when running delayed references for metadata: > > [35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832! > [35908.065201] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC > [35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic xor > raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc > loop fuse parport_pc psmouse i2 > [35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: GW > 4.3.0-rc5-btrfs-next-17+ #1 > [35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 > [35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] > [35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti: > 88010c4c8000 > [35908.065201] RIP: 0010:[] [] > insert_inline_extent_backref+0x52/0xb1 [btrfs] > [35908.065201] RSP: 0018:88010c4cbb08 EFLAGS: 00010293 > [35908.065201] RAX: RBX: 88008a661000 RCX: > > [35908.065201] RDX: a04dd58f RSI: 0001 RDI: > > [35908.065201] RBP: 88010c4cbb40 R08: 1000 R09: > 88010c4cb9f8 > [35908.065201] R10: R11: 002c R12: > > [35908.065201] R13: 88020a74c578 R14: R15: > > [35908.065201] FS: () GS:88023edc() > knlGS: > [35908.065201] CS: 0010 DS: ES: CR0: 8005003b > [35908.065201] CR2: 015e8708 CR3: 000102185000 CR4: > 06e0 > [35908.065201] Stack: > [35908.065201] 88010c4cbb18 0f37 88020a74c578 > 88015a408000 > [35908.065201] 880154a44000 0005 > 88010c4cbbd8 > [35908.065201] a0492b9a 0005 > > [35908.065201] Call Trace: > [35908.065201] [] __btrfs_inc_extent_ref+0x8b/0x208 [btrfs] > [35908.065201] [] ?
__btrfs_run_delayed_refs+0x4d4/0xd33 > [btrfs] > [35908.065201] [] __btrfs_run_delayed_refs+0xafa/0xd33 > [btrfs] > [35908.065201] [] ? join_transaction.isra.10+0x25/0x41f > [btrfs] > [35908.065201] [] ? join_transaction.isra.10+0xa8/0x41f > [btrfs] > [35908.065201] [] btrfs_run_delayed_refs+0x75/0x1dd [btrfs] > [35908.065201] [] delayed_ref_async_start+0x3c/0x7b [btrfs] > [35908.065201] [] normal_work_helper+0x14c/0x32a [btrfs] > [35908.065201] [] btrfs_extent_refs_helper+0x12/0x14 > [btrfs] > [35908.065201] [] process_one_work+0x24a/0x4ac > [35908.065201] [] worker_thread+0x206/0x2c2 > [35908.065201] [] ? rescuer_thread+0x2cb/0x2cb > [35908.065201] [] ? rescuer_thread+0x2cb/0x2cb > [35908.065201] [] kthread+0xef/0xf7 > [35908.065201] [] ? kthread_parkme+0x24/0x24 > [35908.065201] [] ret_from_fork+0x3f/0x70 > [35908.065201] [] ? kthread_parkme+0x24/0x24 > [35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48 8d > 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02 <0f> > 0b 4c 8b 45 30 8b 4d 28 45 31 > [35908.065201] RIP [] > insert_inline_extent_backref+0x52/0xb1 [btrfs] > [35908.065201] RSP > [35908.310885] ---[ end trace fe4299baf0666457 ]--- Would this also solve this: Oct 22 12:03:20 beast kernel: WARNING: CPU: 5 PID: 323 at lib/list_debug.c:62 __list_del_entry+0x5a/0x98() Oct 22 12:03:20 beast kernel: list_del corruption. 
next->prev should be 88033f864500, but was 88033f8642c0 Oct 22 12:03:20 beast kernel: Modules linked in: arc4 md4 nls_utf8 cifs dns_resolver fscache ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack veth loop b43 mac80211 cfg80211 ssb mmc_core kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel serio_raw sb_edac edac_core i2c_i801 btusb btrtl btintel btbcm bluetooth joydev bcma rfkill cp210x tpm_infineon tpm_tis tpm sch_fq_codel radeon crc32c_intel ttm drm_kms_helper Oct 22 12:03:20 beast kernel: CPU: 5 PID: 323 Comm: kworker/u16:12 Tainted: G W 4.2.2 #50 Oct 22 12:03:20 beast kernel: Hardware name: System manufacturer System Product Name/X79-DELUXE, BIOS 0901 06/20/2014 Oct 22 12:03:20 beast kernel: Workqueue: btrfs-delalloc btrfs_delalloc_helper Oct 22 12:03:20 beast kernel: 0009 88013993fb98 8170b663 0006 Oct 22 12:03:20 beast kernel: 88013993fbe8 88013993fbd8 8106aa40 88013993fc78 Oct 22 12:03:20 beast kernel: 813392c3 88033f864480 88033f864500 880ba968d510 Oct 22 12:03:20 beast kernel: Call Trace: Oct 22 12:03:20 beast kernel
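For context on the warning quoted above: with CONFIG_DEBUG_LIST enabled, the kernel checks an entry's neighbours before unlinking it from a doubly linked list, and "list_del corruption. next->prev should be X, but was Y" means the entry's successor no longer points back at it. Below is a simplified user-space sketch of that check, modeled loosely on lib/list_debug.c. The type and function names are hypothetical and this is not the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical minimal circular doubly linked list node. */
struct toy_list {
	struct toy_list *next, *prev;
};

void toy_list_init(struct toy_list *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert @entry just before @head, i.e. at the tail of the list. */
void toy_list_add_tail(struct toy_list *entry, struct toy_list *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink @entry only if both neighbours still point back at it;
 * otherwise print a warning in the spirit of lib/list_debug.c and
 * leave the list untouched. Returns true on a successful unlink. */
bool toy_list_del_checked(struct toy_list *entry)
{
	struct toy_list *prev = entry->prev;
	struct toy_list *next = entry->next;

	if (prev->next != entry) {
		fprintf(stderr, "list_del corruption. prev->next should be %p, but was %p\n",
			(void *)entry, (void *)prev->next);
		return false;
	}
	if (next->prev != entry) {
		fprintf(stderr, "list_del corruption. next->prev should be %p, but was %p\n",
			(void *)entry, (void *)next->prev);
		return false;
	}
	prev->next = next;
	next->prev = prev;
	entry->next = entry->prev = NULL;
	return true;
}
```

The check only detects that some other code path modified a neighbour without going through the list helpers; it does not say which path did so.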
Re: BTRFS BUG at insert_inline_extent_backref+0xe3/0xf0 while rebalancing
On 2015-10-22 10:53, Filipe Manana wrote: On Thu, Oct 22, 2015 at 6:32 AM, Erkki Seppala wrote: Hello, Recently I added daily rebalancing to my cron.d (after finding myself in the no-space situation), and not long after that, I found my PC had crashed overnight. Having no sign in the logs anywhere (not even over the network, even though there should be), I had nothing to go on, but this night it crashed again after starting the rebalance, and this time there was some information in the kernel log. Kernel version: 4.2.3 (package linux-image-4.2.0-1-amd64 version 4.2.3-1 from Debian Unstable) The dump is available at: http://www.modeemi.fi/~flux/btrfs/btrfs-BUG-2015-10-55.txt The log is available as well (I stripped some unrelated USB and firewall logging; it shows that last evening some kernel task hung for 120 seconds, but that is on another btrfs filesystem and is another story): http://www.modeemi.fi/~flux/btrfs/btrfs-2015-10-55.txt I'm not quite sure which of the btrfs balance commands caused the issue, but here is my script:
#!/bin/sh
fs="$1"
if [ -z "$fs" ]; then
  echo usage: btrfs-balance / 0 1 5 10 20 50
  exit 1
fi
fs="$1"
shift
for usage in d m; do
  for a in "$@"; do
    date; /bin/btrfs balance start "$fs" -v -${usage}usage=$a;
  done;
done
And it was started at 07:30 with: /usr/local/sbin/btrfs-balance / 0 1 2 5 10 20 30 50 70
I should add that the filesystem in question is backed by MD RAID10 and that is backed by four SSDs, so it's reasonably fast in IO, if that affects anything. There should not have been much competing IO at the time of the occurrence. Before Duncan asks ;-), I only have a moderate number of subvolumes and snapshots, i.e. one subvolume each for /, /var/log/journal and /home, 24 snapshots of / and /home, plus <10 snapshots of /. Before that balance there was another balance on another BTRFS RAID10, but given the timestamp I think I can safely say it wasn't the cause.
I don't really have any other 'solutions' than disabling the rebalancing for the time being and only using it as needed, as I had done earlier. Try this (just sent a few minutes ago): https://patchwork.kernel.org/patch/7463161/ Awesome, I'll also try it right now under 4.3.0-rc6. My system is currently hit so hard by this bug that it no longer survives a balance for longer than a few minutes. Will keep you posted on the outcome. Thanks, -- Stéphane.
Re: [PATCH] Btrfs: fix regression when running delayed references
On 2015-10-22 17:43, Filipe Manana wrote: On Thu, Oct 22, 2015 at 10:32 AM, Qu Wenruo wrote: wrote on 2015/10/22 09:47 +0100: From: Filipe Manana In the kernel 4.2 merge window we had a refactoring/rework of the delayed references implementation in order to fix certain problems with qgroups. However that rework introduced one more regression that leads to the following trace when running delayed references for metadata: [35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832! [35908.065201] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC [35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc psmouse i2 [35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: GW 4.3.0-rc5-btrfs-next-17+ #1 [35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 [35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] [35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti: 88010c4c8000 [35908.065201] RIP: 0010:[] [] insert_inline_extent_backref+0x52/0xb1 [btrfs] [35908.065201] RSP: 0018:88010c4cbb08 EFLAGS: 00010293 [35908.065201] RAX: RBX: 88008a661000 RCX: [35908.065201] RDX: a04dd58f RSI: 0001 RDI: [35908.065201] RBP: 88010c4cbb40 R08: 1000 R09: 88010c4cb9f8 [35908.065201] R10: R11: 002c R12: [35908.065201] R13: 88020a74c578 R14: R15: [35908.065201] FS: () GS:88023edc() knlGS: [35908.065201] CS: 0010 DS: ES: CR0: 8005003b [35908.065201] CR2: 015e8708 CR3: 000102185000 CR4: 06e0 [35908.065201] Stack: [35908.065201] 88010c4cbb18 0f37 88020a74c578 88015a408000 [35908.065201] 880154a44000 0005 88010c4cbbd8 [35908.065201] a0492b9a 0005 [35908.065201] Call Trace: [35908.065201] [] __btrfs_inc_extent_ref+0x8b/0x208 [btrfs] [35908.065201] [] ?
__btrfs_run_delayed_refs+0x4d4/0xd33 [btrfs] [35908.065201] [] __btrfs_run_delayed_refs+0xafa/0xd33 [btrfs] [35908.065201] [] ? join_transaction.isra.10+0x25/0x41f [btrfs] [35908.065201] [] ? join_transaction.isra.10+0xa8/0x41f [btrfs] [35908.065201] [] btrfs_run_delayed_refs+0x75/0x1dd [btrfs] [35908.065201] [] delayed_ref_async_start+0x3c/0x7b [btrfs] [35908.065201] [] normal_work_helper+0x14c/0x32a [btrfs] [35908.065201] [] btrfs_extent_refs_helper+0x12/0x14 [btrfs] [35908.065201] [] process_one_work+0x24a/0x4ac [35908.065201] [] worker_thread+0x206/0x2c2 [35908.065201] [] ? rescuer_thread+0x2cb/0x2cb [35908.065201] [] ? rescuer_thread+0x2cb/0x2cb [35908.065201] [] kthread+0xef/0xf7 [35908.065201] [] ? kthread_parkme+0x24/0x24 [35908.065201] [] ret_from_fork+0x3f/0x70 [35908.065201] [] ? kthread_parkme+0x24/0x24 [35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48 8d 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02 <0f> 0b 4c 8b 45 30 8b 4d 28 45 31 [35908.065201] RIP [] insert_inline_extent_backref+0x52/0xb1 [btrfs] [35908.065201] RSP [35908.310885] ---[ end trace fe4299baf0666457 ]--- This happens because the new delayed references code no longer merges delayed references that have different sequence values. The following steps are an example sequence leading to this issue: 1) Transaction N starts, fs_info->tree_mod_seq has value 0; 2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for bytenr A is created, with a value of 1 and a seq value of 0; 3) fs_info->tree_mod_seq is incremented to 1; 4) Extent buffer A is deleted through btrfs_del_items(), which calls btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). 
The latter returns the metadata extent associated to extent buffer A to the free space cache (the range is not pinned), because the extent buffer was created in the current transaction (N) and writeback never happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set in the extent buffer). This creates the delayed reference Ref2 for bytenr A, with a value of -1 and a seq value of 1; 5) Delayed reference Ref2 is not merged with Ref1 when we create it, because they have different sequence numbers (decided at add_delayed_ref_tail_merge()); 6) fs_info->tree_mod_seq is incremented to 2; 7) Some task attempts to allocate a new extent buffer (done at extent-tree.c:find_free_extent()), but due to heavy fragmentation and running low on metadata space the clustered allocation fails and we fall back to unclustered allocation, which finds the ex
Re: [PATCH] Btrfs: fix regression when running delayed references
On Thu, Oct 22, 2015 at 11:05 AM, Koen Kooi wrote: > Op 22-10-15 om 10:47 schreef fdman...@kernel.org: >> From: Filipe Manana >> >> In the kernel 4.2 merge window we had a refactoring/rework of the delayed >> references implementation in order to fix certain problems with qgroups. >> However that rework introduced one more regression that leads to the >> following trace when running delayed references for metadata: >> >> [35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832! >> [35908.065201] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC >> [35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic xor >> raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache >> sunrpc loop fuse parport_pc psmouse i2 >> [35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: GW >>4.3.0-rc5-btrfs-next-17+ #1 >> [35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >> rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 >> [35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] >> [35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti: >> 88010c4c8000 >> [35908.065201] RIP: 0010:[] [] >> insert_inline_extent_backref+0x52/0xb1 [btrfs] >> [35908.065201] RSP: 0018:88010c4cbb08 EFLAGS: 00010293 >> [35908.065201] RAX: RBX: 88008a661000 RCX: >> >> [35908.065201] RDX: a04dd58f RSI: 0001 RDI: >> >> [35908.065201] RBP: 88010c4cbb40 R08: 1000 R09: >> 88010c4cb9f8 >> [35908.065201] R10: R11: 002c R12: >> >> [35908.065201] R13: 88020a74c578 R14: R15: >> >> [35908.065201] FS: () GS:88023edc() >> knlGS: >> [35908.065201] CS: 0010 DS: ES: CR0: 8005003b >> [35908.065201] CR2: 015e8708 CR3: 000102185000 CR4: >> 06e0 >> [35908.065201] Stack: >> [35908.065201] 88010c4cbb18 0f37 88020a74c578 >> 88015a408000 >> [35908.065201] 880154a44000 0005 >> 88010c4cbbd8 >> [35908.065201] a0492b9a 0005 >> >> [35908.065201] Call Trace: >> [35908.065201] [] __btrfs_inc_extent_ref+0x8b/0x208 >> [btrfs] >> 
[35908.065201] [] ? __btrfs_run_delayed_refs+0x4d4/0xd33 >> [btrfs] >> [35908.065201] [] __btrfs_run_delayed_refs+0xafa/0xd33 >> [btrfs] >> [35908.065201] [] ? join_transaction.isra.10+0x25/0x41f >> [btrfs] >> [35908.065201] [] ? join_transaction.isra.10+0xa8/0x41f >> [btrfs] >> [35908.065201] [] btrfs_run_delayed_refs+0x75/0x1dd >> [btrfs] >> [35908.065201] [] delayed_ref_async_start+0x3c/0x7b >> [btrfs] >> [35908.065201] [] normal_work_helper+0x14c/0x32a [btrfs] >> [35908.065201] [] btrfs_extent_refs_helper+0x12/0x14 >> [btrfs] >> [35908.065201] [] process_one_work+0x24a/0x4ac >> [35908.065201] [] worker_thread+0x206/0x2c2 >> [35908.065201] [] ? rescuer_thread+0x2cb/0x2cb >> [35908.065201] [] ? rescuer_thread+0x2cb/0x2cb >> [35908.065201] [] kthread+0xef/0xf7 >> [35908.065201] [] ? kthread_parkme+0x24/0x24 >> [35908.065201] [] ret_from_fork+0x3f/0x70 >> [35908.065201] [] ? kthread_parkme+0x24/0x24 >> [35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48 >> 8d 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02 >> <0f> 0b 4c 8b 45 30 8b 4d 28 45 31 >> [35908.065201] RIP [] >> insert_inline_extent_backref+0x52/0xb1 [btrfs] >> [35908.065201] RSP >> [35908.310885] ---[ end trace fe4299baf0666457 ]--- > > Would this also solve this: No, what you get is a totally different and unrelated problem. > > Oct 22 12:03:20 beast kernel: WARNING: CPU: 5 PID: 323 at lib/list_debug.c:62 > __list_del_entry+0x5a/0x98() > Oct 22 12:03:20 beast kernel: list_del corruption. 
next->prev should be > 88033f864500, but was 88033f8642c0 > Oct 22 12:03:20 beast kernel: Modules linked in: arc4 md4 nls_utf8 cifs > dns_resolver fscache ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat > nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack veth loop b43 > mac80211 cfg80211 ssb mmc_core kvm_intel kvm crct10dif_pclmul crc32_pclmul > ghash_clmulni_intel serio_raw sb_edac edac_core i2c_i801 btusb btrtl btintel > btbcm bluetooth joydev bcma rfkill cp210x tpm_infineon tpm_tis tpm > sch_fq_codel radeon crc32c_intel ttm drm_kms_helper > Oct 22 12:03:20 beast kernel: CPU: 5 PID: 323 Comm: kworker/u16:12 Tainted: G >W 4.2.2 #50 > Oct 22 12:03:20 beast kernel: Hardware name: System manufacturer System > Product Name/X79-DELUXE, BIOS 0901 06/20/2014 > Oct 22 12:03:20 beast kernel: Workqueue: btrfs-delalloc btrfs_delalloc_helper > Oct 22 12:03:20 beast kernel: 0009 88013993fb98 > 8170b663 0006 > Oct 22 12:03:20 beast k
Re: [PATCH] Btrfs: fix regression when running delayed references
On 2015-10-22 11:47, Filipe Manana wrote: On Thu, Oct 22, 2015 at 10:43 AM, Filipe Manana wrote: On Thu, Oct 22, 2015 at 10:32 AM, Qu Wenruo wrote: wrote on 2015/10/22 09:47 +0100: From: Filipe Manana In the kernel 4.2 merge window we had a refactoring/rework of the delayed references implementation in order to fix certain problems with qgroups. However that rework introduced one more regression that leads to the following trace when running delayed references for metadata: [35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832! [35908.065201] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC [35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc psmouse i2 [35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: G W 4.3.0-rc5-btrfs-next-17+ #1 [35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 [35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] [35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti: 88010c4c8000 [35908.065201] RIP: 0010:[] [] insert_inline_extent_backref+0x52/0xb1 [btrfs] [35908.065201] RSP: 0018:88010c4cbb08 EFLAGS: 00010293 [35908.065201] RAX: RBX: 88008a661000 RCX: [35908.065201] RDX: a04dd58f RSI: 0001 RDI: [35908.065201] RBP: 88010c4cbb40 R08: 1000 R09: 88010c4cb9f8 [35908.065201] R10: R11: 002c R12: [35908.065201] R13: 88020a74c578 R14: R15: [35908.065201] FS: () GS:88023edc() knlGS: [35908.065201] CS: 0010 DS: ES: CR0: 8005003b [35908.065201] CR2: 015e8708 CR3: 000102185000 CR4: 06e0 [35908.065201] Stack: [35908.065201] 88010c4cbb18 0f37 88020a74c578 88015a408000 [35908.065201] 880154a44000 0005 88010c4cbbd8 [35908.065201] a0492b9a 0005 [35908.065201] Call Trace: [35908.065201] [] __btrfs_inc_extent_ref+0x8b/0x208 [btrfs] [35908.065201] [] ?
__btrfs_run_delayed_refs+0x4d4/0xd33 [btrfs] [35908.065201] [] __btrfs_run_delayed_refs+0xafa/0xd33 [btrfs] [35908.065201] [] ? join_transaction.isra.10+0x25/0x41f [btrfs] [35908.065201] [] ? join_transaction.isra.10+0xa8/0x41f [btrfs] [35908.065201] [] btrfs_run_delayed_refs+0x75/0x1dd [btrfs] [35908.065201] [] delayed_ref_async_start+0x3c/0x7b [btrfs] [35908.065201] [] normal_work_helper+0x14c/0x32a [btrfs] [35908.065201] [] btrfs_extent_refs_helper+0x12/0x14 [btrfs] [35908.065201] [] process_one_work+0x24a/0x4ac [35908.065201] [] worker_thread+0x206/0x2c2 [35908.065201] [] ? rescuer_thread+0x2cb/0x2cb [35908.065201] [] ? rescuer_thread+0x2cb/0x2cb [35908.065201] [] kthread+0xef/0xf7 [35908.065201] [] ? kthread_parkme+0x24/0x24 [35908.065201] [] ret_from_fork+0x3f/0x70 [35908.065201] [] ? kthread_parkme+0x24/0x24 [35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48 8d 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02 <0f> 0b 4c 8b 45 30 8b 4d 28 45 31 [35908.065201] RIP [] insert_inline_extent_backref+0x52/0xb1 [btrfs] [35908.065201] RSP [35908.310885] ---[ end trace fe4299baf0666457 ]--- This happens because the new delayed references code no longer merges delayed references that have different sequence values. The following steps are an example sequence leading to this issue: 1) Transaction N starts, fs_info->tree_mod_seq has value 0; 2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for bytenr A is created, with a value of 1 and a seq value of 0; 3) fs_info->tree_mod_seq is incremented to 1; 4) Extent buffer A is deleted through btrfs_del_items(), which calls btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). 
The latter returns the metadata extent associated to extent buffer A to the free space cache (the range is not pinned), because the extent buffer was created in the current transaction (N) and writeback never happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set in the extent buffer). This creates the delayed reference Ref2 for bytenr A, with a value of -1 and a seq value of 1; 5) Delayed reference Ref2 is not merged with Ref1 when we create it, because they have different sequence numbers (decided at add_delayed_ref_tail_merge()); 6) fs_info->tree_mod_seq is incremented to 2; 7) Some task attempts to allocate a new extent buffer (done at extent-tree.c:find_free_extent()), but due to heavy fragmentation and running low on metadata space the clustered
Re: BTRFS BUG at insert_inline_extent_backref+0xe3/0xf0 while rebalancing
Hello, Thanks for the super-fast response :). I've installed the patch and shall be waiting. The effects should be visible within a week given daily rebalances of two filesystems. -- http://www.modeemi.fi/~flux/
Re: [PATCH] Btrfs: fix regression when running delayed references
On Thu, Oct 22, 2015 at 3:58 PM, Stéphane Lesimple wrote: > On 2015-10-22 11:47, Filipe Manana wrote: >> >> On Thu, Oct 22, 2015 at 10:43 AM, Filipe Manana >> wrote: >>> >>> On Thu, Oct 22, 2015 at 10:32 AM, Qu Wenruo >>> wrote: wrote on 2015/10/22 09:47 +0100: > > > From: Filipe Manana > > In the kernel 4.2 merge window we had a refactoring/rework of the > delayed > references implementation in order to fix certain problems with > qgroups. > However that rework introduced one more regression that leads to the > following trace when running delayed references for metadata: > > [35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832! > [35908.065201] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC > [35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic > xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace > fscache > sunrpc loop fuse parport_pc psmouse i2 > [35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: G > W > 4.3.0-rc5-btrfs-next-17+ #1 > [35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS > rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 > [35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper > [btrfs] > [35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti: > 88010c4c8000 > [35908.065201] RIP: 0010:[] [] > insert_inline_extent_backref+0x52/0xb1 [btrfs] > [35908.065201] RSP: 0018:88010c4cbb08 EFLAGS: 00010293 > [35908.065201] RAX: RBX: 88008a661000 RCX: > > [35908.065201] RDX: a04dd58f RSI: 0001 RDI: > > [35908.065201] RBP: 88010c4cbb40 R08: 1000 R09: > 88010c4cb9f8 > [35908.065201] R10: R11: 002c R12: > > [35908.065201] R13: 88020a74c578 R14: R15: > > [35908.065201] FS: () GS:88023edc() > knlGS: > [35908.065201] CS: 0010 DS: ES: CR0: 8005003b > [35908.065201] CR2: 015e8708 CR3: 000102185000 CR4: > 06e0 > [35908.065201] Stack: > [35908.065201] 88010c4cbb18 0f37 88020a74c578 > 88015a408000 > [35908.065201] 880154a44000 0005 > 88010c4cbbd8 >
[35908.065201] a0492b9a 0005 > > [35908.065201] Call Trace: > [35908.065201] [] __btrfs_inc_extent_ref+0x8b/0x208 > [btrfs] > [35908.065201] [] ? > __btrfs_run_delayed_refs+0x4d4/0xd33 [btrfs] > [35908.065201] [] > __btrfs_run_delayed_refs+0xafa/0xd33 > [btrfs] > [35908.065201] [] ? > join_transaction.isra.10+0x25/0x41f > [btrfs] > [35908.065201] [] ? > join_transaction.isra.10+0xa8/0x41f > [btrfs] > [35908.065201] [] btrfs_run_delayed_refs+0x75/0x1dd > [btrfs] > [35908.065201] [] delayed_ref_async_start+0x3c/0x7b > [btrfs] > [35908.065201] [] normal_work_helper+0x14c/0x32a > [btrfs] > [35908.065201] [] btrfs_extent_refs_helper+0x12/0x14 > [btrfs] > [35908.065201] [] process_one_work+0x24a/0x4ac > [35908.065201] [] worker_thread+0x206/0x2c2 > [35908.065201] [] ? rescuer_thread+0x2cb/0x2cb > [35908.065201] [] ? rescuer_thread+0x2cb/0x2cb > [35908.065201] [] kthread+0xef/0xf7 > [35908.065201] [] ? kthread_parkme+0x24/0x24 > [35908.065201] [] ret_from_fork+0x3f/0x70 > [35908.065201] [] ? kthread_parkme+0x24/0x24 > [35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 > 48 > 8d 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 > 02 > <0f> 0b 4c 8b 45 30 8b 4d 28 45 31 > [35908.065201] RIP [] > insert_inline_extent_backref+0x52/0xb1 [btrfs] > [35908.065201] RSP > [35908.310885] ---[ end trace fe4299baf0666457 ]--- > > This happens because the new delayed references code no longer merges > delayed references that have different sequence values. The following > steps are an example sequence leading to this issue: > > 1) Transaction N starts, fs_info->tree_mod_seq has value 0; > > 2) Extent buffer (btree node) A is allocated, delayed reference Ref1 > for > bytenr A is created, with a value of 1 and a seq value of 0; > > 3) fs_info->tree_mod_seq is incremented to 1; > > 4) Extent buffer A is deleted through btrfs_del_items(), which calls > btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). 
The > latter returns the metadata extent associated to extent buffer A to > the free space cache (the range is not pinned), because the
Re: [PATCH] Btrfs-progs: fix btrfs-convert rollback to check ROOT_BACKREF
On Sun, Oct 18, 2015 at 07:41:27PM +0800, Qu Wenruo wrote: > On 2015-10-18 13:44, Liu Bo wrote: > > Btrfs has changed to delete subvolume/snapshot asynchronously, which means > > that > > after umount itself, if we've already deleted 'ext2_saved', rollback can > > still > > be completed. > > > > So this adds a check for ROOT_BACKREF before checking ROOT_ITEM since > > ROOT_BACKREF is immediately not in the btree after > > ioctl(BTRFS_IOC_SNAP_DESTROY) > > returns. > > > > Signed-off-by: Liu Bo > Reviewed-by: Qu Wenruo > > Looks good to me. > > Although the error message for ret > 0 case can be improved a little, like: > "unable to find convert image subvolume, maybe it's already deleted?\n". I've adjusted the error messages. > BTW, would you please submit a test case for fstests? It won't be a hard > one though. Test added.
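The lookup order described in this patch can be sketched as a toy model. This is a hypothetical illustration, not btrfs-progs code: it reduces the root tree state for ext2_saved to two flags and assumes, as the patch description states, that the ROOT_BACKREF is gone as soon as ioctl(BTRFS_IOC_SNAP_DESTROY) returns, while the ROOT_ITEM can linger until the deleted subvolume is actually cleaned up. Checking only the ROOT_ITEM therefore lets a rollback proceed for an image that was already deleted. The struct and function names are invented for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical snapshot of what the root tree holds for ext2_saved. */
struct toy_saved_subvol {
	bool has_root_backref; /* gone once BTRFS_IOC_SNAP_DESTROY returns */
	bool has_root_item;    /* may remain until deferred cleanup runs */
};

/* Old behaviour in this model: only the ROOT_ITEM is consulted. */
bool toy_can_rollback_item_only(const struct toy_saved_subvol *s)
{
	return s->has_root_item;
}

/* Patched order in this model: ROOT_BACKREF first, then ROOT_ITEM.
 * The message text follows the reviewer's suggestion in this thread;
 * the wording finally merged into btrfs-progs may differ. */
bool toy_can_rollback_backref_first(const struct toy_saved_subvol *s)
{
	if (!s->has_root_backref) {
		fprintf(stderr, "unable to find convert image subvolume, "
				"maybe it's already deleted?\n");
		return false;
	}
	return s->has_root_item;
}
```

For a subvolume whose backref is gone but whose root item still exists, the item-only check reports that rollback is possible while the backref-first check refuses, which is the behaviour the xfstests case earlier in this thread expects from btrfs-convert -r.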
Re: [PATCH] Btrfs: fix regression when running delayed references
On 2015-10-22 19:03, Filipe Manana wrote: On Thu, Oct 22, 2015 at 3:58 PM, Stéphane Lesimple wrote: On 2015-10-22 11:47, Filipe Manana wrote: On Thu, Oct 22, 2015 at 10:43 AM, Filipe Manana wrote: On Thu, Oct 22, 2015 at 10:32 AM, Qu Wenruo wrote: wrote on 2015/10/22 09:47 +0100: From: Filipe Manana In the kernel 4.2 merge window we had a refactoring/rework of the delayed references implementation in order to fix certain problems with qgroups. However that rework introduced one more regression that leads to the following trace when running delayed references for metadata: [35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832! [35908.065201] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC [35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc psmouse i2 [35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: G W 4.3.0-rc5-btrfs-next-17+ #1 [35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 [35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] [35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti: 88010c4c8000 [35908.065201] RIP: 0010:[] [] insert_inline_extent_backref+0x52/0xb1 [btrfs] [35908.065201] RSP: 0018:88010c4cbb08 EFLAGS: 00010293 [35908.065201] RAX: RBX: 88008a661000 RCX: [35908.065201] RDX: a04dd58f RSI: 0001 RDI: [35908.065201] RBP: 88010c4cbb40 R08: 1000 R09: 88010c4cb9f8 [35908.065201] R10: R11: 002c R12: [35908.065201] R13: 88020a74c578 R14: R15: [35908.065201] FS: () GS:88023edc() knlGS: [35908.065201] CS: 0010 DS: ES: CR0: 8005003b [35908.065201] CR2: 015e8708 CR3: 000102185000 CR4: 06e0 [35908.065201] Stack: [35908.065201] 88010c4cbb18 0f37 88020a74c578 88015a408000 [35908.065201] 880154a44000 0005 88010c4cbbd8 [35908.065201] a0492b9a 0005 [35908.065201] Call Trace: [35908.065201] []
__btrfs_inc_extent_ref+0x8b/0x208 [btrfs] [35908.065201] [] ? __btrfs_run_delayed_refs+0x4d4/0xd33 [btrfs] [35908.065201] [] __btrfs_run_delayed_refs+0xafa/0xd33 [btrfs] [35908.065201] [] ? join_transaction.isra.10+0x25/0x41f [btrfs] [35908.065201] [] ? join_transaction.isra.10+0xa8/0x41f [btrfs] [35908.065201] [] btrfs_run_delayed_refs+0x75/0x1dd [btrfs] [35908.065201] [] delayed_ref_async_start+0x3c/0x7b [btrfs] [35908.065201] [] normal_work_helper+0x14c/0x32a [btrfs] [35908.065201] [] btrfs_extent_refs_helper+0x12/0x14 [btrfs] [35908.065201] [] process_one_work+0x24a/0x4ac [35908.065201] [] worker_thread+0x206/0x2c2 [35908.065201] [] ? rescuer_thread+0x2cb/0x2cb [35908.065201] [] ? rescuer_thread+0x2cb/0x2cb [35908.065201] [] kthread+0xef/0xf7 [35908.065201] [] ? kthread_parkme+0x24/0x24 [35908.065201] [] ret_from_fork+0x3f/0x70 [35908.065201] [] ? kthread_parkme+0x24/0x24 [35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48 8d 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02 <0f> 0b 4c 8b 45 30 8b 4d 28 45 31 [35908.065201] RIP [] insert_inline_extent_backref+0x52/0xb1 [btrfs] [35908.065201] RSP [35908.310885] ---[ end trace fe4299baf0666457 ]--- This happens because the new delayed references code no longer merges delayed references that have different sequence values. The following steps are an example sequence leading to this issue: 1) Transaction N starts, fs_info->tree_mod_seq has value 0; 2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for bytenr A is created, with a value of 1 and a seq value of 0; 3) fs_info->tree_mod_seq is incremented to 1; 4) Extent buffer A is deleted through btrfs_del_items(), which calls btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). 
The later returns the metadata extent associated to extent buffer A to the free space cache (the range is not pinned), because the extent buffer was created in the current transaction (N) and writeback never happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set in the extent buffer). This creates the delayed reference Ref2 for bytenr A, with a value of -1 and a seq value of 1; 5) Delayed reference Ref2 is not merged with Ref1 when we create it, because they have different sequence numbers (decided at add_delayed_ref_tail_merge()); 6) fs_info->tree_mod_seq is incremented to 2; 7) Some task attempts to allocate a new extent buffer (done at extent-t
[PATCH] Btrfs: igrab inode in writepage
We hit this panic on a few of our boxes this week where we have an ordered_extent with a NULL inode. We do an igrab() of the inode in writepages, but weren't doing it in writepage, which can be called directly from the VM on dirty pages. If the inode has been unlinked then we could have I_FREEING set, which means igrab() would return NULL and we get this panic. Fix this by trying to igrab in btrfs_writepage, and if it returns NULL then just redirty the page and return AOP_WRITEPAGE_ACTIVATE; so the VM knows it wasn't successful. Thanks,

Signed-off-by: Josef Bacik
---
 fs/btrfs/inode.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a0fa725..4d1fdc2 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8438,15 +8438,28 @@ int btrfs_readpage(struct file *file, struct page *page)
 static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
 {
 	struct extent_io_tree *tree;
-
+	struct inode *inode = page->mapping->host;
+	int ret;
 
 	if (current->flags & PF_MEMALLOC) {
 		redirty_page_for_writepage(wbc, page);
 		unlock_page(page);
 		return 0;
 	}
+
+	/*
+	 * If we are under memory pressure we will call this directly from the
+	 * VM, we need to make sure we have the inode referenced for the ordered
+	 * extent.  If not just return like we didn't do anything.
+	 */
+	if (!igrab(inode)) {
+		redirty_page_for_writepage(wbc, page);
+		return AOP_WRITEPAGE_ACTIVATE;
+	}
 	tree = &BTRFS_I(page->mapping->host)->io_tree;
-	return extent_write_full_page(tree, page, btrfs_get_extent, wbc);
+	ret = extent_write_full_page(tree, page, btrfs_get_extent, wbc);
+	btrfs_add_delayed_iput(inode);
+	return ret;
 }
 
 static int btrfs_writepages(struct address_space *mapping,
-- 
2.1.0
Exclusive quota of snapshot exceeded despite no space used
I'm having a weird problem with snapshots and exclusive quotas. After creating a snapshot of a subvolume and setting an exclusive quota of 50MB for the snapshot, everything seems to work fine. I can write approximately 50MB before the quota kicks in.

However, if I create a snapshot, set an exclusive quota and just wait for some time, I suddenly cannot even create an empty file because I'm getting a "quota exceeded" error. The time until the bug appears seems to vary. During the waiting time, I'm changing neither the snapshot nor the original subvolume. "qgroup show -e" reports an exclusive use of only a few kilobytes for the snapshot, which is nowhere near the limit.

Steps to reproduce (/media/extern is a fresh and empty btrfs partition):

Enable quota and create an empty subvolume:

root@t420:/media/extern# btrfs quota enable .
root@t420:/media/extern# btrfs subvolume create sub
Create subvolume './sub'

Snapshot the subvolume and set a limit:

root@t420:/media/extern# btrfs subvolume snapshot sub snap
Create a snapshot of 'sub' in './snap'
root@t420:/media/extern# cd snap/
root@t420:/media/extern/snap# btrfs qgroup limit -e 50M .

Sometimes it takes "longer" for the quota to kick in, so I'm touching a file every 5 minutes here:

root@t420:/media/extern/snap# for file in {1..100}; do touch $file; sleep 5m; done
touch: cannot touch ‘7’: Disk quota exceeded
^C
root@t420:/media/extern/snap# btrfs qgroup show -e .
qgroupid         rfer         excl     max_excl
0/5          16.00KiB     16.00KiB         none
0/257        16.00KiB     16.00KiB         none
0/258        16.00KiB     16.00KiB     50.00MiB

Any idea why this happens?

Thanks,
Johannes

System info:

Linux t420 4.3.0-rc5 #1 SMP Tue Oct 13 13:21:02 CEST 2015 x86_64 GNU/Linux

Label: none  uuid: 9551e3ca-1608-469c-9d8c-77b99ce0e8ec
	Total devices 1 FS bytes used 816.00KiB
	devid    1 size 931.51GiB used 2.04GiB path /dev/sdb1

btrfs-progs v4.1.2

Data, single: total=8.00MiB, used=256.00KiB
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=544.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B

[249174.151820] sdb: sdb1
[249184.387377] sdb: sdb1
[249184.573096] sdb: sdb1
[249184.656274] BTRFS: device fsid 9551e3ca-1608-469c-9d8c-77b99ce0e8ec devid 1 transid 3 /dev/sdb1
[249186.323915] sdb: sdb1
[249186.534505] sdb: sdb1
[249186.538420] sdb: sdb1
[249196.781978] BTRFS info (device sdb1): disk space caching is enabled
[249196.781986] BTRFS: has skinny extents
[249196.781990] BTRFS: flagging fs with big metadata feature
[249196.818164] BTRFS: creating UUID tree
[249202.311983] BTRFS info (device sdb1): qgroup scan completed (inconsistency flag cleared)
Re: [PATCH] Btrfs: fix regression when running delayed references
[ ... thread cleanup ... ]

Don't hesitate to ask if you need me to debug or even ftrace something.

Thanks Stéphane. I haven't seen that crash yet (still running tests for 2 consecutive days now). Can you please try the following patch, which works on top of mine, and enable ftrace before running balance:

Debug patch: https://friendpaste.com/5s3dItRpcpq3dH1E4KUJor

Enable ftrace:

$ echo > /sys/kernel/debug/tracing/trace
$ echo "nop" > /sys/kernel/debug/tracing/current_tracer
$ echo 10 > /sys/kernel/debug/tracing/buffer_size_kb   # if you can use larger buffer size, even better
$ echo > /sys/kernel/debug/tracing/set_ftrace_filter
$ echo 1 > /sys/kernel/debug/tracing/tracing_on
$ run balance... wait until it finishes with IO error or the patch's printk message shows up in dmesg/syslog
$ echo 0 > /sys/kernel/debug/tracing/tracing_on
$ cat /sys/kernel/debug/tracing/trace > some_file.txt

Then send us some_file.txt for debugging; hopefully it will give some useful information. Note that it might produce tons of messages, depending on how long it takes for you to hit the BUG_ON.

Thanks a lot for this.

I'm compiling it now (using your v2 of the friendpaste diff).

I took the liberty of adding a tracing_off() right before the return -EIO so that the trace tail ends exactly at the right place.

Last time I tried to use ftrace to diagnose the bug we're trying to fix, the system crashed so hard that it was usually complicated to get the trace contents written somewhere before the system became unusable. If needed, I'll work around it by using /sys/kernel/debug/tracing/trace_pipe to send the trace live to another machine over the LAN.

This series of bugs is so easy to trigger on my system that we'll hopefully get something useful out of the trace. I guess that's a good thing!

So, this time it took a little over an hour to get the crash, but it did reach the -EIO condition eventually.
The ftrace log (2M gzipped) is available here:
http://www.speed47.net/tmp2/btrfs-4.3rc6p7463161-ftrace1.log.gz

The associated kernel log is as follows:

[ 2880.178589] INFO: task btrfs-transacti:7358 blocked for more than 120 seconds.
[ 2880.178600] Not tainted 4.3.0-rc6p7463161+ #3
[ 2880.178603] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3088.829429] Out of memory: Kill process 9449 (df-complex2simp) score 246 or sacrifice child
[ 3088.829435] Killed process 9449 (df-complex2simp) total-vm:964732kB, anon-rss:943764kB, file-rss:0kB
[ 3600.197642] INFO: task btrfs-transacti:7358 blocked for more than 120 seconds.
[ 3600.197657] Not tainted 4.3.0-rc6p7463161+ #3
[ 3600.197660] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3840.204146] INFO: task btrfs-transacti:7358 blocked for more than 120 seconds.
[ 3840.204180] Not tainted 4.3.0-rc6p7463161+ #3
[ 3840.204219] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3993.671982] Out of memory: Kill process 11357 (df-complex2simp) score 227 or sacrifice child
[ 3993.671989] Killed process 11357 (df-complex2simp) total-vm:891608kB, anon-rss:870704kB, file-rss:60kB
[ 4080.210324] INFO: task btrfs-transacti:7358 blocked for more than 120 seconds.
[ 4080.210336] Not tainted 4.3.0-rc6p7463161+ #3
[ 4080.210339] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4320.215635] INFO: task btrfs-transacti:7358 blocked for more than 120 seconds.
[ 4320.215662] Not tainted 4.3.0-rc6p7463161+ #3
[ 4320.215667] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4560.221119] INFO: task btrfs-transacti:7358 blocked for more than 120 seconds.
[ 4560.221146] Not tainted 4.3.0-rc6p7463161+ #3
[ 4560.221148] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4800.226884] INFO: task btrfs-transacti:7358 blocked for more than 120 seconds.
[ 4800.226898] Not tainted 4.3.0-rc6p7463161+ #3
[ 4800.226902] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4890.116131] Out of memory: Kill process 13377 (df-complex2simp) score 207 or sacrifice child
[ 4890.116138] Killed process 13377 (df-complex2simp) total-vm:834976kB, anon-rss:793272kB, file-rss:48kB
[ 5785.793580] Out of memory: Kill process 15285 (df-complex2simp) score 201 or sacrifice child
[ 5785.793586] Killed process 15285 (df-complex2simp) total-vm:802208kB, anon-rss:772172kB, file-rss:4kB
[ 6480.269728] INFO: task btrfs-transacti:7358 blocked for more than 120 seconds.
[ 6480.269738] Not tainted 4.3.0-rc6p7463161+ #3
[ 6480.269740] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7081.967354] BTRFS: here, ref_mod != 1, bytenr 12090260504576, ref_mod 2, seq 0 action 1
[ 7081.967784] BTRFS: error (device dm-3) in btrfs_run_delayed_refs:2872: errno=-5 IO failure

The OOM conditions are unrelated, this is an rrdtool cr
Re: [PATCH] Btrfs: fix regression when running delayed references
On Thu, Oct 22, 2015 at 11:38 PM, Stéphane Lesimple wrote:
[ ... thread cleanup ... ]
Don't hesitate to ask if you need me to debug or even ftrace something.
>>>
>>>
>>> Thanks Stéphane. I haven't seen that crash yet (still running tests
>>> for 2 consecutive days now).
>>> Can you please try the following patch, which works on top of mine,
>>> and enable ftrace before running balance:
>>>
>>> Debug patch: https://friendpaste.com/5s3dItRpcpq3dH1E4KUJor
>>>
>>> Enable ftrace:
>>>
>>> $ echo > /sys/kernel/debug/tracing/trace
>>> $ echo "nop" > /sys/kernel/debug/tracing/current_tracer
>>> $ echo 10 > /sys/kernel/debug/tracing/buffer_size_kb # if
>>> you can use larger buffer size, even better
>>> $ echo > /sys/kernel/debug/tracing/set_ftrace_filter
>>> $ echo 1 > /sys/kernel/debug/tracing/tracing_on
>>>
>>> $ run balance... wait until it finishes with IO error or the
>>> patch's printk message shows up in dmesg/syslog
>>>
>>> $ echo 0 > /sys/kernel/debug/tracing/tracing_on
>>>
>>> $ cat /sys/kernel/debug/tracing/trace > some_file.txt
>>>
>>> Then send is some_file.txt for debugging, hopefully it will give some
>>> useful information. Note that it might produce tons of messages,
>>> depending on how long it takes for you to hit the BUG_ON.
>>>
>>> Thanks a lot for this.
>>
>>
>> I'm compiling it now (using your v2 of the friendpaste diff).
>>
>> I took the liberty to add a tracing_off() right before the return -EIO
>> so that the trace tail ends exactly at the right place.
>>
>> Last time I tried to use ftrace to diagnose the bug we're trying to
>> fix, the system crashes so hard that usually it's complicated to get
>> the trace contents written somewhere before the system is unusable.
>> But I'll eventually work around it by using
>> /sys/kernel/debug/tracing/trace_pipe to send the trace live to another
>> machine over the LAN.
>>
>> This series of bugs are so easy to trigger on my system that we'll
>> hopefully get something useful out of the trace. I guess that's a good
>> thing !
>
>
> So, this time it took a little over an hour to get the crash, but it did
> reach the -EIO condition eventually.
> The ftrace log (2M gzipped) is available here :
> http://www.speed47.net/tmp2/btrfs-4.3rc6p7463161-ftrace1.log.gz
>
> The associated kernel log is as follows :
>
> [ 2880.178589] INFO: task btrfs-transacti:7358 blocked for more than 120
> seconds.
> [ 2880.178600] Not tainted 4.3.0-rc6p7463161+ #3
> [ 2880.178603] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 3088.829429] Out of memory: Kill process 9449 (df-complex2simp) score 246
> or sacrifice child
> [ 3088.829435] Killed process 9449 (df-complex2simp) total-vm:964732kB,
> anon-rss:943764kB, file-rss:0kB
> [ 3600.197642] INFO: task btrfs-transacti:7358 blocked for more than 120
> seconds.
> [ 3600.197657] Not tainted 4.3.0-rc6p7463161+ #3
> [ 3600.197660] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 3840.204146] INFO: task btrfs-transacti:7358 blocked for more than 120
> seconds.
> [ 3840.204180] Not tainted 4.3.0-rc6p7463161+ #3
> [ 3840.204219] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 3993.671982] Out of memory: Kill process 11357 (df-complex2simp) score 227
> or sacrifice child
> [ 3993.671989] Killed process 11357 (df-complex2simp) total-vm:891608kB,
> anon-rss:870704kB, file-rss:60kB
> [ 4080.210324] INFO: task btrfs-transacti:7358 blocked for more than 120
> seconds.
> [ 4080.210336] Not tainted 4.3.0-rc6p7463161+ #3
> [ 4080.210339] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 4320.215635] INFO: task btrfs-transacti:7358 blocked for more than 120
> seconds.
> [ 4320.215662] Not tainted 4.3.0-rc6p7463161+ #3
> [ 4320.215667] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 4560.221119] INFO: task btrfs-transacti:7358 blocked for more than 120
> seconds.
> [ 4560.221146] Not tainted 4.3.0-rc6p7463161+ #3
> [ 4560.221148] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 4800.226884] INFO: task btrfs-transacti:7358 blocked for more than 120
> seconds.
> [ 4800.226898] Not tainted 4.3.0-rc6p7463161+ #3
> [ 4800.226902] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 4890.116131] Out of memory: Kill process 13377 (df-complex2simp) score 207
> or sacrifice child
> [ 4890.116138] Killed process 13377 (df-complex2simp) total-vm:834976kB,
> anon-rss:793272kB, file-rss:48kB
> [ 5785.793580] Out of memory: Kill process 15285 (df-complex2simp) score 201
> or sacrifice child
> [ 5785.793586] Killed process 15285 (df-complex2simp) total-vm:802208kB,
> anon-rss:772172kB, file-rss:4kB
> [ 6480.269728] INFO: task btrfs-transacti:7358 blocked for more than 120
> seconds.
> [ 6480.269738] Not tainted 4.3.0-rc6p7463161+ #3
> [ 6480.269740]
Crash during mount -o degraded, kernel BUG at fs/btrfs/extent_io.c:2044
Hi again,

So I intentionally broke this small raid6 fs on a VM to learn recovery strategies for another much bigger raid6 I have running (which also suffered a drive failure).

Basically I zeroed out one of the drives (vdd) from under the running VM. Then I ran an md5sum on a file on the fs to trigger some detection of data inconsistency. I ran a scrub, which completed "ok". Then I rebooted.

Now trying to mount the filesystem in degraded mode leads to a kernel crash.

I'm using kernel 4.3-rc6 and btrfs-progs 4.2.3.

Linux ubuntu 4.3.0-040300rc6-generic #201510182030 SMP Mon Oct 19 00:31:41 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Label: none  uuid: aee28657-3ce0-4efc-9cd3-cc7c58782af3
	Total devices 1 FS bytes used 1.87GiB
	devid    1 size 9.52GiB used 2.89GiB path /dev/vda2

warning devid 3 not found already
Label: 'boxofkittens'  uuid: 4957afbe-e2cb-410c-8d45-3850840898f2
	Total devices 9 FS bytes used 3.56GiB
	devid    1 size 1022.00MiB used 716.19MiB path /dev/vdb1
	devid    2 size 1022.00MiB used 716.19MiB path /dev/vdc1
	devid    4 size 1022.00MiB used 716.19MiB path /dev/vde1
	devid    5 size 1022.00MiB used 716.19MiB path /dev/vdf1
	devid    6 size 1022.00MiB used 716.19MiB path /dev/vdg1
	devid    7 size 2.00GiB used 1.70GiB path /dev/vdh1
	devid    8 size 3.00GiB used 1.70GiB path /dev/vdi1
	devid    9 size 3.00GiB used 1.70GiB path /dev/vdj1
	*** Some devices missing

btrfs-progs v4.2.3

mount -o degraded /dev/vdb1 /mnt/boxofkittens

[ 36.426731] [ cut here ]
[ 36.427547] kernel BUG at /home/kernel/COD/linux/fs/btrfs/extent_io.c:2044!
[ 36.428686] invalid opcode: [#1] SMP
[ 36.429438] Modules linked in: snd_hda_codec_generic iosf_mbi crct10dif_pclmul crc32_pclmul ppdev aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd input_leds joydev snd_hda_intel serio_raw snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore i2c_piix4 parport_pc parport 8250_fintek mac_hid autofs4 btrfs xor raid6_pq cirrus ttm psmouse drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm floppy pata_acpi
[ 36.436782] CPU: 0 PID: 86 Comm: kworker/u2:2 Not tainted 4.3.0-040300rc6-generic #201510182030
[ 36.438138] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
[ 36.439648] Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
[ 36.440617] task: 880035b4e200 ti: 880035564000 task.ti: 880035564000
[ 36.441778] RIP: 0010:[] [] repair_io_failure+0x1a9/0x1f0 [btrfs]
[ 36.443287] RSP: 0018:880035567c20 EFLAGS: 00010246
[ 36.444128] RAX: 88003c7ad000 RBX: 8800363dc7d0 RCX:
[ 36.445227] RDX: 1000 RSI: 00027000 RDI: 8800388ce100
[ 36.446315] RBP: 880035567c78 R08: eaddb640 R09:
[ 36.447397] R10: 8800363dc980 R11: 88003bd49b00 R12: 00027000
[ 36.448479] R13: 8800388ce000 R14: 8800363dc980 R15: 8800363dc838
[ 36.449553] FS: () GS:88003fc0() knlGS:
[ 36.450766] CS: 0010 DS: ES: CR0: 80050033
[ 36.451641] CR2: 02015008 CR3: 3c1be000 CR4: 000406f0
[ 36.452709] Stack:
[ 36.453026] 00027000 35567c48 eaddb640 0002b1047000
[ 36.454211] 8800363dc7d0 00027000
[ 36.455513] 8800388ce000 8800363dc980 8800363dc838 880035567ce8
[ 36.456663] Call Trace:
[ 36.457043] [] clean_io_failure+0x18d/0x1a0 [btrfs]
[ 36.458002] [] end_bio_extent_readpage+0x30a/0x560 [btrfs]
[ 36.459662] [] ? btrfs_create_repair_bio+0xe0/0xe0 [btrfs]
[ 36.460715] [] bio_endio+0x40/0x60
[ 36.461459] [] end_workqueue_fn+0x3c/0x40 [btrfs]
[ 36.462387] [] normal_work_helper+0xc0/0x270 [btrfs]
[ 36.463360] [] btrfs_endio_helper+0x12/0x20 [btrfs]
[ 36.464314] [] process_one_work+0x14e/0x3d0
[ 36.465158] [] worker_thread+0x11a/0x470
[ 36.466264] [] ? rescuer_thread+0x310/0x310
[ 36.467154] [] kthread+0xc9/0xe0
[ 36.467863] [] ? kthread_park+0x60/0x60
[ 36.468791] [] ret_from_fork+0x3f/0x70
[ 36.470022] [] ? kthread_park+0x60/0x60
[ 36.471334] Code: fe ff ff 48 89 df 41 bf fb ff ff ff e8 21 70 20 c1 31 f6 4c 89 ef e8 07 eb 00 00 e9 d1 fe ff ff 41 bf fb ff ff ff e9 c6 fe ff ff <0f> 0b 0f 0b 49 8b 4d 30 49 8b b6 58 fe ff ff 48 83 c1 10 48 85
[ 36.475278] RIP [] repair_io_failure+0x1a9/0x1f0 [btrfs]
[ 36.476256] RSP
[ 36.476783] ---[ end trace a06ea60748bbedae ]---
[ 36.481369] BUG: unable to handle kernel paging request at ffd8
[ 36.484441] IP: [] kthread_data+0x10/0x20
[ 36.486710] PGD 1c13067 PUD 1c15067 PMD 0
[ 36.488690] Oops: [#2] SMP
[ 36.490516] Modules linked in: snd_hda_codec_generic iosf_mbi crct10dif_pclmul crc32_p
Re: Exclusive quota of snapshot exceeded despite no space used
On 2015-10-23 04:38, Johannes Henninger wrote:
> I'm having a weird problem with snapshots and exclusive quotas. After creating a snapshot of a subvolume and setting an exclusive quota of 50MB for the snapshot, everything seems to work fine. I can write approximately 50MB before the quota kicks in.
>
> However, if I create a snapshot, set an exclusive quota and just wait for some time, I suddenly cannot even create an empty file because I'm getting a "quota exceeded" error. The time until the bug appears seems to vary. During the waiting time, I'm changing neither the snapshot nor the original subvolume. "qgroup show -e" reports an exclusive use of only a few kilobytes for the snapshot, which is nowhere near the limit.
>
> Steps to reproduce (/media/extern is a fresh and empty btrfs partition):
>
> Enable quota and create an empty subvolume:
>
> root@t420:/media/extern# btrfs quota enable .
> root@t420:/media/extern# btrfs subvolume create sub
> Create subvolume './sub'
>
> Snapshot the subvolume and set a limit:
>
> root@t420:/media/extern# btrfs subvolume snapshot sub snap
> Create a snapshot of 'sub' in './snap'
> root@t420:/media/extern# cd snap/
> root@t420:/media/extern/snap# btrfs qgroup limit -e 50M .
>
> Sometimes it takes "longer" for the quota to kick in, so I'm touching a file every 5 minutes here:
>
> root@t420:/media/extern/snap# for file in {1..100}; do touch $file; sleep 5m; done
> touch: cannot touch ‘7’: Disk quota exceeded
> ^C
> root@t420:/media/extern/snap# btrfs qgroup show -e .
> qgroupid         rfer         excl     max_excl
> 0/5          16.00KiB     16.00KiB         none
> 0/257        16.00KiB     16.00KiB         none
> 0/258        16.00KiB     16.00KiB     50.00MiB
>
> Any idea why this happens?

BTW, to get accurate output from btrfs qgroup show, it's better to call sync before running it.

It's a known bug that, even after the qgroup accounting rework, the qgroup reserve code still has bugs that can cause the reserved space to underflow, which makes exactly this problem happen. In that case, btrfs qgroup show won't help, as the reserved space is not shown in its output.

One workaround is to unmount the filesystem and mount it again, which resets the underflowed reserved space and works for some time.

If it's OK for you to recompile the kernel, you can try the following patchset:
[PATCH v3 00/21] Rework btrfs qgroup reserved space framework
which should solve the problem.

Thanks,
Qu

> Thanks,
> Johannes
>
> System info:
>
> Linux t420 4.3.0-rc5 #1 SMP Tue Oct 13 13:21:02 CEST 2015 x86_64 GNU/Linux
>
> Label: none  uuid: 9551e3ca-1608-469c-9d8c-77b99ce0e8ec
> 	Total devices 1 FS bytes used 816.00KiB
> 	devid    1 size 931.51GiB used 2.04GiB path /dev/sdb1
>
> btrfs-progs v4.1.2
>
> Data, single: total=8.00MiB, used=256.00KiB
> System, DUP: total=8.00MiB, used=16.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, DUP: total=1.00GiB, used=544.00KiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=16.00MiB, used=0.00B
>
> [249174.151820] sdb: sdb1
> [249184.387377] sdb: sdb1
> [249184.573096] sdb: sdb1
> [249184.656274] BTRFS: device fsid 9551e3ca-1608-469c-9d8c-77b99ce0e8ec devid 1 transid 3 /dev/sdb1
> [249186.323915] sdb: sdb1
> [249186.534505] sdb: sdb1
> [249186.538420] sdb: sdb1
> [249196.781978] BTRFS info (device sdb1): disk space caching is enabled
> [249196.781986] BTRFS: has skinny extents
> [249196.781990] BTRFS: flagging fs with big metadata feature
> [249196.818164] BTRFS: creating UUID tree
> [249202.311983] BTRFS info (device sdb1): qgroup scan completed (inconsistency flag cleared)
Re: [RFC PATCH] btrfs/ioctl.c: Prefer inode with lowest offset as source for clone
On Tue, Oct 20, 2015 at 04:29:46PM +0300, Timofey Titovets wrote:
> For performance reason, leave data at the start of disk, is preferable
> while deduping
> It's might sense for the reasons:
> 1. Spinning rust - start of the disk is much faster
> 2. Btrfs can deallocate empty data chunk from the end of fs - ie it's compact
> fs

"src" is the extent that is kept, and "dst" is the extent that is discarded.

When both extents are shared, the dedup userspace has to pass a common "src" with many different "dst" over several extent-same calls in order to get rid of all of the references to the "dst" extent. If "src" and "dst" are arbitrarily swapped over multiple extent-same calls then it becomes impossible to dedup shared extents. Heck, if there are more than two extents even in one extent-same call then it stops working.

It would be possible to have dedup figure out which extent the kernel picked after the fact, but that's totally unnecessary extra work in cases where the userspace has a good reason to pick the extents it did (e.g. administrator hints about future usage of the files where the extents were found).

Dedup userspace can figure out the physical addresses of the extents and rearrange the arguments itself if desired.
> Signed-off-by: Timofey Titovets
> ---
>  fs/btrfs/ioctl.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 3e3e613..3eb77c0 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3074,8 +3074,13 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
>
>  	/* pass original length for comparison so we stay within i_size */
>  	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
> -	if (ret == 0)
> -		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> +	if (ret == 0) {
> +		/* prefer inode with lowest offset as source for clone*/
> +		if (loff > dst_loff)
> +			ret = btrfs_clone(dst, src, dst_loff, olen, len, loff, 1);
> +		else
> +			ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> +	}
>
>  	if (same_inode)
>  		unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
> --
> 2.6.1
> From 5ed3822bc308c726d91a837fbd97ebacaa51e58d Mon Sep 17 00:00:00 2001
> From: Timofey Titovets
> Date: Tue, 20 Oct 2015 15:53:20 +0300
> Subject: [RFC PATCH] btrfs/ioctl.c: Prefer inode with lowest offset as source for clone
>
> For performance reason, leave data at the start of disk, is preferable
>
> Signed-off-by: Timofey Titovets
> ---
>  fs/btrfs/ioctl.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 3e3e613..3eb77c0 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3074,8 +3074,13 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
>
>  	/* pass original length for comparison so we stay within i_size */
>  	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
> -	if (ret == 0)
> -		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> +	if (ret == 0) {
> +		/* prefer inode with lowest offset as source for clone*/
> +		if (loff > dst_loff)
> +			ret = btrfs_clone(dst, src, dst_loff, olen, len, loff, 1);
> +		else
> +			ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> +	}
>
>  	if (same_inode)
>  		unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
> --
> 2.6.1
[PATCH v4] btrfs: qgroup: Don't copy extent buffer to do qgroup rescan
Ancient qgroup code calls memcpy() on an extent buffer and uses the copy for leaf iteration. As an extent buffer contains locks and pointers to pages, it's never sane to do such a copy.

The following bug may be caused by this insane operation:

[92098.841309] general protection fault: [#1] SMP
[92098.841338] Modules linked in: ...
[92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted 4.3.0-rc1 #1
[92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper [btrfs]
[92098.842261] Call Trace:
[92098.842277] [] ? read_extent_buffer+0xb8/0x110 [btrfs]
[92098.842304] [] ? btrfs_find_all_roots+0x60/0x70 [btrfs]
[92098.842329] [] btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]

Where btrfs_qgroup_rescan_worker+0x28d is btrfs_disk_key_to_cpu(), called when reading a key from the copied extent_buffer.

This patch uses btrfs_clone_extent_buffer() to make a proper copy of the extent buffer and handle this case.

Reported-by: Stephane Lesimple
Suggested-by: Filipe Manana
Signed-off-by: Qu Wenruo
---
v2:
  Follow the parameter change in previous patch.
v3: None
v4: Use btrfs_clone_extent_buffer() instead of introducing new facilities
---
 fs/btrfs/qgroup.c | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 158633c..5534629 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2192,10 +2192,10 @@ void assert_qgroups_uptodate(struct btrfs_trans_handle *trans)
 */
 static int
 qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
- struct btrfs_trans_handle *trans,
- struct extent_buffer *scratch_leaf)
+ struct btrfs_trans_handle *trans)
 {
 struct btrfs_key found;
+ struct extent_buffer *scratch_leaf = NULL;
 struct ulist *roots = NULL;
 struct seq_list tree_mod_seq_elem = SEQ_LIST_INIT(tree_mod_seq_elem);
 u64 num_bytes;
@@ -2233,9 +2233,17 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
 fs_info->qgroup_rescan_progress.objectid = found.objectid + 1;
 btrfs_get_tree_mod_seq(fs_info, &tree_mod_seq_elem);
- memcpy(scratch_leaf, path->nodes[0], sizeof(*scratch_leaf));
- slot = path->slots[0];
+ scratch_leaf = btrfs_clone_extent_buffer(path->nodes[0]);
+ if (!scratch_leaf) {
+ ret = -ENOMEM;
+ mutex_unlock(&fs_info->qgroup_rescan_lock);
+ goto out;
+ }
+ extent_buffer_get(scratch_leaf);
+ btrfs_tree_read_lock(scratch_leaf);
+ btrfs_set_lock_blocking_rw(scratch_leaf, BTRFS_READ_LOCK);
 btrfs_release_path(path);
+ slot = path->slots[0];
 mutex_unlock(&fs_info->qgroup_rescan_lock);
 for (; slot < btrfs_header_nritems(scratch_leaf); ++slot) {
@@ -2259,6 +2267,10 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
 goto out;
 }
 out:
+ if (scratch_leaf) {
+ btrfs_tree_read_unlock_blocking(scratch_leaf);
+ free_extent_buffer(scratch_leaf);
+ }
 btrfs_put_tree_mod_seq(fs_info, &tree_mod_seq_elem);
 return ret;
@@ -2270,16 +2282,12 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
 qgroup_rescan_work);
 struct btrfs_path *path;
 struct btrfs_trans_handle *trans = NULL;
- struct extent_buffer *scratch_leaf = NULL;
 int err = -ENOMEM;
 int ret = 0;
 path = btrfs_alloc_path();
 if (!path)
 goto out;
- scratch_leaf = kmalloc(sizeof(*scratch_leaf), GFP_NOFS);
- if (!scratch_leaf)
- goto out;
 err = 0;
 while (!err) {
@@ -2291,8 +2299,7 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
 if (!fs_info->quota_enabled) {
 err = -EINTR;
 } else {
- err = qgroup_rescan_leaf(fs_info, path, trans,
- scratch_leaf);
+ err = qgroup_rescan_leaf(fs_info, path, trans);
 }
 if (err > 0)
 btrfs_commit_transaction(trans, fs_info->fs_root);
@@ -2301,7 +2308,6 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
 }
 out:
- kfree(scratch_leaf);
 btrfs_free_path(path);
 mutex_lock(&fs_info->qgroup_rescan_lock);
--
2.6.2
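Why a struct-level memcpy() of an extent buffer is not a real copy can be illustrated with a userspace sketch. The type and helpers below (struct toy_eb, toy_eb_alloc(), toy_eb_clone(), ...) are hypothetical stand-ins, not the kernel's struct extent_buffer or btrfs_clone_extent_buffer(); the sketch only assumes what the commit message states, namely that the buffer carries lock state and pointers to separately allocated pages. A shallow copy aliases the pages and inherits stale lock state, while a deep clone gets fresh lock state and private page contents.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_NR_PAGES	4
#define TOY_PAGE_SIZE	64

/*
 * Hypothetical stand-in for an extent buffer: lock state, a reference
 * count and pointers to pages that live outside the struct itself.
 */
struct toy_eb {
	int read_locks;			/* readers currently holding the lock */
	int refs;			/* reference count */
	char *pages[TOY_NR_PAGES];	/* the actual data */
};

static void toy_eb_free(struct toy_eb *eb)
{
	if (!eb)
		return;
	for (int i = 0; i < TOY_NR_PAGES; i++)
		free(eb->pages[i]);	/* free(NULL) is a no-op */
	free(eb);
}

static struct toy_eb *toy_eb_alloc(void)
{
	struct toy_eb *eb = calloc(1, sizeof(*eb));

	if (!eb)
		return NULL;
	eb->refs = 1;
	for (int i = 0; i < TOY_NR_PAGES; i++) {
		eb->pages[i] = calloc(1, TOY_PAGE_SIZE);
		if (!eb->pages[i]) {
			toy_eb_free(eb);	/* unallocated slots are still NULL */
			return NULL;
		}
	}
	return eb;
}

/* Shallow copy, analogous to memcpy(scratch_leaf, eb, sizeof(*eb)). */
static void toy_eb_shallow_copy(struct toy_eb *dst, const struct toy_eb *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Deep copy: fresh lock state and refcount, private copies of every page. */
static struct toy_eb *toy_eb_clone(const struct toy_eb *src)
{
	struct toy_eb *dst = toy_eb_alloc();

	if (!dst)
		return NULL;
	for (int i = 0; i < TOY_NR_PAGES; i++)
		memcpy(dst->pages[i], src->pages[i], TOY_PAGE_SIZE);
	return dst;
}
```

After toy_eb_free() on the original, every page pointer held by a shallow copy dangles, which is the same class of problem as reading keys from the memcpy()'d leaf once the real buffer has been released.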
Re: [PATCH] Btrfs: igrab inode in writepage
On 10/23/2015 03:05 AM, Josef Bacik wrote:
> We hit this panic on a few of our boxes this week where we have an
> ordered_extent with a NULL inode. We do an igrab() of the inode in
> writepages, but weren't doing it in writepage, which can be called directly
> from the VM on dirty pages. If the inode has been unlinked then we could
> have I_FREEING set, which means igrab() would return NULL and we get this
> panic. Fix this by trying to igrab in btrfs_writepage, and if it returns
> NULL then just redirty the page and return AOP_WRITEPAGE_ACTIVATE so the VM
> knows it wasn't successful. Thanks,

Reviewed-by: Liu Bo

thanks,
-Liubo

> Signed-off-by: Josef Bacik
> ---
> fs/btrfs/inode.c | 17 +++--
> 1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index a0fa725..4d1fdc2 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8438,15 +8438,28 @@ int btrfs_readpage(struct file *file, struct page *page)
> static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
> {
> struct extent_io_tree *tree;
> -
> + struct inode *inode = page->mapping->host;
> + int ret;
> if (current->flags & PF_MEMALLOC) {
> redirty_page_for_writepage(wbc, page);
> unlock_page(page);
> return 0;
> }
> +
> + /*
> + * If we are under memory pressure we will call this directly from the
> + * VM, we need to make sure we have the inode referenced for the ordered
> + * extent. If not just return like we didn't do anything.
> + */
> + if (!igrab(inode)) {
> + redirty_page_for_writepage(wbc, page);
> + return AOP_WRITEPAGE_ACTIVATE;
> + }
> tree = &BTRFS_I(page->mapping->host)->io_tree;
> - return extent_write_full_page(tree, page, btrfs_get_extent, wbc);
> + ret = extent_write_full_page(tree, page, btrfs_get_extent, wbc);
> + btrfs_add_delayed_iput(inode);
> + return ret;
> }
> static int btrfs_writepages(struct address_space *mapping,