Re: [PATCH] xfstests: btrfs/012: add a regression test for deleting ext2_saved

2015-10-22 Thread Eryu Guan
On Tue, Oct 20, 2015 at 07:34:06PM +0800, Liu Bo wrote:
> Btrfs now deletes subvolumes/snapshots asynchronously, which means
> that after umount, even if we've already deleted 'ext2_saved',
> rollback can still complete, which it should not.
> 
> So this adds a regression test for this.
> 
> Signed-off-by: Liu Bo 

I'm not sure whether this belongs in a new test, but given that it
has very similar steps to the existing tests, I think that's fine.

Reviewed-by: Eryu Guan 

> ---
>  tests/btrfs/012 | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/tests/btrfs/012 b/tests/btrfs/012
> index d513759..b39dec0 100755
> --- a/tests/btrfs/012
> +++ b/tests/btrfs/012
> @@ -112,6 +112,18 @@ diff -r /lib/modules/`uname -r`/ $SCRATCH_MNT/`uname -r`/ 2>&1 | grep -vw "sourc
>  
>  _scratch_unmount
>  
> +# Convert it to btrfs, mount it and delete "ext2_saved"
> +$BTRFS_CONVERT_PROG $SCRATCH_DEV >> $seqres.full 2>&1 || \
> + _fail "btrfs-convert failed"
> +_scratch_mount || _fail "Could not mount new btrfs fs"
> +$BTRFS_UTIL_PROG subvolume delete $SCRATCH_MNT/ext2_saved >> $seqres.full 2>&1 ||
> + _fail "failed to delete ext2_saved subvolume"
> +_scratch_unmount
> +
> +# Now restore the ext4 device, expecting a failure
> +$BTRFS_CONVERT_PROG -r $SCRATCH_DEV >> $seqres.full 2>&1
> +[ $? -eq 1 ] || _fail "Failure is expected, but btrfs-convert returns with rollback complete"
> +
>  # success, all done
>  status=0
>  exit
> -- 
> 1.8.2.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 RESENT 2/2] btrfs: qgroup: Don't copy extent buffer to do qgroup rescan

2015-10-22 Thread Filipe Manana
On Thu, Oct 22, 2015 at 1:42 AM, Qu Wenruo  wrote:
> Ancient qgroup code calls memcpy() on an extent buffer and uses it for leaf
> iteration.
>
> As an extent buffer contains locks and pointers to pages, it's never sane to
> do such a copy.
>
> The following bug may be caused by this insane operation:
> [92098.841309] general protection fault:  [#1] SMP
> [92098.841338] Modules linked in: ...
> [92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted 4.3.0-rc1 #1
> [92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper [btrfs]
> [92098.842261] Call Trace:
> [92098.842277]  [] ? read_extent_buffer+0xb8/0x110 [btrfs]
> [92098.842304]  [] ? btrfs_find_all_roots+0x60/0x70 [btrfs]
> [92098.842329]  [] btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]
>
> Where btrfs_qgroup_rescan_worker+0x28d is btrfs_disk_key_to_cpu(),
> called when reading a key from the memcpy'd extent_buffer.
>
> This patch reads the whole leaf into memory and uses the newly
> introduced stack functions to do the qgroup rescan.

Hi Qu,

Instead of introducing more new functions, why not clone the extent
buffer (btrfs_clone_extent_buffer) and then use it with the
regular/existing functions? IOW, do the same as we do in backref walking,
which should make the change much smaller than it is.

thanks
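
For readers following the thread, the difference between the memcpy()
being removed and a proper clone can be illustrated in user space. This is
only a minimal sketch under stated assumptions: struct toy_buffer and both
helpers are hypothetical stand-ins invented for this note, not the kernel's
struct extent_buffer or btrfs_clone_extent_buffer(). It shows that a
byte-wise copy of an object holding pointers still aliases the original
storage, while a clone gets storage of its own.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an object that owns separately allocated storage.
 * Hypothetical type for illustration; NOT the kernel's struct extent_buffer. */
struct toy_buffer {
	size_t len;
	unsigned char *data;	/* owned storage, loosely analogous to page pointers */
};

/* Shallow copy: copies the pointer itself, so dst and src share one storage. */
static void toy_shallow_copy(struct toy_buffer *dst, const struct toy_buffer *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Clone: allocates fresh storage and copies the contents into it.
 * Returns 0 on success, -1 if the allocation fails. */
static int toy_clone(struct toy_buffer *dst, const struct toy_buffer *src)
{
	dst->data = malloc(src->len);
	if (!dst->data)
		return -1;
	dst->len = src->len;
	memcpy(dst->data, src->data, src->len);
	return 0;
}
```

After toy_shallow_copy() any later change to the source storage is visible
through the copy, whereas a toy_clone() result keeps the contents it had at
clone time and must be freed by its owner.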

>
> Reported-by: Stephane Lesimple 
> Signed-off-by: Qu Wenruo 
> ---
> v2:
>   Follow the parameter change in previous patch.
> v3:
>   None
> ---
>  fs/btrfs/qgroup.c | 22 ++++++++++++----------
>  1 file changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index e9ace09..6a83a40 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2183,11 +2183,11 @@ void assert_qgroups_uptodate(struct btrfs_trans_handle *trans)
>   */
>  static int
>  qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
> -  struct btrfs_trans_handle *trans,
> -  struct extent_buffer *scratch_leaf)
> +  struct btrfs_trans_handle *trans, char *stack_leaf)
>  {
> struct btrfs_key found;
> struct ulist *roots = NULL;
> +   struct btrfs_header *header;
> struct seq_list tree_mod_seq_elem = SEQ_LIST_INIT(tree_mod_seq_elem);
> u64 num_bytes;
> int slot;
> @@ -2224,13 +2224,15 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
> fs_info->qgroup_rescan_progress.objectid = found.objectid + 1;
>
> btrfs_get_tree_mod_seq(fs_info, &tree_mod_seq_elem);
> -   memcpy(scratch_leaf, path->nodes[0], sizeof(*scratch_leaf));
> +   read_extent_buffer(path->nodes[0], stack_leaf, 0,
> +  fs_info->extent_root->nodesize);
> +   header = (struct btrfs_header *)stack_leaf;
> slot = path->slots[0];
> btrfs_release_path(path);
> mutex_unlock(&fs_info->qgroup_rescan_lock);
>
> -   for (; slot < btrfs_header_nritems(scratch_leaf); ++slot) {
> -   btrfs_item_key_to_cpu(scratch_leaf, &found, slot);
> +   for (; slot < btrfs_stack_header_nritems(header); ++slot) {
> +   btrfs_stack_item_key_to_cpu(header, &found, slot);
> if (found.type != BTRFS_EXTENT_ITEM_KEY &&
> found.type != BTRFS_METADATA_ITEM_KEY)
> continue;
> @@ -2261,15 +2263,15 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>  qgroup_rescan_work);
> struct btrfs_path *path;
> struct btrfs_trans_handle *trans = NULL;
> -   struct extent_buffer *scratch_leaf = NULL;
> +   char *stack_leaf = NULL;
> int err = -ENOMEM;
> int ret = 0;
>
> path = btrfs_alloc_path();
> if (!path)
> goto out;
> -   scratch_leaf = kmalloc(sizeof(*scratch_leaf), GFP_NOFS);
> -   if (!scratch_leaf)
> +   stack_leaf = kmalloc(fs_info->extent_root->nodesize, GFP_NOFS);
> +   if (!stack_leaf)
> goto out;
>
> err = 0;
> @@ -2283,7 +2285,7 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
> err = -EINTR;
> } else {
> err = qgroup_rescan_leaf(fs_info, path, trans,
> -scratch_leaf);
> +stack_leaf);
> }
> if (err > 0)
> btrfs_commit_transaction(trans, fs_info->fs_root);
> @@ -2292,7 +2294,7 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
> }
>
>  out:
> -   kfree(scratch_leaf);
> +   kfree(stack_leaf);
> btrfs_free_path(path);
>
> mutex_lock(&fs_info->qgroup_rescan_lock);
> --
> 2.6.1
>

[4.3-rc4] scrubbing aborts before finishing

2015-10-22 Thread Martin Steigerwald
Hi!

I get this:

merkaba:~> btrfs scrub status -d /   
scrub status for […]
scrub device /dev/mapper/sata-debian (id 1) history
scrub started at Thu Oct 22 10:05:49 2015 and was aborted after 00:00:00
total bytes scrubbed: 0.00B with 0 errors
scrub device /dev/dm-2 (id 2) history
scrub started at Thu Oct 22 10:05:49 2015 and was aborted after 00:01:30
total bytes scrubbed: 23.81GiB with 0 errors

For /, the scrub aborts immediately on the SATA SSD.

For /home, the scrub aborts on both SSDs after some time.

merkaba:~> btrfs scrub status -d /home
scrub status for […]
scrub device /dev/mapper/msata-home (id 1) history
scrub started at Thu Oct 22 10:09:37 2015 and was aborted after 00:01:31
total bytes scrubbed: 22.03GiB with 0 errors
scrub device /dev/dm-3 (id 2) history
scrub started at Thu Oct 22 10:09:37 2015 and was aborted after 00:03:34
total bytes scrubbed: 53.30GiB with 0 errors

A single-volume BTRFS is also affected:

merkaba:~> btrfs scrub status /daten
scrub status for […]
scrub started at Thu Oct 22 10:36:38 2015 and was aborted after 00:00:00
total bytes scrubbed: 0.00B with 0 errors


No errors in dmesg, btrfs device stat or smartctl -a.

Is this a known issue?

Thanks,
-- 
Martin


Re: [PATCH v3 RESENT 2/2] btrfs: qgroup: Don't copy extent buffer to do qgroup rescan

2015-10-22 Thread Qu Wenruo



Filipe Manana wrote on 2015/10/22 09:16 +0100:

On Thu, Oct 22, 2015 at 1:42 AM, Qu Wenruo  wrote:

Ancient qgroup code calls memcpy() on an extent buffer and uses it for leaf
iteration.

As an extent buffer contains locks and pointers to pages, it's never sane to
do such a copy.

The following bug may be caused by this insane operation:
[92098.841309] general protection fault:  [#1] SMP
[92098.841338] Modules linked in: ...
[92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted 4.3.0-rc1 #1
[92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper [btrfs]
[92098.842261] Call Trace:
[92098.842277]  [] ? read_extent_buffer+0xb8/0x110 [btrfs]
[92098.842304]  [] ? btrfs_find_all_roots+0x60/0x70 [btrfs]
[92098.842329]  [] btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]

Where btrfs_qgroup_rescan_worker+0x28d is btrfs_disk_key_to_cpu(),
called when reading a key from the memcpy'd extent_buffer.

This patch reads the whole leaf into memory and uses the newly
introduced stack functions to do the qgroup rescan.


Hi Qu,

Instead of introducing more new functions, why not clone the extent
buffer (btrfs_clone_extent_buffer) and then use it with the
regular/existing functions? IOW, do the same as we do in backref walking,
which should make the change much smaller than it is.

thanks


Thanks Filipe,

I didn't know there was such a nice function.
And it sets EXTENT_BUFFER_DUMMY, so it should be quite safe for this
use case.


Thanks a lot for your advice!
Qu




[PATCH] Btrfs: change the initialization point of fs_root in open_ctree()

2015-10-22 Thread Tsutomu Itoh
A kernel panic occurred due to a NULL pointer dereference in
can_overcommit(), because btrfs_async_reclaim_metadata_space() passed a
NULL pointer to btrfs_calc_reclaim_metadata_size().


[ 3756.152833] BUG: unable to handle kernel NULL pointer dereference at 
01f0
[ 3756.152882] IP: [] can_overcommit+0x21/0xf0 [btrfs]
[ 3756.152936] PGD 0
[ 3756.152949] Oops:  [#1] SMP
[ 3756.152969] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 
xt_conntrack ebtable_filter ebtable_broute bridge stp llc ebtable_nat 
ebtables ip6table_mangle ip6table_raw ip6table_security ip6table_nat 
nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_filter ip6_tables 
iptable_mangle iptable_raw iptable_security iptable_nat nf_conntrack_ipv4 
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack coretemp kvm_intel kvm crc32
_pclmul iTCO_wdt iTCO_vendor_support microcode ipmi_si lpc_ich mfd_core pcspkr 
acpi_power_meter ipmi_msghandler i2c_i801 i7core_edac shpchp edac_core 
nfsd acpi_cpufreq auth_rpcgss nfs_acl lockd grace sunrpc sch_fq_codel btrfs xor 
raid6_pq usb_storage mgag200 drm_kms_helper syscopyarea sysfillrect 
sysimgblt fb_sys_fops ttm drm igb ptp ata_generic pps_core pata_acpi 
crc32c_intel
[ 3756.153397]  dca megaraid_sas i2c_algo_bit ata_piix i2c_core
[ 3756.153433] CPU: 3 PID: 3004 Comm: kworker/u25:4 Tainted: G  I 
4.3.0-rc6 #1
[ 3756.153469] Hardware name: FUJITSU-SV   PRIMERGY RX300 
S6 /D2619, BIOS 6.00 Rev. 1.09.2619.N1   12/13/2010
[ 3756.153537] Workqueue: events_unbound btrfs_async_reclaim_metadata_space 
[btrfs]
[ 3756.153571] task: 88023581a400 ti: 880234648000 task.ti: 
880234648000
[ 3756.153604] RIP: 0010:[]  [] 
can_overcommit+0x21/0xf0 [btrfs]
[ 3756.153655] RSP: 0018:88023464bda8  EFLAGS: 00010282
[ 3756.153679] RAX: 0100 RBX: 880431f68c00 RCX: 0002
[ 3756.153711] RDX: 00c0 RSI:  RDI: 
[ 3756.153742] RBP: 88023464bde0 R08: 0101 R09: 000c
[ 3756.153773] R10: 81d10060 R11: 81d10050 R12: 880431f68c00
[ 3756.153804] R13:  R14: 880035f67070 R15: 00c0
[ 3756.153836] FS:  () GS:880237cc() 
knlGS:
[ 3756.153871] CS:  0010 DS:  ES:  CR0: 8005003b
[ 3756.153897] CR2: 01f0 CR3: 01c08000 CR4: 06e0
[ 3756.153929] Stack:
[ 3756.153940]  8802 880237cd2940 880431f68c00 

[ 3756.153979]  00c0 880035f67070  
88023464be20
[ 3756.154016]  a01e5404 880431f68c80 880234482240 
8802378a1800
[ 3756.154054] Call Trace:
[ 3756.154081]  [] 
btrfs_async_reclaim_metadata_space+0xb4/0x210 [btrfs]
[ 3756.154119]  [] process_one_work+0x19e/0x3d0
[ 3756.154146]  [] worker_thread+0x4e/0x450
[ 3756.154174]  [] ? __schedule+0x2b9/0x930
[ 3756.154199]  [] ? process_one_work+0x3d0/0x3d0
[ 3756.154227]  [] ? process_one_work+0x3d0/0x3d0
[ 3756.154255]  [] kthread+0xc9/0xe0
[ 3756.154279]  [] ? kthread_worker_fn+0x160/0x160
[ 3756.154307]  [] ret_from_fork+0x3f/0x70
[ 3756.154333]  [] ? kthread_worker_fn+0x160/0x160
[ 3756.154361] Code: a5 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 
41 57 41 56 41 55 41 54 49 89 f4 53 31 f6 49 89 fd 49 89 d7 48 83 ec 10 
<4c> 8b b7 f0 01 00 00 89 4d cc 49 3b 7e 30 40 0f 95 c6 48 8d 74
[ 3756.156802] RIP  [] can_overcommit+0x21/0xf0 [btrfs]
[ 3756.157995]  RSP 
[ 3756.159162] CR2: 01f0


fs_info->fs_root is referenced in btrfs_async_reclaim_metadata_space()
when mount kicks the kworker (btrfs_async_reclaim_metadata_space).

But at that time, fs_info->fs_root has not been initialized yet,
so a NULL pointer is passed to btrfs_calc_reclaim_metadata_size().


PID: 3045   TASK: 8800bb06b000  CPU: 2   COMMAND: "mount"
[exception RIP: queued_spin_lock_slowpath+350]
RIP: 810be2de  RSP: 8800b9fdb738  RFLAGS: 0202
RAX: 0101  RBX: 880431f68c00  RCX: 0001
RDX: 0101  RSI: 0001  RDI: 880431f68c00
RBP: 8800b9fdb738   R8: 0101   R9: 
R10: 4000  R11: 00018e58  R12: 0001
R13: 8800b9fdb7c0  R14: 8800bb06b000  R15: 0001
CS: 0010  SS: 0018
 #0 [8800b9fdb740] _raw_spin_lock at 81694ff0
 #1 [8800b9fdb750] reserve_metadata_bytes at a01e55cc [btrfs]
 #2 [8800b9fdb800] btrfs_block_rsv_add at a01e5a93 [btrfs]
 #3 [8800b9fdb828] btrfs_truncate_inode_items at a0202779 [btrfs]
 #4 [8800b9fdb920] btrfs_evict_inode at a02040ec [btrfs]
 #5 [8800b9fdb990] evict at 811ed6ea
 #6 [880

[PATCH] Btrfs: fix regression when running delayed references

2015-10-22 Thread fdmanana
From: Filipe Manana 

In the kernel 4.2 merge window we had a refactoring/rework of the delayed
references implementation in order to fix certain problems with qgroups.
However that rework introduced one more regression that leads to the
following trace when running delayed references for metadata:

[35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832!
[35908.065201] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
[35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic xor 
raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc 
loop fuse parport_pc psmouse i2
[35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: GW   
4.3.0-rc5-btrfs-next-17+ #1
[35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti: 
88010c4c8000
[35908.065201] RIP: 0010:[]  [] 
insert_inline_extent_backref+0x52/0xb1 [btrfs]
[35908.065201] RSP: 0018:88010c4cbb08  EFLAGS: 00010293
[35908.065201] RAX:  RBX: 88008a661000 RCX: 
[35908.065201] RDX: a04dd58f RSI: 0001 RDI: 
[35908.065201] RBP: 88010c4cbb40 R08: 1000 R09: 88010c4cb9f8
[35908.065201] R10:  R11: 002c R12: 
[35908.065201] R13: 88020a74c578 R14:  R15: 
[35908.065201] FS:  () GS:88023edc() 
knlGS:
[35908.065201] CS:  0010 DS:  ES:  CR0: 8005003b
[35908.065201] CR2: 015e8708 CR3: 000102185000 CR4: 06e0
[35908.065201] Stack:
[35908.065201]  88010c4cbb18 0f37 88020a74c578 
88015a408000
[35908.065201]  880154a44000  0005 
88010c4cbbd8
[35908.065201]  a0492b9a 0005  

[35908.065201] Call Trace:
[35908.065201]  [] __btrfs_inc_extent_ref+0x8b/0x208 [btrfs]
[35908.065201]  [] ? __btrfs_run_delayed_refs+0x4d4/0xd33 
[btrfs]
[35908.065201]  [] __btrfs_run_delayed_refs+0xafa/0xd33 
[btrfs]
[35908.065201]  [] ? join_transaction.isra.10+0x25/0x41f 
[btrfs]
[35908.065201]  [] ? join_transaction.isra.10+0xa8/0x41f 
[btrfs]
[35908.065201]  [] btrfs_run_delayed_refs+0x75/0x1dd [btrfs]
[35908.065201]  [] delayed_ref_async_start+0x3c/0x7b [btrfs]
[35908.065201]  [] normal_work_helper+0x14c/0x32a [btrfs]
[35908.065201]  [] btrfs_extent_refs_helper+0x12/0x14 [btrfs]
[35908.065201]  [] process_one_work+0x24a/0x4ac
[35908.065201]  [] worker_thread+0x206/0x2c2
[35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
[35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
[35908.065201]  [] kthread+0xef/0xf7
[35908.065201]  [] ? kthread_parkme+0x24/0x24
[35908.065201]  [] ret_from_fork+0x3f/0x70
[35908.065201]  [] ? kthread_parkme+0x24/0x24
[35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48 8d 
4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02 <0f> 0b 
4c 8b 45 30 8b 4d 28 45 31
[35908.065201] RIP  [] insert_inline_extent_backref+0x52/0xb1 
[btrfs]
[35908.065201]  RSP 
[35908.310885] ---[ end trace fe4299baf0666457 ]---

This happens because the new delayed references code no longer merges
delayed references that have different sequence values. The following
steps are an example sequence leading to this issue:

1) Transaction N starts, fs_info->tree_mod_seq has value 0;

2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for
   bytenr A is created, with a value of 1 and a seq value of 0;

3) fs_info->tree_mod_seq is incremented to 1;

4) Extent buffer A is deleted through btrfs_del_items(), which calls
   btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). The
   latter returns the metadata extent associated with extent buffer A to
   the free space cache (the range is not pinned), because the extent
   buffer was created in the current transaction (N) and writeback never
   happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set
   in the extent buffer).
   This creates the delayed reference Ref2 for bytenr A, with a value
   of -1 and a seq value of 1;

5) Delayed reference Ref2 is not merged with Ref1 when we create it,
   because they have different sequence numbers (decided at
   add_delayed_ref_tail_merge());

6) fs_info->tree_mod_seq is incremented to 2;

7) Some task attempts to allocate a new extent buffer (done at
   extent-tree.c:find_free_extent()), but due to heavy fragmentation
   and running low on metadata space the clustered allocation fails
   and we fall back to unclustered allocation, which finds the
   extent at offset A, so a new extent buffer at offset A is allocated.
   This creates delayed reference Ref3 for bytenr A, with a value of -1
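
Steps 2 to 5 above can be modelled in user space roughly as follows. This
is a toy sketch only: struct toy_ref, struct toy_ref_list and toy_add_ref()
are hypothetical names invented for this note and do not reproduce the
kernel's add_delayed_ref_tail_merge(). The one rule it takes from the
description is that a new reference is merged into the tail only when the
sequence values match.

```c
#include <stddef.h>

/* Toy delayed reference: which extent, by how much the refcount changes,
 * and the tree mod sequence value at creation time. Hypothetical type. */
struct toy_ref {
	unsigned long long bytenr;
	int ref_mod;
	unsigned long long seq;
};

#define TOY_MAX_REFS 8

struct toy_ref_list {
	struct toy_ref refs[TOY_MAX_REFS];
	size_t count;
};

/* Queue a reference. It is merged into the tail entry only when both the
 * bytenr and the seq value match; otherwise it is appended.
 * Returns 1 if merged, 0 if appended, -1 if the list is full. */
static int toy_add_ref(struct toy_ref_list *list, unsigned long long bytenr,
		       int ref_mod, unsigned long long seq)
{
	if (list->count > 0) {
		struct toy_ref *tail = &list->refs[list->count - 1];

		if (tail->bytenr == bytenr && tail->seq == seq) {
			tail->ref_mod += ref_mod;
			return 1;
		}
	}
	if (list->count == TOY_MAX_REFS)
		return -1;
	list->refs[list->count].bytenr = bytenr;
	list->refs[list->count].ref_mod = ref_mod;
	list->refs[list->count].seq = seq;
	list->count++;
	return 0;
}
```

Queuing +1 at seq 0 and then -1 at seq 1 for the same bytenr leaves two
separate entries in this model, which is the situation of Ref1 and Ref2;
with equal seq values the two would collapse into one entry with a net
value of 0.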
   

Re: BTRFS BUG at insert_inline_extent_backref+0xe3/0xf0 while rebalancing

2015-10-22 Thread Filipe Manana
On Thu, Oct 22, 2015 at 6:32 AM, Erkki Seppala  wrote:
> Hello,
>
> Recently I added daily rebalancing to my cron.d (after finding myself in
> the no-space-situation), and not long after that, I found my PC had
> crashed over night. Having no sign in the logs anywhere (not even over
> network even though there should be) I had nothing to go on, but this
> night it crashed again after starting the rebalance, and this time there
> was some information on the kernel log.
>
> Kernel version: 4.2.3 (package linux-image-4.2.0-1-amd64 version 4.2.3-1
> from Debian Unstable)
>
> The dump is available at:
>
>   http://www.modeemi.fi/~flux/btrfs/btrfs-BUG-2015-10-55.txt
>
> The log is available as well (stripped some unrelated USB- and firewall
> logging, showing that last evening there was some kernel task hung for
> 120 seconds; but it's in another btrfs filesystem and is another story):
>
>   http://www.modeemi.fi/~flux/btrfs/btrfs-2015-10-55.txt
>
> I'm not quite sure which of the btrfs balance commands caused the
> issue. But here is my script:
>
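> #!/bin/sh
> # Run "btrfs balance" on one filesystem for each usage threshold given.
> fs="$1"
> if [ -z "$fs" ]; then
>   echo "usage: btrfs-balance / 0 1 5 10 20 50"
>   exit 1
> fi
> shift
> # Balance data (d) chunks first, then metadata (m) chunks.
> for usage in d m; do
>   for a in "$@"; do
>     date
>     /bin/btrfs balance start "$fs" -v "-${usage}usage=$a"
>   done
> done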
>
> And it was started at 07:30 with:
>
>   /usr/local/sbin/btrfs-balance / 0 1 2 5 10 20 30 50 70
>
> I should add that the filesystem in question is backed by MD RAID10 and
> that is backed by four SSDs, so it's reasonably fast in IO, if that
> affects anything. There should not have been much competing IO at the
> time of the occurrence.
>
> Before Duncan asks ;-), I only have a moderate number of subvolumes and
> snapshots, ie. one subvolume for each of /, /var/log/journal and /home,
> 24 snapshots of / and /home plus <10 snapshots of /.
>
> Before that balance there was another balance on another BTRFS RAID10,
> but given the time stamp I think I can easily say it wasn't the cause.
>
> I don't really have other 'solutions' than disabling the rebalancing for
> the time being, and only use it as-needed as I had earlier done..

Try this (just sent a few minutes ago):
https://patchwork.kernel.org/patch/7463161/

thanks

>
> Cheers,
>
> --
>   _
>  / __// /__   __   http://www.modeemi.fi/~flux/\   \
> / /_ / // // /\ \/ /\  /
>/_/  /_/ \___/ /_/\_\@modeemi.fi  \/
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


[PATCH v2] btrfs: add balance filters limits, stripes and usage to supported mask

2015-10-22 Thread David Sterba
Enable the extended 'limit' syntax (a range), the new 'stripes' and
extended 'usage' syntax (a range) filters in the filters mask. The patch
comes separate and not within the series that introduced the new filters
because the patch adding the mask was merged in a late rc. The
integration branch was based on an older rc and could not merge the
patch due to the missing changes.

Prerequisites:
* btrfs: check unsupported filters in balance arguments
* btrfs: extend balance filter limit to take minimum and maximum
* btrfs: add balance filter for stripes
* btrfs: extend balance filter usage to take minimum and maximum

Signed-off-by: David Sterba 
---

 fs/btrfs/volumes.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 90ef3e722b72..6abd2dd346b3 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -385,7 +385,10 @@ struct map_lookup {
 BTRFS_BALANCE_ARGS_DEVID | \
 BTRFS_BALANCE_ARGS_DRANGE |\
 BTRFS_BALANCE_ARGS_VRANGE |\
-BTRFS_BALANCE_ARGS_LIMIT)
+BTRFS_BALANCE_ARGS_LIMIT | \
+BTRFS_BALANCE_ARGS_LIMIT_RANGE |   \
+BTRFS_BALANCE_ARGS_STRIPES_RANGE | \
+BTRFS_BALANCE_ARGS_USAGE_RANGE)
 
 /*
  * Profile changing flags.  When SOFT is set we won't relocate chunk if
-- 
2.6.2
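
As a reading aid, the effect of widening the mask can be sketched in user
space, assuming that balance arguments carrying any flag outside the
supported mask are rejected (which is what the prerequisite patch "check
unsupported filters in balance arguments" suggests). The TOY_ names, the bit
values and toy_args_supported() are made up for this illustration and are
not the kernel's definitions.

```c
#include <stdint.h>

/* Illustrative flag bits; the values are assumptions chosen for this sketch. */
#define TOY_ARGS_LIMIT		(1ULL << 0)
#define TOY_ARGS_LIMIT_RANGE	(1ULL << 1)
#define TOY_ARGS_STRIPES_RANGE	(1ULL << 2)
#define TOY_ARGS_USAGE_RANGE	(1ULL << 3)
#define TOY_ARGS_UNKNOWN	(1ULL << 40)	/* a flag no mask knows about */

/* Mask before the change: the range filters are not listed. */
#define TOY_MASK_OLD	(TOY_ARGS_LIMIT)

/* Mask after the change: the three range filters are accepted too. */
#define TOY_MASK_NEW	(TOY_ARGS_LIMIT |		\
			 TOY_ARGS_LIMIT_RANGE |		\
			 TOY_ARGS_STRIPES_RANGE |	\
			 TOY_ARGS_USAGE_RANGE)

/* Returns 1 when every requested flag is covered by the mask, else 0. */
static int toy_args_supported(uint64_t flags, uint64_t mask)
{
	return (flags & ~mask) == 0;
}
```

With the old mask a request using the stripes range filter would be
refused in this model; with the new mask it passes, while a flag that
appears in neither mask is still refused.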



Re: [PATCH] Btrfs: fix regression when running delayed references

2015-10-22 Thread Qu Wenruo



 wrote on 2015/10/22 09:47 +0100:

From: Filipe Manana 

In the kernel 4.2 merge window we had a refactoring/rework of the delayed
references implementation in order to fix certain problems with qgroups.
However that rework introduced one more regression that leads to the
following trace when running delayed references for metadata:

[35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832!
[35908.065201] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
[35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic xor 
raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc 
loop fuse parport_pc psmouse i2
[35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: GW   
4.3.0-rc5-btrfs-next-17+ #1
[35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti: 
88010c4c8000
[35908.065201] RIP: 0010:[]  [] 
insert_inline_extent_backref+0x52/0xb1 [btrfs]
[35908.065201] RSP: 0018:88010c4cbb08  EFLAGS: 00010293
[35908.065201] RAX:  RBX: 88008a661000 RCX: 
[35908.065201] RDX: a04dd58f RSI: 0001 RDI: 
[35908.065201] RBP: 88010c4cbb40 R08: 1000 R09: 88010c4cb9f8
[35908.065201] R10:  R11: 002c R12: 
[35908.065201] R13: 88020a74c578 R14:  R15: 
[35908.065201] FS:  () GS:88023edc() 
knlGS:
[35908.065201] CS:  0010 DS:  ES:  CR0: 8005003b
[35908.065201] CR2: 015e8708 CR3: 000102185000 CR4: 06e0
[35908.065201] Stack:
[35908.065201]  88010c4cbb18 0f37 88020a74c578 
88015a408000
[35908.065201]  880154a44000  0005 
88010c4cbbd8
[35908.065201]  a0492b9a 0005  

[35908.065201] Call Trace:
[35908.065201]  [] __btrfs_inc_extent_ref+0x8b/0x208 [btrfs]
[35908.065201]  [] ? __btrfs_run_delayed_refs+0x4d4/0xd33 
[btrfs]
[35908.065201]  [] __btrfs_run_delayed_refs+0xafa/0xd33 
[btrfs]
[35908.065201]  [] ? join_transaction.isra.10+0x25/0x41f 
[btrfs]
[35908.065201]  [] ? join_transaction.isra.10+0xa8/0x41f 
[btrfs]
[35908.065201]  [] btrfs_run_delayed_refs+0x75/0x1dd [btrfs]
[35908.065201]  [] delayed_ref_async_start+0x3c/0x7b [btrfs]
[35908.065201]  [] normal_work_helper+0x14c/0x32a [btrfs]
[35908.065201]  [] btrfs_extent_refs_helper+0x12/0x14 [btrfs]
[35908.065201]  [] process_one_work+0x24a/0x4ac
[35908.065201]  [] worker_thread+0x206/0x2c2
[35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
[35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
[35908.065201]  [] kthread+0xef/0xf7
[35908.065201]  [] ? kthread_parkme+0x24/0x24
[35908.065201]  [] ret_from_fork+0x3f/0x70
[35908.065201]  [] ? kthread_parkme+0x24/0x24
[35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48 8d 4d d0 
e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02 <0f> 0b 4c 8b 
45 30 8b 4d 28 45 31
[35908.065201] RIP  [] insert_inline_extent_backref+0x52/0xb1 
[btrfs]
[35908.065201]  RSP 
[35908.310885] ---[ end trace fe4299baf0666457 ]---

This happens because the new delayed references code no longer merges
delayed references that have different sequence values. The following
steps are an example sequence leading to this issue:

1) Transaction N starts, fs_info->tree_mod_seq has value 0;

2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for
bytenr A is created, with a value of 1 and a seq value of 0;

3) fs_info->tree_mod_seq is incremented to 1;

4) Extent buffer A is deleted through btrfs_del_items(), which calls
btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). The
latter returns the metadata extent associated with extent buffer A to
the free space cache (the range is not pinned), because the extent
buffer was created in the current transaction (N) and writeback never
happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set
in the extent buffer).
This creates the delayed reference Ref2 for bytenr A, with a value
of -1 and a seq value of 1;

5) Delayed reference Ref2 is not merged with Ref1 when we create it,
because they have different sequence numbers (decided at
add_delayed_ref_tail_merge());

6) fs_info->tree_mod_seq is incremented to 2;

7) Some task attempts to allocate a new extent buffer (done at
extent-tree.c:find_free_extent()), but due to heavy fragmentation
and running low on metadata space the clustered allocation fails
and we fall back to unclustered allocation, which finds the
extent at offset A, so a new extent buffer at offset A is allocated.
This creates delay

Re: [PATCH] Btrfs: fix regression when running delayed references

2015-10-22 Thread Filipe Manana
On Thu, Oct 22, 2015 at 10:32 AM, Qu Wenruo  wrote:
>
>
>  wrote on 2015/10/22 09:47 +0100:
>>
>> From: Filipe Manana 
>>
>> In the kernel 4.2 merge window we had a refactoring/rework of the delayed
>> references implementation in order to fix certain problems with qgroups.
>> However that rework introduced one more regression that leads to the
>> following trace when running delayed references for metadata:
>>
>> [35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832!
>> [35908.065201] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
>> [35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic
>> xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache
>> sunrpc loop fuse parport_pc psmouse i2
>> [35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: GW
>> 4.3.0-rc5-btrfs-next-17+ #1
>> [35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
>> [35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
>> [btrfs]
>> [35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti:
>> 88010c4c8000
>> [35908.065201] RIP: 0010:[]  []
>> insert_inline_extent_backref+0x52/0xb1 [btrfs]
>> [35908.065201] RSP: 0018:88010c4cbb08  EFLAGS: 00010293
>> [35908.065201] RAX:  RBX: 88008a661000 RCX:
>> 
>> [35908.065201] RDX: a04dd58f RSI: 0001 RDI:
>> 
>> [35908.065201] RBP: 88010c4cbb40 R08: 1000 R09:
>> 88010c4cb9f8
>> [35908.065201] R10:  R11: 002c R12:
>> 
>> [35908.065201] R13: 88020a74c578 R14:  R15:
>> 
>> [35908.065201] FS:  () GS:88023edc()
>> knlGS:
>> [35908.065201] CS:  0010 DS:  ES:  CR0: 8005003b
>> [35908.065201] CR2: 015e8708 CR3: 000102185000 CR4:
>> 06e0
>> [35908.065201] Stack:
>> [35908.065201]  88010c4cbb18 0f37 88020a74c578
>> 88015a408000
>> [35908.065201]  880154a44000  0005
>> 88010c4cbbd8
>> [35908.065201]  a0492b9a 0005 
>> 
>> [35908.065201] Call Trace:
>> [35908.065201]  [] __btrfs_inc_extent_ref+0x8b/0x208
>> [btrfs]
>> [35908.065201]  [] ?
>> __btrfs_run_delayed_refs+0x4d4/0xd33 [btrfs]
>> [35908.065201]  [] __btrfs_run_delayed_refs+0xafa/0xd33
>> [btrfs]
>> [35908.065201]  [] ? join_transaction.isra.10+0x25/0x41f
>> [btrfs]
>> [35908.065201]  [] ? join_transaction.isra.10+0xa8/0x41f
>> [btrfs]
>> [35908.065201]  [] btrfs_run_delayed_refs+0x75/0x1dd
>> [btrfs]
>> [35908.065201]  [] delayed_ref_async_start+0x3c/0x7b
>> [btrfs]
>> [35908.065201]  [] normal_work_helper+0x14c/0x32a
>> [btrfs]
>> [35908.065201]  [] btrfs_extent_refs_helper+0x12/0x14
>> [btrfs]
>> [35908.065201]  [] process_one_work+0x24a/0x4ac
>> [35908.065201]  [] worker_thread+0x206/0x2c2
>> [35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
>> [35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
>> [35908.065201]  [] kthread+0xef/0xf7
>> [35908.065201]  [] ? kthread_parkme+0x24/0x24
>> [35908.065201]  [] ret_from_fork+0x3f/0x70
>> [35908.065201]  [] ? kthread_parkme+0x24/0x24
>> [35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48
>> 8d 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02
>> <0f> 0b 4c 8b 45 30 8b 4d 28 45 31
>> [35908.065201] RIP  []
>> insert_inline_extent_backref+0x52/0xb1 [btrfs]
>> [35908.065201]  RSP 
>> [35908.310885] ---[ end trace fe4299baf0666457 ]---
>>
>> This happens because the new delayed references code no longer merges
>> delayed references that have different sequence values. The following
>> steps are an example sequence leading to this issue:
>>
>> 1) Transaction N starts, fs_info->tree_mod_seq has value 0;
>>
>> 2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for
>> bytenr A is created, with a value of 1 and a seq value of 0;
>>
>> 3) fs_info->tree_mod_seq is incremented to 1;
>>
>> 4) Extent buffer A is deleted through btrfs_del_items(), which calls
>> btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). The
>> latter returns the metadata extent associated with extent buffer A to
>> the free space cache (the range is not pinned), because the extent
>> buffer was created in the current transaction (N) and writeback never
>> happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set
>> in the extent buffer).
>> This creates the delayed reference Ref2 for bytenr A, with a value
>> of -1 and a seq value of 1;
>>
>> 5) Delayed reference Ref2 is not merged with Ref1 when we create it,
>> because they have different sequence numbers (decided at
>> add_delayed_ref_tail_merge());
>>
>> 6) fs_info->tree_mod_seq is incremented to 2;
>>
>> 7) Some task 

Re: [PATCH] Btrfs: fix regression when running delayed references

2015-10-22 Thread Filipe Manana
On Thu, Oct 22, 2015 at 10:43 AM, Filipe Manana  wrote:
> On Thu, Oct 22, 2015 at 10:32 AM, Qu Wenruo  wrote:
>>
>>
>>  wrote on 2015/10/22 09:47 +0100:
>>>
>>> From: Filipe Manana 
>>>
>>> In the kernel 4.2 merge window we had a refactoring/rework of the delayed
>>> references implementation in order to fix certain problems with qgroups.
>>> However that rework introduced one more regression that leads to the
>>> following trace when running delayed references for metadata:
>>>
>>> [35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832!
>>> [35908.065201] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
>>> [35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic
>>> xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache
>>> sunrpc loop fuse parport_pc psmouse i2
>>> [35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: GW
>>> 4.3.0-rc5-btrfs-next-17+ #1
>>> [35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>>> rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
>>> [35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
>>> [btrfs]
>>> [35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti:
>>> 88010c4c8000
>>> [35908.065201] RIP: 0010:[]  []
>>> insert_inline_extent_backref+0x52/0xb1 [btrfs]
>>> [35908.065201] RSP: 0018:88010c4cbb08  EFLAGS: 00010293
>>> [35908.065201] RAX:  RBX: 88008a661000 RCX:
>>> 
>>> [35908.065201] RDX: a04dd58f RSI: 0001 RDI:
>>> 
>>> [35908.065201] RBP: 88010c4cbb40 R08: 1000 R09:
>>> 88010c4cb9f8
>>> [35908.065201] R10:  R11: 002c R12:
>>> 
>>> [35908.065201] R13: 88020a74c578 R14:  R15:
>>> 
>>> [35908.065201] FS:  () GS:88023edc()
>>> knlGS:
>>> [35908.065201] CS:  0010 DS:  ES:  CR0: 8005003b
>>> [35908.065201] CR2: 015e8708 CR3: 000102185000 CR4:
>>> 06e0
>>> [35908.065201] Stack:
>>> [35908.065201]  88010c4cbb18 0f37 88020a74c578
>>> 88015a408000
>>> [35908.065201]  880154a44000  0005
>>> 88010c4cbbd8
>>> [35908.065201]  a0492b9a 0005 
>>> 
>>> [35908.065201] Call Trace:
>>> [35908.065201]  [] __btrfs_inc_extent_ref+0x8b/0x208
>>> [btrfs]
>>> [35908.065201]  [] ?
>>> __btrfs_run_delayed_refs+0x4d4/0xd33 [btrfs]
>>> [35908.065201]  [] __btrfs_run_delayed_refs+0xafa/0xd33
>>> [btrfs]
>>> [35908.065201]  [] ? join_transaction.isra.10+0x25/0x41f
>>> [btrfs]
>>> [35908.065201]  [] ? join_transaction.isra.10+0xa8/0x41f
>>> [btrfs]
>>> [35908.065201]  [] btrfs_run_delayed_refs+0x75/0x1dd
>>> [btrfs]
>>> [35908.065201]  [] delayed_ref_async_start+0x3c/0x7b
>>> [btrfs]
>>> [35908.065201]  [] normal_work_helper+0x14c/0x32a
>>> [btrfs]
>>> [35908.065201]  [] btrfs_extent_refs_helper+0x12/0x14
>>> [btrfs]
>>> [35908.065201]  [] process_one_work+0x24a/0x4ac
>>> [35908.065201]  [] worker_thread+0x206/0x2c2
>>> [35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
>>> [35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
>>> [35908.065201]  [] kthread+0xef/0xf7
>>> [35908.065201]  [] ? kthread_parkme+0x24/0x24
>>> [35908.065201]  [] ret_from_fork+0x3f/0x70
>>> [35908.065201]  [] ? kthread_parkme+0x24/0x24
>>> [35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48
>>> 8d 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02
>>> <0f> 0b 4c 8b 45 30 8b 4d 28 45 31
>>> [35908.065201] RIP  []
>>> insert_inline_extent_backref+0x52/0xb1 [btrfs]
>>> [35908.065201]  RSP 
>>> [35908.310885] ---[ end trace fe4299baf0666457 ]---
>>>
>>> This happens because the new delayed references code no longer merges
>>> delayed references that have different sequence values. The following
>>> steps are an example sequence leading to this issue:
>>>
>>> 1) Transaction N starts, fs_info->tree_mod_seq has value 0;
>>>
>>> 2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for
>>> bytenr A is created, with a value of 1 and a seq value of 0;
>>>
>>> 3) fs_info->tree_mod_seq is incremented to 1;
>>>
>>> 4) Extent buffer A is deleted through btrfs_del_items(), which calls
>>> btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). The
>>> latter returns the metadata extent associated with extent buffer A to
>>> the free space cache (the range is not pinned), because the extent
>>> buffer was created in the current transaction (N) and writeback never
>>> happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set
>>> in the extent buffer).
>>> This creates the delayed reference Ref2 for bytenr A, with a value
>>> of -1 and a seq value of 1;
>>>
>>> 5) Delayed reference Ref2 is not merged with Ref1 when we create it,
>>> b

Re: [PATCH] Btrfs: fix regression when running delayed references

2015-10-22 Thread Koen Kooi
On 22-10-15 at 10:47, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> In the kernel 4.2 merge window we had a refactoring/rework of the delayed
> references implementation in order to fix certain problems with qgroups.
> However that rework introduced one more regression that leads to the
> following trace when running delayed references for metadata:
> 
> [35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832!
> [35908.065201] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic xor 
> raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc 
> loop fuse parport_pc psmouse i2
> [35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: GW 
>   4.3.0-rc5-btrfs-next-17+ #1
> [35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
> [35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
> [35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti: 
> 88010c4c8000
> [35908.065201] RIP: 0010:[]  [] 
> insert_inline_extent_backref+0x52/0xb1 [btrfs]
> [35908.065201] RSP: 0018:88010c4cbb08  EFLAGS: 00010293
> [35908.065201] RAX:  RBX: 88008a661000 RCX: 
> 
> [35908.065201] RDX: a04dd58f RSI: 0001 RDI: 
> 
> [35908.065201] RBP: 88010c4cbb40 R08: 1000 R09: 
> 88010c4cb9f8
> [35908.065201] R10:  R11: 002c R12: 
> 
> [35908.065201] R13: 88020a74c578 R14:  R15: 
> 
> [35908.065201] FS:  () GS:88023edc() 
> knlGS:
> [35908.065201] CS:  0010 DS:  ES:  CR0: 8005003b
> [35908.065201] CR2: 015e8708 CR3: 000102185000 CR4: 
> 06e0
> [35908.065201] Stack:
> [35908.065201]  88010c4cbb18 0f37 88020a74c578 
> 88015a408000
> [35908.065201]  880154a44000  0005 
> 88010c4cbbd8
> [35908.065201]  a0492b9a 0005  
> 
> [35908.065201] Call Trace:
> [35908.065201]  [] __btrfs_inc_extent_ref+0x8b/0x208 [btrfs]
> [35908.065201]  [] ? __btrfs_run_delayed_refs+0x4d4/0xd33 
> [btrfs]
> [35908.065201]  [] __btrfs_run_delayed_refs+0xafa/0xd33 
> [btrfs]
> [35908.065201]  [] ? join_transaction.isra.10+0x25/0x41f 
> [btrfs]
> [35908.065201]  [] ? join_transaction.isra.10+0xa8/0x41f 
> [btrfs]
> [35908.065201]  [] btrfs_run_delayed_refs+0x75/0x1dd [btrfs]
> [35908.065201]  [] delayed_ref_async_start+0x3c/0x7b [btrfs]
> [35908.065201]  [] normal_work_helper+0x14c/0x32a [btrfs]
> [35908.065201]  [] btrfs_extent_refs_helper+0x12/0x14 
> [btrfs]
> [35908.065201]  [] process_one_work+0x24a/0x4ac
> [35908.065201]  [] worker_thread+0x206/0x2c2
> [35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
> [35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
> [35908.065201]  [] kthread+0xef/0xf7
> [35908.065201]  [] ? kthread_parkme+0x24/0x24
> [35908.065201]  [] ret_from_fork+0x3f/0x70
> [35908.065201]  [] ? kthread_parkme+0x24/0x24
> [35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48 8d 
> 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02 <0f> 
> 0b 4c 8b 45 30 8b 4d 28 45 31
> [35908.065201] RIP  [] 
> insert_inline_extent_backref+0x52/0xb1 [btrfs]
> [35908.065201]  RSP 
> [35908.310885] ---[ end trace fe4299baf0666457 ]---

Would this also solve this:

Oct 22 12:03:20 beast kernel: WARNING: CPU: 5 PID: 323 at lib/list_debug.c:62
__list_del_entry+0x5a/0x98()
Oct 22 12:03:20 beast kernel: list_del corruption. next->prev should be
88033f864500, but was 88033f8642c0
Oct 22 12:03:20 beast kernel: Modules linked in: arc4 md4 nls_utf8 cifs
dns_resolver fscache ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack veth loop b43
mac80211 cfg80211 ssb mmc_core kvm_intel kvm crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel serio_raw sb_edac edac_core i2c_i801 btusb btrtl btintel
btbcm bluetooth joydev bcma rfkill cp210x tpm_infineon tpm_tis tpm
sch_fq_codel radeon crc32c_intel ttm drm_kms_helper
Oct 22 12:03:20 beast kernel: CPU: 5 PID: 323 Comm: kworker/u16:12 Tainted: G
   W   4.2.2 #50
Oct 22 12:03:20 beast kernel: Hardware name: System manufacturer System
Product Name/X79-DELUXE, BIOS 0901 06/20/2014
Oct 22 12:03:20 beast kernel: Workqueue: btrfs-delalloc btrfs_delalloc_helper
Oct 22 12:03:20 beast kernel:  0009 88013993fb98
8170b663 0006
Oct 22 12:03:20 beast kernel:  88013993fbe8 88013993fbd8
8106aa40 88013993fc78
Oct 22 12:03:20 beast kernel:  813392c3 88033f864480
88033f864500 880ba968d510
Oct 22 12:03:20 beast kernel: Call Trace:
Oct 22 12:03:20 beast kernel

Re: BTRFS BUG at insert_inline_extent_backref+0xe3/0xf0 while rebalancing

2015-10-22 Thread Stéphane Lesimple

On 2015-10-22 10:53, Filipe Manana wrote:
On Thu, Oct 22, 2015 at 6:32 AM, Erkki Seppala wrote:

Hello,

Recently I added daily rebalancing to my cron.d (after finding myself in
the no-space-situation), and not long after that, I found my PC had
crashed overnight. Having no sign in the logs anywhere (not even over
the network, even though there should be) I had nothing to go on, but this
night it crashed again after starting the rebalance, and this time there
was some information in the kernel log.

Kernel version: 4.2.3 (package linux-image-4.2.0-1-amd64 version 4.2.3-1
from Debian Unstable)

The dump is available at:

  http://www.modeemi.fi/~flux/btrfs/btrfs-BUG-2015-10-55.txt

The log is available as well (stripped some unrelated USB- and firewall
logging, showing that last evening there was some kernel task hung for
120 seconds; but it's in another btrfs filesystem and is another story):

  http://www.modeemi.fi/~flux/btrfs/btrfs-2015-10-55.txt

I'm not quite sure which of the btrfs balance commands caused the
issue. But here is my script:

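#!/bin/sh
# Usage: btrfs-balance <mountpoint> <usage%>...
fs="$1"
if [ -z "$fs" ]; then
  echo "usage: btrfs-balance / 0 1 5 10 20 50"
  exit 1
fi
shift
# Balance data (d) chunks, then metadata (m) chunks, once per usage threshold.
for usage in d m; do
  for a in "$@"; do
    date
    /bin/btrfs balance start "$fs" -v "-${usage}usage=$a"
  done
done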

And it was started at 07:30 with:

  /usr/local/sbin/btrfs-balance / 0 1 2 5 10 20 30 50 70
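
A more defensive variant of the wrapper script above could look roughly
like the sketch below. It is only an illustration, not the poster's
script: the function name balance_passes and the DRY_RUN variable are
hypothetical, and the sketch assumes that stopping at the first failed
pass is the desired behaviour. With DRY_RUN set it only prints the
commands it would run, which may help when trying to narrow down which
filter was active at the time of a crash.

```shell
#!/bin/sh
# Sketch only: balance_passes and DRY_RUN are illustrative names, not part
# of btrfs-progs. Runs one "btrfs balance start" pass per chunk type and
# usage threshold, data (d) first and then metadata (m).
balance_passes() {
  fs="$1"
  if [ -z "$fs" ]; then
    echo "usage: balance_passes <mountpoint> <usage%>..." >&2
    return 1
  fi
  shift
  for type in d m; do
    for pct in "$@"; do
      if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        echo "btrfs balance start -v -${type}usage=$pct $fs"
      else
        date
        # Stop at the first failing pass (an assumption of this sketch).
        /bin/btrfs balance start -v "-${type}usage=$pct" "$fs" || return 1
      fi
    done
  done
}
```

For example, "DRY_RUN=1 balance_passes / 0 1 2 5" prints eight command
lines (four for data, four for metadata) without touching the filesystem.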

I should add that the filesystem in question is backed by MD RAID10, and
that is backed by four SSDs, so it's reasonably fast in IO, if that
affects anything. There should not have been much competing IO at the
time of the occurrence.

Before Duncan asks ;-), I only have a moderate number of subvolumes and
snapshots, i.e. one subvolume for each of /, /var/log/journal and /home,
24 snapshots of / and /home plus <10 snapshots of /.

Before that balance there was another balance on another BTRFS RAID10,
but given the time stamp I think I can easily say it wasn't the cause.

I don't really have other 'solutions' than disabling the rebalancing for
the time being, and only using it as needed, as I had done earlier.


Try this (just sent a few minutes ago):
https://patchwork.kernel.org/patch/7463161/



Awesome, I'll also try it right now under 4.3.0-rc6. My system is 
currently hit so hard by this bug that it no longer survives a balance 
for longer than a few minutes.


Will keep you posted on the outcome.

Thanks,

--
Stéphane.




Re: [PATCH] Btrfs: fix regression when running delayed references

2015-10-22 Thread Qu Wenruo



On 2015-10-22 17:43, Filipe Manana wrote:

On Thu, Oct 22, 2015 at 10:32 AM, Qu Wenruo  wrote:



  wrote on 2015/10/22 09:47 +0100:


From: Filipe Manana 

In the kernel 4.2 merge window we had a refactoring/rework of the delayed
references implementation in order to fix certain problems with qgroups.
However that rework introduced one more regression that leads to the
following trace when running delayed references for metadata:

[35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832!
[35908.065201] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
[35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic
xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache
sunrpc loop fuse parport_pc psmouse i2
[35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: GW
4.3.0-rc5-btrfs-next-17+ #1
[35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
[btrfs]
[35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti:
88010c4c8000
[35908.065201] RIP: 0010:[]  []
insert_inline_extent_backref+0x52/0xb1 [btrfs]
[35908.065201] RSP: 0018:88010c4cbb08  EFLAGS: 00010293
[35908.065201] RAX:  RBX: 88008a661000 RCX:

[35908.065201] RDX: a04dd58f RSI: 0001 RDI:

[35908.065201] RBP: 88010c4cbb40 R08: 1000 R09:
88010c4cb9f8
[35908.065201] R10:  R11: 002c R12:

[35908.065201] R13: 88020a74c578 R14:  R15:

[35908.065201] FS:  () GS:88023edc()
knlGS:
[35908.065201] CS:  0010 DS:  ES:  CR0: 8005003b
[35908.065201] CR2: 015e8708 CR3: 000102185000 CR4:
06e0
[35908.065201] Stack:
[35908.065201]  88010c4cbb18 0f37 88020a74c578
88015a408000
[35908.065201]  880154a44000  0005
88010c4cbbd8
[35908.065201]  a0492b9a 0005 

[35908.065201] Call Trace:
[35908.065201]  [] __btrfs_inc_extent_ref+0x8b/0x208
[btrfs]
[35908.065201]  [] ?
__btrfs_run_delayed_refs+0x4d4/0xd33 [btrfs]
[35908.065201]  [] __btrfs_run_delayed_refs+0xafa/0xd33
[btrfs]
[35908.065201]  [] ? join_transaction.isra.10+0x25/0x41f
[btrfs]
[35908.065201]  [] ? join_transaction.isra.10+0xa8/0x41f
[btrfs]
[35908.065201]  [] btrfs_run_delayed_refs+0x75/0x1dd
[btrfs]
[35908.065201]  [] delayed_ref_async_start+0x3c/0x7b
[btrfs]
[35908.065201]  [] normal_work_helper+0x14c/0x32a
[btrfs]
[35908.065201]  [] btrfs_extent_refs_helper+0x12/0x14
[btrfs]
[35908.065201]  [] process_one_work+0x24a/0x4ac
[35908.065201]  [] worker_thread+0x206/0x2c2
[35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
[35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
[35908.065201]  [] kthread+0xef/0xf7
[35908.065201]  [] ? kthread_parkme+0x24/0x24
[35908.065201]  [] ret_from_fork+0x3f/0x70
[35908.065201]  [] ? kthread_parkme+0x24/0x24
[35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48
8d 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02
<0f> 0b 4c 8b 45 30 8b 4d 28 45 31
[35908.065201] RIP  []
insert_inline_extent_backref+0x52/0xb1 [btrfs]
[35908.065201]  RSP 
[35908.310885] ---[ end trace fe4299baf0666457 ]---

This happens because the new delayed references code no longer merges
delayed references that have different sequence values. The following
steps are an example sequence leading to this issue:

1) Transaction N starts, fs_info->tree_mod_seq has value 0;

2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for
 bytenr A is created, with a value of 1 and a seq value of 0;

3) fs_info->tree_mod_seq is incremented to 1;

4) Extent buffer A is deleted through btrfs_del_items(), which calls
 btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). The
 latter returns the metadata extent associated with extent buffer A to
 the free space cache (the range is not pinned), because the extent
 buffer was created in the current transaction (N) and writeback never
 happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set
 in the extent buffer).
 This creates the delayed reference Ref2 for bytenr A, with a value
 of -1 and a seq value of 1;

5) Delayed reference Ref2 is not merged with Ref1 when we create it,
 because they have different sequence numbers (decided at
 add_delayed_ref_tail_merge());

6) fs_info->tree_mod_seq is incremented to 2;

7) Some task attempts to allocate a new extent buffer (done at
 extent-tree.c:find_free_extent()), but due to heavy fragmentation
 and running low on metadata space the clustered allocation fails
 and we fall back to unclustered allocation, which finds the
 ex

Re: [PATCH] Btrfs: fix regression when running delayed references

2015-10-22 Thread Filipe Manana
On Thu, Oct 22, 2015 at 11:05 AM, Koen Kooi  wrote:
> On 22-10-15 at 10:47, fdman...@kernel.org wrote:
>> From: Filipe Manana 
>>
>> In the kernel 4.2 merge window we had a refactoring/rework of the delayed
>> references implementation in order to fix certain problems with qgroups.
>> However that rework introduced one more regression that leads to the
>> following trace when running delayed references for metadata:
>>
>> [35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832!
>> [35908.065201] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
>> [35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic xor 
>> raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache 
>> sunrpc loop fuse parport_pc psmouse i2
>> [35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: GW
>>4.3.0-rc5-btrfs-next-17+ #1
>> [35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>> rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
>> [35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
>> [35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti: 
>> 88010c4c8000
>> [35908.065201] RIP: 0010:[]  [] 
>> insert_inline_extent_backref+0x52/0xb1 [btrfs]
>> [35908.065201] RSP: 0018:88010c4cbb08  EFLAGS: 00010293
>> [35908.065201] RAX:  RBX: 88008a661000 RCX: 
>> 
>> [35908.065201] RDX: a04dd58f RSI: 0001 RDI: 
>> 
>> [35908.065201] RBP: 88010c4cbb40 R08: 1000 R09: 
>> 88010c4cb9f8
>> [35908.065201] R10:  R11: 002c R12: 
>> 
>> [35908.065201] R13: 88020a74c578 R14:  R15: 
>> 
>> [35908.065201] FS:  () GS:88023edc() 
>> knlGS:
>> [35908.065201] CS:  0010 DS:  ES:  CR0: 8005003b
>> [35908.065201] CR2: 015e8708 CR3: 000102185000 CR4: 
>> 06e0
>> [35908.065201] Stack:
>> [35908.065201]  88010c4cbb18 0f37 88020a74c578 
>> 88015a408000
>> [35908.065201]  880154a44000  0005 
>> 88010c4cbbd8
>> [35908.065201]  a0492b9a 0005  
>> 
>> [35908.065201] Call Trace:
>> [35908.065201]  [] __btrfs_inc_extent_ref+0x8b/0x208 
>> [btrfs]
>> [35908.065201]  [] ? __btrfs_run_delayed_refs+0x4d4/0xd33 
>> [btrfs]
>> [35908.065201]  [] __btrfs_run_delayed_refs+0xafa/0xd33 
>> [btrfs]
>> [35908.065201]  [] ? join_transaction.isra.10+0x25/0x41f 
>> [btrfs]
>> [35908.065201]  [] ? join_transaction.isra.10+0xa8/0x41f 
>> [btrfs]
>> [35908.065201]  [] btrfs_run_delayed_refs+0x75/0x1dd 
>> [btrfs]
>> [35908.065201]  [] delayed_ref_async_start+0x3c/0x7b 
>> [btrfs]
>> [35908.065201]  [] normal_work_helper+0x14c/0x32a [btrfs]
>> [35908.065201]  [] btrfs_extent_refs_helper+0x12/0x14 
>> [btrfs]
>> [35908.065201]  [] process_one_work+0x24a/0x4ac
>> [35908.065201]  [] worker_thread+0x206/0x2c2
>> [35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
>> [35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
>> [35908.065201]  [] kthread+0xef/0xf7
>> [35908.065201]  [] ? kthread_parkme+0x24/0x24
>> [35908.065201]  [] ret_from_fork+0x3f/0x70
>> [35908.065201]  [] ? kthread_parkme+0x24/0x24
>> [35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48 
>> 8d 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02 
>> <0f> 0b 4c 8b 45 30 8b 4d 28 45 31
>> [35908.065201] RIP  [] 
>> insert_inline_extent_backref+0x52/0xb1 [btrfs]
>> [35908.065201]  RSP 
>> [35908.310885] ---[ end trace fe4299baf0666457 ]---
>
> Would this also solve this:

No, what you get is a totally different and unrelated problem.

>
> Oct 22 12:03:20 beast kernel: WARNING: CPU: 5 PID: 323 at lib/list_debug.c:62
> __list_del_entry+0x5a/0x98()
> Oct 22 12:03:20 beast kernel: list_del corruption. next->prev should be
> 88033f864500, but was 88033f8642c0
> Oct 22 12:03:20 beast kernel: Modules linked in: arc4 md4 nls_utf8 cifs
> dns_resolver fscache ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat
> nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack veth loop b43
> mac80211 cfg80211 ssb mmc_core kvm_intel kvm crct10dif_pclmul crc32_pclmul
> ghash_clmulni_intel serio_raw sb_edac edac_core i2c_i801 btusb btrtl btintel
> btbcm bluetooth joydev bcma rfkill cp210x tpm_infineon tpm_tis tpm
> sch_fq_codel radeon crc32c_intel ttm drm_kms_helper
> Oct 22 12:03:20 beast kernel: CPU: 5 PID: 323 Comm: kworker/u16:12 Tainted: G
>W   4.2.2 #50
> Oct 22 12:03:20 beast kernel: Hardware name: System manufacturer System
> Product Name/X79-DELUXE, BIOS 0901 06/20/2014
> Oct 22 12:03:20 beast kernel: Workqueue: btrfs-delalloc btrfs_delalloc_helper
> Oct 22 12:03:20 beast kernel:  0009 88013993fb98
> 8170b663 0006
> Oct 22 12:03:20 beast k

Re: [PATCH] Btrfs: fix regression when running delayed references

2015-10-22 Thread Stéphane Lesimple

On 2015-10-22 11:47, Filipe Manana wrote:
On Thu, Oct 22, 2015 at 10:43 AM, Filipe Manana wrote:
On Thu, Oct 22, 2015 at 10:32 AM, Qu Wenruo wrote:



 wrote on 2015/10/22 09:47 +0100:


From: Filipe Manana 

In the kernel 4.2 merge window we had a refactoring/rework of the delayed
references implementation in order to fix certain problems with qgroups.
However that rework introduced one more regression that leads to the
following trace when running delayed references for metadata:

[35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832!
[35908.065201] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
[35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic
xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache
sunrpc loop fuse parport_pc psmouse i2
[35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: G    W
4.3.0-rc5-btrfs-next-17+ #1
[35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
[btrfs]
[35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti:
88010c4c8000
[35908.065201] RIP: 0010:[]  []
insert_inline_extent_backref+0x52/0xb1 [btrfs]
[35908.065201] RSP: 0018:88010c4cbb08  EFLAGS: 00010293
[35908.065201] RAX:  RBX: 88008a661000 RCX:

[35908.065201] RDX: a04dd58f RSI: 0001 RDI:

[35908.065201] RBP: 88010c4cbb40 R08: 1000 R09:
88010c4cb9f8
[35908.065201] R10:  R11: 002c R12:

[35908.065201] R13: 88020a74c578 R14:  R15:

[35908.065201] FS:  () GS:88023edc()
knlGS:
[35908.065201] CS:  0010 DS:  ES:  CR0: 8005003b
[35908.065201] CR2: 015e8708 CR3: 000102185000 CR4:
06e0
[35908.065201] Stack:
[35908.065201]  88010c4cbb18 0f37 88020a74c578
88015a408000
[35908.065201]  880154a44000  0005
88010c4cbbd8
[35908.065201]  a0492b9a 0005 

[35908.065201] Call Trace:
[35908.065201]  [] __btrfs_inc_extent_ref+0x8b/0x208
[btrfs]
[35908.065201]  [] ?
__btrfs_run_delayed_refs+0x4d4/0xd33 [btrfs]
[35908.065201]  [] __btrfs_run_delayed_refs+0xafa/0xd33
[btrfs]
[35908.065201]  [] ? join_transaction.isra.10+0x25/0x41f
[btrfs]
[35908.065201]  [] ? join_transaction.isra.10+0xa8/0x41f
[btrfs]
[35908.065201]  [] btrfs_run_delayed_refs+0x75/0x1dd
[btrfs]
[35908.065201]  [] delayed_ref_async_start+0x3c/0x7b
[btrfs]
[35908.065201]  [] normal_work_helper+0x14c/0x32a
[btrfs]
[35908.065201]  [] btrfs_extent_refs_helper+0x12/0x14
[btrfs]
[35908.065201]  [] process_one_work+0x24a/0x4ac
[35908.065201]  [] worker_thread+0x206/0x2c2
[35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
[35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
[35908.065201]  [] kthread+0xef/0xf7
[35908.065201]  [] ? kthread_parkme+0x24/0x24
[35908.065201]  [] ret_from_fork+0x3f/0x70
[35908.065201]  [] ? kthread_parkme+0x24/0x24
[35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48
8d 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02
<0f> 0b 4c 8b 45 30 8b 4d 28 45 31
[35908.065201] RIP  []
insert_inline_extent_backref+0x52/0xb1 [btrfs]
[35908.065201]  RSP 
[35908.310885] ---[ end trace fe4299baf0666457 ]---

This happens because the new delayed references code no longer merges
delayed references that have different sequence values. The following
steps are an example sequence leading to this issue:

1) Transaction N starts, fs_info->tree_mod_seq has value 0;

2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for
bytenr A is created, with a value of 1 and a seq value of 0;

3) fs_info->tree_mod_seq is incremented to 1;

4) Extent buffer A is deleted through btrfs_del_items(), which calls
btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). The
latter returns the metadata extent associated with extent buffer A to
the free space cache (the range is not pinned), because the extent
buffer was created in the current transaction (N) and writeback never
happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set
in the extent buffer).
This creates the delayed reference Ref2 for bytenr A, with a value
of -1 and a seq value of 1;

5) Delayed reference Ref2 is not merged with Ref1 when we create it,
because they have different sequence numbers (decided at
add_delayed_ref_tail_merge());

6) fs_info->tree_mod_seq is incremented to 2;

7) Some task attempts to allocate a new extent buffer (done at
extent-tree.c:find_free_extent()), but due to heavy fragmentation
and running low on metadata space the clustered

Re: BTRFS BUG at insert_inline_extent_backref+0xe3/0xf0 while rebalancing

2015-10-22 Thread Erkki Seppala
Hello,

Thanks for the super-fast response :).

I've installed the patch and shall be waiting. The effects should be
visible within a week given daily rebalances of two filesystems.

-- 
  _
 / __// /__   __   http://www.modeemi.fi/~flux/\   \
/ /_ / // // /\ \/ /\  /
   /_/  /_/ \___/ /_/\_\@modeemi.fi  \/



Re: [PATCH] Btrfs: fix regression when running delayed references

2015-10-22 Thread Filipe Manana
On Thu, Oct 22, 2015 at 3:58 PM, Stéphane Lesimple
 wrote:
> On 2015-10-22 11:47, Filipe Manana wrote:
>>
>> On Thu, Oct 22, 2015 at 10:43 AM, Filipe Manana 
>> wrote:
>>>
>>> On Thu, Oct 22, 2015 at 10:32 AM, Qu Wenruo 
>>> wrote:



  wrote on 2015/10/22 09:47 +0100:
>
>
> From: Filipe Manana 
>
> In the kernel 4.2 merge window we had a refactoring/rework of the
> delayed
> references implementation in order to fix certain problems with
> qgroups.
> However that rework introduced one more regression that leads to the
> following trace when running delayed references for metadata:
>
> [35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832!
> [35908.065201] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic
> xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace
> fscache
> sunrpc loop fuse parport_pc psmouse i2
> [35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: G
> W
> 4.3.0-rc5-btrfs-next-17+ #1
> [35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS
> rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
> [35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
> [btrfs]
> [35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti:
> 88010c4c8000
> [35908.065201] RIP: 0010:[]  []
> insert_inline_extent_backref+0x52/0xb1 [btrfs]
> [35908.065201] RSP: 0018:88010c4cbb08  EFLAGS: 00010293
> [35908.065201] RAX:  RBX: 88008a661000 RCX:
> 
> [35908.065201] RDX: a04dd58f RSI: 0001 RDI:
> 
> [35908.065201] RBP: 88010c4cbb40 R08: 1000 R09:
> 88010c4cb9f8
> [35908.065201] R10:  R11: 002c R12:
> 
> [35908.065201] R13: 88020a74c578 R14:  R15:
> 
> [35908.065201] FS:  () GS:88023edc()
> knlGS:
> [35908.065201] CS:  0010 DS:  ES:  CR0: 8005003b
> [35908.065201] CR2: 015e8708 CR3: 000102185000 CR4:
> 06e0
> [35908.065201] Stack:
> [35908.065201]  88010c4cbb18 0f37 88020a74c578
> 88015a408000
> [35908.065201]  880154a44000  0005
> 88010c4cbbd8
> [35908.065201]  a0492b9a 0005 
> 
> [35908.065201] Call Trace:
> [35908.065201]  [] __btrfs_inc_extent_ref+0x8b/0x208
> [btrfs]
> [35908.065201]  [] ?
> __btrfs_run_delayed_refs+0x4d4/0xd33 [btrfs]
> [35908.065201]  []
> __btrfs_run_delayed_refs+0xafa/0xd33
> [btrfs]
> [35908.065201]  [] ?
> join_transaction.isra.10+0x25/0x41f
> [btrfs]
> [35908.065201]  [] ?
> join_transaction.isra.10+0xa8/0x41f
> [btrfs]
> [35908.065201]  [] btrfs_run_delayed_refs+0x75/0x1dd
> [btrfs]
> [35908.065201]  [] delayed_ref_async_start+0x3c/0x7b
> [btrfs]
> [35908.065201]  [] normal_work_helper+0x14c/0x32a
> [btrfs]
> [35908.065201]  [] btrfs_extent_refs_helper+0x12/0x14
> [btrfs]
> [35908.065201]  [] process_one_work+0x24a/0x4ac
> [35908.065201]  [] worker_thread+0x206/0x2c2
> [35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
> [35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
> [35908.065201]  [] kthread+0xef/0xf7
> [35908.065201]  [] ? kthread_parkme+0x24/0x24
> [35908.065201]  [] ret_from_fork+0x3f/0x70
> [35908.065201]  [] ? kthread_parkme+0x24/0x24
> [35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8
> 48
> 8d 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77
> 02
> <0f> 0b 4c 8b 45 30 8b 4d 28 45 31
> [35908.065201] RIP  []
> insert_inline_extent_backref+0x52/0xb1 [btrfs]
> [35908.065201]  RSP 
> [35908.310885] ---[ end trace fe4299baf0666457 ]---
>
> This happens because the new delayed references code no longer merges
> delayed references that have different sequence values. The following
> steps are an example sequence leading to this issue:
>
> 1) Transaction N starts, fs_info->tree_mod_seq has value 0;
>
> 2) Extent buffer (btree node) A is allocated, delayed reference Ref1
> for
> bytenr A is created, with a value of 1 and a seq value of 0;
>
> 3) fs_info->tree_mod_seq is incremented to 1;
>
> 4) Extent buffer A is deleted through btrfs_del_items(), which calls
> btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). The
> later returns the metadata extent associated to extent buffer A to
> the free space cache (the range is not pinned), because the

Re: [PATCH] Btrfs-progs: fix btrfs-convert rollback to check ROOT_BACKREF

2015-10-22 Thread David Sterba
On Sun, Oct 18, 2015 at 07:41:27PM +0800, Qu Wenruo wrote:
> On 2015-10-18 13:44, Liu Bo wrote:
> > Btrfs has changed to delete subvolume/snapshot asynchronously, which means 
> > that
> > after umount itself, if we've already deleted 'ext2_saved', rollback can 
> > still
> > be completed.
> >
> > So this adds a check for ROOT_BACKREF before checking ROOT_ITEM since
> > ROOT_BACKREF is immediately not in the btree after 
> > ioctl(BTRFS_IOC_SNAP_DESTROY)
> > returns.
> >
> > Signed-off-by: Liu Bo 
> Reviewed-by: Qu Wenruo 
> 
> Looks good to me.
> 
> Although the error message for ret > 0 case can be improved a little, like:
> "unable to find convert image subvolume, maybe it's already deleted?\n".

I've adjusted the error messages.

> BTW, would you please submit a test case for fstests? It won't be a hard 
> one though.

Test added.


Re: [PATCH] Btrfs: fix regression when running delayed references

2015-10-22 Thread Stéphane Lesimple

On 2015-10-22 19:03, Filipe Manana wrote:
On Thu, Oct 22, 2015 at 3:58 PM, Stéphane Lesimple wrote:
On 2015-10-22 11:47, Filipe Manana wrote:
On Thu, Oct 22, 2015 at 10:43 AM, Filipe Manana wrote:
On Thu, Oct 22, 2015 at 10:32 AM, Qu Wenruo wrote:
 wrote on 2015/10/22 09:47 +0100:

From: Filipe Manana

In the kernel 4.2 merge window we had a refactoring/rework of the delayed
references implementation in order to fix certain problems with qgroups.
However that rework introduced one more regression that leads to the
following trace when running delayed references for metadata:

[35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832!
[35908.065201] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
[35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc psmouse i2
[35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: G W 4.3.0-rc5-btrfs-next-17+ #1
[35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti: 88010c4c8000
[35908.065201] RIP: 0010:[]  [] insert_inline_extent_backref+0x52/0xb1 [btrfs]
[35908.065201] RSP: 0018:88010c4cbb08  EFLAGS: 00010293
[35908.065201] RAX:  RBX: 88008a661000 RCX:
[35908.065201] RDX: a04dd58f RSI: 0001 RDI:
[35908.065201] RBP: 88010c4cbb40 R08: 1000 R09: 88010c4cb9f8
[35908.065201] R10:  R11: 002c R12:
[35908.065201] R13: 88020a74c578 R14:  R15:
[35908.065201] FS:  () GS:88023edc() knlGS:
[35908.065201] CS:  0010 DS:  ES:  CR0: 8005003b
[35908.065201] CR2: 015e8708 CR3: 000102185000 CR4: 06e0
[35908.065201] Stack:
[35908.065201]  88010c4cbb18 0f37 88020a74c578 88015a408000
[35908.065201]  880154a44000  0005 88010c4cbbd8
[35908.065201]  a0492b9a 0005
[35908.065201] Call Trace:
[35908.065201]  [] __btrfs_inc_extent_ref+0x8b/0x208 [btrfs]
[35908.065201]  [] ? __btrfs_run_delayed_refs+0x4d4/0xd33 [btrfs]
[35908.065201]  [] __btrfs_run_delayed_refs+0xafa/0xd33 [btrfs]
[35908.065201]  [] ? join_transaction.isra.10+0x25/0x41f [btrfs]
[35908.065201]  [] ? join_transaction.isra.10+0xa8/0x41f [btrfs]
[35908.065201]  [] btrfs_run_delayed_refs+0x75/0x1dd [btrfs]
[35908.065201]  [] delayed_ref_async_start+0x3c/0x7b [btrfs]
[35908.065201]  [] normal_work_helper+0x14c/0x32a [btrfs]
[35908.065201]  [] btrfs_extent_refs_helper+0x12/0x14 [btrfs]
[35908.065201]  [] process_one_work+0x24a/0x4ac
[35908.065201]  [] worker_thread+0x206/0x2c2
[35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
[35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
[35908.065201]  [] kthread+0xef/0xf7
[35908.065201]  [] ? kthread_parkme+0x24/0x24
[35908.065201]  [] ret_from_fork+0x3f/0x70
[35908.065201]  [] ? kthread_parkme+0x24/0x24
[35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48 8d 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02 <0f> 0b 4c 8b 45 30 8b 4d 28 45 31
[35908.065201] RIP  [] insert_inline_extent_backref+0x52/0xb1 [btrfs]
[35908.065201]  RSP
[35908.310885] ---[ end trace fe4299baf0666457 ]---

This happens because the new delayed references code no longer merges
delayed references that have different sequence values. The following
steps are an example sequence leading to this issue:

1) Transaction N starts, fs_info->tree_mod_seq has value 0;

2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for
   bytenr A is created, with a value of 1 and a seq value of 0;

3) fs_info->tree_mod_seq is incremented to 1;

4) Extent buffer A is deleted through btrfs_del_items(), which calls
   btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). The
   latter returns the metadata extent associated to extent buffer A to
   the free space cache (the range is not pinned), because the extent
   buffer was created in the current transaction (N) and writeback never
   happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set
   in the extent buffer).
   This creates the delayed reference Ref2 for bytenr A, with a value
   of -1 and a seq value of 1;

5) Delayed reference Ref2 is not merged with Ref1 when we create it,
   because they have different sequence numbers (decided at
   add_delayed_ref_tail_merge());

6) fs_info->tree_mod_seq is incremented to 2;

7) Some task attempts to allocate a new extent buffer (done at
   extent-t

[PATCH] Btrfs: igrab inode in writepage

2015-10-22 Thread Josef Bacik
We hit this panic on a few of our boxes this week where we have an
ordered_extent with a NULL inode.  We do an igrab() of the inode in writepages,
but weren't doing it in writepage, which can be called directly from the VM on
dirty pages.  If the inode has been unlinked then we could have I_FREEING set,
which means igrab() would return NULL and we get this panic.  Fix this by trying
to igrab in btrfs_writepage, and if it returns NULL then just redirty the page
and return AOP_WRITEPAGE_ACTIVATE so the VM knows it wasn't successful.
Thanks,

Signed-off-by: Josef Bacik 
---
 fs/btrfs/inode.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a0fa725..4d1fdc2 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8438,15 +8438,28 @@ int btrfs_readpage(struct file *file, struct page *page)
 static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
 {
 	struct extent_io_tree *tree;
-
+	struct inode *inode = page->mapping->host;
+	int ret;
 
 	if (current->flags & PF_MEMALLOC) {
 		redirty_page_for_writepage(wbc, page);
 		unlock_page(page);
 		return 0;
 	}
+
+	/*
+	 * If we are under memory pressure we will call this directly from the
+	 * VM, we need to make sure we have the inode referenced for the ordered
+	 * extent.  If not just return like we didn't do anything.
+	 */
+	if (!igrab(inode)) {
+		redirty_page_for_writepage(wbc, page);
+		return AOP_WRITEPAGE_ACTIVATE;
+	}
 	tree = &BTRFS_I(page->mapping->host)->io_tree;
-	return extent_write_full_page(tree, page, btrfs_get_extent, wbc);
+	ret = extent_write_full_page(tree, page, btrfs_get_extent, wbc);
+	btrfs_add_delayed_iput(inode);
+	return ret;
 }
 
 static int btrfs_writepages(struct address_space *mapping,
-- 
2.1.0



Exclusive quota of snapshot exceeded despite no space used

2015-10-22 Thread Johannes Henninger
I'm having a weird problem with snapshots and exclusive quotas. After
creating a snapshot of a subvolume and setting an exclusive quota of
50MB for the snapshot, everything seems to work fine. I can write
approximately 50MB before the quota kicks in.

However, if I create a snapshot, set an exclusive quota and just wait
for some time, I suddenly cannot even create an empty file because I'm
getting a "quota exceeded" error. The time until the bug appears seems
to vary. During the waiting time, I'm changing neither the snapshot nor
the original subvolume. "qgroup show -e" reports an exclusive use of
only a few kilobytes for the snapshot, which is nowhere near the limit.

Steps to reproduce (/media/extern is a fresh and empty btrfs partition):

Enable quota and create an empty subvolume:
root@t420:/media/extern# btrfs quota enable .
root@t420:/media/extern# btrfs subvolume create sub
Create subvolume './sub'

Snapshot the subvolume and set a limit:
root@t420:/media/extern# btrfs subvolume snapshot sub snap
Create a snapshot of 'sub' in './snap'
root@t420:/media/extern# cd snap/
root@t420:/media/extern/snap# btrfs qgroup limit -e 50M .

Sometimes it takes "longer" for the quota to kick in, so I'm touching a
file every 5 minutes here:

root@t420:/media/extern/snap# for file in {1..100}; do touch $file; sleep 5m; done
touch: cannot touch ‘7’: Disk quota exceeded
^C
root@t420:/media/extern/snap# btrfs qgroup show -e .
qgroupid         rfer         excl     max_excl
--------         ----         ----     --------
0/5          16.00KiB     16.00KiB         none
0/257        16.00KiB     16.00KiB         none
0/258        16.00KiB     16.00KiB     50.00MiB

Any idea why this happens?

Thanks,
Johannes

System info:

Linux t420 4.3.0-rc5 #1 SMP Tue Oct 13 13:21:02 CEST 2015 x86_64 GNU/Linux

Label: none  uuid: 9551e3ca-1608-469c-9d8c-77b99ce0e8ec
	Total devices 1 FS bytes used 816.00KiB
	devid    1 size 931.51GiB used 2.04GiB path /dev/sdb1

btrfs-progs v4.1.2

Data, single: total=8.00MiB, used=256.00KiB
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=544.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B

[249174.151820]  sdb: sdb1
[249184.387377]  sdb: sdb1
[249184.573096]  sdb: sdb1
[249184.656274] BTRFS: device fsid 9551e3ca-1608-469c-9d8c-77b99ce0e8ec devid 1 transid 3 /dev/sdb1
[249186.323915]  sdb: sdb1
[249186.534505]  sdb: sdb1
[249186.538420]  sdb: sdb1
[249196.781978] BTRFS info (device sdb1): disk space caching is enabled
[249196.781986] BTRFS: has skinny extents
[249196.781990] BTRFS: flagging fs with big metadata feature
[249196.818164] BTRFS: creating UUID tree
[249202.311983] BTRFS info (device sdb1): qgroup scan completed (inconsistency flag cleared)



Re: [PATCH] Btrfs: fix regression when running delayed references

2015-10-22 Thread Stéphane Lesimple

[ ... thread cleanup ... ]
Don't hesitate to ask if you need me to debug or even ftrace 
something.


Thanks Stéphane. I haven't seen that crash yet (still running tests
for 2 consecutive days now).
Can you please try the following patch, which works on top of mine,
and enable ftrace before running balance:

Debug patch:  https://friendpaste.com/5s3dItRpcpq3dH1E4KUJor

Enable ftrace:

$ echo > /sys/kernel/debug/tracing/trace
$ echo "nop" > /sys/kernel/debug/tracing/current_tracer
$ echo 10 > /sys/kernel/debug/tracing/buffer_size_kb   # if
you can use larger buffer size, even better
$ echo > /sys/kernel/debug/tracing/set_ftrace_filter
$ echo 1 > /sys/kernel/debug/tracing/tracing_on

$ run balance... wait until it finishes with IO error or the
patch's printk message shows up in dmesg/syslog

$ echo 0 > /sys/kernel/debug/tracing/tracing_on

$ cat /sys/kernel/debug/tracing/trace > some_file.txt

Then send us some_file.txt for debugging, hopefully it will give some
useful information. Note that it might produce tons of messages,
depending on how long it takes for you to hit the BUG_ON.
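
The ftrace steps above could be wrapped in two small shell helpers. This is only a sketch: the names trace_start, trace_stop and the TRACING_DIR override are illustrative and do not come from the thread, and the default buffer size is simply the value quoted above.

```shell
#!/bin/sh
# Sketch only: helper names and the TRACING_DIR override are assumptions.
# The default path is the tracing directory used in the commands above.
: "${TRACING_DIR:=/sys/kernel/debug/tracing}"

# trace_start [buffer_size_kb]: clear the buffer and switch tracing on.
trace_start() {
	echo > "$TRACING_DIR/trace"
	echo nop > "$TRACING_DIR/current_tracer"
	echo "${1:-10}" > "$TRACING_DIR/buffer_size_kb"
	echo > "$TRACING_DIR/set_ftrace_filter"
	echo 1 > "$TRACING_DIR/tracing_on"
}

# trace_stop <output-file>: switch tracing off and save the buffer.
trace_stop() {
	echo 0 > "$TRACING_DIR/tracing_on"
	cat "$TRACING_DIR/trace" > "$1"
}
```

With these, one would call trace_start (as root) before starting the balance and trace_stop some_file.txt once the IO error or the printk message shows up.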

Thanks a lot for this.


I'm compiling it now (using your v2 of the friendpaste diff).

I took the liberty to add a tracing_off() right before the return -EIO
so that the trace tail ends exactly at the right place.

Last time I tried to use ftrace to diagnose the bug we're trying to
fix, the system crashed so hard that it was complicated to get
the trace contents written somewhere before the system became unusable.
But I'll eventually work around it by using
/sys/kernel/debug/tracing/trace_pipe to send the trace live to another
machine over the LAN.
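
A minimal sketch of that trace_pipe idea, assuming a netcat-style tool is available. The names stream_trace and TRACE_SENDER, and the host and port arguments, are hypothetical and not from the thread.

```shell
#!/bin/sh
# Sketch only: stream_trace, TRACE_SENDER and the host/port arguments are
# illustrative names, not something used in the thread.
: "${TRACING_DIR:=/sys/kernel/debug/tracing}"

# stream_trace <host> <port>: forward trace_pipe to a remote listener as
# entries are produced, so they survive a crash of the traced machine.
# TRACE_SENDER defaults to nc and must accept "<host> <port>" arguments.
stream_trace() {
	${TRACE_SENDER:-nc} "$1" "$2" < "$TRACING_DIR/trace_pipe"
}
```

The receiving machine would need a matching listener that writes to a file; the exact listen flags differ between netcat variants, so they are left out here.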

This series of bugs are so easy to trigger on my system that we'll
hopefully get something useful out of the trace. I guess that's a good
thing !


So, this time it took a little over an hour to get the crash, but it did 
reach the -EIO condition eventually.

The ftrace log (2M gzipped) is available here :
http://www.speed47.net/tmp2/btrfs-4.3rc6p7463161-ftrace1.log.gz

The associated kernel log is as follows :

[ 2880.178589] INFO: task btrfs-transacti:7358 blocked for more than 120 seconds.
[ 2880.178600]   Not tainted 4.3.0-rc6p7463161+ #3
[ 2880.178603] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3088.829429] Out of memory: Kill process 9449 (df-complex2simp) score 246 or sacrifice child
[ 3088.829435] Killed process 9449 (df-complex2simp) total-vm:964732kB, anon-rss:943764kB, file-rss:0kB
[ 3600.197642] INFO: task btrfs-transacti:7358 blocked for more than 120 seconds.
[ 3600.197657]   Not tainted 4.3.0-rc6p7463161+ #3
[ 3600.197660] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3840.204146] INFO: task btrfs-transacti:7358 blocked for more than 120 seconds.
[ 3840.204180]   Not tainted 4.3.0-rc6p7463161+ #3
[ 3840.204219] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3993.671982] Out of memory: Kill process 11357 (df-complex2simp) score 227 or sacrifice child
[ 3993.671989] Killed process 11357 (df-complex2simp) total-vm:891608kB, anon-rss:870704kB, file-rss:60kB
[ 4080.210324] INFO: task btrfs-transacti:7358 blocked for more than 120 seconds.
[ 4080.210336]   Not tainted 4.3.0-rc6p7463161+ #3
[ 4080.210339] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4320.215635] INFO: task btrfs-transacti:7358 blocked for more than 120 seconds.
[ 4320.215662]   Not tainted 4.3.0-rc6p7463161+ #3
[ 4320.215667] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4560.221119] INFO: task btrfs-transacti:7358 blocked for more than 120 seconds.
[ 4560.221146]   Not tainted 4.3.0-rc6p7463161+ #3
[ 4560.221148] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4800.226884] INFO: task btrfs-transacti:7358 blocked for more than 120 seconds.
[ 4800.226898]   Not tainted 4.3.0-rc6p7463161+ #3
[ 4800.226902] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4890.116131] Out of memory: Kill process 13377 (df-complex2simp) score 207 or sacrifice child
[ 4890.116138] Killed process 13377 (df-complex2simp) total-vm:834976kB, anon-rss:793272kB, file-rss:48kB
[ 5785.793580] Out of memory: Kill process 15285 (df-complex2simp) score 201 or sacrifice child
[ 5785.793586] Killed process 15285 (df-complex2simp) total-vm:802208kB, anon-rss:772172kB, file-rss:4kB
[ 6480.269728] INFO: task btrfs-transacti:7358 blocked for more than 120 seconds.
[ 6480.269738]   Not tainted 4.3.0-rc6p7463161+ #3
[ 6480.269740] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7081.967354] BTRFS: here, ref_mod != 1, bytenr 12090260504576, ref_mod 2, seq 0 action 1
[ 7081.967784] BTRFS: error (device dm-3) in btrfs_run_delayed_refs:2872: errno=-5 IO failure


The OOM conditions are unrelated, this is an rrdtool cr

Re: [PATCH] Btrfs: fix regression when running delayed references

2015-10-22 Thread Filipe Manana
On Thu, Oct 22, 2015 at 11:38 PM, Stéphane Lesimple wrote:
>>>> [ ... thread cleanup ... ]
>>>>
>>>> Don't hesitate to ask if you need me to debug or even ftrace something.
>>>
>>>
>>> Thanks Stéphane. I haven't seen that crash yet (still running tests
>>> for 2 consecutive days now).
>>> Can you please try the following patch, which works on top of mine,
>>> and enable ftrace before running balance:
>>>
>>> Debug patch:  https://friendpaste.com/5s3dItRpcpq3dH1E4KUJor
>>>
>>> Enable ftrace:
>>>
>>> $ echo > /sys/kernel/debug/tracing/trace
>>> $ echo "nop" > /sys/kernel/debug/tracing/current_tracer
>>> $ echo 10 > /sys/kernel/debug/tracing/buffer_size_kb   # if
>>> you can use larger buffer size, even better
>>> $ echo > /sys/kernel/debug/tracing/set_ftrace_filter
>>> $ echo 1 > /sys/kernel/debug/tracing/tracing_on
>>>
>>> $ run balance... wait until it finishes with IO error or the
>>> patch's printk message shows up in dmesg/syslog
>>>
>>> $ echo 0 > /sys/kernel/debug/tracing/tracing_on
>>>
>>> $ cat /sys/kernel/debug/tracing/trace > some_file.txt
>>>
>>> Then send us some_file.txt for debugging, hopefully it will give some
>>> useful information. Note that it might produce tons of messages,
>>> depending on how long it takes for you to hit the BUG_ON.
>>>
>>> Thanks a lot for this.
>>
>>
>> I'm compiling it now (using your v2 of the friendpaste diff).
>>
>> I took the liberty to add a tracing_off() right before the return -EIO
>> so that the trace tail ends exactly at the right place.
>>
>> Last time I tried to use ftrace to diagnose the bug we're trying to
>> fix, the system crashed so hard that it was complicated to get
>> the trace contents written somewhere before the system became unusable.
>> But I'll eventually work around it by using
>> /sys/kernel/debug/tracing/trace_pipe to send the trace live to another
>> machine over the LAN.
>>
>> This series of bugs are so easy to trigger on my system that we'll
>> hopefully get something useful out of the trace. I guess that's a good
>> thing !
>
>
> So, this time it took a little over an hour to get the crash, but it did
> reach the -EIO condition eventually.
> The ftrace log (2M gzipped) is available here :
> http://www.speed47.net/tmp2/btrfs-4.3rc6p7463161-ftrace1.log.gz
>
> The associated kernel log is as follows :
>
> [ 2880.178589] INFO: task btrfs-transacti:7358 blocked for more than 120
> seconds.
> [ 2880.178600]   Not tainted 4.3.0-rc6p7463161+ #3
> [ 2880.178603] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 3088.829429] Out of memory: Kill process 9449 (df-complex2simp) score 246
> or sacrifice child
> [ 3088.829435] Killed process 9449 (df-complex2simp) total-vm:964732kB,
> anon-rss:943764kB, file-rss:0kB
> [ 3600.197642] INFO: task btrfs-transacti:7358 blocked for more than 120
> seconds.
> [ 3600.197657]   Not tainted 4.3.0-rc6p7463161+ #3
> [ 3600.197660] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 3840.204146] INFO: task btrfs-transacti:7358 blocked for more than 120
> seconds.
> [ 3840.204180]   Not tainted 4.3.0-rc6p7463161+ #3
> [ 3840.204219] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 3993.671982] Out of memory: Kill process 11357 (df-complex2simp) score 227
> or sacrifice child
> [ 3993.671989] Killed process 11357 (df-complex2simp) total-vm:891608kB,
> anon-rss:870704kB, file-rss:60kB
> [ 4080.210324] INFO: task btrfs-transacti:7358 blocked for more than 120
> seconds.
> [ 4080.210336]   Not tainted 4.3.0-rc6p7463161+ #3
> [ 4080.210339] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 4320.215635] INFO: task btrfs-transacti:7358 blocked for more than 120
> seconds.
> [ 4320.215662]   Not tainted 4.3.0-rc6p7463161+ #3
> [ 4320.215667] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 4560.221119] INFO: task btrfs-transacti:7358 blocked for more than 120
> seconds.
> [ 4560.221146]   Not tainted 4.3.0-rc6p7463161+ #3
> [ 4560.221148] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 4800.226884] INFO: task btrfs-transacti:7358 blocked for more than 120
> seconds.
> [ 4800.226898]   Not tainted 4.3.0-rc6p7463161+ #3
> [ 4800.226902] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 4890.116131] Out of memory: Kill process 13377 (df-complex2simp) score 207
> or sacrifice child
> [ 4890.116138] Killed process 13377 (df-complex2simp) total-vm:834976kB,
> anon-rss:793272kB, file-rss:48kB
> [ 5785.793580] Out of memory: Kill process 15285 (df-complex2simp) score 201
> or sacrifice child
> [ 5785.793586] Killed process 15285 (df-complex2simp) total-vm:802208kB,
> anon-rss:772172kB, file-rss:4kB
> [ 6480.269728] INFO: task btrfs-transacti:7358 blocked for more than 120
> seconds.
> [ 6480.269738]   Not tainted 4.3.0-rc6p7463161+ #3
> [ 6480.269740]

Crash during mount -o degraded, kernel BUG at fs/btrfs/extent_io.c:2044

2015-10-22 Thread Erik Berg

Hi again,

So I intentionally broke this small raid6 fs on a VM to learn recovery 
strategies for another much bigger raid6 I have running (which also 
suffered a drive failure).


Basically I zeroed out one of the drives (vdd) from under the running 
vm. Then ran an md5sum on a file on the fs to trigger some detection of 
data inconsistency. I ran a scrub, which completed "ok". Then rebooted.


Now trying to mount the filesystem in degraded mode leads to a kernel crash.

I'm using kernel 4.3-rc6 and btrfs-progs 4.2.3

Linux ubuntu 4.3.0-040300rc6-generic #201510182030 SMP Mon Oct 19 00:31:41 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Label: none  uuid: aee28657-3ce0-4efc-9cd3-cc7c58782af3
	Total devices 1 FS bytes used 1.87GiB
	devid    1 size 9.52GiB used 2.89GiB path /dev/vda2

warning devid 3 not found already
Label: 'boxofkittens'  uuid: 4957afbe-e2cb-410c-8d45-3850840898f2
	Total devices 9 FS bytes used 3.56GiB
	devid    1 size 1022.00MiB used 716.19MiB path /dev/vdb1
	devid    2 size 1022.00MiB used 716.19MiB path /dev/vdc1
	devid    4 size 1022.00MiB used 716.19MiB path /dev/vde1
	devid    5 size 1022.00MiB used 716.19MiB path /dev/vdf1
	devid    6 size 1022.00MiB used 716.19MiB path /dev/vdg1
	devid    7 size 2.00GiB used 1.70GiB path /dev/vdh1
	devid    8 size 3.00GiB used 1.70GiB path /dev/vdi1
	devid    9 size 3.00GiB used 1.70GiB path /dev/vdj1
	*** Some devices missing

btrfs-progs v4.2.3

mount -o degraded /dev/vdb1 /mnt/boxofkittens

[   36.426731] ------------[ cut here ]------------
[   36.427547] kernel BUG at /home/kernel/COD/linux/fs/btrfs/extent_io.c:2044!
[   36.428686] invalid opcode:  [#1] SMP
[   36.429438] Modules linked in: snd_hda_codec_generic iosf_mbi 
crct10dif_pclmul crc32_pclmul ppdev aesni_intel aes_x86_64 lrw gf128mul 
glue_helper ablk_helper cryptd input_leds joydev snd_hda_intel serio_raw 
snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore 
i2c_piix4 parport_pc parport 8250_fintek mac_hid autofs4 btrfs xor 
raid6_pq cirrus ttm psmouse drm_kms_helper syscopyarea sysfillrect 
sysimgblt fb_sys_fops drm floppy pata_acpi
[   36.436782] CPU: 0 PID: 86 Comm: kworker/u2:2 Not tainted 
4.3.0-040300rc6-generic #201510182030
[   36.438138] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.8.1-20150318_183358- 04/01/2014

[   36.439648] Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
[   36.440617] task: 880035b4e200 ti: 880035564000 task.ti: 
880035564000
[   36.441778] RIP: 0010:[]  [] 
repair_io_failure+0x1a9/0x1f0 [btrfs]

[   36.443287] RSP: 0018:880035567c20  EFLAGS: 00010246
[   36.444128] RAX: 88003c7ad000 RBX: 8800363dc7d0 RCX: 

[   36.445227] RDX: 1000 RSI: 00027000 RDI: 
8800388ce100
[   36.446315] RBP: 880035567c78 R08: eaddb640 R09: 

[   36.447397] R10: 8800363dc980 R11: 88003bd49b00 R12: 
00027000
[   36.448479] R13: 8800388ce000 R14: 8800363dc980 R15: 
8800363dc838
[   36.449553] FS:  () GS:88003fc0() 
knlGS:

[   36.450766] CS:  0010 DS:  ES:  CR0: 80050033
[   36.451641] CR2: 02015008 CR3: 3c1be000 CR4: 
000406f0

[   36.452709] Stack:
[   36.453026]  00027000 35567c48 eaddb640 
0002b1047000
[   36.454211]    8800363dc7d0 
00027000
[   36.455513]  8800388ce000 8800363dc980 8800363dc838 
880035567ce8

[   36.456663] Call Trace:
[   36.457043]  [] clean_io_failure+0x18d/0x1a0 [btrfs]
[   36.458002]  [] end_bio_extent_readpage+0x30a/0x560 
[btrfs]
[   36.459662]  [] ? btrfs_create_repair_bio+0xe0/0xe0 
[btrfs]

[   36.460715]  [] bio_endio+0x40/0x60
[   36.461459]  [] end_workqueue_fn+0x3c/0x40 [btrfs]
[   36.462387]  [] normal_work_helper+0xc0/0x270 [btrfs]
[   36.463360]  [] btrfs_endio_helper+0x12/0x20 [btrfs]
[   36.464314]  [] process_one_work+0x14e/0x3d0
[   36.465158]  [] worker_thread+0x11a/0x470
[   36.466264]  [] ? rescuer_thread+0x310/0x310
[   36.467154]  [] kthread+0xc9/0xe0
[   36.467863]  [] ? kthread_park+0x60/0x60
[   36.468791]  [] ret_from_fork+0x3f/0x70
[   36.470022]  [] ? kthread_park+0x60/0x60
[   36.471334] Code: fe ff ff 48 89 df 41 bf fb ff ff ff e8 21 70 20 c1 
31 f6 4c 89 ef e8 07 eb 00 00 e9 d1 fe ff ff 41 bf fb ff ff ff e9 c6 fe 
ff ff <0f> 0b 0f 0b 49 8b 4d 30 49 8b b6 58 fe ff ff 48 83 c1 10 48 85
[   36.475278] RIP  [] repair_io_failure+0x1a9/0x1f0 
[btrfs]

[   36.476256]  RSP 
[   36.476783] ---[ end trace a06ea60748bbedae ]---
[   36.481369] BUG: unable to handle kernel paging request at 
ffd8

[   36.484441] IP: [] kthread_data+0x10/0x20
[   36.486710] PGD 1c13067 PUD 1c15067 PMD 0
[   36.488690] Oops:  [#2] SMP
[   36.490516] Modules linked in: snd_hda_codec_generic iosf_mbi 
crct10dif_pclmul crc32_p

Re: Exclusive quota of snapshot exceeded despite no space used

2015-10-22 Thread Qu Wenruo



On 2015-10-23 04:38, Johannes Henninger wrote:

I'm having a weird problem with snapshots and exclusive quotas. After
creating a snapshot of a subvolume and setting an exclusive quota of
50MB for the snapshot, everything seems to work fine. I can write
approximately 50MB before the quota kicks in.

However, if I create a snapshot, set an exclusive quota and just wait
for some time, I suddenly cannot even create an empty file because I'm
getting a "quota exceeded" error. The time until the bug appears seems
to vary. During the waiting time, I'm changing neither the snapshot nor
the original subvolume. "qgroup show -e" reports an exclusive use of
only a few kilobytes for the snapshot, which is nowhere near the limit.

Steps to reproduce (/media/extern is a fresh and empty btrfs partition):

Enable quota and create an empty subvolume:
 root@t420:/media/extern# btrfs quota enable .
 root@t420:/media/extern# btrfs subvolume create sub
 Create subvolume './sub'

Snapshot the subvolume and set a limit:
 root@t420:/media/extern# btrfs subvolume snapshot sub snap
 Create a snapshot of 'sub' in './snap'
 root@t420:/media/extern# cd snap/
 root@t420:/media/extern/snap# btrfs qgroup limit -e 50M .

Sometimes it takes "longer" for the quota to kick in, so I'm touching a
file every 5 minutes here:

 root@t420:/media/extern/snap# for file in {1..100}; do touch $file; sleep 5m; done
 touch: cannot touch ‘7’: Disk quota exceeded
 ^C
 root@t420:/media/extern/snap# btrfs qgroup show -e .
 qgroupid         rfer         excl     max_excl
 --------         ----         ----     --------
 0/5          16.00KiB     16.00KiB         none
 0/257        16.00KiB     16.00KiB         none
 0/258        16.00KiB     16.00KiB     50.00MiB

Any idea why this happens?

BTW, to get up-to-date numbers from btrfs qgroup show, it's better to
run sync before qgroup show.

This is a known bug: even after the qgroup accounting rework, qgroup
reservation is still buggy and can cause the reserved space to
underflow, which leads to this kind of problem.

In that case btrfs qgroup show won't help, because the reserved space
is not shown in its output.

One workaround is to unmount the filesystem and mount it again.
That resets the underflowed reserved space and works for some time.
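
The two suggestions above (sync before qgroup show, and unmount then mount again) might look like this as shell helpers. This is only a sketch: the function names and their arguments are illustrative, and the remount is just the temporary workaround described above, not a fix.

```shell
#!/bin/sh
# Sketch only: function names and arguments are illustrative.

# qgroup_show_synced <path>: flush pending writes, then print qgroup usage.
qgroup_show_synced() {
	sync
	btrfs qgroup show -e "$1"
}

# remount_fs <device> <mountpoint>: unmount and mount again, which per the
# explanation above resets the underflowed reserved space for a while.
remount_fs() {
	umount "$2" && mount "$1" "$2"
}
```

For the report in this thread that would be something like qgroup_show_synced /media/extern/snap, and remount_fs /dev/sdb1 /media/extern run as root from outside the mount point.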

If it's OK for you to recompile the kernel, you can try the following 
patchset:

[PATCH v3 00/21] Rework btrfs qgroup reserved space framework

Which should solve the problem.

Thanks,
Qu



Thanks,
Johannes

System info:

 Linux t420 4.3.0-rc5 #1 SMP Tue Oct 13 13:21:02 CEST 2015 x86_64
GNU/Linux

 Label: none  uuid: 9551e3ca-1608-469c-9d8c-77b99ce0e8ec
 Total devices 1 FS bytes used 816.00KiB
 devid1 size 931.51GiB used 2.04GiB path /dev/sdb1

 btrfs-progs v4.1.2

 Data, single: total=8.00MiB, used=256.00KiB
 System, DUP: total=8.00MiB, used=16.00KiB
 System, single: total=4.00MiB, used=0.00B
 Metadata, DUP: total=1.00GiB, used=544.00KiB
 Metadata, single: total=8.00MiB, used=0.00B
 GlobalReserve, single: total=16.00MiB, used=0.00B

 [249174.151820]  sdb: sdb1
 [249184.387377]  sdb: sdb1
 [249184.573096]  sdb: sdb1
 [249184.656274] BTRFS: device fsid
9551e3ca-1608-469c-9d8c-77b99ce0e8ec devid 1 transid 3 /dev/sdb1
 [249186.323915]  sdb: sdb1
 [249186.534505]  sdb: sdb1
 [249186.538420]  sdb: sdb1
 [249196.781978] BTRFS info (device sdb1): disk space caching is enabled
 [249196.781986] BTRFS: has skinny extents
 [249196.781990] BTRFS: flagging fs with big metadata feature
 [249196.818164] BTRFS: creating UUID tree
 [249202.311983] BTRFS info (device sdb1): qgroup scan completed
(inconsistency flag cleared)



Re: [RFC PATCH] btrfs/ioctl.c: Prefer inode with lowest offset as source for clone

2015-10-22 Thread Zygo Blaxell
On Tue, Oct 20, 2015 at 04:29:46PM +0300, Timofey Titovets wrote:
> For performance reasons, keeping data at the start of the disk is
> preferable while deduping.
> This makes sense for two reasons:
> 1. Spinning rust - the start of the disk is much faster
> 2. Btrfs can deallocate empty data chunks from the end of the fs - i.e. it
> compacts the fs

"src" is the extent that is kept, and "dst" is the extent that is
discarded.  When both extents are shared, the dedup userspace has to
pass a common "src" with many different "dst" over several extent-same
calls in order to get rid of all of the references to the "dst" extent.

If "src" and "dst" are arbitrarily swapped over multiple extent-same calls
then it becomes impossible to dedup shared extents.  Heck, if there are
more than two extents even in one extent-same call then it stops working.

It would be possible to have dedup figure out which extent the kernel
picked after the fact, but that's totally unnecessary extra work in
cases where the userspace has a good reason to pick the extents it did
(e.g. administrator hints about future usage of the files where the
extents were found).

Dedup userspace can figure out the physical addresses of the extents
and rearrange the arguments itself if desired.
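
A rough userspace sketch of that last point, not taken from any existing dedup tool: given a set of duplicate extents whose physical addresses are already known (e.g. from FIEMAP), pick the lowest one as the common "src" and treat the rest as "dst". The struct and function names here are hypothetical and only illustrate the ordering step; the actual extent-same calls are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one duplicate extent found by a dedup tool. */
struct dup_extent {
	int fd;            /* open file that contains the extent */
	uint64_t logical;  /* offset of the extent within that file */
	uint64_t physical; /* on-disk address, e.g. as reported by FIEMAP */
};

/*
 * Return the index of the extent with the lowest physical address.
 * Callers must pass n >= 1.
 */
size_t pick_src(const struct dup_extent *ext, size_t n)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++)
		if (ext[i].physical < ext[best].physical)
			best = i;
	return best;
}

/*
 * Swap the chosen extent into slot 0, so that ext[0] is the single
 * "src" and ext[1..n-1] are the "dst" entries for every later call.
 */
void order_for_dedup(struct dup_extent *ext, size_t n)
{
	size_t s;
	struct dup_extent tmp;

	if (n < 2)
		return;
	s = pick_src(ext, n);
	tmp = ext[0];
	ext[0] = ext[s];
	ext[s] = tmp;
}
```

Because the choice is made once in userspace and "src" stays fixed across all calls, every "dst" reference ends up pointing at the same extent, which is the property the kernel-side swap would break.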

> Signed-off-by: Timofey Titovets 
> ---
>  fs/btrfs/ioctl.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 3e3e613..3eb77c0 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3074,8 +3074,13 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
> 
>   /* pass original length for comparison so we stay within i_size */
>   ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
> - if (ret == 0)
> - ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> + if (ret == 0) {
> + /* prefer inode with lowest offset as source for clone*/
> + if (loff > dest_loff)
> + ret = btrfs_clone(dst, src, dst_loff, olen, len, loff, 1);
> + else
> + ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> + }
> 
>   if (same_inode)
>   unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
> -- 
> 2.6.1

> From 5ed3822bc308c726d91a837fbd97ebacaa51e58d Mon Sep 17 00:00:00 2001
> From: Timofey Titovets 
> Date: Tue, 20 Oct 2015 15:53:20 +0300
> Subject: [RFC PATCH] btrfs/ioctl.c: Prefer inode with lowest offset as source for clone
> 
> For performance reason, leave data at the start of disk, is preferable
> 
> Signed-off-by: Timofey Titovets 
> ---
>  fs/btrfs/ioctl.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 3e3e613..3eb77c0 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3074,8 +3074,13 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
>  
>   /* pass original length for comparison so we stay within i_size */
>   ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
> - if (ret == 0)
> - ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> + if (ret == 0) {
> + /* prefer inode with lowest offset as source for clone*/
> + if (loff > dest_loff)
> + ret = btrfs_clone(dst, src, dst_loff, olen, len, loff, 1);
> + else
> + ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> + }
>  
>   if (same_inode)
>   unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
> -- 
> 2.6.1
> 





[PATCH v4] btrfs: qgroup: Don't copy extent buffer to do qgroup rescan

2015-10-22 Thread Qu Wenruo
Ancient qgroup code calls memcpy() on an extent buffer and uses the copy for
leaf iteration.

As an extent buffer contains locks and pointers to pages, it is never sane to
make such a copy.

The following bug may be caused by this insane operation:
[92098.841309] general protection fault:  [#1] SMP
[92098.841338] Modules linked in: ...
[92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted 4.3.0-rc1 #1
[92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper [btrfs]
[92098.842261] Call Trace:
[92098.842277]  [] ? read_extent_buffer+0xb8/0x110 [btrfs]
[92098.842304]  [] ? btrfs_find_all_roots+0x60/0x70 [btrfs]
[92098.842329]  [] btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]

Here btrfs_qgroup_rescan_worker+0x28d is btrfs_disk_key_to_cpu(), called
while reading a key from the copied extent_buffer.

This patch uses btrfs_clone_extent_buffer() to make a proper copy of the
extent buffer and so avoid this case.
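
For readers unfamiliar with the problem, here is a simplified userspace sketch of why a plain memcpy() of such a structure is unsafe and what a clone does instead. The toy_buffer type and its helpers are hypothetical stand-ins, not the kernel's struct extent_buffer or btrfs_clone_extent_buffer(); they only illustrate the shallow-versus-deep copy distinction.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much simplified stand-in for an extent buffer. */
struct toy_buffer {
	int refs;            /* reference count owned by this object */
	size_t len;          /* number of valid bytes in data */
	unsigned char *data; /* separately allocated backing storage */
};

/* Allocate a zero-filled buffer holding one reference. */
struct toy_buffer *toy_alloc(size_t len)
{
	struct toy_buffer *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	/* Allocate at least one byte so a NULL return always means failure. */
	b->data = calloc(1, len ? len : 1);
	if (!b->data) {
		free(b);
		return NULL;
	}
	b->refs = 1;
	b->len = len;
	return b;
}

/* Deep copy: the clone gets its own storage and its own refcount. */
struct toy_buffer *toy_clone(const struct toy_buffer *src)
{
	struct toy_buffer *b = toy_alloc(src->len);

	if (!b)
		return NULL;
	memcpy(b->data, src->data, src->len);
	return b;
}

/* Release a buffer and the storage it owns. */
void toy_free(struct toy_buffer *b)
{
	if (!b)
		return;
	free(b->data);
	free(b);
}
```

A memcpy() of the struct itself would duplicate only the data pointer, so the copy keeps pointing at storage the original may change or free; the clone stays valid after the original is released.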

Reported-by: Stephane Lesimple 
Suggested-by: Filipe Manana 
Signed-off-by: Qu Wenruo 
---
v2:
  Follow the parameter change in previous patch.
v3:
  None
v4:
  Use btrfs_clone_extent_buffer() rather than introducing new facilities
---
 fs/btrfs/qgroup.c | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 158633c..5534629 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2192,10 +2192,10 @@ void assert_qgroups_uptodate(struct btrfs_trans_handle *trans)
  */
 static int
 qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
-  struct btrfs_trans_handle *trans,
-  struct extent_buffer *scratch_leaf)
+  struct btrfs_trans_handle *trans)
 {
struct btrfs_key found;
+   struct extent_buffer *scratch_leaf = NULL;
struct ulist *roots = NULL;
struct seq_list tree_mod_seq_elem = SEQ_LIST_INIT(tree_mod_seq_elem);
u64 num_bytes;
@@ -2233,9 +2233,17 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
fs_info->qgroup_rescan_progress.objectid = found.objectid + 1;
 
btrfs_get_tree_mod_seq(fs_info, &tree_mod_seq_elem);
-   memcpy(scratch_leaf, path->nodes[0], sizeof(*scratch_leaf));
-   slot = path->slots[0];
+   scratch_leaf = btrfs_clone_extent_buffer(path->nodes[0]);
+   if (!scratch_leaf) {
+   ret = -ENOMEM;
+   mutex_unlock(&fs_info->qgroup_rescan_lock);
+   goto out;
+   }
+   extent_buffer_get(scratch_leaf);
+   btrfs_tree_read_lock(scratch_leaf);
+   btrfs_set_lock_blocking_rw(scratch_leaf, BTRFS_READ_LOCK);
btrfs_release_path(path);
+   slot = path->slots[0];
mutex_unlock(&fs_info->qgroup_rescan_lock);
 
for (; slot < btrfs_header_nritems(scratch_leaf); ++slot) {
@@ -2259,6 +2267,10 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
goto out;
}
 out:
+   if (scratch_leaf) {
+   btrfs_tree_read_unlock_blocking(scratch_leaf);
+   free_extent_buffer(scratch_leaf);
+   }
btrfs_put_tree_mod_seq(fs_info, &tree_mod_seq_elem);
 
return ret;
@@ -2270,16 +2282,12 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
 qgroup_rescan_work);
struct btrfs_path *path;
struct btrfs_trans_handle *trans = NULL;
-   struct extent_buffer *scratch_leaf = NULL;
int err = -ENOMEM;
int ret = 0;
 
path = btrfs_alloc_path();
if (!path)
goto out;
-   scratch_leaf = kmalloc(sizeof(*scratch_leaf), GFP_NOFS);
-   if (!scratch_leaf)
-   goto out;
 
err = 0;
while (!err) {
@@ -2291,8 +2299,7 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
if (!fs_info->quota_enabled) {
err = -EINTR;
} else {
-   err = qgroup_rescan_leaf(fs_info, path, trans,
-scratch_leaf);
+   err = qgroup_rescan_leaf(fs_info, path, trans);
}
if (err > 0)
btrfs_commit_transaction(trans, fs_info->fs_root);
@@ -2301,7 +2308,6 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
}
 
 out:
-   kfree(scratch_leaf);
btrfs_free_path(path);
 
mutex_lock(&fs_info->qgroup_rescan_lock);
-- 
2.6.2



Re: [PATCH] Btrfs: igrab inode in writepage

2015-10-22 Thread Liu Bo

On 10/23/2015 03:05 AM, Josef Bacik wrote:

We hit this panic on a few of our boxes this week where we have an
ordered_extent with a NULL inode.  We do an igrab() of the inode in writepages,
but weren't doing it in writepage, which can be called directly from the VM on
dirty pages.  If the inode has been unlinked then we could have I_FREEING set,
which means igrab() would return NULL and we get this panic.  Fix this by trying
to igrab in btrfs_writepage, and if it returns NULL then just redirty the page
and return AOP_WRITEPAGE_ACTIVATE so the VM knows it wasn't successful.
Thanks,
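
As an illustration of the pattern described above, here is a minimal userspace sketch: try to take a reference, and if the object is already being torn down, mark the page dirty again and report "not written". Every name here (toy_inode, toy_page, toy_grab, TOY_WRITEPAGE_ACTIVATE) is a hypothetical stand-in; this is not the kernel's igrab() or the real writepage path, only the control flow.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "try again later" return code for this sketch. */
#define TOY_WRITEPAGE_ACTIVATE 1

struct toy_inode {
	int refs;     /* number of references currently held */
	bool freeing; /* analogous to I_FREEING being set */
};

struct toy_page {
	struct toy_inode *host; /* inode that owns this page */
	bool dirty;             /* still needs to be written out */
};

/* Take a reference unless teardown has started; NULL means failure. */
struct toy_inode *toy_grab(struct toy_inode *inode)
{
	if (inode->freeing)
		return NULL;
	inode->refs++;
	return inode;
}

/* Drop a reference taken with toy_grab(). */
void toy_put(struct toy_inode *inode)
{
	inode->refs--;
}

/* Write one page, holding an inode reference for the duration. */
int toy_writepage(struct toy_page *page)
{
	struct toy_inode *inode = toy_grab(page->host);

	if (!inode) {
		/* Could not pin the inode: leave the page dirty and bail. */
		page->dirty = true;
		return TOY_WRITEPAGE_ACTIVATE;
	}
	page->dirty = false; /* stand-in for the actual writeback */
	toy_put(inode);
	return 0;
}
```

The point of the sketch is only that the reference is taken before any work that depends on the inode, and that failure to take it is reported to the caller rather than ignored.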



Reviewed-by: Liu Bo 

thanks,

-Liubo


Signed-off-by: Josef Bacik 
---
  fs/btrfs/inode.c | 17 +++--
  1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a0fa725..4d1fdc2 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8438,15 +8438,28 @@ int btrfs_readpage(struct file *file, struct page *page)
  static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
  {
struct extent_io_tree *tree;
-
+   struct inode *inode = page->mapping->host;
+   int ret;

if (current->flags & PF_MEMALLOC) {
redirty_page_for_writepage(wbc, page);
unlock_page(page);
return 0;
}
+
+   /*
+* If we are under memory pressure we will call this directly from the
+* VM, we need to make sure we have the inode referenced for the ordered
+* extent.  If not just return like we didn't do anything.
+*/
+   if (!igrab(inode)) {
+   redirty_page_for_writepage(wbc, page);
+   return AOP_WRITEPAGE_ACTIVATE;
+   }
tree = &BTRFS_I(page->mapping->host)->io_tree;
-   return extent_write_full_page(tree, page, btrfs_get_extent, wbc);
+   ret = extent_write_full_page(tree, page, btrfs_get_extent, wbc);
+   btrfs_add_delayed_iput(inode);
+   return ret;
  }

  static int btrfs_writepages(struct address_space *mapping,


