Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-23 Thread Qu Wenruo



On 2015-09-23 17:40, Stéphane Lesimple wrote:

On 2015-09-23 09:03, Qu Wenruo wrote:

Stéphane Lesimple wrote on 2015/09/22 16:31 +0200:

On 2015-09-22 10:51, Qu Wenruo wrote:

[92098.842261] Call Trace:
[92098.842277]  [] ?
read_extent_buffer+0xb8/0x110
[btrfs]
[92098.842304]  [] ?
btrfs_find_all_roots+0x60/0x70
[btrfs]
[92098.842329]  []
btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]


Would you please show the code for it?
This one seems to be another stupid bug I made when rewriting the
framework.
Maybe I forgot to reinit some variables, or I'm corrupting memory...


(gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
2755
2756static inline void btrfs_disk_key_to_cpu(struct btrfs_key *cpu,
2757 struct btrfs_disk_key *disk)
2758{
2759cpu->offset = le64_to_cpu(disk->offset);
2760cpu->type = disk->type;
2761cpu->objectid = le64_to_cpu(disk->objectid);
2762}
2763
2764static inline void btrfs_cpu_key_to_disk(struct btrfs_disk_key *disk,
(gdb)


Does it make sense?

So it seems that the memory of cpu key is being screwed up...

The code is a pretty thin inline function, so what about the rest of the
call stack?
Like btrfs_qgroup_rescan_helper+0x12?

Thanks,
Qu

Oh, I forgot that you can just change the offset in
btrfs_qgroup_rescan_worker+0x28d to a smaller value.
Try +0x280 for example, which steps back 13 bytes of asm code;
that may jump out of the inline function's range and give you a
good hint.

Or gdb may have a better mode for inline functions, but I don't know...


Actually, "list -" is our friend here (show 10 lignes before the last
src output)

No, that's not the case.

"list -" will only show lines around the same source location.

What I need is the higher caller in the stack.
When debugging a running program, it's quite easy to just use the frame
command.

But in this situation we don't have a call stack, so I'd like to move
the +0x28d several bytes backward, until we jump out of the inline
function call and see the meaningful code.


Ah, you're right.
I had a hard time finding a value where I wouldn't end up in another inline
function or somewhere else entirely in the kernel code, but here it is:

(gdb) list *(btrfs_qgroup_rescan_worker+0x26e)
0x97f4e is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2237).
2232memcpy(scratch_leaf, path->nodes[0], sizeof(*scratch_leaf));
2233slot = path->slots[0];
2234btrfs_release_path(path);
2235mutex_unlock(&fs_info->qgroup_rescan_lock);
2236
2237for (; slot < btrfs_header_nritems(scratch_leaf); ++slot) {
2238btrfs_item_key_to_cpu(scratch_leaf, &found, slot); <== here

2239if (found.type != BTRFS_EXTENT_ITEM_KEY &&
2240found.type != BTRFS_METADATA_ITEM_KEY)
2241continue;

the btrfs_item_key_to_cpu() inline func calls 2 other inline funcs:

static inline void btrfs_item_key_to_cpu(struct extent_buffer *eb,
   struct btrfs_key *key, int nr)
{
 struct btrfs_disk_key disk_key;
 btrfs_item_key(eb, &disk_key, nr);
 btrfs_disk_key_to_cpu(key, &disk_key); <== this is 0x28d
}

btrfs_disk_key_to_cpu() is the inline referenced by 0x28d and this is where
the GPF happens.


Thanks, now things are much clearer.
Not completely sure, but scratch_leaf seems to be invalid and to cause the bug.
(found is in stack memory, so I don't think it's the cause.)

But this is less related to the qgroup rework, as that's existing code.

A quick glance already shows a dirty and maybe deadly hack: copying the
whole extent buffer struct, which includes pages and all kinds of locks.
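For illustration only (this is not Qu's actual patch, and variable names are
just examples): the current code does a shallow struct copy of the leaf, so
scratch_leaf keeps pointing at the live extent_buffer's pages and locks even
after the path is released. One possible direction, assuming
btrfs_clone_extent_buffer() is available in this tree, is to take a private
clone instead:
--
	struct extent_buffer *scratch_leaf;

	/* instead of: memcpy(scratch_leaf, path->nodes[0], sizeof(*scratch_leaf)); */
	scratch_leaf = btrfs_clone_extent_buffer(path->nodes[0]);
	if (!scratch_leaf) {
		ret = -ENOMEM;	/* bail out rather than scan shared/freed pages */
		goto out;
	}
	slot = path->slots[0];
	btrfs_release_path(path);
	mutex_unlock(&fs_info->qgroup_rescan_lock);

	/* ... walk the items of scratch_leaf as before ... */

	free_extent_buffer(scratch_leaf);
--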


I'm not 100% sure that's the problem, but I'll create a patch for you to
test in the coming days.






BTW, did you try the following patch?
https://patchwork.kernel.org/patch/7114321/
btrfs: qgroup: exit the rescan worker during umount

The problem seems a little related to the bug you encountered, so I'd
recommend giving it a try.


Not yet, but I've come across this bug too during my tests: starting a
rescan and then umounting gets you a crash. I didn't mention it because I
was sure it was an already known bug. Nice to see it has been fixed though!
I'll certainly give it a try, but I'm not really sure it'll fix the specific
bug we're talking about.
However, the group of patches posted by Mark should fix the qgroup count
discrepancies as I understand it, right? It might be of interest to try
them all at once.


Yes, his patch should fix the qgroup count mismatch problem for 
subvolume remove.


If I read the code correctly, after removal and sync, the accounting
numbers for the qgroup of a deleted subvolume should be:

rfer = 0 and excl = 0.

Thanks,
Qu


Thanks,



Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-23 Thread Qu Wenruo



Stéphane Lesimple wrote on 2015/09/22 16:31 +0200:

On 2015-09-22 10:51, Qu Wenruo wrote:

[92098.842261] Call Trace:
[92098.842277]  [] ? read_extent_buffer+0xb8/0x110
[btrfs]
[92098.842304]  [] ? btrfs_find_all_roots+0x60/0x70
[btrfs]
[92098.842329]  []
btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]


Would you please show the code of it?
This one seems to be another stupid bug I made when rewriting the
framework.
Maybe I forgot to reinit some variants or I'm screwing memory...


(gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
2755
2756static inline void btrfs_disk_key_to_cpu(struct btrfs_key *cpu,
2757 struct btrfs_disk_key
*disk)
2758{
2759cpu->offset = le64_to_cpu(disk->offset);
2760cpu->type = disk->type;
2761cpu->objectid = le64_to_cpu(disk->objectid);
2762}
2763
2764static inline void btrfs_cpu_key_to_disk(struct btrfs_disk_key
*disk,
(gdb)


Does it make sense?

So it seems that the memory of cpu key is being screwed up...

The code is a pretty thin inline function, so what about the rest of the call stack?
Like btrfs_qgroup_rescan_helper+0x12?

Thanks,
Qu

Oh, I forgot that you can just change the number of
btrfs_qgroup_rescan_worker+0x28d to smaller value.
Try +0x280 for example, which will revert to 14 bytes asm code back,
which may jump out of the inline function range, and may give you a
good hint.

Or gdb may have a better mode for inline function, but I don't know...


Actually, "list -" is our friend here (show 10 lignes before the last
src output)

No, that's not the case.

List - will only show lines around the source code.

What I need is to get the higher caller stack.
If debugging a running program, it's quite easy to just use frame command.

But in this situation, we don't have call stack, so I'd like to change 
the +0x28d to several bytes backward, until we jump out of the inline 
function call, and see the meaningful codes.



BTW, did you try the following patch?
https://patchwork.kernel.org/patch/7114321/
btrfs: qgroup: exit the rescan worker during umount

The problem seems a little related to the bug you encountered, so I'd
recommend giving it a try.


Thanks,
Qu


(gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
2755
2756static inline void btrfs_disk_key_to_cpu(struct btrfs_key *cpu,
2757 struct btrfs_disk_key
*disk)
2758{
2759cpu->offset = le64_to_cpu(disk->offset);
2760cpu->type = disk->type;
2761cpu->objectid = le64_to_cpu(disk->objectid);
2762}
2763
2764static inline void btrfs_cpu_key_to_disk(struct btrfs_disk_key
*disk,
(gdb) list -
2745struct
btrfs_disk_key *key)
2746{
2747write_eb_member(eb, h, struct btrfs_free_space_header,
location, key);
2748}
2749
2750/* struct btrfs_disk_key */
2751BTRFS_SETGET_STACK_FUNCS(disk_key_objectid, struct btrfs_disk_key,
2752 objectid, 64);
2753BTRFS_SETGET_STACK_FUNCS(disk_key_offset, struct btrfs_disk_key,
offset, 64);
2754BTRFS_SETGET_STACK_FUNCS(disk_key_type, struct btrfs_disk_key,
type, 8);
(gdb) list -
2735
2736static inline void btrfs_free_space_key(struct extent_buffer *eb,
2737struct
btrfs_free_space_header *h,
2738struct btrfs_disk_key *key)
2739{
2740read_eb_member(eb, h, struct btrfs_free_space_header,
location, key);
2741}
2742
2743static inline void btrfs_set_free_space_key(struct extent_buffer
*eb,
2744struct
btrfs_free_space_header *h,
(gdb)

Lots of inline funcs and macros it seems.




Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-23 Thread Stéphane Lesimple

On 2015-09-23 09:03, Qu Wenruo wrote:

Stéphane Lesimple wrote on 2015/09/22 16:31 +0200:

On 2015-09-22 10:51, Qu Wenruo wrote:

[92098.842261] Call Trace:
[92098.842277]  [] ? 
read_extent_buffer+0xb8/0x110

[btrfs]
[92098.842304]  [] ? 
btrfs_find_all_roots+0x60/0x70

[btrfs]
[92098.842329]  []
btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]


Would you please show the code of it?
This one seems to be another stupid bug I made when rewriting the
framework.
Maybe I forgot to reinit some variants or I'm screwing memory...


(gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
2755
2756static inline void btrfs_disk_key_to_cpu(struct btrfs_key 
*cpu,
2757 struct 
btrfs_disk_key

*disk)
2758{
2759cpu->offset = le64_to_cpu(disk->offset);
2760cpu->type = disk->type;
2761cpu->objectid = le64_to_cpu(disk->objectid);
2762}
2763
2764static inline void btrfs_cpu_key_to_disk(struct 
btrfs_disk_key

*disk,
(gdb)


Does it make sense?

So it seems that the memory of cpu key is being screwed up...

The code is a pretty thin inline function, so what about the rest of the
call stack?

Like btrfs_qgroup_rescan_helper+0x12?

Thanks,
Qu

Oh, I forgot that you can just change the number of
btrfs_qgroup_rescan_worker+0x28d to smaller value.
Try +0x280 for example, which will revert to 14 bytes asm code back,
which may jump out of the inline function range, and may give you a
good hint.

Or gdb may have a better mode for inline function, but I don't 
know...


Actually, "list -" is our friend here (show 10 lignes before the last
src output)

No, that's not the case.

List - will only show lines around the source code.

What I need is to get the higher caller stack.
If debugging a running program, it's quite easy to just use frame 
command.


But in this situation, we don't have call stack, so I'd like to change
the +0x28d to several bytes backward, until we jump out of the inline
function call, and see the meaningful codes.


Ah, you're right.
I had a hard time finding a value where I wouldn't end up in another inline
function or somewhere else entirely in the kernel code, but here it is:

(gdb) list *(btrfs_qgroup_rescan_worker+0x26e)
0x97f4e is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2237).
2232memcpy(scratch_leaf, path->nodes[0], 
sizeof(*scratch_leaf));

2233slot = path->slots[0];
2234btrfs_release_path(path);
2235mutex_unlock(&fs_info->qgroup_rescan_lock);
2236
2237for (; slot < btrfs_header_nritems(scratch_leaf); ++slot) {
2238btrfs_item_key_to_cpu(scratch_leaf, &found, slot); <== here


2239if (found.type != BTRFS_EXTENT_ITEM_KEY &&
2240found.type != BTRFS_METADATA_ITEM_KEY)
2241continue;

the btrfs_item_key_to_cpu() inline func calls 2 other inline funcs:

static inline void btrfs_item_key_to_cpu(struct extent_buffer *eb,
  struct btrfs_key *key, int nr)
{
struct btrfs_disk_key disk_key;
btrfs_item_key(eb, &disk_key, nr);
btrfs_disk_key_to_cpu(key, &disk_key); <== this is 0x28d
}

btrfs_disk_key_to_cpu() is the inline referenced by 0x28d and this is
where the GPF happens.



BTW, did you try the following patch?
https://patchwork.kernel.org/patch/7114321/
btrfs: qgroup: exit the rescan worker during umount

The problem seems a little related to the bug you encountered, so I'd
recommend giving it a try.


Not yet, but I've come across this bug too during my tests: starting a
rescan and then umounting gets you a crash. I didn't mention it because I
was sure it was an already known bug. Nice to see it has been fixed though!
I'll certainly give it a try, but I'm not really sure it'll fix the specific
bug we're talking about.
However, the group of patches posted by Mark should fix the qgroup count
discrepancies as I understand it, right? It might be of interest to try
them all at once.

Thanks,

--
Stéphane.




Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-22 Thread Qu Wenruo



On 2015-09-22 15:34, Stéphane Lesimple wrote:

On 2015-09-22 03:37, Qu Wenruo wrote:

Stéphane Lesimple wrote on 2015/09/22 03:30 +0200:

On 2015-09-20 13:14, Stéphane Lesimple wrote:

On 2015-09-20 12:51, Qu Wenruo wrote:

Would you please use gdb to show the codes of
"btrfs_qgroup_rescan_worker+0x388" ?
(Need kernel debuginfo)

My guess is the following line:(pretty sure, but not 100% sure)
--
/*
 * only update status, since the previous part has alreay
updated the
 * qgroup info.
 */
trans = btrfs_start_transaction(fs_info->quota_root, 1);
<
if (IS_ERR(trans)) {
err = PTR_ERR(trans);
btrfs_err(fs_info,
  "fail to start transaction for status
update: %d\n",
  err);
goto done;
}
--


The kernel and modules were already compiled with debuginfo.
However for some reason, I couldn't get gdb disassembly of
/proc/kcore
properly
aligned with the source I compiled: the asm code doesn't match the C
code shown
by gdb. In any case, watching the source of this function, this is
the
only place
btrfs_start_transaction is called, so we can be 100% sure it's where
the
crash
happens indeed.


Yep, that's the only caller.

Here is some useful small hint to locate the code, if you are
interestied in kernel development.

# Not sure about whether ubuntu gzipped modules, at least Arch does
# compress it
$ cp /kernel/fs/btrfs/btrfs.ko.gz /tmp/
$ gunzip /tmp/btrfs.ko.gz
$ gdb /tmp/btrfs.ko
# Make sure gdb read all the needed debuginfo
$ gdb list *(btrfs_qgroup_rescan_worker+0x388)

And gdb will find the code position for you.
Quite easy one, only backtrace info is needed.


Ah, thanks for the tips, I was loading whole vmlinux and using
/proc/kcore
as the core info, then adding the module with "add-symbol-file". But as
we're just looking for the code and not the variables, it was indeed
completely overkill.

(gdb) list *(btrfs_qgroup_rescan_worker+0x388)
0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328).
2323
2324/*
2325 * only update status, since the previous part has
alreay updated the
2326 * qgroup info.
2327 */
2328trans = btrfs_start_transaction(fs_info->quota_root,
1);
2329if (IS_ERR(trans)) {
2330err = PTR_ERR(trans);
2331btrfs_err(fs_info,
2332  "fail to start transaction for
status update: %d\n",

So this just confirms what we were already 99% sure of.


Another hint is about how to collect the kernel crash info.
Your netconsole setup would be definitely one good practice.

Another one I use to collect crash info is kdump.
Ubuntu should have a good wiki on it.


I've already come across kdump a few times, but never really look into
it.
To debug the other complicated extend backref bug, it could be of some
use.


So, as a quick summary of this big thread, it seems I've been
hitting
3 bugs, all reproductible :
- kernel BUG on balance (this original thread)


For this, I can't provide much help, as extent backref bug is quite
hard to debug, unless a developer is interested in it and find a
stable way to reproduce it.


Yes, unfortunately as it looks so much like a race condition, I know
I can
reproduce it with my worflow, but it can take between 1 minute and 12
hours,
so I wouldn't call it a "stable way" to reproduce it unfortunately :(

Still if any dev is interested in it, I can reproduce it, with a
patched
kernel if needed.


Maybe you are already doing it, you can only compile the btrfs
modules, which will be far more faster than compile the whole kernel,
if and only if the compiled module can be loaded.


Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so
I'll try to
load the modified module and see if the running kernel accepts it. I
have to rmmod
the loaded module first, hence umounting any btrfs fs before that.
Should be able
to do it in a couple hours.

I'll delete again all my snapshots and run my script. Should be easy
to trigger
the (hopefully worked-around) bug again.


Well, I didn't trigger this exact bug, but another one, not less severe
though, as it also crashed the system:

[92098.841309] general protection fault:  [#1] SMP
[92098.841338] Modules linked in: ...
[92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted
4.3.0-rc1 #1
[92098.841834] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005
01/06/2014
[92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper
[btrfs]
[92098.841889] task: 8800b6cc4100 ti: 8800a3dc8000 task.ti:
8800a3dc8000
[92098.841910] RIP: 0010:[]  []
memcpy_erms+0x6/0x10
[92098.841935] RSP: 0018:8800a3dcbcc8  EFLAGS: 00010207
[92098.841950] RAX: 8800a3dcbd67 RBX: 0009 RCX:
0009
[92098.841970] RDX: 0009 RSI: 00050800 RDI:
8800a3dcbd67
[92098.841989] RBP: 

Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-22 Thread Qu Wenruo



On 2015-09-22 16:40, Qu Wenruo wrote:



On 2015-09-22 15:34, Stéphane Lesimple wrote:

On 2015-09-22 03:37, Qu Wenruo wrote:

Stéphane Lesimple wrote on 2015/09/22 03:30 +0200:

On 2015-09-20 13:14, Stéphane Lesimple wrote:

On 2015-09-20 12:51, Qu Wenruo wrote:

Would you please use gdb to show the codes of
"btrfs_qgroup_rescan_worker+0x388" ?
(Need kernel debuginfo)

My guess is the following line:(pretty sure, but not 100% sure)
--
/*
 * only update status, since the previous part has alreay
updated the
 * qgroup info.
 */
trans = btrfs_start_transaction(fs_info->quota_root, 1);
<
if (IS_ERR(trans)) {
err = PTR_ERR(trans);
btrfs_err(fs_info,
  "fail to start transaction for status
update: %d\n",
  err);
goto done;
}
--


The kernel and modules were already compiled with debuginfo.
However for some reason, I couldn't get gdb disassembly of
/proc/kcore
properly
aligned with the source I compiled: the asm code doesn't match the C
code shown
by gdb. In any case, watching the source of this function, this is
the
only place
btrfs_start_transaction is called, so we can be 100% sure it's where
the
crash
happens indeed.


Yep, that's the only caller.

Here is some useful small hint to locate the code, if you are
interestied in kernel development.

# Not sure about whether ubuntu gzipped modules, at least Arch does
# compress it
$ cp /kernel/fs/btrfs/btrfs.ko.gz /tmp/
$ gunzip /tmp/btrfs.ko.gz
$ gdb /tmp/btrfs.ko
# Make sure gdb read all the needed debuginfo
$ gdb list *(btrfs_qgroup_rescan_worker+0x388)

And gdb will find the code position for you.
Quite easy one, only backtrace info is needed.


Ah, thanks for the tips, I was loading whole vmlinux and using
/proc/kcore
as the core info, then adding the module with "add-symbol-file".
But as
we're just looking for the code and not the variables, it was indeed
completely overkill.

(gdb) list *(btrfs_qgroup_rescan_worker+0x388)
0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328).
2323
2324/*
2325 * only update status, since the previous part has
alreay updated the
2326 * qgroup info.
2327 */
2328trans = btrfs_start_transaction(fs_info->quota_root,
1);
2329if (IS_ERR(trans)) {
2330err = PTR_ERR(trans);
2331btrfs_err(fs_info,
2332  "fail to start transaction for
status update: %d\n",

So this just confirms what we were already 99% sure of.


Another hint is about how to collect the kernel crash info.
Your netconsole setup would be definitely one good practice.

Another one I use to collect crash info is kdump.
Ubuntu should have a good wiki on it.


I've already come across kdump a few times, but never really look into
it.
To debug the other complicated extend backref bug, it could be of some
use.


So, as a quick summary of this big thread, it seems I've been
hitting
3 bugs, all reproductible :
- kernel BUG on balance (this original thread)


For this, I can't provide much help, as extent backref bug is quite
hard to debug, unless a developer is interested in it and find a
stable way to reproduce it.


Yes, unfortunately as it looks so much like a race condition, I know
I can
reproduce it with my worflow, but it can take between 1 minute
and 12
hours,
so I wouldn't call it a "stable way" to reproduce it
unfortunately :(

Still if any dev is interested in it, I can reproduce it, with a
patched
kernel if needed.


Maybe you are already doing it, you can only compile the btrfs
modules, which will be far more faster than compile the whole kernel,
if and only if the compiled module can be loaded.


Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so
I'll try to
load the modified module and see if the running kernel accepts it. I
have to rmmod
the loaded module first, hence umounting any btrfs fs before that.
Should be able
to do it in a couple hours.

I'll delete again all my snapshots and run my script. Should be easy
to trigger
the (hopefully worked-around) bug again.


Well, I didn't trigger this exact bug, but another one, not less severe
though, as it also crashed the system:

[92098.841309] general protection fault:  [#1] SMP
[92098.841338] Modules linked in: ...
[92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted
4.3.0-rc1 #1
[92098.841834] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005
01/06/2014
[92098.841868] Workqueue: btrfs-qgroup-rescan
btrfs_qgroup_rescan_helper
[btrfs]
[92098.841889] task: 8800b6cc4100 ti: 8800a3dc8000 task.ti:
8800a3dc8000
[92098.841910] RIP: 0010:[]  []
memcpy_erms+0x6/0x10
[92098.841935] RSP: 0018:8800a3dcbcc8  EFLAGS: 00010207
[92098.841950] RAX: 8800a3dcbd67 RBX: 0009 RCX:
0009
[92098.841970] RDX: 0009 RSI: 00050800 RDI:

Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-22 Thread Stéphane Lesimple

On 2015-09-22 03:37, Qu Wenruo wrote:

Stéphane Lesimple wrote on 2015/09/22 03:30 +0200:

On 2015-09-20 13:14, Stéphane Lesimple wrote:

On 2015-09-20 12:51, Qu Wenruo wrote:

Would you please use gdb to show the codes of
"btrfs_qgroup_rescan_worker+0x388" ?
(Need kernel debuginfo)

My guess is the following line:(pretty sure, but not 100% sure)
--
/*
 * only update status, since the previous part has alreay
updated the
 * qgroup info.
 */
trans = btrfs_start_transaction(fs_info->quota_root, 1); 
<

if (IS_ERR(trans)) {
err = PTR_ERR(trans);
btrfs_err(fs_info,
  "fail to start transaction for status
update: %d\n",
  err);
goto done;
}
--


The kernel and modules were already compiled with debuginfo.
However for some reason, I couldn't get gdb disassembly of 
/proc/kcore

properly
aligned with the source I compiled: the asm code doesn't match the 
C

code shown
by gdb. In any case, watching the source of this function, this is 
the

only place
btrfs_start_transaction is called, so we can be 100% sure it's 
where

the
crash
happens indeed.


Yep, that's the only caller.

Here is some useful small hint to locate the code, if you are
interestied in kernel development.

# Not sure about whether ubuntu gzipped modules, at least Arch does
# compress it
$ cp /kernel/fs/btrfs/btrfs.ko.gz /tmp/
$ gunzip /tmp/btrfs.ko.gz
$ gdb /tmp/btrfs.ko
# Make sure gdb read all the needed debuginfo
$ gdb list *(btrfs_qgroup_rescan_worker+0x388)

And gdb will find the code position for you.
Quite easy one, only backtrace info is needed.


Ah, thanks for the tips, I was loading whole vmlinux and using
/proc/kcore
as the core info, then adding the module with "add-symbol-file". But 
as

we're just looking for the code and not the variables, it was indeed
completely overkill.

(gdb) list *(btrfs_qgroup_rescan_worker+0x388)
0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328).
2323
2324/*
2325 * only update status, since the previous part has
alreay updated the
2326 * qgroup info.
2327 */
2328trans = btrfs_start_transaction(fs_info->quota_root, 
1);

2329if (IS_ERR(trans)) {
2330err = PTR_ERR(trans);
2331btrfs_err(fs_info,
2332  "fail to start transaction for
status update: %d\n",

So this just confirms what we were already 99% sure of.


Another hint is about how to collect the kernel crash info.
Your netconsole setup would be definitely one good practice.

Another one I use to collect crash info is kdump.
Ubuntu should have a good wiki on it.


I've already come across kdump a few times, but never really look 
into

it.
To debug the other complicated extend backref bug, it could be of 
some

use.


So, as a quick summary of this big thread, it seems I've been
hitting
3 bugs, all reproductible :
- kernel BUG on balance (this original thread)


For this, I can't provide much help, as extent backref bug is 
quite

hard to debug, unless a developer is interested in it and find a
stable way to reproduce it.


Yes, unfortunately as it looks so much like a race condition, I 
know

I can
reproduce it with my worflow, but it can take between 1 minute and 
12

hours,
so I wouldn't call it a "stable way" to reproduce it unfortunately 
:(


Still if any dev is interested in it, I can reproduce it, with a
patched
kernel if needed.


Maybe you are already doing it, you can only compile the btrfs
modules, which will be far more faster than compile the whole 
kernel,

if and only if the compiled module can be loaded.


Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so
I'll try to
load the modified module and see if the running kernel accepts it. I
have to rmmod
the loaded module first, hence umounting any btrfs fs before that.
Should be able
to do it in a couple hours.

I'll delete again all my snapshots and run my script. Should be easy
to trigger
the (hopefully worked-around) bug again.


Well, I didn't trigger this exact bug, but another one, not less 
severe

though, as it also crashed the system:

[92098.841309] general protection fault:  [#1] SMP
[92098.841338] Modules linked in: ...
[92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted
4.3.0-rc1 #1
[92098.841834] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005
01/06/2014
[92098.841868] Workqueue: btrfs-qgroup-rescan 
btrfs_qgroup_rescan_helper

[btrfs]
[92098.841889] task: 8800b6cc4100 ti: 8800a3dc8000 task.ti:
8800a3dc8000
[92098.841910] RIP: 0010:[]  []
memcpy_erms+0x6/0x10
[92098.841935] RSP: 0018:8800a3dcbcc8  EFLAGS: 00010207
[92098.841950] RAX: 8800a3dcbd67 RBX: 0009 RCX:
0009
[92098.841970] RDX: 0009 RSI: 00050800 RDI:
8800a3dcbd67
[92098.841989] RBP: 8800a3dcbd00 

Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-21 Thread Stéphane Lesimple

On 2015-09-20 13:14, Stéphane Lesimple wrote:

On 2015-09-20 12:51, Qu Wenruo wrote:

Would you please use gdb to show the codes of
"btrfs_qgroup_rescan_worker+0x388" ?
(Need kernel debuginfo)

My guess is the following line:(pretty sure, but not 100% sure)
--
/*
 * only update status, since the previous part has alreay
updated the
 * qgroup info.
 */
trans = btrfs_start_transaction(fs_info->quota_root, 1); 
<

if (IS_ERR(trans)) {
err = PTR_ERR(trans);
btrfs_err(fs_info,
  "fail to start transaction for status
update: %d\n",
  err);
goto done;
}
--


The kernel and modules were already compiled with debuginfo.
However for some reason, I couldn't get gdb disassembly of 
/proc/kcore

properly
aligned with the source I compiled: the asm code doesn't match the C
code shown
by gdb. In any case, watching the source of this function, this is 
the

only place
btrfs_start_transaction is called, so we can be 100% sure it's where 
the

crash
happens indeed.


Yep, that's the only caller.

Here is some useful small hint to locate the code, if you are
interestied in kernel development.

# Not sure about whether ubuntu gzipped modules, at least Arch does
# compress it
$ cp /kernel/fs/btrfs/btrfs.ko.gz /tmp/
$ gunzip /tmp/btrfs.ko.gz
$ gdb /tmp/btrfs.ko
# Make sure gdb read all the needed debuginfo
$ gdb list *(btrfs_qgroup_rescan_worker+0x388)

And gdb will find the code position for you.
Quite easy one, only backtrace info is needed.


Ah, thanks for the tips, I was loading whole vmlinux and using 
/proc/kcore

as the core info, then adding the module with "add-symbol-file". But as
we're just looking for the code and not the variables, it was indeed
completely overkill.

(gdb) list *(btrfs_qgroup_rescan_worker+0x388)
0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328).
2323
2324/*
2325 * only update status, since the previous part has
alreay updated the
2326 * qgroup info.
2327 */
2328trans = btrfs_start_transaction(fs_info->quota_root, 
1);

2329if (IS_ERR(trans)) {
2330err = PTR_ERR(trans);
2331btrfs_err(fs_info,
2332  "fail to start transaction for
status update: %d\n",

So this just confirms what we were already 99% sure of.


Another hint is about how to collect the kernel crash info.
Your netconsole setup would be definitely one good practice.

Another one I use to collect crash info is kdump.
Ubuntu should have a good wiki on it.


I've already come across kdump a few times, but never really look into 
it.
To debug the other complicated extend backref bug, it could be of some 
use.


So, as a quick summary of this big thread, it seems I've been 
hitting

3 bugs, all reproductible :
- kernel BUG on balance (this original thread)


For this, I can't provide much help, as extent backref bug is quite
hard to debug, unless a developer is interested in it and find a
stable way to reproduce it.


Yes, unfortunately as it looks so much like a race condition, I know 
I can

reproduce it with my worflow, but it can take between 1 minute and 12
hours,
so I wouldn't call it a "stable way" to reproduce it unfortunately :(

Still if any dev is interested in it, I can reproduce it, with a 
patched

kernel if needed.


Maybe you are already doing it, you can only compile the btrfs
modules, which will be far more faster than compile the whole kernel,
if and only if the compiled module can be loaded.


Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so I'll 
try to

load the modified module and see if the running kernel accepts it. I
have to rmmod
the loaded module first, hence umounting any btrfs fs before that.
Should be able
to do it in a couple hours.

I'll delete again all my snapshots and run my script. Should be easy to 
trigger

the (hopefully worked-around) bug again.


Well, I didn't trigger this exact bug, but another one, not less severe 
though, as it also crashed the system:


[92098.841309] general protection fault:  [#1] SMP
[92098.841338] Modules linked in: ...
[92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted 
4.3.0-rc1 #1
[92098.841834] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005 
01/06/2014
[92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper 
[btrfs]
[92098.841889] task: 8800b6cc4100 ti: 8800a3dc8000 task.ti: 
8800a3dc8000
[92098.841910] RIP: 0010:[]  [] 
memcpy_erms+0x6/0x10

[92098.841935] RSP: 0018:8800a3dcbcc8  EFLAGS: 00010207
[92098.841950] RAX: 8800a3dcbd67 RBX: 0009 RCX: 
0009
[92098.841970] RDX: 0009 RSI: 00050800 RDI: 
8800a3dcbd67
[92098.841989] RBP: 8800a3dcbd00 R08: 00019c60 R09: 
88011fb19c60
[92098.842009] R10: ea0003006480 R11: 

Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-21 Thread Qu Wenruo



Stéphane Lesimple wrote on 2015/09/22 03:30 +0200:

On 2015-09-20 13:14, Stéphane Lesimple wrote:

On 2015-09-20 12:51, Qu Wenruo wrote:

Would you please use gdb to show the codes of
"btrfs_qgroup_rescan_worker+0x388" ?
(Need kernel debuginfo)

My guess is the following line:(pretty sure, but not 100% sure)
--
/*
 * only update status, since the previous part has alreay
updated the
 * qgroup info.
 */
trans = btrfs_start_transaction(fs_info->quota_root, 1); <
if (IS_ERR(trans)) {
err = PTR_ERR(trans);
btrfs_err(fs_info,
  "fail to start transaction for status
update: %d\n",
  err);
goto done;
}
--


The kernel and modules were already compiled with debuginfo.
However for some reason, I couldn't get gdb disassembly of /proc/kcore
properly
aligned with the source I compiled: the asm code doesn't match the C
code shown
by gdb. In any case, watching the source of this function, this is the
only place
btrfs_start_transaction is called, so we can be 100% sure it's where
the
crash
happens indeed.


Yep, that's the only caller.

Here is some useful small hint to locate the code, if you are
interestied in kernel development.

# Not sure about whether ubuntu gzipped modules, at least Arch does
# compress it
$ cp /kernel/fs/btrfs/btrfs.ko.gz /tmp/
$ gunzip /tmp/btrfs.ko.gz
$ gdb /tmp/btrfs.ko
# Make sure gdb read all the needed debuginfo
$ gdb list *(btrfs_qgroup_rescan_worker+0x388)

And gdb will find the code position for you.
Quite easy one, only backtrace info is needed.


Ah, thanks for the tips, I was loading whole vmlinux and using
/proc/kcore
as the core info, then adding the module with "add-symbol-file". But as
we're just looking for the code and not the variables, it was indeed
completely overkill.

(gdb) list *(btrfs_qgroup_rescan_worker+0x388)
0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328).
2323
2324/*
2325 * only update status, since the previous part has
alreay updated the
2326 * qgroup info.
2327 */
2328trans = btrfs_start_transaction(fs_info->quota_root, 1);
2329if (IS_ERR(trans)) {
2330err = PTR_ERR(trans);
2331btrfs_err(fs_info,
2332  "fail to start transaction for
status update: %d\n",

So this just confirms what we were already 99% sure of.


Another hint is about how to collect the kernel crash info.
Your netconsole setup would be definitely one good practice.

Another one I use to collect crash info is kdump.
Ubuntu should have a good wiki on it.


I've already come across kdump a few times, but never really look into
it.
To debug the other complicated extend backref bug, it could be of some
use.


So, as a quick summary of this big thread, it seems I've been
hitting
3 bugs, all reproductible :
- kernel BUG on balance (this original thread)


For this, I can't provide much help, as extent backref bug is quite
hard to debug, unless a developer is interested in it and find a
stable way to reproduce it.


Yes, unfortunately as it looks so much like a race condition, I know
I can
reproduce it with my worflow, but it can take between 1 minute and 12
hours,
so I wouldn't call it a "stable way" to reproduce it unfortunately :(

Still if any dev is interested in it, I can reproduce it, with a
patched
kernel if needed.


Maybe you are already doing it, you can only compile the btrfs
modules, which will be far more faster than compile the whole kernel,
if and only if the compiled module can be loaded.


Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so
I'll try to
load the modified module and see if the running kernel accepts it. I
have to rmmod
the loaded module first, hence umounting any btrfs fs before that.
Should be able
to do it in a couple hours.

I'll delete again all my snapshots and run my script. Should be easy
to trigger
the (hopefully worked-around) bug again.


Well, I didn't trigger this exact bug, but another one, not less severe
though, as it also crashed the system:

[92098.841309] general protection fault:  [#1] SMP
[92098.841338] Modules linked in: ...
[92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted
4.3.0-rc1 #1
[92098.841834] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005
01/06/2014
[92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper
[btrfs]
[92098.841889] task: 8800b6cc4100 ti: 8800a3dc8000 task.ti:
8800a3dc8000
[92098.841910] RIP: 0010:[]  []
memcpy_erms+0x6/0x10
[92098.841935] RSP: 0018:8800a3dcbcc8  EFLAGS: 00010207
[92098.841950] RAX: 8800a3dcbd67 RBX: 0009 RCX:
0009
[92098.841970] RDX: 0009 RSI: 00050800 RDI:
8800a3dcbd67
[92098.841989] RBP: 8800a3dcbd00 R08: 00019c60 R09:
88011fb19c60
[92098.842009] R10: 

Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-20 Thread Qu Wenruo



On 2015-09-20 18:35, Stéphane Lesimple wrote:

On 2015-09-20 03:22, Qu Wenruo wrote:

The mentioned steps are as follows :

0) Rsync data from the next ext4 "snapshot" to the subvolume
1) Do 'sync; btrfs qgroup show -⁠prce -⁠-⁠raw' and save the output <==
2) Create the needed readonly snapshot on btrfs
3) Do 'sync; btrfs qgroup show -⁠prce -⁠-⁠raw' and save the output <==
4) Avoid doing IO if possible until step 6)
5) Do 'btrfs quota rescan -⁠w' and save it <==
6) Do 'sync; btrfs qgroup show -⁠prce -⁠-⁠raw' and save the output <==

The resulting files are available here:
http://speed47.net/tmp2/qgroup.tar.gz
The run2 is the more complete one, during run1 the machine crashed
even faster.
It's interesting to note, however, that it seems to have crashed the
same way and at the same step in the process.


Your data really helps a lot!!

And the good news is, the qgroup accouting part is working as expected.
Although I only checked about 1/3/6 of about 5 snaps, they are all OK.

I can make a script to cross check them, but from the last few result,
I think qgroup works fine.

I'm more confident about the minus number, which should be a result of
deleted subvolume, and the real problem is, such qgroup is not handled
well with qgroup rescan.


I agree with your analysis, this matches what I observed.


I'll try to add a hot fix for such case if needed.
But right now, I don't have a good idea for it until Mark's work of
rescan subtree.

Maybe I can add a new option for btrfs-progs to automatically remove
the qgroup and trigger a rescan?
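As an illustration of that workaround (this is not the actual btrfs-progs
patch, and the arguments are only examples), a small user-space helper could
do the same thing with the existing ioctls: BTRFS_IOC_QGROUP_CREATE with
create = 0 destroys a qgroup, and BTRFS_IOC_QUOTA_RESCAN starts a rescan:
--
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/btrfs.h>

int main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s <mountpoint> <qgroupid>\n", argv[0]);
		return 1;
	}

	int fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* create = 0 asks the same ioctl to destroy the given qgroup */
	struct btrfs_ioctl_qgroup_create_args qa = {
		.create = 0,
		.qgroupid = strtoull(argv[2], NULL, 0),
	};
	if (ioctl(fd, BTRFS_IOC_QGROUP_CREATE, &qa) < 0)
		perror("BTRFS_IOC_QGROUP_CREATE (destroy)");

	/* then kick a full quota rescan and wait for it to finish */
	struct btrfs_ioctl_quota_rescan_args ra = { 0 };
	if (ioctl(fd, BTRFS_IOC_QUOTA_RESCAN, &ra) < 0)
		perror("BTRFS_IOC_QUOTA_RESCAN");
	else if (ioctl(fd, BTRFS_IOC_QUOTA_RESCAN_WAIT) < 0)
		perror("BTRFS_IOC_QUOTA_RESCAN_WAIT");

	close(fd);
	return 0;
}
--
The intended use is cleaning up the qgroups of already-deleted subvolumes
discussed above; quota has to stay enabled for the rescan to run.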


Until this is properly fixed in the kernel code, and this is good news to
know Mark and you are working on it, this would be a good workaround yes!


[ 5738.172879] Call Trace:
[ 5738.172887]  [] btrfs_start_transaction+0x1b/0x20
[btrfs]
[ 5738.172896]  []
btrfs_qgroup_rescan_worker+0x388/0x5a0 [btrfs]


Your netconsole backtrace is also of greate value.
This one implies that, my rework also caused some stupid bug.
(Yeah, I always make such bugs) or some existing unexposed rescan bug.

Would you please use gdb to show the codes of
"btrfs_qgroup_rescan_worker+0x388" ?
(Need kernel debuginfo)

My guess is the following line:(pretty sure, but not 100% sure)
--
/*
 * only update status, since the previous part has alreay
updated the
 * qgroup info.
 */
trans = btrfs_start_transaction(fs_info->quota_root, 1); <
if (IS_ERR(trans)) {
err = PTR_ERR(trans);
btrfs_err(fs_info,
  "fail to start transaction for status
update: %d\n",
  err);
goto done;
}
--


The kernel and modules were already compiled with debuginfo.
However for some reason, I couldn't get gdb disassembly of /proc/kcore
properly
aligned with the source I compiled: the asm code doesn't match the C
code shown
by gdb. In any case, watching the source of this function, this is the
only place
btrfs_start_transaction is called, so we can be 100% sure it's where the
crash
happens indeed.


Yep, that's the only caller.

Here is a useful small hint to locate the code, if you are
interested in kernel development.


# Not sure about whether ubuntu gzipped modules, at least Arch does
# compress it
$ cp /kernel/fs/btrfs/btrfs.ko.gz /tmp/
$ gunzip /tmp/btrfs.ko.gz
$ gdb /tmp/btrfs.ko
# Make sure gdb reads all the needed debuginfo, then inside gdb:
(gdb) list *(btrfs_qgroup_rescan_worker+0x388)

And gdb will find the code position for you.
Quite easy one, only backtrace info is needed.

Another hint is about how to collect the kernel crash info.
Your netconsole setup would be definitely one good practice.

Another one I use to collect crash info is kdump.
Ubuntu should have a good wiki on it.




But that means, at rescan time, fs_info->quota_root is still NULL,
which is quite weird.
I can add an extra check to avoid such a NULL pointer for now, but it's
better to review the existing rescan workflow, as I think there is
some race in how quota_root gets initialized.

You can also try the following hotfix patch to see if it works:
http://pastebin.com/966GQXPk
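The pastebin content is not reproduced here, so the exact patch is unknown; a
minimal sketch of the kind of guard Qu describes (placement and error value
are assumptions) would be to bail out of btrfs_qgroup_rescan_worker() before
touching quota_root:
--
	/*
	 * Hotfix-style guard (sketch): if quota was disabled, or never
	 * finished initializing, while the rescan worker was queued,
	 * fs_info->quota_root may be NULL here and
	 * btrfs_start_transaction() would dereference it.  Bail out
	 * instead of crashing; the status update is simply skipped.
	 */
	if (!fs_info->quota_root) {
		err = -EINVAL;
		goto done;
	}

	trans = btrfs_start_transaction(fs_info->quota_root, 1);
--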

My concern is, this may cause qgroup rescan to exit without updating
its accounting info...

So still need your help.
Or I can use your reproducer script to test it next Monday.


Compiling with your patch, just amended of a little printk to know if
the execution
flow enters the added if condition. Will let you know about the results.


But I'm pretty sure I can get that (u64)-1 value again by deleting
snapshots. Shall I ? Or do you have something else for me to run
before that ?


You have already done a great job in helping qgroups mature.
The minus numbers and 0 excl are somewhat expected for deleted snapshots.
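For reference, those "minus" values are simply unsigned 64-bit counters that
wrapped below zero. A tiny standalone C example (the over-subtraction amount
is hypothetical) reproduces one of the numbers seen in the qgroup show output
quoted further down in the thread:
--
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/*
	 * qgroup rfer/excl are unsigned 64-bit counters.  If more bytes get
	 * subtracted (e.g. while a snapshot is being deleted) than were ever
	 * added, the counter wraps around instead of going negative.
	 */
	uint64_t excl = 0;
	excl -= 1130496;	/* hypothetical over-subtraction of ~1.1 MiB */

	/* prints 18446744073708421120, i.e. (u64)-1130496 */
	printf("excl = %" PRIu64 "\n", excl);
	return 0;
}
--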

Good news is, 1) it doesn't affect valid(non-orphan) qgroup.
2) Mark is already working on it.

I'll try to add a btrfs-progs hotfix for you to delete and rescan
qgroups to avoid such problem.


That would be good !


So, as a quick summary of 

Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-20 Thread Stéphane Lesimple

On 2015-09-20 12:51, Qu Wenruo wrote:

Would you please use gdb to show the codes of
"btrfs_qgroup_rescan_worker+0x388" ?
(Need kernel debuginfo)

My guess is the following line:(pretty sure, but not 100% sure)
--
/*
 * only update status, since the previous part has alreay
updated the
 * qgroup info.
 */
trans = btrfs_start_transaction(fs_info->quota_root, 1); 
<

if (IS_ERR(trans)) {
err = PTR_ERR(trans);
btrfs_err(fs_info,
  "fail to start transaction for status
update: %d\n",
  err);
goto done;
}
--


The kernel and modules were already compiled with debuginfo.
However for some reason, I couldn't get gdb disassembly of /proc/kcore
properly
aligned with the source I compiled: the asm code doesn't match the C
code shown
by gdb. In any case, watching the source of this function, this is the
only place
btrfs_start_transaction is called, so we can be 100% sure it's where 
the

crash
happens indeed.


Yep, that's the only caller.

Here is some useful small hint to locate the code, if you are
interestied in kernel development.

# Not sure about whether ubuntu gzipped modules, at least Arch does
# compress it
$ cp /kernel/fs/btrfs/btrfs.ko.gz /tmp/
$ gunzip /tmp/btrfs.ko.gz
$ gdb /tmp/btrfs.ko
# Make sure gdb read all the needed debuginfo
$ gdb list *(btrfs_qgroup_rescan_worker+0x388)

And gdb will find the code position for you.
Quite easy one, only backtrace info is needed.


Ah, thanks for the tips, I was loading whole vmlinux and using 
/proc/kcore

as the core info, then adding the module with "add-symbol-file". But as
we're just looking for the code and not the variables, it was indeed
completely overkill.

(gdb) list *(btrfs_qgroup_rescan_worker+0x388)
0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328).
2323
2324/*
2325 * only update status, since the previous part has 
alreay updated the

2326 * qgroup info.
2327 */
2328trans = btrfs_start_transaction(fs_info->quota_root, 1);
2329if (IS_ERR(trans)) {
2330err = PTR_ERR(trans);
2331btrfs_err(fs_info,
2332  "fail to start transaction for status 
update: %d\n",


So this just confirms what we were already 99% sure of.


Another hint is about how to collect the kernel crash info.
Your netconsole setup would be definitely one good practice.

Another one I use to collect crash info is kdump.
Ubuntu should have a good wiki on it.


I've already come across kdump a few times, but never really looked into it.
To debug the other complicated extent backref bug, it could be of some use.


So, as a quick summary of this big thread, it seems I've been 
hitting

3 bugs, all reproductible :
- kernel BUG on balance (this original thread)


For this, I can't provide much help, as extent backref bug is quite
hard to debug, unless a developer is interested in it and find a
stable way to reproduce it.


Yes, unfortunately, as it looks so much like a race condition, I know I can
reproduce it with my workflow, but it can take between 1 minute and 12 hours,
so I wouldn't call it a "stable way" to reproduce it, unfortunately :(

Still if any dev is interested in it, I can reproduce it, with a 
patched

kernel if needed.


Maybe you are already doing it, you can only compile the btrfs
modules, which will be far more faster than compile the whole kernel,
if and only if the compiled module can be loaded.


Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so I'll 
try to
load the modified module and see if the running kernel accepts it. I 
have to rmmod
the loaded module first, hence umounting any btrfs fs before that. 
Should be able

to do it in a couple hours.

I'll delete again all my snapshots and run my script. Should be easy to 
trigger

the (hopefully worked-around) bug again.

Regards,

--
Stéphane.



Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-20 Thread Stéphane Lesimple

On 2015-09-20 03:22, Qu Wenruo wrote:

The mentioned steps are as follows :

0) Rsync data from the next ext4 "snapshot" to the subvolume
1) Do 'sync; btrfs qgroup show -⁠prce -⁠-⁠raw' and save the output 
<==

2) Create the needed readonly snapshot on btrfs
3) Do 'sync; btrfs qgroup show -⁠prce -⁠-⁠raw' and save the output 
<==

4) Avoid doing IO if possible until step 6)
5) Do 'btrfs quota rescan -⁠w' and save it <==
6) Do 'sync; btrfs qgroup show -⁠prce -⁠-⁠raw' and save the output 
<==


The resulting files are available here:
http://speed47.net/tmp2/qgroup.tar.gz
The run2 is the more complete one, during run1 the machine crashed
even faster.
It's interesting to note, however, that it seems to have crashed the
same way and at the same step in the process.


Your data really helps a lot!!

And the good news is, the qgroup accouting part is working as expected.
Although I only checked about 1/3/6 of about 5 snaps, they are all OK.

I can make a script to cross check them, but from the last few result,
I think qgroup works fine.

I'm more confident about the minus number, which should be a result of
deleted subvolume, and the real problem is, such qgroup is not handled
well with qgroup rescan.


I agree with your analysis, this matches what I observed.


I'll try to add a hot fix for such case if needed.
But right now, I don't have a good idea for it until Mark's work of
rescan subtree.

Maybe I can add a new option for btrfs-progs to automatically remove
the qgroup and trigger a rescan?


Until this is properly fixed in the kernel code, and this is good news 
to
know Mark and you are working on it, this would be a good workaround 
yes!



[ 5738.172879] Call Trace:
[ 5738.172887]  [] btrfs_start_transaction+0x1b/0x20
[btrfs]
[ 5738.172896]  []
btrfs_qgroup_rescan_worker+0x388/0x5a0 [btrfs]


Your netconsole backtrace is also of great value.
This one implies that, my rework also caused some stupid bug.
(Yeah, I always make such bugs) or some existing unexposed rescan bug.

Would you please use gdb to show the codes of
"btrfs_qgroup_rescan_worker+0x388" ?
(Need kernel debuginfo)

My guess is the following line:(pretty sure, but not 100% sure)
--
/*
 * only update status, since the previous part has alreay 
updated the

 * qgroup info.
 */
trans = btrfs_start_transaction(fs_info->quota_root, 1); <
if (IS_ERR(trans)) {
err = PTR_ERR(trans);
btrfs_err(fs_info,
  "fail to start transaction for status update: 
%d\n",

  err);
goto done;
}
--


The kernel and modules were already compiled with debuginfo.
However for some reason, I couldn't get gdb disassembly of /proc/kcore 
properly
aligned with the source I compiled: the asm code doesn't match the C 
code shown
by gdb. In any case, watching the source of this function, this is the 
only place
btrfs_start_transaction is called, so we can be 100% sure it's where the 
crash

happens indeed.


But that means, at rescan time, fs_info->quota_root is still NULL,
which is quite weird.
I can add extra check to avoid such NULL pointer for now, but it's
better to review the existing rescan workflow, as I think there is
some race for it to init quota_root.

You can also try the following hotfix patch to see if it works:
http://pastebin.com/966GQXPk

My concern is, this may cause qgroup rescan to exit without updating
its accounting info...

So still need your help.
Or I can use your reproducer script to test it next Monday.


Compiling with your patch, just amended of a little printk to know if 
the execution

flow enters the added if condition. Will let you know about the results.


But I'm pretty sure I can get that (u64)-1 value again by deleting
snapshots. Shall I ? Or do you have something else for me to run
before that ?


You have already done a great job in helping maturing qgroups.
The minus number and 0 excl is somewhat expected for deleted snapshots.

Good news is, 1) it doesn't affect valid(non-orphan) qgroup.
2) Mark is already working on it.

I'll try to add a btrfs-progs hotfix for you to delete and rescan
qgroups to avoid such problem.


That would be good !


So, as a quick summary of this big thread, it seems I've been hitting
3 bugs, all reproductible :
- kernel BUG on balance (this original thread)


For this, I can't provide much help, as extent backref bug is quite
hard to debug, unless a developer is interested in it and find a
stable way to reproduce it.


Yes, unfortunately as it looks so much like a race condition, I know I 
can
reproduce it with my worflow, but it can take between 1 minute and 12 
hours,

so I wouldn't call it a "stable way" to reproduce it unfortunately :(

Still if any dev is interested in it, I can reproduce it, with a patched
kernel if needed.


The rest two are explained or have hot fix mentioned above.


And thanks for that, will keep you posted.

--
Stéphane.


Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-19 Thread Qu Wenruo



On 2015-09-18 18:26, Stéphane Lesimple wrote:

On 2015-09-18 12:15, Stéphane Lesimple wrote:

On 2015-09-18 09:36, Stéphane Lesimple wrote:

Sure, I did a quota disable / quota enable before running the snapshot
debug procedure, so the qgroups were clean again when I started :

qgroupid  rfer  excl max_rfer max_excl parent
child
       --
-
0/5  16384 16384 none none --- ---
0/1906   1657848029184 1657848029184 none none --- ---
0/1909124950921216  124950921216 none none --- ---
0/1911   1054587293696 1054587293696 none none --- ---
0/3270 23727300608   23727300608 none none --- ---
0/3314 23221784576   23221784576 none none --- ---
0/3341  74792755207479275520 none none --- ---
0/3367 24185790464   24185790464 none none --- ---

The test is running, I expect to post the results within an hour or two.


Well, my system crashed twice while running the procedure...
By "crashed" I mean : the machine no longer pings, and nothing is
logged in kern.log unfortunately :

[ 7096.735731] BTRFS info (device dm-3): qgroup scan completed
(inconsistency flag cleared)
[ 7172.614851] BTRFS info (device dm-3): qgroup scan completed
(inconsistency flag cleared)
[ 7242.870259] BTRFS info (device dm-3): qgroup scan completed
(inconsistency flag cleared)
[ 7321.466931] BTRFS info (device dm-3): qgroup scan completed
(inconsistency flag cleared)
[0.00] Initializing cgroup subsys cpuset

The even stranger part is that the last 2 stdout dump files exist but
are empty :

-rw-r--r-- 1 root root   21 Sep 18 10:29 snap32.step5
-rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap32.step6
-rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap33.step1
-rw-r--r-- 1 root root 3.3K Sep 18 10:29 snap33.step3
-rw-r--r-- 1 root root   21 Sep 18 10:30 snap33.step5
-rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap33.step6
-rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap34.step1
-rw-r--r-- 1 root root    0 Sep 18 10:30 snap34.step3 <==
-rw-r--r-- 1 root root    0 Sep 18 10:30 snap34.step5 <==

The mentioned steps are as follows :

0) Rsync data from the next ext4 "snapshot" to the subvolume
1) Do 'sync; btrfs qgroup show -⁠prce -⁠-⁠raw' and save the output <==
2) Create the needed readonly snapshot on btrfs
3) Do 'sync; btrfs qgroup show -⁠prce -⁠-⁠raw' and save the output <==
4) Avoid doing IO if possible until step 6)
5) Do 'btrfs quota rescan -⁠w' and save it <==
6) Do 'sync; btrfs qgroup show -⁠prce -⁠-⁠raw' and save the output <==

The resulting files are available here:
http://speed47.net/tmp2/qgroup.tar.gz
The run2 is the more complete one, during run1 the machine crashed
even faster.
It's interesting to note, however, that it seems to have crashed the
same way and at the same step in the process.


Your data really helps a lot!!

And the good news is, the qgroup accounting part is working as expected.
Although I only checked the step 1/3/6 output for about 5 snaps, they are all OK.

I can make a script to cross-check them, but from the last few results, I
think qgroup works fine.


I'm more confident about the minus numbers, which should be a result of
deleted subvolumes; the real problem is that such qgroups are not handled
well by qgroup rescan.


I'll try to add a hot fix for such cases if needed.
But right now, I don't have a good idea for it until Mark's subtree
rescan work.


Maybe I can add a new option for btrfs-progs to automatically remove the 
qgroup and trigger a rescan?



Actually about that, I forgot I did set up netconsole before starting
the second run after the first "muted" crash, and it did work : even if
I have no logs in kern.log, netconsole managed to send them to my other
machine before going down, so here it is :

---
[ 5738.172692] BUG: unable to handle kernel NULL pointer dereference at
01f0
[ 5738.172702] IP: [] start_transaction+0x1b/0x580
[btrfs]
[ 5738.172719] PGD c0aa7067 PUD c0aa6067 PMD 0
[ 5738.172723] Oops:  [#1] SMP
[ 5738.172726] Modules linked in: netconsole configfs xts gf128mul drbg
ansi_cprng xt_multiport xt_comment xt_conntrack xt_nat xt_tcpudp
nfnetlink_queue nfnetlink_log nfnetlink nf_conntrack_ftp
nf_conntrack_sane iptable_security iptable_filter iptable_mangle
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
nf_conntrack iptable_raw ip_tables x_tables nfsd auth_rpcgss nfs_acl nfs
cmac dm_crypt rfcomm bnep lockd grace sunrpc fscache binfmt_misc
intel_rapl snd_hda_codec_realtek iosf_mbi x86_pkg_temp_thermal
intel_powerclamp kvm_intel snd_hda_codec_generic snd_hda_intel
snd_hda_codec kvm eeepc_wmi asus_wmi snd_hda_core btusb sparse_keymap
btrtl snd_hwdep btbcm snd_pcm btintel 8021q bluetooth snd_seq_midi
dm_multipath snd_seq_midi_event garp snd_rawmidi mrp snd_seq stp llc
snd_seq_device snd_timer crct10dif_pclmul crc32_pclmul 

Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-18 Thread Stéphane Lesimple

On 2015-09-18 02:59, Qu Wenruo wrote:

Stéphane Lesimple wrote on 2015/09/17 20:47 +0200:

On 2015-09-17 12:41, Qu Wenruo wrote:
In the meantime, I've reactivated quotas, umounted the filesystem 
and

ran a btrfsck on it : as you would expect, there's no qgroup problem
reported so far.


At least, rescan code is working without problem.


I'll clear all my snapshots, run a quota rescan, then
re-create them one by one by rsyncing from the ext4 system I still
have.

Maybe I'll run into the issue again.



Would you mind to do the following check for each subvolume rsync?

1) Do 'sync; btrfs qgroup show -prce --raw' and save the output
2) Create the needed snapshot
3) Do 'sync; btrfs qgroup show -prce --raw' and save the output
4) Avoid doing IO if possible until step 6)
5) Do 'btrfs quota rescan -w' and save it
6) Do 'sync; btrfs qgroup show -prce --raw' and save the output
7) Rsync data from ext4 to the newly created snapshot

The point is, as you mentioned, rescan is working fine, so we can compare
the output from 3), 6) and 1) to see which qgroup accounting numbers
change.

And if they differ, meaning that the qgroup update at write time OR at
snapshot creation has something wrong, at least we can narrow the
problem down to the qgroup update routine or to snapshot creation.


I was about to do that, but first there's something that sounds 
strange
: I've begun by trashing all my snapshots, then ran a quota rescan, 
and

waited for it to complete, to start on a sane base.
However, this is the output of qgroup show now :


By "trashing", did you mean deleting all the files inside the 
subvolume?

Or "btrfs subv del"?


Sorry for the confusion here, yes, I meant btrfs subvolume del.


qgroupid  rfer excl max_rfer max_excl
parent  child
     
--  -
0/5  1638416384 none none
--- ---
0/1906   16578480291841657848029184 none none
--- ---
0/1909124950921216 124950921216 none none
--- ---
0/1911   10545872936961054587293696 none none
--- ---
0/3270 23727300608  23727300608 none none
--- ---
0/3314 23206055936  23206055936 none none
--- ---
0/3317 184729968640 none none
--- ---
0/3318 22235709440 18446744073708421120 none none
--- ---
0/3319 222403338240 none none
--- ---
0/3320 222896087040 none none
--- ---
0/3321 222896087040 none none
--- ---
0/3322 184611512320 none none
--- ---
0/3323 184239022080 none none
--- ---
0/3324 184239022080 none none
--- ---
0/3325 184635064320 none none
--- ---
0/3326 184635064320 none none
--- ---
0/3327 184635064320 none none
--- ---
0/3328 184635064320 none none
--- ---
0/3329 185854279680 none none
--- ---
0/3330 18621472768 18446744073251348480 none none
--- ---
0/3331 186214727680 none none
--- ---
0/3332 186214727680 none none
--- ---
0/ 187830763520 none none
--- ---
0/3334 187998044160 none none
--- ---
0/3335 187998044160 none none
--- ---
0/3336 188162170880 none none
--- ---
0/3337 188162662400 none none
--- ---
0/3338 188162662400 none none
--- ---
0/3339 188162662400 none none
--- ---
0/3340 188163645440 none none
--- ---
0/3341  7530119168   7530119168 none none
--- ---
0/3342  49192837120 none none
--- ---
0/3343  49217249280 none none
--- ---
0/3344  49217249280 none none
--- ---
0/3345  6503317504 18446744073690902528 none none
--- ---
0/3346  65034526720 none none
--- ---
0/3347  65095147520 none none
--- ---
0/3348  65157939200 none none
--- ---
0/3349  

Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-18 Thread Stéphane Lesimple

Le 2015-09-18 09:36, Stéphane Lesimple a écrit :

Sure, I did a quota disable / quota enable before running the snapshot
debug procedure, so the qgroups were clean again when I started :

qgroupid  rfer  excl max_rfer max_excl parent  
child
       --  
-
0/5  16384 16384 none none --- 
---
0/1906   1657848029184 1657848029184 none none --- 
---
0/1909124950921216  124950921216 none none --- 
---
0/1911   1054587293696 1054587293696 none none --- 
---
0/3270 23727300608   23727300608 none none --- 
---
0/3314 23221784576   23221784576 none none --- 
---
0/3341  74792755207479275520 none none --- 
---
0/3367 24185790464   24185790464 none none --- 
---


The test is running, I expect to post the results within an hour or 
two.


Well, my system crashed twice while running the procedure...
By "crashed" I mean : the machine no longer pings, and nothing is logged 
in kern.log unfortunately :


[ 7096.735731] BTRFS info (device dm-3): qgroup scan completed 
(inconsistency flag cleared)
[ 7172.614851] BTRFS info (device dm-3): qgroup scan completed 
(inconsistency flag cleared)
[ 7242.870259] BTRFS info (device dm-3): qgroup scan completed 
(inconsistency flag cleared)
[ 7321.466931] BTRFS info (device dm-3): qgroup scan completed 
(inconsistency flag cleared)

[0.00] Initializing cgroup subsys cpuset

The even stranger part is that the last 2 stdout dump files exist but 
are empty :


-rw-r--r-- 1 root root   21 Sep 18 10:29 snap32.step5
-rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap32.step6
-rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap33.step1
-rw-r--r-- 1 root root 3.3K Sep 18 10:29 snap33.step3
-rw-r--r-- 1 root root   21 Sep 18 10:30 snap33.step5
-rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap33.step6
-rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap34.step1
-rw-r--r-- 1 root root    0 Sep 18 10:30 snap34.step3 <==
-rw-r--r-- 1 root root    0 Sep 18 10:30 snap34.step5 <==

The mentioned steps are as follows :

0) Rsync data from the next ext4 "snapshot" to the subvolume
1) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
2) Create the needed readonly snapshot on btrfs
3) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
4) Avoid doing IO if possible until step 6)
5) Do 'btrfs quota rescan -w' and save it <==
6) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==

The resulting files are available here: 
http://speed47.net/tmp2/qgroup.tar.gz
The run2 is the more complete one, during run1 the machine crashed even 
faster.
It's interesting to note, however, that it seems to have crashed the 
same way and at the same step in the process.
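
Since these crashes leave nothing in kern.log, capturing the oops over the 
network with netconsole is probably the only way to see it; a minimal sketch 
(interface name, IP addresses, ports and MAC address are placeholders):

  # on the crashing machine:
  modprobe netconsole netconsole=6665@192.168.0.10/eth0,6666@192.168.0.20/00:11:22:33:44:55
  # on the receiving machine (exact syntax varies between netcat flavours):
  nc -u -l -p 6666 | tee netconsole.log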


As the machine is now, qgroups seem OK :

~# btrfs qgroup show -pcre --raw /tank/
qgroupid  rfer  excl max_rfer max_excl parent  
child
       --  
-
0/5  32768 32768 none none --- 
---
0/1906   3315696058368 3315696058368 none none --- 
---
0/1909249901842432  249901842432 none none --- 
---
0/1911   2109174587392 2109174587392 none none --- 
---
0/3270 47454601216   47454601216 none none --- 
---
0/3314 46408499200 32768 none none --- 
---
0/3341 14991097856 32768 none none --- 
---
0/3367 48371580928   48371580928 none none --- 
---
0/5335 56523751424 280592384 none none --- 
---
0/5336 601752535042599960576 none none --- 
---
0/5337 45751746560 250888192 none none --- 
---
0/5338 45804650496 186531840 none none --- 
---
0/5339 45875167232 190521344 none none --- 
---
0/5340 45933486080327680 none none --- 
---
0/5341 45933502464344064 none none --- 
---
0/5342 46442815488  35454976 none none --- 
---
0/5343 46442520576  30638080 none none --- 
---
0/5344 46448312320  36495360 none none --- 
---
0/5345 46425235456  86204416 none none --- 
---
0/5346 46081941504 119398400 none none --- 
---
0/5347 46402715648  55615488 none none --- 
---
0/5348 46403534848  50528256 none none --- 
---
0/5349 45486301184  91463680 none none --- 
---
0/5351 46414635008393216 none none --- 
---
0/5352 

Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-18 Thread Stéphane Lesimple

Le 2015-09-18 12:15, Stéphane Lesimple a écrit :

Le 2015-09-18 09:36, Stéphane Lesimple a écrit :

Sure, I did a quota disable / quota enable before running the snapshot
debug procedure, so the qgroups were clean again when I started :

qgroupid  rfer  excl max_rfer max_excl parent  
child
       --  
-
0/5  16384 16384 none none --- 
---
0/1906   1657848029184 1657848029184 none none --- 
---
0/1909124950921216  124950921216 none none --- 
---
0/1911   1054587293696 1054587293696 none none --- 
---
0/3270 23727300608   23727300608 none none --- 
---
0/3314 23221784576   23221784576 none none --- 
---
0/3341  74792755207479275520 none none --- 
---
0/3367 24185790464   24185790464 none none --- 
---


The test is running, I expect to post the results within an hour or 
two.


Well, my system crashed twice while running the procedure...
By "crashed" I mean : the machine no longer pings, and nothing is
logged in kern.log unfortunately :

[ 7096.735731] BTRFS info (device dm-3): qgroup scan completed
(inconsistency flag cleared)
[ 7172.614851] BTRFS info (device dm-3): qgroup scan completed
(inconsistency flag cleared)
[ 7242.870259] BTRFS info (device dm-3): qgroup scan completed
(inconsistency flag cleared)
[ 7321.466931] BTRFS info (device dm-3): qgroup scan completed
(inconsistency flag cleared)
[0.00] Initializing cgroup subsys cpuset

The even stranger part is that the last 2 stdout dump files exist but
are empty :

-rw-r--r-- 1 root root   21 Sep 18 10:29 snap32.step5
-rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap32.step6
-rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap33.step1
-rw-r--r-- 1 root root 3.3K Sep 18 10:29 snap33.step3
-rw-r--r-- 1 root root   21 Sep 18 10:30 snap33.step5
-rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap33.step6
-rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap34.step1
-rw-r--r-- 1 root root    0 Sep 18 10:30 snap34.step3 <==
-rw-r--r-- 1 root root    0 Sep 18 10:30 snap34.step5 <==

The mentioned steps are as follows :

0) Rsync data from the next ext4 "snapshot" to the subvolume
1) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
2) Create the needed readonly snapshot on btrfs
3) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
4) Avoid doing IO if possible until step 6)
5) Do 'btrfs quota rescan -w' and save it <==
6) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==

The resulting files are available here: 
http://speed47.net/tmp2/qgroup.tar.gz
The run2 is the more complete one, during run1 the machine crashed even 
faster.

It's interesting to note, however, that it seems to have crashed the
same way and at the same step in the process.


Actually about that, I forgot I did set up netconsole before starting 
the second run after the first "muted" crash, and it did work : even if 
I have no logs in kern.log, netconsole managed to send them to my other 
machine before going down, so here it is :


---
[ 5738.172692] BUG: unable to handle kernel NULL pointer dereference at 
01f0
[ 5738.172702] IP: [] start_transaction+0x1b/0x580 
[btrfs]

[ 5738.172719] PGD c0aa7067 PUD c0aa6067 PMD 0
[ 5738.172723] Oops:  [#1] SMP
[ 5738.172726] Modules linked in: netconsole configfs xts gf128mul drbg 
ansi_cprng xt_multiport xt_comment xt_conntrack xt_nat xt_tcpudp 
nfnetlink_queue nfnetlink_log nfnetlink nf_conntrack_ftp 
nf_conntrack_sane iptable_security iptable_filter iptable_mangle 
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
nf_conntrack iptable_raw ip_tables x_tables nfsd auth_rpcgss nfs_acl nfs 
cmac dm_crypt rfcomm bnep lockd grace sunrpc fscache binfmt_misc 
intel_rapl snd_hda_codec_realtek iosf_mbi x86_pkg_temp_thermal 
intel_powerclamp kvm_intel snd_hda_codec_generic snd_hda_intel 
snd_hda_codec kvm eeepc_wmi asus_wmi snd_hda_core btusb sparse_keymap 
btrtl snd_hwdep btbcm snd_pcm btintel 8021q bluetooth snd_seq_midi 
dm_multipath snd_seq_midi_event garp snd_rawmidi mrp snd_seq stp llc 
snd_seq_device snd_timer crct10dif_pclmul crc32_pclmul snd 
ghash_clmulni_intel cryptd serio_raw soundcore mei_me mei lpc_ich shpchp 
mac_hid parport_pc ppdev nct6775 hwmon_vid coretemp lp parport btrfs 
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor 
async_tx xor raid6_pq raid0 multipath linear nbd raid1 i915 e1000e 
i2c_algo_bit drm_kms_helper syscopyarea ptp sysfillrect sysimgblt 
fb_sys_fops psmouse ahci drm libahci pps_core wmi video [last unloaded: 
netconsole]
[ 5738.172831] CPU: 1 PID: 10932 Comm: kworker/u4:14 Not tainted 
4.3.0-rc1 #1
[ 5738.172833] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005 
01/06/2014
[ 5738.172843] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper 

Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-17 Thread Stéphane Lesimple

Le 2015-09-17 05:03, Qu Wenruo a écrit :

Stéphane Lesimple wrote on 2015/09/16 22:41 +0200:

Le 2015-09-16 22:18, Duncan a écrit :
Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as 
excerpted:




Well actually it's the (d) option ;)
I activate the quota feature for only one reason : being able to track
down how much space my snapshots are taking.


Yeah, that's completely one of the ideal use case of btrfs qgroup.

But I'm quite curious about the btrfsck error report on qgroup.

If btrfsck report such error, it means either I'm too confident about
the recent qgroup accounting rework, or btrfsck has some bug which I
didn't take much consideration during the kernel rework.

Would you please provide the full result of previous btrfsck with 
qgroup error?


Sure, I've saved the log somewhere just in case, here you are :

Counts for qgroup id: 3359 are different
our:referenced 7530119168 referenced compressed 7530119168
disk:   referenced 7530086400 referenced compressed 7530086400
diff:   referenced 32768 referenced compressed 32768
our:exclusive 49152 exclusive compressed 49152
disk:   exclusive 32768 exclusive compressed 32768
diff:   exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3361 are different
our:referenced 7530119168 referenced compressed 7530119168
disk:   referenced 7530086400 referenced compressed 7530086400
diff:   referenced 32768 referenced compressed 32768
our:exclusive 49152 exclusive compressed 49152
disk:   exclusive 32768 exclusive compressed 32768
diff:   exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3362 are different
our:referenced 7530119168 referenced compressed 7530119168
disk:   referenced 7530086400 referenced compressed 7530086400
diff:   referenced 32768 referenced compressed 32768
our:exclusive 49152 exclusive compressed 49152
disk:   exclusive 32768 exclusive compressed 32768
diff:   exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3363 are different
our:referenced 7530119168 referenced compressed 7530119168
disk:   referenced 7530086400 referenced compressed 7530086400
diff:   referenced 32768 referenced compressed 32768
our:exclusive 49152 exclusive compressed 49152
disk:   exclusive 32768 exclusive compressed 32768
diff:   exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3361 are different
our:referenced 7530119168 referenced compressed 7530119168
disk:   referenced 7530086400 referenced compressed 7530086400
diff:   referenced 32768 referenced compressed 32768
our:exclusive 49152 exclusive compressed 49152
disk:   exclusive 32768 exclusive compressed 32768
diff:   exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3362 are different
our:referenced 7530119168 referenced compressed 7530119168
disk:   referenced 7530086400 referenced compressed 7530086400
diff:   referenced 32768 referenced compressed 32768
our:exclusive 49152 exclusive compressed 49152
disk:   exclusive 32768 exclusive compressed 32768
diff:   exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3363 are different
our:referenced 7530119168 referenced compressed 7530119168
disk:   referenced 7530086400 referenced compressed 7530086400
diff:   referenced 32768 referenced compressed 32768
our:exclusive 49152 exclusive compressed 49152
disk:   exclusive 32768 exclusive compressed 32768
diff:   exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3364 are different
our:referenced 7530119168 referenced compressed 7530119168
disk:   referenced 7530086400 referenced compressed 7530086400
diff:   referenced 32768 referenced compressed 32768
our:exclusive 49152 exclusive compressed 49152
disk:   exclusive 32768 exclusive compressed 32768
diff:   exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3365 are different
our:referenced 7530119168 referenced compressed 7530119168
disk:   referenced 7530086400 referenced compressed 7530086400
diff:   referenced 32768 referenced compressed 32768
our:exclusive 49152 exclusive compressed 49152
disk:   exclusive 32768 exclusive compressed 32768
diff:   exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3366 are different
our:referenced 7530119168 referenced compressed 7530119168
disk:   referenced 7530086400 referenced compressed 7530086400
diff:   referenced 32768 referenced compressed 32768
our:exclusive 16384 exclusive compressed 16384
disk:   exclusive 16384 exclusive compressed 16384


Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-17 Thread Stéphane Lesimple

Le 2015-09-16 15:04, Stéphane Lesimple a écrit :

I also disabled quota because it has almost for sure nothing
to do with the bug


As it turns out, it seems that this assertion was completely wrong.

I've got balance running for more than 16 hours now, without a crash. 
This is almost 50% of the work done without any issue. Before, a crash 
would happen within minutes, sometimes 1 hour, but not much more. The 
problem is, I didn't change anything on the filesystem, well, apart 
from the benign quota disable. So Qu's question about the qgroup errors 
in fsck made me wonder : if I activate quota again, it'll still continue 
to balance flawlessly, right ?


Well, it doesn't. I just ran btrfs quota enable on my filesystem, it 
completed successfully after some minutes (rescan -s said that no rescan 
was pending). Then less than 5 minutes later, the kernel crashed, at the 
same BUG_ON() than usually :


[60156.062082] BTRFS info (device dm-3): relocating block group 
972839452672 flags 129

[60185.203626] BTRFS info (device dm-3): found 1463 extents
[60414.452890] {btrfs} in insert_inline_extent_backref, got owner < 
BTRFS_FIRST_FREE_OBJECTID
[60414.452894] {btrfs} with bytenr=5197436141568 num_bytes=16384 
parent=5336636473344 root_objectid=3358 owner=1 offset=0 refs_to_add=1 
BTRFS_FIRST_FREE_OBJECTID=256

[60414.452924] [ cut here ]
[60414.452928] kernel BUG at fs/btrfs/extent-tree.c:1837!

owner is 1 again at this point in the code (this is still kernel 
4.3.0-rc1 with my added printks).


So I'll disable quota, again, and resume the balance. If I'm right, it 
should proceed without issue for 18 more hours !
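
Concretely, that amounts to something like the following (mount point is 
illustrative):

  btrfs quota disable /tank
  btrfs balance resume /tank        # or 'btrfs balance start' if nothing is paused
  watch -n 60 btrfs balance status /tank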


Qu, my filesystem is at your disposal :)

--
Stéphane.




Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-17 Thread Stéphane Lesimple

Le 2015-09-17 08:29, Stéphane Lesimple a écrit :

Le 2015-09-16 15:04, Stéphane Lesimple a écrit :

I also disabled quota because it has almost for sure nothing
to do with the bug


As it turns out, it seems that this assertion was completely wrong.

I've got balance running for more than 16 hours now, without a crash.
This is almost 50% of the work done without any issue. Before, a crash
would happen within minutes, sometimes 1 hour, but not much more. The
problem is, I didn't change anything to the filesystem, well, appart
from the benign quota disable. So Qu's question about the qgroups
errors in fsck made me wonder : if I activate quota again, it'll still
continue to balance flawlessly, right ?

Well, it doesn't. I just ran btrfs quota enable on my filesystem, it
completed successfully after some minutes (rescan -s said that no
rescan was pending). Then less than 5 minutes later, the kernel
crashed, at the same BUG_ON() than usually :

[60156.062082] BTRFS info (device dm-3): relocating block group
972839452672 flags 129
[60185.203626] BTRFS info (device dm-3): found 1463 extents
[60414.452890] {btrfs} in insert_inline_extent_backref, got owner <
BTRFS_FIRST_FREE_OBJECTID
[60414.452894] {btrfs} with bytenr=5197436141568 num_bytes=16384
parent=5336636473344 root_objectid=3358 owner=1 offset=0 refs_to_add=1
BTRFS_FIRST_FREE_OBJECTID=256
[60414.452924] [ cut here ]
[60414.452928] kernel BUG at fs/btrfs/extent-tree.c:1837!

owner is=1 again at this point in the code (this is still kernel
4.3.0-rc1 with my added printks).

So I'll disable quota, again, and resume the balance. If I'm right, it
should proceed without issue for 18 more hours !


Damn, wrong again. It just re-crashed without quota enabled :(
The fact that it went perfectly well for 17+ hours and crashed minutes 
after I reactivated quota might be by complete chance then ...


[ 5487.706499] {btrfs} in insert_inline_extent_backref, got owner < 
BTRFS_FIRST_FREE_OBJECTID
[ 5487.706504] {btrfs} with bytenr=6906661109760 num_bytes=16384 
parent=6905020874752 root_objectid=18446744073709551608 owner=1 offset=0 
refs_to_add=1 BTRFS_FIRST_FREE_OBJECTID=256

[ 5487.706536] [ cut here ]
[ 5487.706539] kernel BUG at fs/btrfs/extent-tree.c:1837!

For reference, the crash I had earlier this morning was as follows :

[60414.452894] {btrfs} with bytenr=5197436141568 num_bytes=16384 
parent=5336636473344 root_objectid=3358 owner=1 offset=0 refs_to_add=1 
BTRFS_FIRST_FREE_OBJECTID=256


So, this is a completely different part of the filesystem.
The bug is always the same though, owner=1 where it shouldn't be < 256.

Balance cancelled.

To me, it sounds like some sort of race condition. But I'm out of ideas 
on what to test now.


--
Stéphane.



Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-17 Thread Stéphane Lesimple

Le 2015-09-17 08:42, Qu Wenruo a écrit :

Stéphane Lesimple wrote on 2015/09/17 08:11 +0200:

Le 2015-09-17 05:03, Qu Wenruo a écrit :

Stéphane Lesimple wrote on 2015/09/16 22:41 +0200:

Le 2015-09-16 22:18, Duncan a écrit :

Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as
excerpted:



Well actually it's the (d) option ;)
I activate the quota feature for only one reason : being able to 
track

down how much space my snapshots are taking.


Yeah, that's completely one of the ideal use case of btrfs qgroup.

But I'm quite curious about the btrfsck error report on qgroup.

If btrfsck report such error, it means either I'm too confident about
the recent qgroup accounting rework, or btrfsck has some bug which I
didn't take much consideration during the kernel rework.

Would you please provide the full result of previous btrfsck with
qgroup error?


Sure, I've saved the log somewhere just in case, here your are :

[...]

Thanks for your log, pretty interesting result.

BTW, did you enabled qgroup from old kernel earlier than 4.2-rc1?
If so, I would be much relaxed as they can be the problem of old 
kernels.


The mkfs.btrfs was done under 3.19, but I'm almost sure I enabled quota 
under 4.2.0 precisely. My kern.log tends to confirm that (looking for 
'qgroup scan completed').



If it's OK for you, would you please enable quota after reproducing
the bug and use for sometime and recheck it?


Sure, I've just reproduced the bug twice as I wanted, and posted the 
info, so now I've cancelled the balance and I can reenable quota. Will 
do it under 4.3.0-rc1. I'll keep you posted if btrfsck complains about 
it in the following days.


Regards,

--
Stéphane.



Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-17 Thread Stéphane Lesimple

Le 2015-09-17 10:11, Qu Wenruo a écrit :

Stéphane Lesimple wrote on 2015/09/17 10:02 +0200:

Le 2015-09-17 08:42, Qu Wenruo a écrit :

Stéphane Lesimple wrote on 2015/09/17 08:11 +0200:

Le 2015-09-17 05:03, Qu Wenruo a écrit :

Stéphane Lesimple wrote on 2015/09/16 22:41 +0200:

Le 2015-09-16 22:18, Duncan a écrit :

Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as
excerpted:



Well actually it's the (d) option ;)
I activate the quota feature for only one reason : being able to 
track

down how much space my snapshots are taking.


Yeah, that's completely one of the ideal use case of btrfs qgroup.

But I'm quite curious about the btrfsck error report on qgroup.

If btrfsck report such error, it means either I'm too confident 
about
the recent qgroup accounting rework, or btrfsck has some bug which 
I

didn't take much consideration during the kernel rework.

Would you please provide the full result of previous btrfsck with
qgroup error?


Sure, I've saved the log somewhere just in case, here your are :

[...]

Thanks for your log, pretty interesting result.

BTW, did you enabled qgroup from old kernel earlier than 4.2-rc1?
If so, I would be much relaxed as they can be the problem of old 
kernels.


The mkfs.btrfs was done under 3.19, but I'm almost sure I enabled 
quota

under 4.2.0 precisely. My kern.log tends to confirm that (looking for
'qgroup scan completed').


Emmm, seems I need to pay more attention on this case now.
Any info about the workload for this btrfs fs?




If it's OK for you, would you please enable quota after reproducing
the bug and use for sometime and recheck it?


Sure, I've just reproduced the bug twice as I wanted, and posted the
info, so now I've cancelled the balance and I can reenable quota. Will
do it under 4.3.0-rc1. I'll keep you posted if btrfsck complains about
it in the following days.

Regards,


Thanks for your patience and detailed report.


You're very welcome.


But I still have another question, did you do any snapshot deletion
after quota enabled?
(I'll assume you did it, as there are a lot of backup snapshot, old
ones should be already deleted)


Actually no : this btrfs system is quite new (less than a week old) as 
I'm migrating from mdadm(raid1)+ext4 to btrfs. So those snapshots were 
actually rsynced one by one from my hardlinks-based "snapshots" under 
ext4 (those pseudo-snapshots are created using a program named 
"rsnapshot", if you know it. This is basically a wrapper to cp -la). I 
haven't yet activated automatic snapshot creation/deletion on my btrfs 
system, due to the bugs I'm tripping on. So no snapshot was deleted.



That's one of the known bug and Mark is working on it actively.
If you delete non-empty snapshot a lot, then I'd better add a hot fix
to mark qgroup inconsistent after snapshot delete, and trigger a
rescan if possible.


I've made a btrfs-image of the filesystem just before disabling quotas 
(which I did to get a clean btrfsck and eliminate quotas from the 
equation trying to reproduce the bug I have). Would it be of any use if 
I drop it somewhere for you to pick it up ? (2.9G in size).


In the meantime, I've reactivated quotas, umounted the filesystem and 
ran a btrfsck on it : as you would expect, there's no qgroup problem 
reported so far. I'll clear all my snapshots, run an quota rescan, then 
re-create them one by one by rsyncing from my ext4 system I still have. 
Maybe I'll run into the issue again.


--
Stéphane.



Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-17 Thread Qu Wenruo



在 2015年09月17日 18:08, Stéphane Lesimple 写道:

Le 2015-09-17 10:11, Qu Wenruo a écrit :

Stéphane Lesimple wrote on 2015/09/17 10:02 +0200:

Le 2015-09-17 08:42, Qu Wenruo a écrit :

Stéphane Lesimple wrote on 2015/09/17 08:11 +0200:

Le 2015-09-17 05:03, Qu Wenruo a écrit :

Stéphane Lesimple wrote on 2015/09/16 22:41 +0200:

Le 2015-09-16 22:18, Duncan a écrit :

Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as
excerpted:



Well actually it's the (d) option ;)
I activate the quota feature for only one reason : being able to
track
down how much space my snapshots are taking.


Yeah, that's completely one of the ideal use case of btrfs qgroup.

But I'm quite curious about the btrfsck error report on qgroup.

If btrfsck report such error, it means either I'm too confident about
the recent qgroup accounting rework, or btrfsck has some bug which I
didn't take much consideration during the kernel rework.

Would you please provide the full result of previous btrfsck with
qgroup error?


Sure, I've saved the log somewhere just in case, here your are :

[...]

Thanks for your log, pretty interesting result.

BTW, did you enabled qgroup from old kernel earlier than 4.2-rc1?
If so, I would be much relaxed as they can be the problem of old
kernels.


The mkfs.btrfs was done under 3.19, but I'm almost sure I enabled quota
under 4.2.0 precisely. My kern.log tends to confirm that (looking for
'qgroup scan completed').


Emmm, seems I need to pay more attention on this case now.
Any info about the workload for this btrfs fs?




If it's OK for you, would you please enable quota after reproducing
the bug and use for sometime and recheck it?


Sure, I've just reproduced the bug twice as I wanted, and posted the
info, so now I've cancelled the balance and I can reenable quota. Will
do it under 4.3.0-rc1. I'll keep you posted if btrfsck complains about
it in the following days.

Regards,


Thanks for your patience and detailed report.


You're very welcome.


But I still have another question, did you do any snapshot deletion
after quota enabled?
(I'll assume you did it, as there are a lot of backup snapshot, old
ones should be already deleted)


Actually no : this btrfs system is quite new (less than a week old) as
I'm migrating from mdadm(raid1)+ext4 to btrfs. So those snapshots were
actually rsynced one by one from my hardlinks-based "snapshots" under
ext4 (those pseudo-snapshots are created using a program named
"rsnapshot", if you know it. This is basically a wrapper to cp -la). I
didn't activate yet an automatic snapshot/delete on my btrfs system, due
to the bugs I'm tripping on. So no snapshot was deleted.


Now things are getting tricky: with all known bugs ruled out, it must 
be another hidden bug, even though we reworked the qgroup accounting code.





That's one of the known bug and Mark is working on it actively.
If you delete non-empty snapshot a lot, then I'd better add a hot fix
to mark qgroup inconsistent after snapshot delete, and trigger a
rescan if possible.


I've made a btrfs-image of the filesystem just before disabling quotas
(which I did to get a clean btrfsck and eliminate quotas from the
equation trying to reproduce the bug I have). Would it be of any use if
I drop it somewhere for you to pick it up ? (2.9G in size).


For a mismatch case, a static btrfs-image dump won't really help, as the 
important point is when, and by which operation, the qgroup accounting 
went wrong.




In the meantime, I've reactivated quotas, umounted the filesystem and
ran a btrfsck on it : as you would expect, there's no qgroup problem
reported so far.


At least, rescan code is working without problem.


I'll clear all my snapshots, run an quota rescan, then
re-create them one by one by rsyncing from my ext4 system I still have.
Maybe I'll run into the issue again.



Would you mind doing the following check for each subvolume rsync?

1) Do 'sync; btrfs qgroup show -prce --raw' and save the output
2) Create the needed snapshot
3) Do 'sync; btrfs qgroup show -prce --raw' and save the output
4) Avoid doing IO if possible until step 6)
5) Do 'btrfs quota rescan -w' and save it
6) Do 'sync; btrfs qgroup show -prce --raw' and save the output
7) Rsync data from ext4 to the newly created snapshot

The point is, as you mentioned, rescan is working fine, so we can compare 
the output from 3), 6) and 1) to see which qgroup accounting numbers change.


And if they differ, that means either the qgroup update at write time OR 
snapshot creation has something wrong; at least we can narrow the problem 
down to the qgroup update routine or to snapshot creation.
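
A loop along these lines would capture all of that automatically (a rough 
sketch only; the mount point, source path, subvolume names and snapshot 
naming scheme are placeholders, and step 4 is covered by simply doing 
nothing else between the snapshot and the final dump):

  #!/bin/sh
  MNT=/tank
  for n in $(seq 1 40); do
      # step 7 of the previous round / step 0: rsync the next ext4 "snapshot"
      rsync -aHAX /mnt/ext4/snap$n/ $MNT/bkp-quasar/
      sync; btrfs qgroup show -prce --raw $MNT > snap$n.step1
      # step 2: create the read-only snapshot
      btrfs subvolume snapshot -r $MNT/bkp-quasar $MNT/.snaps/snap$n
      sync; btrfs qgroup show -prce --raw $MNT > snap$n.step3
      # step 5: rescan and wait for it to finish
      btrfs quota rescan -w $MNT > snap$n.step5
      sync; btrfs qgroup show -prce --raw $MNT > snap$n.step6
  done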


Thanks,
Qu


Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-17 Thread Qu Wenruo



Stéphane Lesimple wrote on 2015/09/17 10:02 +0200:

Le 2015-09-17 08:42, Qu Wenruo a écrit :

Stéphane Lesimple wrote on 2015/09/17 08:11 +0200:

Le 2015-09-17 05:03, Qu Wenruo a écrit :

Stéphane Lesimple wrote on 2015/09/16 22:41 +0200:

Le 2015-09-16 22:18, Duncan a écrit :

Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as
excerpted:



Well actually it's the (d) option ;)
I activate the quota feature for only one reason : being able to track
down how much space my snapshots are taking.


Yeah, that's completely one of the ideal use case of btrfs qgroup.

But I'm quite curious about the btrfsck error report on qgroup.

If btrfsck report such error, it means either I'm too confident about
the recent qgroup accounting rework, or btrfsck has some bug which I
didn't take much consideration during the kernel rework.

Would you please provide the full result of previous btrfsck with
qgroup error?


Sure, I've saved the log somewhere just in case, here your are :

[...]

Thanks for your log, pretty interesting result.

BTW, did you enabled qgroup from old kernel earlier than 4.2-rc1?
If so, I would be much relaxed as they can be the problem of old kernels.


The mkfs.btrfs was done under 3.19, but I'm almost sure I enabled quota
under 4.2.0 precisely. My kern.log tends to confirm that (looking for
'qgroup scan completed').


Emmm, seems I need to pay more attention to this case now.
Any info about the workload for this btrfs fs?




If it's OK for you, would you please enable quota after reproducing
the bug and use for sometime and recheck it?


Sure, I've just reproduced the bug twice as I wanted, and posted the
info, so now I've cancelled the balance and I can reenable quota. Will
do it under 4.3.0-rc1. I'll keep you posted if btrfsck complains about
it in the following days.

Regards,


Thanks for your patience and detailed report.

But I still have another question: did you do any snapshot deletion 
after quota was enabled?
(I'll assume you did, as there are a lot of backup snapshots and old ones 
should already have been deleted.)


That's one of the known bugs and Mark is working on it actively.
If you delete non-empty snapshots a lot, then I'd better add a hotfix to 
mark qgroups inconsistent after snapshot deletion, and trigger a rescan if 
possible.


Thanks,
Qu


Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-17 Thread Qu Wenruo



Stéphane Lesimple wrote on 2015/09/17 08:11 +0200:

Le 2015-09-17 05:03, Qu Wenruo a écrit :

Stéphane Lesimple wrote on 2015/09/16 22:41 +0200:

Le 2015-09-16 22:18, Duncan a écrit :

Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as
excerpted:



Well actually it's the (d) option ;)
I activate the quota feature for only one reason : being able to track
down how much space my snapshots are taking.


Yeah, that's completely one of the ideal use case of btrfs qgroup.

But I'm quite curious about the btrfsck error report on qgroup.

If btrfsck report such error, it means either I'm too confident about
the recent qgroup accounting rework, or btrfsck has some bug which I
didn't take much consideration during the kernel rework.

Would you please provide the full result of previous btrfsck with
qgroup error?


Sure, I've saved the log somewhere just in case, here your are :

Counts for qgroup id: 3359 are different
our:referenced 7530119168 referenced compressed 7530119168
disk:   referenced 7530086400 referenced compressed 7530086400
diff:   referenced 32768 referenced compressed 32768
our:exclusive 49152 exclusive compressed 49152
disk:   exclusive 32768 exclusive compressed 32768
diff:   exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3361 are different
our:referenced 7530119168 referenced compressed 7530119168
disk:   referenced 7530086400 referenced compressed 7530086400
diff:   referenced 32768 referenced compressed 32768
our:exclusive 49152 exclusive compressed 49152
disk:   exclusive 32768 exclusive compressed 32768
diff:   exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3362 are different
our:referenced 7530119168 referenced compressed 7530119168
disk:   referenced 7530086400 referenced compressed 7530086400
diff:   referenced 32768 referenced compressed 32768
our:exclusive 49152 exclusive compressed 49152
disk:   exclusive 32768 exclusive compressed 32768
diff:   exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3363 are different
our:referenced 7530119168 referenced compressed 7530119168
disk:   referenced 7530086400 referenced compressed 7530086400
diff:   referenced 32768 referenced compressed 32768
our:exclusive 49152 exclusive compressed 49152
disk:   exclusive 32768 exclusive compressed 32768
diff:   exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3361 are different
our:referenced 7530119168 referenced compressed 7530119168
disk:   referenced 7530086400 referenced compressed 7530086400
diff:   referenced 32768 referenced compressed 32768
our:exclusive 49152 exclusive compressed 49152
disk:   exclusive 32768 exclusive compressed 32768
diff:   exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3362 are different
our:referenced 7530119168 referenced compressed 7530119168
disk:   referenced 7530086400 referenced compressed 7530086400
diff:   referenced 32768 referenced compressed 32768
our:exclusive 49152 exclusive compressed 49152
disk:   exclusive 32768 exclusive compressed 32768
diff:   exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3363 are different
our:referenced 7530119168 referenced compressed 7530119168
disk:   referenced 7530086400 referenced compressed 7530086400
diff:   referenced 32768 referenced compressed 32768
our:exclusive 49152 exclusive compressed 49152
disk:   exclusive 32768 exclusive compressed 32768
diff:   exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3364 are different
our:referenced 7530119168 referenced compressed 7530119168
disk:   referenced 7530086400 referenced compressed 7530086400
diff:   referenced 32768 referenced compressed 32768
our:exclusive 49152 exclusive compressed 49152
disk:   exclusive 32768 exclusive compressed 32768
diff:   exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3365 are different
our:referenced 7530119168 referenced compressed 7530119168
disk:   referenced 7530086400 referenced compressed 7530086400
diff:   referenced 32768 referenced compressed 32768
our:exclusive 49152 exclusive compressed 49152
disk:   exclusive 32768 exclusive compressed 32768
diff:   exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3366 are different
our:referenced 7530119168 referenced compressed 7530119168
disk:   referenced 7530086400 referenced compressed 7530086400
diff:   referenced 32768 referenced compressed 32768
our:exclusive 16384 exclusive compressed 16384
disk:

Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-17 Thread Stéphane Lesimple

Le 2015-09-17 12:41, Qu Wenruo a écrit :

In the meantime, I've reactivated quotas, umounted the filesystem and
ran a btrfsck on it : as you would expect, there's no qgroup problem
reported so far.


At least, rescan code is working without problem.


I'll clear all my snapshots, run an quota rescan, then
re-create them one by one by rsyncing from my ext4 system I still 
have.

Maybe I'll run into the issue again.



Would you mind to do the following check for each subvolume rsync?

1) Do 'sync; btrfs qgroup show -prce --raw' and save the output
2) Create the needed snapshot
3) Do 'sync; btrfs qgroup show -prce --raw' and save the output
4) Avoid doing IO if possible until step 6)
5) Do 'btrfs quota rescan -w' and save it
6) Do 'sync; btrfs qgroup show -prce --raw' and save the output
7) Rsync data from ext4 to the newly created snapshot

The point is, as you mentioned, rescan is working fine, we can compare
output from 3), 6) and 1) to see which qgroup accounting number
changes.

And if differs, which means the qgroup update at write time OR
snapshot creation has something wrong, at least we can locate the
problem to qgroup update routine or snapshot creation.


I was about to do that, but first there's something that sounds strange 
: I've begun by trashing all my snapshots, then ran a quota rescan, and 
waited for it to complete, to start on a sane base.

However, this is the output of qgroup show now :

qgroupid  rfer excl max_rfer max_excl 
parent  child
      
--  -
0/5  1638416384 none none 
--- ---
0/1906   16578480291841657848029184 none none 
--- ---
0/1909124950921216 124950921216 none none 
--- ---
0/1911   10545872936961054587293696 none none 
--- ---
0/3270 23727300608  23727300608 none none 
--- ---
0/3314 23206055936  23206055936 none none 
--- ---
0/3317 184729968640 none none 
--- ---
0/3318 22235709440 18446744073708421120 none none 
--- ---
0/3319 222403338240 none none 
--- ---
0/3320 222896087040 none none 
--- ---
0/3321 222896087040 none none 
--- ---
0/3322 184611512320 none none 
--- ---
0/3323 184239022080 none none 
--- ---
0/3324 184239022080 none none 
--- ---
0/3325 184635064320 none none 
--- ---
0/3326 184635064320 none none 
--- ---
0/3327 184635064320 none none 
--- ---
0/3328 184635064320 none none 
--- ---
0/3329 185854279680 none none 
--- ---
0/3330 18621472768 18446744073251348480 none none 
--- ---
0/3331 186214727680 none none 
--- ---
0/3332 186214727680 none none 
--- ---
0/ 187830763520 none none 
--- ---
0/3334 187998044160 none none 
--- ---
0/3335 187998044160 none none 
--- ---
0/3336 188162170880 none none 
--- ---
0/3337 188162662400 none none 
--- ---
0/3338 188162662400 none none 
--- ---
0/3339 188162662400 none none 
--- ---
0/3340 188163645440 none none 
--- ---
0/3341  7530119168   7530119168 none none 
--- ---
0/3342  49192837120 none none 
--- ---
0/3343  49217249280 none none 
--- ---
0/3344  49217249280 none none 
--- ---
0/3345  6503317504 18446744073690902528 none none 
--- ---
0/3346  65034526720 none none 
--- ---
0/3347  65095147520 none none 
--- ---
0/3348  65157939200 none none 
--- ---
0/3349  65157939200 none none 
--- ---
0/3350  65186856960 none none 
--- ---
0/3351  65215119360 none none 
---  

Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-16 Thread Stéphane Lesimple

Le 2015-09-16 07:02, Duncan a écrit :
Stéphane Lesimple posted on Tue, 15 Sep 2015 23:47:01 +0200 as 
excerpted:



Le 2015-09-15 16:56, Josef Bacik a écrit :

On 09/15/2015 10:47 AM, Stéphane Lesimple wrote:
I've been experiencing repetitive "kernel BUG" occurences in the 
past

few days trying to balance a raid5 filesystem after adding a new
drive.
It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools.


I've ran a scrub on this filesystem after the crash happened twice,
and if found no errors.

The BUG_ON() condition that my filesystem triggers is the following 
:


BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
// in insert_inline_extent_backref() of extent-tree.c.


Does btrfsck complain at all?


Just to elucidate a bit...
[...]
Which is where btrfs check comes in and why JB asked you to run it, 
since

unlike scrub, check is designed to catch filesystem logic errors.


Thanks for your clarification Duncan, that makes perfect sense.


You're right: even though btrfs scrub didn't complain, btrfsck does :

checking extents
bad metadata [4179166806016, 4179166822400) crossing stripe boundary
bad metadata [4179166871552, 4179166887936) crossing stripe boundary
bad metadata [4179166937088, 4179166953472) crossing stripe boundary


This is an actively in-focus bug ATM, and while I'm not a dev and can't
tell you for sure that it's behind the specific balance-related crash 
and

traces you posted (tho I believe it so), it certainly has the potential
to be that serious, yes.

The most common cause is a buggy btrfs-convert that was creating 
invalid
btrfs when converting from ext* at one point.  AFAIK they've hotfixed 
the
immediate convert issue, but are still actively working on a longer 
term
proper fix.  Meanwhile, while btrfs check does now detect the issue 
(and

even that is quite new code, added in 4.2 I believe), there's still no
real fix for what was after all a defective btrfs from the moment the
convert was done.
[...]
If, however, you created the filesystem using mkfs.btrfs, then the
problem must have occurred some other way.  Whether there's some other
cause beyond the known cause, a buggy btrfs-convert, has in fact been 
in

question, so in this case the devs are likely to be quite interested
indeed in your case and perhaps the filesystem history that brought you
to this point.  The ultimate fix is likely to be the same (unless the
devs have you test new fix code for btrfs check --repair), but I'd
strongly urge you to delay blowing away the filesystem, if possible,
until the devs have a chance to ask you to run other diagnostics and
perhaps even get a btrfs-image for them, since you may well have
accidentally found a corner-case they'll have trouble reproducing,
without your information.


Nice to know that this bug was already somewhat known, but I can confirm 
that it actually doesn't come from an ext4 conversion in my case.


Here is the filesystem history, which is actually quite short :
- FS created from scratch, no convert, on 2x4T devices using mkfs.btrfs 
with raid1 metadata, raid5 data. This is using the 4.2 tools and kernel 
3.19, so a couple incompat features were turned on by default (such as 
skinny metadata).
- Approx. 4T worth of files copied to it, a bit less, I had around 30G 
free after the copy.

- Upgraded to kernel 4.2.0
- Added a third 4T device to the filesystem
- Ran a balance to get an even distribution of data/metadata among the 3 
drives
- Kernel BUG after a couple of hours. The btrfs balance userspace tool 
segfaulted at the same time. Due to apport's default configuration (damn 
you, Ubuntu !), the core file was discarded, but I don't think the segfault 
is really interesting. The kernel trace is.


This was all done within ~1 week.

I've just created an image of the metadata, using btrfs-image -s. The 
image is 2.9G large, I can drop it somewhere in case a dev would like to 
have a look at it.
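
For reference, something along these lines produces such a dump (the device 
path is the same luks-mapped device as elsewhere in this thread; the output 
file name and compression level are arbitrary):

  # -s sanitizes file names, -c 9 asks for maximum compression
  btrfs-image -s -c 9 /dev/mapper/luks-WDC_WD30EZRX-00MMMB0_WD-WCAWZ3013164 /tmp/tank-metadata.img
  # a developer can later rebuild the metadata with:
  #   btrfs-image -r /tmp/tank-metadata.img /dev/sdX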


For what it's worth, I've been hitting another kernel BUG, almost 
certainly related, while trying to dev del the 3rd device, after 8 hours 
of work (kernel 4.1.7) :


kernel BUG at /home/kernel/COD/linux/fs/btrfs/extent-tree.c:2248!
in __btrfs_run_delayed_refs+0x11a1/0x1230 [btrfs]

Trace:
[] ? __percpu_counter_add+0x55/0x70
[] btrfs_run_delayed_refs.part.66+0x73/0x270 [btrfs]
[] btrfs_run_delayed_refs+0x17/0x20 [btrfs]
[] btrfs_should_end_transaction+0x49/0x60 [btrfs]
[] btrfs_drop_snapshot+0x472/0x880 [btrfs]
[] ? should_ignore_root.part.15+0x50/0x50 [btrfs]
[] merge_reloc_roots+0xd9/0x240 [btrfs]
[] relocate_block_group+0x269/0x670 [btrfs]
[] btrfs_relocate_block_group+0x1d6/0x2e0 [btrfs]
[] btrfs_relocate_chunk.isra.38+0x3e/0xc0 [btrfs]
[] btrfs_shrink_device+0x1d4/0x450 [btrfs]
[] btrfs_rm_device+0x323/0x810 [btrfs]
[] btrfs_ioctl+0x1e86/0x2b30 [btrfs]
[] ? filemap_map_pages+0x1d4/0x230
[] ? handle_mm_fault+0xd95/0x17e0
[] ? from_kgid_munged+0x12/0x20
[] ? cp_new_stat+0x140/0x160
[] do_vfs_ioctl+0x2f8/0x510
[] ? __do_page_fault+0x1b6/0x450
[] ? SYSC_newstat+0x2f/0x40
[] 

Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-16 Thread Holger Hoffstätte
On 09/16/15 12:28, Stéphane Lesimple wrote:
> Nice to know that this bug was already somewhat known, but I can
> confirm that it actually doesn't come from an ext4 conversion on my
> case.

In that case the "crossing stripe boundary" messages are false positives
in btrfs-progs-4.2: http://www.spinics.net/lists/linux-btrfs/msg47059.html

This should be fixed in the next release.

-h



Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-16 Thread Stéphane Lesimple

Le 2015-09-16 12:46, Holger Hoffstätte a écrit :

On 09/16/15 12:28, Stéphane Lesimple wrote:

Nice to know that this bug was already somewhat known, but I can
confirm that it actually doesn't come from an ext4 conversion on my
case.


In that case the "crossing stripe boundary" messages are false 
positives
in btrfs-progs-4.2: 
http://www.spinics.net/lists/linux-btrfs/msg47059.html


This should be fixed in the next release.


Out of curiosity I compiled the btrfs-progs-4.2 release patched with the 
diff you're referencing to fix the off-by-one error, and ran a btrfsck 
again.
Indeed those errors disappear and my filesystem seems clean in this 
regard. I also disabled quota because it almost surely has nothing to 
do with the bug, and now btrfsck is 100% happy:


-
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
Checking filesystem on 
/dev/mapper/luks-WDC_WD30EZRX-00MMMB0_WD-WCAWZ3013164

UUID: 6bec1608-d9c0-453e-87eb-8b8663c9010d
found 2922178546042 bytes used err is 0
total csum bytes: 2849102736
total tree bytes: 4697341952
total fs tree bytes: 1276395520
total extent tree bytes: 90963968
btree space waste bytes: 640514848
file data blocks allocated: 2959998808064
 referenced 2959997575168
btrfs-progs v4.2-dirty
-

So this is even more interesting, my filesystem is reported by scrub and 
fsck as being in perfect shape, but still crashes the kernel from time 
to time on balance.


Next step: reboot under 4.3.0-rc1 with my printks, run a balance, log 
the crash, reboot, balance again, crash again, and compare. If the same 
filesystem spot triggers the crash twice, that would point to an undetected 
metadata/filesystem internal integrity corruption; if it crashes at 2 
different spots, then maybe it's some kind of race condition that, for 
some reason, my system hits way more often than others.


--
Stéphane.



Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-16 Thread Stéphane Lesimple

Le 2015-09-16 22:18, Duncan a écrit :
Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as 
excerpted:



Le 2015-09-16 12:46, Holger Hoffstätte a écrit :



I also disabled quota because it has almost for sure nothing to
do with the bug, and now btrsfck is 100% happy:


Yes.  Quotas have been a continuing issue on btrfs.  AFAIK, they're on
the third rewrite now, and still have some bugs to work out.  So what
I've been recommending is unless you're (a) directly and specifically
working with the devs to find and fix the quota issues (in which case
please keep at it! =:^), either you (b) really depend on quotas working
and btrfs isn't appropriate for you at this time, since they're known 
to
be buggy, so use a more mature filesystem where they're known to work, 
or
(c) you don't actually need quotas at all, so disable them and remove 
the
quota tracking metadata, thus avoiding the bugs in the feature 
entirely.

=:^)


Well actually it's the (d) option ;)
I activate the quota feature for only one reason : being able to track 
down how much space my snapshots are taking.
I've been using ZFS in the past, and under btrfs I was really missing the 
"zfs list" command, which can tell you how much space a given snapshot is 
actually taking.
With quota enabled, I was able to somehow mimic zfs list with a perl 
script I wrote, btrfs-list :


PATHID TYPE REFER  USED
'tank'  -1   df - 2.66T 
(13.26G free)

   /tank 5  vol48.00K48.00K
   media  1906   subvol 1.04T 1.04T
   photos 1909   subvol   116.37G   116.37G
   main   1911   subvol   973.23G   973.23G
   bkp-slash  3270   subvol15.86G15.86G
   bkp-quasar 3314   subvol18.26G16.00K
  .snaps/bkp-quasar@2015-01-173317   rosnap17.77G40.76M
  .snaps/bkp-quasar@2015-03-063318   rosnap17.89G88.88M
  .snaps/bkp-quasar@2015-04-053319   rosnap17.92G90.97M
  .snaps/bkp-quasar@2015-05-313320   rosnap17.95G 1.02M
  .snaps/bkp-quasar@2015-06-133321   rosnap17.95G   760.00K
  .snaps/bkp-quasar@2015-07-263322   rosnap18.19G17.88M
  .snaps/bkp-quasar@2015-07-313323   rosnap18.19G14.58M
  .snaps/bkp-quasar@2015-08-033324   rosnap18.19G17.43M
   bkp-liznodisk  3341   subvol 7.01G16.00K
  .snaps/bkp-liznodisk@2015-03-01 3342   rosnap 6.96G75.37M
  .snaps/bkp-liznodisk@2015-03-28 3343   rosnap 6.98G84.93M
  .snaps/bkp-liznodisk@2015-04-26 3344   rosnap 6.96G67.14M
  .snaps/bkp-liznodisk@2015-05-24 3345   rosnap 6.95G47.67M
  .snaps/bkp-liznodisk@2015-06-27 3346   rosnap 6.96G67.97M
  .snaps/bkp-liznodisk@2015-07-25 3347   rosnap 6.98G60.30M
  .snaps/bkp-liznodisk@2015-08-16 3348   rosnap 7.10G   159.44M
   bkp-skyline3367   subvol22.52G22.52G

I just pushed it to https://github.com/speed47/btrfs-list, if anybody is 
interested.
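
Without the script, the same raw numbers can be read directly from the 
qgroups; a rough example for a single snapshot (subvolume id 3317 is taken 
from the listing above, the path is illustrative):

  btrfs subvolume list -s /tank                     # snapshots with their subvolume ids
  btrfs qgroup show --raw /tank | grep '^0/3317 '
  # the 'excl' column of qgroup 0/3317 is roughly what deleting that snapshot alone would free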


Anyway, balance has been running again for 7+ hours, trying to reproduce the 
bug twice, but no crash yet ... should happen soon now !


--
Stéphane.



Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-16 Thread Qu Wenruo



Stéphane Lesimple wrote on 2015/09/16 22:41 +0200:

Le 2015-09-16 22:18, Duncan a écrit :

Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as excerpted:


Le 2015-09-16 12:46, Holger Hoffstätte a écrit :



I also disabled quota because it has almost for sure nothing to
do with the bug, and now btrsfck is 100% happy:


Yes.  Quotas have been a continuing issue on btrfs.  AFAIK, they're on
the third rewrite now, and still have some bugs to work out.  So what
I've been recommending is unless you're (a) directly and specifically
working with the devs to find and fix the quota issues (in which case
please keep at it! =:^), either you (b) really depend on quotas working
and btrfs isn't appropriate for you at this time, since they're known to
be buggy, so use a more mature filesystem where they're known to work, or
(c) you don't actually need quotas at all, so disable them and remove the
quota tracking metadata, thus avoiding the bugs in the feature entirely.
=:^)


Well actually it's the (d) option ;)
I activate the quota feature for only one reason : being able to track
down how much space my snapshots are taking.
I've been using ZFS in the past, and I was really missing the "zfs list"
command that is able to tell you how much space a given snapshot is
actually taking under btrfs.
With quota enabled, I was able to somehow mimic zfs list with a perl
script I wrote, btrfs-list :

PATHID TYPE REFER  USED
'tank'  -1   df - 2.66T
(13.26G free)
/tank 5  vol48.00K48.00K
media  1906   subvol 1.04T 1.04T
photos 1909   subvol   116.37G   116.37G
main   1911   subvol   973.23G   973.23G
bkp-slash  3270   subvol15.86G15.86G
bkp-quasar 3314   subvol18.26G16.00K
   .snaps/bkp-quasar@2015-01-173317   rosnap17.77G40.76M
   .snaps/bkp-quasar@2015-03-063318   rosnap17.89G88.88M
   .snaps/bkp-quasar@2015-04-053319   rosnap17.92G90.97M
   .snaps/bkp-quasar@2015-05-313320   rosnap17.95G 1.02M
   .snaps/bkp-quasar@2015-06-133321   rosnap17.95G   760.00K
   .snaps/bkp-quasar@2015-07-263322   rosnap18.19G17.88M
   .snaps/bkp-quasar@2015-07-313323   rosnap18.19G14.58M
   .snaps/bkp-quasar@2015-08-033324   rosnap18.19G17.43M
bkp-liznodisk  3341   subvol 7.01G16.00K
   .snaps/bkp-liznodisk@2015-03-01 3342   rosnap 6.96G75.37M
   .snaps/bkp-liznodisk@2015-03-28 3343   rosnap 6.98G84.93M
   .snaps/bkp-liznodisk@2015-04-26 3344   rosnap 6.96G67.14M
   .snaps/bkp-liznodisk@2015-05-24 3345   rosnap 6.95G47.67M
   .snaps/bkp-liznodisk@2015-06-27 3346   rosnap 6.96G67.97M
   .snaps/bkp-liznodisk@2015-07-25 3347   rosnap 6.98G60.30M
   .snaps/bkp-liznodisk@2015-08-16 3348   rosnap 7.10G   159.44M
bkp-skyline3367   subvol22.52G22.52G

I just pushed it to https://github.com/speed47/btrfs-list, if anybody is
interested.

Anyway, balance is running again for 7+ hours, trying to reproduce the
bug twice, but no crash yet ... should happen soon now !


Yeah, that's exactly one of the ideal use cases of btrfs qgroup.

But I'm quite curious about the btrfsck error report on qgroup.

If btrfsck reports such an error, it means either I'm too confident about 
the recent qgroup accounting rework, or btrfsck has some bug which I 
didn't take into consideration during the kernel rework.


Would you please provide the full result of previous btrfsck with qgroup 
error?


Thanks,
Qu


Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-15 Thread Duncan
Stéphane Lesimple posted on Tue, 15 Sep 2015 23:47:01 +0200 as excerpted:

> Le 2015-09-15 16:56, Josef Bacik a écrit :
>> On 09/15/2015 10:47 AM, Stéphane Lesimple wrote:
 I've been experiencing repetitive "kernel BUG" occurences in the past
 few days trying to balance a raid5 filesystem after adding a new
 drive.
 It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools.
>>> 
>>> I've ran a scrub on this filesystem after the crash happened twice,
>>> and if found no errors.
>>> 
>>> The BUG_ON() condition that my filesystem triggers is the following :
>>> 
>>> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
>>> // in insert_inline_extent_backref() of extent-tree.c.
>>> 
>> Does btrfsck complain at all?

Just to elucidate a bit...

Scrub is designed to detect, and where a second copy is available (dup 
or raid1/10 modes; raid5/6 can reconstruct from parity) correct, exactly 
one kind of problem: corruption where the checksum stored when the data 
was written doesn't match the one computed when the data is read back 
from storage.  As such, it detects and corrects media errors and (perhaps 
more commonly) data corrupted by crashes in the middle of a write.  But 
if the data was already bad when it was written in the first place, the 
checksum covering it simply validates what was already bad before the 
write happened; scrub will be none the wiser and will happily validate 
the incorrect data, since the checksum is perfectly valid, it just covers 
data that was bad before the checksum was ever created.

Which is where btrfs check comes in and why JB asked you to run it, since 
unlike scrub, check is designed to catch filesystem logic errors.
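
To illustrate that limitation with a minimal standalone sketch (this is 
not btrfs code; btrfs uses CRC32C for its data checksums, so a simple 
bitwise CRC32C stands in here): the checksum is computed over whatever 
bytes arrive at write time, so data that was already wrong still 
verifies cleanly when read back.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bitwise CRC32C (Castagnoli), the checksum family btrfs uses for data. */
static uint32_t crc32c(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = ~0u;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82f63b78 & (-(crc & 1)));
	}
	return ~crc;
}

int main(void)
{
	char block[4096];

	/* The application hands the fs garbage: data is wrong *before* the write. */
	memset(block, 0, sizeof(block));
	strcpy(block, "garbage written in good faith");

	uint32_t stored   = crc32c(block, sizeof(block)); /* csum created at write time  */
	uint32_t readback = crc32c(block, sizeof(block)); /* csum recomputed by scrub    */

	printf("scrub verdict: %s\n",
	       stored == readback ? "checksum OK (data was wrong all along)"
	                          : "checksum mismatch");
	return 0;
}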

> Thanks for your suggestion.
> You're right, even if btrfs scrub didn't complain, btrfsck does:
> 
> checking extents
> bad metadata [4179166806016, 4179166822400) crossing stripe boundary
> bad metadata [4179166871552, 4179166887936) crossing stripe boundary
> bad metadata [4179166937088, 4179166953472) crossing stripe boundary

This is an actively in-focus bug ATM, and while I'm not a dev and can't 
tell you for sure that it's behind the specific balance-related crash and 
traces you posted (tho I believe it so), it certainly has the potential 
to be that serious, yes.

The most common cause is a buggy btrfs-convert that was creating invalid 
btrfs when converting from ext* at one point.  AFAIK they've hotfixed the 
immediate convert issue, but are still actively working on a longer term 
proper fix.  Meanwhile, while btrfs check does now detect the issue (and 
even that is quite new code, added in 4.2 I believe), there's still no 
real fix for what was after all a defective btrfs from the moment the 
convert was done.
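
For context, here is a minimal sketch of the idea behind the "crossing 
stripe boundary" report, assuming the 64 KiB stripe length btrfs uses; 
the real test lives in the btrfs-progs check code and may differ in 
detail.  A metadata extent is flagged when its first and last bytes fall 
into different stripes:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define STRIPE_LEN 65536ULL	/* assumed 64 KiB btrfs stripe length */

/* Flag a metadata extent whose first and last bytes land in different stripes. */
static bool crosses_stripe(uint64_t start, uint64_t len)
{
	return start / STRIPE_LEN != (start + len - 1) / STRIPE_LEN;
}

int main(void)
{
	/* Hypothetical 16 KiB metadata extent starting 8 KiB before a stripe boundary. */
	uint64_t start = 3 * STRIPE_LEN - 8192, len = 16384;

	printf("[%llu, %llu) crosses a stripe boundary: %s\n",
	       (unsigned long long)start, (unsigned long long)(start + len),
	       crosses_stripe(start, len) ? "yes" : "no");
	return 0;
}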

So where that's the cause (the filesystem was created from an ext* fs 
using a buggy btrfs-convert and is thus actually invalid due to this 
cross-stripe metadata), the current fix is to back up the files you want 
to keep (and FWIW, as any good sysadmin will tell you, a backup that 
hasn't been tested restorable isn't yet a backup, as the job isn't 
complete), then blow away and recreate the filesystem properly, using 
mkfs.btrfs, and of course then restore to the new filesystem.

If, however, you created the filesystem using mkfs.btrfs, then the 
problem must have occurred some other way.  Whether there's some other 
cause beyond the known cause, a buggy btrfs-convert, has in fact been in 
question, so in this case the devs are likely to be quite interested 
indeed in your case and perhaps the filesystem history that brought you 
to this point.  The ultimate fix is likely to be the same (unless the 
devs have you test new fix code for btrfs check --repair), but I'd 
strongly urge you to delay blowing away the filesystem, if possible, 
until the devs have a chance to ask you to run other diagnostics and 
perhaps even get a btrfs-image for them, since you may well have 
accidentally found a corner case they'd have trouble reproducing 
without your information.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-15 Thread Stéphane Lesimple

On 2015-09-15 16:56, Josef Bacik wrote:

On 09/15/2015 10:47 AM, Stéphane Lesimple wrote:

I've been experiencing repetitive "kernel BUG" occurrences in the past
few days trying to balance a raid5 filesystem after adding a new drive.
It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools.


I've run a scrub on this filesystem after the crash happened twice, and
it found no errors.

The BUG_ON() condition that my filesystem triggers is the following:

BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
// in insert_inline_extent_backref() of extent-tree.c.

I've compiled a fresh 4.3.0-rc1 with a couple of added printk's just
before the BUG_ON(), to dump the parameters passed to
insert_inline_extent_backref() when the problem occurs.
Here is an excerpt of the resulting dmesg:

{btrfs} in insert_inline_extent_backref, got owner <
BTRFS_FIRST_FREE_OBJECTID
{btrfs} with bytenr=4557830635520 num_bytes=16384 parent=4558111506432
root_objectid=3339 owner=1 offset=0 refs_to_add=1
BTRFS_FIRST_FREE_OBJECTID=256
[ cut here ]
kernel BUG at fs/btrfs/extent-tree.c:1837!

I'll retry with the exact same kernel once I get the machine back up,
and see if the bug happens again at the same filesystem spot or at a
different one.
The variable amount of time that elapses after a balance starts before I
hit the bug suggests it would be a different one.



Does btrfsck complain at all?


Thanks for your suggestion.
You're right, even if btrfs scrub didn't complain, btrfsck does:

checking extents
bad metadata [4179166806016, 4179166822400) crossing stripe boundary
bad metadata [4179166871552, 4179166887936) crossing stripe boundary
bad metadata [4179166937088, 4179166953472) crossing stripe boundary
[... some more ...]
extent buffer leak: start 4561066901504 len 16384
extent buffer leak: start 4561078812672 len 16384
extent buffer leak: start 4561078861824 len 16384
[... some more ...]
then some complaints about mismatched counts for qgroups.

I can see from the btrfsck source code that --repair will not work 
here, so I didn't try it.


I'm not sure if those errors would be a cause or a consequence of the 
bug. As the filesystem was only a few days old and as there was always a 
balance running during the crashes, I would be tempted to think it might 
actually be a consequence, but I can't be sure.

In your experience, could these inconsistencies cause the crash?

If you think so, then I'll btrfs dev del the 3rd device, remount the 
array degraded with just 1 disk, create a new btrfs filesystem from 
scratch on the second disk, copy the data over with single redundancy, 
then re-add the 2 disks and balance-convert to raid5.


If you think not, then this array could still help you debug a corner 
case, and I can keep it that way for a couple days if more testing/debug 
is needed.


Thanks,

--
Stéphane


Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-15 Thread Stéphane Lesimple

I've been experiencing repetitive "kernel BUG" occurrences in the past
few days trying to balance a raid5 filesystem after adding a new drive.
It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools.


I've run a scrub on this filesystem after the crash happened twice, and 
it found no errors.


The BUG_ON() condition that my filesystem triggers is the following:

BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
// in insert_inline_extent_backref() of extent-tree.c.

I've compiled a fresh 4.3.0-rc1 with a couple of added printk's just before 
the BUG_ON(), to dump the parameters passed to 
insert_inline_extent_backref() when the problem occurs.

Here is an excerpt of the resulting dmesg:

{btrfs} in insert_inline_extent_backref, got owner < 
BTRFS_FIRST_FREE_OBJECTID
{btrfs} with bytenr=4557830635520 num_bytes=16384 parent=4558111506432 
root_objectid=3339 owner=1 offset=0 refs_to_add=1 
BTRFS_FIRST_FREE_OBJECTID=256

[ cut here ]
kernel BUG at fs/btrfs/extent-tree.c:1837!
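
(Not the actual patch, just a sketch of what such instrumentation might 
look like, placed right before the BUG_ON() in 
insert_inline_extent_backref(); the parameter names follow the values 
printed in the excerpt above.)

/*
 * Sketch of the debug instrumentation described above (not the actual
 * patch): dump the arguments of insert_inline_extent_backref() right
 * before the BUG_ON() fires, so the offending reference shows up in
 * dmesg.
 */
if (owner < BTRFS_FIRST_FREE_OBJECTID) {
	printk(KERN_ERR "{btrfs} in insert_inline_extent_backref, got owner < BTRFS_FIRST_FREE_OBJECTID\n");
	printk(KERN_ERR "{btrfs} with bytenr=%llu num_bytes=%llu parent=%llu root_objectid=%llu owner=%llu offset=%llu refs_to_add=%d BTRFS_FIRST_FREE_OBJECTID=%llu\n",
	       bytenr, num_bytes, parent, root_objectid, owner, offset,
	       refs_to_add, (u64)BTRFS_FIRST_FREE_OBJECTID);
}
BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);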

I'll retry with the exact same kernel once I get the machine back up, 
and see if the bug happens again at the same filesystem spot or at a 
different one.
The variable amount of time that elapses after a balance starts before I 
hit the bug suggests it would be a different one.


--
Stéphane.


Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

2015-09-15 Thread Josef Bacik

On 09/15/2015 10:47 AM, Stéphane Lesimple wrote:

I've been experiencing repetitive "kernel BUG" occurrences in the past
few days trying to balance a raid5 filesystem after adding a new drive.
It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools.


I've run a scrub on this filesystem after the crash happened twice, and
it found no errors.

The BUG_ON() condition that my filesystem triggers is the following:

BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
// in insert_inline_extent_backref() of extent-tree.c.

I've compiled a fresh 4.3.0-rc1 with a couple of added printk's just before
the BUG_ON(), to dump the parameters passed to
insert_inline_extent_backref() when the problem occurs.
Here is an excerpt of the resulting dmesg:

{btrfs} in insert_inline_extent_backref, got owner <
BTRFS_FIRST_FREE_OBJECTID
{btrfs} with bytenr=4557830635520 num_bytes=16384 parent=4558111506432
root_objectid=3339 owner=1 offset=0 refs_to_add=1
BTRFS_FIRST_FREE_OBJECTID=256
[ cut here ]
kernel BUG at fs/btrfs/extent-tree.c:1837!

I'll retry with the exact same kernel once I get the machine back up,
and see if the bug happens again at the same filesystem spot or at a
different one.
The variable amount of time that elapses after a balance starts before I
hit the bug suggests it would be a different one.



Does btrfsck complain at all?  Thanks,

Josef