On 2015-09-17 08:29, Stéphane Lesimple wrote:
On 2015-09-16 15:04, Stéphane Lesimple wrote:
I also disabled quota because it almost certainly has nothing
to do with the bug

As it turns out, it seems that this assertion was completely wrong.

I've had balance running for more than 16 hours now without a crash.
That is almost 50% of the work done without any issue. Before, a crash
would happen within minutes, sometimes after an hour, but never much
later. The problem is, I didn't change anything on the filesystem, apart
from the benign quota disable. So Qu's question about the qgroups
errors in fsck made me wonder: if I activate quota again, it'll still
continue to balance flawlessly, right?

Well, it doesn't. I just ran btrfs quota enable on my filesystem; it
completed successfully after a few minutes (rescan -s said that no
rescan was pending). Then, less than 5 minutes later, the kernel
crashed at the same BUG_ON() as usual:

[60156.062082] BTRFS info (device dm-3): relocating block group 972839452672 flags 129
[60185.203626] BTRFS info (device dm-3): found 1463 extents
[60414.452890] {btrfs} in insert_inline_extent_backref, got owner < BTRFS_FIRST_FREE_OBJECTID
[60414.452894] {btrfs} with bytenr=5197436141568 num_bytes=16384 parent=5336636473344 root_objectid=3358 owner=1 offset=0 refs_to_add=1 BTRFS_FIRST_FREE_OBJECTID=256
[60414.452924] ------------[ cut here ]------------
[60414.452928] kernel BUG at fs/btrfs/extent-tree.c:1837!

owner=1 again at this point in the code (this is still kernel
4.3.0-rc1 with my added printks).

So I'll disable quota again and resume the balance. If I'm right, it
should proceed without issue for 18 more hours!

Damn, wrong again. It just crashed again without quota enabled :(
The fact that it ran perfectly well for 17+ hours and then crashed minutes after I reactivated quota might have been pure coincidence, then...

[ 5487.706499] {btrfs} in insert_inline_extent_backref, got owner < BTRFS_FIRST_FREE_OBJECTID
[ 5487.706504] {btrfs} with bytenr=6906661109760 num_bytes=16384 parent=6905020874752 root_objectid=18446744073709551608 owner=1 offset=0 refs_to_add=1 BTRFS_FIRST_FREE_OBJECTID=256
[ 5487.706536] ------------[ cut here ]------------
[ 5487.706539] kernel BUG at fs/btrfs/extent-tree.c:1837!

For reference, the crash I had earlier this morning was as follows:

[60414.452894] {btrfs} with bytenr=5197436141568 num_bytes=16384 parent=5336636473344 root_objectid=3358 owner=1 offset=0 refs_to_add=1 BTRFS_FIRST_FREE_OBJECTID=256

So, this is a completely different part of the filesystem.
The bug is always the same, though: owner=1, where it should never be < 256.
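To compare the two crashes field by field rather than by eye, here is a minimal Python sketch that pulls the key=value pairs out of the debug printk lines above and reinterprets root_objectid as a signed 64-bit value. The helper names (parse_record, as_signed64) are made up for this example; only the record text comes from the logs.

```python
import re
from typing import Dict

# Sketch only: decodes the debug printk records quoted in this message.
# The key=value layout comes from those printks; parse_record() and
# as_signed64() are hypothetical helpers, not part of btrfs or btrfs-progs.
BTRFS_FIRST_FREE_OBJECTID = 256  # value reported by the printk itself

FIELD_RE = re.compile(r"(\w+)=(\d+)")

MORNING_CRASH = (  # quota had been re-enabled a few minutes earlier
    "[60414.452894] {btrfs} with bytenr=5197436141568 num_bytes=16384 "
    "parent=5336636473344 root_objectid=3358 owner=1 offset=0 "
    "refs_to_add=1 BTRFS_FIRST_FREE_OBJECTID=256"
)
LATER_CRASH = (  # quota disabled again
    "[ 5487.706504] {btrfs} with bytenr=6906661109760 num_bytes=16384 "
    "parent=6905020874752 root_objectid=18446744073709551608 owner=1 "
    "offset=0 refs_to_add=1 BTRFS_FIRST_FREE_OBJECTID=256"
)


def parse_record(line: str) -> Dict[str, int]:
    """Collect every key=value pair on a printk line as an integer."""
    return {key: int(value) for key, value in FIELD_RE.findall(line)}


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit objectid as two's-complement signed."""
    return value - (1 << 64) if value >= (1 << 63) else value


if __name__ == "__main__":
    for name, line in (("morning", MORNING_CRASH), ("later", LATER_CRASH)):
        rec = parse_record(line)
        print(
            f"{name}: root_objectid={as_signed64(rec['root_objectid'])} "
            f"owner={rec['owner']} "
            f"owner<FIRST_FREE={rec['owner'] < BTRFS_FIRST_FREE_OBJECTID}"
        )
```

Both records show owner=1 < 256, while root_objectid decodes to 3358 in the first case and -8 in the second. If I'm reading the btrfs headers correctly, -8 is BTRFS_TREE_RELOC_OBJECTID, i.e. a relocation tree, which at least fits the fact that both crashes happened during balance.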

Balance cancelled.

To me, it sounds like some sort of race condition, but I'm out of ideas about what to test next.

--
Stéphane.