On 2020/12/29 7:39 AM, Qu Wenruo wrote:


On 2020/12/29 3:58 AM, Stéphane Lesimple wrote:
I know it fails in relocate_block_group(), which returns -2. I'm currently adding a couple of printk's here and there to try to pinpoint that better.

Okay, so btrfs_relocate_block_group() starts with stage MOVE_DATA_EXTENTS, which completes successfully, as relocate_block_group() returns 0:

BTRFS info (device <unknown>): relocate_block_group: prepare_to_realocate = 0
BTRFS info (device <unknown>): relocate_block_group loop: progress = 1, btrfs_start_transaction = ok
[...]
BTRFS info (device <unknown>): relocate_block_group loop: progress = 168, btrfs_start_transaction = ok
BTRFS info (device <unknown>): relocate_block_group: returning err = 0
BTRFS info (device dm-10): stage = move data extents, relocate_block_group = 0
BTRFS info (device dm-10): found 167 extents, stage: move data extents

Then it proceeds to the UPDATE_DATA_PTRS stage and calls relocate_block_group() again. This time it'll fail at the 92nd iteration of the loop:

BTRFS info (device <unknown>): relocate_block_group loop: progress = 92, btrfs_start_transaction = ok
BTRFS info (device <unknown>): relocate_block_group loop: extents_found = 92, item_size(53) >= sizeof(*ei)(24), flags = 1, ret = 0
BTRFS info (device <unknown>): add_data_references: btrfs_find_all_leafs = 0
BTRFS info (device <unknown>): add_data_references loop: read_tree_block ok
BTRFS info (device <unknown>): add_data_references loop: delete_v1_space_cache = -2

Damn it, if we find no v1 space cache for the block group, it means
we're fine to continue...

BTRFS info (device <unknown>): relocate_block_group loop: add_data_references = -2

Then the -ENOENT goes all the way up the call stack and aborts the
balance.

So it fails in delete_v1_space_cache(), though it is worth noting that the FS we're talking about is actually using space_cache v2.

Space cache v2, no wonder no v1 space cache.


Does it help? Shall I dig deeper?

You're already at the point!

Mind if I craft a fix with your Signed-off-by?

The problem is more complex than I thought, but we at least have a workaround.

Firstly, this happens when an old fs gets v2 space cache enabled but still has some v1 space cache left over.

A newer kernel mounting with v2 should clean up v1 properly, but older kernels don't do the proper cleaning, thus leaving some v1 cache behind.

Then when we run btrfs balance on such an old fs, we hit the -ENOENT error. We can't simply ignore the error, as we have no way to relocate such leftover v1 cache (normally we delete it completely, but with v2 cache enabled, we can't).

So all I can do for now is add a warning message for the problem.

To solve your problem, I have also submitted a patch to btrfs-progs to force v1 space cache cleaning even if the fs has v2 space cache enabled.

Alternatively, you can clear the v2 space cache first using "btrfs check --clear-space-cache v2", then run "btrfs check --clear-space-cache v1", and finally mount the fs with "space_cache=v2" again.

To verify there is no v1 space cache left, you can run the following command:

# btrfs ins dump-tree -t root <device> | grep EXTENT_DATA

It should output nothing.
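
If you prefer a count, the same check can be written like this (just a convenience variant of the command above; <device> is your actual device), and it should print 0:

# btrfs ins dump-tree -t root <device> | grep -c EXTENT_DATA

If it prints anything other than 0, there are still leftover v1 cache extents in the root tree.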

Then please check whether you can balance all your data.
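
For example, a plain data balance like this (adjust or add filters to match whatever balance you were running before; the mount point is again a placeholder):

# btrfs balance start -d /mnt/btrfs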

Thanks,
Qu


Thanks,
Qu


Regards,

Stéphane.
