FWIW, just recompiled with the patch to be 100% sure, as I still had
the problematic FS around untouched:

[  199.722122] BTRFS info (device dm-10): balance: start -dvrange=34625344765952..34625344765953
[  199.730267] BTRFS info (device dm-10): relocating block group 34625344765952 flags data|raid1
[  212.232222] BTRFS info (device dm-10): found 167 extents, stage: move data extents
[  236.124541] BTRFS info (device dm-10): found 167 extents, stage: update data pointers
[  248.011778] BTRFS info (device dm-10): balance: ended with status: 0

As expected, all is good now!

Tested-by: Stéphane Lesimple <stephane_btr...@lesimple.fr>

-- 
Stéphane.

January 4, 2021 5:18 PM, "David Sterba" <dste...@suse.cz> wrote:

> On Tue, Dec 29, 2020 at 09:29:34PM +0800, Qu Wenruo wrote:
> 
>> [BUG]
>> There are several bug reports about recent kernels being unable to
>> relocate certain data block groups.
>> 
>> Sometimes the error just goes away, but there is one reporter who can
>> reproduce it reliably.
>> 
>> The dmesg would look like:
>> [ 438.260483] BTRFS info (device dm-10): balance: start -dvrange=34625344765952..34625344765953
>> [ 438.269018] BTRFS info (device dm-10): relocating block group 34625344765952 flags data|raid1
>> [ 450.439609] BTRFS info (device dm-10): found 167 extents, stage: move data extents
>> [ 463.501781] BTRFS info (device dm-10): balance: ended with status: -2
>> 
>> [CAUSE]
>> The -ENOENT error is returned from the following call chain:
>> 
>> add_data_references()
>> |- delete_v1_space_cache();
>>    |- if (!found)
>>          return -ENOENT;
>> 
>> The variable @found is set to true if we find a data extent whose
>> disk bytenr matches parameter @data_bytenr.
>> 
>> With extra debug, the offending tree block looks like this:
>> leaf bytenr = 42676709441536, data_bytenr = 34626327621632
>> 
>> ctime 1567904822.739884119 (2019-09-08 03:07:02)
>> mtime 0.0 (1970-01-01 01:00:00)
>> otime 0.0 (1970-01-01 01:00:00)
>> item 27 key (51933 EXTENT_DATA 0) itemoff 9854 itemsize 53
>> generation 1517381 type 2 (prealloc)
>> prealloc data disk byte 34626327621632 nr 262144 <<<
>> prealloc data offset 0 nr 262144
>> item 28 key (52262 ROOT_ITEM 0) itemoff 9415 itemsize 439
>> generation 2618893 root_dirid 256 bytenr 42677048360960 level 3 refs 1
>> lastsnap 2618893 byte_limit 0 bytes_used 5557338112 flags 0x0(none)
>> uuid d0d4361f-d231-6d40-8901-fe506e4b2b53
>> 
>> Although item 27 has disk bytenr 34626327621632, which matches the
>> data_bytenr, its type is prealloc, not reg.
>> This makes the existing code skip that item, and return -ENOENT.
>> 
>> [FIX]
>> The code was modified in commit 19b546d7a1b2 ("btrfs: relocation: Use
>> btrfs_find_all_leafs to locate data extent parent tree leaves"). Before
>> that commit, we used something like
>> "if (type == BTRFS_FILE_EXTENT_INLINE) continue;".
>> 
>> But in that offending commit, we use (type == BTRFS_FILE_EXTENT_REG),
>> ignoring BTRFS_FILE_EXTENT_PREALLOC.
>> 
>> Fix it by also checking BTRFS_FILE_EXTENT_PREALLOC.
>> 
>> Reported-by: Stéphane Lesimple <stephane_btr...@lesimple.fr>
>> Fixes: 19b546d7a1b2 ("btrfs: relocation: Use btrfs_find_all_leafs to locate data extent parent tree leaves")
>> Signed-off-by: Qu Wenruo <w...@suse.com>
> 
> Thank you all for tracking down the bug, added to misc-next.
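
To illustrate the [CAUSE] and [FIX] sections quoted above, here is a
minimal standalone sketch in C. It is not the kernel code: struct
demo_extent_item, demo_scan_leaf() and the three match_*() helpers are
hypothetical names made up for illustration. Only the BTRFS_FILE_EXTENT_*
values and the -ENOENT return follow the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* On-disk btrfs file extent types; "type 2 (prealloc)" in the dump is the last one. */
enum {
	BTRFS_FILE_EXTENT_INLINE   = 0,
	BTRFS_FILE_EXTENT_REG      = 1,
	BTRFS_FILE_EXTENT_PREALLOC = 2,
};

/* Hypothetical, heavily simplified stand-in for one EXTENT_DATA item in a leaf. */
struct demo_extent_item {
	int      type;
	uint64_t disk_bytenr;
};

/* Style of check described for kernels before 19b546d7a1b2: skip inline extents only. */
static bool match_skip_inline(const struct demo_extent_item *it, uint64_t data_bytenr)
{
	if (it->type == BTRFS_FILE_EXTENT_INLINE)
		return false;
	return it->disk_bytenr == data_bytenr;
}

/* Regressed check: only REG extents are considered, PREALLOC is skipped. */
static bool match_reg_only(const struct demo_extent_item *it, uint64_t data_bytenr)
{
	return it->type == BTRFS_FILE_EXTENT_REG &&
	       it->disk_bytenr == data_bytenr;
}

/* Fixed check: REG and PREALLOC extents both count. */
static bool match_reg_or_prealloc(const struct demo_extent_item *it, uint64_t data_bytenr)
{
	return (it->type == BTRFS_FILE_EXTENT_REG ||
		it->type == BTRFS_FILE_EXTENT_PREALLOC) &&
	       it->disk_bytenr == data_bytenr;
}

typedef bool (*demo_match_fn)(const struct demo_extent_item *, uint64_t);

/*
 * Scan a (hypothetical) leaf for an extent pointing at data_bytenr.
 * Returns 0 if one is found, -ENOENT otherwise, mirroring the
 * "if (!found) return -ENOENT;" step in the quoted call chain.
 */
static int demo_scan_leaf(const struct demo_extent_item *items, size_t n,
			  uint64_t data_bytenr, demo_match_fn match)
{
	bool found = false;

	assert(items != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (match(&items[i], data_bytenr)) {
			found = true;
			break;
		}
	}
	if (!found)
		return -ENOENT;
	return 0;
}
```

Given a single item shaped like item 27 from the dump (type prealloc,
disk bytenr 34626327621632), demo_scan_leaf() returns -ENOENT with
match_reg_only() and 0 with either of the other two helpers. That -ENOENT
is the "status: -2" seen in the failing balance log.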
