FWIW, just recompiled with the patch to be 100% sure, as I still had the problematic FS around untouched:
[ 199.722122] BTRFS info (device dm-10): balance: start -dvrange=34625344765952..34625344765953
[ 199.730267] BTRFS info (device dm-10): relocating block group 34625344765952 flags data|raid1
[ 212.232222] BTRFS info (device dm-10): found 167 extents, stage: move data extents
[ 236.124541] BTRFS info (device dm-10): found 167 extents, stage: update data pointers
[ 248.011778] BTRFS info (device dm-10): balance: ended with status: 0

As expected, all is good now!

Tested-by: Stéphane Lesimple <stephane_btr...@lesimple.fr>

--
Stéphane.

January 4, 2021 5:18 PM, "David Sterba" <dste...@suse.cz> wrote:

> On Tue, Dec 29, 2020 at 09:29:34PM +0800, Qu Wenruo wrote:
>
>> [BUG]
>> There are several bug reports about recent kernels being unable to
>> relocate certain data block groups.
>>
>> Sometimes the error just goes away, but there is one reporter who can
>> reproduce it reliably.
>>
>> The dmesg would look like:
>> [ 438.260483] BTRFS info (device dm-10): balance: start -dvrange=34625344765952..34625344765953
>> [ 438.269018] BTRFS info (device dm-10): relocating block group 34625344765952 flags data|raid1
>> [ 450.439609] BTRFS info (device dm-10): found 167 extents, stage: move data extents
>> [ 463.501781] BTRFS info (device dm-10): balance: ended with status: -2
>>
>> [CAUSE]
>> The -ENOENT error is returned from the following call chain:
>>
>> add_data_references()
>> |- delete_v1_space_cache();
>>    |- if (!found)
>>         return -ENOENT;
>>
>> The variable @found is set to true if we find a data extent whose
>> disk bytenr matches parameter @data_bytenr.
>>
>> With extra debug, the offending tree block looks like this:
>> leaf bytenr = 42676709441536, data_bytenr = 34626327621632
>>
>>     ctime 1567904822.739884119 (2019-09-08 03:07:02)
>>     mtime 0.0 (1970-01-01 01:00:00)
>>     otime 0.0 (1970-01-01 01:00:00)
>> item 27 key (51933 EXTENT_DATA 0) itemoff 9854 itemsize 53
>>     generation 1517381 type 2 (prealloc)
>>     prealloc data disk byte 34626327621632 nr 262144 <<<
>>     prealloc data offset 0 nr 262144
>> item 28 key (52262 ROOT_ITEM 0) itemoff 9415 itemsize 439
>>     generation 2618893 root_dirid 256 bytenr 42677048360960 level 3 refs 1
>>     lastsnap 2618893 byte_limit 0 bytes_used 5557338112 flags 0x0(none)
>>     uuid d0d4361f-d231-6d40-8901-fe506e4b2b53
>>
>> Although item 27 has disk bytenr 34626327621632, which matches the
>> data_bytenr, its type is prealloc, not reg.
>> This makes the existing code skip that item and return -ENOENT.
>>
>> [FIX]
>> The code was modified in commit 19b546d7a1b2 ("btrfs: relocation: Use
>> btrfs_find_all_leafs to locate data extent parent tree leaves"). Before
>> that commit, we used something like
>> "if (type == BTRFS_FILE_EXTENT_INLINE) continue;".
>>
>> But in that offending commit, we use (type == BTRFS_FILE_EXTENT_REG),
>> ignoring BTRFS_FILE_EXTENT_PREALLOC.
>>
>> Fix it by also checking BTRFS_FILE_EXTENT_PREALLOC.
>>
>> Reported-by: Stéphane Lesimple <stephane_btr...@lesimple.fr>
>> Fixes: 19b546d7a1b2 ("btrfs: relocation: Use btrfs_find_all_leafs to locate data extent parent tree leaves")
>> Signed-off-by: Qu Wenruo <w...@suse.com>
>
> Thank you all for tracking down the bug, added to misc-next.
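
For anyone reading the archive later, the difference between the two checks described under [FIX] can be modelled in a few lines of userspace C. This is only a sketch and not the kernel code: struct extent_item and both leaf_has_extent_*() helpers are hypothetical names made up for illustration, and the enum merely mirrors the extent type values (the dump shows "type 2 (prealloc)").

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the btrfs file extent type values; names here are illustrative. */
enum file_extent_type {
	FILE_EXTENT_INLINE = 0,
	FILE_EXTENT_REG = 1,
	FILE_EXTENT_PREALLOC = 2,
};

/* Hypothetical stand-in for one EXTENT_DATA item found in a leaf. */
struct extent_item {
	enum file_extent_type type;
	uint64_t disk_bytenr;
};

/* Check as described for the offending commit: only REG extents can match. */
static bool leaf_has_extent_old(const struct extent_item *items, size_t n,
				uint64_t data_bytenr)
{
	for (size_t i = 0; i < n; i++) {
		if (items[i].type == FILE_EXTENT_REG &&
		    items[i].disk_bytenr == data_bytenr)
			return true;
	}
	return false; /* the caller would turn this into -ENOENT */
}

/* Check as described by the fix: PREALLOC extents are accepted as well. */
static bool leaf_has_extent_fixed(const struct extent_item *items, size_t n,
				  uint64_t data_bytenr)
{
	for (size_t i = 0; i < n; i++) {
		if ((items[i].type == FILE_EXTENT_REG ||
		     items[i].type == FILE_EXTENT_PREALLOC) &&
		    items[i].disk_bytenr == data_bytenr)
			return true;
	}
	return false;
}
```

Fed a single prealloc item with disk bytenr 34626327621632 (item 27 in the dump), the old check misses it and the fixed one finds it, which matches the -2 vs. 0 balance status in the two dmesg excerpts.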