On Wed, Feb 17, 2021 at 5:24 PM Qu Wenruo <[email protected]> wrote:
> On 2021/2/11 7:47 AM, Qu Wenruo wrote:
> > On 2021/2/11 6:17 AM, Erik Jensen wrote:
> >> On Tue, Feb 9, 2021 at 9:47 PM Qu Wenruo <[email protected]> wrote:
> > [...]
> >>>
> >>> Unfortunately I didn't get much useful info from the trace events,
> >>> as a lot of the values don't even make sense to me....
> >>>
> >>> But the chunk tree dump proves to be more useful.
> >>>
> >>> Firstly, the offending tree block doesn't even occur in any chunk
> >>> range.
> >>>
> >>> The offending tree block is 26207780683776, but the tree dump doesn't
> >>> have any range there.
> >>>
> >>> The highest chunk is at 5958289850368 + 4294967296, still one digit
> >>> lower than the expected value.
> >>>
> >>> I'm surprised we didn't even get any error for that, thus it may
> >>> indicate our chunk mapping is incorrect too.
> >>>
> >>> Would you please try the following diff on the 32bit system and report
> >>> back the dmesg?
> >>>
> >>> The diff adds the following debug output:
> >>> - when we try to read one tree block
> >>> - when a bio is mapped to read device
> >>> - when a new chunk is added to chunk tree
> >>>
> >>> Thanks,
> >>> Qu
> >>
> >> Okay, here's the dmesg output from attempting to mount the filesystem:
> >> https://gist.github.com/rkjnsn/914651efdca53c83199029de6bb61e20
> >>
> >> I captured this on my 32-bit x86 VM, as it's much faster to rebuild
> >> the kernel there than on my ARM board, and it fails with the same
> >> error.
> >>
> >
> > This is indeed much better.
> >
> > The involved things are:
> >
> > [ 84.463147] read_one_chunk: chunk start=26207148048384 len=1073741824
> > num_stripes=2 type=0x14
> > [ 84.463148] read_one_chunk: stripe 0 phy=6477927415808 devid=5
> > [ 84.463149] read_one_chunk: stripe 1 phy=6477927415808 devid=4
> >
> > Above is the chunk for the offending tree block.
> >
> > [ 84.463724] read_extent_buffer_pages: eb->start=26207780683776 mirror=0
> > [ 84.463731] submit_stripe_bio: rw 0 0x1000, phy=2118735708160
> > sector=4138155680 dev_id=3 size=16384
> > [ 84.470793] BTRFS error (device dm-4): bad tree block start, want
> > 26207780683776 have 3395945502747707095
> >
> > But when the metadata read happens, the physical address and dev id are
> > completely insane.
> >
> > The chunk doesn't have dev 3 in it at all, but we still get the wrong
> > mapping.
> >
> > Furthermore, that physical address and devid belong to chunk
> > 8614760677376, which is a raid5 data chunk.
> >
> > So there is definitely something wrong in btrfs chunk mapping on 32bit.
> >
> > I'll craft a newer debug diff for you after I pin down what can be
> > wrong.
>
> Sorry for the delay, mostly due to the lunar new year vacation.
>
> Here is the new diff, it should be applied on top of the previous diff.
>
> This new diff adds extra debug info inside __btrfs_map_block().
>
> BTW, you only need to rebuild the btrfs module to test it, hope this
> saves you some time.
>
> Although if I could get a small enough image to reproduce locally, that
> would be the best case...
>
> Thanks,
> Qu
> >
> > Thanks,
> > Qu
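The mismatch described in the quoted analysis can be reproduced from the logged numbers alone. What follows is a minimal Python sketch, not part of either debug diff. It assumes that type=0x14 means METADATA | RAID1 (0x4 | 0x10), so each stripe is a full mirror and the physical offset is simply the stripe start plus the offset into the chunk. The function name expected_raid1_mappings is made up for illustration.

```python
# Values copied from the quoted dmesg lines.
CHUNK_START = 26207148048384
CHUNK_LEN = 1073741824
STRIPES = [(5, 6477927415808), (4, 6477927415808)]  # (devid, physical start)
EB_START = 26207780683776       # eb->start of the offending tree block
OBSERVED_DEVID = 3              # dev_id from submit_stripe_bio
OBSERVED_PHYS = 2118735708160   # phy from submit_stripe_bio
OBSERVED_SECTOR = 4138155680    # sector from submit_stripe_bio
SECTOR_SIZE = 512


def expected_raid1_mappings(logical, chunk_start, chunk_len, stripes):
    """Return (devid, physical) for every mirror of a RAID1 chunk.

    Assumption: RAID1 layout, so each stripe holds a full copy and the
    physical address is stripe start + offset into the chunk.
    """
    if not chunk_start <= logical < chunk_start + chunk_len:
        raise ValueError("logical address is outside this chunk")
    offset = logical - chunk_start
    return [(devid, phys + offset) for devid, phys in stripes]


expected = expected_raid1_mappings(EB_START, CHUNK_START, CHUNK_LEN, STRIPES)
observed = (OBSERVED_DEVID, OBSERVED_PHYS)

# The logged sector and phy agree with each other (sector * 512 == phy),
# so the bio itself is self-consistent; it just targets the wrong place.
sector_matches_phys = OBSERVED_SECTOR * SECTOR_SIZE == OBSERVED_PHYS

print("expected mirrors:", expected)
print("observed:", observed)
print("observed is a valid mirror:", observed in expected)
print("sector matches phy:", sector_matches_phys)
```

Under that RAID1 assumption, the tree block sits 632635392 bytes into the chunk, so the read should have gone to devid 5 or devid 4, and the logged devid 3 target matches neither mirror.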

Okay, here is the output with both patches applied:
https://gist.github.com/rkjnsn/7139eaf855687c6bd4ce371f88e28a9e

I've only run into the issue on this filesystem, which is quite large,
so I'm not sure how I would even attempt to make a reduced test case.

Thanks!
