On Wed, Feb 17, 2021 at 5:24 PM Qu Wenruo <[email protected]> wrote:
> On 2021/2/11 7:47 AM, Qu Wenruo wrote:
> > On 2021/2/11 6:17 AM, Erik Jensen wrote:
> >> On Tue, Feb 9, 2021 at 9:47 PM Qu Wenruo <[email protected]> wrote:
> > [...]
> >>>
> >>> Unfortunately, I didn't get much useful info from the trace events,
> >>> as a lot of the values don't even make sense to me.
> >>>
> >>> But the chunk tree dump proves to be more useful.
> >>>
> >>> Firstly, the offending tree block doesn't fall within any of the
> >>> chunk ranges.
> >>>
> >>> The offending tree block is 26207780683776, but the tree dump doesn't
> >>> have any range there.
> >>>
> >>> The highest chunk is at 5958289850368 + 4294967296, which is still
> >>> one digit shorter than the offending address.
> >>>
> >>> I'm surprised we didn't even get any error for that, thus it may
> >>> indicate our chunk mapping is incorrect too.
> >>>
> >>> Would you please try the following diff on the 32bit system and report
> >>> back the dmesg?
> >>>
> >>> The diff adds the following debug output:
> >>> - when we try to read one tree block
> >>> - when a bio is mapped to read device
> >>> - when a new chunk is added to chunk tree
> >>>
> >>> Thanks,
> >>> Qu
> >>
> >> Okay, here's the dmesg output from attempting to mount the filesystem:
> >> https://gist.github.com/rkjnsn/914651efdca53c83199029de6bb61e20
> >>
> >> I captured this on my 32-bit x86 VM, as it's much faster to rebuild
> >> the kernel there than on my ARM board, and it fails with the same
> >> error.
> >>
> >
> > This is indeed much better.
> >
> > The involved things are:
> >
> > [   84.463147] read_one_chunk: chunk start=26207148048384 len=1073741824
> > num_stripes=2 type=0x14
> > [   84.463148] read_one_chunk:    stripe 0 phy=6477927415808 devid=5
> > [   84.463149] read_one_chunk:    stripe 1 phy=6477927415808 devid=4
> >
> > Above is the chunk for the offending tree block.
> >
> > [   84.463724] read_extent_buffer_pages: eb->start=26207780683776 mirror=0
> > [   84.463731] submit_stripe_bio: rw 0 0x1000, phy=2118735708160
> > sector=4138155680 dev_id=3 size=16384
> > [   84.470793] BTRFS error (device dm-4): bad tree block start, want
> > 26207780683776 have 3395945502747707095
> >
> > But when the metadata read happens, the physical address and devid are
> > completely insane.
> >
> > The chunk doesn't have dev 3 in it at all, but we still get the wrong
> > mapping.
> >
> > Furthermore, that physical address and devid belong to chunk
> > 8614760677376, which is a RAID5 data chunk.
> >
> > So there is definitely something wrong in btrfs chunk mapping on 32-bit.
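
For anyone following along, the mismatch described above can be checked
directly from the quoted dmesg lines. The snippet below is only a sketch
of that arithmetic, not anything from the debug diffs: the variable names
and the in_chunk() helper are made up for illustration, and it assumes
the "sector" field is in 512-byte units.

```python
# Values copied from the dmesg excerpt quoted above.
chunk_start = 26207148048384      # read_one_chunk: chunk start
chunk_len = 1073741824            # read_one_chunk: len (1 GiB)
chunk_devids = {5, 4}             # devids of stripe 0 and stripe 1

eb_start = 26207780683776         # read_extent_buffer_pages: eb->start
bio_phy = 2118735708160           # submit_stripe_bio: phy
bio_sector = 4138155680           # submit_stripe_bio: sector
bio_devid = 3                     # submit_stripe_bio: dev_id

SECTOR_SIZE = 512                 # assumption: sectors are 512-byte units


def in_chunk(logical, start, length):
    """Hypothetical helper: is `logical` inside [start, start + length)?"""
    return start <= logical < start + length


# The tree block does belong to the chunk printed by read_one_chunk ...
block_in_chunk = in_chunk(eb_start, chunk_start, chunk_len)
offset_in_chunk = eb_start - chunk_start

# ... but the bio was sent to a device that is not one of its stripes.
devid_in_chunk = bio_devid in chunk_devids

# The sector and phy fields of the bio agree with each other.
sector_matches_phy = bio_sector * SECTOR_SIZE == bio_phy

print("block inside chunk:   ", block_in_chunk)
print("offset inside chunk:  ", offset_in_chunk)
print("bio devid in chunk:   ", devid_in_chunk)
print("sector * 512 == phy:  ", sector_matches_phy)
```

So the logical address sits inside the chunk's range, the bio's sector
and phy values are self-consistent, and only the logical-to-physical
mapping result (devid 3) disagrees with the chunk's stripes.
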
> >
> > I'll craft a newer debug diff for you once I have pinned down what
> > could be wrong.
>
> Sorry for the delay, mostly due to the Lunar New Year vacation.
>
> Here is the new diff; it should be applied on top of the previous diff.
>
> This new diff adds extra debug info inside __btrfs_map_block().
>
> BTW, you only need to rebuild the btrfs module to test it; I hope this
> saves you some time.
>
> Although if I could get a small enough image to reproduce locally, that
> would be the best case...
>
> Thanks,
> Qu
> >
> > Thanks,
> > Qu

Okay, here is the output with both patches applied:
https://gist.github.com/rkjnsn/7139eaf855687c6bd4ce371f88e28a9e

I've only run into the issue on this filesystem, which is quite large,
so I'm not sure how I would even attempt to make a reduced test case.

Thanks!
