On 2021/1/18 6:33 PM, Erik Jensen wrote:
I ended up having other priorities occupying my time since 2019, and the
"solution" of exporting the individual drives on my NAS using NBD and
mounting them on my desktop worked, even if it wasn't pretty.

However, I am currently looking into Syncthing, which I would like to
run on the NAS directly. That would, of course, require accessing the
filesystem directly on the NAS rather than just exporting the raw
devices, which means circling back to this issue.

After updating my NAS, I have determined that the issue still occurs
with Linux 5.8.

What's the next best step for debugging the issue? Ideally, I'd like to
help track down the issue to find a proper fix, rather than just trying
to bypass the issue. I wasn't sure if the suggestion to comment out
btrfs_verify_dev_extents() was more geared toward the former or the latter.

Having refreshed my memory on this case, the real problem is that the
btrfs kernel code on ARM is reading garbage, while the user-space tools
work as expected on both x86 and ARM.

Can you recompile your kernel on the ARM board to add extra debugging
messages?
If possible, we can add some extra debug points that will flood your
dmesg with output.

Or do you have other ARM boards to test the same fs?


Thanks,
Qu



On Fri, Jun 28, 2019 at 1:15 AM Qu Wenruo <quwenruo.bt...@gmx.com
<mailto:quwenruo.bt...@gmx.com>> wrote:



    On 2019/6/28 4:00 PM, Erik Jensen wrote:
     >> So either the block layer is reading something wrong from the disk,
     >> or the btrfs layer isn't doing the endian conversion correctly.
     >
     > My ARM board is running in little endian mode, so it doesn't seem like
     > endianness should be an issue. (It is 32-bit versus my desktop's 64,
     > though.) I've also tried exporting the drives via NBD to my x86_64
     > system, and that worked fine, so if the problem is under btrfs, it
     > would have to be in the encryption layer, but fsck succeeding on the
     > ARM board would seem to rule that out, as well.
     >
     >> Would you dump the following data? (x86 and ARM should produce the
     >> same output, so one copy is enough.)
     >> # btrfs ins dump-tree -b 17628726968320 /dev/dm-3
     >> # btrfs ins dump-tree -b 17628727001088 /dev/dm-3
     >
     > Attached, and also 17628705964032, since that's the block mentioned in
     > my most recent mount attempt (see below).

    The trees are completely fine.

    So it should be something else causing the problem.

     >
     >> And then, for the ARM system, please apply the following diff, and try
     >> to mount again.
     >> The diff adds extra debug info to examine the vital members of a tree block.
     >>
     >> Correct fs should output something like:
     >>   BTRFS error (device dm-4): bad tree block start, want 30408704 have 0
     >>   tree block gen=4 owner=5 nritems=2 level=0
     >>   csum: a304e483-0000-0000-0000-00000000000000000000-0000-0000-0000-000000000000
     >>
     >> The csum line is the most important one: if it isn't mostly zeros,
     >> it means that at that point btrfs just got a bunch of garbage, and we
     >> can debug further from there.
     >
     > [  131.725573] BTRFS info (device dm-1): disk space caching is enabled
     > [  131.731884] BTRFS info (device dm-1): has skinny extents
     > [  133.046145] BTRFS error (device dm-1): bad tree block start, want 17628705964032 have 2807793151171243621
     > [  133.055775] tree block gen=7888986126946982446 owner=11331573954727661546 nritems=4191910623 level=112
     > [  133.065661] csum: 416a456c-1e68-dbc3-185d-aaad410beaef5493ab3f-3cb9-4ba1-2214-b41cba9656fc

    Complete garbage here, so I'd say the data we got isn't what we want.
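
As an illustration of the header check discussed above, here is a minimal
Python sketch. It is not the kernel diff from this thread. It assumes the
usual packed, little-endian on-disk layout of struct btrfs_header and a
crc32c filesystem, where only the first 4 bytes of the 32-byte csum field
are used. The names parse_header and looks_like_garbage are hypothetical
helpers made up for this example.

```python
import struct

# Assumed on-disk layout of struct btrfs_header (packed, little-endian):
#   csum[32], fsid[16], bytenr (u64), flags (u64), chunk_tree_uuid[16],
#   generation (u64), owner (u64), nritems (u32), level (u8)
HEADER = struct.Struct("<32s16sQQ16sQQIB")


def parse_header(block: bytes) -> dict:
    """Decode the header fields from the start of a raw tree block."""
    csum, _fsid, bytenr, _flags, _uuid, gen, owner, nritems, level = (
        HEADER.unpack_from(block)
    )
    return {
        "csum": csum, "bytenr": bytenr, "generation": gen,
        "owner": owner, "nritems": nritems, "level": level,
    }


def looks_like_garbage(hdr: dict, want_bytenr: int) -> bool:
    """Heuristic only: assumes crc32c, which fills just the first 4 csum bytes."""
    csum_tail_nonzero = any(hdr["csum"][4:])
    return hdr["bytenr"] != want_bytenr or csum_tail_nonzero
```

Fed the first bytes of a block read at the logical address from the mount
error, this flags a header like the one logged above: the bytenr does not
match and the csum bytes after the first four are not zero.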

     > [  133.108383] BTRFS error (device dm-1): bad tree block start, want 17628705964032 have 2807793151171243621
     > [  133.117999] tree block gen=7888986126946982446 owner=11331573954727661546 nritems=4191910623 level=112
     > [  133.127756] csum: 416a456c-1e68-dbc3-185d-aaad410beaef5493ab3f-3cb9-4ba1-2214-b41cba9656fc

    But strangely, the 2nd try still gives us the same result. If it were
    really random garbage, we should get a different result each time.

     > [  133.136241] BTRFS error (device dm-1): failed to verify dev extents against chunks: -5

    You can try to skip the dev extents verification by commenting out the
    btrfs_verify_dev_extents() call in disk-io.c::open_ctree().

    It may fail at another location though.

    The stranger part is that the device tree root node was read
    without any problem.

    Thanks,
    Qu

     > [  133.166165] BTRFS error (device dm-1): open_ctree failed
     >
     > I copied some files over last time I had it mounted on my desktop,
     > which may be why it's now failing at a different block.
     >
     > Thanks!
     >
