> >>>>> TL;DR It seems to be a regression in 4.17, but I managed to find a
> >>>>> workaround to make the filesystem rw mountable again.
> >>>>>
> >>>>> Kernel built from tag v4.17-rc1
> >>>>> btrfs-progs 4.16
> >>>>>
> >>>>> Tonight two of my machines (PC (ECC RAM) and laptop (non-ECC RAM)) were
> >>>>> doing the usual weekly balance with this command via cron:
> >>>>> btrfs balance start -musage=50 -dusage=50 <mountpoint>
> >>>>> Both machines run the same kernel version.
> >>>>>
> >>>>> On the PC that caused the root and "data" filesystems to go readonly.
> >>>>> Root is on an SSD with data single and metadata DUP, the "data"
> >>>>> filesystem is on 2 HDDs with RAID1 for data and metadata.
> >>>>>
> >>>>> On the laptop only /home went ro, it's on an NVMe SSD with data single
> >>>>> and metadata DUP.
> >>>>>
> >>>>> Btrfs check of the PC rootfs reported no errors in both modes. I ran
> >>>>> each mode once before reboot on the readonly filesystem with the
> >>>>> --force flag and then from a live USB. Same output without any errors.
> >>>>>
> >>>>> After reboot the kernel refused to mount rootfs rw with the same error
> >>>>> as during the cron balance; ro mount was accepted. Error during rw mount:
> >>>>> BTRFS: error (device dm-17) in merge_reloc_roots:2465: errno=-117
> >>>
> >>>> 117 means EUCLEAN, which could be caused by the newly introduced
> >>>> first_key and level check.
> >>>
> >>>> Please apply this hotfix to fix it.
> >>>> btrfs: Only check first key for committed tree blocks
> >>>> (Which is included in the latest pull request)
> >>>
> >>>> Also, please consider enabling CONFIG_BTRFS_DEBUG to provide extra
> >>>> debug info.
> >>>
> >>>> Thanks,
> >>>> Qu
> >>>
> >>> I tried 4.17-rc2 (as the pull request was pulled) with
> >>> CONFIG_BTRFS_DEBUG on an LVM snapshot of the laptop home partition
> >>> (/dev/vdb) in a VM (the VM kernel sees only the snapshot, so no UUID
> >>> collisions). Dmesg attached.
> >>
> >> Thanks for the info and your previous btrfs-image.
> >>
> >> The image itself shows nothing wrong, so it should be a runtime problem.
> >> Would you please apply these two debug patches?
> >> https://patchwork.kernel.org/patch/10335133/
> >> https://patchwork.kernel.org/patch/10335135/
> >>
> >> And the attached diff file?
> >>
> >> My guess is the parent node is not initialized correctly in this case.
> >>
> >> Thanks,
> >> Qu
> >
> > Dmesg from kernel with all three patches applied attached.
> >
>
> Thanks for the debug info, it really helps a lot!
>
> It turns out that I'm just a super idiot, a typo in replace_path()
> caused this, and it could not be triggered unless we enter it from
> relocation recovery.
>
> Please try the attached patch to see if it solves the problem.
>
> Thanks,
> Qu

Glad to help, the patch solved the problem: rw mount is successful and
balance finished with no errors or debug output, and btrfs check is clean
in both modes.
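
For anyone hitting a similar message: the error line from the original report (errno=-117, identified above as EUCLEAN) can be decoded mechanically. Below is a minimal sketch, not a supported tool. The regular expression is only my assumption, derived from the single line quoted in this thread, and the helper name decode_btrfs_error is hypothetical. The symbolic name is looked up in Python's errno table, which reflects the platform the script runs on, so it falls back to "unknown" where the code is not defined.

```python
import errno
import re

# Assumed message shape, taken from the one line quoted in this thread:
#   BTRFS: error (device dm-17) in merge_reloc_roots:2465: errno=-117
# This is not a stable kernel interface; adjust the pattern as needed.
BTRFS_ERROR_RE = re.compile(
    r"BTRFS: error \(device (?P<device>[^)]+)\) in "
    r"(?P<function>\w+):(?P<line>\d+): errno=-(?P<errno>\d+)"
)


def decode_btrfs_error(message):
    """Split a BTRFS error line into its fields, or return None if no match."""
    match = BTRFS_ERROR_RE.search(message)
    if match is None:
        return None
    code = int(match.group("errno"))
    return {
        "device": match.group("device"),
        "function": match.group("function"),
        "line": int(match.group("line")),
        "errno": code,
        # Symbolic name on the current platform; "unknown" if not defined.
        "name": errno.errorcode.get(code, "unknown"),
    }


if __name__ == "__main__":
    sample = "BTRFS: error (device dm-17) in merge_reloc_roots:2465: errno=-117"
    print(decode_btrfs_error(sample))
```

The function and line number fields point at the place in the kernel source that reported the failure, which is usually the first thing worth including in a report.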
[    2.842718] BTRFS: device label home devid 1 transid 277952 /dev/vdb
[    2.924965] BTRFS: device label root devid 1 transid 84092 /dev/vda2
[    3.072271] BTRFS info (device vda2): use lzo compression, level 0
[    3.072897] BTRFS info (device vda2): enabling auto defrag
[    3.073476] BTRFS info (device vda2): using free space tree
[    3.074049] BTRFS info (device vda2): has skinny extents
[    5.411821] BTRFS info (device vda2): using free space tree
[   24.925293] BTRFS info (device vdb): using free space tree
[   24.925324] BTRFS info (device vdb): has skinny extents
[   31.711868] BTRFS info (device vdb): continuing balance
[   31.721658] BTRFS info (device vdb): checking UUID tree
[   31.822920] BTRFS info (device vdb): relocating block group 69889687552 flags data
[   33.730399] BTRFS info (device vdb): found 12 extents
[   36.950699] BTRFS info (device vdb): found 12 extents
[   37.030813] BTRFS info (device vdb): relocating block group 67742203904 flags metadata|dup
[   37.104174] BTRFS info (device vdb): relocating block group 67708649472 flags system|dup
[   37.189843] BTRFS info (device vdb): found 1 extents
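
To check that a resumed balance got through its block groups, the "relocating block group" lines in a dmesg like the one above can be pulled out with a short script. This is a rough sketch under stated assumptions: the pattern is inferred only from the lines in this log (it also tolerates a missing space before "flags"), and the helper name relocated_block_groups is hypothetical.

```python
import re

# Assumed line shape, inferred from this log only:
#   BTRFS info (device vdb): relocating block group 67742203904 flags metadata|dup
# \s* before "flags" tolerates logs where the space is missing.
RELOC_RE = re.compile(
    r"BTRFS info \(device (?P<device>[^)]+)\): "
    r"relocating block group (?P<start>\d+)\s*flags (?P<flags>\S+)"
)


def relocated_block_groups(log_text):
    """Return (device, block group start, flag list) for each relocation line."""
    groups = []
    for line in log_text.splitlines():
        match = RELOC_RE.search(line)
        if match:
            groups.append(
                (
                    match.group("device"),
                    int(match.group("start")),
                    match.group("flags").split("|"),
                )
            )
    return groups


if __name__ == "__main__":
    import sys

    # Example: dmesg | python3 this_script.py
    for device, start, flags in relocated_block_groups(sys.stdin.read()):
        print(device, start, ",".join(flags))
```

Lines that do not match, such as "found 12 extents", are simply skipped.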