On 2018-04-23 13:08, Dmitrii Tcvetkov wrote:
> On Mon, 23 Apr 2018 09:23:53 +0800
> Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>
>> On 2018-04-21 22:55, Dmitrii Tcvetkov wrote:
>>> TL;DR It seems as regression in 4.17, but I managed to find a
>>> workaround to make filesystem rw mountable again.
>>>
>>> Kernel built from tag v4.17-rc1
>>> btrfs-progs 4.16
>>>
>>> Tonight two my machines (PC (ECC RAM) and laptop(non-ECC RAM)) were
>>> doing usual weekly balance with this command via cron:
>>> btrfs balance start -musage=50 -dusage=50 <mountpoint>
>>> Both machines run same kernel version.
>>>
>>> On PC that caused root and "data" filesystems to go readonly. Root
>>> is on an SSD with data single and metadata DUP, "data" filesystem
>>> is on 2 HDDs with RAID1 for data and metadata.
>>>
>>> On laptop only /home went ro, it's on NVMe SSD with data single and
>>> metadata DUP.
>>>
>>> Btrfs check of PC rootfs was without any errors in both modes, I did
>>> them once each before reboot on readonly filesystem with --force
>>> flag and then from live usb. Same output without any errors.
>>>
>>> After reboot kernel refused rw mount rootfs with the same error as
>>> during cron balance, ro mount was accepted, error during rw mount:
>>> BTRFS: error (device dm-17) in merge_reloc_roots:2465: errno=-117
>
>> 117 means EUCLEAN, which could be caused by the newly introduced
>> first_key and level check.
>
>> Please apply this hotfix to fix it.
>> btrfs: Only check first key for committed tree blocks
>> (Which is included in latest pull request)
>
>> Also, please consider enable CONFIG_BTRFS_DEBUG to provide extra
>> debug info.
>
>> Thanks,
>> Qu
>
> I tried 4.17-rc2 (as the pull request was pulled) with
> CONFIG_BTRFS_DEBUG on LVM snapshot of laptop home partition (/dev/vdb)
> in a VM (VM kernel sees only snapshot so no UUID collisions). Dmesg
> attached.
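
For anyone following along: the errno=-117 in the quoted message can be
matched against the userspace errno headers. The snippet below is only a
minimal sketch, assuming a Linux system with glibc headers on x86 (or
another architecture using the asm-generic errno numbering, where
EUCLEAN is 117). The helper name is_euclean() is made up for this
illustration and is not a kernel or libc function.

```c
#include <errno.h>

/*
 * Hypothetical helper, for illustration only: report whether an error
 * value as printed in a btrfs message (e.g. "errno=-117") is EUCLEAN.
 * Kernel code returns negative errno values, so the sign is normalized
 * before comparing.
 */
int is_euclean(int err)
{
	if (err < 0)
		err = -err;
	return err == EUCLEAN;
}
```

Calling is_euclean(-117) on such a system should return nonzero, which
matches the reading of the merge_reloc_roots error above as EUCLEAN
rather than, say, EIO.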

Thanks for the info and your previous btrfs-image.

The image itself shows nothing wrong, so it should be a runtime problem.

Would you please apply these two debug patches?
https://patchwork.kernel.org/patch/10335133/
https://patchwork.kernel.org/patch/10335135/

And the attached diff file?

My guess is that the parent node is not initialized correctly in this
case.

Thanks,
Qu

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 60caa68c3618..79f482578e02 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -458,6 +458,7 @@ static int verify_level_key(struct btrfs_fs_info *fs_info,
 			  eb->start, first_key->objectid, first_key->type,
 			  first_key->offset, found_key.objectid,
 			  found_key.type, found_key.offset);
+		btrfs_print_tree(eb, false);
 	}
 #endif
 	return ret;
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 00b7d3231821..cde0cb6c9786 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1870,6 +1870,8 @@ int replace_path(struct btrfs_trans_handle *trans,
 						     level - 1, &first_key);
 			if (IS_ERR(eb)) {
 				ret = PTR_ERR(eb);
+				btrfs_err(fs_info, "parent leaf, slot: %d:", slot);
+				btrfs_print_tree(parent, false);
 				break;
 			} else if (!extent_buffer_uptodate(eb)) {
 				ret = -EIO;