On Monday 05 December 2016 08:39:02 Qu Wenruo wrote:
> At 12/04/2016 02:40 AM, Marc Joliet wrote:
> > Hello all,
> >
> > I'm having some trouble with btrfs on a laptop, possibly due to qgroups.
> > Specifically, some file system activities (e.g., snapshot creation,
> > baloo_file_extractor from KDE Plasma) cause the system to hang for up to
> > about 40 minutes, maybe more. It always causes (most of) my desktop to
> > hang, (although I can usually navigate between pre-existing Konsole tabs)
> > and prevents new programs from starting. I've seen the system load go up
> > to >30 before the laptop suddenly resumes normal operation. I've been
> > seeing this since Linux 4.7, maybe already 4.6.
>
> Qgroup is CPU intensive operation.
>
> The main problem is the design of btrfs extent tree, which bias towards
> snapshot creating speed, but quite complicated if used for tracing all
> referencer (which qgroup heavily relies on it).
>
>
> The main factor affecting qgroup speed, is how many shared extents are
> in the fs.
> This including reflinked files and snapshot, under most case snapshot is
> the main part.
>
> Unless we find a better solution, to keep both qgroup accurate and fast,
> I'd recommend to keep qgroup under a reasonable number.
> (Personally speaking, 10 would be good)
>
> Despite the qgroup, relocation(balancing) should also be affected by the
> number of shared extents.
OK

> > Now, I thought that maybe this was (indirectly) due to an overly full file
> > system (~90% full), so I deleted some things I didn't need to get it up to
> > 15% free. (For the record, I also tried mounting with ssd_spread.)
> > After that, I ran a balance with -dusage=50, which started out promising,
> > but then went back to the "bad" behaviour. *But* it seemed better than
> > before overall, so I started a balance with -musage=10, then -musage=50.
> > That turned out to be a mistake. Since I had to transport the laptop,
> > and couldn't wait for "balance cancel" to return (IIUC it only returns
> > after the next block (group?) is freed), I forced the laptop off.
> >
> > After I next turned on the laptop, the balance resumed, causing bootup to
> > fail, after which I remembered about the skip_balance mount option, which
> > I tried in a rescue shell from an initramfs. But wait, that failed, too!
> > Specifically, the stack trace I get whenever I try it includes as one of
> > the last lines:
> >
> > "RIP [<ffffffff8131226f>] qgroup_fix_relocated_data_extents+0x1f/0x2a8"
>
> This seems to be a NULL pointer bug in qgroup relocation fix.
>
> The latest fix (not merged yet) should address it.
>
> You could try the for-next-20161125 branch from David to fix it:
> https://github.com/kdave/btrfs-devel/tree/for-next-20161125

OK, I'll try that, thanks! I just have to wait for it to finish cloning...

> > (I can take photos of the full stack trace if requested.)
> >
> > So then I ran "btrfs qgroup show /sysroot/", which showed many quota
> > groups, much to my surprise. On the upside, at least now I discovered
> > the likely reason for the performance problems.
>
> So, the number of qgroups is the cause for the slowness.
OK

> > (I actually think I know why I'm seeing qgroups: at one point I was trying
> > out various snapshot/backup tools for btrfs, and one (I forgot which)
> > unconditionally activated quota support, which infuriated me, but I
> > promptly deactivated it, or so I thought. Is quota support automatically
> > enabled when qgroups are discovered, or did I perhaps not disable quota
> > support properly?)
>
> Qgroup will always be enabled after "btrfs quota enable", and until
> "btrfs quota disable" to disable it.
>
> No method to temporarily disable quota, since quota must trace any
> modification, or qgroup number will be out of true.
>
> So, one should manually disable quota.
> (And that's the backup tool to blame, it should either info user or
> disable qgroup on uninstallation)

Hmm, I must not be remembering the whole story then, because I was pretty
sure that I ran "quota disable" and verified that quotas were off, too, but
then again, it's been quite a while now (a year?) since it happened.

> > Since I couldn't use skip_balance, and logically can't destroy qgroups on
> > a read-only file system, I decided to wait for a regular mount to finish.
> > That has been running since Tuesday, and I am slowly growing impatient.
> >
> > Thus I arrive at my question(s): is there anything else I can try, short
> > of reformatting and restoring from backup? Can I use btrfs-check here, or
> > any other tool? Or...?
> >
> > Also, should I be able to avoid reformatting: how do I properly disable
> > quota support?
>
> "btrfs quota disable <mnt>", yes you need RW mount.
> Any RW mountable snapshot/subvolume is OK.

OK

> > (BTW, searching for qgroup_fix_relocated_data_extents turned up the ML
> > thread "[PATCH] Btrfs: fix endless loop in balancing block groups", could
> > that be related?)
>
> Nope, the actual fixing patches are:
> [PATCH 1/4] btrfs: qgroup: Add comments explaining how btrfs qgroup works
> [PATCH 2/4] btrfs: qgroup: Rename functions to make it follow
> reserve,trace,account steps
> [PATCH 3/4] btrfs: Expoert and move leaf/subtree qgroup helpers to qgroup.c
> [PATCH 4/4] btrfs: qgroup: Fix qgroup data leaking by using subtree tracing
>
>
> The 4th patch is the real working one, but relies on previous 3 to apply.
>
> The regression is also caused by my patch:
> [PATCH v3.1 2/3] btrfs: relocation: Fix leaking qgroups numbers on data
> extents
>
> Sorry for the trouble.

No problem, I just wish I would've thought to check for qgroups before
getting into this mess. Although I'm actually *relieved* that it's qgroups,
because before that I was worried that I had finally hit a nigh-show-stopping
bug. I thought that I was merely not seeing it on my other systems, but that
it could happen at any time. Now I'm more confident in the stability of my
systems again :) .

> And for your recovery, I'd suggest to install an Archlinux into a USB
> HDD or USB stick, and compile David's branch and install it into the USB
> HDD.
>
> Then use the USB storage as rescue tool to mount the fs, which should do
> RW mount with or without skip_balance mount option.
> So you could disable quota then.

OK, I'll try that, thanks!

> Thanks,
> Qu
>
> > The laptop is currently running Gentoo with Linux 4.8.10 and btrfs-progs
> > 4.8.4.
> >
> > Greetings
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

Greetings
--
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup
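
For reference, the recovery path suggested in this thread (boot a rescue system with a fixed kernel, mount the file system read-write with skip_balance, then run "btrfs quota disable") could be scripted roughly as below. This is only a sketch under assumptions: the device /dev/sdXN and the mount point /mnt/rescue are placeholders that do not come from the thread, the helper names run and recover are made up for illustration, and by default the commands are only printed rather than executed.

```shell
#!/bin/sh
# Sketch of the recovery steps discussed in the thread.
# DEV and MNT are placeholders (assumptions), not values from the thread.
DEV="${DEV:-/dev/sdXN}"
MNT="${MNT:-/mnt/rescue}"

# Hypothetical helper: print the command by default.
# Set DRY_RUN=0 to really execute it (requires root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: the three steps, stopping at the first failure.
recover() {
    # Read-write mount that leaves the interrupted balance paused.
    run mount -o skip_balance "$DEV" "$MNT" || return 1
    # Quota can only be disabled on a read-write mount.
    run btrfs quota disable "$MNT" || return 1
    # Check the result; once quota is off this is expected to report an error.
    run btrfs qgroup show "$MNT" || echo "qgroup show failed (expected if quota is now disabled)"
}
```

Calling recover with the defaults only prints the three commands, so the sequence can be reviewed before running it for real with DRY_RUN=0 and the actual device and mount point.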