On Monday 05 December 2016 08:39:02 Qu Wenruo wrote:
> At 12/04/2016 02:40 AM, Marc Joliet wrote:
> > Hello all,
> > 
> > I'm having some trouble with btrfs on a laptop, possibly due to qgroups.
> > Specifically, some file system activities (e.g., snapshot creation,
> > baloo_file_extractor from KDE Plasma) cause the system to hang for up to
> > about 40 minutes, maybe more.  It always causes (most of) my desktop to
> > hang, (although I can usually navigate between pre-existing Konsole tabs)
> > and prevents new programs from starting.  I've seen the system load go up
> > to >30 before the laptop suddenly resumes normal operation.  I've been
> > seeing this since Linux 4.7, maybe already 4.6.
> 
> Qgroup accounting is a CPU-intensive operation.
> 
> The main problem is the design of the btrfs extent tree, which is biased
> towards fast snapshot creation but is quite complicated to use for tracing
> all referencers (which qgroup relies on heavily).
> 
> 
> The main factor affecting qgroup speed is how many shared extents are
> in the fs.
> This includes reflinked files and snapshots; in most cases snapshots are
> the main part.
> 
> Unless we find a better solution that keeps qgroup both accurate and fast,
> I'd recommend keeping the number of qgroups reasonably small.
> (Personally speaking, 10 would be good)
> 
> Besides qgroup, relocation (balancing) is also affected by the
> number of shared extents.

OK

> > Now, I thought that maybe this was (indirectly) due to an overly full file
> > system (~90% full), so I deleted some things I didn't need to get it up to
> > 15% free.  (For the record, I also tried mounting with ssd_spread.) 
> > After that, I ran a balance with -dusage=50, which started out promising,
> > but then went back to the "bad" behaviour.  *But* it seemed better than
> > before overall, so I started a balance with -musage=10, then -musage=50. 
> > That turned out to be a mistake.  Since I had to transport the laptop,
> > and couldn't wait for "balance cancel" to return (IIUC it only returns
> > after the next block (group?) is freed), I forced the laptop off.
> > 
> > After I next turned on the laptop, the balance resumed, causing bootup to
> > fail, after which I remembered about the skip_balance mount option, which
> > I tried in a rescue shell from an initramfs.  But wait, that failed, too!
> > Specifically, the stack trace I get whenever I try it includes as one of
> > the last lines:
> > 
> > "RIP [<ffffffff8131226f>] qgroup_fix_relocated_data_extents+0x1f/0x2a8"
> 
> This seems to be a NULL pointer bug in the qgroup relocation fix.
> 
> The latest fix (not merged yet) should address it.
> 
> You could try the for-next-20161125 branch from David to fix it:
> https://github.com/kdave/btrfs-devel/tree/for-next-20161125

OK, I'll try that, thanks!  I just have to wait for it to finish cloning...

> > (I can take photos of the full stack trace if requested.)
> > 
> > So then I ran "btrfs qgroup show /sysroot/", which showed many quota
> > groups, much to my surprise.  On the upside, at least now I discovered
> > the likely reason for the performance problems.
> 
> So, the number of qgroups is the cause for the slowness.

OK
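
A minimal sketch for checking how many qgroups a file system carries,
assuming the text printed by "btrfs qgroup show <mnt>" has been captured
into a string.  The sample output below is made up for illustration, and
the limit of 10 is only the rule of thumb suggested above, not a hard
threshold.

```python
import re

# A qgroup row starts with an ID of the form <level>/<id>, e.g. "0/257".
QGROUP_ID = re.compile(r"^\s*\d+/\d+\b")

# Rule of thumb from this thread, not an official limit.
SUGGESTED_LIMIT = 10


def count_qgroups(show_output: str) -> int:
    """Count the qgroup rows in text captured from `btrfs qgroup show`."""
    return sum(1 for line in show_output.splitlines() if QGROUP_ID.match(line))


# Hypothetical sample output; the sizes and IDs are invented.
SAMPLE = """\
qgroupid         rfer         excl
--------         ----         ----
0/5          16.00KiB     16.00KiB
0/257         1.20GiB    300.00MiB
0/258         1.21GiB     12.00MiB
"""

if __name__ == "__main__":
    n = count_qgroups(SAMPLE)
    note = "more than suggested" if n > SUGGESTED_LIMIT else "within suggestion"
    print(f"{n} qgroups ({note})")
```

The header and separator lines are skipped because they do not start with
a <level>/<id> pair.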

> > (I actually think I know why I'm seeing qgroups: at one point I was trying
> > out various snapshot/backup tools for btrfs, and one (I forgot which)
> > unconditionally activated quota support, which infuriated me, but I
> > promptly deactivated it, or so I thought.  Is quota support automatically
> > enabled when qgroups are discovered, or did I perhaps not disable quota
> > support properly?)
> 
> Qgroup stays enabled after "btrfs quota enable" until it is explicitly
> disabled with "btrfs quota disable".
> 
> There is no method to temporarily disable quota, since quota must trace
> every modification, or the qgroup numbers will become inaccurate.
> 
> So, one has to disable quota manually.
> (And the backup tool is to blame here: it should either inform the user or
> disable qgroup on uninstallation.)

Hmm, I must not be remembering the whole story then, because I was pretty sure 
that I ran "quota disable" and verified that quotas were off, too, but then 
again, it's been quite a while now (a year?) since it happened.

> > Since I couldn't use skip_balance, and logically can't destroy qgroups on
> > a read-only file system, I decided to wait for a regular mount to finish.
> > That has been running since Tuesday, and I am slowly growing impatient.
> > 
> > Thus I arrive at my question(s): is there anything else I can try, short
> > of reformatting and restoring from backup?  Can I use btrfs-check here, or
> > any other tool?  Or...?
> > 
> > Also, should I be able to avoid reformatting: how do I properly disable
> > quota support?
> 
> Use "btrfs quota disable <mnt>"; and yes, you need an RW mount.
> Any RW-mountable snapshot/subvolume is OK.

OK

> > (BTW, searching for qgroup_fix_relocated_data_extents turned up the ML
> > thread "[PATCH] Btrfs: fix endless loop in balancing block groups", could
> > that be related?)
> 
> Nope, the actual fixing patches are:
> [PATCH 1/4] btrfs: qgroup: Add comments explaining how btrfs qgroup works
> [PATCH 2/4] btrfs: qgroup: Rename functions to make it follow
> reserve,trace,account steps
> [PATCH 3/4] btrfs: Export and move leaf/subtree qgroup helpers to qgroup.c
> [PATCH 4/4] btrfs: qgroup: Fix qgroup data leaking by using subtree tracing
> 
> 
> The 4th patch is the one that actually contains the fix, but it relies on
> the previous 3 to apply.
> 
> The regression is also caused by my patch:
> [PATCH v3.1 2/3] btrfs: relocation: Fix leaking qgroups numbers on data
> extents
> 
> Sorry for the trouble.

No problem, I just wish I had thought to check for qgroups before getting
into this mess.

Although I'm actually *relieved* that it's qgroups, because before that I was 
worried that I had finally hit a nigh-show-stopping bug.  I thought that I was 
merely not seeing it on my other systems, but that it could happen at any 
time.  Now I'm more confident in the stability of my systems again :) .

> And for your recovery, I'd suggest installing Arch Linux onto a USB
> HDD or USB stick, then compiling David's branch and installing it there.
> 
> Then use the USB storage as a rescue tool to mount the fs; an RW mount
> should work with or without the skip_balance mount option.
> You can then disable quota.

OK, I'll try that, thanks!
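
A minimal sketch of that rescue sequence, written as a helper that only
builds the commands so they can be reviewed before anything is run.  The
device /dev/sdX2 and the mount point /mnt are placeholders, and cancelling
the paused balance is an optional extra step assumed here, not something
the advice above requires.

```python
import shlex


def recovery_commands(device, mountpoint, cancel_balance=True):
    """Build the commands for an RW mount followed by disabling quota."""
    # skip_balance leaves the interrupted balance paused instead of resuming it.
    cmds = [["mount", "-o", "skip_balance", device, mountpoint]]
    if cancel_balance:
        # Optional (assumption): drop the paused balance rather than resume it.
        cmds.append(["btrfs", "balance", "cancel", mountpoint])
    # Quota can only be disabled on a read-write mount.
    cmds.append(["btrfs", "quota", "disable", mountpoint])
    return cmds


if __name__ == "__main__":
    # Placeholder device and mount point; substitute the real ones.
    for cmd in recovery_commands("/dev/sdX2", "/mnt"):
        print(shlex.join(cmd))
```

Run from the rescue system as root, each printed line can be pasted into a
shell, or each list passed to subprocess.run(cmd, check=True) so that the
sequence stops at the first failure.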

> Thanks,
> Qu
> 
> > The laptop is currently running Gentoo with Linux 4.8.10 and btrfs-progs
> > 4.8.4.
> > 
> > Greetings
> 

Greetings
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup
