Re: system hangs due to qgroups

Marc Joliet Sun, 04 Dec 2016 08:03:09 -0800

OK, so I tried a few things, to no avail; more below.

On Saturday 03 December 2016 15:56:45 Chris Murphy wrote:
> On Sat, Dec 3, 2016 at 2:46 PM, Marc Joliet <mar...@gmx.de> wrote:
> > On Saturday 03 December 2016 13:42:42 Chris Murphy wrote:
> >> On Sat, Dec 3, 2016 at 11:40 AM, Marc Joliet <mar...@gmx.de> wrote:
> >> > Hello all,
> >> > 
> >> > I'm having some trouble with btrfs on a laptop, possibly due to
> >> > qgroups. Specifically, some file system activities (e.g., snapshot
> >> > creation, baloo_file_extractor from KDE Plasma) cause the system to
> >> > hang for up to about 40 minutes, maybe more.
> >> 
> >> Do you get any blocked tasks kernel messages? If so, issue sysrq+w
> >> during the hang, and then check the system log (dmesg may not contain
> >> everything if the command fills the message buffer). If it's a hang
> >> without any kernel messages, then issue sysrq+t.
> >> 
> >> https://www.kernel.org/doc/Documentation/sysrq.txt
> > 
> > As it's a rescue shell, I have only the one shell AFAIK, and it's occupied
> > by mount.  So I can't tell if there are dmesg entries; however, when this
> > happens on a normally running system, I have never seen any dmesg entries.
> > Anyway, I ran both.
> 
> OK so this is root fs? I would try to work on it from another volume.
> An advantage of openSUSE Tumbleweed is they claim to fully support
> qgroups, where upstream uses much more guarded language about its
> stability.
> 
> Whereas last night's Fedora Rawhide has kernel 4.9-rc7 and btrfs-progs
> 4.8.5.
> https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20161203.n.0/compose/Workstation/x86_64/iso/Fedora-Workstation-netinst-x86_64-Rawhide-20161203.n.0.iso
> 
> You can use dd to write the ISO to a USB stick, it supports BIOS and
> UEFI and Secure Boot.
> 
> Troubleshooting > Rescue a Fedora system > option 3 gets you to a shell.
> The sysrq+t and sysrq+w output can be written out in its entirety, with
> monotonic timestamps, using 'journalctl -b -k -o short-monotonic >
> kernelmessages.log'.
> 
> Unfortunately this is not a live system, so you can't (as far as I
> know) install the 'script' utility to more easily capture everything to
> a single file; 'btrfs check <dev> > btrfscheck.log' should capture most
> of the output, but it misses a few early lines for some reason.
> 
> And then scp those files to another system, or mount another stick and
> copy locally.
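One guess about the missing early lines from btrfs check (an assumption, not verified against the btrfs-progs source): they may be written to stderr, which a plain '>' redirect doesn't capture. In a shell, 'btrfs check <dev> > btrfscheck.log 2>&1' would catch both streams. The same idea as a minimal Python sketch, where capture_to_log is a hypothetical helper name:

```python
import subprocess


def capture_to_log(cmd, log_path):
    """Run cmd, writing its stdout and stderr interleaved into log_path.

    Equivalent to "cmd > log_path 2>&1" in a shell.
    Returns the command's exit status.
    """
    with open(log_path, "w") as log:
        # stderr=subprocess.STDOUT sends stderr to the same file as stdout,
        # so lines printed to either stream end up in the log.
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

For example, capture_to_log(["btrfs", "check", "/dev/sdXN"], "btrfscheck.log"), where /dev/sdXN is a placeholder for the real device, or capture_to_log(["journalctl", "-b", "-k", "-o", "short-monotonic"], "kernelmessages.log").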


That's a good idea, although I'll probably start with sysrescuecd (Linux 4.8.5 
and btrfs-progs 4.7.3), as I already have experience with it.

[After trying it]

Well, crap: I was able to get images of the file system (one sanitized), but 
mounting always fails with "device or resource busy" (with no corresponding 
dmesg output).  (Also, that drive's partitions weren't discovered on bootup; I 
had to run partprobe first.)  I never see that in the initramfs, so I'm not 
sure what's causing it.

Also, now the file system fails with the BUG I mentioned, see here:

[Sun Dec  4 12:27:07 2016] BUG: unable to handle kernel paging request at fffffffffffffe10
[Sun Dec  4 12:27:07 2016] IP: [<ffffffff8131226f>] qgroup_fix_relocated_data_extents+0x1f/0x2a0
[Sun Dec  4 12:27:07 2016] PGD 1c07067 PUD 1c09067 PMD 0 
[Sun Dec  4 12:27:07 2016] Oops: 0000 [#1] PREEMPT SMP
[Sun Dec  4 12:27:07 2016] Modules linked in: crc32c_intel serio_raw
[Sun Dec  4 12:27:07 2016] CPU: 0 PID: 370 Comm: mount Not tainted 4.8.11-gentoo #1
[Sun Dec  4 12:27:07 2016] Hardware name: FUJITSU LIFEBOOK A530/FJNBB06, BIOS Version 1.19   08/15/2011
[Sun Dec  4 12:27:07 2016] task: ffff8801b1d90000 task.stack: ffff8801b1268000
[Sun Dec  4 12:27:07 2016] RIP: 0010:[<ffffffff8131226f>]  [<ffffffff8131226f>] qgroup_fix_relocated_data_extents+0x1f/0x2a0
[Sun Dec  4 12:27:07 2016] RSP: 0018:ffff8801b126bcd8  EFLAGS: 00010246
[Sun Dec  4 12:27:07 2016] RAX: 0000000000000000 RBX: ffff8801b10b3150 RCX: 0000000000000000
[Sun Dec  4 12:27:07 2016] RDX: ffff8801b20f24f0 RSI: ffff8801b2790800 RDI: ffff8801b20f2460
[Sun Dec  4 12:27:07 2016] RBP: ffff8801b10bc000 R08: 0000000000020340 R09: ffff8801b20f2460
[Sun Dec  4 12:27:07 2016] R10: ffff8801b48b7300 R11: ffffea0005dd0ac0 R12: ffff8801b126bd70
[Sun Dec  4 12:27:07 2016] R13: 0000000000000000 R14: ffff8801b2790800 R15: 00000000b20f2460
[Sun Dec  4 12:27:07 2016] FS:  00007f97a7846780(0000) GS:ffff8801bbc00000(0000) knlGS:0000000000000000
[Sun Dec  4 12:27:07 2016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sun Dec  4 12:27:07 2016] CR2: fffffffffffffe10 CR3: 00000001b12ae000 CR4: 00000000000006f0
[Sun Dec  4 12:27:07 2016] Stack:
[Sun Dec  4 12:27:07 2016]  0000000000000801 0000000000000801 ffff8801b20f2460 ffff8801b4aaa000
[Sun Dec  4 12:27:07 2016]  0000000000000801 ffff8801b20f2460 ffffffff812c23ed ffff8801b1d90000
[Sun Dec  4 12:27:07 2016]  0000000000000000 00ff8801b126bd18 ffff8801b10b3150 ffff8801b4aa9800
[Sun Dec  4 12:27:07 2016] Call Trace:
[Sun Dec  4 12:27:07 2016]  [<ffffffff812c23ed>] ? start_transaction+0x8d/0x4e0
[Sun Dec  4 12:27:07 2016]  [<ffffffff81317913>] ? btrfs_recover_relocation+0x3b3/0x440
[Sun Dec  4 12:27:07 2016]  [<ffffffff81292b2a>] ? btrfs_remount+0x3ca/0x560
[Sun Dec  4 12:27:07 2016]  [<ffffffff811bfc04>] ? shrink_dcache_sb+0x54/0x70
[Sun Dec  4 12:27:07 2016]  [<ffffffff811ad473>] ? do_remount_sb+0x63/0x1d0
[Sun Dec  4 12:27:07 2016]  [<ffffffff811c9953>] ? do_mount+0x6f3/0xbe0
[Sun Dec  4 12:27:07 2016]  [<ffffffff811c918f>] ? copy_mount_options+0xbf/0x170
[Sun Dec  4 12:27:07 2016]  [<ffffffff811ca111>] ? SyS_mount+0x61/0xa0
[Sun Dec  4 12:27:07 2016]  [<ffffffff8169565b>] ? entry_SYSCALL_64_fastpath+0x13/0x8f
[Sun Dec  4 12:27:07 2016] Code: 66 90 66 2e 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54 55 53 48 83 ec 50 48 8b 46 08 4c 8b 6e 10 48 8b a8 f0 01 00 00 31 c0 <4d> 8b a5 10 fe ff ff f6 85 80 0c 00 00 01 74 09 80 be b0 05 00 
[Sun Dec  4 12:27:07 2016] RIP  [<ffffffff8131226f>] qgroup_fix_relocated_data_extents+0x1f/0x2a0
[Sun Dec  4 12:27:07 2016]  RSP <ffff8801b126bcd8>
[Sun Dec  4 12:27:07 2016] CR2: fffffffffffffe10
[Sun Dec  4 12:27:07 2016] ---[ end trace bd51bbcfd10492f7 ]---

The main difference is that I remounted rw instead of unmounting and mounting 
again.  In any case, my hope was to mount the file system from the live 
medium, then cancel the scrub from another terminal window.
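For whoever reads the oops: by my own hand decoding of the Code: line (standard x86-64 encoding, so please double-check it), "4c 8b 6e 10" is mov r13,[rsi+0x10] and the faulting "<4d> 8b a5 10 fe ff ff" is mov r12,[r13-0x1f0]. R13 is 0000000000000000 in the register dump, so the address in CR2 looks like a NULL pointer plus a negative displacement, wrapped to 64 bits. A small sketch of that arithmetic (the function names are just for illustration):

```python
MASK64 = (1 << 64) - 1


def sign_extend32(value):
    """Interpret a 32-bit displacement as a signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value


def effective_address(base, disp32):
    """Compute base + signed disp32, wrapped to 64 bits as the CPU does."""
    return (base + sign_extend32(disp32)) & MASK64


# The displacement bytes "10 fe ff ff" are little-endian for 0xfffffe10.
disp = int.from_bytes(bytes([0x10, 0xFE, 0xFF, 0xFF]), "little")
r13 = 0x0000000000000000  # value of R13 from the register dump

print(hex(sign_extend32(disp)))           # -0x1f0
print(hex(effective_address(r13, disp)))  # 0xfffffffffffffe10, same as CR2
```

If the decoding is right, the pointer loaded from offset 0x10 of the structure RSI points to was NULL when btrfs_recover_relocation ran during the remount.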

Ah, but what does work is mounting a snapshot, in the sense that mount doesn't 
fail.  However, it seems that the balance still continues, so I'm back at 
square one.

> > Should I take photos?  That'll be annoying to do with all the scrolling,
> > but I can do that if need be.
> 
> I can't decipher it anyway, it's mainly for a dev who wanders across
> this thread or if you file a bug report. But you can get the complete
> output using the method above.

Alright, I can try the Fedora image now that sysrescuecd is a dead end.  I can 
also try to insert the SSD in my desktop (it's a SATA device IIRC).

Oh, and I was wrong: the initramfs rescue shell *does* show dmesg output as it 
comes along, as I witnessed when inserting a USB stick.

> >> > After I next turned on the laptop, the balance resumed, causing bootup
> >> > to fail, after which I remembered about the skip_balance mount option,
> >> > which I tried in a rescue shell from an initramfs.
> >> 
> >> The file system is the root filesystem? If so, skip_balance may not be
> >> happening soon enough. Use kernel parameter rootflags=skip_balance
> >> which will apply this mount option at the very first moment the file
> >> system is mounted during boot.
> > 
> > Yes, it's the root file system (there's that plus a swap partition).  I
> > believe I tried rootflags, but I think it also failed, which is why I'm
> > using a rescue shell now.  I can try it again, though, if anybody thinks
> > that there's no point in waiting, especially if btrfs_scrub_pause in the
> > btrfs-transaction call trace is significant.
> 
> It sounds like it's resuming a scrub. That won't happen if you boot
> from an alternate volume. There's a scrub file found at
> /var/lib/btrfs/ that tracks the progress of scrubs for each btrfs
> volume, and the file tracking the in-progress scrub of your file system
> is actually stored in that directory on that same file system. If you haven't had
> luck with btrfs scrub cancel, you can just remove the files in that
> directory when you get a chance to rw mount the volume.

OK, I did try again with rootflags=skip_balance, then remounting 
rw,skip_balance, but that also fails, as expected.  If mount ever returned, I 
probably wouldn't have to remove those files, though ;) .
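Should the volume ever mount rw, here is a minimal sketch for finding and, optionally, deleting those leftover scrub files. It assumes they are named scrub.status.<fsid> and scrub.progress.<fsid>, which is my understanding of how btrfs-progs names them; list them first and only delete once you've checked. find_scrub_files and remove_scrub_files are hypothetical helpers; they take the mount point so they also work on a root mounted elsewhere, e.g. /mnt.

```python
from pathlib import Path


def find_scrub_files(mount_point):
    """Return scrub bookkeeping entries under <mount_point>/var/lib/btrfs.

    Assumes the naming scheme scrub.status.<fsid> / scrub.progress.<fsid>.
    Anything that isn't a directory is included, since a leftover progress
    entry may be a socket rather than a regular file.
    """
    state_dir = Path(mount_point) / "var" / "lib" / "btrfs"
    if not state_dir.is_dir():
        return []
    return sorted(p for p in state_dir.glob("scrub.*") if not p.is_dir())


def remove_scrub_files(mount_point, dry_run=True):
    """Delete the entries found by find_scrub_files (only list them by default)."""
    files = find_scrub_files(mount_point)
    for path in files:
        print(("would remove " if dry_run else "removing ") + str(path))
        if not dry_run:
            path.unlink()
    return files
```

Running remove_scrub_files("/mnt") first only prints what would go; pass dry_run=False once the list looks right.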

> >> > Since I couldn't use skip_balance, and logically can't destroy qgroups
> >> > on a read-only file system, I decided to wait for a regular mount to
> >> > finish. That has been running since Tuesday, and I am slowly growing
> >> > impatient.
> >> 
> >> Haha, no kidding! I think that's very patient.
> > 
> > Heh :) . I've still got my main desktop (as ancient as it may be), so I'm
> > content with waiting for now, but I don't want to wait forever, especially
> > if there might not even be a point.
> 
> How big is the file system? Sounds like it's a single device volume on
> a laptop so I'm guessing at most 1TB, and that'd mean at most 100GiB
> of metadata, which should mean around 15 minutes max to completely
> read and process all the metadata, and maybe a few hours to do a
> scrub. I'd bail after a few hours for sure.

It's only 108 GB.  I'm tolerating this low performance because it seems to me 
that it is tied to the same hangs I get at regular system run-time.
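Scaling Chris's rule of thumb to this volume: metadata at most about 10% of the file system, and roughly 15 minutes to read 100 GiB of it. Both figures are his rough estimates, not measurements; the sketch below just redoes the arithmetic for 108 GB.

```python
GIB = 1024 ** 3

# Rough figures from the estimate above (assumptions, not measurements):
# metadata is at most ~10% of the file system, and ~100 GiB of metadata
# can be read and processed in ~15 minutes.
METADATA_FRACTION = 0.10
RATE_BYTES_PER_S = 100 * GIB / (15 * 60)


def metadata_read_minutes(fs_bytes):
    """Upper-bound minutes to read all metadata under the assumptions above."""
    metadata_bytes = fs_bytes * METADATA_FRACTION
    return metadata_bytes / RATE_BYTES_PER_S / 60


fs_size = 108 * 10 ** 9  # "108 GB", taken as decimal gigabytes
print(f"{RATE_BYTES_PER_S / 1024 ** 2:.0f} MiB/s assumed read rate")
print(f"{metadata_read_minutes(fs_size):.1f} minutes upper bound")
```

Under those assumptions a full metadata pass comes out at about a minute and a half, so a mount that has been running since Tuesday is far outside the estimate.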

[...]
> >> > Also, should I be able to avoid reformatting: how do I properly disable
> >> > quota support?
> >> 
> >> 'btrfs quota disable' is the only command that applies to this and it
> >> requires rw mount; there's no 'noquota' mount option.
> > 
> > OK, thanks.
> > 
> > So what should I try next?  I'm sick at home, so I can spend more time on
> > this than usual.
> 
> Well if it were me I'd use btrfs check to see what state it thinks the
> file system is in. And then I'd do btrfs image to make a copy of the
> filesystem metadata both for the devs and also in case the next things
> make the problem worse; in theory the fs can be restored (or you can
> set up an overlay if you prefer).

Well, btrfs check came back clean.  And as mentioned above, I was able to get 
two images, but with btrfs-progs 4.7.3 (the version in sysrescuecd).  I can 
get different images from the initramfs (which I didn't think of earlier, 
sorry).

> And then I'd mount normally, possibly with skip_balance. Capture
> sysrq+t or +w or both. And then see if things get more sane if you
> disable quotas. If not, then I'd see if it'll tolerate 'btrfs qgroup
> destroy' on a few subvolumes. I'd basically use destroy and remove to
> wipe away all the quotas - I don't know offhand if quotas need to be
> enabled for qgroup remove/destroy to work so you'll have to figure
> that out. And it might take a while for the command to complete, but
> I'd like to believe as you wipe away the qgroups, whatever qgroup
> related kernel accounting is happening will eventually stop.

skip_balance always fails.  The rest sounds good, but I'll need a live system 
to mount the FS.
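For when I do get it mounted rw, a dry-run sketch of Chris's suggestion: parse the output of 'btrfs qgroup show <path>' and build one 'btrfs qgroup destroy <qgroupid> <path>' per qgroup, to run by hand and see which ones the kernel accepts. It assumes the default table output (a header, a dashed separator, then rows whose first column is the qgroupid in level/id form); parse_qgroup_ids and destroy_commands are hypothetical helpers. Higher-level qgroups are listed first, but whether that order is right, and whether some need 'btrfs qgroup remove' first, is exactly what has to be figured out.

```python
import re
import shlex

QGROUP_ID = re.compile(r"^(\d+)/(\d+)$")


def parse_qgroup_ids(show_output):
    """Extract qgroupids (level/id) from `btrfs qgroup show` text output.

    Header and separator lines are skipped because their first column
    is not of the form <level>/<id>.
    """
    ids = []
    for line in show_output.splitlines():
        fields = line.split()
        if fields and QGROUP_ID.match(fields[0]):
            ids.append(fields[0])
    return ids


def destroy_commands(show_output, path):
    """Build (but do not run) one `btrfs qgroup destroy` command per qgroup.

    Sorted by (level, id), highest first, so level-1+ qgroups come
    before the level-0 qgroups that may be assigned to them.
    """
    ids = parse_qgroup_ids(show_output)
    ids.sort(key=lambda q: tuple(int(n) for n in q.split("/")), reverse=True)
    return [shlex.join(["btrfs", "qgroup", "destroy", q, path]) for q in ids]
```

Feed it the captured text of 'btrfs qgroup show /mnt' and review the printed commands before running any of them.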

> It sounds to me like there may be some legacy qgroup confusion going
> on, but I haven't tested this much at all, so you're kinda on the
> bleeding edge.

OK

I think I'll try mounting the SSD in my desktop first, then I'll try the 
Fedora image.  Perhaps its newer kernel will help.

Thanks
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup
