On Sunday 04 December 2016 11:52:40 Chris Murphy wrote:
> On Sun, Dec 4, 2016 at 9:02 AM, Marc Joliet <mar...@gmx.de> wrote:
> > Also, now the file system fails with the BUG I mentioned, see here:
> > 
> > [Sun Dec  4 12:27:07 2016] BUG: unable to handle kernel paging request at
> > fffffffffffffe10
> > [Sun Dec  4 12:27:07 2016] IP: [<ffffffff8131226f>]
> > qgroup_fix_relocated_data_extents+0x1f/0x2a0
> > [Sun Dec  4 12:27:07 2016] PGD 1c07067 PUD 1c09067 PMD 0
> > [Sun Dec  4 12:27:07 2016] Oops: 0000 [#1] PREEMPT SMP
> > [Sun Dec  4 12:27:07 2016] Modules linked in: crc32c_intel serio_raw
> > [Sun Dec  4 12:27:07 2016] CPU: 0 PID: 370 Comm: mount Not tainted 4.8.11-
> > gentoo #1
> > [Sun Dec  4 12:27:07 2016] Hardware name: FUJITSU LIFEBOOK A530/FJNBB06,
> > BIOS Version 1.19   08/15/2011
> > [Sun Dec  4 12:27:07 2016] task: ffff8801b1d90000 task.stack: ffff8801b1268000
> > [Sun Dec  4 12:27:07 2016] RIP: 0010:[<ffffffff8131226f>]  [<ffffffff8131226f>] qgroup_fix_relocated_data_extents+0x1f/0x2a0
> > [Sun Dec  4 12:27:07 2016] RSP: 0018:ffff8801b126bcd8  EFLAGS: 00010246
> > [Sun Dec  4 12:27:07 2016] RAX: 0000000000000000 RBX: ffff8801b10b3150
> > RCX:
> > 0000000000000000
> > [Sun Dec  4 12:27:07 2016] RDX: ffff8801b20f24f0 RSI: ffff8801b2790800
> > RDI:
> > ffff8801b20f2460
> > [Sun Dec  4 12:27:07 2016] RBP: ffff8801b10bc000 R08: 0000000000020340
> > R09:
> > ffff8801b20f2460
> > [Sun Dec  4 12:27:07 2016] R10: ffff8801b48b7300 R11: ffffea0005dd0ac0
> > R12:
> > ffff8801b126bd70
> > [Sun Dec  4 12:27:07 2016] R13: 0000000000000000 R14: ffff8801b2790800
> > R15:
> > 00000000b20f2460
> > [Sun Dec  4 12:27:07 2016] FS:  00007f97a7846780(0000)
> > GS:ffff8801bbc00000(0000) knlGS:0000000000000000
> > [Sun Dec  4 12:27:07 2016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [Sun Dec  4 12:27:07 2016] CR2: fffffffffffffe10 CR3: 00000001b12ae000 CR4: 00000000000006f0
> > [Sun Dec  4 12:27:07 2016] Stack:
> > [Sun Dec  4 12:27:07 2016]  0000000000000801 0000000000000801
> > ffff8801b20f2460 ffff8801b4aaa000
> > [Sun Dec  4 12:27:07 2016]  0000000000000801 ffff8801b20f2460
> > ffffffff812c23ed ffff8801b1d90000
> > [Sun Dec  4 12:27:07 2016]  0000000000000000 00ff8801b126bd18
> > ffff8801b10b3150 ffff8801b4aa9800
> > [Sun Dec  4 12:27:07 2016] Call Trace:
> > [Sun Dec  4 12:27:07 2016]  [<ffffffff812c23ed>] ?
> > start_transaction+0x8d/0x4e0
> > [Sun Dec  4 12:27:07 2016]  [<ffffffff81317913>] ?
> > btrfs_recover_relocation+0x3b3/0x440
> > [Sun Dec  4 12:27:07 2016]  [<ffffffff81292b2a>] ? btrfs_remount+0x3ca/0x560
> > [Sun Dec  4 12:27:07 2016]  [<ffffffff811bfc04>] ? shrink_dcache_sb+0x54/0x70
> > [Sun Dec  4 12:27:07 2016]  [<ffffffff811ad473>] ? do_remount_sb+0x63/0x1d0
> > [Sun Dec  4 12:27:07 2016]  [<ffffffff811c9953>] ? do_mount+0x6f3/0xbe0
> > [Sun Dec  4 12:27:07 2016]  [<ffffffff811c918f>] ? copy_mount_options+0xbf/0x170
> > [Sun Dec  4 12:27:07 2016]  [<ffffffff811ca111>] ? SyS_mount+0x61/0xa0
> > [Sun Dec  4 12:27:07 2016]  [<ffffffff8169565b>] ?
> > entry_SYSCALL_64_fastpath+0x13/0x8f
> > [Sun Dec  4 12:27:07 2016] Code: 66 90 66 2e 0f 1f 84 00 00 00 00 00 41 57
> > 41 56 41 55 41 54 55 53 48 83 ec 50 48 8b 46 08 4c 8b 6e 10 48 8b a8 f0
> > 01 00 00 31 c0 <4d> 8b a5 10 fe ff ff f6 85 80 0c 00 00 01 74 09 80 be b0 05 00
> > [Sun Dec  4 12:27:07 2016] RIP  [<ffffffff8131226f>]
> > qgroup_fix_relocated_data_extents+0x1f/0x2a0
> > [Sun Dec  4 12:27:07 2016]  RSP <ffff8801b126bcd8>
> > [Sun Dec  4 12:27:07 2016] CR2: fffffffffffffe10
> > [Sun Dec  4 12:27:07 2016] ---[ end trace bd51bbcfd10492f7 ]---
> 
> I can't parse this. Maybe someone else can. Do you get the same thing,
> or a different thing, if you do a normal mount rather than a remount?

The call trace is of course a bit different, but in both cases the RIP line is 
almost identical (if that even matters?).  Compare the line from my first 
message:

"RIP [<ffffffff8131226f>] qgroup_fix_relocated_data_extents+0x1f/0x2a8"

with the newest line:

"RIP [<ffffffff8131226f>] qgroup_fix_relocated_data_extents+0x1f/0x2a0"

But I just remembered, I have one from trying to mount the top-level subvolume 
on my desktop:

[So Dez  4 18:45:19 2016] BUG: unable to handle kernel paging request at 
fffffffffffffe10
[So Dez  4 18:45:19 2016] IP: [<ffffffff812f1103>] 
qgroup_fix_relocated_data_extents+0x33/0x2e0
[So Dez  4 18:45:19 2016] PGD 1a07067 PUD 1a09067 PMD 0 
[So Dez  4 18:45:19 2016] Oops: 0000 [#1] PREEMPT SMP
[So Dez  4 18:45:19 2016] Modules linked in: joydev dummy iptable_filter 
ip_tables x_tables hid_logitech_hidpp hid_logitech_dj snd_hda_codec_hdmi 
snd_hda_codec_analog snd_hda_codec_generic uvcvideo videobuf2_vmalloc 
videobuf2_memops videobuf2_v4l2 videobuf2_core videodev snd_usb_audio 
snd_hwdep snd_usbmidi_lib radeon i2c_algo_bit drm_kms_helper cfbfillrect 
syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea kvm_amd 
kvm ttm irqbypass evdev drm k8temp backlight snd_ice1724 snd_ak4113 snd_pt2258 
snd_hda_intel snd_i2c snd_ak4114 snd_hda_codec snd_ac97_codec snd_hda_core 
ac97_bus snd_ice17xx_ak4xxx snd_ak4xxx_adda snd_rawmidi snd_seq_device snd_pcm 
forcedeth snd_timer snd rtc_cmos asus_atk0110 i2c_nforce2 i2c_core sg sr_mod 
cdrom xhci_pci ata_generic ohci_pci xhci_hcd pata_amd pata_acpi ohci_hcd 
ehci_pci
[So Dez  4 18:45:19 2016]  ehci_hcd
[So Dez  4 18:45:19 2016] CPU: 1 PID: 8545 Comm: mount Not tainted 4.8.12-
gentoo #1
[So Dez  4 18:45:19 2016] Hardware name: System manufacturer System Product 
Name/M2N-E, BIOS ASUS M2N-E ACPI BIOS Revision 1701 10/30/2008
[So Dez  4 18:45:19 2016] task: ffff88003d0a2100 task.stack: ffff88011a5e0000
[So Dez  4 18:45:19 2016] RIP: 0010:[<ffffffff812f1103>]  [<ffffffff812f1103>] 
qgroup_fix_relocated_data_extents+0x33/0x2e0
[So Dez  4 18:45:19 2016] RSP: 0018:ffff88011a5e3a38  EFLAGS: 00010246
[So Dez  4 18:45:19 2016] RAX: 0000000000000000 RBX: ffff88007d3d8690 RCX: 
0000000000000000
[So Dez  4 18:45:19 2016] RDX: ffff880138be2ef0 RSI: ffff88007c3e3800 RDI: 
ffff880138be2e60
[So Dez  4 18:45:19 2016] RBP: ffff88007b69a000 R08: 000000000001f0d0 R09: 
ffff880138be2e60
[So Dez  4 18:45:19 2016] R10: ffff88007d3d82a0 R11: ffffea0001f4f600 R12: 
ffff88011a5e3ad0
[So Dez  4 18:45:19 2016] R13: 0000000000000000 R14: ffff88007c3e3800 R15: 
0000000038be2e60
[So Dez  4 18:45:19 2016] FS:  00007fe24229c780(0000) 
GS:ffff88013fc80000(0000) knlGS:0000000000000000
[So Dez  4 18:45:19 2016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[So Dez  4 18:45:19 2016] CR2: fffffffffffffe10 CR3: 000000010b3e8000 CR4: 
00000000000006e0
[So Dez  4 18:45:19 2016] Stack:
[So Dez  4 18:45:19 2016]  0000000000000801 0000000000000801 ffff880138be2e60 
ffff88005b94f000
[So Dez  4 18:45:19 2016]  0000000000000801 ffff880138be2e60 ffffffff8129dd8c 
ffff88003d0a2100
[So Dez  4 18:45:19 2016]  0000000000000000 00ff88011a5e3a78 ffff88007d3d8690 
ffff88005b94a800
[So Dez  4 18:45:19 2016] Call Trace:
[So Dez  4 18:45:19 2016]  [<ffffffff8129dd8c>] ? start_transaction+0x8c/0x4e0
[So Dez  4 18:45:19 2016]  [<ffffffff812f69df>] ? 
btrfs_recover_relocation+0x3bf/0x440
[So Dez  4 18:45:19 2016]  [<ffffffff8129a562>] ? open_ctree+0x2182/0x26a0
[So Dez  4 18:45:19 2016]  [<ffffffff81383769>] ? snprintf+0x39/0x40
[So Dez  4 18:45:19 2016]  [<ffffffff8126d362>] ? btrfs_mount+0xd32/0xe40
[So Dez  4 18:45:19 2016]  [<ffffffff81132e41>] ? pcpu_alloc+0x321/0x610
[So Dez  4 18:45:19 2016]  [<ffffffff81179915>] ? mount_fs+0x45/0x180
[So Dez  4 18:45:19 2016]  [<ffffffff811941e1>] ? vfs_kern_mount+0x71/0x130
[So Dez  4 18:45:19 2016]  [<ffffffff8126c7c2>] ? btrfs_mount+0x192/0xe40
[So Dez  4 18:45:19 2016]  [<ffffffff81132e41>] ? pcpu_alloc+0x321/0x610
[So Dez  4 18:45:19 2016]  [<ffffffff81179915>] ? mount_fs+0x45/0x180
[So Dez  4 18:45:19 2016]  [<ffffffff811941e1>] ? vfs_kern_mount+0x71/0x130
[So Dez  4 18:45:19 2016]  [<ffffffff81196df5>] ? do_mount+0x1e5/0xc00
[So Dez  4 18:45:19 2016]  [<ffffffff8117264f>] ? 
__check_object_size+0x13f/0x1ed
[So Dez  4 18:45:19 2016]  [<ffffffff8112eaa3>] ? memdup_user+0x53/0x80
[So Dez  4 18:45:19 2016]  [<ffffffff81197af5>] ? SyS_mount+0x75/0xc0
[So Dez  4 18:45:19 2016]  [<ffffffff8160275b>] ? 
entry_SYSCALL_64_fastpath+0x13/0x8f
[So Dez  4 18:45:19 2016] Code: 50 48 89 6c 24 58 4c 89 64 24 60 4c 89 6c 24 
68 4c 89 74 24 70 4c 89 7c 24 78 48 8b 46 08 4c 8b 6e 10 48 8b a8 f0 01 00 00 
31 c0 <4d> 8b a5 10 fe ff ff f6 85 80 0c 00 00 01 74 09 80 be b0 05 00 
[So Dez  4 18:45:19 2016] RIP  [<ffffffff812f1103>] 
qgroup_fix_relocated_data_extents+0x33/0x2e0
[So Dez  4 18:45:19 2016]  RSP <ffff88011a5e3a38>
[So Dez  4 18:45:19 2016] CR2: fffffffffffffe10
[So Dez  4 18:45:19 2016] ---[ end trace 5c46f8b4f82b998c ]---

> > Ah, but what does work is mounting a snapshot, in the sense that mount
> > doesn't fail.  However, it seems that the balance still continues, so I'm
> > back at square one.
> 
> Interesting that mounting a subvolume directly works, seeing as that's
> just a bind mount behind the scene. But maybe there's something wrong
> in the top level subvolume that's being skipped when mounting a
> subvolume directly.

Yeah. However, just to be clear, that's a new problem, and possibly a 
temporary one (I had the same subvolume fail, too, only for it to "work" on 
the next try).

> Are you mounting with skip_balance mount option?

When I try that, I get the BUGs mentioned above.  When I'm at the point where 
mount "works" but hangs, I am *not* using it.
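
To be explicit, by that I mean an invocation roughly like the following 
(device and mount point are placeholders):

  # mount with the interrupted balance left paused instead of resumed
  mount -o skip_balance /dev/sdX /mnt/tmp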

> And how do you know
> that it's a balance continuing? What do you get for 'btrfs balance
> status' for this volume? Basically I'm asking if you're sure there's a
> balance happening.

I'm not 100% positive, but the behaviour I'm seeing matches my previous 
experience.  At least, I vaguely remember being in a similar situation with 
the laptop back when I first noticed the performance issues, but I can't 
remember any details.  I *think* the balance finished in a rescue shell, too, 
after an hour or so (it was only a data balance, I think).

> The balance itself is not bad, it's just that it
> slows everything down astronomically. That's the main reason why you'd
> like to skip it or cancel it. Instead of balancing it might be doing
> some sort of cleanup. Either 'top' or 'perf top' might give a clue
> what's going on if 'btrfs balance status' doesn't show a balance is
> happening, and yet the drive is super busy as if a balance is
> happening.

Sadly, "balance status" won't work, because it operates on mounted file 
systems, and mount never finishes.  So "balance cancel" won't work, either.
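
For completeness, these are the invocations I mean; both need a mount point, 
which is exactly what I can't get (the path is a placeholder):

  # report whether a balance is running or paused on a mounted file system
  btrfs balance status /mnt/tmp
  # cancel a running balance on a mounted file system
  btrfs balance cancel /mnt/tmp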

Also, I can't just watch drive activity, because the symptom of the 
performance issue is that there is none for long stretches, punctuated by 
occasional bursts of activity during which the system becomes (partially) 
responsive, at least temporarily.

But thanks for reminding me of perf top.  Is there any particularly 
recommended invocation, e.g., specific events it should watch?

Actually, I just tried "perf top -e btrfs:*" on a whim, and it works :D.  I 
see, amongst others:

253 btrfs:btrfs_qgroup_release_data
1K btrfs:btrfs_qgroup_free_delayed_ref

Sadly, there do not seem to be any balance related events.
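
For reference, this is roughly what I'm doing (the perf list step is just to 
see which btrfs tracepoints this kernel exposes):

  # list the btrfs tracepoint events available on this kernel
  perf list 'btrfs:*'
  # sample them live; the event counts above come from this
  perf top -e 'btrfs:*'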

Also, I've been running "dstat -df" the entire time, and it shows *no* disk 
activity whatsoever.

> Also, if you boot from alternate media, scrub resuming should not
> happen because the progress file for scrub is in /var/lib/btrfs, there
> is no metadata on the Btrfs volume itself that indicates it's being
> scrubbed or what the progress is.
>
> > Well, btrfs check came back clean.  And as mentioned above, I was able to
> > get two images, but with btrfs-progs 4.7.3 (the version in sysrescuecd). 
> > I can get different images from the initramfs (which I didn't think of
> > earlier, sorry).
> 
> 'btrfs check' using btrfs-progs 4.8.2 or higher came back clean? That
> sounds like a bug. You're having quota related problems (at least it's
> a contributing factor) but btrfs check says clean, while the kernel is
> getting confused. So either 'btrfs check' is correct that there are no
> problems, and there's a kernel bug resulting in confusion; or the
> check is missing something, and that's why the kernel is mishandling
> it. In either case, there's a kernel bug.

Yeah, I was sure it was a bug when I first had the laptop hang for 30 minutes 
for no apparent reason ;).  But I see what you mean.  Whether it's a balance 
or not, *something* is going on, as per the perf events shown above (none of 
my other btrfs file systems have quota support enabled, so it must be coming 
from the laptop's file system).

> So yeah for sure you'll want a sanitized btrfs-image captured for the
> developers to look at; put it somewhere like Google Drive or wherever
> they can grab it. And put the URL in this thread, and/or also file a
> bug about this problem with the URL to the image included.

OK, I'll post the URLs once the images are uploaded.  (I had Dropbox public 
URLs right before my desktop crashed -- see below -- but now dropbox-cli 
doesn't want to create them.)
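
For the record, the kind of invocation I mean is roughly the following; the 
device and output file are placeholders, and -s is what sanitizes the file 
names:

  # metadata-only image, compressed, 4 threads, file names sanitized
  btrfs-image -c9 -t4 -s /dev/sdX1 /tmp/fsimage.sanitized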

> Looking at 4.9 there's not many qgroup.c changes, but there's a pile
> of other changes, per usual. So even though the problem seems like
> it's qgroup related, it might actually be some other problem that then
> also triggers qgroup messages.

Yeah, Qu responded in the meantime, and there are fixes available, but not 
merged yet.

Also: I would have sent this mail yesterday, but my desktop hung in the same 
way my laptop hangs, except that it never recovered.  I waited until this 
morning, but it was still unresponsive.  After forcing a reboot, the logs 
showed nothing, i.e., they stopped around the time of the hang.  But this 
time, running "mount -o subvol=/.snapshots/rootfs.201611260202,skip_balance 
/dev/sdd1 /mnt/tmp" eventually resulted in a recursive fault (I didn't have a 
camera with me, sorry).  Now I've rebooted normally, and my email draft was 
still intact, *phew*.

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup
