On Mon, May 6, 2019 at 10:22 AM Otto Kekäläinen <o...@seravo.fi> wrote:
>
> Logs have the output below. How shall I read it and debug this situation?
> What are the next steps I could test/debug?
>
>
> kernel: BTRFS info (device dm-9): disk space caching is enabled
> kernel: BTRFS: has skinny extents
> kernel: BTRFS: checking UUID tree
> kernel: BTRFS info (device dm-9): relocating block group 13693423976448 flags 20
> kernel: INFO: task btrfs:2918 blocked for more than 120 seconds.
> kernel:       Not tainted 4.4.0-146-generic #172-Ubuntu

This is an old kernel, so a developer may not reply. This list is for
upstream development, so the normal recommendation is to try a newer
kernel and see if the problem still happens. If it still happens, it's
still a bug. If it doesn't, it has probably been fixed in a newer
kernel, but it's hard to say which one without deep Btrfs knowledge of
all the changes since 4.4. That really is a long time ago; tens of
thousands of commits have happened since then.

If you don't want to try a new kernel you can try the mount option
skip_balance and see if that prevents the balance from resuming. If so
you can 'btrfs balance cancel' it, and then I suggest not doing
another balance. Update your backups at this point in case the problem
with the file system gets worse.
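
A rough sketch of that sequence, wrapped in a shell function so nothing
runs until you call it. The function name, device path and mount point
are placeholders I made up, not values from your logs; substitute your
own.

```shell
# Sketch only: call as  cancel_stuck_balance /dev/mapper/yourdev /mnt  (as root).
cancel_stuck_balance() {
    # $1 = Btrfs block device, $2 = mount point (both placeholders)
    # skip_balance leaves the interrupted balance paused instead of resuming it
    mount -o skip_balance "$1" "$2" || return 1
    # Report what the file system thinks the balance state is
    btrfs balance status "$2"
    # Discard the paused balance so it no longer resumes on later mounts
    btrfs balance cancel "$2"
}
```

The status call is deliberately not chained with &&, since as far as I
know it returns non-zero when a balance is running or paused.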

If you are booting from this file system, you can add
'rootflags=skip_balance' as a boot parameter in the bootloader. If
rootflags= is already present, append skip_balance to it; mount options
are separated by commas with no spaces.

From the call trace, it's clearly stuck in the balance. I'd say it is
a bug, but I don't know why it happens or if there is any workaround
other than just using a much newer kernel.

I'm not familiar with the Ubuntu build system and how to get a
rescue/install image with a very new kernel. But I just tested this
image and it works:
https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20190506.n.1/compose/Everything/x86_64/iso/Fedora-Everything-netinst-x86_64-Rawhide-20190506.n.1.iso

You can dd that to a USB stick and boot it on either BIOS or UEFI
computers. At the bootloader choose Troubleshoot..., and then the
Rescue... option. That will get you to a menu where you can just drop
to a shell, option 3. That kernel is 5.1rc7+, so it's somewhere
between 5.1rc7 and 5.1.0, and is based on git ea9866793d1e.
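
For the dd step, something along these lines should do, assuming GNU dd
(status=progress needs a reasonably recent coreutils). The function name
is mine and the target device is a placeholder; check with lsblk first,
because dd overwrites whatever device you point it at.

```shell
# Sketch only: call as  write_rescue_iso path/to/image.iso /dev/sdX  (as root).
write_rescue_iso() {
    # $1 = downloaded ISO, $2 = the whole USB device, not a partition
    # conv=fsync flushes the data to the stick before dd exits
    dd if="$1" of="$2" bs=4M conv=fsync status=progress
}
```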

From here you can do a mount. If the balance is only paused (mounted
with skip_balance but not cancelled), you can use 'btrfs balance resume'
to resume it and see if it runs into the same problem. A cancelled
balance can't be resumed, so in that case you'd have to start a new one
with 'btrfs balance start'. If it hangs again, then yeah it's still a
bug; if not, maybe it's fixed. But again, I'm not sure if there is a
backport of the fix to older kernels or what version the backport is
found in.
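
From the rescue shell, the retry could look roughly like this, assuming
the balance is still stored on the file system as paused. Again the
function name, device and mount point are placeholders of mine.

```shell
# Sketch only: call as  retry_balance /dev/mapper/yourdev /mnt  (as root).
retry_balance() {
    # $1 = Btrfs block device, $2 = mount point (both placeholders)
    # Mount without letting the balance auto-resume, so you control when it starts
    mount -o skip_balance "$1" "$2" || return 1
    # Pick the paused balance back up; this blocks until it finishes or fails
    btrfs balance resume "$2"
}
```

If it stalls again, the hung task message should show up in dmesg the
same way it did on 4.4.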



-- 
Chris Murphy
