Re: btrfs send hangs after partial transfer and blocks all IO

Jürgen Herrmann Thu, 13 Sep 2018 03:57:36 -0700

Both loops were started before the hang because after the hang I cannot do that anymore. That's why there is progress in the logs at first. The hang continues for at least 1.5 hours. No data is transferred anymore during this time. I never waited longer than 1.5 hours.


Best regards,
Jürgen


On 13 September 2018 12:50:59, Nikolay Borisov <nbori...@suse.com> wrote:

On 13.09.2018 13:29, Jürgen Herrmann wrote:

On 13.9.2018 10:40, Nikolay Borisov wrote:

On 13.09.2018 11:34, Jürgen Herrmann wrote:

Hello!

I have a laptop with a fresh installation (about two months old) of the
latest Linux Mint 19. The root filesystem is on a
1TB Samsung 860 M.2 SSD with btrfs on top of a LUKS encrypted 900G
partition. Timeshift-btrfs is enabled for the root (@) and home (@home)
subvolumes. I want to transfer snapshots to a server with a separate
disk via "btrfs send" and ssh.

Here's the list of snapshot directories, each containing two snapshots
for root and home:

drwxr-xr-x 1 root root 30 Sep 12 22:08 2018-08-16_20-00-01
drwxr-xr-x 1 root root 30 Aug 17 14:00 2018-08-17_14-00-02
drwxr-xr-x 1 root root 30 Aug 23 20:00 2018-08-23_20-00-01
drwxr-xr-x 1 root root 30 Aug 30 20:00 2018-08-30_20-00-01
drwxr-xr-x 1 root root 30 Sep  6 20:00 2018-09-06_20-00-01
drwxr-xr-x 1 root root 30 Sep  6 22:00 2018-09-06_22-00-01
drwxr-xr-x 1 root root 30 Sep  8 16:00 2018-09-08_16-00-01
drwxr-xr-x 1 root root 30 Sep 10 20:00 2018-09-10_20-00-02
drwxr-xr-x 1 root root 30 Sep 11 21:00 2018-09-11_21-00-02
drwxr-xr-x 1 root root 30 Sep 12 21:00 2018-09-12_21-00-01

"btrfs send
/mnt/timeshift/backup/timeshift-btrfs/snapshots/2018-08-16_20-00-01/@ > /dev/null"
results in the btrfs task taking 100% CPU time on one CPU,
and then all IO is blocked -> only a reboot can resolve the hang.

The hang does not happen immediately. As I was on the road using a
cellular connection, it seemed fine at first. That's how I found out that
it transfers ~140MB of data before hanging. The snapshot is created on
the server and contains data (du shows about 140MB).
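
To pin down the stall point more precisely than with du, the send stream could be piped through a small byte counter. The following is only a minimal sketch, assuming Python 3 is available on the laptop; copy_with_progress and send_snapshot are hypothetical helper names, not part of btrfs-progs. It relies only on the fact that "btrfs send" writes the stream to stdout when no -f option is given.

```python
import subprocess
import sys
import time


def copy_with_progress(src, dst, chunk_size=1 << 20, report_every=5.0, log=sys.stderr):
    """Copy src to dst and log the running byte total about every report_every seconds."""
    total = 0
    last_report = time.monotonic()
    while True:
        chunk = src.read(chunk_size)  # blocks while the sender produces no data
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
        now = time.monotonic()
        if now - last_report >= report_every:
            print(f"{time.strftime('%H:%M:%S')} {total} bytes", file=log, flush=True)
            last_report = now
    print(f"done: {total} bytes total", file=log, flush=True)
    return total


def send_snapshot(snapshot_path, dst):
    """Hypothetical wrapper: run 'btrfs send' on a snapshot and count its output."""
    proc = subprocess.Popen(["btrfs", "send", snapshot_path], stdout=subprocess.PIPE)
    try:
        return copy_with_progress(proc.stdout, dst)
    finally:
        proc.stdout.close()
        proc.wait()
```

If the send stalls, no further lines appear, so the last timestamp and byte count printed mark roughly where the transfer stopped. Logging to a terminal rather than to a file on the affected filesystem seems safer here, since all IO ends up blocked.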

I am running vanilla kernel 4.18.6 (compiled by myself) and btrfs progs
4.17.1 compiled from source.

Here's the btrfs filesystem info:
Label: none  uuid: a914c141-72bf-448b-847f-d64ee82d8b7b
        Total devices 1 FS bytes used 342.85GiB
        devid    1 size 875.44GiB used 357.05GiB path /dev/mapper/sda3_crypt

A scrub shows no errors:
scrub status for a914c141-72bf-448b-847f-d64ee82d8b7b
        scrub started at Thu Sep 13 10:20:18 2018 and finished after 00:12:19
        total bytes scrubbed: 342.78GiB with 0 errors

What can I do to help debugging this issue?



You should provide output of echo w > /proc/sysrq-trigger. Also
sample the stack of /proc/[pid of btrfs send]/stack to see if it is
changing.
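
The stack sampling suggested above can be scripted. What follows is a minimal sketch, assuming Python 3, root privileges, and a kernel that exposes /proc/[pid]/stack; the function names are illustrative only. It reads the stack file a fixed number of times and counts how often consecutive samples differ.

```python
import sys
import time
from pathlib import Path


def read_stack(pid, proc_root="/proc"):
    """Return the kernel stack of a task as text (reading it usually requires root)."""
    return Path(proc_root, str(pid), "stack").read_text()


def count_changes(samples):
    """Count how many samples differ from the sample taken just before them."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


def sample_stack(pid, count=10, interval=1.0, proc_root="/proc"):
    """Read the stack 'count' times, sleeping 'interval' seconds between reads."""
    samples = []
    for i in range(count):
        samples.append(read_stack(pid, proc_root))
        if i + 1 < count:
            time.sleep(interval)
    return samples


def main(argv):
    """Usage: main(["<pid of btrfs send>"]) prints each sample and a summary."""
    samples = sample_stack(int(argv[0]))
    for i, stack in enumerate(samples):
        print(f"--- sample {i} ---")
        print(stack, end="")
    changed = count_changes(samples)
    print(f"{changed} of {len(samples) - 1} consecutive samples differ")
```

A count of zero over a longer period would suggest the task is stuck at one place, while a non-zero count means the call trace is still moving. Since nothing new can be started once the hang sets in, the sampling has to be running before the hang begins.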


Best regards,
Jürgen


Hello!

dmesg output can be found here:
https://pastebin.com/g86dPGSZ


So from what I see, the current transaction commit is waiting for
root->commit_root_sem, and other threads (in this case systemd) are
waiting for the transaction commit to finish.


stacks can be found here:
https://pastebin.com/dCt1YgJp


And your user process seems to be making some progress, as is evident from
the fact that the call trace of the process is actually changing over
the course of sampling. Is it possible that it just takes time to do the
IO?


Best regards,
Jürgen



