On 1/9/19 11:03 AM, Nikolay Borisov wrote:
On 9.01.19 г. 11:16 ч., Florian Stecker wrote:
Provide output of echo w > /proc/sysrq-trigger when the hang occurs
otherwise it's hard to figure what's going on.
Here's one, again in gajim. This time, fdatasync() took "only" 2 seconds:
[42481.243491] sysrq: SysRq : Show Blocked State
[42481.243494] task PC stack pid father
[42481.243566] gajim D 0 15778 15774 0x00000083
[42481.243569] Call Trace:
[42481.243575] ? __schedule+0x29b/0x8b0
[42481.243576] ? bit_wait+0x50/0x50
[42481.243578] schedule+0x32/0x90
[42481.243580] io_schedule+0x12/0x40
[42481.243582] bit_wait_io+0xd/0x50
[42481.243583] __wait_on_bit+0x6c/0x80
[42481.243585] out_of_line_wait_on_bit+0x91/0xb0
[42481.243587] ? init_wait_var_entry+0x40/0x40
[42481.243605] write_all_supers+0x418/0xa70 [btrfs]
[42481.243622] btrfs_sync_log+0x695/0x910 [btrfs]
[42481.243625] ? _raw_spin_lock_irqsave+0x25/0x50
[42481.243641] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
[42481.243655] btrfs_sync_file+0x3a9/0x3d0 [btrfs]
[42481.243659] do_fsync+0x38/0x70
[42481.243661] __x64_sys_fdatasync+0x13/0x20
[42481.243663] do_syscall_64+0x5b/0x170
[42481.243666] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[42481.243667] RIP: 0033:0x7fd4022f873f
[42481.243671] Code: Bad RIP value.
[42481.243672] RSP: 002b:00007ffd3710a300 EFLAGS: 00000293 ORIG_RAX:
000000000000004b
[42481.243674] RAX: ffffffffffffffda RBX: 0000000000000019 RCX:
00007fd4022f873f
[42481.243675] RDX: 0000000000000000 RSI: 0000000000000002 RDI:
0000000000000019
[42481.243675] RBP: 0000000000000000 R08: 000055d8d8649f68 R09:
00007ffd3710a320
[42481.243676] R10: 0000000000013000 R11: 0000000000000293 R12:
0000000000000000
[42481.243677] R13: 0000000000000000 R14: 000055d8d8363fa0 R15:
000055d8d8613040
This shows that IO was send to disk to write the supper blocks following
an fsync and it's waiting for IO to finish. This seems like a problem in
the storage layer, i.e IOs being stuck. Check your dmesg for any error.
There are no IO errors in dmesg. Also, I never had any problems with
this disk, SMART reports nothing, and also btrfs dev stats and btrfs
scrub say everything's ok.
I now found a way to reproduce this issue more reliably: If I just write
10KB of random data to a file and sync, this usually takes only a few
ms, but on my setup if I do it 1000 times, about 10 of them will be
longer than 100ms, sometimes much longer:
$ for i in $(seq 0 1000); do dd if=/dev/urandom of=/home/stecker/test
bs=10k count=1 conv=fdatasync 2>&1 && sleep 0.1; done | grep
'([1-9][0-9]*\.|0\.[1-9])[0-9]* s'
10240 bytes (10 kB, 10 KiB) copied, 1.12436 s, 9.1 kB/s
10240 bytes (10 kB, 10 KiB) copied, 1.33179 s, 7.7 kB/s
10240 bytes (10 kB, 10 KiB) copied, 1.27658 s, 8.0 kB/s
10240 bytes (10 kB, 10 KiB) copied, 0.401769 s, 25.5 kB/s
10240 bytes (10 kB, 10 KiB) copied, 1.019 s, 10.0 kB/s
10240 bytes (10 kB, 10 KiB) copied, 1.95148 s, 5.2 kB/s
10240 bytes (10 kB, 10 KiB) copied, 1.48939 s, 6.9 kB/s
10240 bytes (10 kB, 10 KiB) copied, 1.9071 s, 5.4 kB/s
10240 bytes (10 kB, 10 KiB) copied, 1.90988 s, 5.4 kB/s
10240 bytes (10 kB, 10 KiB) copied, 0.845141 s, 12.1 kB/s
10240 bytes (10 kB, 10 KiB) copied, 0.184172 s, 55.6 kB/s
If I use the two partitions /dev/sda2 not as part of a single fs, but as
a seperate btrfs filesystems, this does not happen, all writes are fast.
But this should not make a difference for the storage layer, or should
it? I mean, it writes the superblocks to the exact same position on the
disk?
By the way, thanks a lot for your help!
On 1/9/19 7:24 AM, Nikolay Borisov wrote:
On 8.01.19 г. 21:38 ч., Florian Stecker wrote:
Hi everyone,
I extended the btrfs volume on my laptop by adding a second partition to
it which lies on the same SSD (using btrfs device add). Since I did
this, all kinds of applications regularly hang for up to 30 seconds. It
seems they are stuck in the fdatasync syscall. For example:
$ strace -tt -T gajim 2>&1 | grep fdatasync
[...]
11:36:31.112200 fdatasync(25) = 0 <0.006958>
11:36:32.147525 fdatasync(25) = 0 <0.008138>
11:36:32.156882 fdatasync(25) = 0 <0.006866>
11:36:32.165979 fdatasync(25) = 0 <0.011797>
11:36:32.178867 fdatasync(25) = 0 <23.636614>
11:36:55.827726 fdatasync(25) = 0 <0.009595>
11:36:55.838702 fdatasync(25) = 0 <0.007261>
11:36:55.850440 fdatasync(25) = 0 <0.006807>
11:36:55.858168 fdatasync(25) = 0 <0.006767>
[...]
File descriptor 25 here points to a file which is just ~90KB, so it
really shouldn't take that long.
Removing the second partition again resolves the problem. Does anyone
know this issue? Is it related to btrfs? Or am I just doing something
wrong?
Best,
Florian
Some more info:
$ btrfs device usage /
/dev/sda2, ID: 2
Device size: 52.16GiB
Device slack: 0.00B
Data,single: 1.00GiB
Unallocated: 51.16GiB
/dev/sda8, ID: 1
Device size: 174.92GiB
Device slack: 0.00B
Data,single: 168.91GiB
Metadata,single: 3.01GiB
System,single: 4.00MiB
Unallocated: 3.00GiB
$ fdisk -l /dev/sda
Disk /dev/sda: 238.5 GiB, 256060514304 bytes, 500118192 sectors
Disk model: SAMSUNG SSD PM87
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: A48B5A25-AA84-4D3F-90DD-E8A4991BDF03
Device Start End Sectors Size Type
/dev/sda1 2048 1026047 1024000 500M EFI System
/dev/sda2 1026048 110422015 109395968 52.2G Linux filesystem
/dev/sda8 110422016 477263871 366841856 174.9G Linux filesystem
/dev/sda9 477263872 481458175 4194304 2G Linux swap
$ uname -a
Linux dell 4.20.0-arch1-1-ARCH #1 SMP PREEMPT Mon Dec 24 03:00:40 UTC
2018 x86_64 GNU/Linux