So it may be that the write-back-throttling (wbt) for the underlying devices is getting confused about the exact throttle rates are for these devices and somehow getting stuck. It maybe worth experimenting by disabling the throttling and seeing if this gets I/O working again.
For example, to disable wbt for a device /dev/sda use: echo 0 | sudo tee /sys/block/sda/queue/wbt_lat_usec and if you need to reset it back to the default: echo -1 | sudo tee /sys/block/sda/queue/wbt_lat_usec ..use the appropriate block device name for the block devices you have attached. It may even be worth setting the wbt_lat_usec to 0 for all the block devices in your pool as early as possible after boot and see if this helps. My hunch is that wbt is not set correctly for the attached non-spinny devices you have and it is assuming they are reasonably fast when in fact it may be rate limited to be slower than expected because they are attached to relatively slow host. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1899249 Title: OpenZFS writing stalls, under load Status in Native ZFS for Linux: Unknown Status in zfs-linux package in Ubuntu: Incomplete Bug description: Using a QNAP 4-drive USB enclosure, with a set of SSDs, on a Raspberry Pi 8GB. ZFS deduplication, and LZJB compression is enabled. This issue seems to occur, intermittently, after some time (happens with both SMB access, via Samba, and when interacting with the system, via SSH), and never previously occurred, until a few months ago, and I sometimes have to force a reboot of the system (at the cost of some data loss), in order to use it again. The "dmesg" log reports: [25375.911590] z_wr_iss_h D 0 2161 2 0x00000028 [25375.911606] Call trace: [25375.911627] __switch_to+0x104/0x170 [25375.911639] __schedule+0x30c/0x7c0 [25375.911647] schedule+0x3c/0xb8 [25375.911655] io_schedule+0x20/0x58 [25375.911668] rq_qos_wait+0x100/0x178 [25375.911677] wbt_wait+0xb4/0xf0 [25375.911687] __rq_qos_throttle+0x38/0x50 [25375.911700] blk_mq_make_request+0x128/0x610 [25375.911712] generic_make_request+0xb4/0x2d8 [25375.911722] submit_bio+0x48/0x218 [25375.911960] vdev_disk_io_start+0x670/0x9f8 [zfs] [25375.912181] zio_vdev_io_start+0xdc/0x2b8 [zfs] [25375.912400] zio_nowait+0xd4/0x170 [zfs] [25375.912617] vdev_mirror_io_start+0xa8/0x1b0 [zfs] [25375.912839] zio_vdev_io_start+0x248/0x2b8 [zfs] [25375.913057] zio_execute+0xac/0x110 [zfs] [25375.913096] taskq_thread+0x2f8/0x570 [spl] [25375.913108] kthread+0xfc/0x128 [25375.913119] ret_from_fork+0x10/0x1c [25375.913149] INFO: task txg_sync:2333 blocked for more than 120 seconds. [25375.919916] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu [25375.926848] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [25375.934835] txg_sync D 0 2333 2 0x00000028 [25375.934850] Call trace: [25375.934869] __switch_to+0x104/0x170 [25375.934879] __schedule+0x30c/0x7c0 [25375.934887] schedule+0x3c/0xb8 [25375.934899] schedule_timeout+0x9c/0x190 [25375.934908] io_schedule_timeout+0x28/0x48 [25375.934946] __cv_timedwait_common+0x1a8/0x1f8 [spl] [25375.934982] __cv_timedwait_io+0x3c/0x50 [spl] [25375.935205] zio_wait+0x130/0x2a0 [zfs] [25375.935423] dsl_pool_sync+0x3fc/0x498 [zfs] [25375.935650] spa_sync+0x538/0xe68 [zfs] [25375.935867] txg_sync_thread+0x2c0/0x468 [zfs] [25375.935911] thread_generic_wrapper+0x74/0xa0 [spl] [25375.935924] kthread+0xfc/0x128 [25375.935935] ret_from_fork+0x10/0x1c [25375.936017] INFO: task zbackup:75339 blocked for more than 120 seconds. [25375.942780] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu [25375.949710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [25375.957702] zbackup D 0 75339 5499 0x00000000 [25375.957716] Call trace: [25375.957732] __switch_to+0x104/0x170 [25375.957742] __schedule+0x30c/0x7c0 [25375.957750] schedule+0x3c/0xb8 [25375.957789] cv_wait_common+0x188/0x1b0 [spl] [25375.957823] __cv_wait+0x30/0x40 [spl] [25375.958045] zil_commit_impl+0x234/0xd30 [zfs] [25375.958263] zil_commit+0x48/0x70 [zfs] [25375.958481] zfs_create+0x544/0x7d0 [zfs] [25375.958698] zpl_create+0xb8/0x178 [zfs] [25375.958711] lookup_open+0x4ec/0x6a8 [25375.958721] do_last+0x260/0x8c0 [25375.958730] path_openat+0x84/0x258 [25375.958739] do_filp_open+0x84/0x108 [25375.958752] do_sys_open+0x180/0x2b0 [25375.958763] __arm64_sys_openat+0x2c/0x38 [25375.958773] el0_svc_common.constprop.0+0x80/0x218 [25375.958781] el0_svc_handler+0x34/0xa0 [25375.958791] el0_svc+0x10/0x2cc [25375.958801] INFO: task zbackup:95187 blocked for more than 120 seconds. [25375.965564] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu [25375.972492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [25375.980479] zbackup D 0 95187 5499 0x00000000 [25375.980493] Call trace: [25375.980514] __switch_to+0x104/0x170 [25375.980525] __schedule+0x30c/0x7c0 [25375.980536] schedule+0x3c/0xb8 [25375.980578] cv_wait_common+0x188/0x1b0 [spl] [25375.980612] __cv_wait+0x30/0x40 [spl] [25375.980834] zil_commit_impl+0x234/0xd30 [zfs] [25375.981052] zil_commit+0x48/0x70 [zfs] [25375.981280] zfs_write+0xa3c/0xb90 [zfs] [25375.981498] zpl_write_common_iovec+0xac/0x120 [zfs] [25375.981726] zpl_iter_write+0xe4/0x150 [zfs] [25375.981766] new_sync_write+0x100/0x1a8 [25375.981776] __vfs_write+0x74/0x90 [25375.981784] vfs_write+0xe4/0x1c8 [25375.981794] ksys_write+0x78/0x100 [25375.981803] __arm64_sys_write+0x24/0x30 [25375.981813] el0_svc_common.constprop.0+0x80/0x218 [25375.981821] el0_svc_handler+0x34/0xa0 [25375.981830] el0_svc+0x10/0x2cc [25375.981842] INFO: task zbackup:95188 blocked for more than 120 seconds. [25375.988615] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu [25375.995543] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [25376.003532] zbackup D 0 95188 5499 0x00000008 [25376.003545] Call trace: [25376.003561] __switch_to+0x104/0x170 [25376.003571] __schedule+0x30c/0x7c0 [25376.003579] schedule+0x3c/0xb8 [25376.003619] cv_wait_common+0x188/0x1b0 [spl] [25376.003654] __cv_wait+0x30/0x40 [spl] [25376.003875] zil_commit_impl+0x234/0xd30 [zfs] [25376.004092] zil_commit+0x48/0x70 [zfs] [25376.004312] zfs_write+0xa3c/0xb90 [zfs] [25376.004530] zpl_write_common_iovec+0xac/0x120 [zfs] [25376.004748] zpl_iter_write+0xe4/0x150 [zfs] [25376.004759] new_sync_write+0x100/0x1a8 [25376.004768] __vfs_write+0x74/0x90 [25376.004776] vfs_write+0xe4/0x1c8 [25376.004785] ksys_write+0x78/0x100 [25376.004794] __arm64_sys_write+0x24/0x30 [25376.004803] el0_svc_common.constprop.0+0x80/0x218 [25376.004811] el0_svc_handler+0x34/0xa0 [25376.004820] el0_svc+0x10/0x2cc [25376.004917] INFO: task zbackup:95231 blocked for more than 120 seconds. [25376.011678] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu [25376.018587] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [25376.026554] zbackup D 0 95231 5499 0x00000000 [25376.026562] Call trace: [25376.026575] __switch_to+0x104/0x170 [25376.026582] __schedule+0x30c/0x7c0 [25376.026588] schedule+0x3c/0xb8 [25376.026615] cv_wait_common+0x188/0x1b0 [spl] [25376.026636] __cv_wait+0x30/0x40 [spl] [25376.026771] zil_commit_impl+0x234/0xd30 [zfs] [25376.026904] zil_commit+0x48/0x70 [zfs] [25376.027042] zfs_write+0xa3c/0xb90 [zfs] [25376.027174] zpl_write_common_iovec+0xac/0x120 [zfs] [25376.027309] zpl_iter_write+0xe4/0x150 [zfs] [25376.027317] new_sync_write+0x100/0x1a8 [25376.027322] __vfs_write+0x74/0x90 [25376.027332] vfs_write+0xe4/0x1c8 [25376.027338] ksys_write+0x78/0x100 [25376.027344] __arm64_sys_write+0x24/0x30 [25376.027350] el0_svc_common.constprop.0+0x80/0x218 [25376.027355] el0_svc_handler+0x34/0xa0 [25376.027361] el0_svc+0x10/0x2cc [25376.027374] INFO: task z_wr_int:95546 blocked for more than 120 seconds. [25376.034203] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu [25376.041113] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [25376.049080] z_wr_int D 0 95546 2 0x00000028 [25376.049089] Call trace: [25376.049100] __switch_to+0x104/0x170 [25376.049106] __schedule+0x30c/0x7c0 [25376.049111] schedule+0x3c/0xb8 [25376.049116] io_schedule+0x20/0x58 [25376.049123] rq_qos_wait+0x100/0x178 [25376.049129] wbt_wait+0xb4/0xf0 [25376.049135] __rq_qos_throttle+0x38/0x50 [25376.049143] blk_mq_make_request+0x128/0x610 [25376.049150] generic_make_request+0xb4/0x2d8 [25376.049156] submit_bio+0x48/0x218 [25376.049292] vdev_disk_io_start+0x670/0x9f8 [zfs] [25376.049423] zio_vdev_io_start+0xdc/0x2b8 [zfs] [25376.049554] zio_nowait+0xd4/0x170 [zfs] [25376.049693] vdev_queue_io_done+0x1ec/0x2a0 [zfs] [25376.049847] zio_vdev_io_done+0xec/0x220 [zfs] [25376.049986] zio_execute+0xac/0x110 [zfs] [25376.050010] taskq_thread+0x2f8/0x570 [spl] [25376.050025] kthread+0xfc/0x128 [25376.050033] ret_from_fork+0x10/0x1c [25376.050041] INFO: task z_wr_int:95553 blocked for more than 120 seconds. [25376.056873] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu [25376.063780] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [25376.071746] z_wr_int D 0 95553 2 0x00000028 [25376.071754] Call trace: [25376.071764] __switch_to+0x104/0x170 [25376.071770] __schedule+0x30c/0x7c0 [25376.071775] schedule+0x3c/0xb8 [25376.071780] io_schedule+0x20/0x58 [25376.071787] rq_qos_wait+0x100/0x178 [25376.071792] wbt_wait+0xb4/0xf0 [25376.071798] __rq_qos_throttle+0x38/0x50 [25376.071806] blk_mq_make_request+0x128/0x610 [25376.071813] generic_make_request+0xb4/0x2d8 [25376.071819] submit_bio+0x48/0x218 [25376.071955] vdev_disk_io_start+0x670/0x9f8 [zfs] [25376.072086] zio_vdev_io_start+0xdc/0x2b8 [zfs] [25376.072217] zio_nowait+0xd4/0x170 [zfs] [25376.072346] vdev_queue_io_done+0x1ec/0x2a0 [zfs] [25376.072476] zio_vdev_io_done+0xec/0x220 [zfs] [25376.072606] zio_execute+0xac/0x110 [zfs] [25376.072629] taskq_thread+0x2f8/0x570 [spl] [25376.072637] kthread+0xfc/0x128 [25376.072643] ret_from_fork+0x10/0x1c [25496.722315] INFO: task z_wr_iss_h:2161 blocked for more than 241 seconds. [25496.729251] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu [25496.736195] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [25496.744197] z_wr_iss_h D 0 2161 2 0x00000028 [25496.744247] Call trace: [25496.744268] __switch_to+0x104/0x170 [25496.744279] __schedule+0x30c/0x7c0 [25496.744288] schedule+0x3c/0xb8 [25496.744296] io_schedule+0x20/0x58 [25496.744308] rq_qos_wait+0x100/0x178 [25496.744317] wbt_wait+0xb4/0xf0 [25496.744327] __rq_qos_throttle+0x38/0x50 [25496.744340] blk_mq_make_request+0x128/0x610 [25496.744351] generic_make_request+0xb4/0x2d8 [25496.744362] submit_bio+0x48/0x218 [25496.744600] vdev_disk_io_start+0x670/0x9f8 [zfs] [25496.744821] zio_vdev_io_start+0xdc/0x2b8 [zfs] [25496.745039] zio_nowait+0xd4/0x170 [zfs] [25496.745256] vdev_mirror_io_start+0xa8/0x1b0 [zfs] [25496.745473] zio_vdev_io_start+0x248/0x2b8 [zfs] [25496.745690] zio_execute+0xac/0x110 [zfs] [25496.745730] taskq_thread+0x2f8/0x570 [spl] [25496.745742] kthread+0xfc/0x128 [25496.745753] ret_from_fork+0x10/0x1c [25496.745853] INFO: task zbackup:75339 blocked for more than 241 seconds. [25496.752621] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu [25496.759557] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [25496.767547] zbackup D 0 75339 5499 0x00000000 [25496.767561] Call trace: [25496.767585] __switch_to+0x104/0x170 [25496.767595] __schedule+0x30c/0x7c0 [25496.767606] schedule+0x3c/0xb8 [25496.767646] cv_wait_common+0x188/0x1b0 [spl] [25496.767682] __cv_wait+0x30/0x40 [spl] [25496.767906] zil_commit_impl+0x234/0xd30 [zfs] [25496.768126] zil_commit+0x48/0x70 [zfs] [25496.768355] zfs_create+0x544/0x7d0 [zfs] [25496.768574] zpl_create+0xb8/0x178 [zfs] [25496.768595] lookup_open+0x4ec/0x6a8 [25496.768604] do_last+0x260/0x8c0 [25496.768614] path_openat+0x84/0x258 [25496.768624] do_filp_open+0x84/0x108 [25496.768636] do_sys_open+0x180/0x2b0 [25496.768647] __arm64_sys_openat+0x2c/0x38 [25496.768657] el0_svc_common.constprop.0+0x80/0x218 [25496.768665] el0_svc_handler+0x34/0xa0 [25496.768674] el0_svc+0x10/0x2cc tyson@ubuntu:~$ lsb_release -rd Description: Ubuntu 20.04.1 LTS Release: 20.04 tyson@ubuntu:~$ apt-cache policy zfs-dkms zfs-dkms: Installed: 0.8.3-1ubuntu12.4 Candidate: 0.8.3-1ubuntu12.4 Version table: *** 0.8.3-1ubuntu12.4 500 500 http://ports.ubuntu.com/ubuntu-ports focal-updates/universe arm64 Packages 500 http://ports.ubuntu.com/ubuntu-ports focal-updates/universe armhf Packages 100 /var/lib/dpkg/status 0.8.3-1ubuntu12 500 500 http://ports.ubuntu.com/ubuntu-ports focal/universe arm64 Packages 500 http://ports.ubuntu.com/ubuntu-ports focal/universe armhf Packages ProblemType: Bug DistroRelease: Ubuntu 20.04 Package: zfs-dkms 0.8.3-1ubuntu12.4 ProcVersionSignature: Ubuntu 5.4.0-1018.20-raspi 5.4.55 Uname: Linux 5.4.0-1018-raspi aarch64 NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair ApportVersion: 2.20.11-0ubuntu27.8 Architecture: arm64 CasperMD5CheckResult: skip Date: Sat Oct 10 08:50:34 2020 ImageMediaBuild: 20200423.1 PackageArchitecture: all ProcEnviron: TERM=xterm-256color PATH=(custom, no user) XDG_RUNTIME_DIR=<set> LANG=C.UTF-8 SHELL=/bin/bash SourcePackage: zfs-linux UpgradeStatus: No upgrade log present (probably fresh install) To manage notifications about this bug go to: https://bugs.launchpad.net/zfs/+bug/1899249/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp