So it may be that the write-back-throttling (wbt) for the underlying
devices is getting confused about the exact throttle rates are for these
devices and somehow getting stuck.  It maybe worth experimenting by
disabling the throttling and seeing if this gets I/O working again.

For example, to disable wbt for a device /dev/sda use:

echo 0 | sudo tee /sys/block/sda/queue/wbt_lat_usec

and if you need to reset it back to the default:

echo -1 | sudo tee /sys/block/sda/queue/wbt_lat_usec

..use the appropriate block device name for the block devices you have
attached. It may even be worth setting the wbt_lat_usec to 0 for all the
block devices in your pool as early as possible after boot and see if
this helps.  My hunch is that wbt is not set correctly for the attached
non-spinny devices you have and it is assuming they are reasonably fast
when in fact it may be rate limited to be slower than expected because
they are attached to relatively slow host.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

Status in Native ZFS for Linux:
  Unknown
Status in zfs-linux package in Ubuntu:
  Incomplete

Bug description:
  Using a QNAP 4-drive USB enclosure, with a set of SSDs, on a Raspberry
  Pi 8GB. ZFS deduplication, and LZJB compression is enabled.

  This issue seems to occur, intermittently, after some time (happens
  with both SMB access, via Samba, and when interacting with the system,
  via SSH), and never previously occurred, until a few months ago, and I
  sometimes have to force a reboot of the system (at the cost of some
  data loss), in order to use it again.

  The "dmesg" log reports:

  [25375.911590] z_wr_iss_h      D    0  2161      2 0x00000028
  [25375.911606] Call trace:
  [25375.911627]  __switch_to+0x104/0x170
  [25375.911639]  __schedule+0x30c/0x7c0
  [25375.911647]  schedule+0x3c/0xb8
  [25375.911655]  io_schedule+0x20/0x58
  [25375.911668]  rq_qos_wait+0x100/0x178
  [25375.911677]  wbt_wait+0xb4/0xf0
  [25375.911687]  __rq_qos_throttle+0x38/0x50
  [25375.911700]  blk_mq_make_request+0x128/0x610
  [25375.911712]  generic_make_request+0xb4/0x2d8
  [25375.911722]  submit_bio+0x48/0x218
  [25375.911960]  vdev_disk_io_start+0x670/0x9f8 [zfs]
  [25375.912181]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
  [25375.912400]  zio_nowait+0xd4/0x170 [zfs]
  [25375.912617]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
  [25375.912839]  zio_vdev_io_start+0x248/0x2b8 [zfs]
  [25375.913057]  zio_execute+0xac/0x110 [zfs]
  [25375.913096]  taskq_thread+0x2f8/0x570 [spl]
  [25375.913108]  kthread+0xfc/0x128
  [25375.913119]  ret_from_fork+0x10/0x1c
  [25375.913149] INFO: task txg_sync:2333 blocked for more than 120 seconds.
  [25375.919916]       Tainted: P         C OE     5.4.0-1018-raspi #20-Ubuntu
  [25375.926848] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.934835] txg_sync        D    0  2333      2 0x00000028
  [25375.934850] Call trace:
  [25375.934869]  __switch_to+0x104/0x170
  [25375.934879]  __schedule+0x30c/0x7c0
  [25375.934887]  schedule+0x3c/0xb8
  [25375.934899]  schedule_timeout+0x9c/0x190
  [25375.934908]  io_schedule_timeout+0x28/0x48
  [25375.934946]  __cv_timedwait_common+0x1a8/0x1f8 [spl]
  [25375.934982]  __cv_timedwait_io+0x3c/0x50 [spl]
  [25375.935205]  zio_wait+0x130/0x2a0 [zfs]
  [25375.935423]  dsl_pool_sync+0x3fc/0x498 [zfs]
  [25375.935650]  spa_sync+0x538/0xe68 [zfs]
  [25375.935867]  txg_sync_thread+0x2c0/0x468 [zfs]
  [25375.935911]  thread_generic_wrapper+0x74/0xa0 [spl]
  [25375.935924]  kthread+0xfc/0x128
  [25375.935935]  ret_from_fork+0x10/0x1c
  [25375.936017] INFO: task zbackup:75339 blocked for more than 120 seconds.
  [25375.942780]       Tainted: P         C OE     5.4.0-1018-raspi #20-Ubuntu
  [25375.949710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.957702] zbackup         D    0 75339   5499 0x00000000
  [25375.957716] Call trace:
  [25375.957732]  __switch_to+0x104/0x170
  [25375.957742]  __schedule+0x30c/0x7c0
  [25375.957750]  schedule+0x3c/0xb8
  [25375.957789]  cv_wait_common+0x188/0x1b0 [spl]
  [25375.957823]  __cv_wait+0x30/0x40 [spl]
  [25375.958045]  zil_commit_impl+0x234/0xd30 [zfs]
  [25375.958263]  zil_commit+0x48/0x70 [zfs]
  [25375.958481]  zfs_create+0x544/0x7d0 [zfs]
  [25375.958698]  zpl_create+0xb8/0x178 [zfs]
  [25375.958711]  lookup_open+0x4ec/0x6a8
  [25375.958721]  do_last+0x260/0x8c0
  [25375.958730]  path_openat+0x84/0x258
  [25375.958739]  do_filp_open+0x84/0x108
  [25375.958752]  do_sys_open+0x180/0x2b0
  [25375.958763]  __arm64_sys_openat+0x2c/0x38
  [25375.958773]  el0_svc_common.constprop.0+0x80/0x218
  [25375.958781]  el0_svc_handler+0x34/0xa0
  [25375.958791]  el0_svc+0x10/0x2cc
  [25375.958801] INFO: task zbackup:95187 blocked for more than 120 seconds.
  [25375.965564]       Tainted: P         C OE     5.4.0-1018-raspi #20-Ubuntu
  [25375.972492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.980479] zbackup         D    0 95187   5499 0x00000000
  [25375.980493] Call trace:
  [25375.980514]  __switch_to+0x104/0x170
  [25375.980525]  __schedule+0x30c/0x7c0
  [25375.980536]  schedule+0x3c/0xb8
  [25375.980578]  cv_wait_common+0x188/0x1b0 [spl]
  [25375.980612]  __cv_wait+0x30/0x40 [spl]
  [25375.980834]  zil_commit_impl+0x234/0xd30 [zfs]
  [25375.981052]  zil_commit+0x48/0x70 [zfs]
  [25375.981280]  zfs_write+0xa3c/0xb90 [zfs]
  [25375.981498]  zpl_write_common_iovec+0xac/0x120 [zfs]
  [25375.981726]  zpl_iter_write+0xe4/0x150 [zfs]
  [25375.981766]  new_sync_write+0x100/0x1a8
  [25375.981776]  __vfs_write+0x74/0x90
  [25375.981784]  vfs_write+0xe4/0x1c8
  [25375.981794]  ksys_write+0x78/0x100
  [25375.981803]  __arm64_sys_write+0x24/0x30
  [25375.981813]  el0_svc_common.constprop.0+0x80/0x218
  [25375.981821]  el0_svc_handler+0x34/0xa0
  [25375.981830]  el0_svc+0x10/0x2cc
  [25375.981842] INFO: task zbackup:95188 blocked for more than 120 seconds.
  [25375.988615]       Tainted: P         C OE     5.4.0-1018-raspi #20-Ubuntu
  [25375.995543] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25376.003532] zbackup         D    0 95188   5499 0x00000008
  [25376.003545] Call trace:
  [25376.003561]  __switch_to+0x104/0x170
  [25376.003571]  __schedule+0x30c/0x7c0
  [25376.003579]  schedule+0x3c/0xb8
  [25376.003619]  cv_wait_common+0x188/0x1b0 [spl]
  [25376.003654]  __cv_wait+0x30/0x40 [spl]
  [25376.003875]  zil_commit_impl+0x234/0xd30 [zfs]
  [25376.004092]  zil_commit+0x48/0x70 [zfs]
  [25376.004312]  zfs_write+0xa3c/0xb90 [zfs]
  [25376.004530]  zpl_write_common_iovec+0xac/0x120 [zfs]
  [25376.004748]  zpl_iter_write+0xe4/0x150 [zfs]
  [25376.004759]  new_sync_write+0x100/0x1a8
  [25376.004768]  __vfs_write+0x74/0x90
  [25376.004776]  vfs_write+0xe4/0x1c8
  [25376.004785]  ksys_write+0x78/0x100
  [25376.004794]  __arm64_sys_write+0x24/0x30
  [25376.004803]  el0_svc_common.constprop.0+0x80/0x218
  [25376.004811]  el0_svc_handler+0x34/0xa0
  [25376.004820]  el0_svc+0x10/0x2cc
  [25376.004917] INFO: task zbackup:95231 blocked for more than 120 seconds.
  [25376.011678]       Tainted: P         C OE     5.4.0-1018-raspi #20-Ubuntu
  [25376.018587] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25376.026554] zbackup         D    0 95231   5499 0x00000000
  [25376.026562] Call trace:
  [25376.026575]  __switch_to+0x104/0x170
  [25376.026582]  __schedule+0x30c/0x7c0
  [25376.026588]  schedule+0x3c/0xb8
  [25376.026615]  cv_wait_common+0x188/0x1b0 [spl]
  [25376.026636]  __cv_wait+0x30/0x40 [spl]
  [25376.026771]  zil_commit_impl+0x234/0xd30 [zfs]
  [25376.026904]  zil_commit+0x48/0x70 [zfs]
  [25376.027042]  zfs_write+0xa3c/0xb90 [zfs]
  [25376.027174]  zpl_write_common_iovec+0xac/0x120 [zfs]
  [25376.027309]  zpl_iter_write+0xe4/0x150 [zfs]
  [25376.027317]  new_sync_write+0x100/0x1a8
  [25376.027322]  __vfs_write+0x74/0x90
  [25376.027332]  vfs_write+0xe4/0x1c8
  [25376.027338]  ksys_write+0x78/0x100
  [25376.027344]  __arm64_sys_write+0x24/0x30
  [25376.027350]  el0_svc_common.constprop.0+0x80/0x218
  [25376.027355]  el0_svc_handler+0x34/0xa0
  [25376.027361]  el0_svc+0x10/0x2cc
  [25376.027374] INFO: task z_wr_int:95546 blocked for more than 120 seconds.
  [25376.034203]       Tainted: P         C OE     5.4.0-1018-raspi #20-Ubuntu
  [25376.041113] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25376.049080] z_wr_int        D    0 95546      2 0x00000028
  [25376.049089] Call trace:
  [25376.049100]  __switch_to+0x104/0x170
  [25376.049106]  __schedule+0x30c/0x7c0
  [25376.049111]  schedule+0x3c/0xb8
  [25376.049116]  io_schedule+0x20/0x58
  [25376.049123]  rq_qos_wait+0x100/0x178
  [25376.049129]  wbt_wait+0xb4/0xf0
  [25376.049135]  __rq_qos_throttle+0x38/0x50
  [25376.049143]  blk_mq_make_request+0x128/0x610
  [25376.049150]  generic_make_request+0xb4/0x2d8
  [25376.049156]  submit_bio+0x48/0x218
  [25376.049292]  vdev_disk_io_start+0x670/0x9f8 [zfs]
  [25376.049423]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
  [25376.049554]  zio_nowait+0xd4/0x170 [zfs]
  [25376.049693]  vdev_queue_io_done+0x1ec/0x2a0 [zfs]
  [25376.049847]  zio_vdev_io_done+0xec/0x220 [zfs]
  [25376.049986]  zio_execute+0xac/0x110 [zfs]
  [25376.050010]  taskq_thread+0x2f8/0x570 [spl]
  [25376.050025]  kthread+0xfc/0x128
  [25376.050033]  ret_from_fork+0x10/0x1c
  [25376.050041] INFO: task z_wr_int:95553 blocked for more than 120 seconds.
  [25376.056873]       Tainted: P         C OE     5.4.0-1018-raspi #20-Ubuntu
  [25376.063780] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25376.071746] z_wr_int        D    0 95553      2 0x00000028
  [25376.071754] Call trace:
  [25376.071764]  __switch_to+0x104/0x170
  [25376.071770]  __schedule+0x30c/0x7c0
  [25376.071775]  schedule+0x3c/0xb8
  [25376.071780]  io_schedule+0x20/0x58
  [25376.071787]  rq_qos_wait+0x100/0x178
  [25376.071792]  wbt_wait+0xb4/0xf0
  [25376.071798]  __rq_qos_throttle+0x38/0x50
  [25376.071806]  blk_mq_make_request+0x128/0x610
  [25376.071813]  generic_make_request+0xb4/0x2d8
  [25376.071819]  submit_bio+0x48/0x218
  [25376.071955]  vdev_disk_io_start+0x670/0x9f8 [zfs]
  [25376.072086]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
  [25376.072217]  zio_nowait+0xd4/0x170 [zfs]
  [25376.072346]  vdev_queue_io_done+0x1ec/0x2a0 [zfs]
  [25376.072476]  zio_vdev_io_done+0xec/0x220 [zfs]
  [25376.072606]  zio_execute+0xac/0x110 [zfs]
  [25376.072629]  taskq_thread+0x2f8/0x570 [spl]
  [25376.072637]  kthread+0xfc/0x128
  [25376.072643]  ret_from_fork+0x10/0x1c
  [25496.722315] INFO: task z_wr_iss_h:2161 blocked for more than 241 seconds.
  [25496.729251]       Tainted: P         C OE     5.4.0-1018-raspi #20-Ubuntu
  [25496.736195] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25496.744197] z_wr_iss_h      D    0  2161      2 0x00000028
  [25496.744247] Call trace:
  [25496.744268]  __switch_to+0x104/0x170
  [25496.744279]  __schedule+0x30c/0x7c0
  [25496.744288]  schedule+0x3c/0xb8
  [25496.744296]  io_schedule+0x20/0x58
  [25496.744308]  rq_qos_wait+0x100/0x178
  [25496.744317]  wbt_wait+0xb4/0xf0
  [25496.744327]  __rq_qos_throttle+0x38/0x50
  [25496.744340]  blk_mq_make_request+0x128/0x610
  [25496.744351]  generic_make_request+0xb4/0x2d8
  [25496.744362]  submit_bio+0x48/0x218
  [25496.744600]  vdev_disk_io_start+0x670/0x9f8 [zfs]
  [25496.744821]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
  [25496.745039]  zio_nowait+0xd4/0x170 [zfs]
  [25496.745256]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
  [25496.745473]  zio_vdev_io_start+0x248/0x2b8 [zfs]
  [25496.745690]  zio_execute+0xac/0x110 [zfs]
  [25496.745730]  taskq_thread+0x2f8/0x570 [spl]
  [25496.745742]  kthread+0xfc/0x128
  [25496.745753]  ret_from_fork+0x10/0x1c
  [25496.745853] INFO: task zbackup:75339 blocked for more than 241 seconds.
  [25496.752621]       Tainted: P         C OE     5.4.0-1018-raspi #20-Ubuntu
  [25496.759557] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25496.767547] zbackup         D    0 75339   5499 0x00000000
  [25496.767561] Call trace:
  [25496.767585]  __switch_to+0x104/0x170
  [25496.767595]  __schedule+0x30c/0x7c0
  [25496.767606]  schedule+0x3c/0xb8
  [25496.767646]  cv_wait_common+0x188/0x1b0 [spl]
  [25496.767682]  __cv_wait+0x30/0x40 [spl]
  [25496.767906]  zil_commit_impl+0x234/0xd30 [zfs]
  [25496.768126]  zil_commit+0x48/0x70 [zfs]
  [25496.768355]  zfs_create+0x544/0x7d0 [zfs]
  [25496.768574]  zpl_create+0xb8/0x178 [zfs]
  [25496.768595]  lookup_open+0x4ec/0x6a8
  [25496.768604]  do_last+0x260/0x8c0
  [25496.768614]  path_openat+0x84/0x258
  [25496.768624]  do_filp_open+0x84/0x108
  [25496.768636]  do_sys_open+0x180/0x2b0
  [25496.768647]  __arm64_sys_openat+0x2c/0x38
  [25496.768657]  el0_svc_common.constprop.0+0x80/0x218
  [25496.768665]  el0_svc_handler+0x34/0xa0
  [25496.768674]  el0_svc+0x10/0x2cc

  tyson@ubuntu:~$ lsb_release -rd
  Description:    Ubuntu 20.04.1 LTS
  Release:        20.04

  tyson@ubuntu:~$ apt-cache policy zfs-dkms
  zfs-dkms:
    Installed: 0.8.3-1ubuntu12.4
    Candidate: 0.8.3-1ubuntu12.4
    Version table:
   *** 0.8.3-1ubuntu12.4 500
          500 http://ports.ubuntu.com/ubuntu-ports focal-updates/universe arm64 
Packages
          500 http://ports.ubuntu.com/ubuntu-ports focal-updates/universe armhf 
Packages
          100 /var/lib/dpkg/status
       0.8.3-1ubuntu12 500
          500 http://ports.ubuntu.com/ubuntu-ports focal/universe arm64 Packages
          500 http://ports.ubuntu.com/ubuntu-ports focal/universe armhf Packages

  ProblemType: Bug
  DistroRelease: Ubuntu 20.04
  Package: zfs-dkms 0.8.3-1ubuntu12.4
  ProcVersionSignature: Ubuntu 5.4.0-1018.20-raspi 5.4.55
  Uname: Linux 5.4.0-1018-raspi aarch64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.11-0ubuntu27.8
  Architecture: arm64
  CasperMD5CheckResult: skip
  Date: Sat Oct 10 08:50:34 2020
  ImageMediaBuild: 20200423.1
  PackageArchitecture: all
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=C.UTF-8
   SHELL=/bin/bash
  SourcePackage: zfs-linux
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1899249/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to