[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-23 Thread Tyson Key
Thanks, Colin. I intend to retest, with 20.10, but for now, with
deduplication disabled, things seem stable.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

Status in Native ZFS for Linux:
  Unknown
Status in zfs-linux package in Ubuntu:
  Incomplete

Bug description:
  Using a QNAP 4-drive USB enclosure, with a set of SSDs, on a Raspberry
  Pi 8GB. ZFS deduplication, and LZJB compression is enabled.

  This issue seems to occur, intermittently, after some time (happens
  with both SMB access, via Samba, and when interacting with the system,
  via SSH), and never previously occurred, until a few months ago, and I
  sometimes have to force a reboot of the system (at the cost of some
  data loss), in order to use it again.

  The "dmesg" log reports:

  [25375.911590] z_wr_iss_h  D0  2161  2 0x0028
  [25375.911606] Call trace:
  [25375.911627]  __switch_to+0x104/0x170
  [25375.911639]  __schedule+0x30c/0x7c0
  [25375.911647]  schedule+0x3c/0xb8
  [25375.911655]  io_schedule+0x20/0x58
  [25375.911668]  rq_qos_wait+0x100/0x178
  [25375.911677]  wbt_wait+0xb4/0xf0
  [25375.911687]  __rq_qos_throttle+0x38/0x50
  [25375.911700]  blk_mq_make_request+0x128/0x610
  [25375.911712]  generic_make_request+0xb4/0x2d8
  [25375.911722]  submit_bio+0x48/0x218
  [25375.911960]  vdev_disk_io_start+0x670/0x9f8 [zfs]
  [25375.912181]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
  [25375.912400]  zio_nowait+0xd4/0x170 [zfs]
  [25375.912617]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
  [25375.912839]  zio_vdev_io_start+0x248/0x2b8 [zfs]
  [25375.913057]  zio_execute+0xac/0x110 [zfs]
  [25375.913096]  taskq_thread+0x2f8/0x570 [spl]
  [25375.913108]  kthread+0xfc/0x128
  [25375.913119]  ret_from_fork+0x10/0x1c
  [25375.913149] INFO: task txg_sync:2333 blocked for more than 120 seconds.
  [25375.919916]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.926848] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.934835] txg_syncD0  2333  2 0x0028
  [25375.934850] Call trace:
  [25375.934869]  __switch_to+0x104/0x170
  [25375.934879]  __schedule+0x30c/0x7c0
  [25375.934887]  schedule+0x3c/0xb8
  [25375.934899]  schedule_timeout+0x9c/0x190
  [25375.934908]  io_schedule_timeout+0x28/0x48
  [25375.934946]  __cv_timedwait_common+0x1a8/0x1f8 [spl]
  [25375.934982]  __cv_timedwait_io+0x3c/0x50 [spl]
  [25375.935205]  zio_wait+0x130/0x2a0 [zfs]
  [25375.935423]  dsl_pool_sync+0x3fc/0x498 [zfs]
  [25375.935650]  spa_sync+0x538/0xe68 [zfs]
  [25375.935867]  txg_sync_thread+0x2c0/0x468 [zfs]
  [25375.935911]  thread_generic_wrapper+0x74/0xa0 [spl]
  [25375.935924]  kthread+0xfc/0x128
  [25375.935935]  ret_from_fork+0x10/0x1c
  [25375.936017] INFO: task zbackup:75339 blocked for more than 120 seconds.
  [25375.942780]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.949710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.957702] zbackup D0 75339   5499 0x
  [25375.957716] Call trace:
  [25375.957732]  __switch_to+0x104/0x170
  [25375.957742]  __schedule+0x30c/0x7c0
  [25375.957750]  schedule+0x3c/0xb8
  [25375.957789]  cv_wait_common+0x188/0x1b0 [spl]
  [25375.957823]  __cv_wait+0x30/0x40 [spl]
  [25375.958045]  zil_commit_impl+0x234/0xd30 [zfs]
  [25375.958263]  zil_commit+0x48/0x70 [zfs]
  [25375.958481]  zfs_create+0x544/0x7d0 [zfs]
  [25375.958698]  zpl_create+0xb8/0x178 [zfs]
  [25375.958711]  lookup_open+0x4ec/0x6a8
  [25375.958721]  do_last+0x260/0x8c0
  [25375.958730]  path_openat+0x84/0x258
  [25375.958739]  do_filp_open+0x84/0x108
  [25375.958752]  do_sys_open+0x180/0x2b0
  [25375.958763]  __arm64_sys_openat+0x2c/0x38
  [25375.958773]  el0_svc_common.constprop.0+0x80/0x218
  [25375.958781]  el0_svc_handler+0x34/0xa0
  [25375.958791]  el0_svc+0x10/0x2cc
  [25375.958801] INFO: task zbackup:95187 blocked for more than 120 seconds.
  [25375.965564]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.972492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.980479] zbackup D0 95187   5499 0x
  [25375.980493] Call trace:
  [25375.980514]  __switch_to+0x104/0x170
  [25375.980525]  __schedule+0x30c/0x7c0
  [25375.980536]  schedule+0x3c/0xb8
  [25375.980578]  cv_wait_common+0x188/0x1b0 [spl]
  [25375.980612]  __cv_wait+0x30/0x40 [spl]
  [25375.980834]  zil_commit_impl+0x234/0xd30 [zfs]
  [25375.981052]  zil_commit+0x48/0x70 [zfs]
  [25375.981280]  zfs_write+0xa3c/0xb90 [zfs]
  [25375.981498]  zpl_write_common_iovec+0xac/0x120 [zfs]
  [25375.981726]  zpl_iter_write+0xe4/0x150 [zfs]
  [25375.981766]  new_sync_write+0x100/0x1a8
  [25375.981776]  __vfs_write+0x74/0x90
  [25375.981784]  vfs_write+0xe4/0x1c8
  [25375.981794]  ksys_write+0x78/0x100
  [25375.981803]

[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-14 Thread Tyson Key
And, after 18 hours+, I was able to archive a ZFS snapshot, using
BorgBackup, with the WBT settings, suggested, earlier:

tyson@ubuntu:~$ sudo zfs send Yaesu@Crucial-2TB-1951E22FA633 |  borg create 
--stats BorgStore::Yaesu@Crucial-2TB-1951E22FA633 -
^[[A^[[A--
Archive name: Yaesu@Crucial-2TB-1951E22FA633
Archive fingerprint: 
32d64e2a13eb543e4846cc34ac2c3609fe192127e3bf6d324a0fa9d61b988c19
Time (start): Tue, 2020-10-13 15:49:51
Time (end):   Wed, 2020-10-14 10:01:46
Duration: 18 hours 11 minutes 54.87 seconds
Number of files: 1
Utilization of max. archive size: 0%
--
   Original size  Compressed sizeDeduplicated size
This archive:1.89 TB  1.75 TB617.36 GB
All archives:7.44 TB  5.87 TB  1.99 TB

   Unique chunks Total chunks
Chunk index:  842436  2607257
--

The "dmesg" log is clean, too:

tyson@ubuntu:~$ dmesg
[0.00] Booting Linux on physical CPU 0x00 [0x410fd083]
[0.00] Linux version 5.4.0-1018-raspi (buildd@bos02-arm64-052) (gcc 
version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #20-Ubuntu SMP Sun Sep 6 05:11:16 UTC 
2020 (Ubuntu 5.4.0-1018.20-raspi 5.4.55)
[0.00] Machine model: Raspberry Pi 4 Model B Rev 1.4
[0.00] efi: Getting EFI parameters from FDT:
[0.00] efi: UEFI not found.
[0.00] Reserved memory: created CMA memory pool at 0x2c00, 
size 64 MiB
[0.00] OF: reserved mem: initialized node linux,cma, compatible id 
shared-dma-pool
[0.00] On node 0 totalpages: 2061312
[0.00]   DMA zone: 3792 pages used for memmap
[0.00]   DMA zone: 0 pages reserved
[0.00]   DMA zone: 242688 pages, LIFO batch:63
[0.00]   DMA32 zone: 12288 pages used for memmap
[0.00]   DMA32 zone: 770048 pages, LIFO batch:63
[0.00]   Normal zone: 16384 pages used for memmap
[0.00]   Normal zone: 1048576 pages, LIFO batch:63
[0.00] percpu: Embedded 32 pages/cpu s92120 r8192 d30760 u131072
[0.00] pcpu-alloc: s92120 r8192 d30760 u131072 alloc=32*4096
[0.00] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3
[0.00] Detected PIPT I-cache on CPU0
[0.00] CPU features: detected: EL2 vector hardening
[0.00] CPU features: kernel page table isolation forced ON by KASLR
[0.00] CPU features: detected: Kernel page table isolation (KPTI)
[0.00] ARM_SMCCC_ARCH_WORKAROUND_1 missing from firmware
[0.00] Built 1 zonelists, mobility grouping on.  Total pages: 2028848
[0.00] Kernel command line:  coherent_pool=1M 8250.nr_uarts=1 
snd_bcm2835.enable_compat_alsa=0 snd_bcm2835.enable_hdmi=1 
snd_bcm2835.enable_headphones=1 bcm2708_fb.fbwidth=0 bcm2708_fb.fbheight=0 
bcm2708_fb.fbswap=1 smsc95xx.macaddr=DC:A6:32:B8:1A:28 
vc_mem.mem_base=0x3ec0 vc_mem.mem_size=0x4000  net.ifnames=0 
dwc_otg.lpm_enable=0 console=ttyS0,115200 console=tty1 root=LABEL=writable 
rootfstype=ext4 elevator=deadline rootwait fixrtc quiet splash
[0.00] Dentry cache hash table entries: 1048576 (order: 11, 8388608 
bytes, linear)
[0.00] Inode-cache hash table entries: 524288 (order: 10, 4194304 
bytes, linear)
[0.00] mem auto-init: stack:off, heap alloc:off, heap free:off
[0.00] software IO TLB: mapped [mem 0x3740-0x3b40] (64MB)
[0.00] Memory: 7898140K/8245248K available (11772K kernel code, 1236K 
rwdata, 4244K rodata, 6144K init, 1076K bss, 281572K reserved, 65536K 
cma-reserved)
[0.00] random: get_random_u64 called from kmem_cache_open+0x38/0x3e0 
with crng_init=0
[0.00] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[0.00] ftrace: allocating 39001 entries in 153 pages
[0.00] rcu: Hierarchical RCU implementation.
[0.00]  Tasks RCU enabled.
[0.00] rcu: RCU calculated value of scheduler-enlistment delay is 25 
jiffies.
[0.00] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[0.00] GIC: Using split EOI/Deactivate mode
[0.00] arch_timer: cp15 timer(s) running at 54.00MHz (phys).
[0.00] clocksource: arch_sys_counter: mask: 0xff 
max_cycles: 0xc743ce346, max_idle_ns: 440795203123 ns
[0.05] sched_clock: 56 bits at 54MHz, resolution 18ns, wraps every 
4398046511102ns
[0.000200] Console: colour dummy device 80x25
[0.000221] printk: console [tty1] enabled
[0.000273] Calibrating delay loop (skipped), value calculated using timer 
frequency.. 108.00 BogoMIPS (lpj=216000)
[0.000288] pid_max: default: 32768 minimum: 301
[0.000495] LSM: Security Framework initializing
[0.000543] Yama: becoming mindful.
[0.000631] AppArmor: AppArmor in

[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-13 Thread Tyson Key
Needs more time, in terms of long-running I/O, but changing the WBT
setting at least makes retrieving snapshots a little faster, with "zfs
list -t all".

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

Status in Native ZFS for Linux:
  Unknown
Status in zfs-linux package in Ubuntu:
  Incomplete

Bug description:
  Using a QNAP 4-drive USB enclosure, with a set of SSDs, on a Raspberry
  Pi 8GB. ZFS deduplication, and LZJB compression is enabled.

  This issue seems to occur, intermittently, after some time (happens
  with both SMB access, via Samba, and when interacting with the system,
  via SSH), and never previously occurred, until a few months ago, and I
  sometimes have to force a reboot of the system (at the cost of some
  data loss), in order to use it again.

  The "dmesg" log reports:

  [25375.911590] z_wr_iss_h  D0  2161  2 0x0028
  [25375.911606] Call trace:
  [25375.911627]  __switch_to+0x104/0x170
  [25375.911639]  __schedule+0x30c/0x7c0
  [25375.911647]  schedule+0x3c/0xb8
  [25375.911655]  io_schedule+0x20/0x58
  [25375.911668]  rq_qos_wait+0x100/0x178
  [25375.911677]  wbt_wait+0xb4/0xf0
  [25375.911687]  __rq_qos_throttle+0x38/0x50
  [25375.911700]  blk_mq_make_request+0x128/0x610
  [25375.911712]  generic_make_request+0xb4/0x2d8
  [25375.911722]  submit_bio+0x48/0x218
  [25375.911960]  vdev_disk_io_start+0x670/0x9f8 [zfs]
  [25375.912181]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
  [25375.912400]  zio_nowait+0xd4/0x170 [zfs]
  [25375.912617]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
  [25375.912839]  zio_vdev_io_start+0x248/0x2b8 [zfs]
  [25375.913057]  zio_execute+0xac/0x110 [zfs]
  [25375.913096]  taskq_thread+0x2f8/0x570 [spl]
  [25375.913108]  kthread+0xfc/0x128
  [25375.913119]  ret_from_fork+0x10/0x1c
  [25375.913149] INFO: task txg_sync:2333 blocked for more than 120 seconds.
  [25375.919916]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.926848] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.934835] txg_syncD0  2333  2 0x0028
  [25375.934850] Call trace:
  [25375.934869]  __switch_to+0x104/0x170
  [25375.934879]  __schedule+0x30c/0x7c0
  [25375.934887]  schedule+0x3c/0xb8
  [25375.934899]  schedule_timeout+0x9c/0x190
  [25375.934908]  io_schedule_timeout+0x28/0x48
  [25375.934946]  __cv_timedwait_common+0x1a8/0x1f8 [spl]
  [25375.934982]  __cv_timedwait_io+0x3c/0x50 [spl]
  [25375.935205]  zio_wait+0x130/0x2a0 [zfs]
  [25375.935423]  dsl_pool_sync+0x3fc/0x498 [zfs]
  [25375.935650]  spa_sync+0x538/0xe68 [zfs]
  [25375.935867]  txg_sync_thread+0x2c0/0x468 [zfs]
  [25375.935911]  thread_generic_wrapper+0x74/0xa0 [spl]
  [25375.935924]  kthread+0xfc/0x128
  [25375.935935]  ret_from_fork+0x10/0x1c
  [25375.936017] INFO: task zbackup:75339 blocked for more than 120 seconds.
  [25375.942780]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.949710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.957702] zbackup D0 75339   5499 0x
  [25375.957716] Call trace:
  [25375.957732]  __switch_to+0x104/0x170
  [25375.957742]  __schedule+0x30c/0x7c0
  [25375.957750]  schedule+0x3c/0xb8
  [25375.957789]  cv_wait_common+0x188/0x1b0 [spl]
  [25375.957823]  __cv_wait+0x30/0x40 [spl]
  [25375.958045]  zil_commit_impl+0x234/0xd30 [zfs]
  [25375.958263]  zil_commit+0x48/0x70 [zfs]
  [25375.958481]  zfs_create+0x544/0x7d0 [zfs]
  [25375.958698]  zpl_create+0xb8/0x178 [zfs]
  [25375.958711]  lookup_open+0x4ec/0x6a8
  [25375.958721]  do_last+0x260/0x8c0
  [25375.958730]  path_openat+0x84/0x258
  [25375.958739]  do_filp_open+0x84/0x108
  [25375.958752]  do_sys_open+0x180/0x2b0
  [25375.958763]  __arm64_sys_openat+0x2c/0x38
  [25375.958773]  el0_svc_common.constprop.0+0x80/0x218
  [25375.958781]  el0_svc_handler+0x34/0xa0
  [25375.958791]  el0_svc+0x10/0x2cc
  [25375.958801] INFO: task zbackup:95187 blocked for more than 120 seconds.
  [25375.965564]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.972492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.980479] zbackup D0 95187   5499 0x
  [25375.980493] Call trace:
  [25375.980514]  __switch_to+0x104/0x170
  [25375.980525]  __schedule+0x30c/0x7c0
  [25375.980536]  schedule+0x3c/0xb8
  [25375.980578]  cv_wait_common+0x188/0x1b0 [spl]
  [25375.980612]  __cv_wait+0x30/0x40 [spl]
  [25375.980834]  zil_commit_impl+0x234/0xd30 [zfs]
  [25375.981052]  zil_commit+0x48/0x70 [zfs]
  [25375.981280]  zfs_write+0xa3c/0xb90 [zfs]
  [25375.981498]  zpl_write_common_iovec+0xac/0x120 [zfs]
  [25375.981726]  zpl_iter_write+0xe4/0x150 [zfs]
  [25375.981766]  new_sync_write+0x100/0x1a8
  [25375.981776]  __vfs_write+0x74/0x90
  [25375.981784]  vfs_write+0xe4/0x1c8
  [25375.981

[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-13 Thread Tyson Key
Set the WBT value to 0, for all of the devices, whilst another archive
run takes place. The last one ran for just over 20 hours, before hitting
the I/O problem.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

Status in Native ZFS for Linux:
  Unknown
Status in zfs-linux package in Ubuntu:
  Incomplete

Bug description:
  Using a QNAP 4-drive USB enclosure, with a set of SSDs, on a Raspberry
  Pi 8GB. ZFS deduplication, and LZJB compression is enabled.

  This issue seems to occur, intermittently, after some time (happens
  with both SMB access, via Samba, and when interacting with the system,
  via SSH), and never previously occurred, until a few months ago, and I
  sometimes have to force a reboot of the system (at the cost of some
  data loss), in order to use it again.

  The "dmesg" log reports:

  [25375.911590] z_wr_iss_h  D0  2161  2 0x0028
  [25375.911606] Call trace:
  [25375.911627]  __switch_to+0x104/0x170
  [25375.911639]  __schedule+0x30c/0x7c0
  [25375.911647]  schedule+0x3c/0xb8
  [25375.911655]  io_schedule+0x20/0x58
  [25375.911668]  rq_qos_wait+0x100/0x178
  [25375.911677]  wbt_wait+0xb4/0xf0
  [25375.911687]  __rq_qos_throttle+0x38/0x50
  [25375.911700]  blk_mq_make_request+0x128/0x610
  [25375.911712]  generic_make_request+0xb4/0x2d8
  [25375.911722]  submit_bio+0x48/0x218
  [25375.911960]  vdev_disk_io_start+0x670/0x9f8 [zfs]
  [25375.912181]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
  [25375.912400]  zio_nowait+0xd4/0x170 [zfs]
  [25375.912617]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
  [25375.912839]  zio_vdev_io_start+0x248/0x2b8 [zfs]
  [25375.913057]  zio_execute+0xac/0x110 [zfs]
  [25375.913096]  taskq_thread+0x2f8/0x570 [spl]
  [25375.913108]  kthread+0xfc/0x128
  [25375.913119]  ret_from_fork+0x10/0x1c
  [25375.913149] INFO: task txg_sync:2333 blocked for more than 120 seconds.
  [25375.919916]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.926848] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.934835] txg_syncD0  2333  2 0x0028
  [25375.934850] Call trace:
  [25375.934869]  __switch_to+0x104/0x170
  [25375.934879]  __schedule+0x30c/0x7c0
  [25375.934887]  schedule+0x3c/0xb8
  [25375.934899]  schedule_timeout+0x9c/0x190
  [25375.934908]  io_schedule_timeout+0x28/0x48
  [25375.934946]  __cv_timedwait_common+0x1a8/0x1f8 [spl]
  [25375.934982]  __cv_timedwait_io+0x3c/0x50 [spl]
  [25375.935205]  zio_wait+0x130/0x2a0 [zfs]
  [25375.935423]  dsl_pool_sync+0x3fc/0x498 [zfs]
  [25375.935650]  spa_sync+0x538/0xe68 [zfs]
  [25375.935867]  txg_sync_thread+0x2c0/0x468 [zfs]
  [25375.935911]  thread_generic_wrapper+0x74/0xa0 [spl]
  [25375.935924]  kthread+0xfc/0x128
  [25375.935935]  ret_from_fork+0x10/0x1c
  [25375.936017] INFO: task zbackup:75339 blocked for more than 120 seconds.
  [25375.942780]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.949710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.957702] zbackup D0 75339   5499 0x
  [25375.957716] Call trace:
  [25375.957732]  __switch_to+0x104/0x170
  [25375.957742]  __schedule+0x30c/0x7c0
  [25375.957750]  schedule+0x3c/0xb8
  [25375.957789]  cv_wait_common+0x188/0x1b0 [spl]
  [25375.957823]  __cv_wait+0x30/0x40 [spl]
  [25375.958045]  zil_commit_impl+0x234/0xd30 [zfs]
  [25375.958263]  zil_commit+0x48/0x70 [zfs]
  [25375.958481]  zfs_create+0x544/0x7d0 [zfs]
  [25375.958698]  zpl_create+0xb8/0x178 [zfs]
  [25375.958711]  lookup_open+0x4ec/0x6a8
  [25375.958721]  do_last+0x260/0x8c0
  [25375.958730]  path_openat+0x84/0x258
  [25375.958739]  do_filp_open+0x84/0x108
  [25375.958752]  do_sys_open+0x180/0x2b0
  [25375.958763]  __arm64_sys_openat+0x2c/0x38
  [25375.958773]  el0_svc_common.constprop.0+0x80/0x218
  [25375.958781]  el0_svc_handler+0x34/0xa0
  [25375.958791]  el0_svc+0x10/0x2cc
  [25375.958801] INFO: task zbackup:95187 blocked for more than 120 seconds.
  [25375.965564]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.972492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.980479] zbackup D0 95187   5499 0x
  [25375.980493] Call trace:
  [25375.980514]  __switch_to+0x104/0x170
  [25375.980525]  __schedule+0x30c/0x7c0
  [25375.980536]  schedule+0x3c/0xb8
  [25375.980578]  cv_wait_common+0x188/0x1b0 [spl]
  [25375.980612]  __cv_wait+0x30/0x40 [spl]
  [25375.980834]  zil_commit_impl+0x234/0xd30 [zfs]
  [25375.981052]  zil_commit+0x48/0x70 [zfs]
  [25375.981280]  zfs_write+0xa3c/0xb90 [zfs]
  [25375.981498]  zpl_write_common_iovec+0xac/0x120 [zfs]
  [25375.981726]  zpl_iter_write+0xe4/0x150 [zfs]
  [25375.981766]  new_sync_write+0x100/0x1a8
  [25375.981776]  __vfs_write+0x74/0x90
  [25375.981784]  vfs_write+0xe4/0x1c8
  [2

[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-13 Thread Tyson Key
All of the drives in the pool are SSDs (2x Crucial BX500 (2TB), 1x
SanDisk Ultra 3D (4TB), 1x Samsung 870 QVO (4TB), but obviously don't
perfectly-align, in terms of performance characteristics - but, I'll
test the WBT tuneable, to see if it makes a difference.

I'm also suspecting that some power management weirdness, affecting
waking up USB devices, is at play, on this machine, but it's a Ubuntu
Server installation, without any special GUI, or Raspberry Pi-specific
components installed.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

Status in Native ZFS for Linux:
  Unknown
Status in zfs-linux package in Ubuntu:
  Incomplete

Bug description:
  Using a QNAP 4-drive USB enclosure, with a set of SSDs, on a Raspberry
  Pi 8GB. ZFS deduplication, and LZJB compression is enabled.

  This issue seems to occur, intermittently, after some time (happens
  with both SMB access, via Samba, and when interacting with the system,
  via SSH), and never previously occurred, until a few months ago, and I
  sometimes have to force a reboot of the system (at the cost of some
  data loss), in order to use it again.

  The "dmesg" log reports:

  [25375.911590] z_wr_iss_h  D0  2161  2 0x0028
  [25375.911606] Call trace:
  [25375.911627]  __switch_to+0x104/0x170
  [25375.911639]  __schedule+0x30c/0x7c0
  [25375.911647]  schedule+0x3c/0xb8
  [25375.911655]  io_schedule+0x20/0x58
  [25375.911668]  rq_qos_wait+0x100/0x178
  [25375.911677]  wbt_wait+0xb4/0xf0
  [25375.911687]  __rq_qos_throttle+0x38/0x50
  [25375.911700]  blk_mq_make_request+0x128/0x610
  [25375.911712]  generic_make_request+0xb4/0x2d8
  [25375.911722]  submit_bio+0x48/0x218
  [25375.911960]  vdev_disk_io_start+0x670/0x9f8 [zfs]
  [25375.912181]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
  [25375.912400]  zio_nowait+0xd4/0x170 [zfs]
  [25375.912617]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
  [25375.912839]  zio_vdev_io_start+0x248/0x2b8 [zfs]
  [25375.913057]  zio_execute+0xac/0x110 [zfs]
  [25375.913096]  taskq_thread+0x2f8/0x570 [spl]
  [25375.913108]  kthread+0xfc/0x128
  [25375.913119]  ret_from_fork+0x10/0x1c
  [25375.913149] INFO: task txg_sync:2333 blocked for more than 120 seconds.
  [25375.919916]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.926848] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.934835] txg_syncD0  2333  2 0x0028
  [25375.934850] Call trace:
  [25375.934869]  __switch_to+0x104/0x170
  [25375.934879]  __schedule+0x30c/0x7c0
  [25375.934887]  schedule+0x3c/0xb8
  [25375.934899]  schedule_timeout+0x9c/0x190
  [25375.934908]  io_schedule_timeout+0x28/0x48
  [25375.934946]  __cv_timedwait_common+0x1a8/0x1f8 [spl]
  [25375.934982]  __cv_timedwait_io+0x3c/0x50 [spl]
  [25375.935205]  zio_wait+0x130/0x2a0 [zfs]
  [25375.935423]  dsl_pool_sync+0x3fc/0x498 [zfs]
  [25375.935650]  spa_sync+0x538/0xe68 [zfs]
  [25375.935867]  txg_sync_thread+0x2c0/0x468 [zfs]
  [25375.935911]  thread_generic_wrapper+0x74/0xa0 [spl]
  [25375.935924]  kthread+0xfc/0x128
  [25375.935935]  ret_from_fork+0x10/0x1c
  [25375.936017] INFO: task zbackup:75339 blocked for more than 120 seconds.
  [25375.942780]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.949710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.957702] zbackup D0 75339   5499 0x
  [25375.957716] Call trace:
  [25375.957732]  __switch_to+0x104/0x170
  [25375.957742]  __schedule+0x30c/0x7c0
  [25375.957750]  schedule+0x3c/0xb8
  [25375.957789]  cv_wait_common+0x188/0x1b0 [spl]
  [25375.957823]  __cv_wait+0x30/0x40 [spl]
  [25375.958045]  zil_commit_impl+0x234/0xd30 [zfs]
  [25375.958263]  zil_commit+0x48/0x70 [zfs]
  [25375.958481]  zfs_create+0x544/0x7d0 [zfs]
  [25375.958698]  zpl_create+0xb8/0x178 [zfs]
  [25375.958711]  lookup_open+0x4ec/0x6a8
  [25375.958721]  do_last+0x260/0x8c0
  [25375.958730]  path_openat+0x84/0x258
  [25375.958739]  do_filp_open+0x84/0x108
  [25375.958752]  do_sys_open+0x180/0x2b0
  [25375.958763]  __arm64_sys_openat+0x2c/0x38
  [25375.958773]  el0_svc_common.constprop.0+0x80/0x218
  [25375.958781]  el0_svc_handler+0x34/0xa0
  [25375.958791]  el0_svc+0x10/0x2cc
  [25375.958801] INFO: task zbackup:95187 blocked for more than 120 seconds.
  [25375.965564]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.972492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.980479] zbackup D0 95187   5499 0x
  [25375.980493] Call trace:
  [25375.980514]  __switch_to+0x104/0x170
  [25375.980525]  __schedule+0x30c/0x7c0
  [25375.980536]  schedule+0x3c/0xb8
  [25375.980578]  cv_wait_common+0x188/0x1b0 [spl]
  [25375.980612]  __cv_wait+0x30/0x40 [spl]
  [25375.980834]  zil_commit_impl+0x234/0xd30 

[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-13 Thread Tyson Key
Here's a full "dmesg" log, if it helps:

tyson@ubuntu:~$ dmesg
[0.00] Booting Linux on physical CPU 0x00 [0x410fd083]
[0.00] Linux version 5.4.0-1018-raspi (buildd@bos02-arm64-052) (gcc 
version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #20-Ubuntu SMP Sun Sep 6 05:11:16 UTC 
2020 (Ubuntu 5.4.0-1018.20-raspi 5.4.55)
[0.00] Machine model: Raspberry Pi 4 Model B Rev 1.4
[0.00] efi: Getting EFI parameters from FDT:
[0.00] efi: UEFI not found.
[0.00] Reserved memory: created CMA memory pool at 0x2c00, 
size 64 MiB
[0.00] OF: reserved mem: initialized node linux,cma, compatible id 
shared-dma-pool
[0.00] On node 0 totalpages: 2061312
[0.00]   DMA zone: 3792 pages used for memmap
[0.00]   DMA zone: 0 pages reserved
[0.00]   DMA zone: 242688 pages, LIFO batch:63
[0.00]   DMA32 zone: 12288 pages used for memmap
[0.00]   DMA32 zone: 770048 pages, LIFO batch:63
[0.00]   Normal zone: 16384 pages used for memmap
[0.00]   Normal zone: 1048576 pages, LIFO batch:63
[0.00] percpu: Embedded 32 pages/cpu s92120 r8192 d30760 u131072
[0.00] pcpu-alloc: s92120 r8192 d30760 u131072 alloc=32*4096
[0.00] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3
[0.00] Detected PIPT I-cache on CPU0
[0.00] CPU features: detected: EL2 vector hardening
[0.00] CPU features: kernel page table isolation forced ON by KASLR
[0.00] CPU features: detected: Kernel page table isolation (KPTI)
[0.00] ARM_SMCCC_ARCH_WORKAROUND_1 missing from firmware
[0.00] Built 1 zonelists, mobility grouping on.  Total pages: 2028848
[0.00] Kernel command line:  coherent_pool=1M 8250.nr_uarts=1 
snd_bcm2835.enable_compat_alsa=0 snd_bcm2835.enable_hdmi=1 
snd_bcm2835.enable_headphones=1 bcm2708_fb.fbwidth=0 bcm2708_fb.fbheight=0 
bcm2708_fb.fbswap=1 smsc95xx.macaddr=DC:A6:32:B8:1A:28 
vc_mem.mem_base=0x3ec0 vc_mem.mem_size=0x4000  net.ifnames=0 
dwc_otg.lpm_enable=0 console=ttyS0,115200 console=tty1 root=LABEL=writable 
rootfstype=ext4 elevator=deadline rootwait fixrtc quiet splash
[0.00] Dentry cache hash table entries: 1048576 (order: 11, 8388608 
bytes, linear)
[0.00] Inode-cache hash table entries: 524288 (order: 10, 4194304 
bytes, linear)
[0.00] mem auto-init: stack:off, heap alloc:off, heap free:off
[0.00] software IO TLB: mapped [mem 0x3740-0x3b40] (64MB)
[0.00] Memory: 7898136K/8245248K available (11772K kernel code, 1236K 
rwdata, 4244K rodata, 6144K init, 1076K bss, 281576K reserved, 65536K 
cma-reserved)
[0.00] random: get_random_u64 called from kmem_cache_open+0x38/0x3e0 
with crng_init=0
[0.00] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[0.00] ftrace: allocating 39001 entries in 153 pages
[0.00] rcu: Hierarchical RCU implementation.
[0.00]  Tasks RCU enabled.
[0.00] rcu: RCU calculated value of scheduler-enlistment delay is 25 
jiffies.
[0.00] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[0.00] GIC: Using split EOI/Deactivate mode
[0.00] arch_timer: cp15 timer(s) running at 54.00MHz (phys).
[0.00] clocksource: arch_sys_counter: mask: 0xff 
max_cycles: 0xc743ce346, max_idle_ns: 440795203123 ns
[0.05] sched_clock: 56 bits at 54MHz, resolution 18ns, wraps every 
4398046511102ns
[0.000200] Console: colour dummy device 80x25
[0.000221] printk: console [tty1] enabled
[0.000273] Calibrating delay loop (skipped), value calculated using timer 
frequency.. 108.00 BogoMIPS (lpj=216000)
[0.000287] pid_max: default: 32768 minimum: 301
[0.000491] LSM: Security Framework initializing
[0.000540] Yama: becoming mindful.
[0.000629] AppArmor: AppArmor initialized
[0.000874] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, 
linear)
[0.000961] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 
bytes, linear)
[0.001028] *** VALIDATE tmpfs ***
[0.001634] *** VALIDATE proc ***
[0.002155] Disabling memory control group subsystem
[0.002224] *** VALIDATE cgroup1 ***
[0.002234] *** VALIDATE cgroup2 ***
[0.003438] ASID allocator initialised with 32768 entries
[0.003601] rcu: Hierarchical SRCU implementation.
[0.005657] EFI services will not be available.
[0.006469] smp: Bringing up secondary CPUs ...
[0.007318] Detected PIPT I-cache on CPU1
[0.007371] CPU1: Booted secondary processor 0x01 [0x410fd083]
[0.008416] Detected PIPT I-cache on CPU2
[0.008454] CPU2: Booted secondary processor 0x02 [0x410fd083]
[0.009407] Detected PIPT I-cache on CPU3
[0.009444] CPU3: Booted secondary processor 0x03 [0x410fd083]
[0.009586] smp: Brought up 1 node, 4 CPUs
[0.009594] SMP: Total of 4 processors activated.
[0.009604] CPU features: detected: 32-bit EL0 Support
[0.009613] CPU fe

[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-13 Thread Tyson Key
I'm using 16GB of swap, on a ZVOL, in the pool, but I can also test with
swap, on the internal MicroSD card's EXT4 root partition, if it helps.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

Status in Native ZFS for Linux:
  Unknown
Status in zfs-linux package in Ubuntu:
  Incomplete

Bug description:
  Using a QNAP 4-drive USB enclosure, with a set of SSDs, on a Raspberry
  Pi 8GB. ZFS deduplication, and LZJB compression is enabled.

  This issue seems to occur, intermittently, after some time (happens
  with both SMB access, via Samba, and when interacting with the system,
  via SSH), and never previously occurred, until a few months ago, and I
  sometimes have to force a reboot of the system (at the cost of some
  data loss), in order to use it again.

  The "dmesg" log reports:

  [25375.911590] z_wr_iss_h  D0  2161  2 0x0028
  [25375.911606] Call trace:
  [25375.911627]  __switch_to+0x104/0x170
  [25375.911639]  __schedule+0x30c/0x7c0
  [25375.911647]  schedule+0x3c/0xb8
  [25375.911655]  io_schedule+0x20/0x58
  [25375.911668]  rq_qos_wait+0x100/0x178
  [25375.911677]  wbt_wait+0xb4/0xf0
  [25375.911687]  __rq_qos_throttle+0x38/0x50
  [25375.911700]  blk_mq_make_request+0x128/0x610
  [25375.911712]  generic_make_request+0xb4/0x2d8
  [25375.911722]  submit_bio+0x48/0x218
  [25375.911960]  vdev_disk_io_start+0x670/0x9f8 [zfs]
  [25375.912181]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
  [25375.912400]  zio_nowait+0xd4/0x170 [zfs]
  [25375.912617]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
  [25375.912839]  zio_vdev_io_start+0x248/0x2b8 [zfs]
  [25375.913057]  zio_execute+0xac/0x110 [zfs]
  [25375.913096]  taskq_thread+0x2f8/0x570 [spl]
  [25375.913108]  kthread+0xfc/0x128
  [25375.913119]  ret_from_fork+0x10/0x1c
  [25375.913149] INFO: task txg_sync:2333 blocked for more than 120 seconds.
  [25375.919916]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.926848] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.934835] txg_syncD0  2333  2 0x0028
  [25375.934850] Call trace:
  [25375.934869]  __switch_to+0x104/0x170
  [25375.934879]  __schedule+0x30c/0x7c0
  [25375.934887]  schedule+0x3c/0xb8
  [25375.934899]  schedule_timeout+0x9c/0x190
  [25375.934908]  io_schedule_timeout+0x28/0x48
  [25375.934946]  __cv_timedwait_common+0x1a8/0x1f8 [spl]
  [25375.934982]  __cv_timedwait_io+0x3c/0x50 [spl]
  [25375.935205]  zio_wait+0x130/0x2a0 [zfs]
  [25375.935423]  dsl_pool_sync+0x3fc/0x498 [zfs]
  [25375.935650]  spa_sync+0x538/0xe68 [zfs]
  [25375.935867]  txg_sync_thread+0x2c0/0x468 [zfs]
  [25375.935911]  thread_generic_wrapper+0x74/0xa0 [spl]
  [25375.935924]  kthread+0xfc/0x128
  [25375.935935]  ret_from_fork+0x10/0x1c
  [25375.936017] INFO: task zbackup:75339 blocked for more than 120 seconds.
  [25375.942780]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.949710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.957702] zbackup D0 75339   5499 0x
  [25375.957716] Call trace:
  [25375.957732]  __switch_to+0x104/0x170
  [25375.957742]  __schedule+0x30c/0x7c0
  [25375.957750]  schedule+0x3c/0xb8
  [25375.957789]  cv_wait_common+0x188/0x1b0 [spl]
  [25375.957823]  __cv_wait+0x30/0x40 [spl]
  [25375.958045]  zil_commit_impl+0x234/0xd30 [zfs]
  [25375.958263]  zil_commit+0x48/0x70 [zfs]
  [25375.958481]  zfs_create+0x544/0x7d0 [zfs]
  [25375.958698]  zpl_create+0xb8/0x178 [zfs]
  [25375.958711]  lookup_open+0x4ec/0x6a8
  [25375.958721]  do_last+0x260/0x8c0
  [25375.958730]  path_openat+0x84/0x258
  [25375.958739]  do_filp_open+0x84/0x108
  [25375.958752]  do_sys_open+0x180/0x2b0
  [25375.958763]  __arm64_sys_openat+0x2c/0x38
  [25375.958773]  el0_svc_common.constprop.0+0x80/0x218
  [25375.958781]  el0_svc_handler+0x34/0xa0
  [25375.958791]  el0_svc+0x10/0x2cc
  [25375.958801] INFO: task zbackup:95187 blocked for more than 120 seconds.
  [25375.965564]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.972492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.980479] zbackup D0 95187   5499 0x
  [25375.980493] Call trace:
  [25375.980514]  __switch_to+0x104/0x170
  [25375.980525]  __schedule+0x30c/0x7c0
  [25375.980536]  schedule+0x3c/0xb8
  [25375.980578]  cv_wait_common+0x188/0x1b0 [spl]
  [25375.980612]  __cv_wait+0x30/0x40 [spl]
  [25375.980834]  zil_commit_impl+0x234/0xd30 [zfs]
  [25375.981052]  zil_commit+0x48/0x70 [zfs]
  [25375.981280]  zfs_write+0xa3c/0xb90 [zfs]
  [25375.981498]  zpl_write_common_iovec+0xac/0x120 [zfs]
  [25375.981726]  zpl_iter_write+0xe4/0x150 [zfs]
  [25375.981766]  new_sync_write+0x100/0x1a8
  [25375.981776]  __vfs_write+0x74/0x90
  [25375.981784]  vfs_write+0xe4/0x1c8
  [25375.981794]  ksy

[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-13 Thread Tyson Key
Looks like I could archive about 1.3TB, of a 1.64TB snapshot, before
things started to go bad, again:

[110079.681102] INFO: task z_wr_iss_h:2171 blocked for more than 120 seconds.
[110079.688123]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[110079.695167] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[110079.703260] z_wr_iss_h  D0  2171  2 0x0028
[110079.703276] Call trace:
[110079.703297]  __switch_to+0x104/0x170
[110079.703308]  __schedule+0x30c/0x7c0
[110079.703317]  schedule+0x3c/0xb8
[110079.703325]  io_schedule+0x20/0x58
[110079.703338]  rq_qos_wait+0x100/0x178
[110079.703347]  wbt_wait+0xb4/0xf0
[110079.703357]  __rq_qos_throttle+0x38/0x50
[110079.703370]  blk_mq_make_request+0x128/0x610
[110079.703382]  generic_make_request+0xb4/0x2d8
[110079.703392]  submit_bio+0x48/0x218
[110079.703629]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[110079.703850]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[110079.704068]  zio_nowait+0xd4/0x170 [zfs]
[110079.704285]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[110079.704503]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[110079.704720]  zio_execute+0xac/0x110 [zfs]
[110079.704760]  taskq_thread+0x2f8/0x570 [spl]
[110079.704773]  kthread+0xfc/0x128
[110079.704784]  ret_from_fork+0x10/0x1c
[110079.704795] INFO: task z_wr_int:2173 blocked for more than 120 seconds.
[110079.711650]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[110079.718668] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[110079.726745] z_wr_intD0  2173  2 0x0028
[110079.726760] Call trace:
[110079.726780]  __switch_to+0x104/0x170
[110079.726790]  __schedule+0x30c/0x7c0
[110079.726803]  schedule+0x3c/0xb8
[110079.726811]  io_schedule+0x20/0x58
[110079.726823]  rq_qos_wait+0x100/0x178
[110079.726832]  wbt_wait+0xb4/0xf0
[110079.726842]  __rq_qos_throttle+0x38/0x50
[110079.726854]  blk_mq_make_request+0x128/0x610
[110079.726866]  generic_make_request+0xb4/0x2d8
[110079.726877]  submit_bio+0x48/0x218
[110079.727100]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[110079.727320]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[110079.727549]  zio_nowait+0xd4/0x170 [zfs]
[110079.727766]  vdev_queue_io_done+0x1ec/0x2a0 [zfs]
[110079.727991]  zio_vdev_io_done+0xec/0x220 [zfs]
[110079.728209]  zio_execute+0xac/0x110 [zfs]
[110079.728255]  taskq_thread+0x2f8/0x570 [spl]
[110079.728268]  kthread+0xfc/0x128
[110079.728279]  ret_from_fork+0x10/0x1c
[110079.728309] INFO: task txg_sync:2388 blocked for more than 120 seconds.
[110079.735157]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[110079.742175] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[110079.750306] txg_syncD0  2388  2 0x0028
[110079.750321] Call trace:
[110079.750337]  __switch_to+0x104/0x170
[110079.750348]  __schedule+0x30c/0x7c0
[110079.750356]  schedule+0x3c/0xb8
[110079.750366]  schedule_timeout+0x9c/0x190
[110079.750374]  io_schedule_timeout+0x28/0x48
[110079.750412]  __cv_timedwait_common+0x1a8/0x1f8 [spl]
[110079.750447]  __cv_timedwait_io+0x3c/0x50 [spl]
[110079.750670]  zio_wait+0x130/0x2a0 [zfs]
[110079.750890]  dsl_pool_sync+0x3fc/0x498 [zfs]
[110079.751109]  spa_sync+0x538/0xe68 [zfs]
[110079.751326]  txg_sync_thread+0x2c0/0x468 [zfs]
[110079.751364]  thread_generic_wrapper+0x74/0xa0 [spl]
[110079.751377]  kthread+0xfc/0x128
[110079.751388]  ret_from_fork+0x10/0x1c
[110079.751459] INFO: task borg:1063669 blocked for more than 120 seconds.
[110079.758224]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[110079.765237] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[110079.773310] borgD0 1063669 1063661 0x
[110079.773324] Call trace:
[110079.773344]  __switch_to+0x104/0x170
[110079.773355]  __schedule+0x30c/0x7c0
[110079.773366]  schedule+0x3c/0xb8
[110079.773406]  cv_wait_common+0x188/0x1b0 [spl]
[110079.773441]  __cv_wait+0x30/0x40 [spl]
[110079.773664]  zil_commit_impl+0x234/0xd30 [zfs]
[110079.773882]  zil_commit+0x48/0x70 [zfs]
[110079.774112]  zfs_write+0xa3c/0xb90 [zfs]
[110079.774331]  zpl_write_common_iovec+0xac/0x120 [zfs]
[110079.774557]  zpl_iter_write+0xe4/0x150 [zfs]
[110079.774569]  new_sync_write+0x100/0x1a8
[110079.774578]  __vfs_write+0x74/0x90
[110079.774593]  vfs_write+0xe4/0x1c8
[110079.774603]  ksys_write+0x78/0x100
[110079.774612]  __arm64_sys_write+0x24/0x30
[110079.774622]  el0_svc_common.constprop.0+0x80/0x218
[110079.774631]  el0_svc_handler+0x34/0xa0
[110079.774640]  el0_svc+0x10/0x2cc
[110079.774665] INFO: task z_wr_iss_h:3853507 blocked for more than 120 seconds.
[110079.781953]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[110079.788973] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[110079.797052] z_wr_iss_h  D0 3853507  2 0x0028
[110079.797067] Call trace:
[110079.797083]  __switch_to+0x104/0x170
[110079.797094]  __schedule+0x30c/0x7c0
[110

[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-12 Thread Tyson Key
Thanks.

I was able to archive, and delete 400GB of data, without problems,
earlier on, today, which reduced the "REFER" of my data set, a little -
however, it looks like I probably need to focus on archiving, and
removing some of the older snapshots, if I want to trim down the memory
utilisation of my pool.

When I arrived home, from work, "dmesg" didn't have anything interesting
to say, in terms of panics, or errors - which seems promising, but it's
still early days, yet.

Seems a little like a "luxury problem", that I find myself with more
storage space, and data, than available RAM, given that usually, the
opposite situation is true...

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

Status in zfs-linux package in Ubuntu:
  Incomplete

Bug description:
  Using a QNAP 4-drive USB enclosure, with a set of SSDs, on a Raspberry
  Pi 8GB. ZFS deduplication, and LZJB compression is enabled.

  This issue seems to occur, intermittently, after some time (happens
  with both SMB access, via Samba, and when interacting with the system,
  via SSH), and never previously occurred, until a few months ago, and I
  sometimes have to force a reboot of the system (at the cost of some
  data loss), in order to use it again.

  The "dmesg" log reports:

  [25375.911590] z_wr_iss_h  D0  2161  2 0x0028
  [25375.911606] Call trace:
  [25375.911627]  __switch_to+0x104/0x170
  [25375.911639]  __schedule+0x30c/0x7c0
  [25375.911647]  schedule+0x3c/0xb8
  [25375.911655]  io_schedule+0x20/0x58
  [25375.911668]  rq_qos_wait+0x100/0x178
  [25375.911677]  wbt_wait+0xb4/0xf0
  [25375.911687]  __rq_qos_throttle+0x38/0x50
  [25375.911700]  blk_mq_make_request+0x128/0x610
  [25375.911712]  generic_make_request+0xb4/0x2d8
  [25375.911722]  submit_bio+0x48/0x218
  [25375.911960]  vdev_disk_io_start+0x670/0x9f8 [zfs]
  [25375.912181]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
  [25375.912400]  zio_nowait+0xd4/0x170 [zfs]
  [25375.912617]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
  [25375.912839]  zio_vdev_io_start+0x248/0x2b8 [zfs]
  [25375.913057]  zio_execute+0xac/0x110 [zfs]
  [25375.913096]  taskq_thread+0x2f8/0x570 [spl]
  [25375.913108]  kthread+0xfc/0x128
  [25375.913119]  ret_from_fork+0x10/0x1c
  [25375.913149] INFO: task txg_sync:2333 blocked for more than 120 seconds.
  [25375.919916]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.926848] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.934835] txg_syncD0  2333  2 0x0028
  [25375.934850] Call trace:
  [25375.934869]  __switch_to+0x104/0x170
  [25375.934879]  __schedule+0x30c/0x7c0
  [25375.934887]  schedule+0x3c/0xb8
  [25375.934899]  schedule_timeout+0x9c/0x190
  [25375.934908]  io_schedule_timeout+0x28/0x48
  [25375.934946]  __cv_timedwait_common+0x1a8/0x1f8 [spl]
  [25375.934982]  __cv_timedwait_io+0x3c/0x50 [spl]
  [25375.935205]  zio_wait+0x130/0x2a0 [zfs]
  [25375.935423]  dsl_pool_sync+0x3fc/0x498 [zfs]
  [25375.935650]  spa_sync+0x538/0xe68 [zfs]
  [25375.935867]  txg_sync_thread+0x2c0/0x468 [zfs]
  [25375.935911]  thread_generic_wrapper+0x74/0xa0 [spl]
  [25375.935924]  kthread+0xfc/0x128
  [25375.935935]  ret_from_fork+0x10/0x1c
  [25375.936017] INFO: task zbackup:75339 blocked for more than 120 seconds.
  [25375.942780]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.949710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.957702] zbackup D0 75339   5499 0x
  [25375.957716] Call trace:
  [25375.957732]  __switch_to+0x104/0x170
  [25375.957742]  __schedule+0x30c/0x7c0
  [25375.957750]  schedule+0x3c/0xb8
  [25375.957789]  cv_wait_common+0x188/0x1b0 [spl]
  [25375.957823]  __cv_wait+0x30/0x40 [spl]
  [25375.958045]  zil_commit_impl+0x234/0xd30 [zfs]
  [25375.958263]  zil_commit+0x48/0x70 [zfs]
  [25375.958481]  zfs_create+0x544/0x7d0 [zfs]
  [25375.958698]  zpl_create+0xb8/0x178 [zfs]
  [25375.958711]  lookup_open+0x4ec/0x6a8
  [25375.958721]  do_last+0x260/0x8c0
  [25375.958730]  path_openat+0x84/0x258
  [25375.958739]  do_filp_open+0x84/0x108
  [25375.958752]  do_sys_open+0x180/0x2b0
  [25375.958763]  __arm64_sys_openat+0x2c/0x38
  [25375.958773]  el0_svc_common.constprop.0+0x80/0x218
  [25375.958781]  el0_svc_handler+0x34/0xa0
  [25375.958791]  el0_svc+0x10/0x2cc
  [25375.958801] INFO: task zbackup:95187 blocked for more than 120 seconds.
  [25375.965564]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.972492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.980479] zbackup D0 95187   5499 0x
  [25375.980493] Call trace:
  [25375.980514]  __switch_to+0x104/0x170
  [25375.980525]  __schedule+0x30c/0x7c0
  [25375.980536]  schedule+0x3c/0xb8
  [25375.980578]  cv_wait_com

[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-12 Thread Tyson Key
Unfortunately, I haven't got a storage device large enough, to contain
all of the data from the pool, and much of it cannot be recreated, or
restored, from another source, so I won't be able to nuke the pool, and
rebuild it.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

Status in zfs-linux package in Ubuntu:
  Incomplete

Bug description:
  Using a QNAP 4-drive USB enclosure, with a set of SSDs, on a Raspberry
  Pi 8GB. ZFS deduplication, and LZJB compression is enabled.

  This issue seems to occur, intermittently, after some time (happens
  with both SMB access, via Samba, and when interacting with the system,
  via SSH), and never previously occurred, until a few months ago, and I
  sometimes have to force a reboot of the system (at the cost of some
  data loss), in order to use it again.

  The "dmesg" log reports:

  [25375.911590] z_wr_iss_h  D0  2161  2 0x0028
  [25375.911606] Call trace:
  [25375.911627]  __switch_to+0x104/0x170
  [25375.911639]  __schedule+0x30c/0x7c0
  [25375.911647]  schedule+0x3c/0xb8
  [25375.911655]  io_schedule+0x20/0x58
  [25375.911668]  rq_qos_wait+0x100/0x178
  [25375.911677]  wbt_wait+0xb4/0xf0
  [25375.911687]  __rq_qos_throttle+0x38/0x50
  [25375.911700]  blk_mq_make_request+0x128/0x610
  [25375.911712]  generic_make_request+0xb4/0x2d8
  [25375.911722]  submit_bio+0x48/0x218
  [25375.911960]  vdev_disk_io_start+0x670/0x9f8 [zfs]
  [25375.912181]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
  [25375.912400]  zio_nowait+0xd4/0x170 [zfs]
  [25375.912617]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
  [25375.912839]  zio_vdev_io_start+0x248/0x2b8 [zfs]
  [25375.913057]  zio_execute+0xac/0x110 [zfs]
  [25375.913096]  taskq_thread+0x2f8/0x570 [spl]
  [25375.913108]  kthread+0xfc/0x128
  [25375.913119]  ret_from_fork+0x10/0x1c
  [25375.913149] INFO: task txg_sync:2333 blocked for more than 120 seconds.
  [25375.919916]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.926848] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.934835] txg_syncD0  2333  2 0x0028
  [25375.934850] Call trace:
  [25375.934869]  __switch_to+0x104/0x170
  [25375.934879]  __schedule+0x30c/0x7c0
  [25375.934887]  schedule+0x3c/0xb8
  [25375.934899]  schedule_timeout+0x9c/0x190
  [25375.934908]  io_schedule_timeout+0x28/0x48
  [25375.934946]  __cv_timedwait_common+0x1a8/0x1f8 [spl]
  [25375.934982]  __cv_timedwait_io+0x3c/0x50 [spl]
  [25375.935205]  zio_wait+0x130/0x2a0 [zfs]
  [25375.935423]  dsl_pool_sync+0x3fc/0x498 [zfs]
  [25375.935650]  spa_sync+0x538/0xe68 [zfs]
  [25375.935867]  txg_sync_thread+0x2c0/0x468 [zfs]
  [25375.935911]  thread_generic_wrapper+0x74/0xa0 [spl]
  [25375.935924]  kthread+0xfc/0x128
  [25375.935935]  ret_from_fork+0x10/0x1c
  [25375.936017] INFO: task zbackup:75339 blocked for more than 120 seconds.
  [25375.942780]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.949710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.957702] zbackup D0 75339   5499 0x
  [25375.957716] Call trace:
  [25375.957732]  __switch_to+0x104/0x170
  [25375.957742]  __schedule+0x30c/0x7c0
  [25375.957750]  schedule+0x3c/0xb8
  [25375.957789]  cv_wait_common+0x188/0x1b0 [spl]
  [25375.957823]  __cv_wait+0x30/0x40 [spl]
  [25375.958045]  zil_commit_impl+0x234/0xd30 [zfs]
  [25375.958263]  zil_commit+0x48/0x70 [zfs]
  [25375.958481]  zfs_create+0x544/0x7d0 [zfs]
  [25375.958698]  zpl_create+0xb8/0x178 [zfs]
  [25375.958711]  lookup_open+0x4ec/0x6a8
  [25375.958721]  do_last+0x260/0x8c0
  [25375.958730]  path_openat+0x84/0x258
  [25375.958739]  do_filp_open+0x84/0x108
  [25375.958752]  do_sys_open+0x180/0x2b0
  [25375.958763]  __arm64_sys_openat+0x2c/0x38
  [25375.958773]  el0_svc_common.constprop.0+0x80/0x218
  [25375.958781]  el0_svc_handler+0x34/0xa0
  [25375.958791]  el0_svc+0x10/0x2cc
  [25375.958801] INFO: task zbackup:95187 blocked for more than 120 seconds.
  [25375.965564]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.972492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.980479] zbackup D0 95187   5499 0x
  [25375.980493] Call trace:
  [25375.980514]  __switch_to+0x104/0x170
  [25375.980525]  __schedule+0x30c/0x7c0
  [25375.980536]  schedule+0x3c/0xb8
  [25375.980578]  cv_wait_common+0x188/0x1b0 [spl]
  [25375.980612]  __cv_wait+0x30/0x40 [spl]
  [25375.980834]  zil_commit_impl+0x234/0xd30 [zfs]
  [25375.981052]  zil_commit+0x48/0x70 [zfs]
  [25375.981280]  zfs_write+0xa3c/0xb90 [zfs]
  [25375.981498]  zpl_write_common_iovec+0xac/0x120 [zfs]
  [25375.981726]  zpl_iter_write+0xe4/0x150 [zfs]
  [25375.981766]  new_sync_write+0x100/0x1a8
  [25375.981776]  __vfs_write+0x74/0x90
  [25375.981784]  vfs

[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-11 Thread Tyson Key
Also receive this, when trying to archive a large directory:

[ 2055.147509] INFO: task z_wr_iss_h:2169 blocked for more than 120 seconds.
[ 2055.154450]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[ 2055.161401] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 2055.169403] z_wr_iss_h  D0  2169  2 0x0028
[ 2055.169420] Call trace:
[ 2055.169441]  __switch_to+0x104/0x170
[ 2055.169453]  __schedule+0x30c/0x7c0
[ 2055.169462]  schedule+0x3c/0xb8
[ 2055.169470]  io_schedule+0x20/0x58
[ 2055.169483]  rq_qos_wait+0x100/0x178
[ 2055.169492]  wbt_wait+0xb4/0xf0
[ 2055.169502]  __rq_qos_throttle+0x38/0x50
[ 2055.169514]  blk_mq_make_request+0x128/0x610
[ 2055.169526]  generic_make_request+0xb4/0x2d8
[ 2055.169537]  submit_bio+0x48/0x218
[ 2055.169779]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[ 2055.170005]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[ 2055.170229]  zio_nowait+0xd4/0x170 [zfs]
[ 2055.170451]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[ 2055.170673]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[ 2055.170896]  zio_execute+0xac/0x110 [zfs]
[ 2055.170937]  taskq_thread+0x2f8/0x570 [spl]
[ 2055.170950]  kthread+0xfc/0x128
[ 2055.170960]  ret_from_fork+0x10/0x1c
[ 2055.170990] INFO: task txg_quiesce:2380 blocked for more than 120 seconds.
[ 2055.178025]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[ 2055.184958] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 2055.192949] txg_quiesce D0  2380  2 0x0028
[ 2055.192963] Call trace:
[ 2055.192984]  __switch_to+0x104/0x170
[ 2055.192995]  __schedule+0x30c/0x7c0
[ 2055.193003]  schedule+0x3c/0xb8
[ 2055.193046]  cv_wait_common+0x188/0x1b0 [spl]
[ 2055.193082]  __cv_wait+0x30/0x40 [spl]
[ 2055.193310]  txg_quiesce_thread+0x27c/0x380 [zfs]
[ 2055.193349]  thread_generic_wrapper+0x74/0xa0 [spl]
[ 2055.193370]  kthread+0xfc/0x128
[ 2055.193381]  ret_from_fork+0x10/0x1c
[ 2055.193438] INFO: task smbd:3662 blocked for more than 120 seconds.
[ 2055.199853]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[ 2055.206784] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 2055.214772] smbdD0  3662   3498 0x
[ 2055.214786] Call trace:
[ 2055.214800]  __switch_to+0x104/0x170
[ 2055.214810]  __schedule+0x30c/0x7c0
[ 2055.214818]  schedule+0x3c/0xb8
[ 2055.214856]  cv_wait_common+0x188/0x1b0 [spl]
[ 2055.214892]  __cv_wait+0x30/0x40 [spl]
[ 2055.215121]  zil_commit_impl+0x234/0xd30 [zfs]
[ 2055.215344]  zil_commit+0x48/0x70 [zfs]
[ 2055.215567]  zfs_read+0x2f8/0x418 [zfs]
[ 2055.215831]  zpl_read_common_iovec+0xa8/0x110 [zfs]
[ 2055.216065]  zpl_read_common+0x4c/0x70 [zfs]
[ 2055.216288]  zpl_xattr_get_dir+0x10c/0x128 [zfs]
[ 2055.216519]  zpl_xattr_get+0xcc/0x1a8 [zfs]
[ 2055.216742]  zpl_xattr_user_get+0x68/0x90 [zfs]
[ 2055.216764]  __vfs_getxattr+0x60/0x80
[ 2055.216774]  vfs_getxattr+0x140/0x160
[ 2055.216784]  getxattr+0x9c/0x2c0
[ 2055.216794]  path_getxattr+0x7c/0xd0
[ 2055.216805]  __arm64_sys_getxattr+0x28/0x38
[ 2055.216815]  el0_svc_common.constprop.0+0x80/0x218
[ 2055.216823]  el0_svc_handler+0x34/0xa0
[ 2055.216833]  el0_svc+0x10/0x2cc
[ 2055.216858] INFO: task borg:5639 blocked for more than 120 seconds.
[ 2055.223265]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[ 2055.230193] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 2055.238179] borgD0  5639   4001 0x
[ 2055.238193] Call trace:
[ 2055.238208]  __switch_to+0x104/0x170
[ 2055.238218]  __schedule+0x30c/0x7c0
[ 2055.238226]  schedule+0x3c/0xb8
[ 2055.238267]  cv_wait_common+0x188/0x1b0 [spl]
[ 2055.238303]  __cv_wait+0x30/0x40 [spl]
[ 2055.238530]  zil_commit_impl+0x234/0xd30 [zfs]
[ 2055.238752]  zil_commit+0x48/0x70 [zfs]
[ 2055.238975]  zfs_write+0xa3c/0xb90 [zfs]
[ 2055.239197]  zpl_write_common_iovec+0xac/0x120 [zfs]
[ 2055.239419]  zpl_iter_write+0xe4/0x150 [zfs]
[ 2055.239471]  new_sync_write+0x100/0x1a8
[ 2055.239480]  __vfs_write+0x74/0x90
[ 2055.239489]  vfs_write+0xe4/0x1c8
[ 2055.239505]  ksys_write+0x78/0x100
[ 2055.239514]  __arm64_sys_write+0x24/0x30
[ 2055.239524]  el0_svc_common.constprop.0+0x80/0x218
[ 2055.239532]  el0_svc_handler+0x34/0xa0
[ 2055.239541]  el0_svc+0x10/0x2cc
[ 2055.239555] INFO: task kworker/u8:3:35327 blocked for more than 120 seconds.
[ 2055.246754]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[ 2055.253681] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 2055.261670] kworker/u8:3D0 35327  2 0x0028
[ 2055.261715] Workqueue: writeback wb_workfn (flush-zfs-1)
[ 2055.261728] Call trace:
[ 2055.261740]  __switch_to+0x104/0x170
[ 2055.261750]  __schedule+0x30c/0x7c0
[ 2055.261758]  schedule+0x3c/0xb8
[ 2055.261798]  cv_wait_common+0x188/0x1b0 [spl]
[ 2055.261834]  __cv_wait+0x30/0x40 [spl]
[ 2055.262059]  zil_commit_impl+0x234/0xd30 [zfs]
[ 2055.262284]  zil_commit+0x48/0x70 [zfs]
[

[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-11 Thread Tyson Key
With deduplication disabled, it eventually gets further with the backup,
but still stalls, after leaving it, overnight:

[12084.274242] INFO: task z_wr_iss_h:2157 blocked for more than 120 seconds.
[12084.281171]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[12084.288126] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[12084.296127] z_wr_iss_h  D0  2157  2 0x0028
[12084.296142] Call trace:
[12084.296163]  __switch_to+0x104/0x170
[12084.296174]  __schedule+0x30c/0x7c0
[12084.296183]  schedule+0x3c/0xb8
[12084.296191]  io_schedule+0x20/0x58
[12084.296204]  rq_qos_wait+0x100/0x178
[12084.296213]  wbt_wait+0xb4/0xf0
[12084.296223]  __rq_qos_throttle+0x38/0x50
[12084.296235]  blk_mq_make_request+0x128/0x610
[12084.296247]  generic_make_request+0xb4/0x2d8
[12084.296258]  submit_bio+0x48/0x218
[12084.296496]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[12084.296716]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[12084.296935]  zio_nowait+0xd4/0x170 [zfs]
[12084.297151]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[12084.297369]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[12084.297586]  zio_execute+0xac/0x110 [zfs]
[12084.297626]  taskq_thread+0x2f8/0x570 [spl]
[12084.297639]  kthread+0xfc/0x128
[12084.297650]  ret_from_fork+0x10/0x1c
[12084.297678] INFO: task txg_quiesce:3077 blocked for more than 120 seconds.
[12084.304712]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[12084.311641] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[12084.319632] txg_quiesce D0  3077  2 0x0028
[12084.319646] Call trace:
[12084.319668]  __switch_to+0x104/0x170
[12084.319679]  __schedule+0x30c/0x7c0
[12084.319691]  schedule+0x3c/0xb8
[12084.319731]  cv_wait_common+0x188/0x1b0 [spl]
[12084.319767]  __cv_wait+0x30/0x40 [spl]
[12084.319990]  txg_quiesce_thread+0x27c/0x380 [zfs]
[12084.320028]  thread_generic_wrapper+0x74/0xa0 [spl]
[12084.320050]  kthread+0xfc/0x128
[12084.320062]  ret_from_fork+0x10/0x1c
[12084.320133] INFO: task borg:13631 blocked for more than 120 seconds.
[12084.326632]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[12084.333562] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[12084.341550] borgD0 13631  13630 0x
[12084.341564] Call trace:
[12084.341578]  __switch_to+0x104/0x170
[12084.341588]  __schedule+0x30c/0x7c0
[12084.341596]  schedule+0x3c/0xb8
[12084.341635]  cv_wait_common+0x188/0x1b0 [spl]
[12084.341670]  __cv_wait+0x30/0x40 [spl]
[12084.341895]  zil_commit_impl+0x234/0xd30 [zfs]
[12084.342112]  zil_commit+0x48/0x70 [zfs]
[12084.342331]  zfs_write+0xa3c/0xb90 [zfs]
[12084.342588]  zpl_write_common_iovec+0xac/0x120 [zfs]
[12084.342817]  zpl_iter_write+0xe4/0x150 [zfs]
[12084.342830]  new_sync_write+0x100/0x1a8
[12084.342839]  __vfs_write+0x74/0x90
[12084.342855]  vfs_write+0xe4/0x1c8
[12084.342865]  ksys_write+0x78/0x100
[12084.342874]  __arm64_sys_write+0x24/0x30
[12084.342884]  el0_svc_common.constprop.0+0x80/0x218
[12084.342892]  el0_svc_handler+0x34/0xa0
[12084.342902]  el0_svc+0x10/0x2cc
[12084.342927] INFO: task z_wr_iss_h:114156 blocked for more than 120 seconds.
[12084.350038]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[12084.356966] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[12084.364950] z_wr_iss_h  D0 114156  2 0x0028
[12084.364964] Call trace:
[12084.364981]  __switch_to+0x104/0x170
[12084.364990]  __schedule+0x30c/0x7c0
[12084.364998]  schedule+0x3c/0xb8
[12084.365006]  io_schedule+0x20/0x58
[12084.365019]  rq_qos_wait+0x100/0x178
[12084.365027]  wbt_wait+0xb4/0xf0
[12084.365037]  __rq_qos_throttle+0x38/0x50
[12084.365050]  blk_mq_make_request+0x128/0x610
[12084.365061]  generic_make_request+0xb4/0x2d8
[12084.365072]  submit_bio+0x48/0x218
[12084.365297]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[12084.365516]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[12084.365733]  zio_nowait+0xd4/0x170 [zfs]
[12084.365949]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[12084.366167]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[12084.366427]  zio_execute+0xac/0x110 [zfs]
[12084.366476]  taskq_thread+0x2f8/0x570 [spl]
[12084.366491]  kthread+0xfc/0x128
[12084.366502]  ret_from_fork+0x10/0x1c
[12084.366514] INFO: task z_wr_iss_h:114161 blocked for more than 120 seconds.
[12084.373624]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[12084.380553] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[12084.388541] z_wr_iss_h  D0 114161  2 0x0028
[12084.388555] Call trace:
[12084.388570]  __switch_to+0x104/0x170
[12084.388580]  __schedule+0x30c/0x7c0
[12084.388588]  schedule+0x3c/0xb8
[12084.388596]  io_schedule+0x20/0x58
[12084.388608]  rq_qos_wait+0x100/0x178
[12084.388616]  wbt_wait+0xb4/0xf0
[12084.388627]  __rq_qos_throttle+0x38/0x50
[12084.388638]  blk_mq_make_request+0x128/0x610
[12084.388650]  generic_make_request+0xb4/0x2d8
[12084.388660]  submit_bio+

[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-11 Thread Tyson Key
In the meantime, I'll see if I can temporarily disable ZPool-level
deduplication, and retry the backup run, again, on the Windows machine.

Whilst it's not the perfect long-term solution, I might look into using
offline deduplication, for older, infrequently-accessed data in the pool
(probably with Borg/ZBackup), since this machine is mostly-used for bulk
storage of archival/back-up data, and file/HTTP serving, in a small
network.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

Status in zfs-linux package in Ubuntu:
  Incomplete

Bug description:
  Using a QNAP 4-drive USB enclosure, with a set of SSDs, on a Raspberry
  Pi 8GB. ZFS deduplication, and LZJB compression is enabled.

  This issue seems to occur, intermittently, after some time (happens
  with both SMB access, via Samba, and when interacting with the system,
  via SSH), and never previously occurred, until a few months ago, and I
  sometimes have to force a reboot of the system (at the cost of some
  data loss), in order to use it again.

  The "dmesg" log reports:

  [25375.911590] z_wr_iss_h  D0  2161  2 0x0028
  [25375.911606] Call trace:
  [25375.911627]  __switch_to+0x104/0x170
  [25375.911639]  __schedule+0x30c/0x7c0
  [25375.911647]  schedule+0x3c/0xb8
  [25375.911655]  io_schedule+0x20/0x58
  [25375.911668]  rq_qos_wait+0x100/0x178
  [25375.911677]  wbt_wait+0xb4/0xf0
  [25375.911687]  __rq_qos_throttle+0x38/0x50
  [25375.911700]  blk_mq_make_request+0x128/0x610
  [25375.911712]  generic_make_request+0xb4/0x2d8
  [25375.911722]  submit_bio+0x48/0x218
  [25375.911960]  vdev_disk_io_start+0x670/0x9f8 [zfs]
  [25375.912181]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
  [25375.912400]  zio_nowait+0xd4/0x170 [zfs]
  [25375.912617]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
  [25375.912839]  zio_vdev_io_start+0x248/0x2b8 [zfs]
  [25375.913057]  zio_execute+0xac/0x110 [zfs]
  [25375.913096]  taskq_thread+0x2f8/0x570 [spl]
  [25375.913108]  kthread+0xfc/0x128
  [25375.913119]  ret_from_fork+0x10/0x1c
  [25375.913149] INFO: task txg_sync:2333 blocked for more than 120 seconds.
  [25375.919916]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.926848] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.934835] txg_syncD0  2333  2 0x0028
  [25375.934850] Call trace:
  [25375.934869]  __switch_to+0x104/0x170
  [25375.934879]  __schedule+0x30c/0x7c0
  [25375.934887]  schedule+0x3c/0xb8
  [25375.934899]  schedule_timeout+0x9c/0x190
  [25375.934908]  io_schedule_timeout+0x28/0x48
  [25375.934946]  __cv_timedwait_common+0x1a8/0x1f8 [spl]
  [25375.934982]  __cv_timedwait_io+0x3c/0x50 [spl]
  [25375.935205]  zio_wait+0x130/0x2a0 [zfs]
  [25375.935423]  dsl_pool_sync+0x3fc/0x498 [zfs]
  [25375.935650]  spa_sync+0x538/0xe68 [zfs]
  [25375.935867]  txg_sync_thread+0x2c0/0x468 [zfs]
  [25375.935911]  thread_generic_wrapper+0x74/0xa0 [spl]
  [25375.935924]  kthread+0xfc/0x128
  [25375.935935]  ret_from_fork+0x10/0x1c
  [25375.936017] INFO: task zbackup:75339 blocked for more than 120 seconds.
  [25375.942780]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.949710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.957702] zbackup D0 75339   5499 0x
  [25375.957716] Call trace:
  [25375.957732]  __switch_to+0x104/0x170
  [25375.957742]  __schedule+0x30c/0x7c0
  [25375.957750]  schedule+0x3c/0xb8
  [25375.957789]  cv_wait_common+0x188/0x1b0 [spl]
  [25375.957823]  __cv_wait+0x30/0x40 [spl]
  [25375.958045]  zil_commit_impl+0x234/0xd30 [zfs]
  [25375.958263]  zil_commit+0x48/0x70 [zfs]
  [25375.958481]  zfs_create+0x544/0x7d0 [zfs]
  [25375.958698]  zpl_create+0xb8/0x178 [zfs]
  [25375.958711]  lookup_open+0x4ec/0x6a8
  [25375.958721]  do_last+0x260/0x8c0
  [25375.958730]  path_openat+0x84/0x258
  [25375.958739]  do_filp_open+0x84/0x108
  [25375.958752]  do_sys_open+0x180/0x2b0
  [25375.958763]  __arm64_sys_openat+0x2c/0x38
  [25375.958773]  el0_svc_common.constprop.0+0x80/0x218
  [25375.958781]  el0_svc_handler+0x34/0xa0
  [25375.958791]  el0_svc+0x10/0x2cc
  [25375.958801] INFO: task zbackup:95187 blocked for more than 120 seconds.
  [25375.965564]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.972492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.980479] zbackup D0 95187   5499 0x
  [25375.980493] Call trace:
  [25375.980514]  __switch_to+0x104/0x170
  [25375.980525]  __schedule+0x30c/0x7c0
  [25375.980536]  schedule+0x3c/0xb8
  [25375.980578]  cv_wait_common+0x188/0x1b0 [spl]
  [25375.980612]  __cv_wait+0x30/0x40 [spl]
  [25375.980834]  zil_commit_impl+0x234/0xd30 [zfs]
  [25375.981052]  zil_commit+0x48/0x70 [zfs]
  [25375.981280]  zfs_write+0xa3c/0xb90 [zfs]
  [2

[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-11 Thread Tyson Key
It looks like writing snapshots, with "zfs snapshot" will sometimes
stall, even if other commands, like "zfs status" appear to work,
occasionally, too.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

Status in zfs-linux package in Ubuntu:
  Incomplete

Bug description:
  Using a QNAP 4-drive USB enclosure, with a set of SSDs, on a Raspberry
  Pi 8GB. ZFS deduplication, and LZJB compression is enabled.

  This issue seems to occur, intermittently, after some time (happens
  with both SMB access, via Samba, and when interacting with the system,
  via SSH), and never previously occurred, until a few months ago, and I
  sometimes have to force a reboot of the system (at the cost of some
  data loss), in order to use it again.

  The "dmesg" log reports:

  [25375.911590] z_wr_iss_h  D0  2161  2 0x0028
  [25375.911606] Call trace:
  [25375.911627]  __switch_to+0x104/0x170
  [25375.911639]  __schedule+0x30c/0x7c0
  [25375.911647]  schedule+0x3c/0xb8
  [25375.911655]  io_schedule+0x20/0x58
  [25375.911668]  rq_qos_wait+0x100/0x178
  [25375.911677]  wbt_wait+0xb4/0xf0
  [25375.911687]  __rq_qos_throttle+0x38/0x50
  [25375.911700]  blk_mq_make_request+0x128/0x610
  [25375.911712]  generic_make_request+0xb4/0x2d8
  [25375.911722]  submit_bio+0x48/0x218
  [25375.911960]  vdev_disk_io_start+0x670/0x9f8 [zfs]
  [25375.912181]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
  [25375.912400]  zio_nowait+0xd4/0x170 [zfs]
  [25375.912617]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
  [25375.912839]  zio_vdev_io_start+0x248/0x2b8 [zfs]
  [25375.913057]  zio_execute+0xac/0x110 [zfs]
  [25375.913096]  taskq_thread+0x2f8/0x570 [spl]
  [25375.913108]  kthread+0xfc/0x128
  [25375.913119]  ret_from_fork+0x10/0x1c
  [25375.913149] INFO: task txg_sync:2333 blocked for more than 120 seconds.
  [25375.919916]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.926848] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.934835] txg_syncD0  2333  2 0x0028
  [25375.934850] Call trace:
  [25375.934869]  __switch_to+0x104/0x170
  [25375.934879]  __schedule+0x30c/0x7c0
  [25375.934887]  schedule+0x3c/0xb8
  [25375.934899]  schedule_timeout+0x9c/0x190
  [25375.934908]  io_schedule_timeout+0x28/0x48
  [25375.934946]  __cv_timedwait_common+0x1a8/0x1f8 [spl]
  [25375.934982]  __cv_timedwait_io+0x3c/0x50 [spl]
  [25375.935205]  zio_wait+0x130/0x2a0 [zfs]
  [25375.935423]  dsl_pool_sync+0x3fc/0x498 [zfs]
  [25375.935650]  spa_sync+0x538/0xe68 [zfs]
  [25375.935867]  txg_sync_thread+0x2c0/0x468 [zfs]
  [25375.935911]  thread_generic_wrapper+0x74/0xa0 [spl]
  [25375.935924]  kthread+0xfc/0x128
  [25375.935935]  ret_from_fork+0x10/0x1c
  [25375.936017] INFO: task zbackup:75339 blocked for more than 120 seconds.
  [25375.942780]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.949710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.957702] zbackup D0 75339   5499 0x
  [25375.957716] Call trace:
  [25375.957732]  __switch_to+0x104/0x170
  [25375.957742]  __schedule+0x30c/0x7c0
  [25375.957750]  schedule+0x3c/0xb8
  [25375.957789]  cv_wait_common+0x188/0x1b0 [spl]
  [25375.957823]  __cv_wait+0x30/0x40 [spl]
  [25375.958045]  zil_commit_impl+0x234/0xd30 [zfs]
  [25375.958263]  zil_commit+0x48/0x70 [zfs]
  [25375.958481]  zfs_create+0x544/0x7d0 [zfs]
  [25375.958698]  zpl_create+0xb8/0x178 [zfs]
  [25375.958711]  lookup_open+0x4ec/0x6a8
  [25375.958721]  do_last+0x260/0x8c0
  [25375.958730]  path_openat+0x84/0x258
  [25375.958739]  do_filp_open+0x84/0x108
  [25375.958752]  do_sys_open+0x180/0x2b0
  [25375.958763]  __arm64_sys_openat+0x2c/0x38
  [25375.958773]  el0_svc_common.constprop.0+0x80/0x218
  [25375.958781]  el0_svc_handler+0x34/0xa0
  [25375.958791]  el0_svc+0x10/0x2cc
  [25375.958801] INFO: task zbackup:95187 blocked for more than 120 seconds.
  [25375.965564]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.972492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.980479] zbackup D0 95187   5499 0x
  [25375.980493] Call trace:
  [25375.980514]  __switch_to+0x104/0x170
  [25375.980525]  __schedule+0x30c/0x7c0
  [25375.980536]  schedule+0x3c/0xb8
  [25375.980578]  cv_wait_common+0x188/0x1b0 [spl]
  [25375.980612]  __cv_wait+0x30/0x40 [spl]
  [25375.980834]  zil_commit_impl+0x234/0xd30 [zfs]
  [25375.981052]  zil_commit+0x48/0x70 [zfs]
  [25375.981280]  zfs_write+0xa3c/0xb90 [zfs]
  [25375.981498]  zpl_write_common_iovec+0xac/0x120 [zfs]
  [25375.981726]  zpl_iter_write+0xe4/0x150 [zfs]
  [25375.981766]  new_sync_write+0x100/0x1a8
  [25375.981776]  __vfs_write+0x74/0x90
  [25375.981784]  vfs_write+0xe4/0x1c8
  [25375.981794]  ksys_write+0x78/0x100
  [25375.981803

[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-11 Thread Tyson Key
Thanks for your reply (I expected this to go into the ether, since it
seems to be a very common issue, with I/O, on this hardware platform).

I did try switching from LZJB, to LZ4, and got slightly-better
performance, and reliability, from at least testing with backing up a
Windows 10 machine, using WSL, and Borg, over SSH - however, I can still
lock the system up, after several runs, or get it to a stage where I can
send commands, over an open SSH session, but spawning a new one fails,
since it can't access my home directory (which is stored on the ZFS
array).

My last log, with Borg, is:

[   22.536219] bcmgenet: Skipping UMAC reset
[   22.537803] bcmgenet fd58.ethernet: configuring instance for external 
RGMII
[   22.538029] bcmgenet fd58.ethernet eth0: Link is Down
[   25.140217] new mount options do not match the existing superblock, will be 
ignored
[   26.751285] brcmfmac: brcmf_cfg80211_set_power_mgmt: power save enabled
[   27.651756] bcmgenet fd58.ethernet eth0: Link is Up - 1Gbps/Full - flow 
control rx/tx
[   27.651790] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   29.696209] Started bpfilter
[   29.701240] bpfilter: Loaded bpfilter_umh pid 3166
[   29.858264] bridge: filtering via arp/ip/ip6tables is no longer available by 
default. Update your scripts to load br_netfilter if you need this.
[   30.379027] kauditd_printk_skb: 27 callbacks suppressed
[   30.379032] audit: type=1400 audit(1602433969.316:39): apparmor="STATUS" 
operation="profile_replace" info="same as current profile, skipping" 
profile="unconfined" name="/usr/bin/lxc-start" pid=3259 comm="apparmor_parser"
[   30.457589] audit: type=1400 audit(1602433969.396:40): apparmor="STATUS" 
operation="profile_replace" info="same as current profile, skipping" 
profile="unconfined" name="lxc-container-default" pid=3266 
comm="apparmor_parser"
[   30.457603] audit: type=1400 audit(1602433969.396:41): apparmor="STATUS" 
operation="profile_replace" info="same as current profile, skipping" 
profile="unconfined" name="lxc-container-default-cgns" pid=3266 
comm="apparmor_parser"
[   30.457620] audit: type=1400 audit(1602433969.396:42): apparmor="STATUS" 
operation="profile_replace" info="same as current profile, skipping" 
profile="unconfined" name="lxc-container-default-with-mounting" pid=3266 
comm="apparmor_parser"
[   30.457629] audit: type=1400 audit(1602433969.396:43): apparmor="STATUS" 
operation="profile_replace" info="same as current profile, skipping" 
profile="unconfined" name="lxc-container-default-with-nesting" pid=3266 
comm="apparmor_parser"
[   36.556387] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[   36.556393] Bluetooth: BNEP filters: protocol multicast
[   36.556407] Bluetooth: BNEP socket layer initialized
[   74.720044] Adding 16777212k swap on /dev/zd0.  Priority:-2 extents:1 
across:16777212k SSFS
[10030.116676] INFO: task z_rd_int:2154 blocked for more than 120 seconds.
[10030.123432]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[10030.130389] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[10030.138388] z_rd_intD0  2154  2 0x0028
[10030.138403] Call trace:
[10030.138424]  __switch_to+0x104/0x170
[10030.138435]  __schedule+0x30c/0x7c0
[10030.138444]  schedule+0x3c/0xb8
[10030.138453]  io_schedule+0x20/0x58
[10030.138466]  rq_qos_wait+0x100/0x178
[10030.138475]  wbt_wait+0xb4/0xf0
[10030.138485]  __rq_qos_throttle+0x38/0x50
[10030.138497]  blk_mq_make_request+0x128/0x610
[10030.138509]  generic_make_request+0xb4/0x2d8
[10030.138519]  submit_bio+0x48/0x218
[10030.138756]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[10030.138977]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[10030.139197]  zio_execute+0xac/0x110 [zfs]
[10030.139414]  vdev_queue_io_done+0x18c/0x2a0 [zfs]
[10030.139632]  zio_vdev_io_done+0xec/0x220 [zfs]
[10030.139849]  zio_execute+0xac/0x110 [zfs]
[10030.139891]  taskq_thread+0x2f8/0x570 [spl]
[10030.139904]  kthread+0xfc/0x128
[10030.139915]  ret_from_fork+0x10/0x1c
[10030.139928] INFO: task z_rd_int:2158 blocked for more than 120 seconds.
[10030.146692]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[10030.153619] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[10030.161608] z_rd_intD0  2158  2 0x0028
[10030.161622] Call trace:
[10030.161643]  __switch_to+0x104/0x170
[10030.161655]  __schedule+0x30c/0x7c0
[10030.161663]  schedule+0x3c/0xb8
[10030.161673]  io_schedule+0x20/0x58
[10030.161686]  rq_qos_wait+0x100/0x178
[10030.161694]  wbt_wait+0xb4/0xf0
[10030.161705]  __rq_qos_throttle+0x38/0x50
[10030.161717]  blk_mq_make_request+0x128/0x610
[10030.161729]  generic_make_request+0xb4/0x2d8
[10030.161739]  submit_bio+0x48/0x218
[10030.161965]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[10030.162186]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[10030.162416]  zio_nowait+0xd4/0x170 [zfs]
[10030.162633]  vdev_queue_io_done+0x1ec/0x2a0 [zfs]
[10030.162859]  zio_vdev_io_done+0xec/0x220 [zfs

[Kernel-packages] [Bug 1899249] [NEW] OpenZFS writing stalls, under load

2020-10-10 Thread Tyson Key
Public bug reported:

Using a QNAP 4-drive USB enclosure, with a set of SSDs, on a Raspberry
Pi 8GB. ZFS deduplication, and LZJB compression is enabled.

This issue seems to occur, intermittently, after some time (happens with
both SMB access, via Samba, and when interacting with the system, via
SSH), and never previously occurred, until a few months ago, and I
sometimes have to force a reboot of the system (at the cost of some data
loss), in order to use it again.

The "dmesg" log reports:

[25375.911590] z_wr_iss_h  D0  2161  2 0x0028
[25375.911606] Call trace:
[25375.911627]  __switch_to+0x104/0x170
[25375.911639]  __schedule+0x30c/0x7c0
[25375.911647]  schedule+0x3c/0xb8
[25375.911655]  io_schedule+0x20/0x58
[25375.911668]  rq_qos_wait+0x100/0x178
[25375.911677]  wbt_wait+0xb4/0xf0
[25375.911687]  __rq_qos_throttle+0x38/0x50
[25375.911700]  blk_mq_make_request+0x128/0x610
[25375.911712]  generic_make_request+0xb4/0x2d8
[25375.911722]  submit_bio+0x48/0x218
[25375.911960]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[25375.912181]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[25375.912400]  zio_nowait+0xd4/0x170 [zfs]
[25375.912617]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[25375.912839]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[25375.913057]  zio_execute+0xac/0x110 [zfs]
[25375.913096]  taskq_thread+0x2f8/0x570 [spl]
[25375.913108]  kthread+0xfc/0x128
[25375.913119]  ret_from_fork+0x10/0x1c
[25375.913149] INFO: task txg_sync:2333 blocked for more than 120 seconds.
[25375.919916]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[25375.926848] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[25375.934835] txg_syncD0  2333  2 0x0028
[25375.934850] Call trace:
[25375.934869]  __switch_to+0x104/0x170
[25375.934879]  __schedule+0x30c/0x7c0
[25375.934887]  schedule+0x3c/0xb8
[25375.934899]  schedule_timeout+0x9c/0x190
[25375.934908]  io_schedule_timeout+0x28/0x48
[25375.934946]  __cv_timedwait_common+0x1a8/0x1f8 [spl]
[25375.934982]  __cv_timedwait_io+0x3c/0x50 [spl]
[25375.935205]  zio_wait+0x130/0x2a0 [zfs]
[25375.935423]  dsl_pool_sync+0x3fc/0x498 [zfs]
[25375.935650]  spa_sync+0x538/0xe68 [zfs]
[25375.935867]  txg_sync_thread+0x2c0/0x468 [zfs]
[25375.935911]  thread_generic_wrapper+0x74/0xa0 [spl]
[25375.935924]  kthread+0xfc/0x128
[25375.935935]  ret_from_fork+0x10/0x1c
[25375.936017] INFO: task zbackup:75339 blocked for more than 120 seconds.
[25375.942780]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[25375.949710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[25375.957702] zbackup D0 75339   5499 0x
[25375.957716] Call trace:
[25375.957732]  __switch_to+0x104/0x170
[25375.957742]  __schedule+0x30c/0x7c0
[25375.957750]  schedule+0x3c/0xb8
[25375.957789]  cv_wait_common+0x188/0x1b0 [spl]
[25375.957823]  __cv_wait+0x30/0x40 [spl]
[25375.958045]  zil_commit_impl+0x234/0xd30 [zfs]
[25375.958263]  zil_commit+0x48/0x70 [zfs]
[25375.958481]  zfs_create+0x544/0x7d0 [zfs]
[25375.958698]  zpl_create+0xb8/0x178 [zfs]
[25375.958711]  lookup_open+0x4ec/0x6a8
[25375.958721]  do_last+0x260/0x8c0
[25375.958730]  path_openat+0x84/0x258
[25375.958739]  do_filp_open+0x84/0x108
[25375.958752]  do_sys_open+0x180/0x2b0
[25375.958763]  __arm64_sys_openat+0x2c/0x38
[25375.958773]  el0_svc_common.constprop.0+0x80/0x218
[25375.958781]  el0_svc_handler+0x34/0xa0
[25375.958791]  el0_svc+0x10/0x2cc
[25375.958801] INFO: task zbackup:95187 blocked for more than 120 seconds.
[25375.965564]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[25375.972492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[25375.980479] zbackup D0 95187   5499 0x
[25375.980493] Call trace:
[25375.980514]  __switch_to+0x104/0x170
[25375.980525]  __schedule+0x30c/0x7c0
[25375.980536]  schedule+0x3c/0xb8
[25375.980578]  cv_wait_common+0x188/0x1b0 [spl]
[25375.980612]  __cv_wait+0x30/0x40 [spl]
[25375.980834]  zil_commit_impl+0x234/0xd30 [zfs]
[25375.981052]  zil_commit+0x48/0x70 [zfs]
[25375.981280]  zfs_write+0xa3c/0xb90 [zfs]
[25375.981498]  zpl_write_common_iovec+0xac/0x120 [zfs]
[25375.981726]  zpl_iter_write+0xe4/0x150 [zfs]
[25375.981766]  new_sync_write+0x100/0x1a8
[25375.981776]  __vfs_write+0x74/0x90
[25375.981784]  vfs_write+0xe4/0x1c8
[25375.981794]  ksys_write+0x78/0x100
[25375.981803]  __arm64_sys_write+0x24/0x30
[25375.981813]  el0_svc_common.constprop.0+0x80/0x218
[25375.981821]  el0_svc_handler+0x34/0xa0
[25375.981830]  el0_svc+0x10/0x2cc
[25375.981842] INFO: task zbackup:95188 blocked for more than 120 seconds.
[25375.988615]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[25375.995543] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[25376.003532] zbackup D0 95188   5499 0x0008
[25376.003545] Call trace:
[25376.003561]  __switch_to+0x104/0x170
[25376.003571]  __schedule+0x30c/0x7c0
[25376.003579]  schedule+