[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load
Thanks, Colin. I intend to retest with 20.10, but, for now, with deduplication disabled, things seem stable.

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

Status in Native ZFS for Linux:
  Unknown
Status in zfs-linux package in Ubuntu:
  Incomplete

Bug description:
  Using a QNAP 4-drive USB enclosure, with a set of SSDs, on a Raspberry Pi 8GB. ZFS deduplication and LZJB compression are enabled.

  This issue occurs intermittently after some time (it happens both with SMB access via Samba and when interacting with the system via SSH). It never occurred until a few months ago, and I sometimes have to force a reboot of the system (at the cost of some data loss) in order to use it again.

  The "dmesg" log reports:

  [25375.911590] z_wr_iss_h D0 2161 2 0x0028
  [25375.911606] Call trace:
  [25375.911627] __switch_to+0x104/0x170
  [25375.911639] __schedule+0x30c/0x7c0
  [25375.911647] schedule+0x3c/0xb8
  [25375.911655] io_schedule+0x20/0x58
  [25375.911668] rq_qos_wait+0x100/0x178
  [25375.911677] wbt_wait+0xb4/0xf0
  [25375.911687] __rq_qos_throttle+0x38/0x50
  [25375.911700] blk_mq_make_request+0x128/0x610
  [25375.911712] generic_make_request+0xb4/0x2d8
  [25375.911722] submit_bio+0x48/0x218
  [25375.911960] vdev_disk_io_start+0x670/0x9f8 [zfs]
  [25375.912181] zio_vdev_io_start+0xdc/0x2b8 [zfs]
  [25375.912400] zio_nowait+0xd4/0x170 [zfs]
  [25375.912617] vdev_mirror_io_start+0xa8/0x1b0 [zfs]
  [25375.912839] zio_vdev_io_start+0x248/0x2b8 [zfs]
  [25375.913057] zio_execute+0xac/0x110 [zfs]
  [25375.913096] taskq_thread+0x2f8/0x570 [spl]
  [25375.913108] kthread+0xfc/0x128
  [25375.913119] ret_from_fork+0x10/0x1c
  [25375.913149] INFO: task txg_sync:2333 blocked for more than 120 seconds.
  [25375.919916] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.926848] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [25375.934835] txg_sync D0 2333 2 0x0028
  [25375.934850] Call trace:
  [25375.934869] __switch_to+0x104/0x170
  [25375.934879] __schedule+0x30c/0x7c0
  [25375.934887] schedule+0x3c/0xb8
  [25375.934899] schedule_timeout+0x9c/0x190
  [25375.934908] io_schedule_timeout+0x28/0x48
  [25375.934946] __cv_timedwait_common+0x1a8/0x1f8 [spl]
  [25375.934982] __cv_timedwait_io+0x3c/0x50 [spl]
  [25375.935205] zio_wait+0x130/0x2a0 [zfs]
  [25375.935423] dsl_pool_sync+0x3fc/0x498 [zfs]
  [25375.935650] spa_sync+0x538/0xe68 [zfs]
  [25375.935867] txg_sync_thread+0x2c0/0x468 [zfs]
  [25375.935911] thread_generic_wrapper+0x74/0xa0 [spl]
  [25375.935924] kthread+0xfc/0x128
  [25375.935935] ret_from_fork+0x10/0x1c
  [25375.936017] INFO: task zbackup:75339 blocked for more than 120 seconds.
  [25375.942780] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.949710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [25375.957702] zbackup D0 75339 5499 0x
  [25375.957716] Call trace:
  [25375.957732] __switch_to+0x104/0x170
  [25375.957742] __schedule+0x30c/0x7c0
  [25375.957750] schedule+0x3c/0xb8
  [25375.957789] cv_wait_common+0x188/0x1b0 [spl]
  [25375.957823] __cv_wait+0x30/0x40 [spl]
  [25375.958045] zil_commit_impl+0x234/0xd30 [zfs]
  [25375.958263] zil_commit+0x48/0x70 [zfs]
  [25375.958481] zfs_create+0x544/0x7d0 [zfs]
  [25375.958698] zpl_create+0xb8/0x178 [zfs]
  [25375.958711] lookup_open+0x4ec/0x6a8
  [25375.958721] do_last+0x260/0x8c0
  [25375.958730] path_openat+0x84/0x258
  [25375.958739] do_filp_open+0x84/0x108
  [25375.958752] do_sys_open+0x180/0x2b0
  [25375.958763] __arm64_sys_openat+0x2c/0x38
  [25375.958773] el0_svc_common.constprop.0+0x80/0x218
  [25375.958781] el0_svc_handler+0x34/0xa0
  [25375.958791] el0_svc+0x10/0x2cc
  [25375.958801] INFO: task zbackup:95187 blocked for more than 120 seconds.
  [25375.965564] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.972492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [25375.980479] zbackup D0 95187 5499 0x
  [25375.980493] Call trace:
  [25375.980514] __switch_to+0x104/0x170
  [25375.980525] __schedule+0x30c/0x7c0
  [25375.980536] schedule+0x3c/0xb8
  [25375.980578] cv_wait_common+0x188/0x1b0 [spl]
  [25375.980612] __cv_wait+0x30/0x40 [spl]
  [25375.980834] zil_commit_impl+0x234/0xd30 [zfs]
  [25375.981052] zil_commit+0x48/0x70 [zfs]
  [25375.981280] zfs_write+0xa3c/0xb90 [zfs]
  [25375.981498] zpl_write_common_iovec+0xac/0x120 [zfs]
  [25375.981726] zpl_iter_write+0xe4/0x150 [zfs]
  [25375.981766] new_sync_write+0x100/0x1a8
  [25375.981776] __vfs_write+0x74/0x90
  [25375.981784] vfs_write+0xe4/0x1c8
  [25375.981794] ksys_write+0x78/0x100
  [25375.981803]
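Since disabling deduplication is what made things stable here, this is a hedged sketch of how dedup is typically turned off and how to gauge how large the dedup table still is; "Yaesu" is the pool name that appears later in this thread, and the exact dataset layout is an assumption:

```shell
# Disable deduplication for new writes. Note this only affects data
# written from now on -- blocks that were already deduplicated stay in
# the dedup table (DDT) until they are rewritten or freed.
sudo zfs set dedup=off Yaesu

# Confirm the setting, and inspect the remaining DDT; a large DDT on an
# 8 GB machine is a plausible contributor to memory pressure and stalls.
sudo zfs get dedup Yaesu
sudo zpool status -D Yaesu
```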
And, after 18+ hours, I was able to archive a ZFS snapshot using BorgBackup, with the WBT settings suggested earlier:

tyson@ubuntu:~$ sudo zfs send Yaesu@Crucial-2TB-1951E22FA633 | borg create --stats BorgStore::Yaesu@Crucial-2TB-1951E22FA633 -
------------------------------------------------------------------------------
Archive name: Yaesu@Crucial-2TB-1951E22FA633
Archive fingerprint: 32d64e2a13eb543e4846cc34ac2c3609fe192127e3bf6d324a0fa9d61b988c19
Time (start): Tue, 2020-10-13 15:49:51
Time (end):   Wed, 2020-10-14 10:01:46
Duration: 18 hours 11 minutes 54.87 seconds
Number of files: 1
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                1.89 TB              1.75 TB            617.36 GB
All archives:                7.44 TB              5.87 TB              1.99 TB

                       Unique chunks         Total chunks
Chunk index:                  842436              2607257
------------------------------------------------------------------------------

The "dmesg" log is clean, too:

tyson@ubuntu:~$ dmesg
[0.00] Booting Linux on physical CPU 0x00 [0x410fd083]
[0.00] Linux version 5.4.0-1018-raspi (buildd@bos02-arm64-052) (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #20-Ubuntu SMP Sun Sep 6 05:11:16 UTC 2020 (Ubuntu 5.4.0-1018.20-raspi 5.4.55)
[0.00] Machine model: Raspberry Pi 4 Model B Rev 1.4
[0.00] efi: Getting EFI parameters from FDT:
[0.00] efi: UEFI not found.
[0.00] Reserved memory: created CMA memory pool at 0x2c00, size 64 MiB
[0.00] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[0.00] On node 0 totalpages: 2061312
[0.00] DMA zone: 3792 pages used for memmap
[0.00] DMA zone: 0 pages reserved
[0.00] DMA zone: 242688 pages, LIFO batch:63
[0.00] DMA32 zone: 12288 pages used for memmap
[0.00] DMA32 zone: 770048 pages, LIFO batch:63
[0.00] Normal zone: 16384 pages used for memmap
[0.00] Normal zone: 1048576 pages, LIFO batch:63
[0.00] percpu: Embedded 32 pages/cpu s92120 r8192 d30760 u131072
[0.00] pcpu-alloc: s92120 r8192 d30760 u131072 alloc=32*4096
[0.00] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3
[0.00] Detected PIPT I-cache on CPU0
[0.00] CPU features: detected: EL2 vector hardening
[0.00] CPU features: kernel page table isolation forced ON by KASLR
[0.00] CPU features: detected: Kernel page table isolation (KPTI)
[0.00] ARM_SMCCC_ARCH_WORKAROUND_1 missing from firmware
[0.00] Built 1 zonelists, mobility grouping on. Total pages: 2028848
[0.00] Kernel command line: coherent_pool=1M 8250.nr_uarts=1 snd_bcm2835.enable_compat_alsa=0 snd_bcm2835.enable_hdmi=1 snd_bcm2835.enable_headphones=1 bcm2708_fb.fbwidth=0 bcm2708_fb.fbheight=0 bcm2708_fb.fbswap=1 smsc95xx.macaddr=DC:A6:32:B8:1A:28 vc_mem.mem_base=0x3ec0 vc_mem.mem_size=0x4000 net.ifnames=0 dwc_otg.lpm_enable=0 console=ttyS0,115200 console=tty1 root=LABEL=writable rootfstype=ext4 elevator=deadline rootwait fixrtc quiet splash
[0.00] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
[0.00] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
[0.00] mem auto-init: stack:off, heap alloc:off, heap free:off
[0.00] software IO TLB: mapped [mem 0x3740-0x3b40] (64MB)
[0.00] Memory: 7898140K/8245248K available (11772K kernel code, 1236K rwdata, 4244K rodata, 6144K init, 1076K bss, 281572K reserved, 65536K cma-reserved)
[0.00] random: get_random_u64 called from kmem_cache_open+0x38/0x3e0 with crng_init=0
[0.00] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[0.00] ftrace: allocating 39001 entries in 153 pages
[0.00] rcu: Hierarchical RCU implementation.
[0.00] Tasks RCU enabled.
[0.00] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[0.00] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[0.00] GIC: Using split EOI/Deactivate mode
[0.00] arch_timer: cp15 timer(s) running at 54.00MHz (phys).
[0.00] clocksource: arch_sys_counter: mask: 0xff max_cycles: 0xc743ce346, max_idle_ns: 440795203123 ns
[0.05] sched_clock: 56 bits at 54MHz, resolution 18ns, wraps every 4398046511102ns
[0.000200] Console: colour dummy device 80x25
[0.000221] printk: console [tty1] enabled
[0.000273] Calibrating delay loop (skipped), value calculated using timer frequency.. 108.00 BogoMIPS (lpj=216000)
[0.000288] pid_max: default: 32768 minimum: 301
[0.000495] LSM: Security Framework initializing
[0.000543] Yama: becoming mindful.
[0.000631] AppArmor: AppArmor in
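For anyone reproducing this workload: the pipeline above streams a raw "zfs send" replication stream straight into a Borg archive. A rough sketch of the round trip, using the repository and snapshot names from the comment; the restore target dataset ("Yaesu/restored") is hypothetical:

```shell
# Archive: stream the snapshot's send stream into a Borg archive (as above).
sudo zfs send Yaesu@Crucial-2TB-1951E22FA633 \
  | borg create --stats BorgStore::Yaesu@Crucial-2TB-1951E22FA633 -

# Restore: extract the archived stream to stdout and replay it with
# "zfs receive" into a new (hypothetical) dataset.
borg extract --stdout BorgStore::Yaesu@Crucial-2TB-1951E22FA633 \
  | sudo zfs receive Yaesu/restored
```

Because the archive is a single opaque stream, Borg's own deduplication (the "Deduplicated size" column above) is doing the space savings here, not ZFS dedup.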
This needs more time under long-running I/O, but changing the WBT setting at least makes listing snapshots with "zfs list -t all" a little faster.
I've set the WBT value to 0 for all of the devices, whilst another archive run takes place. The last one ran for just over 20 hours before hitting the I/O problem.
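The WBT change being tested can be applied per block device through sysfs. A minimal sketch, assuming the four pool members show up as /dev/sda through /dev/sdd (the actual device names on this system aren't shown in the thread):

```shell
# Disable writeback throttling (WBT) on each pool member by setting the
# target latency to 0 microseconds. This is not persistent across
# reboots; writing -1 restores the kernel default.
for dev in sda sdb sdc sdd; do
    echo 0 | sudo tee /sys/block/"$dev"/queue/wbt_lat_usec
done

# Verify the current values on all sd* devices.
grep . /sys/block/sd?/queue/wbt_lat_usec
```

This matters here because the hung tasks in the traces are sitting in wbt_wait()/rq_qos_wait(), i.e. blocked inside the writeback-throttling path.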
All of the drives in the pool are SSDs (2x Crucial BX500 (2TB), 1x SanDisk Ultra 3D (4TB), 1x Samsung 870 QVO (4TB)), but they obviously don't align perfectly in terms of performance characteristics. I'll test the WBT tunable, to see whether it makes a difference. I also suspect that some power-management weirdness affecting the waking of USB devices is at play on this machine, but it's an Ubuntu Server installation, without any special GUI or Raspberry Pi-specific components installed.
Here's a full "dmesg" log, if it helps:

tyson@ubuntu:~$ dmesg
[0.00] Booting Linux on physical CPU 0x00 [0x410fd083]
[0.00] Linux version 5.4.0-1018-raspi (buildd@bos02-arm64-052) (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #20-Ubuntu SMP Sun Sep 6 05:11:16 UTC 2020 (Ubuntu 5.4.0-1018.20-raspi 5.4.55)
[0.00] Machine model: Raspberry Pi 4 Model B Rev 1.4
[0.00] efi: Getting EFI parameters from FDT:
[0.00] efi: UEFI not found.
[0.00] Reserved memory: created CMA memory pool at 0x2c00, size 64 MiB
[0.00] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[0.00] On node 0 totalpages: 2061312
[0.00] DMA zone: 3792 pages used for memmap
[0.00] DMA zone: 0 pages reserved
[0.00] DMA zone: 242688 pages, LIFO batch:63
[0.00] DMA32 zone: 12288 pages used for memmap
[0.00] DMA32 zone: 770048 pages, LIFO batch:63
[0.00] Normal zone: 16384 pages used for memmap
[0.00] Normal zone: 1048576 pages, LIFO batch:63
[0.00] percpu: Embedded 32 pages/cpu s92120 r8192 d30760 u131072
[0.00] pcpu-alloc: s92120 r8192 d30760 u131072 alloc=32*4096
[0.00] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3
[0.00] Detected PIPT I-cache on CPU0
[0.00] CPU features: detected: EL2 vector hardening
[0.00] CPU features: kernel page table isolation forced ON by KASLR
[0.00] CPU features: detected: Kernel page table isolation (KPTI)
[0.00] ARM_SMCCC_ARCH_WORKAROUND_1 missing from firmware
[0.00] Built 1 zonelists, mobility grouping on. Total pages: 2028848
[0.00] Kernel command line: coherent_pool=1M 8250.nr_uarts=1 snd_bcm2835.enable_compat_alsa=0 snd_bcm2835.enable_hdmi=1 snd_bcm2835.enable_headphones=1 bcm2708_fb.fbwidth=0 bcm2708_fb.fbheight=0 bcm2708_fb.fbswap=1 smsc95xx.macaddr=DC:A6:32:B8:1A:28 vc_mem.mem_base=0x3ec0 vc_mem.mem_size=0x4000 net.ifnames=0 dwc_otg.lpm_enable=0 console=ttyS0,115200 console=tty1 root=LABEL=writable rootfstype=ext4 elevator=deadline rootwait fixrtc quiet splash
[0.00] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
[0.00] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
[0.00] mem auto-init: stack:off, heap alloc:off, heap free:off
[0.00] software IO TLB: mapped [mem 0x3740-0x3b40] (64MB)
[0.00] Memory: 7898136K/8245248K available (11772K kernel code, 1236K rwdata, 4244K rodata, 6144K init, 1076K bss, 281576K reserved, 65536K cma-reserved)
[0.00] random: get_random_u64 called from kmem_cache_open+0x38/0x3e0 with crng_init=0
[0.00] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[0.00] ftrace: allocating 39001 entries in 153 pages
[0.00] rcu: Hierarchical RCU implementation.
[0.00] Tasks RCU enabled.
[0.00] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[0.00] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[0.00] GIC: Using split EOI/Deactivate mode
[0.00] arch_timer: cp15 timer(s) running at 54.00MHz (phys).
[0.00] clocksource: arch_sys_counter: mask: 0xff max_cycles: 0xc743ce346, max_idle_ns: 440795203123 ns
[0.05] sched_clock: 56 bits at 54MHz, resolution 18ns, wraps every 4398046511102ns
[0.000200] Console: colour dummy device 80x25
[0.000221] printk: console [tty1] enabled
[0.000273] Calibrating delay loop (skipped), value calculated using timer frequency.. 108.00 BogoMIPS (lpj=216000)
[0.000287] pid_max: default: 32768 minimum: 301
[0.000491] LSM: Security Framework initializing
[0.000540] Yama: becoming mindful.
[0.000629] AppArmor: AppArmor initialized
[0.000874] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
[0.000961] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
[0.001028] *** VALIDATE tmpfs ***
[0.001634] *** VALIDATE proc ***
[0.002155] Disabling memory control group subsystem
[0.002224] *** VALIDATE cgroup1 ***
[0.002234] *** VALIDATE cgroup2 ***
[0.003438] ASID allocator initialised with 32768 entries
[0.003601] rcu: Hierarchical SRCU implementation.
[0.005657] EFI services will not be available.
[0.006469] smp: Bringing up secondary CPUs ...
[0.007318] Detected PIPT I-cache on CPU1
[0.007371] CPU1: Booted secondary processor 0x01 [0x410fd083]
[0.008416] Detected PIPT I-cache on CPU2
[0.008454] CPU2: Booted secondary processor 0x02 [0x410fd083]
[0.009407] Detected PIPT I-cache on CPU3
[0.009444] CPU3: Booted secondary processor 0x03 [0x410fd083]
[0.009586] smp: Brought up 1 node, 4 CPUs
[0.009594] SMP: Total of 4 processors activated.
[0.009604] CPU features: detected: 32-bit EL0 Support
[0.009613] CPU fe
I'm using 16GB of swap, on a ZVOL in the pool, but I can also test with swap on the internal MicroSD card's ext4 root partition, if that helps.
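Swap on a zvol is a known trouble spot (swap-out can require ZFS to allocate memory, which can deadlock under memory pressure). If the zvol stays in use, the OpenZFS FAQ recommends specific properties for a swap zvol; a hedged sketch, where "Yaesu/swap" is a hypothetical zvol name and the pool name is taken from earlier in the thread:

```shell
# Create a 16G zvol tuned for swap, per the OpenZFS FAQ: volblocksize
# matched to the page size, synchronous writes forced, and the ARC kept
# from caching swap pages.
sudo zfs create -V 16G -b "$(getconf PAGESIZE)" \
    -o logbias=throughput -o sync=always \
    -o primarycache=metadata -o secondarycache=none \
    -o com.sun:auto-snapshot=false Yaesu/swap

sudo mkswap /dev/zvol/Yaesu/swap
sudo swapon /dev/zvol/Yaesu/swap
```

Moving swap to the MicroSD card's ext4 partition, as suggested, sidesteps the issue entirely and would help isolate whether the stalls are swap-related.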
[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load
Looks like I could archive about 1.3 TB of a 1.64 TB snapshot before things started to go bad again:

[110079.681102] INFO: task z_wr_iss_h:2171 blocked for more than 120 seconds.
[110079.688123] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[110079.695167] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[110079.703260] z_wr_iss_h D 0 2171 2 0x0028
[110079.703276] Call trace:
[110079.703297] __switch_to+0x104/0x170
[110079.703308] __schedule+0x30c/0x7c0
[110079.703317] schedule+0x3c/0xb8
[110079.703325] io_schedule+0x20/0x58
[110079.703338] rq_qos_wait+0x100/0x178
[110079.703347] wbt_wait+0xb4/0xf0
[110079.703357] __rq_qos_throttle+0x38/0x50
[110079.703370] blk_mq_make_request+0x128/0x610
[110079.703382] generic_make_request+0xb4/0x2d8
[110079.703392] submit_bio+0x48/0x218
[110079.703629] vdev_disk_io_start+0x670/0x9f8 [zfs]
[110079.703850] zio_vdev_io_start+0xdc/0x2b8 [zfs]
[110079.704068] zio_nowait+0xd4/0x170 [zfs]
[110079.704285] vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[110079.704503] zio_vdev_io_start+0x248/0x2b8 [zfs]
[110079.704720] zio_execute+0xac/0x110 [zfs]
[110079.704760] taskq_thread+0x2f8/0x570 [spl]
[110079.704773] kthread+0xfc/0x128
[110079.704784] ret_from_fork+0x10/0x1c
[110079.704795] INFO: task z_wr_int:2173 blocked for more than 120 seconds.
[110079.711650] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[110079.718668] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[110079.726745] z_wr_int D 0 2173 2 0x0028
[110079.726760] Call trace:
[110079.726780] __switch_to+0x104/0x170
[110079.726790] __schedule+0x30c/0x7c0
[110079.726803] schedule+0x3c/0xb8
[110079.726811] io_schedule+0x20/0x58
[110079.726823] rq_qos_wait+0x100/0x178
[110079.726832] wbt_wait+0xb4/0xf0
[110079.726842] __rq_qos_throttle+0x38/0x50
[110079.726854] blk_mq_make_request+0x128/0x610
[110079.726866] generic_make_request+0xb4/0x2d8
[110079.726877] submit_bio+0x48/0x218
[110079.727100] vdev_disk_io_start+0x670/0x9f8 [zfs]
[110079.727320] zio_vdev_io_start+0xdc/0x2b8 [zfs]
[110079.727549] zio_nowait+0xd4/0x170 [zfs]
[110079.727766] vdev_queue_io_done+0x1ec/0x2a0 [zfs]
[110079.727991] zio_vdev_io_done+0xec/0x220 [zfs]
[110079.728209] zio_execute+0xac/0x110 [zfs]
[110079.728255] taskq_thread+0x2f8/0x570 [spl]
[110079.728268] kthread+0xfc/0x128
[110079.728279] ret_from_fork+0x10/0x1c
[110079.728309] INFO: task txg_sync:2388 blocked for more than 120 seconds.
[110079.735157] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[110079.742175] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[110079.750306] txg_sync D 0 2388 2 0x0028
[110079.750321] Call trace:
[110079.750337] __switch_to+0x104/0x170
[110079.750348] __schedule+0x30c/0x7c0
[110079.750356] schedule+0x3c/0xb8
[110079.750366] schedule_timeout+0x9c/0x190
[110079.750374] io_schedule_timeout+0x28/0x48
[110079.750412] __cv_timedwait_common+0x1a8/0x1f8 [spl]
[110079.750447] __cv_timedwait_io+0x3c/0x50 [spl]
[110079.750670] zio_wait+0x130/0x2a0 [zfs]
[110079.750890] dsl_pool_sync+0x3fc/0x498 [zfs]
[110079.751109] spa_sync+0x538/0xe68 [zfs]
[110079.751326] txg_sync_thread+0x2c0/0x468 [zfs]
[110079.751364] thread_generic_wrapper+0x74/0xa0 [spl]
[110079.751377] kthread+0xfc/0x128
[110079.751388] ret_from_fork+0x10/0x1c
[110079.751459] INFO: task borg:1063669 blocked for more than 120 seconds.
[110079.758224] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[110079.765237] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[110079.773310] borg D 0 1063669 1063661 0x
[110079.773324] Call trace:
[110079.773344] __switch_to+0x104/0x170
[110079.773355] __schedule+0x30c/0x7c0
[110079.773366] schedule+0x3c/0xb8
[110079.773406] cv_wait_common+0x188/0x1b0 [spl]
[110079.773441] __cv_wait+0x30/0x40 [spl]
[110079.773664] zil_commit_impl+0x234/0xd30 [zfs]
[110079.773882] zil_commit+0x48/0x70 [zfs]
[110079.774112] zfs_write+0xa3c/0xb90 [zfs]
[110079.774331] zpl_write_common_iovec+0xac/0x120 [zfs]
[110079.774557] zpl_iter_write+0xe4/0x150 [zfs]
[110079.774569] new_sync_write+0x100/0x1a8
[110079.774578] __vfs_write+0x74/0x90
[110079.774593] vfs_write+0xe4/0x1c8
[110079.774603] ksys_write+0x78/0x100
[110079.774612] __arm64_sys_write+0x24/0x30
[110079.774622] el0_svc_common.constprop.0+0x80/0x218
[110079.774631] el0_svc_handler+0x34/0xa0
[110079.774640] el0_svc+0x10/0x2cc
[110079.774665] INFO: task z_wr_iss_h:3853507 blocked for more than 120 seconds.
[110079.781953] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[110079.788973] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[110079.797052] z_wr_iss_h D 0 3853507 2 0x0028
[110079.797067] Call trace:
[110079.797083] __switch_to+0x104/0x170
[110079.797094] __schedule+0x30c/0x7c0
[110
[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load
Thanks. Earlier today, I was able to archive and delete 400 GB of data without problems, which reduced the "REFER" of my dataset a little. However, it looks like I probably need to focus on archiving and removing some of the older snapshots, if I want to trim down the memory utilisation of my pool. When I arrived home from work, "dmesg" didn't have anything interesting to say in terms of panics or errors, which seems promising, but it's still early days yet. It seems a little like a "luxury problem" to find myself with more storage space, and data, than available RAM, given that usually the opposite is true...
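[Editorial note: for context on why snapshot pruning helps here — with deduplication enabled, ZFS keeps a dedup-table (DDT) entry for every unique block that is still referenced, and snapshots keep old blocks referenced. A back-of-the-envelope sketch of the RAM involved; the ~320 bytes per in-core entry and the pool figures below are illustrative assumptions, not measurements from this system:]

```shell
# Rough DDT size estimate: entries ~= unique data / recordsize,
# at roughly 320 bytes of RAM per in-core entry (illustrative figure).
unique_gib=1800                      # assumed ~1.8 TiB of unique, still-referenced data
recordsize=$((128 * 1024))           # default 128 KiB records
entry_bytes=320                      # assumed per-entry in-core cost
blocks=$(( unique_gib * 1024 * 1024 * 1024 / recordsize ))
ddt_mib=$(( blocks * entry_bytes / 1024 / 1024 ))
echo "estimated DDT size: ${ddt_mib} MiB"
```

Under these assumptions the DDT alone approaches 4.5 GiB, i.e. more than half of the Raspberry Pi's 8 GB of RAM, which matches the "more data than RAM" observation above.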
[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load
Unfortunately, I haven't got a storage device large enough to contain all of the data from the pool, and much of it cannot be recreated, or restored from another source, so I won't be able to nuke the pool and rebuild it.
[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load
I also receive this, when trying to archive a large directory:

[ 2055.147509] INFO: task z_wr_iss_h:2169 blocked for more than 120 seconds.
[ 2055.154450] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[ 2055.161401] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.169403] z_wr_iss_h D 0 2169 2 0x0028
[ 2055.169420] Call trace:
[ 2055.169441] __switch_to+0x104/0x170
[ 2055.169453] __schedule+0x30c/0x7c0
[ 2055.169462] schedule+0x3c/0xb8
[ 2055.169470] io_schedule+0x20/0x58
[ 2055.169483] rq_qos_wait+0x100/0x178
[ 2055.169492] wbt_wait+0xb4/0xf0
[ 2055.169502] __rq_qos_throttle+0x38/0x50
[ 2055.169514] blk_mq_make_request+0x128/0x610
[ 2055.169526] generic_make_request+0xb4/0x2d8
[ 2055.169537] submit_bio+0x48/0x218
[ 2055.169779] vdev_disk_io_start+0x670/0x9f8 [zfs]
[ 2055.170005] zio_vdev_io_start+0xdc/0x2b8 [zfs]
[ 2055.170229] zio_nowait+0xd4/0x170 [zfs]
[ 2055.170451] vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[ 2055.170673] zio_vdev_io_start+0x248/0x2b8 [zfs]
[ 2055.170896] zio_execute+0xac/0x110 [zfs]
[ 2055.170937] taskq_thread+0x2f8/0x570 [spl]
[ 2055.170950] kthread+0xfc/0x128
[ 2055.170960] ret_from_fork+0x10/0x1c
[ 2055.170990] INFO: task txg_quiesce:2380 blocked for more than 120 seconds.
[ 2055.178025] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[ 2055.184958] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.192949] txg_quiesce D 0 2380 2 0x0028
[ 2055.192963] Call trace:
[ 2055.192984] __switch_to+0x104/0x170
[ 2055.192995] __schedule+0x30c/0x7c0
[ 2055.193003] schedule+0x3c/0xb8
[ 2055.193046] cv_wait_common+0x188/0x1b0 [spl]
[ 2055.193082] __cv_wait+0x30/0x40 [spl]
[ 2055.193310] txg_quiesce_thread+0x27c/0x380 [zfs]
[ 2055.193349] thread_generic_wrapper+0x74/0xa0 [spl]
[ 2055.193370] kthread+0xfc/0x128
[ 2055.193381] ret_from_fork+0x10/0x1c
[ 2055.193438] INFO: task smbd:3662 blocked for more than 120 seconds.
[ 2055.199853] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[ 2055.206784] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.214772] smbd D 0 3662 3498 0x
[ 2055.214786] Call trace:
[ 2055.214800] __switch_to+0x104/0x170
[ 2055.214810] __schedule+0x30c/0x7c0
[ 2055.214818] schedule+0x3c/0xb8
[ 2055.214856] cv_wait_common+0x188/0x1b0 [spl]
[ 2055.214892] __cv_wait+0x30/0x40 [spl]
[ 2055.215121] zil_commit_impl+0x234/0xd30 [zfs]
[ 2055.215344] zil_commit+0x48/0x70 [zfs]
[ 2055.215567] zfs_read+0x2f8/0x418 [zfs]
[ 2055.215831] zpl_read_common_iovec+0xa8/0x110 [zfs]
[ 2055.216065] zpl_read_common+0x4c/0x70 [zfs]
[ 2055.216288] zpl_xattr_get_dir+0x10c/0x128 [zfs]
[ 2055.216519] zpl_xattr_get+0xcc/0x1a8 [zfs]
[ 2055.216742] zpl_xattr_user_get+0x68/0x90 [zfs]
[ 2055.216764] __vfs_getxattr+0x60/0x80
[ 2055.216774] vfs_getxattr+0x140/0x160
[ 2055.216784] getxattr+0x9c/0x2c0
[ 2055.216794] path_getxattr+0x7c/0xd0
[ 2055.216805] __arm64_sys_getxattr+0x28/0x38
[ 2055.216815] el0_svc_common.constprop.0+0x80/0x218
[ 2055.216823] el0_svc_handler+0x34/0xa0
[ 2055.216833] el0_svc+0x10/0x2cc
[ 2055.216858] INFO: task borg:5639 blocked for more than 120 seconds.
[ 2055.223265] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[ 2055.230193] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.238179] borg D 0 5639 4001 0x
[ 2055.238193] Call trace:
[ 2055.238208] __switch_to+0x104/0x170
[ 2055.238218] __schedule+0x30c/0x7c0
[ 2055.238226] schedule+0x3c/0xb8
[ 2055.238267] cv_wait_common+0x188/0x1b0 [spl]
[ 2055.238303] __cv_wait+0x30/0x40 [spl]
[ 2055.238530] zil_commit_impl+0x234/0xd30 [zfs]
[ 2055.238752] zil_commit+0x48/0x70 [zfs]
[ 2055.238975] zfs_write+0xa3c/0xb90 [zfs]
[ 2055.239197] zpl_write_common_iovec+0xac/0x120 [zfs]
[ 2055.239419] zpl_iter_write+0xe4/0x150 [zfs]
[ 2055.239471] new_sync_write+0x100/0x1a8
[ 2055.239480] __vfs_write+0x74/0x90
[ 2055.239489] vfs_write+0xe4/0x1c8
[ 2055.239505] ksys_write+0x78/0x100
[ 2055.239514] __arm64_sys_write+0x24/0x30
[ 2055.239524] el0_svc_common.constprop.0+0x80/0x218
[ 2055.239532] el0_svc_handler+0x34/0xa0
[ 2055.239541] el0_svc+0x10/0x2cc
[ 2055.239555] INFO: task kworker/u8:3:35327 blocked for more than 120 seconds.
[ 2055.246754] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[ 2055.253681] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.261670] kworker/u8:3 D 0 35327 2 0x0028
[ 2055.261715] Workqueue: writeback wb_workfn (flush-zfs-1)
[ 2055.261728] Call trace:
[ 2055.261740] __switch_to+0x104/0x170
[ 2055.261750] __schedule+0x30c/0x7c0
[ 2055.261758] schedule+0x3c/0xb8
[ 2055.261798] cv_wait_common+0x188/0x1b0 [spl]
[ 2055.261834] __cv_wait+0x30/0x40 [spl]
[ 2055.262059] zil_commit_impl+0x234/0xd30 [zfs]
[ 2055.262284] zil_commit+0x48/0x70 [zfs]
[
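[Editorial note: every task in these traces is stuck in uninterruptible sleep, the "D" state the hung-task detector reports after 120 seconds. As a rough diagnostic sketch using only standard Linux tools (nothing ZFS-specific is assumed), the same stuck tasks can be spotted without waiting for the warning:]

```shell
# List tasks currently in uninterruptible sleep (state D) -- the same
# state flagged by the "blocked for more than 120 seconds" messages.
stuck=$(ps -eo pid,stat,comm --no-headers | awk '$2 ~ /^D/')
# Count them; on a healthy system this is usually zero or transient.
count=$(printf '%s\n' "$stuck" | grep -c . || true)
echo "tasks in D state: $count"
```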
[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load
With deduplication disabled, it eventually gets further with the backup, but still stalls after leaving it overnight:

[12084.274242] INFO: task z_wr_iss_h:2157 blocked for more than 120 seconds.
[12084.281171] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[12084.288126] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12084.296127] z_wr_iss_h D 0 2157 2 0x0028
[12084.296142] Call trace:
[12084.296163] __switch_to+0x104/0x170
[12084.296174] __schedule+0x30c/0x7c0
[12084.296183] schedule+0x3c/0xb8
[12084.296191] io_schedule+0x20/0x58
[12084.296204] rq_qos_wait+0x100/0x178
[12084.296213] wbt_wait+0xb4/0xf0
[12084.296223] __rq_qos_throttle+0x38/0x50
[12084.296235] blk_mq_make_request+0x128/0x610
[12084.296247] generic_make_request+0xb4/0x2d8
[12084.296258] submit_bio+0x48/0x218
[12084.296496] vdev_disk_io_start+0x670/0x9f8 [zfs]
[12084.296716] zio_vdev_io_start+0xdc/0x2b8 [zfs]
[12084.296935] zio_nowait+0xd4/0x170 [zfs]
[12084.297151] vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[12084.297369] zio_vdev_io_start+0x248/0x2b8 [zfs]
[12084.297586] zio_execute+0xac/0x110 [zfs]
[12084.297626] taskq_thread+0x2f8/0x570 [spl]
[12084.297639] kthread+0xfc/0x128
[12084.297650] ret_from_fork+0x10/0x1c
[12084.297678] INFO: task txg_quiesce:3077 blocked for more than 120 seconds.
[12084.304712] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[12084.311641] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12084.319632] txg_quiesce D 0 3077 2 0x0028
[12084.319646] Call trace:
[12084.319668] __switch_to+0x104/0x170
[12084.319679] __schedule+0x30c/0x7c0
[12084.319691] schedule+0x3c/0xb8
[12084.319731] cv_wait_common+0x188/0x1b0 [spl]
[12084.319767] __cv_wait+0x30/0x40 [spl]
[12084.319990] txg_quiesce_thread+0x27c/0x380 [zfs]
[12084.320028] thread_generic_wrapper+0x74/0xa0 [spl]
[12084.320050] kthread+0xfc/0x128
[12084.320062] ret_from_fork+0x10/0x1c
[12084.320133] INFO: task borg:13631 blocked for more than 120 seconds.
[12084.326632] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[12084.333562] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12084.341550] borg D 0 13631 13630 0x
[12084.341564] Call trace:
[12084.341578] __switch_to+0x104/0x170
[12084.341588] __schedule+0x30c/0x7c0
[12084.341596] schedule+0x3c/0xb8
[12084.341635] cv_wait_common+0x188/0x1b0 [spl]
[12084.341670] __cv_wait+0x30/0x40 [spl]
[12084.341895] zil_commit_impl+0x234/0xd30 [zfs]
[12084.342112] zil_commit+0x48/0x70 [zfs]
[12084.342331] zfs_write+0xa3c/0xb90 [zfs]
[12084.342588] zpl_write_common_iovec+0xac/0x120 [zfs]
[12084.342817] zpl_iter_write+0xe4/0x150 [zfs]
[12084.342830] new_sync_write+0x100/0x1a8
[12084.342839] __vfs_write+0x74/0x90
[12084.342855] vfs_write+0xe4/0x1c8
[12084.342865] ksys_write+0x78/0x100
[12084.342874] __arm64_sys_write+0x24/0x30
[12084.342884] el0_svc_common.constprop.0+0x80/0x218
[12084.342892] el0_svc_handler+0x34/0xa0
[12084.342902] el0_svc+0x10/0x2cc
[12084.342927] INFO: task z_wr_iss_h:114156 blocked for more than 120 seconds.
[12084.350038] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[12084.356966] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12084.364950] z_wr_iss_h D 0 114156 2 0x0028
[12084.364964] Call trace:
[12084.364981] __switch_to+0x104/0x170
[12084.364990] __schedule+0x30c/0x7c0
[12084.364998] schedule+0x3c/0xb8
[12084.365006] io_schedule+0x20/0x58
[12084.365019] rq_qos_wait+0x100/0x178
[12084.365027] wbt_wait+0xb4/0xf0
[12084.365037] __rq_qos_throttle+0x38/0x50
[12084.365050] blk_mq_make_request+0x128/0x610
[12084.365061] generic_make_request+0xb4/0x2d8
[12084.365072] submit_bio+0x48/0x218
[12084.365297] vdev_disk_io_start+0x670/0x9f8 [zfs]
[12084.365516] zio_vdev_io_start+0xdc/0x2b8 [zfs]
[12084.365733] zio_nowait+0xd4/0x170 [zfs]
[12084.365949] vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[12084.366167] zio_vdev_io_start+0x248/0x2b8 [zfs]
[12084.366427] zio_execute+0xac/0x110 [zfs]
[12084.366476] taskq_thread+0x2f8/0x570 [spl]
[12084.366491] kthread+0xfc/0x128
[12084.366502] ret_from_fork+0x10/0x1c
[12084.366514] INFO: task z_wr_iss_h:114161 blocked for more than 120 seconds.
[12084.373624] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[12084.380553] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12084.388541] z_wr_iss_h D 0 114161 2 0x0028
[12084.388555] Call trace:
[12084.388570] __switch_to+0x104/0x170
[12084.388580] __schedule+0x30c/0x7c0
[12084.388588] schedule+0x3c/0xb8
[12084.388596] io_schedule+0x20/0x58
[12084.388608] rq_qos_wait+0x100/0x178
[12084.388616] wbt_wait+0xb4/0xf0
[12084.388627] __rq_qos_throttle+0x38/0x50
[12084.388638] blk_mq_make_request+0x128/0x610
[12084.388650] generic_make_request+0xb4/0x2d8
[12084.388660] submit_bio+
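[Editorial note: even with dedup off, the z_wr_iss_h threads above are still blocked under wbt_wait, i.e. the kernel block layer's writeback throttling (WBT), not inside ZFS itself. A hedged diagnostic sketch, assuming a Linux system that exposes the standard blk-mq wbt_lat_usec sysfs knob; writing 0 to it is a known way to disable WBT per device:]

```shell
# Show the writeback-throttling target latency for every block device
# that exposes the knob (a value of 0 means WBT is disabled for that queue).
for f in /sys/block/*/queue/wbt_lat_usec; do
    [ -e "$f" ] || continue
    printf '%s: %s usec\n' "$f" "$(cat "$f")"
done
# To disable WBT on a suspect device (placeholder name "sda"):
#   echo 0 | sudo tee /sys/block/sda/queue/wbt_lat_usec
```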
[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load
In the meantime, I'll see if I can temporarily disable pool-level deduplication and retry the backup run from the Windows machine. While it's not the perfect long-term solution, I might look into offline deduplication for older, infrequently-accessed data in the pool (probably with Borg or ZBackup), since this machine is mostly used for bulk storage of archival/backup data, and for file/HTTP serving on a small network.
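[Editorial note: a minimal sketch of what "temporarily disable pool-level deduplication" looks like, using only documented zfs/zpool commands; the pool name "tank" is a placeholder. An important caveat: dedup=off only affects newly written blocks, so existing data stays deduplicated, and its dedup table stays in memory, until those blocks are rewritten or freed:]

```shell
# Placeholder pool name "tank".
zfs set dedup=off tank
zfs get dedup tank        # confirm the property change took effect
zpool status -D tank      # -D prints the DDT histogram that still remains
```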
[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load
It looks like writing snapshots with "zfs snapshot" will sometimes stall, even while other commands, like "zpool status", occasionally still appear to work.
[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load
Thanks for your reply (I expected this to go into the ether, since I/O problems seem to be very common on this hardware platform). I did try switching from LZJB to LZ4 and got slightly better performance and reliability, at least when testing backups of a Windows 10 machine using WSL and Borg over SSH. However, I can still lock the system up after several runs, or get it into a state where I can send commands over an open SSH session, but spawning a new one fails because it cannot access my home directory (which is stored on the ZFS array). My last log, with Borg, is:

[   22.536219] bcmgenet: Skipping UMAC reset
[   22.537803] bcmgenet fd58.ethernet: configuring instance for external RGMII
[   22.538029] bcmgenet fd58.ethernet eth0: Link is Down
[   25.140217] new mount options do not match the existing superblock, will be ignored
[   26.751285] brcmfmac: brcmf_cfg80211_set_power_mgmt: power save enabled
[   27.651756] bcmgenet fd58.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[   27.651790] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   29.696209] Started bpfilter
[   29.701240] bpfilter: Loaded bpfilter_umh pid 3166
[   29.858264] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[   30.379027] kauditd_printk_skb: 27 callbacks suppressed
[   30.379032] audit: type=1400 audit(1602433969.316:39): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="/usr/bin/lxc-start" pid=3259 comm="apparmor_parser"
[   30.457589] audit: type=1400 audit(1602433969.396:40): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default" pid=3266 comm="apparmor_parser"
[   30.457603] audit: type=1400 audit(1602433969.396:41): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default-cgns" pid=3266 comm="apparmor_parser"
[   30.457620] audit: type=1400 audit(1602433969.396:42): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default-with-mounting" pid=3266 comm="apparmor_parser"
[   30.457629] audit: type=1400 audit(1602433969.396:43): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default-with-nesting" pid=3266 comm="apparmor_parser"
[   36.556387] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[   36.556393] Bluetooth: BNEP filters: protocol multicast
[   36.556407] Bluetooth: BNEP socket layer initialized
[   74.720044] Adding 16777212k swap on /dev/zd0. Priority:-2 extents:1 across:16777212k SSFS
[10030.116676] INFO: task z_rd_int:2154 blocked for more than 120 seconds.
[10030.123432] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[10030.130389] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10030.138388] z_rd_int D 0 2154 2 0x0028
[10030.138403] Call trace:
[10030.138424] __switch_to+0x104/0x170
[10030.138435] __schedule+0x30c/0x7c0
[10030.138444] schedule+0x3c/0xb8
[10030.138453] io_schedule+0x20/0x58
[10030.138466] rq_qos_wait+0x100/0x178
[10030.138475] wbt_wait+0xb4/0xf0
[10030.138485] __rq_qos_throttle+0x38/0x50
[10030.138497] blk_mq_make_request+0x128/0x610
[10030.138509] generic_make_request+0xb4/0x2d8
[10030.138519] submit_bio+0x48/0x218
[10030.138756] vdev_disk_io_start+0x670/0x9f8 [zfs]
[10030.138977] zio_vdev_io_start+0xdc/0x2b8 [zfs]
[10030.139197] zio_execute+0xac/0x110 [zfs]
[10030.139414] vdev_queue_io_done+0x18c/0x2a0 [zfs]
[10030.139632] zio_vdev_io_done+0xec/0x220 [zfs]
[10030.139849] zio_execute+0xac/0x110 [zfs]
[10030.139891] taskq_thread+0x2f8/0x570 [spl]
[10030.139904] kthread+0xfc/0x128
[10030.139915] ret_from_fork+0x10/0x1c
[10030.139928] INFO: task z_rd_int:2158 blocked for more than 120 seconds.
[10030.146692] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[10030.153619] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10030.161608] z_rd_int D 0 2158 2 0x0028
[10030.161622] Call trace:
[10030.161643] __switch_to+0x104/0x170
[10030.161655] __schedule+0x30c/0x7c0
[10030.161663] schedule+0x3c/0xb8
[10030.161673] io_schedule+0x20/0x58
[10030.161686] rq_qos_wait+0x100/0x178
[10030.161694] wbt_wait+0xb4/0xf0
[10030.161705] __rq_qos_throttle+0x38/0x50
[10030.161717] blk_mq_make_request+0x128/0x610
[10030.161729] generic_make_request+0xb4/0x2d8
[10030.161739] submit_bio+0x48/0x218
[10030.161965] vdev_disk_io_start+0x670/0x9f8 [zfs]
[10030.162186] zio_vdev_io_start+0xdc/0x2b8 [zfs]
[10030.162416] zio_nowait+0xd4/0x170 [zfs]
[10030.162633] vdev_queue_io_done+0x1ec/0x2a0 [zfs]
[10030.162859] zio_vdev_io_done+0xec/0x220 [zfs
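Notably, the z_rd_int tasks above are parked in wbt_wait(), i.e. the block layer's writeback throttling (WBT), rather than inside ZFS itself. One experiment sometimes suggested for stalls of this shape is disabling WBT per device through sysfs; a sketch only, with `sda` as a placeholder for each member disk of the USB enclosure, and not verified as a fix for this bug:

```shell
# Show the current writeback-throttle latency target in microseconds;
# a value of 0 means WBT is disabled for this request queue.
cat /sys/block/sda/queue/wbt_lat_usec

# Disable writeback throttling for the device. This is not persistent
# across reboots; repeat for each disk backing the pool.
echo 0 | sudo tee /sys/block/sda/queue/wbt_lat_usec
```

This only removes one queueing layer from the picture; it does not address whatever makes the underlying USB I/O slow in the first place.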
[Kernel-packages] [Bug 1899249] [NEW] OpenZFS writing stalls, under load
Public bug reported: Using a QNAP 4-drive USB enclosure with a set of SSDs, on a Raspberry Pi 8GB. ZFS deduplication and LZJB compression are enabled. This issue seems to occur intermittently, after some time (it happens both with SMB access via Samba and when interacting with the system via SSH). It never occurred until a few months ago, and I sometimes have to force a reboot of the system (at the cost of some data loss) in order to use it again. The "dmesg" log reports:

[25375.911590] z_wr_iss_h D 0 2161 2 0x0028
[25375.911606] Call trace:
[25375.911627] __switch_to+0x104/0x170
[25375.911639] __schedule+0x30c/0x7c0
[25375.911647] schedule+0x3c/0xb8
[25375.911655] io_schedule+0x20/0x58
[25375.911668] rq_qos_wait+0x100/0x178
[25375.911677] wbt_wait+0xb4/0xf0
[25375.911687] __rq_qos_throttle+0x38/0x50
[25375.911700] blk_mq_make_request+0x128/0x610
[25375.911712] generic_make_request+0xb4/0x2d8
[25375.911722] submit_bio+0x48/0x218
[25375.911960] vdev_disk_io_start+0x670/0x9f8 [zfs]
[25375.912181] zio_vdev_io_start+0xdc/0x2b8 [zfs]
[25375.912400] zio_nowait+0xd4/0x170 [zfs]
[25375.912617] vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[25375.912839] zio_vdev_io_start+0x248/0x2b8 [zfs]
[25375.913057] zio_execute+0xac/0x110 [zfs]
[25375.913096] taskq_thread+0x2f8/0x570 [spl]
[25375.913108] kthread+0xfc/0x128
[25375.913119] ret_from_fork+0x10/0x1c
[25375.913149] INFO: task txg_sync:2333 blocked for more than 120 seconds.
[25375.919916] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[25375.926848] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[25375.934835] txg_sync D 0 2333 2 0x0028
[25375.934850] Call trace:
[25375.934869] __switch_to+0x104/0x170
[25375.934879] __schedule+0x30c/0x7c0
[25375.934887] schedule+0x3c/0xb8
[25375.934899] schedule_timeout+0x9c/0x190
[25375.934908] io_schedule_timeout+0x28/0x48
[25375.934946] __cv_timedwait_common+0x1a8/0x1f8 [spl]
[25375.934982] __cv_timedwait_io+0x3c/0x50 [spl]
[25375.935205] zio_wait+0x130/0x2a0 [zfs]
[25375.935423] dsl_pool_sync+0x3fc/0x498 [zfs]
[25375.935650] spa_sync+0x538/0xe68 [zfs]
[25375.935867] txg_sync_thread+0x2c0/0x468 [zfs]
[25375.935911] thread_generic_wrapper+0x74/0xa0 [spl]
[25375.935924] kthread+0xfc/0x128
[25375.935935] ret_from_fork+0x10/0x1c
[25375.936017] INFO: task zbackup:75339 blocked for more than 120 seconds.
[25375.942780] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[25375.949710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[25375.957702] zbackup D 0 75339 5499 0x
[25375.957716] Call trace:
[25375.957732] __switch_to+0x104/0x170
[25375.957742] __schedule+0x30c/0x7c0
[25375.957750] schedule+0x3c/0xb8
[25375.957789] cv_wait_common+0x188/0x1b0 [spl]
[25375.957823] __cv_wait+0x30/0x40 [spl]
[25375.958045] zil_commit_impl+0x234/0xd30 [zfs]
[25375.958263] zil_commit+0x48/0x70 [zfs]
[25375.958481] zfs_create+0x544/0x7d0 [zfs]
[25375.958698] zpl_create+0xb8/0x178 [zfs]
[25375.958711] lookup_open+0x4ec/0x6a8
[25375.958721] do_last+0x260/0x8c0
[25375.958730] path_openat+0x84/0x258
[25375.958739] do_filp_open+0x84/0x108
[25375.958752] do_sys_open+0x180/0x2b0
[25375.958763] __arm64_sys_openat+0x2c/0x38
[25375.958773] el0_svc_common.constprop.0+0x80/0x218
[25375.958781] el0_svc_handler+0x34/0xa0
[25375.958791] el0_svc+0x10/0x2cc
[25375.958801] INFO: task zbackup:95187 blocked for more than 120 seconds.
[25375.965564] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[25375.972492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[25375.980479] zbackup D 0 95187 5499 0x
[25375.980493] Call trace:
[25375.980514] __switch_to+0x104/0x170
[25375.980525] __schedule+0x30c/0x7c0
[25375.980536] schedule+0x3c/0xb8
[25375.980578] cv_wait_common+0x188/0x1b0 [spl]
[25375.980612] __cv_wait+0x30/0x40 [spl]
[25375.980834] zil_commit_impl+0x234/0xd30 [zfs]
[25375.981052] zil_commit+0x48/0x70 [zfs]
[25375.981280] zfs_write+0xa3c/0xb90 [zfs]
[25375.981498] zpl_write_common_iovec+0xac/0x120 [zfs]
[25375.981726] zpl_iter_write+0xe4/0x150 [zfs]
[25375.981766] new_sync_write+0x100/0x1a8
[25375.981776] __vfs_write+0x74/0x90
[25375.981784] vfs_write+0xe4/0x1c8
[25375.981794] ksys_write+0x78/0x100
[25375.981803] __arm64_sys_write+0x24/0x30
[25375.981813] el0_svc_common.constprop.0+0x80/0x218
[25375.981821] el0_svc_handler+0x34/0xa0
[25375.981830] el0_svc+0x10/0x2cc
[25375.981842] INFO: task zbackup:95188 blocked for more than 120 seconds.
[25375.988615] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[25375.995543] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[25376.003532] zbackup D 0 95188 5499 0x0008
[25376.003545] Call trace:
[25376.003561] __switch_to+0x104/0x170
[25376.003571] __schedule+0x30c/0x7c0
[25376.003579] schedule+
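The LZJB compression mentioned in the bug description is a per-dataset property, so switching algorithms (as tried with LZ4 in an earlier comment) is a one-line change. A minimal sketch, assuming a dataset named `tank/data` (substitute your own dataset name):

```shell
# Switch the compression algorithm. This affects only newly written
# blocks; existing data stays LZJB-compressed until it is rewritten.
zfs set compression=lz4 tank/data

# Check the active algorithm and the achieved compression ratio.
zfs get compression,compressratio tank/data
```

LZ4 is generally lighter on the CPU than LZJB, which matters on a Raspberry Pi, but it does not by itself address the wbt_wait()/txg_sync stalls shown in the traces.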