[Bug 1899249] Re: OpenZFS writing stalls, under load
** Changed in: zfs-linux (Ubuntu)
   Assignee: Colin Ian King (colin-king) => (unassigned)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1899249/+subscriptions
-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1899249] Re: OpenZFS writing stalls, under load
OK, I'll close this bug for now. If it still bites, please feel free to reopen it and I'll get back onto it.

** Changed in: zfs-linux (Ubuntu)
   Status: Incomplete => Fix Released
[Bug 1899249] Re: OpenZFS writing stalls, under load
Thanks, Colin. I intend to retest with 20.10, but for now, with deduplication disabled, things seem stable.
[Bug 1899249] Re: OpenZFS writing stalls, under load
@Tyson, hopefully the WBT changes helped to resolve this issue. If so, please let me know and I can close this bug.
[Bug 1899249] Re: OpenZFS writing stalls, under load
That warning message is from the Raspberry Pi firmware's broadcom get_property sysfs interface: some user-space program has read an old, deprecated sysfs node, and the kernel is just warning it to use the hwmon sysfs interface instead. It has no bearing on the ZFS or block WBT settings.
[Bug 1899249] Re: OpenZFS writing stalls, under load
Curiously, it worked fine, for quite some time, but I did manage to get this crash, with TAR, earlier: [0.00] Booting Linux on physical CPU 0x00 [0x410fd083] [0.00] Linux version 5.4.0-1018-raspi (buildd@bos02-arm64-052) (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #20-Ubuntu SMP Sun Sep 6 05:11:16 UTC 2020 (Ubuntu 5.4.0-1018.20-raspi 5.4.55) [0.00] Machine model: Raspberry Pi 4 Model B Rev 1.4 [0.00] efi: Getting EFI parameters from FDT: [0.00] efi: UEFI not found. [0.00] Reserved memory: created CMA memory pool at 0x2c00, size 64 MiB [0.00] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool [0.00] On node 0 totalpages: 2061312 [0.00] DMA zone: 3792 pages used for memmap [0.00] DMA zone: 0 pages reserved [0.00] DMA zone: 242688 pages, LIFO batch:63 [0.00] DMA32 zone: 12288 pages used for memmap [0.00] DMA32 zone: 770048 pages, LIFO batch:63 [0.00] Normal zone: 16384 pages used for memmap [0.00] Normal zone: 1048576 pages, LIFO batch:63 [0.00] percpu: Embedded 32 pages/cpu s92120 r8192 d30760 u131072 [0.00] pcpu-alloc: s92120 r8192 d30760 u131072 alloc=32*4096 [0.00] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0.00] Detected PIPT I-cache on CPU0 [0.00] CPU features: detected: EL2 vector hardening [0.00] CPU features: kernel page table isolation forced ON by KASLR [0.00] CPU features: detected: Kernel page table isolation (KPTI) [0.00] ARM_SMCCC_ARCH_WORKAROUND_1 missing from firmware [0.00] Built 1 zonelists, mobility grouping on. 
Total pages: 2028848 [0.00] Kernel command line: coherent_pool=1M 8250.nr_uarts=1 snd_bcm2835.enable_compat_alsa=0 snd_bcm2835.enable_hdmi=1 snd_bcm2835.enable_headphones=1 bcm2708_fb.fbwidth=0 bcm2708_fb.fbheight=0 bcm2708_fb.fbswap=1 smsc95xx.macaddr=DC:A6:32:B8:1A:28 vc_mem.mem_base=0x3ec0 vc_mem.mem_size=0x4000 net.ifnames=0 dwc_otg.lpm_enable=0 console=ttyS0,115200 console=tty1 root=LABEL=writable rootfstype=ext4 elevator=deadline rootwait fixrtc quiet splash [0.00] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear) [0.00] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear) [0.00] mem auto-init: stack:off, heap alloc:off, heap free:off [0.00] software IO TLB: mapped [mem 0x3740-0x3b40] (64MB) [0.00] Memory: 7898140K/8245248K available (11772K kernel code, 1236K rwdata, 4244K rodata, 6144K init, 1076K bss, 281572K reserved, 65536K cma-reserved) [0.00] random: get_random_u64 called from kmem_cache_open+0x38/0x3e0 with crng_init=0 [0.00] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 [0.00] ftrace: allocating 39001 entries in 153 pages [0.00] rcu: Hierarchical RCU implementation. [0.00] Tasks RCU enabled. [0.00] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies. [0.00] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0 [0.00] GIC: Using split EOI/Deactivate mode [0.00] arch_timer: cp15 timer(s) running at 54.00MHz (phys). [0.00] clocksource: arch_sys_counter: mask: 0xff max_cycles: 0xc743ce346, max_idle_ns: 440795203123 ns [0.05] sched_clock: 56 bits at 54MHz, resolution 18ns, wraps every 4398046511102ns [0.000200] Console: colour dummy device 80x25 [0.000221] printk: console [tty1] enabled [0.000273] Calibrating delay loop (skipped), value calculated using timer frequency.. 108.00 BogoMIPS (lpj=216000) [0.000288] pid_max: default: 32768 minimum: 301 [0.000495] LSM: Security Framework initializing [0.000543] Yama: becoming mindful. 
[0.000631] AppArmor: AppArmor initialized [0.000875] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, linear) [0.000961] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes, linear) [0.001026] *** VALIDATE tmpfs *** [0.001632] *** VALIDATE proc *** [0.002150] Disabling memory control group subsystem [0.002218] *** VALIDATE cgroup1 *** [0.002228] *** VALIDATE cgroup2 *** [0.003429] ASID allocator initialised with 32768 entries [0.003591] rcu: Hierarchical SRCU implementation. [0.005624] EFI services will not be available. [0.006427] smp: Bringing up secondary CPUs ... [0.007280] Detected PIPT I-cache on CPU1 [0.007335] CPU1: Booted secondary processor 0x01 [0x410fd083] [0.008377] Detected PIPT I-cache on CPU2 [0.008415] CPU2: Booted secondary processor 0x02 [0x410fd083] [0.009370] Detected PIPT I-cache on CPU3 [0.009406] CPU3: Booted secondary processor 0x03 [0x410fd083] [0.009546] smp: Brought up 1 node, 4 CPUs [0.009555] SMP: Total of 4 processors activated. [0.009564] CPU features:
[Bug 1899249] Re: OpenZFS writing stalls, under load
That's great news. Let's keep this bug open for the moment as "incomplete", and if you don't report back after ~6 weeks or so it will automatically be closed.
[Bug 1899249] Re: OpenZFS writing stalls, under load
And, after 18+ hours, I was able to archive a ZFS snapshot, using BorgBackup, with the WBT settings suggested earlier:

tyson@ubuntu:~$ sudo zfs send Yaesu@Crucial-2TB-1951E22FA633 | borg create --stats BorgStore::Yaesu@Crucial-2TB-1951E22FA633 -
------------------------------------------------------------------------------
Archive name: Yaesu@Crucial-2TB-1951E22FA633
Archive fingerprint: 32d64e2a13eb543e4846cc34ac2c3609fe192127e3bf6d324a0fa9d61b988c19
Time (start): Tue, 2020-10-13 15:49:51
Time (end):   Wed, 2020-10-14 10:01:46
Duration: 18 hours 11 minutes 54.87 seconds
Number of files: 1
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                  Original size    Compressed size    Deduplicated size
This archive:           1.89 TB            1.75 TB            617.36 GB
All archives:           7.44 TB            5.87 TB              1.99 TB
                  Unique chunks       Total chunks
Chunk index:             842436            2607257
------------------------------------------------------------------------------

The "dmesg" log is clean, too:

tyson@ubuntu:~$ dmesg
[0.00] Booting Linux on physical CPU 0x00 [0x410fd083]
[0.00] Linux version 5.4.0-1018-raspi (buildd@bos02-arm64-052) (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #20-Ubuntu SMP Sun Sep 6 05:11:16 UTC 2020 (Ubuntu 5.4.0-1018.20-raspi 5.4.55)
[0.00] Machine model: Raspberry Pi 4 Model B Rev 1.4
[0.00] efi: Getting EFI parameters from FDT:
[0.00] efi: UEFI not found.
[0.00] Reserved memory: created CMA memory pool at 0x2c00, size 64 MiB [0.00] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool [0.00] On node 0 totalpages: 2061312 [0.00] DMA zone: 3792 pages used for memmap [0.00] DMA zone: 0 pages reserved [0.00] DMA zone: 242688 pages, LIFO batch:63 [0.00] DMA32 zone: 12288 pages used for memmap [0.00] DMA32 zone: 770048 pages, LIFO batch:63 [0.00] Normal zone: 16384 pages used for memmap [0.00] Normal zone: 1048576 pages, LIFO batch:63 [0.00] percpu: Embedded 32 pages/cpu s92120 r8192 d30760 u131072 [0.00] pcpu-alloc: s92120 r8192 d30760 u131072 alloc=32*4096 [0.00] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0.00] Detected PIPT I-cache on CPU0 [0.00] CPU features: detected: EL2 vector hardening [0.00] CPU features: kernel page table isolation forced ON by KASLR [0.00] CPU features: detected: Kernel page table isolation (KPTI) [0.00] ARM_SMCCC_ARCH_WORKAROUND_1 missing from firmware [0.00] Built 1 zonelists, mobility grouping on. 
Total pages: 2028848 [0.00] Kernel command line: coherent_pool=1M 8250.nr_uarts=1 snd_bcm2835.enable_compat_alsa=0 snd_bcm2835.enable_hdmi=1 snd_bcm2835.enable_headphones=1 bcm2708_fb.fbwidth=0 bcm2708_fb.fbheight=0 bcm2708_fb.fbswap=1 smsc95xx.macaddr=DC:A6:32:B8:1A:28 vc_mem.mem_base=0x3ec0 vc_mem.mem_size=0x4000 net.ifnames=0 dwc_otg.lpm_enable=0 console=ttyS0,115200 console=tty1 root=LABEL=writable rootfstype=ext4 elevator=deadline rootwait fixrtc quiet splash [0.00] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear) [0.00] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear) [0.00] mem auto-init: stack:off, heap alloc:off, heap free:off [0.00] software IO TLB: mapped [mem 0x3740-0x3b40] (64MB) [0.00] Memory: 7898140K/8245248K available (11772K kernel code, 1236K rwdata, 4244K rodata, 6144K init, 1076K bss, 281572K reserved, 65536K cma-reserved) [0.00] random: get_random_u64 called from kmem_cache_open+0x38/0x3e0 with crng_init=0 [0.00] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 [0.00] ftrace: allocating 39001 entries in 153 pages [0.00] rcu: Hierarchical RCU implementation. [0.00] Tasks RCU enabled. [0.00] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies. [0.00] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0 [0.00] GIC: Using split EOI/Deactivate mode [0.00] arch_timer: cp15 timer(s) running at 54.00MHz (phys). [0.00] clocksource: arch_sys_counter: mask: 0xff max_cycles: 0xc743ce346, max_idle_ns: 440795203123 ns [0.05] sched_clock: 56 bits at 54MHz, resolution 18ns, wraps every 4398046511102ns [0.000200] Console: colour dummy device 80x25 [0.000221] printk: console [tty1] enabled [0.000273] Calibrating delay loop (skipped), value calculated using timer frequency.. 108.00 BogoMIPS (lpj=216000) [0.000288] pid_max: default: 32768 minimum: 301 [0.000495] LSM: Security Framework initializing [0.000543] Yama: becoming mindful. [0.000631] AppArmor: AppArmor
[Bug 1899249] Re: OpenZFS writing stalls, under load
It may also be a good idea to disable any power management on USB.
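On Linux that usually means forcing the devices out of USB runtime power management via the `power/control` sysfs attribute. A minimal sketch, not part of the original thread: the helper takes the sysfs root as a parameter purely so it can be dry-run against a test tree; on a real system call it with no argument, as root.

```shell
#!/bin/sh
# Force every USB device out of runtime power management by writing
# "on" (never autosuspend) to its power/control attribute.
disable_usb_pm() {
    sysfs="${1:-/sys}"    # overridable root, so the loop can be tested
    for ctl in "$sysfs"/bus/usb/devices/*/power/control; do
        [ -w "$ctl" ] || continue   # skip missing/read-only entries
        echo on > "$ctl"
        echo "autosuspend disabled: $ctl"
    done
}
disable_usb_pm   # operate on the real /sys (needs root)
```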
[Bug 1899249] Re: OpenZFS writing stalls, under load
It needs more time, in terms of long-running I/O, but changing the WBT setting at least makes retrieving snapshots with "zfs list -t all" a little faster.
[Bug 1899249] Re: OpenZFS writing stalls, under load
All of the drives in the pool are SSDs (2x Crucial BX500 (2TB), 1x SanDisk Ultra 3D (4TB), 1x Samsung 870 QVO (4TB)), but they obviously don't align perfectly in terms of performance characteristics - I'll test the WBT tuneable, to see if it makes a difference. I also suspect that some power-management weirdness, affecting waking up USB devices, is at play on this machine, but it's an Ubuntu Server installation, without any special GUI or Raspberry Pi-specific components installed.
[Bug 1899249] Re: OpenZFS writing stalls, under load
I've set the WBT value to 0 for all of the devices, whilst another archive run takes place. The last one ran for just over 20 hours before hitting the I/O problem.
[Bug 1899249] Re: OpenZFS writing stalls, under load
I'm using 16GB of swap on a ZVOL in the pool, but I can also test with swap on the internal MicroSD card's EXT4 root partition, if it helps.
[Bug 1899249] Re: OpenZFS writing stalls, under load
Here's a full "dmesg" log, if it helps: tyson@ubuntu:~$ dmesg [0.00] Booting Linux on physical CPU 0x00 [0x410fd083] [0.00] Linux version 5.4.0-1018-raspi (buildd@bos02-arm64-052) (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #20-Ubuntu SMP Sun Sep 6 05:11:16 UTC 2020 (Ubuntu 5.4.0-1018.20-raspi 5.4.55) [0.00] Machine model: Raspberry Pi 4 Model B Rev 1.4 [0.00] efi: Getting EFI parameters from FDT: [0.00] efi: UEFI not found. [0.00] Reserved memory: created CMA memory pool at 0x2c00, size 64 MiB [0.00] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool [0.00] On node 0 totalpages: 2061312 [0.00] DMA zone: 3792 pages used for memmap [0.00] DMA zone: 0 pages reserved [0.00] DMA zone: 242688 pages, LIFO batch:63 [0.00] DMA32 zone: 12288 pages used for memmap [0.00] DMA32 zone: 770048 pages, LIFO batch:63 [0.00] Normal zone: 16384 pages used for memmap [0.00] Normal zone: 1048576 pages, LIFO batch:63 [0.00] percpu: Embedded 32 pages/cpu s92120 r8192 d30760 u131072 [0.00] pcpu-alloc: s92120 r8192 d30760 u131072 alloc=32*4096 [0.00] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0.00] Detected PIPT I-cache on CPU0 [0.00] CPU features: detected: EL2 vector hardening [0.00] CPU features: kernel page table isolation forced ON by KASLR [0.00] CPU features: detected: Kernel page table isolation (KPTI) [0.00] ARM_SMCCC_ARCH_WORKAROUND_1 missing from firmware [0.00] Built 1 zonelists, mobility grouping on. 
Total pages: 2028848 [0.00] Kernel command line: coherent_pool=1M 8250.nr_uarts=1 snd_bcm2835.enable_compat_alsa=0 snd_bcm2835.enable_hdmi=1 snd_bcm2835.enable_headphones=1 bcm2708_fb.fbwidth=0 bcm2708_fb.fbheight=0 bcm2708_fb.fbswap=1 smsc95xx.macaddr=DC:A6:32:B8:1A:28 vc_mem.mem_base=0x3ec0 vc_mem.mem_size=0x4000 net.ifnames=0 dwc_otg.lpm_enable=0 console=ttyS0,115200 console=tty1 root=LABEL=writable rootfstype=ext4 elevator=deadline rootwait fixrtc quiet splash [0.00] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear) [0.00] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear) [0.00] mem auto-init: stack:off, heap alloc:off, heap free:off [0.00] software IO TLB: mapped [mem 0x3740-0x3b40] (64MB) [0.00] Memory: 7898136K/8245248K available (11772K kernel code, 1236K rwdata, 4244K rodata, 6144K init, 1076K bss, 281576K reserved, 65536K cma-reserved) [0.00] random: get_random_u64 called from kmem_cache_open+0x38/0x3e0 with crng_init=0 [0.00] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 [0.00] ftrace: allocating 39001 entries in 153 pages [0.00] rcu: Hierarchical RCU implementation. [0.00] Tasks RCU enabled. [0.00] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies. [0.00] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0 [0.00] GIC: Using split EOI/Deactivate mode [0.00] arch_timer: cp15 timer(s) running at 54.00MHz (phys). [0.00] clocksource: arch_sys_counter: mask: 0xff max_cycles: 0xc743ce346, max_idle_ns: 440795203123 ns [0.05] sched_clock: 56 bits at 54MHz, resolution 18ns, wraps every 4398046511102ns [0.000200] Console: colour dummy device 80x25 [0.000221] printk: console [tty1] enabled [0.000273] Calibrating delay loop (skipped), value calculated using timer frequency.. 108.00 BogoMIPS (lpj=216000) [0.000287] pid_max: default: 32768 minimum: 301 [0.000491] LSM: Security Framework initializing [0.000540] Yama: becoming mindful. 
[0.000629] AppArmor: AppArmor initialized [0.000874] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, linear) [0.000961] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes, linear) [0.001028] *** VALIDATE tmpfs *** [0.001634] *** VALIDATE proc *** [0.002155] Disabling memory control group subsystem [0.002224] *** VALIDATE cgroup1 *** [0.002234] *** VALIDATE cgroup2 *** [0.003438] ASID allocator initialised with 32768 entries [0.003601] rcu: Hierarchical SRCU implementation. [0.005657] EFI services will not be available. [0.006469] smp: Bringing up secondary CPUs ... [0.007318] Detected PIPT I-cache on CPU1 [0.007371] CPU1: Booted secondary processor 0x01 [0x410fd083] [0.008416] Detected PIPT I-cache on CPU2 [0.008454] CPU2: Booted secondary processor 0x02 [0x410fd083] [0.009407] Detected PIPT I-cache on CPU3 [0.009444] CPU3: Booted secondary processor 0x03 [0x410fd083] [0.009586] smp: Brought up 1 node, 4 CPUs [0.009594] SMP: Total of 4 processors activated. [0.009604] CPU features: detected: 32-bit EL0 Support [0.009613] CPU
[Bug 1899249] Re: OpenZFS writing stalls, under load
So it may be that the write-back throttling (wbt) for the underlying devices is getting confused about what the exact throttle rates for these devices should be, and is somehow getting stuck. It may be worth experimenting by disabling the throttling and seeing if this gets I/O working again. For example, to disable wbt for a device /dev/sda use:

echo 0 | sudo tee /sys/block/sda/queue/wbt_lat_usec

and if you need to reset it back to the default:

echo -1 | sudo tee /sys/block/sda/queue/wbt_lat_usec

Use the appropriate block device names for the block devices you have attached. It may even be worth setting wbt_lat_usec to 0 for all the block devices in your pool as early as possible after boot, and seeing if this helps. My hunch is that wbt is not set correctly for the attached non-spinny devices you have: it assumes they are reasonably fast, when in fact they may be rate-limited to be slower than expected because they are attached to a relatively slow host.
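The per-device commands above can be wrapped in a small helper so the whole pool is covered in one go. A sketch, not from the original thread: the device names sda-sdd are placeholders for the block devices backing your own pool, and the overridable SYSFS root exists only so the loop can be exercised against a test tree. Pass -1 instead of 0 to restore the kernel default.

```shell
#!/bin/sh
# Write a wbt_lat_usec value (0 = throttling disabled, -1 = default)
# to every named block device under the sysfs root.
set_wbt() {
    val="$1"; shift
    sysfs="${SYSFS:-/sys}"
    for dev in "$@"; do
        knob="$sysfs/block/$dev/queue/wbt_lat_usec"
        if [ -w "$knob" ]; then
            printf '%s\n' "$val" > "$knob"
            echo "$dev: wbt_lat_usec set to $val"
        else
            echo "skipping $dev: $knob not writable" >&2
        fi
    done
}
set_wbt 0 sda sdb sdc sdd   # disable write-back throttling pool-wide
```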
[Bug 1899249] Re: OpenZFS writing stalls, under load
...and does your raspi have swap enabled?

** Bug watch added: github.com/openzfs/zfs/issues #10522
   https://github.com/openzfs/zfs/issues/10522
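A quick way to answer that question, and to see whether any of the swap sits on a zvol (zvols show up as /dev/zdN block devices), is a check along these lines; this is an illustrative sketch, not a command from the thread:

```shell
#!/bin/sh
# Is swap enabled, and is any of it backed by a zvol?
cat /proc/swaps                                 # active swap areas
swapon --show=NAME,TYPE,SIZE 2>/dev/null || true  # same info, util-linux
grep -q '^/dev/zd' /proc/swaps && echo "swap appears to live on a zvol" || true
```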
[Bug 1899249] Re: OpenZFS writing stalls, under load
Same issue as reported here: https://github.com/openzfs/zfs/issues/10522

** Also affects: zfs via
   https://github.com/openzfs/zfs/issues/10522
   Importance: Unknown
   Status: Unknown
[Bug 1899249] Re: OpenZFS writing stalls, under load
Is it possible to have the full dmesg from the start of where you boot to the point where you see the "INFO... blocked for more than 120 seconds" message?
[Bug 1899249] Re: OpenZFS writing stalls, under load
Looks like I could archive about 1.3TB, of a 1.64TB snapshot, before things started to go bad, again:

[110079.681102] INFO: task z_wr_iss_h:2171 blocked for more than 120 seconds.
[110079.688123] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[110079.695167] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[110079.703260] z_wr_iss_h D0 2171 2 0x0028
[110079.703276] Call trace:
[110079.703297]  __switch_to+0x104/0x170
[110079.703308]  __schedule+0x30c/0x7c0
[110079.703317]  schedule+0x3c/0xb8
[110079.703325]  io_schedule+0x20/0x58
[110079.703338]  rq_qos_wait+0x100/0x178
[110079.703347]  wbt_wait+0xb4/0xf0
[110079.703357]  __rq_qos_throttle+0x38/0x50
[110079.703370]  blk_mq_make_request+0x128/0x610
[110079.703382]  generic_make_request+0xb4/0x2d8
[110079.703392]  submit_bio+0x48/0x218
[110079.703629]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[110079.703850]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[110079.704068]  zio_nowait+0xd4/0x170 [zfs]
[110079.704285]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[110079.704503]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[110079.704720]  zio_execute+0xac/0x110 [zfs]
[110079.704760]  taskq_thread+0x2f8/0x570 [spl]
[110079.704773]  kthread+0xfc/0x128
[110079.704784]  ret_from_fork+0x10/0x1c
[110079.704795] INFO: task z_wr_int:2173 blocked for more than 120 seconds.
[110079.711650] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[110079.718668] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[110079.726745] z_wr_int D0 2173 2 0x0028
[110079.726760] Call trace:
[110079.726780]  __switch_to+0x104/0x170
[110079.726790]  __schedule+0x30c/0x7c0
[110079.726803]  schedule+0x3c/0xb8
[110079.726811]  io_schedule+0x20/0x58
[110079.726823]  rq_qos_wait+0x100/0x178
[110079.726832]  wbt_wait+0xb4/0xf0
[110079.726842]  __rq_qos_throttle+0x38/0x50
[110079.726854]  blk_mq_make_request+0x128/0x610
[110079.726866]  generic_make_request+0xb4/0x2d8
[110079.726877]  submit_bio+0x48/0x218
[110079.727100]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[110079.727320]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[110079.727549]  zio_nowait+0xd4/0x170 [zfs]
[110079.727766]  vdev_queue_io_done+0x1ec/0x2a0 [zfs]
[110079.727991]  zio_vdev_io_done+0xec/0x220 [zfs]
[110079.728209]  zio_execute+0xac/0x110 [zfs]
[110079.728255]  taskq_thread+0x2f8/0x570 [spl]
[110079.728268]  kthread+0xfc/0x128
[110079.728279]  ret_from_fork+0x10/0x1c
[110079.728309] INFO: task txg_sync:2388 blocked for more than 120 seconds.
[110079.735157] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[110079.742175] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[110079.750306] txg_sync D0 2388 2 0x0028
[110079.750321] Call trace:
[110079.750337]  __switch_to+0x104/0x170
[110079.750348]  __schedule+0x30c/0x7c0
[110079.750356]  schedule+0x3c/0xb8
[110079.750366]  schedule_timeout+0x9c/0x190
[110079.750374]  io_schedule_timeout+0x28/0x48
[110079.750412]  __cv_timedwait_common+0x1a8/0x1f8 [spl]
[110079.750447]  __cv_timedwait_io+0x3c/0x50 [spl]
[110079.750670]  zio_wait+0x130/0x2a0 [zfs]
[110079.750890]  dsl_pool_sync+0x3fc/0x498 [zfs]
[110079.751109]  spa_sync+0x538/0xe68 [zfs]
[110079.751326]  txg_sync_thread+0x2c0/0x468 [zfs]
[110079.751364]  thread_generic_wrapper+0x74/0xa0 [spl]
[110079.751377]  kthread+0xfc/0x128
[110079.751388]  ret_from_fork+0x10/0x1c
[110079.751459] INFO: task borg:1063669 blocked for more than 120 seconds.
[110079.758224] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[110079.765237] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[110079.773310] borg D0 1063669 1063661 0x
[110079.773324] Call trace:
[110079.773344]  __switch_to+0x104/0x170
[110079.773355]  __schedule+0x30c/0x7c0
[110079.773366]  schedule+0x3c/0xb8
[110079.773406]  cv_wait_common+0x188/0x1b0 [spl]
[110079.773441]  __cv_wait+0x30/0x40 [spl]
[110079.773664]  zil_commit_impl+0x234/0xd30 [zfs]
[110079.773882]  zil_commit+0x48/0x70 [zfs]
[110079.774112]  zfs_write+0xa3c/0xb90 [zfs]
[110079.774331]  zpl_write_common_iovec+0xac/0x120 [zfs]
[110079.774557]  zpl_iter_write+0xe4/0x150 [zfs]
[110079.774569]  new_sync_write+0x100/0x1a8
[110079.774578]  __vfs_write+0x74/0x90
[110079.774593]  vfs_write+0xe4/0x1c8
[110079.774603]  ksys_write+0x78/0x100
[110079.774612]  __arm64_sys_write+0x24/0x30
[110079.774622]  el0_svc_common.constprop.0+0x80/0x218
[110079.774631]  el0_svc_handler+0x34/0xa0
[110079.774640]  el0_svc+0x10/0x2cc
[110079.774665] INFO: task z_wr_iss_h:3853507 blocked for more than 120 seconds.
[110079.781953] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[110079.788973] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[110079.797052] z_wr_iss_h D0 3853507 2 0x0028
[110079.797067] Call trace:
[110079.797083]  __switch_to+0x104/0x170
[110079.797094]  __schedule+0x30c/0x7c0
[Bug 1899249] Re: OpenZFS writing stalls, under load
Thanks. I was able to archive and delete 400GB of data without problems earlier today, which reduced the "REFER" of my data set a little. However, it looks like I probably need to focus on archiving and removing some of the older snapshots, if I want to trim down the memory utilisation of my pool. When I arrived home from work, "dmesg" didn't have anything interesting to say, in terms of panics or errors - which seems promising, but it's still early days yet. It seems a little like a "luxury problem" that I find myself with more storage space, and data, than available RAM, given that usually the opposite situation is true...
[Bug 1899249] Re: OpenZFS writing stalls, under load
You could shrink the DDT by making a copy of each file in place (with dedup off) and deleting the old file. That only requires enough extra space for a single file at a time. This assumes no snapshots. If you need to preserve snapshots, another option would be to send|recv a dataset at a time; if you have enough free space for a copy of the largest dataset, this would work.
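The copy-in-place approach can be sketched as a small script. This is illustrative only: it assumes dedup has already been turned off on the dataset (e.g. "zfs set dedup=off Yaesu", using the pool name from this thread), that no snapshots still reference the files, and that filenames contain no newlines; the example call is left commented out.

```shell
#!/bin/sh
# Rewrite every regular file under a mountpoint so its blocks are
# written fresh (dedup-free) and the old DDT entries can be released.
# Needs spare space for only one file at a time.
rewrite_in_place() {
    root="$1"
    find "$root" -type f ! -name '*.rewrite' | while IFS= read -r f; do
        cp -p "$f" "$f.rewrite" &&   # the copy is written with dedup off
        mv "$f.rewrite" "$f"         # replace the original, freeing old blocks
    done
}
# e.g., after "zfs set dedup=off Yaesu":
# rewrite_in_place /Yaesu
```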
[Bug 1899249] Re: OpenZFS writing stalls, under load
Unfortunately, I haven't got a storage device large enough to contain all of the data from the pool, and much of it cannot be recreated, or restored from another source, so I won't be able to nuke the pool and rebuild it.
[Bug 1899249] Re: OpenZFS writing stalls, under load
Did you destroy and recreate the pool after disabling dedup? Otherwise you still have the same dedup table and haven’t really accomplished much.
[Bug 1899249] Re: OpenZFS writing stalls, under load
Also received this, when trying to archive a large directory:

[ 2055.147509] INFO: task z_wr_iss_h:2169 blocked for more than 120 seconds.
[ 2055.154450] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[ 2055.161401] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.169403] z_wr_iss_h D 0 2169 2 0x0028
[ 2055.169420] Call trace:
[ 2055.169441]  __switch_to+0x104/0x170
[ 2055.169453]  __schedule+0x30c/0x7c0
[ 2055.169462]  schedule+0x3c/0xb8
[ 2055.169470]  io_schedule+0x20/0x58
[ 2055.169483]  rq_qos_wait+0x100/0x178
[ 2055.169492]  wbt_wait+0xb4/0xf0
[ 2055.169502]  __rq_qos_throttle+0x38/0x50
[ 2055.169514]  blk_mq_make_request+0x128/0x610
[ 2055.169526]  generic_make_request+0xb4/0x2d8
[ 2055.169537]  submit_bio+0x48/0x218
[ 2055.169779]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[ 2055.170005]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[ 2055.170229]  zio_nowait+0xd4/0x170 [zfs]
[ 2055.170451]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[ 2055.170673]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[ 2055.170896]  zio_execute+0xac/0x110 [zfs]
[ 2055.170937]  taskq_thread+0x2f8/0x570 [spl]
[ 2055.170950]  kthread+0xfc/0x128
[ 2055.170960]  ret_from_fork+0x10/0x1c
[ 2055.170990] INFO: task txg_quiesce:2380 blocked for more than 120 seconds.
[ 2055.178025] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[ 2055.184958] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.192949] txg_quiesce D 0 2380 2 0x0028
[ 2055.192963] Call trace:
[ 2055.192984]  __switch_to+0x104/0x170
[ 2055.192995]  __schedule+0x30c/0x7c0
[ 2055.193003]  schedule+0x3c/0xb8
[ 2055.193046]  cv_wait_common+0x188/0x1b0 [spl]
[ 2055.193082]  __cv_wait+0x30/0x40 [spl]
[ 2055.193310]  txg_quiesce_thread+0x27c/0x380 [zfs]
[ 2055.193349]  thread_generic_wrapper+0x74/0xa0 [spl]
[ 2055.193370]  kthread+0xfc/0x128
[ 2055.193381]  ret_from_fork+0x10/0x1c
[ 2055.193438] INFO: task smbd:3662 blocked for more than 120 seconds.
[ 2055.199853] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[ 2055.206784] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.214772] smbd D 0 3662 3498 0x
[ 2055.214786] Call trace:
[ 2055.214800]  __switch_to+0x104/0x170
[ 2055.214810]  __schedule+0x30c/0x7c0
[ 2055.214818]  schedule+0x3c/0xb8
[ 2055.214856]  cv_wait_common+0x188/0x1b0 [spl]
[ 2055.214892]  __cv_wait+0x30/0x40 [spl]
[ 2055.215121]  zil_commit_impl+0x234/0xd30 [zfs]
[ 2055.215344]  zil_commit+0x48/0x70 [zfs]
[ 2055.215567]  zfs_read+0x2f8/0x418 [zfs]
[ 2055.215831]  zpl_read_common_iovec+0xa8/0x110 [zfs]
[ 2055.216065]  zpl_read_common+0x4c/0x70 [zfs]
[ 2055.216288]  zpl_xattr_get_dir+0x10c/0x128 [zfs]
[ 2055.216519]  zpl_xattr_get+0xcc/0x1a8 [zfs]
[ 2055.216742]  zpl_xattr_user_get+0x68/0x90 [zfs]
[ 2055.216764]  __vfs_getxattr+0x60/0x80
[ 2055.216774]  vfs_getxattr+0x140/0x160
[ 2055.216784]  getxattr+0x9c/0x2c0
[ 2055.216794]  path_getxattr+0x7c/0xd0
[ 2055.216805]  __arm64_sys_getxattr+0x28/0x38
[ 2055.216815]  el0_svc_common.constprop.0+0x80/0x218
[ 2055.216823]  el0_svc_handler+0x34/0xa0
[ 2055.216833]  el0_svc+0x10/0x2cc
[ 2055.216858] INFO: task borg:5639 blocked for more than 120 seconds.
[ 2055.223265] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[ 2055.230193] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.238179] borg D 0 5639 4001 0x
[ 2055.238193] Call trace:
[ 2055.238208]  __switch_to+0x104/0x170
[ 2055.238218]  __schedule+0x30c/0x7c0
[ 2055.238226]  schedule+0x3c/0xb8
[ 2055.238267]  cv_wait_common+0x188/0x1b0 [spl]
[ 2055.238303]  __cv_wait+0x30/0x40 [spl]
[ 2055.238530]  zil_commit_impl+0x234/0xd30 [zfs]
[ 2055.238752]  zil_commit+0x48/0x70 [zfs]
[ 2055.238975]  zfs_write+0xa3c/0xb90 [zfs]
[ 2055.239197]  zpl_write_common_iovec+0xac/0x120 [zfs]
[ 2055.239419]  zpl_iter_write+0xe4/0x150 [zfs]
[ 2055.239471]  new_sync_write+0x100/0x1a8
[ 2055.239480]  __vfs_write+0x74/0x90
[ 2055.239489]  vfs_write+0xe4/0x1c8
[ 2055.239505]  ksys_write+0x78/0x100
[ 2055.239514]  __arm64_sys_write+0x24/0x30
[ 2055.239524]  el0_svc_common.constprop.0+0x80/0x218
[ 2055.239532]  el0_svc_handler+0x34/0xa0
[ 2055.239541]  el0_svc+0x10/0x2cc
[ 2055.239555] INFO: task kworker/u8:3:35327 blocked for more than 120 seconds.
[ 2055.246754] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[ 2055.253681] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.261670] kworker/u8:3 D 0 35327 2 0x0028
[ 2055.261715] Workqueue: writeback wb_workfn (flush-zfs-1)
[ 2055.261728] Call trace:
[ 2055.261740]  __switch_to+0x104/0x170
[ 2055.261750]  __schedule+0x30c/0x7c0
[ 2055.261758]  schedule+0x3c/0xb8
[ 2055.261798]  cv_wait_common+0x188/0x1b0 [spl]
[ 2055.261834]  __cv_wait+0x30/0x40 [spl]
[ 2055.262059]  zil_commit_impl+0x234/0xd30 [zfs]
[ 2055.262284]  zil_commit+0x48/0x70 [zfs]
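The z_wr_iss_h threads in these traces are parked in wbt_wait(), the kernel's block-layer writeback throttling (WBT). As a hedged workaround sketch (device paths are whatever exists on the box; the wbt_lat_usec knob is present on 5.4 kernels, writing 0 disables WBT, and the setting does not survive a reboot), WBT can be switched off per device. Shown as a dry run that only prints the commands it would need root to execute:

```shell
# Print the command that would disable writeback throttling on each
# block device exposing the knob (run the printed lines as root).
# Writing -1 instead of 0 restores the kernel default.
for knob in /sys/block/*/queue/wbt_lat_usec; do
    [ -e "$knob" ] || continue
    echo "echo 0 > $knob"
done
```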
[Bug 1899249] Re: OpenZFS writing stalls, under load
With deduplication disabled, it eventually gets further with the backup, but still stalls after being left overnight:

[12084.274242] INFO: task z_wr_iss_h:2157 blocked for more than 120 seconds.
[12084.281171] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[12084.288126] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12084.296127] z_wr_iss_h D 0 2157 2 0x0028
[12084.296142] Call trace:
[12084.296163]  __switch_to+0x104/0x170
[12084.296174]  __schedule+0x30c/0x7c0
[12084.296183]  schedule+0x3c/0xb8
[12084.296191]  io_schedule+0x20/0x58
[12084.296204]  rq_qos_wait+0x100/0x178
[12084.296213]  wbt_wait+0xb4/0xf0
[12084.296223]  __rq_qos_throttle+0x38/0x50
[12084.296235]  blk_mq_make_request+0x128/0x610
[12084.296247]  generic_make_request+0xb4/0x2d8
[12084.296258]  submit_bio+0x48/0x218
[12084.296496]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[12084.296716]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[12084.296935]  zio_nowait+0xd4/0x170 [zfs]
[12084.297151]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[12084.297369]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[12084.297586]  zio_execute+0xac/0x110 [zfs]
[12084.297626]  taskq_thread+0x2f8/0x570 [spl]
[12084.297639]  kthread+0xfc/0x128
[12084.297650]  ret_from_fork+0x10/0x1c
[12084.297678] INFO: task txg_quiesce:3077 blocked for more than 120 seconds.
[12084.304712] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[12084.311641] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12084.319632] txg_quiesce D 0 3077 2 0x0028
[12084.319646] Call trace:
[12084.319668]  __switch_to+0x104/0x170
[12084.319679]  __schedule+0x30c/0x7c0
[12084.319691]  schedule+0x3c/0xb8
[12084.319731]  cv_wait_common+0x188/0x1b0 [spl]
[12084.319767]  __cv_wait+0x30/0x40 [spl]
[12084.319990]  txg_quiesce_thread+0x27c/0x380 [zfs]
[12084.320028]  thread_generic_wrapper+0x74/0xa0 [spl]
[12084.320050]  kthread+0xfc/0x128
[12084.320062]  ret_from_fork+0x10/0x1c
[12084.320133] INFO: task borg:13631 blocked for more than 120 seconds.
[12084.326632] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[12084.333562] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12084.341550] borg D 0 13631 13630 0x
[12084.341564] Call trace:
[12084.341578]  __switch_to+0x104/0x170
[12084.341588]  __schedule+0x30c/0x7c0
[12084.341596]  schedule+0x3c/0xb8
[12084.341635]  cv_wait_common+0x188/0x1b0 [spl]
[12084.341670]  __cv_wait+0x30/0x40 [spl]
[12084.341895]  zil_commit_impl+0x234/0xd30 [zfs]
[12084.342112]  zil_commit+0x48/0x70 [zfs]
[12084.342331]  zfs_write+0xa3c/0xb90 [zfs]
[12084.342588]  zpl_write_common_iovec+0xac/0x120 [zfs]
[12084.342817]  zpl_iter_write+0xe4/0x150 [zfs]
[12084.342830]  new_sync_write+0x100/0x1a8
[12084.342839]  __vfs_write+0x74/0x90
[12084.342855]  vfs_write+0xe4/0x1c8
[12084.342865]  ksys_write+0x78/0x100
[12084.342874]  __arm64_sys_write+0x24/0x30
[12084.342884]  el0_svc_common.constprop.0+0x80/0x218
[12084.342892]  el0_svc_handler+0x34/0xa0
[12084.342902]  el0_svc+0x10/0x2cc
[12084.342927] INFO: task z_wr_iss_h:114156 blocked for more than 120 seconds.
[12084.350038] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[12084.356966] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12084.364950] z_wr_iss_h D 0 114156 2 0x0028
[12084.364964] Call trace:
[12084.364981]  __switch_to+0x104/0x170
[12084.364990]  __schedule+0x30c/0x7c0
[12084.364998]  schedule+0x3c/0xb8
[12084.365006]  io_schedule+0x20/0x58
[12084.365019]  rq_qos_wait+0x100/0x178
[12084.365027]  wbt_wait+0xb4/0xf0
[12084.365037]  __rq_qos_throttle+0x38/0x50
[12084.365050]  blk_mq_make_request+0x128/0x610
[12084.365061]  generic_make_request+0xb4/0x2d8
[12084.365072]  submit_bio+0x48/0x218
[12084.365297]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[12084.365516]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[12084.365733]  zio_nowait+0xd4/0x170 [zfs]
[12084.365949]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[12084.366167]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[12084.366427]  zio_execute+0xac/0x110 [zfs]
[12084.366476]  taskq_thread+0x2f8/0x570 [spl]
[12084.366491]  kthread+0xfc/0x128
[12084.366502]  ret_from_fork+0x10/0x1c
[12084.366514] INFO: task z_wr_iss_h:114161 blocked for more than 120 seconds.
[12084.373624] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[12084.380553] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12084.388541] z_wr_iss_h D 0 114161 2 0x0028
[12084.388555] Call trace:
[12084.388570]  __switch_to+0x104/0x170
[12084.388580]  __schedule+0x30c/0x7c0
[12084.388588]  schedule+0x3c/0xb8
[12084.388596]  io_schedule+0x20/0x58
[12084.388608]  rq_qos_wait+0x100/0x178
[12084.388616]  wbt_wait+0xb4/0xf0
[12084.388627]  __rq_qos_throttle+0x38/0x50
[12084.388638]  blk_mq_make_request+0x128/0x610
[12084.388650]  generic_make_request+0xb4/0x2d8
[12084.388660]
[Bug 1899249] Re: OpenZFS writing stalls, under load
In the meantime, I'll see if I can temporarily disable ZPool-level deduplication and retry the backup run from the Windows machine. Whilst it's not the perfect long-term solution, I might look into using offline deduplication for older, infrequently-accessed data in the pool (probably with Borg/ZBackup), since this machine is mostly used for bulk storage of archival/back-up data, and file/HTTP serving, in a small network.
[Bug 1899249] Re: OpenZFS writing stalls, under load
It looks like writing snapshots with "zfs snapshot" will sometimes stall, even though other commands, like "zpool status", do occasionally still work.
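When commands stall like this, one generic (not ZFS-specific) way to see what is stuck is to list tasks in uninterruptible sleep, the same "D" state the hung-task watchdog reports above:

```shell
# Show the header plus every task in uninterruptible sleep ("D" state),
# including the kernel function it is waiting in (wchan).  On a stalled
# box you would expect z_wr_iss*, txg_quiesce, borg, etc. to appear.
ps -eo pid,stat,wchan:32,comm | awk 'NR == 1 || $2 ~ /^D/'
```

On a healthy system this usually prints only the header line.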
[Bug 1899249] Re: OpenZFS writing stalls, under load
Thanks for your reply (I expected this to go into the ether, since it seems to be a very common I/O issue on this hardware platform). I did try switching from LZJB to LZ4 and got slightly better performance and reliability, at least when testing backups of a Windows 10 machine using WSL and Borg over SSH. However, I can still lock the system up after several runs, or get it to a stage where I can send commands over an open SSH session, but spawning a new one fails, since it can't access my home directory (which is stored on the ZFS array). My last log, with Borg, is:

[ 22.536219] bcmgenet: Skipping UMAC reset
[ 22.537803] bcmgenet fd58.ethernet: configuring instance for external RGMII
[ 22.538029] bcmgenet fd58.ethernet eth0: Link is Down
[ 25.140217] new mount options do not match the existing superblock, will be ignored
[ 26.751285] brcmfmac: brcmf_cfg80211_set_power_mgmt: power save enabled
[ 27.651756] bcmgenet fd58.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 27.651790] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 29.696209] Started bpfilter
[ 29.701240] bpfilter: Loaded bpfilter_umh pid 3166
[ 29.858264] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[ 30.379027] kauditd_printk_skb: 27 callbacks suppressed
[ 30.379032] audit: type=1400 audit(1602433969.316:39): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="/usr/bin/lxc-start" pid=3259 comm="apparmor_parser"
[ 30.457589] audit: type=1400 audit(1602433969.396:40): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default" pid=3266 comm="apparmor_parser"
[ 30.457603] audit: type=1400 audit(1602433969.396:41): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default-cgns" pid=3266 comm="apparmor_parser"
[ 30.457620] audit: type=1400 audit(1602433969.396:42): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default-with-mounting" pid=3266 comm="apparmor_parser"
[ 30.457629] audit: type=1400 audit(1602433969.396:43): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default-with-nesting" pid=3266 comm="apparmor_parser"
[ 36.556387] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[ 36.556393] Bluetooth: BNEP filters: protocol multicast
[ 36.556407] Bluetooth: BNEP socket layer initialized
[ 74.720044] Adding 16777212k swap on /dev/zd0. Priority:-2 extents:1 across:16777212k SSFS
[10030.116676] INFO: task z_rd_int:2154 blocked for more than 120 seconds.
[10030.123432] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[10030.130389] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10030.138388] z_rd_int D 0 2154 2 0x0028
[10030.138403] Call trace:
[10030.138424]  __switch_to+0x104/0x170
[10030.138435]  __schedule+0x30c/0x7c0
[10030.138444]  schedule+0x3c/0xb8
[10030.138453]  io_schedule+0x20/0x58
[10030.138466]  rq_qos_wait+0x100/0x178
[10030.138475]  wbt_wait+0xb4/0xf0
[10030.138485]  __rq_qos_throttle+0x38/0x50
[10030.138497]  blk_mq_make_request+0x128/0x610
[10030.138509]  generic_make_request+0xb4/0x2d8
[10030.138519]  submit_bio+0x48/0x218
[10030.138756]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[10030.138977]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[10030.139197]  zio_execute+0xac/0x110 [zfs]
[10030.139414]  vdev_queue_io_done+0x18c/0x2a0 [zfs]
[10030.139632]  zio_vdev_io_done+0xec/0x220 [zfs]
[10030.139849]  zio_execute+0xac/0x110 [zfs]
[10030.139891]  taskq_thread+0x2f8/0x570 [spl]
[10030.139904]  kthread+0xfc/0x128
[10030.139915]  ret_from_fork+0x10/0x1c
[10030.139928] INFO: task z_rd_int:2158 blocked for more than 120 seconds.
[10030.146692] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[10030.153619] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10030.161608] z_rd_int D 0 2158 2 0x0028
[10030.161622] Call trace:
[10030.161643]  __switch_to+0x104/0x170
[10030.161655]  __schedule+0x30c/0x7c0
[10030.161663]  schedule+0x3c/0xb8
[10030.161673]  io_schedule+0x20/0x58
[10030.161686]  rq_qos_wait+0x100/0x178
[10030.161694]  wbt_wait+0xb4/0xf0
[10030.161705]  __rq_qos_throttle+0x38/0x50
[10030.161717]  blk_mq_make_request+0x128/0x610
[10030.161729]  generic_make_request+0xb4/0x2d8
[10030.161739]  submit_bio+0x48/0x218
[10030.161965]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[10030.162186]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[10030.162416]  zio_nowait+0xd4/0x170 [zfs]
[10030.162633]  vdev_queue_io_done+0x1ec/0x2a0 [zfs]
[10030.162859]  zio_vdev_io_done+0xec/0x220
[Bug 1899249] Re: OpenZFS writing stalls, under load
It is worth noting that deduplication consumes a lot of memory. Deduplication tables consume memory and eventually spill over onto disk; the extra read and write operations for every block of data on which deduplication is attempted then cause more I/O waits. A system with large pools and little memory does not perform deduplication well.

So, if you run the following command on your pool (pool-ssd in this example):

sudo zpool status -D pool-ssd

..you see something like the following:

  pool: pool-ssd
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:31:14 with 0 errors on Wed Sep 2 14:35:52 2020
config:

        NAME                                          STATE   READ WRITE CKSUM
        pool-ssd                                      ONLINE     0     0     0
          mirror-0                                    ONLINE     0     0     0
            ata-INTEL_SSDSC2BW480H6_CVTR608104NW480EGN  ONLINE   0     0     0
            ata-INTEL_SSDSC2BW480H6_CVTR552500S0480EGN  ONLINE   0     0     0

errors: No known data errors

 dedup: DDT entries 4258028, size 448B on disk, 306B in core

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    3.63M    464G    226G    226G    3.63M    464G    226G    226G
     2     376K   46.1G   24.3G   24.3G     822K    100G   53.2G   53.2G
     4    51.3K   5.60G   2.51G   2.51G     240K   26.3G   11.8G   11.8G
     8    7.49K    925M    544M    544M    77.9K   9.41G   5.47G   5.47G
    16    3.32K    415M    261M    261M    68.8K   8.41G   5.35G   5.35G
    32      657   78.7M   62.1M   62.1M    29.3K   3.51G   2.79G   2.79G
    64      389   46.2M   42.7M   42.7M    34.0K   4.03G   3.78G   3.78G
   128      248   30.1M   29.5M   29.5M    44.8K   5.44G   5.32G   5.32G
   256      339   42.1M   40.8M   40.8M     123K   15.4G   14.9G   14.9G
   512      374   46.8M   46.1M   46.1M     271K   33.9G   33.5G   33.5G
    1K      254   31.8M   31.3M   31.3M     355K   44.4G   43.8G   43.8G
    2K        5    640K    513K    513K    12.9K   1.61G   1.20G   1.20G
    4K        4    512K    512K    512K    22.7K   2.84G   2.84G   2.84G
    8K        8      1M    897K    897K    89.3K   11.2G   10.1G   10.1G
   64K        2    256K      6K      6K     226K   28.3G    679M    679M
 Total    4.06M    517G    254G    254G    5.99M    759G    421G    421G

Looking at the "dedup: DDT entries 4258028, size 448B on disk, 306B in core" line, we have 4,258,028 dedup entries; at 306 bytes per in-core entry, that's ~1.2GB of memory for just the dedup table.
But the dedup table is capped at 1/4 the size of the ARC, so on a small-memory system such as a Raspberry Pi you may find that much of it can't be held in memory and has to stay on disk, and this increases the I/O waits.

It may be an idea to disable dedup and see if that helps to resolve your issue; a comparatively slow, memory-constrained system such as a Raspberry Pi copes poorly with dedup on large pools.

** Changed in: zfs-linux (Ubuntu)
       Status: New => Incomplete

** Changed in: zfs-linux (Ubuntu)
   Importance: Undecided => Medium

** Changed in: zfs-linux (Ubuntu)
     Assignee: (unassigned) => Colin Ian King (colin-king)
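To put numbers on the 1/4-of-ARC cap described above, here is a rough sizing sketch. It assumes every in-core entry costs the 306 bytes reported for this pool; real per-entry overheads vary between pools and ZFS versions:

```shell
# Estimate the ARC needed to keep the whole dedup table in core, given
# that the DDT may occupy at most one quarter of the ARC.
entries=4258028          # from the "DDT entries" line above
bytes_per_entry=306      # in-core size per entry, from the same line
ddt_bytes=$(( entries * bytes_per_entry ))
arc_needed=$(( ddt_bytes * 4 ))
echo "DDT in core: $(( ddt_bytes / 1024 / 1024 )) MiB"
echo "ARC needed to cache it fully: $(( arc_needed / 1024 / 1024 )) MiB"
```

That works out to roughly 1.2GiB of table needing an ARC of nearly 5GiB, which a 4GB or 8GB Raspberry Pi sharing memory with everything else cannot comfortably provide.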