[Bug 1899249] Re: OpenZFS writing stalls, under load

2024-02-28 Thread Colin Ian King
** Changed in: zfs-linux (Ubuntu)
 Assignee: Colin Ian King (colin-king) => (unassigned)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1899249/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-23 Thread Colin Ian King
OK, I'll close this bug for now. If it still bites please feel free to
re-open the bug and I'll get back onto it.

** Changed in: zfs-linux (Ubuntu)
   Status: Incomplete => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1899249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-23 Thread Tyson Key
Thanks, Colin. I intend to retest, with 20.10, but for now, with
deduplication disabled, things seem stable.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1899249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-23 Thread Colin Ian King
@Tyson, hopefully the WBT changes helped to resolve this issue.  If so,
please let me know and I can close this bug.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1899249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-21 Thread Colin Ian King
That warning message is from the Raspberry Pi firmware broadcom
get_property sys interface, some user space program has read an old
deprecated sys interface and is just warning to use the hwmon sysfs
interface and it has no bearing on the ZFS or block WBT settings.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1899249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-14 Thread Tyson Key
Curiously, it worked fine, for quite some time, but I did manage to get
this crash, with TAR, earlier:

[0.00] Booting Linux on physical CPU 0x00 [0x410fd083]
[0.00] Linux version 5.4.0-1018-raspi (buildd@bos02-arm64-052) (gcc 
version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #20-Ubuntu SMP Sun Sep 6 05:11:16 UTC 
2020 (Ubuntu 5.4.0-1018.20-raspi 5.4.55)
[0.00] Machine model: Raspberry Pi 4 Model B Rev 1.4
[0.00] efi: Getting EFI parameters from FDT:
[0.00] efi: UEFI not found.
[0.00] Reserved memory: created CMA memory pool at 0x2c00, 
size 64 MiB
[0.00] OF: reserved mem: initialized node linux,cma, compatible id 
shared-dma-pool
[0.00] On node 0 totalpages: 2061312
[0.00]   DMA zone: 3792 pages used for memmap
[0.00]   DMA zone: 0 pages reserved
[0.00]   DMA zone: 242688 pages, LIFO batch:63
[0.00]   DMA32 zone: 12288 pages used for memmap
[0.00]   DMA32 zone: 770048 pages, LIFO batch:63
[0.00]   Normal zone: 16384 pages used for memmap
[0.00]   Normal zone: 1048576 pages, LIFO batch:63
[0.00] percpu: Embedded 32 pages/cpu s92120 r8192 d30760 u131072
[0.00] pcpu-alloc: s92120 r8192 d30760 u131072 alloc=32*4096
[0.00] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3
[0.00] Detected PIPT I-cache on CPU0
[0.00] CPU features: detected: EL2 vector hardening
[0.00] CPU features: kernel page table isolation forced ON by KASLR
[0.00] CPU features: detected: Kernel page table isolation (KPTI)
[0.00] ARM_SMCCC_ARCH_WORKAROUND_1 missing from firmware
[0.00] Built 1 zonelists, mobility grouping on.  Total pages: 2028848
[0.00] Kernel command line:  coherent_pool=1M 8250.nr_uarts=1 
snd_bcm2835.enable_compat_alsa=0 snd_bcm2835.enable_hdmi=1 
snd_bcm2835.enable_headphones=1 bcm2708_fb.fbwidth=0 bcm2708_fb.fbheight=0 
bcm2708_fb.fbswap=1 smsc95xx.macaddr=DC:A6:32:B8:1A:28 
vc_mem.mem_base=0x3ec0 vc_mem.mem_size=0x4000  net.ifnames=0 
dwc_otg.lpm_enable=0 console=ttyS0,115200 console=tty1 root=LABEL=writable 
rootfstype=ext4 elevator=deadline rootwait fixrtc quiet splash
[0.00] Dentry cache hash table entries: 1048576 (order: 11, 8388608 
bytes, linear)
[0.00] Inode-cache hash table entries: 524288 (order: 10, 4194304 
bytes, linear)
[0.00] mem auto-init: stack:off, heap alloc:off, heap free:off
[0.00] software IO TLB: mapped [mem 0x3740-0x3b40] (64MB)
[0.00] Memory: 7898140K/8245248K available (11772K kernel code, 1236K 
rwdata, 4244K rodata, 6144K init, 1076K bss, 281572K reserved, 65536K 
cma-reserved)
[0.00] random: get_random_u64 called from kmem_cache_open+0x38/0x3e0 
with crng_init=0
[0.00] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[0.00] ftrace: allocating 39001 entries in 153 pages
[0.00] rcu: Hierarchical RCU implementation.
[0.00]  Tasks RCU enabled.
[0.00] rcu: RCU calculated value of scheduler-enlistment delay is 25 
jiffies.
[0.00] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[0.00] GIC: Using split EOI/Deactivate mode
[0.00] arch_timer: cp15 timer(s) running at 54.00MHz (phys).
[0.00] clocksource: arch_sys_counter: mask: 0xff 
max_cycles: 0xc743ce346, max_idle_ns: 440795203123 ns
[0.05] sched_clock: 56 bits at 54MHz, resolution 18ns, wraps every 
4398046511102ns
[0.000200] Console: colour dummy device 80x25
[0.000221] printk: console [tty1] enabled
[0.000273] Calibrating delay loop (skipped), value calculated using timer 
frequency.. 108.00 BogoMIPS (lpj=216000)
[0.000288] pid_max: default: 32768 minimum: 301
[0.000495] LSM: Security Framework initializing
[0.000543] Yama: becoming mindful.
[0.000631] AppArmor: AppArmor initialized
[0.000875] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, 
linear)
[0.000961] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 
bytes, linear)
[0.001026] *** VALIDATE tmpfs ***
[0.001632] *** VALIDATE proc ***
[0.002150] Disabling memory control group subsystem
[0.002218] *** VALIDATE cgroup1 ***
[0.002228] *** VALIDATE cgroup2 ***
[0.003429] ASID allocator initialised with 32768 entries
[0.003591] rcu: Hierarchical SRCU implementation.
[0.005624] EFI services will not be available.
[0.006427] smp: Bringing up secondary CPUs ...
[0.007280] Detected PIPT I-cache on CPU1
[0.007335] CPU1: Booted secondary processor 0x01 [0x410fd083]
[0.008377] Detected PIPT I-cache on CPU2
[0.008415] CPU2: Booted secondary processor 0x02 [0x410fd083]
[0.009370] Detected PIPT I-cache on CPU3
[0.009406] CPU3: Booted secondary processor 0x03 [0x410fd083]
[0.009546] smp: Brought up 1 node, 4 CPUs
[0.009555] SMP: Total of 4 processors activated.
[0.009564] CPU features: 

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-14 Thread Colin Ian King
That's great news. Let's keep this bug open for the moment as
"incomplete" and if you don't report back after ~6 weeks or so it will
automatically be closed.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1899249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-14 Thread Tyson Key
And, after 18 hours+, I was able to archive a ZFS snapshot, using
BorgBackup, with the WBT settings, suggested, earlier:

tyson@ubuntu:~$ sudo zfs send Yaesu@Crucial-2TB-1951E22FA633 |  borg create 
--stats BorgStore::Yaesu@Crucial-2TB-1951E22FA633 -
^[[A^[[A--
Archive name: Yaesu@Crucial-2TB-1951E22FA633
Archive fingerprint: 
32d64e2a13eb543e4846cc34ac2c3609fe192127e3bf6d324a0fa9d61b988c19
Time (start): Tue, 2020-10-13 15:49:51
Time (end):   Wed, 2020-10-14 10:01:46
Duration: 18 hours 11 minutes 54.87 seconds
Number of files: 1
Utilization of max. archive size: 0%
--
   Original size  Compressed sizeDeduplicated size
This archive:1.89 TB  1.75 TB617.36 GB
All archives:7.44 TB  5.87 TB  1.99 TB

   Unique chunks Total chunks
Chunk index:  842436  2607257
--

The "dmesg" log is clean, too:

tyson@ubuntu:~$ dmesg
[0.00] Booting Linux on physical CPU 0x00 [0x410fd083]
[0.00] Linux version 5.4.0-1018-raspi (buildd@bos02-arm64-052) (gcc 
version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #20-Ubuntu SMP Sun Sep 6 05:11:16 UTC 
2020 (Ubuntu 5.4.0-1018.20-raspi 5.4.55)
[0.00] Machine model: Raspberry Pi 4 Model B Rev 1.4
[0.00] efi: Getting EFI parameters from FDT:
[0.00] efi: UEFI not found.
[0.00] Reserved memory: created CMA memory pool at 0x2c00, 
size 64 MiB
[0.00] OF: reserved mem: initialized node linux,cma, compatible id 
shared-dma-pool
[0.00] On node 0 totalpages: 2061312
[0.00]   DMA zone: 3792 pages used for memmap
[0.00]   DMA zone: 0 pages reserved
[0.00]   DMA zone: 242688 pages, LIFO batch:63
[0.00]   DMA32 zone: 12288 pages used for memmap
[0.00]   DMA32 zone: 770048 pages, LIFO batch:63
[0.00]   Normal zone: 16384 pages used for memmap
[0.00]   Normal zone: 1048576 pages, LIFO batch:63
[0.00] percpu: Embedded 32 pages/cpu s92120 r8192 d30760 u131072
[0.00] pcpu-alloc: s92120 r8192 d30760 u131072 alloc=32*4096
[0.00] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3
[0.00] Detected PIPT I-cache on CPU0
[0.00] CPU features: detected: EL2 vector hardening
[0.00] CPU features: kernel page table isolation forced ON by KASLR
[0.00] CPU features: detected: Kernel page table isolation (KPTI)
[0.00] ARM_SMCCC_ARCH_WORKAROUND_1 missing from firmware
[0.00] Built 1 zonelists, mobility grouping on.  Total pages: 2028848
[0.00] Kernel command line:  coherent_pool=1M 8250.nr_uarts=1 
snd_bcm2835.enable_compat_alsa=0 snd_bcm2835.enable_hdmi=1 
snd_bcm2835.enable_headphones=1 bcm2708_fb.fbwidth=0 bcm2708_fb.fbheight=0 
bcm2708_fb.fbswap=1 smsc95xx.macaddr=DC:A6:32:B8:1A:28 
vc_mem.mem_base=0x3ec0 vc_mem.mem_size=0x4000  net.ifnames=0 
dwc_otg.lpm_enable=0 console=ttyS0,115200 console=tty1 root=LABEL=writable 
rootfstype=ext4 elevator=deadline rootwait fixrtc quiet splash
[0.00] Dentry cache hash table entries: 1048576 (order: 11, 8388608 
bytes, linear)
[0.00] Inode-cache hash table entries: 524288 (order: 10, 4194304 
bytes, linear)
[0.00] mem auto-init: stack:off, heap alloc:off, heap free:off
[0.00] software IO TLB: mapped [mem 0x3740-0x3b40] (64MB)
[0.00] Memory: 7898140K/8245248K available (11772K kernel code, 1236K 
rwdata, 4244K rodata, 6144K init, 1076K bss, 281572K reserved, 65536K 
cma-reserved)
[0.00] random: get_random_u64 called from kmem_cache_open+0x38/0x3e0 
with crng_init=0
[0.00] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[0.00] ftrace: allocating 39001 entries in 153 pages
[0.00] rcu: Hierarchical RCU implementation.
[0.00]  Tasks RCU enabled.
[0.00] rcu: RCU calculated value of scheduler-enlistment delay is 25 
jiffies.
[0.00] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[0.00] GIC: Using split EOI/Deactivate mode
[0.00] arch_timer: cp15 timer(s) running at 54.00MHz (phys).
[0.00] clocksource: arch_sys_counter: mask: 0xff 
max_cycles: 0xc743ce346, max_idle_ns: 440795203123 ns
[0.05] sched_clock: 56 bits at 54MHz, resolution 18ns, wraps every 
4398046511102ns
[0.000200] Console: colour dummy device 80x25
[0.000221] printk: console [tty1] enabled
[0.000273] Calibrating delay loop (skipped), value calculated using timer 
frequency.. 108.00 BogoMIPS (lpj=216000)
[0.000288] pid_max: default: 32768 minimum: 301
[0.000495] LSM: Security Framework initializing
[0.000543] Yama: becoming mindful.
[0.000631] AppArmor: AppArmor 

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-14 Thread Colin Ian King
It may also be a good idea to disable any power management on USB too.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1899249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-13 Thread Tyson Key
Needs more time, in terms of long-running I/O, but changing the WBT
setting at least makes retrieving snapshots a little faster, with "zfs
list -t all".

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1899249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-13 Thread Tyson Key
All of the drives in the pool are SSDs (2x Crucial BX500 (2TB), 1x
SanDisk Ultra 3D (4TB), 1x Samsung 870 QVO (4TB), but obviously don't
perfectly-align, in terms of performance characteristics - but, I'll
test the WBT tuneable, to see if it makes a difference.

I'm also suspecting that some power management weirdness, affecting
waking up USB devices, is at play, on this machine, but it's a Ubuntu
Server installation, without any special GUI, or Raspberry Pi-specific
components installed.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1899249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-13 Thread Tyson Key
Set the WBT value to 0, for all of the devices, whilst another archive
run takes place. The last one ran for just over 20 hours, before hitting
the I/O problem.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1899249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-13 Thread Tyson Key
I'm using 16GB of swap, on a ZVOL, in the pool, but I can also test with
swap, on the internal MicroSD card's EXT4 root partition, if it helps.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1899249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-13 Thread Tyson Key
Here's a full "dmesg" log, if it helps:

tyson@ubuntu:~$ dmesg
[0.00] Booting Linux on physical CPU 0x00 [0x410fd083]
[0.00] Linux version 5.4.0-1018-raspi (buildd@bos02-arm64-052) (gcc 
version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #20-Ubuntu SMP Sun Sep 6 05:11:16 UTC 
2020 (Ubuntu 5.4.0-1018.20-raspi 5.4.55)
[0.00] Machine model: Raspberry Pi 4 Model B Rev 1.4
[0.00] efi: Getting EFI parameters from FDT:
[0.00] efi: UEFI not found.
[0.00] Reserved memory: created CMA memory pool at 0x2c00, 
size 64 MiB
[0.00] OF: reserved mem: initialized node linux,cma, compatible id 
shared-dma-pool
[0.00] On node 0 totalpages: 2061312
[0.00]   DMA zone: 3792 pages used for memmap
[0.00]   DMA zone: 0 pages reserved
[0.00]   DMA zone: 242688 pages, LIFO batch:63
[0.00]   DMA32 zone: 12288 pages used for memmap
[0.00]   DMA32 zone: 770048 pages, LIFO batch:63
[0.00]   Normal zone: 16384 pages used for memmap
[0.00]   Normal zone: 1048576 pages, LIFO batch:63
[0.00] percpu: Embedded 32 pages/cpu s92120 r8192 d30760 u131072
[0.00] pcpu-alloc: s92120 r8192 d30760 u131072 alloc=32*4096
[0.00] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3
[0.00] Detected PIPT I-cache on CPU0
[0.00] CPU features: detected: EL2 vector hardening
[0.00] CPU features: kernel page table isolation forced ON by KASLR
[0.00] CPU features: detected: Kernel page table isolation (KPTI)
[0.00] ARM_SMCCC_ARCH_WORKAROUND_1 missing from firmware
[0.00] Built 1 zonelists, mobility grouping on.  Total pages: 2028848
[0.00] Kernel command line:  coherent_pool=1M 8250.nr_uarts=1 
snd_bcm2835.enable_compat_alsa=0 snd_bcm2835.enable_hdmi=1 
snd_bcm2835.enable_headphones=1 bcm2708_fb.fbwidth=0 bcm2708_fb.fbheight=0 
bcm2708_fb.fbswap=1 smsc95xx.macaddr=DC:A6:32:B8:1A:28 
vc_mem.mem_base=0x3ec0 vc_mem.mem_size=0x4000  net.ifnames=0 
dwc_otg.lpm_enable=0 console=ttyS0,115200 console=tty1 root=LABEL=writable 
rootfstype=ext4 elevator=deadline rootwait fixrtc quiet splash
[0.00] Dentry cache hash table entries: 1048576 (order: 11, 8388608 
bytes, linear)
[0.00] Inode-cache hash table entries: 524288 (order: 10, 4194304 
bytes, linear)
[0.00] mem auto-init: stack:off, heap alloc:off, heap free:off
[0.00] software IO TLB: mapped [mem 0x3740-0x3b40] (64MB)
[0.00] Memory: 7898136K/8245248K available (11772K kernel code, 1236K 
rwdata, 4244K rodata, 6144K init, 1076K bss, 281576K reserved, 65536K 
cma-reserved)
[0.00] random: get_random_u64 called from kmem_cache_open+0x38/0x3e0 
with crng_init=0
[0.00] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[0.00] ftrace: allocating 39001 entries in 153 pages
[0.00] rcu: Hierarchical RCU implementation.
[0.00]  Tasks RCU enabled.
[0.00] rcu: RCU calculated value of scheduler-enlistment delay is 25 
jiffies.
[0.00] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[0.00] GIC: Using split EOI/Deactivate mode
[0.00] arch_timer: cp15 timer(s) running at 54.00MHz (phys).
[0.00] clocksource: arch_sys_counter: mask: 0xff 
max_cycles: 0xc743ce346, max_idle_ns: 440795203123 ns
[0.05] sched_clock: 56 bits at 54MHz, resolution 18ns, wraps every 
4398046511102ns
[0.000200] Console: colour dummy device 80x25
[0.000221] printk: console [tty1] enabled
[0.000273] Calibrating delay loop (skipped), value calculated using timer 
frequency.. 108.00 BogoMIPS (lpj=216000)
[0.000287] pid_max: default: 32768 minimum: 301
[0.000491] LSM: Security Framework initializing
[0.000540] Yama: becoming mindful.
[0.000629] AppArmor: AppArmor initialized
[0.000874] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, 
linear)
[0.000961] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 
bytes, linear)
[0.001028] *** VALIDATE tmpfs ***
[0.001634] *** VALIDATE proc ***
[0.002155] Disabling memory control group subsystem
[0.002224] *** VALIDATE cgroup1 ***
[0.002234] *** VALIDATE cgroup2 ***
[0.003438] ASID allocator initialised with 32768 entries
[0.003601] rcu: Hierarchical SRCU implementation.
[0.005657] EFI services will not be available.
[0.006469] smp: Bringing up secondary CPUs ...
[0.007318] Detected PIPT I-cache on CPU1
[0.007371] CPU1: Booted secondary processor 0x01 [0x410fd083]
[0.008416] Detected PIPT I-cache on CPU2
[0.008454] CPU2: Booted secondary processor 0x02 [0x410fd083]
[0.009407] Detected PIPT I-cache on CPU3
[0.009444] CPU3: Booted secondary processor 0x03 [0x410fd083]
[0.009586] smp: Brought up 1 node, 4 CPUs
[0.009594] SMP: Total of 4 processors activated.
[0.009604] CPU features: detected: 32-bit EL0 Support
[0.009613] CPU 

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-13 Thread Colin Ian King
So it may be that the write-back-throttling (wbt) for the underlying
devices is getting confused about the exact throttle rates are for these
devices and somehow getting stuck.  It maybe worth experimenting by
disabling the throttling and seeing if this gets I/O working again.

For example, to disable wbt for a device /dev/sda use:

echo 0 | sudo tee /sys/block/sda/queue/wbt_lat_usec

and if you need to reset it back to the default:

echo -1 | sudo tee /sys/block/sda/queue/wbt_lat_usec

..use the appropriate block device name for the block devices you have
attached. It may even be worth setting the wbt_lat_usec to 0 for all the
block devices in your pool as early as possible after boot and see if
this helps.  My hunch is that wbt is not set correctly for the attached
non-spinny devices you have and it is assuming they are reasonably fast
when in fact it may be rate limited to be slower than expected because
they are attached to relatively slow host.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1899249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-13 Thread Colin Ian King
..and does you raspi have swap enabled?

** Bug watch added: github.com/openzfs/zfs/issues #10522
   https://github.com/openzfs/zfs/issues/10522

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1899249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-13 Thread Colin Ian King
Same issue as reported here: https://github.com/openzfs/zfs/issues/10522

** Also affects: zfs via
   https://github.com/openzfs/zfs/issues/10522
   Importance: Unknown
   Status: Unknown

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1899249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-13 Thread Colin Ian King
Is it possible to have the full dmesg from the start of where you boot
to the point where you see the "INFO... blocked for more than 120
seconds" message?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1899249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-13 Thread Tyson Key
Looks like I could archive about 1.3TB, of a 1.64TB snapshot, before
things started to go bad, again:

[110079.681102] INFO: task z_wr_iss_h:2171 blocked for more than 120 seconds.
[110079.688123]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[110079.695167] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[110079.703260] z_wr_iss_h  D0  2171  2 0x0028
[110079.703276] Call trace:
[110079.703297]  __switch_to+0x104/0x170
[110079.703308]  __schedule+0x30c/0x7c0
[110079.703317]  schedule+0x3c/0xb8
[110079.703325]  io_schedule+0x20/0x58
[110079.703338]  rq_qos_wait+0x100/0x178
[110079.703347]  wbt_wait+0xb4/0xf0
[110079.703357]  __rq_qos_throttle+0x38/0x50
[110079.703370]  blk_mq_make_request+0x128/0x610
[110079.703382]  generic_make_request+0xb4/0x2d8
[110079.703392]  submit_bio+0x48/0x218
[110079.703629]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[110079.703850]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[110079.704068]  zio_nowait+0xd4/0x170 [zfs]
[110079.704285]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[110079.704503]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[110079.704720]  zio_execute+0xac/0x110 [zfs]
[110079.704760]  taskq_thread+0x2f8/0x570 [spl]
[110079.704773]  kthread+0xfc/0x128
[110079.704784]  ret_from_fork+0x10/0x1c
[110079.704795] INFO: task z_wr_int:2173 blocked for more than 120 seconds.
[110079.711650]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[110079.718668] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[110079.726745] z_wr_intD0  2173  2 0x0028
[110079.726760] Call trace:
[110079.726780]  __switch_to+0x104/0x170
[110079.726790]  __schedule+0x30c/0x7c0
[110079.726803]  schedule+0x3c/0xb8
[110079.726811]  io_schedule+0x20/0x58
[110079.726823]  rq_qos_wait+0x100/0x178
[110079.726832]  wbt_wait+0xb4/0xf0
[110079.726842]  __rq_qos_throttle+0x38/0x50
[110079.726854]  blk_mq_make_request+0x128/0x610
[110079.726866]  generic_make_request+0xb4/0x2d8
[110079.726877]  submit_bio+0x48/0x218
[110079.727100]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[110079.727320]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[110079.727549]  zio_nowait+0xd4/0x170 [zfs]
[110079.727766]  vdev_queue_io_done+0x1ec/0x2a0 [zfs]
[110079.727991]  zio_vdev_io_done+0xec/0x220 [zfs]
[110079.728209]  zio_execute+0xac/0x110 [zfs]
[110079.728255]  taskq_thread+0x2f8/0x570 [spl]
[110079.728268]  kthread+0xfc/0x128
[110079.728279]  ret_from_fork+0x10/0x1c
[110079.728309] INFO: task txg_sync:2388 blocked for more than 120 seconds.
[110079.735157]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[110079.742175] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[110079.750306] txg_syncD0  2388  2 0x0028
[110079.750321] Call trace:
[110079.750337]  __switch_to+0x104/0x170
[110079.750348]  __schedule+0x30c/0x7c0
[110079.750356]  schedule+0x3c/0xb8
[110079.750366]  schedule_timeout+0x9c/0x190
[110079.750374]  io_schedule_timeout+0x28/0x48
[110079.750412]  __cv_timedwait_common+0x1a8/0x1f8 [spl]
[110079.750447]  __cv_timedwait_io+0x3c/0x50 [spl]
[110079.750670]  zio_wait+0x130/0x2a0 [zfs]
[110079.750890]  dsl_pool_sync+0x3fc/0x498 [zfs]
[110079.751109]  spa_sync+0x538/0xe68 [zfs]
[110079.751326]  txg_sync_thread+0x2c0/0x468 [zfs]
[110079.751364]  thread_generic_wrapper+0x74/0xa0 [spl]
[110079.751377]  kthread+0xfc/0x128
[110079.751388]  ret_from_fork+0x10/0x1c
[110079.751459] INFO: task borg:1063669 blocked for more than 120 seconds.
[110079.758224]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[110079.765237] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[110079.773310] borgD0 1063669 1063661 0x
[110079.773324] Call trace:
[110079.773344]  __switch_to+0x104/0x170
[110079.773355]  __schedule+0x30c/0x7c0
[110079.773366]  schedule+0x3c/0xb8
[110079.773406]  cv_wait_common+0x188/0x1b0 [spl]
[110079.773441]  __cv_wait+0x30/0x40 [spl]
[110079.773664]  zil_commit_impl+0x234/0xd30 [zfs]
[110079.773882]  zil_commit+0x48/0x70 [zfs]
[110079.774112]  zfs_write+0xa3c/0xb90 [zfs]
[110079.774331]  zpl_write_common_iovec+0xac/0x120 [zfs]
[110079.774557]  zpl_iter_write+0xe4/0x150 [zfs]
[110079.774569]  new_sync_write+0x100/0x1a8
[110079.774578]  __vfs_write+0x74/0x90
[110079.774593]  vfs_write+0xe4/0x1c8
[110079.774603]  ksys_write+0x78/0x100
[110079.774612]  __arm64_sys_write+0x24/0x30
[110079.774622]  el0_svc_common.constprop.0+0x80/0x218
[110079.774631]  el0_svc_handler+0x34/0xa0
[110079.774640]  el0_svc+0x10/0x2cc
[110079.774665] INFO: task z_wr_iss_h:3853507 blocked for more than 120 seconds.
[110079.781953]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[110079.788973] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[110079.797052] z_wr_iss_h  D0 3853507  2 0x0028
[110079.797067] Call trace:
[110079.797083]  __switch_to+0x104/0x170
[110079.797094]  __schedule+0x30c/0x7c0

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-12 Thread Tyson Key
Thanks.

I was able to archive, and delete 400GB of data, without problems,
earlier on, today, which reduced the "REFER" of my data set, a little -
however, it looks like I probably need to focus on archiving, and
removing some of the older snapshots, if I want to trim down the memory
utilisation of my pool.

When I arrived home, from work, "dmesg" didn't have anything interesting
to say, in terms of panics, or errors - which seems promising, but it's
still early days, yet.

Seems a little like a "luxury problem", that I find myself with more
storage space, and data, than available RAM, given that usually, the
opposite situation is true...

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1899249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-12 Thread Richard Laager
You could shrink the DDT by making a copy of the files in place (with
dedup off) and deleting the old file. That only requires enough extra
space for a single file at a time. This assumes no snapshots.

If you need to preserve snapshots, another option would be to send|recv
a dataset at a time. If you have enough free space for a copy of the
largest dataset, this would work.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1899249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-12 Thread Tyson Key
Unfortunately, I haven't got a storage device large enough, to contain
all of the data from the pool, and much of it cannot be recreated, or
restored, from another source, so I won't be able to nuke the pool, and
rebuild it.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1899249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-12 Thread Richard Laager
Did you destroy and recreate the pool after disabling dedup? Otherwise
you still have the same dedup table and haven’t really accomplished
much.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1899249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-12 Thread Tyson Key
Also receive this, when trying to archive a large directory:

[ 2055.147509] INFO: task z_wr_iss_h:2169 blocked for more than 120 seconds.
[ 2055.154450]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[ 2055.161401] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 2055.169403] z_wr_iss_h  D0  2169  2 0x0028
[ 2055.169420] Call trace:
[ 2055.169441]  __switch_to+0x104/0x170
[ 2055.169453]  __schedule+0x30c/0x7c0
[ 2055.169462]  schedule+0x3c/0xb8
[ 2055.169470]  io_schedule+0x20/0x58
[ 2055.169483]  rq_qos_wait+0x100/0x178
[ 2055.169492]  wbt_wait+0xb4/0xf0
[ 2055.169502]  __rq_qos_throttle+0x38/0x50
[ 2055.169514]  blk_mq_make_request+0x128/0x610
[ 2055.169526]  generic_make_request+0xb4/0x2d8
[ 2055.169537]  submit_bio+0x48/0x218
[ 2055.169779]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[ 2055.170005]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[ 2055.170229]  zio_nowait+0xd4/0x170 [zfs]
[ 2055.170451]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[ 2055.170673]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[ 2055.170896]  zio_execute+0xac/0x110 [zfs]
[ 2055.170937]  taskq_thread+0x2f8/0x570 [spl]
[ 2055.170950]  kthread+0xfc/0x128
[ 2055.170960]  ret_from_fork+0x10/0x1c
[ 2055.170990] INFO: task txg_quiesce:2380 blocked for more than 120 seconds.
[ 2055.178025]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[ 2055.184958] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 2055.192949] txg_quiesce D0  2380  2 0x0028
[ 2055.192963] Call trace:
[ 2055.192984]  __switch_to+0x104/0x170
[ 2055.192995]  __schedule+0x30c/0x7c0
[ 2055.193003]  schedule+0x3c/0xb8
[ 2055.193046]  cv_wait_common+0x188/0x1b0 [spl]
[ 2055.193082]  __cv_wait+0x30/0x40 [spl]
[ 2055.193310]  txg_quiesce_thread+0x27c/0x380 [zfs]
[ 2055.193349]  thread_generic_wrapper+0x74/0xa0 [spl]
[ 2055.193370]  kthread+0xfc/0x128
[ 2055.193381]  ret_from_fork+0x10/0x1c
[ 2055.193438] INFO: task smbd:3662 blocked for more than 120 seconds.
[ 2055.199853]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[ 2055.206784] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 2055.214772] smbdD0  3662   3498 0x
[ 2055.214786] Call trace:
[ 2055.214800]  __switch_to+0x104/0x170
[ 2055.214810]  __schedule+0x30c/0x7c0
[ 2055.214818]  schedule+0x3c/0xb8
[ 2055.214856]  cv_wait_common+0x188/0x1b0 [spl]
[ 2055.214892]  __cv_wait+0x30/0x40 [spl]
[ 2055.215121]  zil_commit_impl+0x234/0xd30 [zfs]
[ 2055.215344]  zil_commit+0x48/0x70 [zfs]
[ 2055.215567]  zfs_read+0x2f8/0x418 [zfs]
[ 2055.215831]  zpl_read_common_iovec+0xa8/0x110 [zfs]
[ 2055.216065]  zpl_read_common+0x4c/0x70 [zfs]
[ 2055.216288]  zpl_xattr_get_dir+0x10c/0x128 [zfs]
[ 2055.216519]  zpl_xattr_get+0xcc/0x1a8 [zfs]
[ 2055.216742]  zpl_xattr_user_get+0x68/0x90 [zfs]
[ 2055.216764]  __vfs_getxattr+0x60/0x80
[ 2055.216774]  vfs_getxattr+0x140/0x160
[ 2055.216784]  getxattr+0x9c/0x2c0
[ 2055.216794]  path_getxattr+0x7c/0xd0
[ 2055.216805]  __arm64_sys_getxattr+0x28/0x38
[ 2055.216815]  el0_svc_common.constprop.0+0x80/0x218
[ 2055.216823]  el0_svc_handler+0x34/0xa0
[ 2055.216833]  el0_svc+0x10/0x2cc
[ 2055.216858] INFO: task borg:5639 blocked for more than 120 seconds.
[ 2055.223265]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[ 2055.230193] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 2055.238179] borgD0  5639   4001 0x
[ 2055.238193] Call trace:
[ 2055.238208]  __switch_to+0x104/0x170
[ 2055.238218]  __schedule+0x30c/0x7c0
[ 2055.238226]  schedule+0x3c/0xb8
[ 2055.238267]  cv_wait_common+0x188/0x1b0 [spl]
[ 2055.238303]  __cv_wait+0x30/0x40 [spl]
[ 2055.238530]  zil_commit_impl+0x234/0xd30 [zfs]
[ 2055.238752]  zil_commit+0x48/0x70 [zfs]
[ 2055.238975]  zfs_write+0xa3c/0xb90 [zfs]
[ 2055.239197]  zpl_write_common_iovec+0xac/0x120 [zfs]
[ 2055.239419]  zpl_iter_write+0xe4/0x150 [zfs]
[ 2055.239471]  new_sync_write+0x100/0x1a8
[ 2055.239480]  __vfs_write+0x74/0x90
[ 2055.239489]  vfs_write+0xe4/0x1c8
[ 2055.239505]  ksys_write+0x78/0x100
[ 2055.239514]  __arm64_sys_write+0x24/0x30
[ 2055.239524]  el0_svc_common.constprop.0+0x80/0x218
[ 2055.239532]  el0_svc_handler+0x34/0xa0
[ 2055.239541]  el0_svc+0x10/0x2cc
[ 2055.239555] INFO: task kworker/u8:3:35327 blocked for more than 120 seconds.
[ 2055.246754]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[ 2055.253681] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 2055.261670] kworker/u8:3D0 35327  2 0x0028
[ 2055.261715] Workqueue: writeback wb_workfn (flush-zfs-1)
[ 2055.261728] Call trace:
[ 2055.261740]  __switch_to+0x104/0x170
[ 2055.261750]  __schedule+0x30c/0x7c0
[ 2055.261758]  schedule+0x3c/0xb8
[ 2055.261798]  cv_wait_common+0x188/0x1b0 [spl]
[ 2055.261834]  __cv_wait+0x30/0x40 [spl]
[ 2055.262059]  zil_commit_impl+0x234/0xd30 [zfs]
[ 2055.262284]  zil_commit+0x48/0x70 [zfs]

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-11 Thread Tyson Key
With deduplication disabled, it eventually gets further with the backup,
but still stalls, after leaving it, overnight:

[12084.274242] INFO: task z_wr_iss_h:2157 blocked for more than 120 seconds.
[12084.281171]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[12084.288126] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[12084.296127] z_wr_iss_h  D0  2157  2 0x0028
[12084.296142] Call trace:
[12084.296163]  __switch_to+0x104/0x170
[12084.296174]  __schedule+0x30c/0x7c0
[12084.296183]  schedule+0x3c/0xb8
[12084.296191]  io_schedule+0x20/0x58
[12084.296204]  rq_qos_wait+0x100/0x178
[12084.296213]  wbt_wait+0xb4/0xf0
[12084.296223]  __rq_qos_throttle+0x38/0x50
[12084.296235]  blk_mq_make_request+0x128/0x610
[12084.296247]  generic_make_request+0xb4/0x2d8
[12084.296258]  submit_bio+0x48/0x218
[12084.296496]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[12084.296716]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[12084.296935]  zio_nowait+0xd4/0x170 [zfs]
[12084.297151]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[12084.297369]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[12084.297586]  zio_execute+0xac/0x110 [zfs]
[12084.297626]  taskq_thread+0x2f8/0x570 [spl]
[12084.297639]  kthread+0xfc/0x128
[12084.297650]  ret_from_fork+0x10/0x1c
[12084.297678] INFO: task txg_quiesce:3077 blocked for more than 120 seconds.
[12084.304712]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[12084.311641] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[12084.319632] txg_quiesce D0  3077  2 0x0028
[12084.319646] Call trace:
[12084.319668]  __switch_to+0x104/0x170
[12084.319679]  __schedule+0x30c/0x7c0
[12084.319691]  schedule+0x3c/0xb8
[12084.319731]  cv_wait_common+0x188/0x1b0 [spl]
[12084.319767]  __cv_wait+0x30/0x40 [spl]
[12084.319990]  txg_quiesce_thread+0x27c/0x380 [zfs]
[12084.320028]  thread_generic_wrapper+0x74/0xa0 [spl]
[12084.320050]  kthread+0xfc/0x128
[12084.320062]  ret_from_fork+0x10/0x1c
[12084.320133] INFO: task borg:13631 blocked for more than 120 seconds.
[12084.326632]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[12084.333562] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[12084.341550] borgD0 13631  13630 0x
[12084.341564] Call trace:
[12084.341578]  __switch_to+0x104/0x170
[12084.341588]  __schedule+0x30c/0x7c0
[12084.341596]  schedule+0x3c/0xb8
[12084.341635]  cv_wait_common+0x188/0x1b0 [spl]
[12084.341670]  __cv_wait+0x30/0x40 [spl]
[12084.341895]  zil_commit_impl+0x234/0xd30 [zfs]
[12084.342112]  zil_commit+0x48/0x70 [zfs]
[12084.342331]  zfs_write+0xa3c/0xb90 [zfs]
[12084.342588]  zpl_write_common_iovec+0xac/0x120 [zfs]
[12084.342817]  zpl_iter_write+0xe4/0x150 [zfs]
[12084.342830]  new_sync_write+0x100/0x1a8
[12084.342839]  __vfs_write+0x74/0x90
[12084.342855]  vfs_write+0xe4/0x1c8
[12084.342865]  ksys_write+0x78/0x100
[12084.342874]  __arm64_sys_write+0x24/0x30
[12084.342884]  el0_svc_common.constprop.0+0x80/0x218
[12084.342892]  el0_svc_handler+0x34/0xa0
[12084.342902]  el0_svc+0x10/0x2cc
[12084.342927] INFO: task z_wr_iss_h:114156 blocked for more than 120 seconds.
[12084.350038]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[12084.356966] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[12084.364950] z_wr_iss_h  D0 114156  2 0x0028
[12084.364964] Call trace:
[12084.364981]  __switch_to+0x104/0x170
[12084.364990]  __schedule+0x30c/0x7c0
[12084.364998]  schedule+0x3c/0xb8
[12084.365006]  io_schedule+0x20/0x58
[12084.365019]  rq_qos_wait+0x100/0x178
[12084.365027]  wbt_wait+0xb4/0xf0
[12084.365037]  __rq_qos_throttle+0x38/0x50
[12084.365050]  blk_mq_make_request+0x128/0x610
[12084.365061]  generic_make_request+0xb4/0x2d8
[12084.365072]  submit_bio+0x48/0x218
[12084.365297]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[12084.365516]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[12084.365733]  zio_nowait+0xd4/0x170 [zfs]
[12084.365949]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[12084.366167]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[12084.366427]  zio_execute+0xac/0x110 [zfs]
[12084.366476]  taskq_thread+0x2f8/0x570 [spl]
[12084.366491]  kthread+0xfc/0x128
[12084.366502]  ret_from_fork+0x10/0x1c
[12084.366514] INFO: task z_wr_iss_h:114161 blocked for more than 120 seconds.
[12084.373624]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[12084.380553] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[12084.388541] z_wr_iss_h  D0 114161  2 0x0028
[12084.388555] Call trace:
[12084.388570]  __switch_to+0x104/0x170
[12084.388580]  __schedule+0x30c/0x7c0
[12084.388588]  schedule+0x3c/0xb8
[12084.388596]  io_schedule+0x20/0x58
[12084.388608]  rq_qos_wait+0x100/0x178
[12084.388616]  wbt_wait+0xb4/0xf0
[12084.388627]  __rq_qos_throttle+0x38/0x50
[12084.388638]  blk_mq_make_request+0x128/0x610
[12084.388650]  generic_make_request+0xb4/0x2d8
[12084.388660]  

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-11 Thread Tyson Key
In the meantime, I'll see if I can temporarily disable ZPool-level
deduplication, and retry the backup run, again, on the Windows machine.

Whilst it's not the perfect long-term solution, I might look into using
offline deduplication, for older, infrequently-accessed data in the pool
(probably with Borg/ZBackup), since this machine is mostly-used for bulk
storage of archival/back-up data, and file/HTTP serving, in a small
network.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1899249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-11 Thread Tyson Key
It looks like writing snapshots, with "zfs snapshot" will sometimes
stall, even if other commands, like "zfs status" appear to work,
occasionally, too.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1899249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-11 Thread Tyson Key
Thanks for your reply (I expected this to go into the ether, since it
seems to be a very common issue, with I/O, on this hardware platform).

I did try switching from LZJB, to LZ4, and got slightly-better
performance, and reliability, from at least testing with backing up a
Windows 10 machine, using WSL, and Borg, over SSH - however, I can still
lock the system up, after several runs, or get it to a stage where I can
send commands, over an open SSH session, but spawning a new one fails,
since it can't access my home directory (which is stored on the ZFS
array).

My last log, with Borg, is:

[   22.536219] bcmgenet: Skipping UMAC reset
[   22.537803] bcmgenet fd58.ethernet: configuring instance for external 
RGMII
[   22.538029] bcmgenet fd58.ethernet eth0: Link is Down
[   25.140217] new mount options do not match the existing superblock, will be 
ignored
[   26.751285] brcmfmac: brcmf_cfg80211_set_power_mgmt: power save enabled
[   27.651756] bcmgenet fd58.ethernet eth0: Link is Up - 1Gbps/Full - flow 
control rx/tx
[   27.651790] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   29.696209] Started bpfilter
[   29.701240] bpfilter: Loaded bpfilter_umh pid 3166
[   29.858264] bridge: filtering via arp/ip/ip6tables is no longer available by 
default. Update your scripts to load br_netfilter if you need this.
[   30.379027] kauditd_printk_skb: 27 callbacks suppressed
[   30.379032] audit: type=1400 audit(1602433969.316:39): apparmor="STATUS" 
operation="profile_replace" info="same as current profile, skipping" 
profile="unconfined" name="/usr/bin/lxc-start" pid=3259 comm="apparmor_parser"
[   30.457589] audit: type=1400 audit(1602433969.396:40): apparmor="STATUS" 
operation="profile_replace" info="same as current profile, skipping" 
profile="unconfined" name="lxc-container-default" pid=3266 
comm="apparmor_parser"
[   30.457603] audit: type=1400 audit(1602433969.396:41): apparmor="STATUS" 
operation="profile_replace" info="same as current profile, skipping" 
profile="unconfined" name="lxc-container-default-cgns" pid=3266 
comm="apparmor_parser"
[   30.457620] audit: type=1400 audit(1602433969.396:42): apparmor="STATUS" 
operation="profile_replace" info="same as current profile, skipping" 
profile="unconfined" name="lxc-container-default-with-mounting" pid=3266 
comm="apparmor_parser"
[   30.457629] audit: type=1400 audit(1602433969.396:43): apparmor="STATUS" 
operation="profile_replace" info="same as current profile, skipping" 
profile="unconfined" name="lxc-container-default-with-nesting" pid=3266 
comm="apparmor_parser"
[   36.556387] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[   36.556393] Bluetooth: BNEP filters: protocol multicast
[   36.556407] Bluetooth: BNEP socket layer initialized
[   74.720044] Adding 16777212k swap on /dev/zd0.  Priority:-2 extents:1 
across:16777212k SSFS
[10030.116676] INFO: task z_rd_int:2154 blocked for more than 120 seconds.
[10030.123432]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[10030.130389] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[10030.138388] z_rd_intD0  2154  2 0x0028
[10030.138403] Call trace:
[10030.138424]  __switch_to+0x104/0x170
[10030.138435]  __schedule+0x30c/0x7c0
[10030.138444]  schedule+0x3c/0xb8
[10030.138453]  io_schedule+0x20/0x58
[10030.138466]  rq_qos_wait+0x100/0x178
[10030.138475]  wbt_wait+0xb4/0xf0
[10030.138485]  __rq_qos_throttle+0x38/0x50
[10030.138497]  blk_mq_make_request+0x128/0x610
[10030.138509]  generic_make_request+0xb4/0x2d8
[10030.138519]  submit_bio+0x48/0x218
[10030.138756]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[10030.138977]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[10030.139197]  zio_execute+0xac/0x110 [zfs]
[10030.139414]  vdev_queue_io_done+0x18c/0x2a0 [zfs]
[10030.139632]  zio_vdev_io_done+0xec/0x220 [zfs]
[10030.139849]  zio_execute+0xac/0x110 [zfs]
[10030.139891]  taskq_thread+0x2f8/0x570 [spl]
[10030.139904]  kthread+0xfc/0x128
[10030.139915]  ret_from_fork+0x10/0x1c
[10030.139928] INFO: task z_rd_int:2158 blocked for more than 120 seconds.
[10030.146692]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
[10030.153619] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[10030.161608] z_rd_intD0  2158  2 0x0028
[10030.161622] Call trace:
[10030.161643]  __switch_to+0x104/0x170
[10030.161655]  __schedule+0x30c/0x7c0
[10030.161663]  schedule+0x3c/0xb8
[10030.161673]  io_schedule+0x20/0x58
[10030.161686]  rq_qos_wait+0x100/0x178
[10030.161694]  wbt_wait+0xb4/0xf0
[10030.161705]  __rq_qos_throttle+0x38/0x50
[10030.161717]  blk_mq_make_request+0x128/0x610
[10030.161729]  generic_make_request+0xb4/0x2d8
[10030.161739]  submit_bio+0x48/0x218
[10030.161965]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[10030.162186]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[10030.162416]  zio_nowait+0xd4/0x170 [zfs]
[10030.162633]  vdev_queue_io_done+0x1ec/0x2a0 [zfs]
[10030.162859]  zio_vdev_io_done+0xec/0x220 

[Bug 1899249] Re: OpenZFS writing stalls, under load

2020-10-10 Thread Colin Ian King
It is worth noting that using dededuplication does consume a lot of
memory.

Deduplication tables consume memory and eventually spill over and
consume disk space - this causes extra read and write operations for
every block of data on which deduplication is attempted cause more I/O
waits.

A system with large pools with small memory areas does not perform
deduplication well.

So, if you run the following command on your pool (pool-ssd in this
example)

sudo zdb -s pool-ssd

..you see something like the following:

  pool: pool-ssd
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:31:14 with 0 errors on Wed Sep  2 
14:35:52 2020
config:

NAMESTATE READ WRITE 
CKSUM
pool-ssdONLINE   0 0
 0
  mirror-0  ONLINE   0 0
 0
ata-INTEL_SSDSC2BW480H6_CVTR608104NW480EGN  ONLINE   0 0
 0
ata-INTEL_SSDSC2BW480H6_CVTR552500S0480EGN  ONLINE   0 0
 0

errors: No known data errors

 dedup: DDT entries 4258028, size 448B on disk, 306B in core

bucket  allocated   referenced  
__   __   __
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
--   --   -   -   -   --   -   -   -
 13.63M464G226G226G3.63M464G226G226G
 2 376K   46.1G   24.3G   24.3G 822K100G   53.2G   53.2G
 451.3K   5.60G   2.51G   2.51G 240K   26.3G   11.8G   11.8G
 87.49K925M544M544M77.9K   9.41G   5.47G   5.47G
163.32K415M261M261M68.8K   8.41G   5.35G   5.35G
32  657   78.7M   62.1M   62.1M29.3K   3.51G   2.79G   2.79G
64  389   46.2M   42.7M   42.7M34.0K   4.03G   3.78G   3.78G
   128  248   30.1M   29.5M   29.5M44.8K   5.44G   5.32G   5.32G
   256  339   42.1M   40.8M   40.8M 123K   15.4G   14.9G   14.9G
   512  374   46.8M   46.1M   46.1M 271K   33.9G   33.5G   33.5G
1K  254   31.8M   31.3M   31.3M 355K   44.4G   43.8G   43.8G
2K5640K513K513K12.9K   1.61G   1.20G   1.20G
4K4512K512K512K22.7K   2.84G   2.84G   2.84G
8K8  1M897K897K89.3K   11.2G   10.1G   10.1G
   64K2256K  6K  6K 226K   28.3G679M679M
 Total4.06M517G254G254G5.99M759G421G421G

---
Looking at the "dedup: DDT entries 4258028, size 448B on disk, 306B in core" 
line, we have:

4258028 de-dup entries, each entry used 306 bytes, so that's ~1.2GB of
memory for just the de-dup/ table.  But it is capped at 1/4 the size of
the ARC, so you may find on a small memory system such as a raspberry pi
that a lot of the dedup table can't be stored in memory and needs to be
on-disk, and this increases the I/O waits.

It may be an idea to disable de-dup and see if it helps to resolve your
issue on a comparatively slow and memory constrained system such as a
raspberry pi when you have large pools

** Changed in: zfs-linux (Ubuntu)
   Status: New => Incomplete

** Changed in: zfs-linux (Ubuntu)
   Importance: Undecided => Medium

** Changed in: zfs-linux (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1899249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs