On 7/12/22 1:41 AM, Evgeniy Khramtsov wrote:
I can reproduce via:

$ truncate -s 10G /tmp/test
$ mdconfig -f /tmp/test -S 4096
$ zpool create test /dev/md1
$ zfs create -o checksum=blake3 test/b
$ dd if=/dev/random of=/test/b/noise bs=1M count=4096
$ sync
$ zpool scrub test
$ zpool status

I cannot reproduce this on openzfs/zfs@cb01da68057 (the commit that was
most recently merged) built out of tree on either stable/13 70fd40edb86
or main 9aa02d5120a.

I'll update a system and see if I can reproduce it with the in-tree ZFS.

- Ryan

It did not reproduce for me with in-tree ZFS on main@3c9ad9398fcd either.

Could you share the output of sysctl kstat.zfs.misc.chksum_bench? Maybe we
are using different implementations.
I do see that blake3 went in with only a Linux module parameter for the
implementation selection, so I'll have to fix that. For now we can at least
see which was fastest, which should be the one selected. You just won't be
able to manually change it to see if that helps.
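
When comparing two machines it may be enough to look only at the blake3
rows of that benchmark table. A minimal sketch, assuming the kstat prints a
header line followed by one row per implementation with the implementation
name in the first column (the helper name blake3_rows is made up for
illustration):

```sh
#!/bin/sh
# Hypothetical helper: keep the header line and the blake3 rows of the
# checksum benchmark table read from stdin.
# Assumption: line 1 is a header and every later line starts with the
# implementation name (for example "blake3-generic").
blake3_rows() {
    awk 'NR == 1 || $1 ~ /^blake3/'
}

# Usage on FreeBSD:
#   sysctl -n kstat.zfs.misc.chksum_bench | blake3_rows
```

The row with the highest throughput should correspond to the implementation
that gets selected automatically.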

- Ryan

I found the culprit: I had forgotten about my local sysctl.conf. (Kernel and
base are from kernel.txz and base.txz on download.FreeBSD.org.) It contains:

kern.sched.steal_thresh=1
kern.sched.preempt_thresh=121

Then

#!/bin/sh

truncate -s 10G /tmp/test
mdconfig -f /tmp/test -S 4096
zpool create test /dev/md0
zfs create -o checksum=blake3 test/b
dd if=/dev/random of=/test/b/noise bs=1M count=4096
sync
zpool scrub test
sleep 3
zpool status

zpool destroy test
mdconfig -d -u 0
rm /tmp/test
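
To check that the two tunables really are the trigger, they can be applied
only while the reproducer runs and then put back. A sketch, assuming it is
run as root on FreeBSD; with_tunables and the file name repro.sh are
hypothetical names, not part of any tool:

```sh
#!/bin/sh
# Sketch: apply the two ULE tunables from sysctl.conf only for the
# duration of a command, then restore the values that were set before.
# Assumption: run as root on FreeBSD; with_tunables is a made-up name.
with_tunables() {
    old_steal=$(sysctl -n kern.sched.steal_thresh) || return 1
    old_preempt=$(sysctl -n kern.sched.preempt_thresh) || return 1

    sysctl kern.sched.steal_thresh=1 kern.sched.preempt_thresh=121 >/dev/null

    "$@"
    status=$?

    # Put the previous values back regardless of how the command exited.
    sysctl kern.sched.steal_thresh="$old_steal" \
           kern.sched.preempt_thresh="$old_preempt" >/dev/null
    return "$status"
}

# Usage, with the script above saved as repro.sh (hypothetical file name):
#   with_tunables sh ./repro.sh
```

Running the same script once with and once without with_tunables should
show whether the scrub errors follow the scheduler settings.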

As for the ULE "tuning": these values give me fine desktop interactivity
while building lang/rust, where nice and idprio did not help, so I left
them in sysctl.conf. I am not sure whether scheduling parameters are worth
a ZFS PR; maybe something essential is being preempted.

It could be a missing fpu_kern_enter/leave pair that a lack of preemption
would paper over.  I thought that omitting it would panic the kernel,
though, because FPU instructions (including vector instructions) are
disabled there.  Maybe ZFS isn't using fpu_kern_enter() with
FPU_KERN_NOCTX and is instead trying to juggle contexts, and has a bug in
how it manages and reuses saved FPU contexts?  If so, I would suggest
that ZFS switch to FPU_KERN_NOCTX, which runs all SSE-type code in a
critical section to disable preemption but avoids having to allocate and
manage FPU contexts.

--
John Baldwin
