Re: BLAKE3 unstability?
On 7/12/22 1:41 AM, Evgeniy Khramtsov wrote:
> > > > I can reproduce via:
> > > >
> > > > $ truncate -s 10G /tmp/test
> > > > $ mdconfig -f /tmp/test -S 4096
> > > > $ zpool create test /dev/md1
> > > > $ zfs create -o checksum=blake3 test/b
> > > > $ dd if=/dev/random of=/test/b/noise bs=1M count=4096
> > > > $ sync
> > > > $ zpool scrub test
> > > > $ zpool status
> > >
> > > I cannot reproduce this on openzfs/zfs@cb01da68057 (the commit that was
> > > most recently merged) built out of tree on either stable/13 70fd40edb86
> > > or main 9aa02d5120a.
> > >
> > > I'll update a system and see if I can reproduce it with the in-tree ZFS.
> > >
> > > - Ryan
> >
> > It did not reproduce for me with in-tree ZFS on main@3c9ad9398fcd either.
> >
> > Could you share sysctl kstat.zfs.misc.chksum_bench, maybe we are using
> > different implementations?
> > I do see that blake3 went in with only a Linux module parameter for the
> > implementation selection, so I'll have to fix that. For now we can at least
> > see which was fastest, which should be the one selected. You just won't be
> > able to manually change it to see if that helps.
> >
> > - Ryan
>
> I found the culprit (kernel and base from download.FreeBSD.org
> kernel.txz and base.txz respectively) (I forgot about local
> sysctl.conf...):
>
> kern.sched.steal_thresh=1
> kern.sched.preempt_thresh=121
>
> Then
>
> #!/bin/sh
> truncate -s 10G /tmp/test
> mdconfig -f /tmp/test -S 4096
> zpool create test /dev/md0
> zfs create -o checksum=blake3 test/b
> dd if=/dev/random of=/test/b/noise bs=1M count=4096
> sync
> zpool scrub test
> sleep 3
> zpool status
> zpool destroy test
> mdconfig -d -u 0
> rm /tmp/test
>
> As for ULE "tuning", these values give me fine desktop interactivity
> when building lang/rust when nice and idprio did not help, so I left
> them in sysctl.conf. Not sure if scheduling parameters are worthy of
> a ZFS PR, maybe something essential is preempted.

It could be a missing fpu_kern_enter/leave that a lack of preemption
would cover over. I thought that missing that would give a panic in the
kernel, though, due to FPU instructions being disabled (including vector
instructions).

Maybe ZFS isn't using fpu_kern_enter() with FPU_KERN_NOCTX and is instead
trying to juggle contexts, and it has a bug in how it manages saved FPU
contexts and reuses a context? If so, I would just suggest that ZFS
switch to using FPU_KERN_NOCTX instead, which runs all SSE-type code in a
critical section to disable preemption but avoids having to allocate and
manage FPU contexts.

--
John Baldwin
Re: BLAKE3 unstability?
> > > I can reproduce via:
> > >
> > > $ truncate -s 10G /tmp/test
> > > $ mdconfig -f /tmp/test -S 4096
> > > $ zpool create test /dev/md1
> > > $ zfs create -o checksum=blake3 test/b
> > > $ dd if=/dev/random of=/test/b/noise bs=1M count=4096
> > > $ sync
> > > $ zpool scrub test
> > > $ zpool status
> >
> > I cannot reproduce this on openzfs/zfs@cb01da68057 (the commit that was
> > most recently merged) built out of tree on either stable/13 70fd40edb86
> > or main 9aa02d5120a.
> >
> > I'll update a system and see if I can reproduce it with the in-tree ZFS.
> >
> > - Ryan
>
> It did not reproduce for me with in-tree ZFS on main@3c9ad9398fcd either.
>
> Could you share sysctl kstat.zfs.misc.chksum_bench, maybe we are using
> different implementations?
> I do see that blake3 went in with only a Linux module parameter for the
> implementation selection, so I'll have to fix that. For now we can at least
> see which was fastest, which should be the one selected. You just won't be
> able to manually change it to see if that helps.
>
> - Ryan

I found the culprit (kernel and base from download.FreeBSD.org
kernel.txz and base.txz respectively) (I forgot about local
sysctl.conf...):

kern.sched.steal_thresh=1
kern.sched.preempt_thresh=121

Then

#!/bin/sh
truncate -s 10G /tmp/test
mdconfig -f /tmp/test -S 4096
zpool create test /dev/md0
zfs create -o checksum=blake3 test/b
dd if=/dev/random of=/test/b/noise bs=1M count=4096
sync
zpool scrub test
sleep 3
zpool status
zpool destroy test
mdconfig -d -u 0
rm /tmp/test

As for ULE "tuning", these values give me fine desktop interactivity
when building lang/rust when nice and idprio did not help, so I left
them in sysctl.conf. Not sure if scheduling parameters are worthy of
a ZFS PR, maybe something essential is preempted.
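Since the trigger here turned out to be a forgotten local sysctl.conf, a quick way to spot such overrides before filing a report might look like the sketch below. The function name sched_overrides is made up for illustration, and the default path /etc/sysctl.conf is an assumption; it only lists uncommented kern.sched.* assignments in the given file and does not look at the running kernel.

```shell
#!/bin/sh
# sched_overrides (hypothetical helper): print every uncommented
# kern.sched.* assignment found in a sysctl.conf-style file.
# $1: file to inspect; defaults to /etc/sysctl.conf (assumed location).
# Returns non-zero when no such assignment is present.
sched_overrides() {
    grep -E '^[[:space:]]*kern\.sched\.[A-Za-z0-9_.]+[[:space:]]*=' \
        "${1:-/etc/sysctl.conf}"
}
```

The values actually in effect can then be compared by hand with `sysctl kern.sched.steal_thresh kern.sched.preempt_thresh`.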
Re: BLAKE3 unstability?
(forgot to CC -CURRENT, so replying to the list)

> It did not reproduce for me with in-tree ZFS on main@3c9ad9398fcd either.
>
> Could you share sysctl kstat.zfs.misc.chksum_bench, maybe we are using
> different implementations?
> I do see that blake3 went in with only a Linux module parameter for the
> implementation selection, so I'll have to fix that. For now we can at least
> see which was fastest, which should be the one selected. You just won't be
> able to manually change it to see if that helps.
>
> - Ryan

$ sysctl kstat.zfs.misc.chksum_bench
kstat.zfs.misc.chksum_bench:
implementation      1k    4k   16k   64k  256k    1m    4m
edonr-generic     1358  1580  1642  1642  1621  1560  1525
skein-generic      238   252   256   256   256   256   254
sha256-generic     242   263   269   271   271   271   271
sha512-generic     373   416   428   430   431   431   431
blake3-generic     482   478   477   474   473   473   473
blake3-sse2        338  1403  1505  1526  1519  1504  1503
blake3-sse41       350  1602  1725  1758  1753  1747  1747
blake3-avx2        350  1874  3336  3550  3527  3492  3485

Could it be due to SIMD? I can try a patch. I can also try with GENERIC
kernel tomorrow as it is the only local modification left. I thought it
also could be damaged hardware, but AVX2 torture tests and memtest runs
fine.

Thanks for investigating this issue.
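To read "which was fastest" off a benchmark table like the one above without eyeballing it, something along these lines could help. This is only a sketch: fastest_impl is a made-up name, it assumes whitespace-separated columns with a header row starting with "implementation", and the choice of which block-size column to compare on is left to the caller, because the thread does not say which column ZFS itself uses for the selection.

```shell
#!/bin/sh
# fastest_impl (hypothetical helper): read a chksum_bench-style table on
# stdin and print the implementation of one algorithm with the highest
# figure in the chosen block-size column.
# $1: algorithm prefix, e.g. blake3
# $2: column name as it appears in the header row, e.g. 256k
# Exits non-zero if no matching row (or column) was found.
fastest_impl() {
    awk -v algo="$1" -v col="$2" '
        # Header row: remember the index of the requested column.
        $1 == "implementation" {
            for (i = 2; i <= NF; i++)
                if ($i == col) c = i
            next
        }
        # Data rows named "<algo>-<impl>": keep the largest value seen.
        c && index($1, algo "-") == 1 && $c + 0 > best {
            best = $c + 0
            name = $1
        }
        END {
            if (name == "") exit 1
            print name
        }
    '
}
```

It could be fed directly from the kstat, for example `sysctl -n kstat.zfs.misc.chksum_bench | fastest_impl blake3 256k`.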
Re: BLAKE3 unstability?
On 7/11/22 11:43 AM, Ryan Moeller wrote:
> On 7/9/22 1:56 PM, Evgeniy Khramtsov wrote:
> > I can reproduce via:
> >
> > $ truncate -s 10G /tmp/test
> > $ mdconfig -f /tmp/test -S 4096
> > $ zpool create test /dev/md1
> > $ zfs create -o checksum=blake3 test/b
> > $ dd if=/dev/random of=/test/b/noise bs=1M count=4096
> > $ sync
> > $ zpool scrub test
> > $ zpool status
>
> I cannot reproduce this on openzfs/zfs@cb01da68057 (the commit that was
> most recently merged) built out of tree on either stable/13 70fd40edb86
> or main 9aa02d5120a.
>
> I'll update a system and see if I can reproduce it with the in-tree ZFS.
>
> - Ryan

It did not reproduce for me with in-tree ZFS on main@3c9ad9398fcd either.

Could you share sysctl kstat.zfs.misc.chksum_bench, maybe we are using
different implementations?
I do see that blake3 went in with only a Linux module parameter for the
implementation selection, so I'll have to fix that. For now we can at least
see which was fastest, which should be the one selected. You just won't be
able to manually change it to see if that helps.

- Ryan
Re: BLAKE3 unstability?
On 7/9/22 1:56 PM, Evgeniy Khramtsov wrote:
> I can reproduce via:
>
> $ truncate -s 10G /tmp/test
> $ mdconfig -f /tmp/test -S 4096
> $ zpool create test /dev/md1
> $ zfs create -o checksum=blake3 test/b
> $ dd if=/dev/random of=/test/b/noise bs=1M count=4096
> $ sync
> $ zpool scrub test
> $ zpool status

I cannot reproduce this on openzfs/zfs@cb01da68057 (the commit that was
most recently merged) built out of tree on either stable/13 70fd40edb86
or main 9aa02d5120a.

I'll update a system and see if I can reproduce it with the in-tree ZFS.

- Ryan
Re: BLAKE3 unstability?
On Sat, Jul 9, 2022 at 11:57 AM Evgeniy Khramtsov wrote:
> I can reproduce via:
>
> $ truncate -s 10G /tmp/test
> $ mdconfig -f /tmp/test -S 4096
> $ zpool create test /dev/md1
> $ zfs create -o checksum=blake3 test/b
> $ dd if=/dev/random of=/test/b/noise bs=1M count=4096
> $ sync
> $ zpool scrub test
> $ zpool status

OK. I'll leave it to the OpenZFS folks then. The boot path isn't involved
in this at all :)

Thanks for the reproducer.

Warner
Re: BLAKE3 unstability?
I can reproduce via:

$ truncate -s 10G /tmp/test
$ mdconfig -f /tmp/test -S 4096
$ zpool create test /dev/md1
$ zfs create -o checksum=blake3 test/b
$ dd if=/dev/random of=/test/b/noise bs=1M count=4096
$ sync
$ zpool scrub test
$ zpool status
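For anyone rerunning the reproducer in a loop, the last step can be reduced to a single number instead of reading zpool status by eye. The helper below is only a sketch: cksum_total is a made-up name, and it assumes the usual config table layout with a "NAME STATE READ WRITE CKSUM" header and exact counts (as printed by `zpool status -p`). It adds up the CKSUM column over every row of that table, so the pool row and its vdev rows are all counted; treat the result as "zero or not" rather than as an exact error count.

```shell
#!/bin/sh
# cksum_total (hypothetical helper): read `zpool status -p` output on
# stdin and print the sum of the CKSUM column of the config table.
cksum_total() {
    awk '
        # The config table starts at its header row.
        $1 == "NAME" && $NF == "CKSUM" { in_cfg = 1; next }
        # A blank line ends the table.
        in_cfg && NF == 0 { in_cfg = 0 }
        # Rows with all five columns: NAME STATE READ WRITE CKSUM.
        in_cfg && NF >= 5 { total += $5 }
        END { print total + 0 }
    '
}
```

In the reproducer this would replace the bare status call, for example `zpool status -p test | cksum_total`.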
Re: BLAKE3 unstability?
Be advised that blake3 went into the bootloader yesterday and has been only
lightly tested. Don't think this would cause your checksum errors since the
boot loader is read only.

Warner

On Sat, Jul 9, 2022, 10:27 AM Evgeniy Khramtsov wrote:
> -CURRENT as of:
>
> commit eec3290266bc09b4c4b4d875d2269d611adc0016 (main)
> Author: Ganbold Tsagaankhuu
> Date: Sat Jul 9 13:06:52 2022 +
>
>     Add RK3568 SoC support to pinctrl driver.
>
>     Submitted by: sos
>     Reviewed by: manu
>     Differential Revision: https://reviews.freebsd.org/D31330
>
> New boot environment created via bectl, zfs set checksum=blake3 new_be,
> pkg -r /tmp/be_N upgrade, reboot into new BE, login failed after
> sysutils/pam_xdg failed integrity check.
>
> Rebooted into previous BE, did zpool scrub and got 32 checksum errors
> without files specified before getting panic (no trace due to drm-kmod?).
>
> bectl destroy / zfs destroy of new env with blake3 restored pool back to
> 0 checksum error state.
>
> No out-of-tree base modifications present, only a custom kernel config.