Re: BLAKE3 unstability?

2022-07-12 John Baldwin

On 7/12/22 1:41 AM, Evgeniy Khramtsov wrote:

> > > > I can reproduce via:
> > > >
> > > > $ truncate -s 10G /tmp/test
> > > > $ mdconfig -f /tmp/test -S 4096
> > > > $ zpool create test /dev/md1
> > > > $ zfs create -o checksum=blake3 test/b
> > > > $ dd if=/dev/random of=/test/b/noise bs=1M count=4096
> > > > $ sync
> > > > $ zpool scrub test
> > > > $ zpool status
> > >
> > > I cannot reproduce this on openzfs/zfs@cb01da68057 (the commit that was
> > > most recently merged) built out of tree on either stable/13 70fd40edb86
> > > or main 9aa02d5120a.
> > >
> > > I'll update a system and see if I can reproduce it with the in-tree ZFS.
> > >
> > > - Ryan
> > >
> > It did not reproduce for me with in-tree ZFS on main@3c9ad9398fcd either.
> >
> > Could you share the output of sysctl kstat.zfs.misc.chksum_bench? Maybe we
> > are using different implementations.
> > I do see that blake3 went in with only a Linux module parameter for the
> > implementation selection, so I'll have to fix that. For now we can at least
> > see which was fastest, which should be the one selected. You just won't be
> > able to manually change it to see if that helps.
> >
> > - Ryan
>
> I found the culprit (I had forgotten about my local sysctl.conf; kernel and
> base are kernel.txz and base.txz from download.FreeBSD.org):
>
> kern.sched.steal_thresh=1
> kern.sched.preempt_thresh=121
>
> With those set, run:
>
> #!/bin/sh
>
> truncate -s 10G /tmp/test
> mdconfig -f /tmp/test -S 4096
> zpool create test /dev/md0
> zfs create -o checksum=blake3 test/b
> dd if=/dev/random of=/test/b/noise bs=1M count=4096
> sync
> zpool scrub test
> sleep 3
> zpool status
>
> zpool destroy test
> mdconfig -d -u 0
> rm /tmp/test
>
> As for the ULE "tuning": these values give me good desktop interactivity
> while building lang/rust, where nice and idprio did not help, so I left
> them in sysctl.conf. I'm not sure whether scheduler parameters warrant a
> ZFS PR; maybe something essential is being preempted.


It could be a missing fpu_kern_enter/leave pair that the absence of
preemption would paper over.  I thought that missing those calls would
cause a kernel panic, though, since FPU instructions (including vector
instructions) are disabled in the kernel.  Maybe ZFS isn't using
fpu_kern_enter(FPU_KERN_NOCTX) and is instead trying to juggle contexts,
and it has a bug in how it manages saved FPU contexts and reuses a
context?  If so, I would just suggest that ZFS switch to using
FPU_KERN_NOCTX instead, which runs all SSE-type code in a critical
section to disable preemption but avoids having to allocate and manage
FPU contexts.

--
John Baldwin



Re: BLAKE3 unstability?

2022-07-12 Evgeniy Khramtsov
> > > I can reproduce via:
> > > 
> > > $ truncate -s 10G /tmp/test
> > > $ mdconfig -f /tmp/test -S 4096
> > > $ zpool create test /dev/md1
> > > $ zfs create -o checksum=blake3 test/b
> > > $ dd if=/dev/random of=/test/b/noise bs=1M count=4096
> > > $ sync
> > > $ zpool scrub test
> > > $ zpool status
> > 
> > I cannot reproduce this on openzfs/zfs@cb01da68057 (the commit that was
> > most recently merged) built out of tree on either stable/13 70fd40edb86
> > or main 9aa02d5120a.
> > 
> > I'll update a system and see if I can reproduce it with the in-tree ZFS.
> > 
> > - Ryan
> > 
> It did not reproduce for me with in-tree ZFS on main@3c9ad9398fcd either.
> 
> Could you share the output of sysctl kstat.zfs.misc.chksum_bench? Maybe we
> are using different implementations.
> I do see that blake3 went in with only a Linux module parameter for the
> implementation selection, so I'll have to fix that. For now we can at least
> see which was fastest, which should be the one selected. You just won't be
> able to manually change it to see if that helps.
> 
> - Ryan

I found the culprit (I had forgotten about my local sysctl.conf; kernel and
base are kernel.txz and base.txz from download.FreeBSD.org):

kern.sched.steal_thresh=1
kern.sched.preempt_thresh=121

With those set, run:

#!/bin/sh

truncate -s 10G /tmp/test
mdconfig -f /tmp/test -S 4096
zpool create test /dev/md0
zfs create -o checksum=blake3 test/b
dd if=/dev/random of=/test/b/noise bs=1M count=4096
sync
zpool scrub test
sleep 3
zpool status

zpool destroy test
mdconfig -d -u 0
rm /tmp/test
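The script above hardcodes /dev/md0 (the earlier reproducer ended up on md1) and only sleeps 3 seconds before reading the scrub status. A more defensive variant, as a sketch: it assumes mdconfig prints the allocated unit name when -u is not given, waits for the scrub with zpool scrub -w, and cleans up on exit. The function name is hypothetical; it must run as root on FreeBSD and destroys any pool named test.

```sh
#!/bin/sh
# repro_blake3_scrub (hypothetical name): same steps as the reproducer,
# run in a subshell so that set -e and the EXIT trap stay local.
repro_blake3_scrub() (
	set -eu
	img=/tmp/test
	md=
	cleanup() {
		zpool destroy test 2>/dev/null || true
		if [ -n "$md" ]; then
			# Strip the "md" prefix so -u gets a bare unit number.
			mdconfig -d -u "${md#md}" || true
		fi
		rm -f "$img"
	}
	trap cleanup EXIT

	truncate -s 10G "$img"
	# Without -u, mdconfig picks a free unit and prints its name (e.g. "md1").
	md=$(mdconfig -a -t vnode -f "$img" -S 4096)
	zpool create test "/dev/$md"
	zfs create -o checksum=blake3 test/b
	dd if=/dev/random of=/test/b/noise bs=1M count=4096
	sync
	# -w blocks until the scrub completes instead of sleeping a fixed time.
	zpool scrub -w test
	zpool status -v test
)
```

Waiting for the full scrub changes timing slightly compared with sleep 3; if the errors only show up mid-scrub, the original fixed delay may still be the better probe.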

As for the ULE "tuning": these values give me good desktop interactivity
while building lang/rust, where nice and idprio did not help, so I left
them in sysctl.conf. I'm not sure whether scheduler parameters warrant a
ZFS PR; maybe something essential is being preempted.
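To confirm that these two tunables are really the trigger, they can be applied only for a single test run and then restored. A minimal sketch; the wrapper name is hypothetical, and it relies only on sysctl -n to read the current values and on sysctl accepting several name=value pairs at once.

```sh
#!/bin/sh
# with_sched_tuning (hypothetical name): run a command with the two ULE
# tunables from this report applied, then put the previous values back.
# Needs root on FreeBSD.
with_sched_tuning() (
	old_steal=$(sysctl -n kern.sched.steal_thresh)
	old_preempt=$(sysctl -n kern.sched.preempt_thresh)
	# Restore the saved values when this subshell exits.
	trap 'sysctl kern.sched.steal_thresh="$old_steal" kern.sched.preempt_thresh="$old_preempt" >/dev/null' EXIT
	sysctl kern.sched.steal_thresh=1 kern.sched.preempt_thresh=121
	"$@"
)
```

For example, with the reproducer saved as ./repro.sh (a hypothetical file name): with_sched_tuning sh ./repro.sh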



Re: BLAKE3 unstability?

2022-07-11 Evgeniy Khramtsov
(forgot to CC -CURRENT, so replying to the list)

> It did not reproduce for me with in-tree ZFS on main@3c9ad9398fcd either.
> 
> Could you share the output of sysctl kstat.zfs.misc.chksum_bench? Maybe we
> are using different implementations.
> I do see that blake3 went in with only a Linux module parameter for the
> implementation selection, so I'll have to fix that. For now we can at least
> see which was fastest, which should be the one selected. You just won't be
> able to manually change it to see if that helps.
> 
> - Ryan
 
$ sysctl kstat.zfs.misc.chksum_bench
kstat.zfs.misc.chksum_bench:
implementation   1k   4k  16k  64k 256k   1m   4m
edonr-generic  1358 1580 1642 1642 1621 1560 1525
skein-generic   238  252  256  256  256  256  254
sha256-generic  242  263  269  271  271  271  271
sha512-generic  373  416  428  430  431  431  431
blake3-generic  482  478  477  474  473  473  473
blake3-sse2     338 1403 1505 1526 1519 1504 1503
blake3-sse41    350 1602 1725 1758 1753 1747 1747
blake3-avx2     350 1874 3336 3550 3527 3492 3485
 
Could it be due to SIMD? I can try a patch. I can also try a GENERIC
kernel tomorrow, as that is the only local modification left. I also
thought it could be faulty hardware, but AVX2 torture tests and memtest
run fine.
 
Thanks for investigating this issue.
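Since there was no FreeBSD knob for choosing the BLAKE3 implementation at this point, the benchmark table is the only way to tell which one is probably in use ("which was fastest, which should be the one selected"). A small sketch that picks the fastest implementation for a given block-size column; the helper name is hypothetical, and the assumption that the module selects by a particular column is not confirmed here, so the column is left to the caller.

```sh
#!/bin/sh
# fastest_impl PREFIX SIZE (hypothetical helper): read the output of
# `sysctl kstat.zfs.misc.chksum_bench` on stdin and print the fastest
# implementation named PREFIX-* in the column whose header is SIZE.
fastest_impl() {
	awk -v prefix="$1-" -v size="$2" '
		BEGIN { best = -1 }
		# The header row tells us which field holds the requested size.
		$1 == "implementation" {
			for (i = 2; i <= NF; i++) if ($i == size) col = i
			next
		}
		col && index($1, prefix) == 1 && $col + 0 > best {
			best = $col + 0; name = $1
		}
		END { if (name != "") { print name, best } else { exit 1 } }
	'
}
```

Usage: sysctl kstat.zfs.misc.chksum_bench | fastest_impl blake3 16k. On the table above this would point at blake3-avx2 for 16k, but at blake3-generic for 1k blocks, so the answer depends on which size the selection logic looks at.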



Re: BLAKE3 unstability?

2022-07-11 Ryan Moeller

On 7/11/22 11:43 AM, Ryan Moeller wrote:


> On 7/9/22 1:56 PM, Evgeniy Khramtsov wrote:
>
> > I can reproduce via:
> >
> > $ truncate -s 10G /tmp/test
> > $ mdconfig -f /tmp/test -S 4096
> > $ zpool create test /dev/md1
> > $ zfs create -o checksum=blake3 test/b
> > $ dd if=/dev/random of=/test/b/noise bs=1M count=4096
> > $ sync
> > $ zpool scrub test
> > $ zpool status
>
> I cannot reproduce this on openzfs/zfs@cb01da68057 (the commit that
> was most recently merged) built out of tree on either stable/13
> 70fd40edb86 or main 9aa02d5120a.
>
> I'll update a system and see if I can reproduce it with the in-tree ZFS.
>
> - Ryan


It did not reproduce for me with in-tree ZFS on main@3c9ad9398fcd either.

Could you share the output of sysctl kstat.zfs.misc.chksum_bench? Maybe we
are using different implementations.
I do see that blake3 went in with only a Linux module parameter for the 
implementation selection, so I'll have to fix that. For now we can at 
least see which was fastest, which should be the one selected. You just 
won't be able to manually change it to see if that helps.


- Ryan

Re: BLAKE3 unstability?

2022-07-11 Ryan Moeller



On 7/9/22 1:56 PM, Evgeniy Khramtsov wrote:

> I can reproduce via:
>
> $ truncate -s 10G /tmp/test
> $ mdconfig -f /tmp/test -S 4096
> $ zpool create test /dev/md1
> $ zfs create -o checksum=blake3 test/b
> $ dd if=/dev/random of=/test/b/noise bs=1M count=4096
> $ sync
> $ zpool scrub test
> $ zpool status


I cannot reproduce this on openzfs/zfs@cb01da68057 (the commit that was 
most recently merged) built out of tree on either stable/13 70fd40edb86 
or main 9aa02d5120a.


I'll update a system and see if I can reproduce it with the in-tree ZFS.

- Ryan




Re: BLAKE3 unstability?

2022-07-09 Warner Losh
On Sat, Jul 9, 2022 at 11:57 AM Evgeniy Khramtsov wrote:

> I can reproduce via:
>
> $ truncate -s 10G /tmp/test
> $ mdconfig -f /tmp/test -S 4096
> $ zpool create test /dev/md1
> $ zfs create -o checksum=blake3 test/b
> $ dd if=/dev/random of=/test/b/noise bs=1M count=4096
> $ sync
> $ zpool scrub test
> $ zpool status
>


OK. I'll leave it to the OpenZFS folks then. The boot path isn't involved
in this at all :) Thanks for the reproducer.

Warner


Re: BLAKE3 unstability?

2022-07-09 Evgeniy Khramtsov
I can reproduce via:

$ truncate -s 10G /tmp/test
$ mdconfig -f /tmp/test -S 4096
$ zpool create test /dev/md1
$ zfs create -o checksum=blake3 test/b
$ dd if=/dev/random of=/test/b/noise bs=1M count=4096
$ sync
$ zpool scrub test
$ zpool status
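Whether a run reproduces is judged by eye from zpool status here. A sketch of an automatic pass/fail check, assuming the usual NAME/STATE/READ/WRITE/CKSUM layout of the config table; the helper name is hypothetical. Feeding it zpool status -p gives exact counts instead of abbreviated ones such as 1.2K.

```sh
#!/bin/sh
# has_cksum_errors (hypothetical helper): read `zpool status` output on
# stdin; succeed (exit 0) if any vdev row reports a non-zero CKSUM count.
has_cksum_errors() {
	awk '
		$1 == "NAME" && $5 == "CKSUM" { in_cfg = 1; next }  # config table header
		in_cfg && NF == 0 { in_cfg = 0 }                     # blank line ends it
		in_cfg && NF >= 5 && $5 + 0 > 0 { found = 1 }
		END { exit (found ? 0 : 1) }
	'
}
```

Usage: zpool status -p test | has_cksum_errors && echo "checksum errors reproduced"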



Re: BLAKE3 unstability?

2022-07-09 Warner Losh
Be advised that blake3 went into the boot loader yesterday and has only
been lightly tested.

I don't think this would cause your checksum errors, though, since the
boot loader is read-only.

Warner

On Sat, Jul 9, 2022, 10:27 AM Evgeniy Khramtsov wrote:

> -CURRENT as of:
>
> commit eec3290266bc09b4c4b4d875d2269d611adc0016 (main)
> Author: Ganbold Tsagaankhuu 
> Date:   Sat Jul 9 13:06:52 2022 +
>
>     Add RK3568 SoC support to pinctrl driver.
>
>     Submitted by:   sos
>     Reviewed by:    manu
>     Differential Revision:  https://reviews.freebsd.org/D31330
>
> I created a new boot environment via bectl, ran zfs set checksum=blake3
> on new_be, ran pkg -r /tmp/be_N upgrade, and rebooted into the new BE.
> Login failed after sysutils/pam_xdg failed an integrity check.
>
> I rebooted into the previous BE and ran zpool scrub, which reported 32
> checksum errors (without any files listed) before the system panicked
> (no trace, possibly due to drm-kmod?).
>
> Running bectl destroy / zfs destroy on the new blake3 environment
> restored the pool to 0 checksum errors.
>
> No out-of-tree base modifications present, only a custom kernel config.
>
>
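The upgrade sequence from the first report can be sketched roughly as follows. This is a reconstruction, not the reporter's exact commands: the function name, the default BE root dataset zroot/ROOT, and the use of bectl activate -t are assumptions. Changing the checksum property only affects blocks written afterwards, so in this sequence it is mainly the files rewritten by pkg upgrade that end up with BLAKE3 checksums.

```sh
#!/bin/sh
# upgrade_in_new_be (hypothetical name) [BE_NAME] [MOUNTPOINT]
# BE_ROOT defaults to zroot/ROOT, the installer's usual layout (assumption).
upgrade_in_new_be() (
	set -eu
	be=${1:-new_be}
	mnt=${2:-/tmp/be_N}
	be_root=${BE_ROOT:-zroot/ROOT}

	bectl create "$be"
	# Only blocks written after this point get blake3 checksums.
	zfs set checksum=blake3 "$be_root/$be"
	mkdir -p "$mnt"
	bectl mount "$be" "$mnt"
	pkg -r "$mnt" upgrade
	bectl umount "$be"
	# -t: activate for the next boot only; a later reboot returns to the
	# currently active BE.
	bectl activate -t "$be"
)
```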