Hi Ard,

> Since turning the FPU on and off is cheap these days, simplify the
> SIMD routine by dropping the per-page yield, which makes for a
> cleaner switch to the library API as well.

In my measurements the lazy FPU restore works as intended, and I could
not identify any slowdown from this change.

> +++ b/arch/x86/crypto/chacha_glue.c
> @@ -127,32 +127,32 @@ static int chacha_simd_stream_xor [...]
>  
> +     do_simd = (walk->total > CHACHA_BLOCK_SIZE) && crypto_simd_usable();

Given that most users (including chacha20poly1305) likely perform
multiple operations under the same (real) FPU save/restore cycle, the
length checks in both chacha and poly1305 hardly make sense anymore.

Obviously, under tcrypt we get better results when engaging SIMD for any
length, but this also seems beneficial for real users. Of course, we
may defer that to a later optimization patch.
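
To make the suggestion concrete, here is a rough user-space sketch of
the two dispatch predicates. It is not kernel code: the simd_usable
parameter stands in for crypto_simd_usable(), and both helper names are
made up for illustration. CHACHA_BLOCK_SIZE is assumed to be 64 bytes.

```c
#include <stdbool.h>
#include <stddef.h>

#define CHACHA_BLOCK_SIZE 64	/* assumed: one ChaCha block in bytes */

/* Hypothetical helper mirroring the quoted hunk: SIMD is only engaged
 * when the request is longer than a single block. */
static bool do_simd_with_length_check(size_t total, bool simd_usable)
{
	return total > CHACHA_BLOCK_SIZE && simd_usable;
}

/* Hypothetical variant of the suggestion: the length no longer matters,
 * only whether SIMD may be used in the current context. */
static bool do_simd_any_length(size_t total, bool simd_usable)
{
	(void)total;	/* intentionally ignored */
	return simd_usable;
}
```

With the first predicate a request of exactly one block falls back to
the generic code even when SIMD is usable; with the second it does not.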

Thanks,
Martin
