> From: Reshetova, Elena
> > Sent: 03 May 2019 17:17
> ...
> > rdrand (calling every 8 syscalls): Simple syscall: 0.0795 microseconds
> 
> You could try something like:
>       u64 rand_val = cpu_var->syscall_rand;
> 
>       while (unlikely(rand_val == 0))
>               rand_val = rdrand64();
> 
>       stack_offset = rand_val & 0xff;
>       rand_val >>= 6;
>       if (likely(rand_val >= 4))
>               cpu_var->syscall_rand = rand_val;
>       else
>               cpu_var->syscall_rand = rdrand64();
> 
>       return stack_offset;
> 
> That gives you 10 system calls per rdrand instruction
> and mostly takes the latency out of line.

I am not really happy going down the rdrand path, for a couple of reasons:
- it is not available on older PCs
- its performance varies across the CPUs that support it (and, as I understand
it, varies quite a lot)
- it is x86-centric and not generic

So, if we can use the get_random_bytes() interface without tying ourselves to
a particular instruction, I think it would be better.
The numbers I have measured so far for a buffer size of 4096 are software-only;
I will try to measure today what boost (if any) we get if we use SIMD
code for it.
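
To make the buffered approach concrete, here is a rough userspace sketch of
the idea, not the actual patch. It assumes one pool per CPU that is refilled
in bulk and then hands out one byte per syscall. All names here (rand_pool,
pool_init, fill_random, next_stack_offset) are made up for illustration, and
fill_random() is only a placeholder for get_random_bytes(): it is a plain
xorshift generator and is not suitable as a real entropy source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 4096	/* buffer size used in the measurements above */

/* Hypothetical pool; in the kernel this would be per-CPU data. */
struct rand_pool {
	uint8_t buf[POOL_SIZE];
	size_t pos;		/* next unused byte; POOL_SIZE means empty */
	uint64_t state;		/* state of the placeholder generator only */
	unsigned long refills;	/* how many bulk refills have happened */
};

/* Placeholder for get_random_bytes(): xorshift64, NOT cryptographically secure. */
static void fill_random(struct rand_pool *p, uint8_t *dst, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		p->state ^= p->state << 13;
		p->state ^= p->state >> 7;
		p->state ^= p->state << 17;
		dst[i] = (uint8_t)(p->state >> 32);
	}
}

static void pool_init(struct rand_pool *p, uint64_t seed)
{
	/* xorshift state must be nonzero */
	p->state = seed ? seed : 0x9e3779b97f4a7c15ULL;
	p->pos = POOL_SIZE;	/* start empty so the first call refills */
	p->refills = 0;
}

/* One byte of offset per call; the expensive refill runs once per POOL_SIZE calls. */
static uint8_t next_stack_offset(struct rand_pool *p)
{
	assert(p->pos <= POOL_SIZE);
	if (p->pos == POOL_SIZE) {
		fill_random(p, p->buf, POOL_SIZE);
		p->pos = 0;
		p->refills++;
	}
	return p->buf[p->pos++];
}
```

With a 4096-byte pool the refill cost is paid once every 4096 calls to
next_stack_offset(), so the per-syscall cost is mostly an array read and an
increment. How expensive the refill itself is depends on the generator behind
it, which is what the SW-only versus SIMD measurement is meant to show.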

Best Regards,
Elena.
