David Mosberger wrote:
> IIRC, the loop should be scheduled for L2 latency.

With respect to input data, maybe, but there is no way one can schedule a 3*n [or even 4*n] RC4 loop for L2 latency. Loads from the key schedule are commonly consumed as early as the very next cycle; in other words, the key schedule is expected to reside in L1D.
A.
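To illustrate why key-schedule loads cannot tolerate L2 latency, here is a minimal sketch of the textbook RC4 keystream loop in portable C. It is not the IA-64 assembly under discussion. The rc4_state type and the function names are hypothetical, and the byte-wide S[] table is an assumption, since real implementations may store wider entries. Each S[] load feeds either the next address computation or the output XOR, so load latency sits directly on the critical path:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RC4 state: the 256-entry permutation ("key schedule")
 * plus the two running indices. Byte-wide entries are an assumption. */
typedef struct {
    uint8_t S[256];
    uint8_t i, j;
} rc4_state;

/* Key-scheduling algorithm (KSA): initialise S from the key. */
static void rc4_init(rc4_state *st, const uint8_t *key, size_t keylen)
{
    assert(keylen > 0);
    for (int k = 0; k < 256; k++)
        st->S[k] = (uint8_t)k;
    uint8_t j = 0;
    for (int k = 0; k < 256; k++) {
        j = (uint8_t)(j + st->S[k] + key[k % keylen]);
        uint8_t t = st->S[k];
        st->S[k] = st->S[j];
        st->S[j] = t;
    }
    st->i = st->j = 0;
}

/* Keystream generation (PRGA) XORed into the data. Note the chain:
 * the address of load 2 depends on the result of load 1, and the
 * address of load 3 depends on both, so each load's latency is
 * exposed rather than hidden behind independent work. */
static void rc4_crypt(rc4_state *st, const uint8_t *in, uint8_t *out,
                      size_t len)
{
    uint8_t i = st->i, j = st->j;
    for (size_t n = 0; n < len; n++) {
        i = (uint8_t)(i + 1);
        uint8_t x = st->S[i];                 /* load 1: address from i    */
        j = (uint8_t)(j + x);                 /* needs x right away        */
        uint8_t y = st->S[j];                 /* load 2: address from x    */
        st->S[i] = y;                         /* swap S[i] and S[j]        */
        st->S[j] = x;
        out[n] = in[n] ^ st->S[(uint8_t)(x + y)]; /* load 3: from x and y  */
    }
    st->i = i;
    st->j = j;
}
```

As a sanity check, the widely published test vector applies: key "Key" encrypts "Plaintext" to BB F3 16 E8 D9 40 AF 0A D3.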

1. RC4 implementation.

I wonder why the key schedule prefetch is performed with a 128-byte stride. As far
as I understand, 128 bytes is the L2 line size. But the loop is scheduled for
L1D access, and L1D [unlike L2] has a 64-byte line size. In other words, it
appears that the prefetch fills only every second line in L1D. Is that
intentional? I mean, I realize there is a potential trade-off between the
number of lfetch instructions and a couple of stalls in the first loop
spin...
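The coverage question is plain address arithmetic, which can be sketched as follows. This only models which lines the prefetch addresses land in. It assumes a line-aligned table and that each prefetch fills exactly one line of the given size, and it does not model actual lfetch or L1D fill behaviour. Both function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Count how many distinct cache lines of `line` bytes, in a table of
 * `size` bytes, receive at least one prefetch when prefetches are
 * issued every `stride` bytes starting at offset 0. Assumes the table
 * starts on a line boundary. Offsets only increase, so the line index
 * never decreases and counting index changes counts distinct lines. */
static size_t lines_prefetched(size_t size, size_t stride, size_t line)
{
    assert(stride > 0 && line > 0);
    size_t covered = 0;
    size_t last = (size_t)-1;   /* index of the last line counted */
    for (size_t off = 0; off < size; off += stride) {
        size_t idx = off / line;
        if (idx != last) {
            covered++;
            last = idx;
        }
    }
    return covered;
}

/* Total number of lines of `line` bytes spanned by the table. */
static size_t lines_total(size_t size, size_t line)
{
    assert(line > 0);
    return (size + line - 1) / line;
}
```

Assuming, for illustration only, a 1024-byte table (256 entries of 4 bytes each), lines_prefetched(1024, 128, 64) returns 8 of lines_total(1024, 64) = 16 L1D lines, while a 64-byte stride covers all 16 at the cost of twice as many prefetch instructions.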
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           [EMAIL PROTECTED]
