Re: unrolled RC4 for ia64

Andy Polyakov Thu, 21 Jul 2005 01:41:18 -0700

david mosberger wrote:

IIRC, the loop should be scheduled for L2 latency.

With respect to input data, maybe, but there is no way one can schedule a 3*n [or even 4*n] RC4 loop for L2. Loads from the key schedule are commonly used already in the next cycle; in other words, the key schedule is expected to reside in L1D.

A.
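
To illustrate why loads from the key schedule are consumed almost immediately, here is a minimal C sketch of the textbook RC4 byte loop. It is not the ia64 assembly under discussion: the `rc4_sketch` type and the function names are hypothetical, and the state layout is an assumption for illustration, not OpenSSL's `RC4_KEY`. The point is only the dependency chain: the second load's address depends on the first load's value, and the third on both.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state layout for illustration; not OpenSSL's RC4_KEY. */
typedef struct {
    uint8_t x, y;
    uint8_t S[256];   /* the "key schedule" table */
} rc4_sketch;

/* Standard RC4 key setup. keylen must be non-zero. */
static void rc4_sketch_init(rc4_sketch *st, const uint8_t *key, size_t keylen)
{
    unsigned i, j = 0;

    for (i = 0; i < 256; i++)
        st->S[i] = (uint8_t)i;
    for (i = 0; i < 256; i++) {
        uint8_t t = st->S[i];
        j = (j + t + key[i % keylen]) & 255;
        st->S[i] = st->S[j];
        st->S[j] = t;
    }
    st->x = st->y = 0;
}

/* Standard RC4 byte loop; in and out may point to the same buffer. */
static void rc4_sketch_crypt(rc4_sketch *st, const uint8_t *in,
                             uint8_t *out, size_t len)
{
    unsigned x = st->x, y = st->y;
    uint8_t *S = st->S;
    size_t n;

    for (n = 0; n < len; n++) {
        uint8_t tx, ty;

        x = (x + 1) & 255;
        tx = S[x];              /* load 1 */
        y = (y + tx) & 255;     /* needs the value of load 1 right away */
        ty = S[y];              /* load 2: address depends on load 1 */
        S[x] = ty;
        S[y] = tx;
        /* load 3: address depends on both earlier loads */
        out[n] = in[n] ^ S[(tx + ty) & 255];
    }
    st->x = (uint8_t)x;
    st->y = (uint8_t)y;
}
```

With the key "Key" and the input "Plaintext" this should give the widely published RC4 test vector BB F3 16 E8 D9 40 AF 0A D3. Input bytes, by contrast, are read sequentially and independently of the table, which is why they can tolerate a longer load latency.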

1. RC4 implementation.


I wonder why the key schedule prefetch is performed with a stride of 128? As far
as I understand 128 bytes is L2 line-size. But the loop is scheduled for
L1D access, which [unlike L2] has 64 byte line-size. In other words it
appears that prefetch fills only every second line in L1D. Is it
intentional? I mean I realize that there is potential trade-off between
amount of lfetch instructions vs. couple of stalls in the first loop
spin...
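
The arithmetic behind the question can be sketched in C. This is only a counting model, not a measurement: it assumes, as the question supposes, that each prefetch brings in just the one L1D line containing the prefetched address, and that the table starts on a line boundary. The function names and the 1024-byte table size in the usage note are hypothetical.

```c
#include <stddef.h>

/* Count the distinct lines of size line_size inside [0, region) that are
 * touched when one prefetch is issued every `stride` bytes, starting at
 * offset 0. Assumes stride > 0 and line_size > 0. */
static size_t lines_touched(size_t region, size_t stride, size_t line_size)
{
    size_t touched = 0;
    size_t last_line = (size_t)-1;   /* sentinel: no line seen yet */
    size_t off;

    for (off = 0; off < region; off += stride) {
        size_t line = off / line_size;
        if (line != last_line) {
            touched++;
            last_line = line;
        }
    }
    return touched;
}

/* Total number of lines needed to hold the whole region. */
static size_t lines_total(size_t region, size_t line_size)
{
    return (region + line_size - 1) / line_size;
}
```

For an assumed 1024-byte table, `lines_touched(1024, 128, 64)` is 8 while `lines_total(1024, 64)` is 16, so a stride of 128 reaches every second 64-byte line; `lines_touched(1024, 64, 64)` is 16, at the cost of twice as many prefetch instructions.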

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           [EMAIL PROTECTED]
