George wrote:
>          One correction to my previous post.  I said that the latency to
> access the L1 data cache was 2 clocks.  This is correct for integer
> instructions only.  For floating point and SSE2 instructions the latency
> is 6 clocks!  Interestingly, the L2 cache latency is 7 clocks for both
> integer and floating point instructions.
> 
One reason is the coupling that the FPU has to the cache.  I would expect
that the FPU(s) have more ports on the L1 than the integer units do.  Also,
if you look at the sensitivity of different types of code to load latency,
integer code is, by far, more sensitive than floating point.  Think about
the length of the floating point pipeline: it's pretty long to start with,
so you're going to *have* to unroll your code to take advantage of the
pipeline, and you might as well cover the additional load-to-use latency
the same way.  With enough rename registers, it's all good. :)
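
To make the unrolling point concrete, here is a rough sketch in C of what I
mean.  The function name and the unroll factor of 4 are just assumptions for
illustration; the right factor depends on the actual pipeline depth and
load-to-use latency, and a real inner loop would be hand-scheduled assembly.

```c
#include <stddef.h>

/* Hypothetical example: sum an array using four independent accumulators.
 * Each accumulator forms its own dependency chain, so the four loads and
 * four adds in one iteration do not wait on each other.  That gives the
 * hardware room to overlap a long load-to-use latency and a long FP add
 * pipeline.  The factor 4 is an assumed value, not a tuned one. */
double sum_unrolled4(const double *x, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    /* Main unrolled loop: four independent load+add pairs per pass. */
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }

    /* Cleanup loop for the 0..3 leftover elements. */
    for (; i < n; i++)
        s0 += x[i];

    /* Combine the partial sums at the end. */
    return (s0 + s1) + (s2 + s3);
}
```

Note that splitting the sum this way changes the order of the floating
point additions, so the rounding can differ slightly from a plain
one-accumulator loop.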

Cheers,
David
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
