> Sounds like loop unrolling is what you're talking about. Most modern
> compilers (try to) do this already automatically. However, I've
> experimented on different variations of this with the Linux source to,
> I think, v16 or so, where it seemed possible to attain small benefits
> from various variations of loop-unrolling. The biggest problem here is
> that the number of iterations isn't divisible by any fixed amount.
> Because of that the last few iterations need to be done "manually"
> outside the unrolled block. The main advantage of such unrolling comes
> from not needing to check for the number of timed events present in
> Prime95/mprime between each iteration - due to cache considerations
> actually copying the whole FFT code out as many times as needed instead
> of just using calls to it is probably even worse.

There's another trick to this, primarily useful in assembler not C
programming...

Say you unroll something 16 times. Take the actual iteration count
modulo 16, and JMP into the loop at the offset that leaves that many
blocks to run on the first pass, with the repeat counter set to count/16.
E.g. if you need to do, say, 60 iterations of the inner code, that's
48 + 12, so you set the loop count to 3 full passes and jump in past the
first 4 blocks of the 16 (16 - 12 = 4), so that the first, partial pass
runs the remaining 12.
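
For what it's worth, the same idea can be written in C with a switch whose
case labels sit inside a do/while (commonly known as Duff's device). What
follows is only a minimal sketch and not how Prime95 does it: the function
name sum_unrolled16 and the summing body are made up for illustration, and
the switch stands in for the computed JMP. Note that the counter here
includes the partial first pass, so it is count/16 rounded up.

```c
#include <stddef.h>

/* Hypothetical example: sum n ints with the loop body unrolled 16 times.
 * The switch jumps into the middle of the unrolled block so that the
 * first (partial) pass handles the n % 16 leftover iterations. */
long sum_unrolled16(const int *a, size_t n)
{
    long total = 0;
    size_t passes = (n + 15) / 16;  /* partial first pass counts as one */

    if (n == 0)
        return 0;

    switch (n % 16) {
    case 0:  do { total += *a++;
    case 15:      total += *a++;
    case 14:      total += *a++;
    case 13:      total += *a++;
    case 12:      total += *a++;
    case 11:      total += *a++;
    case 10:      total += *a++;
    case 9:       total += *a++;
    case 8:       total += *a++;
    case 7:       total += *a++;
    case 6:       total += *a++;
    case 5:       total += *a++;
    case 4:       total += *a++;
    case 3:       total += *a++;
    case 2:       total += *a++;
    case 1:       total += *a++;
             } while (--passes > 0);
    }
    return total;
}
```

With n = 60 this enters at case 12, does 12 additions on the first pass,
and then makes 3 full passes of 16, with no separate cleanup loop.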

Anyway, Prime95 is HEAVILY unrolled, using assembler macros to generate
the inline, straight-line inner 'loops'.

-jrp

_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
