I was just perusing the IA-64 docs that came out last month, and I came up
with a few thoughts on why it would be a GREAT Mersenne prime CPU:

- 128 FPU registers (126 usable, since f0 and f1 are fixed at 0.0 and 1.0)
 96 of them are rotating (not stacked), which I imagine the code could use
to its advantage, holding more data in registers during the FFT

- 82-bit FPU (??)
 One document mentioned 82 bits for the FPU and registers.  The IA-32
processors had 80 bits, right?  The 82 bits are: 64-bit significand, 17-bit
exponent, 1-bit sign.  IEEE double extended on IA-32 is only 80 bits, but
here we are with 82.  The significand is the same 64 bits as in the 80-bit
format, so the two extra bits widen the exponent range rather than the
precision; any round-off gain would be over 64-bit doubles, not over the
80-bit FPU core.

- Memory "speculation"
 Preload code and/or data: while the FPU is churning away, prefetch more
data into the L2/L1 cache (data prefetch via lfetch) so it's already in
high-speed memory by the time it's needed.  That will REALLY help on these
large FFT datasets!

- Faster FPU
 On top of all this, the FPU core is supposedly redesigned to do more per
clock cycle.  Some of the "enhancements" I spotted were: 4 FP
multiply-accumulate units (single precision), enhanced fused multiply-add
instructions, a load-pair instruction that loads 2 FPU registers
simultaneously, etc.

- 64-bit integer ops
 Integer unit with 64 bits...need I say more?

- 128 64-bit general-purpose registers

- 64 one-bit predicate registers
 Separate registers to control conditional branching/execution

- 8 64-bit branch registers
 Finally some more registers to hold branch target addresses

- 128 "application registers"
 Don't know about these...some are earmarked "for future use".  Hrmm...

- Bunch of fun parallel arithmetic instructions
 Probably useful for large numbers...
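
To make the "rotating" idea from the first bullet a bit more concrete, here
is a rough Python model of how I understand the rotating FP registers
(f32-f127) to behave.  It is only a sketch under my own assumptions: the
class and method names are made up, there is a single rename base that is
decremented modulo 96 on each rotation, and the special handling of f0/f1 is
ignored.

```python
# Simplified model of IA-64 rotating FP registers (f32-f127).
# Assumption: one register rename base (rrb), decremented modulo 96
# each time the loop branch rotates the registers.

FIRST_ROTATING = 32   # f0-f31 are static
NUM_ROTATING = 96     # f32-f127 rotate


class RotatingFPFile:
    def __init__(self):
        self.physical = [0.0] * 128
        self.rrb = 0  # register rename base

    def _slot(self, reg):
        """Map a logical register number to a physical slot."""
        if reg < FIRST_ROTATING:
            return reg
        return FIRST_ROTATING + (reg - FIRST_ROTATING + self.rrb) % NUM_ROTATING

    def write(self, reg, value):
        self.physical[self._slot(reg)] = value

    def read(self, reg):
        return self.physical[self._slot(reg)]

    def rotate(self):
        """Rename the registers instead of copying their contents."""
        self.rrb = (self.rrb - 1) % NUM_ROTATING


regs = RotatingFPFile()
for x in (1.5, 2.5, 3.5):
    regs.write(32, x)   # each iteration "loads" into f32
    regs.rotate()       # the value now shows up one register higher

# The newest value is in f33, older ones in f34 and f35.
print(regs.read(33), regs.read(34), regs.read(35))  # prints: 3.5 2.5 1.5
```

The point for FFT-style loops is that each iteration can use the same
register names while older values stay reachable under higher numbers, with
no move instructions spent on shuffling them.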

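As for why fused multiply-add matters for round-off: a fused a*b + c is
rounded once, while a separate multiply and add are rounded twice.  The
sketch below emulates that in Python with exact fractions.  The helper name
fma_emulated is my own, it works in ordinary 64-bit doubles rather than the
82-bit register format, and it says nothing about speed.

```python
from fractions import Fraction


def fma_emulated(a, b, c):
    """Compute a*b + c exactly, then round once to a double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

# The exact product is 1 - 2**-60, which does not fit in a double.
unfused = a * b + c             # product is rounded to 1.0 before the add
fused = fma_emulated(a, b, c)   # single rounding keeps the -2**-60 term

print(unfused)                  # 0.0
print(fused == -(2.0 ** -60))   # True
```

The unfused version loses the small term entirely, which is the kind of
error that piles up over the many multiply-adds in an FFT.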

Anyway, that's just skimming the surface.

I figure with 126 usable 82-bit FP registers, you can get A LOT done in the
registers alone, speeding things up greatly and really cutting down on
rounding-error worries, since values only get rounded to the memory format
once they leave the registers.
Prefetching data into the cache from main memory will also help quite a bit.
The FPU instruction set has a few new goodies that I foresee could help out
with FFT algorithms.
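
A quick way to check the round-off question is to add up the field widths.
The IA-64 numbers below are the ones quoted above; the double and x87 widths
are the standard ones.  The FPFormat tuple and the helper names are just my
own scaffolding for the arithmetic.

```python
from collections import namedtuple

# Field widths in bits.  "stored" is what occupies the register or memory;
# "precision" also counts the implicit leading bit of an IEEE double.
FPFormat = namedtuple("FPFormat", "name sign exponent stored precision")

FORMATS = [
    FPFormat("IEEE double (memory)", 1, 11, 52, 53),
    FPFormat("x87 double extended", 1, 15, 64, 64),
    FPFormat("IA-64 FP register", 1, 17, 64, 64),
]


def total_bits(fmt):
    """Total width of the format in bits."""
    return fmt.sign + fmt.exponent + fmt.stored


def epsilon(fmt):
    """Gap between 1.0 and the next representable value."""
    return 2.0 ** -(fmt.precision - 1)


for fmt in FORMATS:
    print(f"{fmt.name:22s} {total_bits(fmt):3d} bits  "
          f"epsilon = {epsilon(fmt):.3e}")
```

The totals come out to 64, 80 and 82 bits, and the last two formats have the
same epsilon (2^-63), so the gain over plain doubles comes from the 64-bit
significand and the extra two bits buy exponent range.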

Not being on top of how the FFT code really works, I'll leave it to
others to figure out how best this would all help George's code.  And
George...I hope you'll work on a nice IA-64 native program to use all this
cool new stuff once it's available.  Using all the EPIC "hints" in your
assembly code might be tricky at first, but I think the payoff would be
significant.

Aaron

________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
