Hello,

I've just returned from PACT 2002 (http://moss.csc.ncsu.edu/pact02/) and 
a very interesting paper was presented there (well, actually several 
interesting papers were presented, but this one is quite appropriate 
for this list, I think). The title is "Compiler-Controlled Caching in 
Superword Register Files for Multimedia Extension Architectures" and you 
can download it here: 
<http://moss.csc.ncsu.edu/pact02/papers/shin183.pdf>.

Basically, they use the AltiVec registers as a cache for integer 
registers instead of going to the slow memory. They do this in the 
context of parallelizing loops for processors with multimedia 
extensions, but it seems to me it could be used in a more general 
fashion as well.
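
To make the idea a bit more concrete, here is a rough sketch in plain C. 
It is my own illustration, not the algorithm from the paper and not real 
AltiVec code: the "superword" struct is a hypothetical stand-in for one 
128-bit vector register holding four ints, and the "loads" counter only 
simulates memory traffic. All names are made up for the example. The 
baseline reloads both operands of every sum from memory; the cached 
variant fetches four ints at once and serves the reuse from the 
register-like buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one 128-bit vector register: four 32-bit ints. */
typedef struct { int lane[4]; } superword;

static unsigned long loads; /* simulated count of memory accesses */

static int load_scalar(const int *p) { loads++; return *p; }

static superword load_superword(const int *p)
{
    superword s;
    loads++; /* one wide load fetches four ints at once */
    memcpy(s.lane, p, sizeof s.lane);
    return s;
}

/* Baseline: out[i] = a[i] + a[i+1]; a must hold n + 1 elements. */
void pair_sum_naive(const int *a, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = load_scalar(&a[i]) + load_scalar(&a[i + 1]);
}

/* Cached variant: same result, but reused values stay in the "register".
   Assumes n is a multiple of 4 and a holds n + 1 elements. */
void pair_sum_cached(const int *a, int *out, size_t n)
{
    assert(n % 4 == 0);
    if (n == 0)
        return;
    superword cur = load_superword(&a[0]);
    for (size_t i = 0; i < n; i += 4) {
        superword next = cur;
        int first_next;
        if (i + 4 < n) {
            next = load_superword(&a[i + 4]); /* also feeds the next pass */
            first_next = next.lane[0];
        } else {
            first_next = load_scalar(&a[i + 4]); /* last element only */
        }
        out[i]     = cur.lane[0] + cur.lane[1];
        out[i + 1] = cur.lane[1] + cur.lane[2];
        out[i + 2] = cur.lane[2] + cur.lane[3];
        out[i + 3] = cur.lane[3] + first_next;
        cur = next;
    }
}

/* Runs both variants on a small input; returns 1 if the outputs match
   and reports the simulated load counts through the two pointers. */
int demo(unsigned long *naive_loads, unsigned long *cached_loads)
{
    int a[9] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    int out_naive[8], out_cached[8];

    loads = 0;
    pair_sum_naive(a, out_naive, 8);
    *naive_loads = loads;

    loads = 0;
    pair_sum_cached(a, out_cached, 8);
    *cached_loads = loads;

    return memcmp(out_naive, out_cached, sizeof out_naive) == 0;
}
```

Calling demo() on the nine-element input gives identical outputs for both 
variants, with far fewer simulated loads in the cached one. On real 
hardware the picture is more complicated, since getting individual 
elements out of a vector register has its own cost.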

The results they achieve are quite astounding in some cases: on a 
G4/533, they get speedups of almost 300% for some benchmarks (although 
others show little or no improvement). I imagine that on the current 
crop of G4s, the results may be even better, given that the 
memory-processor bus has become an even greater bottleneck.

Note that they use their own (proprietary?) compiler to transform the C 
code to the AltiVec-caching variant, after which it is compiled normally 
with gcc. As such, I don't know whether this technology is freely 
available for incorporation in Apple/FSF gcc, but it's an interesting 
approach nonetheless.


Jonas
