Hello,
I've just returned from PACT 2002 (http://moss.csc.ncsu.edu/pact02/), where a very interesting paper was presented (well, actually several interesting papers were presented, but this one is quite appropriate for this list, I think). The title is "Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architectures" and you can download it here: <http://moss.csc.ncsu.edu/pact02/papers/shin183.pdf>.
Basically, they use the AltiVec registers as a cache for integer registers instead of the slow memory. They do this in the context of parallelizing loops for processors with multimedia extensions, but it seems to me it could be used in a more general fashion as well. The results they achieve are quite astounding in some cases: on a G4/533, they get speedups of almost 300% for some benchmarks (although others show little or no improvement). I imagine that on the current crop of G4s, the results may be even better, given that the memory-processor bus has become an even greater bottleneck. Note that they use their own (proprietary?) compiler to transform the C code to the AltiVec-caching variant, after which it is compiled normally using gcc. As such, I don't know whether this technology is freely available for incorporation in Apple/FSF gcc, but it's an interesting approach nonetheless.
Jonas
