> Looking at the SPUs on Cell, kencc won't let me make decent
> code for them: the vast space of vector instructions requires
> extensive language extensions to use well.  The overhead of a
> function call defeats the careful interleaving of those
> instructions.

I've probably read just enough about this architecture to
make a fool of myself, but it's Friday afternoon, so here
goes nothing.

One possible goal might be a language in which you could
describe high-level algorithms of a certain class, which
could then be compiled to run well on a Cell (and, for a
really cool result, on some other architecture too).  Such a
language would probably handle not just the computation but
also the DMA needed to get data into each SPU's local store
before it is needed.  As you point out, that language
probably wouldn't be C, and it may well not exist yet.

Failing that, it seems like what people will be doing for
a while is writing code carefully tuned to run well on
exactly one or two particular models of Cell.  I expect that
to look like carefully optimized "inner loop" code wrapped
in glue code that matters less.  I have to wonder which is
less painful: learning the hardware and writing the
optimized code in assembly language, or learning the
hardware *and* learning how to cajole a complicated compiler
into emitting the assembly language you know it should be
emitting.

With respect to kencc, I wonder how far you could get if
each Cell vector instruction were a C-callable .s function
a few instructions long, and the SPU linker routinely inlined
all functions with small instruction counts and had an
optimizer designed explicitly for the SPU.
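To make that last idea a little more concrete, here is a
minimal sketch in portable C.  It assumes a hypothetical
Vec4 type standing in for a 128-bit SPU register, and
hypothetical vadd/vmul wrappers standing in for
one-instruction .s routines (written in C here so the sketch
compiles anywhere).  Kernels become chains of such calls;
the bet is that an inlining linker would collapse each call
back into a single instruction, removing the call overhead
that defeats interleaving.

```c
#include <stddef.h>

/* Hypothetical image of a 128-bit SPU register: four float lanes. */
typedef struct { float f[4]; } Vec4;

/*
 * Stand-in for a tiny .s routine wrapping one 4-wide float add.
 * In the proposal this body would be assembly; here it is C.
 */
Vec4
vadd(Vec4 a, Vec4 b)
{
	Vec4 r;
	int i;

	for(i = 0; i < 4; i++)
		r.f[i] = a.f[i] + b.f[i];
	return r;
}

/* Stand-in for a tiny .s routine wrapping one 4-wide float multiply. */
Vec4
vmul(Vec4 a, Vec4 b)
{
	Vec4 r;
	int i;

	for(i = 0; i < 4; i++)
		r.f[i] = a.f[i] * b.f[i];
	return r;
}

/*
 * y[i] = a*x[i] + y[i] for n vectors, written purely as calls.
 * With an inlining linker, each call would become one instruction.
 */
void
saxpy4(Vec4 a, const Vec4 *x, Vec4 *y, size_t n)
{
	size_t i;

	for(i = 0; i < n; i++)
		y[i] = vadd(vmul(a, x[i]), y[i]);
}
```

Without that linker support, every vadd and vmul here is a
real call, which is exactly the overhead complained about
above.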

Dave Eckhardt
