> The part taking the most cycles in celt is the fft/imdct at the moment. I
> have been contributing some patches with optimizations in this area
> upstream and hope that soon we will be able to merge in the upstream codec
> to rockbox. It also has some other optimizations and other improvements
> that might help us.


For what it's worth, I've been working on this. This link:
http://pastie.org/4908106

shows that 71% of the total runtime on PP is spent in the MDCT. This
is insanely high and probably means gcc is generating poor code for it.
Presumably it does not do much better on ARM9/11.

My first efforts are here:

http://gerrit.rockbox.org/r/#/c/377

Since then I've been working on writing the FFT butterfly functions in
ARMv5 assembly. It should be possible to get a large speedup by
exploiting our prior knowledge about the function arguments (in practice,
only the 480-point FFT significantly impacts performance, so we know the
stride and loop counts in advance) and by using 16-bit packed instructions
to implement the butterflies rather than the generic 32-bit ARM ones.
Together these should give a very large speedup, I think.
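
To make the idea concrete, here is a rough C sketch of the two tricks,
not the actual patch. Everything in it is an assumption made for
illustration: the cpx16 packing (real part in the low half-word, imaginary
part in the high one), the helper names, the Q15 scaling and the tiny
BFLY_COUNT constant. It uses a plain radix-2 butterfly for brevity and does
no saturation. The point is only that each product is a 16x16->32 multiply
on register halves, and that the loop count is a compile-time constant
instead of a function argument.

```c
#include <stdint.h>

/* Hypothetical packed complex sample: real in bits 0-15, imag in bits 16-31. */
typedef uint32_t cpx16;

static inline int16_t lo16(cpx16 v) { return (int16_t)(v & 0xFFFFu); }
static inline int16_t hi16(cpx16 v) { return (int16_t)(v >> 16); }

static inline cpx16 pack16(int16_t re, int16_t im)
{
    return ((uint32_t)(uint16_t)im << 16) | (uint16_t)re;
}

/* Q15 complex multiply a*w. Every product takes two 16-bit register halves
 * and yields a 32-bit result, which is the kind of operation the 16-bit
 * multiply instructions are meant for. No saturation; assumes an arithmetic
 * right shift for negative values. */
static inline cpx16 cmul_q15(cpx16 a, cpx16 w)
{
    int32_t re = (int32_t)lo16(a) * lo16(w) - (int32_t)hi16(a) * hi16(w);
    int32_t im = (int32_t)lo16(a) * hi16(w) + (int32_t)hi16(a) * lo16(w);
    return pack16((int16_t)(re >> 15), (int16_t)(im >> 15));
}

/* Hypothetical fixed loop count. A generic routine would take this (and a
 * stride) as runtime arguments; fixing it lets the loop be fully unrolled. */
enum { BFLY_COUNT = 4 };

/* Radix-2 butterflies over 2*BFLY_COUNT packed samples, scaled by 1/2. */
static void bfly2_fixed(cpx16 *x, const cpx16 *tw)
{
    for (int i = 0; i < BFLY_COUNT; i++) {
        cpx16 t = cmul_q15(x[i + BFLY_COUNT], tw[i]);
        int32_t ar = lo16(x[i]), ai = hi16(x[i]);
        int32_t tr = lo16(t),    ti = hi16(t);
        x[i]              = pack16((int16_t)((ar + tr) >> 1),
                                   (int16_t)((ai + ti) >> 1));
        x[i + BFLY_COUNT] = pack16((int16_t)((ar - tr) >> 1),
                                   (int16_t)((ai - ti) >> 1));
    }
}
```

Whether 16 bits of precision per component is enough for the decoder's
output quality is a separate question that this sketch does not address.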

Mike
