On 5 Oct 99, at 18:19, Guillermo Ballester Valor wrote:

> Well, I wonder what the reason is for such different performance. On
> Intel machines MacLucasFFTW runs more than twice as fast as MacLucas,
> and on RISC processors MacLucasUNIX is better than MacLucasFFTW.
> Looking at the code, without deep understanding, one can see:
> 
> i) MacLucasUNIX makes intensive use of the 'register' keyword in local
> definitions, so a processor with many internal registers can allocate
> most of them. That is a good thing because they can be accessed very
> fast. The bad thing is that on processors with very few registers
> (like Intel's) it can slow the code down.

And (according to a local computer scientist, who I think knows what 
he's talking about) with modern processors making extensive use of 
register renaming, it's not usually sensible to use the "register" 
keyword _at all_. The theory is that the instruction scheduler can do 
at least as good a job as the programmer - it gets more choice, 
anyway, e.g. there are 40 32-bit general-purpose registers in the 
Intel PPro, but only a few have "names" at any given time.
> 
> ii) On the other hand, FFTW does not use 'register' at all. All local
> variables are stored on the stack. I don't know much about compilers,
> but perhaps some good compilers can use register storage as a speed
> optimization. Looking at the code generated by GNU gcc on Intel
> processors, some local double variables are kept in the Intel FPU and
> the performance is very good.

Storing data on a stack is not very efficient in most RISC 
architectures - you tend to cause problems with cache alignment, 
overloading cache lines causing high miss rates, etc. The small 
caches on the Alpha 21164 design possibly contribute to this - the L1 
data cache is only 8K bytes & the L2 cache is only 96K bytes (but 
there can be an L3 cache which is at least 2M bytes, if fitted).

The Intel FPU is a special case! 
> 
> My question is: what would happen to the FFTW code if we directly added
> the 'register' keyword to its local temporary variable definitions?
> Could this sort of thing be done with a single compiler option?
> 
> I did it. I've added the 'register' keyword to all FFTW radix routines
> up to radix-16 (which need no more than 32 stack variables). For Intel
> machines the code is unchanged (because I have defined REG as an empty
> comment there). But I don't own a RISC machine, so I have no idea how
> it performs. Any volunteers?
> 
Sure, I'll give it a go. Just mail me the source ... I've access to a 
Sun Ultra 10 as well (running Solaris, but with the gcc compiler, not 
Sun's own).
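
For anyone following along, the REG trick described above might look
roughly like the sketch below. This is only an illustration, not the
actual patch: the USE_REGISTER_HINT switch and the butterfly2 routine
are hypothetical names made up here, and only REG comes from the
message being quoted.

```c
#include <assert.h>

/* Hypothetical switch (an assumption, not from the real patch):
   build with -DUSE_REGISTER_HINT on machines with plenty of registers. */
#ifdef USE_REGISTER_HINT
#define REG register
#else
#define REG /* expands to nothing, so the code is unchanged */
#endif

/* Illustrative radix-2 butterfly on one pair of complex values.
   The temporaries carry the REG hint; their address is never taken,
   which 'register' would forbid. */
void butterfly2(double *re, double *im)
{
    assert(re != 0 && im != 0);

    REG double ar = re[0], ai = im[0];
    REG double br = re[1], bi = im[1];

    re[0] = ar + br;  im[0] = ai + bi;  /* sum output        */
    re[1] = ar - br;  im[1] = ai - bi;  /* difference output */
}
```

With the switch undefined the declarations are plain locals, so the
same source can be timed with and without the hint simply by changing
one compiler flag.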
>
> Any improvement on MacLucasXXXX is desired.

Any improvement on _anything_ is desirable !!! Actually MacLucasUNIX 
on my Alpha system isn't bad: compiled from pure C source with gcc, it 
gives Prime95/mprime running on a PII-333 a good run for its money (a 
bit faster, or a bit slower, depending on the exponent). Given the 
brilliant optimization George has done for the Intel CPU, I think 
this is quite good. I'm pretty sure I could at least double the speed 
of MacLucasUNIX on the Alpha, by replacing critical chunks of code 
with hand-tuned assembler, but the investment in terms of time & 
effort is too much for me 8-(
> 
> I think we can squeeze a lot more out of FFTW. I like its code very
> much. The good performance on Intel (45% relative to mprime) is good
> enough to justify working a little more on it.

I agree - in particular, there's an obvious gain in being able to do 
FFT with run lengths other than powers of 2, once you have the speed 
in the same ballpark. Nevertheless, I think FFTW will be hard pushed 
to match mprime on 32-bit Intel architecture systems.
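
To make the run-length point concrete, here is a minimal sketch of how
one might pick the smallest usable length at or above a required
minimum, assuming the FFT code copes well with lengths whose only
prime factors are 2, 3, 5 and 7. That set of factors and both function
names are assumptions chosen for illustration; nothing here is taken
from FFTW or MacLucas.

```c
#include <assert.h>

/* Return 1 if n has no prime factor other than 2, 3, 5 or 7. */
static int is_smooth(unsigned long n)
{
    static const unsigned long primes[] = {2, 3, 5, 7};
    int i;

    for (i = 0; i < 4; i++)
        while (n % primes[i] == 0)
            n /= primes[i];       /* strip out each allowed factor */
    return n == 1;                /* anything left is a larger prime */
}

/* Smallest length >= min_len built only from the factors above.
   A simple linear search; terminates because a power of 2 is
   always reached eventually. */
unsigned long next_smooth_length(unsigned long min_len)
{
    unsigned long n = min_len;

    assert(min_len > 0);          /* 0 would loop forever in is_smooth */
    while (!is_smooth(n))
        n++;
    return n;
}
```

For a required minimum of 1025 points this search stops at 1029
(3 * 7^3), whereas a power-of-2-only code would have to jump to 2048.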

There is an obvious need for something reasonably efficient and 
portable, if only to be able to take advantage of new processor 
designs (like Merced, and to a lesser extent Athlon) without having 
to expend very large amounts of effort in hand-optimization.

Regards
Brian Beesley
_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
