On Thu, 1 Jul 2004 10:25:51 +0200 Ruben van Royen <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> please note that SSE2 has support for 64-bit floats (doubles) and contains an
> instruction that truncates to int, regardless of the control word. A new enough
> gcc with (-march=pentium4 or -msse2) and -mfpmath=sse will use sse instead of
> the old fp unit. This has more advantages, since sse math uses normal
> registers instead of the stack in the old fp unit.

SSE and SSE2 are a huge advantage for some algorithms and nearly
useless for others.

I recently spent a good deal of time trying to implement the innermost
loop of Secret Rabbit Code in SSE, where single precision fp was
sufficient. The best the compiler could do when compiling the existing
C code with -msse -mfpmath=sse was half the speed of the same code
compiled for the standard FPU. I then turned to the <xmmintrin.h>
header file and wrote what was pretty much hand-coded asm. My best
effort was still about 20% slower than the C compiled for the standard
FPU.

The problem with SRC is that I am calculating coefficients for the
filter on the fly by looking up a large table and interpolating
between coefficients. There is simply no way to vectorize it.

However, I'm working on another project where I expect to obtain
close to the full 4 times speed improvement because the algorithm
fits the SSE 4 samples-at-a-time processing model very, very well.

Erik
--
+-----------------------------------------------------------+
  Erik de Castro Lopo  [EMAIL PROTECTED] (Yes it's valid)
+-----------------------------------------------------------+
Spammer: Any of you guys looking for a permanent position in
         Scotland?
Kaz Kylheku: No, I'm looking for a thug in Scotland who might be
         interested in beating up off-topic Usenet spammers, on a
         pro bono basis.
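
P.S. To make the contrast above concrete, here is a rough sketch in plain C of the two loop shapes. This is not the actual Secret Rabbit Code source: the function names, the linear interpolation, and the table layout are all assumptions made up for illustration. The first loop reads the coefficient table at an index that depends on a running fractional position, which is the pattern that resists 4-at-a-time processing; the second loop only touches contiguous, independent elements, which is the pattern that maps well onto SSE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: linearly interpolate between two adjacent
 * table entries at fractional position `pos`. The caller must make
 * sure that index (size_t) pos + 1 is inside the table. */
static float interp_coeff(const float *table, double pos)
{
    assert(pos >= 0.0);
    size_t i = (size_t) pos;                /* integer part: table index */
    float frac = (float) (pos - (double) i);/* fractional part           */
    return table[i] + frac * (table[i + 1] - table[i]);
}

/* Lookup-and-interpolate inner loop (hypothetical). Every tap needs
 * its own table index computed from `start + k * step`, so the loads
 * from `table` are scattered rather than contiguous. */
static float filter_sample(const float *table, const float *input,
                           size_t taps, double start, double step)
{
    float acc = 0.0f;
    for (size_t k = 0; k < taps; k++)
        acc += interp_coeff(table, start + (double) k * step) * input[k];
    return acc;
}

/* Vector-friendly loop (hypothetical): contiguous loads and stores,
 * no dependency between iterations, so four floats can be handled
 * per SSE instruction. */
static void scale_add(float *out, const float *a, const float *b,
                      float gain, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + gain * b[i];
}
```

Whether a compiler actually vectorizes the second loop depends on the compiler version and flags, so treat this as a picture of the access patterns rather than a benchmark.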