Okay, I actually remembered to benchmark JavaPrime on the RS6000 we have
here...
That RS6000 can rock!
FFT = 256k
.4874s/iteration!!!
And that was with my old JavaLucas version... Gotta try the new one, which
does the radix-8 ops... but it's currently being revamped, so I suppose that
will have to wait until I'm done...
I guess the PPC 603s (or are they 604s... hmmm, it's an RS/6000 63p) can kick
some butt w/ their Java FP libs... Makes me almost want to try it on the
AS/400. I can try it later today... problem is the thing is so friggin'
weird...
Does anyone have benchmarks for MacLucasUnix on the RS6K that I can compare
against? I'd look at the benchmark page, but for some reason my company's
firewall is blocking the site "because it contains sex related content"....
Odd...
Anyway, I was thinking yet again (I know, you gotta hate when I do that...)
The FFT algorithm inherently lends itself to a recursive
implementation, but for speed reasons we do it iteratively. Or, in more
modern implementations, semi-iteratively...
However, for a multiple-processor machine... wouldn't it be efficient to
recurse just deep enough to spawn X threads (X being the # of CPUs) and let
each thread finish its own sub-transform from there? Since the sub-transforms
at a given depth touch disjoint parts of the data, I don't see how we run
into any memory/access problems in a multithreaded scenario here... Another
thought is to have each CPU perform a portion of the FFT, and use the Chinese
Remainder Theorem to combine the results... (I think, I need to read up on
the CRT some more)...
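
To make the thread idea a bit more concrete, here's a rough sketch of what I
have in mind. This is NOT the JavaLucas code; the class name ParallelFft and
the spawnDepth parameter are made up for illustration, and it's a plain
textbook radix-2 recursive FFT that copies its halves, so it's nowhere near
tuned. The top spawnDepth levels of the recursion each fork one thread, which
gives roughly 2^spawnDepth threads working at once.

```java
final class ParallelFft {

    /**
     * In-place forward FFT of the complex data (re, im).
     * The length must be a power of two.
     * spawnDepth (hypothetical knob) = number of recursion levels that fork
     * a thread, so about 2^spawnDepth threads are busy at the deepest level.
     */
    static void fft(double[] re, double[] im, int spawnDepth) {
        int n = re.length;
        if (n == 1) return;

        // Split into even- and odd-indexed halves (decimation in time).
        double[] er = new double[n / 2], ei = new double[n / 2];
        double[] or = new double[n / 2], oi = new double[n / 2];
        for (int k = 0; k < n / 2; k++) {
            er[k] = re[2 * k];
            ei[k] = im[2 * k];
            or[k] = re[2 * k + 1];
            oi[k] = im[2 * k + 1];
        }

        if (spawnDepth > 0) {
            // The two halves share no data, so the odd half can run on
            // another thread while this thread handles the even half.
            Thread t = new Thread(() -> fft(or, oi, spawnDepth - 1));
            t.start();
            fft(er, ei, spawnDepth - 1);
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        } else {
            // Below the spawn depth: ordinary single-threaded recursion.
            fft(er, ei, 0);
            fft(or, oi, 0);
        }

        // Butterfly step: combine the two half-size transforms.
        for (int k = 0; k < n / 2; k++) {
            double ang = -2 * Math.PI * k / n;
            double wr = Math.cos(ang), wi = Math.sin(ang);
            double tr = wr * or[k] - wi * oi[k];
            double ti = wr * oi[k] + wi * or[k];
            re[k] = er[k] + tr;
            im[k] = ei[k] + ti;
            re[k + n / 2] = er[k] - tr;
            im[k + n / 2] = ei[k] - ti;
        }
    }
}
```

On a 4-CPU box you'd call something like ParallelFft.fft(re, im, 2). Note
that the combine step at the top levels still runs on a single thread in this
sketch, so I wouldn't expect a perfect X-times speedup from it.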