Yeah, I also just ran it on a machine with a 4MB L2 cache instead of the
512KB one I had been using.  In my experience these machines are usually
about equally fast for purely CPU-bound work, and that holds here too:
they're within 5% of each other.

On Mon, Aug 2, 2010 at 3:37 PM, Don Clugston <[email protected]> wrote:

> On 2 August 2010 19:49, David Simcha <[email protected]> wrote:
> > Oh, also, I don't think that cache effects are the main bottleneck
> > because switching to single-precision floats for both input and
> > output has a negligible effect on performance even though it cuts
> > the size of the working set in half.
>
> Interesting. Still, I think that because of the way FFT works, once
> you're bigger than the cache, nearly every memory access will be a
> cache miss. It could be that although the memory footprint halves, the
> number of cache misses remains constant.
>
> Anyway, the reason I posted the link was not so much to help with
> implementation, but more because it gives a great feel for what's
> involved in a "state of the art" FFT library. I suspect there's a
> sweet spot with high convenience, small code size, and good-enough
> performance.
> _______________________________________________
> phobos mailing list
> [email protected]
> http://lists.puremagic.com/mailman/listinfo/phobos
>
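To make the point about a constant number of cache misses concrete, here is
a toy model, not a measurement. It only counts the distinct cache lines
touched by a strided access pattern and does not simulate a real cache or a
real FFT. The 64-byte line size, the stride of 64 elements, and the helper
name lines_touched are all assumptions chosen for illustration.

```python
def lines_touched(n_accesses, stride_elems, elem_size, line_size=64):
    """Count distinct cache lines hit by n_accesses strided reads.

    Hypothetical helper: line_size=64 is an assumed cache line size.
    """
    lines = set()
    for i in range(n_accesses):
        byte_addr = i * stride_elems * elem_size
        lines.add(byte_addr // line_size)
    return len(lines)


# Large stride (assumed: 64 elements), as in the wide butterfly stages:
# every access lands on its own line for both element sizes.
strided_double = lines_touched(1024, 64, 8)
strided_float = lines_touched(1024, 64, 4)

# Unit stride for comparison: halving the element size halves the
# number of lines touched.
seq_double = lines_touched(1024, 1, 8)
seq_float = lines_touched(1024, 1, 4)

print(strided_double, strided_float, seq_double, seq_float)
```

In this model the strided pattern touches the same number of lines for
4-byte and 8-byte elements, so halving the working set would not reduce the
number of line fetches. Only the unit-stride pattern benefits from the
smaller element size.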