Yeah, I also just ran it on a machine with a 4MB L2 cache instead of the 512KB one I had been using. In my experience these machines are usually about equally fast for purely CPU-bound work, and that holds here too: they're within 5% of each other.
On Mon, Aug 2, 2010 at 3:37 PM, Don Clugston <[email protected]> wrote:

> On 2 August 2010 19:49, David Simcha <[email protected]> wrote:
> > Oh, also, I don't think that cache effects are the main bottleneck because
> > switching to single-precision floats for both input and output has a
> > negligible effect on performance even though it cuts the size of the working
> > set in half.
>
> Interesting. Still, I think that because of the way FFT works, once
> you're bigger than the cache, nearly every memory access will be a
> cache miss. It could be that although the memory footprint halves, the
> number of cache misses remains constant.
>
> Anyway, the reason I posted the link was not so much to help with
> implementation, but more because it gives a great feel for what's
> involved in a "state of the art" FFT library. I suspect there's a
> sweet spot with high convenience, small code size, and good-enough
> performance.
> _______________________________________________
> phobos mailing list
> [email protected]
> http://lists.puremagic.com/mailman/listinfo/phobos
