Thanks a lot for your help. I am just starting to code in Julia, and I was surprised to find that my Julia code is about 2.5 times slower than the MATLAB version, even though the two do the same thing. I learned about the FFT optimization strategies from http://docs.julialang.org/en/release-0.4/stdlib/math/?highlight=fft#Base.fft. The strategies are: 1. Use plan_fft!() instead of plan_fft() or fft(), to reduce the memory that has to be allocated and to plan the FFT ahead of time. 2. Use flags like FFTW.MEASURE or FFTW.EXHAUSTIVE. In my project I already use those flags, and they certainly make a difference. I was just confused about why the fft! and plan_fft strategies didn't help, and you have explained that clearly.
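
Put together, the suggestions could look roughly like the sketch below. It is only an illustration under some assumptions: it targets a recent Julia where the FFT functions live in the FFTW.jl package (in 0.4 they were in Base), and the function name `run_ffts!`, the 128x128 array size, and the repetition count are made up for the example, not taken from my project.

```julia
using FFTW  # assumption: recent Julia with FFTW.jl installed; in 0.4 these were in Base

# Hypothetical workload: apply the same in-place transform n times.
# Keeping the loop inside a function avoids benchmarking in global scope.
function run_ffts!(x, p, n)
    for _ in 1:n
        p * x  # with a plan from plan_fft!, this overwrites x with its FFT
    end
    return x
end

x = zeros(ComplexF64, 128, 128)

# Create the plan once, outside the loop. Planning with MEASURE may
# overwrite the array it is given, so the data is filled in afterwards.
p = plan_fft!(x; flags=FFTW.MEASURE)
x .= rand(ComplexF64, 128, 128)

run_ffts!(x, p, 1)          # warm-up call so compilation is not timed
@time run_ffts!(x, p, 100)

# FFTW's own operation count for one transform:
# (additions, multiplications, fused multiply-adds)
println(FFTW.flops(p))
```

Because the transform is unnormalized and applied in place, the values in `x` grow with every pass, so this is only useful for timing, not for checking results.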
On Wednesday, December 9, 2015 at 9:09:07 PM UTC+8, Yichao Yu wrote:
>
> On Wed, Dec 9, 2015 at 5:57 AM, 博陈 <chenph...@gmail.com> wrote:
>>
>> <https://lh3.googleusercontent.com/-lTsIsN0BaAY/VmgIypsEQ2I/AAAAAAAAAAk/n-j-ZalGl5I/s1600/QQ%25E6%2588%25AA%25E5%259B%25BE20151209185519.png>
>> The optimization strategy for fft given by the official documentation
>> seems to fail. Why?
>
> You didn't mention exactly which optimization strategy you are trying, so I
> have to guess.
>
> 1. You should expect the first one to be no faster than the last one, since
> it's basically doing the same thing and the first one does it all in global
> scope.
> 2. The in-place operation doesn't make much of a difference here, since the
> operation you are doing is already very expensive (most of the time is
> spent in FFTW).
> 3. It doesn't really matter for this one (since FFTW determines the
> performance here), but you should benchmark the loop in a function and hoist
> the creation of the plan out of the loop. For your actual code, you might
> want to make the plan a global constant or a parametrized field of a type,
> since it is not particularly type stable.
> 4. You can use `plan_fft(...., flags=FFTW.MEASURE)` to let FFTW select the
> best algorithm by actually measuring the time instead of guessing. It gives
> me a 20% to 30% speedup for your example and, IIRC, more for small
> problems.
> 5. You can use `FFTW.flops(p)` to find out how many floating point
> operations are needed to perform your transform. On my computer, a
> MEASURE'd plan takes 4.3s (100 times), and the naive estimate from
> assuming one operation per clock cycle is 2s (100 times), so it's the right
> order of magnitude.