Thanks a lot for your help. I am just starting to code in Julia, and I was 
surprised to find that my Julia code is about 2.5 times slower than the 
MATLAB version, even though the two do the same thing. I learned about the 
fft optimization strategies from  
http://docs.julialang.org/en/release-0.4/stdlib/math/?highlight=fft#Base.fft  
The optimization strategies are:
1. Use plan_fft!() instead of plan_fft() or fft() to reduce memory 
allocation and to plan the fft operation in advance.
2. Use flags like FFTW.MEASURE or FFTW.EXHAUSTIVE. In my project I have 
used these flags, and they certainly make a difference. I was just confused 
about why the fft! and plan_fft strategies didn't work, which you have 
explained clearly.
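
For reference, here is a minimal sketch of how I understand the two 
strategies fit together with the advice to benchmark inside a function. It 
assumes a recent Julia with the FFTW.jl package (in Julia 0.4 these 
functions lived in Base); the function name repeat_fft! and the 256x256 
problem size are made up for illustration and are not from my project.

```julia
using FFTW  # assumption: FFTW.jl on a recent Julia; Julia 0.4 had this in Base

# Hypothetical helper: apply the in-place plan `p` to `x` `n` times.
# The loop lives in a function (not global scope) and the plan is
# created once by the caller, outside the loop.
function repeat_fft!(p, x, n)
    for _ in 1:n
        p * x   # applying an in-place plan overwrites x with its transform
    end
    return x
end

x = zeros(ComplexF64, 256, 256)        # hypothetical problem size
p = plan_fft!(x; flags=FFTW.MEASURE)   # planning with MEASURE may overwrite x
x .= 1                                 # so fill in the real data after planning
repeat_fft!(p, x, 1)                   # one in-place transform of all ones
```

Something like `@elapsed repeat_fft!(p, x, 100)` should then time only the 
transforms, not the planning.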





On Wednesday, December 9, 2015 at 9:09:07 PM UTC+8, Yichao Yu wrote:
>
>
>
> On Wed, Dec 9, 2015 at 5:57 AM, 博陈 <chenph...@gmail.com> wrote:
>
>>
>> <https://lh3.googleusercontent.com/-lTsIsN0BaAY/VmgIypsEQ2I/AAAAAAAAAAk/n-j-ZalGl5I/s1600/QQ%25E6%2588%25AA%25E5%259B%25BE20151209185519.png>
>> The optimization strategy for fft given by the official documentation 
>> seems to fail. Why?
>>
>
> You didn't mention exactly which optimization strategy you are trying, so I 
> have to guess.
>
> 1. You should expect the first one to be no faster than the last one, since 
> they are basically doing the same thing and the first one does it all in 
> global scope.
> 2. The in-place op doesn't make much of a difference here, since the 
> operation you are doing is already very expensive (most of the time is 
> spent in FFTW).
> 3. It doesn't really matter for this one (since FFTW determines the 
> performance here), but you should benchmark the loop in a function and hoist 
> the creation of the plan out of the loop. For your actual code, you might 
> want to make the plan a global constant or a parametrized field of a type, 
> since it is not particularly type stable.
> 4. You can use `plan_fft(...., flags=FFTW.MEASURE)` to let FFTW select the 
> best algorithm by actually measuring the time instead of guessing. It gives 
> me a 20% to 30% speedup for your example and, IIRC, more for small 
> problems.
> 5. You can use `FFTW.flops(p)` to figure out how many floating point 
> operations are needed to perform your transform. On my computer, a 
> MEASURE'd plan takes 4.3s (100 times) and the naive estimate from 
> assuming one operation per clock cycle is 2s (100 times), so it's the right 
> order of magnitude.
>
>
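
The estimate in point 5 of the quoted reply could be sketched roughly as 
follows. This assumes FFTW.jl on a recent Julia, and that `FFTW.flops` 
returns the (add, mul, fma) counts as a tuple, which is my understanding 
of the package rather than something stated in the thread. The 3 GHz clock 
rate and the 256x256 size are placeholders.

```julia
using FFTW  # assumption: FFTW.jl on a recent Julia

x = zeros(ComplexF64, 256, 256)       # hypothetical problem size
p = plan_fft(x; flags=FFTW.MEASURE)

# Assumption: flops returns the counts of additions, multiplications and
# fused multiply-adds as a tuple of integers.
add, mul, fma = FFTW.flops(p)
total_flops = add + mul + 2 * fma     # count each fused multiply-add as two operations

clock_hz = 3.0e9                      # placeholder clock rate, adjust for your CPU
est_seconds = total_flops / clock_hz  # naive estimate: one operation per clock cycle
println("estimated time per transform: ", est_seconds, " s")
```

Comparing est_seconds with a measured time per transform should show 
whether the plan is within the right order of magnitude.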
