Re: [julia-users] The optimization strategy for fft didn't work

2015-12-09 Thread Yichao Yu
On Wed, Dec 9, 2015 at 5:57 AM, 博陈  wrote:

> The optimization strategy for fft given in the official documentation
> seems to fail. Why?

You didn't mention exactly which optimization strategy you are trying, so I
have to guess.

1. You should expect the first version to be no faster than the last one,
since it's doing essentially the same thing and the first one does it all in
global scope.
2. An in-place operation doesn't make much of a difference here, since the
operation you are doing is already very expensive (most of the time is spent
in FFTW).
3. It doesn't really matter for this one (since FFTW determines the
performance here), but you should benchmark the loop inside a function and
hoist the creation of the plan out of the loop. For your actual code, you
might want to make the plan a global constant or a parametrized field of a
type, since plan creation is not particularly type stable.
4. You can use `plan_fft(, flags=FFTW.MEASURE)` to let FFTW select the
best algorithm by actually measuring the time instead of guessing. It gives
me a 20% to 30% speedup for your example and, IIRC, a larger speedup for
small problems.
5. You can use `FFTW.flops(p)` to find out how many floating-point
operations are needed to perform your transform. On my computer, a
MEASURE'd plan takes 4.3 s (for 100 runs), and the naive estimate, assuming
one operation per clock cycle, is 2 s (for 100 runs), so it's the right
order of magnitude.
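A minimal sketch of points 3 to 5, assuming Julia 1.x with the FFTW.jl package (in Julia 0.4, which this thread is about, these functions lived in Base). The array size, the 3 GHz clock rate and the helper `run_ffts!` are assumptions for illustration; the code from the original post is not shown.

```julia
using FFTW            # Julia >= 0.7: the FFT functions come from the FFTW.jl package
using LinearAlgebra   # provides mul!

# Hypothetical helper: apply a precomputed plan `n` times. Keeping the loop
# inside a function (not at global scope) lets Julia compile it for the
# concrete argument types.
function run_ffts!(out, p, A, n)
    for _ in 1:n
        mul!(out, p, A)   # reuse the plan and the output buffer; no planning in the loop
    end
    return out
end

# Assumed problem size.
A = zeros(ComplexF64, 512, 512)
out = similar(A)

# Build the plan once, outside the loop. FFTW.MEASURE times candidate
# algorithms and may overwrite the array while planning, so the input is
# filled only after the plan exists.
p = plan_fft(A; flags=FFTW.MEASURE)
A .= rand(ComplexF64, size(A))

# Naive lower bound on run time: total operations for 100 transforms at an
# assumed 3e9 operations per second (one operation per cycle at 3 GHz).
ops = FFTW.flops(p)
println("operations per transform: ", ops)
println("naive estimate for 100 runs: ", 100 * ops / 3e9, " s")

run_ffts!(out, p, A, 1)          # warm-up call so @time excludes compilation
@time run_ffts!(out, p, A, 100)
```

`mul!` writes into the preallocated `out`, so the timed loop does not allocate a new result on each iteration; as point 2 says, that saving is small next to the cost of the transforms themselves.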


Re: [julia-users] The optimization strategy for fft didn't work

2015-12-09 Thread 博陈
Thanks a lot for your help. I am just starting to code in Julia, and I was
surprised to find that my Julia code is about 2.5 times slower than the
MATLAB version, even though they do the same things. I learned about the
fft optimization strategies from
http://docs.julialang.org/en/release-0.4/stdlib/math/?highlight=fft#Base.fft
. The optimization strategies are:
1. Use plan_fft!() instead of plan_fft() or fft() to reduce the memory that
has to be allocated, and create the FFT plan in advance.
2. Use flags like FFTW.MEASURE or FFTW.EXHAUSTIVE. I have already used those
flags in my project, and they certainly make a difference. I was just
confused about why the fft! and plan_fft strategies didn't work, which you
have explained clearly.
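A sketch of strategy 1 (an in-place plan on a preallocated buffer) combined with the earlier suggestion to keep the plan in a parametrized field of a type, again assuming Julia 1.x with FFTW.jl. `FFTWorkspace` and `transform!` are hypothetical names, not part of any library.

```julia
using FFTW

# Hypothetical container: the plan's type is the type parameter `P`, so code
# that reads `ws.plan` stays type stable even though the concrete plan type
# returned by plan_fft! is complicated.
struct FFTWorkspace{T,P}
    buf::Matrix{T}   # preallocated buffer, transformed in place
    plan::P
end

function FFTWorkspace(n::Integer, m::Integer; flags=FFTW.MEASURE)
    buf = zeros(ComplexF64, n, m)
    plan = plan_fft!(buf; flags=flags)   # in-place plan for buf's size and layout
    return FFTWorkspace(buf, plan)
end

# Copy new data into the buffer and transform it in place; no new result
# array is allocated.
function transform!(ws::FFTWorkspace, data)
    copyto!(ws.buf, data)
    ws.plan * ws.buf   # an in-place plan overwrites ws.buf and returns it
    return ws.buf
end

ws = FFTWorkspace(256, 256)
data = rand(ComplexF64, 256, 256)
result = transform!(ws, data)
```

Because the plan is created for `buf` specifically, it should only be applied to arrays with the same size and memory layout; here it is always applied to `buf` itself.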

On Wednesday, December 9, 2015 at 9:09:07 PM UTC+8, Yichao Yu wrote: