Did you mean that I should change all my .* , Pl* and ./ operations into 
'for' loops? I have had a try, but it seems that this will take more 
running time. I also wrote some small scripts to test the use of 
devectorization, and could not observe any performance improvement.



在 2015年12月15日星期二 UTC+8上午2:15:11,Yichao Yu写道:
>
> On Mon, Dec 14, 2015 at 1:12 PM, Yichao Yu <yyc...@gmail.com <javascript:>> 
> wrote: 
> > On Sun, Dec 13, 2015 at 10:35 PM, 博陈 <chenph...@gmail.com <javascript:>> 
> wrote: 
> >> This is my first julia code, I am happy it did the right thing, but 
> compared 
> >> with the Matlab code that did the same thing, it runs so slowly. The 
> Matlab 
> >> code takes about 90s, but the julia below takes 130s. 
>
> Another issue is that the code is not vectorizable (SIMD sense) due to 
> the way we store the complex array. (This is almost identical to the 
> simulation I am doing so I've run into this problem.) I might do some 
> more benchmarks on alternatives and report back later. 
>
> > 
> > There's a few problems with the code. See below. You are still 
> > creating many temporary arrays which is in general a bad idea. 
> > 
> >> 
> >> Potential(x::Float64, y::Float64) = -2./sqrt(x^2+1.) -2./sqrt(y^2+1.) 
>  + 
> >> 1./sqrt((x-y)^2+1.) 
> >> Ko(x::Float64,y::Float64) = x^2+y^2 
> >> 
> >> function ground() 
> >>     R = 320. 
> >>     Ks = 2^11 
> >>     x = linspace(-R, R, Ks+1) 
> > 
> > As a general advice, don't use this dimension for fft. If you replace 
> > you Ks with 2^11 - 1, you will see at least 2x speed up. 
> > 
> >>     dx = 2R/Ks 
> >>     dt = 0.1 
> >>     pm = 2π/2dx 
> >>     px = circshift( linspace(-pm, pm, Ks+1),  round(Int64, Ks/2) ) 
> >>     FFTW.set_num_threads(8) 
> >> 
> >>     V = Float64[Potential(ix, iy)  for ix in x, iy in x] 
> >>     ko = Float64[e^(-dt/4* (Ko(ipx,ipy))) for ipx in px, ipy in px] 
> >>     ko2 = ko.^2 
> >>     vo = Float64[e^(-dt* (Potential(ix,iy))) for ix in x, iy in x] 
> >>     ϕ = Array(Complex128,(Ks+1,Ks+1)) 
> >>     Pl = plan_fft!(ϕ; flags=FFTW.MEASURE) 
> >>     ϕ = V.*complex(1.) 
> >> 
> >>     normphi = sqrt(sumabs2(ϕ))*dx 
> >>     ϕ /= normphi 
> >> 
> >>     ϕ = Pl*ϕ 
> >>     ϕ .*= ko 
> >>     ϕ = Pl\ϕ 
> >> 
> >>     Δϕ = 1. 
> >>     nstep = 0 
> >>     print("start loop") 
> >>      while Δϕ > 1.e-15 
> >>         ϕ′ = ϕ 
> > 
> > You are better off pre-allocate the two buffers and swap them 
> > 
> >>         ϕ .*= vo 
> > 
> > Use `scale!` 
> > 
> >> 
> >>         normphi = sqrt(sumabs2(ϕ))*dx 
> >>         ϕ /= normphi 
> > 
> > Also use `scale!` or write the loop out 
> > 
> >> 
> >>         ϕ = Pl*ϕ 
> > 
> > The assignment is not necessary. 
> > 
> >>         ϕ .*= ko2 
> > 
> > scale! 
> > 
> >>         ϕ = Pl\ϕ 
> > 
> > Unnecessary 
> > 
> >> # if check  Δϕ for every step, 35s is needed. 
> >>         if nstep>500 
> >>             Δϕ = maxabs(ϕ-ϕ′) 
> >>         end 
> >>         nstep += 1 
> >>         if mod(nstep,200)==0 
> >>             print(nstep,"  ") 
> >>             print("Δϕ=",Δϕ,"   ") 
> >>         end 
> >> 
> >>     end 
> >>     ϕ .*= vo 
> >> 
> >>     ϕ = Pl*ϕ 
> >>     ϕ .*= ko 
> >>     ϕ = Pl\ϕ 
> >> 
> >>     normphi = sqrt(sumabs2(ϕ))*dx 
> >>     ϕ /= normphi 
> >> end 
> >> 
>

Reply via email to