Did you mean that I should change all my .* , Pl* and ./ operations into 'for' loops? I have had a try, but it seems that this will take more running time. I also wrote some small scripts to test the use of devectorization, and could not observe any performance improvement.
在 2015年12月15日星期二 UTC+8上午2:15:11,Yichao Yu写道: > > On Mon, Dec 14, 2015 at 1:12 PM, Yichao Yu <yyc...@gmail.com <javascript:>> > wrote: > > On Sun, Dec 13, 2015 at 10:35 PM, 博陈 <chenph...@gmail.com <javascript:>> > wrote: > >> This is my first julia code, I am happy it did the right thing, but > compared > >> with the Matlab code that did the same thing, it runs so slowly. The > Matlab > >> code takes about 90s, but the julia below takes 130s. > > Another issue is that the code is not vectorizable (SIMD sense) due to > the way we store the complex array. (This is almost identical to the > simulation I am doing so I've run into this problem.) I might do some > more benchmarks on alternatives and report back later. > > > > > There's a few problems with the code. See below. You are still > > creating many temporary arrays which is in general a bad idea. > > > >> > >> Potential(x::Float64, y::Float64) = -2./sqrt(x^2+1.) -2./sqrt(y^2+1.) > + > >> 1./sqrt((x-y)^2+1.) > >> Ko(x::Float64,y::Float64) = x^2+y^2 > >> > >> function ground() > >> R = 320. > >> Ks = 2^11 > >> x = linspace(-R, R, Ks+1) > > > > As a general advice, don't use this dimension for fft. If you replace > > you Ks with 2^11 - 1, you will see at least 2x speed up. > > > >> dx = 2R/Ks > >> dt = 0.1 > >> pm = 2π/2dx > >> px = circshift( linspace(-pm, pm, Ks+1), round(Int64, Ks/2) ) > >> FFTW.set_num_threads(8) > >> > >> V = Float64[Potential(ix, iy) for ix in x, iy in x] > >> ko = Float64[e^(-dt/4* (Ko(ipx,ipy))) for ipx in px, ipy in px] > >> ko2 = ko.^2 > >> vo = Float64[e^(-dt* (Potential(ix,iy))) for ix in x, iy in x] > >> ϕ = Array(Complex128,(Ks+1,Ks+1)) > >> Pl = plan_fft!(ϕ; flags=FFTW.MEASURE) > >> ϕ = V.*complex(1.) > >> > >> normphi = sqrt(sumabs2(ϕ))*dx > >> ϕ /= normphi > >> > >> ϕ = Pl*ϕ > >> ϕ .*= ko > >> ϕ = Pl\ϕ > >> > >> Δϕ = 1. > >> nstep = 0 > >> print("start loop") > >> while Δϕ > 1.e-15 > >> ϕ′ = ϕ > > > > You are better off pre-allocate the two buffers and swap them > > > >> ϕ .*= vo > > > > Use `scale!` > > > >> > >> normphi = sqrt(sumabs2(ϕ))*dx > >> ϕ /= normphi > > > > Also use `scale!` or write the loop out > > > >> > >> ϕ = Pl*ϕ > > > > The assignment is not necessary. > > > >> ϕ .*= ko2 > > > > scale! > > > >> ϕ = Pl\ϕ > > > > Unnecessary > > > >> # if check Δϕ for every step, 35s is needed. > >> if nstep>500 > >> Δϕ = maxabs(ϕ-ϕ′) > >> end > >> nstep += 1 > >> if mod(nstep,200)==0 > >> print(nstep," ") > >> print("Δϕ=",Δϕ," ") > >> end > >> > >> end > >> ϕ .*= vo > >> > >> ϕ = Pl*ϕ > >> ϕ .*= ko > >> ϕ = Pl\ϕ > >> > >> normphi = sqrt(sumabs2(ϕ))*dx > >> ϕ /= normphi > >> end > >> >