Re: [julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-30 Thread Lutfullah Tomak
Surely not the main issue here but doing `length( ϕ[:, 1] )` wastes memory 
for nothing. Instead use `size(ϕ, 1)` ?
Also `for j1 in [1:n1-1; n2+1:ns]` this can be split to 2 for loops to 
avoid allocations and memory loads. 
j1 is off by 1 in indexing so just decrease ranges by 1 as `0:n1-2` and 
n2:ns-1` and use j1 instead of `j1-1` (The same for previous loop).

On Wednesday, March 30, 2016 at 8:58:55 AM UTC+3, 博陈 wrote:
>
> I rewrote my code and manually loop from 1:n, the pre-allocation problem 
> is solved. I also added some parenthesis as you suggested, that helps, but 
> not very much. 
>
> 在 2016年3月30日星期三 UTC+8上午1:56:47,Yichao Yu写道:
>>
>>
>>
>> On Tue, Mar 29, 2016 at 1:50 PM, 博陈  wrote:
>>
>>> Additionally, the allocation problem is not solved. I guess this 
>>> http://julia-programming-language.2336112.n4.nabble.com/How-to-avoid-temporary-arrays-being-created-in-a-function-td14492.html
>>>  might 
>>> be helpful, but I just don't know how to change my code.
>>>
>>>
>> The only places you create temporary arrays according to your profile is 
>> the `sum(A[1:n])` and you just need to loop from 1:n manually instead of 
>> creating an subarray
>>  
>>
>>>
>>>
>>> 在 2016年3月30日星期三 UTC+8上午1:15:07,Yichao Yu写道:
>>>


 On Tue, Mar 29, 2016 at 12:43 PM, 博陈  wrote:

> I tried the built-in profiler, and find that the problem lies in lines 
> I end  with **, the result is shown below:
> that proved my guess, how can I pre-allocate these arrays? If I don't 
> want to divide this code into several parts that calculate these arrays 
> separately. 
>

 I don't understand what you mean by `divide this code into several 
 parts that calculate these arrays separately`
  

> | lines | backtrace |
>
> |   169 |  9011 |  ***
>
> |   173 |  1552 |
>
> |   175 |  2604 |
>
> |   179 |  2906 |
>
> |   181 |  1541 |
>
> |   192 |  4458 |
>
> |   211 | 13332 |
>
> |   214 |  8431 |
>
> |   218 | 15871 |***
>
> |   221 |  2538 |
>
>
> 在 2016年3月29日星期二 UTC+8下午9:27:27,Stefan Karpinski写道:
>>
>> Have you tried:
>>
>> (a) calling @code_typewarn on your function
>> (b) using the built-in profiler?
>>
>>
>> On Tue, Mar 29, 2016 at 9:23 AM, 博陈  wrote:
>>
>>> First of all, have a look at the result.
>>>
>>>
>>> 
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> My code calculates the evolution of 1-d 2-electron system in the 
>>> electric field, some variables are calculated during the evolution.
>>> According to the result of @time evolution, my code must have a 
>>> pre-allocation problem. Before you see the long code, i suggest that 
>>> the 
>>> hotspot might be around the Arrays prop_e, \phio, pp. I have learnt 
>>> that I 
>>> can use m = Array(Float64, 1) outside a "for" loop and empty!(m) and 
>>> push!(m, new_m) inside the loop to pre-allocate the variable m, but in 
>>> my 
>>> situations, I don't know how to pre-allocate these arrays.
>>>
>>> Below is the script (precisely, the main function) itself.
>>>
>>> function evolution(ϕ::Array{Complex{Float64}, 2},
>>>ele::Array{Float64, 1}, dx::Float64, dt::Float64,
>>>flags::Tuple{Int64, Int64, Int64, Int64})
>>> ϕg = copy(ϕ)
>>> FFTW.set_num_threads(8)
>>> ns = length( ϕ[:, 1] )
>>> x = get_x(ns, dx)
>>> p = get_p(ns, dx)
>>> if flags[4] == 1
>>> pp = similar(p)
>>> A = -cumsum(ele) * dt
>>> A² = A.*A
>>> # splitting
>>> r_sp = 150.0
>>> δ_sp = 5.0
>>> splitter = Array(Float64, ns, ns)
>>> end
>>> nt = length( ele )
>>>
>>> # # Pre-allocate result and temporary arrays
>>> #if flags[1] == 1
>>> σ = zeros(Complex128, nt)
>>> #end
>>> #if flags[2] == 1
>>> a = zeros(Float64, nt)
>>> #end
>>> #if flags[3] == 1
>>> r_ionization = 20.0
>>> n1 = round(Int, ns/2 - r_ionization/dx)
>>> n2 = round(Int, ns/2 + r_ionization/dx)
>>> ip = zeros(Float64, nt)
>>> #end
>>>
>>> # FFT plan
>>> p_fft! = plan_fft!( similar(ϕ), flags=FFTW.MEASURE )
>>>
>>> prop_x = similar( ϕ )
>>> prop_p = similar( prop_x )
>>> prop_e = similar( prop_x )
>>> # this two versions just cost the same time
>>> xplusy = Array(Float64, ns, ns)
>>> #xplusy = 

Re: [julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-30 Thread 博陈
I tried @code_warntype, and the result show that the red alert appear only 
in the io part. Maybe it's not a type-stability issue.

在 2016年3月30日星期三 UTC+8上午3:18:24,Tim Holy写道:
>
> I haven't look at this myself, but have you tried Stefan's suggestion to 
> look 
> at `@code_warntype`? This might not be a preallocation issue, it might be 
> a 
> type-stability issue. 
>
> --Tim 
>
> On Tuesday, March 29, 2016 01:56:21 PM Yichao Yu wrote: 
> > On Tue, Mar 29, 2016 at 1:50 PM, 博陈  
> wrote: 
> > > Additionally, the allocation problem is not solved. I guess this 
> > > 
> http://julia-programming-language.2336112.n4.nabble.com/How-to-avoid-tempo 
> > > rary-arrays-being-created-in-a-function-td14492.html might be helpful, 
> but 
> > > I just don't know how to change my code. 
> > 
> > The only places you create temporary arrays according to your profile is 
> > the `sum(A[1:n])` and you just need to loop from 1:n manually instead of 
> > creating an subarray 
> > 
> > > 在 2016年3月30日星期三 UTC+8上午1:15:07,Yichao Yu写道: 
> > > 
> > >> On Tue, Mar 29, 2016 at 12:43 PM, 博陈  wrote: 
> > >>> I tried the built-in profiler, and find that the problem lies in 
> lines I 
> > >>> end  with **, the result is shown below: 
> > >>> that proved my guess, how can I pre-allocate these arrays? If I 
> don't 
> > >>> want to divide this code into several parts that calculate these 
> arrays 
> > >>> separately. 
> > >> 
> > >> I don't understand what you mean by `divide this code into several 
> parts 
> > >> that calculate these arrays separately` 
> > >> 
> > >>> | lines | backtrace | 
> > >>> | 
> > >>> |   169 |  9011 |  *** 
> > >>> |   
> > >>> |   173 |  1552 | 
> > >>> |   
> > >>> |   175 |  2604 | 
> > >>> |   
> > >>> |   179 |  2906 | 
> > >>> |   
> > >>> |   181 |  1541 | 
> > >>> |   
> > >>> |   192 |  4458 | 
> > >>> |   
> > >>> |   211 | 13332 | 
> > >>> |   
> > >>> |   214 |  8431 | 
> > >>> |   
> > >>> |   218 | 15871 |*** 
> > >>> |   
> > >>> |   221 |  2538 | 
> > >>> 
> > >>> 在 2016年3月29日星期二 UTC+8下午9:27:27,Stefan Karpinski写道: 
> > >>> 
> >  Have you tried: 
> >  
> >  (a) calling @code_typewarn on your function 
> >  (b) using the built-in profiler? 
> >  
> >  On Tue, Mar 29, 2016 at 9:23 AM, 博陈  wrote: 
> > > First of all, have a look at the result. 
> > > 
> > > 
> > > <
> https://lh3.googleusercontent.com/-anNt-E4P1vM/Vvp-TybegZI/AB 
> > > 
> E/ZvDO2xarndMSgKVOXy_hcPd5NTh-7QcEA/s1600/QQ%25E5%259B%25BE%25E7%2589% 
> > > 258720160329210732.png> 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > My code calculates the evolution of 1-d 2-electron system in the 
> > > electric field, some variables are calculated during the 
> evolution. 
> > > According to the result of @time evolution, my code must have a 
> > > pre-allocation problem. Before you see the long code, i suggest 
> that 
> > > the 
> > > hotspot might be around the Arrays prop_e, \phio, pp. I have 
> learnt 
> > > that I 
> > > can use m = Array(Float64, 1) outside a "for" loop and empty!(m) 
> and 
> > > push!(m, new_m) inside the loop to pre-allocate the variable m, 
> but in 
> > > my 
> > > situations, I don't know how to pre-allocate these arrays. 
> > > 
> > > Below is the script (precisely, the main function) itself. 
> > > 
> > > function evolution(ϕ::Array{Complex{Float64}, 2}, 
> > > 
> > >ele::Array{Float64, 1}, dx::Float64, 
> dt::Float64, 
> > >flags::Tuple{Int64, Int64, Int64, Int64}) 
> > > 
> > > ϕg = copy(ϕ) 
> > > FFTW.set_num_threads(8) 
> > > ns = length( ϕ[:, 1] ) 
> > > x = get_x(ns, dx) 
> > > p = get_p(ns, dx) 
> > > if flags[4] == 1 
> > > 
> > > pp = similar(p) 
> > > A = -cumsum(ele) * dt 
> > > A² = A.*A 
> > > # splitting 
> > > r_sp = 150.0 
> > > δ_sp = 5.0 
> > > splitter = Array(Float64, ns, ns) 
> > > 
> > > end 
> > > nt = length( ele ) 
> > > 
> > > # # Pre-allocate result and temporary arrays 
> > > #if flags[1] == 1 
> > > σ = zeros(Complex128, nt) 
> > > #end 
> > > #if flags[2] == 1 
> > > a = zeros(Float64, nt) 
> > > #end 
> > > #if flags[3] == 1 
> > > r_ionization = 20.0 
> > > n1 = round(Int, ns/2 - r_ionization/dx) 
> > > n2 = round(Int, ns/2 + r_ionization/dx) 
> > > ip = zeros(Float64, nt) 
> > > #end 
> > > 
> > > # FFT plan 
> > > p_fft! = plan_fft!( similar(ϕ), flags=FFTW.MEASURE ) 
> > > 
> > > prop_x = 

Re: [julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-29 Thread 博陈
I rewrote my code and manually loop from 1:n, the pre-allocation problem is 
solved. I also added some parenthesis as you suggested, that helps, but not 
very much. 

在 2016年3月30日星期三 UTC+8上午1:56:47,Yichao Yu写道:
>
>
>
> On Tue, Mar 29, 2016 at 1:50 PM, 博陈  
> wrote:
>
>> Additionally, the allocation problem is not solved. I guess this 
>> http://julia-programming-language.2336112.n4.nabble.com/How-to-avoid-temporary-arrays-being-created-in-a-function-td14492.html
>>  might 
>> be helpful, but I just don't know how to change my code.
>>
>>
> The only places you create temporary arrays according to your profile is 
> the `sum(A[1:n])` and you just need to loop from 1:n manually instead of 
> creating an subarray
>  
>
>>
>>
>> 在 2016年3月30日星期三 UTC+8上午1:15:07,Yichao Yu写道:
>>
>>>
>>>
>>> On Tue, Mar 29, 2016 at 12:43 PM, 博陈  wrote:
>>>
 I tried the built-in profiler, and find that the problem lies in lines 
 I end  with **, the result is shown below:
 that proved my guess, how can I pre-allocate these arrays? If I don't 
 want to divide this code into several parts that calculate these arrays 
 separately. 

>>>
>>> I don't understand what you mean by `divide this code into several parts 
>>> that calculate these arrays separately`
>>>  
>>>
 | lines | backtrace |

 |   169 |  9011 |  ***

 |   173 |  1552 |

 |   175 |  2604 |

 |   179 |  2906 |

 |   181 |  1541 |

 |   192 |  4458 |

 |   211 | 13332 |

 |   214 |  8431 |

 |   218 | 15871 |***

 |   221 |  2538 |


 在 2016年3月29日星期二 UTC+8下午9:27:27,Stefan Karpinski写道:
>
> Have you tried:
>
> (a) calling @code_typewarn on your function
> (b) using the built-in profiler?
>
>
> On Tue, Mar 29, 2016 at 9:23 AM, 博陈  wrote:
>
>> First of all, have a look at the result.
>>
>>
>> 
>>
>>
>>
>>
>>
>>
>>
>>
>> My code calculates the evolution of 1-d 2-electron system in the 
>> electric field, some variables are calculated during the evolution.
>> According to the result of @time evolution, my code must have a 
>> pre-allocation problem. Before you see the long code, i suggest that the 
>> hotspot might be around the Arrays prop_e, \phio, pp. I have learnt that 
>> I 
>> can use m = Array(Float64, 1) outside a "for" loop and empty!(m) and 
>> push!(m, new_m) inside the loop to pre-allocate the variable m, but in 
>> my 
>> situations, I don't know how to pre-allocate these arrays.
>>
>> Below is the script (precisely, the main function) itself.
>>
>> function evolution(ϕ::Array{Complex{Float64}, 2},
>>ele::Array{Float64, 1}, dx::Float64, dt::Float64,
>>flags::Tuple{Int64, Int64, Int64, Int64})
>> ϕg = copy(ϕ)
>> FFTW.set_num_threads(8)
>> ns = length( ϕ[:, 1] )
>> x = get_x(ns, dx)
>> p = get_p(ns, dx)
>> if flags[4] == 1
>> pp = similar(p)
>> A = -cumsum(ele) * dt
>> A² = A.*A
>> # splitting
>> r_sp = 150.0
>> δ_sp = 5.0
>> splitter = Array(Float64, ns, ns)
>> end
>> nt = length( ele )
>>
>> # # Pre-allocate result and temporary arrays
>> #if flags[1] == 1
>> σ = zeros(Complex128, nt)
>> #end
>> #if flags[2] == 1
>> a = zeros(Float64, nt)
>> #end
>> #if flags[3] == 1
>> r_ionization = 20.0
>> n1 = round(Int, ns/2 - r_ionization/dx)
>> n2 = round(Int, ns/2 + r_ionization/dx)
>> ip = zeros(Float64, nt)
>> #end
>>
>> # FFT plan
>> p_fft! = plan_fft!( similar(ϕ), flags=FFTW.MEASURE )
>>
>> prop_x = similar( ϕ )
>> prop_p = similar( prop_x )
>> prop_e = similar( prop_x )
>> # this two versions just cost the same time
>> xplusy = Array(Float64, ns, ns)
>> #xplusy = Array( Float64, ns^2)
>>
>> # absorb boundary
>> r_a = ns * dx /2 - 50.0
>> δ = 10.0
>> absorb = Array(Float64, ns, ns)
>>
>> k0 = 2π / (ns * dx)
>>
>> @inbounds for j in 1:ns
>> @inbounds for i in 1:ns
>> prop_x[i, j] = exp( -im * get_potential(x[i], x[j]) * dt 
>> / 2 )
>> prop_p[i, j] = exp( -im * (p[i]^2 + p[j]^2)/2 * dt )
>>
>> xplusy[i, j] = x[i] + x[j]
>>
>> absorb[i, j] = (1.0 - get_out(x[i], r_a, δ ))* (1.0 - 

Re: [julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-29 Thread Tim Holy
I haven't look at this myself, but have you tried Stefan's suggestion to look 
at `@code_warntype`? This might not be a preallocation issue, it might be a 
type-stability issue.

--Tim

On Tuesday, March 29, 2016 01:56:21 PM Yichao Yu wrote:
> On Tue, Mar 29, 2016 at 1:50 PM, 博陈  wrote:
> > Additionally, the allocation problem is not solved. I guess this
> > http://julia-programming-language.2336112.n4.nabble.com/How-to-avoid-tempo
> > rary-arrays-being-created-in-a-function-td14492.html might be helpful, but
> > I just don't know how to change my code.
> 
> The only places you create temporary arrays according to your profile is
> the `sum(A[1:n])` and you just need to loop from 1:n manually instead of
> creating an subarray
> 
> > 在 2016年3月30日星期三 UTC+8上午1:15:07,Yichao Yu写道:
> > 
> >> On Tue, Mar 29, 2016 at 12:43 PM, 博陈  wrote:
> >>> I tried the built-in profiler, and find that the problem lies in lines I
> >>> end  with **, the result is shown below:
> >>> that proved my guess, how can I pre-allocate these arrays? If I don't
> >>> want to divide this code into several parts that calculate these arrays
> >>> separately.
> >> 
> >> I don't understand what you mean by `divide this code into several parts
> >> that calculate these arrays separately`
> >> 
> >>> | lines | backtrace |
> >>> | 
> >>> |   169 |  9011 |  ***
> >>> |   
> >>> |   173 |  1552 |
> >>> |   
> >>> |   175 |  2604 |
> >>> |   
> >>> |   179 |  2906 |
> >>> |   
> >>> |   181 |  1541 |
> >>> |   
> >>> |   192 |  4458 |
> >>> |   
> >>> |   211 | 13332 |
> >>> |   
> >>> |   214 |  8431 |
> >>> |   
> >>> |   218 | 15871 |***
> >>> |   
> >>> |   221 |  2538 |
> >>> 
> >>> 在 2016年3月29日星期二 UTC+8下午9:27:27,Stefan Karpinski写道:
> >>> 
>  Have you tried:
>  
>  (a) calling @code_typewarn on your function
>  (b) using the built-in profiler?
>  
>  On Tue, Mar 29, 2016 at 9:23 AM, 博陈  wrote:
> > First of all, have a look at the result.
> > 
> > 
> >  > E/ZvDO2xarndMSgKVOXy_hcPd5NTh-7QcEA/s1600/QQ%25E5%259B%25BE%25E7%2589%
> > 258720160329210732.png>
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > My code calculates the evolution of 1-d 2-electron system in the
> > electric field, some variables are calculated during the evolution.
> > According to the result of @time evolution, my code must have a
> > pre-allocation problem. Before you see the long code, i suggest that
> > the
> > hotspot might be around the Arrays prop_e, \phio, pp. I have learnt
> > that I
> > can use m = Array(Float64, 1) outside a "for" loop and empty!(m) and
> > push!(m, new_m) inside the loop to pre-allocate the variable m, but in
> > my
> > situations, I don't know how to pre-allocate these arrays.
> > 
> > Below is the script (precisely, the main function) itself.
> > 
> > function evolution(ϕ::Array{Complex{Float64}, 2},
> > 
> >ele::Array{Float64, 1}, dx::Float64, dt::Float64,
> >flags::Tuple{Int64, Int64, Int64, Int64})
> > 
> > ϕg = copy(ϕ)
> > FFTW.set_num_threads(8)
> > ns = length( ϕ[:, 1] )
> > x = get_x(ns, dx)
> > p = get_p(ns, dx)
> > if flags[4] == 1
> > 
> > pp = similar(p)
> > A = -cumsum(ele) * dt
> > A² = A.*A
> > # splitting
> > r_sp = 150.0
> > δ_sp = 5.0
> > splitter = Array(Float64, ns, ns)
> > 
> > end
> > nt = length( ele )
> > 
> > # # Pre-allocate result and temporary arrays
> > #if flags[1] == 1
> > σ = zeros(Complex128, nt)
> > #end
> > #if flags[2] == 1
> > a = zeros(Float64, nt)
> > #end
> > #if flags[3] == 1
> > r_ionization = 20.0
> > n1 = round(Int, ns/2 - r_ionization/dx)
> > n2 = round(Int, ns/2 + r_ionization/dx)
> > ip = zeros(Float64, nt)
> > #end
> > 
> > # FFT plan
> > p_fft! = plan_fft!( similar(ϕ), flags=FFTW.MEASURE )
> > 
> > prop_x = similar( ϕ )
> > prop_p = similar( prop_x )
> > prop_e = similar( prop_x )
> > # this two versions just cost the same time
> > xplusy = Array(Float64, ns, ns)
> > #xplusy = Array( Float64, ns^2)
> > 
> > # absorb boundary
> > r_a = ns * dx /2 - 50.0
> > δ = 10.0
> > absorb = Array(Float64, ns, ns)
> > 
> > k0 = 2π / (ns * dx)
> > 
> > @inbounds for j in 1:ns
> > 
> > @inbounds for i in 1:ns
> > 
> > 

Re: [julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-29 Thread Yichao Yu
On Tue, Mar 29, 2016 at 1:50 PM, 博陈  wrote:

> Additionally, the allocation problem is not solved. I guess this
> http://julia-programming-language.2336112.n4.nabble.com/How-to-avoid-temporary-arrays-being-created-in-a-function-td14492.html
>  might
> be helpful, but I just don't know how to change my code.
>
>
The only places you create temporary arrays according to your profile is
the `sum(A[1:n])` and you just need to loop from 1:n manually instead of
creating an subarray


>
>
> 在 2016年3月30日星期三 UTC+8上午1:15:07,Yichao Yu写道:
>
>>
>>
>> On Tue, Mar 29, 2016 at 12:43 PM, 博陈  wrote:
>>
>>> I tried the built-in profiler, and find that the problem lies in lines I
>>> end  with **, the result is shown below:
>>> that proved my guess, how can I pre-allocate these arrays? If I don't
>>> want to divide this code into several parts that calculate these arrays
>>> separately.
>>>
>>
>> I don't understand what you mean by `divide this code into several parts
>> that calculate these arrays separately`
>>
>>
>>> | lines | backtrace |
>>>
>>> |   169 |  9011 |  ***
>>>
>>> |   173 |  1552 |
>>>
>>> |   175 |  2604 |
>>>
>>> |   179 |  2906 |
>>>
>>> |   181 |  1541 |
>>>
>>> |   192 |  4458 |
>>>
>>> |   211 | 13332 |
>>>
>>> |   214 |  8431 |
>>>
>>> |   218 | 15871 |***
>>>
>>> |   221 |  2538 |
>>>
>>>
>>> 在 2016年3月29日星期二 UTC+8下午9:27:27,Stefan Karpinski写道:

 Have you tried:

 (a) calling @code_typewarn on your function
 (b) using the built-in profiler?


 On Tue, Mar 29, 2016 at 9:23 AM, 博陈  wrote:

> First of all, have a look at the result.
>
>
> 
>
>
>
>
>
>
>
>
> My code calculates the evolution of 1-d 2-electron system in the
> electric field, some variables are calculated during the evolution.
> According to the result of @time evolution, my code must have a
> pre-allocation problem. Before you see the long code, i suggest that the
> hotspot might be around the Arrays prop_e, \phio, pp. I have learnt that I
> can use m = Array(Float64, 1) outside a "for" loop and empty!(m) and
> push!(m, new_m) inside the loop to pre-allocate the variable m, but in my
> situations, I don't know how to pre-allocate these arrays.
>
> Below is the script (precisely, the main function) itself.
>
> function evolution(ϕ::Array{Complex{Float64}, 2},
>ele::Array{Float64, 1}, dx::Float64, dt::Float64,
>flags::Tuple{Int64, Int64, Int64, Int64})
> ϕg = copy(ϕ)
> FFTW.set_num_threads(8)
> ns = length( ϕ[:, 1] )
> x = get_x(ns, dx)
> p = get_p(ns, dx)
> if flags[4] == 1
> pp = similar(p)
> A = -cumsum(ele) * dt
> A² = A.*A
> # splitting
> r_sp = 150.0
> δ_sp = 5.0
> splitter = Array(Float64, ns, ns)
> end
> nt = length( ele )
>
> # # Pre-allocate result and temporary arrays
> #if flags[1] == 1
> σ = zeros(Complex128, nt)
> #end
> #if flags[2] == 1
> a = zeros(Float64, nt)
> #end
> #if flags[3] == 1
> r_ionization = 20.0
> n1 = round(Int, ns/2 - r_ionization/dx)
> n2 = round(Int, ns/2 + r_ionization/dx)
> ip = zeros(Float64, nt)
> #end
>
> # FFT plan
> p_fft! = plan_fft!( similar(ϕ), flags=FFTW.MEASURE )
>
> prop_x = similar( ϕ )
> prop_p = similar( prop_x )
> prop_e = similar( prop_x )
> # this two versions just cost the same time
> xplusy = Array(Float64, ns, ns)
> #xplusy = Array( Float64, ns^2)
>
> # absorb boundary
> r_a = ns * dx /2 - 50.0
> δ = 10.0
> absorb = Array(Float64, ns, ns)
>
> k0 = 2π / (ns * dx)
>
> @inbounds for j in 1:ns
> @inbounds for i in 1:ns
> prop_x[i, j] = exp( -im * get_potential(x[i], x[j]) * dt /
> 2 )
> prop_p[i, j] = exp( -im * (p[i]^2 + p[j]^2)/2 * dt )
>
> xplusy[i, j] = x[i] + x[j]
>
> absorb[i, j] = (1.0 - get_out(x[i], r_a, δ ))* (1.0 -
> get_out(x[j],
>  r_a, δ))
> end
> end
>
> if flags[2] == 1
> pvpx = Array(Float64, ns, ns)
> @inbounds for j in 1:ns
> @inbounds for i in 1:ns
> pvpx[i, j] = get_pvpx(x[i], x[j])
> end
> end
> end
>
> if flags[4] == 1
> ϕo = 

Re: [julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-29 Thread Yichao Yu
On Tue, Mar 29, 2016 at 1:45 PM, 博陈  wrote:

> Thank you very much for your help.
>
> But I don't understand what the difference is between  `a * b * c * d` and
> `(a * b) * (c * d)`. Do you mean that the latter is slower?
>

The latter is faster since the compiler currently have a hard time inlining
the first one and creates really bad code.


>
> Before I wrote this function, I just wrote a much simple one, which only
> calculate array a (physically, the dipole acceleration). At that time, I
> only need to know the dipole acceleration during the evolution. Then I
> found
> I need other information during evolution, so I wrote another function to
> calculate ip, the ionization probability... Same story with \sigma and
> \phip, Now I wrote them together, and give a flags parameter. If flags =
> (1, 1, 1, 1), the script calculate all the 4 quantities, while if
> flags=(0,0,0,0), the function just give me the final wave function.
>
> 在 2016年3月30日星期三 UTC+8上午1:15:07,Yichao Yu写道:
>
>>
>>
>> On Tue, Mar 29, 2016 at 12:43 PM, 博陈  wrote:
>>
>>> I tried the built-in profiler, and find that the problem lies in lines I
>>> end  with **, the result is shown below:
>>> that proved my guess, how can I pre-allocate these arrays? If I don't
>>> want to divide this code into several parts that calculate these arrays
>>> separately.
>>>
>>
>> I don't understand what you mean by `divide this code into several parts
>> that calculate these arrays separately`
>>
>>
>>> | lines | backtrace |
>>>
>>> |   169 |  9011 |  ***
>>>
>>> |   173 |  1552 |
>>>
>>> |   175 |  2604 |
>>>
>>> |   179 |  2906 |
>>>
>>> |   181 |  1541 |
>>>
>>> |   192 |  4458 |
>>>
>>> |   211 | 13332 |
>>>
>>> |   214 |  8431 |
>>>
>>> |   218 | 15871 |***
>>>
>>> |   221 |  2538 |
>>>
>>>
>>> 在 2016年3月29日星期二 UTC+8下午9:27:27,Stefan Karpinski写道:

 Have you tried:

 (a) calling @code_typewarn on your function
 (b) using the built-in profiler?


 On Tue, Mar 29, 2016 at 9:23 AM, 博陈  wrote:

> First of all, have a look at the result.
>
>
> 
>
>
>
>
>
>
>
>
> My code calculates the evolution of 1-d 2-electron system in the
> electric field, some variables are calculated during the evolution.
> According to the result of @time evolution, my code must have a
> pre-allocation problem. Before you see the long code, i suggest that the
> hotspot might be around the Arrays prop_e, \phio, pp. I have learnt that I
> can use m = Array(Float64, 1) outside a "for" loop and empty!(m) and
> push!(m, new_m) inside the loop to pre-allocate the variable m, but in my
> situations, I don't know how to pre-allocate these arrays.
>
> Below is the script (precisely, the main function) itself.
>
> function evolution(ϕ::Array{Complex{Float64}, 2},
>ele::Array{Float64, 1}, dx::Float64, dt::Float64,
>flags::Tuple{Int64, Int64, Int64, Int64})
> ϕg = copy(ϕ)
> FFTW.set_num_threads(8)
> ns = length( ϕ[:, 1] )
> x = get_x(ns, dx)
> p = get_p(ns, dx)
> if flags[4] == 1
> pp = similar(p)
> A = -cumsum(ele) * dt
> A² = A.*A
> # splitting
> r_sp = 150.0
> δ_sp = 5.0
> splitter = Array(Float64, ns, ns)
> end
> nt = length( ele )
>
> # # Pre-allocate result and temporary arrays
> #if flags[1] == 1
> σ = zeros(Complex128, nt)
> #end
> #if flags[2] == 1
> a = zeros(Float64, nt)
> #end
> #if flags[3] == 1
> r_ionization = 20.0
> n1 = round(Int, ns/2 - r_ionization/dx)
> n2 = round(Int, ns/2 + r_ionization/dx)
> ip = zeros(Float64, nt)
> #end
>
> # FFT plan
> p_fft! = plan_fft!( similar(ϕ), flags=FFTW.MEASURE )
>
> prop_x = similar( ϕ )
> prop_p = similar( prop_x )
> prop_e = similar( prop_x )
> # this two versions just cost the same time
> xplusy = Array(Float64, ns, ns)
> #xplusy = Array( Float64, ns^2)
>
> # absorb boundary
> r_a = ns * dx /2 - 50.0
> δ = 10.0
> absorb = Array(Float64, ns, ns)
>
> k0 = 2π / (ns * dx)
>
> @inbounds for j in 1:ns
> @inbounds for i in 1:ns
> prop_x[i, j] = exp( -im * get_potential(x[i], x[j]) * dt /
> 2 )
> prop_p[i, j] = exp( -im * (p[i]^2 + p[j]^2)/2 * dt )
>
> xplusy[i, j] = x[i] + x[j]
>
> 

Re: [julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-29 Thread 博陈
Additionally, the allocation problem is not solved. I guess this 
http://julia-programming-language.2336112.n4.nabble.com/How-to-avoid-temporary-arrays-being-created-in-a-function-td14492.html
 might 
be helpful, but I just don't know how to change my code.



在 2016年3月30日星期三 UTC+8上午1:15:07,Yichao Yu写道:
>
>
>
> On Tue, Mar 29, 2016 at 12:43 PM, 博陈  
> wrote:
>
>> I tried the built-in profiler, and find that the problem lies in lines I 
>> end  with **, the result is shown below:
>> that proved my guess, how can I pre-allocate these arrays? If I don't 
>> want to divide this code into several parts that calculate these arrays 
>> separately. 
>>
>
> I don't understand what you mean by `divide this code into several parts 
> that calculate these arrays separately`
>  
>
>> | lines | backtrace |
>>
>> |   169 |  9011 |  ***
>>
>> |   173 |  1552 |
>>
>> |   175 |  2604 |
>>
>> |   179 |  2906 |
>>
>> |   181 |  1541 |
>>
>> |   192 |  4458 |
>>
>> |   211 | 13332 |
>>
>> |   214 |  8431 |
>>
>> |   218 | 15871 |***
>>
>> |   221 |  2538 |
>>
>>
>> 在 2016年3月29日星期二 UTC+8下午9:27:27,Stefan Karpinski写道:
>>>
>>> Have you tried:
>>>
>>> (a) calling @code_typewarn on your function
>>> (b) using the built-in profiler?
>>>
>>>
>>> On Tue, Mar 29, 2016 at 9:23 AM, 博陈  wrote:
>>>
 First of all, have a look at the result.


 








 My code calculates the evolution of 1-d 2-electron system in the 
 electric field, some variables are calculated during the evolution.
 According to the result of @time evolution, my code must have a 
 pre-allocation problem. Before you see the long code, i suggest that the 
 hotspot might be around the Arrays prop_e, \phio, pp. I have learnt that I 
 can use m = Array(Float64, 1) outside a "for" loop and empty!(m) and 
 push!(m, new_m) inside the loop to pre-allocate the variable m, but in my 
 situations, I don't know how to pre-allocate these arrays.

 Below is the script (precisely, the main function) itself.

 function evolution(ϕ::Array{Complex{Float64}, 2},
ele::Array{Float64, 1}, dx::Float64, dt::Float64,
flags::Tuple{Int64, Int64, Int64, Int64})
 ϕg = copy(ϕ)
 FFTW.set_num_threads(8)
 ns = length( ϕ[:, 1] )
 x = get_x(ns, dx)
 p = get_p(ns, dx)
 if flags[4] == 1
 pp = similar(p)
 A = -cumsum(ele) * dt
 A² = A.*A
 # splitting
 r_sp = 150.0
 δ_sp = 5.0
 splitter = Array(Float64, ns, ns)
 end
 nt = length( ele )

 # # Pre-allocate result and temporary arrays
 #if flags[1] == 1
 σ = zeros(Complex128, nt)
 #end
 #if flags[2] == 1
 a = zeros(Float64, nt)
 #end
 #if flags[3] == 1
 r_ionization = 20.0
 n1 = round(Int, ns/2 - r_ionization/dx)
 n2 = round(Int, ns/2 + r_ionization/dx)
 ip = zeros(Float64, nt)
 #end

 # FFT plan
 p_fft! = plan_fft!( similar(ϕ), flags=FFTW.MEASURE )

 prop_x = similar( ϕ )
 prop_p = similar( prop_x )
 prop_e = similar( prop_x )
 # this two versions just cost the same time
 xplusy = Array(Float64, ns, ns)
 #xplusy = Array( Float64, ns^2)

 # absorb boundary
 r_a = ns * dx /2 - 50.0
 δ = 10.0
 absorb = Array(Float64, ns, ns)

 k0 = 2π / (ns * dx)

 @inbounds for j in 1:ns
 @inbounds for i in 1:ns
 prop_x[i, j] = exp( -im * get_potential(x[i], x[j]) * dt / 
 2 )
 prop_p[i, j] = exp( -im * (p[i]^2 + p[j]^2)/2 * dt )

 xplusy[i, j] = x[i] + x[j]

 absorb[i, j] = (1.0 - get_out(x[i], r_a, δ ))* (1.0 - 
 get_out(x[j],
  r_a, δ))
 end
 end

 if flags[2] == 1
 pvpx = Array(Float64, ns, ns)
 @inbounds for j in 1:ns
 @inbounds for i in 1:ns
 pvpx[i, j] = get_pvpx(x[i], x[j])
 end
 end
 end

 if flags[4] == 1
 ϕo = zeros(Complex128, ns, ns)
 ϕp = zeros(Complex128, ns, ns)
 @inbounds for  j in 1:ns
 @inbounds for  i in 1:ns
 splitter[i, j] = get_out(x[i], r_sp, δ_sp) * 
 get_out(x[j], r_sp, δ_sp)
 end
 end
 end

 for i in 1:nt
 for j in eachindex(ϕ)
 prop_e[j] 

Re: [julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-29 Thread 博陈
Thank you very much for your help. 

But I don't understand what the difference is between  `a * b * c * d` and 
`(a * b) * (c * d)`. Do you mean that the latter is slower?

Before I wrote this function, I just wrote a much simple one, which only 
calculate array a (physically, the dipole acceleration). At that time, I 
only need to know the dipole acceleration during the evolution. Then I 
found 
I need other information during evolution, so I wrote another function to 
calculate ip, the ionization probability... Same story with \sigma and 
\phip, Now I wrote them together, and give a flags parameter. If flags = 
(1, 1, 1, 1), the script calculate all the 4 quantities, while if 
flags=(0,0,0,0), the function just give me the final wave function.

在 2016年3月30日星期三 UTC+8上午1:15:07,Yichao Yu写道:
>
>
>
> On Tue, Mar 29, 2016 at 12:43 PM, 博陈  
> wrote:
>
>> I tried the built-in profiler, and find that the problem lies in lines I 
>> end  with **, the result is shown below:
>> that proved my guess, how can I pre-allocate these arrays? If I don't 
>> want to divide this code into several parts that calculate these arrays 
>> separately. 
>>
>
> I don't understand what you mean by `divide this code into several parts 
> that calculate these arrays separately`
>  
>
>> | lines | backtrace |
>>
>> |   169 |  9011 |  ***
>>
>> |   173 |  1552 |
>>
>> |   175 |  2604 |
>>
>> |   179 |  2906 |
>>
>> |   181 |  1541 |
>>
>> |   192 |  4458 |
>>
>> |   211 | 13332 |
>>
>> |   214 |  8431 |
>>
>> |   218 | 15871 |***
>>
>> |   221 |  2538 |
>>
>>
>> 在 2016年3月29日星期二 UTC+8下午9:27:27,Stefan Karpinski写道:
>>>
>>> Have you tried:
>>>
>>> (a) calling @code_typewarn on your function
>>> (b) using the built-in profiler?
>>>
>>>
>>> On Tue, Mar 29, 2016 at 9:23 AM, 博陈  wrote:
>>>
 First of all, have a look at the result.


 








 My code calculates the evolution of 1-d 2-electron system in the 
 electric field, some variables are calculated during the evolution.
 According to the result of @time evolution, my code must have a 
 pre-allocation problem. Before you see the long code, i suggest that the 
 hotspot might be around the Arrays prop_e, \phio, pp. I have learnt that I 
 can use m = Array(Float64, 1) outside a "for" loop and empty!(m) and 
 push!(m, new_m) inside the loop to pre-allocate the variable m, but in my 
 situations, I don't know how to pre-allocate these arrays.

 Below is the script (precisely, the main function) itself.

 function evolution(ϕ::Array{Complex{Float64}, 2},
ele::Array{Float64, 1}, dx::Float64, dt::Float64,
flags::Tuple{Int64, Int64, Int64, Int64})
 ϕg = copy(ϕ)
 FFTW.set_num_threads(8)
 ns = length( ϕ[:, 1] )
 x = get_x(ns, dx)
 p = get_p(ns, dx)
 if flags[4] == 1
 pp = similar(p)
 A = -cumsum(ele) * dt
 A² = A.*A
 # splitting
 r_sp = 150.0
 δ_sp = 5.0
 splitter = Array(Float64, ns, ns)
 end
 nt = length( ele )

 # # Pre-allocate result and temporary arrays
 #if flags[1] == 1
 σ = zeros(Complex128, nt)
 #end
 #if flags[2] == 1
 a = zeros(Float64, nt)
 #end
 #if flags[3] == 1
 r_ionization = 20.0
 n1 = round(Int, ns/2 - r_ionization/dx)
 n2 = round(Int, ns/2 + r_ionization/dx)
 ip = zeros(Float64, nt)
 #end

 # FFT plan
 p_fft! = plan_fft!( similar(ϕ), flags=FFTW.MEASURE )

 prop_x = similar( ϕ )
 prop_p = similar( prop_x )
 prop_e = similar( prop_x )
 # this two versions just cost the same time
 xplusy = Array(Float64, ns, ns)
 #xplusy = Array( Float64, ns^2)

 # absorb boundary
 r_a = ns * dx /2 - 50.0
 δ = 10.0
 absorb = Array(Float64, ns, ns)

 k0 = 2π / (ns * dx)

 @inbounds for j in 1:ns
 @inbounds for i in 1:ns
 prop_x[i, j] = exp( -im * get_potential(x[i], x[j]) * dt / 
 2 )
 prop_p[i, j] = exp( -im * (p[i]^2 + p[j]^2)/2 * dt )

 xplusy[i, j] = x[i] + x[j]

 absorb[i, j] = (1.0 - get_out(x[i], r_a, δ ))* (1.0 - 
 get_out(x[j],
  r_a, δ))
 end
 end

 if flags[2] == 1
 pvpx = Array(Float64, ns, ns)
 @inbounds for j in 1:ns
 @inbounds for i in 1:ns
 pvpx[i, j] = 

Re: [julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-29 Thread Yichao Yu
On Tue, Mar 29, 2016 at 12:43 PM, 博陈  wrote:

> I tried the built-in profiler, and find that the problem lies in lines I
> end  with **, the result is shown below:
> that proved my guess, how can I pre-allocate these arrays? If I don't want
> to divide this code into several parts that calculate these arrays
> separately.
>

I don't understand what you mean by `divide this code into several parts
that calculate these arrays separately`


> | lines | backtrace |
>
> |   169 |  9011 |  ***
>
> |   173 |  1552 |
>
> |   175 |  2604 |
>
> |   179 |  2906 |
>
> |   181 |  1541 |
>
> |   192 |  4458 |
>
> |   211 | 13332 |
>
> |   214 |  8431 |
>
> |   218 | 15871 |***
>
> |   221 |  2538 |
>
>
> 在 2016年3月29日星期二 UTC+8下午9:27:27,Stefan Karpinski写道:
>>
>> Have you tried:
>>
>> (a) calling @code_typewarn on your function
>> (b) using the built-in profiler?
>>
>>
>> On Tue, Mar 29, 2016 at 9:23 AM, 博陈  wrote:
>>
>>> First of all, have a look at the result.
>>>
>>>
>>> 
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> My code calculates the evolution of 1-d 2-electron system in the
>>> electric field, some variables are calculated during the evolution.
>>> According to the result of @time evolution, my code must have a
>>> pre-allocation problem. Before you see the long code, i suggest that the
>>> hotspot might be around the Arrays prop_e, \phio, pp. I have learnt that I
>>> can use m = Array(Float64, 1) outside a "for" loop and empty!(m) and
>>> push!(m, new_m) inside the loop to pre-allocate the variable m, but in my
>>> situations, I don't know how to pre-allocate these arrays.
>>>
>>> Below is the script (precisely, the main function) itself.
>>>
>>> function evolution(ϕ::Array{Complex{Float64}, 2},
>>>ele::Array{Float64, 1}, dx::Float64, dt::Float64,
>>>flags::Tuple{Int64, Int64, Int64, Int64})
>>> ϕg = copy(ϕ)
>>> FFTW.set_num_threads(8)
>>> ns = length( ϕ[:, 1] )
>>> x = get_x(ns, dx)
>>> p = get_p(ns, dx)
>>> if flags[4] == 1
>>> pp = similar(p)
>>> A = -cumsum(ele) * dt
>>> A² = A.*A
>>> # splitting
>>> r_sp = 150.0
>>> δ_sp = 5.0
>>> splitter = Array(Float64, ns, ns)
>>> end
>>> nt = length( ele )
>>>
>>> # # Pre-allocate result and temporary arrays
>>> #if flags[1] == 1
>>> σ = zeros(Complex128, nt)
>>> #end
>>> #if flags[2] == 1
>>> a = zeros(Float64, nt)
>>> #end
>>> #if flags[3] == 1
>>> r_ionization = 20.0
>>> n1 = round(Int, ns/2 - r_ionization/dx)
>>> n2 = round(Int, ns/2 + r_ionization/dx)
>>> ip = zeros(Float64, nt)
>>> #end
>>>
>>> # FFT plan
>>> p_fft! = plan_fft!( similar(ϕ), flags=FFTW.MEASURE )
>>>
>>> prop_x = similar( ϕ )
>>> prop_p = similar( prop_x )
>>> prop_e = similar( prop_x )
>>> # this two versions just cost the same time
>>> xplusy = Array(Float64, ns, ns)
>>> #xplusy = Array( Float64, ns^2)
>>>
>>> # absorb boundary
>>> r_a = ns * dx /2 - 50.0
>>> δ = 10.0
>>> absorb = Array(Float64, ns, ns)
>>>
>>> k0 = 2π / (ns * dx)
>>>
>>> @inbounds for j in 1:ns
>>> @inbounds for i in 1:ns
>>> prop_x[i, j] = exp( -im * get_potential(x[i], x[j]) * dt / 2
>>> )
>>> prop_p[i, j] = exp( -im * (p[i]^2 + p[j]^2)/2 * dt )
>>>
>>> xplusy[i, j] = x[i] + x[j]
>>>
>>> absorb[i, j] = (1.0 - get_out(x[i], r_a, δ ))* (1.0 -
>>> get_out(x[j],
>>>  r_a, δ))
>>> end
>>> end
>>>
>>> if flags[2] == 1
>>> pvpx = Array(Float64, ns, ns)
>>> @inbounds for j in 1:ns
>>> @inbounds for i in 1:ns
>>> pvpx[i, j] = get_pvpx(x[i], x[j])
>>> end
>>> end
>>> end
>>>
>>> if flags[4] == 1
>>> ϕo = zeros(Complex128, ns, ns)
>>> ϕp = zeros(Complex128, ns, ns)
>>> @inbounds for  j in 1:ns
>>> @inbounds for  i in 1:ns
>>> splitter[i, j] = get_out(x[i], r_sp, δ_sp) *
>>> get_out(x[j], r_sp, δ_sp)
>>> end
>>> end
>>> end
>>>
>>> for i in 1:nt
>>> for j in eachindex(ϕ)
>>> prop_e[j] = exp( -im * ele[i] * xplusy[j] * dt/2.0)
>>> 169
>>>
>>>
You might be hitting a stupid inlining issue here, try adding parenthesis
to the multiplication
(i.e. instead of `a * b * c * d` do `(a * b) * (c * d)`)


> end
>>>
>>> for j in eachindex(ϕ)
>>> ϕ[j] *= prop_x[j] * prop_e[j]
>>> end
>>> p_fft! * ϕ
>>> for j in eachindex(ϕ)
>>> ϕ[j] *= prop_p[j]
>>> end
>>> 

Re: [julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-29 Thread 博陈
sorry, but I give the lines in the citing area below the table.

在 2016年3月30日星期三 UTC+8上午12:50:30,Milan Bouchet-Valat写道:
>
> Le mardi 29 mars 2016 à 09:43 -0700, 博陈 a écrit : 
> > I tried the built-in profiler, and find that the problem lies in 
> > lines I end  with **, the result is shown below: 
> > that proved my guess, how can I pre-allocate these arrays? If I don't 
> > want to divide this code into several parts that calculate these 
> > arrays separately.  
> Can you show us which lines the numbers correspond to? There's no line 
> 211 in the script you posted. 
>
>
> Regards 
>
> > | lines | backtrace | 
> > |   169 |  9011 |  *** 
> > |   173 |  1552 | 
> > |   175 |  2604 | 
> > |   179 |  2906 | 
> > |   181 |  1541 | 
> > |   192 |  4458 | 
> > |   211 | 13332 | 
> > |   214 |  8431 | 
> > |   218 | 15871 |*** 
> > |   221 |  2538 | 
> > 
> > > Have you tried: 
> > > 
> > > (a) calling @code_typewarn on your function 
> > > (b) using the built-in profiler? 
> > > 
> > > 
> > > On Tue, Mar 29, 2016 at 9:23 AM, 博陈  wrote: 
> > > > First of all, have a look at the result. 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > My code calculates the evolution of 1-d 2-electron system in the 
> electric field, some variables are calculated during the evolution. 
> > > > According to the result of @time evolution, my code must have a 
> pre-allocation problem. Before you see the long code, i suggest that the 
> hotspot might be around the Arrays prop_e, \phio, pp. I have learnt that I 
> can use m = Array(Float64, 1) outside a "for" loop and empty!(m) and 
> push!(m, new_m) inside the loop to pre-allocate the variable m, but in my 
> situations, I don't know how to pre-allocate these arrays. 
> > > > 
> > > > Below is the script (precisely, the main function) itself. 
> > > > 
> > > > function evolution(ϕ::Array{Complex{Float64}, 2}, 
> > > >ele::Array{Float64, 1}, dx::Float64, dt::Float64, 
> > > >flags::Tuple{Int64, Int64, Int64, Int64}) 
> > > > ϕg = copy(ϕ) 
> > > > FFTW.set_num_threads(8) 
> > > > ns = length( ϕ[:, 1] ) 
> > > > x = get_x(ns, dx) 
> > > > p = get_p(ns, dx) 
> > > > if flags[4] == 1 
> > > > pp = similar(p) 
> > > > A = -cumsum(ele) * dt 
> > > > A² = A.*A 
> > > > # splitting 
> > > > r_sp = 150.0 
> > > > δ_sp = 5.0 
> > > > splitter = Array(Float64, ns, ns) 
> > > > end 
> > > > nt = length( ele ) 
> > > > 
> > > > # # Pre-allocate result and temporary arrays 
> > > > #if flags[1] == 1 
> > > > σ = zeros(Complex128, nt) 
> > > > #end 
> > > > #if flags[2] == 1 
> > > > a = zeros(Float64, nt) 
> > > > #end 
> > > > #if flags[3] == 1 
> > > > r_ionization = 20.0 
> > > > n1 = round(Int, ns/2 - r_ionization/dx) 
> > > > n2 = round(Int, ns/2 + r_ionization/dx) 
> > > > ip = zeros(Float64, nt) 
> > > > #end 
> > > > 
> > > > # FFT plan 
> > > > p_fft! = plan_fft!( similar(ϕ), flags=FFTW.MEASURE ) 
> > > > 
> > > > prop_x = similar( ϕ ) 
> > > > prop_p = similar( prop_x ) 
> > > > prop_e = similar( prop_x ) 
> > > > # this two versions just cost the same time 
> > > > xplusy = Array(Float64, ns, ns) 
> > > > #xplusy = Array( Float64, ns^2) 
> > > > 
> > > > # absorb boundary 
> > > > r_a = ns * dx /2 - 50.0 
> > > > δ = 10.0 
> > > > absorb = Array(Float64, ns, ns) 
> > > > 
> > > > k0 = 2π / (ns * dx) 
> > > > 
> > > > @inbounds for j in 1:ns 
> > > > @inbounds for i in 1:ns 
> > > > prop_x[i, j] = exp( -im * get_potential(x[i], x[j]) * dt 
> / 2 ) 
> > > > prop_p[i, j] = exp( -im * (p[i]^2 + p[j]^2)/2 * dt ) 
> > > > 
> > > > xplusy[i, j] = x[i] + x[j] 
> > > > 
> > > > absorb[i, j] = (1.0 - get_out(x[i], r_a, δ ))* (1.0 - 
> get_out(x[j], 
> > > >  r_a, δ)) 
> > > > end 
> > > > end 
> > > > 
> > > > if flags[2] == 1 
> > > > pvpx = Array(Float64, ns, ns) 
> > > > @inbounds for j in 1:ns 
> > > > @inbounds for i in 1:ns 
> > > > pvpx[i, j] = get_pvpx(x[i], x[j]) 
> > > > end 
> > > > end 
> > > > end 
> > > > 
> > > > if flags[4] == 1 
> > > > ϕo = zeros(Complex128, ns, ns) 
> > > > ϕp = zeros(Complex128, ns, ns) 
> > > > @inbounds for  j in 1:ns 
> > > > @inbounds for  i in 1:ns 
> > > > splitter[i, j] = get_out(x[i], r_sp, δ_sp) * 
> get_out(x[j], r_sp, δ_sp) 
> > > > end 
> > > > end 
> > > > end 
> > > > 
> > > > for i in 1:nt 
> > > > for j in eachindex(ϕ) 
> > > > prop_e[j] = exp( -im * ele[i] * xplusy[j] * dt/2.0) 
> 

Re: [julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-29 Thread Milan Bouchet-Valat
Le mardi 29 mars 2016 à 09:43 -0700, 博陈 a écrit :
> I tried the built-in profiler, and find that the problem lies in
> lines I end  with **, the result is shown below:
> that proved my guess, how can I pre-allocate these arrays? If I don't
> want to divide this code into several parts that calculate these
> arrays separately. 
Can you show us which lines the numbers correspond to? There's no line
211 in the script you posted.


Regards

> | lines | backtrace |
> |   169 |      9011 |  ***
> |   173 |      1552 |
> |   175 |      2604 |
> |   179 |      2906 |
> |   181 |      1541 |
> |   192 |      4458 |
> |   211 |     13332 |
> |   214 |      8431 |
> |   218 |     15871 |***
> |   221 |      2538 |
> 
> > Have you tried:
> > 
> > (a) calling @code_typewarn on your function
> > (b) using the built-in profiler?
> > 
> > 
> > On Tue, Mar 29, 2016 at 9:23 AM, 博陈  wrote:
> > > First of all, have a look at the result.
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > My code calculates the evolution of 1-d 2-electron system in the electric 
> > > field, some variables are calculated during the evolution.
> > > According to the result of @time evolution, my code must have a 
> > > pre-allocation problem. Before you see the long code, i suggest that the 
> > > hotspot might be around the Arrays prop_e, \phio, pp. I have learnt that 
> > > I can use m = Array(Float64, 1) outside a "for" loop and empty!(m) and 
> > > push!(m, new_m) inside the loop to pre-allocate the variable m, but in my 
> > > situations, I don't know how to pre-allocate these arrays.
> > > 
> > > Below is the script (precisely, the main function) itself.
> > > 
> > > function evolution(ϕ::Array{Complex{Float64}, 2},
> > >                    ele::Array{Float64, 1}, dx::Float64, dt::Float64,
> > >                    flags::Tuple{Int64, Int64, Int64, Int64})
> > >     ϕg = copy(ϕ)
> > >     FFTW.set_num_threads(8)
> > >     ns = length( ϕ[:, 1] )
> > >     x = get_x(ns, dx)
> > >     p = get_p(ns, dx)
> > >     if flags[4] == 1
> > >         pp = similar(p)
> > >         A = -cumsum(ele) * dt
> > >         A² = A.*A
> > >         # splitting
> > >         r_sp = 150.0
> > >         δ_sp = 5.0
> > >         splitter = Array(Float64, ns, ns)
> > >     end
> > >     nt = length( ele )
> > > 
> > >     # # Pre-allocate result and temporary arrays
> > >     #if flags[1] == 1
> > >     σ = zeros(Complex128, nt)
> > >     #end
> > >     #if flags[2] == 1
> > >     a = zeros(Float64, nt)
> > >     #end
> > >     #if flags[3] == 1
> > >     r_ionization = 20.0
> > >     n1 = round(Int, ns/2 - r_ionization/dx)
> > >     n2 = round(Int, ns/2 + r_ionization/dx)
> > >     ip = zeros(Float64, nt)
> > >     #end
> > > 
> > >     # FFT plan
> > >     p_fft! = plan_fft!( similar(ϕ), flags=FFTW.MEASURE )
> > > 
> > >     prop_x = similar( ϕ )
> > >     prop_p = similar( prop_x )
> > >     prop_e = similar( prop_x )
> > >     # this two versions just cost the same time
> > >     xplusy = Array(Float64, ns, ns)
> > >     #xplusy = Array( Float64, ns^2)
> > > 
> > >     # absorb boundary
> > >     r_a = ns * dx /2 - 50.0
> > >     δ = 10.0
> > >     absorb = Array(Float64, ns, ns)
> > > 
> > >     k0 = 2π / (ns * dx)
> > > 
> > >     @inbounds for j in 1:ns
> > >         @inbounds for i in 1:ns
> > >             prop_x[i, j] = exp( -im * get_potential(x[i], x[j]) * dt / 2 )
> > >             prop_p[i, j] = exp( -im * (p[i]^2 + p[j]^2)/2 * dt )
> > > 
> > >             xplusy[i, j] = x[i] + x[j]
> > > 
> > >             absorb[i, j] = (1.0 - get_out(x[i], r_a, δ ))* (1.0 - 
> > > get_out(x[j],
> > >              r_a, δ))
> > >         end
> > >     end
> > > 
> > >     if flags[2] == 1
> > >         pvpx = Array(Float64, ns, ns)
> > >         @inbounds for j in 1:ns
> > >             @inbounds for i in 1:ns
> > >                 pvpx[i, j] = get_pvpx(x[i], x[j])
> > >             end
> > >         end
> > >     end
> > > 
> > >     if flags[4] == 1
> > >         ϕo = zeros(Complex128, ns, ns)
> > >         ϕp = zeros(Complex128, ns, ns)
> > >         @inbounds for  j in 1:ns
> > >             @inbounds for  i in 1:ns
> > >                 splitter[i, j] = get_out(x[i], r_sp, δ_sp) * 
> > > get_out(x[j], r_sp, δ_sp)
> > >             end
> > >         end
> > >     end
> > > 
> > >     for i in 1:nt
> > >         for j in eachindex(ϕ)
> > >             prop_e[j] = exp( -im * ele[i] * xplusy[j] * dt/2.0) 
> > > 169
> > >         end
> > > 
> > >         for j in eachindex(ϕ)
> > >             ϕ[j] *= prop_x[j] * prop_e[j]
> > >         end
> > >         p_fft! * ϕ
> > >         for j in eachindex(ϕ)
> > >             ϕ[j] *= prop_p[j]
> > >         end
> > >         p_fft! \ ϕ
> > >         for j in eachindex(ϕ)
> > >             ϕ[j] *= prop_x[j] * prop_e[j]
> > >         end
> > >         ## autocorrelation function σ(t)
> 

Re: [julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-29 Thread 博陈
I tried the built-in profiler, and find that the problem lies in lines I 
end  with **, the result is shown below:
that proved my guess, how can I pre-allocate these arrays? If I don't want 
to divide this code into several parts that calculate these arrays 
separately. 

| lines | backtrace |

|   169 |  9011 |  ***

|   173 |  1552 |

|   175 |  2604 |

|   179 |  2906 |

|   181 |  1541 |

|   192 |  4458 |

|   211 | 13332 |

|   214 |  8431 |

|   218 | 15871 |***

|   221 |  2538 |


在 2016年3月29日星期二 UTC+8下午9:27:27,Stefan Karpinski写道:
>
> Have you tried:
>
> (a) calling @code_typewarn on your function
> (b) using the built-in profiler?
>
>
> On Tue, Mar 29, 2016 at 9:23 AM, 博陈  
> wrote:
>
>> First of all, have a look at the result.
>>
>>
>> 
>>
>>
>>
>>
>>
>>
>>
>>
>> My code calculates the evolution of 1-d 2-electron system in the electric 
>> field, some variables are calculated during the evolution.
>> According to the result of @time evolution, my code must have a 
>> pre-allocation problem. Before you see the long code, i suggest that the 
>> hotspot might be around the Arrays prop_e, \phio, pp. I have learnt that I 
>> can use m = Array(Float64, 1) outside a "for" loop and empty!(m) and 
>> push!(m, new_m) inside the loop to pre-allocate the variable m, but in my 
>> situations, I don't know how to pre-allocate these arrays.
>>
>> Below is the script (precisely, the main function) itself.
>>
>> function evolution(ϕ::Array{Complex{Float64}, 2},
>>ele::Array{Float64, 1}, dx::Float64, dt::Float64,
>>flags::Tuple{Int64, Int64, Int64, Int64})
>> ϕg = copy(ϕ)
>> FFTW.set_num_threads(8)
>> ns = length( ϕ[:, 1] )
>> x = get_x(ns, dx)
>> p = get_p(ns, dx)
>> if flags[4] == 1
>> pp = similar(p)
>> A = -cumsum(ele) * dt
>> A² = A.*A
>> # splitting
>> r_sp = 150.0
>> δ_sp = 5.0
>> splitter = Array(Float64, ns, ns)
>> end
>> nt = length( ele )
>>
>> # # Pre-allocate result and temporary arrays
>> #if flags[1] == 1
>> σ = zeros(Complex128, nt)
>> #end
>> #if flags[2] == 1
>> a = zeros(Float64, nt)
>> #end
>> #if flags[3] == 1
>> r_ionization = 20.0
>> n1 = round(Int, ns/2 - r_ionization/dx)
>> n2 = round(Int, ns/2 + r_ionization/dx)
>> ip = zeros(Float64, nt)
>> #end
>>
>> # FFT plan
>> p_fft! = plan_fft!( similar(ϕ), flags=FFTW.MEASURE )
>>
>> prop_x = similar( ϕ )
>> prop_p = similar( prop_x )
>> prop_e = similar( prop_x )
>> # this two versions just cost the same time
>> xplusy = Array(Float64, ns, ns)
>> #xplusy = Array( Float64, ns^2)
>>
>> # absorb boundary
>> r_a = ns * dx /2 - 50.0
>> δ = 10.0
>> absorb = Array(Float64, ns, ns)
>>
>> k0 = 2π / (ns * dx)
>>
>> @inbounds for j in 1:ns
>> @inbounds for i in 1:ns
>> prop_x[i, j] = exp( -im * get_potential(x[i], x[j]) * dt / 2 )
>> prop_p[i, j] = exp( -im * (p[i]^2 + p[j]^2)/2 * dt )
>>
>> xplusy[i, j] = x[i] + x[j]
>>
>> absorb[i, j] = (1.0 - get_out(x[i], r_a, δ ))* (1.0 - 
>> get_out(x[j],
>>  r_a, δ))
>> end
>> end
>>
>> if flags[2] == 1
>> pvpx = Array(Float64, ns, ns)
>> @inbounds for j in 1:ns
>> @inbounds for i in 1:ns
>> pvpx[i, j] = get_pvpx(x[i], x[j])
>> end
>> end
>> end
>>
>> if flags[4] == 1
>> ϕo = zeros(Complex128, ns, ns)
>> ϕp = zeros(Complex128, ns, ns)
>> @inbounds for  j in 1:ns
>> @inbounds for  i in 1:ns
>> splitter[i, j] = get_out(x[i], r_sp, δ_sp) * 
>> get_out(x[j], r_sp, δ_sp)
>> end
>> end
>> end
>>
>> for i in 1:nt
>> for j in eachindex(ϕ)
>> prop_e[j] = exp( -im * ele[i] * xplusy[j] * dt/2.0) 
>> 169
>> end
>>
>> for j in eachindex(ϕ)
>> ϕ[j] *= prop_x[j] * prop_e[j]
>> end
>> p_fft! * ϕ
>> for j in eachindex(ϕ)
>> ϕ[j] *= prop_p[j]
>> end
>> p_fft! \ ϕ
>> for j in eachindex(ϕ)
>> ϕ[j] *= prop_x[j] * prop_e[j]
>> end
>> ## autocorrelation function σ(t)
>> if flags[1] == 1
>> for j in eachindex(ϕ)
>> σ[i] += conj(ϕg[j]) * ϕ[j]
>> end
>> end
>> ## dipole acceleration a(t)
>> if flags[2] == 1
>> for j in eachindex(ϕ)
>> a[i] += abs(ϕ[j])^2 * (pvpx[j] + 2ele[i])
>> end
>> end
>>  

Re: [julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-29 Thread Yichao Yu
On Tue, Mar 29, 2016 at 10:00 AM, 博陈  wrote:
> Actually, I know the arrays allocated in every loop, my problem is that I
> don't know the strategy to pre-allocate such arrays.
> In short, this is the pre-allocating problem of arrays like array a
> described below:
>
>
> n = 1024; nt = 1000;
> dt = 0.1
> a = Array(Float64, n, n)
> for i in 1:nt
> t = i*dt
> for j in eachindex(a)
>  a[j] = exp(pi * t) * a0
> end
> ...  scripts to use a[j] for a single i
> end
>
> In my problem, such arrays are hard to pre-allocate, do you have any
> suggestions?

The code above shouldn't have any allocation in the loop?
Can you point out the part of the code that allocates a lot in loops?

>
> 在 2016年3月29日星期二 UTC+8下午9:27:27,Stefan Karpinski写道:
>>
>> Have you tried:
>>
>> (a) calling @code_typewarn on your function
>> (b) using the built-in profiler?
>>
>>
>


Re: [julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-29 Thread 博陈
Actually, I know the arrays allocated in every loop, my problem is that I 
don't know the strategy to pre-allocate such arrays. 
In short, this is the pre-allocating problem of arrays like array a 
described below:


n = 1024; nt = 1000;
dt = 0.1
a = Array(Float64, n, n)
for i in 1:nt
t = i*dt
for j in eachindex(a)
 a[j] = exp(pi * t) * a0
end
...  scripts to use a[j] for a single i
end

In my problem, such arrays are hard to pre-allocate, do you have any 
suggestions?

在 2016年3月29日星期二 UTC+8下午9:27:27,Stefan Karpinski写道:
>
> Have you tried:
>
> (a) calling @code_typewarn on your function
> (b) using the built-in profiler?
>
>
>

Re: [julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-29 Thread Stefan Karpinski
Have you tried:

(a) calling @code_typewarn on your function
(b) using the built-in profiler?


On Tue, Mar 29, 2016 at 9:23 AM, 博陈  wrote:

> First of all, have a look at the result.
>
>
> 
>
>
>
>
>
>
>
>
> My code calculates the evolution of 1-d 2-electron system in the electric
> field, some variables are calculated during the evolution.
> According to the result of @time evolution, my code must have a
> pre-allocation problem. Before you see the long code, i suggest that the
> hotspot might be around the Arrays prop_e, \phio, pp. I have learnt that I
> can use m = Array(Float64, 1) outside a "for" loop and empty!(m) and
> push!(m, new_m) inside the loop to pre-allocate the variable m, but in my
> situations, I don't know how to pre-allocate these arrays.
>
> Below is the script (precisely, the main function) itself.
>
> function evolution(ϕ::Array{Complex{Float64}, 2},
>ele::Array{Float64, 1}, dx::Float64, dt::Float64,
>flags::Tuple{Int64, Int64, Int64, Int64})
> ϕg = copy(ϕ)
> FFTW.set_num_threads(8)
> ns = length( ϕ[:, 1] )
> x = get_x(ns, dx)
> p = get_p(ns, dx)
> if flags[4] == 1
> pp = similar(p)
> A = -cumsum(ele) * dt
> A² = A.*A
> # splitting
> r_sp = 150.0
> δ_sp = 5.0
> splitter = Array(Float64, ns, ns)
> end
> nt = length( ele )
>
> # # Pre-allocate result and temporary arrays
> #if flags[1] == 1
> σ = zeros(Complex128, nt)
> #end
> #if flags[2] == 1
> a = zeros(Float64, nt)
> #end
> #if flags[3] == 1
> r_ionization = 20.0
> n1 = round(Int, ns/2 - r_ionization/dx)
> n2 = round(Int, ns/2 + r_ionization/dx)
> ip = zeros(Float64, nt)
> #end
>
> # FFT plan
> p_fft! = plan_fft!( similar(ϕ), flags=FFTW.MEASURE )
>
> prop_x = similar( ϕ )
> prop_p = similar( prop_x )
> prop_e = similar( prop_x )
> # this two versions just cost the same time
> xplusy = Array(Float64, ns, ns)
> #xplusy = Array( Float64, ns^2)
>
> # absorb boundary
> r_a = ns * dx /2 - 50.0
> δ = 10.0
> absorb = Array(Float64, ns, ns)
>
> k0 = 2π / (ns * dx)
>
> @inbounds for j in 1:ns
> @inbounds for i in 1:ns
> prop_x[i, j] = exp( -im * get_potential(x[i], x[j]) * dt / 2 )
> prop_p[i, j] = exp( -im * (p[i]^2 + p[j]^2)/2 * dt )
>
> xplusy[i, j] = x[i] + x[j]
>
> absorb[i, j] = (1.0 - get_out(x[i], r_a, δ ))* (1.0 -
> get_out(x[j],
>  r_a, δ))
> end
> end
>
> if flags[2] == 1
> pvpx = Array(Float64, ns, ns)
> @inbounds for j in 1:ns
> @inbounds for i in 1:ns
> pvpx[i, j] = get_pvpx(x[i], x[j])
> end
> end
> end
>
> if flags[4] == 1
> ϕo = zeros(Complex128, ns, ns)
> ϕp = zeros(Complex128, ns, ns)
> @inbounds for  j in 1:ns
> @inbounds for  i in 1:ns
> splitter[i, j] = get_out(x[i], r_sp, δ_sp) * get_out(x[j],
> r_sp, δ_sp)
> end
> end
> end
>
> for i in 1:nt
> for j in eachindex(ϕ)
> prop_e[j] = exp( -im * ele[i] * xplusy[j] * dt/2.0)
> end
>
> for j in eachindex(ϕ)
> ϕ[j] *= prop_x[j] * prop_e[j]
> end
> p_fft! * ϕ
> for j in eachindex(ϕ)
> ϕ[j] *= prop_p[j]
> end
> p_fft! \ ϕ
> for j in eachindex(ϕ)
> ϕ[j] *= prop_x[j] * prop_e[j]
> end
> ## autocorrelation function σ(t)
> if flags[1] == 1
> for j in eachindex(ϕ)
> σ[i] += conj(ϕg[j]) * ϕ[j]
> end
> end
> ## dipole acceleration a(t)
> if flags[2] == 1
> for j in eachindex(ϕ)
> a[i] += abs(ϕ[j])^2 * (pvpx[j] + 2ele[i])
> end
> end
> ## ionization probability ip(t)
> if flags[3] == 1
> for j1 in n1:n2
> for j2 in 1:ns
> ip[i] += abs( ϕ[j2+ns*(j1-1)] )^2
> end
> end
> for j1 in [1:n1-1; n2+1:ns]
> for j2 in n1:n2
> ip[i] += abs( ϕ[j2+ns*(j1-1)] )^2
> end
> end
> end
> ## get momentum
> if flags[4] == 1
> for j in eachindex(ϕo)
> ϕo[j] = ϕ[j] * splitter[j] * exp( -im * A[i]*xplusy[j] )
> end
> for j in eachindex(p)
> pp[j] = p[j]^2 /2 * (nt-i) - p[j] *sum( A[i:nt] ) + sum(
> A²[1:nt] ) /2
> end
> for j2 in 1:ns
> for j1 in 1:ns
> 

[julia-users] The Arrays are hard to pre-allocate for me, are they possible to be pre-allocated?

2016-03-29 Thread 博陈
First of all, have a look at the result.










My code calculates the evolution of 1-d 2-electron system in the electric 
field, some variables are calculated during the evolution.
According to the result of @time evolution, my code must have a 
pre-allocation problem. Before you see the long code, i suggest that the 
hotspot might be around the Arrays prop_e, \phio, pp. I have learnt that I 
can use m = Array(Float64, 1) outside a "for" loop and empty!(m) and 
push!(m, new_m) inside the loop to pre-allocate the variable m, but in my 
situations, I don't know how to pre-allocate these arrays.

Below is the script (precisely, the main function) itself.

function evolution(ϕ::Array{Complex{Float64}, 2},
   ele::Array{Float64, 1}, dx::Float64, dt::Float64,
   flags::Tuple{Int64, Int64, Int64, Int64})
ϕg = copy(ϕ)
FFTW.set_num_threads(8)
ns = length( ϕ[:, 1] )
x = get_x(ns, dx)
p = get_p(ns, dx)
if flags[4] == 1
pp = similar(p)
A = -cumsum(ele) * dt
A² = A.*A
# splitting
r_sp = 150.0
δ_sp = 5.0
splitter = Array(Float64, ns, ns)
end
nt = length( ele )

# # Pre-allocate result and temporary arrays
#if flags[1] == 1
σ = zeros(Complex128, nt)
#end
#if flags[2] == 1
a = zeros(Float64, nt)
#end
#if flags[3] == 1
r_ionization = 20.0
n1 = round(Int, ns/2 - r_ionization/dx)
n2 = round(Int, ns/2 + r_ionization/dx)
ip = zeros(Float64, nt)
#end

# FFT plan
p_fft! = plan_fft!( similar(ϕ), flags=FFTW.MEASURE )

prop_x = similar( ϕ )
prop_p = similar( prop_x )
prop_e = similar( prop_x )
# this two versions just cost the same time
xplusy = Array(Float64, ns, ns)
#xplusy = Array( Float64, ns^2)

# absorb boundary
r_a = ns * dx /2 - 50.0
δ = 10.0
absorb = Array(Float64, ns, ns)

k0 = 2π / (ns * dx)

@inbounds for j in 1:ns
@inbounds for i in 1:ns
prop_x[i, j] = exp( -im * get_potential(x[i], x[j]) * dt / 2 )
prop_p[i, j] = exp( -im * (p[i]^2 + p[j]^2)/2 * dt )

xplusy[i, j] = x[i] + x[j]

absorb[i, j] = (1.0 - get_out(x[i], r_a, δ ))* (1.0 - 
get_out(x[j],
 r_a, δ))
end
end

if flags[2] == 1
pvpx = Array(Float64, ns, ns)
@inbounds for j in 1:ns
@inbounds for i in 1:ns
pvpx[i, j] = get_pvpx(x[i], x[j])
end
end
end

if flags[4] == 1
ϕo = zeros(Complex128, ns, ns)
ϕp = zeros(Complex128, ns, ns)
@inbounds for  j in 1:ns
@inbounds for  i in 1:ns
splitter[i, j] = get_out(x[i], r_sp, δ_sp) * get_out(x[j], 
r_sp, δ_sp)
end
end
end

for i in 1:nt
for j in eachindex(ϕ)
prop_e[j] = exp( -im * ele[i] * xplusy[j] * dt/2.0)
end

for j in eachindex(ϕ)
ϕ[j] *= prop_x[j] * prop_e[j]
end
p_fft! * ϕ
for j in eachindex(ϕ)
ϕ[j] *= prop_p[j]
end
p_fft! \ ϕ
for j in eachindex(ϕ)
ϕ[j] *= prop_x[j] * prop_e[j]
end
## autocorrelation function σ(t)
if flags[1] == 1
for j in eachindex(ϕ)
σ[i] += conj(ϕg[j]) * ϕ[j]
end
end
## dipole acceleration a(t)
if flags[2] == 1
for j in eachindex(ϕ)
a[i] += abs(ϕ[j])^2 * (pvpx[j] + 2ele[i])
end
end
## ionization probability ip(t)
if flags[3] == 1
for j1 in n1:n2
for j2 in 1:ns
ip[i] += abs( ϕ[j2+ns*(j1-1)] )^2
end
end
for j1 in [1:n1-1; n2+1:ns]
for j2 in n1:n2
ip[i] += abs( ϕ[j2+ns*(j1-1)] )^2
end
end
end
## get momentum
if flags[4] == 1
for j in eachindex(ϕo)
ϕo[j] = ϕ[j] * splitter[j] * exp( -im * A[i]*xplusy[j] )
end
for j in eachindex(p)
pp[j] = p[j]^2 /2 * (nt-i) - p[j] *sum( A[i:nt] ) + sum( 
A²[1:nt] ) /2
end
for j2 in 1:ns
for j1 in 1:ns
ϕo[j1, j2] = ϕo[j1, j2] * exp( -im * (pp[j1] + pp[j2]) 
* dt)
end
end
p_fft! * ϕo
for j in eachindex(ϕp)
ϕp[j] += ϕo[j]
end
end

## absorb boundary
if mod(i, 300) == 0
for j in eachindex(ϕ)
ϕ[j] *= absorb[j]
end
end

if (mod(i, 500) == 0)
println("i = $i")