[julia-users] Re: SIMD long inner expression

2016-02-04 Thread alan souza
Hi. I think that the number of variables in the inner loop  is too high and 
the compiler can't vectorize the code because of a insufficient number of 
available registers (see wik i- Register_allocation 
 )
and even if the compiler could vectorize this loop the speedup will be at 
best of two times for SSE code (Could you use float32). I believe that can 
be easier to the compiler if split your computation
(calculate the arrays with powers, pointwise multiplication and finally a 
reduction).
On Thursday, February 4, 2016 at 7:29:53 PM UTC-2, Chris Rackauckas wrote:
>
> Hi, I am trying to optimize a code which just adds a bunch of things. My 
> first instinct was to unravel the loops and run it as SIMD, like so:
>
> dx = 1/400
> addprocs(3)
> imin = -6
> jmin = -2
>
> #Some SIMD
> @time res = @sync @parallel (+) for i = imin:dx:0
>   tmp = 0
>   for j=jmin:dx:0
> ans = 0
> @simd for k=1:24
>   @inbounds ans += coefs[k]*(i^(powz[k]))*(j^(poww[k]))
> end
> tmp += abs(ans)<1
>   end
>   tmp
> end
> res = res*(12/(length(imin:dx:0)*length(jmin:dx:0)))
> println(res)#Mathematica gives 2.98
>
> (Coefficients arrays posted at the bottom for completeness). On a 4 core 
> i7 this runs in ~4.68 seconds. I was able to beat this by optimizing the 
> power calculations like so:
>
> ## Full unravel
> @time res = @sync @parallel (+) for i = imin:dx:0
>   tmp = 0
>   isq2 = i*i; isq3 = i*isq2; isq4 = isq2*isq2; isq5 = i*isq4
>   isq6 = isq4*isq2; isq7 = i*isq6; isq8 = isq4*isq4
>   for j=jmin:dx:0
> jsq2 = j*j; jsq4 = jsq2*jsq2; jsq6 = jsq2*jsq4; jsq8 = jsq4*jsq4
> @inbounds tmp += abs(coefs[1]*(jsq2) + coefs[2]*(jsq4) + coefs[3]*(
> jsq6) + coefs[4]*(jsq8) + coefs[5]*(i) + coefs[6]*(i)*(jsq2) + coefs[7]*(i
> )*(jsq4) + coefs[8]*(i)*(jsq6) +
> coefs[9]*(isq2) + coefs[10]*(isq2)*(jsq2) + coefs[11]*(isq2)*(jsq4) + 
> coefs[12]*(isq2)*(jsq6) + coefs[13]*(isq3) + coefs[14]*(isq3)*(jsq2) + 
> coefs[15]*(isq3)*(jsq4) +
> coefs[16]*(isq4) + coefs[17]*(isq4)*(jsq2) + coefs[18]*(isq4)*(jsq4) + 
> coefs[19]*(isq5) + coefs[20]*(isq5)*(jsq2) + coefs[21]*(isq6) + coefs[22
> ]*(isq6)*(jsq2) +
> coefs[23]*(isq7) + coefs[24]*(isq8))<1
>   end
>   tmp
> end
> res = res*(12/(length(imin:dx:0)*length(jmin:dx:0)))
> println(res)#Mathematica gives 2.98
>
> This is the best version and gets ~1.83 seconds (FYI it bets a version 
> where I distributed out by isq). However, when I put this into a function 
> and call it with @code_llvm, I get:
>
> define %jl_value_t* @julia_simdCheck_3161() {
> top:
>   %0 = alloca [8 x %jl_value_t*], align 8
>   %.sub = getelementptr inbounds [8 x %jl_value_t*]* %0, i64 0, i64 0
>   %1 = getelementptr [8 x %jl_value_t*]* %0, i64 0, i64 2
>   %2 = getelementptr [8 x %jl_value_t*]* %0, i64 0, i64 3
>   store %jl_value_t* inttoptr (i64 12 to %jl_value_t*), %jl_value_t** %.
> sub, align 8
>   %3 = getelementptr [8 x %jl_value_t*]* %0, i64 0, i64 1
>   %4 = load %jl_value_t*** @jl_pgcstack, align 8
>   %.c = bitcast %jl_value_t** %4 to %jl_value_t*
>   store %jl_value_t* %.c, %jl_value_t** %3, align 8
>   store %jl_value_t** %.sub, %jl_value_t*** @jl_pgcstack, align 8
>   store %jl_value_t* null, %jl_value_t** %1, align 8
>   store %jl_value_t* null, %jl_value_t** %2, align 8
>   %5 = getelementptr [8 x %jl_value_t*]* %0, i64 0, i64 4
>   store %jl_value_t* null, %jl_value_t** %5, align 8
>   %6 = getelementptr [8 x %jl_value_t*]* %0, i64 0, i64 5
>   store %jl_value_t* null, %jl_value_t** %6, align 8
>   %7 = getelementptr [8 x %jl_value_t*]* %0, i64 0, i64 6
>   store %jl_value_t* null, %jl_value_t** %7, align 8
>   %8 = getelementptr [8 x %jl_value_t*]* %0, i64 0, i64 7
>   store %jl_value_t* null, %jl_value_t** %8, align 8
>   %9 = call %jl_value_t* @julia_sync_begin_535()
>   store %jl_value_t* inttoptr (i64 2274649264 to %jl_value_t*), %
> jl_value_t** %2, align 8
>   %10 = load %jl_value_t* (%jl_value_t*, %jl_value_t**, i32)** inttoptr (i64 
> 2274649264 to %jl_value_t* (%jl_value_t*, %jl_value_t**, i32)**), align 16
>   %11 = load %jl_value_t** inttoptr (i64 2212282536 to %jl_value_t**), 
> align 8
>   store %jl_value_t* %11, %jl_value_t** %5, align 8
>   %12 = load %jl_value_t** inttoptr (i64 2197909240 to %jl_value_t**), 
> align 8
>   store %jl_value_t* %12, %jl_value_t** %6, align 8
>   %13 = load %jl_value_t** inttoptr (i64 2197909288 to %jl_value_t**), 
> align 8
>   store %jl_value_t* %13, %jl_value_t** %7, align 8
>   %14 = load %jl_value_t** inttoptr (i64 2212418552 to %jl_value_t**), 
> align 8
>   store %jl_value_t* %14, %jl_value_t** %8, align 8
>   %15 = call %jl_value_t* %10(%jl_value_t* inttoptr (i64 2274649264 to %
> jl_value_t*), %jl_value_t** %5, i32 4)
>   store %jl_value_t* %15, %jl_value_t** %1, align 8
>   call void @julia_sync_end_541()
>   %16 = load %jl_value_t** %3, align 8
>   %17 = getelementptr inbounds %jl_value_t* %16, i64 0, i32 0
>   store %jl_value_t** %17, %jl_value_t*** @jl_pgcstack, 

[julia-users] Re: Sparse matrix memory preallocation?

2016-02-01 Thread alan souza
You could try to use the triplet form (tree vectors containing the row/col 
indexes and the value of the entry) and call the function sparse.
In this way you can preallocate in advance these three vectors. 
However  doing in this way is more cumbersome because you need to have a 
good estimate of the number of entries and to explicitly calculate the 
index for all entries.

On Sunday, January 31, 2016 at 8:07:56 PM UTC-2, Gabriel Goh wrote:
>
> Generating a sparse matrix from scratch seems to be quite memory 
> intensive. and slow. Say I wish to create a large block diagonal matrix 
> with 2x2 block entries.
>
> Doing it naively is quite slow
>
> function f(k)
>   M = spzeros(2*k,2*k)
>   for i = 1:k
> D = (i-1)*2 + 1:i*2
> M[D,D] = randn(2,2)
>   end
>   return M
> end
>
> julia> @time f(1)
> 2.534277 seconds (239.26 k allocations: 3.013 GB, 15.58% gc time)
>
> Is there a way to speed this up by preallocating the memory somehow?
>


[julia-users] Re: passing function as arguments and avoiding unecessary allocations

2015-12-28 Thread alan souza
Thanks for the tip. I was trying to annotate the function call before the 
loop without success.

Do you know the reason for the speedup? Because the extra allocations still 
happening albeit a little less.

thanks

On Sunday, December 27, 2015 at 5:07:56 PM UTC-2, Lutfullah Tomak wrote:
>
> It seems adding typecheck is 2x improvements in my laptop.
>y = zeros(T, n)
> for i in eachindex(X)
> y[i] = F(X[i])::T
> end
> return y
>


[julia-users] Re: How to avoid extra allocations when passing a function as parameter

2015-12-28 Thread alan souza
Sorry for the double posting.
Actually I was only interested for "decimal" inputs (single and double 
precision) is there a better way to express this instead of using Real?
thanks

On Sunday, December 27, 2015 at 6:09:30 PM UTC-2, Ismael Venegas Castelló 
wrote:
>
> For calcF your variables:
>
> Variables:
>   X::Array{Float64,1}
>   F::F
>   n::Int64
>   y::Array{Float64,1}
>   #s76::Int64
>   i::Int64
>   ##dims#7321::Tuple{Int64}
>
> and return value:
>
> return y::Array{Float64,1}
> end::Array{Float64,1}
>
> Are correctly inferred if I use floats so it seems fine.
>
> But if I use int's as arguments to main I get an InexactError, are you 
> sure it should accept `T <: Real`?  Maybe you just need Float64?
>
>
>
> El domingo, 27 de diciembre de 2015, 11:21:35 (UTC-6), alan souza escribió:
>>
>> Hi. I was wondering what is the correct way to pass a function as 
>> argument preserving the return type information of this function
>>
>> For instance in the following code:
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *function sinc{T<:Real}(x::T)return T((x != zero(T) 
>> )?(sin(pi*x)/(pi*x)):(one(T)))endfunction calcF{T<:Real}(X::Array{T,1}, 
>> F::Function)n = length(X); @assert(n > 0)y = zeros(T, n)for 
>> i in eachindex(X)y[i] = F(X[i])endreturn yendfunction 
>> calcSinc{T<:Real}(X::Array{T,1})n = length(X); @assert(n > 0)y 
>> = zeros(T, n)for i in eachindex(X)y[i] = sinc(X[i])end
>> return yendfunction main{T<:Real}(min::T, max::T, d::T)X = 
>> collect(min:d:max)calcF(X, sinc)calcSinc(X)@code_warntype 
>> calcF(X, sinc)@time calcF(X, sinc)@code_warntype calcSinc(X)
>> @time calcSinc(X)YF = calcF(X, sinc)YS = calcSinc(X)end*
>>
>> Running this code with julia (versions 0.5+dev or 0.4.2) I got these 
>> results:
>>
>>
>>
>>
>>
>>
>> *  #s1 = GenSym(5) # /home/me/scratch/julia_fd/teste.jl, line 
>> 10:  GenSym(2) = 
>> (F::F)((Base.arrayref)(X::Array{Float64,1},i::Int64)::Float64)::ANY  
>> (Base.arrayset)(y::Array{Float64,1},(top(typeassert))((Base.convert)(Float64,GenSym(2))::ANY,Float64)::Float64,i::Int64)::Array{Float64,1}
>>   
>> 5:   unless 
>> (Base.box)(Base.Bool,(Base.not_int)((Base.box)(Base.Bool,(Base.not_int)(#s1::Int64
>>  
>> === 
>> (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool
>>  
>> goto 4  3:   0.080181 seconds (1.88 M allocations: 33.556 MB, 6.41% gc 
>> time)*
>>
>> and for the calcSinc function:
>>
>>
>>
>>
>>
>>
>>
>> *   #s1 = GenSym(5) # /home/me/scratch/julia_fd/teste.jl, line 20:  
>> GenSym(2) = 
>> (Main.sinc)((Base.arrayref)(X::Array{Float64,1},i::Int64)::Float64)::Float64 
>>  
>> (Base.arrayset)(y::Array{Float64,1},GenSym(2),i::Int64)::Array{Float64,1}
>>   
>> 5:   unless 
>> (Base.box)(Base.Bool,(Base.not_int)((Base.box)(Base.Bool,(Base.not_int)(#s1::Int64
>>  
>> === 
>> (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool
>>  
>> goto 4  3:   0.031224 seconds (2 allocations: 4.794 MB)*
>> Is there a way to prevent the result of function to "evaluate" to ::Any 
>> on the first case?
>>
>> thanks
>>
>>

[julia-users] How to avoid extra allocations when passing a function as parameter

2015-12-27 Thread alan souza
Hi. I was wondering what is the correct way to pass a function as argument 
preserving the return type information of this function

For instance in the following code:



































*function sinc{T<:Real}(x::T)return T((x != zero(T) 
)?(sin(pi*x)/(pi*x)):(one(T)))endfunction calcF{T<:Real}(X::Array{T,1}, 
F::Function)n = length(X); @assert(n > 0)y = zeros(T, n)for 
i in eachindex(X)y[i] = F(X[i])endreturn yendfunction 
calcSinc{T<:Real}(X::Array{T,1})n = length(X); @assert(n > 0)y 
= zeros(T, n)for i in eachindex(X)y[i] = sinc(X[i])end
return yendfunction main{T<:Real}(min::T, max::T, d::T)X = 
collect(min:d:max)calcF(X, sinc)calcSinc(X)@code_warntype 
calcF(X, sinc)@time calcF(X, sinc)@code_warntype calcSinc(X)
@time calcSinc(X)YF = calcF(X, sinc)YS = calcSinc(X)end*

Running this code with julia (versions 0.5+dev or 0.4.2) I got these 
results:






*  #s1 = GenSym(5) # /home/me/scratch/julia_fd/teste.jl, line 10:  
GenSym(2) = 
(F::F)((Base.arrayref)(X::Array{Float64,1},i::Int64)::Float64)::ANY  
(Base.arrayset)(y::Array{Float64,1},(top(typeassert))((Base.convert)(Float64,GenSym(2))::ANY,Float64)::Float64,i::Int64)::Array{Float64,1}
  
5:   unless 
(Base.box)(Base.Bool,(Base.not_int)((Base.box)(Base.Bool,(Base.not_int)(#s1::Int64
 
=== 
(Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool
 
goto 4  3:   0.080181 seconds (1.88 M allocations: 33.556 MB, 6.41% gc 
time)*

and for the calcSinc function:
   






*   #s1 = GenSym(5) # /home/me/scratch/julia_fd/teste.jl, line 20:  
GenSym(2) = 
(Main.sinc)((Base.arrayref)(X::Array{Float64,1},i::Int64)::Float64)::Float64
  
(Base.arrayset)(y::Array{Float64,1},GenSym(2),i::Int64)::Array{Float64,1}  
5:   unless 
(Base.box)(Base.Bool,(Base.not_int)((Base.box)(Base.Bool,(Base.not_int)(#s1::Int64
 
=== 
(Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool
 
goto 4  3:   0.031224 seconds (2 allocations: 4.794 MB)*
Is there a way to prevent the result of function to "evaluate" to ::Any on 
the first case?

thanks



[julia-users] passing function as arguments and avoiding unecessary allocations

2015-12-27 Thread alan souza
Hi. I was wondering what is the correct way to pass a function as argument 
preserving the return type information

For instance in the following code:







































*function sinc{T<:Real}(x::T)return T((x != zero(T) 
)?(sin(pi*x)/(pi*x)):(one(T)))endfunction calcF{T<:Real}(X::Array{T,1}, 
F::Function)n = length(X); @assert(n > 0)y = zeros(T, n)for 
i in eachindex(X)y[i] = F(X[i])endreturn yendfunction 
calcSinc{T<:Real}(X::Array{T,1})n = length(X); @assert(n > 0)y 
= zeros(T, n)for i in eachindex(X)y[i] = sinc(X[i])end
return yendfunction main{T<:Real}(min::T, max::T, d::T)X = 
collect(min:d:max)calcF(X, sinc)calcSinc(X)@code_warntype 
calcF(X, sinc)@time calcF(X, sinc)@code_warntype calcSinc(X)
@time calcSinc(X)YF = calcF(X, sinc)YS = 
calcSinc(X)endmain(-1.0*pi,1.0*pi,0.1)*After running this code I got 
these results (using @code_warntype and @time) :
function CalcF:






*  #s1 = GenSym(5) # /home/user/scratch/julia/test.jl, line 10:  
GenSym(2) = 
(F::F)((Base.arrayref)(X::Array{Float64,1},i::Int64)::Float64)::ANY  
(Base.arrayset)(y::Array{Float64,1},(top(typeassert))((Base.convert)(Float64,GenSym(2))::ANY,Float64)::Float64,i::Int64)::Array{Float64,1}
  
5:   unless 
(Base.box)(Base.Bool,(Base.not_int)((Base.box)(Base.Bool,(Base.not_int)(#s1::Int64
 
=== 
(Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool
 
goto 4  3:   0.080181 seconds (1.88 M allocations: 33.556 MB, 6.41% gc 
time)*

but for the function calcSinc:






*  #s1 = GenSym(5) # /home/user/scratch/julia/test.jl, line 20:  
GenSym(2) = 
(Main.sinc)((Base.arrayref)(X::Array{Float64,1},i::Int64)::Float64)::Float64
  
(Base.arrayset)(y::Array{Float64,1},GenSym(2),i::Int64)::Array{Float64,1}  
5:   unless 
(Base.box)(Base.Bool,(Base.not_int)((Base.box)(Base.Bool,(Base.not_int)(#s1::Int64
 
=== 
(Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool
 
goto 4  3:   0.031224 seconds (2 allocations: 4.794 MB)*

Is there a way to avoid these allocation on the first case?

thanks.