Hi Carlos,

I am working on something that will allow multithreading of Julia 
functions (https://github.com/JuliaLang/julia/pull/6741). Implementing 
vectorize_1arg_openmp is actually a lot less trivial than it sounds, 
because the Julia runtime is not thread safe (yet).

Your example is great. I first got a slowdown of 10x because the example 
revealed a locking issue. With a little trick I now get a speedup of 1.75 
on a 2-core machine. Not too bad, taking into account that memory 
allocation cannot be parallelized.

The tweaked code looks like this:

function tanh_core(x, y, i)
    # compute tanh on the i-th half of x (i = 0 or 1)
    N = length(x)
    M = div(N, 2)    # integer division so the indices stay integers
    for l = 1:M
        y[l + i*M] = tanh(x[l + i*M])
    end
end


function ptanh(x; numthreads=2)
    y = similar(x)
    parapply(tanh_core, (x, y), 0:1, numthreads=numthreads)
    y
end


I actually want this to also be fast for the per-element variant:


function tanh_core(x, y, i)
    y[i] = tanh(x[i])
end


function ptanh(x; numthreads=2)
    y = similar(x)
    N = length(x)
    parapply(tanh_core, (x, y), 1:N, numthreads=numthreads)
    y
end
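To be explicit about the calling convention used above: parapply(f, args, r) 
calls f(args..., i) for every i in r, with the range split across the 
threads. A purely serial stand-in that only illustrates these semantics (not 
the actual implementation from the PR) would look like:

function parapply_serial(f, args, r; numthreads=2)
    # the real parapply distributes the iterations of r across native
    # threads; here everything simply runs in one thread and the
    # numthreads keyword is ignored
    for i in r
        f(args..., i)
    end
end

With the second variant the body of tanh_core is a single tanh call, so the 
per-iteration overhead of parapply itself has to stay very small for the 
threading to pay off.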

On Sunday, May 18, 2014 11:40:13 UTC+2, Carlos Becker wrote:
>
> now that I think about it, maybe openblas has nothing to do with this, since 
> @which tanh(y) leads to a call to vectorize_1arg().
>
> If that's the case, wouldn't it be advantageous to have a 
> vectorize_1arg_openmp() function (defined in C/C++) that performs 
> element-wise operations on scalar arrays, multi-threading them with OpenMP?
>
>
> On Sunday, May 18, 2014 11:34:11 UTC+2, Carlos Becker wrote:
>>
>> forgot to add versioninfo():
>>
>> julia> versioninfo()
>> Julia Version 0.3.0-prerelease+2921
>> Commit ea70e4d* (2014-05-07 17:56 UTC)
>> Platform Info:
>>   System: Linux (x86_64-linux-gnu)
>>   CPU: Intel(R) Xeon(R) CPU           X5690  @ 3.47GHz
>>   WORD_SIZE: 64
>>   BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
>>   LAPACK: libopenblas
>>   LIBM: libopenlibm
>>
>>
>> On Sunday, May 18, 2014 11:33:45 UTC+2, Carlos Becker wrote:
>>>
>>> This is probably related to openblas, but it seems that tanh() is not 
>>> multi-threaded, which prevents a considerable speed improvement.
>>> For example, MATLAB does multi-thread it and gets something around a 3x 
>>> speed-up over the single-threaded version.
>>>
>>> For example,
>>>
>>>   x = rand(100000,200);
>>>   @time y = tanh(x);
>>>
>>> yields:
>>>   - 0.71 sec in Julia
>>>   - 0.76 sec in MATLAB with -singleCompThread
>>>   - 0.09 sec in MATLAB (which uses multi-threading by default)
>>>
>>> The good news is that Julia (w/ openblas) is competitive with the 
>>> single-threaded MATLAB version, though setting the env variable 
>>> OPENBLAS_NUM_THREADS doesn't have any effect on the timings, nor do I 
>>> see higher CPU usage with 'top'.
>>>
>>> Is there an override for OPENBLAS_NUM_THREADS in Julia? What am I missing?
>>>
>>
