Hi Carlos, I am working on something that will allow multithreading of Julia functions (https://github.com/JuliaLang/julia/pull/6741). Implementing vectorize_1arg_openmp is actually a lot less trivial than it sounds, since the Julia runtime is not thread-safe (yet).
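For readers who haven't looked at the PR: the parapply calling convention used in the snippets below can be mimicked by a purely serial stand-in. This is only a sketch to illustrate the interface (the signature is inferred from the examples in this thread, not copied from the PR, and the real implementation runs the chunks on native threads):

```julia
# Serial stand-in for the parapply interface used below (illustration only).
# f is called as f(args..., i) for every i in the range r; the real PR
# distributes those calls across numthreads native threads.
function parapply_serial(f, args, r; numthreads=2)
    for i in r
        f(args..., i)
    end
end
```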
Your example is great. I first got a slowdown of 10x because the example revealed a locking issue. With a little trick I now get a speedup of 1.75 on a 2-core machine. Not too bad, taking into account that memory allocation cannot be parallelized. The tweaked code looks like this (note the integer division with div; N/2 would give a Float64 index):

    function tanh_core(x, y, i)
        half = div(length(x), 2)
        for l = 1:half
            y[l + i*half] = tanh(x[l + i*half])
        end
    end

    function ptanh(x; numthreads=2)
        y = similar(x)
        parapply(tanh_core, (x, y), 0:1, numthreads=numthreads)
        y
    end

I actually want this to also be fast for

    function tanh_core(x, y, i)
        y[i] = tanh(x[i])
    end

    function ptanh(x; numthreads=2)
        y = similar(x)
        N = length(x)
        parapply(tanh_core, (x, y), 1:N, numthreads=numthreads)
        y
    end

On Sunday, May 18, 2014 at 11:40:13 UTC+2, Carlos Becker wrote:
>
> Now that I think about it, maybe OpenBLAS has nothing to do with this, since
> @which tanh(y) leads to a call to vectorize_1arg().
>
> If that's the case, wouldn't it be advantageous to have a
> vectorize_1arg_openmp() function (defined in C/C++) that works for
> element-wise operations on scalar arrays, multi-threading with OpenMP?
>
>
> On Sunday, May 18, 2014 at 11:34:11 UTC+2, Carlos Becker wrote:
>>
>> Forgot to add versioninfo():
>>
>> julia> versioninfo()
>> Julia Version 0.3.0-prerelease+2921
>> Commit ea70e4d* (2014-05-07 17:56 UTC)
>> Platform Info:
>>   System: Linux (x86_64-linux-gnu)
>>   CPU: Intel(R) Xeon(R) CPU X5690 @ 3.47GHz
>>   WORD_SIZE: 64
>>   BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
>>   LAPACK: libopenblas
>>   LIBM: libopenlibm
>>
>>
>> On Sunday, May 18, 2014 at 11:33:45 UTC+2, Carlos Becker wrote:
>>>
>>> This is probably related to OpenBLAS, but it seems that tanh() is
>>> not multi-threaded, which hinders a considerable speed improvement.
>>> For example, MATLAB does multi-thread it and gets something around a 3x
>>> speed-up over the single-threaded version.
>>>
>>> For example,
>>>
>>>     x = rand(100000, 200);
>>>     @time y = tanh(x);
>>>
>>> yields:
>>>   - 0.71 sec in Julia
>>>   - 0.76 sec in MATLAB with -singleCompThread
>>>   - 0.09 sec in MATLAB (which uses multi-threading by default)
>>>
>>> The good news is that Julia (w/ OpenBLAS) is competitive with the
>>> single-threaded MATLAB version, though setting the env variable
>>> OPENBLAS_NUM_THREADS doesn't have any effect on the timings, nor do I
>>> see higher CPU usage with 'top'.
>>>
>>> Is there an override for OPENBLAS_NUM_THREADS in Julia? What am I
>>> missing?
>>
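For completeness, the single-threaded baseline Carlos measured can be reproduced with something like the following sketch (absolute numbers are machine-dependent; on Julia 0.3 the vectorized call was tanh(x) on an array, while on current Julia the equivalent element-wise form is map(tanh, x) or the broadcast tanh.(x)):

```julia
# Sketch of the single-threaded benchmark from the thread.
x = rand(100000, 200)
map(tanh, x)            # warm-up run so JIT compilation is not timed
@time y = map(tanh, x)  # element-wise tanh, single-threaded
```

Running the expression once before @time matters here: the first call includes compilation, which would otherwise dominate the measurement.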