Now that I think about it, maybe OpenBLAS has nothing to do with this, since @which tanh(y) leads to a call to vectorize_1arg().
If that's the case, wouldn't it be advantageous to have a vectorize_1arg_openmp() function (defined in C/C++) that works for element-wise operations on scalar arrays, multi-threading with OpenMP?

On Sunday, May 18, 2014 11:34:11 AM UTC+2, Carlos Becker wrote:
>
> Forgot to add versioninfo():
>
> julia> versioninfo()
> Julia Version 0.3.0-prerelease+2921
> Commit ea70e4d* (2014-05-07 17:56 UTC)
> Platform Info:
>   System: Linux (x86_64-linux-gnu)
>   CPU: Intel(R) Xeon(R) CPU X5690 @ 3.47GHz
>   WORD_SIZE: 64
>   BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
>   LAPACK: libopenblas
>   LIBM: libopenlibm
>
> On Sunday, May 18, 2014 11:33:45 AM UTC+2, Carlos Becker wrote:
>>
>> This is probably related to OpenBLAS, but it seems that tanh() is
>> not multi-threaded, which prevents a considerable speed improvement.
>> For example, MATLAB does multi-thread it and gets around a 3x
>> speed-up over the single-threaded version.
>>
>> For example,
>>
>> x = rand(100000,200);
>> @time y = tanh(x);
>>
>> yields:
>> - 0.71 sec in Julia
>> - 0.76 sec in MATLAB with -singleCompThread
>> - 0.09 sec in MATLAB (which uses multi-threading by default)
>>
>> The good news is that Julia (w/ OpenBLAS) is competitive with the
>> single-threaded MATLAB version, though setting the env variable
>> OPENBLAS_NUM_THREADS doesn't have any effect on the timings, nor do I
>> see higher CPU usage with 'top'.
>>
>> Is there an override for OPENBLAS_NUM_THREADS in Julia? What am I missing?