I have been working on a package, https://github.com/dmbates/ParallelGLM.jl, and noticed some peculiarities in the timings on a couple of shared-memory servers, each with 32 cores. In particular, going from 16 workers to 32 workers actually slowed down the fitting process. So I decided to check how the number of OpenBLAS threads affects the peakflops() result. I get essentially the same results (all between about 1.1e11 and 1.4e11) for 8, 16 and 32 threads on this 32-core machine. Is that to be expected?
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-dev+1944 (2014-12-04 15:06 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 87e9ee1* (0 days old master)
|__/                   |  x86_64-unknown-linux-gnu

julia> [peakflops()::Float64 for i in 1:6]
6-element Array{Float64,1}:
 1.41151e11
 1.1676e11
 1.27597e11
 1.27607e11
 1.27518e11
 1.27478e11

julia> CPU_CORES
32

julia> blas_set_num_threads(16)

julia> [peakflops()::Float64 for i in 1:6]
6-element Array{Float64,1}:
 1.23523e11
 1.27119e11
 1.11381e11
 1.17847e11
 1.28415e11
 1.17998e11

julia> blas_set_num_threads(8)

julia> [peakflops()::Float64 for i in 1:6]
6-element Array{Float64,1}:
 1.25194e11
 1.20969e11
 1.25777e11
 1.20757e11
 1.26086e11
 1.20958e11

julia> versioninfo(true)
Julia Version 0.4.0-dev+1944
Commit 87e9ee1* (2014-12-04 15:06 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: AMD Opteron(tm) Processor 6328
  WORD_SIZE: 64
  "Red Hat Enterprise Linux Server release 6.5 (Santiago)"
  uname: Linux 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Dec 13 06:58:20 EST 2013 x86_64 x86_64
Memory: 504.78467178344727 GB (508598.8125 MB free)
Uptime: 261586.0 sec
Load Avg:  0.08740234375  0.19384765625  0.8330078125
AMD Opteron(tm) Processor 6328 :
          speed         user         nice          sys         idle          irq
#1-32  3199 MHz    1855973 s      23392 s     670932 s  834073187 s         21 s

  BLAS: libopenblas (USE64BITINT NO_AFFINITY PILEDRIVER)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.5.0
Environment:
  TERM = screen
  PATH = /s/cmake-3.0.2/bin:/s/gcc-4.9.2/bin:./u/b/a/bates/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/s/std/bin:/usr/afsws/bin:
  WWW_HOME = http://www.stat.wisc.edu/
  JULIA_PKGDIR = /scratch/bates/.julia
  HOME = /u/b/a/bates

Package Directory: /scratch/bates/.julia/v0.4
2 required packages:
 - Distributions                 0.6.1
 - Docile                        0.3.2
5 additional packages:
 - ArrayViews                    0.4.8
 - Compat                        0.2.5
 - PDMats                        0.3.1
 - ParallelGLM                   0.0.0-             master (unregistered)
 - StatsBase                     0.6.10
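
For anyone who wants to repeat the comparison on another machine, the manual steps above could be wrapped in a loop along these lines. This is only a sketch: it uses the same 0.4-era function names as the session above (blas_set_num_threads and peakflops; in later Julia releases the equivalents are LinearAlgebra.BLAS.set_num_threads and LinearAlgebra.peakflops), and the names thread_counts, nrep and results are made up for illustration. The discarded warm-up call is my own addition and was not part of the runs shown above.

```julia
# Sketch only: repeats the peakflops() comparison for several OpenBLAS thread counts.
# thread_counts, nrep and results are illustrative names, not part of any API.
thread_counts = [8, 16, 32]
nrep = 6

results = Dict{Int,Vector{Float64}}()
for nt in thread_counts
    blas_set_num_threads(nt)     # same call as in the session above
    peakflops()                  # warm-up run, discarded (an assumption on my part)
    results[nt] = [peakflops()::Float64 for i in 1:nrep]
    println(nt, " threads: median ", median(results[nt]) / 1e9, " GFLOPS")
end
```

Reporting the median of several runs rather than a single value should make it easier to tell a real difference between thread counts from the run-to-run variation visible in the output above.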