> On 05-Dec-2014, at 1:16 am, Douglas Bates <[email protected]> wrote:
>
> Thanks, I'll try that. I'm still curious as to why there is so little
> difference between 8 and 16 threads.

peakflops() just performs a matrix multiplication to estimate the flops. It uses a 2000x2000 matrix by default, which is good for most laptops, but for bigger machines with more cores, one often needs to use a larger matrix to see the speedup. peakflops(8000) should give a good indication. I am not sure what the running time will be, so you may want to gradually increase the size.

-viral

>
> -viral
>
> On Friday, December 5, 2014 1:00:39 AM UTC+5:30, Douglas Bates wrote:
> I have been working on a package https://github.com/dmbates/ParalllelGLM.jl
> and noticed some peculiarities in the timings on a couple of shared-memory
> servers, each with 32 cores. In particular changing from 16 workers to 32
> workers actually slowed down the fitting process. So I decided to check how
> changing the number of OpenBLAS threads affected the peakflops() result. I
> end up with essentially the same results for 8, 16 and 32 threads on this
> machine with 32 cores. Is that to be expected?
>
>    _       _ _(_)_     |  A fresh approach to technical computing
>   (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
>    _ _   _| |_  __ _   |  Type "help()" for help.
>   | | | | | | |/ _` |  |
>   | | |_| | | | (_| |  |  Version 0.4.0-dev+1944 (2014-12-04 15:06 UTC)
>  _/ |\__'_|_|_|\__'_|  |  Commit 87e9ee1* (0 days old master)
> |__/                   |  x86_64-unknown-linux-gnu
>
> julia> [peakflops()::Float64 for i in 1:6]
> 6-element Array{Float64,1}:
>  1.41151e11
>  1.1676e11
>  1.27597e11
>  1.27607e11
>  1.27518e11
>  1.27478e11
>
> julia> CPU_CORES
> 32
>
> julia> blas_set_num_threads(16)
>
> julia> [peakflops()::Float64 for i in 1:6]
> 6-element Array{Float64,1}:
>  1.23523e11
>  1.27119e11
>  1.11381e11
>  1.17847e11
>  1.28415e11
>  1.17998e11
>
> julia> blas_set_num_threads(8)
>
> julia> [peakflops()::Float64 for i in 1:6]
> 6-element Array{Float64,1}:
>  1.25194e11
>  1.20969e11
>  1.25777e11
>  1.20757e11
>  1.26086e11
>  1.20958e11
>
> julia> versioninfo(true)
> Julia Version 0.4.0-dev+1944
> Commit 87e9ee1* (2014-12-04 15:06 UTC)
> Platform Info:
>   System: Linux (x86_64-unknown-linux-gnu)
>   CPU: AMD Opteron(tm) Processor 6328
>   WORD_SIZE: 64
>   "Red Hat Enterprise Linux Server release 6.5 (Santiago)"
>   uname: Linux 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Dec 13 06:58:20 EST 2013 x86_64 x86_64
> Memory: 504.78467178344727 GB (508598.8125 MB free)
> Uptime: 261586.0 sec
> Load Avg:  0.08740234375  0.19384765625  0.8330078125
> AMD Opteron(tm) Processor 6328 :
>           speed         user         nice          sys          idle          irq
> #1-32  3199 MHz    1855973 s      23392 s     670932 s   834073187 s         21 s
>
>   BLAS: libopenblas (USE64BITINT NO_AFFINITY PILEDRIVER)
>   LAPACK: libopenblas
>   LIBM: libopenlibm
>   LLVM: libLLVM-3.5.0
> Environment:
>   TERM = screen
>   PATH = /s/cmake-3.0.2/bin:/s/gcc-4.9.2/bin:./u/b/a/bates/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/s/std/bin:/usr/afsws/bin:
>   WWW_HOME = http://www.stat.wisc.edu/
>   JULIA_PKGDIR = /scratch/bates/.julia
>   HOME = /u/b/a/bates
>
> Package Directory: /scratch/bates/.julia/v0.4
> 2 required packages:
>  - Distributions                 0.6.1
>  - Docile                        0.3.2
> 5 additional packages:
>  - ArrayViews                    0.4.8
>  - Compat                        0.2.5
>  - PDMats                        0.3.1
>  - ParallelGLM                   0.0.0-             master (unregistered)
>  - StatsBase                     0.6.10
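
The suggestion in the reply, to grow the matrix size until the extra threads start to matter, could be scripted roughly as follows. This is only a sketch under stated assumptions: it targets a recent Julia 1.x rather than the 0.4.0-dev build in the session, where the calls written in the thread as blas_set_num_threads and peakflops are spelled BLAS.set_num_threads and LinearAlgebra.peakflops. The helper name sweep_peakflops and its reps keyword are hypothetical, introduced here purely for illustration.

```julia
using LinearAlgebra  # Julia 1.x: provides BLAS.set_num_threads and LinearAlgebra.peakflops

# Hypothetical helper (not part of Julia or of the package discussed above):
# for every (thread count, matrix size) pair, record the best of `reps`
# peakflops estimates, in flops per second.
function sweep_peakflops(thread_counts, sizes; reps=3)
    results = Dict{Tuple{Int,Int},Float64}()
    for t in thread_counts
        BLAS.set_num_threads(t)          # number of BLAS threads for this pass
        for n in sizes
            LinearAlgebra.peakflops(n)   # warm-up run, result discarded
            results[(t, n)] = maximum(LinearAlgebra.peakflops(n) for _ in 1:reps)
        end
    end
    return results
end
```

For example, sweep_peakflops((8, 16, 32), (2000, 4000, 8000)) would cover the thread counts tried in the session. An 8000x8000 Float64 matrix takes about 0.5 GB and the multiplication can run for a while, so it may be worth starting with the smaller sizes and increasing gradually, as the reply suggests.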
