On Thursday, December 4, 2014 1:50:06 PM UTC-6, Viral Shah wrote:
>
> > On 05-Dec-2014, at 1:16 am, Douglas Bates wrote:
> >
> > Thanks, I'll try that. I'm still curious as to why there is so little
> > difference between 8 and 16 threads.
>
> peakflops() just performs a matrix multiplication to estimate the flops.
> It uses a 2000x2000 matrix by default, which is good for most laptops, but
> for bigger machines with more cores, one often needs to use a larger matrix
> to see the speedup.
>
> peakflops(8000) should give a good indication. I am not sure what the
> running time will be, so you may want to gradually increase the size.
>
>
8000 is reasonable on this machine, and it does stabilize the results from
repeated timings. But I still see essentially no difference between 8 and
16 threads. I wonder if NUM_THREADS is somehow being set to 8, although
from the deps/Makefile it looks as though it should be 16.

julia> blas_set_num_threads(4)

julia> [peakflops(8000)::Float64 for i in 1:6]
6-element Array{Float64,1}:
 8.66823e10
 8.65584e10
 8.65692e10
 8.64753e10
 8.64083e10
 8.63359e10

julia> blas_set_num_threads(8)

julia> [peakflops(8000)::Float64 for i in 1:6]
6-element Array{Float64,1}:
 1.68008e11
 1.67772e11
 1.67378e11
 1.67397e11
 1.6746e11
 1.67623e11

julia> blas_set_num_threads(16)

julia> [peakflops(8000)::Float64 for i in 1:6]
6-element Array{Float64,1}:
 1.66779e11
 1.70068e11
 1.698e11
 1.70419e11
 1.70601e11
 1.67226e11
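
To put numbers on the scaling, here is a rough sketch that averages the six runs for each thread count and compares them with the 4-thread rate. The timings are copied from the session above; the names `results` and `avg` are made up for this illustration and are not part of Julia or OpenBLAS.

```julia
# peakflops(8000) results from the session above, in flop/s,
# keyed by the value passed to blas_set_num_threads.
results = [
    (4,  [8.66823e10, 8.65584e10, 8.65692e10, 8.64753e10, 8.64083e10, 8.63359e10]),
    (8,  [1.68008e11, 1.67772e11, 1.67378e11, 1.67397e11, 1.6746e11, 1.67623e11]),
    (16, [1.66779e11, 1.70068e11, 1.698e11, 1.70419e11, 1.70601e11, 1.67226e11]),
]

# Hypothetical helper: plain arithmetic mean, written out so that no
# extra module is needed.
avg(v) = sum(v) / length(v)

# Use the 4-thread runs as the baseline for the comparison.
base_threads, base_runs = results[1]
base = avg(base_runs)

for (threads, runs) in results
    m = avg(runs)
    println(threads, " threads: ", round(m / 1e9), " GFLOPS, ",
            round(100 * m / base) / 100, "x the 4-thread rate (ideal ",
            threads / base_threads, "x)")
end
```

By this measure, going from 4 to 8 threads gives about 1.94 times the rate, which is close to ideal, while going from 8 to 16 threads adds less than 1%.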
> -viral
>
>
>
> >
> > -viral
> >
> > On Friday, December 5, 2014 1:00:39 AM UTC+5:30, Douglas Bates wrote:
> > I have been working on a package,
> > https://github.com/dmbates/ParalllelGLM.jl, and noticed some
> > peculiarities in the timings on a couple of shared-memory servers, each
> > with 32 cores. In particular, changing from 16 workers to 32 workers
> > actually slowed down the fitting process. So I decided to check how
> > changing the number of OpenBLAS threads affected the peakflops() result.
> > I end up with essentially the same results for 8, 16 and 32 threads on
> > this machine with 32 cores. Is that to be expected?
> >
> >    _       _ _(_)_     |  A fresh approach to technical computing
> >   (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
> >    _ _   _| |_  __ _   |  Type "help()" for help.
> >   | | | | | | |/ _` |  |
> >   | | |_| | | | (_| |  |  Version 0.4.0-dev+1944 (2014-12-04 15:06 UTC)
> >  _/ |\__'_|_|_|\__'_|  |  Commit 87e9ee1* (0 days old master)
> > |__/                   |  x86_64-unknown-linux-gnu
> >
> > julia> [peakflops()::Float64 for i in 1:6]
> > 6-element Array{Float64,1}:
> > 1.41151e11
> > 1.1676e11
> > 1.27597e11
> > 1.27607e11
> > 1.27518e11
> > 1.27478e11
> >
> > julia> CPU_CORES
> > 32
> >
> > julia> blas_set_num_threads(16)
> >
> > julia> [peakflops()::Float64 for i in 1:6]
> > 6-element Array{Float64,1}:
> > 1.23523e11
> > 1.27119e11
> > 1.11381e11
> > 1.17847e11
> > 1.28415e11
> > 1.17998e11
> >
> > julia> blas_set_num_threads(8)
> >
> > julia> [peakflops()::Float64 for i in 1:6]
> > 6-element Array{Float64,1}:
> > 1.25194e11
> > 1.20969e11
> > 1.25777e11
> > 1.20757e11
> > 1.26086e11
> > 1.20958e11
> >
> > julia> versioninfo(true)
> > Julia Version 0.4.0-dev+1944
> > Commit 87e9ee1* (2014-12-04 15:06 UTC)
> > Platform Info:
> > System: Linux (x86_64-unknown-linux-gnu)
> > CPU: AMD Opteron(tm) Processor 6328
> > WORD_SIZE: 64
> > "Red Hat Enterprise Linux Server release 6.5 (Santiago)"
> > uname: Linux 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Dec 13 06:58:20 EST 2013 x86_64 x86_64
> > Memory: 504.78467178344727 GB (508598.8125 MB free)
> > Uptime: 261586.0 sec
> > Load Avg: 0.08740234375 0.19384765625 0.8330078125
> > AMD Opteron(tm) Processor 6328 :
> >          speed         user       nice          sys         idle          irq
> > #1-32  3199 MHz    1855973 s    23392 s     670932 s  834073187 s         21 s
> >
> > BLAS: libopenblas (USE64BITINT NO_AFFINITY PILEDRIVER)
> > LAPACK: libopenblas
> > LIBM: libopenlibm
> > LLVM: libLLVM-3.5.0
> > Environment:
> > TERM = screen
> > PATH = /s/cmake-3.0.2/bin:/s/gcc-4.9.2/bin:./u/b/a/bates/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/s/std/bin:/usr/afsws/bin:
> > WWW_HOME = http://www.stat.wisc.edu/
> > JULIA_PKGDIR = /scratch/bates/.julia
> > HOME = /u/b/a/bates
> >
> > Package Directory: /scratch/bates/.julia/v0.4
> > 2 required packages:
> > - Distributions 0.6.1
> > - Docile 0.3.2
> > 5 additional packages:
> > - ArrayViews 0.4.8
> > - Compat 0.2.5
> > - PDMats 0.3.1
> > - ParallelGLM 0.0.0- master (unregistered)
> > - StatsBase 0.6.10
> >
> >
> >
>
>