> On 05-Dec-2014, at 1:16 am, Douglas Bates <[email protected]> wrote:
> 
> Thanks, I'll try that.  I'm still curious as to why there is so little 
> difference between 8 and 16 threads. 

peakflops() just times a matrix multiplication to estimate the flop rate. It 
uses a 2000x2000 matrix by default, which is fine for most laptops, but on 
bigger machines with more cores one often needs a larger matrix to see the 
speedup from additional threads.

peakflops(8000) should give a good indication. I am not sure what the running 
time will be (the work grows as n^3, so this is 64 times the default), so you 
may want to increase the size gradually.
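
Something along these lines would show whether the larger size separates the
thread counts. It is only a sketch: it uses the function names from the session
quoted below (Julia 0.4-dev), and the sizes and thread counts are examples I
picked, not a recommendation. An 8000x8000 Float64 matrix takes roughly 0.5 GB,
which should not be a concern on that machine.

```julia
# Sketch: compare the peakflops() estimate across OpenBLAS thread counts
# and matrix sizes. Thread counts and sizes are illustrative assumptions.
for nthreads in (8, 16, 32)
    blas_set_num_threads(nthreads)
    for n in (2000, 4000, 8000)
        # peakflops(n) times the product of a random n x n Float64 matrix
        # with itself, so the work grows as n^3.
        println(nthreads, " threads, n = ", n, ": ", peakflops(n), " flop/s")
    end
end
```

In later Julia releases the same calls are LinearAlgebra.peakflops and
BLAS.set_num_threads (after `using LinearAlgebra`).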

-viral



> On Friday, December 5, 2014 1:00:39 AM UTC+5:30, Douglas Bates wrote:
> I have been working on a package, https://github.com/dmbates/ParallelGLM.jl, 
> and noticed some peculiarities in the timings on a couple of shared-memory 
> servers, each with 32 cores.  In particular, changing from 16 workers to 32 
> workers actually slowed down the fitting process.  So I decided to check how 
> changing the number of OpenBLAS threads affected the peakflops() result.  I 
> end up with essentially the same results for 8, 16 and 32 threads on this 
> machine with 32 cores.  Is that to be expected?
> 
>    _       _ _(_)_     |  A fresh approach to technical computing
>   (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
>    _ _   _| |_  __ _   |  Type "help()" for help.
>   | | | | | | |/ _` |  |
>   | | |_| | | | (_| |  |  Version 0.4.0-dev+1944 (2014-12-04 15:06 UTC)
>  _/ |\__'_|_|_|\__'_|  |  Commit 87e9ee1* (0 days old master)
> |__/                   |  x86_64-unknown-linux-gnu
> 
> julia> [peakflops()::Float64 for i in 1:6]
> 6-element Array{Float64,1}:
>  1.41151e11
>  1.1676e11 
>  1.27597e11
>  1.27607e11
>  1.27518e11
>  1.27478e11
> 
> julia> CPU_CORES
> 32
> 
> julia> blas_set_num_threads(16)
> 
> julia> [peakflops()::Float64 for i in 1:6]
> 6-element Array{Float64,1}:
>  1.23523e11
>  1.27119e11
>  1.11381e11
>  1.17847e11
>  1.28415e11
>  1.17998e11
> 
> julia> blas_set_num_threads(8)
> 
> julia> [peakflops()::Float64 for i in 1:6]
> 6-element Array{Float64,1}:
>  1.25194e11
>  1.20969e11
>  1.25777e11
>  1.20757e11
>  1.26086e11
>  1.20958e11
> 
> julia> versioninfo(true)
> Julia Version 0.4.0-dev+1944
> Commit 87e9ee1* (2014-12-04 15:06 UTC)
> Platform Info:
>   System: Linux (x86_64-unknown-linux-gnu)
>   CPU: AMD Opteron(tm) Processor 6328                 
>   WORD_SIZE: 64
>            "Red Hat Enterprise Linux Server release 6.5 (Santiago)"
>   uname: Linux 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Dec 13 06:58:20 EST 2013 x86_64 x86_64
> Memory: 504.78467178344727 GB (508598.8125 MB free)
> Uptime: 261586.0 sec
> Load Avg:  0.08740234375  0.19384765625  0.8330078125
> AMD Opteron(tm) Processor 6328                 : 
>           speed         user         nice          sys         idle          irq
> #1-32  3199 MHz    1855973 s      23392 s     670932 s  834073187 s         21 s
> 
>   BLAS: libopenblas (USE64BITINT NO_AFFINITY PILEDRIVER)
>   LAPACK: libopenblas
>   LIBM: libopenlibm
>   LLVM: libLLVM-3.5.0
> Environment:
>   TERM = screen
>   PATH = /s/cmake-3.0.2/bin:/s/gcc-4.9.2/bin:./u/b/a/bates/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/s/std/bin:/usr/afsws/bin:
>   WWW_HOME = http://www.stat.wisc.edu/
>   JULIA_PKGDIR = /scratch/bates/.julia
>   HOME = /u/b/a/bates
> 
> Package Directory: /scratch/bates/.julia/v0.4
> 2 required packages:
>  - Distributions                 0.6.1
>  - Docile                        0.3.2
> 5 additional packages:
>  - ArrayViews                    0.4.8
>  - Compat                        0.2.5
>  - PDMats                        0.3.1
>  - ParallelGLM                   0.0.0-             master (unregistered)
>  - StatsBase                     0.6.10
> 
> 
> 
