I have been working on a package, https://github.com/dmbates/ParallelGLM.jl,
and noticed some peculiarities in the timings on a couple of shared-memory
servers, each with 32 cores.  In particular, going from 16 workers to 32
workers actually slowed down the fitting process.  So I decided to check
how changing the number of OpenBLAS threads affects the peakflops()
result.  I get essentially the same results, around 1.2e11 flops, for 8,
16 and 32 threads on this machine with 32 cores.  Is that to be expected?

   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-dev+1944 (2014-12-04 15:06 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 87e9ee1* (0 days old master)
|__/                   |  x86_64-unknown-linux-gnu

julia> [peakflops()::Float64 for i in 1:6]
6-element Array{Float64,1}:
 1.41151e11
 1.1676e11 
 1.27597e11
 1.27607e11
 1.27518e11
 1.27478e11

julia> CPU_CORES
32

julia> blas_set_num_threads(16)

julia> [peakflops()::Float64 for i in 1:6]
6-element Array{Float64,1}:
 1.23523e11
 1.27119e11
 1.11381e11
 1.17847e11
 1.28415e11
 1.17998e11

julia> blas_set_num_threads(8)

julia> [peakflops()::Float64 for i in 1:6]
6-element Array{Float64,1}:
 1.25194e11
 1.20969e11
 1.25777e11
 1.20757e11
 1.26086e11
 1.20958e11

julia> versioninfo(true)
Julia Version 0.4.0-dev+1944
Commit 87e9ee1* (2014-12-04 15:06 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: AMD Opteron(tm) Processor 6328                 
  WORD_SIZE: 64
           "Red Hat Enterprise Linux Server release 6.5 (Santiago)"
  uname: Linux 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Dec 13 06:58:20 EST 2013 x86_64 x86_64
Memory: 504.78467178344727 GB (508598.8125 MB free)
Uptime: 261586.0 sec
Load Avg:  0.08740234375  0.19384765625  0.8330078125
AMD Opteron(tm) Processor 6328                 : 
          speed         user         nice          sys         idle          irq
#1-32  3199 MHz    1855973 s      23392 s     670932 s  834073187 s         21 s

  BLAS: libopenblas (USE64BITINT NO_AFFINITY PILEDRIVER)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.5.0
Environment:
  TERM = screen
  PATH = /s/cmake-3.0.2/bin:/s/gcc-4.9.2/bin:./u/b/a/bates/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/s/std/bin:/usr/afsws/bin:
  WWW_HOME = http://www.stat.wisc.edu/
  JULIA_PKGDIR = /scratch/bates/.julia
  HOME = /u/b/a/bates

Package Directory: /scratch/bates/.julia/v0.4
2 required packages:
 - Distributions                 0.6.1
 - Docile                        0.3.2
5 additional packages:
 - ArrayViews                    0.4.8
 - Compat                        0.2.5
 - PDMats                        0.3.1
 - ParallelGLM                   0.0.0-             master (unregistered)
 - StatsBase                     0.6.10
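
The thread-count comparison in the session above could be scripted rather
than typed by hand. What follows is only a rough sketch, and it assumes a
current Julia release, where the calls live in the LinearAlgebra standard
library as BLAS.set_num_threads and peakflops (the 0.4-dev session above
uses blas_set_num_threads and peakflops from Base instead). The helper name
flops_by_threads, the matrix size n and the repetition count reps are made
up for illustration.

```julia
using LinearAlgebra

# Hypothetical helper: for each BLAS thread count, record the best of
# `reps` peakflops measurements on an n-by-n matrix multiply.
function flops_by_threads(counts; n = 2000, reps = 3)
    results = Dict{Int,Float64}()
    for t in counts
        BLAS.set_num_threads(t)   # modern spelling of blas_set_num_threads(t)
        peakflops(n)              # warm-up run, result discarded
        results[t] = maximum(peakflops(n) for _ in 1:reps)
    end
    return results
end

# Small thread counts and a small matrix so the sketch runs quickly anywhere;
# on the 32-core server one would pass (8, 16, 32) and a larger n.
res = flops_by_threads((1, 2, 4); n = 1000)
for t in sort(collect(keys(res)))
    println(t, " threads: ", round(res[t] / 1e9, digits = 1), " GFLOPS")
end
```

Taking the maximum over a few repetitions is one way to damp the
run-to-run variation visible in the numbers above; reporting the median
would be an equally reasonable choice.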


