> On 05-Dec-2014, at 1:32 am, Douglas Bates <[email protected]> wrote:
> 
> On Thursday, December 4, 2014 1:50:06 PM UTC-6, Viral Shah wrote:
> > On 05-Dec-2014, at 1:16 am, Douglas Bates <[email protected]> wrote: 
> > 
> > Thanks, I'll try that.  I'm still curious as to why there is so little 
> > difference between 8 and 16 threads. 
> 
> peakflops() just performs a matrix multiplication to estimate the flops. It 
> uses a 2000x2000 matrix by default, which is good for most laptops, but for 
> bigger machines with more cores, one often needs to use a larger matrix to 
> see the speedup. 
> 
> peakflops(8000) should give a good indication. I am not sure what the running 
> time will be, so you may want to gradually increase the size. 
> 
> 
> 8000 is reasonable on this machine, and it does stabilize the results from 
> repeated timings.  But I still see essentially no difference between 8 and 
> 16 threads.  I wonder if NUM_THREADS is somehow being set to 8, although 
> deps/Makefile suggests that it should be 16.


I tried on julia.mit.edu, and I do see a scale-up from 1 to 16 processors with 
peakflops(4000). That suggests that the build is OK and that OpenBLAS can 
scale. It would be best to check with Xianyi about this; perhaps file 
an issue against OpenBLAS?
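
A minimal sketch of such a scaling check, assuming a current Julia 1.x release 
where the thread setting lives in the LinearAlgebra standard library as 
BLAS.set_num_threads (the 0.4-dev session quoted below uses 
blas_set_num_threads instead). The helper names matmul_flops and thread_sweep 
are hypothetical, and matmul_flops only approximates the idea behind 
peakflops() by timing one dense multiply and counting about 2n^3 flops:

```julia
using LinearAlgebra

# Rough flop-rate estimate from one dense n-by-n matrix multiply.
# This approximates the idea behind peakflops(); it is not its implementation.
function matmul_flops(n::Integer)
    A = ones(Float64, n, n)
    B = ones(Float64, n, n)
    A * B                        # warm-up so compilation is not timed
    t = @elapsed A * B           # seconds for one multiply
    return 2 * Float64(n)^3 / t  # an n-by-n multiply costs about 2n^3 flops
end

# Best rate out of `trials` runs for each BLAS thread count.
# Note: leaves the BLAS thread count at the last value tried.
function thread_sweep(n::Integer, thread_counts; trials::Integer = 3)
    rates = Float64[]
    for nt in thread_counts
        BLAS.set_num_threads(nt)
        push!(rates, maximum(matmul_flops(n) for _ in 1:trials))
    end
    return rates
end

# Small size so the demo finishes quickly; use 4000 or 8000 on a big machine.
for (nt, r) in zip((1, 2, 4), thread_sweep(500, (1, 2, 4)))
    println(nt, " threads: ", round(r / 1e9, digits = 2), " GFLOPS")
end
```

If the rates stop improving past some thread count at n = 8000 as well as at 
smaller sizes, that would match what Doug reports and would be worth including 
in an OpenBLAS issue.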

Perhaps someone here may have some other ideas too.

-viral


> 
> julia> blas_set_num_threads(4)
> 
> julia> [peakflops(8000)::Float64 for i in 1:6]
> 6-element Array{Float64,1}:
>  8.66823e10
>  8.65584e10
>  8.65692e10
>  8.64753e10
>  8.64083e10
>  8.63359e10
> 
> julia> blas_set_num_threads(8)
> 
> julia> [peakflops(8000)::Float64 for i in 1:6]
> 6-element Array{Float64,1}:
>  1.68008e11
>  1.67772e11
>  1.67378e11
>  1.67397e11
>  1.6746e11 
>  1.67623e11
> 
> julia> blas_set_num_threads(16)
> 
> julia> [peakflops(8000)::Float64 for i in 1:6]
> 6-element Array{Float64,1}:
>  1.66779e11
>  1.70068e11
>  1.698e11  
>  1.70419e11
>  1.70601e11
>  1.67226e11
> 
> 
>  
> -viral 
> 
> 
> 
> > 
> > -viral 
> > 
> > On Friday, December 5, 2014 1:00:39 AM UTC+5:30, Douglas Bates wrote: 
> > I have been working on a package, https://github.com/dmbates/ParalllelGLM.jl, 
> > and noticed some peculiarities in the timings on a couple of shared-memory 
> > servers, each with 32 cores.  In particular, changing from 16 workers to 32 
> > workers actually slowed down the fitting process.  So I decided to check how 
> > changing the number of OpenBLAS threads affected the peakflops() result.  I 
> > end up with essentially the same results for 8, 16 and 32 threads on this 
> > machine with 32 cores.  Is that to be expected? 
> > 
> >    _       _ _(_)_     |  A fresh approach to technical computing 
> >   (_)     | (_) (_)    |  Documentation: http://docs.julialang.org 
> >    _ _   _| |_  __ _   |  Type "help()" for help. 
> >   | | | | | | |/ _` |  | 
> >   | | |_| | | | (_| |  |  Version 0.4.0-dev+1944 (2014-12-04 15:06 UTC) 
> >  _/ |\__'_|_|_|\__'_|  |  Commit 87e9ee1* (0 days old master) 
> > |__/                   |  x86_64-unknown-linux-gnu 
> > 
> > julia> [peakflops()::Float64 for i in 1:6] 
> > 6-element Array{Float64,1}: 
> >  1.41151e11 
> >  1.1676e11 
> >  1.27597e11 
> >  1.27607e11 
> >  1.27518e11 
> >  1.27478e11 
> > 
> > julia> CPU_CORES 
> > 32 
> > 
> > julia> blas_set_num_threads(16) 
> > 
> > julia> [peakflops()::Float64 for i in 1:6] 
> > 6-element Array{Float64,1}: 
> >  1.23523e11 
> >  1.27119e11 
> >  1.11381e11 
> >  1.17847e11 
> >  1.28415e11 
> >  1.17998e11 
> > 
> > julia> blas_set_num_threads(8) 
> > 
> > julia> [peakflops()::Float64 for i in 1:6] 
> > 6-element Array{Float64,1}: 
> >  1.25194e11 
> >  1.20969e11 
> >  1.25777e11 
> >  1.20757e11 
> >  1.26086e11 
> >  1.20958e11 
> > 
> > julia> versioninfo(true) 
> > Julia Version 0.4.0-dev+1944 
> > Commit 87e9ee1* (2014-12-04 15:06 UTC) 
> > Platform Info: 
> >   System: Linux (x86_64-unknown-linux-gnu) 
> >   CPU: AMD Opteron(tm) Processor 6328                 
> >   WORD_SIZE: 64 
> >            "Red Hat Enterprise Linux Server release 6.5 (Santiago)" 
> >   uname: Linux 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Dec 13 06:58:20 EST 2013 x86_64 x86_64 
> > Memory: 504.78467178344727 GB (508598.8125 MB free) 
> > Uptime: 261586.0 sec 
> > Load Avg:  0.08740234375  0.19384765625  0.8330078125 
> > AMD Opteron(tm) Processor 6328                 : 
> >           speed         user         nice          sys         idle          irq 
> > #1-32  3199 MHz    1855973 s      23392 s     670932 s  834073187 s         21 s 
> > 
> >   BLAS: libopenblas (USE64BITINT NO_AFFINITY PILEDRIVER) 
> >   LAPACK: libopenblas 
> >   LIBM: libopenlibm 
> >   LLVM: libLLVM-3.5.0 
> > Environment: 
> >   TERM = screen 
> >   PATH = /s/cmake-3.0.2/bin:/s/gcc-4.9.2/bin:./u/b/a/bates/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/s/std/bin:/usr/afsws/bin:
> >   WWW_HOME = http://www.stat.wisc.edu/ 
> >   JULIA_PKGDIR = /scratch/bates/.julia 
> >   HOME = /u/b/a/bates 
> > 
> > Package Directory: /scratch/bates/.julia/v0.4 
> > 2 required packages: 
> >  - Distributions                 0.6.1 
> >  - Docile                        0.3.2 
> > 5 additional packages: 
> >  - ArrayViews                    0.4.8 
> >  - Compat                        0.2.5 
> >  - PDMats                        0.3.1 
> >  - ParallelGLM                   0.0.0-             master (unregistered) 
> >  - StatsBase                     0.6.10 
> > 
> > 
> > 
> 
