So are you suggesting that real numerical workloads under 
BLAS.set_num_threads(4) will indeed be faster than under 
BLAS.set_num_threads(2)?  That hasn't been my experience, and I figured the 
peakflops() example would constitute an MWE.  Is there another workload you 
would suggest I try, to figure out whether this is just a peakflops() 
aberration or a real issue?
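
Just so I test the right thing, here is a sketch of what I could run instead 
(the names and the loop are mine, not something from this thread): it 
multiplies into a pre-allocated output with A_mul_B!, so the allocation and 
GC you mention stay out of the measurement, and it reports GFLOPS the same 
way peakflops() does (2*N^3 flops over the elapsed time).

    N = 6755                                          # same size as in my example below
    A = rand(N, N); B = rand(N, N); C = zeros(N, N)   # pre-allocate the output once

    function inplace_gflops(C, A, B)
        gc()                                # keep garbage collection out of the timing
        t = @elapsed A_mul_B!(C, A, B)      # C = A*B in place, no allocation
        return 2 * size(A, 1)^3 / t / 1e9   # flop count of a square matmul, in GFLOPS
    end

    for nthreads in (1, 2, 4)
        BLAS.set_num_threads(nthreads)
        A_mul_B!(C, A, B)                   # warm-up run at this thread count
        println(nthreads, " threads: ",
                maximum(inplace_gflops(C, A, B) for i = 1:10), " GFLOPS")
    end

If 4 threads still doesn't beat 2 on something like that, I take it that 
points at the hyperthreading/core-placement issue rather than the GC one?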

On Wednesday, October 19, 2016 at 8:28:16 PM UTC-5, Ralph Smith wrote:
>
> At least two things contribute to erratic results from peakflops(). With 
> hyperthreading, the threads are not always put on separate cores. Secondly, 
> the measured time includes the allocation of the result matrix, so garbage 
> collection affects some of the results. Most available advice says to 
> disable hyperthreading on dedicated number crunchers (most full loads work 
> slightly more efficiently without the extra context switching). The GC 
> issue seems to be a mistake, if "peak" is to be taken seriously.
>
> On Wednesday, October 19, 2016 at 12:04:00 PM UTC-4, Thomas Covert wrote:
>>
>> I have a recent iMac with 4 physical cores (8 logical cores with 
>> hyperthreading).  I would have thought that peakflops(N), for a large 
>> enough N, should be increasing in the number of threads I allow BLAS to 
>> use.  I do find that peakflops(N) with 1 thread is about half as high as 
>> peakflops(N) with 2 threads, but there is no further gain with 4 threads. 
>> Are my expectations wrong here, or is it possible that BLAS is somehow 
>> configured incorrectly on my machine?  In the example below, N = 6755, a 
>> number relevant for my work, but the results are similar with 5000 or 
>> 10000.
>>
>> Here is my versioninfo():
>> julia> versioninfo()
>> Julia Version 0.5.0
>> Commit 3c9d753* (2016-09-19 18:14 UTC)
>> Platform Info:
>>   System: Darwin (x86_64-apple-darwin15.6.0)
>>   CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
>>   WORD_SIZE: 64
>>   BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Haswell)
>>   LAPACK: libopenblas
>>   LIBM: libopenlibm
>>   LLVM: libLLVM-3.7.1 (ORCJIT, haswell)
>>
>> Here is an example peakflops() exercise:
>> julia> BLAS.set_num_threads(1)
>>
>> julia> mean(peakflops(6755) for i=1:10)
>> 5.225580459387056e10
>>
>> julia> BLAS.set_num_threads(2)
>>
>> julia> mean(peakflops(6755) for i=1:10)
>> 1.004317640281997e11
>>
>> julia> BLAS.set_num_threads(4)
>>
>> julia> mean(peakflops(6755) for i=1:10)
>> 9.838116463900085e10
