Thanks, I just tried wrapping the for loop inside a function. It makes the @threads version somewhat slower (3.17 s to 3.84 s) and the serial version noticeably faster (4.42 s to 2.33 s), so I'm even further from the speedup I was hoping for! Reading through that issue and the linked ones, I guess I may not be the only one seeing this.
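One rough way to check how much of the loop body is allocation rather than arithmetic is to measure the bytes allocated by a single term. The sketch below is only that, a sketch: it assumes a recent Julia (1.x), where `trace` is spelled `tr` and lives in `LinearAlgebra`, and the helper names `term_copy` and `term_view` are made up for illustration. It relies on the fact that a slice like `A[:,:,l]` copies into a new array, while a view does not. The matrix products still allocate in both versions, so this only shows the slicing share.

```julia
using LinearAlgebra  # assumption: Julia 1.x, where `trace` became `tr`

# One term of the sum as written in this thread: each `A[:,:,l]` slice
# allocates a fresh 3x3 copy before the products are formed.
# (hypothetical helper name)
function term_copy(inv_cl, da, db, l)
    tr(inv_cl[:,:,l] * da[:,:,l] * inv_cl[:,:,l] * db[:,:,l])
end

# Same arithmetic, but `@views` turns the slices into views, so the
# slices themselves are not copied. The products still allocate.
# (hypothetical helper name)
function term_view(inv_cl, da, db, l)
    @views tr(inv_cl[:,:,l] * da[:,:,l] * inv_cl[:,:,l] * db[:,:,l])
end

nl = 10
inv_cl = ones(3,3,nl)
da = ones(3,3,nl)
db = ones(3,3,nl)

# Call each helper once first so compilation is not counted.
term_copy(inv_cl, da, db, 1)
term_view(inv_cl, da, db, 1)

bytes_copy = @allocated term_copy(inv_cl, da, db, 1)
bytes_view = @allocated term_view(inv_cl, da, db, 1)

println("bytes per term, copying slices: ", bytes_copy)
println("bytes per term, views:          ", bytes_view)
```

If the per-term byte count is large compared to the handful of flops in a 3x3 product, that is at least consistent with the threads spending time on allocation and garbage collection rather than computation. It does not by itself confirm that this is the bottleneck.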
For ref, what I did:

function myloop(inv_cl,d_cl,fish,ijs,nl)
    @threads for ij in ijs
        i,j = ij
        for l in 1:nl
            fish[i,j] += (2*l+1)/2*trace(inv_cl[:,:,l]*d_cl[i][:,:,l]*inv_cl[:,:,l]*d_cl[j][:,:,l])
        end
    end
end

function test(nl,np)
    inv_cl = ones(3,3,nl)
    d_cl = Dict(i => ones(3,3,nl) for i=1:np)
    fish = zeros(np,np)
    ijs = [(i,j) for i=1:np, j=1:np]
    myloop(inv_cl,d_cl,fish,ijs,nl)
end

# with @threads
@timeit test(3000,40)
1 loops, best of 3: 3.84 s per loop

# without @threads
@timeit test(3000,40)
1 loops, best of 3: 2.33 s per loop


On Monday, August 29, 2016 at 6:50:15 PM UTC+2, Tim Holy wrote:
>
> Very quickly (train to catch!): try this
> https://github.com/JuliaLang/julia/issues/17395#issuecomment-241911387
> and see if it helps.
>
> --Tim
>
> On Monday, August 29, 2016 9:22:09 AM CDT Marius Millea wrote:
> > I've parallelized some code with @threads, but instead of a factor NCPUs
> > speed improvement (for me, 8), I'm seeing rather a bit under a factor 2. I
> > suppose the answer may be that my bottleneck isn't computation, rather
> > memory access. But during running the code, I see my CPU usage go to 100%
> > on all 8 CPUs, if it were memory access would I still see this? Maybe the
> > answer is yes, in which case memory access is likely the culprit; is there
> > some way to confirm this though? If no, how do I figure out what *is* the
> > culprit?
> >
> > Here's a stripped down version of my code,
> >
> > function test(nl,np)
> >
> >     inv_cl = ones(3,3,nl)
> >     d_cl = Dict(i => ones(3,3,nl) for i=1:np)
> >
> >     fish = zeros(np,np)
> >     ijs = [(i,j) for i=1:np, j=1:np]
> >
> >     Threads.@threads for ij in ijs
> >         i,j = ij
> >         for l in 1:nl
> >             fish[i,j] += (2*l+1)/2*trace(inv_cl[:,:,l]*d_cl[i][:,:,l]*inv_cl[:,:,l]*d_cl[j][:,:,l])
> >         end
> >     end
> >
> > end
> >
> > # with the @threads
> > @timeit test(3000,40)
> > 1 loops, best of 3: 3.17 s per loop
> >
> > # now remove the @threads from above
> > @timeit test(3000,40)
> > 1 loops, best of 3: 4.42 s per loop
> >
> > Thanks.