Thanks. I just tried wrapping the for loop inside a function, and it seems to
make the @threads version slightly slower and the serial version slightly
faster, so I'm even further from the speedup I was hoping for! Reading
through that issue and the linked ones, I guess I'm not the only one
seeing this.

For reference, here's what I did:

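using Base.Threads   # brings the unqualified @threads macro into scope

function myloop(inv_cl,d_cl,fish,ijs,nl)
    @threads for ij in ijs
        i,j = ij   # each (i,j) pair is visited once, so threads never write the same fish entry
        for l in 1:nl
            fish[i,j] += (2*l+1)/2*trace(inv_cl[:,:,l]*d_cl[i][:,:,l]*inv_cl[:,:,l]*d_cl[j][:,:,l])
        end
    end
end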

function test(nl,np)
    inv_cl = ones(3,3,nl)
    d_cl = Dict(i => ones(3,3,nl) for i=1:np)
        
    fish = zeros(np,np)
    ijs = [(i,j) for i=1:np, j=1:np]
    
    myloop(inv_cl,d_cl,fish,ijs,nl)
end

# with @threads
@timeit test(3000,40)
1 loops, best of 3: 3.84 s per loop

# without @threads
@timeit test(3000,40)
1 loops, best of 3: 2.33 s per loop
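
A quick way to test the memory/allocation theory, sketched under assumptions: measure how many bytes a single evaluation of the inner expression allocates. The helpers `inner_sliced` and `bytes_per_call` are hypothetical names. The trace is summed by hand so the sketch does not depend on `trace` (renamed `tr` and moved to `LinearAlgebra` in Julia 0.7 and later), and the first call is a warm-up so that compilation is not counted.

```julia
# Hypothetical helper: one evaluation of the inner-loop expression from the
# post, with the trace summed by hand instead of calling `trace`/`tr`.
function inner_sliced(inv_cl, Di, Dj, l)
    # each A[:,:,l] slice is a copy, and each matrix product allocates a new matrix
    M = inv_cl[:,:,l] * Di[:,:,l] * inv_cl[:,:,l] * Dj[:,:,l]
    s = zero(eltype(M))
    for k in 1:size(M, 1)
        s += M[k, k]
    end
    return (2*l + 1) / 2 * s
end

# Hypothetical helper: bytes allocated by one call, measured after a
# warm-up call so that compilation is not counted.
function bytes_per_call(inv_cl, Di, Dj, l)
    inner_sliced(inv_cl, Di, Dj, l)
    return @allocated inner_sliced(inv_cl, Di, Dj, l)
end

nl = 3000
inv_cl = ones(3, 3, nl)
D = ones(3, 3, nl)
println(bytes_per_call(inv_cl, D, D, 1), " bytes per evaluation")
```

Whatever this prints is paid nl*np^2 = 3000*40^2 = 4.8 million times in `test(3000,40)`. Running `@time test(3000,40)` also reports total allocations and the share of time spent in garbage collection. If that share is large, allocation is a plausible suspect: Julia's collector stops all threads while it runs, which could cap the speedup even while every CPU looks busy.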

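If allocation does turn out to be the bottleneck, here is one possible rewrite, sketched under assumptions. It expands tr(A*B*A*C) into explicit index loops so the threaded loop allocates nothing. It also looks up each `d_cl` entry once per (i,j) rather than once per l, and it accumulates into a local before writing `fish[i,j]`. The names `trace_ABAC` and `myloop_noalloc!` are hypothetical, the code assumes the 3x3xnl layout from the post, and no timings are claimed for it.

```julia
using Base.Threads

# Hypothetical helper: tr(A[:,:,l]*B[:,:,l]*A[:,:,l]*C[:,:,l]) computed
# element by element, using
#   tr(A*B*A*C) = sum over a,b,c,d of A[a,b]*B[b,c]*A[c,d]*C[d,a]
# so no slice copies or temporary matrices are created.
function trace_ABAC(A, B, C, l)
    n = size(A, 1)
    s = zero(eltype(A))
    @inbounds for a in 1:n, b in 1:n, c in 1:n, d in 1:n
        s += A[a, b, l] * B[b, c, l] * A[c, d, l] * C[d, a, l]
    end
    return s
end

# Hypothetical threaded driver computing the same sums as `myloop`.
function myloop_noalloc!(fish, inv_cl, d_cl, ijs, nl)
    @threads for k in 1:length(ijs)
        i, j = ijs[k]
        Di, Dj = d_cl[i], d_cl[j]   # one Dict lookup per (i,j), not per l
        acc = zero(eltype(fish))    # local accumulator for this (i,j)
        for l in 1:nl
            acc += (2*l + 1) / 2 * trace_ABAC(inv_cl, Di, Dj, l)
        end
        fish[i, j] += acc           # each (i,j) is handled by exactly one iteration
    end
    return fish
end

# Same inputs as test(3000,40) in the post
nl, np = 3000, 40
inv_cl = ones(3, 3, nl)
d_cl = Dict(i => ones(3, 3, nl) for i = 1:np)
fish = zeros(np, np)
ijs = [(i, j) for i = 1:np, j = 1:np]
myloop_noalloc!(fish, inv_cl, d_cl, ijs, nl)
```

The 81-term sum does more multiplications than three 3x3 matrix products, but for blocks this small it avoids the slice copies and temporaries entirely. For larger blocks, preallocated per-thread buffers would likely be the better trade-off.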

On Monday, August 29, 2016 at 6:50:15 PM UTC+2, Tim Holy wrote:
>
> Very quickly (train to catch!): try this
> https://github.com/JuliaLang/julia/issues/17395#issuecomment-241911387
> and see if it helps.
> and see if it helps. 
>
> --Tim 
>
> On Monday, August 29, 2016 9:22:09 AM CDT Marius Millea wrote: 
> > I've parallelized some code with @threads, but instead of a speedup of a
> > factor NCPUs (for me, 8), I'm seeing a bit under a factor of 2. I suppose
> > the answer may be that my bottleneck isn't computation but memory access.
> > But while the code runs, I see CPU usage go to 100% on all 8 CPUs; if it
> > were memory access, would I still see this? Maybe the answer is yes, in
> > which case memory access is likely the culprit; is there some way to
> > confirm this, though? If not, how do I figure out what *is* the culprit?
> > 
> > Here's a stripped-down version of my code:
> > 
> > 
> > function test(nl,np) 
> > 
> >     inv_cl = ones(3,3,nl) 
> >     d_cl = Dict(i => ones(3,3,nl) for i=1:np) 
> > 
> >     fish = zeros(np,np) 
> >     ijs = [(i,j) for i=1:np, j=1:np] 
> > 
> >     Threads.@threads for ij in ijs 
> >         i,j = ij 
> >         for l in 1:nl 
> >             fish[i,j] += (2*l+1)/2*trace(inv_cl[:,:,l]*d_cl[i][:,:,l]*inv_cl[:,:,l]*d_cl[j][:,:,l])
> >         end 
> >     end 
> > 
> > end 
> > 
> > 
> > # with the @threads 
> > @timeit test(3000,40) 
> > 1 loops, best of 3: 3.17 s per loop 
> > 
> > # now remove the @threads from above 
> > @timeit test(3000,40) 
> > 1 loops, best of 3: 4.42 s per loop 
> > 
> > Thanks. 