Thanks!

Yes, in this setting I would stay away from SharedArrays, for the reasons 
above: all workers see the same arrays, so their concurrent writes interfere 
with each other.

SharedArrays are good for

1) sharing immutable input data across local workers (no data is 
serialized/copied except for the SharedArray "metadata"), 
2) storing outputs, but only when each worker is responsible for a specific 
part of the output, 
3) 1+2 combined, where each worker manipulates its own part of the array 
in-place (see the sketch below).
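
To illustrate case 3, a minimal sketch (the function name fill_column! and 
the array sizes are made up for illustration); each worker writes only to 
its own columns, so no two workers ever touch the same memory:

  addprocs(3)

  S = SharedArray(Float64, (10, 100))  # visible to all local workers

  @everywhere function fill_column!(S, j)
      for i in 1:size(S, 1)
          S[i, j] = i * j  # stands in for the real per-column computation
      end
  end

  # each iteration fills a distinct column, so the writes never overlap
  @sync @parallel for j in 1:size(S, 2)
      fill_column!(S, j)
  end

  sum(S)  # safe to read the complete result on the master process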

The pmap version looks good to me!


On 13.03.2015, at 18:09, Pieter Barendrecht <pjbarendre...@gmail.com> wrote:

> Cheers. I uploaded the two scripts —
> 
> https://gist.github.com/pjbarendrecht/ee4eff971ec2073bfad6 (using 
> SharedArrays)
> https://gist.github.com/pjbarendrecht/617b73a36b4848634eae (using the pmap() 
> function) → use ParSet(10) to run 10,000 simulations.
> 
> Pieter
> 
> 
> On Friday, March 13, 2015 at 3:29:48 PM UTC, René Donner wrote:
> 
> On 13.03.2015, at 16:20, Pieter Barendrecht <pjbare...@gmail.com> wrote:
> 
> > Thanks! I tried both approaches you suggested. Some results using 
> > SharedArrays (100,000 simulations) 
> > 
> > #workers   time 
> >        1   ~120s 
> >        3   ~42s 
> >        6   ~40s 
> > 
> > Short question: the first print statement after the for-loop is executed 
> > before the for-loop has finished. How do I prevent this from happening? 
> > 
> > Some results using the other approach (again 100,000 simulations) 
> > 
> > #workers   time 
> >        1   ~118s 
> >        2   ~60s 
> >        3   ~42s 
> >        4   ~38s 
> >        6   ~40s 
> > 
> 
> Could you post a simplified code snippet? Either here or in a gist. It is 
> difficult to know what exactly you're doing ;-) 
> 
> > Couple of questions. My equivalent of "myfunc_pure()" also requires a 
> > second argument. 
> 
> Is that argument changing between calls, or is it there to switch between 
> different algorithms, etc.? 
> 
> > In addition, I don't make use of the "startindex" argument in the function. 
> > What's the common approach here? Next, there are actually multiple 
> > variables that should be returned, not just "result". 
> 
> You can always return a tuple (a, b, c) instead of a single value. The 
> function you provide to reduce then has the signature myreducer(a::Tuple, 
> b::Tuple): combine the two tuples, and return a tuple again. 
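> 
> A minimal sketch of that tuple pattern (the names myfunc_tuple and 
> myreducer are made up, not from your script): 
> 
>   @everywhere function myfunc_tuple(startindex) 
>       counts = zeros(Int, 10) 
>       total  = 0.0 
>       for i in startindex + (0:19) 
>           counts[mod(i, length(counts)) + 1] += 1 
>           total += i 
>       end 
>       (counts, total)  # several results returned as one tuple 
>   end 
> 
>   # combine two partial results element-wise, returning a tuple again 
>   @everywhere myreducer(a::Tuple, b::Tuple) = (a[1] + b[1], a[2] + b[2]) 
> 
>   counts, total = reduce(myreducer, pmap(myfunc_tuple, 1:5)) 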
> 
> > 
> > Overall, I'm a bit surprised that using more than 3 or 4 workers does not 
> > decrease the running time. Any ideas? I'm using Julia 0.3.6 on a 64bit Arch 
> > Linux system, Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz. 
> 
> Can be any number of things: memory bandwidth could be the limiting 
> factor, or the computation is actually nicely sped up and a lot of what 
> you see is communication overhead. In that case, work on chunks of data / 
> batches of iterations, i.e. don't pmap over millions of things but only a 
> couple dozen. Looking at the code might shed some light. 
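> 
> For example, a sketch of the chunking idea (the chunk count and the 
> function run_chunk are made up; the loop body stands in for your 
> simulation): 
> 
>   nsim    = 100000 
>   nchunks = 24  # a couple dozen tasks instead of 100,000 
>   csize   = div(nsim + nchunks - 1, nchunks)  # ceiling division 
> 
>   @everywhere function run_chunk(idx) 
>       result = zeros(Int, 10) 
>       for i in idx 
>           result[mod(i, length(result)) + 1] += 1  # one simulation per i 
>       end 
>       result 
>   end 
> 
>   chunks = [((c - 1) * csize + 1):min(c * csize, nsim) for c in 1:nchunks] 
>   total  = reduce(+, pmap(run_chunk, chunks)) 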
> 
> > 
> > On Friday, March 13, 2015 at 8:37:19 AM UTC, René Donner wrote: 
> > Perhaps SharedArrays are what you need here? 
> > http://docs.julialang.org/en/release-0.3/stdlib/parallel/?highlight=sharedarray#Base.SharedArray
> >  
> > 
> > Reading from a shared array in workers is fine, but when different workers 
> > try to update the same part of that array you will get racy behaviour and 
> > most likely not the correct result. 
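> > 
> > A small demonstration of the race (the final count will vary from run 
> > to run): several workers doing a read-modify-write on the same cell 
> > lose updates: 
> > 
> >   addprocs(4) 
> >   S = SharedArray(Int, (1,)) 
> >   @sync @parallel for k in 1:100000 
> >       S[1] += 1  # read-modify-write race across workers 
> >   end 
> >   S[1]  # typically far less than 100000 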
> > 
> > Can you somehow reformulate your problem along these lines, with a map 
> > and reduce approach using a pure function? 
> > 
> >   @everywhere function myfunc_pure(startindex) 
> >       result = zeros(Int,10) 
> >       for i in startindex + (0:19)  # 20 iterations 
> >           result[mod(i,length(result))+1] += 1 
> >       end 
> >       result 
> >   end 
> >   reduce(+,pmap(myfunc_pure, 1:5))  # 5 blocks of 20 iterations 
> > 
> > This way you don't have any shared mutable state, and thus no risk of 
> > mess-ups. 
> > 
> > 
> > 
> > 
> > On 13.03.2015, at 00:56, Pieter Barendrecht <pjbare...@gmail.com> wrote: 
> > 
> > > I'm wondering how to save data/results in a parallel for-loop. Let's 
> > > assume there is a single Int64 array, initialised using zeros() before 
> > > starting the for-loop. In the for-loop (typically ~100,000 iterations, 
> > > which is why I'm interested in parallel processing) the entries of this 
> > > Int64 array should be incremented, based on the results of an algorithm 
> > > invoked in the for-loop. 
> > > 
> > > Everything works fine with just a single proc, but I'm not sure how 
> > > to modify the code such that, when using e.g. addprocs(4), the 
> > > data/results stored in the Int64 array can be processed once the for-loop 
> > > ends. The algorithm (a separate function) is available to all procs 
> > > via require(). Simply using the Int64 array in the 
> > > for-loop (with @parallel for k=1:100000) does not work, as each proc 
> > > receives its own copy, so after the for-loop it contains just zeros (as 
> > > illustrated in a set of slides on the Julia language). I guess the 
> > > solution involves @spawn and fetch() and/or pmap(). Any suggestions or 
> > > examples would be much appreciated :). 
> 
