Hi Michael, 

I recently had a similar problem and this SO thread helped me a lot: 

As a side question: which code are you using to calculate the spherical 
harmonic transforms? I was looking for a Julia package some time ago and 
did not find any, so if your code is publicly available, could you point me 
to it? 
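
Coming back to your actual question: below is a minimal sketch of one way 
to ship the constant part of the input to every worker once and reuse it 
on later calls. It is not taken from your code. The names STATIC_INPUT, 
set_static_input!, transform_with_cache and run_all are made up for 
illustration, and the "transform" is just a placeholder sum. I wrote it 
against a current Julia with the Distributed standard library; on 0.4 
there is no `using Distributed` and the worker id comes before the 
function in remotecall_fetch, so it would need adapting.

```julia
using Distributed

# For illustration only: start two local workers if none exist yet.
# On a cluster you would pass your machine list to addprocs instead.
nprocs() == 1 && addprocs(2)

@everywhere begin
    # Hypothetical per-process cache for the part of the input that never changes.
    const STATIC_INPUT = Ref{Vector{Float64}}()

    # Store the vector in this process's cache. Returning nothing means
    # no data travels back to the caller.
    set_static_input!(v::Vector{Float64}) = (STATIC_INPUT[] = v; nothing)

    # Placeholder for the real transform: combines the cached vector with
    # the small per-iteration part of the input.
    transform_with_cache(small_part::Float64) = sum(STATIC_INPUT[]) * small_part
end

# Stand-in for the 10-million-element vector.
big_vector = ones(1_000)

# Ship the big vector to each worker exactly once.
@sync for w in workers()
    @async remotecall_wait(set_static_input!, w, big_vector)
end

# Same scheduling pattern as in your post, but only the small part of the
# input is sent with each call.
function run_all(small_parts)
    results = Vector{Float64}(undef, length(small_parts))
    next = 1
    @sync for w in workers()
        @async while true
            # Tasks only yield at the remote call, so this counter update is safe.
            i = next
            next += 1
            i <= length(small_parts) || break
            results[i] = remotecall_fetch(transform_with_cache, w, small_parts[i])
        end
    end
    return results
end

small_parts = collect(1.0:20.0)
results = run_all(small_parts)
```

The idea is simply that each worker keeps its own copy in a global, so 
the 100 MB vector crosses the network once per worker instead of once per 
iteration. The 16 MB of coefficients still has to come back to the master 
on every call, so this only removes the larger of the two transfers.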


On Wednesday, May 18, 2016 at 9:28:34 PM UTC+2, Michael Eastwood wrote:
> Hi julia-users!
> I need some help with distributing a workload across tens of workers 
> across several machines.
> The problem I am trying to solve involves calculating the elements of a 
> large block diagonal matrix. The total size of the blocks is >500 GB, so I 
> cannot store the entire thing in memory. The way the calculation works, I 
> do a bunch of spherical harmonic transforms and the results give me one row 
> in each block of the matrix.
> The following code illustrates what I am doing currently. I am 
> distributing the spherical harmonic transforms amongst all the workers and 
> bringing the data back to the master process to write the results to disk 
> (the master process has each matrix block mmapped to disk).
>     idx = 1
>     limit = 10000
>     nextidx() = (myidx = idx; idx += 1; myidx)
>     @sync for worker in workers()
>         @async while true
>             myidx = nextidx()
>             myidx ≤ limit || break
>             coefficients = remotecall_fetch(spherical_harmonic_transforms, worker, input[myidx])
>             write_results_to_disk(coefficients)
>         end
>     end
> Each spherical harmonic transform takes O(10 seconds) so I thought the 
> data movement cost would be negligible compared to this. However, if I have 
> three machines each with 16 workers, machine 1 will have all 16 workers 
> working hard (the master process is on machine 1) and machines 2&3 will 
> have most of their workers idling. My hypothesis is that the cost of moving 
> the data to and from the workers is preventing machines 2&3 from being 
> fully utilized.
> coefficients is a vector of a million Complex128s (16 MB)
> input is composed of two parts: 1) a vector of 10 million Float64s (100 
> MB) and 2) a small amount of additional information that is negligibly 
> small compared to the first part.
> The trick is that the first part of input (the 100 MB vector) doesn't 
> change between iterations. So I could alleviate most of the data movement 
> problem by moving that part to each worker once. Problem is that I can't 
> seem to figure out how to do that. The manual (
> http://docs.julialang.org/en/release-0.4/manual/parallel-computing/#remoterefs-and-abstractchannels)
> is a little thin on how to use RemoteRefs.
> So how do you move data to workers in a way that it can be re-used on 
> subsequent iterations? An example in the manual would be very helpful!
> Thanks,
> Michael
