Hi Fabian,

It looks like that SO answer is moving data into the global scope of each 
worker. It is probably worth experimenting with but I'd be worried about 
performance implications of non-const global variables. It's probably the 
case that this is still a win for my use case though. Thanks for the link.

I'm using alm2map and map2alm from LibHealpix.jl (
https://github.com/mweastwood/LibHealpix.jl) to do the spherical harmonic 
transforms.

Michael

On Thursday, May 19, 2016 at 12:45:05 AM UTC-7, Fabian Gans wrote:
>
> Hi Michael, 
>
> I recently had a similar problem and this SO thread helped me a lot: 
> http://stackoverflow.com/questions/27677399/julia-how-to-copy-data-to-another-processor-in-julia
>
> As a side question: Which code are you using to calculate spherical 
> harmonics transforms. I was looking for a julia package some time ago and 
> did not find any, so if your code is publicly available, could you point me 
> to it? 
>
> Thanks
> Fabian
>
>
>
> On Wednesday, May 18, 2016 at 9:28:34 PM UTC+2, Michael Eastwood wrote:
>>
>> Hi julia-users!
>>
>> I need some help with distributing a workload across tens of workers 
>> across several machines.
>>
>> The problem I am trying to solve involves calculating the elements of a 
>> large block diagonal matrix. The total size of the blocks is >500 GB, so I 
>> cannot store the entire thing in memory. The way the calculation works, I 
>> do a bunch of spherical harmonic transforms and the results give me one row 
>> in each block of the matrix.
>>
>> The following code illustrates what I am doing currently. I am 
>> distributing the spherical harmonic transforms amongst all the workers and 
>> bringing the data back to the master process to write the results to disk 
>> (the master process has each matrix block mmapped to disk).
>>
>>     idx = 1
>>     limit = 10000
>>     nextidx() = (myidx = idx; idx += 1; myidx)
>>     @sync for worker in workers()
>>         @async while true
>>             myidx = nextidx()
>>             myidx ≤ limit || break
>>             coefficients = 
>> remotecall_fetch(spherical_harmonic_transforms, worker, input[myidx])
>>             write_results_to_disk(coefficients)
>>         end
>>     end
>>
>> Each spherical harmonic transform takes O(10 seconds) so I thought the 
>> data movement cost would be negligible compared to this. However, if I have 
>> three machines each with 16 workers, machine 1 will have all 16 workers 
>> working hard (the master process is on machine 1) and machines 2&3 will 
>> have most of their workers idling. My hypothesis is that the cost of moving 
>> the data to and from the workers is preventing machines 2&3 from being 
>> fully utilized.
>>
>> coefficients is a vector of a million Complex128s (16 MB)
>> input is composed of two parts: 1) a vector of 10 million Float64s (100 
>> MB) and 2) a small amount of additional information that is negligibly 
>> small compared to the first part.
>>
>> The trick is that the first part of input (the 100 MB vector) doesn't 
>> change between iterations. So I could alleviate most of the data movement 
>> problem by moving that part to each worker once. Problem is that I can't 
>> seem to figure out how to do that. The manual (
>> http://docs.julialang.org/en/release-0.4/manual/parallel-computing/#remoterefs-and-abstractchannels)
>>  
>> is a little thin on how to use RemoteRefs.
>>
>> So how do you move data to workers in a way that it can be re-used on 
>> subsequent iterations? An example in the manual would be very helpful!
>>
>> Thanks,
>> Michael
>>
>

Reply via email to