Hi julia-users!

I need some help with distributing a workload across tens of workers across
several machines.

The problem I am trying to solve involves calculating the elements of a
large block diagonal matrix. The total size of the blocks is >500 GB, so I
cannot store the entire thing in memory. The way the calculation works, I
do a bunch of spherical harmonic transforms and the results give me one row
in each block of the matrix.

The following code illustrates what I am doing currently. I am distributing
the spherical harmonic transforms amongst all the workers and bringing the
data back to the master process to write the results to disk (the master
process has each matrix block mmapped to disk).

    idx = 1
    limit = 10000
    nextidx() = (myidx = idx; idx += 1; myidx)
    @sync for worker in workers()
        @async while true
            myidx = nextidx()
            myidx ≤ limit || break
            coefficients = remotecall_fetch(spherical_harmonic_transforms,
worker, input[myidx])
            write_results_to_disk(coefficients)
        end
    end

Each spherical harmonic transform takes O(10 seconds) so I thought the data
movement cost would be negligible compared to this. However, if I have
three machines each with 16 workers, machine 1 will have all 16 workers
working hard (the master process is on machine 1) and machines 2&3 will
have most of their workers idling. My hypothesis is that the cost of moving
the data to and from the workers is preventing machines 2&3 from being
fully utilized.

coefficients is a vector of a million Complex128s (16 MB)
input is composed of two parts: 1) a vector of 10 million Float64s (100 MB)
and 2) a small amount of additional information that is negligibly small
compared to the first part.

The trick is that the first part of input (the 100 MB vector) doesn't
change between iterations. So I could alleviate most of the data movement
problem by moving that part to each worker once. Problem is that I can't
seem to figure out how to do that. The manual (
http://docs.julialang.org/en/release-0.4/manual/parallel-computing/#remoterefs-and-abstractchannels)
is a little thin on how to use RemoteRefs.

So how do you move data to workers in a way that it can be re-used on
subsequent iterations? An example in the manual would be very helpful!

Thanks,
Michael

Reply via email to