Ben

By "large", I mean clusters containing more than 100 or 1000 nodes.
Also -- I have not measured this; depending on the application, the
non-scalability may be acceptable. For example, if Julia mostly farms
out processes that work mostly independently, then communication
overhead is much less relevant than if the processes communicate
tightly.

The reason I expect Julia not to scale is the way in which the
communication is set up. Currently, each pair of processes that
exchanges data needs to open its own TCP connection. For N processes
with a complex communication pattern, this requires O(N^2) TCP
connections.
Other communication mechanisms -- such as MPI -- do not require O(N^2)
work, but only e.g. O(N log N). This is possible in MPI since
efficient implementations include a routing or multiplexing mechanism
that is not present in TCP or Julia.
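
To give a feeling for the numbers, here is a small sketch in Python
that counts connections. The full mesh is the one-connection-per-pair
case described above. The hypercube is only my assumption of one
routed layout with O(N log N) links; I am not claiming that any
particular MPI implementation uses it, and both function names are
made up for this illustration.

```python
def full_mesh_connections(n):
    """One connection per pair of processes: n*(n-1)/2."""
    return n * (n - 1) // 2


def hypercube_connections(n):
    """Each process links to log2(n) neighbours; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    dim = n.bit_length() - 1  # log2(n) for a power of two
    return n * dim // 2       # every link is shared by two processes


if __name__ == "__main__":
    for n in (8, 128, 1024):
        print(n, full_mesh_connections(n), hypercube_connections(n))
```

For 1024 processes this gives 523776 connections for the full mesh
versus 5120 links for the hypercube, at the price of messages having
to be forwarded through intermediate processes.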

To make Julia use Infiniband, I would use MPI instead of TCP as the
transport mechanism; this is probably the easiest high-level interface
to use, and it supports many other network types as well. Open MPI
<http://www.open-mpi.org/> is one well-known open-source MPI
implementation, but there are several other widely used ones as well.
There are also lower-level APIs to access Infiniband, but I have no
experience with them.

As is, MPI probably cannot be used as a drop-in replacement for TCP,
since MPI uses its own startup mechanism. Instead of starting several
processes (e.g. via ssh) and having them look for and connect to each
other, one passes MPI a list of host names, and it then starts all
processes, including the main process. That is, instead of

julia -p 8 code.jl

one would write (9 processes: the main process plus the 8 workers)

mpirun -np 9 julia code.jl
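
Since mpirun starts every process the same way, each process has to
find out for itself whether it is the main process or a worker. A
minimal sketch of that decision in Python, assuming Open MPI's mpirun,
which sets OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE in the
environment of each process it launches. The function name mpi_role
and the convention "rank 0 is the main process" are my own
assumptions, not something Julia or MPI prescribes.

```python
import os


def mpi_role(env=os.environ):
    """Return (role, rank, size) from Open MPI's launcher variables.

    Falls back to a single "main" process when the variables are
    absent, i.e. when not started via Open MPI's mpirun.
    """
    rank = int(env.get("OMPI_COMM_WORLD_RANK", "0"))
    size = int(env.get("OMPI_COMM_WORLD_SIZE", "1"))
    role = "main" if rank == 0 else "worker"  # assumed convention
    return role, rank, size


if __name__ == "__main__":
    print(mpi_role())
```

Other MPI implementations use different variable names, so real code
would ask the MPI library for its rank instead of reading the
environment.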

I assume one can rather easily modify the cluster manager to skip the
ssh and TCP connection part. When I looked at the cluster manager a
few weeks ago, it depended a lot on TCP ports and port numbers. I
haven't looked in the meantime, but if this dependency were removed as
well (so that the sending Julia worker process does not need to be
identified via a TCP socket), one could use MPI for communication.

-erik


On Mon, Oct 6, 2014 at 6:03 PM, Ben Arthur <bjarthu...@gmail.com> wrote:
> thanks for your post erik.
>
> could you please elaborate why Julia does not scale well to larger clusters?
>
> also, what would it take to get Julia to utilize InfiniBand were it
> available.
>
> thanks,
>
> ben
>
>
>
> On Wednesday, October 1, 2014 12:26:11 PM UTC-4, Erik Schnetter wrote:
>>
>> For example, when running Julia on a cluster, there may be special
>> high-speed low-latency communication hardware such as InfiniBand
>> interconnects. Currently, Julia would not use these, but MPI would,
>> giving MPI an unfair advantage until Julia is using such hardware as
>> well.
>>
>> Similarly, Julia's current cluster manager should work fine and
>> efficiently if you are using (say) 10 workstations. However, if you
>> are using many more -- say 1000 -- then Julia's current cluster
>> manager implementation will not scale. Again, this is only a
>> limitation of the current implementation, not of the high-level API
>> that Julia is offering, and I'm sure this will be improved in time.



-- 
Erik Schnetter <schnet...@cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/
