Ben,

By "large" I mean clusters containing more than 100 or 1000 nodes. Also -- I have not measured this; depending on the application, the non-scalability may be acceptable. For example, if Julia mostly farms out processes that work largely independently, then communication overhead is much less relevant than if the processes communicate tightly.
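To illustrate the "mostly independent" case: with pmap, each worker receives its input, computes on its own, and only sends the result back, so communication overhead is amortized over the computation. A minimal sketch (the `using Distributed` line is needed on recent Julia versions, where the cluster functionality lives in the Distributed standard library):

```julia
using Distributed

# addprocs(8)  # would start 8 local workers; on a cluster, pass host names

# Each call is independent; workers only communicate to receive the
# input and to return the result, so per-message latency barely matters.
results = pmap(x -> x^2, 1:4)

println(results)  # [1, 4, 9, 16]
```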
The reason I expect Julia not to scale is the way in which the communication is set up. Currently, each pair of processes that exchanges data needs to open a TCP connection. For N processes with a complex communication pattern, this requires O(N^2) TCP connections. Other communication mechanisms -- such as MPI -- do not require O(N^2) work, but only e.g. O(N log N). This is possible in MPI because efficient implementations include a routing or multiplexing mechanism that is not present in TCP or Julia.

To make Julia use InfiniBand, I would use MPI instead of TCP as the transport mechanism; this is probably the easiest high-level interface to use, and it supports many other network types as well. Open MPI <http://www.open-mpi.org/> is one well-known open-source MPI implementation, but there are several other widely used ones as well. There are also lower-level APIs to access InfiniBand, but I have no experience with them.

As is, MPI probably cannot be used as a drop-in replacement for TCP, since MPI uses its own startup mechanism. Instead of starting several processes (e.g. via ssh) and having them look for and connect to each other, one passes MPI a list of host names, and it then starts all processes, including the main process. That is, instead of

  julia -p 8 code.jl

one would write

  mpirun -np 9 julia code.jl

I assume one can rather easily modify the cluster manager to skip the ssh and TCP connection part. When I looked at the cluster manager a few weeks ago, it depended a lot on TCP ports and port numbers. I haven't looked in the meantime, but if this dependency were also removed (so that the sending Julia worker process does not need to be identified via a TCP socket), one could use MPI for communication.

-erik

On Mon, Oct 6, 2014 at 6:03 PM, Ben Arthur <bjarthu...@gmail.com> wrote:
> thanks for your post erik.
>
> could you please elaborate why Julia does not scale well to larger
> clusters?
>
> also, what would it take to get Julia to utilize InfiniBand were it
> available.
> thanks,
>
> ben
>
> On Wednesday, October 1, 2014 12:26:11 PM UTC-4, Erik Schnetter wrote:
>>
>> For example, when running Julia on a cluster, there may be special
>> high-speed low-latency communication hardware such as InfiniBand
>> interconnects. Currently, Julia would not use these, but MPI would,
>> giving MPI an unfair advantage until Julia is using such hardware as
>> well.
>>
>> Similarly, Julia's current cluster manager should work fine and
>> efficiently if you are using (say) 10 workstations. However, if you
>> are using many more -- say 1000 -- then Julia's current cluster
>> manager implementation will not scale. Again, this is only a
>> limitation of the current implementation, not of the high-level API
>> that Julia is offering, and I'm sure this will be improved in time.

--
Erik Schnetter <schnet...@cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/
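P.S. To put concrete numbers on the all-to-all connection count: in a fully connected mesh, N processes need one connection per pair. A quick back-of-the-envelope sketch (plain arithmetic, not Julia's actual cluster manager code):

```julia
# Fully connected mesh: one TCP connection per pair of N processes,
# i.e. N choose 2 = N*(N-1)/2 connections in total.
npairs(n) = n * (n - 1) ÷ 2

println(npairs(10))    # 45 connections -- fine for a handful of workstations
println(npairs(1000))  # 499500 connections -- why O(N^2) does not scale
```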