"David Li" <lidav...@apache.org> writes:

> Thanks for the clarification Yibo, looking forward to the results. Even if it
> is a very hacky PoC it will be interesting to see how it affects performance,
> though as Keith points out there are benefits in general to UCX (or similar
> library), and we can work out the implementation plan from there.
>
> To Benson's point - the work done to get UCX supported would pave the way to
> supporting other backends as well. I'm personally not familiar with UCX, MPI,
> etc. so is MPI here more about playing well with established practices or
> does it also offer potential hardware support/performance improvements like
> UCX would?
There are two main implementations of MPI, MPICH and Open MPI, both of which are permissively licensed open source community projects. Both have direct support for UCX, and unless your needs are very specific, the overhead of going through MPI is likely to be negligible. Both also have proprietary derivatives, such as Cray MPI (an MPICH derivative) and Spectrum MPI (an Open MPI derivative), which may have optimizations for proprietary networks. Both MPICH and Open MPI can be built without UCX, and this is often easier (UCX 'master' is more volatile in my experience).

The vast majority of distributed memory scientific applications use MPI or higher-level libraries, rather than writing directly to UCX (which provides less coverage of HPC networks). I think MPI compatibility is important.

From way up-thread (sorry):

>> >>>>> Jed - how would you see MPI and Flight interacting? As another
>> >>>>> transport/alternative to UCX? I admit I'm not familiar with the HPC
>> >>>>> space.

MPI has collective operations like MPI_Allreduce (perform a reduction and give every process the result; these run in log(P) or better time with small constants -- 15 microseconds is typical for a cheap reduction operation on a million processes). MPI supports user-defined operations for reductions and prefix-scan operations. If we defined MPI_Ops for Arrow types, we could compute summary statistics and other algorithmic building blocks fast at arbitrary scale. The collective execution model might not be everyone's bag, but MPI_Op can also be used in one-sided operations (MPI_Accumulate and MPI_Fetch_and_op), and dropping into collective mode has big advantages for certain algorithms in computational statistics/machine learning.
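To make the MPI_Op idea a bit more concrete, here is a hedged sketch in plain Python (deliberately not MPI code, so it runs anywhere): the associative combine function that a user-defined MPI_Op for per-column summary statistics would implement, plus a toy tree reduction mimicking the log(P) pairwise combining MPI_Allreduce performs. The names `Stats`, `combine`, and `tree_allreduce` are illustrative, not an existing API; with real MPI, `combine` would be registered via MPI_Op_create and invoked by MPI_Allreduce.

```python
# Sketch of the associative merge an MPI_Op for summary statistics would
# implement (count/mean/M2, i.e. Chan et al.'s parallel variance update).
# All names here are illustrative, not part of MPI or Arrow.
from dataclasses import dataclass

@dataclass
class Stats:
    n: int       # element count
    mean: float  # running mean
    m2: float    # sum of squared deviations from the mean

def combine(a: Stats, b: Stats) -> Stats:
    """Associative, commutative merge of two partial summaries --
    the property MPI_Op_create requires for use in MPI_Allreduce."""
    n = a.n + b.n
    if n == 0:
        return Stats(0, 0.0, 0.0)
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Stats(n, mean, m2)

def summarize(chunk) -> Stats:
    """Per-process partial summary of a local data chunk."""
    n = len(chunk)
    mean = sum(chunk) / n if n else 0.0
    return Stats(n, mean, sum((x - mean) ** 2 for x in chunk))

def tree_allreduce(partials) -> Stats:
    """Toy simulation of the log(P) pairwise combining done by
    MPI_Allreduce; each while-iteration is one 'round'."""
    vals = list(partials)
    while len(vals) > 1:
        vals = [combine(vals[i], vals[i + 1]) if i + 1 < len(vals)
                else vals[i]
                for i in range(0, len(vals), 2)]
    return vals[0]

if __name__ == "__main__":
    # Four "processes", each holding a local chunk of the data 1..7.
    chunks = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0]]
    total = tree_allreduce([summarize(c) for c in chunks])
    print(total.n, total.mean)  # -> 7 4.0
```

The key design point is that the merge is associative, so the runtime is free to combine partial results in any tree shape; that is exactly what lets MPI_Allreduce (or MPI_Accumulate in one-sided mode) apply the same user-defined op at arbitrary scale.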