"David Li" <lidav...@apache.org> writes:

> Thanks for the clarification Yibo, looking forward to the results. Even if it 
> is a very hacky PoC it will be interesting to see how it affects performance, 
> though as Keith points out there are benefits in general to UCX (or similar 
> library), and we can work out the implementation plan from there.
>
> To Benson's point - the work done to get UCX supported would pave the way to 
> supporting other backends as well. I'm personally not familiar with UCX, MPI, 
> etc. so is MPI here more about playing well with established practices or 
> does it also offer potential hardware support/performance improvements like 
> UCX would?

There are two main implementations of MPI, MPICH and Open MPI, both of which 
are permissively licensed open source community projects. Both have direct 
support for UCX and unless your needs are very specific, the overhead of going 
through MPI is likely to be negligible. Both also have proprietary derivatives, 
such as Cray MPI (MPICH derivative) and Spectrum MPI (Open MPI derivative), 
which may have optimizations for proprietary networks. Both MPICH and Open MPI 
can be built without UCX, and this is often easier (UCX 'master' is more 
volatile in my experience).

The vast majority of distributed memory scientific applications use MPI or 
higher level libraries, rather than writing directly to UCX (which provides 
less coverage of HPC networks). I think MPI compatibility is important.

From way up-thread (sorry):

>> >>>>> Jed - how would you see MPI and Flight interacting? As another
>> >>>>> transport/alternative to UCX? I admit I'm not familiar with the HPC
>> >>>>> space.

MPI has collective operations like MPI_Allreduce, which performs a reduction and 
gives every process the result. These run in log(P) or better time with small 
constants -- 15 microseconds is typical for a cheap reduction operation on a 
million processes. MPI also supports user-defined operations for reductions and 
prefix-scan operations. If we defined MPI_Ops for Arrow types, we could compute 
summary statistics and other algorithmic building blocks fast at arbitrary 
scale.
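
For concreteness, here's a rough (untested) sketch of what a user-defined MPI_Op 
looks like, reducing a toy count/sum/min/max struct as one might do per column. 
A real Arrow integration would need MPI datatypes describing the actual column 
buffers; the struct and values here are just illustrative.

#include <mpi.h>

typedef struct { double count, sum, min, max; } Summary;

/* Combine partial summaries element-wise; signature required by MPI_Op_create. */
static void summary_reduce(void *invec, void *inoutvec, int *len, MPI_Datatype *dtype)
{
  Summary *in = (Summary *)invec, *inout = (Summary *)inoutvec;
  for (int i = 0; i < *len; i++) {
    inout[i].count += in[i].count;
    inout[i].sum   += in[i].sum;
    if (in[i].min < inout[i].min) inout[i].min = in[i].min;
    if (in[i].max > inout[i].max) inout[i].max = in[i].max;
  }
}

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  Summary local = {1, 42.0, 42.0, 42.0}, global;  /* stand-in for per-rank column stats */

  MPI_Datatype summary_type;
  MPI_Type_contiguous(4, MPI_DOUBLE, &summary_type);
  MPI_Type_commit(&summary_type);

  MPI_Op op;
  MPI_Op_create(summary_reduce, 1 /* commutative */, &op);

  /* Every rank ends up with the global count/sum/min/max. */
  MPI_Allreduce(&local, &global, 1, summary_type, op, MPI_COMM_WORLD);

  MPI_Op_free(&op);
  MPI_Type_free(&summary_type);
  MPI_Finalize();
  return 0;
}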

The collective execution model might not be everyone's bag, but MPI_Op can also 
be used in one-sided operations (MPI_Accumulate and MPI_Fetch_and_op) and 
dropping into collective mode has big advantages for certain algorithms in 
computational statistics/machine learning.
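
And a minimal (untested) sketch of the one-sided flavor, here with the 
predefined MPI_SUM op: every rank atomically adds a contribution into a counter 
exposed by rank 0 and gets back the value it saw before its own update, without 
rank 0 participating in the call.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  long *counter;
  MPI_Win win;
  MPI_Win_allocate(sizeof(long), sizeof(long), MPI_INFO_NULL, MPI_COMM_WORLD,
                   &counter, &win);
  *counter = 0;                 /* each rank zeroes its own window memory */
  MPI_Win_fence(0, win);        /* make the initialization visible */

  long contribution = rank + 1, previous;
  MPI_Fetch_and_op(&contribution, &previous, MPI_LONG,
                   0 /* target rank */, 0 /* displacement */, MPI_SUM, win);

  MPI_Win_fence(0, win);        /* complete the epoch */
  if (rank == 0) printf("total = %ld\n", *counter);

  MPI_Win_free(&win);
  MPI_Finalize();
  return 0;
}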
