Thanks for the clarification, Yibo - looking forward to the results. Even if it 
is a very hacky PoC, it will be interesting to see how it affects performance, 
though as Keith points out there are benefits to UCX (or a similar library) in 
general, and we can work out the implementation plan from there.

To Benson's point - the work done to get UCX supported would pave the way to 
supporting other backends as well. I'm personally not familiar with UCX, MPI, 
etc., so is MPI here more about playing well with established practices, or does 
it also offer potential hardware support/performance improvements the way UCX 
would?

-David

On Wed, Oct 27, 2021, at 06:30, Benson Muite wrote:
> UCX is interesting: it is relatively new and seems like it may be easier to 
> integrate. MPI is the most commonly used backend for HPC. Influencing the 
> development of UCX is more difficult than influencing the development of 
> MPI, but both have a slower pace of development than Arrow. One may also 
> want to consider support for multiple accelerators: Arrow has CUDA support, 
> but SYCL seems like it would fit well with a C++ base for compute unit/node 
> level parallelism, with UCX and/or MPI support for multi-node parallelism.
> 
> OLCF does support RAPIDS 
> (https://docs.olcf.ornl.gov/software/analytics/nvidia-rapids.html), so HPC 
> in the commercial cloud could also make use of Arrow.
> 
> On 10/27/21 5:26 AM, Keith Kraus wrote:
> > Outside of just HPC, integrating UCX would potentially allow taking
> > advantage of its shared memory backend, which would be interesting from a
> > performance perspective in the single-node, multi-process case in many
> > situations.
> > 
> > Not sure it's worth the UCX dependency in the long run, but it would allow
> > us to experiment with a lot of different transport backends.
> > 
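For context - as far as I can tell, UCX treats shared memory as just another
transport that can be picked through its configuration, so a single-node
experiment shouldn't need a separate code path. A rough sketch based on my
reading of the UCP API (treat the details as unverified):

#include <ucp/api/ucp.h>
#include <cstdio>

// Initialize a UCX (UCP) context restricted to shared-memory transports,
// roughly equivalent to running with UCX_TLS=shm,self in the environment.
// Build roughly: g++ ucx_shm_check.cc -lucp -lucs
int main() {
  ucp_config_t* config = nullptr;
  if (ucp_config_read(nullptr, nullptr, &config) != UCS_OK) return 1;
  ucp_config_modify(config, "TLS", "shm,self");  // shm + loopback only

  ucp_params_t params = {};
  params.field_mask = UCP_PARAM_FIELD_FEATURES;
  params.features = UCP_FEATURE_TAG;  // tag-matching send/recv

  ucp_context_h context = nullptr;
  ucs_status_t status = ucp_init(&params, config, &context);
  ucp_config_release(config);
  if (status != UCS_OK) {
    std::printf("ucp_init failed: %s\n", ucs_status_string(status));
    return 1;
  }
  std::printf("UCX context created with shared-memory transports only\n");
  ucp_cleanup(context);
  return 0;
}

The same selection can apparently also be made without code changes by setting
UCX_TLS in the environment, which would make transport comparisons in a
benchmark fairly cheap.
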
> > On Tue, Oct 26, 2021 at 10:10 PM Yibo Cai <yibo....@arm.com> wrote:
> > 
> >>
> >> On 10/26/21 10:02 PM, David Li wrote:
> >>> Hi Yibo,
> >>>
> >>> Just curious, has there been more thought on this from your/the HPC side?
> >>
> >> Yes. I will investigate possible approaches, and maybe build a quick (and
> >> dirty) PoC first.
> >>
> >>>
> >>> I also realized we never asked: what is motivating Flight in this space
> >>> in the first place? Presumably broader Arrow support in general?
> >>
> >> No special reason. It will be great if this comes up with something useful,
> >> and it will be an interesting experiment otherwise.
> >>
> >>>
> >>> -David
> >>>
> >>> On Fri, Sep 10, 2021, at 12:27, Micah Kornfield wrote:
> >>>>>
> >>>>> I would support doing the work necessary to get UCX (or really any other
> >>>>> transport) supported, even if it is a lot of work. (I'm hoping this clears
> >>>>> the path to supporting a Flight-to-browser transport as well; a few
> >>>>> projects seem to have rolled their own approaches but I think Flight itself
> >>>>> should really handle this, too.)
> >>>>
> >>>>
> >>>> Another possible technical approach is investigating whether a custom
> >>>> gRPC "channel" implementation could be used for new transports.
> >>>> Searching around, it seems like there were some defunct PRs trying to
> >>>> enable UCX as one; I didn't look closely enough at why they might have
> >>>> failed.
> >>>>
> >>>> On Thu, Sep 9, 2021 at 11:07 AM David Li <lidav...@apache.org> wrote:
> >>>>
> >>>>> I would support doing the work necessary to get UCX (or really any other
> >>>>> transport) supported, even if it is a lot of work. (I'm hoping this clears
> >>>>> the path to supporting a Flight-to-browser transport as well; a few
> >>>>> projects seem to have rolled their own approaches but I think Flight itself
> >>>>> should really handle this, too.)
> >>>>>
> >>>>> From what I understand, you could tunnel gRPC over UCX as Keith mentions,
> >>>>> or directly use UCX, which is what it sounds like you are thinking about.
> >>>>> One idea we had previously was to stick to gRPC for 'control plane'
> >>>>> methods, and support alternate protocols only for 'data plane' methods like
> >>>>> DoGet - this might be more manageable, depending on what you have in mind.
> >>>>>
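To make the 'control plane'/'data plane' idea a bit more concrete, here is a
rough sketch of the seam it implies on the client side. None of these types
exist in Arrow - they are placeholders to show the shape, with the real gRPC
and UCX calls stubbed out:

// Rough sketch (not real Arrow APIs): keep gRPC for small 'control plane'
// calls and route only the bulk 'data plane' calls (DoGet/DoPut) through a
// pluggable transport such as UCX. All names below are made up.
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct Ticket { std::string bytes; };
struct FlightEndpoint { Ticket ticket; std::string location; };
struct FlightInfo { std::vector<FlightEndpoint> endpoints; };
struct RecordBatchChunk { std::vector<uint8_t> ipc_payload; };

// Data plane interface: bulk transfer only. Implementations could be gRPC
// (today's behaviour), UCX, shared memory, ...
class DataPlane {
 public:
  virtual ~DataPlane() = default;
  virtual std::vector<RecordBatchChunk> DoGet(const Ticket& ticket) = 0;
};

// Stand-in for the existing gRPC client handling metadata calls.
class GrpcControlPlane {
 public:
  FlightInfo GetFlightInfo(const std::string& /*descriptor*/) {
    // Real code would issue the gRPC GetFlightInfo call here.
    return FlightInfo{{FlightEndpoint{Ticket{"ticket-0"}, "ucx://host:1337"}}};
  }
};

// Hypothetical UCX-backed data plane.
class UcxDataPlane : public DataPlane {
 public:
  std::vector<RecordBatchChunk> DoGet(const Ticket& /*ticket*/) override {
    // Real code would drive UCX endpoints here and return IPC payloads.
    return {RecordBatchChunk{std::vector<uint8_t>(64, 0)}};
  }
};

// The client composes the two, so the bulk-transport choice is independent
// of the RPC framework used for metadata.
std::vector<RecordBatchChunk> Fetch(GrpcControlPlane& control, DataPlane& data,
                                    const std::string& descriptor) {
  FlightInfo info = control.GetFlightInfo(descriptor);
  std::vector<RecordBatchChunk> out;
  for (const auto& endpoint : info.endpoints) {
    auto chunks = data.DoGet(endpoint.ticket);
    out.insert(out.end(), chunks.begin(), chunks.end());
  }
  return out;
}

int main() {
  GrpcControlPlane control;
  UcxDataPlane data;
  auto chunks = Fetch(control, data, "some descriptor");
  std::printf("fetched %zu chunk(s)\n", chunks.size());
  return 0;
}

The appeal is that GetFlightInfo, ListFlights, Handshake and friends could keep
their existing gRPC implementation while the UCX work stays contained behind
the data-plane interface.
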
> >>>>> In general - there's quite a bit of work here, so it would help to
> >>>>> separate the work into phases, and share some more detailed
> >>>>> design/implementation plans, to make review more manageable. (I realize of
> >>>>> course this is just a general interest check right now.) Just splitting
> >>>>> gRPC/Flight is going to take a decent amount of work, and (from what little
> >>>>> I understand) using UCX means choosing from various communication methods
> >>>>> it offers and writing a decent amount of scaffolding code, so it would be
> >>>>> good to establish what exactly a 'UCX' transport means. (For instance,
> >>>>> presumably there's no need to stick to the Protobuf-based wire format, but
> >>>>> what format would we use?)
> >>>>>
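On the wire-format question: purely as an illustration (not a proposal), a
minimal length-prefixed framing could carry the same three payloads Flight
already sends over gRPC - the IPC metadata, the optional app_metadata, and the
body - without the Protobuf envelope:

#include <cstdint>
#include <cstdio>
#include <vector>

// Illustrative framing for a non-Protobuf transport; not an Arrow proposal.
// A fixed header with explicit lengths, then three variable-length parts.
struct FlightFrameHeader {
  uint32_t magic;             // e.g. a constant like 0x464C5431 ("FLT1")
  uint32_t ipc_metadata_len;  // length of the Arrow IPC message metadata
  uint32_t app_metadata_len;  // length of the application-defined metadata
  uint64_t body_len;          // length of the record batch body (may be large)
};

struct FlightFrame {
  FlightFrameHeader header;
  std::vector<uint8_t> ipc_metadata;  // what gRPC carries in FlightData.data_header
  std::vector<uint8_t> app_metadata;  // what gRPC carries in FlightData.app_metadata
  std::vector<uint8_t> body;          // ideally handed to the transport zero-copy
};

int main() {
  std::printf("fixed frame header: %zu bytes\n", sizeof(FlightFrameHeader));
  return 0;
}

Keeping the body as its own buffer would let a UCX transport send it separately
(e.g. from registered memory, or via RDMA) while the small header goes over an
ordinary send.
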
> >>>>> It would also be good to expand the benchmarks, to validate the
> >>>>> performance we get from UCX and have a way to compare it against gRPC.
> >>>>> Anecdotally I've found gRPC isn't quite able to saturate a connection so it
> >>>>> would be interesting to see what other transports can do.
> >>>>>
> >>>>> Jed - how would you see MPI and Flight interacting? As another
> >>>>> transport/alternative to UCX? I admit I'm not familiar with the HPC space.
> >>>>>
> >>>>> About transferring commands with data: Flight already has an app_metadata
> >>>>> field in various places to allow things like this. It may be interesting to
> >>>>> combine this with the ComputeIR proposal on this mailing list, and hopefully
> >>>>> you & your colleagues can take a look there as well.
> >>>>>
> >>>>> -David
> >>>>>
> >>>>> On Thu, Sep 9, 2021, at 11:24, Jed Brown wrote:
> >>>>>> Yibo Cai <yibo....@arm.com> writes:
> >>>>>>
> >>>>>>> HPC infrastructure normally leverages RDMA for fast data transfer among
> >>>>>>> storage nodes and compute nodes. Computation tasks are dispatched to
> >>>>>>> compute nodes with best fit resources.
> >>>>>>>
> >>>>>>> Concretely, we are investigating porting UCX as the Flight transport
> >>>>>>> layer. UCX is a communication framework for modern networks. [1]
> >>>>>>> Besides HPC usage, many projects (Spark, Dask, BlazingSQL, etc.) also
> >>>>>>> adopt UCX to accelerate network transmission. [2][3]
> >>>>>>
> >>>>>> I'm interested in this topic and think it's important, even if the
> >>>>>> focus is directly on UCX, that there be some thought given to MPI
> >>>>>> interoperability and support for scalable collectives. MPI considers UCX to
> >>>>>> be an implementation detail, but the two main implementations (MPICH and
> >>>>>> Open MPI) support it, and vendor implementations are all derived from these
> >>>>>> two.
> >>>>>>
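For those of us unfamiliar with the HPC side: the interoperability Jed mentions
seems to boil down to the fact that a serialized Arrow buffer is just bytes to
MPI. A minimal sketch in plain MPI, with the Arrow IPC encoding of the batch
left out for brevity:

// Minimal MPI point-to-point sketch: rank 0 sends an opaque byte buffer
// (in practice this would be an Arrow IPC-encoded record batch) to rank 1.
// Build and run: mpicxx send_batch.cc && mpirun -np 2 ./a.out
#include <mpi.h>
#include <cstdint>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) {
    // Pretend this is the IPC-serialized record batch.
    std::vector<uint8_t> payload(1024, 0xAB);
    int64_t len = static_cast<int64_t>(payload.size());
    // Send the length first so the receiver can allocate, then the bytes.
    MPI_Send(&len, 1, MPI_INT64_T, /*dest=*/1, /*tag=*/0, MPI_COMM_WORLD);
    MPI_Send(payload.data(), static_cast<int>(len), MPI_BYTE, 1, 1,
             MPI_COMM_WORLD);
  } else if (rank == 1) {
    int64_t len = 0;
    MPI_Recv(&len, 1, MPI_INT64_T, /*source=*/0, /*tag=*/0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    std::vector<uint8_t> payload(static_cast<size_t>(len));
    MPI_Recv(payload.data(), static_cast<int>(len), MPI_BYTE, 0, 1,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    std::printf("rank 1 received %lld bytes\n", static_cast<long long>(len));
  }

  MPI_Finalize();
  return 0;
}

Whether that transfer then goes over UCX, InfiniBand verbs, or shared memory is
up to the MPI implementation, which I take to be Jed's point about MPI treating
UCX as an implementation detail.
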
> >>>>>
> >>>>
> >>>
> >>
> > 
> 
> 
