Outside of just HPC, integrating UCX would potentially let us take
advantage of its shared memory backend, which would be interesting from a
performance perspective in the common single-node, multi-process case.
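
For illustration, here is a minimal sketch of standing up UCX restricted to
its shared-memory transports (UCP only; error handling and the out-of-band
exchange of worker addresses are elided, and none of this is wired into
Flight yet):

    #include <ucp/api/ucp.h>
    #include <cstring>

    // Create a UCP context limited to shared-memory transports, equivalent
    // to running with UCX_TLS=shm,self in the environment.
    bool InitShmContext(ucp_context_h* context, ucp_worker_h* worker) {
      ucp_config_t* config = nullptr;
      if (ucp_config_read(NULL, NULL, &config) != UCS_OK) return false;
      ucp_config_modify(config, "TLS", "shm,self");

      ucp_params_t params;
      std::memset(&params, 0, sizeof(params));
      params.field_mask = UCP_PARAM_FIELD_FEATURES;
      params.features = UCP_FEATURE_STREAM;  // could also be TAG / RMA / AM

      ucs_status_t status = ucp_init(&params, config, context);
      ucp_config_release(config);
      if (status != UCS_OK) return false;

      ucp_worker_params_t worker_params;
      std::memset(&worker_params, 0, sizeof(worker_params));
      worker_params.field_mask = UCP_WORKER_PARAM_FIELD_THREAD_MODE;
      worker_params.thread_mode = UCS_THREAD_MODE_SINGLE;
      return ucp_worker_create(*context, &worker_params, worker) == UCS_OK;
    }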

I'm not sure it's worth the UCX dependency in the long run, but it would
let us experiment with a lot of different transport backends.

On Tue, Oct 26, 2021 at 10:10 PM Yibo Cai <yibo....@arm.com> wrote:

>
> On 10/26/21 10:02 PM, David Li wrote:
> > Hi Yibo,
> >
> > Just curious, has there been more thought on this from your/the HPC side?
>
> Yes. I will investigate possible approaches, and maybe build a quick (and
> dirty) PoC first.
>
> >
> > I also realized we never asked, what is motivating Flight in this space
> > in the first place? Presumably broader Arrow support in general?
>
> No special reason. It will be great if this leads to something useful, or
> to an interesting experiment otherwise.
>
> >
> > -David
> >
> > On Fri, Sep 10, 2021, at 12:27, Micah Kornfield wrote:
> >>>
> >>> I would support doing the work necessary to get UCX (or really any
> >>> other transport) supported, even if it is a lot of work. (I'm hoping
> >>> this clears the path to supporting a Flight-to-browser transport as
> >>> well; a few projects seem to have rolled their own approaches but I
> >>> think Flight itself should really handle this, too.)
> >>
> >>
> >> Another possible technical approach would be to investigate whether a
> >> custom gRPC "channel" implementation could support new transports.
> >> Searching around, it seems there were some defunct PRs trying to enable
> >> UCX as one; I didn't look closely enough at why they might have failed.
> >>
> >> On Thu, Sep 9, 2021 at 11:07 AM David Li <lidav...@apache.org> wrote:
> >>
> >>> I would support doing the work necessary to get UCX (or really any
> >>> other transport) supported, even if it is a lot of work. (I'm hoping
> >>> this clears the path to supporting a Flight-to-browser transport as
> >>> well; a few projects seem to have rolled their own approaches but I
> >>> think Flight itself should really handle this, too.)
> >>>
> >>> From what I understand, you could tunnel gRPC over UCX as Keith
> >>> mentions, or directly use UCX, which is what it sounds like you are
> >>> thinking about. One idea we had previously was to stick to gRPC for
> >>> 'control plane' methods, and support alternate protocols only for
> >>> 'data plane' methods like DoGet - this might be more manageable,
> >>> depending on what you have in mind.
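> >>>
> >>> To make that concrete, here is a rough sketch of how such a split could
> >>> look on the C++ side - only the gRPC control-plane calls exist today;
> >>> the UcxDataPlaneClient, the ucx:// location scheme, and the command
> >>> string are hypothetical:
> >>>
> >>>   #include <arrow/flight/api.h>
> >>>   #include <memory>
> >>>   #include <string>
> >>>
> >>>   // Control plane over gRPC; data plane chosen per endpoint Location.
> >>>   arrow::Status GetDataset(const std::string& host, int port) {
> >>>     arrow::flight::Location location;
> >>>     ARROW_RETURN_NOT_OK(
> >>>         arrow::flight::Location::ForGrpcTcp(host, port, &location));
> >>>     std::unique_ptr<arrow::flight::FlightClient> control;
> >>>     ARROW_RETURN_NOT_OK(
> >>>         arrow::flight::FlightClient::Connect(location, &control));
> >>>
> >>>     std::unique_ptr<arrow::flight::FlightInfo> info;
> >>>     ARROW_RETURN_NOT_OK(control->GetFlightInfo(
> >>>         arrow::flight::FlightDescriptor::Command("my-query"), &info));
> >>>
> >>>     for (const auto& endpoint : info->endpoints()) {
> >>>       // An empty locations list means "reuse the original connection".
> >>>       const bool use_ucx = !endpoint.locations.empty() &&
> >>>                            endpoint.locations[0].scheme() == "ucx";
> >>>       if (use_ucx) {
> >>>         // Hypothetical UCX data plane: fetch this stream over RDMA/shm.
> >>>         // ARROW_RETURN_NOT_OK(UcxDataPlaneClient::DoGet(
> >>>         //     endpoint.locations[0], endpoint.ticket));
> >>>       } else {
> >>>         std::unique_ptr<arrow::flight::FlightStreamReader> stream;
> >>>         ARROW_RETURN_NOT_OK(control->DoGet(endpoint.ticket, &stream));
> >>>         // ... consume record batches over gRPC as usual ...
> >>>       }
> >>>     }
> >>>     return arrow::Status::OK();
> >>>   }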
> >>>
> >>> In general - there's quite a bit of work here, so it would help to
> >>> separate the work into phases, and share some more detailed
> >>> design/implementation plans, to make review more manageable. (I realize
> >>> of course this is just a general interest check right now.) Just
> >>> splitting gRPC/Flight is going to take a decent amount of work, and
> >>> (from what little I understand) using UCX means choosing from various
> >>> communication methods it offers and writing a decent amount of
> >>> scaffolding code, so it would be good to establish what exactly a 'UCX'
> >>> transport means. (For instance, presumably there's no need to stick to
> >>> the Protobuf-based wire format, but what format would we use?)
> >>>
> >>> It would also be good to expand the benchmarks, to validate the
> >>> performance we get from UCX and have a way to compare it against gRPC.
> >>> Anecdotally I've found gRPC isn't quite able to saturate a connection,
> >>> so it would be interesting to see what other transports can do.
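> >>>
> >>> For example, something as simple as timing a DoGet stream gives a rough
> >>> number that can be compared across transports (a sketch; warm-up and
> >>> most error handling omitted):
> >>>
> >>>   #include <arrow/api.h>
> >>>   #include <arrow/flight/api.h>
> >>>   #include <arrow/ipc/api.h>
> >>>   #include <chrono>
> >>>   #include <cstdint>
> >>>   #include <cstdio>
> >>>   #include <memory>
> >>>
> >>>   // Reads one DoGet stream to completion and reports rough throughput.
> >>>   arrow::Status TimeDoGet(arrow::flight::FlightClient* client,
> >>>                           const arrow::flight::Ticket& ticket) {
> >>>     std::unique_ptr<arrow::flight::FlightStreamReader> stream;
> >>>     ARROW_RETURN_NOT_OK(client->DoGet(ticket, &stream));
> >>>
> >>>     int64_t total_bytes = 0;
> >>>     const auto start = std::chrono::steady_clock::now();
> >>>     arrow::flight::FlightStreamChunk chunk;
> >>>     while (true) {
> >>>       ARROW_RETURN_NOT_OK(stream->Next(&chunk));
> >>>       if (!chunk.data) break;  // end of stream
> >>>       int64_t size = 0;
> >>>       ARROW_RETURN_NOT_OK(
> >>>           arrow::ipc::GetRecordBatchSize(*chunk.data, &size));
> >>>       total_bytes += size;
> >>>     }
> >>>     const std::chrono::duration<double> elapsed =
> >>>         std::chrono::steady_clock::now() - start;
> >>>     std::printf("%.1f MiB/s\n",
> >>>                 total_bytes / elapsed.count() / (1 << 20));
> >>>     return arrow::Status::OK();
> >>>   }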
> >>>
> >>> Jed - how would you see MPI and Flight interacting? As another
> >>> transport/alternative to UCX? I admit I'm not familiar with the HPC
> >>> space.
> >>>
> >>> About transferring commands with data: Flight already has an
> >>> app_metadata field in various places to allow things like this; it may
> >>> be interesting to combine it with the ComputeIR proposal on this mailing
> >>> list, and hopefully you & your colleagues can take a look there as well.
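> >>>
> >>> For example, a DoPut client can attach application-defined bytes to
> >>> each batch it writes, and on DoGet the metadata arrives next to the
> >>> data in each FlightStreamChunk (a sketch using the C++ API; the
> >>> descriptor and command bytes are just placeholders):
> >>>
> >>>   #include <arrow/api.h>
> >>>   #include <arrow/flight/api.h>
> >>>   #include <memory>
> >>>
> >>>   // Writes one batch with an application-defined "command" attached.
> >>>   arrow::Status PutWithCommand(
> >>>       arrow::flight::FlightClient* client,
> >>>       const std::shared_ptr<arrow::RecordBatch>& batch) {
> >>>     std::unique_ptr<arrow::flight::FlightStreamWriter> writer;
> >>>     std::unique_ptr<arrow::flight::FlightMetadataReader> metadata_reader;
> >>>     ARROW_RETURN_NOT_OK(client->DoPut(
> >>>         arrow::flight::FlightDescriptor::Path({"table"}),
> >>>         batch->schema(), &writer, &metadata_reader));
> >>>     ARROW_RETURN_NOT_OK(writer->WriteWithMetadata(
> >>>         *batch, arrow::Buffer::FromString("app-specific command bytes")));
> >>>     return writer->Close();
> >>>   }
> >>>
> >>>   // On the read side, FlightStreamChunk::app_metadata carries whatever
> >>>   // the server attached alongside chunk.data.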
> >>>
> >>> -David
> >>>
> >>> On Thu, Sep 9, 2021, at 11:24, Jed Brown wrote:
> >>>> Yibo Cai <yibo....@arm.com> writes:
> >>>>
> >>>>> HPC infrastructure normally leverages RDMA for fast data transfer
> >>>>> among storage nodes and compute nodes. Computation tasks are
> >>>>> dispatched to compute nodes with best-fit resources.
> >>>>>
> >>>>> Concretely, we are investigating porting UCX as a Flight transport
> >>>>> layer. UCX is a communication framework for modern networks. [1]
> >>>>> Besides HPC usage, many projects (Spark, Dask, BlazingSQL, etc.) also
> >>>>> adopt UCX to accelerate network transmission. [2][3]
> >>>>
> >>>> I'm interested in this topic and think it's important that, even if
> >>>> the focus is directly on UCX, there be some thought given to MPI
> >>>> interoperability and support for scalable collectives. MPI considers
> >>>> UCX to be an implementation detail, but the two main implementations
> >>>> (MPICH and Open MPI) support it, and vendor implementations are all
> >>>> derived from these two.
> >>>>
> >>>
> >>
> >
>
