Hey all,

In my current work I've been experimenting and playing around with
utilizing Arrow and non-cpu memory data. While the creation of the
ArrowDeviceArray struct and the enhancements to the Arrow library Device
abstractions were necessary, there is also a need to extend the
communications specs we utilize, i.e. Flight.

Currently there is no real way to utilize Arrow Flight with shared memory
or with non-CPU memory (without an expensive Device -> Host copy first). To
this end I've done a bunch of research and toying around and came up with a
protocol to propose and a reference implementation using UCX[1]. Attached
to the proposal is also a couple extensions for Flight itself to make it
easier for users to still use Flight for metadata / dataset information and
then point consumers elsewhere to actually retrieve the data. The idea here
is that this would be a new specification for how to transport Arrow data
across these high-performance transports such as UCX / libfabric / shared
memory / etc. We wouldn't necessarily expose / directly add implementations
of the spec to the Arrow libraries, just provide reference/example
implementations.

I've written the proposal up on a google doc[2] that everyone should be
able to comment on. Once we get some community discussion on there, if
everyone is okay with it I'd like eventually do a vote on adopting this
spec and if we do, I'll then make a PR to start adding it to the Arrow
documentation, etc.

Anyways, thank you everyone in advance for your feedback and comments!

--Matt

[1]: https://github.com/openucx/ucx/
[2]:
https://docs.google.com/document/d/1zHbnyK1r6KHpMOtEdIg1EZKNzHx-MVgUMOzB87GuXyk/edit?usp=sharing

Reply via email to