Thanks for working on this! I think the flexibility of independent metadata and data storage and transfer is quite promising. I've added some comments.

On Mon, Feb 12, 2024 at 8:45 PM Matt Topol <zotthewiz...@gmail.com> wrote:

> Just pinging on this thread to hopefully encourage more comments and
> engagement on the document. I still have to respond to a few of Antoine's
> open comments, but so far there's only been the one individual who has
> given feedback.
>
> I've added a large "background context" section at the top of the document
> to hopefully make it easier for individuals to understand the motivation
> and comment on the document.
>
> If there are no objections by the end of the week I'd like to propose a
> vote to officially adopt this and start putting a PR together for updating
> the docs. But personally I'd rather see more comments than have it adopted
> by default because no one objected.
>
> --Matt
>
> On Wed, Feb 7, 2024, 11:50 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
> > Hi Matt
> >
> > Thanks for sharing. It looks interesting and I like the idea.
> >
> > Let me review the document and eventually add comments.
> >
> > Thanks!
> > Regards
> > JB
> >
> > On Sat, Feb 3, 2024 at 12:22 AM Matt Topol <zotthewiz...@gmail.com>
> > wrote:
> > >
> > > Hey all,
> > >
> > > In my current work I've been experimenting and playing around with
> > > utilizing Arrow and non-CPU memory data. While the creation of the
> > > ArrowDeviceArray struct and the enhancements to the Arrow library
> > > Device abstractions were necessary, there is also a need to extend
> > > the communications specs we utilize, i.e. Flight.
> > >
> > > Currently there is no real way to utilize Arrow Flight with shared
> > > memory or with non-CPU memory (without an expensive Device -> Host
> > > copy first). To this end I've done a bunch of research and toying
> > > around and came up with a protocol to propose and a reference
> > > implementation using UCX[1]. Attached to the proposal are also a
> > > couple of extensions for Flight itself to make it easier for users to
> > > still use Flight for metadata / dataset information and then point
> > > consumers elsewhere to actually retrieve the data. The idea here is
> > > that this would be a new specification for how to transport Arrow
> > > data across these high-performance transports such as UCX /
> > > libfabric / shared memory / etc. We wouldn't necessarily expose /
> > > directly add implementations of the spec to the Arrow libraries, just
> > > provide reference/example implementations.
> > >
> > > I've written the proposal up on a Google Doc[2] that everyone should
> > > be able to comment on. Once we get some community discussion on
> > > there, if everyone is okay with it I'd like to eventually do a vote
> > > on adopting this spec and if we do, I'll then make a PR to start
> > > adding it to the Arrow documentation, etc.
> > >
> > > Anyways, thank you everyone in advance for your feedback and comments!
> > >
> > > --Matt
> > >
> > > [1]: https://github.com/openucx/ucx/
> > > [2]: https://docs.google.com/document/d/1zHbnyK1r6KHpMOtEdIg1EZKNzHx-MVgUMOzB87GuXyk/edit?usp=sharing