Hey all, it's been another month and we've gotten a good deal of feedback
and engagement on the document from a variety of individuals. A few others
and I have proactively reached out to as many third parties as we could,
hoping to pull in more engagement. While it would be great to get even more
feedback, the comments have slowed down and we haven't received anything new
in a few days.

If there are no objections, I'd like to open this up for voting again to
officially adopt it as a protocol to add to our docs.

Thanks all!

--Matt

On Sat, Mar 2, 2024 at 6:43 PM Paul Whalen <pgwha...@gmail.com> wrote:

> Agreed that it makes sense not to focus on in-place updating for this
> proposal.  I’m not even sure it’s a great fit as a “general purpose” Arrow
> protocol, because of all the assumptions and restrictions required as you
> noted.
>
> I took another look at the proposal and don’t think there’s anything
> preventing in-place updating in the future - ultimately the data body could
> just be in the same location for subsequent messages.
>
> Thanks!
> Paul
>
> On Fri, Mar 1, 2024 at 5:28 PM Matt Topol <zotthewiz...@gmail.com> wrote:
>
> > > @pgwhalen: As a potential "end user developer," (and aspiring
> > > contributor) this immediately excited me when I first saw it.
> >
> > Yay! Good to hear that!
> >
> > > @pgwhalen: And it wasn't clear to me whether updating batches in
> > > place (and the producer/consumer coordination that comes with that)
> > > was supported or encouraged as part of the proposal.
> >
> > So, updating batches in place was not a particular use case we were
> > targeting with this approach. Instead, the goal was to use shared memory
> > to produce and consume the buffers/batches without having to physically
> > copy the data. Trying to update a batch in place is a dangerous prospect
> > for a number of reasons, but as you've mentioned it can technically be
> > made safe if the shape stays the same and you're only modifying
> > fixed-width data types (i.e. not only is the *shape* unchanged but the
> > sizes of the underlying data buffers also remain unchanged). The
> > producer/consumer coordination that would be needed for updating batches
> > in place is not part of this proposal, but it is definitely something we
> > can look into as a follow-up to extend it. There are a number of
> > discussions that would need to be had around that, so I don't want to add
> > more complexity to this already complex proposal.
> >
> > That said, if you or anyone else sees something in this proposal that
> > would hinder or prevent using it for your use case, please let me know
> > so we can address it. Even though the proposal as it currently exists
> > doesn't fully support in-place updating of batches, I don't want to make
> > things harder for us in such a follow-up, where we'd end up requiring an
> > entirely new protocol to support that.
> >
> > > @octalene.dev: I know of a third party that is interested in Arrow
> > > for HPC environments that could be interested in the proposal and I
> > > can see if they're interested in providing feedback.
> >
> > Awesome! Thanks much!
> >
> >
> > For reference, for anyone who hasn't looked at the document in a while:
> > since the original discussion thread on this, I have added a full
> > "Background Context" page to the beginning of the proposal. It is meant to
> > help anyone who isn't already familiar with the issues this protocol is
> > trying to solve, or with the ucx or libfabric transports, to better
> > understand *why* I'm proposing this and what it is trying to solve. The
> > point of this background information is to ensure that anyone who might
> > have thoughts on protocols in general or APIs can still understand the
> > base reasons and goals that we're trying to achieve with this protocol
> > proposal. You don't need to already understand managing GPU/device memory
> > or ucx to be able to have meaningful input on the document.
> >
> > Thanks again to all who have contributed so far, and please spread the
> > word to any contacts that you think might be interested in this for
> > their particular use cases.
> >
> > --Matt
> >
> > On Wed, Feb 28, 2024 at 1:39 AM Aldrin <octalene....@pm.me.invalid> wrote:
> >
> > > I am interested in this as well, but I haven't gotten to a point where
> > > I can have valuable input (I haven't tried other transports). I know of
> > > a third party that is interested in Arrow for HPC environments that
> > > could be interested in the proposal and I can see if they're interested
> > > in providing feedback.
> > >
> > > I glanced at the document before but I'll go through again to see if
> > > there is anything I can comment on.
> > >
> > >
> > >
> > > # ------------------------------
> > > # Aldrin
> > >
> > >
> > > https://github.com/drin/
> > > https://gitlab.com/octalene
> > > https://keybase.io/octalene
> > >
> > >
> > > On Tuesday, February 27th, 2024 at 17:43, Paul Whalen <pgwha...@gmail.com> wrote:
> > >
> > > > As a potential "end user developer," (and aspiring contributor) this
> > > > immediately excited me when I first saw it.
> > > >
> > > > I work at a trading firm, and my team has developed an IPC mechanism
> > > > for efficiently transmitting pandas dataframes both remotely via TCP
> > > > and locally via shared memory, where the interface for the application
> > > > developer is the same for both. The data in the dataframes may change
> > > > rapidly, so when communicating locally via shared memory, if the shape
> > > > of the dataframe doesn't change, we update the memory in place,
> > > > coordinating between the producer and consumer via TCP.
> > > >
> > > > We intend to move away from our remote TCP mechanism towards Arrow
> > > > Flight, or a lighter-weight version of Arrow IPC. For the local shared
> > > > memory mechanism, which we previously did not have a good answer for,
> > > > it seems like Dissociated Arrow IPC maps quite well to our problem.
> > > >
> > > > So some features that enable our use case are:
> > > > - Updating existing batches in place is supported
> > > > - The interface is pretty similar to Flight
> > > >
> > > > I'd imagine we're not the only financial firm to implement something
> > > > like this, given how widespread pandas usage is, so that could be a
> > > > place to seek feedback.
> > > >
> > > > As I was reading the proposal initially, I gleaned that the most
> > > > important audience was those writing interfaces to GPUs/remote
> > > > memory/non-standard transports/etc. And it wasn't clear to me whether
> > > > updating batches in place (and the producer/consumer coordination that
> > > > comes with that) was supported or encouraged as part of the proposal.
> > > > But regardless, as an end user, this seems like an easier and more
> > > > efficient way to glue pieces in the Arrow ecosystem together if it was
> > > > adopted broadly.
> > > >
> > > > Paul
> > > >
> > >
> > > > On Tue, Feb 27, 2024 at 6:05 PM Matt Topol zotthewiz...@gmail.com
> > wrote:
> > > >
> > >
> > > > > I'll continue my efforts of trying to reach out to other interested
> > > > > parties, but if anyone else here has any contacts or connections
> that
> > > they
> > > > > think might be interested please forward them the link to the
> Google
> > > doc.
> > > > >
> > >
> > > > > I really do want to get as much engagement and feedback as possible
> > on
> > > > > this.
> > > > >
> > >
> > > > > Thanks!
> > > > >
> > >
> > > > > On Tue, Feb 27, 2024, 6:38 PM Wes McKinney wesmck...@gmail.com
> > wrote:
> > > > >
> > >
> > > > > > Have there been efforts to proactively reach out to other third
> > > parties
> > > > > > that might have an interest in this or be a potential user at
> some
> > > point?
> > > > > > There are a lot of interested parties in Arrow that may not
> > actively
> > > > > > follow
> > > > > > the mailing list.
> > > > > >
> > >
> > > > > > Seems like folks from the Dask, Ray, RAPIDS (especially folks at
> > > NVIDIA
> > > > > > or
> > > > > > working on UCX), or other communities like that might have
> > > constructive
> > > > > > thoughts about this. DLPack (
> https://dmlc.github.io/dlpack/latest/
> > )
> > > also
> > > > > > seems adjacent and worth reaching out to. Other ideas for
> projects
> > or
> > > > > > companies that could be reached out to for feedback.
> > > > > >
> > >
> > > > > > On Tue, Feb 27, 2024 at 5:23 PM Antoine Pitrou
> anto...@python.org
> > > > > > wrote:
> > > > > >
> > >
> > > > > > > If there's no engagement, then I'm afraid it might mean that
> > third
> > > > > > > parties have no interest in this. I don't really have any
> > solution
> > > for
> > > > > > > generating engagement except nagging and pinging people
> > explicitly
> > > :-)
> > > > > > >
> > >
> > > > > > > Le 27/02/2024 à 19:09, Matt Topol a écrit :
> > > > > > >
> > >
> > > > > > > > I would like to see the same Antoine, currently given the
> lack
> > of
> > > > > > > > engagement (both for OR against) I was going to take the
> > silence
> > > as
> > > > > > > > assent
> > > > > > > > and hope for non-Voltron Data PMC members to vote in this.
> > > > > > > >
> > >
> > > > > > > > If anyone has any suggestions on how we could potentially
> > > generate
> > > > > > > > more
> > > > > > > > engagement and discussion on this, please let me know as I
> want
> > > as
> > > > > > > > many
> > > > > > > > parties in the community as possible to be part of this.
> > > > > > > >
> > >
> > > > > > > > Thanks everyone.
> > > > > > > >
> > >
> > > > > > > > --Matt
> > > > > > > >
> > >
> > > > > > > > On Tue, Feb 27, 2024 at 12:48 PM Antoine Pitrou
> > > anto...@python.org
> > > > > > > > wrote:
> > > > > > > >
> > >
> > > > > > > > > Hello,
> > > > > > > > >
> > >
> > > > > > > > > I'd really like to see more engagement and criticism from
> > > > > > > > > non-Voltron
> > > > > > > > > Data parties before this is formally adopted as an Arrow
> > spec.
> > > > > > > > >
> > >
> > > > > > > > > Regards
> > > > > > > > >
> > >
> > > > > > > > > Antoine.
> > > > > > > > >
> > >
> > > > > > > > > Le 27/02/2024 à 18:35, Matt Topol a écrit :
> > > > > > > > >
> > >
> > > > > > > > > > Hey all,
> > > > > > > > > >
> > >
> > > > > > > > > > I'd like to propose a vote for us to officially adopt the
> > > protocol
> > > > > > > > > > described in the google doc[1] for Dissociated Arrow IPC
> > > > > > > > > > Transports.
> > > > > > > > > > This
> > > > > > > > > > proposal was originally discussed at 2. Once this
> proposal
> > is
> > > > > > > > > > adopted,
> > > > > > > > > > I
> > > > > > > > > > will work on adding the necessary documentation to the
> > Arrow
> > > > > > > > > > website
> > > > > > > > > > along
> > > > > > > > > > with examples etc.
> > > > > > > > > >
> > >
> > > > > > > > > > The vote will be open for at least 72 hours.
> > > > > > > > > >
> > >
> > > > > > > > > > [ ] +1 Accept this Proposal
> > > > > > > > > > [ ] +0
> > > > > > > > > > [ ] -1 Do not accept this proposal because...
> > > > > > > > > >
> > >
> > > > > > > > > > Thank you everyone!
> > > > > > > > > >
> > >
> > > > > > > > > > --Matt
> > > > > > > > > >
> > >
> > > > > > > > > > [1]:
> > > > >
> > >
> > > > >
> > >
> >
> https://docs.google.com/document/d/1zHbnyK1r6KHpMOtEdIg1EZKNzHx-MVgUMOzB87GuXyk/edit#heading=h.38515dnp2bdb
> >
>
