I do not think there is any urgency to remove Plasma from the Arrow
codebase (as it currently does not cause much maintenance burden), but
the reality is that Ray has already hard-forked and so new maintainers
will need to come out of the woodwork to help support the project if
it is to continue having a life of its own. I started this thread to
create more awareness of the issue so that existing Plasma
stakeholders can make themselves known and possibly volunteer their
time to develop and maintain the codebase.

On Tue, Aug 18, 2020 at 12:02 PM Matthias Vallentin
<matth...@vallentin.net> wrote:
>
> We are very interested in Plasma as a stand-alone project. The fork would
> hit us doubly hard, because it reduces both the appeal of an Arrow-specific
> use case as well as our planned Ray integration.
>
> We are effectively developing a database for network activity data that
> runs with Arrow as its data plane. See https://github.com/tenzir/vast for
> details. One of our upcoming features is supporting a 1:N output channel
> using Plasma, where multiple downstream tools (Python/Pandas, R, Spark) can
> process the same data set that's materialized in memory exactly once. We
> currently don't have the developer bandwidth to prioritize this effort, but
> the concurrent, multi-tool processing capability was one of the main
> strategic reasons to go with Arrow as data plane. If Plasma has no future,
> Arrow has a reduced appeal for us in the medium term.
>
> We also have Ray as a data consumer on our roadmap, but the dependency
> chain seems now inverted. If we have to do costly custom plumbing for Ray,
> with a custom version of Plasma, the Ray integration will lose quite a bit
> of appeal because it doesn't fit into the existing 1:N model. That is, even
> though the fork may make sense from a Ray-internal point of view, it
> decreases the appeal of Ray from the outside. (Again, only speaking about
> the shared data plane here.)
>
> In the future, we're happy to contribute cycles when it comes to keeping
> Plasma as a useful standalone project. We recently made sure that static
> builds work as expected <https://github.com/apache/arrow/pull/7842>. As of
> now, we unfortunately cannot commit to anything specific, but our
> interest extends to Gandiva, Flight, and lots of other parts of the Arrow
> ecosystem.
>
> On Tue, Aug 18, 2020 at 4:02 AM Robert Nishihara <robertnishih...@gmail.com>
> wrote:
>
> > To answer Wes's question, the Plasma inside of Ray is not currently usable
> > in a C++ library context, though it wouldn't be impossible to make that
> > happen.
> >
> > I (or someone) could conduct a simple poll via Google Forms on the user
> > mailing list to gauge demand if we are concerned about breaking a lot of
> > people's workflow.
> >
> > On Mon, Aug 17, 2020 at 3:21 AM Antoine Pitrou <anto...@python.org> wrote:
> >
> > >
> > > On 15/08/2020 at 17:56, Wes McKinney wrote:
> > > >
> > > > What isn't clear is whether the Plasma that's in Ray is usable in a
> > > > C++ library context (e.g. what we currently ship as libplasma-dev e.g.
> > > > on Ubuntu/Debian). That seems still useful, but if the project isn't
> > > > being actively maintained / developed (which, given the series of
> > > > stale PRs over the last year or two, it doesn't seem to be) it's
> > > > unclear whether we want to keep shipping it.
> > >
> > > At least on GitHub, the C++ API seems to be getting little use.  Most
> > > search results below are forks/copies of the Arrow or Ray codebases.
> > > There are also a couple stale experiments:
> > > https://github.com/search?l=C%2B%2B&p=1&q=PlasmaClient&type=Code
> > >
> > > Regards
> > >
> > > Antoine.
> > >
