We are very interested in Plasma as a standalone project. The fork would
hit us doubly hard, because it reduces the appeal of both an Arrow-specific
use case and our planned Ray integration.

We are effectively developing a database for network activity data that
runs with Arrow as its data plane. See https://github.com/tenzir/vast for
details. One of our upcoming features is a 1:N output channel using Plasma,
where multiple downstream tools (Python/Pandas, R, Spark) can process the
same data set, materialized in memory exactly once. We currently don't have
the developer bandwidth to prioritize this effort, but the concurrent,
multi-tool processing capability was one of the main strategic reasons to
go with Arrow as the data plane. If Plasma has no future, Arrow has reduced
appeal for us in the medium term.

We also have Ray as a data consumer on our roadmap, but the dependency
chain now seems inverted. If we have to do costly custom plumbing for Ray,
with a custom version of Plasma, the Ray integration will lose quite a bit
of appeal because it doesn't fit into the existing 1:N model. That is, even
though the fork may make sense from a Ray-internal point of view, it
decreases the appeal of Ray from the outside. (Again, this only concerns
the shared data plane.)

In the future, we're happy to contribute cycles to keeping Plasma useful
as a standalone project. We recently made sure that static builds work as
expected <https://github.com/apache/arrow/pull/7842>. Unfortunately, we
cannot commit to anything specific as of now, but our interest extends to
Gandiva, Flight, and lots of other parts of the Arrow ecosystem.

On Tue, Aug 18, 2020 at 4:02 AM Robert Nishihara <robertnishih...@gmail.com>
wrote:

> To answer Wes's question, the Plasma inside of Ray is not currently usable
> in a C++ library context, though it wouldn't be impossible to make that
> happen.
>
> I (or someone) could conduct a simple poll via Google Forms on the user
> mailing list to gauge demand if we are concerned about breaking a lot of
> people's workflow.
>
> On Mon, Aug 17, 2020 at 3:21 AM Antoine Pitrou <anto...@python.org> wrote:
>
> >
> > On 15/08/2020 at 17:56, Wes McKinney wrote:
> > >
> > > What isn't clear is whether the Plasma that's in Ray is usable in a
> > > C++ library context (e.g. what we currently ship as libplasma-dev e.g.
> > > on Ubuntu/Debian). That seems still useful, but if the project isn't
> > > being actively maintained / developed (which, given the series of
> > > stale PRs over the last year or two, it doesn't seem to be) it's
> > > unclear whether we want to keep shipping it.
> >
> > At least on GitHub, the C++ API seems to be getting little use.  Most
> > search results below are forks/copies of the Arrow or Ray codebases.
> > There are also a couple stale experiments:
> > https://github.com/search?l=C%2B%2B&p=1&q=PlasmaClient&type=Code
> >
> > Regards
> >
> > Antoine.
> >
>
