The short answer is that those distributed linalg parts will not go away. In the medium term, though, it's unlikely that the distributed matrix classes will be ported over to DataFrames (even though the ideal would be to have DataFrame-backed distributed matrix classes), given the time and effort it has taken just to port the various ML models and feature transformers over to ML.
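For context, here is a minimal sketch of what the current RDD-based API supports, using `BlockMatrix` from `org.apache.spark.mllib.linalg.distributed` for a distributed multiply (a small local-mode example; the matrix values here are just illustrative):

```scala
import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix, MatrixEntry}
import org.apache.spark.sql.SparkSession

object BlockMatrixMultiply {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("blockmatrix-demo").getOrCreate()
    val sc = spark.sparkContext

    // Build a small 2x2 matrix as (row, col, value) entries,
    // then convert to a BlockMatrix partitioned across the cluster.
    val entries = sc.parallelize(Seq(
      MatrixEntry(0, 0, 1.0), MatrixEntry(0, 1, 2.0),
      MatrixEntry(1, 0, 3.0), MatrixEntry(1, 1, 4.0)))
    val a: BlockMatrix = new CoordinateMatrix(entries).toBlockMatrix().cache()

    // Distributed multiply: blocks are co-partitioned, multiplied
    // locally, and summed via a shuffle.
    val product = a.multiply(a)
    product.validate()
    println(product.toLocalMatrix())

    spark.stop()
  }
}
```

This is the RDD-based API the deprecation discussion is about; there is no DataFrame-based equivalent of `BlockMatrix` today.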
The current distributed matrices use the old mllib linear algebra primitives for their backing data structures and operations, so those will have to be ported at some point to the ml package vectors and matrices, though I would expect the overall functionality to remain the same initially. There is https://issues.apache.org/jira/browse/SPARK-15882, which discusses some of the ideas. A decision would still need to be made on the higher-level API (whether it remains the same as it currently is, changes to be DataFrame-based, and/or changes in other ways, etc.).

On Tue, 30 May 2017 at 15:33 John Compitello <jo...@broadinstitute.org> wrote:

> Hey all,
>
> I see on the MLlib website that there are plans to deprecate the RDD-based
> API for MLlib once the new ML API reaches feature parity with the RDD-based
> one. Are there currently plans to reimplement all the distributed linear
> algebra / matrix operations as part of this new API, or are these things
> just going away? Like, will there still be a BlockMatrix class for
> distributed multiplies?
>
> Best,
>
> John
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
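Worth noting for that port: conversion helpers between the two linalg packages already exist as of Spark 2.0 (`asML` on the mllib types, `fromML` on the mllib companion objects), so the backing-type swap is mechanical. A minimal sketch:

```scala
import org.apache.spark.mllib.linalg.{Vectors => OldVectors, Matrices => OldMatrices}
import org.apache.spark.ml.linalg.{Vector => NewVector, Matrix => NewMatrix}

object LinalgConversion {
  def main(args: Array[String]): Unit = {
    // old mllib vector -> new ml vector
    val oldVec = OldVectors.dense(1.0, 2.0, 3.0)
    val newVec: NewVector = oldVec.asML

    // new ml vector -> old mllib vector
    val roundTripped = OldVectors.fromML(newVec)
    assert(roundTripped == oldVec)

    // the same pair of conversions exists for matrices
    val oldMat = OldMatrices.dense(2, 2, Array(1.0, 3.0, 2.0, 4.0))
    val newMat: NewMatrix = oldMat.asML
    assert(OldMatrices.fromML(newMat) == oldMat)
  }
}
```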