Nobody asked me, and this is a comment on a broader question, not this
one, but:

In light of a number of recent items about adding more algorithms,
I'll say that I personally think an explosion of algorithms should
come after the MLlib "core" is more fully baked. I'm thinking of
finishing out the changes to vectors and matrices, for example. Things
are going to change significantly in the short term as people use the
algorithms and see how well the abstractions do or don't work. I've
seen another similar project suffer mightily from too many algorithms
too early, so maybe I'm just paranoid.

Anyway, long-term, I think lots of good algorithms is a right and
proper goal for MLlib. But consistent approaches, representations
and APIs will make or break MLlib much more than having or not having
a particular algorithm. With the plumbing in place, writing the algo
is the fun, easy part.
--
Sean Owen | Director, Data Science | London


On Mon, Apr 21, 2014 at 4:39 PM, Aliaksei Litouka
<aliaksei.lito...@gmail.com> wrote:
> Hi, Spark developers.
> Are there any plans for implementing new clustering algorithms in MLlib? As
> far as I understand, the current version of Spark ships with only one
> clustering algorithm - K-Means. I want to contribute to Spark and I'm
> thinking of adding more clustering algorithms - maybe
> DBSCAN<http://en.wikipedia.org/wiki/DBSCAN>.
> I can start working on it. Does anyone want to join me?
