Nobody asked me, and this is a comment on a broader question, not this one, but:
In light of a number of recent items about adding more algorithms, I'll say that I personally think an explosion of algorithms should come after the MLlib "core" is more fully baked. I'm thinking of finishing out the changes to vectors and matrices, for example. Things are going to change significantly in the short term as people use the algorithms and see how well the abstractions do or don't work. I've seen another similar project suffer mightily from too many algorithms too early, so maybe I'm just paranoid.

Anyway, long-term, I think lots of good algorithms is a right and proper goal for MLlib, myself. Consistent approaches, representations and APIs will make or break MLlib much more than having or not having a particular algorithm. With the plumbing in place, writing the algo is the fun easy part.

--
Sean Owen | Director, Data Science | London

On Mon, Apr 21, 2014 at 4:39 PM, Aliaksei Litouka <aliaksei.lito...@gmail.com> wrote:
> Hi, Spark developers.
> Are there any plans for implementing new clustering algorithms in MLLib? As
> far as I understand, current version of Spark ships with only one
> clustering algorithm - K-Means. I want to contribute to Spark and I'm
> thinking of adding more clustering algorithms - maybe
> DBSCAN<http://en.wikipedia.org/wiki/DBSCAN>.
> I can start working on it. Does anyone want to join me?