[ 
https://issues.apache.org/jira/browse/SPARK-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591009#comment-14591009
 ] 

Joseph K. Bradley commented on SPARK-8402:
------------------------------------------

Feel free to go ahead and work on it.  I have not heard of too many users 
needing this feature, however, so it might be worth polling for interest/need 
(e.g., on the dev list).  Often, people just try several values of k and 
pick the smallest one that gives decent results on their data.  But it's 
definitely worth considering.  Thanks!

Btw, I'll remove the target version.  A committer should set that since it's 
meant to be a commitment to get a feature in for a particular release.

> DP means clustering 
> --------------------
>
>                 Key: SPARK-8402
>                 URL: https://issues.apache.org/jira/browse/SPARK-8402
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Meethu Mathew
>              Labels: features
>
> At present, all the clustering algorithms in MLlib require the number of 
> clusters to be specified in advance. 
> The Dirichlet process (DP) is a popular non-parametric Bayesian mixture model 
> that allows for flexible clustering of data without having to specify the 
> number of clusters a priori. 
> DP means is a non-parametric clustering algorithm that uses a scale parameter 
> 'lambda' to control the creation of new clusters (see "Revisiting k-means: New 
> Algorithms via Bayesian Nonparametrics" by Brian Kulis and Michael I. Jordan).
> We have followed the distributed implementation of DP means proposed in the 
> paper "MLbase: Distributed Machine Learning Made Easy" by Xinghao Pan, 
> Evan R. Sparks, and Andre Wibisono.
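
To make the role of 'lambda' concrete, here is a minimal serial sketch of a 
DP-means style loop in plain Python. It is not the distributed implementation 
mentioned in the issue and it is not MLlib code. The function names (dp_means, 
sq_dist, mean) are hypothetical, and comparing the squared Euclidean distance 
against lambda is an assumption based on my reading of the Kulis and Jordan 
formulation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def dp_means(points, lam, max_iter=100):
    """Cluster `points`; `lam` is the penalty for opening a new cluster."""
    points = [tuple(p) for p in points]
    centers = [mean(points)]          # start with one cluster at the global mean
    assignments = [0] * len(points)

    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [sq_dist(p, c) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > lam:
                # Too far from every existing center: open a new cluster here.
                centers.append(p)
                k = len(centers) - 1
            if assignments[i] != k:
                assignments[i] = k
                changed = True
        if not changed:
            break

        # Recompute centers as cluster means, dropping clusters left empty.
        members = {}
        for i, k in enumerate(assignments):
            members.setdefault(k, []).append(points[i])
        used = sorted(members)
        centers = [mean(members[k]) for k in used]
        relabel = {k: j for j, k in enumerate(used)}
        assignments = [relabel[k] for k in assignments]

    return centers, assignments
```

A larger lambda makes new clusters more expensive, so fewer are created; a 
smaller lambda yields more clusters. For example, calling 
dp_means([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], lam=10.0) 
separates the two groups of points, while a very large lam keeps everything in 
a single cluster.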



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

