[ 
https://issues.apache.org/jira/browse/SPARK-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615502#comment-14615502
 ] 

Joseph K. Bradley commented on SPARK-6001:
------------------------------------------

[~yalamart]  This should probably be done under the Pipelines API, via the 
R-like stats design linked above.  I'd recommend we wait to include this until 
the initial (LinearRegression) PR for R-like stats is merged, after which this 
JIRA can follow that design as an example.

> K-Means clusterer should return the assignments of input points to clusters
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-6001
>                 URL: https://issues.apache.org/jira/browse/SPARK-6001
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.2.1
>            Reporter: Derrick Burns
>            Priority: Minor
>
> The K-Means clusterer returns a KMeansModel that contains the cluster 
> centers. I suggest that, when the assignments of the input data to the 
> clusters are available, the clusterer also return them as an RDD. Although 
> the assignments can be computed from the KMeansModel, returning them when 
> they are already available would save the cost of re-computation.
> The K-means implementation at 
> https://github.com/derrickburns/generalized-kmeans-clustering returns the 
> assignments when available.
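
For readers unfamiliar with the re-computation mentioned in the description, here is a minimal sketch of how assignments can be derived from cluster centers alone: each point is assigned to its nearest center. This is plain Python used for illustration, not Spark code. The helper name `assign_to_clusters` and the sample data are hypothetical, and Euclidean distance is assumed. In MLlib the equivalent step is calling `predict` on the KMeansModel, which requires another pass over the input data.

```python
# Illustrative stand-in for the re-computation step; not Spark's implementation.
from math import dist  # Euclidean distance, Python 3.8+


def assign_to_clusters(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: dist(p, centers[j]))
        for p in points
    ]


# Hypothetical centers, as a fitted model might hold them.
centers = [(0.0, 0.0), (10.0, 10.0)]
# Hypothetical input points.
points = [(1.0, 0.5), (9.0, 11.0), (0.2, -0.3), (6.0, 6.0)]

assignments = assign_to_clusters(points, centers)
print(assignments)
```

Every point is compared against every center, so this extra pass costs roughly (number of points) x (number of centers) distance evaluations. That is the work the proposal aims to avoid when the assignments already exist at the end of training.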



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

