[jira] [Comment Edited] (SPARK-14380) Review spark.ml parity for clustering

Xinh Huynh (JIRA) Mon, 06 Jun 2016 10:31:15 -0700

    [ https://issues.apache.org/jira/browse/SPARK-14380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15314617#comment-15314617 ]


Xinh Huynh edited comment on SPARK-14380 at 6/6/16 5:29 PM:
------------------------------------------------------------

Existing algorithms
* KMeans
** Param: initial model, bypassing the random initialization or k-means|| 
(SPARK-10780)
** PMML (SPARK-11237)
* Gaussian Mixture Model (GMM)
** Param: initial model (SPARK-15785, related to warm-start support, 
SPARK-11136)
* LDA
** In DistributedLDAModel: topDocumentsPerTopic, topTopicsPerDocument, 
topicAssignments (needed for parity?)
** In LocalLDAModel: topicDistribution (for just one document, like 
SPARK-10809, needed for parity?), topics (is this the same as topicsMatrix?)
* Bisecting KMeans
** computeCost on just one input point (does this fall under evaluation, which 
is not under consideration here?)

I did not create a JIRA for LDA because its API still seems to be evolving.
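
For readers unfamiliar with the "initial model" param listed under KMeans and GMM, here is a rough illustration of what bypassing the random initialization means, plus a cost computed for just one input point as in the Bisecting KMeans item. This is a plain-Python sketch, not Spark code: the names kmeans, compute_cost, and initial_centers are hypothetical and do not correspond to any spark.ml or spark.mllib API.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def compute_cost(centers, point):
    """Cost of a single point: squared distance to its nearest center."""
    return min(squared_distance(c, point) for c in centers)


def kmeans(points, k, initial_centers=None, max_iter=20, seed=0):
    """Lloyd's algorithm with an optional warm start.

    If initial_centers is given (hypothetical parameter), the random
    initialization is skipped and iteration starts from those centers.
    """
    if initial_centers is not None:
        if len(initial_centers) != k:
            raise ValueError("initial_centers must contain exactly k centers")
        centers = [tuple(c) for c in initial_centers]
    else:
        # Fallback: pick k distinct input points at random.
        centers = [tuple(p) for p in random.Random(seed).sample(points, k)]

    for _ in range(max_iter):
        # Assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(centers[i], p))
            clusters[nearest].append(p)

        # Recompute each center as the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
warm_start = [(1.0, 1.0), (9.0, 1.0)]
centers = kmeans(points, k=2, initial_centers=warm_start)
single_point_cost = compute_cost(centers, (0.0, 0.0))
```

With the warm start, the result depends only on the supplied centers and the data, not on a random seed, which is the main reason to expose such a param.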


was (Author: xinhh):
Existing algorithms
* KMeans
** Param: initial model, bypassing the random initialization or k-means|| 
(SPARK-10780)
** PMML (SPARK-11237)
* Gaussian Mixture Model (GMM)
** Param: initial model (will create JIRA, related to warm-start support, 
SPARK-11136)
* LDA
** In DistributedLDAModel: topDocumentsPerTopic, topTopicsPerDocument, 
topicAssignments (needed for parity?)
** In LocalLDAModel: topicDistribution (for just one document, like 
SPARK-10809, needed for parity?), topics (is this the same as topicsMatrix?)
* Bisecting KMeans
** computeCost on just one input point (does this fall under evaluation, which 
is not under consideration here?)


> Review spark.ml parity for clustering
> -------------------------------------
>
>                 Key: SPARK-14380
>                 URL: https://issues.apache.org/jira/browse/SPARK-14380
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Joseph K. Bradley
>
> Review parity of spark.ml vs. spark.mllib to ensure spark.ml contains all 
> functionality. List all missing items.
> This only covers Scala since we can compare Scala vs. Python in spark.ml 
> itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
