[ https://issues.apache.org/jira/browse/SPARK-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304099#comment-14304099 ]
Manoj Kumar edited comment on SPARK-5021 at 2/3/15 10:02 PM:
-------------------------------------------------------------

Hi. I'm almost there. I have one last question. In this line,
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala#L223
I'm not sure how to do this other than writing my own implementation for a SparseVector that does not depend on NativeBlas. Is that okay?

was (Author: mechcoder):

Hi. I'm almost there. I have one last question. In this line,
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala#L223
I'm not sure how to do this other than writing my own implementation for sparse data that does not depend on NativeBlas. Is that okay?

> GaussianMixtureEM should be faster for SparseVector input
> ---------------------------------------------------------
>
>                 Key: SPARK-5021
>                 URL: https://issues.apache.org/jira/browse/SPARK-5021
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>            Assignee: Manoj Kumar
>
> GaussianMixtureEM currently converts everything to dense vectors. It would
> be nice if it were faster for SparseVectors (running in time linear in the
> number of non-zero values).
> However, this may not be too important since clustering should rarely be done
> in high dimensions.
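
For illustration only, here is a minimal sketch of what "an own implementation which does not depend on NativeBlas" could look like for sparse input. It is not the MLlib code and makes no claim about what the linked line does. It assumes the common formulation in which the E-step accumulates weighted sums of x and of x x^T, and it is written in Python rather than Scala to keep it short. The names `sparse_axpy` and `sparse_syr` and the index/value list representation are hypothetical.

```python
from typing import List, Sequence


def sparse_axpy(alpha: float, indices: Sequence[int], values: Sequence[float],
                y: List[float]) -> None:
    """In place: y += alpha * x, with sparse x given as (indices, values).

    Only the non-zero entries of x are visited, so the cost is O(nnz).
    """
    for i, v in zip(indices, values):
        y[i] += alpha * v


def sparse_syr(alpha: float, indices: Sequence[int], values: Sequence[float],
               a: List[List[float]]) -> None:
    """In place: A += alpha * x * x^T, with sparse x given as (indices, values).

    Only entries A[i][j] where both x[i] and x[j] are non-zero change,
    so the cost is O(nnz^2) instead of O(d^2) for a dense update.
    """
    for i, vi in zip(indices, values):
        row = a[i]
        scaled = alpha * vi
        for j, vj in zip(indices, values):
            row[j] += scaled * vj


# Hypothetical data: x = [0, 2, 0, 3] stored as indices [1, 3], values [2.0, 3.0],
# with responsibility (weight) 0.5 for some cluster.
idx, vals = [1, 3], [2.0, 3.0]
mean_sum = [0.0] * 4                      # running sum of p * x
cov_sum = [[0.0] * 4 for _ in range(4)]   # running sum of p * x * x^T
sparse_axpy(0.5, idx, vals, mean_sum)
sparse_syr(0.5, idx, vals, cov_sum)
```

The mean accumulation is linear in the number of non-zero values, as the issue asks. The second-moment accumulation is quadratic in the number of non-zeros, which is still far cheaper than the dense update when the vectors are very sparse.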