[jira] [Commented] (SPARK-5021) GaussianMixtureEM should be faster for SparseVector input

Manoj Kumar (JIRA) Sun, 01 Feb 2015 22:38:13 -0800

    [ https://issues.apache.org/jira/browse/SPARK-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300927#comment-14300927 ]


Manoj Kumar commented on SPARK-5021:
------------------------------------

I see that this is resolved in master.

Which datatype do you think we should prefer for handling an array of 
SparseVectors? Should we use CoordinateMatrix? That might require extending 
CoordinateMatrix with additional functionality.
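
To make the question concrete, here is a rough sketch of what storing an array of sparse vectors in coordinate form could look like. It is plain Python and does not use the Spark API; `SparseVec`, `Entry` and `to_coordinate_entries` are hypothetical names chosen for illustration, with `Entry` loosely mirroring the (row, column, value) triples that a coordinate matrix holds.

```python
from typing import List, NamedTuple, Tuple


class SparseVec(NamedTuple):
    """Hypothetical stand-in for a sparse vector: size plus non-zero entries."""
    size: int
    indices: Tuple[int, ...]
    values: Tuple[float, ...]


class Entry(NamedTuple):
    """Hypothetical (row, col, value) triple, as in a coordinate-format matrix."""
    row: int
    col: int
    value: float


def to_coordinate_entries(vectors: List[SparseVec]) -> List[Entry]:
    """Flatten a list of sparse vectors into coordinate entries.

    Row i of the resulting matrix is vectors[i]; only non-zeros are emitted,
    so the work is proportional to the total number of non-zero values.
    """
    entries = []
    for row, vec in enumerate(vectors):
        for col, value in zip(vec.indices, vec.values):
            entries.append(Entry(row, col, value))
    return entries


vectors = [SparseVec(5, (0, 3), (1.0, 2.0)), SparseVec(5, (4,), (7.0,))]
entries = to_coordinate_entries(vectors)
```

One trade-off with this layout is that the entries of a single row are no longer grouped together, so any per-vector operation would need the rows to be regrouped first.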

> GaussianMixtureEM should be faster for SparseVector input
> ---------------------------------------------------------
>
>                 Key: SPARK-5021
>                 URL: https://issues.apache.org/jira/browse/SPARK-5021
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>            Assignee: Manoj Kumar
>
> GaussianMixtureEM currently converts everything to dense vectors.  It would 
> be nice if it were faster for SparseVectors (running in time linear in the 
> number of non-zero values).
> However, this may not be too important since clustering should rarely be done 
> in high dimensions.
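
As an illustration of what "linear in the number of non-zero values" could mean here, the sketch below accumulates a weighted sparse vector into a dense running sum and computes a sparse-dense dot product, touching only the non-zero entries. It is plain Python under assumed names (`sparse_axpy`, `sparse_dot`), not the MLlib implementation, and it covers only the mean-style accumulation; a dense covariance update would still cost more than this.

```python
from typing import List, Sequence


def sparse_axpy(alpha: float, indices: Sequence[int],
                values: Sequence[float], y: List[float]) -> None:
    """In place y += alpha * x, where x is given by its non-zero entries.

    Only the positions listed in `indices` are touched, so the cost is
    proportional to the number of non-zeros rather than to len(y).
    """
    for i, v in zip(indices, values):
        y[i] += alpha * v


def sparse_dot(indices: Sequence[int], values: Sequence[float],
               dense: Sequence[float]) -> float:
    """Dot product of a sparse vector with a dense one, visiting non-zeros only."""
    return sum(v * dense[i] for i, v in zip(indices, values))


# Accumulate one sparse point, weighted by a (made-up) responsibility of 0.5.
acc = [0.0] * 6
sparse_axpy(0.5, [1, 4], [2.0, 8.0], acc)
```

Converting each input to a dense vector first would instead touch all six positions per point, which is the overhead the issue describes.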


