[ 
https://issues.apache.org/jira/browse/SPARK-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305171#comment-14305171
 ] 

Travis Galoppo commented on SPARK-5021:
---------------------------------------

[~MechCoder] You may be making things harder on yourself than necessary.  The 
current code maps the incoming vectors to dense breeze vectors, but you can 
simply map them to generic breeze vectors, i.e.:

(GaussianMixture.scala: line 126)
val breezeData = data.map(u => u.toBreeze.toDenseVector).cache()
=>
val breezeData = data.map(_.toBreeze).cache()

Then generalize everything that expects a dense breeze vector/matrix to accept 
a generic vector/matrix. When the dense and sparse cases finally have to be 
handled separately, you can match on the variable, e.g.:

def foo(x: BreezeVector) = {
  x match {
    case dx: DenseBreezeVector => // do dense vector calculation
    case sx: SparseBreezeVector => // do sparse vector calculation
  }
}
...
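
To make the pattern above concrete, here is a minimal sketch using a squared 
norm as a stand-in computation. The object name SquaredNormSketch and the 
helper squaredNorm are hypothetical and not part of GaussianMixture.scala; the 
import aliases are an assumption chosen to match the names used in foo above.

```scala
import breeze.linalg.{DenseVector => DenseBreezeVector,
  SparseVector => SparseBreezeVector, Vector => BreezeVector}

object SquaredNormSketch {
  // Hypothetical helper: callers pass a generic breeze vector, and the
  // dense/sparse split happens only here, where it actually matters.
  def squaredNorm(x: BreezeVector[Double]): Double = x match {
    case dx: DenseBreezeVector[Double] =>
      // Dense path: plain dot product over every entry.
      dx dot dx
    case sx: SparseBreezeVector[Double] =>
      // Sparse path: visit only the stored (active) entries.
      sx.activeValuesIterator.map(v => v * v).sum
    case other =>
      // Fallback for any other breeze Vector implementation.
      other.valuesIterator.map(v => v * v).sum
  }
}
```

The fallback case is there because breeze's Vector trait is not sealed, so a 
match on only the dense and sparse cases could throw a MatchError for other 
implementations.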

I know this is kind of high level... but it could avoid a lot of dual-path code.


> GaussianMixtureEM should be faster for SparseVector input
> ---------------------------------------------------------
>
>                 Key: SPARK-5021
>                 URL: https://issues.apache.org/jira/browse/SPARK-5021
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>            Assignee: Manoj Kumar
>
> GaussianMixtureEM currently converts everything to dense vectors.  It would 
> be nice if it were faster for SparseVectors (running in time linear in the 
> number of non-zero values).
> However, this may not be too important since clustering should rarely be done 
> in high dimensions.



