[
https://issues.apache.org/jira/browse/SPARK-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305058#comment-14305058
]
Manoj Kumar commented on SPARK-5021:
------------------------------------
Hmm. I figured it out; it is because I have something like this:
val trainData = {
  if (isSparse)
    data.map(sample => sample.asInstanceOf[SparseVector]).cache()
  else
    data.map(u => u.toBreeze.toDenseVector).cache()
}
Now, since trainData can have two possible element types, the following statement
produces a compile error:
val sums = {
  if (isSparse) {
    val compute = sc.broadcast(ExpectationSum.addSparse(weights, gaussians)_)
    trainData.aggregate(ExpectationSum.zero(k, d))(compute.value, _ += _)
  } else {
    val compute = sc.broadcast(ExpectationSum.add(weights, gaussians)_)
    trainData.aggregate(ExpectationSum.zero(k, d))(compute.value, _ += _)
  }
}
[error] found   : (org.apache.spark.mllib.clustering.ExpectationSum, org.apache.spark.mllib.linalg.SparseVector) => org.apache.spark.mllib.clustering.ExpectationSum
[error] required: (org.apache.spark.mllib.clustering.ExpectationSum, _0) => org.apache.spark.mllib.clustering.ExpectationSum
[error] trainData.aggregate(ExpectationSum.zero(k, d))(compute.value, _ += _)
What is the best way to overcome this?
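For what it's worth, the error arises because the if/else gives trainData a
least-upper-bound (existential) element type, so a function written against the
concrete element type no longer matches. One common workaround is to perform the
aggregation inside each branch, while the element type is still concrete. Below
is a minimal sketch of that pattern in plain Scala (no Spark); the names Sparse,
Dense, isSparse, and total are illustrative stand-ins, not Spark API:

```scala
// Illustrative sketch: keep the aggregation in the branch where the
// element type is still concrete, instead of binding the mapped
// collection first (which would force the compiler to infer a least
// upper bound / existential type for it).
object BranchTypeDemo {
  // Stand-ins for the two vector representations.
  final case class Sparse(values: Map[Int, Double])
  final case class Dense(values: Vector[Double])

  def total(raw: Seq[Map[Int, Double]], isSparse: Boolean): Double =
    if (isSparse) {
      val data: Seq[Sparse] = raw.map(m => Sparse(m))
      data.foldLeft(0.0)((acc, v) => acc + v.values.values.sum)
    } else {
      val data: Seq[Dense] =
        raw.map(m => Dense(Vector.tabulate(2)(i => m.getOrElse(i, 0.0))))
      data.foldLeft(0.0)((acc, v) => acc + v.values.sum)
    }

  def main(args: Array[String]): Unit =
    println(total(Seq(Map(0 -> 1.0), Map(1 -> 2.0)), isSparse = true)) // prints 3.0
}
```

In the Spark code above, the analogous move would be to call trainData.aggregate
within each branch of the first if/else (or to give trainData an explicit common
supertype such as Vector and dispatch inside the aggregation function).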
> GaussianMixtureEM should be faster for SparseVector input
> ---------------------------------------------------------
>
> Key: SPARK-5021
> URL: https://issues.apache.org/jira/browse/SPARK-5021
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
> Assignee: Manoj Kumar
>
> GaussianMixtureEM currently converts everything to dense vectors. It would
> be nice if it were faster for SparseVectors (running in time linear in the
> number of non-zero values).
> However, this may not be too important since clustering should rarely be done
> in high dimensions.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)