[ https://issues.apache.org/jira/browse/SPARK-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
DB Tsai updated SPARK-4708:
---------------------------
    Summary: Make k-mean runs two/three times faster with dense/sparse sample  (was: k-mean runs two/three times faster with dense/sparse sample)

> Make k-mean runs two/three times faster with dense/sparse sample
> ----------------------------------------------------------------
>
>                 Key: SPARK-4708
>                 URL: https://issues.apache.org/jira/browse/SPARK-4708
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: DB Tsai
>
> The call to `breezeSquaredDistance` in
> `org.apache.spark.mllib.util.MLUtils.fastSquaredDistance` is on the critical
> path, and `breezeSquaredDistance` is slow. We should replace it with our own
> implementation.
>
> Benchmark on the mnist8m dataset:
>
>                   Before        With this PR
>   DenseVector:    70.04 secs    30.58 secs
>   SparseVector:   59.05 secs    21.14 secs
>
> This is roughly a 2.3x speedup for dense samples and 2.8x for sparse samples.
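The issue proposes replacing `breezeSquaredDistance` with a hand-written squared-distance routine. It does not show that routine. The following is a minimal sketch of one plausible shape for it: a single pass that is specialized for each dense/sparse combination. It is not the code from the actual PR. The function names and the `(size, indices, values)` sparse layout with strictly increasing indices are hypothetical choices made for illustration.

```python
from typing import Sequence


def sqdist_dense(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two dense vectors, one pass, no temporaries."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    total = 0.0
    for x, y in zip(a, b):
        d = x - y
        total += d * d
    return total


def sqdist_sparse(size_a: int, idx_a: Sequence[int], val_a: Sequence[float],
                  size_b: int, idx_b: Sequence[int], val_b: Sequence[float]) -> float:
    """Sparse vs sparse (hypothetical layout, sorted indices).

    Merge the two index lists; a coordinate stored on only one side
    counts as zero on the other side.
    """
    if size_a != size_b:
        raise ValueError("dimension mismatch")
    i = j = 0
    total = 0.0
    while i < len(idx_a) or j < len(idx_b):
        if j >= len(idx_b) or (i < len(idx_a) and idx_a[i] < idx_b[j]):
            d = val_a[i]               # only a has this coordinate
            i += 1
        elif i >= len(idx_a) or idx_b[j] < idx_a[i]:
            d = val_b[j]               # only b has it (sign is irrelevant once squared)
            j += 1
        else:
            d = val_a[i] - val_b[j]    # both vectors store this coordinate
            i += 1
            j += 1
        total += d * d
    return total


def sqdist_sparse_dense(size: int, idx: Sequence[int], val: Sequence[float],
                        dense: Sequence[float]) -> float:
    """Sparse vs dense: walk the dense vector and subtract the sparse value where one is stored."""
    if size != len(dense):
        raise ValueError("dimension mismatch")
    total = 0.0
    k = 0
    for pos, y in enumerate(dense):
        if k < len(idx) and idx[k] == pos:
            d = val[k] - y
            k += 1
        else:
            d = y
        total += d * d
    return total
```

In this sketch the sparse-sparse merge touches only stored entries, so its cost grows with the number of nonzeros rather than with the vector dimension.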