[ https://issues.apache.org/jira/browse/SPARK-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353854#comment-14353854 ]
Sean Owen commented on SPARK-6234:
----------------------------------

No, the thing that's not important here is the example implementation. It is not an example of using K-means in MLlib, but an example of a completely de novo, separate implementation of K-means that is provided as an example of using *Spark*.

I don't know why Breeze or something that uses it would be slower though. The only thing here doing any serious computation is squaredDistance. That did change in 0.11:

https://github.com/scalanlp/breeze/commit/5c26a9bceb1fbd621421fa459e1b1202e91f5e9b#diff-e9531f2d5b65b7140b75c0b1c4dab541

If you have the energy, a tightly-focused test case on this method that shows a performance hit would be useful to report against Breeze.

I think all in all the positives of 0.11 outweigh negatives, but, this downside was not expected, if it is confirmed. If so it may not only affect this example.

> 10% Performance regression with Breeze upgrade
> ----------------------------------------------
>
>                 Key: SPARK-6234
>                 URL: https://issues.apache.org/jira/browse/SPARK-6234
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Nishkam Ravi
>
> KMeans regresses by 10% with the Breeze upgrade from 0.10 to 0.11
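
For anyone picking up the suggestion of a tightly-focused test case, a rough sketch of a timing harness might look like the following. It is not taken from Spark or Breeze: the object name, the method names, the vector sizes and the iteration count are all made up for illustration, and the hand-written loop is only a baseline to compare against. It uses plain System.nanoTime with a warm-up pass, so treat the numbers as indicative at best.

```scala
// Hypothetical micro-benchmark sketch; all names and sizes are illustrative.
object SquaredDistanceBench {

  // Hand-written baseline: sum of squared component differences.
  def naiveSquaredDistance(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      val d = a(i) - b(i)
      sum += d * d
      i += 1
    }
    sum
  }

  // Applies f to every pair (xs(i), ys(i)), repeated iters times.
  // Returns the accumulated result so the JIT cannot discard the calls.
  private def runPass[V](xs: IndexedSeq[V], ys: IndexedSeq[V], iters: Int)(
      f: (V, V) => Double): Double = {
    var checksum = 0.0
    var it = 0
    while (it < iters) {
      var i = 0
      while (i < xs.length) {
        checksum += f(xs(i), ys(i))
        i += 1
      }
      it += 1
    }
    checksum
  }

  // Returns (average nanoseconds per call, checksum of the timed pass).
  // One untimed warm-up pass runs first to give the JIT a chance to compile f.
  def timePerCallNanos[V](xs: IndexedSeq[V], ys: IndexedSeq[V], iters: Int)(
      f: (V, V) => Double): (Double, Double) = {
    require(xs.nonEmpty && xs.length == ys.length && iters > 0)
    runPass(xs, ys, iters)(f) // warm-up, result ignored
    val start = System.nanoTime()
    val checksum = runPass(xs, ys, iters)(f)
    val elapsed = System.nanoTime() - start
    (elapsed.toDouble / (iters.toLong * xs.length), checksum)
  }

  def main(args: Array[String]): Unit = {
    val rng = new scala.util.Random(42) // fixed seed for repeatable inputs
    val dim = 10                        // illustrative dimensionality
    val n = 1000                        // illustrative number of vector pairs
    val xs = Vector.fill(n)(Array.fill(dim)(rng.nextDouble()))
    val ys = Vector.fill(n)(Array.fill(dim)(rng.nextDouble()))
    val (ns, sum) = timePerCallNanos(xs, ys, 1000)(naiveSquaredDistance)
    println(f"naive baseline: $ns%.1f ns/call (checksum $sum%.3f)")
  }
}
```

To measure Breeze itself, the same harness could presumably be fed vectors built up front from the arrays (for example with new DenseVector(arr)) and breeze.linalg.squaredDistance as f, run once with 0.10 and once with 0.11 on the classpath. Whether the regression shows up with DenseVector or only with the generic Vector type that the example uses is something the test would have to establish. For numbers solid enough to attach to a Breeze report, a proper harness such as JMH is probably a better choice than this loop.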