[ 
https://issues.apache.org/jira/browse/SPARK-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353854#comment-14353854
 ] 

Sean Owen commented on SPARK-6234:
----------------------------------

No, the thing that's not important here is the example implementation. It is 
not an example of using K-means in MLlib; it is a completely separate, de novo 
implementation of K-means, provided as an example of using *Spark*.

I don't know why Breeze, or something that uses it, would be slower, though. 
The only thing here doing any serious computation is squaredDistance. That did 
change in 0.11:

https://github.com/scalanlp/breeze/commit/5c26a9bceb1fbd621421fa459e1b1202e91f5e9b#diff-e9531f2d5b65b7140b75c0b1c4dab541

If you have the energy, a tightly focused test case on this method that shows a 
performance hit would be useful to report against Breeze.
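
As a starting point, such a test case might look like the rough sketch below. It 
is only an illustration under stated assumptions: the object name 
SquaredDistanceBench, the vector dimension (10), and the iteration counts are 
all made up here, and a hand-rolled timing loop is used instead of a proper 
benchmark harness. It uses DenseVector[Double] arguments; if the example passes 
a different vector type, the test should be adjusted to match. The idea would be 
to build it once against Breeze 0.10 and once against 0.11 and compare the 
per-call time.

```scala
import breeze.linalg.{DenseVector, squaredDistance}
import scala.util.Random

// Hypothetical micro-benchmark; names and sizes are illustrative assumptions.
object SquaredDistanceBench {

  // Build a dense vector of random doubles with the given dimension.
  def randomVector(dim: Int, rng: Random): DenseVector[Double] =
    new DenseVector(Array.fill(dim)(rng.nextDouble()))

  // Call squaredDistance `iterations` times; return (elapsed nanoseconds, checksum).
  // Returning the checksum makes it harder for the JIT to drop the calls as dead code.
  def run(iterations: Int,
          a: DenseVector[Double],
          b: DenseVector[Double]): (Long, Double) = {
    var checksum = 0.0
    var i = 0
    val start = System.nanoTime()
    while (i < iterations) {
      checksum += squaredDistance(a, b)
      i += 1
    }
    (System.nanoTime() - start, checksum)
  }

  def main(args: Array[String]): Unit = {
    val rng = new Random(42) // fixed seed so both runs see the same data
    val a = randomVector(10, rng)
    val b = randomVector(10, rng)

    run(1000000, a, b) // warm-up pass so the hot loop is compiled before timing

    val iterations = 10000000
    val (nanos, checksum) = run(iterations, a, b)
    println(f"${nanos.toDouble / iterations}%.1f ns per call (checksum $checksum%.3f)")
  }
}
```

Numbers from a loop like this are noisy, so several runs per Breeze version 
(or a harness such as JMH) would make a reported difference more convincing.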

All in all, I think the positives of 0.11 outweigh the negatives, but this 
downside, if it is confirmed, was not expected. If so, it may affect more than 
just this example.

> 10% Performance regression with Breeze upgrade
> ----------------------------------------------
>
>                 Key: SPARK-6234
>                 URL: https://issues.apache.org/jira/browse/SPARK-6234
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Nishkam Ravi
>
> KMeans regresses by 10% with the Breeze upgrade from 0.10 to 0.11



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
