Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14937 @srowen Yes, I'm working on this. You can see the performance test results in the PR description. The optimized k-means achieves roughly a 2-4x speedup by using native BLAS level 3 matrix-matrix multiplications for dense input. However, we saw performance degradation for sparse input: for example, the new implementation took almost twice as long as the old one when training a k-means model on the well-known MNIST data set. Given the current performance test results, I think we should apply this optimization only to dense input and let sparse input continue to run the old code. I have sent the performance test results to @mengxr and am waiting for his opinion. I would also appreciate your thoughts and suggestions. Thanks!
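To illustrate why batching points into a matrix-matrix multiply helps for dense input, here is a minimal NumPy sketch (not the PR's actual Scala code, just the underlying idea): pairwise squared distances can be written as ||x||^2 + ||c||^2 - 2 x.c, where the cross term for all points at once is a single GEMM (BLAS level 3) instead of many point-at-a-time operations. For sparse input, materializing this computation densely is what can erase the win.

```python
import numpy as np

def pairwise_sq_dists_gemm(X, C):
    # ||x - c||^2 = ||x||^2 + ||c||^2 - 2 * (x . c)
    # X @ C.T computes all cross terms in one matrix-matrix multiply
    # (BLAS level 3 GEMM), which is the source of the dense-input speedup.
    x_sq = np.sum(X * X, axis=1)[:, None]   # (n, 1)
    c_sq = np.sum(C * C, axis=1)[None, :]   # (1, k)
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.maximum(x_sq + c_sq - 2.0 * (X @ C.T), 0.0)

def pairwise_sq_dists_loop(X, C):
    # Point-at-a-time baseline: the same math, but done one vector pair
    # at a time (BLAS level 1 style work), with no GEMM batching.
    n, k = X.shape[0], C.shape[0]
    out = np.empty((n, k))
    for i in range(n):
        for j in range(k):
            diff = X[i] - C[j]
            out[i, j] = diff @ diff
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))   # 100 points in 8 dimensions
C = rng.standard_normal((5, 8))    # 5 cluster centers
assert np.allclose(pairwise_sq_dists_gemm(X, C),
                   pairwise_sq_dists_loop(X, C))
```

The two functions agree numerically; the GEMM form simply lets an optimized BLAS do the bulk of the work in one call, which is why the speedup shows up for dense matrices but not for sparse ones.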