GitHub user sethah commented on the issue: https://github.com/apache/spark/pull/14937

@yanboliang I ran some tests on dense synthetic data on a 3-node bare-metal cluster (144 cores, 384 GB RAM). I installed OpenBLAS customized for the hardware on the nodes. I can confirm it is successfully using NativeBLAS, though I'm not positive it is optimized. With this patch, I was initially seeing iteration times of around 10 minutes, compared to ~30 seconds on the master branch. After refactoring the code to avoid some copying, I am still seeing about a 3-5x slowdown with this approach.

I am still working through some of the timings, and I haven't experimented much with the block size. I will give more details later. For now, I can point out that copying the center [here](https://github.com/yanboliang/spark/blob/1c31cda0f78b8c2b11406d76da447e9b3216a97d/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala#L379) seems to have a huge impact.
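To make the point about copying concrete, here is a minimal plain-Scala sketch of the general pattern, not the code from the patch. The object `CenterCopySketch` and its methods are hypothetical names invented for illustration, and the per-point `clone()` is only an assumed stand-in for whatever copy happens at the linked line. It contrasts a nearest-center search that clones each center for every point with one that reads the centers in place.

```scala
object CenterCopySketch {
  // Squared Euclidean distance between two dense vectors of equal length.
  def sqDist(a: Array[Double], b: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      val d = a(i) - b(i)
      sum += d * d
      i += 1
    }
    sum
  }

  // Hypothetical costly variant: allocates a fresh copy of every center
  // for every point, so allocations grow as points * centers.
  def closestWithCopy(points: Array[Array[Double]],
                      centers: Array[Array[Double]]): Array[Int] =
    points.map { p =>
      centers.indices.minBy(j => sqDist(p, centers(j).clone()))
    }

  // Same search, reading the centers in place with no per-point copy.
  def closestNoCopy(points: Array[Array[Double]],
                    centers: Array[Array[Double]]): Array[Int] =
    points.map { p =>
      centers.indices.minBy(j => sqDist(p, centers(j)))
    }

  // Wall-clock time of a block in milliseconds (coarse, single run).
  def timeMs[A](body: => A): Double = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start) / 1e6
  }

  def main(args: Array[String]): Unit = {
    val rng = new scala.util.Random(42)
    val dim = 100
    val points = Array.fill(20000)(Array.fill(dim)(rng.nextDouble()))
    val centers = Array.fill(50)(Array.fill(dim)(rng.nextDouble()))

    println(f"with copy: ${timeMs(closestWithCopy(points, centers))}%.1f ms")
    println(f"no copy:   ${timeMs(closestNoCopy(points, centers))}%.1f ms")
  }
}
```

Both variants return the same assignments; only the allocation behaviour differs. A single timed run like this is a rough indication at best, so any numbers it prints should not be read as a substitute for measurements on the actual cluster.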