GitHub user KyleLi1985 opened a pull request:

    https://github.com/apache/spark/pull/22893

    One part of Spark MLlib Kmean Logic Performance problem

    [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic Performance problem
    
    ## What changes were proposed in this pull request?
    Reduce the low-performance logic in the function fastSquaredDistance.
    
    ## How was this patch tested?
    ./dev/run-tests passes.
    Also timed fastSquaredDistance over 100000000 calls on the following two vectors:
    1 2 3 4 3 4 5 6 7 8 9 0 1 3 4 6 7 4 2 2 5 7 8 9 3 2 3 5 7 8 9 3 3 2 1 1 2 2 9 3 3 4 5
    4 5 2 1 5 6 3 2 1 3 4 6 7 8 9 0 3 2 1 2 3 4 5 6 7 8 5 3 2 1 4 5 6 7 8 4 3 2 4 6 7 8 9
    With the patch applied, the elapsed time drops from 8395 to 5448 milliseconds.
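
The timing test described above can be approximated outside Spark with a small standalone harness. The sketch below is not the benchmark used in this pull request and does not call Spark's fastSquaredDistance; the object name `SquaredDistanceSketch` and all of its helpers are hypothetical. It assumes that the comparison of interest is between a direct squared Euclidean distance and a norm-based form that reuses precomputed norms, and it uses a smaller default iteration count than the 100000000 reported here.

```scala
object SquaredDistanceSketch {

  // Parse a space-separated list of numbers into a dense vector.
  def parse(s: String): Array[Double] = s.trim.split("\\s+").map(_.toDouble)

  // Plain dot product of two equal-length vectors.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      sum += a(i) * b(i)
      i += 1
    }
    sum
  }

  // Euclidean (L2) norm of a vector.
  def norm(a: Array[Double]): Double = math.sqrt(dot(a, a))

  // Direct form: sum of (a_i - b_i)^2.
  def direct(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      val d = a(i) - b(i)
      sum += d * d
      i += 1
    }
    sum
  }

  // Norm-based form: ||a||^2 + ||b||^2 - 2 (a . b), with norms supplied by the caller.
  def normBased(a: Array[Double], normA: Double, b: Array[Double], normB: Double): Double =
    normA * normA + normB * normB - 2.0 * dot(a, b)

  // Call f `iterations` times; return (elapsed milliseconds, sum of the results).
  // Summing the results means every call contributes to a value that is used.
  def timeMillis(iterations: Int)(f: () => Double): (Long, Double) = {
    val start = System.nanoTime()
    var acc = 0.0
    var i = 0
    while (i < iterations) {
      acc += f()
      i += 1
    }
    ((System.nanoTime() - start) / 1000000L, acc)
  }

  def main(args: Array[String]): Unit = {
    // The two test vectors quoted in the pull request description.
    val a = parse("1 2 3 4 3 4 5 6 7 8 9 0 1 3 4 6 7 4 2 2 5 7 8 9 3 2 3 5 7 8 9 3 3 2 1 1 2 2 9 3 3 4 5")
    val b = parse("4 5 2 1 5 6 3 2 1 3 4 6 7 8 9 0 3 2 1 2 3 4 5 6 7 8 5 3 2 1 4 5 6 7 8 4 3 2 4 6 7 8 9")
    val normA = norm(a)
    val normB = norm(b)

    // Assumption: 1000000 iterations for a quick run; the PR reports 100000000.
    val iterations = 1000000

    val (directMs, directSum) = timeMillis(iterations)(() => direct(a, b))
    val (normMs, normSum) = timeMillis(iterations)(() => normBased(a, normA, b, normB))

    println(s"direct:     $directMs ms (checksum $directSum)")
    println(s"norm-based: $normMs ms (checksum $normSum)")
  }
}
```

Timings from a loop like this depend heavily on JVM warm-up and JIT behavior, so they are only a rough indication; a dedicated harness such as JMH would give more trustworthy numbers.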


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/KyleLi1985/spark updatekmeanpatch

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22893.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22893
    
----
commit 701223b39a0c473de865de30b0017af4883441f3
Author: 李亮 <liang.li.work@...>
Date:   2018-10-30T11:03:02Z

    upgrade kmean performance

----

