Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/14937
  
    @sethah I think the test result can be reproduced against the current patch; however, there are two issues that should be considered:
    * Make sure you have an optimized/native BLAS installed on your system and that it is loaded correctly into the JVM via netlib-java; otherwise it will fall back to the pure Java implementation. A quick way to check this is sketched after the code example below.
    * Make sure you load the dataset as DenseVectors, which will be converted into a DenseMatrix and benefit from the performance improvement, for example:
    
    ```Scala
    val df = spark.read.format("libsvm").options(Map("vectorType" -> "dense")).load(path)
    ```
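
    As a quick sanity check (just a sketch, assuming the default `features` column produced by the libsvm data source), you can confirm that the loaded column actually holds `DenseVector` instances:
    ```Scala
    import org.apache.spark.ml.linalg.{DenseVector, Vector}

    // Inspect the first row's feature vector; with vectorType=dense this
    // should print "DenseVector", otherwise "SparseVector".
    val firstFeatures = df.select("features").head.getAs[Vector](0)
    println(firstFeatures.getClass.getSimpleName)
    ```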
    Spark loads datasets in libsvm format as SparseVector/SparseMatrix by default, which falls into the sparse-data processing branch and causes a huge performance degradation.
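
    Regarding the first point, a minimal way to check which BLAS backend netlib-java actually loaded (assuming Spark's `com.github.fommil.netlib` dependency is on the classpath) is:
    ```Scala
    import com.github.fommil.netlib.BLAS

    // A native backend typically reports NativeSystemBLAS or NativeRefBLAS;
    // F2jBLAS means the pure Java fallback is being used.
    println(BLAS.getInstance().getClass.getName)
    ```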
    
    Could you share some of your test details? If you have already taken the above two tips into account, please let me know as well. I'm on business travel and will resolve the merge conflicts in a few days. I would really appreciate hearing your thoughts on this issue. Thanks.

