[ https://issues.apache.org/jira/browse/SPARK-21305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074275#comment-16074275 ]

Yanbo Liang commented on SPARK-21305:
-------------------------------------

{quote}
Take, for example, the ALS recommendForAll method before Spark 2.2, which uses 
BLAS gemm for matrix multiplication. 
If you only benchmark the matrix multiplication itself, native BLAS gemm 
(like Intel MKL or OpenBLAS) is about 10X faster than the netlib-java F2j BLAS 
gemm. But if you measure the end-to-end performance of the Spark job, F2j is 
much faster than native BLAS, which is very interesting.
{quote}
I saw a similar issue when I was optimizing {{KMeans}} with BLAS gemm for 
matrix multiplication. My test environment was macOS with native BLAS enabled, 
but I didn't have enough time to study it in depth. [~peng.m...@intel.com] 
Could you share some performance test results so we can understand this issue 
better? Thanks.
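
To make the micro-benchmark vs. end-to-end gap easier to reproduce, here is a 
minimal sketch (not from the ticket; it assumes netlib-java on the classpath, 
the same interface Spark's mllib linalg layer goes through) that times one 
dgemm call with the pure-JVM F2j implementation and with whatever BLAS 
netlib-java loads:

{code:scala}
import com.github.fommil.netlib.{BLAS, F2jBLAS}

object GemmMicroBenchmark {
  // 2048 x 2048 is large enough that the gemm call dominates timing overhead.
  val n = 2048

  def timeDgemm(blas: BLAS): Double = {
    val rnd = new scala.util.Random(42)
    val a = Array.fill(n * n)(rnd.nextDouble())
    val b = Array.fill(n * n)(rnd.nextDouble())
    val c = new Array[Double](n * n)
    val start = System.nanoTime()
    // C := 1.0 * A * B + 0.0 * C, column-major, no transpose
    blas.dgemm("N", "N", n, n, n, 1.0, a, n, b, n, 0.0, c, n)
    (System.nanoTime() - start) / 1e9
  }

  def main(args: Array[String]): Unit = {
    val f2j    = new F2jBLAS            // pure-JVM reference implementation
    val loaded = BLAS.getInstance()     // native BLAS if loadable, otherwise F2j
    println(s"loaded implementation: ${loaded.getClass.getName}")
    println(f"F2j    dgemm: ${timeDgemm(f2j)}%.3f s")
    println(f"loaded dgemm: ${timeDgemm(loaded)}%.3f s")
  }
}
{code}

In an isolated test like this the native library usually wins by a large 
margin (the ~10X mentioned in the description), so the end-to-end slowdown 
inside a Spark job points at something else, e.g. the thread oversubscription 
discussed below.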

> The BKM (best known methods) of using native BLAS to improve ML/MLlib 
> performance
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-21305
>                 URL: https://issues.apache.org/jira/browse/SPARK-21305
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML, MLlib
>    Affects Versions: 2.3.0
>            Reporter: Peng Meng
>            Priority: Critical
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> Many ML/MLlib algorithms use native BLAS (like Intel MKL, ATLAS, or OpenBLAS) 
> to improve performance. 
> How native BLAS is used matters a great deal for performance; quite often, 
> native BLAS even causes worse performance.
> Take, for example, the ALS recommendForAll method before Spark 2.2, which 
> uses BLAS gemm for matrix multiplication. 
> If you only benchmark the matrix multiplication itself, native BLAS gemm 
> (like Intel MKL or OpenBLAS) is about 10X faster than the netlib-java F2j 
> BLAS gemm. But if you measure the end-to-end performance of the Spark job, 
> F2j is much faster than native BLAS, which is very interesting.
> I spent a lot of time on this problem and found that we should not use a 
> multi-threaded native BLAS (like OpenBLAS or Intel MKL) without any 
> configuration. By default, these native BLAS libraries enable 
> multi-threading, which conflicts with the Spark executor's own task threads. 
> You can still use a multi-threaded native BLAS, but it is better to disable 
> its multi-threading first.
> https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded
> https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications
> I think we should add some comments about this in docs/ml-guide.md first. 
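
For reference, here is one minimal way (my sketch, not part of the original 
description) to apply the "disable multi-threading first" advice above: 
forward OPENBLAS_NUM_THREADS / MKL_NUM_THREADS to the executors through 
Spark's documented spark.executorEnv.* configuration so the native library 
runs single-threaded inside each executor JVM:

{code:scala}
import org.apache.spark.sql.SparkSession

object SingleThreadedNativeBlas {
  def main(args: Array[String]): Unit = {
    // Pin OpenBLAS / MKL to one thread per executor so the native library's
    // internal thread pool does not oversubscribe cores that Spark's own task
    // threads are already using. spark.executorEnv.* forwards environment
    // variables to every executor process at launch time.
    val spark = SparkSession.builder()
      .appName("native-blas-single-thread")
      .config("spark.executorEnv.OPENBLAS_NUM_THREADS", "1")
      .config("spark.executorEnv.MKL_NUM_THREADS", "1")
      .getOrCreate()

    // ... run the BLAS-heavy job (e.g. ALS recommendForAll or KMeans) here ...

    spark.stop()
  }
}
{code}

The same variables would also have to be exported in the driver's environment 
if BLAS is called on the driver side; they can equally be set in 
conf/spark-env.sh or via --conf on the spark-submit command line.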


