[spark] branch branch-3.0 updated: [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 8cfb718 [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs 8cfb718 is described below commit 8cfb7183865c5358a547ec892f10d4f1350300ff Author: Xiaochang Wu AuthorDate: Tue Jul 28 08:36:11 2020 -0700 [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs ### What changes were proposed in this pull request? Rewrite a clearer and complete BLAS native acceleration enabling guide. ### Why are the changes needed? The document of enabling BLAS native acceleration in ML guide (https://spark.apache.org/docs/latest/ml-guide.html#dependencies) is incomplete and unclear to the user. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A Closes #29139 from xwu99/blas-doc. Lead-authored-by: Xiaochang Wu Co-authored-by: Wu, Xiaochang Signed-off-by: Huaxin Gao (cherry picked from commit 44c868b73a7cb293ec81927c28991677bf33ea90) Signed-off-by: Huaxin Gao --- docs/ml-guide.md| 22 +++ docs/ml-linalg-guide.md | 103 2 files changed, 109 insertions(+), 16 deletions(-) diff --git a/docs/ml-guide.md b/docs/ml-guide.md index ddce98b..1b4a3e4 100644 --- a/docs/ml-guide.md +++ b/docs/ml-guide.md @@ -62,23 +62,13 @@ The primary Machine Learning API for Spark is now the [DataFrame](sql-programmin # Dependencies -MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/), which depends on -[netlib-java](https://github.com/fommil/netlib-java) for optimised numerical processing. -If native libraries[^1] are not available at runtime, you will see a warning message and a pure JVM -implementation will be used instead. +MLlib uses linear algebra packages [Breeze](http://www.scalanlp.org/) and [netlib-java](https://github.com/fommil/netlib-java) for optimised numerical processing[^1]. Those packages may call native acceleration libraries such as [Intel MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html) or [OpenBLAS](http://www.openblas.net) if they are available as system libraries or in runtime library paths. -Due to licensing issues with runtime proprietary binaries, we do not include `netlib-java`'s native -proxies by default. -To configure `netlib-java` / Breeze to use system optimised binaries, include -`com.github.fommil.netlib:all:1.1.2` (or build Spark with `-Pnetlib-lgpl`) as a dependency of your -project and read the [netlib-java](https://github.com/fommil/netlib-java) documentation for your -platform's additional installation instructions. - -The most popular native BLAS such as [Intel MKL](https://software.intel.com/en-us/mkl), [OpenBLAS](http://www.openblas.net), can use multiple threads in a single operation, which can conflict with Spark's execution model. - -Configuring these BLAS implementations to use a single thread for operations may actually improve performance (see [SPARK-21305](https://issues.apache.org/jira/browse/SPARK-21305)). It is usually optimal to match this to the number of cores each Spark task is configured to use, which is 1 by default and typically left at 1. - -Please refer to resources like the following to understand how to configure the number of threads these BLAS implementations use: [Intel MKL](https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications) or [Intel oneMKL](https://software.intel.com/en-us/onemkl-linux-developer-guide-improving-performance-with-threading) and [OpenBLAS](https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded). Note that if nativeBLAS is n [...] +Due to differing OSS licenses, `netlib-java`'s native proxies can't be distributed with Spark. See [MLlib Linear Algebra Acceleration Guide](ml-linalg-guide.html) for how to enable accelerated linear algebra processing. If accelerated native libraries are not enabled, you will see a warning message like below and a pure JVM implementation will be used instead: +``` +WARN BLAS: Failed to load implementation from:com.github.fommil.netlib.NativeSystemBLAS +WARN BLAS: Failed to load implementation from:com.github.fommil.netlib.NativeRefBLAS +``` To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4 or newer. diff --git a/docs/ml-linalg-guide.md b/docs/ml-linalg-guide.md new file mode 100644 index 000..7390913 --- /dev/null +++ b/docs/ml-linalg-guide.md @@ -0,0 +1,103 @@ +--- +layout: global +title: MLlib Linear Algebra Acceleration Guide +displayTitle: MLlib Linear Algebra Acceleration Guide +license: | + Licensed to the Apache Software Foundation
[spark] branch branch-3.0 updated: [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 8cfb718 [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs 8cfb718 is described below commit 8cfb7183865c5358a547ec892f10d4f1350300ff Author: Xiaochang Wu AuthorDate: Tue Jul 28 08:36:11 2020 -0700 [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs ### What changes were proposed in this pull request? Rewrite a clearer and complete BLAS native acceleration enabling guide. ### Why are the changes needed? The document of enabling BLAS native acceleration in ML guide (https://spark.apache.org/docs/latest/ml-guide.html#dependencies) is incomplete and unclear to the user. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A Closes #29139 from xwu99/blas-doc. Lead-authored-by: Xiaochang Wu Co-authored-by: Wu, Xiaochang Signed-off-by: Huaxin Gao (cherry picked from commit 44c868b73a7cb293ec81927c28991677bf33ea90) Signed-off-by: Huaxin Gao --- docs/ml-guide.md| 22 +++ docs/ml-linalg-guide.md | 103 2 files changed, 109 insertions(+), 16 deletions(-) diff --git a/docs/ml-guide.md b/docs/ml-guide.md index ddce98b..1b4a3e4 100644 --- a/docs/ml-guide.md +++ b/docs/ml-guide.md @@ -62,23 +62,13 @@ The primary Machine Learning API for Spark is now the [DataFrame](sql-programmin # Dependencies -MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/), which depends on -[netlib-java](https://github.com/fommil/netlib-java) for optimised numerical processing. -If native libraries[^1] are not available at runtime, you will see a warning message and a pure JVM -implementation will be used instead. +MLlib uses linear algebra packages [Breeze](http://www.scalanlp.org/) and [netlib-java](https://github.com/fommil/netlib-java) for optimised numerical processing[^1]. Those packages may call native acceleration libraries such as [Intel MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html) or [OpenBLAS](http://www.openblas.net) if they are available as system libraries or in runtime library paths. -Due to licensing issues with runtime proprietary binaries, we do not include `netlib-java`'s native -proxies by default. -To configure `netlib-java` / Breeze to use system optimised binaries, include -`com.github.fommil.netlib:all:1.1.2` (or build Spark with `-Pnetlib-lgpl`) as a dependency of your -project and read the [netlib-java](https://github.com/fommil/netlib-java) documentation for your -platform's additional installation instructions. - -The most popular native BLAS such as [Intel MKL](https://software.intel.com/en-us/mkl), [OpenBLAS](http://www.openblas.net), can use multiple threads in a single operation, which can conflict with Spark's execution model. - -Configuring these BLAS implementations to use a single thread for operations may actually improve performance (see [SPARK-21305](https://issues.apache.org/jira/browse/SPARK-21305)). It is usually optimal to match this to the number of cores each Spark task is configured to use, which is 1 by default and typically left at 1. - -Please refer to resources like the following to understand how to configure the number of threads these BLAS implementations use: [Intel MKL](https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications) or [Intel oneMKL](https://software.intel.com/en-us/onemkl-linux-developer-guide-improving-performance-with-threading) and [OpenBLAS](https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded). Note that if nativeBLAS is n [...] +Due to differing OSS licenses, `netlib-java`'s native proxies can't be distributed with Spark. See [MLlib Linear Algebra Acceleration Guide](ml-linalg-guide.html) for how to enable accelerated linear algebra processing. If accelerated native libraries are not enabled, you will see a warning message like below and a pure JVM implementation will be used instead: +``` +WARN BLAS: Failed to load implementation from:com.github.fommil.netlib.NativeSystemBLAS +WARN BLAS: Failed to load implementation from:com.github.fommil.netlib.NativeRefBLAS +``` To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4 or newer. diff --git a/docs/ml-linalg-guide.md b/docs/ml-linalg-guide.md new file mode 100644 index 000..7390913 --- /dev/null +++ b/docs/ml-linalg-guide.md @@ -0,0 +1,103 @@ +--- +layout: global +title: MLlib Linear Algebra Acceleration Guide +displayTitle: MLlib Linear Algebra Acceleration Guide +license: | + Licensed to the Apache Software Foundation
[spark] branch branch-3.0 updated: [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 8cfb718 [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs 8cfb718 is described below commit 8cfb7183865c5358a547ec892f10d4f1350300ff Author: Xiaochang Wu AuthorDate: Tue Jul 28 08:36:11 2020 -0700 [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs ### What changes were proposed in this pull request? Rewrite a clearer and complete BLAS native acceleration enabling guide. ### Why are the changes needed? The document of enabling BLAS native acceleration in ML guide (https://spark.apache.org/docs/latest/ml-guide.html#dependencies) is incomplete and unclear to the user. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A Closes #29139 from xwu99/blas-doc. Lead-authored-by: Xiaochang Wu Co-authored-by: Wu, Xiaochang Signed-off-by: Huaxin Gao (cherry picked from commit 44c868b73a7cb293ec81927c28991677bf33ea90) Signed-off-by: Huaxin Gao --- docs/ml-guide.md| 22 +++ docs/ml-linalg-guide.md | 103 2 files changed, 109 insertions(+), 16 deletions(-) diff --git a/docs/ml-guide.md b/docs/ml-guide.md index ddce98b..1b4a3e4 100644 --- a/docs/ml-guide.md +++ b/docs/ml-guide.md @@ -62,23 +62,13 @@ The primary Machine Learning API for Spark is now the [DataFrame](sql-programmin # Dependencies -MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/), which depends on -[netlib-java](https://github.com/fommil/netlib-java) for optimised numerical processing. -If native libraries[^1] are not available at runtime, you will see a warning message and a pure JVM -implementation will be used instead. +MLlib uses linear algebra packages [Breeze](http://www.scalanlp.org/) and [netlib-java](https://github.com/fommil/netlib-java) for optimised numerical processing[^1]. Those packages may call native acceleration libraries such as [Intel MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html) or [OpenBLAS](http://www.openblas.net) if they are available as system libraries or in runtime library paths. -Due to licensing issues with runtime proprietary binaries, we do not include `netlib-java`'s native -proxies by default. -To configure `netlib-java` / Breeze to use system optimised binaries, include -`com.github.fommil.netlib:all:1.1.2` (or build Spark with `-Pnetlib-lgpl`) as a dependency of your -project and read the [netlib-java](https://github.com/fommil/netlib-java) documentation for your -platform's additional installation instructions. - -The most popular native BLAS such as [Intel MKL](https://software.intel.com/en-us/mkl), [OpenBLAS](http://www.openblas.net), can use multiple threads in a single operation, which can conflict with Spark's execution model. - -Configuring these BLAS implementations to use a single thread for operations may actually improve performance (see [SPARK-21305](https://issues.apache.org/jira/browse/SPARK-21305)). It is usually optimal to match this to the number of cores each Spark task is configured to use, which is 1 by default and typically left at 1. - -Please refer to resources like the following to understand how to configure the number of threads these BLAS implementations use: [Intel MKL](https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications) or [Intel oneMKL](https://software.intel.com/en-us/onemkl-linux-developer-guide-improving-performance-with-threading) and [OpenBLAS](https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded). Note that if nativeBLAS is n [...] +Due to differing OSS licenses, `netlib-java`'s native proxies can't be distributed with Spark. See [MLlib Linear Algebra Acceleration Guide](ml-linalg-guide.html) for how to enable accelerated linear algebra processing. If accelerated native libraries are not enabled, you will see a warning message like below and a pure JVM implementation will be used instead: +``` +WARN BLAS: Failed to load implementation from:com.github.fommil.netlib.NativeSystemBLAS +WARN BLAS: Failed to load implementation from:com.github.fommil.netlib.NativeRefBLAS +``` To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4 or newer. diff --git a/docs/ml-linalg-guide.md b/docs/ml-linalg-guide.md new file mode 100644 index 000..7390913 --- /dev/null +++ b/docs/ml-linalg-guide.md @@ -0,0 +1,103 @@ +--- +layout: global +title: MLlib Linear Algebra Acceleration Guide +displayTitle: MLlib Linear Algebra Acceleration Guide +license: | + Licensed to the Apache Software Foundation
[spark] branch branch-3.0 updated: [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 8cfb718 [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs 8cfb718 is described below commit 8cfb7183865c5358a547ec892f10d4f1350300ff Author: Xiaochang Wu AuthorDate: Tue Jul 28 08:36:11 2020 -0700 [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs ### What changes were proposed in this pull request? Rewrite a clearer and complete BLAS native acceleration enabling guide. ### Why are the changes needed? The document of enabling BLAS native acceleration in ML guide (https://spark.apache.org/docs/latest/ml-guide.html#dependencies) is incomplete and unclear to the user. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A Closes #29139 from xwu99/blas-doc. Lead-authored-by: Xiaochang Wu Co-authored-by: Wu, Xiaochang Signed-off-by: Huaxin Gao (cherry picked from commit 44c868b73a7cb293ec81927c28991677bf33ea90) Signed-off-by: Huaxin Gao --- docs/ml-guide.md| 22 +++ docs/ml-linalg-guide.md | 103 2 files changed, 109 insertions(+), 16 deletions(-) diff --git a/docs/ml-guide.md b/docs/ml-guide.md index ddce98b..1b4a3e4 100644 --- a/docs/ml-guide.md +++ b/docs/ml-guide.md @@ -62,23 +62,13 @@ The primary Machine Learning API for Spark is now the [DataFrame](sql-programmin # Dependencies -MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/), which depends on -[netlib-java](https://github.com/fommil/netlib-java) for optimised numerical processing. -If native libraries[^1] are not available at runtime, you will see a warning message and a pure JVM -implementation will be used instead. +MLlib uses linear algebra packages [Breeze](http://www.scalanlp.org/) and [netlib-java](https://github.com/fommil/netlib-java) for optimised numerical processing[^1]. Those packages may call native acceleration libraries such as [Intel MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html) or [OpenBLAS](http://www.openblas.net) if they are available as system libraries or in runtime library paths. -Due to licensing issues with runtime proprietary binaries, we do not include `netlib-java`'s native -proxies by default. -To configure `netlib-java` / Breeze to use system optimised binaries, include -`com.github.fommil.netlib:all:1.1.2` (or build Spark with `-Pnetlib-lgpl`) as a dependency of your -project and read the [netlib-java](https://github.com/fommil/netlib-java) documentation for your -platform's additional installation instructions. - -The most popular native BLAS such as [Intel MKL](https://software.intel.com/en-us/mkl), [OpenBLAS](http://www.openblas.net), can use multiple threads in a single operation, which can conflict with Spark's execution model. - -Configuring these BLAS implementations to use a single thread for operations may actually improve performance (see [SPARK-21305](https://issues.apache.org/jira/browse/SPARK-21305)). It is usually optimal to match this to the number of cores each Spark task is configured to use, which is 1 by default and typically left at 1. - -Please refer to resources like the following to understand how to configure the number of threads these BLAS implementations use: [Intel MKL](https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications) or [Intel oneMKL](https://software.intel.com/en-us/onemkl-linux-developer-guide-improving-performance-with-threading) and [OpenBLAS](https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded). Note that if nativeBLAS is n [...] +Due to differing OSS licenses, `netlib-java`'s native proxies can't be distributed with Spark. See [MLlib Linear Algebra Acceleration Guide](ml-linalg-guide.html) for how to enable accelerated linear algebra processing. If accelerated native libraries are not enabled, you will see a warning message like below and a pure JVM implementation will be used instead: +``` +WARN BLAS: Failed to load implementation from:com.github.fommil.netlib.NativeSystemBLAS +WARN BLAS: Failed to load implementation from:com.github.fommil.netlib.NativeRefBLAS +``` To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4 or newer. diff --git a/docs/ml-linalg-guide.md b/docs/ml-linalg-guide.md new file mode 100644 index 000..7390913 --- /dev/null +++ b/docs/ml-linalg-guide.md @@ -0,0 +1,103 @@ +--- +layout: global +title: MLlib Linear Algebra Acceleration Guide +displayTitle: MLlib Linear Algebra Acceleration Guide +license: | + Licensed to the Apache Software Foundation
[spark] branch branch-3.0 updated: [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 8cfb718 [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs 8cfb718 is described below commit 8cfb7183865c5358a547ec892f10d4f1350300ff Author: Xiaochang Wu AuthorDate: Tue Jul 28 08:36:11 2020 -0700 [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs ### What changes were proposed in this pull request? Rewrite a clearer and complete BLAS native acceleration enabling guide. ### Why are the changes needed? The document of enabling BLAS native acceleration in ML guide (https://spark.apache.org/docs/latest/ml-guide.html#dependencies) is incomplete and unclear to the user. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A Closes #29139 from xwu99/blas-doc. Lead-authored-by: Xiaochang Wu Co-authored-by: Wu, Xiaochang Signed-off-by: Huaxin Gao (cherry picked from commit 44c868b73a7cb293ec81927c28991677bf33ea90) Signed-off-by: Huaxin Gao --- docs/ml-guide.md| 22 +++ docs/ml-linalg-guide.md | 103 2 files changed, 109 insertions(+), 16 deletions(-) diff --git a/docs/ml-guide.md b/docs/ml-guide.md index ddce98b..1b4a3e4 100644 --- a/docs/ml-guide.md +++ b/docs/ml-guide.md @@ -62,23 +62,13 @@ The primary Machine Learning API for Spark is now the [DataFrame](sql-programmin # Dependencies -MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/), which depends on -[netlib-java](https://github.com/fommil/netlib-java) for optimised numerical processing. -If native libraries[^1] are not available at runtime, you will see a warning message and a pure JVM -implementation will be used instead. +MLlib uses linear algebra packages [Breeze](http://www.scalanlp.org/) and [netlib-java](https://github.com/fommil/netlib-java) for optimised numerical processing[^1]. Those packages may call native acceleration libraries such as [Intel MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html) or [OpenBLAS](http://www.openblas.net) if they are available as system libraries or in runtime library paths. -Due to licensing issues with runtime proprietary binaries, we do not include `netlib-java`'s native -proxies by default. -To configure `netlib-java` / Breeze to use system optimised binaries, include -`com.github.fommil.netlib:all:1.1.2` (or build Spark with `-Pnetlib-lgpl`) as a dependency of your -project and read the [netlib-java](https://github.com/fommil/netlib-java) documentation for your -platform's additional installation instructions. - -The most popular native BLAS such as [Intel MKL](https://software.intel.com/en-us/mkl), [OpenBLAS](http://www.openblas.net), can use multiple threads in a single operation, which can conflict with Spark's execution model. - -Configuring these BLAS implementations to use a single thread for operations may actually improve performance (see [SPARK-21305](https://issues.apache.org/jira/browse/SPARK-21305)). It is usually optimal to match this to the number of cores each Spark task is configured to use, which is 1 by default and typically left at 1. - -Please refer to resources like the following to understand how to configure the number of threads these BLAS implementations use: [Intel MKL](https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications) or [Intel oneMKL](https://software.intel.com/en-us/onemkl-linux-developer-guide-improving-performance-with-threading) and [OpenBLAS](https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded). Note that if nativeBLAS is n [...] +Due to differing OSS licenses, `netlib-java`'s native proxies can't be distributed with Spark. See [MLlib Linear Algebra Acceleration Guide](ml-linalg-guide.html) for how to enable accelerated linear algebra processing. If accelerated native libraries are not enabled, you will see a warning message like below and a pure JVM implementation will be used instead: +``` +WARN BLAS: Failed to load implementation from:com.github.fommil.netlib.NativeSystemBLAS +WARN BLAS: Failed to load implementation from:com.github.fommil.netlib.NativeRefBLAS +``` To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4 or newer. diff --git a/docs/ml-linalg-guide.md b/docs/ml-linalg-guide.md new file mode 100644 index 000..7390913 --- /dev/null +++ b/docs/ml-linalg-guide.md @@ -0,0 +1,103 @@ +--- +layout: global +title: MLlib Linear Algebra Acceleration Guide +displayTitle: MLlib Linear Algebra Acceleration Guide +license: | + Licensed to the Apache Software Foundation