Re: [Numpy-discussion] OpenBLAS on Mac
On 22/02/14 23:39, Sturla Molden wrote:
> Ok, next runner up is Accelerate. Let's see how it compares to OpenBLAS
> and MKL on Mavericks.

It seems Accelerate has roughly the same performance as MKL now.

Did the upgrade to Mavericks do this?

These are the compile lines, in case you wonder:

$ CC -O2 -o perftest_openblas -I/opt/OpenBLAS/include -L/opt/OpenBLAS/lib perftest_openblas.c -lopenblas

$ CC -O2 -o perftest_accelerate perftest_accelerate.c -framework Accelerate

$ source /opt/intel/composer_xe_2013/mkl/bin/mklvars.sh intel64
$ icc -O2 -o perftest_mkl -mkl -static-intel perftest_mkl.c

Sturla

/* The header names did not survive in the archive. These are the ones
   this listing needs; the Accelerate header is assumed from the message. */
#include <stdio.h>
#include <stdlib.h>
#include <mach/mach_time.h>
#include <Accelerate/Accelerate.h>

/* Convert two mach_absolute_time() readings to elapsed nanoseconds. */
double nanodiff(const uint64_t _t0, const uint64_t _t1)
{
    long double t0, t1, numer, denom, nanosec;
    mach_timebase_info_data_t tb_info;
    mach_timebase_info(&tb_info);
    numer = (long double)(tb_info.numer);
    denom = (long double)(tb_info.denom);
    t0 = (long double)(_t0);
    t1 = (long double)(_t1);
    nanosec = (t1 - t0) * numer / denom;
    return (double)nanosec;
}

int main(int argc, char **argv)
{
    long double nanosec;
    int i, n = 512;
    int m = n, k = n;
    double *A = (double*)malloc(n*n*sizeof(double));
    double *B = (double*)malloc(n*n*sizeof(double));
    double *C = (double*)malloc(n*n*sizeof(double));
    uint64_t t0, t1;

    /* Fill the matrices so dgemm does not read indeterminate values. */
    for (i = 0; i < n*n; i++) {
        A[i] = 1.0; B[i] = 1.0; C[i] = 0.0;
    }

    /* Time a single C = 1.0*A*B + 1.0*C call. */
    t0 = mach_absolute_time();
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0, A, k, B, n, 1.0, C, n);
    t1 = mach_absolute_time();

    nanosec = nanodiff(t0, t1);
    printf("elapsed time: %g ns\n", (double)nanosec);

    free(A); free(B); free(C);
    return 0;
}

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
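The programs in this thread print a raw elapsed time in nanoseconds, which is hard to compare across matrix sizes. A minimal sketch of converting such a reading to GFLOP/s, assuming the usual 2*m*n*k operation count for a matrix product; the function names and the example time are hypothetical, not taken from the thread:

```python
def dgemm_flops(m, n, k):
    """Approximate flop count of an m-by-k times k-by-n matrix product.

    Assumes one multiply and one add per inner-product term (2*m*n*k).
    """
    return 2 * m * n * k


def gflops(m, n, k, elapsed_ns):
    """Convert an elapsed time in nanoseconds to GFLOP/s.

    One flop per nanosecond is exactly one GFLOP/s, so no scaling is needed.
    """
    return dgemm_flops(m, n, k) / elapsed_ns


# Hypothetical reading for illustration only; NOT a measurement from the thread.
example_ns = 5.0e6
n = 512  # matrix size used by the C programs in the thread
print(f"{gflops(n, n, n, example_ns):.2f} GFLOP/s")
```

Reporting a rate instead of a time makes runs with different values of n directly comparable.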
Re: [Numpy-discussion] OpenBLAS on Mac
On 22/02/14 22:15, Nathaniel Smith wrote:
>> $ make TARGET=SANDYBRIDGE USE_OPENMP=0 BINARY=64 NOFORTRAN=1
>
> You'll definitely want to disable the affinity support too, and
> probably memory warmup. And possibly increase the maximum thread
> count, unless you'll only use the library on the computer it was built
> on. And maybe other things. The OpenBLAS build process has so many
> ways to accidentally impale yourself, it's an object lesson in why
> building regulations are a good thing.

Thanks for the advice. Right now I am just testing on my own computer.

cblas_dgemm is running roughly 50% faster with OpenBLAS than with MKL 11.1 update 2; sometimes OpenBLAS is twice as fast as MKL.

WTF???

:-D

Ok, next runner up is Accelerate. Let's see how it compares to OpenBLAS and MKL on Mavericks.

Sturla

/* MKL version. The system header names did not survive in the archive;
   these are the ones this listing needs. */
#include <stdio.h>
#include <stdlib.h>
#include <mach/mach_time.h>
#include "mkl.h"

/* Convert two mach_absolute_time() readings to elapsed nanoseconds. */
double nanodiff(const uint64_t _t0, const uint64_t _t1)
{
    long double t0, t1, numer, denom, nanosec;
    mach_timebase_info_data_t tb_info;
    mach_timebase_info(&tb_info);
    numer = (long double)(tb_info.numer);
    denom = (long double)(tb_info.denom);
    t0 = (long double)(_t0);
    t1 = (long double)(_t1);
    nanosec = (t1 - t0) * numer / denom;
    return (double)nanosec;
}

int main(int argc, char **argv)
{
    const int BOUNDARY = 64;  /* alignment in bytes for mkl_malloc */
    long double nanosec;
    int i, n = 512;
    int m = n, k = n;
    double *A = (double*)mkl_malloc(n*n*sizeof(double), BOUNDARY);
    double *B = (double*)mkl_malloc(n*n*sizeof(double), BOUNDARY);
    double *C = (double*)mkl_malloc(n*n*sizeof(double), BOUNDARY);
    uint64_t t0, t1;

    /* Fill the matrices so dgemm does not read indeterminate values. */
    for (i = 0; i < n*n; i++) {
        A[i] = 1.0; B[i] = 1.0; C[i] = 0.0;
    }

    /* Time a single C = 1.0*A*B + 1.0*C call. */
    t0 = mach_absolute_time();
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0, A, k, B, n, 1.0, C, n);
    t1 = mach_absolute_time();

    nanosec = nanodiff(t0, t1);
    printf("elapsed time: %g ns\n", (double)nanosec);

    mkl_free(A); mkl_free(B); mkl_free(C);
    return 0;
}

/* OpenBLAS version. The header names did not survive in the archive;
   these are the ones this listing needs (cblas.h is assumed). */
#include <stdio.h>
#include <stdlib.h>
#include <mach/mach_time.h>
#include <cblas.h>

/* Convert two mach_absolute_time() readings to elapsed nanoseconds. */
double nanodiff(const uint64_t _t0, const uint64_t _t1)
{
    long double t0, t1, numer, denom, nanosec;
    mach_timebase_info_data_t tb_info;
    mach_timebase_info(&tb_info);
    numer = (long double)(tb_info.numer);
    denom = (long double)(tb_info.denom);
    t0 = (long double)(_t0);
    t1 = (long double)(_t1);
    nanosec = (t1 - t0) * numer / denom;
    return (double)nanosec;
}

int main(int argc, char **argv)
{
    long double nanosec;
    int i, n = 512;
    int m = n, k = n;
    double *A = (double*)malloc(n*n*sizeof(double));
    double *B = (double*)malloc(n*n*sizeof(double));
    double *C = (double*)malloc(n*n*sizeof(double));
    uint64_t t0, t1;

    /* Fill the matrices so dgemm does not read indeterminate values. */
    for (i = 0; i < n*n; i++) {
        A[i] = 1.0; B[i] = 1.0; C[i] = 0.0;
    }

    /* Time a single C = 1.0*A*B + 1.0*C call. */
    t0 = mach_absolute_time();
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0, A, k, B, n, 1.0, C, n);
    t1 = mach_absolute_time();

    nanosec = nanodiff(t0, t1);
    printf("elapsed time: %g ns\n", (double)nanosec);

    free(A); free(B); free(C);
    return 0;
}
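Each of the C programs in this thread times exactly one cblas_dgemm call. A single call may also include one-time costs (thread start-up, cold caches), so the number can vary a lot between runs. A minimal sketch of a warm-up plus best-of-N harness, assuming nothing about the BLAS being tested; `best_of` and `workload` are hypothetical names, and `workload` is only a stand-in for the real call:

```python
import time


def best_of(fn, repeats=5, warmup=1):
    """Return the smallest elapsed time, in nanoseconds, over `repeats` calls.

    `warmup` untimed calls run first so one-time set-up costs are excluded.
    """
    for _ in range(warmup):
        fn()
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter_ns()
        fn()
        t1 = time.perf_counter_ns()
        best = min(best, t1 - t0)
    return best


def workload():
    # Stand-in for the operation under test (e.g. a matrix product).
    return sum(i * i for i in range(10000))


if __name__ == "__main__":
    print(f"best of 5: {best_of(workload)} ns")
```

With NumPy installed, passing something like `lambda: a.dot(b)` for two preallocated arrays would time the linked BLAS in the same way.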
Re: [Numpy-discussion] OpenBLAS on Mac
On Sat, Feb 22, 2014 at 3:55 PM, Sturla Molden wrote:
> On 20/02/14 17:57, Jurgen Van Gael wrote:
>> Hi All,
>>
>> I run Mac OS X 10.9.1 and was trying to get OpenBLAS working for numpy.
>> I've downloaded the OpenBLAS source and compiled it (thanks to Olivier
>> Grisel).
>
> How?
>
> $ make TARGET=SANDYBRIDGE USE_OPENMP=0 BINARY=64 NOFORTRAN=1

You'll definitely want to disable the affinity support too, and probably memory warmup. And possibly increase the maximum thread count, unless you'll only use the library on the computer it was built on. And maybe other things. The OpenBLAS build process has so many ways to accidentally impale yourself, it's an object lesson in why building regulations are a good thing.

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
Re: [Numpy-discussion] OpenBLAS on Mac
On 22/02/14 22:00, Robert Kern wrote:
> If you actually want some help, you will have to provide a *little* more
> detail.

$ git clone https://github.com/xianyi/OpenBLAS

Oops...

$ cd OpenBLAS

did the trick. I need some coffee :)

Sturla
Re: [Numpy-discussion] OpenBLAS on Mac
On Sat, Feb 22, 2014 at 8:55 PM, Sturla Molden wrote:
> On 20/02/14 17:57, Jurgen Van Gael wrote:
>> Hi All,
>>
>> I run Mac OS X 10.9.1 and was trying to get OpenBLAS working for numpy.
>> I've downloaded the OpenBLAS source and compiled it (thanks to Olivier
>> Grisel).
>
> How?
>
> $ make TARGET=SANDYBRIDGE USE_OPENMP=0 BINARY=64 NOFORTRAN=1
> make: *** No targets specified and no makefile found. Stop.
>
> (staying with MKL...)

Without any further details about what you downloaded and where you executed this command, one can only assume PEBCAK. There is certainly a Makefile in the root directory of the OpenBLAS source:

https://github.com/xianyi/OpenBLAS

If you actually want some help, you will have to provide a *little* more detail.

--
Robert Kern
Re: [Numpy-discussion] OpenBLAS on Mac
On 20/02/14 17:57, Jurgen Van Gael wrote:
> Hi All,
>
> I run Mac OS X 10.9.1 and was trying to get OpenBLAS working for numpy.
> I've downloaded the OpenBLAS source and compiled it (thanks to Olivier
> Grisel).

How?

$ make TARGET=SANDYBRIDGE USE_OPENMP=0 BINARY=64 NOFORTRAN=1
make: *** No targets specified and no makefile found. Stop.

(staying with MKL...)

Sturla
Re: [Numpy-discussion] OpenBLAS on Mac
Indeed, I just ran the bench on my Mac, and OSX Veclib is more than 2x faster than OpenBLAS on this square matrix multiplication (I just have 2 physical cores on this box). MKL from Canopy Express is slightly slower than OpenBLAS for this GEMM bench on that box.

I really wonder why Veclib is faster in this case. Maybe OSX 10.9 did improve its perf...
Re: [Numpy-discussion] OpenBLAS on Mac
First thing I noticed when installing into /opt/OpenBLAS was that the LAPACK header files were not being copied properly. This was because the OpenBLAS makefile uses the "-D" option in the install command, which the default Mac install doesn't support. A quick "brew install coreutils" solved that problem.

I rebuilt a new virtualenv and rebuilt numpy into it using the OpenBLAS in /opt/OpenBLAS, and things seem to be absolutely fine now. I can run the OpenBLAS version on my Mac. Thanks for the suggestions!

I ran the test:
https://gist.githubusercontent.com/osdf/3842524/raw/df01f7fa9d849bec353d6ab03eae0c1ee68f1538/test_numpy.py

On my MacBook Pro (Intel(R) Core(TM) i7-2720QM CPU @ 2.20GHz, 8GB RAM), disappointingly: the Atlas version gives a consistent 0.02, while the OpenBLAS version runs in 0.1 with OMP_NUM_THREADS=1 and 0.04 with OMP_NUM_THREADS=8.

Happy to run a more extensive test suite if anyone is interested.

Jurgen

On Thu, Feb 20, 2014 at 6:07 PM, Olivier Grisel wrote:
> I have exactly the same setup as yours and it links to OpenBLAS
> correctly (in a venv as well, installed with python setup.py install).
> The only difference is that I installed OpenBLAS in the default
> folder: /opt/OpenBLAS (and I reflected that in site.cfg).
>
> When you run otool -L, is it in your source tree or do you point to
> the numpy/core/_dotblas.so of the site-packages folder of your venv?
>
> If you activate your venv, go to a different folder (e.g. /tmp) and type:
>
> python -c "import numpy as np; np.show_config()"
>
> what do you get? I get:
>
> $ python -c "import numpy as np; np.show_config()"
> lapack_opt_info:
>     libraries = ['openblas', 'openblas']
>     library_dirs = ['/opt/OpenBLAS/lib']
>     language = f77
> blas_opt_info:
>     libraries = ['openblas', 'openblas']
>     library_dirs = ['/opt/OpenBLAS/lib']
>     language = f77
> openblas_info:
>     libraries = ['openblas', 'openblas']
>     library_dirs = ['/opt/OpenBLAS/lib']
>     language = f77
> blas_mkl_info:
>     NOT AVAILABLE
>
> --
> Olivier
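To put the three timings Jurgen reports above side by side, here is a small worked calculation. It is only arithmetic on the numbers in the message (0.02 for Atlas, 0.1 and 0.04 for OpenBLAS with 1 and 8 threads); the message does not state the unit, so only ratios are computed, and the variable names are made up for illustration:

```python
# Timings as quoted in the message; the unit is not stated there.
atlas = 0.02
openblas_1_thread = 0.1
openblas_8_threads = 0.04


def speedup(slow, fast):
    """How many times faster `fast` is than `slow`."""
    return slow / fast


# Gain from going from 1 to 8 OpenBLAS threads.
thread_speedup = speedup(openblas_1_thread, openblas_8_threads)

# Fraction of the ideal 8x that was actually achieved.
parallel_efficiency = thread_speedup / 8

# How far ahead Atlas still is of the 8-thread OpenBLAS run.
atlas_vs_openblas = speedup(openblas_8_threads, atlas)

print(f"OpenBLAS, 8 threads vs 1: {thread_speedup:.2f}x")
print(f"efficiency per thread:    {parallel_efficiency:.1%}")
print(f"Atlas vs OpenBLAS (8 thr): {atlas_vs_openblas:.2f}x")
```

On these numbers, eight threads give well under the ideal eightfold gain, and Atlas remains ahead of the threaded OpenBLAS run.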
Re: [Numpy-discussion] OpenBLAS on Mac
I have exactly the same setup as yours and it links to OpenBLAS correctly (in a venv as well, installed with python setup.py install). The only difference is that I installed OpenBLAS in the default folder: /opt/OpenBLAS (and I reflected that in site.cfg).

When you run otool -L, is it in your source tree or do you point to the numpy/core/_dotblas.so of the site-packages folder of your venv?

If you activate your venv, go to a different folder (e.g. /tmp) and type:

python -c "import numpy as np; np.show_config()"

what do you get? I get:

$ python -c "import numpy as np; np.show_config()"
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/OpenBLAS/lib']
    language = f77
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/OpenBLAS/lib']
    language = f77
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/OpenBLAS/lib']
    language = f77
blas_mkl_info:
    NOT AVAILABLE

--
Olivier
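The message above mentions reflecting the /opt/OpenBLAS location in site.cfg without showing the file. As a rough sketch only: the snippet below generates what such a section might look like, assuming the `[openblas]` section and the `libraries`, `library_dirs` and `include_dirs` keys from NumPy's site.cfg.example of that period. The helper name `openblas_site_cfg` is hypothetical, and this is not necessarily the file Olivier used:

```python
import configparser
import io


def openblas_site_cfg(prefix="/opt/OpenBLAS"):
    """Return the text of a site.cfg section pointing NumPy's build at OpenBLAS.

    The section and key names are an assumption based on NumPy's
    site.cfg.example; check the example file shipped with your NumPy source.
    """
    cfg = configparser.ConfigParser()
    cfg["openblas"] = {
        "libraries": "openblas",
        "library_dirs": f"{prefix}/lib",
        "include_dirs": f"{prefix}/include",
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(openblas_site_cfg())
```

The printed text would go into site.cfg in the root of the NumPy source tree before running python setup.py install; np.show_config() afterwards should then list /opt/OpenBLAS/lib, as in the output quoted above.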