Re: [Numpy-discussion] OpenBLAS on Mac

2014-02-22 Thread Sturla Molden

On 22/02/14 23:39, Sturla Molden wrote:

> Ok, next runner up is Accelerate. Let's see how it compares to OpenBLAS
> and MKL on Mavericks.


It seems Accelerate has roughly the same performance as MKL now.

Did the upgrade to Mavericks do this?



These are the compile lines, in case you wonder:

$ CC -O2 -o perftest_openblas -I/opt/OpenBLAS/include \
    -L/opt/OpenBLAS/lib perftest_openblas.c -lopenblas


$ CC -O2 -o perftest_accelerate perftest_accelerate.c -framework Accelerate

$ source /opt/intel/composer_xe_2013/mkl/bin/mklvars.sh intel64
$ icc -O2 -o perftest_mkl -mkl -static-intel perftest_mkl.c




Sturla





/* Presumably perftest_accelerate.c. The header names were stripped by the
 * list archive; the four below are reconstructed from the calls the
 * program makes. */
#include <stdio.h>
#include <stdlib.h>
#include <mach/mach_time.h>
#include <Accelerate/Accelerate.h>

/* Convert two mach_absolute_time() readings to elapsed nanoseconds. */
double nanodiff(const uint64_t _t0, const uint64_t _t1)
{
    long double t0, t1, numer, denom, nanosec;
    mach_timebase_info_data_t tb_info;
    mach_timebase_info(&tb_info);
    numer = (long double)(tb_info.numer);
    denom = (long double)(tb_info.denom);
    t0 = (long double)(_t0);
    t1 = (long double)(_t1);
    nanosec = (t1 - t0) * numer / denom;
    return (double)nanosec;
}

int main(int argc, char **argv)
{
    long double nanosec;
    int n = 512;
    int m = n, k = n;
    /* The matrices are left uninitialized: only the run time is measured. */
    double *A = (double*)malloc(n*n*sizeof(double));
    double *B = (double*)malloc(n*n*sizeof(double));
    double *C = (double*)malloc(n*n*sizeof(double));
    uint64_t t0, t1;

    t0 = mach_absolute_time();

    /* C = 1.0*A*B + 1.0*C, all matrices n-by-n, row-major. */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0, A, k, B, n, 1.0, C, n);

    t1 = mach_absolute_time();

    nanosec = nanodiff(t0, t1);

    printf("elapsed time: %g ns\n", (double)nanosec);

    free(A); free(B); free(C);
    return 0;
}


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] OpenBLAS on Mac

2014-02-22 Thread Sturla Molden

On 22/02/14 22:15, Nathaniel Smith wrote:

>> $ make TARGET=SANDYBRIDGE USE_OPENMP=0 BINARY=64 NOFORTRAN=1
>
> You'll definitely want to disable the affinity support too, and
> probably memory warmup. And possibly increase the maximum thread
> count, unless you'll only use the library on the computer it was built
> on. And maybe other things. The OpenBLAS build process has so many
> ways to accidentally impale yourself, it's an object lesson in why
> building regulations are a good thing.


Thanks for the advice.

Right now I am just testing on my own computer.

cblas_dgemm is running roughly 50% faster with OpenBLAS than with MKL 11.1
update 2; sometimes OpenBLAS is twice as fast as MKL.


WTF???

:-D

Ok, next runner up is Accelerate. Let's see how it compares to OpenBLAS 
and MKL on Mavericks.



Sturla


/* Presumably perftest_mkl.c. The first three header names were stripped by
 * the list archive; they are reconstructed from the calls the program
 * makes. */
#include <stdio.h>
#include <stdlib.h>
#include <mach/mach_time.h>
#include "mkl.h"

/* Convert two mach_absolute_time() readings to elapsed nanoseconds. */
double nanodiff(const uint64_t _t0, const uint64_t _t1)
{
    long double t0, t1, numer, denom, nanosec;
    mach_timebase_info_data_t tb_info;
    mach_timebase_info(&tb_info);
    numer = (long double)(tb_info.numer);
    denom = (long double)(tb_info.denom);
    t0 = (long double)(_t0);
    t1 = (long double)(_t1);
    nanosec = (t1 - t0) * numer / denom;
    return (double)nanosec;
}

int main(int argc, char **argv)
{
    const int BOUNDARY = 64;  /* alignment in bytes for mkl_malloc */
    long double nanosec;
    int n = 512;
    int m = n, k = n;
    /* The matrices are left uninitialized: only the run time is measured. */
    double *A = (double*)mkl_malloc(n*n*sizeof(double), BOUNDARY);
    double *B = (double*)mkl_malloc(n*n*sizeof(double), BOUNDARY);
    double *C = (double*)mkl_malloc(n*n*sizeof(double), BOUNDARY);
    uint64_t t0, t1;

    t0 = mach_absolute_time();

    /* C = 1.0*A*B + 1.0*C, all matrices n-by-n, row-major. */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0, A, k, B, n, 1.0, C, n);

    t1 = mach_absolute_time();

    nanosec = nanodiff(t0, t1);

    printf("elapsed time: %g ns\n", (double)nanosec);

    mkl_free(A); mkl_free(B); mkl_free(C);
    return 0;
}


/* Presumably perftest_openblas.c. The header names were stripped by the
 * list archive; the four below are reconstructed from the calls the
 * program makes. */
#include <stdio.h>
#include <stdlib.h>
#include <mach/mach_time.h>
#include <cblas.h>

/* Convert two mach_absolute_time() readings to elapsed nanoseconds. */
double nanodiff(const uint64_t _t0, const uint64_t _t1)
{
    long double t0, t1, numer, denom, nanosec;
    mach_timebase_info_data_t tb_info;
    mach_timebase_info(&tb_info);
    numer = (long double)(tb_info.numer);
    denom = (long double)(tb_info.denom);
    t0 = (long double)(_t0);
    t1 = (long double)(_t1);
    nanosec = (t1 - t0) * numer / denom;
    return (double)nanosec;
}

int main(int argc, char **argv)
{
    long double nanosec;
    int n = 512;
    int m = n, k = n;
    /* The matrices are left uninitialized: only the run time is measured. */
    double *A = (double*)malloc(n*n*sizeof(double));
    double *B = (double*)malloc(n*n*sizeof(double));
    double *C = (double*)malloc(n*n*sizeof(double));
    uint64_t t0, t1;

    t0 = mach_absolute_time();

    /* C = 1.0*A*B + 1.0*C, all matrices n-by-n, row-major. */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0, A, k, B, n, 1.0, C, n);

    t1 = mach_absolute_time();

    nanosec = nanodiff(t0, t1);

    printf("elapsed time: %g ns\n", (double)nanosec);

    free(A); free(B); free(C);
    return 0;
}




Re: [Numpy-discussion] OpenBLAS on Mac

2014-02-22 Thread Nathaniel Smith
On Sat, Feb 22, 2014 at 3:55 PM, Sturla Molden wrote:
> On 20/02/14 17:57, Jurgen Van Gael wrote:
>> Hi All,
>>
>> I run Mac OS X 10.9.1 and was trying to get OpenBLAS working for numpy.
>> I've downloaded the OpenBLAS source and compiled it (thanks to Olivier
>> Grisel).
>
> How?
>
> $ make TARGET=SANDYBRIDGE USE_OPENMP=0 BINARY=64 NOFORTRAN=1

You'll definitely want to disable the affinity support too, and
probably memory warmup. And possibly increase the maximum thread
count, unless you'll only use the library on the computer it was built
on. And maybe other things. The OpenBLAS build process has so many
ways to accidentally impale yourself, it's an object lesson in why
building regulations are a good thing.

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] OpenBLAS on Mac

2014-02-22 Thread Sturla Molden
On 22/02/14 22:00, Robert Kern wrote:

> If you actually want some help, you will have to provide a *little* more 
> detail.


$ git clone https://github.com/xianyi/OpenBLAS

Oops...

$ cd OpenBLAS

did the trick. I need some coffee :)

Sturla








Re: [Numpy-discussion] OpenBLAS on Mac

2014-02-22 Thread Robert Kern
On Sat, Feb 22, 2014 at 8:55 PM, Sturla Molden wrote:
> On 20/02/14 17:57, Jurgen Van Gael wrote:
>> Hi All,
>>
>> I run Mac OS X 10.9.1 and was trying to get OpenBLAS working for numpy.
>> I've downloaded the OpenBLAS source and compiled it (thanks to Olivier
>> Grisel).
>
> How?
>
> $ make TARGET=SANDYBRIDGE USE_OPENMP=0 BINARY=64 NOFORTRAN=1
> make: *** No targets specified and no makefile found.  Stop.
>
> (staying with MKL...)

Without any further details about what you downloaded and where you
executed this command, one can only assume PEBCAK. There is certainly
a Makefile in the root directory of the OpenBLAS source:
https://github.com/xianyi/OpenBLAS

If you actually want some help, you will have to provide a *little* more detail.

-- 
Robert Kern


Re: [Numpy-discussion] OpenBLAS on Mac

2014-02-22 Thread Sturla Molden
On 20/02/14 17:57, Jurgen Van Gael wrote:
> Hi All,
>
> I run Mac OS X 10.9.1 and was trying to get OpenBLAS working for numpy.
> I've downloaded the OpenBLAS source and compiled it (thanks to Olivier
> Grisel).

How?

$ make TARGET=SANDYBRIDGE USE_OPENMP=0 BINARY=64 NOFORTRAN=1
make: *** No targets specified and no makefile found.  Stop.

(staying with MKL...)


Sturla




Re: [Numpy-discussion] OpenBLAS on Mac

2014-02-21 Thread Olivier Grisel
Indeed, I just ran the bench on my Mac, and OS X vecLib is more than 2x
faster than OpenBLAS on such square matrix multiplication (I just
have 2 physical cores on this box).

MKL from Canopy Express is slightly slower than OpenBLAS for this GEMM
bench on that box.

I really wonder why vecLib is faster in this case. Maybe OS X 10.9 did
improve its perf...


Re: [Numpy-discussion] OpenBLAS on Mac

2014-02-21 Thread Jurgen Van Gael
First thing I noticed when installing into /opt/OpenBLAS was that the
LAPACK header files were not being copied properly. This was because the
OpenBLAS makefile uses the "-D" option in the install command which the
default Mac install doesn't support. A quick "brew install coreutils"
solved that problem.

I built a new virtualenv and rebuilt numpy into it using the OpenBLAS in
/opt/OpenBLAS, and things seem to be absolutely fine now. I can run the
OpenBLAS version on my Mac. Thanks for the suggestions!

I ran the test:
https://gist.githubusercontent.com/osdf/3842524/raw/df01f7fa9d849bec353d6ab03eae0c1ee68f1538/test_numpy.py

On my MacBook Pro (Intel(R) Core(TM) i7-2720QM CPU @ 2.20GHz, 8 GB RAM),
disappointingly:
the ATLAS version gives a consistent 0.02,
the OpenBLAS version runs in 0.1 with OMP_NUM_THREADS=1 and 0.04 with
OMP_NUM_THREADS=8.

Happy to run a more extensive test suite if anyone is interested.

Jurgen




On Thu, Feb 20, 2014 at 6:07 PM, Olivier Grisel wrote:

> I have exactly the same setup as yours and it links to OpenBLAS
> correctly (in a venv as well, installed with python setup.py install).
> The only difference is that I installed OpenBLAS in the default
> folder: /opt/OpenBLAS (and I reflected that in site.cfg).
>
> When you run otool -L, is it in your source tree or do you point to
> the numpy/core/_dotblas.so of the site-packages folder of your venv?
>
> If you activate your venv, go to a different folder (e.g. /tmp) and type:
>
> python -c "import numpy as np; np.show_config()"
>
> what do you get? I get:
>
> $ python -c "import numpy as np; np.show_config()"
> lapack_opt_info:
> libraries = ['openblas', 'openblas']
> library_dirs = ['/opt/OpenBLAS/lib']
> language = f77
> blas_opt_info:
> libraries = ['openblas', 'openblas']
> library_dirs = ['/opt/OpenBLAS/lib']
> language = f77
> openblas_info:
> libraries = ['openblas', 'openblas']
> library_dirs = ['/opt/OpenBLAS/lib']
> language = f77
> blas_mkl_info:
>   NOT AVAILABLE
>
> --
> Olivier


Re: [Numpy-discussion] OpenBLAS on Mac

2014-02-20 Thread Olivier Grisel
I have exactly the same setup as yours and it links to OpenBLAS
correctly (in a venv as well, installed with python setup.py install).
The only difference is that I installed OpenBLAS in the default
folder: /opt/OpenBLAS (and I reflected that in site.cfg).

When you run otool -L, is it in your source tree or do you point to
the numpy/core/_dotblas.so of the site-packages folder of your venv?

If you activate your venv, go to a different folder (e.g. /tmp) and type:

python -c "import numpy as np; np.show_config()"

what do you get? I get:

$ python -c "import numpy as np; np.show_config()"
lapack_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/opt/OpenBLAS/lib']
language = f77
blas_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/opt/OpenBLAS/lib']
language = f77
openblas_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/opt/OpenBLAS/lib']
language = f77
blas_mkl_info:
  NOT AVAILABLE

-- 
Olivier