Re: [Numpy-discussion] OpenBLAS on Mac
On 20/02/14 17:57, Jurgen Van Gael wrote: Hi All, I run Mac OS X 10.9.1 and was trying to get OpenBLAS working for numpy. I've downloaded the OpenBLAS source and compiled it (thanks to Olivier Grisel). How? $ make TARGET=SANDYBRIDGE USE_OPENMP=0 BINARY=64 NOFORTRAN=1 make: *** No targets specified and no makefile found. Stop. (staying with MKL...) Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] OpenBLAS on Mac
On Sat, Feb 22, 2014 at 8:55 PM, Sturla Molden sturla.mol...@gmail.com wrote: On 20/02/14 17:57, Jurgen Van Gael wrote: Hi All, I run Mac OS X 10.9.1 and was trying to get OpenBLAS working for numpy. I've downloaded the OpenBLAS source and compiled it (thanks to Olivier Grisel). How? $ make TARGET=SANDYBRIDGE USE_OPENMP=0 BINARY=64 NOFORTRAN=1 make: *** No targets specified and no makefile found. Stop. (staying with MKL...) Without any further details about what you downloaded and where you executed this command, one can only assume PEBCAK. There is certainly a Makefile in the root directory of the OpenBLAS source: https://github.com/xianyi/OpenBLAS If you actually want some help, you will have to provide a *little* more detail. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] OpenBLAS on Mac
On 22/02/14 22:00, Robert Kern wrote: If you actually want some help, you will have to provide a *little* more detail. $ git clone https://github.com/xianyi/OpenBLAS Oops... $ cd OpenBLAS did the trick. I need some coffee :) Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] OpenBLAS on Mac
On Sat, Feb 22, 2014 at 3:55 PM, Sturla Molden sturla.mol...@gmail.com wrote: On 20/02/14 17:57, Jurgen Van Gael wrote: Hi All, I run Mac OS X 10.9.1 and was trying to get OpenBLAS working for numpy. I've downloaded the OpenBLAS source and compiled it (thanks to Olivier Grisel). How? $ make TARGET=SANDYBRIDGE USE_OPENMP=0 BINARY=64 NOFORTRAN=1 You'll definitely want to disable the affinity support too, and probably memory warmup. And possibly increase the maximum thread count, unless you'll only use the library on the computer it was built on. And maybe other things. The OpenBLAS build process has so many ways to accidentally impale yourself, it's an object lesson in why building regulations are a good thing. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] A PEP for adding infix matrix multiply to Python
[Apologies for wide distribution -- please direct followups to either the github PR linked below, or else numpy-discussion@scipy.org] After the numpy-discussion thread about np.matrix a week or so back, I got curious and read the old PEPs that attempted to add better matrix/elementwise operators to Python. http://legacy.python.org/dev/peps/pep-0211/ http://legacy.python.org/dev/peps/pep-0225/ And I was a bit surprised -- if I were BDFL I probably would have rejected these PEPs too. One is actually a proposal to make itertools.product into an infix operator, which no-one would consider seriously on its own merits. And the other adds a whole pile of weirdly spelled new operators with no clear idea about what they should do. But it seems to me that at this point, with the benefit of multiple years more experience, we know much better what we want -- basically, just a nice clean infix op for matrix multiplication. And that just asking for this directly, and explaining clearly why we want it, is something that hasn't been tried. So maybe we should try and see what happens. As a starting point for discussion, I wrote a draft. It can be read and commented on here: https://github.com/numpy/numpy/pull/4351 It's important that if we're going to do this at all, we do it right, and that means being able to honestly say that this document represents our consensus when going to python-dev. So if you think you might object please do so now :-) -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] How exactly ought 'dot' to work?
Hi all, Currently numpy's 'dot' acts a bit weird for ndim2 or ndim1. In practice this doesn't usually matter much, because these are very rarely used. But, I would like to nail down the behaviour so we can say something precise in the matrix multiplication PEP. So here's one proposal. # CURRENT: dot(0d, any) - scalar multiplication dot(any, 0d) - scalar multiplication dot(1d, 1d) - inner product dot(2d, 1d) - treat 1d as column matrix, matrix-multiply, then discard added axis dot(1d, 2d) - treat 1d as row matrix, matrix-multiply, then discard added axis dot(2d, 2d) - matrix multiply dot(2-or-more d, 2-or-more d) - a complicated outer product thing: Specifically, if the inputs have shapes (r, n, m), (s, m, k), then numpy returns an array with shape (r, s, n, k), created like: for i in range(r): for j in range(s): output[i, j, :, :] = np.dot(input1[i, :, :], input2[j, :, :]) # PROPOSED: General rule: given dot on shape1, shape2, we try to match these shapes against two templates like (..., n?, m) and (..., m, k?) where ... indicates zero or more dimensions, and ? indicates an optional axis. ? axes are always matched before ... axes, so for an input with ndim=2, the ? axis is always matched. An unmatched ? axis is treated as having size 1. Next, the ... axes are broadcast against each other in the usual way (prepending 1s to make lengths the same, requiring corresponding entries to either match or have the value 1). And then the actual computations are performed using the usual broadcasting rules. Finally, we return an output with shape (..., n?, k?). Here ... indicates the result of broadcasting the input ...'s against each other. And, n? and k? mean: either the value taken from the input shape, if the corresponding entry was matched -- but if no match was made, then we leave this entry out. The idea is that just as a column vector on the right is m x 1, a 1d vector on the right is treated as m x nothing. For purposes of actually computing the product, nothing acts like 1, as mentioned above. But it makes a difference in what we return: in each of these cases we copy the input shape into the output, so we can get an output with shape (n, nothing), or (nothing, k), or (nothing, nothing), which work out to be (n,), (k,) and (), respectively. This gives a (somewhat) intuitive principle for why dot(1d, 1d), dot(1d, 2d), dot(2d, 1d) are handled the way they are, and a general template for extending this behaviour to other operations like gufunc 'solve'. Anyway, the end result of this is that the PROPOSED behaviour differs from the current behaviour in the following ways: - passing 0d arrays to 'dot' becomes an error. (This in particular is an important thing to know, because if core Python adds an operator for 'dot', then we must decide what it should do for Python scalars, which are logically 0d.) - ndim2 arrays are now handled by aligning and broadcasting the extra axes, instead of taking an outer product. So dot((r, m, n), (r, n, k)) returns (r, m, k), not (r, r, m, k). Comments? -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] How exactly ought 'dot' to work?
Hi, On Sat, Feb 22, 2014 at 2:03 PM, Nathaniel Smith n...@pobox.com wrote: Hi all, Currently numpy's 'dot' acts a bit weird for ndim2 or ndim1. In practice this doesn't usually matter much, because these are very rarely used. But, I would like to nail down the behaviour so we can say something precise in the matrix multiplication PEP. So here's one proposal. # CURRENT: dot(0d, any) - scalar multiplication dot(any, 0d) - scalar multiplication dot(1d, 1d) - inner product dot(2d, 1d) - treat 1d as column matrix, matrix-multiply, then discard added axis dot(1d, 2d) - treat 1d as row matrix, matrix-multiply, then discard added axis dot(2d, 2d) - matrix multiply dot(2-or-more d, 2-or-more d) - a complicated outer product thing: Specifically, if the inputs have shapes (r, n, m), (s, m, k), then numpy returns an array with shape (r, s, n, k), created like: for i in range(r): for j in range(s): output[i, j, :, :] = np.dot(input1[i, :, :], input2[j, :, :]) # PROPOSED: General rule: given dot on shape1, shape2, we try to match these shapes against two templates like (..., n?, m) and (..., m, k?) where ... indicates zero or more dimensions, and ? indicates an optional axis. ? axes are always matched before ... axes, so for an input with ndim=2, the ? axis is always matched. An unmatched ? axis is treated as having size 1. Next, the ... axes are broadcast against each other in the usual way (prepending 1s to make lengths the same, requiring corresponding entries to either match or have the value 1). And then the actual computations are performed using the usual broadcasting rules. Finally, we return an output with shape (..., n?, k?). Here ... indicates the result of broadcasting the input ...'s against each other. And, n? and k? mean: either the value taken from the input shape, if the corresponding entry was matched -- but if no match was made, then we leave this entry out. The idea is that just as a column vector on the right is m x 1, a 1d vector on the right is treated as m x nothing. For purposes of actually computing the product, nothing acts like 1, as mentioned above. But it makes a difference in what we return: in each of these cases we copy the input shape into the output, so we can get an output with shape (n, nothing), or (nothing, k), or (nothing, nothing), which work out to be (n,), (k,) and (), respectively. This gives a (somewhat) intuitive principle for why dot(1d, 1d), dot(1d, 2d), dot(2d, 1d) are handled the way they are, and a general template for extending this behaviour to other operations like gufunc 'solve'. Anyway, the end result of this is that the PROPOSED behaviour differs from the current behaviour in the following ways: - passing 0d arrays to 'dot' becomes an error. (This in particular is an important thing to know, because if core Python adds an operator for 'dot', then we must decide what it should do for Python scalars, which are logically 0d.) - ndim2 arrays are now handled by aligning and broadcasting the extra axes, instead of taking an outer product. So dot((r, m, n), (r, n, k)) returns (r, m, k), not (r, r, m, k). Comments? The discussion might become confusing in the conflation of: * backward incompatible changes to dot * coherent behavior to propose in a PEP Maybe we could concentrate on the second, on the basis it's likely to be a few years until it is available, and the work out compatibility if the PEP gets accepted. If A @ B means 'matrix multiply A, B' - then it would be a shame to raise an error if A or B is a scalar. Sympy, for example, will allow matrix multiplication by a scalar, MATLAB / Octave too. I have used 2D dot calls in the past, maybe still do, I'm not sure. Cheers, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] How exactly ought 'dot' to work?
On Feb 22, 2014 2:03 PM, Nathaniel Smith n...@pobox.com wrote: Hi all, Currently numpy's 'dot' acts a bit weird for ndim2 or ndim1. In practice this doesn't usually matter much, because these are very rarely used. But, I would like to nail down the behaviour so we can say something precise in the matrix multiplication PEP. So here's one proposal. # CURRENT: dot(0d, any) - scalar multiplication dot(any, 0d) - scalar multiplication dot(1d, 1d) - inner product dot(2d, 1d) - treat 1d as column matrix, matrix-multiply, then discard added axis dot(1d, 2d) - treat 1d as row matrix, matrix-multiply, then discard added axis dot(2d, 2d) - matrix multiply dot(2-or-more d, 2-or-more d) - a complicated outer product thing: Specifically, if the inputs have shapes (r, n, m), (s, m, k), then numpy returns an array with shape (r, s, n, k), created like: for i in range(r): for j in range(s): output[i, j, :, :] = np.dot(input1[i, :, :], input2[j, :, :]) # PROPOSED: General rule: given dot on shape1, shape2, we try to match these shapes against two templates like (..., n?, m) and (..., m, k?) where ... indicates zero or more dimensions, and ? indicates an optional axis. ? axes are always matched before ... axes, so for an input with ndim=2, the ? axis is always matched. An unmatched ? axis is treated as having size 1. Next, the ... axes are broadcast against each other in the usual way (prepending 1s to make lengths the same, requiring corresponding entries to either match or have the value 1). And then the actual computations are performed using the usual broadcasting rules. Finally, we return an output with shape (..., n?, k?). Here ... indicates the result of broadcasting the input ...'s against each other. And, n? and k? mean: either the value taken from the input shape, if the corresponding entry was matched -- but if no match was made, then we leave this entry out. The idea is that just as a column vector on the right is m x 1, a 1d vector on the right is treated as m x nothing. For purposes of actually computing the product, nothing acts like 1, as mentioned above. But it makes a difference in what we return: in each of these cases we copy the input shape into the output, so we can get an output with shape (n, nothing), or (nothing, k), or (nothing, nothing), which work out to be (n,), (k,) and (), respectively. This gives a (somewhat) intuitive principle for why dot(1d, 1d), dot(1d, 2d), dot(2d, 1d) are handled the way they are, and a general template for extending this behaviour to other operations like gufunc 'solve'. Anyway, the end result of this is that the PROPOSED behaviour differs from the current behaviour in the following ways: - passing 0d arrays to 'dot' becomes an error. (This in particular is an important thing to know, because if core Python adds an operator for 'dot', then we must decide what it should do for Python scalars, which are logically 0d.) - ndim2 arrays are now handled by aligning and broadcasting the extra axes, instead of taking an outer product. So dot((r, m, n), (r, n, k)) returns (r, m, k), not (r, r, m, k). Comments? The proposed behavior for ndim 2 is what matrix_multiply (is it still in umath_tests?) does. The nice thing of the proposed new behavior is that the old behavior is easy to reproduce by fooling a little around with the shape of the first argument, while the opposite is not true. Jaime -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] How exactly ought 'dot' to work?
On Sat, Feb 22, 2014 at 5:17 PM, Matthew Brett matthew.br...@gmail.com wrote: The discussion might become confusing in the conflation of: * backward incompatible changes to dot * coherent behavior to propose in a PEP Right, I definitely am asking about how we think the ideal dot operator should work. The migration strategy to get to there from here is a separate question, that will raise a bunch of details that are a distraction from the fundamental question of what we *want*. If A @ B means 'matrix multiply A, B' - then it would be a shame to raise an error if A or B is a scalar. Sympy, for example, will allow matrix multiplication by a scalar, MATLAB / Octave too. Interesting! We do disagree on this then. I feel strongly that given separate matrix product and elementwise product operators @ and *, then 'scalar @ matrix' should be an error, and if you want scalar (elementwise) product then you should write 'scalar * matrix'. Sympy, MATLAB, Octave are not really good guides, because either they have only a single operator available (Sympy with *, np.matrix with *), or they have an alternative operator available but it's annoying to type and rarely used (MATLAB/Octave with .*). For us, the two operators are both first-class, and we've all decided that the scalar/elementwise operator is actually the more important and commonly used one, and matrix multiply is the unusual case (regardless of whether we end up spelling it 'dot' or '@'). So why would we want a special case for scalars in dot/@? And anyway, TOOWTDI. Notation like 'L * X @ Y' really makes it immediately clear what sort of operations we're dealing with, too. I have used 2D dot calls in the past, maybe still do, I'm not sure. Have you used dot(2D, 2D)? That's the case that this proposal would change -- dot(2D, =2D) is the same under both the outer product and broadcasting proposals. I feel pretty strongly that the broadcasting proposal is the right thing, for consistency with other operators -- e.g., all ufuncs and all functions in np.linalg already do broadcasting, so 'dot' is currently really the odd one out. The outer product functionality is potentially useful, but it should be called np.dot.outer, because logically it has the same relationship to np.dot that np.add.outer has to np.add, etc. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] OpenBLAS on Mac
On 22/02/14 22:15, Nathaniel Smith wrote: $ make TARGET=SANDYBRIDGE USE_OPENMP=0 BINARY=64 NOFORTRAN=1 You'll definitely want to disable the affinity support too, and probably memory warmup. And possibly increase the maximum thread count, unless you'll only use the library on the computer it was built on. And maybe other things. The OpenBLAS build process has so many ways to accidentally impale yourself, it's an object lesson in why building regulations are a good thing. Thanks for the advice. Right now I am just testing on my own computer. cblas_dgemm is running roughly 50 % faster with OpenBLAS than MKL 11.1 update 2, sometimes OpenBLAS is twice as fast as MKL. WTF??? :-D Ok, next runner up is Accelerate. Let's see how it compares to OpenBLAS and MKL on Mavericks. Sturla #include mach/mach.h #include mach/mach_time.h #include stdlib.h #include mkl.h double nanodiff(const uint64_t _t0, const uint64_t _t1) { long double t0, t1, numer, denom, nanosec; mach_timebase_info_data_t tb_info; mach_timebase_info(tb_info); numer = (long double)(tb_info.numer); denom = (long double)(tb_info.denom); t0 = (long double)(_t0); t1 = (long double)(_t1); nanosec = (t1 - t0) * numer / denom; return (double)nanosec; } int main(int argc, char **argv) { const int BOUNDARY = 64; long double nanosec; int n = 512; int m = n, k = n; double *A = (double*)mkl_malloc(n*n*sizeof(double), BOUNDARY); double *B = (double*)mkl_malloc(n*n*sizeof(double), BOUNDARY); double *C = (double*)mkl_malloc(n*n*sizeof(double), BOUNDARY); uint64_t t0, t1; t0 = mach_absolute_time(); cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, 1.0, A, k, B, n, 1.0, C, n); t1 = mach_absolute_time(); nanosec = nanodiff(t0, t1); printf(elapsed time: %g ns\n, (double)nanosec); mkl_free(A); mkl_free(B); mkl_free(C); } #include mach/mach.h #include mach/mach_time.h #include stdlib.h #include cblas.h double nanodiff(const uint64_t _t0, const uint64_t _t1) { long double t0, t1, numer, denom, nanosec; mach_timebase_info_data_t tb_info; mach_timebase_info(tb_info); numer = (long double)(tb_info.numer); denom = (long double)(tb_info.denom); t0 = (long double)(_t0); t1 = (long double)(_t1); nanosec = (t1 - t0) * numer / denom; return (double)nanosec; } int main(int argc, char **argv) { long double nanosec; int n = 512; int m = n, k = n; double *A = (double*)malloc(n*n*sizeof(double)); double *B = (double*)malloc(n*n*sizeof(double)); double *C = (double*)malloc(n*n*sizeof(double)); uint64_t t0, t1; t0 = mach_absolute_time(); cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, 1.0, A, k, B, n, 1.0, C, n); t1 = mach_absolute_time(); nanosec = nanodiff(t0, t1); printf(elapsed time: %g ns\n, (double)nanosec); free(A); free(B); free(C); } ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] How exactly ought 'dot' to work?
On Sat, Feb 22, 2014 at 5:17 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Feb 22, 2014 at 2:03 PM, Nathaniel Smith n...@pobox.com wrote: Hi all, Currently numpy's 'dot' acts a bit weird for ndim2 or ndim1. In practice this doesn't usually matter much, because these are very rarely used. But, I would like to nail down the behaviour so we can say something precise in the matrix multiplication PEP. So here's one proposal. # CURRENT: dot(0d, any) - scalar multiplication dot(any, 0d) - scalar multiplication dot(1d, 1d) - inner product dot(2d, 1d) - treat 1d as column matrix, matrix-multiply, then discard added axis dot(1d, 2d) - treat 1d as row matrix, matrix-multiply, then discard added axis dot(2d, 2d) - matrix multiply dot(2-or-more d, 2-or-more d) - a complicated outer product thing: Specifically, if the inputs have shapes (r, n, m), (s, m, k), then numpy returns an array with shape (r, s, n, k), created like: for i in range(r): for j in range(s): output[i, j, :, :] = np.dot(input1[i, :, :], input2[j, :, :]) # PROPOSED: General rule: given dot on shape1, shape2, we try to match these shapes against two templates like (..., n?, m) and (..., m, k?) where ... indicates zero or more dimensions, and ? indicates an optional axis. ? axes are always matched before ... axes, so for an input with ndim=2, the ? axis is always matched. An unmatched ? axis is treated as having size 1. Next, the ... axes are broadcast against each other in the usual way (prepending 1s to make lengths the same, requiring corresponding entries to either match or have the value 1). And then the actual computations are performed using the usual broadcasting rules. Finally, we return an output with shape (..., n?, k?). Here ... indicates the result of broadcasting the input ...'s against each other. And, n? and k? mean: either the value taken from the input shape, if the corresponding entry was matched -- but if no match was made, then we leave this entry out. The idea is that just as a column vector on the right is m x 1, a 1d vector on the right is treated as m x nothing. For purposes of actually computing the product, nothing acts like 1, as mentioned above. But it makes a difference in what we return: in each of these cases we copy the input shape into the output, so we can get an output with shape (n, nothing), or (nothing, k), or (nothing, nothing), which work out to be (n,), (k,) and (), respectively. This gives a (somewhat) intuitive principle for why dot(1d, 1d), dot(1d, 2d), dot(2d, 1d) are handled the way they are, and a general template for extending this behaviour to other operations like gufunc 'solve'. Anyway, the end result of this is that the PROPOSED behaviour differs from the current behaviour in the following ways: - passing 0d arrays to 'dot' becomes an error. (This in particular is an important thing to know, because if core Python adds an operator for 'dot', then we must decide what it should do for Python scalars, which are logically 0d.) - ndim2 arrays are now handled by aligning and broadcasting the extra axes, instead of taking an outer product. So dot((r, m, n), (r, n, k)) returns (r, m, k), not (r, r, m, k). I don't quite manage to follow that description for nd behavior. How do you figure out which axes to use for the cross-product dot((m,m,m), (m, m)) ? Doesn't numpy have a new gufunc that does this kind of vectorized/broadcasted dot product? I cannot find it right now. Comments? The discussion might become confusing in the conflation of: * backward incompatible changes to dot * coherent behavior to propose in a PEP Maybe we could concentrate on the second, on the basis it's likely to be a few years until it is available, and the work out compatibility if the PEP gets accepted. If A @ B means 'matrix multiply A, B' - then it would be a shame to raise an error if A or B is a scalar. Sympy, for example, will allow matrix multiplication by a scalar, MATLAB / Octave too. I also don't see a reason to disallow multiplication with a scalar (it's just the math), but I doubt we use it in statsmodels. I have used 2D dot calls in the past, maybe still do, I'm not sure. I've never found a use for dot with ndim 2 (nor tensordot), it never did what I needed. Josef Cheers, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] OpenBLAS on Mac
On 22/02/14 23:39, Sturla Molden wrote: Ok, next runner up is Accelerate. Let's see how it compares to OpenBLAS and MKL on Mavericks. It seems Accelerate has roughly the same performance as MKL now. Did the upgrade to Mavericks do this? These are the compile lines, in case you wonder: $ CC -O2 -o perftest_openblas -I/opt/OpenBLAS/include -L/opt/OpenBLAS/lib perftest_openblas.c -lopenblas $ CC -O2 -o perftest_accelerate perftest_accelerate.c -framework Accelerate $ source /opt/intel/composer_xe_2013/mkl/bin/mklvars.sh intel64 $ icc -O2 -o perftest_mkl -mkl -static-intel perftest_mkl.c Sturla #include mach/mach.h #include mach/mach_time.h #include stdlib.h #include Accelerate/Accelerate.h double nanodiff(const uint64_t _t0, const uint64_t _t1) { long double t0, t1, numer, denom, nanosec; mach_timebase_info_data_t tb_info; mach_timebase_info(tb_info); numer = (long double)(tb_info.numer); denom = (long double)(tb_info.denom); t0 = (long double)(_t0); t1 = (long double)(_t1); nanosec = (t1 - t0) * numer / denom; return (double)nanosec; } int main(int argc, char **argv) { long double nanosec; int n = 512; int m = n, k = n; double *A = (double*)malloc(n*n*sizeof(double)); double *B = (double*)malloc(n*n*sizeof(double)); double *C = (double*)malloc(n*n*sizeof(double)); uint64_t t0, t1; t0 = mach_absolute_time(); cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, 1.0, A, k, B, n, 1.0, C, n); t1 = mach_absolute_time(); nanosec = nanodiff(t0, t1); printf(elapsed time: %g ns\n, (double)nanosec); free(A); free(B); free(C); } ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] How exactly ought 'dot' to work?
It would help me follow this discussion if it were broken up into pieces: - what is being asserted as first principles for `dot` (e.g., what mathematical understanding)? - to what extent do other important implementations (e.g., Mathematica and Julia) deviate from the proposed mathematical understanding? - were the motivations for any deviations adequate (e.g., supported by strong use cases)? An example is the discussion of whether scalar multiplication of a matrix should be represented by * or by a new operator (e.g., @). I am personally most comfortable with the idea that a new matrix multiplication operator would not handle scalar multiplication or violate tensor product rules (perhaps I am influenced by Mathematica), but I am not prepared to argue the principles of such choices, and would appreciate hearing from those who are. Thanks, Alan Isaac ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] How exactly ought 'dot' to work?
23.02.2014 00:03, Nathaniel Smith kirjoitti: Currently numpy's 'dot' acts a bit weird for ndim2 or ndim1. In practice this doesn't usually matter much, because these are very rarely used. But, I would like to nail down the behaviour so we can say something precise in the matrix multiplication PEP. I'm not sure it's necessary to say much about this in the PEP. It should in my view concentrate on arguing why the new binop is needed in the Python language, and for that, restricting to 2D is good enough IMHO. How exactly Numpy makes use of the capability for 2-dim arrays is something that should definitely be discussed. But I think this is a problem mainly interesting for Numpy devs, and not for CPython devs. -- Pauli Virtanen ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] How exactly ought 'dot' to work?
Hi, On Sat, Feb 22, 2014 at 4:09 PM, Pauli Virtanen p...@iki.fi wrote: 23.02.2014 00:03, Nathaniel Smith kirjoitti: Currently numpy's 'dot' acts a bit weird for ndim2 or ndim1. In practice this doesn't usually matter much, because these are very rarely used. But, I would like to nail down the behaviour so we can say something precise in the matrix multiplication PEP. I'm not sure it's necessary to say much about this in the PEP. It should in my view concentrate on arguing why the new binop is needed in the Python language, and for that, restricting to 2D is good enough IMHO. Yes, I was thinking the same. Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion