Re: [Numpy-discussion] How exactly ought 'dot' to work?
On 24 February 2014 05:21, Nathaniel Smith n...@pobox.com wrote:
> I've tentatively rewritten the first section of the PEP to try and accomplish this framing:
> https://github.com/njsmith/numpy/blob/matmul-pep/doc/neps/return-of-revenge-of-matmul-pep.rst
> Comments welcome etc.

I've not been following the discussion about this, but just out of interest: is there no interest in Matlab-style left and right matrix division? Personally I always found the idea of matrix division confusing and dislike teaching it to students, but it does seem to be popular among Matlab users.

Oscar
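(For anyone unfamiliar with the Matlab idiom: left division A\b solves A x = b, and right division c/A solves y A = c. A minimal sketch of the usual numpy equivalents, with made-up data; this is an illustration, not part of any proposal in the thread:)

    import numpy as np

    A = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
    b = np.array([9.0, 8.0])

    # Matlab's left division A \ b: solve A x = b without forming inv(A).
    x = np.linalg.solve(A, b)

    # Matlab's right division c / A: solve y A = c, via the transposed system.
    c = np.array([9.0, 8.0])
    y = np.linalg.solve(A.T, c)

    print(np.allclose(np.dot(A, x), b))   # True
    print(np.allclose(np.dot(y, A), c))   # True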
Re: [Numpy-discussion] How exactly ought 'dot' to work?
On 23.02.2014 00:03, Nathaniel Smith wrote:
> Currently numpy's 'dot' acts a bit weird for ndim > 2 or ndim < 1. In practice this doesn't usually matter much, because these are very rarely used. But, I would like to nail down the behaviour so we can say something precise in the matrix multiplication PEP.

On Sat, Feb 22, 2014 at 7:09 PM, Pauli Virtanen wrote:
> I'm not sure it's necessary to say much about this in the PEP. It should in my view concentrate on arguing why the new binop is needed in the Python language, and for that, restricting to 2D is good enough IMHO. How exactly Numpy makes use of the capability for >2-dim arrays is something that should definitely be discussed. But I think this is a problem mainly interesting for Numpy devs, and not for CPython devs.

On 2/24/2014 12:21 AM, Nathaniel Smith wrote:
> I actually disagree strongly. I think it's very important to make clear that we have a clear, well thought through, and cross-project approach to what @ is supposed to mean.

I think Pauli is right. We know `@` is supposed to mean matrix multiply when dealing with conformable 2d arrays. That is the real motivation of the PEP, and I cannot see why the PEP itself would need to go beyond that. The behavior of `@` in other cases seems a discussion that should go *much* slower than that of the core of the PEP, which is to get an operator for matrix multiplication.

Furthermore, I am not able to understand the principles behind the discussion of how `@` should behave in other cases. I do not think they are being clearly stated. (I have added a comment to the PEP asking for clarification.) To be concrete: if `@` is proposed to behave unlike Mathematica's `Dot` command, I would hope to hear a very clear mathematical motivation for this. (Specifically, I do not understand why `@` would do scalar product.) On the other hand, if the proposal is just that `@` should behave like NumPy's `dot` does, that should be simply stated.

Cheers,
Alan Isaac
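(Context for that parenthetical: numpy's current dot already performs scalar multiplication when given a 0d operand, which is exactly the behaviour under debate. A quick illustration of the status quo:)

    import numpy as np

    A = np.eye(2)

    # Today's np.dot treats a 0d operand as scalar multiplication:
    print(np.dot(3.0, A))   # [[3. 0.]
                            #  [0. 3.]]
    print(np.dot(A, 3.0))   # same result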
Re: [Numpy-discussion] How exactly ought 'dot' to work?
On Sat, Feb 22, 2014 at 7:09 PM, Pauli Virtanen p...@iki.fi wrote:
> 23.02.2014 00:03, Nathaniel Smith kirjoitti:
>> Currently numpy's 'dot' acts a bit weird for ndim > 2 or ndim < 1. In practice this doesn't usually matter much, because these are very rarely used. But, I would like to nail down the behaviour so we can say something precise in the matrix multiplication PEP.
>
> I'm not sure it's necessary to say much about this in the PEP. It should in my view concentrate on arguing why the new binop is needed in the Python language, and for that, restricting to 2D is good enough IMHO.
>
> How exactly Numpy makes use of the capability for >2-dim arrays is something that should definitely be discussed. But I think this is a problem mainly interesting for Numpy devs, and not for CPython devs.

I actually disagree strongly. I think it's very important to make clear that we have a clear, well-thought-through, and cross-project approach to what @ is supposed to mean, so that this doesn't come across as numpy asking python-dev for a blank check to go and define the de facto semantics of a new operator just for ourselves and without any adult supervision. I just don't think they trust us that much. (Honestly I probably wouldn't either in their place.) It's true that the higher-dim cases aren't the most central ones, but it can't hurt to document all the details.

I've tentatively rewritten the first section of the PEP to try and accomplish this framing:
https://github.com/njsmith/numpy/blob/matmul-pep/doc/neps/return-of-revenge-of-matmul-pep.rst

Comments welcome etc.

Also BTW in the process I discovered another reason why broadcasting is better than the outer product semantics -- with broadcasting, writing down matrix power for >2d is trivial and natural, but with the outer product semantics, it's practically impossible!

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
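(A sketch of that last point. Under broadcasting semantics, matrix power over a stack of matrices is a few lines; np.einsum stands in for the proposed product since no such operator exists yet, and stack_matpow is a made-up helper name. Under the outer-product semantics, each multiplication would grow the result's shape instead of preserving it:)

    import numpy as np

    def stack_matpow(a, n):
        # Raise each matrix in a stack of shape (..., m, m) to the
        # n-th power, n >= 1.  Each step keeps the shape (..., m, m).
        result = a
        for _ in range(n - 1):
            result = np.einsum('...ij,...jk->...ik', result, a)
        return result

    stack = np.random.rand(4, 3, 3)   # four 3x3 matrices
    cubes = stack_matpow(stack, 3)    # shape (4, 3, 3)
    print(np.allclose(cubes[0], np.linalg.matrix_power(stack[0], 3)))  # True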
Re: [Numpy-discussion] How exactly ought 'dot' to work?
Hi,

On Sat, Feb 22, 2014 at 2:03 PM, Nathaniel Smith n...@pobox.com wrote:
> Hi all,
>
> Currently numpy's 'dot' acts a bit weird for ndim > 2 or ndim < 1. In practice this doesn't usually matter much, because these are very rarely used. But, I would like to nail down the behaviour so we can say something precise in the matrix multiplication PEP. So here's one proposal.
>
> # CURRENT:
>
> dot(0d, any) -> scalar multiplication
> dot(any, 0d) -> scalar multiplication
> dot(1d, 1d) -> inner product
> dot(2d, 1d) -> treat 1d as column matrix, matrix-multiply, then discard added axis
> dot(1d, 2d) -> treat 1d as row matrix, matrix-multiply, then discard added axis
> dot(2d, 2d) -> matrix multiply
> dot(2-or-more d, 2-or-more d) -> a complicated outer product thing:
>
> Specifically, if the inputs have shapes (r, n, m), (s, m, k), then numpy returns an array with shape (r, s, n, k), created like:
>
>     for i in range(r):
>         for j in range(s):
>             output[i, j, :, :] = np.dot(input1[i, :, :], input2[j, :, :])
>
> # PROPOSED:
>
> General rule: given dot on shape1, shape2, we try to match these shapes against two templates like (..., n?, m) and (..., m, k?), where ... indicates zero or more dimensions, and ? indicates an optional axis. ? axes are always matched before ... axes, so for an input with ndim >= 2, the ? axis is always matched. An unmatched ? axis is treated as having size 1.
>
> Next, the ... axes are broadcast against each other in the usual way (prepending 1s to make the lengths the same, and requiring corresponding entries to either match or have the value 1), and then the actual computations are performed using the usual broadcasting rules.
>
> Finally, we return an output with shape (..., n?, k?). Here ... indicates the result of broadcasting the input ...'s against each other, and n? and k? mean: the value taken from the input shape if the corresponding entry was matched, left out if no match was made. The idea is that just as a column vector on the right is m x 1, a 1d vector on the right is treated as m x nothing. For purposes of actually computing the product, nothing acts like 1, as mentioned above. But it makes a difference in what we return: in each of these cases we copy the input shape into the output, so we can get an output with shape (n, nothing), or (nothing, k), or (nothing, nothing), which work out to be (n,), (k,), and (), respectively. This gives a (somewhat) intuitive principle for why dot(1d, 1d), dot(1d, 2d), and dot(2d, 1d) are handled the way they are, and a general template for extending this behaviour to other operations like gufunc 'solve'.
>
> Anyway, the end result of this is that the PROPOSED behaviour differs from the current behaviour in the following ways:
>
> - Passing 0d arrays to 'dot' becomes an error. (This in particular is an important thing to know, because if core Python adds an operator for 'dot', then we must decide what it should do for Python scalars, which are logically 0d.)
> - ndim > 2 arrays are now handled by aligning and broadcasting the extra axes, instead of taking an outer product. So dot((r, m, n), (r, n, k)) returns (r, m, k), not (r, r, m, k).
>
> Comments?

The discussion might become confusing in the conflation of:

* backward incompatible changes to dot
* coherent behavior to propose in a PEP

Maybe we could concentrate on the second, on the basis that it's likely to be a few years until it is available, and then work out compatibility if the PEP gets accepted.

If A @ B means 'matrix multiply A, B', then it would be a shame to raise an error if A or B is a scalar. Sympy, for example, will allow matrix multiplication by a scalar; MATLAB / Octave too.

I have used >2D dot calls in the past, maybe still do, I'm not sure.

Cheers,
Matthew
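(To make the difference in that last quoted bullet concrete: a minimal sketch, with np.einsum standing in for the proposed broadcasting behaviour, since no operator or gufunc for it exists yet:)

    import numpy as np

    a = np.ones((4, 2, 3))   # four 2x3 matrices
    b = np.ones((4, 3, 5))   # four 3x5 matrices

    # CURRENT semantics: an outer product over the leading axes.
    print(np.dot(a, b).shape)                     # (4, 2, 4, 5)

    # PROPOSED semantics: the leading axes are aligned/broadcast
    # instead, here emulated with einsum.
    print(np.einsum('rij,rjk->rik', a, b).shape)  # (4, 2, 5)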
Re: [Numpy-discussion] How exactly ought 'dot' to work?
On Feb 22, 2014 2:03 PM, Nathaniel Smith n...@pobox.com wrote:
> Currently numpy's 'dot' acts a bit weird for ndim > 2 or ndim < 1. [... full proposal, quoted in full earlier in the thread ...]
>
> Anyway, the end result of this is that the PROPOSED behaviour differs from the current behaviour in the following ways:
> - Passing 0d arrays to 'dot' becomes an error.
> - ndim > 2 arrays are now handled by aligning and broadcasting the extra axes, instead of taking an outer product. So dot((r, m, n), (r, n, k)) returns (r, m, k), not (r, r, m, k).
>
> Comments?

The proposed behavior for ndim > 2 is what matrix_multiply (is it still in umath_tests?) does. The nice thing about the proposed new behavior is that the old behavior is easy to reproduce by fooling around a little with the shape of the first argument, while the opposite is not true.

Jaime
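(A sketch of that trick, again with np.einsum emulating the proposed broadcasting product: inserting a length-1 axis into the first argument makes the leading dimensions broadcast into the old outer-product layout:)

    import numpy as np

    a = np.ones((4, 2, 3))
    b = np.ones((6, 3, 5))

    # Old outer-product behavior:
    old = np.dot(a, b)                  # shape (4, 2, 6, 5)

    # The same products under broadcasting semantics: a length-1 axis
    # makes the leading dimensions broadcast instead of aligning.
    new = np.einsum('...ij,...jk->...ik', a[:, np.newaxis], b)
    print(new.shape)                    # (4, 6, 2, 5)
    print(np.allclose(old.transpose(0, 2, 1, 3), new))   # True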
Re: [Numpy-discussion] How exactly ought 'dot' to work?
On Sat, Feb 22, 2014 at 5:17 PM, Matthew Brett matthew.br...@gmail.com wrote:
> The discussion might become confusing in the conflation of:
> * backward incompatible changes to dot
> * coherent behavior to propose in a PEP

Right, I definitely am asking about how we think the ideal dot operator should work. The migration strategy to get there from here is a separate question, one that will raise a bunch of details that are a distraction from the fundamental question of what we *want*.

> If A @ B means 'matrix multiply A, B', then it would be a shame to raise an error if A or B is a scalar. Sympy, for example, will allow matrix multiplication by a scalar; MATLAB / Octave too.

Interesting! We do disagree on this then. I feel strongly that given separate matrix product and elementwise product operators @ and *, 'scalar @ matrix' should be an error, and if you want a scalar (elementwise) product then you should write 'scalar * matrix'. Sympy, MATLAB, and Octave are not really good guides, because either they have only a single operator available (Sympy with *, np.matrix with *), or they have an alternative operator available but it's annoying to type and rarely used (MATLAB/Octave with .*). For us, the two operators are both first-class, and we've all decided that the scalar/elementwise operator is actually the more important and commonly used one, and matrix multiply is the unusual case (regardless of whether we end up spelling it 'dot' or '@'). So why would we want a special case for scalars in dot/@? And anyway, TOOWTDI. Notation like 'L * X @ Y' really makes it immediately clear what sort of operations we're dealing with, too.

> I have used >2D dot calls in the past, maybe still do, I'm not sure.

Have you used dot(>2D, >2D)? That's the case that this proposal would change -- dot(<=2D, <=2D) is the same under both the outer product and broadcasting proposals. I feel pretty strongly that the broadcasting proposal is the right thing, for consistency with other operators -- e.g., all ufuncs and all functions in np.linalg already do broadcasting, so 'dot' is currently really the odd one out. The outer product functionality is potentially useful, but it should be called np.dot.outer, because logically it has the same relationship to np.dot that np.add.outer has to np.add, etc.

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
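(A sketch of that notation point: the two operators coexist without any scalar special case. np.matmul, which was added to numpy after this thread and follows the semantics argued for here, does make 'scalar @ matrix' an error:)

    import numpy as np

    X = np.random.rand(3, 3)
    Y = np.random.rand(3, 3)
    L = 2.5

    # Scalar scaling stays with '*', the matrix product with dot/@:
    Z = L * np.dot(X, Y)
    print(Z.shape)   # (3, 3)

    # np.matmul rejects 0d operands rather than scalar-multiplying:
    try:
        np.matmul(L, X)
    except (TypeError, ValueError) as exc:
        print('matmul rejects 0d operands:', exc)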
Re: [Numpy-discussion] How exactly ought 'dot' to work?
On Sat, Feb 22, 2014 at 5:17 PM, Matthew Brett matthew.br...@gmail.com wrote:
> On Sat, Feb 22, 2014 at 2:03 PM, Nathaniel Smith n...@pobox.com wrote:
>> Currently numpy's 'dot' acts a bit weird for ndim > 2 or ndim < 1. [... full proposal, quoted in full earlier in the thread ...]
>>
>> Anyway, the end result of this is that the PROPOSED behaviour differs from the current behaviour in the following ways:
>> - Passing 0d arrays to 'dot' becomes an error.
>> - ndim > 2 arrays are now handled by aligning and broadcasting the extra axes, instead of taking an outer product. So dot((r, m, n), (r, n, k)) returns (r, m, k), not (r, r, m, k).

I don't quite manage to follow that description for the nd behavior. How do you figure out which axes to use for the cross-product dot((m, m, m), (m, m))? Doesn't numpy have a new gufunc that does this kind of vectorized/broadcasted dot product? I cannot find it right now.

>> Comments?
>
> The discussion might become confusing in the conflation of:
> * backward incompatible changes to dot
> * coherent behavior to propose in a PEP
>
> Maybe we could concentrate on the second, on the basis that it's likely to be a few years until it is available, and then work out compatibility if the PEP gets accepted.
>
> If A @ B means 'matrix multiply A, B', then it would be a shame to raise an error if A or B is a scalar. Sympy, for example, will allow matrix multiplication by a scalar; MATLAB / Octave too.

I also don't see a reason to disallow multiplication with a scalar (it's just the math), but I doubt we use it in statsmodels.

> I have used >2D dot calls in the past, maybe still do, I'm not sure.

I've never found a use for dot with ndim > 2 (nor tensordot); it never did what I needed.

Josef
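(For the dot((m, m, m), (m, m)) question, the proposed rule gives a fixed answer: the last two axes of each operand are always the matrix axes, and everything to their left broadcasts. A sketch with np.einsum emulating the proposed behaviour:)

    import numpy as np

    a = np.ones((4, 4, 4))   # a stack of four 4x4 matrices
    b = np.ones((4, 4))      # a single 4x4 matrix

    # b has no leading axes, so it broadcasts against a's stack axis
    # and is matrix-multiplied onto each matrix in the stack.
    out = np.einsum('...ij,...jk->...ik', a, b)
    print(out.shape)                              # (4, 4, 4)
    print(np.allclose(out[2], np.dot(a[2], b)))   # True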
Re: [Numpy-discussion] How exactly ought 'dot' to work?
It would help me follow this discussion if it were broken up into pieces:

- What is being asserted as first principles for `dot` (e.g., what mathematical understanding)?
- To what extent do other important implementations (e.g., Mathematica and Julia) deviate from the proposed mathematical understanding?
- Were the motivations for any deviations adequate (e.g., supported by strong use cases)?

An example is the discussion of whether scalar multiplication of a matrix should be represented by * or by a new operator (e.g., @). I am personally most comfortable with the idea that a new matrix multiplication operator would not handle scalar multiplication or violate tensor product rules (perhaps I am influenced by Mathematica), but I am not prepared to argue the principles of such choices, and would appreciate hearing from those who are.

Thanks,
Alan Isaac
Re: [Numpy-discussion] How exactly ought 'dot' to work?
23.02.2014 00:03, Nathaniel Smith kirjoitti:
> Currently numpy's 'dot' acts a bit weird for ndim > 2 or ndim < 1. In practice this doesn't usually matter much, because these are very rarely used. But, I would like to nail down the behaviour so we can say something precise in the matrix multiplication PEP.

I'm not sure it's necessary to say much about this in the PEP. It should in my view concentrate on arguing why the new binop is needed in the Python language, and for that, restricting to 2D is good enough IMHO.

How exactly Numpy makes use of the capability for >2-dim arrays is something that should definitely be discussed. But I think this is a problem mainly interesting for Numpy devs, and not for CPython devs.

--
Pauli Virtanen
Re: [Numpy-discussion] How exactly ought 'dot' to work?
Hi,

On Sat, Feb 22, 2014 at 4:09 PM, Pauli Virtanen p...@iki.fi wrote:
> 23.02.2014 00:03, Nathaniel Smith kirjoitti:
>> Currently numpy's 'dot' acts a bit weird for ndim > 2 or ndim < 1. [...]
>
> I'm not sure it's necessary to say much about this in the PEP. It should in my view concentrate on arguing why the new binop is needed in the Python language, and for that, restricting to 2D is good enough IMHO.

Yes, I was thinking the same.

Best,
Matthew