Re: [Numpy-discussion] How exactly ought 'dot' to work?

2014-02-24 Thread Oscar Benjamin
On 24 February 2014 05:21, Nathaniel Smith n...@pobox.com wrote:

 I've tentatively rewritten the first section of the PEP to try and
 accomplish this framing:

 https://github.com/njsmith/numpy/blob/matmul-pep/doc/neps/return-of-revenge-of-matmul-pep.rst
 Comments welcome etc.

I've not been following the discussion about this, but just out of
interest: is there no interest in Matlab-style left and right matrix
division?

Personally I always found the idea of matrix division confusing and
dislike teaching it to students, but it does seem to be popular among
Matlab users.
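
For readers who haven't met the notation: in Matlab, left division A\b
solves A*x == b, and right division c/A solves y*A == c. A minimal NumPy
translation, with made-up values purely for illustration:

import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([[9.0],
              [8.0]])       # column vector
c = np.array([[9.0, 8.0]])  # row vector

x = np.linalg.solve(A, b)        # left division A \ b: solves A.dot(x) == b
y = np.linalg.solve(A.T, c.T).T  # right division c / A: solves y.dot(A) == c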


Oscar


Re: [Numpy-discussion] How exactly ought 'dot' to work?

2014-02-24 Thread Alan G Isaac
 On 23.02.2014 00:03, Nathaniel Smith wrote:
 Currently numpy's 'dot' acts a bit weird for ndim > 2 or ndim < 1. In
 practice this doesn't usually matter much, because these are very
 rarely used. But, I would like to nail down the behaviour so we can
 say something precise in the matrix multiplication PEP.


 On Sat, Feb 22, 2014 at 7:09 PM, Pauli Virtanen wrote:
 I'm not sure it's necessary to say much about this in the PEP. It should
 in my view concentrate on arguing why the new binop is needed in the
 Python language, and for that, restricting to 2D is good enough IMHO.

 How exactly Numpy makes use of the capability for > 2-dim arrays is
 something that should definitely be discussed.

 But I think this is a problem mainly interesting for Numpy devs, and not
 for CPython devs.


On 2/24/2014 12:21 AM, Nathaniel Smith wrote:
 I actually disagree strongly. I think it's very important to make
 clear that we have a clear, well thought through, and
 cross-project approach to what @ is supposed to mean


I think Pauli is right.  We know `@` is supposed to mean
matrix multiply when dealing with conformable 2d arrays.
That is the real motivation of the PEP.  I cannot see why
the PEP itself would need to go beyond that.  The behavior
of `@` in other cases seems a discussion that should go
*much* slower than that of the core of the PEP, which is
to get an operator for matrix multiplication.

Furthermore, I am not able to understand the principles
behind the discussion of how `@` should behave in other cases.
I do not think they are being clearly stated.  (I have added
a comment to the PEP asking for clarification.)  To be
concrete, if `@` is proposed to behave unlike Mathematica's
`Dot` command, I would hope to hear a very clear
mathematical motivation for this.  (Specifically, I do not
understand why `@` would do scalar product.)

OTOH, if the proposal is simply that `@` should behave like
NumPy's `dot` does, that should be stated plainly.

Cheers,
Alan Isaac



Re: [Numpy-discussion] How exactly ought 'dot' to work?

2014-02-23 Thread Nathaniel Smith
On Sat, Feb 22, 2014 at 7:09 PM, Pauli Virtanen p...@iki.fi wrote:
 On 23.02.2014 00:03, Nathaniel Smith wrote:
 Currently numpy's 'dot' acts a bit weird for ndim > 2 or ndim < 1. In
 practice this doesn't usually matter much, because these are very
 rarely used. But, I would like to nail down the behaviour so we can
 say something precise in the matrix multiplication PEP.

 I'm not sure it's necessary to say much about this in the PEP. It should
 in my view concentrate on arguing why the new binop is needed in the
 Python language, and for that, restricting to 2D is good enough IMHO.

 How exactly Numpy makes use of the capability for > 2-dim arrays is
 something that should definitely be discussed.

 But I think this is a problem mainly interesting for Numpy devs, and not
 for CPython devs.

I actually disagree strongly. I think it's very important to make
clear that we have a clear, well thought through, and cross-project
approach to what @ is supposed to mean, so that this doesn't come
across as numpy asking python-dev for a blank check to go and define
the de facto semantics of a new operator just for ourselves and
without any adult supervision. I just don't think they trust us that
much. (Honestly I probably wouldn't either in their place). It's true
that the higher-dim cases aren't the most central ones, but it can't
hurt to document all the details.

I've tentatively rewritten the first section of the PEP to try and
accomplish this framing:
   
https://github.com/njsmith/numpy/blob/matmul-pep/doc/neps/return-of-revenge-of-matmul-pep.rst
Comments welcome etc.

Also BTW in the process I discovered another reason why broadcasting
is better than the outer product semantics -- with broadcasting,
writing down matrix power for > 2d is trivial and natural, but with the
outer product semantics, it's practically impossible!
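
A quick sketch of that point, with made-up shapes and einsum standing in
for the broadcast semantics:

import numpy as np

A = np.random.rand(5, 3, 3)  # a stack of five 3x3 matrices

# Broadcast semantics: "A times A" squares each matrix in the stack;
# einsum spells out what a broadcasting dot would compute here.
A_squared = np.einsum('rij,rjk->rik', A, A)
assert A_squared.shape == (5, 3, 3)

# Outer-product semantics: dot pairs every matrix with every other
# matrix, so the result is 4-d and is not a stack of matrix squares.
assert np.dot(A, A).ndim == 4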

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] How exactly ought 'dot' to work?

2014-02-22 Thread Matthew Brett
Hi,

On Sat, Feb 22, 2014 at 2:03 PM, Nathaniel Smith n...@pobox.com wrote:
 Hi all,

 Currently numpy's 'dot' acts a bit weird for ndim > 2 or ndim < 1. In
 practice this doesn't usually matter much, because these are very
 rarely used. But, I would like to nail down the behaviour so we can
 say something precise in the matrix multiplication PEP. So here's one
 proposal.

 # CURRENT:

 dot(0d, any) -> scalar multiplication
 dot(any, 0d) -> scalar multiplication
 dot(1d, 1d) -> inner product
 dot(2d, 1d) -> treat 1d as column matrix, matrix-multiply, then
 discard added axis
 dot(1d, 2d) -> treat 1d as row matrix, matrix-multiply, then discard added
 axis
 dot(2d, 2d) -> matrix multiply
 dot(2-or-more d, 2-or-more d) -> a complicated outer product thing:
 Specifically, if the inputs have shapes (r, n, m), (s, m, k), then
 numpy returns an array with shape (r, s, n, k), created like:
 for i in range(r):
 for j in range(s):
 output[i, j, :, :] = np.dot(input1[i, :, :], input2[j, :, :])
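
For concreteness, a runnable version of that loop with made-up sizes:

import numpy as np

r, s, n, m, k = 2, 3, 4, 5, 6
input1 = np.random.rand(r, n, m)
input2 = np.random.rand(s, m, k)

# Pair every matrix in input1 with every matrix in input2, as described.
output = np.empty((r, s, n, k))
for i in range(r):
    for j in range(s):
        output[i, j, :, :] = np.dot(input1[i, :, :], input2[j, :, :])
assert output.shape == (r, s, n, k)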

 # PROPOSED:

 General rule: given dot on shape1, shape2, we try to match these
 shapes against two templates like
   (..., n?, m) and (..., m, k?)
 where ... indicates zero or more dimensions, and ? indicates an
 optional axis. ? axes are always matched before ... axes, so for an
 input with ndim=2, the ? axis is always matched. An unmatched ? axis
 is treated as having size 1.

 Next, the ... axes are broadcast against each other in the usual way
 (prepending 1s to make lengths the same, requiring corresponding
 entries to either match or have the value 1).  And then the actual
 computations are performed using the usual broadcasting rules.

 Finally, we return an output with shape (..., n?, k?). Here ...
 indicates the result of broadcasting the input ...'s against each
 other. And, n? and k? mean: either the value taken from the input
 shape, if the corresponding entry was matched -- but if no match was
 made, then we leave this entry out. The idea is that just as a column
 vector on the right is m x 1, a 1d vector on the right is treated as
 m x nothing. For purposes of actually computing the product,
 nothing acts like 1, as mentioned above. But it makes a difference
 in what we return: in each of these cases we copy the input shape into
 the output, so we can get an output with shape (n, nothing), or
 (nothing, k), or (nothing, nothing), which work out to be (n,),
 (k,) and (), respectively. This gives a (somewhat) intuitive principle
 for why dot(1d, 1d), dot(1d, 2d), dot(2d, 1d) are handled the way they
 are, and a general template for extending this behaviour to other
 operations like gufunc 'solve'.
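
For concreteness, a rough Python sketch of that matching rule
(np.broadcast_shapes is a much later NumPy helper, used here purely to
illustrate the '...' broadcasting step):

import numpy as np

def proposed_dot_shape(shape1, shape2):
    # Match shape1 against (..., n?, m) and shape2 against (..., m, k?).
    if not shape1 or not shape2:
        raise TypeError("0d operands are an error under this proposal")
    if len(shape1) == 1:
        batch1, n, m1, has_n = (), 1, shape1[0], False   # n? unmatched
    else:
        batch1, n, m1, has_n = shape1[:-2], shape1[-2], shape1[-1], True
    if len(shape2) == 1:
        batch2, m2, k, has_k = (), shape2[0], 1, False   # k? unmatched
    else:
        batch2, m2, k, has_k = shape2[:-2], shape2[-2], shape2[-1], True
    if m1 != m2:
        raise ValueError("core dimension mismatch: %d != %d" % (m1, m2))
    batch = np.broadcast_shapes(batch1, batch2)  # broadcast the '...' axes
    # n and k survive into the output only if they were actually matched.
    return batch + ((n,) if has_n else ()) + ((k,) if has_k else ())

assert proposed_dot_shape((5,), (5,)) == ()        # dot(1d, 1d) -> scalar
assert proposed_dot_shape((4, 5), (5,)) == (4,)    # dot(2d, 1d) -> (n,)
assert proposed_dot_shape((7, 4, 5), (5, 6)) == (7, 4, 6)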

 Anyway, the end result of this is that the PROPOSED behaviour differs
 from the current behaviour in the following ways:
 - passing 0d arrays to 'dot' becomes an error. (This in particular is
 an important thing to know, because if core Python adds an operator
 for 'dot', then we must decide what it should do for Python scalars,
 which are logically 0d.)
 - ndim > 2 arrays are now handled by aligning and broadcasting the extra
 axes, instead of taking an outer product. So dot((r, m, n), (r, n, k))
 returns (r, m, k), not (r, r, m, k).

 Comments?

The discussion might become confusing in the conflation of:

* backward incompatible changes to dot
* coherent behavior to propose in a PEP

Maybe we could concentrate on the second, on the basis that it's likely
to be a few years until it is available, and then work out compatibility
if the PEP gets accepted.

If A @ B means 'matrix multiply A, B' - then it would be a shame to
raise an error if A or B is a scalar.

Sympy, for example, will allow matrix multiplication by a scalar,
MATLAB / Octave too.

I have used > 2D dot calls in the past, maybe still do, I'm not sure.

Cheers,

Matthew


Re: [Numpy-discussion] How exactly ought 'dot' to work?

2014-02-22 Thread Jaime Fernández del Río
On Feb 22, 2014 2:03 PM, Nathaniel Smith n...@pobox.com wrote:

 Hi all,

 Currently numpy's 'dot' acts a bit weird for ndim > 2 or ndim < 1. In
 practice this doesn't usually matter much, because these are very
 rarely used. But, I would like to nail down the behaviour so we can
 say something precise in the matrix multiplication PEP. So here's one
 proposal.

 # CURRENT:

 dot(0d, any) -> scalar multiplication
 dot(any, 0d) -> scalar multiplication
 dot(1d, 1d) -> inner product
 dot(2d, 1d) -> treat 1d as column matrix, matrix-multiply, then
 discard added axis
 dot(1d, 2d) -> treat 1d as row matrix, matrix-multiply, then discard
added axis
 dot(2d, 2d) -> matrix multiply
 dot(2-or-more d, 2-or-more d) -> a complicated outer product thing:
 Specifically, if the inputs have shapes (r, n, m), (s, m, k), then
 numpy returns an array with shape (r, s, n, k), created like:
 for i in range(r):
 for j in range(s):
 output[i, j, :, :] = np.dot(input1[i, :, :], input2[j, :, :])

 # PROPOSED:

 General rule: given dot on shape1, shape2, we try to match these
 shapes against two templates like
   (..., n?, m) and (..., m, k?)
 where ... indicates zero or more dimensions, and ? indicates an
 optional axis. ? axes are always matched before ... axes, so for an
 input with ndim=2, the ? axis is always matched. An unmatched ? axis
 is treated as having size 1.

 Next, the ... axes are broadcast against each other in the usual way
 (prepending 1s to make lengths the same, requiring corresponding
 entries to either match or have the value 1).  And then the actual
 computations are performed using the usual broadcasting rules.

 Finally, we return an output with shape (..., n?, k?). Here ...
 indicates the result of broadcasting the input ...'s against each
 other. And, n? and k? mean: either the value taken from the input
 shape, if the corresponding entry was matched -- but if no match was
 made, then we leave this entry out. The idea is that just as a column
 vector on the right is m x 1, a 1d vector on the right is treated as
 m x nothing. For purposes of actually computing the product,
 nothing acts like 1, as mentioned above. But it makes a difference
 in what we return: in each of these cases we copy the input shape into
 the output, so we can get an output with shape (n, nothing), or
 (nothing, k), or (nothing, nothing), which work out to be (n,),
 (k,) and (), respectively. This gives a (somewhat) intuitive principle
 for why dot(1d, 1d), dot(1d, 2d), dot(2d, 1d) are handled the way they
 are, and a general template for extending this behaviour to other
 operations like gufunc 'solve'.

 Anyway, the end result of this is that the PROPOSED behaviour differs
 from the current behaviour in the following ways:
 - passing 0d arrays to 'dot' becomes an error. (This in particular is
 an important thing to know, because if core Python adds an operator
 for 'dot', then we must decide what it should do for Python scalars,
 which are logically 0d.)
 - ndim > 2 arrays are now handled by aligning and broadcasting the extra
 axes, instead of taking an outer product. So dot((r, m, n), (r, n, k))
 returns (r, m, k), not (r, r, m, k).

 Comments?

The proposed behavior for ndim > 2 is what matrix_multiply (is it still in
umath_tests?) does. The nice thing about the proposed new behavior is that
the old behavior is easy to reproduce by fooling around a little with the
shape of the first argument, while the opposite is not true.
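
For example, with np.matmul (which arrived later and implements exactly
these broadcasting semantics) and made-up shapes:

import numpy as np

a = np.random.rand(2, 4, 5)  # (r, n, m)
b = np.random.rand(3, 5, 6)  # (s, m, k)

# Insert a length-1 stack axis into the first argument so its stack axis
# broadcasts against b's: leading shapes (2, 1) x (3,) -> (2, 3).
old_style = np.matmul(a[:, None], b)
assert old_style.shape == (2, 3, 4, 6)  # (r, s, n, k), the outer-product result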

Jaime


 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org


Re: [Numpy-discussion] How exactly ought 'dot' to work?

2014-02-22 Thread Nathaniel Smith
On Sat, Feb 22, 2014 at 5:17 PM, Matthew Brett matthew.br...@gmail.com wrote:
 The discussion might become confusing in the conflation of:

 * backward incompatible changes to dot
 * coherent behavior to propose in a PEP

Right, I definitely am asking about how we think the ideal dot
operator should work. The migration strategy to get there from here
is a separate question, one that will raise a bunch of details that are a
distraction from the fundamental question of what we *want*.

 If A @ B means 'matrix multiply A, B' - then it would be a shame to
 raise an error if A or B is a scalar.

 Sympy, for example, will allow matrix multiplication by a scalar,
 MATLAB / Octave too.

Interesting! We do disagree on this then. I feel strongly that given
separate matrix product and elementwise product operators @ and *,
then 'scalar @ matrix' should be an error, and if you want scalar
(elementwise) product then you should write 'scalar * matrix'.

Sympy, MATLAB, Octave are not really good guides, because either they
have only a single operator available (Sympy with *, np.matrix with
*), or they have an alternative operator available but it's annoying
to type and rarely used (MATLAB/Octave with .*).

For us, the two operators are both first-class, and we've all decided
that the scalar/elementwise operator is actually the more important
and commonly used one, and matrix multiply is the unusual case
(regardless of whether we end up spelling it 'dot' or '@'). So why
would we want a special case for scalars in dot/@? And anyway,
TOOWTDI.

Notation like 'L * X @ Y' really makes it immediately clear what sort
of operations we're dealing with, too.

 I have used > 2D dot calls in the past, maybe still do, I'm not sure.

Have you used dot(>2D, >2D)? That's the case that this proposal would
change -- dot(>2D, <=2D) is the same under both the outer product and
broadcasting proposals.
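
For example, with only one stacked operand the two proposals already
agree (made-up shapes):

import numpy as np

a = np.random.rand(2, 4, 5)  # stack of two (4, 5) matrices
b = np.random.rand(5, 6)     # a single (5, 6) matrix

# Pairing each matrix in `a` with `b` and broadcasting `b` across the
# stack compute the same thing, so dot(>2D, <=2D) is uncontroversial.
assert np.dot(a, b).shape == (2, 4, 6)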

I feel pretty strongly that the broadcasting proposal is the right
thing, for consistency with other operators -- e.g., all ufuncs and
all functions in np.linalg already do broadcasting, so 'dot' is
currently really the odd one out. The outer product functionality is
potentially useful, but it should be called np.dot.outer, because
logically it has the same relationship to np.dot that np.add.outer has
to np.add, etc.
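
Compare how np.add.outer relates to np.add; a hypothetical np.dot.outer
would sit in the same spot:

import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20])

# np.add broadcasts its operands elementwise, while np.add.outer pairs
# every element of x with every element of y, giving a (3, 2) table.
assert np.add(x, np.array([10, 20, 30])).shape == (3,)
assert np.add.outer(x, y).shape == (3, 2)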

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] How exactly ought 'dot' to work?

2014-02-22 Thread josef.pktd
On Sat, Feb 22, 2014 at 5:17 PM, Matthew Brett matthew.br...@gmail.com wrote:
 Hi,

 On Sat, Feb 22, 2014 at 2:03 PM, Nathaniel Smith n...@pobox.com wrote:
 Hi all,

 Currently numpy's 'dot' acts a bit weird for ndim > 2 or ndim < 1. In
 practice this doesn't usually matter much, because these are very
 rarely used. But, I would like to nail down the behaviour so we can
 say something precise in the matrix multiplication PEP. So here's one
 proposal.

 # CURRENT:

 dot(0d, any) -> scalar multiplication
 dot(any, 0d) -> scalar multiplication
 dot(1d, 1d) -> inner product
 dot(2d, 1d) -> treat 1d as column matrix, matrix-multiply, then
 discard added axis
 dot(1d, 2d) -> treat 1d as row matrix, matrix-multiply, then discard added
 axis
 dot(2d, 2d) -> matrix multiply
 dot(2-or-more d, 2-or-more d) -> a complicated outer product thing:
 Specifically, if the inputs have shapes (r, n, m), (s, m, k), then
 numpy returns an array with shape (r, s, n, k), created like:
 for i in range(r):
 for j in range(s):
 output[i, j, :, :] = np.dot(input1[i, :, :], input2[j, :, :])

 # PROPOSED:

 General rule: given dot on shape1, shape2, we try to match these
 shapes against two templates like
   (..., n?, m) and (..., m, k?)
 where ... indicates zero or more dimensions, and ? indicates an
 optional axis. ? axes are always matched before ... axes, so for an
 input with ndim=2, the ? axis is always matched. An unmatched ? axis
 is treated as having size 1.

 Next, the ... axes are broadcast against each other in the usual way
 (prepending 1s to make lengths the same, requiring corresponding
 entries to either match or have the value 1).  And then the actual
 computations are performed using the usual broadcasting rules.

 Finally, we return an output with shape (..., n?, k?). Here ...
 indicates the result of broadcasting the input ...'s against each
 other. And, n? and k? mean: either the value taken from the input
 shape, if the corresponding entry was matched -- but if no match was
 made, then we leave this entry out. The idea is that just as a column
 vector on the right is m x 1, a 1d vector on the right is treated as
 m x nothing. For purposes of actually computing the product,
 nothing acts like 1, as mentioned above. But it makes a difference
 in what we return: in each of these cases we copy the input shape into
 the output, so we can get an output with shape (n, nothing), or
 (nothing, k), or (nothing, nothing), which work out to be (n,),
 (k,) and (), respectively. This gives a (somewhat) intuitive principle
 for why dot(1d, 1d), dot(1d, 2d), dot(2d, 1d) are handled the way they
 are, and a general template for extending this behaviour to other
 operations like gufunc 'solve'.

 Anyway, the end result of this is that the PROPOSED behaviour differs
 from the current behaviour in the following ways:
 - passing 0d arrays to 'dot' becomes an error. (This in particular is
 an important thing to know, because if core Python adds an operator
 for 'dot', then we must decide what it should do for Python scalars,
 which are logically 0d.)
 - ndim > 2 arrays are now handled by aligning and broadcasting the extra
 axes, instead of taking an outer product. So dot((r, m, n), (r, n, k))
 returns (r, m, k), not (r, r, m, k).

I don't quite manage to follow that description for nd behavior.
How do you figure out which axes to use for the cross-product

dot((m,m,m), (m, m)) ?

Doesn't numpy have a new gufunc that does this kind of
vectorized/broadcasted dot product?
I cannot find it right now.


 Comments?

 The discussion might become confusing in the conflation of:

 * backward incompatible changes to dot
 * coherent behavior to propose in a PEP

 Maybe we could concentrate on the second, on the basis that it's likely
 to be a few years until it is available, and then work out compatibility
 if the PEP gets accepted.

 If A @ B means 'matrix multiply A, B' - then it would be a shame to
 raise an error if A or B is a scalar.

 Sympy, for example, will allow matrix multiplication by a scalar,
 MATLAB / Octave too.

I also don't see a reason to disallow multiplication with a scalar
(it's just the math), but I doubt we use it in statsmodels.


 I have used > 2D dot calls in the past, maybe still do, I'm not sure.

I've never found a use for dot with ndim > 2 (nor tensordot); it never
did what I needed.

Josef


 Cheers,

 Matthew


Re: [Numpy-discussion] How exactly ought 'dot' to work?

2014-02-22 Thread Alan G Isaac
It would help me follow this discussion if it were broken up
into pieces:

- what is being asserted as first principles for `dot`
   (e.g., what mathematical understanding)?
- to what extent do other important implementations
   (e.g., Mathematica and Julia) deviate from the
   proposed mathematical understanding?
- were the motivations for any deviations adequate
   (e.g., supported by strong use cases)?

An example is the discussion of whether scalar
multiplication of a matrix should be represented
by * or by a new operator (e.g., @).  I am
personally most comfortable with the idea that
a new matrix multiplication operator would not handle
scalar multiplication or violate tensor product rules
(perhaps I am influenced by Mathematica), but I am not
prepared to argue the principles of such choices, and
would appreciate hearing from those who are.

Thanks,
Alan Isaac




Re: [Numpy-discussion] How exactly ought 'dot' to work?

2014-02-22 Thread Pauli Virtanen
On 23.02.2014 00:03, Nathaniel Smith wrote:
 Currently numpy's 'dot' acts a bit weird for ndim > 2 or ndim < 1. In
 practice this doesn't usually matter much, because these are very
 rarely used. But, I would like to nail down the behaviour so we can
 say something precise in the matrix multiplication PEP. 

I'm not sure it's necessary to say much about this in the PEP. It should
in my view concentrate on arguing why the new binop is needed in the
Python language, and for that, restricting to 2D is good enough IMHO.

How exactly Numpy makes use of the capability for > 2-dim arrays is
something that should definitely be discussed.

But I think this is a problem mainly interesting for Numpy devs, and not
for CPython devs.

-- 
Pauli Virtanen



Re: [Numpy-discussion] How exactly ought 'dot' to work?

2014-02-22 Thread Matthew Brett
Hi,

On Sat, Feb 22, 2014 at 4:09 PM, Pauli Virtanen p...@iki.fi wrote:
 On 23.02.2014 00:03, Nathaniel Smith wrote:
 Currently numpy's 'dot' acts a bit weird for ndim > 2 or ndim < 1. In
 practice this doesn't usually matter much, because these are very
 rarely used. But, I would like to nail down the behaviour so we can
 say something precise in the matrix multiplication PEP.

 I'm not sure it's necessary to say much about this in the PEP. It should
 in my view concentrate on arguing why the new binop is needed in the
 Python language, and for that, restricting to 2D is good enough IMHO.

Yes, I was thinking the same.

Best,

Matthew