Hi all,

Let me preface my two cents by saying that I think the best part of @ being 
accepted is the potential for deprecating the matrix class — the syntactic 
beauty of infix for matrix multiply is a nice side effect IMHO :) This may be 
why my basic attitude is:

I don’t think it matters very much, but I would vote (weakly) for weak-right. 
Where there is ambiguity, I suspect most practitioners will just put in 
parentheses anyway — especially with combinations of * and @, where I don’t 
think there is a natural, intuitive precedence relationship. At least, 
element-wise multiplication is very rare in math/physics texts as an explicitly 
defined elementary operation, so I’d be surprised if anybody had a strong 
intuition about the precedence of the ‘*’ operator. And the binding order 
doesn’t matter when it’s scalar multiplication.

I have quite a bit of code with large matrices where the order of matrix-vector 
multiplies is an important optimization, and I would certainly end up with a few 
simpler-looking expressions for op @ op @ vec, hence the weak preference for 
right-associativity. That said, I routinely come across situations where the 
optimal matrix multiplication order is more complicated than can be expressed 
as left-to-right or right-to-left (because some matrices might be diagonal, CSR 
or CSC), which is why the preference is only weak. I don’t see a downside in 
the use case where the expression actually is associative (as in 
matrix-matrix-vector).

Best, Chris



-- 
Chris Laumann
Sent with Airmail

On March 14, 2014 at 8:42:00 PM, Nathaniel Smith (n...@pobox.com) wrote:

Hi all,

Here's the main blocker for adding a matrix multiply operator '@' to Python: we 
need to decide what we think its precedence and associativity should be. I'll 
explain what that means so we're on the same page, and what the choices are, 
and then we can all argue about it. But even better would be if we could get 
some data to guide our decision, and this would be a lot easier if some of you 
all can help; I'll suggest some ways you might be able to do that.

So! Precedence and left- versus right-associativity. If you already know what 
these are you can skim down until you see CAPITAL LETTERS.

We all know what precedence is. Code like this:
  a + b * c
gets evaluated as:
  a + (b * c)
because * has higher precedence than +. It "binds more tightly", as they say. 
Python's complete precedence table is here:
  http://docs.python.org/3/reference/expressions.html#operator-precedence
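
(If you want to check for yourself how Python parses any given expression, the 
stdlib 'ast' module will show you the tree it builds -- just an illustrative 
snippet, output abbreviated:)

  import ast
  # The parse tree shows * binding tighter than +: the top-level node
  # is an Add whose right child is the whole (b * c) BinOp.
  tree = ast.parse("a + b * c", mode="eval")
  print(ast.dump(tree))
  # -> Expression(body=BinOp(left=Name(id='a', ...), op=Add(),
  #        right=BinOp(left=Name(id='b', ...), op=Mult(), ...)))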

Associativity, in the parsing sense, is less well known, though it's just as 
important. It's about deciding how to evaluate code like this:
  a * b * c
Do we use
  a * (b * c)    # * is "right associative"
or
  (a * b) * c    # * is "left associative"
? Here all the operators have the same precedence (because, uh... they're the 
same operator), so precedence doesn't help. And mostly we can ignore this in 
day-to-day life, because both versions give the same answer, so who cares. But 
a programming language has to pick one (consider what happens if one of those 
objects has a non-default __mul__ implementation). And of course it matters a 
lot for non-associative operations like
  a - b - c
or
  a / b / c
So when figuring out order of evaluations, what you do first is check the 
precedence, and then if you have multiple operators next to each other with the 
same precedence, you check their associativity. Notice that this means that if 
you have different operators that share the same precedence level (like + and 
-, or * and /), then they have to all have the same associativity. All else 
being equal, it's generally considered nice to have fewer precedence levels, 
because these have to be memorized by users.
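
(To see the __mul__ point concretely: here's a toy class that logs each 
multiplication as it happens. Nothing NumPy-specific, just a sketch:)

  class T:
      def __init__(self, name):
          self.name = name
      def __mul__(self, other):
          # Python calls this left-to-right for a chain of *'s.
          print("multiplying %s by %s" % (self.name, other.name))
          return T("(%s * %s)" % (self.name, other.name))

  a, b, c = T("a"), T("b"), T("c")
  a * b * c
  # prints:
  #   multiplying a by b
  #   multiplying (a * b) by c
  # i.e. Python evaluated (a * b) * c -- * is left associative.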

Right now in Python, every precedence level is left-associative, except for 
'**'. If you write these formulas without any parentheses, then what the 
interpreter will actually execute is:
  (a * b) * c
  (a - b) - c
  (a / b) / c
but
  a ** (b ** c)
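
(Easy to confirm at the interpreter prompt:)

  >>> 2 - 3 - 4      # (2 - 3) - 4, not 2 - (3 - 4)
  -5
  >>> 2 ** 3 ** 2    # 2 ** (3 ** 2) = 2 ** 9, not (2 ** 3) ** 2 = 64
  512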

Okay, that's the background. Here's the question. We need to decide on 
precedence and associativity for '@'. In particular, there are three different 
options that are interesting:

OPTION 1 FOR @:
Precedence: same as *
Associativity: left
My shorthand name for it: "same-left" (yes, very creative)

This means that if you don't use parentheses, you get:
   a @ b @ c  ->  (a @ b) @ c
   a * b @ c  ->  (a * b) @ c
   a @ b * c  ->  (a @ b) * c

OPTION 2 FOR @:
Precedence: more-weakly-binding than *
Associativity: right
My shorthand name for it: "weak-right"

This means that if you don't use parentheses, you get:
   a @ b @ c  ->  a @ (b @ c)
   a * b @ c  ->  (a * b) @ c
   a @ b * c  ->  a @ (b * c)

OPTION 3 FOR @:
Precedence: more-tightly-binding than *
Associativity: right
My shorthand name for it: "tight-right"

This means that if you don't use parentheses, you get:
   a @ b @ c  ->  a @ (b @ c)
   a * b @ c  ->  a * (b @ c)
   a @ b * c  ->  (a @ b) * c

We need to pick which of these options we think is best, based on whatever 
reasons we can think of, ideally more than "hmm, weak-right gives me warm fuzzy 
feelings" ;-). (In principle the other 2 possible options are tight-left and 
weak-left, but there doesn't seem to be any argument in favor of either, so 
we'll leave them out of the discussion.)

Some things to consider:

* and @ are actually not associative (in the math sense) with respect to each 
other, i.e., (a * b) @ c and a * (b @ c) in general give different results when 
'a' is not a scalar. So considering the two expressions 'a * b @ c' and 'a @ b 
* c', we can see that each of the three options produces different results in 
some cases.
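
(If you want to convince yourself, here's a quick check with today's np.dot 
standing in for @, using small non-scalar arrays:)

  import numpy as np

  a = np.array([[1., 2.], [3., 4.]])
  b = np.array([[5., 6.], [7., 8.]])
  c = np.array([[9., 10.], [11., 12.]])

  # "(a * b) @ c" vs "a * (b @ c)", spelled with np.dot:
  left = np.dot(a * b, c)    # elementwise product first, then matmul
  right = a * np.dot(b, c)   # matmul first, then elementwise product
  print(np.allclose(left, right))  # False -- they genuinely differ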

"Same-left" is the easiest to explain and remember, because it's just, "@ acts 
like * and /". So we already have to know the rule in order to understand other 
non-associative expressions like a / b / c or a - b - c, and it'd be nice if 
the same rule applied to things like a * b @ c so we only had to memorize *one* 
rule. (Of course there's ** which uses the opposite rule, but I guess everyone 
internalized that one in secondary school; that's not true for * versus @.) 
This is definitely the default we should choose unless we have a good reason to 
do otherwise.

BUT: there might indeed be a good reason to do otherwise, which is the whole 
reason this has come up. Consider:
    Mat1 @ Mat2 @ vec
Obviously this will execute much more quickly if we do
    Mat1 @ (Mat2 @ vec)
because that results in two cheap matrix-vector multiplies, while
    (Mat1 @ Mat2) @ vec
starts out by doing an expensive matrix-matrix multiply. So: maybe @ should be 
right associative, so that we get the fast behaviour without having to use 
explicit parentheses! /If/ these kinds of expressions are common enough that 
having to remember to put explicit parentheses in all the time is more of a 
programmer burden than having to memorize a special associativity rule for @. 
Obviously Mat @ Mat @ vec is more common than vec @ Mat @ Mat, but maybe 
they're both so rare that it doesn't matter in practice -- I don't know.
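
(For a feel for the size of the effect, a rough timing sketch -- again with 
np.dot standing in for @, and the 1000x1000 size picked arbitrarily:)

  import numpy as np
  from timeit import timeit

  Mat1 = np.random.rand(1000, 1000)
  Mat2 = np.random.rand(1000, 1000)
  vec = np.random.rand(1000)

  # Right-to-left: two cheap O(n^2) matrix-vector products.
  t_right = timeit(lambda: np.dot(Mat1, np.dot(Mat2, vec)), number=20)
  # Left-to-right: one expensive O(n^3) matrix-matrix product first.
  t_left = timeit(lambda: np.dot(np.dot(Mat1, Mat2), vec), number=20)
  print(t_right, t_left)  # expect t_right to be orders of magnitude smaller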

Also, if we do want @ to be right associative, then I can't think of any clever 
reasons to prefer weak-right over tight-right, or vice-versa. For the scalar 
multiplication case, I believe both options produce the same result in the same 
amount of time. For the non-scalar case, they give different answers. Do people 
have strong intuitions about what expressions like
  a * b @ c
  a @ b * c
should actually do? (I'm guessing not, but hey, you never know.)

And, while intuition is useful, it would be really *really* nice to be basing 
these decisions on more than *just* intuition, since whatever we decide will be 
subtly influencing the experience of writing linear algebra code in Python for 
the rest of time. So here's where I could use some help. First, of course, if 
you have any other reasons why one or the other of these options is better, 
then please share! But second, I think we need to know something about how 
often the Mat @ Mat @ vec type cases arise in practice. How often do non-scalar 
* and np.dot show up in the same expression? How often does it look like a * 
np.dot(b, c), and how often does it look like np.dot(a * b, c)? How often do we 
see expressions like np.dot(np.dot(a, b), c), and how often do we see 
expressions like np.dot(a, np.dot(b, c))? This would really help guide the 
debate. I don't have this data, and I'm not sure the best way to get it. A 
super-fancy approach would be to write a little script that uses the 'ast' 
module to count things automatically. A less fancy approach would be to just 
pick some code you've written, or a well-known package, grep through for calls 
to 'dot', and make notes on what you see. (An advantage of the less-fancy 
approach is that as a human you might be able to tell the difference between 
scalar and non-scalar *, or check whether it actually matters what order the 
'dot' calls are done in.)
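
(To make the "super-fancy approach" concrete, here's a rough sketch of such an 
ast-based counter. It only recognizes calls literally spelled dot(...) or 
something.dot(...), and only distinguishes a nested dot in the first argument 
from one elsewhere, so treat its counts as approximate:)

  import ast
  import sys
  from collections import Counter

  def is_dot(call):
      # Matches bare dot(...) and attribute calls like np.dot(...).
      f = call.func
      return ((isinstance(f, ast.Name) and f.id == "dot")
              or (isinstance(f, ast.Attribute) and f.attr == "dot"))

  counts = Counter()
  for path in sys.argv[1:]:
      with open(path) as fh:
          tree = ast.parse(fh.read(), filename=path)
      for node in ast.walk(tree):
          if isinstance(node, ast.Call) and is_dot(node):
              counts["total dot calls"] += 1
              nested = [arg for arg in node.args
                        if isinstance(arg, ast.Call) and is_dot(arg)]
              if nested:
                  # dot(dot(a, b), c)  vs  dot(a, dot(b, c))
                  if node.args and nested[0] is node.args[0]:
                      counts["nested dot in first arg"] += 1
                  else:
                      counts["nested dot in later arg"] += 1
  print(counts)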

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

_______________________________________________  
NumPy-Discussion mailing list  
NumPy-Discussion@scipy.org  
http://mail.scipy.org/mailman/listinfo/numpy-discussion  