On Sat, Mar 15, 2014 at 6:28 PM, Nathaniel Smith <n...@pobox.com> wrote:
> Mathematica: instead of having an associativity, a @ b @ c gets
> converted into mdot([a, b, c])
So, I've been thinking about this (thanks to @rfateman for pointing it out), and wondering if Mathematica's approach is worth following up on. (It would need to make it past python-dev, of course, but the worst case is just that they say no and we're back where we are now, so we might as well think it through.)

Here's how it would work:

Currently Python has 3 different kinds of ops: left-associative (most of them), right-associative (**), and "chaining". Chaining is used for the comparison ops. Example:

    a < b < c

gets parsed to something like

    do_comparison(args=[a, b, c], ops=[lt, lt])

Notice this is very different from either of

    (a < b) < c
    a < (b < c)

which means that comparisons aren't left- OR right-associative; they're this other thing, "chaining".

So we could propose adding a 4th kind of op, call it "grouping", which would apply only to @. The idea is that

    a @ b @ c

would be equivalent to

    operator.matmul((a, b, c))

which eventually (see below) becomes a call to

    a.__matmul__((a, b, c))

We'd use exactly the same parsing rules as the chaining ops, so you can still control evaluation order with parentheses if you want:

    a @ (b @ c)  ->  matmul((a, matmul((b, c))))
    (a @ b) @ c  ->  matmul((matmul((a, b)), c))

...but if you don't specify, then each contiguous group of @ operators gets collected up and handed to __matmul__ together, and the __matmul__ implementation gets to decide which evaluation strategy to use.

It's trivially fast for the computer to figure out the best evaluation order for matrix multiplication, so in practice I think this would mean you could just stop worrying about parentheses for multiple contiguous calls to matmul. Fancier versions of __matmul__ defined on more specialized non-ndarray classes might even take into account their specialized knowledge of how expensive different evaluation orders are for their specific type -- I'm not sure whether this actually happens in practice, but it might. (Like maybe the best way to evaluate a @ b @ c depends on the sparsity pattern in the various matrices, or maybe it depends on which matrices are currently on the GPU and which are in memory? Anyone have any real examples of this?)

(Of course, this same evaluation-order problem arises for *all* expressions using numpy; being able to optimize whole expressions at a time is exactly what numexpr/numba/theano/etc. are useful for. So one could argue that "baking it in" to @ is pointless, and that anyone who gets tired of writing parentheses should just use one of those libraries. Or maybe evaluation-order problems arise so rarely for @ that no one cares. But OTOH it would be so nice, for once, to have a single best solution -- "always use @ and be happy, it just works" -- instead of all the caveats we normally give -- "@ is good in some cases, but in other cases mdot is better, or if you know what you're doing you can just use @ with the right parentheses...".)

Of course, we still have to say something about what value a @ b @ c actually computes. In the PEP semantics, it isn't always associative -- specifically not if we do Mat @ vec @ Mat. So in this approach, we still need to decide what matmul((Mat, vec, Mat)) should return. But this is actually a feature! Because obviously what *should* be returned in this case is *not* (Mat @ vec) @ Mat, *or* Mat @ (vec @ Mat). Both of those answers are terrible; it's just that if you have an ordinary left-/right-associative operator, those are your only options. What *should* be returned is an error. And in this scheme we get to see the whole @ expression at once, so we actually can raise an error for such things.
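Just to make the "trivially fast" claim and the error case concrete, here's a rough sketch of what a grouping-aware implementation could do once it sees the whole chain. (chained_matmul is a made-up name; this assumes plain 2-d ndarrays, ignores the dispatch question discussed below, and just uses the textbook matrix-chain dynamic program.)

    import numpy as np

    def chained_matmul(args):
        # Hypothetical sketch: validate the whole chain up front,
        # then evaluate it in the cheapest order.
        args = [np.asarray(a) for a in args]
        if any(a.ndim != 2 for a in args):
            # This is where Mat @ vec @ Mat gets rejected outright,
            # instead of silently picking one of two bad associations.
            raise TypeError("chained @ requires 2-d operands")
        for a, b in zip(args[:-1], args[1:]):
            if a.shape[1] != b.shape[0]:
                raise ValueError("shape mismatch in chained @")
        # args[i] has shape (dims[i], dims[i + 1])
        dims = [args[0].shape[0]] + [a.shape[1] for a in args]
        n = len(args)
        # cost[i][j]: cheapest flop count for the subchain args[i:j+1]
        # split[i][j]: where to cut that subchain
        cost = [[0] * n for _ in range(n)]
        split = [[0] * n for _ in range(n)]
        for length in range(2, n + 1):
            for i in range(n - length + 1):
                j = i + length - 1
                cost[i][j] = float("inf")
                for k in range(i, j):
                    c = (cost[i][k] + cost[k + 1][j]
                         + dims[i] * dims[k + 1] * dims[j + 1])
                    if c < cost[i][j]:
                        cost[i][j], split[i][j] = c, k
        def evaluate(i, j):
            if i == j:
                return args[i]
            k = split[i][j]
            return np.dot(evaluate(i, k), evaluate(k + 1, j))
        return evaluate(0, n - 1)

With something like this, matmul((Mat, vec, Mat)) raises immediately, while a chain of compatible 2-d arrays evaluates in the cheapest order with no parentheses needed -- and the planning step is O(n^3) in the number of operands, i.e. negligible next to the multiplications themselves.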
So, this possibly has nicer performance characteristics, and also possibly has nicer semantics.

Now, how would this look in terms of the language definition? As far as the parser and AST go, this would use exactly the same rules as the chaining ops, so that's easy.

Having parsed, we must evaluate. Likely the most contentious part of this approach is that we now have an n-arg operator, so the standard __X__/__rX__ dichotomy won't work; we need to do something like multiple dispatch. I haven't followed the debate on this issue in detail, but what I'd propose for this limited context is not to do anything like "real" multiple dispatch, but just to directly generalize the familiar __rX__ rule to n arguments. The __rX__ rule is how Python's existing binary operators work: to evaluate a # b, you usually try a.__foo__ first and then b.__foo__, EXCEPT that if b's type is a proper subclass of a's type, you try b first. Generalized to more than 2 arguments, this looks like:

    # (this would live as operator.matmul)
    def matmul(args):
        candidates = list(args)
        while candidates:
            candidate = pop_next_candidate(candidates)
            if hasattr(candidate, "__matmul__"):
                result = candidate.__matmul__(args)
                if result is not NotImplemented:
                    return result
        raise TypeError

    def pop_next_candidate(candidates):
        classes = [c.__class__ for c in candidates]
        # We'll try the left-most remaining candidate...
        for i in range(len(candidates)):
            # ...unless there's another, untried candidate whose class
            # is a proper subclass of this one's.
            if not has_proper_subclass(classes[i], classes):
                return candidates.pop(i)
        assert False

    def has_proper_subclass(class_, other_classes):
        for other_class in other_classes:
            if (issubclass(other_class, class_)
                    and not issubclass(class_, other_class)):
                return True
        return False

...which, it turns out, is exactly the lookup rule that __numpy_ufunc__ will use, so at least it isn't too weird from our point of view:
    http://docs.scipy.org/doc/numpy-dev/reference/arrays.classes.html#numpy.class.__numpy_ufunc__

There are still plenty of details to think about, e.g.:

- What does the in-place operator do? I think

      a @= b @ c

  would have to be the same as

      a = a @ (b @ c)

  and NOT

      a = a @ b @ c

  because otherwise what do you do with a @= b @ c + d?

- I'm not sure how this would interact with implementing np.dot (which we'd still need for its out= argument, and which will, I guess, do dispatch through the __numpy_ufunc__ mechanism?). We should probably work through this in detail.

- We'd still need to pick a precedence for @: grouping is different from left-associativity, so we can't do same-group; we'd have to pick either tight-group or weak-group. My gut feeling is that tight-group makes more sense than weak-group, because if @ is a magical thing that collects up a group of items, then it's helpful if there's a simple visual mapping where the group starts just to the left of the first @ and extends just to the right of the last @.

I'm still not at all sure this rigmarole is worth it -- I still think we need some data on how often people chain together multiple @ or np.dot calls. Still, I thought I'd throw this out here and see what people think of it.

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org