On Mar 17, 2014 5:54 PM, "Nathaniel Smith" <n...@pobox.com> wrote:
>
> On Sat, Mar 15, 2014 at 6:28 PM, Nathaniel Smith <n...@pobox.com> wrote:
> > Mathematica: instead of having an associativity, a @ b @ c gets
> > converted into mdot([a, b, c])
>
> So, I've been thinking about this (thanks to @rfateman for pointing it
> out), and wondering if Mathematica's approach is worth following up
> more. (It would need to make it past python-dev, of course, but the worst
> case is just that they say no and we're back where we are now, so we
> might as well think it through.)
>
> Here's how it would work:
>
> Currently Python has 3 different kinds of ops: left-associative (most
> of them), right-associative (**), and "chaining". Chaining is used for
> comparison ops. Example:
>
>     a < b < c
>
> gets parsed to something like
>
>     do_comparison(args=[a, b, c], ops=[lt, lt])
>
> Notice this is very different from either of
>
>     (a < b) < c
>     a < (b < c)
>
> which means that comparisons aren't left- OR right-associative --
> they're this other thing, "chaining".
>
> So we could propose adding a 4th kind of op, called "grouping", which
> would apply only to @. The idea is that
>
>     a @ b @ c
>
> would be equivalent to
>
>     operator.matmul((a, b, c))
>
> which eventually (see below) becomes a call to
>
>     a.__matmul__((a, b, c))
>
> We'd use exactly the same parsing rules as the chaining ops, so you
> can still control evaluation order with parentheses if you want:
>
>     a @ (b @ c)  ->  matmul((a, matmul((b, c))))
>     (a @ b) @ c  ->  matmul((matmul((a, b)), c))
>
> ...but if you don't specify, then each contiguous group of @ operators
> gets collected up and handed to __matmul__ together, and the
> __matmul__ implementation gets to decide which evaluation strategy to
> use.
>
> It's trivially fast for the computer to figure out the best evaluation
> order for matrix multiplication, so in practice I think this would
> mean that you could just stop worrying about parentheses for multiple
> contiguous calls to matmul. Fancier versions of __matmul__ defined on
> more specialized non-ndarray classes might even take into account
> their specialized knowledge of how expensive different evaluation
> orders are for their specific type -- I'm not sure if this actually
> happens in practice, but it might. (Like maybe the best way to
> evaluate a @ b @ c depends on the sparsity pattern in the various
> matrices, or maybe it depends on which matrices are currently on the
> GPU and which are in memory? Anyone have any real examples of this?)
>
> (Of course, this same evaluation-order problem arises for *all*
> expressions using numpy; being able to optimize whole expressions at a
> time is exactly what numexpr/numba/theano/etc. are useful for. So one
> could argue that "baking it in" to @ is pointless: if anyone gets
> tired of writing parentheses, they should just use one of those
> libraries. Or maybe evaluation-order problems arise so rarely for @
> that no one cares. But OTOH it would be so nice for once to just have
> a single best solution -- "always use @ and be happy, it just works"
> -- instead of all the caveats we normally give -- "@ is good in some
> cases, but in other cases mdot is better, or if you know what you're
> doing you can just use @ with the right parentheses...".)
>
> Of course, we still have to say something about what value a @ b @ c
> actually computes. In the PEP semantics, it isn't always associative
> -- specifically not if we do Mat @ vec @ Mat.
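For concreteness, here's a quick numpy sketch of that non-associativity, using np.dot since @ doesn't exist yet (the shapes and seed below are just made-up example values):

    import numpy as np

    rng = np.random.RandomState(0)
    Mat = rng.randn(3, 3)
    vec = rng.randn(3)

    # (Mat @ vec) @ Mat: the 1-d result of Mat.dot(vec) acts as a row
    # vector when dotted with Mat...
    left = np.dot(np.dot(Mat, vec), Mat)
    # ...whereas in Mat @ (vec @ Mat) the 1-d intermediate acts as a
    # column vector, so the two groupings compute different things.
    right = np.dot(Mat, np.dot(vec, Mat))

    print(np.allclose(left, right))   # False

Both groupings happen to produce a length-3 vector here, but the values genuinely differ (one is Mat.T.dot(Mat).dot(vec), the other Mat.dot(Mat.T).dot(vec)).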
> So in this approach, we still need to decide what
>
>     matmul((Mat, vec, Mat))
>
> should return.
>
> But this is actually a feature! Because obviously what *should* be
> returned in this case is *not* (Mat @ vec) @ Mat, *or* Mat @ (vec @
> Mat). Both of those answers are terrible; it's just that, if you have an
> ordinary left-/right-associative operator, those are your only
> options. What *should* be returned is an error. And in this scheme we
> get to see the whole @ expression at once, so we actually can raise an
> error for such things.
>
> So this approach possibly has nicer performance characteristics, and
> also possibly nicer semantics.
>
> Now, how would this look in terms of the language definition?
>
> As far as the parser and AST go, this would use exactly the same rules
> as the chaining ops, so that's easy.
>
> Having parsed, we must evaluate. Likely the most contentious part of
> this approach is that we now have an n-arg operator, so the standard
> __X__/__rX__ dichotomy won't work, and we need to do something like
> multiple dispatch. I haven't followed the debate on this issue in
> detail, but what I'd propose for this limited context is not to do
> anything like "real" multiple dispatch, but just to directly generalize
> the familiar __rX__ rule to n arguments. The __rX__ rule is how
> Python's existing binary operators work: to evaluate a # b, you
> usually try a.__foo__ and then b.__foo__, EXCEPT that if b's type is a
> proper subclass of a's, you try b first. Generalized to >2 arguments,
> this looks like:
>
>     def matmul(args):
>         # What operator.matmul(args) would do: try each argument's
>         # __matmul__ in priority order until one accepts.
>         candidates = list(args)
>         while candidates:
>             candidate = pop_next_candidate(candidates)
>             if hasattr(candidate, "__matmul__"):
>                 result = candidate.__matmul__(args)
>                 if result is not NotImplemented:
>                     return result
>         raise TypeError
>
>     def pop_next_candidate(candidates):
>         classes = [c.__class__ for c in candidates]
>         # We'll try the left-most remaining candidate...
>         for i in range(len(candidates)):
>             # ...unless there's another, untried candidate whose class
>             # is a proper subclass of this one's.
>             if not has_proper_subclass(classes[i], classes):
>                 return candidates.pop(i)
>         assert False  # unreachable: some class is always most-derived
>
>     def has_proper_subclass(class_, other_classes):
>         for other_class in other_classes:
>             if (issubclass(other_class, class_)
>                     and not issubclass(class_, other_class)):
>                 return True
>         return False
>
> ...which, it turns out, is exactly the lookup rule that
> __numpy_ufunc__ will use, so at least it isn't too weird from our
> point of view:
>
>     http://docs.scipy.org/doc/numpy-dev/reference/arrays.classes.html#numpy.class.__numpy_ufunc__
>
> There are still plenty of details to think about, e.g.:
>
> What does the in-place operator do? I think
>
>     a @= b @ c
>
> would have to be the same as
>
>     a = a @ (b @ c)
>
> and NOT
>
>     a = a @ b @ c
>
> because otherwise what do you do with
>
>     a @= b @ c + d
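As a toy check of the generalized lookup rule quoted above (this assumes the matmul() sketch from the quoted code is in scope; the class names are invented for illustration):

    class Base(object):
        def __matmul__(self, args):
            return "Base saw %d operands" % len(args)

    class Sub(Base):
        def __matmul__(self, args):
            return "Sub saw %d operands" % len(args)

    # The Sub instance gets first crack even though it isn't left-most,
    # because its class is a proper subclass of Base -- the n-ary
    # analogue of how b.__rX__ beats a.__X__ for subclasses today.
    print(matmul((Base(), Sub(), Base())))   # -> "Sub saw 3 operands"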
You cannot do in-place matrix multiplication without an intermediate copy of the first array, or at least of each row of the first array in turn (a sketch of the row-by-row variant is at the end of this message). I don't like the idea of providing syntax that looks faster but isn't; I'd rather see @= raise an error in the context of matrix multiplication.

> I'm not sure how this would interact with implementing np.dot (which
> we'd still need for its out= argument, and which will, I guess, do
> dispatch through the __numpy_ufunc__ mechanism?). We should probably
> work through this in detail.
>
> We'd still need to pick a precedence for @: grouping is different from
> left-associativity, so we can't do same-group; we'd have to pick
> either tight-group or weak-group. My gut feeling is that tight-group
> makes more sense than weak-group, because if @ is a magical thing that
> collects up a group of items, then it is helpful if there's a simple
> visual mapping where the group starts just to the left of the first @
> and extends just to the right of the last @.
>
> I'm still not at all sure this rigmarole is worth it -- I still think
> we need some data on how often people chain together multiple @ or
> np.dot calls.
>
> Still, I thought I'd throw this out here and see what people think of it.
>
> -n
>
> --
> Nathaniel J. Smith
> Postdoctoral researcher - Informatics - University of Edinburgh
> http://vorpus.org
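Re the row-by-row variant mentioned above, a minimal sketch of what it looks like (the function name is invented; it assumes b is square so the result fits back into a):

    import numpy as np

    def matmul_inplace(a, b):
        # a <- a.dot(b), overwriting a. Row i of the result depends only
        # on row i of the original a, so updating row by row is safe --
        # but each step still materializes a temporary row (the output
        # of dot) before writing it back; that per-row intermediate is
        # exactly the copy that can't be avoided.
        assert a.shape[1] == b.shape[0] == b.shape[1]
        for i in range(a.shape[0]):
            a[i] = a[i].dot(b)   # dot allocates the per-row temporary
        return a

So even the cheapest honest implementation of a @= b isn't copy-free, which is why I'd rather @= raise than pretend otherwise.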