On Wed, May 30, 2018 at 11:14 AM, Marten van Kerkwijk <m.h.vankerkw...@gmail.com> wrote:
> Hi All,
>
> Following on a PR combining the ability to provide fixed and flexible
> dimensions [1] (useful for, e.g., 3-vector input with a signature like
> `(3),(3)->(3)`, and for `matmul`, resp.; based on earlier PRs by Jaime
> [2] and Matt (Picus) [3]), I've now made a PR with a further
> enhancement, which allows one to indicate that a core dimension can
> be broadcast [4].
>
> A particular use case is `all_equal`, a new function suggested in a
> stalled PR by Matt (Harrigan) [5], which compares two arrays
> axis-by-axis, but short-circuits if a non-equality is found (unlike
> what is the case if one does `(a==b).all(axis)`). One thing that would
> be obviously useful for a routine like `all_equal` is to be able to
> provide an array as one argument and a constant as another, i.e., if
> the core dimensions can be broadcast if needed, just like they are in
> `(a==b).all(axis)`. This is currently not possible: with its signature
> of `(n),(n)->()`, the two arrays have to have the same trailing size.
>
> My PR provides the ability to indicate in the signature that a core
> dimension can be broadcast, by using a suffix of "|1". Thus, the
> signature of `all_equal` would become:
>
> ```
> (n|1),(n|1)->()
> ```
>
> Comments most welcome (yes, even on the notation - though I think it
> is fairly self-explanatory)!
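
For concreteness, here is a rough sketch of the limitation described above. It is only an illustration, not the code from the PR: `np.vectorize` stands in for a compiled gufunc, `all_equal_sketch` is a hypothetical name, and this version neither short-circuits nor runs fast. It only mimics how the `(n),(n)->()` signature treats core dimensions.

```python
import numpy as np

# Hypothetical stand-in for the proposed all_equal; np.vectorize emulates
# the gufunc signature in Python (no short-circuiting, no speed benefit).
all_equal_sketch = np.vectorize(
    lambda x, y: bool(np.array_equal(x, y)),
    signature="(n),(n)->()",
)

a = np.arange(12).reshape(4, 3)
b = a.copy()
b[2, 1] = -1  # make exactly one row differ

# Matching core dimensions work: one boolean per row.
rowwise = all_equal_sketch(a, b)

# The two-step spelling broadcasts a constant without complaint.
against_zero = (a == 0).all(axis=-1)

# With "(n),(n)->()", a length-1 core dimension is not broadcast against n=3.
try:
    all_equal_sketch(a, np.zeros(1, dtype=a.dtype))
    core_broadcast_error = None
except ValueError as exc:
    core_broadcast_error = str(exc)

print(rowwise)
print(against_zero)
print(core_broadcast_error)
```

The last call fails because the core dimension `n` would have to be 3 and 1 at the same time, which is the case the "|1" suffix is meant to allow.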

I'm currently -0.5 on both fixed dimensions and this broadcasting dimension idea. My reasoning is:

- The use cases seem fairly esoteric. For fixed dimensions, I guess the motivating example is cross-product (are there any others?). But would it be so bad for a cross-product gufunc to raise an error if it receives the wrong number of dimensions? For this broadcasting case... well, obviously we've survived this long without all_equal :-). And there's something funny about all_equal, since it's really smushing together two conceptually separate gufuncs for efficiency. Should we also have all_less_than, sum_square, ...? If this is a big problem, then wouldn't it be better to solve it in a general way, like dask or Numba or numexpr do?

To be clear, I'm not saying these features are necessarily *bad* ideas, in isolation -- just that the benefits aren't very convincing, and there are trade-offs, like:

- When it comes to the core ufunc machinery, we have a limited complexity budget. I'm nervous that if we add too many bells and whistles, we'll end up writing ourselves into a corner where we have trouble maintaining it, where it becomes difficult to predict how different features interact, and where it becomes increasingly difficult for third parties to handle all the different features in their __array_ufunc__ methods...

- And, we have a lot of other demands on the core ufunc machinery, that might be better places to spend our limited complexity budget. For example, can we come up with an extension to make np.sort a gufunc? That seems like a much higher priority than figuring out how to make all_equal a gufunc. What about refactoring the ufunc machinery to support user-defined dtypes? That'll need some serious work, and again, it's probably higher priority than supporting cross-product or all_equal directly (or at least it seems that way to me).
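
To illustrate the "two conceptually separate gufuncs" point: the unfused spelling is just an elementwise ufunc followed by a reduction, and the same pattern covers the other hypothetical fused functions. A minimal sketch using only existing NumPy calls (the variable names are made up for illustration, and none of this short-circuits):

```python
import numpy as np

a = np.array([[1, 1, 1], [4, 5, 6]])
b = np.array([[1, 1, 1], [4, 0, 6]])

# "all_equal" as two separate steps: elementwise equal, then an AND-reduction.
all_equal_two_step = np.logical_and.reduce(np.equal(a, b), axis=-1)

# Ordinary broadcasting already handles an array compared with a constant.
all_equal_const = np.logical_and.reduce(np.equal(a, 1), axis=-1)

# The same pattern spells the other fused candidates mentioned above.
all_less_than_two_step = np.logical_and.reduce(np.less(a, 5), axis=-1)
sum_square_two_step = np.add.reduce(np.square(a), axis=-1)

print(all_equal_two_step, all_equal_const)
print(all_less_than_two_step, sum_square_two_step)
```

Each two-step version allocates an intermediate array, which is the cost a fused gufunc (or a general tool like Numba or numexpr) would avoid.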

Maybe there are more compelling use cases that I'm missing, but as it is, I feel like trying to add too many features to the current ufunc machinery is pretty risky for future maintainability, and we shouldn't do it without really solid use cases.

-n

--
Nathaniel J. Smith -- https://vorpus.org