I think dtype-based casting makes a lot of sense; the problem is backward compatibility.
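To make "dtype-based" concrete, here is a minimal sketch using `np.promote_types`, which takes only dtypes and therefore cannot depend on values. The `pairs` list and the `results` dict are just illustrative names, not anything from NumPy itself:

```python
import numpy as np

# Dtype-based promotion: the answer is a function of the two dtypes only,
# so the values stored in the operands can never change it.
pairs = [
    (np.uint16, np.int64),
    (np.float32, np.float64),
    (np.uint64, np.int64),
]

results = {}
for a, b in pairs:
    out = np.promote_types(a, b)
    results[(np.dtype(a).name, np.dtype(b).name)] = out.name
    print(f"{np.dtype(a).name} + {np.dtype(b).name} -> {out.name}")
```

The last pair is the uint64 + int64 -> float64 case that tends to surprise people.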
NumPy casting is weird in a number of ways: the array + array casting is
unexpected to many users (e.g., uint64 + int64 -> float64), and the
casting of array + scalar is different from that, and value based.
Personally I wouldn't want to try to change it unless we make a
backward-incompatible release (numpy 2.0), based on my experience trying
to change much more minor things. We already put "casting" on the list
of desired backward-incompatible changes here:
https://github.com/numpy/numpy/wiki/Backwards-incompatible-ideas-for-a-major-release

Relatedly, I've previously dreamed about a different "C-style" way
casting might behave:
https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76

The proposal there is that array + array casting, array + scalar
casting, and array + Python scalar casting would all work in the same
dtype-based way, which mimics the familiar "C" casting rules.

See also:
https://github.com/numpy/numpy/issues/12525

Allan


On 6/5/19 4:41 PM, Sebastian Berg wrote:
> Hi all,
>
> TL;DR:
>
> Value based promotion seems complex both for users and
> ufunc-dispatching/promotion logic. Is there any way we can move
> forward here, and if we do, could we just risk some possible (maybe
> not-existing) corner cases to break early to get on the way?
>
> -----------
>
> Currently when you write code such as:
>
>     arr = np.array([1, 43, 23], dtype=np.uint16)
>     res = arr + 1
>
> NumPy uses fairly sophisticated logic to decide that `1` can be
> represented as a uint16, and thus for all unary functions (and most
> others as well), the output will have a `res.dtype` of uint16.
>
> Similar logic also exists for floating point types, where a lower
> precision floating point can be used:
>
>     arr = np.array([1, 43, 23], dtype=np.float32)
>     (arr + np.float64(2.)).dtype  # will be float32
>
> Currently, this value based logic is enforced by checking whether the
> cast is possible: "4" can be cast to int8, uint8.
> So the first call above will at some point check if
> "uint16 + uint16 -> uint16" is a valid operation, find that it is,
> and thus stop searching. (There is the additional logic that when
> both/all operands are scalars, it is not applied.)
>
> Note that this is defined in terms of whether casting "1" to uint8 is
> safely possible, even though 1 may be typed as int64. This logic thus
> affects all promotion rules as well (i.e. what the output dtype
> should be).
>
>
> There are 2 main discussion points/issues about it:
>
> 1. Should value based casting/promotion logic exist at all?
>
> Arguably an `np.int32(3)` has type information attached to it, so why
> should we ignore it? It can also be tricky for users, because a small
> change in values can change the result data type.
> Because 0-D arrays and scalars are too close inside numpy (you will
> often not know which one you get), there is not much option but to
> handle them identically. However, it seems pretty odd that:
>  * `np.array(3, dtype=np.int32) + np.arange(10, dtype=np.int8)`
>  * `np.array([3], dtype=np.int32) + np.arange(10, dtype=np.int8)`
>
> give different results.
>
> This is a bit different for python scalars, which do not have a type
> attached already.
>
>
> 2. Promotion and type resolution in Ufuncs:
>
> What is currently bothering me is that the decision of what the
> output dtypes should be currently depends on the values in
> complicated ways. It would be nice if we could decide which type
> signature to use without actually looking at values (or at least only
> very early on).
>
> One reason here is caching and simplicity. I would like to be able to
> cache which loop should be used for what input. Having value based
> casting in there bloats up the problem.
> Of course it currently works OK, but especially when user dtypes come
> into play, caching would seem like a nice optimization option.
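As a rough illustration of why dtype-only resolution is easy to cache, here is a small sketch. It is not how the ufunc machinery actually works; `cached_result_dtype` and `result_dtype_for` are hypothetical helpers, and the cache key is simply the tuple of input dtypes:

```python
from functools import lru_cache

import numpy as np


@lru_cache(maxsize=None)
def cached_result_dtype(*dtypes):
    """Hypothetical helper: resolve the output dtype from dtypes alone."""
    # np.result_type is given only dtype objects here, so no values are
    # available to influence the answer.
    return np.result_type(*dtypes)


def result_dtype_for(*arrays):
    # Only the dtypes form the cache key; the array contents are never read.
    return cached_result_dtype(*(arr.dtype for arr in arrays))


a = np.array([1, 43, 23], dtype=np.uint16)
b = np.arange(3, dtype=np.int64)

first = result_dtype_for(a, b)       # computed and stored
second = result_dtype_for(a * 2, b)  # same dtypes, served from the cache
print(first, cached_result_dtype.cache_info())
```

With value based casting the key would also have to encode something about the scalar's value, which is exactly the bloat described above.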
>
> Because `uint8(127)` can also be an `int8`, but `uint8(128)` cannot,
> it is not as simple as finding the "minimal" dtype once and working
> with that.
> Of course Eric and I discussed this a bit before, and you could
> create an internal "uint7" dtype which has the only purpose of
> flagging that a cast to int8 is safe.
>
> I suppose it is possible I am barking up the wrong tree here, and
> this caching/predictability is not vital (or can be solved with such
> an internal dtype easily, although I am not sure it seems elegant).
>
>
> Possible options to move forward
> --------------------------------
>
> I still have to see a bit how tricky things are. But there are a few
> possible options. I would like to move the scalar logic to the
> beginning of ufunc calls:
>  * The uint7 idea would be one solution
>  * Simply implement something that works for numpy and all except
>    strange external ufuncs (I can only think of numba as a plausible
>    candidate for creating such).
>
> My current plan is to see where the second thing leaves me.
>
> We also should see if we cannot move the whole thing forward, in
> which case the main decision would have to be forward to where. My
> opinion is currently that when a type clearly has a dtype associated
> with it, we should always use that dtype in the future. This mostly
> means that numpy dtypes such as `np.int64` will always be treated
> like an int64, and never like a `uint8` because they happen to be
> castable to that.
>
> For values without a dtype attached (read python integers, floats), I
> see three options, from more complex to simpler:
>
> 1. Keep the current logic in place as much as possible
> 2. Only support value based promotion for operators, e.g.:
>    `arr + scalar` may do it, but `np.add(arr, scalar)` will not.
>    The upside is that it limits the complexity to a much simpler
>    problem, the downside is that the ufunc call and operator match
>    less clearly.
> 3. Just associate python float with float64 and python integers with
>    long/int64 and force users to always type them explicitly if they
>    need to.
>
> The downside of 1. is that it doesn't help with simplifying the
> current situation all that much, because we still have the special
> casting around...
>
>
> I have realized that this got much too long, so I hope it makes
> sense. I will continue to dabble along on these things a bit, so if
> nothing else maybe writing it helps me to get a bit clearer on
> things...
>
> Best,
>
> Sebastian
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion