On Fri, 2019-06-07 at 13:19 -0500, Sebastian Berg wrote:
> On Fri, 2019-06-07 at 07:18 +0200, Ralf Gommers wrote:
> >
> > On Fri, Jun 7, 2019 at 1:37 AM Nathaniel Smith <n...@pobox.com> wrote:
> > > My intuition is that what users actually want is for *native Python
> > > types* to be treated as having 'underspecified' dtypes, e.g. int is
> > > happy to coerce to int8/int32/int64/whatever, float is happy to coerce
> > > to float32/float64/whatever, but once you have a fully-specified numpy
> > > dtype, it should stay.
> >
> > Thanks Nathaniel, I think this expresses a possible solution better
> > than anything I've seen on this list before. An explicit
> > "underspecified types" concept could make casting understandable.
>
> Yes, there is one small additional annoyance (but maybe it is just
> that). In that 127 is the 'underspecified' dtype `uint7` (it can be
> safely cast both to uint8 and int8).
>
> > > In any case, it would probably be helpful to start by just writing
> > > down the whole set of rules we have now, because I'm not sure anyone
> > > understands all the details...
> >
> > +1
>
> OK, let me try to sketch the details below:
>
> 0. "Scalars" means scalars or 0-D arrays here.
>
> 1. The logic below will only be used if we have a mix of arrays and
>    scalars. If all are scalars, the logic is never used. (Plus one
>    additional tricky case within ufuncs, which is more hypothetical [0])
>

And of course I just realized that, trying to be simple, I forgot an
important point there: the logic in 2. is only used when there is a mix
of scalars and arrays, and the arrays are in the same or a higher
category. As an example:

    np.array([1, 2, 3], dtype=np.uint8) + np.float64(12.)

will not demote the float64, because the scalar's category "float" is
higher than the array's category "integer".

- Sebastian

> 2. Scalars will only be demoted within their category. The categories
>    and casting rules within the category are as follows:
>
>    Boolean:
>        Casts safely to all (nothing surprising).
>
>    Integers:
>        Casting is possible if output can hold the value.
>        This includes uint8(127) casting to an int8.
>        (unsigned and signed integers are the same "category")
>
>    Floats:
>        Scalars can be demoted based on value, roughly this
>        avoids overflows:
>            float16:    -65000 < value < 65000
>            float32:   -3.4e38 < value < 3.4e38
>            float64:  -1.7e308 < value < 1.7e308
>            float128 (largest type, does not apply).
>
>    Complex: Same logic as floats (applied to .real and .imag).
>
>    Others: Anything else.
>
> ---
>
> Ufunc, as well as `result_type` will use this liberally, which
> basically means finding the smallest type for each category and using
> that. Of course for floats we cannot do the actual cast until later,
> since initially we do not know if the cast will actually be
> performed.
>
> This is only tricky for uint vs. int, because uint8(127) is a "small
> unsigned". I.e. with our current dtypes there is no strict type
> hierarchy: uint8(x) may or may not cast to int8.
>
> ---
>
> We could think of doing:
>
>     arr, min_dtype = np.asarray_and_min_dtype(pyobject)
>
> which could even fix the list example Nathaniel had. Which would work
> if you would do the dtype hierarchy.
>
> This is where the `uint7` came from: a hypothetical `uint7` would fix
> the integer dtype hierarchy, by representing the numbers `0-127` which
> can be cast to uint8 and int8.
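
To make the integer and float rules in point 2 concrete, here is a
minimal pure-Python sketch. It is not NumPy's implementation: the helper
names `smallest_int_dtypes` and `smallest_float_dtype` are made up for
illustration, the integer table stops at 64 bits, and the float limits
are only the rough bounds quoted above. The integer helper returns every
narrowest candidate, which is where the `uint7` ambiguity shows up.

```python
# Illustrative sketch only; all names are hypothetical, this is not NumPy code.

# (name, bits, lowest value, highest value) for fixed-width integer dtypes.
INT_DTYPES = [
    (f"{prefix}int{bits}", bits, lo, hi)
    for bits in (8, 16, 32, 64)
    for prefix, lo, hi in (
        ("", -(2 ** (bits - 1)), 2 ** (bits - 1) - 1),  # signed
        ("u", 0, 2 ** bits - 1),                        # unsigned
    )
]

# Rough limits quoted in the message, checked from narrowest to widest.
FLOAT_LIMITS = [("float16", 65000.0), ("float32", 3.4e38)]


def smallest_int_dtypes(value):
    """Return the narrowest integer dtype name(s) able to hold `value`."""
    fitting = [(bits, name) for name, bits, lo, hi in INT_DTYPES
               if lo <= value <= hi]
    if not fitting:
        raise OverflowError(f"{value} does not fit any dtype in the table")
    width = min(bits for bits, _ in fitting)
    # Signed and unsigned share one category, so both may qualify.
    return [name for bits, name in fitting if bits == width]


def smallest_float_dtype(value):
    """Return the narrowest float dtype name whose rough limits hold `value`."""
    for name, limit in FLOAT_LIMITS:
        if -limit < value < limit:
            return name
    # Anything else (including inf and nan in this sketch) stays float64.
    return "float64"


print(smallest_int_dtypes(127))    # ['int8', 'uint8'], the "uint7" case
print(smallest_int_dtypes(200))    # ['uint8']
print(smallest_float_dtype(12.0))  # float16
```

A value such as 127 comes back with two equally narrow candidates, while
200 or -1 has exactly one; that is the missing strict hierarchy between
uint8 and int8 described above.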
>
> Best,
>
> Sebastian
>
>
> [0] Amendment for point 1:
>
> There is one detail (bug?) here in the logic though, that I missed
> before. If a ufunc (or result_type) sees a mix of scalars and arrays,
> it will try to decide whether or not to use value based logic. Value
> based logic will be skipped if the scalars are in a higher category
> (based on the ones above) than the highest array, for optimization I
> assume.
> Plausibly, this could cause incorrect logic when the dtype signature
> of a ufunc is mixed:
>
>     float32, int8 -> float32
>     float32, int64 -> float64
>
> May choose the second loop unnecessarily. Or for example if we have a
> datetime64 in the inputs, there would be no way for value based
> casting to be used.
>
> >
> > Ralf
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
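
The condition from point 1, together with the category check described
in the amendment, can be sketched as a small predicate. This is only a
reading of the description in this thread, not NumPy's code: the
function name `uses_value_based_logic` and the numeric ranks are
assumptions, comparing only the highest scalar category is an
assumption, and the "Others" category is left out because the thread
does not say where it ranks.

```python
# Illustrative sketch only; the name and the ranking are hypothetical.
CATEGORY_RANK = {"boolean": 0, "integer": 1, "float": 2, "complex": 3}


def uses_value_based_logic(array_categories, scalar_categories):
    """Return True if scalars would be considered for value-based demotion.

    Both arguments are lists of category names, one per operand.
    """
    # Point 1: the logic only applies to a mix of arrays and scalars.
    if not array_categories or not scalar_categories:
        return False
    highest_array = max(CATEGORY_RANK[c] for c in array_categories)
    # Assumption: only the highest scalar category is compared.
    highest_scalar = max(CATEGORY_RANK[c] for c in scalar_categories)
    # Amendment [0]: skipped when the scalars are in a higher category.
    return highest_scalar <= highest_array


# uint8 array with a float64 scalar: the float is not demoted.
print(uses_value_based_logic(["integer"], ["float"]))   # False
# float32 array with an integer scalar: value-based logic applies.
print(uses_value_based_logic(["float"], ["integer"]))   # True
# Scalars only: the logic is never used.
print(uses_value_based_logic([], ["integer", "float"]))  # False
```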