On Wed, 2019-06-05 at 14:14 -0700, Stephan Hoyer wrote:
> On Wed, Jun 5, 2019 at 1:43 PM Sebastian Berg <
> sebast...@sipsolutions.net> wrote:
> > Hi all,
> > 

<snip>

> > 
> > Because `uint8(127)` can also be an `int8`, but `uint8(128)` cannot,
> > it is not as simple as finding the "minimal" dtype once and working
> > with that.
> > Of course Eric and I discussed this a bit before, and you could
> > create
> > an internal "uint7" dtype which has the only purpose of flagging
> > that a
> > cast to int8 is safe.
> 
> Does NumPy actually have any logic that does this sort of check
> currently? If so, it would be interesting to see what it is.
> 
> My experiments suggest that we currently have this logic of finding
> the "minimal" dtype that can hold the scalar value:
> 
> >>> np.array([127], dtype=np.int8) + 127  # silent overflow!
> array([-2], dtype=int8)
> 
> >>> np.array([127], dtype=np.int8) + 128  # correct result
> array([255], dtype=int16)
> 

The current checks all come down to `np.can_cast` (on the C side this
is `PyArray_CanCastArrayTo()`) answering True. The actual result value
is not taken into account, of course. So 127 can be represented as int8,
and since the "int8,int8->int8" loop is checked first (and "can cast"
correctly), it is used.
Alternatively, you can think of it as using `np.result_type()`, which
will, for all practical purposes, give the same dtype (but
`np.result_type()` may or may not actually be used, and there are some
subtle differences in principle).
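
To make the loop selection described here concrete, the following is a
rough pure-Python model of it. It is only a sketch: the `RANGES` table,
the `LOOPS` list and all helper names are made up for illustration, the
real logic lives in C and covers many more dtypes, and the model assumes
that loops are tried in order and that a Python integer "can cast"
whenever its value fits.

```python
# Toy model of value-based loop selection; not NumPy's actual code.
# Inclusive value ranges of a few integer dtypes.
RANGES = {
    "int8": (-128, 127),
    "uint8": (0, 255),
    "int16": (-32768, 32767),
}

# Loops in the order they are tried (a small, assumed subset).
LOOPS = ["int8", "uint8", "int16"]


def dtype_can_cast(from_dtype, to_dtype):
    """Safe cast between dtypes: every value of from_dtype fits in to_dtype."""
    lo, hi = RANGES[from_dtype]
    to_lo, to_hi = RANGES[to_dtype]
    return to_lo <= lo and hi <= to_hi


def value_can_cast(value, to_dtype):
    """Value-based cast for a Python integer: only the value has to fit."""
    lo, hi = RANGES[to_dtype]
    return lo <= value <= hi


def select_loop(array_dtype, scalar):
    """Return the first loop that both operands can be cast to."""
    for loop in LOOPS:
        if dtype_can_cast(array_dtype, loop) and value_can_cast(scalar, loop):
            return loop
    raise TypeError("no matching loop")


def wrap(value, dtype):
    """Wrap an integer into the range of dtype, as fixed-width math does."""
    lo, hi = RANGES[dtype]
    return (value - lo) % (hi - lo + 1) + lo
```

In this model `select_loop("int8", 127)` picks the int8 loop, so
127 + 127 wraps to -2, while `select_loop("int8", 128)` falls through to
the int16 loop and 127 + 128 gives 255, matching the two examples quoted
above.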

Effectively, in your example you could reduce it to a minimal dtype of
uint7 for 127, since a uint7 can be cast safely to an int8 and also to
a uint8. (If you just said the minimal dtype is uint8, you could not
distinguish the two examples.)
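
A small sketch of the uint7 idea, in pure Python: a Python integer is
tagged once with a "minimal dtype", and "can cast" is then answered from
the tag alone. The `uint7` tag, the `RANGES` table and the function names
are hypothetical and only illustrate the idea; no such dtype exists in
NumPy.

```python
# Hypothetical "minimal dtype" tags; "uint7" is not a real NumPy dtype.
RANGES = {
    "uint7": (0, 127),
    "int8": (-128, 127),
    "uint8": (0, 255),
    "int16": (-32768, 32767),
}

# Candidate tags from narrowest to widest.
TAGS = ["uint7", "int8", "uint8", "int16"]


def minimal_tag(value):
    """Tag a Python integer once with the narrowest range that holds it."""
    for tag in TAGS:
        lo, hi = RANGES[tag]
        if lo <= value <= hi:
            return tag
    raise OverflowError("value outside the ranges modelled here")


def tag_can_cast(tag, to_dtype):
    """Safe cast decided from the tag alone, without looking at the value."""
    lo, hi = RANGES[tag]
    to_lo, to_hi = RANGES[to_dtype]
    return to_lo <= lo and hi <= to_hi
```

Here 127 is tagged `uint7`, which casts safely to both int8 and uint8,
while 128 is tagged `uint8`, which does not cast safely to int8. Without
the extra tag, both values would be tagged `uint8` and the two cases
would look identical.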

Does that answer the question?

Best,

Sebastian

> 
> > I suppose it is possible I am barking up the wrong tree here, and
> > this
> > caching/predictability is not vital (or can be solved with such an
> > internal dtype easily, although I am not sure it seems elegant).
> > 
> > 
> > Possible options to move forward
> > --------------------------------
> > 
> > I still have to see a bit how tricky things are. But there are a few
> > possible options. I would like to move the scalar logic to the
> > beginning of ufunc calls:
> >   * The uint7 idea would be one solution
> >   * Simply implement something that works for numpy and all except
> >     strange external ufuncs (I can only think of numba as a
> > plausible
> >     candidate for creating such).
> > 
> > My current plan is to see where the second thing leaves me.
> > 
> > We also should see if we cannot move the whole thing forward, in
> > which
> > case the main decision would have to be forward to where. My
> > opinion is
> > currently that when a type has a dtype associated with it clearly,
> > we
> > should always use that dtype in the future. This mostly means that
> > numpy dtypes such as `np.int64` will always be treated like an
> > int64,
> > and never like a `uint8` because they happen to be castable to
> > that.
> > 
> > For values without a dtype attached (read: Python integers, floats),
> > I
> > see three options, from more complex to simpler:
> > 
> > 1. Keep the current logic in place as much as possible
> > 2. Only support value based promotion for operators, e.g.:
> >    `arr + scalar` may do it, but `np.add(arr, scalar)` will not.
> >    The upside is that it limits the complexity to a much simpler
> >    problem, the downside is that the ufunc call and operator match
> >    less clearly.
> > 3. Just associate python float with float64 and python integers
> > with
> >    long/int64 and force users to always type them explicitly if
> > they
> >    need to.
> > 
> > The downside of 1. is that it doesn't help with simplifying the
> > current
> > situation all that much, because we still have the special casting
> > around...
> 
> I think it would be fine to special case operators, but NEP-13 means
> that the ufuncs corresponding to operators really do need to work
> exactly the same way. So we should also special-case those ufuncs.
> 
> I don't think Option (3) is viable. Too many users rely upon
> arithmetic like "x + 1" having a predictable dtype.
>  
> > I have realized that this got much too long, so I hope it makes
> > sense.
> > I will continue to dabble along on these things a bit, so if
> > nothing
> > else maybe writing it helps me to get a bit clearer on things...
> > 
> > Best,
> > 
> > Sebastian
> > 
> > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
