Hi all,

I am happy that we have the correct integer handling for NEP 50 merged,
so the relevant parts of the proposal can now be tested. [1]

However, this has highlighted that NumPy has problems with applying the
"cast safety" logic to scalars.  We had discussed this a bit yesterday,
and this is an attempt to summarize the issue and thoughts on how to
"resolve" it.

This mainly affects Python int, float, and complex due to their special
handling with NEP 50.


NumPy has the cast safety concept for converting between different
dtypes:
  https://numpy.org/doc/stable/reference/generated/numpy.can_cast.html

It uses "same_kind" in ufuncs (users do not usually notice this unless
`out=` or `dtype=` is used).
NumPy otherwise tends to use "unsafe" for casts and assignments by
default, which can lead to undefined/strange results at times.
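For concreteness, the dtype-level rules look like this (standard NumPy
behavior):

```python
import numpy as np

# "safe" only permits casts that preserve every possible value of the
# source dtype (e.g. widening int8 -> int64).
assert np.can_cast(np.int8, np.int64, casting="safe")
assert not np.can_cast(np.int64, np.int8, casting="safe")

# "same_kind" additionally allows casts within a kind, such as the
# narrowing int64 -> int8, but not e.g. float -> int.
assert np.can_cast(np.int64, np.int8, casting="same_kind")
assert not np.can_cast(np.float64, np.int64, casting="same_kind")
```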


Since casts and assignments use "unsafe" casting, scalars are often
converted in a non-safe way.  However, there are certain exceptions:

    np.arange(5)[3] = np.nan  # Errors (an unsafe cast would not)
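As a runnable check of that exception (assuming current NumPy behavior):

```python
import numpy as np

# Integer arrays reject values that cannot be meaningfully converted,
# even though a plain "unsafe" cast between the dtypes would not error.
arr = np.arange(5)
try:
    arr[3] = np.nan
    raised = False
except ValueError:
    raised = True
assert raised
assert arr[3] == 3  # the assignment did not happen
```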

More importantly, NEP 50 requires the following to error:

    np.uint8(3) + 5000  # 5000 cannot be converted to uint8

And we just put in a deprecation that would always disallow the above!
But what would the answer to:

    np.can_cast(5000, np.uint8, casting="safe/same_kind/unsafe")

be?  And how do we resolve the fact that casting scalars and casting
arrays involve different notions of "safety"?

I could imagine two main approaches:

* cast-safety doesn't apply to scalar conversions, they are whatever
  they currently are (sometimes unsafe, sometimes same-kind, but
  strictly safe for integer assignments).
  `np.can_cast(5000, np.uint8)` just errors out.  We have an assignment
  "safety" that is independent of casting safety.

  For `np.add(np.uint8(5), 100, casting="safe")` the "safe" (or any
  other mode) simply does not apply to the `100`, since effectively
  the assignment "safety" is used.

* Scalar conversions also have a cast-safety and it may inspect the
  value.
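The second approach could look something like the following sketch;
`scalar_can_cast` is a hypothetical helper for illustration only, not
NumPy API:

```python
import numpy as np

def scalar_can_cast(value, dtype):
    """Hypothetical value-inspecting cast check for Python scalars.

    Returns True only if `value` is exactly representable in `dtype`.
    """
    try:
        converted = np.array(value).astype(dtype, casting="unsafe")
    except (OverflowError, ValueError):
        return False
    # The round-trip comparison catches both wrap-around
    # (5000 -> uint8 gives 136) and truncation (100.5 -> 100).
    return bool(converted == value)

assert scalar_can_cast(100, np.int8)
assert not scalar_can_cast(5000, np.uint8)
assert not scalar_can_cast(100.5, np.int8)
```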

The problem with defining cast-safety for scalar conversion is not
implementing it, but rather how to (not?) resolve the inconsistencies.

Even if we change the default casting for assignments to "same_kind" (a
deprecation that would also apply to arrays):

    int8_arr[3] = 5000

should presumably be an error (not even "unsafe"), but:

    np.can_cast(np.int64, np.int8, casting="same_kind")

returns `True` (an int64 could hold the value 5000 just as well), and
`same_kind` is what ufuncs also use.


I don't have a clear plan on this right now; my best thought is that we
live with the inconsistency:

    np.can_cast(100, np.int8)

would be "safe" while:

    np.can_cast(100., np.int8)

would be "unsafe" (as would other conversions through `__int__`).  And:

    np.can_cast(1000, np.int8)

would always return `False` (the assignment would fail), even though
that is not what would happen when casting integers.
More confusingly, maybe:

    np.can_cast(1000., np.int8)

is "unsafe", and making it an outright error (rather than merely
unsafe) might be a future deprecation.

That would add a cast-safety that is slightly inconsistent between
Python integers and NumPy integers.

Cheers,

Sebastian



[1] The NEP: https://numpy.org/neps/nep-0050-scalar-promotion.html

The new part is mainly that `np.uint8(5) + 300` will now give the
proposed error (when opting in).
Calls that use `casting=` or `can_cast()` may not have the fully
correct future behavior, but these should be very niche.


[2] A bit tricky to define, but right now:

      arr.astype(new_dtype, casting="safe").astype(arr.dtype)

    should always round-trip correctly.
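That round-trip property can be checked directly:

```python
import numpy as np

arr = np.array([-100, 0, 100], dtype=np.int8)
# int8 -> int64 is a "safe" cast; casting back restores the values exactly.
roundtrip = arr.astype(np.int64, casting="safe").astype(arr.dtype)
assert roundtrip.dtype == arr.dtype
assert (roundtrip == arr).all()
```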

_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org