Hi all,

Maybe to clarify this at least a little, here are some examples of what
currently happens and of what I could imagine we might move to (all in
terms of output dtype).
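Two helpers expose the machinery behind all of the examples below, so
the results can be checked without building arrays; a small sketch of
the current behaviour:

import numpy as np

# min_scalar_type finds the smallest dtype that can hold a value
# (positive python ints map to unsigned types):
np.min_scalar_type(127)    # uint8
np.min_scalar_type(-1)     # int8
np.min_scalar_type(2**20)  # uint32

# result_type applies the same value-based promotion rules:
np.result_type(np.int8, 127)     # int8  -- the value fits
np.result_type(np.int8, 128)     # int16 -- it no longer fits
np.result_type(np.float32, 12.)  # float32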
float32_arr = np.ones(10, dtype=np.float32)
int8_arr = np.ones(10, dtype=np.int8)
uint8_arr = np.ones(10, dtype=np.uint8)

Current behaviour:
------------------

float32_arr + 12.     # float32
float32_arr + 2**200  # float64 (because np.float32(2**200) == np.inf)

int8_arr + 127    # int8
int8_arr + 128    # int16
int8_arr + 2**20  # int32
uint8_arr + -1    # int16

# But only for arrays that are not 0d:
int8_arr + np.array(1, dtype=np.int32)    # int8
int8_arr + np.array([1], dtype=np.int32)  # int32

# When the actual typing is given, this does not change:
float32_arr + np.float64(12.)                  # float32
float32_arr + np.array(12., dtype=np.float64)  # float32

# Except when the scalar is of a higher kind (inexact, or complex):
int8_arr + np.float16(3)  # float16 (same as the 0d-array behaviour)

# The exact same happens with all ufuncs:
np.add(float32_arr, 1)                                # float32
np.add(float32_arr, np.array(12., dtype=np.float64))  # float32

Keeping value based casting only for python types
-------------------------------------------------

In this case, most examples above stay unchanged, because they use
plain python integers or floats, such as 2, 127, 12., 3, ... without
any type information attached (unlike e.g. `np.float64(12.)`). These
change, for example:

float32_arr + np.float64(12.)                  # float64
float32_arr + np.array(12., dtype=np.float64)  # float64
np.add(float32_arr, np.array(12., dtype=np.float64))  # float64

# A typed scalar such as `np.int32(1)` then behaves just like
# `np.array(1, dtype=np.int32)`:
int8_arr + np.int32(1)      # int32
int8_arr + np.int32(2**20)  # int32

Remove value based casting completely
-------------------------------------

We could simply abolish it completely; a python `1` would then always
behave the same as `np.int_(1)`. The downside of this is that:

int8_arr + 1  # int64 (or int32)

suddenly uses much more memory. Or, we remove it from ufuncs, but not
from operators:

int8_arr + 1  # int8 dtype

but:

np.add(int8_arr, 1)            # int64
# same as:
np.add(int8_arr, np.array(1))  # int64

The main reason why I was wondering about this is that for operators
the logic seems fairly simple, but for general ufuncs it seems more
complex.

Best,

Sebastian

On Wed, 2019-06-05 at 15:41 -0500, Sebastian Berg wrote:
> Hi all,
>
> TL;DR:
>
> Value based promotion seems complex both for users and for the
> ufunc-dispatching/promotion logic. Is there any way we can move
> forward here, and if we do, could we just risk some possible (maybe
> non-existing) corner cases breaking early to get on the way?
>
> -----------
>
> Currently when you write code such as:
>
> arr = np.array([1, 43, 23], dtype=np.uint16)
> res = arr + 1
>
> NumPy uses fairly sophisticated logic to decide that `1` can be
> represented as a uint16, and thus for all binary functions (and most
> others as well), the output will have a `res.dtype` of uint16.
>
> Similar logic also exists for floating point types, where a lower
> precision floating point can be used:
>
> arr = np.array([1, 43, 23], dtype=np.float32)
> (arr + np.float64(2.)).dtype  # will be float32
>
> Currently, this value based logic is enforced by checking whether the
> cast is possible: "4" can be cast to int8 or uint8. So the first call
> above will at some point check whether "uint16 + uint16 -> uint16" is
> a valid operation, find that it is, and thus stop searching. (There
> is the additional logic that when both/all operands are scalars, it
> is not applied.)
>
> Note that while this is defined in terms of casting ("1" can safely
> be cast to uint8 even though it may be typed as int64), the logic
> thus affects all promotion rules as well (i.e. what the output dtype
> should be).
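The casting check just described is visible directly through
`np.can_cast`, which applies the same value-inspecting rule when it is
handed a Python scalar (a small sketch of the current behaviour):

np.can_cast(100, np.int8)       # True  -- the value fits in int8
np.can_cast(150, np.int8)       # False -- 150 is out of range for int8
np.can_cast(150, np.uint8)      # True
np.can_cast(np.int64, np.int8)  # False -- dtypes, unlike values, never demote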
> There are 2 main discussion points/issues about it:
>
> 1. Should value based casting/promotion logic exist at all?
>
> Arguably an `np.int32(3)` has type information attached to it, so why
> should we ignore it? It can also be tricky for users, because a small
> change in values can change the result data type.
> Because 0-D arrays and scalars are too close inside numpy (you will
> often not know which one you get), there is not much option but to
> handle them identically. However, it seems pretty odd that:
>
> * `np.array(3, dtype=np.int32) + np.arange(10, dtype=np.int8)`
> * `np.array([3], dtype=np.int32) + np.arange(10, dtype=np.int8)`
>
> give a different result.
>
> This is a bit different for python scalars, which do not already have
> a type attached.
>
> 2. Promotion and type resolution in ufuncs:
>
> What is bothering me is that the decision what the output dtypes
> should be currently depends on the values in complicated ways. It
> would be nice if we could decide which type signature to use without
> actually looking at values (or at least only very early on).
>
> One reason here is caching and simplicity. I would like to be able to
> cache which loop should be used for which input. Having value based
> casting in there bloats up the problem. Of course it currently works
> OK, but especially when user dtypes come into play, caching would
> seem like a nice optimization option.
>
> Because `uint8(127)` can also be an `int8`, but `uint8(128)` cannot,
> it is not as simple as finding the "minimal" dtype once and working
> with that. Of course Eric and I discussed this a bit before, and you
> could create an internal "uint7" dtype whose only purpose is to flag
> that a cast to int8 is safe.
>
> I suppose it is possible I am barking up the wrong tree here, and
> this caching/predictability is not vital (or can be solved with such
> an internal dtype easily, although I am not sure that seems elegant).
>
>
> Possible options to move forward
> --------------------------------
>
> I still have to see a bit how tricky things are. But there are a few
> possible options. I would like to move the scalar logic to the
> beginning of ufunc calls:
>
> * The uint7 idea would be one solution.
> * Simply implement something that works for numpy and everything
>   except strange external ufuncs (I can only think of numba as a
>   plausible candidate for creating such).
>
> My current plan is to see where the second option leaves me.
>
> We should also see if we cannot move the whole thing forward, in
> which case the main decision would be where to move it to. My opinion
> is currently that when a type clearly has a dtype associated with it,
> we should always use that dtype in the future. This mostly means that
> numpy dtypes such as `np.int64` will always be treated like an int64,
> and never like a `uint8` just because they happen to be castable to
> that.
>
> For values without a dtype attached (read: python integers and
> floats), I see three options, from more complex to simpler:
>
> 1. Keep the current logic in place as much as possible.
> 2. Only support value based promotion for operators, e.g.:
>    `arr + scalar` may do it, but `np.add(arr, scalar)` will not.
>    The upside is that it limits the complexity to a much simpler
>    problem; the downside is that the ufunc call and the operator
>    then match less clearly.
> 3. Just associate python float with float64 and python integers with
>    long/int64 and force users to always type them explicitly if they
>    need to (sketched just below).
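To make option 3 concrete: the result could then be computed from the
attached dtypes alone. A minimal, hypothetical sketch (the helper name
`fixed_result_dtype` is made up; this is not current NumPy behaviour):

import numpy as np

def fixed_result_dtype(*operands):
    """Option 3: map each python scalar to one fixed dtype, then
    promote on dtypes alone -- values are never inspected."""
    dtypes = []
    for op in operands:
        if isinstance(op, bool):  # check bool first: bool is an int subclass
            dtypes.append(np.dtype(np.bool_))
        elif isinstance(op, int):
            dtypes.append(np.dtype(np.int_))       # python int -> long/int64
        elif isinstance(op, float):
            dtypes.append(np.dtype(np.float64))    # python float -> float64
        elif isinstance(op, complex):
            dtypes.append(np.dtype(np.complex128))
        else:
            dtypes.append(np.asarray(op).dtype)    # arrays/typed scalars keep theirs
    return np.result_type(*dtypes)

fixed_result_dtype(np.ones(10, dtype=np.int8), 1)       # int64, not int8
fixed_result_dtype(np.ones(10, dtype=np.float32), 12.)  # float64, not float32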
> The downside of 1. is that it doesn't help with simplifying the
> current situation all that much, because we still have the special
> casting around...
>
> I have realized that this got much too long, so I hope it makes
> sense. I will continue to dabble along on these things a bit, so if
> nothing else maybe writing it helps me get a bit clearer on things...
>
> Best,
>
> Sebastian