On Tue, 2019-06-11 at 22:08 -0400, Marten van Kerkwijk wrote: > HI Sebastian, > > Thanks for the overview! In the value-based casting, what perhaps > surprises me most is that it is done within a kind; it would seem an > improvement to check whether a given integer scalar is exactly > representable in a given float (your example of 1024 in `float16`). > If we switch to the python-only scalar values idea, I would suggest > to abandon this. That might make dealing with things like `Decimal` > or `Fraction` easier as well. >
Yeah, one can argue that since we have this "safe casting" based approach, we should go all the way for the value based logic. I think I tend to agree, but I am not quite sure right now to be honest. Fractions and Decimals are very interesting in that they raise the question what happens to user dtypes [0]. Although, you would still need a "no lower category" rule, since you do not want 1024. or 12/3 be demoted to an integer. For me right now, what is most interesting is what we should do with ufunc calls, and if we can simplify them. I feel right now we have to types of ufuncs: 1. Ufuncs which use a "common type", where we can find the minimal type before dispatching. 2. More complex ufuncs, for which finding the minimal type is trickier [1]. And while I could not find any weird enough ufunc, I am not sure that blind promotion is a good idea for general ufuncs. Best, Sebastian [0] A python fraction could be converted to int64/int64 or int32/int32, etc. depending on the value, in principle. If we want such things to work in principle, we need machinery (although I expect one could tag that on later). [1] It is not impossible, but we need to insert non-existing types into the type hierarchy. PS: Another interesting issue is that if we try to move away from value based casting for numpy scalars, that initial `np.asarray(...)` call may lose the information that a python integer was passed in. So to support such things, we might need a whole new machinery. > All the best, > > Marten > > On Tue, Jun 11, 2019 at 8:46 PM Sebastian Berg < > sebast...@sipsolutions.net> wrote: > > Hi all, > > > > strange, something went wrong sending that email, but in any > > case... > > > > I tried to "summarize" the current behaviour of promotion and value > > based promotion in numpy (correcting a small error in what I wrote > > earlier). Since it got a bit long, you can find it here (also copy > > pasted at the end): > > > > https://hackmd.io/NF7Jz3ngRVCIQLU6IZrufA > > > > Allan's document which I link in there is also very interesting. > > One > > thing I had not really thought about before was the problem of > > commutativity. > > > > I do not have any specific points I want to discuss based on it > > (but > > those are likely to come up again later). > > > > All the Best, > > > > Sebastian > > > > > > ----------------------------- > > > > PS: Below a copy of what I wrote: > > > > --- > > title: Numpy Value Based Promotion Rules > > author: Sebastian Berg > > --- > > > > > > > > NumPy Value Based Scalar Casting and Promotion > > ============================================== > > > > This document reviews some of the behaviours of the promotion rules > > within numpy. This is especially with respect to the promotion of > > scalars and 0D arrays which inspect the value to decide casting and > > promotion. > > > > Other documents discussing these things: > > > > * `from numpy.testing import print_coercion_tables` prints the > > current promotion tables including value based promotion for small > > positive/negative scalars. > > * Allan Haldane's thoughts on changing casting/promotion to be > > more > > C-like and discussing things such as here: > > > > https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76 > > * Discussion around the problem of uint64 and int64 being > > promoted to > > float64: https://github.com/numpy/numpy/issues/12525 (lists many > > related issues). > > > > > > Nomenclature and Defintions > > --------------------------- > > > > * **dtype/type**: The data type of an array or scalar: `float32`, > > `float64`, `int8`, … > > > > * **Category**: A category to which the data type belongs, in this > > context these are: > > 1. boolean > > 2. integer (unsigned and signed are not split up here, but are > > different "kinds") > > 3. floating point and complex (not split up here but are > > different > > "kinds") > > 5. All others > > > > * **Casting**: converting from one dtype to another. There are four > > different rules of casting: > > 1. *"safe"* casting: All values are representable in the new data > > type. I.e. no information is lost during the conversion. > > 2. *"same kind"* casting: data loss may occur, but only within > > the > > same "kind". For example a float64 can be converted to float32 > > using > > "same kind" rules, an int64 can be converted to int16. This is > > although > > both lose precision or even produce incorrect values. Note that > > "kind" > > is different from "category" in that it distinguishes between > > signed > > and unsigned integers. > > 4. *"unsafe"* casting: Any conversion which can be defined, e.g. > > floating point to integer. For promotion this is fairly > > unimportant. > > (Some conversions such as string to integer, which not even work > > fall > > in this category, but could also be called coercions or > > conversions.) > > > > * **Promotion**: The general process of finding a new dtype for > > multiple input dtypes. Will be used here to also denote any kind of > > casting/promotion done before a specific function is called. This > > can > > be more complex, because in rare cases a functions can for example > > take > > floating point numbers and integers as input at the same time (i.e. > > `np.ldexp`). > > > > * **Common dtype**: A dtype which can represent all input data. In > > general this means that all inputs can be safely cast to this > > dtype. > > Within numpy this is the normal and simplest form of promotion. > > > > * **`type1, type2 -> type3`**: Defines a promotion or signature. > > For > > example adding two integers: `np.int32(5) + np.int32(3)` gives > > `np.int32(8)`. The dtype signature for that example would be: > > `int32, > > int32 -> int32`. A short form for this is also `ii->i` using C-like > > type codes, this can be found for example in `np.ldexp.types` (and > > any > > numpy ufunc). > > > > * **Scalar**: A numpy or python scalar or a **0-D array**. It is > > important to remember that zero dimensional arrays are treated just > > like scalars with respect to casting and promotion. > > > > > > Current Situation in Numpy > > -------------------------- > > > > The current situation can be understand mostly in terms of safe > > casting > > which is defined based on the type hierarchy and is sensitive to > > values > > for scalars. > > > > This safe casting based approach is in contrast for example to > > promotion within C or Julia, which work based on category first. > > For > > example `int32` cannot be safely cast to `float32`, but C or Julia > > will > > use `int32, float32 -> float32` as the common type/promotion rule > > for > > example to decide on the output dtype for addition. > > > > > > ### Python Integers and Floats > > > > Note that python integers are handled exactly like numpy ones. They > > are, however, special in that they do not have a dtype associated > > with > > them explicitly. Value based logic, as described here, seems useful > > for > > python integers and floats to allow: > > ``` > > arr = np.arange(10, dtype=np.int8) > > arr += 1 > > # or: > > res = arr + 1 > > res.dtype == np.int8 > > ``` > > which ensures that no upcast (for example with higher memory usage) > > occurs. > > > > > > ### Safe Casting > > > > Most safe casting is clearly defined based on whether or not any > > possible value is representable in the ouput dtype. Within numpy > > there > > is currently a single exception to this rule: > > `np.can_cast(np.int64, > > np.float64, casting="safe")` is considered to be true although > > float64 > > cannot represent some large integer values exactly. In contrast, > > `np.can_cast(np.int32, np.float32, casting="safe")` is `False` and > > `np.float64` would have to be used if a "safe" cast is desired. > > > > This exception may be one thing that should be changed, however, > > concurrently the promotion rules have to be adapted to keep doing > > the > > same thing, or a larger behaviour change decided. > > > > > > #### Scalar based rules > > > > Unlike arrays, where inspection of all values is not feasable, for > > scalars (and 0-D arrays) the value is inspected. The casting > > becomes a > > two step process: > > 1. The minimal dtype capable of holding the value is found. > > 2. The normal casting rules are applied to the new dtype. > > > > The first step uses the following rules by finding the minimal > > dtype > > within its category: > > > > * Boolean: Dtype is already minimal > > > > * Integers: > > Casting is possible if output can hold the value. This includes > > uint8(127) casting to an int8. > > > > * Floats and Complex > > Scalars can be demoted based on value, roughly this avoids > > overflows: > > ``` > > float16: -65000 < value < 65000 > > float32: -3.4e38 < value < 3.4e38 > > float64: -1.7e308 < value < 1.7e308 > > float128 (largest type, does not apply). > > ``` > > For complex, the logic is simply applied to both real and > > imaginary > > part. Complex numbers cannot be downcast to floating point. > > > > * Others: Dtype is not modified. > > > > > > This two step process means that `np.can_cast(np.int16(1024), > > np.float16)` is `False` even though float16 is capable of exactly > > representing the value 1024, since value based "demotion" to a > > lower > > dtype is used only within each category. > > > > > > > > ### Common Type Promotion > > > > For most operations in numpy the output type is just the common > > type of > > the inputs, this holds for example for concatenation, as well as > > almost > > all math funcions (e.g. addition and multiplication have two > > identical > > inputs and need one ouput dtype). This operation is exposed as > > `np.result_type` which includes value based logic, and > > `np.promote_types` which only accepts dtypes as input. > > > > Normal type promotion without value based/scalar logic finds the > > smallest type which both inputs can cast to safely. This will be > > the > > largest "kind" (bool < unsigned < integer < float < complex < > > other). > > > > Note that type promotion is handled in a "reduce" manner from left > > to > > right. In rare cases this means it is not associatetive: `float32, > > uint16, int16 -> float32`, but `float32, (uint16, int16) -> > > float64`. > > > > #### Scalar based rule > > > > When there is a mix of scalars and arrays, numpy will usually allow > > the > > scalars to be handled in the same fashion as for "safe" casting > > rules. > > > > The rules are as follows: > > > > 1. Value based logic is only applied if the "category" of any array > > is > > larger or equal to the category of all scalars. If this is not the > > case, the typical rules are used. > > * Specifically, this means: `np.array([1, 2, 3], > > dtype=np.uint8) + > > np.float64(12.)` gives a `float64` result, because the > > `np.float64(12.)` is not considered for being demoted. > > > > 2. Promotion is applied as normally, however, instead of the > > original > > dtype, the minimal dtype is used. In the case where the minimal > > data > > type is unsigned (say uint8) but the value is small enough, the > > minimal > > type may in fact be either `uint8` or `int8` (127 can be both). > > This > > promotion is also applied in pairs (reduction-like) from left to > > right. > > > > > > ### General Promotion during Function Execution > > > > General functions (read "ufuncs" such as `np.add`) may have a > > specific > > dtype signature which is (for most dtypes) stored e.g. as > > `np.add.types`. For many of these functions the common type > > promotion > > is used unchanged. > > > > However, some functions will employ a slightly different method > > (which > > should be equivalent in most cases). They will loop through all > > loops > > listed in `np.add.types` in order and find the first one to which > > all > > inputs can be safely cast: > > ``` > > np.divide.types = ['ee->e', 'ff->f', 'dd->d', ...] > > ``` > > Thus, `np.divide(np.int16(4), np.float16(3)` will refuse the first > > `float16, float16 -> float16` (`'ee->e'`) loop because `int16` > > cannot > > be cast safely, and then pick the float32 (`'ff->f'`) one. > > > > For simple functions, which commonly have two identical inputs, > > this > > should be identical, since normally a clear order exists for the > > dtypes > > (it does require checking int8 before uint8, etc.). > > > > #### Scalar based rule > > > > When scalars are involved, the "safe" cast logic based on values is > > applied *if and only if* rule 1. applies as before: That is there > > must > > be an array with a higher or equal category as all of the scalars. > > > > In the above `np.divide` example, this means that > > `np.divide(np.int16(4), np.array([3], dtype=np.float16))` *will* > > use > > the `'ee->e'` loop, because the scalar `4` is of a lower or equal > > category than the array (integer <= float or complex). While > > checking, > > 4 is found to be safely castable to float16, since `(u)int8` is > > sufficient to hold 4 and that can be safely cast to `float16`. > > However, `np.divide(np.int16(4), np.int16(3))` would use `float32` > > because both are scalars and thus value based logic is not used > > (Note > > that in reality numpy forces double output for an all integer input > > in > > divide). > > > > In it is possible for ufuncs to have mixed type signatures (this is > > very rare within numy) and arbitrary inputs. In this case, in > > principle, the question is whether or not a clear ordering exists > > and > > if the rule of using value based logic is always clear. This is > > rather > > academical (I could not find any such function in numpy or > > `scipy.special` [^scipy-ufuncs]). But consider: > > ``` > > imaginary_ufunc.types: > > int32, float32 -> int32, float32 > > int64, float32 -> int64, float32 > > ... > > ``` > > it is not clear that `np.int64(5) + np.float32(3.)` should be able > > to > > demote the `5`. This is very theoretical of course > > > > > > > > > > Footnotes > > --------- > > > > [^scipy-ufuncs]: See for example these functions: > > ```python > > import scipy.special > > for n, func in scipy.special.__dict__.items(): > > if not isinstance(func, np.ufunc): > > continue > > > > if func.nin == 1: > > # a single input is not interesting > > continue > > > > # check if the signature is not uniform > > for types in func.types: > > if len(set(types[:func.nin])) != 1: > > break > > else: > > continue > > print(func, func.types) > > ``` > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion@python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@python.org > https://mail.python.org/mailman/listinfo/numpy-discussion
signature.asc
Description: This is a digitally signed message part
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion