Hi Sebastian, One way to avoid an ugly lookup table and special cases is to store the amount of sign bits, the amount of integer/mantissa bits and the amount of exponent bits for each numeric style. A safe cast can only happen if all three are exceeded or equal. Just a thought.
Best Regards, Hameer Abbasi > On Wednesday, Jun 12, 2019 at 9:50 PM, Sebastian Berg > <sebast...@sipsolutions.net (mailto:sebast...@sipsolutions.net)> wrote: > On Wed, 2019-06-12 at 12:03 -0500, Sebastian Berg wrote: > > On Tue, 2019-06-11 at 22:08 -0400, Marten van Kerkwijk wrote: > > > HI Sebastian, > > > > > > Thanks for the overview! In the value-based casting, what perhaps > > > surprises me most is that it is done within a kind; it would seem > > > an > > > improvement to check whether a given integer scalar is exactly > > > representable in a given float (your example of 1024 in `float16`). > > > If we switch to the python-only scalar values idea, I would suggest > > > to abandon this. That might make dealing with things like `Decimal` > > > or `Fraction` easier as well. > > > > > > > Yeah, one can argue that since we have this "safe casting" based > > approach, we should go all the way for the value based logic. I think > > I > > tend to agree, but I am not quite sure right now to be honest. > > Just realized, one issue with this is that you get much more "special > cases" if you think of it in terms of "minimal dtype". Because > suddenly, not just the unsigned/signed integers such as "< 128" are > special, but even more values require special handling. An int16 > "minimal dtype" may or may not be castable to float16. > > For `can_cast` that does not matter much, but if we use the same logic > for promotion things may get uglier. Although, maybe it just gets > uglier implementation wise and is fairly logic on the user side... > > - Sebastian > > > > > > Fractions and Decimals are very interesting in that they raise the > > question what happens to user dtypes [0]. Although, you would still > > need a "no lower category" rule, since you do not want 1024. or 12/3 > > be > > demoted to an integer. > > > > For me right now, what is most interesting is what we should do with > > ufunc calls, and if we can simplify them. I feel right now we have to > > types of ufuncs: > > > > 1. Ufuncs which use a "common type", where we can find the minimal > > type > > before dispatching. > > > > 2. More complex ufuncs, for which finding the minimal type is > > trickier > > [1]. And while I could not find any weird enough ufunc, I am not sure > > that blind promotion is a good idea for general ufuncs. > > > > Best, > > > > Sebastian > > > > > > [0] A python fraction could be converted to int64/int64 or > > int32/int32, > > etc. depending on the value, in principle. If we want such things to > > work in principle, we need machinery (although I expect one could tag > > that on later). > > [1] It is not impossible, but we need to insert non-existing types > > into > > the type hierarchy. > > > > > > > > PS: Another interesting issue is that if we try to move away from > > value > > based casting for numpy scalars, that initial `np.asarray(...)` call > > may lose the information that a python integer was passed in. So to > > support such things, we might need a whole new machinery. > > > > > > > > > > > All the best, > > > > > > Marten > > > > > > On Tue, Jun 11, 2019 at 8:46 PM Sebastian Berg < > > > sebast...@sipsolutions.net> wrote: > > > > Hi all, > > > > > > > > strange, something went wrong sending that email, but in any > > > > case... > > > > > > > > I tried to "summarize" the current behaviour of promotion and > > > > value > > > > based promotion in numpy (correcting a small error in what I > > > > wrote > > > > earlier). Since it got a bit long, you can find it here (also > > > > copy > > > > pasted at the end): > > > > > > > > https://hackmd.io/NF7Jz3ngRVCIQLU6IZrufA > > > > > > > > Allan's document which I link in there is also very interesting. > > > > One > > > > thing I had not really thought about before was the problem of > > > > commutativity. > > > > > > > > I do not have any specific points I want to discuss based on it > > > > (but > > > > those are likely to come up again later). > > > > > > > > All the Best, > > > > > > > > Sebastian > > > > > > > > > > > > ----------------------------- > > > > > > > > PS: Below a copy of what I wrote: > > > > > > > > --- > > > > title: Numpy Value Based Promotion Rules > > > > author: Sebastian Berg > > > > --- > > > > > > > > > > > > > > > > NumPy Value Based Scalar Casting and Promotion > > > > ============================================== > > > > > > > > This document reviews some of the behaviours of the promotion > > > > rules > > > > within numpy. This is especially with respect to the promotion of > > > > scalars and 0D arrays which inspect the value to decide casting > > > > and > > > > promotion. > > > > > > > > Other documents discussing these things: > > > > > > > > * `from numpy.testing import print_coercion_tables` prints the > > > > current promotion tables including value based promotion for > > > > small > > > > positive/negative scalars. > > > > * Allan Haldane's thoughts on changing casting/promotion to be > > > > more > > > > C-like and discussing things such as here: > > > > > > > > https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76 > > > > * Discussion around the problem of uint64 and int64 being > > > > promoted to > > > > float64: https://github.com/numpy/numpy/issues/12525 (lists many > > > > related issues). > > > > > > > > > > > > Nomenclature and Defintions > > > > --------------------------- > > > > > > > > * **dtype/type**: The data type of an array or scalar: `float32`, > > > > `float64`, `int8`, … > > > > > > > > * **Category**: A category to which the data type belongs, in > > > > this > > > > context these are: > > > > 1. boolean > > > > 2. integer (unsigned and signed are not split up here, but are > > > > different "kinds") > > > > 3. floating point and complex (not split up here but are > > > > different > > > > "kinds") > > > > 5. All others > > > > > > > > * **Casting**: converting from one dtype to another. There are > > > > four > > > > different rules of casting: > > > > 1. *"safe"* casting: All values are representable in the new > > > > data > > > > type. I.e. no information is lost during the conversion. > > > > 2. *"same kind"* casting: data loss may occur, but only within > > > > the > > > > same "kind". For example a float64 can be converted to float32 > > > > using > > > > "same kind" rules, an int64 can be converted to int16. This is > > > > although > > > > both lose precision or even produce incorrect values. Note that > > > > "kind" > > > > is different from "category" in that it distinguishes between > > > > signed > > > > and unsigned integers. > > > > 4. *"unsafe"* casting: Any conversion which can be defined, > > > > e.g. > > > > floating point to integer. For promotion this is fairly > > > > unimportant. > > > > (Some conversions such as string to integer, which not even work > > > > fall > > > > in this category, but could also be called coercions or > > > > conversions.) > > > > > > > > * **Promotion**: The general process of finding a new dtype for > > > > multiple input dtypes. Will be used here to also denote any kind > > > > of > > > > casting/promotion done before a specific function is called. This > > > > can > > > > be more complex, because in rare cases a functions can for > > > > example > > > > take > > > > floating point numbers and integers as input at the same time > > > > (i.e. > > > > `np.ldexp`). > > > > > > > > * **Common dtype**: A dtype which can represent all input data. > > > > In > > > > general this means that all inputs can be safely cast to this > > > > dtype. > > > > Within numpy this is the normal and simplest form of promotion. > > > > > > > > * **`type1, type2 -> type3`**: Defines a promotion or signature. > > > > For > > > > example adding two integers: `np.int32(5) + np.int32(3)` gives > > > > `np.int32(8)`. The dtype signature for that example would be: > > > > `int32, > > > > int32 -> int32`. A short form for this is also `ii->i` using C- > > > > like > > > > type codes, this can be found for example in `np.ldexp.types` > > > > (and > > > > any > > > > numpy ufunc). > > > > > > > > * **Scalar**: A numpy or python scalar or a **0-D array**. It is > > > > important to remember that zero dimensional arrays are treated > > > > just > > > > like scalars with respect to casting and promotion. > > > > > > > > > > > > Current Situation in Numpy > > > > -------------------------- > > > > > > > > The current situation can be understand mostly in terms of safe > > > > casting > > > > which is defined based on the type hierarchy and is sensitive to > > > > values > > > > for scalars. > > > > > > > > This safe casting based approach is in contrast for example to > > > > promotion within C or Julia, which work based on category first. > > > > For > > > > example `int32` cannot be safely cast to `float32`, but C or > > > > Julia > > > > will > > > > use `int32, float32 -> float32` as the common type/promotion rule > > > > for > > > > example to decide on the output dtype for addition. > > > > > > > > > > > > ### Python Integers and Floats > > > > > > > > Note that python integers are handled exactly like numpy ones. > > > > They > > > > are, however, special in that they do not have a dtype associated > > > > with > > > > them explicitly. Value based logic, as described here, seems > > > > useful > > > > for > > > > python integers and floats to allow: > > > > ``` > > > > arr = np.arange(10, dtype=np.int8) > > > > arr += 1 > > > > # or: > > > > res = arr + 1 > > > > res.dtype == np.int8 > > > > ``` > > > > which ensures that no upcast (for example with higher memory > > > > usage) > > > > occurs. > > > > > > > > > > > > ### Safe Casting > > > > > > > > Most safe casting is clearly defined based on whether or not any > > > > possible value is representable in the ouput dtype. Within numpy > > > > there > > > > is currently a single exception to this rule: > > > > `np.can_cast(np.int64, > > > > np.float64, casting="safe")` is considered to be true although > > > > float64 > > > > cannot represent some large integer values exactly. In contrast, > > > > `np.can_cast(np.int32, np.float32, casting="safe")` is `False` > > > > and > > > > `np.float64` would have to be used if a "safe" cast is desired. > > > > > > > > This exception may be one thing that should be changed, however, > > > > concurrently the promotion rules have to be adapted to keep doing > > > > the > > > > same thing, or a larger behaviour change decided. > > > > > > > > > > > > #### Scalar based rules > > > > > > > > Unlike arrays, where inspection of all values is not feasable, > > > > for > > > > scalars (and 0-D arrays) the value is inspected. The casting > > > > becomes a > > > > two step process: > > > > 1. The minimal dtype capable of holding the value is found. > > > > 2. The normal casting rules are applied to the new dtype. > > > > > > > > The first step uses the following rules by finding the minimal > > > > dtype > > > > within its category: > > > > > > > > * Boolean: Dtype is already minimal > > > > > > > > * Integers: > > > > Casting is possible if output can hold the value. This > > > > includes > > > > uint8(127) casting to an int8. > > > > > > > > * Floats and Complex > > > > Scalars can be demoted based on value, roughly this avoids > > > > overflows: > > > > ``` > > > > float16: -65000 < value < 65000 > > > > float32: -3.4e38 < value < 3.4e38 > > > > float64: -1.7e308 < value < 1.7e308 > > > > float128 (largest type, does not apply). > > > > ``` > > > > For complex, the logic is simply applied to both real and > > > > imaginary > > > > part. Complex numbers cannot be downcast to floating point. > > > > > > > > * Others: Dtype is not modified. > > > > > > > > > > > > This two step process means that `np.can_cast(np.int16(1024), > > > > np.float16)` is `False` even though float16 is capable of exactly > > > > representing the value 1024, since value based "demotion" to a > > > > lower > > > > dtype is used only within each category. > > > > > > > > > > > > > > > > ### Common Type Promotion > > > > > > > > For most operations in numpy the output type is just the common > > > > type of > > > > the inputs, this holds for example for concatenation, as well as > > > > almost > > > > all math funcions (e.g. addition and multiplication have two > > > > identical > > > > inputs and need one ouput dtype). This operation is exposed as > > > > `np.result_type` which includes value based logic, and > > > > `np.promote_types` which only accepts dtypes as input. > > > > > > > > Normal type promotion without value based/scalar logic finds the > > > > smallest type which both inputs can cast to safely. This will be > > > > the > > > > largest "kind" (bool < unsigned < integer < float < complex < > > > > other). > > > > > > > > Note that type promotion is handled in a "reduce" manner from > > > > left > > > > to > > > > right. In rare cases this means it is not associatetive: > > > > `float32, > > > > uint16, int16 -> float32`, but `float32, (uint16, int16) -> > > > > float64`. > > > > > > > > #### Scalar based rule > > > > > > > > When there is a mix of scalars and arrays, numpy will usually > > > > allow > > > > the > > > > scalars to be handled in the same fashion as for "safe" casting > > > > rules. > > > > > > > > The rules are as follows: > > > > > > > > 1. Value based logic is only applied if the "category" of any > > > > array > > > > is > > > > larger or equal to the category of all scalars. If this is not > > > > the > > > > case, the typical rules are used. > > > > * Specifically, this means: `np.array([1, 2, 3], > > > > dtype=np.uint8) + > > > > np.float64(12.)` gives a `float64` result, because the > > > > `np.float64(12.)` is not considered for being demoted. > > > > > > > > 2. Promotion is applied as normally, however, instead of the > > > > original > > > > dtype, the minimal dtype is used. In the case where the minimal > > > > data > > > > type is unsigned (say uint8) but the value is small enough, the > > > > minimal > > > > type may in fact be either `uint8` or `int8` (127 can be both). > > > > This > > > > promotion is also applied in pairs (reduction-like) from left to > > > > right. > > > > > > > > > > > > ### General Promotion during Function Execution > > > > > > > > General functions (read "ufuncs" such as `np.add`) may have a > > > > specific > > > > dtype signature which is (for most dtypes) stored e.g. as > > > > `np.add.types`. For many of these functions the common type > > > > promotion > > > > is used unchanged. > > > > > > > > However, some functions will employ a slightly different method > > > > (which > > > > should be equivalent in most cases). They will loop through all > > > > loops > > > > listed in `np.add.types` in order and find the first one to which > > > > all > > > > inputs can be safely cast: > > > > ``` > > > > np.divide.types = ['ee->e', 'ff->f', 'dd->d', ...] > > > > ``` > > > > Thus, `np.divide(np.int16(4), np.float16(3)` will refuse the > > > > first > > > > `float16, float16 -> float16` (`'ee->e'`) loop because `int16` > > > > cannot > > > > be cast safely, and then pick the float32 (`'ff->f'`) one. > > > > > > > > For simple functions, which commonly have two identical inputs, > > > > this > > > > should be identical, since normally a clear order exists for the > > > > dtypes > > > > (it does require checking int8 before uint8, etc.). > > > > > > > > #### Scalar based rule > > > > > > > > When scalars are involved, the "safe" cast logic based on values > > > > is > > > > applied *if and only if* rule 1. applies as before: That is there > > > > must > > > > be an array with a higher or equal category as all of the > > > > scalars. > > > > > > > > In the above `np.divide` example, this means that > > > > `np.divide(np.int16(4), np.array([3], dtype=np.float16))` *will* > > > > use > > > > the `'ee->e'` loop, because the scalar `4` is of a lower or equal > > > > category than the array (integer <= float or complex). While > > > > checking, > > > > 4 is found to be safely castable to float16, since `(u)int8` is > > > > sufficient to hold 4 and that can be safely cast to `float16`. > > > > However, `np.divide(np.int16(4), np.int16(3))` would use > > > > `float32` > > > > because both are scalars and thus value based logic is not used > > > > (Note > > > > that in reality numpy forces double output for an all integer > > > > input > > > > in > > > > divide). > > > > > > > > In it is possible for ufuncs to have mixed type signatures (this > > > > is > > > > very rare within numy) and arbitrary inputs. In this case, in > > > > principle, the question is whether or not a clear ordering exists > > > > and > > > > if the rule of using value based logic is always clear. This is > > > > rather > > > > academical (I could not find any such function in numpy or > > > > `scipy.special` [^scipy-ufuncs]). But consider: > > > > ``` > > > > imaginary_ufunc.types: > > > > int32, float32 -> int32, float32 > > > > int64, float32 -> int64, float32 > > > > ... > > > > ``` > > > > it is not clear that `np.int64(5) + np.float32(3.)` should be > > > > able > > > > to > > > > demote the `5`. This is very theoretical of course > > > > > > > > > > > > > > > > > > > > Footnotes > > > > --------- > > > > > > > > [^scipy-ufuncs]: See for example these functions: > > > > ```python > > > > import scipy.special > > > > for n, func in scipy.special.__dict__.items(): > > > > if not isinstance(func, np.ufunc): > > > > continue > > > > > > > > if func.nin == 1: > > > > # a single input is not interesting > > > > continue > > > > > > > > # check if the signature is not uniform > > > > for types in func.types: > > > > if len(set(types[:func.nin])) != 1: > > > > break > > > > else: > > > > continue > > > > print(func, func.types) > > > > ``` > > > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion@python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion@python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion@python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@python.org > https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion