Re: [Numpy-discussion] Moving forward with value based casting

2019-06-06 Thread Ralf Gommers
On Fri, Jun 7, 2019 at 1:37 AM Nathaniel Smith  wrote:

>
> My intuition is that what users actually want is for *native Python
> types* to be treated as having 'underspecified' dtypes, e.g. int is
> happy to coerce to int8/int32/int64/whatever, float is happy to coerce
> to float32/float64/whatever, but once you have a fully-specified numpy
> dtype, it should stay.
>

Thanks Nathaniel, I think this expresses a possible solution better than
anything I've seen on this list before. An explicit "underspecified types"
concept could make casting understandable.


> In any case, it would probably be helpful to start by just writing
> down the whole set of rules we have now, because I'm not sure anyone
> understands all the details...
>

+1

Ralf


[Numpy-discussion] ANN: SciPy 1.2.2 (LTS)

2019-06-06 Thread Tyler Reddy

Hi all,

On behalf of the SciPy development team I'm pleased to announce
the release of SciPy 1.2.2, which is a bug fix release. This is part
of the long-term support (LTS) branch that includes Python 2.7.

Sources and binary wheels can be found at:
https://pypi.org/project/scipy/
and at: https://github.com/scipy/scipy/releases/tag/v1.2.2

One of a few ways to install this release with pip:

pip install scipy==1.2.2

=========================
SciPy 1.2.2 Release Notes
=========================

SciPy 1.2.2 is a bug-fix release with no new features compared to 1.2.1.
Importantly, the SciPy 1.2.2 wheels are built with OpenBLAS 0.3.7.dev to
alleviate issues with SkylakeX AVX512 kernels.

Authors
=======

* CJ Carey
* Tyler Dawson +
* Ralf Gommers
* Kai Striega
* Andrew Nelson
* Tyler Reddy
* Kevin Sheppard +

A total of 7 people contributed to this release.
People with a "+" by their names contributed a patch for the first time.
This list of names is automatically generated, and may not be fully
complete.

Issues closed for 1.2.2
-----------------------
* `#9611 `__: Overflow error
with new way of p-value calculation in kendall tau correlation for
perfectly monotonic vectors
* `#9964 `__: optimize.newton :
overwrites x0 argument when it is a numpy array
* `#9784 `__: TST: Minimum
NumPy version is not being CI tested
* `#10132 `__: Docs:
Description of nnz attribute of sparse.csc_matrix misleading

Pull requests for 1.2.2
-----------------------
* `#10056 `__: BUG: Ensure
factorial is not too large in kendaltau
* `#9991 `__: BUG: Avoid inplace
modification of input array in newton
* `#9788 `__: TST, BUG:
f2py-related issues with NumPy < 1.14.0
* `#9749 `__: BUG:
MapWrapper.__exit__ should terminate
* `#10141 `__: Update
description for nnz on csc.py

Checksums
=========

MD5
~~~

f5d23361e78f230f70fd117be20930e1  scipy-1.2.2-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
44387030d96a2495e5576800b2a567d6  scipy-1.2.2-cp27-cp27m-manylinux1_i686.whl
bc56bf862deadc96f6be1f67dc8eaf89  scipy-1.2.2-cp27-cp27m-manylinux1_x86_64.whl
a45382978ff7d032041847f66e2f7351  scipy-1.2.2-cp27-cp27m-win32.whl
1140063ad53c44414f9feaae3c4fbf8c  scipy-1.2.2-cp27-cp27m-win_amd64.whl
3407230bae0c36210c5d3fee717a3579  scipy-1.2.2-cp27-cp27mu-manylinux1_i686.whl
fbb9867ea3ba38cc0c979c38b8c77871  scipy-1.2.2-cp27-cp27mu-manylinux1_x86_64.whl
8b4497e964c17135b6b2e8f691bed49e  scipy-1.2.2-cp34-cp34m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
9139c344bc6ef05f7f22191af0810ef6  scipy-1.2.2-cp34-cp34m-manylinux1_i686.whl
a62c1f316c33af02007da3374ebf02c3  scipy-1.2.2-cp34-cp34m-manylinux1_x86_64.whl
780ce592f99ade01a9b0883ac767f798  scipy-1.2.2-cp34-cp34m-win32.whl
498e740b099182df30c16144a109acdf  scipy-1.2.2-cp34-cp34m-win_amd64.whl
8b157f5433846d8798ff6941d0f9671f  scipy-1.2.2-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
e1692a9e3e9a9b2764bccd0c9575bfef  scipy-1.2.2-cp35-cp35m-manylinux1_i686.whl
70863fc59dc034c07b73de765eb693f9  scipy-1.2.2-cp35-cp35m-manylinux1_x86_64.whl
ce676f1adc72f8180b2eacec7e44c802  scipy-1.2.2-cp35-cp35m-win32.whl
21a9fac5e289682abe35ce6d54f5805f  scipy-1.2.2-cp35-cp35m-win_amd64.whl
470fa57418223df8fc27e9ec45bc7a94  scipy-1.2.2-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
4001f322a2967de0aa0b8148e0116def  scipy-1.2.2-cp36-cp36m-manylinux1_i686.whl
4e0d727cbbfe8410bd1229d197fb11d8  scipy-1.2.2-cp36-cp36m-manylinux1_x86_64.whl
352608fa1f48877fc76a55217e689240  scipy-1.2.2-cp36-cp36m-win32.whl
559ca5cda1935a9992436bb1398dbcd0  scipy-1.2.2-cp36-cp36m-win_amd64.whl
92b9356944c239520f5b2897ba531c16  scipy-1.2.2-cp37-cp37m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
d9b427be8fc3bfd5b2a8330e1215b0ee  scipy-1.2.2-cp37-cp37m-manylinux1_i686.whl
4f2d513b1950ab7c147ddf3e4acb2542  scipy-1.2.2-cp37-cp37m-manylinux1_x86_64.whl
1598ffe78061854f7bed87290250c33f  scipy-1.2.2-cp37-cp37m-win32.whl
9dad5d71152b714694e073d1c0c54288  scipy-1.2.2-cp37-cp37m-win_amd64.whl
d94de858fba4f24de7d6dd16f1caeb5d  scipy-1.2.2.tar.gz
136c5ee1bc4b259a12a7efe331b15d64  scipy-1.2.2.tar.xz
b9a5b4cbdf54cf681eda3b4d94a73c18  scipy-1.2.2.zip

SHA256
~~~~~~

271c6e56c8f9a3d6c3f0bc857d7a6e7cf7a8415c879a3915701cd011e82a83a3  scipy-1.2.2-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx

Re: [Numpy-discussion] Moving forward with value based casting

2019-06-06 Thread Nathaniel Smith
I haven't read all the thread super carefully, so I might have missed
something, but I think we might want to look at this together with the
special rule for scalar casting.

IIUC, the basic end-user problem that motivates all this is: when you
have a simple Python constant whose exact dtype is unspecified, people
don't want numpy to first automatically pick a dtype for it, and then
use that automatically chosen dtype to override the explicit dtypes
that the user specified. That's the "x + 1" problem. (This also comes
up a ton for languages trying to figure out how to type manifest
constants.)

Numpy's original solution for this was the special casting rule for
scalars. I don't understand the exact semantics, but it's something
like: in any operation involving a mix of non-zero-dim arrays and
zero-dim arrays, we throw out the exact dtype information for the
scalar ("float64", "int32") and replace it with just the "kind"
("float", "int").

This has several surprising consequences:

- The output dtype depends on not just the input dtypes, but also the
input shapes:

In [19]: (np.array([1, 2], dtype=np.int8) + 1).dtype
Out[19]: dtype('int8')

In [20]: (np.array([1, 2], dtype=np.int8) + [1]).dtype
Out[20]: dtype('int64')

- It doesn't just affect Python scalars with vague dtypes, but also
scalars where the user has specifically set the dtype:

In [21]: (np.array([1, 2], dtype=np.int8) + np.int64(1)).dtype
Out[21]: dtype('int8')

- I'm not sure the "kind" rule even does the right thing, especially
for mixed-kind operations. float16-array + int8-scalar has to do the
same thing as float16-array + int64-scalar, but that feels weird? I
think this is why value-based casting got added (at around the same
time as float16, in fact).

(Kinds are kinda problematic in general... the SAME_KIND casting rule
is very weird – casting int32->int64 is radically different from
casting float64->float32, which is radically different than casting
int64->int32, but SAME_KIND treats them all the same. And it's really
unclear how to generalize the 'kind' concept to new dtypes.)
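
For concreteness, here is how same_kind lumps those three casts together
(plain np.can_cast calls; this part should be stable across recent numpy
versions):

    import numpy as np

    # All three are allowed under 'same_kind', despite being very different casts:
    np.can_cast(np.int32, np.int64, casting='same_kind')      # True  (lossless widening)
    np.can_cast(np.float64, np.float32, casting='same_kind')  # True  (loses precision)
    np.can_cast(np.int64, np.int32, casting='same_kind')      # True  (can overflow)
    # Only the 'safe' rule tells them apart:
    np.can_cast(np.int64, np.int32, casting='safe')           # False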

My intuition is that what users actually want is for *native Python
types* to be treated as having 'underspecified' dtypes, e.g. int is
happy to coerce to int8/int32/int64/whatever, float is happy to coerce
to float32/float64/whatever, but once you have a fully-specified numpy
dtype, it should stay.

Some cases to think about:

np.array([1, 2], dtype=int8) + [1, 1]
 -> maybe this should have dtype int8, because there's no type info on
the right side to contradict that?

np.array([1, 2], dtype=int8) + 2**40
 -> maybe this should be an error, because you can't cast 2**40 to
int8 (under default casting safety rules)? That would introduce some
value-dependence, but it would only affect whether you get an error or
not, and there's precedent for that (e.g. division by zero).
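
(For reference, what current numpy does here is silently pick a bigger
dtype based on the value -- outputs as I recall them, exact behaviour may
vary by version:)

    import numpy as np

    a = np.array([1, 2], dtype=np.int8)
    (a + 127).dtype    # int8  -- 127 still fits in int8
    (a + 128).dtype    # int16 -- 128 no longer fits, so the result silently widens
    (a + 2**40).dtype  # int64 -- widens all the way instead of raising an error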

In any case, it would probably be helpful to start by just writing
down the whole set of rules we have now, because I'm not sure anyone
understands all the details...

-n

On Wed, Jun 5, 2019 at 1:42 PM Sebastian Berg
 wrote:
>
> Hi all,
>
> TL;DR:
>
> Value based promotion seems complex both for users and ufunc-
> dispatching/promotion logic. Is there any way we can move forward here,
> and if we do, could we just risk some possible (maybe not-existing)
> corner cases to break early to get on the way?
>
> ---
>
> Currently when you write code such as:
>
> arr = np.array([1, 43, 23], dtype=np.uint16)
> res = arr + 1
>
> Numpy uses fairly sophisticated logic to decide that `1` can be
> represented as a uint16, and thus for all unary functions (and most
> others as well), the output will have a `res.dtype` of uint16.
>
> Similar logic also exists for floating point types, where a lower
> precision floating point can be used:
>
> arr = np.array([1, 43, 23], dtype=np.float32)
> (arr + np.float64(2.)).dtype  # will be float32
>
> Currently, this value based logic is enforced by checking whether the
> cast is possible: "4" can be cast to int8, uint8. So the first call
> above will at some point check if "uint16 + uint16 -> uint16" is a
> valid operation, find that it is, and thus stop searching. (There is
> the additional logic, that when both/all operands are scalars, it is
> not applied).
>
> Note that this is defined in terms of casting: casting "1" to uint8 is
> considered safely possible even though 1 may be typed as int64. This
> logic thus affects all promotion rules as well (i.e. what the output
> dtype should be).
>
>
> There are 2 main discussion points/issues about it:
>
> 1. Should value based casting/promotion logic exist at all?
>
> Arguably an `np.int32(3)` has type information attached to it, so why
> should we ignore it. It can also be tricky for users, because a small
> change in values can change the result data type.
> Because 0-D arrays and scalars are too close inside numpy (you will
> often not know which one you get). There is not much option but to
> handle them identically. However, 

Re: [Numpy-discussion] Moving forward with value based casting

2019-06-06 Thread Allan Haldane
On 6/6/19 12:46 PM, Sebastian Berg wrote:
> On Thu, 2019-06-06 at 11:57 -0400, Allan Haldane wrote:
>> I think dtype-based casting makes a lot of sense, the problem is
>> backward compatibility.
>>
>> Numpy casting is weird in a number of ways: The array + array casting
>> is
>> unexpected to many users (eg, uint64 + int64 -> float64), and the
>> casting of array + scalar is different from that, and value based.
>> Personally I wouldn't want to try to change it unless we make a
>> backward-incompatible release (numpy 2.0), based on my experience
>> trying
>> to change much more minor things. We already put "casting" on the
>> list
>> of desired backward-incompatible changes here:
>> https://github.com/numpy/numpy/wiki/Backwards-incompatible-ideas-for-a-major-release
>>
>> Relatedly, I've previously dreamed about a different "C-style" way
>> casting might behave:
>> https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76
>>
>> The proposal there is that array + array casting, array + scalar, and
>> array + python casting would all work in the same dtype-based way,
>> which
>> mimics the familiar "C" casting rules.
> 
> If I read it right, you do propose that array + python would cast in a
> "minimal type" way for python.

I'm a little unclear what you mean by "minimal type" way. By "minimal
type", I thought you and others are talking about the rule numpy
currently uses that "the output dtype is the minimal dtype capable of
representing the value of both input dtypes", right? But in that gist I
am instead proposing that output-dtype is determined by C-like rules.

For array+py_scalar I was less certain what to do than for array+array
and array+npy_scalar. But I proposed the three "ranks" of 1. bool, 2.
int, and 3. float/complex. My rule for array+py_scalar is that if the
python scalar's rank is less than the numpy operand dtype's rank, use
the numpy dtype. If the python-scalar's rank is greater, use the
"default" types of bool_, int64, float64 respectively. Eg:

np.bool_(1) + 1-> int64   (default int wins)
np.int8(1) + 1 -> int8(numpy wins)
np.uint8(1) + (-1) -> uint8   (numpy wins)
np.int64(1) + 1-> int64   (numpy wins)
np.int64(1) + 1.0  -> float64 (default float wins)
np.float32(1.0) + 1.0  -> float32 (numpy wins)

Note it does not depend on the numerical value of the scalar, only its type.
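
In case it helps, here is a rough sketch of that rule in code -- this is
not numpy behaviour, and all the names (RANK, DEFAULT, promote_with_pyscalar)
are made up just for illustration:

    import numpy as np

    RANK = {'b': 1, 'i': 2, 'u': 2, 'f': 3, 'c': 3}   # bool < int < float/complex
    DEFAULT = {1: np.dtype(np.bool_), 2: np.dtype(np.int64), 3: np.dtype(np.float64)}

    def py_scalar_rank(x):
        if isinstance(x, bool):   # check bool first: True is also an int in Python
            return 1
        return 2 if isinstance(x, int) else 3

    def promote_with_pyscalar(np_dtype, py_scalar):
        np_dtype = np.dtype(np_dtype)
        if py_scalar_rank(py_scalar) <= RANK[np_dtype.kind]:
            return np_dtype                        # numpy operand wins
        return DEFAULT[py_scalar_rank(py_scalar)]  # default type of the higher rank wins

    promote_with_pyscalar(np.int8, 1)       # int8    (numpy wins)
    promote_with_pyscalar(np.bool_, 1)      # int64   (default int wins)
    promote_with_pyscalar(np.int64, 1.0)    # float64 (default float wins)
    promote_with_pyscalar(np.float32, 1.0)  # float32 (numpy wins)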

> In your write up, you describe that if you mix array + scalar, the
> scalar uses a minimal dtype compared to the array's dtype. 

Sorry if I'm nitpicking/misunderstanding, but in my rules np.uint64(1) +
1 -> uint64 but in numpy's "minimal dtype" rules it is  -> float64. So I
don't think I am using the minimal rule.

> What we
> instead have is that in principle you could have loops such as:
> 
> "ifi->f"
> "idi->d"
> 
> and I think we should choose the first for a scalar, because it "fits"
> into f just fine (if the input is `ufunc(int_arr, 12., int_arr)`).

I feel I'm not understanding you, but the casting rules in my gist
follow those two rules if i, f are the numpy types int32 and float32.

If instead you mean (np.int64, py_float, np.int64) my rules would cast
to float64, since py_float has the highest rank and so is converted to
the default numpy-type for that rank, float64.

I would also add that unlike current numpy, my C-casting rules are
associative (if all operands are numpy types, see note below), so it
does not matter in which order you promote the types: (if)i  and i(fi)
give the same result. In current numpy this is not always the case:

p = np.promote_types
p(p('u2',   'i1'), 'f4')# ->  f8
p(  'u2', p('i1',  'f4'))   # ->  f4

(However, my casting rules are not associative if you include python
scalars.. eg  np.float32(1) + 1.0 + np.int64(1) . Maybe I should try to
fix that...)
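
The non-associativity of np.promote_types above is easy to check by folding
it over the same three dtypes in every order:

    import numpy as np
    from itertools import permutations

    dtypes = [np.dtype('u2'), np.dtype('i1'), np.dtype('f4')]
    {np.promote_types(np.promote_types(a, b), c)
     for a, b, c in permutations(dtypes)}
    # -> {dtype('float32'), dtype('float64')}, i.e. the result depends on the order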

Best,
Allan

> I do not mind keeping the "simple" two (or even more) operand "let's
> assume we have uniform types" logic around. For those it is easy to
> find a "minimum type" even before actual loop lookup.
> For the above example it would work well in any case, but it would get
> complicated if, for example, the last integer is an unsigned integer
> that happens to be small enough to also fit into a signed integer.
> 
> That might give some wiggle room, possibly also to attach warnings to
> it, or at least make things easier. But I would also like to figure out
> as well if we shouldn't try to move in any case. Sure, attach a major
> version to it, but hopefully not a "big step type".
> 
> One thing that I had not thought about is, that if we create
> FutureWarnings, we will need to provide a way to opt-in to the new/old
> behaviour.
> The old behaviour can be achieved by just using the python types (which
> probably is what most code that wants this behaviour does already), but
> the behaviour is tricky. Users can pass `dtype` explicitly, but that is
> a huge kludge...
> Will think about if there is a solution to that, because if there is
> not, you are right

Re: [Numpy-discussion] Moving forward with value based casting

2019-06-06 Thread Sebastian Berg
On Thu, 2019-06-06 at 11:57 -0400, Allan Haldane wrote:
> I think dtype-based casting makes a lot of sense, the problem is
> backward compatibility.
> 
> Numpy casting is weird in a number of ways: The array + array casting
> is
> unexpected to many users (eg, uint64 + int64 -> float64), and the
> casting of array + scalar is different from that, and value based.
> Personally I wouldn't want to try to change it unless we make a
> backward-incompatible release (numpy 2.0), based on my experience
> trying
> to change much more minor things. We already put "casting" on the
> list
> of desired backward-incompatible changes here:
> https://github.com/numpy/numpy/wiki/Backwards-incompatible-ideas-for-a-major-release
> 
> Relatedly, I've previously dreamed about a different "C-style" way
> casting might behave:
> https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76
> 
> The proposal there is that array + array casting, array + scalar, and
> array + python casting would all work in the same dtype-based way,
> which
> mimics the familiar "C" casting rules.

If I read it right, you do propose that array + python would cast in a
"minimal type" way for python.

In your write up, you describe that if you mix array + scalar, the
scalar uses a minimal dtype compared to the array's dtype. What we
instead have is that in principle you could have loops such as:

"ifi->f"
"idi->d"

and I think we should choose the first for a scalar, because it "fits"
into f just fine (if the input is `ufunc(int_arr, 12., int_arr)`).

I do not mind keeping the "simple" two (or even more) operand "let's
assume we have uniform types" logic around. For those it is easy to
find a "minimum type" even before actual loop lookup.
For the above example it would work well in any case, but it would get
complicated if, for example, the last integer is an unsigned integer
that happens to be small enough to also fit into a signed integer.

That might give some wiggle room, possibly also to attach warnings to
it, or at least make things easier. But I would also like to figure out
as well if we shouldn't try to move in any case. Sure, attach a major
version to it, but hopefully not a "big step type".

One thing that I had not thought about is, that if we create
FutureWarnings, we will need to provide a way to opt-in to the new/old
behaviour.
The old behaviour can be achieved by just using the python types (which
probably is what most code that wants this behaviour does already), but
the behaviour is tricky. Users can pass `dtype` explicitly, but that is
a huge kludge...
Will think about if there is a solution to that, because if there is
not, you are right. It has to be a "big step" kind of release.
Although, even then it would be nice to have warnings that can be
enabled to ease the transition!

- Sebastian


> 
> See also:
> https://github.com/numpy/numpy/issues/12525
> 
> Allan
> 
> 
> On 6/5/19 4:41 PM, Sebastian Berg wrote:
> > Hi all,
> > 
> > TL;DR:
> > 
> > Value based promotion seems complex both for users and ufunc-
> > dispatching/promotion logic. Is there any way we can move forward
> > here,
> > and if we do, could we just risk some possible (maybe not-existing)
> > corner cases to break early to get on the way?
> > 
> > ---
> > 
> > Currently when you write code such as:
> > 
> > arr = np.array([1, 43, 23], dtype=np.uint16)
> > res = arr + 1
> > 
> > Numpy uses fairly sophisticated logic to decide that `1` can be
> > represented as a uint16, and thus for all unary functions (and most
> > others as well), the output will have a `res.dtype` of uint16.
> > 
> > Similar logic also exists for floating point types, where a lower
> > precision floating point can be used:
> > 
> > arr = np.array([1, 43, 23], dtype=np.float32)
> > (arr + np.float64(2.)).dtype  # will be float32
> > 
> > Currently, this value based logic is enforced by checking whether
> > the
> > cast is possible: "4" can be cast to int8, uint8. So the first call
> > above will at some point check if "uint16 + uint16 -> uint16" is a
> > valid operation, find that it is, and thus stop searching. (There
> > is
> > the additional logic, that when both/all operands are scalars, it
> > is
> > not applied).
> > 
> > Note that this is defined in terms of casting: casting "1" to uint8
> > is considered safely possible even though 1 may be typed as int64.
> > This logic thus affects all promotion rules as well (i.e. what the
> > output dtype should be).
> > 
> > 
> > There are 2 main discussion points/issues about it:
> > 
> > 1. Should value based casting/promotion logic exist at all?
> > 
> > Arguably an `np.int32(3)` has type information attached to it, so
> > why
> > should we ignore it. It can also be tricky for users, because a
> > small
> > change in values can change the result data type.
> > Because 0-D arrays and scalars are too close inside numpy (you will
> > often not know which one you get). There is not much option but to
> > handle them identically. However, it se

Re: [Numpy-discussion] Moving forward with value based casting

2019-06-06 Thread Allan Haldane


I think dtype-based casting makes a lot of sense, the problem is
backward compatibility.

Numpy casting is weird in a number of ways: The array + array casting is
unexpected to many users (eg, uint64 + int64 -> float64), and the
casting of array + scalar is different from that, and value based.
Personally I wouldn't want to try to change it unless we make a
backward-incompatible release (numpy 2.0), based on my experience trying
to change much more minor things. We already put "casting" on the list
of desired backward-incompatible changes here:
https://github.com/numpy/numpy/wiki/Backwards-incompatible-ideas-for-a-major-release

Relatedly, I've previously dreamed about a different "C-style" way
casting might behave:
https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76

The proposal there is that array + array casting, array + scalar, and
array + python casting would all work in the same dtype-based way, which
mimics the familiar "C" casting rules.

See also:
https://github.com/numpy/numpy/issues/12525

Allan


On 6/5/19 4:41 PM, Sebastian Berg wrote:
> Hi all,
> 
> TL;DR:
> 
> Value based promotion seems complex both for users and ufunc-
> dispatching/promotion logic. Is there any way we can move forward here,
> and if we do, could we just risk some possible (maybe not-existing)
> corner cases to break early to get on the way?
> 
> ---
> 
> Currently when you write code such as:
> 
> arr = np.array([1, 43, 23], dtype=np.uint16)
> res = arr + 1
> 
> Numpy uses fairly sophisticated logic to decide that `1` can be
> represented as a uint16, and thus for all unary functions (and most
> others as well), the output will have a `res.dtype` of uint16.
> 
> Similar logic also exists for floating point types, where a lower
> precision floating point can be used:
> 
> arr = np.array([1, 43, 23], dtype=np.float32)
> (arr + np.float64(2.)).dtype  # will be float32
> 
> Currently, this value based logic is enforced by checking whether the
> cast is possible: "4" can be cast to int8, uint8. So the first call
> above will at some point check if "uint16 + uint16 -> uint16" is a
> valid operation, find that it is, and thus stop searching. (There is
> the additional logic, that when both/all operands are scalars, it is
> not applied).
> 
> Note that this is defined in terms of casting: casting "1" to uint8 is
> considered safely possible even though 1 may be typed as int64. This
> logic thus affects all promotion rules as well (i.e. what the output
> dtype should be).
> 
> 
> There are 2 main discussion points/issues about it:
> 
> 1. Should value based casting/promotion logic exist at all?
> 
> Arguably an `np.int32(3)` has type information attached to it, so why
> should we ignore it. It can also be tricky for users, because a small
> change in values can change the result data type.
> Because 0-D arrays and scalars are too close inside numpy (you will
> often not know which one you get). There is not much option but to
> handle them identically. However, it seems pretty odd that:
>  * `np.array(3, dtype=np.int32)` + np.arange(10, dtype=int8)
>  * `np.array([3], dtype=np.int32)` + np.arange(10, dtype=int8)
> 
> give a different result.
> 
> This is a bit different for python scalars, which do not have a type
> attached already.
> 
> 
> 2. Promotion and type resolution in Ufuncs:
> 
> What is currently bothering me is that the decision what the output
> dtypes should be currently depends on the values in complicated ways.
> It would be nice if we can decide which type signature to use without
> actually looking at values (or at least only very early on).
> 
> One reason here is caching and simplicity. I would like to be able to
> cache which loop should be used for what input. Having value based
> casting in there bloats up the problem.
> Of course it currently works OK, but especially when user dtypes come
> into play, caching would seem like a nice optimization option.
> 
> Because `uint8(127)` can also be an `int8`, but `uint8(128)` cannot, it
> is not as simple as finding the "minimal" dtype once and working with
> that.
> Of course Eric and I discussed this a bit before, and you could create
> an internal "uint7" dtype which has the only purpose of flagging that a
> cast to int8 is safe.
> 
> I suppose it is possible I am barking up the wrong tree here, and this
> caching/predictability is not vital (or can be solved with such an
> internal dtype easily, although I am not sure it seems elegant).
> 
> 
> Possible options to move forward
> 
> 
> I still have to see a bit how tricky things are. But there are a few
> possible options. I would like to move the scalar logic to the
> beginning of ufunc calls:
>   * The uint7 idea would be one solution
>   * Simply implement something that works for numpy and all except
> strange external ufuncs (I can only think of numba as a plausible
> candidate for creating such).
> 
> My current plan is to see where the second thing leaves me.
> 

Re: [Numpy-discussion] Moving forward with value based casting

2019-06-06 Thread Sebastian Berg
On Wed, 2019-06-05 at 21:35 -0400, Marten van Kerkwijk wrote:
> Hi Sebastian,
> 
> Tricky! It seems a balance between unexpected memory blow-up and
> unexpected wrapping (the latter mostly for integers). 
> 
> Some comments specifically on your message first, then some more
> general related ones. 
> 
> 1. I'm very much against letting `a + b` do anything else than
> `np.add(a, b)`.

Well, I tend to agree. But just to put it out there:

[1] + [2]  == [1, 2]
np.add([1], [2]) == 3

So that is already far from true, since coercion has to occur. Of
course it is true that:

arr + something_else

will at some point force coercion of `something_else`, so that point is
only half valid if either `a` or `b` is already a numpy array/scalar.
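
Spelled out (the output dtype of the last line is of course exactly what
this whole thread is about):

    import numpy as np

    [1] + [2]         # [1, 2]      -- plain list concatenation, np.add never runs
    np.add([1], [2])  # array([3])  -- both lists are coerced to arrays first
    np.array([1, 2], dtype=np.int8) + [1, 1]  # here only the right-hand list is coerced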


> 2. For python values, an argument for casting by value is that a
> python int can be arbitrarily long; the only reasonable course of
> action for those seems to make them float, and once you do that one
> might as well cast to whatever type can hold the value (at least
> approximately).

To be honest, the "arbitrarily long" thing is another issue, which is the
silent conversion to "object" dtype. That is also on the list of things
not yet done: maybe we should deprecate it.

In other words, we would freeze python int to one clear type, if you
have an arbitrarily large int, you would need to use `object` dtype (or
preferably a new `pyint/arbitrary_precision_int` dtype) explicitly.
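
For reference, this is the silent fallback I mean (NumPy 1.x behaviour):

    import numpy as np

    np.array(10**3).dtype   # int64 (the platform default int)
    np.array(2**100).dtype  # object -- silently falls back because the value does not fit int64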

> 3. Not necessarily preferred, but for casting of scalars, one can get
> more consistent behaviour also by extending the casting by value to
> any array that has size=1.
> 

That sounds just as horrible as the current mismatch to me, to be
honest.

> Overall, just on the narrow question, I'd be quite happy with your
> suggestion of using type information if available, i.e., only cast
> python values to a minimal dtype.If one uses numpy types, those
> mostly will have come from previous calculations with the same
> arrays, so things will work as expected. And in most memory-limited
> applications, one would do calculations in-place anyway (or, as Tyler
> noted, for power users one can assume awareness of memory and thus
> the incentive to tell explicitly what dtype is wanted - just
> `np.add(a, b, dtype=...)`, no need to create `out`).
> 
> More generally, I guess what I don't like about the casting rules
> generally is that there is a presumption that if the value can be
> cast, the operation will generally succeed. For `np.add` and
> `np.subtract`, this perhaps is somewhat reasonable (though for
> unsigned a bit more dubious), but for `np.multiply` or `np.power` it
> is much less so. (Indeed, we had a long discussion about what to do
> with `int ** power` - now special-casing negative integer powers.)
> Changing this, however, probably really is a bridge too far!

Indeed that is right. But that is a different point. E.g. there is
nothing wrong for example that `np.power` shouldn't decide that
`int**power` should always _promote_ (not cast) `int` to some larger
integer type if available.
The only point where we seriously have such logic right now is for
np.add.reduce (sum) and np.multiply.reduce (prod), which always use at
least `long` precision (and actually upcast bool->int, although
np.add(True, True) does not; another difference to True + True...).
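
Concretely (outputs from a typical 64-bit build):

    import numpy as np

    b = np.array([True, True])
    np.add(True, True)  # True -- the elementwise bool loop, no upcast
    np.add.reduce(b)    # 2    -- the reduction upcasts bool -> default int
    b.sum().dtype       # int64 (the platform default int)
    True + True         # 2    -- and plain Python bools just behave like ints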

> 
> Finally, somewhat related: I think the largest confusion actually
> results from the `uint64+int64 -> float64` casting.  Should this cast
> to int64 instead?

Not sure, but yes, it is the other quirk in our casting that should be
discussed….

- Sebastian

> 
> All the best,
> 
> Marten
> 




Re: [Numpy-discussion] Moving forward with value based casting

2019-06-06 Thread Sebastian Berg
On Wed, 2019-06-05 at 17:14 -0700, Tyler Reddy wrote:
> A few thoughts:
> 
> - We're not trying to achieve systematic guards against integer
> overflow / wrapping in ufunc inner loops, right? The performance
> tradeoffs for a "result-based" casting / exception handling addition
> would presumably be controversial? I know there was some discussion
> about having an "overflow detection mode"  (toggle) of some sort that
> could be activated for ufunc loops, but don't think that gained much
> traction/ priority. I think for floats we have an awkward way to
> propagate something back to the user if there's an issue.

No, that is indeed a different issue. It would be nice to provide the
option of integer overflow warnings/errors, but it is different since
it should not affect the dtypes in use (i.e. we would never upcast to
avoid the error).

> - It sounds like the objective is instead primarily to achieve pure
> dtype-based promotion, which is then effectively just a casting
> table, which is what I think you mean by "cache?"

Yes, the cache was a bad word, I used it thinking of user types where a
large table would probably not be created on the fly.

> - Is it a safe assumption that for a cache (dtype-only casting
> table), the main tradeoff is that we'd likely tend towards
> conservative upcasting and using more memory in output types in many
> cases vs. NumPy at the moment? Stephan seems concerned about that,
> presumably because x + 1 suddenly changes output dtype in an
> overwhelming number of current code lines and future simple examples
> for end users.

Yes. That is at least what we currently have. For x + 1 there is a good
point with sudden memory blow up. Maybe an even nicer example is
`float32_arr + 1`, which would have to go to float64 if 1 is
interpreted as `int32(1)`.
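
I.e., with the current value-based rules (outputs may well differ in
future versions):

    import numpy as np

    f32 = np.ones(3, dtype=np.float32)
    (f32 + 1).dtype                             # float32 -- the Python int stays "underspecified"
    (f32 + np.int32(1)).dtype                   # float32 -- the 0-d int operand is value-inspected too
    (f32 + np.arange(3, dtype=np.int32)).dtype  # float64 -- array+array promotion upcasts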

> - If np.array + 1 absolutely has to stay the same output dtype moving
> forward, then "Keeping Value based casting only for python types" is
> the one that looks most promising to me initially, with a few further
> concerns:

Well, while it is annoying me, I think we should base that decision only
on what we want the user API to be. And because of that, it seems
like the most likely option.
At least my gut feeling is, if it is typed, we should honor the type
(also for scalars), but code like x + 1 suddenly blowing up memory is
not a good idea.
I just realized that one (anti?)-pattern that is common is the:

arr + 0.  # make sure it's "inexact/float"

is exactly an example of where you do not want to upcast unnecessarily.


> 1) Would that give you enough refactoring "wiggle room" to achieve
> the simplifications you need? If value-based promotion still happens
> for a non-NumPy operand, can you abstract that logic cleanly from the
> "pure dtype cache / table" that is planned for NumPy operands?

It is tricky. There is always the slightly strange solution of making
dtypes such as uint7, which "fixes" the type hierarchy as a minimal
dtype for promotion purpose, but would never be exposed to users.
(You probably need more strange dtypes for float and int combinations.)

To give me some wiggle room, what I was now doing is to simply decide
on the correct dtype before lookup. I am pretty sure that works for
all, except possibly one ufunc within numpy. The reason that this works
is that almost all of our ufuncs are typed as "ii->i" (identical
types).
Maybe that is OK to start working, and the strange dtype hierarchy can
be thought of later.
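
As a very rough sketch of what "decide before lookup" could look like for
the homogeneous case (resolve_pyscalar is a made-up name, not anything in
numpy, and the fallbacks are just one possible choice):

    import numpy as np

    def resolve_pyscalar(array_dtypes, py_scalar):
        # Pick a concrete dtype for a Python scalar before loop lookup,
        # assuming a homogeneous "ii->i"-style ufunc.
        target = np.result_type(*array_dtypes)
        if isinstance(py_scalar, bool):
            return target
        if isinstance(py_scalar, int):
            # keep the array dtype if the value fits, else fall back to the default int
            if np.can_cast(np.min_scalar_type(py_scalar), target):
                return target
            return np.dtype(np.int_)
        if isinstance(py_scalar, float):
            return target if target.kind in 'fc' else np.dtype(np.float64)
        return np.result_type(py_scalar)

    resolve_pyscalar([np.dtype(np.uint16)], 1)      # uint16
    resolve_pyscalar([np.dtype(np.uint16)], 2**40)  # int64 on most platforms (default int)
    resolve_pyscalar([np.dtype(np.int32)], 12.)     # float64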


> 2) Is the "out" argument to ufuncs a satisfactory alternative to the
> "power users" who want to "override" default output casting type? We
> suggest that they pre-allocate an output array of the desired type if
> they want to save memory and if they overflow or wrap integers that
> is their problem. Can we reasonably ask people who currently depend
> on the memory-conservation they might get from value-based behavior
> to adjust in this way?

They can also use `dtype=...` (or at least we can fix that part to be
reliable). Or they can cast the input. Especially if we want to
use it only for python integers/floats, adding the `np.int8(3)` is not
much effort.
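
Both `dtype=...` and a preallocated `out` already work today, e.g.:

    import numpy as np

    a = np.arange(3, dtype=np.float32)
    b = np.arange(3, dtype=np.float64)

    (a + b).dtype                         # float64 -- default array+array promotion
    out = np.empty(3, dtype=np.float32)
    np.add(a, b, out=out)                 # stays float32 via a preallocated output
    np.add(a, b, dtype=np.float32).dtype  # float32 -- or request the dtype directly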

> 3) Presumably "out" does / will circumvent the "cache / dtype casting
> table?"

Well, `out` fixes one of the types; if we look at the general machinery,
it would be possible to have:

ff->d
df->d
dd->d

loops. So if such loops are defined we cannot quite circumvent the
whole lookup. If we know that all loops are of the `ff->f` all same
dtype kind (which is true for almost all functions inside numpy),
lookup could be simplified.
For those loops with all the same dtype, the issue is fairly straight
forward anyway, because I can just decide how to handle the scalar
before hand.

Best,

Sebastian


> 
> Tyler
> 
> On Wed, 5 Jun 2019 at 15:37, Sebastian Berg <
> sebast...@sipsolutions.net> wrote:
> > Hi all,
> > 
> > Maybe to clarify this at least a little, here are some examples for
> > what currently happen and what I could imagine we can go

Re: [Numpy-discussion] Moving forward with value based casting

2019-06-06 Thread Ralf Gommers
On Wed, Jun 5, 2019 at 10:42 PM Sebastian Berg 
wrote:

> Hi all,
>
> TL;DR:
>
> Value based promotion seems complex both for users and ufunc-
> dispatching/promotion logic. Is there any way we can move forward here,
> and if we do, could we just risk some possible (maybe not-existing)
> corner cases to break early to get on the way?
> ...
> I have realized that this got much too long, so I hope it makes sense.
> I will continue to dabble along on these things a bit, so if nothing
> else maybe writing it helps me to get a bit clearer on things...
>

Your email was long but very clear. The part I'm missing is "why are things
the way they are?". Before diving into casting rules and all other wishes
people may have, can you please try to answer that? Because there's more to
it than "(maybe not-existing) corner cases".

Marten's first sentence ("a balance between unexpected memory blow-up and
unexpected wrapping") is in the right direction. As is Stephan's "Too many
users rely upon arithmetic like "x + 1" having a predictable dtype."

The problem is clear, however you need to figure out the constraints first,
then decide within the wiggle room you have what the options are.

Cheers,
Ralf