[Numpy-discussion] NEP 50: Promotion rules for Python scalars

2022-06-01 Thread Sebastian Berg
Hi all,

I would like to share the first formal draft of

NEP 50: Promotion rules for Python scalars

with everyone.  The full text can be found here:

https://numpy.org/neps/nep-0050-scalar-promotion.html

NEP 50 is an attempt to remove value-based casting/promotion.  We wish
to replace it with clearer rules for the resulting dtype when mixing
NumPy arrays and Python scalars.  As a brief example, the proposal
allows the following (unchanged):

>>> np.array([1, 2, 3], dtype=np.int8) + 100
np.array([101, 102, 103], dtype=np.int8)

While clearing up confusion caused by the value-inspecting behavior
that we see sometimes, such as:

>>> np.array([1, 2, 3], dtype=np.int8) + 300
np.array([301, 302, 303], dtype=np.int16)  # note the int16

Where 300 is too large to fit an ``int8``.  As well as removing the
special behavior of 0-D arrays or NumPy scalars:

>>> res = np.array(1, dtype=np.int8) + 100
>>> res.dtype
dtype('int64')
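
As a rough illustration of the examples above, the following sketch shows how user code could avoid depending on value-based promotion by stating the intended dtype explicitly. It is not taken from the NEP; the variable names are made up for illustration, and only the results of the explicit casts are asserted to be the same before and after the proposed change.

```python
import numpy as np

arr = np.array([1, 2, 3], dtype=np.int8)

# Unchanged by the proposal: 100 fits into int8, so the result stays int8.
small = arr + 100
print(small.dtype)

# 300 does not fit into int8. Rather than relying on value inspection,
# upcast the array explicitly before adding.
wide = arr.astype(np.int16) + 300
print(wide, wide.dtype)

# For 0-D arrays the result dtype of `zero_d + 100` depends on which
# promotion rules are active, so it is only printed here.
zero_d = np.array(1, dtype=np.int8)
print((zero_d + 100).dtype)

# An explicit cast gives the same dtype under either set of rules.
explicit = zero_d.astype(np.int64) + 100
print(explicit.dtype)
```

Stating the dtype up front is only one possible way of writing such code; it trades a little verbosity for results that do not depend on the value of the Python scalar.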

This is the continuation of a long discussion (see the "Discussion"
section), including the poll I once posted:
https://discuss.scientific-python.org/t/poll-future-numpy-behavior-when-mixing-arrays-numpy-scalars-and-python-scalars/202

I would be happy for any feedback, be it just editorial or fundamental
discussion.  There are many alternatives which I have tried to capture
in the NEP.
So let's discuss here, or on discuss:

   
https://discuss.scientific-python.org/t/nep-50-promotion-rules-for-python-scalars/280

For smaller edits, don't hesitate to open a NumPy PR, or propose edits
on my branch (you can use the edit button to create a PR):

   
https://github.com/seberg/numpy/blob/nep50/doc/neps/nep-0050-scalar-promotion.rst

An important part of moving forward will be assessing the real world
impact.  To start that process, I have created a branch as a draft PR
(at this time):

https://github.com/numpy/numpy/pull/21626

It is missing some parts, but should allow preliminary testing. The
main missing part is that the integer warnings and errors are less
strict than proposed in the NEP.
It would be invaluable to get a better idea to what extent existing
code, especially end-user code, is affected by the proposed changes.

Thanks in advance for any input!  This is a big, complicated proposal,
but finding a way forward will hopefully clear up a source of confusion
and inconsistencies that make both maintainers' and users' lives harder.

Cheers,

Sebastian


___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: NEP 50: Promotion rules for Python scalars

2022-06-01 Thread Ralf Gommers
On Wed, Jun 1, 2022 at 5:51 PM Sebastian Berg 
wrote:

>
> An important part of moving forward will be assessing the real world
> impact.  To start that process, I have created a branch as a draft PR
> (at this time):
>
> https://github.com/numpy/numpy/pull/21626
>
> It is missing some parts, but should allow preliminary testing. The
> main missing part is that the integer warnings and errors are less
> strict than proposed in the NEP.
> It would be invaluable to get a better idea to what extent existing
> code, especially end-user code, is affected by the proposed changes.
>

Thanks Sebastian! For testing, did you already try with some of the usual
suspects, or would it be helpful to use this branch on SciPy, Pandas, etc.?
Also, do you expect it's useful to do platform-specific testing? I can
imagine there's some Windows-specific behavior; adapting a SciPy CI job to
work from your branch is easy to do if that would be helpful.

Cheers,
Ralf


[Numpy-discussion] Re: NEP 50: Promotion rules for Python scalars

2022-06-01 Thread Sebastian Berg
On Wed, 2022-06-01 at 20:23 +0200, Ralf Gommers wrote:
> On Wed, Jun 1, 2022 at 5:51 PM Sebastian Berg
> 
> wrote:
> 
> > 
> > An important part of moving forward will be assessing the real
> > world
> > impact.  To start that process, I have created a branch as a draft
> > PR
> > (at this time):
> > 
> >     https://github.com/numpy/numpy/pull/21626
> > 
> > It is missing some parts, but should allow preliminary testing. The
> > main missing part is that the integer warnings and errors are less
> > strict than proposed in the NEP.
> > It would be invaluable to get a better idea to what extent existing
> > code, especially end-user code, is affected by the proposed
> > changes.
> > 
> 
> Thanks Sebastian! For testing, did you already try with some of the
> usual
> suspects, or would it be helpful to use this branch on SciPy, Pandas,
> etc.?
> Also, do you expect it's useful to do platform-specific testing? I
> can
> imagine there's some Windows-specific behavior; adapting a SciPy CI
> job to
> work from your branch is easy to do if that would be helpful.
> 

Yes, I have for SciPy.  As noted in the PR, the failures look "mostly
harmless" at first sight (not that it won't mean quite a bit of work,
but I think it is manageable work).
I would be more scared if there is a need to systematically vet all
places where behavior (may have) changed.

For example, in NumPy:

   np.median(np.float32([1, 2, 3, 4]))

did return a float64 before and will now return a float32.  I assume
because somewhere we write: `(np.float64(3) + np.float32(2)) / 2`.

There are a few places that I suspect just need updated tests or a bit of
thought.  And at least one or two that need to use the correct integer
types (IIRC `scipy.io.idl` seems to be using some low precision or
unsigned integer type internally and that leads to failures).

I thought pandas would fail much harder, but it seems it had only
150-200 failures (many probably clustered).  One larger annoyance there is
that one parametrized test runs into an infinite recursion which makes
it run excruciatingly slow.

In any case, I believe that it would be far more helpful if those more
familiar with the libraries have a look at the failures.  Not only do
they know better how much impact they have; it also helps to get a feel
for how painful the transition will be.

One problem I see is that I still expect that libraries are not the
main issue.
Using a SciPy integrator may end up with a float32 rather than a
float64 result.  In the SciPy test suite, that probably just means
tweaking the test a bit.
But that same change will also break someone's script out there,
somewhere.  So the real affected persons (who may occasionally get less
precise/breaking results) are likely the end-users rather than the
libraries.
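
To make the end-user concern concrete, here is a minimal sketch of a guard that a script could place around a result whose precision matters. The helper name `ensure_precision` and its behavior are hypothetical and not part of NumPy or SciPy; it only assumes that a silent drop from float64 to float32 is what the script wants to catch.

```python
import warnings

import numpy as np


def ensure_precision(result, minimum=np.float64):
    """Hypothetical helper: warn and upcast if a floating-point result
    is narrower than `minimum`."""
    result = np.asarray(result)
    minimum = np.dtype(minimum)
    is_float = np.issubdtype(result.dtype, np.floating)
    if is_float and result.dtype.itemsize < minimum.itemsize:
        warnings.warn(
            f"result has dtype {result.dtype}, expected at least {minimum}"
        )
        return result.astype(minimum)
    return result


# A float32 result is flagged and upcast; a float64 result passes through.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fixed = ensure_precision(np.float32(1.5))
unchanged = ensure_precision(np.array([1.0, 2.0]))
print(fixed.dtype, len(caught), unchanged.dtype)
```

Note that upcasting after the fact does not recover precision that was already lost in the computation; the warning is the more useful part, since it points at the place where an input should be cast earlier.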

Cheers,

Sebastian


> Cheers,
> Ralf





[Numpy-discussion] Small API addition: unique now has `equal_nan=False` keyword argument

2022-06-01 Thread Sebastian Berg
Hi all,

this has been discussed before, so mainly a brief announcement that we
merged a PR to add the `equal_nan` kwarg to `np.unique`.

If set to False, multiple `NaN`s will be reported multiple times (which
was the behavior prior to NumPy 1.21).

The keyword argument name was chosen to match that of `np.array_equal`
and the testing functions.
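
A short sketch of the keyword in use follows. It assumes a NumPy version that already includes the new `equal_nan` argument; the array contents are made up for illustration.

```python
import numpy as np

a = np.array([1.0, np.nan, np.nan, 2.0])

# With equal_nan=True all NaNs are collapsed into a single entry.
collapsed = np.unique(a, equal_nan=True)

# With equal_nan=False every NaN is reported separately, since NaN != NaN.
separate = np.unique(a, equal_nan=False)

print(collapsed)
print(separate)
```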

Since the actual release is a bit off and this has no backwards
compatibility concerns, there is a chance that this may be backported
into NumPy 1.23 (this is at Chuck's discretion as the release manager).

Cheers,

Sebastian




[Numpy-discussion] API Addition: Polynomial classes now have a "symbol" denoting the variable

2022-06-01 Thread Sebastian Berg
Hi all,

just another small API announcement, that I merged:

https://github.com/numpy/numpy/pull/16154

which adds `symbol="x"` to the polynomial classes.  Ross' more detailed
explanation is copied below.

Cheers,

Sebastian



New attribute ``symbol`` added to polynomial classes


The polynomial classes in the ``numpy.polynomial`` package have a new
``symbol`` attribute which is used to represent the indeterminate
of the polynomial.
This can be used to change the symbol shown for the variable when printing::

    >>> P_y = np.polynomial.Polynomial([1, 0, -1], symbol="y")
    >>> print(P_y)
    1.0 + 0.0·y¹ - 1.0·y²

Note that the polynomial classes only support 1D polynomials, so
operations that involve polynomials with different symbols are
disallowed when the result would be multivariate::

    >>> P = np.polynomial.Polynomial([1, -1])  # default symbol is "x"
    >>> P_z = np.polynomial.Polynomial([1, 1], symbol="z")
    >>> P * P_z
    Traceback (most recent call last):
       ...
    ValueError: Polynomial symbols differ

The symbol can be any valid Python identifier. The default is
``symbol="x"``, consistent with existing behavior.





[Numpy-discussion] Re: NEP 50: Promotion rules for Python scalars

2022-06-01 Thread Juan Nunez-Iglesias
> For example, in NumPy:
> 
>    np.median(np.float32([1, 2, 3, 4]))
> 
> did return a float64 before and will now return a float32.  I assume
> because somewhere we write: `(np.float64(3) + np.float32(2)) / 2`.

Sorry, I missed this part of the discussion — I know the discussion centered 
around Python literals being weak, but for NumPy dtypes, I thought the larger 
dtype would always win?

Indeed, reading the NEP I see:

Expression: array([1.], float32) + array(1., float64)
Old result: array([2.], float32)
New result: array([2.], float64)

which seems to contradict your statement above?


[Numpy-discussion] Re: NEP 50: Promotion rules for Python scalars

2022-06-01 Thread Sebastian Berg
On Wed, 2022-06-01 at 18:37 -0500, Juan Nunez-Iglesias wrote:
> > For example, in NumPy:
> > 
> >    np.median(np.float32([1, 2, 3, 4]))
> > 
> > did return a float64 before and will now return a float32.  I
> > assume
> > because somewhere we write: `(np.float64(3) + np.float32(2)) / 2`.
> 
> Sorry, I missed this part of the discussion — I know the discussion
> centered around Python literals being weak, but for NumPy dtypes, I
> thought the larger dtype would always win?


Good job reading carefully enough to notice :)!

Sorry... my bad, the float64 is a typo.  That should have read:

(float32(3) + float32(2)) / 2

Which does show the change in behavior as described/discussed.  If
there was a float64 involved, of course the result would be/remain
float64.
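
The distinction can be checked with a short sketch like the one below. The dtypes of the two float32-only expressions depend on which promotion rules are active, so they are only printed; the last two lines show that precision stays at float64 whenever a float64 is involved. Variable names are made up for illustration.

```python
import numpy as np

data = np.array([1, 2, 3, 4], dtype=np.float32)

# These two results depend on the promotion rules in use (float64 under
# the old rules as described in this thread, float32 under the proposal).
print(np.median(data).dtype)
print(((np.float32(3) + np.float32(2)) / 2).dtype)

# If a float64 is involved, the result is float64 under either set of rules.
mixed = (np.float64(3) + np.float32(2)) / 2
pinned = np.median(data.astype(np.float64))
print(mixed.dtype, pinned.dtype)
```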

- Sebastian



> 
> Indeed, reading the NEP I see:
> 
> Expression: array([1.], float32) + array(1., float64)
> Old result: array([2.], float32)
> New result: array([2.], float64)
> 
> which seems to contradict your statement above?


