Re: [Numpy-discussion] Experimental `like=` attribute for array creation functions

2020-08-17 Thread Peter Andreas Entschev
As discussed, I've opened a PR
https://github.com/numpy/numpy/pull/17093 attempting to clarify some
of the writing and to follow the NEP Template. As suggested in the
template, please find below the top part of NEP-35 (up to and
including the Backward Compatibility section). Please feel free to
comment and suggest improvements or point out what may still be
unclear. Personally, I would prefer comments directly on the PR if
possible.

===
NEP 35 — Array Creation Dispatching With __array_function__
===

:Author: Peter Andreas Entschev 
:Status: Draft
:Type: Standards Track
:Created: 2019-10-15
:Updated: 2020-08-17
:Resolution:

Abstract
--------

We propose the introduction of a new keyword argument ``like=`` to all array
creation functions. This argument permits the creation of an array based on
a non-NumPy reference array passed via that argument, resulting in an array
defined by the downstream library implementing that type, which also implements
the ``__array_function__`` protocol. With this we address one of that
protocol's shortcomings, as described by NEP 18 [1]_.

Motivation and Scope
--------------------

Many libraries implement the NumPy API, such as Dask for graph computing,
CuPy for GPGPU computing, and xarray for N-D labeled arrays. The libraries
mentioned have one more thing in common: they have also adopted
the ``__array_function__`` protocol. The protocol defines a mechanism allowing a
user to directly use the NumPy API as a dispatcher based on the input array
type. In essence, dispatching means users are able to pass a downstream array,
such as a Dask array, directly to one of NumPy's compute functions, and NumPy
will be able to automatically recognize that and send the work back to Dask's
implementation of that function, which will define the return value. For
example:

.. code:: python

    import numpy as np
    import dask.array

    x = dask.array.arange(5)    # Creates dask.array
    np.sum(x)                   # Returns dask.array

Note above how we called Dask's implementation of ``sum`` via the NumPy
namespace by calling ``np.sum``, and the same would apply if we had a CuPy
array or any other array from a library that adopts ``__array_function__``.
This allows writing code that is agnostic to the implementation library, thus
users can write their code once and still be able to use different array
implementations according to their needs.
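
As a rough illustration of the dispatch described above, the sketch below
defines a toy array type. ``TaggedArray`` is a hypothetical class invented for
this example (it is not part of NumPy or Dask), and it handles only ``np.sum``;
a real library would cover far more of the API.

.. code:: python

    import numpy as np

    class TaggedArray:
        """Hypothetical minimal array type, used only for illustration."""

        def __init__(self, data):
            self.data = np.asarray(data)

        def __array_function__(self, func, types, args, kwargs):
            if func is np.sum:
                # Unwrap our own type, let NumPy do the arithmetic, rewrap.
                unwrapped = [a.data if isinstance(a, TaggedArray) else a
                             for a in args]
                return TaggedArray(np.sum(*unwrapped, **kwargs))
            # Tell NumPy that this type does not handle the function.
            return NotImplemented

    x = TaggedArray(np.arange(5))
    result = np.sum(x)            # NumPy hands the call to TaggedArray
    print(type(result).__name__)

Calling a function the toy type does not handle, such as ``np.mean(x)``, makes
NumPy raise a ``TypeError`` because no implementation was found.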

Unfortunately, ``__array_function__`` has limitations, one of them being array
creation functions. In the example above, NumPy was able to call Dask's
implementation because the input array was a Dask array. The same is not true
for array creation functions: in the example, the input of ``arange`` is simply
the integer ``5``, which provides no information about the array type that
should be returned. That is where a reference array passed via the ``like=``
argument proposed here can help, as it provides NumPy with the information
required to create the expected type of array.

The proposed ``like=`` keyword is intended solely to identify the downstream
library to dispatch to. The object is used only as a reference, meaning that
no modifications, copies or processing will be performed on that object.
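
The sketch below illustrates this reference-only role under some assumptions:
``RecordingArray`` is a hypothetical duck array made up for this example, it
requires a NumPy version that implements ``like=``, and matching the function
by name is a simplification rather than a recommended pattern.

.. code:: python

    import numpy as np

    class RecordingArray:
        """Hypothetical duck array that logs what the protocol receives."""

        def __init__(self, data):
            self.data = np.asarray(data)
            self.seen = []

        def __array_function__(self, func, types, args, kwargs):
            # Record the function name and the keyword names passed along.
            self.seen.append((func.__name__, sorted(kwargs)))
            if func.__name__ == "array":
                return RecordingArray(np.array(*args, **kwargs))
            return NotImplemented

    ref = RecordingArray([10, 20])
    out = np.array([1, 2, 3], dtype="float64", like=ref)

    print(type(out).__name__)   # type comes from the reference object
    print(ref.data.tolist())    # the reference itself is left untouched
    print(ref.seen)             # what the downstream implementation saw

Only the type of ``ref`` influences the result; its contents play no part in
building ``out``.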

We expect that this functionality will be mostly useful to library developers,
allowing them to create new arrays for internal usage based on arrays passed
by the user, preventing unnecessary creation of NumPy arrays that will
ultimately lead to an additional conversion into a downstream array type.

Support for Python 2.7 has been dropped since NumPy 1.17, therefore we make use
of the keyword-only argument standard described in PEP-3102 [2]_ to implement
``like=``, thus preventing it from being passed by position.
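
To make the keyword-only behaviour concrete, here is a minimal pure-Python
sketch. ``make_array`` is a hypothetical function written for this example and
is unrelated to NumPy's actual implementation; it only shows how the bare ``*``
in a signature rejects positional use.

.. code:: python

    def make_array(data, *, like=None):
        """Hypothetical creation function; ``like`` is keyword-only."""
        # Use the reference's type as the container, defaulting to list.
        container = type(like) if like is not None else list
        return container(data)

    as_list = make_array((1, 2))             # no reference given
    as_tuple = make_array((1, 2), like=())   # reference passed by keyword

    try:
        make_array((1, 2), ())               # reference passed by position
    except TypeError as exc:
        error_message = str(exc)
        print("rejected:", error_message)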

.. _neps.like-kwarg.usage-and-impact:

Usage and Impact
----------------

To understand the intended use for ``like=``, and before we move to more complex
cases, consider the following illustrative example consisting only of NumPy and
CuPy arrays:

.. code:: python

    import numpy as np
    import cupy

    def my_pad(arr, padding):
        padding = np.array(padding, like=arr)
        return np.concatenate((padding, arr, padding))

    my_pad(np.arange(5), [-1, -1])    # Returns np.ndarray
    my_pad(cupy.arange(5), [-1, -1])  # Returns cupy.core.core.ndarray

Note in the ``my_pad`` function above how ``arr`` is used as a reference to
dictate what array type ``padding`` should have, before concatenating the arrays
to produce the result. If ``like=`` weren't used, the NumPy case would still
work, but CuPy wouldn't allow this kind of automatic conversion, ultimately
raising a ``TypeError: Only cupy arrays can be concatenated`` exception.
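
For readers without a GPU, the same contrast can be sketched with a stand-in
type. ``StrictArray`` and ``my_pad_without_like`` are hypothetical names
introduced here, and the refusal to mix types only imitates the CuPy behaviour
described above; it is not CuPy's code.

.. code:: python

    import numpy as np

    class StrictArray:
        """Hypothetical array type that refuses implicit conversion."""

        def __init__(self, data):
            self.data = np.asarray(data)

        def __array_function__(self, func, types, args, kwargs):
            # Matching by name is a simplification for this sketch.
            if func.__name__ == "array":
                return StrictArray(np.array(*args, **kwargs))
            if func.__name__ == "concatenate":
                arrays = args[0]
                if not all(isinstance(a, StrictArray) for a in arrays):
                    raise TypeError(
                        "Only StrictArray objects can be concatenated")
                joined = np.concatenate([a.data for a in arrays], **kwargs)
                return StrictArray(joined)
            return NotImplemented

    def my_pad(arr, padding):
        padding = np.array(padding, like=arr)   # follows the type of arr
        return np.concatenate((padding, arr, padding))

    def my_pad_without_like(arr, padding):
        padding = np.array(padding)             # always a NumPy array
        return np.concatenate((padding, arr, padding))

    padded = my_pad(StrictArray(np.arange(5)), [-1, -1])
    print(type(padded).__name__, padded.data.tolist())

    try:
        my_pad_without_like(StrictArray(np.arange(5)), [-1, -1])
    except TypeError as exc:
        print("without like=:", exc)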

Now we should look at how a library like Dask could benefit from ``like=``.
Before we understand that, it's important to understand a bit about Dask basics
and ensures correctness with ``_

[Numpy-discussion] Announcing 2020 Google Season of Docs Technical Writers for NumPy

2020-08-17 Thread Melissa Mendonça
Hello all,

I'm pleased to announce that NumPy was awarded two slots in the Google
Season of Docs program (you can see the full results here:
https://developers.google.com/season-of-docs/docs/participants).

The selected projects are

- "NumPy Documentation for Community Education", by Ryan Cooper (Proposal:
  https://developers.google.com/season-of-docs/docs/participants/project-numpy-cooperrc)

- "High level restructuring and end user focus", by kubedoc (Proposal:
  https://developers.google.com/season-of-docs/docs/participants/project-numpy-kubedoc)

We appreciate all projects that were submitted and thank all participants
for their efforts in putting together their proposals. Also, if you wish to
contribute documentation to NumPy on a volunteer basis, you are welcome to
do so!

Cheers,

- Melissa
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] start of an array (tensor) and dataframe API standardization initiative

2020-08-17 Thread Ralf Gommers
Hi all,

I'd like to share this announcement blog post about the creation of a
consortium for array and dataframe API standardization here:
https://data-apis.org/blog/announcing_the_consortium/. It's still in the
beginning stages, but starting to take shape. We have participation from
one or more maintainers of most array and tensor libraries - NumPy,
TensorFlow, PyTorch, MXNet, Dask, JAX, Xarray. Stephan Hoyer, Travis
Oliphant and myself have been providing input from a NumPy perspective.

The effort is very much related to some of the interoperability work we've
been doing in NumPy (e.g. it could provide an answer to what's described in
https://numpy.org/neps/nep-0037-array-module.html#requesting-restricted-subsets-of-numpy-s-api
).

At this point we're looking for feedback from maintainers at a high level
(see the blog post for details).

Also important: the python-record-api tooling and the data in its repo have
very granular API usage data, of the kind we could really use when making
decisions that impact backwards compatibility.

Cheers,
Ralf