Re: [Numpy-discussion] manylinux upgrade for numpy wheels

2020-02-06 Thread Neal Becker
Slightly off topic perhaps, but custom compilation is recommended for best
performance — is there an easy way to do this? I don't think a simple pip
install will do.
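For what it's worth, pip can at least be told to skip the binary wheels and build from the sdist. A minimal sketch (the optimization flags are illustrative assumptions; the right flags depend on your compiler and BLAS setup, and this does not tune the BLAS itself):

```shell
# Force pip to ignore the manylinux wheels and compile NumPy locally.
# -O3/-march=native are illustrative; pick flags suited to your toolchain.
CFLAGS="-O3 -march=native" pip install --no-binary numpy numpy
```

This only recompiles NumPy's own C code; much of the heavy lifting happens in the linked BLAS/LAPACK, so a tuned BLAS (e.g. OpenBLAS built for your machine) may matter more.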

On Wed, Feb 5, 2020 at 4:07 AM Matthew Brett 
wrote:

> Hi,
>
> On Tue, Feb 4, 2020 at 10:38 PM Nathaniel Smith  wrote:
> >
> > Pretty sure the 2010 and 2014 images both have much newer compilers than
> that.
> >
> > There are still a lot of users on CentOS 6, so I'd still stick to 2010
> for now on x86_64 at least. We could potentially start adding 2014 wheels
> for the other platforms where we currently don't ship wheels – gotta be
> better than nothing, right?
> >
> > There probably still is some tail of end users whose pip is too old to
> know about 2010 wheels. I don't know how big that tail is. If we wanted to
> be really careful, we could ship both manylinux1 and manylinux2010 wheels
> for a bit – pip will automatically pick the latest one it recognizes – and
> see what the download numbers look like.
>
> That all sounds right to me too.
>
> Cheers,
>
> Matthew
>
> > On Tue, Feb 4, 2020, 13:18 Charles R Harris 
> wrote:
> >>
> >> Hi All,
> >>
> >> Thought now would be a good time to decide on upgrading manylinux for
> the 1.19 release so that we can make sure that everything works as
> expected. The choices are
> >>
> >> manylinux1 -- CentOS 5, currently used, gcc 4.2 (in practice 4.5), only
> supports i686, x86_64.
> >> manylinux2010 -- CentOS 6, gcc 4.5, only supports i686, x86_64.
> >> manylinux2014 -- CentOS 7, gcc 4.8, supports many more architectures.
> >>
> >> The main advantage of manylinux2014 is that it supports many new
> architectures, some of which we are already testing against. The main
> disadvantage is that it requires pip >= 19.x, which may not be much of a
> problem 4 months from now but will undoubtedly cause some installation
> problems. Unfortunately, the compiler remains archaic, but folks interested
> in performance should be using a performance oriented distribution or
> compiling for their native architecture.
> >>
> >> Chuck
> >>
> >> ___
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion@python.org
> >> https://mail.python.org/mailman/listinfo/numpy-discussion
> >
>


-- 
*Those who don't understand recursion are doomed to repeat it*


Re: [Numpy-discussion] manylinux upgrade for numpy wheels

2020-02-06 Thread Aldcroft, Thomas
Our organization is still using CentOS-6, so my vote is for that.

Thanks,
Tom

On Tue, Feb 4, 2020 at 5:38 PM Nathaniel Smith  wrote:

> Pretty sure the 2010 and 2014 images both have much newer compilers than
> that.
>
> There are still a lot of users on CentOS 6, so I'd still stick to 2010 for
> now on x86_64 at least. We could potentially start adding 2014 wheels for
> the other platforms where we currently don't ship wheels – gotta be better
> than nothing, right?
>
> There probably still is some tail of end users whose pip is too old to
> know about 2010 wheels. I don't know how big that tail is. If we wanted to
> be really careful, we could ship both manylinux1 and manylinux2010 wheels
> for a bit – pip will automatically pick the latest one it recognizes – and
> see what the download numbers look like.
>
> On Tue, Feb 4, 2020, 13:18 Charles R Harris 
> wrote:
>
>> Hi All,
>>
>> Thought now would be a good time to decide on upgrading manylinux for the
>> 1.19 release so that we can make sure that everything works as expected.
>> The choices are
>>
>> manylinux1  -- CentOS 5,
>> currently used, gcc 4.2 (in practice 4.5), only supports i686, x86_64.
>> manylinux2010  -- CentOS 6,
>> gcc 4.5, only supports i686, x86_64.
>> manylinux2014  -- CentOS 7,
>> gcc 4.8, supports many more architectures.
>>
>> The main advantage of manylinux2014 is that it supports many new
>> architectures, some of which we are already testing against. The main
>> disadvantage is that it requires pip >= 19.x, which may not be much of a
>> problem 4 months from now but will undoubtedly cause some installation
>> problems. Unfortunately, the compiler remains archaic, but folks interested
>> in performance should be using a performance oriented distribution or
>> compiling for their native architecture.
>>
>> Chuck
>>
>


Re: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules

2020-02-06 Thread Stephan Hoyer
On Wed, Feb 5, 2020 at 8:02 AM Andreas Mueller  wrote:

> A bit late to the NEP 37 party.
> I just wanted to say that at least from my perspective it seems a great
> solution that will help sklearn move towards more flexible compute engines.
> I think one of the biggest issues is array creation (including random
> arrays), and that's handled quite nicely with NEP 37.
>

Andreas, thanks for sharing your feedback here! Your perspective is really
appreciated.


> - We use scipy.linalg in many places, and we would need to do a separate
> dispatching to check whether we can use module.linalg instead
>  (that might be an issue for many libraries but I'm not sure).
>

This brings up a good question -- obviously the final decision here is up
to SciPy maintainers, but how should we encourage SciPy to support
dispatching?

We could pretty easily make __array_function__ cover SciPy by exposing
NumPy's internal utilities: SciPy could simply use the
np.array_function_dispatch decorator internally and that would be enough.
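As a rough sketch of how such a dispatch decorator works — this is a simplified toy, not NumPy's actual internal implementation; `array_function_dispatch`, `_solve_dispatcher`, and `DuckArray` here are illustrative names only:

```python
import functools

def array_function_dispatch(dispatcher):
    """Toy stand-in for NumPy's internal dispatch decorator."""
    def decorator(implementation):
        @functools.wraps(implementation)
        def wrapper(*args, **kwargs):
            # Ask each relevant argument whether it wants to take over.
            for arg in dispatcher(*args, **kwargs):
                hook = getattr(type(arg), '__array_function__', None)
                if hook is not None:
                    return hook(arg, wrapper, (type(arg),), args, kwargs)
            return implementation(*args, **kwargs)
        return wrapper
    return decorator

def _solve_dispatcher(a, b):
    # Returns the arguments that are inspected for __array_function__.
    return (a, b)

@array_function_dispatch(_solve_dispatcher)
def solve(a, b):
    return "plain implementation"  # stand-in for the real SciPy routine

class DuckArray:
    """A minimal NumPy-like array that overrides dispatched functions."""
    def __array_function__(self, func, types, args, kwargs):
        return f"DuckArray handled {func.__name__}"
```

With this, `solve(1, 2)` runs the plain implementation, while `solve(DuckArray(), 2)` is routed to the duck array's `__array_function__` — which is the behaviour SciPy would get "for free" by reusing the decorator.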

It is less clear how this could work for __array_module__, because
> __array_module__ and get_array_module() are not generic -- they refer
> explicitly to a NumPy-like module. If we want to extend it to SciPy (for
which I agree there are good use-cases), what should that look like?

The obvious choices would be to either add a new protocol, e.g.,
__scipy_module__ (but then NumPy needs to know about SciPy), or to add some
sort of "module request" parameter to np.get_array_module(), to indicate
the requested API, e.g., np.get_array_module(*arrays, matching='scipy').
This is pretty similar to the "default" argument but would need to get
passed into the __array_module__ protocol, too.
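For concreteness, a toy version of the protocol being discussed (a hedged sketch based on the NEP draft, not the final API; the `matching=` idea above is not modeled, and the real `get_array_module` would fall back to NumPy itself rather than a `default` argument):

```python
import types

def get_array_module(*arrays, default=None):
    """Return the namespace offered by the arrays' __array_module__ hooks."""
    arg_types = tuple({type(a) for a in arrays})
    for arr in arrays:
        hook = getattr(type(arr), '__array_module__', None)
        if hook is not None:
            module = hook(arr, arg_types)
            if module is not NotImplemented:
                return module
    return default  # the real proposal would return plain numpy here

class DuckArray:
    """A NumPy-like array type advertising its own module."""
    namespace = types.SimpleNamespace(name="duck_numpy")

    def __array_module__(self, types):
        return DuckArray.namespace
```

A "module request" parameter would slot in here by forwarding the requested API name into each `__array_module__` call, so a duck array could return its SciPy-flavoured namespace instead.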


> - Some models have several possible optimization algorithms, some of which
> are pure numpy and some which are Cython. If someone provides a different
> array module,
>  we might want to choose an algorithm that is actually supported by that
> module. While this exact issue is maybe sklearn specific, a similar issue
> could appear for most downstream libs that use Cython in some places.
>  Many Cython algorithms could be implemented in pure numpy with a
> potential slowdown, but once we have NEP 37 there might be a benefit to
> having a pure NumPy implementation as an alternative code path.
>
>
> Anyway, NEP 37 seems a great step in the right direction and would enable
> sklearn to actually dispatch in some places. Dispatching just based on
> __array_function__ seems not really feasible so far.
>
> Best,
> Andreas Mueller
>
>
> On 1/6/20 11:29 PM, Stephan Hoyer wrote:
>
> I am pleased to present a new NumPy Enhancement Proposal for discussion:
> "NEP-37: A dispatch protocol for NumPy-like modules." Feedback would be
> very welcome!
>
> The full text follows. The rendered proposal can also be found online at
> https://numpy.org/neps/nep-0037-array-module.html
>
> Best,
> Stephan Hoyer
>
> ===================================================
> NEP 37 — A dispatch protocol for NumPy-like modules
> ===================================================
>
> :Author: Stephan Hoyer 
> :Author: Hameer Abbasi
> :Author: Sebastian Berg
> :Status: Draft
> :Type: Standards Track
> :Created: 2019-12-29
>
> Abstract
> --------
>
> NEP-18's ``__array_function__`` has been a mixed success. Some projects
> (e.g.,
> dask, CuPy, xarray, sparse, Pint) have enthusiastically adopted it. Others
> (e.g., PyTorch, JAX, SciPy) have been more reluctant. Here we propose a new
> protocol, ``__array_module__``, that we expect could eventually subsume
> most
> use-cases for ``__array_function__``. The protocol requires explicit
> adoption
> by both users and library authors, which ensures backwards compatibility,
> and
> is also significantly simpler than ``__array_function__``, both of which we
> expect will make it easier to adopt.
>
> Why ``__array_function__`` hasn't been enough
> ---------------------------------------------
>
> There are two broad ways in which NEP-18 has fallen short of its goals:
>
> 1. **Maintainability concerns**. `__array_function__` has significant
>implications for libraries that use it:
>
>- Projects like `PyTorch
>  `_, `JAX
>  `_ and even `scipy.sparse
>  `_ have been reluctant
> to
>  implement `__array_function__` in part because they are concerned
> about
>  **breaking existing code**: users expect NumPy functions like
>  ``np.concatenate`` to return NumPy arrays. This is a fundamental
>  limitation of the ``__array_function__`` design, which we chose to
> allow
>  overriding the existing ``numpy`` namespace.
>- ``__array_function__`` currently requires an "all or nothing"
> approach to
>  implementing NumPy's API. There is no good pathway for **incremental
>  adoption**, which is particularly problematic for established projects

Re: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules

2020-02-06 Thread Sebastian Berg
On Thu, 2020-02-06 at 09:35 -0800, Stephan Hoyer wrote:
> On Wed, Feb 5, 2020 at 8:02 AM Andreas Mueller 
> wrote:

>  
> > - We use scipy.linalg in many places, and we would need to do a
> > separate dispatching to check whether we can use module.linalg
> > instead
> >  (that might be an issue for many libraries but I'm not sure).
> > 
> 
> This brings up a good question -- obviously the final decision here
> is up to SciPy maintainers, but how should we encourage SciPy to
> support dispatching?
> We could pretty easily make __array_function__ cover SciPy by simply
> exposing NumPy's internal utilities. SciPy could simply use the
> np.array_function_dispatch decorator internally and that would be
> enough.

Hmmm, in NumPy we can easily force basically 100% of (desired)
coverage, i.e. JAX can return a namespace that implements everything.
With SciPy that is already much less feasible, and as you go to domain
specific tools it seems implausible.

`get_array_module` solves the issue of a library that wants to support
all array likes. As long as:
  * most functions rely only on the NumPy API
  * the domain specific library is expected to implement support for
specific array objects if necessary. E.g. sklearn can include
special code for Dask support. Dask does not replace sklearn code.

> It is less clear how this could work for __array_module__, because
> __array_module__ and get_array_module() are not generic -- they
> refer explicitly to a NumPy-like module. If we want to extend it to
> SciPy (for which I agree there are good use-cases), what should that
> look like?

I suppose the question is here, where should the code reside? For
SciPy, I agree there is a good reason why you may want to "reverse" the
implementation. The code to support JAX arrays, should live inside JAX.

One, probably silly, option is to return a "global" namespace, so that:

np = get_array_module(*arrays).numpy

We have two distinct issues: where should e.g. SciPy put a generic
implementation (assuming they want to provide implementations that only
require NumPy-API support, so that no overriding is needed)?
And, if a library does provide generic support, should we define a
standard for how the context/namespace may be passed in/provided?

sklearn's main namespace is expected to support many array
objects/types, but it could be nice to pass in an already known
context/namespace (say scikit-image already found it, and then calls
scikit-learn internally). A "generic" namespace may even require this
to infer the correct output array object.
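That kind of context passing might look roughly like this — purely illustrative; the `xp=` keyword convention, `resolve_module`, and the stand-in backend are all assumptions, not an agreed API:

```python
import statistics
import types

# Stand-in default backend; real code would use the numpy module itself.
PLAIN_BACKEND = types.SimpleNamespace(mean=statistics.mean)

def resolve_module(*arrays, xp=None):
    """Use a caller-provided namespace if given, else detect one."""
    if xp is not None:
        return xp  # e.g. scikit-image already found it and passed it down
    for arr in arrays:
        hook = getattr(type(arr), '__array_module__', None)
        if hook is not None:
            mod = hook(arr, (type(arr),))
            if mod is not NotImplemented:
                return mod
    return PLAIN_BACKEND

def library_entry_point(data, xp=None):
    # A downstream-library function (sklearn-style) that accepts an
    # already-resolved namespace from its caller instead of re-detecting it.
    xp = resolve_module(data, xp=xp)
    return xp.mean(data)

# A caller higher up the stack can inject its own namespace:
CUSTOM = types.SimpleNamespace(mean=lambda d: "custom")
```

Accepting the namespace explicitly also addresses the inference problem: with a "generic" namespace, the callee may have no array argument from which to deduce the correct output type.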


Another thing about backward compatibility: What is our vision there
actually?
This NEP will *not* give the *end user* the option to opt in! Here,
opting in is really reserved for the *library user* (e.g. sklearn). (I
did not realize this clearly before.)

Thinking about that for a bit now, that seems like the right choice.
But it also means that the library requires an easy way of giving a
FutureWarning, to notify the end-user of the upcoming change. The end-
user will easily be able to convert to a NumPy array to keep the old
behaviour.
Once this warning is given (maybe during `get_array_module()`), the
array module object/context would preferably be passed around,
hopefully even between libraries. That provides a reasonable way to
opt in to the new behaviour without a warning (mainly for library
users; end-users can silence the warning if they wish).
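The warning mechanics for such a transition could be sketched like this — the helper name, message text, and `DuckArray` stand-in are all hypothetical; only the FutureWarning idea comes from the discussion:

```python
import warnings

class DuckArray:
    """Stand-in for a non-NumPy array that defines the new protocol."""
    def __array_module__(self, types):
        return NotImplemented

def get_array_module_transitional(*arrays, default="numpy"):
    # During the transition, warn end-users whose inputs would start
    # dispatching differently in a future release, then keep old behaviour.
    ducks = [type(a).__name__ for a in arrays
             if hasattr(type(a), '__array_module__')]
    if ducks:
        warnings.warn(
            f"Inputs of type {ducks} will dispatch to their own array "
            "module in a future release; convert to numpy.ndarray to "
            "keep the current behaviour.",
            FutureWarning, stacklevel=2)
    return default

# End-users who want the old behaviour silently can filter the warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    module = get_array_module_transitional(DuckArray())
```

Library users who opt in by passing the resolved module around would skip this code path entirely, so they never see the warning.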

- Sebastian



> The obvious choices would be to either add a new protocol, e.g.,
> __scipy_module__ (but then NumPy needs to know about SciPy), or to
> add some sort of "module request" parameter to np.get_array_module(),
> to indicate the requested API, e.g., np.get_array_module(*arrays,
> matching='scipy'). This is pretty similar to the "default" argument
> but would need to get passed into the __array_module__ protocol, too.
>  
> > - Some models have several possible optimization algorithms, some
> > of which are pure numpy and some which are Cython. If someone
> > provides a different array module,
> >  we might want to choose an algorithm that is actually supported by
> > that module. While this exact issue is maybe sklearn specific, a
> > similar issue could appear for most downstream libs that use Cython
> > in some places.
> >  Many Cython algorithms could be implemented in pure numpy with a
> > potential slowdown, but once we have NEP 37 there might be a
> > benefit to having a pure NumPy implementation as an alternative
> > code path.
> > 
> > 
> > Anyway, NEP 37 seems a great step in the right direction and would
> > enable sklearn to actually dispatch in some places. Dispatching
> > just based on __array_function__ seems not really feasible so far.
> > 
> > Best,
> > Andreas Mueller
> > 
> > 
> > On 1/6/20 11:29 PM, Stephan Hoyer wrote:
> > > I am pleased to present a new NumPy Enhancement Proposal for
> > > discussion: "NEP-37: A dispatch protocol for NumPy-like modules."
> > > Feedback would be very welcome!
> > > 
> > >