Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-04 Thread Stephan Hoyer
PEP 574 isn't on the roadmap (yet!), but I think we would clearly welcome
it. Like all NumPy improvements, it would need to be implemented by an
interested party.
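
For anyone who has not looked at PEP 574 yet, here is a minimal sketch of the
out-of-band buffer mechanism it describes. It assumes Python 3.8 or later,
where pickle protocol 5 and `pickle.PickleBuffer` are available in the
standard library. The `bytearray` is only a stand-in for an array's memory;
this is not NumPy code and says nothing about how NumPy would implement it.

```python
import pickle

# A large writable buffer standing in for an array's memory (not NumPy code).
payload = bytearray(b"x" * 1_000_000)

# In-band: the million bytes are copied into the pickle stream itself.
in_band = pickle.dumps(payload, protocol=5)

# Out-of-band: wrapping the data in PickleBuffer and supplying a
# buffer_callback lets the pickler hand the memory over separately,
# so the stream only carries a small placeholder.
buffers = []
out_of_band = pickle.dumps(
    pickle.PickleBuffer(payload), protocol=5, buffer_callback=buffers.append
)

# The consumer must pass the same buffers back, in order, when loading.
restored = pickle.loads(out_of_band, buffers=buffers)
view = memoryview(restored)

print(len(in_band), len(out_of_band), view.nbytes)
```

The point of the design is that whoever transports the pickle decides how the
buffers travel (shared memory, a socket, and so on), which is where the
zero-copy benefit would come from.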
On Mon, Jun 4, 2018 at 1:52 AM Antoine Pitrou  wrote:

>
> Hi,
>
> Do you plan to consider trying to add PEP 574 / pickle5 support? There's
> an implementation ready (and a PyPI backport) that you can play with.
> https://www.python.org/dev/peps/pep-0574/
>
> PEP 574 implicitly targets NumPy arrays as one of its primary producers,
> since NumPy arrays are how large scientific or numerical data often ends
> up represented and where zero-copy is often desired by users.
>
> PEP 574 could certainly be useful even without NumPy arrays supporting
> it, but less so.  So I would welcome any feedback on that front (and,
> given that I'd like PEP 574 to be accepted in time for Python 3.8, I'd
> ideally like to have that feedback sometime in the coming months
> ;-)).
>
> Best regards
>
> Antoine.
>
>
> On Thu, 31 May 2018 16:50:02 -0700
> Matti Picus  wrote:
> > At the recent NumPy sprint at BIDS (thanks to those who made the trip)
> > we spent some time brainstorming about a roadmap for NumPy, in the
> > spirit of similar work that was done for Jupyter. The idea is that a
> > document with wide community acceptance can guide the work of the
> > full-time developer(s), and be a source of ideas for expanding
> > development efforts.
> >
> > I put the document up at
> > https://github.com/numpy/numpy/wiki/NumPy-Roadmap, and hope to discuss
> > it at a BOF session during SciPy in the middle of July in Austin.
> >
> > Eventually it could become a NEP or formalized in another way.
> >
> > Matti
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-01 Thread Charles R Harris
On Thu, May 31, 2018 at 5:50 PM, Matti Picus  wrote:

> At the recent NumPy sprint at BIDS (thanks to those who made the trip) we
> spent some time brainstorming about a roadmap for NumPy, in the spirit of
> similar work that was done for Jupyter. The idea is that a document with
> wide community acceptance can guide the work of the full-time developer(s),
> and be a source of ideas for expanding development efforts.
>
> I put the document up at https://github.com/numpy/numpy/wiki/NumPy-Roadmap,
> and hope to discuss it at a BOF session during SciPy in the middle of July
> in Austin.
>
> Eventually it could become a NEP or formalized in another way.
>
> Matti
>

Under maintenance we could add something about the transition to Python 3,
in particular cleaning up the code and updating the documentation examples.

Chuck


Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-01 Thread Ralf Gommers
On Fri, Jun 1, 2018 at 9:57 AM, Stefan van der Walt 
wrote:

> Hi Ralf,
>
> On Thu, 31 May 2018 21:57:06 -0700, Ralf Gommers wrote:
> > - "internal refactorings": MaskedArray yes, but the other ones no.
> > numpy.distutils and f2py are very hard to test; a big refactor pretty much
> > guarantees breakage. There's also not much need for refactoring, because
> > those things are not coupled to the numpy.core internals. numpy.financial
> > is simply uninteresting - we wish it wasn't there but it is, so now it
> > simply stays where it is.
>
> I want to clarify that in the current notes we put down ideas that
> prompted active discussion, even if they weren't necessarily feasible.
> I feel it is important to keep the conversation open to run its course
> until we have a good understanding of the various issues at hand.
>
> You may find that, in person, people are more willing to admit to their
> support for some "heretical" ideas than they are here on the list.
>

Thanks Stefan, good points. I totally agree that anything can be discussed.


>
> E.g., you say that the financial functions "now simply stay", but that
> promises a future of a NumPy that never shrinks, while there is
> certainly some support for allowing NumPy to contract so that we can
> release maintenance burden and allow development of other core areas
> that have been neglected for a long time.
>
> You will *always* have small, vocal proponents of any specific piece of
> functionality; that doesn't necessarily mean that such functionality
> contributes to the health of a project as a whole.
>
> So, I gently urge us to carefully reconsider the narrative that nothing can
> change/be removed, and evaluate each suggestion carefully, not weighing
> only the very evident negatives but also the longer term positives.
>

I don't think there's such a narrative - e.g. the removal of np.matrix that
we've planned and getting rid of MaskedArray at some point once we have a
better new masked array implementation are *major* removals. We do plan
those things because they have major benefits. Imho "major benefits" is a
bar that needs to be passed before listing features as up for removal on a
roadmap (even a draft one).

It might help to find a form for the roadmap where the essentials of such
discussions (key pros/cons) can be captured. Or at least split it into
good/desirable/planned items and "wild ideas".

Re `financial`, there isn't much of a pro for removing it as far as I can
tell - there's almost zero maintenance cost now, and it doesn't hinder any
of the proposed new features. Plus it's a discussion we've had a couple of
times before.

I know that the current roadmap doc is only a draft, but it still says "NumPy
Roadmap" and it's the best thing we have now, so I'd prefer to not have
things there (or have them in a separate random/controversial ideas
section) that are unlikely to happen or for which it's unclear if they're
good ideas.

Cheers,
Ralf


Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-01 Thread Gael Varoquaux
While we are on the crazy wish-list: having dtypes that are universal
enough for pandas to use them and export their columns with them would be
my crazy wish. I hope that it would help add more uniform support for
things like categorical variables in the pydata ecosystem.
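
To make the categorical case concrete, here is a small sketch of how such
data is typically emulated today without a dedicated dtype: integer codes
plus a separate array of categories, which is also the layout pandas exposes
on its `Categorical` type. The variable names are only illustrative, and this
is a workaround, not the shared dtype wished for above.

```python
import numpy as np

values = np.array(["red", "green", "red", "blue", "green", "red"])

# categories: the sorted unique labels; codes: index of each value's label.
categories, codes = np.unique(values, return_inverse=True)
codes = codes.astype(np.int8)  # small integer codes instead of strings

# Round trip: indexing the categories with the codes restores the data.
restored = categories[codes]
```

Because the link between `codes` and `categories` lives outside the array,
every library has to carry the pair around itself, which is the lack of
uniformity a shared dtype would address.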

Gaël


Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-01 Thread Matthew Harrigan
I would love to see gufuncs become more general.  Specifically I would like
an optional prologue and epilogue function. The prologue could potentially
1) inspect parameterized dtypes, 2) inspect kwargs, 3) set non-trivial output
array sizes, 4) initialize data structures, and 5) defer processing to other
functions (BLAS).  The epilogue function could do any cleanup of data
structures.
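
A rough sketch of the control flow I have in mind, in plain Python. Everything
here (`SketchGufunc`, the `prologue`/`core`/`epilogue` signatures, the `state`
dict) is hypothetical and only illustrates the idea; it is not NumPy's gufunc
machinery or any existing API.

```python
class SketchGufunc:
    """Hypothetical wrapper: prologue -> core loop -> epilogue."""

    def __init__(self, core, prologue=None, epilogue=None):
        self.core = core
        self.prologue = prologue
        self.epilogue = epilogue

    def __call__(self, rows, **kwargs):
        # The prologue sees all inputs and kwargs once, before any looping,
        # and returns whatever shared state the core function needs.
        state = self.prologue(rows, **kwargs) if self.prologue else None
        try:
            # The "loop dimension": apply the core function to each row.
            return [self.core(row, state) for row in rows]
        finally:
            # The epilogue always runs, so it can release resources.
            if self.epilogue:
                self.epilogue(state)


log = []


def prologue(rows, scale=1.0):
    log.append("prologue")
    return {"scale": scale, "width": len(rows[0])}


def core(row, state):
    return sum(row) * state["scale"] / state["width"]


def epilogue(state):
    state.clear()
    log.append("epilogue")


scaled_mean = SketchGufunc(core, prologue, epilogue)
out = scaled_mean([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], scale=2.0)
```

The prologue runs once per call rather than once per inner-loop iteration,
which is what would make things like workspace allocation affordable.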

On Fri, Jun 1, 2018 at 12:57 PM, Stefan van der Walt 
wrote:

> Hi Ralf,
>
> On Thu, 31 May 2018 21:57:06 -0700, Ralf Gommers wrote:
> > - "internal refactorings": MaskedArray yes, but the other ones no.
> > numpy.distutils and f2py are very hard to test; a big refactor pretty much
> > guarantees breakage. There's also not much need for refactoring, because
> > those things are not coupled to the numpy.core internals. numpy.financial
> > is simply uninteresting - we wish it wasn't there but it is, so now it
> > simply stays where it is.
>
> I want to clarify that in the current notes we put down ideas that
> prompted active discussion, even if they weren't necessarily feasible.
> I feel it is important to keep the conversation open to run its course
> until we have a good understanding of the various issues at hand.
>
> You may find that, in person, people are more willing to admit to their
> support for some "heretical" ideas than they are here on the list.
>
> E.g., you say that the financial functions "now simply stay", but that
> promises a future of a NumPy that never shrinks, while there is
> certainly some support for allowing NumPy to contract so that we can
> release maintenance burden and allow development of other core areas
> that have been neglected for a long time.
>
> You will *always* have small, vocal proponents of any specific piece of
> functionality; that doesn't necessarily mean that such functionality
> contributes to the health of a project as a whole.
>
> So, I gently urge us to carefully reconsider the narrative that nothing can
> change/be removed, and evaluate each suggestion carefully, not weighing
> only the very evident negatives but also the longer term positives.
>
> Best regards,
> Stéfan


Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-01 Thread Chris Barker
On Fri, Jun 1, 2018 at 9:46 AM, Chris Barker  wrote:

> numpy is also quite a bit slower than raw python for math with (very)
> small arrays:
>

Doing a bit more experimentation, the advantage stays with pure Python for
over 10 elements (I got bored...). But I noticed that the time for the numpy
computation is pretty much constant from 2 up to around 100 elements, which
implies that the bulk of the issue is "startup" costs rather than fancy
indexing or anything like that. So maybe a shortcut wouldn't be helpful.

Note that if you use a list comp (the pythonic translation of an array
operation) the crossover point is about 15 elements (in my tests, on my
machine...)

In [90]: % timeit t2 = [x * 10 for x in t]

920 ns ± 4.88 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
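
If anyone wants to look for the crossover on their own machine, something
like the following sketch should do it outside IPython. The sizes and the
`per_call_ns` helper are my own choices for illustration, and the numbers
will differ from machine to machine, so no particular result is implied.

```python
import timeit

import numpy as np


def per_call_ns(func, number=10_000):
    """Best-of-3 mean time per call, in nanoseconds."""
    return min(timeit.repeat(func, number=number, repeat=3)) / number * 1e9


results = {}
for n in (2, 5, 10, 15, 20, 50, 100):
    t = [float(i) for i in range(n)]
    a = np.array(t)
    # Same operation both ways: scale every element by 10.
    list_ns = per_call_ns(lambda: [x * 10 for x in t])
    numpy_ns = per_call_ns(lambda: a * 10)
    results[n] = (list_ns, numpy_ns)
    print(f"n={n:4d}  list comp: {list_ns:7.0f} ns   numpy: {numpy_ns:7.0f} ns")
```

A roughly flat numpy column over the small sizes would point at fixed
per-call overhead rather than per-element cost.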

-CHB




> In [31]: % timeit t2 = (t[0] * 10, t[1] * 10)
> 162 ns ± 0.79 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>
> In [32]: a
> Out[32]: array([ 3.4,  5.6])
>
> In [33]: % timeit a2 = a * 10
> 941 ns ± 7.95 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
>
>
> (I often want to do this sort of thing, not for performance, but for ease
> of computation -- say you have two or three coordinates that represent a
> point -- it's really nice to be able to scale or shift with array
> operations, rather than all that indexing -- but it is pretty slow with
> numpy.)
>
> I've wondered if numpy could be optimized for small 1D arrays, and maybe
> even 2d arrays with a small fixed second dimension (N x 2, N x 3), by
> special-casing / short-cutting those cases.
>
> It would require some careful profiling to see if it would help, but it
> sure seems possible.
>
> And maybe scalars could be fit into the same system.
>
> -CHB



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR              (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-01 Thread Stefan van der Walt
Hi Ralf,

On Thu, 31 May 2018 21:57:06 -0700, Ralf Gommers wrote:
> - "internal refactorings": MaskedArray yes, but the other ones no.
> numpy.distutils and f2py are very hard to test; a big refactor pretty much
> guarantees breakage. There's also not much need for refactoring, because
> those things are not coupled to the numpy.core internals. numpy.financial
> is simply uninteresting - we wish it wasn't there but it is, so now it
> simply stays where it is.

I want to clarify that in the current notes we put down ideas that
prompted active discussion, even if they weren't necessarily feasible.
I feel it is important to keep the conversation open to run its course
until we have a good understanding of the various issues at hand.

You may find that, in person, people are more willing to admit to their
support for some "heretical" ideas than they are here on the list.

E.g., you say that the financial functions "now simply stay", but that
promises a future of a NumPy that never shrinks, while there is
certainly some support for allowing NumPy to contract so that we can
release maintenance burden and allow development of other core areas
that have been neglected for a long time.

You will *always* have small, vocal proponents of any specific piece of
functionality; that doesn't necessarily mean that such functionality
contributes to the health of a project as a whole.

So, I gently urge us to carefully reconsider the narrative that nothing can
change/be removed, and evaluate each suggestion carefully, not weighing
only the very evident negatives but also the longer term positives.

Best regards,
Stéfan


Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-01 Thread Chris Barker
On Fri, Jun 1, 2018 at 4:43 AM, Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:


>  one thing that always slightly annoyed me is that numpy math is way
> slower for scalars than python math
>

numpy is also quite a bit slower than raw python for math with (very) small
arrays:

In [31]: % timeit t2 = (t[0] * 10, t[1] * 10)
162 ns ± 0.79 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [32]: a
Out[32]: array([ 3.4,  5.6])

In [33]: % timeit a2 = a * 10
941 ns ± 7.95 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)


(I often want to do this sort of thing, not for performance, but for ease
of computation -- say you have two or three coordinates that represent a
point -- it's really nice to be able to scale or shift with array
operations, rather than all that indexing -- but it is pretty slow with
numpy.)

I've wondered if numpy could be optimized for small 1D arrays, and maybe
even 2d arrays with a small fixed second dimension (N x 2, N x 3), by
special-casing / short-cutting those cases.

It would require some careful profiling to see if it would help, but it
sure seems possible.

And maybe scalars could be fit into the same system.

-CHB





Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-01 Thread Todd
On Fri, Jun 1, 2018, 11:27 Todd  wrote:

>
>
> On Thu, May 31, 2018, 19:50 Matti Picus  wrote:
>
>> At the recent NumPy sprint at BIDS (thanks to those who made the trip)
>> we spent some time brainstorming about a roadmap for NumPy, in the
>> spirit of similar work that was done for Jupyter. The idea is that a
>> document with wide community acceptance can guide the work of the
>> full-time developer(s), and be a source of ideas for expanding
>> development efforts.
>>
>> I put the document up at
>> https://github.com/numpy/numpy/wiki/NumPy-Roadmap, and hope to discuss
>> it at a BOF session during SciPy in the middle of July in Austin.
>>
>> Eventually it could become a NEP or formalized in another way.
>>
>> Matti
>>
>
>
> Some things I have seen mentioned but don't know the current plans for:
>
> * Categorical arrays
> * Releasing the GIL wherever possible
> * Using multithreading internally
> * Making use of the next generation BLAS when available and staying involved
> in planning to make sure it supports our needs
> * Figure out where to use Cython and where not to
>

Also:

* Figure out the best way to handle strings.  This may involve multiple
approaches for different situations but the current approach may not be the
best default approach.
* Decimal and/or rational arrays
* If yes to labeled arrays, then there should probably be a PEP about
label-based indexing
* A decision about how to handle numpy 2.0



Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-01 Thread Todd
On Thu, May 31, 2018, 19:50 Matti Picus  wrote:

> At the recent NumPy sprint at BIDS (thanks to those who made the trip)
> we spent some time brainstorming about a roadmap for NumPy, in the
> spirit of similar work that was done for Jupyter. The idea is that a
> document with wide community acceptance can guide the work of the
> full-time developer(s), and be a source of ideas for expanding
> development efforts.
>
> I put the document up at
> https://github.com/numpy/numpy/wiki/NumPy-Roadmap, and hope to discuss
> it at a BOF session during SciPy in the middle of July in Austin.
>
> Eventually it could become a NEP or formalized in another way.
>
> Matti
>


Some things I have seen mentioned but don't know the current plans for:

* Categorical arrays
* Releasing the GIL wherever possible
* Using multithreading internally
* Making use of the next generation BLAS when available and staying involved
in planning to make sure it supports our needs
* Figure out where to use Cython and where not to



Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-01 Thread Marten van Kerkwijk
Hi Matti,

Thanks for sharing the roadmap. Overall, it looks very nice. A practical
question is whether you want input via the mailing list, or whether one
should just edit the wiki and add questions there?

As the roadmap mentioned interaction with python proper (and a possible
PEP): one thing that always slightly annoyed me is that numpy math is way
slower for scalars than python math - and duplicates all the function
names. It would seem to make sense to allow python's math module to be
overridden for non-python input, including arrays. That could be another
PEP...
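
To illustrate what I mean: the math module already delegates a few functions
to special methods (`math.floor` calls `__floor__`, for instance), so the
mechanism exists for some names. The sketch below shows that real hook next
to a hypothetical one for `sqrt`. The `Vec` class, the `__sqrt__` method and
the `sqrt` wrapper are made up for illustration; the math module does not
look for `__sqrt__` today.

```python
import math


class Vec:
    """Tiny stand-in for an array type (hypothetical, not NumPy code)."""

    def __init__(self, values):
        self.values = list(values)

    def __floor__(self):
        # Real protocol: math.floor(x) calls type(x).__floor__(x).
        return Vec(math.floor(v) for v in self.values)

    def __sqrt__(self):
        # Hypothetical hook: the math module does NOT look for this today.
        return Vec(math.sqrt(v) for v in self.values)


def sqrt(x):
    """Dispatching wrapper sketching what an overridable math.sqrt could do."""
    hook = getattr(type(x), "__sqrt__", None)
    if hook is not None:
        return hook(x)
    return math.sqrt(x)  # fast path for plain Python numbers


floored = math.floor(Vec([1.5, 2.5]))
rooted = sqrt(Vec([4.0, 9.0]))
plain = sqrt(16.0)
```

With something along these lines, scalars would keep the fast python path
while arrays could supply their own implementation under the same name.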

All the best,

Marten