Re: [Numpy-discussion] Splitting MaskedArray into a separate package

2018-05-26 Thread bruno Piguet
Disclaimer: this is a user's point of view. I have never committed a line to
numpy.

In my usage, missing values do occur, as does the need for some kind of mask,
such as sea/land.
I've been told here that using MA is superior to using NaNs, and indeed
I found a couple of cases where other libraries (matplotlib, ...) behaved
better with MA than with NaNs in simple ndarrays.
Thus, I fear moving masked arrays to a separate package would give them a
second-class status, make them look optional, and lower their support by
third-party libraries.
And I view as a bad idea any suggestion to deprecate MaskedArray before
a replacement is designed and implemented.
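
To make the MA-versus-NaN comparison concrete, here is a minimal sketch. The
temperature values, the land mask and all variable names are invented for
illustration; only the numpy calls themselves are real.

```python
import numpy as np

# Hypothetical 2x3 grid of sea-surface temperatures; True marks land cells.
sst = np.array([[14.0, 15.5, 99.0],
                [13.2, 99.0, 16.1]])
land = np.array([[False, False, True],
                 [False, True, False]])

# NaN approach: overwrite land cells, then remember to use nan-aware functions.
sst_nan = np.where(land, np.nan, sst)
mean_nan = np.nanmean(sst_nan)      # ignores the NaN cells
plain_mean = sst_nan.mean()         # a plain mean propagates NaN

# Masked-array approach: the mask travels with the data.
sst_ma = np.ma.masked_array(sst, mask=land)
mean_ma = sst_ma.mean()             # land cells are skipped automatically

# Integer data cannot hold NaN, but it can still be masked.
depth = np.ma.masked_array(np.array([120, 0, 340]), mask=[False, True, False])
depth_total = depth.sum()
```

Both approaches give the same mean over the sea cells; the difference is that
the masked array keeps its integer dtype and does not rely on every caller
choosing the nan-aware variant of each function.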

Bruno.

2018-05-24 17:40 GMT+02:00 Hameer Abbasi :

> I also somewhat like the idea of taking it out (once we have a first
> replacement) in the case that we have a plan to do a better/lower level
> replacement at a later point within numpy.
> Removal generally has its merits, but if a (mid term) replacement will
> come in any case, it would be nice to get those started first if
> possible.
> Otherwise downstream might end up having to fix up things twice.
>
> - Sebastian
>
>
> I also like the idea of designing a replacement first (using modern array
> protocols, perhaps in a separate repository) and then deprecating
> MaskedArray second. Deprecating an entire class in NumPy seems
> counterproductive, although I will admit I’ve never found a use for it. From
> this thread, it’s clear that others have, though.
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Splitting MaskedArray into a separate package

2018-05-25 Thread Marten van Kerkwijk
Hi All,

I agree with comments above that deprecating/removing MaskedArray is
premature; we certainly depend on it in astropy (which is indeed what
got me started contributing to numpy -- it was quite buggy!).

I also think that, unlike Matrix, it is far from a neglected part of
numpy. Eric Wieser in particular has been cleaning it up quite a bit.
Also like Allan, I'm excited about making a new version based on
`__array_ufunc__`.
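
For readers unfamiliar with the protocol, the following is a rough sketch of
what a data-plus-mask container built on `__array_ufunc__` could look like.
The class name `SketchMasked` and its behaviour are assumptions made up for
this illustration; it is not the design discussed in this thread and not
numpy's MaskedArray.

```python
import functools

import numpy as np


class SketchMasked:
    """Hypothetical data+mask container; not numpy's MaskedArray."""

    def __init__(self, data, mask):
        self.data = np.asarray(data)
        self.mask = np.asarray(mask, dtype=bool)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # This sketch only handles plain single-output calls such as
        # np.add(a, b); reductions, out=, where= etc. are declined.
        if method != "__call__" or kwargs or ufunc.nout != 1:
            return NotImplemented
        # Unwrap the raw data and collect the masks of wrapped inputs.
        datas = [x.data if isinstance(x, SketchMasked) else x for x in inputs]
        masks = [x.mask for x in inputs if isinstance(x, SketchMasked)]
        result = ufunc(*datas)
        # An output element is masked if any contributing input was masked.
        mask = functools.reduce(np.logical_or, masks)
        return SketchMasked(result, np.broadcast_to(mask, np.shape(result)))


a = SketchMasked([1.0, 2.0, 3.0], [False, True, False])
b = np.array([10.0, 20.0, 30.0])
total = np.add(a, b)   # dispatches to SketchMasked.__array_ufunc__
```

Note that the container here is deliberately not an ndarray subclass, so all
ufunc handling goes through the one method above instead of through
subclass-specific hooks.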

Beyond this, I think it is actually very useful for numpy to contain
at least one ndarray subclass, so that we have an internal check that
any changes we make to the base ndarray actually work.

So, my own sense would be that we should instead write a NEP with a
roadmap of what we want MaskedArray 2.0 to be like (e.g., no more
`nomask`...)
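
For context on the `nomask` remark, a small illustration of the special case
it creates in the current `numpy.ma` (the variable names are arbitrary):

```python
import numpy as np

a = np.ma.masked_array([1, 2, 3])                    # no mask given
b = np.ma.masked_array([1, 2, 3], mask=[0, 0, 0])    # explicit all-False mask

# Without a mask, the mask is the scalar sentinel np.ma.nomask,
# not a boolean array of the same shape as the data.
a_is_nomask = np.ma.getmask(a) is np.ma.nomask
b_is_nomask = np.ma.getmask(b) is np.ma.nomask

# getmaskarray always returns a full boolean array, hiding the special case.
a_full_mask = np.ma.getmaskarray(a)
```

Code that touches the mask directly has to handle both forms, which is one
reason a sentinel-free design may be attractive.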

All the best,

Marten

p.s. And, of course, deprecation of Matrix is actually starting to
happen: with my PRs, it is now in a state where one could remove
`matrixlib` and all tests would still pass, and there is a pending PR
to start giving out PendingDeprecationWarning.
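
As a generic illustration of that warning category (this is not the code
from the pending PR; `LegacyMatrix` is a made-up stand-in class):

```python
import warnings


class LegacyMatrix:
    """Hypothetical stand-in for a class slated for deprecation."""

    def __init__(self, data):
        # PendingDeprecationWarning is ignored by Python's default filters,
        # so ordinary users see nothing unless they opt in.
        warnings.warn(
            "LegacyMatrix may be deprecated in a future release; "
            "consider using regular arrays instead.",
            PendingDeprecationWarning,
            stacklevel=2,
        )
        self.data = data


# Opt in to all warnings and record them instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    m = LegacyMatrix([[1, 2], [3, 4]])
```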


Re: [Numpy-discussion] Splitting MaskedArray into a separate package

2018-05-24 Thread Allan Haldane
On 05/24/2018 11:31 AM, Sebastian Berg wrote:

> I also somewhat like the idea of taking it out (once we have a first
> replacement) in the case that we have a plan to do a better/lower level
> replacement at a later point within numpy.
> Removal generally has its merits, but if a (mid term) replacement will
> come in any case, it would be nice to get those started first if
> possible.
> Otherwise downstream might end up having to fix up things twice.
> 
> - Sebastian

Yes, I think the way forward is to start working on a new masked array
while keeping the old one in place.

Once it has progressed a little and we can step back and look at it, we
can consider how to switch over. I imagine we would have both present in
numpy under different names for a while.

Also, I think it would be nice to work on it soon because it is a chance
for us to eat our own dogfood in the __array_ufunc__ interface, which is
not yet set in stone so we can fix any problems we discover with it.

Allan


Re: [Numpy-discussion] Splitting MaskedArray into a separate package

2018-05-24 Thread Sebastian Berg
On Wed, 2018-05-23 at 23:48 +0200, Sebastian Berg wrote:
> On Wed, 2018-05-23 at 17:33 -0400, Allan Haldane wrote:



> 
> If we do not plan to replace it within numpy, we need to discuss a bit
> how it might affect infrastructure (multiple implementations).
> 
> There is the other discussion about how to replace it. By opening
> up/creating new masked dtypes or similar (cool but unclear how
> complex/long term) or `__array_ufunc__` based (relatively simple, will
> get rid of the nastier hacks that are currently needed).
> 
> Or even both, just on different time scales?
> 

I also somewhat like the idea of taking it out (once we have a first
replacement) in the case that we have a plan to do a better/lower level
replacement at a later point within numpy.
Removal generally has its merits, but if a (mid term) replacement will
come in any case, it would be nice to get those started first if
possible.
Otherwise downstream might end up having to fix up things twice.

- Sebastian


> My first gut feeling about the proposal is: I love the idea to get rid
> of it... but let's not do it, it does feel like it makes too much
> infrastructure unclear.
> 
> - Sebastian
> 
> 
> > 
> > Allan
> > 
> > 


Re: [Numpy-discussion] Splitting MaskedArray into a separate package

2018-05-23 Thread Benjamin Root
As further evidence of a widely used package that is often considered
"critical" to an ecosystem that gets negligible support, look no further
than Basemap. It went almost two years without any commits before I took it
up (and then only because my employer needed a couple of fixes).

I worry that a masked array package would turn into Basemap.

Ben Root


On Wed, May 23, 2018 at 10:52 PM, Benjamin Root 
wrote:

> Users of a package do not equate to maintainers of a package. Scikits
> are successful because scientists that have specialty in a field can
> contribute code and support the packages using their domain knowledge. How
> many people here are specialists in masked/missing value computation?
>
> Would I like to see better missing value support in numpy? Sure, but until
> then, MaskedArrays are what we have and it is still better than just using
> NaNs all over the place.
>
> Cheers!
> Ben Root
>
> On Wed, May 23, 2018 at 7:38 PM, Stefan van der Walt wrote:
>
>> Hi Eric,
>>
>> On Wed, 23 May 2018 10:02:22 -1000, Eric Firing wrote:
>> > Masked arrays are critical to my numpy usage, and I suspect they are
>> > critical for many other use cases as well.
>>
>> That's good to know; and the goal of this NEP should be to improve your
>> situation, not make it worse.
>>
>> > In fact, I would prefer that a high priority for major numpy
>> > development be the more complete integration of masked array capabilities
>> > into numpy, not their removal to a separate package.
>> >
>> > I was unhappy to see
>> > the effort in that direction a few years ago being killed.  I didn't agree
>> > with every design decision, but overall I thought it was going in the right
>> > direction.
>>
>> I see this and the NEP as orthogonal issues.  MaskedArrays, one
>> particular version of the masked value solution, has never truly been a
>> first class citizen.
>>
>> If we could instead implement masked arrays such that it simply sits on
>> top of existing NumPy functionality (using, e.g., special dtypes or
>> bitmasks), re-using all the standard machinery, that would be a natural
>> fit in the core of NumPy, and would negate the need for MaskedArrays.
>> But we haven't reached that point yet, and I am not aware of any current
>> proposal to do so.
>>
>> > Bad or missing values (and situations where one wants to use a mask to
>> > operate on a subset of an array) are found in many domains of real life; do
>> > you really want python users in those domains to have to fall back on
>> > Matlab-style reliance on nans and/or manual mask manipulations, as the new
>> > maskedarray package is sidelined?
>>
>> This is not too far from the current status quo, I would argue.  The
>> functionality exists, but it is "bolted on" rather than "built in".  And
>> my guess is that the component will benefit from some extra attention
>> that it is not getting as part of the current package.
>>
>> > Or is there any realistic prospect for maintenance and improvement of the
>> > package after it is separated out?
>>
>> In order to prevent the package from being "sidelined", we would have to
>> strengthen this part of the story.
>>
>> > Side question: does your proposed purification of numpy include elimination
>> > of linalg and random?  Based on the criteria in the NEP, I would expect it
>> > does; so maybe you should have a more ambitious NEP, and do the purification
>> > all in one step as a numpy version 2.0.  (Surely if masked arrays are
>> > purged, the matrix class should be booted out at the same time.)
>>
>> That's an interesting question, and one I have wondered about.  Would it
>> make sense to ship just the core ndarray object?  I don't know.  It
>> probably depends a lot on whether we can define clear API boundaries,
>> whether this kind of split is desired from the average user's
>> perspective, and whether it could benefit the development of the
>> subcomponents.
>>
>> W.r.t. matrices, I think you're setting a trap for me here, but I'm
>> going to step into it anyway ;)
>>
>> https://mail.python.org/pipermail/numpy-discussion/2013-July/067254.html
>>
>> It is, then, not the first time I argued in favor of moving certain
>> components out of NumPy onto their own packages.  I would probably have
>> written that NEP this time around, had it not been for the many strings
>> attached via SciPy sparse (and therefore sklearn etc.).  Before matrix
>> deprecation can be discussed further, therefore, we need to implement
>> sparse *arrays* for SciPy (and some efforts are slowly underway).
>>
>> See also:
>>
>> https://mail.python.org/pipermail/numpy-discussion/2017-January/076290.html
>> http://numpy-discussion.10968.n7.nabble.com/Deprecate-matrices-in-1-15-and-remove-in-1-17-tp44968.html
>>
>> Best regards,
>> Stéfan

Re: [Numpy-discussion] Splitting MaskedArray into a separate package

2018-05-23 Thread Stefan van der Walt
Hi Eric,

On Wed, 23 May 2018 10:02:22 -1000, Eric Firing wrote:
> Masked arrays are critical to my numpy usage, and I suspect they are
> critical for many other use cases as well.

That's good to know; and the goal of this NEP should be to improve your
situation, not make it worse.

> In fact, I would prefer that a high priority for major numpy
> development be the more complete integration of masked array capabilities
> into numpy, not their removal to a separate package.
>
> I was unhappy to see
> the effort in that direction a few years ago being killed.  I didn't agree
> with every design decision, but overall I thought it was going in the right
> direction.

I see this and the NEP as orthogonal issues.  MaskedArrays, one
particular version of the masked value solution, has never truly been a
first class citizen.

If we could instead implement masked arrays such that it simply sits on
top of existing NumPy functionality (using, e.g., special dtypes or
bitmasks), re-using all the standard machinery, that would be a natural
fit in the core of NumPy, and would negate the need for MaskedArrays.
But we haven't reached that point yet, and I am not aware of any current
proposal to do so.
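
As a rough illustration of "sitting on top of the standard machinery", the
sketch below keeps a plain ndarray next to a separate boolean validity array
and uses only ordinary ufunc features. It is an assumption-laden example, not
a proposal from this thread, and the `where=` argument to `np.sum` and the
`count=` argument to `np.unpackbits` need a NumPy release newer than the ones
current at the time of writing.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])
valid = np.array([True, False, True, True])   # True = use this element

# Reductions can skip elements through the `where` argument.
total = np.sum(data, where=valid)

# Element-wise ufuncs accept `where` together with `out`; positions where
# `valid` is False keep whatever `out` already contained (zeros here).
doubled = np.zeros_like(data)
np.multiply(data, 2.0, out=doubled, where=valid)

# The validity flags can also be stored compactly as a bitmask.
packed = np.packbits(valid)
unpacked = np.unpackbits(packed, count=valid.size).astype(bool)
```

The bookkeeping (carrying `valid` along through every operation) is exactly
the part a built-in masked dtype or bitmask would have to automate.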

> Bad or missing values (and situations where one wants to use a mask to
> operate on a subset of an array) are found in many domains of real life; do
> you really want python users in those domains to have to fall back on
> Matlab-style reliance on nans and/or manual mask manipulations, as the new
> maskedarray package is sidelined?

This is not too far from the current status quo, I would argue.  The
functionality exists, but it is "bolted on" rather than "built in".  And
my guess is that the component will benefit from some extra attention
that it is not getting as part of the current package.

> Or is there any realistic prospect for maintenance and improvement of the
> package after it is separated out?

In order to prevent the package from being "sidelined", we would have to
strengthen this part of the story.

> Side question: does your proposed purification of numpy include elimination
> of linalg and random?  Based on the criteria in the NEP, I would expect it
> does; so maybe you should have a more ambitious NEP, and do the purification
> all in one step as a numpy version 2.0.  (Surely if masked arrays are
> purged, the matrix class should be booted out at the same time.)

That's an interesting question, and one I have wondered about.  Would it
make sense to ship just the core ndarray object?  I don't know.  It
probably depends a lot on whether we can define clear API boundaries,
whether this kind of split is desired from the average user's
perspective, and whether it could benefit the development of the
subcomponents.

W.r.t. matrices, I think you're setting a trap for me here, but I'm
going to step into it anyway ;)

https://mail.python.org/pipermail/numpy-discussion/2013-July/067254.html

It is, then, not the first time I argued in favor of moving certain
components out of NumPy onto their own packages.  I would probably have
written that NEP this time around, had it not been for the many strings
attached via SciPy sparse (and therefore sklearn etc.).  Before matrix
deprecation can be discussed further, therefore, we need to implement
sparse *arrays* for SciPy (and some efforts are slowly underway).

See also:

https://mail.python.org/pipermail/numpy-discussion/2017-January/076290.html
http://numpy-discussion.10968.n7.nabble.com/Deprecate-matrices-in-1-15-and-remove-in-1-17-tp44968.html

Best regards,
Stéfan


Re: [Numpy-discussion] Splitting MaskedArray into a separate package

2018-05-23 Thread Matthew Rocklin
Hi All,

*Disclaimer: I don't spend any hours actually maintaining Numpy, so please
don't take my comments here with much weight.*

My gut reaction here is that if removing masked array allows Numpy to
evolve more quickly then this excites me.

It could be that a plan goes something like the following:

   1. Remove masked array to a separate package, pin it to current versions
   of Numpy.
   2. Evolve Numpy to the point where making new array types becomes
   attractive
   3. Make a new masked array with that new functionality that doesn't have
   the problems of the current implementation

Of course this is a simplistic view of the world, and it could also be that
this triggers a forking event.  However, hopefully it gets a general theme
across that there is value to allowing Numpy to move quickly, and
that it might make sense for some feature-sets to miss out on that
evolution for a time for the greater good of the ecosystem's evolution.

-matt

On Wed, May 23, 2018 at 6:08 PM, Matthew Brett 
wrote:

> Hi,
>
> On Wed, May 23, 2018 at 10:42 PM, Stefan van der Walt
>  wrote:
> > On May 23, 2018 14:28:05 Matthew Brett  wrote:
> >>
> >>
> >> Can I ask what the plans are for supporting missing values, inside or
> >> outside numpy?  Is there a successor to MaskedArray - and is this
> >> part of the succession plan?
> >
> >
> > I am not aware of any concrete plans, maybe others can chime in?
> >
> > It's a bit strange, the words that are used in this thread: "succession",
> > "purification", "elimination", and "purge". I don't have my knife out for
> > MaskedArrays; I merged a lot of Pierre's work myself. I simply suspect there
> > may be a better and more supporting home/project configuration for it,
> > perhaps still under the NumPy umbrella.
>
> The NEP notes that MaskedArray imposes a significant maintenance
> burden, as a motivation for removing it.  I'm sure you'd predict that
> the Numpy developers are likely to spend less time on it, if it moves
> to its own package.  I guess the hope would be that others would take
> over, but is that likely?  What if they don't?
>
> Would it be reasonable to develop an alternative plan for missing
> arrays in concert with this NEP, maybe along the lines that Allan
> mentioned, above?
>
> Cheers,
>
> Matthew


Re: [Numpy-discussion] Splitting MaskedArray into a separate package

2018-05-23 Thread Matthew Brett
Hi,

On Wed, May 23, 2018 at 10:42 PM, Stefan van der Walt
 wrote:
> On May 23, 2018 14:28:05 Matthew Brett  wrote:
>>
>>
>> Can I ask what the plans are for supporting missing values, inside or
>> outside numpy?  Is there a successor to MaskedArray - and is this
>> part of the succession plan?
>
>
> I am not aware of any concrete plans, maybe others can chime in?
>
> It's a bit strange, the words that are used in this thread: "succession",
> "purification", "elimination", and "purge". I don't have my knife out for
> MaskedArrays; I merged a lot of Pierre's work myself. I simply suspect there
> may be a better and more supporting home/project configuration for it,
> perhaps still under the NumPy umbrella.

The NEP notes that MaskedArray imposes a significant maintenance
burden, as a motivation for removing it.  I'm sure you'd predict that
the Numpy developers are likely to spend less time on it, if it moves
to its own package.  I guess the hope would be that others would take
over, but is that likely?  What if they don't?

Would it be reasonable to develop an alternative plan for missing
arrays in concert with this NEP, maybe along the lines that Allan
mentioned, above?

Cheers,

Matthew


Re: [Numpy-discussion] Splitting MaskedArray into a separate package

2018-05-23 Thread Sebastian Berg
On Wed, 2018-05-23 at 17:33 -0400, Allan Haldane wrote:
> On 05/23/2018 04:02 PM, Eric Firing wrote:
> > Bad or missing values (and situations where one wants to use a mask to
> > operate on a subset of an array) are found in many domains of real life;
> > do you really want python users in those domains to have to fall back on
> > Matlab-style reliance on nans and/or manual mask manipulations, as the
> > new maskedarray package is sidelined?
> 
> I also think that missing value support is important to include inside
> numpy, just as it is included in other numerical packages like R and Julia.
> 
> The time is ripe to write a new and better MaskedArray, because
> __array_ufunc__ exists now. With some other numpy devs a few months ago
> we also played with rewriting MA using __array_ufunc__ and fixing up all
> the bugs and inconsistencies we have discovered over time (eg, getting
> rid of the Masked constant). Both Eric and I started working on some
> code changes, but never submitted PRs. See a little bit of discussion
> here (there was some more elsewhere I can't find now):
> 
> https://github.com/numpy/numpy/pull/9792#issuecomment-46420
> 
> As I say there, numpy's current MA support is pretty poor compared to R
> - Wes McKinney partly justified his desire to move pandas away from
> numpy because of it. We have a lot to gain by implementing it nicely.
> 
> We already have an NEP discussing possible ways forward:
> https://docs.scipy.org/doc/numpy-1.14.0/neps/missing-data.html
> 
> I was pretty excited by discussion above, and still am. I want to get
> back to it after I finish more immediate priorities - finishing
> printing/loading/saving fixes and structured array fixes.
> 
> But Masked-Array-2 is on my list of desired long-term enhancements for
> numpy.

Well, if we plan to replace it within numpy, I think we should wait
until then for any move on deprecation (after which it seems like the
obviously right choice)?

If we do not plan to replace it within numpy, we need to discuss a bit
how it might affect infrastructure (multiple implementations).

There is the other discussion about how to replace it. By opening
up/creating new masked dtypes or similar (cool but unclear how
complex/long term) or `__array_ufunc__` based (relatively simple, will
get rid of the nastier hacks that are currently needed).

Or even both, just on different time scales?

My first gut feeling about the proposal is: I love the idea to get rid
of it... but let's not do it, it does feel like it makes too much
infrastructure unclear.

- Sebastian


> 
> Allan
> 
> 


Re: [Numpy-discussion] Splitting MaskedArray into a separate package

2018-05-23 Thread Stefan van der Walt

On May 23, 2018 14:28:05 Matthew Brett wrote:

> Can I ask what the plans are for supporting missing values, inside or
> outside numpy?  Is there a successor to MaskedArray - and is this
> part of the succession plan?


I am not aware of any concrete plans, maybe others can chime in?

It's a bit strange, the words that are used in this thread: "succession", 
"purification", "elimination", and "purge". I don't have my knife out for 
MaskedArrays; I merged a lot of Pierre's work myself. I simply suspect 
there may be a better and more supporting home/project configuration for 
it, perhaps still under the NumPy umbrella.



Best regards,
Stéfan




Re: [Numpy-discussion] Splitting MaskedArray into a separate package

2018-05-23 Thread Allan Haldane
On 05/23/2018 04:02 PM, Eric Firing wrote:
> Bad or missing values (and situations where one wants to use a mask to
> operate on a subset of an array) are found in many domains of real life;
> do you really want python users in those domains to have to fall back on
> Matlab-style reliance on nans and/or manual mask manipulations, as the
> new maskedarray package is sidelined?

I also think that missing value support is important to include inside
numpy, just as it is included in other numerical packages like R and Julia.

The time is ripe to write a new and better MaskedArray, because
__array_ufunc__ exists now. With some other numpy devs a few months ago
we also played with rewriting MA using __array_ufunc__ and fixing up all
the bugs and inconsistencies we have discovered over time (eg, getting
rid of the Masked constant). Both Eric and I started working on some
code changes, but never submitted PRs. See a little bit of discussion
here (there was some more elsewhere I can't find now):

https://github.com/numpy/numpy/pull/9792#issuecomment-46420
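
For readers who have not run into the Masked constant, a short illustration
of the current behaviour (variable names are arbitrary):

```python
import numpy as np

x = np.ma.masked_array([1, 2, 3], mask=[False, True, False])

# Indexing a masked element does not return an integer; it returns the
# module-level singleton np.ma.masked.
item = x[1]
item_is_constant = item is np.ma.masked
item_type = type(item).__name__

# The singleton has its own fixed dtype, unrelated to the dtype of x.
constant_dtype = np.ma.masked.dtype
array_dtype = x.dtype
```

Because every masked scalar is the same object, the dtype of the array it
came from is lost, which is one of the inconsistencies a rewrite could
address.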

As I say there, numpy's current MA support is pretty poor compared to R
- Wes McKinney partly justified his desire to move pandas away from
numpy because of it. We have a lot to gain by implementing it nicely.

We already have an NEP discussing possible ways forward:
https://docs.scipy.org/doc/numpy-1.14.0/neps/missing-data.html

I was pretty excited by discussion above, and still am. I want to get
back to it after I finish more immediate priorities - finishing
printing/loading/saving fixes and structured array fixes.

But Masked-Array-2 is on my list of desired long-term enhancements for
numpy.

Allan




Re: [Numpy-discussion] Splitting MaskedArray into a separate package

2018-05-23 Thread Matthew Brett
Hi,


On Wed, May 23, 2018 at 9:51 PM, Stefan van der Walt
 wrote:
> Hi Eric,
>
> On May 23, 2018 13:25:44 Eric Firing  wrote:
>
>> On 2018/05/23 9:06 AM, Matti Picus wrote:
>> I understand at least some of the motivation and potential advantages,
>> but as it stands, I find this NEP highly alarming.
>
>
> I am not at my computer right now, so I will respond in more detail later.
> But I wanted to address your statement above:
>
> I see a NEP as an opportunity to discuss and flesh out an idea, and I
> certainly hope that there's no reason for alarm.
>
> I do not expect to know whether this is a good idea before discussions
> conclude, so I appreciate your feedback. If we cannot find good support for
> the idea, with very specific benefits, it should simply be dropped.
>
> But, I think there's a lot to learn from the conversation in the meantime
> w.r.t. exactly how streamlined people want NumPy to be, how core
> functionality can perhaps be strengthened by becoming a customer of our own
> API, how to optimally maintain sub-components, etc.

Can I ask what the plans are for supporting missing values, inside or
outside numpy?  Is there a successor to MaskedArray - and is this
part of the succession plan?

Cheers,

Matthew


Re: [Numpy-discussion] Splitting MaskedArray into a separate package

2018-05-23 Thread Ilhan Polat
As far as I understand from the discussion above, I think the opposite
would be a better strategy for the sanity of our scarce but brave
maintainers. I would argue that if there is a maintenance burden, then the
ballast seems to be linalg and random indeed. Similar pain points exist
in SciPy too. There are a lot of issues that were already thought of
years ago but never materialized (be it because of backwards compatibility,
lack of champions and so on) because they are not the priority of the
maintaining team. It is very common that a discussion ends with "yes, we
should probably make it a ufunc" and then fades away. I feel that if there
were fewer things to worry about, more people would step up and "do it".

I would also argue that the highest expectation of NumPy would be having a
really sound data structure basis with more ufuncs, more array manipulation
tricks and so on. Masked arrays, imho, fall into that category. Hence, if
the codebase gets more refined in that respect, with less stuff to maintain
and fewer moving parts, I think there would be a more coherent overall picture
and a more focused action plan. Now the attention of maintainers seems to be
divided among a lot of orthogonal issues, which is not a bad thing per se but
tedious at times. Currently NumPy has a lot of code that it really doesn't
need to bother with and could delegate to higher level packages like SciPy or
any other subpackage. It sounds like NumPy 2.0, but it is actually more of a
gradual thinning out.




On Wed, May 23, 2018 at 10:51 PM, Stefan van der Walt 
wrote:

> Hi Eric,
>
> On May 23, 2018 13:25:44 Eric Firing  wrote:
>
> On 2018/05/23 9:06 AM, Matti Picus wrote:
>> I understand at least some of the motivation and potential advantages,
>> but as it stands, I find this NEP highly alarming.
>>
>
> I am not at my computer right now, so I will respond in more detail later.
> But I wanted to address your statement above:
>
> I see a NEP as an opportunity to discuss and flesh out an idea, and I
> certainly hope that there's no reason for alarm.
>
> I do not expect to know whether this is a good idea before discussions
> conclude, so I appreciate your feedback. If we cannot find good support for
> the idea, with very specific benefits, it should simply be dropped.
>
> But, I think there's a lot to learn from the conversation in the meantime
> w.r.t. exactly how streamlined people want NumPy to be, how core
> functionality can perhaps be strengthened by becoming a customer of our own
> API, how to optimally maintain sub-components, etc.
>
> Best regards,
> Stéfan
>
>
>


Re: [Numpy-discussion] Splitting MaskedArray into a separate package

2018-05-23 Thread Stefan van der Walt

Hi Eric,

On May 23, 2018 13:25:44 Eric Firing wrote:

> On 2018/05/23 9:06 AM, Matti Picus wrote:
> I understand at least some of the motivation and potential advantages,
> but as it stands, I find this NEP highly alarming.


I am not at my computer right now, so I will respond in more detail later. 
But I wanted to address your statement above:


I see a NEP as an opportunity to discuss and flesh out an idea, and I
certainly hope that there's no reason for alarm.


I do not expect to know whether this is a good idea before discussions 
conclude, so I appreciate your feedback. If we cannot find good support for 
the idea, with very specific benefits, it should simply be dropped.


But, I think there's a lot to learn from the conversation in the meantime 
w.r.t. exactly how streamlined people want NumPy to be, how core 
functionality can perhaps be strengthened by becoming a customer of our own 
API, how to optimally maintain sub-components, etc.


Best regards,
Stéfan




Re: [Numpy-discussion] Splitting MaskedArray into a separate package

2018-05-23 Thread Ralf Gommers
On Wed, May 23, 2018 at 1:03 PM, Stefan van der Walt 
wrote:

> On Wed, 23 May 2018 12:29:32 -0700, Ralf Gommers wrote:
> > >>  * Compatibility: MaskedArray objects, being subclasses of `ndarrays`,
> > >>often cause complications when being used with other packages.
> > >>Fixing these issues is outside the scope of NumPy development.
> > >
> > Hmm, I wouldn't say it's out of scope at all. Currently it's simply part of
> > numpy.
>
> That is currently the situation, yes.  I think this was meant more as
> "we'd preferably not like to think about MaskedArrays any differently
> than we do about other external packages, such as dask".  I.e., not
> support specific hacks to make it work.
>
> > You're missing an important step I think. You're proposing to deprecate
> > MaskedArray completely (or not?). IIRC this has not been decided or
> > seriously discussed before.
>
> Good point, which certainly needs to be discussed.  My thought was to
> move it out into a separate package that could be maintained more in the
> spirit of a scikit by people who care deeply about its functionality.
>

That would be good in principle, but it's only possible that way once the
specific hacks you refer to above are removed. As long as MaskedArray
depends on implementation details of ndarray, evolving them in lock-step
will be necessary. And that is much easier when they're in the same package.

Regarding whether a split-off package will actually be developed, I think
that depends on having at least one champion for it stepping up. If we just
move it over into github.com/numpy/maskedarray, I think it will get less
rather than more attention.

Cheers,
Ralf


>
> Best regards,
> Stéfan


Re: [Numpy-discussion] Splitting MaskedArray into a separate package

2018-05-23 Thread Eric Firing

On 2018/05/23 9:06 AM, Matti Picus wrote:
> MaskedArray is a strange but useful creature. This NEP proposes to
> distribute it as a separate package under the NumPy brand.
>
> As I understand the process, a proposed NEP should be first discussed
> here to gauge general acceptance, then after that the details should be
> discussed on the pull request itself
> https://github.com/numpy/numpy/pull/11146.
>
> Here is the motivation section from the NEP:
>
> MaskedArrays are a sub-class of the NumPy ``ndarray`` that adds
> masking capabilities, i.e. the ability to ignore or hide certain array
> values during computation.
>
> While historically convenient to distribute this class inside of NumPy,
> improved packaging has made it possible to distribute it separately
> without difficulty.
>
> Motivations for this move include:
>
>  * Focus: the NumPy package should strive to only include the
>    `ndarray` object, and the essential utilities needed to manipulate
>    such arrays.
>  * Complexity: the MaskedArray implementation is non-trivial, and imposes
>    a significant maintenance burden.
>  * Compatibility: MaskedArray objects, being subclasses of `ndarrays`,
>    often cause complications when being used with other packages.
>    Fixing these issues is outside the scope of NumPy development.
>
> This NEP proposes a deprecation pathway through which MaskedArrays
> would still be accessible to users, but no longer as part of the core
> package.
>
> Any thoughts?
>
> Matti and Stefan


I understand at least some of the motivation and potential advantages, 
but as it stands, I find this NEP highly alarming.  Masked arrays are 
critical to my numpy usage, and I suspect they are critical for many 
other use cases as well.  In fact, I would prefer that a high priority 
for major numpy development be the more complete integration of masked 
array capabilities into numpy, not their removal to a separate package. 
I was unhappy to see the effort in that direction a few years ago being 
killed.  I didn't agree with every design decision, but overall I 
thought it was going in the right direction.


Bad or missing values (and situations where one wants to use a mask to 
operate on a subset of an array) are found in many domains of real life; 
do you really want python users in those domains to have to fall back on 
Matlab-style reliance on nans and/or manual mask manipulations, as the 
new maskedarray package is sidelined?
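
To make the three options named above concrete, here is a minimal sketch comparing a masked array, NaN encoding, and a hand-managed boolean mask. The temperature values and the -999.0 sentinel are made up for illustration; only `numpy.ma.masked_values`, `numpy.where`, and `numpy.nanmean` are real NumPy calls.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical readings; -999.0 is an assumed sentinel for "missing".
raw = np.array([14.2, -999.0, 15.1, 13.8])

# Masked-array approach: the mask travels with the data, and
# ordinary reductions skip the masked entries.
masked = ma.masked_values(raw, -999.0)
masked_mean = masked.mean()

# NaN approach: missing values are encoded in the data itself, and
# a NaN-aware function has to be chosen explicitly.
with_nan = np.where(raw == -999.0, np.nan, raw)
nan_mean = np.nanmean(with_nan)

# Manual-mask approach: a separate boolean array that the caller
# must keep in sync with the data.
valid = raw != -999.0
manual_mean = raw[valid].mean()

print(masked_mean, nan_mean, manual_mean)
```

All three give the same mean here. The difference is in who carries the bookkeeping: with NaNs a plain `with_nan.mean()` returns NaN, and with a manual mask every derived array needs its own mask updated by hand.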


Or is there any realistic prospect for maintenance and improvement of 
the package after it is separated out?  Or of mask/missing value 
handling being integrated into numpy?  Is the latter option on the table 
in any form, or is it DOA?


Side question: does your proposed purification of numpy include 
elimination of linalg and random?  Based on the criteria in the NEP, I 
would expect it does; so maybe you should have a more ambitious NEP, and 
do the purification all in one step as a numpy version 2.0.  (Surely if 
masked arrays are purged, the matrix class should be booted out at the 
same time.)


Eric


Re: [Numpy-discussion] Splitting MaskedArray into a separate package

2018-05-23 Thread Stefan van der Walt
On Wed, 23 May 2018 12:29:32 -0700, Ralf Gommers wrote:
> >>  * Compatibility: MaskedArray objects, being subclasses of `ndarrays`,
> >>    often cause complications when being used with other packages.
> >>    Fixing these issues is outside the scope of NumPy development.
> >
> Hmm, I wouldn't say it's out of scope at all. Currently it's simply part of
> numpy.

That is currently the situation, yes.  I think this was meant more as
"we'd preferably not like to think about MaskedArrays any differently
than we do about other external packages, such as dask".  I.e., not
support specific hacks to make it work.

> You're missing an important step I think. You're proposing to deprecate
> MaskedArray completely (or not?). IIRC this has not been decided or
> seriously discussed before.

Good point, which certainly needs to be discussed.  My thought was to
move it out into a separate package that could be maintained more in the
spirit of a scikit by people who care deeply about its functionality.

Best regards,
Stéfan


[Numpy-discussion] Splitting MaskedArray into a separate package

2018-05-23 Thread Matti Picus
MaskedArray is a strange but useful creature. This NEP proposes to 
distribute it as a separate package under the NumPy brand.


As I understand the process, a proposed NEP should be first discussed 
here to gauge general acceptance, then after that the details should be 
discussed on the pull request itself 
https://github.com/numpy/numpy/pull/11146.


Here is the motivation section from the NEP:


MaskedArrays are a sub-class of the NumPy ``ndarray`` that adds
masking capabilities, i.e. the ability to ignore or hide certain array
values during computation.

While historically convenient to distribute this class inside of NumPy,
improved packaging has made it possible to distribute it separately
without difficulty.

Motivations for this move include:

 * Focus: the NumPy package should strive to only include the
   `ndarray` object, and the essential utilities needed to manipulate
   such arrays.
 * Complexity: the MaskedArray implementation is non-trivial, and imposes
   a significant maintenance burden.
 * Compatibility: MaskedArray objects, being subclasses of `ndarrays`,
   often cause complications when being used with other packages.
   Fixing these issues is outside the scope of NumPy development.
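
One common form of the compatibility complication can be sketched as follows. This is an illustration under assumptions, not an example taken from the NEP: a library that normalizes its inputs with `np.asarray` receives a plain `ndarray` and silently loses the mask, while `np.asanyarray` passes the subclass through.

```python
import numpy as np
import numpy.ma as ma

# A small masked array with the middle element hidden.
m = ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

# np.asarray converts to a base ndarray, so the mask is dropped
# and the hidden value takes part in the computation again.
plain = np.asarray(m)

# np.asanyarray passes ndarray subclasses through unchanged.
kept = np.asanyarray(m)

print(type(plain).__name__, plain.sum())
print(type(kept).__name__, kept.sum())
```

The first sum includes the masked value and the second does not, so the same user code can produce different results depending on how a downstream package coerces its inputs.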

This NEP proposes a deprecation pathway through which MaskedArrays
would still be accessible to users, but no longer as part of the core
package.


Any thoughts?

Matti and Stefan

