Re: [Numpy-discussion] Splitting MaskedArray into a separate package
As further evidence of a widely used package that is often considered "critical" to an ecosystem that gets negligible support, look no further than Basemap. It went almost two years without any commits before I took it up (and then only because my employer needed a couple of fixes). I worry that a masked array package would turn into Basemap.

Ben Root

On Wed, May 23, 2018 at 10:52 PM, Benjamin Root wrote:
> users of a package does not equate to maintainers of a package. Scikits
> are successful because scientists that have specialty in a field can
> contribute code and support the packages using their domain knowledge. How
> many people here are specialists in masked/missing value computation?
>
> Would I like to see better missing value support in numpy? Sure, but until
> then, MaskedArrays are what we have and it is still better than just using
> NaNs all over the place.
>
> Cheers!
> Ben Root
Re: [Numpy-discussion] Splitting MaskedArray into a separate package
Hi Eric,

On Wed, 23 May 2018 10:02:22 -1000, Eric Firing wrote:
> Masked arrays are critical to my numpy usage, and I suspect they are
> critical for many other use cases as well.

That's good to know; and the goal of this NEP should be to improve your situation, not make it worse.

> In fact, I would prefer that a high priority for major numpy
> development be the more complete integration of masked array capabilities
> into numpy, not their removal to a separate package.
>
> I was unhappy to see
> the effort in that direction a few years ago being killed. I didn't agree
> with every design decision, but overall I thought it was going in the right
> direction.

I see this and the NEP as orthogonal issues. MaskedArrays, one particular version of the masked value solution, has never truly been a first class citizen.

If we could instead implement masked arrays such that they simply sit on top of existing NumPy functionality (using, e.g., special dtypes or bitmasks), re-using all the standard machinery, that would be a natural fit in the core of NumPy, and would negate the need for MaskedArrays. But we haven't reached that point yet, and I am not aware of any current proposal to do so.

> Bad or missing values (and situations where one wants to use a mask to
> operate on a subset of an array) are found in many domains of real life; do
> you really want python users in those domains to have to fall back on
> Matlab-style reliance on nans and/or manual mask manipulations, as the new
> maskedarray package is sidelined?

This is not too far from the current status quo, I would argue. The functionality exists, but it is "bolted on" rather than "built in". And my guess is that the component will benefit from some extra attention that it is not getting as part of the current package.

> Or is there any realistic prospect for maintenance and improvement of the
> package after it is separated out?

In order to prevent the package from being "sidelined", we would have to strengthen this part of the story.

> Side question: does your proposed purification of numpy include elimination
> of linalg and random? Based on the criteria in the NEP, I would expect it
> does; so maybe you should have a more ambitious NEP, and do the purification
> all in one step as a numpy version 2.0. (Surely if masked arrays are
> purged, the matrix class should be booted out at the same time.)

That's an interesting question, and one I have wondered about. Would it make sense to ship just the core ndarray object? I don't know. It probably depends a lot on whether we can define clear API boundaries, whether this kind of split is desired from the average user's perspective, and whether it could benefit the development of the subcomponents.

W.r.t. matrices, I think you're setting a trap for me here, but I'm going to step into it anyway ;)

https://mail.python.org/pipermail/numpy-discussion/2013-July/067254.html

It is, then, not the first time I argued in favor of moving certain components out of NumPy onto their own packages. I would probably have written that NEP this time around, had it not been for the many strings attached via SciPy sparse (and therefore sklearn etc.). Before matrix deprecation can be discussed further, therefore, we need to implement sparse *arrays* for SciPy (and some efforts are slowly underway).

See also:

https://mail.python.org/pipermail/numpy-discussion/2017-January/076290.html
http://numpy-discussion.10968.n7.nabble.com/Deprecate-matrices-in-1-15-and-remove-in-1-17-tp44968.html

Best regards,
Stéfan
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NumPy-Discussion Digest, Vol 140, Issue 25
If someone implements a separate library for masked arrays without changing anything in numpy, and it's better and people use it, then maybe deprecating it in numpy would be wise. But to me it seems like a large disruption to force such a transition. Much in the way that the numeric standard library is de facto not the numerical library for python.

~Sam
Re: [Numpy-discussion] Splitting MaskedArray into a separate package
Hi All,

*Disclaimer: I don't spend any hours actually maintaining Numpy, so please don't take my comments here with much weight.*

My gut reaction here is that if removing masked array allows Numpy to evolve more quickly then this excites me. It could be that a plan goes something like the following:

1. Remove masked array to a separate package, pin it to current versions of Numpy.
2. Evolve Numpy to the point where making new array types becomes attractive
3. Make a new masked array with that new functionality that doesn't have the problems of the current implementation

Of course this is a simplistic view of the world, and it could also be that this triggers a forking event. Hopefully, though, it gets a general theme across: there is value in allowing Numpy to move quickly, and it might make sense for some feature-sets to miss out on that evolution for a time for the greater good of the ecosystem's evolution.

-matt
Re: [Numpy-discussion] Splitting MaskedArray into a separate package
Hi,

On Wed, May 23, 2018 at 10:42 PM, Stefan van der Walt wrote:
> On May 23, 2018 14:28:05 Matthew Brett wrote:
>>
>> Can I ask what the plans are for supporting missing values, inside or
>> outside numpy? Is there a successor to MaskedArray - and is this
>> part of the succession plan?
>
> I am not aware of any concrete plans, maybe others can chime in?
>
> It's a bit strange, the words that are used in this thread: "succession",
> "purification", "elimination", and "purge". I don't have my knife out for
> MaskedArrays; I merged a lot of Pierre's work myself. I simply suspect there
> may be a better and more supporting home/project configuration for it,
> perhaps still under the NumPy umbrella.

The NEP notes that MaskedArray imposes a significant maintenance burden, as a motivation for removing it. I'm sure you'd predict that the Numpy developers are likely to spend less time on it, if it moves to its own package. I guess the hope would be that others would take over, but is that likely? What if they don't?

Would it be reasonable to develop an alternative plan for missing arrays in concert with this NEP, maybe along the lines that Allan mentioned, above?

Cheers,

Matthew
Re: [Numpy-discussion] Splitting MaskedArray into a separate package
On Wed, 2018-05-23 at 17:33 -0400, Allan Haldane wrote:
> But Masked-Array-2 is on my list of desired long-term enhancements for
> numpy.

Well, if we plan to replace it within numpy, I think we should wait until then for any move on deprecation (after which it seems like the obviously right choice)?

If we do not plan to replace it within numpy, we need to discuss a bit how it might affect infrastructure (multiple implementations).

There is the other discussion about how to replace it: by opening up/creating new masked dtypes or similar (cool, but unclear how complex/long term), or `__array_ufunc__` based (relatively simple, will get rid of the nastier hacks that are currently needed). Or even both, just on different time scales?

My first gut feeling about the proposal is: I love the idea to get rid of it... but let's not do it, it does feel like it makes too much infrastructure unclear.

- Sebastian
Re: [Numpy-discussion] Splitting MaskedArray into a separate package
On May 23, 2018 14:28:05 Matthew Brett wrote:
> Can I ask what the plans are for supporting missing values, inside or
> outside numpy? Is there a successor to MaskedArray - and is this
> part of the succession plan?

I am not aware of any concrete plans, maybe others can chime in?

It's a bit strange, the words that are used in this thread: "succession", "purification", "elimination", and "purge". I don't have my knife out for MaskedArrays; I merged a lot of Pierre's work myself. I simply suspect there may be a better and more supporting home/project configuration for it, perhaps still under the NumPy umbrella.

Best regards,
Stéfan
Re: [Numpy-discussion] Splitting MaskedArray into a separate package
On 05/23/2018 04:02 PM, Eric Firing wrote:
> Bad or missing values (and situations where one wants to use a mask to
> operate on a subset of an array) are found in many domains of real life;
> do you really want python users in those domains to have to fall back on
> Matlab-style reliance on nans and/or manual mask manipulations, as the
> new maskedarray package is sidelined?

I also think that missing value support is important to include inside numpy, just as it is included in other numerical packages like R and Julia.

The time is ripe to write a new and better MaskedArray, because __array_ufunc__ exists now. With some other numpy devs a few months ago we also played with rewriting MA using __array_ufunc__ and fixing up all the bugs and inconsistencies we have discovered over time (e.g., getting rid of the Masked constant). Both Eric and I started working on some code changes, but never submitted PRs. See a little bit of discussion here (there was some more elsewhere I can't find now):

https://github.com/numpy/numpy/pull/9792#issuecomment-46420

As I say there, numpy's current MA support is pretty poor compared to R - Wes McKinney partly justified his desire to move pandas away from numpy because of it. We have a lot to gain by implementing it nicely.

We already have an NEP discussing possible ways forward:
https://docs.scipy.org/doc/numpy-1.14.0/neps/missing-data.html

I was pretty excited by discussion above, and still am. I want to get back to it after I finish more immediate priorities - finishing printing/loading/saving fixes and structured array fixes.

But Masked-Array-2 is on my list of desired long-term enhancements for numpy.

Allan
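To make the `__array_ufunc__` idea above concrete, here is a minimal sketch of a masked container that intercepts ufunc calls and propagates a boolean mask. It is not the code from the PR discussion linked above and not how `numpy.ma` is implemented; the class name `SimpleMasked` and its attributes are hypothetical, and only plain elementwise calls with a single output are handled.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class SimpleMasked(NDArrayOperatorsMixin):
    """Toy masked container (hypothetical; for illustration only)."""

    def __init__(self, data, mask):
        self.data = np.asarray(data)
        self.mask = np.asarray(mask, dtype=bool)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # This sketch supports only plain calls such as np.add(a, b):
        # no reductions, no out= argument, no multi-output ufuncs.
        if method != "__call__" or "out" in kwargs or ufunc.nout != 1:
            return NotImplemented

        datas, masks = [], []
        for x in inputs:
            if isinstance(x, SimpleMasked):
                datas.append(x.data)
                masks.append(x.mask)
            else:
                datas.append(x)

        # Compute on the raw data, then mark a result element as masked
        # if any contributing input element was masked.
        result = ufunc(*datas, **kwargs)
        mask = np.zeros(np.shape(result), dtype=bool)
        for m in masks:
            mask = mask | m
        return SimpleMasked(result, mask)


a = SimpleMasked([1.0, 2.0, 3.0], [False, True, False])
b = a + 10.0  # the mixin routes the operator through np.add
c = np.multiply(a, SimpleMasked([2.0, 2.0, 2.0], [False, False, True]))
```

Because the container is not an `ndarray` subclass, it only has to define how ufuncs treat it; the rest of NumPy's machinery is reused unchanged, which is the attraction of this route.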
Re: [Numpy-discussion] Splitting MaskedArray into a separate package
Hi,

On Wed, May 23, 2018 at 9:51 PM, Stefan van der Walt wrote:
> But, I think there's a lot to learn from the conversation in the meantime
> w.r.t. exactly how streamlined people want NumPy to be, how core
> functionality can perhaps be strengthened by becoming a customer of our own
> API, how to optimally maintain sub-components, etc.

Can I ask what the plans are for supporting missing values, inside or outside numpy? Is there a successor to MaskedArray - and is this part of the succession plan?

Cheers,

Matthew
Re: [Numpy-discussion] Splitting MaskedArray into a separate package
As far as I understand from the discussion above, I think the opposite would be a better strategy for the sanity of our scarce but brave maintainers. I would argue that if there is a maintenance burden, then the ballasts seem to be linalg and random indeed. Similar pain points exist in SciPy too. There are a lot of issues that were already thought of years ago but never materialized (be it backwards compatibility, lack of champions and so on) because they are not the priority of the maintaining team. It is very common that a discussion ends with "yes, we should probably make it a ufunc" and then fades away. I feel that if there were fewer things to worry about, more people would step up and "do it".

I would also argue that the highest expectation of NumPy would be having a really sound data structure basis with more ufuncs, more array manipulation tricks and so on. Masked arrays, imho, fall into that category. Hence, if the codebase gets more refined in that respect, with less stuff to maintain and fewer moving parts, I think there would be a more coherent overall picture and a more focused action plan. Now the attention of maintainers seems to be divided among a lot of orthogonal issues, which is not a bad thing per se but tedious at times.

Currently NumPy has a lot of code that it really doesn't need to bother with and can delegate to higher level packages like SciPy or any other subpackage. It sounds like NumPy 2.0 but is actually more of a gradual thinning out.
Re: [Numpy-discussion] Splitting MaskedArray into a separate package
Hi Eric,

On May 23, 2018 13:25:44 Eric Firing wrote:
> I understand at least some of the motivation and potential advantages,
> but as it stands, I find this NEP highly alarming.

I am not at my computer right now, so I will respond in more detail later. But I wanted to address your statement above:

I see a NEP as an opportunity to discuss and flesh out an idea, and I certainly hope that there's no reason for alarm.

I do not expect to know whether this is a good idea before discussions conclude, so I appreciate your feedback. If we cannot find good support for the idea, with very specific benefits, it should simply be dropped.

But, I think there's a lot to learn from the conversation in the meantime w.r.t. exactly how streamlined people want NumPy to be, how core functionality can perhaps be strengthened by becoming a customer of our own API, how to optimally maintain sub-components, etc.

Best regards,
Stéfan
Re: [Numpy-discussion] Splitting MaskedArray into a separate package
On Wed, May 23, 2018 at 1:03 PM, Stefan van der Walt wrote:
> On Wed, 23 May 2018 12:29:32 -0700, Ralf Gommers wrote:
> > >> * Compatibility: MaskedArray objects, being subclasses of `ndarrays`,
> > >>   often cause complications when being used with other packages.
> > >>   Fixing these issues is outside the scope of NumPy development.
> >
> > Hmm, I wouldn't say it's out of scope at all. Currently it's simply part of
> > numpy.
>
> That is currently the situation, yes. I think this was meant more as
> "we'd preferably not like to think about MaskedArrays any differently
> than we do about other external packages, such as dask". I.e., not
> support specific hacks to make it work.
>
> > You're missing an important step I think. You're proposing to deprecate
> > MaskedArray completely (or not?). IIRC this has not been decided or
> > seriously discussed before.
>
> Good point, which certainly needs to be discussed. My thought was to
> move it out into a separate package that could be maintained more in the
> spirit of a scikit by people who care deeply about its functionality.

That would be good in principle, but it's only possible that way once the specific hacks you refer to above are removed. As long as MaskedArray depends on implementation details of ndarray, evolving them in lock-step will be necessary. And that is much easier when they're in the same package.

Regarding whether a split-off package will actually be developed, I think that depends on having at least one champion for it stepping up. If we just move it over into github.com/numpy/maskedarray, I think it will get less rather than more attention.

Cheers,
Ralf
Re: [Numpy-discussion] Splitting MaskedArray into a separate package
On 2018/05/23 9:06 AM, Matti Picus wrote:
> This NEP proposes a deprecation pathway through which MaskedArrays would
> still be accessible to users, but no longer as part of the core package.
>
> Any thoughts?

I understand at least some of the motivation and potential advantages, but as it stands, I find this NEP highly alarming. Masked arrays are critical to my numpy usage, and I suspect they are critical for many other use cases as well. In fact, I would prefer that a high priority for major numpy development be the more complete integration of masked array capabilities into numpy, not their removal to a separate package. I was unhappy to see the effort in that direction a few years ago being killed. I didn't agree with every design decision, but overall I thought it was going in the right direction.

Bad or missing values (and situations where one wants to use a mask to operate on a subset of an array) are found in many domains of real life; do you really want python users in those domains to have to fall back on Matlab-style reliance on nans and/or manual mask manipulations, as the new maskedarray package is sidelined?

Or is there any realistic prospect for maintenance and improvement of the package after it is separated out? Or of mask/missing value handling being integrated into numpy? Is the latter option on the table in any form, or is it DOA?

Side question: does your proposed purification of numpy include elimination of linalg and random? Based on the criteria in the NEP, I would expect it does; so maybe you should have a more ambitious NEP, and do the purification all in one step as a numpy version 2.0. (Surely if masked arrays are purged, the matrix class should be booted out at the same time.)

Eric
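For readers weighing the three approaches named above (NaNs, manual mask manipulation, and `numpy.ma`), a short side-by-side sketch may help. The data values are made up for illustration; the functions used are existing NumPy calls.

```python
import numpy as np

# Hypothetical measurements; NaN marks a missing sample.
data = np.array([3.0, np.nan, 5.0, 7.0])

# 1. NaN-based: every reduction needs its nan-aware variant.
nan_mean = np.nanmean(data)

# 2. Manual mask: the caller carries a boolean array alongside the data.
valid = ~np.isnan(data)
manual_mean = data[valid].mean()

# 3. numpy.ma: the mask travels with the array and propagates
#    through arithmetic without extra bookkeeping.
m = np.ma.masked_invalid(data)
ma_mean = m.mean()
doubled = m * 2
doubled_mask = np.ma.getmaskarray(doubled)

# Integer data cannot hold NaN, but it can still be masked
# without being converted to float.
counts = np.ma.array([1, 2, 3], mask=[False, True, False])
total = counts.sum()
```

All three means agree here; the difference is in who has to remember the missing entries, and in the fact that the NaN approach is unavailable for integer or boolean data.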
Re: [Numpy-discussion] Splitting MaskedArray into a separate package
On Wed, 23 May 2018 12:29:32 -0700, Ralf Gommers wrote:
> >> * Compatibility: MaskedArray objects, being subclasses of `ndarrays`,
> >>   often cause complications when being used with other packages.
> >>   Fixing these issues is outside the scope of NumPy development.
>
> Hmm, I wouldn't say it's out of scope at all. Currently it's simply part of
> numpy.

That is currently the situation, yes. I think this was meant more as "we'd preferably not like to think about MaskedArrays any differently than we do about other external packages, such as dask". I.e., not support specific hacks to make it work.

> You're missing an important step I think. You're proposing to deprecate
> MaskedArray completely (or not?). IIRC this has not been decided or
> seriously discussed before.

Good point, which certainly needs to be discussed. My thought was to move it out into a separate package that could be maintained more in the spirit of a scikit by people who care deeply about its functionality.

Best regards,
Stéfan
[Numpy-discussion] Splitting MaskedArray into a separate package
MaskedArray is a strange but useful creature. This NEP proposes to distribute it as a separate package under the NumPy brand.

As I understand the process, a proposed NEP should be first discussed here to gauge general acceptance, then after that the details should be discussed on the pull request itself https://github.com/numpy/numpy/pull/11146.

Here is the motivation section from the NEP:

MaskedArrays are a sub-class of the NumPy ``ndarray`` that adds masking capabilities, i.e. the ability to ignore or hide certain array values during computation.

While historically convenient to distribute this class inside of NumPy, improved packaging has made it possible to distribute it separately without difficulty.

Motivations for this move include:

* Focus: the NumPy package should strive to only include the `ndarray` object, and the essential utilities needed to manipulate such arrays.
* Complexity: the MaskedArray implementation is non-trivial, and imposes a significant maintenance burden.
* Compatibility: MaskedArray objects, being subclasses of `ndarrays`, often cause complications when being used with other packages. Fixing these issues is outside the scope of NumPy development.

This NEP proposes a deprecation pathway through which MaskedArrays would still be accessible to users, but no longer as part of the core package.

Any thoughts?
Matti and Stefan
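As background for the motivation section, the following sketch shows what "ignore or hide certain array values" looks like with the existing `numpy.ma` module, and one way the subclass relationship can cause the kind of complication the Compatibility point refers to. The readings and the `-999.0` sentinel are assumptions made up for the example; it is not taken from the NEP.

```python
import numpy as np

# Hypothetical sensor readings; -999.0 is an assumed "bad value" sentinel.
readings = np.array([1.0, 2.0, -999.0, 4.0])

# Mask the sentinel so it is ignored during computation.
masked = np.ma.masked_values(readings, -999.0)
mean_valid = masked.mean()  # averages only the three unmasked values

# MaskedArray is an ndarray subclass, so it passes isinstance checks...
is_subclass = isinstance(masked, np.ndarray)

# ...but code that coerces its input with np.asarray gets the raw data
# back, sentinel included, and the mask is silently lost.
plain = np.asarray(masked)
mean_after_coercion = plain.mean()

# np.asanyarray lets subclasses through and keeps the mask.
kept = np.asanyarray(masked)
```

Whether a third-party function behaves like the `np.asarray` case or the `np.asanyarray` case depends on how that function was written, which is why results with masked inputs can differ from package to package.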