subclass

Eric Firing Sat, 10 Nov 2018 22:45:02 -0800

On 2018/11/10 12:39 PM, Stephan Hoyer wrote:

On Sat, Nov 10, 2018 at 2:22 PM Hameer Abbasi <einstein.edi...@gmail.com> wrote:


        To summarize, I think these are our options:

        1. Change the behavior of np.anyarray() to check for an
        __anyarray__() protocol. Change np.matrix.__anyarray__() to
        return a base numpy array (this is a minor backwards
        compatibility break, but probably for the best). Start issuing a
        FutureWarning for any MaskedArray operations that violate Liskov
        and add a skipna argument that in the future will default to
        skipna=False.

        2. Introduce a new coercion function, e.g., np.duckarray(). This
        is the easiest option because we don't need to clean up NumPy's
        existing ndarray subclasses.
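
A rough sketch of the kind of protocol-based coercion that both options describe may help. Neither np.anyarray() nor np.duckarray() is assumed to exist in NumPy here; the anyarray() function and the DuckArray class below are hypothetical stand-ins, and the only real NumPy call used is np.asarray().

```python
import numpy as np


def anyarray(obj):
    """Hypothetical coercion function, not part of NumPy.

    If the object's type defines __anyarray__, let the object decide
    what to return; otherwise fall back to np.asarray().
    """
    hook = getattr(type(obj), "__anyarray__", None)
    if hook is not None:
        return hook(obj)
    return np.asarray(obj)


class DuckArray:
    """Hypothetical duck array that opts in to the protocol."""

    def __init__(self, values):
        self.values = list(values)

    def __anyarray__(self):
        # A well-behaved duck array passes through unchanged.
        return self


duck = DuckArray([1, 2, 3])
passed_through = anyarray(duck)      # same DuckArray object
coerced = anyarray([1, 2, 3])        # plain list becomes an ndarray
```

Under option 1, a subclass such as np.matrix would return a base array from its hook, which is where the backwards-compatibility break comes from.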


    My vote is still for 1. I don’t have an issue for PyData/Sparse
    depending on recent-ish NumPy versions — It’ll need a lot of the
    recent protocols anyway, although I could be convinced otherwise if
    major package devs (scikits, SciPy, Dask) were to weigh in and say
    they’ll jump on it (which seems unlikely given SciPy’s policy to
    support old NumPy versions).

I agree that option (1) is fine for PyData/sparse. The bigger issue is that this change should be conditional on making breaking changes (at least raising FutureWarning for now) to np.ma.MaskedArray.

I don't know how people who currently use MaskedArray would feel about that. I would love to hear their thoughts.

Thank you. I am a user of masked arrays, and have been since pre-numpy days. I introduced their extensive use in matplotlib long ago. I have been a bit concerned, indeed, that all of the discussion of modifying masked arrays seems to be by people who don't actually use them explicitly (though they might be using them without knowing it via internal operations in matplotlib, or they might be quickly getting rid of them after they are yielded by netCDF4.Dataset()).

I think that those of us who do use masked arrays recognize that they are not perfect; they have some quirks and gotchas, and one has to be careful to use numpy.ma functions instead of numpy functions in most cases. But we use them because they have real advantages over the alternatives, which are using nans and/or manually tracking independent masks throughout calculations. These advantages are largely because masked values *don't* behave like nan, *don't* propagate. This is fundamental to the design, and motivated by real-life use cases.
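
To make the non-propagation point concrete, here is a minimal sketch comparing a nan-based reduction with the same reduction on a masked array. The sample values are invented for illustration; the calls (np.sum, np.ma.masked_invalid, MaskedArray.sum) are existing NumPy functions.

```python
import numpy as np

# Illustrative data with one missing value encoded as nan.
data = np.array([1.0, 2.0, np.nan, 4.0])

# With plain nan, the missing value propagates through the reduction.
nan_sum = np.sum(data)

# Masking the invalid entry excludes it from the reduction instead.
masked = np.ma.masked_invalid(data)
masked_sum = masked.sum()

# Elementwise arithmetic carries the mask along; the masked slot
# stays masked rather than contaminating its neighbours.
doubled = masked * 2

print(nan_sum, masked_sum, doubled)
```

The nan version needs an explicit opt-out (for example np.nansum) at every reduction, whereas the masked version skips missing values by default, which is the behavior the proposed skipna=False default would reverse.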

The proposal to add a skipna kwarg to MaskedArray looks to me like it is giving purity priority over practicality. It will force ma users to insert skipna kwargs all over the place--because the default will be contrary to the primary purposes of using masked arrays, in most cases. How many people will it actually benefit? How many people are being bitten, and how badly, by masked array behavior?

If there were a prospect of truly integrating missing/masked value handling into numpy, simplifying or phasing out numpy.ma, I would be delighted--I think it is the biggest single fundamental improvement that could be made, from the user's standpoint. I was sad to see Mark Wiebe's work in that direction come to grief.

If there are ways of gradually improving numpy.ma and its interoperability with the rest of numpy and with the proliferation of duck arrays, I'm all in favor--so long as they don't effectively wreck numpy.ma for its present intended purposes.


Eric


_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] asarray/anyarray; matrix/subclass
