On Mon, Jun 27, 2011 at 8:18 PM, Matthew Brett <matthew.br...@gmail.com> wrote:
> Hi,
>
> On Mon, Jun 27, 2011 at 5:53 PM, Charles R Harris
> <charlesr.har...@gmail.com> wrote:
> >
> >
> > On Mon, Jun 27, 2011 at 9:55 AM, Mark Wiebe <mwwi...@gmail.com> wrote:
> >>
> >> First I'd like to thank everyone for all the feedback you're providing,
> >> clearly this is an important topic to many people, and the discussion has
> >> helped clarify the ideas for me. I've renamed and updated the NEP, then
> >> placed it into the master NumPy repository so it has a more permanent home
> >> here:
> >> https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
> >> In the NEP, I've tried to address everything that was raised in the
> >> original thread and in Nathaniel's followup 'Concepts' thread. To deal with
> >> the issue of whether a mask is True or False for a missing value, I've
> >> removed the 'mask' attribute entirely, except for ufunc-like functions
> >> np.ismissing and np.isavail which return the two styles of masks. Here's a
> >> high level summary of how I'm thinking of the topic, and what I will
> >> implement:
> >>
> >> Missing Data Abstraction
> >> There appear to be two useful ways to think about missing data that are
> >> worth supporting.
> >> 1) Unknown yet existing data
> >> 2) Data that doesn't exist
> >> In 1), an NA value causes outputs to become NA except in a small number of
> >> exceptions such as boolean logic, and in 2), operations treat the data as if
> >> there were a smaller array without the NA values.
> >>
> >> Temporarily Ignoring Data
> >> In some cases, it is useful to flag data as NA temporarily, possibly in
> >> several different ways, for particular calculations or testing out different
> >> ways of throwing away outliers. This is independent of the missing data
> >> abstraction, still requiring a choice of 1) or 2) above.
> >>
> >> Implementation Techniques
> >> There are two mechanisms generally used to implement missing data
> >> abstractions,
> >> 1) An NA bit pattern
> >> 2) A mask
> >> I've described a design in the NEP which can include both techniques using
> >> the same interface. The mask approach is strictly more general than the NA
> >> bit pattern approach, except for a few things like the idea of supporting
> >> the dtype 'NA[f8,InfNan]' which you can read about in the NEP.
> >> My intention is to implement the mask-based design, and possibly also
> >> implement the NA bit pattern design, but if anything gets cut it will be the
> >> NA bit patterns.
> >
> > I have the impression that the mask-based design would be easier. Perhaps
> > you could do that one first and folks could try out the API and see how they
> > like it and discover whether the memory overhead is a problem in practice.
>
> That seems like a risky strategy to me, as the most likely outcome is
> that people worried about memory will avoid masked arrays because they
> know they use more memory. The memory usage is predictable and we
> won't learn any more about it from use. We most of us already know if
> we're having to optimize code for memory.
>
> You won't get complaints, you'll just lose a group of users, who will,
> I suspect, stick to NaNs, unsatisfactory as they are.
>

+1

- eat

>
> See you,
>
> Matthew
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
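
To make the two abstractions and the memory argument above concrete, here is a rough sketch in plain Python. It deliberately avoids the proposed NumPy API, which is still under discussion. The names sum_propagate and sum_skip, the sample values, and the one-byte-per-element mask layout are assumptions made purely for illustration, not anything taken from the NEP.

```python
import math
from array import array

# Technique 1 (NA bit pattern): NaN stands in for the NA pattern here.
# Hypothetical sample data; no extra storage is needed beyond the floats.
values = array("d", [1.0, 2.0, math.nan, 4.0])

# Technique 2 (mask): a separate array flags missing slots (1 = missing).
# Assumed layout: one mask byte per element. The value stored in a
# masked slot is arbitrary and must never be read.
masked_values = array("d", [1.0, 2.0, 0.0, 4.0])
mask = bytearray([0, 0, 1, 0])


def sum_propagate(data, mask):
    """Abstraction 1: the value exists but is unknown, so the sum is NA."""
    if any(mask):
        return math.nan
    return sum(data)


def sum_skip(data, mask):
    """Abstraction 2: the value does not exist, so sum the smaller array."""
    return sum(x for x, m in zip(data, mask) if not m)


# The same two abstractions expressed over the NaN "bit pattern" array.
bitpattern_propagate = sum(values)
bitpattern_skip = sum(x for x in values if not math.isnan(x))

# Memory cost of the mask relative to the data it covers.
data_bytes = masked_values.itemsize * len(masked_values)
mask_bytes = len(mask)
overhead = mask_bytes / data_bytes

print(sum_propagate(masked_values, mask), sum_skip(masked_values, mask))
print(data_bytes, mask_bytes, overhead)
```

Under these assumptions the mask costs one extra byte for every 8-byte float, a fixed 12.5% overhead that can be computed without running anything, which is the sense in which the memory usage is predictable.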