On Tue, Jun 28, 2011 at 2:41 PM, Eric Firing <efir...@hawaii.edu> wrote:
> On 06/28/2011 07:26 AM, Nathaniel Smith wrote:
> > On Tue, Jun 28, 2011 at 9:38 AM, Charles R Harris
> > <charlesr.har...@gmail.com> wrote:
> >> Nathaniel, an implementation using masks will look *exactly* like an
> >> implementation using na-dtypes from the user's point of view. Except that
> >> taking a masked view of an unmasked array allows ignoring values without
> >> destroying or copying the original data.
> >
> > Charles, I know that :-).
> >
> > But if that view thing is an advertised feature -- in fact, the key
> > selling point for the masking-based implementation, included
> > specifically to make a significant contingent of users happy -- then
> > it's certainly user-visible. And it will make other users unhappy,
> > like I said. That's life.
> >
> > But who cares? My main point is that implementing a missing data
> > solution and a separate masked array solution is probably less work
> > than implementing a single everything-to-everybody solution *anyway*,
> > *and* it might make both sets of users happier too. Notice that in my
> > proposal, there's really nothing there that isn't already in Mark's
> > NEP in some form or another, but in my version there's almost no
> > overlap between the two features. That's not because I was trying to
> > make them artificially different; it's because I tried to think of the
> > most natural ways to satisfy each set of use cases, and they're just
> > different.
>
> I think you are exaggerating some of the differences associated with the
> implementation, and ignoring one *key* difference: for integer types,
> the masked implementation can handle the full numeric range of the type,
> while the bit-pattern approach cannot.
>
> Balanced against that, the *key* advantages of the bit-pattern approach
> would seem to be the simplicity of using a single array, particularly
> for IO (including memmapping) and interfacing with extension code.
> Although I am a heavy user of masked arrays, I consider these
> bit-pattern advantages to be substantial and deserving of careful
> consideration--perhaps of more weight and planning than they have gotten
> so far.
>
> Datasets on disk--e.g. climatological data, numerical model output,
> etc.--typically do use reserved values as missing value flags, although
> occasionally one also finds separate mask arrays.
>
> One of the real frustrations of the present masked array is that there
> is no savez/load support. I could roll my own by using a convention
> like saving the mask of xxx as xxx__mask__, and then reversing the
> process in a modified load; but I haven't gotten around to doing it.
> Regardless of internal implementation, I hope that core support for
> missing values will be included in savez/load.
>

This sounds reasonable to me, and probably will require extending the file
format a bit.

-Mark

> Eric
>
> > -- Nathaniel
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
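
For reference, the roll-your-own convention Eric describes (saving the mask of xxx as xxx__mask__ and reversing the process on load) might look roughly like the sketch below. This is only an illustration of that idea, not anything NumPy provides: the helper names savez_masked and load_masked and the MASK_SUFFIX constant are hypothetical, and it assumes plain np.savez / np.load with no change to the file format.

```python
import io

import numpy as np

# Hypothetical naming convention taken from the message: the mask of
# array "xxx" is stored under the key "xxx__mask__".
MASK_SUFFIX = "__mask__"


def savez_masked(file, **arrays):
    """Save arrays with np.savez, writing each masked array's mask
    as a separate boolean array named <name> + MASK_SUFFIX."""
    out = {}
    for name, arr in arrays.items():
        if isinstance(arr, np.ma.MaskedArray):
            out[name] = arr.data  # underlying values, masked or not
            out[name + MASK_SUFFIX] = np.ma.getmaskarray(arr)
        else:
            out[name] = np.asarray(arr)
    np.savez(file, **out)


def load_masked(file):
    """Load an .npz archive, rebuilding a masked array wherever a
    matching <name> + MASK_SUFFIX entry is present."""
    result = {}
    with np.load(file) as npz:
        names = set(npz.files)
        for name in names:
            if name.endswith(MASK_SUFFIX):
                continue  # masks are consumed together with their data
            data = npz[name]
            mask_name = name + MASK_SUFFIX
            if mask_name in names:
                result[name] = np.ma.MaskedArray(data, mask=npz[mask_name])
            else:
                result[name] = data
    return result


# Round trip through an in-memory buffer instead of a file on disk.
buf = io.BytesIO()
xxx = np.ma.MaskedArray([1, 2, 3], mask=[False, True, False])
savez_masked(buf, xxx=xxx, plain=np.arange(3))
buf.seek(0)
loaded = load_masked(buf)
```

A drawback of this kind of convention is that any other reader of the file sees two unrelated arrays, which is presumably why core support in savez/load would need a format extension.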
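
The integer-range difference Eric points to can be illustrated with a small sketch. It assumes, purely for illustration, that a bit-pattern scheme reserves the minimum int8 value as the missing marker; the name NA_SENTINEL is hypothetical and is not taken from the NEP or from NumPy.

```python
import numpy as np

# Hypothetical reserved bit pattern: the smallest int8 value (-128).
NA_SENTINEL = np.iinfo(np.int8).min

values = np.array([-128, 0, 127], dtype=np.int8)

# Bit-pattern approach: every element equal to the sentinel reads as
# missing, so a genuine measurement of -128 cannot be stored.
sentinel_missing = values == NA_SENTINEL

# Mask approach: missingness lives in a separate boolean array, so all
# 256 int8 values remain available as data.
masked = np.ma.MaskedArray(values, mask=[False, True, False])
valid = masked.compressed()  # unmasked values only, including -128
```

The cost of the mask approach is the second array, which is the IO and extension-code complication raised in the same message.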