On Sat, Jun 25, 2011 at 10:26 AM, Matthew Brett <matthew.br...@gmail.com> wrote:
> Hi,
>
> On Sat, Jun 25, 2011 at 5:05 PM, Nathaniel Smith <n...@pobox.com> wrote:
> > So obviously there's a lot of interest in this question, but I'm
> > losing track of all the different issues that've been raised in the
> > 150-post thread of doom. I think I'll find this easier if we start by
> > putting aside the questions about implementation and such and focus
> > for now on the *conceptual model* that we want. Maybe I'm not the
> > only one?
> >
> > So as far as I can tell, there are three different ways of thinking
> > about masked/missing data that people have been using in the other
> > thread:
> >
> > 1) Missingness is part of the data. Some data is missing, some isn't,
> > and this might change through computation on the data (just like some
> > data might change from a 3 to a 6 when we apply some transformation;
> > NA | True could be True instead of NA), but we can't just "decide"
> > that some data is no longer missing. It makes no sense to ask what
> > value is "really" there underneath the missingness. And it's critical
> > that we keep track of this through all operations, because otherwise
> > we may silently give incorrect answers -- exactly like it's critical
> > that we keep track of the difference between 3 and 6.
>
> So far I see the difference between 1) and 2) being that you cannot
> unmask. So, if you didn't even know you could unmask data, then it
> would not matter that 1) was being implemented by masks?
>
> > 2) All the data exists, at least in some sense, but we don't always
> > want to look at all of it. We lay a mask over our data to view and
> > manipulate only parts of it at a time. We might want to use different
> > masks at different times, mutate the mask as we go, etc. The most
> > important thing is to provide convenient ways to do complex
> > manipulations -- preserve masks through indexing operations, overlay
> > the mask from one array on top of another array, etc. When it comes
> > to other sorts of operations, we'd rather just silently skip the
> > masked values -- we know there are values that are masked, that's the
> > whole point: to work with the unmasked subset of the data, so if sum
> > returned NA then that would just be a stupid hassle.
>
> To clarify, you're proposing:
>
> a = np.sum(np.array([np.NA, np.NA]))
>
> 1) -> np.NA
> 2) -> 0.0
>
> ?
>
> > But that's just my opinion. I'm wondering if we can get any consensus
> > on which of these we actually *want* (or maybe we want some fourth
> > option!), and *then* we can try to figure out the best way to get
> > there? Pretty much any implementation strategy we've talked about
> > could work for any of these, but it's hard to decide between them if
> > we don't even know what we're trying to do...
>
> I agree it's good to separate the API from the implementation. I think
> the implementation is also important because I care about memory and
> possibly speed. But that is a separate problem from the API...

In a larger sense, we are seeking to add metadata to array elements and
have ufuncs that use that metadata, together with the element values, to
compute results. Off topic a bit, but it reminds me of the Burroughs 6600
that I once used. The word size on that machine was 48 bits, so it could
accommodate both 6 and 8 bit characters, and 3 bits of metadata were
appended to mark the type. So there was a machine with 51 bit words ;)
IIRC, Knuth was involved in the design and helped with the OS, which was
written in ALGOL...

Chuck
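For concreteness, here is a rough sketch of the two behaviours in the
np.sum(np.array([np.NA, np.NA])) example above, using pieces that already
ship in released NumPy -- NaN and numpy.ma standing in for the proposed
np.NA, which does not exist in any released version:

import numpy as np

a = np.array([np.nan, np.nan])

# Behaviour 1): missingness propagates through the reduction,
# the same way IEEE NaN does.
print(np.sum(a))      # -> nan ("the answer is unknown")

# Behaviour 2): missing values are skipped and the reduction runs over
# the visible subset, so an all-missing array sums to 0.0 (this is what
# np.nansum does in recent NumPy versions).
print(np.nansum(a))   # -> 0.0

# numpy.ma gives the mask-based reading of 2): the mask can be laid over
# the data, mutated, or cleared again ("unmasked") without destroying
# the underlying values.
m = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
print(m.sum())        # -> 4.0, the masked element is skipped
m.mask = False        # clear the mask; the hidden value comes back
print(m.sum())        # -> 6.0

The same split already shows up today as np.sum versus np.nansum: either
propagate the missingness, or reduce over only the values you can see.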