On 05/10/2012 11:38 AM, Dag Sverre Seljebotn wrote: > On 05/10/2012 10:40 AM, Charles R Harris wrote: >> >> >> On Thu, May 10, 2012 at 1:10 AM, Dag Sverre Seljebotn >> <d.s.seljeb...@astro.uio.no<mailto:d.s.seljeb...@astro.uio.no>> wrote: >> >> On 05/10/2012 06:18 AM, Charles R Harris wrote: >> > >> > >> > On Wed, May 9, 2012 at 9:54 PM, Dag Sverre Seljebotn >> > <d.s.seljeb...@astro.uio.no<mailto:d.s.seljeb...@astro.uio.no> >> <mailto:d.s.seljeb...@astro.uio.no >> <mailto:d.s.seljeb...@astro.uio.no>>> wrote: >> > >> > Sorry everyone for being so dense and contaminating that >> other thread. >> > Here's a new thread where I can respond to Nathaniel's response. >> > >> > On 05/10/2012 01:08 AM, Nathaniel Smith wrote: >> > > Hi Dag, >> > > >> > > On Wed, May 9, 2012 at 8:44 PM, Dag Sverre Seljebotn >> > > <d.s.seljeb...@astro.uio.no<mailto:d.s.seljeb...@astro.uio.no> >> <mailto:d.s.seljeb...@astro.uio.no<mailto:d.s.seljeb...@astro.uio.no>>> >> > wrote: >> > >> I'm a heavy user of masks, which are used to make data NA in the >> > >> statistical sense. The setting is that we have to mask out the >> > radiation >> > >> coming from the Milky Way in full-sky images of the Cosmic >> Microwave >> > >> Background. There's data, but we know we can't trust it, so we >> > make it >> > >> NA. But we also do play around with different masks. >> > > >> > > Oh, this is great -- that means you're one of the users that I >> wasn't >> > > sure existed or not :-). Now I know! >> > > >> > >> Today we keep the mask in a seperate array, and to zero-mask we >> do >> > >> >> > >> masked_data = data * mask >> > >> >> > >> or >> > >> >> > >> masked_data = data.copy() >> > >> masked_data[mask == 0] = np.nan # soon np.NA >> > >> >> > >> depending on the circumstances. >> > >> >> > >> Honestly, API-wise, this is as good as its gets for us. Nice and >> > >> transparent, no new semantics to learn in the special case of >> masks. >> > >> >> > >> Now, this has performance issues: Lots of memory use, extra >> > transfers >> > >> over the memory bus. >> > > >> > > Right -- this is a case where (in the NA-overview terminology) >> masked >> > > storage+NA semantics would be useful. >> > > >> > >> BUT, NumPy has that problem all over the place, even for "x + y >> > + z"! >> > >> Solving it in the special case of masks, by making a new API, >> > seems a >> > >> bit myopic to me. >> > >> >> > >> IMO, that's much better solved at the fundamental level. As an >> > >> *illustration*: >> > >> >> > >> with np.lazy: >> > >> masked_data1 = data * mask1 >> > >> masked_data2 = data * (mask1 | mask2) >> > >> masked_data3 = (x + y + z) * (mask1& mask3) >> > >> >> > >> This would create three "generator arrays" that would >> zero-mask the >> > >> arrays (and perform the three-term addition...) upon request. >> > You could >> > >> slice the generator arrays as you wish, and by that slice the >> > data and >> > >> the mask in one operation. Obviously this could handle >> > NA-masking too. >> > >> >> > >> You can probably do this today with Theano and numexpr, and I >> think >> > >> Travis mentioned that "generator arrays" are on his radar for >> core >> > NumPy. >> > > >> > > Implementing this today would require some black magic hacks, >> because >> > > on entry/exit to the context manager you'd have to "reach up" >> > into the >> > > calling scope and replace all the ndarray's with LazyArrays and >> then >> > > vice-versa. This is actually totally possible: >> > > https://gist.github.com/2347382 >> > > but I'm not sure I'd call it *wise*. (You could probably avoid >> the >> > > truly horrible set_globals_dict part of that gist, though.) >> Might be >> > > fun to prototype, though... >> > >> > 1) My main point was just that I believe masked arrays is >> something that >> > to me feels immature, and that it is the kind of thing that >> should be >> > constructed from simpler primitives. And that NumPy should >> focus on >> > simple primitives. You could make it >> > >> > >> > I can't disagree, as I suggested the same as a possibility myself ;) >> > There is a lot of infrastructure now in numpy, but given the use >> cases >> > I'm tending towards the view that masked arrays should be left to >> > others, at least for the time being. The question is how to >> generalize >> > the infrastructure and what hooks to provide. I think just spending >> a >> > month or two pulling stuff out is counter productive, but >> evolving the >> > code is definitely needed. If you could familiarize yourself with >> what >> > is in there, something that seems largely neglected by the >> critics, and >> > make suggestions, that would be helpful. >> >> But how on earth can I make constructive criticisms about code when I >> don't know what the purpose of that code is supposed to be? >> >> >> What do you mean? I thought the purpose was quite clearly laid out in >> the NEP. But the implementation of that purpose required some >> infrastructure. The point, I suppose, is for you to suggest what would >> serve your use case. > > What would serve me? I use NumPy as a glorified "double*". And I'm > probably going to continue doing so. I'd be most happy by the status quo > of NumPy 1.6 + bugfixes. > > Main reason I'm getting involved is that the direction NumPy is going > has consequences for Cython and Cython users and the Cython code I write > myself. > > So my main comment is that I'd like (by whatever means necessary) to > have PyObject_TypeCheck fail whenever you can't sanely access > PyArray_DATA or in other ways use the good old NumPy C API (for whatever > reason, such as masks). > > If PyObject_TypeCheck fails, existing C extension modules are > forward-compatible, and so one doesn't have to go over old extension > module code to add in a check for the presence of masks (or whatever) > and raise an exception. > > Suggestion: Use the PyArrayObject struct for both ndarray and ndmasked, > so that ndarray has NULL mask fields. So ob_type makes them appear > different to Python and C extensions, but internally in NumPy you could > just check for a NULL mask field rather than the type, if you prefer that. > > (When I say I like status quo: There are things I'd *like* to see > changed in NumPy, but it's better phrased in terms of full > reimplementation rather than iterative refinement, which probably means > I should keep those thoughts to myself until I grow up. But until NumPy > changes on a pretty fundamental level, all I want is my glorified > "double*". I'm probably not a representative user.) > >> Are you saying you agree that the masking aspect should be banned (or at >> least not "core"), and asking me to look at code from that perspective >> and comment on how to get there while keeping as much as possible of the >> rest? Would that really be helpful? >> >> >> No, I don't agree that it should be banned, but your perspective seems >> to be that it should be, so I ask you to determine what is worth >> keeping. We can of course pull it all out and forget about the whole >> thing. But I'm getting tired of people saying do this or that without >> making technical suggestions that can be implemented, looking at the >> code, testing things, and providing feedback. At a minimum, I expect you >> to have an idea of how things *should* work and how to get there. > > Look, I'm used to coding; at least I used to spend a significant amount > of time on Cython, doing some pretty extensive refactorings of the code > base that touched pretty much all the code. > > And on the Cython list, we almost always keep details of implementation > second-order to the semantics we want. We do keep an eye on "how > difficult it would be to implement" while discussing semantics, and > sometimes have to say "we ideally want this long-term, what's the > stopgap solution we can use until we can make it work the way we want". > But it's nothing like what you suggest above. And we don't compromise > short-term if better semantics are available long-term (e.g., recent > thread on None-checking). > > But talking about process is getting old, I guess. > > (Of course there's plenty of other factors involved that could explain > why the Cython/SymPy/Sage/etc. development communities are so good, and > the NumPy one so...ehrmm..)
I'm very sorry. I should have said "lists"! I don't really know anything about the NumPy developer community. Anyway, I sense my best course of action will be to shut up. Dag _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion