Re: [Numpy-discussion] ndarray __getattr__ to perform __getitem__
> > But wouldn't the performance hit only come when I use it in this way? __getattr__ is only called if the named attribute is *not* found (I guess it falls off the end of the case statement, or is the result of the attribute hash table miss).
>
> That's why I said that __getattr__ would perhaps work better.

So do you want me to try out an implementation and supply a patch? If so, where should I send the patch?

Ian

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] ndarray __getattr__ to perform __getitem__
Hi,

Fri, 29 Oct 2010 05:59:07 -0400, Ian Stokes-Rees wrote:
> > > But wouldn't the performance hit only come when I use it in this way? __getattr__ is only called if the named attribute is *not* found (I guess it falls off the end of the case statement, or is the result of the attribute hash table miss).
> >
> > That's why I said that __getattr__ would perhaps work better.
>
> So do you want me to try out an implementation and supply a patch? If so, where should I send the patch?

See here: http://docs.scipy.org/doc/numpy/dev/gitwash/patching.html

--
Pauli Virtanen
Re: [Numpy-discussion] ndarray __getattr__ to perform __getitem__
> > > But wouldn't the performance hit only come when I use it in this way? __getattr__ is only called if the named attribute is *not* found (I guess it falls off the end of the case statement, or is the result of the attribute hash table miss).
> >
> > That's why I said that __getattr__ would perhaps work better.
>
> So do you want me to try out an implementation and supply a patch? If so, where should I send the patch?
>
> Ian

Note that there are various extant projects that I think attempt to provide similar functionality to what you're wanting (unless I badly misread your original email, in which case apologies):
http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes

You might want to check these out (or pitch in with those efforts) before starting up your own variant. (Though if your idea has more modest goals, then it might be a good complement to these more extensive/intrusive solutions.)

Zach
Re: [Numpy-discussion] ndarray __getattr__ to perform __getitem__
> Note that there are various extant projects that I think attempt to provide similar functionality to what you're wanting (unless I badly misread your original email, in which case apologies):
> http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes

Having looked into it more, I think recarray is probably what I want. I need to play with this and see how easy it is to convert ndarrays into recarrays.

The main thing is that I'm looking for something *really* simple that would, truly, just allow the conversion of:

    myarray['myaxis']

to

    myarray.myaxis

My suggestion is to define __getattr__ directly on the ndarray class. This is, TTBOMK, only called if an attribute is *not* found on the object. In this event, prior to throwing an AttributeError exception, the object could check to see if the specified attribute exists as a named axis/dimension of the multi-dimensional array, and if so, return this. Otherwise, carry on with the AttributeError exception.

I've spent an hour looking at the numpy code (my first time), and I don't see any obvious way to do this, since ndarray is (AFAICT) a pure-C object with auto-generated wrappers, which seems to preclude (easily) adding a __getattr__(self, attr) method to the class. If someone can point me in the right direction, I'll keep looking into this, otherwise I'm giving up and will just try and use recarray.

Ian
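For illustration only, the fallback described above can be sketched at the Python level with a subclass, rather than by patching the C-level ndarray type that the message is asking about. The class name FieldAttrArray and the sample data are hypothetical and not part of NumPy or of this thread.

```python
import numpy as np


class FieldAttrArray(np.ndarray):
    """Hypothetical subclass: resolve missing attributes as field names."""

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal attribute lookup fails,
        # so existing ndarray methods and attributes never reach this point.
        names = self.dtype.names
        if names is not None and name in names:
            return self[name]
        raise AttributeError(name)


# Made-up structured data, viewed as the subclass (no copy is made).
r = np.array(
    [(1, 2.5, b'OK'), (2, 3.5, b'BAD')],
    dtype=[('a', 'i8'), ('b', 'f8'), ('d', 'S20')],
).view(FieldAttrArray)

ok_rows = r[r.d == b'OK']   # same as r[r['d'] == b'OK']
a_values = ok_rows.a        # same as ok_rows['a']
```

A name that is neither a field nor a regular attribute still raises AttributeError, so hasattr() keeps working as usual.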
Re: [Numpy-discussion] ndarray __getattr__ to perform __getitem__
On Oct 29, 2010, at 3:58 PM, Ian Stokes-Rees wrote:
> > Note that there are various extant projects that I think attempt to provide similar functionality to what you're wanting (unless I badly misread your original email, in which case apologies):
> > http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes
>
> Having looked into it more, I think recarray is probably what I want. I need to play with this and see how easy it is to convert ndarrays into recarrays.

    your_rec = your_array.view(np.recarray)

> The main thing is that I'm looking for something *really* simple that would, truly, just allow the conversion of:
>
>     myarray['myaxis']
>
> to
>
>     myarray.myaxis

Attribute-style access is IMHO the only advantage of recarrays over regular structured arrays. Note that it comes at a cost...

> My suggestion is to define __getattr__ directly on the ndarray class. This is, TTBOMK, only called if an attribute is *not* found on the object. In this event, prior to throwing an AttributeError exception, the object could check to see if the specified attribute exists as a named axis/dimension of the multi-dimensional array, and if so, return this. Otherwise, carry on with the AttributeError exception.
>
> I've spent an hour looking at the numpy code (my first time), and I don't see any obvious way to do this, since ndarray is (AFAICT) a pure-C object with auto-generated wrappers, which seems to preclude (easily) adding a __getattr__(self, attr) method to the class. If someone can point me in the right direction, I'll keep looking into this, otherwise I'm giving up and will just try and use recarray.

Indeed. Check how recarray does it, for example.
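As a rough sketch of the view-based conversion suggested above: the array contents below are made up and stand in for the results.dat file from the original question, which is not available here.

```python
import numpy as np

# Made-up rows with the same field layout as the original question.
r = np.array(
    [(1, 0.5, 7, b'OK'), (2, 1.5, 8, b'FAIL'), (3, 2.5, 9, b'OK')],
    dtype=[('a', 'i8'), ('b', 'f8'), ('c', 'i8'), ('d', 'S20')],
)

# view() reinterprets the same memory as a recarray; no data is copied.
rec = r.view(np.recarray)

ok = rec[rec.d == b'OK']   # attribute access instead of rec['d']
xs, ys = ok.a, ok.b        # the two columns passed to scatter() in the question
```

Because the recarray is a view, writes through rec are visible in r as well.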
Re: [Numpy-discussion] ndarray __getattr__ to perform __getitem__
Fri, 29 Oct 2010 09:58:33 -0400, Ian Stokes-Rees wrote:
[clip]
> I've spent an hour looking at the numpy code (my first time), and I don't see any obvious way to do this, since ndarray is (AFAICT) a pure-C object with auto-generated wrappers, which seems to preclude (easily) adding a __getattr__(self, attr) method to the class. If someone can point me in the right direction, I'll keep looking into this, otherwise I'm giving up and will just try and use recarray.

http://docs.python.org/c-api/typeobj.html#tp_getattro

The Python documentation doesn't seem to say very clearly whether method/attribute slots are consulted before falling back to tp_getattro. If tp_getattro is consulted first, then implementing it will lead to a performance hit.

I'd probably be +0 on providing recarray-like functionality on ordinary ndarrays, if it can be done without (significant) performance issues.

--
Pauli Virtanen
Re: [Numpy-discussion] ndarray __getattr__ to perform __getitem__
On Thu, Oct 28, 2010 at 15:17, Ian Stokes-Rees ijsto...@hkl.hms.harvard.edu wrote:
> I have an ndarray with named dimensions. I find myself writing some fairly laborious code with lots of square brackets and quotes. It seems like it wouldn't be such a big deal to overload __getattribute__ so instead of doing:
>
>     r = genfromtxt('results.dat',dtype=[('a','int'), ('b', 'f8'), ('c','int'), ('d', 'a20')])
>     scatter(r[r['d'] == 'OK']['a'], r[r['d'] == 'OK']['b'])
>
> I could do:
>
>     scatter(r[r.d == 'OK'].a, r[r.d == 'OK'].b)
>
> which is really a lot clearer. Is something like this already possible somehow?

See recarray which uses __getattribute__.

> Is there some reason not to map __getattr__ to __getitem__?

Using __getattribute__ tends to slow down almost all operations on the array substantially. Perhaps __getattr__ would work better, but all of the methods and attributes would mask the fields. If you can find a better solution that doesn't have such an impact on normal performance, we'd be happy to hear it.

--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth.
  -- Umberto Eco
Re: [Numpy-discussion] ndarray __getattr__ to perform __getitem__
On 10/28/10 5:29 PM, Robert Kern wrote:
> On Thu, Oct 28, 2010 at 15:17, Ian Stokes-Rees ijsto...@hkl.hms.harvard.edu wrote:
> > I have an ndarray with named dimensions. I find myself writing some fairly laborious code with lots of square brackets and quotes. It seems like it wouldn't be such a big deal to overload __getattribute__ so instead of doing:
> >
> >     r = genfromtxt('results.dat',dtype=[('a','int'), ('b', 'f8'), ('c','int'), ('d', 'a20')])
> >     scatter(r[r['d'] == 'OK']['a'], r[r['d'] == 'OK']['b'])
> >
> > I could do:
> >
> >     scatter(r[r.d == 'OK'].a, r[r.d == 'OK'].b)
> >
> > which is really a lot clearer. Is something like this already possible somehow?
>
> See recarray which uses __getattribute__.

Thanks -- I'll look into it.

> > Is there some reason not to map __getattr__ to __getitem__?
>
> Using __getattribute__ tends to slow down almost all operations on the array substantially. Perhaps __getattr__ would work better, but all of the methods and attributes would mask the fields. If you can find a better solution that doesn't have such an impact on normal performance, we'd be happy to hear it.

But wouldn't the performance hit only come when I use it in this way? __getattr__ is only called if the named attribute is *not* found (I guess it falls off the end of the case statement, or is the result of the attribute hash table miss).

So the proviso is this shortcut only works if the field names are distinct from any methods or attributes on the ndarray object (or its sub-classes). You've gotta admit that the readability of the code goes up *a lot* with the alternative I'm proposing.

Ian

--
Ian Stokes-Rees, PhD                 W: http://portal.nebiogrid.org
ijsto...@hkl.hms.harvard.edu         T: +1.617.432.5608 x75
NEBioGrid, Harvard Medical School    C: +1.617.331.5993
Re: [Numpy-discussion] ndarray __getattr__ to perform __getitem__
On Thu, Oct 28, 2010 at 16:37, Ian Stokes-Rees ijsto...@hkl.hms.harvard.edu wrote:
> On 10/28/10 5:29 PM, Robert Kern wrote:
> > On Thu, Oct 28, 2010 at 15:17, Ian Stokes-Rees ijsto...@hkl.hms.harvard.edu wrote:
> > > I have an ndarray with named dimensions. I find myself writing some fairly laborious code with lots of square brackets and quotes. It seems like it wouldn't be such a big deal to overload __getattribute__ so instead of doing:
> > >
> > >     r = genfromtxt('results.dat',dtype=[('a','int'), ('b', 'f8'), ('c','int'), ('d', 'a20')])
> > >     scatter(r[r['d'] == 'OK']['a'], r[r['d'] == 'OK']['b'])
> > >
> > > I could do:
> > >
> > >     scatter(r[r.d == 'OK'].a, r[r.d == 'OK'].b)
> > >
> > > which is really a lot clearer. Is something like this already possible somehow?
> >
> > See recarray which uses __getattribute__.
>
> Thanks -- I'll look into it.
>
> > > Is there some reason not to map __getattr__ to __getitem__?
> >
> > Using __getattribute__ tends to slow down almost all operations on the array substantially. Perhaps __getattr__ would work better, but all of the methods and attributes would mask the fields. If you can find a better solution that doesn't have such an impact on normal performance, we'd be happy to hear it.
>
> But wouldn't the performance hit only come when I use it in this way? __getattr__ is only called if the named attribute is *not* found (I guess it falls off the end of the case statement, or is the result of the attribute hash table miss).

That's why I said that __getattr__ would perhaps work better.

> So the proviso is this shortcut only works if the field names are distinct from any methods or attributes on the ndarray object (or its sub-classes).

Right.

--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth.
  -- Umberto Eco
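The proviso agreed on above can be seen with the existing recarray class, which also looks up regular attributes before fields. In this small sketch the field name 'shape' is chosen deliberately to collide with an ndarray attribute, and the data is made up.

```python
import numpy as np

rec = np.array(
    [(10, 1.0), (20, 2.0)],
    dtype=[('shape', 'i8'), ('b', 'f8')],
).view(np.recarray)

masked = rec.shape       # the ndarray attribute wins: the array's shape tuple
field = rec['shape']     # item access still reaches the field
unmasked = rec.b         # no attribute named 'b', so the field is returned
```

Fields whose names collide with methods or attributes therefore remain reachable only through square-bracket access.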