Re: [Numpy-discussion] ndarray __getattr__ to perform __getitem__

2010-10-29 Thread Ian Stokes-Rees

 But wouldn't the performance hit only come when I use it in this way?
 __getattr__ is only called if the named attribute is *not* found (I
 guess it falls off the end of the case statement, or is the result of
 the attribute hash table miss).
 That's why I said that __getattr__ would perhaps work better.


So do you want me to try out an implementation and supply a patch?  If
so, where should I send the patch?

Ian
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ndarray __getattr__ to perform __getitem__

2010-10-29 Thread Pauli Virtanen
Hi,

Fri, 29 Oct 2010 05:59:07 -0400, Ian Stokes-Rees wrote:
 But wouldn't the performance hit only come when I use it in this way?
 __getattr__ is only called if the named attribute is *not* found (I
 guess it falls off the end of the case statement, or is the result of
 the attribute hash table miss).
 That's why I said that __getattr__ would perhaps work better.
 
 So do you want me to try out an implementation and supply a patch?  If
 so, where should I send the patch?

See here:

http://docs.scipy.org/doc/numpy/dev/gitwash/patching.html

-- 
Pauli Virtanen



Re: [Numpy-discussion] ndarray __getattr__ to perform __getitem__

2010-10-29 Thread Zachary Pincus
 But wouldn't the performance hit only come when I use it in this way?
 __getattr__ is only called if the named attribute is *not* found (I
 guess it falls off the end of the case statement, or is the result of
 the attribute hash table miss).
 That's why I said that __getattr__ would perhaps work better.


 So do you want me to try out an implementation and supply a patch?  If
 so, where should I send the patch?

 Ian

Note that there are various extant projects that I think attempt to  
provide similar functionality to what you're wanting (unless I badly  
misread your original email, in which case apologies):
http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes

You might want to check these out (or pitch in with those efforts)  
before starting up your own variant. (Though if your idea has more  
modest goals, then it might be a good complement to these more  
extensive/intrusive solutions.)

Zach


Re: [Numpy-discussion] ndarray __getattr__ to perform __getitem__

2010-10-29 Thread Ian Stokes-Rees

 Note that there are various extant projects that I think attempt to  
 provide similar functionality to what you're wanting (unless I badly  
 misread your original email, in which case apologies):
 http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes

Having looked into it more, I think recarray is probably what I want.  I
need to play with this and see how easy it is to convert ndarrays into
recarrays.

The main thing is that I'm looking for something *really* simple that
would, truly, just allow the conversion of:

myarray['myaxis']

to

myarray.myaxis

My suggestion is to define __getattr__ directly on the ndarray class. 
This is, TTBOMK, only called if an attribute is *not* found on the
object.  In this event, prior to throwing an AttributeError exception,
the object could check to see if the specified attribute exists as a
named axis/dimension of the multi-dimensional array, and if so, return
this.  Otherwise, carry on with the AttributeError exception.
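A minimal sketch of this proposal at the Python level, assuming an ndarray subclass rather than a change to the C ndarray type itself. The class name `FieldAttrArray` and the sample data are hypothetical, chosen to mirror the `genfromtxt` example earlier in the thread. Because only `__getattr__` is defined, ordinary attribute lookups are not routed through extra Python code; only misses reach the field fallback.

```python
import numpy as np


class FieldAttrArray(np.ndarray):
    """Hypothetical ndarray subclass: unknown attribute names fall back to fields."""

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal lookup (methods,
        # properties, instance dict) has already raised AttributeError.
        names = self.dtype.names
        if names is not None and name in names:
            return self[name]
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute {name!r}"
        )


# Made-up structured data resembling the results.dat example.
r = np.array(
    [(1, 0.5, 3, "OK"), (2, 1.5, 4, "BAD"), (3, 2.5, 5, "OK")],
    dtype=[("a", "i8"), ("b", "f8"), ("c", "i8"), ("d", "U20")],
).view(FieldAttrArray)

# Attribute-style field access, as in scatter(r[r.d == 'OK'].a, ...)
ok = r[r.d == "OK"]
print(ok.a, ok.b)
```

Field names that collide with existing ndarray attributes (for example `size` or `mean`) are still shadowed under this scheme, so such fields would need the usual `r['size']` form.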

I've spent an hour looking at the numpy code (my first time), and I
don't see any obvious way to do this, since ndarray is (AFAICT) a pure-C
object with auto-generated wrappers, which seems to preclude (easily)
adding a __getattr__(self,attr) method to the class.  If someone can
point me in the right direction, I'll keep looking into this, otherwise
I'm giving up and will just try and use recarray.

Ian


Re: [Numpy-discussion] ndarray __getattr__ to perform __getitem__

2010-10-29 Thread Pierre GM

On Oct 29, 2010, at 3:58 PM, Ian Stokes-Rees wrote:

 
 Note that there are various extant projects that I think attempt to  
 provide similar functionality to what you're wanting (unless I badly  
 misread your original email, in which case apologies):
 http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes
 
 Having looked into it more, I think recarray is probably what I want.  I
 need to play with this and see how easy it is to convert ndarrays into
 recarrays.

your_rec=your_array.view(np.recarray)
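For an array that already has a structured dtype (named fields), that one-liner can be exercised as below. The sample data is made up for illustration. `view` does not copy, so `rec` and `arr` share the same buffer; a plain numeric array without named fields gains no attribute-style fields this way.

```python
import numpy as np

# A plain structured ndarray; field access needs square brackets.
arr = np.array(
    [(1, 0.5, "OK"), (2, 1.5, "BAD")],
    dtype=[("a", "i8"), ("b", "f8"), ("d", "U20")],
)

# Reinterpret the same memory as a recarray; no data is copied.
rec = arr.view(np.recarray)

print(rec.a)                 # same values as arr["a"]
print(rec[rec.d == "OK"].b)  # same values as arr[arr["d"] == "OK"]["b"]
```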


 The main thing is that I'm looking for something *really* simple that
 would, truly, just allow the conversion of:
 
 myarray['myaxis']
 
 to
 
 myarray.myaxis

Attribute-style access is IMHO the only advantage of recarrays over regular 
structured arrays. Note that it comes at a cost...


 My suggestion is to define __getattr__ directly on the ndarray class. 
 This is, TTBOMK, only called if an attribute is *not* found on the
 object.  In this event, prior to throwing an AttributeError exception,
 the object could check to see if the specified attribute exists as a
 named axis/dimension of the multi-dimensional array, and if so, return
 this.  Otherwise, carry on with the AttributeError exception.
 
 I've spent an hour looking at the numpy code (my first time), and I
 don't see any obvious way to do this, since ndarray is (AFAICT) a pure-C
 object with auto-generated wrappers, which seems to preclude (easily)
 adding a __getattr__(self,attr) method to the class.  If someone can
 point me in the right direction, I'll keep looking into this, otherwise
 I'm giving up and will just try and use recarray.

Indeed. Check how recarray does it, for example.




Re: [Numpy-discussion] ndarray __getattr__ to perform __getitem__

2010-10-29 Thread Pauli Virtanen
Fri, 29 Oct 2010 09:58:33 -0400, Ian Stokes-Rees wrote:
[clip]
 I've spent an hour looking at the numpy code (my first time), and I
 don't see any obvious way to do this, since ndarray is (AFAICT) a pure-C
 object with auto-generated wrappers, which seems to preclude (easily)
 adding a __getattr__(self,attr) method to the class.  If someone can
 point me in the right direction, I'll keep looking into this, otherwise
 I'm giving up and will just try and use recarray.

http://docs.python.org/c-api/typeobj.html#tp_getattro

The Python documentation doesn't seem to say very clearly whether method/attribute 
slots are consulted before falling back to tp_getattro. If tp_getattro is 
consulted first, then implementing it will lead to a performance hit.

I'd probably be +0 on providing recarray-like functionality on ordinary 
ndarrays, if it can be done without (significant) performance issues.

-- 
Pauli Virtanen



Re: [Numpy-discussion] ndarray __getattr__ to perform __getitem__

2010-10-28 Thread Robert Kern
On Thu, Oct 28, 2010 at 15:17, Ian Stokes-Rees
ijsto...@hkl.hms.harvard.edu wrote:
 I have an ndarray with named dimensions.  I find myself writing some
 fairly laborious code with lots of square brackets and quotes.  It seems
 like it wouldn't be such a big deal to overload __getattribute__ so
 instead of doing:

 r = genfromtxt('results.dat',dtype=[('a','int'), ('b', 'f8'),
 ('c','int'), ('d', 'a20')])
 scatter(r[r['d'] == 'OK']['a'], r[r['d'] == 'OK']['b'])

 I could do:

 scatter(r[r.d == 'OK'].a, r[r.d == 'OK'].b)

 which is really a lot clearer.  Is something like this already possible
 somehow?

See recarray which uses __getattribute__.

 Is there some reason not to map __getattr__ to __getitem__?

Using __getattribute__ tends to slow down almost all operations on the
array substantially. Perhaps __getattr__ would work better, but all of
the methods and attributes would mask the fields. If you can find a
better solution that doesn't have such an impact on normal
performance, we'd be happy to hear it.
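A rough way to observe the overhead described here is to time an ordinary attribute such as `shape` on a plain structured array and on a recarray view of it, since recarray routes every lookup through a Python-level `__getattribute__`. This is a sketch only: timings depend on the machine, Python, and NumPy version, so no particular numbers are claimed.

```python
import timeit

import numpy as np

# Same data, two types: a plain structured ndarray and a recarray view of it.
plain = np.zeros(3, dtype=[("a", "i8"), ("b", "f8")])
rec = plain.view(np.recarray)

n = 200_000  # number of attribute lookups to time
t_plain = timeit.timeit("plain.shape", globals={"plain": plain}, number=n)
t_rec = timeit.timeit("rec.shape", globals={"rec": rec}, number=n)

print(f"ndarray.shape:  {t_plain:.4f} s for {n} lookups")
print(f"recarray.shape: {t_rec:.4f} s for {n} lookups")
```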

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] ndarray __getattr__ to perform __getitem__

2010-10-28 Thread Ian Stokes-Rees


On 10/28/10 5:29 PM, Robert Kern wrote:
 On Thu, Oct 28, 2010 at 15:17, Ian Stokes-Rees
 ijsto...@hkl.hms.harvard.edu wrote:
 I have an ndarray with named dimensions.  I find myself writing some
 fairly laborious code with lots of square brackets and quotes.  It seems
 like it wouldn't be such a big deal to overload __getattribute__ so
 instead of doing:

 r = genfromtxt('results.dat',dtype=[('a','int'), ('b', 'f8'),
 ('c','int'), ('d', 'a20')])
 scatter(r[r['d'] == 'OK']['a'], r[r['d'] == 'OK']['b'])

 I could do:

 scatter(r[r.d == 'OK'].a, r[r.d == 'OK'].b)

 which is really a lot clearer.  Is something like this already possible
 somehow?
 See recarray which uses __getattribute__.

Thanks -- I'll look into it.

 Is there some reason not to map __getattr__ to __getitem__?
 Using __getattribute__ tends to slow down almost all operations on the
 array substantially. Perhaps __getattr__ would work better, but all of
 the methods and attributes would mask the fields. If you can find a
 better solution that doesn't have such an impact on normal
 performance, we'd be happy to hear it.

But wouldn't the performance hit only come when I use it in this way? 
__getattr__ is only called if the named attribute is *not* found (I
guess it falls off the end of the case statement, or is the result of
the attribute hash table miss).

So the proviso is this shortcut only works if the field names are
distinct from any methods or attributes on the ndarray object (or its
sub-classes).
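The proviso can be checked up front. The helper below, `masked_field_names` (a hypothetical name), lists the fields of a dtype that a class attribute would shadow under a `__getattr__` fallback; its `cls` parameter covers sub-classes, which may add names of their own.

```python
import numpy as np


def masked_field_names(dtype, cls=np.ndarray):
    """Return field names that an attribute of `cls` would shadow.

    Hypothetical helper: with a __getattr__-based fallback, these fields
    could only be reached with the usual arr["name"] syntax.
    """
    if dtype.names is None:  # not a structured dtype: no fields at all
        return []
    return [name for name in dtype.names if hasattr(cls, name)]


dt = np.dtype([("a", "i8"), ("size", "f8"), ("mean", "f8"), ("d", "U20")])
print(masked_field_names(dt))
```

For example, `masked_field_names(np.dtype([('field', 'i8')]), np.recarray)` reports `field`, because recarray defines a `field` method that plain ndarray does not.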

You've gotta admit that the readability of the code goes up *a lot* with
the alternative I'm proposing.

Ian

-- 
Ian Stokes-Rees, PhD                W: http://portal.nebiogrid.org
ijsto...@hkl.hms.harvard.edu        T: +1.617.432.5608 x75
NEBioGrid, Harvard Medical School   C: +1.617.331.5993



Re: [Numpy-discussion] ndarray __getattr__ to perform __getitem__

2010-10-28 Thread Robert Kern
On Thu, Oct 28, 2010 at 16:37, Ian Stokes-Rees
ijsto...@hkl.hms.harvard.edu wrote:


 On 10/28/10 5:29 PM, Robert Kern wrote:
 On Thu, Oct 28, 2010 at 15:17, Ian Stokes-Rees
 ijsto...@hkl.hms.harvard.edu wrote:
 I have an ndarray with named dimensions.  I find myself writing some
 fairly laborious code with lots of square brackets and quotes.  It seems
 like it wouldn't be such a big deal to overload __getattribute__ so
 instead of doing:

 r = genfromtxt('results.dat',dtype=[('a','int'), ('b', 'f8'),
 ('c','int'), ('d', 'a20')])
 scatter(r[r['d'] == 'OK']['a'], r[r['d'] == 'OK']['b'])

 I could do:

 scatter(r[r.d == 'OK'].a, r[r.d == 'OK'].b)

 which is really a lot clearer.  Is something like this already possible
 somehow?
 See recarray which uses __getattribute__.

 Thanks -- I'll look into it.

 Is there some reason not to map __getattr__ to __getitem__?
 Using __getattribute__ tends to slow down almost all operations on the
 array substantially. Perhaps __getattr__ would work better, but all of
 the methods and attributes would mask the fields. If you can find a
 better solution that doesn't have such an impact on normal
 performance, we'd be happy to hear it.

 But wouldn't the performance hit only come when I use it in this way?
 __getattr__ is only called if the named attribute is *not* found (I
 guess it falls off the end of the case statement, or is the result of
 the attribute hash table miss).

That's why I said that __getattr__ would perhaps work better.

 So the proviso is this shortcut only works if the field names are
 distinct from any methods or attributes on the ndarray object (or its
 sub-classes).

Right.

-- 
Robert Kern
