I think there's a fundamental flaw in my NumPy proposal, which needs
correction. I've thought of it as "polishing the access to the
extension type", so that after

    cdef object x = numpy.zeros([3,3])
    cdef numpy.ndarray y = x

y will behave efficiently when treated like a Python object.
There are problems with this way of thinking though: "print y.strides"
will not work, as strides is a pointer array in the struct, while
"print x.strides" will work, as it is a tuple (one has to write
"print (<object>y).strides", which is not going to fly with the users of
my NumPy project). On the other hand, one has to remember to write
"print y.shape" for tuple access but "y.dimensions[i]" for speedy,
non-Python access. (And this difference in behaviour comes entirely from
strides having a name clash while shape/dimensions do not.)
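For reference, at the Python level NumPy exposes both attributes as tuples; it is only in the PyArrayObject struct that they are raw pointer arrays. A minimal demonstration:

```python
import numpy

x = numpy.zeros((3, 3))
# At the Python level, both attributes are plain tuples:
print(x.shape)    # (3, 3)
print(x.strides)  # byte strides, e.g. (24, 8) for a C-contiguous float64 array
```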
So while my approach has been to make the y variable act "more like a
numpy array", I now think this is flawed. Optimizations through typing
and access to extension structs should probably instead be treated as
fundamentally different things, and the typical NumPy user shouldn't
deal with a reference to the extension struct even if typing for speed
is wanted.
I'll now propose a solution to this. It has been proposed before in a
different form (pxd shadowing etc.); I hope I will succeed better this time.
I think that what is wanted here is to /speed up how NumPy objects are
accessed/, and the extension type only comes into it peripherally. So
I'd like a new syntax for modifying the compile-time behaviour of how
objects are treated. It could look something like the following.
I'm calling the keyword "compiletimefeatures" for lack of a better word;
also, I use cython_ndarray to avoid namespace clashes (I have ideas for
allowing "ndarray" directly, but I'd like to leave that out of the
discussion for now).
Anyway, numpy.ndarray is an extension type like before, while
cython_ndarray is a new "type specifier" providing extra compile-time
optimizations to the variable that carries its type. numpy.pxd:
    cdef extern from "numpy/arrayobject.h":
        ctypedef class numpy.ndarray [object PyArrayObject]:
            cdef char *data
            cdef int nd
            cdef Py_intptr_t *dimensions
            cdef Py_intptr_t *strides
            ...

    compiletimefeatures cython_ndarray:
        def __applyfeatures__(self):
            if not isinstance(self, numpy.ndarray):
                raise TypeError(...)
        @property
        cdef shape(self):
            cdef numpy.ndarray imp = self
            return tuple(imp.dimensions[idx] for idx in range(imp.nd))
        # See my gsoc project for __getitem__, just replace "self" with "imp"
        ...
User code:

    cdef cython_ndarray y = x
    print y.shape
So, what happens:
* cython_ndarray is registered as a new "cdef-able" type.
* When assigning something to y, a runtime call to __applyfeatures__ is
automatically added (i.e. in principle the same thing that happens with
extension type declarations, but more specific and versatile).
* When y is operated on, it is first checked whether cython_ndarray
contains any compile-time optimizations. If so, they are called (using y
as the "self", like a class; however, "self" is a Python object!)
* End result: Rather than having to hope for some good luck in the
namespace resolution towards the extension type struct to provide the
speedup, one can be explicit about it.
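To make the intended runtime behaviour concrete, here is a pure-Python sketch of the dispatch. All names here (FakeArrayStruct, CythonNdarray) are hypothetical stand-ins for illustration, not proposed API, and the check happens at runtime rather than being inserted by the compiler as the proposal intends:

```python
class FakeArrayStruct:
    """Stand-in for the extension-type struct fields (nd, dimensions, ...)."""
    def __init__(self, dims):
        self.nd = len(dims)
        self.dimensions = list(dims)  # stands in for the Py_intptr_t* array


class CythonNdarray:
    """Emulates the proposed compiletimefeatures dispatch in plain Python."""
    def __init__(self, obj):
        # Corresponds to the automatic __applyfeatures__ call on assignment.
        if not isinstance(obj, FakeArrayStruct):
            raise TypeError("expected an array-like object")
        self._imp = obj

    @property
    def shape(self):
        # Tuple view built from the raw dimensions array.
        imp = self._imp
        return tuple(imp.dimensions[idx] for idx in range(imp.nd))


y = CythonNdarray(FakeArrayStruct((3, 3)))
print(y.shape)  # (3, 3)
```

The point of the sketch is only to show where the type check and the tuple construction would hook in; in the real proposal both would be generated at compile time rather than living on a wrapper object.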
Note: The "optimizations" provided above will only be faster if Cython
gets quite sophisticated unrolling/optimization. While I'd like to have
this, a more declarative approach might be more realistic. So something
like:
    compiletimefeatures cython_ndarray:
        carray_or_tuple shape(x): extension_type(x).shape

Sorry, I couldn't think of a good declarative syntax just now :-)
Note that this is instead of the plans to allow inlineable code in
extension type declarations. Extension type declarations wouldn't have
to be touched at all then.
Flame away :-) (Yes, I see that this could be confusing from an OOP
perspective. However, it is no worse than the current situation; one
cannot really override extension type struct items either.)
--
Dag Sverre
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev