I think there's a fundamental flaw in my NumPy proposal, which needs
correction. I've thought of it as "polishing the access to the
extension type", so that after

    cdef object x = numpy.zeros([3,3])
    cdef numpy.ndarray y = x

y will behave efficiently when treated like a Python object.
There are problems with this way of thinking though: "print y.strides"
will not work, as strides is a pointer array in the struct, while
"print x.strides" will work, as it is a tuple (one has to write
"print (<object>y).strides", which is not going to fly with the users of
my NumPy project). On the other hand, one has to remember to write
"print y.shape" for tuple access but "y.dimensions[i]" for speedy,
non-Python access. (And this difference in behaviour comes entirely from
strides having a name clash while shape/dimensions do not.)
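For reference, at the Python level NumPy exposes both attributes as tuples; it is only in the PyArrayObject struct that they are raw pointer arrays. A minimal demonstration:

```python
import numpy

x = numpy.zeros((3, 3))
# At the Python level, both attributes are plain tuples:
print(x.shape)    # (3, 3)
print(x.strides)  # byte strides, e.g. (24, 8) for a C-contiguous float64 array
```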
So while my approach has been to make the y variable act "more like a
numpy array", I now think this is flawed. Optimizations through typing
and access to extension structs should probably instead be treated as
fundamentally different things, and the typical NumPy user shouldn't
deal with a reference to the extension struct even if typing for speed
is wanted.
I'll now propose a solution to this. It has been proposed before in a
different form (pxd shadowing etc.); I hope I will succeed better this time.
I think that what is wanted here is to /speed up how NumPy objects are
accessed/, and the extension type only comes into it peripherally. So
I'd like a new syntax for modifying the compile-time behaviour of how
objects are treated. It could look something like the following.
I'm calling the keyword "compiletimefeatures" for lack of a better word;
also, I use cython_ndarray to avoid namespace clashes (I have ideas for
allowing "ndarray" directly, but I'd like to leave that out of the
discussion for now).
Anyway, numpy.ndarray is an extension type like before, while
cython_ndarray is a new "type specifier" providing extra compile-time
optimizations to the variable that carries its type. numpy.pxd:
    cdef extern from "numpy/arrayobject.h":
        ctypedef class numpy.ndarray [object PyArrayObject]:
            cdef char *data
            cdef int nd
            cdef Py_intptr_t *dimensions
            cdef Py_intptr_t *strides
            ...

    compiletimefeatures cython_ndarray:
        def __applyfeatures__(self):
            if not isinstance(self, numpy.ndarray):
                raise TypeError(...)
        @property
        cdef shape(self):
            cdef numpy.ndarray imp = self
            return tuple(imp.dimensions[idx] for idx in range(imp.nd))
        # See my gsoc project for __getitem__, just replace "self" with "imp"
        ...
User code:

    cdef cython_ndarray y = x
    print y.shape
So, what happens:
* cython_ndarray is registered as a new "cdef-able" type.
* When assigning something to y, a runtime call to __applyfeatures__ is
automatically added (i.e. in principle the same thing that happens with
extension type declarations, but more specific and versatile).
* When y is operated on, it is first checked whether cython_ndarray
contains any compile-time optimizations. If so, they are called (using y
as the "self", like a class; however, "self" is a Python object!)
* End result: Rather than having to hope for some good luck in the
namespace resolution towards the extension type struct to provide the
speedup, one can be explicit about it.
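To make the intended runtime behaviour concrete, here is a pure-Python sketch of the dispatch. All names here (FakeArrayStruct, CythonNdarray) are hypothetical stand-ins for illustration, not proposed API, and the check happens at runtime rather than being inserted by the compiler as the proposal intends:

```python
class FakeArrayStruct:
    """Stand-in for the extension-type struct fields (nd, dimensions, ...)."""
    def __init__(self, dims):
        self.nd = len(dims)
        self.dimensions = list(dims)  # stands in for the Py_intptr_t* array


class CythonNdarray:
    """Emulates the proposed compiletimefeatures dispatch in plain Python."""
    def __init__(self, obj):
        # Corresponds to the automatic __applyfeatures__ call on assignment.
        if not isinstance(obj, FakeArrayStruct):
            raise TypeError("expected an array-like object")
        self._imp = obj

    @property
    def shape(self):
        # Tuple view built from the raw dimensions array.
        imp = self._imp
        return tuple(imp.dimensions[idx] for idx in range(imp.nd))


y = CythonNdarray(FakeArrayStruct((3, 3)))
print(y.shape)  # (3, 3)
```

The point of the sketch is only to show where the type check and the tuple construction would hook in; in the real proposal both would be generated at compile time rather than living on a wrapper object.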
Note: The "optimizations" provided above will only be faster if Cython
gets quite sophisticated unrolling/optimization. While I'd like to have
this, a more declarative approach might be more realistic. So something
like:
    compiletimefeatures cython_ndarray:
        carray_or_tuple shape(x): extension_type(x).shape

Sorry, I couldn't think of a good declarative syntax just now :-)
Note that this is instead of the plans to allow inlineable code in
extension type declarations. Extension type declarations wouldn't have
to be touched at all then.
Flame away :-) (Yes, I see that this could be confusing from an OOP
perspective. However, it is no worse than the current situation; one
cannot really override extension type struct items either.)
--
Dag Sverre
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev