[Cython] Thoughts on numerical computing/NumPy support

Dag Sverre Seljebotn Tue, 04 Mar 2008 12:54:23 -0800

Since Robert mentioned NumPy in relation to adding operator support, I
thought I would share more of my thoughts about NumPy. I'm very new to
Cython, so take this for what it is worth. However, what I've seen so
far looks so promising that I might want to spend some time in a few
months implementing some of this, which perhaps makes my thoughts more
interesting :-)


Currently, Cython is mostly geared towards wrapping C code. It is also
an excellent foundation for a numerical tool, but the rough edges are
still prohibitive. I think a few relatively small steps (in terms of
man-hours needed) would improve the situation a lot. The result would
not be perfect, but perhaps in a few years we can have something that
will finally kill FORTRAN :-)

Three suggestions follow briefly here. If anyone is interested and they
have not already been discussed and decided, I might flesh them out
"PEP-style" in the coming month?

Note that a) is what is important to me; b) and c) are just things I
am throwing in along with it...

a) numpy.ndarray syntax candy. Really, what one should implement is 
syntax support for PEP-3118:

http://www.python.org/dev/peps/pep-3118/

Because this protocol will be shared between NumPy, PIL etc. in Python 3,
it could make sense to simply have "native"/hard-coded support for this
aspect without necessarily making it a generic operator feature. One
could then use the same approach for NumPy in Python 2 as will be needed
for buffers in Python 3.

Example (where "array" is considered a new, Cython-native type that will 
have automatic conversion from any NumPy arrays and Python 3 buffers):

def myfunc(array<2, unsigned char> arr):
    arr[4, 5] = 10

might be translated to the equivalent of the currently legal:

def myfunc(numpy.ndarray arr):
    if arr.nd != 2 or arr.dtype != numpy.dtype(numpy.uint8):
        raise ValueError("Must pass 2-dimensional uint8 array.")
    cdef unsigned char* arr_buf = <unsigned char*>arr.data
    # strides are in bytes; the item size is 1 for unsigned char
    arr_buf[4 * arr.strides[0] + 5 * arr.strides[1]] = 10

(Probably caching the strides in local variables etc.). That should do
as a first implementation. It is always possible to be more
sophisticated, but even this little will allow NumPyers to simply dive
in. Specifically, the number of dimensions must be declared up front,
and only direct element access using exactly that many indices is
allowed. Slices etc. should be less important (they can be done on the
Python object instead).
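
The stride arithmetic above can be tried out in plain Python, without
Cython or NumPy. This is only a sketch under stated assumptions: it uses
the standard library's memoryview (which exposes PEP-3118 style shape
and strides) in place of a NumPy array, and set_item_2d is a
hypothetical helper name, not something Cython would generate.

```python
# Plain-Python sketch of the stride arithmetic (not Cython output).
# set_item_2d is a hypothetical helper used only for illustration.

def set_item_2d(raw, strides, i, j, value):
    """Write one byte at logical index (i, j) of a 2-D uint8 buffer."""
    # Strides are in bytes; with 1-byte items the byte offset is also
    # the index into the flat buffer.
    offset = i * strides[0] + j * strides[1]
    raw[offset] = value

rows, cols = 6, 8
raw = bytearray(rows * cols)  # flat storage, all zeros
# 2-D, C-ordered view over the same memory; supplies the strides.
view = memoryview(raw).cast("B", shape=[rows, cols])

# Equivalent of "arr[4, 5] = 10" done by hand.
set_item_2d(raw, view.strides, 4, 5, 10)
```

Reading view[4, 5] afterwards goes through the interpreter's own
indexing, so it serves as a check that the hand-computed offset lands on
the same element.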

Moving on from here, one should probably instead define bufferinfo from 
PEP-3118 and make it say

def myfunc(bufferinfo arr):
    if arr.ndim != 2 or arr.format != "B" or arr.readonly:
        raise ValueError("Must pass writeable 2-dimensional buffer with format 'B'.")
...

with automatic conversion from NumPy arrays to bufferinfo.
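
For comparison, the same three checks can be written today against
Python 3's memoryview, which carries the ndim, format and readonly
fields from PEP-3118. This is a rough sketch, not the proposed Cython
feature; require_writable_2d_bytes is a hypothetical name.

```python
# Sketch: validating a PEP-3118 buffer from plain Python.
# require_writable_2d_bytes is a hypothetical helper name.

def require_writable_2d_bytes(obj):
    """Return a memoryview of obj, or raise if it is the wrong kind."""
    view = memoryview(obj)  # works for any object exporting a buffer
    if view.ndim != 2 or view.format != "B" or view.readonly:
        raise ValueError(
            "Must pass writeable 2-dimensional buffer with format 'B'.")
    return view

# A writable 3x4 buffer of unsigned bytes passes the check.
good = memoryview(bytearray(12)).cast("B", shape=[3, 4])
checked = require_writable_2d_bytes(good)
checked[1, 2] = 7
```

A read-only source (for example bytes instead of bytearray) or a 1-D
buffer would be rejected with ValueError.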


b) Allow numpy types? Basically, make it possible to say "cdef uint8
myvar", at least for variables inside functions that do not interface
with C code, so that one doesn't need to learn C for numerical use. This
can be an addition to the existing type names, so it should not break
existing code, though I can understand resistance to the idea as well.

c) Probably controversial: More Pythonic syntax. A syntax for annotating
function arguments has been decided upon (at least in Python 3), so to
align with that one could allow for stuff like

@Compile
def myfunc(a: uint8, b: array(2, uint8), c: int = 10):
    d: ptr(int) = &a
    print a, b, c, d

This is "almost" Python. Only the definition of d is different, but
consistency argues for the change there as well. This can also be in
addition to the existing syntax, so it should not break anything
(allowing, say, only one type of syntax per function).
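
To show what such a decorator would have to work with, here is a small
sketch in plain Python 3 of how argument annotations can be read back at
run time. Everything here is an assumption made for illustration: the
uint8 class is an empty placeholder, and compile_stub only records the
declared types instead of compiling anything.

```python
import inspect

class uint8:
    """Hypothetical placeholder for a fixed-width integer type."""

def compile_stub(func):
    """Hypothetical stand-in for @Compile: collect declared types."""
    sig = inspect.signature(func)
    func.declared_types = {
        name: param.annotation
        for name, param in sig.parameters.items()
        if param.annotation is not inspect.Parameter.empty
    }
    return func

@compile_stub
def myfunc(a: uint8, c: int = 10):
    return a, c
```

After decoration, myfunc.declared_types maps each annotated argument
name to its declared type, which is the information a compiler would
need. The annotated local variable "d" from the example above is left
out because it is not valid plain Python.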

But a) is what is interesting here...

-- 
Dag Sverre
