Greg Ewing wrote:
> IMO it would be better to think about how to enhance the
> language to remove the need for using PyObject in the first
> place.
>
From my own experience:
- When sorting a list, we know the refcounts do not change; we just need
  to swap pointers around quickly. We can use PyObject* for this, but not
  object, because Cython/Pyrex then starts refcounting.
- When re-implementing scipy.signal.lfilter, I used two buffers for the
  filter delays (z values), which were swapped for every sample filtered.
  Just using pointers is fast; using object is slow.
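The first point can be checked from pure Python. This is a minimal sketch, assuming CPython (sys.getrefcount is implementation-specific); Item, refcounts and sort_preserves_refcounts are hypothetical names used only for illustration. It shows that sorting a list leaves the refcount of every element unchanged, which is why refcounting during the swaps is wasted work.

```python
import sys


class Item:
    """Hypothetical payload; any Python object would do."""

    def __init__(self, key):
        self.key = key


def refcounts(items):
    # Keyed by id() so the result does not depend on list order.
    return {id(obj): sys.getrefcount(obj) for obj in items}


def sort_preserves_refcounts(keys):
    items = [Item(k) for k in keys]
    before = refcounts(items)
    # list.sort rearranges the stored references; once it returns,
    # no element has gained or lost a reference.
    items.sort(key=lambda obj: obj.key)
    after = refcounts(items)
    return before == after, [obj.key for obj in items]
```

Calling sort_preserves_refcounts([3, 1, 2]) returns whether the per-object refcounts match before and after the sort, together with the sorted keys.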
Suggestion 1:
Provide a new built-in type "cobject", meaning PyObject* without
automatic refcounting. This might lead to more efficient code, since it
avoids the second layer of indirection we would get from object*. It
might be implemented as simply as ctypedef PyObject *cobject. Now we
can write code like this:
    cdef _sort(cobject array[]) nogil:
        # here we can swap array[i] and array[j] quickly

    def sort(object iterable):
        cdef np.ndarray[object] array = np.array(iterable, dtype=object)
        cdef Py_ssize_t n, i
        with nogil:
            _sort(<cobject *> array.data)  # allow cobject*, but not object*
        n = array.shape[0]
        for i in range(n):
            iterable[i] = array[i]
Advantage: Very simple to implement and gives efficient C. This is
basically how PyObject is used in Pyrex/Cython anyway. Disadvantage: no
benefit for numerical programmers like myself.
Suggestion 2:
Introduce a weakref keyword. Basically a qualifier to a Python object
that means 'turn off refcounting please'. A weakref object would not
require the GIL. It would be the responsibility of the programmer not to
mess up refcounts and ensure that a reference to the object is actually
kept. An efficient sort could now look like this:
    cdef _sort(weakref np.ndarray[weakref object] array) nogil:
        # here we can swap array[i] and array[j] quickly,
        # weakrefs are not refcounted

    def sort(object iterable):
        cdef np.ndarray[object] array = np.array(iterable, dtype=object)
        cdef weakref np.ndarray[weakref object] wref_array = array
        cdef Py_ssize_t n, i
        with nogil:
            _sort(wref_array)
        n = array.shape[0]
        for i in range(n):
            iterable[i] = array[i]
Advantage: "cdef weakref np.ndarray[double] array" would be a major
advantage for numerical programming, as it allows ndarrays to be passed
to functions in nogil blocks. This would allow a "weakref np.ndarray" to
be used like a Fortran 90 pointer, i.e. operator [] indexes the
referenced buffer, not the pointer.
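For comparison only: Python's standard weakref module is a different and safe mechanism, not the keyword proposed here, but it illustrates the same ownership rule. The sketch below assumes CPython, where dropping the last strong reference frees the object immediately; Buffer and demo are hypothetical names. With the standard module a dead reference yields None, whereas the proposed unchecked weakref would simply dangle, so keeping the strong reference alive would be entirely the programmer's job.

```python
import gc
import weakref


class Buffer:
    """Hypothetical stand-in for an array; plain lists cannot be weakly referenced."""

    def __init__(self, values):
        self.values = list(values)


def demo():
    owner = Buffer([3, 1, 2])      # strong reference keeps the object alive
    view = weakref.ref(owner)      # weak reference, does not own the object
    alive_while_owned = view() is owner
    view().values.sort()           # work through the non-owning view
    sorted_values = list(owner.values)
    del owner                      # drop the only strong reference
    gc.collect()                   # not needed on CPython, harmless elsewhere
    dead_after_release = view() is None
    return alive_while_owned, sorted_values, dead_after_release
```

The view is usable only while owner exists, which mirrors the pairing of array and wref_array in the sort example above.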
Suggestion 3:
Allow pointers to Python objects (object *) which would indicate a
second layer of indirection (PyObject**). It would be the responsibility
of the programmer to ensure that an object* always points to a living
object. Now an efficient sort could look like this:
    cdef _sort(object *parray[]) nogil:
        # Here we can swap parray[i] and parray[j] without
        # any refcounting slowing us down or locking the
        # interpreter. parray[i][0] is iterable[i].

    def sort(object iterable):
        cdef np.ndarray[object] tmp
        cdef Py_ssize_t n, i
        cdef np.ndarray[object*] array = np.zeros(len(iterable), dtype=int)
        n = len(iterable)
        for i in range(n):
            array[i] = &(iterable[i])
        with nogil:
            _sort(<object **> array.data)
        tmp = np.array(iterable, dtype=object)
        for i in range(n):
            tmp[i] = array[i][0]
        for i in range(n):
            iterable[i] = tmp[i]
Problem: some people will probably discover that np.ndarray[PyObject*]
is more efficient and use that instead. Thus, including object* may not
work as intended, i.e. PyObject would still be used. The double
indirection also makes the syntax messier. Advantage: it allows pointers
to ndarrays for numerical programming:
    cdef np.ndarray[double] array = np.zeros(10)
    cdef np.ndarray[double] *parray
    parray = &array
It would look nicer if unary * worked, so we could write *parray[i]
instead of parray[0][i]. This syntax is error prone, though, as someone
could easily write parray[i] instead of parray[0][i]. With a dtype of
object, the mistake might not be caught by the compiler either, and the
code would instead crash at run time.
Indexing becomes chaotic:
    cdef np.ndarray[object] b
    cdef np.ndarray[np.npy_intp] tmp = np.arange(&b.data,
            &b.data + len(b) * sizeof(object*), sizeof(object*))

    # index a[0][i] to get b[i]; a not refcounted, content refcounted
    cdef np.ndarray[object] *a = &b

    # index a[0][i][0] to get b[i]; nothing refcounted
    cdef np.ndarray[object*] *a = &tmp

    # index a[i][0] to get b[i]; a refcounted, content not refcounted
    cdef np.ndarray[object*] a = tmp

    # index a[i] to get b[i]; a not refcounted, content refcounted
    cdef object *a = b.data

    # index a[0][i] to get b[i]; nothing refcounted
    cdef object **a = tmp.data
This is a syntactical mess and error prone. One could also consider
allowing both cobject and object*, making the sort functions something
like this:
    cdef _sort(np.ndarray[cobject] *parray) nogil:
        # faster than using object *parray[], as indexes are transposed:
        # parray[0][i] is iterable[i], instead of parray[i][0], adding
        # further confusion.
The bad thing is that this is how PyObject** is often used in
Pyrex/Cython code. Having a weakref keyword is much safer and cleaner,
and better in every way I can think of, except that it is more difficult
to implement.
Finally, if you are afraid of dangling pointers, none of these
suggestions are any worse than this:
    cdef char *pstr
    string = "whatever"
    pstr = <char *> string
My personal preference would thus be a weakref keyword (or something
similar) if that kind of syntax is feasible. It gives full control over
refcounting, removes the need to use PyObject*, and allows "Fortran
pointers" for numpy programming. One just has to educate programmers to
use numpy arrays instead of allocating C buffers with malloc:
    cdef object tmp
    cdef weakref np.ndarray[weakref object] arr
    arr = tmp = np.zeros(10, dtype=object)
instead of:
    cdef PyObject **arr = <PyObject **> malloc(10 * sizeof(PyObject*))
It should do almost the same, while preventing common C mistakes like
forgetting (or not bothering) to check the return value from malloc,
forgetting to call free afterwards, skipping the call to free in case of
an exception, etc.
At least, please don't allow object* in the language. I really don't
want to waste my time debugging someone's source code using it (and bad
experience tells me I am going to...)
Regards,
Sturla Molden