Hi,

I would like to get support for borrowed references in Cython.

My use case is a package I recently wrote, which uses an automaton (DFA) 
for multi-keyword text search in a unicode string. The automaton was 
originally modeled as a set of state objects that basically contain a 
(non-dict) map from characters to subsequent state objects. The 
transformation is currently written in Python. The search engine itself, 
written in Cython, is rather straightforward. It starts with a reference
to one state and jumps from one state to the next while reading the
sequence of input characters. The engine doesn't create any new objects
until a match is found. All it does is update a reference to the current
state. If that reference were an unmanaged reference, the whole engine could
run without requiring the GIL. However, since it's not, Cython requires the
GIL to update the reference for each new input character. (Note that a
PyObject* won't work here as this would prevent the code from accessing the
state's attributes.)
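
To make the shape of that engine concrete, here is a minimal plain-Python
sketch of the state-object design described above. It is not the actual
package code: the names State, search and build_example are hypothetical,
the tiny hand-built DFA (keywords "ab" and "b") is only an illustration,
and a dict stands in for the non-dict character map.

```python
class State:
    """One automaton state (hypothetical layout, not the package's real class)."""

    def __init__(self, matches=()):
        self.transitions = {}    # char -> State; a dict stands in for the non-dict map
        self.default = None      # state entered for any character not in the map
        self.matches = tuple(matches)  # keywords that end when this state is entered


def search(start, text):
    """Walk the DFA over text and return (start_index, keyword) pairs."""
    state = start
    found = []
    for pos, ch in enumerate(text):
        # The only per-character work: rebind the reference to the current state.
        state = state.transitions.get(ch, state.default)
        # New objects are created only once a match has been found.
        for keyword in state.matches:
            found.append((pos - len(keyword) + 1, keyword))
    return found


def build_example():
    """Hand-built DFA for the keywords "ab" and "b" (illustrative only)."""
    start = State()
    seen_a = State()
    seen_b = State(matches=["b"])
    seen_ab = State(matches=["ab", "b"])
    for s in (start, seen_a, seen_b, seen_ab):
        s.default = start
        s.transitions["a"] = seen_a
        s.transitions["b"] = seen_b
    seen_a.transitions["b"] = seen_ab  # "a" followed by "b" completes "ab"
    return start
```

In the loop, the rebinding of "state" is the step that a Cython-compiled
version has to ref-count, and therefore the step that ties the engine to
the GIL.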

I ended up rewriting the engine to use a struct instead, which added quite 
a bit to the LOC count and also a bit to the memory footprint (due to 
duplicated pointers). The code was nice and simple before that; now it's
still somewhat short, but it has clearly become less beautiful.

I'm not sure yet what would be needed to support borrowed references, but I 
don't think it's trivial. There was an older discussion about borrowed 
references on cython-dev:

http://comments.gmane.org/gmane.comp.python.cython.devel/6864

However, that only dealt with stolen function arguments and borrowed return 
values. My use case above makes me believe that it would be just as useful 
for local variables and (potentially) object attributes. So you could write

     cdef borrowed object borrowed_ref

and Cython would disable ref-counting for "borrowed_ref", i.e.

     borrowed_ref = some_value       # no incref
     some_normal_var = borrowed_ref  # normal incref

However, this becomes problematic when a new reference is (accidentally?) 
assigned to the variable, e.g.

     borrowed_ref = []

The above could raise an error at compile time, and I actually think that 
we could use the same mechanism as for e.g. bytes->char* conversions of 
temporary values to detect incorrect code. Also, functions could be 
required to declare their return values as "borrowed" to allow such an 
assignment. That would provide a reasonable level of safety IMO.
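
For reference, the "normal incref" case can be observed from plain Python.
The following is only a small sketch that assumes CPython (reference counts
are an implementation detail), and the function name is made up for the
illustration. A borrowed variable as proposed above would be one for which
the assignment leaves the count unchanged.

```python
import sys


def refcount_delta_on_assignment():
    """Return how much an ordinary assignment changes an object's refcount.

    Assumes CPython; the function name is hypothetical.
    """
    value = object()
    before = sys.getrefcount(value)
    alias = value  # ordinary (owned) reference: the interpreter increfs here
    after = sys.getrefcount(value)
    return after - before
```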

Comments?

Stefan