On Nov 12, 2009, at 7:08 PM, Stefan Behnel wrote:
> Hi all,
>
> warning: long e-mail ahead, don't read in a hurry!
>
> I gave generator functions a couple of thoughts. Implementing them actually
> sounds simpler than it is, not because of the state keeping, but because of
> the refactoring (point 1 below) that I would like to see done before going
> there.
>
> Here's what I think should be done:
>
> 1) refactor def functions into a Python wrapper and a static C function
> * Python wrapper does all argument unpacking, return value packing and
>   the final exception propagation
> * C function contains the complete body of the original function and
>   returns the return value directly
That could clean things up a lot--it would be nice if this refactoring
cleaned up some of the redundancies and inconsistencies between
CFuncDefNode and DefNode. I'm not sure exactly what you mean by "final
exception propagation" or "return value packing" though, as we still
want cdef functions to propagate the exception. This could make cpdef
functions more natural as well. I think the extra C function call
overhead should be negligible, but we should be sure. Perhaps I'm
prematurely optimizing, but I hate the idea of potentially introducing
regressions.
> a) non-closure functions:
> - C function has signature as written in the code
> - Python wrapper calls C function to execute the body
Sure.
> b) closure functions:
> - C function has METH_NOARGS signature
> - Python wrapper creates closure and fills in arguments
> - Python wrapper calls C function with closure as 'self'
I'm not sure I'm following you here--the closure is created when the
function is created, not when it's called, and the function needs to
take its arguments later. We could separate things out as in case (a),
letting self be the actual closure argument, and binding it with the
bound method type as is done now.
I'm probably misunderstanding you. Given

    def add_n(n):
        def f(x):
            return x+n
        return f

which C function has METH_NOARGS? Both add_n and f could be handled as
in (a), where f carries around a "self" that is the scope inherited
from add_n.
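
To make that reading of (a) concrete, here is a rough pure-Python sketch with the scope spelled out as an explicit object. The names AddNScope and f_body are made up for illustration; this is not what Cython generates, only the shape of the idea.

```python
import functools


class AddNScope:
    """Hypothetical stand-in for the scope object that add_n creates."""

    def __init__(self, n):
        self.n = n


def f_body(scope, x):
    # Body of the inner function; the inherited scope arrives as 'self'.
    return x + scope.n


def add_n(n):
    # The scope is created when f is created, not when f is called.
    scope = AddNScope(n)
    # Bind the scope as the first argument, much like a bound method would.
    return functools.partial(f_body, scope)


add_5 = add_n(5)
result = add_5(10)
```

Both functions keep the signature written in the code; only f gains the extra leading scope argument.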
On that note, I certainly want to support

    cdef add_n(n):
        def f(x):
            return x+n
        return f

and perhaps, if we can figure out a clean way to do it,

    cdef ??? add_n(n):
        cdef int f(int x):
            return x+n
        return f

though this last case is not as important.
> 2) support writing utility code in Cython (does this work already?)
> * likely just compile TreeFragments inside of the utility_scope?
> (does the utility_scope actually have a unique mangling prefix
> or will it interfere with a user provided "utility" module?)
>
> 3) implement a generic 'generator' type in Cython code (see code below)
> * methods: __iter__, __next__, send, throw, close (as in PEP 342, see
>   http://www.python.org/dev/peps/pep-0342/ )
> * fields: closure, exception, __weakref__, C function pointer
I assume the motivation is that this would be easier than just
generating a class with these methods every time? There's overhead to
calling virtual methods, and it seems odd to direct throw() and
__next__/send() into the same method, only to have an if statement to
separate them in its body.
> 4) implement generators as extension to 1b)
> * Python wrapper works mostly as in 1b), but
> - does not call the C function
> - creates and returns a generator instance instead and fills in the
>   created closure and the pointer to the C function part of the
>   generator function
I was imagining

    def my_xrange(n):
        i = 0
        while i < n:
            yield i
            i += 1

would get transformed into

    def my_xrange(n):
        return __Pyx_my_xrange_generator_class(n)

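
As a hand-written illustration of what such a class might look like, here is a pure-Python sketch. The class name MyXrangeGenerator and the 'started' flag are assumptions for the example; a real generated class would store its execution state differently.

```python
class MyXrangeGenerator:
    """Hypothetical stand-in for a per-function generated generator class."""

    def __init__(self, n):
        # Arguments and locals of the original function live on the instance.
        self.n = n
        self.i = 0
        self.started = False

    def __iter__(self):
        return self

    def __next__(self):
        if self.started:
            # Resume after the yield: run the rest of the loop body.
            self.i += 1
        else:
            self.started = True
        if self.i < self.n:
            return self.i  # corresponds to 'yield i'
        raise StopIteration


def my_xrange(n):
    return MyXrangeGenerator(n)


values = list(my_xrange(4))
```

The instance plays the role of the closure here: everything that must survive across a yield is an attribute.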
> * generator functions become modified closure functions:
> - METH_O signature instead of METH_NOARGS to receive the send(x) value
>   directly (note that gen.__next__() is defined as gen.send(None) and
>   gen.throw(exc) could be implemented as gen.send(NULL))
> - closures additionally contain function temps (I'm thinking of a
>   union of structs, i.e. one struct for each set of temps that existed
>   during the code generation for a yield node, but I guess storing
>   all temps is just fine to start with - won't impact performance,
>   just memory)
I don't think we should put temps in all closure functions, but for
generators we certainly need to.
> - closures have an additional C field to store the execution state
> (void* to a function label, initially NULL)
Ah, function labels, good idea. It might make sense to initialize it
to a label at the entry point of the function, so you can always start
with a goto *self->exec_state, but if that label is hard to get at
then setting (and checking) for NULL should work just fine.
> - "sendval = (yield [expr])" emits the following code:
> - store away all current temp values in the closure
> - set "closure._resume_label" to the resume label (see below, uses
>   the C operator "&&")
> - return the expression result (or None) - return immediately
>   without cleanup (the temp that holds the expression result must be
>   unmanaged to prevent DECREF()-ing on resume; INCREF()-ing the
>   return value will keep it alive for too long)
> - here goes the resume label ("__Lxyz_resume_from_yield:")
> - reset all saved temp values from the closure
> - if an exception is to be raised (gen.throw() was called, which has
>   already set the exception externally), use normal exception path
Rather than inserting this after every label, I think throw() should
be a completely separate function.
> - set the result temp of the yield node to the send value argument
>   that was passed (INCREF or not, as for parameters)
> * generator C function basically implements gen.send(x)
> - receives both the closure and the current send value as parameters
> - if "closure._resume_label" is not NULL, jump to the label;
>   otherwise, check that 'x' is None (raise an exception if not) and
>   execute the function body normally
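
For reference, the send() behaviour described in the quoted list can be checked against a plain CPython generator. This is only a small demonstration of the PEP 342 semantics the C function would have to reproduce; the generator name echo is made up for the example.

```python
def echo():
    received = []
    while True:
        # The value passed to send() becomes the result of the yield.
        value = yield len(received)
        received.append(value)


# Sending a non-None value before the first yield is rejected.
fresh = echo()
try:
    fresh.send(42)
except TypeError:
    rejected = True
else:
    rejected = False

# send(None) on a fresh generator behaves like next().
gen = echo()
first = gen.send(None)
second = gen.send("a")
```

Here 'first' is the value of the first yield and 'second' the value of the next one, after "a" has been recorded.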
>
> So the main work that's left to be done in 4) will be the closure extension
> to include the temps and the yield/resume implementation.
>
> Here's the (trivial) generic generator type:
>
> cdef class generator:
>     cdef object _closure
>     cdef meth_o_func* _run
>     cdef object __weakref__
>
>     def __iter__(self):
>         return self
>
>     def __next__(self):
>         return self._run(self._closure, None)
>
>     def send(self, value):
>         return self._run(self._closure, value)
>
>     def throw(self, type, value=None, traceback=None):
>         EXC_RESET(type, value, traceback)
>         return self._run(self._closure, NULL)
>
>     def close(self):
>         try:
>             EXC_RESET(GeneratorExit, NULL, NULL)
>             self._run(self._closure, NULL)
>         except (GeneratorExit, StopIteration):
>             pass
>         else:
>             raise RuntimeError('generator ignored GeneratorExit')
We also need a __dealloc__ method to clean up any stored temps, etc.
in case this generator goes out of scope before it is used up.
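
The behaviour to match is what CPython generators already do when they are closed before being exhausted. A small demonstration with an ordinary Python generator (the names counter and log are just for the example):

```python
log = []


def counter():
    try:
        i = 0
        while True:
            yield i
            i += 1
    finally:
        # Runs when the generator is closed while suspended at the yield.
        log.append("cleanup")


gen = counter()
first = next(gen)   # run up to the first yield
gen.close()         # raises GeneratorExit inside the generator at the yield
```

After close(), the finally block has run even though the loop never finished; a __dealloc__ would need to release the stored temps in the same situation.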
> I wonder if there is a way to make it inherit from CPython's GeneratorType.
> That would enhance the interoperability, but it would also mean that we add
> some unnecessary instance size overhead and that we have to prevent that
> base-type from doing anything, including initialisation and final cleanup.
That could be nice, but it could have unforeseen and nasty
consequences, especially if the generator type evolves as well.
> The separation in 1a) has also been requested by Lisandro (and likely
> others) a while ago to make the function setup code more readable.
> Currently, the argument unpacking code takes so much space that it's easy
> to get lost when trying to read the generated function code, especially in
> short functions.
>
> The refactoring for 1) actually conflicts a bit with cpdef functions, which
> do the exact opposite: they create a DefNode for an existing C function. I
> wonder if it makes sense to swap that while we're at it. That would reduce
> some redundancy.
Yep, makes sense to me.
> Ok, this is a rather lengthy e-mail that's a bit akin to a spec already.
> Does this make sense to everybody? Any objections or ideas? Anyone happy to
> give a hand? :)
I'd love to help implement this with you, but I've got a thesis to
write...
- Robert