Hi,

I've been looking at the nqueens benchmark for a while, and I think it's actually not that a bad benchmark for generators.

http://hg.python.org/benchmarks/file/tip/performance/bm_nqueens.py

A better implementation only for Py2.7/Py3 is here:

https://github.com/cython/cython/blob/master/Demos/benchmarks/nqueens.py

Cython currently runs the first implementation about as fast as Py3.3:

https://sage.math.washington.edu:8091/hudson/job/cython-devel-pybenchmarks-py3k/lastSuccessfulBuild/artifact/chart.html

and the second one more than 3x as fast:

https://sage.math.washington.edu:8091/hudson/view/bench/job/cython-devel-cybenchmarks-py3k/lastSuccessfulBuild/artifact/chart.html

However, I think there's still some space for improvements, and local variables are part of that. For generator functions that do non-trivial things between yields, I think that local variables will quickly become a bottleneck. Currently, they are always closure fields, so any access to them will use a pointer indirection to a foreign struct, originally passed in as an argument to the function. Given that generators often do Python object manipulation through C-API calls, any such call will basically require the C compiler to assume that all values in the closure may have changed, thus disabling any optimisations for them. The same applies to many other object related operations or pointer operations (even DECREF!), as the C compiler cannot know that the generator function owns the closure during its lifetime exclusively.

I think it would be worth changing the current implementation to use local C variables for local Cython variables in the generator, and to copy the values back into/from the closure around yields. I'd even let local Python references start off as NULL when the generator is created, given that Vitek's branch can eliminate None initialisations now.

I started looking into this a bit, but it's not a quick change. The main problem I see is the current code duplication between the generator body node and the DefNode. It would be better if both could share more code. Basically, if local variables become truly local, the only differences will be that they come from call arguments in one case and from the closure in the other, and that generators have the additional jump-to-yield entry step. Everything else could hopefully be identical.

Once again, this goes hand in hand with the still pending DefNode refactoring...

Stefan
_______________________________________________
cython-devel mailing list
[email protected]
http://mail.python.org/mailman/listinfo/cython-devel

Reply via email to