On Sun, Jun 8, 2008 at 2:59 PM, Stefan Behnel <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I did a tiny bit of profiling on Cython compiling lxml.etree. Here are the
> numbers I get:
>
> """
> 3127018 function calls (2777951 primitive calls) in 25.128 CPU seconds
>
> Ordered by: internal time, call count
> List reduced from 1035 to 20 due to restriction <20>
>
>      ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>      119144    4.579    0.000    4.671    0.000 Scanners.py:148(run_machine_inlined)
>       18316    1.586    0.000    1.586    0.000 codecs.py:371(read)
>      119144    1.074    0.000    5.746    0.000 Scanners.py:109(scan_a_token)
>       88362    1.023    0.000    7.976    0.000 Scanners.py:88(read)
>       29680    0.673    0.000    1.659    0.000 Code.py:103(mark_pos)
>          77    0.588    0.008    0.588    0.008 posixpath.py:168(exists)
>       88362    0.538    0.000    8.514    0.000 Scanning.py:397(next)
>  23318/2985    0.495    0.000    0.517    0.000 Nodes.py:155(end_pos)
>       70446    0.442    0.000    0.591    0.000 Code.py:62(put)
> 90657/10562    0.351    0.000    4.549    0.000 Parsing.py:59(p_binop_expr)
> [...]
> """
>
> So the major headache here is Scanners.py in Plex. The top-ranked method is a
> huge function. According to the comments, it is the result of inlining a
> couple of method calls that originally led to slow code, and it looks heavily
> profiled already.
>
> Assuming that further optimisation attempts would be rather futile, I just
> compiled the module with Cython. The first (obvious) result is that the
> internal calls disappear from the profile log, as they are now internal C
> calls. The call that remains is Scanner.next(), which originally took an
> accumulated 8.5 seconds. In the compiled version, it's down to just over 5
> seconds, which is more than 40 percent faster.
> > """ > 2595627 function calls (2246560 primitive calls) in 18.681 CPU seconds > > Ordered by: internal time, call count > List reduced from 1028 to 20 due to restriction <20> > > ncalls tottime percall cumtime percall filename:lineno(function) > 88362 4.246 0.000 5.041 0.000 Scanning.py:397(next) > 29680 0.673 0.000 1.632 0.000 Code.py:103(mark_pos) > 70446 0.439 0.000 0.586 0.000 Code.py:62(put) > 90657/10562 0.335 0.000 3.228 0.000 Parsing.py:59(p_binop_expr) > 23318/2985 0.316 0.000 0.338 0.000 Nodes.py:155(end_pos) > 65791 0.295 0.000 0.903 0.000 Code.py:47(putln) > 29677 0.289 0.000 0.915 0.000 Code.py:93(file_contents) > 4724 0.287 0.000 0.292 0.000 Symtab.py:532(allocate_temp) > 88070 0.247 0.000 0.247 0.000 ExprNodes.py:192(subexpr_nodes) > 52071 0.232 0.000 0.232 0.000 Nodes.py:82(__init__) > [...] > """ > > In total, I get an improvement of 12% in compilation time. That makes me think > that it's actually worth putting the compilation into Cython's own setup.py, > and installing the compiled Scanning module next to the Python one (Python > prefers C extensions on import). Here's a patch, what do you think?
Maybe it'd be cool if one could instruct Cython using comments or strings, so
that you have a single file that works in regular Python, but that Cython can
also compile into something more optimized than your patch produces. Otherwise,
how can you improve the cythonized code without having to maintain the pure
Python and Cython versions side by side?

Ondrej

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
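[For reference, Cython's documented "pure Python mode" realizes essentially this suggestion: typing directives live in decorators and comments, and the same file runs under plain CPython or compiles to optimized C. A minimal sketch; the `_Shim` fallback below is purely illustrative, not part of Cython's API, and real code would rely on the `cython` shim module that ships with Cython:]

```python
# Pure-Python-mode sketch: one file, runnable with or without Cython.
try:
    import cython
except ImportError:
    # Illustrative stand-in so the file still runs where Cython is absent.
    class _Shim:
        int = int
        def locals(self, **kwargs):
            return lambda func: func  # no-op decorator under plain CPython
    cython = _Shim()

@cython.locals(i=cython.int, total=cython.int)
def checksum(data):
    # When compiled, i and total become C ints; under CPython the
    # decorator is a no-op and this is ordinary Python.
    total = 0
    for i in range(len(data)):
        total += data[i]
    return total
```

Under plain CPython the decorator does nothing; when the file is compiled, the declared variables become C-level integers, which is exactly the "one file, two behaviors" property asked for above.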
