On Sun, Jun 8, 2008 at 2:59 PM, Stefan Behnel <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I did a tiny bit of profiling on Cython compiling lxml.etree. Here are the
> numbers I get:
>
> """
> 3127018 function calls (2777951 primitive calls) in 25.128 CPU seconds
>
> Ordered by: internal time, call count
> List reduced from 1035 to 20 due to restriction <20>
>
>      ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>      119144    4.579    0.000    4.671    0.000 Scanners.py:148(run_machine_inlined)
>       18316    1.586    0.000    1.586    0.000 codecs.py:371(read)
>      119144    1.074    0.000    5.746    0.000 Scanners.py:109(scan_a_token)
>       88362    1.023    0.000    7.976    0.000 Scanners.py:88(read)
>       29680    0.673    0.000    1.659    0.000 Code.py:103(mark_pos)
>          77    0.588    0.008    0.588    0.008 posixpath.py:168(exists)
>       88362    0.538    0.000    8.514    0.000 Scanning.py:397(next)
>  23318/2985    0.495    0.000    0.517    0.000 Nodes.py:155(end_pos)
>       70446    0.442    0.000    0.591    0.000 Code.py:62(put)
> 90657/10562    0.351    0.000    4.549    0.000 Parsing.py:59(p_binop_expr)
> [...]
> """
>
> So the major headache here is Scanners.py in Plex. The top-ranked method is a
> huge function. According to the comments, it is the result of inlining a
> couple of method calls that originally led to slow code, and it looks heavily
> profiled already.
>
> Assuming that further optimisation attempts would be rather futile, I just
> compiled the module with Cython. The first (obvious) result is that the
> internal calls disappear from the profile log, as they are now internal C
> calls. The call that remains is Scanner.next(), which originally took an
> accumulated 8.5 seconds. In the compiled version, it's down to just over 5
> seconds, which is more than 40 percent faster.
> > """ > 2595627 function calls (2246560 primitive calls) in 18.681 CPU seconds > > Ordered by: internal time, call count > List reduced from 1028 to 20 due to restriction <20> > > ncalls tottime percall cumtime percall filename:lineno(function) > 88362 4.246 0.000 5.041 0.000 Scanning.py:397(next) > 29680 0.673 0.000 1.632 0.000 Code.py:103(mark_pos) > 70446 0.439 0.000 0.586 0.000 Code.py:62(put) > 90657/10562 0.335 0.000 3.228 0.000 Parsing.py:59(p_binop_expr) > 23318/2985 0.316 0.000 0.338 0.000 Nodes.py:155(end_pos) > 65791 0.295 0.000 0.903 0.000 Code.py:47(putln) > 29677 0.289 0.000 0.915 0.000 Code.py:93(file_contents) > 4724 0.287 0.000 0.292 0.000 Symtab.py:532(allocate_temp) > 88070 0.247 0.000 0.247 0.000 ExprNodes.py:192(subexpr_nodes) > 52071 0.232 0.000 0.232 0.000 Nodes.py:82(__init__) > [...] > """ > > In total, I get an improvement of 12% in compilation time. That makes me think > that it's actually worth putting the compilation into Cython's own setup.py, > and installing the compiled Scanning module next to the Python one (Python > prefers C extensions on import). Here's a patch, what do you think?
Maybe it'd be cool if one could instruct Cython using comments or strings, so
that you have a single file that works in regular Python, but that Cython can
also compile into something more optimized than your patch produces. Otherwise,
how can you improve the cythonized code without having to maintain the pure
Python and Cython versions side by side?

Ondrej

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
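[For reference, Cython's documented "pure Python mode" realizes essentially this suggestion: typing directives live in decorators and comments, and the same file runs under plain CPython or compiles to optimized C. A minimal sketch; the `_Shim` fallback below is purely illustrative, not part of Cython's API, and real code would rely on the `cython` shim module that ships with Cython:]

```python
# Pure-Python-mode sketch: one file, runnable with or without Cython.
try:
    import cython
except ImportError:
    # Illustrative stand-in so the file still runs where Cython is absent.
    class _Shim:
        int = int
        def locals(self, **kwargs):
            return lambda func: func  # no-op decorator under plain CPython
    cython = _Shim()

@cython.locals(i=cython.int, total=cython.int)
def checksum(data):
    # When compiled, i and total become C ints; under CPython the
    # decorator is a no-op and this is ordinary Python.
    total = 0
    for i in range(len(data)):
        total += data[i]
    return total
```

Under plain CPython the decorator does nothing; when the file is compiled, the declared variables become C-level integers, which is exactly the "one file, two behaviors" property asked for above.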
