Due to frequent net outages during my attempt at an IRC profiling clinic earlier this week, we didn't get to go through the basic stats-printing functions that day. So Katie asked if I would write up a summary of the stuff that didn't get covered, so that folks could bring profiling data for part 2 of the clinic.

We had gotten as far as running Python with "-im profile_tests testname" in order to profile one or more unit tests and drop into a Python prompt with a stats object:

$ python -im profile_tests application.tests.TestSchemaAPI.SchemaTests.testDeriveFromCore
.
----------------------------------------------------------------------
Ran 1 test in 1.922s

OK
         347794 function calls (345724 primitive calls) in 9.728 CPU seconds

   Ordered by: internal time
   List reduced from 627 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     204/6    1.738    0.009    6.966    1.161 release\bin\lib\site-packages\libxml2.py:1357(SAXParseFile)
        1    0.611    0.611    4.921    4.921 application\schema.py:1(?)
        1    0.475    0.475    0.486    0.486 repository\schema\types.py:2(?)
        1    0.262    0.262    0.400    0.400 application\parcel.py:4(?)
        36    0.259    0.007    0.331    0.009 repository\item\item.py:28(__init__)
 4491/3539    0.219    0.000    1.928    0.001 repository\schema\kind.py:418(iterAttributes)
      3316    0.189    0.000    0.642    0.000 repository\util\linkedmap.py:156(__setitem__)
        1    0.173    0.173    0.175    0.175 repository\schema\kind.py:2(?)
       450    0.157    0.000    0.429    0.001 repository\util\classloader.py:11(loadClass)
     10459    0.155    0.000    0.223    0.000 repository\util\linkedmap.py:398(iterkeys)


>>>

At this point, the local variable 's' is a "Stats" object:

>>> s
<pstats.Stats instance at 0x025AC760>
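
(In case you're wondering how 's' got there: here's a rough sketch of what a profile_tests-style script might boil down to -- my own reconstruction for illustration, not the actual script. Note that in this sketch the test loading happens *inside* the profiler, which, as we'll see below, turns out to matter.)

# profile_tests.py -- hypothetical sketch, not the real script.
# Run as:  python -im profile_tests some.test.Name
# The '-i' flag drops you into the interpreter afterwards, where the
# module-level 's' is available as a pstats.Stats object.
import sys
import profile
import pstats
import unittest

def load_and_run(names):
    # Test loading (and therefore the imports it triggers) happens here,
    # inside the profiler -- see the discussion of import time below.
    suite = unittest.defaultTestLoader.loadTestsFromNames(names)
    unittest.TextTestRunner().run(suite)

p = profile.Profile()
p.runcall(load_and_run, sys.argv[1:])

s = pstats.Stats(p)
s.sort_stats('time').print_stats(10)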

The Stats class is documented in detail at:

    http://python.org/doc/2.4.1/lib/profile-stats.html

But the key features you need to know about are 'sort_stats()' and 'print_stats()'. Here, I'll sort the calls by "cumulative time", and print the top 20 functions/methods:

>>> s.sort_stats('cum').print_stats(20)
         347794 function calls (345724 primitive calls) in 9.728 CPU seconds

   Ordered by: cumulative time
   List reduced from 627 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    9.728    9.728 <string>:1(?)
         1    0.000    0.000    9.728    9.728 release\bin\lib\unittest.py:743(__init__)
         3    0.000    0.000    7.433    2.478 application\schema.py:1201(reset)
         6    0.039    0.007    7.267    1.211 repository\persistence\repositoryview.py:392(loadPack)
     204/6    0.007    0.000    6.966    1.161 repository\util\sax.py:183(parseFile)
     204/6    1.738    0.009    6.966    1.161 release\bin\lib\site-packages\libxml2.py:1357(SAXParseFile)
         3    0.000    0.000    5.935    1.978 repository\persistence\repositoryview.py:1053(__init__)
         3    0.000    0.000    5.934    1.978 repository\persistence\repositoryview.py:39(__init__)
         3    0.000    0.000    5.934    1.978 repository\persistence\repositoryview.py:92(openView)
         3    0.001    0.000    5.933    1.978 repository\persistence\repositoryview.py:116(_loadSchema)
         1    0.046    0.046    5.104    5.104 release\bin\lib\unittest.py:766(parseArgs)
         1    0.000    0.000    5.058    5.058 release\bin\lib\unittest.py:789(createTests)
         1    0.000    0.000    5.058    5.058 release\bin\lib\unittest.py:552(loadTestsFromNames)
         1    0.082    0.082    5.058    5.058 release\bin\lib\unittest.py:510(loadTestsFromName)
         1    0.053    0.053    4.976    4.976 application\tests\testschemaapi.py:1(?)
        1    0.611    0.611    4.921    4.921 application\schema.py:1(?)
         1    0.000    0.000    4.623    4.623 release\bin\lib\unittest.py:793(runTests)
         1    0.000    0.000    4.623    4.623 release\bin\lib\unittest.py:692(run)
         1    0.000    0.000    4.623    4.623 release\bin\lib\unittest.py:427(__call__)
         1    0.000    0.000    4.623    4.623 release\bin\lib\unittest.py:420(run)


<pstats.Stats instance at 0x025AC760>
>>>

Details on how to read these printouts, by the way, can be found at:

    http://python.org/doc/2.4.1/lib/module-profile.html


Different sort orders are useful for different kinds of analysis. If you're trying to find individual functions to speed up, a sort by "time" is the natural choice; but if you're trying to make your program *faster* - i.e., make it run less code - then a cumulative sort is usually more useful. Nonetheless, I always look at the "time" sort first, because sometimes one or two functions are using the lion's share of the program's run time, and you want to pay attention to that. Usually, though, that lion's share isn't because the function is slow, but because it's being run thousands or tens of thousands of times. (This is one way the 'percall' columns come in handy; if percall is 0.000, then the function is going to be hard to optimize, and you need to look further up the call tree in order to stop calling the function so often.)
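
A few other combinations that come in handy (output omitted here; the sort keys and the count/fraction/regex restriction arguments are all standard pstats features):

>>> s.sort_stats('time').print_stats(10)         # top 10 by internal time
>>> s.sort_stats('cumulative').print_stats(.25)  # top 25% by cumulative time
>>> s.sort_stats('calls').print_stats('schema')  # most-called, only lines matching 'schema'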

So, sorting by cumulative time usually gives you a better grasp on the overall situation. If you look at the first stats printout above, you'll get the impression that there are a couple of seconds spent on XML processing, and that everything else is a quarter-second here, half a second there, no big deal. However, if you look at the second printout, you'll see that over 70% of this test's runtime is spent in loading repository packs. Looking at the 'ncalls' column, you can see that schema.reset() is called 3 times, and ends up invoking loadPack() 6 times; the XML parsing underneath it (parseFile/SAXParseFile) is then recursively reinvoked a total of 204 times, indicating that each initial loadPack() causes 34, maybe 35 XML files to be loaded. And you can figure this out just from the profile.

However, if you go down the chart a little, you'll also see that 5.1 seconds is spent in code called from the unittest module itself, including 4.976 seconds in TestSchemaAPI.py itself. Where is this time coming from? The ":1(?)" at the end of the line tells us that this is the module initialization code, and most of the time in module initialization code is spent doing imports. This suggests that almost half of the profiled runtime is spent simply importing the repository and the libraries it uses, and that perhaps much of the loadPack() call time is attributable to importing as well.
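
One quick way to see how much of a profile like this is import time is to restrict the printout to module bodies. The restriction argument to print_stats() is treated as a regular expression matched against the filename:lineno(function) text, so something like this should list only the ':1(?)' entries (output omitted):

>>> s.sort_stats('cum').print_stats(r':1\(\?\)', 20)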

What does this tell us? Well, what it tells me is that profile_tests is broken. :) Including the import time in this profile is making it impossible to see what's really going on. Mostly, I've been using profile_tests to profile things that take so long as to make the import times moot, or else things where I *wanted* the import time included, like profiling Chandler startup. This example, however, makes it clear that unittest's gathering of tests needs to occur before the profiler is started, and I will be changing profile_tests to do exactly that. So, rerunning the statistics with the new and improved profile_tests program:

>>> s.sort_stats('cum').print_stats(20)
         238484 function calls (237037 primitive calls) in 4.221 CPU seconds

   Ordered by: cumulative time
   List reduced from 441 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         1    0.000    0.000    4.221    4.221 release\bin\lib\unittest.py:793(runTests)
         1    0.000    0.000    4.220    4.220 release\bin\lib\unittest.py:692(run)
         1    0.000    0.000    4.220    4.220 release\bin\lib\unittest.py:427(__call__)
         1    0.000    0.000    4.220    4.220 release\bin\lib\unittest.py:420(run)
         1    0.000    0.000    4.220    4.220 release\bin\lib\unittest.py:280(__call__)
         1    0.000    0.000    4.220    4.220 release\bin\lib\unittest.py:245(run)
         2    0.000    0.000    3.806    1.903 application\schema.py:1201(reset)
         4    0.000    0.000    3.700    0.925 repository\persistence\repositoryview.py:392(loadPack)
     136/4    0.005    0.000    3.698    0.924 repository\util\sax.py:183(parseFile)
     136/4    0.490    0.004    3.698    0.924 release\bin\lib\site-packages\libxml2.py:1357(SAXParseFile)
         2    0.000    0.000    2.826    1.413 repository\persistence\repositoryview.py:1053(__init__)
         2    0.000    0.000    2.825    1.413 repository\persistence\repositoryview.py:39(__init__)
         2    0.000    0.000    2.825    1.413 repository\persistence\repositoryview.py:92(openView)
         2    0.000    0.000    2.825    1.412 repository\persistence\repositoryview.py:116(_loadSchema)
        82    0.002    0.000    2.177    0.027 repository\persistence\packhandler.py:38(startElement)
        78    0.008    0.000    2.172    0.028 repository\persistence\packhandler.py:98(itemStart)
       132    0.017    0.000    2.152    0.016 repository\persistence\packhandler.py:144(loadItem)
         1    0.000    0.000    2.139    2.139 application\tests\testschemaapi.py:29(tearDown)
         1    0.000    0.000    1.861    1.861 application\tests\testschemaapi.py:26(setUp)
        82    0.002    0.000    1.502    0.018 repository\persistence\packhandler.py:52(endElement)

This gives us a different picture. Now, loadPack() is over 87% of the runtime, with the actual XML parsing (in libxml2.SAXParseFile) accounting for 10%. So, somewhere around 77% of the runtime is in processing the loaded pack data. Looking at the cumulative times for startElement and endElement -- 3.679 when put together -- appears to confirm this hypothesis.

(By the way, in case I didn't mention it earlier, the times shown in the profiler are *relative*, not real time. The test takes under 1 second in "real life", so you can't treat these times as having something to do with wall clock time.)
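
(If you do need numbers closer to reality, the profile module can at least be calibrated to subtract out some of its own overhead; roughly like this, per the profiler docs, although for the kind of relative analysis we're doing here it isn't worth the bother:)

import profile

# Measure the profiler's per-event overhead on this machine; 10000 is
# just a sample count -- bigger is slower but more accurate.
bias = profile.Profile().calibrate(10000)

# Use the measured bias for subsequent profiling runs.
p = profile.Profile(bias=bias)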

To go deeper, we can now begin to look at call trees:

>>> s.print_callees('startElement')
   Ordered by: internal time
   List reduced from 441 to 3 due to restriction <'startElement'>

Function
       called...
repository\item\itemhandler.py:753(startElement)
        repository\item\itemhandler.py:77(startDocument)(306)    0.007

        repository\item\itemhandler.py:85(startElement)(2434)    0.248

        repository\item\itemhandler.py:415(__init__)(306)    0.037

repository\item\itemhandler.py:85(startElement)
        repository\item\itemhandler.py:160(attributeStart)(396)    0.056

        repository\item\itemhandler.py:176(valueStart)(116)    0.033

        repository\item\itemhandler.py:427(refStart)(554)    0.081

        repository\item\itemhandler.py:468(itemStart)(306)    0.038

        repository\schema\types.py:1424(valueStart)(20)    0.001

        repository\util\sax.py:173(errorOccurred)(2434)    0.048

repository\persistence\packhandler.py:38(startElement)
        repository\persistence\packhandler.py:61(packStart)(4)    0.004

        repository\persistence\packhandler.py:98(itemStart)(78)    2.137

        repository\util\sax.py:173(errorOccurred)(82)    0.048


<pstats.Stats instance at 0x0257A5D0>
>>>

This actually gave more data than we wanted; we wanted the "startElement" function on line 38 of packhandler.py:

>>> s.print_callees('packhandler.py:38')
   Ordered by: internal time
   List reduced from 441 to 1 due to restriction <'packhandler.py:38'>

Function
       called...
repository\persistence\packhandler.py:38(startElement)
        repository\persistence\packhandler.py:61(packStart)(4)    0.004

        repository\persistence\packhandler.py:98(itemStart)(78)    2.137

        repository\util\sax.py:173(errorOccurred)(82)    0.048


<pstats.Stats instance at 0x0257A5D0>
>>>

Ah, that's better. print_callees() and print_callers() accept a string that's matched against the output lines, so you can use all or part of a name or filename/line # to narrow down the output. Anyway, we can now see that most of the time spent in startElement() is actually being spent in itemStart (line 98 of packhandler.py). And we can drill into that:

>>> s.print_callees('packhandler.py:98')
   Ordered by: internal time
   List reduced from 441 to 1 due to restriction <'packhandler.py:98'>

Function
    called...
repository\persistence\packhandler.py:98(itemStart)
     repository\persistence\packhandler.py:144(loadItem)(132)    2.114

     repository\persistence\repositoryview.py:282(find)(4)    0.114

     c:\cygwin\home\pje\c6\chandler\repository\util\path.py:19(__init__)(4)    0.032

     c:\cygwin\home\pje\c6\chandler\repository\util\sax.py:173(errorOccurred)(64)    0.048

     release\bin\lib\ntpath.py:59(join)(140)    0.011

     release\bin\lib\sre.py:178(compile)(6)    0.000


<pstats.Stats instance at 0x0257A5D0>
>>>

So, 2.114 seconds are spent in loadItem(), which is very close to the total time spent in startElement and is 50% of the total test execution time.
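
(A side note while we're staring at these long paths: a Stats object also has a strip_dirs() method that chops the leading directory names off, which makes all of these printouts easier to read, at the cost of possibly making same-named functions from different directories indistinguishable:)

>>> s.strip_dirs().sort_stats('cum').print_stats(10)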

Since I only chose this test because it's very, very short, and since I've now shown nearly all of the ways to use a stats object, I'm going to stop analyzing it here and transition to profiling the Chandler UI.

Profiling in the Chandler UI is pretty simple. From the "Test" menu select "Start Profiler", and then do whatever it is you want to profile. Then select "Stop Profiler". This creates a file called "Events.prof", which you can analyze by opening a Python shell from the "Test" menu, and then doing something like this (it helps if you maximize the window):

>>> from hotshot.stats import load
>>> s=load('Events.prof')
>>> s.sort_stats("cum").print_stats(20)
         187634 function calls (170697 primitive calls) in 3.675 CPU seconds

   Ordered by: cumulative time
   List reduced from 874 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    185/56    0.004    0.000    3.675    0.066 parcels\osaf\framework\blocks\block.py:563(callNamedMethod)
         1    0.001    0.001    3.666    3.666 parcels\osaf\views\main\sidebar.py:384(onKindParameterizedEvent)
       2/1    0.000    0.000    3.662    3.662 parcels\osaf\framework\blocks\block.py:122(postEventByName)
       2/1    0.000    0.000    3.662    3.662 parcels\osaf\framework\blocks\block.py:94(post)
       2/1    0.000    0.000    3.662    3.662 parcels\osaf\framework\blocks\block.py:539(dispatchEvent)
     129/1    0.005    0.000    3.662    3.662 parcels\osaf\framework\blocks\block.py:602(broadcast)
   129/126    0.002    0.000    3.639    0.029 parcels\osaf\framework\blocks\block.py:542(callProfiledMethod)
         1    0.000    0.000    3.600    3.600 parcels\osaf\framework\blocks\trunk.py:65(onSelectItemEvent)
       2/1    0.000    0.000    3.600    3.600 parcels\osaf\framework\blocks\trunk.py:33(wxSynchronizeWidget)
       2/1    0.001    0.000    3.598    3.598 parcels\osaf\framework\blocks\trunk.py:81(installTreeOfBlocks)
      68/1    0.031    0.000    3.521    3.521 parcels\osaf\framework\blocks\block.py:199(render)
 2698/1110    0.104    0.000    2.618    0.002 repository\util\linkedmap.py:389(__iter__)
   219/136    0.020    0.000    2.602    0.019 repository\item\refcollections.py:407(_load)
 4154/1175    0.069    0.000    2.561    0.002 repository\util\linkedmap.py:398(iterkeys)
 8791/1640    0.037    0.000    2.549    0.002 repository\util\linkedmap.py:124(_get)
   792/574    0.005    0.000    1.850    0.003 repository\persistence\repositoryview.py:921(_setLoading)
       140    0.003    0.000    1.722    0.012 repository\persistence\dbitemio.py:475(<lambda>)
       140    0.008    0.000    1.719    0.012 repository\schema\kind.py:63(_setupClass)
        20    0.014    0.001    1.711    0.086 repository\schema\kind.py:93(_setupDescriptors)
  1429/667    0.111    0.000    1.686    0.003 repository\schema\kind.py:418(iterAttributes)

As you can see, it's exactly the same procedure to manipulate the stats object as with profile_tests, but you need to start by loading the 'Events.prof' file.
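
(For the curious, the Start/Stop Profiler menu items presumably amount to something like the following hotshot usage; this is a guess at the general shape, not the actual Chandler code:)

import hotshot, hotshot.stats

prof = hotshot.Profile('Events.prof')   # "Start Profiler"
prof.start()
# ... whatever you do in the UI gets profiled here ...
prof.stop()                             # "Stop Profiler"
prof.close()

s = hotshot.stats.load('Events.prof')   # same kind of Stats object as before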

One last technique, which works with either way of getting profiling data... Suppose we wanted to find out who's calling __iter__ in linkedmap.py 1110 times. We can use 'print_callers()' instead of 'print_callees()':

>>> s.print_callers("linkedmap.py:389")
   Ordered by: internal time
   List reduced from 874 to 1 due to restriction <'linkedmap.py:389'>

Function
    was called by...
repository\util\linkedmap.py:389(__iter__)
    parcels\osaf\framework\blocks\block.py:199(render)(134)    3.521
    parcels\osaf\framework\blocks\block.py:267(unRender)(9)    0.018
    parcels\osaf\framework\blocks\block.py:467(pushView)(49)    0.499
    parcels\osaf\framework\blocks\block.py:602(broadcast)(258)    3.662
    parcels\osaf\framework\blocks\containerblocks.py:23(wxSynchronizeWidget)(228)    0.079
    parcels\osaf\framework\blocks\containerblocks.py:308(adjustSplit)(4)    0.002
    parcels\osaf\framework\blocks\containerblocks.py:345(wxSynchronizeWidget)(2)    0.007
    parcels\osaf\framework\blocks\detail\detail.py:90(synchronizeDetailView)(7)    0.111
    parcels\osaf\framework\blocks\detail\detail.py:105(reNotifyInside)(122)    0.108
    parcels\osaf\framework\blocks\detail\detail.py:203(showReentrant)(3)    0.000
    parcels\osaf\framework\blocks\menusandtoolbars.py:861(selectTool)(10)    0.001
    parcels\osaf\pim\items.py:516(getSharedState)(62)    0.004
    parcels\osaf\views\main\sidebar.py:384(onKindParameterizedEvent)(10)    3.666
    repository\item\query.py:60(_run)(24)    0.031
    repository\item\sets.py:106(_iterSource)(8)    0.019
    repository\schema\kind.py:56(onItemLoad)(44)    0.122
    repository\schema\kind.py:418(iterAttributes)(1678)    1.686
    repository\schema\kind.py:567(_kindOf)(46)    0.113


<pstats.Stats instance at 0x09A66BC0>
>>>

As you can see, the Python profiling tools are better than nothing, but on an absolute scale they sort of suck. The Python profiler hasn't received much development attention for many years; in fact, the stats facility is so old that it predates Python *packages*, which is why the output is pretty sucky. If we're going to be doing a lot of profiling, it might be worthwhile to invest some effort in creating better reporting/analysis tools to let us crunch the raw data better.
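
(If anyone wants to take a crack at that, the raw data isn't hard to get at: a Stats object keeps everything in a 'stats' dictionary -- undocumented, but stable in practice -- so a custom report is just a loop. A rough sketch, summing internal time per source file:)

# Stats.stats maps (filename, lineno, funcname) ->
#   (primitive calls, total calls, tottime, cumtime, callers)
per_file = {}
for (filename, lineno, funcname), (cc, nc, tt, ct, callers) in s.stats.items():
    per_file[filename] = per_file.get(filename, 0.0) + tt

# Ten most expensive files by internal time
for filename, tt in sorted(per_file.items(), key=lambda item: -item[1])[:10]:
    print "%8.3f  %s" % (tt, filename)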
