A while back, I added some stuff to the detail view to turn on/off
profiling for CPIA events. If you go to Test->Start Profiler, you'll
get a file called 'Events.prof' in the current directory. The nice
thing about this is that it ONLY logs CPIA events as they are
dispatched, and does not profile stuff like mouse movement and so
forth. You can profile events, turn off the profiler, then load up the
Events.prof file in a Python session; from there, you can do
everything Phillip describes...
However, I haven't used it in a while. I used to be able to say:
>>> from pstats import Stats
>>> s = Stats('Events.prof')
but now I get an exception; not sure what's up with that.
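The likely cause, judging from Phillip's instructions further down:
Events.prof is written by the hotshot profiler, and pstats.Stats can't
parse hotshot's log format directly; it has to go through
hotshot.stats.load, which converts it. For contrast, here is a minimal
sketch of the round trip pstats.Stats does handle, using cProfile's
dump format (the pure-Python profile module has the same dump_stats()
interface; work() here is just a placeholder):

```python
import cProfile
import os
import pstats
import tempfile

def work():
    # Placeholder workload, just so there is something to profile.
    return sum(i * i for i in range(1000))

# Dump a profile to disk, the way a "Stop Profiler" step would...
prof = cProfile.Profile()
prof.runcall(work)
path = os.path.join(tempfile.gettempdir(), 'Events.prof')
prof.dump_stats(path)

# ...and load it back.  This works because dump_stats() writes the
# marshalled format pstats expects; a hotshot log file does not.
s = pstats.Stats(path)
s.sort_stats('cumulative').print_stats(5)
```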
Alec
Phillip J. Eby wrote:
Due to frequent net outages during my attempt at an IRC
profiling clinic earlier this week, we didn't get to go through the
basic stats-printing functions that day. So Katie asked if I would
write up a summary of the stuff that didn't get covered, so that folks
could bring profiling data for part 2 of the clinic.
We had gotten as far as running Python with "-im profile_tests
testname" in order to profile one or more unit tests, and get into a
Python prompt with a stats object:
$ python -im profile_tests application.tests.TestSchemaAPI.SchemaTests.testDeriveFromCore
.
----------------------------------------------------------------------
Ran 1 test in 1.922s
OK
347794 function calls (345724 primitive calls) in 9.728 CPU seconds

   Ordered by: internal time
   List reduced from 627 to 10 due to restriction <10>

   ncalls    tottime  percall  cumtime  percall filename:lineno(function)
    204/6      1.738    0.009    6.966    1.161 release\bin\lib\site-packages\libxml2.py:1357(SAXParseFile)
        1      0.611    0.611    4.921    4.921 application\schema.py:1(?)
        1      0.475    0.475    0.486    0.486 repository\schema\types.py:2(?)
        1      0.262    0.262    0.400    0.400 application\parcel.py:4(?)
       36      0.259    0.007    0.331    0.009 repository\item\item.py:28(__init__)
4491/3539      0.219    0.000    1.928    0.001 repository\schema\kind.py:418(iterAttributes)
     3316      0.189    0.000    0.642    0.000 repository\util\linkedmap.py:156(__setitem__)
        1      0.173    0.173    0.175    0.175 repository\schema\kind.py:2(?)
      450      0.157    0.000    0.429    0.001 repository\util\classloader.py:11(loadClass)
    10459      0.155    0.000    0.223    0.000 repository\util\linkedmap.py:398(iterkeys)

>>>
At this point, the local variable 's' is a "Stats" object:
>>> s
<pstats.Stats instance at 0x025AC760>
The Stats class is documented in detail at:
http://python.org/doc/2.4.1/lib/profile-stats.html
But the key features you need to know about are 'sort_stats()' and
'print_stats()'. Here, I'll sort the calls by "cumulative time", and
print the top 20 functions/methods:
>>> s.sort_stats('cum').print_stats(20)
347794 function calls (345724 primitive calls) in 9.728 CPU seconds

   Ordered by: cumulative time
   List reduced from 627 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    9.728    9.728 <string>:1(?)
        1    0.000    0.000    9.728    9.728 release\bin\lib\unittest.py:743(__init__)
        3    0.000    0.000    7.433    2.478 application\schema.py:1201(reset)
        6    0.039    0.007    7.267    1.211 repository\persistence\repositoryview.py:392(loadPack)
    204/6    0.007    0.000    6.966    1.161 repository\util\sax.py:183(parseFile)
    204/6    1.738    0.009    6.966    1.161 release\bin\lib\site-packages\libxml2.py:1357(SAXParseFile)
        3    0.000    0.000    5.935    1.978 repository\persistence\repositoryview.py:1053(__init__)
        3    0.000    0.000    5.934    1.978 repository\persistence\repositoryview.py:39(__init__)
        3    0.000    0.000    5.934    1.978 repository\persistence\repositoryview.py:92(openView)
        3    0.001    0.000    5.933    1.978 repository\persistence\repositoryview.py:116(_loadSchema)
        1    0.046    0.046    5.104    5.104 release\bin\lib\unittest.py:766(parseArgs)
        1    0.000    0.000    5.058    5.058 release\bin\lib\unittest.py:789(createTests)
        1    0.000    0.000    5.058    5.058 release\bin\lib\unittest.py:552(loadTestsFromNames)
        1    0.082    0.082    5.058    5.058 release\bin\lib\unittest.py:510(loadTestsFromName)
        1    0.053    0.053    4.976    4.976 application\tests\testschemaapi.py:1(?)
        1    0.611    0.611    4.921    4.921 application\schema.py:1(?)
        1    0.000    0.000    4.623    4.623 release\bin\lib\unittest.py:793(runTests)
        1    0.000    0.000    4.623    4.623 release\bin\lib\unittest.py:692(run)
        1    0.000    0.000    4.623    4.623 release\bin\lib\unittest.py:427(__call__)
        1    0.000    0.000    4.623    4.623 release\bin\lib\unittest.py:420(run)

<pstats.Stats instance at 0x025AC760>
>>>
Details on how to read these printouts, by the way, can be found at:
http://python.org/doc/2.4.1/lib/module-profile.html
Different sort orders are useful for different kinds of analysis. If
you are trying to find out what code to speed up, a sort by "time" is
more useful, but I find that if you're trying to make your program
*faster* - i.e., make it run less code, then a cumulative sort is
usually more useful. Nonetheless, I always look at the "time" sort
first, because sometimes one or two functions are using the lion's
share of the program's run time, and you want to pay attention to
that. However, usually that lion's share isn't because the function is
slow, but because it's being run thousands or tens of thousands of
times. (This is one way that the 'percall' columns come in handy; if
the percall is 0.000, then the function is going to be hard to
optimize, and you need to look further up the call tree in order to
stop calling the function so often.)
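To make the two sort orders concrete, here's a minimal self-contained
sketch (inner() and outer() are made-up illustrations): outer() is
cheap itself but responsible for all the time spent in inner(), so the
'time' sort surfaces inner() while the 'cumulative' sort surfaces
outer().

```python
import cProfile
import pstats

# A toy call tree: outer() is cheap itself, but responsible for all of
# the time spent in inner() -- the classic time-vs-cumulative split.
def inner():
    return sum(i * i for i in range(2000))

def outer():
    return [inner() for _ in range(300)]

prof = cProfile.Profile()
prof.runcall(outer)
s = pstats.Stats(prof)

# Sorting by 'time' surfaces inner(), which does the actual work...
s.sort_stats('time').print_stats(5)

# ...while sorting by 'cumulative' surfaces outer(), which is
# responsible for inner() being called 300 times.
s.sort_stats('cumulative').print_stats(5)
```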
So, sorting by cumulative time usually gives you a better grasp on the
overall situation. If you look at the first stats printout above,
you'll get the impression that there are a couple of seconds spent on
XML processing, and that everything else is a quarter-second here, half
second there, no big deal. However, if you look at the second
printout, you'll see that over 70% of this test's runtime is spent in
loading repository packs. Looking at the 'ncalls' column, you can see
that schema.reset() is called 6 times, and ends up invoking loadPack()
6 times, but loadPack() is then recursively reinvoked a total of 204
times, indicating that each initial loadPack() causes 34, maybe 35 XML
files to be loaded. And you can figure this out just from the profile.
However, if you go down the chart a little, you'll also see that 5.1
seconds is spent in code called from the unittest module itself,
including 4.976 seconds in TestSchemaAPI.py itself. Where is this time
coming from? The ":1(?)" at the end of the line tells us that this is
the module initialization code, and most of the time in module
initialization code is spent doing imports. This suggests that almost
half of the profiled runtime is spent simply importing the repository
and the libraries it uses, and that perhaps much of the loadPack() call
time is attributable to importing as well.
What does this tell us? Well, what it tells me is that profile_tests
is broken. :) Including the import time in this profile is making it
impossible to see what's really going on. Mostly, I've been using
profile_tests to profile things that take so long as to make the import
times moot, or else things where I *wanted* the import time included,
like profiling Chandler startup. This example, however, makes it clear
that unittest's gathering of tests needs to occur before the profiler
is started, and I will be changing profile_tests to do exactly that.
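The shape of that fix, sketched generically (profile_tests itself is
Chandler-specific, so this stand-in just shows the ordering, with a
made-up test case): collect the unittest suite before the profiler
starts, so import and test-discovery time never enters the profile.

```python
import cProfile
import pstats
import unittest

class DemoTest(unittest.TestCase):
    # Stand-in for a real test; only the profiling order matters here.
    def test_sum(self):
        self.assertEqual(sum(range(10)), 45)

# 1. Gather the tests *before* the profiler starts; this is where the
#    imports and test discovery happen, so none of it gets profiled.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DemoTest)
runner = unittest.TextTestRunner(verbosity=0)

# 2. Profile only the actual run() call.
prof = cProfile.Profile()
result = prof.runcall(runner.run, suite)

s = pstats.Stats(prof)
s.sort_stats('cumulative').print_stats(10)
```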
So, rerunning the statistics with the new and improved profile_tests
program:
>>> s.sort_stats('cum').print_stats(20)
238484 function calls (237037 primitive calls) in 4.221 CPU seconds

   Ordered by: cumulative time
   List reduced from 441 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    4.221    4.221 release\bin\lib\unittest.py:793(runTests)
        1    0.000    0.000    4.220    4.220 release\bin\lib\unittest.py:692(run)
        1    0.000    0.000    4.220    4.220 release\bin\lib\unittest.py:427(__call__)
        1    0.000    0.000    4.220    4.220 release\bin\lib\unittest.py:420(run)
        1    0.000    0.000    4.220    4.220 release\bin\lib\unittest.py:280(__call__)
        1    0.000    0.000    4.220    4.220 release\bin\lib\unittest.py:245(run)
        2    0.000    0.000    3.806    1.903 application\schema.py:1201(reset)
        4    0.000    0.000    3.700    0.925 repository\persistence\repositoryview.py:392(loadPack)
    136/4    0.005    0.000    3.698    0.924 repository\util\sax.py:183(parseFile)
    136/4    0.490    0.004    3.698    0.924 release\bin\lib\site-packages\libxml2.py:1357(SAXParseFile)
        2    0.000    0.000    2.826    1.413 repository\persistence\repositoryview.py:1053(__init__)
        2    0.000    0.000    2.825    1.413 repository\persistence\repositoryview.py:39(__init__)
        2    0.000    0.000    2.825    1.413 repository\persistence\repositoryview.py:92(openView)
        2    0.000    0.000    2.825    1.412 repository\persistence\repositoryview.py:116(_loadSchema)
       82    0.002    0.000    2.177    0.027 repository\persistence\packhandler.py:38(startElement)
       78    0.008    0.000    2.172    0.028 repository\persistence\packhandler.py:98(itemStart)
      132    0.017    0.000    2.152    0.016 repository\persistence\packhandler.py:144(loadItem)
        1    0.000    0.000    2.139    2.139 application\tests\testschemaapi.py:29(tearDown)
        1    0.000    0.000    1.861    1.861 application\tests\testschemaapi.py:26(setUp)
       82    0.002    0.000    1.502    0.018 repository\persistence\packhandler.py:52(endElement)
This gives us a different picture. Now, loadPack() is over 87% of the
runtime, with the actual XML parsing (in libxml2.SAXParseFile)
accounting for 10%. So, somewhere around 77% of the runtime is in
processing the loaded pack data. The combined cumulative times for
startElement and endElement -- 3.679 seconds between them -- appear to
confirm this hypothesis.
(By the way, in case I didn't mention it earlier, the times shown in
the profiler are *relative*, not real time. The test takes under 1
second in "real life", so you can't treat these times as having
something to do with wall clock time.)
To go deeper, we can now begin to look at call trees:
>>> s.print_callees('startElement')
   Ordered by: internal time
   List reduced from 441 to 3 due to restriction <'startElement'>

Function                                                 called...
repository\item\itemhandler.py:753(startElement)
    repository\item\itemhandler.py:77(startDocument)(306)      0.007
    repository\item\itemhandler.py:85(startElement)(2434)      0.248
    repository\item\itemhandler.py:415(__init__)(306)          0.037
repository\item\itemhandler.py:85(startElement)
    repository\item\itemhandler.py:160(attributeStart)(396)    0.056
    repository\item\itemhandler.py:176(valueStart)(116)        0.033
    repository\item\itemhandler.py:427(refStart)(554)          0.081
    repository\item\itemhandler.py:468(itemStart)(306)         0.038
    repository\schema\types.py:1424(valueStart)(20)            0.001
    repository\util\sax.py:173(errorOccurred)(2434)            0.048
repository\persistence\packhandler.py:38(startElement)
    repository\persistence\packhandler.py:61(packStart)(4)     0.004
    repository\persistence\packhandler.py:98(itemStart)(78)    2.137
    repository\util\sax.py:173(errorOccurred)(82)              0.048
<pstats.Stats instance at 0x0257A5D0>
>>>
This actually gave more data than we wanted; we wanted the
"startElement" function on line 38 of packhandler.py:
>>> s.print_callees('packhandler.py:38')
   Ordered by: internal time
   List reduced from 441 to 1 due to restriction <'packhandler.py:38'>

Function                                                 called...
repository\persistence\packhandler.py:38(startElement)
    repository\persistence\packhandler.py:61(packStart)(4)     0.004
    repository\persistence\packhandler.py:98(itemStart)(78)    2.137
    repository\util\sax.py:173(errorOccurred)(82)              0.048
<pstats.Stats instance at 0x0257A5D0>
>>>
Ah, that's better. print_callees() and print_callers() accept a string
that's matched against the output lines, so you can use all or part of
a name or filename/line # to narrow down the output. Anyway, we can
now see that most of the time spent in startElement() is actually being
spent in itemStart (line 98 of packhandler.py). And we can drill into
that:
>>> s.print_callees('packhandler.py:98')
   Ordered by: internal time
   List reduced from 441 to 1 due to restriction <'packhandler.py:98'>

Function                                                 called...
repository\persistence\packhandler.py:98(itemStart)
    repository\persistence\packhandler.py:144(loadItem)(132)                        2.114
    repository\persistence\repositoryview.py:282(find)(4)                           0.114
    c:\cygwin\home\pje\c6\chandler\repository\util\path.py:19(__init__)(4)          0.032
    c:\cygwin\home\pje\c6\chandler\repository\util\sax.py:173(errorOccurred)(64)    0.048
    release\bin\lib\ntpath.py:59(join)(140)                                         0.011
    release\bin\lib\sre.py:178(compile)(6)                                          0.000
<pstats.Stats instance at 0x0257A5D0>
>>>
So, 2.114 seconds are spent in loadItem(), which is very close to the
total time spent in startElement and is 50% of the total test execution
time.
Since I only chose to analyze this test because it's very very short,
and because I've now shown nearly all of the ways to use a stats
object, I'm going to stop analyzing it here, and transition to
profiling the Chandler UI.
Profiling in the Chandler UI is pretty simple. From the "Test" menu
select "Start Profiler", and then do whatever it is you want to
profile. Then select "Stop Profiler". This creates a file called
"Events.prof", which you can analyze by opening a Python shell from the
"Test" menu, and then doing something like this (it helps if you
maximize the window):
>>> from hotshot.stats import load
>>> s=load('Events.prof')
>>> s.sort_stats("cum").print_stats(20)
187634 function calls (170697 primitive calls) in 3.675 CPU seconds

   Ordered by: cumulative time
   List reduced from 874 to 20 due to restriction <20>

   ncalls    tottime  percall  cumtime  percall filename:lineno(function)
   185/56      0.004    0.000    3.675    0.066 parcels\osaf\framework\blocks\block.py:563(callNamedMethod)
        1      0.001    0.001    3.666    3.666 parcels\osaf\views\main\sidebar.py:384(onKindParameterizedEvent)
      2/1      0.000    0.000    3.662    3.662 parcels\osaf\framework\blocks\block.py:122(postEventByName)
      2/1      0.000    0.000    3.662    3.662 parcels\osaf\framework\blocks\block.py:94(post)
      2/1      0.000    0.000    3.662    3.662 parcels\osaf\framework\blocks\block.py:539(dispatchEvent)
    129/1      0.005    0.000    3.662    3.662 parcels\osaf\framework\blocks\block.py:602(broadcast)
  129/126      0.002    0.000    3.639    0.029 parcels\osaf\framework\blocks\block.py:542(callProfiledMethod)
        1      0.000    0.000    3.600    3.600 parcels\osaf\framework\blocks\trunk.py:65(onSelectItemEvent)
      2/1      0.000    0.000    3.600    3.600 parcels\osaf\framework\blocks\trunk.py:33(wxSynchronizeWidget)
      2/1      0.001    0.000    3.598    3.598 parcels\osaf\framework\blocks\trunk.py:81(installTreeOfBlocks)
     68/1      0.031    0.000    3.521    3.521 parcels\osaf\framework\blocks\block.py:199(render)
2698/1110      0.104    0.000    2.618    0.002 repository\util\linkedmap.py:389(__iter__)
  219/136      0.020    0.000    2.602    0.019 repository\item\refcollections.py:407(_load)
4154/1175      0.069    0.000    2.561    0.002 repository\util\linkedmap.py:398(iterkeys)
8791/1640      0.037    0.000    2.549    0.002 repository\util\linkedmap.py:124(_get)
  792/574      0.005    0.000    1.850    0.003 repository\persistence\repositoryview.py:921(_setLoading)
      140      0.003    0.000    1.722    0.012 repository\persistence\dbitemio.py:475(<lambda>)
      140      0.008    0.000    1.719    0.012 repository\schema\kind.py:63(_setupClass)
       20      0.014    0.001    1.711    0.086 repository\schema\kind.py:93(_setupDescriptors)
 1429/667      0.111    0.000    1.686    0.003 repository\schema\kind.py:418(iterAttributes)
As you can see, it's exactly the same procedure to manipulate the stats
object as with profile_tests, but you need to start by loading the
'Events.prof' file.
One last technique, which works with either way of getting profiling
data... Suppose we wanted to find out who's calling __iter__ in
linkedmap.py 1100 times. We can use 'print_callers()' instead of
'print_callees':
>>> s.print_callers("linkedmap.py:389")
   Ordered by: internal time
   List reduced from 874 to 1 due to restriction <'linkedmap.py:389'>

Function                                                 was called by...
repository\util\linkedmap.py:389(__iter__)
    parcels\osaf\framework\blocks\block.py:199(render)(134)                           3.521
    parcels\osaf\framework\blocks\block.py:267(unRender)(9)                           0.018
    parcels\osaf\framework\blocks\block.py:467(pushView)(49)                          0.499
    parcels\osaf\framework\blocks\block.py:602(broadcast)(258)                        3.662
    parcels\osaf\framework\blocks\containerblocks.py:23(wxSynchronizeWidget)(228)     0.079
    parcels\osaf\framework\blocks\containerblocks.py:308(adjustSplit)(4)              0.002
    parcels\osaf\framework\blocks\containerblocks.py:345(wxSynchronizeWidget)(2)      0.007
    parcels\osaf\framework\blocks\detail\detail.py:90(synchronizeDetailView)(7)       0.111
    parcels\osaf\framework\blocks\detail\detail.py:105(reNotifyInside)(122)           0.108
    parcels\osaf\framework\blocks\detail\detail.py:203(showReentrant)(3)              0.000
    parcels\osaf\framework\blocks\menusandtoolbars.py:861(selectTool)(10)             0.001
    parcels\osaf\pim\items.py:516(getSharedState)(62)                                 0.004
    parcels\osaf\views\main\sidebar.py:384(onKindParameterizedEvent)(10)              3.666
    repository\item\query.py:60(_run)(24)                                             0.031
    repository\item\sets.py:106(_iterSource)(8)                                       0.019
    repository\schema\kind.py:56(onItemLoad)(44)                                      0.122
    repository\schema\kind.py:418(iterAttributes)(1678)                               1.686
    repository\schema\kind.py:567(_kindOf)(46)                                        0.113
<pstats.Stats instance at 0x09A66BC0>
>>>
As you can see, compared to not having any data, the Python profiling
tools are better than nothing, but on an absolute scale they sort of
suck. The Python profiler hasn't received much development attention
for many years; in fact the stats facility is so old that it goes back
to before Python had *packages*, so the output is pretty sucky. If
we're going to be doing a lot of profiling, it might be worthwhile to
invest some effort in creating better reporting/analysis tools to let
us crunch the raw data better.
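As a starting point for such tools, the raw numbers are already
accessible: a Stats object keeps them in its .stats dictionary, keyed
by (filename, lineno, funcname). A sketch (the CSV layout here is just
one possible choice, and work() is a placeholder) of exporting them
for analysis elsewhere:

```python
import cProfile
import csv
import io
import pstats

def work():
    # Placeholder workload.
    return sorted(range(1000), reverse=True)

prof = cProfile.Profile()
prof.runcall(work)
s = pstats.Stats(prof)

# Each entry in s.stats maps (filename, lineno, funcname) to
# (primitive call count, total call count, total time, cumulative
# time, callers dict) -- everything the printed reports are built from.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['file', 'line', 'function', 'ncalls', 'tottime', 'cumtime'])
for (fname, lineno, func), (cc, nc, tt, ct, callers) in s.stats.items():
    writer.writerow([fname, lineno, func, nc, tt, ct])

print(buf.getvalue().splitlines()[0])
```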
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "Dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/dev