A while back, I added some stuff to the detail view to turn on/off
profiling for CPIA events. If you go to Test->Start Profiler, you'll
get a file called 'Events.prof' in the current directory. The nice
thing about this is that it ONLY logs CPIA events as they are
dispatched, and does not profile stuff like mouse movement and so
forth. You can profile events, turn off the profiler, then load up the
Events.prof file in a Python session; from there, you can do
everything Phillip describes...
However, I haven't used it in a while. I used to be able to say:
>>> from pstats import Stats
>>> s = Stats('Events.prof')
but now I get an exception; not sure what's up with that.
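The likely cause, judging from Phillip's instructions further down:
Events.prof is written by the hotshot profiler, and pstats.Stats can't
parse hotshot's log format directly; it has to go through
hotshot.stats.load, which converts it. For contrast, here is a minimal
sketch of the round trip pstats.Stats does handle, using cProfile's
dump format (the pure-Python profile module has the same dump_stats()
interface; work() here is just a placeholder):

```python
import cProfile
import os
import pstats
import tempfile

def work():
    # Placeholder workload, just so there is something to profile.
    return sum(i * i for i in range(1000))

# Dump a profile to disk, the way a "Stop Profiler" step would...
prof = cProfile.Profile()
prof.runcall(work)
path = os.path.join(tempfile.gettempdir(), 'Events.prof')
prof.dump_stats(path)

# ...and load it back.  This works because dump_stats() writes the
# marshalled format pstats expects; a hotshot log file does not.
s = pstats.Stats(path)
s.sort_stats('cumulative').print_stats(5)
```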
Alec
Phillip J. Eby wrote:
Due to frequent net outages during my attempt at an IRC
profiling clinic earlier this week, we didn't get to go through the
basic stats-printing functions that day. So Katie asked if I would
write up a summary of the stuff that didn't get covered, so that folks
could bring profiling data for part 2 of the clinic.
We had gotten as far as running Python with "-im profile_tests
testname" in order to profile one or more unit tests, and get into a
Python prompt with a stats object:
$ python -im profile_tests application.tests.TestSchemaAPI.SchemaTests.testDeriveFromCore
.
----------------------------------------------------------------------
Ran 1 test in 1.922s
OK
347794 function calls (345724 primitive calls) in 9.728 CPU seconds

   Ordered by: internal time
   List reduced from 627 to 10 due to restriction <10>

   ncalls    tottime  percall  cumtime  percall filename:lineno(function)
    204/6      1.738    0.009    6.966    1.161 release\bin\lib\site-packages\libxml2.py:1357(SAXParseFile)
        1      0.611    0.611    4.921    4.921 application\schema.py:1(?)
        1      0.475    0.475    0.486    0.486 repository\schema\types.py:2(?)
        1      0.262    0.262    0.400    0.400 application\parcel.py:4(?)
       36      0.259    0.007    0.331    0.009 repository\item\item.py:28(__init__)
4491/3539      0.219    0.000    1.928    0.001 repository\schema\kind.py:418(iterAttributes)
     3316      0.189    0.000    0.642    0.000 repository\util\linkedmap.py:156(__setitem__)
        1      0.173    0.173    0.175    0.175 repository\schema\kind.py:2(?)
      450      0.157    0.000    0.429    0.001 repository\util\classloader.py:11(loadClass)
    10459      0.155    0.000    0.223    0.000 repository\util\linkedmap.py:398(iterkeys)

>>>
At this point, the local variable 's' is a "Stats" object:
>>> s
<pstats.Stats instance at 0x025AC760>
The Stats class is documented in detail at:
http://python.org/doc/2.4.1/lib/profile-stats.html
But the key features you need to know about are 'sort_stats()' and
'print_stats()'. Here, I'll sort the calls by "cumulative time", and
print the top 20 functions/methods:
>>> s.sort_stats('cum').print_stats(20)
347794 function calls (345724 primitive calls) in 9.728 CPU seconds

   Ordered by: cumulative time
   List reduced from 627 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    9.728    9.728 <string>:1(?)
        1    0.000    0.000    9.728    9.728 release\bin\lib\unittest.py:743(__init__)
        3    0.000    0.000    7.433    2.478 application\schema.py:1201(reset)
        6    0.039    0.007    7.267    1.211 repository\persistence\repositoryview.py:392(loadPack)
    204/6    0.007    0.000    6.966    1.161 repository\util\sax.py:183(parseFile)
    204/6    1.738    0.009    6.966    1.161 release\bin\lib\site-packages\libxml2.py:1357(SAXParseFile)
        3    0.000    0.000    5.935    1.978 repository\persistence\repositoryview.py:1053(__init__)
        3    0.000    0.000    5.934    1.978 repository\persistence\repositoryview.py:39(__init__)
        3    0.000    0.000    5.934    1.978 repository\persistence\repositoryview.py:92(openView)
        3    0.001    0.000    5.933    1.978 repository\persistence\repositoryview.py:116(_loadSchema)
        1    0.046    0.046    5.104    5.104 release\bin\lib\unittest.py:766(parseArgs)
        1    0.000    0.000    5.058    5.058 release\bin\lib\unittest.py:789(createTests)
        1    0.000    0.000    5.058    5.058 release\bin\lib\unittest.py:552(loadTestsFromNames)
        1    0.082    0.082    5.058    5.058 release\bin\lib\unittest.py:510(loadTestsFromName)
        1    0.053    0.053    4.976    4.976 application\tests\testschemaapi.py:1(?)
        1    0.611    0.611    4.921    4.921 application\schema.py:1(?)
        1    0.000    0.000    4.623    4.623 release\bin\lib\unittest.py:793(runTests)
        1    0.000    0.000    4.623    4.623 release\bin\lib\unittest.py:692(run)
        1    0.000    0.000    4.623    4.623 release\bin\lib\unittest.py:427(__call__)
        1    0.000    0.000    4.623    4.623 release\bin\lib\unittest.py:420(run)

<pstats.Stats instance at 0x025AC760>
>>>
Details on how to read these printouts, by the way, can be found at:
http://python.org/doc/2.4.1/lib/module-profile.html
Different sort orders are useful for different kinds of analysis. If
you are trying to find out what code to speed up, a sort by "time" is
more useful, but I find that if you're trying to make your program
*faster* - i.e., make it run less code, then a cumulative sort is
usually more useful. Nonetheless, I always look at the "time" sort
first, because sometimes one or two functions are using the lion's
share of the program's run time, and you want to pay attention to
that. However, usually that lion's share isn't because the function is
slow, but because it's being run thousands or tens of thousands of
times. (This is one way that the 'percall' columns come in handy; if
the percall is 0.000, then the function is going to be hard to
optimize, and you need to look further up the call tree in order to
stop calling the function so often.)
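To make the two sort orders concrete, here's a minimal self-contained
sketch (inner() and outer() are made-up illustrations): outer() is
cheap itself but responsible for all the time spent in inner(), so the
'time' sort surfaces inner() while the 'cumulative' sort surfaces
outer().

```python
import cProfile
import pstats

# A toy call tree: outer() is cheap itself, but responsible for all of
# the time spent in inner() -- the classic time-vs-cumulative split.
def inner():
    return sum(i * i for i in range(2000))

def outer():
    return [inner() for _ in range(300)]

prof = cProfile.Profile()
prof.runcall(outer)
s = pstats.Stats(prof)

# Sorting by 'time' surfaces inner(), which does the actual work...
s.sort_stats('time').print_stats(5)

# ...while sorting by 'cumulative' surfaces outer(), which is
# responsible for inner() being called 300 times.
s.sort_stats('cumulative').print_stats(5)
```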
So, sorting by cumulative time usually gives you a better grasp on the
overall situation. If you look at the first stats printout above,
you'll get the impression that there are a couple of seconds spent on
XML processing, and that everything else is a quarter-second here, half
second there, no big deal. However, if you look at the second
printout, you'll see that over 70% of this test's runtime is spent in
loading repository packs. Looking at the 'ncalls' column, you can see
that schema.reset() is called 6 times, and ends up invoking loadPack()
6 times, but loadPack() is then recursively reinvoked a total of 204
times, indicating that each initial loadPack() causes 34, maybe 35 XML
files to be loaded. And you can figure this out just from the profile.
However, if you go down the chart a little, you'll also see that 5.1
seconds is spent in code called from the unittest module itself,
including 4.976 seconds in TestSchemaAPI.py itself. Where is this time
coming from? The ":1(?)" at the end of the line tells us that this is
the module initialization code, and most of the time in module
initialization code is spent doing imports. This suggests that almost
half of the profiled runtime is spent simply importing the repository
and the libraries it uses, and that perhaps much of the loadPack() call
time is attributable to importing as well.
What does this tell us? Well, what it tells me is that profile_tests
is broken. :) Including the import time in this profile is making it
impossible to see what's really going on. Mostly, I've been using
profile_tests to profile things that take so long as to make the import
times moot, or else things where I *wanted* the import time included,
like profiling Chandler startup. This example, however, makes it clear
that unittest's gathering of tests needs to occur before the profiler
is started, and I will be changing profile_tests to do exactly that.
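The shape of that fix, sketched generically (profile_tests itself is
Chandler-specific, so this stand-in just shows the ordering, with a
made-up test case): collect the unittest suite before the profiler
starts, so import and test-discovery time never enters the profile.

```python
import cProfile
import pstats
import unittest

class DemoTest(unittest.TestCase):
    # Stand-in for a real test; only the profiling order matters here.
    def test_sum(self):
        self.assertEqual(sum(range(10)), 45)

# 1. Gather the tests *before* the profiler starts; this is where the
#    imports and test discovery happen, so none of it gets profiled.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DemoTest)
runner = unittest.TextTestRunner(verbosity=0)

# 2. Profile only the actual run() call.
prof = cProfile.Profile()
result = prof.runcall(runner.run, suite)

s = pstats.Stats(prof)
s.sort_stats('cumulative').print_stats(10)
```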
So, rerunning the statistics with the new and improved profile_tests
program:
>>> s.sort_stats('cum').print_stats(20)
238484 function calls (237037 primitive calls) in 4.221 CPU seconds

   Ordered by: cumulative time
   List reduced from 441 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    4.221    4.221 release\bin\lib\unittest.py:793(runTests)
        1    0.000    0.000    4.220    4.220 release\bin\lib\unittest.py:692(run)
        1    0.000    0.000    4.220    4.220 release\bin\lib\unittest.py:427(__call__)
        1    0.000    0.000    4.220    4.220 release\bin\lib\unittest.py:420(run)
        1    0.000    0.000    4.220    4.220 release\bin\lib\unittest.py:280(__call__)
        1    0.000    0.000    4.220    4.220 release\bin\lib\unittest.py:245(run)
        2    0.000    0.000    3.806    1.903 application\schema.py:1201(reset)
        4    0.000    0.000    3.700    0.925 repository\persistence\repositoryview.py:392(loadPack)
    136/4    0.005    0.000    3.698    0.924 repository\util\sax.py:183(parseFile)
    136/4    0.490    0.004    3.698    0.924 release\bin\lib\site-packages\libxml2.py:1357(SAXParseFile)
        2    0.000    0.000    2.826    1.413 repository\persistence\repositoryview.py:1053(__init__)
        2    0.000    0.000    2.825    1.413 repository\persistence\repositoryview.py:39(__init__)
        2    0.000    0.000    2.825    1.413 repository\persistence\repositoryview.py:92(openView)
        2    0.000    0.000    2.825    1.412 repository\persistence\repositoryview.py:116(_loadSchema)
       82    0.002    0.000    2.177    0.027 repository\persistence\packhandler.py:38(startElement)
       78    0.008    0.000    2.172    0.028 repository\persistence\packhandler.py:98(itemStart)
      132    0.017    0.000    2.152    0.016 repository\persistence\packhandler.py:144(loadItem)
        1    0.000    0.000    2.139    2.139 application\tests\testschemaapi.py:29(tearDown)
        1    0.000    0.000    1.861    1.861 application\tests\testschemaapi.py:26(setUp)
       82    0.002    0.000    1.502    0.018 repository\persistence\packhandler.py:52(endElement)
This gives us a different picture. Now, loadPack() is over 87% of the
runtime, with the actual XML parsing (in libxml2.SAXParseFile)
accounting for 10%. So, somewhere around 77% of the runtime is in
processing the loaded pack data. The combined cumulative times for
startElement and endElement -- 3.679 seconds between them -- appear to
confirm this hypothesis.
(By the way, in case I didn't mention it earlier, the times shown in
the profiler are *relative*, not real time. The test takes under 1
second in "real life", so you can't treat these times as having
something to do with wall clock time.)
To go deeper, we can now begin to look at call trees:
>>> s.print_callees('startElement')
   Ordered by: internal time
   List reduced from 441 to 3 due to restriction <'startElement'>

Function                                                 called...
repository\item\itemhandler.py:753(startElement)
    repository\item\itemhandler.py:77(startDocument)(306)      0.007
    repository\item\itemhandler.py:85(startElement)(2434)      0.248
    repository\item\itemhandler.py:415(__init__)(306)          0.037
repository\item\itemhandler.py:85(startElement)
    repository\item\itemhandler.py:160(attributeStart)(396)    0.056
    repository\item\itemhandler.py:176(valueStart)(116)        0.033
    repository\item\itemhandler.py:427(refStart)(554)          0.081
    repository\item\itemhandler.py:468(itemStart)(306)         0.038
    repository\schema\types.py:1424(valueStart)(20)            0.001
    repository\util\sax.py:173(errorOccurred)(2434)            0.048
repository\persistence\packhandler.py:38(startElement)
    repository\persistence\packhandler.py:61(packStart)(4)     0.004
    repository\persistence\packhandler.py:98(itemStart)(78)    2.137
    repository\util\sax.py:173(errorOccurred)(82)              0.048
<pstats.Stats instance at 0x0257A5D0>
>>>
This actually gave more data than we wanted; we wanted the
"startElement" function on line 38 of packhandler.py:
>>> s.print_callees('packhandler.py:38')
   Ordered by: internal time
   List reduced from 441 to 1 due to restriction <'packhandler.py:38'>

Function                                                 called...
repository\persistence\packhandler.py:38(startElement)
    repository\persistence\packhandler.py:61(packStart)(4)     0.004
    repository\persistence\packhandler.py:98(itemStart)(78)    2.137
    repository\util\sax.py:173(errorOccurred)(82)              0.048
<pstats.Stats instance at 0x0257A5D0>
>>>
Ah, that's better. print_callees() and print_callers() accept a string
that's matched against the output lines, so you can use all or part of
a name or filename/line # to narrow down the output. Anyway, we can
now see that most of the time spent in startElement() is actually being
spent in itemStart (line 98 of packhandler.py). And we can drill into
that:
>>> s.print_callees('packhandler.py:98')
   Ordered by: internal time
   List reduced from 441 to 1 due to restriction <'packhandler.py:98'>

Function                                                 called...
repository\persistence\packhandler.py:98(itemStart)
    repository\persistence\packhandler.py:144(loadItem)(132)                        2.114
    repository\persistence\repositoryview.py:282(find)(4)                           0.114
    c:\cygwin\home\pje\c6\chandler\repository\util\path.py:19(__init__)(4)          0.032
    c:\cygwin\home\pje\c6\chandler\repository\util\sax.py:173(errorOccurred)(64)    0.048
    release\bin\lib\ntpath.py:59(join)(140)                                         0.011
    release\bin\lib\sre.py:178(compile)(6)                                          0.000
<pstats.Stats instance at 0x0257A5D0>
>>>
So, 2.114 seconds are spent in loadItem(), which is very close to the
total time spent in startElement and is 50% of the total test execution
time.
Since I only chose to analyze this test because it's very very short,
and because I've now shown nearly all of the ways to use a stats
object, I'm going to stop analyzing it here, and transition to
profiling the Chandler UI.
Profiling in the Chandler UI is pretty simple. From the "Test" menu
select "Start Profiler", and then do whatever it is you want to
profile. Then select "Stop Profiler". This creates a file called
"Events.prof", which you can analyze by opening a Python shell from the
"Test" menu, and then doing something like this (it helps if you
maximize the window):
>>> from hotshot.stats import load
>>> s=load('Events.prof')
>>> s.sort_stats("cum").print_stats(20)
187634 function calls (170697 primitive calls) in 3.675 CPU seconds

   Ordered by: cumulative time
   List reduced from 874 to 20 due to restriction <20>

   ncalls    tottime  percall  cumtime  percall filename:lineno(function)
   185/56      0.004    0.000    3.675    0.066 parcels\osaf\framework\blocks\block.py:563(callNamedMethod)
        1      0.001    0.001    3.666    3.666 parcels\osaf\views\main\sidebar.py:384(onKindParameterizedEvent)
      2/1      0.000    0.000    3.662    3.662 parcels\osaf\framework\blocks\block.py:122(postEventByName)
      2/1      0.000    0.000    3.662    3.662 parcels\osaf\framework\blocks\block.py:94(post)
      2/1      0.000    0.000    3.662    3.662 parcels\osaf\framework\blocks\block.py:539(dispatchEvent)
    129/1      0.005    0.000    3.662    3.662 parcels\osaf\framework\blocks\block.py:602(broadcast)
  129/126      0.002    0.000    3.639    0.029 parcels\osaf\framework\blocks\block.py:542(callProfiledMethod)
        1      0.000    0.000    3.600    3.600 parcels\osaf\framework\blocks\trunk.py:65(onSelectItemEvent)
      2/1      0.000    0.000    3.600    3.600 parcels\osaf\framework\blocks\trunk.py:33(wxSynchronizeWidget)
      2/1      0.001    0.000    3.598    3.598 parcels\osaf\framework\blocks\trunk.py:81(installTreeOfBlocks)
     68/1      0.031    0.000    3.521    3.521 parcels\osaf\framework\blocks\block.py:199(render)
2698/1110      0.104    0.000    2.618    0.002 repository\util\linkedmap.py:389(__iter__)
  219/136      0.020    0.000    2.602    0.019 repository\item\refcollections.py:407(_load)
4154/1175      0.069    0.000    2.561    0.002 repository\util\linkedmap.py:398(iterkeys)
8791/1640      0.037    0.000    2.549    0.002 repository\util\linkedmap.py:124(_get)
  792/574      0.005    0.000    1.850    0.003 repository\persistence\repositoryview.py:921(_setLoading)
      140      0.003    0.000    1.722    0.012 repository\persistence\dbitemio.py:475(<lambda>)
      140      0.008    0.000    1.719    0.012 repository\schema\kind.py:63(_setupClass)
       20      0.014    0.001    1.711    0.086 repository\schema\kind.py:93(_setupDescriptors)
 1429/667      0.111    0.000    1.686    0.003 repository\schema\kind.py:418(iterAttributes)
As you can see, it's exactly the same procedure to manipulate the stats
object as with profile_tests, but you need to start by loading the
'Events.prof' file.
One last technique, which works with either way of getting profiling
data... Suppose we wanted to find out who's calling __iter__ in
linkedmap.py 1100 times. We can use 'print_callers()' instead of
'print_callees':
>>> s.print_callers("linkedmap.py:389")
   Ordered by: internal time
   List reduced from 874 to 1 due to restriction <'linkedmap.py:389'>

Function                                                 was called by...
repository\util\linkedmap.py:389(__iter__)
    parcels\osaf\framework\blocks\block.py:199(render)(134)                           3.521
    parcels\osaf\framework\blocks\block.py:267(unRender)(9)                           0.018
    parcels\osaf\framework\blocks\block.py:467(pushView)(49)                          0.499
    parcels\osaf\framework\blocks\block.py:602(broadcast)(258)                        3.662
    parcels\osaf\framework\blocks\containerblocks.py:23(wxSynchronizeWidget)(228)     0.079
    parcels\osaf\framework\blocks\containerblocks.py:308(adjustSplit)(4)              0.002
    parcels\osaf\framework\blocks\containerblocks.py:345(wxSynchronizeWidget)(2)      0.007
    parcels\osaf\framework\blocks\detail\detail.py:90(synchronizeDetailView)(7)       0.111
    parcels\osaf\framework\blocks\detail\detail.py:105(reNotifyInside)(122)           0.108
    parcels\osaf\framework\blocks\detail\detail.py:203(showReentrant)(3)              0.000
    parcels\osaf\framework\blocks\menusandtoolbars.py:861(selectTool)(10)             0.001
    parcels\osaf\pim\items.py:516(getSharedState)(62)                                 0.004
    parcels\osaf\views\main\sidebar.py:384(onKindParameterizedEvent)(10)              3.666
    repository\item\query.py:60(_run)(24)                                             0.031
    repository\item\sets.py:106(_iterSource)(8)                                       0.019
    repository\schema\kind.py:56(onItemLoad)(44)                                      0.122
    repository\schema\kind.py:418(iterAttributes)(1678)                               1.686
    repository\schema\kind.py:567(_kindOf)(46)                                        0.113
<pstats.Stats instance at 0x09A66BC0>
>>>
As you can see, compared to not having any data, the Python profiling
tools are better than nothing, but on an absolute scale they sort of
suck. The Python profiler hasn't received much development attention
for many years; in fact the stats facility is so old that it goes back
to before Python had *packages*, so the output is pretty sucky. If
we're going to be doing a lot of profiling, it might be worthwhile to
invest some effort in creating better reporting/analysis tools to let
us crunch the raw data better.
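As a starting point for such tools, the raw numbers are already
accessible: a Stats object keeps them in its .stats dictionary, keyed
by (filename, lineno, funcname). A sketch (the CSV layout here is just
one possible choice, and work() is a placeholder) of exporting them
for analysis elsewhere:

```python
import cProfile
import csv
import io
import pstats

def work():
    # Placeholder workload.
    return sorted(range(1000), reverse=True)

prof = cProfile.Profile()
prof.runcall(work)
s = pstats.Stats(prof)

# Each entry in s.stats maps (filename, lineno, funcname) to
# (primitive call count, total call count, total time, cumulative
# time, callers dict) -- everything the printed reports are built from.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['file', 'line', 'function', 'ncalls', 'tottime', 'cumtime'])
for (fname, lineno, func), (cc, nc, tt, ct, callers) in s.stats.items():
    writer.writerow([fname, lineno, func, nc, tt, ct])

print(buf.getvalue().splitlines()[0])
```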
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "Dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/dev