On Wed, Aug 18, 2010 at 3:44 PM, Johan Corveleyn <jcor...@gmail.com> wrote: > On Wed, Aug 18, 2010 at 12:49 PM, Stefan Sperling <s...@elego.de> wrote: >> Can you show a profiler run that illustrates where the client is >> spending most of its time during diff? That would probably help with >> getting opinions from people, because it saves them from spending time >> doing this research themselves. >> You've already hinted at svn_diff__get_tokens() in another mail, but >> a real profiler run would show more candidates. > > Sorry, I'm still very much a beginner in C (I've been programming for > 10 years, but mainly in Java). Especially the tooling is a steep part > of the learning curve :-), so I don't know (how to use) a profiler > yet. Any suggestions? I'm on Windows (XP), using VCE 2008, and used > cygwin to compare with GNU diff.
I googled around a bit for C profilers on Windows. I found several which seem useful: - very sleepy (http://www.codersnotes.com/sleepy/sleepy) - Shiny (http://sourceforge.net/projects/shinyprofiler/) - AMD CodeAnalyst (http://developer.amd.com/cpu/CodeAnalyst/Pages/default.aspx) - AQTime - not free, but has a trial of 30 days (http://www.automatedqa.com/products/aqtime/) Before I dive in and try them out: any preference/favorites from the windows devs on this list? Or other suggestions? Further, some additional context to the "manual-profiling numbers": see below. > For the time being, I've helped myself using "apr_time_now()" from > apr_time.h and printf statements. Not terribly accurate and probably > somewhat overheadish, but it gives me some hints about the > bottlenecks. > > FWIW, below is the output of a recent run with my instrumented trunk > build. I've instrumented svn_diff_diff in diff.c and > svn_diff__get_tokens in token.c. I checked out bigfile.xml from a > repository, and edited a single line of it in my working copy. The two > statements "Starting diff" and "Diff finished" are the first and last > statements inside the svn_diff_diff function. These are numbers from > the second run (any following runs show approximately the same > numbers). > > $ time svn diff bigfile.xml > Starting diff ... (entered svn_diff_diff in diff.c) > -> calling svn_diff__get_tokens for svn_diff_datasource_original > == svn_diff__get_tokens datasource_open: 0 usec > == svn_diff__get_tokens while loop: 265625 usec > calls to datasource_get_next_token: 62500 usec > svn_diff__tree_insert_token: 171875 usec > rest: 15625 usec > <- done: 281250 usec > -> calling svn_diff__get_tokens for svn_diff_datasource_modified > == svn_diff__get_tokens datasource_open: 234375 usec > == svn_diff__get_tokens while loop: 312500 usec > calls to datasource_get_next_token: 62500 usec > svn_diff__tree_insert_token: 171875 usec > rest: 46875 usec > <- done: 562500 usec > -> calling svn_diff__lcs > <- done: 0 usec > -> calling svn_diff__diff > <- done: 0 usec > Diff finished in 875000 usec (875 millis) > Index: bigfile.xml > =================================================================== > [snip] > > real 0m1.266s > user 0m0.015s > sys 0m0.031s For comparison: a debug build from trunk, with only one instrumentation spot (at start and end of svn_diff_diff): $ time svn diff bigfile.xml Starting diff ... (entered svn_diff_diff in diff.c) Diff finished in 703125 usec (703 millis) [snip] real 0m1.109s user 0m0.015s sys 0m0.031s So the instrumentation in the critical loop probably costs me around 150-200 ms. A release build will also probably be faster, but I have no time to make one and test it now. If I time my 1.6.5 build from tigris.org, that's still a lot faster: $ time svn diff bigfile.xml [snip] real 0m0.468s user 0m0.015s sys 0m0.031s Maybe that can be totally attributed to the build (release vs. debug), or maybe 1.6.5 was faster at svn diff than trunk is? > Some observations: > - svn_diff__get_tokens takes most of the time > - for some reason, in the case of datasource_modified, the call to > datasource_open takes a long time (234 ms). In case of > datasource_original, it's instantaneous. Maybe because of translation, > ... of the pristine file? But I'd think that would be the original?? > - apart from that, most of the time goes to the while loop in > svn_diff__get_tokens. > - Inside the while loop, most time is spent on calls to > svn_diff__tree_insert_token (which compares tokens (=lines) to insert > them into some tree structure). Calls to datasource_get_next_token > also take some time. > > I'm not too sure of the accuracy of those last numbers with my simple > profiling method, because it's the accumulated time of calls inside a > while loop (with 61000 iterations). > > > For completeness, the same diff with /usr/bin/diff (under cygwin), of > the edited bigfile.xml vs. the original, as of the second diff run: > > $ ls -l settings.xml > -rwxr-xr-x+ 1 User None 1447693 2010-08-17 14:20 bigfile.xml > $ time diff bigfile.xml bigfile_orig.xml > [snip] > > real 0m0.078s > user 0m0.046s > sys 0m0.046s > > Cheers, > -- > Johan -- Johan