On 29.09.2010 10:45, Ramkumar Ramachandra wrote:
Hi Philip,

Philip Martin writes:
The performance of svnrdump is likely to be dominated by I/O from the
repository, over the network or from disk depending on the RA layer.
strace is a useful tool to see opens/reads/writes.  You can see in
what order the calls occur, how many there are, how big they are and
how long they take.
Ah, thanks for the tip.
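For reference, a syscall summary of a dump run can be gathered along
these lines (the repository URL and revision range are placeholders):

    # Aggregate counts/times per syscall for the whole run:
    strace -c -f svnrdump dump svn://localhost/repos/svn -r 0:1000 > /dev/null

    # Or log each open/read/write with per-call timings (-T) to see
    # ordering and sizes:
    strace -f -T -e trace=open,read,write -o svnrdump.strace \
        svnrdump dump svn://localhost/repos/svn -r 0:1000 > /dev/null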
My measurements seem to support what Philip wrote:
the expensive part runs on the server. Even with my
optimized server, svnrdump's CPU time is less than
the time taken by the server. Some numbers (hot file
cache):

svnadmin dump
    1.7 trunk                   70s real  66s user  4s system
    perf-branch                 30s real  28s user  2s system

svnrdump (1.7 trunk)
    ra_local                    88s real  81s user  7s system
    svn:// (1.7 trunk server)   99s real   6s user  4s system
    svn:// (perf-branch, cold)  72s real   5s user  6s system
    svn:// (perf-branch, hot)   17s real   5s user  5s system

Thus, svnrdump is slower only for ra_local, where it is
of no particular use in the first place. To really speed
things up, the caching infrastructure from the performance
branch should be merged into /trunk.

Valgrind/Callgrind is good and doesn't require you to instrument the
code, but it does help to build with debug information.  It does
impose a massive runtime overhead.
I don't mind -- I'm mostly using some remote machines to gather the
profiling data :)
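For example, a Callgrind run over svnrdump might look like this
(placeholder URL and revision range; a build with -g is assumed so
symbols resolve):

    valgrind --tool=callgrind --callgrind-out-file=svnrdump.callgrind \
        svnrdump dump svn://localhost/repos/svn -r 0:1000 > /dev/null

    # Flat, annotated cost listing; kcachegrind reads the same file
    # for interactive browsing:
    callgrind_annotate svnrdump.callgrind | head -30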

This is what I get when dumping 1000 revisions from a local mirror of
the Subversion repository over ra_neon:

CPU: Core 2, speed 1200 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        app name                 symbol name
4738     41.1893  no-vmlinux               (no symbols)
1037      9.0150  libxml2.so.2.6.32        (no symbols)
700       6.0854  libneon.so.27.1.2        (no symbols)
238       2.0690  libc-2.7.so              _int_malloc
228       1.9821  libc-2.7.so              memcpy
221       1.9212  libc-2.7.so              memset
217       1.8865  libc-2.7.so              strlen
191       1.6604  libsvn_subr-1.so.0.0.0   decode_bytes
180       1.5648  libc-2.7.so              vfprintf
171       1.4866  libc-2.7.so              strcmp
153       1.3301  libapr-1.so.0.2.12       apr_hashfunc_default
134       1.1649  libapr-1.so.0.2.12       apr_vformatter
130       1.1301  libapr-1.so.0.2.12       apr_palloc

That's on my Debian desktop.  At the recent Apache Retreat I tried to
demonstrate OProfile on my Ubuntu laptop and could not get it to work
properly, probably because I forgot about -fno-omit-frame-pointer.
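For the record, a minimal session with the (legacy) opcontrol
interface would go roughly like this; paths and the URL are
placeholders, and frame pointers are kept so call graphs work:

    # Build with debug info and frame pointers:
    ./configure CFLAGS="-g -fno-omit-frame-pointer" && make

    sudo opcontrol --no-vmlinux     # kernel symbols not needed here
    sudo opcontrol --start
    svnrdump dump svn://localhost/repos/svn -r 0:1000 > /dev/null
    sudo opcontrol --shutdown

    # Per-symbol breakdown for the binary:
    opreport -l ./svnrdump | head -20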
Ah, now I see why it didn't work for me. The data from Callgrind is
very interesting: it seems to suggest that APR hashtables are
prohibitively expensive.

@Stefan: Thoughts on hacking APR hashtables directly?

Are you sure?! Which operation is the most expensive one
and how often is it called? Who calls it and why?
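Those questions can be answered from the Callgrind data itself, e.g.
with something along these lines (the output file name is a
placeholder):

    # Inclusive costs plus caller/callee trees, filtered down to the
    # APR hash functions:
    callgrind_annotate --inclusive=yes --tree=both callgrind.out.<pid> \
        | grep -B2 -A6 apr_hash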

Finally there is traditional gprof.  It's a long time since I used it
so I don't remember the details.  You instrument the code at compile
time using CFLAGS=-pg.  If an instrumented function foo calls into a
library bar that is not instrumented, then bar is invisible; all you
see is how long foo took to execute.
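A minimal gprof session, for completeness (binary path and URL are
placeholders):

    # -pg adds compile-time instrumentation; the run writes gmon.out:
    ./configure CFLAGS="-pg" && make
    ./svnrdump dump svn://localhost/repos/svn -r 0:1000 > /dev/null

    # Flat profile plus call graph; time spent in uninstrumented
    # libraries is charged to the instrumented callers:
    gprof ./svnrdump gmon.out | head -40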
Yes, I used gprof initially. Callgrind is WAY more useful.

At least the results are much more useful when there is
a tool like KCachegrind that allows easy navigation through
the huge amount of information that was gathered.

-- Stefan^2.
