On 29.09.2010 10:45, Ramkumar Ramachandra wrote:
> Hi Philip,
>
> Philip Martin writes:
>> The performance of svnrdump is likely to be dominated by IO from the
>> repository, network or disk depending on the RA layer. strace is a
>> useful tool to see opens/reads/writes. You can see what order the
>> calls occur, how many there are, how big they are and how long they
>> take.
>
> Ah, thanks for the tip.
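For the archives, a minimal sketch of that strace workflow. The traced
command below (/bin/ls) is only a stand-in so the example runs anywhere;
substitute the actual svnrdump invocation to measure the real thing:

```shell
# -c    summarize: per-syscall counts, errors and total time
# -f    follow forked child processes as well
# -o    write the output to a file instead of stderr
# /bin/ls is a placeholder target; replace it with the command
# under test, e.g. "svnrdump dump <URL>".
strace -c -f -o syscall-summary.txt /bin/ls > /dev/null
cat syscall-summary.txt

# To see the order, size and duration (-T) of individual IO calls:
strace -T -e trace=read,write -o io-trace.txt /bin/ls > /dev/null
head io-trace.txt
```

The summary answers "how many and how big"; the raw trace answers "in
what order and how long each one took".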
My measurements seem to support what Philip wrote:
The expensive part is run on the server. Even with my
optimized server, the svnrdump CPU usage is less than
the time taken by the server. Some numbers (hot file
cache):
svnadmin dump
  1.7 trunk                 70s real  66s user  4s system
  perf-branch               30s real  28s user  2s system

1.7 trunk svnrdump
  ra-local                  88s real  81s user  7s system
  svn: (1.7 trunk)          99s real   6s user  4s system
  svn: (perf-branch, cold)  72s real   5s user  6s system
  svn: (perf-branch, hot)   17s real   5s user  5s system
Thus, svnrdump is slower only for ra-local where it is
of no particular use in the first place. To really speed
things up, the caching infrastructure from the performance
branch should be merged into /trunk.
>> Valgrind/Callgrind is good and doesn't require you to instrument the
>> code, but it does help to build with debug information. It does
>> impose a massive runtime overhead.
>
> I don't mind -- I'm mostly using some remote machines to gather the
> profiling data :)
>> This is what I get when dumping 1000 revisions from a local mirror of
>> the Subversion repository over ra_neon:
>>
>> CPU: Core 2, speed 1200 MHz (estimated)
>> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a
>> unit mask of 0x00 (Unhalted core cycles) count 100000
>>
>> samples  %        app name                symbol name
>> 4738     41.1893  no-vmlinux              (no symbols)
>> 1037      9.0150  libxml2.so.2.6.32       (no symbols)
>>  700      6.0854  libneon.so.27.1.2       (no symbols)
>>  238      2.0690  libc-2.7.so             _int_malloc
>>  228      1.9821  libc-2.7.so             memcpy
>>  221      1.9212  libc-2.7.so             memset
>>  217      1.8865  libc-2.7.so             strlen
>>  191      1.6604  libsvn_subr-1.so.0.0.0  decode_bytes
>>  180      1.5648  libc-2.7.so             vfprintf
>>  171      1.4866  libc-2.7.so             strcmp
>>  153      1.3301  libapr-1.so.0.2.12      apr_hashfunc_default
>>  134      1.1649  libapr-1.so.0.2.12      apr_vformatter
>>  130      1.1301  libapr-1.so.0.2.12      apr_palloc
>>
>> That's on my Debian desktop. At the recent Apache Retreat I tried to
>> demonstrate OProfile on my Ubuntu laptop and could not get it to work
>> properly, probably because I forgot about -fno-omit-frame-pointer.
> Ah, now I see why it didn't work for me. The data from Callgrind is
> very interesting -- it seems to suggest that APR hash tables are
> prohibitively expensive.
>
> @Stefan: Thoughts on hacking APR hashtables directly?
Are you sure?! Which operation is the most expensive one
and how often is it called? Who calls it and why?
>> Finally there is traditional gprof. It's a long time since I used it
>> so I don't remember the details. You instrument the code at compile
>> time using CFLAGS=-pg. If an instrumented function foo calls into a
>> library bar that is not instrumented, then bar is invisible; all you
>> see is how long foo took to execute.
> Yes, I used gprof initially. Callgrind is WAY more useful.
At least the results are much more useful when there is
a tool like KCachegrind that allows easy navigation through
the huge amount of information that was gathered.
-- Stefan^2.