Re: Some tips on profiling

2010-10-27 Thread Ramkumar Ramachandra
Hi Stefan,

Stefan Fuhrmann writes:
 Hm. Works here (Linux gcc 4.3.3 and Win32 MS VisualStudio 2010).

I'll check this -- must be some issue at my end; saw Daniel's response
too. Thanks.

 That would be awesome! Any weekend would do ;)

:)

-- Ram


Re: Some tips on profiling

2010-10-24 Thread Stefan Fuhrmann

On 04.10.2010 16:58, Ramkumar Ramachandra wrote:

Hi Stefan,

Sorry for the late reply. This has been the first weekend
that I didn't spend in office for some time.

Stefan Fuhrmann writes:

I enabled it, but there's still some issue:
subversion/svnadmin/main.c:1892: undefined reference to 
`svn_fs_get_cache_config'


It builds here. Did you run autogen.sh before ./configure?

Yep, I did. I tried it several times again; same issue. Is the
build.conf broken?

Hm. Works here (Linux gcc 4.3.3 and Win32 MS VisualStudio 2010).



As soon as a larger number of patches got reviewed and merged,
I will prepare further changes for integration. So far, nobody had
free cycles to do the reviews.

I'm being stretched really thin myself -- I sometimes have to sacrifice
several hours of sleep to keep up :| I'll try my best but I can't
promise. Also, there's the additional overhead of having to wait for
approvals -- if I can't pull it off, I'll request a full committer to
take over.


I had the chance to check them out and apply them just now. They work
as expected. I'll submit some (uneducated) patch reviews to the list
and request a merge. I don't have access to the areas your patches
touch.

I really appreciate that. It would be great if someone had the time
to review the 3 commits to the membuffer cache integration branch.
The review should not require too much context knowledge. An
in-depth review will take a full day or so (like for an average sized
C++ class).

Thanks for the estimate -- instead of jumping between classes and
attempting to review it bit-by-bit, I'll try to allocate a Saturday or
Sunday to this task.

That would be awesome! Any weekend would do ;)

-- Stefan^2.


Re: Some tips on profiling

2010-10-24 Thread Daniel Shahaf
Stefan Fuhrmann wrote on Sun, Oct 24, 2010 at 20:55:04 +0200:
 On 04.10.2010 16:58, Ramkumar Ramachandra wrote:
 Hi Stefan,
 Sorry for the late reply. This has been the first weekend
 that I didn't spend in office for some time.
 Stefan Fuhrmann writes:
 I enabled it, but there's still some issue:
 subversion/svnadmin/main.c:1892: undefined reference to 
 `svn_fs_get_cache_config'

 It builds here. Did you run autogen.sh before ./configure?
 Yep, I did. I tried it several times again; same issue. Is the
 build.conf broken?
 Hm. Works here (Linux gcc 4.3.3 and Win32 MS VisualStudio 2010).

Is it linking against the installed libraries by accident?  This can
happen on Debian if an older Subversion (that doesn't have that function)
is on the search path (e.g., by virtue of being in the --prefix directory).
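
One way to check (library and binary paths below are examples -- adjust
them to your system and --prefix):

  # does the library the linker is finding actually export the symbol?
  nm -D /usr/lib/libsvn_fs-1.so.0 | grep svn_fs_get_cache_config

  # which libsvn_fs does an already-built binary resolve at runtime?
  ldd subversion/svnadmin/.libs/lt-svnadmin | grep libsvn_fs

An empty result from the first command for the library on the search path
would point at exactly this kind of stale-install problem.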


Re: Some tips on profiling

2010-10-04 Thread Ramkumar Ramachandra
Hi Stefan,

Stefan Fuhrmann writes:
 I enabled it, but there's still some issue:
 subversion/svnadmin/main.c:1892: undefined reference to 
 `svn_fs_get_cache_config'
 
 It builds here. Did you run autogen.sh before ./configure?

Yep, I did. I tried it several times again; same issue. Is the
build.conf broken?

 For the MD5 stuff, try r986459 (performance branch).
 It should eliminate 1 of the 3 MD5 calculations.
 Why doesn't STATUS document this and everything else consistently?
 
 Because there is no simple mapping from revision to feature / improvement.
 In particular, there are a number of intermediate steps that were
 replaced by new code later on. There is no point in reviewing
 these earlier revisions but the newer ones can't be reviewed and
 merged on their own. Hence the integration branch for the first
 major change.

Ah, I saw that.

 As soon as a larger number of patches got reviewed and merged,
 I will prepare further changes for integration. So far, nobody had
 free cycles to do the reviews.

I'm being stretched really thin myself -- I sometimes have to sacrifice
several hours of sleep to keep up :| I'll try my best but I can't
promise. Also, there's the additional overhead of having to wait for
approvals -- if I can't pull it off, I'll request a full committer to
take over.

 I had the chance to check them out and apply them just now. They work
 as expected. I'll submit some (uneducated) patch reviews to the list
 and request a merge. I don't have access to the areas your patches
 touch.
 I really appreciate that. It would be great if someone had the time
 to review the 3 commits to the membuffer cache integration branch.
 The review should not require too much context knowledge. An
 in-depth review will take a full day or so (like for an average sized
 C++ class).

Thanks for the estimate -- instead of jumping between classes and
attempting to review it bit-by-bit, I'll try to allocate a Saturday or
Sunday to this task.

-- Ram


Re: Some tips on profiling

2010-10-03 Thread Stefan Fuhrmann


 On 01.10.2010 15:56, Ramkumar Ramachandra wrote:

Hi Stefan,

Stefan Fuhrmann writes:

You obviously don't have APR threads enabled.
Thanks for finding this. Fixed in r1003430.

I enabled it, but there's still some issue:
subversion/svnadmin/main.c:1892: undefined reference to 
`svn_fs_get_cache_config'


It builds here. Did you run autogen.sh before ./configure?

The server caches almost everything. So, the
first (cold) run demonstrates the worst-case,
the second run (hot) the best case. Real
world performance depends on server memory
and load.

Ah. Thanks for the clarification.


For the merge part: please review the integrate-membuffer
branch (only 3 patches). You may also review and
merge any of the patches listed in /branches/performance/STATUS.

integrate-cache-membuffer, you mean? Thanks! Your email contains
references to exactly the patches I'm looking for :)


For the MD5 stuff, try r986459 (performance branch).
It should eliminate 1 of the 3 MD5 calculations.

Why doesn't STATUS document this and everything else consistently?


 Because there is no simple mapping from revision to feature / improvement.
In particular, there are a number of intermediate steps that were
replaced by new code later on. There is no point in reviewing
these earlier revisions but the newer ones can't be reviewed and
merged on their own. Hence the integration branch for the first
major change.

As soon as a larger number of patches got reviewed and merged,
I will prepare further changes for integration. So far, nobody had
free cycles to do the reviews.

As for the temp files, there are some improvements
listed in /branches/performance/STATUS. These would
also reduce your system CPU time.

I had the chance to check them out and apply them just now. They work
as expected. I'll submit some (uneducated) patch reviews to the list
and request a merge. I don't have access to the areas your patches
touch.

I really appreciate that. It would be great if someone had the time
to review the 3 commits to the membuffer cache integration branch.
The review should not require too much context knowledge. An
in-depth review will take a full day or so (like for an average sized
C++ class).

-- Stefan^2.



Re: Some tips on profiling

2010-10-01 Thread Stefan Fuhrmann


 On 30.09.2010 04:28, Ramkumar Ramachandra wrote:

Hi Stefan,

Stefan Fuhrmann writes:

My measurements seem to support what Philip wrote:
The expensive part is run on the server. Even with my
optimized server, the svnrdump CPU usage is less than
the time taken by the server. Some numbers (hot file
cache):

svnadmin dump
 1.7 trunk 70s real  66s user 4s system
 perf-branch 30s real 28s user 2s system

1.7 trunk svnrdump
 ra-local 88s real 81s user 7s system
 svn: (1.7 trunk) 99s real 6s user 4s system
 svn: (perf-branch, cold)  72s real 5s user 6s system
 svn: (perf-branch, hot)  17s real 5s user 5s system

Thus, svnrdump is slower only for ra-local where it is
of no particular use in the first place. To really speed
things up, the caching infrastructure from the performance
branch should be merged into /trunk.

Wow, that looks awesome. Yeah, Daniel suggested that I run svnrdump
against your perf branch yesterday, but I wasn't able to get your
branch to build:
subversion/libsvn_subr/svn_file_handle_cache.c:890: error: 
'svn_file_handle_cache_t' has no member named 'mutex'

You obviously don't have APR threads enabled.
Thanks for finding this. Fixed in r1003430.

What exactly are the changes that 'hot' introduces -- can you point me
to the specific revision numbers that we intend to merge?

The server caches almost everything. So, the
first (cold) run demonstrates the worst-case,
the second run (hot) the best case. Real
world performance depends on server memory
and load.

For the merge part: please review the integrate-membuffer
branch (only 3 patches). You may also review and
merge any of the patches listed in /branches/performance/STATUS.


@Stefan: Thoughts on hacking APR hashtables directly?


Are you sure?! Which operation is the most expensive one
and how often is it called? Who calls it and why?

My bad. I got muddled up when using ra_local: when I saw lots of MD5
functions, I assumed it had something to do with the hashtable since
the checksum was calculated server-side. The profiler profiled both
server-side and client-side. Over svn://, I got a much cleaner
output. The new results indicate:
1. Server-client IO is the major bottleneck, as Philip and you pointed
out.
2. On the client side, the major bottleneck is the IO during textdelta
application. David (CC'ed now) and I are trying some experiments
with a single temporary file. Thoughts?

For the MD5 stuff, try r986459 (performance branch).
It should eliminate 1 of the 3 MD5 calculations.

As for the temp files, there are some improvements
listed in /branches/performance/STATUS. These would
also reduce your system CPU time.

-- Stefan^2.


Re: Some tips on profiling

2010-10-01 Thread Ramkumar Ramachandra
Hi Stefan,

Stefan Fuhrmann writes:
 You obviously don't have APR threads enabled.
 Thanks for finding this. Fixed in r1003430.

I enabled it, but there's still some issue:
subversion/svnadmin/main.c:1892: undefined reference to 
`svn_fs_get_cache_config'

 The server caches almost everything. So, the
 first (cold) run demonstrates the worst-case,
 the second run (hot) the best case. Real
 world performance depends on server memory
 and load.

Ah. Thanks for the clarification.

 For the merge part: please review the integrate-membuffer
 branch (only 3 patches). You may also review and
 merge any of the patches listed in /branches/performance/STATUS.

integrate-cache-membuffer, you mean? Thanks! Your email contains
references to exactly the patches I'm looking for :)

 For the MD5 stuff, try r986459 (performance branch).
 It should eliminate 1 of the 3 MD5 calculations.

Why doesn't STATUS document this and everything else consistently?

 As for the temp files, there are some improvements
 listed in /branches/performance/STATUS. These would
 also reduce your system CPU time.

I had the chance to check them out and apply them just now. They work
as expected. I'll submit some (uneducated) patch reviews to the list
and request a merge. I don't have access to the areas your patches
touch.

-- Ram


Re: Some tips on profiling

2010-09-29 Thread Ramkumar Ramachandra
Hi Philip,

Philip Martin writes:
 The performance of svnrdump is likely to be dominated by IO from the
 repository, network or disk depending on the RA layer.  strace is a
 useful tool to see opens/reads/writes.  You can see what order the
 calls occur, how many there are, how big they are and how long they
 take.

Ah, thanks for the tip.

 Valgrind/Callgrind is good and doesn't require you to instrument the
 code, but it does help to build with debug information.  It does
 impose a massive runtime overhead.

I don't mind -- I'm mostly using some remote machines to gather the
profiling data :)

 This is what I get when dumping 1000 revisions from a local mirror of
 the Subversion repository over ra_neon:
 
 CPU: Core 2, speed 1200 MHz (estimated)
 Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit 
 mask of 0x00 (Unhalted core cycles) count 10
 samples  %        app name                 symbol name
 4738     41.1893  no-vmlinux               (no symbols)
 1037      9.0150  libxml2.so.2.6.32        (no symbols)
 700       6.0854  libneon.so.27.1.2        (no symbols)
 238       2.0690  libc-2.7.so              _int_malloc
 228       1.9821  libc-2.7.so              memcpy
 221       1.9212  libc-2.7.so              memset
 217       1.8865  libc-2.7.so              strlen
 191       1.6604  libsvn_subr-1.so.0.0.0   decode_bytes
 180       1.5648  libc-2.7.so              vfprintf
 171       1.4866  libc-2.7.so              strcmp
 153       1.3301  libapr-1.so.0.2.12       apr_hashfunc_default
 134       1.1649  libapr-1.so.0.2.12       apr_vformatter
 130       1.1301  libapr-1.so.0.2.12       apr_palloc
 
 That's on my Debian desktop.  At the recent Apache Retreat I tried to
 demonstrate OProfile on my Ubuntu laptop and could not get it to work
 properly, probably because I forgot about -fno-omit-frame-pointer.

Ah, now I see why it didn't work for me. The data from Callgrind is
very interesting -- it seems to suggest that APR hashtables are
prohibitively expensive.

@Stefan: Thoughts on hacking APR hashtables directly?

 Finally there is traditional gprof.  It's a long time since I used it
 so I don't remember the details.  You instrument the code at compile
 time using CFLAGS=-pg.  If an instrumented function foo calls into a
 library bar that is not instrumented then bar is invisible, all you
 see is how long foo took to execute.

Yes, I used gprof initially. Callgrind is WAY more useful.

-- Ram


Re: Some tips on profiling

2010-09-29 Thread Stefan Fuhrmann


 On 29.09.2010 10:45, Ramkumar Ramachandra wrote:

Hi Philip,

Philip Martin writes:

The performance of svnrdump is likely to be dominated by IO from the
repository, network or disk depending on the RA layer.  strace is a
useful tool to see opens/reads/writes.  You can see what order the
calls occur, how many there are, how big they are and how long they
take.

Ah, thanks for the tip.

My measurements seem to support what Philip wrote:
The expensive part is run on the server. Even with my
optimized server, the svnrdump CPU usage is less than
the time taken by the server. Some numbers (hot file
cache):

svnadmin dump
1.7 trunk 70s real  66s user 4s system
perf-branch 30s real 28s user 2s system

1.7 trunk svnrdump
ra-local 88s real 81s user 7s system
svn: (1.7 trunk) 99s real 6s user 4s system
svn: (perf-branch, cold)  72s real 5s user 6s system
svn: (perf-branch, hot)  17s real 5s user 5s system

Thus, svnrdump is slower only for ra-local where it is
of no particular use in the first place. To really speed
things up, the caching infrastructure from the performance
branch should be merged into /trunk.


Valgrind/Callgrind is good and doesn't require you to instrument the
code, but it does help to build with debug information.  It does
impose a massive runtime overhead.

I don't mind -- I'm mostly using some remote machines to gather the
profiling data :)


This is what I get when dumping 1000 revisions from a local mirror of
the Subversion repository over ra_neon:

CPU: Core 2, speed 1200 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask 
of 0x00 (Unhalted core cycles) count 10
samples  %        app name                 symbol name
4738     41.1893  no-vmlinux               (no symbols)
1037      9.0150  libxml2.so.2.6.32        (no symbols)
700       6.0854  libneon.so.27.1.2        (no symbols)
238       2.0690  libc-2.7.so              _int_malloc
228       1.9821  libc-2.7.so              memcpy
221       1.9212  libc-2.7.so              memset
217       1.8865  libc-2.7.so              strlen
191       1.6604  libsvn_subr-1.so.0.0.0   decode_bytes
180       1.5648  libc-2.7.so              vfprintf
171       1.4866  libc-2.7.so              strcmp
153       1.3301  libapr-1.so.0.2.12       apr_hashfunc_default
134       1.1649  libapr-1.so.0.2.12       apr_vformatter
130       1.1301  libapr-1.so.0.2.12       apr_palloc

That's on my Debian desktop.  At the recent Apache Retreat I tried to
demonstrate OProfile on my Ubuntu laptop and could not get it to work
properly, probably because I forgot about -fno-omit-frame-pointer.

Ah, now I see why it didn't work for me. The data from Callgrind is
very interesting -- it seems to suggest that APR hashtables are
prohibitively expensive.

@Stefan: Thoughts on hacking APR hashtables directly?


Are you sure?! Which operation is the most expensive one
and how often is it called? Who calls it and why?


Finally there is traditional gprof.  It's a long time since I used it
so I don't remember the details.  You instrument the code at compile
time using CFLAGS=-pg.  If an instrumented function foo calls into a
library bar that is not instrumented then bar is invisible, all you
see is how long foo took to execute.

Yes, I used gprof initially. Callgrind is WAY more useful.


At least the results are much more useful when there is
 a tool like KCachegrind that allows easy navigation through
the huge amount of information that was gathered.

-- Stefan^2.



Re: Some tips on profiling

2010-09-29 Thread Ramkumar Ramachandra
Hi Stefan,

Stefan Fuhrmann writes:
 My measurements seem to support what Philip wrote:
 The expensive part is run on the server. Even with my
 optimized server, the svnrdump CPU usage is less than
 the time taken by the server. Some numbers (hot file
 cache):
 
 svnadmin dump
 1.7 trunk 70s real  66s user 4s system
 perf-branch 30s real 28s user 2s system
 
 1.7 trunk svnrdump
 ra-local 88s real 81s user 7s system
 svn: (1.7 trunk) 99s real 6s user 4s system
 svn: (perf-branch, cold)  72s real 5s user 6s system
 svn: (perf-branch, hot)  17s real 5s user 5s system
 
 Thus, svnrdump is slower only for ra-local where it is
 of no particular use in the first place. To really speed
 things up, the caching infrastructure from the performance
 branch should be merged into /trunk.

Wow, that looks awesome. Yeah, Daniel suggested that I run svnrdump
against your perf branch yesterday, but I wasn't able to get your
branch to build:
subversion/libsvn_subr/svn_file_handle_cache.c:890: error: 
'svn_file_handle_cache_t' has no member named 'mutex'

What exactly are the changes that 'hot' introduces -- can you point me
to the specific revision numbers that we intend to merge?

 @Stefan: Thoughts on hacking APR hashtables directly?
 
 Are you sure?! Which operation is the most expensive one
 and how often is it called? Who calls it and why?

My bad. I got muddled up when using ra_local: when I saw lots of MD5
functions, I assumed it had something to do with the hashtable since
the checksum was calculated server-side. The profiler profiled both
server-side and client-side. Over svn://, I got a much cleaner
output. The new results indicate:
1. Server-client IO is the major bottleneck, as Philip and you pointed
   out.
2. On the client side, the major bottleneck is the IO during textdelta
   application. David (CC'ed now) and I are trying some experiments
   with a single temporary file. Thoughts?

 At least the results are much more useful when there is
 a tool like KCachegrind that allows easy navigation through
 the huge amount of information that was gathered.

Yep! The visualizer is quite awesome :)

-- Ram


Re: Some tips on profiling

2010-09-28 Thread Ramkumar Ramachandra
Hi Stefan,

Stefan Fuhrmann writes:
 Under Linux, I'm using Valgrind / Callgrind and visualize in KCachegrind.
 That gives me a good idea of what code gets executed too often, how
 often a jump (loop or condition) has been taken etc. It will not show the
 non-user and non-CPU runtime, e.g. wait for disk I/O. And it will slow the
 execution by a factor of 100 (YMMV).

Ah, they're brilliant! Just what I was looking for :D I'm yet to
interpret the output and use it to optimize svnrdump though -- might
need some more help there; are you available on #svn-dev or something
sometime? Will keep you updated on this either way.

 Under Windows, I'm using the VisualStudio sampling profiler. The
 measurements are pretty accurate and the overhead is low. It does not
 tell me how often a certain code path has been executed. Due to the low
 overhead, it is well-suited for long-running (> 1 min) operations.

Interesting. I didn't know Visual Studio had a profiler. Then again,
I've not used Windows for several years now.

 Also, I find a simple time command very useful to get a first impression
 whether my code is bound by user-runtime, I/O wait or system runtime.

Ah, I want to know how to interpret this correctly. For equivalent
operations:
svnrdump says: 7.49s user 1.34s system 91% cpu 9.682 total
svnadmin says: 4.84s user 0.44s system 97% cpu 5.417 total

 To collect data on I/O activity, compression rates and other not readily
 available information, I use simple static counters and printf()
 their values
 at the end of each client call, for instance.

I see.

 Hope that helps.

A ton! Thanks a lot :)

-- Ram


Re: Some tips on profiling

2010-09-28 Thread Philip Martin
Stefan Fuhrmann stefanfuhrm...@alice-dsl.de writes:

  On 27.09.2010 21:44, Ramkumar Ramachandra wrote:
 Could you tell me which tools you use to profile the various
 applications in trunk? I'm looking to profile svnrdump to fix some
 perf issues, but OProfile doesn't seem to work for me.

 Under Linux, I'm using Valgrind / Callgrind and visualize in KCachegrind.
 That gives me a good idea of what code gets executed too often, how
 often a jump (loop or condition) has been taken etc. It will not show the
 non-user and non-CPU runtime, e.g. wait for disk I/O. And it will slow the
 execution by a factor of 100 (YMMV).

The performance of svnrdump is likely to be dominated by IO from the
repository, network or disk depending on the RA layer.  strace is a
useful tool to see opens/reads/writes.  You can see what order the
calls occur, how many there are, how big they are and how long they
take.
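
For instance, something along these lines (the repository URLs and the
exact syscall filter are only illustrative):

  # per-syscall counts and cumulative times
  strace -c -f subversion/svnrdump/svnrdump dump file:///path/to/repo > /dev/null

  # full trace with timestamps and per-call durations
  strace -f -ttt -T -e trace=open,read,write,close -o svnrdump.strace \
      subversion/svnrdump/svnrdump dump svn://localhost/repo > /dev/null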

Valgrind/Callgrind is good and doesn't require you to instrument the
code, but it does help to build with debug information.  It does
impose a massive runtime overhead.

OProfile will give you CPU usage with far lower overhead than
valgrind/callgrind.  Like valgrind/callgrind you don't need to
instrument the code but it works better with debug information and
with modern gcc if you use -O2 then you need -fno-omit-frame-pointer
for the callgraph stuff to work.  I use it like so:

opcontrol --init
opcontrol --no-vmlinux --separate=library --callgraph=32
opcontrol --start
opcontrol --reset
subversion/svnrdump/svnrdump ...
opcontrol --stop
opcontrol --dump
opreport --merge all -l image:/path/to/lt-svnrdump

This is what I get when dumping 1000 revisions from a local mirror of
the Subversion repository over ra_neon:

CPU: Core 2, speed 1200 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask 
of 0x00 (Unhalted core cycles) count 10
samples  %        app name                 symbol name
4738     41.1893  no-vmlinux               (no symbols)
1037      9.0150  libxml2.so.2.6.32        (no symbols)
700       6.0854  libneon.so.27.1.2        (no symbols)
238       2.0690  libc-2.7.so              _int_malloc
228       1.9821  libc-2.7.so              memcpy
221       1.9212  libc-2.7.so              memset
217       1.8865  libc-2.7.so              strlen
191       1.6604  libsvn_subr-1.so.0.0.0   decode_bytes
180       1.5648  libc-2.7.so              vfprintf
171       1.4866  libc-2.7.so              strcmp
153       1.3301  libapr-1.so.0.2.12       apr_hashfunc_default
134       1.1649  libapr-1.so.0.2.12       apr_vformatter
130       1.1301  libapr-1.so.0.2.12       apr_palloc

That's on my Debian desktop.  At the recent Apache Retreat I tried to
demonstrate OProfile on my Ubuntu laptop and could not get it to work
properly, probably because I forgot about -fno-omit-frame-pointer.
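
(That flag just needs to go into the build, e.g.

  ./configure CFLAGS="-O2 -g -fno-omit-frame-pointer"

so that the sampled stacks can still be unwound for the callgraph output.)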

Finally there is traditional gprof.  It's a long time since I used it
so I don't remember the details.  You instrument the code at compile
time using CFLAGS=-pg.  If an instrumented function foo calls into a
library bar that is not instrumented then bar is invisible, all you
see is how long foo took to execute.
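
A rough sketch of that workflow, with approximate paths (the libtool
wrapper ends up running .libs/lt-svnrdump, which is the binary gprof
needs):

  ./configure CFLAGS="-pg -g" && make
  subversion/svnrdump/svnrdump dump file:///path/to/repo > /dev/null
  gprof subversion/svnrdump/.libs/lt-svnrdump gmon.out > gprof-report.txt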

-- 
Philip


Some tips on profiling

2010-09-27 Thread Ramkumar Ramachandra
Hi Stefan,

Could you tell me which tools you use to profile the various
applications in trunk? I'm looking to profile svnrdump to fix some
perf issues, but OProfile doesn't seem to work for me.

Thanks.

-- Ram


Re: Some tips on profiling

2010-09-27 Thread Stefan Fuhrmann


 On 27.09.2010 21:44, Ramkumar Ramachandra wrote:

Hi Stefan,

Could you tell me which tools you use to profile the various
applications in trunk? I'm looking to profile svnrdump to fix some
perf issues, but OProfile doesn't seem to work for me.

Thanks.

-- Ram


Hi Ram,

Under Linux, I'm using Valgrind / Callgrind and visualize in KCachegrind.
That gives me a good idea of what code gets executed too often, how
often a jump (loop or condition) has been taken etc. It will not show the
non-user and non-CPU runtime, e.g. wait for disk I/O. And it will slow the
execution by a factor of 100 (YMMV).
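
A minimal invocation is something like the following (the URL is only a
placeholder; building with -g makes the annotated output far more useful):

  valgrind --tool=callgrind --callgrind-out-file=callgrind.out.svnrdump \
      subversion/svnrdump/svnrdump dump svn://localhost/repo > /dev/null
  kcachegrind callgrind.out.svnrdump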

Under Windows, I'm using the VisualStudio sampling profiler. The
measurements are pretty accurate and the overhead is low. It does not
tell me how often a certain code path has been executed. Due to the low
overhead, it is well-suited for long-running (> 1 min) operations.

Also, I find a simple time command very useful to get a first impression
whether my code is bound by user-runtime, I/O wait or system runtime.
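
For example (the URL is a placeholder):

  time subversion/svnrdump/svnrdump dump svn://localhost/repo > /dev/null

The gap between "real" and "user" + "sys" is a rough measure of how long
the process sat waiting for the disk, the network or the server.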

To collect data on I/O activity, compression rates and other not readily
available information, I use simple static counters and printf() their
values at the end of each client call, for instance.
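
As a sketch of that idea (the names below are made up for illustration,
not actual Subversion symbols):

  #include <stdio.h>

  /* Illustrative only -- not real Subversion code. */
  static unsigned long long io_read_calls = 0;
  static unsigned long long io_bytes_read = 0;

  /* Bump the counters from the low-level read wrapper being measured. */
  static void
  note_read(unsigned long long len)
  {
    ++io_read_calls;
    io_bytes_read += len;
  }

  /* Print the totals once, e.g. at the end of the client call. */
  static void
  print_io_counters(void)
  {
    printf("reads: %llu, bytes read: %llu\n", io_read_calls, io_bytes_read);
  }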

Hope that helps.

-- Stefan^2.