Re: Some tips on profiling
Hi Stefan, Stefan Fuhrmann writes: Hm. Works here (Linux gcc 4.3.3 and Win32 MS VisualStudio 2010). I'll check this -- must be some issue at my end; saw Daniel's response too. Thanks. That would be awesome! Any weekend would do ;) :) -- Ram
Re: Some tips on profiling
On 04.10.2010 16:58, Ramkumar Ramachandra wrote: Hi Stefan, Sorry for the late reply. This has been the first weekend that I didn't spend in office for some time. Stefan Fuhrmann writes: I enabled it, but there's still some issue: subversion/svnadmin/main.c:1892: undefined reference to `svn_fs_get_cache_config' It builds here. Did you run autogen.sh before ./configure? Yep, I did. I tried it several times again; same issue. Is the build.conf broken? Hm. Works here (Linux gcc 4.3.3 and Win32 MS VisualStudio 2010). As soon as a larger number of patches got reviewed and merged, I will prepare further changes for integration. So far, nobody had free cycles to do the reviews. I'm being stretched really thin myself. I sometimes have to sacrifice several hours of sleep to keep up :| I'll try my best but I can't promise. Also, there's the additional overhead of having to wait for approvals; if I can't pull it off, I'll request a full committer to take over. I had the chance to check them out and apply them just now. They work as expected. I'll submit some (uneducated) patch reviews to the list and request a merge. I don't have access to the areas your patches touch. I really appreciate that. It would be great if someone had the time to review the 3 commits to the membuffer cache integration branch. The review should not require too much context knowledge. An in-depth review will take a full day or so (like for an average-sized C++ class). Thanks for the estimate. Instead of jumping between classes and attempting to review it bit-by-bit, I'll try to allocate a Saturday or Sunday to this task. That would be awesome! Any weekend would do ;) -- Stefan^2.
Re: Some tips on profiling
Stefan Fuhrmann wrote on Sun, Oct 24, 2010 at 20:55:04 +0200: On 04.10.2010 16:58, Ramkumar Ramachandra wrote: Hi Stefan, Sorry for the late reply. This has been the first weekend that I didn't spend in office for some time. Stefan Fuhrmann writes: I enabled it, but there's still some issue: subversion/svnadmin/main.c:1892: undefined reference to `svn_fs_get_cache_config' It builds here. Did you run autogen.sh before ./configure? Yep, I did. I tried it several times again; same issue. Is the build.conf broken? Hm. Works here (Linux gcc 4.3.3 and Win32 MS VisualStudio 2010). Is it linking against the installed libraries by accident? This can happen on Debian if an older Subversion (that doesn't have that function) is on the search path (e.g., by virtue of being the --prefix directory).
Re: Some tips on profiling
Hi Stefan, Stefan Fuhrmann writes: I enabled it, but there's still some issue: subversion/svnadmin/main.c:1892: undefined reference to `svn_fs_get_cache_config' It builds here. Did you run autogen.sh before ./configure? Yep, I did. I tried it several times again; same issue. Is the build.conf broken? For the MD5 stuff, try r986459 (performance branch). It should eliminate 1 of the 3 MD5 calculations. Why doesn't STATUS document this and everything else consistently? Because there is no simple mapping rev -> feature / improvement. In particular, there are a number of intermediate steps that were replaced by new code later on. There is no point in reviewing these earlier revisions, but the newer ones can't be reviewed and merged on their own. Hence the integration branch for the first major change. Ah, I saw that. As soon as a larger number of patches got reviewed and merged, I will prepare further changes for integration. So far, nobody had free cycles to do the reviews. I'm being stretched really thin myself. I sometimes have to sacrifice several hours of sleep to keep up :| I'll try my best but I can't promise. Also, there's the additional overhead of having to wait for approvals; if I can't pull it off, I'll request a full committer to take over. I had the chance to check them out and apply them just now. They work as expected. I'll submit some (uneducated) patch reviews to the list and request a merge. I don't have access to the areas your patches touch. I really appreciate that. It would be great if someone had the time to review the 3 commits to the membuffer cache integration branch. The review should not require too much context knowledge. An in-depth review will take a full day or so (like for an average-sized C++ class). Thanks for the estimate. Instead of jumping between classes and attempting to review it bit-by-bit, I'll try to allocate a Saturday or Sunday to this task. -- Ram
Re: Some tips on profiling
On 01.10.2010 15:56, Ramkumar Ramachandra wrote: Hi Stefan, Stefan Fuhrmann writes: You obviously don't have APR threads enabled. Thanks for finding this. Fixed in r1003430. I enabled it, but there's still some issue: subversion/svnadmin/main.c:1892: undefined reference to `svn_fs_get_cache_config' It builds here. Did you run autogen.sh before ./configure? The server caches almost everything. So, the first (cold) run demonstrates the worst case, the second (hot) run the best case. Real-world performance depends on server memory and load. Ah. Thanks for the clarification. For the merge part: please review the integrate-membuffer branch (only 3 patches). You may also review and merge any of the patches listed in /branches/performance/STATUS. integrate-cache-membuffer, you mean? Thanks! Your email contains references to exactly the patches I'm looking for :) For the MD5 stuff, try r986459 (performance branch). It should eliminate 1 of the 3 MD5 calculations. Why doesn't STATUS document this and everything else consistently? Because there is no simple mapping rev -> feature / improvement. In particular, there are a number of intermediate steps that were replaced by new code later on. There is no point in reviewing these earlier revisions, but the newer ones can't be reviewed and merged on their own. Hence the integration branch for the first major change. As soon as a larger number of patches got reviewed and merged, I will prepare further changes for integration. So far, nobody had free cycles to do the reviews. As for the temp files, there are some improvements listed in /branches/performance/STATUS. These would also reduce your system CPU time. I had the chance to check them out and apply them just now. They work as expected. I'll submit some (uneducated) patch reviews to the list and request a merge. I don't have access to the areas your patches touch. I really appreciate that.
It would be great if someone had the time to review the 3 commits to the membuffer cache integration branch. The review should not require too much context knowledge. An in-depth review will take a full day or so (like for an average sized C++ class). -- Stefan^2.
Re: Some tips on profiling
On 30.09.2010 04:28, Ramkumar Ramachandra wrote: Hi Stefan, Stefan Fuhrmann writes: My measurements seem to support what Philip wrote: The expensive part is run on the server. Even with my optimized server, the svnrdump CPU usage is less than the time taken by the server. Some numbers (hot file cache):

svnadmin dump:
  1.7 trunk    70s real, 66s user, 4s system
  perf-branch  30s real, 28s user, 2s system

1.7 trunk svnrdump:
  ra-local                  88s real, 81s user, 7s system
  svn: (1.7 trunk)          99s real,  6s user, 4s system
  svn: (perf-branch, cold)  72s real,  5s user, 6s system
  svn: (perf-branch, hot)   17s real,  5s user, 5s system

Thus, svnrdump is slower only for ra-local where it is of no particular use in the first place. To really speed things up, the caching infrastructure from the performance branch should be merged into /trunk. Wow, that looks awesome. Yeah, Daniel suggested that I run svnrdump against your perf branch yesterday, but I wasn't able to get your branch to build: subversion/libsvn_subr/svn_file_handle_cache.c:890: error: 'svn_file_handle_cache_t' has no member named 'mutex' You obviously don't have APR threads enabled. Thanks for finding this. Fixed in r1003430. What exactly are the changes that hot introduces; can you point me to the specific revision numbers that we intend to merge? The server caches almost everything. So, the first (cold) run demonstrates the worst case, the second (hot) run the best case. Real-world performance depends on server memory and load. For the merge part: please review the integrate-membuffer branch (only 3 patches). You may also review and merge any of the patches listed in /branches/performance/STATUS. @Stefan: Thoughts on hacking APR hashtables directly? Are you sure?! Which operation is the most expensive one and how often is it called? Who calls it and why? My bad. I got muddled up when using ra_local: when I saw lots of MD5 functions, I assumed it had something to do with the hashtable since the checksum was calculated server-side.
The profiler profiled both server-side and client-side. Over svn://, I got a much cleaner output. The new results indicate:

1. Server-client IO is the major bottleneck, as Philip and you pointed out.
2. On the client side, the major bottleneck is the IO during textdelta application.

David (CC'ed now) and I are trying some experiments with a single temporary file. Thoughts? For the MD5 stuff, try r986459 (performance branch). It should eliminate 1 of the 3 MD5 calculations. As for the temp files, there are some improvements listed in /branches/performance/STATUS. These would also reduce your system CPU time. -- Stefan^2.
Re: Some tips on profiling
Hi Stefan, Stefan Fuhrmann writes: You obviously don't have APR threads enabled. Thanks for finding this. Fixed in r1003430. I enabled it, but there's still some issue: subversion/svnadmin/main.c:1892: undefined reference to `svn_fs_get_cache_config' The server caches almost everything. So, the first (cold) run demonstrates the worst case, the second (hot) run the best case. Real-world performance depends on server memory and load. Ah. Thanks for the clarification. For the merge part: please review the integrate-membuffer branch (only 3 patches). You may also review and merge any of the patches listed in /branches/performance/STATUS. integrate-cache-membuffer, you mean? Thanks! Your email contains references to exactly the patches I'm looking for :) For the MD5 stuff, try r986459 (performance branch). It should eliminate 1 of the 3 MD5 calculations. Why doesn't STATUS document this and everything else consistently? As for the temp files, there are some improvements listed in /branches/performance/STATUS. These would also reduce your system CPU time. I had the chance to check them out and apply them just now. They work as expected. I'll submit some (uneducated) patch reviews to the list and request a merge. I don't have access to the areas your patches touch. -- Ram
Re: Some tips on profiling
Hi Philip, Philip Martin writes: The performance of svnrdump is likely to be dominated by IO from the repository, network or disk depending on the RA layer. strace is a useful tool to see opens/reads/writes. You can see what order the calls occur, how many there are, how big they are and how long they take. Ah, thanks for the tip. Valgrind/Callgrind is good and doesn't require you to instrument the code, but it does help to build with debug information. It does impose a massive runtime overhead. I don't mind -- I'm mostly using some remote machines to gather the profiling data :) This is what I get when dumping 1000 revisions from a local mirror of the Subversion repository over ra_neon:

CPU: Core 2, speed 1200 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 10

samples  %        app name                symbol name
4738     41.1893  no-vmlinux              (no symbols)
1037      9.0150  libxml2.so.2.6.32       (no symbols)
 700      6.0854  libneon.so.27.1.2       (no symbols)
 238      2.0690  libc-2.7.so             _int_malloc
 228      1.9821  libc-2.7.so             memcpy
 221      1.9212  libc-2.7.so             memset
 217      1.8865  libc-2.7.so             strlen
 191      1.6604  libsvn_subr-1.so.0.0.0  decode_bytes
 180      1.5648  libc-2.7.so             vfprintf
 171      1.4866  libc-2.7.so             strcmp
 153      1.3301  libapr-1.so.0.2.12      apr_hashfunc_default
 134      1.1649  libapr-1.so.0.2.12      apr_vformatter
 130      1.1301  libapr-1.so.0.2.12      apr_palloc

That's on my Debian desktop. At the recent Apache Retreat I tried to demonstrate OProfile on my Ubuntu laptop and could not get it to work properly, probably because I forgot about -fno-omit-frame-pointer. Ah, now I see why it didn't work for me. The data from Callgrind is very interesting. It seems to suggest that APR hashtables are prohibitively expensive. @Stefan: Thoughts on hacking APR hashtables directly? Finally there is traditional gprof. It's a long time since I used it so I don't remember the details. You instrument the code at compile time using CFLAGS=-pg.
If an instrumented function foo calls into a library bar that is not instrumented then bar is invisible, all you see is how long foo took to execute. Yes, I used gprof initially. Callgrind is WAY more useful. -- Ram
Re: Some tips on profiling
On 29.09.2010 10:45, Ramkumar Ramachandra wrote: Hi Philip, Philip Martin writes: The performance of svnrdump is likely to be dominated by IO from the repository, network or disk depending on the RA layer. strace is a useful tool to see opens/reads/writes. You can see what order the calls occur, how many there are, how big they are and how long they take. Ah, thanks for the tip. My measurements seem to support what Philip wrote: The expensive part is run on the server. Even with my optimized server, the svnrdump CPU usage is less than the time taken by the server. Some numbers (hot file cache):

svnadmin dump:
  1.7 trunk    70s real, 66s user, 4s system
  perf-branch  30s real, 28s user, 2s system

1.7 trunk svnrdump:
  ra-local                  88s real, 81s user, 7s system
  svn: (1.7 trunk)          99s real,  6s user, 4s system
  svn: (perf-branch, cold)  72s real,  5s user, 6s system
  svn: (perf-branch, hot)   17s real,  5s user, 5s system

Thus, svnrdump is slower only for ra-local where it is of no particular use in the first place. To really speed things up, the caching infrastructure from the performance branch should be merged into /trunk. Valgrind/Callgrind is good and doesn't require you to instrument the code, but it does help to build with debug information. It does impose a massive runtime overhead.
I don't mind -- I'm mostly using some remote machines to gather the profiling data :) This is what I get when dumping 1000 revisions from a local mirror of the Subversion repository over ra_neon:

CPU: Core 2, speed 1200 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 10

samples  %        app name                symbol name
4738     41.1893  no-vmlinux              (no symbols)
1037      9.0150  libxml2.so.2.6.32       (no symbols)
 700      6.0854  libneon.so.27.1.2       (no symbols)
 238      2.0690  libc-2.7.so             _int_malloc
 228      1.9821  libc-2.7.so             memcpy
 221      1.9212  libc-2.7.so             memset
 217      1.8865  libc-2.7.so             strlen
 191      1.6604  libsvn_subr-1.so.0.0.0  decode_bytes
 180      1.5648  libc-2.7.so             vfprintf
 171      1.4866  libc-2.7.so             strcmp
 153      1.3301  libapr-1.so.0.2.12      apr_hashfunc_default
 134      1.1649  libapr-1.so.0.2.12      apr_vformatter
 130      1.1301  libapr-1.so.0.2.12      apr_palloc

That's on my Debian desktop. At the recent Apache Retreat I tried to demonstrate OProfile on my Ubuntu laptop and could not get it to work properly, probably because I forgot about -fno-omit-frame-pointer. Ah, now I see why it didn't work for me. The data from Callgrind is very interesting. It seems to suggest that APR hashtables are prohibitively expensive. @Stefan: Thoughts on hacking APR hashtables directly? Are you sure?! Which operation is the most expensive one and how often is it called? Who calls it and why? Finally there is traditional gprof. It's a long time since I used it so I don't remember the details. You instrument the code at compile time using CFLAGS=-pg. If an instrumented function foo calls into a library bar that is not instrumented then bar is invisible; all you see is how long foo took to execute. Yes, I used gprof initially. Callgrind is WAY more useful. At least the results are much more useful when there is a tool like Kcachegrind that allows easy navigation through the huge amount of information that was gathered. -- Stefan^2.
Re: Some tips on profiling
Hi Stefan, Stefan Fuhrmann writes: My measurements seem to support what Philip wrote: The expensive part is run on the server. Even with my optimized server, the svnrdump CPU usage is less than the time taken by the server. Some numbers (hot file cache):

svnadmin dump:
  1.7 trunk    70s real, 66s user, 4s system
  perf-branch  30s real, 28s user, 2s system

1.7 trunk svnrdump:
  ra-local                  88s real, 81s user, 7s system
  svn: (1.7 trunk)          99s real,  6s user, 4s system
  svn: (perf-branch, cold)  72s real,  5s user, 6s system
  svn: (perf-branch, hot)   17s real,  5s user, 5s system

Thus, svnrdump is slower only for ra-local where it is of no particular use in the first place. To really speed things up, the caching infrastructure from the performance branch should be merged into /trunk. Wow, that looks awesome. Yeah, Daniel suggested that I run svnrdump against your perf branch yesterday, but I wasn't able to get your branch to build: subversion/libsvn_subr/svn_file_handle_cache.c:890: error: 'svn_file_handle_cache_t' has no member named 'mutex' What exactly are the changes that hot introduces; can you point me to the specific revision numbers that we intend to merge? @Stefan: Thoughts on hacking APR hashtables directly? Are you sure?! Which operation is the most expensive one and how often is it called? Who calls it and why? My bad. I got muddled up when using ra_local: when I saw lots of MD5 functions, I assumed it had something to do with the hashtable since the checksum was calculated server-side. The profiler profiled both server-side and client-side. Over svn://, I got a much cleaner output. The new results indicate:

1. Server-client IO is the major bottleneck, as Philip and you pointed out.
2. On the client side, the major bottleneck is the IO during textdelta application.

David (CC'ed now) and I are trying some experiments with a single temporary file. Thoughts?
At least the results are much more useful when there is a tool like Kcachegrind that allows easy navigation through the huge amount of information that was gathered. Yep! The visualizer is quite awesome :) -- Ram
Re: Some tips on profiling
Hi Stefan, Stefan Fuhrmann writes: Under Linux, I'm using Valgrind / Callgrind and visualize in KCachegrind. That gives me a good idea of what code gets executed too often, how often a jump (loop or condition) has been taken etc. It will not show the non-user and non-CPU runtime, e.g. wait for disk I/O. And it will slow the execution by a factor of 100 (YMMV). Ah, they're brilliant! Just what I was looking for :D I'm yet to interpret the output and use it to optimize svnrdump though; I might need some more help there. Are you available on #svn-dev or something sometime? Will keep you updated on this either way. Under Windows, I'm using the VisualStudio sampling profiler. The measurements are pretty accurate and the overhead is low. It does not tell me how often a certain code path has been executed. Due to the low overhead, it is well-suited for long-running (1 min) operations. Interesting. I didn't know Visual Studio had a profiler. Then again, I've not used Windows for several years now. Also, I find a simple time command very useful to get a first impression whether my code is bound by user runtime, I/O wait or system runtime. Ah, I want to know how to interpret this correctly. For equivalent operations:

  svnrdump: 7.49s user 1.34s system 91% cpu 9.682 total
  svnadmin: 4.84s user 0.44s system 97% cpu 5.417 total

To collect data on I/O activity, compression rates and other not readily available information, I use simple static counters and printf() their values at the end of each client call, for instance. I see. Hope that helps. A ton! Thanks a lot :) -- Ram
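[The user/system split that time(1) prints can also be read from inside the process. A minimal sketch using getrusage() -- this is illustration only, not Subversion code -- of what the "user" and "system" columns measure:]

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Return the user or system CPU time this process has consumed so far,
 * in seconds -- the same quantities time(1) reports as "user"/"system".
 * which == 0 selects user time; anything else selects system time. */
static double cpu_seconds(int which)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    struct timeval tv = (which == 0) ? ru.ru_utime : ru.ru_stime;
    return tv.tv_sec + tv.tv_usec / 1e6;
}
```

[Reading the two outputs above this way: both commands are dominated by "user" time, i.e. CPU-bound client code; a large gap between user+system and "total" would instead point at I/O wait or time spent in the server.]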
Re: Some tips on profiling
Stefan Fuhrmann stefanfuhrm...@alice-dsl.de writes: On 27.09.2010 21:44, Ramkumar Ramachandra wrote: Could you tell me which tools you use to profile the various applications in trunk? I'm looking to profile svnrdump to fix some perf issues, but OProfile doesn't seem to work for me. Under Linux, I'm using Valgrind / Callgrind and visualize in KCachegrind. That gives me a good idea of what code gets executed too often, how often a jump (loop or condition) has been taken etc. It will not show the non-user and non-CPU runtime, e.g. wait for disk I/O. And it will slow the execution by a factor of 100 (YMMV). The performance of svnrdump is likely to be dominated by IO from the repository, network or disk depending on the RA layer. strace is a useful tool to see opens/reads/writes. You can see what order the calls occur, how many there are, how big they are and how long they take. Valgrind/Callgrind is good and doesn't require you to instrument the code, but it does help to build with debug information. It does impose a massive runtime overhead. OProfile will give you CPU usage with far lower overhead than valgrind/callgrind. Like valgrind/callgrind you don't need to instrument the code, but it works better with debug information, and with modern gcc if you use -O2 then you need -fno-omit-frame-pointer for the callgraph stuff to work. I use it like so:

  opcontrol --init
  opcontrol --no-vmlinux --separate=library --callgraph=32
  opcontrol --start
  opcontrol --reset
  subversion/svnrdump/svnrdump ...
  opcontrol --stop
  opcontrol --dump
  opreport --merge all -l image:/path/to/lt-svnrdump

This is what I get when dumping 1000 revisions from a local mirror of the Subversion repository over ra_neon:

CPU: Core 2, speed 1200 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 10

samples  %        app name                symbol name
4738     41.1893  no-vmlinux              (no symbols)
1037      9.0150  libxml2.so.2.6.32       (no symbols)
 700      6.0854  libneon.so.27.1.2       (no symbols)
 238      2.0690  libc-2.7.so             _int_malloc
 228      1.9821  libc-2.7.so             memcpy
 221      1.9212  libc-2.7.so             memset
 217      1.8865  libc-2.7.so             strlen
 191      1.6604  libsvn_subr-1.so.0.0.0  decode_bytes
 180      1.5648  libc-2.7.so             vfprintf
 171      1.4866  libc-2.7.so             strcmp
 153      1.3301  libapr-1.so.0.2.12      apr_hashfunc_default
 134      1.1649  libapr-1.so.0.2.12      apr_vformatter
 130      1.1301  libapr-1.so.0.2.12      apr_palloc

That's on my Debian desktop. At the recent Apache Retreat I tried to demonstrate OProfile on my Ubuntu laptop and could not get it to work properly, probably because I forgot about -fno-omit-frame-pointer. Finally there is traditional gprof. It's a long time since I used it so I don't remember the details. You instrument the code at compile time using CFLAGS=-pg. If an instrumented function foo calls into a library bar that is not instrumented then bar is invisible; all you see is how long foo took to execute. -- Philip
Some tips on profiling
Hi Stefan, Could you tell me which tools you use to profile the various applications in trunk? I'm looking to profile svnrdump to fix some perf issues, but OProfile doesn't seem to work for me. Thanks. -- Ram
Re: Some tips on profiling
On 27.09.2010 21:44, Ramkumar Ramachandra wrote: Hi Stefan, Could you tell me which tools you use to profile the various applications in trunk? I'm looking to profile svnrdump to fix some perf issues, but OProfile doesn't seem to work for me. Thanks. -- Ram Hi Ram, Under Linux, I'm using Valgrind / Callgrind and visualize in KCachegrind. That gives me a good idea of what code gets executed too often, how often a jump (loop or condition) has been taken etc. It will not show the non-user and non-CPU runtime, e.g. wait for disk I/O. And it will slow the execution by a factor of 100 (YMMV). Under Windows, I'm using the VisualStudio sampling profiler. The measurements are pretty accurate and the overhead is low. It does not tell me how often a certain code path has been executed. Due to the low overhead, it is well-suited for long-running (1 min) operations. Also, I find a simple time command very useful to get a first impression whether my code is bound by user runtime, I/O wait or system runtime. To collect data on I/O activity, compression rates and other not readily available information, I use simple static counters and printf() their values at the end of each client call, for instance. Hope that helps. -- Stefan^2.
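[The "static counters + printf()" technique Stefan describes can be sketched like this; the counter names and helper functions are made up for illustration, not actual Subversion symbols:]

```c
#include <stdio.h>

/* Illustrative instrumentation counters; in real code these would be
 * bumped from the I/O or compression routines under investigation. */
static long file_opens = 0;
static long bytes_read = 0;

static void count_open(void)   { ++file_opens; }
static void count_read(long n) { bytes_read += n; }

/* Print the totals once, e.g. at the end of a client call. */
static void dump_io_counters(void)
{
    printf("opens=%ld bytes_read=%ld\n", file_opens, bytes_read);
}
```

[Because each event costs only a single increment, counters like these can stay enabled even during timing runs, unlike a full profiler.]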