Re: Performance of svn+ssh vs. file for multiple files
On 07/08/2010 02:27 AM, Daniel Shahaf wrote:
> Eric Peers wrote on Wed, 7 Jul 2010 at 04:44 -:
>> Incidentally, where is [svn_ra_reparent] defined??? I can't find it in
>> the libraries, but I see it in libsvn_ra-1.so but not in the libsvn_ra
>> directory...
>
> % grep svn_ra_reparent tags
> svn_ra_reparent ./subversion/include/svn_ra.h /^svn_ra_reparent(svn_ra_session_t *ra_session,$/; p signature:(svn_ra_session_t *ra_session, const char *url, apr_pool_t *pool)
> svn_ra_reparent ./subversion/libsvn_ra/ra_loader.c /^svn_error_t *svn_ra_reparent(svn_ra_session_t *session,$/; f signature:(svn_ra_session_t *session, const char *url, apr_pool_t *pool)
>
> To save you some work: you'll see it calls vtable->reparent(). So the
> functions you *really* want are svn_ra__*_reparent():
>
> % grep _reparent tags | awk '{print $1,$2}' | grep -v tools/server-side/
> ra_svn_reparent ./subversion/libsvn_ra_svn/client.c
> svn_log__reparent ./subversion/include/private/svn_log.h
> svn_log__reparent ./subversion/libsvn_subr/log.c
> svn_ra_local__reparent ./subversion/libsvn_ra_local/ra_plugin.c
> svn_ra_neon__reparent ./subversion/libsvn_ra_neon/session.c
> svn_ra_reparent ./subversion/include/svn_ra.h
> svn_ra_reparent ./subversion/libsvn_ra/ra_loader.c
> svn_ra_serf__reparent ./subversion/libsvn_ra_serf/serf.c
> test_reparent ./subversion/bindings/swig/ruby/test/test_ra.rb

I ended up writing a routine that uses the reparent call as previously discussed, with a minor rework of svn_client__update_internal to accommodate this. Overall time to update: 3.09s, down from the original 53s, thanks to reusing the session. Once I polish up the code, I'll post a copy on my blog if anybody wants it. This is well within acceptable ranges for performance in my mind.

@Les: tags/branches don't work in this case because an edit on this can change the tag/branch, and because the merge of local edits + local version changes becomes cumbersome (if not impossible) on the svn switch to the branch/tag. Perforce-style tagging does work; svn tagging does not, since an svn tag is unfortunately just a branch. We did consider this option.

Thanks Daniel! One last question, though: is vtable->reparent the equivalent of a C++/object-oriented virtual method, which any given session type (ssh, svnserve, file, http) can override as necessary?

--Eric
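
For readers wondering about the vtable question above: the general C idiom is a struct of function pointers that each backend fills in, with a public wrapper that dispatches through it. The following is only a minimal sketch of that idiom. Every demo_* name is hypothetical, and the backends just copy a URL; it is not Subversion's actual ra_loader.c or any of its RA plugins.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* All demo_* names are hypothetical; this is not Subversion's real code. */
struct demo_session;

/* The "vtable": one function pointer per operation a backend must supply. */
typedef struct demo_vtable {
    const char *scheme;
    int (*reparent)(struct demo_session *session, const char *url);
} demo_vtable_t;

typedef struct demo_session {
    const demo_vtable_t *vtable; /* picked once, when the session is opened */
    char url[256];
    int round_trips;             /* counts simulated network messages */
} demo_session_t;

/* Local backend: just records the new URL. */
static int demo_local_reparent(demo_session_t *session, const char *url)
{
    snprintf(session->url, sizeof session->url, "%s", url);
    return 0;
}

/* Network-style backend: also counts one simulated message to the server. */
static int demo_net_reparent(demo_session_t *session, const char *url)
{
    snprintf(session->url, sizeof session->url, "%s", url);
    session->round_trips++;
    return 0;
}

const demo_vtable_t demo_local_vtable = { "file", demo_local_reparent };
const demo_vtable_t demo_net_vtable = { "svn", demo_net_reparent };

/* Open a session bound to one backend's vtable. */
demo_session_t demo_open(const demo_vtable_t *vtable, const char *url)
{
    demo_session_t session = { vtable, "", 0 };
    assert(strlen(url) < sizeof session.url);
    snprintf(session.url, sizeof session.url, "%s", url);
    return session;
}

/* Public entry point: callers never name a backend, they go through
 * the vtable, much like a virtual method call in C++. */
int demo_reparent(demo_session_t *session, const char *url)
{
    assert(session != NULL && session->vtable != NULL);
    assert(strlen(url) < sizeof session->url);
    return session->vtable->reparent(session, url);
}
```

Calling demo_reparent() on a session opened with demo_net_vtable runs demo_net_reparent(), while the same call on a demo_local_vtable session runs demo_local_reparent(). In that sense the pattern plays the same role as an overridable virtual method.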
Performance of svn+ssh vs. file for multiple files
Howdy,

I've got a program that needs to check out specific files at specific versions. In this particular case a branch does not make sense. I have found that the performance of svn+ssh in this case is very bad. I run the rough equivalent of:

svn update -r 2 file1 file2 file3 file4 file5
svn update -r 3 file6 file7 file8 file9 file10

Overall I have about 100 such files, and 2 svn update calls. I've accomplished this with an xargs frontend to svn so as to not overrun the command line.

If I use file:/// as a protocol, it runs in 3 seconds.
If I use svn+ssh:/// as a protocol, it takes 53 seconds.
If I run an svn update -r 3 with no files, it takes about 2s.

I wrote a program directly against the svn API to accept the file lists, authenticate a single time, and then call svn_client_update3. This still runs super slow, around 53s.

I suspect the problem is that each individual file is called out, locked, etc. Is there a way to batch these locks together or improve performance? Or to cause the ssh channel/RA session to be reused?

Perusing the source code suggests that svn_client__update_internal will be called for each element in my paths. Since an individual file lock/.svn directory write does not seem to be overly costly, I suspect the problem is in the svn_client__open_ra_session_internal + svn_ra_do_update2 calls from svn_client__update_internal. Is the subversion code opening a new ra_session for each of these files, at the expense of an ssh+svnserve on the remote end? Is there a way to force a single RA session across all the files at the API level without writing my own svn_client__update_internal?

Thoughts here? Thanks!

--eric
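
The suspicion above (one session setup per file versus one shared session) can be made concrete with a toy cost model. This is only a sketch under stated assumptions: the demo_* names are hypothetical, nothing here calls Subversion, and the per-operation costs are parameters the caller supplies rather than measured values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for a toy model; none of this is Subversion API. */
typedef struct demo_stats {
    size_t session_setups; /* expensive: stands in for ssh login + svnserve start */
    size_t updates;        /* per-target update requests */
} demo_stats_t;

/* Strategy 1: open a fresh session for every target. */
demo_stats_t demo_update_fresh_sessions(size_t n_targets)
{
    demo_stats_t stats = { 0, 0 };
    for (size_t i = 0; i < n_targets; i++) {
        stats.session_setups++; /* new connection for this target */
        stats.updates++;
    }
    return stats;
}

/* Strategy 2: open one session, then re-point it at each target. */
demo_stats_t demo_update_shared_session(size_t n_targets)
{
    demo_stats_t stats = { 0, 0 };
    if (n_targets > 0)
        stats.session_setups = 1; /* single connection, reused below */
    for (size_t i = 0; i < n_targets; i++)
        stats.updates++;          /* reparent + update on the open session */
    return stats;
}

/* Estimated wall time, given assumed per-operation costs in seconds. */
double demo_estimate_seconds(demo_stats_t stats, double setup_cost,
                             double update_cost)
{
    assert(setup_cost >= 0.0 && update_cost >= 0.0);
    return (double)stats.session_setups * setup_cost
         + (double)stats.updates * update_cost;
}
```

With 100 targets, the first strategy pays the setup cost 100 times and the second pays it once. If setup were on the order of half a second (an illustrative guess, not a measurement), that difference alone would account for most of a gap like 53s versus 3s.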
Re: Performance of svn+ssh vs. file for multiple files
Good suggestion, Daniel. While this does markedly improve performance, it does so at the expense of changing the underlying protocol. Unfortunately, I'm not at liberty to change the underlying protocol: my customers define the protocol, I don't. So my program needs to access their repos using their protocols.

But the results:

ssh port forwarding to an active svnserve takes about 2.5s.
pure svnserve takes roughly 2s.

svnserve -d --listen-port 8000
ssh epe...@localhost -L 3690:localhost:8000
...then run my svn update commands...

--eric

On 07/06/2010 12:52 PM, Daniel Shahaf wrote:
> Have you tried using SSH port forwarding instead of svn+ssh://?
>
> Daniel
> (perhaps one of the other devs will address the points you made; I'm
> myself not familiar with that part of the code)
>
> Eric Peers wrote on Tue, 6 Jul 2010 at 21:17 -:
>> Howdy,
>>
>> I've got a program that needs to check out specific files at specific
>> versions. In this particular case a branch does not make sense. I have
>> found that the performance of svn+ssh in this case is very bad. I run
>> the rough equivalent of:
>>
>> svn update -r 2 file1 file2 file3 file4 file5
>> svn update -r 3 file6 file7 file8 file9 file10
>>
>> Overall I have about 100 such files, and 2 svn update calls. I've
>> accomplished this with an xargs frontend to svn so as to not overrun
>> the command line.
>>
>> If I use file:/// as a protocol, it runs in 3 seconds.
>> If I use svn+ssh:/// as a protocol, it takes 53 seconds.
>> If I run an svn update -r 3 with no files, it takes about 2s.
>>
>> I wrote a program directly against the svn API to accept the file
>> lists, authenticate a single time, and then call svn_client_update3.
>> This still runs super slow, around 53s.
>>
>> I suspect the problem is that each individual file is called out,
>> locked, etc. Is there a way to batch these locks together or improve
>> performance? Or to cause the ssh channel/RA session to be reused?
>>
>> Perusing the source code suggests that svn_client__update_internal
>> will be called for each element in my paths. Since an individual file
>> lock/.svn directory write does not seem to be overly costly, I suspect
>> the problem is in the svn_client__open_ra_session_internal +
>> svn_ra_do_update2 calls from svn_client__update_internal. Is the
>> subversion code opening a new ra_session for each of these files, at
>> the expense of an ssh+svnserve on the remote end? Is there a way to
>> force a single RA session across all the files at the API level
>> without writing my own svn_client__update_internal?
>>
>> Thoughts here? Thanks!
>>
>> --eric