Re: Performance of svn+ssh vs. file for multiple files

2010-07-08 Thread Eric Peers



On 07/08/2010 02:27 AM, Daniel Shahaf wrote:

Eric Peers wrote on Wed, 7 Jul 2010 at 04:44 -:
   

Incidentally, where is [svn_ra_reparent] defined? I can see it in
libsvn_ra-1.so, but I can't find it anywhere in the libsvn_ra
directory...
 

% grep svn_ra_reparent tags
svn_ra_reparent ./subversion/include/svn_ra.h   /^svn_ra_reparent(svn_ra_session_t *ra_session,$/; p   signature:(svn_ra_session_t *ra_session, const char *url, apr_pool_t *pool)
svn_ra_reparent ./subversion/libsvn_ra/ra_loader.c  /^svn_error_t *svn_ra_reparent(svn_ra_session_t *session,$/;   f   signature:(svn_ra_session_t *session, const char *url, apr_pool_t *pool)


To save you some work: you'll see it calls vtable->reparent().  So the
functions you *really* want are the per-backend svn_ra_*__reparent()
implementations:

% grep _reparent tags | awk '{print $1,$2}' | grep -v tools/server-side/
ra_svn_reparent ./subversion/libsvn_ra_svn/client.c
svn_log__reparent ./subversion/include/private/svn_log.h
svn_log__reparent ./subversion/libsvn_subr/log.c
svn_ra_local__reparent ./subversion/libsvn_ra_local/ra_plugin.c
svn_ra_neon__reparent ./subversion/libsvn_ra_neon/session.c
svn_ra_reparent ./subversion/include/svn_ra.h
svn_ra_reparent ./subversion/libsvn_ra/ra_loader.c
svn_ra_serf__reparent ./subversion/libsvn_ra_serf/serf.c
test_reparent ./subversion/bindings/swig/ruby/test/test_ra.rb

   



I ended up writing a routine that uses the reparent call, as previously 
discussed, with a minor rework of svn_client__update_internal to 
accommodate it. Reusing the session brings the overall update time down 
to 3.09s from the original 53s. Once I polish up the code, I'll post 
a copy on my blog if anybody wants it.


This is well within acceptable ranges for performance in my mind.
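
To make the session-reuse idea concrete, here is a toy sketch in plain C. It is not the reworked svn_client__update_internal and it does not call Subversion at all: mock_session_t, mock_open, mock_reparent and mock_update are made-up stand-ins, and the counter simply assumes that each session open corresponds to one ssh + svnserve startup. It only contrasts "open per target" with "open once, reparent per target".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical mock of an RA session; NOT the Subversion API. */
typedef struct mock_session {
    const char *url;   /* URL the session is currently rooted at */
    int is_open;
} mock_session_t;

/* Each open stands in for one ssh + svnserve startup (an assumption). */
static int sessions_opened = 0;

static void mock_open(mock_session_t *s, const char *url)
{
    s->url = url;
    s->is_open = 1;
    sessions_opened++;
}

static void mock_reparent(mock_session_t *s, const char *url)
{
    assert(s->is_open);
    s->url = url;   /* same connection, new root */
}

static void mock_update(const mock_session_t *s, long revision)
{
    assert(s->is_open);
    printf("update %s to r%ld\n", s->url, revision);
}

/* Pattern A: a fresh session for every target. Returns sessions opened. */
int update_each_with_new_session(const char *const *urls, size_t n, long revision)
{
    sessions_opened = 0;
    for (size_t i = 0; i < n; i++) {
        mock_session_t s;
        mock_open(&s, urls[i]);
        mock_update(&s, revision);
    }
    return sessions_opened;
}

/* Pattern B: one session, reparented for each further target. */
int update_all_with_one_session(const char *const *urls, size_t n, long revision)
{
    mock_session_t s;
    sessions_opened = 0;
    if (n == 0)
        return 0;
    mock_open(&s, urls[0]);
    mock_update(&s, revision);
    for (size_t i = 1; i < n; i++) {
        mock_reparent(&s, urls[i]);
        mock_update(&s, revision);
    }
    return sessions_opened;
}
```

With N targets, pattern A opens N sessions and pattern B opens one, which is the difference the timings above would be consistent with if session setup dominates the cost.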

@Les: tags/branches don't work in this case because an edit on one can 
change the tag/branch, and because merging local edits with local 
version changes becomes cumbersome (if not impossible) on an svn switch 
to the branch/tag. Perforce-style tagging would work; svn's does not, 
unfortunately, since an svn tag is really a branch. We did consider this 
option.


Thanks Daniel!

One last question, though: is vtable->reparent the equivalent of a 
C++/object-oriented virtual method, which any given session type (ssh, 
svnserve, file, http) can override as necessary?
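
For readers following along, the pattern being asked about can be sketched in plain C as a struct of function pointers that each backend fills in. This is only an illustration: toy_session_t, toy_vtable_t and the toy_* functions are invented names, not the real Subversion definitions, and the "round trip" counter is a made-up way to tell the two backends apart.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; NOT the real Subversion types. */
typedef struct toy_session toy_session_t;

/* Table of function pointers: one slot per "virtual method". */
typedef struct toy_vtable {
    const char *scheme;
    int (*reparent)(toy_session_t *session, const char *url);
} toy_vtable_t;

struct toy_session {
    const toy_vtable_t *vtable;  /* chosen once, when the session is opened */
    char url[256];
    int round_trips;             /* toy counter for pretend network traffic */
};

/* "file" flavour: just remember the new URL locally. */
static int local_reparent(toy_session_t *session, const char *url)
{
    snprintf(session->url, sizeof session->url, "%s", url);
    return 0;
}

/* "svn+ssh" flavour: also pretend to notify a server. */
static int remote_reparent(toy_session_t *session, const char *url)
{
    snprintf(session->url, sizeof session->url, "%s", url);
    session->round_trips++;
    return 0;
}

static const toy_vtable_t local_vtable = { "file", local_reparent };
static const toy_vtable_t remote_vtable = { "svn+ssh", remote_reparent };

/* Pick the implementation from the URL scheme, as a loader might. */
void toy_open(toy_session_t *session, const char *url)
{
    session->vtable = (strncmp(url, "file://", 7) == 0)
                          ? &local_vtable
                          : &remote_vtable;
    session->round_trips = 0;
    snprintf(session->url, sizeof session->url, "%s", url);
}

/* Public entry point: callers never name a specific backend. */
int toy_reparent(toy_session_t *session, const char *url)
{
    assert(session != NULL && session->vtable != NULL);
    return session->vtable->reparent(session, url);
}
```

Calling toy_reparent() on a session opened with a file:// URL runs local_reparent, while any other URL runs remote_reparent, which is the C analogue of a virtual method call dispatching on the object's dynamic type.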



   --Eric



Performance of svn+ssh vs. file for multiple files

2010-07-06 Thread Eric Peers

Howdy,

I've got a program that needs to check out specific files at specific 
versions. In this particular case a branch does not make sense. I have 
found that the performance of svn+ssh in this case is very bad.


I run the rough equivalent of:
svn update -r 2 file1 file2 file3 file4 file5
svn update -r 3 file6 file7 file8 file9 file10

overall I have about 100 such files, and 2 svn update calls. I've 
accomplished this with an xargs frontend to svn so as to not overrun the 
cmdline.


if I use file:/// as a protocol, it runs in 3 seconds.
if I use svn+ssh:/// as a protocol, it takes 53 seconds.
if I run an svn update -r 3 with no files, it takes about 2s.

I wrote a program directly against the svn API that accepts the file 
lists, authenticates a single time, and then calls svn_client_update3. 
This still runs very slowly: around 53s.


I suspect the problem is that each individual file is called out, 
locked, etc. Is there a way to batch these locks together or otherwise 
improve performance, for example by reusing the ssh channel/RA session?


Perusing the source code suggests that svn_client__update_internal will 
be called for each element in my paths. Since an individual file 
lock/svn directory write does not seem to be overly performance costly, 
I suspect the problem is in the svn_client__open_ra_session_internal + 
svn_ra_do_update2 calls from svn_client__update_internal? Is the 
subversion code opening a new ra_session for each of these files at the 
expense of an ssh+svnserve on the remote end? Is there a way to force a 
single RA session across all the files at an API level without writing 
my own svn_client__update_internal?


thoughts here?

thanks!
   --eric




Re: Performance of svn+ssh vs. file for multiple files

2010-07-06 Thread Eric Peers
Good suggestion, Daniel. While this does markedly improve performance, it 
does so at the expense of changing the underlying protocol. 
Unfortunately, I'm not at liberty to do that: my customers define the 
protocol, not me, so my program needs to access their repos using their 
protocols.


But the results:
ssh port forwarding to an active svnserve takes about 2.5s.
pure svnserve takes roughly 2s

svnserve -d --listen-port 8000
ssh epe...@localhost -L 3690:localhost:8000
...then run my svn update commands...

   --eric

On 07/06/2010 12:52 PM, Daniel Shahaf wrote:

Have you tried using SSH port forwarding instead of svn+ssh://?

Daniel
(perhaps one of the other devs will address the points you made; I'm
myself not familiar with that part of the code)

Eric Peers wrote on Tue, 6 Jul 2010 at 21:17 -:
   

[snip: quoted copy of the original message]