Hi Neels,

Neels J Hofmeyr writes:
> On a side note, svnsync happens to be relatively slow. I tried to svnsync
> the ASF repos once (for huge test data). The slowness of svnsync made it
> practically unfeasible to pull off. I ended up downloading a zipped dump and
> 'svnadmin load'ing that dump. Even with a zipped dump already downloaded,
> 'unzip | svnadmin load' took a few *days* to load the 950.000+ revisions.
> (And someone rebooted that box after two days, halfway through, grr. Took
> some serious hacking to finish up without starting over.)

Yeah, we had a tough time obtaining the complete undeltified ASF dump
for testing purposes as well.

> So, that experience tells me that svnsync and svnadmin dump/load aren't
> close to optimal, for example compared to a straight download of 34 gigs
> that the ASF repos is... Anything that could speed up a remote dump/load
> process would probably be good -- while I don't know any details about
> svnrdump.

I benchmarked it recently and found that it dumps 10000 revisions of
the ASF repository in 106 seconds; that's about 94 revisions per
second. In an older benchmark it used to be faster than `svnadmin`,
so I'll work on the perf issues this week. I estimate that it should
be possible to get it to dump at ~140 revisions/second.
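
For anyone who wants to poke at these numbers, a run along the
following lines should reproduce the setup; the URL, revision range,
and paths are placeholders rather than the exact commands from my
benchmark:

  # time a remote dump of the first 10000 or so revisions, discarding the output
  $ time svnrdump dump http://svn.apache.org/repos/asf -r 0:10000 > /dev/null

  # the remote analogue of 'unzip | svnadmin load': stream the dump
  # straight into a fresh local repository
  $ svnadmin create /path/to/mirror
  $ svnrdump dump http://svn.apache.org/repos/asf | svnadmin load /path/to/mirror

That dump-into-load pipeline is the remote dump/load path under
discussion, so every revision/second gained in svnrdump helps it
directly.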

@Daniel and others: I'd recommend a feature freeze. I'm currently
profiling svnrdump and working especially on improving its I/O
profile.

> My two cents: Rephrasing everything into the dump format and back blows up
> both data size and ETA. Maybe a remote backup mechanism could even break
> loose from discrete revision boundaries during transfer/load...

I've been thinking about this too: we'll have to start attacking the
RA layer itself to make svnrdump even faster. The replay API isn't
optimized for this kind of operation.

> P.S.: If the whole ASF repos were a single Git WC, how long would that take
> to pull? (Given that Git tends to take up much more space than a Subversion
> repos, I wonder.)

The gzipped undeltified dump of the complete ASF repository comes to
about 25 GiB, and it takes ~70 minutes to import it into the Git
object store using a tool that is currently under development in Git
(thanks to David for these statistics). Cloning takes as long as it
takes to transmit this data. After a repack, it'll probably shrink in
size, but that's beside the point. Git was never designed to handle
this; each project being a separate repository would be a fairer
comparison. Even linux-2.6.git contains just 210887 revisions, and it
tests Git's limits.

-- Ram
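
P.S.: For the curious: as I understand it, the import David measured
feeds the dump stream to `git fast-import` through a dump-parsing
frontend (svn-fe from Git's contrib/ area, if I have the name right).
Roughly -- treat the exact invocation as a sketch, not a recipe:

  $ git init asf.git && cd asf.git
  # svn-fe reads a Subversion dump on stdin and writes a git
  # fast-import stream on stdout
  $ gzip -dc ../asf.dump.gz | svn-fe | git fast-import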