RE: place of svnrdump
-----Original Message-----
From: Ramkumar Ramachandra [mailto:artag...@gmail.com]
Sent: 27 September 2010 19:58
To: Neels J Hofmeyr
Cc: dev@subversion.apache.org; Daniel Shahaf; Stefan Sperling
Subject: Re: place of svnrdump

Neels J Hofmeyr writes:

While we're at it... svnsync's slowness is particularly painful when doing 'svnsync copy-revprops'. With revprop changes enabled, any revprops may be changed at any time. So to maintain an up-to-date mirror, one would like to copy *all* revprops at the very least once per day. With a repos of average corporate size, though, that can take the whole night and soon longer than the developers need to go home and come back to work next morning (to find the mirror lagging). So one could copy only the youngest 1000 revprops each night and do a complete run every weekend. Or script a revprop-change hook that propagates revprop change signals to mirrors. :(

Of course, you could put a post-revprop-change hook in place to note which revprop was changed, and then run a script that only syncs those revprops. I wouldn't recommend putting the 'svnsync copy-revprops' command in the post-revprop-change hook itself: if someone commits a revision and then immediately updates a revprop on it, the sync will fail (as the rev may not have been synced yet). If anything, changing svnsync to ignore a failed copy-revprops command when no revision exists to sync to would fix this problem, and copy-revprops could then be put in the hook without worry.
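A minimal sketch of the approach described above, assuming the post-revprop-change hook appends the revision number it receives (its second argument) to a queue file, and a separate job replays that queue later. The function names, the queue file format and the mirror URL are illustrative assumptions, not part of Subversion; the only real command here is `svnsync copy-revprops DEST_URL REV`. Revisions the mirror does not have yet are deferred rather than attempted, which sidesteps the failure described above.

```python
"""Sketch: queue revprop changes in a hook, replay them to a mirror later."""
from pathlib import Path


def record_change(queue_file, rev):
    """Hook side: append the changed revision number to the queue file."""
    with open(queue_file, "a", encoding="utf-8") as fh:
        fh.write(f"{int(rev)}\n")


def pending_revisions(queue_file):
    """Return the distinct queued revision numbers in ascending order."""
    path = Path(queue_file)
    if not path.exists():
        return []
    return sorted({int(tok) for tok in path.read_text(encoding="utf-8").split()})


def plan_sync(pending, mirror_youngest, mirror_url):
    """Split pending revisions into commands to run now and revisions to retry.

    Revisions newer than the mirror's youngest revision are deferred, because
    copy-revprops would fail on a revision the mirror has not synced yet.
    """
    ready = [r for r in pending if r <= mirror_youngest]
    deferred = [r for r in pending if r > mirror_youngest]
    commands = [["svnsync", "copy-revprops", mirror_url, str(r)] for r in ready]
    return commands, deferred
```

The command lists are returned rather than executed so that the caller decides how to run them (for example with `subprocess.run`) and can rewrite the queue file with only the deferred revisions afterwards.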
Re: place of svnrdump
Ramkumar Ramachandra wrote on Tue, Sep 28, 2010 at 00:27:38 +0530:

Hi Neels,

Neels J Hofmeyr writes:

I just benchmarked it recently and found that it dumps 1 revisions of the ASF repository in 106 seconds: that's about 94 revisions per second.

Wow! That's magnitudes better than the 5 to 10 revisions per second I'm used to (I think that's using svnsync).

Yep :)

While we're at it... svnsync's slowness is particularly painful when doing 'svnsync copy-revprops'. With revprop changes enabled, any revprops may be changed at any time. So to maintain an up-to-date mirror, one would like to copy *all* revprops at the very least once per day. With a repos of average corporate size, though, that can take the whole night and soon longer than the developers need to go home and come back to work next morning (to find the mirror lagging). So one could copy only the youngest 1000 revprops each night and do a complete run every weekend. Or script a revprop-change hook that propagates revprop change signals to mirrors. :(

Wow. This is quite a serious problem. I'm a very new developer, and I don't really use Subversion. You should probably let the other Subversion developers know about this on a new thread? @Daniel, @Stefan: Thoughts on this?

Use the commits@ list and run copy-revprops only on revisions that actually had been revprop-edited?

svnrdump won't help in that compartment, would it?

That would be a feature request (although I'm not sure svnrdump will ever be extended to handle that),

How could svnrdump help here? What we might need is an RA call that has the server provide the N last revisions to have undergone revprop edits...

because svnrdump is still very young- it just dumps/loads dumpfiles from remote repositories quickly at the moment. I've decided to feature freeze until I fix the perf issues for the upcoming release- I'll keep this in mind though.

-- Ram
Re: place of svnrdump
Hi Neels, Neels J Hofmeyr writes: On a side note, svnsync happens to be relatively slow. I tried to svnsync the ASF repos once (for huge test data). The slowness of svnsync made it practically unfeasible to pull off. I ended up downloading a zipped dump and 'svnadmin load'ing that dump. Even with a zipped dump already downloaded, 'unzip | svnadmin load' took a few *days* to load the 950.000+ revisions. (And someone rebooted that box after two days, halfway through, grr. Took some serious hacking to finish up without starting over.) Yeah, we had a tough time obtaining the complete undeltified ASF dump for testing purposes as well. So, that experience tells me that svnsync and svnadmin dump/load aren't close to optimal, for example compared to a straight download of 34 gigs that the ASF repos is... Anything that could speed up a remote dump/load process would probably be good -- while I don't know any details about svnrdump. I just benchmarked it recently and found that it dumps 1 revisions of the ASF repository in 106 seconds: that's about 94 revisions per second. It used to be faster than `svnadmin` in an older benchmark: I'll work on perf issues this week. I estimate that it should be possible to get it to dump at ~140 revisions/second. @Daniel and others: I'd recommend a feature freeze. I'm currently profiling svnrdump and working on improving especially the I/O profile. My two cents: Rephrasing everything into the dump format and back blows up both data size and ETA. Maybe a remote backup mechanism could even break loose from discrete revision boundaries during transfer/load... I've been thinking about this too: we'll have to start attacking the RA layer itself to make svnrdump even faster. The replay API isn't optimized for this kind of operation. P.S.: If the whole ASF repos were a single Git WC, how long would that take to pull? (Given that Git tends to take up much more space than a Subversion repos, I wonder.) 
The gzipped undeltified dump of the complete ASF repository comes to about 25 GiB and it takes ~70 minutes to import it into the Git object store using a tool which is currently under development in Git. Thanks to David for these statistics. Cloning takes as long as it takes to transmit this data. After a repack, it'll probably shrink in size, but that's beside the point. Git was never designed to handle this- each project being a separate repository would be a fairer comparison. Even linux-2.6.git contains just 210887 revisions, and it tests Git's limits. -- Ram
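For scale, the rates quoted in this thread can be turned into rough wall-clock estimates for the 950,000+ revisions of the ASF repository. This is only back-of-the-envelope arithmetic, assuming a constant per-revision rate (real revisions vary widely in size); the function names are made up for illustration. The revision count in the benchmark sentence appears truncated in this archive, so the sketch also shows what the stated rate and duration imply.

```python
def hours_to_process(revisions, revs_per_second):
    """Wall-clock hours needed at a constant rate of revisions per second."""
    return revisions / revs_per_second / 3600


def implied_revisions(revs_per_second, seconds):
    """Revision count implied by a quoted rate and duration."""
    return revs_per_second * seconds


ASF_REVISIONS = 950_000  # "950.000+" revisions, as quoted in the thread

RATES = [
    ("svnsync, low end", 5),
    ("svnsync, high end", 10),
    ("svnrdump, measured", 94),
    ("svnrdump, estimated target", 140),
]

for label, rate in RATES:
    print(f"{label:28s} {hours_to_process(ASF_REVISIONS, rate):6.1f} h")

# 94 revisions/second sustained for 106 seconds
print("benchmark size implied:", implied_revisions(94, 106), "revisions")
```

Under that assumption the figures come out at roughly 53 and 26 hours for the svnsync rates, about 2.8 hours at 94 revisions per second, and about 1.9 hours at the ~140 revisions per second target.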
Build tools/ by default? (was: Re: place of svnrdump)
On Sat, Sep 25, 2010 at 04:40:01PM +0200, Daniel Shahaf wrote:

Ramkumar Ramachandra wrote on Sat, Sep 25, 2010 at 19:15:16 +0530:

I have no interest in politics, and I don't want to speculate why svnmucc isn't built by `make` by default.

Because it lives in tools/.

How about we start building tools by default? Not install, just build? This would help us catch compile-breakage in tools more quickly.

Stefan

Index: configure.ac
===================================================================
--- configure.ac	(revision 1000128)
+++ configure.ac	(working copy)
@@ -702,7 +702,7 @@ dnl Build and install rules ---
 INSTALL_STATIC_RULES="install-bin install-docs"
 INSTALL_RULES="install-fsmod-lib install-ramod-lib install-lib install-include install-static"
 INSTALL_RULES="$INSTALL_RULES $INSTALL_APACHE_RULE"
-BUILD_RULES="fsmod-lib ramod-lib lib bin test $BUILD_APACHE_RULE"
+BUILD_RULES="fsmod-lib ramod-lib lib bin test $BUILD_APACHE_RULE tools"
 if test $svn_lib_berkeley_db = yes; then
   BUILD_RULES="$BUILD_RULES bdb-lib bdb-test"
Re: place of svnrdump (was: Re: svnmucc multiline property issue)
On Sat, Sep 25, 2010 at 02:43:37PM +0200, Daniel Shahaf wrote: Ramkumar Ramachandra wrote on Sat, Sep 25, 2010 at 11:59:49 +0530: Daniel Shahaf writes: Would svnrdump benefit if, once 1.7.x branched and RC's start being rolled, it were subjected to a more relaxed backporting policy? If so, we might consider moving it to tools/ for 1.7.x, with intent to move it back to subversion/svnrdump/ for 1.8.x (as soon as 1.7.x is branched from trunk). Hm? I don't understand how it'll help backporting. I already maintain an out-of-tree build that successfully compiles against 1.6 [1]. It'll be fairly trivial to write an svn18_compat module. We'll be able to follow a more relaxed Is this change backportable policy. I don't really mind where svnrdump lives. The community is committed to supporting both the tools/ and subversion/ directories. If backwards-compat rules automatically apply to everything below subversion/ (they do?), and that people feel that svnrdump may still change in backwards-incompatible ways, we might as well decide to make the subversion/svnrdump directory exempt from such guarantees during the 1.7 release. It is a simple technical decision: Do we think that the current feature set is worth supporting under our backwards-compat rules? I do. Hyrum K. Wright writes: I would add the further thought that as this is a tool rather than a first-class component, I think we can play a little bit looser with the rules. For what it's worth, I have a different opinion on this issue. It really doesn't matter. It's just the name of a directory and a set of promises we give to our users about backwards-compat. There's no need for hard feelings. Stefan
Re: place of svnrdump
On 2010-09-27 09:45, Ramkumar Ramachandra wrote: ... I just benchmarked it recently and found that it dumps 1 revisions of the ASF repository in 106 seconds: that's about 94 revisions per second. Wow! That's magnitudes better than the 5 to 10 revisions per second I'm used to (I think that's using svnsync). While we're at it... svnsync's slowness is particularly painful when doing 'svnsync copy-revprops'. With revprop changes enabled, any revprops may be changed at any time. So to maintain an up-to-date mirror, one would like to copy *all* revprops at the very least once per day. With a repos of average corporate size, though, that can take the whole night and soon longer than the developers need to go home and come back to work next morning (to find the mirror lagging). So one could copy only the youngest 1000 revprops each night and do a complete run every weekend. Or script a revprop-change hook that propagates revprop change signals to mirrors. :( svnrdump won't help in that compartment, would it? Thanks, ~Neels
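The "youngest 1000 each night, complete run every weekend" schedule mentioned above could be sketched as follows. The helper names, the choice of Sunday for the full run and the window default are assumptions for illustration. The only real command used is `svnsync copy-revprops DEST_URL [REV[:REV2]]`, which, when no revision is given, copies properties for all revisions previously transferred to the destination.

```python
def nightly_range(youngest, window=1000):
    """Revision range covering the youngest `window` revisions, inclusive."""
    start = max(0, youngest - window + 1)
    return f"{start}:{youngest}"


def copy_revprops_command(mirror_url, youngest, weekday, full_run_day=6, window=1000):
    """Build the svnsync command line for tonight's run.

    weekday follows datetime.date.weekday(): 0 is Monday, 6 is Sunday.
    On the full-run day no range is passed, so all revprops are copied.
    """
    command = ["svnsync", "copy-revprops", mirror_url]
    if weekday != full_run_day:
        command.append(nightly_range(youngest, window))
    return command
```

A cron job could call `copy_revprops_command(url, youngest, datetime.date.today().weekday())` and hand the result to `subprocess.run`; how `youngest` is obtained from the mirror is left out of this sketch.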
Re: place of svnrdump (was: Re: svnmucc multiline property issue)
Hi Stefan, Stefan Sperling writes: I don't really mind where svnrdump lives. The community is committed to supporting both the tools/ and subversion/ directories. tools and subversion are merely directory names. All I'm saying is this: I don't want packaging/ distribution overheads for such a simple package; users should be able to use whatever Subversion-interop tools that other developers build by just having Subversion installed. If backwards-compat rules automatically apply to everything below subversion/ (they do?), and that people feel that svnrdump may still change in backwards-incompatible ways, we might as well decide to make the subversion/svnrdump directory exempt from such guarantees during the 1.7 release. It is a simple technical decision: Do we think that the current feature set is worth supporting under our backwards-compat rules? I do. Hm, I still don't understand this back-porting thing very well. Does it mean that the svnrdump should always do what it currently does functionally (plus anything additional)? Does it mean that any improvements to the command-line UI should ensure that the current command-line UI still works? Does it mean that the public API it exposes through the headers should not break- do we instead have to write corresponding '_2' functions? It seems to be quite sane at the moment, and I don't think backporting is an issue; I'm not very experienced in this though, so I wouldn't take my own opinion too seriously. Hyrum K. Wright writes: I would add the further thought that as this is a tool rather than a first-class component, I think we can play a little bit looser with the rules. For what it's worth, I have a different opinion on this issue. It really doesn't matter. It's just the name of a directory and a set of promises we give to our users about backwards-compat. There's no need for hard feelings. Hey, no hard feelings! 
I was merely citing how other version control systems make it easy for users to get access to their own history, and suggesting that Subversion should too. In the long-term, I think of svnrdump as just a simple dumping/ loading functionality of Subversion: I don't really care *how* it's present; I just think it should be present: either as part of svnrdump, svnadmin, svnsync, or something else. -- Ram
Re: place of svnrdump (was: Re: svnmucc multiline property issue)
On Mon, Sep 27, 2010 at 07:48:06PM +0530, Ramkumar Ramachandra wrote: Hi Stefan, Stefan Sperling writes: I don't really mind where svnrdump lives. The community is committed to supporting both the tools/ and subversion/ directories. tools and subversion are merely directory names. All I'm saying is this: I don't want packaging/ distribution overheads for such a simple package; users should be able to use whatever Subversion-interop tools that other developers build by just having Subversion installed. There are many interoperability tools that are built on top of Subversion, and they're hosted as independent projects. By the above logic, we'd have to ship all those, and host them in our repository. And what does having Subversion installed really mean? E.g. in the Linux world, the Subversion client/server packages are often separate, but not always. It's also possible for svnsync to live in a separate package from the svn binary. And if you install from source, you get whatever the make install target installs. And maybe you also run make install-tools? Who knows. The point is that this is something packagers need to worry about, not us. With a well-maintained distribution, you can also install svnmucc easily, just like you can install svn easily. Of course, svnrdump is more likely to end up being installed if it gets installed with the default make install target. But packagers might as well decide to split the result of make install into separate packages, and we can't do anything about it. In practice, I don't think any of this is very important. People who need the svnrdump tool will find it no matter what. Even if it was hosted as an entirely separate project. If backwards-compat rules automatically apply to everything below subversion/ (they do?), and that people feel that svnrdump may still change in backwards-incompatible ways, we might as well decide to make the subversion/svnrdump directory exempt from such guarantees during the 1.7 release. 
It is a simple technical decision: Do we think that the current feature set is worth supporting under our backwards-compat rules? I do. Hm, I still don't understand this back-porting thing very well. Does it mean that the svnrdump should always do what it currently does functionally (plus anything additional)? Does it mean that any improvements to the command-line UI should ensure that the current command-line UI still works? Does it mean that the public API it exposes through the headers should not break- do we instead have to write corresponding '_2' functions? It means all of the above. We'll promise to support its current state until Subversion 2.0 breaks the world. That's why it's important to make sure everyone agrees that it is ready for that level of dedication. If it isn't, then we need to make sure our users understand that (by moving it to tools/, or by declaring it as experimental, or whatever). Hey, no hard feelings! I was merely citing how other version control systems make it easy for users to get access to their own history, There are quite a few systems that make getting at history really hard. But people only realise that when they're trying to migrate away to something else :) and suggesting that Subversion should too. Certainly! In the long-term, I think of svnrdump as just a simple dumping/ loading functionality of Subversion: I don't really care *how* it's present; I just think it should be present: either as part of svnrdump, svnadmin, svnsync, or something else. Yes, it's a valuable feature to have. Stefan
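To illustrate the '_2' function convention discussed above: once an entry point is covered by the compatibility promise, new behaviour goes into a new, versioned function and the old one stays as a thin wrapper. The sketch below is in Python purely for brevity (Subversion itself is written in C), and the names `dump_revisions` and `dump_revisions2` are hypothetical, not svnrdump's actual API.

```python
import warnings


def dump_revisions2(url, start, end, incremental=False):
    """Newer, hypothetical entry point that adds an `incremental` option."""
    return {"url": url, "start": start, "end": end, "incremental": incremental}


def dump_revisions(url, start, end):
    """Original entry point, kept so that existing callers continue to work."""
    warnings.warn(
        "dump_revisions is deprecated; use dump_revisions2",
        DeprecationWarning,
        stacklevel=2,
    )
    # Delegate with the value that reproduces the old behaviour.
    return dump_revisions2(url, start, end, incremental=False)
```

The cost of the promise is visible even in this toy: every signature change adds a function that has to be maintained until the next major version.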
Re: place of svnrdump (was: Re: svnmucc multiline property issue)
Hi Stefan, Stefan Sperling writes: tools and subversion are merely directory names. All I'm saying is this: I don't want packaging/ distribution overheads for such a simple package; users should be able to use whatever Subversion-interop tools that other developers build by just having Subversion installed. There are many interoperability tools that are built on top of Subversion, and they're hosted as independent projects. By the above logic, we'd have to ship all those, and host them in our repository. No. That's what I tried to point out in my email: the interop software are tools like fast-import for hg and bzr, or even git-p4. That's not what svnrdump is: svnrdump itself is *far* from being able to provide interop. It's the *infrastructure* that's necessary for interop tools to be built in a sane and maintainable manner. If you're interested, check out git.git contrib/svn-fe leveraging the infrastructure in vcs-svn/ to convert a Subversion dumpfile v2 into a fast-import stream. It's VERY non-trivial. And what does having Subversion installed really mean? E.g. in the Linux world, the Subversion client/server packages are often separate, but not always. It's also possible for svnsync to live in a separate package from the svn binary. And if you install from source, you get whatever the make install target installs. And maybe you also run make install-tools? Who knows. The point is that this is something packagers need to worry about, not us. With a well-maintained distribution, you can also install svnmucc easily, just like you can install svn easily. Of course, svnrdump is more likely to end up being installed if it gets installed with the default make install target. But packagers might as well decide to split the result of make install into separate packages, and we can't do anything about it. In practice, I don't think any of this is very important. People who need the svnrdump tool will find it no matter what. 
Even if it was hosted as an entirely separate project. I see- that's an interesting perspective. Hm, I still don't understand this back-porting thing very well. Does it mean that the svnrdump should always do what it currently does functionally (plus anything additional)? Does it mean that any improvements to the command-line UI should ensure that the current command-line UI still works? Does it mean that the public API it exposes through the headers should not break- do we instead have to write corresponding '_2' functions? It means all of the above. We'll promise to support its current state until Subversion 2.0 breaks the world. That's why it's important to make sure everyone agrees that it is ready for that level of dedication. If it isn't, then we need to make sure our users understand that (by moving it to tools/, or by declaring it as experimental, or whatever). Ah. Yeah, I think it's sane enough for this. I'll put in as much work as I can before the release to fix perf issues. For now, it passes the complete svnsync testsuite but for Issue #3717. -- Ram
Re: place of svnrdump
On 2010-09-25 14:43, Daniel Shahaf wrote: Ramkumar Ramachandra wrote on Sat, Sep 25, 2010 at 11:59:49 +0530: Agreed, these modules should not be part of the core. However, in the case of Subversion, there is absolutely NO way to get/back up the revision history data* [5]. svnsync. On a side note, svnsync happens to be relatively slow. I tried to svnsync the ASF repos once (for huge test data). The slowness of svnsync made it practically unfeasible to pull off. I ended up downloading a zipped dump and 'svnadmin load'ing that dump. Even with a zipped dump already downloaded, 'unzip | svnadmin load' took a few *days* to load the 950.000+ revisions. (And someone rebooted that box after two days, halfway through, grr. Took some serious hacking to finish up without starting over.) So, that experience tells me that svnsync and svnadmin dump/load aren't close to optimal, for example compared to a straight download of 34 gigs that the ASF repos is... Anything that could speed up a remote dump/load process would probably be good -- while I don't know any details about svnrdump. My two cents: Rephrasing everything into the dump format and back blows up both data size and ETA. Maybe a remote backup mechanism could even break loose from discrete revision boundaries during transfer/load... In contrast, the speed of a remote 'svn log' just amazes me. It's pretty darn fast to get all the commit logs of a repos. So between that and getting the rev content as well there's some big speed loss. Heh, that's my reply to a single-word statement ;) ~Neels P.S.: If the whole ASF repos were a single Git WC, how long would that take to pull? (Given that Git tends to take up much more space than a Subversion repos, I wonder.)
place of svnrdump (was: Re: svnmucc multiline property issue)
Ramkumar Ramachandra wrote on Sat, Sep 25, 2010 at 11:59:49 +0530: Daniel Shahaf writes: Would svnrdump benefit if, once 1.7.x branched and RC's start being rolled, it were subjected to a more relaxed backporting policy? If so, we might consider moving it to tools/ for 1.7.x, with intent to move it back to subversion/svnrdump/ for 1.8.x (as soon as 1.7.x is branched from trunk). Hm? I don't understand how it'll help backporting. I already maintain an out-of-tree build that successfully compiles against 1.6 [1]. It'll be fairly trivial to write an svn18_compat module. We'll be able to follow a more relaxed "Is this change backportable" policy. Hyrum K. Wright writes: I would add the further thought that as this is a tool rather than a first-class component, I think we can play a little bit looser with the rules. For what it's worth, I have a different opinion on this issue. Many of the modern DVCS's speak the Git fast-import protocol: See hg-fast-import or even BzrFastImport for example [2] [3]. Even for those that don't, backing up a repository is as simple as tar'ing up a local checkout. It's a problem in the case of many centralized versioning systems, but there are third-party scripts to even get the data out of Perforce [4]. Agreed, these modules should not be part of the core. However, in the case of Subversion, there is absolutely NO way to get/back up the revision history data* [5]. svnsync. Getting it out in a version-control independent format is a secondary challenge- the primary challenge is to get the data out in /any/ format. We're currently building a module that'll assist the second task- the infrastructure is already in vcs-svn/ in Git 1.7.3. I'll propose to merge /that/ into the Subversion trunk as a tool when it's ready, however svnrdump should be part of core. Even if you don't agree with the above and claim that `svnadmin (dump|load)` fits the bill, svnrdump can provide the same functionality- it can do what svnadmin can, only faster. 
There's been some speculation about getting svnrdump merged into svnadmin, but let's not get into that right now. As I see it, the reason we should have svnrdump in trunk is to build it and distribute it as a part of core Subversion: to enable people to get access to their own history, and to encourage them to write additional tools on top of it. I agree that svnrdump does something very useful and that it belongs in Subversion. But I'm not sure whether it's mature enough today to belong in subversion/svnrdump/. svnrdump is still young (less than, how much, 6 months old?). The code still needs a bit of cleanup to remove rough edges (e.g., the most recent one I recall is pool usage), and hasn't been tested widely. Yet, svnrdump is in subversion/ and distributed as part of the core, while svnmucc --- with 5 years of core developers' support under its belt --- doesn't get built by 'make' by default. Just to repeat: it's not a question of 'whether' svnrdump belongs in subversion/svnrdump/, it's a question of whether it belongs there right now. -- Ram
[1]: http://github.com/artagnon/svnrdump
[2]: http://mercurial.selenic.com/wiki/FastImportExtension
[3]: http://wiki.bazaar.canonical.com/BzrFastImport
[4]: http://repo.or.cz/w/git.git?a=blob_plain;f=contrib/fast-import/git-p4;hb=maint
[5]: There's a Subversion equivalent of the git-p4 script called git-svn. It's a long, ugly, unmaintainable 5000-line Perl script http://repo.or.cz/w/git.git?a=blob_plain;f=git-svn.perl
* I suppose it's worth mentioning svnsync here although it takes years to finish and eats up eons of disk space. Mirroring an entire repository is NOT the way to get the data of a centralized versioning system- it was not designed to be used that way.
Re: place of svnrdump (was: Re: svnmucc multiline property issue)
Hi, Daniel Shahaf writes: I agree that svnrdump does something very useful and that it belongs in Subversion. But I'm not sure whether it's mature enough today to belong in subversion/svnrdump/. svnrdump is still young (less than, how much, 6 months old?). The code still needs a bit of cleanup to remove rough edges (e.g., the most recent one I recall is pool usage), and hasn't been tested widely. Yet, svnrdump is in subversion/ and distributed as part of the core, while svnmucc --- with 5 years of core developers' support under its belt --- doesn't get built by 'make' by default. Less than 6 months, and I'm not a very experienced developer. The code in dump_editor definitely needs a little cleanup. Except for a few tests, it passes the complete svnsync test suite. Of course it's not perfect. By the time 1.7 is released, it'll probably be better than what it is now. I have no interest in politics, and I don't want to speculate why svnmucc isn't built by `make` by default. I would like to keep this discussion focused purely on the benefits and trade-offs of including svnrdump in subversion/svnrdump. If the discussion deviates from this, I would NOT like to be included in it- I'm just a new partial committer and I don't know how your organization works. Just to repeat: it's not a question of 'whether' svnrdump belongs in subversion/svnrdump/, it's a question of whether it belongs there right now. As far as I'm concerned, development will happily chug along and whoever wants to use svnrdump will use it- the out-of-tree build will build against both Subversion 1.6 and Subversion 1.7 nicely and everyone's happy. The major downside of including svnrdump now is that users might complain that it doesn't work and will file some bugs. However, being a simple client-side utility, I doubt it'll cause any major catastrophes, even if it's used in production. 
The benefit is that it'll get more exposure- developers will find out about it and start writing tools that use it earlier. They will eventually discover more bugs and file them which will speed up its development. This is not speculation- the Git tool is almost ready, and will certainly be merged within the next couple of months. There are some projects using rsvndump- they will also benefit immediately. Of course I understand that having authored svnrdump, it's possible that my personal interests have clouded my judgement. I've kept this in mind, and tried to be as unbiased as possible. I'll appeal to everyone else to do the same- do what you think is in the best interest of the greater good. -- Ram
Re: place of svnrdump (was: Re: svnmucc multiline property issue)
Ramkumar Ramachandra wrote on Sat, Sep 25, 2010 at 19:15:16 +0530: I have no interest in politics, and I don't want to speculate why svnmucc isn't built by `make` by default. Because it lives in tools/. I would like to keep this discussion focused purely on the benefits and trade-offs of including svnrdump in subversion/svnrdump. If the discussion deviates from this, I would NOT like to be included in it- I mentioned svnmucc as a relevant example. Nobody will force you to participate in discussions you aren't interested in. I'm just a new partial committer and I don't know how your organization works. Your organization? I hope that you consider yourself a part of this community. You ARE a committer. The major downside of including svnrdump now is that users might complain that it doesn't work and will file some bugs. However, being a simple client-side utility, I doubt it'll cause any major catastrophes, even if it's used in production. Indeed. But it WILL get included either way --- the question is how we package it. Of course I understand that having authored svnrdump, it's possible that my personal interests have clouded my judgement. I've kept this in mind, and tried to be as unbiased as possible. I'll appeal to everyone else to do the same- do what you think is in the best interest of the greater good. +1, let's keep this discussion professional. I'm trying to do that too (but your tone makes me feel you might have taken offence at some of my remarks...?) -- Ram