Re: Splitting out project from repo
>     for rev in `svn log -r0:HEAD ${url}/${project} | \
>         egrep '^r[0-9]+ \| ' | cut -d' ' -f1`; do
>       svnrdump dump --incremental -r ${rev:1} ${url}/${project} >> ${project}.dump
>     done
>
> Basically, I am only dumping (incrementally) the revisions which actually
> affect the path in question.

I have since discovered that incrementally dumping specific revisions via svnrdump is not as safe as I previously thought. Some paths that were copied from outside sources did not get included, because I skipped the revisions they were copied from.

So, to correct myself and save others frustration - don't skip revisions with svnrdump (as in my example above) unless you are absolutely sure that you won't be missing anything.

Bryon
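One rough way to spot the problem described above before loading is to look for copy sources that point at revisions the dump file does not contain. The sketch below is not from the thread: `missing_copy_sources` is a hypothetical helper name, and it assumes the dump headers `Revision-number:` and `Node-copyfrom-rev:` can be found by a plain line scan. File content that happens to look like a header would confuse it, so treat the result as a hint, not a guarantee.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not part of Subversion): print every
# Node-copyfrom-rev value in a dump file that refers to a revision which is
# not itself present in that dump file.
missing_copy_sources() {
  # LC_ALL=C keeps awk from choking on binary file content in the dump.
  LC_ALL=C awk '
    /^Revision-number: /   { have[$2] = 1 }   # revisions contained in the dump
    /^Node-copyfrom-rev: / { want[$2] = 1 }   # revisions used as copy sources
    END {
      for (r in want)
        if (!(r in have))
          print r
    }
  ' "$1" | sort -n
}
```

Running `missing_copy_sources ${project}.dump` prints nothing when every copy source revision is in the dump; any number it prints is a revision that was skipped but may still be needed at load time.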
Re: Splitting out project from repo
Hello Bryon Winger,
on Tuesday, 2 April 2013 at 23:32 you wrote:

> Are you saying that appending to an existing dump file in general is a
> problem or just with all of his node-path processing? I have had no
> trouble appending to existing dump files.

I don't know whether appending to a dump file is supported or not; I only meant his revision-based file naming approach.

Kind regards,

Thorsten Schöning

--
Thorsten Schöning       E-Mail: thorsten.schoen...@am-soft.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Phone...05151- 9468- 55
Fax...05151- 9468- 88
Mobile..0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Managing Director: Andreas Muchow
RE: Splitting out project from repo
Hi,

The 'svnrdump' tool that was added in Subversion 1.7 might do exactly what you want to do.

This tool allows creating a dump file from a URL (e.g. file:///path/to/repos) and should skip unrelated paths for you during the repository processing.

You probably still want the svndumpfilter processing to drop empty revisions before loading it into a new repository.

Bert
Re: Splitting out project from repo
> You probably still want the svndumpfilter processing to drop empty
> revisions before loading it in a new repository.

I believe that the current version of svndumpfilter only operates on version 2 dump streams, which is what svnadmin dump produces. svnrdump produces a version 3 dump stream, which is not compatible with svndumpfilter.

That being said, I am able to get around dumping empty revisions (from a previous dump/load) with svnrdump by running something along these lines:

    for rev in `svn log -r0:HEAD ${url}/${project} | \
        egrep '^r[0-9]+ \| ' | cut -d' ' -f1`; do
      svnrdump dump --incremental -r ${rev:1} ${url}/${project} >> ${project}.dump
    done

Basically, I am only dumping (incrementally) the revisions which actually affect the path in question.

This obviously is not as fast as doing everything server-side, but it does appear to work around having files or directories copied from paths outside of the particular project path. The outside copy sources are dumped in full, as opposed to just a simple reference to where they were originally copied from.

I would appreciate some feedback if I'm missing something or if the above statement is inaccurate or unreliable. In my tests, everything appears to be the same once loaded into a fresh repository, checked out in full and diffed against the originals.

There is a very brief mention in the svn-book of appending to an existing dump file, so I expect that to be safe in general. It can be found in the "Repository Backup" section (http://svnbook.red-bean.com/en/1.7/svn.reposadmin.maint.html#svn.reposadmin.maint.backup) by searching for 'appending'.

Thanks,
Bryon Winger
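Which dump format a given file uses can be read from its first line, which has the form `SVN-fs-dump-format-version: N`. A minimal sketch for checking this before handing a file to another tool; `dump_format_version` is a hypothetical helper name, not a Subversion command:

```shell
#!/bin/bash
# Hypothetical helper: print the dump format version of a dump file by
# reading the "SVN-fs-dump-format-version: N" header on its first line.
# Prints nothing if the first line is not such a header.
dump_format_version() {
  head -n 1 "$1" | sed -n 's/^SVN-fs-dump-format-version: \([0-9][0-9]*\)$/\1/p'
}
```

For example, `dump_format_version ${project}.dump` would be expected to print 3 for a file written by svnrdump and 2 for a default svnadmin dump.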
Re: Splitting out project from repo
I am going through a similar process myself and have some questions about your concerns. I'm not trying to rock the boat, just looking for clarity on a few points.

For perspective, I am working with around 300 individual projects in a 70+ GB repository containing over 300k revisions.

> If I understand correctly, you manually retrieve each version where the
> given path/project has changed in any way to afterwards dump those
> revisions. Why is this better/faster than using svndumpfilter with
> specifying an include path, but without the need to post process the
> dump files?

I personally don't see the advantage of waiting around for svnadmin dump to process every unrelated revision. For one project, I am only concerned with about 200 revisions, spread out over 210k unrelated revisions.

    # This example took around 8 hours:
    svnadmin dump /path/to/master | svndumpfilter --drop-empty-revs \
        --renumber-revs include $PROJECT > $PROJECT.dump

    # However, when I run this on the same project:
    for rev in `svn log -r0:HEAD file:///path/to/master/$PROJECT | egrep \
        '^r[0-9]+ \| ' | cut -d' ' -f1`; do
      svnadmin dump --incremental -r ${rev:1} /path/to/master | svndumpfilter \
          include $PROJECT >> $PROJECT.dump
    done

... I can have a usable dump file in under 30 seconds. I realize this will take longer for larger projects, but I think it makes my point.

'svnadmin dump' is still creating a full dump stream for each revision before svndumpfilter sees that revision to decide whether to keep it or not.

> Are you sure your approach doesn't need other paths from the repo, e.g.
> other source paths from copy operations for projects or stuff like that?

I absolutely agree with checking for this. You can't successfully pull out a single path using svnadmin dump / svndumpfilter if there are copies from a location outside of whatever you are filtering for. I did notice that using svnrdump pointing to url/project seems to get around the outside-copy-sources issue, but I think that's another discussion altogether.

> svnadmin dump $repo --quiet -r $rev --incremental >> $project.$rev.bak
>
> Adding to revision files with >> should be impossible in your approach.

Are you saying that appending to an existing dump file in general is a problem or just with all of his node-path processing? I have had no trouble appending to existing dump files.

Thanks,
Bryon Winger
Re: Splitting out project from repo
Hello Jonathan Petersson,
on Friday, 1 March 2013 at 19:54 you wrote:

> As mentioned, the repository is incredibly huge and it would take hours
> for each project.

That's no reason at all; your computer is doing all the work. How many hours have you already spent trying to implement your own solution to automate the process, without it working? How many hours will you continue trying? What will happen if everything seems to work properly, and then some bugs in your rewritten dump files are noticed some days after splitting your huge repo and result in corrupted history? How many hours are needed to check out your working copies again, change your build process and all those things?

What's the reason behind trying to split the source repo up in one effort instead of doing it as time is available? You have surely worked with your huge repo for years until now, so why not continue working with it for some days or even weeks?

Some ideas on your script:

> svn log file://$repo $project

If I understand correctly, you manually retrieve each version where the given path/project has changed in any way to afterwards dump those revisions. Why is this better/faster than using svndumpfilter with specifying an include path, but without the need to post process the dump files? Are you sure your approach doesn't need other paths from the repo, e.g. other source paths from copy operations for projects or stuff like that?

> grep -e ".*r[0-9].*|.*"

This looks really imprecise to me; I would prefer something like

    grep -E -e "^r[0-9]+ \|"

just to be sure to really get what I want.

> svnadmin dump $repo --quiet -r $rev --incremental >> $project.$rev.bak

Adding to revision files with >> should be impossible in your approach.

> svnadmin setuuid $project

This should be unnecessary, as you created a new repo per project and use --ignore-uuid when loading data into the repo.
Kind regards,

Thorsten Schöning
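The stricter pattern suggested above can be wrapped in a small filter that turns `svn log` output into bare revision numbers. This is only a sketch: `revs_from_log` is a hypothetical name, and it assumes the usual `r123 | author | date | N lines` header lines. A log message line that itself starts with `r<digits> |` would still be picked up, which running `svn log -q` (no messages) avoids.

```shell
#!/bin/bash
# Hypothetical helper: read `svn log` output on stdin and print the bare
# revision numbers, one per line, oldest first.
revs_from_log() {
  grep -E '^r[0-9]+ \|' |   # keep only the "rNNN | ..." header lines
    cut -d' ' -f1 |         # first field, e.g. "r123"
    cut -c2- |              # drop the leading "r"
    sort -n
}
```

A possible use, with the repository and project variables from the script under discussion: `svn log -q file://$repo/$project | revs_from_log`.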
Splitting out project from repo
I have a repository that has grown incredibly big, and we're going to start breaking out each project in the repo into separate repos. However, I've run into a couple of issues with Node-copyfrom-rev, which doesn't match up properly upon dump/load, resulting in the following:

    svnadmin: E160006: Relative source revision -18169 is not available in current repository
    svnadmin: E160013: File not found: transaction '37-1c', path 'tags/1.6.1/file.php'

I've made some efforts to rewrite the dump files and revision definitions, but they don't seem to match up properly, making it hard to automate the process.

Any suggestions for changes are welcome. Please note, however, that using svndumpfilter isn't really an option: because of the size of the repo, it takes hours to break out just a single project, and this repo contains several thousand.

    #!/bin/bash

    project=$1
    repo=/root/svn-copy/oldrepo

    rm -fr $project*
    mkdir $project
    cd $project
    svnadmin create $project

    i=1
    svn log file://$repo $project | grep -e ".*r[0-9].*|.*" | awk '{ print substr($1,2) }' | sort -g | while read rev; do
      revs[$rev]=$i
      svnadmin dump $repo --quiet -r $rev --incremental >> $project.$rev.bak

      # Rewrite revision number to ease rewrite of Node-copyfrom
      perl -pi -e "s/Revision-number: $rev/Revision-number: $i/;" $project.$rev.bak

      # Rewrite node-paths
      perl -pi -e "s/Node-path: $project\//Node-path: /;" $project.$rev.bak

      # Rewrite Node-copyfrom-path
      perl -pi -e "s/Node-copyfrom-path: $project\//Node-copyfrom-path: /;" $project.$rev.bak

      # Rewrite Node-copyfrom-rev
      for rev in $(grep Node-copyfrom-rev $project.$i.bak | awk '{ print $2 }'); do
        perl -pi -e "s/Node-copyfrom-rev: $rev/Node-copyfrom-rev: ${revs[$rev]}/;" $project.$i.bak
      done

      # Remove prop for old project-folder
      sed -i "/Node-path: $project/,/PROPS-END/d" $project.$rev.bak

      svnadmin load --ignore-uuid $project < $project.$rev.bak

      let i=$i+1
      rm -fr $project.$rev.bak
    done

    svnadmin setuuid $project

Please note that rewriting the revision numbers has mitigated the node-copyfrom-rev problem somewhat, but not entirely, as it seems that node-copyfrom-repo sometimes points incorrectly when dumping this way.

Best