Re: Breaking up a monolothic repository
On 10/2/2013 10:36 AM, Ullrich Jans wrote:
> I'm now facing the same problem. My users want the rebasing, but during the dump/load instead of after the fact (apparently, it causes issues with their environment when they need to go back to an earlier revision to reproduce something). They also want to keep the empty revisions (for references from the issue tracker).
>
> I haven't tried it with svnadmin dump followed by svndumpfilter (I don't think it has that capability).

The command we ended up using back in May 2011 when we did this looked like the following. It's been two years, but I'm pretty sure these two scripts are all we ended up using.

- We had a master dump of the entire brc-jobs repository.
- Target repository name was brc-jobs-zp (CLCODE).
- It takes the dump and splits it into a smaller chunk (CLPATH).
- Had to edit the script for each new client/path that we wanted to split out.

It does *not* attempt to rebase the individual projects up to the root directory. It *is* possible to do this by running 'sed' over the resulting dump file, but it is tricky.

    #!/bin/bash
    # SOURCE is defined but not read below; the pipeline dumps the live repository.
    SOURCE=/mnt/scratch/svn-dump-brc-jobs.may2011.dump.gz
    DESTDIR=/var/svn/
    DESTPFX=svn-raw-brc-jobs-
    DESTSFX=10xx.dump.gz
    CLCODE=zp
    CLPATH=Z/ZP_SingleJobs
    SDFOPTS='--drop-empty-revs --renumber-revs'

    date
    echo ${DESTDIR}${DESTPFX}${CLCODE}${DESTSFX}
    # Dump everything, keep only CLPATH, and compress the filtered dump.
    svnadmin dump --quiet /var/svn/brc-jobs | \
        svndumpfilter include --quiet $SDFOPTS $CLPATH | \
        gzip > ${DESTDIR}${DESTPFX}${CLCODE}${DESTSFX}
    date

The mirror to this was the script that created the new SVN repository and loaded the individual dump into it. Note the commented-out 'sed' lines where we attempted to rebase individual project folders back up to the root of the repository. They didn't work, so we ended up just doing a move operation in the TortoiseSVN repository browser.

- It changes the UUID of the newly created repository to be something unique instead of using the old repo's UUID.
- Had to be edited anew for each new client/path.

    #!/bin/bash
    SRCDIR=/var/svn/
    SRCPFX=svn-raw-brc-jobs-
    SRCSFX=10xx.dump.gz
    DESTDIR=/var/svn/
    DESTPFX=svn-newbase-brc-jobs-
    DESTSFX=10xx.dump.gz
    SDFOPTS='--quiet --drop-empty-revs --renumber-revs'
    CLPARENT=Z
    CLCODE=zp

    date
    # Abandoned attempt at rebasing the paths (did not work):
    #gunzip -c ${SRCDIR}${SRCPFX}${CLCODE}${SRCSFX} | \
    #sed s/Node-path: $CLPATH\//Node-path: / | \
    #sed s/Node-copyfrompath: $CLPATH\//Node-copyfrompath: / | \
    #gzip > ${DESTDIR}${DESTPFX}${CLCODE}${DESTSFX}

    # Create the parent directory that the filtered dump expects, then load.
    svn mkdir -m "Import from brc-jobs" file:///var/svn/brc-jobs-${CLCODE}/${CLPARENT}
    gunzip -c ${SRCDIR}${SRCPFX}${CLCODE}${SRCSFX} | \
        svnadmin load --quiet /var/svn/brc-jobs-${CLCODE}

    # Print the UUID, replace it with a newly generated one, print it again.
    svnlook uuid /var/svn/brc-jobs-${CLCODE}
    svnadmin setuuid /var/svn/brc-jobs-${CLCODE}
    svnlook uuid /var/svn/brc-jobs-${CLCODE}

    svnadmin pack /var/svn/brc-jobs-${CLCODE}
    chmod -R 775 /var/svn/brc-jobs-${CLCODE}
    chmod -R g+s /var/svn/brc-jobs-${CLCODE}/db
    chgrp -R svn-brc-jobs /var/svn/brc-jobs-${CLCODE}
    date

I do wish I could have figured out the 'sed' commands to move a project from /Z/ZP_SingleJobs/JOBNR to be just /JOBNR in the repository, but there wasn't time. For rebasing, that's probably your missing piece... which I don't have.
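A note on the rebasing step that the commented-out 'sed' lines were aiming at: the dump format spells the copy-source header Node-copyfrom-path (with a hyphen after "copyfrom"), so a pattern written for Node-copyfrompath never matches. The following is a minimal sketch of a header-only rewrite, not the procedure that was used in 2011. The function name rebase_dump_paths is hypothetical. It leaves the node record that creates the prefix directory itself untouched, it does not adjust properties such as svn:externals or svn:mergeinfo, it would also rewrite any line of file content that happens to start with the same header text, and line-oriented tools such as awk are not guaranteed to be safe for binary file content. Try it on a scratch copy and verify with a test load.

```shell
#!/bin/sh
# Hypothetical helper: strip a leading directory prefix from the path headers
# of a Subversion dump stream read on stdin, writing the result to stdout.
# Only lines starting with "Node-path: " or "Node-copyfrom-path: " are changed.
rebase_dump_paths() {
    awk -v p="$1/" '
        # If the current line starts with header h followed by prefix p,
        # drop the prefix and report success.
        function strip(h) {
            if (index($0, h p) == 1) {
                $0 = h substr($0, length(h p) + 1)
                return 1
            }
            return 0
        }
        { if (!strip("Node-path: ")) strip("Node-copyfrom-path: "); print }
    '
}

# Example (not run here), reusing the file names from the scripts above:
#   gunzip -c /var/svn/svn-raw-brc-jobs-zp10xx.dump.gz |
#       rebase_dump_paths Z/ZP_SingleJobs |
#       gzip > /var/svn/svn-newbase-brc-jobs-zp10xx.dump.gz
```

Plain string matching (index/substr) is used instead of a regular expression so that characters in the prefix do not need escaping.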
Re: Breaking up a monolothic repository
On 10.09.2013 19:45, Thomas Harold wrote:
> When we moved from a monolithic repository to per-client repositories a few years ago, we went ahead and:
>
> - Rebased the paths up one or two levels (old system was something like monolithicrepo/[a-z]/[client directories]/[job directory]) so that the urls were now clientrepo/[job directory]. That was a tricky thing to do and we had to 'sed' the output of the dump filter before importing it back. It broke a few things, such as svn:externals which were not relative-pathed, but was worth it in the long run so that our URLs got shorter.
> - Made sure that the new repos all had unique UUIDs.
> - Renumbered all of the resulting revisions as we loaded things back in.
>
> But we didn't have to deal with any bug tracking systems that referred to a specific revision. And having lower revision numbers was preferred, along with dropping revisions that referred to other projects.

I'm now facing the same problem. My users want the rebasing, but during the dump/load instead of after the fact (apparently, it causes issues with their environment when they need to go back to an earlier revision to reproduce something). They also want to keep the empty revisions (for references from the issue tracker).

I haven't tried it with svnadmin dump followed by svndumpfilter (I don't think it has that capability).

I've tried svnrdump (from svn 1.7). It resulted in either a new repository with the full path included (rdump/load of all revs) or an interesting failure mode with a missing node during a copy operation when rdump -r revision_after_path:HEAD was used.

I've also tried using svnsync, but that also results in the full path being included, with no rebasing.

How did you do it? Also, am I missing something that has been included in a current svn version?
Cheers,
Ulli
--
Ullrich Jans, Specialist, IT-A
Elektrobit Automotive GmbH
RE: Breaking up a monolothic repository
> On 10.09.2013 19:45, Thomas Harold wrote:
> > When we moved from a monolithic repository to per-client repositories a few years ago, we went ahead and:
> >
> > - Rebased the paths up one or two levels (old system was something like monolithicrepo/[a-z]/[client directories]/[job directory]) so that the urls were now clientrepo/[job directory]. That was a tricky thing to do and we had to 'sed' the output of the dump filter before importing it back. It broke a few things, such as svn:externals which were not relative-pathed, but was worth it in the long run so that our URLs got shorter.
> > - Made sure that the new repos all had unique UUIDs.
> > - Renumbered all of the resulting revisions as we loaded things back in.
> >
> > But we didn't have to deal with any bug tracking systems that referred to a specific revision. And having lower revision numbers was preferred, along with dropping revisions that referred to other projects.
>
> I'm now facing the same problem. My users want the rebasing, but during the dump/load instead of after the fact (apparently, it causes issues with their environment when they need to go back to an earlier revision to reproduce something). They also want to keep the empty revisions (for references from the issue tracker).

Wouldn't it be much simpler to keep the current repository as a read-only archive and move the HEAD of each project into its own repo?

> I haven't tried it with svnadmin dump followed by svndumpfilter (I don't think it has that capability).
>
> I've tried svnrdump (from svn 1.7), it resulted in either a new repository with the full path included (rdump/load all revs) or an interesting failure mode with a missing node during a copy operation when rdump -r revision_after_path:HEAD was used.
>
> I've also tried using svnsync, but that also results in the full path included, no rebasing.
>
> How did you do it? Also, am I missing something that has been included in a current svn version?
>
> Cheers, Ulli
Re: Breaking up a monolothic repository
On Wed, Sep 11, 2013 at 10:49 PM, Nico Kadel-Garcia nka...@gmail.com wrote:
> Les, disk space isn't the issue for the empty revs. It's any operations that try to scan or assemble information from the revisions. 5000 empty objects is still a logistical burden, especially if assembling any kind of change history for the new repository.

I don't see how that imposes a bigger computational burden than the same number of unrelated revisions did in the combined repo, which typically is not a problem. We are at rev 186767 on a large multi-project repo which, although I wish it had been created as separate repos for easier future maintenance, does not have serious performance issues.

> And since the new repositories are effectively a rebase of a subset of the code, you don't normally *gain* anything from having empty revisions for code that is in the other new repositories. You can't meaningfully merge content between the new smaller repositories and the old repo, barring some seriously weird cases, so it's safer to treat them as completely distinct and not bother to preserve all the empty revisions. That the revision numbers are stored in support tickets is the only reason I can think of to keep them.

Or pegged externals if they stay in the same relative location. Or any email, documentation or recorded discussion referring to the changes in a revision. My point is that any change that requires new training or human intervention to fix something is never going to win back that time. Someone who completely understands the current process and user base might be able to optimize and improve it with drastic changes, but that seems unlikely if they are asking for advice on a mail list.

--
Les Mikesell
lesmikes...@gmail.com
Re: Breaking up a monolothic repository
Les, disk space isn't the issue for the empty revs. It's any operations that try to scan or assemble information from the revisions. 5000 empty objects is still a logistical burden, especially if assembling any kind of change history for the new repository.

And since the new repositories are effectively a rebase of a subset of the code, you don't normally *gain* anything from having empty revisions for code that is in the other new repositories. You can't meaningfully merge content between the new smaller repositories and the old repo, barring some seriously weird cases, so it's safer to treat them as completely distinct and not bother to preserve all the empty revisions. That the revision numbers are stored in support tickets is the only reason I can think of to keep them.

On Tue, Sep 10, 2013 at 11:35 AM, Les Mikesell lesmikes...@gmail.com wrote:
> On Tue, Sep 10, 2013 at 6:22 AM, Nico Kadel-Garcia nka...@gmail.com wrote:
> > Even if the history is considered sacrosanct (and this is often a theological policy, not an engineering one!), an opportunity to reduce the size of each repository by discarding deadwood at switchover time should be taken seriously.
>
> Those empty revs take what, a couple of dollars' worth of disk space (OK, x3 or 4 for backups...), vs. how much human time will it take to make everyone involved understand that you use one procedure for revisions before a certain date, and a different one after, and to get diffs between them you have to either check out both copies and use local tools or map the rev number from your old reference to the new numbering scheme? And then there are likely to be pegged externals to pull in components that you'll have to fix even if they stay within the same project repo and use relative notation.
>
> I'd call not unnecessarily changing the history you use a version control system to preserve to be 'philosophically correct' as opposed to a theological requirement. If your engineering choices were always right the first time, you probably wouldn't have all these revisions in the first place.
>
> --
> Les Mikesell
> lesmikes...@gmail.com
Re: Breaking up a monolothic repository
Hello Trent W. Buck, on Tuesday, 10 September 2013 at 02:49 you wrote:
> ...hm, still 1.6. Is it worth me backporting a newer svn?

I would give it a try: get yourself a current build of 1.8, dump your old repo, load it into a repository newly created with your 1.8 version, and see how much space is saved. Your version information about the repo looks current enough to already use representation sharing, but depending on how the upgrades were made (svnadmin upgrade vs. a full dump/load cycle), there may be old duplicate data in the repo that was created before svnadmin upgrade. Besides that, 1.8 made improvements to reduce disk space, too.

Kind regards,
Thorsten Schöning
--
Thorsten Schöning, E-Mail: thorsten.schoen...@am-soft.de
AM-SoFT IT-Systeme, http://www.AM-SoFT.de/
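The dump/load comparison suggested above could be sketched roughly as follows. This is only an illustration: the function name compare_after_reload is made up, and it assumes that the svnadmin found on PATH is the 1.8 build and that the second argument is a path that does not exist yet.

```shell
#!/bin/sh
# Hypothetical helper: reload OLD_REPO into a freshly created NEW_REPO
# and print the on-disk size of both for comparison.
compare_after_reload() (
    old_repo=$1; new_repo=$2
    # Create the target with the svnadmin on PATH (assumed to be 1.8).
    svnadmin create "$new_repo" || exit 1
    # Full dump/load cycle, so all data is rewritten in the new format.
    svnadmin dump --quiet "$old_repo" | svnadmin load --quiet "$new_repo" || exit 1
    du -sh "$old_repo" "$new_repo"
)

# Example (not run here):
#   compare_after_reload /home/svn/PI /tmp/PI-reloaded
```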
Re: Breaking up a monolothic repository
On 9/9/2013 8:49 PM, Trent W. Buck wrote:
> I'm partway through provisioning the replacement Debian 7 server, which will have
>
>     subversion 1.6.17dfsg-4+deb7u3
>     apache2    2.2.22-13
>
> ...hm, still 1.6. Is it worth me backporting a newer svn?

Yes, it's worth installing 1.8.3.

http://www.wandisco.com/subversion/download#debian7
Re: Breaking up a monolothic repository
> Have you checked if the users have/need anything (emails, ticket system, etc.) that refer to specific revisions or the history of changes made there? It seems kind of drastic to throw that away because you think the numbers aren't pretty enough.

But keeping thousands of empty commits in a project they're not relevant to is confusing and wasteful. The repository and repository URLs for the old project should be preserved, if possible, locked down and read-only, precisely for this kind of change history. But since the repository is being completely refactored *anyway*, it's a great opportunity to discard debris.

Even if the history is considered sacrosanct (and this is often a theological policy, not an engineering one!), an opportunity to reduce the size of each repository by discarding deadwood at switchover time should be taken seriously.
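One way to lock down the old repository as read-only, sketched here under assumptions, is a start-commit hook that always fails: Subversion runs that hook before creating a commit transaction and rejects the commit when it exits non-zero, passing the hook's stderr back to the client. The function name make_repo_read_only and the message text are invented for this example, and it does not address revision property changes or server-level access rules.

```shell
#!/bin/sh
# Hypothetical helper: install a start-commit hook that rejects every commit
# to the repository whose on-disk path is given as $1.
make_repo_read_only() (
    hook="$1/hooks/start-commit"
    cat > "$hook" <<'EOF'
#!/bin/sh
echo "This repository is a read-only archive; commit to the per-project repositories instead." >&2
exit 1
EOF
    chmod +x "$hook"
)

# Example (not run here):
#   make_repo_read_only /srv/svn/Frobozz
```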
Re: Breaking up a monolothic repository
On Tue, Sep 10, 2013 at 6:22 AM, Nico Kadel-Garcia nka...@gmail.com wrote:
> Even if the history is considered sacrosanct (and this is often a theological policy, not an engineering one!), an opportunity to reduce the size of each repository by discarding deadwood at switchover time should be taken seriously.

Those empty revs take what, a couple of dollars' worth of disk space (OK, x3 or 4 for backups...), vs. how much human time will it take to make everyone involved understand that you use one procedure for revisions before a certain date, and a different one after, and to get diffs between them you have to either check out both copies and use local tools or map the rev number from your old reference to the new numbering scheme? And then there are likely to be pegged externals to pull in components that you'll have to fix even if they stay within the same project repo and use relative notation.

I'd call not unnecessarily changing the history you use a version control system to preserve to be 'philosophically correct' as opposed to a theological requirement. If your engineering choices were always right the first time, you probably wouldn't have all these revisions in the first place.

--
Les Mikesell
lesmikes...@gmail.com
Re: Breaking up a monolothic repository
On 9/10/2013 7:22 AM, Nico Kadel-Garcia wrote:
> But keeping thousands of empty commits in a project they're not relevant to is confusing and wasteful. The repository and repository URLs for the old project should be preserved, if possible, locked down and read-only, precisely for this kind of change history. But since the repository is being completely refactored *anyway*, it's a great opportunity to discard debris.

When we moved from a monolithic repository to per-client repositories a few years ago, we went ahead and:

- Rebased the paths up one or two levels (old system was something like monolithicrepo/[a-z]/[client directories]/[job directory]) so that the urls were now clientrepo/[job directory]. That was a tricky thing to do and we had to 'sed' the output of the dump filter before importing it back. It broke a few things, such as svn:externals which were not relative-pathed, but was worth it in the long run so that our URLs got shorter.
- Made sure that the new repos all had unique UUIDs.
- Renumbered all of the resulting revisions as we loaded things back in.

But we didn't have to deal with any bug tracking systems that referred to a specific revision. And having lower revision numbers was preferred, along with dropping revisions that referred to other projects.

> Even if the history is considered sacrosanct (and this is often a theological policy, not an engineering one!), an opportunity to reduce the size of each repository by discarding deadwood at switchover time should be taken seriously.

Less of an issue now that svn 1.8 has revprop packing (plus the rev packing from 1.6). That deadwood takes up a lot less space in terms of the number of files in the file system.

And the fact that svnadmin hotcopy is now incremental in 1.8 also makes it less of an issue. Having a few thousand (or tens of thousands of) revisions in a repository is no longer a big bottleneck during the hotcopy process like it was before.

Our backup system is also a lot happier with fewer files to back up.
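As a rough illustration of the packing and incremental hotcopy mentioned above, a backup step on Subversion 1.8 might look like the sketch below. The function name backup_repo and the paths in the usage comment are hypothetical, and incremental hotcopy is assumed to be available for the repository's FSFS back end.

```shell
#!/bin/sh
# Hypothetical helper: pack REPO, then refresh BACKUP_DIR with an
# incremental hotcopy (needs the svnadmin from Subversion 1.8 or later).
backup_repo() (
    repo=$1; backup_dir=$2
    # Packing merges completed shards into single files, reducing the file count.
    svnadmin pack "$repo" || exit 1
    # Only revisions not yet present in the backup are copied.
    svnadmin hotcopy --incremental "$repo" "$backup_dir"
)

# Example (not run here):
#   backup_repo /var/svn/brc-jobs-zp /mnt/backup/brc-jobs-zp
```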
RE: Breaking up a monolothic repository
-----Original Message-----
From: t...@elba.apache.org [mailto:t...@elba.apache.org] On Behalf Of Trent W. Buck
Sent: Monday, September 09, 2013 11:38 PM
To: users@subversion.apache.org
Subject: Re: Breaking up a monolothic repository

> Les Mikesell lesmikes...@gmail.com writes:
> > On Mon, Sep 9, 2013 at 7:23 PM, Trent W. Buck trentb...@gmail.com wrote:
> > > Ryan Schmidt subversion-20...@ryandesign.com writes:
> > > > As someone used to Subversion's usually sequential revision numbers, that bugs me aesthetically, but it works fine.
> > >
> > > I think that's the crux of it.
> >
> > Have you checked if the users have/need anything (emails, ticket system, etc.) that refer to specific revisions or the history of changes made there? It seems kind of drastic to throw that away because you think the numbers aren't pretty enough.
>
> That is an extremely valid point. I'll check.
>
> > > Also part of the reason to split up the repos is to make access control easier, and it looks bad if Alice (who should have access to project 1 but not project 2) can see Bob's old commit metadata to project 2, even if she can't see the commit bodies after the split.
> >
> > How does this work now in the combined repository?
>
> Right now, they don't have it with the combined repo. Anyone in the svn group can read everything. (This is one of the reasons they want to break up the single repo into per-project repos.)

You should knock that reason off the list. You can set up path-based authorization fairly easily (especially compared to breaking it up into multiple repos).

Bob
Re: Breaking up a monolothic repository
On Tue, Sep 10, 2013 at 4:36 PM, Bob Archer bob.arc...@amsi.com wrote:
> > > > Also part of the reason to split up the repos is to make access control easier, and it looks bad if Alice (who should have access to project 1 but not project 2) can see Bob's old commit metadata to project 2, even if she can't see the commit bodies after the split.
> > >
> > > How does this work now in the combined repository?
> >
> > Right now, they don't have it with the combined repo. Anyone in the svn group can read everything. (This is one of the reasons they want to break up the single repo into per-project repos.)
>
> You should knock that reason off the list. You can set up path-based authorization fairly easily (especially compared to breaking it up into multiple repos).

Unless you already have a central authentication source, you'll have a certain tradeoff in complexity between maintaining password control for multiple repos vs. path-based control in a single one, and if there are external references where different groups use each other's libraries, it can be a little messy either way.

--
Les Mikesell
lesmikes...@gmail.com
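For reference, path-based authorization in a single repository is driven by an access file that the server is pointed at (AuthzSVNAccessFile for Apache's mod_authz_svn, authz-db for svnserve). The sketch below writes a small example of such a file. Everything specific in it is an assumption made for illustration: the function name, the group names, the users alice and bob (borrowed from the Alice/Bob example in this thread), and the /projects/project1 and /projects/project2 layout.

```shell
#!/bin/sh
# Hypothetical helper: write a sample Subversion authz file to the path in $1.
write_sample_authz() {
    cat > "$1" <<'EOF'
[groups]
project1-devs = alice
project2-devs = bob

# Nobody can read anything unless a more specific rule grants it.
[/]
* =

[/projects/project1]
@project1-devs = rw

[/projects/project2]
@project2-devs = rw
EOF
}

# Example (not run here):
#   write_sample_authz /etc/subversion/authz
```

Rules on deeper paths override rules on their parents, which is why the empty grant on [/] can be combined with per-project read/write grants.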
Re: Breaking up a monolothic repository
Hello Trent W. Buck, on Monday, 9 September 2013 at 03:13 you wrote:
> What else can I do?

Tell us about the size of your repo, its format version and the primary data types versioned, as you can always simply clone the entire repo into one for each project needed and then delete and move the unneeded contents per new project repo with a Subversion client. The current format of the repo and its primary data types are interesting because, if it's pretty old, current repo versions may provide significantly reduced disk space per repo, making the overhead of duplicating the original one acceptable.

Kind regards,
Thorsten Schöning
--
Thorsten Schöning, E-Mail: thorsten.schoen...@am-soft.de
AM-SoFT IT-Systeme, http://www.AM-SoFT.de/
Re: Breaking up a monolothic repository
On Sun, Sep 8, 2013 at 8:13 PM, Trent W. Buck trentb...@gmail.com wrote:
> I'm stuck. Since it's no fun to have tens of thousands of empty revs in each project repo, my current approach is to leave existing projects in the monolithic repo, and new projects get separate repos.

Why do you think an empty rev will bother anyone any more in a per-project repo than having the rev number jump from a commit to an unrelated project does in the combined repo? It shouldn't be a problem in either case. Rev numbers for any particular use don't need to be sequential, you just need to know what they are.

--
Les Mikesell
lesmikes...@gmail.com
RE: Breaking up a monolothic repository
I can see Trent's viewpoint that people are weird and get freaked out by the unexpected (where they might expect the revision numbers to be relatively low).

I guess what we should be providing him are points like the ones you make, to help him sell why this isn't an issue to the end users.

Like Les says, if someone performs a large batch of commits to a particular branch then the trunk revision numbers are going to leap forward (unexpectedly). So what to sell those folks concerned about it is that they're experiencing this already.

--
David Grierson - SDLC Tools Specialist
Sky Broadcasting - Customer Business Systems - SDLC Tools

-----Original Message-----
From: Les Mikesell [mailto:lesmikes...@gmail.com]
Sent: 09 September 2013 13:32
To: Trent W. Buck
Cc: Subversion
Subject: Re: Breaking up a monolothic repository

On Sun, Sep 8, 2013 at 8:13 PM, Trent W. Buck trentb...@gmail.com wrote:
> I'm stuck. Since it's no fun to have tens of thousands of empty revs in each project repo, my current approach is to leave existing projects in the monolithic repo, and new projects get separate repos.

Why do you think an empty rev will bother anyone any more in a per-project repo than having the rev number jump from a commit to an unrelated project does in the combined repo? It shouldn't be a problem in either case. Rev numbers for any particular use don't need to be sequential, you just need to know what they are.

--
Les Mikesell
lesmikes...@gmail.com
Re: Breaking up a monolothic repository
On Mon, Sep 9, 2013 at 8:03 AM, Grierson, David david.grier...@bskyb.com wrote:
> I can see Trent's view point that people are weird and get freaked out by the unexpected (where they might expect the revision numbers to be relatively low).

I could see that for someone who had never used subversion before and did not understand the concept of global revision numbers, but not for anyone who has used a multi-project repository.

> I guess what we should be providing him are points like you do make to help him sell why this isn't an issue to the end users.
>
> Like Les says, if someone performs a large batch of commits to a particular branch then the trunk revision numbers are going to leap forward (unexpectedly). So what to sell those folks concerned about it is that they're experiencing this already.

Revision numbers aren't something you guess at or expect anything from. They are only useful in terms of the repository history, and it doesn't matter if your project runs sequentially or not. If you want names/numbers that make human sense, you'll be copying to tags for easier reference anyway.

--
Les Mikesell
lesmikes...@gmail.com
Re: Breaking up a monolothic repository
On Sep 9, 2013, at 07:31, Les Mikesell wrote:
> On Sun, Sep 8, 2013 at 8:13 PM, Trent W. Buck wrote:
> > I'm stuck. Since it's no fun to have tens of thousands of empty revs in each project repo, my current approach is to leave existing projects in the monolithic repo, and new projects get separate repos.
>
> Why do you think an empty rev will bother anyone any more in a per-project repo than having the rev number jump from a commit to an unrelated project does in the combined repo? It shouldn't be a problem in either case. Rev numbers for any particular use don't need to be sequential, you just need to know what they are.

This is true. Heck, if you use a dvcs like git or hg you'll get a completely random revision number (shaped like a sha1 hash) every time. As someone used to Subversion's usually sequential revision numbers, that bugs me aesthetically, but it works fine.

There are also some reasons why keeping the revision numbers from the old monolithic repository in your new repositories (with empty padding revisions in between) is a really good idea. Have you ever referenced revision numbers in your issue tracker ("fixed in r111"; "r222 broke xyz") or in emails ("can you explain what you did in r333"; "r444 is a great example of abc") or in commit messages ("reverted r555"; "added file forgotten in r666")? If so, you don't want to renumber revs, because that would invalidate all those references.
Re: Breaking up a monolothic repository
Ryan Schmidt subversion-20...@ryandesign.com writes:
> As someone used to Subversion's usually sequential revision numbers, that bugs me aesthetically, but it works fine.

I think that's the crux of it.

Also part of the reason to split up the repos is to make access control easier, and it looks bad if Alice (who should have access to project 1 but not project 2) can see Bob's old commit metadata to project 2, even if she can't see the commit bodies after the split.
Re: Breaking up a monolothic repository
Thorsten Schöning tschoen...@am-soft.de writes:
> Tell us about the size of your repo it's format version and primary data types versioned

(Sorry for not giving this info earlier, and shifting the goal posts -- I personally went rcs-arch-darcs-git and never really used svn, so I'm feeling pretty noob attacking this problem.)

du reports it is 18GiB. The current revno is 16115.

    $ grep . /home/svn/PI/{format,db/fs-type,db/format}
    /home/svn/PI/format:5
    /home/svn/PI/db/fs-type:fsfs
    /home/svn/PI/db/format:4
    /home/svn/PI/db/format:layout sharded 1000

As to what kind of files are in there -- I'm not actually sure. Just doing a dumb look at HEAD's list of files,

    $ svn ls -R file:///home/svn/PI | wc -l
    269281

And looking at the most common extensions:

    $ svn ls -R file:///home/svn/PI | sed -n 's/.*\.//p' | sort | uniq -c | sort -nr | head -20
      36581 h              2438 txt
      21732 patch          2375 sh
      17621 html           2362 i
      15023 c              2121 bmp
       8143 py             1957 mk
       3919 cpp            1932 po
       3559 png            1916 class
       3074 gif            1813 lua
       2950 xml            1742 cs
       2585 properties     1613 hpp

Obviously that's not weighted by size, and completely ignores anything that's not in HEAD anymore.

* * *

It's currently hosted on an Ubuntu 10.04 server, so my server svn is quite old:

    subversion 1.6.6dfsg-2ubuntu1.3
    apache2    2.2.14-5ubuntu8.12

I believe some of the users have svn 1.7 on their desktops, but not all.

I'm partway through provisioning the replacement Debian 7 server, which will have

    subversion 1.6.17dfsg-4+deb7u3
    apache2    2.2.22-13

...hm, still 1.6. Is it worth me backporting a newer svn?
Re: Breaking up a monolothic repository
trentb...@gmail.com (Trent W. Buck) writes:
> So then I thought to chain the two approaches. This didn't work -- the empty revs were not removed. I guess svndumpfilter --drop-empty-revs is only smart enough to drop the revs that have just *become* empty?
>
>     rm -rf delete-me-3
>     svnadmin create delete-me-3
>     svnadmin dump delete-me-2 | svndumpfilter --drop-empty-revs exclude /canthappen | svnadmin load delete-me-3

A helpful offlist correspondent noted svn 1.8 has --drop-all-empty-revs, so I might try building that long enough to try that option.
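With svndumpfilter from Subversion 1.8 or later, the chained attempt quoted above could be retried with --drop-all-empty-revs, which removes every empty revision in the stream instead of only those emptied by the filter itself. The following is a sketch under that assumption. The function name drop_all_empty_revs is made up, and /canthappen is kept from the original command as a path that matches nothing.

```shell
#!/bin/sh
# Hypothetical helper: copy SYNCED_REPO into a freshly created NEW_REPO,
# dropping all empty revisions on the way (needs svndumpfilter 1.8+).
drop_all_empty_revs() (
    synced_repo=$1; new_repo=$2
    svnadmin create "$new_repo" || exit 1
    # The exclude pattern matches nothing; the filter is run only for its
    # --drop-all-empty-revs side effect.
    svnadmin dump --quiet "$synced_repo" |
        svndumpfilter exclude --quiet --drop-all-empty-revs /canthappen |
        svnadmin load --quiet "$new_repo"
)

# Example (not run here):
#   drop_all_empty_revs delete-me-2 delete-me-3
```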
Re: Breaking up a monolothic repository
On Mon, Sep 9, 2013 at 7:23 PM, Trent W. Buck trentb...@gmail.com wrote:
> Ryan Schmidt subversion-20...@ryandesign.com writes:
> > As someone used to Subversion's usually sequential revision numbers, that bugs me aesthetically, but it works fine.
>
> I think that's the crux of it.

Have you checked if the users have/need anything (emails, ticket system, etc.) that refer to specific revisions or the history of changes made there? It seems kind of drastic to throw that away because you think the numbers aren't pretty enough.

> Also part of the reason to split up the repos is to make access control easier, and it looks bad if Alice (who should have access to project 1 but not project 2) can see Bob's old commit metadata to project 2, even if she can't see the commit bodies after the split.

How does this work now in the combined repository?

--
Les Mikesell
lesmikes...@gmail.com
Re: Breaking up a monolothic repository
Les Mikesell lesmikes...@gmail.com writes:
> On Mon, Sep 9, 2013 at 7:23 PM, Trent W. Buck trentb...@gmail.com wrote:
> > Ryan Schmidt subversion-20...@ryandesign.com writes:
> > > As someone used to Subversion's usually sequential revision numbers, that bugs me aesthetically, but it works fine.
> >
> > I think that's the crux of it.
>
> Have you checked if the users have/need anything (emails, ticket system, etc.) that refer to specific revisions or the history of changes made there? It seems kind of drastic to throw that away because you think the numbers aren't pretty enough.

That is an extremely valid point. I'll check.

> > Also part of the reason to split up the repos is to make access control easier, and it looks bad if Alice (who should have access to project 1 but not project 2) can see Bob's old commit metadata to project 2, even if she can't see the commit bodies after the split.
>
> How does this work now in the combined repository?

Right now, they don't have it with the combined repo. Anyone in the svn group can read everything. (This is one of the reasons they want to break up the single repo into per-project repos.)
Breaking up a monolothic repository
I have inherited a single monolithic repo for all the company's projects. I want to migrate to one repo per project. (One-way, one-time migration.)

Following the red-bean book[0], I first tried svnadmin, which was really slow, and eventually crashed because some files were copied into projects/133_Redacted from a different subdir.

    rm -rf delete-me
    svnadmin create delete-me
    svnadmin dump /srv/svn/Frobozz | svndumpfilter --drop-empty-revs include projects/133_Redacted | svnadmin load delete-me
    [...]
    svndumpfilter: Invalid copy source path '/EE/ProjectDocs/133_Redacted/REDACTED.pdf'
    svnadmin: Can't write to stream: Broken pipe
    Started new transaction, based on original revision 4182
    svnadmin: File not found: transaction '0-0', path 'projects/133_Redacted'
         * adding path : projects/133_Redacted ...

Freenode's #svn IRC channel advised me to use svnsync instead. That was really slow, eventually succeeded, but left a tonne of empty commit messages.

    rm -rf delete-me-2
    svnadmin create delete-me-2
    ln -s /bin/true delete-me-2/hooks/pre-revprop-change
    svnsync init file://$PWD/delete-me-2 file:///srv/svn/Frobozz/projects/133_Redacted
    svnsync sync file://$PWD/delete-me-2
    rm delete-me-2/hooks/pre-revprop-change

So then I thought to chain the two approaches. This didn't work -- the empty revs were not removed. I guess svndumpfilter --drop-empty-revs is only smart enough to drop the revs that have just *become* empty?

    rm -rf delete-me-3
    svnadmin create delete-me-3
    svnadmin dump delete-me-2 | svndumpfilter --drop-empty-revs exclude /canthappen | svnadmin load delete-me-3

I also thought of converting to git fast-export format and back again, but AFAICT there is no way to import a fast-export into a svn repo.

I'm stuck. Since it's no fun to have tens of thousands of empty revs in each project repo, my current approach is to leave existing projects in the monolithic repo, and new projects get separate repos.

What else can I do?
[0] http://svnbook.red-bean.com/en/1.7/svn.reposadmin.maint.html
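One possible way around the "Invalid copy source path" error above, sketched under assumptions rather than tested against this repository: svndumpfilter include accepts more than one path prefix, so the directory the files were copied from can be kept alongside the project. The extra prefix EE/ProjectDocs/133_Redacted is taken from the error message; whether further copy sources would show up afterwards is unknown, so the list may need extending. The variable names, the split_cmd helper, and the RUN=1 switch are illustrative only.

```shell
#!/bin/sh
# Sketch: filter one project out of the monolithic dump, also keeping the
# directory its files were copied from. Paths are assumed to contain no
# spaces, since the pipeline is built as a plain string.
SRC_REPO=/srv/svn/Frobozz
DEST_REPO=delete-me
# Prefixes to keep: the project itself plus the copy source named in the
# svndumpfilter error message.
INCLUDE_PATHS="projects/133_Redacted EE/ProjectDocs/133_Redacted"

split_cmd() {
    # Print the pipeline so it can be reviewed before it is run.
    echo "svnadmin dump --quiet $SRC_REPO | svndumpfilter --drop-empty-revs include $INCLUDE_PATHS | svnadmin load --quiet $DEST_REPO"
}

if [ "${RUN:-0}" = "1" ]; then
    # Real run: create the target repository, then execute the pipeline.
    svnadmin create "$DEST_REPO"
    eval "$(split_cmd)"
else
    # Default: dry run, only show what would be executed.
    split_cmd
fi
```

Run as-is, this only prints the pipeline; RUN=1 executes it. The resulting repository would also contain EE/ProjectDocs/133_Redacted, which could be moved or deleted in a later commit.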
Re: Breaking up a monolithic repository
Lock the existing repo: Do clean exports, and imports, to new repositories with the new layout, with a README.md or other guideline to where the legacy repository exists. You lose the infinitely preserved history this way, but for most working software projects, you don't *need* that. And it's a good opportunity to discard materials, such as bulky binaries or security sensitive files with plain text passwords.

On Sun, Sep 8, 2013 at 9:13 PM, Trent W. Buck trentb...@gmail.com wrote:
> I have inherited a single monolithic repo for all the company's projects. I want to migrate to one repo per project. (One-way, one-time migration.)
> [...]
Re: Breaking up a monolithic repository
Nico Kadel-Garcia nka...@gmail.com writes:

> Lock the existing repo: Do clean exports, and imports, to new repositories with the new layout, with a README.md or other guideline to where the legacy repository exists. You lose the infinitely preserved history this way, but for most working software projects, you don't *need* that. And it's a good opportunity to discard materials, such as bulky binaries or security sensitive files with plain text passwords.

Ah, sorry, I forgot to mention that preserving history was a hard requirement handed down from higher up.

I get the impression that $company's projects mostly have a finite lifespan (a couple of years), so I think that approach ends up being very similar to my current plan of creating new projects as new repos, and letting the monolithic repo die out via attrition.

I don't actually know exactly what they put in their repos; I think it's about half "huge unpacked source tarball I downloaded from somewhere then tinkered with" and half "huge CAD files and .docx contracts".
RE: Breaking up a monolithic repository
From: Trent W. Buck
Sent: Monday, 9 September 2013 12:17 PM

> Nico Kadel-Garcia nka...@gmail.com writes:
>> Lock the existing repo: Do clean exports, and imports, to new repositories with the new layout, with a README.md or other guideline to where the legacy repository exists. You lose the infinitely preserved history this way, but for most working software projects, you don't *need* that. And it's a good opportunity to discard materials, such as bulky binaries or security sensitive files with plain text passwords.
>
> Ah, sorry, I forgot to mention that preserving history was a hard requirement handed down from higher up.

You *could* argue that the existing repository preserves the history. However, I think I know what they mean.

> I get the impression that $company's projects mostly have a finite lifespan (a couple of years),

By lifespan, what exactly do you mean? At my company, the individual projects might be in production within anywhere from 6 months to 2 years after start of development, be manufactured for two to four years, then go into support mode for up to 7 years (or more).

> so I think that approach ends up being very similar to my current plan of creating new projects as new repos, and letting the monolithic repo die out via attrition.

That sounds like an easy way to do things.

> I don't actually know exactly what they put in their repos; I think it's about half "huge unpacked source tarball I downloaded from somewhere then tinkered with" and half "huge CAD files and .docx contracts".

It's entirely possible that the empty commit messages you reported were due to users not actually entering anything in the messages. Many of the commit messages I've seen (particularly from non-software people, but even from a few of those) are less informative than I'd like - a lot are totally empty.

Regards,
Geoff
Re: Breaking up a monolithic repository
Geoff Field geoff_fi...@aapl.com.au writes:

>> I get the impression that $company's projects mostly have a finite lifespan (a couple of years),
>
> By lifespan, what exactly do you mean? At my company, the individual projects might be in production within anywhere from 6 months to 2 years after start of development, be manufactured for two to four years, then go into support mode for up to 7 years (or more).

That's probably a more accurate way of putting it. But the bottom line is migration through attrition ought to work.

> It's entirely possible that the empty commit messages you reported were due to users not actually entering anything in the messages. Many of the commit messages I've seen (particularly from non-software people, but even from a few of those) are less informative than I'd like - a lot are totally empty.

Ah, sorry, I wasn't clear. Supposing the repo has two subdirs:

    projects/1_Muffins
    projects/2_Cakes

Then when I use svnsync to make a repo that only contains projects/2_Cakes, I still have a bunch of commits that WERE making changes to projects/1_Muffins -- so they have commit messages and authors and times and suchlike metadata -- but they don't actually *do* anything anymore, because the files they edited aren't in projects/2_Cakes.

If there were only two projects, it wouldn't be too bad, but suppose 100 projects, with 1000 commits each. If I use svnsync, I end up with 100 repos, each of which has 99,000 useless commits.
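For the already-empty revisions that svnsync leaves behind, a sketch assuming Subversion 1.8 or later is available: that release added a --drop-all-empty-revs option to svndumpfilter, which removes every empty revision in the dump stream except revision 0, whereas --drop-empty-revs only removes revisions emptied by the filtering itself. The repository names follow the earlier messages in this thread; the compact_cmd helper and the RUN=1 switch are illustrative only, and this has not been run against the repository discussed here.

```shell
#!/bin/sh
# Sketch: compact an svnsync mirror by dropping every empty revision.
# Assumes svndumpfilter from Subversion 1.8+ (--drop-all-empty-revs).
MIRROR_REPO=delete-me-2   # per-project mirror produced by svnsync
DEST_REPO=delete-me-3     # new repository without the empty revisions

compact_cmd() {
    # "exclude /canthappen" filters out nothing; it is only there because
    # svndumpfilter needs a subcommand and a path prefix.
    echo "svnadmin dump --quiet $MIRROR_REPO | svndumpfilter --drop-all-empty-revs --renumber-revs exclude /canthappen | svnadmin load --quiet $DEST_REPO"
}

if [ "${RUN:-0}" = "1" ]; then
    # Real run: create the target repository, then execute the pipeline.
    svnadmin create "$DEST_REPO"
    eval "$(compact_cmd)"
else
    # Default: dry run, only show what would be executed.
    compact_cmd
fi
```

Dropping revisions changes the revision numbers in the new repository, so any emails or tickets that cite the old numbers would no longer match.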