Efficiency of rep-sharing (deduplication) in 1.8 and later
I have a question about how efficient SVN is at de-duplication within a repository with regard to files that appear in multiple locations but have the same content. I know a small improvement was made in 1.8... http://subversion.apache.org/docs/release-notes/1.8.html#fsfs-enhancements

"When representation sharing has been enabled, Subversion 1.8 will now be able to detect files and properties with identical contents within the same revision and only store them once. This is a common situation when you for instance import a non-incremental dump file or when users apply the same change to multiple branches in a single commit."

#1 - If a commit puts files A, B and C into the repository, and a later commit puts files B, C and D into the repository at a different location, is SVN smart enough to realize that B and C are already stored in the repository? In other words, does it track each individual file separately, even if they were all part of one big revision?
Re: Repository Structure Question
On 1/2/2014 5:25 PM, Mike Fochtman wrote: Currently the team hasn't used any form of version control on these applications because 'it would be too hard...'

I think you can get 99% of the way there by making sure that application 'A' is under full version control. Some version control is better than no version control, so tackle project A first.

I'm part of a small development team (currently 4). We have two applications used in-house that consist of about 1900 source files. The two applications share about 1880 of the files in common, and there are only about 20 different between them. For a lot of complicated reasons I won't go into here, we can't split the common files into a shared-library sort of project. Most of our development goes on in application 'A'. Currently we transfer the changes over to the other application 'B' development machine manually and build/test that one.

I would put application B into the same repository under a 2nd root directory. The primary reason that I recommend a single repository for both applications is so that SVN 1.8's duplicate detection will keep your repository size under control. So you would have:

/projectA/(trunk|branches|tags)
/projectB/(trunk|branches|tags)

There are a few ways to tackle moving stuff from project A to project B. Most of them involve making sure that the unique files not shared across the applications are in a separate directory.

One method would be to check out project B's files, then use svn export to overlay project A's files into project B's working copy. It's messy, but it duplicates your existing process. http://svnbook.red-bean.com/en/1.7/svn.ref.svn.c.export.html

Another option would be to branch A's trunk (or a stable release tag) into B's trunk, then apply B's changes to make the application look like B.
http://svnbook.red-bean.com/en/1.7/svn.branchmerge.using.html

Or you could combine the approaches and set up your repository like:

/Common/(trunk|branches|tags)
/projectA/(trunk|branches|tags)
/projectB/(trunk|branches|tags)
/buildA/(trunk|branches|tags)
/buildB/(trunk|branches|tags)

Where the files unique to project A are in /projectA/trunk, the files unique to project B are in /projectB/trunk, and the files common to both applications are in /Common/trunk. The /buildA/trunk tree is then where you use svn:externals to weld files from Common + projectA together into something that builds for application A. And /buildB/trunk is where you use svn:externals to weld together the application B build. http://svnbook.red-bean.com/en/1.7/svn.advanced.externals.html
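As a rough sketch, the svn:externals property on /buildA/trunk might look something like the following (the caret-relative URLs and the target directory names here are examples, not something from the thread; the URL-first syntax requires svn 1.5 or later):

```
^/Common/trunk common
^/projectA/trunk appA
```

You would set it with something like svn propset svn:externals -F externals.txt . from a working copy of /buildA/trunk, then commit; a checkout of /buildA/trunk then pulls both trees down into one buildable directory.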
Re: svn hotcopy
On 12/26/2013 3:42 PM, Listman wrote: I am using svn 1.5.5 and I back up with hotcopy. I am starting to see that my repository, which is 50G, is backing up as 48G with hotcopy. I can't figure it out and my friend Google is not helping at all. Does anyone have a clue?

I agree with Thorsten. There may be leftover cruft in your repository directory, which gets ignored during the hotcopy process. Use svnadmin verify on the repository to check for errors. (You should probably be running svnadmin verify on a weekly or monthly basis anyway.)

Also, upgrading to 1.8 is a very good idea. Since you are using 1.5, I recommend a full svnadmin dump / svnadmin load cycle to put it into 1.8 format. Because of the changes in the repository format between 1.5 and 1.8, you might even see a 5-20% reduction in the size of your repository. (We averaged about 15% size reduction going from 1.6 to 1.8, plus a huge reduction in the number of individual files due to revision and revprop packing. YMMV.)
Re: Upgrade Subversion Repository from 1.5 into 1.8
On 12/16/2013 9:03 AM, Krishnamoorthi Gopal wrote: Thanks for your clarification, Pavel. If I use existing repositories in Subversion 1.8, then how can I benefit from the features in the new version? Shall I use commands like svnadmin upgrade to upgrade my existing repos to the latest?

As Mark says, an svnadmin dump and svnadmin load cycle is the best way to upgrade older SVN repositories to 1.8, because it will completely convert them into the 1.8 format (including the new space-saving additions to the repository format). However, you don't have to do it all at once. You could start running SVN 1.8 on the server, then upgrade the individual repositories to the 1.8 format at your leisure. We spread our migration out over a few weeks (going from 1.6 to 1.8 format), so during the migration period we had a mix of repository formats on the server.

Client-side working copies, however, are much more all-or-nothing. When the client moves to 1.8, all of the working copies also have to be upgraded to 1.8. And we still have a few 1.6 and 1.7 clients talking to our 1.8 server.

Naturally, you should be making good backups of your SVN repositories daily. And the dump/load cycle is a good time to copy the dump files off to long-term storage.
Re: Update-Only Checkout Enhancement
On 12/10/2013 8:45 PM, Mark Kneisler wrote: I have several environments where I'd like to use a SVN checkout, but where I'd never ever want to make changes to the files or perform a commit. For these environments, I'd only want to perform an update or an update to revision.

In cases where you do not want a .svn directory and you are using Linux, take a look at FSVS: http://fsvs.tigris.org/

This is a command-line tool that works very similarly to the svn command-line tool and talks to an SVN repository. We make heavy use of it to version-control our Linux servers (especially the files under /usr/local, /boot and /etc). The big difference between FSVS and SVN on a Linux box is that FSVS does not create a .svn folder in the root. I don't know off-hand whether FSVS can be used in Cygwin under Windows.
Re: Update-Only Checkout Enhancement
On 12/11/2013 2:19 PM, Bob Archer wrote: On 11.12.2013 17:21, Mark Kneisler wrote: I think making the pristine files optional would work for me. Here's an idea. Instead of having pristine copies of all files, how about adding to the pristine directory only when a file is changed? You know, that's a great idea! I wonder why we never thought of it ourselves? :) Wouldn't that mean that you need to have some daemon service (or file watcher or something) running to determine if a file is modified? Also, it would mean you would need a constant connection to the server to use a subversion working copy.

Not necessarily. Take a look at how FSVS does its magic. http://fsvs.tigris.org/ It functions in a similar manner to the svn command-line tool, but works without requiring a .svn folder. Which is why I prefer it for doing version control of system configuration files on a Linux server.
Re: Tools for projecting growth of repository storage?
On 12/2/2013 7:58 PM, Eric Johnson wrote: Anyone have a suggestion for a tool that projects the growth of repository storage? I've got repos taking over 75% of a disk volume, and I'm curious to project out when I'll need new storage. Obviously, this is approximate, but has anyone got a tool for it? Eric.

We keep our repositories on a dedicated file system (ext4) and run collectd on the box to track file system space usage (the df plugin). Combine that with a graphing tool for collectd that can read the RRD files (such as the web-based CGP front-end) and we get nice pretty charts. http://imgur.com/xDZ9BGu

As you can see in weeks 27-29, we had some runaway growth, which alerted me that I needed to take a look at what was being automatically committed. In our case, it was FSVS doing automated commits of a Linux box where we should have ignored/excluded some additional directories.

When looking at my quarterly graph (13 weeks), CGP gives me numbers like: Used (Minimum) 96.9GB, Used (Last) 99.2GB - which means I have only seen 2.3GB of growth over 13 weeks, or about 10GB per year at the current rate of growth.

We also run a small script each day that checks the file systems and sends an alert if any file system is over 75% full.
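That daily check can be sketched in a few lines of shell (the 75% threshold and the df -P output format are the only assumptions; the "alert" here is just a printout - wire it to mail or your monitoring system as needed):

```shell
#!/bin/sh
# Print any file system whose usage exceeds the given threshold percentage.
# Parsing is split into a function that reads `df -P`-style output on stdin,
# so the logic is easy to test against canned input.
check_usage() {
    # $1 = threshold; prints "mountpoint usage%" for anything above it.
    awk -v limit="$1" 'NR > 1 { pct = $5; sub(/%/, "", pct); if (pct + 0 > limit) print $6, $5 }'
}

# Daily cron job body: report anything over 75% full.
df -P | check_usage 75
```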
Re: svn backup
On 10/7/2013 3:37 PM, rvaede wrote: I am confused on how to back up my repository.

Note: Making a raw file-level backup of a SVN repository is not advisable. There may be open files, files that changed while the backup is running, etc. So you will need to use one of the hotcopy / dump methods to get a good snapshot of the repository state for inclusion onto a backup tape/disk/set.

If you want to back up everything, including server-side hook scripts and the like which are stored under the repository directory, take a look at svnadmin hotcopy. It's essentially an rsync (but not quite) of the original repository directory. The hotcopy directories (since they only change once per day, or whenever you do the hotcopy) are ideal for being used to back up the repository. http://svnbook.red-bean.com/nightly/en/svn.ref.svnadmin.c.hotcopy.html

The svnadmin dump output is more suitable for long-term archival of the SVN repository because it stores the data in a platform-neutral format. It will be much larger than the original repository (even with gzip -5) and will take a long time to perform the dump. http://svnbook.red-bean.com/nightly/en/svn.ref.svnadmin.c.dump.html

The third option is to run a hot-spare system using svnsync. Except that this does not give you generational backups, so you still need to use either hotcopy/dump as well. http://svnbook.red-bean.com/nightly/en/svn.ref.svnsync.html

Personally: I prefer hotcopy for daily backups, then using rdiff-backup to make a backup of the hotcopy directory on our main backup server. The rdiff-backup step gives me the ability to go back to any day within the past 26 weeks (configurable). Combined with the use of generational media, I have multiple copies of the rdiff-backup target directory. (rsnapshot would also be a good choice.) I'll also make svnadmin dumps every few months, but it takes a long time to do, uses up a lot of disk space (3x for us) and has few advantages compared to the rdiff-backup of the hotcopy directory.
Re: Breaking up a monolothic repository
On 10/2/2013 10:36 AM, Ullrich Jans wrote: I'm now facing the same problem. My users want the rebasing, but during the dump/load instead of after the fact (apparently, it causes issues with their environment when they need to go back to an earlier revision to reproduce something). They also want to keep the empty revisions (for references from the issue tracker). I haven't tried it with svnadmin dump followed by svndumpfilter (I don't think it has that capability).

The command we ended up using back in May 2011 when we did this looked like the following. It's been two years, but I'm pretty sure these two scripts are all we ended up using.

- We had a master dump of the entire brc-jobs repository.
- Target repository name was brc-jobs-zp (CLCODE).
- It takes the dump and splits it into a smaller chunk (CLPATH).
- Had to edit the script for each new client/path that we wanted to split out.

It does *not* attempt to rebase the individual projects up to the root directory. It *is* possible to use 'sed' on the resulting dump file to do this, but it is tricky. (Note: the mail stripped some shell characters from these scripts; I have restored the obvious ones, such as the redirect into gzip's output file.)

#!/bin/bash
SOURCE=/mnt/scratch/svn-dump-brc-jobs.may2011.dump.gz
DESTDIR=/var/svn/
DESTPFX=svn-raw-brc-jobs-
DESTSFX=10xx.dump.gz
CLCODE=zp
CLPATH=Z/ZP_SingleJobs
SDFOPTS='--drop-empty-revs --renumber-revs'

date
echo ${DESTDIR}${DESTPFX}${CLCODE}${DESTSFX}
svnadmin dump --quiet /var/svn/brc-jobs | \
  svndumpfilter include --quiet $SDFOPTS $CLPATH | \
  gzip > ${DESTDIR}${DESTPFX}${CLCODE}${DESTSFX}
date

The mirror to this was the script that created the new SVN repository and loads in the individual dump. Note the commented-out 'sed' lines where we attempted to rebase individual project folders back up to the root of the repository. They didn't work, so we ended up just doing a move operation in the TortoiseSVN repository browser.

- It changes the UUID of the newly created repository to be something unique instead of using the old repo's UUID.
- Had to be edited anew for each new client/path.
#!/bin/bash
SRCDIR=/var/svn/
SRCPFX=svn-raw-brc-jobs-
SRCSFX=10xx.dump.gz
DESTDIR=/var/svn/
DESTPFX=svn-newbase-brc-jobs-
DESTSFX=10xx.dump.gz
SDFOPTS='--quiet --drop-empty-revs --renumber-revs'
CLPARENT=Z
CLCODE=zp

date
#gunzip -c ${SRCDIR}${SRCPFX}${CLCODE}${SRCSFX} | \
#  sed "s/Node-path: $CLPATH\//Node-path: /" | \
#  sed "s/Node-copyfrompath: $CLPATH\//Node-copyfrompath: /" | \
#  gzip > ${DESTDIR}${DESTPFX}${CLCODE}${DESTSFX}
svn mkdir -m "Import from brc-jobs" file:///var/svn/brc-jobs-${CLCODE}/${CLPARENT}
gunzip -c ${SRCDIR}${SRCPFX}${CLCODE}${SRCSFX} | \
  svnadmin load --quiet /var/svn/brc-jobs-${CLCODE}
svnlook uuid /var/svn/brc-jobs-${CLCODE}
svnadmin setuuid /var/svn/brc-jobs-${CLCODE}
svnlook uuid /var/svn/brc-jobs-${CLCODE}
svnadmin pack /var/svn/brc-jobs-${CLCODE}
chmod -R 775 /var/svn/brc-jobs-${CLCODE}
chmod -R g+s /var/svn/brc-jobs-${CLCODE}/db
chgrp -R svn-brc-jobs /var/svn/brc-jobs-${CLCODE}
date

I do wish I could have figured out the 'sed' commands to move a project from /Z/ZP_SingleJobs/JOBNR to be just /JOBNR in the repository, but there wasn't time. For rebasing, that's probably your missing piece... which I don't have.
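For what it's worth, a hedged guess at why the commented-out 'sed' attempt failed: the dump format spells the header "Node-copyfrom-path" (with a hyphen), not "Node-copyfrompath", and copyfrom revision numbers also shift when --renumber-revs is used. A minimal sketch of just the path-rebasing part (untested against a real production dump, so treat it as a starting point only; it does nothing about copyfrom revisions):

```shell
#!/bin/sh
# Sketch: rebase /Z/ZP_SingleJobs/JOBNR down to /JOBNR in a dump stream.
# CLPATH is the prefix to strip; the header names follow the SVN dump
# format (Node-path, Node-copyfrom-path). Reads stdin, writes stdout.
CLPATH='Z/ZP_SingleJobs'
rebase_paths() {
    sed -e "s|^Node-path: $CLPATH/|Node-path: |" \
        -e "s|^Node-copyfrom-path: $CLPATH/|Node-copyfrom-path: |"
}
```

Usage would be to splice rebase_paths into the gunzip-to-gzip pipeline where the commented-out sed lines sit, then load the result into a scratch repository and diff the trees before trusting it.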
Re: Push ?
On 9/15/2013 11:32 AM, Dan White wrote: The issue is that the client end of the transaction is in a DMZ. A connection from a DMZ to one's internal network is a very high security risk. What I was hoping for was a way to define a very specific connection from the Subversion server to the DMZ client (push). This is considered to be a much lower security risk.

One way to handle this is to use SSH to access the specific SVN repository.

1. Use a no-password SSH public-key pair that the DMZ host can use to punch through to the SSH port on the internal SVN server. (Naturally, SSH should be set to disallow root login and to only allow public-key authentication.)

- If you can't change everyone over to using public keys and disable password-based authentication for SSH, then you should run a 2nd SSHD process on a different port that only allows specific accounts to log in and requires public-key authentication. Then you can set up your DMZ-to-SVN-server firewall to only allow access to the alternate SVN SSH port from the DMZ.

2. Give the SSH account read-only access to the SVN repo that it needs.

3. Lock down what the SSH account can do via its authorized_keys entry, to just:

command="/usr/bin/svnserve -t -r /var/svn",no-agent-forwarding,no-pty,no-port-forwarding,no-X11-forwarding ssh-rsa ...

(The quotes around the command were eaten by the mail; the options must all be on one line in front of the key.) Since the account will have very limited permissions on the SVN machine (read-only access), there's not a whole lot that someone could do with the account. Plus the use of the command= line means they'd have to figure out a way to escape the svnserve program in order to get a command-line on the SVN machine.
Re: Breaking up a monolothic repository
On 9/9/2013 8:49 PM, Trent W. Buck wrote: I'm partway through provisioning the replacement Debian 7 server, which will have subversion 1.6.17dfsg-4+deb7u3, apache2 2.2.22-13 ...hm, still 1.6. Is it worth me backporting a newer svn?

Yes, it's worth installing 1.8.3. http://www.wandisco.com/subversion/download#debian7
Re: Breaking up a monolothic repository
On 9/10/2013 7:22 AM, Nico Kadel-Garcia wrote: But keeping thousands of empty commits in a project they're not relevant to is confusing and wasteful. The repository and repository URL's for the old project should be preserved, if possible, locked down and read-only, precisely for this kind of change history. But since the repository is being completely refactored *anyway*, it's a great opportunity to discard debris.

When we moved from a monolithic repository to per-client repositories a few years ago, we went ahead and:

- Rebased the paths up one or two levels (the old system was something like monolithicrepo/[a-z]/[client directories]/[job directory]) so that the URLs were now clientrepo/[job directory]. That was a tricky thing to do and we had to 'sed' the output of the dump filter before importing it back. It broke a few things, such as svn:externals which were not relative-pathed, but it was worth it in the long run so that our URLs got shorter.
- Made sure that the new repos all had unique UUIDs.
- Renumbered all of the resulting revisions as we loaded things back in. But we didn't have to deal with any bug-tracking systems that referred to a specific revision. And having lower revision numbers was preferred, along with dropping revisions that referred to other projects.

Even if the history is considered sacrosanct (and this is often a theological policy, not an engineering one!), an opportunity to reduce the size of each repository by discarding deadwood at switchover time should be taken seriously.

This is less of an issue now that svn 1.8 has revprop packing (plus the rev packing from 1.6). That deadwood takes up a lot less space in terms of the number of files in the file system. And the fact that svnadmin hotcopy is now incremental in 1.8 also makes it less of an issue. Having a few thousand (or tens of thousands of) revisions in a repository is no longer a big bottleneck during the hotcopy process like it was before.
Our backup system is also a lot happier with fewer files to back up.
Re: Error after server upgrade to 1.8.3 - E160052: Revprop caching disabled
On 9/5/2013 6:41 PM, Gordon Moore wrote: Is this a known issue with 1.8.3? Any ideas on what is going on, how I can investigate, or what I might do to correct this?

I'd start with:

- How are you accessing the SVN repository? http? svn+ssh? svn?
- What are the ownership and permissions on the /svn/repos/build folder, the build/db folder, and the contents of the build/db folder?
- If you are running with SELinux set to Enforcing, try setting it temporarily to Permissive and see if the issue goes away.
- What user account are you using when trying to svnadmin dump / svnadmin verify?
Re: How Big A Dump File Can Be Handled? (svn 1.8 upgrade)
On 8/21/2013 7:13 PM, Geoff Field wrote: I'm keeping the original BDB repositories, with read-only permissions. If I really have the need, I can restart Apache 2 with SVN 1.2.3 and go back to the original repositories. Otherwise, I also have the option of re-running my batch file (modifying it if absolutely required). On top of that, there are bunches of files on another server that give us at least the latest state of the projects. The dump files in this case are not really as useful as the data itself. Regards, Geoff

When we did our 1.6 to 1.8 upgrade a few weeks ago, I used the following steps (ours was an in-place upgrade, so a bit of extra checking was added). The fragments below come from a script that loops over all of our repositories, hence the 'continue' statements.

0. Back everything up, twice.

1. Check the version of the repository to see whether it is already 1.8:

BASE='/var/svn/'
TARGET='/backup/svndump/'
DIR='somereponame'
SVNADMIN=/path/to/svnadmin

REPOFMT=`grep '^[123456]$' ${BASE}${DIR}/db/format`
echo "FSFS database format is $REPOFMT"
if [ $REPOFMT -ge 6 ]; then
    echo "Format is already 6, not upgrading."
    continue
fi

Note: That was a quick-n-dirty check that was valid for our configuration. To be truly correct, you need to verify reponame/format, reponame/db/fs-type and reponame/db/format.

2. Strip permissions on the original repo down to read-only.

3. Run svnadmin verify on the original repository:

echo "Run svnadmin verify..."
$SVNADMIN verify --quiet ${BASE}${DIR}
status=$?
if [ $status -ne 0 ]; then
    echo "svnadmin verify failed with status: $status"
    continue
else
    echo "svnadmin verify succeeded"
fi

4. Do the svnadmin dump, piping the output into gzip -5 (moderate compression):

echo "svnadmin dump..."
$SVNADMIN dump --quiet ${BASE}${DIR} | gzip -5 --rsyncable > ${TARGET}${DIR}.dump.gz
status=$?
if [ $status -ne 0 ]; then
    echo "svnadmin dump failed with status: $status"
    continue
fi

5. Remove the old repository directory:

echo "Remove old repository (dangerous)"
rm -rf ${BASE}${DIR}
status=$?
if [ $status -ne 0 ]; then
    echo "remove failed with status: $status"
    continue
fi

6.
Create the repository in svn 1.8:

echo "Recreate repository with svnadmin"
$SVNADMIN create ${BASE}${DIR}
status=$?
if [ $status -ne 0 ]; then
    echo "svnadmin create failed with status: $status"
    continue
fi

7. Strip permissions on the repository back down to 700, owned by root:root, while we reload the data.

8. Fix the db/fsfs.conf file to take advantage of new features. Note: Make sure you understand what enable-dir-deltification, enable-props-deltification and enable-rep-sharing do. Some of these are not turned on in SVN 1.8 by default.

echo "Fix db/fsfs.conf file"
sed 's/^[#[:space:]]*enable-rep-sharing = false[#[:space:]]*$/enable-rep-sharing = true/g;s/^[#[:space:]]*enable-dir-deltification = false[#[:space:]]*$/enable-dir-deltification = true/g;s/^[#[:space:]]*enable-props-deltification = false[#[:space:]]*$/enable-props-deltification = true/g' --in-place=.bkp ${BASE}${DIR}/db/fsfs.conf
status=$?
if [ $status -ne 0 ]; then
    echo "sed adjustment of db/fsfs.conf failed with status: $status"
    continue
fi

9. Load the repository back from the dump file:

echo "svnadmin load..."
gzip -c -d ${TARGET}${DIR}.dump.gz | $SVNADMIN load --quiet ${BASE}${DIR}
status=$?
if [ $status -ne 0 ]; then
    echo "svnadmin load failed with status: $status"
    continue
fi

10. Run svnadmin pack to pack the revs/revprops files (saves on inodes):

echo "svnadmin pack..."
$SVNADMIN pack --quiet ${BASE}${DIR}
status=$?
if [ $status -ne 0 ]; then
    echo "svnadmin pack failed with status: $status"
    continue
fi

11. Run svnadmin verify again:

echo "Run svnadmin verify..."
$SVNADMIN verify --quiet ${BASE}${DIR}
status=$?
if [ $status -ne 0 ]; then
    echo "svnadmin verify failed with status: $status"
    continue
else
    echo "svnadmin verify succeeded"
fi

12. Restore the original permissions. Note: I have a custom script that I can run to set permissions correctly on our repository directories. I never set file system permissions by hand on the repositories; I always update the script and then use that.
(With a few hundred repositories, I have to be organized and rely on scripts.)

13. Back everything up again, twice.

All-in-all, it took us a few days to convert 110GB of repositories (mostly in 1.6 format), but the resulting size was only 95GB with far fewer files (due to revprop packing in 1.8). Our nightly backup window went from about 3 hours down to 30 minutes by using svnadmin hotcopy --incremental. We then use rdiff-backup to push the hotcopy directory to a backup server.
Re: How Big A Dump File Can Be Handled? (svn 1.8 upgrade)
On 8/22/2013 7:11 PM, Geoff Field wrote: 6. Create the repository in svn 1.8. I'm sure there's an upgrade command that would do it all in-place. 7. Strip permissions on the repository back down to 700, owned by root:root while we reload the data. While, or before?

Step 6 created the repos in our system with writable permissions, so we had to make sure nobody could commit to the repo while we loaded back in the dump file in step 9. Most restores for us took about 5-10 minutes; a few of our larger repos took a few hours.

On your OS, is there a way to read the permissions first?

Mmm, we could have used stat -c 0%a /path/to/file, but with the script to set our permissions, and because we structure our repos as category-reponame, we can set permissions across entire categories easily with the script. Since we use svn+ssh, repository permissions matter a bit more for us.
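To expand on reading the permissions first: with GNU coreutils this is a one-liner (BSD/macOS stat uses a different flag syntax, -f '%Lp'; the path below is an example):

```shell
#!/bin/sh
# Capture the octal permission bits of a path with GNU `stat` so they
# can be restored after a temporary chmod.
orig_perms() { stat -c '%a' "$1"; }
```

Usage sketch: saved=$(orig_perms /var/svn/somerepo); chmod -R 700 /var/svn/somerepo; ...reload...; chmod "$saved" /var/svn/somerepo.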
Re: server config
On 8/20/2013 1:19 AM, olli hauer wrote: On 2013-08-20 01:41, Nico Kadel-Garcia wrote: I think he meant subversion-1.6.11, which is the default version for CentOS 6.4. Check the SELinux settings in /etc/sysconfig/selinux. Set the line to 'SELINUX=permissive' (or disabled). After changing the SELINUX value a reboot is required. Additionally, add a trailing '/' so your config looks like this.

A better way to handle SELinux issues is to:

# getenforce - To see whether you are in permissive or enforcing mode
# setenforce permissive - Run this before doing your tests

Then use the various SELinux troubleshooting tools to see what errors were logged while in permissive mode. Once you have fixed your issues, you can use setenforce enforcing and then re-run your tests. The command-line troubleshooting tool is:

# sealert -a /var/log/audit/audit.log
Re: server config
On 8/19/2013 6:19 PM, Ben Reser wrote: On 8/19/13 9:07 AM, Scott Frankel wrote: I'm new to SVN server configuration and find myself setting up a CentOS 6.4 server with svn version 1.6.1, following the red-bean book. I'd strongly urge you not to use 1.6.1; see the list of applicable security issues here: http://subversion.apache.org/security/ If you're using the CentOS packages they may have patched those issues without updating the svn version number. You should check that, though. If you're setting up a new server I wouldn't start with 1.6.x but would go straight to 1.7.x or 1.8.x, probably 1.8.x if you can.

For the 1.8.1 RPMs, I suggest adding the WANDisco repository to your configuration. http://www.wandisco.com/subversion/download What you're looking for is "Download Subversion Installer V1.8.1 for Redhat". You download a shell script which then needs to be executed to install the WANDisco repositories and the SVN 1.8.1 RPMs.
Re: server config
On 8/19/2013 12:42 PM, David Chapman wrote: How many repositories do you have? You shouldn't use SVNParentPath if you have only one repository; use SVNPath. I don't know if that is the direct cause of your problem, but you should fix it. I suggest planning for multiple repositories from the get-go. Some things in SVN land work better when you dedicate a separate repository to it. We started with one monolithic repository, but have since split that into ~300 smaller repositories.
Re: Suggestion to change the name Subversion
On 8/12/2013 8:03 PM, Nico Kadel-Garcia wrote: No one else remember the old Satan monitoring toolkit, that had an option to change the displayed name and icon to Santa? The name Subversion has enough positive reputation that changing it, just to avoid NSA style monitoring, seems very destabilizing to a popular project. Let's not change it.

We get around this whole issue with our users by always saying "Subversion" instead of "subversion", so that it's clear we're talking about a proper noun instead of a verb, or by just using "SVN".
Re: hotcopy --incremental auto-upgrade Re: Backup strategy sanity check
On 8/10/2013 6:28 PM, Daniel Shahaf wrote: Daniel Shahaf wrote on Sun, Aug 11, 2013 at 01:25:24 +0300: Thomas Harold wrote on Sat, Aug 10, 2013 at 10:53:43 -0400: With the 'svnadmin hotcopy --incremental' backups, we have to do extra checking in the script (comparing reponame/db/format versions) in order to make sure that the hotcopy runs correctly. If you have to do that, please do it correctly:

- Check reponame/format
- Check reponame/db/fs-type
- Check reponame/db/format

... in this order.

I'll make a note of what to check in the future. Maybe add --incremental-or-force (or some similar name) as an option. It would, if the format/fs-type/db-format do not match, fall back to doing a full-blown hotcopy backup instead of an incremental one. But it's a 'nice-to-have' and not 'required', because format changes like this tend to be a once-every-2-3-years thing. So we just work around it.
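The checks above can be sketched like this (the paths in the usage comment are examples; the if/else is essentially the wished-for --incremental-or-force behavior done by hand):

```shell
#!/bin/sh
# Compare the three format markers between a source repository and a
# hotcopy target, in the order Daniel lists: format, db/fs-type, db/format.
# Returns success only if all three files exist in both and are identical.
formats_match() {
    for f in format db/fs-type db/format; do
        cmp -s "$1/$f" "$2/$f" || return 1
    done
    return 0
}

# Usage sketch:
# if formats_match /var/svn/repo /backup/hotcopy/repo; then
#     svnadmin hotcopy --incremental /var/svn/repo /backup/hotcopy/repo
# else
#     rm -rf /backup/hotcopy/repo
#     svnadmin hotcopy /var/svn/repo /backup/hotcopy/repo
# fi
```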
svnadmin verify performance - CPU bottleneck
On our setup (10k RPM SAS RAID-10 across 6 spindles, AMD Opteron 4180 2.6GHz), we're finding that svnadmin verify is CPU-bound and only uses a single CPU core. Is it possible that svnadmin verify could be multi-process in the future to spread the work over more cores? Or is that technically impossible? (Our current workaround is to divide the list of repositories up and run multiple concurrent svnadmin verify scripts.)
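Our current workaround can be sketched with xargs -P (the /var/svn parent directory and the job count in the usage comment are assumptions to adapt to your own layout and core count):

```shell
#!/bin/sh
# Run a command once per repository directory, N jobs at a time, so a
# single-threaded tool like `svnadmin verify` can use multiple cores.
# $1 = parent directory, $2 = parallel job count, rest = command to run.
parallel_repos() {
    parent=$1; jobs=$2; shift 2
    ls -d "$parent"/*/ | xargs -P "$jobs" -n 1 "$@"
}

# Usage sketch: parallel_repos /var/svn 4 svnadmin verify --quiet
```

Note that the combined output of the parallel jobs will interleave, so capture per-repository logs (or check exit status) rather than eyeballing one merged stream.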
Re: svn 1.8 migration - directory deltification and revprop packing
On 8/2/2013 3:21 PM, Thomas Harold wrote: Our migration process:

0. svnadmin verify oldreponame
1. svnadmin dump oldreponame
2. svnadmin create newreponame
3. Modify db/fsfs.conf:

[rep-sharing]
enable-rep-sharing = true # defaults to true in 1.8

[deltification]
enable-dir-deltification = true
enable-props-deltification = true

This can be done in automated fashion with sed (or awk):

sed 's/^[#[:space:]]*enable-rep-sharing = false[#[:space:]]*$/enable-rep-sharing = true/g;s/^[#[:space:]]*enable-dir-deltification = false[#[:space:]]*$/enable-dir-deltification = true/g;s/^[#[:space:]]*enable-props-deltification = false[#[:space:]]*$/enable-props-deltification = true/g' --in-place=bkp newreponame/db/fsfs.conf

Naturally, you should have a backup of your db/fsfs.conf file. While sed creates one with --in-place=bkp, you should not rely solely on that method.

4. svnadmin load newreponame
5. svnadmin pack newreponame - Not all of our repos were packed

Sample repository size changes for us:

OLD: Size: 52MB Files: 1310 / NEW: Size: 52MB Files: 1313 - This one is a lot of "add a file, then remove it a few days later without modifications".
OLD: Size: 151MB Files: 18574 / NEW: Size: 60MB Files: 633 - Apache HTTP log files stored with FSVS, 40% of original size.
OLD: Size: 151MB Files: 2540 / NEW: Size: 126MB Files: 551 - Linux server configuration, stored using FSVS, 83% of original.
OLD: Size: 473MB Files: 600 / NEW: Size: 424MB Files: 603 - Another FSVS repository, 90% of original.
OLD: Size: 1080MB Files: 2582 / NEW: Size: 964MB Files: 1785 - FSVS repository, 89% of original.

I haven't seen any repositories bloat up to larger than the original size, but I'm still working through converting our old v1.6 repositories to the v1.8 format.

The bottlenecks for us in the dump/load cycle are:

- CPU for the svnadmin verify step
- GZIP (using the -5 option) during the dump step
- Disk contention during the svnadmin load step
SVN FSFS format value for 1.8?
According to the release notes: http://subversion.apache.org/docs/release-notes/1.8.html#fsfs-enhancements

We should have seen a bump in the FSFS format number to '6' for 1.8. But when I'm looking at the 'format' file inside the FSFS repository directory on the server, I'm seeing '5'.

# svnadmin --version
svnadmin, version 1.8.1 (r1503906)
compiled Jul 17 2013, 10:30:55 on x86_64-unknown-linux-gnu

# svnadmin create /backup/svndump/test3
# cat /backup/svndump/test3/format
5

It makes me think that I still have remnants of 1.7 floating around in the system (which was previously installed from source), even though I removed the old 'svn*' executables from the bin folders and the old 'libsvn*' libraries from the lib folders.
Re: SVN FSFS format value for 1.8?
On 8/4/2013 6:30 PM, Ryan Schmidt wrote: Check the file /backup/svndump/test3/db/format.

# cat /backup/svndump/test3/db/format
6
layout sharded 1000

Okay, so I was looking at the wrong thing. That still raises the question (in my mind) of why the two values are different on a freshly created repository.

# svnadmin create test4
# cat test4/format
5
# cat test4/db/format
6
layout sharded 1000

Or is the format file in the root used for something else?
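As far as I can tell, the top-level reponame/format file is the repository layout version used by the svn libraries, while reponame/db/format carries the FSFS filesystem format number that the release notes talk about. A tiny helper for scripts that need the FSFS number (first line only, since later lines describe the shard layout) might be:

```shell
#!/bin/sh
# Read the FSFS filesystem format number: the first line of
# reponame/db/format. Subsequent lines (e.g. "layout sharded 1000")
# describe the on-disk sharding and are not part of the number.
fsfs_format() { head -n 1 "$1/db/format"; }
```

Usage sketch: [ "$(fsfs_format /var/svn/somerepo)" -ge 6 ] to test for the 1.8 format in an upgrade loop.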
Re: svn 1.8 migration - directory deltification and revprop packing
On 6/11/2013 8:52 AM, C. Michael Pilato wrote: One advantage of being in a room full of Subversion developers, specifically the guy that implemented all this stuff, is that I can ask him directly about how to respond to this mail. :-) Hopefully I will accurately represent the answers Stefan Fuhrmann just gave me to your questions.

On 06/10/2013 03:05 PM, Thomas Harold wrote: a) Why are directory/property deltifications turned off by default?

Stefan's jovial first answer was, "Because I'm a chicken." :-) Fortunately, he didn't stop there. He explained that there is no known correctness risk -- you're not going to damage your repositories by enabling the feature. But he wanted to phase in the feature to allow time to collect real-world data about the amount of space savings folks are actually observing in their repositories. The feature is on by default in his proposed next version of the FSFS format.

b) Is there a global setting file that can be used to enable directory/property deltifications? Or will we have to update the fsfs.conf file for each newly created repository in order to get this feature?

In 1.8, you'll need to toggle this for each new repository manually.

So in order to get full use of the "Directory and property storage reduction" in a retroactive manner, as noted in http://subversion.apache.org/docs/release-notes/1.8.html#fsfs-enhancements, we will need to:

1. svnadmin dump oldreponame
2. svnadmin create newreponame
3. Modify db/fsfs.conf:

[rep-sharing]
enable-rep-sharing = true # defaults to true in 1.8

[deltification]
enable-dir-deltification = true
enable-props-deltification = true

This can be done in automated fashion with sed (or awk).
For example, to change compress-packed-revprops:

sed 's/^[#[:space:]]*compress-packed-revprops = false[#[:space:]]*$/compress-packed-revprops = true/g' --in-place=bkp newreponame/db/fsfs.conf

Or to enable deltification (as well as enable-rep-sharing) all at once:

sed 's/^[#[:space:]]*enable-rep-sharing = false[#[:space:]]*$/enable-rep-sharing = true/g;s/^[#[:space:]]*enable-dir-deltification = false[#[:space:]]*$/enable-dir-deltification = true/g;s/^[#[:space:]]*enable-props-deltification = false[#[:space:]]*$/enable-props-deltification = true/g' --in-place=bkp newreponame/db/fsfs.conf

Naturally, you should have a backup of your db/fsfs.conf file. While sed creates one with --in-place=bkp, you should not rely solely on that method.

4. svnadmin load newreponame
5. svnadmin pack newreponame - not all of our repos were packed

Even without turning on deltification, I'm seeing a size difference between our 1.7 repositories and the ones that we loaded in from dump files in 1.8.1. Possibly from enable-rep-sharing now being set to 'true' by default in 1.8. Test repo #1 went from 2.0G to 1.7G (85% of original). Test repo #2 went from 2.4G down to 2.1G (88% of original). File count in repo #2's fsfs folder only went from 3808 to about 3684 (1820 revisions). Even though we did svnadmin load using 1.8.1, I still had to svnadmin pack to cut the file count down to 1692. Test repo #3 is 10202MB with 47199 files and 45990 revisions (in 1.6 / 1.7). The revs had been packed, but not the revprops under the old system. The gzip'd dump file is 35.8GB. I enabled the two deltification options in db/fsfs.conf before doing the svnadmin load step. Finished size was 7330MB (72% of original) with 92108 files. File count drops to 2380 after packing and repository size drops to 7072MB (69% of original).
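Going back to step 3: the sed expressions above can be sanity-checked on a scratch copy of fsfs.conf before being pointed at a real repository. A minimal sketch (the scratch path and config fragment below are made up for the test; the real defaults in db/fsfs.conf are commented out the same way):

```shell
# Scratch fsfs.conf fragment mimicking the commented-out 1.8 defaults.
cat > /tmp/fsfs.conf.test <<'EOF'
[deltification]
# enable-dir-deltification = false
# enable-props-deltification = false
EOF

# Same sed expression as above, pointed at the scratch file.
sed 's/^[#[:space:]]*enable-dir-deltification = false[#[:space:]]*$/enable-dir-deltification = true/g;s/^[#[:space:]]*enable-props-deltification = false[#[:space:]]*$/enable-props-deltification = true/g' \
    --in-place=bkp /tmp/fsfs.conf.test

cat /tmp/fsfs.conf.test        # both options should now read "= true"
ls /tmp/fsfs.conf.testbkp      # sed's backup copy of the original
```

Note that --in-place=bkp appends the suffix, so the backup lands next to the original as fsfs.conf.testbkp.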
Re: SVN performance -URGENT
On 8/1/2013 10:52 AM, Somashekarappa, Anup (CWM-NR) wrote: Bandwidth is 35.4 MBytes/sec from my system (London) to the server (New York) when I checked with the iperf tool. We are using LDAP:

AuthzLDAPAuthoritative off
AuthType Basic
AuthBasicProvider ldap
AuthName "Windows Credentials"

As per the message after checkout in the TortoiseSVN GUI = 368 MBytes transferred. Actual folder size = 1.15 GB (1236706079 bytes). Number of files = 201,712. Folders = 21,707. Guess this includes the .svn folder as well.

That's a fairly complex working copy with many files/folders. Given that you have 35Mbps (note the lower case b) of bandwidth, an ideal transfer should be somewhere in the 45-60 minute range for a fresh checkout of the entire thing. However, you're obviously bottlenecked somewhere.

On the Linux server side, I suggest installing a tool called atop and monitoring things like how busy the disks are, how busy the CPU cores are, and the network throughput. This will give you an idea of how hard the Linux server is working while sending out the data to the SVN client. For the Windows client, you will need to look at Performance Monitor (perfmon) and Task Manager to see if you are bottlenecking somewhere. Good counters to watch in perfmon are Physical Disk / % Disk Read Time, Physical Disk / % Disk Write Time, Network Interface / Bytes Sent/sec, and Network Interface / Bytes Received/sec.

My guesses at this point would be:

- You're not using an SSD on the Windows client, so there is a lot of disk activity as SVN goes to create the working copy. So your disks are 100% busy and are your bottleneck.
- You're CPU bottlenecked somewhere, either server-side or client-side.
- Maybe you need to consider using sparse working copies or only checking out a portion of the repository at a time (such as only bringing down your project's trunk folder).
- You'll need to do this checkout once to create the initial working copy, then keep the working copy around for a long time.
Future svn update commands will then only transmit the changes over the wire instead of all of the content.
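For a rough sanity check on the 45-60 minute figure, the arithmetic can be sketched like this (the 10 ms per-file overhead is purely an assumed number, chosen to illustrate why 200k small files dominate the raw transfer time):

```shell
# Back-of-envelope checkout time. The per-file overhead is a guess,
# not a measurement; the byte/file counts come from the report above.
BYTES=1236706079   # reported working-copy size
MBITS=35           # link speed, megabits/sec
FILES=201712

EST=$(awk -v b="$BYTES" -v mbps="$MBITS" -v f="$FILES" 'BEGIN {
    wire = (b * 8) / (mbps * 1000000);  # seconds to push the raw bytes
    ovh  = f * 0.010;                   # seconds of assumed per-file overhead
    printf "wire=%.0fs overhead=%.0fs total=%.0fmin", wire, ovh, (wire + ovh) / 60;
}')
echo "$EST"
```

The raw bytes take only a few minutes at 35Mbps; it is the per-file round trips that push a fresh checkout toward the better part of an hour.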
Re: problem building 1.8.1 (sqlite-amalgamation)
From my notes back when I compiled 1.8.0, I had to download the sqlite-amalgamation ZIP file and add it into the source directory. $ cd /usr/local/src/subversion-1.8.0/ $ wget http://www.sqlite.org/sqlite-amalgamation-3071501.zip $ unzip sqlite-amalgamation-3071501.zip $ mv sqlite-amalgamation-3071501 sqlite-amalgamation $ rm sqlite-amalgamation-3071501.zip Once I had the zip file unpacked into the sqlite-amalgamation subdirectory, there were no extra options to be passed to that ./configure script. It saw it automatically. (I have not had time yet to build 1.8.1.) Current version of the sqlite-amalgamation ZIP can be found at: http://www.sqlite.org/download.html
Re: Backup strategy sanity check
On 7/24/2013 4:21 PM, Les Mikesell wrote: Is that better than using svnsync from a remote server plus some normal file backup approach for the conf/hooks directories? I'm not sure; I have not tried out svnsync. We also don't use post-commit hooks (yet). I am under the impression that hotcopy does grab conf/hooks stuff while dump does not, but I can't find anything in the svnbook that says either way at the moment. ... http://svnbook.red-bean.com/en/1.7/svn.reposadmin.maint.html#svn.reposadmin.maint.backup svnsync definitely does not handle some things: The primary disadvantage of this method is that only the versioned repository data gets synchronized—repository configuration files, user-specified repository path locks, and other items that might live in the physical repository directory but not inside the repository's virtual versioned filesystem are not handled by svnsync. ... We also run a svnadmin verify on the rdiff-backup directories each week, combined with verifying the checksums on the rdiff-backup files. The combination of checksums on the rdiff-backups plus 26 weeks of snapshots that I can restore to is, I feel, pretty safe. I try to reexamine the backup strategy every 6 months, but I think I'm in a good spot now with the svnadmin hotcopy / rdiff-backup setup. Which also makes it easy for us to rsync the rdiff-backup folder to an offsite server. The downside is the delay introduced by doing hotcopy only once per day. So the worst case might mean the loss of 20-48 hours of commits. A more frequent svnsync / incremental hotcopy triggered by a post-commit hook would have a much smaller delay.
Re: Backup strategy sanity check
On 7/24/2013 2:59 PM, Andy Levy wrote: I'm planning my upgrade to SVN 1.8 and, to go along with it, setting up a new backup process. Here's what I'm thinking:

* Monday overnight, take a full backup (svnadmin hotcopy, then compress the result for storage)
* Tuesday through Sunday overnights, incremental backups (svnadmin dump --incremental, compress the result)
* After completing the Monday night full backup, purge the previous week's incrementals.
* After completing the Monday night full backup, run svnadmin pack
* Keep the last 6 full backups on local disk (these will also be written to the corporate backup system, so we can go back further if needed).

We simply do svnadmin hotcopy each night, then rdiff-backup that to another server over the network. The rdiff-backups keep 6 months of revisions to the hotcopy folders. Our /var/svn is 122GB (as is the hotcopy location); the rdiff-backup is 173GB. Using rdiff-backup means that I can go back to any point in time in the last 6 months for a particular repository (plus we have hashes/checksums of all files). For offsite purposes, we backup the rdiff-backup directory instead of the hotcopy directory.

What we might do once the 1.8 server is stable is switch to doing the new incremental-style hotcopy on Mon-Sat evenings and do a full hotcopy on Sun. Right now, we address the time it takes to do a full hotcopy of all 300+ repositories by only doing a nightly hotcopy of any repositories that have changes within the last N days (usually 7 days, and usually only a few dozen repositories see activity each week). Doing the hotcopy to a different set of platters also helps.

This is based on the assumption that svnadmin hotcopy is a preferred backup method over svnadmin dump for daily backups, because it grabs everything out of the repository directory while svnadmin dump misses things like hook scripts.
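The "purge the previous week's incrementals" step can be a single find invocation. A sketch on a scratch directory (the file names are hypothetical; in practice you would point the variable at the real incremental-dump folder):

```shell
# Scratch directory standing in for the incremental-dump folder.
BACKUPS=$(mktemp -d)
touch "$BACKUPS/repo1-incr-tue.dump.gz"                  # this week's
touch -d '9 days ago' "$BACKUPS/repo1-incr-old.dump.gz"  # last week's

# Delete incrementals older than 7 days.
# (Dry-run with -print in place of -delete first.)
find "$BACKUPS" -name '*-incr-*.dump.gz' -mtime +7 -delete

ls "$BACKUPS"    # only this week's incremental should remain
```

Note that -mtime +7 means "more than 7 full 24-hour periods old", so a file from exactly a week ago survives until the next night's run.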
Re: How to prune old versions of an artefact?
On 7/15/2013 5:49 AM, Cooke, Mark wrote: For your uses, perhaps you could spin this artifact off into its own repository (use externals from your main repo if required) and then you can archive that repo off whenever necessary? Sound advice for any sort of large artifact, or in the case where automated scripts will push commits into a repository on a regular (sometimes frequent) basis. It not only keeps the size down and gives you the option to do extreme resets to the separate repository, but it also keeps the log history of the original repository from being polluted with all of the automated commits. We do this on the servers that we version control using FSVS. There are sections of the directory tree which need to be heavily monitored for changes (such as configuration reports), but which don't need to pollute the main SVN repository for that machine.
Re: Expected performance
On 7/8/2013 11:32 AM, Naumenko, Roman wrote: Hello, How fast would you expect svn checkout to be from a server like the one below? Assuming everything on the server is functioning as expected. Our bottleneck is usually the CPU, but we're doing svn+ssh access. So I lean towards fewer but more powerful cores. The only time we thrash the disks is when doing the nightly hotcopy of our repositories (total of about 110GB).
Re: Expected performance (svn+ssh)
On 7/8/2013 2:18 PM, Naumenko, Roman wrote: That box has more than enough CPUs (forty), cores are barely utilized. How can access over ssh be configured? I thought it was only http(s) or the svn protocol.

http://svnbook.red-bean.com/en/1.7/svn.basic.in-action.html#svn.advanced.reposurls
http://svnbook.red-bean.com/en/1.7/svn.serverconfig.svnserve.html#svn.serverconfig.svnserve.sshtricks

svn+ssh access has some upsides and downsides. For us, it was simpler to get up and running with it back in 2007 when we were still getting our feet wet with SVN 1.4. We weren't ready to muck around with Apache httpd and SSL certificates to do https access to the repository. We grant access at the repository level via Linux file system permissions. This means that every user needs to have their own system account and belong to a Linux group that owns the repository.

chgrp -R svn-group1 /var/svn/svn-repository1
chmod -R 770 /var/svn/svn-repository1
chmod -R g+s /var/svn/svn-repository1

Where 770 could instead be one of 775, 755, 750 or 700:

770 = owner read/write, group read/write, other none
750 = owner read/write, group read-only, other none

To keep things sane, we do not set permissions by hand, but edit a script that can be re-run to fix permissions on the repositories. Most of our repositories follow a set naming pattern, which makes it easier. The other advantage of svn+ssh is that it works well when using FSVS, because you can edit ~/.ssh/config so that FSVS can log in to the SVN server automatically and push/pull configuration file changes.
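A minimal sketch of such a re-runnable permission-fixing script, demonstrated on a scratch tree (in practice the base would be /var/svn, and you would also chgrp each repository to its group, omitted here since group names are site-specific):

```shell
BASE=$(mktemp -d)                      # stands in for /var/svn in this sketch
mkdir -p "$BASE/svn-repository1/db"

for REPO in "$BASE"/svn-repository*; do
    chmod -R 770 "$REPO"                        # owner+group rw, others none
    find "$REPO" -type d -exec chmod g+s {} +   # new files inherit the group
done

ls -ld "$BASE/svn-repository1"
```

Applying g+s only to directories (via find -type d) avoids setting the setgid bit on regular files, which is all that is needed for group inheritance.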
Re: WebDAV support in future versions of SVN server?
On 6/26/2013 8:15 AM, Nico Kadel-Garcia wrote: On Tue, Jun 25, 2013 at 9:55 AM, Thomas Harold thomas-li...@nybeta.com wrote: Is it still a long-term goal to maintain the ability to mount a SVN repository as a WebDAV folder? Out of curiosity, why do you feel the need for this? Working in a remote copy isn't enough for your uses? Less technical users like the idea of being able to treat the SVN repository as a mapped drive where everything is auto-versioned.
WebDAV support in future versions of SVN server?
Is it still a long-term goal to maintain the ability to mount a SVN repository as a WebDAV folder? Based on this message from 2009: http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1180976 It sounds like the SVN server is still planning on supporting WebDAV clients, but moving the svn client away from talking WebDAV to the HTTP server? But before I go and roll out WebDAV to our users, I'd like to make sure that SVN isn't going to drop WebDAV client support in the next few years.
Re: Advice for changing filename case in SVN on case insensitive system
On 6/20/2013 6:56 PM, Geoff Hoffman wrote: deleting the file from Subversion, then adding the copy with the correct case. Question: Doesn't that blow away revision history? If I didn't care about revision history I would just start over with a fresh repo. If you use svn mv to do the change, it does not blow away the revision history for the file. You can, however, choose to have log output stop on copy. http://svnbook.red-bean.com/en/1.7/svn.ref.svn.html#svn.ref.svn.sw.stop_on_copy (In TortoiseSVN's log viewer, there is a checkbox at the bottom called Stop on copy/rename that you can turn off.) I also thought about doing full-URL svn mv's but it seemed like that could take a very long time to do... It probably will be slow, depending on which access method you use, and each mv would result in a new revision in the repository. I tend to only do server-side moves (URL to URL) for the renaming of upper-level folders in the tree, which is a rare occurrence for us. All other moves we try to do at the working copy level. (As with everything, it's best to do a test on an inconsequential file before doing any mass moves.)
Re: Subversion Exception! line 647: assertion failed (peg_revnum != SVN_INVALID_REVNUM)
On 6/20/2013 11:55 PM, Sandeepan Kundu wrote: Tried going back to 1.7, but it is telling me the project is in a higher version :(( how to use now, my development is getting affected!!! I suggest renaming your old (upgraded to 1.8) working copy out of the way, doing a fresh checkout using 1.7 into a fresh working copy folder, then copying over changed files from the upgraded 1.8 working copy which isn't working. Naturally, making a backup of the borked working copy is strongly suggested if you had uncommitted changes.
Re: Crash in 1.8.0 (db/format layout linear)
On 6/19/2013 5:30 AM, Daniel Shahaf wrote: Does %REPOS_DIR%\db\format contain 4 layout linear? If so, that's a known issue that will be fixed in 1.8.1. Out of curiosity, which versions of SVN produced a layout linear? I'm guessing that was from back in the SVN 1.4 days (repo format #2) as layout sharded was added in SVN 1.5 (repo format #3)? At least, that's the impression that I got from: http://svn.apache.org/repos/asf/subversion/trunk/tools/server-side/fsfs-reshard.py Checked our repositories with the following: find /var/svn/ -maxdepth 3 -name format -exec grep -H 'layout' {} \;
Re: Apache Subversion 1.8.0 Released (Apache module support via DSO through APXS)
On 6/18/2013 8:06 AM, Branko Čibej wrote: We're happy to announce the release of Apache Subversion 1.8.0. While running ./configure (on CentOS 6), I see the following in the output: checking for Apache module support via DSO through APXS... no == WARNING: skipping the build of mod_dav_svn try using --with-apxs == Turns out I had not installed the httpd-devel package. After installing httpd-devel-2.2.15-28.el6.centos.x86_64 from the CentOS repositories, the warning went away.
Re: Apache Subversion 1.8.0 Released (An appropriate version of serf could not be found)
And the last hurdle I had to jump over to get svn 1.8.0 to compile on my CentOS 6 box. When running ./configure, I had the following message show up: checking was serf enabled... no An appropriate version of serf could not be found, so libsvn_ra_serf will not be built. If you want to build libsvn_ra_serf, please install serf 1.2.1 or newer. I downloaded serf from: http://code.google.com/p/serf/downloads/list Ran the ./configure, make, make install steps Then I had to tell SVN's configure command where to find the serf libraries: $ ./configure --with-serf=/usr/local/serf/ I probably could have also downloaded the serf-devel and serf RPMs from Wandisco's site: http://opensource.wandisco.com/rhel/6/svn-1.8/RPMS/x86_64/
Re: svn 1.8 migration - directory deltification and revprop packing
On 6/11/2013 8:52 AM, C. Michael Pilato wrote: One advantage of being in a room full of Subversion developers, specifically the guy that implemented all this stuff, is that I can ask him directly about how to respond to this mail. :-) Hopefully I will accurately represent the answers Stefan Fuhrmann just gave me to your questions. Thank you very much. b) Is there a global setting file that can be used to enable directory/property deltifications? Or will we have to update the fsfs.conf file for each newly created repository in order to get this feature? In 1.8, you'll need to toggle this for each new repository manually. I'll cobble something together with grep, sed/awk, and find to monitor (and update) our fsfs.conf files, then. As for the --deltas option, that has nothing in the world to do with the types of deltas we're discussing here. (As an aside, I would highly recommend that, unless you need your dumpfiles to be smaller, you avoid the --deltas option. The performance penalty of using it isn't worth it.) Right now, the size of our dumpfile directory is 207G, while the hotcopy is only 104G. So the size savings could be big for us. The hotcopy backup is still our preferred solution, with the dump files being a worst-case fallback. #2 - revision property (revprops) files packing a) Will there be a svnadmin pack command like there was for SVN 1.6? Or will we need to do a full dump/load of the repository to pack the revprops? The existing 'svnadmin pack' command will govern both revision and revprop packing, and will keep the two in sync with each other. 'svnadmin upgrade' will also take the opportunity to synchronize the packing status of the revision properties with that of the revision backing files. Thanks, the svn book is light on details about what exactly counts as the minimum amount of work needed for svnadmin upgrade.
Re: svn 1.8 migration - svnadmin hotcopy
On 6/11/2013 10:20 AM, Stefan Sperling wrote: On Tue, Jun 11, 2013 at 10:13:15AM -0400, Thomas Harold wrote: Right now, the size of our dumpfile directory is 207G, while the hotcopy is only 104G. So the size savings could be big for us. The hotcopy backup is still our preferred solution, with the dump files being a worst-case fallback. Please try the new svnadmin hotcopy --incremental. It should accelerate your backup process.

Yes, I'm looking forward to that feature in 1.8. We currently tackle the time issue in two ways:

1) We only svnadmin hotcopy repositories which have changed in the last N days (typically 3 days). Since we have about 300 repositories currently, but we don't do work on things in all 300 constantly, this means we only back up a few dozen repositories each night.

BASE=/var/svn/
DAYS=3
# Directories get randomized with the perl fragment, so that
# they get processed in random order. This makes the backups
# more reliable over the long term in case one directory
# causes problems.
DIRS=`$FIND ${BASE} -maxdepth 3 -name current -mtime -${DAYS} | \
  $GREP 'db/current$' | \
  $SED 's:/db/current$::' | $SED s:^${BASE}:: | \
  perl -MList::Util=shuffle -e 'print shuffle <STDIN>'`

2) We read the svn repositories from one set of spindles and write the hotcopy to a second spindle set. Even with the 104GB and 300 repositories that we have, this only takes ~37 minutes. It still takes 4-5 hours to perform the rdiff-backup step that pushes the hotcopy folder over to our internal backup server, but that's more because of the tens of thousands of revprops files in some of the repositories. Which is another feature in 1.8 that I'm looking forward to.
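The selection pipeline can be exercised stand-alone on a scratch tree (the repo names below are made up, and plain find/grep/sed stand in for the $FIND/$GREP/$SED variables):

```shell
BASE=$(mktemp -d)                                  # stands in for /var/svn
mkdir -p "$BASE/repo-a/db" "$BASE/repo-b/db" "$BASE/repo-c/db"
touch "$BASE/repo-a/db/current" "$BASE/repo-b/db/current"
touch -d '10 days ago' "$BASE/repo-c/db/current"   # repo-c has been idle
DAYS=3

DIRS=$(find "$BASE" -maxdepth 3 -name current -mtime -"$DAYS" \
    | grep 'db/current$' \
    | sed 's:/db/current$::' | sed "s:^$BASE/::" \
    | perl -MList::Util=shuffle -e 'print shuffle <STDIN>')

echo "$DIRS"    # repo-a and repo-b in random order; repo-c is skipped
```

Keying off the mtime of db/current works because FSFS rewrites that file on every commit, so it is a cheap "has this repo changed recently" test.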
svn 1.8 migration - directory deltification and revprop packing
Questions about the 1.8 upgrade path:

#1 - In reading the release notes for 1.8, I'm interested in the directory/property storage reduction as described in: http://subversion.apache.org/docs/release-notes/1.8.html#fsfs-enhancements

Directory and property storage reduction: For each changed node in a FSFS repository, new versions of all parent directories will be created. Larger repositories tend to have relatively deep directory structures with at least one level (branches, modules or projects) having tens of entries. The total size of that data may well exceed the size of the actual change. Subversion 1.8 now supports directory deltification which eliminates most of that overhead. In db/fsfs.conf, you may now enable and disable directory deltification at any time and these settings will be applied in new revisions. For completeness, node properties may now be deltified as well although the reductions in repository size will usually be minimal. By default, directory and property deltification are disabled. You must edit db/fsfs.conf to enable these features. Also, db/fsfs.conf now allows for fine-grained control over how deltification will be applied. See the comments in that file for a detailed description of the individual options.

a) Why are directory/property deltifications turned off by default? What are the risks to enabling them across all repositories? (Yes, we backup daily with svnadmin hotcopy, then rdiff-backup the hotcopy with 6 months of backup history kept. So we can always rewind to any day within the last 6 months.)

b) Is there a global setting file that can be used to enable directory/property deltifications? Or will we have to update the fsfs.conf file for each newly created repository in order to get this feature?

c) Is it a safe assumption that in order to apply this change to an older repository we will need a dump/load cycle? Will we need a full dump, or will a delta-style dump suffice (the --deltas option of the svnadmin dump command)?
#2 - revision property (revprops) files packing a) Will there be a svnadmin pack command like there was for SVN 1.6? Or will we need to do a full dump/load of the repository to pack the revprops? b) Does revprop caching only need to be enabled for http/https access and does it have any effect on svn+ssh access? (All of our users currently use svn+ssh access, but we are considering moving to http/https.)
Re: Subversion access control / Linux users etc.
The issues with passwords is why we ended up going with SSH public-key authentication. Load the SSH key into the SSH agent, unlock it with the passphrase, then don't worry about it again until we reset the SSH agent at logout. Less prompts, happier users. (Plus it makes it harder to get into our servers since we don't allow password authentication.)
Re: Subversion: existing users
On 7/17/2011 2:07 AM, Andy Canfield wrote: The most obvious authorization scheme is that of the host server; if there is a user named andy on that server with a password jackel then I would like to simply be able to talk to the subversion server as user named andy password jackel. This is how ssh and sftp work. But apparently subversion can't handle that. True?

You can use individual accounts; the main trickiness is in making sure that the svn repository directory is group owned, group writable, and that new files created within the repo/db tree are owned by the group and not the individual's primary group. A quick chmod -R g+s repo/db after setting up the repository takes care of that.

Our server only allows SSH public-key authentication, so the only way to log in (other than physically at the console) is via the SSH keys. So the command= line in the authorized_keys files is reasonably secure for our purposes. Very few users actually have a way to get to the shell. And most of those don't even know the password for their account on the server. (Naturally, we run backups daily, just in case someone does figure out how to get a shell through the svnserve process and deletes a repository. But if they can commit to the repository, there are more nefarious things they can do there too.)

We prefix our ssh-rsa lines in the ~/.ssh/authorized_keys file with:

command="/usr/bin/svnserve -t -r /var/svn",no-agent-forwarding,no-pty,no-port-forwarding,no-X11-forwarding

This also has the advantage that the remote URL ends up being:

svn+ssh://servername/repositoryname/path/within/repo

Instead of:

svn+ssh://servername/var/svn/repositoryname/path/within/repo

With SSH ~/.ssh/config files, or by setting up PuTTY sessions correctly, you can get rid of having the usernames / port numbers in the svn+ssh URL. (We run our SSH servers on a non-standard port.)
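Putting those pieces together, a fully assembled example (the host name, port, user, and key below are placeholders, not our real values):

```
# Server side: one line per user key in that user's ~/.ssh/authorized_keys
command="/usr/bin/svnserve -t -r /var/svn",no-agent-forwarding,no-pty,no-port-forwarding,no-X11-forwarding ssh-rsa AAAAB3... alice@workstation

# Client side: ~/.ssh/config entry hiding the username and non-standard port
Host svnserver
    HostName svn.example.com
    Port 2222
    User alice
```

With that in place, the working URL is simply svn+ssh://svnserver/repositoryname/path/within/repo.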
Re: Move to a new repo and keep the history, Part 2
On 7/14/2011 12:29 PM, K F wrote: Recap – I would like to move some directories from one repository to another while keeping the history.

I went through this a few months ago (and maybe this will help). We were using a big monolithic repository for all of our jobs. Our repository was arranged as:

(jobs repository)
/A/ABClient/ABJob1/...
/A/AXClient/AXJob1/...
/A/AXClient/AXJob2/...
/B/BQClient/BQJob1/...
/C/CAClient/CAJob1/...
/C/CMClient/CMJob1/...
/C/CMClient/CMJob2/...

We wanted to split each client out to a different repository. The output repositories would look like:

(jobs-ab)
/ABJob1/...

(jobs-ax)
/AXJob1/...
/AXJob2/...

(jobs-bq)
/BQJob1/...

(jobs-ca)
/CAJob1/...

(jobs-cm)
/CMJob1/...
/CMJob2/...

Which made all the URLs a lot shorter, because the /A/ABClient was often something lengthy like /A/AB_Acme_Border_Wings_Inc.

1) A shell script to split out a specific directory. We had to edit the CLCODE and CLPATH lines for each run (it took 30-40 minutes to parse the monolithic jobs repository and split out a particular client's tree). Each time I did a new client, I had to make sure that everyone was ready for that client's project tree to move. Alternately, I could have made the entire jobs tree read-only for a week... (Apologies if there are errors in this, as I had to quickly edit out some client/company specific paths. I always executed the script as bash -x scriptname so I could spot errors. The date lines are just there so I could keep track of how long it took.)

#!/bin/bash
DESTDIR=/var/svn/
DESTPFX=svn-raw-jobs-
DESTSFX=10xx.dump.gz
CLCODE=bq
CLPATH=B/BQClient
SDFOPTS='--drop-empty-revs --renumber-revs'
date
echo ${DESTDIR}${DESTPFX}${CLCODE}${DESTSFX}
svnadmin dump --quiet /var/svn/jobs | \
  svndumpfilter include --quiet $SDFOPTS $CLPATH | \
  gzip > ${DESTDIR}${DESTPFX}${CLCODE}${DESTSFX}
date

2) Created the new repository (such as jobs-bq for the BQClient). Although you could probably roll that into the import script.
# svnadmin create jobs-bq

3) Edited the following import script for each new run. Loading it up was a fairly quick process. Note that we update the UUID of the new repository to make sure that nobody commits outdated stuff. I gave up on trying to re-base on the fly using sed and simply moved all of the individual job folders into the root of the new repository and then cleaned up the left-over folder. Our script also had to create the letter level in the new repository; otherwise the import had no place to hang itself off of. You may want to drop the chmod/chgrp lines towards the end. For our server, we only use svn+ssh authentication. Each user has their own local account on the server and they belong to a svn-jobs group which gives them read/write access to the entire repository.

#!/bin/bash
SRCDIR=/var/svn/
SRCPFX=svn-raw-jobs-
SRCSFX=10xx.dump.gz
DESTDIR=/var/svn/
DESTPFX=svn-newbase-jobs-
DESTSFX=10xx.dump.gz
CLPARENT=B
CLCODE=bq
date
svn mkdir -m "Import from jobs" \
  file:///var/svn/jobs-${CLCODE}/${CLPARENT}
gunzip -c ${SRCDIR}${SRCPFX}${CLCODE}${SRCSFX} | \
  svnadmin load --quiet /var/svn/jobs-${CLCODE}
svnlook uuid /var/svn/jobs-${CLCODE}
svnadmin setuuid /var/svn/jobs-${CLCODE}
svnlook uuid /var/svn/jobs-${CLCODE}
svnadmin pack /var/svn/jobs-${CLCODE}
chmod -R 775 /var/svn/jobs-${CLCODE}
chmod -R g+s /var/svn/jobs-${CLCODE}/db
chgrp -R svn-jobs /var/svn/jobs-${CLCODE}
date

4) After the load into the new repository, I would access the repository through TortoiseSVN's Repository Browser and drag all of the individual jobs folders from being under /A/ABClient/ up to the root of the new repository. Then I deleted the /A/ABClient/ tree. If I could have figured out how to remap on the fly via sed, I could have avoided step #4. But it was good enough for our purposes, and a few historical entries in the SVN log showing the movement of the folders wasn't a big deal.
5) After we finished the splits, we chmod'd the old repository to be read-only and took it completely offline a month or two later.
Re: Problem Loading Huge Repository
On 6/16/2011 7:05 PM, Bruno Antunes wrote: Do you know any faster way to load the dump file or to filter out some projects/revisions so I can speed up the process? Are you CPU-bound? Or are you limited by disk speed? If you're limited by disk access times, make sure that the source file that you're loading from is on a different disk than the destination repository. Even if you toss the 45GB dump file onto a USB2 external disk, you'll see a speed increase. And if you have a choice of file systems for the repository to be stored on, make sure that it's something which can deal with a few hundred thousand tiny files. On Linux, I'd suggest going with ext4 over ext3. While db/revs in a FSFS repository can have its revisions packed to reduce the file count, the db/revprops folder still consists of one tiny file for every revision in a FSFS repository.
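To put numbers on that, here is a rough file-count sketch for a hypothetical FSFS repository sharded at 1000 revisions per shard, assuming (pre-1.8) that packing collapses each full shard of rev files into one pack file plus one manifest file, while every revprop file remains:

```shell
REVS=45990     # hypothetical revision count
SHARD=1000     # revisions per shard (the "layout sharded 1000" value)

UNPACKED=$((REVS * 2))   # one rev file + one revprop file per revision
# After packing revs only: pack+manifest per full shard, loose tail revs,
# and still one revprop file per revision.
PACKED=$((2 * (REVS / SHARD) + REVS % SHARD + REVS))

echo "unpacked=$UNPACKED packed=$PACKED"
```

Even fully packed on the rev side, the repository is still dominated by tens of thousands of tiny revprop files, which is exactly what stresses ext3-style filesystems and backup tools.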
Re: Problem Loading Huge Repository
On 6/17/2011 10:54 AM, Daniel Shahaf wrote: Thomas Harold wrote on Fri, Jun 17, 2011 at 10:31:43 -0400: And if you have a choice of file systems for the repository to be stored on, make sure that it's something which can deal with a few hundred thousand tiny files. On Linux, I'd suggest going with ext4 over ext3. While db/revs in a FSFS repository can have its revisions packed to reduce the file count, the db/revprops folder still consists of 1 tiny file for every revision in the project in a FSFS repository. revprops/ is sharded. And in 1.7 (including the recent 1.7.0-alpha1) it is packed, too. Good. Another of the many reasons that we're looking forward to 1.7. Even with the sharding, those little revprop files are causing us issues during backups (hotcopy - rdiff-backup). Being able to pack those revprop files is going to make a big difference as the backup process will only have to track 2000-2200 files instead of 30,000 to 50,000. (We have a few long-lived repositories with up to 25k revisions. And I just finished splitting a 22GB repository with 15-16k revs into a bunch of smaller repositories. Now the nightly backup can look at doing a hotcopy on only the repositories with changes in the last 5 days.)
Re: Subversion 1.6.17 Released
On 6/1/2011 7:27 PM, Daniel Shahaf wrote: Be advised: the 1.6.17 tag [1] in our repository does not match the tarballs at the time of this writing. Until we fix this, please use the tarballs or zip archives, and avoid installing 1.6.17 from the tag. Daniel [1] https://svn.apache.org/repos/asf/subversion/tags/1.6.17 I'm guessing that is why the following URL returns a 404 at the moment? http://svn.apache.org/repos/asf/subversion/tags/1.6.17/CHANGES