Efficiency of rep-sharing (deduplication) in 1.8 and later

2014-09-12 Thread Thomas Harold
I have a question about how efficient SVN is at de-duplication within a
repository with regards to files that appear in multiple locations, but
which have the same content.

I know a small improvement was made in 1.8...

http://subversion.apache.org/docs/release-notes/1.8.html#fsfs-enhancements

 When representation sharing has been enabled, Subversion 1.8 will now
 be able to detect files and properties with identical contents within
 the same revision and only store them once. This is a common
 situation when you for instance import a non-incremental dump file or
 when users apply the same change to multiple branches in a single
 commit.

#1 - If a commit puts files A, B and C into the repository, and a later
commit puts files B, C and D into the repository at a different
location, is SVN smart enough to realize that B and C are already stored
in the repository?

In other words, does it track each individual file separately, even if
they were all part of one big revision?
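One way to check this empirically is to commit identical content at two paths and then inspect the FSFS rep-cache directly. This is a rough sketch (the paths are made up, and it assumes svn, svnadmin, and sqlite3 are installed locally); db/rep-cache.db is a SQLite file holding one row per unique stored representation, so shared content does not add rows:

```shell
#!/bin/sh
set -e
REPO=/tmp/dedup-test
rm -rf "$REPO" /tmp/dedup-wc
svnadmin create "$REPO"
svn checkout -q "file://$REPO" /tmp/dedup-wc
cd /tmp/dedup-wc
mkdir dirA dirB
head -c 65536 /dev/urandom > dirA/big.bin
svn add -q dirA dirB
svn commit -q -m "first copy"
# Commit the identical content at a second location.
cp dirA/big.bin dirB/big.bin
svn add -q dirB/big.bin
svn commit -q -m "second copy, identical content"
# With rep-sharing enabled (the default), the second commit reuses the
# first representation instead of storing another ~64 KB.
sqlite3 "$REPO/db/rep-cache.db" 'SELECT COUNT(*) FROM rep_cache;'
```

Comparing `du -s` on the repository before and after the second commit shows the same effect without touching the database.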


Re: Repository Structure Question

2014-01-03 Thread Thomas Harold

On 1/2/2014 5:25 PM, Mike Fochtman wrote:

Currently the team hasn't used any form of version control on these
applications because 'it would be too hard...'


I think you can get 99% of the way there by making sure that application
'A' is under full version control.  Some version control is better than
no version control, so tackle project A first.


I'm part of a small development team (currently 4).  We have two
applications used in-house that consist of about 1900 source files.
The two applications share about 1880 of the files in common, and
only about 20 files differ between them.

For a lot of complicated reasons I won't go into here, we can't split
the common files into a shared-library sort of project.

Most of our development goes on in application 'A'.  Changes are
then transferred over manually to the other application 'B'
development machine, where we build/test that one.


I would put application B into the same repository under a 2nd root
directory.  The primary reason that I recommend a single repository for 
both applications is so that SVN 1.8's duplicate detection will keep 
your repository size under control.  So you would have:


/projectA/(trunk|branches|tags)
/projectB/(trunk|branches|tags)

There are a few ways to tackle moving stuff from project A to project B.

Most of them involve making sure that the unique files not shared 
across the applications are in a separate directory.


One method would be to checkout project B's files, then use svn export 
to overlay project A's files into the project B's working copy.  It's 
messy, but it duplicates your existing process.


http://svnbook.red-bean.com/en/1.7/svn.ref.svn.c.export.html

Another option would be to branch A's trunk (or stable release tag) into 
B's trunk.  Then apply B's changes to make the application look like B.


http://svnbook.red-bean.com/en/1.7/svn.branchmerge.using.html

Or you could combine the approaches and set up your repository like:

/Common/(trunk|branches|tags)
/projectA/(trunk|branches|tags)
/projectB/(trunk|branches|tags)
/buildA/(trunk|branches|tags)
/buildB/(trunk|branches|tags)

Where the files unique to ProjectA are in /projectA/trunk, the files 
unique to project B are in /projectB/trunk.  The files common to both 
applications are in /Common/trunk.


The /buildA/trunk tree is then where you use svn:externals to weld files 
from Common + ProjectA together into something that builds for 
application A.  And /buildB/trunk is where you use svn:externals to weld 
together the application B build.


http://svnbook.red-bean.com/en/1.7/svn.advanced.externals.html
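As a concrete (hypothetical) sketch of that welding step, the externals definition for /buildA/trunk might look like this. The directory names are illustrative, not from the original post; `^/` means repository-root-relative, so the externals survive a repository move:

```shell
#!/bin/sh
# Check out the build shell and attach externals to it.
svn checkout file:///var/svn/repo/buildA/trunk buildA-wc
cd buildA-wc
# Each line is "repo-relative-URL  local-directory".
cat > externals.txt <<'EOF'
^/Common/trunk/src      src
^/projectA/trunk/custom custom
EOF
svn propset svn:externals -F externals.txt .
svn commit -m "Wire build A from Common + projectA via externals"
svn update   # pulls the external directories into the working copy
```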



Re: svn hotcopy

2013-12-28 Thread Thomas Harold

On 12/26/2013 3:42 PM, Listman wrote:

I am using svn 1.5.5 and I back up with hotcopy.  I am starting to see
that my repository, which is 50G, is backing up as 48G with hotcopy.  I
can’t figure it out and my friend Google is not helping at all.  Does
any one have a clue?



I agree with Thorsten.  There may be leftover cruft in your repository 
directory, which gets ignored during the hotcopy process.  Use svnadmin 
verify on the repository to check for errors.  (You should probably be 
running svnadmin verify on a weekly or monthly basis anyway.)
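That periodic verify can be a short cron-driven loop. A minimal sketch, assuming repositories live under /var/svn and that the mail address is a placeholder:

```shell
#!/bin/sh
# Weekly cron job: verify every repository under a parent directory
# and mail a report if any of them fail.
PARENT=/var/svn
FAILED=""
for repo in "$PARENT"/*; do
    [ -f "$repo/format" ] || continue        # skip non-repositories
    if ! svnadmin verify --quiet "$repo" 2>/dev/null; then
        FAILED="$FAILED $repo"
    fi
done
if [ -n "$FAILED" ]; then
    echo "svnadmin verify failed for:$FAILED" \
        | mail -s "SVN verify failures" admin@example.com
fi
```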


Also, upgrading to 1.8 is a very good idea.  Since you are using 1.5, I 
recommend a full svnadmin dump / svnadmin load cycle to put it into 
1.8 format.  Because of the changes in the repository format between 1.5 
and 1.8, you might even see a 5-20% space reduction in the size of your 
repository.


(We averaged about 15% size reduction going from 1.6 to 1.8, plus a huge 
reduction in the number of individual files due to revision and revprop 
packing.  YMMV.)


Re: Upgrade Subversion Repository from 1.5 into 1.8

2013-12-17 Thread Thomas Harold

On 12/16/2013 9:03 AM, Krishnamoorthi Gopal wrote:


Thanks for your clarification pavel..

If I use my existing repositories with Subversion 1.8, how can I benefit
from the features in the new version?

Shall I use a command like 'svnadmin upgrade' to upgrade my existing
repos to the latest?



As Mark says, an 'svnadmin dump' / 'svnadmin load' cycle is the best way 
to upgrade older SVN repositories to 1.8 because it will completely 
convert it into 1.8 format (including the new space-saving additions to 
the repository format).


However, you don't have to do it all at once.  You could start running 
SVN 1.8 on the server, then upgrade the individual repositories to the 
1.8 format at your leisure.  We spread our migration out over a few 
weeks (going from 1.6 to 1.8 format).  So during the migration period we 
had a mix of repository formats on the server.


Client-side working copies, however, are much more all-or-nothing.  When 
the client moves to 1.8, all of the working copies also have to be 
upgraded to 1.8.  And we still have a few 1.6 and 1.7 clients talking to 
our 1.8 server.


Naturally, you should be making good backups of your SVN repositories 
daily.  And the dump/load cycle is a good time to copy the dump files 
off to long-term storage.




Re: Update-Only Checkout Enhancement

2013-12-12 Thread Thomas Harold

On 12/10/2013 8:45 PM, Mark Kneisler wrote:

I have several environments where I’d like to use a SVN checkout, but
where I’d never ever want to make changes to the files or perform a
commit.  For these environments, I’d only want to perform an update or
an update to revision.



In cases where you do not want a .svn directory and you are using Linux, 
take a look at FSVS:


http://fsvs.tigris.org/

This is a command-line tool that works very similarly to the svn 
command-line tool and talks to an SVN repository.  We make heavy use of 
it to version-control our Linux servers (especially the files under 
/usr/local, /boot and /etc).


The big difference between using FSVS and SVN on the Linux box is that FSVS 
does not create a .svn folder in the root.


I don't know off-hand if FSVS can be used in Cygwin under Windows.



Re: Update-Only Checkout Enhancement

2013-12-12 Thread Thomas Harold

On 12/11/2013 2:19 PM, Bob Archer wrote:

On 11.12.2013 17:21, Mark Kneisler wrote:


I think making the pristine files optional would work for me.



Here’s an idea.



Instead of having pristine copies of all files, how about adding
to the pristine directory only when a file is changed?



You know, that's a great idea! I wonder why we never thought of it
ourselves? :)


Wouldn't that mean that you need to have some daemon service (or file
watcher or something) running to determine if a file is modified?
Also, it would mean you would need a constant connection to the
server to use a subversion working copy.



Not necessarily.  Take a look at how FSVS does its magic.

http://fsvs.tigris.org/

It functions in a similar manner to the svn command-line tool, but works 
without requiring a .svn folder, which is why I prefer it for doing 
version control of system configuration files on a Linux server.
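For reference, putting a directory like /etc under FSVS control looks roughly like the following. This is a hedged sketch: the repository URL and ignore pattern are illustrative, and the exact option names should be checked against the FSVS documentation.

```shell
#!/bin/sh
# Associate /etc with a repository URL, ignore volatile files,
# then commit.  (URL and pattern are placeholders.)
cd /etc
fsvs urls svn+ssh://svnserver/var/svn/host-etc/trunk
fsvs ignore './**.lock'
fsvs commit -m "Initial import of /etc"
```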


Re: Tools for projecting growth of repository storage?

2013-12-12 Thread Thomas Harold

On 12/2/2013 7:58 PM, Eric Johnson wrote:

Anyone have a suggestion for a tool that projects the growth of
repository storage?

I've got repos taking over 75% of a disk volume, and I'm curious to
project out when I'll need new storage.

Obviously, this is approximate, but has anyone got a tool for it?

Eric.



We keep our repositories on a dedicated file system (ext4) and run 
collectd on the box to track file system space usage (the df plugin).


Combine that with a graphing tool for collectd that can read the RRD 
files (such as the web-based CGP front-end) and we get nice pretty charts.


http://imgur.com/xDZ9BGu

As you can see in Week 27-29, we had some runaway growth which alerted 
me that I needed to take a look at what was being automatically 
committed.  In our case, it was FSVS doing automated commits of a Linux 
box where we should have ignored/excluded some additional directories.


When looking at my quarterly graph (13 weeks), CGP gives me numbers like:

Used (Minimum) 96.9GB Used (Last) 99.2GB - which means I have only seen 
2.3GB of growth over 13 weeks, or about 10GB per year at current rate of 
growth.


We also run a small script each day that checks the file systems and 
sends an alert if any file system is over 75% full.
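Such a daily check needs only a few lines of df plus awk. A minimal sketch; the 75% threshold and mail address are assumptions:

```shell
#!/bin/sh
# Alert when any file system crosses the usage threshold.
THRESHOLD=75
# df -P gives POSIX-stable columns: $5 = capacity ("80%"), $6 = mount.
df -P | awk 'NR > 1 { sub(/%/, "", $5); print $5, $6 }' | \
while read pct mount; do
    if [ "$pct" -gt "$THRESHOLD" ]; then
        echo "$mount is at ${pct}% capacity" \
            | mail -s "Disk space warning: $mount" admin@example.com
    fi
done
```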





Re: svn backup

2013-10-07 Thread Thomas Harold


On 10/7/2013 3:37 PM, rvaede wrote:


 I am confused about how to back up my repository.

Note: Making a raw file-level backup of a SVN repository is not 
advisable.  There may be open files, files that changed while the backup 
is running, etc.  So you will need to use one of the hotcopy / dump 
methods to get a good snapshot of the repository state for inclusion 
onto a backup tape/disk/set.


If you want to backup everything, including server-side hook scripts and 
the like which are stored under the repository directory, take a look at 
svnadmin hotcopy.  It's essentially an rsync (but not quite) of the 
original repository directory.  The hotcopy directories (since they only 
change once per day, or whenever you do the hotcopy) are ideal for being 
used to backup the repository.


http://svnbook.red-bean.com/nightly/en/svn.ref.svnadmin.c.hotcopy.html
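A nightly hotcopy job can be sketched like this (paths are placeholders; the --incremental flag requires svnadmin 1.8 or later, so the sketch falls back to a full copy the first time through):

```shell
#!/bin/sh
# Hotcopy every repository under SRC into DST, incrementally when a
# previous copy already exists.
SRC=/var/svn
DST=/backup/svn-hotcopy
for repo in "$SRC"/*; do
    name=$(basename "$repo")
    [ -f "$repo/format" ] || continue        # skip non-repositories
    if [ -d "$DST/$name" ]; then
        svnadmin hotcopy --incremental "$repo" "$DST/$name"
    else
        svnadmin hotcopy "$repo" "$DST/$name"
    fi
done
```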

The svnadmin dump is more suitable for long-term archival of the SVN 
repository because it stores the data in platform-neutral format.  It 
will be much larger than the original repository was (even with gzip -5) 
and will take a long time to perform the dump.


http://svnbook.red-bean.com/nightly/en/svn.ref.svnadmin.c.dump.html

The third option is to run a hot-spare system using svnsync.  The 
drawback is that this does not give you generational backups, so you 
still need to use either hotcopy or dump as well.


http://svnbook.red-bean.com/nightly/en/svn.ref.svnsync.html
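Seeding such a mirror looks roughly like this (the URLs are placeholders). Because svnsync copies revision properties to the mirror, the mirror repository needs a pre-revprop-change hook that permits the change:

```shell
#!/bin/sh
# One-time setup of a read-only svnsync mirror.
MIRROR=file:///var/svn-mirror/projects
SOURCE=https://svn.example.com/svn/projects

svnadmin create /var/svn-mirror/projects
# Allow svnsync to set revision properties on the mirror.
cat > /var/svn-mirror/projects/hooks/pre-revprop-change <<'EOF'
#!/bin/sh
exit 0
EOF
chmod +x /var/svn-mirror/projects/hooks/pre-revprop-change

svnsync initialize "$MIRROR" "$SOURCE"
svnsync synchronize "$MIRROR"     # re-run from cron to stay current
```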

Personally: I prefer hotcopy for daily backups, then using rdiff-backup 
to push the hotcopy directory to our main backup server.  The 
rdiff-backup step gives me the ability to go back to any day within the 
past 26 weeks (configurable).  Combined with the use of generational 
media, I have multiple copies of the rdiff-backup target directory.


(rsnapshot would also be a good choice.)

I'll also make svnadmin dumps every few months, but it takes a long time 
to do, uses up a lot of disk space (3x for us) and has few advantages 
compared to the rdiff-backup of the hotcopy directory.


Re: Breaking up a monolithic repository

2013-10-04 Thread Thomas Harold

On 10/2/2013 10:36 AM, Ullrich Jans wrote:


I'm now facing the same problem. My users want the rebasing, but during
the dump/load instead of after the fact (apparently, it causes issues
with their environment when they need to go back to an earlier revision
to reproduce something). They also want to keep the empty revisions (for
references from the issue tracker).

I haven't tried it with svnadmin dump followed by svndumpfilter (I don't
think it has that capability).


The command we ended up using back in May 2011 when we did this looked 
like the following.  It's been two years, but I'm pretty sure these two 
scripts are all we ended up using.


- We had a master dump of the entire brc-jobs repository.
- Target repository name was brc-jobs-zp (CLCODE)
- It takes the dump and splits it into a smaller chunk (CLPATH).
- Had to edit the script for each new client/path that we wanted to 
split out.


It does *not* attempt to rebase the individual projects up to the root 
directory.  It *is* possible by using 'sed' to do this in the resulting 
dump file, but it is tricky.



#!/bin/bash

SOURCE=/mnt/scratch/svn-dump-brc-jobs.may2011.dump.gz

DESTDIR=/var/svn/
DESTPFX=svn-raw-brc-jobs-
DESTSFX=10xx.dump.gz

CLCODE=zp
CLPATH=Z/ZP_SingleJobs

SDFOPTS='--drop-empty-revs --renumber-revs'

date

echo ${DESTDIR}${DESTPFX}${CLCODE}${DESTSFX}

svnadmin dump --quiet /var/svn/brc-jobs | \
svndumpfilter include --quiet $SDFOPTS $CLPATH | \
gzip > ${DESTDIR}${DESTPFX}${CLCODE}${DESTSFX}

date


The mirror to this was the script that created the new SVN repository 
and loads in the individual dump.


Note the commented out 'sed' lines where we attempted to rebase 
individual project folders back up to the root of the repository.  They 
didn't work, so we ended up just doing a move operation in the 
TortoiseSVN repository browser.


- It changes the UUID of the newly created repository to be something 
unique instead of using the old repo's UUID.

- Had to be edited anew for each new client/path.


#!/bin/bash

SRCDIR=/var/svn/
SRCPFX=svn-raw-brc-jobs-
SRCSFX=10xx.dump.gz

DESTDIR=/var/svn/
DESTPFX=svn-newbase-brc-jobs-
DESTSFX=10xx.dump.gz

SDFOPTS='--quiet --drop-empty-revs  --renumber-revs'

CLPARENT=Z
CLCODE=zp

date

#gunzip -c ${SRCDIR}${SRCPFX}${CLCODE}${SRCSFX} | \
#sed "s/Node-path: $CLPATH\//Node-path: /" | \
#sed "s/Node-copyfrompath: $CLPATH\//Node-copyfrompath: /" | \
#gzip > ${DESTDIR}${DESTPFX}${CLCODE}${DESTSFX}

svn mkdir -m "Import from brc-jobs" \
file:///var/svn/brc-jobs-${CLCODE}/${CLPARENT}


gunzip -c ${SRCDIR}${SRCPFX}${CLCODE}${SRCSFX} | \
  svnadmin load --quiet /var/svn/brc-jobs-${CLCODE}

svnlook uuid /var/svn/brc-jobs-${CLCODE}
svnadmin setuuid /var/svn/brc-jobs-${CLCODE}
svnlook uuid /var/svn/brc-jobs-${CLCODE}
svnadmin pack /var/svn/brc-jobs-${CLCODE}

chmod -R 775 /var/svn/brc-jobs-${CLCODE}
chmod -R g+s /var/svn/brc-jobs-${CLCODE}/db
chgrp -R svn-brc-jobs /var/svn/brc-jobs-${CLCODE}

date


I do wish I could have figured out the 'sed' commands to move a project 
from /Z/ZP_SingleJobs/JOBNR to be just /JOBNR in the repository, but 
there wasn't time.


For rebasing, that's probably your missing piece... which I don't have.


Re: Push ?

2013-09-16 Thread Thomas Harold

On 9/15/2013 11:32 AM, Dan White wrote:

The issue is that the client end of the transaction is in a DMZ

A connection from a DMZ to one’s internal network is a very high
security risk. What I was hoping for was a way to define a very
specific connection from the Subversion server to the DMZ client
(push). This is considered to be a much lower security risk.


One way to handle this is to use SSH to access the specific SVN repository.

1. Use a passwordless SSH public-key pair that the DMZ host can use to 
punch through to the SSH port on the internal SVN server.  (Naturally, 
SSH should be set to disallow root login and only allow public-key 
authentication.)


- If you can't change everyone over to using public keys and disabling 
password-based authentication for SSH, then you should run a 2nd SSHD 
process on a different port that only allows specific accounts to log 
in and requires public-key authentication.  Then you can set up your 
DMZ-to-SVN-server firewall to only allow access to the alternate SSH 
port from the DMZ.
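A second sshd instance can be pointed at its own configuration file, e.g. started with '/usr/sbin/sshd -f /etc/ssh/sshd_config_svn'.  A hypothetical minimal fragment (the port number and account name are made up):

```
Port 2222
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AllowUsers svn-deploy
```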


2. Give the SSH account read-only access to the SVN repo that it needs

3. Lock down what the SSH account can do to just:

command="/usr/bin/svnserve -t -r /var/svn",no-agent-forwarding,no-pty,no-port-forwarding,no-X11-forwarding ssh-rsa 
...


Since the account will have very limited permissions on the SVN machine 
(read-only access), there's not a whole lot that someone could do with 
the account.  Plus the use of the command= line means they'd have to 
figure out a way to escape the svnserve program in order to get a 
command-line on the SVN machine.


Re: Breaking up a monolithic repository

2013-09-10 Thread Thomas Harold

On 9/9/2013 8:49 PM, Trent W. Buck wrote:


I'm partway through provisioning the replacement Debian 7 server, which
will have

 subversion  1.6.17dfsg-4+deb7u3
 apache2     2.2.22-13

...hm, still 1.6.  Is it worth me backporting a newer svn?



Yes, it's worth installing 1.8.3.

http://www.wandisco.com/subversion/download#debian7




Re: Breaking up a monolithic repository

2013-09-10 Thread Thomas Harold

On 9/10/2013 7:22 AM, Nico Kadel-Garcia wrote:

But keeping thousands of empty commits in a project they're not relevant
to is confusing and wasteful. The  repository and repository URL's for
the old project should be preserved, if possible, locked down and
read-only, precisely for this kind of change history. But since the
repository is being completely refactored *anyway*, it's a great
opportunity to discard debris.


When we moved from a monolithic repository to per-client repositories a 
few years ago, we went ahead and:


- Rebased the paths up one or two levels (old system was something like 
monolithicrepo/[a-z]/[client directories]/[job directory]) so that the 
URLs were now clientrepo/[job directory].  That was a tricky thing to 
do and we had to 'sed' the output of the dump filter before importing it 
back.


It broke a few things, such as svn:externals which were not 
relative-pathed, but was worth it in the long run so that our URLs got 
shorter.


- Made sure that the new repos all had unique UUIDs.

- Renumbered all of the resulting revisions as we loaded things back in. 
 But we didn't have to deal with any bug tracking systems that referred 
to a specific revision.  And having lower revision numbers was 
preferred, along with dropping revisions that referred to other projects.



Even if the history is considered sacrosanct (and this is often a
theological policy, not an engineering one!), an opportunity to reduce
the size of each repository by discarding deadwood at switchover time
should be taken seriously.


Less of an issue now that svn 1.8 has revprop packing (plus the rev 
packing from 1.6).  That deadwood takes up a lot less space in terms of 
the number of files in the file system.


And the fact that svnadmin hotcopy is now incremental in 1.8 also makes 
it less of an issue.  Having a few thousand (tens of thousands) 
revisions in a repository is no longer a big bottleneck during the 
hotcopy process like it was before.


Our backup system is also a lot happier with fewer files to backup.



Re: Error after server upgrade to 1.8.3 - E160052: Revprop caching disabled

2013-09-05 Thread Thomas Harold

On 9/5/2013 6:41 PM, Gordon Moore wrote:


Is this a known issue with 1.8.3? Any ideas on what is going on, how I
can investigate, or what I might do to correct this?


I'd start with:

- How are you accessing the SVN repository? http? svn+ssh? svn?

- What are the ownership and permissions on the /svn/repos/build folder, 
the build/db folder, and the contents of the build/db folder?


- If you are running with SELinux set to Enforcing, try setting it 
temporarily to Permissive and see if the issue goes away.


- What user account are you using when trying to svnadmin dump / 
svnadmin verify?




Re: How Big A Dump File Can Be Handled? (svn 1.8 upgrade)

2013-08-22 Thread Thomas Harold

On 8/21/2013 7:13 PM, Geoff Field wrote:

I'm keeping the original BDB
repositories, with read-only permissions.

If I really have the need, I can restart Apache 2 with SVN 1.2.3 and
go back to the original repositories. Otherwise, I also have the
option of re-running my batch file (modifying it if absolutely
required). On top of that, there are bunches of files on another
server that give us at least the latest state of the projects. The
dump files in this case are not really as useful as the data itself.
Regards, Geoff



When we did our 1.6 to 1.8 upgrade a few weeks ago, I used the following 
steps (ours was an in-place upgrade, so a bit of extra checking was added):


0. Back everything up, twice.

1. Check the version of the repository to see whether it is already 1.8

BASE='/var/svn/'
TARGET='/backup/svndump/'
DIR='somereponame'
SVNADMIN=/path/to/svnadmin

REPOFMT=`grep '^[123456]$' ${BASE}${DIR}/db/format`
echo "FSFS database format is $REPOFMT"
if [ "$REPOFMT" -ge 6 ]; then
echo "Format >= 6, not upgrading."
continue
fi

Note: That was a quick-n-dirty check that was valid for our 
configuration.  To be truly correct, you need to verify:


reponame/format
reponame/db/fs-type
reponame/db/format

2. Strip permissions on the original repo down to read-only.

3. Ran svnadmin verify on the original repository.

echo Run svnadmin verify...
$SVNADMIN verify --quiet ${BASE}${DIR}
status=$?
if [ $status -ne 0 ]; then
echo svnadmin verify failed with status: $status
continue
else
echo svnadmin verify succeeded
fi

4. Do the svnadmin dump, piping the output into gzip -5 (moderate 
compression).


echo svnadmin dump...
$SVNADMIN dump --quiet ${BASE}${DIR} | gzip -5 --rsyncable > \
${TARGET}${DIR}.dump.gz

status=$?
if [ $status -ne 0 ]; then
echo svnadmin dump failed with status: $status
continue
fi

5. Remove the old repository directory.

echo "Remove old repository (dangerous)"
rm -rf ${BASE}${DIR}
status=$?
if [ $status -ne 0 ]; then
echo remove failed with status: $status
continue
fi

6. Create the repository in svn 1.8.

echo Recreate repository with svnadmin
$SVNADMIN create ${BASE}${DIR}
status=$?
if [ $status -ne 0 ]; then
echo svnadmin create failed with status: $status
continue
fi

7. Strip permissions on the repository back down to 700, owned by 
root:root while we reload the data.


8. Fix the db/fsfs.conf file to take advantage of new features.

Note: Make sure you understand what enable-dir-deltification, 
enable-props-deltification and enable-rep-sharing do.  Some of these are 
not turned on in SVN 1.8 by default.


echo "Fix db/fsfs.conf file"
sed 's/^[#[:space:]]*enable-rep-sharing = false[#[:space:]]*$/enable-rep-sharing = true/g;s/^[#[:space:]]*enable-dir-deltification = false[#[:space:]]*$/enable-dir-deltification = true/g;s/^[#[:space:]]*enable-props-deltification = false[#[:space:]]*$/enable-props-deltification = true/g' --in-place=.bkp ${BASE}${DIR}/db/fsfs.conf
status=$?
if [ $status -ne 0 ]; then
echo sed adjustment of db/fsfs.conf failed with status: $status
continue
fi

9. Load the repository back from the dump file.

echo svnadmin load...
gzip -c -d ${TARGET}${DIR}.dump.gz | $SVNADMIN load --quiet ${BASE}${DIR}
status=$?
if [ $status -ne 0 ]; then
echo svnadmin load failed with status: $status
continue
fi

10. Run svnadmin pack to pack revs/revprops files (saves on inodes).

echo svnadmin pack...
$SVNADMIN pack --quiet ${BASE}${DIR}
status=$?
if [ $status -ne 0 ]; then
echo svnadmin pack failed with status: $status
continue
fi

11. Run svnadmin verify.

echo Run svnadmin verify...
$SVNADMIN verify --quiet ${BASE}${DIR}
status=$?
if [ $status -ne 0 ]; then
echo svnadmin verify failed with status: $status
continue
else
echo svnadmin verify succeeded
fi

12. Restore original permissions.

Note: I have a custom script that I can run to set permissions correctly 
on our repository directories.  I never set file system permissions by 
hand on the repositories, I always update the script and then use that. 
 (With a few hundred repositories, I have to be organized and rely on 
scripts.)


13. Back everything up again, twice.

All-in-all, it took us a few days to convert 110GB of repositories 
(mostly in 1.6 format), but the resulting size was only 95GB and far 
fewer files (due to revprops packing in 1.8).  Our nightly backup window 
went from about 3 hours down to 30 minutes by using svnadmin hotcopy 
--incremental.  We then use rdiff-backup to push the hotcopy 
directory to a backup server.


Re: How Big A Dump File Can Be Handled? (svn 1.8 upgrade)

2013-08-22 Thread Thomas Harold

On 8/22/2013 7:11 PM, Geoff Field wrote:

6. Create the repository in svn 1.8.


I'm sure there's an upgrade command that would do it all in-place.


7. Strip permissions on the repository back down to 700, owned by
root:root while we reload the data.


While, or before?


Step 6 created the repos in our system with writable permissions, so we
had to make sure nobody could commit to the repo while we loaded the
dump file back in during step 9.

Most restores for us took about 5-10 minutes, a few of our larger repos
took a few hours.



On your OS, is there a way to read the permissions first?



Mmm, we could have used stat -c '0%a' /path/to/file, but with the script
to set our permissions, and because we structure our repos as
category-reponame, we can set permissions across entire categories
easily with the script.

Since we use svn+ssh, repository permissions matter a bit more for us.




Re: server config

2013-08-20 Thread Thomas Harold

On 8/20/2013 1:19 AM, olli hauer wrote:

On 2013-08-20 01:41, Nico Kadel-Garcia wrote:

I think he meant subversion-1.6.11, which is the default version for
CentOS 6.4.


Check the SELinux settings in /etc/sysconfig/selinux.
Set the line to 'SELINUX=permissive' (or disabled)

After changing the SELINUX value a reboot is required

Additionally, add a trailing '/' so your config looks like this.


A better way to handle SELinux issues is to:

# getenforce
- To see whether you are in permissive or enforcing mode
# setenforce permissive
- Run this before doing your tests

Then use the various SELinux troubleshooting tools to see what errors 
were logged while in permissive mode.  Once you have fixed your issues, 
you can use setenforce enforcing and then re-run your tests.


The command line troubleshooting tool is:

# sealert -a /var/log/audit/audit.log







Re: server config

2013-08-20 Thread Thomas Harold

On 8/19/2013 6:19 PM, Ben Reser wrote:

On 8/19/13 9:07 AM, Scott Frankel wrote:

I'm new to SVN server configuration and find myself setting up a
CentOS 6.4 server with svn version 1.6.1, following the red-bean
book.


I'd strongly urge you not to use 1.6.1, see the list of applicable
security issues here: http://subversion.apache.org/security/

If you're using the CentOS packages they may have patched those
issues without updating the svn version number.  You should check
that though.

If you're setting up a new server I wouldn't start with 1.6.x but would
go straight to 1.7.x or 1.8.x, probably 1.8.x if you can.


For the 1.8.1 RPMs, I suggest adding the WANDisco repository to your 
configuration.


http://www.wandisco.com/subversion/download

What you're looking for is "Download Subversion Installer V1.8.1 for 
Redhat".  You download a shell script which then needs to be executed to 
install the WANDisco repositories and the SVN 1.8.1 RPMs.


Re: server config

2013-08-19 Thread Thomas Harold

On 8/19/2013 12:42 PM, David Chapman wrote:


How many repositories do you have?  You shouldn't use SVNParentPath if
you have only one repository; use SVNPath.  I don't know if that is the
direct cause of your problem, but you should fix it.



I suggest planning for multiple repositories from the get-go.  Some 
things in SVN land work better when you dedicate a separate repository 
to it.


We started with one monolithic repository, but have since split that 
into ~300 smaller repositories.


Re: Suggestion to change the name Subversion

2013-08-15 Thread Thomas Harold

On 8/12/2013 8:03 PM, Nico Kadel-Garcia wrote:

No one else remember the old Satan monitoring toolkit, that had an
option to change the displayed name and icon to Santa?

The name Subversion has enough positive reputation that changing
it, just to avoid NSA style monitoring, seems very destabilizing to a
popular project. Let's not change it.



We get around this whole issue with our users by always saying 
"Subversion" instead of "subversion", so that it's clear we're talking 
about a proper noun instead of a verb.  Or by just using "SVN".





Re: hotcopy --incremental auto-upgrade Re: Backup strategy sanity check

2013-08-11 Thread Thomas Harold

On 8/10/2013 6:28 PM, Daniel Shahaf wrote:

Daniel Shahaf wrote on Sun, Aug 11, 2013 at 01:25:24 +0300:

Thomas Harold wrote on Sat, Aug 10, 2013 at 10:53:43 -0400:

With the 'svnadmin hotcopy --incremental' backups, we have to do extra
checking in the script (comparing reponame/db/format versions) in order
to make sure that the hotcopy runs correctly.



If you have to do that, please do it correctly:

- Check reponame/format
- Check reponame/db/fs-type
- Check reponame/db/format


... in this order.



Will make a note of what to check for future.

...

Maybe add --incremental-or-force (or some similar name) as an option. 
It would, if the format/fs-type/db-format do not match, fall back to 
doing a full-blown hotcopy backup instead of an incremental one.


But it's a 'nice-to-have' and not 'required' because things changing 
like this tend to be a once every 2-3 years thing.  So we just work 
around it.


svnadmin verify performance - CPU bottleneck

2013-08-06 Thread Thomas Harold
On our setup (10k RPM SAS RAID-10 across 6 spindles, AMD Opteron 4180 
2.6GHz), we're finding that svnadmin verify is CPU-bound and only uses 
a single CPU core.


Is it possible that svnadmin verify could be multi-process in the 
future to spread the work over more cores?  Or is that technically 
impossible?


(Our current workaround is to divide the list of repositories up and run 
multiple concurrent svnadmin verify scripts.)
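That workaround can be done in one line with xargs -P, which fans the repositories out over several single-threaded verify processes. The parent directory and job count below are assumptions:

```shell
#!/bin/sh
# Run up to 4 'svnadmin verify' processes at once; each verify is
# still single-threaded, but this keeps more cores busy.
find /var/svn -mindepth 1 -maxdepth 1 -type d | \
    xargs -P 4 -I {} \
    sh -c 'svnadmin verify --quiet "{}" || echo "FAILED: {}"'
```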


Re: svn 1.8 migration - directory deltification and revprop packing

2013-08-05 Thread Thomas Harold

On 8/2/2013 3:21 PM, Thomas Harold wrote:

Our migration process:


 0. svnadmin verify oldreponame


1. svnadmin dump oldreponame

2. svnadmin create newreponame

3. Modify db/fsfs.conf

[rep-sharing]
# defaults to true in 1.8
enable-rep-sharing = true

[deltification]
enable-dir-deltification = true
enable-props-deltification = true

This can be done in automated fashion with sed (or awk).

sed 's/^[#[:space:]]*enable-rep-sharing = false[#[:space:]]*$/enable-rep-sharing = true/g;s/^[#[:space:]]*enable-dir-deltification = false[#[:space:]]*$/enable-dir-deltification = true/g;s/^[#[:space:]]*enable-props-deltification = false[#[:space:]]*$/enable-props-deltification = true/g' --in-place=bkp newreponame/db/fsfs.conf

Naturally, you should have a backup of your db/fsfs.conf file.  While
sed creates one with --in-place=bkp, you should not rely solely on that
method.

4. svnadmin load newreponame

5. svnadmin pack newreponame
- Not all of our repos were packed



Sample repository size changes for us:

OLD: Size: 52MB Files: 1310
NEW: Size: 52MB Files: 1313
- This repository's history is mostly "add a file, then remove it a few 
days later without modifications".


OLD: Size: 151MB Files: 18574
NEW: Size: 60MB Files: 633
- Apache HTTP log files stored with FSVS, 40% of original size

OLD: Size: 151MB Files: 2540
NEW: Size: 126MB Files: 551
- Linux server configuration, stored using FSVS, 83% of original

OLD: Size: 473MB Files: 600
NEW: Size: 424MB Files: 603
- Another FSVS repository, 90% of original

OLD: Size: 1080MB Files: 2582
NEW: Size: 964MB Files: 1785
- FSVS repository, 89% of original

I haven't seen any repositories bloat up to larger than the original 
size, but I'm still working through converting our old v1.6 repositories 
to the v1.8 format.


The bottlenecks for us in the dump/load cycle are:
- CPU for svnadmin verify step
- GZIP (using -5 option) during dump step
- Disk contention during the svnadmin load step



SVN FSFS format value for 1.8?

2013-08-04 Thread Thomas Harold

According to the release notes:
http://subversion.apache.org/docs/release-notes/1.8.html#fsfs-enhancements

We should have seen a bump in the FSFS format number to '6' for 1.8. 
But when I'm looking at the 'format' file inside the FSFS repository 
directory on the server, I'm seeing '5'.


# svnadmin --version
svnadmin, version 1.8.1 (r1503906)
   compiled Jul 17 2013, 10:30:55 on x86_64-unknown-linux-gnu

# svnadmin create /backup/svndump/test3

# cat /backup/svndump/test3/format
5

It makes me think that I still have remnants of 1.7 floating around in 
the system (which was previously installed from source).  Even though I 
removed the old 'svn*' executables from bin folders and removed the old 
'libsvn*' libraries from lib folders.


Re: SVN FSFS format value for 1.8?

2013-08-04 Thread Thomas Harold

On 8/4/2013 6:30 PM, Ryan Schmidt wrote:


Check the file /backup/svndump/test3/db/format.


# cat /backup/svndump/test3/db/format
6
layout sharded 1000

Okay, so I was looking at the wrong thing.

That still raises the question (in my mind) of why the two values are 
different on a freshly created repository.


# svnadmin create test4
# cat test4/format
5
# cat test4/db/format
6
layout sharded 1000

Or is the format file in the root used for something else?
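Both files can be inspected in one pass across all repositories; my understanding (an assumption, not stated in this thread) is that the top-level format file carries the overall repository layout version while db/format carries the FSFS filesystem format.  The root path below is made up:

```shell
# Print both version files for every repository under a root directory.
SVNROOT=/var/svn
for repo in "$SVNROOT"/*/; do
    [ -f "${repo}format" ] || continue      # skip non-repository directories
    printf '%s: repos format %s, fs format %s\n' \
        "$repo" "$(cat "${repo}format")" "$(head -n1 "${repo}db/format")"
done
```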


Re: svn 1.8 migration - directory deltification and revprop packing

2013-08-02 Thread Thomas Harold

On 6/11/2013 8:52 AM, C. Michael Pilato wrote:

One advantage of being in a room full of Subversion developers, specifically
the guy that implemented all this stuff, is that I can ask him directly
about how to respond to this mail.  :-)  Hopefully I will accurately
represent the answers Stefan Fuhrmann just gave me to your questions.

On 06/10/2013 03:05 PM, Thomas Harold wrote:

a) Why are directory/property deltifications turned off by default?


Stefan's jovial first answer was, "Because I'm a chicken."  :-)

Fortunately, he didn't stop there.  He explained that there is no known
correctness risk -- you're not going to damage your repositories by enabling
the feature.  But he wanted to phase in the feature to allow time to collect
real-world data about the amount of space savings folks are actually
observing in their repositories.  The feature is on by default in his
proposed next version of the FSFS format.


b) Is there a global setting file that can be used to enable
directory/property deltifications? Or will we have to update the fsfs.conf
file for each newly created repository in order to get this feature?


In 1.8, you'll need to toggle this for each new repository manually.


So in order to get full use of the Directory and property storage 
reduction in a retroactive manner as noted in:


http://subversion.apache.org/docs/release-notes/1.8.html#fsfs-enhancements

We will need to:

1. svnadmin dump oldreponame

2. svnadmin create newreponame

3. Modify db/fsfs.conf

[rep-sharing]
# defaults to true in 1.8
enable-rep-sharing = true

[deltification]
enable-dir-deltification = true
enable-props-deltification = true

This can be done in automated fashion with sed (or awk).  For example, 
to change compress-packed-revprops:


sed 's/^[#[:space:]]*compress-packed-revprops = false[#[:space:]]*$/compress-packed-revprops = true/g' \
--in-place=bkp newreponame/db/fsfs.conf


Or to enable deltification (as well as enable-rep-sharing) all at once:

sed 's/^[#[:space:]]*enable-rep-sharing = false[#[:space:]]*$/enable-rep-sharing = true/g;
s/^[#[:space:]]*enable-dir-deltification = false[#[:space:]]*$/enable-dir-deltification = true/g;
s/^[#[:space:]]*enable-props-deltification = false[#[:space:]]*$/enable-props-deltification = true/g' \
--in-place=bkp newreponame/db/fsfs.conf


Naturally, you should have a backup of your db/fsfs.conf file.  While 
sed creates one with --in-place=bkp, you should not rely solely on that 
method.


4. svnadmin load newreponame

5. svnadmin pack newreponame
- Not all of our repos were packed
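Strung together, the five steps above might look like the following sketch (repository names are placeholders, and only one of the three fsfs.conf edits is shown; in practice all three would be applied before the load so that new revisions are written with the features enabled):

```shell
# Sketch: placeholder repository names; run with access to /var/svn.
OLD=/var/svn/oldreponame
NEW=/var/svn/newreponame

if command -v svnadmin >/dev/null; then
    svnadmin dump --quiet "$OLD" > /tmp/repo.dump        # 1. dump
    svnadmin create "$NEW"                               # 2. create
    sed --in-place=bkp \
        's/^[#[:space:]]*enable-dir-deltification = false[#[:space:]]*$/enable-dir-deltification = true/' \
        "$NEW/db/fsfs.conf"                              # 3. edit fsfs.conf
    svnadmin load --quiet "$NEW" < /tmp/repo.dump        # 4. load
    svnadmin pack "$NEW"                                 # 5. pack
fi
```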

Even without turning on deltification, I'm seeing a size difference 
between our 1.7 repositories and the ones that we loaded in from dump 
files in 1.8.1.  Possibly from enable-rep-sharing now being set to 
'true' by default in 1.8.


Test repo #1 went from 2.0G to 1.7G (85% of original).

Test repo #2 went from 2.4G down to 2.1G (88% of original).  File count 
in repo #2's fsfs folder only went from 3808 to about 3684 (1820 
revisions).  Even though we did svnadmin load using 1.8.1, I still had 
to svnadmin pack to cut the file count down to 1692.


Test repo #3 is 10202MB with 47199 files and 45990 revisions (in 1.6 / 
1.7). The revs had been packed, but not the revprops under the old 
system.  The gzip'd dump file is 35.8GB.  I enabled the two 
deltification options in db/fsfs.conf before doing the svnadmin load 
step.  Finished size was 7330MB (72% of original) with 92108 files. 
File count drops to 2380 after packing and repository size drops to 
7072MB (69% of original).


Re: SVN performance -URGENT

2013-08-01 Thread Thomas Harold

On 8/1/2013 10:52 AM, Somashekarappa, Anup (CWM-NR) wrote:

Bandwidth is 35.4 MBytes/sec from my system (London) to server (New York)
when I checked with the iperf tool.
We are using LDAP
AuthzLDAPAuthoritative off
AuthType Basic
AuthBasicProvider ldap
AuthName "Windows Credentials"

As per message after checkout in TortoiseSVN GUI = 368 MBytes transferred.

Actual folder size = 1.15 GB(1236706079 bytes)

Number of files = 201,712

Folder = 21,707

Guess this includes the .svn folder as well.



That's a fairly complex working copy with many files/folders.  Given 
that you have 35 Mbps (note the lowercase "b") of bandwidth, an ideal 
transfer should be somewhere in the 45-60 minute range for a fresh 
checkout of the entire thing.


However, you're obviously bottlenecked somewhere.

On the Linux server side, I suggest installing a tool called atop and 
monitoring things like how busy the disks are, how busy the CPU cores 
are and the network throughput.  This will give you an idea of how hard 
the Linux server is working while sending out the data to the SVN client.


For the windows client, you will need to look at the Performance Monitor 
(perfmon) and Task Manager to see if you are bottlenecking somewhere. 
Good counters to watch in perfmon are "Physical Disk / % Disk Read 
Time", "Physical Disk / % Disk Write Time", "Network Interface / Bytes 
Sent/sec", and "Network Interface / Bytes Received/sec".


My guesses at this point would be:

- You're not using an SSD on the Windows client, so there is a lot of 
disk activity as SVN goes to create the working copy.  So your disks are 
100% busy and are your bottleneck.


- You're CPU bottlenecked somewhere.  Either server-side or client-side.

- Maybe you need to consider using sparse working copies or only 
checking out a portion of the repository at a time.  (Such as only 
bringing down your project's trunk folder.)


- You'll need to do this checkout once to create the initial working 
copy, then keep the working copy around for a long time.  Future svn 
update commands will then only transmit the changes over the wire 
instead of all of the content.
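A sparse checkout along those lines might look like this (the URL is a placeholder):

```shell
# Hypothetical URL; requires the svn client on PATH.
if command -v svn >/dev/null; then
    svn checkout --depth empty http://svn.example.com/repo/project wc
    svn update --set-depth infinity wc/trunk   # materialize only the trunk subtree
fi
```

Later subtrees can be pulled in the same way with additional `svn update --set-depth` calls, without refetching what is already present.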






Re: problem building 1.8.1 (sqlite-amalgamation)

2013-07-29 Thread Thomas Harold
From my notes back when I compiled 1.8.0, I had to download the 
sqlite-amalgamation ZIP file and add it into the source directory.


$ cd /usr/local/src/subversion-1.8.0/
$ wget http://www.sqlite.org/sqlite-amalgamation-3071501.zip
$ unzip sqlite-amalgamation-3071501.zip
$ mv sqlite-amalgamation-3071501 sqlite-amalgamation
$ rm sqlite-amalgamation-3071501.zip

Once I had the zip file unpacked into the sqlite-amalgamation 
subdirectory, there were no extra options to be passed to that 
./configure script.  It saw it automatically.


(I have not had time yet to build 1.8.1.)

Current version of the sqlite-amalgamation ZIP can be found at:

http://www.sqlite.org/download.html



Re: Backup strategy sanity check

2013-07-25 Thread Thomas Harold

On 7/24/2013 4:21 PM, Les Mikesell wrote:


Is that better than using svnsync from a remote server plus some
normal file backup approach for the conf/hooks directories?



Not sure, I have not tried out svnsync.  We also don't use post-commit
hooks (yet).  I am under the impression that hotcopy does grab 
conf/hooks stuff while dump does not.  But can't find anything in the 
svnbook that says either way at the moment.


...

http://svnbook.red-bean.com/en/1.7/svn.reposadmin.maint.html#svn.reposadmin.maint.backup

svnsync definitely does not handle some things:


The primary disadvantage of this method is that only the versioned
repository data gets synchronized—repository configuration files,
user-specified repository path locks, and other items that might live
in the physical repository directory but not inside the repository's
virtual versioned filesystem are not handled by svnsync.


...

We also run a svnadmin verify on the rdiff-backup directories each week, 
combined with verifying the checksums on the rdiff-backup files.  The 
combination of checksums on the rdiff-backups plus 26W of snapshots that 
I can restore to is, I feel, pretty safe.


I try to reexamine the backup strategy every 6 months, but I think I'm 
in a good spot now with the svnadmin hotcopy / rdiff-backup setup. 
Which also makes it easy for us to rsync the rdiff-backup folder to an 
offsite server.


Downside is the delay introduced by doing hotcopy only once per day.  So 
worst case might mean the loss of 20-48 hours of commits.  A more 
frequent svnsync / incremental hotcopy triggered by a post-commit hook 
would have a much smaller delay.


Re: Backup strategy sanity check

2013-07-24 Thread Thomas Harold

On 7/24/2013 2:59 PM, Andy Levy wrote:

I'm planning my upgrade to SVN 1.8  to go along with it, setting up a
new backup process. Here's what I'm thinking:

* Monday overnight, take a full backup (svnadmin hotcopy, then
compress the result for storage)
* Tuesday through Sunday overnights, incremental backups (svnadmin
dump --incremental, compress the result)
* After completing the Monday night full backup, purge the previous
week's incrementals.
* After completing the Monday night full backup, run svnadmin pack
* Keep the last 6 full backups on local disk (these will be kept
written to the corporate backup system, so we can go back further if
needed).



We simply do svnadmin hotcopy each night, then rdiff-backup that to 
another server over the network.  The rdiff-backups keep 6 months of 
revisions to the hotcopy folders.  Our /var/svn is 122GB (as is the 
hotcopy location), the rdiff-backup is 173GB.  Using rdiff-backup means 
that I can go back to any point in time in the last 6 months for a 
particular repository (plus we have hashes/checksums of all files).
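The nightly job might be sketched roughly as follows (hostnames and paths are invented; note that a plain svnadmin hotcopy wants a fresh destination directory, and 26W matches the six-month retention mentioned here):

```shell
# Sketch: all names are hypothetical; the tools may be absent on this machine.
SRC=/var/svn/myrepo
HOT=/var/svn-hotcopy/myrepo

if command -v svnadmin >/dev/null && command -v rdiff-backup >/dev/null; then
    rm -rf "$HOT"                            # hotcopy refuses a non-empty target
    svnadmin hotcopy "$SRC" "$HOT"           # full copy incl. conf/ and hooks/
    rdiff-backup "$HOT" backuphost::/backup/svn/myrepo
    rdiff-backup --remove-older-than 26W backuphost::/backup/svn/myrepo
fi
```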


For offsite purposes, we backup the rdiff-backup directory instead of 
the hotcopy directory.


What we might do once 1.8 server is stable is switch to doing the new 
incremental style hotcopy on Mon-Sat evenings and do a full hotcopy on 
Sun.  Right now, we address the time it takes to do full hotcopy of all 
300+ repositories by only doing a nightly hotcopy of any repositories 
that have changes within the last N days (usually 7 days, and usually a 
only a few dozen repositories see activity each week).  Doing the 
hotcopy to a different set of platters also helps.


This is based on the assumption that svnadmin hotcopy is a preferred 
backup method over svnadmin dump for daily backups, because it grabs 
everything out of the repository directory while svnadmin dump misses 
things like scripts.


Re: How to prune old versions of an artefact?

2013-07-15 Thread Thomas Harold

On 7/15/2013 5:49 AM, Cooke, Mark wrote:


For your uses, perhaps you could spin this artifact off into its own
repository (use externals from your main repo if required) and then
you can archive that repo off whenever necessary?



Sound advice for any sort of large artifact, or in the case where 
automated scripts will push commits into a repository on a regular 
(sometimes frequent) basis.


It not only keeps the size down and gives you the option to do extreme 
resets to the separate repository, but it also keeps the log history 
of the original repository from being polluted with all of the automated 
commits.


We do this on the servers that we version control using FSVS.  There are 
sections of the directory tree which need to be heavily monitored for 
changes (such as configuration reports), but which don't need to pollute 
the main SVN repository for that machine.




Re: Expected performance

2013-07-08 Thread Thomas Harold

On 7/8/2013 11:32 AM, Naumenko, Roman wrote:

Hello,

How fast would you expect svn checkout to be from a server like one
below? Considering eveyrthing on the server functioning as expected.



Our bottleneck is usually the CPU, but we're doing svn+ssh access.  So I 
lean towards a few less but more powerful cores.


The only time we thrash the disks is when doing the nightly hotcopy of 
our repositories (total of about 110GB).


Re: Expected performance (svn+ssh)

2013-07-08 Thread Thomas Harold

On 7/8/2013 2:18 PM, Naumenko, Roman wrote:


That box has more than enough CPUs (forty), cores are barely utilized.
How can the access over ssh be configured? I thought it's only
http(s) or svn proto.



http://svnbook.red-bean.com/en/1.7/svn.basic.in-action.html#svn.advanced.reposurls

http://svnbook.red-bean.com/en/1.7/svn.serverconfig.svnserve.html#svn.serverconfig.svnserve.sshtricks

svn+ssh access has some upsides and downsides.  For us, it was simpler 
to get up and running with it back in 2007 when we were still getting 
our feet wet with SVN 1.4.  We weren't ready to muck around with Apache 
httpd and SSL certificates to do https access to the repository.


We grant access at the repository level via Linux file system 
permissions.  This means that every user needs to have their own system 
account and belong to Linux group that owns the repository.


chown -R svn-group1 /var/svn/svn-repository1
chmod -R 770 /var/svn/svn-repository1
chmod -R g+s /var/svn/svn-repository1

Where the 770 is one of several combinations: 770, 775, 755, 750, 700.

770 = owner read/write/execute, group read/write/execute, other none
750 = owner read/write/execute, group read/execute only, other none

To keep things sane, we do not set permission by hand, but edit a script 
that can be re-run to fix permissions on the repositories.  Most of our 
repositories follow a set naming pattern, which makes it easier.
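Such a re-runnable permissions script might look like this sketch (the "svn-<reponame>" group naming is invented for illustration; any repository without a matching group is skipped):

```shell
# Sketch: assumes each repository has a matching "svn-<name>" Linux group.
SVNROOT=/var/svn
for repo in "$SVNROOT"/*/; do
    name=$(basename "$repo")
    group="svn-$name"
    getent group "$group" >/dev/null || continue   # no matching group: skip
    chgrp -R "$group" "$repo"
    chmod -R 770 "$repo"
    find "$repo" -type d -exec chmod g+s {} +      # setgid so new files keep the group
done
```

Applying setgid only to directories (via find -type d) is a small refinement over a blanket chmod -R g+s; the bit is only meaningful for group inheritance on directories.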


The other advantage of svn+ssh is that it works well when using FSVS, 
because you can edit ~/.ssh/config so that FSVS can login to the SVN 
server automatically and push/pull configuration file changes.


Re: WebDAV support in future versions of SVN server?

2013-06-26 Thread Thomas Harold

On 6/26/2013 8:15 AM, Nico Kadel-Garcia wrote:

On Tue, Jun 25, 2013 at 9:55 AM, Thomas Harold thomas-li...@nybeta.com wrote:

Is it still a long-term goal to maintain the ability to mount a SVN
repository as a WebDAV folder?


Out of curiosity, why do you feel the need for this? Working in a
remote copy isn't enough for your uses?



Less technical users like the idea of being able to treat the SVN 
repository as a mapped drive where everything is auto-versioned.





WebDAV support in future versions of SVN server?

2013-06-25 Thread Thomas Harold
Is it still a long-term goal to maintain the ability to mount a SVN 
repository as a WebDAV folder?


Based on this message from 2009:

http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1180976

It sounds like the SVN server is still planning on supporting WebDAV 
clients, but moving the svn client away from talking WebDAV to the HTTP 
server?


But before I go and roll out WebDAV to our users, I'd like to make sure 
that SVN isn't going to drop WebDAV client support in the next few years.


Re: Advice for changing filename case in SVN on case insensitive system

2013-06-21 Thread Thomas Harold

On 6/20/2013 6:56 PM, Geoff Hoffman wrote:

deleting the file from Subversion, then adding the copy with the correct
case.

Question: Doesn't that blow away revision history? If I didn't care
about revision history I would just start over with a fresh repo.


If you use svn mv to do the change, it does not blow away the revision 
history for the file.  You can, however, choose to have log output stop 
on copy.


http://svnbook.red-bean.com/en/1.7/svn.ref.svn.html#svn.ref.svn.sw.stop_on_copy

(In TortoiseSVN's log viewer, there is a checkbox at the bottom called 
Stop on copy/rename that you can turn off.)




I also thought about doing full URL svn mv's but seemed like that could
take a very long time to do...



It probably will be slow, depending on which access method you use, and 
each mv would result in a new transaction in the repository.  I tend 
to only do server-side moves (URL to URL) for the renaming of upper 
level folders in the tree, which is a rare occurrence for us.  All other 
moves we try to do at the working copy level.


(As with everything, it's best to do a test on an inconsequential file 
before doing any mass moves.)
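For a case-only rename done server-side, the move is a single committed transaction (URL, filenames, and log message below are made up):

```shell
# Hypothetical URL; a URL-to-URL move commits immediately on the server.
if command -v svn >/dev/null; then
    svn mv -m "Fix filename case" \
        http://svn.example.com/repo/trunk/ReadMe.TXT \
        http://svn.example.com/repo/trunk/readme.txt
fi
```

In a working copy on a case-insensitive filesystem, the same rename is commonly done in two steps (file to a temporary name, then to the final name) so the OS never sees two paths differing only in case.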


Re: Subversion Exception! line 647: assertion failed (peg_revnum != SVN_INVALID_REVNUM)

2013-06-20 Thread Thomas Harold

On 6/20/2013 11:55 PM, Sandeepan Kundu wrote:


Tried going back to 1.7, but it is telling project is in higher version :((

how to use now, my development is getting affected!!!


I suggest renaming your old (upgraded to 1.8) working copy out of the 
way, doing a fresh checkout using 1.7 into a fresh working copy folder, 
then copying over changed files from the upgraded 1.8 working copy which 
isn't working.


Naturally, making a backup of the borked working copy is strongly 
suggested if you had uncommitted changes.




Re: Crash in 1.8.0 (db/format layout linear)

2013-06-19 Thread Thomas Harold

On 6/19/2013 5:30 AM, Daniel Shahaf wrote:

Does %REPOS_DIR%\db\format contain

4
layout linear?

If so, that's a known issue that will be fixed in 1.8.1.



Out of curiosity, which versions of SVN produced a "layout linear"?  I'm 
guessing that was from back in the SVN 1.4 days (repo format #2) as 
"layout sharded" was added in SVN 1.5 (repo format #3)?  At least, 
that's the impression that I got from:


http://svn.apache.org/repos/asf/subversion/trunk/tools/server-side/fsfs-reshard.py

Checked our repositories with the following:

find /var/svn/ -maxdepth 3 -name format -exec grep -H 'layout' {} \;




Re: Apache Subversion 1.8.0 Released (Apache module support via DSO through APXS)

2013-06-18 Thread Thomas Harold

On 6/18/2013 8:06 AM, Branko Čibej wrote:

We're happy to announce the release of Apache Subversion 1.8.0.


While running ./configure (on CentOS 6), I see the following in the output:

checking for Apache module support via DSO through APXS... no
==
WARNING: skipping the build of mod_dav_svn
 try using --with-apxs
==

Turns out I had not installed the httpd-devel package.  After installing 
httpd-devel-2.2.15-28.el6.centos.x86_64 from the CentOS repositories, 
the warning went away.




Re: Apache Subversion 1.8.0 Released (An appropriate version of serf could not be found)

2013-06-18 Thread Thomas Harold
And the last hurdle I had to jump over to get svn 1.8.0 to compile on my 
CentOS 6 box.  When running ./configure, I had the following message 
show up:




checking was serf enabled... no

An appropriate version of serf could not be found, so libsvn_ra_serf
will not be built.  If you want to build libsvn_ra_serf, please
install serf 1.2.1 or newer.



I downloaded serf from:

http://code.google.com/p/serf/downloads/list

Ran the ./configure, make, make install steps

Then I had to tell SVN's configure command where to find the serf libraries:

$ ./configure --with-serf=/usr/local/serf/

I probably could have also downloaded the serf-devel and serf RPMs from 
Wandisco's site:


http://opensource.wandisco.com/rhel/6/svn-1.8/RPMS/x86_64/



Re: svn 1.8 migration - directory deltification and revprop packing

2013-06-11 Thread Thomas Harold

On 6/11/2013 8:52 AM, C. Michael Pilato wrote:

One advantage of being in a room full of Subversion developers, specifically
the guy that implemented all this stuff, is that I can ask him directly
about how to respond to this mail.  :-)  Hopefully I will accurately
represent the answers Stefan Fuhrmann just gave me to your questions.



Thank you very much.


b) Is there a global setting file that can be used to enable
directory/property deltifications? Or will we have to update the fsfs.conf
file for each newly created repository in order to get this feature?


In 1.8, you'll need to toggle this for each new repository manually.


I'll cobble something together with grep, sed/awk, and find to 
monitor (and update) our fsfs.conf files, then.



As for the --deltas option, that has nothing in the world to do with the
types of deltas we're discussing here.  (As an aside, I would highly
recommend that, unless you need your dumpfiles to be smaller, avoid the
--deltas option.  The performance penalty of using it isn't worth it.)


Right now, the size of our dumpfile directory is 207G, while the hotcopy 
is only 104G.  So the size savings could be big for us.  The hotcopy 
backup is still our preferred solution, with the dump files being a 
worst-case fallback.



#2 - revision property (revprops) files packing

a) Will there be a svnadmin pack command like there was for SVN 1.6? Or
will we need to do a full dump/load of the repository to pack the revprops?


The existing 'svnadmin pack' command will govern both revision and revprop
packing, and will keep the two in sync with each other.  'svnadmin upgrade'
will also take the opportunity to synchronize the packing status of the
revision properties with that of the revision backing files.


Thanks, the svn book is light on details about what exactly counts as the 
minimum amount of work needed for "svnadmin upgrade".




Re: svn 1.8 migration - svnadmin hotcopy

2013-06-11 Thread Thomas Harold

On 6/11/2013 10:20 AM, Stefan Sperling wrote:

On Tue, Jun 11, 2013 at 10:13:15AM -0400, Thomas Harold wrote:

Right now, the size of our dumpfile directory is 207G, while the
hotcopy is only 104G.  So the size savings could be big for us.  The
hotcopy backup is still our preferred solution, with the dump files
being a worst-case fallback.


Please try the new svnadmin hotcopy --incremental.
It should accelerate your backup process.



Yes, I'm looking forward to that feature in 1.8.  We currently tackle 
the time issue in two ways:


1) We only svnadmin hotcopy repositories which have changed in the last 
N days (typically 3 days).  Since we have about 300 repositories 
currently, but we don't do work on things in all 300 constantly, this 
means we only backup a few dozen repositories each night.


BASE=/var/svn/
DAYS=3

# Directories get randomized with the perl fragment, so that
# they get processed in random order.  This makes the backups
# more reliable over the long term in case one directory
# causes problems.
# $FIND, $GREP, and $SED are assumed to hold the tool paths,
# set earlier in the script.
DIRS=`$FIND ${BASE} -maxdepth 3 -name current -mtime -${DAYS} | \
$GREP 'db/current$' | \
$SED 's:/db/current$::' | $SED "s:^${BASE}::" | \
perl -MList::Util -e 'print List::Util::shuffle <STDIN>'`

2) We read the svn repositories from one set of spindles and write the 
hotcopy to a second spindle set.  Even with the 104GB and 300 
repositories that we have, this only takes ~37 minutes.
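The shuffled $DIRS list then drives the per-repository copy, roughly like this (the destination path on the second spindle set is invented):

```shell
# Sketch continuation: DIRS normally comes from the find/shuffle pipeline above.
BASE=/var/svn/
DEST=/var/svn-hotcopy/
DIRS="repo1 repo2"

if command -v svnadmin >/dev/null; then
    for r in $DIRS; do
        rm -rf "${DEST}${r}"                       # hotcopy wants a fresh target
        svnadmin hotcopy "${BASE}${r}" "${DEST}${r}"
    done
fi
```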


It still takes 4-5 hours to perform the rdiff-backup step that pushes 
the hotcopy folder over to our internal backup server, but that's more 
because of the tens of thousands of revprops files in some of the 
repositories.  Which is another feature in 1.8 that I'm looking forward to.




svn 1.8 migration - directory deltification and revprop packing

2013-06-10 Thread Thomas Harold

Questions about the 1.8 upgrade path:

#1 - In reading the release notes for 1.8, I'm interested in the 
directory/property storage reduction as described in:


http://subversion.apache.org/docs/release-notes/1.8.html#fsfs-enhancements


Directory and property storage reduction

For each changed node in a FSFS repository, new versions of all parent 
directories will be created. Larger repositories tend to have relatively deep 
directory structures with at least one level (branches, modules or projects) 
having tens of entries. The total size of that data may well exceed the size of 
the actual change. Subversion 1.8 now supports directory deltification which 
eliminates most of that overhead.

In db/fsfs.conf, you may now enable and disable directory deltification at any 
time and these settings will be applied in new revisions. For completeness, 
node properties may now be deltified as well although the reductions in 
repository size will usually be minimal.

By default, directory and property deltification are disabled. You must edit 
db/fsfs.conf to enable these features.

Also, db/fsfs.conf now allows for fine-grained control over how deltification 
will be applied. See the comments in that file for a detailed description of 
the individual options.


a) Why are directory/property deltifications turned off by default? 
What are the risks to enabling them across all repositories?  (Yes we 
backup daily with svnadmin hotcopy, then rdiff-backup the hotcopy with 6 
months of backup history kept.  So we can always rewind to any day 
within the last 6 months.)


b) Is there a global setting file that can be used to enable 
directory/property deltifications? Or will we have to update the 
fsfs.conf file for each newly created repository in order to get this 
feature?


c) Is it a safe assumption that in order to apply this change to an 
older repository that we will need a dump/load cycle?  Will we need a 
full dump or will an delta style dump suffice (--deltas option of 
svnadmin dump command)?


#2 - revision property (revprops) files packing

a) Will there be a svnadmin pack command like there was for SVN 1.6? 
Or will we need to do a full dump/load of the repository to pack the 
revprops?


b) Does revprop caching only need to be enabled for http/https access 
and does it have any effect on svn+ssh access?  (All of our users 
currently use svn+ssh access, but we are considering moving to http/https.)





Re: Subversion access control / Linux users etc.

2011-07-21 Thread Thomas Harold
The issues with passwords is why we ended up going with SSH public-key 
authentication.  Load the SSH key into the SSH agent, unlock it with the 
passphrase, then don't worry about it again until we reset the SSH agent 
at logout.


Less prompts, happier users.

(Plus it makes it harder to get into our servers since we don't allow 
password authentication.)
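The agent workflow described is roughly the following (the key path is whatever your key actually is; id_rsa is just the conventional name):

```shell
# Sketch: start a throwaway agent, load a key if one exists, then kill it.
if command -v ssh-agent >/dev/null; then
    eval "$(ssh-agent -s)" >/dev/null               # start agent for this session
    [ -f ~/.ssh/id_rsa ] && ssh-add ~/.ssh/id_rsa   # passphrase prompted once
    ssh-add -l || true                              # list loaded keys (1 if none)
    eval "$(ssh-agent -k)" >/dev/null               # kill the agent, e.g. at logout
fi
```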


Re: Subversion: existing users

2011-07-20 Thread Thomas Harold

On 7/17/2011 2:07 AM, Andy Canfield wrote:

The most obvious authorization scheme is that of the host server; if
there is a user named andy on that server with a password jackel
then I would like to simply be able to talk to the subversion server as
user named andy password jackel. This is how ssh and sftp work. But
apparently subversion can't handle that. True?


You can use individual accounts, the main trickiness is in making sure 
that the svn repository directory is group owned, group writable and 
that new files created within the repo/db tree are owned by the group 
and not the individual's primary group.  A quick chmod -R g+s repo/db 
after setting up the repository takes care of that.


Our server only allowed SSH public-key authentication, so the only way 
to login (other then physically at the console) is via the SSH keys.  So 
the command= line in the authorized_keys files is reasonably secure 
for our purposes.  Very few users actually have a way to get to the 
shell.  And most of those don't even know the password for their account 
on the server.


(Naturally, we run backups daily, just in case someone does figure out 
how to get a shell through the svnserve process and deletes a 
repository.  But if they can commit to the repository, there are more 
nefarious things they can do there too.)


We prefix our ssh-rsa lines in the ~/.ssh/authorized_keys file with:

command="/usr/bin/svnserve -t -r /var/svn",no-agent-forwarding,no-pty,no-port-forwarding,no-X11-forwarding

(That is a single line in authorized_keys, placed directly before the 
key data.)

This also has the advantage that remote URL ends being:

svn+ssh://servername/repositoryname/path/within/repo

Instead of:

svn+ssh://servername/var/svn/repositoryname/path/within/repo

With SSH ~/.ssh/config files or by setting up PuTTY sessions correctly 
you can get rid of having the usernames / port numbers in the svn+ssh 
URL.  (We run our SSH servers on a non-standard port.)
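For example, a ~/.ssh/config entry like the following (host alias, hostname, port, and user are all invented) lets "svn+ssh://svnbox/repositoryname/..." work without embedding either the username or the port:

```
Host svnbox
    HostName svn.example.com
    Port 2222
    User alice
    IdentityFile ~/.ssh/id_rsa
```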




Re: Move to a new repo and keep the history, Part 2

2011-07-14 Thread Thomas Harold

On 7/14/2011 12:29 PM, K F wrote:

Recap – I would like to move some directories from one repository to
another while keeping the history.



I went through this a few months ago (and maybe this will help).  We 
were using a big monolithic repository for all of our jobs. Our 
repository was arranged as:


(jobs repository)
/A/ABClient/ABJob1/...
/A/AXClient/AXJob1/...
/A/AXClient/AXJob2/...
/B/BQClient/BQJob1/...
/C/CAClient/CAJob1/...
/C/CMClient/CMJob1/...
/C/CMClient/CMJob2/...

We wanted to split each client out to a different repository.  The 
output repositories would look like:


(jobs-ab)
/ABJob1/...

(jobs-ax)
/AXJob1/...
/AXJob2/...

(jobs-bq)
/BQJob1/...

(jobs-ca)
/CAJob1/...

(jobs-cm)
/CMJob1/...
/CMJob2/...

Which made all the URLs a lot shorter because the /A/ABClient was 
often something lengthy like /A/AB_Acme_Border_Wings_Inc.


1) A shell script to split out a specific directory.  We had to edit the 
CLCODE and CLPATH lines for each run (took 30-40 minutes to parse the 
monolithic jobs repository and split out a particular client's tree).


Each time I did a new client, I had to make sure that everyone was ready 
for that client's project tree to move.  Alternately, I could have made 
the entire jobs tree read-only for a week...


(Apologies if there are errors in this as I had to quickly edit out some 
client/company specific paths.  I always executed the script as bash -x 
scriptname so I could spot errors.  The date lines are just there so 
I could keep track of how long it took.)


#!/bin/bash

DESTDIR=/var/svn/
DESTPFX=svn-raw-jobs-
DESTSFX=10xx.dump.gz

CLCODE=bq
CLPATH=B/BQClient

SDFOPTS='--drop-empty-revs  --renumber-revs'

date

echo ${DESTDIR}${DESTPFX}${CLCODE}${DESTSFX}

svnadmin dump --quiet /var/svn/jobs | \
svndumpfilter include --quiet $SDFOPTS $CLPATH | gzip > \
${DESTDIR}${DESTPFX}${CLCODE}${DESTSFX}

date

2) Created the new repository (such as jobs-bq for the BQClient). 
Although you could probably roll that into the import script.


# svnadmin create jobs-bq

3) Edited the following import script for each new run.  Loading it up 
was a fairly quick process.  Note that we update the UUID of the new 
repository to make sure that nobody commits outdated stuff.


I gave up on trying to re-base on the fly using sed and simply moved 
all of the individual job folders into the root of the new repository 
and then cleaned up the left-over folder.


Our script also had to create the letter level in the new repository. 
 Otherwise the import had no place to hang itself off of.


You may want to drop the chmod/chgrp lines towards the end.  For our 
server, we only use svn+ssh authentication.  Each users has their own 
local account on the server and they belong to a svn-jobs group which 
gives them read/write access to the entire repository.


#!/bin/bash

SRCDIR=/var/svn/
SRCPFX=svn-raw-jobs-
SRCSFX=10xx.dump.gz

DESTDIR=/var/svn/
DESTPFX=svn-newbase-jobs-
DESTSFX=10xx.dump.gz

CLPARENT=B
CLCODE=bq

date

svn mkdir -m "Import from jobs" \
file:///var/svn/jobs-${CLCODE}/${CLPARENT}

gunzip -c ${SRCDIR}${SRCPFX}${CLCODE}${SRCSFX} | \
svnadmin load --quiet /var/svn/jobs-${CLCODE}

svnlook uuid /var/svn/jobs-${CLCODE}
svnadmin setuuid /var/svn/jobs-${CLCODE}
svnlook uuid /var/svn/jobs-${CLCODE}
svnadmin pack /var/svn/jobs-${CLCODE}

chmod -R 775 /var/svn/jobs-${CLCODE}
chmod -R g+s /var/svn/jobs-${CLCODE}/db
chgrp -R svn-jobs /var/svn/jobs-${CLCODE}

date
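One small safeguard worth adding before the load (my suggestion, not part of the original script): verify the gzip archive is intact first, since a truncated dump will abort `svnadmin load` partway through.

```shell
#!/bin/bash
# Sketch (not from the original script): refuse to load a corrupt archive.
check_dump() {
  if gunzip -t "$1" 2>/dev/null; then
    echo "OK"
  else
    echo "BAD"
  fi
}

# Usage before the load, with the same variables as above:
#   [ "$(check_dump "${SRCDIR}${SRCPFX}${CLCODE}${SRCSFX}")" = "OK" ] || exit 1
```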

4) After the load into the new repository, I would access the repository 
through TortoiseSVN's Repository Browser and drag all of the individual 
jobs folders from being under /A/ABClient/ up to the root of the new 
repository.  Then I deleted the /A/ABClient/ tree.


If I could have figured out how to remap on the fly via sed, I could 
have avoided step #4.  But it was good enough for our purposes and a few 
historical entries in the SVN log showing the movement of the folders 
wasn't a big deal.
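For reference, the on-the-fly remap would have meant rewriting the `Node-path` and `Node-copyfrom-path` headers in the dump stream. A hedged sketch of what that sed might look like (untested against a real dump, and risky: a naive pattern can also rewrite file *contents* that happen to contain matching lines, which is why the manual-move approach in step #4 is the safer choice):

```shell
#!/bin/bash
# Hypothetical remap sketch: strip the A/ABClient/ prefix from dump headers.
# Dangerous on real dumps -- versioned file contents containing matching
# lines would be rewritten too.
remap() {
  sed -e 's|^Node-path: A/ABClient/|Node-path: |' \
      -e 's|^Node-copyfrom-path: A/ABClient/|Node-copyfrom-path: |'
}

printf 'Node-path: A/ABClient/job1/readme.txt\n' | remap
# prints: Node-path: job1/readme.txt
```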


5) After we finished the splits, we chmod'd the old repository to be 
read-only and took it completely offline a month or two later.


Re: Problem Loading Huge Repository

2011-06-17 Thread Thomas Harold

On 6/16/2011 7:05 PM, Bruno Antunes wrote:


Do you know any faster way to load the dump file or to filter out
some projects/revisions so I can speed up the process?



Are you CPU-bound? Or are you limited by disk speed? If you're limited
by disk access times, make sure that the source file that you're loading
from is on a different disk than the destination repository. Even if you
toss the 45GB dump file onto a USB2 external disk, you'll see a speed
increase.

And if you have a choice of file systems for the repository to be stored
on, make sure that it's something which can deal with a few hundred
thousand tiny files.  On Linux, I'd suggest going with ext4 over ext3.
While db/revs in a FSFS repository can have its revisions packed to
reduce the file count, the db/revprops folder still consists of 1 tiny
file for every revision in the project in a FSFS repository.
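To see the scale of the problem on an unpacked repository, you can count those per-revision files directly. A small sketch (assumes the standard FSFS layout with revprops under `db/revprops`):

```shell
#!/bin/bash
# Sketch: count the one-file-per-revision entries under db/revprops.
count_revprops() {
  find "$1/db/revprops" -type f | wc -l
}
# e.g. count_revprops /var/svn/jobs
```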



Re: Problem Loading Huge Repository

2011-06-17 Thread Thomas Harold

On 6/17/2011 10:54 AM, Daniel Shahaf wrote:

Thomas Harold wrote on Fri, Jun 17, 2011 at 10:31:43 -0400:

And if you have a choice of file systems for the repository to be stored
on, make sure that it's something which can deal with a few hundred
thousand tiny files.  On Linux, I'd suggest going with ext4 over ext3.
While db/revs in a FSFS repository can have its revisions packed to
reduce the file count, the db/revprops folder still consists of 1 tiny
file for every revision in the project in a FSFS repository.


revprops/ is sharded.

And in 1.7 (including the recent 1.7.0-alpha1) it is packed, too.



Good.  Another of the many reasons that we're looking forward to 1.7.

Even with the sharding, those little revprop files are causing us issues 
during backups (hotcopy -> rdiff-backup).  Being able to pack those 
revprop files is going to make a big difference as the backup process 
will only have to track 2000-2200 files instead of 30,000 to 50,000.


(We have a few long-lived repositories with up to 25k revisions.  And I 
just finished splitting a 22GB repository with 15-16k revs into a bunch 
of smaller repositories.  Now the nightly backup can look at doing a 
hotcopy on only the repositories with changes in the last 5 days.)
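A hypothetical sketch of that selection step (the path layout and 5-day window are as described; the real nightly job would run `svnadmin hotcopy` on each match):

```shell
#!/bin/bash
# Hypothetical sketch: list repositories whose db/ contains files newer
# than 5 days, so the nightly job only hotcopies what actually changed.
recent_repos() {
  local root=$1 repo
  for repo in "$root"/*; do
    [ -d "$repo/db" ] || continue
    if [ -n "$(find "$repo/db" -type f -mtime -5 -print -quit)" ]; then
      echo "$repo"
    fi
  done
}
# e.g. for r in $(recent_repos /var/svn); do svnadmin hotcopy "$r" ...; done
```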


Re: Subversion 1.6.17 Released

2011-06-02 Thread Thomas Harold

On 6/1/2011 7:27 PM, Daniel Shahaf wrote:


Be advised: the 1.6.17 tag [1] in our repository does not match the
tarballs at the time of this writing.  Until we fix this, please use the
tarballs or zip archives, and avoid installing 1.6.17 from the tag.

Daniel

[1] https://svn.apache.org/repos/asf/subversion/tags/1.6.17



I'm guessing that is why the following URL returns a 404 at the moment?

http://svn.apache.org/repos/asf/subversion/tags/1.6.17/CHANGES