pack and rsync Re: Creating and Verifying a Reliable backup

2016-06-30 Thread Daniel Shahaf
Andrew Reedick wrote on Mon, Jun 27, 2016 at 20:00:02 +:
> However, I'm not sure what the pros/cons of packing are in regards to rsync.

Packing a repository moves revision data from the existing files to new
files: it basically concatenates every 1000 revision files into a single
file.

I think rsync will copy the concatenated files as new files (despite
most of their contents already being available in other files at the
destination).  Therefore, if you've packed the master repository and are
about to rsync it, you will save bandwidth by packing the mirror before
rsyncing to it.

(rsync must be run inside an 'svnadmin freeze' but packing must be run
outside it.)
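
A minimal sketch of that ordering, assuming FSFS repositories and a mirror
reachable as a local path. The function name sync_packed and the --delete
flag are choices made for this illustration and do not come from the thread;
whether packing both sides really yields identical pack files is worth
confirming on a test copy before counting on the bandwidth saving.

```shell
#!/usr/bin/env bash
# Sketch only: sync_packed and its arguments are illustrative names.
# Intended call: sync_packed /path/to/master /path/to/mirror
sync_packed() {
    local master=$1 mirror=$2

    # Pack both sides first, outside of any freeze, so the mirror already
    # holds the concatenated files the master would otherwise resend.
    svnadmin pack "$master" || return 1
    if [ -d "$mirror" ]; then
        svnadmin pack "$mirror" || return 1
    fi

    # Copy while commits are blocked. The trailing slashes make rsync copy
    # the contents of $master into $mirror; --delete (an assumption here)
    # removes files on the mirror that no longer exist on the master.
    svnadmin freeze "$master" -- rsync -a --delete "$master/" "$mirror/"
}
```

The two pack calls run before the freeze and the copy runs inside it, which
matches the constraint stated in the parenthetical above.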

Cheers,

Daniel


RE: Creating and Verifying a Reliable backup

2016-06-27 Thread Andrew Reedick
> From: Michael Schwager [mailto:mschw...@gmail.com] 
> Sent: Wednesday, June 22, 2016 10:25 AM
> To: users@subversion.apache.org
> Subject: Re: Creating and Verifying a Reliable backup
>
> Following is an update to my question of Jun 1, where I ask the following 
> question:
>
... snip verify/backup/verify/rsync/verify script...
>

If you're not already doing it, you might want to pack your repos in order to
make the backups and/or copying faster.  Working on thousands of small
files is incredibly slow/inefficient.
http://svnbook.red-bean.com/nightly/en/svn.reposadmin.maint.html#svn.reposadmin.maint.diskspace.fsfspacking

However, I'm not sure what the pros/cons of packing are in regards to rsync.




Re: Creating and Verifying a Reliable backup

2016-06-22 Thread Michael Schwager
Following is an update to my question of Jun 1, where I ask the following
question:

On Wed, Jun 1, 2016 at 9:58 AM, Michael Schwager  wrote:

> ...
> My question is: How do I back (subversion repos) up reliably, and verify
> (them) so that I can deliver a 100% recovery guarantee to my boss?
> ...
> I have compiled subversion-1.9.4 on the server under
> /opt/subversion-1.9.4. If I run that version of svn hotcopy, it appears to
> work and svnverify exits successfully. But ... I find that a file is
> missing: repos2/db/rev-prop-atomics.shm .
>

I have implemented a script that performs the following, in abbreviated
Bash form. Note that I have enough disk space for the original repos
directories, plus 2 copies. I am backing up the whole enchilada to tape, and
archiving it to another pair of tapes monthly and taking one of them
offsite. There... that oughta do it :-) . Without further ado:

LOGFILE=/var/log/backup_svn.$(date +%a)
exec 1>&$LOGFILE
exec 2>&1
REPODIR=/home/svn/svn/repositories
HOTCOPYDIR=/home/svn/hotcopy
RSYNCDIR=/home/svn/rsync
RELIABLE_SVNADMIN=/opt/subversion-1.9.4/bin/svnadmin
EMAIL=usern...@example.com
# $repos is a space-separated list of all the directories in
# /home/svn/svn/repositories
for repo in $repos; do
    #
    # VERIFY ORIGINAL
    #
    echo ""
    echo "*** REPO $repo pre-backup sanity check ***"
    echo ""
    svnadmin verify $repo || {
        mail -s "ERROR: SVN directory FATAL error. This is a BIG problem. Backups are HALTED." $EMAIL < $LOGFILE
        exit 1
    }

    #
    # HOTCOPY
    #
    echo ""
    echo "*** REPO $repo hotcopy ***"
    echo ""
    # Keep one previous generation as $repo.1
    rm -rf $HOTCOPYDIR/$repo.1
    [[ -e $HOTCOPYDIR/$repo ]] && mv $HOTCOPYDIR/$repo $HOTCOPYDIR/$repo.1
    $RELIABLE_SVNADMIN hotcopy $REPODIR/$repo $HOTCOPYDIR/$repo || {
        mail -s "ERROR: svn hotcopy exited with nonzero status." $EMAIL < $LOGFILE
        exit 1
    }
    echo ""
    echo "*** REPO $repo hotcopy verify ***"
    echo ""
    svnadmin verify $HOTCOPYDIR/$repo || {
        mail -s "ERROR: SVN hotcopy error. The original is fine, hotcopy is not." $EMAIL < $LOGFILE
        exit 1
    }

    #
    # RSYNC
    #
    echo ""
    echo "*** REPO $repo rsync ***"
    echo ""
    rm -rf $RSYNCDIR/$repo.1
    [[ -e $RSYNCDIR/$repo ]] && mv $RSYNCDIR/$repo $RSYNCDIR/$repo.1
    $RELIABLE_SVNADMIN freeze $repo -- rsync -av $repo $RSYNCDIR/ || {
        mail -s "ERROR: rsync of svn repo $repo exited with nonzero status." $EMAIL < $LOGFILE
        exit 1
    }
    echo ""
    echo "*** REPO $repo rsync verify ***"
    echo ""
    svnadmin verify $RSYNCDIR/$repo || {
        mail -s "ERROR: SVN rsync error. The original is fine, rsync is not." $EMAIL < $LOGFILE
        exit 1
    }
done

echo ""
echo "*** Do other directories: authorization and bin ***"
echo ""
cd $REPODIR/..
rm -rf $RSYNCDIR/authorization.1
[[ -e $RSYNCDIR/authorization ]] && mv $RSYNCDIR/authorization $RSYNCDIR/authorization.1
rm -rf $RSYNCDIR/bin.1
[[ -e $RSYNCDIR/bin ]] && mv $RSYNCDIR/bin $RSYNCDIR/bin.1
rsync -av authorization $RSYNCDIR/
rsync -av bin $RSYNCDIR/


-- 
-Mike Schwager


Re: Creating and Verifying a Reliable backup

2016-06-10 Thread Doug Robinson
At least one problem with "svnsync" is that it, by design, does not
propagate the locks.  So a sync'd repo cannot replace the sync'd-from
repository without losing all of the locks.

On Thu, Jun 2, 2016 at 9:28 AM, Daniel Shahaf 
wrote:

> Michael Schwager wrote on Wed, Jun 01, 2016 at 09:58:18 -0500:
> > We are very paranoid about our Subversion repo, notwithstanding the fact
> > that the previous sysadmin didn't back it up. But that's another story.
> Now
> > I'm here at my job, I've inherited the repo admin duties, and I want to
> > back it up reliably. If we lose it, we're all out of work.
> >
> > My question is: How do I back it up reliably,
>
> I would recommend svnsync.  It should be more robust than 'hotcopy' due
> to the way they are implemented: 'svnsync' wraps 'log' and 'commit'
> while 'hotcopy' is implemented by a dedicated codepath which bypasses
> the usual filesystem-internal reading/writing APIs.
>
> Cheers,
>
> Daniel
>



-- 
*DOUGLAS B. ROBINSON* SENIOR PRODUCT MANAGER

*T *925-396-1125
*E* doug.robin...@wandisco.com

*www.wandisco.com *



Re: Creating and Verifying a Reliable backup

2016-06-02 Thread Daniel Shahaf
Pavel Lyalyakin wrote on Wed, Jun 01, 2016 at 19:29:14 +0300:
> Yes, hotcopy makes full repository copy with locks and hook scripts.
> Read SVNBook[2].

For FSFS repositories, 'hotcopy' doesn't backup locks atomically [1];
one workaround to that is to wrap it by a 'freeze', as in 'svnadmin
freeze r svnadmin hotcopy r r2'.

However, for BDB repositories, that command will deadlock instantly (can
be killed by ^\ SIGQUIT), so the safe idiom would be:

# Workaround SVN-3750
case "$(LC_ALL=C svnadmin info -- "$r" | sed -ne 's/^Filesystem Type: //p')" in
   fsfs) svnadmin freeze -- "$r" svnadmin hotcopy -- "$r" "$r.hotcopy";;
   bdb)  svnadmin hotcopy -- "$r" "$r.hotcopy";;
   *)TODO ;;
esac

Cheers,

Daniel

[1] https://issues.apache.org/jira/browse/SVN-3750



Re: Creating and Verifying a Reliable backup

2016-06-02 Thread Daniel Shahaf
Michael Schwager wrote on Wed, Jun 01, 2016 at 09:58:18 -0500:
> We are very paranoid about our Subversion repo, notwithstanding the fact
> that the previous sysadmin didn't back it up. But that's another story. Now
> I'm here at my job, I've inherited the repo admin duties, and I want to
> back it up reliably. If we lose it, we're all out of work.
> 
> My question is: How do I back it up reliably,

I would recommend svnsync.  It should be more robust than 'hotcopy' due
to the way they are implemented: 'svnsync' wraps 'log' and 'commit'
while 'hotcopy' is implemented by a dedicated codepath which bypasses
the usual filesystem-internal reading/writing APIs.
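
For readers who have not set up svnsync before, the sketch below shows one
way a mirror might be created and kept current. It is an illustration under
assumptions, not a recipe from this thread: the function names init_mirror
and sync_mirror are hypothetical, the mirror path must be absolute so that it
forms a valid file:// URL, and the permissive pre-revprop-change hook is the
simplest thing that lets svnsync work (a real setup would restrict it to the
account that runs the sync).

```shell
#!/usr/bin/env bash
# Sketch only: init_mirror and sync_mirror are illustrative names.

# One-time setup of an empty mirror of the repository at URL $1.
init_mirror() {
    local source_url=$1 mirror_dir=$2

    svnadmin create "$mirror_dir" || return 1

    # svnsync records its bookkeeping in revision properties, so the
    # mirror must allow revprop changes.
    printf '#!/bin/sh\nexit 0\n' > "$mirror_dir/hooks/pre-revprop-change"
    chmod +x "$mirror_dir/hooks/pre-revprop-change"

    svnsync initialize "file://$mirror_dir" "$source_url"
}

# Replay any revisions the mirror does not have yet.
sync_mirror() {
    svnsync synchronize "file://$1"
}
```

As pointed out elsewhere in the thread, a mirror built this way carries the
versioned history but not locks or hook scripts.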

Cheers,

Daniel


RE: Creating and Verifying a Reliable backup

2016-06-02 Thread Kerry, Richard
> On Wed, Jun 1, 2016 at 11:20 AM, Stefan Hett <mailto:ste...@egosoft.com> wrote:
> > To ensure the integrity of a backup close to 100% I go the fail safe way:
> > 2. svnadmin dump to dump the current repository
> > 3. svnadmin load the dump into a fresh repository
> > 4. svnadmin dump the newly loaded repository
> > 5. compare the first and the last dump
>
> I like this suggestion. What I think I will do ultimately is check out the
> entire repo somewhere, check out the entire hotcopied repo somewhere, and
> check all the files against one another.
>

"check out " "check out" "check all"
But that only checks the working copies so doesn't check the repository 
structure and hence the history.  Comparing the dumps does that.


Regards,
Richard.



Re: Creating and Verifying a Reliable backup

2016-06-01 Thread Michael Schwager
On Wed, Jun 1, 2016 at 11:20 AM, Stefan Hett  wrote:

> To ensure the integrity of a backup close to 100% I go the fail safe way:
> 2. svnadmin dump to dump the current repository
> 3. svnadmin load the dump into a fresh repository
> 4. svnadmin dump the newly loaded repository
> 5. compare the first and the last dump
>

I like this suggestion. What I think I will do ultimately is check out the
entire repo somewhere, check out the entire hotcopied repo somewhere, and
check all the files against one another.

-- 
-Mike Schwager


Re: Creating and Verifying a Reliable backup

2016-06-01 Thread Michael Schwager
On Wed, Jun 1, 2016 at 11:29 AM, Pavel Lyalyakin <
pavel.lyalya...@visualsvn.com> wrote:

> Hello Michael,
>
> On Wed, Jun 1, 2016 at 7:18 PM, Michael Schwager 
> wrote:
> ...



> > My intention is to perform the following:
> ...
> > * svnadmin verify /home/svn/hotcopy/repoX. If error, warn and exit.
> > * svnadmin hotcopy into /home/svn/hotcopy/repoX. If error, warn and exit.
> > * svnadmin verify /home/svn/hotcopy/repoX. If error, warn and exit.
>
> These steps don't require you to shut down httpd.
>

Most likely quite true, but it's nighttime and I'm paranoid. I'm looking
for guarantees wherever I can find them, and the cost is quite low (i.e., I
shut down something that nobody should be using anyway).

> * svnadmin freeze /path/to/repoX -- rsync -av /path/to/repoX
> > /home/svn/rsync/repoX.
> > * rsync -c -n -av /path/to/repoX /home/svn/rsync/repoX. (perform a
> checksum
> > compare) If error, warn and exit.
>
> Why don't you run `rsync` on "/home/svn/hotcopy/repoX"?
>

That assumes the hotcopy is correct. At the start of my script, the only
thing that I am comfortable assuming is correct is... well, very little.
Even the original repo may have a bug lurking in it somewhere. But the
closer I can get to the original, presumably good, copy of our code, the
happier I am every step along the way.


> > ...I've always presumed this meant that the hotcopy did back up locks and
> > hook scripts...
>
> Yes, hotcopy makes full repository copy with locks and hook scripts.
> Read SVNBook[2].
>

Excellent. Thanks for the confirmation. I note that the svn book does
mention that it produces a "fully functional Subversion repository, able to
be dropped in as a replacement for your live repository", but as there are
caveats with the svnadmin dump command that the uninitiated (yours truly) may
not be familiar with, it's not entirely clear that hotcopy obviates all of
those concerns. Indeed, a hotcopy does not produce the same number of files
as an rsync (a file named rev-prop-atomics.shm is missing in one of my
hotcopied directories), so the question remained in my mind until now.

Thanks again for the help!
-- 
-Mike Schwager


Re: Creating and Verifying a Reliable backup

2016-06-01 Thread Michael Schwager
On Wed, Jun 1, 2016 at 11:20 AM, Stefan Hett  wrote:

> On 6/1/2016 4:58 PM, Michael Schwager wrote:
>
> > ...we ran into a bug that came about under 1.8 when working with older
> > repos; the hotcopy exits with the following error:
> >
> > svnadmin: E22: Serialized hash missing terminator
>
> As far as I can tell this indicates a problem in the repository you are
> trying to hotcopy from. Run svnadmin verify on that to get details where
> the corruption might be located and resolve that (if possible).
>


No, it's a known bug when hotcopying using 1.8.x on a repo started with an
old version of Subversion, findable via Google. svnadmin verify on the
original repo shows clean.

-- 
-Mike Schwager


Re: Creating and Verifying a Reliable backup

2016-06-01 Thread Michael Schwager
Yessity-yes-yes-ye! svnadmin freeze- I missed that one! That's
something I can really sink my teeth into! Awesome, thanks!

My intention is to perform the following:
* shut down httpd
Then, for each repoX:
* svnadmin verify /home/svn/hotcopy/repoX. If error, warn and exit.
* svnadmin hotcopy into /home/svn/hotcopy/repoX. If error, warn and exit.
* svnadmin verify /home/svn/hotcopy/repoX. If error, warn and exit.
* svnadmin freeze /path/to/repoX -- rsync -av /path/to/repoX
/home/svn/rsync/repoX.
* rsync -c -n -av /path/to/repoX /home/svn/rsync/repoX. (perform a checksum
compare) If error, warn and exit.
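
The checksum comparison in the last step could be wrapped so that a script
can act on the result. The sketch below is one possible shape, with
assumptions labeled: the function name trees_match is hypothetical, and the
flag set (-r, -c, -n, -i, --delete) is this illustration's choice rather than
the exact command above. It treats any itemized output from the dry run as a
difference.

```shell
#!/usr/bin/env bash
# Sketch only: trees_match is an illustrative name.
# Succeeds when a checksum-based dry run finds nothing to transfer or delete.
trees_match() {
    local src=$1 dst=$2 changes

    # -r recurse, -c compare file contents by checksum, -n dry run,
    # -i itemize: print one line per item rsync would change.
    changes=$(rsync -rcni --delete "$src/" "$dst/") || return 2

    if [ -n "$changes" ]; then
        # Report what differs so the log shows where to look.
        printf '%s\n' "$changes" >&2
        return 1
    fi
}
```

A caller might use it as `trees_match /path/to/repoX /home/svn/rsync/repoX ||
echo "copy differs"`, ideally while the source is still frozen so that a new
commit cannot cause a spurious mismatch.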

I had considered svnadmin dump, but according to the Wisdom on Serverfault,
"[svnadmin dump] also fails to backup things like the repository scripts
and a few other things..."
(
http://serverfault.com/questions/540214/is-it-necessary-to-do-an-svnadmin-dump-to-backup-subversion
)

"One other thing to watch is svnadmin dump doesn't backup locks and hook
scripts."
http://serverfault.com/questions/52252/best-way-to-do-subversion-backups?rq=1

...I've always presumed this meant that the hotcopy did back up locks and
hook scripts...

On Wed, Jun 1, 2016 at 10:51 AM, Pavel Lyalyakin <
pavel.lyalya...@visualsvn.com> wrote:

> Hello Michael,
>
> On Wed, Jun 1, 2016 at 5:58 PM, Michael Schwager 
> wrote:
> > Hello,
> > We are very paranoid about our Subversion repo, notwithstanding the fact
> > that the previous sysadmin didn't back it up. But that's another story.
> Now
> > I'm here at my job, I've inherited the repo admin duties, and I want to
> back
> > it up reliably. If we lose it, we're all out of work.
> >
> > My question is: How do I back it up reliably, and verify it so that I can
> > deliver a 100% recovery guarantee to my boss? I have Subversion 1.8.4 on
> a
> > CentOS 6.3 server, and Tortoise SVN 1.8.11 on Windows 7 clients.
>
> Don't forget about `svnadmin verify`[1]. It makes sense to run it
> right now if you've never run it. ;)
>
> And if you want to deliver recovery guarantee close to 100%, consider
> `svnadmin dump`[2] in addition to hotcopy.
>
> > I am thinking to do both an svn hotcopy to one directory, and an rsync to
> > another. The svn hotcopy will give me a backup that I'm pretty sure is
> > reliable (see Notes below). Assuming httpd is down and I can guarantee
> that
> > I am the only person who will be logged into the SVN server, can I expect
> > with 99.9% surety that the svn repos are quiescent?
>
> I'm not sure that I understand what you mean by "quiescent" here.
> What's your concern?
>
> So you are planning to make a hotcopy and then transfer this hotcopy
> using rsync to another machine, right?
>
> * It's safe to run `svnadmin hotcopy` on a live repository.
>
> * And if you'd have to use `rsync` on a *live* repository, run
>  `svnadmin freeze`[3] to prevent concurrent commits to the repo while
>  `rsync` is running.
>
> If httpd is disabled and you are the only user logged on, this means that
> your repositories should be untouched by any other processes at this
> moment. Unless you have some indexing services that touch the repos
> (and it is generally recommended to exclude repos from indexing).
>
> > Notes:
> >
> > We're a little worried about svn hotcopy; we ran into a bug that came
> about
> > under 1.8 when working with older repos; the hotcopy exits with the
> > following error:
> >
> > svnadmin: E22: Serialized hash missing terminator
>
> It seems to be a bug #4554[4] that has been fixed in Subversion 1.9.
> `svnadmin hotcopy` could fail on a repository with older FSFS format.
>
> > I have compiled subversion-1.9.4 on the server under
> /opt/subversion-1.9.4.
> > If I run that version of svn hotcopy, it appears to work and svnverify
> exits
> > successfully. But if I look at all the files under both the original and
> the
> > hotcopy on one of our repos, I find that a file is missing:
> > repos2/db/rev-prop-atomics.shm . That's probably ok, but still- how do we
> > know the latest hotcopy, and hotcopies of the future, are and will remain
> > 100% bug-free?
>
> Use `svnadmin verify` and consider using `svnadmin dump` in addition
> to hotcopies. Make sure that your Subversion server is on the latest
> patch update too.
>
> [1]: http://svnbook.red-bean.com/en/1.8/svn.ref.svnadmin.c.verify.html
> [2]: http://svnbook.red-bean.com/en/1.8/svn.ref.svnadmin.c.dump.html
> [3]: http://svnbook.red-bean.com/en/1.8/svn.ref.svnadmin.c.freeze.html
> [4]: https://issues.apache.org/jira/browse/SVN-4554
>
> --
> With best regards,
> Pavel Lyalyakin
> VisualSVN Team
>



-- 
-Mike Schwager


Re: Creating and Verifying a Reliable backup

2016-06-01 Thread Matt Garman
On Wed, Jun 1, 2016 at 10:58 AM, Pavel Lyalyakin
 wrote:
> Hello Matt,
>
> Why do you use `rsync`?
>
> [1]: http://svnbook.red-bean.com/en/1.8/svn.ref.svnadmin.c.freeze.html
> [2]: http://svnbook.red-bean.com/en/1.8/svn.ref.svnadmin.c.hotcopy.html

I have no good reason!  Backup scripts were setup long ago, and
rsync-based backup scripts (in general) are routine for me.  I wasn't
aware of freeze and hotcopy at the time.

But thank you Mike for starting this thread, clearly my scripts are
overdue for some improvement!

-Matt


Re: Creating and Verifying a Reliable backup

2016-06-01 Thread Pavel Lyalyakin
Hello Michael,

On Wed, Jun 1, 2016 at 7:18 PM, Michael Schwager  wrote:
> Yessity-yes-yes-ye! svnadmin freeze- I missed that one! That's
> something I can really sink my teeth into! Awesome, thanks!
>
> My intention is to perform the following:
> * shut down httpd
> Then, for each repoX:
> * svnadmin verify /home/svn/hotcopy/repoX. If error, warn and exit.
> * svnadmin hotcopy into /home/svn/hotcopy/repoX. If error, warn and exit.
> * svnadmin verify /home/svn/hotcopy/repoX. If error, warn and exit.

These steps don't require you to shut down httpd.

> * svnadmin freeze /path/to/repoX -- rsync -av /path/to/repoX
> /home/svn/rsync/repoX.
> * rsync -c -n -av /path/to/repoX /home/svn/rsync/repoX. (perform a checksum
> compare) If error, warn and exit.

Why don't you run `rsync` on "/home/svn/hotcopy/repoX"?

You won't need to shut down httpd or run `svnadmin freeze` in such
case.

> I had considered svnadmin dump, but according to the Wisdom on Serverfault,
> "[svnadmin dump] also fails to backup things like the repository scripts and
> a few other things..."
> (http://serverfault.com/questions/540214/is-it-necessary-to-do-an-svnadmin-dump-to-backup-subversion)

Yep, this is right. Subversion repository dump streams (the dump
generated by `svnadmin dump` or `svnrdump`) contain versioned history
only. Read SVNBook[1].

> "One other thing to watch is svnadmin dump doesn't backup locks and hook
> scripts."
> http://serverfault.com/questions/52252/best-way-to-do-subversion-backups?rq=1
>
> ...I've always presumed this meant that the hotcopy did back up locks and
> hook scripts...

Yes, hotcopy makes full repository copy with locks and hook scripts.
Read SVNBook[2].

[1]: 
http://svnbook.red-bean.com/en/1.8/svn.reposadmin.maint.html#svn.reposadmin.maint.migrate
[2]: 
http://svnbook.red-bean.com/en/1.8/svn.reposadmin.maint.html#svn.reposadmin.maint.backup

--
With best regards,
Pavel Lyalyakin
VisualSVN Team


Re: Creating and Verifying a Reliable backup

2016-06-01 Thread Stefan Hett

On 6/1/2016 4:58 PM, Michael Schwager wrote:

> Hello,
> We are very paranoid about our Subversion repo, notwithstanding the
> fact that the previous sysadmin didn't back it up. But that's another
> story. Now I'm here at my job, I've inherited the repo admin duties,
> and I want to back it up reliably. If we lose it, we're all out of work.
>
> My question is: How do I back it up reliably, and verify it so that I
> can deliver a 100% recovery guarantee to my boss? I have Subversion
> 1.8.4 on a CentOS 6.3 server, and Tortoise SVN 1.8.11 on Windows 7
> clients.
>
> I am thinking to do both an svn hotcopy to one directory, and an rsync
> to another. The svn hotcopy will give me a backup that I'm pretty sure
> is reliable (see Notes below). Assuming httpd is down and I can
> guarantee that I am the only person who will be logged into the SVN
> server, can I expect with 99.9% surety that the svn repos are quiescent?
>
> Thanks.
> --
> -Mike Schwager
>
> Notes:
>
> We're a little worried about svn hotcopy; we ran into a bug that came
> about under 1.8 when working with older repos; the hotcopy exits with
> the following error:
>
> svnadmin: E22: Serialized hash missing terminator

As far as I can tell this indicates a problem in the repository you are
trying to hotcopy from. Run svnadmin verify on that to get details where
the corruption might be located and resolve that (if possible).

> I have compiled subversion-1.9.4 on the server under
> /opt/subversion-1.9.4. If I run that version of svn hotcopy, it
> appears to work and svnverify exits successfully. But if I look at all
> the files under both the original and the hotcopy on one of our repos,
> I find that a file is missing: repos2/db/rev-prop-atomics.shm . That's
> probably ok, but still- how do we know the latest hotcopy, and
> hotcopies of the future, are and will remain 100% bug-free?

To ensure the integrity of a backup close to 100% I go the fail safe way:
1. svnadmin verify to ensure the current repository is in a good state 
(if there are errors/issues resolve them)

2. svnadmin dump to dump the current repository
3. svnadmin load the dump into a fresh repository
4. svnadmin dump the newly loaded repository
5. compare the first and the last dump
6. run svnadmin verify on the loaded dump to en
If both dumps are equal, I'm certain enough the integrity is given. 
(Note: this implies the same fsfs-format as well as the same server 
version for the svnadmin calls).
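
The numbered procedure above could be scripted roughly as follows. This is a
sketch under assumptions, not the script actually used: the function name
roundtrip_check is hypothetical, the scratch directory comes from mktemp, and
the -q flags are only there to keep the log quiet. As the note above says,
the comparison is only meaningful when the same svnadmin version and
filesystem format are used throughout.

```shell
#!/usr/bin/env bash
# Sketch only: roundtrip_check is an illustrative name.
# Succeeds when dump -> load -> dump reproduces the first dump exactly.
roundtrip_check() {
    local repo=$1 work
    work=$(mktemp -d) || return 2

    svnadmin verify -q "$repo"                              || return 1  # step 1
    svnadmin dump -q "$repo" > "$work/first.dump"           || return 1  # step 2
    svnadmin create "$work/fresh"                           || return 1
    svnadmin load -q "$work/fresh" < "$work/first.dump"     || return 1  # step 3
    svnadmin dump -q "$work/fresh" > "$work/second.dump"    || return 1  # step 4
    svnadmin verify -q "$work/fresh"                        || return 1  # step 6

    # Step 5: byte-for-byte comparison of the two dump files.
    if cmp -s "$work/first.dump" "$work/second.dump"; then
        rm -rf "$work"
        return 0
    fi
    echo "dumps differ; kept $work for inspection" >&2
    return 1
}
```

The scratch directory is deliberately left in place on any failure so the
two dump files can be examined.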


This process however only works when you can take down access to a
repository completely. Otherwise svnadmin hotcopy would be my choice
too. In addition, setting up a mirror which is kept in sync using svnsync
is also a reasonable measure to further increase the reliability of
an SVN repository IMO.


However, it's also utterly vital to keep the server up to date with 
patch releases, so you are not suffering flaws/bugs which could impact 
the server side. Some examples for issues which have been fixed a long 
time ago and are not fixed in 1.8.4 (but might be relevant for your case 
of ensuring data correctness/integrity and a reliable backup system):
1.8.5: hotcopy: fix hotcopy losing revprop files in packed repos (issue 
#4448)

1.8.9: svnadmin dump: don't let invalid mergeinfo stop dump
1.8.9: svnrdump load: fix crash when svn:* normalization (issue #4490)
1.8.9: mod_dav_svn: detect out of dateness correctly during commit 
(issue #4480)

1.8.11: disable revprop caching feature due to cache invalidation problems
1.8.13: svnadmin load: tolerate invalid mergeinfo at r0 (issue #4476)
1.8.13: svnadmin load: strip references to r1 from mergeinfo (issue #4538)
1.8.13: svnsync: strip any r0 references from mergeinfo (issue #4476)
1.8.14: prevent possible repository corruption on power/disk failures
1.8.16: dump: don't write broken dump files in some ambiguously encoded 
fsfs repositories (issue #4554)


If you feel very conservative you might also wanna consider staying at 
the old stable version for the server (sidenote: even with the server 
running at 1.8, clients could use svn 1.9), if you don't need 
bugfixes/improvements of the current stable svn version. While the 
current stable build (1.9.4) contains fixes for issues still present in 
1.8.16 and also delivers new features, it also contains 
features/improvements which by the nature of software lifecycles have 
not been tested as long in the wild as the features present in 1.8.16. 
So one might argue that from the code integrity point of view, 1.8.16 
would be the safer choice to go.


If you are compiling your own server, be sure to keep the dependencies 
used also up to date.


--
Regards,
Stefan Hett



Re: Creating and Verifying a Reliable backup

2016-06-01 Thread Mark Phippard
On Wed, Jun 1, 2016 at 11:55 AM, Michael Schwager 
wrote:

> Thanks Matt. To your point,
>
> > my collection of backups probably does have some corrupt repo trees...
>
> that is really what I'm driving at. The RAID, offsite, number of backups
> (nightly in our case), and testing is all covered. In other words, I can
> mitigate the effects of failure with all those tried-and-true sysadmin
> techniques.
>
> The essence of my question drives to Subversion specifically. I don't want
> *any* unknown corrupt Subversion repo backups lying around. Meaning I don't
> trust a shotgun approach, where I do enough of them so one is bound to be
> good. I'm looking for a precision approach, where I can be reasonably
> assured that the techniques I'm doing will provide me with a recoverable
> repo at any chosen backup point. ("reasonably" defined as "as close to 100%
> as I can get")
>
> Because if it's Thursday, and Wednesday night's repo backup is the one I
> need, I don't want to have to report back that it's corrupt and the best I
> can do is Tuesday.
>


Personally, I think it is overkill to keep a bunch of backups.  With the
exception of revision properties, history is immutable so it is not like
someone can delete something where you would need an older backup to find
it.

The best backup option is svnadmin hotcopy.  That is the only backup option
that gets all of the data in the repository.  dump and svnsync both do not
backup locks or hook scripts.

As of 1.8 I believe, svnadmin hotcopy supports an --incremental option so
that you can also do these backups very fast.  I would recommend doing an
--incremental hotcopy from a post-commit hook script so that you always
have a proper backup.  You can then periodically take dumps or perform
verify on your hotcopy or do an rsync to maintain some offsite backup etc.

So do an incremental hotcopy via post-commit hook and do a nightly verify
of the hotcopy if you are paranoid.  SVN repositories do not suddenly
become corrupt though.  The data is only written once, so unless the disk
drive starts losing bits, once it is good it stays good.  Only a newer
commit can come in that is somehow "corrupt".  That said, corrupt
repositories are really rare.
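
A post-commit hook along those lines might look like the sketch below. It is
illustrative only: BACKUP_ROOT, SVNADMIN and the function name
incremental_backup are hypothetical, and the default paths are placeholders.
Hooks run with an essentially empty environment, so a real hook would point
SVNADMIN at an absolute path.

```shell
#!/bin/sh
# Sketch of a post-commit hook body. Subversion invokes the hook as:
#   post-commit REPOS-PATH REVISION TXN-NAME
# BACKUP_ROOT and SVNADMIN are hypothetical settings for this illustration.
BACKUP_ROOT=${BACKUP_ROOT:-/home/svn/hotcopy}
SVNADMIN=${SVNADMIN:-svnadmin}

incremental_backup() {
    repos=$1
    dest=$BACKUP_ROOT/$(basename "$repos")

    # --incremental copies only what changed since the previous hotcopy
    # into the same destination.
    "$SVNADMIN" hotcopy --incremental "$repos" "$dest"
}

# In the installed hook, the last line would be:
#   incremental_backup "$1"
```

Because the client waits for post-commit to finish, a large repository may
call for running the copy in the background; that trade-off is not covered
here.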

Mark


Re: Creating and Verifying a Reliable backup

2016-06-01 Thread Pavel Lyalyakin
Hello Matt,

On Wed, Jun 1, 2016 at 6:45 PM, Matt Garman  wrote:
>
> As you suggested, if you can make a fancier backup script that shuts
> down anyone's ability to make changes to the repo while the backup is
> taking place, that's even better.

There is `svnadmin freeze`[1] for this. And you can use `svnadmin
hotcopy`[2] on a live repository without any problems.

Why do you use `rsync`?

[1]: http://svnbook.red-bean.com/en/1.8/svn.ref.svnadmin.c.freeze.html
[2]: http://svnbook.red-bean.com/en/1.8/svn.ref.svnadmin.c.hotcopy.html

--
With best regards,
Pavel Lyalyakin
VisualSVN Team


Re: Creating and Verifying a Reliable backup

2016-06-01 Thread Michael Schwager
Thanks Matt. To your point,

> my collection of backups probably does have some corrupt repo trees...

that is really what I'm driving at. The RAID, offsite, number of backups
(nightly in our case), and testing is all covered. In other words, I can
mitigate the effects of failure with all those tried-and-true sysadmin
techniques.

The essence of my question drives to Subversion specifically. I don't want
*any* unknown corrupt Subversion repo backups lying around. Meaning I don't
trust a shotgun approach, where I do enough of them so one is bound to be
good. I'm looking for a precision approach, where I can be reasonably
assured that the techniques I'm doing will provide me with a recoverable
repo at any chosen backup point. ("reasonably" defined as "as close to 100%
as I can get")

Because if it's Thursday, and Wednesday night's repo backup is the one I
need, I don't want to have to report back that it's corrupt and the best I
can do is Tuesday.


On Wed, Jun 1, 2016 at 10:45 AM, Matt Garman 
wrote:

> I think there's two questions here: (1) what are general good backup
> practices, and (2) how to backup svn repos specifically.
>
> "If we lose it, we're all out of work."  Hopefully your boss
> recognizes this and has budgeted appropriately.  In my experience
> there is no perfect backup; the best you can do is ever-decreasing
> odds of a catastrophic failure.
>
> Step one would be to run your svn server with some kind of redundant
> disk configuration.  Of course we all know RAID is not backup, but
> storage is relatively cheap, so why not?
>
> I'd then backup to at least two different machines, preferably
> offsite.  Cloud storage is fairly cheap these days as well.  So a good
> scheme might be (at least) one offsite server that you control, and
> (at least) a second copy with a cloud provider (CrashPlan, BackBlaze,
> Amazon, DropBox, etc).
>
> What we've always done is a simple rsync of the repo tree.  Your email
> made me realize that we could be doing the backup right when someone
> is committing, and thus ending up with a corrupt repo tree.  However,
> we have some mitigating factors: we don't have just one repo, but
> literally dozens.  And we do backups twice per week, and we keep
> several months of backups.  So my collection of backups probably does
> have some corrupt repo trees... but given the number of repos we have,
> plus the fact that the backup jobs run in the middle of the
> night/weekend, I think the probability is pretty low that I have any
> significant corruption.
>
> As you suggested, if you can make a fancier backup script that shuts
> down anyone's ability to make changes to the repo while the backup is
> taking place, that's even better.
>
> For my personal svn repos (home hobby projects) I do simple backups
> with svndump.
>
> Lastly, you probably owe it to your company to regularly test your
> backups to ensure that they are indeed viable.  Just like buildings
> have fire drills, so should sysadmins have DR drills.
>
> Hope these suggestions are useful!
>
>
>
>
> On Wed, Jun 1, 2016 at 9:58 AM, Michael Schwager wrote:
> > Hello,
> > We are very paranoid about our Subversion repo, notwithstanding the fact
> > that the previous sysadmin didn't back it up. But that's another story. Now
> > I'm here at my job, I've inherited the repo admin duties, and I want to back
> > it up reliably. If we lose it, we're all out of work.
> >
> > My question is: How do I back it up reliably, and verify it so that I can
> > deliver a 100% recovery guarantee to my boss? I have Subversion 1.8.4 on a
> > CentOS 6.3 server, and Tortoise SVN 1.8.11 on Windows 7 clients.
> >
> > I am thinking to do both an svn hotcopy to one directory, and an rsync to
> > another. The svn hotcopy will give me a backup that I'm pretty sure is
> > reliable (see Notes below). Assuming httpd is down and I can guarantee that
> > I am the only person who will be logged into the SVN server, can I expect
> > with 99.9% surety that the svn repos are quiescent?
> >
> > Thanks.
> > --
> > -Mike Schwager
> >
> > Notes:
> >
> > We're a little worried about svn hotcopy; we ran into a bug that came about
> > under 1.8 when working with older repos; the hotcopy exits with the
> > following error:
> >
> > svnadmin: E22: Serialized hash missing terminator
> >
> > I have compiled subversion-1.9.4 on the server under /opt/subversion-1.9.4.
> > If I run that version of svn hotcopy, it appears to work and svnverify exits
> > successfully. But if I look at all the files under both the original and the
> > hotcopy on one of our repos, I find that a file is missing:
> > repos2/db/rev-prop-atomics.shm . That's probably ok, but still- how do we
> > know the latest hotcopy, and hotcopies of the future, are and will remain
> > 100% bug-free?
>



-- 
-Mike Schwager


Re: Creating and Verifying a Reliable backup

2016-06-01 Thread Pavel Lyalyakin
Hello Michael,

On Wed, Jun 1, 2016 at 5:58 PM, Michael Schwager wrote:
> Hello,
> We are very paranoid about our Subversion repo, notwithstanding the fact
> that the previous sysadmin didn't back it up. But that's another story. Now
> I'm here at my job, I've inherited the repo admin duties, and I want to back
> it up reliably. If we lose it, we're all out of work.
>
> My question is: How do I back it up reliably, and verify it so that I can
> deliver a 100% recovery guarantee to my boss? I have Subversion 1.8.4 on a
> CentOS 6.3 server, and Tortoise SVN 1.8.11 on Windows 7 clients.

Don't forget about `svnadmin verify`[1]. It makes sense to run it
right now if you've never run it. ;)

And if you want to deliver a recovery guarantee close to 100%, consider
`svnadmin dump`[2] in addition to hotcopy.

> I am thinking to do both an svn hotcopy to one directory, and an rsync to
> another. The svn hotcopy will give me a backup that I'm pretty sure is
> reliable (see Notes below). Assuming httpd is down and I can guarantee that
> I am the only person who will be logged into the SVN server, can I expect
> with 99.9% surety that the svn repos are quiescent?

I'm not sure that I understand what you mean by "quiescent" here.
What's your concern?

So you are planning to make a hotcopy and then transfer this hotcopy
using rsync to another machine, right?

* It's safe to run `svnadmin hotcopy` on a live repository.

* And if you have to use `rsync` on a *live* repository, run it under
 `svnadmin freeze`[3] to prevent concurrent commits to the repo while
 `rsync` is running.

If httpd is disabled and you are the only user logged on, your
repositories should not be touched by any other process at that
moment, unless you have indexing services that scan the repos
(it is generally recommended to exclude repos from indexing).
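
To make the two options above concrete, here is a minimal Python sketch
that only builds the command lines. It assumes Subversion 1.8 or later
(for `svnadmin freeze`); the repository and backup paths are hypothetical
placeholders. The `--` before `rsync` is there so that rsync's own options
are not parsed by `svnadmin`.

```python
import subprocess


def hotcopy_cmd(repo, dest):
    """argv for a hotcopy, which is safe to run on a live repository."""
    return ["svnadmin", "hotcopy", repo, dest]


def frozen_rsync_cmd(repo, dest):
    """argv that runs rsync while `svnadmin freeze` blocks commits.

    The trailing slash on the source makes rsync copy the contents of
    the repository directory into dest.
    """
    source = repo.rstrip("/") + "/"
    return ["svnadmin", "freeze", repo, "--", "rsync", "-a", source, dest]


def run(cmd):
    """Run one command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


# Hypothetical paths, for illustration only.
HOTCOPY = hotcopy_cmd("/srv/svn/repos2", "/backup/hotcopy/repos2")
RSYNC = frozen_rsync_cmd("/srv/svn/repos2", "/backup/rsync/repos2/")
```

Calling `run(HOTCOPY)` or `run(RSYNC)` would execute the respective
command; nothing is executed just by building the lists.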

> Notes:
>
> We're a little worried about svn hotcopy; we ran into a bug that came about
> under 1.8 when working with older repos; the hotcopy exits with the
> following error:
>
> svnadmin: E22: Serialized hash missing terminator

It seems to be bug #4554[4], which has been fixed in Subversion 1.9.
`svnadmin hotcopy` could fail on a repository with an older FSFS format.

> I have compiled subversion-1.9.4 on the server under /opt/subversion-1.9.4.
> If I run that version of svn hotcopy, it appears to work and svnverify exits
> successfully. But if I look at all the files under both the original and the
> hotcopy on one of our repos, I find that a file is missing:
> repos2/db/rev-prop-atomics.shm . That's probably ok, but still- how do we
> know the latest hotcopy, and hotcopies of the future, are and will remain
> 100% bug-free?

Use `svnadmin verify` and consider using `svnadmin dump` in addition
to hotcopies. Make sure that your Subversion server is on the latest
patch update too.
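
As a rough sketch of that advice, the Python snippet below verifies a
repository and then writes a full dump to a file. The paths passed in are
up to you (none are taken from your setup), and `-q` is used only to
suppress the per-revision progress output.

```python
import subprocess


def verify_cmd(repo):
    """argv that checks the data stored in the repository."""
    return ["svnadmin", "verify", "-q", repo]


def dump_cmd(repo):
    """argv that writes a full dump stream of the repository to stdout."""
    return ["svnadmin", "dump", "-q", repo]


def verify_and_dump(repo, dump_path):
    """Verify the repository, then save a dump of it to dump_path.

    check=True raises CalledProcessError if a step fails, so the dump
    is only started for a repository that passed verification.
    """
    subprocess.run(verify_cmd(repo), check=True)
    with open(dump_path, "wb") as out:
        subprocess.run(dump_cmd(repo), stdout=out, check=True)
```

The same `verify_cmd` can be pointed at a hotcopy as well as at the
original repository.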

[1]: http://svnbook.red-bean.com/en/1.8/svn.ref.svnadmin.c.verify.html
[2]: http://svnbook.red-bean.com/en/1.8/svn.ref.svnadmin.c.dump.html
[3]: http://svnbook.red-bean.com/en/1.8/svn.ref.svnadmin.c.freeze.html
[4]: https://issues.apache.org/jira/browse/SVN-4554

--
With best regards,
Pavel Lyalyakin
VisualSVN Team


Re: Creating and Verifying a Reliable backup

2016-06-01 Thread Matt Garman
I think there are two questions here: (1) what are general good backup
practices, and (2) how to back up svn repos specifically.

"If we lose it, we're all out of work."  Hopefully your boss
recognizes this and has budgeted appropriately.  In my experience
there is no perfect backup; the best you can do is ever-decreasing
odds of a catastrophic failure.

Step one would be to run your svn server with some kind of redundant
disk configuration.  Of course we all know RAID is not backup, but
storage is relatively cheap, so why not?

I'd then back up to at least two different machines, preferably
offsite.  Cloud storage is fairly cheap these days as well.  So a good
scheme might be (at least) one offsite server that you control, and
(at least) a second copy with a cloud provider (CrashPlan, BackBlaze,
Amazon, DropBox, etc).

What we've always done is a simple rsync of the repo tree.  Your email
made me realize that we could be doing the backup right when someone
is committing, and thus ending up with a corrupt repo tree.  However,
we have some mitigating factors: we don't have just one repo, but
literally dozens.  And we do backups twice per week, and we keep
several months of backups.  So my collection of backups probably does
have some corrupt repo trees... but given the number of repos we have,
plus the fact that the backup jobs run in the middle of the
night/weekend, I think the probability is pretty low that I have any
significant corruption.

As you suggested, if you can make a fancier backup script that shuts
down anyone's ability to make changes to the repo while the backup is
taking place, that's even better.

For my personal svn repos (home hobby projects) I do simple backups
with `svnadmin dump`.

Lastly, you probably owe it to your company to regularly test your
backups to ensure that they are indeed viable.  Just like buildings
have fire drills, so should sysadmins have DR drills.
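
One way such a drill could look, sketched in Python under a few
assumptions: the backup is a dump file, the scratch repository path is a
hypothetical location that does not exist yet, and "viable" is taken to
mean that the dump loads into a fresh repository, passes `svnadmin verify`,
and reports the same youngest revision as the source.

```python
import subprocess


def same_revision(svnlook_output, expected):
    """Compare the text printed by `svnlook youngest` with a revision number."""
    return int(svnlook_output.strip()) == int(expected)


def restore_drill(dump_path, scratch_repo, expected_youngest):
    """Load a dump into a new scratch repository and sanity-check it."""
    subprocess.run(["svnadmin", "create", scratch_repo], check=True)
    with open(dump_path, "rb") as dump:
        subprocess.run(["svnadmin", "load", "-q", scratch_repo],
                       stdin=dump, check=True)
    subprocess.run(["svnadmin", "verify", "-q", scratch_repo], check=True)
    youngest = subprocess.run(
        ["svnlook", "youngest", scratch_repo],
        check=True, capture_output=True, text=True,
    ).stdout
    return same_revision(youngest, expected_youngest)
```

The expected revision would come from running `svnlook youngest` on the
source repository at the time the dump was taken.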

Hope these suggestions are useful!




On Wed, Jun 1, 2016 at 9:58 AM, Michael Schwager wrote:
> Hello,
> We are very paranoid about our Subversion repo, notwithstanding the fact
> that the previous sysadmin didn't back it up. But that's another story. Now
> I'm here at my job, I've inherited the repo admin duties, and I want to back
> it up reliably. If we lose it, we're all out of work.
>
> My question is: How do I back it up reliably, and verify it so that I can
> deliver a 100% recovery guarantee to my boss? I have Subversion 1.8.4 on a
> CentOS 6.3 server, and Tortoise SVN 1.8.11 on Windows 7 clients.
>
> I am thinking to do both an svn hotcopy to one directory, and an rsync to
> another. The svn hotcopy will give me a backup that I'm pretty sure is
> reliable (see Notes below). Assuming httpd is down and I can guarantee that
> I am the only person who will be logged into the SVN server, can I expect
> with 99.9% surety that the svn repos are quiescent?
>
> Thanks.
> --
> -Mike Schwager
>
> Notes:
>
> We're a little worried about svn hotcopy; we ran into a bug that came about
> under 1.8 when working with older repos; the hotcopy exits with the
> following error:
>
> svnadmin: E22: Serialized hash missing terminator
>
> I have compiled subversion-1.9.4 on the server under /opt/subversion-1.9.4.
> If I run that version of svn hotcopy, it appears to work and svnverify exits
> successfully. But if I look at all the files under both the original and the
> hotcopy on one of our repos, I find that a file is missing:
> repos2/db/rev-prop-atomics.shm . That's probably ok, but still- how do we
> know the latest hotcopy, and hotcopies of the future, are and will remain
> 100% bug-free?