Re: Breaking up a monolothic repository

2013-10-04 Thread Thomas Harold

On 10/2/2013 10:36 AM, Ullrich Jans wrote:


> I'm now facing the same problem. My users want the rebasing, but during
> the dump/load instead of after the fact (apparently, it causes issues
> with their environment when they need to go back to an earlier revision
> to reproduce something). They also want to keep the empty revisions (for
> references from the issue tracker).
>
> I haven't tried it with svnadmin dump followed by svndumpfilter (I don't
> think it has that capability).


The commands we ended up using back in May 2011 when we did this looked 
like the following.  It's been two years, but I'm pretty sure these two 
scripts are all we ended up using.


- We had a master dump of the entire brc-jobs repository.
- Target repository name was brc-jobs-zp (CLCODE)
- It takes the dump and splits it into a smaller chunk (CLPATH).
- Had to edit the script for each new client/path that we wanted to 
split out.


It does *not* attempt to rebase the individual projects up to the root 
directory.  It *is* possible by using 'sed' to do this in the resulting 
dump file, but it is tricky.



#!/bin/bash

SOURCE=/mnt/scratch/svn-dump-brc-jobs.may2011.dump.gz

DESTDIR=/var/svn/
DESTPFX=svn-raw-brc-jobs-
DESTSFX=10xx.dump.gz

CLCODE=zp
CLPATH=Z/ZP_SingleJobs

SDFOPTS='--drop-empty-revs  --renumber-revs'

date

echo ${DESTDIR}${DESTPFX}${CLCODE}${DESTSFX}

svnadmin dump --quiet /var/svn/brc-jobs | \
svndumpfilter include --quiet $SDFOPTS $CLPATH | \
gzip > ${DESTDIR}${DESTPFX}${CLCODE}${DESTSFX}

date


The mirror to this was the script that created the new SVN repository 
and loaded the individual dump into it.


Note the commented out 'sed' lines where we attempted to rebase 
individual project folders back up to the root of the repository.  They 
didn't work, so we ended up just doing a move operation in the 
TortoiseSVN repository browser.


- It changes the UUID of the newly created repository to be something 
unique instead of using the old repo's UUID.

- Had to be edited anew for each new client/path.


#!/bin/bash

SRCDIR=/var/svn/
SRCPFX=svn-raw-brc-jobs-
SRCSFX=10xx.dump.gz

DESTDIR=/var/svn/
DESTPFX=svn-newbase-brc-jobs-
DESTSFX=10xx.dump.gz

SDFOPTS='--quiet --drop-empty-revs  --renumber-revs'

CLPARENT=Z
CLCODE=zp

date

#gunzip -c ${SRCDIR}${SRCPFX}${CLCODE}${SRCSFX} | \
#sed "s/Node-path: $CLPATH\//Node-path: /" | \
#sed "s/Node-copyfrompath: $CLPATH\//Node-copyfrompath: /" | \
#gzip > ${DESTDIR}${DESTPFX}${CLCODE}${DESTSFX}

# Assumes /var/svn/brc-jobs-${CLCODE} already exists (e.g. via svnadmin create).
svn mkdir -m "Import from brc-jobs" \
  file:///var/svn/brc-jobs-${CLCODE}/${CLPARENT}


gunzip -c ${SRCDIR}${SRCPFX}${CLCODE}${SRCSFX} | \
  svnadmin load --quiet /var/svn/brc-jobs-${CLCODE}

svnlook uuid /var/svn/brc-jobs-${CLCODE}
svnadmin setuuid /var/svn/brc-jobs-${CLCODE}
svnlook uuid /var/svn/brc-jobs-${CLCODE}
svnadmin pack /var/svn/brc-jobs-${CLCODE}

chmod -R 775 /var/svn/brc-jobs-${CLCODE}
chmod -R g+s /var/svn/brc-jobs-${CLCODE}/db
chgrp -R svn-brc-jobs /var/svn/brc-jobs-${CLCODE}

date


I do wish I could have figured out the 'sed' commands to move a project 
from /Z/ZP_SingleJobs/JOBNR to be just /JOBNR in the repository, but 
there wasn't time.


For rebasing, that's probably your missing piece... which I don't have.
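
A minimal sketch of what that 'sed' step might look like, assuming the
dump arrives uncompressed on stdin and everything to keep lives under a
single prefix such as Z/ZP_SingleJobs.  The function name
rebase_dump_paths is hypothetical.  Two things the commented-out attempt
above would trip over: the prefix itself contains '/', so sed needs a
different delimiter, and the dump header for copy sources is spelled
Node-copyfrom-path.  This does not deal with the node that creates the
prefix directory itself, with svn:mergeinfo properties, or with file
contents that happen to contain matching lines, so treat it as a
starting point and check the result with a test load.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original scripts): strip a leading
# path prefix from the node headers of an svn dump stream read on stdin.
rebase_dump_paths() {
    local prefix="$1"   # e.g. Z/ZP_SingleJobs, no leading or trailing slash
    # '|' is the sed delimiter because the prefix contains '/'.
    # Assumes the prefix has no '|' and no regex metacharacters.
    # LC_ALL=C makes sed treat the stream as bytes, since dumps hold binary data.
    LC_ALL=C sed \
        -e "s|^Node-path: ${prefix}/|Node-path: |" \
        -e "s|^Node-copyfrom-path: ${prefix}/|Node-copyfrom-path: |"
}

# Possible use with the file naming from the scripts above:
#   gunzip -c svn-raw-brc-jobs-zp10xx.dump.gz |
#     rebase_dump_paths "Z/ZP_SingleJobs" |
#     gzip > svn-newbase-brc-jobs-zp10xx.dump.gz
```

Only header lines are rewritten, so the content lengths recorded in the
dump stay valid as long as no file body contains a line that looks like
one of these headers.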


Re: Breaking up a monolothic repository

2013-10-02 Thread Ullrich Jans

On 10.09.2013 19:45, Thomas Harold wrote:

> When we moved from a monolithic repository to per-client repositories a
> few years ago, we went ahead and:
>
> - Rebased the paths up one or two levels (old system was something like
> monolithicrepo/[a-z]/[client directories]/[job directory]) so that the
> urls were now clientrepo/[job directory].  That was a tricky thing to
> do and we had to 'sed' the output of the dump filter before importing it
> back.
>
> It broke a few things, such as svn:externals which were not
> relative-pathed, but was worth it in the long run so that our URLs got
> shorter.
>
> - Made sure that the new repos all had unique UUIDs.
>
> - Renumbered all of the resulting revisions as we loaded things back in.
> But we didn't have to deal with any bug tracking systems that referred
> to a specific revision.  And having lower revision numbers was
> preferred, along with dropping revisions that referred to other projects.


I'm now facing the same problem. My users want the rebasing, but during 
the dump/load instead of after the fact (apparently, it causes issues 
with their environment when they need to go back to an earlier revision 
to reproduce something). They also want to keep the empty revisions (for 
references from the issue tracker).


I haven't tried it with svnadmin dump followed by svndumpfilter (I don't 
think it has that capability).


I've tried svnrdump (from svn 1.7). It resulted in either a new 
repository with the full path included (rdump/load all revs) or an 
interesting failure mode with a missing node during a copy operation 
when rdump -r revision_after_path:HEAD was used.


I've also tried using svnsync, but that also results in the full path 
included, no rebasing.


How did you do it? Also, am I missing something that has been included 
in a current svn version?


Cheers,

Ulli

--
Ullrich Jans, Specialist, IT-A
Phone: +49 9131 7701-6627, mailto:ullrich.j...@elektrobit.com
Fax: +49 9131 7701-6333, www.elektrobit.com

Elektrobit Automotive GmbH, Am Wolfsmantel 46, 91058 Erlangen, Germany
Managing Directors: Alexander Kocher, Gregor Zink
Register Court Fürth HRB 4886






RE: Breaking up a monolothic repository

2013-10-02 Thread Bob Archer
> On 10.09.2013 19:45, Thomas Harold wrote:
>
> > When we moved from a monolithic repository to per-client repositories
> > a few years ago, we went ahead and:
> >
> > - Rebased the paths up one or two levels (old system was something
> > like monolithicrepo/[a-z]/[client directories]/[job directory]) so
> > that the urls were now clientrepo/[job directory].  That was a
> > tricky thing to do and we had to 'sed' the output of the dump filter
> > before importing it back.
> >
> > It broke a few things, such as svn:externals which were not
> > relative-pathed, but was worth it in the long run so that our URLs got
> > shorter.
> >
> > - Made sure that the new repos all had unique UUIDs.
> >
> > - Renumbered all of the resulting revisions as we loaded things back in.
> > But we didn't have to deal with any bug tracking systems that
> > referred to a specific revision.  And having lower revision numbers
> > was preferred, along with dropping revisions that referred to other
> > projects.
>
> I'm now facing the same problem. My users want the rebasing, but during the
> dump/load instead of after the fact (apparently, it causes issues with their
> environment when they need to go back to an earlier revision to reproduce
> something). They also want to keep the empty revisions (for references from
> the issue tracker).

Wouldn't it be much simpler to keep the current repository as a read-only 
archive and move the HEAD of each project into its own repo?




Re: Breaking up a monolothic repository

2013-09-12 Thread Les Mikesell
On Wed, Sep 11, 2013 at 10:49 PM, Nico Kadel-Garcia nka...@gmail.com wrote:
> Les, disk space isn't the issue for the empty revs. It's any operations that
> try to scan or assemble information from the revisions. 5000 empty objects
> is still a logistical burden, especially if assembling any kind of change
> history for the new repository.

I don't see how that imposes a bigger computational burden than the
same number of unrelated revisions did in the combined repo, which
typically is not a problem.  We are at rev 186767 on a large
multi-project repo which, although I wish it had been created as
separate repos for easier future maintenance, does not have serious
performance issues.

> And since the new repositories are
> effectively a rebase of a subset of the code, you don't normally *gain*
> anything from having empty revisions for code that is in the other new
> repositories. You can't meaningfully merge content between the new smaller
> repositories and the old repo, barring some seriously weird cases, so it's
> safer to treat them as completely distinct and not bother to preserve all
> the empty revisions.
>
> The revision numbers are stored in support tickets is the only reason I
> can think of to keep them.

Or pegged externals if they stay in the same relative location.  Or
any email, documentation or recorded discussion referring to the
changes in a revision.   My point is that any change that requires new
training or human intervention to fix something is never going to win
back that time.   Someone who completely understands the current
process and user base might be able to optimize and improve it with
drastic changes, but that seems unlikely if they are asking for advice
on a mail list.

-- 
   Les Mikesell
lesmikes...@gmail.com


Re: Breaking up a monolothic repository

2013-09-11 Thread Nico Kadel-Garcia
Les, disk space isn't the issue for the empty revs. It's any operations
that try to scan or assemble information from the revisions. 5000 empty
objects is still a logistical burden, especially if assembling any kind
of change history for the new repository. And since the new repositories
are effectively a rebase of a subset of the code, you don't normally *gain*
anything from having empty revisions for code that is in the other new
repositories. You can't meaningfully merge content between the new smaller
repositories and the old repo, barring some seriously weird cases, so it's
safer to treat them as completely distinct and not bother to preserve all
the empty revisions.

That the revision numbers are stored in support tickets is the only
reason I can think of to keep them.





Re: Breaking up a monolothic repository

2013-09-10 Thread Thorsten Schöning
Hello Trent W. Buck,
on Tuesday, 10 September 2013 at 02:49 you wrote:

> ...hm, still 1.6.  Is it worth me backporting a newer svn?

I would give it a try: get yourself a current build of 1.8, dump your
old repo, load it into a repo newly created with your 1.8 version, and
see how much space is saved. Your version information about the repo
looks current enough to already use representation sharing, but
depending on how the upgrades were made (svnadmin upgrade vs. a full
dump/load cycle), there may be old duplicate data in the repo created
before svnadmin upgrade. Besides that, 1.8 made improvements to reduce
disk space, too.

Kind regards,

Thorsten Schöning

-- 
Thorsten Schöning   E-Mail:thorsten.schoen...@am-soft.de
AM-SoFT IT-Systeme  http://www.AM-SoFT.de/

Phone...05151-  9468- 55
Fax...05151-  9468- 88
Mobile..0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Managing Director: Andreas Muchow



Re: Breaking up a monolothic repository

2013-09-10 Thread Thomas Harold

On 9/9/2013 8:49 PM, Trent W. Buck wrote:


> I'm partway through provisioning the replacement Debian 7 server, which
> will have
>
>     subversion 1.6.17dfsg-4+deb7u3
>     apache2    2.2.22-13
>
> ...hm, still 1.6.  Is it worth me backporting a newer svn?



Yes, it's worth installing 1.8.3.

http://www.wandisco.com/subversion/download#debian7




Re: Breaking up a monolothic repository

2013-09-10 Thread Nico Kadel-Garcia
> Have you checked if the users have/need anything (emails, ticket
> system, etc.) that refer to specific revisions or the history of
> changes made there?   It seems kind of drastic to throw that away
> because you think the numbers aren't pretty enough.


But keeping thousands of empty commits in a project they're not relevant to
is confusing and wasteful. The repository and repository URLs for the old
project should be preserved, if possible, locked down and read-only,
precisely for this kind of change history. But since the repository is
being completely refactored *anyway*, it's a great opportunity to discard
debris.

Even if the history is considered sacrosanct (and this is often a
theological policy, not an engineering one!), an opportunity to reduce the
size of each repository by discarding deadwood at switchover time should
be taken seriously.


Re: Breaking up a monolothic repository

2013-09-10 Thread Les Mikesell
On Tue, Sep 10, 2013 at 6:22 AM, Nico Kadel-Garcia nka...@gmail.com wrote:

> Even if the history is considered sacrosanct (and this is often a
> theological policy, not an engineering one!), an opportunity to reduce the
> size of each repository by discarding deadwood at switchover time should be
> taken seriously.

Those empty revs take what, a couple of dollars worth of disk space
(OK, x3 or 4 for backups...), vs. how much human time will it take to
make everyone involved understand that you use one procedure for
revisions before a certain date, and a different one after, and to get
diffs between them you have to either check out both copies and use
local tools or map the rev number from your old reference to the new
numbering scheme?   And then there are likely to be pegged externals
to pull in components that you'll have to fix even if they stay within
the same project repo and use relative notation.   I'd call not
unnecessarily changing the history you use a version control system to
preserve to be 'philosophically correct'  as opposed to a theological
requirement.  If your engineering choices were always right the first
time, you probably wouldn't have all these revisions in the first
place.

-- 
   Les Mikesell
  lesmikes...@gmail.com
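
A hedged sketch of the "map the rev number from your old reference to
the new numbering scheme" step mentioned above.  It assumes the load was
run without --quiet and that, when revisions are renumbered, svnadmin
load reports each commit on a line of the form
"------- Committed new rev N (loaded from original rev M) >>>"; check
that against the output of your own svn version.  The function name
rev_map_from_load_log is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read saved `svnadmin load` output on stdin and print
# one "OLD NEW" pair per renumbered revision.
rev_map_from_load_log() {
    # -n plus the trailing 'p' prints only the lines that matched.
    sed -n 's/^------- Committed new rev \([0-9][0-9]*\) (loaded from original rev \([0-9][0-9]*\)).*$/\2 \1/p'
}

# Possible use (paths are placeholders):
#   gunzip -c some.dump.gz | svnadmin load /var/svn/newrepo > load.log
#   rev_map_from_load_log < load.log > revmap.txt
```

The resulting two-column file can be kept next to the new repository so
that an old revision number from a ticket or email can still be looked
up after the split.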


Re: Breaking up a monolothic repository

2013-09-10 Thread Thomas Harold

On 9/10/2013 7:22 AM, Nico Kadel-Garcia wrote:

> But keeping thousands of empty commits in a project they're not relevant
> to is confusing and wasteful. The repository and repository URLs for
> the old project should be preserved, if possible, locked down and
> read-only, precisely for this kind of change history. But since the
> repository is being completely refactored *anyway*, it's a great
> opportunity to discard debris.


When we moved from a monolithic repository to per-client repositories a 
few years ago, we went ahead and:


- Rebased the paths up one or two levels (old system was something like 
monolithicrepo/[a-z]/[client directories]/[job directory]) so that the 
urls were now clientrepo/[job directory].  That was a tricky thing to 
do and we had to 'sed' the output of the dump filter before importing it 
back.


It broke a few things, such as svn:externals which were not 
relative-pathed, but was worth it in the long run so that our URLs got 
shorter.


- Made sure that the new repos all had unique UUIDs.

- Renumbered all of the resulting revisions as we loaded things back in. 
 But we didn't have to deal with any bug tracking systems that referred 
to a specific revision.  And having lower revision numbers was 
preferred, along with dropping revisions that referred to other projects.



> Even if the history is considered sacrosanct (and this is often a
> theological policy, not an engineering one!), an opportunity to reduce
> the size of each repository by discarding deadwood at switchover time
> should be taken seriously.


Less of an issue now that svn 1.8 has revprop packing (plus the rev 
packing from 1.6).  That deadwood takes up a lot less space in terms of 
the number of files in the file system.


And the fact that svnadmin hotcopy is now incremental in 1.8 also makes 
it less of an issue.  Having a few thousand (tens of thousands) 
revisions in a repository is no longer a big bottleneck during the 
hotcopy process like it was before.


Our backup system is also a lot happier with fewer files to backup.



RE: Breaking up a monolothic repository

2013-09-10 Thread Bob Archer
On Monday, September 09, 2013 11:38 PM, Trent W. Buck wrote:

> Les Mikesell lesmikes...@gmail.com writes:
>
> > On Mon, Sep 9, 2013 at 7:23 PM, Trent W. Buck trentb...@gmail.com wrote:
> > > Ryan Schmidt subversion-20...@ryandesign.com writes:
> > >
> > > > As someone used to Subversion's usually sequential revision numbers,
> > > > that bugs me aesthetically, but it works fine.
> > >
> > > I think that's the crux of it.
> >
> > Have you checked if the users have/need anything (emails, ticket
> > system, etc.) that refer to specific revisions or the history of
> > changes made there?   It seems kind of drastic to throw that away
> > because you think the numbers aren't pretty enough.
>
> That is an extremely valid point.  I'll check.
>
> > > Also part of the reason to split up the repos is to make access
> > > control easier, and it looks bad if Alice (who should have access to
> > > project 1 but not project 2) can see Bob's old commit metadata to
> > > project 2, even if she can't see the commit bodies after the split.
> >
> > How does this work now in the combined repository?
>
> Right now, they don't have it with the combined repo.  Anyone in the svn group
> can read everything.  (This is one of the reasons they want to break up the
> single repo into per-project repos.)

You should knock that reason off the list. You can set up path-based 
authorization fairly easily (especially compared to breaking it up into 
multiple repos).

Bob
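
A minimal sketch of what path-based authorization could look like for
the Alice/Bob case discussed above.  The group names, the
/projects/... paths and the helper name write_sample_authz are
assumptions for illustration; the real layout of the monolithic repo is
not given in this thread.  The file would be referenced from Apache via
mod_authz_svn's AuthzSVNAccessFile directive, or from svnserve.conf via
authz-db.

```shell
#!/bin/bash
# Hypothetical helper: write a sample authz file to the path given as $1.
write_sample_authz() {
    cat > "$1" <<'EOF'
[groups]
team1 = alice
team2 = bob

# Nobody can read anything unless a rule below grants it.
[/]
* =

[/projects/project1]
@team1 = rw

[/projects/project2]
@team2 = rw
EOF
}

# Possible use:
#   write_sample_authz /etc/subversion/authz
```

With rules like these Alice only gets read/write access below
/projects/project1 and Bob only below /projects/project2.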



Re: Breaking up a monolothic repository

2013-09-10 Thread Les Mikesell
On Tue, Sep 10, 2013 at 4:36 PM, Bob Archer bob.arc...@amsi.com wrote:

> > > > Also part of the reason to split up the repos is to make access
> > > > control easier, and it looks bad if Alice (who should have access to
> > > > project 1 but not project 2) can see Bob's old commit metadata to
> > > > project 2, even if she can't see the commit bodies after the split.
> > >
> > > How does this work now in the combined repository?
> >
> > Right now, they don't have it with the combined repo.  Anyone in the svn
> > group can read everything.  (This is one of the reasons they want to
> > break up the single repo into per-project repos.)
>
> You should knock that reason off the list. You can set up path-based
> authorization fairly easily (especially compared to breaking it up into
> multiple repos).


Unless you already have a central authentication source, you'll have a
certain tradeoff in complexity between maintaining password control
for multiple repos vs. path-based control in a single one, and if there
are external references where different groups use each others'
libraries, it can be a little messy either way.

-- 
   Les Mikesell
lesmikes...@gmail.com


Re: Breaking up a monolothic repository

2013-09-09 Thread Thorsten Schöning
Hello Trent W. Buck,
on Monday, 9 September 2013 at 03:13 you wrote:

> What else can I do?

Tell us about the size of your repo, its format version and the primary
data types versioned, as you can always simply clone the entire repo
into one for each project needed and delete and move unneeded contents
per new project repo with a Subversion client. The current format of
the repo and its primary data types are interesting because if it's
pretty old, current repo versions may provide significantly reduced
disk space per repo, making the overhead of duplicating the original
one acceptable.

Kind regards,

Thorsten Schöning

-- 
Thorsten Schöning   E-Mail:thorsten.schoen...@am-soft.de
AM-SoFT IT-Systeme  http://www.AM-SoFT.de/

Phone...05151-  9468- 55
Fax...05151-  9468- 88
Mobile..0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Managing Director: Andreas Muchow



Re: Breaking up a monolothic repository

2013-09-09 Thread Les Mikesell
On Sun, Sep 8, 2013 at 8:13 PM, Trent W. Buck trentb...@gmail.com wrote:

> I'm stuck.  Since it's no fun to have tens of thousands of empty revs
> in each project repo, my current approach is to leave existing
> projects in the monolithic repo, and new projects get separate repos.


Why do you think an empty rev will bother anyone any more in a
per-project rev than having the rev number jump from a commit to an
unrelated project does in the combined repo?  It shouldn't be a
problem in either case.  Rev numbers for any particular use don't need
to be sequential, you just need to know what they are.

-- 
   Les Mikesell
 lesmikes...@gmail.com


RE: Breaking up a monolothic repository

2013-09-09 Thread Grierson, David
I can see Trent's viewpoint that people are weird and get freaked out by the 
unexpected (where they might expect the revision numbers to be relatively low).

I guess what we should be providing him are points like the ones you make, to 
help him sell why this isn't an issue to the end users.

Like Les says, if someone performs a large batch of commits to a particular 
branch then the trunk revision numbers are going to leap forward 
(unexpectedly). So the thing to sell to those folks concerned about it is that 
they're experiencing this already.
--
David Grierson - SDLC Tools Specialist 
Sky Broadcasting - Customer Business Systems - SDLC Tools
Tel: +44 1506 325100 / Email: david.grier...@bskyb.com / Chatter: CBS SDLC Tools
Watermark Building, Alba Campus, Livingston, EH54 7HH
 







Re: Breaking up a monolothic repository

2013-09-09 Thread Les Mikesell
On Mon, Sep 9, 2013 at 8:03 AM, Grierson, David
david.grier...@bskyb.com wrote:
> I can see Trent's view point that people are weird and get freaked out by the
> unexpected (where they might expect the revision numbers to be relatively
> low).


I could see that for someone who had never used subversion before and
did not understand the concept of global revision numbers, but not for
anyone who has used a multi-project repository.

> I guess what we should be providing him are points like you do make to help
> him sell why this isn't an issue to the end users.
>
> Like Les says, if someone performs a large batch of commits to a particular
> branch then the trunk revision numbers are going to leap forward
> (unexpectedly). So what to sell those folks concerned about it is that
> they're experiencing this already.

Revision numbers aren't something you guess at or expect anything
from.  They are only useful in terms of the repository history, and it
doesn't matter if your project runs sequentially or not.   If you want
names/numbers that make human sense, you'll be copying to tags for
easier reference anyway.

-- 
Les Mikesell
  lesmikes...@gmail.com


Re: Breaking up a monolothic repository

2013-09-09 Thread Ryan Schmidt

On Sep 9, 2013, at 07:31, Les Mikesell wrote:

> On Sun, Sep 8, 2013 at 8:13 PM, Trent W. Buck wrote:
>
> > I'm stuck.  Since it's no fun to have tens of thousands of empty revs
> > in each project repo, my current approach is to leave existing
> > projects in the monolithic repo, and new projects get separate repos.
>
> Why do you think an empty rev will bother anyone any more in a
> per-project rev than having the rev number jump from a commit to an
> unrelated project does in the combined repo?  It shouldn't be a
> problem in either case.  Rev numbers for any particular use don't need
> to be sequential, you just need to know what they are.

This is true. Heck, if you use a dvcs like git or hg you'll get a completely 
random revision number (shaped like a sha1 hash) every time. As someone used to 
Subversion's usually sequential revision numbers, that bugs me aesthetically, 
but it works fine.

There are also some reasons why keeping the revision number from the old 
monolithic repository in your new repositories (with empty padding revisions in 
between) is a really good idea. Have you ever referenced revision numbers in 
your issue tracker (fixed in r111; r222 broke xyz) or in emails (can you 
explain what you did in r333; r444 is a great example of abc) or in commit 
messages (reverted r555; added file forgotten in r666)? If so, you don't 
want to renumber revs, because that would invalidate all those references.
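
One rough way to check whether such references exist before deciding,
sketched under the assumption that they are written in the rNNN style
used in the examples above.  The function name find_rev_refs is
hypothetical; it reads any text on stdin (commit messages, an issue
tracker export, a mail archive) and lists the distinct revision numbers
it finds.

```shell
#!/bin/bash
# Hypothetical helper: list distinct revision numbers written as rNNN.
find_rev_refs() {
    # First grep: an 'r' followed by digits, at line start or after a
    # character that is not a letter, digit or underscore.
    # Second grep: keep only the digits of each hit.
    grep -oE '(^|[^[:alnum:]_])r[0-9]+' | grep -oE '[0-9]+' | sort -n | uniq
}

# Possible use against the commit messages of the monolithic repo:
#   svn log file:///home/svn/PI | find_rev_refs
```

Note that svn log prints its own rNNN header for every revision, so for
log output the header lines would need to be filtered out first; for
tracker exports or mail archives the list can be used as is.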



Re: Breaking up a monolothic repository

2013-09-09 Thread Trent W. Buck
Ryan Schmidt subversion-20...@ryandesign.com writes:

> As someone used to Subversion's usually sequential revision numbers,
> that bugs me aesthetically, but it works fine.

I think that's the crux of it.  Also part of the reason to split up the
repos is to make access control easier, and it looks bad if Alice (who
should have access to project 1 but not project 2) can see Bob's old
commit metadata to project 2, even if she can't see the commit bodies
after the split.



Re: Breaking up a monolothic repository

2013-09-09 Thread Trent W. Buck
Thorsten Schöning tschoen...@am-soft.de writes:

> Tell us about the size of your repo
> its format version and primary data types versioned

(Sorry for not giving this info earlier, and shifting the goal posts --
I personally went rcs-arch-darcs-git and never really used svn, so
I'm feeling pretty noob attacking this problem.)

du reports it is 18GiB.  The current revno is 16115.

$ grep . /home/svn/PI/{format,db/fs-type,db/format}
/home/svn/PI/format:5
/home/svn/PI/db/fs-type:fsfs
/home/svn/PI/db/format:4
/home/svn/PI/db/format:layout sharded 1000

As to what kind of files are in there -- I'm not actually sure.
Just doing a dumb look at HEAD's list of files,

$ svn ls -R file:///home/svn/PI | wc -l
269281

And looking at the most common extensions:

$ svn ls -R file:///home/svn/PI | sed -n 's/.*\.//p' |
  sort | uniq -c | sort -nr | head -20
  36581 h           2438 txt
  21732 patch       2375 sh
  17621 html        2362 i
  15023 c           2121 bmp
   8143 py          1957 mk
   3919 cpp         1932 po
   3559 png         1916 class
   3074 gif         1813 lua
   2950 xml         1742 cs
   2585 properties  1613 hpp

Obviously that's not weighted by size, and completely ignores anything
that's not in HEAD anymore.

   *   *   *

It's currently hosted on an Ubuntu 10.04 server, so my server svn is
quite old:

subversion 1.6.6dfsg-2ubuntu1.3
apache2    2.2.14-5ubuntu8.12

I believe some of the users have svn 1.7 on their desktops, but not all.

I'm partway through provisioning the replacement Debian 7 server, which
will have

subversion 1.6.17dfsg-4+deb7u3
apache2    2.2.22-13

...hm, still 1.6.  Is it worth me backporting a newer svn?



Re: Breaking up a monolothic repository

2013-09-09 Thread Trent W. Buck
trentb...@gmail.com (Trent W. Buck) writes:

> So then I thought to chain the two approaches. This didn't work -- the
> empty revs were not removed. I guess svndumpfilter --drop-empty-revs
> is only smart enough to drop the revs that have just *become* empty?
>
>     rm -rf delete-me-3
>     svnadmin create delete-me-3
>     svnadmin dump delete-me-2 |
>     svndumpfilter --drop-empty-revs exclude /canthappen |
>     svnadmin load delete-me-3

A helpful offlist correspondent noted svn 1.8 has --drop-all-empty-revs,
so I might try building that long enough to try that option.



Re: Breaking up a monolothic repository

2013-09-09 Thread Les Mikesell
On Mon, Sep 9, 2013 at 7:23 PM, Trent W. Buck trentb...@gmail.com wrote:
> Ryan Schmidt subversion-20...@ryandesign.com writes:
>
> > As someone used to Subversion's usually sequential revision numbers,
> > that bugs me aesthetically, but it works fine.
>
> I think that's the crux of it.

Have you checked if the users have/need anything (emails, ticket
system, etc.) that refer to specific revisions or the history of
changes made there?   It seems kind of drastic to throw that away
because you think the numbers aren't pretty enough.

> Also part of the reason to split up the
> repos is to make access control easier, and it looks bad if Alice (who
> should have access to project 1 but not project 2) can see Bob's old
> commit metadata to project 2, even if she can't see the commit bodies
> after the split.

How does this work now in the combined repository?

-- 
   Les Mikesell
  lesmikes...@gmail.com


Re: Breaking up a monolothic repository

2013-09-09 Thread Trent W. Buck
Les Mikesell lesmikes...@gmail.com writes:

 On Mon, Sep 9, 2013 at 7:23 PM, Trent W. Buck trentb...@gmail.com wrote:
 Ryan Schmidt subversion-20...@ryandesign.com writes:

 As someone used to Subversion's usually sequential revision numbers,
 that bugs me aesthetically, but it works fine.

 I think that's the crux of it.

 Have you checked if the users have/need anything (emails, ticket
 system, etc.) that refer to specific revisions or the history of
 changes made there?   It seems kind of drastic to throw that away
 because you think the numbers aren't pretty enough.

That is an extremely valid point.  I'll check.

 Also part of the reason to split up the
 repos is to make access control easier, and it looks bad if Alice (who
 should have access to project 1 but not project 2) can see Bob's old
 commit metadata to project 2, even if she can't see the commit bodies
 after the split.

 How does this work now in the combined repository?

Right now, they don't have it with the combined repo.  Anyone in the svn
group can read everything.  (This is one of the reasons they want to
break up the single repo into per-project repos.)



Breaking up a monolothic repository

2013-09-08 Thread Trent W. Buck
I have inherited a single monolithic repo for all the company's
projects.  I want to migrate to one repo per project. (One-way,
one-time migration.)

Following the red-bean book[0], I first tried svnadmin dump piped
through svndumpfilter, which was really slow and eventually crashed
because some files had been copied into projects/133_Redacted from a
different subdir.

rm -rf delete-me
svnadmin create delete-me
svnadmin dump /srv/svn/Frobozz |
svndumpfilter --drop-empty-revs include projects/133_Redacted |
svnadmin load delete-me

[...]
svndumpfilter: Invalid copy source path '/EE/ProjectDocs/133_Redacted/REDACTED.pdf'
svnadmin: Can't write to stream: Broken pipe
<<< Started new transaction, based on original revision 4182
svnadmin: File not found: transaction '0-0', path 'projects/133_Redacted'
     * adding path : projects/133_Redacted ...
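
One commonly suggested workaround for this error is to add the copy source to the include list, at the cost of dragging those extra paths into the new repo. A rough sketch, assuming the error text has the form shown above (newer releases put an error code such as "E" plus digits before the message). The helper names are hypothetical, and every retry re-dumps the whole repository, so it will be slow.

```shell
#!/bin/bash
# Sketch only: pull the offending path out of svndumpfilter's error text
# (read from stdin). Prints nothing if the error is something else.
copy_source_from_error() {
    sed -n "s/^svndumpfilter: \(E[0-9]*: \)\{0,1\}Invalid copy source path '\(.*\)'\$/\2/p"
}

# Hypothetical helper: filter repo $1 into dump file $2, starting from the
# include prefixes given as remaining arguments and widening the list each
# time svndumpfilter complains about a copy source outside it.
filter_with_copy_sources() {
    local repo=$1 out=$2; shift 2
    local -a paths=("$@")
    local err src last=
    err=$(mktemp) || return 1
    while true; do
        if svnadmin dump --quiet "$repo" |
            svndumpfilter include --quiet --drop-empty-revs "${paths[@]}" \
                >"$out" 2>"$err"; then
            rm -f "$err"
            printf '%s\n' "${paths[@]}"   # report the final include list
            return 0
        fi
        src=$(copy_source_from_error <"$err" | head -n 1)
        if [ -z "$src" ] || [ "$src" = "$last" ]; then
            cat "$err" >&2                # some other failure: give up
            rm -f "$err"
            return 1
        fi
        paths+=("$src")
        last=$src
    done
}
```

For example: `filter_with_copy_sources /srv/svn/Frobozz 133.dump projects/133_Redacted`, followed by `svnadmin load delete-me < 133.dump`.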

Freenode's #svn IRC channel advised me to use svnsync instead.  That
was really slow and eventually succeeded, but left a tonne of empty
commit messages:

rm -rf delete-me-2
svnadmin create delete-me-2
ln -s /bin/true delete-me-2/hooks/pre-revprop-change
svnsync init file://$PWD/delete-me-2 \
    file:///srv/svn/Frobozz/projects/133_Redacted
svnsync sync file://$PWD/delete-me-2
rm delete-me-2/hooks/pre-revprop-change

So then I thought to chain the two approaches. This didn't work -- the
empty revs were not removed. I guess svndumpfilter --drop-empty-revs
is only smart enough to drop the revs that have just *become* empty?

rm -rf delete-me-3
svnadmin create delete-me-3
svnadmin dump delete-me-2 |
svndumpfilter --drop-empty-revs exclude /canthappen |
svnadmin load delete-me-3

I also thought of converting to git fast-export format and back again,
but AFAICT there is no way to import a fast-export into a svn repo.

I'm stuck.  Since it's no fun to have tens of thousands of empty revs
in each project repo, my current approach is to leave existing
projects in the monolithic repo, and new projects get separate repos.

What else can I do?

[0] http://svnbook.red-bean.com/en/1.7/svn.reposadmin.maint.html



Re: Breaking up a monolothic repository

2013-09-08 Thread Nico Kadel-Garcia
Lock the existing repo: Do clean exports, and imports, to new repositories
with the new layout, with a README.md or other guideline to where the
legacy repository exists. You lose the infinitely preserved history this
way, but for most working software projects, you don't *need* that. And
it's a good opportunity to discard materials, such as bulky binaries or
security sensitive files with plain text passwords.
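
A rough sketch of that export-and-import route for a single project, assuming both repository paths are absolute local paths and that overwriting any existing README.md in the export is acceptable. The function names and the README wording are invented for illustration. Locking the old repository is not covered here.

```shell
#!/bin/bash
# Sketch only: text for a README pointing back at the legacy repository.
legacy_readme() {
    local legacy_url=$1 rev=$2
    printf 'History before this import lives in the legacy repository:\n'
    printf '  %s (as of r%s)\n' "$legacy_url" "$rev"
}

# Hypothetical helper: export one project without history and import it
# into a brand-new repository.
# Arguments: legacy repo path, project subdirectory, new repo path.
export_import() {
    local legacy=$1 subdir=$2 newrepo=$3
    local url="file://$legacy/$subdir" rev tmp
    rev=$(svnlook youngest "$legacy") || return 1
    tmp=$(mktemp -d) || return 1
    svn export --quiet -r "$rev" "$url" "$tmp/export" &&
    legacy_readme "$url" "$rev" >"$tmp/export/README.md" &&
    svnadmin create "$newrepo" &&
    svn import --quiet -m "Import of $subdir from legacy repo at r$rev" \
        "$tmp/export" "file://$newrepo"
    local status=$?
    rm -rf "$tmp"
    return $status
}
```

For example: `export_import /srv/svn/Frobozz projects/133_Redacted /srv/svn/133_Redacted`.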





Re: Breaking up a monolothic repository

2013-09-08 Thread Trent W. Buck
Nico Kadel-Garcia nka...@gmail.com writes:

 Lock the existing repo: Do clean exports, and imports, to new repositories
 with the new layout, with a README.md or other guideline to where the
 legacy repository exists. You lose the infinitely preserved history this
 way, but for most working software projects, you don't *need* that. And
 it's a good opportunity to discard materials, such as bulky binaries or
 security sensitive files with plain text passwords.

Ah, sorry, I forgot to mention that preserving history was a hard
requirement handed down from higher up.

I get the impression that $company's projects mostly have a finite
lifespan (a couple of years), so I think that approach ends up being
very similar to my current plan of creating new projects as new repos,
and letting the monolithic repo die out via attrition.

I don't actually know exactly what they put in their repos; I think it's
about half "huge unpacked source tarball I downloaded from somewhere
then tinkered with" and half "huge CAD files and .docx contracts".



RE: Breaking up a monolothic repository

2013-09-08 Thread Geoff Field
 From: Trent W. Buck
 Sent: Monday, 9 September 2013 12:17 PM
 Nico Kadel-Garcia nka...@gmail.com writes:
 
  Lock the existing repo: Do clean exports, and imports, to new
  repositories with the new layout, with a README.md or other guideline
  to where the legacy repository exists. You lose the infinitely
  preserved history this way, but for most working software projects,
  you don't *need* that. And it's a good opportunity to discard
  materials, such as bulky binaries or security sensitive files with
  plain text passwords.
 
 Ah, sorry, I forgot to mention that preserving history was a 
 hard requirement handed down from higher up.

You *could* argue that the existing repository preserves the history.
However, I think I know what they mean.

 I get the impression that $company's projects mostly have a 
 finite lifespan (a couple of years),

By lifespan, what exactly do you mean?  At my company, the individual 
projects might be in production within anywhere from 6 months to 2 years after 
start of development, be manufactured for two to four years, then go into 
support mode for up to 7 years (or more).

 so I think that approach 
 ends up being very similar to my current plan of creating new 
 projects as new repos, and letting the monolithic repo die 
 out via attrition.

That sounds like an easy way to do things.

 I don't actually know exactly what they put in their repos; I 
 think it's about half huge unpacked source tarball I 
 downloaded from somewhere then tinkered with and half huge 
 CAD files and .docx contracts.

It's entirely possible that the empty commit messages you reported were due to 
users not actually entering anything in the messages.  Many of the commit 
messages I've seen (particularly from non-software people, but even from a few 
of those) are less informative than I'd like - a lot are totally empty.

Regards,

Geoff




Re: Breaking up a monolothic repository

2013-09-08 Thread Trent W. Buck
Geoff Field geoff_fi...@aapl.com.au writes:

 I get the impression that $company's projects mostly have a finite
 lifespan (a couple of years),

 By lifespan, what exactly do you mean?  At my company, the
 individual projects might be in production within anywhere from 6
 months to 2 years after start of development, be manufactured for two
 to four years, then go into support mode for up to 7 years (or more).

That's probably a more accurate way of putting it.
But the bottom line is migration through attrition ought to work.

 It's entirely possible that the empty commit messages you reported
 were due to users not actually entering anything in the messages.
 Many of the commit messages I've seen (particularly from non-software
 people, but even from a few of those) are less informative than I'd
 like - a lot are totally empty.

Ah, sorry, I wasn't clear.  Supposing the repo has two subdirs:

projects/1_Muffins
projects/2_Cakes

Then when I use svnsync to make a repo that only contains
projects/2_Cakes, I still have a bunch of commits that WERE making
changes to projects/1_Muffins -- so they have commit messages and
authors and times and suchlike metadata -- but they don't actually *do*
anything anymore, because the files they edited aren't in
projects/2_Cakes.

If there were only two projects, it wouldn't be too bad, but suppose 100
projects, with 1000 commits each.  If I use svnsync, I end up with 100
repos, each of which has 99,000 useless commits.
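
The arithmetic behind that figure, as a small sketch. It assumes every commit touches exactly one project, which real history will not strictly obey. The function name is made up.

```shell
#!/bin/bash
# Sketch only: revisions left empty in each per-project repo after svnsync,
# assuming every commit touches exactly one project.
empty_revs_per_repo() {
    local projects=$1 commits_each=$2
    echo $(( (projects - 1) * commits_each ))
}

empty_revs_per_repo 100 1000    # the example above: prints 99000
```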