On 07-Oct-11 6:59 AM, Julian Foad wrote:
On Fri, 2011-10-07 at 11:29 +0100, Julian Foad wrote:
Stefan Sperling wrote:
julianfoad wrote:
+/* This property marks a branch root. Branches with the same value of this
+ * property are mergeable. */
+#define SVN_PROP_BRANCHING_ROOT "svn:ignore" /* ### should be "svn:branching-root" */

Hi Stefan. Thanks for picking up on this.

I think your addition of a 'branch root' property is quite a significant
step. Is this really necessary in order to improve the output of
'svn mergeinfo' or do you have additional steps planned that go beyond
tuning output?

Both.  I think knowing whether the (requested) merge source and target
are branch roots (and indeed branches of the *same* "project" or tree)
is important for improving the output and diagnostics of "svn mergeinfo"
and "svn merge" commands.

It could of course enable other new behaviours relating to branches, and
I don't know what those are yet (apart from trivial UI things like
answering "is this a branch?").

So I'm working on the idea that it would be useful to have branch roots
identifiable by some mechanism, so I'll add "some mechanism" (currently
this property, but I'm totally open to a different mechanism such as
branch points being defined in a config file) and see what useful
behaviours I can come up with.

There has been some discussion about adding a property for this
and similar purposes in the past, see
http://svn.haxx.se/dev/archive-2009-09/0156.shtml
(there are probably more threads about this topic)

Yes, and it's time to figure out what we can usefully do with such
information and then we'll know exactly what branch configuration
information we need and what's a good way to store it.

I'll reply to the rest in a further email.

- Julian
Welp, I'm never going to get a better lead-in than that, so, hi, folks!
Freelance SCM consultant here; used to specialise in ClearQuest, of all
things, but my last two gigs ended up revolving around Subversion.

Specifically, Subversion merges, in the enterprise, and the, uh, quirks
involved.  Each client had different requirements, and thus, the
solution I ended up delivering to each one differed a bit.  The first
solution was neat, and did all kinds of funky ClearQuest integration
and merge validation, but the second one is more applicable to this
discussion, so I'll describe that first.

In essence, it's a hook framework that attempts to enforce Subversion
best-practices by blocking* incoming commits if it detects one or more
of the following:

        (*) Sometimes it'll block, but phrase the error message
            along the lines of "if you *really* want to do this,
            re-try your commit with the phrase 'CONFIRM MULTI-ROOT
            RENAME' somewhere in your commit message".

    TagCopied
    TagRenamed
    TagRemoved
    TagModified
    TagReplaced
    TagSubtreeCopied
    TagSubtreeRenamed
    TagSubtreeRemoved
    TagSubtreeModified
    TagSubtreeReplaced
    MultipleUnknownAndKnownRootsModified
    MixedRootNamesInMultiRootCommit
    MixedRootTypesInMultiRootCommit
    SubversionRepositoryCheckedIn
    MergeinfoAddedToRepositoryRoot
    MergeinfoModifiedOnRepositoryRoot
    SubtreeMergeinfoAdded
    RootMergeinfoRemoved
    DirectoryReplacedDuringMerge
    EmptyMergeinfoCreated
    TagDirectoryCreatedManually
    BranchDirectoryCreatedManually
    BranchRenamedToTrunk
    TrunkRenamedToBranch
    TrunkRenamedToTag
    BranchRenamedToTag
    BranchRenamedOutsideRootBaseDir
    TagSubtreePathRemoved
    RenameAffectsMultipleRoots
    UncleanRenameAffectsMultipleRoots
    MultipleRootsCopied
    UncleanCopy
    FileRemovedFromTag
    CopyKnownRootSubtreeToValidAbsRootPath
    MixedRootsNotClarifiedByExternals
    CopyKnownRootToIncorrectlyNamedRootPath
    CopyKnownRootSubtreeToIncorrectlyNamedRootPath
    RenamedKnownRootToIncorrectlyNamedRootPath
    MixedChangeTypesInMultiRootCommit
    CopyKnownRootToKnownRootSubtree
    UnknownPathCopiedToIncorrectlyNamedNewRootPath
    RenamedKnownRootToKnownRootSubtree
    FileUnchangedAndNoParentCopyOrRename
    DirUnchangedAndNoParentCopyOrRename
    EmptyChangeSet
    CopyKnownRootToUnknownPath
    CopyKnownRootSubtreeToInvalidRootPath
    NewRootCreatedByRenamingUnknownPath
    UnknownPathCopiedToKnownRootSubtree
    NewRootCreatedByCopyingUnknownPath
    PathCopiedFromOutsideRootDuringNonMerge
    UnknownDirReplacedViaCopyDuringNonMerge
    DirReplacedViaCopyDuringNonMerge
    DirectoryReplacedDuringNonMerge
    PreviousPathNotMatchedToPathsInMergeinfo
    PreviousRevDiffersFromParentCopiedFromRev
    PreviousPathDiffersFromParentCopiedFromPath
    PreviousRevDiffersFromParentRenamedFromRev
    PreviousPathDiffersFromParentRenamedFromPath
    KnownRootPathReplacedViaCopy
    BranchesDirShouldBeCreatedManuallyNotCopied
    TagsDirShouldBeCreatedManuallyNotCopied
    CopiedFromPathNotMatchedToPathsInMergeinfo
    InvariantViolatedModifyContainsMismatchedPreviousPath
    InvariantViolatedModifyContainsMismatchedPreviousRev
    InvariantViolatedCopyNewPathInRootsButNotReplace
    MultipleRootsAffectedByRemove
    AbsoluteRootOfRepositoryCopied
    PropertyChangedButOldAndNewValuesAreSame
    CopiedOrRenamedUnknownPathToIncorrectlyNamedNewRootPath
    UnknownPathRenamedViaReplaceToExistingKnownRoot
    UnknownPathCopiedViaReplaceToExistingKnownRoot
    UnknownPathRenamedToKnownRootSubtree
    UnknownPathCopiedToKnownRootSubtree
    KnownRootSubtreeRenamedViaReplaceToExistingKnownRoot
    UncleanRenameOfRootAncestorPath
    RenamedKnownRootViaReplaceToExistingKnownRoot
    RootPathAncestorRenamedViaReplaceToExistingKnownRoot
    RenamedKnownRootViaReplaceToRootAncestorPath
    RootPathAncestorRenamedToValidAbsoluteRootPath
    RootPathAncestorRenamedToValidRootPathSubtree
    RootPathAncestorRenamedToKnownRootSubtree
    RootPathAncestorRenamedViaReplaceToRootAncestorPath
    RenamedKnownRootToUnknownPath
    RenamedKnownRootSubtreeToUnknownPath
    RenamedKnownRootSubtreeToValidRootPath
    RenamedKnownRootSubtreeToIncorrectlyNamedRootPath
    UncleanRename
    RenameRelocatedPathOutsideKnownRoot

        (There's probably room for another e-mail thread just
         discussing all of these conditions; let's just say,
         Subversion repositories in the enterprise rarely look
         like their usually-well-laid out open source repository
         brethren.  What was the Blade Runner line?  "I've seen
         things you people wouldn't believe."? ;-)  My personal
         favorite: 'SubversionRepositoryCheckedIn'.)
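
To make the "block, unless you confirm" footnote concrete, here is a
minimal sketch of how such a check could look.  It is not the
framework's actual code: CONFIRMABLE, check_commit() and the pairing of
the phrase with RenameAffectsMultipleRoots are made up for illustration;
only the phrase itself comes from the footnote above.

```python
# Hypothetical table: condition name -> phrase that lets the commit through.
# Only the phrase is taken from the text; the pairing is an assumption.
CONFIRMABLE = {
    "RenameAffectsMultipleRoots": "CONFIRM MULTI-ROOT RENAME",
}


def check_commit(detected, log_message):
    """Return a list of error strings; an empty list means the commit passes.

    detected: names of the conditions found in the incoming commit.
    """
    errors = []
    for condition in detected:
        phrase = CONFIRMABLE.get(condition)
        if phrase is None:
            # Hard block: no way to override this condition.
            errors.append(condition + ": commit blocked")
        elif phrase not in log_message:
            # Soft block: tell the user how to confirm.
            errors.append(
                condition + ": if you *really* want to do this, re-try "
                "your commit with the phrase '" + phrase + "' somewhere "
                "in your commit message"
            )
    return errors
```

A pre-commit hook would reject the commit whenever the returned list is
non-empty and print the messages to stderr.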

So, as you can see, most of these conditions involve the concept of a
root.  Thus, the ability to accurately discern what constitutes a root
took up a large portion of my time.

Hard-coding regexes and forcing all repositories to conform to a
pre-defined repository layout worked like a charm for my first client, as
I was coming in before they had any Subversion repositories rolled out
into production.  (Well, sort of.)

That unfortunately wasn't feasible for my second client.  They were a
*huge* Subversion shop.  At the time I came in they had something like
960 production repositories, and I wouldn't be surprised if they were
well over 1,000 by now.  There was no standard layout across repos,
and a lot of repos used non-standard branches/tags/trunk paths, so
trying to manage 'root detection' via regexes was a non-starter.

For example, a number of repos had layouts like this:

    /foo/trunk
    /foo/branches/1.0.x
    /foo/branches/bugzilla/1081

i.e. 'bugzilla' was just some random directory they created to hold
developer branches related to bugs.  A regex approach would have
matched 'bugzilla' as the branch root, whereas, in fact, the branch
root would have been 1081.
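
A small sketch of that failure mode, assuming a layout regex of the
usual "<project>/branches/<name>" shape.  BRANCH_RE and
guess_branch_root() are hypothetical; they are not the regexes that
were actually deployed.

```python
import re

# Hypothetical layout regex: the directory right under 'branches' is the root.
BRANCH_RE = re.compile(r"^(?P<root>/[^/]+/branches/[^/]+)(?:/|$)")


def guess_branch_root(path):
    """Return the branch root the regex infers for path, or None."""
    match = BRANCH_RE.match(path)
    return match.group("root") if match else None


# Correct for a conventional branch.
conventional = guess_branch_root("/foo/branches/1.0.x/src/main.c")

# Wrong for a nested developer branch: the real root is .../bugzilla/1081.
nested = guess_branch_root("/foo/branches/bugzilla/1081/src/main.c")
```

Here `conventional` is '/foo/branches/1.0.x', but `nested` comes back as
'/foo/branches/bugzilla', i.e. the container directory rather than the
branch.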

The other non-starter was requiring the admin staff to go in and
manually specify what constituted a branch, e.g. by setting a 'branch
root' property on relevant paths.  The overhead of doing that for
~1,000 repositories, with hundreds, if not thousands, of differently
named branches/roots (so not particularly easy to automate reliably),
was not acceptable, for many enterprisey reasons mainly surrounding
cost.

So, I needed to design the branch detection logic in such a way that
it didn't require any hand-holding from the admins or support staff.

It took two attempts.

For the first attempt, I played around with the notion of a root *base*
directory, i.e. /branches and /tags.  The first thing the framework
would do when processing a pre-commit was create a 'RepositoryRoots'
class (the framework was written in Python FWIW), which would recurse
through the repo up to N-levels deep in order to determine the valid
root base directories.  Except for trunk, which was special, if a
directory had subdirectories that were created by copying another path
(i.e. how tags or branches are created), then the directory would be
considered a root base dir.

That lasted... about a day or two.  It was a leaky abstraction at best,
and broke when I encountered repos with the more non-standard layouts.
(I'm not even sure if I've described it accurately above; but eh, who
cares, it's gone now.)

The problem with the regex and base-root-dir discovery approaches was
that they were essentially heuristic based.  "This directory features
lots of subdirectories that were copies of other paths, therefore, it's
a good chance it's a valid root base directory."

In most cases, yes, that was a valid assumption, but not always.  The
root detection logic was the most critical piece of my solution -- I
wasn't getting paid to correctly detect roots 70% of the time in 60% of
the repos.  It needed to be 100% in 100%.

So, I thought to myself, how can I correctly and autonomously identify
a root with 100% accuracy?  What one property did valid roots share
that I could interrogate?  Heck, what even constitutes a root? A branch
is a root, so is a tag, so is trunk.

....and then it dawned on me.  It seems so simple now, in retrospect:

    In the beginning, there was one root: trunk.  Then it was copied
    elsewhere, and became a branch, or maybe a tag.  These copies are
    also roots, and copies of them should also be considered roots.

Ah, so simple!  I just need to start at revision 0 and work my way up to
HEAD, whilst keeping a record of roots I encounter along the way.  And
that's pretty much it ;-)
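
In rough Python, the idea looks something like the sketch below.  It is
a toy model, not the real framework: the history format, replay() and
the rule that a directory named 'trunk' created via mkdir seeds the
first root are all assumptions made for illustration.

```python
def replay(history):
    """Walk revisions oldest-first and track which paths are roots.

    history: list of (rev, changes), where each change is
             (action, path, copyfrom) with action in
             {'add', 'copy', 'delete'} and copyfrom either None or a
             (source_path, source_rev) tuple.
    Returns {rev: {root_path: created_rev}}, one snapshot per revision.
    """
    roots = {}
    snapshots = {}
    for rev, changes in history:
        for action, path, copyfrom in changes:
            if action == "add" and path.rsplit("/", 1)[-1] == "trunk":
                # Assumption: trunk is recognised by name when it is created.
                roots[path] = rev
            elif action == "copy":
                src_path, src_rev = copyfrom
                # A copy is a root only if its source was a root at src_rev.
                if src_path in snapshots.get(src_rev, {}):
                    roots[path] = rev
            elif action == "delete":
                roots.pop(path, None)
        snapshots[rev] = dict(roots)
    return snapshots


history = [
    (1, [("add", "/foo/trunk", None)]),
    (2, [("copy", "/foo/branches/1.0.x", ("/foo/trunk", 1))]),
    (3, [("add", "/foo/branches/bugzilla", None)]),
    (4, [("copy", "/foo/branches/bugzilla/1081", ("/foo/trunk", 3))]),
]
snapshots = replay(history)
```

After the replay, snapshots[4] holds /foo/trunk, /foo/branches/1.0.x and
/foo/branches/bugzilla/1081, while the plain 'bugzilla' container
directory never shows up as a root.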

Turns out, that approach has worked surprisingly well.  It's been in
production at the second client's site for nearly a year now.  They just
run the 'repo analysis' part of the code against new repositories before
enabling the hooks, and voilà, they get instant root detection and
prevention of some 80-something erroneous conditions.

Here are some techie details about the implementation.  The script
stores root information in a revision property called 'evn:roots' (set
against the root of the repository).  The value of evn:roots at any
given revision will list all of the known roots in the repo at that
revision:

% svn pg --revprop -r26503 evn:roots svn://client.com/repos/foo
{'/build/branches/3.0.1/': {'created': 22323},
 '/build/branches/3.0.2/': {'created': 23129},
 '/build/branches/3.1.0/': {'created': 25804},
 '/build/branches/cvs/0.0.1/': {'created': 26389},
 '/build/branches/bugzilla/4144/': {'created': 22121},
 '/build/branches/bugzilla/6952/': {'created': 17661},
 '/build/release//3.0.0/': {'created': 20774},
 '/build/release/paris/3.0.0/': {'created': 20307},
 '/build/release/rome/3.0.1/': {'created': 22473},
 '/build/trunk/': {'created': 2919},
 '/src/trunk/': {'created': 9353},
 ...

The 'created' revision refers to the revision that the root was created
in.  That's important, 'cause we store special metadata against the root
in the revprop for the revision it was created in:

% svn pg --revprop -r9353 evn:roots svn://client.com/repos/foo
 ...
 '/src/trunk/': {
    'copies': {
        9834:  [('/src/branches/2.1/', 9835)],
        9997:  [('/src/branches/bugzilla/2800/', 9998)],
        10211: [('/src/branches/bugzilla/3326/', 10212)],
        10252: [('/src/branches/bugzilla/2160/', 10253)],
        10468: [('/src/branches/2.2/', 10469)],
        11148: [('/src/branches/2.3/', 11149)],
        11420: [('/src/branches/bugzilla/3720/', 11421)]},
    'created': 9353,
    'creation_method': 'created'},
 ...

i.e. we store all the subsequent forward-copies of this root, as well as
details of how it was created (which isn't very interesting in this
example, as it's trunk and was created via mkdir, but if it were a
branch or tag, it would contain details about where it was copied from).
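
The values above look like Python dict literals.  Assuming that really
is how they are serialised (an assumption on my part here), they can be
read back safely with ast.literal_eval.  The sample below is abbreviated
from the r9353 output, and forward_copies() is a hypothetical helper.
I'm also assuming each 'copies' key is the source revision and each
tuple is (new root, revision it was created in).

```python
import ast

# Abbreviated sample in the shape shown above.
raw = """
{'/src/trunk/': {
    'copies': {
        9834:  [('/src/branches/2.1/', 9835)],
        10468: [('/src/branches/2.2/', 10469)]},
    'created': 9353,
    'creation_method': 'created'}}
"""


def forward_copies(raw_value, root):
    """Return the (new_root, created_rev) pairs copied from root,
    ordered by the revision they were copied from."""
    roots = ast.literal_eval(raw_value.strip())
    copies = roots[root].get("copies", {})
    return [dest for src_rev in sorted(copies) for dest in copies[src_rev]]


branches = forward_copies(raw, "/src/trunk/")
```

With the sample value, `branches` lists /src/branches/2.1/ (r9835) and
/src/branches/2.2/ (r10469), in that order.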

Let's say I delete /src/trunk in r26504.  The entry for it in evn:roots
in that revision will be gone; but a note will be made against the r9353
creation revprop to indicate which rev it was deleted in.

The importance of storing data like this becomes apparent when you deal
with situations like this:

 *hooks are turned off*
    r2:     svn cp ^/trunk ^/branches/foo
    r3:     svn rm ^/branches/foo
    r4:     svn mkdir ^/branches/foo
 *repo is analysed, evn:roots are set, hooks are turned on*

An attempt to do the following would be blocked, because r4/HEAD of
/branches/foo was not created correctly (i.e. wasn't copied from an
existing root), and thus, isn't considered a root either:

    svn cp ^/branches/foo ^/branches/bar

However, the following *would* work, because /branches/foo *was* a valid
root in r2:

    svn cp ^/branches/foo@2 ^/branches/bar
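
Expressed as a lookup against per-revision root snapshots, the check is
roughly the sketch below.  ROOTS_AT and copy_allowed() are hypothetical
names, and the r1 entry (trunk created via mkdir) is assumed, since the
example above only starts at r2.

```python
# Known roots per revision, as the analysis of r1..r4 would leave them.
ROOTS_AT = {
    1: {"/trunk"},                    # assumed: r1 created trunk
    2: {"/trunk", "/branches/foo"},   # r2: svn cp ^/trunk ^/branches/foo
    3: {"/trunk"},                    # r3: svn rm ^/branches/foo
    4: {"/trunk"},                    # r4: mkdir'd foo is not a root
}


def copy_allowed(src_path, src_rev):
    """A copy may create a new root only if its source was a root
    at the revision being copied from."""
    return src_path in ROOTS_AT.get(src_rev, set())


blocked = copy_allowed("/branches/foo", 4)   # copying HEAD
allowed = copy_allowed("/branches/foo", 2)   # copying the r2 incarnation
```

`blocked` is False and `allowed` is True, which matches the two svn cp
attempts above.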


Thoughts?

    Trent.
