Re: Mergeinfo is not per node

2014-07-26 Thread Thomas Åkesson
On 22 jul 2014, at 13:40, Julian Foad wrote:

Hi Julian,

I happened to read this post despite not having much focus on merge 
functionality. We use Subversion for XML-authoring and we don't support 
branching/merging of trees, just files. This line of thought touches on one of 
our long-standing struggles: move tracking. The XML contains a large number of 
hrefs to other XML files and graphics. Moving or renaming individual files is 
problematic (regardless of whether the hrefs are relative or absolute).

> What I have been calling "mergeinfo" here is only part of the information we 
> need for merging. We also need a way to map nodes in the source branch to 
> nodes in the target branch, in order to apply most of the individual per-node 
> changes in the source branch to the "right places" on the target branch 
> before falling back to conflicts and user input where this automatic attempt 
> fails. I am starting to see this mapping as an almost completely separate 
> problem with its own metadata rather than something that the mergeinfo should 
> give us for free.

I am particularly interested in "a way to map nodes in the source branch to 
nodes in the target branch". We have been thinking about an alternative to the 
path for identifying a file/node in a repository. It would be interesting if a 
node received an ID when it first appears in the repository, and that ID stayed 
stable across move operations. Copy is debatable, but my current thinking is 
that copies get new IDs and might in addition maintain a list of their ancestry.
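
To make the idea concrete, here is a minimal sketch in Python of how such IDs 
might behave. It is purely illustrative and not anything Subversion implements: 
the names Repo, Node, node_id and ancestry are made up for this example, and 
the copy behaviour follows my tentative thinking above.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int                 # assigned once, when the node first appears
    path: str                    # current path; changes on move
    ancestry: list = field(default_factory=list)  # copy-source IDs, nearest first


class Repo:
    """Hypothetical toy repository that tracks nodes by ID as well as by path."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._by_path = {}

    def add(self, path):
        node = Node(next(self._ids), path)
        self._by_path[path] = node
        return node

    def move(self, src, dst):
        # A move keeps the same node (and so the same ID) under a new path.
        node = self._by_path.pop(src)
        node.path = dst
        self._by_path[dst] = node
        return node

    def copy(self, src, dst):
        # Assumption: a copy gets a fresh ID but remembers where it came from.
        source = self._by_path[src]
        node = Node(next(self._ids), dst, [source.node_id] + source.ancestry)
        self._by_path[dst] = node
        return node


repo = Repo()
original = repo.add("doc/chapter1.xml")
moved = repo.move("doc/chapter1.xml", "doc/intro.xml")
copied = repo.copy("doc/intro.xml", "doc/intro-v2.xml")
```

With something like this, an href could refer to the node ID rather than the 
path, so a rename would not break it.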

These are just some thoughts that we briefly considered but have not explored 
further. 

Regards,
Thomas Å.

Mergeinfo is not per node

2014-07-22 Thread Julian Foad
For those interested in merging etc., a note on a recent line of thought.

For some time now I've had this idea going round my head that mergeinfo 
theoretically belongs to each node separately, and that we "elide" subtree 
mergeinfo only for convenience, compactness, and to make it less obtrusive and 
more easily understandable to the user.

It seemed a nice idea, but it's wrong. Mergeinfo is not inherently "per node".


WHY?

The content of two branches is usually *different* -- that's the point of 
branches.

In the per-file model of branching used by CVS, for example, each file is 
branched, and the content of each branch of that file can differ. This means 
for each file in the source tree there is one obviously corresponding file in 
the target tree.

In Subversion the intention is to version trees rather than just separate 
files, and so two branches can differ in tree structure as well as in file 
content. The changes to one file on branch B1 can correspond to changes in two 
files on branch B2, or in no particular file on branch B2, and so on. A merge 
cannot assume there is a 1-to-1 mapping of nodes.

Imagine the change on branch B1 at revision 100 consists of renaming a 
function, and updating all calls to it. The change affects files foo.c and 
foo.h and bar.c. When we merge this change to the target branch B2, we have to 
adjust the result, manually and/or automatically, to fit the target branch. 
Perhaps foo and bar have been combined into a single file foobar.c on branch 
B2, and so the change affects only foobar.c. This does not mean foobar.c alone 
has received that change, as that would imply all other nodes are still 
eligible to receive that change. Rather, the information we need to track is 
that the target branch as a whole has received the change as a whole.

- The merge source changes may be a selection of changes from just one subtree 
(or more generally a subset of the nodes) in the source branch;

- but the target is not inherently "the corresponding subtree", it's the whole 
tree;

- and other target nodes/subtrees are *not* still eligible to receive this 
change.
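
A small sketch in Python may help show the difference. It is a toy model, not 
how Subversion stores or computes mergeinfo: the function names and the extra 
file baz.c are invented for illustration, and mergeinfo is reduced to a set of 
revision numbers.

```python
def eligible_per_node(merged_by_node, node, source_revs):
    """The (wrong) per-node model: each node has its own record of merges."""
    merged = merged_by_node.get(node, set())
    return sorted(set(source_revs) - merged)


def eligible_per_branch(merged_on_branch, source_revs):
    """The branch-wide model: the branch as a whole records what it received."""
    return sorted(set(source_revs) - merged_on_branch)


source_revs = [100, 101]

# r100 was merged and, after adjustment, touched only foobar.c on B2.
merged_by_node = {"foobar.c": {100}}
merged_on_branch = {100}

# Per-node recording wrongly reports r100 as still eligible for baz.c.
wrong = eligible_per_node(merged_by_node, "baz.c", source_revs)

# Branch-wide recording reports r100 as done for the whole of B2.
right = eligible_per_branch(merged_on_branch, source_revs)
```

Here `wrong` still contains r100 while `right` does not, which is the 
distinction the three points above describe.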


NESTED BRANCHING

With nested branching, on the other hand, mergeinfo *does* belong to a subtree 
of the outer branch. The intent is to track that a change was merged into a 
subtree B2/D1, but there may be another subtree B2/D2 where the same change is 
still eligible to be merged.

- The merge source is a selected subtree;

- the target is a "corresponding" subtree;

- other target subtrees are still eligible to receive this change.
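
As a rough sketch of the nested case, again in Python and again a toy model 
rather than Subversion's actual behaviour: assume each nested branch root 
carries its own set of merged revisions, and a path is governed by the deepest 
root that contains it. The helper names and the no-inheritance rule are 
assumptions made for this example.

```python
def owning_root(path, roots):
    """Return the deepest registered branch root that contains path, or None."""
    candidates = [r for r in roots if path == r or path.startswith(r + "/")]
    return max(candidates, key=len) if candidates else None


def eligible(path, source_revs, mergeinfo):
    """Revisions not yet merged into the (nested) branch that owns path."""
    root = owning_root(path, mergeinfo)
    merged = mergeinfo[root] if root is not None else set()
    return sorted(set(source_revs) - merged)


# B2/D1 and B2/D2 are treated as nested branches inside B2.
# r100 has been merged into B2/D1 only.
mergeinfo = {"B2": set(), "B2/D1": {100}, "B2/D2": set()}

in_d1 = eligible("B2/D1/foo.c", [100], mergeinfo)
in_d2 = eligible("B2/D2/foo.c", [100], mergeinfo)
```

In this model r100 is no longer eligible anywhere under B2/D1 but remains 
eligible under B2/D2.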


THEREFORE

Mergeinfo belongs to the target branch as a whole, in the (common) case of a 
selective merge of a part of the changes in the branch.

Mergeinfo belongs to the target subtree (as a whole) when the intent is nested 
branching.


SO WHAT?


In designing a revised repository model, we should not think of mergeinfo as an 
attribute that appears in the model on every node and needs to be 
elided/normalized for storage efficiency.

On the client side, we should in future keep mergeinfo only on the branch root 
in most cases, more so than we do today. We need to *distinguish* the two 
cases: whether the user intends to merge only a subset of the changes in the 
whole branch, or to make a nested branch. To do so, we may consider heuristics 
(for example, assume a subset merge is intended if there is no mergeinfo on the 
specified target but there is on a parent) as well as explicit UI.
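
The example heuristic could be sketched like this in Python. This is only an 
illustration of the rule as stated, not a proposal for the client code: the 
function names and the "subset"/"ask" return values are made up, and every case 
the heuristic does not cover is assumed to fall back to explicit UI.

```python
def parent_paths(path):
    """Return the ancestors of path, nearest first: 'a/b/c' -> ['a/b', 'a']."""
    parts = path.split("/")
    return ["/".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


def guess_intent(target, paths_with_mergeinfo):
    """Guess whether a merge into target is a subset merge of the branch.

    Returns "subset" when the target has no mergeinfo of its own but some
    parent does; otherwise returns "ask", meaning the heuristic has no
    opinion and the user should be asked (an assumption of this sketch).
    """
    if target not in paths_with_mergeinfo and any(
        parent in paths_with_mergeinfo for parent in parent_paths(target)
    ):
        return "subset"
    return "ask"


# Only the branch root B2 carries mergeinfo: treat a merge into B2/D1 as a
# subset merge of the whole branch.
case_subset = guess_intent("B2/D1", {"B2"})

# B2/D1 already has its own mergeinfo: the heuristic does not decide.
case_own = guess_intent("B2/D1", {"B2", "B2/D1"})
```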

What I have been calling "mergeinfo" here is only part of the information we 
need for merging. We also need a way to map nodes in the source branch to nodes 
in the target branch, in order to apply most of the individual per-node changes 
in the source branch to the "right places" on the target branch before falling 
back to conflicts and user input where this automatic attempt fails. I am 
starting to see this mapping as an almost completely separate problem with its 
own metadata rather than something that the mergeinfo should give us for free.

- Julian