I have been following monotone for a long time and it has matured quite a bit.
Now it seems one of the major remaining issues is related to non-content conflicts, which led me to think about which model might solve these issues in a natural way. The following is a proposal for such a solution. It is not necessarily practical for monotone's implementation, but I think it somewhat unifies monotone's rename-tracking model and git's non-rename-tracking model by viewing every change as a new file. I'm sorry if these are old ideas, as I have not followed monotone development that closely.

Another major issue is die-die-die merge, where you more or less blindly risk losing changes made in a child branch after the parent branch has deleted files. I think my proposal can also deal with this to some extent, through resurrection handling.

I looked a bit at how git does not track files, versus monotone and BitKeeper, which do track file renames. Apparently there are some major issues with "suturing", non-content conflicts (NCCs) and resurrecting dead files in the models that use unique file ids, including monotone. Git, on the other hand, does not attempt to maintain a history of identity, which makes it easy to construct trivial test cases where it fails, although it seems to work in practice. But then, as I understand it, there are issues with old merges that require repeated resolution in some cases. The standard answer to NCCs seems to be "drop one file", both as a workaround and as a possible future solution (and apparently BitKeeper does something similar).

So, not knowing the details of monotone's internals, I can't say how easy or difficult a solution is, but here is a proposal: each file has a unique id used to track identity, and this id causes problems. At the same time, source control really is a functional tree, in the sense that things never change. So why not conceptually require every change to produce a new file id, representing a new value? (We can discuss whether parent directories should then also get a new id, but I think that is a separate question.)

If we do this, we see some interesting behaviour. Two files that are different but should be the same are trivially handled by creating a new file id and pointing it at the two old versions. It does not matter whether those were past revisions of the same file or past revisions of independent files. It does matter, however, that we have identified that the merge is valid.

If we look at git, I think this is approximately what it does; having no file ids at all and always issuing new ids are somewhat related. But git does not track the origins, it only infers them from content and path hints. By having a concept of a unique id, we can explicitly state all the sources that contributed to the new file version. This may not be precise, since we could have copied or moved a function (Linus' argument), but we can add as much data as we find relevant. Instead of (or in addition to) a rename operation, we can have a "depends on" operation that covers both renames and files where significant source portions have been moved. File splits are handled by creating two files that depend on the same file. Each new version gets a new id.

Assuming new ids are all fine, what about changes made in other branches? When a change is merged into a target revision, we can find the merge candidate not by comparing file ids but by finding common history: we slide the change forward along the path of file id changes until we reach one or more files in the current merge target revision.
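Not knowing the internals, here is a minimal sketch of what that "slide forward" lookup could look like. All names are hypothetical; I'm assuming the dependency graph can be queried as a plain map from each file id to the ids derived from it (new versions, "depends on" edges, splits, sutures, resurrections), and that the target revision is available as a set of live file ids:

from collections import deque

def find_merge_candidates(file_id, successors, target_live_ids):
    """Slide a change forward along file id edges until we hit the target.

    file_id         -- id of the file version the change was made against
    successors      -- dict: file id -> set of file ids derived from it
                       (new versions, "depends on" edges, splits,
                       sutures, resurrections)
    target_live_ids -- set of file ids present in the merge target revision

    Returns the files in the target revision that the change should be
    merged into: one id in the common case, several when the history
    goes through a split, and the empty set if the file is dead there.
    """
    candidates = set()
    seen = {file_id}
    queue = deque([file_id])
    while queue:
        current = queue.popleft()
        if current in target_live_ids:
            # Reached a file that exists in the target revision; stop
            # sliding along this path and record it as a candidate.
            candidates.add(current)
            continue
        for nxt in successors.get(current, ()):
            # A deleted file simply has no successors unless it was
            # later resurrected, in which case we keep sliding.
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return candidates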
If we reach more than one file, we have a merge onto a file split, which is tricky; otherwise we have either a simple change history or two joined files, which is then trivial to merge. For split files, we can view the two (or more) files as a single concatenated file and apply a standard diff to find the relevant merge location, or issue a conflict if this fails.

With new ids for every change, the standard change to a single file becomes a special case. We can optimise this common case by not changing the unique id (although one could also argue that a new id would tie it to a unique source tree version). This special case is then exactly the normal operation in current monotone. We can then argue that the problems with split and join happen precisely because we do not allow new ids as a natural course of events.

If we look at deleted files, we can resurrect a deleted file by creating a new file with a new id and saying that the new file depends on the old file. When a change to a deleted file is merged into a target revision where the file has been resurrected, we use the same algorithm as above: we slide the change forward, discover that the file is deleted (which normally kills the change, possibly with a warning), but then discover that we can continue sliding over to the resurrected file. A deleted file and joined files then become the same case: the two files to be joined are deleted, and the merged result is the "resurrected" new content.

In conclusion, we always treat changes as new files (and possibly directories), even in the simple single-file change case. We then use the dependency graph to resolve merges and to produce diffs.

Regards,
Mikkel
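P.S. To make the "every change is a new file id plus predecessor edges" view a bit more concrete, here is a minimal sketch of how the different operations could all be recorded in the same uniform way. The names and record layout are hypothetical, purely illustrative, and not meant to reflect monotone's actual data structures:

from dataclasses import dataclass, field
from typing import FrozenSet
from uuid import uuid4

def new_id() -> str:
    # Stand-in for however fresh file ids would actually be minted.
    return uuid4().hex

@dataclass(frozen=True)
class FileEntry:
    file_id: str                                   # fresh id for this new value
    depends_on: FrozenSet[str] = field(default_factory=frozenset)

def edit(old: FileEntry) -> FileEntry:
    # A plain content change: new value, one predecessor.
    return FileEntry(new_id(), frozenset({old.file_id}))

def rename(old: FileEntry) -> FileEntry:
    # A rename is just a "depends on" edge plus a new path (path omitted here).
    return FileEntry(new_id(), frozenset({old.file_id}))

def join(a: FileEntry, b: FileEntry) -> FileEntry:
    # Suture / resolving a non-content conflict: one new file, two predecessors.
    return FileEntry(new_id(), frozenset({a.file_id, b.file_id}))

def split(old: FileEntry):
    # A split: two new files that depend on the same old file.
    return (FileEntry(new_id(), frozenset({old.file_id})),
            FileEntry(new_id(), frozenset({old.file_id})))

def resurrect(dead: FileEntry) -> FileEntry:
    # Resurrection: a new file that depends on the deleted one.
    return FileEntry(new_id(), frozenset({dead.file_id}))

Diffs and merges would then only ever consult the depends_on edges, never a long-lived per-file identity.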