TLDR: Ev2 move support doesn't work with the "move" and "rotate" instructions as currently defined, and I propose that it could work instead with separate "move-away" and "move-here" instructions.
== SUMMARY == Ev2 was designed with an intention to support “move” semantics, but the current design of its move() method does not satisfy the requirements. Here I explain why, and present a solution. The following principles constrain or guide Ev2 design: * An edit must be able to express any change that can be constructed in a WC and committed. * An edit must be able to express any change that can be constructed in a repo across multiple revisions. * Ev2 uses sequential description: a succession of incremental edits applied to an initial state gives a final state. * Ev2 should not require the driver to use temporary paths to express a change. Given the constraints, * Not all moves can be expressed by a “move source to destination” operation, with or without a “rotate” operation. * Moves can be expressed, in general, using separate move-away and move-here operations. While other designs are also possible, a move-away and move-here design might be the best fit for the shape of the editor. Such a design is detailed below, but not yet finished. == WHY IS THIS NECESSARY? == === Express Any Change === Since the purpose of the editor is to communicate a change originating in a WC when it is committed, and a change originating in a repository when the WC is updated, then it must be able to express any such change. This includes updates across multiple revisions, and from a mixed-revision state to a single revision. Through a series of simple steps of the form “move X to Y”, some quite funky overall changes can be created. For example, starting from this state: | +-- A | +-- B the following sequence: move /A/B /X move /A /X/B move /X /A results in swapping A with B. If those steps are performed in a WC and the result committed all at once, the editor needs to be able to handle it. If those steps are committed separately, and then a working copy is updated, the editor needs to be able to handle it. More examples are given later, some of which involve other operations such as “make directory” as well as moves. All of this emergent complexity results from the introduction of a simple “move” primitive, and there does not seem to be any acceptable way to further constrain the basic move that would simplify the derived complexity.Temporary Paths Paths in the Ev2 editor operations refer to the current state of the tree being edited, at that moment in the sequence of edit operations. (The sole exception is the copy-from path, which is relative to a given committed revision.) A natural and compact way of expressing moves would be as a mapping from original-state paths to final-state paths. However, that paradigm does not fit well with the desire to express the edit as a sequence of incremental steps. If we are going to include move functionality as steps in a sequence of edit operations, it makes sense to use paths that are relative to the current state. Ev2 should not require the driver to express a change as a sequence of operations that can include moving a node to a temporary path and then later moving it again to the final path. The end result of the edit is the important part, and there are unlimited potential sequences that lead to that result, and it does not make sense to require an edit driver to construct such a sequence arbitrarily, if there is an alternative method that specifies the result uniquely. The receiver may in fact need to employ temporary paths in its implementation, but then it knows this and it is in a better position to construct such paths when needed, and it will know that they are temporary which may be important. There are advantages to placing a node in its final position exactly once. It was claimed that Google's BigTable implementation of Svn's back-end would have benefited from knowing that once a node has been edited then it is in its final state. Ev2 aims to provide that guarantee, where Ev1 could not. === Sequential Description === Ev2 (svn_editor_t) intends to express a new state in terms of an old state and a description of the parts of the new state that differ from the old state, or, in other words, a description of the changes against the old state. It uses a sequential, incremental description: for example, “add directory X, then copy file X/Y from somewhere, then edit the contents and properties of file X/Y”. Only certain parts of the description are incremental. Ev2 aims to allow nodes to be visited in arbitrary order, subject to a small number of restrictions. The parts where ordering matters are: * tree changes before content changes * alter/add/copy a directory (if required) before touching its (immediate) children === A Move Event is Not Adequate === Moves, in general, cannot be expressed as “move from path X to path Y” events in such a sequential description without introducing temporary paths. This is because some cases require the source path of the move to be moved away, then some other steps, and then the destination path of the move can be populated. Some classic examples are: Example 1: Insert a directory level | | +--A mv--\ (add) +--A \ | \--> +--B The add cannot happen before the move-away, but must happen before the move-here. (The mirror of Example 1, which would be removing a directory level, is not actually problematic: | | +--A (del) /--> +--A | / +--B mv--/ The move-away must (in principle) happen before the delete, while the move-here cannot (in principle) happen before the delete. However, Ev2 defines that a deletion which is to be replaced shall occur implicitly when the replacement is put in place, and so the move-here can happen simultaneously with the delete.) Example 2: Swap two siblings | | +--A mv--\ /--> +--A | X | +--B mv--/ \--> +--B Neither of the moves can be completed before doing the move-away part of the other one. Example 3: Swap two directory levels | | +--A mv--\ /--> +--A | X | +--B mv--/ \--> +--B Neither of the moves can be completed before doing the move-away part of the other one. === A Rotate Event is Not Adequate === At one time there was an idea that the addition of a “rotate PATH1 PATH2 … PATHn” event would complete the semantics and allow arbitrary moves to be supported. While this does enable Example 2 (swap two siblings) and Example 3 (swap two directory levels) and many other cases, it does not help with inserting a directory level (Example 1), and it has been shown [1] to be incapable of resolving other more involved cases involving swapping or rotation. One specific example is swapping A with A/B/C [2]: | | +-- A mv--\ /--> +-- A | \ / | +-- B mv--- X ---> +-- B | / \ | +-- C mv--/ \--> +-- C [1] Email thread on dev@, “Ev2 as a move-aware editor”, started on 2013-06-24, e.g. <http://svn.haxx.se/dev/archive-2013-06/0560.shtml> or < http://mail-archives.apache.org/mod_mbox/subversion-dev/201306.mbox/%3C1372087321.76009.YahooMailNeo%40web186101.mail.ir2.yahoo.com%3E >. [2] An example problem in thread [1], of swapping A with A/B/C: < http://svn.haxx.se/dev/archive-2013-06/0684.shtml> or < http://mail-archives.apache.org/mod_mbox/subversion-dev/201306.mbox/%3C87bo6rewwp.fsf%40ntlworld.com%3E >. == MOVE-AWAY AND MOVE-HERE == One solution is to describe the two halves of each move separately: move-away SOURCE-PATH ... move-here DESTINATION-PATH We can then solve Example 1 in the following way: issue the “move-away A”, then create a new directory at path A which replaces that source of the move, and then finally issue the “move-here A/B” which relies on that replacement directory A having been created. The consumer must be able to put the node aside for an indeterminate amount of time until the “move-here” is received. Of course there needs to be a way to link each move-away with the corresponding move-here. Remembering that each edit step refers to the current state in a sequence of states, we cannot simply specify the path corresponding to the other end of the move like this: move-away SOURCE-PATH to DESTINATION-PATH ... move-here DESTINATION-PATH from SOURCE-PATH because the problem cases are when the destination path does not yet exist at the time of a move-away, or the source path no longer exists at the time of a move-here. What we can do is use some other unique reference that is unique within the edit, like this: move-away SOURCE-PATH as identifier ID1 ... move-here DESTINATION-PATH from identifier ID1 The reference could perhaps be the destination path as it will finally exist at the end of the edit, or just and arbitrary number or string. We will just specify the identifier as an “id” and not specify how it is generated. === Ordering Restrictions === The ordering rules regarding move-away and move-here should include: * mv-away must come before the matching mv-here - The edit should provide a sequential procedure that the consumer must be able to follow without having to buffer an arbitrary amount of state. * mv-here & cp & add must be in nesting order: create (or put in place) the parent before its children * mv-away must come before deleting a parent - Receiver needs to know that it must preserve this path when we delete its parent. * mv-away must come before mv-away of a parent - If we allowed “mv-away A; …; mv-away A/B” then the child path “A/B” would have to be specified not relative to the current state of the edit, as all other operative paths are, but in some other way, because the parent has gone into temporary namespace, and has perhaps been replaced so that “A/B” now refers to some other node. - There is a general rule that all edits within a moved directory “A” must come after A is moved its destination, but a mv-away of a subtree of A is not considered an edit for this purpose. === Examples Solved === Example 1: | | +--A mv--\ add--> +--A \ | \--> +--B 1. alter-dir / (children={A}) 2. mv-away A (id=“original A”) 3. add-directory A (children={B}) 4. mv-here A/B (id=“original A”) Example 2: Swapping two siblings | | +--A --\ /--> +--A | X | +--B --/ \--> +--B 1. alter-dir / (children={A,B}) 2. mv-away A (id=“original A”) 3. mv-away B (id=“original B”) 4. mv-here A (id=“original B”) 5. mv-here B (id=“original A”) Example 3 can also be solved in this way, except for some ordering restriction issues that are discussed below.The Need for Alter-directory before Tree Changes Why do we have this ordering rule? If any path is added (with add_*) or deleted/moved/rotated, then an svn_editor_alter_directory() call must be made for its parent directory with the target/eventual set of children. It can't be quite right. Inside an add-directory, all the children have to be added, and that would imply we need an alter-directory as well as the add-directory, violating the Once Rule. It omits “copied” – just an oversight? It does not specify when the call must occur – presumably it must be before any such modifications to the children. To remedy those three initial problems, I suppose something like this is intended: Either alter_directory() or add_directory() must be called on a directory, declaring its final set of children, before calling delete(), move_away(), move_here(), copy(), or add_*() on any child. (For delete() or move_away(), it must be alter_directory(), as the children of add_directory() must not be deleted or moved away.) But there is still a problem. If we require alter_directory() before a move_away(), it leads to a violation of the Once Rule as shown in the following example. Example 3: Swapping two directory levels | | +--A --\ /--> +--A | X | +--B --/ \--> +--B 1. alter-dir A (children={}) ### Needed? 2. mv-away A/B (id=”original A/B”) 3. alter-dir / (children={A}) 4. mv-away A (id=”original A”) 5. mv-here A (id=”original A/B”) 6. alter-dir A (children={B}) ### Second alter-dir on same path, if (1) was needed. 7. mv-here A/B (id=”original A”) There are two potential problems here: * We make an edit within subtree A (the “move-away A/B”) before moving A * The “alter-dir A” is performed twice The first point violates the assumed constraint that we should disallow edits within a subtree before moving the subtree. The second point at face value violates the Once Rule. We can probably resolve is by making the Once Rule apply per node rather than per path – see below. === What If We Allow Edit Before Move? === We were assuming that we should disallow edits within a subtree before moving the subtree. [Why?] One solution might be to drop that requirement and let the subtree be moved (move-away) part way through editing it, allowing all editing operations. To accommodate such a change, some of the other rules that currently refer to “a path” probably need to be reformulated to refer to “a path relative to such a subtree” instead. If we allow edits before and after moving, should we also allow edits after the move-away and before the move-here? Not sure. It seems like that may be more problematic for certain consumer architectures and so probably should not be allowed. But is there a better way to decide? === What if the Once Rule is Per Node? === The path-based Once Rule was written something like this: A path should never be referenced more than once by the add_*, alter_*, and delete operations (the "Once Rule"). The source path of a copy (and its children, if a directory) may be copied many times, and are otherwise subject to the Once Rule. The destination path of a copy [or move_here] may have alter_* operations applied, but not add_* or delete. If the destination path of a copy or move is a directory, then its children are subject to the Once Rule. The source path of a move_away() (and its child paths) may be replaced using add_*() or copy() or move_here() (where these new or copied nodes are subject to the Once Rule). Perhaps the Once Rule should not apply per path as it was stated, but rather per node. If a directory is altered and then moved away, we should be able to create a replacement directory, being a different node at the same path, and then alter that. * The Once Rule applies per node, rather than per path. The definition of a “node” is such that “move” moves an existing node (or nodes) to a new path and does not create a new node, while “add” creates a new node and “copy” creates a set of new nodes, each new node being different from all other nodes that are or were in the tree even if it replaces an existing node at the same path. * One of the following actions can be applied, just Once, directly to each node: - create (only if it does not exist in the initial tree) - remove (only if it exists in the initial tree or is brought in as a child of a copy) - modify (only if it exists in the initial tree or is brought in as a child of a copy) * A node may be created by one of: - add_*() - copy(), optionally followed by alter_*() No other operations may be applied to such a node during the entire edit. Any children may be edited after (not before) the node is created. When a node is created by copy(), its children (recursively) are brought into the tree, and are then subject to the Once Rule as existing nodes. * A node may be removed, along with all its children recursively, by one of: - delete() - add_*() replacing this node - copy() replacing this node - move_here() replacing this node No other operations may be applied to such a node, nor to any children (recursively), during the entire edit, except for nodes that have been moved away. * A node may be modified by any combination from the following: - move_away() followed by move_here(), at most once - alter_*(), at most once - (if a directory) edit a child These can come in any order except that alter_*() is required before editing any children, and neither of those can come between move-away and move-here except for move-away of a deeper child [and/or editing of such prior to move-away?]. [### This seems more complex than it could be. It would be much easier if alter_directory() were not required before adding/moving/editing children.] * The source of a copy operation may be a node in the tree being edited. Such a node (and its children, if a directory) may be copied many times, in addition to being subject to the Once Rule as existing nodes. [### I'm not sure about the Copy operation: the doc string implies it's copying from something in the current state, but it should be able to copy from any path-rev.] == CONCLUSION == With a bit more work on the ordering restrictions, especially the requirement about calling alter-dir, it looks like this design may fit the requirements. Thoughts? - Julian