For move tracking we need a move-aware editor. svn_editor_t ("Ev2") is designed to support moves. However, I'm not clear how its move support is meant to work, on a fairly basic level:
* What are the ordering rules for moves? * Are the move operations to be interprested in series or in parallel? * When an operation refers to a node that is involved in a move, how are the paths to be interpreted? From conversations with other devs I know I'm not the only one. There are a number of rules written down in <svn_editor.h> but to my reading they didn't seem to explain the overall scheme, and they contain inconsistencies, some of which I have posted as queries to this list in the past but remain unresolved [1]. Having failed to grok what is intended, I have deliberately put off trying to dig further into Ev2's version of move support, until now, in order to think freely about the theory and to focus more evenly on all the areas of move tracking. (That's why I started using Ev1 in my experimental branch [2] and in my notes [3].) But now we should tackle this. So, please could somebody help to elucidate how Ev2 move support works or should work? == Basic Theory == I can imagine two possible extremes for the overall mode of operation to which such an editor *could* be designed: parallel and sequential. I'll mention some essential characteristics of each. == Sequential == Edits are described sequentially, incrementally. The consumer maintains a state that describes the 'current' tree shape, starting with the initialtree shape, and being changed incrementally by each edit operation. Each edit operation refers to paths that exist in the 'current' tree state and/or that will exist in the next state. == Parallel == The producer sends all changes in parallel or in an unspecified order. Each change is described independently of all other changes: it refers to paths (or nodes) in a way that doesn't depend on the other changes being sent. In fact, rather than talking about 'operations' or 'changes', the semantics required here is to describe the final state in any compact way, not necessarily in terms of changes against the initial state. (One case that may seem difficult to parallelize is adding a new parent directory and its children. How can the consumer accept a child node before the creation of the parent directory? The answer is simply that the consumer would be designed to operate this way, storing the children either in their final form or in a temporary form, and linking them into the parent once all necessary information has been received.) == Hybrid == We could mix these and have a scheme that is partly parallel, partly sequential, according to some additional rules. That is fine, but more complex to describe and understand. Note that while Ev1 is mostly sequential, the copy-from field of a copy (when used for a move) is more like the parallel scheme, since it refers to a path in the initial tree state and not a path in the current tree state. == Cyclic moves == We need to accommodate arbitrary sets of moves, including cycles. For an example, we'll use a simple cycle, swapping two siblings A and B. This ability is needed for two reasons: * Ev1 supports this (del A, copy A from B@OLD, del B, copy B from A@OLD). * A single edit must be able to represent the combination of any series of edits, and a series of moves can result in a cycle even if each individual edit only supports non-cyclic moves. (1: mv B C. 2: mv A B. 3: mv C A.) In a parallel scheme, no special treatment is necessary: A -> B B -> A In a sequential scheme, there are two ways to accommodate cyclic moves: * Allow moving to a temporary path and then later to the final path: mv A tmp; mv B A; mv tmp B * Allow a 'swap A B' or 'rotate A B ...' operation: rotate A B == Swap or Rotate == swap(A,B) == { A->B || B->A } is a primitive operation. ('||' indicates 'in parallel'.) rotate(A,B,...N) == { A->B || B->... || ...->N || N->A } is non-primitive, since any rotate can be composed from multiple sequential swaps. For example, rotate(A,B,C) == { swap(B,C); swap(A,B) }. It is no more useful to rotate more than two paths in one operation (rotate(A,B,C)) than to move more than two paths in one operation (move(A,B,C) == { A->B || B->C }). So if we have a sequential scheme and we don't want to use temporary paths, we should include the 'swap' primitive rather than a 'rotate'. == Ev2 == One of Ev2's goals is to parallelize text modifications, in order to take advantage of Serf's parallelism. What about tree changes, and specifically what are the semantics of the 'move' operation? The Ev2 documentation indicates sequential requirements for the 'add' operation (parent before children), and also in this rule: * - The ancestor of an added, copied-here, moved-here, rotated, or * modified node may not be deleted. The ancestor may not be moved * (instead: perform the move, *then* the edits). The doc string for 'svn_editor_move' says: * Move the node at @a src_relpath to @a dst_relpath. * * @a src_revision specifies the revision at which the receiver should * expect to find this node. That is, @a src_relpath at the start of * the whole edit and @a src_relpath at @a src_revision must lie within * the same node-rev (aka history-segment). This is just like the * revisions specified to svn_editor_delete() and svn_editor_rotate(). * ... The discussion implies that "the node at src_relpath" refers to a node in the initial tree state. In that respect, it's a parallel operation: the source reference doesn't depend on previous moves. But look at the 'move_cb' implementation in libsvn_fs/editor.c: # src_root == *initial* tree state svn_fs_copy(src_root, src_fspath, root, dst_fspath, scratch_pool) /* Notice: we're deleting the src repos path from the dst root. */ # root == *current* tree state (txn) svn_fs_delete(root, src_fspath, scratch_pool) The delete of src_relpath within the current tree state is potentially at odds with the copy from src_relpath relative to the initial state. The delete would delete the wrong thing if the original node at this path has already been replaced, or would try to delete a non-existent node if it had already been deleted or moved away. Could it have been replaced or deleted or moved away, if not itself then perhaps by a copy/move/add/delete of a parent directory? Let's try the following scenario: A->A2 || A/B->B2 # move a child of a moved parent Try with Ev2: move(src_relpath='A', src_rev=100, dst_relpath='A2', replaces_rev=-1) move(src_relpath='A/B', src_rev=100, dst_relpath='B2', replaces_rev=-1) The implementation shown above would fail in the second move when trying to delete 'A/B', because that path no longer exists in the transaction after the first move deleted the whole subtree at 'A'. The other way works fine: move(src_relpath='A/B', src_rev=100, dst_relpath='B2', replaces_rev=-1) move(src_relpath='A', src_rev=100, dst_relpath='A2', replaces_rev=-1) So these moves are not parallelizable with this implementation. (Is that implementation wrong?) Ev2 also documents (as the first Driving Order Restriction) that there must be an alter_directory call for the parent of moved-away node 'A/B'. What does this look like? alter_dir('A', ...) or alter_dir('A2', ...) ? Can the alter_dir come before or after the move of 'A' to 'A2'? Is the path it references 'A' in either case, or 'A2' in either case, or 'A' if it comes before the move and 'A2' if it comes after the move? Can we start with a clear consensus of whether we're trying to deliver a sequential or a parallel editor? - Julian [1] For example, email subject "Re: Ev2 -- Driving Order Restrictions" from Julian Foad on 2012-07-18, <http://svn.haxx.se/dev/archive-2012-07/0247.shtml>. [2] 'move-tracking-1' branch, <http://svn.apache.org/repos/asf/subversion/branches/move-tracking-1/>. [3] Wiki page 'MoveDev/MoveDev', <https://wiki.apache.org/subversion/MoveDev/MoveDev>. -- Join WANdisco's free daily demo sessions on Scaling Subversion for the Enterprise <http://www.wandisco.com/training/webinars>