For move tracking we need a move-aware editor.  svn_editor_t ("Ev2") is 
designed to support moves.  However, I'm not clear how its move support is 
meant to work, on a fairly basic level:

  * What are the ordering rules for moves?

  * Are the move operations to be interprested in series or in parallel?

  * When an operation refers to a node that is involved in a move, how are the 
paths to be interpreted?

From conversations with other devs I know I'm not the only one.  There are a 
number of rules written down in <svn_editor.h> but to my reading they didn't 
seem to explain the overall scheme, and they contain inconsistencies, some of 
which I have posted as queries to this list in the past but remain unresolved 
[1].

Having failed to grok what is intended, I have deliberately put off trying to 
dig further into Ev2's version of move support, until now, in order to think 
freely about the theory and to focus more evenly on all the areas of move 
tracking.  (That's why I started using Ev1 in my experimental branch [2] and in 
my notes [3].)  But now we should tackle this.

So, please could somebody help to elucidate how Ev2 move support works or 
should work?


== Basic Theory ==

I can imagine two possible extremes for the overall mode of operation to which 
such an editor *could* be designed: parallel and sequential.  I'll mention some 
essential characteristics of each.

== Sequential ==

Edits are described sequentially, incrementally.  The consumer maintains a 
state that describes the 'current' tree shape, starting with the initialtree 
shape, and being changed incrementally by each edit operation.

Each edit operation refers to paths that exist in the 'current' tree state 
and/or that will exist in the next state.

== Parallel ==

The producer sends all changes in parallel or in an unspecified order.  Each 
change is described independently of all other changes: it refers to paths (or 
nodes) in a way that doesn't depend on the other changes being sent.  In fact, 
rather than talking about 'operations' or 'changes', the semantics required 
here is to describe the final state in any compact way, not necessarily in 
terms of changes against the initial state.

(One case that may seem difficult to parallelize is adding a new parent 
directory and its children.  How can the consumer accept a child node before 
the creation of the parent directory?  The answer is simply that the consumer 
would be designed to operate this way, storing the children either in their 
final form or in a temporary form, and linking them into the parent once all 
necessary information has been received.)

== Hybrid ==

We could mix these and have a scheme that is partly parallel, partly 
sequential, according to some additional rules.  That is fine, but more complex 
to describe and understand.

Note that while Ev1 is mostly sequential, the copy-from field of a copy (when 
used for a move) is more like the parallel scheme, since it refers to a path in 
the initial tree state and not a path in the current tree state.

== Cyclic moves ==

We need to accommodate arbitrary sets of moves, including cycles.  For an 
example, we'll use a simple cycle, swapping two siblings A and B.  This ability 
is needed for two reasons:

  * Ev1 supports this (del A, copy A from B@OLD, del B, copy B from A@OLD).

  * A single edit must be able to represent the combination of any series of 
edits, and a series of moves can result in a cycle even if each individual edit 
only supports non-cyclic moves.  (1: mv B C.  2: mv A B.  3: mv C A.)

In a parallel scheme, no special treatment is necessary:

    A -> B
    B -> A

In a sequential scheme, there are two ways to accommodate cyclic moves:

  * Allow moving to a temporary path and then later to the final path:
    mv A tmp; mv B A; mv tmp B

  * Allow a 'swap A B' or 'rotate A B ...' operation:
    rotate A B

== Swap or Rotate ==

swap(A,B) == { A->B || B->A }

is a primitive operation.  ('||' indicates 'in parallel'.)

rotate(A,B,...N) == { A->B || B->... || ...->N || N->A }

is non-primitive, since any rotate can be composed from multiple sequential 
swaps.  For example, rotate(A,B,C) == { swap(B,C); swap(A,B) }.  It is no more 
useful to rotate more than two paths in one operation (rotate(A,B,C)) than to 
move more than two paths  in one operation (move(A,B,C) == { A->B || B->C }).

So if we have a sequential scheme and we don't want to use temporary paths, we 
should include the 'swap' primitive rather than a 'rotate'.


== Ev2 ==

One of Ev2's goals is to parallelize text modifications, in order to take 
advantage of Serf's parallelism.  What about tree changes, and specifically 
what are the semantics of the 'move' operation?  The Ev2 documentation 
indicates sequential requirements for the 'add' operation (parent before 
children), and also in this rule:

 * - The ancestor of an added, copied-here, moved-here, rotated, or
 *   modified node may not be deleted. The ancestor may not be moved
 *   (instead: perform the move, *then* the edits).

The doc string for 'svn_editor_move' says:

 * Move the node at @a src_relpath to @a dst_relpath.
 *
 * @a src_revision specifies the revision at which the receiver should
 * expect to find this node.  That is, @a src_relpath at the start of
 * the whole edit and @a src_relpath at @a src_revision must lie within
 * the same node-rev (aka history-segment).  This is just like the
 * revisions specified to svn_editor_delete() and svn_editor_rotate().
 * ...

The discussion implies that "the node at src_relpath" refers to a node in the 
initial tree state.  In that respect, it's a parallel operation: the source 
reference doesn't depend on previous moves.  But look at the 'move_cb' 
implementation in libsvn_fs/editor.c:

  # src_root == *initial* tree state

  svn_fs_copy(src_root, src_fspath, root, dst_fspath, scratch_pool)
  /* Notice: we're deleting the src repos path from the dst root. */
  # root == *current* tree state (txn)
  svn_fs_delete(root, src_fspath, scratch_pool)

The delete of src_relpath within the current tree state is potentially at odds 
with the copy from src_relpath relative to the initial state.  The delete would 
delete the wrong thing if the original node at this path has already been 
replaced, or would try to delete a non-existent node if it had already been 
deleted or moved away.  Could it have been replaced or deleted or moved away, 
if not itself then perhaps by a copy/move/add/delete of a parent directory?

Let's try the following scenario:

  A->A2 || A/B->B2  # move a child of a moved parent

Try with Ev2:

  move(src_relpath='A', src_rev=100, dst_relpath='A2', replaces_rev=-1)
  move(src_relpath='A/B', src_rev=100, dst_relpath='B2', replaces_rev=-1)

The implementation shown above would fail in the second move when trying to 
delete 'A/B', because that path no longer exists in the transaction after the 
first move deleted the whole subtree at 'A'.

The other way works fine:

  move(src_relpath='A/B', src_rev=100, dst_relpath='B2', replaces_rev=-1)
  move(src_relpath='A', src_rev=100, dst_relpath='A2', replaces_rev=-1)

So these moves are not parallelizable with this implementation.  (Is that 
implementation wrong?)

Ev2 also documents (as the first Driving Order Restriction) that there must be 
an alter_directory call for the parent of moved-away node 'A/B'.  What does 
this look like?

  alter_dir('A', ...)

or

  alter_dir('A2', ...)

?  Can the alter_dir come before or after the move of 'A' to 'A2'?  Is the path 
it references 'A' in either case, or 'A2' in either case, or 'A' if it comes 
before the move and 'A2' if it comes after the move?

Can we start with a clear consensus of whether we're trying to deliver a 
sequential or a parallel editor?


- Julian


[1] For example, email subject "Re: Ev2 -- Driving Order Restrictions" from 
Julian Foad on 2012-07-18, <http://svn.haxx.se/dev/archive-2012-07/0247.shtml>.

[2] 'move-tracking-1' branch, 
<http://svn.apache.org/repos/asf/subversion/branches/move-tracking-1/>.

[3] Wiki page 'MoveDev/MoveDev', 
<https://wiki.apache.org/subversion/MoveDev/MoveDev>.

--
Join WANdisco's free daily demo sessions on Scaling Subversion for the 
Enterprise
<http://www.wandisco.com/training/webinars>

Reply via email to