[jr3 microkernel] Change log consolidation

Michael Dürig Fri, 03 Feb 2012 10:45:04 -0800

Hi,

While working on a new transient space implementation [1] I discoveredthat recreating a list of changes from transient modifications to thehierarchy is like opening a can of worms. Rethinking things it turnedout, that it is much simpler to directly consolidate the change log.This approach is much more flexible and versatile since knowledge of thepersisted hierarchy is not necessary at all. Thus this approach shouldwork equally well with change logs for jr3, the Microkernel, Jackrabbitcore, JCR and SPI.

The algorithm is capable of creating minimal change logs from any validchange log (that is from any change log that can be applied to somehierarchy).

I extensively tested the algorithm on randomly created change logs [3].This resulted in some interesting numbers: consolidation results in andecrease of about 45% of these randomly generated change logs.Furthermore the resulting size of the H2 database files *decreased up to200%*!

The implementation [2] is currently part of the jackrabbit-microkernelmodule in the sandbox. However it does not have any crucial dependenciesso it should be easy to adapt for other usages.

The gory details of the algorithm are based on some algebraic propertiesof move operations on paths as detailed in the following paragraphs.

The change log consolidation algorithm implemented in the reduce method[2] is based on algebraic properties of move operations on paths. Theother operations (add node, remove node and set property) aregeneralized to move operations. Consolidation relies on reduction andcommutation rules of move operations.

A move operation resembles a map on a hierarchy (of nodes andproperties). A change log consisting of k move operations m_1 to m_k isthus the composition of the individual moves: m_1 *, ..., * m_k (using *for function composition: f(g(x)) = (f * g)(x)).


Some definitions, notation and propositions:

* Let NIL denote a path which never occurs in any hierarchy.

* Order on paths: let p, q be paths.
  - p < q iff p != NIL and q != NIL and p is an ancestor of q.
  - p <= q iff p < q or p == q != NIL

* Conflict of paths: let p, q be paths.
  - p ~ q (p conflicts with q) iff either p <= q or q <= p

* Substitution in paths: let p, q, r be paths.
  - [p -> q]r = r if p is not an ancestor of r and
  - [p -> q]r = s where s is the path resulting from replacing
    the ancestor p in r with q otherwise.

* Let p, q be paths. Then p:q denotes a move operation where the
  node at p is moved to a new node at q.

* Valid moves: leq p, q be paths.
  - p:q is valid iff p !~ q or p = q
  - if p:q is not valid, it is invalid

Invalid moves are exactly those which would result in a node being movedto an ancestor/descendant of itself.


* Identity on moves: let p, q be paths.
  - p:q = ID iff p = q.

* Conflict on moves: let p, q, r, s be paths.
  - p:q ~ r:s (p:q conflicts with r:s) iff either p ~ r or p ~ s
    or q ~ r or q ~ s.

* Strict commutativity of moves: let m, n be moves.
  - m * n = n * m iff m !~ n

* Substitutions in moves: let p, q, r, s be paths.
  - [p -> q]r:s = [p -> q]r:[p -> q]s

* Let p be a path and let +p denote an add node operation and let -p
  denote a remove node operation for a node at path p.
  - +p = NIL:p That is, adding a node is represented by a move from a
    unknown source.
  - p = p:NIL. That is, removing a node is represented by a move to an
    unknown sink.

Let m = p:q, n = r:s with p, q, r, s != NIL be valid moves with m != IDand n != ID. Then the following reduction and commutation rules apply:


1.  p!~ r:  m * n = n * m
2.  p < r:  illegal (since this implies q <= r which implies p ~ q and
            thus m invalid)
3.  p = r:  illegal (since this implies q <= r which implies p ~ q and
            this m invalid)
4.  p > r:  does not commute if q < s. Otherwise m * n = n * [r -> s]m
5.  p!~ s:  m * n = n * m
6.  p < s:  illegal (since this implies p ~ q and thus m invalid)
7.  p = s:  does not commute
8.  p > s:  illegal (since p > s implies there is an s already which
            will conflict with r:s)
9.  q!~ r:  m * n = n * m
10. q < r:  m * n = [q -> p]n * m
11. q = r:  m * n = p:s (transitivity of moves)
12. q > r:  m * n = n * [r -> s]m
13. q!~ s:  m * n = n * m
14. q < s:  does not commute if p > r. Otherwise m * n = [q -> p]n * m
15. q = s:  illegal (since s conflicts with r:s)
16. q > s:  illegal (since s conflicts with r:s)

Allowing add node and remove node operations the following additionalconditions apply:

Let m = p:q, n = r:s be valid moves with m != ID and n != ID. Then thereduction and commutations rules 1. to 16. apply with extra conditionson 4., 10., 12. and 14.:


4'.  if s = NIL and q = NIL then m * n = -r. Otherwise if s = NIL then
     m, n do not commute.
10'. illegal if p = NIL
12'. if s = NIL then m * n = -r * -p
14'. illegal if p = NIL

Michael


[1] http://markmail.org/message/qkkcvtmtapas2cx4

[2]http://svn.apache.org/viewvc/jackrabbit/sandbox/jackrabbit-microkernel/src/main/java/org/apache/jackrabbit/state/ChangeLog.java?view=markup[3]http://svn.apache.org/viewvc/jackrabbit/sandbox/jackrabbit-microkernel/src/test/java/org/apache/jackrabbit/state/ChangeLogFuzzTest.java?view=markup

[jr3 microkernel] Change log consolidation

Reply via email to