Hi,
While working on a new transient space implementation [1] I discovered
that recreating a list of changes from transient modifications to the
hierarchy is like opening a can of worms. Rethinking things it turned
out, that it is much simpler to directly consolidate the change log.
This approach is much more flexible and versatile since knowledge of the
persisted hierarchy is not necessary at all. Thus this approach should
work equally well with change logs for jr3, the Microkernel, Jackrabbit
core, JCR and SPI.
The algorithm is capable of creating minimal change logs from any valid
change log (that is from any change log that can be applied to some
hierarchy).
I extensively tested the algorithm on randomly created change logs [3].
This resulted in some interesting numbers: consolidation results in an
decrease of about 45% of these randomly generated change logs.
Furthermore the resulting size of the H2 database files *decreased up to
200%*!
The implementation [2] is currently part of the jackrabbit-microkernel
module in the sandbox. However it does not have any crucial dependencies
so it should be easy to adapt for other usages.
The gory details of the algorithm are based on some algebraic properties
of move operations on paths as detailed in the following paragraphs.
The change log consolidation algorithm implemented in the reduce method
[2] is based on algebraic properties of move operations on paths. The
other operations (add node, remove node and set property) are
generalized to move operations. Consolidation relies on reduction and
commutation rules of move operations.
A move operation resembles a map on a hierarchy (of nodes and
properties). A change log consisting of k move operations m_1 to m_k is
thus the composition of the individual moves: m_1 *, ..., * m_k (using *
for function composition: f(g(x)) = (f * g)(x)).
Some definitions, notation and propositions:
* Let NIL denote a path which never occurs in any hierarchy.
* Order on paths: let p, q be paths.
- p < q iff p != NIL and q != NIL and p is an ancestor of q.
- p <= q iff p < q or p == q != NIL
* Conflict of paths: let p, q be paths.
- p ~ q (p conflicts with q) iff either p <= q or q <= p
* Substitution in paths: let p, q, r be paths.
- [p -> q]r = r if p is not an ancestor of r and
- [p -> q]r = s where s is the path resulting from replacing
the ancestor p in r with q otherwise.
* Let p, q be paths. Then p:q denotes a move operation where the
node at p is moved to a new node at q.
* Valid moves: leq p, q be paths.
- p:q is valid iff p !~ q or p = q
- if p:q is not valid, it is invalid
Invalid moves are exactly those which would result in a node being moved
to an ancestor/descendant of itself.
* Identity on moves: let p, q be paths.
- p:q = ID iff p = q.
* Conflict on moves: let p, q, r, s be paths.
- p:q ~ r:s (p:q conflicts with r:s) iff either p ~ r or p ~ s
or q ~ r or q ~ s.
* Strict commutativity of moves: let m, n be moves.
- m * n = n * m iff m !~ n
* Substitutions in moves: let p, q, r, s be paths.
- [p -> q]r:s = [p -> q]r:[p -> q]s
* Let p be a path and let +p denote an add node operation and let -p
denote a remove node operation for a node at path p.
- +p = NIL:p That is, adding a node is represented by a move from a
unknown source.
- p = p:NIL. That is, removing a node is represented by a move to an
unknown sink.
Let m = p:q, n = r:s with p, q, r, s != NIL be valid moves with m != ID
and n != ID. Then the following reduction and commutation rules apply:
1. p!~ r: m * n = n * m
2. p < r: illegal (since this implies q <= r which implies p ~ q and
thus m invalid)
3. p = r: illegal (since this implies q <= r which implies p ~ q and
this m invalid)
4. p > r: does not commute if q < s. Otherwise m * n = n * [r -> s]m
5. p!~ s: m * n = n * m
6. p < s: illegal (since this implies p ~ q and thus m invalid)
7. p = s: does not commute
8. p > s: illegal (since p > s implies there is an s already which
will conflict with r:s)
9. q!~ r: m * n = n * m
10. q < r: m * n = [q -> p]n * m
11. q = r: m * n = p:s (transitivity of moves)
12. q > r: m * n = n * [r -> s]m
13. q!~ s: m * n = n * m
14. q < s: does not commute if p > r. Otherwise m * n = [q -> p]n * m
15. q = s: illegal (since s conflicts with r:s)
16. q > s: illegal (since s conflicts with r:s)
Allowing add node and remove node operations the following additional
conditions apply:
Let m = p:q, n = r:s be valid moves with m != ID and n != ID. Then the
reduction and commutations rules 1. to 16. apply with extra conditions
on 4., 10., 12. and 14.:
4'. if s = NIL and q = NIL then m * n = -r. Otherwise if s = NIL then
m, n do not commute.
10'. illegal if p = NIL
12'. if s = NIL then m * n = -r * -p
14'. illegal if p = NIL
Michael
[1] http://markmail.org/message/qkkcvtmtapas2cx4
[2]
http://svn.apache.org/viewvc/jackrabbit/sandbox/jackrabbit-microkernel/src/main/java/org/apache/jackrabbit/state/ChangeLog.java?view=markup
[3]
http://svn.apache.org/viewvc/jackrabbit/sandbox/jackrabbit-microkernel/src/test/java/org/apache/jackrabbit/state/ChangeLogFuzzTest.java?view=markup