== Current Behavior ==

Currently Jackrabbit tries to merge changes when two sessions
add/change/remove different properties concurrently on the same node.
As far as I understand, Jackrabbit merges changes by looking at the
data (baseline, currently stored, and new). The same applies to child
nodes: when two sessions add different child nodes concurrently, both
child nodes are added.

There are some problems, however. For example (when using b-tree
mechanisms for child nodes), one session adds child nodes that cause
the child node list to split, while a second session adds a different
child node (possibly causing a different split). To the second session
it looks like some child nodes have been removed, and it would add its
child node on the wrong (b-tree) level (in an inner node instead of in
a leaf node).

I think merging changes is problematic. Trying to derive the logical
operation from "diffing" the old and new versions is sometimes very
hard. I suggest merging changes in a different way.

== Proposed Solution ==

When adding/changing/removing a property or node, the logical
operation should be recorded on a high level ("this node was added",
"this node was moved from here to there", "this property was added"),
first in memory; for larger change sets it needs to be persisted
(possibly only temporarily).
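
To illustrate, an entry in such a journal could look roughly like this
(the class and method names are made up for the example; this is not an
existing Jackrabbit class):

    // Hypothetical sketch of a high-level journal entry. Each entry
    // records the logical operation, not the resulting data.
    public final class Operation {

        public enum Type {
            ADD_NODE, REMOVE_NODE, MOVE_NODE, SET_PROPERTY, REMOVE_PROPERTY
        }

        private final Type type;
        private final String path;       // affected node or property path
        private final String targetPath; // only used for MOVE_NODE
        private final Object value;      // only used for SET_PROPERTY

        private Operation(Type type, String path, String targetPath, Object value) {
            this.type = type;
            this.path = path;
            this.targetPath = targetPath;
            this.value = value;
        }

        // factory methods mirror the logical operations a session performs
        public static Operation addNode(String path) {
            return new Operation(Type.ADD_NODE, path, null, null);
        }

        public static Operation moveNode(String from, String to) {
            return new Operation(Type.MOVE_NODE, from, to, null);
        }

        public static Operation setProperty(String path, Object value) {
            return new Operation(Type.SET_PROPERTY, path, null, value);
        }
    }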

When committing a transaction (usually Session.save()), the
micro-kernel tries to apply the changes. If there is a conflict, the
micro-kernel rejects the changes (it doesn't try to merge). The higher
level then has to deal with that. One way to resolve the conflict is:

1) Reload the current persistent state (undo all changes, load the new data).

2) Replay the logical operations from the (in-memory or persisted) journal.

3) If that fails again, go back to 1), or fail once a timeout is exceeded.
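
As a rough sketch of this loop (apply() and reload() are placeholders
for whatever the session / micro-kernel API ends up looking like, and
Operation is the hypothetical journal entry from above):

    // Hypothetical retry loop for conflict resolution.
    void saveWithRetry(Session session, List<Operation> journal,
            long timeoutMillis) throws ConcurrentUpdateException {
        long start = System.currentTimeMillis();
        while (true) {
            try {
                // try to apply the recorded logical operations atomically
                session.apply(journal);
                return;
            } catch (ConcurrentUpdateException e) {
                if (System.currentTimeMillis() - start > timeoutMillis) {
                    throw e; // 3) timeout exceeded: give up
                }
                // 1) undo all local changes, load the latest persisted state
                session.reload();
                // 2) the next loop iteration replays the journal
            }
        }
    }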

What I describe here is how I understand MVCC
(http://en.wikipedia.org/wiki/Multiversion_concurrency_control): "every
object would also have a read timestamp, and if a transaction Ti
wanted to write to object P, and the timestamp of that transaction is
earlier than the object's read timestamp (TS(Ti) < RTS(P)), the
transaction Ti is aborted and restarted." So Jackrabbit would record
the 'transaction Ti' on a higher level. If applying the changes fails
(in the micro-kernel), Jackrabbit would automatically restart this
transaction (up to a timeout).
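
For illustration, the check in the micro-kernel could be as simple as
the following sketch of the rule quoted above (Transaction, StoredObject
and their methods are placeholders, not existing classes):

    // Sketch of the MVCC write rule: reject the write if a newer
    // transaction has already read this object.
    void write(Transaction ti, StoredObject p, Object newValue)
            throws ConcurrentUpdateException {
        if (ti.timestamp() < p.readTimestamp()) {
            // TS(Ti) < RTS(P): Ti is aborted and restarted by the higher level
            throw new ConcurrentUpdateException();
        }
        p.setValue(newValue);
        p.setWriteTimestamp(ti.timestamp());
    }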

This should also work well in a distributed environment. This case is
similar to synchronizing databases.

== API ==

Instead of the current API, which requires the change log to be in
memory, I suggest using iterators:

void store(Iterator<Bundle> newBundles, Iterator<Event> events)
    throws ConcurrentUpdateException

The ChangeLog consists of the new node bundles (plus, for each node
bundle, the read timestamp). The event list consists of the
EventJournal entries. For smaller operations, a session can keep the
event journal in memory. For larger operations, the session can use a
temporary file, or possibly store the data in a temporary area within
the persistence layer (maybe using a different API).
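
As a sketch of the calling side (PersistenceManager here just stands
for whatever component implements the proposed store method; Bundle,
Event and ConcurrentUpdateException are as above):

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical change log kept by a session.
    class SessionChangeLog {

        private final List<Bundle> newBundles = new ArrayList<Bundle>();
        private final List<Event> events = new ArrayList<Event>();

        void save(PersistenceManager pm) throws ConcurrentUpdateException {
            // a small operation keeps the journal in memory, so plain list
            // iterators are enough; a large operation could instead pass
            // iterators that stream the events from a temporary file
            pm.store(newBundles.iterator(), events.iterator());
        }
    }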

If the operation fails, the session would reload all bundles and
re-apply the events stored in its own local event log.

Regards,
Thomas
