== Current Behavior ==

Currently, Jackrabbit tries to merge changes when two sessions concurrently add, change, or remove different properties on the same node. As far as I understand, Jackrabbit merges the changes by looking at the data (the baseline, the currently stored state, and the new state). The same applies to child nodes: when two sessions concurrently add different child nodes, both child nodes are added.
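To make this data-level merge concrete, here is a minimal sketch of what merging one session's property changes into the currently stored state by comparing against the baseline can look like. This is purely illustrative, not Jackrabbit code: property values are simplified to strings and all names are made up.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Illustration only: a data-level three-way merge of a node's properties
// (baseline vs. currently stored state vs. this session's state).
public class DataLevelMerge {

    static Map<String, String> merge(Map<String, String> baseline,
                                     Map<String, String> stored,
                                     Map<String, String> session) {
        // start from what the other session has already persisted
        Map<String, String> merged = new HashMap<>(stored);

        Set<String> names = new HashSet<>(baseline.keySet());
        names.addAll(session.keySet());
        for (String name : names) {
            String before = baseline.get(name);
            String after = session.get(name);
            if (!Objects.equals(before, after)) {
                // this session added, changed, or removed the property
                if (after == null) {
                    merged.remove(name);
                } else {
                    merged.put(name, after);
                }
            }
            // properties this session did not touch keep the stored value
        }
        return merged;
    }
}

Note that this only works at the level of individual property values; the merge has no idea what logical operation produced the difference, which is exactly the problem described next.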
There are some problems with this approach, for example when a b-tree mechanism is used for the child node list: one session adds child nodes that cause the child node list to split, and a second session concurrently adds a different child node (possibly causing a different split). To the second session it looks like some child nodes have been removed, and it would add its child node on the wrong (b-tree) level (in the inner node instead of in the leaf node). I think merging changes this way is problematic: trying to derive the logical operation by "diffing" the old and new versions is sometimes very hard. I suggest merging changes in a different way.

== Proposed Solution ==

When adding, changing, or removing a property or node, the logical operation should be recorded on a high level ("this node was added", "this node was moved from here to there", "this property was added"), first in memory; for larger change sets it may need to be persisted (possibly only temporarily). When committing a transaction (usually Session.save()), the micro-kernel tries to apply the changes. If there was a conflict, the micro-kernel rejects the changes (it doesn't try to merge). The higher level then has to deal with that. One way to deal with conflict resolution is:

1) Reload the current persistent state (undo all changes, load the new data).
2) Replay the logical operations from the (in-memory or persisted) journal.
3) If that fails again, go to 1) or fail, depending on a timeout.

What I describe here is how I understand MVCC (http://en.wikipedia.org/wiki/Multiversion_concurrency_control): "every object would also have a read timestamp, and if a transaction Ti wanted to write to object P, and the timestamp of that transaction is earlier than the object's read timestamp (TS(Ti) < RTS(P)), the transaction Ti is aborted and restarted." So Jackrabbit would record the 'transaction Ti' on a higher level. If applying the changes fails (in the micro-kernel), Jackrabbit would automatically restart this transaction (up to a timeout). This should also work well in a distributed environment; this case is similar to synchronizing databases.

== API ==

Instead of the current API, which requires the change log to be in memory, I suggest using iterators:

void store(Iterator<Bundle> newBundles, Iterator<Event> events) throws ConcurrentUpdateException

The ChangeLog consists of the new node bundles (plus, for each node bundle, the read timestamp). The event list consists of the EventJournal entries. For smaller operations, a session can keep the event journal in memory. For larger operations, the session can use a temporary file, or possibly store the data in a temporary area within the persistence layer (maybe using a different API). If the operation fails, the session would reload all bundles and re-apply the events stored in its own local event log.
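Here is a rough sketch of how a session could use such an API together with the retry loop described above. Apart from the proposed store(...) signature, everything here (SessionContext, its method names, the placeholder Bundle and Event interfaces) is invented for illustration and does not refer to existing Jackrabbit classes.

import java.util.Iterator;
import java.util.List;

// Sketch only; apart from the proposed store(...) signature, all names
// are placeholders invented for illustration.
interface Bundle { }
interface Event { }

class ConcurrentUpdateException extends Exception { }

interface PersistenceManager {
    // proposed API: the change log is streamed, not kept in memory
    void store(Iterator<Bundle> newBundles, Iterator<Event> events)
            throws ConcurrentUpdateException;
}

interface SessionContext {
    // the session's local journal of logical operations
    List<Event> localEventJournal();
    // the node bundles pending for this save
    List<Bundle> pendingBundles();
    // undo all local changes and load the current persistent state
    void reloadPersistentState();
    // re-apply the logical operations against the freshly loaded state,
    // recreating the pending bundles
    void replayLocalJournal(List<Event> journal);
}

class SaveWithRetry {

    static void save(PersistenceManager pm, SessionContext session,
                     long timeoutMillis) throws ConcurrentUpdateException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        List<Event> journal = session.localEventJournal();
        while (true) {
            try {
                // the micro-kernel either applies the changes or rejects
                // them on conflict; it does not try to merge
                pm.store(session.pendingBundles().iterator(),
                         journal.iterator());
                return;
            } catch (ConcurrentUpdateException e) {
                if (System.currentTimeMillis() >= deadline) {
                    throw e; // still conflicting after the timeout: fail
                }
                // 1) reload the current persistent state (undo all changes)
                session.reloadPersistentState();
                // 2) replay the logical operations from the local journal
                session.replayLocalJournal(journal);
                // 3) loop and try to store again
            }
        }
    }
}

Without a conflict, the loop body runs once; on a conflict, the reload-and-replay corresponds to steps 1) to 3) above.

Regards, Thomas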