Hi, I think we all have more or less the same basic idea. Well, I would call it "Node" and not "Tree" because the term node more closely matches what we do. But first about what we seem to agree:
* A persistent and immutable data structure has many benefits (for example concurrency). * Mutable data strucutures also have benefits: it avoids creating lots of garbage and a new root node for each and every modification. I think we should use a structure that is immutable when reading, but mutable when writing. Write methods should return a new object (for example cloneAndAddChildNode), but only when necessary, that is, if the revision doesn't match. When the revision matches, the internal state is changed instead, and the cloneAndAddChildNode method returns 'this'. A first implementation is org.apache.jackrabbit.mk.simple.NodeImpl. > everyone has their own > MicroKernel implementation. The reasons why we have two implementations is, Stefan and I could not agree on a few implementation details. My view is: * Stefan assumes it's important to use the content hash as the node id, while I think this hurts performance too much (as the randomly distributed node ids hurt storage performance in Jackrabbit 2). With the copy-on-write model it is even a bigger performance problem because each time a node is changed it gets a new node id (I didn't think about this first). That means even small repositories are problematic if there are many changes. * Stefan implemented a different way to represent large child node lists. He used a h-tree (the hash code of the child node name), which means child nodes are pseudo-randomly distributed. I think this hurts performance too much (same reason as above: a stored index on randomly distributed keys is required). * Stefan believes concurrent write operations are very important while I believe this will not increase write throughput as much as it complicates the implementation (Stefan implementations merges changes while my implementation does not). * Stefans implementation re-constructs the journal by diffing the node tree. I believe this is troublesome because it is not always possible to re-construct the journal with regards to re-order and move operations. Also I believe this adds more complexity that the possible benefit (because the journal doesn't need to be stored). In my implementation the journal is stored. There is quite a lot of common code, for example the data store, parsing and constructing JSON / JSOP. My implementation consists of the classes in org.apache.jackrabbit.mk.simple. The classes NodeImpl, NodeList, NodeListSmall are used again in multiple different modules (the index and the wrapper implementations for example). So what is really unique in my "storage" implementation is the 3 NodeMap implementations, NodeListTrie to support large child node lists, the Revision class, and SimpleKernelImpl. Regards, Thomas