Hi,

It sounds like MVCC (multi-version concurrency control) in databases. Question: in your view, do we need to change the PersistenceManager API as well?
* Removing nodes *

Last section: when removing B, the child C is removed as well. Is it important to say 'remove C', and not 'remove B including children' in the revision? What would happen if another session added a child D to C in the meantime and committed this change? If there is no locking, how would large deletes and updates be done?

* Revision scope *

In my opinion, the scope should be the entire repository.

* Base Revision *

The text talks about 'the base revision of the session' (which can be updated in some cases). But to support Item.refresh, a session needs to keep multiple base revisions: one for the session, and one for each refreshed Item (including children)?
The base revision of a session can optionally be changed when more recent revisions are persisted during the session lifetime.
Does 'optionally' mean it is a setting of the session? Is there a JCR API feature to set this option?

* Persisting a Subtree *
If the operation fails, then the two new revisions are discarded and no changes are made to the session.
Item.save says: 'If validation fails, then no pending changes are saved and they remain recorded on the Session'.

* Workspace Operations *
If the operation succeeds, the session is updated to use the persisted revision as the new base revision.
I think the base revision of the session should not be updated in this case.

* Transactions *
This model can also easily support two-phase commits in a distributed transaction.
I agree, two-phase commit is no problem. However, XAResource.recover (which obtains the list of prepared transactions) is tricky to implement. As far as I know, it is currently not supported. If we want it in the future, we had better think early about how to implement it. I suggest we describe the problem and possible solutions, but wait with the implementation until it is required.

* Namespace and Node Type Management *

Namespace and node type management in jcr:system: Good idea! However, without custom data structures it will be slow (if the jcr:system subtree is read whenever a namespace is resolved). What about custom data structures that cache the latest state (and listen for changes to the relevant subtree)? I guess only the latest version is relevant, or is there a situation where older versions of node types / namespaces are required? If yes, things will be complicated.

* Internal Data Structures *

I don't fully understand this section. I think this section should be extended. In databases, the approach is usually:

A: There is a main store (where the 'base' revision of each item is kept, indexed by item).
B: Committed revisions are stored sequentially in the redo log, which can only be read sequentially.
C: Draft and old revisions are mainly kept in memory (saved to disk only if there is no space).
D: Each session keeps an undo log for uncommitted changes.
E: Committed revisions are persisted in the main store, and if a session references an older revision of the same item, an in-memory copy is made (copy on write, see C).

Of course we don't need to rebuild a database, but maybe we can reuse some ideas.

* Combined Revisions *

I don't understand why this is needed, but that is probably because you have different internal data structures in mind.
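To make points A to E under Internal Data Structures a bit more concrete, here is a minimal Java sketch. All names (RevisionStoreSketch, Session, readAt and so on) are hypothetical and not part of Jackrabbit or the JCR API. It simplifies a lot: items are plain strings, removal is not modelled, the redo log is an in-memory list, each session keeps a map of pending writes instead of a real undo log, and old values are always copied on commit and never cleaned up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch only; none of these names exist in Jackrabbit. */
public class RevisionStoreSketch {
    /** A: main store, latest committed value per item id. */
    private final Map<String, String> mainStore = new HashMap<>();
    /** B: redo log, committed change sets in commit order (append only). */
    private final List<Map<String, String>> redoLog = new ArrayList<>();
    /** C: in-memory old values, keyed by the revision that replaced them. */
    private final TreeMap<Long, Map<String, String>> oldValues = new TreeMap<>();
    private long currentRevision = 0;

    public synchronized Session login() {
        return new Session(currentRevision);
    }

    private synchronized long commit(Map<String, String> changes) {
        long revision = ++currentRevision;
        redoLog.add(new HashMap<>(changes));
        Map<String, String> before = new HashMap<>();
        for (Map.Entry<String, String> e : changes.entrySet()) {
            // E: copy the old value before overwriting (null = did not exist).
            // Assumption: always copy; a real store would copy only if some
            // open session still references the older revision.
            before.put(e.getKey(), mainStore.get(e.getKey()));
            mainStore.put(e.getKey(), e.getValue());
        }
        oldValues.put(revision, before);
        return revision;
    }

    /** Returns the value of an item as seen from the given base revision. */
    private synchronized String readAt(long baseRevision, String id) {
        // The earliest later revision that touched the item holds the value
        // that was current at baseRevision.
        for (Map<String, String> before : oldValues.tailMap(baseRevision, false).values()) {
            if (before.containsKey(id)) {
                return before.get(id);
            }
        }
        return mainStore.get(id);
    }

    public class Session {
        private long baseRevision;
        /** D (simplified): pending writes, dropped on refresh. */
        private final Map<String, String> pending = new HashMap<>();

        Session(long baseRevision) {
            this.baseRevision = baseRevision;
        }

        public String get(String id) {
            return pending.containsKey(id) ? pending.get(id) : readAt(baseRevision, id);
        }

        public void set(String id, String value) {
            pending.put(id, value);
        }

        /** Commits pending writes and moves the base revision forward. */
        public void save() {
            baseRevision = commit(pending);
            pending.clear();
        }

        /** Discards pending writes without touching the base revision. */
        public void refresh() {
            pending.clear();
        }
    }
}
```

In this sketch a session that logged in before another session's save keeps reading the old value until it saves itself, which is the snapshot behaviour I would expect from a base revision. Whether refresh should also move the base revision is exactly the open question from the Base Revision section, so the sketch leaves it unchanged.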
When to send large objects to the server: There are two use cases:

- Client is far away: keep changes on the client as long as possible, send batches
- Client is close by: avoid temporary copies of large (binary) data on the client

Both cases should be supported, but which one is more important?

Thomas