Re: Next Generation Persistence

Thomas Mueller Mon, 11 Jun 2007 09:38:18 -0700

Hi,

It sounds like MVCC (multiversion concurrency control) in databases.
Question: in your view, do we need to change the PersistenceManager
API as well?


* Removing nodes *
Last section: when removing B, the child C is removed as well. Is it
important to say 'remove C', and not 'remove B including children' in
the revision? What would happen if another session added a child D
to C in the meantime, and committed this change? If there is no locking,
how would large deletes / updates be done?
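
To make the question concrete, here is a minimal sketch of one possible
answer, assuming the revision records 'remove B including children' as
a single operation and the commit is rejected if anything in the
subtree changed after the session's base revision. All names
(RevisionStore, commitChange, commitRemoveSubtree) are hypothetical and
do not come from Jackrabbit or from the proposal.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; not an existing Jackrabbit class. */
class RevisionStore {
    // path -> revision number in which the item at that path last changed
    private final Map<String, Long> lastChanged = new HashMap<>();
    private long head = 0;

    synchronized long head() { return head; }

    /** Commits an add or update of a single item; returns the new revision. */
    synchronized long commitChange(String path) {
        head++;
        lastChanged.put(path, head);
        return head;
    }

    /**
     * Removes the subtree rooted at path, but only if no item at or below
     * it changed after baseRevision. Returns false on conflict, in which
     * case nothing is removed.
     */
    synchronized boolean commitRemoveSubtree(String path, long baseRevision) {
        String prefix = path + "/";
        for (Map.Entry<String, Long> e : lastChanged.entrySet()) {
            String p = e.getKey();
            boolean inSubtree = p.equals(path) || p.startsWith(prefix);
            if (inSubtree && e.getValue() > baseRevision) {
                return false; // a concurrent commit touched the subtree
            }
        }
        lastChanged.keySet().removeIf(p -> p.equals(path) || p.startsWith(prefix));
        head++;
        return true;
    }

    synchronized boolean exists(String path) {
        return lastChanged.containsKey(path);
    }
}
```

In this sketch, a session that read A/B/C at revision 3 and then tries
to remove B fails if another session committed A/B/C/D as revision 4.
The check scans every known path, so it does not by itself answer how
large deletes could be made cheap.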

* Revision scope *
In my opinion, the scope should be the entire repository.

* Base Revision *
The text talks about 'the base revision of the session' (can be
updated in some cases). But to support Item.refresh, a session needs
to keep multiple base revisions: one for the session, and one for each
refreshed Item (including children)?

The base revision of a session can optionally be changed when more recent 
revisions are persisted during the session lifetime.

Does 'optionally' mean it is a setting of the session? Is there a JCR
API feature to set this option?

* Persisting a Subtree *

If the operation fails, then the two new revisions are discarded and no changes 
are made to the session.

Item.save says: 'If validation fails, then no pending changes are
saved and they remain recorded on the Session'

* Workspace Operations *

If the operation succeeds, the session is updated to use the persisted revision 
as the new base revision.

I think the base revision of the session should not be updated in this case.

* Transactions *

This model can also easily support two-phase commits in a distributed 
transaction.

I agree, two-phase commit is no problem. However, XAResource.recover
(which obtains the list of prepared transactions) is tricky to
implement. As far as I know, it is currently not supported. If we want
it in the future, we had better think early about how to implement it.
I suggest we describe the problem and possible solutions, but wait with
the implementation until it is required.

* Namespace and Node Type Management *
Namespace and node type management in jcr:system: Good idea! However,
without custom data structures it will be slow (if the jcr:system
subtree is read whenever a namespace is resolved). What about custom data
structures that cache the latest state (and listen for changes to the
relevant subtree)? I guess only the latest version is relevant, or is
there a situation where older versions of node types / namespaces are
required? If yes, things will be complicated.
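
A minimal sketch of such a cache, assuming only the latest state is
needed and that the repository can notify a listener when the relevant
subtree changes. SubtreeListener and NamespaceCache are hypothetical
names, not JCR or Jackrabbit API; the callback stands in for whatever
change notification would actually be available.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical callback for changes below the namespace subtree. */
interface SubtreeListener {
    /** A null uri means the prefix was unregistered. */
    void namespaceChanged(String prefix, String uri);
}

/** Keeps the latest prefix -> URI mappings in memory. */
class NamespaceCache implements SubtreeListener {
    private final Map<String, String> prefixToUri = new ConcurrentHashMap<>();

    /** initial holds the mappings read once from the repository at startup. */
    NamespaceCache(Map<String, String> initial) {
        prefixToUri.putAll(initial);
    }

    @Override
    public void namespaceChanged(String prefix, String uri) {
        if (uri == null) {
            prefixToUri.remove(prefix);
        } else {
            prefixToUri.put(prefix, uri);
        }
    }

    /** Resolves a prefix without touching the repository; null if unknown. */
    String getURI(String prefix) {
        return prefixToUri.get(prefix);
    }
}
```

Resolving a namespace then costs one map lookup. If older versions of
the mappings were ever required, a single map like this would not be
enough.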

* Internal Data Structures *
I don't fully understand this section. I think this section should be
extended. In databases, the approach is usually:

A: There is a main store (where the 'base' revision of each item is
kept, indexed by item).
B: Committed revisions are stored sequentially in the redo log, which
can only be read sequentially.
C: Draft and old revisions are mainly kept in memory (saved to disk
only if memory runs out).
D: Each session keeps an undo log for uncommitted changes.
E: Committed revisions are persisted in the main store, and if a session
references an older revision of the same item, an in-memory copy is
made (copy on write, see C).

Of course we don't need to rebuild a database, but maybe reuse some ideas.
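
As a rough illustration of points A to E, here is a small in-memory
sketch. Store and Session are hypothetical names, values are plain
non-null strings, and nothing is written to disk. For simplicity it
always keeps a copy of the overwritten value, whereas E would only copy
when some session still references the older revision.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class Store {
    private final Map<String, String> main = new HashMap<>();   // A: latest committed value per item
    private final List<String> redoLog = new ArrayList<>();     // B: append-only log of commits
    // C/E: item -> (last revision in which the value was current -> value)
    private final Map<String, TreeMap<Long, String>> old = new HashMap<>();
    private long revision = 0;

    synchronized long revision() { return revision; }

    synchronized long commit(Map<String, String> changes) {
        revision++;
        for (Map.Entry<String, String> e : changes.entrySet()) {
            redoLog.add(revision + " " + e.getKey() + "=" + e.getValue());
            String previous = main.put(e.getKey(), e.getValue());
            // copy on write: keep the value that was current up to revision - 1
            old.computeIfAbsent(e.getKey(), k -> new TreeMap<>()).put(revision - 1, previous);
        }
        return revision;
    }

    /** Reads the value of item as it was at the given revision. */
    synchronized String read(String item, long atRevision) {
        TreeMap<Long, String> versions = old.get(item);
        if (versions != null) {
            Map.Entry<Long, String> e = versions.ceilingEntry(atRevision);
            if (e != null) return e.getValue();
        }
        return main.get(item);
    }

    synchronized int redoLogSize() { return redoLog.size(); }
}

class Session {
    private final Store store;
    private long base;
    private final Map<String, String> draft = new HashMap<>();
    // D: undo log of {item, previous draft value or null if there was none}
    private final Deque<String[]> undoLog = new ArrayDeque<>();

    Session(Store store) { this.store = store; this.base = store.revision(); }

    void set(String item, String value) {
        undoLog.push(new String[] { item, draft.get(item) });
        draft.put(item, value);
    }

    void undoLast() {
        String[] entry = undoLog.pop();
        if (entry[1] == null) draft.remove(entry[0]);
        else draft.put(entry[0], entry[1]);
    }

    String get(String item) {
        return draft.containsKey(item) ? draft.get(item) : store.read(item, base);
    }

    void commit() {
        base = store.commit(draft);
        draft.clear();
        undoLog.clear();
    }
}
```

A session opened before a commit keeps reading the old value through
its base revision, while a session opened afterwards reads the main
store directly.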

* Combined Revisions *
I don't understand why you would do that, but it is probably because
you have different internal data structures in mind.



When to send large objects to the server: There are two use cases:
- Client is far away: keep changes on the client as long as possible,
send batches
- Client is close by: avoid temporary copies of large (binary) data on
the client
Both cases should be supported, but which one is more important?

Thomas
