Re: Next Generation Persistence

Thomas Mueller Thu, 14 Jun 2007 09:17:09 -0700

Hi,

I would like to better understand the reasons for NGP. I found the
following issues in JIRA, but I think most of those problems can be
solved even without NGP. Are there any other issues to consider
(including issues without a JIRA entry)?


http://issues.apache.org/jira/browse/JCR-314
Allow concurrent writes on the PM. The root problem seems to be:
storing large binary objects blocks others?

http://issues.apache.org/jira/browse/JCR-926
Global data store for binaries (stream large objects early without
blocking others)

http://issues.apache.org/jira/browse/JCR-926
Multiple connections problem / Versioning operations.
Could be solved by using the same connection for versioning.

https://issues.apache.org/jira/browse/JCR-630
Versioning operations are not fully transactional.
Could be solved by using the same connection for versioning.

http://issues.apache.org/jira/browse/JCR-631
Change resources sequence during transaction commit.
Could be solved by using the same connection for versioning.

http://issues.apache.org/jira/browse/JCR-890
Concurrent read-only access to a session
Unrelated (multiple threads in one session; I would use synchronization)

http://issues.apache.org/jira/browse/JCR-851
Handling of binary properties (streams) in QValue interface: unrelated
to this discussion, SPI specific

I didn't find an open issue for: The search index is updated outside
of transactions. This doesn't feel right (I like consistency), but in
practice this is not a problem as long as all saved objects are in the
index: the query engine filters non-existing results. Is this correct?
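To make the filtering idea concrete, here is a minimal sketch in Java.
The names (IndexResultFilter, filterExisting) are hypothetical and this
is not Jackrabbit's actual query code. It assumes the index returns node
ids and that the set of committed node ids can be checked directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: drop index hits that point to nodes that do not exist. */
class IndexResultFilter {

    /**
     * Keeps only those index hits whose id is present in the committed
     * state, preserving the order returned by the index.
     */
    static List<String> filterExisting(List<String> indexHits, Set<String> committedIds) {
        List<String> result = new ArrayList<>();
        for (String id : indexHits) {
            if (committedIds.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }
}
```

With this approach a stale index entry is harmless because it is
filtered out, while a saved node that is missing from the index would
simply not be found. That is why the condition "all saved objects are in
the index" matters.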

What do you think about using the same connection for versioning and
regular access? I know it requires refactoring, and a new setting in
repository.xml. Anything else?

I found some more information about MVCC. It looks like PostgreSQL,
Oracle, and newer versions of MS-SQL Server work like this:

- Reading: read the 'base revision of the session' (writers don't block readers)
- Writing: lock the node for other writers, create a new 'version'

Using write locks avoids the following problem:

- Session A starts a transaction, updates Node 1 (x=4)
- Session B starts a transaction, updates Node 1 (x=5), commits (saves)
- Session A does some more work, tries to commit -> Exception

Theoretically, session A should catch the exception and retry. But
many applications expect it to work (it works now). Also, retrying
will not work if the transaction is long and Node 1 is updated a lot
by other sessions (let's say it is a counter). That's why I would use
locks for writes. MVCC is used for reading, so readers don't block
writers (like they do now?), resulting in good concurrency for most
situations.
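
As a rough illustration of the combination described above (MVCC for
reads, locks for writes), here is a minimal sketch in Java. The class
VersionedNode and its methods are hypothetical and not part of
Jackrabbit. It models a single integer property, uses plain revision
numbers, and lets tryLock return false instead of waiting, which a real
implementation would probably handle differently.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: one node property with versioned reads and a single-writer lock. */
class VersionedNode {
    // revision number -> value committed at that revision
    private final TreeMap<Long, Integer> versions = new TreeMap<>();
    // session that currently holds the write lock, or null if unlocked
    private Object writer;

    VersionedNode(long revision, int value) {
        versions.put(revision, value);
    }

    /**
     * Returns the newest value that is not newer than the reader's base
     * revision. The write lock is not checked, so a writer holding the
     * lock does not block readers.
     */
    synchronized int read(long baseRevision) {
        Map.Entry<Long, Integer> entry = versions.floorEntry(baseRevision);
        if (entry == null) {
            throw new IllegalStateException("node does not exist at revision " + baseRevision);
        }
        return entry.getValue();
    }

    /** Tries to take the write lock; returns false if another session holds it. */
    synchronized boolean tryLock(Object session) {
        if (writer == null || writer == session) {
            writer = session;
            return true;
        }
        return false;
    }

    /** Stores a new version and releases the write lock. */
    synchronized void commit(Object session, long newRevision, int value) {
        if (writer != session) {
            throw new IllegalStateException("write lock not held by this session");
        }
        versions.put(newRevision, value);
        writer = null;
    }
}
```

In the scenario above, session B would fail to get the lock (or wait)
when it tries to update Node 1, instead of session A getting an
exception at commit time. A reader with an older base revision keeps
seeing the old value throughout.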

Explicit write locks: Sometimes an application doesn't need to update
a node but wants to ensure it's not updated by somebody else. This
feature is not that important; in databases, this is SELECT ... FOR
UPDATE, and most people don't really need it. This case is not
documented in the JCR API specs, but Jackrabbit could add a write lock
when calling Item.save() (even when no changes are made).

Thomas

P.S. If somebody wants to cross-post it to Lucene and Derby, feel
free. I think the requirements of Lucene and Derby are different, but
I might be wrong.
