Hi,

I would like to better understand the reasons for NGP. I found the following issues in JIRA, but I think most of those problems can be solved even without NGP. Are there any other issues to consider (including issues without a JIRA entry)?

http://issues.apache.org/jira/browse/JCR-314
Allow concurrent writes on the PM.
The root problem seems to be: storing large binary objects blocks others?

http://issues.apache.org/jira/browse/JCR-926
Global data store for binaries
(stream large objects early without blocking others)

http://issues.apache.org/jira/browse/JCR-926
Multiple connections problem / Versioning operations.
Could be solved by using the same connection for versioning.

https://issues.apache.org/jira/browse/JCR-630
Versioning operations are not fully transactional.
Could be solved by using the same connection for versioning.

http://issues.apache.org/jira/browse/JCR-631
Change resources sequence during transaction commit.
Could be solved by using the same connection for versioning.

http://issues.apache.org/jira/browse/JCR-890
Concurrent read-only access to a session
Unrelated (multiple threads in one session, I would use synchronize)

http://issues.apache.org/jira/browse/JCR-851
Handling of binary properties (streams) in QValue interface:
unrelated to this discussion, SPI specific

I didn't find an open issue for:
The search index is updated outside of transactions.
This doesn't feel right (I like consistency), but in practice this is not a problem as long as all saved objects are in the index: the query engine filters non-existing results. Is this correct?

What do you think about using the same connection for versioning and regular access? I know it requires refactoring, and a new setting in repository.xml. Anything else?

I found some more information about MVCC.
It looks like PostgreSQL, Oracle, and newer versions of MS-SQL Server work like this:

- Reading: read the 'base revision of the session' (writers don't block readers)
- Writing: lock the node for other writers, create a new 'version'

Using write locks avoids the following problem:

- Session A starts a transaction, updates Node 1 (x=4)
- Session B starts a transaction, updates Node 1 (x=5), commits (saves)
- Session A does some more work, tries to commit -> Exception

Theoretically, session A should catch the exception and retry. But many applications expect the commit to work (it works now). Also, retrying will not work if the transaction is long and Node 1 is updated a lot by other sessions (let's say it is a counter). That's why I would use locks for writes. MVCC is used for reading, so readers don't block writers (like they do now?), resulting in good concurrency for most situations.

Explicit write locks: Sometimes an application doesn't need to update a node but wants to ensure it's not updated by somebody else. This feature is not that important; in databases, this is SELECT ... FOR UPDATE, and most people don't really need it. This case is not documented in the JCR API specs, but Jackrabbit could add a write lock when calling Item.save() (even when no changes are made).

Thomas

P.S. If somebody wants to cross-post it to Lucene and Derby, feel free. I think the requirements of Lucene and Derby are different, but I might be wrong.
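
P.P.S. To make the Session A / Session B scenario above more concrete, here is a minimal single-threaded Java sketch of "MVCC for reads, locks for writes". It is only an illustration under my own assumptions, not Jackrabbit code: the Store and Session classes, tryWrite(), and the "node1" key are all made up for this example, and a real implementation would block the second writer (or time out) instead of returning false.

```java
import java.util.HashMap;
import java.util.Map;

public class WriteLockSketch {

    /** Hypothetical in-memory store: committed values plus per-node write locks. */
    static class Store {
        final Map<String, Integer> committed = new HashMap<>();
        final Map<String, Session> writeLocks = new HashMap<>();
    }

    /** Hypothetical session: reads from its base revision, writes under a node lock. */
    static class Session {
        final Store store;
        final Map<String, Integer> snapshot;            // 'base revision of the session'
        final Map<String, Integer> pending = new HashMap<>();

        Session(Store store) {
            this.store = store;
            this.snapshot = new HashMap<>(store.committed);
        }

        /** Readers never look at locks, so writers don't block readers. */
        Integer read(String node) {
            if (pending.containsKey(node)) {
                return pending.get(node);
            }
            return snapshot.get(node);
        }

        /** Returns false if another session holds the write lock (a real system would wait). */
        boolean tryWrite(String node, int value) {
            Session owner = store.writeLocks.get(node);
            if (owner != null && owner != this) {
                return false;
            }
            store.writeLocks.put(node, this);
            pending.put(node, value);
            return true;
        }

        /** Publishes pending changes and releases all locks held by this session. */
        void commit() {
            store.committed.putAll(pending);
            store.writeLocks.values().removeIf(owner -> owner == this);
            pending.clear();
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.committed.put("node1", 1);

        Session a = new Session(store);
        Session b = new Session(store);

        System.out.println(a.tryWrite("node1", 4)); // true: A takes the write lock
        System.out.println(b.tryWrite("node1", 5)); // false: B has to wait for A
        System.out.println(b.read("node1"));        // 1: B still reads its base revision

        a.commit();                                  // A commits without an exception
        System.out.println(b.tryWrite("node1", 5)); // true: the lock is free again
        b.commit();
        System.out.println(store.committed.get("node1")); // 5
    }
}
```

The point of the sketch is that the conflict shows up when Session B tries to write, not when Session A commits, so a long transaction in Session A does not fail at the end.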
