[jira] [Issue Comment Edited] (JENA-41) Different policy for concurrency access in TDB supporting a single writer and multiple readers

Stephen Allen (JIRA) Mon, 21 Mar 2011 15:15:50 -0700

    [ https://issues.apache.org/jira/browse/JENA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009433#comment-13009433 ]


Stephen Allen edited comment on JENA-41 at 3/21/11 10:14 PM:
-------------------------------------------------------------

I think your idea about the DatasetGraph being the interface for transactions 
makes sense.  Transactional DatasetGraphs could also provide fallback behavior 
for legacy code by implementing autocommit transactions if the user called 
methods on a dataset that was not initialized in a transactionBegin() call.
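
A minimal sketch of the autocommit fallback described above.  All names here 
(AutocommitStore, transactionCommit(), transactionAbort(), add()) are 
hypothetical and are not the Jena or TDB API; the store is single-threaded and 
keeps its data in a plain list purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the Jena/TDB API. Single-threaded for simplicity.
class AutocommitStore {
    private final List<String> committed = new ArrayList<>();
    private List<String> pending = null; // non-null while a transaction is open
    private int autocommits = 0;

    void transactionBegin() {
        if (pending != null) throw new IllegalStateException("transaction already open");
        pending = new ArrayList<>(committed); // work on a private copy
    }

    void transactionCommit() {
        if (pending == null) throw new IllegalStateException("no open transaction");
        committed.clear();
        committed.addAll(pending);
        pending = null;
    }

    void transactionAbort() {
        pending = null; // discard uncommitted changes
    }

    // Legacy-style call: joins the open transaction if there is one,
    // otherwise wraps this single operation in its own transaction.
    void add(String quad) {
        if (pending != null) {
            pending.add(quad);
            return;
        }
        transactionBegin();
        try {
            pending.add(quad);
            transactionCommit();
            autocommits++;
        } finally {
            if (pending != null) transactionAbort(); // only reached on failure
        }
    }

    int committedSize() { return committed.size(); }
    int autocommitCount() { return autocommits; }
}
```

Calling add() outside a transaction commits immediately, while the same call 
made between transactionBegin() and transactionCommit() only becomes visible 
at commit time.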


With regard to the isolation levels, I believe some of the lower levels can 
make sense for particular applications or queries.  For example, say you want 
to know the size of a few graphs.

BEGIN READ_ONLY;
select count (*) where { graph <http://example/g1> { ?s ?p ?o . } } ;
select count (*) where { graph <http://example/g2> { ?s ?p ?o . } } ;
COMMIT;

Assuming a traditional pessimistic locking scheme, running the transaction at 
SERIALIZABLE could cause the locks held by the first select query to also be 
held through the second query, reducing concurrency (using two transactions 
instead might not be a good idea as there is usually some amount of overhead 
associated with creating and committing transactions).

If you were OK with the possibility that the two query results are not truly 
serializable with respect to each other, then you could improve concurrency by 
using a READ_COMMITTED isolation level instead, which would give serializable 
results for each query (but not for the whole transaction).  And if you really 
just needed a rough estimate of size, using READ_UNCOMMITTED may be able to 
avoid locking altogether.
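
The difference in lock duration can be sketched with a plain read/write lock. 
This is only an illustration under a pessimistic locking assumption: 
LockScopeDemo and its methods are hypothetical names, not Jena or TDB code, and 
a real READ_COMMITTED implementation may behave differently.  Each method 
returns whether a writer could have acquired the lock between the two queries.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; LockScopeDemo is not part of Jena or TDB.
class LockScopeDemo {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Probe from a separate thread: could a writer take the lock right now?
    private boolean writerCouldRun() {
        boolean[] acquired = new boolean[1];
        Thread writer = new Thread(() -> {
            if (lock.writeLock().tryLock()) {
                acquired[0] = true;
                lock.writeLock().unlock();
            }
        });
        writer.start();
        try {
            writer.join(); // join makes the writer's result visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return acquired[0];
    }

    private void lockedRun(Runnable query) {
        lock.readLock().lock();
        try {
            query.run();
        } finally {
            lock.readLock().unlock();
        }
    }

    // SERIALIZABLE-style: one read lock spans both queries.
    boolean serializableStyle(Runnable q1, Runnable q2) {
        lock.readLock().lock();
        try {
            q1.run();
            boolean writerGotIn = writerCouldRun();
            q2.run();
            return writerGotIn;
        } finally {
            lock.readLock().unlock();
        }
    }

    // READ_COMMITTED-style: each query takes and releases its own read lock.
    boolean readCommittedStyle(Runnable q1, Runnable q2) {
        lockedRun(q1);
        boolean writerGotIn = writerCouldRun();
        lockedRun(q2);
        return writerGotIn;
    }
}
```

In the first style the writer is blocked for the whole transaction; in the 
second it can run between the two counts, which is exactly why the two results 
may not be serializable with respect to each other.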

An additional motivating factor for MVCC implementations is that they may be 
implementing snapshot isolation, which probably maps better to REPEATABLE_READ 
than SERIALIZABLE (especially if it could do predicate locking for true 
serializable behavior but allow cheaper snapshot isolation if that was all that 
was needed).  The Postgres documentation does a good job of describing this [1].
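
To make the gap between snapshot isolation and SERIALIZABLE concrete, here is a 
small single-threaded sketch of the well-known "write skew" anomaly.  The 
SnapshotStore class is a hypothetical toy, not TDB or Postgres code: each 
transaction reads from a private snapshot, and a commit is rejected only if 
another transaction has committed a write to the same key since the snapshot 
was taken.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy MVCC store; illustrates snapshot isolation only.
class SnapshotStore {
    private final Map<String, Integer> values = new HashMap<>();
    private final Map<String, Long> lastWrite = new HashMap<>();
    private long clock = 0;

    void put(String key, int value) {
        values.put(key, value);
        lastWrite.put(key, ++clock);
    }

    int get(String key) { return values.get(key); }

    Tx begin() { return new Tx(); }

    class Tx {
        private final Map<String, Integer> snapshot = new HashMap<>(values);
        private final Map<String, Integer> writes = new HashMap<>();
        private final long start = clock;

        // Reads see this transaction's own writes, then the snapshot.
        int read(String key) {
            return writes.containsKey(key) ? writes.get(key) : snapshot.get(key);
        }

        void write(String key, int value) { writes.put(key, value); }

        // First committer wins, but only for overlapping write sets.
        boolean commit() {
            for (String key : writes.keySet()) {
                if (lastWrite.getOrDefault(key, 0L) > start) return false;
            }
            clock++;
            for (Map.Entry<String, Integer> e : writes.entrySet()) {
                values.put(e.getKey(), e.getValue());
                lastWrite.put(e.getKey(), clock);
            }
            return true;
        }
    }
}
```

If two transactions each check that x + y > 1 on their own snapshot and then 
one sets x to 0 while the other sets y to 0, both commit because their write 
sets do not overlap, yet no serial order of the two would leave both values at 
0.  Catching that case needs something extra, such as the predicate locking 
mentioned above.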

I would find it useful to have multiple isolation levels available (even if 
internally I'm mapping them all to SERIALIZABLE at first).  The four ANSI 
isolation levels seem appropriate, and remember that implementations are 
allowed to map unavailable lower levels to higher levels as desired.
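
A minimal sketch of that upward mapping, assuming the four ANSI levels are 
ordered from weakest to strongest.  IsolationMapper, Level and effective() are 
hypothetical names, not part of any existing Jena interface.

```java
import java.util.Set;

// Hypothetical sketch; not an existing Jena or TDB type.
class IsolationMapper {
    // Declared weakest to strongest so that enum order reflects strength.
    enum Level { READ_UNCOMMITTED, READ_COMMITTED, REPEATABLE_READ, SERIALIZABLE }

    // Returns the weakest supported level that is at least as strong
    // as the requested one.
    static Level effective(Level requested, Set<Level> supported) {
        for (Level level : Level.values()) {
            if (level.compareTo(requested) >= 0 && supported.contains(level)) {
                return level;
            }
        }
        throw new IllegalArgumentException(
            "no supported level at or above " + requested);
    }
}
```

An implementation that only supports SERIALIZABLE would then run every request 
at SERIALIZABLE, and could add cheaper levels later without changing callers.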


[1] http://developer.postgresql.org/pgdocs/postgres/transaction-iso.html



      was (Author: sallen):
    I think your idea about the DatasetGraph being the interface for 
transactions makes sense.  Transactional DatasetGraphs could also provide 
fallback behavior for legacy code by implementing autocommit transactions if 
the user called methods on a dataset that was not initialized in a 
transactionBegin() call.


With regard to the isolation levels, I believe some of the lower levels can 
make sense for particular applications or queries.  For example, say you want 
to know the size of a few graphs.

BEGIN READ_ONLY;
select count (*) where { graph <http://example/g1> { ?s ?p ?o . } } ;
select count (*) where { graph <http://example/g2> { ?s ?p ?o . } } ;
COMMIT;

Assuming a traditional pessimistic locking scheme, running the transaction at 
SERIALIZABLE could cause the locks held by the first select query to also be 
held through the second query, reducing concurrency (using two transactions 
instead might not be a good idea as there is usually some amount of overhead 
associated with creating and committing transactions).

If you were OK with the possibility that the two query results are not truly 
serializable with respect to each other, then you could improve concurrency by 
using a READ_COMMITTED isolation level instead, which would give serializable 
results for each query (but not for the whole transaction).  And if you really 
just needed a rough estimate of size, using READ_UNCOMMITTED may be able to 
avoid locking altogether.

An additional motivating factor for MVCC implementations is that they may be 
implementing snapshot isolation, which probably maps better to READ_COMMITTED 
than SERIALIZABLE (especially if it could do predicate locking for true 
serializable behavior but allow cheaper snapshot isolation if that was all that 
was needed).  The Postgres documentation does a good job of describing this [1].

I would find it useful to have multiple isolation levels available (even if 
internally I'm mapping them all to SERIALIZABLE at first).  The four ANSI 
isolation levels seem appropriate, and remember that implementations are 
allowed to map unavailable lower levels to higher levels as desired.


[1] http://developer.postgresql.org/pgdocs/postgres/transaction-iso.html


  
> Different policy for concurrency access in TDB supporting a single writer and 
> multiple readers
> ----------------------------------------------------------------------------------------------
>
>                 Key: JENA-41
>                 URL: https://issues.apache.org/jira/browse/JENA-41
>             Project: Jena
>          Issue Type: New Feature
>          Components: Fuseki, TDB
>            Reporter: Paolo Castagna
>         Attachments: Transaction.java, TransactionHandle.java, 
> TransactionHandler.java, TransactionManager.java, 
> TransactionManagerBase.java, TransactionalDatasetGraph.java
>
>
> As a follow up to a discussion about "Concurrent updates in TDB" [1] on the 
> jena-users mailing list, I am creating this as a new feature request.
> Currently TDB requires developers to use a Multiple Reader or Single Writer 
> (MRSW) locking policy for concurrent access [2]. Not doing this could cause 
> data corruption.
> The MRSW policy is in fact MR xor SW (i.e. while a writer has a lock, no 
> readers are allowed and, similarly, if a reader has a lock, no writes are 
> possible).
> This works fine in most situations, but there might be problems in the 
> presence of long writes or long reads.
> It has been suggested that a "journaled file access" could be used to solve 
> the issue regarding a long write blocking reads.
>  [1] http://markmail.org/message/jnqm6pn32df4wgte
>  [2] http://openjena.org/wiki/TDB/JavaAPI#Concurrency

