On 09/10/16 02:10, Stian Soiland-Reyes (JIRA) wrote:
Stian Soiland-Reyes created COMMONSRDF-45:
---------------------------------------------

             Summary: Support longer-running RDF4J transactions?
                 Key: COMMONSRDF-45
                 URL: https://issues.apache.org/jira/browse/COMMONSRDF-45
             Project: Apache Commons RDF
          Issue Type: Wish
          Components: rdf4j
    Affects Versions: 0.3.0
            Reporter: Stian Soiland-Reyes


RDF4J operations like Graph.add() use an internal RepositoryConnection that is 
closed on every method call.

c.f. HTTP.


This could cause a performance hit.

This task is to investigate how big that hit is for different backends, and to 
propose an alternative to support longer-running transactions.

For instance, one alternative would be:

{code}
Dataset g = rdf4j.createDataset();

try (TransactionalDataset t = g.begin()) {
  t.add(triple1);
  t.add(triple2);
  t.remove(triple3);

  t.commit();
  // or
  // t.abort()
}
{code}

A Java8 approach:

http://jena.staging.apache.org/documentation/txn/txn.html

It works on anything that is "transactional" rather than being tied to a dataset (or graph? Actually, enforcing the unit to be the dataset makes sense).
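
To make the lambda style concrete, here is a minimal self-contained sketch. It is not the Jena Txn code and not a proposed Commons RDF API: the names Transactional, executeWrite and ToyStore are all made up for illustration, and the store is a toy in-memory list. The helper commits if the action returns normally and aborts if it throws.

```java
import java.util.ArrayList;
import java.util.List;

final class TxnSketch {
    /** Hypothetical minimal contract: anything offering begin/commit/abort/end. */
    interface Transactional {
        void begin();
        void commit();
        void abort();
        void end();
    }

    /** Commit when the action returns normally, abort when it throws. */
    static void executeWrite(Transactional txn, Runnable action) {
        txn.begin();
        try {
            action.run();
            txn.commit();
        } catch (RuntimeException e) {
            txn.abort();
            throw e;
        } finally {
            txn.end();
        }
    }

    /** Toy store (assumption): changes are buffered and only visible after commit. */
    static class ToyStore implements Transactional {
        final List<String> committed = new ArrayList<>();
        private List<String> pending;

        public void begin() { pending = new ArrayList<>(committed); }
        public void commit() {
            if (pending == null) return; // already aborted explicitly: nothing to do
            committed.clear();
            committed.addAll(pending);
            pending = null;
        }
        public void abort() { pending = null; }
        public void end() { pending = null; }
        void add(String triple) { pending.add(triple); }
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        executeWrite(store, () -> { store.add("triple1"); store.add("triple2"); });
        try {
            executeWrite(store, () -> {
                store.add("triple3");
                throw new IllegalStateException("boom");
            });
        } catch (IllegalStateException expected) {
            // the second transaction was aborted, so triple3 is not visible
        }
        System.out.println(store.committed); // prints [triple1, triple2]
    }
}
```

Nothing in executeWrite knows about datasets or graphs; it only needs the four lifecycle methods, which is the point of working on anything "transactional".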

where TransactionalDataset is a subtype of Dataset with a shared connection. Here 
modifications in t won't be visible in g before the commit - but I guess we 
could expose some of the different transaction isolation levels from RDF4J.

If the goal of CommonsRDF is common functionality across many systems, then adopting a simple model for transactions would be better than trying to reconcile all possibilities, both those that exist now and those that may come along.

This could call abort() if an exception is thrown. Perhaps .commit() would be the default 
if everything is OK and so not needed explicitly - but that could cause unnecessary 
"empty commits" for read-only transactions (e.g. using .contains() or 
.iterate())

Surely that's an implementation issue? No change => very little work on commit. Read-only transactions in the implementation have a faster-path commit (cleanup). In Jena (TIM and TDB), read transactions, or unpromoted general transactions, have a very low exit cost (often a ThreadLocal unset). In lock-based systems (TIM and TDB use lock-free transactions) there might be a bit more work in releasing all the latches.
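
As a rough illustration of "no change => very little work on commit", here is a toy sketch. It is not how Jena or RDF4J implement it; ToyTransaction and its flushes counter are invented for the example. Commit returns immediately when the transaction has buffered no writes.

```java
import java.util.HashSet;
import java.util.Set;

final class CheapCommitSketch {
    /** Hypothetical transaction that buffers additions until commit. */
    static class ToyTransaction {
        private final Set<String> store;
        private final Set<String> added = new HashSet<>();
        int flushes = 0; // counts how often commit had real work to do

        ToyTransaction(Set<String> store) { this.store = store; }

        boolean contains(String triple) {
            return store.contains(triple) || added.contains(triple);
        }

        void add(String triple) { added.add(triple); }

        void commit() {
            if (added.isEmpty()) {
                return; // read-only so far: fast path, nothing to write
            }
            store.addAll(added);
            added.clear();
            flushes++;
        }
    }

    public static void main(String[] args) {
        Set<String> store = new HashSet<>();
        store.add("triple1");

        ToyTransaction reader = new ToyTransaction(store);
        reader.contains("triple1");
        reader.commit(); // implicit commit of a read-only transaction
        System.out.println(reader.flushes); // prints 0

        ToyTransaction writer = new ToyTransaction(store);
        writer.add("triple2");
        writer.commit();
        System.out.println(writer.flushes); // prints 1
    }
}
```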

Txn implicitly commits and copes with explicit abort.

Iteration is fun. It's a great way to leak state out of the transaction context.
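
A small sketch of the leak, with a toy connection standing in for a real one (ToyConnection, leaky and safe are hypothetical names, not RDF4J or Commons RDF API): a lazy iterator handed out of the try block is only consumed after the connection is closed, whereas copying the results inside the block keeps everything within the transaction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class IterationLeakSketch {
    /** Toy connection (assumption): its iterators stop working once it is closed. */
    static class ToyConnection implements AutoCloseable {
        private final List<String> data = Arrays.asList("triple1", "triple2");
        private boolean open = true;

        Iterator<String> iterate() {
            final Iterator<String> inner = data.iterator();
            return new Iterator<String>() {
                @Override public boolean hasNext() { check(); return inner.hasNext(); }
                @Override public String next() { check(); return inner.next(); }
            };
        }

        private void check() {
            if (!open) throw new IllegalStateException("connection closed");
        }

        @Override public void close() { open = false; }
    }

    /** Leaks: the iterator escapes and is first used after close(). */
    static Iterator<String> leaky() {
        try (ToyConnection c = new ToyConnection()) {
            return c.iterate();
        }
    }

    /** Safe: results are materialised while the connection is still open. */
    static List<String> safe() {
        try (ToyConnection c = new ToyConnection()) {
            List<String> out = new ArrayList<>();
            c.iterate().forEachRemaining(out::add);
            return out;
        }
    }

    public static void main(String[] args) {
        System.out.println(safe()); // prints [triple1, triple2]
        try {
            leaky().hasNext();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints connection closed
        }
    }
}
```

Copying costs memory for large results, so this is one possible trade-off, not the only way to handle it.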

    Andy






