Re: Transactions versus threads

Paolo Castagna Wed, 28 Mar 2012 10:40:01 -0700

Hi Bernie,
I proposed an update for the current documentation; you can see a preview here:
http://jena.staging.apache.org/jena/documentation/tdb/tdb_transactions.html


It's just a small change, taking content from Andy's email, but I hope it
will help users:

- Dataset dataset = ...
+ Location location = ... ;
+ Dataset dataset =  TDBFactory.create(location) ;

This is how TDBFactory.create(location) is implemented:

    public static Dataset createDataset(Location location)
    { return createDataset(createDatasetGraph(location)) ; }

... which calls:

    public static DataSource create(DatasetGraph dataset)
    { return (DataSource)DatasetImpl.wrap(dataset) ; }

... which in the end, currently, results in this in StoreConnection:

    public static synchronized StoreConnection make(Location location)
    {
        StoreConnection sConn = cache.get(location) ;
        if ( sConn != null )
            return sConn ;

        DatasetGraphTDB dsg = DatasetBuilderStd.build(location) ;
        sConn = _makeAndCache(dsg) ;
        return sConn ;
    }
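
For illustration only, the check-then-build caching in make() can be sketched
without any Jena classes. Everything below (FakeStore, LocationCache, the
example paths) is a made-up stand-in and not TDB code; it only shows why asking
for the same location twice hands back the same underlying store instead of
building it again:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the expensive on-disk store (not a TDB class).
final class FakeStore {
    final String location;
    FakeStore(String location) { this.location = location; }
}

// Hypothetical sketch of a check-then-build cache keyed by location.
final class LocationCache {
    private static final Map<String, FakeStore> cache = new HashMap<>();
    private static int builds = 0;

    // synchronized so two threads cannot both build a store for one location
    static synchronized FakeStore make(String location) {
        FakeStore store = cache.get(location);
        if (store != null)
            return store;               // cache hit: nothing is rebuilt
        store = new FakeStore(location);
        builds++;
        cache.put(location, store);
        return store;
    }

    // Number of stores actually constructed so far.
    static synchronized int buildCount() { return builds; }
}
```

In this sketch, calling LocationCache.make(...) twice with the same path
returns the very same object, and buildCount() only goes up when a new path is
seen.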

You can have a look at it yourself, starting from here:
http://svn.apache.org/repos/asf/incubator/jena/Jena2/TDB/tags/jena-tdb-0.9.0-incubating/src/main/java/com/hp/hpl/jena/tdb/TDBFactory.java

By the way, the documentation is in SVN and patches for the website are more
than welcome! ;-) Have a look here in the content/jena/ directory:
http://svn.apache.org/repos/asf/incubator/jena/site/trunk/

Hopefully, creating a new Dataset each time for each of your threads is
the solution to your problems (and you can also measure how fast/slow
it is to just create a new Dataset object :-)).
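
To make the "one object per thread" idea concrete, here is a simplified,
Jena-free model. The classes SharedStore, Handle and PerThreadDemo are
hypothetical, and the rule that a handle tracks at most one transaction at a
time is an assumption made for this sketch, not a description of TDB internals.
It shows two threads each creating their own cheap handle while both end up on
the same backing store:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical shared backing store, one instance per location (not TDB).
final class SharedStore {
    private static final ConcurrentHashMap<String, SharedStore> STORES =
            new ConcurrentHashMap<>();

    static SharedStore forLocation(String location) {
        return STORES.computeIfAbsent(location, k -> new SharedStore());
    }
}

// Hypothetical lightweight handle; assumed to hold one transaction at a time.
final class Handle {
    final SharedStore store;
    private String mode;    // null when no transaction is active

    Handle(String location) { this.store = SharedStore.forLocation(location); }

    void begin(String newMode) {
        if (mode != null)
            throw new IllegalStateException("handle already in a transaction");
        mode = newMode;
    }

    void end() { mode = null; }
}

final class PerThreadDemo {
    // Each task creates its own handle instead of sharing one across threads.
    static Handle runTask(String location, String mode) {
        Handle h = new Handle(location);
        h.begin(mode);
        try {
            // ... read or write through the handle here ...
        } finally {
            h.end();
        }
        return h;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Handle> reader = pool.submit(() -> runTask("/data/db1", "READ"));
            Future<Handle> writer = pool.submit(() -> runTask("/data/db1", "WRITE"));
            Handle r = reader.get();
            Handle w = writer.get();
            System.out.println("distinct handles: " + (r != w));
            System.out.println("same store: " + (r.store == w.store));
        } finally {
            pool.shutdown();
        }
    }
}
```

In this model a single handle shared between a reader and a writer would hit
the IllegalStateException, while separate handles never collide.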

In the meantime, thanks for your time, patience and feedback.

Andy, are you reluctant to have TDBFactory.create(location) in the
documentation and/or is there a plan (that I am not aware of) to change the
way to create Dataset objects in TDB? If that is the case, we could still
have TDBFactory.create(location) in the documentation, but add a NOTE/WARNING
that this can change.

Thanks,
Paolo

Bernie Greenberg wrote:
> This is really news.  Then what is the right-size object to hold around for
> an on-disk data store to represent an opened database, such that each call
> on the server doesn't have to Dataset dataset1 =
> TDBFactory.create(location) anew, or is that so cheap that each call on the
> server should do it to access the dataset?
> 
> On Wed, Mar 28, 2012 at 12:31 PM, Paolo Castagna <
> [email protected]> wrote:
> 
>> Hi Andy,
>> thanks for this reply, I find it very useful.
>>
>> As you said, our documentation is not that clear and perhaps we should
>> improve it with content from this email.
>>
>> In particular, I think having in the documentation:
>>
>>  Dataset dataset =  ... ;
>>
>> is not really helping users.
>>
>> Need to go off-line for a bit, but I'll propose changes to the document:
>> http://incubator.apache.org/jena/documentation/tdb/tdb_transactions.html
>>
>> If the problem is just in the documentation, it's much easier to fix. ;-)
>>
>> Thanks again,
>> Paolo
>>
>> Andy Seaborne wrote:
>>> I can see a way it might go wrong if you are using the same dataset Java
>>> object in  "dataset.begin(READ)", "dataset.begin(WRITE)".  It will
>>> switch the first one to the second transaction, and that will trigger
>>> the concurrency check.
>>>
>>> (The documentation does not explain this, and indeed, is almost
>>> misleading on the subject.)
>>>
>>> Instead, use an app idiom of one dataset per thread. The "Dataset" concept
>>> incorporates the JDBC connection concept so it's like Connection pools.
>>>
>>> I'll add some checking code as well.
>>>
>>>
>>> A way to use transactions that should work in the released system is to
>>> do this on one thread:
>>>
>>>    Dataset dataset1 = TDBFactory.create(location) ;
>>>    dataset1.begin(READ) ;
>>>    ...
>>>
>>> and on the other thread:
>>>
>>>    Dataset dataset2 = TDBFactory.create(location) ;
>>>    dataset2.begin(WRITE) ;
>>>    ...
>>>
>>> i.e. different dataset objects (they get backed by the same safe
>>> datastorage).
>>>
>>> I may be able to come up with a cleaner solution but I'd (mildly) prefer
>>> not to use thread local variables as it's only a partial fix.
>>>
>>> Datasets are quite cheap to create and TDBFactory.create(location) gets
>>> all the caching right.
>>>
>>> If, for some reason, you are using in-memory TDB databases, then the use
>>> of "named memory locations" should work.  Location.mem("X").
>>>
>>>     Andy
>>
>
