On 03/04/12 17:25, Bernie Greenberg wrote:
While I still haven't got an answer from this list about whether it was
really true that one has to close (and not reuse) dataset objects after a
committed or aborted "write" transaction, I did get an answer from my code,
as it were, and a surprising one, at that.
I found that at app shutdown time, I had nothing (with respect to Jena) to
do, as I had already, in each thread which had created a dataset in
response to a request for some kind of service, closed that dataset. Since
the datasets were not drawn from an open "master" object representing the
open store, but from a static source, there is no "master" object to close
down.
If the threads all had their hands on persistent, open dataset objects,
which each (according to your documentation and my own experience) can only
be used in that one thread, I would have a difficult problem causing those
threads (which may be asleep in a web or other server) to wake up to close
the pointer (yes, there may be a "thread close time" hook or the like, but
as I have it, I don't need one).
This all seems consistent with what we have transacted here before and
consistent with my understanding of transaction semantics, and seems to
work; please let me know if you think I'm overlooking something.
Thanks
Bernie
When a (Java in-memory object for a) Dataset is used transactionally, it
must be used only transactionally. I think you are only using
transactions so no issues around here. People using datasets "old
world" non-transactionally get old-world semantics - they need to be
sync'ed.
There's no harm syncing a transactional dataset (it does not do anything).
With transactions, no clear-up after .end() is needed. (A writer that
calls .commit() or .abort() doesn't also require .end(), though it's
better style to always call .end() in a "finally {}" block.)
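Here is a minimal sketch of the "always call .end() in finally" style. In Jena the real calls are Dataset.begin(ReadWrite.WRITE), .commit(), .abort() and .end(). To keep the sketch runnable without Jena, ToyTxnDataset and TxnPattern are hypothetical stand-ins that only record which calls were made. They assume that .end() after .commit() is harmless and that .end() without a commit aborts the writer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a transactional dataset; NOT the Jena API.
// It records lifecycle calls so the begin / commit / finally-end shape
// can be run and checked without Jena on the classpath.
class ToyTxnDataset {
    final List<String> log = new ArrayList<>();
    private boolean inTxn = false;
    private boolean finished = false;

    void begin() {
        if (inTxn) throw new IllegalStateException("already in a transaction");
        inTxn = true;
        finished = false;
        log.add("begin");
    }

    void commit() { finished = true; log.add("commit"); }

    void abort() { finished = true; log.add("abort"); }

    void end() {
        if (!inTxn) return;               // calling end() twice is harmless
        if (!finished) log.add("abort");  // assumption: no commit means the writer aborts
        inTxn = false;
        log.add("end");
    }

    boolean isInTransaction() { return inTxn; }
}

class TxnPattern {
    // Runs work inside a write transaction; end() always runs via finally,
    // even when the work throws before commit() is reached.
    static void write(ToyTxnDataset ds, Runnable work) {
        ds.begin();
        try {
            work.run();
            ds.commit();
        } finally {
            ds.end();
        }
    }
}
```

With the finally block, a failure inside the work still leaves the dataset outside any transaction, so the same object can be used for the next request.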
When .commit() happens, the journal is written (append-only), with a
commit record. The changes are written to the main dataset at some time
when it's quiet. That may be when the .commit() happens or it may be
later; it does not matter, because the bytes are on disk and the change
is permanent.
Any transaction starting after the .commit() sees the changes, either from
the real storage or from the not-yet-flushed transaction state. The system
handles that.
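The following toy model shows how a transaction starting after .commit() sees the change whether or not it has been written back yet. It assumes storage is just a map from block id to block contents. JournalStore and its methods are hypothetical, not TDB classes. A real commit also writes a commit record and syncs the journal to disk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy, not TDB's implementation: block id -> block contents.
class JournalStore {
    final Map<Integer, String> mainStorage = new HashMap<>();
    // Append-only journal: one entry per committed transaction, in order.
    final List<Map<Integer, String>> journal = new ArrayList<>();

    // Commit appends the transaction's new block images to the journal.
    void commit(Map<Integer, String> newBlocks) {
        journal.add(new HashMap<>(newBlocks));
    }

    // Reads consult committed-but-unflushed state first (newest entry wins),
    // then fall back to main storage.
    String read(int blockId) {
        for (int i = journal.size() - 1; i >= 0; i--) {
            String image = journal.get(i).get(blockId);
            if (image != null) return image;
        }
        return mainStorage.get(blockId);
    }

    // Later, "when it's quiet": copy journal entries into main storage in order.
    void writeBack() {
        for (Map<Integer, String> txn : journal) mainStorage.putAll(txn);
        journal.clear();
    }
}
```

A reader gets the same answer before and after writeBack(). Only the place the bytes come from changes.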
If the app exits before the journal is fully written to the main dataset
(strictly - "is known to have been written back"), then on next startup,
the journal is flushed and the changes have become permanent in the main
storage.
If the system crashes during write-back, then the changes are still in
the journal - it just writes them again on next recovery. The key point
is that the journal contains the new state of the data (as blocks) and
not diffs. If it were diffs, then it would have to read the old state
to calculate the new state. By recording new state only, it can simply
keep trying to write until it succeeds regardless of power cycling and
crashes. The journal is a sequence of idempotent changes.
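A small hypothetical example shows why journaling new state differs from journaling diffs. ReplayDemo is not part of TDB, and block contents are reduced to a single integer. If a crash interrupts write-back, the system replays the same entry again. A new-state entry gives the same result every time, but a diff does not.

```java
import java.util.Map;

// Hypothetical illustration only; block contents reduced to an Integer.
class ReplayDemo {
    // Journal entry holding the NEW state of each block: replay overwrites,
    // so applying it once or many times leaves the same result (idempotent).
    static void replayImages(Map<Integer, Integer> storage, Map<Integer, Integer> images) {
        storage.putAll(images);
    }

    // Contrast: a diff entry ("add delta to the old value") depends on the
    // old state, so replaying it after a partial write-back corrupts data.
    static void replayDiffs(Map<Integer, Integer> storage, Map<Integer, Integer> deltas) {
        for (Map.Entry<Integer, Integer> e : deltas.entrySet()) {
            storage.merge(e.getKey(), e.getValue(), Integer::sum);
        }
    }
}
```

With a starting value of 10, applying the image 15 twice leaves 15, whereas applying the diff +5 twice leaves 20.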
TDB uses write-ahead logging. There is nothing to do on abort except
forget about it. There are no undo actions, no write-behind logging.
Update of the storage is:
1. Write the journaled changes from the log to the main storage.
2. Sync the storage.
3. Truncate the log to zero length.
It's the truncate that records the fact all transactions have been
flushed back to the real dataset.
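The write-back steps and the recovery rule can be sketched together. WalStore is a hypothetical toy, not TDB's implementation. sync() is a placeholder for forcing data to disk. The crashBeforeTruncate flag simulates a failure after the log has been copied but before it is truncated.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy write-ahead store. "Durable" state is mainStorage plus
// journal; pending holds the current writer's changes in memory only.
class WalStore {
    final Map<Integer, String> mainStorage = new HashMap<>();
    final List<Map<Integer, String>> journal = new ArrayList<>();
    private Map<Integer, String> pending = new HashMap<>();

    void write(int block, String image) { pending.put(block, image); }

    // Write-ahead: new block images go to the log before main storage.
    void commit() {
        journal.add(pending);
        pending = new HashMap<>();
    }

    // Abort: nothing durable was written, so just forget the changes.
    void abort() { pending = new HashMap<>(); }

    void writeBack(boolean crashBeforeTruncate) {
        for (Map<Integer, String> txn : journal) mainStorage.putAll(txn); // 1. write log to storage
        sync();                                                           // 2. sync the storage
        if (crashBeforeTruncate) throw new IllegalStateException("simulated crash");
        journal.clear();                                                  // 3. truncate log to zero
    }

    // Startup: a non-empty log means write-back is not known to have
    // finished, so replay it (safe, because replay is idempotent).
    void recover() {
        if (!journal.isEmpty()) writeBack(false);
        pending = new HashMap<>(); // in-flight transactions implicitly abort
    }

    // Placeholder; a real store would force file data to disk here
    // (for example with java.nio.channels.FileChannel.force).
    private void sync() { }
}
```

The log is truncated only after a successful sync. A non-empty log at startup therefore signals that write-back is not known to have finished, and replaying it again is safe.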
This means the app has no shutdown actions to do. Any running
transactions implicitly abort.
Andy