On 03/04/12 17:25, Bernie Greenberg wrote:
While I still haven't got an answer from this list about whether it was
really true that one has to close (and not reuse) dataset objects after a
committed or aborted "write" transaction, I did get an answer from my code,
as it were, and a surprising one, at that.

I found that at app shutdown time, I had nothing (with respect to Jena) to
do, as I had already, in each thread which had created a dataset in
response to a request for some kind of service, closed that dataset. Since
the datasets were not drawn from an open "master" object representing the
open store, but from a static source, there is no "master" object to close
down.

If the threads all had their hands on persistent, open dataset objects,
which each (according to your documentation and my own experience) can only
be used in that one thread, I would have a difficult problem causing those
threads (which may be asleep in a web or other server) to wake up to close
the pointer (yes, there may be a "thread close time" hook or the like, but
as I have it, I don't need one).

This all seems consistent with what we have transacted here before and
consistent with my understanding of transaction semantics, and seems to
work; please let me know if you think I'm overlooking something.

Thanks
Bernie


When a Dataset (the in-memory Java object) is used transactionally, it must be used only transactionally. I think you are only using transactions, so there are no issues here. People using datasets "old world" (non-transactionally) get old-world semantics: they need to be sync'ed.

There's no harm in syncing a transactional dataset (it does not do anything).

With transactions, no cleanup after .end() is needed. (A writer that calls .commit() or .abort() doesn't strictly require .end(), but it's better style to always call .end() in a "finally{}" block.)

When .commit() happens, the journal is written (append only), with a commit record. The changes are written to the main dataset at some later time when it's quiet. That may be when the .commit() happens, it may not; it does not matter, because the bytes are on disk and the change is permanent.

Any transaction starting after the .commit() sees the changes, either from the real storage or from the unflushed transaction state. The system handles that.

If the app exits before the journal has been fully written to the main dataset (strictly, before it "is known to have been written back"), then on next startup the journal is flushed and the changes become permanent in the main storage.

If the system crashes during write-back, then the changes are still in the journal - it just writes them again on next recovery. The key point is that the journal contains the new state of the data (as blocks) and not diffs. If it were diffs, then it would have to read the old state to calculate the new state. By recording new state only, it can simply keep trying to write until it succeeds regardless of power cycling and crashes. The journal is a sequence of idempotent changes.
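
To illustrate the point about recording new state rather than diffs, here is a minimal sketch in plain Java. It is not TDB's code: JournalSketch, Entry, commit, read and writeBack are hypothetical names, and in-memory arrays stand in for the journal and the main storage. It only tries to show why replaying full-block records can safely be repeated after a crash.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only; all names are hypothetical, not TDB classes.
class JournalSketch {
    /** One journal record: the complete new contents of a block, not a diff. */
    static final class Entry {
        final int blockId;
        final byte[] newContents;

        Entry(int blockId, byte[] newContents) {
            this.blockId = blockId;
            this.newContents = newContents.clone();
        }
    }

    final byte[][] mainStore;                       // stands in for the main dataset
    final List<Entry> journal = new ArrayList<>();  // stands in for the on-disk journal

    JournalSketch(int blockCount, int blockSize) {
        mainStore = new byte[blockCount][blockSize];
    }

    /** Commit appends the new block states to the journal; the main store is untouched. */
    void commit(List<Entry> changes) {
        journal.addAll(changes);
    }

    /** A reader sees the newest journalled state of a block, else the main store. */
    byte[] read(int blockId) {
        for (int i = journal.size() - 1; i >= 0; i--) {
            if (journal.get(i).blockId == blockId) {
                return journal.get(i).newContents.clone();
            }
        }
        return mainStore[blockId].clone();
    }

    /**
     * Write-back: copy each journalled block over the main store, then truncate.
     * crashAfter simulates a crash after that many blocks; the journal is then
     * left intact, so the next call simply writes the same blocks again.
     */
    void writeBack(int crashAfter) {
        int written = 0;
        for (Entry e : journal) {
            if (written == crashAfter) {
                return; // simulated crash: no truncate happened
            }
            mainStore[e.blockId] = e.newContents.clone();
            written++;
        }
        journal.clear(); // the "truncate": everything is known to be written back
    }

    public static void main(String[] args) {
        JournalSketch s = new JournalSketch(2, 2);
        s.commit(Arrays.asList(new Entry(0, new byte[] {1, 2}),
                               new Entry(1, new byte[] {3, 4})));
        s.writeBack(1);                 // "crashes" after the first block
        s.writeBack(Integer.MAX_VALUE); // recovery: rewrites block 0, then block 1
        System.out.println(Arrays.toString(s.read(1)) + " journal=" + s.journal.size());
    }
}
```

Because each entry carries the whole block, writing block 0 twice gives the same result as writing it once. With diffs, the second attempt would need to know whether the first had already been applied.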

TDB uses write-ahead logging. There is nothing to do on abort except forget about it. There are no undo actions, no write-behind logging.

Update of the storage is:
  Write the logged changes to the storage
  Sync the storage
  Truncate the log to zero.

It's the truncate that records the fact that all transactions have been flushed back to the real dataset.
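
The write, sync, truncate sequence can be sketched with java.nio. This is an illustration under heavy assumptions, not TDB's implementation: WriteBackSketch and writeBack are hypothetical names, and the journal is simplified to hold the complete new image of a single storage file starting at offset 0.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of "write log to storage, sync, truncate log".
class WriteBackSketch {
    static void writeBack(Path journal, Path store) throws IOException {
        try (FileChannel j = FileChannel.open(journal,
                 StandardOpenOption.READ, StandardOpenOption.WRITE);
             FileChannel s = FileChannel.open(store,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {

            // 1. Write the logged bytes to the storage file.
            ByteBuffer buf = ByteBuffer.allocate((int) j.size());
            while (buf.hasRemaining() && j.read(buf) != -1) {
                // keep reading until the whole journal is in the buffer
            }
            buf.flip();
            long pos = 0;
            while (buf.hasRemaining()) {
                pos += s.write(buf, pos);
            }

            // 2. Sync the storage so the bytes are really on disk.
            s.force(true);

            // 3. Only now truncate the journal. A crash before this line leaves
            //    the journal intact, and the next startup repeats steps 1 and 2.
            j.truncate(0);
            j.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("writeback-sketch");
        Path journal = dir.resolve("journal");
        Path store = dir.resolve("store");
        Files.write(journal, new byte[] {1, 2, 3});
        Files.write(store, new byte[] {9, 9, 9});
        writeBack(journal, store);
        System.out.println("journal size after write-back: " + Files.size(journal));
    }
}
```

The ordering is the point of the sketch: the journal is emptied only after the storage has been forced to disk, so at every moment at least one of the two files holds the committed state.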

Which means the app has no shutdown actions to do. Any running transactions implicitly abort.

        Andy





