On 28/05/11 08:30, Al Baker wrote:
I've been testing TDB for a while, and am very impressed with its
performance.  However, I have seen various emails on the mailing lists
warning against touching the files while an application using TDB is open
(presumably with an open Jena Model attached to the TDB directory).

How reliable is TDB in surviving a power failure or application crash?

Are there steps for taking consistent, regular backups to mitigate any
issues?

Basically, I'm looking for some level of confidence that I can use TDB in
production, take a reasonable number of steps to ensure reliability, and be
confident that I'll always either have a valid TDB store or a way to
incrementally back up/roll back in the case of a severe crash or file system
error.

Thanks,
Al Baker

Hi Al,

Currently, TDB provides some update capabilities but relies on the application maintaining MRSW (Multiple Reader or Single Writer) concurrency semantics, together with a clean shutdown. Many of the reported problems are due to letting two writers access the database at the same time, or to crashes without ensuring a sync() has been done, which is currently important after updates.
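
As an illustration only, here is a minimal sketch of MRSW usage with a sync() after the write (assuming the Jena 2.x package names of the time; the directory, resource and property names are just placeholders):

    import com.hp.hpl.jena.query.Dataset;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.shared.Lock;
    import com.hp.hpl.jena.tdb.TDB;
    import com.hp.hpl.jena.tdb.TDBFactory;

    public class MrswExample {
        public static void main(String[] args) {
            // One JVM, one Dataset object managing the TDB directory.
            Dataset dataset = TDBFactory.createDataset("DB");
            Model model = dataset.getDefaultModel();

            // Writer: hold the write lock while updating, then sync()
            // so the changes reach the on-disk structures.
            model.enterCriticalSection(Lock.WRITE);
            try {
                model.createResource("http://example/s")
                     .addProperty(model.createProperty("http://example/p"), "o");
            } finally {
                model.leaveCriticalSection();
            }
            TDB.sync(dataset);   // important after updates (pre-transactions)

            // Readers: any number may hold the read lock at the same time,
            // but never concurrently with a writer.
            model.enterCriticalSection(Lock.READ);
            try {
                System.out.println("Size: " + model.size());
            } finally {
                model.leaveCriticalSection();
            }
        }
    }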

For read-only usage, the database is safe - it is not modified or reorganised by reads, so loss of the machine or the application does not damage the on-disk database.

TDB is an in-process database - one JVM controls the database. Having two JVMs managing the same files will also cause damage.

You can back up a database by copying its files, but on a running system only if you co-ordinate the copy with a sync(), which makes the on-disk structures consistent. Stopping the database first is better, and is needed on some OSs, but dumping to N-Quads can be done on a live database.
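
For illustration, a sketch of the copy-with-sync() approach (directory names are placeholders, and the application must ensure no writes happen between the sync() and the end of the copy):

    import java.io.*;

    import com.hp.hpl.jena.query.Dataset;
    import com.hp.hpl.jena.tdb.TDB;
    import com.hp.hpl.jena.tdb.TDBFactory;

    public class BackupByCopy {
        public static void main(String[] args) throws IOException {
            Dataset dataset = TDBFactory.createDataset("DB");

            // Flush any pending changes so the files on disk are consistent.
            TDB.sync(dataset);

            // Copy the database directory; no writes may occur during the copy.
            copyDir(new File("DB"), new File("DB-backup"));
        }

        private static void copyDir(File src, File dst) throws IOException {
            if (!dst.exists())
                dst.mkdirs();
            for (File f : src.listFiles()) {
                File out = new File(dst, f.getName());
                if (f.isDirectory()) { copyDir(f, out); continue; }
                InputStream in = new FileInputStream(f);
                OutputStream os = new FileOutputStream(out);
                try {
                    byte[] buf = new byte[64 * 1024];
                    int n;
                    while ((n = in.read(buf)) > 0)
                        os.write(buf, 0, n);
                } finally {
                    in.close();
                    os.close();
                }
            }
        }
    }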

For updates, there are periods of vulnerability. This is being addressed by adding ACID transactions to TDB. The transaction system is based on write-ahead logging; read requests go straight to the database as before, so read performance will be unchanged.
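
Purely as a sketch of the intended programming model - the API does not exist yet, and the names here (ReadWrite, begin/commit/end on Dataset) are an assumption about its eventual shape - a transactional update might look something like:

    import com.hp.hpl.jena.query.Dataset;
    import com.hp.hpl.jena.query.ReadWrite;   // assumption: hypothetical transaction API
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.tdb.TDBFactory;

    public class TransactionSketch {
        public static void main(String[] args) {
            Dataset dataset = TDBFactory.createDataset("DB");

            // Hypothetical: begin a write transaction; changes go to the
            // write-ahead log and reach the main database on commit.
            dataset.begin(ReadWrite.WRITE);
            try {
                Model model = dataset.getDefaultModel();
                model.createResource("http://example/s")
                     .addProperty(model.createProperty("http://example/p"), "o");
                dataset.commit();
            } finally {
                dataset.end();
            }
        }
    }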

The disk format will (probably) remain unchanged. There are some improvements that could be made, but they aren't necessary.

The bulk loader, used to build a database from scratch, will continue to provide the best load performance. It will remain non-transactional; transactions are aimed at non-bulk updates. Where the practical boundary lies will emerge in testing.

The transaction work is an active work-in-progress [*], but I'm not going to give specific release schedules, except to say that, as an open source project, "release early, release often" releases of development versions will happen.

        Andy

[*] Indeed, I'm writing a journaled file abstraction at this moment.
