On 28/05/11 08:30, Al Baker wrote:
I've been testing TDB for a while, and am very impressed with its
performance. However, I have seen various emails on the mailing lists
warning against touching the files while an application using TDB is open
(presumably with an open Jena Model attached to the TDB directory).
What kind of reliability does TDB offer to survive a power failure or
application crash?
Are there steps for taking consistent, regular backups to mitigate any
issues?
Basically, I'm looking for some level of confidence that I can use TDB in
production, take a reasonable number of steps to ensure reliability, and be
confident that I'll always either have a valid TDB store or a way to
incrementally back up/roll back in the case of a severe crash or file
system error.
Thanks,
Al Baker
Hi Al,
Currently, TDB provides some update capabilities but relies on the
application maintaining MRSW (Multiple Reader Or Single Writer)
concurrency semantics, together with a clean shutdown. Many of the
reported problems are due to letting two writers access the database at
the same time, or to crashes that happen before a sync() has been done,
which is currently important after updates.
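As a minimal sketch of that pattern (the package names assume the
com.hp.hpl.jena.tdb API of this era, and the database location and class
name are illustrative only): one writer at a time, an explicit sync()
after updates, and a clean close.

    // Single-writer sketch; package names and location are assumptions.
    import com.hp.hpl.jena.query.Dataset;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.tdb.TDB;
    import com.hp.hpl.jena.tdb.TDBFactory;

    public class SingleWriterExample {
        public static void main(String[] args) {
            Dataset dataset = TDBFactory.createDataset("DB"); // illustrative location
            Model model = dataset.getDefaultModel();
            // ... exactly one writer updates the model here; readers may run
            // concurrently, but never a second writer ...
            TDB.sync(dataset);   // flush changes to disk after updates
            dataset.close();     // clean shutdown
        }
    }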
For read-only usage, the database is safe - it is not modified or
reorganised by reads, so loss of the machine or application does not damage
the on-disk database.
TDB is an in-process database - one JVM controls the database. Having
two JVMs managing the files will also cause damage.
You can back up a database by copying the files, but on a running system
only if you coordinate with a sync(), which makes the on-disk structures
consistent. Stopping the database is better, and is needed on some OSs,
but dumping to N-Quads can be done on a live database.
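A hedged sketch of a copy-based backup coordinated with a sync() (the
paths, helper names, and the assumption that no writer runs during the
copy are mine; for a live N-Quads export, the tdbdump script that ships
with TDB can be used instead):

    // Backup-by-copy sketch; assumes no writer is active during the copy.
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    import com.hp.hpl.jena.query.Dataset;
    import com.hp.hpl.jena.tdb.TDB;

    public class BackupByCopy {
        public static void backup(Dataset dataset, File dbDir, File backupDir)
                throws IOException {
            TDB.sync(dataset);               // make on-disk structures consistent first
            backupDir.mkdirs();
            for (File f : dbDir.listFiles()) // copy each index/node file
                copy(f, new File(backupDir, f.getName()));
        }

        private static void copy(File src, File dst) throws IOException {
            InputStream in = new FileInputStream(src);
            OutputStream out = new FileOutputStream(dst);
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) > 0; )
                out.write(buf, 0, n);
            in.close();
            out.close();
        }
    }

Stopping the JVM entirely before copying remains the safest option.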
For updates, there are periods of vulnerability. This is being
addressed by adding ACID transactions to TDB. The transaction system is
based on write-ahead logging; read requests go straight to the database as
before, so performance there will be unchanged.
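This is not TDB's actual code, just a toy illustration of the general
write-ahead-logging idea: a change record is appended to a journal and
forced to disk before the main store is touched, so a crash can be
recovered by replaying the journal.

    // Toy write-ahead-log illustration; class and method names are hypothetical.
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    public class ToyJournal {
        private final FileOutputStream journal;

        public ToyJournal(File journalFile) throws IOException {
            journal = new FileOutputStream(journalFile, true); // append mode
        }

        // Append the change record and force it to disk; only after this
        // returns may the caller apply the change to the main database files.
        public void logBeforeApply(byte[] changeRecord) throws IOException {
            journal.write(changeRecord);
            journal.getFD().sync(); // record must reach disk before the change is applied
        }
    }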
The disk format is (probably) going to be unchanged. There are some
improvements that can be made but they aren't necessary.
The bulk loader used to build a database from scratch will provide the
best load performance. It will remain non-transactional. Transactions
will be aimed at non-bulk updates. Where the practical boundary lies
will emerge in testing.
The transaction work is an active work-in-progress [*], but I'm not going to
give a specific release schedule, except to say that, as an open source
project, development versions will follow "release early, release often".
Andy
[*] Indeed, I'm writing a journaled file abstraction at this moment.