On 28/05/11 08:30, Al Baker wrote:
I've been testing TDB for a while, and am very impressed with its
performance. However, I have seen various emails on the mailing lists
warning against touching the files while an application using TDB is open
(presumably with an open Jena Model attached to the TDB directory).
What kind of reliability does TDB offer to survive a power failure or
application crash?
Are there steps for taking consistent, regular backups to mitigate any
issues?
Basically, I'm looking for some level of confidence that I can use TDB in
production, take a reasonable number of steps to ensure reliability, and be
confident that I'll always either have a valid TDB store or a way to
incrementally back up/roll back in the case of a severe crash or file
system error.
Thanks,
Al Baker
Hi Al,
Currently, TDB provides some update capabilities but relies on the
application maintaining MRSW (Multiple Reader Or Single Writer)
concurrency semantics, together with a clean shutdown. Many of the
reported problems are due to letting two writers access the database at
the same time, or to crashes that happen before a sync() has been done,
which is currently important after updates.
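As a minimal sketch of that pattern (the package names assume the
com.hp.hpl.jena.tdb API of this era, and the database location and class
name are illustrative only): one writer at a time, an explicit sync()
after updates, and a clean close.

    // Single-writer sketch; package names and location are assumptions.
    import com.hp.hpl.jena.query.Dataset;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.tdb.TDB;
    import com.hp.hpl.jena.tdb.TDBFactory;

    public class SingleWriterExample {
        public static void main(String[] args) {
            Dataset dataset = TDBFactory.createDataset("DB"); // illustrative location
            Model model = dataset.getDefaultModel();
            // ... exactly one writer updates the model here; readers may run
            // concurrently, but never a second writer ...
            TDB.sync(dataset);   // flush changes to disk after updates
            dataset.close();     // clean shutdown
        }
    }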
For read-only usage, the database is safe - it is not modified or
reorganised by reads, so loss of the machine or application does not damage
the on-disk database.
TDB is an in-process database - one JVM controls the database. Having
two JVMs managing the files will also cause damage.
You can back up a database by copying the files, but on a running system
only if you coordinate with a sync(), which makes the on-disk structures
consistent. Stopping the database is better, and is needed on some OSs,
but dumping to N-Quads can be done on a live database.
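A hedged sketch of a copy-based backup coordinated with a sync() (the
paths, helper names, and the assumption that no writer runs during the
copy are mine; for a live N-Quads export, the tdbdump script that ships
with TDB can be used instead):

    // Backup-by-copy sketch; assumes no writer is active during the copy.
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    import com.hp.hpl.jena.query.Dataset;
    import com.hp.hpl.jena.tdb.TDB;

    public class BackupByCopy {
        public static void backup(Dataset dataset, File dbDir, File backupDir)
                throws IOException {
            TDB.sync(dataset);               // make on-disk structures consistent first
            backupDir.mkdirs();
            for (File f : dbDir.listFiles()) // copy each index/node file
                copy(f, new File(backupDir, f.getName()));
        }

        private static void copy(File src, File dst) throws IOException {
            InputStream in = new FileInputStream(src);
            OutputStream out = new FileOutputStream(dst);
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) > 0; )
                out.write(buf, 0, n);
            in.close();
            out.close();
        }
    }

Stopping the JVM entirely before copying remains the safest option.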
For updates, there are periods of vulnerability. This is being
addressed by adding ACID transactions to TDB. The transaction system is
based on write-ahead logging; read requests go straight to the database as
before, so performance there will be unchanged.
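This is not TDB's actual code, just a toy illustration of the general
write-ahead-logging idea: a change record is appended to a journal and
forced to disk before the main store is touched, so a crash can be
recovered by replaying the journal.

    // Toy write-ahead-log illustration; class and method names are hypothetical.
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    public class ToyJournal {
        private final FileOutputStream journal;

        public ToyJournal(File journalFile) throws IOException {
            journal = new FileOutputStream(journalFile, true); // append mode
        }

        // Append the change record and force it to disk; only after this
        // returns may the caller apply the change to the main database files.
        public void logBeforeApply(byte[] changeRecord) throws IOException {
            journal.write(changeRecord);
            journal.getFD().sync(); // record must reach disk before the change is applied
        }
    }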
The disk format is (probably) going to be unchanged. There are some
improvements that can be made but they aren't necessary.
The bulk loader used to build a database from scratch will provide the
best load performance. It will remain non-transactional. Transactions
will be aimed at non-bulk updates. Where the practical boundary lies
will emerge in testing.
The transaction work is an active work-in-progress [*], but I'm not going to
give a specific release schedule, except to say that, as an open source
project, development versions will follow "release early, release often".
Andy
[*] Indeed, I'm writing a journaled file abstraction at this moment.