Hi Andy,

Great to hear about the transaction work.

For the N-Quad export on a live database, do you mean that a running
application - with open Jena models, and possibly a thread writing to
them - will not interfere with the N-Quad dump, and that the only gotcha
would be a possibly missed quad just before/after the export operation?
If so, I think that would suffice for the short term.

I guess another possibility is, within the app, to fire up a thread and
call Model.write to save the entire model out to the filesystem.
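
Roughly along these lines (sketch only - the path and output format are
placeholders, and the locking/sync story would still need sorting out):

    import java.io.FileOutputStream;
    import java.io.OutputStream;

    import org.apache.jena.rdf.model.Model;

    public class BackgroundModelDump {
        // Kick off a background dump; returns the thread so callers can join() it.
        public static Thread dumpAsync(final Model model, final String file) {
            Thread t = new Thread(new Runnable() {
                public void run() {
                    try {
                        OutputStream out = new FileOutputStream(file);
                        try {
                            // Write the whole model out; N-Triples keeps it line-oriented.
                            model.write(out, "N-TRIPLES");
                        } finally {
                            out.close();
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
            t.start();
            return t;
        }
    }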

Regarding bulk imports - I'm actually finding that regular Jena model
manipulation runs very fast with TDB.  Within seconds I can have a TDB
store with 100k statements set up.
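
For example, the kind of loading loop I've been timing looks roughly like
this (the directory and URIs are made up for illustration, and the package
names are the current Apache Jena ones):

    import org.apache.jena.query.Dataset;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.Property;
    import org.apache.jena.rdf.model.ResourceFactory;
    import org.apache.jena.tdb.TDB;
    import org.apache.jena.tdb.TDBFactory;

    public class SimpleLoad {
        public static void main(String[] args) {
            Dataset dataset = TDBFactory.createDataset("/data/tdb-test");
            Model model = dataset.getDefaultModel();

            Property p = ResourceFactory.createProperty("http://example.org/p");

            // Add 100k statements through the plain Model API.
            for (int i = 0; i < 100000; i++) {
                model.add(ResourceFactory.createResource("http://example.org/s" + i),
                          p,
                          ResourceFactory.createPlainLiteral("value " + i));
            }

            TDB.sync(dataset);   // make sure everything is flushed to disk
            dataset.close();
        }
    }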

I'm not looking to boil the ocean with the giant datasets out there; I'm
coming at this from a practical how-to-build-apps perspective.  Speaking of
which, it would be nice if the cache were controllable - similar to Ehcache
(or maybe an idea for a future project is for TDB to use Ehcache) - TTL,
max in memory, etc.
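
To make the idea concrete, the sort of knobs I mean are what Ehcache 2.x
already exposes - this is plain Ehcache, nothing to do with TDB's current
cache, just an illustration of TTL and max-in-memory settings:

    import net.sf.ehcache.Cache;
    import net.sf.ehcache.CacheManager;
    import net.sf.ehcache.Element;
    import net.sf.ehcache.config.CacheConfiguration;

    public class CacheKnobs {
        public static void main(String[] args) {
            CacheManager manager = CacheManager.create();

            // At most 10,000 entries in memory, each expiring 5 minutes after creation.
            CacheConfiguration config = new CacheConfiguration("nodeCache", 10000)
                    .timeToLiveSeconds(300);

            Cache cache = new Cache(config);
            manager.addCache(cache);

            cache.put(new Element("someKey", "someValue"));

            manager.shutdown();
        }
    }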

Thanks,
Al


On Sun, May 29, 2011 at 2:52 PM, Andy Seaborne <
[email protected]> wrote:

>
>
> On 28/05/11 08:30, Al Baker wrote:
>
>> I've been testing TDB for a while, and am very impressed with its
>> performance.  However, I do see the various emails on the mailing lists
>> warning against touching the files while an application with TDB is open
>> (presumably with an open Jena Model attached to the TDB directory).
>>
>> What kind of reliability does TDB have to survive a power hit or
>> application
>> crash?
>>
>> Are there some steps to take consistent and regular backups to mitigate
>> any
>> issues?
>>
>> Basically looking to have some level of confidence that I can use TDB in
>> production, take a reasonable number of steps to ensure reliability, and
>> be confident that I'll always either have a valid TDB store, or a way to
>> incrementally back up/roll back in the case of a severe crash/file system
>> error.
>>
>> Thanks,
>> Al Baker
>>
>
> Hi Al,
>
> Currently, TDB provides some update capabilities but relies on the
> application maintaining MRSW (Multiple Reader Or Single Writer) concurrency
> semantics together with a clean shutdown.  Many of the reports are due to
> letting two writers access the database at the same time, or to crashes
> where a sync() has not been done, which is currently important for updates.
>
> For read-only usage, the database is safe - it is not modified or
> reorganised by reads, so loss of machines or applications does not damage
> the on-disk database.
>
> TDB is an in-process database - one JVM controls the database.  Having two
> JVMs managing the files will also cause damage.
>
> You can back up a database by copying the files, but from a running system
> only if you co-ordinate with a sync(), which makes the on-disk structures
> consistent.  Stopping the DB is better, and is needed on some OS's, but
> dumping to N-Quads can be done on a live database.
>
> For updates, there are periods of vulnerability.  This is being addressed
> by adding ACID transactions to TDB.  The transaction system is based on
> write-ahead logging; read requests go straight to the DB as before so
> performance there will be unchanged.
>
> The disk format is (probably) going to be unchanged.  There are some
> improvements that can be made but they aren't necessary.
>
> The bulk loader used to build a database from scratch will provide the best
> load performance.  It will remain non-transactional. Transactions will be
> aimed at non-bulk updates.  Where the practical boundary lies will emerge
> in testing.
>
> The transaction work is active-work-in-progress [*] but I'm not going to
> give specific release schedules except to say that as an open source
> project, "release early, release often" of development versions will happen.
>
>        Andy
>
> [*] Indeed, I'm writing a journaled file abstraction at this moment.
>
