On 03/02/11 22:24, Paolo Castagna wrote:
Frank Budinsky wrote:
Hi,
In a previous exchange Damian told me:
You can't write to the same TDB store from different processes.
I'm wondering if there are any safe exceptions. For example, is it safe if
one process always adds/removes/updates statements in named graphs, while
the other process works exclusively in the default graph (i.e., the graphs
being used by the two processes are completely independent)? Is this safe,
or am I treading on thin ice by using the same TDB store for both
processes?
Thanks,
Frank
For me (and us @ Talis) one concern is that the MRSW (Multiple Readers
Single Writer) locking required is MR xor SW (i.e. exclusive or): either
many readers or a single writer, never both at once. So, a very long
write operation can actually stop others reading until the write has
finished. If you allow big/long updates, you need to carefully consider
alternatives to avoid this problem.
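The "MR xor SW" point can be made concrete with a minimal Python sketch of a generic multiple-readers/single-writer lock built on threading.Condition. It only illustrates the locking discipline and is not TDB's actual implementation. The class name MRSWLock and its methods are hypothetical.

```python
import threading


class MRSWLock:
    """Hypothetical MR xor SW lock: many readers OR one writer, never both."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0      # number of readers currently holding the lock
        self._writer = False   # True while a writer holds the lock

    def acquire_read(self) -> None:
        with self._cond:
            # Readers block for as long as a writer is active.
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def try_acquire_read(self) -> bool:
        """Non-blocking variant: returns False instead of waiting."""
        with self._cond:
            if self._writer:
                return False
            self._readers += 1
            return True

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake any waiting writer

    def acquire_write(self) -> None:
        with self._cond:
            # A writer waits until there are no readers and no other writer.
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()  # wake waiting readers and writers
```

While a writer holds the lock, try_acquire_read() returns False and acquire_read() blocks. This is exactly how one long write keeps every reader out. This simple version also lets a steady stream of readers starve a waiting writer, so real implementations usually add some form of writer preference.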
One alternative, possibly slower than native TDB, could be TDB-BDB:
https://github.com/afs/TDB-BDB
Having a system with multiple replicas helps as well.
Thinking about other alternative approaches [1] is too scary for me at
this time, but it would be good to list and describe them (in case there
are people keen to help with this), or to share good papers describing
an approach that is compatible with the TDB design.
Journalled file access. Small matter of programming (it fits the
current design). Phase trees are also possible (but don't have the same
recoverability).
Or break the writes up into smaller blocks.
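Here is a minimal sketch of breaking one big write into smaller blocks. It assumes a toy in-memory store (a Python set of triples) guarded by a single exclusive write lock. The names add_in_batches and chunked are hypothetical. The lock is taken once per batch, so readers can get in between batches. The trade-off is that the update as a whole is no longer atomic: a reader may see some batches applied and not others.

```python
import threading
from typing import Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]


def chunked(items: List[Triple], size: int) -> Iterator[List[Triple]]:
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def add_in_batches(store: Set[Triple], triples: List[Triple],
                   write_lock: threading.Lock, batch_size: int = 1000) -> int:
    """Hypothetical helper: add triples, holding the lock once per batch.

    Returns the number of batches (i.e. how many times the lock was taken).
    """
    batches = 0
    for batch in chunked(triples, batch_size):
        with write_lock:          # exclusive only for the duration of this batch
            store.update(batch)
        batches += 1              # lock released here: readers can get in
    return batches
```

Smaller batches shorten each reader stall but increase the number of lock round-trips and intermediate states that readers can observe.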
If it's a long write, and it's the app that's slow, journalling wins.
If it's the fact that a lot of data is being written, well, there is
rather less one can do without partial locking (which is expensive for
everything else); it's rather hard in RDF to know what's "unrelated".
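This minimal in-memory sketch shows why journalling helps when the application is the slow part. The writer records its changes in a private journal without holding any lock. The exclusive lock is held only briefly, at commit, while the journal is applied. The class JournalledStore and its methods are hypothetical, not TDB's design. A real journal would also be written and flushed to disk before being applied, which is what provides recoverability; that part is omitted here.

```python
import threading
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
Journal = List[Tuple[str, Triple]]  # sequence of ("add" | "delete", triple)


class JournalledStore:
    """Hypothetical store: slow writers build a journal, commit is short."""

    def __init__(self) -> None:
        self._lock = threading.Lock()      # held only while applying a journal
        self._triples: Set[Triple] = set() # committed state seen by readers

    def begin(self) -> Journal:
        return []                          # a fresh, private journal

    @staticmethod
    def add(journal: Journal, triple: Triple) -> None:
        journal.append(("add", triple))    # no lock: the store is untouched

    @staticmethod
    def delete(journal: Journal, triple: Triple) -> None:
        journal.append(("delete", triple))

    def commit(self, journal: Journal) -> None:
        with self._lock:                   # short critical section
            for op, t in journal:
                if op == "add":
                    self._triples.add(t)
                else:
                    self._triples.discard(t)

    def contains(self, triple: Triple) -> bool:
        with self._lock:
            return triple in self._triples
```

However long the application takes to build its journal, readers keep seeing the last committed state. This does not help when the journal itself is huge, because applying it still holds the lock for a long time.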
Andy
Paolo
[1] http://en.wikipedia.org/wiki/Multiversion_concurrency_control