Comments inline:

On 12/11/2015 06:31, "Andy Seaborne" <a...@apache.org> wrote:

>On 12/11/15 00:42, Rob Vesse wrote:
>> Andy
>>
>> I was talking to some folks at a major bank the other day about TDB
>> and got an interesting question that I didn't have an answer to.
>> Essentially they were interested in learning how you would provide
>> replication and hot-standby with TDB.
>>
>> For current generation TDB I told them that people typically place a
>> load balancer in front of multiple TDB instances for read-centric
>> workloads and manually handle replicating updates to all systems
>> (usually by temporarily disabling the services, applying the updates
>> and restarting).
>
>There is a tradeoff - another way, avoiding loss of service during
>updates, is to keep the service running, accept a window of
>inconsistency, and update each server live (in parallel or in sequence
>for slightly different effects).  Load balancers are (usually) sticky,
>so the nature of the inconsistency is reduced for each client.  In use
>cases where the read workload and the publisher workload are separate,
>perfect consistency may not matter.
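
For reference, a minimal sketch of the "apply the same update to every
replica" approach using Jena's remote SPARQL Update API (the replica
endpoint URLs here are placeholders, not a real deployment):

import org.apache.jena.update.UpdateExecutionFactory;
import org.apache.jena.update.UpdateFactory;
import org.apache.jena.update.UpdateRequest;

public class ReplicateUpdate {
    // Placeholder endpoints - substitute the real replica update services.
    static final String[] REPLICAS = {
        "http://replica1:3030/ds/update",
        "http://replica2:3030/ds/update"
    };

    public static void main(String[] args) {
        UpdateRequest update = UpdateFactory.create(
            "INSERT DATA { <http://example/s> <http://example/p> 'o' }");
        // Apply the same update to each replica in turn.  While this loop
        // runs, the replicas are briefly inconsistent with one another.
        for (String endpoint : REPLICAS) {
            UpdateExecutionFactory.createRemote(update, endpoint).execute();
        }
    }
}

Sticky load balancing then hides most of that inconsistency window from
any individual client.
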
>
>> However, for Lizard and TDB2 I did not really know what the answer
>> would be.  Lizard is obviously designed as a fault-tolerant system,
>> but I don't know if the same can be said for TDB2 (which I understand
>> more to be about changing the on-disk data structures).
>
>TDB2 is a rebuild of TDB that avoids the limitations on transaction
>size.  TDB2 itself is a single-machine system, like TDB.
>
>If you have any feedback on how to express that more clearly than
>
>http://mail-archives.apache.org/mod_mbox/jena-dev/201506.mbox/%3C5575B7B3.8020101%40apache.org%3E
>
>please let me know.  I'm too close to the tech!

No, that makes perfect sense.

My understanding was that TDB2 primarily solved the transaction size
and disk usage growth issues.

>
>
>I have been thinking about packaging Lizard as a two-server system
>("two" => a small number, each a complete DB replica, so able to
>operate independently).  This is distributed transactions
>(multi-master, with a "one true active transaction" restriction, which
>is really the same as master-slave with a master election for each
>transaction) across multiple TDB2 instances.
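
Purely as a thumbnail of that "one true active transaction" idea - a
local lock standing in for the per-transaction master election, and
everything else hand-waved; this is not Lizard's implementation and the
class and endpoint names are invented:

import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import org.apache.jena.update.UpdateExecutionFactory;
import org.apache.jena.update.UpdateFactory;
import org.apache.jena.update.UpdateRequest;

// Toy coordinator: at most one "active" write transaction at a time.
public class SingleWriterCoordinator {
    // A local lock standing in for a cluster-wide per-transaction election.
    private final ReentrantLock writeLock = new ReentrantLock();
    private final List<String> replicaUpdateEndpoints;

    public SingleWriterCoordinator(List<String> replicaUpdateEndpoints) {
        this.replicaUpdateEndpoints = replicaUpdateEndpoints;
    }

    public void write(String sparqlUpdate) {
        writeLock.lock();           // "win the election" for this transaction
        try {
            UpdateRequest update = UpdateFactory.create(sparqlUpdate);
            // Apply to every replica before giving up the write role.
            for (String endpoint : replicaUpdateEndpoints) {
                UpdateExecutionFactory.createRemote(update, endpoint).execute();
            }
        } finally {
            writeLock.unlock();     // the next writer can become active
        }
    }
}
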
>
>> For bonus points, they were also interested in whether Lizard/TDB2
>> would offer customisable replication policies.  For example, they
>> would like to be able to configure some replicas to be read-only
>> while having others be read/write.
>
>Yes, architecturally. No, in implementation.
>
>And there is the middle ground of updating some replicas synchronously
>and some asynchronously, after the transaction has formally committed
>(i.e. 5 replicas: update 3 now and 2 "later").
>
>The current implementation is "update all replicas".  It's missing the
>async catch-up aspect and handling of crashes during catch-up.
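
A rough, in-memory sketch of that sync/async split, just to pin down
the idea (class name and the commit rule are invented; as noted above,
a real system needs a durable catch-up queue to survive crashes):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.jena.update.UpdateExecutionFactory;
import org.apache.jena.update.UpdateFactory;
import org.apache.jena.update.UpdateRequest;

// Toy version of "update N replicas now, let the rest catch up later".
public class MixedReplication {
    private final ExecutorService catchUp = Executors.newSingleThreadExecutor();

    public void update(String sparqlUpdate, List<String> endpoints, int syncCount) {
        UpdateRequest update = UpdateFactory.create(sparqlUpdate);
        // Synchronous replicas: the caller waits for these before "commit".
        for (String endpoint : endpoints.subList(0, syncCount)) {
            UpdateExecutionFactory.createRemote(update, endpoint).execute();
        }
        // Asynchronous replicas: applied after the commit has returned.
        // A real system would persist this queue so it survives crashes.
        for (String endpoint : endpoints.subList(syncCount, endpoints.size())) {
            catchUp.submit(() ->
                UpdateExecutionFactory.createRemote(update, endpoint).execute());
        }
    }
}
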
>
>Is that what was meant by "some read-only"?

No, I don't think so.

>
>Or was it a partition of the dataset, with some parts read-only (on
>servers that reflect that) and some parts read-write?

I think they expected all the replicas to stay roughly consistent, but
they'd like a setup where some users are directed to specific replicas
(they were thinking of geographically distributed replicas) and users
in certain geographic regions would only be able to read the data, not
write it.  So the system as a whole would be read/write, but individual
replicas might be exposed as read-only services.
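
For the read-only-exposure half of that, a sketch of how a single
replica could be published without update services, using the embedded
Fuseki builder API from Jena releases later than this thread (the TDB
path and port are invented):

import org.apache.jena.fuseki.main.FusekiServer;
import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.tdb.TDBFactory;

public class ReadOnlyReplica {
    public static void main(String[] args) {
        // Placeholder path to this replica's TDB database.
        DatasetGraph replica = TDBFactory.createDatasetGraph("/data/replica");
        // 'false' publishes query/read services only - no update endpoint,
        // so this replica is read-only at the HTTP level even though the
        // underlying store can still be written by the replication process.
        FusekiServer server = FusekiServer.create()
            .port(3030)
            .add("/ds", replica, false)
            .build();
        server.start();
    }
}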

Is that something that is doable?

Rob

>
>I guess it comes down to what happens when an update occurs - do the
>read-only systems eventually catch up?  Are they unaffected because
>the updates didn't apply to their part of the graphs in the dataset?
>Or is some other meaning intended?
>
>       Andy
>
>>
>> Rob
>>
>
