On 09/09/11 13:06, Thorsten Möller wrote:

On 09.09.2011 at 00:20, Paolo Castagna wrote:

Hi, one feature I miss from Fuseki (and probably everybody who
wants to use Fuseki to run SPARQL endpoints, public or private) is
"high availability".

One way to increase the availability of your SPARQL endpoints is to
use "replication": you serve the same data with multiple machines
and you put a load balancer in front of it, distributing queries to
those machines.

I'd like something relatively easy to setup, close to zero admin
costs at runtime and which would make extremely easy to provision a
new replica if necessary. I'd like something so easy, I could
implement it in a couple of days.

Here are some thoughts (inspired by Solr master/slave architecture
and replication)...

One master, N slaves. All running exactly the same Fuseki code
base. All writes go to the master. If the master is down, no writes
are possible (this is the price to pay in exchange for extreme
simplicity). All reads go to the N slaves (+ master?). Slaves
might be out of sync with the master for a while (this is the price
to pay in exchange for extreme simplicity). Each slave, every once
in a while (configurable), will contact the master to check whether
it has received updates since the last time the slave checked. An
additional servlet needs to be added to Fuseki to co-ordinate
replication. Replication happens simply by running rsync (from
Java), with the master pushing its files out onto a remote slave.
During replication the master must acquire an exclusive write lock,
sync the TDB indexes to disk, rsync with the slave, and release the
lock when finished or after a timeout long enough for a first sync
to be successful. Slaves need to drop any TDB caches after an
rsync.
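
As a rough illustration, here is a minimal sketch of that master-side
step. The directory layout, the slave target, the write lock and the
syncToDisk() placeholder are all assumptions for the sketch, not
existing Fuseki/TDB API:

    // Sketch: push the on-disk TDB files to one slave while holding a write lock.
    import java.io.IOException;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class MasterReplicator {
        private static final String TDB_DIR = "/var/lib/fuseki/tdb";   // assumed layout
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        public void replicateTo(String slaveHostAndPath)
                throws IOException, InterruptedException {
            lock.writeLock().lock();            // block updates during the copy
            try {
                syncToDisk();                   // flush TDB indexes before copying
                // rsync is incremental: after the first run only changed blocks move.
                Process p = new ProcessBuilder(
                        "rsync", "-az", "--delete", TDB_DIR + "/", slaveHostAndPath)
                        .inheritIO()
                        .start();
                int rc = p.waitFor();
                if (rc != 0)
                    throw new IOException("rsync to " + slaveHostAndPath
                            + " failed, exit code " + rc);
            } finally {
                lock.writeLock().unlock();
            }
        }

        private void syncToDisk() {
            // Placeholder: ask TDB to sync its indexes to disk here; the slave
            // would drop its caches after the copy completes.
        }
    }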

The things I like about this approach are:

- it's very simple
- after the first time, rsync is incremental and quite fast
- provisioning a new slave is trivial
- it could be implemented relatively quickly
- if the master receives an update while one or more slaves are
  down, slaves will catch up later

The things I do not like are:

- the master is a SPOF for writes
- you need an external load balancer to send requests to the slave(s)
- slaves can be out of sync for a period of time
- can a slave answer queries during the replication with rsync?

What do you think?

That's all well thought out, but it's more of an approach to
scalability than to availability. The reason is that the single
write master remains a SPOF; hence, availability does not change
with regard to writes. Of course, one could assume that the master
has 100% availability, but then there would be no need to do
anything about availability in the first place.

True, but.

Make the front-end coordinator a simple router of requests. It tracks the state of the cluster but is not itself a database node.

This greatly reduces the chance of the SPOF actually failing.

To really remove the SPOF, replicate the coordinator and use ZooKeeper (or similar) to assist in tracking the version state - there is a cost of coordination.
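
As a rough illustration of the routing decision such a coordinator
could make - the endpoint paths, URL fields and class names below are
assumptions, and health tracking / ZooKeeper integration are left out:

    // Sketch: writes go to the single master, reads are spread over the slaves.
    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    public class RequestRouter {
        private final String masterUrl;          // e.g. "http://master:3030/ds"
        private final List<String> slaveUrls;    // live slaves, as tracked by the coordinator
        private final AtomicInteger next = new AtomicInteger();

        public RequestRouter(String masterUrl, List<String> slaveUrls) {
            this.masterUrl = masterUrl;
            this.slaveUrls = slaveUrls;
        }

        public String targetFor(String httpMethod, String path) {
            boolean isWrite = "POST".equals(httpMethod) && path.endsWith("/update");
            if (isWrite || slaveUrls.isEmpty())
                return masterUrl;                // writes, or no slave available
            int i = Math.floorMod(next.getAndIncrement(), slaveUrls.size());
            return slaveUrls.get(i);             // round-robin over the slaves for reads
        }
    }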

One greatly simplifying assumption is to design for a datacentre, not for geographic distribution. The partition problem can be simplified.

Below ...



I don't have an alternative that would be as simple to implement
and which would increase the availability (in particular for reads)
of a bunch of machines running Fuseki.

Replication is inherently a tradeoff between consistency and
performance; recall the famous CAP theorem [1].


Replication is good at increasing your throughput in terms of
queries/sec, and it can have a minor effect in reducing query
response time because you have fewer concurrent queries per
machine.

Caching will also be very useful, and this is by no means a
substitute for it. We would love to have both! The two things can
and, IMHO, should coexist. In my opinion, availability should come
first and caching soon after. The problem with caching first and no
availability is cache misses: your users don't get their responses
back.

Another approach would be to look at the Journal in TxTDB and see
if we could have a Journal which would replicate changes with
remote machines.

Another alternative would be to submit a SPARQL Update or a write
request to each of the replicas.
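
A minimal sketch of that naive fan-out, assuming each replica exposes
a SPARQL Update endpoint; the class name and URLs are illustrative,
and this version deliberately ignores the "what if a replica is down?"
question discussed further on:

    // Sketch: POST the same SPARQL Update to every replica's update endpoint.
    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.List;

    public class UpdateFanout {
        public void send(String sparqlUpdate, List<String> updateEndpoints)
                throws IOException {
            byte[] body = sparqlUpdate.getBytes(StandardCharsets.UTF_8);
            for (String endpoint : updateEndpoints) {
                HttpURLConnection c =
                        (HttpURLConnection) new URL(endpoint).openConnection();
                c.setRequestMethod("POST");
                c.setRequestProperty("Content-Type", "application/sparql-update");
                c.setDoOutput(true);
                try (OutputStream out = c.getOutputStream()) {
                    out.write(body);
                }
                int status = c.getResponseCode();
                if (status / 100 != 2)
                    throw new IOException(endpoint + " returned HTTP " + status);
            }
        }
    }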

This seems to be a very interesting alternative since it would
basically do away with data replication. Contemplating a little
more, a small update query can, in the worst case, require
replicating the entire dataset (e.g., delete everything), whereas
the cost of replicating a query of a few hundred bytes is low. In
other words, the cost of replicating queries does not vary much,
whereas the cost of replicating the data changed as a result of a
query can vary dramatically. If you further assume eventual
consistency [2] (i.e., all nodes eventually have the same state of
the data), then it would also not matter how fast the nodes are in
processing the queries. You would "only" need global update query
ordering to ensure that all nodes eventually reach the same state;
otherwise, if nodes receive update queries in different orders,
they diverge in the state of the data over time. Note that readers
might, of course, see different results depending on which node
they query - that's the price of eventual consistency.
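
A small sketch of what such global ordering could look like on each
node, assuming every update carries a sequence number handed out by a
single sequencer; the applyLocally() placeholder and the sequencing
scheme are assumptions of the sketch:

    // Sketch: apply updates strictly in sequence order, buffering early arrivals.
    import java.util.Map;
    import java.util.TreeMap;

    public class OrderedUpdateApplier {
        private long nextExpected = 1;                       // next sequence number to apply
        private final Map<Long, String> pending = new TreeMap<>();

        public synchronized void receive(long seq, String sparqlUpdate) {
            pending.put(seq, sparqlUpdate);
            // Apply every contiguous update starting from nextExpected.
            while (pending.containsKey(nextExpected)) {
                applyLocally(pending.remove(nextExpected));
                nextExpected++;
            }
        }

        private void applyLocally(String sparqlUpdate) {
            // Placeholder: execute the update against the local dataset.
        }
    }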

What is more, quorum control for reads and writes is a way to allow for different characteristics and for partition tolerance.

If N = number of machines, R = read quorum, W = write quorum, then

R+W > N is consistency and majority partition wins
R+W <= N is eventual consistency, and the need to worry
       about partition recombination.

Choose R and W to meet the specific service needs.

e.g. R=1 to maximise query throughput.
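
A tiny worked check of that arithmetic (the class and method names
are illustrative only):

    // Sketch: R + W > N means every read quorum overlaps every write quorum.
    public class QuorumCheck {
        public static boolean isStronglyConsistent(int n, int r, int w) {
            return r + w > n;
        }

        public static void main(String[] args) {
            int n = 3;
            System.out.println(isStronglyConsistent(n, 2, 2)); // true:  R+W=4 > 3
            System.out.println(isStronglyConsistent(n, 1, 1)); // false: eventual consistency
            System.out.println(isStronglyConsistent(n, 1, 3)); // true:  R=1 maximises read
                                                               //        throughput, writes need all nodes
        }
    }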

        Andy



Thorsten


[1] http://en.wikipedia.org/wiki/CAP_theorem
[2] http://en.wikipedia.org/wiki/Eventual_consistency




In this case, what do we do if one of the replicas is down?

Other alternatives?

Is this something you need or you would use?

Paolo
