Is there any reason not to consider zookeeper? The 3.4 release is
quite stable and, thanks to its large user base, bugs get fixed and
its quirks are well known.
I like the idea of Raft. The paper is well written and very
compelling. The last time I read it, a number of critical issues were
glossed over - for instance, log compaction and pruning. Systems must
be correct in both theory and implementation. Although many Raft-based
systems have cropped up in the year or so since Raft was published,
I don't judge their use to be significant compared to zookeeper's.
Quality only comes with maturity and many workloads bashing on it.
The last time I needed to build up a new distributed system, I wrote
up some notes about etcd vs zookeeper. Perhaps you will find them
helpful, or they will motivate some new questions before you make
your decision.
https://docs.google.com/document/d/1FOnLD26W9iQ2CUZ-jVCn7o0OrX8KPH7QGeokN4tA_j4/edit
On Monday, September 8, 2014, Jonathan Barber jonathan.bar...@gmail.com wrote:
On 8 September 2014 05:05, Krishnan Parthasarathi kpart...@redhat.com wrote:
The bulk of the current GlusterD code deals with keeping the configuration
of the cluster and the volumes in it consistent and available across the
nodes. The current algorithm is not scalable (O(N^2) in the number of
nodes) and doesn't prevent split-brain of configuration. This is the
problem area we are targeting for the first phase.
As part of the first phase, we aim to delegate the distributed
configuration store. We are exploring consul [1] as a replacement for the
existing distributed configuration store (the sum total of
/var/lib/glusterd/* across all nodes). Consul provides a distributed
configuration store which is consistent and partition tolerant. By moving
all Gluster-related configuration information into consul we could avoid
split-brain situations.
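To make the proposal concrete: Consul exposes its KV store over a simple HTTP API, and values come back base64-encoded in a JSON envelope. The key layout below is purely illustrative (not a proposal for how GlusterD would actually name its keys), and the sample response is hand-written to match Consul's documented KV format:

```python
import base64
import json

# A write to Consul's KV store goes through its HTTP API, roughly:
#   PUT http://localhost:8500/v1/kv/gluster/volumes/vol0/options/performance.cache-size
# and a read:
#   GET http://localhost:8500/v1/kv/gluster/volumes/vol0/options/performance.cache-size
#
# Consul returns KV entries as a JSON list with the value base64-encoded.
# The response below is a hand-written sample in that shape, with a
# hypothetical key name:
sample_response = json.dumps([{
    "Key": "gluster/volumes/vol0/options/performance.cache-size",
    "Value": base64.b64encode(b"256MB").decode("ascii"),
    "ModifyIndex": 42,  # monotonically increasing; usable for check-and-set updates
}])

entry = json.loads(sample_response)[0]
value = base64.b64decode(entry["Value"]).decode("utf-8")
print(entry["Key"], "=", value)
```

The ModifyIndex field is worth noting: it allows compare-and-set writes, which is one way to serialize concurrent configuration changes from multiple glusterd nodes.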
Did you get a chance to go over the following questions while making the
decision? If yes, could you please share the info?
What are the consistency guarantees for changing the configuration in
case of network partitions? Specifically:
- when there are 2 nodes and 1 of them is not reachable?
- when there are more than 2 nodes?
What are the consistency guarantees for reading configuration in case of
network partitions?
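Since Consul's replication layer (Raft) commits a write only once a strict majority of servers has persisted it, the 2-node question has a concrete answer from quorum arithmetic alone. A minimal sketch (generic majority-quorum math, not Consul-specific code):

```python
def quorum(cluster_size: int) -> int:
    # Raft-style systems require a strict majority to commit a write.
    return cluster_size // 2 + 1

def can_commit(cluster_size: int, reachable: int) -> bool:
    return reachable >= quorum(cluster_size)

# 2 nodes, 1 unreachable: quorum is 2, so writes (and consistent reads)
# are refused until the partition heals -- no split-brain, but no write
# availability either.
print(can_commit(2, 1))   # False

# Odd cluster sizes are the sweet spot: 3 nodes tolerate 1 failure,
# 5 tolerate 2. A 4-node cluster still needs 3 reachable.
print(can_commit(3, 2))   # True
print(can_commit(5, 3))   # True
```

This is why even-sized clusters (2 in particular) are usually discouraged for quorum-based stores: they add failure probability without adding fault tolerance.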
consul uses the Raft [1] distributed consensus algorithm internally for
maintaining consistency. The Raft consensus algorithm is proven to be
correct. I will be going through the workings of the algorithm soon, and
will share my answers to the above questions after that. Thanks for the
questions; it is important for the user to understand the behaviour of a
system, especially under failure.
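For reference while going through the paper: the core commit rule is that the leader marks a log entry committed once a majority of servers have replicated it, which for sorted replication state is just a median. A minimal sketch (variable names mine, not from any implementation):

```python
def committed_index(match_index: list[int]) -> int:
    # match_index[i] = highest log index known to be replicated on
    # server i (the leader counts itself). An entry is committed once
    # a majority of servers store it, so the commit index is the median
    # of the sorted match indices. (The full rule in the Raft paper
    # additionally requires the entry to be from the leader's current
    # term; that check is omitted in this sketch.)
    n = len(match_index)
    return sorted(match_index)[n // 2]

# 5 servers: leader at index 7, followers at 7, 5, 3, 0.
# Index 5 is stored on 3 of 5 servers, so entries up to 5 are committed.
print(committed_index([7, 7, 5, 3, 0]))  # 5
```

This is also what makes the answers to the partition questions mechanical: whatever entries a majority-side leader commits survive; a minority side simply cannot advance the commit index.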
I am considering adding a FAQ section to this proposal, where questions
like the above would go, once it gets accepted and makes it to the
feature page.
[1] -
https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf
The following article provides some results on how Consul behaves under
network partitions, actually testing whether it recovers successfully:
http://aphyr.com/posts/316-call-me-maybe-etcd-and-consul
It gives Consul a positive review.
HTH
~KP
___
Gluster-users mailing list
gluster-us...@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
--
Jonathan Barber jonathan.bar...@gmail.com
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel