On Thu, Sep 10, 2015 at 10:05 AM, Atin Mukherjee <amukh...@redhat.com> wrote:
>
> On 09/10/2015 01:42 AM, Jeff Darcy wrote:
>> Better get comfortable, everyone, because I might ramble on for a bit.
>>
>> Over the last few days, I've been looking into the issue of how to manage our own instances of etcd (or something similar) as part of our 4.0 configuration store. This is highly relevant for GlusterD 2.0, which would be both a consumer of the service and (possibly) a manager for the daemons that provide it. It's also relevant for NSR, which needs a similar kind of highly available, highly consistent store for information about terms. Just about any other component might be able to take good advantage of such a facility if it were available (DHT 2.0 using it for layout information, for example), and I encourage anyone working on 4.0 to think about how it could make other components simpler. (BTW, Shyam, that's just a hypothetical example. Don't take it any more seriously than you want to.)
>>
>> This is not the first time I've looked into this. During the previous round of NSR development, I implemented some code to manage etcd daemons from within GlusterD:
>>
>> http://review.gluster.org/#/c/8887/
>>
>> That code's junk. We shouldn't use anything more than small pieces of it. Among other problems, it nukes the etcd information when a new node joins. That was fine for what we were doing with NSR at the time, but clearly can't work in real life. I've also been looking at the new-ish etcd interfaces for cluster management:
>>
>> https://github.com/coreos/etcd/blob/master/Documentation/other_apis.md
>>
>> I'm pretty sure these didn't exist when I was last looking at this stuff, but I could be wrong. In any case, they look pretty nice. Much like our own "probe" mechanism, it looks like we can start a single-node cluster and then add others into that cluster by talking to one of the current members. In fact, that similarity suggests how we might manage our instances of etcd.
>>
>> (1) Each GlusterD *initially* starts its own private instance of etcd.
>>
>> (2) When we probe from a node X to a node Y, the probe message includes information about X's etcd server(s).
>>
>> (3) Upon receipt of a probe, Y can (depending on a flag) either *use* X's etcd cluster or *join* it. Either way, it has to shut down its own one-node cluster. In the JOIN case, this implies that X will send the appropriate etcd command to its local instance (from whence it will be propagated to the others).
>
> I have a follow-up question here. Could you elaborate on the difference between *use* and *join*? Since, as you pointed out, Y's configuration shouldn't be taken into consideration either way, I believe that as part of peer probing we should clean up Y's configuration (bringing down its one-node cluster) and then simply join the existing etcd cluster. That's the only workflow I could think of, and *use* would do the same thing, IMO.
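To make the distinction concrete before answering in words, here is roughly what the two cases might look like against X's existing cluster, using etcd's Go client. This is only a sketch on my part (untested); the endpoints, peer URLs, and key names are invented for illustration, not taken from the patch or the docs above.

// Only a sketch contrasting "join" and "use" in terms of the etcd v2 API.
// Endpoints, peer URLs, and key names here are invented for illustration.
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/coreos/etcd/client"
	"golang.org/x/net/context"
)

func main() {
	// Either way, we start by pointing a client at X's existing cluster.
	c, err := client.New(client.Config{
		Endpoints:               []string{"http://node-x:2379"},
		Transport:               client.DefaultTransport,
		HeaderTimeoutPerRequest: 3 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}

	// JOIN: add Y as a raft member via the members API; Y then starts its
	// etcd with --initial-cluster-state=existing and takes part in leader
	// election and log replication.
	members := client.NewMembersAPI(c)
	m, err := members.Add(context.Background(), "http://node-y:2380")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("added member", m.ID)

	// USE: Y never becomes a member; it just reads and writes keys against
	// the existing cluster (e.g. NSR term information).
	keys := client.NewKeysAPI(c)
	if _, err := keys.Set(context.Background(), "/nsr/terms/vol0", "42", nil); err != nil {
		log.Fatal(err)
	}
}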
Here's what I think: with join, the node becomes a "part" of the etcd cluster, participating in leader election, replicating logs, and so on. With use, the node just uses the etcd service without becoming a part of the cluster (much as NSR would *use* etcd to store term information).

>> (4) Therefore, the CLI/REST interfaces to initiate a probe need an option to control this join/use flag. The default should be JOIN for small clusters, where it's not a problem for all nodes to be etcd servers as well.
>
> The consul/etcd documentation says that the ideal configuration is 3-5 servers forming the cluster. The way I was thinking about it: during peer probe we would check whether the cluster already has enough servers; if not, the other end would join as an etcd server, otherwise it would act as a client. Thoughts?
>
>> (5) For larger clusters, the administrator might start to specify USE instead of JOIN after a while. There might also need to be separate CLI/REST interfaces to toggle this state without any probe involved.
>>
>> (6) For detach/deprobe, we simply undo the things we did in (3).
>>
>> With all of this in place, probes would become one-time exchanges. There's no need for GlusterD daemons to keep probing each other when they can just "check in" with etcd (which is doing something very similar internally). Instead of constantly sending its own probe/heartbeat messages and keeping track of which other nodes' messages have been missed, each GlusterD would simply use its node UUID to create a time-limited key in etcd and issue watches on the other nodes' keys. This is not quite as convenient as ZooKeeper's ephemerals, but it's still a lot better than what we're doing now.
>>
>> I'd be tempted to implement this myself, but for now it's probably more important to work on NSR itself, and for that I can just use an external etcd cluster instead. Maybe later, in the 4.0 integration phase, if nobody else has beaten me to it, I'll take a swing at it. Until then, does anyone else have any thoughts on the proposal?
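One more note on the heartbeat part above: the "time-limited key plus watch" pattern would look roughly like this with the etcd Go client. Again only a sketch; the /gluster/peers key layout, the TTL, and the refresh interval are my own placeholders, not anything that's been agreed on.

// Only a sketch: each glusterd keeps a TTL key alive under
// /gluster/peers/<uuid> (placeholder layout) and watches the directory
// for other peers' keys being set or expiring.
package main

import (
	"log"
	"time"

	"github.com/coreos/etcd/client"
	"golang.org/x/net/context"
)

func main() {
	c, err := client.New(client.Config{
		Endpoints: []string{"http://127.0.0.1:2379"},
		Transport: client.DefaultTransport,
	})
	if err != nil {
		log.Fatal(err)
	}
	keys := client.NewKeysAPI(c)

	nodeUUID := "3f2504e0-4f89-11d3-9a0c-0305e82c3301" // placeholder
	self := "/gluster/peers/" + nodeUUID

	// Heartbeat: keep re-setting our own key with a TTL comfortably larger
	// than the refresh interval; if this glusterd dies, the key expires.
	go func() {
		for {
			_, err := keys.Set(context.Background(), self, "alive",
				&client.SetOptions{TTL: 30 * time.Second})
			if err != nil {
				log.Println("heartbeat failed:", err)
			}
			time.Sleep(10 * time.Second)
		}
	}()

	// Watch the whole peers directory instead of pinging every node.
	w := keys.Watcher("/gluster/peers", &client.WatcherOptions{Recursive: true})
	for {
		resp, err := w.Next(context.Background())
		if err != nil {
			log.Println("watch error:", err)
			continue
		}
		// resp.Action will be "set", "expire", "delete", and so on.
		log.Printf("peer %s: %s", resp.Node.Key, resp.Action)
	}
}

The nice property is that liveness falls out of key expiry: if a glusterd dies, its key simply disappears and everyone watching the directory sees the expire event, with no extra heartbeat traffic between the GlusterD daemons themselves.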