On Tue, Jan 14, 2020 at 2:37 PM Atin Mukherjee <atin.mukherje...@gmail.com> wrote:
> From a design perspective 2 is a better choice. However I'd like to see a
> design on how the cluster id will be generated and maintained (with peer
> addition/deletion scenarios, node replacement, etc.).

Thanks for the feedback, Atin.

> On Tue, Jan 14, 2020 at 1:42 PM Amar Tumballi <a...@kadalu.io> wrote:
>
>> Hello,
>>
>> As we are gearing up for Release-8 and its planning, I wanted to bring up
>> one of my favorite topics, 'Thin-Arbiter' (also known as Tie-Breaker,
>> Metro-Cluster, etc.).
>>
>> We shipped thin-arbiter in v7.0 itself, and it works great when there is
>> just one gluster cluster. I am talking about a situation which involves
>> multiple gluster clusters, and easier management of thin-arbiter nodes.
>> (Ref: https://github.com/gluster/glusterfs/issues/763)
>>
>> I am working with a goal of hosting a thin-arbiter node service (free of
>> cost), to which any gluster deployment can connect, and save the cost of
>> an additional replica, which is required today to avoid split-brain
>> situations. Tie-breaker storage and process needs are so small that we
>> could easily handle every gluster deployment to date on just one machine.
>> When I looked at the code with this goal, I found that the current
>> implementation doesn't support it, mainly because it uses the volume name
>> in the file it creates. This is fine for one cluster, as we don't allow
>> duplicate volume names within a single cluster, and OK for multiple
>> clusters only as long as the volume names don't collide.
>>
>> To resolve this properly, we have 2 options (as per my thinking now) to
>> make it a truly global service.
>>
>> 1. Add a 'volume-id' option to the afr volume itself, so each instance
>> picks up the volume-id and uses it in the thin-arbiter file name. A
>> variant of this is submitted for review - https://review.gluster.org/23723
>> - but as it takes the volume-id from io-stats, that particular patch fails
>> in the brick-mux and shd-mux scenarios. A proper enhancement of this patch
>> is to provide a 'volume-id' option in AFR itself, so that glusterd (while
>> generating volfiles) sends the proper vol-id to each instance.
>>
>> Pros: Minimal code changes on top of the above patch.
>> Cons: One more option in AFR (not exposed to users).
>>
>> 2. Add a *cluster-id* to glusterd, and pass it to all processes. Let
>> replicate use this in the thin-arbiter file name. This too would solve
>> the issue.
>>
>> Pros: A cluster-id is good to have in any distributed system, especially
>> when there are deployments of 3 nodes each in different clusters.
>> Identifying bricks and services as part of a cluster is better.
>>
>> Cons: Code changes are larger, and in the glusterd component.
>>
>> On another note, option 1 above is purely for the Thin-Arbiter feature,
>> whereas option 2 would also be useful for debugging and for other
>> solutions that involve multiple clusters.
>>
>> Let me know what you all think about this. This is good to be discussed in
>> next week's meeting, and taken to completion.
>>

After some more code reading, and thinking about possible solutions, I found
that there is another, simpler solution to get this resolved for multiple
clusters. Currently, the thin-arbiter file name for a replica set is picked
from the 3rd (i.e., index=2) entry of the 'pending-xattr' key in the volume
file. If we make that entry unique (say, volume-id + index-of-replica-set),
this problem is solved. It needs minimal change in glusterfs code (actually,
no code change in the filesystem part at all, only in glusterd-volgen.c). A
rough sketch of what this would look like in a generated volfile is below.
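To illustrate (this is a hand-written sketch, not actual volgen output; the
'afr-pending-xattr' option key and the client/ta entry names are my
assumptions about how the generated replicate section looks, and
'<volume-id>' stands for the volume's UUID):

    # Today: the 3rd pending-xattr entry is derived from the volume name,
    # so two clusters having a volume with the same name would map to the
    # same file on a shared thin-arbiter node.
    volume testvol-replicate-0
        type cluster/replicate
        option afr-pending-xattr testvol-client-0,testvol-client-1,testvol-ta-2
        subvolumes testvol-client-0 testvol-client-1
    end-volume

    # Proposed: build the 3rd entry from volume-id + replica-set index, so
    # the file created on the thin-arbiter node is globally unique.
    volume testvol-replicate-0
        type cluster/replicate
        option afr-pending-xattr testvol-client-0,testvol-client-1,<volume-id>-ta-0
        subvolumes testvol-client-0 testvol-client-1
    end-volume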
I tried this approach while providing the replica2 option
<https://kadalu.io/rfcs/0003-kadalu-thin-arbiter-support> of the kadalu.io
project. The tests are running fine, and I got the expected goal met.

<snip>
> I am working with a goal of hosting a thin-arbiter node service (free of
> cost), to which any gluster deployment can connect, and save the cost of
> an additional replica, which is required today to avoid split-brain
> situations.
</snip>

I am happy to say that this goal is achieved. We now have
`tie-breaker.kadalu.io:/mnt`, an instance in the cloud, for anyone trying to
use a thin-arbiter. If you are not keen on deploying your own instance, you
can use this as your thin-arbiter instance. (A rough sketch of how such a
volume could be created is at the end of this mail, below the signatures.)

Note that if you are using glusterfs releases, you may want to wait for patch
https://review.gluster.org/24096 to make it into a release (probably 7.3/7.4)
before using this in production. Until then, the volume files generated by
glusterd volgen still use the volume name itself in pending-xattr, hence the
files can possibly collide.

Regards,

>> Regards,
>> Amar
>> ---
>> https://kadalu.io
>> Storage made easy for k8s

--
https://kadalu.io
Container Storage made easy!
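For anyone who wants to try it: a regular replica-2 volume simply points its
thin-arbiter at the hosted path. A rough sketch of the volume creation (the
host names and brick paths below are placeholders, and the exact CLI syntax
may differ slightly between releases):

    gluster volume create testvol replica 2 thin-arbiter 1 \
        server1:/bricks/testvol/brick \
        server2:/bricks/testvol/brick \
        tie-breaker.kadalu.io:/mnt

The two data bricks stay in your own cluster; only the small per-replica-set
id file used by thin-arbiter lives on the shared tie-breaker node.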
_______________________________________________

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/441850968

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/441850968

Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel