Re: [Gluster-devel] [Release-8] Thin-Arbiter: Unique-ID requirement

2020-02-04 Thread Amar Tumballi
On Tue, Jan 14, 2020 at 2:37 PM Atin Mukherjee wrote:

> From a design perspective, option 2 is the better choice. However, I'd like to see a
> design for how the cluster id will be generated and maintained (with peer
> addition/deletion scenarios, node replacement, etc.).
>
>
Thanks for the feedback, Atin.


After some more code reading and thinking about possible solutions, I
found another, simpler solution to resolve this for multiple clusters.

Currently, the thin-arbiter file name for a replica set is picked from the
3rd (i.e., index=2) entry of the 'pending-xattr' key in the volume file. If we
make that entry unique (say, volume-id + index-of-replica-set), this
problem is solved. It needs minimal code change in glusterfs (actually, no
code change in the filesystem part, only in glusterd-volgen.c).
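To illustrate the idea, here is a minimal Python sketch (not the actual glusterd-volgen.c change). It assumes the pending-xattr value is a comma-separated list of per-brick keys with the thin-arbiter key at index 2, for a replica 2 + thin-arbiter set. The helper names and the `<volume-id>-ta-<index>` format are hypothetical.

```python
# Minimal model of the naming rule described above (not glusterfs code).
# Assumption: the pending-xattr value is a comma-separated list of keys,
# one per brick, with the thin-arbiter key as the 3rd entry (index 2).

THIN_ARBITER_INDEX = 2


def ta_file_name(pending_xattr: str) -> str:
    """Return the thin-arbiter file name for one replica set."""
    keys = [key.strip() for key in pending_xattr.split(",")]
    return keys[THIN_ARBITER_INDEX]


def build_pending_xattr(data_keys: list[str], volume_id: str, replica_set: int) -> str:
    """Hypothetical volgen output where only the thin-arbiter entry is made
    unique, using volume-id + index-of-replica-set."""
    if len(data_keys) != 2:
        raise ValueError("expected replica 2 data bricks plus 1 thin-arbiter")
    return ",".join(data_keys + [f"{volume_id}-ta-{replica_set}"])


if __name__ == "__main__":
    # "8c2d6f1e" stands in for a (shortened, made-up) volume-id.
    value = build_pending_xattr(["data-client-0", "data-client-1"], "8c2d6f1e", 0)
    print(value)                # data-client-0,data-client-1,8c2d6f1e-ta-0
    print(ta_file_name(value))  # 8c2d6f1e-ta-0
```

Since the AFR side only reads whatever sits at index 2, changing what volgen writes there is enough to keep names from different clusters apart.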

I tried this approach while providing the replica2 option of the kadalu.io
project. The tests are running fine, and the expected goal was met.



>  I am working with a goal of hosting a thin-arbiter node service (free of
> cost), for which any gluster deployment can connect, and save their cost of
> an additional replica, which is required today to not get into split-brain
> situation.



I am happy to say that this goal has been achieved. We now have
`tie-breaker.kadalu.io:/mnt`, an instance in the cloud, for anyone who wants to
use a thin-arbiter. If you are not keen to deploy your own instance, you
can use this one as your thin-arbiter. Note that if you are using glusterfs
releases, you may want to wait for patch https://review.gluster.org/24096
to make it into a release (probably 7.3/7.4) before using this in production.
Until then, volfiles generated by glusterd volgen still use the volume name
itself in pending-xattr, so files from different clusters may collide.

Regards,



Re: [Gluster-devel] [Release-8] Thin-Arbiter: Unique-ID requirement

2020-01-14 Thread Atin Mukherjee
From a design perspective, option 2 (a cluster-id in glusterd) is the better
choice. However, I'd like to see a design for how the cluster id will be
generated and maintained (with peer addition/deletion scenarios, node
replacement, etc.).

___

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/441850968


NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/441850968

Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



[Gluster-devel] [Release-8] Thin-Arbiter: Unique-ID requirement

2020-01-14 Thread Amar Tumballi
Hello,

As we gear up for Release-8 and its planning, I wanted to bring up
one of my favorite topics, 'Thin-Arbiter' (a.k.a. Tie-Breaker/Metro-Cluster,
etc.).

We released thin-arbiter in v7.0 itself, and it works great when
there is just one gluster cluster. I am talking about a situation that
involves multiple gluster clusters, and easier management of thin-arbiter
nodes. (Ref: https://github.com/gluster/glusterfs/issues/763)

I am working toward hosting a thin-arbiter node service (free of
cost) to which any gluster deployment can connect, saving the cost of
the additional replica that is required today to avoid split-brain
situations. Tie-breaker storage and process needs are so small that we can
easily handle all gluster deployments to date on just one machine. When I
looked at the code with this goal, I found that the current implementation
doesn't support it, mainly because it uses 'volumename' in the file it
creates. This works for one cluster, as we don't allow duplicate volume
names within a single cluster, and for multiple clusters only as long as
volume names don't collide.
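A small model of the problem, assuming (hypothetically) that the thin-arbiter file name is just the volume name plus a replica-set index: two independent clusters that both have a volume named `data` end up creating the same file on a shared tie-breaker. The function names and the `-ta-` pattern are illustrative, not glusterfs code.

```python
# Hypothetical model of a shared thin-arbiter (tie-breaker) brick.
# Assumption: the file name depends only on the volume name and the
# index of the replica set, as described in the mail above.


def ta_file_name_by_volname(volname: str, replica_set: int) -> str:
    # Hypothetical naming pattern, for illustration only.
    return f"{volname}-ta-{replica_set}"


def collisions(clusters: dict[str, list[str]]) -> set[str]:
    """Return file names that more than one cluster would create."""
    owners: dict[str, str] = {}
    clashes: set[str] = set()
    for cluster, volumes in clusters.items():
        for volname in volumes:
            name = ta_file_name_by_volname(volname, 0)
            # Volume names are unique within a cluster, so a repeat
            # can only come from a different cluster.
            if name in owners and owners[name] != cluster:
                clashes.add(name)
            owners.setdefault(name, cluster)
    return clashes


if __name__ == "__main__":
    # Two independent clusters both happen to name a volume "data".
    print(collisions({"cluster-a": ["data", "logs"], "cluster-b": ["data"]}))
    # prints {'data-ta-0'}
```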

To resolve this properly, we have 2 options (as per my thinking now) to make
it a truly global service.

1. Add a 'volume-id' option to the AFR volume itself, so each instance picks
up the volume-id and uses it in the thin-arbiter file name. A variant of this
has been submitted for review (https://review.gluster.org/23723), but as it
takes the volume-id from io-stats, this particular patch fails in brick-mux
and shd-mux scenarios. A proper enhancement of this patch is to provide a
'volume-id' option in AFR itself, so that glusterd (while generating
volfiles) passes the proper vol-id to each instance.

Pros: Minimal code changes to the above patch.
Cons: One more option to AFR (not exposed to users).

2. Add *cluster-id* to glusterd, and pass it to all processes. Let
replicate use it in the thin-arbiter file name. This too will solve the issue.

Pros: A cluster-id is good to have in any distributed system, especially
when there are deployments of 3 nodes each in different clusters.
Being able to identify bricks and services as part of a cluster is better.

Cons: More code changes, and in the glusterd component.
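A rough sketch of how each option could make the name unique. Everything here is hypothetical: the helper names, the name formats, and the `volume-id` option in the volfile fragment are assumptions used to compare the two proposals, not existing glusterfs code or options.

```python
import uuid

# Hypothetical naming helpers contrasting the two proposals; none of these
# names exist in glusterfs.


def ta_name_option1(volume_id: uuid.UUID, replica_set: int) -> str:
    # Option 1: AFR receives the volume's UUID from glusterd via the volfile.
    return f"{volume_id}-ta-{replica_set}"


def ta_name_option2(cluster_id: uuid.UUID, volname: str, replica_set: int) -> str:
    # Option 2: a cluster-wide UUID from glusterd scopes the volume name.
    return f"{cluster_id}-{volname}-ta-{replica_set}"


def replicate_section(name: str, subvolumes: list[str], volume_id: uuid.UUID) -> str:
    # What volgen might emit for option 1; 'volume-id' is the proposed
    # AFR option, not an existing one.
    return "\n".join([
        f"volume {name}",
        "    type cluster/replicate",
        f"    option volume-id {volume_id}",
        f"    subvolumes {' '.join(subvolumes)}",
        "end-volume",
    ])


if __name__ == "__main__":
    cluster_a, cluster_b = uuid.uuid4(), uuid.uuid4()
    # Both clusters have a volume called "data"; the names no longer clash.
    print(ta_name_option2(cluster_a, "data", 0) != ta_name_option2(cluster_b, "data", 0))
    print(replicate_section("data-replicate-0",
                            ["data-client-0", "data-client-1", "data-ta-2"],
                            uuid.uuid4()))
```

Option 1 relies on the per-volume UUID that glusterd already maintains, whereas option 2 needs a cluster-id that stays stable across peer addition/deletion and node replacement.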

On another note, option 1 above is purely for the Thin-Arbiter feature,
whereas option 2 would also be useful in debugging, and in other solutions
that involve multiple clusters.

Let me know what you all think about this. It would be good to discuss this in
next week's meeting and take it to completion.

Regards,
Amar
---
https://kadalu.io
Storage made easy for k8s