On Thu, Oct 17, 2019 at 11:44 AM deepu srinivasan <sdeep...@gmail.com>
wrote:

> Thank you for your response.
> We have tried the use case you mentioned above.
>
> Case 1: The primary node is permanently down (hardware failure).
> In this case, the geo-replication session cannot be stopped; the stop
> command fails with a message along the lines of "start the primary node
> and then stop".
> Since I cannot stop the session, I cannot delete it either.
>

Please try "stop force" and let us know if that works.
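
A sketch of the force stop, using placeholder names for the master volume
and the slave host (substitute your own):

```
# Run from a master node; <mastervol>, <slavehost> and <slavevol> are placeholders
gluster volume geo-replication <mastervol> <slavehost>::<slavevol> stop force
```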


> On Thu, Oct 17, 2019 at 8:32 AM Aravinda Vishwanathapura Krishna Murthy <
> avish...@redhat.com> wrote:
>
>>
>> On Wed, Oct 16, 2019 at 11:08 PM deepu srinivasan <sdeep...@gmail.com>
>> wrote:
>>
>>> Hi Users
>>> Is there a single point of failure in GeoReplication for gluster?
>>> My Case:
>>> I use 3 nodes in both the master and the slave volume.
>>> Master volume: Node1, Node2, Node3
>>> Slave volume: Node4, Node5, Node6
>>> I tried to recreate the scenario to test for a single point of failure.
>>>
>>> Geo-Replication Status:
>>>
>>> *Master Node         Slave Node         Status *
>>> Node1                   Node4                  Active
>>> Node2                   Node4                  Passive
>>> Node3                   Node4                  Passive
>>>
>>> Step 1: Stopped the glusterd daemon on Node4 (commands sketched below
>>> the status output).
>>> Result: Only two node statuses remained, as shown below.
>>>
>>> *Master Node         Slave Node         Status *
>>> Node2                   Node4                  Passive
>>> Node3                   Node4                  Passive
>>>
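>>> For reference, a sketch of the commands used for this step (assuming
>>> glusterd is managed by systemd; volume names are placeholders):
>>>
>>> ```
>>> # On Node4: stop the gluster management daemon
>>> systemctl stop glusterd
>>> # On a master node: check the geo-replication session status
>>> gluster volume geo-replication <mastervol> Node4::<slavevol> status
>>> ```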
>>>
>>> Will the geo-replication session go down if the primary slave node is down?
>>>
>>
>>
>> Hi Deepu,
>>
>> Geo-replication depends on the primary slave node to get information
>> about the other nodes that are part of the slave volume.
>>
>> Once the workers are started, they no longer depend on the primary slave
>> node, so the session will not fail if the primary goes down. But if any
>> other slave node goes down, the affected worker tries to connect to a
>> different slave node; to find one, it runs the volume status command on
>> the primary slave node, like this:
>>
>> ```
>> ssh -i <georep-pem> <primary-node> gluster volume status <slavevol>
>> ```
>>
>> If the primary slave node is down, the above command fails and the
>> worker cannot get the list of slave nodes it can connect to.
>>
>> This is only a temporary failure until the primary node comes back
>> online. If the primary node is permanently down, run the geo-rep delete
>> and geo-rep create commands again with a new primary node. (Note:
>> geo-rep delete and create remember the last sync time, so replication
>> resumes from that point once the session is started again.)
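>>
>> For example, the delete and re-create could look roughly like this
>> (placeholder names; <new-primary> is whichever slave node you choose as
>> the new primary):
>>
>> ```
>> # Delete the existing session (keeps the last sync time by default)
>> gluster volume geo-replication <mastervol> <old-primary>::<slavevol> delete
>>
>> # Re-create and start the session against the new primary slave node
>> gluster volume geo-replication <mastervol> <new-primary>::<slavevol> create push-pem force
>> gluster volume geo-replication <mastervol> <new-primary>::<slavevol> start
>> ```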
>>
>> I will evaluate the possibility of caching the list of slave nodes so
>> that one of them can be used as a backup primary node in case of
>> failures, and will open a GitHub issue for this.
>>
>> Thanks for reporting the issue.
>>
>> --
>> regards
>> Aravinda VK
>>
>

-- 
regards
Aravinda VK