Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds
Hi @Strahil Nikolov<mailto:hunter86...@yahoo.com>,

We are using 9.4 on all the nodes.

[anant@drtier1data ~]$ glusterd --version
glusterfs 9.4
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation.

Thanks,
Anant

From: Strahil Nikolov
Sent: 31 January 2024 4:18 PM
To: Anant Saraswat ; Aravinda
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

Hi Anant,

What version of Gluster are you using ?

Best Regards,
Strahil Nikolov
Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds
Hi Anant,

What version of Gluster are you using ?

Best Regards,
Strahil Nikolov
Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds
Hi All,

As per the documentation, if we use `delete` only, it will resume the replication from the point where it left off before the session was deleted, so I tried that without any luck.

gluster volume geo-replication tier1data drtier1data::drtier1data delete
gluster volume geo-replication tier1data drtier1data::drtier1data create push-pem force
gluster volume geo-replication tier1data drtier1data::drtier1data start
gluster volume geo-replication tier1data drtier1data::drtier1data status

I have also checked the drtier1data logs, and all I can see is that master1 connects to drtier1data and sends a disconnect after 5 seconds. Please check the following logs from drtier1data.

[2024-01-30 21:04:03.016805 +0000] I [MSGID: 114046] [client-handshake.c:857:client_setvolume_cbk] 0-drtier1data-client-0: Connected, attached to remote volume [{conn-name=drtier1data-client-0}, {remote_subvol=/opt/tier1data2019/brick}]
[2024-01-30 21:04:03.020148 +0000] I [fuse-bridge.c:5296:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.33
[2024-01-30 21:04:03.020197 +0000] I [fuse-bridge.c:5924:fuse_graph_sync] 0-fuse: switched to graph 0
[2024-01-30 21:04:08.573873 +0000] I [fuse-bridge.c:6233:fuse_thread_proc] 0-fuse: initiating unmount of /tmp/gsyncd-aux-mount-c8c41k2k
[2024-01-30 21:04:08.575131 +0000] W [glusterfsd.c:1429:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x817a) [0x7fb907e2e17a] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xfd) [0x55f97b17dbfd] -->/usr/sbin/glusterfs(cleanup_and_exit+0x58) [0x55f97b17da48] ) 0-: received signum (15), shutting down
[2024-01-30 21:04:08.575227 +0000] I [fuse-bridge.c:7063:fini] 0-fuse: Unmounting '/tmp/gsyncd-aux-mount-c8c41k2k'.
[2024-01-30 21:04:08.575256 +0000] I [fuse-bridge.c:7068:fini] 0-fuse: Closing fuse connection to '/tmp/gsyncd-aux-mount-c8c41k2k'.

Can anyone suggest how I can find the reason for these disconnect requests from master1, or what I should check next?

Many thanks,
A

From: Gluster-users on behalf of Anant Saraswat
Sent: 30 January 2024 2:14 PM
To: gluster-users@gluster.org ; Strahil Nikolov
Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

Hello Everyone,

I am looking for some help. Can anyone please suggest if it's possible to promote a master node to be the primary in the geo-replication session? We have three master nodes and one secondary node. We are facing issues where geo-replication is consistently failing from the primary master node, and we want to check whether it works fine from another master node. Any guidance or assistance would be highly appreciated.

Many thanks,
Anant

From: Anant Saraswat
Sent: 29 January 2024 3:55 PM
To: gluster-users@gluster.org ; Strahil Nikolov
Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

Hi @Strahil Nikolov<mailto:hunter86...@yahoo.com>,

We have been running this geo-replication for more than 5 years and it was working fine till last week, so I don't think it is something that was missed in the initial setup, but I am unable to understand why it's not working now. I have enabled SSH debug logging on the secondary node (drtier1data), and I can see this in the logs.
Jan 29 14:25:52 drtier1data sshd[1268110]: debug1: server_input_channel_req: channel 0 request exec reply 1
Jan 29 14:25:52 drtier1data sshd[1268110]: debug1: session_by_channel: session 0 channel 0
Jan 29 14:25:52 drtier1data sshd[1268110]: debug1: session_input_channel_req: session 0 req exec
Jan 29 14:25:52 drtier1data sshd[1268110]: Starting session: command for root from XX.236.28.58 port 53082 id 0
Jan 29 14:25:52 drtier1data sshd[1268095]: debug1: session_new: session 0
Jan 29 14:25:58 drtier1data sshd[1268110]: debug1: Received SIGCHLD.
Jan 29 14:25:58 drtier1data sshd[1268110]: debug1: session_by_pid: pid 1268111
Jan 29 14:25:58 drtier1data sshd[1268110]: debug1: session_exit_message: session 0 channel 0 pid 1268111
Jan 29 14:25:58 drtier1data sshd[1268110]: debug1: session_exit_message: release channel 0
Jan 29 14:25:58 drtier1data sshd[1268110]: Received disconnect from XX.236.28.58 port 53082:11: disconnected by user
Jan 29 14:25:58 drtier1data sshd[1268110]: Disconnected from user root XX.236.28.58 port 53082
Jan 29 14:25:58 drtier1data sshd[1268110]: debug1: do_cleanup
Jan 29 14:25:58 drtier1data sshd[1268095]: debug1: do_cleanup
Jan 29 14:25:58 drtier1data sshd[1268095]: debug1: PAM: cleanup
Jan 29 14:25:58 drtier1data sshd[1268095]: debug1: PAM: closing session
Jan 29 14:25:58 drtier1data sshd[1268095]: pam_unix(sshd:session): session closed for user root
Jan 29 14:25:58 drtier1data sshd[1268095]: debug1: PAM: deleting credentials

As per the above logs, the drtier1data node is getting SIGCHLD from master1. (Received disconnect from XX.236.28.58 port 53082:11: disconnected by user)
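On the `delete` behaviour mentioned at the top of this message: as far as I understand the docs, a plain delete keeps the saved sync time so a later create/start resumes from that point, while the reset-sync-time variant discards it and forces a full re-sync. A minimal sketch of the two variants for this session (assuming reset-sync-time is available in the 9.x CLI):

# plain delete keeps the stored sync time, so a later create/start resumes from where it stopped
gluster volume geo-replication tier1data drtier1data::drtier1data delete

# delete reset-sync-time also discards the stored sync time, so the next start crawls from the beginning
gluster volume geo-replication tier1data drtier1data::drtier1data delete reset-sync-time

gluster volume geo-replication tier1data drtier1data::drtier1data create push-pem force
gluster volume geo-replication tier1data drtier1data::drtier1data start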
Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds
Hi @Strahil Nikolov<mailto:hunter86...@yahoo.com>,

We have been running this geo-replication for more than 5 years and it was working fine till last week, so I don't think it is something that was missed in the initial setup, but I am unable to understand why it's not working now. I have enabled SSH debug logging on the secondary node (drtier1data), and I can see this in the logs.

Jan 29 14:25:52 drtier1data sshd[1268110]: debug1: server_input_channel_req: channel 0 request exec reply 1
Jan 29 14:25:52 drtier1data sshd[1268110]: debug1: session_by_channel: session 0 channel 0
Jan 29 14:25:52 drtier1data sshd[1268110]: debug1: session_input_channel_req: session 0 req exec
Jan 29 14:25:52 drtier1data sshd[1268110]: Starting session: command for root from XX.236.28.58 port 53082 id 0
Jan 29 14:25:52 drtier1data sshd[1268095]: debug1: session_new: session 0
Jan 29 14:25:58 drtier1data sshd[1268110]: debug1: Received SIGCHLD.
Jan 29 14:25:58 drtier1data sshd[1268110]: debug1: session_by_pid: pid 1268111
Jan 29 14:25:58 drtier1data sshd[1268110]: debug1: session_exit_message: session 0 channel 0 pid 1268111
Jan 29 14:25:58 drtier1data sshd[1268110]: debug1: session_exit_message: release channel 0
Jan 29 14:25:58 drtier1data sshd[1268110]: Received disconnect from XX.236.28.58 port 53082:11: disconnected by user
Jan 29 14:25:58 drtier1data sshd[1268110]: Disconnected from user root XX.236.28.58 port 53082
Jan 29 14:25:58 drtier1data sshd[1268110]: debug1: do_cleanup
Jan 29 14:25:58 drtier1data sshd[1268095]: debug1: do_cleanup
Jan 29 14:25:58 drtier1data sshd[1268095]: debug1: PAM: cleanup
Jan 29 14:25:58 drtier1data sshd[1268095]: debug1: PAM: closing session
Jan 29 14:25:58 drtier1data sshd[1268095]: pam_unix(sshd:session): session closed for user root
Jan 29 14:25:58 drtier1data sshd[1268095]: debug1: PAM: deleting credentials

As per the above logs, the drtier1data node is getting SIGCHLD from master1. (Received disconnect from XX.236.28.58 port 53082:11: disconnected by user)

Also, I have checked the gsyncd.log on master1, which says "SSH: SSH connection between master and slave established. [{duration=1.7277}]", which means passwordless ssh is working fine.

As per my understanding, master1 can connect to the drtier1data server, the geo-replication status then changes to Active --> History Crawl, and then something happens on master1 which triggers the SSH disconnect.

Is it possible to change the master node in geo-replication so that we can mark master2 as primary instead of master1?

I am really struggling to fix this issue. Please help; any pointer is appreciated!

Many thanks,
Anant

From: Gluster-users on behalf of Anant Saraswat
Sent: 29 January 2024 12:20 AM
To: gluster-users@gluster.org ; Strahil Nikolov
Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

Hi Strahil,

As mentioned in my last email, I have copied the gluster public key from master3 to the secondary server, and I can now ssh from all master nodes to the secondary server, but I am still getting the same error.
[root@master1 geo-replication]# ssh root@drtier1data -i /var/lib/glusterd/geo-replication/secret.pem
Last login: Mon Jan 29 00:14:32 2024 from
[root@drtier1data ~]#

[root@master2 ~]# ssh -i /var/lib/glusterd/geo-replication/secret.pem root@drtier1data
Last login: Mon Jan 29 00:02:34 2024 from
[root@drtier1data ~]#

[root@master3 ~]# ssh -i /var/lib/glusterd/geo-replication/secret.pem root@drtier1data
Last login: Mon Jan 29 00:14:41 2024 from
[root@drtier1data ~]#

Thanks,
Anant

From: Strahil Nikolov
Sent: 28 January 2024 10:07 PM
To: Anant Saraswat ; gluster-users@gluster.org
Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

Gluster doesn't use the ssh key in /root/.ssh, thus you need to exchange the public key that corresponds to /var/lib/glusterd/geo-replication/secret.pem . If you don't know the pub key, google how to obtain it from the private key. Ensure that all hosts can ssh to the secondary before proceeding with the troubleshooting.

Best Regards,
Strahil Nikolov

On Sun, Jan 28, 2024 at 15:58, Anant Saraswat wrote:

Hi All,

I have now copied /var/lib/glusterd/geo-replication/secret.pem.pub (public key) from master3 to drtier1data /root/.ssh/authorized_keys, and now I can ssh from master node3 to drtier1data using the georep key (/var/lib/glusterd/geo-replication/secret.pem). But I am still getting the same error, and geo-replication is getting faulty again and again.

[2024-01-28 13:46:38.897683] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1706449598}]
[2024-01-28 13:46:38.922491] I [gsyncd
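One place that often explains the ENOTCONN in the faulty loop is the worker's own mount/client log on the master. The paths below assume the default geo-replication log layout for this session; the mnt-*.log file name in particular is an assumption, so adjust to whatever actually exists on master1:

# geo-replication logs for this session on master1 (gsyncd.log plus the aux-mount client log)
ls -l /var/log/glusterfs/geo-replication/tier1data_drtier1data_drtier1data/
# the mnt-*.log is the FUSE client log for the worker's auxiliary mount; it should record
# which brick/server connection was lost just before the ENOTCONN
tail -n 100 /var/log/glusterfs/geo-replication/tier1data_drtier1data_drtier1data/mnt-opt-tier1data2019-brick.log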
Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds
Hi Strahil,

As mentioned in my last email, I have copied the gluster public key from master3 to the secondary server, and I can now ssh from all master nodes to the secondary server, but I am still getting the same error.

[root@master1 geo-replication]# ssh root@drtier1data -i /var/lib/glusterd/geo-replication/secret.pem
Last login: Mon Jan 29 00:14:32 2024 from
[root@drtier1data ~]#

[root@master2 ~]# ssh -i /var/lib/glusterd/geo-replication/secret.pem root@drtier1data
Last login: Mon Jan 29 00:02:34 2024 from
[root@drtier1data ~]#

[root@master3 ~]# ssh -i /var/lib/glusterd/geo-replication/secret.pem root@drtier1data
Last login: Mon Jan 29 00:14:41 2024 from
[root@drtier1data ~]#

Thanks,
Anant

From: Strahil Nikolov
Sent: 28 January 2024 10:07 PM
To: Anant Saraswat ; gluster-users@gluster.org
Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

Gluster doesn't use the ssh key in /root/.ssh, thus you need to exchange the public key that corresponds to /var/lib/glusterd/geo-replication/secret.pem . If you don't know the pub key, google how to obtain it from the private key. Ensure that all hosts can ssh to the secondary before proceeding with the troubleshooting.

Best Regards,
Strahil Nikolov

On Sun, Jan 28, 2024 at 15:58, Anant Saraswat wrote:

Hi All,

I have now copied /var/lib/glusterd/geo-replication/secret.pem.pub (public key) from master3 to drtier1data /root/.ssh/authorized_keys, and now I can ssh from master node3 to drtier1data using the georep key (/var/lib/glusterd/geo-replication/secret.pem). But I am still getting the same error, and geo-replication is getting faulty again and again.

[2024-01-28 13:46:38.897683] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1706449598}]
[2024-01-28 13:46:38.922491] I [gsyncdstatus(worker /opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
[2024-01-28 13:46:38.923127] I [gsyncdstatus(worker /opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
[2024-01-28 13:46:38.923313] I [master(worker /opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1705935991, 0)}, {etime=1706449598}, {entry_stime=(1705935991, 0)}]
[2024-01-28 13:46:39.973584] I [master(worker /opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time [{stime=(1705935991, 0)}]
[2024-01-28 13:46:40.98970] E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] : Gluster Mount process exited [{error=ENOTCONN}]
[2024-01-28 13:46:40.757691] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:46:40.766860] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:46:50.793311] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:46:50.793469] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:46:50.874474] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2024-01-28 13:46:52.659114] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.7844}]
[2024-01-28 13:46:52.659461] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2024-01-28 13:46:53.698769] I [resource(worker /opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0392}]
[2024-01-28 13:46:53.698984] I [subcmds(worker /opt/tier1data2019/brick):84:subcmd_worker] : Worker spawn successful. Acknowledging back to monitor
[2024-01-28 13:46:55.831999] I [master(worker /opt/tier1data2019/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:46:55.832354] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1706449615}]
[2024-01-28 13:46:55.854684] I [gsyncdstatus(worker /opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
[2024-01-28 13:46:55.855251] I [gsyncdstatus(worker /opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
[2024-01-28 13:46:55.855419] I [master(worker /opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1705935991, 0)}, {etime=1706449615}, {entry_stime=(1705935991, 0)}]
[2024-01-28 13:46:56.905496] I [master(worker /opt
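Since the worker dies right after "Mounting gluster volume locally...", one rough way to reproduce the problem outside of gsyncd is to mount the master volume by hand on master1 and see whether that FUSE mount also drops with ENOTCONN. Just a sketch; the mount point name is arbitrary:

mkdir -p /mnt/georep-test
mount -t glusterfs -o log-level=DEBUG localhost:/tier1data /mnt/georep-test
ls /mnt/georep-test        # does a simple lookup work, or does it hang / fail with ENOTCONN?
umount /mnt/georep-test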
Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds
Status Change [{status=Faulty}]
[2024-01-28 13:48:15.430175] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:48:15.430308] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:48:15.510770] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2024-01-28 13:48:17.240311] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.7294}]
[2024-01-28 13:48:17.240509] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2024-01-28 13:48:18.279007] I [resource(worker /opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0384}]
[2024-01-28 13:48:18.279195] I [subcmds(worker /opt/tier1data2019/brick):84:subcmd_worker] : Worker spawn successful. Acknowledging back to monitor
[2024-01-28 13:48:20.455937] I [master(worker /opt/tier1data2019/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:48:20.456274] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1706449700}]
[2024-01-28 13:48:20.464288] I [gsyncdstatus(worker /opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
[2024-01-28 13:48:20.464807] I [gsyncdstatus(worker /opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
[2024-01-28 13:48:20.464970] I [master(worker /opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1705935991, 0)}, {etime=1706449700}, {entry_stime=(1705935991, 0)}]
[2024-01-28 13:48:21.514201] I [master(worker /opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time [{stime=(1705935991, 0)}]
[2024-01-28 13:48:21.644609] E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] : Gluster Mount process exited [{error=ENOTCONN}]
[2024-01-28 13:48:22.284920] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:48:22.286189] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:48:32.312378] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:48:32.312526] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:48:32.393484] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2024-01-28 13:48:34.91825] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.6981}]
[2024-01-28 13:48:34.92130] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
Thanks,
Anant

From: Anant Saraswat
Sent: 28 January 2024 1:33 AM
To: Strahil Nikolov ; gluster-users@gluster.org
Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

Hi @Strahil Nikolov,

I have checked the ssh connection from all the master servers and I can ssh drtier1data from the master1 and master2 servers (old master servers), but I am unable to ssh drtier1data from master3 (new node).

[root@master3 ~]# ssh -i /var/lib/glusterd/geo-replication/secret.pem root@drtier1data
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 325, in <module>
    main()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 259, in main
    if args.subcmd in ("worker"):
TypeError: 'in <string>' requires string as left operand, not NoneType
Connection to drtier1data closed.

But I am able to ssh drtier1data from master3 without using the georep key.

[root@master3 ~]# ssh root@drtier1data
Last login: Sun Jan 28 01:16:25 2024 from 87.246.74.32
[root@drtier1data ~]#

Also, today I restarted the gluster server on master1 as geo-replication is trying to be active from the master1 server, and sometimes I am getting the following error in gsyncd.log

[2024-01-28 01:27:24.722663] E [syncdutils(worker /opt/tier1data2019/brick):847:errlog] Popen: command returned error [{cmd=rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --existing --xattrs --acls --ignore-missing-args . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-0exuoeg7/75785990b3233f5dbbab9f43cc3ed895.sock drtier1data:/proc/55341
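A note on the traceback above, in case it helps: with a standard push-pem setup, the key behind secret.pem is normally installed on the secondary with a forced command that runs gsyncd, so an interactive ssh with that key lands in gsyncd's argument parser (hence the traceback) rather than in a shell. A quick way to confirm the entries are in place on drtier1data, assuming the default authorized_keys location for root:

# on drtier1data - each geo-rep key pushed by push-pem normally carries a command="... gsyncd ..." prefix
grep -n gsyncd /root/.ssh/authorized_keys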
Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds
[resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2024-01-28 13:48:17.240311] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.7294}]
[2024-01-28 13:48:17.240509] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2024-01-28 13:48:18.279007] I [resource(worker /opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0384}]
[2024-01-28 13:48:18.279195] I [subcmds(worker /opt/tier1data2019/brick):84:subcmd_worker] : Worker spawn successful. Acknowledging back to monitor
[2024-01-28 13:48:20.455937] I [master(worker /opt/tier1data2019/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:48:20.456274] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1706449700}]
[2024-01-28 13:48:20.464288] I [gsyncdstatus(worker /opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
[2024-01-28 13:48:20.464807] I [gsyncdstatus(worker /opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
[2024-01-28 13:48:20.464970] I [master(worker /opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1705935991, 0)}, {etime=1706449700}, {entry_stime=(1705935991, 0)}]
[2024-01-28 13:48:21.514201] I [master(worker /opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time [{stime=(1705935991, 0)}]
[2024-01-28 13:48:21.644609] E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] : Gluster Mount process exited [{error=ENOTCONN}]
[2024-01-28 13:48:22.284920] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:48:22.286189] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:48:32.312378] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:48:32.312526] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:48:32.393484] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2024-01-28 13:48:34.91825] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.6981}]
[2024-01-28 13:48:34.92130] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally...

Thanks,
Anant

From: Anant Saraswat
Sent: 28 January 2024 1:33 AM
To: Strahil Nikolov ; gluster-users@gluster.org
Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

Hi @Strahil Nikolov<mailto:hunter86...@yahoo.com>,

I have checked the ssh connection from all the master servers and I can ssh drtier1data from the master1 and master2 servers (old master servers), but I am unable to ssh drtier1data from master3 (new node).
[root@master3 ~]# ssh -i /var/lib/glusterd/geo-replication/secret.pem root@drtier1data
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 325, in <module>
    main()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 259, in main
    if args.subcmd in ("worker"):
TypeError: 'in <string>' requires string as left operand, not NoneType
Connection to drtier1data closed.

But I am able to ssh drtier1data from master3 without using the georep key.

[root@master3 ~]# ssh root@drtier1data
Last login: Sun Jan 28 01:16:25 2024 from 87.246.74.32
[root@drtier1data ~]#

Also, today I restarted the gluster server on master1 as geo-replication is trying to be active from the master1 server, and sometimes I am getting the following error in gsyncd.log

[2024-01-28 01:27:24.722663] E [syncdutils(worker /opt/tier1data2019/brick):847:errlog] Popen: command returned error [{cmd=rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --existing --xattrs --acls --ignore-missing-args . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-0exuoeg7/75785990b3233f5dbbab9f43cc3ed895.sock drtier1data:/proc/553418/cwd}, {error=3}]

Many thanks,
Anant

From: Strahil Nikolov
Sent: 27 January 2024 5:25 AM
To: gluster-users@gluster.org ; Anant Saraswat
Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds
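For completeness, it can also help to dump the session's effective settings and the detailed worker status from any master, to see exactly which ssh/rsync command lines and which slave node the faulty worker is using; both are standard geo-replication commands, shown here for this session:

gluster volume geo-replication tier1data drtier1data::drtier1data config
gluster volume geo-replication tier1data drtier1data::drtier1data status detail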
Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds
Hi @Strahil Nikolov<mailto:hunter86...@yahoo.com>,

I have checked the ssh connection from all the master servers and I can ssh drtier1data from the master1 and master2 servers (old master servers), but I am unable to ssh drtier1data from master3 (new node).

[root@master3 ~]# ssh -i /var/lib/glusterd/geo-replication/secret.pem root@drtier1data
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 325, in <module>
    main()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 259, in main
    if args.subcmd in ("worker"):
TypeError: 'in <string>' requires string as left operand, not NoneType
Connection to drtier1data closed.

But I am able to ssh drtier1data from master3 without using the georep key.

[root@master3 ~]# ssh root@drtier1data
Last login: Sun Jan 28 01:16:25 2024 from 87.246.74.32
[root@drtier1data ~]#

Also, today I restarted the gluster server on master1 as geo-replication is trying to be active from the master1 server, and sometimes I am getting the following error in gsyncd.log

[2024-01-28 01:27:24.722663] E [syncdutils(worker /opt/tier1data2019/brick):847:errlog] Popen: command returned error [{cmd=rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --existing --xattrs --acls --ignore-missing-args . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-0exuoeg7/75785990b3233f5dbbab9f43cc3ed895.sock drtier1data:/proc/553418/cwd}, {error=3}]

Many thanks,
Anant

From: Strahil Nikolov
Sent: 27 January 2024 5:25 AM
To: gluster-users@gluster.org ; Anant Saraswat
Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

Don't forget to test with the georep key. I think it was /var/lib/glusterd/geo-replication/secret.pem

Best Regards,
Strahil Nikolov

On Saturday, 27 January 2024 at 07:24:07 GMT+2, Strahil Nikolov wrote:

Hi Anant,

I would first start checking if you can do ssh from all masters to the slave node. If you haven't set up a dedicated user for the session, then gluster is using root.

Best Regards,
Strahil Nikolov

On Friday, 26 January 2024 at 18:07:59 GMT+2, Anant Saraswat wrote:

Hi All,

I have run the following commands on master3, and that has added master3 to geo-replication.

gluster system:: execute gsec_create
gluster volume geo-replication tier1data drtier1data::drtier1data create push-pem force
gluster volume geo-replication tier1data drtier1data::drtier1data stop
gluster volume geo-replication tier1data drtier1data::drtier1data start

Now I am able to start the geo-replication, but I am getting the same error.

[2024-01-24 19:51:24.80892] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-24 19:51:24.81020] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-24 19:51:24.158021] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2024-01-24 19:51:25.951998] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.7938}]
[2024-01-24 19:51:25.952292] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2024-01-24 19:51:26.986974] I [resource(worker /opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0346}]
[2024-01-24 19:51:26.987137] I [subcmds(worker /opt/tier1data2019/brick):84:subcmd_worker] : Worker spawn successful. Acknowledging back to monitor
[2024-01-24 19:51:29.139131] I [master(worker /opt/tier1data2019/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-24 19:51:29.139531] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1706125889}]
[2024-01-24 19:51:29.173877] I [gsyncdstatus(worker /opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
[2024-01-24 19:51:29.174407] I [gsyncdstatus(worker /opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
[2024-01-24 19:51:29.174558] I [master(worker /opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1705935991, 0)}, {etime=1706125889}, {entry_stime=(1705935991, 0)}]
[2024-01-24 19:51:30.251965] I [master(worker /opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time [{stime=(1705935991, 0)}]
[2024-
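As a sanity check on the gsec_create / create push-pem force sequence quoted above, it may be worth confirming that the combined public-key file really reached the secondary. The file names below are the defaults and may differ on your installation:

# on the master where gsec_create was run - the collected public keys of all master nodes
cat /var/lib/glusterd/geo-replication/common_secret.pem.pub
# on drtier1data - every key from that file should have been appended here by push-pem
grep -c gsyncd /root/.ssh/authorized_keys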
Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds
Don't forget to test with the georep key. I think it was /var/lib/glusterd/geo-replication/secret.pem

Best Regards,
Strahil Nikolov

On Saturday, 27 January 2024 at 07:24:07 GMT+2, Strahil Nikolov wrote:

Hi Anant,

I would first start checking if you can do ssh from all masters to the slave node. If you haven't set up a dedicated user for the session, then gluster is using root.

Best Regards,
Strahil Nikolov

On Friday, 26 January 2024 at 18:07:59 GMT+2, Anant Saraswat wrote:

Hi All,

I have run the following commands on master3, and that has added master3 to geo-replication.

gluster system:: execute gsec_create
gluster volume geo-replication tier1data drtier1data::drtier1data create push-pem force
gluster volume geo-replication tier1data drtier1data::drtier1data stop
gluster volume geo-replication tier1data drtier1data::drtier1data start

Now I am able to start the geo-replication, but I am getting the same error.

[2024-01-24 19:51:24.80892] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-24 19:51:24.81020] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-24 19:51:24.158021] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2024-01-24 19:51:25.951998] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.7938}]
[2024-01-24 19:51:25.952292] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2024-01-24 19:51:26.986974] I [resource(worker /opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0346}]
[2024-01-24 19:51:26.987137] I [subcmds(worker /opt/tier1data2019/brick):84:subcmd_worker] : Worker spawn successful. Acknowledging back to monitor
[2024-01-24 19:51:29.139131] I [master(worker /opt/tier1data2019/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-24 19:51:29.139531] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1706125889}]
[2024-01-24 19:51:29.173877] I [gsyncdstatus(worker /opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
[2024-01-24 19:51:29.174407] I [gsyncdstatus(worker /opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
[2024-01-24 19:51:29.174558] I [master(worker /opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1705935991, 0)}, {etime=1706125889}, {entry_stime=(1705935991, 0)}]
[2024-01-24 19:51:30.251965] I [master(worker /opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time [{stime=(1705935991, 0)}]
[2024-01-24 19:51:30.376715] E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] : Gluster Mount process exited [{error=ENOTCONN}]
[2024-01-24 19:51:30.991856] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-24 19:51:30.993608] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]

Any idea why it's stuck in this loop?
Thanks,
Anant

From: Gluster-users on behalf of Anant Saraswat
Sent: 22 January 2024 9:00 PM
To: gluster-users@gluster.org
Subject: [Gluster-users] Geo-replication status is getting Faulty after few seconds

Hi There,

We have a Gluster setup with three master nodes in replicated mode and one slave node with geo-replication.

# gluster volume info
Volume Name: tier1data
Type: Replicate
Volume ID: 93c45c14-f700-4d50-962b-7653be471e27
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: master1:/opt/tier1data2019/brick
Brick2: master2:/opt/tier1data2019/brick
Brick3: master3:/opt/tier1data2019/brick

master1 |
master2 | --geo-replication- | drtier1data
master3 |

We added the master3 node a few months back; the initial setup consisted of 2 master nodes and one geo-replicated slave (drtier1data). Our geo-replication was functioning well with the initial two master nodes (master1 and master2), where master1 was active and master2 was in passive mode. However, today we started experiencing issues where geo-replication suddenly stopped and became stuck in a loop of Initializing..., Active..., Faulty on master1, while master2 remained in passive mode. Upon checking the gsyncd.log on the master1
Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds
Hi Anant,

I would first start checking if you can do ssh from all masters to the slave node. If you haven't set up a dedicated user for the session, then gluster is using root.

Best Regards,
Strahil Nikolov

On Friday, 26 January 2024 at 18:07:59 GMT+2, Anant Saraswat wrote:

Hi All,

I have run the following commands on master3, and that has added master3 to geo-replication.

gluster system:: execute gsec_create
gluster volume geo-replication tier1data drtier1data::drtier1data create push-pem force
gluster volume geo-replication tier1data drtier1data::drtier1data stop
gluster volume geo-replication tier1data drtier1data::drtier1data start

Now I am able to start the geo-replication, but I am getting the same error.

[2024-01-24 19:51:24.80892] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-24 19:51:24.81020] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-24 19:51:24.158021] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2024-01-24 19:51:25.951998] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.7938}]
[2024-01-24 19:51:25.952292] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2024-01-24 19:51:26.986974] I [resource(worker /opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0346}]
[2024-01-24 19:51:26.987137] I [subcmds(worker /opt/tier1data2019/brick):84:subcmd_worker] : Worker spawn successful. Acknowledging back to monitor
[2024-01-24 19:51:29.139131] I [master(worker /opt/tier1data2019/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-24 19:51:29.139531] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1706125889}]
[2024-01-24 19:51:29.173877] I [gsyncdstatus(worker /opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
[2024-01-24 19:51:29.174407] I [gsyncdstatus(worker /opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
[2024-01-24 19:51:29.174558] I [master(worker /opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1705935991, 0)}, {etime=1706125889}, {entry_stime=(1705935991, 0)}]
[2024-01-24 19:51:30.251965] I [master(worker /opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time [{stime=(1705935991, 0)}]
[2024-01-24 19:51:30.376715] E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] : Gluster Mount process exited [{error=ENOTCONN}]
[2024-01-24 19:51:30.991856] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-24 19:51:30.993608] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]

Any idea why it's stuck in this loop?

Thanks,
Anant

From: Gluster-users on behalf of Anant Saraswat
Sent: 22 January 2024 9:00 PM
To: gluster-users@gluster.org
Subject: [Gluster-users] Geo-replication status is getting Faulty after few seconds
Hi There,

We have a Gluster setup with three master nodes in replicated mode and one slave node with geo-replication.

# gluster volume info
Volume Name: tier1data
Type: Replicate
Volume ID: 93c45c14-f700-4d50-962b-7653be471e27
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: master1:/opt/tier1data2019/brick
Brick2: master2:/opt/tier1data2019/brick
Brick3: master3:/opt/tier1data2019/brick

master1 |
master2 | --geo-replication- | drtier1data
master3 |

We added the master3 node a few months back; the initial setup consisted of 2 master nodes and one geo-replicated slave (drtier1data). Our geo-replication was functioning well with the initial two master nodes (master1 and master2), where master1 was active and master2 was in passive mode. However, today we started experiencing issues where geo-replication suddenly stopped and became stuck in a loop of Initializing..., Active..., Faulty on master1, while master2 remained in passive mode. Upon checking the gsyncd.log on the master1 node, we observed the following error (please refer to the attached logs for more details):

E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] : Gluster Mount process exited [{error=ENOTCONN}]

#
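Since it is the worker's local FUSE mount of tier1data that exits with ENOTCONN, it may also be worth ruling out a brick-level problem on the master side first; these are just the standard volume health checks, nothing geo-replication specific:

gluster volume status tier1data        # are all three bricks online with valid ports?
gluster volume heal tier1data info     # any entries pending heal on the replica?
gluster peer status                    # are all three masters connected?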
Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds
Hi All,

I have run the following commands on master3, and that has added master3 to geo-replication.

gluster system:: execute gsec_create
gluster volume geo-replication tier1data drtier1data::drtier1data create push-pem force
gluster volume geo-replication tier1data drtier1data::drtier1data stop
gluster volume geo-replication tier1data drtier1data::drtier1data start

Now I am able to start the geo-replication, but I am getting the same error.

[2024-01-24 19:51:24.80892] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-24 19:51:24.81020] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-24 19:51:24.158021] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2024-01-24 19:51:25.951998] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.7938}]
[2024-01-24 19:51:25.952292] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2024-01-24 19:51:26.986974] I [resource(worker /opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0346}]
[2024-01-24 19:51:26.987137] I [subcmds(worker /opt/tier1data2019/brick):84:subcmd_worker] : Worker spawn successful. Acknowledging back to monitor
[2024-01-24 19:51:29.139131] I [master(worker /opt/tier1data2019/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-24 19:51:29.139531] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1706125889}]
[2024-01-24 19:51:29.173877] I [gsyncdstatus(worker /opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
[2024-01-24 19:51:29.174407] I [gsyncdstatus(worker /opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
[2024-01-24 19:51:29.174558] I [master(worker /opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1705935991, 0)}, {etime=1706125889}, {entry_stime=(1705935991, 0)}]
[2024-01-24 19:51:30.251965] I [master(worker /opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time [{stime=(1705935991, 0)}]
[2024-01-24 19:51:30.376715] E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] : Gluster Mount process exited [{error=ENOTCONN}]
[2024-01-24 19:51:30.991856] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-24 19:51:30.993608] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]

Any idea why it's stuck in this loop?

Thanks,
Anant

From: Gluster-users on behalf of Anant Saraswat
Sent: 22 January 2024 9:00 PM
To: gluster-users@gluster.org
Subject: [Gluster-users] Geo-replication status is getting Faulty after few seconds

Hi There,

We have a Gluster setup with three master nodes in replicated mode and one slave node with geo-replication.
# gluster volume info
Volume Name: tier1data
Type: Replicate
Volume ID: 93c45c14-f700-4d50-962b-7653be471e27
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: master1:/opt/tier1data2019/brick
Brick2: master2:/opt/tier1data2019/brick
Brick3: master3:/opt/tier1data2019/brick

master1 |
master2 | --geo-replication- | drtier1data
master3 |

We added the master3 node a few months back; the initial setup consisted of 2 master nodes and one geo-replicated slave (drtier1data). Our geo-replication was functioning well with the initial two master nodes (master1 and master2), where master1 was active and master2 was in passive mode. However, today we started experiencing issues where geo-replication suddenly stopped and became stuck in a loop of Initializing..., Active..., Faulty on master1, while master2 remained in passive mode. Upon checking the gsyncd.log on the master1 node, we observed the following error (please refer to the attached logs for more details):

E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] : Gluster Mount process exited [{error=ENOTCONN}]

# gluster volume geo-replication tier1data status
MASTER NODE    MASTER VOL    MASTER BRICK    SLAVE USER    SLAVE    SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED