Re: [Gluster-users] Geo Replication sync intervals

2024-08-18 Thread Gilberto Ferreira
Oh! Is that so?
Never mind!
I just wanted to know whether it is possible to get some control over how
long it takes between the async jobs.
Thank you for the response.
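
For reference, the tunables of an existing geo-replication session can be listed
with the config subcommand. A sketch with placeholder volume and host names (not
taken from this thread):

gluster volume geo-replication <primary-vol> <user>@<secondary-host>::<secondary-vol> config

That prints the options gsyncd knows about for the session; whether any of them
maps to an actual sync interval is a separate question.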


---


Gilberto Nunes Ferreira
(47) 99676-7530 - Whatsapp / Telegram






On Sun, 18 Aug 2024 at 17:46, Strahil Nikolov wrote:

> Hi Gilberto,
>
> I doubt you can change that stuff. Officially it's async replication and
> it might take some time to replicate.
>
> What do you want to improve ?
>
> Best Regards,
> Strahil Nikolov
>
> On Friday, 16 August 2024 at 20:31:25 GMT+3, Gilberto Ferreira <
> gilberto.nune...@gmail.com> wrote:
>
>
> Hi there.
>
> I have two sites with gluster geo replication, and everything works pretty well.
> But I want to ask about the sync intervals and whether there is some way to
> change them.
> Thanks for any tips.
> ---
>
>
> Gilbert
>
>
>
>




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo Replication sync intervals

2024-08-18 Thread Strahil Nikolov
 Hi Gilberto,
I doubt you can change that stuff. Officially it's async replication and it 
might take some time to replicate.

What do you want to improve ?

Best Regards,
Strahil Nikolov

On Friday, 16 August 2024 at 20:31:25 GMT+3, Gilberto Ferreira wrote:
 
Hi there.

I have two sites with gluster geo replication, and everything works pretty well.
But I want to ask about the sync intervals and whether there is some way to
change them. Thanks for any tips.
---

Gilbert















Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] geo-replication {error=12} on one primary node

2024-02-15 Thread Stefan Kania

Hi,

I'm still testing, and I found that I can force the error by changing the
shell of the unprivileged user on the secondary node from bash to sh. In the
first try I used "useradd -G geogruppe -m geobenutzer", so my user got /bin/sh
(dash) as its default shell. Then the error occurs. When I switched the user
to /bin/bash, the error was gone. After the test with the default shell I
removed rsync to look for the error. So now I tested with /bin/bash as the
default shell but without rsync installed. And I got:

---
[2024-02-15 08:23:23.88036] E [syncdutils(worker /gluster/brick):363:log_raise_exception] : FAIL:

Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
    tf(*aargs)
  File "/usr/libexec/glusterfs/python/syncdaemon/primary.py", line 2008, in syncjob
    po = self.sync_engine(pb, self.log_err)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1448, in rsync
    get_rsync_version(gconf.get("rsync-command")) >= "3.1.0":
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 682, in get_rsync_version
    p = subprocess.Popen([rsync_cmd, "--version"],
  File "/usr/lib/python3.11/subprocess.py", line 1024, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.11/subprocess.py", line 1901, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] Datei oder Verzeichnis nicht gefunden: 'rsync'

---
as expected. After reinstalling rsync, everything is fine again :-). So the
{error=12} came from /bin/sh being the default shell. The missing rsync was not
reported because geo-replication switched to Faulty before rsync was ever used.
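
For anyone hitting the same {error=12}, the fix boils down to giving the
unprivileged geo-rep user a bash login shell and making sure rsync is installed
on the secondary nodes. A minimal sketch, assuming Debian 12 and the user name
from this setup:

usermod -s /bin/bash geobenutzer    # switch the login shell from dash to bash
apt install rsync                   # put rsync back in place
su - geobenutzer -c 'echo $SHELL; rsync --version | head -1'    # quick sanity check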


Stefan


On 14.02.24 at 13:34, Stefan Kania wrote:

Hi Anant,

shame on me ^.^. I forgot to install rsync on that host. Switching to
log-level DEBUG helped me find the problem. Without log-level DEBUG the host
does not report the missing rsync. Maybe that could be changed.
So thank you for the hint.


Stefan

On 13.02.24 at 20:32, Anant Saraswat wrote:
gluster volume geo-replication privol01 geobenutzer@s01.gluster::secvol01 config log-level DEBUG













smime.p7s
Description: Cryptographic S/MIME signature




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] geo-replication {error=12} on one primary node

2024-02-14 Thread Stefan Kania

Hi Anant,

shame on me ^.^. I forgot to install rsync on that host. Switching to
log-level DEBUG helped me find the problem. Without log-level DEBUG the host
does not report the missing rsync. Maybe that could be changed.
So thank you for the hint.


Stefan

On 13.02.24 at 20:32, Anant Saraswat wrote:

gluster volume geo-replication privol01 geobenutzer@s01.gluster::secvol01 config log-level DEBUG






smime.p7s
Description: Cryptographic S/MIME signature




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] geo-replication {error=12} on one primary node

2024-02-13 Thread Anant Saraswat
Hi @Stefan Kania,

Please try to enable the geo-replication debug logs using the following command 
on the primary server, and recheck or resend the logs.

gluster volume geo-replication privol01 geobenutzer@s01.gluster::secvol01 config log-level DEBUG
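
Once the debugging is done, the same config interface should let you drop the
verbosity back down, for example (a sketch reusing the names from this thread;
INFO is the usual default level):

gluster volume geo-replication privol01 geobenutzer@s01.gluster::secvol01 config log-level INFO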

Thanks,
Anant


From: Gluster-users  on behalf of Stefan 
Kania 
Sent: 13 February 2024 7:11 PM
To: gluster-users@gluster.org 
Subject: [Gluster-users] geo-replication {error=12} on one primary node


Hi to all,

Yes, I saw that there is a thread about geo-replication with nearly the
same problem, I read it, but I think my problem is a bit different.

I created two volumes: the primary volume "privol01" and the secondary
volume "secvol01". All hosts have the same packages installed; all hosts are
Debian 12 with Gluster version 10.05, so even rsync is the same on all of the
hosts. (I installed one host (VM) and cloned it.)
I have:
  Volume Name: privol01
Type: Replicate
Volume ID: 93ace064-2862-41fe-9606-af5a4af9f5ab
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: p01:/gluster/brick
Brick2: p02:/gluster/brick
Brick3: p03:/gluster/brick

and:

Volume Name: secvol01
Type: Replicate
Volume ID: 4ebb7768-51da-446c-a301-dc3ea49a9ba2
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: s01:/gluster/brick
Brick2: s02:/gluster/brick
Brick3: s03:/gluster/brick

Resolving the host names works in both directions.

Here is what I did:
on all secondary hosts:

groupadd geogruppe
useradd -G geogruppe -m geobenutzer
passwd geobenutzer
ln -s /usr/sbin/gluster /usr/bin

on one of the secondary hosts:
gluster-mountbroker setup /var/mountbroker geogruppe

gluster-mountbroker add secvol01 geobenutzer

on one of the primary hosts:
ssh-keygen

ssh-copy-id geobenutzer@s01.gluster

gluster-georep-sshkey generate

gluster v geo-replication privol01 geobenutzer@s01.gluster::secvol01
create push-pem


on one of the secondary hosts:
/usr/libexec/glusterfs/set_geo_rep_pem_keys.sh

All the commands exited without an error message.
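
At this point a quick sanity check is possible before starting the session
(a sketch, reusing the same names; output formats may vary by version):

gluster-mountbroker status          # on a secondary node: mountbroker root, group and volumes
gluster volume geo-replication privol01 geobenutzer@s01.gluster::secvol01 status    # on a primary node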

Restarted glusterd on all nodes

then on the primary host:
gluster volume geo-replication privol01
geobenutzer@s01.gluster::secvol01 start

The status is showing:

PRIMARY NODE    PRIMARY VOL    PRIMARY BRICK     SECONDARY USER    SECONDARY                            SECONDARY NODE    STATUS     CRAWL STATUS    LAST_SYNCED
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
p03             privol01       /gluster/brick    geobenutzer       geobenutzer@s01.gluster::secvol01                      Passive    N/A             N/A
p02             privol01       /gluster/brick    geobenutzer       geobenutzer@s01.gluster::secvol01                      Passive    N/A             N/A
p01             privol01       /gluster/brick    geobenutzer       geobenutzer@s01.gluster::secvol01    N/A               Faulty     N/A             N/A

For p01 the status keeps cycling from "Initializing..." to "Active" /
"History Crawl" to "Faulty" and then back to "Initializing...".

But only for the primary host p01.

Here is the log from p01:

[2024-02-13 18:30:06.64585] I
[gsyncdstatus(monitor):247:set_worker_status] GeorepStatus: Worker
Status Change [{status=Initializing...}]
[2024-02-13 18:30:06.65004] I [monitor(monitor):158:monitor] Monitor:
starting gsyncd worker [{brick=/gluster/brick}, {secondary_node=s01}]
[2024-02-13 18:30:06.147194] I [resource(worker
/gluster/brick):1387:connect_remote] SSH: Initializing SSH connection
between primary and secondary...
[2024-02-13 18:30:07.85] I [resource(worker
/gluster/brick):1435:connect_remote] SSH: SSH connection between primary
and secondary established. [{duration=1.6304}]
[2024-02-13 18:30:07.777971] I [resource(worker
/gluster/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2024-02-13 18:30:08.822077] I [resource(worker
/gluster/brick):1138:connect] GLUSTER: Mounted gluster volume
[{duration=1.0438}]
[2024-02-13 18:30:08.823039] I [subcmds(worker
/gluster/brick):84:subcmd_worker] : Worker spawn successful.
Acknowledging back to monitor
[2024-02-13 18:30:10.861742] I [primary(worker
/gluster/brick):1661:register] _GPrimary: Working dir
[{path=/var/lib/misc/gluster/gsyncd/privol01_s01.gluster_secvol01/gluster-brick}]
[2024-02-13 18:30:10.864432] I [resource(worker
/gluster/brick):1291:service_loop] GLUSTER: Register time
[{time=1707849010}]
[2024-02-13 18:30:10.906805] I [gsyncdstatus(worker
/gluster/brick):280:set_active] GeorepStatus: Worker Status Change
[{status=Active}]
[2024-02-13 18:30:11.7656] I [gsyncdstatus(worker
/gluster/brick):252:set_worker_crawl_status] GeorepStatus: Crawl Status
Change [{status=History Crawl}]
[2024-02-13 

Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

2024-01-31 Thread Anant Saraswat
Hi @Strahil Nikolov,

We are using 9.4 on all the nodes.

[anant@drtier1data ~]$ glusterd --version
glusterfs 9.4
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.

Thanks,
Anant

From: Strahil Nikolov 
Sent: 31 January 2024 4:18 PM
To: Anant Saraswat ; Aravinda 

Cc: gluster-users@gluster.org 
Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few 
   seconds



Hi Anant,

What version of Gluster are you using ?

Best Regards,
Strahil Nikolov

DISCLAIMER: This email and any files transmitted with it are confidential and 
intended solely for the use of the individual or entity to whom they are 
addressed. If you have received this email in error, please notify the sender. 
This message contains confidential information and is intended only for the 
individual named. If you are not the named addressee, you should not 
disseminate, distribute or copy this email. Please notify the sender 
immediately by email if you have received this email by mistake and delete this 
email from your system.

If you are not the intended recipient, you are notified that disclosing, 
copying, distributing or taking any action in reliance on the contents of this 
information is strictly prohibited. Thanks for your cooperation.




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

2024-01-31 Thread Strahil Nikolov
Hi Anant,
What version of Gluster are you using?
Best Regards,
Strahil Nikolov



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

2024-01-30 Thread Anant Saraswat
Hi All,

As per the documentation, if we use `delete` only, it will resume the replication
from the point where it was left before deleting the session, so I tried that,
without any luck.

gluster volume geo-replication tier1data drtier1data::drtier1data delete
gluster volume geo-replication tier1data drtier1data::drtier1data create 
push-pem force
gluster volume geo-replication tier1data drtier1data::drtier1data start
gluster volume geo-replication tier1data drtier1data::drtier1data status

I have tried to check the drtier1data logs as well, and all I can see is that
master1 connects to drtier1data and sends a disconnect after 5 seconds; please
check the following logs from drtier1data.

[2024-01-30 21:04:03.016805 +] I [MSGID: 114046] 
[client-handshake.c:857:client_setvolume_cbk] 0-drtier1data-client-0: 
Connected, attached to remote volume [{conn-name=drtier1data-client-0}, 
{remote_subvol=/opt/tier1data2019/brick}]
[2024-01-30 21:04:03.020148 +] I [fuse-bridge.c:5296:fuse_init] 
0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.33
[2024-01-30 21:04:03.020197 +] I [fuse-bridge.c:5924:fuse_graph_sync] 
0-fuse: switched to graph 0
[2024-01-30 21:04:08.573873 +] I [fuse-bridge.c:6233:fuse_thread_proc] 
0-fuse: initiating unmount of /tmp/gsyncd-aux-mount-c8c41k2k
[2024-01-30 21:04:08.575131 +] W [glusterfsd.c:1429:cleanup_and_exit] 
(-->/lib64/libpthread.so.0(+0x817a) [0x7fb907e2e17a] 
-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xfd) [0x55f97b17dbfd] 
-->/usr/sbin/glusterfs(cleanup_and_exit+0x58) [0x55f97b17da48] ) 0-: received 
signum (15), shutting down
[2024-01-30 21:04:08.575227 +] I [fuse-bridge.c:7063:fini] 0-fuse: 
Unmounting '/tmp/gsyncd-aux-mount-c8c41k2k'.
[2024-01-30 21:04:08.575256 +] I [fuse-bridge.c:7068:fini] 0-fuse: Closing 
fuse connection to '/tmp/gsyncd-aux-mount-c8c41k2k'.

Can anyone suggest how I can find the reason for these disconnect requests
from master1, or what I should check next?
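
One place that may help narrow this down is the client log of the worker's
auxiliary gluster mount on master1, since the worker dies with ENOTCONN on that
mount. A sketch, assuming the usual geo-replication log location (exact paths
may differ per version):

grep -iE 'error|disconnect|enotconn' /var/log/glusterfs/geo-replication/*/mnt-*.log | tail -n 50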

Many thanks,
A


From: Gluster-users  on behalf of Anant 
Saraswat 
Sent: 30 January 2024 2:14 PM
To: gluster-users@gluster.org ; Strahil Nikolov 

Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few 
seconds



Hello Everyone,


I am looking for some help. Can anyone please suggest if it's possible to 
promote a master node to be the primary in the geo-replication session?


We have three master nodes and one secondary node. We are facing issues where 
geo-replication is consistently failing from the primary master node. We want 
to check if it works fine from another master node.


Any guidance or assistance would be highly appreciated.

Many thanks,
Anant

From: Anant Saraswat 
Sent: 29 January 2024 3:55 PM
To: gluster-users@gluster.org ; Strahil Nikolov 

Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few 
seconds

Hi @Strahil Nikolov,

We have been running this geo-replication for more than 5 years, and it was
working fine till last week, so I think it shouldn't be something that was
missed in the initial setup, but I am unable to understand why it's not working
now.

I have enabled SSH debug on the secondary node (drtier1data), and I can see this
in the logs.

Jan 29 14:25:52 drtier1data sshd[1268110]: debug1: server_input_channel_req: 
channel 0 request exec reply 1
Jan 29 14:25:52 drtier1data sshd[1268110]: debug1: session_by_channel: session 
0 channel 0
Jan 29 14:25:52 drtier1data sshd[1268110]: debug1: session_input_channel_req: 
session 0 req exec
Jan 29 14:25:52 drtier1data sshd[1268110]: Starting session: command for root 
from XX.236.28.58 port 53082 id 0
Jan 29 14:25:52 drtier1data sshd[1268095]: debug1: session_new: session 0
Jan 29 14:25:58 drtier1data sshd[1268110]: debug1: Received SIGCHLD.
Jan 29 14:25:58 drtier1data sshd[1268110]: debug1: session_by_pid: pid 1268111
Jan 29 14:25:58 drtier1data sshd[1268110]: debug1: session_exit_message: 
session 0 channel 0 pid 1268111
Jan 29 14:25:58 drtier1data sshd[1268110]: debug1: session_exit_message: 
release channel 0
Jan 29 14:25:58 drtier1data sshd[1268110]: Received disconnect from 
XX.236.28.58 port 53082:11: disconnected by user
Jan 29 14:25:58 drtier1data sshd[1268110]: Disconnected from user root 
XX.236.28.58 port 53082
Jan 29 14:25:58 drtier1data sshd[1268110]: debug1: do_cleanup
Jan 29 14:25:58 drtier1data sshd[1268095]: debug1: do_cleanup
Jan 29 14:25:58 drtier1data sshd[1268095]: debug1: PAM: cleanup
Jan 29 14:25:58 drtier1data sshd[1268095]: debug1: PAM: closing session
Jan 29 14:25:58 drtier1data sshd[1268095]: pam_unix(sshd:session): session 
closed for user root
Jan 29 14:25:58 drtier1data sshd[1268095]: debug1: PAM: deleting credentials

As per the above logs, drtier1data node is g

Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

2024-01-29 Thread Anant Saraswat
Hi @Strahil Nikolov,

We have been running this geo-replication for more than 5 years, and it was
working fine till last week, so I think it shouldn't be something that was
missed in the initial setup, but I am unable to understand why it's not working
now.

I have enabled SSH debug on the secondary node (drtier1data), and I can see this
in the logs.

Jan 29 14:25:52 drtier1data sshd[1268110]: debug1: server_input_channel_req: 
channel 0 request exec reply 1
Jan 29 14:25:52 drtier1data sshd[1268110]: debug1: session_by_channel: session 
0 channel 0
Jan 29 14:25:52 drtier1data sshd[1268110]: debug1: session_input_channel_req: 
session 0 req exec
Jan 29 14:25:52 drtier1data sshd[1268110]: Starting session: command for root 
from XX.236.28.58 port 53082 id 0
Jan 29 14:25:52 drtier1data sshd[1268095]: debug1: session_new: session 0
Jan 29 14:25:58 drtier1data sshd[1268110]: debug1: Received SIGCHLD.
Jan 29 14:25:58 drtier1data sshd[1268110]: debug1: session_by_pid: pid 1268111
Jan 29 14:25:58 drtier1data sshd[1268110]: debug1: session_exit_message: 
session 0 channel 0 pid 1268111
Jan 29 14:25:58 drtier1data sshd[1268110]: debug1: session_exit_message: 
release channel 0
Jan 29 14:25:58 drtier1data sshd[1268110]: Received disconnect from 
XX.236.28.58 port 53082:11: disconnected by user
Jan 29 14:25:58 drtier1data sshd[1268110]: Disconnected from user root 
XX.236.28.58 port 53082
Jan 29 14:25:58 drtier1data sshd[1268110]: debug1: do_cleanup
Jan 29 14:25:58 drtier1data sshd[1268095]: debug1: do_cleanup
Jan 29 14:25:58 drtier1data sshd[1268095]: debug1: PAM: cleanup
Jan 29 14:25:58 drtier1data sshd[1268095]: debug1: PAM: closing session
Jan 29 14:25:58 drtier1data sshd[1268095]: pam_unix(sshd:session): session 
closed for user root
Jan 29 14:25:58 drtier1data sshd[1268095]: debug1: PAM: deleting credentials

As per the above logs, drtier1data node is getting SIGCHLD from master1. 
(Received disconnect from XX.236.28.58 port 53082:11: disconnected by user)

Also, I have checked the gsyncd.log on master1, which says "SSH: SSH connection 
between master and slave established. [{duration=1.7277}]", which means 
passwordless ssh is working fine.

As per my understanding, master1 can connect to the drtier1data server, the
geo-replication status changes to Active --> History Crawl, and then something
happens on master1 which triggers the SSH disconnect.

Is it possible to change the master node in geo-replication so that we can mark
master2 as primary instead of master1?

I am really struggling to fix this issue. Please help; any pointer is
appreciated!!!

Many thanks,
Anant

From: Gluster-users  on behalf of Anant 
Saraswat 
Sent: 29 January 2024 12:20 AM
To: gluster-users@gluster.org ; Strahil Nikolov 

Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few 
seconds



Hi Strahil,

As mentioned in my last email, I have copied the gluster public key from
master3 to the secondary server, and I can now ssh from all master nodes to the
secondary server, but I am still getting the same error.

[root@master1 geo-replication]# ssh root@drtier1data -i 
/var/lib/glusterd/geo-replication/secret.pem
Last login: Mon Jan 29 00:14:32 2024 from
[root@drtier1data ~]#

[root@master2 ~]# ssh -i /var/lib/glusterd/geo-replication/secret.pem 
root@drtier1data
Last login: Mon Jan 29 00:02:34 2024 from
[root@drtier1data ~]#

[root@master3 ~]# ssh -i /var/lib/glusterd/geo-replication/secret.pem 
root@drtier1data
Last login: Mon Jan 29 00:14:41 2024 from
[root@drtier1data ~]#

Thanks,
Anant

From: Strahil Nikolov 
Sent: 28 January 2024 10:07 PM
To: Anant Saraswat ; gluster-users@gluster.org 

Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few 
seconds



Gluster doesn't use the ssh key in /root/.ssh, thus you need to exchange the
public key that corresponds to /var/lib/glusterd/geo-replication/secret.pem.
If you don't know the pub key, google how to obtain it from the private key.
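
For the record, OpenSSH can derive the public part directly from the private
key, so something like this sketch should print the key to distribute:

ssh-keygen -y -f /var/lib/glusterd/geo-replication/secret.pem
# append the printed line to /root/.ssh/authorized_keys on the secondary node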

Ensure that all hosts can ssh to the secondary before proceeding with the 
troubleshooting.

Best Regards,
Strahil Nikolov

On Sun, Jan 28, 2024 at 15:58, Anant Saraswat
 wrote:
Hi All,

I have now copied /var/lib/glusterd/geo-replication/secret.pem.pub (the public
key) from master3 to /root/.ssh/authorized_keys on drtier1data, and now I can
ssh from master node3 to drtier1data using the georep key
(/var/lib/glusterd/geo-replication/secret.pem).

But I am still getting the same error, and geo-replication is getting faulty 
again and again.

[2024-01-28 13:46:38.897683] I [resource(worker 
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time 
[{time=1706449598}]
[2024-01-28 13:46:

Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

2024-01-28 Thread Anant Saraswat
Hi Strahil,

As mentioned in my last email, I have copied the gluster public key from
master3 to the secondary server, and I can now ssh from all master nodes to the
secondary server, but I am still getting the same error.

[root@master1 geo-replication]# ssh root@drtier1data -i 
/var/lib/glusterd/geo-replication/secret.pem
Last login: Mon Jan 29 00:14:32 2024 from
[root@drtier1data ~]#

[root@master2 ~]# ssh -i /var/lib/glusterd/geo-replication/secret.pem 
root@drtier1data
Last login: Mon Jan 29 00:02:34 2024 from
[root@drtier1data ~]#

[root@master3 ~]# ssh -i /var/lib/glusterd/geo-replication/secret.pem 
root@drtier1data
Last login: Mon Jan 29 00:14:41 2024 from
[root@drtier1data ~]#

Thanks,
Anant

From: Strahil Nikolov 
Sent: 28 January 2024 10:07 PM
To: Anant Saraswat ; gluster-users@gluster.org 

Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few 
seconds



Gluster doesn't use the ssh key in /root/.ssh, thus you need to exchange the
public key that corresponds to /var/lib/glusterd/geo-replication/secret.pem.
If you don't know the pub key, google how to obtain it from the private key.

Ensure that all hosts can ssh to the secondary before proceeding with the 
troubleshooting.

Best Regards,
Strahil Nikolov

On Sun, Jan 28, 2024 at 15:58, Anant Saraswat
 wrote:
Hi All,

I have now copied /var/lib/glusterd/geo-replication/secret.pem.pub (the public
key) from master3 to /root/.ssh/authorized_keys on drtier1data, and now I can
ssh from master node3 to drtier1data using the georep key
(/var/lib/glusterd/geo-replication/secret.pem).

But I am still getting the same error, and geo-replication is getting faulty 
again and again.

[2024-01-28 13:46:38.897683] I [resource(worker 
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time 
[{time=1706449598}]
[2024-01-28 13:46:38.922491] I [gsyncdstatus(worker 
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change 
[{status=Active}]
[2024-01-28 13:46:38.923127] I [gsyncdstatus(worker 
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl 
Status Change [{status=History Crawl}]
[2024-01-28 13:46:38.923313] I [master(worker 
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl 
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706449598}, 
{entry_stime=(1705935991, 0)}]
[2024-01-28 13:46:39.973584] I [master(worker 
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time 
[{stime=(1705935991, 0)}]
[2024-01-28 13:46:40.98970] E [syncdutils(worker 
/opt/tier1data2019/brick):346:log_raise_exception] : Gluster Mount process 
exited [{error=ENOTCONN}]
[2024-01-28 13:46:40.757691] I [monitor(monitor):228:monitor] Monitor: worker 
died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:46:40.766860] I [gsyncdstatus(monitor):248:set_worker_status] 
GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:46:50.793311] I [gsyncdstatus(monitor):248:set_worker_status] 
GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:46:50.793469] I [monitor(monitor):160:monitor] Monitor: starting 
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:46:50.874474] I [resource(worker 
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection 
between master and slave...
[2024-01-28 13:46:52.659114] I [resource(worker 
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between 
master and slave established. [{duration=1.7844}]
[2024-01-28 13:46:52.659461] I [resource(worker 
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume 
locally...
[2024-01-28 13:46:53.698769] I [resource(worker 
/opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume 
[{duration=1.0392}]
[2024-01-28 13:46:53.698984] I [subcmds(worker 
/opt/tier1data2019/brick):84:subcmd_worker] : Worker spawn successful. 
Acknowledging back to monitor
[2024-01-28 13:46:55.831999] I [master(worker 
/opt/tier1data2019/brick):1662:register] _GMaster: Working dir 
[{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:46:55.832354] I [resource(worker 
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time 
[{time=1706449615}]
[2024-01-28 13:46:55.854684] I [gsyncdstatus(worker 
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change 
[{status=Active}]
[2024-01-28 13:46:55.855251] I [gsyncdstatus(worker 
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl 
Status Change [{status=History Crawl}]
[2024-01-28 13:46:55.855419] I [master(worker 
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl 
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706449615}, 
{entry_stime=(1705935991, 0)}]
[2024-01-28 13:46:56.905496] I [mast

Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

2024-01-28 Thread Strahil Nikolov
:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:48:15.430175] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:48:15.430308] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:48:15.510770] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2024-01-28 13:48:17.240311] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.7294}]
[2024-01-28 13:48:17.240509] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally...
[2024-01-28 13:48:18.279007] I [resource(worker /opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume [{duration=1.0384}]
[2024-01-28 13:48:18.279195] I [subcmds(worker /opt/tier1data2019/brick):84:subcmd_worker] : Worker spawn successful. Acknowledging back to monitor
[2024-01-28 13:48:20.455937] I [master(worker /opt/tier1data2019/brick):1662:register] _GMaster: Working dir [{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:48:20.456274] I [resource(worker /opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time [{time=1706449700}]
[2024-01-28 13:48:20.464288] I [gsyncdstatus(worker /opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change [{status=Active}]
[2024-01-28 13:48:20.464807] I [gsyncdstatus(worker /opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change [{status=History Crawl}]
[2024-01-28 13:48:20.464970] I [master(worker /opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl [{turns=1}, {stime=(1705935991, 0)}, {etime=1706449700}, {entry_stime=(1705935991, 0)}]
[2024-01-28 13:48:21.514201] I [master(worker /opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time [{stime=(1705935991, 0)}]
[2024-01-28 13:48:21.644609] E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] : Gluster Mount process exited [{error=ENOTCONN}]
[2024-01-28 13:48:22.284920] I [monitor(monitor):228:monitor] Monitor: worker died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:48:22.286189] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:48:32.312378] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:48:32.312526] I [monitor(monitor):160:monitor] Monitor: starting gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:48:32.393484] I [resource(worker /opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection between master and slave...
[2024-01-28 13:48:34.91825] I [resource(worker /opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between master and slave established. [{duration=1.6981}]
[2024-01-28 13:48:34.92130] I [resource(worker /opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume locally...

Thanks,
Anant
From: Anant Saraswat 
Sent: 28 January 2024 1:33 AM
To: Strahil Nikolov ; gluster-users@gluster.org 

Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

Hi @Strahil Nikolov,

I have checked the ssh connection from all the master servers, and I can ssh
drtier1data from master1 and master2 (the old master servers), but I am unable
to ssh drtier1data from master3 (the new node).

[root@master3 ~]# ssh -i /var/lib/glusterd/geo-replication/secret.pem root@drtier1data
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 325, in 
    main()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 259, in main
    if args.subcmd in ("worker"):
TypeError: 'in ' requires string as left operand, not NoneType
Connection to drtier1data closed.

But I am able to ssh drtier1data from master3 without using the georep key.

[root@master3 ~]# ssh root@drtier1data
Last login: Sun Jan 28 01:16:25 2024 from 87.246.74.32
[root@drtier1data ~]#
Also, today I restarted the gluster server on master1 as geo-replication is 
trying to be active from master1 server, and sometimes I am getting the 
following error in gsyncd.log
[2024-01-28 01:27:24.722663] E [syncdutils(worker 
/opt/tier1data2019/brick):847:errlog] Popen: command returned error [{cmd=rsync 
-aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs 
--existing --xattrs --acls --ignore-missing-args . -e ssh 
-oPasswordAuthentication=no -oStrictHostKeyChecking=no -i 
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S 
/tmp/gsyncd-aux-ssh-0exuoeg7/7578599

Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

2024-01-28 Thread Anant Saraswat
[2024-01-28 13:48:15.510770] I [resource(worker 
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection 
between master and slave...
[2024-01-28 13:48:17.240311] I [resource(worker 
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between 
master and slave established. [{duration=1.7294}]
[2024-01-28 13:48:17.240509] I [resource(worker 
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume 
locally...
[2024-01-28 13:48:18.279007] I [resource(worker 
/opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume 
[{duration=1.0384}]
[2024-01-28 13:48:18.279195] I [subcmds(worker 
/opt/tier1data2019/brick):84:subcmd_worker] : Worker spawn successful. 
Acknowledging back to monitor
[2024-01-28 13:48:20.455937] I [master(worker 
/opt/tier1data2019/brick):1662:register] _GMaster: Working dir 
[{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-28 13:48:20.456274] I [resource(worker 
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time 
[{time=1706449700}]
[2024-01-28 13:48:20.464288] I [gsyncdstatus(worker 
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change 
[{status=Active}]
[2024-01-28 13:48:20.464807] I [gsyncdstatus(worker 
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl 
Status Change [{status=History Crawl}]
[2024-01-28 13:48:20.464970] I [master(worker 
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl 
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706449700}, 
{entry_stime=(1705935991, 0)}]
[2024-01-28 13:48:21.514201] I [master(worker 
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time 
[{stime=(1705935991, 0)}]
[2024-01-28 13:48:21.644609] E [syncdutils(worker 
/opt/tier1data2019/brick):346:log_raise_exception] : Gluster Mount process 
exited [{error=ENOTCONN}]
[2024-01-28 13:48:22.284920] I [monitor(monitor):228:monitor] Monitor: worker 
died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-28 13:48:22.286189] I [gsyncdstatus(monitor):248:set_worker_status] 
GeorepStatus: Worker Status Change [{status=Faulty}]
[2024-01-28 13:48:32.312378] I [gsyncdstatus(monitor):248:set_worker_status] 
GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-28 13:48:32.312526] I [monitor(monitor):160:monitor] Monitor: starting 
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-28 13:48:32.393484] I [resource(worker 
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection 
between master and slave...
[2024-01-28 13:48:34.91825] I [resource(worker 
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between 
master and slave established. [{duration=1.6981}]
[2024-01-28 13:48:34.92130] I [resource(worker 
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume 
locally...

Thanks,
Anant


From: Anant Saraswat 
Sent: 28 January 2024 1:33 AM
To: Strahil Nikolov ; gluster-users@gluster.org 

Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few 
seconds

Hi @Strahil Nikolov,

I have checked the ssh connection from all the master servers, and I can ssh
drtier1data from master1 and master2 (the old master servers), but I am unable
to ssh drtier1data from master3 (the new node).

[root@master3 ~]# ssh -i /var/lib/glusterd/geo-replication/secret.pem 
root@drtier1data
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 325, in 

main()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 259, in main
if args.subcmd in ("worker"):
TypeError: 'in ' requires string as left operand, not NoneType
Connection to drtier1data closed.

But I am able to ssh  drtier1data from master3 without using the georep key.

[root@master3 ~]# ssh  root@drtier1data
Last login: Sun Jan 28 01:16:25 2024 from 87.246.74.32
[root@drtier1data ~]#

Also, today I restarted the gluster server on master1 as geo-replication is 
trying to be active from master1 server, and sometimes I am getting the 
following error in gsyncd.log

[2024-01-28 01:27:24.722663] E [syncdutils(worker 
/opt/tier1data2019/brick):847:errlog] Popen: command returned error [{cmd=rsync 
-aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs 
--existing --xattrs --acls --ignore-missing-args . -e ssh 
-oPasswordAuthentication=no -oStrictHostKeyChecking=no -i 
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S 
/tmp/gsyncd-aux-ssh-0exuoeg7/75785990b3233f5dbbab9f43cc3ed895.sock 
drtier1data:/proc/553418/cwd}, {error=3}]

Many thanks,
Anant

From: Strahil Nikolov 
Sent: 27 January 2024 5:25 AM
To: gluster-users@gluster.org ; Anant Saraswat 

Subject: Re: [Gluster-users] Geo-repl

Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

2024-01-27 Thread Anant Saraswat
Hi @Strahil Nikolov,

I have checked the ssh connection from all the master servers, and I can ssh
drtier1data from master1 and master2 (the old master servers), but I am unable
to ssh drtier1data from master3 (the new node).

[root@master3 ~]# ssh -i /var/lib/glusterd/geo-replication/secret.pem 
root@drtier1data
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 325, in 

main()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 259, in main
if args.subcmd in ("worker"):
TypeError: 'in ' requires string as left operand, not NoneType
Connection to drtier1data closed.

But I am able to ssh  drtier1data from master3 without using the georep key.

[root@master3 ~]# ssh  root@drtier1data
Last login: Sun Jan 28 01:16:25 2024 from 87.246.74.32
[root@drtier1data ~]#

Also, today I restarted the gluster server on master1 as geo-replication is 
trying to be active from master1 server, and sometimes I am getting the 
following error in gsyncd.log

[2024-01-28 01:27:24.722663] E [syncdutils(worker 
/opt/tier1data2019/brick):847:errlog] Popen: command returned error [{cmd=rsync 
-aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs 
--existing --xattrs --acls --ignore-missing-args . -e ssh 
-oPasswordAuthentication=no -oStrictHostKeyChecking=no -i 
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S 
/tmp/gsyncd-aux-ssh-0exuoeg7/75785990b3233f5dbbab9f43cc3ed895.sock 
drtier1data:/proc/553418/cwd}, {error=3}]

Many thanks,
Anant

From: Strahil Nikolov 
Sent: 27 January 2024 5:25 AM
To: gluster-users@gluster.org ; Anant Saraswat 

Subject: Re: [Gluster-users] Geo-replication status is getting Faulty after few 
seconds


Don't forget to test with the georep key. I think it was 
/var/lib/glusterd/geo-replication/secret.pem

Best Regards,
Strahil Nikolov


On Saturday, 27 January 2024 at 07:24:07 GMT+2, Strahil Nikolov wrote:





Hi Anant,

I would first start by checking whether you can ssh from all masters to the
slave node. If you haven't set up a dedicated user for the session, then
gluster is using root.

Best Regards,
Strahil Nikolov






On Friday, 26 January 2024 at 18:07:59 GMT+2, Anant Saraswat wrote:







Hi All,




I have run the following commands on master3, and that has added master3 to 
geo-replication.




gluster system:: execute gsec_create

gluster volume geo-replication tier1data drtier1data::drtier1data create 
push-pem force

gluster volume geo-replication tier1data drtier1data::drtier1data stop

gluster volume geo-replication tier1data drtier1data::drtier1data start



Now I am able to start the geo-replication, but I am getting the same error.



[2024-01-24 19:51:24.80892] I [gsyncdstatus(monitor):248:set_worker_status] 
GeorepStatus: Worker Status Change [{status=Initializing...}]

[2024-01-24 19:51:24.81020] I [monitor(monitor):160:monitor] Monitor: starting 
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]

[2024-01-24 19:51:24.158021] I [resource(worker 
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection 
between master and slave...

[2024-01-24 19:51:25.951998] I [resource(worker 
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between 
master and slave established. [{duration=1.7938}]

[2024-01-24 19:51:25.952292] I [resource(worker 
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume 
locally...

[2024-01-24 19:51:26.986974] I [resource(worker 
/opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume 
[{duration=1.0346}]

[2024-01-24 19:51:26.987137] I [subcmds(worker 
/opt/tier1data2019/brick):84:subcmd_worker] : Worker spawn successful. 
Acknowledging back to monitor

[2024-01-24 19:51:29.139131] I [master(worker 
/opt/tier1data2019/brick):1662:register] _GMaster: Working dir 
[{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]

[2024-01-24 19:51:29.139531] I [resource(worker 
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time 
[{time=1706125889}]

[2024-01-24 19:51:29.173877] I [gsyncdstatus(worker 
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change 
[{status=Active}]

[2024-01-24 19:51:29.174407] I [gsyncdstatus(worker 
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl 
Status Change [{status=History Crawl}]

[2024-01-24 19:51:29.174558] I [master(worker 
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl 
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706125889}, 
{entry_stime=(1705935991, 0)}]

[2024-01-24 19:51:30.251965] I [master(worker 
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time 
[{stime=

Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

2024-01-26 Thread Strahil Nikolov
Don't forget to test with the georep key. I think it was 
/var/lib/glusterd/geo-replication/secret.pem

Best Regards,
Strahil Nikolov


On Saturday, 27 January 2024 at 07:24:07 GMT+2, Strahil Nikolov wrote:





Hi Anant,

I would first start by checking whether you can ssh from all masters to the
slave node. If you haven't set up a dedicated user for the session, then
gluster is using root.

Best Regards,
Strahil Nikolov






On Friday, 26 January 2024 at 18:07:59 GMT+2, Anant Saraswat wrote:







Hi All,




I have run the following commands on master3, and that has added master3 to 
geo-replication.




gluster system:: execute gsec_create

gluster volume geo-replication tier1data drtier1data::drtier1data create 
push-pem force

gluster volume geo-replication tier1data drtier1data::drtier1data stop

gluster volume geo-replication tier1data drtier1data::drtier1data start



Now I am able to start the geo-replication, but I am getting the same error.



[2024-01-24 19:51:24.80892] I [gsyncdstatus(monitor):248:set_worker_status] 
GeorepStatus: Worker Status Change [{status=Initializing...}]

[2024-01-24 19:51:24.81020] I [monitor(monitor):160:monitor] Monitor: starting 
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]

[2024-01-24 19:51:24.158021] I [resource(worker 
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection 
between master and slave...

[2024-01-24 19:51:25.951998] I [resource(worker 
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between 
master and slave established. [{duration=1.7938}]

[2024-01-24 19:51:25.952292] I [resource(worker 
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume 
locally...

[2024-01-24 19:51:26.986974] I [resource(worker 
/opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume 
[{duration=1.0346}]

[2024-01-24 19:51:26.987137] I [subcmds(worker 
/opt/tier1data2019/brick):84:subcmd_worker] : Worker spawn successful. 
Acknowledging back to monitor

[2024-01-24 19:51:29.139131] I [master(worker 
/opt/tier1data2019/brick):1662:register] _GMaster: Working dir 
[{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]

[2024-01-24 19:51:29.139531] I [resource(worker 
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time 
[{time=1706125889}]

[2024-01-24 19:51:29.173877] I [gsyncdstatus(worker 
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change 
[{status=Active}]

[2024-01-24 19:51:29.174407] I [gsyncdstatus(worker 
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl 
Status Change [{status=History Crawl}]

[2024-01-24 19:51:29.174558] I [master(worker 
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl 
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706125889}, 
{entry_stime=(1705935991, 0)}]

[2024-01-24 19:51:30.251965] I [master(worker 
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time 
[{stime=(1705935991, 0)}]

[2024-01-24 19:51:30.376715] E [syncdutils(worker 
/opt/tier1data2019/brick):346:log_raise_exception] : Gluster Mount process 
exited [{error=ENOTCONN}]

[2024-01-24 19:51:30.991856] I [monitor(monitor):228:monitor] Monitor: worker 
died in startup phase [{brick=/opt/tier1data2019/brick}]

[2024-01-24 19:51:30.993608] I [gsyncdstatus(monitor):248:set_worker_status] 
GeorepStatus: Worker Status Change [{status=Faulty}]

Any idea why it's stuck in this loop?



Thanks,

Anant





 
From: Gluster-users  on behalf of Anant 
Saraswat 
Sent: 22 January 2024 9:00 PM
To: gluster-users@gluster.org 
Subject: [Gluster-users] Geo-replication status is getting Faulty after few 
seconds 
 



Hi There,




We have a Gluster setup with three master nodes in replicated mode and one 
slave node with geo-replication.




# gluster volume info

Volume Name: tier1data

Type: Replicate

Volume ID: 93c45c14-f700-4d50-962b-7653be471e27

Status: Started

Snapshot Count: 0

Number of Bricks: 1 x 3 = 3

Transport-type: tcp

Bricks:

Brick1: master1:/opt/tier1data2019/brick

Brick2: master2:/opt/tier1data2019/brick

Brick3: master3:/opt/tier1data2019/brick





master1 |
master2 |  --geo-replication-->  drtier1data
master3 |



We added the master3 node a few months back; the initial setup consisted of 2
master nodes and one geo-replicated slave (drtier1data).

Our geo-replication was functioning well with the initial two master nodes
(master1 and master2), where master1 was active and master2 was in passive
mode. However, today we started experiencing issues where geo-replication
suddenly stopped and became stuck in a loop of Initializing... / Active /
Faulty on master1, while master2 remained in passive mode.



Upon checking the gsyncd.log on the master1 n

Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

2024-01-26 Thread Strahil Nikolov
Hi Anant,

I would first start by checking whether you can ssh from all masters to the
slave node. If you haven't set up a dedicated user for the session, then
gluster is using root.
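
A quick way to run that check from every master in one go (a sketch; host names
as used elsewhere in this thread, and it assumes the georep key mentioned in the
follow-up mail):

for h in master1 master2 master3; do
    ssh "root@$h" "ssh -i /var/lib/glusterd/geo-replication/secret.pem -o BatchMode=yes root@drtier1data true && echo $h: ok"
done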

Best Regards,
Strahil Nikolov






On Friday, 26 January 2024 at 18:07:59 GMT+2, Anant Saraswat wrote:







Hi All,




I have run the following commands on master3, and that has added master3 to 
geo-replication.




gluster system:: execute gsec_create

gluster volume geo-replication tier1data drtier1data::drtier1data create 
push-pem force

gluster volume geo-replication tier1data drtier1data::drtier1data stop

gluster volume geo-replication tier1data drtier1data::drtier1data start



Now I am able to start the geo-replication, but I am getting the same error.



[2024-01-24 19:51:24.80892] I [gsyncdstatus(monitor):248:set_worker_status] 
GeorepStatus: Worker Status Change [{status=Initializing...}]

[2024-01-24 19:51:24.81020] I [monitor(monitor):160:monitor] Monitor: starting 
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]

[2024-01-24 19:51:24.158021] I [resource(worker 
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection 
between master and slave...

[2024-01-24 19:51:25.951998] I [resource(worker 
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between 
master and slave established. [{duration=1.7938}]

[2024-01-24 19:51:25.952292] I [resource(worker 
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume 
locally...

[2024-01-24 19:51:26.986974] I [resource(worker 
/opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume 
[{duration=1.0346}]

[2024-01-24 19:51:26.987137] I [subcmds(worker 
/opt/tier1data2019/brick):84:subcmd_worker] : Worker spawn successful. 
Acknowledging back to monitor

[2024-01-24 19:51:29.139131] I [master(worker 
/opt/tier1data2019/brick):1662:register] _GMaster: Working dir 
[{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]

[2024-01-24 19:51:29.139531] I [resource(worker 
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time 
[{time=1706125889}]

[2024-01-24 19:51:29.173877] I [gsyncdstatus(worker 
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change 
[{status=Active}]

[2024-01-24 19:51:29.174407] I [gsyncdstatus(worker 
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl 
Status Change [{status=History Crawl}]

[2024-01-24 19:51:29.174558] I [master(worker 
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl 
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706125889}, 
{entry_stime=(1705935991, 0)}]

[2024-01-24 19:51:30.251965] I [master(worker 
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time 
[{stime=(1705935991, 0)}]

[2024-01-24 19:51:30.376715] E [syncdutils(worker 
/opt/tier1data2019/brick):346:log_raise_exception] : Gluster Mount process 
exited [{error=ENOTCONN}]

[2024-01-24 19:51:30.991856] I [monitor(monitor):228:monitor] Monitor: worker 
died in startup phase [{brick=/opt/tier1data2019/brick}]

[2024-01-24 19:51:30.993608] I [gsyncdstatus(monitor):248:set_worker_status] 
GeorepStatus: Worker Status Change [{status=Faulty}]

Any idea why it's stuck in this loop?



Thanks,

Anant





 
From: Gluster-users  on behalf of Anant 
Saraswat 
Sent: 22 January 2024 9:00 PM
To: gluster-users@gluster.org 
Subject: [Gluster-users] Geo-replication status is getting Faulty after few 
seconds 
 



Hi There,




We have a Gluster setup with three master nodes in replicated mode and one 
slave node with geo-replication.




# gluster volume info

Volume Name: tier1data

Type: Replicate

Volume ID: 93c45c14-f700-4d50-962b-7653be471e27

Status: Started

Snapshot Count: 0

Number of Bricks: 1 x 3 = 3

Transport-type: tcp

Bricks:

Brick1: master1:/opt/tier1data2019/brick

Brick2: master2:/opt/tier1data2019/brick

Brick3: master3:/opt/tier1data2019/brick





master1 |
master2 |  --geo-replication-->  drtier1data
master3 |



We added the master3 node a few months back; the initial setup consisted of 2
master nodes and one geo-replicated slave (drtier1data).

Our geo-replication was functioning well with the initial two master nodes
(master1 and master2), where master1 was active and master2 was in passive
mode. However, today we started experiencing issues where geo-replication
suddenly stopped and became stuck in a loop of Initializing... / Active /
Faulty on master1, while master2 remained in passive mode.



Upon checking the gsyncd.log on the master1 node, we observed the following 
error (please refer to the attached logs for more details):

E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] : 
Gluster Mount process exited [{error=ENOTCONN}]



# 

Re: [Gluster-users] Geo-replication status is getting Faulty after few seconds

2024-01-24 Thread Anant Saraswat
Hi All,

I have run the following commands on master3, and that has added master3 to 
geo-replication.

gluster system:: execute gsec_create
gluster volume geo-replication tier1data drtier1data::drtier1data create 
push-pem force
gluster volume geo-replication tier1data drtier1data::drtier1data stop
gluster volume geo-replication tier1data drtier1data::drtier1data start

Now I am able to start the geo-replication, but I am getting the same error.

[2024-01-24 19:51:24.80892] I [gsyncdstatus(monitor):248:set_worker_status] 
GeorepStatus: Worker Status Change [{status=Initializing...}]
[2024-01-24 19:51:24.81020] I [monitor(monitor):160:monitor] Monitor: starting 
gsyncd worker [{brick=/opt/tier1data2019/brick}, {slave_node=drtier1data}]
[2024-01-24 19:51:24.158021] I [resource(worker 
/opt/tier1data2019/brick):1387:connect_remote] SSH: Initializing SSH connection 
between master and slave...
[2024-01-24 19:51:25.951998] I [resource(worker 
/opt/tier1data2019/brick):1436:connect_remote] SSH: SSH connection between 
master and slave established. [{duration=1.7938}]
[2024-01-24 19:51:25.952292] I [resource(worker 
/opt/tier1data2019/brick):1116:connect] GLUSTER: Mounting gluster volume 
locally...
[2024-01-24 19:51:26.986974] I [resource(worker 
/opt/tier1data2019/brick):1139:connect] GLUSTER: Mounted gluster volume 
[{duration=1.0346}]
[2024-01-24 19:51:26.987137] I [subcmds(worker 
/opt/tier1data2019/brick):84:subcmd_worker] : Worker spawn successful. 
Acknowledging back to monitor
[2024-01-24 19:51:29.139131] I [master(worker 
/opt/tier1data2019/brick):1662:register] _GMaster: Working dir 
[{path=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick}]
[2024-01-24 19:51:29.139531] I [resource(worker 
/opt/tier1data2019/brick):1292:service_loop] GLUSTER: Register time 
[{time=1706125889}]
[2024-01-24 19:51:29.173877] I [gsyncdstatus(worker 
/opt/tier1data2019/brick):281:set_active] GeorepStatus: Worker Status Change 
[{status=Active}]
[2024-01-24 19:51:29.174407] I [gsyncdstatus(worker 
/opt/tier1data2019/brick):253:set_worker_crawl_status] GeorepStatus: Crawl 
Status Change [{status=History Crawl}]
[2024-01-24 19:51:29.174558] I [master(worker 
/opt/tier1data2019/brick):1576:crawl] _GMaster: starting history crawl 
[{turns=1}, {stime=(1705935991, 0)}, {etime=1706125889}, 
{entry_stime=(1705935991, 0)}]
[2024-01-24 19:51:30.251965] I [master(worker 
/opt/tier1data2019/brick):1605:crawl] _GMaster: slave's time 
[{stime=(1705935991, 0)}]
[2024-01-24 19:51:30.376715] E [syncdutils(worker 
/opt/tier1data2019/brick):346:log_raise_exception] : Gluster Mount process 
exited [{error=ENOTCONN}]
[2024-01-24 19:51:30.991856] I [monitor(monitor):228:monitor] Monitor: worker 
died in startup phase [{brick=/opt/tier1data2019/brick}]
[2024-01-24 19:51:30.993608] I [gsyncdstatus(monitor):248:set_worker_status] 
GeorepStatus: Worker Status Change [{status=Faulty}]

Any idea why it's stuck in this loop?

Thanks,
Anant


From: Gluster-users  on behalf of Anant 
Saraswat 
Sent: 22 January 2024 9:00 PM
To: gluster-users@gluster.org 
Subject: [Gluster-users] Geo-replication status is getting Faulty after few 
seconds



Hi There,

We have a Gluster setup with three master nodes in replicated mode and one 
slave node with geo-replication.

# gluster volume info
Volume Name: tier1data
Type: Replicate
Volume ID: 93c45c14-f700-4d50-962b-7653be471e27
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: master1:/opt/tier1data2019/brick
Brick2: master2:/opt/tier1data2019/brick
Brick3: master3:/opt/tier1data2019/brick


master1 |
master2 |  --geo-replication-->  drtier1data
master3 |

We added the master3 node a few months back; the initial setup consisted of two
master nodes and one geo-replicated slave (drtier1data).

Our geo-replication was functioning well with the initial two master nodes
(master1 and master2), where master1 was active and master2 was passive.
However, today we started experiencing issues where geo-replication on master1
suddenly stopped and became stuck in a loop of Initializing..., Active...,
Faulty, while master2 remained in passive mode.
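
For reference, the basic state of the volume and of the session can be checked with commands along these lines (names as in the volume info above):

# brick and self-heal state of the primary volume
gluster volume status tier1data
gluster volume heal tier1data info
# per-node state of the geo-replication session
gluster volume geo-replication tier1data drtier1data::drtier1data status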


Upon checking the gsyncd.log on the master1 node, we observed the following 
error (please refer to the attached logs for more details):

E [syncdutils(worker /opt/tier1data2019/brick):346:log_raise_exception] : 
Gluster Mount process exited [{error=ENOTCONN}]


# gluster volume geo-replication tier1data status

MASTER NODE    MASTER VOL    MASTER BRICK    SLAVE USER    SLAVE    SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
------------------------------------------------------------------------------------------------------------------------

Re: [Gluster-users] Geo replication procedure for DR

2023-06-11 Thread Strahil Nikolov
To be honest, I have never reached that point, but I think that if the original
volume is too outdated it makes sense to set up a new volume on the primary site,
run a replication from the DR back to the primary site, and then schedule a
cut-over (make the DR volume read-only, remove the replication, and point all
clients back to the main site).
You will need to test the whole scenario on a separate cluster until the
procedure is well established.
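
A very rough sketch of that failback flow (the volume and host names below are placeholders, and the whole sequence must be validated on a test cluster first):

# on the DR site, once a fresh empty volume exists again on the primary site
gluster system:: execute gsec_create
gluster volume geo-replication drvol primaryhost::newprimaryvol create push-pem
gluster volume geo-replication drvol primaryhost::newprimaryvol start
# at the cut-over window: stop client writes, let the session catch up, then
gluster volume geo-replication drvol primaryhost::newprimaryvol stop
gluster volume set drvol read-only ON
# finally point all clients back to the volume on the primary site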

Best Regards,
Strahil Nikolov

Sent from Yahoo Mail for iPhone


On Wednesday, June 7, 2023, 9:13 PM, mabi  wrote:

Dear Strahil,
Thank you for the detailed command. So once you want to switch all traffic to 
the DR site in case of disaster one should first disable the read-only setting 
on the secondary volume on the slave site.
What happens after when the master site is back online? What's the procedure 
there? I had the following question in my previous mail in this regard:

"And once the primary site is back online how do you copy back or sync all data 
changes done on the secondary volume on the secondary site back to the primary 
volume on the primary site?"
Best regards,Mabi

 --- Original Message ---
 On Wednesday, June 7th, 2023 at 6:52 AM, Strahil Nikolov 
 wrote:

 
 It's just a setting on the target volume:
gluster volume set <volname> read-only OFF
Best Regards,
Strahil Nikolov
 
 
  On Mon, Jun 5, 2023 at 22:30, mabi wrote:   Hello,

I was reading the geo replication documentation here:

https://docs.gluster.org/en/main/Administrator-Guide/Geo-Replication/

and I was wondering how it works in case of disaster recovery, when the primary
cluster is down and the secondary site with the volume needs to be used?

What is the procedure here to make the secondary volume on the secondary site 
available for read/write?

And once the primary site is back online how do you copy back or sync all data 
changes done on the secondary volume on the secondary site back to the primary 
volume on the primary site?

Best regards,
Mabi




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
  
 

 






Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo replication procedure for DR

2023-06-07 Thread mabi
Dear Strahil,

Thank you for the detailed command. So once you want to switch all traffic to
the DR site in case of disaster, one should first disable the read-only setting
on the secondary volume on the slave site.

What happens when the master site is back online? What's the procedure
there? I had the following question in my previous mail in this regard:

"And once the primary site is back online how do you copy back or sync all data 
changes done on the secondary volume on the secondary site back to the primary 
volume on the primary site?"

Best regards,
Mabi

--- Original Message ---
On Wednesday, June 7th, 2023 at 6:52 AM, Strahil Nikolov 
 wrote:

> It's just a setting on the target volume:
>
> gluster volume set  read-only OFF
>
> Best Regards,
> Strahil Nikolov
>
>> On Mon, Jun 5, 2023 at 22:30, mabi
>>  wrote:
>> Hello,
>>
>> I was reading the geo replication documentation here:
>>
>> https://docs.gluster.org/en/main/Administrator-Guide/Geo-Replication/
>>
>> and I was wondering how it works when in case of disaster recovery when the 
>> primary cluster is down and the the secondary site with the volume needs to 
>> be used?
>>
>> What is the procedure here to make the secondary volume on the secondary 
>> site available for read/write?
>>
>> And once the primary site is back online how do you copy back or sync all 
>> data changes done on the secondary volume on the secondary site back to the 
>> primary volume on the primary site?
>>
>> Best regards,
>> Mabi
>> 
>>
>> Community Meeting Calendar:
>>
>> Schedule -
>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> Bridge: https://meet.google.com/cpu-eiue-hvk
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo replication procedure for DR

2023-06-06 Thread Strahil Nikolov
It's just a setting on the target volume:
gluster volume set <volname> read-only OFF
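
For example, with a placeholder volume name, and checking the current value afterwards:

gluster volume set drvolume read-only OFF
gluster volume get drvolume all | grep read-only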
Best Regards,
Strahil Nikolov
 
 
  On Mon, Jun 5, 2023 at 22:30, mabi wrote:   Hello,

I was reading the geo replication documentation here:

https://docs.gluster.org/en/main/Administrator-Guide/Geo-Replication/

and I was wondering how it works in case of disaster recovery, when the primary
cluster is down and the secondary site with the volume needs to be used?

What is the procedure here to make the secondary volume on the secondary site 
available for read/write?

And once the primary site is back online how do you copy back or sync all data 
changes done on the secondary volume on the secondary site back to the primary 
volume on the primary site?

Best regards,
Mabi




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
  




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo-Replication Stuck In "Hybrid Crawl"

2021-09-13 Thread Boubacar Cisse
Hi,

Yes, I have checked both
/var/log/gluster/geo-replication/ (on
primary nodes) and
/var/log/gluster/geo-replication-slaves/ (on
slave node) but not finding any relevant information despite the fact that
I've set all log levels to DEBUG. I looked at gsyncd.log and the brick logs. At
this point, I'm not even certain geo-replication is actually working. The df
command on the slave indicates that the volume's brick is being filled with
data, but I can't figure out how to confirm that things are actually working
and just slow. I have deleted the geo-replication session, reset the bricks
and started a new session, but still no luck. The data on the volume is less
than 20 GB, but the process has been stuck in "Hybrid Crawl" for over a week.
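
The only crude progress check I have so far is comparing disk usage and file counts on both sides over time, roughly like this (brick path as in the logs below; the same is run on the slave brick):

df -h /gfs2-data/brick
# count regular files, skipping the internal .glusterfs directory
find /gfs2-data/brick -path '*/.glusterfs' -prune -o -type f -print | wc -l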


*** gsyncd.log on primary ***
[2021-09-13 21:53:08.4801] D [repce(worker /gfs2-data/brick):195:push]
RepceClient: call 25808:140134674573056:1631569988.0047505
keep_alive({'version': (1, 0), 'uuid':
'560520f1-d06a-47d9-af6d-153c68016e82', 'retval': 0, 'volume_mark':
(1551463906, 939763), 'timeout': 1631570108},) ...
[2021-09-13 21:53:08.40326] D [repce(worker /gfs2-data/brick):215:__call__]
RepceClient: call 25808:140134674573056:1631569988.0047505 keep_alive -> 23
[2021-09-13 21:53:11.200769] D [master(worker
/gfs2-data/brick):554:crawlwrap] _GMaster: ... crawl #0 done, took 5.043846
seconds
[2021-09-13 21:53:11.237383] D [master(worker
/gfs2-data/brick):578:crawlwrap] _GMaster: Crawl info cluster_stime=61
   brick_stime=(-1, 0)
[2021-09-13 21:53:16.240783] D [master(worker
/gfs2-data/brick):554:crawlwrap] _GMaster: ... crawl #0 done, took 5.039845
seconds
[2021-09-13 21:53:16.642778] D [master(worker
/gfs2-data/brick):578:crawlwrap] _GMaster: Crawl info cluster_stime=61
   brick_stime=(-1, 0)
[2021-09-13 21:53:21.647924] D [master(worker
/gfs2-data/brick):554:crawlwrap] _GMaster: ... crawl #0 done, took 5.406957
seconds
[2021-09-13 21:53:21.648072] D [master(worker
/gfs2-data/brick):560:crawlwrap] _GMaster: 0 crawls, 0 turns


*** gsyncd.log on slave *** [MESSAGE KEEPS REPEATING]
{'op': 'META', 'skip_entry': False, 'go':
'.gfid/341e4a74-b783-4d03-b678-13cd83691ca2', 'stat': {'uid': 33, 'gid':
33, 'mode': 16877, 'atime': 1620058938.6504347, 'mtime':
1630466794.4308176}},
{'op': 'META', 'skip_entry': False, 'go':
'.gfid/454dc70d-e57f-4166-b9b1-9dcbc88906ad', 'stat': {'uid': 33, 'gid':
33, 'mode': 16877, 'atime': 1625766157.52317, 'mtime': 1627944976.2114644}},
{'op': 'META', 'skip_entry': False, 'go':
'.gfid/f7a63767-3ec3-444f-8890-f6bdc569317a', 'stat': {'uid': 33, 'gid':
33, 'mode': 16877, 'atime': 1623954033.0488186, 'mtime':
1630506668.4986405}},
{'op': 'META', 'skip_entry': False, 'go':
'.gfid/e52237fc-d8a7-43e6-8e1d-3f66b6e17bed', 'stat': {'uid': 33, 'gid':
33, 'mode': 16877, 'atime': 1623689028.9785645, 'mtime':
1631113995.6731815}}]
[2021-09-13 21:31:51.388329] I [resource(slave
media01/gfs2-data/brick):1098:connect] GLUSTER: Mounting gluster volume
locally...
[2021-09-13 21:31:51.490466] D [resource(slave
media01/gfs2-data/brick):872:inhibit] MountbrokerMounter: auxiliary
glusterfs mount in place
[2021-09-13 21:31:52.579018] D [resource(slave
media01/gfs2-data/brick):939:inhibit] MountbrokerMounter: Lazy umount done:
/var/mountbroker-root/mb_hive/mntWa5v9P
[2021-09-13 21:31:52.579506] D [resource(slave
media01/gfs2-data/brick):946:inhibit] MountbrokerMounter: auxiliary
glusterfs mount prepared
[2021-09-13 21:31:52.579624] I [resource(slave
media01/gfs2-data/brick):1121:connect] GLUSTER: Mounted gluster volume
duration=1.1912
[2021-09-13 21:31:52.580047] I [resource(slave
media01/gfs2-data/brick):1148:service_loop] GLUSTER: slave listening

Regards,

-Boubacar


On Mon, Sep 13, 2021 at 7:53 AM Strahil Nikolov 
wrote:

> Did you check the logs on the primary nodes
> /var/log/gluster/geo-replication// ?
>
> Best Regards,
> Strahil Nikolov
>
> On Mon, Sep 13, 2021 at 14:55, Boubacar Cisse
>  wrote:
> Currently using gluster 6.10 and have configured geo replication but crawl
> status has been stuck in "Hybrid Crawl" for weeks now. Can't find any
> potential issues in logs and data appears to be transferred even though
> extremely slowly. Any suggestions on what else to look for to help
> troubleshoot this issue? Any help will be appreciated.
>
> root@host01:~# gluster --version
> glusterfs 6.10
> Repository revision: git://git.gluster.org/glusterfs.git
> Copyright (c) 2006-2016 Red Hat, Inc. 
> GlusterFS comes with ABSOLUTELY NO WARRANTY.
> It is licensed to you under your choice of the GNU Lesser
> General Public License, version 3 or any later version (LGPLv3
> or later), or the GNU General Public License, version 2 (GPLv2),
> in all cases as published by the Free Software Foundation.
>
>
> root@host01:~# gluster volume geo-replication gfs1 geo-user@host03::gfs1
> status
>
> MASTER NODE    MASTER VOL    MASTER BRICK    SLAVE USER    SLAVE    SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
> -------------------------------------------------------------------------------------------------------------------------

Re: [Gluster-users] Geo-replication and changelogs cleaning

2021-09-01 Thread Beard Lionel
Hi,

I will be pleased to make a test with your fix.

Cordialement, Regards,
Lionel BEARD
CLS - IT & Operations

De : Aravinda VK 
Envoyé : mercredi 1 septembre 2021 15:58
À : Beard Lionel 
Cc : gluster-users@gluster.org; sacha...@redhat.com
Objet : Re: [Gluster-users] Geo-replication and changelogs cleaning


Hi,

I think the "archive_gluster_changelogs" repo was not updated after the backend
changelogs were restructured (this patch:
https://github.com/gluster/glusterfs/commit/ec3df84dcfd7ccda0a18fa75e3b425c090209adf#diff-64c754d7b6ec77154042671072debc69456f43b3abc34354d5b818937635600f)

If I fix the script to understand the new backend format, will it work for you?

Aravinda Vishwanathapura
https://kadalu.io


On 01-Sep-2021, at 7:07 PM, Shwetha Acharya <sacha...@redhat.com> wrote:

+aravi...@kadalu.io

On Tue, Aug 31, 2021 at 6:26 PM Beard Lionel <lbe...@groupcls.com> wrote:
Hi everyone,

I am currently using GlusterFS 8.4 on Ubuntu Bionic for an application hosted 
in Azure, with geo-replication configured for most of the volumes.
But the way changelogs files are managed is a little bit annoying, as there are 
never deleted from .glusterfs directory on source volume.

I have found this message from this list (2017):
https://www.mail-archive.com/gluster-users@gluster.org/msg28565.html

I also used this tool to do some cleaning:
https://github.com/aravindavk/archive_gluster_changelogs (after updating it to
manage the new changelog directory structure).
But it is not perfect, as it doesn't work on one of my volumes (I don't know
why).

But, is there an official way to automatically clean changelogs after they have 
been processed by geo-replication? Is it now implemented into gluster 8+? If 
not, why?
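
In the meantime the backlog can only be inspected by hand, with something like this (a rough sketch; it assumes the changelogs live under <brick>/.glusterfs/changelogs, which may differ between versions):

# changelog files older than 30 days still present on one brick
find /path/to/brick/.glusterfs/changelogs -type f -name 'CHANGELOG.*' -mtime +30 | wc -l
# and the space they occupy
du -sh /path/to/brick/.glusterfs/changelogs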

Thanks.

Cordialement, Regards,
Lionel BEARD
CLS - IT & Operations

This message and any attachments (the "message") is intended solely for the 
intended recipient(s) and is confidential. If you receive this message in 
error, or are not the intended recipient(s), please delete it and any copies 
from your systems and immediately notify 

Re: [Gluster-users] Geo-replication and changelogs cleaning

2021-09-01 Thread Aravinda VK
Hi,

I think the "archive_gluster_changelogs" repo was not updated after the backend
changelogs were restructured (this patch:
https://github.com/gluster/glusterfs/commit/ec3df84dcfd7ccda0a18fa75e3b425c090209adf#diff-64c754d7b6ec77154042671072debc69456f43b3abc34354d5b818937635600f
)

If I fix the script to understand the new backend format, will it work for you?

Aravinda Vishwanathapura
https://kadalu.io

> On 01-Sep-2021, at 7:07 PM, Shwetha Acharya  wrote:
> 
> +aravi...@kadalu.io 
> 
> On Tue, Aug 31, 2021 at 6:26 PM Beard Lionel  > wrote:
> Hi everyone,
> 
>  
> 
> I am currently using GlusterFS 8.4 on Ubuntu Bionic for an application hosted 
> in Azure, with geo-replication configured for most of the volumes.
> 
> But the way changelogs files are managed is a little bit annoying, as there 
> are never deleted from .glusterfs directory on source volume.
> 
>  
> 
> I have found this message from this list (2017): 
> https://www.mail-archive.com/gluster-users@gluster.org/msg28565.html 
> 
>  
> 
> I also used this tool to make some cleaning : 
> https://github.com/aravindavk/archive_gluster_changelogs 
>  (after updating it 
> to manage new changelog directory structure).
> 
> But it is not perfect as it doesn’t work on one of my volumes (I don’t know 
> why).
> 
>  
> 
> But, is there an official way to automatically clean changelogs after they 
> have been processed by geo-replication? Is it now implemented into gluster 
> 8+? If not, why?
> 
>  
> 
> Thanks.
> 
>  
> 
> Cordialement, Regards,
> 
> Lionel BEARD
> 
> CLS - IT & Operations
> 
> 
> This message and any attachments (the "message") is intended solely for the 
> intended recipient(s) and is confidential. If you receive this message in 
> error, or are not the intended recipient(s), please delete it and any copies 
> from your systems and immediately notify the sender. Any unauthorized view, 
> use that does not comply with its purpose, dissemination or disclosure, 
> either whole or partial, is prohibited. Since the internet cannot guarantee 
> the integrity of this message which may not be reliable, the sender (and its 
> subsidiaries) shall not be liable for the message if modified or falsified.  
> 
> 
> 
> 
> 
> Community Meeting Calendar:
> 
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk 
> 
> Gluster-users mailing list
> Gluster-users@gluster.org 
> https://lists.gluster.org/mailman/listinfo/gluster-users 
> 
> 
> 
> 
> 
> Community Meeting Calendar:
> 
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users









Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo-replication and changelogs cleaning

2021-09-01 Thread Shwetha Acharya
+aravi...@kadalu.io

On Tue, Aug 31, 2021 at 6:26 PM Beard Lionel  wrote:

> Hi everyone,
>
>
>
> I am currently using GlusterFS 8.4 on Ubuntu Bionic for an application
> hosted in Azure, with geo-replication configured for most of the volumes.
>
> But the way changelogs files are managed is a little bit annoying, as
> there are never deleted from .glusterfs directory on source volume.
>
>
>
> I have found this message from this list (2017):
> https://www.mail-archive.com/gluster-users@gluster.org/msg28565.html
>
>
>
> I also used this tool to make some cleaning :
> https://github.com/aravindavk/archive_gluster_changelogs (after updating
> it to manage new changelog directory structure).
>
> But it is not perfect as it doesn’t work on one of my volumes (I don’t
> know why).
>
>
>
> But, is there an official way to automatically clean changelogs after they
> have been processed by geo-replication? Is it now implemented into gluster
> 8+? If not, why?
>
>
>
> Thanks.
>
>
>
> Cordialement, Regards,
>
> Lionel BEARD
>
> CLS - IT & Operations
> --
>
>
> *This message and any attachments (the "message") is intended solely for
> the intended recipient(s) and is confidential. If you receive this message
> in error, or are not the intended recipient(s), please delete it and any
> copies from your systems and immediately notify the sender. Any
> unauthorized view, use that does not comply with its purpose, dissemination
> or disclosure, either whole or partial, is prohibited. Since the internet
> cannot guarantee the integrity of this message which may not be reliable,
> the sender (and its subsidiaries) shall not be liable for the message if
> modified or falsified.  *
> 
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo-replication adding new master node

2021-06-09 Thread David Cunningham
Hi Aravinda,

We ran "gluster system:: execute gsec_create" and "georep create
push-pem" with the force option as suggested, and then "gluster volume
geo-replication ... status" reported the two new master nodes as being in
"Created" status. We did a geo-replication "stop" and then "start", and are
pleased to see the two new master nodes are now in "Passive" status. Thank
you for your help!
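
For the archives, the full sequence on our side looked roughly like this (volume and session names as used earlier in this thread; the new node must already be part of the trusted pool and the volume):

gluster system:: execute gsec_create
gluster volume geo-replication gvol0 secondary::slave-vol create push-pem force
# restart the session so workers are spawned on the new node
gluster volume geo-replication gvol0 secondary::slave-vol stop
gluster volume geo-replication gvol0 secondary::slave-vol start
gluster volume geo-replication gvol0 secondary::slave-vol status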


On Tue, 1 Jun 2021 at 10:06, David Cunningham 
wrote:

> Hi Aravinda,
>
> Thank you very much - we will give that a try.
>
>
> On Mon, 31 May 2021 at 20:29, Aravinda VK  wrote:
>
>> Hi David,
>>
>> On 31-May-2021, at 10:37 AM, David Cunningham 
>> wrote:
>>
>> Hello,
>>
>> We have a GlusterFS configuration with mirrored nodes on the master side
>> geo-replicating to mirrored nodes on the secondary side.
>>
>> When geo-replication is initially created it seems to automatically add
>> all the mirrored nodes on the master side as geo-replication master nodes,
>> which is fine. My first question is, if we add a new master side node how
>> can we add it as a geo-replication master?
>> This doesn't seem to happen automatically, according to the output of
>> "gluster volume geo-replication gvol0 secondary::gvol0 status". If we use
>> the normal "gluster volume geo-replication gvol0 secondary::slave-vol
>> create push-pem force" it says that the secondary side volume is not empty,
>> which is true because we're adding a master node to the existing
>> geo-replication.
>>
>>
>> This is not automatic. Run `gluster-georep-sshkey generate` and georep
>> create push-pem with force option to push the keys from new nodes to
>> secondary nodes.
>>
>> You can also try this tool instead of georep create command.
>>
>> https://github.com/aravindavk/gluster-georep-tools
>>
>> $ gluster-georep-setup gvol0 secondary::slave-vol --force
>>
>>
>> My second question is whether we can geo-replicate to multiple nodes on
>> the secondary side? Ideally we would normally have something like:
>> master A -> secondary A
>> master B -> secondary B
>> master C -> secondary C
>> so that any master or secondary node could go offline but geo-replication
>> would keep working.
>>
>>
>> Geo-replication command needs one Secondary node to establish the
>> session. Once session starts, Geo-rep starts one worker process per master
>> brick.
>>
>> These worker processes gets the list of secondary nodes by running the
>> `ssh  gluster volume info `. Then Geo-rep
>> distributes the secondary nodes connection in round robin way. For example,
>> if Master volume contains three nodes and secondary volume 3 nodes as you
>> mentioned then Geo-rep makes connection as Master A -> Secondary A, Master
>> B -> Secondary B and Master C -> Secondary C.
>>
>> Secondary node failover: If a node goes down in secondary cluster then
>> Master worker connects to other secondary node and continues replication.
>> One known issue is if the secondary node specified in the Geo-rep create
>> command goes down then it fails to get the Volume info(To get list of
>> secondary nodes). This can be solved by providing the list of secondary
>> nodes as config(Not yet available).
>>
>>
>> Thank you very much in advance.
>>
>> --
>> David Cunningham, Voisonics Limited
>> http://voisonics.com/
>> USA: +1 213 221 1092
>> New Zealand: +64 (0)28 2558 3782
>> 
>>
>>
>>
>> Community Meeting Calendar:
>>
>> Schedule -
>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> Bridge: https://meet.google.com/cpu-eiue-hvk
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>> Aravinda Vishwanathapura
>> https://kadalu.io
>>
>>
>>
>>
>
> --
> David Cunningham, Voisonics Limited
> http://voisonics.com/
> USA: +1 213 221 1092
> New Zealand: +64 (0)28 2558 3782
>


-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo-replication adding new master node

2021-05-31 Thread David Cunningham
Hi Aravinda,

Thank you very much - we will give that a try.


On Mon, 31 May 2021 at 20:29, Aravinda VK  wrote:

> Hi David,
>
> On 31-May-2021, at 10:37 AM, David Cunningham 
> wrote:
>
> Hello,
>
> We have a GlusterFS configuration with mirrored nodes on the master side
> geo-replicating to mirrored nodes on the secondary side.
>
> When geo-replication is initially created it seems to automatically add
> all the mirrored nodes on the master side as geo-replication master nodes,
> which is fine. My first question is, if we add a new master side node how
> can we add it as a geo-replication master?
> This doesn't seem to happen automatically, according to the output of
> "gluster volume geo-replication gvol0 secondary::gvol0 status". If we use
> the normal "gluster volume geo-replication gvol0 secondary::slave-vol
> create push-pem force" it says that the secondary side volume is not empty,
> which is true because we're adding a master node to the existing
> geo-replication.
>
>
> This is not automatic. Run `gluster-georep-sshkey generate` and georep
> create push-pem with force option to push the keys from new nodes to
> secondary nodes.
>
> You can also try this tool instead of georep create command.
>
> https://github.com/aravindavk/gluster-georep-tools
>
> $ gluster-georep-setup gvol0 secondary::slave-vol --force
>
>
> My second question is whether we can geo-replicate to multiple nodes on
> the secondary side? Ideally we would normally have something like:
> master A -> secondary A
> master B -> secondary B
> master C -> secondary C
> so that any master or secondary node could go offline but geo-replication
> would keep working.
>
>
> Geo-replication command needs one Secondary node to establish the session.
> Once session starts, Geo-rep starts one worker process per master brick.
>
> These worker processes gets the list of secondary nodes by running the
> `ssh  gluster volume info `. Then Geo-rep
> distributes the secondary nodes connection in round robin way. For example,
> if Master volume contains three nodes and secondary volume 3 nodes as you
> mentioned then Geo-rep makes connection as Master A -> Secondary A, Master
> B -> Secondary B and Master C -> Secondary C.
>
> Secondary node failover: If a node goes down in secondary cluster then
> Master worker connects to other secondary node and continues replication.
> One known issue is if the secondary node specified in the Geo-rep create
> command goes down then it fails to get the Volume info(To get list of
> secondary nodes). This can be solved by providing the list of secondary
> nodes as config(Not yet available).
>
>
> Thank you very much in advance.
>
> --
> David Cunningham, Voisonics Limited
> http://voisonics.com/
> USA: +1 213 221 1092
> New Zealand: +64 (0)28 2558 3782
> 
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
> Aravinda Vishwanathapura
> https://kadalu.io
>
>
>
>

-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo-replication adding new master node

2021-05-31 Thread Aravinda VK
Hi David,

> On 31-May-2021, at 10:37 AM, David Cunningham  
> wrote:
> 
> Hello,
> 
> We have a GlusterFS configuration with mirrored nodes on the master side 
> geo-replicating to mirrored nodes on the secondary side.
> 
> When geo-replication is initially created it seems to automatically add all 
> the mirrored nodes on the master side as geo-replication master nodes, which 
> is fine. My first question is, if we add a new master side node how can we 
> add it as a geo-replication master?
> This doesn't seem to happen automatically, according to the output of 
> "gluster volume geo-replication gvol0 secondary::gvol0 status". If we use the 
> normal "gluster volume geo-replication gvol0 secondary::slave-vol create 
> push-pem force" it says that the secondary side volume is not empty, which is 
> true because we're adding a master node to the existing geo-replication.

This is not automatic. Run `gluster-georep-sshkey generate` and georep create 
push-pem with force option to push the keys from new nodes to secondary nodes. 

You can also try this tool instead of georep create command.

https://github.com/aravindavk/gluster-georep-tools 


$ gluster-georep-setup gvol0 secondary::slave-vol --force

> 
> My second question is whether we can geo-replicate to multiple nodes on the 
> secondary side? Ideally we would normally have something like:
> master A -> secondary A
> master B -> secondary B
> master C -> secondary C
> so that any master or secondary node could go offline but geo-replication 
> would keep working.

The geo-replication command needs one secondary node to establish the session.
Once the session starts, Geo-rep starts one worker process per master brick.

These worker processes get the list of secondary nodes by running
`ssh <secondary-node> gluster volume info <secondary-volume>`. Geo-rep then
distributes the secondary node connections in a round-robin way. For example,
if the master volume contains three nodes and the secondary volume three nodes,
as you mentioned, then Geo-rep makes the connections Master A -> Secondary A,
Master B -> Secondary B and Master C -> Secondary C.

Secondary node failover: if a node goes down in the secondary cluster, the
master worker connects to another secondary node and continues replication.
One known issue: if the secondary node specified in the Geo-rep create command
goes down, the worker fails to get the volume info (which is how it gets the
list of secondary nodes). This could be solved by providing the list of
secondary nodes as a config option (not yet available).
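
The resulting mapping can be seen in the SLAVE NODE column of the status output, for example:

gluster volume geo-replication gvol0 secondary::slave-vol status
# the SLAVE NODE column shows which secondary node each master brick worker is connected to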

> 
> Thank you very much in advance.
> 
> -- 
> David Cunningham, Voisonics Limited
> http://voisonics.com/ 
> USA: +1 213 221 1092
> New Zealand: +64 (0)28 2558 3782
> 
> 
> 
> 
> Community Meeting Calendar:
> 
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

Aravinda Vishwanathapura
https://kadalu.io







Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo-Replication - UnicodeEncodeError: 'utf-8' codec can't encode character '\udcfc' in position 78: surrogates not allowed

2021-02-26 Thread Dietmar Putz

Hi Andreas,

Recently I was faced with the same fault. I'm pretty sure you speak German,
so a translation should not be necessary.


I found the reason by tracing a certain process that holds the gsyncd.log
open and looking backward from the error until I found some lgetxattr
function calls. In the corresponding directory I found some filenames with
'special' characters. Renaming them fixed the problem.
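
If you just want to locate such names without a full trace, a quick scan of the brick (or the mount) for non-ASCII bytes in filenames also works, e.g.:

# list names containing bytes outside printable ASCII (umlauts, invalid UTF-8, ...)
LC_ALL=C find /brick1/mvol1 -name '*[! -~]*'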


Below is 'my' history and solution for UnicodeEncodeError and
UnicodeDecodeError. Hope it helps... btw, we are running GlusterFS 7.9 on
Ubuntu 18.04.



best regards

Dietmar



script for tracing geo-replication:





[ 07:35:09 ] - root@gl-master-05  ~/tmp/geo-rep $cat trace_gf.sh
#!/bin/bash
#
# script to trace the geo-rep activities
# the script requires a pid
# intended to trace the parent pid of the master process on gsyncd.log
# in this example pid 13620
#
#
#[ 16:19:24 ] - root@gl-master-05 
/var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1 $lsof 
gsyncd.log

#COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
#python3 13021 root    3w   REG    8,2  2905607 9572924 gsyncd.log
#python3 13619 root    3w   REG    8,2  2905607 9572924 gsyncd.log
#python3 13620 root    3w   REG    8,2  2905607 9572924 gsyncd.log
#[ 16:19:27 ] - root@gl-master-05 
/var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1 $

#
#gf_log="/var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1/gsyncd.log" 


tr_out="/root/tmp/geo-rep/trace-`date +"%H_%M_%S_%d_%m_%Y"`.out"

echo "tr_out : $tr_out"
#pid=`lsof "$gf_log" | grep -v COMMAND | head -1 | awk '{print $2}'`
PID=$1
echo "pid : $PID"

ps -p $PID > /dev/null 2>&1
if [ $? -ne 0 ]
then
    echo "Pid $PID not running"
    exit
fi

nohup strace -t -f -s 256 -o $tr_out -p$PID &

PID_STRACE=`ps -aef | grep -v grep | grep strace | awk '{print $2}'`
echo "Pid von strace : $PID_STRACE"

while true
do
    filesize=`ls -l $tr_out | awk '{print $5}'`
    if [ $filesize -gt 10 ]
    then
        ps -p $PID > /dev/null 2>&1
        if [ $? -eq 0 ]
        then
            kill -9 $PID_STRACE
            sleep 1
            rm $tr_out
            nohup strace -t -f -s 256 -o $tr_out -p$PID &
            PID_STRACE=`ps -aef | grep -v grep | grep strace | awk 
'{print $2}'`

            echo "Pid von strace : $PID_STRACE"
        else
            echo "pid $PID laeuft nicht mehr"
            exit
        fi
    fi
    ps -p $PID > /dev/null 2>&1
    if [ $? -ne 0 ]
    then
        echo "pid $PID laeuft nicht mehr..."
        exit
    fi
    sleep 120
    echo "`date` : `ls -lh $tr_out`"
done

-- 



Regarding solution approach 2 (see below):

For this error it is enough to trace the 'last' process. Here 1236,
not 13021. 13021 is the 'mother' process; after the error the two others
are killed and restarted with a new pid (result of observations):


[ 13:00:04 ] - root@gl-master-05  ~/tmp/geo-rep/15 $lsof 
/var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1/gsyncd.log

COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
python3  1235 root    3w   REG    8,2  2857996 9572924 
/var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1/gsyncd.log
python3  1236 root    3w   REG    8,2  2857996 9572924 
/var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1/gsyncd.log
python3 13021 root    3w   REG    8,2  2857996 9572924 
/var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1/gsyncd.log

[ 13:00:18 ] - root@gl-master-05  ~/tmp/geo-rep/15 $

[ 13:00:10 ] - root@gl-master-05  ~/tmp/geo-rep $strace -t -f -s 256 -o 
/root/tmp/geo-rep/gsyncd1.out -p1236


To keep the file from growing too large you can repeatedly kill the strace,
delete the file, and start strace again. Bad luck, of course, if the error
occurs exactly then. The file quickly reaches a size of 1 GB and more
(about 10 minutes, depending on activity) and many millions of lines...


Watch the geo-replication log; killing the pid mentioned above is not
necessary, though. The process ends on the error, and with it the trace.


[ 12:32:04 ] - root@gl-master-05 
/var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1 $tail -f 
gsyncd.log


...

[2021-02-11 12:53:59.530649] I [master(worker 
/brick1/mvol1):1441:process] _GMaster: Batch Completed mode=xsync    
duration=178.4717    changelog_start=1613041474 
changelog_end=1613041474    num_changelogs=1    stime=None entry_stime=None
[2021-02-11 12:53:59.639853] I [master(worker /brick1/mvol1):1681:crawl] 
_GMaster: processing xsync changelog 
path=/var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/xsync/XSYNC-CHANGELOG.1613041477

###
[2021-02-11 13:00:57.149347] E [syncdutils(worker 
/brick1/mvol1):339:log_raise_exception] : FAIL:

Traceback (most recent call last):

Re: [Gluster-users] Geo-Replication - UnicodeEncodeError: 'utf-8' codec can't encode character '\udcfc' in position 78: surrogates not allowed

2021-02-26 Thread Andreas Kirbach

Hi Dietmar,

thank you for your reply.

I've also started to trace this down and you are correct, the directory 
does contain filenames with 'special' characters (umlauts), but renaming 
them as a workaround unfortunately is not an option.


So the question really is why does it fail on those characters and how 
to fix that so it doesn't error even if there are such filenames.


Kind regards,
Andreas

Am 26.02.2021 um 14:16 schrieb Dietmar Putz:

Hi Andreas,

recently i have been faced with the same fault. I'm pretty sure you are 
speaking german, that's why a translation should not be necessary.


I found the reason by tracing a certain process which points to the 
gsyncd.log and looking backward from the error until i found some 
lgetxattr function call's. In the corresponding directory i found some 
filenames with 'special' characters. Rename fixed the problem.


Below 'my' history and solution for UnicodeEncodeError und 
UnicodeDecodeError. Hope it helps...btw, we are running gfs 7.9 on 
Ubuntu 18.04.



best regards

Dietmar



script for tracing geo-replication:





[ 07:35:09 ] - root@gl-master-05  ~/tmp/geo-rep $cat trace_gf.sh
#!/bin/bash
#
# script to trace the geo-rep activities
# the script requires a pid
# intended to trace the parent pid of the master process on gsyncd.log
# in this example pid 13620
#
#
#[ 16:19:24 ] - root@gl-master-05 
/var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1 $lsof 
gsyncd.log

#COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
#python3 13021 root    3w   REG    8,2  2905607 9572924 gsyncd.log
#python3 13619 root    3w   REG    8,2  2905607 9572924 gsyncd.log
#python3 13620 root    3w   REG    8,2  2905607 9572924 gsyncd.log
#[ 16:19:27 ] - root@gl-master-05 
/var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1 $

#
#gf_log="/var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1/gsyncd.log" 


tr_out="/root/tmp/geo-rep/trace-`date +"%H_%M_%S_%d_%m_%Y"`.out"

echo "tr_out : $tr_out"
#pid=`lsof "$gf_log" | grep -v COMMAND | head -1 | awk '{print $2}'`
PID=$1
echo "pid : $PID"

ps -p $PID > /dev/null 2>&1
if [ $? -ne 0 ]
then
     echo "Pid $PID not running"
     exit
fi

nohup strace -t -f -s 256 -o $tr_out -p$PID &

PID_STRACE=`ps -aef | grep -v grep | grep strace | awk '{print $2}'`
echo "Pid von strace : $PID_STRACE"

while true
do
     filesize=`ls -l $tr_out | awk '{print $5}'`
     if [ $filesize -gt 10 ]
     then
         ps -p $PID > /dev/null 2>&1
         if [ $? -eq 0 ]
         then
             kill -9 $PID_STRACE
             sleep 1
             rm $tr_out
             nohup strace -t -f -s 256 -o $tr_out -p$PID &
             PID_STRACE=`ps -aef | grep -v grep | grep strace | awk 
'{print $2}'`

             echo "Pid von strace : $PID_STRACE"
         else
             echo "pid $PID laeuft nicht mehr"
             exit
         fi
     fi
     ps -p $PID > /dev/null 2>&1
     if [ $? -ne 0 ]
     then
         echo "pid $PID laeuft nicht mehr..."
         exit
     fi
     sleep 120
     echo "`date` : `ls -lh $tr_out`"
done

-- 



Regarding solution approach 2 (see below):

For this error it is enough to trace the 'last' process. Here 1236,
not 13021. 13021 is the 'mother' process; after the error the two others
are killed and restarted with a new pid (result of observations):


[ 13:00:04 ] - root@gl-master-05  ~/tmp/geo-rep/15 $lsof 
/var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1/gsyncd.log

COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
python3  1235 root    3w   REG    8,2  2857996 9572924 
/var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1/gsyncd.log
python3  1236 root    3w   REG    8,2  2857996 9572924 
/var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1/gsyncd.log
python3 13021 root    3w   REG    8,2  2857996 9572924 
/var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1/gsyncd.log

[ 13:00:18 ] - root@gl-master-05  ~/tmp/geo-rep/15 $

[ 13:00:10 ] - root@gl-master-05  ~/tmp/geo-rep $strace -t -f -s 256 -o 
/root/tmp/geo-rep/gsyncd1.out -p1236


To keep the file from growing too large you can repeatedly kill the strace,
delete the file, and start strace again. Bad luck, of course, if the error
occurs exactly then. The file quickly reaches a size of 1 GB and more
(about 10 minutes, depending on activity) and many millions of lines...


Watch the geo-replication log; killing the pid mentioned above is not
necessary, though. The process ends on the error, and with it the trace.


[ 12:32:04 ] - root@gl-master-05 
/var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1 $tail -f 
gsyncd.log


...

[2021-02-11 12:53:59.530649] I [master(worker 
/brick1/mvol1):1441:process] _GMaster: Batch Completed mode=xsync
duration=178.4717    changelog_start=16

Re: [Gluster-users] Geo-replication status Faulty

2020-10-27 Thread Strahil Nikolov
If you can afford the extra space, set the logs to TRACE and after a reasonable
timeframe lower them back.

Although RH's gluster versioning is different, this article should help:

https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/configuring_the_log_level
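
Something along these lines, using the VMS volume from this thread (remember to lower the levels back to INFO afterwards):

gluster volume set VMS diagnostics.brick-log-level TRACE
gluster volume set VMS diagnostics.client-log-level TRACE
# if your version supports it, the geo-rep session log level can be raised as well:
# gluster volume geo-replication VMS gluster03::VMS-SLAVE config log-level DEBUG
# ... reproduce the problem, then revert:
gluster volume set VMS diagnostics.brick-log-level INFO
gluster volume set VMS diagnostics.client-log-level INFO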


Best Regards,
Strahil Nikolov






On Tuesday, 27 October 2020 at 21:24:59 GMT+2, Gilberto Nunes
 wrote:





Not so fast with my solution!
After shutting down the other node, I get the FAULTY status again...
The only failure I see in the log is regarding the xattr value...

[2020-10-27 19:20:07.718897] E [syncdutils(worker 
/DATA/vms):110:gf_mount_ready] : failed to get the xattr value 

Don't know if I am looking at the right log: 
/var/log/glusterfs/geo-replication/VMS_gluster03_VMS-SLAVE/gsyncd.log 

[2020-10-27 19:20:03.867749] I [gsyncdstatus(monitor):248:set_worker_status] 
GeorepStatus: Worker Status Change [{status=Initializing...}] [2020-10-27 
19:20:03.868206] I [monitor(monitor):160:monitor] Monitor: starting gsyncd 
worker [{brick=/DATA/vms}, {slave_node=gluster03}] [2020-10-27 19:20:04.397444] 
I [resource(worker /DATA/vms):1387:connect_remote] SSH: Initializing SSH 
connection between master and slave... [2020-10-27 19:20:06.337282] I 
[resource(worker /DATA/vms):1436:connect_remote] SSH: SSH connection between 
master and slave established. [{duration=1.9385}] [2020-10-27 19:20:06.337854] 
I [resource(worker /DATA/vms):1116:connect] GLUSTER: Mounting gluster volume 
locally... [2020-10-27 19:20:07.718897] E [syncdutils(worker 
/DATA/vms):110:gf_mount_ready] : failed to get the xattr value [2020-10-27 
19:20:07.720089] I [resource(worker /DATA/vms):1139:connect] GLUSTER: Mounted 
gluster volume [{duration=1.3815}] [2020-10-27 19:20:07.720644] I 
[subcmds(worker /DATA/vms):84:subcmd_worker] : Worker spawn successful. 
Acknowledging back to monitor [2020-10-27 19:20:09.757677] I [master(worker 
/DATA/vms):1645:register] _GMaster: Working dir 
[{path=/var/lib/misc/gluster/gsyncd/VMS_gluster03_VMS-SLAVE/DATA-vms}] 
[2020-10-27 19:20:09.758440] I [resource(worker /DATA/vms):1292:service_loop] 
GLUSTER: Register time [{time=1603826409}] [2020-10-27 19:20:09.925364] I 
[gsyncdstatus(worker /DATA/vms):281:set_active] GeorepStatus: Worker Status 
Change [{status=Active}] [2020-10-27 19:20:10.407319] I [gsyncdstatus(worker 
/DATA/vms):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change 
[{status=History Crawl}] [2020-10-27 19:20:10.420385] I [master(worker 
/DATA/vms):1559:crawl] _GMaster: starting history crawl [{turns=1}, 
{stime=(1603821702, 0)}, {etime=1603826410}, {entry_stime=(1603822857, 0)}] 
[2020-10-27 19:20:10.424286] E [resource(worker /DATA/vms):1312:service_loop] 
GLUSTER: Changelog History Crawl failed [{error=[Errno 0] Success}] [2020-10-27 
19:20:10.731317] I [monitor(monitor):228:monitor] Monitor: worker died in 
startup phase [{brick=/DATA/vms}] [2020-10-27 19:20:10.740046] I 
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status 
Change [{status=Faulty}]

---
Gilberto Nunes Ferreira
(47) 99676-7530 - Whatsapp / Telegram






On Tue, 27 Oct 2020 at 16:06, Strahil Nikolov
wrote:
> It could be a "simple" bug - software has bugs and regressions.
> 
> I would recommend you to ping the debian mailing list - at least it won't 
> hurt.
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> 
> 
On Tuesday, 27 October 2020 at 20:10:39 GMT+2, Gilberto Nunes
 wrote:
> 
> 
> 
> 
> 
> [SOLVED]
> 
> Well... It seems to me that pure Debian Linux 10 has some problem with XFS, 
> which is the FS that  I used.
> It's not accept attr2 mount options.
> 
> Interestingly enough, I have now used Proxmox 6.x, which is Debian based, I 
> am now able to use the attr2 mount point option.
> Then the Faulty status of geo-rep has gone.
> Perhaps Proxmox staff has compiled xfs from scratch... Don't know
> But now I am happy ' cause the main reason to use geo-rep to me is to use it 
> over Proxmox
> 
> cat /etc/fstab
> #
> /dev/pve/root / xfs defaults 0 1
> /dev/pve/swap none swap sw 0 0
> /dev/sdb1   /DATA   xfs attr2   0   0
> gluster01:VMS /vms glusterfs defaults,_netdev,x-systemd.automount,backupvolfile-server=gluster02 0 0
> proc /proc proc defaults 0 0
> 
> 
> ---
> Gilberto Nunes Ferreira
> 
> 
> 
> 
> 
> 
> 
On Tue, 27 Oct 2020 at 09:39, Gilberto Nunes
 wrote:
 IIUC you're begging for split-brain ...
>> Not at all!
>> I have used this configuration and there isn't any split brain at all!
>> But if I do not use it, then I get a split brain.
>> Regarding count 2 I will see it!
>> Thanks
>> 
>> ---
>> Gilberto Nunes Ferreira
>> 
>> 
>> 
>> 
>> 
> On Tue, 27 Oct 2020 at 09:37, Diego Zuccato wrote:
>> On 27/10/20 13:15, Gilberto Nunes wrote:
 I have applied this parameters to the 2-node gluster:
 gluster vol set VMS cluster.heal-timeout 10
 gluster volume heal VMS enable

Re: [Gluster-users] Geo-replication status Faulty

2020-10-27 Thread Strahil Nikolov
It could be a "simple" bug - software has bugs and regressions.

I would recommend you to ping the debian mailing list - at least it won't hurt.

Best Regards,
Strahil Nikolov






On Tuesday, 27 October 2020 at 20:10:39 GMT+2, Gilberto Nunes
 wrote:





[SOLVED]

Well... It seems to me that pure Debian Linux 10 has some problem with XFS,
which is the FS that I used.
It does not accept the attr2 mount option.

Interestingly enough, now that I have used Proxmox 6.x, which is Debian based,
I am able to use the attr2 mount option.
With that, the Faulty status of geo-rep is gone.
Perhaps the Proxmox staff compiled XFS from scratch... I don't know.
But now I am happy, because the main reason for me to use geo-rep is to use it
with Proxmox.

cat /etc/fstab
#
/dev/pve/root / xfs defaults 0 1
/dev/pve/swap none swap sw 0 0
/dev/sdb1   /DATA   xfs attr2   0   0
gluster01:VMS /vms glusterfs defaults,_netdev,x-systemd.automount,backupvolfile-server=gluster02 0 0
proc /proc proc defaults 0 0


---
Gilberto Nunes Ferreira







On Tue, 27 Oct 2020 at 09:39, Gilberto Nunes
 wrote:
>>> IIUC you're begging for split-brain ...
> Not at all!
> I have used this configuration and there isn't any split brain at all!
> But if I do not use it, then I get a split brain.
> Regarding count 2 I will see it!
> Thanks
> 
> ---
> Gilberto Nunes Ferreira
> 
> 
> 
> 
> 
> On Tue, 27 Oct 2020 at 09:37, Diego Zuccato wrote:
>> On 27/10/20 13:15, Gilberto Nunes wrote:
>>> I have applied this parameters to the 2-node gluster:
>>> gluster vol set VMS cluster.heal-timeout 10
>>> gluster volume heal VMS enable
>>> gluster vol set VMS cluster.quorum-reads false
>>> gluster vol set VMS cluster.quorum-count 1
>> Urgh!
>> IIUC you're begging for split-brain ...
>> I think you should leave quorum-count=2 for safe writes. If a node is
>> down, obviously the volume becomes readonly. But if you planned the
>> downtime you can reduce quorum-count just before shutting it down.
>> You'll have to bring it back to 2 before re-enabling the downed server,
>> then wait for heal to complete before being able to down the second server.
>> 
>>> Then I mount the gluster volume putting this line in the fstab file:
>>> In gluster01
>>> gluster01:VMS /vms glusterfs
>>> defaults,_netdev,x-systemd.automount,backupvolfile-server=gluster02 0 0
>>> In gluster02
>>> gluster02:VMS /vms glusterfs
>>> defaults,_netdev,x-systemd.automount,backupvolfile-server=gluster01 0 0
>> Isn't it preferrable to use the 'hostlist' syntax?
>> gluster01,gluster02:VMS /vms glusterfs defaults,_netdev 0 0
>> A / at the beginning is optional, but can be useful if you're trying to
>> use the diamond freespace collector (w/o the initial slash, it ignores
>> glusterfs mountpoints).
>> 
>> -- 
>> Diego Zuccato
>> DIFA - Dip. di Fisica e Astronomia
>> Servizi Informatici
>> Alma Mater Studiorum - Università di Bologna
>> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
>> tel.: +39 051 20 95786
>> 
> 




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo-replication status Faulty

2020-10-27 Thread Gilberto Nunes
Not so fast with my solution!
After shutting down the other node, I get the FAULTY status again...
The only failure I see in the log is regarding the xattr value...

[2020-10-27 19:20:07.718897] E [syncdutils(worker
/DATA/vms):110:gf_mount_ready] : failed to get the xattr value

Don't know if I am looking at the right log:
/var/log/glusterfs/geo-replication/VMS_gluster03_VMS-SLAVE/gsyncd.log

[2020-10-27 19:20:03.867749] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
Change [{status=Initializing...}]
[2020-10-27 19:20:03.868206] I [monitor(monitor):160:monitor] Monitor:
starting gsyncd worker [{brick=/DATA/vms}, {slave_node=gluster03}]
[2020-10-27 19:20:04.397444] I [resource(worker
/DATA/vms):1387:connect_remote] SSH: Initializing SSH connection between
master and slave...
[2020-10-27 19:20:06.337282] I [resource(worker
/DATA/vms):1436:connect_remote] SSH: SSH connection between master and
slave established. [{duration=1.9385}]
[2020-10-27 19:20:06.337854] I [resource(worker /DATA/vms):1116:connect]
GLUSTER: Mounting gluster volume locally...
[2020-10-27 19:20:07.718897] E [syncdutils(worker
/DATA/vms):110:gf_mount_ready] : failed to get the xattr value
[2020-10-27 19:20:07.720089] I [resource(worker /DATA/vms):1139:connect]
GLUSTER: Mounted gluster volume [{duration=1.3815}]
[2020-10-27 19:20:07.720644] I [subcmds(worker /DATA/vms):84:subcmd_worker]
: Worker spawn successful. Acknowledging back to monitor
[2020-10-27 19:20:09.757677] I [master(worker /DATA/vms):1645:register]
_GMaster: Working dir
[{path=/var/lib/misc/gluster/gsyncd/VMS_gluster03_VMS-SLAVE/DATA-vms}]
[2020-10-27 19:20:09.758440] I [resource(worker
/DATA/vms):1292:service_loop] GLUSTER: Register time [{time=1603826409}]
[2020-10-27 19:20:09.925364] I [gsyncdstatus(worker
/DATA/vms):281:set_active] GeorepStatus: Worker Status Change
[{status=Active}]
[2020-10-27 19:20:10.407319] I [gsyncdstatus(worker
/DATA/vms):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change
[{status=History Crawl}]
[2020-10-27 19:20:10.420385] I [master(worker /DATA/vms):1559:crawl]
_GMaster: starting history crawl [{turns=1}, {stime=(1603821702, 0)},
{etime=1603826410}, {entry_
stime=(1603822857, 0)}]
[2020-10-27 19:20:10.424286] E [resource(worker
/DATA/vms):1312:service_loop] GLUSTER: Changelog History Crawl failed
[{error=[Errno 0] Success}]
[2020-10-27 19:20:10.731317] I [monitor(monitor):228:monitor] Monitor:
worker died in startup phase [{brick=/DATA/vms}]
[2020-10-27 19:20:10.740046] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
Change [{status=Faulty}]


---
Gilberto Nunes Ferreira
(47) 99676-7530 - Whatsapp / Telegram






On Tue, 27 Oct 2020 at 16:06, Strahil Nikolov
wrote:

> It could be a "simple" bug - software has bugs and regressions.
>
> I would recommend you to ping the debian mailing list - at least it won't
> hurt.
>
> Best Regards,
> Strahil Nikolov
>
>
>
>
>
>
> On Tuesday, 27 October 2020 at 20:10:39 GMT+2, Gilberto Nunes <
> gilberto.nune...@gmail.com> wrote:
>
>
>
>
>
> [SOLVED]
>
> Well... It seems to me that pure Debian Linux 10 has some problem with
> XFS, which is the FS that I used.
> It does not accept the attr2 mount option.
>
> Interestingly enough, now that I have switched to Proxmox 6.x, which is
> Debian based, I am able to use the attr2 mount option.
> Then the Faulty status of geo-rep is gone.
> Perhaps the Proxmox staff have compiled xfs from scratch... I don't know.
> But now I am happy, 'cause the main reason for me to use geo-rep is to use
> it on top of Proxmox.
>
> cat /etc/fstab
> # 
> /dev/pve/root / xfs defaults 0 1
> /dev/pve/swap none swap sw 0 0
> /dev/sdb1   /DATA   xfs attr2   0   0
> gluster01:VMS /vms glusterfs defaults,_netdev,x-systemd.automount,backupvolfile-server=gluster02 0 0
> proc /proc proc defaults 0 0
>
>
> ---
> Gilberto Nunes Ferreira
>
>
>
>
>
>
>
> On Tue, 27 Oct 2020 at 09:39, Gilberto Nunes <
> gilberto.nune...@gmail.com> wrote:
> >>> IIUC you're begging for split-brain ...
> > Not at all!
> > I have used this configuration and there isn't any split brain at all!
> > But if I do not use it, then I get a split brain.
> > Regarding quorum-count 2, I will look into it!
> > Thanks
> >
> > ---
> > Gilberto Nunes Ferreira
> >
> >
> >
> >
> >
> > On Tue, 27 Oct 2020 at 09:37, Diego Zuccato <
> diego.zucc...@unibo.it> wrote:
>> On 27/10/20 13:15, Gilberto Nunes wrote:
> >>> I have applied this parameters to the 2-node gluster:
> >>> gluster vol set VMS cluster.heal-timeout 10
> >>> gluster volume heal VMS enable
> >>> gluster vol set VMS cluster.quorum-reads false
> >>> gluster vol set VMS cluster.quorum-count 1
> >> Urgh!
> >> IIUC you're begging for split-brain ...
> >> I think you should leave quorum-count=2 for safe writes. If a node is
> >> down, obviously the volume becomes readonly. But if you planned the
> >> downtime you can reduce quorum-count just before shutting it down.
> >> You'll have to b

Re: [Gluster-users] Geo-replication status Faulty

2020-10-27 Thread Gilberto Nunes
[SOLVED]

Well... It seems to me that pure Debian Linux 10 has some problem with XFS,
which is the FS that I used.
It does not accept the attr2 mount option.

Interestingly enough, now that I have switched to Proxmox 6.x, which is Debian
based, I am able to use the attr2 mount option.
Then the Faulty status of geo-rep is gone.
Perhaps the Proxmox staff have compiled xfs from scratch... I don't know.
But now I am happy, 'cause the main reason for me to use geo-rep is to use
it on top of Proxmox.

cat /etc/fstab
#  
/dev/pve/root / xfs defaults 0 1
/dev/pve/swap none swap sw 0 0
/dev/sdb1   /DATA   xfs attr2   0   0
gluster01:VMS /vms glusterfs
defaults,_netdev,x-systemd.automount,backupvolfile-server=gluster02 0 0
proc /proc proc defaults 0 0
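
To double-check what the kernel actually accepted for those mount points, a
quick sanity check with standard tools (just a hint, nothing Proxmox-specific):

mount | grep -E '/DATA|/vms'
xfs_info /DATA    # the meta-data section shows attr=2 when the option took effect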


---
Gilberto Nunes Ferreira






On Tue, 27 Oct 2020 at 09:39, Gilberto Nunes <
gilberto.nune...@gmail.com> wrote:

> >> IIUC you're begging for split-brain ...
> Not at all!
> I have used this configuration and there isn't any split brain at all!
> But if I do not use it, then I get a split brain.
> Regarding quorum-count 2, I will look into it!
> Thanks
>
> ---
> Gilberto Nunes Ferreira
>
>
>
>
>
> On Tue, 27 Oct 2020 at 09:37, Diego Zuccato <
> diego.zucc...@unibo.it> wrote:
>
>> On 27/10/20 13:15, Gilberto Nunes wrote:
>> > I have applied this parameters to the 2-node gluster:
>> > gluster vol set VMS cluster.heal-timeout 10
>> > gluster volume heal VMS enable
>> > gluster vol set VMS cluster.quorum-reads false
>> > gluster vol set VMS cluster.quorum-count 1
>> Urgh!
>> IIUC you're begging for split-brain ...
>> I think you should leave quorum-count=2 for safe writes. If a node is
>> down, obviously the volume becomes readonly. But if you planned the
>> downtime you can reduce quorum-count just before shutting it down.
>> You'll have to bring it back to 2 before re-enabling the downed server,
>> then wait for heal to complete before being able to down the second
>> server.
>>
>> > Then I mount the gluster volume putting this line in the fstab file:
>> > In gluster01
>> > gluster01:VMS /vms glusterfs
>> > defaults,_netdev,x-systemd.automount,backupvolfile-server=gluster02 0 0
>> > In gluster02
>> > gluster02:VMS /vms glusterfs
>> > defaults,_netdev,x-systemd.automount,backupvolfile-server=gluster01 0 0
>> Isn't it preferable to use the 'hostlist' syntax?
>> gluster01,gluster02:VMS /vms glusterfs defaults,_netdev 0 0
>> A / at the beginning is optional, but can be useful if you're trying to
>> use the diamond freespace collector (w/o the initial slash, it ignores
>> glusterfs mountpoints).
>>
>> --
>> Diego Zuccato
>> DIFA - Dip. di Fisica e Astronomia
>> Servizi Informatici
>> Alma Mater Studiorum - Università di Bologna
>> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
>> tel.: +39 051 20 95786
>>
>




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo-replication status Faulty

2020-10-27 Thread Gilberto Nunes
>> IIUC you're begging for split-brain ...
Not at all!
I have used this configuration and there isn't any split brain at all!
But if I do not use it, then I get a split brain.
Regarding quorum-count 2, I will look into it!
Thanks

---
Gilberto Nunes Ferreira





On Tue, 27 Oct 2020 at 09:37, Diego Zuccato wrote:

> On 27/10/20 13:15, Gilberto Nunes wrote:
> > I have applied this parameters to the 2-node gluster:
> > gluster vol set VMS cluster.heal-timeout 10
> > gluster volume heal VMS enable
> > gluster vol set VMS cluster.quorum-reads false
> > gluster vol set VMS cluster.quorum-count 1
> Urgh!
> IIUC you're begging for split-brain ...
> I think you should leave quorum-count=2 for safe writes. If a node is
> down, obviously the volume becomes readonly. But if you planned the
> downtime you can reduce quorum-count just before shutting it down.
> You'll have to bring it back to 2 before re-enabling the downed server,
> then wait for heal to complete before being able to down the second server.
>
> > Then I mount the gluster volume putting this line in the fstab file:
> > In gluster01
> > gluster01:VMS /vms glusterfs
> > defaults,_netdev,x-systemd.automount,backupvolfile-server=gluster02 0 0
> > In gluster02
> > gluster02:VMS /vms glusterfs
> > defaults,_netdev,x-systemd.automount,backupvolfile-server=gluster01 0 0
> Isn't it preferable to use the 'hostlist' syntax?
> gluster01,gluster02:VMS /vms glusterfs defaults,_netdev 0 0
> A / at the beginning is optional, but can be useful if you're trying to
> use the diamond freespace collector (w/o the initial slash, it ignores
> glusterfs mountpoints).
>
> --
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
>




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo-replication status Faulty

2020-10-27 Thread Diego Zuccato
On 27/10/20 13:15, Gilberto Nunes wrote:
> I have applied this parameters to the 2-node gluster:
> gluster vol set VMS cluster.heal-timeout 10
> gluster volume heal VMS enable
> gluster vol set VMS cluster.quorum-reads false
> gluster vol set VMS cluster.quorum-count 1
Urgh!
IIUC you're begging for split-brain ...
I think you should leave quorum-count=2 for safe writes. If a node is
down, obviously the volume becomes readonly. But if you planned the
downtime you can reduce quorum-count just before shutting it down.
You'll have to bring it back to 2 before re-enabling the downed server,
then wait for heal to complete before being able to down the second server.
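
For that planned-downtime sequence, a minimal command sketch (assuming the
volume name VMS used in this thread):

# just before the planned shutdown of one node
gluster volume set VMS cluster.quorum-count 1
# ...maintenance done, node is back up...
gluster volume set VMS cluster.quorum-count 2
# make sure nothing is left to heal before taking the other node down
gluster volume heal VMS info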

> Then I mount the gluster volume putting this line in the fstab file:
> In gluster01
> gluster01:VMS /vms glusterfs
> defaults,_netdev,x-systemd.automount,backupvolfile-server=gluster02 0 0
> In gluster02
> gluster02:VMS /vms glusterfs
> defaults,_netdev,x-systemd.automount,backupvolfile-server=gluster01 0 0
Isn't it preferable to use the 'hostlist' syntax?
gluster01,gluster02:VMS /vms glusterfs defaults,_netdev 0 0
A / at the beginning is optional, but can be useful if you're trying to
use the diamond freespace collector (w/o the initial slash, it ignores
glusterfs mountpoints).

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo-replication status Faulty

2020-10-27 Thread Gilberto Nunes
Hi Aravinda

Let me thank you for that nice tools... It helps me a lot.
And yes! Indeed I think this is the case, but why does gluster03 (which is
the backup server) not continue, since gluster02 is still online?
That puzzles me...


---
Gilberto Nunes Ferreira





On Tue, 27 Oct 2020 at 06:52, Aravinda VK wrote:

> Hi Gilberto,
>
> Happy to see georepsetup tool is useful for you. The repo I moved to
> https://github.com/aravindavk/gluster-georep-tools (renamed as
> “gluster-georep-setup”).
>
> I think the georep command failure is because the respective node's (peer's)
> glusterd is not reachable or down.
>
> Aravinda Vishwanathapura
> https://kadalu.io
>
> On 27-Oct-2020, at 2:15 AM, Gilberto Nunes 
> wrote:
>
> I was able to solve the issue restarting all servers.
>
> Now I have another issue!
>
> I just powered off the gluster01 server and then the geo-replication
> entered in faulty status.
> I tried to stop and start the gluster geo-replication like that:
>
> gluster volume geo-replication DATA root@gluster03::DATA-SLAVE resume
> Peer gluster01.home.local, which is a part of DATA volume, is down. Please
> bring up the peer and retry.
> geo-replication command failed
>
> How can I have geo-replication with 2 master and 1 slave?
>
> Thanks
>
>
> ---
> Gilberto Nunes Ferreira
>
>
>
>
>
>
On Mon, 26 Oct 2020 at 17:23, Gilberto Nunes <
gilberto.nune...@gmail.com> wrote:
>
>> Hi there...
>>
>> I'd created a 2 gluster vol and another 1 gluster server acting as a
>> backup server, using geo-replication.
>> So in gluster01 I'd issued the command:
>>
>> gluster peer probe gluster02;gluster peer probe gluster03
>> gluster vol create DATA replica 2 gluster01:/DATA/master01-data
>> gluster02:/DATA/master01-data/
>>
>> Then in gluster03 server:
>>
>> gluster vol create DATA-SLAVE gluster03:/DATA/slave-data/
>>
>> I'd set up the passwordless SSH session between these 3 servers.
>>
>> Then I'd used this script
>>
>> https://github.com/gilbertoferreira/georepsetup
>>
>> like this
>>
>> georepsetup
>> /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33:
>> CryptographyDeprecationWarning: Python 2 is no longer supp
>> orted by the Python core team. Support for it is now deprecated in
>> cryptography, and will be removed in a future release.
>>  from cryptography.hazmat.backends import default_backend
>> usage: georepsetup [-h] [--force] [--no-color] MASTERVOL SLAVE SLAVEVOL
>> georepsetup: error: too few arguments
>> gluster01:~# georepsetup DATA gluster03 DATA-SLAVE
>> /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33:
>> CryptographyDeprecationWarning: Python 2 is no longer supp
>> orted by the Python core team. Support for it is now deprecated in
>> cryptography, and will be removed in a future release.
>>  from cryptography.hazmat.backends import default_backend
>> Geo-replication session will be established between DATA and
>> gluster03::DATA-SLAVE
>> Root password of gluster03 is required to complete the setup. NOTE:
>> Password will not be stored.
>>
>> root@gluster03's password:
>> [OK] gluster03 is Reachable(Port 22)
>> [OK] SSH Connection established root@gluster03
>> [OK] Master Volume and Slave Volume are compatible (Version: 8.2)
>> [OK] Common secret pub file present at
>> /var/lib/glusterd/geo-replication/common_secret.pem.pub
>> [OK] common_secret.pem.pub file copied to gluster03
>> [OK] Master SSH Keys copied to all Up Slave nodes
>> [OK] Updated Master SSH Keys to all Up Slave nodes authorized_keys
>> file
>> [OK] Geo-replication Session Established
>>
>> Then I reboot the 3 servers...
>> After a while everything works ok, but after a few minutes, I get Faulty
>> status in gluster01
>>
>> There's the log
>>
>>
>> [2020-10-26 20:16:41.362584] I
>> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
>> Change [{status=Initializing...}]
>> [2020-10-26 20:16:41.362937] I [monitor(monitor):160:monitor] Monitor:
>> starting gsyncd worker [{brick=/DATA/master01-data},
>> {slave_node=gluster03}]
>> [2020-10-26 20:16:41.508884] I [resource(worker
>> /DATA/master01-data):1387:connect_remote] SSH: Initializing SSH connection
>> between master and slave.
>> ..
>> [2020-10-26 20:16:42.996678] I [resource(worker
>> /DATA/master01-data):1436:connect_remote] SSH: SSH connection between
>> master and slave established.
>> [{duration=1.4873}]
>> [2020-10-26 20:16:42.997121] I [resource(worker
>> /DATA/master01-data):1116:connect] GLUSTER: Mounting gluster volume
>> locally...
>> [2020-10-26 20:16:44.170661] E [syncdutils(worker
>> /DATA/master01-data):110:gf_mount_ready] : failed to get the xattr
>> value
>> [2020-10-26 20:16:44.171281] I [resource(worker
>> /DATA/master01-data):1139:connect] GLUSTER: Mounted gluster volume
>> [{duration=1.1739}]
>> [2020-10-26 20:16:44.171772] I [subcmds(worker
>> /DATA/master01-data):84:subcmd_worker] : Worker spawn successful.
>> Acknowl

Re: [Gluster-users] Geo-replication status Faulty

2020-10-27 Thread Gilberto Nunes
Dear Felix

I have applied this parameters to the 2-node gluster:

gluster vol set VMS cluster.heal-timeout 10
gluster volume heal VMS enable
gluster vol set VMS cluster.quorum-reads false
gluster vol set VMS cluster.quorum-count 1
gluster vol set VMS network.ping-timeout 2
gluster volume set VMS cluster.favorite-child-policy mtime
gluster volume heal VMS granular-entry-heal enable
gluster volume set VMS cluster.data-self-heal-algorithm full

As you can see, I used this for virtualization purposes.
Then I mount the gluster volume putting this line in the fstab file:

In gluster01

gluster01:VMS /vms glusterfs
defaults,_netdev,x-systemd.automount,backupvolfile-server=gluster02 0 0

In gluster02

gluster02:VMS /vms glusterfs
defaults,_netdev,x-systemd.automount,backupvolfile-server=gluster01 0 0

Then after shutdown the gluster01, gluster02 is still access the mounted
gluster volume...

Just the geo-rep has failure.

I could see why, but I'll make further investigation.

Thanks




---
Gilberto Nunes Ferreira





On Tue, 27 Oct 2020 at 04:57, Felix Kölzow wrote:

> Dear Gilberto,
>
>
> If I am right, you ran into server-quorum if you startet a 2-node replica
> and shutdown one host.
>
> From my perspective, it's fine.
>
>
> Please correct me if I am wrong here.
>
>
> Regards,
>
> Felix
> On 27/10/2020 01:46, Gilberto Nunes wrote:
>
> Well, I did not reboot the host; I shut it down. Then after 15 min I gave
> up.
> Don't know why that happened.
> I will try it later.
>
> ---
> Gilberto Nunes Ferreira
>
>
>
>
>
>
>
>
> On Mon, 26 Oct 2020 at 21:31, Strahil Nikolov <
> hunter86...@yahoo.com> wrote:
>
>> Usually there is always only 1 "master" , but when you power off one of
>> the 2 nodes - the geo rep should handle that and the second node should
>> take the job.
>>
>> How long did you wait after gluster1 has been rebooted ?
>>
>>
>> Best Regards,
>> Strahil Nikolov
>>
>>
>>
>>
>>
>>
>> On Monday, 26 October 2020 at 22:46:21 GMT+2, Gilberto Nunes <
>> gilberto.nune...@gmail.com> wrote:
>>
>>
>>
>>
>>
>> I was able to solve the issue restarting all servers.
>>
>> Now I have another issue!
>>
>> I just powered off the gluster01 server and then the geo-replication
>> entered in faulty status.
>> I tried to stop and start the gluster geo-replication like that:
>>
>> gluster volume geo-replication DATA root@gluster03::DATA-SLAVE resume
>>  Peer gluster01.home.local, which is a part of DATA volume, is down. Please
>> bring up the peer and retry. geo-replication command failed
>> How can I have geo-replication with 2 master and 1 slave?
>>
>> Thanks
>>
>>
>> ---
>> Gilberto Nunes Ferreira
>>
>>
>>
>>
>>
>>
>>
>> On Mon, 26 Oct 2020 at 17:23, Gilberto Nunes <
>> gilberto.nune...@gmail.com> wrote:
>> > Hi there...
>> >
>> > I'd created a 2 gluster vol and another 1 gluster server acting as a
>> backup server, using geo-replication.
>> > So in gluster01 I'd issued the command:
>> >
>> > gluster peer probe gluster02;gluster peer probe gluster03
>> > gluster vol create DATA replica 2 gluster01:/DATA/master01-data
>> gluster02:/DATA/master01-data/
>> >
>> > Then in gluster03 server:
>> >
>> > gluster vol create DATA-SLAVE gluster03:/DATA/slave-data/
>> >
>> > I'd set up the passwordless SSH session between these 3 servers.
>> >
>> > Then I'd used this script
>> >
>> > https://github.com/gilbertoferreira/georepsetup
>> >
>> > like this
>> >
>> > georepsetup
>>
>> /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33:
>> CryptographyDeprecationWarning: Python 2 is no longer supported by the
>> Python core team. Support for it is now deprecated in cryptography, and
>> will be removed in a future release.  from cryptography.hazmat.backends
>> import default_backend usage: georepsetup [-h] [--force] [--no-color]
>> MASTERVOL SLAVE SLAVEVOL georepsetup: error: too few arguments gluster01:~#
>> georepsetup DATA gluster03 DATA-SLAVE
>> /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33:
>> CryptographyDeprecationWarning: Python 2 is no longer supported by the
>> Python core team. Support for it is now deprecated in cryptography, and
>> will be removed in a future release.  from cryptography.hazmat.backends
>> import default_backend Geo-replication session will be established between
>> DATA and gluster03::DATA-SLAVE Root password of gluster03 is required to
>> complete the setup. NOTE: Password will not be stored. root@gluster03's
>> password:  [OK] gluster03 is Reachable(Port 22) [OK] SSH Connection
>> established root@gluster03 [OK] Master Volume and Slave Volume are
>> compatible (Version: 8.2) [OK] Common secret pub file present at
>> /var/lib/glusterd/geo-replication/common_secret.pem.pub [OK]
>> common_secret.pem.pub file copied to gluster03 [OK] Master SSH Keys
>> copied to all Up Slave nodes [OK] Updated Master SSH Keys to all Up
>> Slave nodes authorized_keys file [

Re: [Gluster-users] Geo-replication status Faulty

2020-10-27 Thread Aravinda VK
Hi Gilberto,

Happy to see the georepsetup tool is useful for you. I moved the repo to
https://github.com/aravindavk/gluster-georep-tools (renamed
“gluster-georep-setup”).

I think the georep command failure is because the respective node's (peer's)
glusterd is not reachable or down.

Aravinda Vishwanathapura
https://kadalu.io
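
A quick way to confirm that from any node (standard commands, mentioned only
as a hint):

gluster pool list      # every peer should show State: Connected
gluster peer status
systemctl status glusterd   # run this on the peer that shows up as disconnected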

> On 27-Oct-2020, at 2:15 AM, Gilberto Nunes  wrote:
> 
> I was able to solve the issue restarting all servers.
> 
> Now I have another issue!
> 
> I just powered off the gluster01 server and then the geo-replication entered 
> in faulty status.
> I tried to stop and start the gluster geo-replication like that:
> 
> gluster volume geo-replication DATA root@gluster03::DATA-SLAVE resume  
> Peer gluster01.home.local, which is a part of DATA volume, is down. Please 
> bring up the peer and retry. 
> geo-replication command failed
> 
> How can I have geo-replication with 2 master and 1 slave?
> 
> Thanks
> 
> 
> ---
> Gilberto Nunes Ferreira
> 
> 
> 
> 
> 
> 
> On Mon, 26 Oct 2020 at 17:23, Gilberto Nunes
> <gilberto.nune...@gmail.com> wrote:
> Hi there...
> 
> I'd created a 2 gluster vol and another 1 gluster server acting as a backup 
> server, using geo-replication.
> So in gluster01 I'd issued the command:
> 
> gluster peer probe gluster02;gluster peer probe gluster03
> gluster vol create DATA replica 2 gluster01:/DATA/master01-data 
> gluster02:/DATA/master01-data/
> 
> Then in gluster03 server:
> 
> gluster vol create DATA-SLAVE gluster03:/DATA/slave-data/
> 
> I'd set up the passwordless SSH session between these 3 servers.
> 
> Then I'd used this script
> 
> https://github.com/gilbertoferreira/georepsetup 
> 
> 
> like this
> 
> georepsetup
> /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33:
>  CryptographyDeprecationWarning: Python 2 is no longer supp
> orted by the Python core team. Support for it is now deprecated in 
> cryptography, and will be removed in a future release. 
>  from cryptography.hazmat.backends import default_backend 
> usage: georepsetup [-h] [--force] [--no-color] MASTERVOL SLAVE SLAVEVOL 
> georepsetup: error: too few arguments 
> gluster01:~# georepsetup DATA gluster03 DATA-SLAVE 
> /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33:
>  CryptographyDeprecationWarning: Python 2 is no longer supp
> orted by the Python core team. Support for it is now deprecated in 
> cryptography, and will be removed in a future release. 
>  from cryptography.hazmat.backends import default_backend 
> Geo-replication session will be established between DATA and 
> gluster03::DATA-SLAVE 
> Root password of gluster03 is required to complete the setup. NOTE: Password 
> will not be stored. 
> 
> root@gluster03's password:  
> [OK] gluster03 is Reachable(Port 22) 
> [OK] SSH Connection established root@gluster03 
> [OK] Master Volume and Slave Volume are compatible (Version: 8.2) 
> [OK] Common secret pub file present at 
> /var/lib/glusterd/geo-replication/common_secret.pem.pub 
> [OK] common_secret.pem.pub file copied to gluster03 
> [OK] Master SSH Keys copied to all Up Slave nodes 
> [OK] Updated Master SSH Keys to all Up Slave nodes authorized_keys file 
> [OK] Geo-replication Session Established
> 
> Then I reboot the 3 servers...
> After a while everything works ok, but after a few minutes, I get Faulty 
> status in gluster01
> 
> There's the log
> 
> 
> [2020-10-26 20:16:41.362584] I [gsyncdstatus(monitor):248:set_worker_status] 
> GeorepStatus: Worker Status Change [{status=Initializing...}] 
> [2020-10-26 20:16:41.362937] I [monitor(monitor):160:monitor] Monitor: 
> starting gsyncd worker [{brick=/DATA/master01-data}, {slave_node=gluster03}] 
> [2020-10-26 20:16:41.508884] I [resource(worker 
> /DATA/master01-data):1387:connect_remote] SSH: Initializing SSH connection 
> between master and slave.
> .. 
> [2020-10-26 20:16:42.996678] I [resource(worker 
> /DATA/master01-data):1436:connect_remote] SSH: SSH connection between master 
> and slave established. 
> [{duration=1.4873}] 
> [2020-10-26 20:16:42.997121] I [resource(worker 
> /DATA/master01-data):1116:connect] GLUSTER: Mounting gluster volume 
> locally... 
> [2020-10-26 20:16:44.170661] E [syncdutils(worker 
> /DATA/master01-data):110:gf_mount_ready] : failed to get the xattr value 
> [2020-10-26 20:16:44.171281] I [resource(worker 
> /DATA/master01-data):1139:connect] GLUSTER: Mounted gluster volume 
> [{duration=1.1739}] 
> [2020-10-26 20:16:44.171772] I [subcmds(worker 
> /DATA/master01-data):84:subcmd_worker] : Worker spawn successful. 
> Acknowledging back to monitor 
> [2020-10-26 20:16:46.200603] I [master(worker 
> /DATA/master01-data):1645:register] _GMaster: Working dir 
> [{path=/var/lib/misc/gluster/gsyncd/DATA_glu
> ster03_DATA-SLAVE/DATA-master01-data}] 
> [2020-10-26 

Re: [Gluster-users] Geo-replication status Faulty

2020-10-27 Thread Felix Kölzow

Dear Gilberto,


If I am right, you ran into server-quorum if you startet a 2-node
replica and shutdown one host.

From my perspective, it's fine.


Please correct me if I am wrong here.
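
One way to verify which quorum is actually in play is to query the options (a
sketch, using the volume name VMS from this thread):

gluster volume get VMS cluster.server-quorum-type
gluster volume get VMS cluster.server-quorum-ratio
gluster volume get VMS cluster.quorum-type
gluster volume get VMS cluster.quorum-count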


Regards,

Felix

On 27/10/2020 01:46, Gilberto Nunes wrote:

Well, I did not reboot the host; I shut it down. Then after 15 min I gave
up.
Don't know why that happened.
I will try it later.

---
Gilberto Nunes Ferreira






On Mon, 26 Oct 2020 at 21:31, Strahil Nikolov
<hunter86...@yahoo.com> wrote:

Usually there is always only 1 "master" , but when you power off
one of the 2 nodes - the geo rep should handle that and the second
node should take the job.

How long did you wait after gluster1 has been rebooted ?


Best Regards,
Strahil Nikolov






On Monday, 26 October 2020 at 22:46:21 GMT+2, Gilberto
Nunes <gilberto.nune...@gmail.com> wrote:





I was able to solve the issue restarting all servers.

Now I have another issue!

I just powered off the gluster01 server and then the
geo-replication entered in faulty status.
I tried to stop and start the gluster geo-replication like that:

gluster volume geo-replication DATA root@gluster03::DATA-SLAVE
resume  Peer gluster01.home.local, which is a part of DATA volume,
is down. Please bring up the peer and retry. geo-replication
command failed
How can I have geo-replication with 2 master and 1 slave?

Thanks


---
Gilberto Nunes Ferreira







On Mon, 26 Oct 2020 at 17:23, Gilberto Nunes
<gilberto.nune...@gmail.com> wrote:
> Hi there...
>
> I'd created a 2 gluster vol and another 1 gluster server acting
as a backup server, using geo-replication.
> So in gluster01 I'd issued the command:
>
> gluster peer probe gluster02;gluster peer probe gluster03
> gluster vol create DATA replica 2 gluster01:/DATA/master01-data
gluster02:/DATA/master01-data/
>
> Then in gluster03 server:
>
> gluster vol create DATA-SLAVE gluster03:/DATA/slave-data/
>
> I'd set up the passwordless SSH session between these 3 servers.
>
> Then I'd used this script
>
> https://github.com/gilbertoferreira/georepsetup
>
> like this
>
> georepsetup
   
/usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33:
CryptographyDeprecationWarning: Python 2 is no longer supported by
the Python core team. Support for it is now deprecated in
cryptography, and will be removed in a future release.  from
cryptography.hazmat.backends import default_backend usage:
georepsetup [-h] [--force] [--no-color] MASTERVOL SLAVE SLAVEVOL
georepsetup: error: too few arguments gluster01:~# georepsetup
DATA gluster03 DATA-SLAVE

/usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33:
CryptographyDeprecationWarning: Python 2 is no longer supported by
the Python core team. Support for it is now deprecated in
cryptography, and will be removed in a future release.  from
cryptography.hazmat.backends import default_backend
Geo-replication session will be established between DATA and
gluster03::DATA-SLAVE Root password of gluster03 is required to
complete the setup. NOTE: Password will not be stored.
root@gluster03's password:  [    OK] gluster03 is Reachable(Port
22) [    OK] SSH Connection established root@gluster03 [    OK]
Master Volume and Slave Volume are compatible (Version: 8.2) [
   OK] Common secret pub file present at
/var/lib/glusterd/geo-replication/common_secret.pem.pub [    OK]
common_secret.pem.pub file copied to gluster03 [    OK] Master SSH
Keys copied to all Up Slave nodes [    OK] Updated Master SSH Keys
to all Up Slave nodes authorized_keys file [    OK]
Geo-replication Session Established
> Then I reboot the 3 servers...
> After a while everything works ok, but after a few minutes, I
get Faulty status in gluster01
>
> There's the log
>
>
> [2020-10-26 20:16:41.362584] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
Status Change [{status=Initializing...}] [2020-10-26
20:16:41.362937] I [monitor(monitor):160:monitor] Monitor:
starting gsyncd worker [{brick=/DATA/master01-data},
{slave_node=gluster03}] [2020-10-26 20:16:41.508884] I
[resource(worker /DATA/master01-data):1387:connect_remote] SSH:
Initializing SSH connection between master and slave...
[2020-10-26 20:16:42.996678] I [resource(worker
/DATA/master01-data):1436:connect_remote] SSH: SSH connection
between master and slave established. [{duration=1.4873}]
[2020-10-26 20:16:42.997121] I [resource(worker
/DATA/master01-data):1116:connect] GLUSTER: Mounting gluster
volume locally... [2020-10-26 20:16:44.170661] E
[syncdutils(worker /DATA/master01-data):

Re: [Gluster-users] Geo-replication status Faulty

2020-10-26 Thread Strahil Nikolov
Usually there is always only 1 "master" , but when you power off one of the 2 
nodes - the geo rep should handle that and the second node should take the job.

How long did you wait after gluster1 has been rebooted ?


Best Regards,
Strahil Nikolov






On Monday, 26 October 2020 at 22:46:21 GMT+2, Gilberto Nunes wrote:





I was able to solve the issue restarting all servers.

Now I have another issue!

I just powered off the gluster01 server and then the geo-replication entered in 
faulty status.
I tried to stop and start the gluster geo-replication like that:

gluster volume geo-replication DATA root@gluster03::DATA-SLAVE resume  Peer 
gluster01.home.local, which is a part of DATA volume, is down. Please bring up 
the peer and retry. geo-replication command failed
How can I have geo-replication with 2 master and 1 slave?

Thanks


---
Gilberto Nunes Ferreira







On Mon, 26 Oct 2020 at 17:23, Gilberto Nunes wrote:
> Hi there...
> 
> I'd created a 2 gluster vol and another 1 gluster server acting as a backup 
> server, using geo-replication.
> So in gluster01 I'd issued the command:
> 
> gluster peer probe gluster02;gluster peer probe gluster03
> gluster vol create DATA replica 2 gluster01:/DATA/master01-data 
> gluster02:/DATA/master01-data/
> 
> Then in gluster03 server:
> 
> gluster vol create DATA-SLAVE gluster03:/DATA/slave-data/
> 
> I'd set up the passwordless SSH session between these 3 servers.
> 
> Then I'd used this script
> 
> https://github.com/gilbertoferreira/georepsetup
> 
> like this
> 
> georepsetup    
> /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33:
>  CryptographyDeprecationWarning: Python 2 is no longer supported by the 
> Python core team. Support for it is now deprecated in cryptography, and will 
> be removed in a future release.  from cryptography.hazmat.backends import 
> default_backend usage: georepsetup [-h] [--force] [--no-color] MASTERVOL 
> SLAVE SLAVEVOL georepsetup: error: too few arguments gluster01:~# georepsetup 
> DATA gluster03 DATA-SLAVE 
> /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33:
>  CryptographyDeprecationWarning: Python 2 is no longer supported by the 
> Python core team. Support for it is now deprecated in cryptography, and will 
> be removed in a future release.  from cryptography.hazmat.backends import 
> default_backend Geo-replication session will be established between DATA and 
> gluster03::DATA-SLAVE Root password of gluster03 is required to complete the 
> setup. NOTE: Password will not be stored. root@gluster03's password:  [    
> OK] gluster03 is Reachable(Port 22) [    OK] SSH Connection established 
> root@gluster03 [    OK] Master Volume and Slave Volume are compatible 
> (Version: 8.2) [    OK] Common secret pub file present at 
> /var/lib/glusterd/geo-replication/common_secret.pem.pub [    OK] 
> common_secret.pem.pub file copied to gluster03 [    OK] Master SSH Keys 
> copied to all Up Slave nodes [    OK] Updated Master SSH Keys to all Up Slave 
> nodes authorized_keys file [    OK] Geo-replication Session Established
> Then I reboot the 3 servers...
> After a while everything works ok, but after a few minutes, I get Faulty 
> status in gluster01
> 
> There's the log
> 
> 
> [2020-10-26 20:16:41.362584] I [gsyncdstatus(monitor):248:set_worker_status] 
> GeorepStatus: Worker Status Change [{status=Initializing...}] [2020-10-26 
> 20:16:41.362937] I [monitor(monitor):160:monitor] Monitor: starting gsyncd 
> worker [{brick=/DATA/master01-data}, {slave_node=gluster03}] [2020-10-26 
> 20:16:41.508884] I [resource(worker /DATA/master01-data):1387:connect_remote] 
> SSH: Initializing SSH connection between master and slave... [2020-10-26 
> 20:16:42.996678] I [resource(worker /DATA/master01-data):1436:connect_remote] 
> SSH: SSH connection between master and slave established. [{duration=1.4873}] 
> [2020-10-26 20:16:42.997121] I [resource(worker 
> /DATA/master01-data):1116:connect] GLUSTER: Mounting gluster volume 
> locally... [2020-10-26 20:16:44.170661] E [syncdutils(worker 
> /DATA/master01-data):110:gf_mount_ready] : failed to get the xattr value 
> [2020-10-26 20:16:44.171281] I [resource(worker 
> /DATA/master01-data):1139:connect] GLUSTER: Mounted gluster volume 
> [{duration=1.1739}] [2020-10-26 20:16:44.171772] I [subcmds(worker 
> /DATA/master01-data):84:subcmd_worker] : Worker spawn successful. 
> Acknowledging back to monitor [2020-10-26 20:16:46.200603] I [master(worker 
> /DATA/master01-data):1645:register] _GMaster: Working dir 
> [{path=/var/lib/misc/gluster/gsyncd/DATA_gluster03_DATA-SLAVE/DATA-master01-data}]
>  [2020-10-26 20:16:46.201798] I [resource(worker 
> /DATA/master01-data):1292:service_loop] GLUSTER: Register time 
> [{time=1603743406}] [2020-10-26 20:16:46.226415] I [gsyncdstatus(worker 
> /DATA/master01-data):281:set_active] GeorepStatus: Worker Status Change 
> [{status=Active}] [2020-10-2

Re: [Gluster-users] Geo-replication status Faulty

2020-10-26 Thread Gilberto Nunes
Well, I did not reboot the host; I shut it down. Then after 15 min I gave
up.
Don't know why that happened.
I will try it later.

---
Gilberto Nunes Ferreira








On Mon, 26 Oct 2020 at 21:31, Strahil Nikolov wrote:

> Usually there is always only 1 "master" , but when you power off one of
> the 2 nodes - the geo rep should handle that and the second node should
> take the job.
>
> How long did you wait after gluster1 has been rebooted ?
>
>
> Best Regards,
> Strahil Nikolov
>
>
>
>
>
>
> On Monday, 26 October 2020 at 22:46:21 GMT+2, Gilberto Nunes <
> gilberto.nune...@gmail.com> wrote:
>
>
>
>
>
> I was able to solve the issue restarting all servers.
>
> Now I have another issue!
>
> I just powered off the gluster01 server and then the geo-replication
> entered in faulty status.
> I tried to stop and start the gluster geo-replication like that:
>
> gluster volume geo-replication DATA root@gluster03::DATA-SLAVE resume
>  Peer gluster01.home.local, which is a part of DATA volume, is down. Please
> bring up the peer and retry. geo-replication command failed
> How can I have geo-replication with 2 master and 1 slave?
>
> Thanks
>
>
> ---
> Gilberto Nunes Ferreira
>
>
>
>
>
>
>
> On Mon, 26 Oct 2020 at 17:23, Gilberto Nunes <
> gilberto.nune...@gmail.com> wrote:
> > Hi there...
> >
> > I'd created a 2 gluster vol and another 1 gluster server acting as a
> backup server, using geo-replication.
> > So in gluster01 I'd issued the command:
> >
> > gluster peer probe gluster02;gluster peer probe gluster03
> > gluster vol create DATA replica 2 gluster01:/DATA/master01-data
> gluster02:/DATA/master01-data/
> >
> > Then in gluster03 server:
> >
> > gluster vol create DATA-SLAVE gluster03:/DATA/slave-data/
> >
> > I'd set up the passwordless SSH session between these 3 servers.
> >
> > Then I'd used this script
> >
> > https://github.com/gilbertoferreira/georepsetup
> >
> > like this
> >
> > georepsetup
>
> /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33:
> CryptographyDeprecationWarning: Python 2 is no longer supported by the
> Python core team. Support for it is now deprecated in cryptography, and
> will be removed in a future release.  from cryptography.hazmat.backends
> import default_backend usage: georepsetup [-h] [--force] [--no-color]
> MASTERVOL SLAVE SLAVEVOL georepsetup: error: too few arguments gluster01:~#
> georepsetup DATA gluster03 DATA-SLAVE
> /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33:
> CryptographyDeprecationWarning: Python 2 is no longer supported by the
> Python core team. Support for it is now deprecated in cryptography, and
> will be removed in a future release.  from cryptography.hazmat.backends
> import default_backend Geo-replication session will be established between
> DATA and gluster03::DATA-SLAVE Root password of gluster03 is required to
> complete the setup. NOTE: Password will not be stored. root@gluster03's
> password:  [OK] gluster03 is Reachable(Port 22) [OK] SSH Connection
> established root@gluster03 [OK] Master Volume and Slave Volume are
> compatible (Version: 8.2) [OK] Common secret pub file present at
> /var/lib/glusterd/geo-replication/common_secret.pem.pub [OK]
> common_secret.pem.pub file copied to gluster03 [OK] Master SSH Keys
> copied to all Up Slave nodes [OK] Updated Master SSH Keys to all Up
> Slave nodes authorized_keys file [OK] Geo-replication Session
> Established
> > Then I reboot the 3 servers...
> > After a while everything works ok, but after a few minutes, I get Faulty
> status in gluster01
> >
> > There's the log
> >
> >
> > [2020-10-26 20:16:41.362584] I
> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
> Change [{status=Initializing...}] [2020-10-26 20:16:41.362937] I
> [monitor(monitor):160:monitor] Monitor: starting gsyncd worker
> [{brick=/DATA/master01-data}, {slave_node=gluster03}] [2020-10-26
> 20:16:41.508884] I [resource(worker
> /DATA/master01-data):1387:connect_remote] SSH: Initializing SSH connection
> between master and slave... [2020-10-26 20:16:42.996678] I [resource(worker
> /DATA/master01-data):1436:connect_remote] SSH: SSH connection between
> master and slave established. [{duration=1.4873}] [2020-10-26
> 20:16:42.997121] I [resource(worker /DATA/master01-data):1116:connect]
> GLUSTER: Mounting gluster volume locally... [2020-10-26 20:16:44.170661] E
> [syncdutils(worker /DATA/master01-data):110:gf_mount_ready] : failed
> to get the xattr value [2020-10-26 20:16:44.171281] I [resource(worker
> /DATA/master01-data):1139:connect] GLUSTER: Mounted gluster volume
> [{duration=1.1739}] [2020-10-26 20:16:44.171772] I [subcmds(worker
> /DATA/master01-data):84:subcmd_worker] : Worker spawn successful.
> Acknowledging back to monitor [2020-10-26 20:16:46.200603] I [master(worker
> /DATA/master01-data):1645:register] _GMaster: Working dir
> [{path=/var/lib/

Re: [Gluster-users] Geo-replication status Faulty

2020-10-26 Thread Gilberto Nunes
I was able to solve the issue restarting all servers.

Now I have another issue!

I just powered off the gluster01 server and then the geo-replication
entered in faulty status.
I tried to stop and start the gluster geo-replication like that:

gluster volume geo-replication DATA root@gluster03::DATA-SLAVE resume
Peer gluster01.home.local, which is a part of DATA volume, is down. Please
bring up the peer and retry.
geo-replication command failed

How can I have geo-replication with 2 master and 1 slave?
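
For reference, the session can be inspected from any master node with (session
names as used above, shown only as a sketch):

gluster volume geo-replication DATA gluster03::DATA-SLAVE status detail

With a replica-2 master volume both bricks belong to the same session; normally
one worker is Active and the other Passive, and the Passive one is expected to
take over when the Active node goes down.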

Thanks


---
Gilberto Nunes Ferreira






On Mon, 26 Oct 2020 at 17:23, Gilberto Nunes <
gilberto.nune...@gmail.com> wrote:

> Hi there...
>
> I'd created a 2 gluster vol and another 1 gluster server acting as a
> backup server, using geo-replication.
> So in gluster01 I'd issued the command:
>
> gluster peer probe gluster02;gluster peer probe gluster03
> gluster vol create DATA replica 2 gluster01:/DATA/master01-data
> gluster02:/DATA/master01-data/
>
> Then in gluster03 server:
>
> gluster vol create DATA-SLAVE gluster03:/DATA/slave-data/
>
> I'd set up the passwordless SSH session between these 3 servers.
>
> Then I'd used this script
>
> https://github.com/gilbertoferreira/georepsetup
>
> like this
>
> georepsetup
> /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33:
> CryptographyDeprecationWarning: Python 2 is no longer supp
> orted by the Python core team. Support for it is now deprecated in
> cryptography, and will be removed in a future release.
>  from cryptography.hazmat.backends import default_backend
> usage: georepsetup [-h] [--force] [--no-color] MASTERVOL SLAVE SLAVEVOL
> georepsetup: error: too few arguments
> gluster01:~# georepsetup DATA gluster03 DATA-SLAVE
> /usr/local/lib/python2.7/dist-packages/paramiko-2.7.2-py2.7.egg/paramiko/transport.py:33:
> CryptographyDeprecationWarning: Python 2 is no longer supp
> orted by the Python core team. Support for it is now deprecated in
> cryptography, and will be removed in a future release.
>  from cryptography.hazmat.backends import default_backend
> Geo-replication session will be established between DATA and
> gluster03::DATA-SLAVE
> Root password of gluster03 is required to complete the setup. NOTE:
> Password will not be stored.
>
> root@gluster03's password:
> [OK] gluster03 is Reachable(Port 22)
> [OK] SSH Connection established root@gluster03
> [OK] Master Volume and Slave Volume are compatible (Version: 8.2)
> [OK] Common secret pub file present at
> /var/lib/glusterd/geo-replication/common_secret.pem.pub
> [OK] common_secret.pem.pub file copied to gluster03
> [OK] Master SSH Keys copied to all Up Slave nodes
> [OK] Updated Master SSH Keys to all Up Slave nodes authorized_keys
> file
> [OK] Geo-replication Session Established
>
> Then I reboot the 3 servers...
> After a while everything works ok, but after a few minutes, I get Faulty
> status in gluster01
>
> There's the log
>
>
> [2020-10-26 20:16:41.362584] I
> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
> Change [{status=Initializing...}]
> [2020-10-26 20:16:41.362937] I [monitor(monitor):160:monitor] Monitor:
> starting gsyncd worker [{brick=/DATA/master01-data},
> {slave_node=gluster03}]
> [2020-10-26 20:16:41.508884] I [resource(worker
> /DATA/master01-data):1387:connect_remote] SSH: Initializing SSH connection
> between master and slave.
> ..
> [2020-10-26 20:16:42.996678] I [resource(worker
> /DATA/master01-data):1436:connect_remote] SSH: SSH connection between
> master and slave established.
> [{duration=1.4873}]
> [2020-10-26 20:16:42.997121] I [resource(worker
> /DATA/master01-data):1116:connect] GLUSTER: Mounting gluster volume
> locally...
> [2020-10-26 20:16:44.170661] E [syncdutils(worker
> /DATA/master01-data):110:gf_mount_ready] : failed to get the xattr
> value
> [2020-10-26 20:16:44.171281] I [resource(worker
> /DATA/master01-data):1139:connect] GLUSTER: Mounted gluster volume
> [{duration=1.1739}]
> [2020-10-26 20:16:44.171772] I [subcmds(worker
> /DATA/master01-data):84:subcmd_worker] : Worker spawn successful.
> Acknowledging back to monitor
> [2020-10-26 20:16:46.200603] I [master(worker
> /DATA/master01-data):1645:register] _GMaster: Working dir
> [{path=/var/lib/misc/gluster/gsyncd/DATA_glu
> ster03_DATA-SLAVE/DATA-master01-data}]
> [2020-10-26 20:16:46.201798] I [resource(worker
> /DATA/master01-data):1292:service_loop] GLUSTER: Register time
> [{time=1603743406}]
> [2020-10-26 20:16:46.226415] I [gsyncdstatus(worker
> /DATA/master01-data):281:set_active] GeorepStatus: Worker Status Change
> [{status=Active}]
> [2020-10-26 20:16:46.395112] I [gsyncdstatus(worker
> /DATA/master01-data):253:set_worker_crawl_status] GeorepStatus: Crawl
> Status Change [{status=His
> tory Crawl}]
> [2020-10-26 20:16:46.396491] I [master(worker
> /DATA/master01-data):1559:crawl] _GMaster: starting history crawl
> [{turns=1}, {stime=(1603742506, 0)},
> {etime=1603743406}, {ent

Re: [Gluster-users] Geo-replication log file not closed

2020-08-30 Thread David Cunningham
Hello all,

Apparently we don't want to "kill -HUP" the two processes that still have the
rotated log file open:
root  4495 1  0 Aug10 ?00:00:59 /usr/bin/python2
/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py
--path=/nodirectwritedata/gluster/gvol0  --monitor -c
/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
--iprefix=/var :gvol0 --glusterd-uuid=b7521445-ee93-4fed-8ced-6a609fa8c7d4
nvfs10::gvol0
root  4508  4495  0 Aug10 ?00:01:56 python2
/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py agent gvol0
nvfs10::gvol0 --local-path /nodirectwritedata/gluster/gvol0 --local-node
cafs30 --local-node-id b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id
cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 9,12,11,10
... a kill -HUP on those processes stops them rather than re-opening the
log file.

Does anyone know if these processes are supposed to have gsyncd.log open?
If so, how do we tell them to close and re-open their file handle?

Thanks in advance!
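
One possible workaround (untested here, just a sketch) is to stop signalling
the Python daemons altogether and let logrotate copy-and-truncate the file
instead, e.g.:

/var/log/glusterfs/geo-replication/*/*.log {
    rotate 52
    missingok
    notifempty
    compress
    copytruncate
}

With copytruncate no process has to re-open its file handle, at the cost of
possibly losing a few log lines written between the copy and the truncate.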


On Tue, 25 Aug 2020 at 15:24, David Cunningham 
wrote:

> Hello,
>
> We're having an issue with the rotated gsyncd.log not being released.
> Here's the output of 'lsof':
>
> # lsof | grep 'gsyncd.log.1'
> python24495  root3w  REG8,1
>  9916750234332241
> /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted)
> python24495  4496root3w  REG8,1
>  9916750234332241
> /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted)
> python24495  4507root3w  REG8,1
>  9916750234332241
> /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted)
> python24508  root3w  REG8,1
>  9916750234332241
> /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted)
> python24508  root5w  REG8,1
>  9916750234332241
> /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted)
> python24508  4511root3w  REG8,1
>  9916750234332241
> /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted)
> ... etc...
>
> Those processes are:
> # ps -ef | egrep '4495|4508'
> root  4495 1  0 Aug10 ?00:00:59 /usr/bin/python2
> /usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py
> --path=/nodirectwritedata/gluster/gvol0  --monitor -c
> /var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
> --iprefix=/var :gvol0 --glusterd-uuid=b7521445-ee93-4fed-8ced-6a609fa8c7d4
> nvfs10::gvol0
> root  4508  4495  0 Aug10 ?00:01:56 python2
> /usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py agent gvol0
> nvfs10::gvol0 --local-path /nodirectwritedata/gluster/gvol0 --local-node
> cafs30 --local-node-id b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id
> cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 9,12,11,10
>
> And here's the relevant part of the /etc/logrotate.d/glusterfs-georep
> script:
>
> /var/log/glusterfs/geo-replication/*/*.log {
> sharedscripts
> rotate 52
> missingok
> compress
> delaycompress
> notifempty
> postrotate
> for pid in `ps -aef | grep glusterfs | egrep "\-\-aux-gfid-mount" |
> awk '{print $2}'`; do
> /usr/bin/kill -HUP $pid > /dev/null 2>&1 || true
> done
>  endscript
> }
>
> If I run the postrotate part manually:
> # ps -aef | grep glusterfs | egrep "\-\-aux-gfid-mount" | awk '{print $2}'
> 4520
>
> # ps -aef | grep 4520
> root  4520 1  0 Aug10 ?01:24:23 /usr/sbin/glusterfs
> --aux-gfid-mount --acl --log-level=INFO
> --log-file=/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/mnt-nodirectwritedata-gluster-gvol0.log
> --volfile-server=localhost --volfile-id=gvol0 --client-pid=-1
> /tmp/gsyncd-aux-mount-Tq_3sU
>
> Perhaps the problem is that the kill -HUP in the logrotate script doesn't
> act on the right process? If so, does anyone have a command to get the
> right PID?
>
> Thanks in advance for any help.
>
> --
> David Cunningham, Voisonics Limited
> http://voisonics.com/
> USA: +1 213 221 1092
> New Zealand: +64 (0)28 2558 3782
>


-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo-replication causes OOM

2020-08-17 Thread Matthew Benstead

Thanks Strahil,

Would the geo rep processes be the gsyncd.py processes?

It seems like it's the glusterfsd and auxiliary mounts that are holding 
all the memory right now...


Could this be related to the open-behind bug mentioned here: 
https://github.com/gluster/glusterfs/issues/1444  and here: 
https://github.com/gluster/glusterfs/issues/1440 ?
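
If open-behind does turn out to be the culprit, a hedged workaround sketch
would be to disable it on the volume (volume name storage, as below) and watch
whether the memory of the aux mounts still grows:

gluster volume set storage performance.open-behind off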


Thanks,
 -Matthew

Matthew Benstead
System Administrator
Pacific Climate Impacts Consortium 
University of Victoria, UH1
PO Box 1800, STN CSC
Victoria, BC, V8W 2Y2
Phone: 1-250-721-8432
Email: matth...@uvic.ca 
On 2020-08-14 10:35 p.m., Strahil Nikolov wrote:

Hey Matthew,

Can you check with valgrind the memory leak ?

It will be something like:
Find the geo rep process via ps and note  all parameters it was started with .
Next stop geo rep.

Then start it with valgrind :
valgrind --log-file="filename"  --tool=memcheck --leak-check=full   

It might help narrowing the problem.

Best Regards,
Strahil Nikolov

On 14 August 2020 at 20:22:16 GMT+03:00, Matthew Benstead wrote:

Hi,

We are building a new storage system, and after geo-replication has been
running for a few hours the server runs out of memory and oom-killer
starts killing bricks. It runs fine without geo-replication on, and the
server has 64GB of RAM. I have stopped geo-replication for now.

Any ideas what to tune?

[root@storage01 ~]# gluster --version | head -1
glusterfs 7.7

[root@storage01 ~]# cat /etc/centos-release; uname -r
CentOS Linux release 7.8.2003 (Core)
3.10.0-1127.10.1.el7.x86_64

[root@storage01 ~]# df -h /storage2/
Filesystem    Size  Used Avail Use% Mounted on
10.0.231.91:/storage  328T  228T  100T  70% /storage2

[root@storage01 ~]# cat /proc/meminfo  | grep MemTotal
MemTotal:   65412064 kB

[root@storage01 ~]# free -g
   total    used    free  shared buff/cache
available
Mem: 62  18   0   0 43  43
Swap: 3   0   3


[root@storage01 ~]# gluster volume info

Volume Name: storage
Type: Distributed-Replicate
Volume ID: cf94a8f2-324b-40b3-bf72-c3766100ea99
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: 10.0.231.91:/data/storage_a/storage
Brick2: 10.0.231.92:/data/storage_b/storage
Brick3: 10.0.231.93:/data/storage_c/storage (arbiter)
Brick4: 10.0.231.92:/data/storage_a/storage
Brick5: 10.0.231.93:/data/storage_b/storage
Brick6: 10.0.231.91:/data/storage_c/storage (arbiter)
Brick7: 10.0.231.93:/data/storage_a/storage
Brick8: 10.0.231.91:/data/storage_b/storage
Brick9: 10.0.231.92:/data/storage_c/storage (arbiter)
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
network.ping-timeout: 10
features.inode-quota: on
features.quota: on
nfs.disable: on
features.quota-deem-statfs: on
storage.fips-mode-rchecksum: on
performance.readdir-ahead: on
performance.parallel-readdir: on
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
performance.cache-size: 256MB

You can see the memory spike and reduce as bricks are killed - this
happened twice in the graph below:



You can see two brick processes are down:

[root@storage01 ~]# gluster volume status
Status of volume: storage
Gluster process TCP Port  RDMA Port  Online
Pid
--
Brick 10.0.231.91:/data/storage_a/storage   N/A   N/AN
N/A
Brick 10.0.231.92:/data/storage_b/storage   49152 0  Y
1627
Brick 10.0.231.93:/data/storage_c/storage   49152 0  Y
259966
Brick 10.0.231.92:/data/storage_a/storage   49153 0  Y
1642
Brick 10.0.231.93:/data/storage_b/storage   49153 0  Y
259975
Brick 10.0.231.91:/data/storage_c/storage   49153 0  Y
20656
Brick 10.0.231.93:/data/storage_a/storage   49154 0  Y
259983
Brick 10.0.231.91:/data/storage_b/storage   N/A   N/AN
N/A
Brick 10.0.231.92:/data/storage_c/storage   49154 0  Y
1655
Self-heal Daemon on localhost   N/A   N/AY
20690
Quota Daemon on localhost   N/A   N/AY
172136
Self-heal Daemon on 10.0.231.93 N/A   N/AY
260010
Quota Daemon on 10.0.231.93 N/A   N/AY
128115
Self-heal Daemon on 10.0.231.92 N/A   N/AY
1702
Quota Daemon on 10.0.231.92 N/A   N/AY
128564

Task Status of Volume storage
--
There are no active volume tasks

Logs:

[2020-08-13 20:58:22.186540] I [MSGID: 106143]
[glusterd-pmap.c:389:pmap_registry_remove] 0-pmap: removing brick
(null) on port 49154
[2020-08-13 20:58:22.196110] I [MSGID: 106005]
[glusterd-handler.c:5960:__glusterd_brick_rpc_notify] 0-management:
Brick

Re: [Gluster-users] Geo-replication causes OOM

2020-08-15 Thread Strahil Nikolov
Hey Matthew,

Can you check with valgrind the memory leak ?

It will be something like:
Find the geo rep process via ps and note  all parameters it was started with .
Next stop geo rep.

Then start it with valgrind :
valgrind --log-file="filename"  --tool=memcheck --leak-check=full   

It might help narrowing the problem.
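
To make that concrete, a hypothetical invocation could look like the following
(the gsyncd.py path and all arguments after it are placeholders and must be
taken verbatim from the ps output on the affected node):

gluster volume geo-replication storage <slavehost>::<slavevol> stop
valgrind --log-file=/tmp/gsyncd-valgrind.log --tool=memcheck --leak-check=full \
    /usr/bin/python2 /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py <arguments noted from ps>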

Best Regards,
Strahil Nikolov

On 14 August 2020 at 20:22:16 GMT+03:00, Matthew Benstead wrote:
>Hi,
>
>We are building a new storage system, and after geo-replication has
>been 
>running for a few hours the server runs out of memory and oom-killer 
>starts killing bricks. It runs fine without geo-replication on, and the
>
>server has 64GB of RAM. I have stopped geo-replication for now.
>
>Any ideas what to tune?
>
>[root@storage01 ~]# gluster --version | head -1
>glusterfs 7.7
>
>[root@storage01 ~]# cat /etc/centos-release; uname -r
>CentOS Linux release 7.8.2003 (Core)
>3.10.0-1127.10.1.el7.x86_64
>
>[root@storage01 ~]# df -h /storage2/
>Filesystem    Size  Used Avail Use% Mounted on
>10.0.231.91:/storage  328T  228T  100T  70% /storage2
>
>[root@storage01 ~]# cat /proc/meminfo  | grep MemTotal
>MemTotal:   65412064 kB
>
>[root@storage01 ~]# free -g
>   total    used    free  shared buff/cache   
>available
>Mem: 62  18   0   0 43  43
>Swap: 3   0   3
>
>
>[root@storage01 ~]# gluster volume info
>
>Volume Name: storage
>Type: Distributed-Replicate
>Volume ID: cf94a8f2-324b-40b3-bf72-c3766100ea99
>Status: Started
>Snapshot Count: 0
>Number of Bricks: 3 x (2 + 1) = 9
>Transport-type: tcp
>Bricks:
>Brick1: 10.0.231.91:/data/storage_a/storage
>Brick2: 10.0.231.92:/data/storage_b/storage
>Brick3: 10.0.231.93:/data/storage_c/storage (arbiter)
>Brick4: 10.0.231.92:/data/storage_a/storage
>Brick5: 10.0.231.93:/data/storage_b/storage
>Brick6: 10.0.231.91:/data/storage_c/storage (arbiter)
>Brick7: 10.0.231.93:/data/storage_a/storage
>Brick8: 10.0.231.91:/data/storage_b/storage
>Brick9: 10.0.231.92:/data/storage_c/storage (arbiter)
>Options Reconfigured:
>changelog.changelog: on
>geo-replication.ignore-pid-check: on
>geo-replication.indexing: on
>network.ping-timeout: 10
>features.inode-quota: on
>features.quota: on
>nfs.disable: on
>features.quota-deem-statfs: on
>storage.fips-mode-rchecksum: on
>performance.readdir-ahead: on
>performance.parallel-readdir: on
>cluster.lookup-optimize: on
>client.event-threads: 4
>server.event-threads: 4
>performance.cache-size: 256MB
>
>You can see the memory spike and reduce as bricks are killed - this 
>happened twice in the graph below:
>
>
>
>You can see two brick processes are down:
>
>[root@storage01 ~]# gluster volume status
>Status of volume: storage
>Gluster process TCP Port  RDMA Port  Online
> Pid
>--
>Brick 10.0.231.91:/data/storage_a/storage   N/A   N/AN 
> N/A
>Brick 10.0.231.92:/data/storage_b/storage   49152 0  Y 
> 1627
>Brick 10.0.231.93:/data/storage_c/storage   49152 0  Y 
> 259966
>Brick 10.0.231.92:/data/storage_a/storage   49153 0  Y 
> 1642
>Brick 10.0.231.93:/data/storage_b/storage   49153 0  Y 
> 259975
>Brick 10.0.231.91:/data/storage_c/storage   49153 0  Y 
> 20656
>Brick 10.0.231.93:/data/storage_a/storage   49154 0  Y 
> 259983
>Brick 10.0.231.91:/data/storage_b/storage   N/A   N/AN 
> N/A
>Brick 10.0.231.92:/data/storage_c/storage   49154 0  Y 
> 1655
>Self-heal Daemon on localhost   N/A   N/AY 
> 20690
>Quota Daemon on localhost   N/A   N/AY 
> 172136
>Self-heal Daemon on 10.0.231.93 N/A   N/AY 
> 260010
>Quota Daemon on 10.0.231.93 N/A   N/AY 
> 128115
>Self-heal Daemon on 10.0.231.92 N/A   N/AY 
> 1702
>Quota Daemon on 10.0.231.92 N/A   N/AY 
> 128564
>
>Task Status of Volume storage
>--
>There are no active volume tasks
>
>Logs:
>
>[2020-08-13 20:58:22.186540] I [MSGID: 106143]
>[glusterd-pmap.c:389:pmap_registry_remove] 0-pmap: removing brick
>(null) on port 49154
>[2020-08-13 20:58:22.196110] I [MSGID: 106005]
>[glusterd-handler.c:5960:__glusterd_brick_rpc_notify] 0-management:
>Brick 10.0.231.91:/data/storage_b/storage has disconnected from
>glusterd.
>[2020-08-13 20:58:22.196752] I [MSGID: 106143]
>[glusterd-pmap.c:389:pmap_registry_remove] 0-pmap: removing brick
>/data/storage_b/storage on port 49154
>
>[2020-08-13 21:05:23.418966] I [MSGID: 106143]
>[glusterd-pmap.c:389:pmap_registry_remove] 0-pmap: removing brick
>(null) on port 49152
>[2020-08-13 21:05:23.420881] I [MSGID: 106005]
>[glusterd-handler.c:5960:__glusterd

Re: [Gluster-users] Geo-replication completely broken

2020-07-03 Thread Strahil Nikolov
Hi Felix,

It seems I missed your reply with the change log that Shwetha requested.

Best Regards,
Strahil Nikolov

On 3 July 2020 at 11:16:30 GMT+03:00, "Felix Kölzow" wrote:
>Dear Users,
>the geo-replication is still broken. This is not really a comfortable
>situation.
>Has any user had the same experience and is able to share a
>possible workaround?
>We are currently running Gluster v6.0.
>Regards,
>
>Felix
>
>
>On 25/06/2020 10:04, Shwetha Acharya wrote:
>> Hi Rob and Felix,
>>
>> Please share the *-changes.log files and brick logs, which will help
>> in analysis of the issue.
>>
>> Regards,
>> Shwetha
>>
>> On Thu, Jun 25, 2020 at 1:26 PM Felix Kölzow wrote:
>>
>> Hey Rob,
>>
>>
>> same issue for our third volume. Have a look at the logs just
>from
>> right now (below).
>>
>> Question: You removed the htime files and the old changelogs.
>Just
>> rm the files or is there something to pay more attention
>>
>> before removing the changelog files and the htime file.
>>
>> Regards,
>>
>> Felix
>>
>> [2020-06-25 07:51:53.795430] I [resource(worker
>> /gluster/vg00/dispersed_fuse1024/brick):1435:connect_remote] SSH:
>> SSH connection between master and slave established.   
>> duration=1.2341
>> [2020-06-25 07:51:53.795639] I [resource(worker
>> /gluster/vg00/dispersed_fuse1024/brick):1105:connect] GLUSTER:
>> Mounting gluster volume locally...
>> [2020-06-25 07:51:54.520601] I [monitor(monitor):280:monitor]
>> Monitor: worker died in startup phase
>> brick=/gluster/vg01/dispersed_fuse1024/brick
>> [2020-06-25 07:51:54.535809] I
>> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus:
>Worker
>> Status Change    status=Faulty
>> [2020-06-25 07:51:54.882143] I [resource(worker
>> /gluster/vg00/dispersed_fuse1024/brick):1128:connect] GLUSTER:
>> Mounted gluster volume    duration=1.0864
>> [2020-06-25 07:51:54.882388] I [subcmds(worker
>> /gluster/vg00/dispersed_fuse1024/brick):84:subcmd_worker] :
>> Worker spawn successful. Acknowledging back to monitor
>> [2020-06-25 07:51:56.911412] E [repce(agent
>> /gluster/vg00/dispersed_fuse1024/brick):121:worker] : call
>> failed:
>> Traceback (most recent call last):
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line
>> 117, in worker
>>     res = getattr(self.obj, rmeth)(*in_data[2:])
>>   File
>> "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py",
>line
>> 40, in register
>>     return Changes.cl_register(cl_brick, cl_dir, cl_log,
>cl_level,
>> retries)
>>   File
>> "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
>line
>> 46, in cl_register
>>     cls.raise_changelog_err()
>>   File
>> "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
>line
>> 30, in raise_changelog_err
>>     raise ChangelogException(errn, os.strerror(errn))
>> ChangelogException: [Errno 2] No such file or directory
>> [2020-06-25 07:51:56.912056] E [repce(worker
>> /gluster/vg00/dispersed_fuse1024/brick):213:__call__]
>RepceClient:
>> call failed call=75086:140098349655872:1593071514.91
>> method=register    error=ChangelogException
>> [2020-06-25 07:51:56.912396] E [resource(worker
>> /gluster/vg00/dispersed_fuse1024/brick):1286:service_loop]
>> GLUSTER: Changelog register failed    error=[Errno 2] No such
>file
>> or directory
>> [2020-06-25 07:51:56.928031] I [repce(agent
>> /gluster/vg00/dispersed_fuse1024/brick):96:service_loop]
>> RepceServer: terminating on reaching EOF.
>> [2020-06-25 07:51:57.886126] I [monitor(monitor):280:monitor]
>> Monitor: worker died in startup phase
>> brick=/gluster/vg00/dispersed_fuse1024/brick
>> [2020-06-25 07:51:57.895920] I
>> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus:
>Worker
>> Status Change    status=Faulty
>> [2020-06-25 07:51:58.607405] I [gsyncdstatus(worker
>> /gluster/vg00/dispersed_fuse1024/brick):287:set_passive]
>> GeorepStatus: Worker Status Change    status=Passive
>> [2020-06-25 07:51:58.607768] I [gsyncdstatus(worker
>> /gluster/vg01/dispersed_fuse1024/brick):287:set_passive]
>> GeorepStatus: Worker Status Change    status=Passive
>> [2020-06-25 07:51:58.608004] I [gsyncdstatus(worker
>> /gluster/vg00/dispersed_fuse1024/brick):281:set_active]
>> GeorepStatus: Worker Status Change    status=Active
>>
>>
>> On 25/06/2020 09:15, rob.quaglio...@rabobank.com
>>  wrote:
>>>
>>> Hi All,
>>>
>>> We’ve got two six node RHEL 7.8 clusters and geo-replication
>>> would appear to be completely broken between them. I’ve deleted
>>> the session, removed & recreated pem files, old changlogs/htime
>>> (after removing relevant options from volume) and completely set
>>> up geo-rep from scratch

Re: [Gluster-users] Geo-replication completely broken

2020-07-03 Thread Felix Kölzow

Dear Users,
the geo-replication is still broken. This is not really a comfortable
situation.
Has any user had the same experience and is able to share a
possible workaround?
We are currently running Gluster v6.0.
Regards,

Felix


On 25/06/2020 10:04, Shwetha Acharya wrote:

Hi Rob and Felix,

Please share the *-changes.log files and brick logs, which will help
in analysis of the issue.

Regards,
Shwetha

On Thu, Jun 25, 2020 at 1:26 PM Felix Kölzow <felix.koel...@gmx.de> wrote:

Hey Rob,


same issue for our third volume. Have a look at the logs just from
right now (below).

Question: You removed the htime files and the old changelogs. Just
rm the files or is there something to pay more attention

before removing the changelog files and the htime file.

Regards,

Felix

[2020-06-25 07:51:53.795430] I [resource(worker
/gluster/vg00/dispersed_fuse1024/brick):1435:connect_remote] SSH:
SSH connection between master and slave established.   
duration=1.2341
[2020-06-25 07:51:53.795639] I [resource(worker
/gluster/vg00/dispersed_fuse1024/brick):1105:connect] GLUSTER:
Mounting gluster volume locally...
[2020-06-25 07:51:54.520601] I [monitor(monitor):280:monitor]
Monitor: worker died in startup phase
brick=/gluster/vg01/dispersed_fuse1024/brick
[2020-06-25 07:51:54.535809] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
Status Change    status=Faulty
[2020-06-25 07:51:54.882143] I [resource(worker
/gluster/vg00/dispersed_fuse1024/brick):1128:connect] GLUSTER:
Mounted gluster volume    duration=1.0864
[2020-06-25 07:51:54.882388] I [subcmds(worker
/gluster/vg00/dispersed_fuse1024/brick):84:subcmd_worker] :
Worker spawn successful. Acknowledging back to monitor
[2020-06-25 07:51:56.911412] E [repce(agent
/gluster/vg00/dispersed_fuse1024/brick):121:worker] : call
failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line
117, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File
"/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line
40, in register
    return Changes.cl_register(cl_brick, cl_dir, cl_log, cl_level,
retries)
  File
"/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line
46, in cl_register
    cls.raise_changelog_err()
  File
"/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line
30, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory
[2020-06-25 07:51:56.912056] E [repce(worker
/gluster/vg00/dispersed_fuse1024/brick):213:__call__] RepceClient:
call failed call=75086:140098349655872:1593071514.91
method=register    error=ChangelogException
[2020-06-25 07:51:56.912396] E [resource(worker
/gluster/vg00/dispersed_fuse1024/brick):1286:service_loop]
GLUSTER: Changelog register failed    error=[Errno 2] No such file
or directory
[2020-06-25 07:51:56.928031] I [repce(agent
/gluster/vg00/dispersed_fuse1024/brick):96:service_loop]
RepceServer: terminating on reaching EOF.
[2020-06-25 07:51:57.886126] I [monitor(monitor):280:monitor]
Monitor: worker died in startup phase
brick=/gluster/vg00/dispersed_fuse1024/brick
[2020-06-25 07:51:57.895920] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
Status Change    status=Faulty
[2020-06-25 07:51:58.607405] I [gsyncdstatus(worker
/gluster/vg00/dispersed_fuse1024/brick):287:set_passive]
GeorepStatus: Worker Status Change    status=Passive
[2020-06-25 07:51:58.607768] I [gsyncdstatus(worker
/gluster/vg01/dispersed_fuse1024/brick):287:set_passive]
GeorepStatus: Worker Status Change    status=Passive
[2020-06-25 07:51:58.608004] I [gsyncdstatus(worker
/gluster/vg00/dispersed_fuse1024/brick):281:set_active]
GeorepStatus: Worker Status Change    status=Active


On 25/06/2020 09:15, rob.quaglio...@rabobank.com
 wrote:


Hi All,

We’ve got two six node RHEL 7.8 clusters and geo-replication
would appear to be completely broken between them. I’ve deleted
the session, removed & recreated pem files, old changlogs/htime
(after removing relevant options from volume) and completely set
up geo-rep from scratch, but the new session comes up as
Initializing, then goes faulty, and starts looping. Volume (on
both sides) is a 4 x 2 disperse, running Gluster v6 (RH latest). 
Gsyncd reports:

[2020-06-25 07:07:14.701423] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus:
Worker Status Change status=Initializing...

[2020-06-25 07:07:14.701744] I [monitor(monitor):159:monitor]
Monitor: starting gsyncd worker   brick=/rhgs/brick20/brick
slave_node=bxts470194.eu.rab

Re: [Gluster-users] Geo-replication completely broken

2020-06-25 Thread Shwetha Acharya
Hi Rob and Felix,

Please share the *-changes.log files and brick logs, which will help in
analysis of the issue.

Regards,
Shwetha

On Thu, Jun 25, 2020 at 1:26 PM Felix Kölzow  wrote:

> Hey Rob,
>
>
> same issue for our third volume. Have a look at the logs just from right
> now (below).
>
> Question: You removed the htime files and the old changelogs. Just rm the
> files or is there something to pay more attention
>
> before removing the changelog files and the htime file.
>
> Regards,
>
> Felix
>
> [2020-06-25 07:51:53.795430] I [resource(worker
> /gluster/vg00/dispersed_fuse1024/brick):1435:connect_remote] SSH: SSH
> connection between master and slave established.duration=1.2341
> [2020-06-25 07:51:53.795639] I [resource(worker
> /gluster/vg00/dispersed_fuse1024/brick):1105:connect] GLUSTER: Mounting
> gluster volume locally...
> [2020-06-25 07:51:54.520601] I [monitor(monitor):280:monitor] Monitor:
> worker died in startup phasebrick=/gluster/vg01/dispersed_fuse1024/brick
> [2020-06-25 07:51:54.535809] I
> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
> Changestatus=Faulty
> [2020-06-25 07:51:54.882143] I [resource(worker
> /gluster/vg00/dispersed_fuse1024/brick):1128:connect] GLUSTER: Mounted
> gluster volumeduration=1.0864
> [2020-06-25 07:51:54.882388] I [subcmds(worker
> /gluster/vg00/dispersed_fuse1024/brick):84:subcmd_worker] : Worker
> spawn successful. Acknowledging back to monitor
> [2020-06-25 07:51:56.911412] E [repce(agent
> /gluster/vg00/dispersed_fuse1024/brick):121:worker] : call failed:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 117, in
> worker
> res = getattr(self.obj, rmeth)(*in_data[2:])
>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line
> 40, in register
> return Changes.cl_register(cl_brick, cl_dir, cl_log, cl_level, retries)
>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line
> 46, in cl_register
> cls.raise_changelog_err()
>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line
> 30, in raise_changelog_err
> raise ChangelogException(errn, os.strerror(errn))
> ChangelogException: [Errno 2] No such file or directory
> [2020-06-25 07:51:56.912056] E [repce(worker
> /gluster/vg00/dispersed_fuse1024/brick):213:__call__] RepceClient: call
> failedcall=75086:140098349655872:1593071514.91method=register
> error=ChangelogException
> [2020-06-25 07:51:56.912396] E [resource(worker
> /gluster/vg00/dispersed_fuse1024/brick):1286:service_loop] GLUSTER:
> Changelog register failederror=[Errno 2] No such file or directory
> [2020-06-25 07:51:56.928031] I [repce(agent
> /gluster/vg00/dispersed_fuse1024/brick):96:service_loop] RepceServer:
> terminating on reaching EOF.
> [2020-06-25 07:51:57.886126] I [monitor(monitor):280:monitor] Monitor:
> worker died in startup phasebrick=/gluster/vg00/dispersed_fuse1024/brick
> [2020-06-25 07:51:57.895920] I
> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
> Changestatus=Faulty
> [2020-06-25 07:51:58.607405] I [gsyncdstatus(worker
> /gluster/vg00/dispersed_fuse1024/brick):287:set_passive] GeorepStatus:
> Worker Status Changestatus=Passive
> [2020-06-25 07:51:58.607768] I [gsyncdstatus(worker
> /gluster/vg01/dispersed_fuse1024/brick):287:set_passive] GeorepStatus:
> Worker Status Changestatus=Passive
> [2020-06-25 07:51:58.608004] I [gsyncdstatus(worker
> /gluster/vg00/dispersed_fuse1024/brick):281:set_active] GeorepStatus:
> Worker Status Changestatus=Active
>
>
> On 25/06/2020 09:15, rob.quaglio...@rabobank.com wrote:
>
> Hi All,
>
>
>
> We’ve got two six node RHEL 7.8 clusters and geo-replication would appear
> to be completely broken between them. I’ve deleted the session, removed &
> recreated pem files, old changlogs/htime (after removing relevant options
> from volume) and completely set up geo-rep from scratch, but the new
> session comes up as Initializing, then goes faulty, and starts looping.
> Volume (on both sides) is a 4 x 2 disperse, running Gluster v6 (RH
> latest).  Gsyncd reports:
>
>
>
> [2020-06-25 07:07:14.701423] I
> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
> Change status=Initializing...
>
> [2020-06-25 07:07:14.701744] I [monitor(monitor):159:monitor] Monitor:
> starting gsyncd worker   brick=/rhgs/brick20/brick   slave_node=
> bxts470194.eu.rabonet.com
>
> [2020-06-25 07:07:14.707997] D [monitor(monitor):230:monitor] Monitor:
> Worker would mount volume privately
>
> [2020-06-25 07:07:14.757181] I [gsyncd(agent
> /rhgs/brick20/brick):318:main] : Using session config file
> path=/var/lib/glusterd/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/gsyncd.conf
>
> [2020-06-25 07:07:14.758126] D [subcmds(agent
> /rhgs/brick20/brick):107:subcmd_agent] : RPC FD
> rpc_fd='5,12,11,10'
>
> [2020-06-25 07:07:14.758627] I [changelogagent(agent
> /rhgs/bri

Re: [Gluster-users] Geo-replication completely broken

2020-06-25 Thread Felix Kölzow

Hey Rob,


same issue for our third volume. Have a look at the logs just from right
now (below).

Question: You removed the htime files and the old changelogs. Did you just rm
the files, or is there something to pay more attention to before removing the
changelog files and the htime file? (The default locations are sketched below.)
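
Assuming the default changelog location, the files in question sit under the
brick's .glusterfs directory, e.g. (brick path taken from the logs above):

ls /gluster/vg00/dispersed_fuse1024/brick/.glusterfs/changelogs/        # CHANGELOG.<timestamp> files
ls /gluster/vg00/dispersed_fuse1024/brick/.glusterfs/changelogs/htime/  # HTIME.<timestamp> index files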

Regards,

Felix

[2020-06-25 07:51:53.795430] I [resource(worker
/gluster/vg00/dispersed_fuse1024/brick):1435:connect_remote] SSH: SSH
connection between master and slave established. duration=1.2341
[2020-06-25 07:51:53.795639] I [resource(worker
/gluster/vg00/dispersed_fuse1024/brick):1105:connect] GLUSTER: Mounting
gluster volume locally...
[2020-06-25 07:51:54.520601] I [monitor(monitor):280:monitor] Monitor:
worker died in startup phase brick=/gluster/vg01/dispersed_fuse1024/brick
[2020-06-25 07:51:54.535809] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
Status Change    status=Faulty
[2020-06-25 07:51:54.882143] I [resource(worker
/gluster/vg00/dispersed_fuse1024/brick):1128:connect] GLUSTER: Mounted
gluster volume    duration=1.0864
[2020-06-25 07:51:54.882388] I [subcmds(worker
/gluster/vg00/dispersed_fuse1024/brick):84:subcmd_worker] : Worker
spawn successful. Acknowledging back to monitor
[2020-06-25 07:51:56.911412] E [repce(agent
/gluster/vg00/dispersed_fuse1024/brick):121:worker] : call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 117,
in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py",
line 40, in register
    return Changes.cl_register(cl_brick, cl_dir, cl_log, cl_level, retries)
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
line 46, in cl_register
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
line 30, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory
[2020-06-25 07:51:56.912056] E [repce(worker
/gluster/vg00/dispersed_fuse1024/brick):213:__call__] RepceClient: call
failed    call=75086:140098349655872:1593071514.91 method=register   
error=ChangelogException
[2020-06-25 07:51:56.912396] E [resource(worker
/gluster/vg00/dispersed_fuse1024/brick):1286:service_loop] GLUSTER:
Changelog register failed    error=[Errno 2] No such file or directory
[2020-06-25 07:51:56.928031] I [repce(agent
/gluster/vg00/dispersed_fuse1024/brick):96:service_loop] RepceServer:
terminating on reaching EOF.
[2020-06-25 07:51:57.886126] I [monitor(monitor):280:monitor] Monitor:
worker died in startup phase brick=/gluster/vg00/dispersed_fuse1024/brick
[2020-06-25 07:51:57.895920] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
Status Change    status=Faulty
[2020-06-25 07:51:58.607405] I [gsyncdstatus(worker
/gluster/vg00/dispersed_fuse1024/brick):287:set_passive] GeorepStatus:
Worker Status Change    status=Passive
[2020-06-25 07:51:58.607768] I [gsyncdstatus(worker
/gluster/vg01/dispersed_fuse1024/brick):287:set_passive] GeorepStatus:
Worker Status Change    status=Passive
[2020-06-25 07:51:58.608004] I [gsyncdstatus(worker
/gluster/vg00/dispersed_fuse1024/brick):281:set_active] GeorepStatus:
Worker Status Change    status=Active


On 25/06/2020 09:15, rob.quaglio...@rabobank.com wrote:


Hi All,

We’ve got two six node RHEL 7.8 clusters and geo-replication would
appear to be completely broken between them. I’ve deleted the session,
removed & recreated pem files, old changlogs/htime (after removing
relevant options from volume) and completely set up geo-rep from
scratch, but the new session comes up as Initializing, then goes
faulty, and starts looping. Volume (on both sides) is a 4 x 2
disperse, running Gluster v6 (RH latest).  Gsyncd reports:

[2020-06-25 07:07:14.701423] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
Status Change status=Initializing...

[2020-06-25 07:07:14.701744] I [monitor(monitor):159:monitor] Monitor:
starting gsyncd worker   brick=/rhgs/brick20/brick
slave_node=bxts470194.eu.rabonet.com

[2020-06-25 07:07:14.707997] D [monitor(monitor):230:monitor] Monitor:
Worker would mount volume privately

[2020-06-25 07:07:14.757181] I [gsyncd(agent
/rhgs/brick20/brick):318:main] : Using session config file
path=/var/lib/glusterd/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/gsyncd.conf

[2020-06-25 07:07:14.758126] D [subcmds(agent
/rhgs/brick20/brick):107:subcmd_agent] : RPC FD 
rpc_fd='5,12,11,10'

[2020-06-25 07:07:14.758627] I [changelogagent(agent
/rhgs/brick20/brick):72:__init__] ChangelogAgent: Agent listining...

[2020-06-25 07:07:14.764234] I [gsyncd(worker
/rhgs/brick20/brick):318:main] : Using session config file
path=/var/lib/glusterd/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/gsyncd.conf

[2020-06-25 07:07:14.779409] I [resource(worker
/rhgs/brick20/brick):1386:connect_remote] SSH: Initializing SSH
connection between master and sl

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-18 Thread David Cunningham
Hi Strahil,

Thank you for that, and for the point that you can't 'cd' into the .gfid directory.

I think the customer is going to live with the higher CPU usage as it's
still well within acceptable limits, and other things demand our time.

Thanks again for your input!


On Fri, 12 Jun 2020 at 16:06, Strahil Nikolov  wrote:

> Hello David,
>
> The .gfid directory is there, but you cannot traverse (cd) into it - you
> need to specify the full path, just like in the example. I had some cases where
> the 'transport endpoint is not connected' error was received, but usually this
> is due to a missing gfid.
>
> About the meetings, one of the topics is discussing open bugs and issues
> reported in the mailing  list. It will be nice to join the meeting and
> discuss that in audio, as there could be other devs willing to join the
> 'fight'.
>
> @Sankarshan,
> any idea  how to enable debug on the python script ?
>
>
> Best Regards,
> Strahil Nikolov
>
>
> On 12 June 2020 at 6:49:57 GMT+03:00, David Cunningham <
> dcunning...@voisonics.com> wrote:
> >Hi Strahil,
> >
> >Is there a trick to getting the .gfid directory to appear besides
> >adding
> >"-o aux-gfid-mount" to the mount? I mounted it using "mount -t
> >glusterfs -o
> >aux-gfid-mount cafs30:/gvol0 /mnt/glusterfs" and there's no .gfid
> >directory
> >under /mnt/glusterfs.
> >
> >I haven't tried joining a gluster meeting. Are bugs/problems usually
> >discussed on such things? I usually find that people need to look into
> >things and respond in their own time so email can be better.
> >
> >Thanks for your help.
> >
> >
> >On Thu, 11 Jun 2020 at 15:16, Strahil Nikolov 
> >wrote:
> >
> >> You can try the path of a file based on gfid (method 2) via:
> >>
> >> https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/
> >>
> >> The gfids from the strace should be there, but if the file was
> >> renamed/deleted - it is normal for it to be missing.
> >>
> >> Have you joined the last gluster meeting to discuss the problem ?
> >>
> >>
> >> Best Regards,
> >> Strahil Nikolov
> >>
> >> On 11 June 2020 at 3:15:36 GMT+03:00, David Cunningham <
> >> dcunning...@voisonics.com> wrote:
> >> >Hi Strahil,
> >> >
> >> >Thanks for that. I did search for a file with the gfid in the name,
> >on
> >> >both
> >> >the master nodes and geo-replication slave, but none of them had
> >such a
> >> >file. I guess maybe by the time I looked the file had been deleted?
> >> >Either
> >> >that or something is more seriously wrong with invalid gfids.
> >> >
> >> >BTW, I used strace to try and see what gsyncd was up to when using
> >all
> >> >that
> >> >CPU. Running strace attached to gsyncd for 10 seconds gave 168,000
> >> >lines of
> >> >output, mostly like the following:
> >> >
> >> >read(6, "CHANGELOG.1585775398\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) =
> >4096
> >> >lseek(6, 1088012288, SEEK_SET)  = 1088012288
> >> >lseek(6, 1088012288, SEEK_SET)  = 1088012288
> >> >read(6, "est\nE da6ed6e8-2b49-4a56-b783-d0"..., 4096) = 4096
> >> >lseek(6, 1088016384, SEEK_SET)  = 1088016384
> >> >lseek(6, 1088016384, SEEK_SET)  = 1088016384
> >> >read(6, "lock-05380315\nE cf5fe292-2ebf-43"..., 4096) = 4096
> >> >lseek(6, 1088020480, SEEK_SET)  = 1088020480
> >> >lseek(6, 1088020480, SEEK_SET)  = 1088020480
> >> >read(6, "7 10a14313-4f92-4071-83cb-c900ef"..., 4096) = 4096
> >> >lseek(6, 1088024576, SEEK_SET)  = 1088024576
> >> >lseek(6, 1088024576, SEEK_SET)  = 1088024576
> >> >read(6, "D b70ba2e8-d954-4fb2-b17a-77c8cc"..., 4096) = 4096
> >> >lseek(6, 1088028672, SEEK_SET)  = 1088028672
> >> >lseek(6, 1088028672, SEEK_SET)  = 1088028672
> >> >read(6, "01681-e324-4f13-ab3a-0e8ae50ff95"..., 4096) = 4096
> >> >lseek(6, 1088032768, SEEK_SET)  = 1088032768
> >> >read(6, "09ef519/voicemail_1585336530_158"..., 4096) = 4096
> >> >lseek(6, 1088036864, SEEK_SET)  = 1088036864
> >> >read(6, "6-4539-8d7f-d17fb8f71d6d\nD 1236c"..., 4096) = 4096
> >> >lseek(6, 1088040960, SEEK_SET)  = 1088040960
> >> >lseek(6, 1088040960, SEEK_SET)  = 1088040960
> >> >read(6, "6-4d54-8b9b-4146930b3a2d\nD 88287"..., 4096) = 4096
> >> >
> >> >I'm guessing those reads are mostly on the files under .glusterfs? I
> >> >did
> >> >check for files matching the names above and there aren't any
> >matching
> >> >"da6ed6e8-2b49-4a56-b783-d0", "cf5fe292-2ebf-43", or
> >> >"10a14313-4f92-4071-83cb-c900ef" though.
> >> >
> >> >Any guidance would be appreciated.
> >> >
> >> >
> >> >On Wed, 10 Jun 2020 at 16:06, Strahil Nikolov
> >
> >> >wrote:
> >> >
> >> >> Hey David,
> >> >>
> >> >> Sadly I just have a feeling that on any brick there  is a gfid
> >> >mismatch,
> >> >> but I could be wrong.
> >> >>
> >> >> As you have  the gfid list, please check  on all  bricks (both
> >master
> >> >and
> >> >> slave)  that the file exists (not the one in .gluster , but the
> >real
> >> >one)
> >> >> and it has the same gfid.
> >> >>
> >> >> You can find the inode via ls and then run a find 

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-12 Thread Strahil Nikolov
Hello David,

The .gfid directory is there, but you cannot traverse (cd) into it - you need
to specify the full path, just like in the example. I had some cases where the
'transport endpoint is not connected' error was received, but usually this is
due to a missing gfid.
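
Roughly, that means something like this (a sketch only - the mount point is a
placeholder and the gfid is just one taken from David's logs):

mount -t glusterfs -o aux-gfid-mount cafs30:/gvol0 /mnt/glusterfs
ls /mnt/glusterfs/.gfid                                         # the directory is virtual, so this shows nothing useful
stat /mnt/glusterfs/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba  # resolves only if that gfid still exists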

About the meetings: one of the topics is discussing open bugs and issues
reported in the mailing list. It would be nice to join the meeting and discuss
this in audio, as there could be other devs willing to join the 'fight'.

@Sankarshan,
Any idea how to enable debug on the python script?
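
(The only knob I am aware of is the per-session log level - a sketch, exact
option spelling may differ between releases; the master/slave names are the
ones from this thread:)

gluster volume geo-replication gvol0 nvfs10::gvol0 config log-level DEBUG
# a similar option, gluster-log-level, should raise the level of the aux mount log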


Best Regards,
Strahil Nikolov


On 12 June 2020 at 6:49:57 GMT+03:00, David Cunningham wrote:
>Hi Strahil,
>
>Is there a trick to getting the .gfid directory to appear besides
>adding
>"-o aux-gfid-mount" to the mount? I mounted it using "mount -t
>glusterfs -o
>aux-gfid-mount cafs30:/gvol0 /mnt/glusterfs" and there's no .gfid
>directory
>under /mnt/glusterfs.
>
>I haven't tried joining a gluster meeting. Are bugs/problems usually
>discussed on such things? I usually find that people need to look into
>things and respond in their own time so email can be better.
>
>Thanks for your help.
>
>
>On Thu, 11 Jun 2020 at 15:16, Strahil Nikolov 
>wrote:
>
>> You can try the path of a file based on gfid (method 2) via:
>>
>> https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/
>>
>> The gfids from the strace should be there, but if the file was
>> renamed/deleted - it is normal for it to be missing.
>>
>> Have you joined the last gluster meeting to discuss the problem ?
>>
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On 11 June 2020 at 3:15:36 GMT+03:00, David Cunningham <
>> dcunning...@voisonics.com> wrote:
>> >Hi Strahil,
>> >
>> >Thanks for that. I did search for a file with the gfid in the name,
>on
>> >both
>> >the master nodes and geo-replication slave, but none of them had
>such a
>> >file. I guess maybe by the time I looked the file had been deleted?
>> >Either
>> >that or something is more seriously wrong with invalid gfids.
>> >
>> >BTW, I used strace to try and see what gsyncd was up to when using
>all
>> >that
>> >CPU. Running strace attached to gsyncd for 10 seconds gave 168,000
>> >lines of
>> >output, mostly like the following:
>> >
>> >read(6, "CHANGELOG.1585775398\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) =
>4096
>> >lseek(6, 1088012288, SEEK_SET)  = 1088012288
>> >lseek(6, 1088012288, SEEK_SET)  = 1088012288
>> >read(6, "est\nE da6ed6e8-2b49-4a56-b783-d0"..., 4096) = 4096
>> >lseek(6, 1088016384, SEEK_SET)  = 1088016384
>> >lseek(6, 1088016384, SEEK_SET)  = 1088016384
>> >read(6, "lock-05380315\nE cf5fe292-2ebf-43"..., 4096) = 4096
>> >lseek(6, 1088020480, SEEK_SET)  = 1088020480
>> >lseek(6, 1088020480, SEEK_SET)  = 1088020480
>> >read(6, "7 10a14313-4f92-4071-83cb-c900ef"..., 4096) = 4096
>> >lseek(6, 1088024576, SEEK_SET)  = 1088024576
>> >lseek(6, 1088024576, SEEK_SET)  = 1088024576
>> >read(6, "D b70ba2e8-d954-4fb2-b17a-77c8cc"..., 4096) = 4096
>> >lseek(6, 1088028672, SEEK_SET)  = 1088028672
>> >lseek(6, 1088028672, SEEK_SET)  = 1088028672
>> >read(6, "01681-e324-4f13-ab3a-0e8ae50ff95"..., 4096) = 4096
>> >lseek(6, 1088032768, SEEK_SET)  = 1088032768
>> >read(6, "09ef519/voicemail_1585336530_158"..., 4096) = 4096
>> >lseek(6, 1088036864, SEEK_SET)  = 1088036864
>> >read(6, "6-4539-8d7f-d17fb8f71d6d\nD 1236c"..., 4096) = 4096
>> >lseek(6, 1088040960, SEEK_SET)  = 1088040960
>> >lseek(6, 1088040960, SEEK_SET)  = 1088040960
>> >read(6, "6-4d54-8b9b-4146930b3a2d\nD 88287"..., 4096) = 4096
>> >
>> >I'm guessing those reads are mostly on the files under .glusterfs? I
>> >did
>> >check for files matching the names above and there aren't any
>matching
>> >"da6ed6e8-2b49-4a56-b783-d0", "cf5fe292-2ebf-43", or
>> >"10a14313-4f92-4071-83cb-c900ef" though.
>> >
>> >Any guidance would be appreciated.
>> >
>> >
>> >On Wed, 10 Jun 2020 at 16:06, Strahil Nikolov
>
>> >wrote:
>> >
>> >> Hey David,
>> >>
>> >> Sadly I just have a feeling that on any brick there  is a gfid
>> >mismatch,
>> >> but I could be wrong.
>> >>
>> >> As you have  the gfid list, please check  on all  bricks (both
>master
>> >and
>> >> slave)  that the file exists (not the one in .gluster , but the
>real
>> >one)
>> >> and it has the same gfid.
>> >>
>> >> You can find the inode via ls and then run a find (don't forget
>the
>> >> ionice) against the brick and that inode number.
>> >>
>> >> Once you have the full path to the file , test:
>> >> - Mount with FUSE
>> >> - Check file exists ( no '??' for permissions, size, etc) and
>can
>> >be
>> >> manipulated (maybe 'touch' can be used ?)
>> >> - Find (on all replica  sets ) the file and check the gfid
>> >> - Check for heals pending for that gfid
>> >>
>> >>
>> >> Best  Regards,
>> >> Strahil Nikolov
>> >>
>> >> On 10 June 2020 at 6:37:35 GMT+03:00, David Cunningham <
>> >> dcunning...@voisonics.com> wrote:
>> >> >Hi Strahil,
>> >>

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-11 Thread David Cunningham
Hi Strahil,

Is there a trick to getting the .gfid directory to appear besides adding
"-o aux-gfid-mount" to the mount? I mounted it using "mount -t glusterfs -o
aux-gfid-mount cafs30:/gvol0 /mnt/glusterfs" and there's no .gfid directory
under /mnt/glusterfs.

I haven't tried joining a gluster meeting. Are bugs/problems usually
discussed on such things? I usually find that people need to look into
things and respond in their own time so email can be better.

Thanks for your help.


On Thu, 11 Jun 2020 at 15:16, Strahil Nikolov  wrote:

> You can try the path of a file based on gfid (method 2) via:
>
> https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/
>
> The gfids from the strace should be there, but if the file was
> renamed/deleted - it is normal for it to be missing.
>
> Have you joined the last gluster meeting to discuss the problem ?
>
>
> Best Regards,
> Strahil Nikolov
>
> On 11 June 2020 at 3:15:36 GMT+03:00, David Cunningham <
> dcunning...@voisonics.com> wrote:
> >Hi Strahil,
> >
> >Thanks for that. I did search for a file with the gfid in the name, on
> >both
> >the master nodes and geo-replication slave, but none of them had such a
> >file. I guess maybe by the time I looked the file had been deleted?
> >Either
> >that or something is more seriously wrong with invalid gfids.
> >
> >BTW, I used strace to try and see what gsyncd was up to when using all
> >that
> >CPU. Running strace attached to gsyncd for 10 seconds gave 168,000
> >lines of
> >output, mostly like the following:
> >
> >read(6, "CHANGELOG.1585775398\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
> >lseek(6, 1088012288, SEEK_SET)  = 1088012288
> >lseek(6, 1088012288, SEEK_SET)  = 1088012288
> >read(6, "est\nE da6ed6e8-2b49-4a56-b783-d0"..., 4096) = 4096
> >lseek(6, 1088016384, SEEK_SET)  = 1088016384
> >lseek(6, 1088016384, SEEK_SET)  = 1088016384
> >read(6, "lock-05380315\nE cf5fe292-2ebf-43"..., 4096) = 4096
> >lseek(6, 1088020480, SEEK_SET)  = 1088020480
> >lseek(6, 1088020480, SEEK_SET)  = 1088020480
> >read(6, "7 10a14313-4f92-4071-83cb-c900ef"..., 4096) = 4096
> >lseek(6, 1088024576, SEEK_SET)  = 1088024576
> >lseek(6, 1088024576, SEEK_SET)  = 1088024576
> >read(6, "D b70ba2e8-d954-4fb2-b17a-77c8cc"..., 4096) = 4096
> >lseek(6, 1088028672, SEEK_SET)  = 1088028672
> >lseek(6, 1088028672, SEEK_SET)  = 1088028672
> >read(6, "01681-e324-4f13-ab3a-0e8ae50ff95"..., 4096) = 4096
> >lseek(6, 1088032768, SEEK_SET)  = 1088032768
> >read(6, "09ef519/voicemail_1585336530_158"..., 4096) = 4096
> >lseek(6, 1088036864, SEEK_SET)  = 1088036864
> >read(6, "6-4539-8d7f-d17fb8f71d6d\nD 1236c"..., 4096) = 4096
> >lseek(6, 1088040960, SEEK_SET)  = 1088040960
> >lseek(6, 1088040960, SEEK_SET)  = 1088040960
> >read(6, "6-4d54-8b9b-4146930b3a2d\nD 88287"..., 4096) = 4096
> >
> >I'm guessing those reads are mostly on the files under .glusterfs? I
> >did
> >check for files matching the names above and there aren't any matching
> >"da6ed6e8-2b49-4a56-b783-d0", "cf5fe292-2ebf-43", or
> >"10a14313-4f92-4071-83cb-c900ef" though.
> >
> >Any guidance would be appreciated.
> >
> >
> >On Wed, 10 Jun 2020 at 16:06, Strahil Nikolov 
> >wrote:
> >
> >> Hey David,
> >>
> >> Sadly I just have a feeling that on any brick there  is a gfid
> >mismatch,
> >> but I could be wrong.
> >>
> >> As you have  the gfid list, please check  on all  bricks (both master
> >and
> >> slave)  that the file exists (not the one in .gluster , but the real
> >one)
> >> and it has the same gfid.
> >>
> >> You can find the inode via ls and then run a find (don't forget the
> >> ionice) against the brick and that inode number.
> >>
> >> Once you have the full path to the file , test:
> >> - Mount with FUSE
> >> - Check file exists ( no '??' for permissions, size, etc) and can
> >be
> >> manipulated (maybe 'touch' can be used ?)
> >> - Find (on all replica  sets ) the file and check the gfid
> >> - Check for heals pending for that gfid
> >>
> >>
> >> Best  Regards,
> >> Strahil Nikolov
> >>
> >> On 10 June 2020 at 6:37:35 GMT+03:00, David Cunningham <
> >> dcunning...@voisonics.com> wrote:
> >> >Hi Strahil,
> >> >
> >> >Thank you for that. Do you know if these "Stale file handle" errors
> >on
> >> >the
> >> >geo-replication slave could be related?
> >> >
> >> >[2020-06-10 01:02:32.268989] E [MSGID: 109040]
> >> >[dht-helper.c:1332:dht_migration_complete_check_task] 0-gvol0-dht:
> >> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba: failed to lookup the
> >file
> >> >on
> >> >gvol0-dht [Stale file handle]
> >> >[2020-06-10 01:02:32.269092] W [fuse-bridge.c:897:fuse_attr_cbk]
> >> >0-glusterfs-fuse: 7434237: STAT()
> >> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file
> >handle)
> >> >[2020-06-10 01:02:32.329280] W [fuse-bridge.c:897:fuse_attr_cbk]
> >> >0-glusterfs-fuse: 7434251: STAT()
> >> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file
> >hand

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-11 Thread Strahil Nikolov
You can try the path of a file based on gfid (method 2) via:

https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/
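
Condensed, method 2 from that page is roughly (a sketch - the volume, mount
point and gfid below are placeholders taken from earlier in this thread):

mount -t glusterfs -o aux-gfid-mount cafs30:/gvol0 /mnt/glusterfs
getfattr -n trusted.glusterfs.pathinfo -e text \
    /mnt/glusterfs/.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17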

The gfids from the strace should be there, but if the file was renamed/deleted,
it is normal for it to be missing.

Have you joined the last gluster meeting to discuss the problem ?


Best Regards,
Strahil Nikolov

On 11 June 2020 at 3:15:36 GMT+03:00, David Cunningham wrote:
>Hi Strahil,
>
>Thanks for that. I did search for a file with the gfid in the name, on
>both
>the master nodes and geo-replication slave, but none of them had such a
>file. I guess maybe by the time I looked the file had been deleted?
>Either
>that or something is more seriously wrong with invalid gfids.
>
>BTW, I used strace to try and see what gsyncd was up to when using all
>that
>CPU. Running strace attached to gsyncd for 10 seconds gave 168,000
>lines of
>output, mostly like the following:
>
>read(6, "CHANGELOG.1585775398\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
>lseek(6, 1088012288, SEEK_SET)  = 1088012288
>lseek(6, 1088012288, SEEK_SET)  = 1088012288
>read(6, "est\nE da6ed6e8-2b49-4a56-b783-d0"..., 4096) = 4096
>lseek(6, 1088016384, SEEK_SET)  = 1088016384
>lseek(6, 1088016384, SEEK_SET)  = 1088016384
>read(6, "lock-05380315\nE cf5fe292-2ebf-43"..., 4096) = 4096
>lseek(6, 1088020480, SEEK_SET)  = 1088020480
>lseek(6, 1088020480, SEEK_SET)  = 1088020480
>read(6, "7 10a14313-4f92-4071-83cb-c900ef"..., 4096) = 4096
>lseek(6, 1088024576, SEEK_SET)  = 1088024576
>lseek(6, 1088024576, SEEK_SET)  = 1088024576
>read(6, "D b70ba2e8-d954-4fb2-b17a-77c8cc"..., 4096) = 4096
>lseek(6, 1088028672, SEEK_SET)  = 1088028672
>lseek(6, 1088028672, SEEK_SET)  = 1088028672
>read(6, "01681-e324-4f13-ab3a-0e8ae50ff95"..., 4096) = 4096
>lseek(6, 1088032768, SEEK_SET)  = 1088032768
>read(6, "09ef519/voicemail_1585336530_158"..., 4096) = 4096
>lseek(6, 1088036864, SEEK_SET)  = 1088036864
>read(6, "6-4539-8d7f-d17fb8f71d6d\nD 1236c"..., 4096) = 4096
>lseek(6, 1088040960, SEEK_SET)  = 1088040960
>lseek(6, 1088040960, SEEK_SET)  = 1088040960
>read(6, "6-4d54-8b9b-4146930b3a2d\nD 88287"..., 4096) = 4096
>
>I'm guessing those reads are mostly on the files under .glusterfs? I
>did
>check for files matching the names above and there aren't any matching
>"da6ed6e8-2b49-4a56-b783-d0", "cf5fe292-2ebf-43", or
>"10a14313-4f92-4071-83cb-c900ef" though.
>
>Any guidance would be appreciated.
>
>
>On Wed, 10 Jun 2020 at 16:06, Strahil Nikolov 
>wrote:
>
>> Hey David,
>>
>> Sadly I just have a feeling that on any brick there  is a gfid
>mismatch,
>> but I could be wrong.
>>
>> As you have  the gfid list, please check  on all  bricks (both master
>and
>> slave)  that the file exists (not the one in .gluster , but the real
>one)
>> and it has the same gfid.
>>
>> You can find the inode via ls and then run a find (don't forget the
>> ionice) against the brick and that inode number.
>>
>> Once you have the full path to the file , test:
>> - Mount with FUSE
>> - Check file exists ( no '??' for permissions, size, etc) and can
>be
>> manipulated (maybe 'touch' can be used ?)
>> - Find (on all replica  sets ) the file and check the gfid
>> - Check for heals pending for that gfid
>>
>>
>> Best  Regards,
>> Strahil Nikolov
>>
>> On 10 June 2020 at 6:37:35 GMT+03:00, David Cunningham <
>> dcunning...@voisonics.com> wrote:
>> >Hi Strahil,
>> >
>> >Thank you for that. Do you know if these "Stale file handle" errors
>on
>> >the
>> >geo-replication slave could be related?
>> >
>> >[2020-06-10 01:02:32.268989] E [MSGID: 109040]
>> >[dht-helper.c:1332:dht_migration_complete_check_task] 0-gvol0-dht:
>> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba: failed to lookup the
>file
>> >on
>> >gvol0-dht [Stale file handle]
>> >[2020-06-10 01:02:32.269092] W [fuse-bridge.c:897:fuse_attr_cbk]
>> >0-glusterfs-fuse: 7434237: STAT()
>> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file
>handle)
>> >[2020-06-10 01:02:32.329280] W [fuse-bridge.c:897:fuse_attr_cbk]
>> >0-glusterfs-fuse: 7434251: STAT()
>> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file
>handle)
>> >[2020-06-10 01:02:32.387129] W [fuse-bridge.c:897:fuse_attr_cbk]
>> >0-glusterfs-fuse: 7434264: STAT()
>> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file
>handle)
>> >[2020-06-10 01:02:32.448838] W [fuse-bridge.c:897:fuse_attr_cbk]
>> >0-glusterfs-fuse: 7434277: STAT()
>> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file
>handle)
>> >[2020-06-10 01:02:32.507196] W [fuse-bridge.c:897:fuse_attr_cbk]
>> >0-glusterfs-fuse: 7434290: STAT()
>> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file
>handle)
>> >[2020-06-10 01:02:32.566033] W [fuse-bridge.c:897:fuse_attr_cbk]
>> >0-glusterfs-fuse: 7434303: STAT()
>> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file
>handle)
>> >[2020-06-10 01:02:32.625168] W [fuse-bridge.c:897:fuse_att

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-10 Thread David Cunningham
Hi Strahil,

Thanks for that. I did search for a file with the gfid in the name, on both
the master nodes and geo-replication slave, but none of them had such a
file. I guess maybe by the time I looked the file had been deleted? Either
that or something is more seriously wrong with invalid gfids.

BTW, I used strace to try and see what gsyncd was up to when using all that
CPU. Running strace attached to gsyncd for 10 seconds gave 168,000 lines of
output, mostly like the following:

read(6, "CHANGELOG.1585775398\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(6, 1088012288, SEEK_SET)  = 1088012288
lseek(6, 1088012288, SEEK_SET)  = 1088012288
read(6, "est\nE da6ed6e8-2b49-4a56-b783-d0"..., 4096) = 4096
lseek(6, 1088016384, SEEK_SET)  = 1088016384
lseek(6, 1088016384, SEEK_SET)  = 1088016384
read(6, "lock-05380315\nE cf5fe292-2ebf-43"..., 4096) = 4096
lseek(6, 1088020480, SEEK_SET)  = 1088020480
lseek(6, 1088020480, SEEK_SET)  = 1088020480
read(6, "7 10a14313-4f92-4071-83cb-c900ef"..., 4096) = 4096
lseek(6, 1088024576, SEEK_SET)  = 1088024576
lseek(6, 1088024576, SEEK_SET)  = 1088024576
read(6, "D b70ba2e8-d954-4fb2-b17a-77c8cc"..., 4096) = 4096
lseek(6, 1088028672, SEEK_SET)  = 1088028672
lseek(6, 1088028672, SEEK_SET)  = 1088028672
read(6, "01681-e324-4f13-ab3a-0e8ae50ff95"..., 4096) = 4096
lseek(6, 1088032768, SEEK_SET)  = 1088032768
read(6, "09ef519/voicemail_1585336530_158"..., 4096) = 4096
lseek(6, 1088036864, SEEK_SET)  = 1088036864
read(6, "6-4539-8d7f-d17fb8f71d6d\nD 1236c"..., 4096) = 4096
lseek(6, 1088040960, SEEK_SET)  = 1088040960
lseek(6, 1088040960, SEEK_SET)  = 1088040960
read(6, "6-4d54-8b9b-4146930b3a2d\nD 88287"..., 4096) = 4096

I'm guessing those reads are mostly on the files under .glusterfs? I did
check for files matching the names above and there aren't any matching
"da6ed6e8-2b49-4a56-b783-d0", "cf5fe292-2ebf-43", or
"10a14313-4f92-4071-83cb-c900ef" though.

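(For what it's worth, the names in those reads look like gfids from the
changelog, so on a brick they would appear under
.glusterfs/<first two hex chars>/<next two>/<full gfid> rather than under their
file names - a sketch, with the brick path taken from the gsyncd logs and the
gfid prefix from the strace output above:)

ls -l /nodirectwritedata/gluster/gvol0/.glusterfs/da/6e/da6ed6e8-2b49-4a56-b783-d0*
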
Any guidance would be appreciated.


On Wed, 10 Jun 2020 at 16:06, Strahil Nikolov  wrote:

> Hey David,
>
> Sadly I just have a feeling that on any brick there  is a gfid mismatch,
> but I could be wrong.
>
> As you have  the gfid list, please check  on all  bricks (both master and
> slave)  that the file exists (not the one in .gluster , but the real one)
> and it has the same gfid.
>
> You can find the inode via ls and then run a find (don't forget the
> ionice) against the brick and that inode number.
>
> Once you have the full path to the file , test:
> - Mount with FUSE
> - Check file exists ( no '??' for permissions, size, etc) and can be
> manipulated (maybe 'touch' can be used ?)
> - Find (on all replica  sets ) the file and check the gfid
> - Check for heals pending for that gfid
>
>
> Best  Regards,
> Strahil Nikolov
>
> On 10 June 2020 at 6:37:35 GMT+03:00, David Cunningham <
> dcunning...@voisonics.com> wrote:
> >Hi Strahil,
> >
> >Thank you for that. Do you know if these "Stale file handle" errors on
> >the
> >geo-replication slave could be related?
> >
> >[2020-06-10 01:02:32.268989] E [MSGID: 109040]
> >[dht-helper.c:1332:dht_migration_complete_check_task] 0-gvol0-dht:
> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba: failed to lookup the file
> >on
> >gvol0-dht [Stale file handle]
> >[2020-06-10 01:02:32.269092] W [fuse-bridge.c:897:fuse_attr_cbk]
> >0-glusterfs-fuse: 7434237: STAT()
> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
> >[2020-06-10 01:02:32.329280] W [fuse-bridge.c:897:fuse_attr_cbk]
> >0-glusterfs-fuse: 7434251: STAT()
> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
> >[2020-06-10 01:02:32.387129] W [fuse-bridge.c:897:fuse_attr_cbk]
> >0-glusterfs-fuse: 7434264: STAT()
> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
> >[2020-06-10 01:02:32.448838] W [fuse-bridge.c:897:fuse_attr_cbk]
> >0-glusterfs-fuse: 7434277: STAT()
> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
> >[2020-06-10 01:02:32.507196] W [fuse-bridge.c:897:fuse_attr_cbk]
> >0-glusterfs-fuse: 7434290: STAT()
> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
> >[2020-06-10 01:02:32.566033] W [fuse-bridge.c:897:fuse_attr_cbk]
> >0-glusterfs-fuse: 7434303: STAT()
> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
> >[2020-06-10 01:02:32.625168] W [fuse-bridge.c:897:fuse_attr_cbk]
> >0-glusterfs-fuse: 7434316: STAT()
> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
> >[2020-06-10 01:02:32.772442] W [fuse-bridge.c:897:fuse_attr_cbk]
> >0-glusterfs-fuse: 7434329: STAT()
> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
> >[2020-06-10 01:02:32.832481] W [fuse-bridge.c:897:fuse_attr_cbk]
> >0-glusterfs-fuse: 7434342: STAT()
> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
> >[2020-06-10 01:02:32.891835] W [fuse-bridge.c:89

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-10 Thread Strahil Nikolov
Hey David,

Sadly I just have a feeling that on any brick there  is a gfid mismatch,  but I 
could be wrong.

As you have  the gfid list, please check  on all  bricks (both master and 
slave)  that the file exists (not the one in .gluster , but the real one) and 
it has the same gfid.

You can find the inode via ls and then run a find (don't forget the ionice) 
against the brick and that inode number. 

Once you have the full path to the file, test (rough commands are sketched after this list):
- Mount with FUSE
- Check file exists ( no '??' for permissions, size, etc) and can be 
manipulated (maybe 'touch' can be used ?)
- Find (on all replica  sets ) the file and check the gfid
- Check for heals pending for that gfid
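
A sketch with placeholder brick path, file name and volume name:

ls -li /data/brick/path/to/file                            # note the inode number
ionice -c3 find /data/brick -inum 1234567                  # all hard links for that inode, including the one under .glusterfs
getfattr -n trusted.gfid -e hex /data/brick/path/to/file   # compare this gfid across the bricks
gluster volume heal gvol0 info                             # look for pending heals mentioning that gfid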


Best  Regards,
Strahil Nikolov

On 10 June 2020 at 6:37:35 GMT+03:00, David Cunningham wrote:
>Hi Strahil,
>
>Thank you for that. Do you know if these "Stale file handle" errors on
>the
>geo-replication slave could be related?
>
>[2020-06-10 01:02:32.268989] E [MSGID: 109040]
>[dht-helper.c:1332:dht_migration_complete_check_task] 0-gvol0-dht:
>/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba: failed to lookup the file
>on
>gvol0-dht [Stale file handle]
>[2020-06-10 01:02:32.269092] W [fuse-bridge.c:897:fuse_attr_cbk]
>0-glusterfs-fuse: 7434237: STAT()
>/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
>[2020-06-10 01:02:32.329280] W [fuse-bridge.c:897:fuse_attr_cbk]
>0-glusterfs-fuse: 7434251: STAT()
>/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
>[2020-06-10 01:02:32.387129] W [fuse-bridge.c:897:fuse_attr_cbk]
>0-glusterfs-fuse: 7434264: STAT()
>/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
>[2020-06-10 01:02:32.448838] W [fuse-bridge.c:897:fuse_attr_cbk]
>0-glusterfs-fuse: 7434277: STAT()
>/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
>[2020-06-10 01:02:32.507196] W [fuse-bridge.c:897:fuse_attr_cbk]
>0-glusterfs-fuse: 7434290: STAT()
>/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
>[2020-06-10 01:02:32.566033] W [fuse-bridge.c:897:fuse_attr_cbk]
>0-glusterfs-fuse: 7434303: STAT()
>/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
>[2020-06-10 01:02:32.625168] W [fuse-bridge.c:897:fuse_attr_cbk]
>0-glusterfs-fuse: 7434316: STAT()
>/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
>[2020-06-10 01:02:32.772442] W [fuse-bridge.c:897:fuse_attr_cbk]
>0-glusterfs-fuse: 7434329: STAT()
>/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
>[2020-06-10 01:02:32.832481] W [fuse-bridge.c:897:fuse_attr_cbk]
>0-glusterfs-fuse: 7434342: STAT()
>/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
>[2020-06-10 01:02:32.891835] W [fuse-bridge.c:897:fuse_attr_cbk]
>0-glusterfs-fuse: 7434403: STAT()
>/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
>
>
>
>On Tue, 9 Jun 2020 at 16:31, Strahil Nikolov 
>wrote:
>
>> Hey David,
>>
>> Can you check the cpu usage  in the sar on the rest of the cluster
>(going
>> backwards from the day you found the high cpu usage),  so we can know
>if
>> this behaviour was observed on other nodes.
>>
>> Maybe that behaviour was "normal" for the push node (which could be
>> another one) .
>>
>> As  this  script  is python,  I guess  you can put some debug print
>> statements in it.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On 9 June 2020 at 5:07:11 GMT+03:00, David Cunningham <
>> dcunning...@voisonics.com> wrote:
>> >Hi Sankarshan,
>> >
>> >Thanks for that. So what should we look for to figure out what this
>> >process
>> >is doing? In
>> >/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log we
>see
>> >something like the following logged regularly:
>> >
>> >
>> >[[2020-06-09 02:01:19.670595] D [master(worker
>> >/nodirectwritedata/gluster/gvol0):1454:changelogs_batch_process]
>> >_GMaster:
>> >processing changes
>>
>>
>>batch=['/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668040',
>>
>>
>>'/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668055']
>> >[2020-06-09 02:01:19.674927] D [master(worker
>> >/nodirectwritedata/gluster/gvol0):1289:process] _GMaster: processing
>> >change
>> >
>>
>>
>>changelog=/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668040
>> >[2020-06-09 02:01:19.683098] D [master(worker
>> >/nodirectwritedata/gluster/gvol0):1170:process_change] _GMaster:
>> >entries: []
>> >[2020-06-09 02:01:19.695125] D [master(worker
>> >/nodirectwritedata/gluster/gvol0):312:a_syncdata] _GMaster: files
>> >files=set(['.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17',
>> >'.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77',
>> >'.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0',
>> >'.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435'])
>> >[2020-06-09 02:01:19.695344] D [master(worker
>> >/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster:
>

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-09 Thread David Cunningham
Hi Strahil,

Thank you for that. Do you know if these "Stale file handle" errors on the
geo-replication slave could be related?

[2020-06-10 01:02:32.268989] E [MSGID: 109040]
[dht-helper.c:1332:dht_migration_complete_check_task] 0-gvol0-dht:
/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba: failed to lookup the file on
gvol0-dht [Stale file handle]
[2020-06-10 01:02:32.269092] W [fuse-bridge.c:897:fuse_attr_cbk]
0-glusterfs-fuse: 7434237: STAT()
/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
[2020-06-10 01:02:32.329280] W [fuse-bridge.c:897:fuse_attr_cbk]
0-glusterfs-fuse: 7434251: STAT()
/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
[2020-06-10 01:02:32.387129] W [fuse-bridge.c:897:fuse_attr_cbk]
0-glusterfs-fuse: 7434264: STAT()
/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
[2020-06-10 01:02:32.448838] W [fuse-bridge.c:897:fuse_attr_cbk]
0-glusterfs-fuse: 7434277: STAT()
/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
[2020-06-10 01:02:32.507196] W [fuse-bridge.c:897:fuse_attr_cbk]
0-glusterfs-fuse: 7434290: STAT()
/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
[2020-06-10 01:02:32.566033] W [fuse-bridge.c:897:fuse_attr_cbk]
0-glusterfs-fuse: 7434303: STAT()
/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
[2020-06-10 01:02:32.625168] W [fuse-bridge.c:897:fuse_attr_cbk]
0-glusterfs-fuse: 7434316: STAT()
/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
[2020-06-10 01:02:32.772442] W [fuse-bridge.c:897:fuse_attr_cbk]
0-glusterfs-fuse: 7434329: STAT()
/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
[2020-06-10 01:02:32.832481] W [fuse-bridge.c:897:fuse_attr_cbk]
0-glusterfs-fuse: 7434342: STAT()
/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)
[2020-06-10 01:02:32.891835] W [fuse-bridge.c:897:fuse_attr_cbk]
0-glusterfs-fuse: 7434403: STAT()
/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file handle)

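If one wanted to check whether that gfid still resolves on the slave side, a
rough check (gfid copied from the log above; the slave brick path is a
placeholder) would be:

ls -l /path/to/slave-brick/.glusterfs/d4/26/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba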


On Tue, 9 Jun 2020 at 16:31, Strahil Nikolov  wrote:

> Hey David,
>
> Can you check the cpu usage  in the sar on the rest of the cluster (going
> backwards from the day you found the high cpu usage),  so we can know if
> this behaviour was observed on other nodes.
>
> Maybe that behaviour was "normal" for the push node (which could be
> another one) .
>
> As  this  script  is python,  I guess  you can put some debug print
> statements in it.
>
> Best Regards,
> Strahil Nikolov
>
> On 9 June 2020 at 5:07:11 GMT+03:00, David Cunningham <
> dcunning...@voisonics.com> wrote:
> >Hi Sankarshan,
> >
> >Thanks for that. So what should we look for to figure out what this
> >process
> >is doing? In
> >/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log we see
> >something like the following logged regularly:
> >
> >
> >[[2020-06-09 02:01:19.670595] D [master(worker
> >/nodirectwritedata/gluster/gvol0):1454:changelogs_batch_process]
> >_GMaster:
> >processing changes
>
> >batch=['/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668040',
>
> >'/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668055']
> >[2020-06-09 02:01:19.674927] D [master(worker
> >/nodirectwritedata/gluster/gvol0):1289:process] _GMaster: processing
> >change
> >
>
> >changelog=/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668040
> >[2020-06-09 02:01:19.683098] D [master(worker
> >/nodirectwritedata/gluster/gvol0):1170:process_change] _GMaster:
> >entries: []
> >[2020-06-09 02:01:19.695125] D [master(worker
> >/nodirectwritedata/gluster/gvol0):312:a_syncdata] _GMaster: files
> >files=set(['.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17',
> >'.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77',
> >'.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0',
> >'.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435'])
> >[2020-06-09 02:01:19.695344] D [master(worker
> >/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate
> >for
> >syncing file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17
> >[2020-06-09 02:01:19.695508] D [master(worker
> >/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate
> >for
> >syncing file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77
> >[2020-06-09 02:01:19.695638] D [master(worker
> >/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate
> >for
> >syncing file=.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0
> >[2020-06-09 02:01:19.695759] D [master(worker
> >/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate
> >for
> >syncing file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
> >[2020-06-09 02:01:19.695883] D [master(worker
> >/nodirectwritedata/gluster/gvol0):1289:process] _GMaster: processing
> >change
> >
>
> >changelog=/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-09 Thread Strahil Nikolov
Hey David,

Can you check the CPU usage in sar on the rest of the cluster (going backwards
from the day you found the high CPU usage), so we can know if this behaviour was
observed on other nodes?

Maybe that behaviour was "normal" for the push node (which could be another 
one) .

As  this  script  is python,  I guess  you can put some debug print statements 
in it.

Best Regards,
Strahil Nikolov

On 9 June 2020 at 5:07:11 GMT+03:00, David Cunningham wrote:
>Hi Sankarshan,
>
>Thanks for that. So what should we look for to figure out what this
>process
>is doing? In
>/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log we see
>something like the following logged regularly:
>
>
>[[2020-06-09 02:01:19.670595] D [master(worker
>/nodirectwritedata/gluster/gvol0):1454:changelogs_batch_process]
>_GMaster:
>processing changes
>batch=['/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668040',
>'/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668055']
>[2020-06-09 02:01:19.674927] D [master(worker
>/nodirectwritedata/gluster/gvol0):1289:process] _GMaster: processing
>change
>
>changelog=/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668040
>[2020-06-09 02:01:19.683098] D [master(worker
>/nodirectwritedata/gluster/gvol0):1170:process_change] _GMaster:
>entries: []
>[2020-06-09 02:01:19.695125] D [master(worker
>/nodirectwritedata/gluster/gvol0):312:a_syncdata] _GMaster: files
>files=set(['.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17',
>'.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77',
>'.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0',
>'.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435'])
>[2020-06-09 02:01:19.695344] D [master(worker
>/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate
>for
>syncing file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17
>[2020-06-09 02:01:19.695508] D [master(worker
>/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate
>for
>syncing file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77
>[2020-06-09 02:01:19.695638] D [master(worker
>/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate
>for
>syncing file=.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0
>[2020-06-09 02:01:19.695759] D [master(worker
>/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate
>for
>syncing file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
>[2020-06-09 02:01:19.695883] D [master(worker
>/nodirectwritedata/gluster/gvol0):1289:process] _GMaster: processing
>change
>
>changelog=/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668055
>[2020-06-09 02:01:19.696170] D [master(worker
>/nodirectwritedata/gluster/gvol0):1170:process_change] _GMaster:
>entries: []
>[2020-06-09 02:01:19.714097] D [master(worker
>/nodirectwritedata/gluster/gvol0):312:a_syncdata] _GMaster: files
>files=set(['.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17',
>'.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77',
>'.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435'])
>[2020-06-09 02:01:19.714286] D [master(worker
>/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate
>for
>syncing file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17
>[2020-06-09 02:01:19.714433] D [master(worker
>/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate
>for
>syncing file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77
>[2020-06-09 02:01:19.714577] D [master(worker
>/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate
>for
>syncing file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
>[2020-06-09 02:01:20.179656] D [resource(worker
>/nodirectwritedata/gluster/gvol0):1419:rsync] SSH: files:
>.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17,
>.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77,
>.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0,
>.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435,
>.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17,
>.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77,
>.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
>[2020-06-09 02:01:20.738632] I [master(worker
>/nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken
>duration=0.5588 num_files=7 job=2   return_code=0
>[2020-06-09 02:01:20.739650] D [master(worker
>/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
> file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17
>[2020-06-09 02:01:20.740041] D [master(worker
>/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
> file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77
>[2020-06-09 02:01:20.740200] D [master(worker
>/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
> file=.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0
>[2020-06-09 02:01:20.740343] D [master(worker
>/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
> file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
>[2020-06-09 02:01:20.740482] D [master(worker
>/nodirectwritedata/g

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-08 Thread David Cunningham
Hi Sankarshan,

Thanks for that. So what should we look for to figure out what this process
is doing? In
/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log we see
something like the following logged regularly:


[[2020-06-09 02:01:19.670595] D [master(worker
/nodirectwritedata/gluster/gvol0):1454:changelogs_batch_process] _GMaster:
processing changes
batch=['/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668040',
'/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668055']
[2020-06-09 02:01:19.674927] D [master(worker
/nodirectwritedata/gluster/gvol0):1289:process] _GMaster: processing change

changelog=/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668040
[2020-06-09 02:01:19.683098] D [master(worker
/nodirectwritedata/gluster/gvol0):1170:process_change] _GMaster: entries: []
[2020-06-09 02:01:19.695125] D [master(worker
/nodirectwritedata/gluster/gvol0):312:a_syncdata] _GMaster: files
files=set(['.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17',
'.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77',
'.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0',
'.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435'])
[2020-06-09 02:01:19.695344] D [master(worker
/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for
syncing file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17
[2020-06-09 02:01:19.695508] D [master(worker
/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for
syncing file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77
[2020-06-09 02:01:19.695638] D [master(worker
/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for
syncing file=.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0
[2020-06-09 02:01:19.695759] D [master(worker
/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for
syncing file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
[2020-06-09 02:01:19.695883] D [master(worker
/nodirectwritedata/gluster/gvol0):1289:process] _GMaster: processing change

changelog=/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668055
[2020-06-09 02:01:19.696170] D [master(worker
/nodirectwritedata/gluster/gvol0):1170:process_change] _GMaster: entries: []
[2020-06-09 02:01:19.714097] D [master(worker
/nodirectwritedata/gluster/gvol0):312:a_syncdata] _GMaster: files
files=set(['.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17',
'.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77',
'.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435'])
[2020-06-09 02:01:19.714286] D [master(worker
/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for
syncing file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17
[2020-06-09 02:01:19.714433] D [master(worker
/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for
syncing file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77
[2020-06-09 02:01:19.714577] D [master(worker
/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for
syncing file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
[2020-06-09 02:01:20.179656] D [resource(worker
/nodirectwritedata/gluster/gvol0):1419:rsync] SSH: files:
.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17,
.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77,
.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0,
.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435,
.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17,
.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77,
.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
[2020-06-09 02:01:20.738632] I [master(worker
/nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken
duration=0.5588 num_files=7 job=2   return_code=0
[2020-06-09 02:01:20.739650] D [master(worker
/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
 file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17
[2020-06-09 02:01:20.740041] D [master(worker
/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
 file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77
[2020-06-09 02:01:20.740200] D [master(worker
/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
 file=.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0
[2020-06-09 02:01:20.740343] D [master(worker
/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
 file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
[2020-06-09 02:01:20.740482] D [master(worker
/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
 file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17
[2020-06-09 02:01:20.740616] D [master(worker
/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
 file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77
[2020-06-09 02:01:20.740741] D [master(worker
/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
 file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
[2020-06-09 02:01:20.745385] D [repce(worker
/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call
22499:140085349934848:1591668080.75
done('/var/lib/misc

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-07 Thread sankarshan
Reading through the thread, it occurs to me that a stronger approach would be
to understand the workload (a general description of the application) and, in
terms of the GlusterFS releases running, assess whether there are new issues
to be addressed or whether the existing sets of patches work. Thanks for
setting up the debug level in the log. High CPU usage by a geo-replication
process would need to be traced back to why it really requires that
percentage of CPU if it was not doing so previously.

On Mon, 8 Jun 2020 at 05:29, David Cunningham  wrote:
>
> Hi Strahil,
>
> The CPU is still quite high, with "top" regularly showing 100% CPU usage by 
> that process. However it's not clear whether this is really a problem, or if 
> it's just normal geo-replication activity. While CPU usage was not previously 
> as high on this server, it's not clear whether GlusterFS might have made this 
> the "push" node for geo-replication when it used to be some other server. 
> Having said that, no other server shows a sudden drop in CPU usage.
>
> Ideally we could find out what the gsyncd process is doing, and therefore 
> whether it's expected or not. Any ideas on that? We did set the log level to 
> DEBUG as Sunny suggested.
>
> Looking at 
> /var/log/glusterfs/geo-replication-slaves/gvol0_nvfs10_gvol0/mnt-cafs30-nodirectwritedata-gluster-gvol0.log
>  on the geo-replication slave, we have a lot of lines like the following 
> logged.
>
> [2020-06-06 00:32:43.155856] W [fuse-bridge.c:897:fuse_attr_cbk] 
> 0-glusterfs-fuse: 875853: STAT() /.gfid/25d78a6c-41a9-4364-84a8-31f5571223ac 
> => -1 (Stale file handle)
> [2020-06-06 00:32:43.219759] W [fuse-bridge.c:897:fuse_attr_cbk] 
> 0-glusterfs-fuse: 875923: STAT() /.gfid/25d78a6c-41a9-4364-84a8-31f5571223ac 
> => -1 (Stale file handle)
> [2020-06-06 00:32:43.280357] W [fuse-bridge.c:897:fuse_attr_cbk] 
> 0-glusterfs-fuse: 876001: STAT() /.gfid/25d78a6c-41a9-4364-84a8-31f5571223ac 
> => -1 (Stale file handle)
> The message "E [MSGID: 109040] 
> [dht-helper.c:1332:dht_migration_complete_check_task] 0-gvol0-dht: 
> /.gfid/25d78a6c-41a9-4364-84a8-31f5571223ac: failed to lookup the file on 
> gvol0-dht [Stale file handle]" repeated 9 times between [2020-06-06 
> 00:32:42.689780] and [2020-06-06 00:32:43.280322]
> [2020-06-06 09:03:03.660956] E [MSGID: 109040] 
> [dht-helper.c:1332:dht_migration_complete_check_task] 0-gvol0-dht: 
> /.gfid/decdd552-d58b-4ddf-b27a-0da9a6fbc38b: failed to lookup the file on 
> gvol0-dht [Stale file handle]
> [2020-06-06 09:03:03.661057] W [fuse-bridge.c:897:fuse_attr_cbk] 
> 0-glusterfs-fuse: 965375: STAT() /.gfid/decdd552-d58b-4ddf-b27a-0da9a6fbc38b 
> => -1 (Stale file handle)
> [2020-06-06 09:03:10.258798] E [MSGID: 109040] 
> [dht-helper.c:1332:dht_migration_complete_check_task] 0-gvol0-dht: 
> /.gfid/1d90e129-d40f-4ca4-bea8-0bb3c4c7985a: failed to lookup the file on 
> gvol0-dht [Stale file handle]
> [2020-06-06 09:03:10.258880] W [fuse-bridge.c:897:fuse_attr_cbk] 
> 0-glusterfs-fuse: 969455: STAT() /.gfid/1d90e129-d40f-4ca4-bea8-0bb3c4c7985a 
> => -1 (Stale file handle)
> [2020-06-06 09:09:41.259362] E [MSGID: 109040] 
> [dht-helper.c:1332:dht_migration_complete_check_task] 0-gvol0-dht: 
> /.gfid/740efff6-74dd-45be-847f-66919d1179e0: failed to lookup the file on 
> gvol0-dht [Stale file handle]
> [2020-06-06 09:09:41.259458] W [fuse-bridge.c:897:fuse_attr_cbk] 
> 0-glusterfs-fuse: 1040904: STAT() /.gfid/740efff6-74dd-45be-847f-66919d1179e0 
> => -1 (Stale file handle)
>
> Could these errors be part of the problem?
>
> Thanks again.
>
>
> On Sat, 6 Jun 2020 at 21:21, Strahil Nikolov  wrote:
>>
>> Hey David,
>>
>> can you check the old logs for gfid mismatch and get a list  of files that 
>> were causing the high cpu .
>> Maybe they are  related  somehow (maybe created by the same software  ,  
>> same client version or something else) which could help about that.
>>
>> Also  take  a  look in geo-replication-slave logs.
>>
>> Does the issue still occurs ?
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On 6 June 2020 at 1:21:55 GMT+03:00, David Cunningham
>> wrote:
>> >Hi Sunny and Strahil,
>> >
>> >Thanks again for your responses. We don't have a lot of renaming
>> >activity -
>> >maybe some, but not a lot. We do have files which are open for writing
>> >for
>> >quite a while - they're call recordings being written as the call
>> >happens.
>> >
>> >We've installed GlusterFS using the Ubuntu packages and I'd really like
>> >to
>> >avoid compiling and applying patches.
>> >
>> >After enabling DEBUG the log at
>> >/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log
>> >doesn't
>> >seem to show anything very unusual:
>> >
>> >[2020-06-03 02:49:01.992177] I [master(worker
>> >/nodirectwritedata/gluster/gvol0):1384:process] _GMaster: Entry Time
>> >Taken
>> >   MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=0CRE=0   duration=0.
>> >UNL=0
>> >[2020-06-03 02:49:01.992465] I [master(worker
>> >/nodirectwritedata/gluster/gvol0):1394:process] _GMaster:

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-07 Thread David Cunningham
Hi Strahil,

The CPU is still quite high, with "top" regularly showing 100% CPU usage by
that process. However it's not clear whether this is really a problem, or
if it's just normal geo-replication activity. While CPU usage was not
previously as high on this server, it's not clear whether GlusterFS might
have made this the "push" node for geo-replication when it used to be some
other server. Having said that, no other server shows a sudden drop in CPU
usage.

Ideally we could find out what the gsyncd process is doing, and therefore
whether it's expected or not. Any ideas on that? We did set the log level
to DEBUG as Sunny suggested.

Looking at
/var/log/glusterfs/geo-replication-slaves/gvol0_nvfs10_gvol0/mnt-cafs30-nodirectwritedata-gluster-gvol0.log
on the geo-replication slave, we have a lot of lines like the following
logged.

[2020-06-06 00:32:43.155856] W [fuse-bridge.c:897:fuse_attr_cbk]
0-glusterfs-fuse: 875853: STAT()
/.gfid/25d78a6c-41a9-4364-84a8-31f5571223ac => -1 (Stale file handle)
[2020-06-06 00:32:43.219759] W [fuse-bridge.c:897:fuse_attr_cbk]
0-glusterfs-fuse: 875923: STAT()
/.gfid/25d78a6c-41a9-4364-84a8-31f5571223ac => -1 (Stale file handle)
[2020-06-06 00:32:43.280357] W [fuse-bridge.c:897:fuse_attr_cbk]
0-glusterfs-fuse: 876001: STAT()
/.gfid/25d78a6c-41a9-4364-84a8-31f5571223ac => -1 (Stale file handle)
The message "E [MSGID: 109040]
[dht-helper.c:1332:dht_migration_complete_check_task] 0-gvol0-dht:
/.gfid/25d78a6c-41a9-4364-84a8-31f5571223ac: failed to lookup the file on
gvol0-dht [Stale file handle]" repeated 9 times between [2020-06-06
00:32:42.689780] and [2020-06-06 00:32:43.280322]
[2020-06-06 09:03:03.660956] E [MSGID: 109040]
[dht-helper.c:1332:dht_migration_complete_check_task] 0-gvol0-dht:
/.gfid/decdd552-d58b-4ddf-b27a-0da9a6fbc38b: failed to lookup the file on
gvol0-dht [Stale file handle]
[2020-06-06 09:03:03.661057] W [fuse-bridge.c:897:fuse_attr_cbk]
0-glusterfs-fuse: 965375: STAT()
/.gfid/decdd552-d58b-4ddf-b27a-0da9a6fbc38b => -1 (Stale file handle)
[2020-06-06 09:03:10.258798] E [MSGID: 109040]
[dht-helper.c:1332:dht_migration_complete_check_task] 0-gvol0-dht:
/.gfid/1d90e129-d40f-4ca4-bea8-0bb3c4c7985a: failed to lookup the file on
gvol0-dht [Stale file handle]
[2020-06-06 09:03:10.258880] W [fuse-bridge.c:897:fuse_attr_cbk]
0-glusterfs-fuse: 969455: STAT()
/.gfid/1d90e129-d40f-4ca4-bea8-0bb3c4c7985a => -1 (Stale file handle)
[2020-06-06 09:09:41.259362] E [MSGID: 109040]
[dht-helper.c:1332:dht_migration_complete_check_task] 0-gvol0-dht:
/.gfid/740efff6-74dd-45be-847f-66919d1179e0: failed to lookup the file on
gvol0-dht [Stale file handle]
[2020-06-06 09:09:41.259458] W [fuse-bridge.c:897:fuse_attr_cbk]
0-glusterfs-fuse: 1040904: STAT()
/.gfid/740efff6-74dd-45be-847f-66919d1179e0 => -1 (Stale file handle)

Could these errors be part of the problem?

Thanks again.


On Sat, 6 Jun 2020 at 21:21, Strahil Nikolov  wrote:

> Hey David,
>
> can you check the old logs for gfid mismatch and get a list  of files that
> were causing the high cpu .
> Maybe they are  related  somehow (maybe created by the same software  ,
> same client version or something else) which could help about that.
>
> Also  take  a  look in geo-replication-slave logs.
>
> Does the issue still occurs ?
>
> Best Regards,
> Strahil Nikolov
>
> On 6 June 2020 at 1:21:55 GMT+03:00, David Cunningham <
> dcunning...@voisonics.com> wrote:
> >Hi Sunny and Strahil,
> >
> >Thanks again for your responses. We don't have a lot of renaming
> >activity -
> >maybe some, but not a lot. We do have files which are open for writing
> >for
> >quite a while - they're call recordings being written as the call
> >happens.
> >
> >We've installed GlusterFS using the Ubuntu packages and I'd really like
> >to
> >avoid compiling and applying patches.
> >
> >After enabling DEBUG the log at
> >/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log
> >doesn't
> >seem to show anything very unusual:
> >
> >[2020-06-03 02:49:01.992177] I [master(worker
> >/nodirectwritedata/gluster/gvol0):1384:process] _GMaster: Entry Time
> >Taken
> >   MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=0CRE=0   duration=0.
> >UNL=0
> >[2020-06-03 02:49:01.992465] I [master(worker
> >/nodirectwritedata/gluster/gvol0):1394:process] _GMaster: Data/Metadata
> >Time TakenSETA=0  SETX=0
> >meta_duration=0.data_duration=13.0954
> >   DATA=8  XATT=0
> >[2020-06-03 02:49:01.992863] I [master(worker
> >/nodirectwritedata/gluster/gvol0):1404:process] _GMaster: Batch
> >Completed
> >changelog_end=1591152508entry_stime=(1591152352, 0)
> >changelog_start=1591152494  stime=(1591152507, 0)
> >duration=13.1077
> > num_changelogs=2mode=live_changelog
> >[2020-06-03 02:49:02.958687] D [repce(worker
> >/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call
> >19017:139678812452608:1591152542.96 keep_alive({'version': (1, 0),
> >'retval': 0, 'uuid': '8ee85fae-f3aa-4285-ad48-67a1dc17ed73', 'timeout':
> >1591152662, 'volume

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-06 Thread Strahil Nikolov
Hey David,

Can you check the old logs for gfid mismatch and get a list of files that
were causing the high cpu.
Maybe they are related somehow (maybe created by the same software, same
client version or something else), which could help with that.
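
(A rough way to pull such a list out of the primary-side gsyncd log - the
path below is the session directory used elsewhere in this thread, so adjust
it to match yours; this is only a sketch, not an official tool:

grep "Fixing gfid mismatch" \
    /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log* \
    | grep -o "u'entry': u'[^']*'" | sort | uniq -c | sort -rn | head

That counts which file entries show up most often in the gfid-mismatch
messages.)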

Also take a look in the geo-replication-slave logs.

Does the issue still occur?

Best Regards,
Strahil Nikolov

On 6 June 2020 at 1:21:55 GMT+03:00, David Cunningham
wrote:
>Hi Sunny and Strahil,
>
>Thanks again for your responses. We don't have a lot of renaming
>activity -
>maybe some, but not a lot. We do have files which are open for writing
>for
>quite a while - they're call recordings being written as the call
>happens.
>
>We've installed GlusterFS using the Ubuntu packages and I'd really like
>to
>avoid compiling and applying patches.
>
>After enabling DEBUG the log at
>/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log
>doesn't
>seem to show anything very unusual:
>
>[2020-06-03 02:49:01.992177] I [master(worker
>/nodirectwritedata/gluster/gvol0):1384:process] _GMaster: Entry Time
>Taken
>   MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=0CRE=0   duration=0.
>UNL=0
>[2020-06-03 02:49:01.992465] I [master(worker
>/nodirectwritedata/gluster/gvol0):1394:process] _GMaster: Data/Metadata
>Time TakenSETA=0  SETX=0 
>meta_duration=0.data_duration=13.0954
>   DATA=8  XATT=0
>[2020-06-03 02:49:01.992863] I [master(worker
>/nodirectwritedata/gluster/gvol0):1404:process] _GMaster: Batch
>Completed
>changelog_end=1591152508entry_stime=(1591152352, 0)
>changelog_start=1591152494  stime=(1591152507, 0)  
>duration=13.1077
> num_changelogs=2mode=live_changelog
>[2020-06-03 02:49:02.958687] D [repce(worker
>/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call
>19017:139678812452608:1591152542.96 keep_alive({'version': (1, 0),
>'retval': 0, 'uuid': '8ee85fae-f3aa-4285-ad48-67a1dc17ed73', 'timeout':
>1591152662, 'volume_mark': (1583043396, 161632)},) ...
>[2020-06-03 02:49:02.979139] D [repce(worker
>/nodirectwritedata/gluster/gvol0):216:__call__] RepceClient: call
>19017:139678812452608:1591152542.96 keep_alive -> 28
>[2020-06-03 02:49:06.998127] D [master(worker
>/nodirectwritedata/gluster/gvol0):551:crawlwrap] _GMaster: ... crawl
>#114
>done, took 30.180089 seconds
>[2020-06-03 02:49:07.10132] D [repce(worker
>/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call
>19017:139679441716992:1591152547.01 scan() ...
>[2020-06-03 02:49:07.10781] D [repce(worker
>/nodirectwritedata/gluster/gvol0):216:__call__] RepceClient: call
>19017:139679441716992:1591152547.01 scan -> None
>[2020-06-03 02:49:07.10935] D [repce(worker
>/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call
>19017:139679441716992:1591152547.01 getchanges() ...
>[2020-06-03 02:49:07.11579] D [repce(worker
>/nodirectwritedata/gluster/gvol0):216:__call__] RepceClient: call
>19017:139679441716992:1591152547.01 getchanges ->
>['/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591152522']
>[2020-06-03 02:49:07.11720] I [master(worker
>/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's time
>stime=(1591152507, 0)
>
>Am I looking in the right place to find out what that gsyncd.py process
>is
>doing?
>
>
>On Tue, 2 Jun 2020 at 21:58, Sunny Kumar  wrote:
>
>> Hi David,
>>
>> You haven't answered my previous question regarding the type of your
>> workload.
>> ---
>> You can use the below command to enable debug log.
>>
>> `gluster vol geo-rep  :: config
>log-level
>> DEBUG`
>>
>> and after capturing log again switch back to info mode:
>>
>> `gluster vol geo-rep  :: config
>log-level
>> INFO`
>>
>> Please share the debug log and geo-rep config to debug further:
>> for config:
>>
>> `gluster vol geo-rep  :: config`
>>
>> /sunny
>>
>>
>> On Tue, Jun 2, 2020 at 10:18 AM Strahil Nikolov
>
>> wrote:
>> >
>> > Hi David,
>> >
>> > in which log do you see the entries ?
>> >
>> > I think I got an explanation why you see the process  only on one
>of the
>> master nodes -  geo-rep session is established from only 1 master
>node  /I
>> hope someone corrects me if I'm wrong/ to one slave node. Thus it
>will be
>> natural to see the high CPU  usage on only 1 master node in your
>situation.
>> >
>> > Do you see anything else  in the :
>> var/log/glusterfs/geo-replication/ (master nodes) or in
>> /var/log/glusterfs/geo-replication-slaves (slaves) that could hint of
>the
>> exact issue. I have  a vague feeling that that python script is
>constantly
>> looping over some data causing the CPU hog.
>> >
>> > Sadly, I can't find an instruction for increasing the log level of
>the
>> geo rep log .
>> >
>> >
>> > Best  Regards,
>> > Strahil  Nikolov
>> >
>> >
>> > On 2 June 2020 at 6:14:46 GMT+03:00, David Cunningham <
>> dcunning...@voisonics.com> wrote:
>> > >Hi Strahil and Sunny,
>> > >
>> > >Thank you for the replies. I checked the gfid on the master and
>slaves
>> > >and
>> > >they are

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-05 Thread David Cunningham
Hi Sunny and Strahil,

Thanks again for your responses. We don't have a lot of renaming activity -
maybe some, but not a lot. We do have files which are open for writing for
quite a while - they're call recordings being written as the call happens.

We've installed GlusterFS using the Ubuntu packages and I'd really like to
avoid compiling and applying patches.

After enabling DEBUG the log at
/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log doesn't
seem to show anything very unusual:

[2020-06-03 02:49:01.992177] I [master(worker
/nodirectwritedata/gluster/gvol0):1384:process] _GMaster: Entry Time Taken
   MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=0CRE=0   duration=0.
UNL=0
[2020-06-03 02:49:01.992465] I [master(worker
/nodirectwritedata/gluster/gvol0):1394:process] _GMaster: Data/Metadata
Time TakenSETA=0  SETX=0  meta_duration=0.data_duration=13.0954
   DATA=8  XATT=0
[2020-06-03 02:49:01.992863] I [master(worker
/nodirectwritedata/gluster/gvol0):1404:process] _GMaster: Batch Completed
changelog_end=1591152508entry_stime=(1591152352, 0)
 changelog_start=1591152494  stime=(1591152507, 0)   duration=13.1077
 num_changelogs=2mode=live_changelog
[2020-06-03 02:49:02.958687] D [repce(worker
/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call
19017:139678812452608:1591152542.96 keep_alive({'version': (1, 0),
'retval': 0, 'uuid': '8ee85fae-f3aa-4285-ad48-67a1dc17ed73', 'timeout':
1591152662, 'volume_mark': (1583043396, 161632)},) ...
[2020-06-03 02:49:02.979139] D [repce(worker
/nodirectwritedata/gluster/gvol0):216:__call__] RepceClient: call
19017:139678812452608:1591152542.96 keep_alive -> 28
[2020-06-03 02:49:06.998127] D [master(worker
/nodirectwritedata/gluster/gvol0):551:crawlwrap] _GMaster: ... crawl #114
done, took 30.180089 seconds
[2020-06-03 02:49:07.10132] D [repce(worker
/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call
19017:139679441716992:1591152547.01 scan() ...
[2020-06-03 02:49:07.10781] D [repce(worker
/nodirectwritedata/gluster/gvol0):216:__call__] RepceClient: call
19017:139679441716992:1591152547.01 scan -> None
[2020-06-03 02:49:07.10935] D [repce(worker
/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call
19017:139679441716992:1591152547.01 getchanges() ...
[2020-06-03 02:49:07.11579] D [repce(worker
/nodirectwritedata/gluster/gvol0):216:__call__] RepceClient: call
19017:139679441716992:1591152547.01 getchanges ->
['/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591152522']
[2020-06-03 02:49:07.11720] I [master(worker
/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's time
stime=(1591152507, 0)

Am I looking in the right place to find out what that gsyncd.py process is
doing?


On Tue, 2 Jun 2020 at 21:58, Sunny Kumar  wrote:

> Hi David,
>
> You haven't answered my previous question regarding the type of your
> workload.
> ---
> You can use the below command to enable debug log.
>
> `gluster vol geo-rep  :: config log-level
> DEBUG`
>
> and after capturing log again switch back to info mode:
>
> `gluster vol geo-rep  :: config log-level
> INFO`
>
> Please share the debug log and geo-rep config to debug further:
> for config:
>
> `gluster vol geo-rep  :: config`
>
> /sunny
>
>
> On Tue, Jun 2, 2020 at 10:18 AM Strahil Nikolov 
> wrote:
> >
> > Hi David,
> >
> > in which log do you see the entries ?
> >
> > I think I got an explanation why you see the process  only on one of the
> master nodes -  geo-rep session is established from only 1 master node  /I
> hope someone corrects me if I'm wrong/ to one slave node. Thus it will be
> natural to see the high CPU  usage on only 1 master node in your situation.
> >
> > Do you see anything else  in the :
> var/log/glusterfs/geo-replication/ (master nodes) or in
> /var/log/glusterfs/geo-replication-slaves (slaves) that could hint of the
> exact issue. I have  a vague feeling that that python script is constantly
> looping over some data causing the CPU hog.
> >
> > Sadly, I can't find an instruction for increasing the log level of the
> geo rep log .
> >
> >
> > Best  Regards,
> > Strahil  Nikolov
> >
> >
> > On 2 June 2020 at 6:14:46 GMT+03:00, David Cunningham <
> dcunning...@voisonics.com> wrote:
> > >Hi Strahil and Sunny,
> > >
> > >Thank you for the replies. I checked the gfid on the master and slaves
> > >and
> > >they are the same. After moving the file away and back again it doesn't
> > >seem to be having the issue with that file any more.
> > >
> > >We are still getting higher CPU usage on one of the master nodes than
> > >the
> > >others. It logs this every few seconds:
> > >
> > >[2020-06-02 03:10:15.637815] I [master(worker
> > >/nodirectwritedata/gluster/gvol0):1384:process] _GMaster: Entry Time
> > >Taken
> > >   MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=0CRE=0   duration=0.
> > >UNL=0
> > >[2020-06-02 03:10:15.638010] I [master(worker
> > >/nodirectwritedata/gluster

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-02 Thread Strahil Nikolov
Hi David,

In which log do you see the entries?

I think I have an explanation for why you see the process only on one of the
master nodes - the geo-rep session is established from only 1 master node /I
hope someone corrects me if I'm wrong/ to one slave node. Thus it is natural
to see the high CPU usage on only 1 master node in your situation.

Do you see anything else in /var/log/glusterfs/geo-replication/ (master
nodes) or in /var/log/glusterfs/geo-replication-slaves (slaves) that could
hint at the exact issue? I have a vague feeling that the python script is
constantly looping over some data, causing the CPU hog.

Sadly, I can't find an instruction for increasing the log level of the
geo-rep log.


Best  Regards,
Strahil  Nikolov
 

On 2 June 2020 at 6:14:46 GMT+03:00, David Cunningham
wrote:
>Hi Strahil and Sunny,
>
>Thank you for the replies. I checked the gfid on the master and slaves
>and
>they are the same. After moving the file away and back again it doesn't
>seem to be having the issue with that file any more.
>
>We are still getting higher CPU usage on one of the master nodes than
>the
>others. It logs this every few seconds:
>
>[2020-06-02 03:10:15.637815] I [master(worker
>/nodirectwritedata/gluster/gvol0):1384:process] _GMaster: Entry Time
>Taken
>   MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=0CRE=0   duration=0.
>UNL=0
>[2020-06-02 03:10:15.638010] I [master(worker
>/nodirectwritedata/gluster/gvol0):1394:process] _GMaster: Data/Metadata
>Time TakenSETA=0  SETX=0 
>meta_duration=0.data_duration=12.7878
>   DATA=4  XATT=0
>[2020-06-02 03:10:15.638286] I [master(worker
>/nodirectwritedata/gluster/gvol0):1404:process] _GMaster: Batch
>Completed
>changelog_end=1591067378entry_stime=(1591067167, 0)
>changelog_start=1591067364  stime=(1591067377, 0)  
>duration=12.8068
> num_changelogs=2mode=live_changelog
>[2020-06-02 03:10:20.658601] I [master(worker
>/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's time
> stime=(1591067377, 0)
>[2020-06-02 03:10:34.21799] I [master(worker
>/nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken
> duration=0.3826 num_files=8 job=1   return_code=0
>[2020-06-02 03:10:46.440535] I [master(worker
>/nodirectwritedata/gluster/gvol0):1384:process] _GMaster: Entry Time
>Taken
>   MKD=0   MKN=0   LIN=0   SYM=0   REN=1   RMD=0CRE=2   duration=0.1314
>UNL=1
>[2020-06-02 03:10:46.440809] I [master(worker
>/nodirectwritedata/gluster/gvol0):1394:process] _GMaster: Data/Metadata
>Time TakenSETA=0  SETX=0 
>meta_duration=0.data_duration=13.0171
>   DATA=14 XATT=0
>[2020-06-02 03:10:46.441205] I [master(worker
>/nodirectwritedata/gluster/gvol0):1404:process] _GMaster: Batch
>Completed
>changelog_end=1591067420entry_stime=(1591067419, 0)
>changelog_start=1591067392  stime=(1591067419, 0)  
>duration=13.0322
> num_changelogs=3mode=live_changelog
>[2020-06-02 03:10:51.460925] I [master(worker
>/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's time
> stime=(1591067419, 0)
>
>[2020-06-02 03:11:04.448913] I [master(worker
>/nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken
>duration=0.3466 num_files=3 job=1   return_code=0
>
>Whereas the other master nodes only log this:
>
>[2020-06-02 03:11:33.886938] I [gsyncd(config-get):308:main] :
>Using
>session config file
>path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
>[2020-06-02 03:11:33.993175] I [gsyncd(status):308:main] : Using
>session config file
>path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
>
>Can anyone help with what might cause the high CPU usage on one master
>node? The process is this one, and is using 70-100% of CPU:
>
>python2 /usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py
>worker gvol0 nvfs10::gvol0 --feedback-fd 15 --local-path
>/nodirectwritedata/gluster/gvol0 --local-node cafs30 --local-node-id
>b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id
>cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 12,11,9,13 --subvol-num 1
>--resource-remote nvfs30 --resource-remote-id
>1e698ccd-aeec-4ec4-96fe-383da8fc3b78
>
>Thank you in advance!
>
>
>
>
>On Sat, 30 May 2020 at 20:20, Strahil Nikolov 
>wrote:
>
>> Hey David,
>>
>> for me a gfid  mismatch means  that the file  was  replaced/recreated
> -
>> just like  vim in linux does (and it is expected for config file).
>>
>> Have  you checked the gfid  of  the file on both source and
>destination,
>> do they really match or they are different ?
>>
>> What happens  when you move away the file  from the slave ,  does it
>fixes
>> the issue ?
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On 30 May 2020 at 1:10:56 GMT+03:00, David Cunningham <
>> dcunning...@voisonics.com> wrote:
>> >Hello,
>> >
>> >We're having an issue with a geo-replication process with unusually
>> >high
>> >CPU use and giving "Entry not present on master. Fixing gfid
>mismatch
>> >in
>> >slave" er

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-02 Thread Sunny Kumar
Hi David,

You haven't answered my previous question regarding the type of your workload.
---
You can use the below command to enable debug log.

`gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config log-level DEBUG`

and after capturing log again switch back to info mode:

`gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config log-level INFO`

Please share the debug log and geo-rep config to debug further:
for config:

`gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config`
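
(Filled in for the session discussed in this thread - assuming the primary
volume gvol0 and the secondary nvfs10::gvol0 seen elsewhere in these logs;
adjust the names if yours differ:

gluster vol geo-rep gvol0 nvfs10::gvol0 config log-level DEBUG
# reproduce the problem and capture the gsyncd.log output, then:
gluster vol geo-rep gvol0 nvfs10::gvol0 config log-level INFO
# and dump the session configuration:
gluster vol geo-rep gvol0 nvfs10::gvol0 config

The DEBUG setting is quite chatty, so remember to switch back to INFO.)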

/sunny


On Tue, Jun 2, 2020 at 10:18 AM Strahil Nikolov  wrote:
>
> Hi David,
>
> in which log do you see the entries ?
>
> I think I got an explanation why you see the process  only on one of the 
> master nodes -  geo-rep session is established from only 1 master node  /I 
> hope someone corrects me if I'm wrong/ to one slave node. Thus it will be 
> natural to see the high CPU  usage on only 1 master node in your situation.
>
> Do you see anything else  in the : var/log/glusterfs/geo-replication/ 
> (master nodes) or in /var/log/glusterfs/geo-replication-slaves (slaves) that 
> could hint of the exact issue. I have  a vague feeling that that python 
> script is constantly  looping over some data causing the CPU hog.
>
> Sadly, I can't find an instruction for increasing the log level of the geo 
> rep log .
>
>
> Best  Regards,
> Strahil  Nikolov
>
>
> On 2 June 2020 at 6:14:46 GMT+03:00, David Cunningham
> wrote:
> >Hi Strahil and Sunny,
> >
> >Thank you for the replies. I checked the gfid on the master and slaves
> >and
> >they are the same. After moving the file away and back again it doesn't
> >seem to be having the issue with that file any more.
> >
> >We are still getting higher CPU usage on one of the master nodes than
> >the
> >others. It logs this every few seconds:
> >
> >[2020-06-02 03:10:15.637815] I [master(worker
> >/nodirectwritedata/gluster/gvol0):1384:process] _GMaster: Entry Time
> >Taken
> >   MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=0CRE=0   duration=0.
> >UNL=0
> >[2020-06-02 03:10:15.638010] I [master(worker
> >/nodirectwritedata/gluster/gvol0):1394:process] _GMaster: Data/Metadata
> >Time TakenSETA=0  SETX=0
> >meta_duration=0.data_duration=12.7878
> >   DATA=4  XATT=0
> >[2020-06-02 03:10:15.638286] I [master(worker
> >/nodirectwritedata/gluster/gvol0):1404:process] _GMaster: Batch
> >Completed
> >changelog_end=1591067378entry_stime=(1591067167, 0)
> >changelog_start=1591067364  stime=(1591067377, 0)
> >duration=12.8068
> > num_changelogs=2mode=live_changelog
> >[2020-06-02 03:10:20.658601] I [master(worker
> >/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's time
> > stime=(1591067377, 0)
> >[2020-06-02 03:10:34.21799] I [master(worker
> >/nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken
> > duration=0.3826 num_files=8 job=1   return_code=0
> >[2020-06-02 03:10:46.440535] I [master(worker
> >/nodirectwritedata/gluster/gvol0):1384:process] _GMaster: Entry Time
> >Taken
> >   MKD=0   MKN=0   LIN=0   SYM=0   REN=1   RMD=0CRE=2   duration=0.1314
> >UNL=1
> >[2020-06-02 03:10:46.440809] I [master(worker
> >/nodirectwritedata/gluster/gvol0):1394:process] _GMaster: Data/Metadata
> >Time TakenSETA=0  SETX=0
> >meta_duration=0.data_duration=13.0171
> >   DATA=14 XATT=0
> >[2020-06-02 03:10:46.441205] I [master(worker
> >/nodirectwritedata/gluster/gvol0):1404:process] _GMaster: Batch
> >Completed
> >changelog_end=1591067420entry_stime=(1591067419, 0)
> >changelog_start=1591067392  stime=(1591067419, 0)
> >duration=13.0322
> > num_changelogs=3mode=live_changelog
> >[2020-06-02 03:10:51.460925] I [master(worker
> >/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's time
> > stime=(1591067419, 0)
> >
> >[2020-06-02 03:11:04.448913] I [master(worker
> >/nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken
> >duration=0.3466 num_files=3 job=1   return_code=0
> >
> >Whereas the other master nodes only log this:
> >
> >[2020-06-02 03:11:33.886938] I [gsyncd(config-get):308:main] :
> >Using
> >session config file
> >path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
> >[2020-06-02 03:11:33.993175] I [gsyncd(status):308:main] : Using
> >session config file
> >path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
> >
> >Can anyone help with what might cause the high CPU usage on one master
> >node? The process is this one, and is using 70-100% of CPU:
> >
> >python2 /usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py
> >worker gvol0 nvfs10::gvol0 --feedback-fd 15 --local-path
> >/nodirectwritedata/gluster/gvol0 --local-node cafs30 --local-node-id
> >b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id
> >cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 12,11,9,13 --subvol-num 1
> >--resource-remote nvfs30 --resource-remote-id
> >1e698ccd-aeec-4ec4-96fe-383da8fc3b78
> >
> >Thank you in advance!
> >
> >
> >
> >
> >On Sat, 30 May 2020 at 20:20, Strahil Nikolov 
> >wrote:
> >
> >> Hey David,
> >>
> >> for m

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-06-01 Thread David Cunningham
Hi Strahil and Sunny,

Thank you for the replies. I checked the gfid on the master and slaves and
they are the same. After moving the file away and back again it doesn't
seem to be having the issue with that file any more.

We are still getting higher CPU usage on one of the master nodes than the
others. It logs this every few seconds:

[2020-06-02 03:10:15.637815] I [master(worker
/nodirectwritedata/gluster/gvol0):1384:process] _GMaster: Entry Time Taken
   MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=0CRE=0   duration=0.
UNL=0
[2020-06-02 03:10:15.638010] I [master(worker
/nodirectwritedata/gluster/gvol0):1394:process] _GMaster: Data/Metadata
Time TakenSETA=0  SETX=0  meta_duration=0.data_duration=12.7878
   DATA=4  XATT=0
[2020-06-02 03:10:15.638286] I [master(worker
/nodirectwritedata/gluster/gvol0):1404:process] _GMaster: Batch Completed
changelog_end=1591067378entry_stime=(1591067167, 0)
 changelog_start=1591067364  stime=(1591067377, 0)   duration=12.8068
 num_changelogs=2mode=live_changelog
[2020-06-02 03:10:20.658601] I [master(worker
/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's time
 stime=(1591067377, 0)
[2020-06-02 03:10:34.21799] I [master(worker
/nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken
 duration=0.3826 num_files=8 job=1   return_code=0
[2020-06-02 03:10:46.440535] I [master(worker
/nodirectwritedata/gluster/gvol0):1384:process] _GMaster: Entry Time Taken
   MKD=0   MKN=0   LIN=0   SYM=0   REN=1   RMD=0CRE=2   duration=0.1314
UNL=1
[2020-06-02 03:10:46.440809] I [master(worker
/nodirectwritedata/gluster/gvol0):1394:process] _GMaster: Data/Metadata
Time TakenSETA=0  SETX=0  meta_duration=0.data_duration=13.0171
   DATA=14 XATT=0
[2020-06-02 03:10:46.441205] I [master(worker
/nodirectwritedata/gluster/gvol0):1404:process] _GMaster: Batch Completed
changelog_end=1591067420entry_stime=(1591067419, 0)
 changelog_start=1591067392  stime=(1591067419, 0)   duration=13.0322
 num_changelogs=3mode=live_changelog
[2020-06-02 03:10:51.460925] I [master(worker
/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's time
 stime=(1591067419, 0)

[2020-06-02 03:11:04.448913] I [master(worker
/nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken
duration=0.3466 num_files=3 job=1   return_code=0

Whereas the other master nodes only log this:

[2020-06-02 03:11:33.886938] I [gsyncd(config-get):308:main] : Using
session config file
path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
[2020-06-02 03:11:33.993175] I [gsyncd(status):308:main] : Using
session config file
path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf

Can anyone help with what might cause the high CPU usage on one master
node? The process is this one, and is using 70-100% of CPU:

python2 /usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py
worker gvol0 nvfs10::gvol0 --feedback-fd 15 --local-path
/nodirectwritedata/gluster/gvol0 --local-node cafs30 --local-node-id
b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id
cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 12,11,9,13 --subvol-num 1
--resource-remote nvfs30 --resource-remote-id
1e698ccd-aeec-4ec4-96fe-383da8fc3b78
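
(A couple of generic ways to see where that CPU is going - whether it is the
python worker itself or the rsync/ssh children it spawns; 32048 is the worker
pid from the earlier "ps aux" output in this thread, so substitute the
current pid on your node:

top -H -p 32048
ps -o pid,pcpu,etime,cmd --ppid 32048

These are ordinary Linux tools, not GlusterFS-specific, and only narrow down
which part of the worker is busy.)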

Thank you in advance!




On Sat, 30 May 2020 at 20:20, Strahil Nikolov  wrote:

> Hey David,
>
> for me a gfid  mismatch means  that the file  was  replaced/recreated  -
> just like  vim in linux does (and it is expected for config file).
>
> Have  you checked the gfid  of  the file on both source and destination,
> do they really match or they are different ?
>
> What happens  when you move away the file  from the slave ,  does it fixes
> the issue ?
>
> Best Regards,
> Strahil Nikolov
>
> On 30 May 2020 at 1:10:56 GMT+03:00, David Cunningham <
> dcunning...@voisonics.com> wrote:
> >Hello,
> >
> >We're having an issue with a geo-replication process with unusually
> >high
> >CPU use and giving "Entry not present on master. Fixing gfid mismatch
> >in
> >slave" errors. Can anyone help on this?
> >
> >We have 3 GlusterFS replica nodes (we'll call the master), which also
> >push
> >data to a remote server (slave) using geo-replication. This has been
> >running fine for a couple of months, but yesterday one of the master
> >nodes
> >started having unusually high CPU use. It's this process:
> >
> >root@cafs30:/var/log/glusterfs# ps aux | grep 32048
> >root 32048 68.7  0.6 1843140 845756 ?  Rl   02:51 493:51
> >python2
> >/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py worker
> >gvol0 nvfs10::gvol0 --feedback-fd 15 --local-path
> >/nodirectwritedata/gluster/gvol0 --local-node cafs30 --local-node-id
> >b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id
> >cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 12,11,9,13 --subvol-num 1
> >--resource-remote nvfs30 --resource-remote-id
> >1e698ccd-aeec-4ec4-96fe-383da8fc3b78
> >
> >Here's what is being logged in
> >/var/log/

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-05-30 Thread Sunny Kumar
Hi David,

Looks like you are running a workload that involves lots of renames and
geo-rep is trying to handle those. You can try the patches below, which
will give you performance benefits.

[1]. https://review.gluster.org/#/c/glusterfs/+/23570/
[2]. https://review.gluster.org/#/c/glusterfs/+/23459/
[3]. https://review.gluster.org/#/c/glusterfs/+/22720/

/sunny

On Sat, May 30, 2020 at 9:20 AM Strahil Nikolov  wrote:
>
> Hey David,
>
> for me a gfid  mismatch means  that the file  was  replaced/recreated  -  
> just like  vim in linux does (and it is expected for config file).
>
> Have  you checked the gfid  of  the file on both source and destination,  do 
> they really match or they are different ?
>
> What happens  when you move away the file  from the slave ,  does it fixes 
> the issue ?
>
> Best Regards,
> Strahil Nikolov
>
> On 30 May 2020 at 1:10:56 GMT+03:00, David Cunningham
> wrote:
> >Hello,
> >
> >We're having an issue with a geo-replication process with unusually
> >high
> >CPU use and giving "Entry not present on master. Fixing gfid mismatch
> >in
> >slave" errors. Can anyone help on this?
> >
> >We have 3 GlusterFS replica nodes (we'll call the master), which also
> >push
> >data to a remote server (slave) using geo-replication. This has been
> >running fine for a couple of months, but yesterday one of the master
> >nodes
> >started having unusually high CPU use. It's this process:
> >
> >root@cafs30:/var/log/glusterfs# ps aux | grep 32048
> >root 32048 68.7  0.6 1843140 845756 ?  Rl   02:51 493:51
> >python2
> >/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py worker
> >gvol0 nvfs10::gvol0 --feedback-fd 15 --local-path
> >/nodirectwritedata/gluster/gvol0 --local-node cafs30 --local-node-id
> >b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id
> >cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 12,11,9,13 --subvol-num 1
> >--resource-remote nvfs30 --resource-remote-id
> >1e698ccd-aeec-4ec4-96fe-383da8fc3b78
> >
> >Here's what is being logged in
> >/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log:
> >
> >[2020-05-29 21:57:18.843524] I [master(worker
> >/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's time
> > stime=(1590789408, 0)
> >[2020-05-29 21:57:30.626172] I [master(worker
> >/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]
> >_GMaster: Entry not present on master. Fixing gfid mismatch in slave.
> >Deleting the entryretry_count=1   entry=({u'uid': 108, u'gfid':
> >u'7c0b75e5-d8b7-454f-8010-112d613c599e', u'gid': 117, u'mode': 33204,
> >u'entry':
> >u'.gfid/c5422396-1578-4b50-a29d-315be2a9c5d8/00a859f7.cfg',
> >u'op': u'CREATE'}, 17, {u'slave_isdir': False, u'gfid_mismatch': True,
> >u'slave_name': None, u'slave_gfid':
> >u'ec4b0ace-2ec4-4ea5-adbc-9f519b81917c', u'name_mismatch': False,
> >u'dst':
> >False})
> >[2020-05-29 21:57:30.627893] I [master(worker
> >/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]
> >_GMaster: Entry not present on master. Fixing gfid mismatch in slave.
> >Deleting the entryretry_count=1   entry=({u'uid': 108, u'gfid':
> >u'a4d52e40-2e2f-4885-be5f-65fe95a8ebd7', u'gid': 117, u'mode': 33204,
> >u'entry':
> >u'.gfid/f857c42e-22f1-4ce4-8f2e-13bdadedde45/polycom_00a859f7.cfg',
> >u'op': u'CREATE'}, 17, {u'slave_isdir': False, u'gfid_mismatch': True,
> >u'slave_name': None, u'slave_gfid':
> >u'ece8da77-b5ea-45a7-9af7-7d4d8f55f74a', u'name_mismatch': False,
> >u'dst':
> >False})
> >[2020-05-29 21:57:30.629532] I [master(worker
> >/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]
> >_GMaster: Entry not present on master. Fixing gfid mismatch in slave.
> >Deleting the entryretry_count=1   entry=({u'uid': 108, u'gfid':
> >u'3c525ad8-aeb2-46b6-9c41-7fb4987916f8', u'gid': 117, u'mode': 33204,
> >u'entry':
> >u'.gfid/f857c42e-22f1-4ce4-8f2e-13bdadedde45/00a859f7-directory.xml',
> >u'op': u'CREATE'}, 17, {u'slave_isdir': False, u'gfid_mismatch': True,
> >u'slave_name': None, u'slave_gfid':
> >u'06717b5a-d842-495d-bd25-aab9cd454490', u'name_mismatch': False,
> >u'dst':
> >False})
> >[2020-05-29 21:57:30.659123] I [master(worker
> >/nodirectwritedata/gluster/gvol0):942:handle_entry_failures] _GMaster:
> >Sucessfully fixed entry ops with gfid mismatch retry_count=1
> >[2020-05-29 21:57:30.659343] I [master(worker
> >/nodirectwritedata/gluster/gvol0):1194:process_change] _GMaster: Retry
> >original entries. count = 1
> >[2020-05-29 21:57:30.725810] I [master(worker
> >/nodirectwritedata/gluster/gvol0):1197:process_change] _GMaster:
> >Sucessfully fixed all entry ops with gfid mismatch
> >[2020-05-29 21:57:31.747319] I [master(worker
> >/nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken
> >duration=0.7409 num_files=18job=1   return_code=0
> >
> >We've verified that the files like polycom_00a859f7.cfg referred to
> >in
> >the error do exist on the master nodes and slave.
> >
> >We found this bug fix:
> >https://bugzilla.redhat.

Re: [Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

2020-05-30 Thread Strahil Nikolov
Hey David,

For me a gfid mismatch means that the file was replaced/recreated - just
like vim in linux does (and that is expected for a config file).

Have you checked the gfid of the file on both source and destination - do
they really match or are they different?
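
(A rough way to compare them, run as root directly against the brick backend
on each side; the primary brick path below is the one seen elsewhere in this
thread, and <relative-path-of-file> / <secondary-brick-path> are placeholders
to fill in yourself:

getfattr -n trusted.gfid -e hex \
    /nodirectwritedata/gluster/gvol0/<relative-path-of-file>
getfattr -n trusted.gfid -e hex \
    /<secondary-brick-path>/<relative-path-of-file>

The two hex values should be identical if the gfids really match.)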

What happens when you move the file away from the slave - does it fix the
issue?

Best Regards,
Strahil Nikolov

On 30 May 2020 at 1:10:56 GMT+03:00, David Cunningham
wrote:
>Hello,
>
>We're having an issue with a geo-replication process with unusually
>high
>CPU use and giving "Entry not present on master. Fixing gfid mismatch
>in
>slave" errors. Can anyone help on this?
>
>We have 3 GlusterFS replica nodes (we'll call the master), which also
>push
>data to a remote server (slave) using geo-replication. This has been
>running fine for a couple of months, but yesterday one of the master
>nodes
>started having unusually high CPU use. It's this process:
>
>root@cafs30:/var/log/glusterfs# ps aux | grep 32048
>root 32048 68.7  0.6 1843140 845756 ?  Rl   02:51 493:51
>python2
>/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py worker
>gvol0 nvfs10::gvol0 --feedback-fd 15 --local-path
>/nodirectwritedata/gluster/gvol0 --local-node cafs30 --local-node-id
>b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id
>cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 12,11,9,13 --subvol-num 1
>--resource-remote nvfs30 --resource-remote-id
>1e698ccd-aeec-4ec4-96fe-383da8fc3b78
>
>Here's what is being logged in
>/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log:
>
>[2020-05-29 21:57:18.843524] I [master(worker
>/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's time
> stime=(1590789408, 0)
>[2020-05-29 21:57:30.626172] I [master(worker
>/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]
>_GMaster: Entry not present on master. Fixing gfid mismatch in slave.
>Deleting the entryretry_count=1   entry=({u'uid': 108, u'gfid':
>u'7c0b75e5-d8b7-454f-8010-112d613c599e', u'gid': 117, u'mode': 33204,
>u'entry':
>u'.gfid/c5422396-1578-4b50-a29d-315be2a9c5d8/00a859f7.cfg',
>u'op': u'CREATE'}, 17, {u'slave_isdir': False, u'gfid_mismatch': True,
>u'slave_name': None, u'slave_gfid':
>u'ec4b0ace-2ec4-4ea5-adbc-9f519b81917c', u'name_mismatch': False,
>u'dst':
>False})
>[2020-05-29 21:57:30.627893] I [master(worker
>/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]
>_GMaster: Entry not present on master. Fixing gfid mismatch in slave.
>Deleting the entryretry_count=1   entry=({u'uid': 108, u'gfid':
>u'a4d52e40-2e2f-4885-be5f-65fe95a8ebd7', u'gid': 117, u'mode': 33204,
>u'entry':
>u'.gfid/f857c42e-22f1-4ce4-8f2e-13bdadedde45/polycom_00a859f7.cfg',
>u'op': u'CREATE'}, 17, {u'slave_isdir': False, u'gfid_mismatch': True,
>u'slave_name': None, u'slave_gfid':
>u'ece8da77-b5ea-45a7-9af7-7d4d8f55f74a', u'name_mismatch': False,
>u'dst':
>False})
>[2020-05-29 21:57:30.629532] I [master(worker
>/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]
>_GMaster: Entry not present on master. Fixing gfid mismatch in slave.
>Deleting the entryretry_count=1   entry=({u'uid': 108, u'gfid':
>u'3c525ad8-aeb2-46b6-9c41-7fb4987916f8', u'gid': 117, u'mode': 33204,
>u'entry':
>u'.gfid/f857c42e-22f1-4ce4-8f2e-13bdadedde45/00a859f7-directory.xml',
>u'op': u'CREATE'}, 17, {u'slave_isdir': False, u'gfid_mismatch': True,
>u'slave_name': None, u'slave_gfid':
>u'06717b5a-d842-495d-bd25-aab9cd454490', u'name_mismatch': False,
>u'dst':
>False})
>[2020-05-29 21:57:30.659123] I [master(worker
>/nodirectwritedata/gluster/gvol0):942:handle_entry_failures] _GMaster:
>Sucessfully fixed entry ops with gfid mismatch retry_count=1
>[2020-05-29 21:57:30.659343] I [master(worker
>/nodirectwritedata/gluster/gvol0):1194:process_change] _GMaster: Retry
>original entries. count = 1
>[2020-05-29 21:57:30.725810] I [master(worker
>/nodirectwritedata/gluster/gvol0):1197:process_change] _GMaster:
>Sucessfully fixed all entry ops with gfid mismatch
>[2020-05-29 21:57:31.747319] I [master(worker
>/nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken
>duration=0.7409 num_files=18job=1   return_code=0
>
>We've verified that the files like polycom_00a859f7.cfg referred to
>in
>the error do exist on the master nodes and slave.
>
>We found this bug fix:
>https://bugzilla.redhat.com/show_bug.cgi?id=1642865
>
>However that fix went in 5.1, and we're running 5.12 on the master
>nodes
>and slave. A couple of GlusterFS clients connected to the master nodes
>are
>running 5.13.
>
>Would anyone have any suggestions? Thank you in advance.






Re: [Gluster-users] Geo-Replication File not Found on /.glusterfs/XX/XX/XXXXXXXXXXXX

2020-03-25 Thread Senén Vidal Blanco
Hi,
I have verified that the system is read-only; it does not let me delete or
create files inside the slave volume.
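(One way to double-check that, assuming the secondary volume name
archivossamil shown in the configuration quoted below - this only reports
whether the read-only option is set on the volume, nothing more:

gluster volume get archivossamil features.read-only

If it reports "on", writes from a regular client mount are rejected by
design.)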
I am sending you the logs I have from before stopping the geo-replication.

Archivos.log
--

[2020-03-18 20:47:57.950339] I [MSGID: 100030] [glusterfsd.c:2867:main] 0-/
usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 7.3 (args: /
usr/sbin/glusterfs --acl --process-name fuse --volfile-server=samil --volfile-
id=/archivossamil /archivos) 
[2020-03-18 20:47:57.952274] I [glusterfsd.c:2594:daemonize] 0-glusterfs: Pid 
of current running process is 5779
[2020-03-18 20:47:57.959404] I [MSGID: 101190] [event-epoll.c:
682:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0 
[2020-03-18 20:47:57.959535] I [MSGID: 101190] [event-epoll.c:
682:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 
[2020-03-18 20:47:57.978600] I [MSGID: 114020] [client.c:2436:notify] 0-
archivossamil-client-0: parent translators are ready, attempting connect on 
transport 
Final graph:
+--
+
  1: volume archivossamil-client-0
  2: type protocol/client
  3: option ping-timeout 42
  4: option remote-host samil
  5: option remote-subvolume /brickarchivos/archivos
  6: option transport-type socket
  7: option transport.address-family inet
  8: option username 892b695d-8a06-42d8-9502-87146d2eab50
  9: option password 37130c51-989b-420d-a8d2-46bb3255435a
 10: option transport.socket.ssl-enabled off
 11: option transport.tcp-user-timeout 0
 12: option transport.socket.keepalive-time 20
 13: option transport.socket.keepalive-interval 2
 14: option transport.socket.keepalive-count 9
 15: option send-gids true
 16: end-volume
 17:  
 18: volume archivossamil-dht
 19: type cluster/distribute
 20: option lock-migration off
 21: option force-migration off
 22: subvolumes archivossamil-client-0
 23: end-volume
 24:  
 25: volume archivossamil-utime
 26: type features/utime
 27: option noatime on
 28: subvolumes archivossamil-dht
 29: end-volume
 30:  
 31: volume archivossamil-write-behind
 32: type performance/write-behind
 33: subvolumes archivossamil-utime
 34: end-volume
 35:  
 36: volume archivossamil-read-ahead
 37: type performance/read-ahead
 38: subvolumes archivossamil-write-behind
 39: end-volume
 40:  
 41: volume archivossamil-readdir-ahead
 42: type performance/readdir-ahead
 43: option parallel-readdir off
 44: option rda-request-size 131072
 45: option rda-cache-limit 10MB
 46: subvolumes archivossamil-read-ahead
 47: end-volume
 48:  
 49: volume archivossamil-io-cache
 50: type performance/io-cache
 51: subvolumes archivossamil-readdir-ahead
 52: end-volume
 53:  
 54: volume archivossamil-open-behind
 55: type performance/open-behind
 56: subvolumes archivossamil-io-cache
 57: end-volume
 58:  
 59: volume archivossamil-quick-read
 60: type performance/quick-read
 61: subvolumes archivossamil-open-behind
 62: end-volume
 63:  
 64: volume archivossamil-md-cache
 65: type performance/md-cache
 66: option cache-posix-acl true
 67: subvolumes archivossamil-quick-read
 68: end-volume
 69:  
 70: volume archivossamil-io-threads
 71: type performance/io-threads
 72: subvolumes archivossamil-md-cache
 73: end-volume
 74:  
 75: volume archivossamil
 76: type debug/io-stats
 77: option log-level INFO
 78: option threads 16
 79: option latency-measurement off
 80: option count-fop-hits off
 81: option global-threading off
 82: subvolumes archivossamil-io-threads
 83: end-volume
 84:  
 85: volume posix-acl-autoload
 86: type system/posix-acl
 87: subvolumes archivossamil
 88: end-volume
 89:  
 90: volume meta-autoload
 91: type meta
 92: subvolumes posix-acl-autoload
 93: end-volume
 94:  
+--
+
[2020-03-18 20:47:57.979407] I [rpc-clnt.c:1963:rpc_clnt_reconfig] 0-
archivossamil-client-0: changing port to 49153 (from 0)
[2020-03-18 20:47:57.979662] I [socket.c:865:__socket_shutdown] 0-
archivossamil-client-0: intentional socket shutdown(12)
[2020-03-18 20:47:57.980195] I [MSGID: 114057] [client-handshake.c:
1375:select_server_supported_programs] 0-archivossamil-client-0: Using Program 
GlusterFS 4.x v1, Num (1298437), Version (400) 
[2020-03-18 20:47:57.983891] I [MSGID: 114046] [client-handshake.c:
1105:client_setvolume_cbk] 0-archivossamil-client-0: Connected to 
archivossamil-client-0, attached to remote volume '/brickarchivos/archivos'. 
[2020-03-18 20:47:57.985173] I [fuse-bridge.c:5166:fuse_init] 0-glusterfs-
fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.27
[2020-03-18 20:47:57.985205] I [fuse-bridge.c:5777:fuse_graph_sync] 0-fuse: 
switched to graph 0
[2020-03-18 21:04:09.804023] I [glusterfsd

Re: [Gluster-users] Geo-Replication File not Found on /.glusterfs/XX/XX/XXXXXXXXXXXX

2020-03-25 Thread Sunny Kumar
Hi Senén,

By any chance, did you perform any operation on the slave volume, such as
deleting data directly from it?

Also, if possible, please share the geo-rep slave logs.

/sunny

On Wed, Mar 25, 2020 at 9:15 AM Senén Vidal Blanco
 wrote:
>
> Hi,
> I have a problem with the Geo-Replication system.
> The first synchronization was successful a few days ago. But after a bit of
> filming I run into an error message preventing the sync from continuing.
> I summarize a little the data on the configuration:
>
> Debian 10
> Glusterfs 7.3
> Master volume: archivosvao
> Slave volume: archivossamil
>
> volume geo-replication archivosvao samil::archivossamil config
> access_mount:false
> allow_network:
> change_detector:changelog
> change_interval:5
> changelog_archive_format:%Y%m
> changelog_batch_size:727040
> changelog_log_file:/var/log/glusterfs/geo-replication/
> archivosvao_samil_archivossamil/changes-${local_id}.log
> changelog_log_level:INFO
> checkpoint:0
> cli_log_file:/var/log/glusterfs/geo-replication/cli.log
> cli_log_level:INFO
> connection_timeout:60
> georep_session_working_dir:/var/lib/glusterd/geo-replication/
> archivosvao_samil_archivossamil/
> gfid_conflict_resolution:true
> gluster_cli_options:
> gluster_command:gluster
> gluster_command_dir:/usr/sbin
> gluster_log_file:/var/log/glusterfs/geo-replication/
> archivosvao_samil_archivossamil/mnt-${local_id}.log
> gluster_log_level:INFO
> gluster_logdir:/var/log/glusterfs
> gluster_params:aux-gfid-mount acl
> gluster_rundir:/var/run/gluster
> glusterd_workdir:/var/lib/glusterd
> gsyncd_miscdir:/var/lib/misc/gluster/gsyncd
> ignore_deletes:false
> isolated_slaves:
> log_file:/var/log/glusterfs/geo-replication/archivosvao_samil_archivossamil/
> gsyncd.log
> log_level:INFO
> log_rsync_performance:false
> master_disperse_count:1
> master_distribution_count:1
> master_replica_count:1
> max_rsync_retries:10
> meta_volume_mnt:/var/run/gluster/shared_storage
> pid_file:/var/run/gluster/gsyncd-archivosvao-samil-archivossamil.pid
> remote_gsyncd:
> replica_failover_interval:1
> rsync_command:rsync
> rsync_opt_existing:true
> rsync_opt_ignore_missing_args:true
> rsync_options:
> rsync_ssh_options:
> slave_access_mount:false
> slave_gluster_command_dir:/usr/sbin
> slave_gluster_log_file:/var/log/glusterfs/geo-replication-slaves/
> archivosvao_samil_archivossamil/mnt-${master_node}-${master_brick_id}.log
> slave_gluster_log_file_mbr:/var/log/glusterfs/geo-replication-slaves/
> archivosvao_samil_archivossamil/mnt-mbr-${master_node}-${master_brick_id}.log
> slave_gluster_log_level:INFO
> slave_gluster_params:aux-gfid-mount acl
> slave_log_file:/var/log/glusterfs/geo-replication-slaves/
> archivosvao_samil_archivossamil/gsyncd.log
> slave_log_level:INFO
> slave_timeout:120
> special_sync_mode:
> ssh_command:ssh
> ssh_options:-oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/
> lib/glusterd/geo-replication/secret.pem
> ssh_options_tar:-oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /
> var/lib/glusterd/geo-replication/tar_ssh.pem
> ssh_port:22
> state_file:/var/lib/glusterd/geo-replication/archivosvao_samil_archivossamil/
> monitor.status
> state_socket_unencoded:
> stime_xattr_prefix:trusted.glusterfs.c7fa7778-
> f2e4-48f9-8817-5811c09964d5.8d4c7ef7-35fc-497a-9425-66f4aced159b
> sync_acls:true
> sync_jobs:3
> sync_method:rsync
> sync_xattrs:true
> tar_command:tar
> use_meta_volume:false
> use_rsync_xattrs:false
> working_dir:/var/lib/misc/gluster/gsyncd/archivosvao_samil_archivossamil/
>
>
> gluster> volume info
>
> Volume Name: archivossamil
> Type: Distribute
> Volume ID: 8d4c7ef7-35fc-497a-9425-66f4aced159b
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: samil:/brickarchivos/archivos
> Options Reconfigured:
> nfs.disable: on
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> features.read-only: on
>
> Volume Name: archivosvao
> Type: Distribute
> Volume ID: c7fa7778-f2e4-48f9-8817-5811c09964d5
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: vao:/brickarchivos/archivos
> Options Reconfigured:
> nfs.disable: on
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> geo-replication.indexing: on
> geo-replication.ignore-pid-check: on
> changelog.changelog: on
>
> Volume Name: home
> Type: Replicate
> Volume ID: 74522542-5d7a-4fdd-9cea-76bf1ff27e7d
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: samil:/brickhome/home
> Brick2: vao:/brickhome/home
> Options Reconfigured:
> performance.client-io-threads: off
> nfs.disable: on
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
>
>
> These errors appear in the master logs:
>
>
>
> .
>
> [2020-03-25 09:00:12.554226] I [master(worker /brickarchivos/archivos):
> 1991:syncjob] Syncer: Sync Time Takenjob=1   num_files=2 return_code=0
> duration=0.0483
> [2020-03-25 09:00:12

Re: [Gluster-users] geo-replication sync issue

2020-03-18 Thread Strahil Nikolov
On March 18, 2020 1:41:15 PM GMT+02:00, "Etem Bayoğlu"  
wrote:
>Yes I had tried.. my observation in my issue is : glusterfs crawler did
>not
>exit from a specific directory that had been synced already.  Like a
>infinite loop. It was crawling that directory endlessly. I tried so
>many
>things an time goes on.
>So I gave up and switched to nfs + rsync for now. This issue is getting
>me
>angry.
>
>
>Thank community for help. ;)
>
>On 18 Mar 2020 Wed at 09:00 Kotresh Hiremath Ravishankar <
>khire...@redhat.com> wrote:
>
>> Could you try disabling syncing xattrs and check ?
>>
>> gluster vol geo-rep  :: config
>sync-xattrs
>> false
>>
>> On Fri, Mar 13, 2020 at 1:42 AM Strahil Nikolov
>
>> wrote:
>>
>>> On March 12, 2020 9:41:45 AM GMT+02:00, "Etem Bayoğlu" <
>>> etembayo...@gmail.com> wrote:
>>> >Hello again,
>>> >
>>> >These are gsyncd.log from master on DEBUG level. It tells entering
>>> >directory, synced files , and gfid information
>>> >
>>> >[2020-03-12 07:18:16.702286] D [master(worker
>>> >/srv/media-storage):324:regjob] _GMaster: synced
>>> >file=.gfid/358fe62c-c7e8-449a-90dd-1cc1a3b7a346
>>> >[2020-03-12 07:18:16.702420] D [master(worker
>>> >/srv/media-storage):324:regjob] _GMaster: synced
>>> >file=.gfid/04eb63e3-7fcb-45d2-9f29-6292a5072adb
>>> >[2020-03-12 07:18:16.702574] D [master(worker
>>> >/srv/media-storage):324:regjob] _GMaster: synced
>>> >file=.gfid/4363e521-d81a-4a0f-bfa4-5ee6b92da2b4
>>> >[2020-03-12 07:18:16.702704] D [master(worker
>>> >/srv/media-storage):324:regjob] _GMaster: synced
>>> >file=.gfid/bed30509-2c5f-4c77-b2f9-81916a99abd9
>>> >[2020-03-12 07:18:16.702828] D [master(worker
>>> >/srv/media-storage):324:regjob] _GMaster: synced
>>> >file=.gfid/d86f44cc-3001-4bdf-8bae-6bed2a9c8381
>>> >[2020-03-12 07:18:16.702950] D [master(worker
>>> >/srv/media-storage):324:regjob] _GMaster: synced
>>> >file=.gfid/da40d429-d89e-4dc9-9dda-07922d87b3c8
>>> >[2020-03-12 07:18:16.703075] D [master(worker
>>> >/srv/media-storage):324:regjob] _GMaster: synced
>>> >file=.gfid/befc5e03-b7a1-43dc-b6c2-0a186019b6d5
>>> >[2020-03-12 07:18:16.703198] D [master(worker
>>> >/srv/media-storage):324:regjob] _GMaster: synced
>>> >file=.gfid/4e66035f-99f9-4802-b876-2e01686d18f2
>>> >[2020-03-12 07:18:16.703378] D [master(worker
>>> >/srv/media-storage):324:regjob] _GMaster: synced
>>> >file=.gfid/d1295b51-e461-4766-b504-8e9a941a056f
>>> >[2020-03-12 07:18:16.719875] D [master(worker
>>> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
>>> >./api/media/listing/2018/06-02/1557813
>>> >[2020-03-12 07:18:17.72679] D [master(worker
>>> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
>>> >./api/media/listing/2018/06-02/1557205
>>> >[2020-03-12 07:18:17.297362] D [master(worker
>>> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
>>> >./api/media/listing/2018/06-02/1556880
>>> >[2020-03-12 07:18:17.488224] D [master(worker
>>> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
>>> >./api/media/listing/2018/06-02/1557769
>>> >[2020-03-12 07:18:17.730181] D [master(worker
>>> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
>>> >./api/media/listing/2018/06-02/1557028
>>> >[2020-03-12 07:18:17.869410] I [gsyncd(config-get):318:main] :
>>> >Using
>>> >session config file
>>>
>>>
>>path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
>>> >[2020-03-12 07:18:18.65431] D [master(worker
>>> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
>>> >./api/media/listing/2018/06-02/1558442
>>> >[2020-03-12 07:18:18.352381] D [master(worker
>>> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
>>> >./api/media/listing/2018/06-02/1557391
>>> >[2020-03-12 07:18:18.374876] I [gsyncd(config-get):318:main] :
>>> >Using
>>> >session config file
>>>
>>>
>>path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
>>> >[2020-03-12 07:18:18.482299] I [gsyncd(config-set):318:main] :
>>> >Using
>>> >session config file
>>>
>>>
>>path=/var/lib/glusterd/geo-replication/media-storage_slave-nodem_dr-media/gsyncd.conf
>>> >[2020-03-12 07:18:18.507585] D [master(worker
>>> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
>>> >./api/media/listing/2018/06-02/1558577
>>> >[2020-03-12 07:18:18.576061] I [gsyncd(config-get):318:main] :
>>> >Using
>>> >session config file
>>>
>>>
>>path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
>>> >[2020-03-12 07:18:18.582772] D [master(worker
>>> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
>>> >./api/media/listing/2018/06-02/1556831
>>> >[2020-03-12 07:18:18.684170] I [gsyncd(config-get):318:main] :
>>> >Using
>>> >session config file
>>>
>>>
>>path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
>>> >[2020-03-12 07:18:18.691845] E [syncdutils(worker
>>> >/srv/media-storage):312:log_raise_exception] : connection to
>peer
>>> >is
>>> >broken
>>> >[2020-03-12 07:18:18.692106] E [syncdutils(worker
>>> >/srv/media-storage):312:log_raise_exception] : connec

Re: [Gluster-users] geo-replication sync issue

2020-03-18 Thread Etem Bayoğlu
Yes, I had tried that. My observation in this issue is that the GlusterFS
crawler did not exit from a specific directory that had already been synced,
like an infinite loop: it kept crawling that directory endlessly. I tried many
things as time went on, so I gave up and switched to NFS + rsync for now. This
issue is getting me angry.
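
For reference, a minimal sketch of that kind of interim one-way copy, assuming
both volumes are mounted locally on the node doing the sync (the mount points
below are placeholders, not the actual paths from this setup):

# preserve hard links, ACLs and xattrs; delete files removed on the source
rsync -aHAX --partial --delete /mnt/media-storage/ /mnt/dr-media/

Something like this can be run from cron until a working geo-replication
session is re-established.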


Thanks to the community for the help. ;)

On 18 Mar 2020 Wed at 09:00 Kotresh Hiremath Ravishankar <
khire...@redhat.com> wrote:

> Could you try disabling syncing xattrs and check ?
>
> gluster vol geo-rep  :: config sync-xattrs
> false
>
> On Fri, Mar 13, 2020 at 1:42 AM Strahil Nikolov 
> wrote:
>
>> On March 12, 2020 9:41:45 AM GMT+02:00, "Etem Bayoğlu" <
>> etembayo...@gmail.com> wrote:
>> >Hello again,
>> >
>> >These are gsyncd.log from master on DEBUG level. It tells entering
>> >directory, synced files , and gfid information
>> >
>> >[2020-03-12 07:18:16.702286] D [master(worker
>> >/srv/media-storage):324:regjob] _GMaster: synced
>> >file=.gfid/358fe62c-c7e8-449a-90dd-1cc1a3b7a346
>> >[2020-03-12 07:18:16.702420] D [master(worker
>> >/srv/media-storage):324:regjob] _GMaster: synced
>> >file=.gfid/04eb63e3-7fcb-45d2-9f29-6292a5072adb
>> >[2020-03-12 07:18:16.702574] D [master(worker
>> >/srv/media-storage):324:regjob] _GMaster: synced
>> >file=.gfid/4363e521-d81a-4a0f-bfa4-5ee6b92da2b4
>> >[2020-03-12 07:18:16.702704] D [master(worker
>> >/srv/media-storage):324:regjob] _GMaster: synced
>> >file=.gfid/bed30509-2c5f-4c77-b2f9-81916a99abd9
>> >[2020-03-12 07:18:16.702828] D [master(worker
>> >/srv/media-storage):324:regjob] _GMaster: synced
>> >file=.gfid/d86f44cc-3001-4bdf-8bae-6bed2a9c8381
>> >[2020-03-12 07:18:16.702950] D [master(worker
>> >/srv/media-storage):324:regjob] _GMaster: synced
>> >file=.gfid/da40d429-d89e-4dc9-9dda-07922d87b3c8
>> >[2020-03-12 07:18:16.703075] D [master(worker
>> >/srv/media-storage):324:regjob] _GMaster: synced
>> >file=.gfid/befc5e03-b7a1-43dc-b6c2-0a186019b6d5
>> >[2020-03-12 07:18:16.703198] D [master(worker
>> >/srv/media-storage):324:regjob] _GMaster: synced
>> >file=.gfid/4e66035f-99f9-4802-b876-2e01686d18f2
>> >[2020-03-12 07:18:16.703378] D [master(worker
>> >/srv/media-storage):324:regjob] _GMaster: synced
>> >file=.gfid/d1295b51-e461-4766-b504-8e9a941a056f
>> >[2020-03-12 07:18:16.719875] D [master(worker
>> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
>> >./api/media/listing/2018/06-02/1557813
>> >[2020-03-12 07:18:17.72679] D [master(worker
>> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
>> >./api/media/listing/2018/06-02/1557205
>> >[2020-03-12 07:18:17.297362] D [master(worker
>> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
>> >./api/media/listing/2018/06-02/1556880
>> >[2020-03-12 07:18:17.488224] D [master(worker
>> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
>> >./api/media/listing/2018/06-02/1557769
>> >[2020-03-12 07:18:17.730181] D [master(worker
>> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
>> >./api/media/listing/2018/06-02/1557028
>> >[2020-03-12 07:18:17.869410] I [gsyncd(config-get):318:main] :
>> >Using
>> >session config file
>>
>> >path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
>> >[2020-03-12 07:18:18.65431] D [master(worker
>> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
>> >./api/media/listing/2018/06-02/1558442
>> >[2020-03-12 07:18:18.352381] D [master(worker
>> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
>> >./api/media/listing/2018/06-02/1557391
>> >[2020-03-12 07:18:18.374876] I [gsyncd(config-get):318:main] :
>> >Using
>> >session config file
>>
>> >path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
>> >[2020-03-12 07:18:18.482299] I [gsyncd(config-set):318:main] :
>> >Using
>> >session config file
>>
>> >path=/var/lib/glusterd/geo-replication/media-storage_slave-nodem_dr-media/gsyncd.conf
>> >[2020-03-12 07:18:18.507585] D [master(worker
>> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
>> >./api/media/listing/2018/06-02/1558577
>> >[2020-03-12 07:18:18.576061] I [gsyncd(config-get):318:main] :
>> >Using
>> >session config file
>>
>> >path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
>> >[2020-03-12 07:18:18.582772] D [master(worker
>> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
>> >./api/media/listing/2018/06-02/1556831
>> >[2020-03-12 07:18:18.684170] I [gsyncd(config-get):318:main] :
>> >Using
>> >session config file
>>
>> >path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
>> >[2020-03-12 07:18:18.691845] E [syncdutils(worker
>> >/srv/media-storage):312:log_raise_exception] : connection to peer
>> >is
>> >broken
>> >[2020-03-12 07:18:18.692106] E [syncdutils(worker
>> >/srv/media-storage):312:log_raise_exception] : connection to peer
>> >is
>> >broken
>> >[2020-03-12 07:18:18.694910] E [syncdutils(worker
>> >/srv/media-storage):822:errlog] Popen: command returned error cmd=ssh
>> >-oPasswordAuthentication=

Re: [Gluster-users] geo-replication sync issue

2020-03-17 Thread Kotresh Hiremath Ravishankar
Could you try disabling syncing of xattrs and check?

gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config sync-xattrs false
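
For this particular session, a concrete invocation would presumably look like
the lines below, using the volume and host names that appear in the logs in
this thread (adjust to the real session names):

gluster volume geo-replication media-storage slave-node::dr-media config sync-xattrs false
gluster volume geo-replication media-storage slave-node::dr-media config

The second command dumps the full session configuration, so the changed
sync_xattrs value can be checked before the session is restarted.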

On Fri, Mar 13, 2020 at 1:42 AM Strahil Nikolov 
wrote:

> On March 12, 2020 9:41:45 AM GMT+02:00, "Etem Bayoğlu" <
> etembayo...@gmail.com> wrote:
> >Hello again,
> >
> >These are gsyncd.log from master on DEBUG level. It tells entering
> >directory, synced files , and gfid information
> >
> >[2020-03-12 07:18:16.702286] D [master(worker
> >/srv/media-storage):324:regjob] _GMaster: synced
> >file=.gfid/358fe62c-c7e8-449a-90dd-1cc1a3b7a346
> >[2020-03-12 07:18:16.702420] D [master(worker
> >/srv/media-storage):324:regjob] _GMaster: synced
> >file=.gfid/04eb63e3-7fcb-45d2-9f29-6292a5072adb
> >[2020-03-12 07:18:16.702574] D [master(worker
> >/srv/media-storage):324:regjob] _GMaster: synced
> >file=.gfid/4363e521-d81a-4a0f-bfa4-5ee6b92da2b4
> >[2020-03-12 07:18:16.702704] D [master(worker
> >/srv/media-storage):324:regjob] _GMaster: synced
> >file=.gfid/bed30509-2c5f-4c77-b2f9-81916a99abd9
> >[2020-03-12 07:18:16.702828] D [master(worker
> >/srv/media-storage):324:regjob] _GMaster: synced
> >file=.gfid/d86f44cc-3001-4bdf-8bae-6bed2a9c8381
> >[2020-03-12 07:18:16.702950] D [master(worker
> >/srv/media-storage):324:regjob] _GMaster: synced
> >file=.gfid/da40d429-d89e-4dc9-9dda-07922d87b3c8
> >[2020-03-12 07:18:16.703075] D [master(worker
> >/srv/media-storage):324:regjob] _GMaster: synced
> >file=.gfid/befc5e03-b7a1-43dc-b6c2-0a186019b6d5
> >[2020-03-12 07:18:16.703198] D [master(worker
> >/srv/media-storage):324:regjob] _GMaster: synced
> >file=.gfid/4e66035f-99f9-4802-b876-2e01686d18f2
> >[2020-03-12 07:18:16.703378] D [master(worker
> >/srv/media-storage):324:regjob] _GMaster: synced
> >file=.gfid/d1295b51-e461-4766-b504-8e9a941a056f
> >[2020-03-12 07:18:16.719875] D [master(worker
> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
> >./api/media/listing/2018/06-02/1557813
> >[2020-03-12 07:18:17.72679] D [master(worker
> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
> >./api/media/listing/2018/06-02/1557205
> >[2020-03-12 07:18:17.297362] D [master(worker
> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
> >./api/media/listing/2018/06-02/1556880
> >[2020-03-12 07:18:17.488224] D [master(worker
> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
> >./api/media/listing/2018/06-02/1557769
> >[2020-03-12 07:18:17.730181] D [master(worker
> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
> >./api/media/listing/2018/06-02/1557028
> >[2020-03-12 07:18:17.869410] I [gsyncd(config-get):318:main] :
> >Using
> >session config file
>
> >path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
> >[2020-03-12 07:18:18.65431] D [master(worker
> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
> >./api/media/listing/2018/06-02/1558442
> >[2020-03-12 07:18:18.352381] D [master(worker
> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
> >./api/media/listing/2018/06-02/1557391
> >[2020-03-12 07:18:18.374876] I [gsyncd(config-get):318:main] :
> >Using
> >session config file
>
> >path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
> >[2020-03-12 07:18:18.482299] I [gsyncd(config-set):318:main] :
> >Using
> >session config file
>
> >path=/var/lib/glusterd/geo-replication/media-storage_slave-nodem_dr-media/gsyncd.conf
> >[2020-03-12 07:18:18.507585] D [master(worker
> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
> >./api/media/listing/2018/06-02/1558577
> >[2020-03-12 07:18:18.576061] I [gsyncd(config-get):318:main] :
> >Using
> >session config file
>
> >path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
> >[2020-03-12 07:18:18.582772] D [master(worker
> >/srv/media-storage):1792:Xcrawl] _GMaster: entering
> >./api/media/listing/2018/06-02/1556831
> >[2020-03-12 07:18:18.684170] I [gsyncd(config-get):318:main] :
> >Using
> >session config file
>
> >path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
> >[2020-03-12 07:18:18.691845] E [syncdutils(worker
> >/srv/media-storage):312:log_raise_exception] : connection to peer
> >is
> >broken
> >[2020-03-12 07:18:18.692106] E [syncdutils(worker
> >/srv/media-storage):312:log_raise_exception] : connection to peer
> >is
> >broken
> >[2020-03-12 07:18:18.694910] E [syncdutils(worker
> >/srv/media-storage):822:errlog] Popen: command returned error cmd=ssh
> >-oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
> >/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto
> >-S
> >/tmp/gsyncd-aux-ssh-WaMqpG/241afba5343394352fc3f9c251909232.sock
> >slave-node
> >/nonexistent/gsyncd slave media-storage slave-node::dr-media
> >--master-node
> >master-node --master-node-id 023cdb20-2737-4278-93c2-0927917ee314
> >--master-brick /srv/media-storage --local-node slave-node
> >--local-node-id
> >cf34fc96-a08a-49c2-b8eb-a3df5a05f757 --slave-timeout 120
> >--slave-log-level
> >DEBUG --slave-gluster-log-level INFO --slave

Re: [Gluster-users] geo-replication sync issue

2020-03-12 Thread Strahil Nikolov
On March 12, 2020 9:41:45 AM GMT+02:00, "Etem Bayoğlu"  
wrote:
>Hello again,
>
>These are gsyncd.log from master on DEBUG level. It tells entering
>directory, synced files , and gfid information
>
>[2020-03-12 07:18:16.702286] D [master(worker
>/srv/media-storage):324:regjob] _GMaster: synced
>file=.gfid/358fe62c-c7e8-449a-90dd-1cc1a3b7a346
>[2020-03-12 07:18:16.702420] D [master(worker
>/srv/media-storage):324:regjob] _GMaster: synced
>file=.gfid/04eb63e3-7fcb-45d2-9f29-6292a5072adb
>[2020-03-12 07:18:16.702574] D [master(worker
>/srv/media-storage):324:regjob] _GMaster: synced
>file=.gfid/4363e521-d81a-4a0f-bfa4-5ee6b92da2b4
>[2020-03-12 07:18:16.702704] D [master(worker
>/srv/media-storage):324:regjob] _GMaster: synced
>file=.gfid/bed30509-2c5f-4c77-b2f9-81916a99abd9
>[2020-03-12 07:18:16.702828] D [master(worker
>/srv/media-storage):324:regjob] _GMaster: synced
>file=.gfid/d86f44cc-3001-4bdf-8bae-6bed2a9c8381
>[2020-03-12 07:18:16.702950] D [master(worker
>/srv/media-storage):324:regjob] _GMaster: synced
>file=.gfid/da40d429-d89e-4dc9-9dda-07922d87b3c8
>[2020-03-12 07:18:16.703075] D [master(worker
>/srv/media-storage):324:regjob] _GMaster: synced
>file=.gfid/befc5e03-b7a1-43dc-b6c2-0a186019b6d5
>[2020-03-12 07:18:16.703198] D [master(worker
>/srv/media-storage):324:regjob] _GMaster: synced
>file=.gfid/4e66035f-99f9-4802-b876-2e01686d18f2
>[2020-03-12 07:18:16.703378] D [master(worker
>/srv/media-storage):324:regjob] _GMaster: synced
>file=.gfid/d1295b51-e461-4766-b504-8e9a941a056f
>[2020-03-12 07:18:16.719875] D [master(worker
>/srv/media-storage):1792:Xcrawl] _GMaster: entering
>./api/media/listing/2018/06-02/1557813
>[2020-03-12 07:18:17.72679] D [master(worker
>/srv/media-storage):1792:Xcrawl] _GMaster: entering
>./api/media/listing/2018/06-02/1557205
>[2020-03-12 07:18:17.297362] D [master(worker
>/srv/media-storage):1792:Xcrawl] _GMaster: entering
>./api/media/listing/2018/06-02/1556880
>[2020-03-12 07:18:17.488224] D [master(worker
>/srv/media-storage):1792:Xcrawl] _GMaster: entering
>./api/media/listing/2018/06-02/1557769
>[2020-03-12 07:18:17.730181] D [master(worker
>/srv/media-storage):1792:Xcrawl] _GMaster: entering
>./api/media/listing/2018/06-02/1557028
>[2020-03-12 07:18:17.869410] I [gsyncd(config-get):318:main] :
>Using
>session config file
>path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
>[2020-03-12 07:18:18.65431] D [master(worker
>/srv/media-storage):1792:Xcrawl] _GMaster: entering
>./api/media/listing/2018/06-02/1558442
>[2020-03-12 07:18:18.352381] D [master(worker
>/srv/media-storage):1792:Xcrawl] _GMaster: entering
>./api/media/listing/2018/06-02/1557391
>[2020-03-12 07:18:18.374876] I [gsyncd(config-get):318:main] :
>Using
>session config file
>path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
>[2020-03-12 07:18:18.482299] I [gsyncd(config-set):318:main] :
>Using
>session config file
>path=/var/lib/glusterd/geo-replication/media-storage_slave-nodem_dr-media/gsyncd.conf
>[2020-03-12 07:18:18.507585] D [master(worker
>/srv/media-storage):1792:Xcrawl] _GMaster: entering
>./api/media/listing/2018/06-02/1558577
>[2020-03-12 07:18:18.576061] I [gsyncd(config-get):318:main] :
>Using
>session config file
>path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
>[2020-03-12 07:18:18.582772] D [master(worker
>/srv/media-storage):1792:Xcrawl] _GMaster: entering
>./api/media/listing/2018/06-02/1556831
>[2020-03-12 07:18:18.684170] I [gsyncd(config-get):318:main] :
>Using
>session config file
>path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
>[2020-03-12 07:18:18.691845] E [syncdutils(worker
>/srv/media-storage):312:log_raise_exception] : connection to peer
>is
>broken
>[2020-03-12 07:18:18.692106] E [syncdutils(worker
>/srv/media-storage):312:log_raise_exception] : connection to peer
>is
>broken
>[2020-03-12 07:18:18.694910] E [syncdutils(worker
>/srv/media-storage):822:errlog] Popen: command returned error cmd=ssh
>-oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
>/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto
>-S
>/tmp/gsyncd-aux-ssh-WaMqpG/241afba5343394352fc3f9c251909232.sock
>slave-node
>/nonexistent/gsyncd slave media-storage slave-node::dr-media
>--master-node
>master-node --master-node-id 023cdb20-2737-4278-93c2-0927917ee314
>--master-brick /srv/media-storage --local-node slave-node
>--local-node-id
>cf34fc96-a08a-49c2-b8eb-a3df5a05f757 --slave-timeout 120
>--slave-log-level
>DEBUG --slave-gluster-log-level INFO --slave-gluster-command-dir
>/usr/sbin
>--master-dist-count 1 error=255
>[2020-03-12 07:18:18.701545] E [syncdutils(worker
>/srv/media-storage):826:logerr] Popen: ssh> Killed by signal 15.
>[2020-03-12 07:18:18.721456] I [repce(agent
>/srv/media-storage):96:service_loop] RepceServer: terminating on
>reaching
>EOF.
>[2020-03-12 07:18:18.778527] I
>[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus

Re: [Gluster-users] geo-replication sync issue

2020-03-12 Thread Etem Bayoğlu
Hello again,

These are the gsyncd.log entries from the master at DEBUG level. They show the
directories being entered, the files synced, and the GFID information.
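
For reference, the DEBUG level used here can presumably be set per session via
the geo-replication config interface; the option names below mirror the
log_level / slave_log_level keys that a config dump prints, so treat this as a
sketch rather than a verified transcript:

gluster volume geo-replication media-storage slave-node::dr-media config log_level DEBUG
gluster volume geo-replication media-storage slave-node::dr-media config slave_log_level DEBUG

Dropping the levels back to INFO afterwards keeps the log files from growing
too quickly.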

[2020-03-12 07:18:16.702286] D [master(worker
/srv/media-storage):324:regjob] _GMaster: synced
file=.gfid/358fe62c-c7e8-449a-90dd-1cc1a3b7a346
[2020-03-12 07:18:16.702420] D [master(worker
/srv/media-storage):324:regjob] _GMaster: synced
file=.gfid/04eb63e3-7fcb-45d2-9f29-6292a5072adb
[2020-03-12 07:18:16.702574] D [master(worker
/srv/media-storage):324:regjob] _GMaster: synced
file=.gfid/4363e521-d81a-4a0f-bfa4-5ee6b92da2b4
[2020-03-12 07:18:16.702704] D [master(worker
/srv/media-storage):324:regjob] _GMaster: synced
file=.gfid/bed30509-2c5f-4c77-b2f9-81916a99abd9
[2020-03-12 07:18:16.702828] D [master(worker
/srv/media-storage):324:regjob] _GMaster: synced
file=.gfid/d86f44cc-3001-4bdf-8bae-6bed2a9c8381
[2020-03-12 07:18:16.702950] D [master(worker
/srv/media-storage):324:regjob] _GMaster: synced
file=.gfid/da40d429-d89e-4dc9-9dda-07922d87b3c8
[2020-03-12 07:18:16.703075] D [master(worker
/srv/media-storage):324:regjob] _GMaster: synced
file=.gfid/befc5e03-b7a1-43dc-b6c2-0a186019b6d5
[2020-03-12 07:18:16.703198] D [master(worker
/srv/media-storage):324:regjob] _GMaster: synced
file=.gfid/4e66035f-99f9-4802-b876-2e01686d18f2
[2020-03-12 07:18:16.703378] D [master(worker
/srv/media-storage):324:regjob] _GMaster: synced
file=.gfid/d1295b51-e461-4766-b504-8e9a941a056f
[2020-03-12 07:18:16.719875] D [master(worker
/srv/media-storage):1792:Xcrawl] _GMaster: entering
./api/media/listing/2018/06-02/1557813
[2020-03-12 07:18:17.72679] D [master(worker
/srv/media-storage):1792:Xcrawl] _GMaster: entering
./api/media/listing/2018/06-02/1557205
[2020-03-12 07:18:17.297362] D [master(worker
/srv/media-storage):1792:Xcrawl] _GMaster: entering
./api/media/listing/2018/06-02/1556880
[2020-03-12 07:18:17.488224] D [master(worker
/srv/media-storage):1792:Xcrawl] _GMaster: entering
./api/media/listing/2018/06-02/1557769
[2020-03-12 07:18:17.730181] D [master(worker
/srv/media-storage):1792:Xcrawl] _GMaster: entering
./api/media/listing/2018/06-02/1557028
[2020-03-12 07:18:17.869410] I [gsyncd(config-get):318:main] : Using
session config file
path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
[2020-03-12 07:18:18.65431] D [master(worker
/srv/media-storage):1792:Xcrawl] _GMaster: entering
./api/media/listing/2018/06-02/1558442
[2020-03-12 07:18:18.352381] D [master(worker
/srv/media-storage):1792:Xcrawl] _GMaster: entering
./api/media/listing/2018/06-02/1557391
[2020-03-12 07:18:18.374876] I [gsyncd(config-get):318:main] : Using
session config file
path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
[2020-03-12 07:18:18.482299] I [gsyncd(config-set):318:main] : Using
session config file
path=/var/lib/glusterd/geo-replication/media-storage_slave-nodem_dr-media/gsyncd.conf
[2020-03-12 07:18:18.507585] D [master(worker
/srv/media-storage):1792:Xcrawl] _GMaster: entering
./api/media/listing/2018/06-02/1558577
[2020-03-12 07:18:18.576061] I [gsyncd(config-get):318:main] : Using
session config file
path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
[2020-03-12 07:18:18.582772] D [master(worker
/srv/media-storage):1792:Xcrawl] _GMaster: entering
./api/media/listing/2018/06-02/1556831
[2020-03-12 07:18:18.684170] I [gsyncd(config-get):318:main] : Using
session config file
path=/var/lib/glusterd/geo-replication/media-storage_slave-node_dr-media/gsyncd.conf
[2020-03-12 07:18:18.691845] E [syncdutils(worker
/srv/media-storage):312:log_raise_exception] : connection to peer is
broken
[2020-03-12 07:18:18.692106] E [syncdutils(worker
/srv/media-storage):312:log_raise_exception] : connection to peer is
broken
[2020-03-12 07:18:18.694910] E [syncdutils(worker
/srv/media-storage):822:errlog] Popen: command returned error cmd=ssh
-oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S
/tmp/gsyncd-aux-ssh-WaMqpG/241afba5343394352fc3f9c251909232.sock slave-node
/nonexistent/gsyncd slave media-storage slave-node::dr-media --master-node
master-node --master-node-id 023cdb20-2737-4278-93c2-0927917ee314
--master-brick /srv/media-storage --local-node slave-node --local-node-id
cf34fc96-a08a-49c2-b8eb-a3df5a05f757 --slave-timeout 120 --slave-log-level
DEBUG --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/sbin
--master-dist-count 1 error=255
[2020-03-12 07:18:18.701545] E [syncdutils(worker
/srv/media-storage):826:logerr] Popen: ssh> Killed by signal 15.
[2020-03-12 07:18:18.721456] I [repce(agent
/srv/media-storage):96:service_loop] RepceServer: terminating on reaching
EOF.
[2020-03-12 07:18:18.778527] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
Change status=Faulty
[2020-03-12 07:18:19.791198] I [gsyncd(config-get):318:main] : Using
session config file
path=/var/lib/glusterd/geo-replication/media-stora

Re: [Gluster-users] geo-replication sync issue

2020-03-12 Thread Etem Bayoğlu
Hi,

Here are my slave node logs from the time the sync stopped:

[2020-03-08 03:33:01.489559] I [glusterfsd-mgmt.c:2282:mgmt_getspec_cbk]
0-glusterfs: No change in volfile,continuing
[2020-03-08 03:33:01.489298] I [MSGID: 100011]
[glusterfsd.c:1679:reincarnate] 0-glusterfsd: Fetching the volume file from
server...
[2020-03-08 09:49:37.991177] I [fuse-bridge.c:6083:fuse_thread_proc]
0-fuse: initiating unmount of /tmp/gsyncd-aux-mount-l3PR6o
[2020-03-08 09:49:37.993978] W [glusterfsd.c:1596:cleanup_and_exit]
(-->/lib64/libpthread.so.0(+0x7e65) [0x7f2f9f70ce65]
-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55cc67c20625]
-->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55cc67c2048b] ) 0-:
received signum (15), shutting down
[2020-03-08 09:49:37.994012] I [fuse-bridge.c:6871:fini] 0-fuse: Unmounting
'/tmp/gsyncd-aux-mount-l3PR6o'.
[2020-03-08 09:49:37.994022] I [fuse-bridge.c:6876:fini] 0-fuse: Closing
fuse connection to '/tmp/gsyncd-aux-mount-l3PR6o'.
[2020-03-08 09:49:50.302806] I [MSGID: 100030] [glusterfsd.c:2867:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 7.3
(args: /usr/sbin/glusterfs --aux-gfid-mount --acl --log-level=INFO
--log-file=/var/log/glusterfs/geo-replication-slaves/media-storage_slave-node_dr-media/mnt-master-node-srv-media-storage.log
--volfile-server=localhost --volfile-id=dr-media --client-pid=-1
/tmp/gsyncd-aux-mount-1AQBe4)
[2020-03-08 09:49:50.311167] I [glusterfsd.c:2594:daemonize] 0-glusterfs:
Pid of current running process is 55522
[2020-03-08 09:49:50.352351] I [MSGID: 101190]
[event-epoll.c:682:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 0
[2020-03-08 09:49:50.352416] I [MSGID: 101190]
[event-epoll.c:682:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
[2020-03-08 09:49:50.373248] I [MSGID: 114020] [client.c:2436:notify]
0-dr-media-client-0: parent translators are ready, attempting connect on
transport
Final graph:
+--+
  1: volume dr-media-client-0
  2: type protocol/client
  3: option ping-timeout 42
  4: option remote-host slave-node
  5: option remote-subvolume /data/dr-media
  6: option transport-type socket
  7: option transport.address-family inet
  8: option username 4aafadfa-6ccb-4c2f-920c-1f37ed9eef34
  9: option password a8c0f88b-2621-4038-8f65-98068ea71bb0
 10: option transport.socket.ssl-enabled off
 11: option transport.tcp-user-timeout 0
 12: option transport.socket.keepalive-time 20
 13: option transport.socket.keepalive-interval 2
 14: option transport.socket.keepalive-count 9
 15: option send-gids true
 16: end-volume
 17:
 18: volume dr-media-dht
 19: type cluster/distribute
 20: option lock-migration off
 21: option force-migration off
 22: subvolumes dr-media-client-0
 23: end-volume
 24:
 25: volume dr-media-write-behind
 26: type performance/write-behind
 27: option cache-size 8MB
 28: option aggregate-size 1MB
 29: subvolumes dr-media-dht
 30: end-volume
 31:
 32: volume dr-media-read-ahead
 33: type performance/read-ahead
 34: subvolumes dr-media-write-behind
 35: end-volume
 36:
 37: volume dr-media-readdir-ahead
 38: type performance/readdir-ahead
 39: option parallel-readdir off
 40: option rda-request-size 131072
 41: option rda-cache-limit 10MB
 42: subvolumes dr-media-read-ahead
 43: end-volume
 44:
 45: volume dr-media-io-cache
 46: type performance/io-cache
 47: option cache-size 256MB
 48: subvolumes dr-media-readdir-ahead
 49: end-volume
 50:
 51: volume dr-media-open-behind
 52: type performance/open-behind
 53: subvolumes dr-media-io-cache
 54: end-volume
 55:
 56: volume dr-media-quick-read
 57: type performance/quick-read
 58: option cache-size 256MB
 59: subvolumes dr-media-open-behind
 60: end-volume
 61:
 62: volume dr-media-md-cache
 63: type performance/md-cache
 64: option cache-posix-acl true
 65: subvolumes dr-media-quick-read
 66: end-volume
 67:
 68: volume dr-media-io-threads
 69: type performance/io-threads
 70: subvolumes dr-media-md-cache
 71: end-volume
 72:
 73: volume dr-media
 74: type debug/io-stats
 75: option log-level INFO
 76: option threads 16
 77: option latency-measurement off
 78: option count-fop-hits off
 79: option global-threading off
 80: subvolumes dr-media-io-threads
 81: end-volume
 82:
 83: volume posix-acl-autoload
 84: type system/posix-acl
 85: subvolumes dr-media
 86: end-volume
 87:
 88: volume gfid-access-autoload
 89: type features/gfid-access
 90: subvolumes posix-acl-autoload
 91: end-volume
 92:
 93: volume meta-autoload
 94: type meta
 95: subvolumes gfid-access-autoload
 96: end-volume
 97:
+--+
[2020-03-08 09:49:50.388102] I [rpc-clnt.c:1963:rpc_clnt_reconfig]
0-dr-media

Re: [Gluster-users] geo-replication sync issue

2020-03-11 Thread Strahil Nikolov
On March 11, 2020 10:17:05 PM GMT+02:00, "Etem Bayoğlu"  
wrote:
>Hi Strahil,
>
>Thank you for your response. when I tail logs on both master and slave
>I
>get this:
>
>on slave, from
>/var/log/glusterfs/geo-replication-slaves//mnt-XXX.log
>file:
>
>[2020-03-11 19:53:32.721509] E
>[fuse-bridge.c:227:check_and_dump_fuse_W]
>(--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13a)[0x7f78e10488ea]
>(-->
>/usr/lib64/glusterfs/7.3/xlator/mount/fuse.so(+0x8221)[0x7f78d83f6221]
>(-->
>/usr/lib64/glusterfs/7.3/xlator/mount/fuse.so(+0x9998)[0x7f78d83f7998]
>(-->
>/lib64/libpthread.so.0(+0x7e65)[0x7f78dfe89e65] (-->
>/lib64/libc.so.6(clone+0x6d)[0x7f78df74f88d] ) 0-glusterfs-fuse:
>writing to fuse device failed: No such file or directory
>[2020-03-11 19:53:32.723758] E
>[fuse-bridge.c:227:check_and_dump_fuse_W]
>(--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13a)[0x7f78e10488ea]
>(-->
>/usr/lib64/glusterfs/7.3/xlator/mount/fuse.so(+0x8221)[0x7f78d83f6221]
>(-->
>/usr/lib64/glusterfs/7.3/xlator/mount/fuse.so(+0x9998)[0x7f78d83f7998]
>(-->
>/lib64/libpthread.so.0(+0x7e65)[0x7f78dfe89e65] (-->
>/lib64/libc.so.6(clone+0x6d)[0x7f78df74f88d] ) 0-glusterfs-fuse:
>writing to fuse device failed: No such file or directory
>
>on master,
>from /var/log/glusterfs/geo-replication//mnt-XXX.log file:
>
>[2020-03-11 19:40:55.872002] E [fuse-bridge.c:4188:fuse_xattr_cbk]
>0-glusterfs-fuse: extended attribute not supported by the backend
>storage
>[2020-03-11 19:40:58.389748] E
>[fuse-bridge.c:227:check_and_dump_fuse_W]
>(--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13a)[0x7f1f4b9108ea]
>(-->
>/usr/lib64/glusterfs/7.3/xlator/mount/fuse.so(+0x8221)[0x7f1f42cc2221]
>(-->
>/usr/lib64/glusterfs/7.3/xlator/mount/fuse.so(+0x9998)[0x7f1f42cc3998]
>(-->
>/lib64/libpthread.so.0(+0x7e25)[0x7f1f4a751e25] (-->
>/lib64/libc.so.6(clone+0x6d)[0x7f1f4a01abad] ) 0-glusterfs-fuse:
>writing to fuse device failed: No such file or directory
>[2020-03-11 19:41:08.214591] E
>[fuse-bridge.c:227:check_and_dump_fuse_W]
>(--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13a)[0x7f1f4b9108ea]
>(-->
>/usr/lib64/glusterfs/7.3/xlator/mount/fuse.so(+0x8221)[0x7f1f42cc2221]
>(-->
>/usr/lib64/glusterfs/7.3/xlator/mount/fuse.so(+0x9998)[0x7f1f42cc3998]
>(-->
>/lib64/libpthread.so.0(+0x7e25)[0x7f1f4a751e25] (-->
>/lib64/libc.so.6(clone+0x6d)[0x7f1f4a01abad] ) 0-glusterfs-fuse:
>writing to fuse device failed: No such file or directory
>[2020-03-11 19:53:59.275469] E
>[fuse-bridge.c:227:check_and_dump_fuse_W]
>(--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13a)[0x7f1f4b9108ea]
>(-->
>/usr/lib64/glusterfs/7.3/xlator/mount/fuse.so(+0x8221)[0x7f1f42cc2221]
>(-->
>/usr/lib64/glusterfs/7.3/xlator/mount/fuse.so(+0x9998)[0x7f1f42cc3998]
>(-->
>/lib64/libpthread.so.0(+0x7e25)[0x7f1f4a751e25] (-->
>/lib64/libc.so.6(clone+0x6d)[0x7f1f4a01abad] ) 0-glusterfs-fuse:
>writing to fuse device failed: No such file or directory
>
>gsyncd.log outputs:##
>
>from slave:
>[2020-03-11 08:55:16.384085] I [repce(slave
>master-node/srv/media-storage):96:service_loop] RepceServer:
>terminating on
>reaching EOF.
>[2020-03-11 08:57:55.87364] I [resource(slave
>master-node/srv/media-storage):1105:connect] GLUSTER: Mounting gluster
>volume locally...
>[2020-03-11 08:57:56.171372] I [resource(slave
>master-node/srv/media-storage):1128:connect] GLUSTER: Mounted gluster
>volume duration=1.0837
>[2020-03-11 08:57:56.173346] I [resource(slave
>master-node/srv/media-storage):1155:service_loop] GLUSTER: slave
>listening
>
>from master:
>[2020-03-11 20:08:55.145453] I [master(worker
>/srv/media-storage):1991:syncjob] Syncer: Sync Time Taken
>duration=134.9987num_files=4661 job=2 return_code=0
>[2020-03-11 20:08:55.285871] I [master(worker
>/srv/media-storage):1421:process] _GMaster: Entry Time Taken MKD=83
>MKN=8109 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=17.0358 UNL=0
>[2020-03-11 20:08:55.286082] I [master(worker
>/srv/media-storage):1431:process] _GMaster: Data/Metadata Time Taken
>SETA=83 SETX=0 meta_duration=0.9334 data_duration=135.2497 DATA=8109
>XATT=0
>[2020-03-11 20:08:55.286410] I [master(worker
>/srv/media-storage):1441:process] _GMaster: Batch Completed
>changelog_end=1583917610 entry_stime=None changelog_start=1583917610
>stime=None duration=153.5185 num_changelogs=1 mode=xsync
>[2020-03-11 20:08:55.315442] I [master(worker
>/srv/media-storage):1681:crawl] _GMaster: processing xsync changelog
>path=/var/lib/misc/gluster/gsyncd/media-storage_daredevil01.zingat.com_dr-media/srv-media-storage/xsync/XSYNC-CHANGELOG.1583917613
>
>
>Thank you..
>
>Strahil Nikolov , 11 Mar 2020 Çar, 12:28
>tarihinde
>şunu yazdı:
>
>> On March 11, 2020 10:09:27 AM GMT+02:00, "Etem Bayoğlu" <
>> etembayo...@gmail.com> wrote:
>> >Hello community,
>> >
>> >I've set up a glusterfs geo-replication node for disaster recovery.
>I
>> >manage about 10TB media data on a gluster volume and I want to sync
>all
>> >data to remote location over WAN. So, I created a slav

Re: [Gluster-users] geo-replication sync issue

2020-03-11 Thread Etem Bayoğlu
Hi Strahil,

Thank you for your response. When I tail the logs on both master and slave I
get this:

on slave, from
/var/log/glusterfs/geo-replication-slaves//mnt-XXX.log file:

[2020-03-11 19:53:32.721509] E [fuse-bridge.c:227:check_and_dump_fuse_W]
(--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13a)[0x7f78e10488ea] (-->
/usr/lib64/glusterfs/7.3/xlator/mount/fuse.so(+0x8221)[0x7f78d83f6221] (-->
/usr/lib64/glusterfs/7.3/xlator/mount/fuse.so(+0x9998)[0x7f78d83f7998] (-->
/lib64/libpthread.so.0(+0x7e65)[0x7f78dfe89e65] (-->
/lib64/libc.so.6(clone+0x6d)[0x7f78df74f88d] ) 0-glusterfs-fuse:
writing to fuse device failed: No such file or directory
[2020-03-11 19:53:32.723758] E [fuse-bridge.c:227:check_and_dump_fuse_W]
(--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13a)[0x7f78e10488ea] (-->
/usr/lib64/glusterfs/7.3/xlator/mount/fuse.so(+0x8221)[0x7f78d83f6221] (-->
/usr/lib64/glusterfs/7.3/xlator/mount/fuse.so(+0x9998)[0x7f78d83f7998] (-->
/lib64/libpthread.so.0(+0x7e65)[0x7f78dfe89e65] (-->
/lib64/libc.so.6(clone+0x6d)[0x7f78df74f88d] ) 0-glusterfs-fuse:
writing to fuse device failed: No such file or directory

on master,
from /var/log/glusterfs/geo-replication//mnt-XXX.log file:

[2020-03-11 19:40:55.872002] E [fuse-bridge.c:4188:fuse_xattr_cbk]
0-glusterfs-fuse: extended attribute not supported by the backend storage
[2020-03-11 19:40:58.389748] E [fuse-bridge.c:227:check_and_dump_fuse_W]
(--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13a)[0x7f1f4b9108ea] (-->
/usr/lib64/glusterfs/7.3/xlator/mount/fuse.so(+0x8221)[0x7f1f42cc2221] (-->
/usr/lib64/glusterfs/7.3/xlator/mount/fuse.so(+0x9998)[0x7f1f42cc3998] (-->
/lib64/libpthread.so.0(+0x7e25)[0x7f1f4a751e25] (-->
/lib64/libc.so.6(clone+0x6d)[0x7f1f4a01abad] ) 0-glusterfs-fuse:
writing to fuse device failed: No such file or directory
[2020-03-11 19:41:08.214591] E [fuse-bridge.c:227:check_and_dump_fuse_W]
(--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13a)[0x7f1f4b9108ea] (-->
/usr/lib64/glusterfs/7.3/xlator/mount/fuse.so(+0x8221)[0x7f1f42cc2221] (-->
/usr/lib64/glusterfs/7.3/xlator/mount/fuse.so(+0x9998)[0x7f1f42cc3998] (-->
/lib64/libpthread.so.0(+0x7e25)[0x7f1f4a751e25] (-->
/lib64/libc.so.6(clone+0x6d)[0x7f1f4a01abad] ) 0-glusterfs-fuse:
writing to fuse device failed: No such file or directory
[2020-03-11 19:53:59.275469] E [fuse-bridge.c:227:check_and_dump_fuse_W]
(--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13a)[0x7f1f4b9108ea] (-->
/usr/lib64/glusterfs/7.3/xlator/mount/fuse.so(+0x8221)[0x7f1f42cc2221] (-->
/usr/lib64/glusterfs/7.3/xlator/mount/fuse.so(+0x9998)[0x7f1f42cc3998] (-->
/lib64/libpthread.so.0(+0x7e25)[0x7f1f4a751e25] (-->
/lib64/libc.so.6(clone+0x6d)[0x7f1f4a01abad] ) 0-glusterfs-fuse:
writing to fuse device failed: No such file or directory

gsyncd.log outputs:##

from slave:
[2020-03-11 08:55:16.384085] I [repce(slave
master-node/srv/media-storage):96:service_loop] RepceServer: terminating on
reaching EOF.
[2020-03-11 08:57:55.87364] I [resource(slave
master-node/srv/media-storage):1105:connect] GLUSTER: Mounting gluster
volume locally...
[2020-03-11 08:57:56.171372] I [resource(slave
master-node/srv/media-storage):1128:connect] GLUSTER: Mounted gluster
volume duration=1.0837
[2020-03-11 08:57:56.173346] I [resource(slave
master-node/srv/media-storage):1155:service_loop] GLUSTER: slave listening

from master:
[2020-03-11 20:08:55.145453] I [master(worker
/srv/media-storage):1991:syncjob] Syncer: Sync Time Taken
duration=134.9987num_files=4661 job=2 return_code=0
[2020-03-11 20:08:55.285871] I [master(worker
/srv/media-storage):1421:process] _GMaster: Entry Time Taken MKD=83
MKN=8109 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=17.0358 UNL=0
[2020-03-11 20:08:55.286082] I [master(worker
/srv/media-storage):1431:process] _GMaster: Data/Metadata Time Taken
SETA=83 SETX=0 meta_duration=0.9334 data_duration=135.2497 DATA=8109 XATT=0
[2020-03-11 20:08:55.286410] I [master(worker
/srv/media-storage):1441:process] _GMaster: Batch Completed
changelog_end=1583917610 entry_stime=None changelog_start=1583917610
stime=None duration=153.5185 num_changelogs=1 mode=xsync
[2020-03-11 20:08:55.315442] I [master(worker
/srv/media-storage):1681:crawl] _GMaster: processing xsync changelog
path=/var/lib/misc/gluster/gsyncd/media-storage_daredevil01.zingat.com_dr-media/srv-media-storage/xsync/XSYNC-CHANGELOG.1583917613


Thank you..

Strahil Nikolov , 11 Mar 2020 Çar, 12:28 tarihinde
şunu yazdı:

> On March 11, 2020 10:09:27 AM GMT+02:00, "Etem Bayoğlu" <
> etembayo...@gmail.com> wrote:
> >Hello community,
> >
> >I've set up a glusterfs geo-replication node for disaster recovery. I
> >manage about 10TB media data on a gluster volume and I want to sync all
> >data to remote location over WAN. So, I created a slave node volume on
> >disaster recovery center on remote location and I've started geo-rep
> >session. It has been transferred data fine up to about 800GB, but
> >syncing
> >has stopped for th

Re: [Gluster-users] geo-replication sync issue

2020-03-11 Thread Strahil Nikolov
On March 11, 2020 10:09:27 AM GMT+02:00, "Etem Bayoğlu"  
wrote:
>Hello community,
>
>I've set up a glusterfs geo-replication node for disaster recovery. I
>manage about 10TB media data on a gluster volume and I want to sync all
>data to remote location over WAN. So, I created a slave node volume on
>disaster recovery center on remote location and I've started geo-rep
>session. It has been transferred data fine up to about 800GB, but
>syncing
>has stopped for three days despite gluster geo-rep status active and
>hybrid
>crawl. There is no sending data. I've recreated session and restarted
>but
>still the same.
>
>#gluster volu geo-rep status
>
>MASTER NODEMASTER VOL   MASTER BRICK  SLAVE
>USER
>SLAVE SLAVE NODE   
>STATUS
>   CRAWL STATUSLAST_SYNCED
>
>master-node   media-storage/srv/media-storageroot
> ssh://slave-node::dr-mediaslave-node  Active
>Hybrid Crawl N/A
>
>Any idea? please. Thank you.

Hi Etem,

Have you checked the logs on both the source and the destination? Maybe they
can hint at what the issue is.
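
A quick way to watch both ends is to tail the per-session logs; the session
directory below is a placeholder and its name differs per setup:

# on the master node
tail -f /var/log/glusterfs/geo-replication/<session>/gsyncd.log
# on the slave node
tail -f /var/log/glusterfs/geo-replication-slaves/<session>/gsyncd.log

These paths correspond to the log_file and slave_log_file entries that
'gluster volume geo-replication ... config' prints.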

Best Regards,
Strahil Nikolov




Community Meeting Calendar:

Schedule -
Every Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo-replication

2020-03-04 Thread David Cunningham
Hi Aravinda and Strahil,

The cluster is new, so it wasn't a big deal to re-do it using public addresses.
That's done and geo-replication is working.

Thank you for your help!


On Wed, 4 Mar 2020 at 17:17, Aravinda VK  wrote:

> Hi David,
>
> I like the Strahil’s idea of adding remote IPs in /etc/hosts with same
> name as used in B cluster. Since Geo-replication uses ssh for syncing it
> should work. Only issue I can think about is if the hostname of cluster B
> conflicts with hostnames of Cluster A.
>
> —
> regards
> Aravinda Vishwanathapura
> https://kadalu.io
>
> On 04-Mar-2020, at 4:13 AM, David Cunningham 
> wrote:
>
> Hi Strahil,
>
> The B cluster are communicating with each other via a LAN, and it seems
> the A cluster has got B's LAN addresses (which aren't accessible from the
> internet including the A cluster) through the geo-replication process. That
> being the case, I think we'll have to re-do the B cluster to replicate
> using public addresses instead of the LAN.
>
> Thank you.
>
>
> On Tue, 3 Mar 2020 at 18:07, Strahil Nikolov 
> wrote:
>
>> On March 3, 2020 4:13:38 AM GMT+02:00, David Cunningham <
>> dcunning...@voisonics.com> wrote:
>> >Hello,
>> >
>> >Thanks for that. When we re-tried with push-pem from cafs10 (on the
>> >A/master cluster) it failed with "Unable to mount and fetch slave
>> >volume
>> >details." and in the logs we see:
>> >
>> >[2020-03-03 02:07:42.614911] E
>> >[name.c:258:af_inet_client_get_remote_sockaddr] 0-gvol0-client-0: DNS
>> >resolution failed on host nvfs10.local
>> >[2020-03-03 02:07:42.638824] E
>> >[name.c:258:af_inet_client_get_remote_sockaddr] 0-gvol0-client-1: DNS
>> >resolution failed on host nvfs20.local
>> >[2020-03-03 02:07:42.664493] E
>> >[name.c:258:af_inet_client_get_remote_sockaddr] 0-gvol0-client-2: DNS
>> >resolution failed on host nvfs30.local
>> >
>> >These .local addresses are the LAN addresses that B/slave nodes nvfs10,
>> >nvfs20, and nvfs30 replicate with. It seems that the A/master needs to
>> >be
>> >able to contact those addresses. Is that right? If it is then we'll
>> >need to
>> >re-do the B cluster to replicate using publicly accessible IP addresses
>> >instead of their LAN.
>> >
>> >Thank you.
>> >
>> >
>> >On Mon, 2 Mar 2020 at 20:53, Aravinda VK  wrote:
>> >
>> >> Looks like setup issue to me. Copying SSH keys manually is not
>> >required.
>> >>
>> >> Command prefix is required while adding to authorized_keys file in
>> >each
>> >> remote nodes. That will not be available if ssh keys are added
>> >manually.
>> >>
>> >> Geo-rep specifies /nonexisting/gsyncd in the command to make sure it
>> >> connects via the actual command specified in authorized_keys file, in
>> >your
>> >> case Geo-replication is actually looking for gsyncd command in
>> >> /nonexisting/gsyncd path.
>> >>
>> >> Please try with push-pem option during Geo-rep create command.
>> >>
>> >> —
>> >> regards
>> >> Aravinda Vishwanathapura
>> >> https://kadalu.io
>> >>
>> >>
>> >> On 02-Mar-2020, at 6:03 AM, David Cunningham
>> >
>> >> wrote:
>> >>
>> >> Hello,
>> >>
>> >> We've set up geo-replication but it isn't actually syncing. Scenario
>> >is
>> >> that we have two GFS clusters. Cluster A has nodes cafs10, cafs20,
>> >and
>> >> cafs30, replicating with each other over a LAN. Cluster B has nodes
>> >nvfs10,
>> >> nvfs20, and nvfs30 also replicating with each other over a LAN. We
>> >are
>> >> geo-replicating data from the A cluster to the B cluster over the
>> >internet.
>> >> SSH key access is set up, allowing all the A nodes password-less
>> >access to
>> >> root on nvfs10
>> >>
>> >> Geo-replication was set up using these commands, run on cafs10:
>> >>
>> >> gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 create
>> >> ssh-port 8822 no-verify
>> >> gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 config
>> >> remote-gsyncd /usr/lib/x86_64-linux-gnu/glusterfs/gsyncd
>> >> gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 start
>> >>
>> >> However after a very short period of the status being
>> >"Initializing..."
>> >> the status then sits on "Passive":
>> >>
>> >> # gluster volume geo-replication gvol0 nvfs10.example.com::gvol0
>> >status
>> >> MASTER NODEMASTER VOLMASTER BRICK
>> >SLAVE
>> >> USERSLAVE SLAVE NODE  STATUS
>> >CRAWL
>> >> STATUSLAST_SYNCED
>> >>
>> >>
>>
>> >--
>> >> cafs10 gvol0 /nodirectwritedata/gluster/gvol0root
>> >>  nvfs10.example.com::gvol0nvfs30.localPassiveN/A
>> >> N/A
>> >> cafs30 gvol0 /nodirectwritedata/gluster/gvol0root
>> >>  nvfs10.example.com::gvol0N/A CreatedN/A
>> >> N/A
>> >> cafs20 gvol0 /nodirectwritedata/gluster/gvol0root
>> >>  nvfs10.example.com::gvol0N/A Crea

Re: [Gluster-users] Geo-replication

2020-03-03 Thread Aravinda VK
Hi David,

I like Strahil's idea of adding the remote IPs to /etc/hosts with the same 
names as used in the B cluster. Since geo-replication uses ssh for syncing, it 
should work. The only issue I can think of is if the hostnames of cluster B 
conflict with hostnames of cluster A.
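
As a sketch, that would mean pinning the B-cluster brick hostnames to their
public IPs on every A-cluster node, for example in /etc/hosts (the addresses
below are placeholders):

203.0.113.11  nvfs10.local
203.0.113.12  nvfs20.local
203.0.113.13  nvfs30.local

This only helps if those public IPs actually expose the GlusterFS ports to the
A cluster, and, as noted above, if the names do not collide with hostnames
already used inside cluster A.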

—
regards
Aravinda Vishwanathapura
https://kadalu.io

> On 04-Mar-2020, at 4:13 AM, David Cunningham  
> wrote:
> 
> Hi Strahil,
> 
> The B cluster are communicating with each other via a LAN, and it seems the A 
> cluster has got B's LAN addresses (which aren't accessible from the internet 
> including the A cluster) through the geo-replication process. That being the 
> case, I think we'll have to re-do the B cluster to replicate using public 
> addresses instead of the LAN.
> 
> Thank you.
> 
> 
> On Tue, 3 Mar 2020 at 18:07, Strahil Nikolov  > wrote:
> On March 3, 2020 4:13:38 AM GMT+02:00, David Cunningham 
> mailto:dcunning...@voisonics.com>> wrote:
> >Hello,
> >
> >Thanks for that. When we re-tried with push-pem from cafs10 (on the
> >A/master cluster) it failed with "Unable to mount and fetch slave
> >volume
> >details." and in the logs we see:
> >
> >[2020-03-03 02:07:42.614911] E
> >[name.c:258:af_inet_client_get_remote_sockaddr] 0-gvol0-client-0: DNS
> >resolution failed on host nvfs10.local
> >[2020-03-03 02:07:42.638824] E
> >[name.c:258:af_inet_client_get_remote_sockaddr] 0-gvol0-client-1: DNS
> >resolution failed on host nvfs20.local
> >[2020-03-03 02:07:42.664493] E
> >[name.c:258:af_inet_client_get_remote_sockaddr] 0-gvol0-client-2: DNS
> >resolution failed on host nvfs30.local
> >
> >These .local addresses are the LAN addresses that B/slave nodes nvfs10,
> >nvfs20, and nvfs30 replicate with. It seems that the A/master needs to
> >be
> >able to contact those addresses. Is that right? If it is then we'll
> >need to
> >re-do the B cluster to replicate using publicly accessible IP addresses
> >instead of their LAN.
> >
> >Thank you.
> >
> >
> >On Mon, 2 Mar 2020 at 20:53, Aravinda VK  >> wrote:
> >
> >> Looks like setup issue to me. Copying SSH keys manually is not
> >required.
> >>
> >> Command prefix is required while adding to authorized_keys file in
> >each
> >> remote nodes. That will not be available if ssh keys are added
> >manually.
> >>
> >> Geo-rep specifies /nonexisting/gsyncd in the command to make sure it
> >> connects via the actual command specified in authorized_keys file, in
> >your
> >> case Geo-replication is actually looking for gsyncd command in
> >> /nonexisting/gsyncd path.
> >>
> >> Please try with push-pem option during Geo-rep create command.
> >>
> >> —
> >> regards
> >> Aravinda Vishwanathapura
> >> https://kadalu.io 
> >>
> >>
> >> On 02-Mar-2020, at 6:03 AM, David Cunningham
> >mailto:dcunning...@voisonics.com>>
> >> wrote:
> >>
> >> Hello,
> >>
> >> We've set up geo-replication but it isn't actually syncing. Scenario
> >is
> >> that we have two GFS clusters. Cluster A has nodes cafs10, cafs20,
> >and
> >> cafs30, replicating with each other over a LAN. Cluster B has nodes
> >nvfs10,
> >> nvfs20, and nvfs30 also replicating with each other over a LAN. We
> >are
> >> geo-replicating data from the A cluster to the B cluster over the
> >internet.
> >> SSH key access is set up, allowing all the A nodes password-less
> >access to
> >> root on nvfs10
> >>
> >> Geo-replication was set up using these commands, run on cafs10:
> >>
> >> gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 create
> >> ssh-port 8822 no-verify
> >> gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 config
> >> remote-gsyncd /usr/lib/x86_64-linux-gnu/glusterfs/gsyncd
> >> gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 start
> >>
> >> However after a very short period of the status being
> >"Initializing..."
> >> the status then sits on "Passive":
> >>
> >> # gluster volume geo-replication gvol0 nvfs10.example.com::gvol0
> >status
> >> MASTER NODEMASTER VOLMASTER BRICK   
> >SLAVE
> >> USERSLAVE SLAVE NODE  STATUS
> >CRAWL
> >> STATUSLAST_SYNCED
> >>
> >>
> >--
> >> cafs10 gvol0 /nodirectwritedata/gluster/gvol0root
> >>  nvfs10.example.com::gvol0nvfs30.localPassiveN/A
> >> N/A
> >> cafs30 gvol0 /nodirectwritedata/gluster/gvol0root
> >>  nvfs10.example.com::gvol0N/A CreatedN/A
> >> N/A
> >> cafs20 gvol0 /nodirectwritedata/gluster/gvol0root
> >>  nvfs10.example.com::gvol0N/A CreatedN/A
> >> N/A
> >>
> >> So my questions are:
> >> 1. Why does the status on cafs10 mention "nvfs30.local"? That's the
> >LAN
> >> address that nvfs10 replicates wit

Re: [Gluster-users] Geo-replication

2020-03-03 Thread David Cunningham
Hi Strahil,

The B cluster are communicating with each other via a LAN, and it seems the
A cluster has got B's LAN addresses (which aren't accessible from the
internet including the A cluster) through the geo-replication process. That
being the case, I think we'll have to re-do the B cluster to replicate
using public addresses instead of the LAN.

Thank you.


On Tue, 3 Mar 2020 at 18:07, Strahil Nikolov  wrote:

> On March 3, 2020 4:13:38 AM GMT+02:00, David Cunningham <
> dcunning...@voisonics.com> wrote:
> >Hello,
> >
> >Thanks for that. When we re-tried with push-pem from cafs10 (on the
> >A/master cluster) it failed with "Unable to mount and fetch slave
> >volume
> >details." and in the logs we see:
> >
> >[2020-03-03 02:07:42.614911] E
> >[name.c:258:af_inet_client_get_remote_sockaddr] 0-gvol0-client-0: DNS
> >resolution failed on host nvfs10.local
> >[2020-03-03 02:07:42.638824] E
> >[name.c:258:af_inet_client_get_remote_sockaddr] 0-gvol0-client-1: DNS
> >resolution failed on host nvfs20.local
> >[2020-03-03 02:07:42.664493] E
> >[name.c:258:af_inet_client_get_remote_sockaddr] 0-gvol0-client-2: DNS
> >resolution failed on host nvfs30.local
> >
> >These .local addresses are the LAN addresses that B/slave nodes nvfs10,
> >nvfs20, and nvfs30 replicate with. It seems that the A/master needs to
> >be
> >able to contact those addresses. Is that right? If it is then we'll
> >need to
> >re-do the B cluster to replicate using publicly accessible IP addresses
> >instead of their LAN.
> >
> >Thank you.
> >
> >
> >On Mon, 2 Mar 2020 at 20:53, Aravinda VK  wrote:
> >
> >> Looks like setup issue to me. Copying SSH keys manually is not
> >required.
> >>
> >> Command prefix is required while adding to authorized_keys file in
> >each
> >> remote nodes. That will not be available if ssh keys are added
> >manually.
> >>
> >> Geo-rep specifies /nonexisting/gsyncd in the command to make sure it
> >> connects via the actual command specified in authorized_keys file, in
> >your
> >> case Geo-replication is actually looking for gsyncd command in
> >> /nonexisting/gsyncd path.
> >>
> >> Please try with push-pem option during Geo-rep create command.
> >>
> >> —
> >> regards
> >> Aravinda Vishwanathapura
> >> https://kadalu.io
> >>
> >>
> >> On 02-Mar-2020, at 6:03 AM, David Cunningham
> >
> >> wrote:
> >>
> >> Hello,
> >>
> >> We've set up geo-replication but it isn't actually syncing. Scenario
> >is
> >> that we have two GFS clusters. Cluster A has nodes cafs10, cafs20,
> >and
> >> cafs30, replicating with each other over a LAN. Cluster B has nodes
> >nvfs10,
> >> nvfs20, and nvfs30 also replicating with each other over a LAN. We
> >are
> >> geo-replicating data from the A cluster to the B cluster over the
> >internet.
> >> SSH key access is set up, allowing all the A nodes password-less
> >access to
> >> root on nvfs10
> >>
> >> Geo-replication was set up using these commands, run on cafs10:
> >>
> >> gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 create
> >> ssh-port 8822 no-verify
> >> gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 config
> >> remote-gsyncd /usr/lib/x86_64-linux-gnu/glusterfs/gsyncd
> >> gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 start
> >>
> >> However after a very short period of the status being
> >"Initializing..."
> >> the status then sits on "Passive":
> >>
> >> # gluster volume geo-replication gvol0 nvfs10.example.com::gvol0
> >status
> >> MASTER NODEMASTER VOLMASTER BRICK
> >SLAVE
> >> USERSLAVE SLAVE NODE  STATUS
> >CRAWL
> >> STATUSLAST_SYNCED
> >>
> >>
>
> >--
> >> cafs10 gvol0 /nodirectwritedata/gluster/gvol0root
> >>  nvfs10.example.com::gvol0nvfs30.localPassiveN/A
> >> N/A
> >> cafs30 gvol0 /nodirectwritedata/gluster/gvol0root
> >>  nvfs10.example.com::gvol0N/A CreatedN/A
> >> N/A
> >> cafs20 gvol0 /nodirectwritedata/gluster/gvol0root
> >>  nvfs10.example.com::gvol0N/A CreatedN/A
> >> N/A
> >>
> >> So my questions are:
> >> 1. Why does the status on cafs10 mention "nvfs30.local"? That's the
> >LAN
> >> address that nvfs10 replicates with nvfs30 using. It's not accessible
> >from
> >> the A cluster, and I didn't use it when configuring geo-replication.
> >> 2. Why does geo-replication sit in Passive status?
> >>
> >> Thanks very much for any assistance.
> >>
> >>
> >> On Tue, 25 Feb 2020 at 15:46, David Cunningham
> >
> >> wrote:
> >>
> >>> Hi Aravinda and Sunny,
> >>>
> >>> Thank you for the replies. We have 3 replicating nodes on the master
> >>> side, and want to geo-replicate their data to the remote slave side.
> >As I
> >>> understand it if the master node which had the geo-replication
> >create
> >>>

Re: [Gluster-users] Geo-replication

2020-03-02 Thread Strahil Nikolov
On March 3, 2020 4:13:38 AM GMT+02:00, David Cunningham 
 wrote:
>Hello,
>
>Thanks for that. When we re-tried with push-pem from cafs10 (on the
>A/master cluster) it failed with "Unable to mount and fetch slave
>volume
>details." and in the logs we see:
>
>[2020-03-03 02:07:42.614911] E
>[name.c:258:af_inet_client_get_remote_sockaddr] 0-gvol0-client-0: DNS
>resolution failed on host nvfs10.local
>[2020-03-03 02:07:42.638824] E
>[name.c:258:af_inet_client_get_remote_sockaddr] 0-gvol0-client-1: DNS
>resolution failed on host nvfs20.local
>[2020-03-03 02:07:42.664493] E
>[name.c:258:af_inet_client_get_remote_sockaddr] 0-gvol0-client-2: DNS
>resolution failed on host nvfs30.local
>
>These .local addresses are the LAN addresses that B/slave nodes nvfs10,
>nvfs20, and nvfs30 replicate with. It seems that the A/master needs to
>be
>able to contact those addresses. Is that right? If it is then we'll
>need to
>re-do the B cluster to replicate using publicly accessible IP addresses
>instead of their LAN.
>
>Thank you.
>
>
>On Mon, 2 Mar 2020 at 20:53, Aravinda VK  wrote:
>
>> Looks like setup issue to me. Copying SSH keys manually is not
>required.
>>
>> Command prefix is required while adding to authorized_keys file in
>each
>> remote nodes. That will not be available if ssh keys are added
>manually.
>>
>> Geo-rep specifies /nonexisting/gsyncd in the command to make sure it
>> connects via the actual command specified in authorized_keys file, in
>your
>> case Geo-replication is actually looking for gsyncd command in
>> /nonexisting/gsyncd path.
>>
>> Please try with push-pem option during Geo-rep create command.
>>
>> —
>> regards
>> Aravinda Vishwanathapura
>> https://kadalu.io
>>
>>
>> On 02-Mar-2020, at 6:03 AM, David Cunningham
>
>> wrote:
>>
>> Hello,
>>
>> We've set up geo-replication but it isn't actually syncing. Scenario
>is
>> that we have two GFS clusters. Cluster A has nodes cafs10, cafs20,
>and
>> cafs30, replicating with each other over a LAN. Cluster B has nodes
>nvfs10,
>> nvfs20, and nvfs30 also replicating with each other over a LAN. We
>are
>> geo-replicating data from the A cluster to the B cluster over the
>internet.
>> SSH key access is set up, allowing all the A nodes password-less
>access to
>> root on nvfs10
>>
>> Geo-replication was set up using these commands, run on cafs10:
>>
>> gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 create
>> ssh-port 8822 no-verify
>> gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 config
>> remote-gsyncd /usr/lib/x86_64-linux-gnu/glusterfs/gsyncd
>> gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 start
>>
>> However after a very short period of the status being
>"Initializing..."
>> the status then sits on "Passive":
>>
>> # gluster volume geo-replication gvol0 nvfs10.example.com::gvol0
>status
>> MASTER NODEMASTER VOLMASTER BRICK   
>SLAVE
>> USERSLAVE SLAVE NODE  STATUS
>CRAWL
>> STATUSLAST_SYNCED
>>
>>
>--
>> cafs10 gvol0 /nodirectwritedata/gluster/gvol0root
>>  nvfs10.example.com::gvol0nvfs30.localPassiveN/A
>> N/A
>> cafs30 gvol0 /nodirectwritedata/gluster/gvol0root
>>  nvfs10.example.com::gvol0N/A CreatedN/A
>> N/A
>> cafs20 gvol0 /nodirectwritedata/gluster/gvol0root
>>  nvfs10.example.com::gvol0N/A CreatedN/A
>> N/A
>>
>> So my questions are:
>> 1. Why does the status on cafs10 mention "nvfs30.local"? That's the
>LAN
>> address that nvfs10 replicates with nvfs30 using. It's not accessible
>from
>> the A cluster, and I didn't use it when configuring geo-replication.
>> 2. Why does geo-replication sit in Passive status?
>>
>> Thanks very much for any assistance.
>>
>>
>> On Tue, 25 Feb 2020 at 15:46, David Cunningham
>
>> wrote:
>>
>>> Hi Aravinda and Sunny,
>>>
>>> Thank you for the replies. We have 3 replicating nodes on the master
>>> side, and want to geo-replicate their data to the remote slave side.
>As I
>>> understand it if the master node which had the geo-replication
>create
>>> command run goes down then another node will take over pushing
>updates to
>>> the remote slave. Is that right?
>>>
>>> We have already taken care of adding all master node's SSH keys to
>the
>>> remote slave's authorized_keys externally, so won't include the
>push-pem
>>> part of the create command.
>>>
>>> Mostly I wanted to confirm the geo-replication behaviour on the
>>> replicating master nodes if one of them goes down.
>>>
>>> Thank you!
>>>
>>>
>>> On Tue, 25 Feb 2020 at 14:32, Aravinda VK 
>wrote:
>>>
 Hi David,


 On 25-Feb-2020, at 3:45 AM, David Cunningham
>
 wrote:

 Hello,

 I've a couple of questions on geo-replication

Re: [Gluster-users] Geo-replication

2020-03-02 Thread David Cunningham
Hello,

Thanks for that. When we re-tried with push-pem from cafs10 (on the
A/master cluster) it failed with "Unable to mount and fetch slave volume
details." and in the logs we see:

[2020-03-03 02:07:42.614911] E
[name.c:258:af_inet_client_get_remote_sockaddr] 0-gvol0-client-0: DNS
resolution failed on host nvfs10.local
[2020-03-03 02:07:42.638824] E
[name.c:258:af_inet_client_get_remote_sockaddr] 0-gvol0-client-1: DNS
resolution failed on host nvfs20.local
[2020-03-03 02:07:42.664493] E
[name.c:258:af_inet_client_get_remote_sockaddr] 0-gvol0-client-2: DNS
resolution failed on host nvfs30.local

These .local addresses are the LAN addresses that B/slave nodes nvfs10,
nvfs20, and nvfs30 replicate with. It seems that the A/master needs to be
able to contact those addresses. Is that right? If it is then we'll need to
re-do the B cluster to replicate using publicly accessible IP addresses
instead of their LAN.
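
A quick way to confirm what the master side actually has to resolve is to list the 
brick hostnames of the slave volume and then test name resolution from a master 
node. A sketch using the names and SSH port from this thread (read-only checks):

```
# run on a cluster A (master) node
ssh -p 8822 root@nvfs10.example.com "gluster volume info gvol0" | grep "^Brick"
getent hosts nvfs10.local || echo "nvfs10.local does not resolve here"
```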

Thank you.


On Mon, 2 Mar 2020 at 20:53, Aravinda VK  wrote:

> Looks like setup issue to me. Copying SSH keys manually is not required.
>
> Command prefix is required while adding to authorized_keys file in each
> remote nodes. That will not be available if ssh keys are added manually.
>
> Geo-rep specifies /nonexisting/gsyncd in the command to make sure it
> connects via the actual command specified in authorized_keys file, in your
> case Geo-replication is actually looking for gsyncd command in
> /nonexisting/gsyncd path.
>
> Please try with push-pem option during Geo-rep create command.
>
> —
> regards
> Aravinda Vishwanathapura
> https://kadalu.io
>
>
> On 02-Mar-2020, at 6:03 AM, David Cunningham 
> wrote:
>
> Hello,
>
> We've set up geo-replication but it isn't actually syncing. Scenario is
> that we have two GFS clusters. Cluster A has nodes cafs10, cafs20, and
> cafs30, replicating with each other over a LAN. Cluster B has nodes nvfs10,
> nvfs20, and nvfs30 also replicating with each other over a LAN. We are
> geo-replicating data from the A cluster to the B cluster over the internet.
> SSH key access is set up, allowing all the A nodes password-less access to
> root on nvfs10
>
> Geo-replication was set up using these commands, run on cafs10:
>
> gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 create
> ssh-port 8822 no-verify
> gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 config
> remote-gsyncd /usr/lib/x86_64-linux-gnu/glusterfs/gsyncd
> gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 start
>
> However after a very short period of the status being "Initializing..."
> the status then sits on "Passive":
>
> # gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 status
> MASTER NODEMASTER VOLMASTER BRICKSLAVE
> USERSLAVE SLAVE NODE  STATUS CRAWL
> STATUSLAST_SYNCED
>
> --
> cafs10 gvol0 /nodirectwritedata/gluster/gvol0root
>  nvfs10.example.com::gvol0nvfs30.localPassiveN/A
> N/A
> cafs30 gvol0 /nodirectwritedata/gluster/gvol0root
>  nvfs10.example.com::gvol0N/A CreatedN/A
> N/A
> cafs20 gvol0 /nodirectwritedata/gluster/gvol0root
>  nvfs10.example.com::gvol0N/A CreatedN/A
> N/A
>
> So my questions are:
> 1. Why does the status on cafs10 mention "nvfs30.local"? That's the LAN
> address that nvfs10 replicates with nvfs30 using. It's not accessible from
> the A cluster, and I didn't use it when configuring geo-replication.
> 2. Why does geo-replication sit in Passive status?
>
> Thanks very much for any assistance.
>
>
> On Tue, 25 Feb 2020 at 15:46, David Cunningham 
> wrote:
>
>> Hi Aravinda and Sunny,
>>
>> Thank you for the replies. We have 3 replicating nodes on the master
>> side, and want to geo-replicate their data to the remote slave side. As I
>> understand it if the master node which had the geo-replication create
>> command run goes down then another node will take over pushing updates to
>> the remote slave. Is that right?
>>
>> We have already taken care of adding all master node's SSH keys to the
>> remote slave's authorized_keys externally, so won't include the push-pem
>> part of the create command.
>>
>> Mostly I wanted to confirm the geo-replication behaviour on the
>> replicating master nodes if one of them goes down.
>>
>> Thank you!
>>
>>
>> On Tue, 25 Feb 2020 at 14:32, Aravinda VK  wrote:
>>
>>> Hi David,
>>>
>>>
>>> On 25-Feb-2020, at 3:45 AM, David Cunningham 
>>> wrote:
>>>
>>> Hello,
>>>
>>> I've a couple of questions on geo-replication that hopefully someone can
>>> help with:
>>>
>>> 1. If there are multiple nodes in a cluster on the master side (pushing
>>> updates to the geo-replication slave), which node actually does the
>>> pushing? Does Gluster

Re: [Gluster-users] Geo-replication

2020-03-01 Thread Aravinda VK
Looks like a setup issue to me. Copying SSH keys manually is not required. 

A command prefix is required when the keys are added to the authorized_keys file on 
each remote node. That prefix will not be present if the SSH keys are added manually.

Geo-rep specifies /nonexisting/gsyncd in the command to make sure it connects via 
the actual command specified in the authorized_keys file; in your case 
Geo-replication is actually looking for the gsyncd command in the /nonexisting/gsyncd 
path.

Please try with the push-pem option during the Geo-rep create command.
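
A rough sketch of that, reusing the volume names and SSH port from this thread (run 
on one master node; force is only needed because a session already exists):

```
gluster system:: execute gsec_create
gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 create ssh-port 8822 push-pem force
gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 start
```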

—
regards
Aravinda Vishwanathapura
https://kadalu.io


> On 02-Mar-2020, at 6:03 AM, David Cunningham  
> wrote:
> 
> Hello,
> 
> We've set up geo-replication but it isn't actually syncing. Scenario is that 
> we have two GFS clusters. Cluster A has nodes cafs10, cafs20, and cafs30, 
> replicating with each other over a LAN. Cluster B has nodes nvfs10, nvfs20, 
> and nvfs30 also replicating with each other over a LAN. We are 
> geo-replicating data from the A cluster to the B cluster over the internet. 
> SSH key access is set up, allowing all the A nodes password-less access to 
> root on nvfs10
> 
> Geo-replication was set up using these commands, run on cafs10:
> 
> gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 create 
> ssh-port 8822 no-verify
> gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 config 
> remote-gsyncd /usr/lib/x86_64-linux-gnu/glusterfs/gsyncd
> gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 start
> 
> However after a very short period of the status being "Initializing..." the 
> status then sits on "Passive":
> 
> # gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 status 
> MASTER NODEMASTER VOLMASTER BRICKSLAVE USER   
>  SLAVE SLAVE NODE  STATUS CRAWL STATUS
> LAST_SYNCED  
> --
> cafs10 gvol0 /nodirectwritedata/gluster/gvol0root 
>  nvfs10.example.com::gvol0nvfs30.localPassiveN/A N/A  
> 
> cafs30 gvol0 /nodirectwritedata/gluster/gvol0root 
>  nvfs10.example.com::gvol0N/A CreatedN/A N/A  
> 
> cafs20 gvol0 /nodirectwritedata/gluster/gvol0root 
>  nvfs10.example.com::gvol0N/A CreatedN/A N/A  
>
> 
> So my questions are:
> 1. Why does the status on cafs10 mention "nvfs30.local"? That's the LAN 
> address that nvfs10 replicates with nvfs30 using. It's not accessible from 
> the A cluster, and I didn't use it when configuring geo-replication.
> 2. Why does geo-replication sit in Passive status?
> 
> Thanks very much for any assistance.
> 
> 
> On Tue, 25 Feb 2020 at 15:46, David Cunningham  > wrote:
> Hi Aravinda and Sunny,
> 
> Thank you for the replies. We have 3 replicating nodes on the master side, 
> and want to geo-replicate their data to the remote slave side. As I 
> understand it if the master node which had the geo-replication create command 
> run goes down then another node will take over pushing updates to the remote 
> slave. Is that right?
> 
> We have already taken care of adding all master node's SSH keys to the remote 
> slave's authorized_keys externally, so won't include the push-pem part of the 
> create command.
> 
> Mostly I wanted to confirm the geo-replication behaviour on the replicating 
> master nodes if one of them goes down. 
> 
> Thank you!
> 
> 
> On Tue, 25 Feb 2020 at 14:32, Aravinda VK  > wrote:
> Hi David,
> 
> 
>> On 25-Feb-2020, at 3:45 AM, David Cunningham > > wrote:
>> 
>> Hello,
>> 
>> I've a couple of questions on geo-replication that hopefully someone can 
>> help with:
>> 
>> 1. If there are multiple nodes in a cluster on the master side (pushing 
>> updates to the geo-replication slave), which node actually does the pushing? 
>> Does GlusterFS decide itself automatically?
> 
> Once Geo-replication session is started, one worker will be started 
> corresponding to each Master bricks. Each worker identifies the changes 
> happened in respective brick and sync those changes via Mount. This way load 
> is distributed among Master nodes. In case of Replica sub volume, one worker 
> among the Replica group will become active and participate in the syncing. 
> Other bricks in that Replica group will remain Passive. Passive worker will 
> become Active if the previously Active brick goes down (This is because all 
> Replica bricks will have the same set of changes, syncing from each worker is 
> redundant).
> 
>> 
>> 2.With regard to copying SSH keys, presumably the SSH key of all master 
>> nodes should be authorized on the geo-re

Re: [Gluster-users] Geo-replication

2020-03-01 Thread Strahil Nikolov
On March 2, 2020 2:33:10 AM GMT+02:00, David Cunningham 
 wrote:
>Hello,
>
>We've set up geo-replication but it isn't actually syncing. Scenario is
>that we have two GFS clusters. Cluster A has nodes cafs10, cafs20, and
>cafs30, replicating with each other over a LAN. Cluster B has nodes
>nvfs10,
>nvfs20, and nvfs30 also replicating with each other over a LAN. We are
>geo-replicating data from the A cluster to the B cluster over the
>internet.
>SSH key access is set up, allowing all the A nodes password-less access
>to
>root on nvfs10
>
>Geo-replication was set up using these commands, run on cafs10:
>
>gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 create
>ssh-port 8822 no-verify
>gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 config
>remote-gsyncd /usr/lib/x86_64-linux-gnu/glusterfs/gsyncd
>gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 start
>
>However after a very short period of the status being "Initializing..."
>the
>status then sits on "Passive":
>
># gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 status
>MASTER NODEMASTER VOLMASTER BRICKSLAVE
>USER
>  SLAVE SLAVE NODE  STATUS CRAWL STATUS
> LAST_SYNCED
>--
>cafs10 gvol0 /nodirectwritedata/gluster/gvol0root
>   nvfs10.example.com::gvol0nvfs30.localPassiveN/A
>N/A
>cafs30 gvol0 /nodirectwritedata/gluster/gvol0root
>   nvfs10.example.com::gvol0N/A CreatedN/A
>N/A
>cafs20 gvol0 /nodirectwritedata/gluster/gvol0root
>   nvfs10.example.com::gvol0N/A CreatedN/A
>N/A
>
>So my questions are:
>1. Why does the status on cafs10 mention "nvfs30.local"? That's the LAN
>address that nvfs10 replicates with nvfs30 using. It's not accessible
>from
>the A cluster, and I didn't use it when configuring geo-replication.
>2. Why does geo-replication sit in Passive status?
>
>Thanks very much for any assistance.
>
>
>On Tue, 25 Feb 2020 at 15:46, David Cunningham
>
>wrote:
>
>> Hi Aravinda and Sunny,
>>
>> Thank you for the replies. We have 3 replicating nodes on the master
>side,
>> and want to geo-replicate their data to the remote slave side. As I
>> understand it if the master node which had the geo-replication create
>> command run goes down then another node will take over pushing
>updates to
>> the remote slave. Is that right?
>>
>> We have already taken care of adding all master node's SSH keys to
>the
>> remote slave's authorized_keys externally, so won't include the
>push-pem
>> part of the create command.
>>
>> Mostly I wanted to confirm the geo-replication behaviour on the
>> replicating master nodes if one of them goes down.
>>
>> Thank you!
>>
>>
>> On Tue, 25 Feb 2020 at 14:32, Aravinda VK  wrote:
>>
>>> Hi David,
>>>
>>>
>>> On 25-Feb-2020, at 3:45 AM, David Cunningham
>
>>> wrote:
>>>
>>> Hello,
>>>
>>> I've a couple of questions on geo-replication that hopefully someone
>can
>>> help with:
>>>
>>> 1. If there are multiple nodes in a cluster on the master side
>(pushing
>>> updates to the geo-replication slave), which node actually does the
>>> pushing? Does GlusterFS decide itself automatically?
>>>
>>>
>>> Once Geo-replication session is started, one worker will be started
>>> corresponding to each Master bricks. Each worker identifies the
>changes
>>> happened in respective brick and sync those changes via Mount. This
>way
>>> load is distributed among Master nodes. In case of Replica sub
>volume, one
>>> worker among the Replica group will become active and participate in
>the
>>> syncing. Other bricks in that Replica group will remain Passive.
>Passive
>>> worker will become Active if the previously Active brick goes down
>(This is
>>> because all Replica bricks will have the same set of changes,
>syncing from
>>> each worker is redundant).
>>>
>>>
>>> 2.With regard to copying SSH keys, presumably the SSH key of all
>master
>>> nodes should be authorized on the geo-replication client side?
>>>
>>>
>>> Geo-replication session is established between one master node and
>one
>>> remote node. If Geo-rep create command is successful then,
>>>
>>> - SSH keys generated in all master nodes
>>> - Public keys from all master nodes are copied to initiator Master
>node
>>> - Public keys copied to the Remote node specified in the create
>command
>>> - Master public keys are distributed to all nodes of remote Cluster
>and
>>> added to respective ~/.ssh/authorized_keys
>>>
>>> After successful Geo-rep create command, any Master node can connect
>to
>>> any remote node via ssh.
>>>
>>> Security: Command prefix is added while adding public key to remote
>>> node’s authorized_keys file, So that if anyone gain access using
>this key
>>> can access only gsyncd command.
>>>
>>> 

Re: [Gluster-users] Geo-replication

2020-03-01 Thread David Cunningham
Hello,

We've set up geo-replication but it isn't actually syncing. Scenario is
that we have two GFS clusters. Cluster A has nodes cafs10, cafs20, and
cafs30, replicating with each other over a LAN. Cluster B has nodes nvfs10,
nvfs20, and nvfs30 also replicating with each other over a LAN. We are
geo-replicating data from the A cluster to the B cluster over the internet.
SSH key access is set up, allowing all the A nodes password-less access to
root on nvfs10

Geo-replication was set up using these commands, run on cafs10:

gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 create
ssh-port 8822 no-verify
gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 config
remote-gsyncd /usr/lib/x86_64-linux-gnu/glusterfs/gsyncd
gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 start

However after a very short period of the status being "Initializing..." the
status then sits on "Passive":

# gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 status
MASTER NODE    MASTER VOL    MASTER BRICK                        SLAVE USER    SLAVE                        SLAVE NODE      STATUS     CRAWL STATUS    LAST_SYNCED
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
cafs10         gvol0         /nodirectwritedata/gluster/gvol0    root          nvfs10.example.com::gvol0    nvfs30.local    Passive    N/A             N/A
cafs30         gvol0         /nodirectwritedata/gluster/gvol0    root          nvfs10.example.com::gvol0    N/A             Created    N/A             N/A
cafs20         gvol0         /nodirectwritedata/gluster/gvol0    root          nvfs10.example.com::gvol0    N/A             Created    N/A             N/A

So my questions are:
1. Why does the status on cafs10 mention "nvfs30.local"? That's the LAN
address that nvfs10 replicates with nvfs30 using. It's not accessible from
the A cluster, and I didn't use it when configuring geo-replication.
2. Why does geo-replication sit in Passive status?

Thanks very much for any assistance.


On Tue, 25 Feb 2020 at 15:46, David Cunningham 
wrote:

> Hi Aravinda and Sunny,
>
> Thank you for the replies. We have 3 replicating nodes on the master side,
> and want to geo-replicate their data to the remote slave side. As I
> understand it if the master node which had the geo-replication create
> command run goes down then another node will take over pushing updates to
> the remote slave. Is that right?
>
> We have already taken care of adding all master node's SSH keys to the
> remote slave's authorized_keys externally, so won't include the push-pem
> part of the create command.
>
> Mostly I wanted to confirm the geo-replication behaviour on the
> replicating master nodes if one of them goes down.
>
> Thank you!
>
>
> On Tue, 25 Feb 2020 at 14:32, Aravinda VK  wrote:
>
>> Hi David,
>>
>>
>> On 25-Feb-2020, at 3:45 AM, David Cunningham 
>> wrote:
>>
>> Hello,
>>
>> I've a couple of questions on geo-replication that hopefully someone can
>> help with:
>>
>> 1. If there are multiple nodes in a cluster on the master side (pushing
>> updates to the geo-replication slave), which node actually does the
>> pushing? Does GlusterFS decide itself automatically?
>>
>>
>> Once Geo-replication session is started, one worker will be started
>> corresponding to each Master bricks. Each worker identifies the changes
>> happened in respective brick and sync those changes via Mount. This way
>> load is distributed among Master nodes. In case of Replica sub volume, one
>> worker among the Replica group will become active and participate in the
>> syncing. Other bricks in that Replica group will remain Passive. Passive
>> worker will become Active if the previously Active brick goes down (This is
>> because all Replica bricks will have the same set of changes, syncing from
>> each worker is redundant).
>>
>>
>> 2.With regard to copying SSH keys, presumably the SSH key of all master
>> nodes should be authorized on the geo-replication client side?
>>
>>
>> Geo-replication session is established between one master node and one
>> remote node. If Geo-rep create command is successful then,
>>
>> - SSH keys generated in all master nodes
>> - Public keys from all master nodes are copied to initiator Master node
>> - Public keys copied to the Remote node specified in the create command
>> - Master public keys are distributed to all nodes of remote Cluster and
>> added to respective ~/.ssh/authorized_keys
>>
>> After successful Geo-rep create command, any Master node can connect to
>> any remote node via ssh.
>>
>> Security: Command prefix is added while adding public key to remote
>> node’s authorized_keys file, So that if anyone gain access using this key
>> can access only gsyncd command.
>>
>> ```
>> command=gsyncd ssh-key….
>> ```
>>
>>
>>
>> Thanks for your help.
>>
>> --
>> David Cunningham, Voisonics Limited
>> http://voisonics.com/
>> USA: +1 213 221 1092
>> New Zealand: +64 (0)28 2558 3782
>>

Re: [Gluster-users] Geo-replication

2020-02-24 Thread David Cunningham
Hi Aravinda and Sunny,

Thank you for the replies. We have 3 replicating nodes on the master side,
and want to geo-replicate their data to the remote slave side. As I
understand it if the master node which had the geo-replication create
command run goes down then another node will take over pushing updates to
the remote slave. Is that right?

We have already taken care of adding all master node's SSH keys to the
remote slave's authorized_keys externally, so won't include the push-pem
part of the create command.

Mostly I wanted to confirm the geo-replication behaviour on the replicating
master nodes if one of them goes down.

Thank you!


On Tue, 25 Feb 2020 at 14:32, Aravinda VK  wrote:

> Hi David,
>
>
> On 25-Feb-2020, at 3:45 AM, David Cunningham 
> wrote:
>
> Hello,
>
> I've a couple of questions on geo-replication that hopefully someone can
> help with:
>
> 1. If there are multiple nodes in a cluster on the master side (pushing
> updates to the geo-replication slave), which node actually does the
> pushing? Does GlusterFS decide itself automatically?
>
>
> Once Geo-replication session is started, one worker will be started
> corresponding to each Master bricks. Each worker identifies the changes
> happened in respective brick and sync those changes via Mount. This way
> load is distributed among Master nodes. In case of Replica sub volume, one
> worker among the Replica group will become active and participate in the
> syncing. Other bricks in that Replica group will remain Passive. Passive
> worker will become Active if the previously Active brick goes down (This is
> because all Replica bricks will have the same set of changes, syncing from
> each worker is redundant).
>
>
> 2.With regard to copying SSH keys, presumably the SSH key of all master
> nodes should be authorized on the geo-replication client side?
>
>
> Geo-replication session is established between one master node and one
> remote node. If Geo-rep create command is successful then,
>
> - SSH keys generated in all master nodes
> - Public keys from all master nodes are copied to initiator Master node
> - Public keys copied to the Remote node specified in the create command
> - Master public keys are distributed to all nodes of remote Cluster and
> added to respective ~/.ssh/authorized_keys
>
> After successful Geo-rep create command, any Master node can connect to
> any remote node via ssh.
>
> Security: Command prefix is added while adding public key to remote node’s
> authorized_keys file, So that if anyone gain access using this key can
> access only gsyncd command.
>
> ```
> command=gsyncd ssh-key….
> ```
>
>
>
> Thanks for your help.
>
> --
> David Cunningham, Voisonics Limited
> http://voisonics.com/
> USA: +1 213 221 1092
> New Zealand: +64 (0)28 2558 3782
> 
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
> —
> regards
> Aravinda Vishwanathapura
> https://kadalu.io
>
>

-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782




Community Meeting Calendar:

Schedule -
Every Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo-replication

2020-02-24 Thread Aravinda VK
Hi David,


> On 25-Feb-2020, at 3:45 AM, David Cunningham  
> wrote:
> 
> Hello,
> 
> I've a couple of questions on geo-replication that hopefully someone can help 
> with:
> 
> 1. If there are multiple nodes in a cluster on the master side (pushing 
> updates to the geo-replication slave), which node actually does the pushing? 
> Does GlusterFS decide itself automatically?

Once the Geo-replication session is started, one worker is started corresponding to 
each Master brick. Each worker identifies the changes that happened in its brick and 
syncs those changes via a mount. This way the load is distributed among the Master 
nodes. In the case of a Replica sub-volume, one worker in the Replica group becomes 
Active and participates in the syncing; the other bricks in that Replica group 
remain Passive. A Passive worker becomes Active if the previously Active brick goes 
down (this is because all Replica bricks have the same set of changes, so syncing 
from each worker would be redundant).
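
The per-brick Active/Passive split is visible in the status output, for example 
(volume and host names below are placeholders):

```
# one worker per master brick; exactly one Active worker per replica set
gluster volume geo-replication <mastervol> <slavehost>::<slavevol> status detail
```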

> 
> 2.With regard to copying SSH keys, presumably the SSH key of all master nodes 
> should be authorized on the geo-replication client side?

A Geo-replication session is established between one master node and one remote 
node. If the Geo-rep create command is successful, then:

- SSH keys are generated on all master nodes
- Public keys from all master nodes are copied to the initiator Master node
- Public keys are copied to the Remote node specified in the create command
- Master public keys are distributed to all nodes of the remote Cluster and added 
to the respective ~/.ssh/authorized_keys

After a successful Geo-rep create command, any Master node can connect to any 
remote node via ssh.

Security: a command prefix is added when the public key is added to the remote 
node’s authorized_keys file, so that anyone who gains access using this key can 
only run the gsyncd command.

```
command=gsyncd ssh-key….
```
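
For illustration, an entry pushed this way ends up looking roughly like the line 
below in ~/.ssh/authorized_keys on the remote nodes. The gsyncd path varies by 
distribution and the key material here is invented:

```
command="/usr/libexec/glusterfs/gsyncd" ssh-rsa AAAAB3NzaC1yc2E...example... root@cafs10
```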


> 
> Thanks for your help.
> 
> -- 
> David Cunningham, Voisonics Limited
> http://voisonics.com/ 
> USA: +1 213 221 1092
> New Zealand: +64 (0)28 2558 3782
> 
> 
> 
> 
> Community Meeting Calendar:
> 
> Schedule -
> Every Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://bluejeans.com/441850968
> 
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users


—
regards
Aravinda Vishwanathapura
https://kadalu.io





Community Meeting Calendar:

Schedule -
Every Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo-replication /var/lib space question

2020-02-13 Thread Kotresh Hiremath Ravishankar
All '.processed' directories (under working_dir and working_dir/.history)
contain already-processed changelogs and are no longer required by geo-replication
except for debugging purposes. Those directories can be cleaned up if they are
consuming too much space.
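
A minimal sketch for checking how much space they take before cleaning up. The find 
path below is the default working-directory location mentioned earlier in this 
thread; adjust it to whatever your session's working_dir is set to:

```
# locate the session's working directory, then measure the processed changelogs
gluster volume geo-replication <mastervol> <slavehost>::<slavevol> config working_dir
find /var/lib/misc/gluster -type d -name '.processed' -exec du -sh {} +
```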

On Wed, Feb 12, 2020 at 11:36 PM Sunny Kumar  wrote:

> Hi Alexander,
>
> Yes that is geo-replication working directory and you can run the
> below command to get the location.
>   gluster vol geo-rep  :: config
> working_dir
>
> This directory contains parsed changelogs from backend brick which are
> ready to be processed. After a batch is processed it will be
> automatically cleaned up for next batch processing.
> It does not depends on volume size but on config value "
> changelog-batch-size " the max size of changelogs to process per
> batch.
>
> /sunny
>
> On Mon, Feb 10, 2020 at 11:07 PM Alexander Iliev
>  wrote:
> >
> > Hello list,
> >
> > I have been running a geo-replication session for some time now, but at
> > some point I noticed that the /var/lib/misc/gluster is eating up the
> > storage on my root partition.
> >
> > I moved the folder away to another partition, but I don't seem to
> > remember reading any specific space requirement for /var/lib and
> > geo-replication. Did I miss it in the documentation?
> >
> > Also, does the space used in /var/lib/misc/gluster depend on the
> > geo-replicated volume size? What exactly is stored there? (I'm guessing
> > that's where gsyncd keeps track of the replicatation progress.)
> >
> > (I'm running gluster 6.6 on CentOS 7.7.)
> >
> > Thanks!
> > --
> > alexander iliev
> > 
> >
> > Community Meeting Calendar:
> >
> > APAC Schedule -
> > Every 2nd and 4th Tuesday at 11:30 AM IST
> > Bridge: https://bluejeans.com/441850968
> >
> > NA/EMEA Schedule -
> > Every 1st and 3rd Tuesday at 01:00 PM EDT
> > Bridge: https://bluejeans.com/441850968
> >
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
> >
>
> 
>
> Community Meeting Calendar:
>
> APAC Schedule -
> Every 2nd and 4th Tuesday at 11:30 AM IST
> Bridge: https://bluejeans.com/441850968
>
> NA/EMEA Schedule -
> Every 1st and 3rd Tuesday at 01:00 PM EDT
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>

-- 
Thanks and Regards,
Kotresh H R


Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/441850968

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo-replication /var/lib space question

2020-02-12 Thread Sunny Kumar
Hi Alexander,

Yes, that is the geo-replication working directory, and you can run the
command below to get its location:
  gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config working_dir

This directory contains parsed changelogs from the backend brick which are
ready to be processed. After a batch is processed it is automatically
cleaned up before the next batch is processed.
It does not depend on the volume size but on the config value
"changelog-batch-size", the maximum size of changelogs to process per
batch.
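
For reference, the batch-size value can be read the same way, and a quick du on the 
working directory shows how much the staged changelogs currently consume (the path 
and session names below are placeholders; /var/lib/misc/gluster is the default 
location mentioned in this thread):

```
gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config changelog-batch-size
du -sh /var/lib/misc/gluster
```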

/sunny

On Mon, Feb 10, 2020 at 11:07 PM Alexander Iliev
 wrote:
>
> Hello list,
>
> I have been running a geo-replication session for some time now, but at
> some point I noticed that the /var/lib/misc/gluster is eating up the
> storage on my root partition.
>
> I moved the folder away to another partition, but I don't seem to
> remember reading any specific space requirement for /var/lib and
> geo-replication. Did I miss it in the documentation?
>
> Also, does the space used in /var/lib/misc/gluster depend on the
> geo-replicated volume size? What exactly is stored there? (I'm guessing
> that's where gsyncd keeps track of the replicatation progress.)
>
> (I'm running gluster 6.6 on CentOS 7.7.)
>
> Thanks!
> --
> alexander iliev
> 
>
> Community Meeting Calendar:
>
> APAC Schedule -
> Every 2nd and 4th Tuesday at 11:30 AM IST
> Bridge: https://bluejeans.com/441850968
>
> NA/EMEA Schedule -
> Every 1st and 3rd Tuesday at 01:00 PM EDT
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>



Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/441850968

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Geo-Replication Issue while upgrading

2019-11-29 Thread Sunny Kumar
Thanks Deepu.

I will investigate this. Can you summarize the steps that would help in
reproducing this issue?

/sunny

On Fri, Nov 29, 2019 at 7:29 AM deepu srinivasan  wrote:
>
> Hi Sunny
> The issue seems to be a bug.
> The issue got fixed when I restarted the glusterd daemon in the slave 
> machines. The logs in the slave end reported that the mount-broker folder was 
> not in the vol file. So when I restarted the machine it got fixed.
> This might be some race condition.
>
> On Thu, Nov 28, 2019 at 9:00 PM deepu srinivasan  wrote:
>>
>> Hi Sunny
>> I Also got this error in slave end
>>>
>>> [2019-11-28 15:30:12.520461] I [resource(slave 
>>> 192.168.185.89/home/sas/gluster/data/code-misc):1105:connect] GLUSTER: 
>>> Mounting gluster volume locally...
>>>
>>> [2019-11-28 15:30:12.649425] E [resource(slave 
>>> 192.168.185.89/home/sas/gluster/data/code-misc):1013:handle_mounter] 
>>> MountbrokerMounter: glusterd answered   mnt=
>>>
>>> [2019-11-28 15:30:12.650573] E [syncdutils(slave 
>>> 192.168.185.89/home/sas/gluster/data/code-misc):805:errlog] Popen: command 
>>> returned error  cmd=/usr/sbin/gluster --remote-host=localhost system:: 
>>> mount sas user-map-root=sas aux-gfid-mount acl log-level=INFO 
>>> log-file=/var/log/glusterfs/geo-replication-slaves/code-misc_192.168.185.118_code-misc/mnt-192.168.185.89-home-sas-gluster-data-code-misc.log
>>>  volfile-server=localhost volfile-id=code-misc client-pid=-1  error=1
>>>
>>> [2019-11-28 15:30:12.650742] E [syncdutils(slave 
>>> 192.168.185.89/home/sas/gluster/data/code-misc):809:logerr] Popen: 
>>> /usr/sbin/gluster> 2 : failed with this errno (No such file or directory)
>>
>>
>> On Thu, Nov 28, 2019 at 6:45 PM deepu srinivasan  wrote:
>>>
>>> root@192.168.185.101/var/log/glusterfs#ssh -oPasswordAuthentication=no 
>>> -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem 
>>> -p 22 sas@192.168.185.118 "sudo gluster volume status"
>>>
>>> **
>>>
>>> WARNING: This system is a restricted access system.  All activity on this 
>>> system is subject to monitoring.  If information collected reveals possible 
>>> criminal activity or activity that exceeds privileges, evidence of such 
>>> activity may be providedto the relevant authorities for further action.
>>>
>>> By continuing past this point, you expressly consent to   this monitoring
>>>
>>> **
>>>
>>> invoking sudo in restricted SSH session is not allowed
>>>
>>>
>>> On Thu, Nov 28, 2019 at 6:04 PM Sunny Kumar  wrote:

 Hi Deepu,

 Can you try this:

 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
 /var/lib/glusterd/geo-replication/secret.pem -p 22
 sas@192.168.185.118 "sudo gluster volume status"

 /sunny


 On Thu, Nov 28, 2019 at 12:14 PM deepu srinivasan  
 wrote:
 >>
 >> MASTER NODEMASTER VOLMASTER BRICK
 >> SLAVE USERSLAVE SLAVE NODE 
 >> STATUS CRAWL STATUSLAST_SYNCED
 >>
 >> -
 >>
 >> 192.168.185.89 code-misc /home/sas/gluster/data/code-misc
 >> sas   sas@192.168.185.118::code-miscN/A
 >> Faulty N/A N/A
 >>
 >> 192.168.185.101code-misc /home/sas/gluster/data/code-misc
 >> sas   sas@192.168.185.118::code-misc192.168.185.118
 >> PassiveN/A N/A
 >>
 >> 192.168.185.93 code-misc /home/sas/gluster/data/code-misc
 >> sas   sas@192.168.185.118::code-miscN/A
 >> Faulty N/A N/A
 >
 >
 > On Thu, Nov 28, 2019 at 5:43 PM deepu srinivasan  
 > wrote:
 >>
 >> I Think its configured properly. Should i check something else..
 >>
 >> root@192.168.185.89/var/log/glusterfs#ssh sas@192.168.185.118 "sudo 
 >> gluster volume info"
 >>
 >> **
 >>
 >> WARNING: This system is a restricted access system.  All activity on 
 >> this system is subject to monitoring.  If information collected reveals 
 >> possible criminal activity or activity that exceeds privileges, 
 >> evidence of such activity may be providedto the relevant authorities 
 >> for further action.
 >>
 >> By continuing past this point, you expressly consent to   this 
 >> monitoring.-
 >>
 >> 

Re: [Gluster-users] Geo-Replication Issue while upgrading

2019-11-28 Thread Sunny Kumar
Hi Deepu,

Can you try this:

ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
/var/lib/glusterd/geo-replication/secret.pem -p 22
sas@192.168.185.118 "sudo gluster volume status"

/sunny


On Thu, Nov 28, 2019 at 12:14 PM deepu srinivasan  wrote:
>>
>> MASTER NODEMASTER VOLMASTER BRICKSLAVE 
>> USERSLAVE SLAVE NODE STATUS 
>> CRAWL STATUSLAST_SYNCED
>>
>> -
>>
>> 192.168.185.89 code-misc /home/sas/gluster/data/code-miscsas 
>>   sas@192.168.185.118::code-miscN/AFaulty N/A
>>  N/A
>>
>> 192.168.185.101code-misc /home/sas/gluster/data/code-miscsas 
>>   sas@192.168.185.118::code-misc192.168.185.118PassiveN/A
>>  N/A
>>
>> 192.168.185.93 code-misc /home/sas/gluster/data/code-miscsas 
>>   sas@192.168.185.118::code-miscN/AFaulty N/A
>>  N/A
>
>
> On Thu, Nov 28, 2019 at 5:43 PM deepu srinivasan  wrote:
>>
>> I Think its configured properly. Should i check something else..
>>
>> root@192.168.185.89/var/log/glusterfs#ssh sas@192.168.185.118 "sudo gluster 
>> volume info"
>>
>> **
>>
>> WARNING: This system is a restricted access system.  All activity on this 
>> system is subject to monitoring.  If information collected reveals possible 
>> criminal activity or activity that exceeds privileges, evidence of such 
>> activity may be providedto the relevant authorities for further action.
>>
>> By continuing past this point, you expressly consent to   this monitoring.-
>>
>> **
>>
>>
>>
>> Volume Name: code-misc
>>
>> Type: Replicate
>>
>> Volume ID: e9b6fbed-fcd0-42a9-ab11-02ec39c2ee07
>>
>> Status: Started
>>
>> Snapshot Count: 0
>>
>> Number of Bricks: 1 x 3 = 3
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: 192.168.185.118:/home/sas/gluster/data/code-misc
>>
>> Brick2: 192.168.185.45:/home/sas/gluster/data/code-misc
>>
>> Brick3: 192.168.185.84:/home/sas/gluster/data/code-misc
>>
>> Options Reconfigured:
>>
>> features.read-only: enable
>>
>> transport.address-family: inet
>>
>> nfs.disable: on
>>
>> performance.client-io-threads: off
>>
>>
>> On Thu, Nov 28, 2019 at 5:40 PM Sunny Kumar  wrote:
>>>
>>> Hi Deepu,
>>>
>>> Looks like this is error generated due to ssh restrictions:
>>> Can you please check and confirm ssh is properly configured?
>>>
>>>
>>> 2019-11-28 11:59:12.934436] E [syncdutils(worker
>>> /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh>
>>> **
>>>
>>> [2019-11-28 11:59:12.934703] E [syncdutils(worker
>>> /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> WARNING:
>>> This system is a restricted access system.  All activity on this
>>> system is subject to monitoring.  If information collected reveals
>>> possible criminal activity or activity that exceeds privileges,
>>> evidence of such activity may be providedto the relevant authorities
>>> for further action.
>>>
>>> [2019-11-28 11:59:12.934967] E [syncdutils(worker
>>> /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> By
>>> continuing past this point, you expressly consent to   this
>>> monitoring.- ZOHO Corporation
>>>
>>> [2019-11-28 11:59:12.935194] E [syncdutils(worker
>>> /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh>
>>> **
>>>
>>> 2019-11-28 11:59:12.944369] I [repce(agent
>>> /home/sas/gluster/data/code-misc):97:service_loop] RepceServer:
>>> terminating on reaching EOF.
>>>
>>> /sunny
>>>
>>> On Thu, Nov 28, 2019 at 12:03 PM deepu srinivasan  
>>> wrote:
>>> >
>>> >
>>> >
>>> > -- Forwarded message -
>>> > From: deepu srinivasan 
>>> > Date: Thu, Nov 28, 2019 at 5:32 PM
>>> > Subject: Geo-Replication Issue while upgrading
>>> > To: gluster-users 
>>> >
>>> >
>>> > Hi Users/Developers
>>> > I hope you remember the last issue we faced regarding the geo-replication 
>>> > goes to the faulty state while stopping and starting the geo-replication.
>>> >>
>>> >> [2019-11-16 17:29:43.536881] I [gsyncdstatus(worker 
>>> >> /home/sas/gluster/data/code-misc6):281:set_active] GeorepStatus: Worker 
>>> >> Status Change   status=Active
>>> >> [2019-11-16 17:29:43.629620] I [gsyncdstatus(worker 
>>> >> /home/sas/gluster/data/code-misc6):253:set_worker_crawl_status] 
>>> >> GeorepStatus: Crawl Sta

Re: [Gluster-users] Geo-Replication Issue while upgrading

2019-11-28 Thread Sunny Kumar
Hi Deepu,

Looks like this is error generated due to ssh restrictions:
Can you please check and confirm ssh is properly configured?


2019-11-28 11:59:12.934436] E [syncdutils(worker
/home/sas/gluster/data/code-misc):809:logerr] Popen: ssh>
**

[2019-11-28 11:59:12.934703] E [syncdutils(worker
/home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> WARNING:
This system is a restricted access system.  All activity on this
system is subject to monitoring.  If information collected reveals
possible criminal activity or activity that exceeds privileges,
evidence of such activity may be providedto the relevant authorities
for further action.

[2019-11-28 11:59:12.934967] E [syncdutils(worker
/home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> By
continuing past this point, you expressly consent to   this
monitoring.- ZOHO Corporation

[2019-11-28 11:59:12.935194] E [syncdutils(worker
/home/sas/gluster/data/code-misc):809:logerr] Popen: ssh>
**

2019-11-28 11:59:12.944369] I [repce(agent
/home/sas/gluster/data/code-misc):97:service_loop] RepceServer:
terminating on reaching EOF.

/sunny

On Thu, Nov 28, 2019 at 12:03 PM deepu srinivasan  wrote:
>
>
>
> -- Forwarded message -
> From: deepu srinivasan 
> Date: Thu, Nov 28, 2019 at 5:32 PM
> Subject: Geo-Replication Issue while upgrading
> To: gluster-users 
>
>
> Hi Users/Developers
> I hope you remember the last issue we faced regarding the geo-replication 
> goes to the faulty state while stopping and starting the geo-replication.
>>
>> [2019-11-16 17:29:43.536881] I [gsyncdstatus(worker 
>> /home/sas/gluster/data/code-misc6):281:set_active] GeorepStatus: Worker 
>> Status Change   status=Active
>> [2019-11-16 17:29:43.629620] I [gsyncdstatus(worker 
>> /home/sas/gluster/data/code-misc6):253:set_worker_crawl_status] 
>> GeorepStatus: Crawl Status Change   status=History Crawl
>> [2019-11-16 17:29:43.630328] I [master(worker 
>> /home/sas/gluster/data/code-misc6):1517:crawl] _GMaster: starting history 
>> crawl   turns=1 stime=(1573924576, 0)   entry_stime=(1573924576, 0) 
>> etime=1573925383
>> [2019-11-16 17:29:44.636725] I [master(worker 
>> /home/sas/gluster/data/code-misc6):1546:crawl] _GMaster: slave's time 
>> stime=(1573924576, 0)
>> [2019-11-16 17:29:44.778966] I [master(worker 
>> /home/sas/gluster/data/code-misc6):898:fix_possible_entry_failures] 
>> _GMaster: Fixing ENOENT error in slave. Parent does not exist on master. 
>> Safe to ignore, take out entry   retry_count=1   entry=({'uid': 0, 
>> 'gfid': 'c02519e0-0ead-4fe8-902b-dcae72ef83a3', 'gid': 0, 'mode': 33188, 
>> 'entry': '.gfid/d60aa0d5-4fdf-4721-97dc-9e3e50995dab/368307802', 'op': 
>> 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': 
>> None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
>> [2019-11-16 17:29:44.779306] I [master(worker 
>> /home/sas/gluster/data/code-misc6):942:handle_entry_failures] _GMaster: 
>> Sucessfully fixed entry ops with gfid mismatchretry_count=1
>> [2019-11-16 17:29:44.779516] I [master(worker 
>> /home/sas/gluster/data/code-misc6):1194:process_change] _GMaster: Retry 
>> original entries. count = 1
>> [2019-11-16 17:29:44.879321] E [repce(worker 
>> /home/sas/gluster/data/code-misc6):214:__call__] RepceClient: call failed  
>> call=151945:140353273153344:1573925384.78   method=entry_ops
>> error=OSError
>> [2019-11-16 17:29:44.879750] E [syncdutils(worker 
>> /home/sas/gluster/data/code-misc6):338:log_raise_exception] : FAIL:
>> Traceback (most recent call last):
>>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 322, in 
>> main
>> func(args)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 82, in 
>> subcmd_worker
>> local.service_loop(remote)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1277, in 
>> service_loop
>> g3.crawlwrap(oneshot=True)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 599, in 
>> crawlwrap
>> self.crawl()
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1555, in 
>> crawl
>> self.changelogs_batch_process(changes)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1455, in 
>> changelogs_batch_process
>> self.process(batch)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1290, in 
>> process
>> self.process_change(change, done, retry)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1195, in 
>> process_change
>> failures = self.slave.server.entry_ops(entries)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 233, in 
>> __call__
>> return self.ins(self.meth, *a)
>>   File "/usr/libexec/glusterfs/p

Re: [Gluster-users] Geo-Replication what is transferred

2019-09-04 Thread Strahil
As far as I know, when sharding is enabled each shard will be synced 
separately, while the whole file will be transferred when sharding is not 
enabled.

Is striping still supported? I think sharding should be used.
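
For context, sharding is an ordinary volume option; a minimal sketch is below. The 
block size shown is just an example (64MB is the common default), and enabling 
sharding only affects files written after the option is turned on:

```
gluster volume set <volname> features.shard on
gluster volume set <volname> features.shard-block-size 64MB
```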

Best Regards,
Strahil Nikolov
On Sep 3, 2019 23:47, Petric Frank  wrote:
>
> Hello, 
>
> given a geo-replicated file of 20 GBytes in size. 
>
> If one byte in this file is changed, what will be transferred ? 
> - the changed byte 
> - the block/sector the containing the changed byte 
> - the complete file 
>
> Is the storage mode relevant - sharded/striped/... ? 
>
> regards 
>   Petric 
>
>
>
> ___ 
> Gluster-users mailing list 
> Gluster-users@gluster.org 
> https://lists.gluster.org/mailman/listinfo/gluster-users 
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Geo-Replication what is transferred

2019-09-03 Thread Aravinda Vishwanathapura Krishna Murthy
Hi Petric,

On Wed, Sep 4, 2019 at 2:23 AM Petric Frank  wrote:

> Hello,
>
> given a geo-replicated file of 20 GBytes in size.
>
> If one byte in this file is changed, what will be transferred ?
> - the changed byte
> - the block/sector the containing the changed byte
> - the complete file
>

Gluster Geo-replication uses the Changelog to detect the list of new or modified
files since the last sync. Once the list of files is available, it passes that
list to rsync jobs, which internally calculate the delta changes and sync only
those delta changes.

Note: Geo-replication syncs the complete file if the use-tarssh option is set
to true.
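
To see which sync engine a session is using, something along these lines should
work. The option names follow this thread; newer releases may expose the same
setting as sync-method, and the session names are placeholders:

```
# "false" means rsync delta transfer is used
gluster volume geo-replication <mastervol> <slavehost>::<slavevol> config use-tarssh
gluster volume geo-replication <mastervol> <slavehost>::<slavevol> config rsync-options
```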


>
> Is the storage mode relevant - sharded/striped/... ?
>

If the Master volume is sharded, then the Remote/Slave volume should also be a
sharded volume. Geo-rep is intelligent enough to detect changes to the shard
files and sync only those files.


>
> regards
>   Petric
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>


-- 
regards
Aravinda VK
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] geo-replication won't start

2019-09-03 Thread Shwetha Acharya
Hi Lucian,

The slave must be a gluster volume. Data from the master volume gets replicated
into the slave volume after creation of the geo-rep session.

You can try creating the session again using the steps mentioned in this link:
https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#creating-the-session
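
In short, the documented flow looks roughly like this; host, brick, and volume
names are placeholders:

```
# on the remote side: the target must itself be a created and started gluster volume
gluster volume create slavevol slavehost:/bricks/slavevol/brick
gluster volume start slavevol

# on one master node: generate the geo-rep keys, create the session, start it
gluster system:: execute gsec_create
gluster volume geo-replication mastervol slavehost::slavevol create push-pem
gluster volume geo-replication mastervol slavehost::slavevol start
gluster volume geo-replication mastervol slavehost::slavevol status
```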

Regards,
Shwetha

On Thu, Aug 22, 2019 at 9:51 PM Nux!  wrote:

> Hi,
>
> I'm trying for the first time ever the geo-replication feature and I am
> not having much success (CentOS7, gluster 6.5).
> First of all, from the docs I get the impression that I can
> geo-replicate over ssh to a simple dir, but it doesn't seem to be the
> case, the "slave" must be a gluster volume, doesn't it?
>
> Second, the slave host is not in the subnet with the other gluster
> peers, but I reckon this would be the usual case and not a problem.
>
> I've stopped the firewall on all peers and slave host to rule it out,
> but I can't get the georep started.
>
> Creation is successfull, however STATUS won't change from Created.
> I'm looking through all the logs and I can't see anything meaningful.
>
> What steps could I take to debug this further?
>
> Cheers,
> Lucian
>
>
> --
> Sent from the Delta quadrant using Borg technology!
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Geo Replication Failure: libgfchangelog.so: cannot open shared object file

2019-08-28 Thread Sunny Kumar
Thanks Andy for detailed answer.

Cédric,

If possible, can you share your updated document/patch so that it can be
merged into the gluster docs[1] and made available to everyone[2]?

[1]. https://github.com/gluster/glusterdocs/
[2] https://docs.gluster.org/en/latest/
.
/sunny


On Wed, Aug 28, 2019 at 12:13 PM ROUVRAIS Cedric ResgBscRscDef
 wrote:
>
> Hi Andy,
>
>
>
> Thanks for your reply, I actually fixed the issue – shared library wasn’t 
> installed, so I reinstalled it.
>
>
>
> I realize that the documentation for geo replication would need more 
> information
>
>
>
> I will try a clean install on a new image and document it more precisely.
>
>
>
> Thanks,
>
>
>
> Cédric
>
>
>
>
>
> De : Andy Coates 
> Envoyé : mercredi 28 août 2019 05:18
> À : ROUVRAIS Cedric ResgBscRscDef 
> Cc : gluster-users@gluster.org
> Objet : Re: [Gluster-users] Geo Replication Failure: libgfchangelog.so: 
> cannot open shared object file
>
>
>
> We saw this with 4.1.x RPM on OEL (can't recall which specific version and 
> haven't checked if its fixed in later, at least up to 4.1.6), but the issue 
> seemed to be it just wasn't symlinked for some reason, so we symlinked 
> libgfchangelog.so to /lib64/libgfchangelog.so.0
>
>
>
> Not sure if the python code is meant to ask for libgfchangelog.so.0 in the 
> first place, or if the RPM is meant to symlink it post-install.  In 7.x the 
> script seems to use a different method for finding it too.
>
>
>
> Andy.
>
>
>
> On Tue, 27 Aug 2019 at 04:21, ROUVRAIS Cedric ResgBscRscDef 
>  wrote:
>
> Hello,
>
>
>
> Having some slight difficulties with GeoReplication across 2 servers where 
> the user is root on both sides: libgfchangelog.so: cannot open shared object 
> file: No such file or directory
>
>
>
> I get this error on the master node. I haven’t been able to get around it 
> (I’ve deleted and recreated the configuration ex-nihilo a couple of time to 
> no avail).
>
>
>
> [2019-08-26 17:43:24.213577] I [gsyncdstatus(monitor):248:set_worker_status] 
> GeorepStatus: Worker Status Change status=Initializing...
>
> [2019-08-26 17:43:24.213959] I [monitor(monitor):157:monitor] Monitor: 
> starting gsyncd worker   brick=/vol/gluster/fr/brick1/gv_master 
> slave_node=srv_slave
>
> [2019-08-26 17:43:24.259780] I [gsyncd(agent 
> /vol/gluster/fr/brick1/gv_master):309:main] : Using session config file  
> 
> path=/var/lib/glusterd/geo-replication/gv_master_srv_slave_gv_slave/gsyncd.conf
>
> [2019-08-26 17:43:24.261590] I [changelogagent(agent 
> /vol/gluster/fr/brick1/gv_master):72:__init__] ChangelogAgent: Agent 
> listining...
>
> [2019-08-26 17:43:24.266072] I [gsyncd(worker 
> /vol/gluster/fr/brick1/gv_master):309:main] : Using session config file  
>
> path=/var/lib/glusterd/geo-replication/gv_master_srv_slave_gv_slave/gsyncd.conf
>
> [2019-08-26 17:43:24.280029] I [resource(worker 
> /vol/gluster/fr/brick1/gv_master):1379:connect_remote] SSH: Initializing SSH 
> connection between master and slave...
>
> [2019-08-26 17:43:25.749739] I [resource(worker 
> /vol/gluster/fr/brick1/gv_master):1426:connect_remote] SSH: SSH connection 
> between master and slave established.   duration=1.4696
>
> [2019-08-26 17:43:25.749941] I [resource(worker 
> /vol/gluster/fr/brick1/gv_master):1098:connect] GLUSTER: Mounting gluster 
> volume locally...
>
> [2019-08-26 17:43:26.824810] I [resource(worker 
> /vol/gluster/fr/brick1/gv_master):1121:connect] GLUSTER: Mounted gluster 
> volumeduration=1.0747
>
> [2019-08-26 17:43:26.825088] I [subcmds(worker 
> /vol/gluster/fr/brick1/gv_master):80:subcmd_worker] : Worker spawn 
> successful. Acknowledging back to monitor
>
> [2019-08-26 17:43:26.888922] E [repce(agent 
> /vol/gluster/fr/brick1/gv_master):122:worker] : call failed:
>
> Traceback (most recent call last):
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in 
> worker
>
> res = getattr(self.obj, rmeth)(*in_data[2:])
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, 
> in init
>
> return Changes.cl_init()
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, 
> in __getattr__
>
> from libgfchangelog import Changes as LChanges
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 18, 
> in 
>
> class Changes(object):
>
>  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 20, 
> in Changes
>
> use_errno=True)
>
>   File "/usr/lib64/python2

Re: [Gluster-users] Geo Replication Failure: libgfchangelog.so: cannot open shared object file

2019-08-27 Thread ROUVRAIS Cedric ResgBscRscDef
Hi Andy,

Thanks for your reply, I actually fixed the issue – shared library wasn’t 
installed, so I reinstalled it.
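
For anyone hitting the same error, one way to check whether the library is
actually present (paths assume an RPM-based install with the libraries under
/usr/lib64):

  # is the shared object present and known to the dynamic linker?
  ls -l /usr/lib64/libgfchangelog.so*
  ldconfig -p | grep libgfchangelog

  # if only the versioned file exists, find out which package provides it
  rpm -qf /usr/lib64/libgfchangelog.so.0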

I realize that the documentation for geo replication would need more 
information 😊

I will try a clean install on a new image and document it more precisely.

Thanks,

Cédric


De : Andy Coates 
Envoyé : mercredi 28 août 2019 05:18
À : ROUVRAIS Cedric ResgBscRscDef 
Cc : gluster-users@gluster.org
Objet : Re: [Gluster-users] Geo Replication Failure: libgfchangelog.so: cannot 
open shared object file

We saw this with 4.1.x RPM on OEL (can't recall which specific version and 
haven't checked if its fixed in later, at least up to 4.1.6), but the issue 
seemed to be it just wasn't symlinked for some reason, so we symlinked 
libgfchangelog.so to /lib64/libgfchangelog.so.0

Not sure if the python code is meant to ask for libgfchangelog.so.0 in the 
first place, or if the RPM is meant to symlink it post-install.  In 7.x the 
script seems to use a different method for finding it too.

Andy.

On Tue, 27 Aug 2019 at 04:21, ROUVRAIS Cedric ResgBscRscDef 
mailto:cedric.rouvr...@socgen.com>> wrote:
Hello,

Having some slight difficulties with GeoReplication across 2 servers where the 
user is root on both sides: libgfchangelog.so: cannot open shared object file: 
No such file or directory

I get this error on the master node. I haven’t been able to get around it (I’ve 
deleted and recreated the configuration ex-nihilo a couple of time to no avail).

[2019-08-26 17:43:24.213577] I [gsyncdstatus(monitor):248:set_worker_status] 
GeorepStatus: Worker Status Change status=Initializing...
[2019-08-26 17:43:24.213959] I [monitor(monitor):157:monitor] Monitor: starting 
gsyncd worker   brick=/vol/gluster/fr/brick1/gv_master slave_node=srv_slave
[2019-08-26 17:43:24.259780] I [gsyncd(agent 
/vol/gluster/fr/brick1/gv_master):309:main] : Using session config file
  
path=/var/lib/glusterd/geo-replication/gv_master_srv_slave_gv_slave/gsyncd.conf
[2019-08-26 17:43:24.261590] I [changelogagent(agent 
/vol/gluster/fr/brick1/gv_master):72:__init__] ChangelogAgent: Agent 
listining...
[2019-08-26 17:43:24.266072] I [gsyncd(worker 
/vol/gluster/fr/brick1/gv_master):309:main] : Using session config file
 path=/var/lib/glusterd/geo-replication/gv_master_srv_slave_gv_slave/gsyncd.conf
[2019-08-26 17:43:24.280029] I [resource(worker 
/vol/gluster/fr/brick1/gv_master):1379:connect_remote] SSH: Initializing SSH 
connection between master and slave...
[2019-08-26 17:43:25.749739] I [resource(worker 
/vol/gluster/fr/brick1/gv_master):1426:connect_remote] SSH: SSH connection 
between master and slave established.   duration=1.4696
[2019-08-26 17:43:25.749941] I [resource(worker 
/vol/gluster/fr/brick1/gv_master):1098:connect] GLUSTER: Mounting gluster 
volume locally...
[2019-08-26 17:43:26.824810] I [resource(worker 
/vol/gluster/fr/brick1/gv_master):1121:connect] GLUSTER: Mounted gluster volume 
   duration=1.0747
[2019-08-26 17:43:26.825088] I [subcmds(worker 
/vol/gluster/fr/brick1/gv_master):80:subcmd_worker] : Worker spawn 
successful. Acknowledging back to monitor
[2019-08-26 17:43:26.888922] E [repce(agent 
/vol/gluster/fr/brick1/gv_master):122:worker] : call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, 
in init
return Changes.cl_init()
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, 
in __getattr__
from libgfchangelog import Changes as LChanges
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 18, 
in 
class Changes(object):
 File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 20, in 
Changes
use_errno=True)
  File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libgfchangelog.so: cannot open shared object file: No such file or 
directory


Thank you for any insight,

Cédric


=

Ce message et toutes les pieces jointes (ci-apres le "message")
sont confidentiels et susceptibles de contenir des informations
couvertes par le secret professionnel. Ce message est etabli
a l'intention exclusive de ses destinataires. Toute utilisation
ou diffusion non autorisee interdite.
Tout message electronique est susceptible d'alteration. La SOCIETE GENERALE
et ses filiales declinent toute responsabilite au titre de ce message
s'il a ete altere, deforme falsifie.

=

This message and any attachments (the "message") are confidential,
intended solely for the addresses, and may contain legally privileged
information. Any unauthorize

Re: [Gluster-users] Geo Replication Failure: libgfchangelog.so: cannot open shared object file

2019-08-27 Thread Andy Coates
We saw this with the 4.1.x RPM on OEL (can't recall which specific version,
and haven't checked whether it's fixed in later releases, at least up to
4.1.6), but the issue seemed to be that the library just wasn't symlinked for
some reason, so we symlinked libgfchangelog.so to /lib64/libgfchangelog.so.0.

Not sure if the python code is meant to ask for libgfchangelog.so.0 in the
first place, or if the RPM is meant to symlink it post-install.  In 7.x the
script seems to use a different method for finding it too.
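
Roughly the workaround we used, in case it helps (adjust the path to your
distribution; on our boxes the versioned library lived under /lib64):

  # give the loader the unversioned name the python ctypes code asks for
  ln -s /lib64/libgfchangelog.so.0 /lib64/libgfchangelog.so
  # refresh the linker cache (may not be strictly necessary)
  ldconfig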

Andy.

On Tue, 27 Aug 2019 at 04:21, ROUVRAIS Cedric ResgBscRscDef <
cedric.rouvr...@socgen.com> wrote:

> Hello,
>
>
>
> Having some slight difficulties with GeoReplication across 2 servers where
> the user is root on both sides: libgfchangelog.so: cannot open shared
> object file: No such file or directory
>
>
>
> I get this error on the master node. I haven’t been able to get around it
> (I’ve deleted and recreated the configuration ex-nihilo a couple of time to
> no avail).
>
>
>
> [2019-08-26 17:43:24.213577] I
> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
> Change status=Initializing...
>
> [2019-08-26 17:43:24.213959] I [monitor(monitor):157:monitor] Monitor:
> starting gsyncd worker   brick=/vol/gluster/fr/brick1/gv_master
> slave_node=srv_slave
>
> [2019-08-26 17:43:24.259780] I [gsyncd(agent
> /vol/gluster/fr/brick1/gv_master):309:main] : Using session config
> file
> path=/var/lib/glusterd/geo-replication/gv_master_srv_slave_gv_slave/gsyncd.conf
>
> [2019-08-26 17:43:24.261590] I [changelogagent(agent
> /vol/gluster/fr/brick1/gv_master):72:__init__] ChangelogAgent: Agent
> listining...
>
> [2019-08-26 17:43:24.266072] I [gsyncd(worker
> /vol/gluster/fr/brick1/gv_master):309:main] : Using session config
> file
> path=/var/lib/glusterd/geo-replication/gv_master_srv_slave_gv_slave/gsyncd.conf
>
> [2019-08-26 17:43:24.280029] I [resource(worker
> /vol/gluster/fr/brick1/gv_master):1379:connect_remote] SSH: Initializing
> SSH connection between master and slave...
>
> [2019-08-26 17:43:25.749739] I [resource(worker
> /vol/gluster/fr/brick1/gv_master):1426:connect_remote] SSH: SSH connection
> between master and slave established.   duration=1.4696
>
> [2019-08-26 17:43:25.749941] I [resource(worker
> /vol/gluster/fr/brick1/gv_master):1098:connect] GLUSTER: Mounting gluster
> volume locally...
>
> [2019-08-26 17:43:26.824810] I [resource(worker
> /vol/gluster/fr/brick1/gv_master):1121:connect] GLUSTER: Mounted gluster
> volumeduration=1.0747
>
> [2019-08-26 17:43:26.825088] I [subcmds(worker
> /vol/gluster/fr/brick1/gv_master):80:subcmd_worker] : Worker spawn
> successful. Acknowledging back to monitor
>
> [2019-08-26 17:43:26.888922] E [repce(agent
> /vol/gluster/fr/brick1/gv_master):122:worker] : call failed:
>
> Traceback (most recent call last):
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in
> worker
>
> res = getattr(self.obj, rmeth)(*in_data[2:])
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line
> 37, in init
>
> return Changes.cl_init()
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line
> 21, in __getattr__
>
> from libgfchangelog import Changes as LChanges
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line
> 18, in 
>
> class Changes(object):
>
>  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line
> 20, in Changes
>
> use_errno=True)
>
>   File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
>
> self._handle = _dlopen(self._name, mode)
>
> OSError: libgfchangelog.so: cannot open shared object file: No such file
> or directory
>
>
>
>
>
> Thank you for any insight,
>
>
>
> Cédric
>
>
>
> =
>
> Ce message et toutes les pieces jointes (ci-apres le "message")
> sont confidentiels et susceptibles de contenir des informations
> couvertes par le secret professionnel. Ce message est etabli
> a l'intention exclusive de ses destinataires. Toute utilisation
> ou diffusion non autorisee interdite.
> Tout message electronique est susceptible d'alteration. La SOCIETE GENERALE
> et ses filiales declinent toute responsabilite au titre de ce message
> s'il a ete altere, deforme falsifie.
>
> =
>
> This message and any attachments (the "message") are confidential,
> intended solely for the addresses, and may contain legally privileged
> information. Any unauthorized use or dissemination is prohibited.
> E-mails are susceptible to alteration. Neither SOCIETE GENERALE nor any
> of its subsidiaries or affiliates shall be liable for the message
> if altered, changed or falsified.
>
> =
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Geo Replication Stop even after migrating to 5.6

2019-07-18 Thread deepu srinivasan
Hi Guys
Yes, I will try the root geo-rep setup and get back to you.
Meanwhile, is there any procedure for the below-quoted info in the docs?

> Synchronization is not complete
>
> *Description*: GlusterFS geo-replication did not synchronize the data
> completely but the geo-replication status displayed is OK.
>
> *Solution*: You can enforce a full sync of the data by erasing the index
> and restarting GlusterFS geo-replication. After restarting, GlusterFS
> geo-replication begins synchronizing all the data. All files are compared
> using checksum, which can be a lengthy and high resource utilization
> operation on large data sets.
>
>
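
From what I gather, the procedure behind that paragraph boils down to roughly
the following (just a sketch of one way to trigger a full re-sync; names are
placeholders, so please correct me if this is not what the docs mean):

  # stop and delete the session, discarding the stored sync time
  gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> stop
  gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> delete reset-sync-time

  # recreate and restart; geo-rep should then fall back to a full crawl
  gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> create push-pem force
  gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> start
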
On Fri, Jun 14, 2019 at 12:30 PM Kotresh Hiremath Ravishankar <
khire...@redhat.com> wrote:

> Could you please try root geo-rep setup and update back?
>
> On Fri, Jun 14, 2019 at 12:28 PM deepu srinivasan 
> wrote:
>
>> Hi, any updates on this?
>>
>>
>> On Thu, Jun 13, 2019 at 5:43 PM deepu srinivasan 
>> wrote:
>>
>>> Hi Guys
>>> Hope you remember the issue I reported for geo replication hang status
>>> on History Crawl.
>>> You advised me to update the gluster version; previously I was using 4.1
>>> and now I have upgraded to 5.6. Still, after deleting the previous
>>> geo-rep session and creating a new one, the geo-rep session hangs. Is
>>> there any other way that I could solve the issue?
>>> I heard that I could redo the whole geo-replication again. How could I
>>> do that?
>>> Please help.
>>>
>>
>
> --
> Thanks and Regards,
> Kotresh H R
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Geo Replication Stop even after migrating to 5.6

2019-07-18 Thread deepu srinivasan
Hi, any updates on this?

On Thu, Jun 13, 2019 at 6:59 PM deepu srinivasan  wrote:

>
>
> -- Forwarded message -
> From: deepu srinivasan 
> Date: Thu, Jun 13, 2019 at 5:43 PM
> Subject: Geo Replication Stop even after migrating to 5.6
> To: , Kotresh Hiremath Ravishankar <
> khire...@redhat.com>, 
>
>
> Hi Guys
> Hope you remember the issue I reported for geo replication hang status on
> History Crawl.
> You advised me to update the gluster version; previously I was using 4.1
> and now I have upgraded to 5.6. Still, after deleting the previous geo-rep
> session and creating a new one, the geo-rep session hangs. Is there any
> other way that I could solve the issue?
> I heard that I could redo the whole geo-replication again. How could I do
> that?
> Please help.
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
