Thanks, Sunny.

alexander iliev

On 4/7/20 12:25 AM, Sunny Kumar wrote:
Hi Alexander,

Answers inline below:

On Thu, Apr 2, 2020 at 1:08 AM Alexander Iliev <ailiev+glus...@mamul.org> wrote:

Hi all,

I have a running geo-replication session between two clusters and I'm
trying to figure out what is the current progress of the replication and
possibly how much longer it will take.

It has been running for quite a while now (> 1 month), but the thing is
that both the hardware of the nodes and the link between the two
clusters aren't that great (e.g., the volumes are backed by rotating
disks) and the volume is somewhat sizeable (30-ish TB) and given these
details I'm not really sure how long it is supposed to take normally.

I have several bricks in the volume (same brick size and physical layout
in both clusters) that are now showing up with a Changelog Crawl status
and with a recent LAST_SYNCED date in the `gluster volume
geo-replication status detail` command output which seems to be the
desired state for all bricks. The rest of the bricks though are in
Hybrid Crawl state and have been in that state forever.

So I suppose my questions are - how can I tell if the replication
session is somehow broken and if it's not, then is there a way for me
to find out the progress and the ETA of the replication?

Please go through this section[1] which talks about this.
In Hybrid crawl at present we do not have any accounting information
like how much time it will take to sync data.
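
Since there is no built-in accounting for Hybrid Crawl, one rough way to watch progress is to count how many bricks are in each crawl state over time. The following is only a sketch: it assumes you have saved the text output of the `status detail` command yourself, and the function name `count_crawl_states` is made up for illustration. It matches the crawl state names as plain substrings rather than parsing columns, because the exact column layout may differ between versions.

```python
from collections import Counter

# Crawl state names as they appear in the CRAWL STATUS column.
CRAWL_STATES = ("Changelog Crawl", "Hybrid Crawl", "History Crawl")


def count_crawl_states(status_output: str) -> Counter:
    """Count how many lines (bricks) report each crawl state.

    status_output is the saved text of the status detail command;
    this helper is hypothetical and not part of gluster itself.
    """
    counts = Counter()
    for line in status_output.splitlines():
        for state in CRAWL_STATES:
            if state in line:
                counts[state] += 1
                break  # one crawl state per brick line
    return counts
```

Running this on snapshots taken a day or so apart shows whether bricks are moving from Hybrid Crawl to Changelog Crawl. It does not give an ETA.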

In /var/log/glusterfs/geo-replication/$session_dir/gsyncd.log there are
some errors like:

[2020-03-31 11:48:47.81269] E [syncdutils(worker
/data/gfs/store1/8/brick):822:errlog] Popen: command returned error
cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto
-S /tmp/gsyncd-aux-ssh-6aDWmc/206c4b2c3eb782ea2cf49ab5142bd68b.sock x.x.x.x
/nonexistent/gsyncd slave <vol> x.x.x.x::<vol> --master-node x.x.x.x
--master-node-id 9476b8bb-d7ee-489a-b083-875805343e67 --master-brick
<brick_path> --local-node x.x.x.x
2 --local-node-id 426b564d-35d9-4291-980e-795903e9a386 --slave-timeout
120 --slave-log-level INFO --slave-gluster-log-level INFO
--slave-gluster-command-dir /usr/sbin    error=1
[2020-03-31 11:48:47.81617] E [syncdutils(worker
<brick_path>):826:logerr] Popen: ssh> failed with ValueError.
[2020-03-31 11:48:47.390397] I [repce(agent
<brick_path>):97:service_loop] RepceServer: terminating on reaching EOF.


If you are seeing this error at a regular interval then please check
your ssh connection, it might have broken.
If possible please share full traceback from both master and slave to
debug the issue.
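
To check whether the error shows up at a regular interval, something like the sketch below could help. It assumes the log lines start with a `[YYYY-MM-DD HH:MM:SS.fraction]` timestamp followed by the level letter, as in the excerpt above; the function name `error_gaps` is made up for illustration. Wrapped continuation lines are ignored because they do not start with a timestamp.

```python
import re
from datetime import datetime

# Matches the start of an error-level line, e.g.
# "[2020-03-31 11:48:47.81269] E [syncdutils(...)] ..."
ERROR_LINE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:\.\d+)?\] E "
)


def error_gaps(log_lines):
    """Return the gaps in seconds between consecutive error lines.

    Fractional seconds are dropped, so gaps are whole seconds.
    """
    times = []
    for line in log_lines:
        match = ERROR_LINE.match(line)
        if match:
            times.append(
                datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            )
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

You can pass it an open file, for example `error_gaps(open(path))` with `path` pointing at gsyncd.log. Gaps that are all roughly the same size suggest the worker keeps restarting and failing on the same ssh step.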

In the brick logs I see stuff like:

[2020-03-29 07:49:05.338947] E [fuse-bridge.c:4167:fuse_xattr_cbk]
0-glusterfs-fuse: extended attribute not supported by the backend storage

I don't know if these are critical, from the rest of the logs it looks
like data is traveling between the clusters.

Any help will be greatly appreciated. Thank you in advance!

Best regards,
--
alexander iliev
________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[1]. https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#status

/sunny
