Re: [Gluster-users] GlusterFS geo-replication progress question

2020-04-19 Thread Alexander Iliev

Thanks, Sunny.

alexander iliev

On 4/7/20 12:25 AM, Sunny Kumar wrote:

Hi Alexander,

Answers inline below:

On Thu, Apr 2, 2020 at 1:08 AM Alexander Iliev  wrote:


Hi all,

I have a running geo-replication session between two clusters and I'm
trying to figure out what the current progress of the replication is and
possibly how much longer it will take.

It has been running for quite a while now (> 1 month), but the thing is
that both the hardware of the nodes and the link between the two
clusters aren't that great (e.g., the volumes are backed by rotating
disks) and the volume is somewhat sizeable (30-ish TB). Given these
details, I'm not really sure how long it is supposed to take normally.

I have several bricks in the volume (same brick size and physical layout
in both clusters) that are now showing up with a Changelog Crawl status
and with a recent LAST_SYNCED date in the `gluster volume
geo-replication status detail` command output which seems to be the
desired state for all bricks. The rest of the bricks though are in
Hybrid Crawl state and have been in that state forever.

So I suppose my questions are - how can I tell if the replication
session is somehow broken and if it's not, then is there a way for me
to find out the progress and the ETA of the replication?


Please go through this section[1] which talks about this.
In Hybrid Crawl, at present we do not have any accounting information
such as how much time it will take to sync data.


In /var/log/glusterfs/geo-replication/$session_dir/gsyncd.log there are
some errors like:

[2020-03-31 11:48:47.81269] E [syncdutils(worker
/data/gfs/store1/8/brick):822:errlog] Popen: command returned error
cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto
-S /tmp/gsyncd-aux-ssh-6aDWmc/206c4b2c3eb782ea2cf49ab5142bd68b.sock x.x.x.x
/nonexistent/gsyncd slave  x.x.x.x:: --master-node x.x.x.x
--master-node-id 9476b8bb-d7ee-489a-b083-875805343e67 --master-brick
 --local-node x.x.x.x
2 --local-node-id 426b564d-35d9-4291-980e-795903e9a386 --slave-timeout
120 --slave-log-level INFO --slave-gluster-log-level INFO
--slave-gluster-command-dir /usr/sbin error=1
[2020-03-31 11:48:47.81617] E [syncdutils(worker
):826:logerr] Popen: ssh> failed with ValueError.
[2020-03-31 11:48:47.390397] I [repce(agent
):97:service_loop] RepceServer: terminating on reaching EOF.



If you are seeing this error at a regular interval, then please check
your ssh connection; it might be broken.
If possible, please share the full traceback from both master and slave to
debug the issue.


In the brick logs I see stuff like:

[2020-03-29 07:49:05.338947] E [fuse-bridge.c:4167:fuse_xattr_cbk]
0-glusterfs-fuse: extended attribute not supported by the backend storage

I don't know if these are critical; from the rest of the logs it looks
like data is traveling between the clusters.

Any help will be greatly appreciated. Thank you in advance!

Best regards,
--
alexander iliev




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[1]. 
https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#status

/sunny


Re: [Gluster-users] GlusterFS geo-replication progress question

2020-04-06 Thread Sunny Kumar
Hi Alexander,

Answers inline below:

On Thu, Apr 2, 2020 at 1:08 AM Alexander Iliev  wrote:
>
> Hi all,
>
> I have a running geo-replication session between two clusters and I'm
> trying to figure out what the current progress of the replication is and
> possibly how much longer it will take.
>
> It has been running for quite a while now (> 1 month), but the thing is
> that both the hardware of the nodes and the link between the two
> clusters aren't that great (e.g., the volumes are backed by rotating
> disks) and the volume is somewhat sizeable (30-ish TB). Given these
> details, I'm not really sure how long it is supposed to take normally.
>
> I have several bricks in the volume (same brick size and physical layout
> in both clusters) that are now showing up with a Changelog Crawl status
> and with a recent LAST_SYNCED date in the `gluster volume
> geo-replication status detail` command output which seems to be the
> desired state for all bricks. The rest of the bricks though are in
> Hybrid Crawl state and have been in that state forever.
>
> So I suppose my questions are - how can I tell if the replication
> session is somehow broken and if it's not, then is there a way for me
> to find out the progress and the ETA of the replication?
>
Please go through this section[1] which talks about this.
In Hybrid Crawl, at present we do not have any accounting information
such as how much time it will take to sync data.
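
Since Hybrid Crawl reports no ETA, one rough workaround is to sample how much data has reached the slave volume at two points in time and extrapolate. This sketch assumes the slave's used space grows roughly in step with the data synced from the master (sparse files, deletions or a different layout would skew it); `estimate_eta` and its parameters are hypothetical names, not part of gsyncd or the gluster CLI.

```python
from datetime import timedelta
from typing import Optional


def estimate_eta(total_bytes: int, synced_before: int, synced_after: int,
                 interval_seconds: int) -> Optional[timedelta]:
    """Extrapolate the remaining sync time from two samples (a rough sketch).

    total_bytes      -- used space on the master volume (assumed not to change)
    synced_before    -- used space on the slave volume at the first sample
    synced_after     -- used space on the slave volume at the second sample
    interval_seconds -- wall-clock seconds between the two samples
    Returns None if no progress was seen between the samples.
    """
    progress = synced_after - synced_before
    if progress <= 0:
        return None  # stalled (or shrinking): no meaningful ETA
    remaining = max(total_bytes - synced_after, 0)
    # Assume the observed throughput (progress / interval) stays constant.
    return timedelta(seconds=remaining * interval_seconds / progress)
```

With purely illustrative numbers (a 30 TB master, 12 TB on the slave at the first sample and 12.5 TB one day later), `estimate_eta(30 * 10**12, 12 * 10**12, 12_500_000_000_000, 86400)` evaluates to 35 days. The byte counts could be taken from `shutil.disk_usage(path).used` on a client mount of each volume.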

> In /var/log/glusterfs/geo-replication/$session_dir/gsyncd.log there are
> some errors like:
>
> [2020-03-31 11:48:47.81269] E [syncdutils(worker
> /data/gfs/store1/8/brick):822:errlog] Popen: command returned error
> cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
> /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto
> -S /tmp/gsyncd-aux-ssh-6aDWmc/206c4b2c3eb782ea2cf49ab5142bd68b.sock x.x.x.x
> /nonexistent/gsyncd slave  x.x.x.x:: --master-node x.x.x.x
> --master-node-id 9476b8bb-d7ee-489a-b083-875805343e67 --master-brick
>  --local-node x.x.x.x
> 2 --local-node-id 426b564d-35d9-4291-980e-795903e9a386 --slave-timeout
> 120 --slave-log-level INFO --slave-gluster-log-level INFO
> --slave-gluster-command-dir /usr/sbin error=1
> [2020-03-31 11:48:47.81617] E [syncdutils(worker
> ):826:logerr] Popen: ssh> failed with ValueError.
> [2020-03-31 11:48:47.390397] I [repce(agent
> ):97:service_loop] RepceServer: terminating on reaching EOF.
>

If you are seeing this error at a regular interval, then please check
your ssh connection; it might be broken.
If possible, please share the full traceback from both master and slave to
debug the issue.
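
One way to check whether the error recurs at a regular interval is to pull the timestamps of the `Popen: command returned error` entries out of gsyncd.log and look at the gaps between them. This sketch assumes each log entry sits on a single line in the actual file (the excerpt above is wrapped by the mail client); `error_gaps` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Leading timestamp of a gsyncd.log entry, e.g. "[2020-03-31 11:48:47.81269] E ..."
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.(\d+)\]")
ERROR_MARKER = "Popen: command returned error"


def error_gaps(lines):
    """Return the time gaps between successive 'command returned error' entries."""
    stamps = []
    for line in lines:
        if ERROR_MARKER not in line:
            continue
        m = TS_RE.match(line)
        if not m:
            continue
        # The fractional part varies in width in the excerpt (.81269 vs .390397),
        # so only whole seconds are parsed; that is enough to spot a pattern.
        stamps.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S"))
    # Pair each timestamp with the next one and take the difference.
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]
```

Running it over the lines of the session's gsyncd.log (e.g. `open(path).readlines()`) and seeing near-identical gaps would point to a periodic failure, such as the ssh connection dropping, rather than a one-off.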

> In the brick logs I see stuff like:
>
> [2020-03-29 07:49:05.338947] E [fuse-bridge.c:4167:fuse_xattr_cbk]
> 0-glusterfs-fuse: extended attribute not supported by the backend storage
>
> I don't know if these are critical; from the rest of the logs it looks
> like data is traveling between the clusters.
>
> Any help will be greatly appreciated. Thank you in advance!
>
> Best regards,
> --
> alexander iliev
>
[1]. 
https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#status

/sunny