Dear Users,
the geo-replication is still broken, which is not a comfortable situation.
Has any other user had the same experience and found a workaround they
could share?
We are currently running Gluster v6.0.
Regards,
Felix
On 25/06/2020 10:04, Shwetha Acharya wrote:
Hi Rob and Felix,
Please share the *-changes.log files and the brick logs, which will help
in analysing the issue.
Regards,
Shwetha
On Thu, Jun 25, 2020 at 1:26 PM Felix Kölzow <felix.koel...@gmx.de> wrote:
Hey Rob,
we are seeing the same issue for our third volume. Have a look at the
logs from just now (below).
Question: you removed the htime files and the old changelogs. Did you
just rm the files, or is there something to pay closer attention to
before removing the changelog files and the htime file?
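For what it's worth, the order I would have guessed at is roughly the
following; this is only a sketch with placeholder names (MASTERVOL,
slavehost::SLAVEVOL, /path/to/brick), not a verified procedure:

```shell
# Sketch only; MASTERVOL, slavehost, SLAVEVOL and /path/to/brick are placeholders.

# 1. Stop the geo-replication session so no worker touches the changelogs.
gluster volume geo-replication MASTERVOL slavehost::SLAVEVOL stop

# 2. Disable the changelog translator before removing its files.
gluster volume set MASTERVOL changelog.changelog off

# 3. Only then remove the old changelogs and htime files on each brick.
rm -rf /path/to/brick/.glusterfs/changelogs/*

# 4. Re-enable changelog and restart the session.
gluster volume set MASTERVOL changelog.changelog on
gluster volume geo-replication MASTERVOL slavehost::SLAVEVOL start
```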
Regards,
Felix
[2020-06-25 07:51:53.795430] I [resource(worker
/gluster/vg00/dispersed_fuse1024/brick):1435:connect_remote] SSH:
SSH connection between master and slave established.
duration=1.2341
[2020-06-25 07:51:53.795639] I [resource(worker
/gluster/vg00/dispersed_fuse1024/brick):1105:connect] GLUSTER:
Mounting gluster volume locally...
[2020-06-25 07:51:54.520601] I [monitor(monitor):280:monitor]
Monitor: worker died in startup phase
brick=/gluster/vg01/dispersed_fuse1024/brick
[2020-06-25 07:51:54.535809] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
Status Change status=Faulty
[2020-06-25 07:51:54.882143] I [resource(worker
/gluster/vg00/dispersed_fuse1024/brick):1128:connect] GLUSTER:
Mounted gluster volume duration=1.0864
[2020-06-25 07:51:54.882388] I [subcmds(worker
/gluster/vg00/dispersed_fuse1024/brick):84:subcmd_worker] <top>:
Worker spawn successful. Acknowledging back to monitor
[2020-06-25 07:51:56.911412] E [repce(agent
/gluster/vg00/dispersed_fuse1024/brick):121:worker] <top>: call
failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line
117, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File
"/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line
40, in register
return Changes.cl_register(cl_brick, cl_dir, cl_log, cl_level,
retries)
File
"/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line
46, in cl_register
cls.raise_changelog_err()
File
"/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line
30, in raise_changelog_err
raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory
[2020-06-25 07:51:56.912056] E [repce(worker
/gluster/vg00/dispersed_fuse1024/brick):213:__call__] RepceClient:
call failed call=75086:140098349655872:1593071514.91
method=register error=ChangelogException
[2020-06-25 07:51:56.912396] E [resource(worker
/gluster/vg00/dispersed_fuse1024/brick):1286:service_loop]
GLUSTER: Changelog register failed error=[Errno 2] No such file
or directory
[2020-06-25 07:51:56.928031] I [repce(agent
/gluster/vg00/dispersed_fuse1024/brick):96:service_loop]
RepceServer: terminating on reaching EOF.
[2020-06-25 07:51:57.886126] I [monitor(monitor):280:monitor]
Monitor: worker died in startup phase
brick=/gluster/vg00/dispersed_fuse1024/brick
[2020-06-25 07:51:57.895920] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
Status Change status=Faulty
[2020-06-25 07:51:58.607405] I [gsyncdstatus(worker
/gluster/vg00/dispersed_fuse1024/brick):287:set_passive]
GeorepStatus: Worker Status Change status=Passive
[2020-06-25 07:51:58.607768] I [gsyncdstatus(worker
/gluster/vg01/dispersed_fuse1024/brick):287:set_passive]
GeorepStatus: Worker Status Change status=Passive
[2020-06-25 07:51:58.608004] I [gsyncdstatus(worker
/gluster/vg00/dispersed_fuse1024/brick):281:set_active]
GeorepStatus: Worker Status Change status=Active
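The traceback in these logs ends in libgfchangelog.py turning the
changelog library's errno into a Python exception; [Errno 2] is ENOENT,
so the changelog translator is pointing at a file or directory that no
longer exists. A minimal sketch of that error path (simplified from the
quoted traceback, with errno 2 as seen in the logs):

```python
import errno
import os


class ChangelogException(OSError):
    """Raised when the changelog library reports a failure via errno."""


def raise_changelog_err(errn):
    # Mirrors raise_changelog_err in the traceback above: the C library's
    # errno becomes a Python exception with a readable message.
    raise ChangelogException(errn, os.strerror(errn))


try:
    raise_changelog_err(errno.ENOENT)  # errno 2, as in the quoted logs
except ChangelogException as exc:
    print(exc)  # [Errno 2] No such file or directory
```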
On 25/06/2020 09:15, rob.quaglio...@rabobank.com wrote:
Hi All,
We’ve got two six-node RHEL 7.8 clusters, and geo-replication appears
to be completely broken between them. I’ve deleted the session, removed
& recreated the pem files and the old changelogs/htime (after removing
the relevant options from the volume), and set up geo-rep again from
scratch, but the new session comes up as Initializing, then goes Faulty
and starts looping. The volume (on both sides) is a 4 x 2 disperse,
running Gluster v6 (RH latest).
Gsyncd reports:
[2020-06-25 07:07:14.701423] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus:
Worker Status Change status=Initializing...
[2020-06-25 07:07:14.701744] I [monitor(monitor):159:monitor]
Monitor: starting gsyncd worker brick=/rhgs/brick20/brick
slave_node=bxts470194.eu.rabonet.com
[2020-06-25 07:07:14.707997] D [monitor(monitor):230:monitor]
Monitor: Worker would mount volume privately
[2020-06-25 07:07:14.757181] I [gsyncd(agent
/rhgs/brick20/brick):318:main] <top>: Using session config file
path=/var/lib/glusterd/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/gsyncd.conf
[2020-06-25 07:07:14.758126] D [subcmds(agent
/rhgs/brick20/brick):107:subcmd_agent] <top>: RPC FD
rpc_fd='5,12,11,10'
[2020-06-25 07:07:14.758627] I [changelogagent(agent
/rhgs/brick20/brick):72:__init__] ChangelogAgent: Agent listining...
[2020-06-25 07:07:14.764234] I [gsyncd(worker
/rhgs/brick20/brick):318:main] <top>: Using session config file
path=/var/lib/glusterd/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/gsyncd.conf
[2020-06-25 07:07:14.779409] I [resource(worker
/rhgs/brick20/brick):1386:connect_remote] SSH: Initializing SSH
connection between master and slave...
[2020-06-25 07:07:14.841793] D [repce(worker
/rhgs/brick20/brick):195:push] RepceClient: call
6799:140380783982400:1593068834.84 __repce_version__() ...
[2020-06-25 07:07:16.148725] D [repce(worker
/rhgs/brick20/brick):215:__call__] RepceClient: call
6799:140380783982400:1593068834.84 __repce_version__ -> 1.0
[2020-06-25 07:07:16.148911] D [repce(worker
/rhgs/brick20/brick):195:push] RepceClient: call
6799:140380783982400:1593068836.15 version() ...
[2020-06-25 07:07:16.149574] D [repce(worker
/rhgs/brick20/brick):215:__call__] RepceClient: call
6799:140380783982400:1593068836.15 version -> 1.0
[2020-06-25 07:07:16.149735] D [repce(worker
/rhgs/brick20/brick):195:push] RepceClient: call
6799:140380783982400:1593068836.15 pid() ...
[2020-06-25 07:07:16.150588] D [repce(worker
/rhgs/brick20/brick):215:__call__] RepceClient: call
6799:140380783982400:1593068836.15 pid -> 30703
[2020-06-25 07:07:16.150747] I [resource(worker
/rhgs/brick20/brick):1435:connect_remote] SSH: SSH connection
between master and slave established. duration=1.3712
[2020-06-25 07:07:16.150819] I [resource(worker
/rhgs/brick20/brick):1105:connect] GLUSTER: Mounting gluster
volume locally...
[2020-06-25 07:07:16.265860] D [resource(worker
/rhgs/brick20/brick):879:inhibit] DirectMounter: auxiliary
glusterfs mount in place
[2020-06-25 07:07:17.272511] D [resource(worker
/rhgs/brick20/brick):953:inhibit] DirectMounter: auxiliary
glusterfs mount prepared
[2020-06-25 07:07:17.272708] I [resource(worker
/rhgs/brick20/brick):1128:connect] GLUSTER: Mounted gluster
volume duration=1.1218
[2020-06-25 07:07:17.272794] I [subcmds(worker
/rhgs/brick20/brick):84:subcmd_worker] <top>: Worker spawn
successful. Acknowledging back to monitor
[2020-06-25 07:07:17.272973] D [master(worker
/rhgs/brick20/brick):104:gmaster_builder] <top>: setting up
change detection mode mode=xsync
[2020-06-25 07:07:17.273063] D [monitor(monitor):273:monitor]
Monitor: worker(/rhgs/brick20/brick) connected
[2020-06-25 07:07:17.273678] D [master(worker
/rhgs/brick20/brick):104:gmaster_builder] <top>: setting up
change detection mode mode=changelog
[2020-06-25 07:07:17.274224] D [master(worker
/rhgs/brick20/brick):104:gmaster_builder] <top>: setting up
change detection mode mode=changeloghistory
[2020-06-25 07:07:17.276484] D [repce(worker
/rhgs/brick20/brick):195:push] RepceClient: call
6799:140380783982400:1593068837.28 version() ...
[2020-06-25 07:07:17.276916] D [repce(worker
/rhgs/brick20/brick):215:__call__] RepceClient: call
6799:140380783982400:1593068837.28 version -> 1.0
[2020-06-25 07:07:17.277009] D [master(worker
/rhgs/brick20/brick):777:setup_working_dir] _GMaster: changelog
working dir
/var/lib/misc/gluster/gsyncd/prd_mx_intvol_bxts470190_prd_mx_intvol/rhgs-brick20-brick
[2020-06-25 07:07:17.277098] D [repce(worker
/rhgs/brick20/brick):195:push] RepceClient: call
6799:140380783982400:1593068837.28 init() ...
[2020-06-25 07:07:17.292944] D [repce(worker
/rhgs/brick20/brick):215:__call__] RepceClient: call
6799:140380783982400:1593068837.28 init -> None
[2020-06-25 07:07:17.293097] D [repce(worker
/rhgs/brick20/brick):195:push] RepceClient: call
6799:140380783982400:1593068837.29
register('/rhgs/brick20/brick',
'/var/lib/misc/gluster/gsyncd/prd_mx_intvol_bxts470190_prd_mx_intvol/rhgs-brick20-brick',
'/var/log/glusterfs/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/changes-rhgs-brick20-brick.log',
8, 5) ...
[2020-06-25 07:07:19.296294] E [repce(agent
/rhgs/brick20/brick):121:worker] <top>: call failed:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line
117, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File
"/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py",
line 40, in register
return Changes.cl_register(cl_brick, cl_dir, cl_log,
cl_level, retries)
File
"/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
line 46, in cl_register
cls.raise_changelog_err()
File
"/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
line 30, in raise_changelog_err
raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory
[2020-06-25 07:07:19.297161] E [repce(worker
/rhgs/brick20/brick):213:__call__] RepceClient: call failed
call=6799:140380783982400:1593068837.29 method=register
error=ChangelogException
[2020-06-25 07:07:19.297338] E [resource(worker
/rhgs/brick20/brick):1286:service_loop] GLUSTER: Changelog
register failed error=[Errno 2] No such file or directory
[2020-06-25 07:07:19.315074] I [repce(agent
/rhgs/brick20/brick):96:service_loop] RepceServer: terminating on
reaching EOF.
[2020-06-25 07:07:20.275701] I [monitor(monitor):280:monitor]
Monitor: worker died in startup phase brick=/rhgs/brick20/brick
[2020-06-25 07:07:20.277383] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus:
Worker Status Change status=Faulty
We’ve done everything we can think of, including an “strace -f” on the
pid, and we can’t really find anything. I’m about to lose the last of
my hair over this, so does anyone have any ideas at all? We’ve even
removed the entire slave volume and rebuilt it.
Thanks
Rob
*Rob Quagliozzi*
*Specialised Application Support*
________
Community Meeting Calendar:
Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users