I think I get what's happening. The geo-rep session is non-root. Could you run readlink on the brick path mentioned above, /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e, as the geoaccount user and see if you get "Permission denied" errors?
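To narrow down which directory blocks access, here is a minimal sketch (a hypothetical helper, not part of gsyncd; it assumes Python 3 on the slave, and the brick path is the one from the logs) that reports the first ancestor directory the invoking user cannot search, which is the permission readlink() needs:

```python
import os

# Hypothetical helper: walk the ancestors of the given path and return
# the first directory the current user cannot search (x permission),
# or None if the whole chain is traversable.
def first_blocked_component(path):
    probe = os.sep
    for part in os.path.dirname(path).strip(os.sep).split(os.sep):
        probe = os.path.join(probe, part)
        if not os.access(probe, os.X_OK):
            return probe
    return None

if __name__ == "__main__":
    # Run this as the geo-rep user, e.g.: sudo -u geoaccount python3 check.py
    gfid_path = ("/bricks/brick1/brick/.glusterfs/29/d1/"
                 "29d1d60d-1ad6-45fc-87e0-93d478f7331e")
    print(first_blocked_component(gfid_path) or "all ancestors traversable")
```

Running it once as root and once as geoaccount should show whether the failure is a plain directory-permission issue.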
Thanks,
Kotresh HR

On Mon, Sep 24, 2018 at 7:35 PM Kotte, Christian (Ext) <[email protected]> wrote:

> Ok. It happens on all slave nodes (and on the interimmaster as well).
>
> It's like I assumed. These are the logs of one of the slaves:
>
> gsyncd.log:
>
> [2018-09-24 13:52:25.418382] I [repce(slave slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
> [2018-09-24 13:52:37.95297] W [gsyncd(slave slave/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
> [2018-09-24 13:52:37.109643] I [resource(slave slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
> [2018-09-24 13:52:38.303920] I [resource(slave slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1941
> [2018-09-24 13:52:38.304771] I [resource(slave slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
> [2018-09-24 13:52:41.981554] I [resource(slave slave/bricks/brick1/brick):598:entry_ops] <top>: Special case: rename on mkdir gfid=29d1d60d-1ad6-45fc-87e0-93d478f7331e entry='.gfid/6b97b987-8aef-46c3-af27-20d3aa883016/New folder'
> [2018-09-24 13:52:42.45641] E [repce(slave slave/bricks/brick1/brick):105:worker] <top>: call failed:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
>     res = getattr(self.obj, rmeth)(*in_data[2:])
>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 599, in entry_ops
>     src_entry = get_slv_dir_path(slv_host, slv_volume, gfid)
>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 682, in get_slv_dir_path
>     [ENOENT], [ESTALE])
>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 540, in errno_wrap
>     return call(*arg)
> OSError: [Errno 13] Permission denied: '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'
> [2018-09-24 13:52:42.81794] I [repce(slave slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
> [2018-09-24 13:52:53.459676] W [gsyncd(slave slave/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
> [2018-09-24 13:52:53.473500] I [resource(slave slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
> [2018-09-24 13:52:54.659044] I [resource(slave slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1854
> [2018-09-24 13:52:54.659837] I [resource(slave slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
>
> The folder "New folder" was created via Samba and was renamed by my colleague right away after creation.
>
> [root@slave glustervol1_slave_glustervol1]# ls /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
> [root@slave glustervol1_slave_glustervol1]# ls /bricks/brick1/brick/.glusterfs/29/d1/ -al
> total 0
> drwx--S---+  2 root AD+group 50 Sep 21 09:39 .
> drwx--S---+ 11 root AD+group 96 Sep 21 09:39 ..
> lrwxrwxrwx.  1 root AD+group 75 Sep 21 09:39 29d1d60d-1ad6-45fc-87e0-93d478f7331e -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager
>
> I tried creating the folder in /bricks/brick1/brick/.glusterfs/6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/, but it didn't change anything.
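For context on why this crashes the worker: the traceback shows get_slv_dir_path calling through errno_wrap with only [ENOENT] and [ESTALE] whitelisted, so EACCES is re-raised instead of being tolerated. A simplified sketch of that wrapper pattern (not the actual gsyncd implementation; names shortened):

```python
import time

# Simplified sketch of the errno_wrap pattern visible in the traceback
# (syncdutils.py line 540) -- not the real gsyncd code. Whitelisted
# errnos are ignored or retried; anything else, including EACCES
# ("Errno 13"), is re-raised and takes the worker down.
def errno_wrap(call, args=(), ignore=(), retry=()):
    attempts = 0
    while True:
        try:
            return call(*args)
        except OSError as ex:
            if ex.errno in ignore:
                return None        # e.g. ENOENT: entry vanished, harmless
            if ex.errno not in retry:
                raise              # EACCES takes this branch
            attempts += 1
            if attempts > 4:
                raise              # e.g. ESTALE that never resolves
            time.sleep(0.1)

# Judging by the traceback, get_slv_dir_path does something like:
#   errno_wrap(os.readlink, [gfid_backend_path], [ENOENT], [ESTALE])
```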
> mnt-slave-bricks-brick1-brick.log:
>
> [2018-09-24 13:51:10.625723] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
> [2018-09-24 13:51:10.626092] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
> [2018-09-24 13:51:10.626181] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-glustervol1-client-0: changing port to 49152 (from 0)
> [2018-09-24 13:51:10.643111] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
> [2018-09-24 13:51:10.643489] W [dict.c:923:str_to_data] (-->/usr/lib64/glusterfs/4.1.3/xlator/protocol/client.so(+0x4131a) [0x7fafb023831a] -->/lib64/libglusterfs.so.0(dict_set_str+0x16) [0x7fafbdb83266] -->/lib64/libglusterfs.so.0(str_to_data+0x91) [0x7fafbdb7fea1] ) 0-dict: value is NULL [Invalid argument]
> [2018-09-24 13:51:10.643507] I [MSGID: 114006] [client-handshake.c:1308:client_setvolume] 0-glustervol1-client-0: failed to set process-name in handshake msg
> [2018-09-24 13:51:10.643541] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
> [2018-09-24 13:51:10.671460] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-glustervol1-client-0: Connected to glustervol1-client-0, attached to remote volume '/bricks/brick1/brick'.
> [2018-09-24 13:51:10.672694] I [fuse-bridge.c:4294:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
> [2018-09-24 13:51:10.672715] I [fuse-bridge.c:4927:fuse_graph_sync] 0-fuse: switched to graph 0
> [2018-09-24 13:51:10.673329] I [MSGID: 109005] [dht-selfheal.c:2342:dht_selfheal_directory] 0-glustervol1-dht: Directory selfheal failed: Unable to form layout for directory /
> [2018-09-24 13:51:16.116458] I [fuse-bridge.c:5199:fuse_thread_proc] 0-fuse: initating unmount of /var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E
> [2018-09-24 13:51:16.116595] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7fafbc9eee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55d5dac5dd65] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55d5dac5db8b] ) 0-: received signum (15), shutting down
> [2018-09-24 13:51:16.116616] I [fuse-bridge.c:5981:fini] 0-fuse: Unmounting '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.
> [2018-09-24 13:51:16.116625] I [fuse-bridge.c:5986:fini] 0-fuse: Closing fuse connection to '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.
>
> Regards,
> Christian
>
> From: Kotresh Hiremath Ravishankar <[email protected]>
> Date: Saturday, 22 September 2018 at 06:52
> To: "Kotte, Christian (Ext)" <[email protected]>
> Cc: Gluster Users <[email protected]>
> Subject: Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied
>
> The problem occurred on the slave side, and its error is propagated to the master. Almost any traceback involving repce points to a problem on the slave. Check a few lines above in the log to find the slave node the crashed worker was connected to, and get the geo-replication logs from that node to debug further.
On Fri, 21 Sep 2018, 20:10 Kotte, Christian (Ext) <[email protected]> wrote:

> Hi,
>
> Any idea how to troubleshoot this?
>
> New folders and files were created on the master and the replication went faulty. They were created via Samba.
>
> Version: GlusterFS 4.1.3
>
> [root@master]# gluster volume geo-replication status
>
> MASTER NODE    MASTER VOL     MASTER BRICK            SLAVE USER    SLAVE                                          SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------
> master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@slave_1::glustervol1          N/A           Faulty    N/A             N/A
> master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@slave_2::glustervol1          N/A           Faulty    N/A             N/A
> master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@interimmaster::glustervol1    N/A           Faulty    N/A             N/A
>
> The following error is repeatedly logged in the gsyncd.logs:
>
> [2018-09-21 14:26:38.611479] I [repce(agent /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
> [2018-09-21 14:26:39.211527] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase brick=/bricks/brick1/brick
> [2018-09-21 14:26:39.214322] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
> [2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/bricks/brick1/brick slave_node=nrchbs-slp2020.nibr.novartis.net
> [2018-09-21 14:26:49.471532] I [gsyncd(agent /bricks/brick1/brick):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
> [2018-09-21 14:26:49.473917] I [changelogagent(agent /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
> [2018-09-21 14:26:49.491359] I [gsyncd(worker /bricks/brick1/brick):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
> [2018-09-21 14:26:49.538049] I [resource(worker /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-09-21 14:26:53.5017] I [resource(worker /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between master and slave established. duration=3.4665
> [2018-09-21 14:26:53.5419] I [resource(worker /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
> [2018-09-21 14:26:54.120374] I [resource(worker /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1146
> [2018-09-21 14:26:54.121012] I [subcmds(worker /bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful.
> Acknowledging back to monitor
> [2018-09-21 14:26:56.144460] I [master(worker /bricks/brick1/brick):1593:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/bricks-brick1-brick
> [2018-09-21 14:26:56.145145] I [resource(worker /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time time=1537540016
> [2018-09-21 14:26:56.160064] I [gsyncdstatus(worker /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change status=Active
> [2018-09-21 14:26:56.161175] I [gsyncdstatus(worker /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
> [2018-09-21 14:26:56.161536] I [master(worker /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl turns=1 stime=(1537522637, 0) entry_stime=(1537537141, 0) etime=1537540016
> [2018-09-21 14:26:56.164277] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time stime=(1537522637, 0)
> [2018-09-21 14:26:56.197065] I [master(worker /bricks/brick1/brick):1360:process] _GMaster: Skipping already processed entry ops to_changelog=1537522638 num_changelogs=1 from_changelog=1537522638
> [2018-09-21 14:26:56.197402] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=1
> [2018-09-21 14:26:56.197623] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken SETA=0 SETX=0 meta_duration=0.0000 data_duration=0.0284 DATA=0 XATT=0
> [2018-09-21 14:26:56.198230] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed changelog_end=1537522638 entry_stime=(1537537141, 0) changelog_start=1537522638 stime=(1537522637, 0) duration=0.0333 num_changelogs=1 mode=history_changelog
> [2018-09-21 14:26:57.200436] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time stime=(1537522637, 0)
> [2018-09-21 14:26:57.528625] E [repce(worker /bricks/brick1/brick):197:__call__] RepceClient: call failed call=17209:140650361157440:1537540017.21 method=entry_ops error=OSError
> [2018-09-21 14:26:57.529371] E [syncdutils(worker /bricks/brick1/brick):332:log_raise_exception] <top>: FAIL:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
>     func(args)
>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
>     local.service_loop(remote)
>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288, in service_loop
>     g3.crawlwrap(oneshot=True)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in crawlwrap
>     self.crawl()
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in crawl
>     self.changelogs_batch_process(changes)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in changelogs_batch_process
>     self.process(batch)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in process
>     self.process_change(change, done, retry)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in process_change
>     failures = self.slave.server.entry_ops(entries)
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in __call__
>     return self.ins(self.meth, *a)
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in __call__
>     raise res
> OSError: [Errno 13] Permission denied: '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'
>
> The permissions look fine. The replication runs as the geo user instead of root. It should be able to read, but I'm not sure whether the syncdaemon actually runs as geoaccount!?
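One point worth noting about this traceback: the last two frames (`return self.ins(self.meth, *a)` / `raise res`) are the repce RPC layer re-raising the slave's exception on the master, which is why the master's log ends in an error that actually occurred on the slave. A toy sketch of that round trip (pickle is used here purely for illustration; this is not the real repce wire protocol):

```python
import errno
import pickle

# Toy model of the repce round trip implied by the traceback: the slave
# catches its local exception, ships it back serialized, and the master
# re-raises it ("raise res" in repce.py). The slave's OSError therefore
# surfaces in the master's gsyncd.log.

def slave_entry_ops(entries):
    # Stand-in for the slave-side handler that hits the readlink error.
    raise OSError(errno.EACCES, "Permission denied",
                  "/bricks/brick1/brick/.glusterfs/29/d1/"
                  "29d1d60d-1ad6-45fc-87e0-93d478f7331e")

def slave_dispatch(payload):
    method, args = pickle.loads(payload)
    handlers = {"entry_ops": slave_entry_ops}
    try:
        return pickle.dumps((handlers[method](*args), False))
    except OSError as ex:
        return pickle.dumps((ex, True))

def master_call(method, *args):
    res, is_exc = pickle.loads(slave_dispatch(pickle.dumps((method, args))))
    if is_exc:
        raise res  # analogous to "raise res" in repce.py __call__
    return res
```

So the fix has to happen on the slave node; restarting the master worker only reproduces the same crash.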
> [root@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
> lrwxrwxrwx. 1 root root 75 Sep 21 09:39 /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager
>
> [root@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
> total 4
> drwxrwxr-x. 2 AD+user AD+group  131 Sep 21 10:14 6.7
> drwxrwxr-x. 2 AD+user AD+group 4096 Sep 21 09:43 7.0
> drwxrwxr-x. 2 AD+user AD+group   57 Sep 21 10:28 7.5
>
> It's possible that the folder was renamed. I have had 3 similar issues since I migrated to GlusterFS 4.x but couldn't investigate much. I had to completely wipe GlusterFS and the geo-replication to get rid of this error...
>
> Any help is appreciated.
>
> Regards,
>
> Christian Kotte
>
> _______________________________________________
> Gluster-users mailing list
> [email protected]
> https://lists.gluster.org/mailman/listinfo/gluster-users

--
Thanks and Regards,
Kotresh H R
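One detail stands out when comparing the listings: the master resolves the symlink fine as root, but on the slave the parent directory .glusterfs/29/d1 is drwx--S--- and owned by root, so a non-root geo-rep user may be unable to traverse it even though the symlink itself is world-readable. A small hypothetical helper (standard library only) that prints mode and owner for every ancestor of a path, so the blocking directory is visible at a glance:

```python
import os
import pwd
import stat

# Hypothetical helper: print mode and owner of a path and each ancestor.
# readlink() needs search (x) permission on every ancestor, so any
# root-owned drwx--S--- directory in this chain blocks a non-root user.
def explain_path(path):
    rows = []
    probe = path
    while True:
        try:
            st = os.lstat(probe)
            try:
                owner = pwd.getpwuid(st.st_uid).pw_name
            except KeyError:
                owner = str(st.st_uid)
            rows.append((probe, stat.filemode(st.st_mode), owner))
        except OSError as ex:
            rows.append((probe, "?", ex.strerror))
        parent = os.path.dirname(probe)
        if parent == probe:
            break
        probe = parent
    for p, mode, owner in reversed(rows):
        print("%-12s %-10s %s" % (mode, owner, p))
    return rows

# e.g. explain_path("/bricks/brick1/brick/.glusterfs/29/d1/"
#                   "29d1d60d-1ad6-45fc-87e0-93d478f7331e")
```

Running this on the slave brick path should make it obvious whether the geo-rep user can get all the way down to the gfid symlink.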
