Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

Kotte, Christian (Ext) Mon, 24 Sep 2018 08:10:38 -0700

Yeah right. I get permission denied.

[geoaccount@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
ls: cannot access /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e: Permission denied
[geoaccount@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/d1/
ls: cannot access /bricks/brick1/brick/.glusterfs/29/d1/: Permission denied
[geoaccount@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/
ls: cannot access /bricks/brick1/brick/.glusterfs/29/: Permission denied
[geoaccount@slave ~]$ ll /bricks/brick1/brick/.glusterfs/
ls: cannot open directory /bricks/brick1/brick/.glusterfs/: Permission denied


[root@slave ~]# ll /bricks/brick1/brick/.glusterfs/29
total 0
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 16
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 33
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 5e
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 73
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 b2
drwx--S---+ 2 root AD+group 50 Sep 21 09:39 d1
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 d7
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 e6
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 eb
[root@slave ~]#

However, the strange thing is that new files and folders replicated fine 
before. Replication has been broken since the “New folder” was created.

These are the permissions on a dev/test system:
[root@slave-dev ~]# ll /bricks/brick1/brick/.glusterfs/
total 3136
drwx------. 44 root root    4096 Aug 22 18:19 00
drwx------. 50 root root    4096 Sep 12 13:14 01
drwx------. 54 root root    4096 Sep 13 11:33 02
drwx------. 59 root root    4096 Aug 22 18:21 03
drwx------. 60 root root    4096 Sep 12 13:14 04
drwx------. 68 root root    4096 Aug 24 12:36 05
drwx------. 56 root root    4096 Aug 22 18:21 06
drwx------. 46 root root    4096 Aug 22 18:21 07
drwx------. 51 root root    4096 Aug 22 18:21 08
drwx------. 42 root root    4096 Aug 22 18:21 09
drwx------. 44 root root    4096 Sep 13 11:16 0a

I’ve configured an AD group, SGID bit, and ACLs via Ansible on the local mount 
point. Could this be an issue? Should I avoid configuring the permissions on 
.glusterfs and below?
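
For reading the listings above, here is a minimal Python sketch that decodes an `ls -l` mode string into its permission bits. The helper name `decode_ls_mode` is hypothetical and not part of GlusterFS. It only interprets the text that `ls` prints. When a trailing `+` is present, the group column reflects the ACL mask, so `getfacl` is still needed for the full picture.

```python
import stat

# The nine basic permission bits, in the order ls prints them.
_PLAIN_BITS = [
    stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR,
    stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP,
    stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH,
]
# Positions where ls overloads the execute column with a special bit.
_SPECIAL_BITS = {2: stat.S_ISUID, 5: stat.S_ISGID, 8: stat.S_ISVTX}


def decode_ls_mode(mode_str):
    """Decode a mode string such as 'drwx--S---+' (hypothetical helper)."""
    bits = 0
    for i, ch in enumerate(mode_str[1:10]):
        if ch == "-":
            continue
        if i in _SPECIAL_BITS and ch in "sStT":
            bits |= _SPECIAL_BITS[i]
            if ch in "st":  # lowercase means the execute bit is set as well
                bits |= _PLAIN_BITS[i]
        else:
            bits |= _PLAIN_BITS[i]
    return {
        "type": mode_str[0],
        "mode": bits,
        "has_acl": mode_str[10:11] == "+",
        # Without this bit, users who match neither owner nor group
        # cannot traverse the directory.
        "other_can_traverse": bool(bits & stat.S_IXOTH),
    }
```

For example, `oct(decode_ls_mode("drwx--S---+")["mode"])` gives `0o2700` (setgid set, no group execute, nothing for others), while `drwxrwsr-x+` decodes to `0o2775`, the mode configured in the Ansible variables.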

# ll /mnt/glustervol1/
total 12
drwxrwsr-x+  4 AD+user AD+group 4096 Jul 13 07:46 Scripts
drwxrwxr-x+ 10 AD+user AD+group 4096 Jun 12 12:03 Software
-rw-rw-r--+  1 root    AD+group    0 Aug  8 08:44 test
drwxr-xr-x+  6 AD+user AD+group 4096 Apr 18 10:58 tftp

glusterfs_volumes:
[…]
    permissions:
      mode: "02775"
      owner: root
      group: "AD+group"
      acl_permissions: rw
[…]

# root directory is owned by root.
# set permissions to 'g+s' to automatically set the group to "AD+group"
# permissions of individual files will be set by Samba during creation
- name: Configure volume directory permission 1/2
  tags: glusterfs
  file:
    path: /mnt/{{ item.volume }}
    state: directory
    mode: "{{ item.permissions.mode }}"
    owner: "{{ item.permissions.owner }}"
    group: "{{ item.permissions.group }}"
  with_items: "{{ glusterfs_volumes }}"
  loop_control:
    label: "{{ item.volume }}"
  when: item.permissions is defined

# ACL needs to be set to override default umask and grant "AD+group" write permissions
- name: Configure volume directory permission 2/2 (ACL)
  tags: glusterfs
  acl:
    path: /mnt/{{ item.volume }}
    default: yes
    entity: "{{ item.permissions.group }}"
    etype: group
    permissions: "{{ item.permissions.acl_permissions }}"
    state: present
  with_items: "{{ glusterfs_volumes }}"
  loop_control:
    label: "{{ item.volume }}"
  when: item.permissions is defined

Regards,
Christian

From: Kotresh Hiremath Ravishankar <[email protected]>
Date: Monday, 24. September 2018 at 16:20
To: "Kotte, Christian (Ext)" <[email protected]>
Cc: Gluster Users <[email protected]>
Subject: Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: 
[Errno 13] Permission denied

I think I get what's happening. The geo-rep session is non-root.
Could you run readlink on the brick path mentioned above,
/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e,
as the geoaccount user and see whether you get "Permission denied" errors?
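
To narrow down where access stops, a small Python sketch like the one below could be run as the geoaccount user. It walks the ancestors of a path from the root downwards and reports the first directory the current user cannot search. The function name `first_untraversable_dir` is hypothetical. This is only a diagnostic aid under the assumption that plain permission bits or ACLs are the cause. It does not reproduce what gsyncd does internally.

```python
import os


def first_untraversable_dir(path):
    """Return the first ancestor directory of `path` that the current user
    cannot search (or that does not exist), or None if all are searchable."""
    path = os.path.abspath(path)
    ancestors = []
    head = os.path.dirname(path)
    while True:
        ancestors.append(head)
        parent = os.path.dirname(head)
        if parent == head:  # reached the filesystem root
            break
        head = parent
    # Check from the root downwards so the outermost blocker is reported.
    for directory in reversed(ancestors):
        if not os.access(directory, os.X_OK):
            return directory
    return None
```

Calling it with the gfid path from the traceback prints either `None` or the directory that blocks the lookup. Note that `os.access` checks against the real user ID, and that root bypasses these checks, so the result is only meaningful when run as the unprivileged user.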

Thanks,
Kotresh HR

On Mon, Sep 24, 2018 at 7:35 PM Kotte, Christian (Ext) <[email protected]> wrote:
Ok. It happens on all slave nodes (and on the interimmaster as well).

It’s like I assumed. These are the logs of one of the slaves:

gsyncd.log
[2018-09-24 13:52:25.418382] I [repce(slave 
slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on 
reaching EOF.
[2018-09-24 13:52:37.95297] W [gsyncd(slave 
slave/bricks/brick1/brick):293:main] <top>: Session config file not exists, 
using the default config     
path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
[2018-09-24 13:52:37.109643] I [resource(slave 
slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume 
locally...
[2018-09-24 13:52:38.303920] I [resource(slave 
slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume      
duration=1.1941
[2018-09-24 13:52:38.304771] I [resource(slave 
slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
[2018-09-24 13:52:41.981554] I [resource(slave 
slave/bricks/brick1/brick):598:entry_ops] <top>: Special case: rename on mkdir  
      gfid=29d1d60d-1ad6-45fc-87e0-93d478f7331e       
entry='.gfid/6b97b987-8aef-46c3-af27-20d3aa883016/New folder'
[2018-09-24 13:52:42.45641] E [repce(slave 
slave/bricks/brick1/brick):105:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 599, in 
entry_ops
    src_entry = get_slv_dir_path(slv_host, slv_volume, gfid)
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 682, in 
get_slv_dir_path
    [ENOENT], [ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 540, in 
errno_wrap
    return call(*arg)
OSError: [Errno 13] Permission denied: 
'/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'
[2018-09-24 13:52:42.81794] I [repce(slave 
slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on 
reaching EOF.
[2018-09-24 13:52:53.459676] W [gsyncd(slave 
slave/bricks/brick1/brick):293:main] <top>: Session config file not exists, 
using the default config    
path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
[2018-09-24 13:52:53.473500] I [resource(slave 
slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume 
locally...
[2018-09-24 13:52:54.659044] I [resource(slave 
slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume      
duration=1.1854
[2018-09-24 13:52:54.659837] I [resource(slave 
slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening

The folder “New folder” was created via Samba, and my colleague renamed it 
right after creation.
[root@slave glustervol1_slave_glustervol1]# ls /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
[root@slave glustervol1_slave_glustervol1]# ls /bricks/brick1/brick/.glusterfs/29/d1/ -al
total 0
drwx--S---+  2 root AD+group 50 Sep 21 09:39 .
drwx--S---+ 11 root AD+group 96 Sep 21 09:39 ..
lrwxrwxrwx.  1 root AD+group 75 Sep 21 09:39 29d1d60d-1ad6-45fc-87e0-93d478f7331e -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager

I tried creating the folder in 
/bricks/brick1/brick/.glusterfs/6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/, 
but it didn’t change anything.

mnt-slave-bricks-brick1-brick.log
[2018-09-24 13:51:10.625723] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-glustervol1-client-0: error returned while attempting to connect to 
host:(null), port:0
[2018-09-24 13:51:10.626092] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-glustervol1-client-0: error returned while attempting to connect to 
host:(null), port:0
[2018-09-24 13:51:10.626181] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 
0-glustervol1-client-0: changing port to 49152 (from 0)
[2018-09-24 13:51:10.643111] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-glustervol1-client-0: error returned while attempting to connect to 
host:(null), port:0
[2018-09-24 13:51:10.643489] W [dict.c:923:str_to_data] 
(-->/usr/lib64/glusterfs/4.1.3/xlator/protocol/client.so(+0x4131a) 
[0x7fafb023831a] -->/lib64/libglusterfs.so.0(dict_set_str+0x16) 
[0x7fafbdb83266] -->/lib64/libglusterfs.so.0(str_to_data+0x91) [0x7fafbdb7fea1] 
) 0-dict: value is NULL [Invalid argument]
[2018-09-24 13:51:10.643507] I [MSGID: 114006] 
[client-handshake.c:1308:client_setvolume] 0-glustervol1-client-0: failed to 
set process-name in handshake msg
[2018-09-24 13:51:10.643541] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-glustervol1-client-0: error returned while attempting to connect to 
host:(null), port:0
[2018-09-24 13:51:10.671460] I [MSGID: 114046] 
[client-handshake.c:1176:client_setvolume_cbk] 0-glustervol1-client-0: 
Connected to glustervol1-client-0, attached to remote volume 
'/bricks/brick1/brick'.
[2018-09-24 13:51:10.672694] I [fuse-bridge.c:4294:fuse_init] 0-glusterfs-fuse: 
FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2018-09-24 13:51:10.672715] I [fuse-bridge.c:4927:fuse_graph_sync] 0-fuse: 
switched to graph 0
[2018-09-24 13:51:10.673329] I [MSGID: 109005] 
[dht-selfheal.c:2342:dht_selfheal_directory] 0-glustervol1-dht: Directory 
selfheal failed: Unable to form layout for directory /
[2018-09-24 13:51:16.116458] I [fuse-bridge.c:5199:fuse_thread_proc] 0-fuse: 
initating unmount of /var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E
[2018-09-24 13:51:16.116595] W [glusterfsd.c:1514:cleanup_and_exit] 
(-->/lib64/libpthread.so.0(+0x7e25) [0x7fafbc9eee25] 
-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55d5dac5dd65] 
-->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55d5dac5db8b] ) 0-: received 
signum (15), shutting down
[2018-09-24 13:51:16.116616] I [fuse-bridge.c:5981:fini] 0-fuse: Unmounting 
'/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.
[2018-09-24 13:51:16.116625] I [fuse-bridge.c:5986:fini] 0-fuse: Closing fuse 
connection to '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.

Regards,
Christian

From: Kotresh Hiremath Ravishankar <[email protected]>
Date: Saturday, 22. September 2018 at 06:52
To: "Kotte, Christian (Ext)" <[email protected]>
Cc: Gluster Users <[email protected]>
Subject: Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: 
[Errno 13] Permission denied

The problem occurred on the slave side, and the error was propagated to the 
master. Almost any traceback that involves repce points to a problem on the 
slave. Check a few lines above in the log to find the slave node that the 
crashed worker is connected to, then get the geo-replication logs from that 
node to debug further.
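
As a rough illustration of that lookup, the following Python sketch scans a master-side gsyncd.log and pairs each error line with the most recent `slave_node=` value seen before it. The function name and the two regular expressions are assumptions based only on the log excerpts quoted in this thread. The log format may differ between GlusterFS versions.

```python
import re

# Assumed from the excerpts in this thread: "[timestamp] LEVEL [origin] ..."
LEVEL_RE = re.compile(r"^\[[^\]]+\] ([A-Z]) \[")
# The monitor logs the target node as "slave_node=<host>" when it starts a worker.
SLAVE_NODE_RE = re.compile(r"slave_node=(\S+)")


def slave_nodes_for_errors(lines):
    """Return (slave_node, error_line) pairs for every 'E' level line."""
    last_node = None
    hits = []
    for line in lines:
        node = SLAVE_NODE_RE.search(line)
        if node:
            last_node = node.group(1)
            continue
        level = LEVEL_RE.match(line)
        if level and level.group(1) == "E":
            hits.append((last_node, line.rstrip("\n")))
    return hits
```

It could be fed an open log file, for example `slave_nodes_for_errors(open(path))`, where `path` is wherever the session's gsyncd.log lives on the master. The slave node it reports is then the machine whose geo-replication logs are worth reading.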





On Fri, 21 Sep 2018, 20:10 Kotte, Christian (Ext), <[email protected]> wrote:
Hi,

Any idea how to troubleshoot this?

New folders and files were created on the master and the replication went 
faulty. They were created via Samba.

Version: GlusterFS 4.1.3

[root@master]# gluster volume geo-replication status

MASTER NODE    MASTER VOL     MASTER BRICK            SLAVE USER    SLAVE                                          SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@slave_1::glustervol1          N/A           Faulty    N/A             N/A
master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@slave_2::glustervol1          N/A           Faulty    N/A             N/A
master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@interimmaster::glustervol1    N/A           Faulty    N/A             N/A

The following error is repeatedly logged in the gsyncd.logs:
[2018-09-21 14:26:38.611479] I [repce(agent 
/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-21 14:26:39.211527] I [monitor(monitor):279:monitor] Monitor: worker 
died in startup phase     brick=/bricks/brick1/brick
[2018-09-21 14:26:39.214322] I [gsyncdstatus(monitor):244:set_worker_status] 
GeorepStatus: Worker Status Change status=Faulty
[2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor: starting 
gsyncd worker   brick=/bricks/brick1/brick      
slave_node=nrchbs-slp2020.nibr.novartis.net
[2018-09-21 14:26:49.471532] I [gsyncd(agent /bricks/brick1/brick):297:main] 
<top>: Using session config file   
path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
[2018-09-21 14:26:49.473917] I [changelogagent(agent 
/bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
[2018-09-21 14:26:49.491359] I [gsyncd(worker /bricks/brick1/brick):297:main] 
<top>: Using session config file  
path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
[2018-09-21 14:26:49.538049] I [resource(worker 
/bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection 
between master and slave...
[2018-09-21 14:26:53.5017] I [resource(worker 
/bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between master 
and slave established.      duration=3.4665
[2018-09-21 14:26:53.5419] I [resource(worker 
/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-21 14:26:54.120374] I [resource(worker 
/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume     
duration=1.1146
[2018-09-21 14:26:54.121012] I [subcmds(worker 
/bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful. 
Acknowledging back to monitor
[2018-09-21 14:26:56.144460] I [master(worker 
/bricks/brick1/brick):1593:register] _GMaster: Working dir        
path=/var/lib/misc/gluster/gsyncd/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/bricks-brick1-brick
[2018-09-21 14:26:56.145145] I [resource(worker 
/bricks/brick1/brick):1282:service_loop] GLUSTER: Register time time=1537540016
[2018-09-21 14:26:56.160064] I [gsyncdstatus(worker 
/bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change    
status=Active
[2018-09-21 14:26:56.161175] I [gsyncdstatus(worker 
/bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status 
Change        status=History Crawl
[2018-09-21 14:26:56.161536] I [master(worker /bricks/brick1/brick):1507:crawl] 
_GMaster: starting history crawl        turns=1 stime=(1537522637, 0)   
entry_stime=(1537537141, 0)     etime=1537540016
[2018-09-21 14:26:56.164277] I [master(worker /bricks/brick1/brick):1536:crawl] 
_GMaster: slave's time  stime=(1537522637, 0)
[2018-09-21 14:26:56.197065] I [master(worker 
/bricks/brick1/brick):1360:process] _GMaster: Skipping already processed entry 
ops        to_changelog=1537522638 num_changelogs=1        
from_changelog=1537522638
[2018-09-21 14:26:56.197402] I [master(worker 
/bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken    MKD=0   MKN=0 
  LIN=0   SYM=0   REN=0   RMD=0   CRE=0   duration=0.0000 UNL=1
[2018-09-21 14:26:56.197623] I [master(worker 
/bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken    
SETA=0  SETX=0  meta_duration=0.0000    data_duration=0.0284    DATA=0  XATT=0
[2018-09-21 14:26:56.198230] I [master(worker 
/bricks/brick1/brick):1394:process] _GMaster: Batch Completed     
changelog_end=1537522638        entry_stime=(1537537141, 0)     
changelog_start=1537522638      stime=(1537522637, 0)   duration=0.0333 
num_changelogs=1        mode=history_changelog
[2018-09-21 14:26:57.200436] I [master(worker /bricks/brick1/brick):1536:crawl] 
_GMaster: slave's time  stime=(1537522637, 0)
[2018-09-21 14:26:57.528625] E [repce(worker 
/bricks/brick1/brick):197:__call__] RepceClient: call failed       
call=17209:140650361157440:1537540017.21        method=entry_ops        
error=OSError
[2018-09-21 14:26:57.529371] E [syncdutils(worker 
/bricks/brick1/brick):332:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
    func(args)
  File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in 
subcmd_worker
    local.service_loop(remote)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288, in 
service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in 
crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in 
changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in 
process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in 
process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in 
__call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in 
__call__
    raise res
OSError: [Errno 13] Permission denied: 
'/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'

The permissions look fine. Replication runs as the geo user (geoaccount) 
instead of root. It should be able to read, but I’m not sure whether the 
syncdaemon actually runs as geoaccount.

[root@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
lrwxrwxrwx. 1 root root 75 Sep 21 09:39 /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager

[root@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
total 4
drwxrwxr-x. 2 AD+user AD+group  131 Sep 21 10:14 6.7
drwxrwxr-x. 2 AD+user AD+group 4096 Sep 21 09:43 7.0
drwxrwxr-x. 2 AD+user AD+group   57 Sep 21 10:28 7.5
[root@master vRealize Operation Manager]#

It is possible that the folder was renamed. I have had 3 similar issues since 
I migrated to GlusterFS 4.x but couldn’t investigate much. I needed to 
completely wipe GlusterFS and geo-replication to get rid of this error…

Any help is appreciated.

Regards,

Christian Kotte
_______________________________________________
Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users


--
Thanks and Regards,
Kotresh H R

