[Gluster-users] Geo-Replication memory leak on slave node

Mark Betham Wed, 06 Jun 2018 06:41:12 -0700

Dear Gluster-Users,

I have geo-replication set up and configured between 2 Gluster pools located
at different sites.  What I am seeing is an error being reported within the
geo-replication slave log as follows:


[2018-06-05 12:05:26.767615] E [syncdutils(slave):331:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 361, in twrap
    tf(*aa)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1009, in <lambda>
    t = syncdutils.Thread(target=lambda: (repce.service_loop(),
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 90, in service_loop
    self.q.put(recv(self.inf))
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 61, in recv
    return pickle.load(inf)
ImportError: No module named h_2013-04-26-04:02:49-2013-04-26_11:02:53.gz.15WBuUh
[2018-06-05 12:05:26.768085] E [repce(slave):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
TypeError: getattr(): attribute name must be string
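
A note on the two errors in the log: the "module" named in the ImportError
looks like a fragment of a file name rather than a Python module.  That is
the kind of error pickle raises when the bytes it is asked to load are not
a well-formed pickle.  The sketch below is an illustration only and is not
taken from the gsyncd source; the byte string and the Dummy class are made
up for the demonstration.  It is written for Python 3, whereas the log
comes from a Python 2 interpreter, so the exact messages may differ.

```python
import pickle

# Hypothetical bytes: the pickle GLOBAL opcode "c" followed by two text lines
# (module name, attribute name) and the STOP opcode ".". If arbitrary data such
# as part of a file name lands where a pickle is expected, the unpickler may
# read it this way and try to import a module that does not exist.
bogus_stream = b"ch_2013-04-26\nanything\n."


def try_load(data):
    """Return the exception raised by pickle.loads, or None on success."""
    try:
        pickle.loads(data)
    except Exception as exc:  # unpickling garbage can raise many error types
        return exc
    return None


err = try_load(bogus_stream)
print(type(err).__name__, err)


class Dummy:
    """Stand-in for the object whose methods an RPC worker dispatches to."""


# If the decoded request is not the expected tuple, the method-name slot may
# hold a non-string, and getattr() rejects it with a TypeError.
type_err = None
try:
    getattr(Dummy(), 12345)
except TypeError as exc:
    type_err = exc
    print("TypeError:", exc)
```

Both failures are what one would expect if the reader of a pickle stream
were handed data it could not interpret, but this sketch does not show why
that would happen in this setup.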

From this point on, the slave server consumes all of its available RAM
until it becomes unresponsive.  Eventually the gluster service seems to
kill off the offending process and the memory is returned to the system.
Once the memory has been returned to the remote slave system,
geo-replication often recovers and data transfer resumes.
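
To put numbers on the memory growth between the error and the point where
the process is killed, the resident memory of the geo-replication processes
on the slave could be sampled from /proc.  This is a minimal sketch,
assuming Linux and assuming the processes can be recognised by the string
"gsyncd" in their command line; the helper names are made up for this
example.

```python
import os


def parse_vmrss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # e.g. kernel threads have no VmRSS line


def rss_by_pid(match="gsyncd", proc_root="/proc"):
    """Map pid -> resident memory in kB for processes whose cmdline contains `match`."""
    usage = {}
    if not os.path.isdir(proc_root):
        return usage
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
            if match not in cmdline:
                continue
            with open(os.path.join(proc_root, entry, "status"), errors="replace") as f:
                rss = parse_vmrss_kb(f.read())
        except OSError:
            continue  # process exited or is not readable
        if rss is not None:
            usage[int(entry)] = rss
    return usage


if __name__ == "__main__":
    for pid, kb in sorted(rss_by_pid().items()):
        print(pid, kb, "kB")
```

Run from cron or a shell loop with a timestamp, the output can be lined up
against the timestamps in the slave log.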

I have attached the full geo-replication slave log containing the error
shown above.  I have also attached an image file showing the memory usage
of the affected storage server.

We are currently running Gluster version 3.12.9 on top of CentOS 7.5
x86_64.  The system has been fully patched and is running the latest
software, excluding glibc which had to be downgraded to get geo-replication
working.

The Gluster volume runs on a dedicated partition using the XFS filesystem
which in turn is running on an LVM thin volume.  The physical storage is
presented as a single drive due to the underlying disks being part of a
RAID 10 array.

The Master volume which is being replicated has a total of 2.2 TB of data
to be replicated.  The total size of the volume fluctuates very little as
data being removed equals the new data coming in.  This data is made up of
many thousands of files across many separate directories.  Data file sizes
vary from the very small (<1K) to the large (>1GB).  The Gluster service
itself is running with a single volume in a replicated configuration across
3 bricks at each of the sites.  The delta changes being replicated are on
average about 100GB per day, where this includes file creation / deletion /
modification.

The config for the geo-replication session, taken from the current source
server, is as follows:

special_sync_mode: partial
gluster_log_file: /var/log/glusterfs/geo-replication/glustervol0/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1.gluster.log
ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
change_detector: changelog
session_owner: 40e9e77a-034c-44a2-896e-59eec47e8a84
state_file: /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/monitor.status
gluster_params: aux-gfid-mount acl
log_rsync_performance: true
remote_gsyncd: /nonexistent/gsyncd
working_dir: /var/lib/misc/glusterfsd/glustervol0/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1
state_detail_file: /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1-detail.status
gluster_command_dir: /usr/sbin/
pid_file: /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/monitor.pid
georep_session_working_dir: /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/
ssh_command_tar: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem
master.stime_xattr_name: trusted.glusterfs.40e9e77a-034c-44a2-896e-59eec47e8a84.ccfaed9b-ff4b-4a55-acfa-03f092cdf460.stime
changelog_log_file: /var/log/glusterfs/geo-replication/glustervol0/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1-changes.log
socketdir: /var/run/gluster
volume_id: 40e9e77a-034c-44a2-896e-59eec47e8a84
ignore_deletes: false
state_socket_unencoded: /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1.socket
log_file: /var/log/glusterfs/geo-replication/glustervol0/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1.log
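
Several of the paths in the config contain a percent-encoded component that
is hard to read at a glance.  As a small aid, the snippet below decodes one
of them with Python's standard library; the value is the last component of
the working_dir entry, and the variable names are only illustrative.

```python
from urllib.parse import unquote

# Last path component of the working_dir entry in the session config.
encoded = "ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1"

decoded = unquote(encoded)
print(decoded)
# prints: ssh://root@storage-server.local:gluster://127.0.0.1:glustervol1
```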

If any further information is required in order to troubleshoot this issue
then please let me know.

I would be very grateful for any help or guidance received.

Many thanks,

Mark Betham.

-- 




This email may contain confidential material; unintended recipients must
not disseminate, use, or act upon any information in it. If you received
this email in error, please contact the sender and permanently delete the
email.

Performance Horizon Group Limited | Registered in England & Wales 07188234
| Level 8, West One, Forth Banks, Newcastle upon Tyne, NE1 3PA

40e9e77a-034c-44a2-896e-59eec47e8a84:storage-server.%2Fdata%2Fbrick0.glustervol1.log
Description: Binary data

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
