On Wed, Jun 8, 2016 at 12:33 PM, Oleksandr Natalenko <oleksa...@natalenko.name> wrote:
> Yup, I can do that, but please note that RSS does not change. Will
> statedump show VIRT values?
>
> Also, I'm looking at the numbers now, and see that on each reconnect VIRT
> grows by ~24M (once per ~10–15 mins). Probably, that could give you some
> idea what is going wrong.
>

That's interesting. I have never seen something like this happen. I would still like to see if there are any clues in the statedump when all this happens. Maybe it will confirm what you said, that nothing new is being allocated, but I would like to verify that.

> 08.06.2016 09:50, Pranith Kumar Karampuri wrote:
>
>> Oleksandr,
>> Could you take a statedump of the shd process once every 5-10 minutes and
>> send maybe 5 samples of them when it starts to increase? This will
>> help us find which datatypes are being allocated a lot and can lead to
>> possible theories for the increase.
>>
>> On Wed, Jun 8, 2016 at 12:03 PM, Oleksandr Natalenko
>> <oleksa...@natalenko.name> wrote:
>>
>>> Also, I've checked shd log files, and found out that for some reason
>>> shd constantly reconnects to bricks: [1]
>>>
>>> Please note that the fix [2] suggested by Pranith does not help, the VIRT
>>> value still grows:
>>>
>>> ===
>>> root 1010 0.0 9.6 7415248 374688 ? Ssl чер07 0:14
>>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>>> /var/lib/glusterd/glustershd/run/glustershd.pid -l
>>> /var/log/glusterfs/glustershd.log -S
>>> /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket
>>> --xlator-option
>>> *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
>>> ===
>>>
>>> I do not know why it is reconnecting, but I suspect the leak
>>> happens on that reconnect.
>>>
>>> CCing Pranith.
>>>
>>> [1] http://termbin.com/brob
>>> [2] http://review.gluster.org/#/c/14053/
>>>
>>> 06.06.2016 12:21, Kaushal M wrote:
>>> Has multi-threaded SHD been merged into 3.7.* by any chance? If not,
>>> what I'm saying below doesn't apply.
>>>
>>> We saw problems when encrypted transports were used, because the RPC
>>> layer was not reaping threads (doing pthread_join) when a connection
>>> ended. This led to similar observations of huge VIRT and relatively
>>> small RSS.
>>>
>>> I'm not sure how multi-threaded shd works, but it could be leaking
>>> threads in a similar way.
>>>
>>> On Mon, Jun 6, 2016 at 1:54 PM, Oleksandr Natalenko
>>> <oleksa...@natalenko.name> wrote:
>>> Hello.
>>>
>>> We use v3.7.11, a replica 2 setup between 2 nodes + 1 dummy node for
>>> keeping volume metadata.
>>>
>>> Now we observe huge VSZ (VIRT) usage by glustershd on the dummy node:
>>>
>>> ===
>>> root 15109 0.0 13.7 76552820 535272 ? Ssl тра26 2:11
>>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>>> /var/lib/glusterd/glustershd/run/glustershd.pid -l
>>> /var/log/glusterfs/glustershd.log -S
>>> /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket
>>> --xlator-option
>>> *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
>>> ===
>>>
>>> That is ~73G. RSS seems to be OK (~522M). Here is the statedump of
>>> the glustershd process: [1]
>>>
>>> Also, here is the sum of the sizes presented in the statedump:
>>>
>>> ===
>>> # cat /var/run/gluster/glusterdump.15109.dump.1465200139 | awk -F '=' 'BEGIN {sum=0} /^size=/ {sum+=$2} END {print sum}'
>>> 353276406
>>> ===
>>>
>>> That is ~337 MiB.
>>>
>>> Also, here are VIRT values from the 2 replica nodes:
>>>
>>> ===
>>> root 24659 0.0 0.3 5645836 451796 ? Ssl тра24 3:28
>>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>>> /var/lib/glusterd/glustershd/run/glustershd.pid -l
>>> /var/log/glusterfs/glustershd.log -S
>>> /var/run/gluster/44ec3f29003eccedf894865107d5db90.socket
>>> --xlator-option
>>> *replicate*.node-uuid=a19afcc2-e26c-43ce-bca6-d27dc1713e87
>>> root 18312 0.0 0.3 6137500 477472 ? Ssl тра19 6:37
>>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>>> /var/lib/glusterd/glustershd/run/glustershd.pid -l
>>> /var/log/glusterfs/glustershd.log -S
>>> /var/run/gluster/1670a3abbd1eea968126eb6f5be20322.socket
>>> --xlator-option
>>> *replicate*.node-uuid=52dca21b-c81c-48b5-9de2-1ed37987fbc2
>>> ===
>>>
>>> Those are 5 to 6G, which is much less than the dummy node has, but they
>>> still look too big to us.
>>>
>>> Should we care about the huge VIRT value on the dummy node? Also, how
>>> would one debug that?
>>>
>>> Regards,
>>> Oleksandr.
>>>
>>> [1] https://gist.github.com/d2cfa25251136512580220fcdb8a6ce6
>>> _______________________________________________
>>> Gluster-devel mailing list
>>> gluster-de...@gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>
>>
>> --
>>
>> Pranith
>>
>
--
Pranith
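
P.S. To collect the numbers discussed in this thread alongside each statedump, something like the following might help. It is only a rough sketch, assuming a Linux host where /proc/<pid>/status is readable; the function names are made up for illustration and are not part of any Gluster tooling. It reads VmSize (what top reports as VIRT), VmRSS and the thread count for a process, and sums the size= lines of a statedump the same way the awk one-liner in the thread does.

```python
# Hypothetical helper names; nothing here comes from the Gluster code base.

# Fields of interest in /proc/<pid>/status on Linux: VmSize is the virtual
# size (VIRT), VmRSS is the resident set size, Threads is the thread count.
_FIELDS = ("VmSize", "VmRSS", "Threads")


def parse_proc_status(text):
    """Extract VmSize and VmRSS (in kB) and Threads from /proc/<pid>/status text."""
    sample = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in _FIELDS:
            # Values look like "76552820 kB" or "12"; keep the leading integer.
            sample[key] = int(value.split()[0])
    return sample


def sum_statedump_sizes(text):
    """Sum every line starting with 'size=', mirroring the awk one-liner."""
    total = 0
    for line in text.splitlines():
        if line.startswith("size="):
            total += int(line.split("=", 1)[1])
    return total


def read_sample(pid):
    """Read one VmSize/VmRSS/Threads sample for a live process."""
    with open(f"/proc/{pid}/status") as f:
        return parse_proc_status(f.read())
```

Calling read_sample() with the glustershd PID every few minutes, at the same time a statedump is taken, would give one row per sample. If the Threads value climbs together with VmSize across reconnects, that would be consistent with the thread-leak theory Kaushal mentioned; if it stays flat, the growth probably has another cause.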
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users