Re: [Gluster-devel] IMPORTANT: Patches that need attention for 3.8
On 09/06/16 09:04, Raghavendra Gowdappa wrote:
> ----- Original Message -----
> From: "Poornima Gurusiddaiah"
> To: "Gluster Devel" , "Raghavendra Gowdappa" , "Atin Mukherjee" ,
>     "Niels de Vos" , "Shyam" , "Rajesh Joseph" , "Raghavendra Talur"
> Sent: Wednesday, June 8, 2016 6:34:34 PM
> Subject: IMPORTANT: Patches that need attention for 3.8
>
> Hi,
>
> Here is the list of patches that need to go for 3.8. I request the
> maintainers of each component mentioned here to review/merge the same at
> the earliest:
>
> Protocol/RPC:
> http://review.gluster.org/#/c/14647/
> http://review.gluster.org/#/c/14648/
>
> Is there a deadline you are targeting these for? I can plan the reviews
> based on that.

The deadline for 3.8 GA is 14th June, 2016, so these should be merged on the
3.8 branch before that.

--
Jiffin

> Glusterd:
> http://review.gluster.org/#/c/14626/
>
> Lease:
> http://review.gluster.org/#/c/14568/
>
> Gfapi:
> http://review.gluster.org/#/q/status:open+project:glusterfs+branch:master+topic:bug-1319992
>
> Regards,
> Poornima
Re: [Gluster-devel] IMPORTANT: Patches that need attention for 3.8
On 06/08/2016 06:34 PM, Poornima Gurusiddaiah wrote:
> Hi,
>
> Here is the list of patches that need to go for 3.8. I request the
> maintainers of each component mentioned here to review/merge the same at
> the earliest:
>
> Protocol/RPC:
> http://review.gluster.org/#/c/14647/
> http://review.gluster.org/#/c/14648/
>
> Glusterd:
> http://review.gluster.org/#/c/14626/

Poornima, I have already reviewed the patch, acked it with +2, and asked you
to follow up with Niels to get it in. The reason I didn't merge the patch is
that it introduces a new feature, not a bug fix. Any new-feature content
needs to be evaluated by Niels, as the release manager, before the
maintainer(s) can merge it.

~Atin

> Lease:
> http://review.gluster.org/#/c/14568/
>
> Gfapi:
> http://review.gluster.org/#/q/status:open+project:glusterfs+branch:master+topic:bug-1319992
>
> Regards,
> Poornima
Re: [Gluster-devel] IMPORTANT: Patches that need attention for 3.8
----- Original Message -----
> From: "Poornima Gurusiddaiah"
> To: "Gluster Devel" , "Raghavendra Gowdappa" , "Atin Mukherjee" ,
>     "Niels de Vos" , "Shyam" , "Rajesh Joseph" , "Raghavendra Talur"
> Sent: Wednesday, June 8, 2016 6:34:34 PM
> Subject: IMPORTANT: Patches that need attention for 3.8
>
> Hi,
>
> Here is the list of patches that need to go for 3.8. I request the
> maintainers of each component mentioned here to review/merge the same at
> the earliest:
>
> Protocol/RPC:
> http://review.gluster.org/#/c/14647/
> http://review.gluster.org/#/c/14648/

Is there a deadline you are targeting these for? I can plan the reviews
based on that.

> Glusterd:
> http://review.gluster.org/#/c/14626/
>
> Lease:
> http://review.gluster.org/#/c/14568/
>
> Gfapi:
> http://review.gluster.org/#/q/status:open+project:glusterfs+branch:master+topic:bug-1319992
>
> Regards,
> Poornima
[Gluster-devel] Failure to release unusable file open fd_count on glusterfs v3.7.11
Hi,

I have a volume created with 3 bricks. After deleting a file that was created
with "echo", the file is moved to the unlink folder. Expected result: the open
fd count should be zero and the unlink folder should contain no file. Actual
result: the open fd count is not zero and the unlink folder still contains a
file. Here is an example:

# gluster volume info ec2

Volume Name: ec2
Type: Disperse
Volume ID: 47988520-0e18-4413-9e55-3ec3f3352600
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: giting1:/export/ec2/fs
Brick2: giting2:/export/ec2/fs
Brick3: giting3:/export/ec2/fs
Options Reconfigured:
performance.readdir-ahead: on

# gluster v status ec2
Status of volume: ec2
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick giting1:/export/ec2/fs                 49154     0          Y       10856
Brick giting2:/export/ec2/fs                 49154     0          Y       7967
Brick giting3:/export/ec2/fs                 49153     0          Y       7216
NFS Server on localhost                      N/A       N/A        N       N/A
Self-heal Daemon on localhost                N/A       N/A        Y       10884
NFS Server on giting3                        2049      0          Y       7236
Self-heal Daemon on giting3                  N/A       N/A        Y       7244
NFS Server on giting2                        2049      0          Y       7987
Self-heal Daemon on giting2                  N/A       N/A        Y       7995

Task Status of Volume ec2
------------------------------------------------------------------------------
There are no active volume tasks

# mount -t glusterfs giting1:ec2 /ec2
# df -h
Filesystem               Size  Used  Avail  Use%  Mounted on
/dev/mapper/centos-root   18G   12G  5.8G   67%   /
devtmpfs                 1.9G     0  1.9G    0%   /dev
tmpfs                    1.9G     0  1.9G    0%   /dev/shm
tmpfs                    1.9G   41M  1.9G    3%   /run
tmpfs                    1.9G     0  1.9G    0%   /sys/fs/cgroup
/dev/sdb                  40G   33M   40G    1%   /export/bk1
/dev/sda1                497M  168M  330M   34%   /boot
tmpfs                    380M     0  380M    0%   /run/user/0
giting1:dht               80G   66M   80G    1%   /dht
giting1:/ec1              35G   24G   12G   67%   /volume/ec1
giting1:ec2               35G   24G   12G   67%   /ec2

# gluster v top ec2 open
Brick: giting1:/export/ec2/fs
Current open fds: 0, Max open fds: 0, Max openfd time: N/A
Brick: giting2:/export/ec2/fs
Current open fds: 0, Max open fds: 0, Max openfd time: N/A
Brick: giting3:/export/ec2/fs
Current open fds: 0, Max open fds: 0, Max openfd time: N/A

# for ((i=0;i<10;i++)); do echo 123 > /ec2/test.txt; done

# gluster v top ec2 open
Brick: giting1:/export/ec2/fs
Current open fds: 9, Max open fds: 10, Max openfd time: 2016-06-08 10:09:23.665717
Count     filename
===================
10        /test.txt
Brick: giting3:/export/ec2/fs
Current open fds: 9, Max open fds: 10, Max openfd time: 2016-06-08 10:09:23.299795
Count     filename
===================
10        /test.txt
Brick: giting2:/export/ec2/fs
Current open fds: 9, Max open fds: 10, Max openfd time: 2016-06-08 10:09:23.236294
Count     filename
===================
10        /test.txt

# ll /export/ec2/fs/.glusterfs/unlink/
total 0

# rm /ec2/test.txt
# ls -l /export/ec2/fs/.glusterfs/unlink/
total 8
-rw-r--r-- 1 root root 512 Jun  8 18:09 a053b266-15c5-4ac7-ac44-841e177c7ebe

# gluster v top ec2 open
Brick: giting1:/export/ec2/fs
Current open fds: 8, Max open fds: 10, Max openfd time: 2016-06-08 10:09:23.665717
Count     filename
===================
10        /test.txt
Brick: giting2:/export/ec2/fs
Current open fds: 8, Max open fds: 10, Max openfd time: 2016-06-08 10:09:23.236294
Count     filename
===================
10        /test.txt
Brick: giting3:/export/ec2/fs
Current open fds: 8, Max open fds: 10, Max openfd time: 2016-06-08 10:09:23.299795
Count     filename
===================
10        /test.txt

Reference:
Commit: storage/posix: Implement .unlink directory
https://github.com/gluster/glusterfs/commit/195548f55b09bf71db92929b7b734407b863093c

Regards,
Gi-ting Peng
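One quick way to confirm which descriptors the brick process is still holding
on the removed file is to look at its /proc fd table. A rough sketch, not part
of the original report, using the giting1 brick pid from the "gluster v status
ec2" output above:

===
# Sketch only: list the brick process's open descriptors and keep the ones
# that point into the brick's .glusterfs/unlink directory. 10856 is the
# giting1 brick pid from "gluster v status ec2" above; adjust per node.
brick_pid=10856
ls -l /proc/"$brick_pid"/fd 2>/dev/null | grep '/export/ec2/fs/.glusterfs/unlink'
===

If the behaviour described above is a real fd leak, one such entry per leaked
descriptor should remain after the rm.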
Re: [Gluster-devel] Weekly community meeting - 8/Jun/16
Thanks everyone who attended the meeting. The logs can be viewed at the links
below.

Minutes: https://meetbot.fedoraproject.org/gluster-meeting/2016-06-08/weekly_community_meeting_8jun2016.2016-06-08-12.00.html
Minutes (text): https://meetbot.fedoraproject.org/gluster-meeting/2016-06-08/weekly_community_meeting_8jun2016.2016-06-08-12.00.txt
Log: https://meetbot.fedoraproject.org/gluster-meeting/2016-06-08/weekly_community_meeting_8jun2016.2016-06-08-12.00.log.html

Next week's meeting will be held at the same time and in the same place.

~kaushal

Meeting summary
---------------
* Rollcall (kshlm, 12:01:42)

* GlusterFS-4.0 (kshlm, 12:04:55)

* GlusterFS-3.8 (kshlm, 12:11:46)
  * LINK: https://bugzilla.redhat.com/showdependencytree.cgi?id=glusterfs-3.8.0_resolved=1
    (ndevos, 12:16:44)

* GlusterFS-3.7 (kshlm, 12:30:07)
  * LINK: https://www.gluster.org/pipermail/gluster-devel/2016-June/049767.html
    (post-factum, 12:31:03)
  * LINK: http://www.gluster.org/pipermail/maintainers/2016-June/000847.html
    (kshlm, 12:37:39)
  * ACTION: kshlm/atinm to ack 3.7.12 before the end of the week (kshlm, 12:38:41)

* GlusterFS-3.6 (kshlm, 12:39:36)

* Last weeks AIs (kshlm, 12:46:37)

* Open floor (kshlm, 12:47:27)

* Ganesha (kshlm, 12:48:18)

* Open floor (kshlm, 12:52:06)

Meeting ended at 12:59:59 UTC.

Action Items
------------
* kshlm/atinm to ack 3.7.12 before the end of the week

Action Items, by person
-----------------------
* atinm
  * kshlm/atinm to ack 3.7.12 before the end of the week
* kshlm
  * kshlm/atinm to ack 3.7.12 before the end of the week
* **UNASSIGNED**
  * (none)

People Present (lines said)
---------------------------
* kshlm (99)
* atinm (33)
* ndevos (32)
* misc (9)
* post-factum (8)
* poornimag (7)
* kkeithley (7)
* spalai (5)
* jdarcy (5)
* zodbot (3)
* partner (3)
* rastar (3)
* msvbhat_ (2)
* kotreshhr (2)
* aravindavk (2)
* anoopcs (1)
* rafi (1)

On Wed, Jun 8, 2016 at 4:27 PM, Kaushal M wrote:
> Hi all,
> The weekly meeting will start in #gluster-meeting on Freenode, in
> about 1 hour from now.
> The agenda for the meeting is available at
> https://public.pad.fsfe.org/p/gluster-community-meetings .
> Please update the agenda if you have a topic you'd like to discuss.
>
> ~kaushal
[Gluster-devel] IMPORTANT: Patches that need attention for 3.8
Hi,

Here is the list of patches that need to go for 3.8. I request the
maintainers of each component mentioned here to review/merge the same at
the earliest:

Protocol/RPC:
http://review.gluster.org/#/c/14647/
http://review.gluster.org/#/c/14648/

Glusterd:
http://review.gluster.org/#/c/14626/

Lease:
http://review.gluster.org/#/c/14568/

Gfapi:
http://review.gluster.org/#/q/status:open+project:glusterfs+branch:master+topic:bug-1319992

Regards,
Poornima
[Gluster-devel] Minutes of Gluster Community Bug Triage meeting at 12:00 UTC ~(in 45 minutes)
Meeting summary
---------------
* Roll call (jiffin, 12:02:49)

* kkeithley Saravanakmr will set up Coverity, clang, etc on public facing
  machine and run it regularly (jiffin, 12:05:07)
  * ACTION: kkeithley Saravanakmr will set up Coverity, clang, etc on public
    facing machine and run it regularly (jiffin, 12:07:03)
  * ACTION: ndevos need to decide on how to provide/use debug builds
    (jiffin, 12:07:35)
  * ACTION: ndevos to propose some test-cases for minimal libgfapi test
    (jiffin, 12:07:44)

* Manikandan and gem to followup with kshlm/misc to get access to
  gluster-infra (jiffin, 12:07:55)
  * ACTION: Manikandan and gem to followup with kshlm/misc/nigelb to get
    access to gluster-infra (jiffin, 12:09:50)

* ? decide how component maintainers/developers use the BZ queries or
  RSS-feeds for the Triaged bugs (jiffin, 12:10:59)
  * ACTION: Saravanakmr will host bug triage meeting on June 14th 2016
    (jiffin, 12:17:51)
  * ACTION: Manikandan will host bug triage meeting on June 21st 2016
    (jiffin, 12:17:59)
  * ACTION: ndevos will host bug triage meeting on June 28th 2016
    (jiffin, 12:18:08)

* Group Triage (jiffin, 12:18:23)

* Open Floor (jiffin, 12:39:07)

Meeting ended at 12:41:56 UTC.

Action Items
------------
* kkeithley Saravanakmr will set up Coverity, clang, etc on public facing
  machine and run it regularly
* ndevos need to decide on how to provide/use debug builds
* ndevos to propose some test-cases for minimal libgfapi test
* Manikandan and gem to followup with kshlm/misc/nigelb to get access to
  gluster-infra
* Saravanakmr will host bug triage meeting on June 14th 2016
* Manikandan will host bug triage meeting on June 21st 2016
* ndevos will host bug triage meeting on June 28th 2016

Action Items, by person
-----------------------
* gem
  * Manikandan and gem to followup with kshlm/misc/nigelb to get access to
    gluster-infra
* kkeithley
  * kkeithley Saravanakmr will set up Coverity, clang, etc on public facing
    machine and run it regularly
* Saravanakmr
  * kkeithley Saravanakmr will set up Coverity, clang, etc on public facing
    machine and run it regularly
  * Saravanakmr will host bug triage meeting on June 14th 2016
* **UNASSIGNED**
  * ndevos need to decide on how to provide/use debug builds
  * ndevos to propose some test-cases for minimal libgfapi test
  * Manikandan will host bug triage meeting on June 21st 2016
  * ndevos will host bug triage meeting on June 28th 2016

People Present (lines said)
---------------------------
* jiffin (50)
* kkeithley (9)
* hgowtham (6)
* rafi (4)
* zodbot (3)
* Saravanakmr (3)
* gem (3)
* skoduri (1)

On 07/06/16 16:50, Jiffin Tony Thottan wrote:
> Hi,
>
> This meeting is scheduled for anyone who is interested in learning more
> about, or assisting with, the Bug Triage.
>
> Meeting details:
> - location: #gluster-meeting on Freenode IRC
>   (https://webchat.freenode.net/?channels=gluster-meeting)
> - date: every Tuesday
> - time: 12:00 UTC (in your terminal, run: date -d "12:00 UTC")
> - agenda: https://public.pad.fsfe.org/p/gluster-bug-triage
>
> Currently the following items are listed:
> * Roll Call
> * Status of last weeks action items
> * Group Triage
> * Open Floor
>
> The last two topics have space for additions. If you have a suitable bug
> or topic to discuss, please add it to the agenda.
>
> Appreciate your participation.
>
> Thanks,
> Jiffin
[Gluster-devel] Weekly community meeting - 8/Jun/16
Hi all,

The weekly meeting will start in #gluster-meeting on Freenode, in about
1 hour from now.

The agenda for the meeting is available at
https://public.pad.fsfe.org/p/gluster-community-meetings .
Please update the agenda if you have a topic you'd like to discuss.

~kaushal
Re: [Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
OK, here the results go. I've taken 5 statedumps with 30 mins between each
statedump. Also, before taking each statedump, I've recorded memory usage.

Memory consumption:

1. root 1010 0.0 9.6 7538188 374864 ? Ssl чер07 0:16 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
2. root 1010 0.0 9.6 7825048 375312 ? Ssl чер07 0:16 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
3. root 1010 0.0 9.6 7825048 375312 ? Ssl чер07 0:17 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
4. root 1010 0.0 9.6 8202064 375892 ? Ssl чер07 0:17 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
5. root 1010 0.0 9.6 8316808 376084 ? Ssl чер07 0:17 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7

As you may see, VIRT constantly grows (except for one measurement), and RSS
grows as well, although its increase is considerably smaller.

Now let's take a look at the statedumps:

1. https://gist.github.com/3fa121c7531d05b210b84d9db763f359
2. https://gist.github.com/87f48b8ac8378262b84d448765730fd9
3. https://gist.github.com/f8780014d8430d67687c70cfd1df9c5c
4. https://gist.github.com/916ac788f806328bad9de5311ce319d7
5. https://gist.github.com/8ba5dbf27d2cc61c04ca954d7fb0a7fd

I'd go with comparing the first statedump with the last one, and here is the
diff output: https://gist.github.com/e94e7f17fe8b3688c6a92f49cbc15193

I see numbers changing, but cannot yet conclude what is meaningful and what
is meaningless. Pranith?

08.06.2016 10:06, Pranith Kumar Karampuri wrote:

On Wed, Jun 8, 2016 at 12:33 PM, Oleksandr Natalenko wrote:

Yup, I can do that, but please note that RSS does not change. Will
statedump show VIRT values?

Also, I'm looking at the numbers now, and see that on each reconnect VIRT
grows by ~24M (once per ~10–15 mins). Probably, that could give you some
idea what is going wrong.

That's interesting. Never saw something like this happen. I would still
like to see if there are any clues in statedump when all this happens.
Maybe what you said will be confirmed that nothing new is allocated but I
would just like to confirm.

08.06.2016 09:50, Pranith Kumar Karampuri wrote:

Oleksandr,
Could you take statedump of the shd process once in 5-10 minutes and
send maybe 5 samples of them when it starts to increase? This will
help us find what datatypes are being allocated a lot and can lead to
coming up with possible theories for the increase.

On Wed, Jun 8, 2016 at 12:03 PM, Oleksandr Natalenko wrote:

Also, I've checked shd log files, and found out that for some reason
shd constantly reconnects to bricks: [1]

Please note that suggested fix [2] by Pranith does not help, VIRT
value still grows:

===
root 1010 0.0 9.6 7415248 374688 ? Ssl чер07 0:14 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
===

I do not know the reason why it is reconnecting, but I suspect the leak
happens on that reconnect.

CCing Pranith.

[1] http://termbin.com/brob
[2] http://review.gluster.org/#/c/14053/

06.06.2016 12:21, Kaushal M wrote:

Has multi-threaded SHD been merged into 3.7.* by any chance? If not,
what I'm saying below doesn't apply.

We saw problems when encrypted transports were used, because the RPC
layer was not reaping threads (doing pthread_join) when a connection
ended. This led to similar observations of huge VIRT and relatively
small RSS.

I'm not sure how multi-threaded shd works, but it could be leaking
threads in a similar way.

On Mon, Jun 6, 2016 at
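To make sense of the diff linked above, it can help to look only at the
memusage sections whose size changed between the first and the last dump. A
rough awk sketch, not from the thread; it assumes the statedump layout implied
by the awk one-liner used earlier in this thread ("size=" lines grouped under
"[... memusage]" section headers), and dump1.txt/dump5.txt are placeholders
for the downloaded gists:

===
#!/bin/bash
# Sketch only: print allocation types whose "size=" changed between two dumps.
# Assumes "[... memusage]" section headers followed by "size=<bytes>" lines.
first=dump1.txt
last=dump5.txt

awk -F '=' '
    /memusage\]$/ { section = $0; next }          # remember the current section header
    /^size=/ {
        if (FILENAME == ARGV[1]) before[section] = $2
        else                     after[section]  = $2
    }
    END {
        for (s in after)
            if (after[s] + 0 != before[s] + 0)    # keep only the types that changed
                printf "%12d bytes  %s\n", after[s] - before[s], s
    }
' "$first" "$last" | sort -n
===

Allocation types that keep growing across consecutive dumps are the
interesting ones; types that merely fluctuate are probably noise.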
Re: [Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
On Wed, Jun 8, 2016 at 12:33 PM, Oleksandr Natalenko < oleksa...@natalenko.name> wrote:

> Yup, I can do that, but please note that RSS does not change. Will
> statedump show VIRT values?
>
> Also, I'm looking at the numbers now, and see that on each reconnect VIRT
> grows by ~24M (once per ~10–15 mins). Probably, that could give you some
> idea what is going wrong.

That's interesting. Never saw something like this happen. I would still
like to see if there are any clues in statedump when all this happens.
Maybe what you said will be confirmed that nothing new is allocated but I
would just like to confirm.

> 08.06.2016 09:50, Pranith Kumar Karampuri wrote:
>
>> Oleksandr,
>> Could you take statedump of the shd process once in 5-10 minutes and
>> send maybe 5 samples of them when it starts to increase? This will
>> help us find what datatypes are being allocated a lot and can lead to
>> coming up with possible theories for the increase.
>>
>> On Wed, Jun 8, 2016 at 12:03 PM, Oleksandr Natalenko wrote:
>>
>>> Also, I've checked shd log files, and found out that for some reason
>>> shd constantly reconnects to bricks: [1]
>>>
>>> Please note that suggested fix [2] by Pranith does not help, VIRT
>>> value still grows:
>>>
>>> ===
>>> root 1010 0.0 9.6 7415248 374688 ? Ssl чер07 0:14 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
>>> ===
>>>
>>> I do not know the reason why it is reconnecting, but I suspect the leak
>>> happens on that reconnect.
>>>
>>> CCing Pranith.
>>>
>>> [1] http://termbin.com/brob
>>> [2] http://review.gluster.org/#/c/14053/
>>>
>>> 06.06.2016 12:21, Kaushal M wrote:
>>>
>>> Has multi-threaded SHD been merged into 3.7.* by any chance? If not,
>>> what I'm saying below doesn't apply.
>>>
>>> We saw problems when encrypted transports were used, because the RPC
>>> layer was not reaping threads (doing pthread_join) when a connection
>>> ended. This led to similar observations of huge VIRT and relatively
>>> small RSS.
>>>
>>> I'm not sure how multi-threaded shd works, but it could be leaking
>>> threads in a similar way.
>>>
>>> On Mon, Jun 6, 2016 at 1:54 PM, Oleksandr Natalenko wrote:
>>>
>>> Hello.
>>>
>>> We use v3.7.11, replica 2 setup between 2 nodes + 1 dummy node for
>>> keeping volumes metadata.
>>>
>>> Now we observe huge VSZ (VIRT) usage by glustershd on dummy node:
>>>
>>> ===
>>> root 15109 0.0 13.7 76552820 535272 ? Ssl тра26 2:11 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
>>> ===
>>>
>>> that is ~73G. RSS seems to be OK (~522M). Here is the statedump of
>>> glustershd process: [1]
>>>
>>> Also, here is sum of sizes, presented in statedump:
>>>
>>> ===
>>> # cat /var/run/gluster/glusterdump.15109.dump.1465200139 | awk -F '=' 'BEGIN {sum=0} /^size=/ {sum+=$2} END {print sum}'
>>> 353276406
>>> ===
>>>
>>> That is ~337 MiB.
>>>
>>> Also, here are VIRT values from 2 replica nodes:
>>>
>>> ===
>>> root 24659 0.0 0.3 5645836 451796 ? Ssl тра24 3:28 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/44ec3f29003eccedf894865107d5db90.socket --xlator-option *replicate*.node-uuid=a19afcc2-e26c-43ce-bca6-d27dc1713e87
>>> root 18312 0.0 0.3 6137500 477472 ? Ssl тра19 6:37 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/1670a3abbd1eea968126eb6f5be20322.socket --xlator-option *replicate*.node-uuid=52dca21b-c81c-48b5-9de2-1ed37987fbc2
>>> ===
>>>
>>> Those are 5 to 6G, which is much less than dummy node has, but still
>>> look too big for us.
>>>
>>> Should we care about huge VIRT value on dummy node? Also, how one
>>> would debug that?
>>>
>>> Regards,
>>> Oleksandr.
>>>
>>> [1] https://gist.github.com/d2cfa25251136512580220fcdb8a6ce6
>>
>> --
>> Pranith

--
Pranith
Re: [Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
Yup, I can do that, but please note that RSS does not change. Will statedump
show VIRT values?

Also, I'm looking at the numbers now, and see that on each reconnect VIRT
grows by ~24M (once per ~10–15 mins). Probably, that could give you some idea
what is going wrong.

08.06.2016 09:50, Pranith Kumar Karampuri wrote:

Oleksandr,
Could you take statedump of the shd process once in 5-10 minutes and
send maybe 5 samples of them when it starts to increase? This will
help us find what datatypes are being allocated a lot and can lead to
coming up with possible theories for the increase.

On Wed, Jun 8, 2016 at 12:03 PM, Oleksandr Natalenko wrote:

Also, I've checked shd log files, and found out that for some reason
shd constantly reconnects to bricks: [1]

Please note that suggested fix [2] by Pranith does not help, VIRT
value still grows:

===
root 1010 0.0 9.6 7415248 374688 ? Ssl чер07 0:14 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
===

I do not know the reason why it is reconnecting, but I suspect the leak
happens on that reconnect.

CCing Pranith.

[1] http://termbin.com/brob
[2] http://review.gluster.org/#/c/14053/

06.06.2016 12:21, Kaushal M wrote:

Has multi-threaded SHD been merged into 3.7.* by any chance? If not,
what I'm saying below doesn't apply.

We saw problems when encrypted transports were used, because the RPC
layer was not reaping threads (doing pthread_join) when a connection
ended. This led to similar observations of huge VIRT and relatively
small RSS.

I'm not sure how multi-threaded shd works, but it could be leaking
threads in a similar way.

On Mon, Jun 6, 2016 at 1:54 PM, Oleksandr Natalenko wrote:

Hello.

We use v3.7.11, replica 2 setup between 2 nodes + 1 dummy node for keeping
volumes metadata.

Now we observe huge VSZ (VIRT) usage by glustershd on dummy node:

===
root 15109 0.0 13.7 76552820 535272 ? Ssl тра26 2:11 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
===

that is ~73G. RSS seems to be OK (~522M). Here is the statedump of
glustershd process: [1]

Also, here is sum of sizes, presented in statedump:

===
# cat /var/run/gluster/glusterdump.15109.dump.1465200139 | awk -F '=' 'BEGIN {sum=0} /^size=/ {sum+=$2} END {print sum}'
353276406
===

That is ~337 MiB.

Also, here are VIRT values from 2 replica nodes:

===
root 24659 0.0 0.3 5645836 451796 ? Ssl тра24 3:28 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/44ec3f29003eccedf894865107d5db90.socket --xlator-option *replicate*.node-uuid=a19afcc2-e26c-43ce-bca6-d27dc1713e87
root 18312 0.0 0.3 6137500 477472 ? Ssl тра19 6:37 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/1670a3abbd1eea968126eb6f5be20322.socket --xlator-option *replicate*.node-uuid=52dca21b-c81c-48b5-9de2-1ed37987fbc2
===

Those are 5 to 6G, which is much less than dummy node has, but still look
too big for us.

Should we care about huge VIRT value on dummy node? Also, how one would
debug that?

Regards,
Oleksandr.

[1] https://gist.github.com/d2cfa25251136512580220fcdb8a6ce6
Re: [Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
Oleksandr,

Could you take statedump of the shd process once in 5-10 minutes and send
maybe 5 samples of them when it starts to increase? This will help us find
what datatypes are being allocated a lot and can lead to coming up with
possible theories for the increase.

On Wed, Jun 8, 2016 at 12:03 PM, Oleksandr Natalenko < oleksa...@natalenko.name> wrote:

> Also, I've checked shd log files, and found out that for some reason shd
> constantly reconnects to bricks: [1]
>
> Please note that suggested fix [2] by Pranith does not help, VIRT value
> still grows:
>
> ===
> root 1010 0.0 9.6 7415248 374688 ? Ssl чер07 0:14 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
> ===
>
> I do not know the reason why it is reconnecting, but I suspect the leak
> happens on that reconnect.
>
> CCing Pranith.
>
> [1] http://termbin.com/brob
> [2] http://review.gluster.org/#/c/14053/
>
> 06.06.2016 12:21, Kaushal M wrote:
>
>> Has multi-threaded SHD been merged into 3.7.* by any chance? If not,
>> what I'm saying below doesn't apply.
>>
>> We saw problems when encrypted transports were used, because the RPC
>> layer was not reaping threads (doing pthread_join) when a connection
>> ended. This led to similar observations of huge VIRT and relatively
>> small RSS.
>>
>> I'm not sure how multi-threaded shd works, but it could be leaking
>> threads in a similar way.
>>
>> On Mon, Jun 6, 2016 at 1:54 PM, Oleksandr Natalenko wrote:
>>
>>> Hello.
>>>
>>> We use v3.7.11, replica 2 setup between 2 nodes + 1 dummy node for
>>> keeping volumes metadata.
>>>
>>> Now we observe huge VSZ (VIRT) usage by glustershd on dummy node:
>>>
>>> ===
>>> root 15109 0.0 13.7 76552820 535272 ? Ssl тра26 2:11 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
>>> ===
>>>
>>> that is ~73G. RSS seems to be OK (~522M). Here is the statedump of
>>> glustershd process: [1]
>>>
>>> Also, here is sum of sizes, presented in statedump:
>>>
>>> ===
>>> # cat /var/run/gluster/glusterdump.15109.dump.1465200139 | awk -F '=' 'BEGIN {sum=0} /^size=/ {sum+=$2} END {print sum}'
>>> 353276406
>>> ===
>>>
>>> That is ~337 MiB.
>>>
>>> Also, here are VIRT values from 2 replica nodes:
>>>
>>> ===
>>> root 24659 0.0 0.3 5645836 451796 ? Ssl тра24 3:28 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/44ec3f29003eccedf894865107d5db90.socket --xlator-option *replicate*.node-uuid=a19afcc2-e26c-43ce-bca6-d27dc1713e87
>>> root 18312 0.0 0.3 6137500 477472 ? Ssl тра19 6:37 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/1670a3abbd1eea968126eb6f5be20322.socket --xlator-option *replicate*.node-uuid=52dca21b-c81c-48b5-9de2-1ed37987fbc2
>>> ===
>>>
>>> Those are 5 to 6G, which is much less than dummy node has, but still
>>> look too big for us.
>>>
>>> Should we care about huge VIRT value on dummy node? Also, how one would
>>> debug that?
>>>
>>> Regards,
>>> Oleksandr.
>>>
>>> [1] https://gist.github.com/d2cfa25251136512580220fcdb8a6ce6

--
Pranith
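For reference, the sampling asked for above can be scripted. A minimal sketch,
not from the thread, assuming the glustershd pid file path shown in the ps
output above and the usual SIGUSR1 statedump trigger that writes dumps under
/var/run/gluster:

===
#!/bin/bash
# Sketch only: take a glustershd statedump every 10 minutes and record
# VIRT/RSS alongside it, so the dumps can be correlated with memory growth.
# Assumes SIGUSR1 triggers a statedump written under /var/run/gluster.
pid=$(cat /var/lib/glusterd/glustershd/run/glustershd.pid)

for i in 1 2 3 4 5; do
    { date; ps -o pid,vsz,rss -p "$pid"; } >> /tmp/glustershd-mem.log  # memory numbers with a timestamp
    kill -USR1 "$pid"                                                  # request a statedump
    sleep 600                                                          # 10 minutes between samples
done

ls -lt /var/run/gluster/glusterdump."$pid".dump.* 2>/dev/null | head -n 5
===

The unix timestamps in the dump file names can then be matched against the
dates recorded in /tmp/glustershd-mem.log.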
Re: [Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
Also, I've checked shd log files, and found out that for some reason shd
constantly reconnects to bricks: [1]

Please note that suggested fix [2] by Pranith does not help, VIRT value
still grows:

===
root 1010 0.0 9.6 7415248 374688 ? Ssl чер07 0:14 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
===

I do not know the reason why it is reconnecting, but I suspect the leak
happens on that reconnect.

CCing Pranith.

[1] http://termbin.com/brob
[2] http://review.gluster.org/#/c/14053/

06.06.2016 12:21, Kaushal M wrote:

Has multi-threaded SHD been merged into 3.7.* by any chance? If not,
what I'm saying below doesn't apply.

We saw problems when encrypted transports were used, because the RPC
layer was not reaping threads (doing pthread_join) when a connection
ended. This led to similar observations of huge VIRT and relatively
small RSS.

I'm not sure how multi-threaded shd works, but it could be leaking
threads in a similar way.

On Mon, Jun 6, 2016 at 1:54 PM, Oleksandr Natalenko wrote:

Hello.

We use v3.7.11, replica 2 setup between 2 nodes + 1 dummy node for keeping
volumes metadata.

Now we observe huge VSZ (VIRT) usage by glustershd on dummy node:

===
root 15109 0.0 13.7 76552820 535272 ? Ssl тра26 2:11 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
===

that is ~73G. RSS seems to be OK (~522M). Here is the statedump of
glustershd process: [1]

Also, here is sum of sizes, presented in statedump:

===
# cat /var/run/gluster/glusterdump.15109.dump.1465200139 | awk -F '=' 'BEGIN {sum=0} /^size=/ {sum+=$2} END {print sum}'
353276406
===

That is ~337 MiB.

Also, here are VIRT values from 2 replica nodes:

===
root 24659 0.0 0.3 5645836 451796 ? Ssl тра24 3:28 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/44ec3f29003eccedf894865107d5db90.socket --xlator-option *replicate*.node-uuid=a19afcc2-e26c-43ce-bca6-d27dc1713e87
root 18312 0.0 0.3 6137500 477472 ? Ssl тра19 6:37 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/1670a3abbd1eea968126eb6f5be20322.socket --xlator-option *replicate*.node-uuid=52dca21b-c81c-48b5-9de2-1ed37987fbc2
===

Those are 5 to 6G, which is much less than dummy node has, but still look
too big for us.

Should we care about huge VIRT value on dummy node? Also, how one would
debug that?

Regards,
Oleksandr.

[1] https://gist.github.com/d2cfa25251136512580220fcdb8a6ce6
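Since the suspicion here is that the leak is tied to the reconnects, it may
also help to see how often they actually happen. A rough sketch, not from the
thread, that buckets connect/disconnect messages in the shd log by hour; the
exact log strings grepped for are an assumption, so adjust them to whatever [1]
really shows:

===
# Sketch only: count connect/disconnect lines per hour in the shd log.
# The "disconnected from"/"connected to" strings are assumptions; the sed
# keeps the "YYYY-MM-DD HH" part of the usual gluster log timestamp prefix.
log=/var/log/glusterfs/glustershd.log
grep -iE 'disconnected from|connected to' "$log" |
  sed 's/^\[\([0-9-]* [0-9]*\):.*/\1/' |
  sort | uniq -c
===

If the hourly counts track the VIRT jumps, that supports the idea that
something allocated per connection (or a thread, as Kaushal suggested above)
is never released.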