Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
On 01/06/2016 01:58 PM, Oleksandr Natalenko wrote: OK, here is the valgrind log of patched Ganesha (I took a recent version of your patchset, 8685abfc6d) with Entries_HWMARK set to 500. https://gist.github.com/5397c152a259b9600af0 See no huge runtime leaks now. Glad to hear this :) However, I've repeated this test with another volume in replica and got the following Ganesha error:
===
ganesha.nfsd: inode.c:716: __inode_forget: Assertion `inode->nlookup >= nlookup' failed.
===
I repeated the tests on a replica volume as well, but haven't hit any assert. Could you confirm whether you have taken the latest gluster patch set #3 - http://review.gluster.org/#/c/13096/3? If you are hitting the issue even then, please provide the core if possible. Thanks, Soumya 06.01.2016 08:40, Soumya Koduri wrote: On 01/06/2016 03:53 AM, Oleksandr Natalenko wrote: OK, I've repeated the same traversing test with patched GlusterFS API, and here is the new Valgrind log: https://gist.github.com/17ecb16a11c9aed957f5 Fuse mount doesn't use the gfapi helper. Does your above GlusterFS API application call glfs_fini() during exit? glfs_fini() is responsible for freeing the memory consumed by gfAPI applications. Could you repeat the test with nfs-ganesha (which for sure calls glfs_fini() and purges inodes if it exceeds its inode cache limit) if possible. Thanks, Soumya Still leaks. On Tuesday, 5 January 2016, 22:52:25 EET Soumya Koduri wrote: On 01/05/2016 05:56 PM, Oleksandr Natalenko wrote: Unfortunately, both patches didn't make any difference for me. I've patched 3.7.6 with both patches, recompiled and installed the patched GlusterFS package on the client side and mounted a volume with ~2M of files. Then I performed the usual tree traversal with a simple "find". Memory RES value went from ~130M at the moment of mounting to ~1.5G after traversing the volume for ~40 mins. The Valgrind log still shows lots of leaks. Here it is: https://gist.github.com/56906ca6e657c4ffa4a1 Looks like you had done a fuse mount. The patches which I have pasted below apply to gfapi/nfs-ganesha applications. Also, to resolve the nfs-ganesha issue which I had mentioned below (in case the Entries_HWMARK option gets changed), I have posted the below fix - https://review.gerrithub.io/#/c/258687 Thanks, Soumya Ideas? 05.01.2016 12:31, Soumya Koduri wrote: I tried to debug the inode* related leaks and have seen some improvements after applying the below patches when running the same test (but with a smaller load). Could you please apply those patches and confirm the same?
a) http://review.gluster.org/13125 This will fix the inodes and their ctx related leaks during unexport and at program exit. Please check the valgrind output after applying the patch. It should not list any inode-related memory as lost.
b) http://review.gluster.org/13096 The reason the change in Entries_HWMARK (in your earlier mail) didn't have much effect is that the inode_nlookup count doesn't become zero for those handles/inodes being closed by ganesha. Hence those inodes get added to the inode LRU list instead of the purge list, and are forcefully purged only when the number of gfapi inode table entries reaches its limit (which is 137012). This patch fixes those 'nlookup' counts. Please apply this patch, reduce 'Entries_HWMARK' to a much lower value and check if it decreases the memory consumed by the ganesha process while it is active.
CACHEINODE {
    Entries_HWMark = 500;
}
Note: I see an issue with nfs-ganesha during exit when the option 'Entries_HWMARK' gets changed.
This is not related to any of the above patches (or rather Gluster) and I am currently debugging it. Thanks, Soumya On 12/25/2015 11:34 PM, Oleksandr Natalenko wrote:
1. test with Cache_Size = 256 and Entries_HWMark = 4096
Before find . -type f:
root 3120 0.6 11.0 879120 208408 ? Ssl 17:39 0:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
After:
root 3120 11.4 24.3 1170076 458168 ? Ssl 17:39 13:39 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
~250M leak.
2. test with default values (after ganesha restart)
Before:
root 24937 1.3 10.4 875016 197808 ? Ssl 19:39 0:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
After:
root 24937 3.5 18.9 1022544 356340 ? Ssl 19:39 0:40 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
~159M leak.
No reasonable correlation detected. The second test finished much faster than the first (I guess server-side GlusterFS cache or the server kernel page cache is the cause). There are ~1.8M files on this test volume. On Friday, 25 December 2015, 20:28:13 EET Soumya Koduri wrote: On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote: Another addition: it seems to be a GlusterFS API library memory leak, because NFS-Ganesha also consumes a huge amount of memory while doing an ordinary "find . -type f" via NFSv4.2 on a remote client.
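[Editorial note] The recurring point in the thread above is that gfapi only tears down its inode table and related allocations when glfs_fini() is called, so a gfapi test program that exits without it will look leaky under valgrind. The skeleton below only illustrates that structure; the volume name, server host and log path are placeholders and not taken from the thread (build with -lgfapi):

/* Minimal gfapi skeleton (illustrative): pair every glfs_new()/glfs_init()
 * with glfs_fini() on exit, otherwise gfapi's inode table shows up as
 * "leaked" memory in valgrind. */
#include <stdio.h>
#include <stdlib.h>
#include <glusterfs/api/glfs.h>

int main (void)
{
        glfs_t *fs = glfs_new ("testvol");                 /* placeholder volume */
        if (!fs)
                return EXIT_FAILURE;

        glfs_set_volfile_server (fs, "tcp", "server.example.com", 24007);
        glfs_set_logging (fs, "/tmp/gfapi.log", 7);

        if (glfs_init (fs) != 0) {
                glfs_fini (fs);
                return EXIT_FAILURE;
        }

        /* ... perform the tree traversal / stat() load here ... */

        /* Without this call the gfapi inode table is never freed and
         * valgrind reports the memory as lost/still reachable. */
        glfs_fini (fs);
        return EXIT_SUCCESS;
}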
Re: [Gluster-devel] NLMv4 isn't available over UDP
On Thu, Jan 7, 2016 at 11:00 AM, Sponge Tsui wrote: > Hi, I'm sorry to ask for help if this is a common question. I noticed > that in glusterfs NLMv4 is only available over TCP, but not over UDP, > as shown by "rpcinfo -p" when using glusterfs to export volumes through the NFS protocol. The result is as follows:
>
>    program vers proto   port  service
>     100000    4   tcp    111  portmapper
>     100000    3   tcp    111  portmapper
>     100000    2   tcp    111  portmapper
>     100000    4   udp    111  portmapper
>     100000    3   udp    111  portmapper
>     100000    2   udp    111  portmapper
>     100005    3   tcp  38465  mountd
>     100005    1   tcp  38466  mountd
>     100003    3   tcp   2049  nfs
>     100021    4   tcp  38468  nlockmgr
>     100227    3   tcp   2049
>     100021    1   udp    868  nlockmgr
>     100021    1   tcp    869  nlockmgr
>     100024    1   udp  34507  status
>     100024    1   tcp  40511  status
>
> I want to find out why NLMv4 isn't supported over UDP while it is already supported over TCP, and whether NLMv4 over UDP will be taken into account. > Please excuse me if I've used bad terminology. Maybe it's because of some other issues that I didn't realize. To be honest, I'm a newcomer to glusterfs of less than a month. > Thanks for your attention. I'm looking forward to your reply, and any answers will be appreciated! The GlusterFS NFS service only supports the NFSv3 protocol over TCP. So the NLM service was also implemented only over TCP. As for the reason why only TCP support was implemented for NFS, it was probably because NFSv3 over TCP provided support for most clients, with the least amount of implementation effort. > > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] quota.t hangs on NetBSD machines
- Original Message - > From: "Manikandan Selvaganesh" > To: "Raghavendra G" > Cc: "Gluster Devel" > Sent: Wednesday, January 6, 2016 7:54:32 PM > Subject: Re: [Gluster-devel] quota.t hangs on NetBSD machines > > Hi, > We are debugging the issue. With the patch[1], the quota.t doesn't seem to > hang and all the tests pass successfully, but it throws an error (while > running test #24 in quota.t) "perfused: perfuse_node_inactive: > perfuse_node_fsync failed error = 69: Resource temporarily unavailable". I > have attached a tar file which contains the logs while the test (quota.t) is > being run. > Thanks to Raghavendra Talur and Vijay for helping :) > > [1] http://review.gluster.org/#/c/13177/ I've merged this patch. If there are any tests hung, please kill them and retrigger. > > Thank you :-) > > -- > Regards, > Manikandan Selvaganesh. > > - Original Message - > From: "Manikandan Selvaganesh" > To: "Raghavendra G" > Cc: "Gluster Devel" > Sent: Wednesday, January 6, 2016 11:04:23 AM > Subject: Re: [Gluster-devel] quota.t hangs on NetBSD machines > > Hi Raghavendra, > I will check this out with the fix and update you soon on this :) > > -- > Regards, > Manikandan Selvaganesh. > > - Original Message - > From: "Raghavendra G" > To: "Raghavendra Gowdappa" > Cc: "Manikandan Selvaganesh" , "Gluster Devel" > > Sent: Tuesday, January 5, 2016 11:15:57 PM > Subject: Re: [Gluster-devel] quota.t hangs on NetBSD machines > > Manikandan, > > Can you test this fix since you have the setup ready? > > regards, > Raghavendra > > On Tue, Jan 5, 2016 at 11:11 PM, Raghavendra G > wrote: > > > A fix has been sent to: > > http://review.gluster.org/13177 > > > > On Tue, Jan 5, 2016 at 2:24 PM, Raghavendra Gowdappa > > wrote: > > > >> > >> > >> - Original Message - > >> > From: "Raghavendra Gowdappa" > >> > To: "Manikandan Selvaganesh" > >> > Cc: "Vijaikumar Mallikarjuna" , "Emmanuel > >> Dreyfus" , "Raghavendra Talur" > >> > , "Gluster Devel" > >> > Sent: Tuesday, January 5, 2016 12:16:27 PM > >> > Subject: Re: [Gluster-devel] quota.t hangs on NetBSD machines > >> > > >> > Thanks. There is a write call which is not unwound by write-behind as > >> can be > >> > seen below: > >> > > >> > [.WRITE] > >> > request-ptr=0xb80a4830 > >> > refcount=2 > >> > wound=no > >> > generation-number=90 > >> > req->op_ret=32768 > >> > req->op_errno=0 > >> > sync-attempts=0 > >> > sync-in-progress=no > >> > size=32768 > >> > offset=11665408 > >> > lied=0 > >> > append=0 > >> > fulfilled=0 > >> > go=0 > >> > > >> > Note, the request is not wound (wound=no and sync-in-progress=no), nor > >> > unwound (lied=0). I am yet to figure out the RCA. Will be sending a > >> patch > >> > soon. > >> > >> I figured out this issue occurs when "trickling-writes" is on in > >> write-behind (by default it's on). Unfortunately this option cannot be > >> turned off using the CLI as of now (one can edit volfiles though). > >> > >> > > >> > - Original Message - > >> > > From: "Manikandan Selvaganesh" > >> > > To: "Raghavendra Gowdappa" > >> > > Cc: "Vijaikumar Mallikarjuna" , "Emmanuel > >> Dreyfus" > >> > > , "Raghavendra Talur" > >> > > , "Gluster Devel" > >> > > Sent: Tuesday, January 5, 2016 11:43:55 AM > >> > > Subject: Re: [Gluster-devel] quota.t hangs on NetBSD machines > >> > > > >> > > Hi Raghavendra, > >> > > > >> > > Yeah, we have taken the statedump when the test program was in 'D' > >> state. I > >> > > have enabled statedump of inodes too. > >> > > > >> > > Attaching the entire statedump file. 
> >> > > > >> > > Thank you :-) > >> > > > >> > > -- > >> > > Regards, > >> > > Manikandan Selvaganesh. > >> > > > >> > > - Original Message - > >> > > From: "Raghavendra Gowdappa" > >> > > To: "Manikandan Selvaganesh" > >> > > Cc: "Emmanuel Dreyfus" , "Gluster Devel" > >> > > , "Vijaikumar Mallikarjuna" > >> > > > >> > > Sent: Monday, January 4, 2016 11:56:03 PM > >> > > Subject: Re: [Gluster-devel] quota.t hangs on NetBSD machines > >> > > > >> > > > >> > > > >> > > - Original Message - > >> > > > From: "Manikandan Selvaganesh" > >> > > > To: "Raghavendra Gowdappa" > >> > > > Cc: "Emmanuel Dreyfus" , "Gluster Devel" > >> > > > , "Vijaikumar Mallikarjuna" > >> > > > > >> > > > Sent: Monday, January 4, 2016 7:00:16 PM > >> > > > Subject: Re: [Gluster-devel] quota.t hangs on NetBSD machines > >> > > > > >> > > > Hi, > >> > > > > >> > > > We have taken statedump of fuse client process, quotad and bricks. > >> > > > Apparently, we could not find any stack information in brick's > >> statedump. > >> > > > Below is the client statedump state information: > >> > > > >> > > Thanks Manikandan :). You took this statedump once the test program > >> was in > >> > > 'D' state, right? Otherwise these can be just in-transit fops. > >> > > > >> > > > > >> > > > [global.callpool.stack.1.frame.1] > >> > > > frame=0xb80775f0 > >> > > > ref_count=0 > >> > > > translator=patchy-write-behind > >> > > > c
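[Editorial note] Raghavendra's workaround above — turning "trickling-writes" off by editing the volfile, since there is no CLI toggle yet — would mean changing the write-behind stanza in the client volume file to something roughly like the following. The option name is taken from the message above, but the volume/subvolume names are made up for illustration; treat this as a sketch, not a verified configuration:

volume testvol-write-behind
    type performance/write-behind
    option trickling-writes off
    subvolumes testvol-dht
end-volume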
[Gluster-devel] NLMv4 isn't available over UDP
Hi, I'm sorry to ask for help if this is a common question. I noticed that in glusterfs NLMv4 is only available over TCP, but not over UDP, as shown by "rpcinfo -p" when using glusterfs to export volumes through the NFS protocol. The result is as follows:

   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100005    3   tcp  38465  mountd
    100005    1   tcp  38466  mountd
    100003    3   tcp   2049  nfs
    100021    4   tcp  38468  nlockmgr
    100227    3   tcp   2049
    100021    1   udp    868  nlockmgr
    100021    1   tcp    869  nlockmgr
    100024    1   udp  34507  status
    100024    1   tcp  40511  status

I want to find out why NLMv4 isn't supported over UDP while it is already supported over TCP, and whether NLMv4 over UDP will be taken into account. Please excuse me if I've used bad terminology. Maybe it's because of some other issues that I didn't realize. To be honest, I'm a newcomer to glusterfs of less than a month. Thanks for your attention. I'm looking forward to your reply, and any answers will be appreciated! ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] compare-bug-version-and-git-branch.sh FAILING
This says "BUG id 1296206 is marked private, please remove the groups." Is the bug limited to a private group? If so, that could be the reason; otherwise the rest of it looks fine to me. Thanks, Atin On 01/07/2016 10:12 AM, Milind Changire wrote: > for patch: http://review.gluster.org/13186 > Jenkins failed > job: https://build.gluster.org/job/compare-bug-version-and-git-branch/14201/ > > > I had mistakenly entered a downstream BUG ID for rfc.sh and then later > amended the commit message with the correct mainline BUG ID and > resubmitted via rfc.sh. I also corrected the Topic tag in Gerrit to use > the correct BUG ID. > > But the job for this patch is failing even after corrections. > > Please advise. > > -- > Milind > > > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Gluster 4.0 Roadmap page
Folks, A month back I created the roadmap page for Gluster 4.0 [1], which should be the placeholder to capture details in terms of feature pages, spec links, etc. I'd request all 4.0 initiative leads to update it with the respective details (send a PR to the glusterweb repo). I also submitted a PR [2] to update the page with all the relevant details I have as of now. [1] https://www.gluster.org/community/roadmap/4.0/ [2] https://github.com/gluster/glusterweb/pull/34 Thanks, Atin ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] compare-bug-version-and-git-branch.sh FAILING
for patch: http://review.gluster.org/13186 Jenkins failed job: https://build.gluster.org/job/compare-bug-version-and-git-branch/14201/ I had mistakenly entered a downstream BUG ID for rfc.sh and then later amended the commit message with the correct mainline BUG ID and resubmitted via rfc.sh. I also corrected the Topic tag in Gerrit to use the correct BUG ID. But the job for this patch is failing even after corrections. Please advise. -- Milind ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] snapshot/bug-1227646.t throws core [rev...@dev.gluster.org: Change in glusterfs[master]: hook-scripts: reconsile mount, fixing manual mount]
On 01/06/2016 01:55 AM, Rajesh Joseph wrote: Yesterday, I looked into the core. From the core it looked like the rebalance process crashed during cleanup_and_exit. The frame seems to be corrupted. Therefore it does not look like a snapshot bug. The test case tests snapshot with a tiered volume. If somebody from the rebalance or tiering team can also take a look, it would help. Raghavendra, Shyam - can you please help here? Thanks, Vijay ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] quota.t hangs on NetBSD machines
Manikandan Selvaganesh wrote: > We are debugging the issue. With the patch[1], the quota.t doesn't seem to > hang and all the tests pass successfully, but it throws an error (while > running test #24 in quota.t) "perfused: perfuse_node_inactive: > perfuse_node_fsync failed error = 69: Resource temporarily unavailable". This is a non-fatal warning about a failed fsync. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] compound fop design first cut
On 01/06/2016 07:50 PM, Jeff Darcy wrote: 1) fops will be compounded per inode, meaning 2 fops on different inodes can't be compounded (not because of the design, just reducing the scope of the problem). 2) Each xlator that wants a compound fop packs the arguments by itself. Packed how? Are we talking about XDR here, or something else? How is dict_t handled? Will there be generic packing/unpacking code somewhere, or is each translator expected to do this manually? Packed as mentioned in step 4 below. There will be common functions provided which fill an array cell with the information given for that fop. In addition to that, there will be filling functions for each of the compound fops listed at: https://public.pad.fsfe.org/p/glusterfs-compound-fops. XDR should be similar to what Soumya suggested in earlier mails, just like in NFS. 3) On the server side a de-compounder placed below the server xlator unpacks the arguments and does the necessary operations. 4) Arguments for compound fops will be passed as an array of a union of structures where each structure is associated with a fop. 5) Each xlator will have <xlator-name>_compound_fop() which receives the fop and does additional processing that is required for itself. What happens when (not if) some translator fails to provide this? Is there a default function? Is there something at the end of the chain that will log an error if the fop gets that far without being handled (as with GF_FOP_IPC)? Yes, there will be a default_fop provided just like for other fops, which is just a pass-through. Posix will log and unwind with -1, ENOTSUPP. 6) Response will also be an array of a union of response structures where each structure is associated with a fop's response. What are the error semantics? Does processing of a series always stop at the first error, or are there some errors that allow retry/continue? If/when processing stops, who's responsible for cleaning up state changed by those parts that succeeded? What happens if the connection dies in the middle? Yes, at the moment we are implementing stop-at-first-error semantics as that seems to satisfy all the compound fops we listed at https://public.pad.fsfe.org/p/glusterfs-compound-fops. Each translator which wants to handle the compound fop should handle failures just as it does for a normal fop at the moment. How are values returned from one operation in a series propagated as arguments for the next? They are not. In the first cut the only dependency between two fops is whether the previous one succeeded or not. Just this much seems to work fine for the fops we are targeting for now: https://public.pad.fsfe.org/p/glusterfs-compound-fops. We may have to enhance it based on what comes up in the future. What are the implications for buffer and message sizes? What are the limits on how large these can get, and/or how many operations can be compounded? It depends on the limits imposed by the RPC layer. If it can't send the request, the fop will fail. If it can send the request but the response is too big to send back, I think the fop will end up failing with a frame timeout on the response. Either way it will be a failure. At the moment, for the fops listed at https://public.pad.fsfe.org/p/glusterfs-compound-fops, this doesn't seem to be a problem. How is synchronization handled? Is the inode locked for the duration of the compound operation, to prevent other operations from changing the context in which later parts of the compound operation execute? Are there possibilities for deadlock here? Alternatively, if no locking is done, are we going to document the fact that compound operations are not atomic/linearizable? Since we are limiting the scope to single-inode fops, locking should suffice. EC doesn't have any problem as it takes just one lock for both the data/entry and metadata domains. In AFR we need to come up with a locking order for the metadata and data domains, something similar to what we do in rename where we need to take multiple locks. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
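[Editorial note] To make the "array of a union of per-fop structures" idea above concrete, here is a rough sketch of what such a layout could look like. All type and field names are invented for illustration — they are not the structures proposed in the actual patch series — and dict_t stands in for the usual libglusterfs dictionary type:

/* Illustrative only: not the actual GlusterFS compound-fop structures. */
#include <sys/uio.h>      /* struct iovec */
#include <sys/types.h>
#include <stdint.h>

typedef struct _dict dict_t;  /* stand-in for the libglusterfs dict type */

typedef enum {
        COMPOUND_FOP_XATTROP,
        COMPOUND_FOP_WRITEV,
        COMPOUND_FOP_FSYNC,
} compound_fop_t;

typedef struct {
        compound_fop_t fop;
        union {
                struct {                      /* xattrop args */
                        int     flags;
                        dict_t *xattr;
                } xattrop;
                struct {                      /* writev args */
                        struct iovec *vector;
                        int           count;
                        off_t         offset;
                        uint32_t      flags;
                } writev;
                struct {                      /* fsync args */
                        int datasync;
                } fsync;
        } args;
} compound_arg_t;

typedef struct {
        compound_fop_t fop;
        int            op_ret;
        int            op_errno;
        /* a matching union of per-fop cbk fields would go here */
} compound_rsp_t;

/* A compound request on one inode/fd would then carry
 *     compound_arg_t args[n];
 * and the server-side de-compounder would execute args[0..n-1] in order,
 * stopping and unwinding at the first failure (stop-at-first-error). */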
[Gluster-devel] Meeting minutes for Gluster Community Meeting 6-Jan-2016
Happy New Year everyone! The year's first community meeting has concluded. Not many of the AIs from the last meeting[1] were acted upon, mostly because it was the holidays. We had quite a long discussion on the purpose and usage of the gluster-specs repository. Some of the results should be visible in the coming days. Nothing much else was discussed. The links to the logs and meeting minutes are below. The minutes have also been pasted at the end of the mail for quick reference.
Minutes: http://meetbot.fedoraproject.org/gluster-meeting/2016-01-06/weekly_gluster_community_meeting.2016-01-06-12.03.html
Minutes (text): http://meetbot.fedoraproject.org/gluster-meeting/2016-01-06/weekly_gluster_community_meeting.2016-01-06-12.03.txt
Log: http://meetbot.fedoraproject.org/gluster-meeting/2016-01-06/weekly_gluster_community_meeting.2016-01-06-12.03.log.html
Thank you. Kaushal
[1] The last meeting was 2 weeks back on 23rd December. Last week's meeting wasn't held, as none of us showed up.
Meeting summary
---
* Rollcall (kshlm, 12:03:38)
* Last week's AIs (kshlm, 12:06:09)
* ndevos to send out a reminder to the maintainers about more actively enforcing backports of bugfixes (next year) (kshlm, 12:06:51)
* ACTION: ndevos to send out a reminder to the maintainers about more actively enforcing backports of bugfixes (this year) (kshlm, 12:08:27)
* rastar and msvbhat to consolidate and publish testing matrix on gluster.org. amye can help post Jan 1. (kshlm, 12:09:14)
* ACTION: rastar and msvbhat to consolidate and publish testing matrix on gluster.org. amye can help post Jan 1. (kshlm, 12:11:20)
* kshlm & csim to set up faux/pseudo user email for gerrit, bugzilla, github (after csim comes back on 4th Jan) (kshlm, 12:11:35)
* ACTION: kshlm & csim to set up faux/pseudo user email for gerrit, bugzilla, github (after csim comes back on 4th Jan) (kshlm, 12:12:11)
* hagarth to create 3.6.8 for bugzilla version (kshlm, 12:13:30)
* kkeithley to send a mail about using sanity checker tools in the codebase (kshlm, 12:15:06)
* ACTION: kkeithley to send a mail about using sanity checker tools in the codebase (kshlm, 12:17:49)
* rtalur/rastar will send a separate email about the Gerrit patch merge strategies to the maintainers list (kshlm, 12:18:12)
* ACTION: rtalur/rastar will send a separate email about the Gerrit patch merge strategies to the maintainers list (kshlm, 12:19:10)
* atinm/hagarth to share details on the MVP plan for Gluster-4.0 (kshlm, 12:19:27)
* LINK: https://www.gluster.org/pipermail/gluster-devel/2015-December/047528.html (kshlm, 12:21:02)
* LINK: http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/13337 (ndevos, 12:21:08)
* ACTION: kshlm to write up a README for glusterfs-specs. (kshlm, 12:51:39)
* GlusterFS 3.7 (kshlm, 12:52:46)
* GlusterFS 3.6 (kshlm, 12:53:47)
* GlusterFS-3.5 (kshlm, 12:56:11)
* Gluster 3.8 and 4.0 (kshlm, 12:57:41)
Meeting ended at 13:04:24 UTC.
Action Items
* ndevos to send out a reminder to the maintainers about more actively enforcing backports of bugfixes (this year)
* rastar and msvbhat to consolidate and publish testing matrix on gluster.org. amye can help post Jan 1.
* kshlm & csim to set up faux/pseudo user email for gerrit, bugzilla, github (after csim comes back on 4th Jan)
* kkeithley to send a mail about using sanity checker tools in the codebase
* rtalur/rastar will send a separate email about the Gerrit patch merge strategies to the maintainers list
* kshlm to write up a README for glusterfs-specs.
Action Items, by person
---
* kshlm
  * kshlm & csim to set up faux/pseudo user email for gerrit, bugzilla, github (after csim comes back on 4th Jan)
  * kshlm to write up a README for glusterfs-specs.
* ndevos
  * ndevos to send out a reminder to the maintainers about more actively enforcing backports of bugfixes (this year)
* **UNASSIGNED**
  * rastar and msvbhat to consolidate and publish testing matrix on gluster.org. amye can help post Jan 1.
  * kkeithley to send a mail about using sanity checker tools in the codebase
  * rtalur/rastar will send a separate email about the Gerrit patch merge strategies to the maintainers list
People Present (lines said)
---
* kshlm (103)
* ndevos (35)
* jdarcy (23)
* justinclift (15)
* overclk (9)
* zodbot (3)
* aravindavk (3)
* anoopcs (1)
* obnox (1)
___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] compound fop design first cut
> 1) fops will be compounded per inode, meaning 2 fops on different > inodes can't be compounded (Not because of the design, Just reducing > scope of the problem). > > 2) Each xlator that wants a compound fop packs the arguments by > itself. Packed how? Are we talking about XDR here, or something else? How is dict_t handled? Will there be generic packing/unpacking code somewhere, or is each translator expected to do this manually? > 3) On the server side a de-compounder placed below server xlator > unpacks the arguments and does the necessary operations. > > 4) Arguments for compound fops will be passed as array of union of > structures where each structure is associated with a fop. > > 5) Each xlator will have _compound_fop () which receives the > fop and does additional processing that is required for itself. What happens when (not if) some translator fails to provide this? Is there a default function? Is there something at the end of the chain that will log an error if the fop gets that far without being handled (as with GF_FOP_IPC)? > 6) Response will also be an array of union of response structures > where each structure is associated with a fop's response. What are the error semantics? Does processing of a series always stop at the first error, or are there some errors that allow retry/continue? If/when processing stops, who's responsible for cleaning up state changed by those parts that succeeded? What happens if the connection dies in the middle? How are values returned from one operation in a series propagated as arguments for the next? What are the implications for buffer and message sizes? What are the limits on how large these can get, and/or how many operations can be compounded? How is synchronization handled? Is the inode locked for the duration of the compound operation, to prevent other operations from changing the context in which later parts of the compound operation execute? Are there possibilities for deadlock here? Alternatively, if no locking is done, are we going to document the fact that compound operations are not atomic/linearizable? ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] compound fop design first cut
Hi, After discussions with Pranith and Soumya, here is the design for compound fops:
1) fops will be compounded per inode, meaning 2 fops on different inodes can't be compounded (not because of the design, just reducing the scope of the problem).
2) Each xlator that wants a compound fop packs the arguments by itself.
3) On the server side a de-compounder placed below the server xlator unpacks the arguments and does the necessary operations.
4) Arguments for compound fops will be passed as an array of a union of structures where each structure is associated with a fop.
5) Each xlator will have <xlator-name>_compound_fop() which receives the fop and does additional processing that is required for itself.
6) Response will also be an array of a union of response structures where each structure is associated with a fop's response.
Comments welcome!
- Original Message - > From: "Milind Changire" > To: "Jeff Darcy" > Cc: "Gluster Devel" > Sent: Friday, December 11, 2015 9:25:38 PM > Subject: Re: [Gluster-devel] compound fop design first cut > > > > On Wed, Dec 9, 2015 at 8:02 PM, Jeff Darcy < jda...@redhat.com > wrote: > > > > > > On December 9, 2015 at 7:07:06 AM, Ira Cooper ( i...@redhat.com ) wrote: > > A simple "abort on failure" and let the higher levels clean it up is > > probably right for the type of compounding I propose. It is what SMB2 > > does. So, if you get an error return value, cancel the rest of the > > request, and have it return ECOMPOUND as the errno. > > This is exactly the part that worries me. If a compound operation > fails, some parts of it will often need to be undone. "Let the higher > levels clean it up" means that rollback code will be scattered among all > of the translators that use compound operations. Some of them will do > it right. Others . . . less so. ;) All will have to be tested > separately. If we centralize dispatch of compound operations into one > piece of code, we can centralize error detection and recovery likewise. > That ensures uniformity of implementation, and facilitates focused > testing (or even formal proof) of that implementation. > > Can we gain the same benefits with a more generic design? Perhaps. It > would require that the compounding translator know how to reverse each > type of operation, so that it can do so after an error. That's > feasible, though it does mean maintaining a stack of undo actions > instead of a simple state. It might also mean testing combinations and > scenarios that will actually never occur in other components' usage of > the compounding feature. More likely it means that people will *think* > they can use the facility in unanticipated ways, until their > unanticipated usage creates a combination or scenario that was never > tested and doesn't work. Those are going to be hard problems to debug. > I think it's better to be explicit about which permutations we actually > expect to work, and have those working earlier. > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel > > > > Could we have a dry-run phase and a commit phase for the compound operation? > The dry-run phase could test the validity of the transaction and the > commit phase can actually perform the operation. > > If any operation in the dry-run sequence returns an error, the > compound operation can be aborted immediately without the complexity of an > undo ... scattered or centralized.
> > But if the subsequent operations depend on the changed state of the system > from earlier operations, then we'll have to introduce a system state object > for such transactions ... and maybe serialize such operations. The system > state object can be passed through the operation sequence. How well this > idea would work in a multi-threaded world is not clear to me either. > > > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel -- Thanks, Anuradha. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] NetBSD tests not running to completion.
I re-triggered NetBSD regressions for http://review.gluster.org/#/c/13041/3 but they are being run in silent mode and are not completing. Can someone from the infra team take a look? The last 22 tests in https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/ have failed. It is highly unlikely that something is wrong with all those patches. Thanks, Ravi ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] GFID to Path Conversion
regards Aravinda On 01/06/2016 02:49 AM, Shyam wrote: On 12/09/2015 12:47 AM, Aravinda wrote: Hi, Sharing a draft design for GFID to Path conversion. (Directory GFID to Path is very easy in DHT v1; this design may not work in the case of DHT 2.0)
(current thought) DHT2 would extend the manner in which name,pGFID is stored for files, for directories. So reverse path walking would leverage the same mechanism as explained below. Of course, as this would involve MDS hopping, the intention would be to *not* use this in IO-critical paths, and rather use it in the tool set that needs reverse path walks to provide information to admins.
Performance and storage space impact are yet to be analyzed.
Storing the required information
---
Metadata information related to the parent GFID and basename will reside with the file. The PGFID and a hash of the basename become part of the xattr key name, and the basename is saved as the value.
Xattr Key = meta.<pGFID>.<hash-of-basename>
Xattr Value = <basename>
I would think we should keep the xattr name constant, and specialize the value, instead of encoding data in the xattr value itself. The issue of course is that multiple xattr name:value pairs where the name is constant are not feasible, and this needs some thought. If we use a single xattr for multiple values, then updating one basename will have to parse the existing xattr before the update (in the case of hardlinks). I wrote about other experiments done to update and read xattrs: http://www.gluster.org/pipermail/gluster-devel/2015-December/047380.html
A non-crypto hash is suitable for this purpose. Number of xattrs on a file = number of links.
Converting GFID to Path
---
Example GFID: 78e8bce0-a8c9-4e67-9ffb-c4c4c7eff038
Here is where we get into a bit of a problem, if a file has links. Which path to follow would be a dilemma. We could return all paths, but tools like glusterfind or backup-related tools would prefer a single file. One of the thoughts is, if we could feed a pGFID:GFID pair as input, this still does not solve a file having links within the same pGFID. Anyway, something to note or consider.
1. List all xattrs of the GFID file in the brick backend. ($BRICK_ROOT/.glusterfs/78/e8/78e8bce0-a8c9-4e67-9ffb-c4c4c7eff038)
2. If the xattr key starts with "meta", split it to get the parent GFID and collect the xattr value.
3. Convert the parent GFID to a path using recursive readlink till the full path is built. This is the part which should/would change with DHT2 in my opinion; sort of repeating step (2) here instead of a readlink.
4. Join the converted parent directory path and the xattr value (basename).
Recording
-
MKNOD/CREATE/LINK/SYMLINK: Add new xattr(PGFID, BN)
Most of these operations as they exist today are not atomic, i.e. we create the file and then add the xattrs and then possibly hardlink the GFID, so by the time the GFID makes its presence, the file is all ready and (maybe) hence consistent. The other way to look at this is that we get the GFID representation ready, and then hard link the name into the name tree. Alternately we could leverage O_TMPFILE to create the file, encode all its inode information, and then bring it to life in the namespace. This is orthogonal to this design, but brings in the need to be consistent on failures. Either way, if a failure occurs midway, we have no way to recover the information for the inode and set it right. Thoughts?
RENAME: Remove old xattr(PGFID1, BN1), add new xattr(PGFID2, BN2)
UNLINK: If link count > 1 then remove xattr(PGFID, BN)
Heal on Lookup
--
Healing on lookup can be enabled if required; by default we can disable this option since it may have performance implications during reads.
Enabling the logging
-
This can be enabled using a volume set option. Option name TBD.
Rebuild Index
-
Offline activity; crawls the backend filesystem and builds all the required xattrs.
Frequency of the rebuild? I would assume this would be run when the option is enabled, and later almost never, unless we want to recover from some inconsistency in the data (how to detect the same would be an open question). Also I think once this option is enabled, we should prevent disabling it (or at least till the packages are downgraded), as this would be a hinge that multiple other features may depend on, and so we consider this an on-disk change that is made once, and later maintained for the volume, rather than turned on/off. Which means the initial index rebuild would be a volume version conversion from the current representation to this one, and may need additional thought on how we maintain volume versions.
Comments and suggestions welcome.
regards Aravinda On 11/25/2015 10:08 AM, Aravinda wrote: regards Aravinda On 11/24/2015 11:25 PM, Shyam wrote: There seem to be other interested consumers in gluster for the same information, and I guess we need a good base design to address this on-disk change, so that it can be leveraged in the various use cases.
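[Editorial note] As an illustration of steps 1-4 above, here is a rough sketch of one level of the reverse lookup, assuming the hypothetical key layout meta.<pGFID>.<hash-of-basename> with the basename stored as the value (on disk the key would presumably carry a trusted.* namespace prefix). The brick path is a placeholder; hardlink disambiguation, the root-GFID termination condition and most error handling are omitted, and the full path is obtained by repeating this on each returned parent GFID:

/* Rough sketch only; uses the Linux xattr API against the brick backend. */
#include <stdio.h>
#include <string.h>
#include <sys/xattr.h>

#define BRICK_ROOT "/bricks/brick1"   /* placeholder brick root */

/* Resolve one level: given a GFID string, fill in its parent GFID (parsed
 * out of the "meta." xattr key) and the basename (the xattr value).
 * pgfid must hold at least 37 bytes. */
static int
resolve_one_level (const char *gfid, char *pgfid, char *bname, size_t blen)
{
        char    handle[512];
        char    keys[4096];
        ssize_t n, v;

        /* Step 1: the GFID handle lives at $BRICK_ROOT/.glusterfs/aa/bb/<gfid> */
        snprintf (handle, sizeof (handle), "%s/.glusterfs/%.2s/%.2s/%s",
                  BRICK_ROOT, gfid, gfid + 2, gfid);

        n = llistxattr (handle, keys, sizeof (keys));
        for (char *key = keys; n > 0 && key < keys + n; key += strlen (key) + 1) {
                /* Step 2: pick keys of the form meta.<pgfid>.<hash> */
                if (strncmp (key, "meta.", 5) != 0)
                        continue;
                snprintf (pgfid, 37, "%.36s", key + 5);    /* 36-char uuid */

                v = lgetxattr (handle, key, bname, blen - 1);
                if (v < 0)
                        return -1;
                bname[v] = '\0';                           /* step 4: basename */
                return 0;
        }
        return -1;  /* no meta.* xattr found */
}

/* Steps 3 and 4: a caller loops, calling resolve_one_level() on each parent
 * GFID and prepending the basenames, until the root GFID is reached. */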
[Gluster-devel] Wrong usage of dict functions
Hi, It seems like having two ways to create a dictionary is causing problems. There are quite a few dict_new()/dict_destroy() or get_new_dict()/dict_unref() pairings in the code base, i.e. mismatched pairings of the ref-counted and non-ref-counted APIs. So I stopped exposing the functions without ref/unref, i.e. get_new_dict()/dict_destroy(), as part of http://review.gluster.org/13183
Files changed as part of the patch:
 api/src/glfs-mgmt.c                            |  2 +-
 api/src/glfs.c                                 |  2 +-
 cli/src/cli-cmd-parser.c                       | 42 +-
 cli/src/cli-cmd-system.c                       |  6 +++---
 cli/src/cli-cmd-volume.c                       |  2 +-
 cli/src/cli-rpc-ops.c                          |  4 ++--
 cli/src/cli.c                                  |  2 +-
 glusterfsd/src/glusterfsd.c                    |  2 +-
 libglusterfs/src/dict.h                        |  5 -
 libglusterfs/src/graph.c                       |  2 +-
 libglusterfs/src/graph.y                       |  2 +-
 xlators/cluster/afr/src/afr-self-heal-common.c |  6 +++---
 xlators/cluster/afr/src/afr-self-heal-name.c   |  2 +-
 xlators/cluster/dht/src/dht-selfheal.c         | 15 +++
 xlators/cluster/dht/src/dht-shared.c           |  2 +-
 xlators/mgmt/glusterd/src/glusterd-geo-rep.c   |  4 ++--
 xlators/mgmt/glusterd/src/glusterd-op-sm.c     |  4 ++--
 xlators/mgmt/glusterd/src/glusterd-volgen.c    | 12 ++--
 xlators/mount/fuse/src/fuse-bridge.c           |  3 +--
 xlators/mount/fuse/src/fuse-bridge.h           |  2 --
 20 files changed, 56 insertions(+), 65 deletions(-)
Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
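[Editorial note] For context, a minimal sketch of the pairing that remains after the patch: dict_new() starts a dictionary with one reference and dict_unref() drops it, freeing the dict when the count reaches zero. The function and key names below are invented for illustration; only the dict_* calls are real libglusterfs API:

#include "dict.h"   /* libglusterfs; installed headers live under glusterfs/ */

static int
set_options_example (void)
{
        dict_t *opts = dict_new ();            /* ref count == 1 */
        if (!opts)
                return -1;

        if (dict_set_str (opts, "an-option", "a-value") != 0) {
                dict_unref (opts);
                return -1;
        }

        /* ... hand opts to a fop/syncop; callees take their own refs ... */

        dict_unref (opts);                     /* drop our ref; freed at zero */
        return 0;
}

/* The removed style -- get_new_dict() paired with dict_destroy() -- bypassed
 * the reference count, which is exactly the mix-up the patch eliminates. */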
Re: [Gluster-devel] quota.t hangs on NetBSD machines
On Wed, Jan 06, 2016 at 12:18:25AM -0500, Raghavendra Gowdappa wrote: > No, it's not a portability issue. It should have occurred with > similar probability on Linux machines too. Does Linux issue frequent > fsyncs compared to NetBSD? The kernels' behavior is certainly different here. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
OK, here is the valgrind log of patched Ganesha (I took a recent version of your patchset, 8685abfc6d) with Entries_HWMARK set to 500. https://gist.github.com/5397c152a259b9600af0 See no huge runtime leaks now. However, I've repeated this test with another volume in replica and got the following Ganesha error:
===
ganesha.nfsd: inode.c:716: __inode_forget: Assertion `inode->nlookup >= nlookup' failed.
===
06.01.2016 08:40, Soumya Koduri wrote: On 01/06/2016 03:53 AM, Oleksandr Natalenko wrote: OK, I've repeated the same traversing test with patched GlusterFS API, and here is the new Valgrind log: https://gist.github.com/17ecb16a11c9aed957f5 Fuse mount doesn't use the gfapi helper. Does your above GlusterFS API application call glfs_fini() during exit? glfs_fini() is responsible for freeing the memory consumed by gfAPI applications. Could you repeat the test with nfs-ganesha (which for sure calls glfs_fini() and purges inodes if it exceeds its inode cache limit) if possible. Thanks, Soumya Still leaks. On Tuesday, 5 January 2016, 22:52:25 EET Soumya Koduri wrote: On 01/05/2016 05:56 PM, Oleksandr Natalenko wrote: Unfortunately, both patches didn't make any difference for me. I've patched 3.7.6 with both patches, recompiled and installed the patched GlusterFS package on the client side and mounted a volume with ~2M of files. Then I performed the usual tree traversal with a simple "find". Memory RES value went from ~130M at the moment of mounting to ~1.5G after traversing the volume for ~40 mins. The Valgrind log still shows lots of leaks. Here it is: https://gist.github.com/56906ca6e657c4ffa4a1 Looks like you had done a fuse mount. The patches which I have pasted below apply to gfapi/nfs-ganesha applications. Also, to resolve the nfs-ganesha issue which I had mentioned below (in case the Entries_HWMARK option gets changed), I have posted the below fix - https://review.gerrithub.io/#/c/258687 Thanks, Soumya Ideas? 05.01.2016 12:31, Soumya Koduri wrote: I tried to debug the inode* related leaks and have seen some improvements after applying the below patches when running the same test (but with a smaller load). Could you please apply those patches and confirm the same?
a) http://review.gluster.org/13125 This will fix the inodes and their ctx related leaks during unexport and at program exit. Please check the valgrind output after applying the patch. It should not list any inode-related memory as lost.
b) http://review.gluster.org/13096 The reason the change in Entries_HWMARK (in your earlier mail) didn't have much effect is that the inode_nlookup count doesn't become zero for those handles/inodes being closed by ganesha. Hence those inodes get added to the inode LRU list instead of the purge list, and are forcefully purged only when the number of gfapi inode table entries reaches its limit (which is 137012). This patch fixes those 'nlookup' counts. Please apply this patch, reduce 'Entries_HWMARK' to a much lower value and check if it decreases the memory consumed by the ganesha process while it is active.
CACHEINODE {
    Entries_HWMark = 500;
}
Note: I see an issue with nfs-ganesha during exit when the option 'Entries_HWMARK' gets changed. This is not related to any of the above patches (or rather Gluster) and I am currently debugging it. Thanks, Soumya On 12/25/2015 11:34 PM, Oleksandr Natalenko wrote:
1. test with Cache_Size = 256 and Entries_HWMark = 4096
Before find . -type f:
root 3120 0.6 11.0 879120 208408 ? Ssl 17:39 0:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
After:
root 3120 11.4 24.3 1170076 458168 ? Ssl 17:39 13:39 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
~250M leak.
2. test with default values (after ganesha restart)
Before:
root 24937 1.3 10.4 875016 197808 ? Ssl 19:39 0:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
After:
root 24937 3.5 18.9 1022544 356340 ? Ssl 19:39 0:40 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
~159M leak.
No reasonable correlation detected. The second test finished much faster than the first (I guess server-side GlusterFS cache or the server kernel page cache is the cause). There are ~1.8M files on this test volume. On Friday, 25 December 2015, 20:28:13 EET Soumya Koduri wrote: On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote: Another addition: it seems to be a GlusterFS API library memory leak, because NFS-Ganesha also consumes a huge amount of memory while doing an ordinary "find . -type f" via NFSv4.2 on a remote client. Here is the memory usage:
===
root 5416 34.2 78.5 2047176 1480552 ? Ssl 12:02 117:54 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
===
1.4G is too much for a simple stat() :(. Ideas? nfs-ganesha also has a cache layer
[Gluster-devel] Selective Read-only in glusterfs-specs
Hi, I have submitted a design document for Selective read-only - enforcing selective read-only for specific clients. http://review.gluster.org/#/c/13180 Please review. Thanks, Saravanakumar ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel