Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client

2016-01-06 Thread Soumya Koduri



On 01/06/2016 01:58 PM, Oleksandr Natalenko wrote:

OK, here is valgrind log of patched Ganesha (I took recent version of
your patchset, 8685abfc6d) with Entries_HWMARK set to 500.

https://gist.github.com/5397c152a259b9600af0

See no huge runtime leaks now.


Glad to hear this :)

However, I've repeated this test with another volume in replica and got
the following Ganesha error:

===
ganesha.nfsd: inode.c:716: __inode_forget: Assertion `inode->nlookup >=
nlookup' failed.
===


I repeated the tests on a replica volume as well, but haven't hit any 
assert. Could you confirm that you have taken the latest gluster patch set 
#3?

 - http://review.gluster.org/#/c/13096/3

If you are hitting the issue even then, please provide the core if possible.

Thanks,
Soumya



On 06.01.2016 08:40, Soumya Koduri wrote:

On 01/06/2016 03:53 AM, Oleksandr Natalenko wrote:

OK, I've repeated the same traversal test with the patched GlusterFS
API, and here is the new Valgrind log:

https://gist.github.com/17ecb16a11c9aed957f5


Fuse mount doesn't use gfapi helper. Does your above GlusterFS API
application call glfs_fini() during exit? glfs_fini() is responsible
for freeing the memory consumed by gfAPI applications.
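
(For reference, a minimal sketch of a gfapi client that releases its memory
via glfs_fini() at exit is below; "testvol" and "server1" are placeholder
values and the traversal itself is elided.)

/* Sketch of a gfapi client that calls glfs_fini() before exiting.
 * "testvol" and "server1" are placeholder values. */
#include <glusterfs/api/glfs.h>

int main(void)
{
        glfs_t *fs = glfs_new("testvol");              /* virtual mount object */
        if (!fs)
                return 1;

        glfs_set_volfile_server(fs, "tcp", "server1", 24007);
        glfs_set_logging(fs, "/tmp/gfapi.log", 7);

        if (glfs_init(fs) != 0) {                      /* fetch volfile, build graph */
                glfs_fini(fs);
                return 1;
        }

        /* ... directory traversal / stat() calls via glfs_* would go here ... */

        glfs_fini(fs);                                 /* releases the inode table and other gfapi state */
        return 0;
}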

Could you repeat the test with nfs-ganesha (which for sure calls
glfs_fini() and purges inodes if exceeds its inode cache limit) if
possible.

Thanks,
Soumya


Still leaks.

On Tuesday, 5 January 2016 22:52:25 EET Soumya Koduri wrote:

On 01/05/2016 05:56 PM, Oleksandr Natalenko wrote:

Unfortunately, both patches didn't make any difference for me.

I've patched 3.7.6 with both patches, recompiled and installed the patched
GlusterFS package on the client side and mounted a volume with ~2M files.
Then I performed the usual tree traversal with a simple "find".

Memory RES value went from ~130M at the moment of mounting to ~1.5G
after traversing the volume for ~40 mins. Valgrind log still shows
lots
of leaks. Here it is:

https://gist.github.com/56906ca6e657c4ffa4a1


Looks like you had done a fuse mount. The patches which I have pasted
below apply to gfapi/nfs-ganesha applications.

Also, to resolve the nfs-ganesha issue which I had mentioned below (in
case the Entries_HWMARK option gets changed), I have posted the fix below -
https://review.gerrithub.io/#/c/258687

Thanks,
Soumya


Ideas?

On 05.01.2016 12:31, Soumya Koduri wrote:

I tried to debug the inode*-related leaks and saw some improvements
after applying the below patches when I ran the same test (but with a
smaller load). Could you please apply those patches & confirm the
same?

a) http://review.gluster.org/13125

This will fix the inodes & their ctx related leaks during unexport
and
the program exit. Please check the valgrind output after applying the
patch. It should not list any inodes related memory as lost.

b) http://review.gluster.org/13096

The reason the change in Entries_HWMARK (in your earlier mail) didn't
have much effect is that the inode_nlookup count doesn't become zero
for those handles/inodes being closed by ganesha. Hence those inodes
get added to the inode lru list instead of the purge list, which only
gets forcefully purged when the number of gfapi inode table
entries reaches its limit (which is 137012).

This patch fixes those 'nlookup' counts. Please apply this patch and
reduce 'Entries_HWMARK' to a much lower value and check if it decreases
the memory consumed by the ganesha process while it is active.

CACHEINODE {

 Entries_HWMark = 500;

}


Note: I see an issue with nfs-ganesha during exit when the option
'Entries_HWMARK' gets changed. This is not related to any of the
above
patches (or rather Gluster) and I am currently debugging it.

Thanks,
Soumya

On 12/25/2015 11:34 PM, Oleksandr Natalenko wrote:

1. test with Cache_Size = 256 and Entries_HWMark = 4096

Before find . -type f:

root  3120  0.6 11.0 879120 208408 ?   Ssl  17:39   0:00
/usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N
NIV_EVENT

After:

root  3120 11.4 24.3 1170076 458168 ?  Ssl  17:39  13:39
/usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N
NIV_EVENT

~250M leak.

2. test with default values (after ganesha restart)

Before:

root 24937  1.3 10.4 875016 197808 ?   Ssl  19:39   0:00
/usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N
NIV_EVENT

After:

root 24937  3.5 18.9 1022544 356340 ?  Ssl  19:39   0:40
/usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N
NIV_EVENT

~159M leak.

No reasonable correlation detected. The second test finished much
faster than the first (I guess server-side GlusterFS cache or server
kernel page cache is the cause).

There are ~1.8M files on this test volume.

On Friday, 25 December 2015 20:28:13 EET Soumya Koduri wrote:

On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote:

Another addition: it seems to be GlusterFS API library memory leak
because NFS-Ganesha also consumes huge amount of memory while
doi

Re: [Gluster-devel] NLMv4 isn't available over UDP

2016-01-06 Thread Kaushal M
On Thu, Jan 7, 2016 at 11:00 AM, Sponge Tsui  wrote:
> Hi, I'm sorry to ask for help if this is a common question. I noticed
> that in glusterfs NLMv4 is only available over TCP, but not available
> over UDP, as shown by "rpcinfo -p" when using glusterfs to export
> volumes through the NFS protocol. The result is as follows:
>
> program vers proto   port  service
> 100000    4   tcp    111  portmapper
> 100000    3   tcp    111  portmapper
> 100000    2   tcp    111  portmapper
> 100000    4   udp    111  portmapper
> 100000    3   udp    111  portmapper
> 100000    2   udp    111  portmapper
> 100005    3   tcp  38465  mountd
> 100005    1   tcp  38466  mountd
> 100003    3   tcp   2049  nfs
> 100021    4   tcp  38468  nlockmgr
> 100227    3   tcp   2049
> 100021    1   udp    868  nlockmgr
> 100021    1   tcp    869  nlockmgr
> 100024    1   udp  34507  status
> 100024    1   tcp  40511  status
>
> I want to find out why NLMv4 isn't supported over UDP while
> NLMv4 is already supported over TCP, and whether NLMv4 over
> UDP would be taken into account.
>
> Please excuse me if I've used bad terminology. Maybe it's because of
> some other issues that I didn't realize. To be honest, I'm a newcomer
> to glusterfs of less than a month.
>
> Thanks for your attention. I'm looking forward to your reply, and any
> answers will be appreciated !

The GlusterFS NFS service only supports the NFSv3 protocol over TCP.
So the NLM service was also implemented only over TCP.
As for the reason why only TCP support was implemented for NFS, it was
probably because NFSv3 over TCP provided support for most clients,
with the least amount of implementation effort.

>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] quota.t hangs on NetBSD machines

2016-01-06 Thread Raghavendra Gowdappa


- Original Message -
> From: "Manikandan Selvaganesh" 
> To: "Raghavendra G" 
> Cc: "Gluster Devel" 
> Sent: Wednesday, January 6, 2016 7:54:32 PM
> Subject: Re: [Gluster-devel] quota.t hangs on NetBSD machines
> 
> Hi,
> We are debugging the issue. With the patch[1], quota.t doesn't seem to
> hang and all the tests pass successfully, but it throws an error (while
> running test #24 in quota.t): "perfused: perfuse_node_inactive:
> perfuse_node_fsync failed error = 69: Resource temporarily unavailable". I
> have attached a tar file which contains the logs while the test (quota.t) is
> being run.
> Thanks to Raghavendra Talur and Vijay for helping :)
> 
> [1] http://review.gluster.org/#/c/13177/

I've merged this patch. If there are any tests hung, please kill them and 
retrigger.

> 
> Thank you :-)
> 
> --
> Regards,
> Manikandan Selvaganesh.
> 
> - Original Message -
> From: "Manikandan Selvaganesh" 
> To: "Raghavendra G" 
> Cc: "Gluster Devel" 
> Sent: Wednesday, January 6, 2016 11:04:23 AM
> Subject: Re: [Gluster-devel] quota.t hangs on NetBSD machines
> 
> Hi Raghavendra,
> I will check out this with the fix and update you soon on this :)
> 
> --
> Regards,
> Manikandan Selvaganesh.
> 
> - Original Message -
> From: "Raghavendra G" 
> To: "Raghavendra Gowdappa" 
> Cc: "Manikandan Selvaganesh" , "Gluster Devel"
> 
> Sent: Tuesday, January 5, 2016 11:15:57 PM
> Subject: Re: [Gluster-devel] quota.t hangs on NetBSD machines
> 
> Manikandan,
> 
> Can you test this fix since you've the setup ready?
> 
> regards,
> Raghavendra
> 
> On Tue, Jan 5, 2016 at 11:11 PM, Raghavendra G 
> wrote:
> 
> > A fix has been sent to:
> > http://review.gluster.org/13177
> >
> > On Tue, Jan 5, 2016 at 2:24 PM, Raghavendra Gowdappa 
> > wrote:
> >
> >>
> >>
> >> - Original Message -
> >> > From: "Raghavendra Gowdappa" 
> >> > To: "Manikandan Selvaganesh" 
> >> > Cc: "Vijaikumar Mallikarjuna" , "Emmanuel
> >> Dreyfus" , "Raghavendra Talur"
> >> > , "Gluster Devel" 
> >> > Sent: Tuesday, January 5, 2016 12:16:27 PM
> >> > Subject: Re: [Gluster-devel] quota.t hangs on NetBSD machines
> >> >
> >> > Thanks. There is a write call which is not unwound by write-behind as
> >> can be
> >> > seen below:
> >> >
> >> > [.WRITE]
> >> > request-ptr=0xb80a4830
> >> > refcount=2
> >> > wound=no
> >> > generation-number=90
> >> > req->op_ret=32768
> >> > req->op_errno=0
> >> > sync-attempts=0
> >> > sync-in-progress=no
> >> > size=32768
> >> > offset=11665408
> >> > lied=0
> >> > append=0
> >> > fulfilled=0
> >> > go=0
> >> >
> >> > note, the request is not wound (wound=no and sync-in-progress=no), not
> >> > unwound (lied=0). I am yet to figure out the RCA. will be sending a
> >> patch
> >> > soon.
> >>
> >> I figured out this issue occurs when "trickling-writes" is on in
> >> write-behind (by default it's on). Unfortunately this option cannot be
> >> turned off using the CLI as of now (one can edit volfiles though).
> >>
> >> >
> >> > - Original Message -
> >> > > From: "Manikandan Selvaganesh" 
> >> > > To: "Raghavendra Gowdappa" 
> >> > > Cc: "Vijaikumar Mallikarjuna" , "Emmanuel
> >> Dreyfus"
> >> > > , "Raghavendra Talur"
> >> > > , "Gluster Devel" 
> >> > > Sent: Tuesday, January 5, 2016 11:43:55 AM
> >> > > Subject: Re: [Gluster-devel] quota.t hangs on NetBSD machines
> >> > >
> >> > > Hi Raghavendra,
> >> > >
> >> > > Yeah, we have taken the statedump when the test program was in 'D'
> >> state. I
> >> > > have enabled statedump of inodes too.
> >> > >
> >> > > Attaching the entire statedump file.
> >> > >
> >> > > Thank you :-)
> >> > >
> >> > > --
> >> > > Regards,
> >> > > Manikandan Selvaganesh.
> >> > >
> >> > > - Original Message -
> >> > > From: "Raghavendra Gowdappa" 
> >> > > To: "Manikandan Selvaganesh" 
> >> > > Cc: "Emmanuel Dreyfus" , "Gluster Devel"
> >> > > , "Vijaikumar Mallikarjuna"
> >> > > 
> >> > > Sent: Monday, January 4, 2016 11:56:03 PM
> >> > > Subject: Re: [Gluster-devel] quota.t hangs on NetBSD machines
> >> > >
> >> > >
> >> > >
> >> > > - Original Message -
> >> > > > From: "Manikandan Selvaganesh" 
> >> > > > To: "Raghavendra Gowdappa" 
> >> > > > Cc: "Emmanuel Dreyfus" , "Gluster Devel"
> >> > > > , "Vijaikumar Mallikarjuna"
> >> > > > 
> >> > > > Sent: Monday, January 4, 2016 7:00:16 PM
> >> > > > Subject: Re: [Gluster-devel] quota.t hangs on NetBSD machines
> >> > > >
> >> > > > Hi,
> >> > > >
> >> > > > We have taken statedump of fuse client process, quotad and bricks.
> >> > > > Apparently, we could not find any stack information in brick's
> >> statedump.
> >> > > > Below is the client statedump state information:
> >> > >
> >> > > Thanks Manikandan :). You took this statedump once the test program
> >> was in
> >> > > 'D' state, right? Otherwise these can be just in-transit fops.
> >> > >
> >> > > >
> >> > > > [global.callpool.stack.1.frame.1]
> >> > > > frame=0xb80775f0
> >> > > > ref_count=0
> >> > > > translator=patchy-write-behind
> >> > > > c

[Gluster-devel] NLMv4 isn't available over UDP

2016-01-06 Thread Sponge Tsui
Hi, I'm sorry to ask for help if this is a common question. I noticed
that in glusterfs NLMv4 is only available over TCP, but not available
over UDP, as shown by "rpcinfo -p" when using glusterfs to export
volumes through the NFS protocol. The result is as follows:

program vers proto   port  service
100000    4   tcp    111  portmapper
100000    3   tcp    111  portmapper
100000    2   tcp    111  portmapper
100000    4   udp    111  portmapper
100000    3   udp    111  portmapper
100000    2   udp    111  portmapper
100005    3   tcp  38465  mountd
100005    1   tcp  38466  mountd
100003    3   tcp   2049  nfs
100021    4   tcp  38468  nlockmgr
100227    3   tcp   2049
100021    1   udp    868  nlockmgr
100021    1   tcp    869  nlockmgr
100024    1   udp  34507  status
100024    1   tcp  40511  status

I want to find out why NLMv4 isn't supported over UDP while
NLMv4 is already supported over TCP, and whether NLMv4 over
UDP would be taken into account.

Please excuse me if I've used bad terminology. Maybe it's because of
some other issues that I didn't realize. To be honest, I'm a newcomer
to glusterfs of less than a month.

Thanks for your attention. I'm looking forward to your reply, and any
answers will be appreciated !
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] compare-bug-version-and-git-branch.sh FAILING

2016-01-06 Thread Atin Mukherjee
This says "BUG id 1296206 is marked private, please remove the groups."
Is the bug limited to a private group? If so, that could be the
reason; otherwise the rest of the things look fine to me.

Thanks,
Atin

On 01/07/2016 10:12 AM, Milind Changire wrote:
> for patch: http://review.gluster.org/13186
> Jenkins failed
> job: https://build.gluster.org/job/compare-bug-version-and-git-branch/14201/
> 
> 
> I had mistakenly entered a downstream BUG ID for rfc.sh and then later
> amended the commit message with the correct mainline BUG ID and
> resubmitted via rfc.sh. I also corrected the Topic tag in Gerrit to use
> the correct BUG ID.
> 
> But the job for this patch is failing even after corrections.
> 
> Please advise.
> 
> --
> Milind
> 
> 
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Gluster 4.0 Roadmap page

2016-01-06 Thread Atin Mukherjee
Folks,

A month back I created the roadmap page for Gluster 4.0 [1], which should
be the placeholder to capture details in terms of feature pages, spec
links, etc.

I'd request all 4.0 initiative leads to update it with the respective
details (send a PR to glusterweb repo). I also submitted a PR [2] to
update the page with all the relevant details I have as of now.

[1] https://www.gluster.org/community/roadmap/4.0/
[2] https://github.com/gluster/glusterweb/pull/34

Thanks,
Atin
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] compare-bug-version-and-git-branch.sh FAILING

2016-01-06 Thread Milind Changire
for patch: http://review.gluster.org/13186
Jenkins failed job:
https://build.gluster.org/job/compare-bug-version-and-git-branch/14201/


I had mistakenly entered a downstream BUG ID for rfc.sh and then later
amended the commit message with the correct mainline BUG ID and resubmitted
via rfc.sh. I also corrected the Topic tag in Gerrit to use the correct
BUG ID.

But the job for this patch is failing even after corrections.

Please advise.

--
Milind
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] snapshot/bug-1227646.t throws core [rev...@dev.gluster.org: Change in glusterfs[master]: hook-scripts: reconsile mount, fixing manual mount]

2016-01-06 Thread Vijay Bellur

On 01/06/2016 01:55 AM, Rajesh Joseph wrote:

Yesterday, I looked into the core. From the core it looked like the rebalance 
process crashed
during cleanup_and_exit. The frame seems to be corrupted. Therefore it does not 
look like a snapshot
bug. The test case tests snapshot with a tiered volume. If somebody from 
the rebalance or tiering team can
also take a look, it would help.



Raghavendar, Shyam - can you please help here?

Thanks,
Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] quota.t hangs on NetBSD machines

2016-01-06 Thread Emmanuel Dreyfus
Manikandan Selvaganesh  wrote:

> We are debugging the issue. With the patch[1], the quota.t doesn't seem to
> hang and all the tests pass successfully, but it throws an error (while
> running test #24 in quota.t) "perfused: perfuse_node_inactive:
> perfuse_node_fsync failed error = 69: Resource temporarily unavailable".

This is a non-fatal warning about a failed fsync.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] compound fop design first cut

2016-01-06 Thread Pranith Kumar Karampuri



On 01/06/2016 07:50 PM, Jeff Darcy wrote:

1) fops will be compounded per inode, meaning 2 fops on different
inodes can't be compounded (Not because of the design, Just reducing
scope of the problem).

2) Each xlator that wants a compound fop packs the arguments by
itself.

Packed how?  Are we talking about XDR here, or something else?  How is
dict_t handled?  Will there be generic packing/unpacking code somewhere,
or is each translator expected to do this manually?


Packed as mentioned in step 4 below. There will be common functions 
provided which fill an array cell with the information given to the 
function for that fop. In addition, there will be filling 
functions for each of the compound fops listed at: 
https://public.pad.fsfe.org/p/glusterfs-compound-fops. The XDR should be 
similar to what Soumya suggested in earlier mails, just like in NFS.





3) On the server side a de-compounder placed below server xlator
unpacks the arguments and does the necessary operations.

4) Arguments for compound fops will be passed as array of union of
structures where each structure is associated with a fop.

5) Each xlator will have _compound_fop () which receives the
fop and does additional processing that is required for itself.

What happens when (not if) some translator fails to provide this?  Is
there a default function?  Is there something at the end of the chain
that will log an error if the fop gets that far without being handled
(as with GF_FOP_IPC)?


Yes, there will be a default_fop provided, just like for other fops, which is 
just a pass-through. Posix will log and unwind with -1, ENOTSUPP.





6) Response will also be an array of union of response structures
where each structure is associated with a fop's response.

What are the error semantics?  Does processing of a series always stop
at the first error, or are there some errors that allow retry/continue?
If/when processing stops, who's responsible for cleaning up state
changed by those parts that succeeded?  What happens if the connection
dies in the middle?


Yes, at the moment we are implementing stop-at-first-error semantics as 
it seems to satisfy all the compound fops we listed @ 
https://public.pad.fsfe.org/p/glusterfs-compound-fops. Each translator 
that wants to handle the compound fop should handle failures just like 
it does for a normal fop at the moment.




How are values returned from one operation in a series propagated as
arguments for the next?


They are not. In the first cut the only dependency between two fops 
is whether the previous one succeeded or not. Just this much seems to 
work fine for the fops we are targeting for now: 
https://public.pad.fsfe.org/p/glusterfs-compound-fops. We may have to 
enhance it based on what comes up in the future.




What are the implications for buffer and message sizes?  What are the
limits on how large these can get, and/or how many operations can be
compounded?


It depends on the limits imposed by the RPC layer. If it can't send the 
request, the fop will fail. If it can send the request but the response 
is too big to send back, I think the fop will end in an error via frame 
timeout for the response. Either way it will be a failure. At the moment, 
for the fops listed at: 
https://public.pad.fsfe.org/p/glusterfs-compound-fops this doesn't seem 
to be a problem.




How is synchronization handled?  Is the inode locked for the duration of
the compound operation, to prevent other operations from changing the
context in which later parts of the compound operation execute?  Are
there possibilities for deadlock here?  Alternatively, if no locking is
done, are we going to document the fact that compound operations are not
atomic/linearizable?


Since we are limiting the scope to single-inode fops, locking should 
suffice. EC doesn't have any problem as it has just one lock for both the 
data/entry and metadata locks. In AFR we need to come up with a locking 
order for the metadata and data domains, something similar to what we do 
in rename where we need to take multiple locks.


Pranith

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Meeting minutes for Gluster Community Meeting 6-Jan-2016

2016-01-06 Thread Kaushal M
Happy New Year everyone!

The year's first community meeting has concluded. Not many of the AIs
from the last meeting [1] were acted upon, mostly because of the
holidays. We had quite a long discussion on the purpose and usage of
the gluster-specs repository. Some of the results should be visible in the
coming days. Nothing much else was discussed. The links to the logs
and meeting minutes are below. The minutes have also been pasted at
the end of the mail for quick reference.

Minutes: 
http://meetbot.fedoraproject.org/gluster-meeting/2016-01-06/weekly_gluster_community_meeting.2016-01-06-12.03.html
Minutes (text):
http://meetbot.fedoraproject.org/gluster-meeting/2016-01-06/weekly_gluster_community_meeting.2016-01-06-12.03.txt
Log: 
http://meetbot.fedoraproject.org/gluster-meeting/2016-01-06/weekly_gluster_community_meeting.2016-01-06-12.03.log.html

Thank you.
Kaushal

[1] The last meeting was 2 weeks back on 23rd December. Last week's
meeting wasn't held, as none of us showed up.

Meeting summary
---
* Rollcall  (kshlm, 12:03:38)

* Last weeks AIs  (kshlm, 12:06:09)

* ndevos to send out a reminder to the maintainers about more actively
  enforcing backports of bugfixes (next year)  (kshlm, 12:06:51)
  * ACTION: ndevos to send out a reminder to the maintainers about more
actively enforcing backports of bugfixes (this year)  (kshlm,
12:08:27)

* rastar and msvbhat to consolidate and publish testing matrix on
  gluster.org. amye can help post Jan 1.  (kshlm, 12:09:14)
  * ACTION: rastar and msvbhat to consolidate and publish testing matrix
on gluster.org. amye can help post Jan 1.  (kshlm, 12:11:20)

* kshlm & csim to set up faux/pseudo user email for gerrit, bugzilla,
  github (after csim comes back on 4th Jan)  (kshlm, 12:11:35)
  * ACTION: kshlm & csim to set up faux/pseudo user email for gerrit,
bugzilla, github (after csim comes back on 4th Jan)  (kshlm,
12:12:11)

* hagarth to create 3.6.8 for bugzilla version  (kshlm, 12:13:30)

* kkeithley to send a mail about using sanity checker tools in the
  codebase  (kshlm, 12:15:06)
  * ACTION: kkeithley to send a mail about using sanity checker tools in
the codebase  (kshlm, 12:17:49)

* rtalur/rastar will send a separate email about the Gerrit patch merge
  strategies to the maintainers list  (kshlm, 12:18:12)
  * ACTION: rtalur/rastar will send a separate email about the Gerrit
patch merge strategies to the maintainers list  (kshlm, 12:19:10)

* atinm/hagarth to share details on the MVP plan for Gluster-4.0
  (kshlm, 12:19:27)
  * LINK:
https://www.gluster.org/pipermail/gluster-devel/2015-December/047528.html
(kshlm, 12:21:02)
  * LINK:
http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/13337
(ndevos, 12:21:08)
  * ACTION: kshlm to write up a README for glusterfs-specs.  (kshlm,
12:51:39)

* GlusterFS 3.7  (kshlm, 12:52:46)

* GlusterFS 3.6  (kshlm, 12:53:47)

* GlusterFS-3.5  (kshlm, 12:56:11)

* Gluster 3.8 and 4.0  (kshlm, 12:57:41)

Meeting ended at 13:04:24 UTC.




Action Items

* ndevos to send out a reminder to the maintainers about more actively
  enforcing backports of bugfixes (this year)
* rastar and msvbhat to consolidate and publish testing matrix on
  gluster.org. amye can help post Jan 1.
* kshlm & csim to set up faux/pseudo user email for gerrit, bugzilla,
  github (after csim comes back on 4th Jan)
* kkeithley to send a mail about using sanity checker tools in the
  codebase
* rtalur/rastar will send a separate email about the Gerrit patch merge
  strategies to the maintainers list
* kshlm to write up a README for glusterfs-specs.




Action Items, by person
---
* kshlm
  * kshlm & csim to set up faux/pseudo user email for gerrit, bugzilla,
github (after csim comes back on 4th Jan)
  * kshlm to write up a README for glusterfs-specs.
* ndevos
  * ndevos to send out a reminder to the maintainers about more actively
enforcing backports of bugfixes (this year)
* **UNASSIGNED**
  * rastar and msvbhat to consolidate and publish testing matrix on
gluster.org. amye can help post Jan 1.
  * kkeithley to send a mail about using sanity checker tools in the
codebase
  * rtalur/rastar will send a separate email about the Gerrit patch
merge strategies to the maintainers list




People Present (lines said)
---
* kshlm (103)
* ndevos (35)
* jdarcy (23)
* justinclift (15)
* overclk (9)
* zodbot (3)
* aravindavk (3)
* anoopcs (1)
* obnox (1)
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] compound fop design first cut

2016-01-06 Thread Jeff Darcy
> 1) fops will be compounded per inode, meaning 2 fops on different
> inodes can't be compounded (Not because of the design, Just reducing
> scope of the problem).
>
> 2) Each xlator that wants a compound fop packs the arguments by
> itself.

Packed how?  Are we talking about XDR here, or something else?  How is
dict_t handled?  Will there be generic packing/unpacking code somewhere,
or is each translator expected to do this manually?

> 3) On the server side a de-compounder placed below server xlator
> unpacks the arguments and does the necessary operations.
>
> 4) Arguments for compound fops will be passed as array of union of
> structures where each structure is associated with a fop.
>
> 5) Each xlator will have _compound_fop () which receives the
> fop and does additional processing that is required for itself.

What happens when (not if) some translator fails to provide this?  Is
there a default function?  Is there something at the end of the chain
that will log an error if the fop gets that far without being handled
(as with GF_FOP_IPC)?

> 6) Response will also be an array of union of response structures
> where each structure is associated with a fop's response.

What are the error semantics?  Does processing of a series always stop
at the first error, or are there some errors that allow retry/continue?
If/when processing stops, who's responsible for cleaning up state
changed by those parts that succeeded?  What happens if the connection
dies in the middle?

How are values returned from one operation in a series propagated as
arguments for the next?

What are the implications for buffer and message sizes?  What are the
limits on how large these can get, and/or how many operations can be
compounded?

How is synchronization handled?  Is the inode locked for the duration of
the compound operation, to prevent other operations from changing the
context in which later parts of the compound operation execute?  Are
there possibilities for deadlock here?  Alternatively, if no locking is
done, are we going to document the fact that compound operations are not
atomic/linearizable?
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] compound fop design first cut

2016-01-06 Thread Anuradha Talur
Hi,

After discussions with Pranith and Soumya, here is the design for compound fops:

1) fops will be compounded per inode, meaning 2 fops on different inodes can't 
be compounded (Not because of the design, Just reducing scope of the problem).
2) Each xlator that wants a compound fop packs the arguments by itself.
3) On the server side a de-compounder placed below server xlator unpacks the 
arguments and does the necessary operations.
4) Arguments for compound fops will be passed as array of union of structures 
where each structure is associated with a fop.
5) Each xlator will have _compound_fop () which receives the fop and 
does additional processing that is required for itself.
6) Response will also be an array of union of response structures where each 
structure is associated with a fop's response.

Comments welcome!
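
(For illustration of points 4) and 6) only, a rough sketch of such arrays of
tagged unions is below. All type and field names are hypothetical and standard
C types stand in for the gluster ones; this is not the actual patch.)

#include <sys/types.h>
#include <sys/uio.h>
#include <sys/stat.h>

/* Hypothetical names, for illustration only. */
typedef enum { COMPOUND_WRITEV, COMPOUND_FSYNC, COMPOUND_SETATTR } compound_fop_t;

typedef struct {
        compound_fop_t fop;                     /* which fop this cell carries */
        union {
                struct { off_t offset; struct iovec *vector; int count; int flags; } writev;
                struct { int datasync; } fsync;
                struct { struct stat stbuf; int valid; } setattr;
                /* ... one member per supported fop ... */
        } args;
} compound_arg_t;

typedef struct {
        int op_ret;                             /* per-fop return value */
        int op_errno;
        union {
                struct { struct stat prestat, poststat; } writev;
                struct { struct stat prestat, poststat; } fsync;
                struct { struct stat statpost; } setattr;
        } rsp;
} compound_rsp_t;

/* A compound request is then an array compound_arg_t args[n], filled cell by
 * cell by the interested xlator; the de-compounder below the server xlator
 * walks the array in order and, per the stop-at-first-error semantics, stops
 * at the first cell whose op_ret is negative. */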

- Original Message -
> From: "Milind Changire" 
> To: "Jeff Darcy" 
> Cc: "Gluster Devel" 
> Sent: Friday, December 11, 2015 9:25:38 PM
> Subject: Re: [Gluster-devel] compound fop design first cut
> 
> 
> 
> On Wed, Dec 9, 2015 at 8:02 PM, Jeff Darcy < jda...@redhat.com > wrote:
> 
> 
> 
> 
> 
> On December 9, 2015 at 7:07:06 AM, Ira Cooper ( i...@redhat.com ) wrote:
> > A simple "abort on failure" and let the higher levels clean it up is
> > probably right for the type of compounding I propose. It is what SMB2
> > does. So, if you get an error return value, cancel the rest of the
> > request, and have it return ECOMPOUND as the errno.
> 
> This is exactly the part that worries me. If a compound operation
> fails, some parts of it will often need to be undone. “Let the higher
> levels clean it up” means that rollback code will be scattered among all
> of the translators that use compound operations. Some of them will do
> it right. Others . . . less so. ;) All will have to be tested
> separately. If we centralize dispatch of compound operations into one
> piece of code, we can centralize error detection and recovery likewise.
> That ensures uniformity of implementation, and facilitates focused
> testing (or even formal proof) of that implementation.
> 
> Can we gain the same benefits with a more generic design? Perhaps. It
> would require that the compounding translator know how to reverse each
> type of operation, so that it can do so after an error. That’s
> feasible, though it does mean maintaining a stack of undo actions
> instead of a simple state. It might also mean testing combinations and
> scenarios that will actually never occur in other components’ usage of
> the compounding feature. More likely it means that people will *think*
> they can use the facility in unanticipated ways, until their
> unanticipated usage creates a combination or scenario that was never
> tested and doesn’t work. Those are going to be hard problems to debug.
> I think it’s better to be explicit about which permutations we actually
> expect to work, and have those working earlier.
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 
> 
> 
> Could we have a dry-run phase and a commit phase for the compound operation?
> The dry-run phase could test the validity of the transaction and the
> commit phase can actually perform the operation.
> 
> If any of the operation in the dry-run operation sequence returns error, the
> compound operation can be aborted immediately without the complexity of an
> undo ... scattered or centralized.
> 
> But if the subsequent operations depend on the changed state of the system
> from earlier operations, then we'll have to introduce a system state object
> for such transactions ... and maybe serialize such operations. The system
> state object can be passed through the operation sequence. How well this
> idea would work in a multi-threaded world is not clear to me too.
> 
> 
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel

-- 
Thanks,
Anuradha.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] NetBSD tests not running to completion.

2016-01-06 Thread Ravishankar N
I re-triggered the NetBSD regressions for 
http://review.gluster.org/#/c/13041/3 but they are being run in silent 
mode and are not completing. Can someone from the infra-team take a 
look? The last 22 tests in 
https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/ 
have failed. It is highly unlikely that something is wrong with all those 
patches.


Thanks,
Ravi
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] GFID to Path Conversion

2016-01-06 Thread Aravinda


regards
Aravinda

On 01/06/2016 02:49 AM, Shyam wrote:

On 12/09/2015 12:47 AM, Aravinda wrote:

Hi,

Sharing a draft design for GFID to Path conversion. (Directory GFID to Path is
very easy in DHT v1; this design may not work in the case of DHT 2.0.)


(current thought) DHT2 would extend the manner in which name,pGFID is 
stored for files, for directories. So reverse path walking would 
leverage the same mechanism as explained below.


Of course, as this would involve MDS hopping, the intention would be 
to *not* use this in IO critical paths, and rather use this in the 
tool set that needs reverse path walks to provide information to admins.




Performance and Storage space impact yet to be analyzed.

Storing the required information
---
Metadata information related to Parent GFID and Basename will reside
with the file. PGFID and hash of Basename will become part of the Xattr
Key name and Basename will be saved as the Value.

 Xattr Key = meta.<PGFID>.<hash(Basename)>
 Xattr Value = <Basename>


I would think we should keep the xattr name constant, and specialize 
the value, instead of encoding data in the xattr name itself. The 
issue, of course, is that multiple xattr name:value pairs where the name is 
constant are not feasible, and this needs some thought.
If we use a single xattr for multiple values, then updating one basename 
will have to parse the existing xattr before the update (in the case of hardlinks).

I wrote about other experiments done to update and read xattrs:
http://www.gluster.org/pipermail/gluster-devel/2015-December/047380.html




Non-crypto hash is suitable for this purpose.
Number of Xattrs on a file = Number of Links

Converting GFID to Path
---
Example GFID: 78e8bce0-a8c9-4e67-9ffb-c4c4c7eff038


Here is where we get into a bit of a problem if a file has links. 
Which path to follow would be a dilemma. We could return all paths, 
but tools like glusterfind or backup-related ones would prefer a single 
file. One thought is that we could feed a pGFID:GFID pair as 
input, but this still does not solve a file having links within the same 
pGFID.


Anyway, something to note or consider.



1. List all xattrs of the GFID file in the brick backend
($BRICK_ROOT/.glusterfs/78/e8/78e8bce0-a8c9-4e67-9ffb-c4c4c7eff038).
2. If the Xattr Key starts with “meta”, split it to get the parent GFID and collect the
xattr value.
3. Convert the Parent GFID to a path using recursive readlink until the full path is built.


This is the part which should/would change with DHT2 in my opinion. 
Sort of repeating step (2) here instead of a readlink.



4. Join Converted parent dir path and xattr value(basename)
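
(A rough C sketch of steps 1, 2 and 4 is below, assuming the draft
"meta.<PGFID>.<hash>" key layout above; step 3, the recursive readlink of the
parent's .glusterfs symlink, is left out for brevity.)

#include <stdio.h>
#include <string.h>
#include <limits.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* For each "meta.<PGFID>.<hash>" xattr on the backend GFID file, print the
 * parent GFID and the basename stored as the xattr value. */
int print_parent_links(const char *brick, const char *gfid)
{
        char gfile[PATH_MAX], names[4096], value[NAME_MAX + 1];
        ssize_t listlen, vlen;
        char *key;

        /* step 1: the backend GFID file, e.g. $BRICK_ROOT/.glusterfs/78/e8/<gfid> */
        snprintf(gfile, sizeof(gfile), "%s/.glusterfs/%.2s/%.2s/%s",
                 brick, gfid, gfid + 2, gfid);

        listlen = llistxattr(gfile, names, sizeof(names));
        if (listlen < 0)
                return -1;

        /* step 2: pick out every key that starts with "meta." */
        for (key = names; key < names + listlen; key += strlen(key) + 1) {
                char pgfid[37] = {0};

                if (strncmp(key, "meta.", 5) != 0)
                        continue;

                strncpy(pgfid, key + 5, 36);            /* parent GFID is in the key name */

                vlen = lgetxattr(gfile, key, value, sizeof(value) - 1);
                if (vlen < 0)
                        continue;
                value[vlen] = '\0';                     /* the value is the basename */

                /* step 3 would resolve pgfid to the parent directory path;
                 * step 4 then joins that path with the basename. */
                printf("parent=%s basename=%s\n", pgfid, value);
        }
        return 0;
}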

Recording
-
MKNOD/CREATE/LINK/SYMLINK: Add new Xattr(PGFID, BN)


Most of these operations as they exist today are not atomic, i.e. we 
create the file and then add the xattrs and then possibly hardlink the 
GFID, so by the time the GFID makes its presence, the file is all 
ready and (maybe) hence consistent.


The other way to look at this is that we get the GFID representation 
ready, and then hard link the name into the name tree. Alternatively, we 
could leverage O_TMPFILE to create the file, encode all its inode 
information, and then bring it to life in the namespace. This is 
orthogonal to this design, but brings in the need to be consistent on 
failures.
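
(A sketch of that O_TMPFILE idea, using the linkat() pattern documented in
open(2); the xattr name and paths below are placeholders, not the actual
gluster keys.)

#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/xattr.h>

/* Create an anonymous inode on the brick, prepare its metadata, and only then
 * link it into the namespace. "path" is the final name, e.g. a brick path. */
int create_prepared_file(const char *brickdir, const char *path)
{
        char proc_path[64];

        /* anonymous file, invisible in any directory until linked */
        int fd = open(brickdir, O_TMPFILE | O_WRONLY, 0644);
        if (fd < 0)
                return -1;

        /* "encode all its inode information" while still invisible
         * (placeholder xattr name, illustrative only) */
        if (fsetxattr(fd, "user.example.meta", "value", 5, 0) < 0) {
                close(fd);
                return -1;
        }

        /* bring it to life in the namespace */
        snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", fd);
        if (linkat(AT_FDCWD, proc_path, AT_FDCWD, path, AT_SYMLINK_FOLLOW) < 0) {
                close(fd);
                return -1;
        }

        close(fd);
        return 0;
}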


Either way, if a failure occurs midway, we have no way to recover the 
information for the inode and set it right. Thoughts?



RENAME: Remove old xattr(PGFID1, BN1), Add new xattr(PGFID2, BN2)
UNLINK: If Link count > 1 then Remove xattr(PGFID, BN)

Heal on Lookup
--
Healing on lookup can be enabled if required; by default we can
disable this option since it may have performance implications
during reads.

Enabling the logging
-
This can be enabled using Volume set option. Option name TBD.

Rebuild Index
-
An offline activity that crawls the backend filesystem and builds all the
required xattrs.


Frequency of the rebuild? I would assume this would be run when the 
option is enabled, and later almost never, unless we want to recover 
from some inconsistency in the data (how to detect the same would be 
an open question).


Also I think once this option is enabled, we should prevent disabling 
the same (or at least till the packages are downgraded), as this would 
be a hinge that multiple other features may depend on, and so we 
consider this an on-disk change that is made once, and later 
maintained for the volume, rather than turn on/off.


Which means the initial index rebuild would be a volume version 
conversion from the current to this representation and may need additional 
thoughts on how we maintain volume versions.




Comments and Suggestions Welcome.

regards
Aravinda

On 11/25/2015 10:08 AM, Aravinda wrote:


regards
Aravinda

On 11/24/2015 11:25 PM, Shyam wrote:

There seem to be other interested consumers in gluster for the same
information, and I guess we need a god base design to address this on
disk change, so that it can be leveraged in the various u

[Gluster-devel] Wrong usage of dict functions

2016-01-06 Thread Pranith Kumar Karampuri

hi,
   It seems like having two ways to create a dictionary is causing problems. 
There are quite a few dict_new()/dict_destroy() or 
get_new_dict()/dict_unref() pairings in the code base. So I stopped exposing the 
functions without ref/unref, i.e. get_new_dict()/dict_destroy(), as part 
of http://review.gluster.org/13183
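
(A rough sketch of the ref-counted pairing that remains after the change; it
assumes libglusterfs' dict.h from the source tree, and error handling is
trimmed.)

#include "dict.h"

/* Sketch only: create with dict_new(), drop with dict_unref();
 * get_new_dict()/dict_destroy() are no longer exposed. */
static int example_dict_usage(void)
{
        int ret = -1;
        dict_t *dict = dict_new();      /* returned with one reference held */

        if (!dict)
                goto out;

        ret = dict_set_str(dict, "key", "value");
        if (ret)
                goto out;

        /* anyone who stores the dict beyond this call takes dict_ref(dict) */
        ret = 0;
out:
        if (dict)
                dict_unref(dict);       /* drop our reference; freed at zero */
        return ret;
}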


Files changed as part of the patch:
 api/src/glfs-mgmt.c|  2 +-
 api/src/glfs.c |  2 +-
 cli/src/cli-cmd-parser.c   | 42 +-
 cli/src/cli-cmd-system.c   |  6 +++---
 cli/src/cli-cmd-volume.c   |  2 +-
 cli/src/cli-rpc-ops.c  |  4 ++--
 cli/src/cli.c  |  2 +-
 glusterfsd/src/glusterfsd.c|  2 +-
 libglusterfs/src/dict.h|  5 -
 libglusterfs/src/graph.c   |  2 +-
 libglusterfs/src/graph.y   |  2 +-
 xlators/cluster/afr/src/afr-self-heal-common.c |  6 +++---
 xlators/cluster/afr/src/afr-self-heal-name.c   |  2 +-
 xlators/cluster/dht/src/dht-selfheal.c | 15 +++
 xlators/cluster/dht/src/dht-shared.c   |  2 +-
 xlators/mgmt/glusterd/src/glusterd-geo-rep.c   |  4 ++--
 xlators/mgmt/glusterd/src/glusterd-op-sm.c |  4 ++--
 xlators/mgmt/glusterd/src/glusterd-volgen.c| 12 ++--
 xlators/mount/fuse/src/fuse-bridge.c   |  3 +--
 xlators/mount/fuse/src/fuse-bridge.h   |  2 --
 20 files changed, 56 insertions(+), 65 deletions(-)


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] quota.t hangs on NetBSD machines

2016-01-06 Thread Emmanuel Dreyfus
On Wed, Jan 06, 2016 at 12:18:25AM -0500, Raghavendra Gowdappa wrote:
> No, it's not a portability issue. It should have occurred with
> similar probability on Linux machines too. Does Linux issue frequent
> fsyncs compared to NetBSD?

The kernels' behavior is certainly different here.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client

2016-01-06 Thread Oleksandr Natalenko
OK, here is valgrind log of patched Ganesha (I took recent version of 
your patchset, 8685abfc6d) with Entries_HWMARK set to 500.


https://gist.github.com/5397c152a259b9600af0

See no huge runtime leaks now. However, I've repeated this test with 
another volume in replica and got the following Ganesha error:


===
ganesha.nfsd: inode.c:716: __inode_forget: Assertion `inode->nlookup >= 
nlookup' failed.

===

On 06.01.2016 08:40, Soumya Koduri wrote:

On 01/06/2016 03:53 AM, Oleksandr Natalenko wrote:
OK, I've repeated the same traversing test with patched GlusterFS API, 
and

here is new Valgrind log:

https://gist.github.com/17ecb16a11c9aed957f5


Fuse mount doesn't use gfapi helper. Does your above GlusterFS API
application call glfs_fini() during exit? glfs_fini() is responsible
for freeing the memory consumed by gfAPI applications.

Could you repeat the test with nfs-ganesha (which for sure calls
glfs_fini() and purges inodes if exceeds its inode cache limit) if
possible.

Thanks,
Soumya


Still leaks.

On Tuesday, 5 January 2016 22:52:25 EET Soumya Koduri wrote:

On 01/05/2016 05:56 PM, Oleksandr Natalenko wrote:

Unfortunately, both patches didn't make any difference for me.

I've patched 3.7.6 with both patches, recompiled and installed the patched
GlusterFS package on the client side and mounted a volume with ~2M files.
Then I performed the usual tree traversal with a simple "find".

Memory RES value went from ~130M at the moment of mounting to ~1.5G
after traversing the volume for ~40 mins. Valgrind log still shows 
lots

of leaks. Here it is:

https://gist.github.com/56906ca6e657c4ffa4a1


Looks like you had done fuse mount. The patches which I have pasted
below apply to gfapi/nfs-ganesha applications.

Also, to resolve the nfs-ganesha issue which I had mentioned below  
(in
case if Entries_HWMARK option gets changed), I have posted below fix 
-

https://review.gerrithub.io/#/c/258687

Thanks,
Soumya


Ideas?

On 05.01.2016 12:31, Soumya Koduri wrote:
I tried to debug the inode*-related leaks and saw some improvements
after applying the below patches when I ran the same test (but with a
smaller load). Could you please apply those patches & confirm the
same?

a) http://review.gluster.org/13125

This will fix the inodes & their ctx related leaks during unexport 
and
the program exit. Please check the valgrind output after applying 
the

patch. It should not list any inodes related memory as lost.

b) http://review.gluster.org/13096

The reason the change in Entries_HWMARK (in your earlier mail) didn't
have much effect is that the inode_nlookup count doesn't become 
zero
for those handles/inodes being closed by ganesha. Hence those 
inodes

shall get added to inode lru list instead of purge list which shall
get forcefully purged only when the number of gfapi inode table
entries reaches its limit (which is 137012).

This patch fixes those 'nlookup' counts. Please apply this patch and
reduce 'Entries_HWMARK' to a much lower value and check if it decreases
the memory consumed by the ganesha process while it is active.

CACHEINODE {

 Entries_HWMark = 500;

}


Note: I see an issue with nfs-ganesha during exit when the option
'Entries_HWMARK' gets changed. This is not related to any of the 
above

patches (or rather Gluster) and I am currently debugging it.

Thanks,
Soumya

On 12/25/2015 11:34 PM, Oleksandr Natalenko wrote:

1. test with Cache_Size = 256 and Entries_HWMark = 4096

Before find . -type f:

root  3120  0.6 11.0 879120 208408 ?   Ssl  17:39   0:00
/usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf 
-N

NIV_EVENT

After:

root  3120 11.4 24.3 1170076 458168 ?  Ssl  17:39  13:39
/usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf 
-N

NIV_EVENT

~250M leak.

2. test with default values (after ganesha restart)

Before:

root 24937  1.3 10.4 875016 197808 ?   Ssl  19:39   0:00
/usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf 
-N

NIV_EVENT

After:

root 24937  3.5 18.9 1022544 356340 ?  Ssl  19:39   0:40
/usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf 
-N

NIV_EVENT

~159M leak.

No reasonable correlation detected. The second test finished much
faster than the first (I guess server-side GlusterFS cache or server
kernel page cache is the cause).

There are ~1.8M files on this test volume.

On Friday, 25 December 2015 20:28:13 EET Soumya Koduri wrote:

On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote:
Another addition: it seems to be a GlusterFS API library memory leak
because NFS-Ganesha also consumes a huge amount of memory while doing
an ordinary "find . -type f" via NFSv4.2 on a remote client. Here is the
memory usage:

===
root  5416 34.2 78.5 2047176 1480552 ? Ssl  12:02 117:54
/usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f
/etc/ganesha/ganesha.conf -N NIV_EVENT
===

1.4G is too much for simple stat() :(.

Ideas?


nfs-ganesha also has cache layer

[Gluster-devel] Selective Read-only in glusterfs-specs

2016-01-06 Thread Saravanakumar Arumugam

Hi,

I have submitted a design document for Selective read-only - enforcing 
selective read-only for specific clients.

http://review.gluster.org/#/c/13180

Please review.

Thanks,
Saravanakumar

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel