[Nfs-ganesha-devel] reg. FSAL_ACE_PERM_WRITE_DATA check in fsal_check_setattr_perms

2017-02-13 Thread Satya Prakash GS
Hi,

Ganesha seems to check for the FSAL_ACE_PERM_WRITE_DATA permission
when changing the owner/group of a file (in the function
fsal_check_setattr_perms). In our filesystem, there is another user
who is equivalent to the root user; this user should be able to change
the owner/group of any file, just like root. Can somebody please
explain the rationale behind this check, and how our requirement of
having an additional superuser can be met?

Thanks,
Satya.

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] reg. FSAL_ACE_PERM_WRITE_DATA check in fsal_check_setattr_perms

2017-02-14 Thread Satya Prakash GS
I was referring to this check --->

if (access_check != FSAL_ACE4_MASK_SET(FSAL_ACE_PERM_WRITE_DATA)) {
  status = CACHE_INODE_FSAL_EPERM;
  note = "(no ACL to check)";
  goto out;
}

which is done if the user is not the owner of the file.

As per the code, a user can do chown only if he is the owner or if
there is an ACL on the file. Could Ganesha instead just pass the
credentials (uid, gid) on to the server and let it decide whether
chown is allowed on that file for a particular user (irrespective of
any ACLs set on that file)? That way, the server could treat certain
users specially and grant them access.

> > Looking at the code, we don't check WRITE_DATA for owner checks, only for
> > size or time changes.  For owner/group changes, we check
> > FSAL_ACE_PERM_WRITE_OWNER, which is the correct ACL to check.
> >
> > Presumably, you could just add an ACL to all files allowing all access
> > to your "root" user.  This should allow access, correct?

> This would be a solution.

I am trying to see if we can avoid any on-disk changes. Since NFS is
only one of the ways to access the filesystem, it would be better if
we could avoid handling it differently.

> On 02/13/2017 09:31 AM, Satya Prakash GS wrote:
> > Hi,
> >
> > Ganesha seems to be checking for FSAL_ACE_PERM_WRITE_DATA
> permission
> > to change owner/group perms (in the function
> > fsal_check_setattr_perms). In our filesystem, there is another user
> > who is equivalent to the root user. This user should be able to change
> > owner/group of any file like the root user. Can somebody please
> > explain the rationale behind this check and how our requirement of
> > having another super user can be achieved.

> If you need a true additional super-user, Ganesha would really need to have
> code added to be able to configure such, and work to allow super-user
> privileges everywhere appropriate.

> What FSAL and what filesystem are you using?

We have our own filesystem and FSAL.

> Frank



Re: [Nfs-ganesha-devel] reg. FSAL_ACE_PERM_WRITE_DATA check in fsal_check_setattr_perms

2017-02-14 Thread Satya Prakash GS
>On 02/14/2017 06:48 AM, Satya Prakash GS wrote:
>> I was referring to this check --->
>>
>> if (access_check != FSAL_ACE4_MASK_SET(FSAL_ACE_PERM_WRITE_DATA)) {
>>   status = CACHE_INODE_FSAL_EPERM;
>>   note = "(no ACL to check)";
>>   goto out;
>> }

> Sorry, I assumed an ACL existed on the file.  What this check is saying
> is that, if there's no ACL, the finest granularity check we can do is
> unix permission bits, which is just Read Write Execute (and Write is the
> only relevant one here), so only continue if we're looking for Write access.

Can Ganesha skip this check and always call test_access with the
constructed access_mask? As far as I can see, nothing should break
because of this.

>> which is done if the user is not owner of the file.
>>
>> As per the code,  user can do chown if he is owner or if there is an
>> acl on the file. Can Ganesha just pass the credentials (uid, gid) on
>> to the server for it to decide if chown is allowed on that file by a
>> particular user (irrespective of acls set on that file). That way,
>> certain users can be treated specially by the server and grant them
>> access.
>>
>>>> Looking at the code, we don't check WRITE_DATA for owner checks, only for
>>>> size or time changes.  For owner/group changes, we check
>>>> FSAL_ACE_PERM_WRITE_OWNER, which is the correct ACL to check.
>>>>
>>>> Presumably, you could just add an ACL to all files allowing all
>>>> access to your "root" user.  This should allow access, correct?
>>
>>> This would be a solution.
>>
>> I am trying to see if we can avoid any on-disk changes. Since NFS is
>> one of the ways to access filesystem it would be better if we can
>> avoid handling it differently.

> You don't have to do this in the filesystem; you can have the getattrs()
> in your FSAL just always add an ACL to the beginning that allows all
> access to your superuser.

That could mean interpreting the existing ACL and building a new one
whenever a file already has an ACL.



Re: [Nfs-ganesha-devel] reg. FSAL_ACE_PERM_WRITE_DATA check in fsal_check_setattr_perms

2017-02-15 Thread Satya Prakash GS
Frank,

I have subscribed to the list. Apologies for any inconvenience caused.

> This wouldn't actually help since the call to test_access just winds up in
> fsal_test_access which isn't going to know about your special super user.
> All Ganesha is doing here is not making a test_access call that turns into
> an fsal_test_access call that would always fail the permission check - or
> actually, I think it might always pass the permission check for files that
> don't have an NFS v4 ACL... We would have to change the test_access API to
> add permissions to check for in mode tests that are outside the mode
> permission checking...

> The alternative as a general mechanism is to increase the number of calls to
> the underlying filesystem Ganesha makes which is likely to have a negative
> impact on other FSAL's performance.

Our filesystem doesn't support NFSv4 ACLs. Sorry to prolong this
further, but just to be clear: we have implemented our own test_access
call, and in that implementation we have a way to determine whether
the user is a superuser. I agree that removing the check would result
in a lot of fsal_test_access calls.

On Wed, Feb 15, 2017 at 3:57 AM, Frank Filz  wrote:
> One thing,
>
> I suggest you subscribe to the nfs-ganesha-devel mailing list. I have made
> it so your responses should go through without being a member, but you risk
> missing a response if someone doesn't reply-all (or worse, we risk missing a
> response that is just sent to you).

Re: [Nfs-ganesha-devel] reg. FSAL_ACE_PERM_WRITE_DATA check in fsal_check_setattr_perms

2017-02-16 Thread Satya Prakash GS
Daniel/Frank,

We are using v2.3-stable at the moment. I have yet to go through the
stackable FSALs to understand your previous comments. We also want to
make sure we aren't affected when we upgrade to v2.4 or later.

I like the is_super_user logic that Frank has proposed; can we have
that in Ganesha? I can own the task and publish the change.

Thanks,
Satya.


On Thu, Feb 16, 2017 at 12:50 AM, Frank Filz  wrote:
>
>> Frank,
>>
>> I have subscribed to the list. Apologies for any inconvenience caused.
>>
>> > This wouldn't actually help since the call to test_access just winds
>> > up in fsal_test_access which isn't going to know about your special super
>> user.
>> > All Ganesha is doing here is not making a test_access call that turns
>> > into an fsal_test_access call that would always fail the permission
>> > check - or actually, I think it might always pass the permission check
>> > for files that don't have an NFS v4 ACL... We would have to change the
>> > test_access API to add permissions to check for in mode tests that are
>> > outside the mode permission checking...
>>
>> > The alternative as a general mechanism is to increase the number of
>> > calls to the underlying filesystem Ganesha makes which is likely to
>> > have a negative impact on other FSAL's performance.
>>
>> Our filesystem doesn't support NFSv4 ACLs. Sorry to prolong this further but
>> just to be clear, we have implemented our own test_access call. In our
>> test_access implementation we have a way to figure out if the user is super
>> user or not. Agree that removing the check would result in a lot of
>> fsal_test_access calls.
>
> What version of Ganesha do you use? 2.4 and later will not ever call your 
> FSAL's test_access because FSAL_MDCACHE always calls fsal_test_access and 
> never calls the underlying FSAL's own test_access.
>
> Maybe what we need is a way for places that are checking for super user to
> call an FSAL is_super_user(creds) method, which of course would default to
> returning true only for uid == 0.
>
> Your implementation of course could do whatever you need to do.
>
> Then we just have to get out of the habit of checking for uid == 0 and 
> instead invoke is_super_user(creds)...
>
> Frank
>



[Nfs-ganesha-devel] Permission denied error with Kerberos enabled

2017-03-01 Thread Satya Prakash GS
Hi,

I am seeing "Permission denied" errors while running iozone on an NFS
client with Kerberos enabled. Digging further, I found a lot of
AUTH_REJECTEDCRED messages in the NFS server log. The NFS client
tolerates two such errors from the server and tries to refresh the
credentials; on the third failure it returns an error to the
application.

http://lxr.free-electrons.com/source/net/sunrpc/clnt.c#L2343

2395 switch ((n = ntohl(*p++))) {
2396 case RPC_AUTH_REJECTEDCRED:
2397 case RPC_AUTH_REJECTEDVERF:
2398 case RPCSEC_GSS_CREDPROBLEM:
2399 case RPCSEC_GSS_CTXPROBLEM:
2400 if (!task->tk_cred_retry)
2401 break;
2402 task->tk_cred_retry--;
2403 dprintk("RPC: %5u %s: retry stale creds\n",
2404 task->tk_pid, __func__);
2405 rpcauth_invalcred(task);


On the client I have seen this message twice :

Feb 26 10:27:01 atsqa6c71 kernel: RPC: 39431 call_status (status 20)
Feb 26 10:27:01 atsqa6c71 kernel: RPC: 39431 call_decode (status 20)
Feb 26 10:27:01 atsqa6c71 kernel: RPC: 39431 rpc_verify_header: retry
stale creds
Feb 26 10:27:01 atsqa6c71 kernel: RPC: 39431 invalidating RPCSEC_GSS
cred 880544ce4600
Feb 26 10:27:01 atsqa6c71 kernel: RPC: 39431 release request 8804062e7000
Feb 26 10:27:01 atsqa6c71 kernel: RPC: 39431 call_reserve (status 0)
Feb 26 10:27:01 atsqa6c71 kernel: RPC: 39431 failed to lock transport
8808723c5800
Feb 26 10:27:01 atsqa6c71 kernel: RPC: 39431 sleep_on(queue
"xprt_sending" time 25264836677)
Feb 26 10:27:01 atsqa6c71 kernel: RPC: 39431 added to queue
8808723c5990 "xprt_sending"
Feb 26 10:27:01 atsqa6c71 kernel: RPC: 39431 __rpc_wake_up_task (now
25264836722)
Feb 26 10:27:01 atsqa6c71 kernel: RPC: 39431 removed from queue
8808723c5990 "xprt_sending"
Feb 26 10:27:01 atsqa6c71 kernel: RPC: 39431 __rpc_execute flags=0x801
Feb 26 10:27:01 atsqa6c71 kernel: RPC: 39431 call_reserveresult (status -11)
Feb 26 10:27:01 atsqa6c71 kernel: RPC: 39431 call_retry_reserve (status 0)
Feb 26 10:27:01 atsqa6c71 kernel: RPC: 39431 reserved req
8806c2e01a00 xid 929383d1
Feb 26 10:27:01 atsqa6c71 kernel: RPC: 39431 call_reserveresult (status 0)
Feb 26 10:27:01 atsqa6c71 kernel: RPC: 39431 call_refresh (status 0)
Feb 26 10:27:01 atsqa6c71 kernel: RPC: 39431 refreshing RPCSEC_GSS
cred 88086f634240

On the third occurrence the filesystem OP failed :

Feb 26 10:28:25 atsqa6c71 kernel: RPC: 39431 __rpc_execute flags=0x801
Feb 26 10:28:25 atsqa6c71 kernel: RPC: 39431 call_status (status 20)
Feb 26 10:28:25 atsqa6c71 kernel: RPC: 39431 call_decode (status 20)
Feb 26 10:28:25 atsqa6c71 kernel: RPC: 39431 rpc_verify_header: call rejected 2
Feb 26 10:28:25 atsqa6c71 kernel: RPC: 39431 rpc_verify_header: call
failed with error -13
Feb 26 10:28:25 atsqa6c71 kernel: RPC: 39431 return 0, status -13

Say the ticket has expired (within the renewable lifetime) and the
server did not find it in the cache the first time; the second and
third calls still shouldn't fail when the credentials were just
refreshed through an upcall. Either unavailability of the creds in the
cache or a failing svcauth_gss_accept_sec_context call could produce
the REJECTEDCRED error. Could you share some pointers on which is more
likely, or whether something else could cause this issue?

Thanks,
Satya.



Re: [Nfs-ganesha-devel] Permission denied error with Kerberos enabled

2017-03-06 Thread Satya Prakash GS
With libntirpc debug logging enabled, I could see that all three
retries fail because the creds are missing from the cache. The
credentials are being removed by the reaper in authgss_ctx_gc_idle
because of this condition:

abs(axp->gen - gd->gen) > __svc_params->gss.max_idle_gen

From the code, I can see that only a further RPCSEC_GSS_INIT call from
the client can repopulate the credentials in the cache. I am not sure
how the server can tell the client to establish the context again.

Any help is appreciated.

Thanks,
Satya.



Re: [Nfs-ganesha-devel] Permission denied error with Kerberos enabled

2017-03-07 Thread Satya Prakash GS
>On 3/7/17 4:56 AM, William Allen Simpson wrote:
> On 3/6/17 6:58 PM, Matt Benjamin wrote:
>> Looking briefly at section 5.3.3.3 of rfc2203, it seems like that would be 
>> correct.  If the client has just refreshed its credentials, why is it 
>> continuing to send with the expired context?
>>

Thank you for the reply.

The client may not be sending expired credentials, but it is supposed
to reestablish them using RPC_GSS_PROC_DESTROY/RPC_GSS_PROC_INIT,
which I guess is not happening. I am continuing to debug this.

As per the RFC, Ganesha should be returning AUTH_REJECTEDCRED rather
than RPCSEC_GSS_CREDPROBLEM when it doesn't find credentials in the
cache.

However, the NFS client handles AUTH_REJECTEDCRED and
RPCSEC_GSS_CREDPROBLEM similarly, so I am not hopeful about this
change, but I can give it a try.

> I don't know, but I'll take a look.  Now that we always have a server
> for a client, perhaps the cache can be moved into a shared structure?

>Sorry, thought that was our ntirpc client.  Looking back, that's the
>kernel client.  Not much we can do about the kernel client other than
>report a bug.

William,
I want to be sure it's a client bug and not a Ganesha bug before
raising it on the kernel mailing list. Given that the issue reproduces
two or three times a week, I wonder how it could have gone unreported
so far.

Regards,
Satya.




Re: [Nfs-ganesha-devel] Permission denied error with Kerberos enabled

2017-03-09 Thread Satya Prakash GS
It looks like the gen field in svc_rpc_gss_data is used to check the
freshness of a context, but it is not initialized to axp->gen in
authgss_ctx_hash_set.
Won't this result in entries being evicted early, or am I missing
something?

Thanks,
Satya.


On Tue, Mar 7, 2017 at 4:36 PM, Satya Prakash GS
 wrote:
>>On 3/7/17 4:56 AM, William Allen Simpson wrote:
>> On 3/6/17 6:58 PM, Matt Benjamin wrote:
>>> Looking briefly at section 5.3.3.3 of rfc2203, it seems like that would be 
>>> correct.  If the client has just refreshed its credentials, why is it 
>>> continuing to send with the expired context?
>>>
>
> Thank you for the reply.
>
> The client may not be sending expired credentials but it is supposed
> to reestablish the credentials using
> RPC_GSS_PROC_DESTROY/RPC_GSS_PROC_INIT which I guess is not happening.
> I am continuing to debug this further.
>
> As per the RFC, Ganesha is supposed to be throwing AUTH_REJECTEDCRED
> instead of RPCSEC_GSS_CREDPROBLEM when it doesn't find credentials in
> the cache.
>
> However the nfs client handles AUTH_REJECTEDCRED,
> RPCSEC_GSS_CREDPROBLEM similarly. I am not hopeful of this change but
> I can give it a try.
>
>> I don't know, but I'll take a look.  Now that we always have a server
>> for a client, perhaps the cache can be moved into a shared structure?
>
>>Sorry, thought that was our ntirpc client.  Looking back, that's the
>>kernel client.  Not much we can do about the kernel client other than
>>report a bug.
>
> William,
> I want to be sure it's a client bug and not Ganesha bug before putting
> it on the kernel mailing list. Given that the issue is reproducible
> twice/thrice a week I am wondering how it would have gone unreported
> so far.
>
> Regards,
> Satya.
>
>
>
> On Tue, Mar 7, 2017 at 5:28 AM, Matt Benjamin  wrote:
>> Hi Satya,
>>
>> Looking briefly at section 5.3.3.3 of rfc2203, it seems like that would be 
>> correct.  If the client has just refreshed its credentials, why is it 
>> continuing to send with the expired context?
>>
>> Matt
>>
>> - Original Message -
>>> From: "Satya Prakash GS" 
>>> To: nfs-ganesha-devel@lists.sourceforge.net
>>> Sent: Monday, March 6, 2017 1:10:36 PM
>>> Subject: Re: [Nfs-ganesha-devel] Permission denied error with Kerberos  
>>>   enabled
>>>
>>> With libntirpc debugs enabled I could see all the three retries are
>>> failing because of the unavailability of the creds in the cache. The
>>> credentials are being removed by the reaper in the authgss_ctx_gc_idle
>>> because of this condition -
>>> abs(axp->gen - gd->gen) > __svc_params->gss.max_idle_gen
>>>
>>> >From the code, I can see that only a further RPCSEC_GSS_INIT call from
>>> the client can repopulate the credentials in the cache. I am not sure
>>> how server can dictate client to establish the context again.
>>>
>>> Any help is appreciated.
>>>
>>> Thanks,
>>> Satya.
>>>
>>> On Wed, Mar 1, 2017 at 7:35 PM, Satya Prakash GS
>>>  wrote:
>>> > Hi,
>>> >
>>> > I am seeing "Permission denied" errors while running iozone on nfs
>>> > client with kerberos enabled. Digging further, I found there are a lot
>>> > of AUTH_REJECTEDCRED messages in nfs server log. NFS client tolerates
>>> > 2 errors from server and tries to refresh the credentials. On the
>>> > third call it would throw an error to the application.
>>> >
>>> > http://lxr.free-electrons.com/source/net/sunrpc/clnt.c#L2343
>>> >
>>> > 2395 switch ((n = ntohl(*p++))) {
>>> > 2396 case RPC_AUTH_REJECTEDCRED:
>>> > 2397 case RPC_AUTH_REJECTEDVERF:
>>> > 2398 case RPCSEC_GSS_CREDPROBLEM:
>>> > 2399 case RPCSEC_GSS_CTXPROBLEM:
>>> > 2400 if (!task->tk_cred_retry)
>>> > 2401 break;
>>> > 2402 task->tk_cred_retry--;
>>> > 2403 dprintk("RPC: %5u %s: retry stale creds\n",
>>> > 2404 task->tk_pid, __func__);
>>> > 2405 rpcauth_invalcred(task);
>>> >
>>> >
>>> > On the client I have seen this message twice :
>>> >
>>> > Feb 26 10:27:01 atsqa6c71 ker

Re: [Nfs-ganesha-devel] Permission denied error with Kerberos enabled

2017-03-10 Thread Satya Prakash GS
Is this a possibility?

The server first rejects a client op with CREDPROBLEM/REJECTEDCRED.
The client does an upcall, and gssd initializes a new context with the
server. However, the server recycles it immediately, before the
operation is retried (there appears to be a bug in the LRU
implementation in Ganesha; to make things worse, enabling server debug
logging slowed the client operations down, making eviction of the
entry easier). This happens three times, failing the client op.

Thanks,
Satya.



Re: [Nfs-ganesha-devel] Permission denied error with Kerberos enabled

2017-03-10 Thread Satya Prakash GS
On Sat, Mar 11, 2017 at 12:37 AM, William Allen Simpson
 wrote:
> I'm not familiar with this code, so not likely to be much help.
> Looks mostly written by Matt, but Malahal made the most recent
> changes in July 2016.
>
> On 3/10/17 9:35 AM, Satya Prakash GS wrote:
>>
>> Is this a possibility :
>>
>> Server first rejects a client op with CREDPROBLEM/REJECTEDCRED,
>> Client does an upcall and gssd initializes the context with the server.
>> However the server recycles it immediately before the operation was
>> retried (looks like there is a bug in the LRU implementation on
>> Ganesha. To make things worse, I enabled the server debugs and it
>> slowed down the client operations making the eviction of the entry
>> easier). This happens thrice failing the client op.
>>
> Problem is not obvious.
>
> axp->gen is initialized to zero with the rest of *axp -- mem_zalloc().
>
> gd->gen is initialized to zero by alloc_svc_rpc_gss_data().
>
> axp->gen is bumped by one (++) each time it is handled by LRU code in
> authgss_ctx_hash_get().
>

If a node's gen isn't getting incremented, it means that node is not
being looked up often.

> atomic_inc_uint32_t(&gd->gen) is immediately after that.
>
> You think gd->gen also needs to be set to axp->gen in _set()?
>

> I'm not sure they are related.  There are many gd per axp, so
> axp->gen could be much higher than gd->gen.
>

>From authgss_ctx_gc_idle ->

if (abs(axp->gen - gd->gen) > __svc_params->gss.max_idle_gen) {
Remove the entry from the tree; //gd is no more in the cache after this
}

Translates to - gd wasn't looked up in quite sometime let's clean it up.

//gss.max_idle_gen -> by default set to 1024

If the tree's gen is 5000 and a new node gets inserted into the tree,
the node's gen shouldn't start at 0, or it will pass the above
condition in the next authgss_ctx_gc_idle call and be evicted immediately.

> Both _get and _set are only called in svc_auth_gss.c _svcauth_gss().
>
> Admittedly, it is hard to track that there are 2 fields both called gen.
>
>> Thanks,
>> Satya.
>>
>> On Thu, Mar 9, 2017 at 8:07 PM, Satya Prakash GS
>>  wrote:
>>>
>>> Looks like the gen field in svc_rpc_gss_data is used to check the
>>> freshness of a context. However it is not initialized to axp->gen in
>>> authgss_ctx_hash_set.
>>> Will this not result in evicting the entries out early or am I missing
>>> something ?
>>>
>>> Thanks,
>>> Satya.
>>>
>>
>>
>>
>

Thanks,
Satya.



Re: [Nfs-ganesha-devel] Permission denied error with Kerberos enabled

2017-03-11 Thread Satya Prakash GS
We are using 2.3-stable. Given that most of our testing has been done
on it, it's a bit difficult for us to move to 2.5 now, but we can take
fixes from 2.5.

I applied a similar fix (to the existing one in 2.5), but I am running
into another issue around ticket renewal time. Operations are failing
on the client with an auth check failure, error -13 (Permission
denied). I see this after every ticket renewal:

27510445:442377371:Mar 10 08:37:21 atsqa8c43 kernel: RPC: 60114 gss_validate
27510447:442377373:Mar 10 08:37:21 atsqa8c43 kernel: RPC: 60114
gss_validate: gss_verify_mic returned error 0x000c
27510448:442377374:Mar 10 08:37:21 atsqa8c43 kernel: RPC: 60114
gss_validate failed ret -13.
27510449:442377375:Mar 10 08:37:21 atsqa8c43 kernel: RPC: 60114
rpc_verify_header: auth check failed with -13
27510450:442377376:Mar 10 08:37:21 atsqa8c43 kernel: RPC: 60114
rpc_verify_header: retrying

gss_verify_mic failed on the client with the error code set to GSS_S_CONTEXT_EXPIRED.
Either the server or the client is using the wrong context while
wrapping/unwrapping.

Do you remember fixing a bug like that?

Thanks,
Satya.


On Sat, Mar 11, 2017 at 5:55 PM, Malahal Naineni  wrote:
> gd->gen is not used in the latest code. If I remember, there was a bug
> removing recent cached entries resulting in permission errors. What
> version are you using? Try using V2.5.
>
> Regards, Malahal.
>
> On Sat, Mar 11, 2017 at 12:54 AM, Satya Prakash GS
>  wrote:
>> On Sat, Mar 11, 2017 at 12:37 AM, William Allen Simpson
>>  wrote:
>>> I'm not familiar with this code, so not likely to be much help.
>>> Looks mostly written by Matt, but Malahal made the most recent
>>> changes in July 2016.
>>>
>>> On 3/10/17 9:35 AM, Satya Prakash GS wrote:
>>>>
>>>> Is this a possibility :
>>>>
>>>> Server first rejects a client op with CREDPROBLEM/REJECTEDCRED,
>>>> Client does an upcall and gssd initializes the context with the server.
>>>> However the server recycles it immediately before the operation was
>>>> retried (looks like there is a bug in the LRU implementation on
>>>> Ganesha. To make things worse, I enabled the server debugs and it
>>>> slowed down the client operations making the eviction of the entry
>>>> easier). This happens thrice failing the client op.
>>>>
>>> Problem is not obvious.
>>>
>>> axp->gen is initialized to zero with the rest of *axp -- mem_zalloc().
>>>
>>> gd->gen is initialized to zero by alloc_svc_rpc_gss_data().
>>>
>>> axp->gen is bumped by one (++) each time it is handled by LRU code in
>>> authgss_ctx_hash_get().
>>>
>>
>> If a node gen isn't getting incremented it means that node is not
>> being looked up often.
>>
>>> atomic_inc_uint32_t(&gd->gen) is immediately after that.
>>>
>>> You think gd->gen also needs to be set to axp->gen in _set()?
>>>
>>
>>> I'm not sure they are related.  There are many gd per axp, so
>>> axp->gen could be much higher than gd->gen.
>>>
>>
>> >From authgss_ctx_gc_idle ->
>>
>> if (abs(axp->gen - gd->gen) > __svc_params->gss.max_idle_gen) {
>> Remove the entry from the tree; //gd is no more in the cache after this
>> }
>>
>> Translates to - gd wasn't looked up in quite sometime let's clean it up.
>>
>> //gss.max_idle_gen -> by default set to 1024
>>
>> If tree's gen is 5000 and a new node gets inserted into the tree, node
>> gen shouldn't start at 0 or it might pass the above condition in the
>> next authgss_ctx_gc_idle call.
>>
>>> Both _get and _set are only called in svc_auth_gss.c _svcauth_gss().
>>>
>>> Admittedly, it is hard to track that there are 2 fields both called gen.
>>>
>>>> Thanks,
>>>> Satya.
>>>>
>>>> On Thu, Mar 9, 2017 at 8:07 PM, Satya Prakash GS
>>>>  wrote:
>>>>>
>>>>> Looks like the gen field in svc_rpc_gss_data is used to check the
>>>>> freshness of a context. However it is not initialized to axp->gen in
>>>>> authgss_ctx_hash_set.
>>>>> Will this not result in evicting the entries out early or am I missing
>>>>> something ?
>>>>>
>>>>> Thanks,
>>>>> Satya.
>>>>>
>>>>
>>>>

Re: [Nfs-ganesha-devel] Permission denied error with Kerberos enabled

2017-03-13 Thread Satya Prakash GS
My bad, I should have mentioned the version in the original post.

Malahal was kind enough to share a list of relevant commits. Even with
the patches, I continued to see the issue. I suspect the client code is
not handling GSS_S_CONTEXT_EXPIRED correctly on a call to
gss_verify_mic. Instead, I fixed the server code to time out the ticket
5 minutes before the actual timeout (Ganesha already times the ticket
out 5 seconds early).
So far the issue hasn't reproduced, but I will continue running the
test for a day or two before confirming the fix works. Do you see any
issue with this fix?

Thanks,
Satya.

On Sun, Mar 12, 2017 at 8:26 PM, Malahal Naineni  wrote:
>>>  Indeed, 2.4 was mostly a bug fix release
>
> Actually, 2.4 has couple big features as far as ganesha project is
> concerned, but Bill is probably indicating that libntirpc
> corresponding to ganesha2.4 is mostly bug fix release.
>
> Regards, Malahal.
>
> On Sun, Mar 12, 2017 at 8:15 PM, William Allen Simpson
>  wrote:
>> On 3/11/17 8:15 AM, Satya Prakash GS wrote:
>>>
>>> We are using 2.3-stable. Given that most of our testing has been done
>>> it's a bit difficult for us to move to 2.5 now but we can take fixes
>>> from 2.5.
>>>
>> Sorry, I should have asked long ago what version you were using.
>>
>> On this list, I always assume that you are using the most recent -dev
>> release.  There are an awful lot of bug fixes since 2.3.  Indeed, 2.4
>> was mostly a bug fix release, and 2.5 is supposed to be a performance
>> release (but has a fair number of bug fixes, too).



Re: [Nfs-ganesha-devel] Permission denied error with Kerberos enabled

2017-03-16 Thread Satya Prakash GS
Has anyone seen client ops failing with error -13 because of context
expiry on the client (gss_verify_mic fails)?
Surprisingly, even with little load it's consistently reproducible on my setup.
Can someone point me to the relevant commits if this has already been fixed.

Thanks,
Satya.

On Mon, Mar 13, 2017 at 4:01 PM, Satya Prakash GS
 wrote:
> My bad, I should have mentioned the version in the original post.
>
> Mahalal was kind enough to share a list of relevant commits. With the
> patches I continued to see the issue. I suspect the client code is not
> handling GSS_S_CONTEXT_EXPIRED correctly on a call to gss_verify_mic.
> Instead I fixed the server code to timeout the ticket 5 mins before
> the actual timeout (Ganesha is already timing the ticket 5 seconds
> earlier).
> So far, the issue hasn't got reproduced but I will continue running
> the test for a day or two before confirming if the fix works. Do you
> see any issue with this fix ?
>
> Thanks,
> Satya.
>

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] Permission denied error with Kerberos enabled

2017-03-21 Thread Satya Prakash GS
Here are the reproduction steps:

I have 3 different machines hosting the NFS client, the server, and the KDC.
I set the ticket lifetime to 10 minutes on the client and server (in krb5.conf).
When adding a principal, I specified -maxlife "10 minutes"
-maxrenew 2017-04-30.
I set max_life to 10 minutes in the kdc.conf file.
I am using machine credentials on the client (running operations as the root user).

Run iozone or bonnie from 2 different clients and you should see the
issue within an hour.

The issue seems to be with the clock skew, which is set to 5 minutes by default.
The server sees a context timeout of 15 minutes when it should have
been 10 minutes (taking the clock skew into account).
The client rejects the server's messages if the context is used for
more than 10 minutes (on the server). This happens three times and the
user operation fails.

Please let me know if you need any other details.

Thanks,
Satya.


On Sun, Mar 19, 2017 at 5:08 PM, Malahal Naineni  wrote:
> If I understand, you have renewable ticket and commands fail when the
> ticket expires? I will let our folks tests it. Any more details on
> reproducing this issue.
>
> On Fri, Mar 17, 2017 at 9:59 AM, Satya Prakash GS
>  wrote:
>> Has anyone seen client ops failing with error -13 because of context
>> expiry on client (gss_verify_mic fails).
>> Surprisingly with little load, it's consistently reproducible on my setup.
>> Can someone point me to the relevant commits if this has already been fixed.
>>
>> Thanks,
>> Satya.
>>
>> On Mon, Mar 13, 2017 at 4:01 PM, Satya Prakash GS
>>  wrote:
>>> My bad, I should have mentioned the version in the original post.
>>>
>>> Mahalal was kind enough to share a list of relevant commits. With the
>>> patches I continued to see the issue. I suspect the client code is not
>>> handling GSS_S_CONTEXT_EXPIRED correctly on a call to gss_verify_mic.
>>> Instead I fixed the server code to timeout the ticket 5 mins before
>>> the actual timeout (Ganesha is already timing the ticket 5 seconds
>>> earlier).
>>> So far, the issue hasn't got reproduced but I will continue running
>>> the test for a day or two before confirming if the fix works. Do you
>>> see any issue with this fix ?
>>>
>>> Thanks,
>>> Satya.
>>>



[Nfs-ganesha-devel] drc and non-cacheable ops

2017-04-26 Thread Satya Prakash GS
Hi,

I have been looking at the DRC code, and I see operations like READ,
READDIR, etc. are not cached in the DRC. Can a compound have a mix
of both cacheable and non-cacheable operations? For example, can the
client send both SETATTR and READ as part of one compound operation
(if concurrent operations are going on)? If there is a mix of
operations, it looks like the DRC doesn't cache the request. Is this OK?

Thanks,
Satya.



Re: [Nfs-ganesha-devel] drc and non-cacheable ops

2017-05-01 Thread Satya Prakash GS
Can somebody please reply to this.

Thanks,
Satya.

On Wed, Apr 26, 2017 at 3:02 PM, Satya Prakash GS
 wrote:
> Hi,
>
> I have been looking at the drc code, I see operations like READ,
> READDIR, etc are not cached in drc. Can a compound operations have mix
> of both cacheable and non-cacheable operations. For example, can
> client send both SETATTR and READ as part of one compound operation
> (if concurrent operations are going on). If there is a mix of
> operations looks like DRC doesn't cache the operation. Is this ok ?
>
> Thanks,
> Satya.



[Nfs-ganesha-devel] drc refcnt

2017-05-01 Thread Satya Prakash GS
Hi,

The DRC refcnt is incremented on every get_drc. However,
nfs_dupreq_finish doesn't always call put_drc. How is it ensured that
the DRC refcnt drops to zero? On unmount, is the DRC eventually
cleaned up?

Thanks,
Satya.



Re: [Nfs-ganesha-devel] drc refcnt

2017-05-01 Thread Satya Prakash GS
Daniel,

I meant to say that nfs_dupreq_finish doesn't always call put_drc; it
does so only if certain criteria are met (drc_should_retire).
Say maxsize is 1000, hiwat is 800, and the retire window size is 0.
If the DRC size is just 100 at unmount time, wouldn't the
refcount stay > 0?

Thanks,
Satya.

>nfs_dupreq_finish() calls dupreq_entry_put() at about line 1238, and
>nfs_dupreq_put_drc() at about line 1222, so I think this is okay.

>Daniel

>On 05/01/2017 11:08 AM, Satya Prakash GS wrote:
>> Hi,
>>
>> DRC refcnt is incremented on every get_drc. However, every
>> nfs_dupreq_finish doesn't call a put_drc. How is it ensured that the
>> drc refcnt drops to zero. On doing an umount, is drc eventually
>> cleaned up.
>>
>> Thanks,
>> Satya.
>>
>>


On Mon, May 1, 2017 at 9:09 PM, Matt Benjamin  wrote:
> Hi Satya,
>
> I don't -think- that's the case (that DRCs are leaked).  If so, we would 
> certainly wish to correct it.  Malahal has most recently updated these code 
> paths.
>
> Regards,
>
> Matt
>
> - Original Message -
>> From: "Satya Prakash GS" 
>> To: nfs-ganesha-devel@lists.sourceforge.net
>> Sent: Monday, May 1, 2017 11:08:48 AM
>> Subject: [Nfs-ganesha-devel] drc refcnt
>>
>> Hi,
>>
>> DRC refcnt is incremented on every get_drc. However, every
>> nfs_dupreq_finish doesn't call a put_drc. How is it ensured that the
>> drc refcnt drops to zero. On doing an umount, is drc eventually
>> cleaned up.
>>
>> Thanks,
>> Satya.
>>
>>
>
> --
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
>
> http://www.redhat.com/en/technologies/storage
>
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309



Re: [Nfs-ganesha-devel] drc refcnt

2017-05-01 Thread Satya Prakash GS
> On Tue, May 2, 2017 at 7:58 AM, Malahal Naineni  wrote:
> A dupreq will place a refcount on its DRC when it calls xxx_get_drc, so we
> will release that DRC refcount when we free the dupreq.

OK, so every dupreq holds a ref on the DRC. In the case of a DRC cache
hit, a dupreq entry can ref the DRC more than once. This is still fine,
because the DRC isn't freed until the dupreq entry's refcount goes to
zero.

> nfs_dupreq_finish() shouldn't free its own dupreq. When it does free some
> other dupreq, we will release DRC refcount corresponding to that dupreq.

> When we free all dupreqs that belong to a DRC

In the case of a disconnected client, when are all the dupreqs freed?

When all the filesystem operations from a client subside (the mount
point is no longer in use), nfs_dupreq_finish doesn't get called
anymore. This is the only place where dupreq entries are removed from
the DRC. If the entries aren't removed from the DRC, the DRC refcnt
doesn't go to 0.

>, its refcount should go to
> zero (maybe another ref is held by the socket itself, so the socket has to
> be closed as well).
>
>
> In fact, if we release DRC refcount without freeing the dupreq, that would
> be a bug!
>
> Regards, Malahal.
>
Thanks,
Satya.



Re: [Nfs-ganesha-devel] drc and non-cacheable ops

2017-05-02 Thread Satya Prakash GS
Ok. DRC doesn't work on requests with cacheable and non-cacheable ops in it.
Thank you Matt.

Regards,
Satya.

On Mon, May 1, 2017 at 9:07 PM, Matt Benjamin  wrote:
> Hi Satya,
>
> That is expected, yes.  I'm not aware of all possible implications.  The 
> issue of compound ops, specifically, is evidently only present in NFSv4.0 (or 
> 4.1, the DRC is not used).
>
> Matt
>
> ----- Original Message -
>> From: "Satya Prakash GS" 
>> To: nfs-ganesha-devel@lists.sourceforge.net
>> Sent: Monday, May 1, 2017 10:58:11 AM
>> Subject: Re: [Nfs-ganesha-devel] drc and non-cacheable ops
>>
>> Can somebody please reply to this.
>>
>> Thanks,
>> Satya.
>>
>> On Wed, Apr 26, 2017 at 3:02 PM, Satya Prakash GS
>>  wrote:
>> > Hi,
>> >
>> > I have been looking at the drc code, I see operations like READ,
>> > READDIR, etc are not cached in drc. Can a compound operations have mix
>> > of both cacheable and non-cacheable operations. For example, can
>> > client send both SETATTR and READ as part of one compound operation
>> > (if concurrent operations are going on). If there is a mix of
>> > operations looks like DRC doesn't cache the operation. Is this ok ?
>> >
>> > Thanks,
>> > Satya.
>>
>>
>



Re: [Nfs-ganesha-devel] drc refcnt

2017-05-02 Thread Satya Prakash GS
>On Tue, May 2, 2017 at 11:51 AM, Malahal Naineni  wrote:
> Sorry, every cacheable request holds a ref on its DRC as well as its DUPREQ.
> The ref on DUPREQ should be released when the request goes away (via
> nfs_dupreq_rele). The ref on DRC will be released when the corresponding
> DUPREQ request gets released. Since we release DUPREQs while processing
> other requests, you are right that the DRC won't be freed if there are no
> more requests that would use the same DRC.

Ok.

> I think we should be freeing dupreq periodically using a timed function,
> something like that drc_free_expired.

The refcnt of the dupreq entries within the stale DRC (of the
disconnected client) stays positive, since it starts at 2 and is
decremented only once, in nfs_dupreq_rele.
First the refcnt of the dupreq entries should be decremented, and each
entry freed when its refcnt reaches 0. On freeing a dupreq entry, the
refcnt of the DRC should be decremented.

All of the above should happen in free_expired or a similar function.

> Regards, Malahal.

Thanks,
Satya.



[Nfs-ganesha-devel] reg. drc nested locks

2017-05-03 Thread Satya Prakash GS
Hi,

In nfs_dupreq_start and nfs_dupreq_finish, when allocating/freeing a
dupreq_entry, we try hard to keep both dupreq_q and the rbtree in
sync by acquiring both the partition lock and the DRC lock (t->mtx,
drc->mtx). This requires dropping and reacquiring locks at certain
places. Can these nested locks be changed so the locks are taken one
after the other?

For example at the time of allocation, we could choose to do this -

PTHREAD_MUTEX_lock(&t->mtx); /* partition lock */
nv = rbtree_x_cached_lookup(&drc->xt, t, &dk->rbt_k, dk->hk);
if (!nv) {
	dk->refcnt = 2;
	(void)rbtree_x_cached_insert(&drc->xt, t, &dk->rbt_k, dk->hk);
	PTHREAD_MUTEX_unlock(&t->mtx); /* partition lock */

	PTHREAD_MUTEX_lock(&drc->mtx);
	TAILQ_INSERT_TAIL(&drc->dupreq_q, dk, fifo_q);
	++(drc->size);
	PTHREAD_MUTEX_unlock(&drc->mtx);
} else {
	PTHREAD_MUTEX_unlock(&t->mtx); /* partition lock */
}

I am assuming this would simplify the lock code a lot.
If there is a case where this would introduce a race please let me know.

Thanks,
Satya.



Re: [Nfs-ganesha-devel] reg. drc nested locks

2017-05-03 Thread Satya Prakash GS
Thank you for the quick reply.

In dupreq_finish, quite a few locks are acquired and dropped (per
entry) as part of retiring the DRC. I want to fix a bug where the DRC
retire will happen in a different function (called from free_expired).
The existing logic gets carried over to the new function, and I was
thinking we may not have to acquire and release the locks so many
times.

Thanks,
Satya.

On Thu, May 4, 2017 at 1:21 AM, Matt Benjamin  wrote:
> Hi Satya,
>
> Sorry, my recommendation would be, we do not change locking to be more coarse 
> grained, and in general, should update it in response to an indication that 
> it is incorrect, not to improve readability in the first instance.
>
> Regards,
>
> Matt
>
> - Original Message -
>> From: "Matt Benjamin" 
>> To: "Satya Prakash GS" 
>> Cc: nfs-ganesha-devel@lists.sourceforge.net, "Malahal Naineni" 
>> 
>> Sent: Wednesday, May 3, 2017 3:43:06 PM
>> Subject: Re: [Nfs-ganesha-devel] reg. drc nested locks
>>
>> No?
>>
>> Matt
>>
>> - Original Message -
>> > From: "Satya Prakash GS" 
>> > To: nfs-ganesha-devel@lists.sourceforge.net, "Malahal Naineni"
>> > 
>> > Sent: Wednesday, May 3, 2017 3:34:31 PM
>> > Subject: [Nfs-ganesha-devel] reg. drc nested locks
>> >
>> > Hi,
>> >
>> > In nfs_dupreq_start and nfs_dupreq_finish when allocating/freeing a
>> > dupreq_entry we are trying hard to keep both dupreq_q and the rbtree
>> > in sync acquiring both the partition lock and the drc (t->mtx,
>> > drc->mtx). This requires dropping and reacquiring locks at certain
>> > places. Can these nested locks be changed to take locks one after the
>> > other.
>> >
>> > For example at the time of allocation, we could choose to do this -
>> >
>> > PTHREAD_MUTEX_lock(&t->mtx); /* partition lock */
>> > nv = rbtree_x_cached_lookup(&drc->xt, t, &dk->rbt_k, dk->hk);
>> > if (!nv) {
>> > dk->refcnt = 2;
>> > (void)rbtree_x_cached_insert(&drc->xt, t,
>> > &dk->rbt_k, dk->hk);
>> > PTHREAD_MUTEX_unlock(&t->mtx); /* partition lock */
>> >
>> > PTHREAD_MUTEX_lock(&drc->mtx);
>> > TAILQ_INSERT_TAIL(&drc->dupreq_q, dk, fifo_q);
>> > ++(drc->size);
>> > PTHREAD_MUTEX_unlock(&drc->mtx);
>> > }
>> >
>> > I am assuming this would simplify the lock code a lot.
>> > If there is a case where this would introduce a race please let me know.
>> >
>> > Thanks,
>> > Satya.
>> >
>> >
>>
>>
>



Re: [Nfs-ganesha-devel] drc refcnt

2017-05-03 Thread Satya Prakash GS
Agreed, the existing retire logic in dupreq_finish will not be changed.
In addition, stale DRC objects will be handled in the timeout
function. Stale DRCs are those which haven't been referenced in a
while (maintain a last_used timestamp in the DRC and update it on every
ref). A stale DRC can have up to 1000 dupreq objects.

static void
handle_stale_drcs(drc_t *drc)
{
	lock_drc(drc);
	/* Move every dupreq entry from the DRC's fifo queue onto a
	 * private singly-linked list headed by prev. */
	while (entry_exists_in_list(&drc->dupreq_q)) {
		dv = TAILQ_REMOVE(&drc->dupreq_q);
		dv->next = prev;
		prev = dv;
	}
	unlock_drc(drc);

	/* At this point the only other references to a dv are threads
	 * actively using it.  Those threads will just do entry_put and
	 * not add the dv back to the dupreq_q list, so all the double
	 * frees are arrested. */
	while ((dv = prev) != NULL) {
		prev = dv->next;
		lock_partition(dv->hk);
		remove_from_rb_tree(dv);
		entry_put(dv);
		unlock_partition(dv->hk);
	}

	put_drc(drc);
}

This is what I had in mind. I could possibly have missed some race.

Thanks,
Satya.

On Thu, May 4, 2017 at 8:07 AM, Malahal Naineni  wrote:
> Matt, you are correct. We lose some memory (drc and dupreqs) for a client
> that never reconnects. Doing solely time based strategy is not scalable as
> well unless we fork multiple threads for doing this. My understanding is
> that there will be one time based strategy (hopefully, the time is long
> enough that it does not interfere with current strategy) in __addition__ to
> the current retiring strategy.
>
> Regards, Malahal.
>
> On Thu, May 4, 2017 at 3:56 AM, Matt Benjamin  wrote:
>>
>> Hi Guys,
>>
>> To get on the record here, the current retire strategy using new requests
>> to retire old ones is an intrinsic good, particularly with TCP and related
>> cots-ord transports where requests are totally ordered.  I don't think
>> moving to a strictly time-based strategy is preferable.  Apparently the
>> actually observed or theorized issue has to do with not disposing of
>> requests in invalidated DRCs?  That seems to be a special case, no?
>>
>> Matt
>>
>> - Original Message -
>> > From: "Malahal Naineni" 
>> > To: "Satya Prakash GS" 
>> > Cc: "Matt Benjamin" ,
>> > nfs-ganesha-devel@lists.sourceforge.net
>> > Sent: Tuesday, May 2, 2017 2:21:48 AM
>> > Subject: Re: [Nfs-ganesha-devel] drc refcnt
>> >
>> > Sorry, every cacheable request holds a ref on its DRC as well as its
>> > DUPREQ. The ref on DUPREQ should be released when the request goes away
>> > (via nfs_dupreq_rele). The ref on DRC will be released when the
>> > corresponding DUPREQ request gets released. Since we release DUPREQs
>> > while
>> > processing other requests, you are right that the DRC won't be freed if
>> > there are no more requests that would use the same DRC.
>> >
>> > I think we should be freeing dupreq periodically using a timed function,
>> > something like that drc_free_expired.
>> >
>> > Regards, Malahal.
>> >
>> >
>> >
>> > On Tue, May 2, 2017 at 10:38 AM, Satya Prakash GS
>> > 
>> > wrote:
>> >
>> > > > On Tue, May 2, 2017 at 7:58 AM, Malahal Naineni 
>> > > wrote:
>> > > > A dupreq will place a refcount on its DRC when it calls xxx_get_drc,
>> > > > so
>> > > we
>> > > > will release that DRC refcount when we free the dupreq.
>> > >
>> > > Ok, so every dupreq holds a ref on the drc. In case of drc cache hit,
>> > > a dupreq entry can ref the
>> > > drc more than once. This is still fine because unless the dupreq entry
>> > > ref goes to zero the drc isn't freed.
>> > >
>> > > > nfs_dupreq_finish() shouldn't free its own dupreq. When it does free
>> > > > some
>> > > > other dupreq, we will release DRC refcount corresponding to that
>> > > > dupreq.
>> > >
>> > > > When we free all dupreqs that belong to a DRC
>> > >
>> > > In the case of a disconnected client when are all the dupreqs freed ?
>> > >
>> > > When all the filesystem operations subside from a client (mount point
>> > > is no longer in use),
>> > > nfs_dupreq_finish doesn't get called anymore. This is the only place
>> > > where dupreq entries are removed from
>> > > the drc. If the entries aren't removed from drc, drc refcnt doesn't go
>> > > to
>> > > 0.
>> > >
>> > > >

[Nfs-ganesha-devel] review request https://review.gerrithub.io/#/c/390652/

2018-03-09 Thread Satya Prakash GS
Can somebody please review this change : https://review.gerrithub.io/#/c/390652/

It addresses this issue :

Leak in DRC when the client disconnects: nfs_dupreq_finish doesn't
always call put_drc; it does so only if certain criteria are met
(drc_should_retire). This can leak the DRC and the dupreq entries
within it when the client disconnects. More information can be found
here: https://sourceforge.net/p/nfs-ganesha/mailman/message/35815930/



Main idea behind the change:

Introduced a new DRC queue which holds all the active DRC objects
(tcp_drc_q in drc_st).
Every new DRC is added to tcp_drc_q initially and eventually moved to
tcp_drc_recycle_q; DRCs are freed from tcp_drc_recycle_q. Every DRC
is either in the active queue or in the recycle queue.

DRC Refcount and transition from active drc to recycle queue :

The DRC refcnt is initialized to 2. In dupreq_start, the DRC refcount
is incremented; in dupreq_rele, it is decremented. The refcnt is also
decremented in nfs_rpc_free_user_data. When the DRC refcnt goes to 0
and the DRC is found not in use for 10 minutes, pick it up and free
its entries in iterations of 32 items at a time. Once the dupreq entry
count goes to 0, remove the DRC from tcp_drc_q and add it to
tcp_drc_recycle_q. Today, entries added to tcp_drc_recycle_q are
cleaned up periodically; the same logic should clean up these entries too.

Thanks,
Satya.

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] review request https://review.gerrithub.io/#/c/390652/

2018-03-09 Thread Satya Prakash GS
I had replied to the comments on the same day Matt posted them. My
replies show as drafts; it looks like I have to publish them, but I
don't see a publish button. Can you help me out?

Thanks,
Satya.

On 9 Mar 2018 20:48, "Frank Filz"  wrote:

> Matt had called for additional discussion on this, so let's get that
> discussion going.
>
> Could you address Matt's questions?
>
> Frank
>
> > -Original Message-
> > From: Satya Prakash GS [mailto:g.satyaprak...@gmail.com]
> > Sent: Friday, March 9, 2018 4:17 AM
> > To: nfs-ganesha-devel@lists.sourceforge.net
> > Cc: Malahal Naineni ; Frank Filz
> > 
> > Subject: review request https://review.gerrithub.io/#/c/390652/
> >
> > Can somebody please review this change :
> > https://review.gerrithub.io/#/c/390652/
> >
> > It addresses this issue :
> >
> > Leak in DRC when client disconnects nfs_dupreq_finish doesn't call
> put_drc
> > always. It does only if it meets certain criteria (drc_should_retire).
> This can leak
> > the drc and the dupreq entries within it when the client disconnects.
> More
> > information can be found here : https://sourceforge.net/p/nfs-
> > ganesha/mailman/message/35815930/
> >
> > 
> >
> > Main idea behind the change.
> >
> > Introduced a new drc queue which holds all the active drc objects
> (tcp_drc_q in
> > drc_st).
> > Every new drc is added to tcp_drc_q initially. Eventually it is moved to
> > tcp_drc_recycle_q. Drcs are freed from tcp_drc_recycle_q. Every drc is
> either in
> > the active drc queue or in the recycle queue.
> >
> > DRC Refcount and transition from active drc to recycle queue :
> >
> > Drc refcnt is initialized to 2. In dupreq_start, increment the drc
> refcount. In
> > dupreq_rele, decrement the drc refcnt. Drc refcnt is also decremented in
> > nfs_rpc_free_user_data. When drc refcnt goes to 0 and drc is found not
> in use
> > for 10 minutes, pick it up and free the entries in iterations of 32
> items at a time.
> > Once the dupreq entry count goes to 0, remove the drc from tcp_drc_q and add
> it to
> > tcp_drc_recycle_q. Today, entries added to tcp_drc_recycle_q are cleaned
> up
> > periodically. Same logic should clean up these entries too.
> >
> > Thanks,
> > Satya.
>
>


Re: [Nfs-ganesha-devel] review request https://review.gerrithub.io/#/c/390652/

2018-03-09 Thread Satya Prakash GS
Ah, now I was able to publish the comments. Thank you, Matt.

Regards,
Satya.

On Fri, Mar 9, 2018 at 9:53 PM, Matt Benjamin  wrote:
> Hi Satya,
>
> To publish, post a reply at the top level (it can even be blank); all
> your inline comments will then be published.
>
> Matt
>
> On Fri, Mar 9, 2018 at 11:21 AM, Satya Prakash GS
>  wrote:
>> I had replied to the comments on the same day Matt posted them. My
>> replies show as drafts; it looks like I have to publish them, but I
>> don't see a publish button. Can you help me out?
>>
>> Thanks,
>> Satya.
>>
>> On 9 Mar 2018 20:48, "Frank Filz"  wrote:
>>>
>>> Matt had called for additional discussion on this, so let's get that
>>> discussion going.
>>>
>>> Could you address Matt's questions?
>>>
>>> Frank
>>>
>>> > -Original Message-
>>> > From: Satya Prakash GS [mailto:g.satyaprak...@gmail.com]
>>> > Sent: Friday, March 9, 2018 4:17 AM
>>> > To: nfs-ganesha-devel@lists.sourceforge.net
>>> > Cc: Malahal Naineni ; Frank Filz
>>> > 
>>> > Subject: review request https://review.gerrithub.io/#/c/390652/
>>> >
>>> > Can somebody please review this change :
>>> > https://review.gerrithub.io/#/c/390652/
>>> >
>>> > It addresses this issue :
>>> >
>>> > Leak in DRC when client disconnects nfs_dupreq_finish doesn't call
>>> > put_drc
>>> > always. It does only if it meets certain criteria (drc_should_retire).
>>> > This can leak
>>> > the drc and the dupreq entries within it when the client disconnects.
>>> > More
>>> > information can be found here : https://sourceforge.net/p/nfs-
>>> > ganesha/mailman/message/35815930/
>>> >
>>> > 
>>> >
>>> > Main idea behind the change.
>>> >
>>> > Introduced a new drc queue which holds all the active drc objects
>>> > (tcp_drc_q in
>>> > drc_st).
>>> > Every new drc is added to tcp_drc_q initially. Eventually it is moved to
>>> > tcp_drc_recycle_q. Drcs are freed from tcp_drc_recycle_q. Every drc is
>>> > either in
>>> > the active drc queue or in the recycle queue.
>>> >
>>> > DRC Refcount and transition from active drc to recycle queue :
>>> >
>>> > Drc refcnt is initialized to 2. In dupreq_start, increment the drc
>>> > refcount. In
>>> > dupreq_rele, decrement the drc refcnt. Drc refcnt is also decremented in
>>> > nfs_rpc_free_user_data. When drc refcnt goes to 0 and drc is found not
>>> > in use
>>> > for 10 minutes, pick it up and free the entries in iterations of 32
>>> > items at a time.
>>> > Once the dupreq entry count goes to 0, remove the drc from tcp_drc_q and add
>>> > it to
>>> > tcp_drc_recycle_q. Today, entries added to tcp_drc_recycle_q are cleaned
>>> > up
>>> > periodically. Same logic should clean up these entries too.
>>> >
>>> > Thanks,
>>> > Satya.
>>>
>>
>>
>
>
>
> --
>
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
>
> http://www.redhat.com/en/technologies/storage
>
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309



Re: [Nfs-ganesha-devel] nfs4 idmapping issue with sssd fully qualified domain names

2018-08-22 Thread Satya Prakash GS
This list has been deprecated. Please subscribe to the new devel list at 
lists.nfs-ganesha.org.
We hit the exact bug that was mentioned here.
https://bugzilla.redhat.com/show_bug.cgi?id=1378557

The issue was happening only with multiple AD domains configured and
trust established between them. libnfsidmap was stripping the domain
name when a username was passed in fully-qualified domain name format.
The No-Strip option in idmapd.conf has to be set to "both" to stop
nfsidmap from stripping the domain name. Even with this option set, I
could not get it working with libnfsidmap-0.25/0.26; I could only get
it working with libnfsidmap-0.27. I compiled it, replaced both
libraries (libnfsidmap.so and nsswitch.so), and the username was then
properly passed to the layer below (sssd/winbind).

With this fix, id is properly being resolved with sssd and winbind.
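
For reference, a minimal idmapd.conf sketch with the No-Strip setting
described above (the Domain value is a placeholder for illustration,
not from this thread):

```
[General]
# Placeholder realm for illustration; use your own domain.
Domain = example.com

# Keep fully-qualified names intact in both directions
# (name-to-id and id-to-name) so they reach sssd/winbind unmodified.
No-Strip = both
```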

Thanks,
Satya.
