> There was a change made to the delegation monitor for setattr that went 
> in snv_76.  Previously, setattr only caused a recall when the size, 
> group, owner, or permissions of the file changed.  Now, any setattr will 
> cause a recall.

Hmm, yes, using a "chmod 777" instead of "touch" on the nfs4
client gets the same 3 minute timeout...


And this probably explains why the issue isn't reproducible on the
sparc box nfs4 server, which is running snv_72.


> If the client in your example had a write delegation, the recall would 
> not happen and you wouldn't see the "hang".

The client is actually running an imap server, which apparently
tries to find out the format of the user's mailbox by opening the
mailbox file O_RDONLY, reading and analysing the first few bytes,
and closing the mailbox file.  Once imapd has figured out the mailbox's
format, it tries to restore the mailbox access time using
utimes(2).
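
To make that access pattern concrete, here is a minimal sketch in C of
what I assume imapd is doing.  The helper name peek_mailbox() is made up
for illustration and is not taken from any imapd source; the point is
only that a read-only open is followed by utimes(2), which an nfs4
client has to send over the wire as a setattr.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from imapd): read the first bytes of a
 * mailbox, then put the original access/modification times back.
 * Returns the number of bytes read, or -1 on any error.
 */
ssize_t
peek_mailbox(const char *path, char *buf, size_t len)
{
	struct stat st;
	struct timeval tv[2];
	ssize_t n;
	int fd;

	assert(path != NULL && buf != NULL);

	/* Remember the timestamps before touching the file. */
	if (stat(path, &st) != 0)
		return (-1);

	/* Read-only open; on an nfs4 mount the server may hand out a
	 * read delegation here. */
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return (-1);
	n = read(fd, buf, len);
	(void) close(fd);
	if (n < 0)
		return (-1);

	/* Restore atime/mtime; on an nfs4 mount this becomes a setattr. */
	tv[0].tv_sec = st.st_atime;
	tv[0].tv_usec = 0;
	tv[1].tv_sec = st.st_mtime;
	tv[1].tv_usec = 0;
	if (utimes(path, tv) != 0)
		return (-1);

	return (n);
}
```

Calling this on a file in an nfs4 mount, with "date" before and after,
should show the same delay as the cat/touch procedure if my reading of
imapd's behaviour is right.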

These mailbox files were located on a remote nfs4 server, and the
imap daemon was very slow opening them from the nfs4 filesystem.
I think my procedure with "date", "cat" and "touch" reproduces the
issue that I was observing with the imapd.


> The hang is caused by the 
> client not returning the delegation and the monitor holding the setattr 
> operation until the delegation is returned or, in this case, revoked.  
> After waiting a lease period (90 seconds), the server sends another 
> cb_recall.  When the delegation still isn't returned after another lease 
> period, the server revokes the delegation and the setattr operation 
> completes.
> 
> Since the client doesn't return the delegation until the setattr 
> completes, I would speculate that the client can't perform the 
> delegation return over the wire until the previous over the wire 
> operation (setattr) completes.
> 
> I'm not sure if there is a dependency that needs to be fixed, or if the 
> client should just return the delegation before performing the setattr 
> operation.  However, there was another change to the monitors which got 
> putback to build snv_80 which may solve the problem (though I need to 
> test that).  The monitors have been changed to return EAGAIN when the 
> caller doesn't want to block while waiting for the delegation to be 
> returned.  The changes were put in for the NFSv3 and NFSv2 servers, so 
> I'm not sure it will fix the problem for v4.  If it doesn't, it is just 
> a simple code change to make it work (adding a flag to caller context 
> and have the server check the return from VOP_SETATTR).

Ahh, the "cc_flags" member in the caller_context_t was added on
December 5th with the putback for "PSARC 2007/632 Caller context
flags"?

Isn't the new ct.cc_flags member uninitialized in nfs4_srv.c, in function
do_rfs4_op_setattr()?   And in rfs4_op_read(), rfs4_op_write(),
rfs4_createfile(), rfs4_do_open(), ... ?

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/nfs/nfs4_srv.c#4891

   4891         /* Check stateid only if size has been set */
   4892         if (sarg.vap->va_mask & AT_SIZE) {
   4893                 trunc = (sarg.vap->va_size == 0);
   4894                 status = rfs4_check_stateid(FWRITE, cs->vp, stateid,
   4895                     trunc, &cs->deleg, sarg.vap->va_mask & AT_SIZE, &ct);
   4896                 if (status != NFS4_OK)
   4897                         goto done;
   4898         } else {
   4899                 ct.cc_sysid = 0;
   4900                 ct.cc_pid = 0;
   4901                 ct.cc_caller_id = nfs4_srv_caller_id;   <<<<<<<<<<<  ct.cc_flags  ????
   4902         }


Maybe in those cases where I couldn't reliably reproduce the 3 min
"hang" for the touch command, the uninitialized ct.cc_flags on the
server had the low bit set, so that recall_all_delegations()
didn't wait and returned NFS4ERR_DELAY, the monitor
returned that as EAGAIN, ...
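
To illustrate what I mean, here is a small stand-alone sketch in C.  It
is not the nfs4_srv.c code: the struct is a stand-in for
caller_context_t with guessed field types, and SKETCH_CC_DONTBLOCK,
sketch_ct_init() and sketch_monitor_setattr() are names I made up.  It
only shows why zeroing the whole context before filling it in makes the
"block or return EAGAIN" decision deterministic.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for caller_context_t; field types are assumptions. */
typedef struct sketch_caller_context {
	int			cc_pid;
	int			cc_sysid;
	unsigned long long	cc_caller_id;
	unsigned long long	cc_flags;
} sketch_caller_context_t;

/* Hypothetical name for the "caller doesn't want to block" low bit. */
#define	SKETCH_CC_DONTBLOCK	0x1ULL

enum sketch_outcome {
	SKETCH_PROCEED,		/* no delegation in the way */
	SKETCH_WAIT,		/* block until returned or revoked */
	SKETCH_EAGAIN		/* caller asked not to block */
};

/* Zero every member first, so cc_flags never holds stack garbage. */
void
sketch_ct_init(sketch_caller_context_t *ct, unsigned long long caller_id)
{
	assert(ct != NULL);
	(void) memset(ct, 0, sizeof (*ct));
	ct->cc_caller_id = caller_id;
}

/* Simplified model of the monitor's decision for a setattr. */
enum sketch_outcome
sketch_monitor_setattr(const sketch_caller_context_t *ct,
    int deleg_outstanding)
{
	assert(ct != NULL);
	if (!deleg_outstanding)
		return (SKETCH_PROCEED);
	if (ct->cc_flags & SKETCH_CC_DONTBLOCK)
		return (SKETCH_EAGAIN);
	return (SKETCH_WAIT);
}
```

With the context zeroed, an outstanding delegation always leads to the
waiting path unless the caller sets the flag on purpose; with garbage in
cc_flags, either path could be taken from one request to the next, which
would fit the intermittent behaviour.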
 
 
This message posted from opensolaris.org
