Hello,

we have the same problem here after updating to OpenSolaris 2009.06 SU7
(+IDR30+35) on a production system with 40 clients.
About every second day one of our servers loses state and snoop shows the
output below (numerous reopen attempts answered with NFS4ERR_NO_GRACE).
All clients hang because they cannot get their data; only a restart of the
nfs/server service cures the problem.
This looks like a serious bug in the NFS module, introduced recently in both
the OpenSolaris support line and Solaris 10 u8.

   sun -> local-s11    NFS R 4 (open        ) NFS4ERR_STALE_CLIENTID PUTFH NFS4_OK OPEN NFS4ERR_STALE_CLIENTID
   local-s11 -> sun   NFS C 4 (reopen      ) PUTFH FH=B5F3 OPEN OT=NC SQ=14571 CT=P DT=N AC=W DN=N OO=0955 GETFH GETATTR 10011a b0a23a
   sun -> local-s11    NFS R 4 (reopen      ) NFS4ERR_NO_GRACE PUTFH NFS4_OK OPEN NFS4ERR_NO_GRACE
           ? -> (multicast)  ETHER Type=0001 (LLC/802.3), size=52 bytes
   local-s11 -> sun    NFS C 4 (reopen      ) PUTFH FH=92CA OPEN OT=NC SQ=14539 CT=P DT=N AC=R DN=N OO=0A4C GETFH GETATTR 10011a b0a23a
   local-s11 -> sun    TCP D=2049 S=1023 Ack=293438085 Seq=4069541650 Len=0 Win=64074 Options=<nop,nop,tstamp 25776951 188898831>
   local-s11 -> sun    TCP D=2049 S=1023 Ack=293438153 Seq=4069541650 Len=0 Win=64074 Options=<nop,nop,tstamp 25776951 188898907>
   sun -> local-s11    NFS R 4 (reopen      ) NFS4ERR_NO_GRACE PUTFH NFS4_OK OPEN NFS4ERR_NO_GRACE
   local-s11 -> sun    NFS C 4 (reopen      ) PUTFH FH=AA7B OPEN imksuns11:1.log OT=NC SQ=14572 CT=N AC=W DN=N OO=0955 GETFH GETATTR 10011a b0a...
   local-s11 -> sun    NFS C 4 (reopen      ) PUTFH FH=95AC OPEN start_version OT=NC SQ=14540 CT=N AC=R DN=N OO=0A4C GETFH GETATTR 10011a b0a23a
   sun -> local-s11    TCP D=1023 S=2049 Ack=4069542150 Seq=293438289 Len=0 Win=64074 Options=<nop,nop,tstamp 188898915 25776951>
   sun -> local-s11    NFS R 4 (reopen      ) NFS4_OK PUTFH NFS4_OK OPEN NFS4_OK ST=1A6F:7271 RF=PL DT=N GETFH NFS4_OK FH=92CA GETATTR NFS4_OK
   local-s11 -> sun    NFS C 4 (reopen      ) PUTFH FH=92CA OPEN OT=NC SQ=14539 CT=P DT=N AC=R DN=N OO=0A0F GETFH GETATTR 10011a b0a23a
   sun -> local-s11    NFS R 4 (reopen      ) NFS4ERR_NO_GRACE PUTFH NFS4_OK OPEN NFS4ERR_NO_GRACE
   local-s11 -> sun    NFS C 4 (reopen      ) PUTFH FH=95AC OPEN start_version OT=NC SQ=14540 CT=N AC=R DN=N OO=0A0F GETFH GETATTR 10011a b0a23a
   sun -> local-s11    NFS R 4 (reopen      ) NFS4_OK PUTFH NFS4_OK OPEN NFS4_OK ST=1B35:7271 RF=PL DT=N GETFH NFS4_OK FH=92CA GETATTR NFS4_OK
   local-s11 -> sun    NFS C 4 (reopen      ) PUTFH FH=92CA OPEN OT=NC SQ=14539 CT=P DT=N AC=R DN=N OO=0AEE GETFH GETATTR 10011a b0a23a
   sun -> local-s11    NFS R 4 (reopen      ) NFS4ERR_NO_GRACE PUTFH NFS4_OK OPEN NFS4ERR_NO_GRACE
   local-s11 -> sun    NFS C 4 (reopen      ) PUTFH FH=95AC OPEN start_version OT=NC SQ=14540 CT=N AC=R DN=N O
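
To see at a glance how often each status comes back, a trace like the one
above can be tallied with a few lines of Python. This is only a rough sketch:
the regular expression is an assumption based on the reply lines shown in
this capture, not on a documented snoop output format, and the function name
is made up for illustration.

```python
import re
from collections import Counter

# Matches the start of a snoop NFSv4 reply line as it appears in this trace:
#   sun -> local-s11    NFS R 4 (reopen      ) NFS4ERR_NO_GRACE ...
# Only the first status after the parenthesis is captured, so the pattern
# works whether or not the rest of the line was wrapped by a mail client.
REPLY_RE = re.compile(r"(\S+)\s+->\s+(\S+)\s+NFS R 4 \((\w+)\s*\)\s+(NFS4\w+)")


def summarize_replies(text):
    """Count (operation tag, status) pairs over all NFSv4 reply lines."""
    counts = Counter()
    for match in REPLY_RE.finditer(text):
        _src, _dst, op_tag, status = match.groups()
        counts[(op_tag, status)] += 1
    return counts
```

Feeding it the saved snoop text, e.g. summarize_replies(open(path).read())
with path pointing at the capture, gives a Counter keyed by operation and
status, which makes it easy to compare how many reopens fail with
NFS4ERR_NO_GRACE and how many succeed.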

NFS4ERR_NO_GRACE
      A reclaim of client state was attempted in circumstances in
      which the server cannot guarantee that conflicting state has
      not been provided to another client.  This can occur because
      the reclaim has been done outside of the grace period of the
      server, after the client has done a RECLAIM_COMPLETE operation,
      or because previous operations have created a situation in which
      the server is not able to determine that a reclaim-interfering
      edge condition does not exist.
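
The first two conditions in that definition can be illustrated with a toy
model. This is a simplified sketch, not the Solaris implementation: the
class, its fields and the 90 second grace period are all hypothetical. It
also assumes that CT=P in the trace marks a reclaim (CLAIM_PREVIOUS) open
and CT=N an ordinary (CLAIM_NULL) open, which would fit the pattern that
only the CT=P reopens are rejected.

```python
from dataclasses import dataclass, field

NFS4_OK = "NFS4_OK"
NFS4ERR_GRACE = "NFS4ERR_GRACE"
NFS4ERR_NO_GRACE = "NFS4ERR_NO_GRACE"


@dataclass
class ToyServer:
    """Hypothetical model of a server's grace-period check for OPEN."""
    boot_time: float
    grace_seconds: float = 90.0  # illustrative value only
    reclaim_complete: set = field(default_factory=set)

    def in_grace(self, now):
        return now < self.boot_time + self.grace_seconds

    def open(self, client, now, reclaim):
        if reclaim:
            # Reclaims are only safe inside the grace period and before
            # the client has declared its reclaim phase finished.
            if not self.in_grace(now) or client in self.reclaim_complete:
                return NFS4ERR_NO_GRACE
            return NFS4_OK
        # Ordinary opens must wait until the grace period is over.
        if self.in_grace(now):
            return NFS4ERR_GRACE
        return NFS4_OK
```

In this model a reclaim sent long after the server restarted gets
NFS4ERR_NO_GRACE while an ordinary open at the same moment succeeds, which
is the same mix of replies seen in the capture. The third condition in the
definition (a reclaim-interfering edge condition) is not modelled here.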
-- 
This message posted from opensolaris.org