Re: Vnode API cleanup pass 2a

2014-01-13 Thread David Holland
On Mon, Jan 13, 2014 at 04:06:02PM +0100, J. Hannken-Illjes wrote:
 > [various nfs stuff]

Yeah, I'm not sure what to make of that either.
 
 > The "reference implementation" from Solaris is not atomic:
 > 
 >   VOP_MKDIR(dvp, &vp)
 >   VOP_GETATTR(vp)
 >   makefh(vp)
 >   VN_RELE(vp)
 >   VN_RELE(dvp)

They can't do this kind of thing atomically with their blinkered vfs
locking :-)

One should get the same file handle even if it isn't atomic and that's
probably ok; on the other hand it would be weird if you created a
directory mode 755 and the stat info you got back said 700 because a
chmod raced with the getattr. On the other hand, it's NFS.

 >  sys/fs/union/union_subr.c:union_mkshadow()
 > > 
 > > ...but this doesn't, the caller is trying to create the directory and
 > > also bind it into the onionfs layering atomically. Of course, being
 > > onionfs, it's already only approximate and adding another race to it
 > > may not make things any worse; but it's probably better not to.
 > 
 > Looking closer it makes things better here.  Currently unionfs will
 > lock in the wrong order (uppervp -> upperdvp) while it now does it right.

Oh lovely, I missed that.

 > If the shadow directory we just created would disappear because another
 > thread removed it outside of unionfs we had much more problems than
 > a race here.

Yeah, I think the more likely problem is that it might try to create
it twice and have one of them fail unexpectedly. But maybe it's
already capable of dealing with that.

 > Even though it looks like we don't need creation ops to return with
 > locked directory at the moment it may be better to do this change now
 > so atomic operations like the one seen in NFS server are possible.

Yeah, that's what I was thinking

 > Thus we would end up with:
 > [snip]

Right.

It would probably be better to make this a separate commit; if that's
a nuisance for you, I can do it...

-- 
David A. Holland
dholl...@netbsd.org


Re: Vnode API cleanup pass 2a

2014-01-13 Thread J . Hannken-Illjes
On Jan 13, 2014, at 8:39 AM, David Holland  wrote:

> On Tue, Jan 07, 2014 at 11:30:40AM +0100, J. Hannken-Illjes wrote:
>>> (also, while this is minor I think I'd rather have vop_mkdir_args_v2
>>> and/or vop_mkdir_desc_v2 rather than vop_mkdir2_args and
>>> vop_mkdir2_desc. The version doesn't need to be part of the operation
>>> name.)
>> 
>> Looks very good -- changed vnode_if.sh to create xxx_v2_args. Diff attached.
> 
> Would you mind committing the versioning logic right away? It has
> clear value and it would be nice for it to appear in a separate
> changeset.

Done.


> Unlocking and relocking should not itself be a problem since AFAIK
> there isn't any reason for those locks to be held in the first place;
> however, I think I'd better go review all the call sites just in case.
> 
> Plus I think it would be advisable to make this change first, so that
> in case we do find a site where it matters we can lock the returned
> object before anyone else can see it:
> 
> #
> -#% mkdir  dvp L U U
> +#% mkdir  dvp L L L
 
 Where should it matter? VOP_MKDIR is only used from:

 sys/nfs/nfs_serv.c:nfsrv_mkdir()
> 
> This extracts the nfs file handle for the new dir and calls
> VOP_GETATTR. If we don't leave the directory locked, these actions are
> no longer atomic with the directory creation. I don't know whether
> this is safe or not. My guess is not. In the absence of clear
> information it seems prudent to not introduce this race.

RFC 1813 is a bit vague here:

  post_op_attr

  This structure is used for returning attributes in those operations that are
  not directly involved with manipulating attributes. One of the principles of
  this revision of the NFS protocol is to return the real value from the 
indicated
  operation and not an error from an incidental operation. The post_op_attr 
structure
  was designed to allow the server to recover from errors encountered while
  getting attributes.

  This appears to make returning attributes optional. However, server 
implementors
  are strongly encouraged to make best effort to return attributes whenever 
possible,
  even when returning an error.

  post_op_fh3

  One of the principles of this revision of the NFS protocol is to return the 
real
  value from the indicated operation and not an error from an incidental 
operation.
  The post_op_fh3 structure was designed to allow the server to recover from 
errors
  encountered while constructing a file handle.

  This is the structure used to return a file handle from the CREATE, MKDIR, 
SYMLINK,
  MKNOD, and READDIRPLUS requests. In each case, the client can get the file 
handle by
  issuing a LOOKUP request after a successful return from one of the listed 
operations.
  Returning the file handle is an optimization so that the client is not forced 
to
  immediately issue a LOOKUP request to get the file handle.

The "reference implementation" from Solaris is not atomic:

  VOP_MKDIR(dvp, &vp)
  VOP_GETATTR(vp)
  makefh(vp)
  VN_RELE(vp)
  VN_RELE(dvp)

 sys/fs/union/union_subr.c:union_mkshadow()
> 
> ...but this doesn't, the caller is trying to create the directory and
> also bind it into the onionfs layering atomically. Of course, being
> onionfs, it's already only approximate and adding another race to it
> may not make things any worse; but it's probably better not to.


Looking closer it makes things better here.  Currently unionfs will
lock in the wrong order (uppervp -> upperdvp) while it now does it right.

If the shadow directory we just created would disappear because another
thread removed it outside of unionfs we had much more problems than
a race here.


Even though it looks like we don't need creation ops to return with
locked directory at the moment it may be better to do this change now
so atomic operations like the one seen in NFS server are possible.

Thus we would end up with:

-#% create dvp L U U
+#% create dvp L L L
-#% create vpp - L -
+#% create vpp - U -
-   IN LOCKED=YES WILLPUT struct vnode *dvp;
+   IN LOCKED=YES struct vnode *dvp;
-#% mknod  dvp L U U
+#% mknod  dvp L L L
-#% mknod  vpp - L -
+#% mknod  vpp - U -
-   IN LOCKED=YES WILLPUT struct vnode *dvp;
+   IN LOCKED=YES struct vnode *dvp;
-#% mkdir  dvp L U U
+#% mkdir  dvp L L L
-#% mkdir  vpp - L - 
+#% mkdir  vpp - U - 
-   IN LOCKED=YES WILLPUT struct vnode *dvp;
+   IN LOCKED=YES struct vnode *dvp;
-#% symlinkdvp L U U
+#% symlinkdvp L L L
-#% symlinkvpp - L -
+#% symlinkvpp - U -
-   IN LOCKED=YES WILLPUT struct vnode *dvp;
+   IN LOCKED=YES struct vnode *dvp;


Please comment soon, I will prepare a new diff then.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)