On Jan 13, 2014, at 8:39 AM, David Holland wrote:
> On Tue, Jan 07, 2014 at 11:30:40AM +0100, J. Hannken-Illjes wrote:
>>> (also, while this is minor I think I'd rather have vop_mkdir_args_v2
>>> and/or vop_mkdir_desc_v2 rather than vop_mkdir2_args and
>>> vop_mkdir2_desc. The version doesn't need to be part of the operation
>>> name.)
>>
>> Looks very good -- changed vnode_if.sh to create xxx_v2_args. Diff attached.
>
> Would you mind committing the versioning logic right away? It has
> clear value and it would be nice for it to appear in a separate
> changeset.
Done.
> Unlocking and relocking should not itself be a problem since AFAIK
> there isn't any reason for those locks to be held in the first place;
> however, I think I'd better go review all the call sites just in case.
>
> Plus I think it would be advisable to make this change first, so that
> in case we do find a site where it matters we can lock the returned
> object before anyone else can see it:
>
> #
> -#% mkdir dvp L U U
> +#% mkdir dvp L L L
Where should it matter? VOP_MKDIR is only used from:
sys/nfs/nfs_serv.c:nfsrv_mkdir()
>
> This extracts the nfs file handle for the new dir and calls
> VOP_GETATTR. If we don't leave the directory locked, these actions are
> no longer atomic with the directory creation. I don't know whether
> this is safe or not. My guess is not. In the absence of clear
> information it seems prudent to not introduce this race.
RFC 1813 is a bit vague here:
post_op_attr
This structure is used for returning attributes in those operations that are
not directly involved with manipulating attributes. One of the principles of
this revision of the NFS protocol is to return the real value from the
indicated
operation and not an error from an incidental operation. The post_op_attr
structure
was designed to allow the server to recover from errors encountered while
getting attributes.
This appears to make returning attributes optional. However, server
implementors
are strongly encouraged to make best effort to return attributes whenever
possible,
even when returning an error.
post_op_fh3
One of the principles of this revision of the NFS protocol is to return the
real
value from the indicated operation and not an error from an incidental
operation.
The post_op_fh3 structure was designed to allow the server to recover from
errors
encountered while constructing a file handle.
This is the structure used to return a file handle from the CREATE, MKDIR,
SYMLINK,
MKNOD, and READDIRPLUS requests. In each case, the client can get the file
handle by
issuing a LOOKUP request after a successful return from one of the listed
operations.
Returning the file handle is an optimization so that the client is not forced
to
immediately issue a LOOKUP request to get the file handle.
The "reference implementation" from Solaris is not atomic:
VOP_MKDIR(dvp, &vp)
VOP_GETATTR(vp)
makefh(vp)
VN_RELE(vp)
VN_RELE(dvp)
sys/fs/union/union_subr.c:union_mkshadow()
>
> ...but this doesn't, the caller is trying to create the directory and
> also bind it into the onionfs layering atomically. Of course, being
> onionfs, it's already only approximate and adding another race to it
> may not make things any worse; but it's probably better not to.
Looking closer it makes things better here. Currently unionfs will
lock in the wrong order (uppervp -> upperdvp) while it now does it right.
If the shadow directory we just created would disappear because another
thread removed it outside of unionfs we had much more problems than
a race here.
Even though it looks like we don't need creation ops to return with
locked directory at the moment it may be better to do this change now
so atomic operations like the one seen in NFS server are possible.
Thus we would end up with:
-#% create dvp L U U
+#% create dvp L L L
-#% create vpp - L -
+#% create vpp - U -
- IN LOCKED=YES WILLPUT struct vnode *dvp;
+ IN LOCKED=YES struct vnode *dvp;
-#% mknod dvp L U U
+#% mknod dvp L L L
-#% mknod vpp - L -
+#% mknod vpp - U -
- IN LOCKED=YES WILLPUT struct vnode *dvp;
+ IN LOCKED=YES struct vnode *dvp;
-#% mkdir dvp L U U
+#% mkdir dvp L L L
-#% mkdir vpp - L -
+#% mkdir vpp - U -
- IN LOCKED=YES WILLPUT struct vnode *dvp;
+ IN LOCKED=YES struct vnode *dvp;
-#% symlinkdvp L U U
+#% symlinkdvp L L L
-#% symlinkvpp - L -
+#% symlinkvpp - U -
- IN LOCKED=YES WILLPUT struct vnode *dvp;
+ IN LOCKED=YES struct vnode *dvp;
Please comment soon, I will prepare a new diff then.
--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)