Re: Vnode API cleanup pass 2a

2014-01-21 Thread J. Hannken-Illjes
On Jan 20, 2014, at 5:02 PM, Taylor R Campbell 
campbell+netbsd-tech-k...@mumble.net wrote:

   Date: Mon, 20 Jan 2014 16:01:07 +0100
   From: J. Hannken-Illjes hann...@eis.cs.tu-bs.de
 
   Back to vnode creation operations returning unlocked vnodes I put a new
   diff to http://www.netbsd.org/~hannken/vnode-pass2a-4.diff
 
   Any more objections or OK to commit?
 
 On further consideration, this looks to me like the right approach
 now.  I gave it a quick skim, but not a thorough review.  It looks
 pretty good, modulo a few little nits below.  I assume all the atf
 tests and your other stress tests still pass?

Sure.  Now even on a 20-core amd64 with linux/kvm as a host.

   @@ -517,8 +517,10 @@ zfs_replay_create(zfsvfs_t *zfsvfs, lr_c
   error = VOP_MKDIR(ZTOV(dzp), &vp, &cn, &xva.xva_vattr /*,vflg*/);
   break;
   case TX_MKXATTR:
   error = zfs_make_xattrdir(dzp, &xva.xva_vattr, &vp, kcred);
   +   if (error == 0 && vp != NULL)
   +   VOP_UNLOCK(vp);
 
 This looks suspicious -- does zfs_make_xattrdir return a locked vnode?
 I don't immediately see evidence that it does.  (Also, if it turns out
 this branch is appropriate, shouldn't ((error == 0) == (vp != NULL))
 be guaranteed?)

Yes, looks like zfs_make_xattrdir returns an unlocked vnode (which means
the current implementation is wrong).  Removed this part.
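
The case then goes back to the plain call (a sketch of the result, not
quoted from the updated diff):

	case TX_MKXATTR:
		/*
		 * zfs_make_xattrdir() already returns the vnode
		 * unlocked, so no VOP_UNLOCK is needed here.
		 */
		error = zfs_make_xattrdir(dzp, &xva.xva_vattr, &vp, kcred);
		break;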

   @@ -1563,10 +1553,11 @@ coda_symlink(void *v)
   saved_cn_flags = cnp->cn_flags;
   cnp->cn_flags &= ~(MODMASK | OPMASK);
   cnp->cn_flags |= LOOKUP;
   error = VOP_LOOKUP(dvp, ap->a_vpp, cnp);
   +   if (error == 0)
   +   VOP_UNLOCK(*ap->a_vpp);
 
 Can you leave an XXX comment here marking it as a provisional kludge
 until VOP_LOOKUP becomes sane (i.e., returns the vnode unlocked)?

Done.
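
The hunk now reads roughly like this (paraphrased, not quoted from the
updated diff):

	error = VOP_LOOKUP(dvp, ap->a_vpp, cnp);
	/*
	 * XXX Provisional kludge: VOP_LOOKUP still returns *ap->a_vpp
	 * locked, so unlock it here until the lookup protocol changes
	 * to return the vnode unlocked.
	 */
	if (error == 0)
		VOP_UNLOCK(*ap->a_vpp);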

   @@ -601,8 +601,10 @@ smbfs_create(void *v)
   error = smbfs_smb_lookup(dnp, name, nmlen, &fattr, &scred);
   if (error)
   goto out;
   error = smbfs_nget(VTOVFS(dvp), dvp, name, nmlen, &fattr, ap->a_vpp);
   +   if (error == 0)
   +   VOP_UNLOCK(*ap->a_vpp);
   if (error) goto out;
 
 Can you do the VOP_UNLOCK unconditionally after the `if (error)'
 branch?  Generally I find it much easier to read `branch on error'
 than `branch on success' and I try to adhere to that in new code.
 There are a few other cases of this in your patch, although some match
 the surrounding style of doing `if (error == 0)' or would require new
 labels so I wouldn't change those.

Done.
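
The smbfs_create hunk presumably becomes something like:

	error = smbfs_nget(VTOVFS(dvp), dvp, name, nmlen, &fattr, ap->a_vpp);
	if (error)
		goto out;
	/* smbfs_nget returned the vnode locked; the new protocol wants it unlocked */
	VOP_UNLOCK(*ap->a_vpp);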

   --- sys/rump/librump/rumpvfs/rumpfs.c  17 Jan 2014 10:55:03 -0000  1.122
   +++ sys/rump/librump/rumpvfs/rumpfs.c  20 Jan 2014 09:10:53 -0000
   @@ -564,9 +564,10 @@ makeprivate(enum vtype vt, mode_t mode, 
   return rn;
}
 
static int
   -makevnode(struct mount *mp, struct rumpfs_node *rn, struct vnode **vpp)
   +makevnode(struct mount *mp, struct rumpfs_node *rn, struct vnode **vpp,
   +bool lock_result)
 
 There are only a couple cases of makevnode(..., true) in your patch,
 in rump_vop_lookup and rumpfs_mountfs.  Instead of adding an argument
 to makevnode, why not just call vn_lock after makevnode in those
 cases?

Done.  rump_vop_lookup changed as proposed -- this lock will go away with
the next pass, which does the same for VOP_LOOKUP.  rumpfs_mountfs doesn't
need a lock -- an unmount cannot happen here:

RCS file: /cvsroot/src/sys/rump/librump/rumpvfs/rumpfs.c,v
@@ -813,4 +812,11 @@ rump_vop_lookup(void *v)
mutex_exit(&reclock);
rv = makevnode(dvp->v_mount, rn, vpp);
+   if (rv == 0) {
+   rv = vn_lock(*vpp, LK_EXCLUSIVE);
+   if (rv != 0) {
+   vrele(*vpp);
+   *vpp = NULL;
+   }
+   }
}
if (dotdot)
@@ -1762,5 +1768,4 @@ rumpfs_mountfs(struct mount *mp)
 
rfsmp->rfsmp_rvp->v_vflag |= VV_ROOT;
-   VOP_UNLOCK(rfsmp->rfsmp_rvp);

mp->mnt_data = rfsmp;


   @@ -183,16 +184,17 @@ ext2fs_mknod(void *v)
* Remove inode so that it will be reloaded by VFS_VGET and
* checked to see if it is an alias of an existing entry in
* the inode cache.
*/
   -   VOP_UNLOCK(*vpp);
   (*vpp)->v_type = VNON;
   +   VOP_UNLOCK(*vpp);
 
 Can you do this one in a separate tiny commit?  It looks like an
 unrelated bug fix.


Done.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Vnode API cleanup pass 2a

2014-01-20 Thread J. Hannken-Illjes
Back to vnode creation operations returning unlocked vnodes I put a new
diff to http://www.netbsd.org/~hannken/vnode-pass2a-4.diff

Any more objections or OK to commit?

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)


Re: Vnode API cleanup pass 2a

2014-01-20 Thread Taylor R Campbell
   Date: Mon, 20 Jan 2014 16:01:07 +0100
   From: J. Hannken-Illjes hann...@eis.cs.tu-bs.de

   Back to vnode creation operations returning unlocked vnodes I put a new
   diff to http://www.netbsd.org/~hannken/vnode-pass2a-4.diff

   Any more objections or OK to commit?

On further consideration, this looks to me like the right approach
now.  I gave it a quick skim, but not a thorough review.  It looks
pretty good, modulo a few little nits below.  I assume all the atf
tests and your other stress tests still pass?

   @@ -517,8 +517,10 @@ zfs_replay_create(zfsvfs_t *zfsvfs, lr_c
   error = VOP_MKDIR(ZTOV(dzp), &vp, &cn, &xva.xva_vattr /*,vflg*/);
   break;
   case TX_MKXATTR:
   error = zfs_make_xattrdir(dzp, &xva.xva_vattr, &vp, kcred);
   +   if (error == 0 && vp != NULL)
   +   VOP_UNLOCK(vp);

This looks suspicious -- does zfs_make_xattrdir return a locked vnode?
I don't immediately see evidence that it does.  (Also, if it turns out
this branch is appropriate, shouldn't ((error == 0) == (vp != NULL))
be guaranteed?)

   @@ -1563,10 +1553,11 @@ coda_symlink(void *v)
   saved_cn_flags = cnp->cn_flags;
   cnp->cn_flags &= ~(MODMASK | OPMASK);
   cnp->cn_flags |= LOOKUP;
   error = VOP_LOOKUP(dvp, ap->a_vpp, cnp);
   +   if (error == 0)
   +   VOP_UNLOCK(*ap->a_vpp);

Can you leave an XXX comment here marking it as a provisional kludge
until VOP_LOOKUP becomes sane (i.e., returns the vnode unlocked)?

   @@ -601,8 +601,10 @@ smbfs_create(void *v)
   error = smbfs_smb_lookup(dnp, name, nmlen, &fattr, &scred);
   if (error)
   goto out;
   error = smbfs_nget(VTOVFS(dvp), dvp, name, nmlen, &fattr, ap->a_vpp);
   +   if (error == 0)
   +   VOP_UNLOCK(*ap->a_vpp);
   if (error)
   goto out;

Can you do the VOP_UNLOCK unconditionally after the `if (error)'
branch?  Generally I find it much easier to read `branch on error'
than `branch on success' and I try to adhere to that in new code.
There are a few other cases of this in your patch, although some match
the surrounding style of doing `if (error == 0)' or would require new
labels so I wouldn't change those.

   --- sys/rump/librump/rumpvfs/rumpfs.c  17 Jan 2014 10:55:03 -0000  1.122
   +++ sys/rump/librump/rumpvfs/rumpfs.c  20 Jan 2014 09:10:53 -0000
   @@ -564,9 +564,10 @@ makeprivate(enum vtype vt, mode_t mode, 
   return rn;
}

static int
   -makevnode(struct mount *mp, struct rumpfs_node *rn, struct vnode **vpp)
   +makevnode(struct mount *mp, struct rumpfs_node *rn, struct vnode **vpp,
   +bool lock_result)

There are only a couple cases of makevnode(..., true) in your patch,
in rump_vop_lookup and rumpfs_mountfs.  Instead of adding an argument
to makevnode, why not just call vn_lock after makevnode in those
cases?

   @@ -183,16 +184,17 @@ ext2fs_mknod(void *v)
* Remove inode so that it will be reloaded by VFS_VGET and
* checked to see if it is an alias of an existing entry in
* the inode cache.
*/
   -   VOP_UNLOCK(*vpp);
   (*vpp)->v_type = VNON;
   +   VOP_UNLOCK(*vpp);

Can you do this one in a separate tiny commit?  It looks like an
unrelated bug fix.


Re: Vnode API cleanup pass 2a

2014-01-16 Thread Taylor R Campbell
   Date: Thu, 16 Jan 2014 07:07:56 +
   From: David Holland dholland-t...@netbsd.org

   On Wed, Jan 15, 2014 at 04:31:07PM +0100, J. Hannken-Illjes wrote:
 I put a diff to http://www.netbsd.org/~hannken/vnode-pass2a-3.diff that
 changes the vnode creation operations to keep the dvp locked.
 
 Any objections or OK to commit?

   I don't understand where all of the zfs changes arise (without wading
   into the zfs code) but I'm not going to worry too much about that.

Whoops, I missed those on my first reading -- in particular, the first
two files, zfs_ioctl.c and zfs_replay.c (the zfs_vnops.c changes look
fine and I'm glad we can now get rid of the comments on insanity I
left there).

It's not clear to me why the vop protocol change would induce these
zfs changes.  I'm particularly nervous because you've made the vops
*stop* unlocking and releasing vnodes, so callers should at most have
to *add* unlocks and vreles, but you've *removed* some vreles and left
things locked.  It could be that something causes changes to zfs_zget,
but this seems unlikely.  Did you run this code?  (I suspect not -- it
looks like the code had broken locking to begin with.)


Re: Vnode API cleanup pass 2a

2014-01-16 Thread J. Hannken-Illjes
On Jan 16, 2014, at 2:45 PM, Taylor R Campbell 
campbell+netbsd-tech-k...@mumble.net wrote:

   Date: Thu, 16 Jan 2014 07:07:56 +
   From: David Holland dholland-t...@netbsd.org
 
   On Wed, Jan 15, 2014 at 04:31:07PM +0100, J. Hannken-Illjes wrote:
 I put a diff to http://www.netbsd.org/~hannken/vnode-pass2a-3.diff that
 changes the vnode creation operations to keep the dvp locked.
 
 Any objections or OK to commit?
 
   I don't understand where all of the zfs changes arise (without wading
   into the zfs code) but I'm not going to worry too much about that.
 
 Whoops, I missed those on my first reading -- in particular, the first
 two files, zfs_ioctl.c and zfs_replay.c (the zfs_vnops.c changes look
 fine and I'm glad we can now get rid of the comments on insanity I
 left there).
 
 It's not clear to me why the vop protocol change would induce these
 zfs changes.  I'm particularly nervous because you've made the vops
 *stop* unlocking and releasing vnodes, so callers should at most have
 to *add* unlocks and vreles, but you've *removed* some vreles and left
 things locked.  It could be that something causes changes to zfs_zget,
 but this seems unlikely.  Did you run this code?  (I suspect not -- it
 looks like the code had broken locking to begin with.)

Taylor, you're right.

- the change to zfs_ioctl.c was wrong and unneeded (this part is not
  used from NetBSD).

- the change to zfs_replay.c was wrong.  Looks like zfs_zget() returns
  a referenced and unlocked vnode so the attached diff should do it
  right as soon as VOP_CREATE and VOP_MKDIR get out of #ifdef TODO.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



zfs.diff
Description: Binary data


Re: Vnode API cleanup pass 2a

2014-01-16 Thread Taylor R Campbell
   Date: Thu, 16 Jan 2014 15:13:47 +0100
   From: J. Hannken-Illjes hann...@eis.cs.tu-bs.de

   Taylor, you're right.

   - the change to zfs_ioctl.c was wrong and unneeded (this part is not
 used from NetBSD).

   - the change to zfs_replay.c was wrong.  Looks like zfs_zget() returns
 a referenced and unlocked vnode so the attached diff should do it
 right as soon as VOP_CREATE and VOP_MKDIR get out of #ifdef TODO.

OK, great.  LGTM.


Re: Vnode API cleanup pass 2a

2014-01-16 Thread haad
I'm not sure if I remember it correctly, but we do not use zfs_replay.c.
It is used with zfs snapshots, and the last time I worked on it, it wasn't
working.

On Thu, Jan 16, 2014 at 3:13 PM, J. Hannken-Illjes
hann...@eis.cs.tu-bs.de wrote:
 On Jan 16, 2014, at 2:45 PM, Taylor R Campbell 
 campbell+netbsd-tech-k...@mumble.net wrote:

   Date: Thu, 16 Jan 2014 07:07:56 +
   From: David Holland dholland-t...@netbsd.org

   On Wed, Jan 15, 2014 at 04:31:07PM +0100, J. Hannken-Illjes wrote:
 I put a diff to http://www.netbsd.org/~hannken/vnode-pass2a-3.diff that
 changes the vnode creation operations to keep the dvp locked.

 Any objections or OK to commit?

   I don't understand where all of the zfs changes arise (without wading
   into the zfs code) but I'm not going to worry too much about that.

 Whoops, I missed those on my first reading -- in particular, the first
 two files, zfs_ioctl.c and zfs_replay.c (the zfs_vnops.c changes look
 fine and I'm glad we can now get rid of the comments on insanity I
 left there).

 It's not clear to me why the vop protocol change would induce these
 zfs changes.  I'm particularly nervous because you've made the vops
 *stop* unlocking and releasing vnodes, so callers should at most have
 to *add* unlocks and vreles, but you've *removed* some vreles and left
 things locked.  It could be that something causes changes to zfs_zget,
 but this seems unlikely.  Did you run this code?  (I suspect not -- it
 looks like the code had broken locking to begin with.)

 Taylor, you're right.

 - the change to zfs_ioctl.c was wrong and unneeded (this part is not
   used from NetBSD).

 - the change to zfs_replay.c was wrong.  Looks like zfs_zget() returns
   a referenced and unlocked vnode so the attached diff should do it
   right as soon as VOP_CREATE and VOP_MKDIR get out of #ifdef TODO.

 --
 J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)




-- 


Regards.

Adam


Re: Vnode API cleanup pass 2a

2014-01-15 Thread J. Hannken-Illjes
On Jan 14, 2014, at 5:12 PM, J. Hannken-Illjes hann...@eis.cs.tu-bs.de wrote:

 On Jan 14, 2014, at 5:54 AM, David Holland dholland-t...@netbsd.org wrote:
 
 On Mon, Jan 13, 2014 at 04:06:02PM +0100, J. Hannken-Illjes wrote:
 snip
 Even though it looks like we don't need creation ops to return with
 locked directory at the moment it may be better to do this change now
 so atomic operations like the one seen in NFS server are possible.
 
 Yeah, that's what I was thinking
 
 Thus we would end up with:
 [snip]
 
 Right.
 
 It would probably be better to make this a separate commit; if that's
 a nuisance for you, I can do it...
 
 Give me a day (or maybe two) ...

I put a diff to http://www.netbsd.org/~hannken/vnode-pass2a-3.diff that
changes the vnode creation operations to keep the dvp locked.

Any objections or OK to commit?

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Vnode API cleanup pass 2a

2014-01-15 Thread Taylor R Campbell
   Date: Mon, 13 Jan 2014 16:06:02 +0100
   From: J. Hannken-Illjes hann...@eis.cs.tu-bs.de

   On Jan 13, 2014, at 8:39 AM, David Holland dholland-t...@netbsd.org wrote:

On Tue, Jan 07, 2014 at 11:30:40AM +0100, J. Hannken-Illjes wrote:
(also, while this is minor I think I'd rather have vop_mkdir_args_v2
and/or vop_mkdir_desc_v2 rather than vop_mkdir2_args and
vop_mkdir2_desc. The version doesn't need to be part of the operation
name.)

Looks very good -- changed vnode_if.sh to create xxx_v2_args. Diff 
attached.

Would you mind committing the versioning logic right away? It has
clear value and it would be nice for it to appear in a separate
changeset.

   Done.

As is, existing code calling, e.g., VOP_MKDIR will still compile.  Is
there a reason the versioning doesn't rename it to VOP_MKDIR_V2?


Re: Vnode API cleanup pass 2a

2014-01-15 Thread Taylor R Campbell
   Date: Wed, 15 Jan 2014 16:08:04 +
   From: Taylor R Campbell campbell+netbsd-tech-k...@mumble.net

   As is, existing code calling, e.g., VOP_MKDIR will still compile.  Is
   there a reason the versioning doesn't rename it to VOP_MKDIR_V2?

For that matter, why new machinery for this versioning stuff at all?
Why not just rename the vop from mkdir to mkdir_v2?  That would take
care of both struct vop_mkdir_v2_args and VOP_MKDIR_V2.  Am I missing
something?


Re: Vnode API cleanup pass 2a

2014-01-15 Thread Alan Barrett

On Wed, 15 Jan 2014, Taylor R Campbell wrote:

For that matter, why new machinery for this versioning stuff at all?
Why not just rename the vop from mkdir to mkdir_v2?  That would take
care of both struct vop_mkdir_v2_args and VOP_MKDIR_V2.  Am I missing
something?


That would make the calling code ugly.

--apb (Alan Barrett)


Re: Vnode API cleanup pass 2a

2014-01-15 Thread David Holland
On Wed, Jan 15, 2014 at 05:31:56PM +, Taylor R Campbell wrote:
 As is, existing code calling, e.g., VOP_MKDIR will still compile.  Is
 there a reason the versioning doesn't rename it to VOP_MKDIR_V2?
  
  For that matter, why new machinery for this versioning stuff at all?
  Why not just rename the vop from mkdir to mkdir_v2?  That would take
  care of both struct vop_mkdir_v2_args and VOP_MKDIR_V2.  Am I missing
  something?

Well, one thing is that there aren't expected to be out-of-tree
callers of VOP_MKDIR, only out-of-tree callees.

It is convenient to be able to break callers too, but I'm ok with
doing it some other way temporarily while hacking.

-- 
David A. Holland
dholl...@netbsd.org


Re: Vnode API cleanup pass 2a

2014-01-15 Thread David Holland
On Wed, Jan 15, 2014 at 04:31:07PM +0100, J. Hannken-Illjes wrote:
  I put a diff to http://www.netbsd.org/~hannken/vnode-pass2a-3.diff that
  changes the vnode creation operations to keep the dvp locked.
  
  Any objections or OK to commit?

I don't understand where all of the zfs changes arise (without wading
into the zfs code) but I'm not going to worry too much about that.

Otherwise it looks ok. I'm kind of distressed by how far afield some
of the vput calls are from the vop entry points in some of the fses,
but I suppose that's the usual level of code quality. :-/

I suppose there might be a problem in one of those cases where
leaving the directory locked causes something else wrong to happen.
But if so I think I'd rather tackle it head on.

(Function calls should be symmetric with respect to locking, all the
more so in what's ostensibly a supported API.)

Also I'm not sure that if there were something missing I'd necessarily
spot it from reading a diff, but that can't be helped.


Now somebody just has to fix the rest of the vops that do this...
maybe this weekend I can finally deal with the fix for PR 47040 and
then I can stop being wedged on vfs work.

-- 
David A. Holland
dholl...@netbsd.org


Re: Vnode API cleanup pass 2a

2014-01-14 Thread J. Hannken-Illjes
On Jan 14, 2014, at 5:54 AM, David Holland dholland-t...@netbsd.org wrote:

 On Mon, Jan 13, 2014 at 04:06:02PM +0100, J. Hannken-Illjes wrote:
snip
 Even though it looks like we don't need creation ops to return with
 locked directory at the moment it may be better to do this change now
 so atomic operations like the one seen in NFS server are possible.
 
 Yeah, that's what I was thinking
 
 Thus we would end up with:
 [snip]
 
 Right.
 
 It would probably be better to make this a separate commit; if that's
 a nuisance for you, I can do it...

Give me a day (or maybe two) ...

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Vnode API cleanup pass 2a

2014-01-13 Thread J. Hannken-Illjes
On Jan 13, 2014, at 8:39 AM, David Holland dholland-t...@netbsd.org wrote:

 On Tue, Jan 07, 2014 at 11:30:40AM +0100, J. Hannken-Illjes wrote:
 (also, while this is minor I think I'd rather have vop_mkdir_args_v2
 and/or vop_mkdir_desc_v2 rather than vop_mkdir2_args and
 vop_mkdir2_desc. The version doesn't need to be part of the operation
 name.)
 
 Looks very good -- changed vnode_if.sh to create xxx_v2_args. Diff attached.
 
 Would you mind committing the versioning logic right away? It has
 clear value and it would be nice for it to appear in a separate
 changeset.

Done.

snip
 Unlocking and relocking should not itself be a problem since AFAIK
 there isn't any reason for those locks to be held in the first place;
 however, I think I'd better go review all the call sites just in case.
 
 Plus I think it would be advisable to make this change first, so that
 in case we do find a site where it matters we can lock the returned
 object before anyone else can see it:
 
 #
 -#% mkdir  dvp L U U
 +#% mkdir  dvp L L L
 
 Where should it matter? VOP_MKDIR is only used from:
snip
 sys/nfs/nfs_serv.c:nfsrv_mkdir()
 
 This extracts the nfs file handle for the new dir and calls
 VOP_GETATTR. If we don't leave the directory locked, these actions are
 no longer atomic with the directory creation. I don't know whether
 this is safe or not. My guess is not. In the absence of clear
 information it seems prudent to not introduce this race.

RFC 1813 is a bit vague here:

  post_op_attr

  This structure is used for returning attributes in those operations that
  are not directly involved with manipulating attributes. One of the
  principles of this revision of the NFS protocol is to return the real value
  from the indicated operation and not an error from an incidental operation.
  The post_op_attr structure was designed to allow the server to recover from
  errors encountered while getting attributes.

  This appears to make returning attributes optional. However, server
  implementors are strongly encouraged to make best effort to return
  attributes whenever possible, even when returning an error.

  post_op_fh3

  One of the principles of this revision of the NFS protocol is to return the
  real value from the indicated operation and not an error from an incidental
  operation. The post_op_fh3 structure was designed to allow the server to
  recover from errors encountered while constructing a file handle.

  This is the structure used to return a file handle from the CREATE, MKDIR,
  SYMLINK, MKNOD, and READDIRPLUS requests. In each case, the client can get
  the file handle by issuing a LOOKUP request after a successful return from
  one of the listed operations. Returning the file handle is an optimization
  so that the client is not forced to immediately issue a LOOKUP request to
  get the file handle.

The reference implementation from Solaris is not atomic:

  VOP_MKDIR(dvp, vp)
  VOP_GETATTR(vp)
  makefh(vp)
  VN_RELE(vp)
  VN_RELE(dvp)

 sys/fs/union/union_subr.c:union_mkshadow()
 
 ...but this doesn't, the caller is trying to create the directory and
 also bind it into the onionfs layering atomically. Of course, being
 onionfs, it's already only approximate and adding another race to it
 may not make things any worse; but it's probably better not to.


Looking closer it makes things better here.  Currently unionfs will
lock in the wrong order (uppervp -> upperdvp) while it now does it right.

If the shadow directory we just created were to disappear because another
thread removed it outside of unionfs, we would have much bigger problems
than a race here.


Even though it looks like we don't need creation ops to return with
locked directory at the moment it may be better to do this change now
so atomic operations like the one seen in NFS server are possible.

Thus we would end up with:

-#% create dvp L U U
+#% create dvp L L L
-#% create vpp - L -
+#% create vpp - U -
-   IN LOCKED=YES WILLPUT struct vnode *dvp;
+   IN LOCKED=YES struct vnode *dvp;
-#% mknod  dvp L U U
+#% mknod  dvp L L L
-#% mknod  vpp - L -
+#% mknod  vpp - U -
-   IN LOCKED=YES WILLPUT struct vnode *dvp;
+   IN LOCKED=YES struct vnode *dvp;
-#% mkdir  dvp L U U
+#% mkdir  dvp L L L
-#% mkdir  vpp - L - 
+#% mkdir  vpp - U - 
-   IN LOCKED=YES WILLPUT struct vnode *dvp;
+   IN LOCKED=YES struct vnode *dvp;
-#% symlink dvp L U U
+#% symlink dvp L L L
-#% symlink vpp - L -
+#% symlink vpp - U -
-   IN LOCKED=YES WILLPUT struct vnode *dvp;
+   IN LOCKED=YES struct vnode *dvp;
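
In caller terms this would mean (a sketch of the intended protocol, not
taken from the diff):

	error = VOP_MKDIR(dvp, &vp, &cnp, &va);
	vput(dvp);		/* dvp now comes back still locked and referenced */
	if (error == 0)
		vrele(vp);	/* *vpp comes back referenced but unlocked */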


Please comment soon, I will prepare a new diff then.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Vnode API cleanup pass 2a

2014-01-13 Thread David Holland
On Mon, Jan 13, 2014 at 04:06:02PM +0100, J. Hannken-Illjes wrote:
  [various nfs stuff]

Yeah, I'm not sure what to make of that either.
 
  The reference implementation from Solaris is not atomic:
  
VOP_MKDIR(dvp, vp)
VOP_GETATTR(vp)
makefh(vp)
VN_RELE(vp)
VN_RELE(dvp)

They can't do this kind of thing atomically with their blinkered vfs
locking :-)

One should get the same file handle even if it isn't atomic and that's
probably ok; on the other hand it would be weird if you created a
directory mode 755 and the stat info you got back said 700 because a
chmod raced with the getattr. On the other hand, it's NFS.

   sys/fs/union/union_subr.c:union_mkshadow()
   
   ...but this doesn't, the caller is trying to create the directory and
   also bind it into the onionfs layering atomically. Of course, being
   onionfs, it's already only approximate and adding another race to it
   may not make things any worse; but it's probably better not to.
  
  Looking closer it makes things better here.  Currently unionfs will
  lock in the wrong order (uppervp -> upperdvp) while it now does it right.

Oh lovely, I missed that.

  If the shadow directory we just created were to disappear because another
  thread removed it outside of unionfs, we would have much bigger problems
  than a race here.

Yeah, I think the more likely problem is that it might try to create
it twice and have one of them fail unexpectedly. But maybe it's
already capable of dealing with that.

  Even though it looks like we don't need creation ops to return with
  locked directory at the moment it may be better to do this change now
  so atomic operations like the one seen in NFS server are possible.

Yeah, that's what I was thinking

  Thus we would end up with:
  [snip]

Right.

It would probably be better to make this a separate commit; if that's
a nuisance for you, I can do it...

-- 
David A. Holland
dholl...@netbsd.org


Re: Vnode API cleanup pass 2a

2014-01-12 Thread David Holland
On Tue, Jan 07, 2014 at 11:32:28AM +0100, J. Hannken-Illjes wrote:
   This seems to largely add code to file systems to mess with locking
   further, rather than reducing that code.
   
   What do you mean with mess here?
   
   Can you explain how this
   change helps to clean up those dirty hacks?
   
   This is the first of hopefully two steps.  Once lookup returns its
   result unlocked the hacks can go as all operations adding or looking
   up layer nodes work on unlocked vnodes.
   
   Are you planning to make
   lookup return the node unlocked too or something first?
   
   Sure, changing the lookup operation to do the same will be the next step.
   
   also, maybe it would be better to tackle lookup first?
  
  Why should the order matter?

Less mess of the kind Taylor was objecting to. Maybe. (Or maybe not,
too.)

-- 
David A. Holland
dholl...@netbsd.org


Re: Vnode API cleanup pass 2a

2014-01-12 Thread David Holland
On Tue, Jan 07, 2014 at 11:30:40AM +0100, J. Hannken-Illjes wrote:
   (also, while this is minor I think I'd rather have vop_mkdir_args_v2
   and/or vop_mkdir_desc_v2 rather than vop_mkdir2_args and
   vop_mkdir2_desc. The version doesn't need to be part of the operation
   name.)
  
  Looks very good -- changed vnode_if.sh to create xxx_v2_args. Diff attached.

Would you mind committing the versioning logic right away? It has
clear value and it would be nice for it to appear in a separate
changeset.

   Also, since what you're doing is basically unlocking before return in
   every implementation, and then relocking after return at every call
   site, it's possible to do one op at a time; it seems like that would
   be a good idea in case unexpected problems crop up.
   
   This is not what I'm doing.
   
   It still seems like a good idea to do one op at a time, or at least do
   one op first.
  
  This is missing any arguments for the second time

...because it's vfs and everything in vfs is delicate? Small
incremental changes are safer; if something weird goes wrong, it's
easier to deal with the smaller the change was. (If nothing else, it's
easier to rever only the part that broke; it's also easier to bisect
later.)

  and you ignored my
  arguments after "This is not what I'm doing".

That's because they aren't really relevant to working incrementally.

   Unlocking and relocking should not itself be a problem since AFAIK
   there isn't any reason for those locks to be held in the first place;
   however, I think I'd better go review all the call sites just in case.
   
   Plus I think it would be advisable to make this change first, so that
   in case we do find a site where it matters we can lock the returned
   object before anyone else can see it:
   
   #
   -#% mkdir  dvp L U U
   +#% mkdir  dvp L L L
   
   Where should it matter? VOP_MKDIR is only used from:

In the case of VOP_SYMLINK it definitely matters, as the new symlink
isn't filled until after VOP_SYMLINK returns. Since the locking
protocol for the containing directory is to unlock it *before*
returning, it then becomes possible to see the empty symlink. And that
isn't so good.

mkdir is less clear, but:

   external/cddl/osnet/dist/uts/common/fs/zfs/zfs_replay.c:zfs_replay_create()

This doesn't appear to do anything with the new dir (right now it just
panics...)

   sys/kern/vfs_syscalls.c:do_sys_mkdirat()

This does nothing and is ok.

   sys/nfs/nfs_serv.c:nfsrv_mkdir()

This extracts the nfs file handle for the new dir and calls
VOP_GETATTR. If we don't leave the directory locked, these actions are
no longer atomic with the directory creation. I don't know whether
this is safe or not. My guess is not. In the absence of clear
information it seems prudent to not introduce this race.

   sys/fs/union/union_vnops.c:union_mkdir()

This looks ok.

   sys/fs/union/union_subr.c:union_mkshadow()

...but this doesn't, the caller is trying to create the directory and
also bind it into the onionfs layering atomically. Of course, being
onionfs, it's already only approximate and adding another race to it
may not make things any worse; but it's probably better not to.

   sys/rump/librump/rumpvfs/rumpvnode_if.c:RUMP_VOP_MKDIR()

this shouldn't matter.

   I have been putting off answering this because I ought to examine all
   these, and I haven't managed to do so yet but figured I ought to
   answer the rest.

It looks to me like it would definitely be better to first fix the
VOP_MKDIR interface to not release the directory lock within the call.

Certainly for VOP_SYMLINK.

Therefore, it's probably best to fix all the creation ops.

-- 
David A. Holland
dholl...@netbsd.org


Re: Vnode API cleanup pass 2a

2014-01-12 Thread David Holland
On Mon, Jan 13, 2014 at 07:39:48AM +, David Holland wrote:
  In the case of VOP_SYMLINK it definitely matters, as the new symlink
  isn't filled until after VOP_SYMLINK returns. Since the locking
  protocol for the containing directory is to unlock it *before*
  returning, it then becomes possible to see the empty symlink. And that
  isn't so good.

Er. This isn't actually true - I think I was thinking of some other
older vfs interface.

  It looks to me like it would definitely be better to first fix the
  VOP_MKDIR interface to not release the directory lock within the call.
  
  Certainly for VOP_SYMLINK.
  
  Therefore, it's probably best to fix all the creation ops.

Therefore, this is less critical; but I think the mkdir cases warrant
doing it.

-- 
David A. Holland
dholl...@netbsd.org


Re: Vnode API cleanup pass 2a

2014-01-08 Thread J. Hannken-Illjes
On Jan 7, 2014, at 5:29 AM, David Holland dholland-t...@netbsd.org wrote:

snip

 Plus I think it would be advisable to make this change first, so that
 in case we do find a site where it matters we can lock the returned
 object before anyone else can see it:
 
 #
 -#% mkdir  dvp L U U
 +#% mkdir  dvp L L L
 
 Where should it matter? VOP_MKDIR is only used from:
 
 external/cddl/osnet/dist/uts/common/fs/zfs/zfs_replay.c:zfs_replay_create()
 sys/kern/vfs_syscalls.c:do_sys_mkdirat()
 sys/nfs/nfs_serv.c:nfsrv_mkdir()
 sys/fs/union/union_vnops.c:union_mkdir()
 sys/fs/union/union_subr.c:union_mkshadow()
 sys/rump/librump/rumpvfs/rumpvnode_if.c:RUMP_VOP_MKDIR()
 
 and I don't see any problems here.
 
 I have been putting off answering this because I ought to examine all
 these, and I haven't managed to do so yet but figured I ought to
 answer the rest.

Please do -- plan is to commit next Monday, 2014-01-13

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Vnode API cleanup pass 2a

2014-01-07 Thread J. Hannken-Illjes
On Jan 7, 2014, at 5:29 AM, David Holland dholland-t...@netbsd.org wrote:

 On Tue, Dec 31, 2013 at 11:34:27AM +0100, J. Hannken-Illjes wrote:
 The layered file systems hashlists currently have to work on locked
 vnodes as the vnode operations returning vnodes return them locked.
 
 This leads to some very dirty hacks -- see layer_node_find() from
 sys/miscfs/genfs/layer_subr.c for example.
 
 The best solution is to change these vnode operations to return
 unlocked (and referenced) vnodes.
 
 The attached diff implements this change for operations create, mknod,
 mkdir and symlink.  It passes the atf test suite and some file system
 stress tests I use here.
 
 Comments or objections anyone?
 
 My first thought is that I really don't think it's a good idea to
 change important semantic properties (particularly locking) of calls
 from what's effectively a public API without changing the signatures
 too so old code won't compile.
 
 Ok.  Will add an API version argument to vnode_if.src so
 
  vop_mkdir {
 +VERSION 1
  IN LOCKED=YES WILLPUT struct vnode *dvp;
 
 will rename vop_mkdir_desc to vop_mkdir1_desc and then change all
 file systems to use the new desc.
 
 While on the whole I suppose this is probably a good thing, it feels
 inelegant. Also, the ops table is typically a long way from the code
 that actually needs to be changed.
 
 Wouldn't it be better to attach the version number to the name of the
 args structure? Then it will fail to compile in the VOP function that
 needs attention. That is,
 
 int
 ufs_mkdir(void *v)
 {
struct vop_mkdir_args /* {
struct vnode *a_dvp;
struct vnode **a_vpp;
struct componentname *a_cnp;
struct vattr *a_vap;
} */ *ap = v;
:
:
 }
 
 would need to be changed to e.g.
 
 int
 ufs_mkdir(void *v)
 {
struct vop_mkdir_args_v2 /* {
struct vnode *a_dvp;
struct vnode **a_vpp;
struct componentname *a_cnp;
struct vattr *a_vap;
} */ *ap = v;
:
:
 }
 
 (also, while this is minor I think I'd rather have vop_mkdir_args_v2
 and/or vop_mkdir_desc_v2 rather than vop_mkdir2_args and
 vop_mkdir2_desc. The version doesn't need to be part of the operation
 name.)

Looks very good -- changed vnode_if.sh to create xxx_v2_args. Diff attached.

 Also, since what you're doing is basically unlocking before return in
 every implementation, and then relocking after return at every call
 site, it's possible to do one op at a time; it seems like that would
 be a good idea in case unexpected problems crop up.
 
 This is not what I'm doing.
 
 It still seems like a good idea to do one op at a time, or at least do
 one op first.

This is missing any arguments for the second time, and you ignored my
arguments after "This is not what I'm doing".

 Unlocking and relocking should not itself be a problem since AFAIK
 there isn't any reason for those locks to be held in the first place;
 however, I think I'd better go review all the call sites just in case.
 
 Plus I think it would be advisable to make this change first, so that
 in case we do find a site where it matters we can lock the returned
 object before anyone else can see it:
 
 #
 -#% mkdir  dvp L U U
 +#% mkdir  dvp L L L
 
 Where should it matter? VOP_MKDIR is only used from:
 
 external/cddl/osnet/dist/uts/common/fs/zfs/zfs_replay.c:zfs_replay_create()
 sys/kern/vfs_syscalls.c:do_sys_mkdirat()
 sys/nfs/nfs_serv.c:nfsrv_mkdir()
 sys/fs/union/union_vnops.c:union_mkdir()
 sys/fs/union/union_subr.c:union_mkshadow()
 sys/rump/librump/rumpvfs/rumpvnode_if.c:RUMP_VOP_MKDIR()
 
 and I don't see any problems here.
 
 I have been putting off answering this because I ought to examine all
 these, and I haven't managed to do so yet but figured I ought to
 answer the rest.

An updated diff at http://www.netbsd.org/~hannken/vnode-pass2a-2.diff
Diffs to vnode_if.{sh,src} attached.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)


vnode-pass2a-new.diff
Description: Binary data




Re: Vnode API cleanup pass 2a

2014-01-07 Thread J. Hannken-Illjes

On Jan 7, 2014, at 5:30 AM, David Holland dholland-t...@netbsd.org wrote:

 On Mon, Dec 30, 2013 at 05:26:14PM +0100, J. Hannken-Illjes wrote:
 This seems to largely add code to file systems to mess with locking
 further, rather than reducing that code.
 
 What do you mean with mess here?
 
 Can you explain how this
 change helps to clean up those dirty hacks?
 
 This is the first of hopefully two steps.  Once lookup returns its
 result unlocked the hacks can go as all operations adding or looking
 up layer nodes work on unlocked vnodes.
 
 Are you planning to make
 lookup return the node unlocked too or something first?
 
  Sure, changing the lookup operation to do the same will be the next step.
 
 also, maybe it would be better to tackle lookup first?

Why should the order matter?

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Vnode API cleanup pass 2a

2014-01-06 Thread David Holland
On Mon, Dec 30, 2013 at 05:26:14PM +0100, J. Hannken-Illjes wrote:
   This seems to largely add code to file systems to mess with locking
   further, rather than reducing that code.
  
  What do you mean with mess here?
  
   Can you explain how this
   change helps to clean up those dirty hacks?
  
  This is the first of hopefully two steps.  Once lookup returns its
  result unlocked the hacks can go as all operations adding or looking
  up layer nodes work on unlocked vnodes.
  
   Are you planning to make
   lookup return the node unlocked too or something first?
  
  Sure, changing the lookup operation to do the same will be the next step.

also, maybe it would be better to tackle lookup first?

-- 
David A. Holland
dholl...@netbsd.org


Re: Vnode API cleanup pass 2a

2014-01-06 Thread David Holland
On Tue, Dec 31, 2013 at 11:34:27AM +0100, J. Hannken-Illjes wrote:
   The layered file systems hashlists currently have to work on locked
   vnodes as the vnode operations returning vnodes return them locked.
   
   This leads to some very dirty hacks -- see layer_node_find() from
   sys/miscfs/genfs/layer_subr.c for example.
   
   The best solution is to change these vnode operations to return
   unlocked (and referenced) vnodes.
   
   The attached diff implements this change for operations create, mknod,
   mkdir and symlink.  It passes the atf test suite and some file system
   stress tests I use here.
   
   Comments or objections anyone?
   
   My first thought is that I really don't think it's a good idea to
   change important semantic properties (particularly locking) of calls
   from what's effectively a public API without changing the signatures
   too so old code won't compile.
  
  Ok.  Will add an API version argument to vnode_if.src so
  
   vop_mkdir {
  +VERSION 1
   IN LOCKED=YES WILLPUT struct vnode *dvp;
  
  will rename vop_mkdir_desc to vop_mkdir1_desc and then change all
  file systems to use the new desc.

While on the whole I suppose this is probably a good thing, it feels
inelegant. Also, the ops table is typically a long way from the code
that actually needs to be changed.

Wouldn't it be better to attach the version number to the name of the
args structure? Then it will fail to compile in the VOP function that
needs attention. That is,

int
ufs_mkdir(void *v)
{
struct vop_mkdir_args /* {
struct vnode *a_dvp;
struct vnode **a_vpp;
struct componentname *a_cnp;
struct vattr *a_vap;
} */ *ap = v;
 :
 :
}

would need to be changed to e.g.

int
ufs_mkdir(void *v)
{
struct vop_mkdir_args_v2 /* {
struct vnode *a_dvp;
struct vnode **a_vpp;
struct componentname *a_cnp;
struct vattr *a_vap;
} */ *ap = v;
 :
 :
}

(also, while this is minor I think I'd rather have vop_mkdir_args_v2
and/or vop_mkdir_desc_v2 rather than vop_mkdir2_args and
vop_mkdir2_desc. The version doesn't need to be part of the operation
name.)

   Also, since what you're doing is basically unlocking before return in
   every implementation, and then relocking after return at every call
   site, it's possible to do one op at a time; it seems like that would
   be a good idea in case unexpected problems crop up.
  
  This is not what I'm doing.

It still seems like a good idea to do one op at a time, or at least do
one op first.

   Unlocking and relocking should not itself be a problem since AFAIK
   there isn't any reason for those locks to be held in the first place;
   however, I think I'd better go review all the call sites just in case.
   
   Plus I think it would be advisable to make this change first, so that
   in case we do find a site where it matters we can lock the returned
   object before anyone else can see it:
   
   #
   -#% mkdir  dvp L U U
   +#% mkdir  dvp L L L
  
  Where should it matter? VOP_MKDIR is only used from:
  
  external/cddl/osnet/dist/uts/common/fs/zfs/zfs_replay.c:zfs_replay_create()
  sys/kern/vfs_syscalls.c:do_sys_mkdirat()
  sys/nfs/nfs_serv.c:nfsrv_mkdir()
  sys/fs/union/union_vnops.c:union_mkdir()
  sys/fs/union/union_subr.c:union_mkshadow()
  sys/rump/librump/rumpvfs/rumpvnode_if.c:RUMP_VOP_MKDIR()
  
  and I don't see any problems here.

I have been putting off answering this because I ought to examine all
these, and I haven't managed to do so yet but figured I ought to
answer the rest.

-- 
David A. Holland
dholl...@netbsd.org


Re: Vnode API cleanup pass 2a

2014-01-04 Thread J. Hannken-Illjes
On Dec 31, 2013, at 11:34 AM, J. Hannken-Illjes hann...@eis.cs.tu-bs.de wrote:

 On Dec 30, 2013, at 10:58 PM, David Holland dholland-t...@netbsd.org wrote:
 
 On Mon, Dec 30, 2013 at 11:35:48AM +0100, J. Hannken-Illjes wrote:
 The layered file systems hashlists currently have to work on locked
 vnodes as the vnode operations returning vnodes return them locked.
 
 This leads to some very dirty hacks -- see layer_node_find() from
 sys/miscfs/genfs/layer_subr.c for example.
 
 The best solution is to change these vnode operations to return
 unlocked (and referenced) vnodes.
 
 The attached diff implements this change for operations create, mknod,
 mkdir and symlink.  It passes the atf test suite and some file system
 stress tests I use here.
 
 Comments or objections anyone?
 
 My first thought is that I really don't think it's a good idea to
 change important semantic properties (particularly locking) of calls
 from what's effectively a public API without changing the signatures
 too so old code won't compile.
 
 Ok.  Will add an API version argument to vnode_if.src so
 
   vop_mkdir {
 + VERSION 1
   IN LOCKED=YES WILLPUT struct vnode *dvp;
 
 will rename vop_mkdir_desc to vop_mkdir1_desc and then change all
 file systems to use the new desc.
 
 Also, since what you're doing is basically unlocking before return in
 every implementation, and then relocking after return at every call
 site, it's possible to do one op at a time; it seems like that would
 be a good idea in case unexpected problems crop up.
 
 This is not what I'm doing.
 
 Many file systems use one internal function to create a new inode and
 where possible I change this function to return the node unlocked.
 
 Relocking is done only for nfs server, vn_open() and temporarily for
 layered file systems.  The operations from vfs_syscalls.c all change
 vput() to vrele().
 
 Unlocking and relocking should not itself be a problem since AFAIK
 there isn't any reason for those locks to be held in the first place;
 however, I think I'd better go review all the call sites just in case.
 
 Plus I think it would be advisable to make this change first, so that
 in case we do find a site where it matters we can lock the returned
 object before anyone else can see it:
 
 #
 -#% mkdir  dvp L U U
 +#% mkdir  dvp L L L
 
 Where should it matter? VOP_MKDIR is only used from:
 
 external/cddl/osnet/dist/uts/common/fs/zfs/zfs_replay.c:zfs_replay_create()
 sys/kern/vfs_syscalls.c:do_sys_mkdirat()
 sys/nfs/nfs_serv.c:nfsrv_mkdir()
 sys/fs/union/union_vnops.c:union_mkdir()
 sys/fs/union/union_subr.c:union_mkshadow()
 sys/rump/librump/rumpvfs/rumpvnode_if.c:RUMP_VOP_MKDIR()
 
 and I don't see any problems here.

Put an updated diff to http://www.netbsd.org/~hannken/vnode-pass2a-1.diff

This diff adds API version, diffs to vnode_if.{sh,src} attached.

Further comments or objections anyone?

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



vnode-pass2a-new.diff
Description: Binary data



Re: Vnode API cleanup pass 2a

2013-12-31 Thread J. Hannken-Illjes
On Dec 30, 2013, at 10:58 PM, David Holland dholland-t...@netbsd.org wrote:

 On Mon, Dec 30, 2013 at 11:35:48AM +0100, J. Hannken-Illjes wrote:
 The layered file systems hashlists currently have to work on locked
 vnodes as the vnode operations returning vnodes return them locked.
 
 This leads to some very dirty hacks -- see layer_node_find() from
 sys/miscfs/genfs/layer_subr.c for example.
 
 The best solution is to change these vnode operations to return
 unlocked (and referenced) vnodes.
 
 The attached diff implements this change for operations create, mknod,
 mkdir and symlink.  It passes the atf test suite and some file system
 stress tests I use here.
 
 Comments or objections anyone?
 
 My first thought is that I really don't think it's a good idea to
 change important semantic properties (particularly locking) of calls
 from what's effectively a public API without changing the signatures
 too so old code won't compile.

Ok.  Will add an API version argument to vnode_if.src so

vop_mkdir {
+   VERSION 1
IN LOCKED=YES WILLPUT struct vnode *dvp;

will rename vop_mkdir_desc to vop_mkdir1_desc and then change all
file systems to use the new desc.

 Also, since what you're doing is basically unlocking before return in
 every implementation, and then relocking after return at every call
 site, it's possible to do one op at a time; it seems like that would
 be a good idea in case unexpected problems crop up.

This is not what I'm doing.

Many file systems use one internal function to create a new inode and
where possible I change this function to return the node unlocked.

Relocking is done only for nfs server, vn_open() and temporarily for
layered file systems.  The operations from vfs_syscalls.c all change
vput() to vrele().
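
That is, a typical call site in vfs_syscalls.c changes along these lines
(sketch only, names approximate):

	error = VOP_MKDIR(nd.ni_dvp, &vp, &nd.ni_cnd, &vattr);
	if (error == 0)
		vrele(vp);	/* was vput(vp); *vpp is now returned unlocked */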

 Unlocking and relocking should not itself be a problem since AFAIK
 there isn't any reason for those locks to be held in the first place;
 however, I think I'd better go review all the call sites just in case.
 
 Plus I think it would be advisable to make this change first, so that
 in case we do find a site where it matters we can lock the returned
 object before anyone else can see it:
 
 #
 -#% mkdir  dvp L U U
 +#% mkdir  dvp L L L

Where should it matter? VOP_MKDIR is only used from:

external/cddl/osnet/dist/uts/common/fs/zfs/zfs_replay.c:zfs_replay_create()
sys/kern/vfs_syscalls.c:do_sys_mkdirat()
sys/nfs/nfs_serv.c:nfsrv_mkdir()
sys/fs/union/union_vnops.c:union_mkdir()
sys/fs/union/union_subr.c:union_mkshadow()
sys/rump/librump/rumpvfs/rumpvnode_if.c:RUMP_VOP_MKDIR()

and I don't see any problems here.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Vnode API cleanup pass 2a

2013-12-30 Thread J. Hannken-Illjes
The layered file systems hashlists currently have to work on locked
vnodes as the vnode operations returning vnodes return them locked.

This leads to some very dirty hacks -- see layer_node_find() from
sys/miscfs/genfs/layer_subr.c for example.

The best solution is to change these vnode operations to return
unlocked (and referenced) vnodes.
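
For a creation op like VOP_MKDIR this means, roughly:

	error = VOP_MKDIR(dvp, &vp, &cnp, &va);
	if (error == 0)
		vrele(vp);	/* before this change: vput(vp), as *vpp came back locked */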

The attached diff implements this change for operations create, mknod,
mkdir and symlink.  It passes the atf test suite and some file system
stress tests I use here.

Comments or objections anyone?

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)


vnode-pass2a.diff
Description: Binary data


Re: Vnode API cleanup pass 2a

2013-12-30 Thread Taylor R Campbell
   Date: Mon, 30 Dec 2013 11:35:48 +0100
   From: J. Hannken-Illjes hann...@eis.cs.tu-bs.de

   The layered file systems hashlists currently have to work on locked
   vnodes as the vnode operations returning vnodes return them locked.

   This leads to some very dirty hacks -- see layer_node_find() from
   sys/miscfs/genfs/layer_subr.c for example.

   The best solution is to change these vnode operations to return
   unlocked (and referenced) vnodes.

   The attached diff implements this change for operations create, mknod,
   mkdir and symlink.  It passes the atf test suite and some file system
   stress tests I use here.

This seems to largely add code to file systems to mess with locking
further, rather than reducing that code.  Can you explain how this
change helps to clean up those dirty hacks?  Are you planning to make
lookup return the node unlocked too or something first?


Re: Vnode API cleanup pass 2a

2013-12-30 Thread J. Hannken-Illjes
On Dec 30, 2013, at 4:12 PM, Taylor R Campbell 
campbell+netbsd-tech-k...@mumble.net wrote:

   Date: Mon, 30 Dec 2013 11:35:48 +0100
   From: J. Hannken-Illjes hann...@eis.cs.tu-bs.de
 
   The layered file systems hashlists currently have to work on locked
   vnodes as the vnode operations returning vnodes return them locked.
 
   This leads to some very dirty hacks -- see layer_node_find() from
   sys/miscfs/genfs/layer_subr.c for example.
 
   The best solution is to change these vnode operations to return
   unlocked (and referenced) vnodes.
 
   The attached diff implements this change for operations create, mknod,
   mkdir and symlink.  It passes the atf test suite and some file system
   stress tests I use here.
 
 This seems to largely add code to file systems to mess with locking
 further, rather than reducing that code.

What do you mean with mess here?

 Can you explain how this
 change helps to clean up those dirty hacks?

This is the first of hopefully two steps.  Once lookup returns its
result unlocked the hacks can go as all operations adding or looking
up layer nodes work on unlocked vnodes.

 Are you planning to make
 lookup return the node unlocked too or something first?

Sure, changing the lookup operation to do the same will be the next step.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Vnode API cleanup pass 2a

2013-12-30 Thread David Holland
On Mon, Dec 30, 2013 at 11:35:48AM +0100, J. Hannken-Illjes wrote:
  The layered file systems hashlists currently have to work on locked
  vnodes as the vnode operations returning vnodes return them locked.
  
  This leads to some very dirty hacks -- see layer_node_find() from
  sys/miscfs/genfs/layer_subr.c for example.
  
  The best solution is to change these vnode operations to return
  unlocked (and referenced) vnodes.
  
  The attached diff implements this change for operations create, mknod,
  mkdir and symlink.  It passes the atf test suite and some file system
  stress tests I use here.
  
  Comments or objections anyone?

My first thought is that I really don't think it's a good idea to
change important semantic properties (particularly locking) of calls
from what's effectively a public API without changing the signatures
too so old code won't compile.

Also, since what you're doing is basically unlocking before return in
every implementation, and then relocking after return at every call
site, it's possible to do one op at a time; it seems like that would
be a good idea in case unexpected problems crop up.

Unlocking and relocking should not itself be a problem since AFAIK
there isn't any reason for those locks to be held in the first place;
however, I think I'd better go review all the call sites just in case.

Plus I think it would be advisable to make this change first, so that
in case we do find a site where it matters we can lock the returned
object before anyone else can see it:

 #
-#% mkdir  dvp L U U
+#% mkdir  dvp L L L
 #% mkdir  vpp - L - 
 #

This is not, unfortunately, entirely trivial; I've been putting it off
to combine with namei/componentname-related changes.

-- 
David A. Holland
dholl...@netbsd.org