Re: svn commit: r363482 - in head/sys: kern sys

2020-07-30 Thread Konstantin Belousov
On Thu, Jul 30, 2020 at 03:13:19PM -0700, Conrad Meyer wrote:
> Hi Konstantin,
> 
> On Tue, Jul 28, 2020 at 11:42 AM Konstantin Belousov
>  wrote:
> >
> > On Fri, Jul 24, 2020 at 05:34:05PM +, Conrad Meyer wrote:
> > > ...
> > > --- head/sys/kern/vfs_bio.c   Fri Jul 24 17:32:10 2020(r363481)
> > > +++ head/sys/kern/vfs_bio.c   Fri Jul 24 17:34:04 2020(r363482)
> > > @@ -3849,7 +3849,7 @@ getblkx(struct vnode *vp, daddr_t blkno, daddr_t dblkn
> > > ...
> > > + /* Attempt lockless lookup first. */
> > > + bp = gbincore_unlocked(bo, blkno);
> > > + if (bp == NULL)
> > > + goto newbuf_unlocked;
> > > +
> > > + lockflags = LK_EXCLUSIVE | LK_SLEEPFAIL |
> > > + ((flags & GB_LOCK_NOWAIT) ? LK_NOWAIT : 0);
> > > +
> > > + error = BUF_TIMELOCK(bp, lockflags, NULL, "getblku", slpflag,
> > > + slptimeo);
> > I realized that this is not safe.  There is an ordering between buffer
> > types that defines the order in which buffer locks must be acquired.  For
> > instance, on UFS the critical order is inode buffer -> snaplk -> cg buffer,
> > or data block -> indirect data block.  Since buffer identity can change
> > under us, we might end up waiting for a lock of a type that is incompatible
> > with the currently owned lock.
> >
> > I think the easiest fix is to use LK_NOWAIT always; after all, it is a
> > lockless path.  The ERESTART/EINTR checks below can then be removed.
> 
> Thanks, that makes sense to me.  Please see
> https://reviews.freebsd.org/D25898 .
> 
> (For the UFS scenario, I think this requires an on-disk sector
> changing identity from one kind to another?  I believe lblknos are
> mostly statically typed in UFS, but it could happen with data blocks
> and indirect blocks?  Of course, UFS is not the only filesystem.)
For UFS, it is enough for a buffer to change its identity due to reclaim.
We do not pin buffers to a device/block number for their lifetime,
so a buffer might be freed and reassigned to a different block under us.
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r363482 - in head/sys: kern sys

2020-07-30 Thread Conrad Meyer
Hi Konstantin,

On Tue, Jul 28, 2020 at 11:42 AM Konstantin Belousov
 wrote:
>
> On Fri, Jul 24, 2020 at 05:34:05PM +, Conrad Meyer wrote:
> > ...
> > --- head/sys/kern/vfs_bio.c   Fri Jul 24 17:32:10 2020(r363481)
> > +++ head/sys/kern/vfs_bio.c   Fri Jul 24 17:34:04 2020(r363482)
> > @@ -3849,7 +3849,7 @@ getblkx(struct vnode *vp, daddr_t blkno, daddr_t dblkn
> > ...
> > + /* Attempt lockless lookup first. */
> > + bp = gbincore_unlocked(bo, blkno);
> > + if (bp == NULL)
> > + goto newbuf_unlocked;
> > +
> > + lockflags = LK_EXCLUSIVE | LK_SLEEPFAIL |
> > + ((flags & GB_LOCK_NOWAIT) ? LK_NOWAIT : 0);
> > +
> > + error = BUF_TIMELOCK(bp, lockflags, NULL, "getblku", slpflag,
> > + slptimeo);
> I realized that this is not safe.  There is an ordering between buffer
> types that defines the order in which buffer locks must be acquired.  For
> instance, on UFS the critical order is inode buffer -> snaplk -> cg buffer,
> or data block -> indirect data block.  Since buffer identity can change
> under us, we might end up waiting for a lock of a type that is incompatible
> with the currently owned lock.
>
> I think the easiest fix is to use LK_NOWAIT always; after all, it is a
> lockless path.  The ERESTART/EINTR checks below can then be removed.

Thanks, that makes sense to me.  Please see https://reviews.freebsd.org/D25898 .

(For the UFS scenario, I think this requires an on-disk sector
changing identity from one kind to another?  I believe lblknos are
mostly statically typed in UFS, but it could happen with data blocks
and indirect blocks?  Of course, UFS is not the only filesystem.)

Best regards,
Conrad


Re: svn commit: r363482 - in head/sys: kern sys

2020-07-28 Thread Konstantin Belousov
On Fri, Jul 24, 2020 at 05:34:05PM +, Conrad Meyer wrote:
> Author: cem
> Date: Fri Jul 24 17:34:04 2020
> New Revision: 363482
> URL: https://svnweb.freebsd.org/changeset/base/363482
> 
> Log:
>   Add unlocked/SMR fast path to getblk()
>   
>   Convert the bufobj tries to an SMR zone/PCTRIE and add a gbincore_unlocked()
>   API wrapping this functionality.  Use it for a fast path in getblkx(),
>   falling back to locked lookup if we raced a thread changing the buf's
>   identity.
>   
> >   Reported by:  Attilio
> >   Reviewed by:  kib, markj
> >   Testing:  pho (in progress)
> >   Sponsored by: Isilon
>   Differential Revision:  https://reviews.freebsd.org/D25782
> 
> Modified:
>   head/sys/kern/vfs_bio.c
>   head/sys/kern/vfs_subr.c
>   head/sys/sys/buf.h
> 
> Modified: head/sys/kern/vfs_bio.c
> ==
> --- head/sys/kern/vfs_bio.c   Fri Jul 24 17:32:10 2020(r363481)
> +++ head/sys/kern/vfs_bio.c   Fri Jul 24 17:34:04 2020(r363482)
> @@ -3849,7 +3849,7 @@ getblkx(struct vnode *vp, daddr_t blkno, daddr_t dblkn
>   struct buf *bp;
>   struct bufobj *bo;
>   daddr_t d_blkno;
> - int bsize, error, maxsize, vmio;
> + int bsize, error, maxsize, vmio, lockflags;
>   off_t offset;
>  
>   CTR3(KTR_BUF, "getblk(%p, %ld, %d)", vp, (long)blkno, size);
> @@ -3864,11 +3864,33 @@ getblkx(struct vnode *vp, daddr_t blkno, daddr_t dblkn
>  
>   bo = &vp->v_bufobj;
>   d_blkno = dblkno;
> +
> + /* Attempt lockless lookup first. */
> + bp = gbincore_unlocked(bo, blkno);
> + if (bp == NULL)
> + goto newbuf_unlocked;
> +
> + lockflags = LK_EXCLUSIVE | LK_SLEEPFAIL |
> + ((flags & GB_LOCK_NOWAIT) ? LK_NOWAIT : 0);
> +
> + error = BUF_TIMELOCK(bp, lockflags, NULL, "getblku", slpflag,
> + slptimeo);
I realized that this is not safe.  There is an ordering between buffer
types that defines the order in which buffer locks must be acquired.  For
instance, on UFS the critical order is inode buffer -> snaplk -> cg buffer,
or data block -> indirect data block.  Since buffer identity can change
under us, we might end up waiting for a lock of a type that is incompatible
with the currently owned lock.

I think the easiest fix is to use LK_NOWAIT always; after all, it is a
lockless path.  The ERESTART/EINTR checks below can then be removed.

> + if (error == EINTR || error == ERESTART)
> + return (error);
> + else if (error != 0)
> + goto loop;
> +
> + /* Verify buf identify has not changed since lookup. */
> + if (bp->b_bufobj == bo && bp->b_lblkno == blkno)
> + goto foundbuf_fastpath;
> +
> + /* It changed, fallback to locked lookup. */
> + BUF_UNLOCK_RAW(bp);
> +
>  loop:
>   BO_RLOCK(bo);
>   bp = gbincore(bo, blkno);
>   if (bp != NULL) {
> - int lockflags;
>   /*
>* Buffer is in-core.  If the buffer is not busy nor managed,
>* it must be on a queue.
> @@ -3890,8 +3912,10 @@ loop:
>   /* We timed out or were interrupted. */
>   else if (error != 0)
>   return (error);
> +
> +foundbuf_fastpath:
>   /* If recursed, assume caller knows the rules. */
> - else if (BUF_LOCKRECURSED(bp))
> + if (BUF_LOCKRECURSED(bp))
>   goto end;
>  
>   /*
> @@ -3989,6 +4013,7 @@ loop:
>* buffer is also considered valid (not marked B_INVAL).
>*/
>   BO_RUNLOCK(bo);
> +newbuf_unlocked:
>   /*
>* If the user does not want us to create the buffer, bail out
>* here.
> 
> Modified: head/sys/kern/vfs_subr.c
> ==
> --- head/sys/kern/vfs_subr.c  Fri Jul 24 17:32:10 2020(r363481)
> +++ head/sys/kern/vfs_subr.c  Fri Jul 24 17:34:04 2020(r363482)
> @@ -234,6 +234,7 @@ static struct mtx __exclusive_cache_line vnode_list_mt
>  struct nfs_public nfs_pub;
>  
>  static uma_zone_t buf_trie_zone;
> +static smr_t buf_trie_smr;
>  
>  /* Zone for allocation of new vnodes - used exclusively by getnewvnode() */
>  static uma_zone_t vnode_zone;
> @@ -491,17 +492,16 @@ static int vnsz2log;
>  static void *
>  buf_trie_alloc(struct pctrie *ptree)
>  {
> -
> - return uma_zalloc(buf_trie_zone, M_NOWAIT);
> + return (uma_zalloc_smr(buf_trie_zone, M_NOWAIT));
>  }
>  
>  static void
>  buf_trie_free(struct pctrie *ptree, void *node)
>  {
> -
> - uma_zfree(buf_trie_zone, node);
> + uma_zfree_smr(buf_trie_zone, node);
>  }
> -PCTRIE_DEFINE(BUF, buf, b_lblkno, buf_trie_alloc, buf_trie_free);
> +PCTRIE_DEFINE_SMR(BUF, buf, b_lblkno, buf_trie_alloc, buf_trie_free,
> +buf_trie_smr);
>  
>  /*
>   * Initialize the vnode management data structures.
> @@ -675,7 +675,8 @@ vntblinit(void *dummy 

svn commit: r363482 - in head/sys: kern sys

2020-07-24 Thread Conrad Meyer
Author: cem
Date: Fri Jul 24 17:34:04 2020
New Revision: 363482
URL: https://svnweb.freebsd.org/changeset/base/363482

Log:
  Add unlocked/SMR fast path to getblk()
  
  Convert the bufobj tries to an SMR zone/PCTRIE and add a gbincore_unlocked()
  API wrapping this functionality.  Use it for a fast path in getblkx(),
  falling back to locked lookup if we raced a thread changing the buf's
  identity.
  
  Reported by:  Attilio
  Reviewed by:  kib, markj
  Testing:  pho (in progress)
  Sponsored by: Isilon
  Differential Revision:https://reviews.freebsd.org/D25782

Modified:
  head/sys/kern/vfs_bio.c
  head/sys/kern/vfs_subr.c
  head/sys/sys/buf.h

Modified: head/sys/kern/vfs_bio.c
==
--- head/sys/kern/vfs_bio.c Fri Jul 24 17:32:10 2020(r363481)
+++ head/sys/kern/vfs_bio.c Fri Jul 24 17:34:04 2020(r363482)
@@ -3849,7 +3849,7 @@ getblkx(struct vnode *vp, daddr_t blkno, daddr_t dblkn
struct buf *bp;
struct bufobj *bo;
daddr_t d_blkno;
-   int bsize, error, maxsize, vmio;
+   int bsize, error, maxsize, vmio, lockflags;
off_t offset;
 
CTR3(KTR_BUF, "getblk(%p, %ld, %d)", vp, (long)blkno, size);
@@ -3864,11 +3864,33 @@ getblkx(struct vnode *vp, daddr_t blkno, daddr_t dblkn
 
	bo = &vp->v_bufobj;
d_blkno = dblkno;
+
+   /* Attempt lockless lookup first. */
+   bp = gbincore_unlocked(bo, blkno);
+   if (bp == NULL)
+   goto newbuf_unlocked;
+
+   lockflags = LK_EXCLUSIVE | LK_SLEEPFAIL |
+   ((flags & GB_LOCK_NOWAIT) ? LK_NOWAIT : 0);
+
+   error = BUF_TIMELOCK(bp, lockflags, NULL, "getblku", slpflag,
+   slptimeo);
+   if (error == EINTR || error == ERESTART)
+   return (error);
+   else if (error != 0)
+   goto loop;
+
+   /* Verify buf identify has not changed since lookup. */
+   if (bp->b_bufobj == bo && bp->b_lblkno == blkno)
+   goto foundbuf_fastpath;
+
+   /* It changed, fallback to locked lookup. */
+   BUF_UNLOCK_RAW(bp);
+
 loop:
BO_RLOCK(bo);
bp = gbincore(bo, blkno);
if (bp != NULL) {
-   int lockflags;
/*
 * Buffer is in-core.  If the buffer is not busy nor managed,
 * it must be on a queue.
@@ -3890,8 +3912,10 @@ loop:
/* We timed out or were interrupted. */
else if (error != 0)
return (error);
+
+foundbuf_fastpath:
/* If recursed, assume caller knows the rules. */
-   else if (BUF_LOCKRECURSED(bp))
+   if (BUF_LOCKRECURSED(bp))
goto end;
 
/*
@@ -3989,6 +4013,7 @@ loop:
 * buffer is also considered valid (not marked B_INVAL).
 */
BO_RUNLOCK(bo);
+newbuf_unlocked:
/*
 * If the user does not want us to create the buffer, bail out
 * here.

Modified: head/sys/kern/vfs_subr.c
==
--- head/sys/kern/vfs_subr.cFri Jul 24 17:32:10 2020(r363481)
+++ head/sys/kern/vfs_subr.cFri Jul 24 17:34:04 2020(r363482)
@@ -234,6 +234,7 @@ static struct mtx __exclusive_cache_line vnode_list_mt
 struct nfs_public nfs_pub;
 
 static uma_zone_t buf_trie_zone;
+static smr_t buf_trie_smr;
 
 /* Zone for allocation of new vnodes - used exclusively by getnewvnode() */
 static uma_zone_t vnode_zone;
@@ -491,17 +492,16 @@ static int vnsz2log;
 static void *
 buf_trie_alloc(struct pctrie *ptree)
 {
-
-   return uma_zalloc(buf_trie_zone, M_NOWAIT);
+   return (uma_zalloc_smr(buf_trie_zone, M_NOWAIT));
 }
 
 static void
 buf_trie_free(struct pctrie *ptree, void *node)
 {
-
-   uma_zfree(buf_trie_zone, node);
+   uma_zfree_smr(buf_trie_zone, node);
 }
-PCTRIE_DEFINE(BUF, buf, b_lblkno, buf_trie_alloc, buf_trie_free);
+PCTRIE_DEFINE_SMR(BUF, buf, b_lblkno, buf_trie_alloc, buf_trie_free,
+buf_trie_smr);
 
 /*
  * Initialize the vnode management data structures.
@@ -675,7 +675,8 @@ vntblinit(void *dummy __unused)
 */
buf_trie_zone = uma_zcreate("BUF TRIE", pctrie_node_size(),
NULL, NULL, pctrie_zone_init, NULL, UMA_ALIGN_PTR, 
-   UMA_ZONE_NOFREE);
+   UMA_ZONE_NOFREE | UMA_ZONE_SMR);
+   buf_trie_smr = uma_zone_get_smr(buf_trie_zone);
uma_prealloc(buf_trie_zone, nbuf);
 
vnodes_created = counter_u64_alloc(M_WAITOK);
@@ -2330,7 +2331,25 @@ gbincore(struct bufobj *bo, daddr_t lblkno)
	bp = BUF_PCTRIE_LOOKUP(&bo->bo_clean.bv_root, lblkno);
if (bp != NULL)
return (bp);
-   return BUF_PCTRIE_LOOKUP(&bo->bo_dirty.bv_root, lblkno);
+   return (BUF_PCTRIE_LOOKUP(&bo->bo_dirty.bv_root, lblkno));
+}
+
+/*
+ * Look up a buf using the buffer tries,