Re: svn commit: r363482 - in head/sys: kern sys
On Thu, Jul 30, 2020 at 03:13:19PM -0700, Conrad Meyer wrote:
> Hi Konstantin,
>
> On Tue, Jul 28, 2020 at 11:42 AM Konstantin Belousov wrote:
> >
> > On Fri, Jul 24, 2020 at 05:34:05PM +0000, Conrad Meyer wrote:
> > > ...
> > > --- head/sys/kern/vfs_bio.c	Fri Jul 24 17:32:10 2020	(r363481)
> > > +++ head/sys/kern/vfs_bio.c	Fri Jul 24 17:34:04 2020	(r363482)
> > > @@ -3849,7 +3849,7 @@ getblkx(struct vnode *vp, daddr_t blkno, daddr_t dblkn
> > > ...
> > > +	/* Attempt lockless lookup first. */
> > > +	bp = gbincore_unlocked(bo, blkno);
> > > +	if (bp == NULL)
> > > +		goto newbuf_unlocked;
> > > +
> > > +	lockflags = LK_EXCLUSIVE | LK_SLEEPFAIL |
> > > +	    ((flags & GB_LOCK_NOWAIT) ? LK_NOWAIT : 0);
> > > +
> > > +	error = BUF_TIMELOCK(bp, lockflags, NULL, "getblku", slpflag,
> > > +	    slptimeo);
> > I realized that this is not safe.  There is an ordering between buffer
> > types that defines which order buffer locks should obey.  For instance,
> > on UFS the critical order is inode buffer -> snaplk -> cg buffer, or
> > data block -> indirect data block.  Since buffer identity can change under
> > us, we might end up waiting for a lock of a type that is incompatible with
> > the currently owned lock.
> >
> > I think the easiest fix is to use LK_NOWAIT always; after all, it is the
> > lockless path.  The ERESTART/EINTR checks below then can be removed.
>
> Thanks, that makes sense to me.  Please see
> https://reviews.freebsd.org/D25898 .
>
> (For the UFS scenario, I think this requires an on-disk sector
> changing identity from one kind to another?  I believe lblknos are
> mostly statically typed in UFS, but it could happen with data blocks
> and indirect blocks?  Of course, UFS is not the only filesystem.)

For UFS, it is enough for a buffer to change its identity due to reclaim.
We do not pin buffers to a device/block number for their lifetime, so a
buffer might be freed and reassigned to a different block under us.
___ svn-src-all@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"
Re: svn commit: r363482 - in head/sys: kern sys
Hi Konstantin,

On Tue, Jul 28, 2020 at 11:42 AM Konstantin Belousov wrote:
>
> On Fri, Jul 24, 2020 at 05:34:05PM +0000, Conrad Meyer wrote:
> > ...
> > --- head/sys/kern/vfs_bio.c	Fri Jul 24 17:32:10 2020	(r363481)
> > +++ head/sys/kern/vfs_bio.c	Fri Jul 24 17:34:04 2020	(r363482)
> > @@ -3849,7 +3849,7 @@ getblkx(struct vnode *vp, daddr_t blkno, daddr_t dblkn
> > ...
> > +	/* Attempt lockless lookup first. */
> > +	bp = gbincore_unlocked(bo, blkno);
> > +	if (bp == NULL)
> > +		goto newbuf_unlocked;
> > +
> > +	lockflags = LK_EXCLUSIVE | LK_SLEEPFAIL |
> > +	    ((flags & GB_LOCK_NOWAIT) ? LK_NOWAIT : 0);
> > +
> > +	error = BUF_TIMELOCK(bp, lockflags, NULL, "getblku", slpflag,
> > +	    slptimeo);
> I realized that this is not safe.  There is an ordering between buffer
> types that defines which order buffer locks should obey.  For instance,
> on UFS the critical order is inode buffer -> snaplk -> cg buffer, or
> data block -> indirect data block.  Since buffer identity can change under
> us, we might end up waiting for a lock of a type that is incompatible with
> the currently owned lock.
>
> I think the easiest fix is to use LK_NOWAIT always; after all, it is the
> lockless path.  The ERESTART/EINTR checks below then can be removed.

Thanks, that makes sense to me.  Please see
https://reviews.freebsd.org/D25898 .

(For the UFS scenario, I think this requires an on-disk sector
changing identity from one kind to another?  I believe lblknos are
mostly statically typed in UFS, but it could happen with data blocks
and indirect blocks?  Of course, UFS is not the only filesystem.)

Best regards,
Conrad
Re: svn commit: r363482 - in head/sys: kern sys
On Fri, Jul 24, 2020 at 05:34:05PM +0000, Conrad Meyer wrote:
> Author: cem
> Date: Fri Jul 24 17:34:04 2020
> New Revision: 363482
> URL: https://svnweb.freebsd.org/changeset/base/363482
>
> Log:
>   Add unlocked/SMR fast path to getblk()
>
>   Convert the bufobj tries to an SMR zone/PCTRIE and add a gbincore_unlocked()
>   API wrapping this functionality.  Use it for a fast path in getblkx(),
>   falling back to locked lookup if we raced a thread changing the buf's
>   identity.
>
>   Reported by:	Attilio
>   Reviewed by:	kib, markj
>   Testing:	pho (in progress)
>   Sponsored by:	Isilon
>   Differential Revision:	https://reviews.freebsd.org/D25782
>
> Modified:
>   head/sys/kern/vfs_bio.c
>   head/sys/kern/vfs_subr.c
>   head/sys/sys/buf.h
>
> Modified: head/sys/kern/vfs_bio.c
> ==============================================================================
> --- head/sys/kern/vfs_bio.c	Fri Jul 24 17:32:10 2020	(r363481)
> +++ head/sys/kern/vfs_bio.c	Fri Jul 24 17:34:04 2020	(r363482)
> @@ -3849,7 +3849,7 @@ getblkx(struct vnode *vp, daddr_t blkno, daddr_t dblkn
>  	struct buf *bp;
>  	struct bufobj *bo;
>  	daddr_t d_blkno;
> -	int bsize, error, maxsize, vmio;
> +	int bsize, error, maxsize, vmio, lockflags;
>  	off_t offset;
>
>  	CTR3(KTR_BUF, "getblk(%p, %ld, %d)", vp, (long)blkno, size);
> @@ -3864,11 +3864,33 @@ getblkx(struct vnode *vp, daddr_t blkno, daddr_t dblkn
>
>  	bo = &vp->v_bufobj;
>  	d_blkno = dblkno;
> +
> +	/* Attempt lockless lookup first. */
> +	bp = gbincore_unlocked(bo, blkno);
> +	if (bp == NULL)
> +		goto newbuf_unlocked;
> +
> +	lockflags = LK_EXCLUSIVE | LK_SLEEPFAIL |
> +	    ((flags & GB_LOCK_NOWAIT) ? LK_NOWAIT : 0);
> +
> +	error = BUF_TIMELOCK(bp, lockflags, NULL, "getblku", slpflag,
> +	    slptimeo);

I realized that this is not safe.  There is an ordering between buffer
types that defines which order buffer locks should obey.  For instance,
on UFS the critical order is inode buffer -> snaplk -> cg buffer, or
data block -> indirect data block.
Since buffer identity can change under us, we might end up waiting for a
lock of a type that is incompatible with the currently owned lock.

I think the easiest fix is to use LK_NOWAIT always; after all, it is the
lockless path.  The ERESTART/EINTR checks below then can be removed.

> +	if (error == EINTR || error == ERESTART)
> +		return (error);
> +	else if (error != 0)
> +		goto loop;
> +
> +	/* Verify buf identify has not changed since lookup. */
> +	if (bp->b_bufobj == bo && bp->b_lblkno == blkno)
> +		goto foundbuf_fastpath;
> +
> +	/* It changed, fallback to locked lookup. */
> +	BUF_UNLOCK_RAW(bp);
> +
>  loop:
>  	BO_RLOCK(bo);
>  	bp = gbincore(bo, blkno);
>  	if (bp != NULL) {
> -		int lockflags;
>  		/*
>  		 * Buffer is in-core.  If the buffer is not busy nor managed,
>  		 * it must be on a queue.
> @@ -3890,8 +3912,10 @@ loop:
>  		/* We timed out or were interrupted. */
>  		else if (error != 0)
>  			return (error);
> +
> +foundbuf_fastpath:
>  		/* If recursed, assume caller knows the rules. */
> -		else if (BUF_LOCKRECURSED(bp))
> +		if (BUF_LOCKRECURSED(bp))
>  			goto end;
>
>  		/*
> @@ -3989,6 +4013,7 @@ loop:
>  	 * buffer is also considered valid (not marked B_INVAL).
>  	 */
>  	BO_RUNLOCK(bo);
> +newbuf_unlocked:
>  	/*
>  	 * If the user does not want us to create the buffer, bail out
>  	 * here.
>
> Modified: head/sys/kern/vfs_subr.c
> ==============================================================================
> --- head/sys/kern/vfs_subr.c	Fri Jul 24 17:32:10 2020	(r363481)
> +++ head/sys/kern/vfs_subr.c	Fri Jul 24 17:34:04 2020	(r363482)
> @@ -234,6 +234,7 @@ static struct mtx __exclusive_cache_line vnode_list_mt
>  struct nfs_public nfs_pub;
>
>  static uma_zone_t buf_trie_zone;
> +static smr_t buf_trie_smr;
>
>  /* Zone for allocation of new vnodes - used exclusively by getnewvnode() */
>  static uma_zone_t vnode_zone;
> @@ -491,17 +492,16 @@ static int vnsz2log;
>  static void *
>  buf_trie_alloc(struct pctrie *ptree)
>  {
> -
> -	return uma_zalloc(buf_trie_zone, M_NOWAIT);
> +	return (uma_zalloc_smr(buf_trie_zone, M_NOWAIT));
>  }
>
>  static void
>  buf_trie_free(struct pctrie *ptree, void *node)
>  {
> -
> -	uma_zfree(buf_trie_zone, node);
> +	uma_zfree_smr(buf_trie_zone, node);
>  }
> -PCTRIE_DEFINE(BUF, buf, b_lblkno, buf_trie_alloc, buf_trie_free);
> +PCTRIE_DEFINE_SMR(BUF, buf, b_lblkno, buf_trie_alloc, buf_trie_free,
> +    buf_trie_smr);
>
>  /*
>   * Initialize the vnode management data structures.
> @@ -675,7 +675,8 @@ vntblinit(void *dummy
svn commit: r363482 - in head/sys: kern sys
Author: cem
Date: Fri Jul 24 17:34:04 2020
New Revision: 363482
URL: https://svnweb.freebsd.org/changeset/base/363482

Log:
  Add unlocked/SMR fast path to getblk()

  Convert the bufobj tries to an SMR zone/PCTRIE and add a gbincore_unlocked()
  API wrapping this functionality.  Use it for a fast path in getblkx(),
  falling back to locked lookup if we raced a thread changing the buf's
  identity.

  Reported by:	Attilio
  Reviewed by:	kib, markj
  Testing:	pho (in progress)
  Sponsored by:	Isilon
  Differential Revision:	https://reviews.freebsd.org/D25782

Modified:
  head/sys/kern/vfs_bio.c
  head/sys/kern/vfs_subr.c
  head/sys/sys/buf.h

Modified: head/sys/kern/vfs_bio.c
==============================================================================
--- head/sys/kern/vfs_bio.c	Fri Jul 24 17:32:10 2020	(r363481)
+++ head/sys/kern/vfs_bio.c	Fri Jul 24 17:34:04 2020	(r363482)
@@ -3849,7 +3849,7 @@ getblkx(struct vnode *vp, daddr_t blkno, daddr_t dblkn
 	struct buf *bp;
 	struct bufobj *bo;
 	daddr_t d_blkno;
-	int bsize, error, maxsize, vmio;
+	int bsize, error, maxsize, vmio, lockflags;
 	off_t offset;

 	CTR3(KTR_BUF, "getblk(%p, %ld, %d)", vp, (long)blkno, size);
@@ -3864,11 +3864,33 @@ getblkx(struct vnode *vp, daddr_t blkno, daddr_t dblkn

 	bo = &vp->v_bufobj;
 	d_blkno = dblkno;
+
+	/* Attempt lockless lookup first. */
+	bp = gbincore_unlocked(bo, blkno);
+	if (bp == NULL)
+		goto newbuf_unlocked;
+
+	lockflags = LK_EXCLUSIVE | LK_SLEEPFAIL |
+	    ((flags & GB_LOCK_NOWAIT) ? LK_NOWAIT : 0);
+
+	error = BUF_TIMELOCK(bp, lockflags, NULL, "getblku", slpflag,
+	    slptimeo);
+	if (error == EINTR || error == ERESTART)
+		return (error);
+	else if (error != 0)
+		goto loop;
+
+	/* Verify buf identify has not changed since lookup. */
+	if (bp->b_bufobj == bo && bp->b_lblkno == blkno)
+		goto foundbuf_fastpath;
+
+	/* It changed, fallback to locked lookup. */
+	BUF_UNLOCK_RAW(bp);
+
 loop:
 	BO_RLOCK(bo);
 	bp = gbincore(bo, blkno);
 	if (bp != NULL) {
-		int lockflags;
 		/*
 		 * Buffer is in-core.  If the buffer is not busy nor managed,
 		 * it must be on a queue.
@@ -3890,8 +3912,10 @@ loop:
 		/* We timed out or were interrupted. */
 		else if (error != 0)
 			return (error);
+
+foundbuf_fastpath:
 		/* If recursed, assume caller knows the rules. */
-		else if (BUF_LOCKRECURSED(bp))
+		if (BUF_LOCKRECURSED(bp))
 			goto end;

 		/*
@@ -3989,6 +4013,7 @@ loop:
 	 * buffer is also considered valid (not marked B_INVAL).
 	 */
 	BO_RUNLOCK(bo);
+newbuf_unlocked:
 	/*
 	 * If the user does not want us to create the buffer, bail out
 	 * here.

Modified: head/sys/kern/vfs_subr.c
==============================================================================
--- head/sys/kern/vfs_subr.c	Fri Jul 24 17:32:10 2020	(r363481)
+++ head/sys/kern/vfs_subr.c	Fri Jul 24 17:34:04 2020	(r363482)
@@ -234,6 +234,7 @@ static struct mtx __exclusive_cache_line vnode_list_mt
 struct nfs_public nfs_pub;

 static uma_zone_t buf_trie_zone;
+static smr_t buf_trie_smr;

 /* Zone for allocation of new vnodes - used exclusively by getnewvnode() */
 static uma_zone_t vnode_zone;
@@ -491,17 +492,16 @@ static int vnsz2log;
 static void *
 buf_trie_alloc(struct pctrie *ptree)
 {
-
-	return uma_zalloc(buf_trie_zone, M_NOWAIT);
+	return (uma_zalloc_smr(buf_trie_zone, M_NOWAIT));
 }

 static void
 buf_trie_free(struct pctrie *ptree, void *node)
 {
-
-	uma_zfree(buf_trie_zone, node);
+	uma_zfree_smr(buf_trie_zone, node);
 }
-PCTRIE_DEFINE(BUF, buf, b_lblkno, buf_trie_alloc, buf_trie_free);
+PCTRIE_DEFINE_SMR(BUF, buf, b_lblkno, buf_trie_alloc, buf_trie_free,
+    buf_trie_smr);

 /*
  * Initialize the vnode management data structures.
@@ -675,7 +675,8 @@ vntblinit(void *dummy __unused)
 	 */
 	buf_trie_zone = uma_zcreate("BUF TRIE", pctrie_node_size(),
 	    NULL, NULL, pctrie_zone_init, NULL, UMA_ALIGN_PTR,
-	    UMA_ZONE_NOFREE);
+	    UMA_ZONE_NOFREE | UMA_ZONE_SMR);
+	buf_trie_smr = uma_zone_get_smr(buf_trie_zone);
 	uma_prealloc(buf_trie_zone, nbuf);

 	vnodes_created = counter_u64_alloc(M_WAITOK);
@@ -2330,7 +2331,25 @@ gbincore(struct bufobj *bo, daddr_t lblkno)
 	bp = BUF_PCTRIE_LOOKUP(&bo->bo_clean.bv_root, lblkno);
 	if (bp != NULL)
 		return (bp);
-	return BUF_PCTRIE_LOOKUP(&bo->bo_dirty.bv_root, lblkno);
+	return (BUF_PCTRIE_LOOKUP(&bo->bo_dirty.bv_root, lblkno));
+}
+
+/*
+ * Look up a buf using the buffer tries,