On 21/12/22(Wed) 09:20, David Hill wrote:
> 
> 
> On 12/21/22 07:08, David Hill wrote:
> > 
> > 
> > On 12/21/22 05:33, Martin Pieuchot wrote:
> > > On 18/12/22(Sun) 20:55, Martin Pieuchot wrote:
> > > > On 17/12/22(Sat) 14:15, David Hill wrote:
> > > > > 
> > > > > 
> > > > > On 10/28/22 03:46, Renato Aguiar wrote:
> > > > > > > Use of the bbolt Go library causes 7.2 to freeze. I suspect it is
> > > > > > > triggering some sort of deadlock in mmap because threads get stuck
> > > > > > > at vmmaplk.
> > > > > > 
> > > > > > > I managed to reproduce it consistently on a laptop with 4 cores
> > > > > > > (i5-1135G7) using one unit test from bbolt:
> > > > > > 
> > > > > >     $ doas pkg_add git go
> > > > > >     $ git clone https://github.com/etcd-io/bbolt.git
> > > > > >     $ cd bbolt
> > > > > >     $ git checkout v1.3.6
> > > > > >     $ go test -v -run TestSimulate_10000op_10p
> > > > > > 
> > > > > > The test never ends and this is the 'top' report:
> > > > > > 
> > > > > > >   PID      TID PRI NICE  SIZE   RES STATE     WAIT      TIME    CPU COMMAND
> > > > > > > 32181   438138 -18    0   57M   13M idle      uvn_fls   0:00  0.00% bbolt.test
> > > > > > > 32181   331169  10    0   57M   13M sleep/1   nanoslp   0:00  0.00% bbolt.test
> > > > > > > 32181   497390  10    0   57M   13M idle      vmmaplk   0:00  0.00% bbolt.test
> > > > > > > 32181   380477  14    0   57M   13M idle      vmmaplk   0:00  0.00% bbolt.test
> > > > > > > 32181   336950  14    0   57M   13M idle      vmmaplk   0:00  0.00% bbolt.test
> > > > > > > 32181   491043  14    0   57M   13M idle      vmmaplk   0:00  0.00% bbolt.test
> > > > > > > 32181   347071   2    0   57M   13M idle      kqread    0:00  0.00% bbolt.test
> > > > > > 
> > > > > > > After this, most commands just hang. For example, running
> > > > > > > 'ps | grep foo' in another shell hangs too.
> > > > > > 
> > > > > 
> > > > > > I can reproduce this on MP, but not SP.  Here is /trace from ddb
> > > > > > after using the ddb.trigger sysctl.  Is there any other information
> > > > > > I could pull from DDB that may help?
> > > > 
> > > > Thanks for the useful report David!
> > > > 
> > > > The issue seems to be a deadlock between the `vmmaplk' and a particular
> > > > `vmobjlock'.  uvm_map_clean() calls uvn_flush() which sleeps with the
> > > > `vmmaplk' held.
> > > > 
> > > > I'll think a bit about this and try to come up with a fix ASAP.
> > > 
> > > I'm missing a piece of information.  All the threads in your report seem
> > > to want a read version of the `vmmaplk' so they should not block.  Could
> > > > you reproduce the hang with a WITNESS kernel and print 'show all locks'
> > > > in addition to all the information you've reported?
> > > 
> > 
> > Sure.  It's always the same; 2 processes (sysctl and bbolt.test) and 3
> > locks (sysctllk, kernel_lock, and vmmaplk) with bbolt.test always on the
> > uvn_flsh thread.
> > 
> > 
> > Process 98301 (sysctl) thread 0xfff......
> > exclusive rwlock sysctllk r = 0 (0xfffff...)
> > exclusive kernel_lock &kernel_lock r = 0 (0xffffff......)
> > Process 32181 (bbolt.test) thread (0xffffff...) (438138)
> > shared rwlock vmmaplk r = 0 (0xfffff......)
> > 
> > To reproduce, just do:
> > $ doas pkg_add git go
> > $ git clone https://github.com/etcd-io/bbolt.git
> > $ cd bbolt
> > $ git checkout v1.3.6
> > $ go test -v -run TestSimulate_10000op_10p
> > 
> > The hang will happen almost instantly.
> > 
> 
> Not sure if this is a hint..
> 
> https://github.com/etcd-io/bbolt/blob/master/db.go#L27-L31
> 
> // IgnoreNoSync specifies whether the NoSync field of a DB is ignored when
> // syncing changes to a file.  This is required as some operating systems,
> // such as OpenBSD, do not have a unified buffer cache (UBC) and writes
> // must be synchronized using the msync(2) syscall.
> const IgnoreNoSync = runtime.GOOS == "openbsd"

Yes, the issue is related to msync(2).  Could you try the diff below (it
is not a fix) and tell me if you can still reproduce the issue with it?
I can't.

Index: kern/kern_rwlock.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_rwlock.c,v
retrieving revision 1.48
diff -u -p -r1.48 kern_rwlock.c
--- kern/kern_rwlock.c  10 May 2022 16:56:16 -0000      1.48
+++ kern/kern_rwlock.c  21 Dec 2022 16:14:44 -0000
@@ -61,7 +61,7 @@ rw_cas(volatile unsigned long *p, unsign
  *
  * RW_WRITE    The lock must be completely empty. We increment it with
  *             RWLOCK_WRLOCK and the proc pointer of the holder.
- *             Sets RWLOCK_WAIT|RWLOCK_WRWANT while waiting.
+ *             Sets RWLOCK_WAIT while waiting.
  * RW_READ     RWLOCK_WRLOCK|RWLOCK_WRWANT may not be set. We increment
  *             with RWLOCK_READ_INCR. RWLOCK_WAIT while waiting.
  */
@@ -75,7 +75,7 @@ static const struct rwlock_op {
        {       /* RW_WRITE */
                RWLOCK_WRLOCK,
                ULONG_MAX,
-               RWLOCK_WAIT | RWLOCK_WRWANT,
+               RWLOCK_WAIT,
                1,
                PLOCK - 4
        },
