On Sat, Jun 02, 2018 at 03:08:14PM -0700, Philip Guenther wrote:
> On Sat, 2 Jun 2018, Christophe Prévotaux wrote:
> > This is a witness report I got on boot with snapshot Jun 1st amd64
> >
> > root on sd0a (9b49e3196b9bfae8.a) swap on sd0b dump on sd0b
> > lock order reversal:
> >  1st 0xffffff021cdac180 vmmaplk (&map->lock) @ /usr/src/sys/uvm/uvm_map.c:4433
> >  2nd 0xffffff01dc5f71a8 inode (&ip->i_lock)
> 
> I believe uvm and the vnode layer handle this correctly, with lock tries 
> that fall back to releasing the other lock and retrying so progress is 
> made.  The fix for WITNESS complaining is to mark vmmaplk as a vnode lock.

I think there is an actual issue because the locking calls are
unconditional. FreeBSD appears to work around the problem by unlocking
the vm_map when calling the pager. The diff below adapts that logic
to OpenBSD.

Because the temporary unlocking may allow another thread to change the
vm_map, the code has to check whether the map has been altered while it
was unlocked and, if so, recover. The patch uses a best-effort approach:
if the map's timestamp has changed, iteration resumes from the vm_map
entry found at the end address of the current entry. The sanity checks
done at the start of uvm_map_clean() are not rerun.

The system call that triggers the reversal is msync(2), and the
reversal can be reproduced with the sys/kern/mmap regression test.
sys/kern/mmap3 shows that there is another similar reversal with
mlock(2) which is not covered by the patch.

Index: uvm/uvm_map.c
===================================================================
RCS file: src/sys/uvm/uvm_map.c,v
retrieving revision 1.237
diff -u -p -r1.237 uvm_map.c
--- uvm/uvm_map.c       18 Apr 2018 16:05:21 -0000      1.237
+++ uvm/uvm_map.c       3 Jun 2018 10:42:59 -0000
@@ -4420,8 +4420,11 @@ uvm_map_clean(struct vm_map *map, vaddr_
        struct vm_page *pg;
        struct uvm_object *uobj;
        vaddr_t cp_start, cp_end;
+       vaddr_t ent_start, ent_end;
+       voff_t ent_offset;
        int refs;
        int error;
+       unsigned int timestamp;
        boolean_t rv;
 
        KASSERT((flags & (PGO_FREE|PGO_DEACTIVATE)) !=
@@ -4450,8 +4453,7 @@ uvm_map_clean(struct vm_map *map, vaddr_
        }
 
        error = 0;
-       for (entry = first; entry != NULL && entry->start < end;
-           entry = RBT_NEXT(uvm_map_addr, entry)) {
+       for (entry = first; entry != NULL && entry->start < end; ) {
                amap = entry->aref.ar_amap;     /* top layer */
                if (UVM_ET_ISOBJ(entry))
                        uobj = entry->object.uvm_obj;
@@ -4546,13 +4548,27 @@ flush_object:
                    ((flags & PGO_FREE) == 0 ||
                     ((entry->max_protection & PROT_WRITE) != 0 &&
                      (entry->etype & UVM_ET_COPYONWRITE) == 0))) {
+                       ent_start = entry->start;
+                       ent_end = entry->end;
+                       ent_offset = entry->offset;
+                       timestamp = map->timestamp;
+                       vm_map_unlock_read(map);
+
                        rv = uobj->pgops->pgo_flush(uobj,
-                           cp_start - entry->start + entry->offset,
-                           cp_end - entry->start + entry->offset, flags);
+                           cp_start - ent_start + ent_offset,
+                           cp_end - ent_start + ent_offset, flags);
 
                        if (rv == FALSE)
                                error = EFAULT;
-               }
+
+                       vm_map_lock_read(map);
+                       if (timestamp != map->timestamp)
+                               entry = uvm_map_entrybyaddr(&map->addr,
+                                   ent_end);
+                       else
+                               entry = RBT_NEXT(uvm_map_addr, entry);
+               } else
+                       entry = RBT_NEXT(uvm_map_addr, entry);
        }
 
        vm_map_unlock_read(map);
