On Wed, Feb 21, 2024 at 9:44 PM Samuel Thibault <samuel.thiba...@gnu.org> wrote:
> Damien Zammit, on Wed, Feb 21, 2024 at 13:45:59 +0000, wrote:
> > Authored-by: Sergey Bugaev <buga...@gmail.com>
> IIRC you got to see which kind of deadlock we are protecting from
> here? It'd be very useful to document it here, so the next people
> reading this get to know.

I didn't document the exact deadlock in the code, since the deadlock is
only a manifestation of the actual issue: the entry was being modified
by other threads while it was being accessed inside vm_fault_wire().
That is, for one thing, technically undefined behavior, but it will
also _of course_ break any kind of logic that works with the entry's
offset / address range. It could be worse than a deadlock.

So what I wanted to emphasize in the code is that we should make sure
the entry is not concurrently read and modified, and I was going to
describe the actual deadlock we've witnessed in the commit message. So
let's go with that: I'll re-send this patch myself with a detailed
commit message and a reworked comment.

> > [...] We trust that the
> >  * kernel threads are well-behaved, and therefore will
> >  * not do anything destructive to this region of the map
> >  * while we have it unlocked.
>
> Are we not here precisely in the case that they are not well-behaved?

No. The threads are well-behaved, but it turns out that's not quite
enough: as the new comment says, in the face of clipping and coalescing
of entries, even well-behaved, well-intentioned operations on
*adjacent* VM regions can affect the same entry. In fact, the entry can
get freed altogether if it gets coalesced into the previous entry,
which would create a use-after-free.

> The comment probably deserves an update, to make the situation
> clearer.

I thought it was clear enough, but evidently it wasn't. I'll amend it.

> > +	 * Once we unlock the map, even well-intentioned operations
> > +	 * on adjacent VM regions can end up affecting our entry,
> > +	 * due to clipping and coalescing entries. So, make a
> > +	 * temporary copy of the entry, and pass that to vm_fault_wire()
> > +	 * instead of the original.
>
> We need to be very careful with such a thing. If it's a copy that is
> given to further function calls, we need to make sure that they won't
> try to modify it, i.e. add const qualifiers on the entry parameter all
> the way down to vm_fault_wire and the callees it passes the entry to.

That's a good idea, yes.

> That being said, this doesn't seem completely safe to me: in the end
> vm_fault_wire_fast uses the object field; are we sure that the object
> won't disappear? Or are we sure it won't because "kernel threads are
> well-behaved"? (But I don't really know how we are sure of this, so
> this probably also deserves documenting.)

We are sure because kernel threads (or rather, any threads running in
the kernel) are well-behaved, yes. We trust that they won't unmap the
mapping while it's being wired, and the mapping (the entry) keeps the
object alive. But... now that I think of it, we do also coalesce
objects, so we had better keep a second reference on the object over
vm_fault_wire indeed.

> > +	 *
> >	 * HACK HACK HACK HACK
> >	 */
> >	if (vm_map_pmap(map) == kernel_pmap) {
> > +		/*
> > +		 * TODO: Support wiring more than one entry
> > +		 * in the kernel map at a time.
> > +		 */
> > +		assert(start->vme_next == end);
>
> Are we sure that this currently is never to happen?

Not really. I don't expect the kernel to do this anywhere, but I don't
know for sure. Please try running some complex workloads with this
patch and see whether you can catch the assertion failing. There isn't
anything fundamentally incompatible with supporting multi-entry wiring
here; I just didn't want to do an alloca.
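Coming back to the object lifetime point above: keeping that second
reference could look roughly like the sketch below. The names are the
real GNU Mach ones (entry->object.vm_object, vm_object_reference(),
vm_object_deallocate(), VM_OBJECT_NULL), but treat this as an
illustration of the idea, not the patch I'll actually send:

    struct vm_map_entry  entry_copy;
    vm_object_t          object;

    entry_copy = *start;                /* private snapshot of the entry */
    object = entry_copy.object.vm_object;
    if (object != VM_OBJECT_NULL)
        vm_object_reference(object);    /* pin it while the map is unlocked */

    vm_map_unlock(map);
    vm_fault_wire(map, &entry_copy);    /* only ever touches the copy */
    vm_map_lock(map);

    if (object != VM_OBJECT_NULL)
        vm_object_deallocate(object);   /* drop the extra reference */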
> > +	entry_copy = *start;
> >	vm_map_unlock(map);	/* trust me ... */
> > -	} else {
> > -		vm_map_lock_set_recursive(map);
> > -		vm_map_lock_write_to_read(map);
> > +	vm_fault_wire(map, &entry_copy);
>
> This is also assuming that for non-kernel_pmap there is just one entry
> for which to call vm_fault_wire, while userland can very well ask for
> wiring a wide range of memory, spanning various objects and whatnot.

No, this is still the kernel-map branch (note that the "else" is being
deleted by the patch).

> AIUI you'd rather want the converse: inline the code for the
> kernel_pmap case, which is much more trivial since start->vme_next ==
> end, and thus there is no for loop to perform: just call vm_fault_wire
> with the locking stuff around it, and be done.

That's exactly what this patch does, yes.
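For readers following along, the kernel-map branch after the patch ends
up looking roughly like this (reassembled from the hunks quoted above,
with the local spelled out; a sketch, not a verbatim quote of the tree):

    struct vm_map_entry  entry_copy;    /* lives on the stack */

    if (vm_map_pmap(map) == kernel_pmap) {
        /*
         * TODO: Support wiring more than one entry
         * in the kernel map at a time.
         */
        assert(start->vme_next == end);

        /*
         * Wire a private copy of the entry, so that clipping
         * and coalescing of the real entry cannot affect us
         * while the map is unlocked.
         */
        entry_copy = *start;
        vm_map_unlock(map);             /* trust me ... */
        vm_fault_wire(map, &entry_copy);
        vm_map_lock(map);
    }
    /* The user-map path, with its read lock and its loop over
       all the entries in the range, follows separately. */

Sergey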