On Fri, Jan 17, 2014 at 07:45:15PM -0800, Roland McGrath wrote:
> > This is why I was insisting on passing *memory* through IPC.
>
> It's not at all clear that makes any kind of sense, unless you mean
> something I haven't imagined. Can you be specific about exactly what the
> interface (say, a well-commented MiG .defs fragment) you have in mind
> would look like?
>
> If it's an RPC that passes out of line memory, that (IIRC) always has
> virtual-copy semantics, never page-sharing semantics. So it would be
> fundamentally the wrong model for matching up with other futex calls
> (from the same task or others) to synchronize on a shared int, which is
> what the futex semantics is all about.
That's right, IPC can only copy private memory, not make it shared. So
technically, the sharing would happen not through IPC, but through the VM
system.

> What I always anticipated for a Machish futex interface was vm_futex_*
> calls, which is to say, technically RPCs to the task port (which need
> not be the task port of the caller), passing an address as an integer
> literal just as calls like vm_write do (and each compare&exchange value
> as a datum, i.e. an integer literal, just as vm_write takes a datum of
> byte-array type, with semantics unchanged by whether that's inline or
> out of line memory).

That's more what I had in mind too.

> The task port and address serve as a proxy by which the kernel finds
> the memory object and offset, and the actual synchronization semantics
> are about that offset in that memory object and the contents of the
> word at that location. (Like all such calls, they would likely be
> optimized especially for the case of calls to task-self and probably
> even to the extent of having a bespoke syscall for the most-optimized
> case, as with vm_allocate. But that's later optimization.)
>
> Given the specified usage patterns for the futex operations, it might
> be reasonable enough to implement those semantics solely by translating
> to a physical page, including blocking to fault one in, and then
> associating the wait queues with offsets into the physical page rather
> than the memory object abstraction. (Both a waiter and a waker will
> have just faulted in the page before making the futex call anyway.)
> But note that the semantics require that if a waiter was blocked when
> the virtual page got paged out, then when you page it back in inside
> vm_futex_wake, that old waiter must get woken. I don't know the
> kernel's VM internals much at all, but I suspect that all tasks mapping
> a shared page do not get eagerly updated when the memory object page is
> paged in to service a page fault in some other task, but rather service
> minor faults on demand (i.e. later) to rediscover the new association
> between the virtual page and the new physical page incidentally brought
> in by someone else's page fault a little earlier. Since you need to
> track waiters at the memory object level while their page is
> nonresident anyway, it probably makes sense just to hang the
> {offset => wait queue} table off the memory object and always use that.
> At least, that seems like the approach for the first version that
> ensures correctness in all the corners of the semantics. It can get
> fancier as needed in later optimizations. When it comes to optimizing
> it, a fairly deep understanding of the Linux futex implementation
> (which I don't have off hand, though I have read it in the past) is
> probably instructive.

Locking physical pages could be used for denial of service: a user could
implicitly starve the system of wired memory unless those pages are
accounted as such, but then users might be "randomly" unable to use
mutexes. Coping with object/offset-to-page associations would imply a
container very similar to what has been considered until now for the
regular case (that is, a hash table or tree for all shared futexes), so
that futexes can immediately be reassociated with pages after faulting
them in. So I expect we have to use VM objects.

What I had in mind is already partially explained in previous mails, but
I still haven't taken the time to get a clear view of every use case, so
it's probably incomplete. It would start with a union of either
(task translated to map, address) or (object, offset), depending on the
futex type (private or shared, respectively).

Problems I can see with that approach are:

- Do we have to check that shared futexes refer to shareable memory?
- If so, how can that check be made reliably?
- What happens when unmapping a futex?
- Does copy-on-write have any effect on a private futex? If implemented
  as a (map, address) pair, I imagine it wouldn't, but is that true?
These are the kind of things I was hoping to discuss with this patch.

-- 
Richard Braun