Re: [PATCH RFC 2/2] futex: Implement mechanism to wait on any of several futexes

Zebediah Figura Wed, 31 Jul 2019 16:02:36 -0700

On 7/31/19 5:39 PM, Thomas Gleixner wrote:

On Wed, 31 Jul 2019, Zebediah Figura wrote:

On 7/31/19 7:06 AM, Peter Zijlstra wrote:

On Tue, Jul 30, 2019 at 06:06:02PM -0400, Gabriel Krisman Bertazi wrote:

This is a new futex operation, called FUTEX_WAIT_MULTIPLE, which allows
a thread to wait on several futexes at the same time, and be awoken by
any of them.  In a sense, it implements one of the features that was
supported by polling on the old FUTEX_FD interface.


My use case for this operation lies in Wine, where we want to implement
a similar interface available in Windows, used mainly for event
handling.  The wine folks have an implementation that uses eventfd, but
it suffers from FD exhaustion (I was told they have applications that go
to the order of multi-million FDs), and higher CPU utilization.


So is multi-million the range we expect for @count ?


Not in Wine's case; in fact Wine has a hard limit of 64 synchronization
primitives that can be waited on at once (which, with the current user-side
code, translates into 65 futexes). The exhaustion just had to do with the
number of primitives created; some programs seem to leak them badly.


And how is the futex approach better suited to 'fix' resource leaks?

The crucial constraints for implementing Windows synchronization
primitives in Wine are that (a) it must be possible to access them from
multiple processes and (b) it must be possible to wait on more than one
at a time.

The current best solution for this, performance-wise, backs each Windows
synchronization primitive with an eventfd(2) descriptor and uses poll(2)
to select on them. Windows programs can create an apparently unbounded
number of synchronization objects, though they can only wait on up to 64
at a time. However, on Linux the NOFILE limit causes problems; some
distributions have it as low as 4096 by default, which is too low even
for some modern programs that don't leak objects.
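
As a rough illustration of that scheme (a minimal sketch only, not
Wine's actual code; the fd_event names and the manual-reset behaviour
are assumptions made for the example), each object owns one eventfd and
a wait-any is a poll(2) over up to 64 of them:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_WAIT_OBJECTS 64 /* Windows limit on objects per wait */

/* Hypothetical object type: one descriptor per synchronization object. */
struct fd_event {
    int fd;
};

/* Create an unsignaled event. Fails with -1 once descriptors run out
 * (for example EMFILE when the NOFILE limit is reached). */
int fd_event_init(struct fd_event *ev)
{
    ev->fd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
    return ev->fd < 0 ? -1 : 0;
}

/* Signal the event: the counter becomes nonzero and the fd readable. */
int fd_event_set(struct fd_event *ev)
{
    uint64_t one = 1;
    return write(ev->fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Wait until any event is signaled. Returns the index of the first
 * signaled event, or -1 on error or timeout. The signal is not consumed. */
int fd_event_wait_any(struct fd_event *evs, int count, int timeout_ms)
{
    struct pollfd pfds[MAX_WAIT_OBJECTS];

    assert(evs != NULL);
    if (count < 1 || count > MAX_WAIT_OBJECTS)
        return -1;

    for (int i = 0; i < count; i++) {
        pfds[i].fd = evs[i].fd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    if (poll(pfds, (nfds_t)count, timeout_ms) <= 0)
        return -1;

    for (int i = 0; i < count; i++)
        if (pfds[i].revents & POLLIN)
            return i;
    return -1;
}
```

Every live object costs a descriptor here whether or not anyone is
waiting on it, which is where the NOFILE limit bites.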

The approach we are developing, that relies on this patch, backs each
object with a single futex whose value represents its signaled state.
Therefore the only resource we are at risk of running out of is
available memory, which exists in far greater quantities than available
descriptors. [Presumably Windows synchronization primitives require at
least some kernel memory to be allocated per object as well, so this
puts us essentially at parity, for whatever that's worth.]

To be clear, I think the primary impetus for developing the futex-based
approach was performance; it lets us avoid some system calls in hot
paths (e.g. waiting on an already signaled object, or resetting the state
of an object to unsignaled). In that respect we're trying to get ahead
of Windows, I guess. But we have still been encountering occasional grief
due to NOFILE limits that are too low, so this is another helpful benefit.
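
To make the fast path concrete, here is a minimal sketch, again not
Wine's actual code. The fast_event names and the wait_syscalls counter
are hypothetical and exist only for illustration, and the sketch uses
the existing single-futex FUTEX_WAIT and FUTEX_WAKE operations rather
than the multi-futex wait proposed in this patch. The object is one
32-bit word; waiting on an already signaled object and resetting it
never enter the kernel:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical manual-reset event backed by a single futex word. */
struct fast_event {
    _Atomic uint32_t signaled;          /* 0 = unsignaled, 1 = signaled */
    _Atomic unsigned long wait_syscalls; /* illustration: FUTEX_WAIT calls */
};

/* The kernel treats the futex as a plain 32-bit word. */
static_assert(sizeof(_Atomic uint32_t) == sizeof(uint32_t),
              "futex word must be exactly 32 bits");

/* Signal: publish the new state, then wake every sleeper. Non-private
 * futex operations are used so the word may live in shared memory. */
void fast_event_set(struct fast_event *ev)
{
    atomic_store(&ev->signaled, 1);
    syscall(SYS_futex, &ev->signaled, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
}

/* Reset: a plain atomic store, no system call. */
void fast_event_reset(struct fast_event *ev)
{
    atomic_store(&ev->signaled, 0);
}

/* Wait: check the word in user space first and only sleep in the
 * kernel while it still reads 0. Returns 0 on success, -1 on error. */
int fast_event_wait(struct fast_event *ev)
{
    while (atomic_load(&ev->signaled) == 0) {
        atomic_fetch_add(&ev->wait_syscalls, 1);
        /* FUTEX_WAIT returns EAGAIN if the word is no longer 0. */
        if (syscall(SYS_futex, &ev->signaled, FUTEX_WAIT, 0,
                    NULL, NULL, 0) == -1 &&
            errno != EAGAIN && errno != EINTR)
            return -1;
    }
    return 0;
}
```

Each object costs only the memory for its word, so a program that
creates a very large number of them is limited by memory rather than by
NOFILE. Waiting on several such words at once is the part that needs
the new operation.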
