On 7/31/19 8:22 PM, Zebediah Figura wrote:
On 7/31/19 7:45 PM, Thomas Gleixner wrote:
If I assume a maximum of 65 futexes, which got mentioned in one of the
replies, then this will allocate 7280 bytes for the futex_q array alone,
with a stock Debian config which has no debug options enabled that would
bloat the struct. Adding the futex_wait_block array into the same
allocation pushes it past 8K, which already exceeds the limit of SLUB
kmem caches and forces the whole thing into the page allocator directly.
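
(For what it's worth, I assume the arithmetic there is roughly: struct
futex_q comes out to about 112 bytes on that config (7280 / 65), and if
the futex_wait_block of the proposed patch is a uaddr pointer plus two
u32s, i.e. 16 bytes on 64-bit, then

    65 * 112 = 7280 bytes   futex_q array
    65 *  16 = 1040 bytes   futex_wait_block array
    -----------------------
               8320 bytes

which is just over the 8192-byte SLUB kmem cache limit. Please correct
me if those sizes are off.)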

This sucks.

Also, I'm confused by the comment in one of the mails about the 64
maximum resulting in 65 futexes.

Can you please explain what exactly you are trying to do on the user
space side?

The extra futex comes from the fact that there are a couple of, as it
were, out-of-band ways to wake up a thread on Windows. [Specifically, a
thread can enter an "alertable" wait in which case it will be woken up
by a request from another thread to execute an "asynchronous procedure
call".] It's easiest for us to just add another futex to the list in
that case.
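
Schematically it looks something like the following. This is only a
sketch with made-up helper names, not the actual fsync.c code, and it
assumes a FUTEX_WAIT_MULTIPLE-style "wait on an array of futexes" call:

#include <time.h>

#define MAXIMUM_WAIT_OBJECTS 64   /* Windows limit on objects per wait */

struct futex_entry { int *addr; int val; };  /* address + expected value */

/* hypothetical helpers standing in for the real plumbing */
extern struct futex_entry object_futex( void *obj );
extern struct futex_entry apc_futex( void );
extern int futex_wait_multiple( const struct futex_entry *futexes,
                                int count, const struct timespec *timeout );

/* count is already capped at MAXIMUM_WAIT_OBJECTS by the caller */
static int wait_objects( void *objs[], int count, int alertable,
                         const struct timespec *timeout )
{
    struct futex_entry futexes[MAXIMUM_WAIT_OBJECTS + 1];
    int i, nfutexes = 0;

    for (i = 0; i < count; i++)      /* one futex per waited-on object */
        futexes[nfutexes++] = object_futex( objs[i] );

    if (alertable)                   /* extra futex so a queued APC can
                                      * wake the thread out of the wait */
        futexes[nfutexes++] = apc_futex();

    /* hence at most 64 + 1 = 65 futexes in a single wait */
    return futex_wait_multiple( futexes, nfutexes, timeout );
}

The real function is of course more involved, but that's where the 65th
futex comes from.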

To be clear, the 64/65 distinction is an implementation detail that's
pretty much outside the scope of this discussion. I should have just
said 65 directly. Sorry about that.


I'd also point out, for whatever it's worth, that while 64 is a hard
limit, real applications almost never come anywhere near it. By far the
most common number of primitives to wait on is one, and
performance-critical code tends never to wait on more than three. The
most I've ever seen is twelve.

If you'd like to see the user-side source, most of the relevant code is
at [1], in particular the functions __fsync_wait_objects() [line 712]
and do_single_wait() [line 655]. Please feel free to ask for further
clarification.

[1]
https://github.com/ValveSoftware/wine/blob/proton_4.11/dlls/ntdll/fsync.c




Thanks,

        tglx


