Marko Kreen <marko@l-t.ee> writes:
> On Sun, Sep 11, 2005 at 05:59:49PM -0400, Tom Lane wrote:
>> However, given that we are only expecting
>> the spinlock to be held for a couple dozen instructions, using the
>> kernel futex mechanism is huge overkill --- the in-kernel overhead
>> to manage the futex state is almost certainly several orders of
>> magnitude more than the delay we actually want.

> Why do you think so?  AFAIK on uncontented case there will be no
> kernel access, only atomic inc/dec.

In the uncontended case, we never even enter s_lock() and so the entire
mechanism of yielding is irrelevant.  The problem that's being exposed
by these test cases is that on multiprocessors, you can see a
significant rate of spinlock contention (order of 100 events/second,
which is still a tiny fraction of the number of TAS calls), and our
existing mechanism for dealing with contention is just not efficient
enough.

> On contented case you'll want task switch anyway, so the futex
> managing should not matter.

No, we DON'T want a task switch.  That's the entire point: in a
multiprocessor, it's a good bet that the spinlock is held by a task
running on another processor, and doing a task switch will take orders
of magnitude longer than just spinning until the lock is released.
You should yield only after spinning long enough to make it a strong
probability that the spinlock is held by a process that's lost the
CPU and needs to be rescheduled.
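To be concrete, the behavior we're after is roughly this (a hand-wavy
sketch, not the actual s_lock.c code --- the spin count, the TAS
implementation, and the use of sched_yield() here are all just
placeholders):

#include <sched.h>

#define SPINS_BEFORE_YIELD	100		/* made-up number, for illustration */

typedef volatile int slock_t;

static int
tas(slock_t *lock)
{
	/* atomic test-and-set: returns zero iff we acquired the lock */
	return __sync_lock_test_and_set(lock, 1);
}

void
spin_acquire(slock_t *lock)
{
	int		spins = 0;

	while (tas(lock))
	{
		/*
		 * The holder is probably running on another CPU and will let go
		 * within a few dozen instructions, so mostly we just spin.
		 */
		if (++spins >= SPINS_BEFORE_YIELD)
		{
			/*
			 * By now the holder has probably lost its CPU and needs to be
			 * rescheduled; give up the processor rather than burn cycles.
			 */
			sched_yield();
			spins = 0;
		}
	}
}

void
spin_release(slock_t *lock)
{
	__sync_lock_release(lock);
}

The point is that the yield is a last resort, not the first response
to contention.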
> If you don't want Linux-specific locking in core code, then
> it's another matter...

Well, it's true, we don't particularly want a one-platform solution,
but if it did what we wanted we might hold our noses and use it anyway.

(I think, BTW, that using futexes at the spinlock level is misguided;
what would be interesting would be to see if we could substitute for
both LWLock and spinlock logic with one futex-based module.)

>> I also saw fairly frequent "stuck spinlock" panics when running
>> more queries than there were processors --- this despite increasing
>> NUM_DELAYS to 10000 in s_lock.c.  So I don't trust sched_yield
>> anymore.  Whatever it's doing in Linux 2.6 isn't what you'd expect.
>> (I speculate that it's set up to only yield the processor to other
>> processes already affiliated to that processor.  In any case, it
>> is definitely capable of getting through 10000 yields without
>> running the guy who's holding the spinlock.)

> This is intended behaviour of sched_yield.

> http://lwn.net/Articles/31462/
> http://marc.theaimsgroup.com/?l=linux-kernel&m=112432727428224&w=2

No; that page still says specifically "So a process calling
sched_yield() now must wait until all other runnable processes in the
system have used up their time slices before it will get the processor
again."  I can prove that that is NOT what happens, at least not on
a multi-CPU Opteron with current FC4 kernel.  However, if the newer
kernels penalize a process calling sched_yield as heavily as this page
claims, then it's not what we want anyway ...

			regards, tom lane
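PS: for clarity, the primitive such a futex-based module would sit on
is the standard three-state futex lock described in the kernel folks'
writeups (0 = free, 1 = locked, 2 = locked with waiters): uncontended
acquire and release are one atomic op apiece, and the kernel is entered
only when there is an actual waiter.  The sketch below is only meant to
show the shape of the thing --- Linux-specific, no error handling,
certainly not proposed code:

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static long
sys_futex(int *uaddr, int op, int val)
{
	/* there is no glibc wrapper for futex(2), so go through syscall() */
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

void
futex_lock_acquire(int *lock)
{
	/* fast path: 0 -> 1 with no kernel entry at all */
	int		c = __sync_val_compare_and_swap(lock, 0, 1);

	while (c != 0)
	{
		/* mark the lock contended, then sleep until it is released */
		if (c == 2 || __sync_val_compare_and_swap(lock, 1, 2) != 0)
			sys_futex(lock, FUTEX_WAIT, 2);
		c = __sync_val_compare_and_swap(lock, 0, 2);
	}
}

void
futex_lock_release(int *lock)
{
	if (__sync_fetch_and_sub(lock, 1) != 1)
	{
		/* at least one waiter: reset the lock and wake exactly one */
		*lock = 0;
		sys_futex(lock, FUTEX_WAKE, 1);
	}
}

Whether building LWLock semantics on top of something like that beats
what we have now is exactly the experiment that would be interesting.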