On Tue, Sep 13, 2005 at 10:10:13AM -0400, Tom Lane wrote:
> Marko Kreen <marko@l-t.ee> writes:
> > On Sun, Sep 11, 2005 at 05:59:49PM -0400, Tom Lane wrote:
> >> However, given that we are only expecting
> >> the spinlock to be held for a couple dozen instructions, using the
> >> kernel futex mechanism is huge overkill --- the in-kernel overhead
> >> to manage the futex state is almost certainly several orders of
> >> magnitude more than the delay we actually want.
>
> > Why do you think so?  AFAIK in the uncontended case there will be no
> > kernel access, only atomic inc/dec.
>
> In the uncontended case, we never even enter s_lock() and so the entire
> mechanism of yielding is irrelevant.  The problem that's being exposed
> by these test cases is that on multiprocessors, you can see a
> significant rate of spinlock contention (order of 100 events/second,
> which is still a tiny fraction of the number of TAS calls) and our
> existing mechanism for dealing with contention is just not efficient
> enough.
>
> > In the contended case you'll want a task switch anyway, so the futex
> > management should not matter.
>
> No, we DON'T want a task switch.  That's the entire point: in a
> multiprocessor, it's a good bet that the spinlock is held by a task
> running on another processor, and doing a task switch will take orders
> of magnitude longer than just spinning until the lock is released.
> You should yield only after spinning long enough to make it a strong
> probability that the spinlock is held by a process that's lost the
> CPU and needs to be rescheduled.
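(In code, the strategy you describe is roughly the spin-then-yield loop
below.  The names and constants are made up for illustration and I'm
using GCC-style atomics for brevity -- this is not the actual s_lock.c
code.)

#include <sched.h>
#include <stdlib.h>

#define SPINS_BEFORE_YIELD  100     /* spin this long before giving up the CPU */
#define MAX_YIELDS        10000     /* declare the lock stuck after this many yields */

typedef volatile int slock_t;

/* test-and-set: returns nonzero if the lock was already held */
static int
my_tas(slock_t *lock)
{
    return __sync_lock_test_and_set(lock, 1);
}

void
my_spin_lock(slock_t *lock)
{
    int     spins = 0;
    int     yields = 0;

    while (my_tas(lock) != 0)
    {
        if (++spins < SPINS_BEFORE_YIELD)
            continue;           /* holder is probably running on another CPU */

        /* holder has probably lost its CPU: let the scheduler run it */
        sched_yield();
        spins = 0;

        if (++yields > MAX_YIELDS)
            abort();            /* stands in for the "stuck spinlock" panic */
    }
}

void
my_spin_unlock(slock_t *lock)
{
    __sync_lock_release(lock);
}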
Hmm.  I guess this could be separated into 2 cases:

1. Light load - neither the lock owner nor the lock requester gets
   scheduled out while busy (owner in critical section, waiter
   spinning).

2. Big load - either or both of them get scheduled out while busy
   (the waiter is scheduled out by the OS, or voluntarily, e.g. by
   calling select()).

So my impression is that currently you optimize for #1 at the expense
of #2, while with futexes you'd optimize for #2 at the expense of #1.

Additionally, I'm pretty convinced that futexes give you the most
efficient implementation for #2, as the kernel knows which processes
are waiting on a particular lock, so it can make the best scheduling
decisions.  (See the rough sketch at the end of this mail.)

> > If you don't want Linux-specific locking in core code, then
> > it's another matter...
>
> Well, it's true, we don't particularly want a one-platform solution,
> but if it did what we wanted we might hold our noses and use it anyway.
>
> (I think, BTW, that using futexes at the spinlock level is misguided;
> what would be interesting would be to see if we could substitute for
> both LWLock and spinlock logic with one futex-based module.)

Use pthreads ;)

> >> I also saw fairly frequent "stuck spinlock" panics when running
> >> more queries than there were processors --- this despite increasing
> >> NUM_DELAYS to 10000 in s_lock.c.  So I don't trust sched_yield
> >> anymore.  Whatever it's doing in Linux 2.6 isn't what you'd expect.
> >> (I speculate that it's set up to only yield the processor to other
> >> processes already affiliated to that processor.  In any case, it
> >> is definitely capable of getting through 10000 yields without
> >> running the guy who's holding the spinlock.)
>
> > This is intended behaviour of sched_yield.
>
> > http://lwn.net/Articles/31462/
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=112432727428224&w=2
>
> No; that page still says specifically "So a process calling
> sched_yield() now must wait until all other runnable processes in the
> system have used up their time slices before it will get the processor
> again."  I can prove that that is NOT what happens, at least not on
> a multi-CPU Opteron with current FC4 kernel.  However, if the newer
> kernels penalize a process calling sched_yield as heavily as this page
> claims, then it's not what we want anyway ...

My fault.  When I saw that there was a problem with sched_yield, I said
"I bet this is because of the behaviour change" and only skimmed the
exact details.  But the point that sched_yield is not meant for such
usage still stands.

About the fast return from yielding, the comment on sys_sched_yield()
says:

 * sys_sched_yield - yield the current processor to other threads.
 *
 * this function yields the current CPU by moving the calling thread
 * to the expired array. If there are no other threads running on this
 * CPU then this function will return.

So there simply is nothing else to schedule on that CPU.

--
marko
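P.S.  By a futex-based implementation I mean roughly the scheme from
Ulrich Drepper's "Futexes Are Tricky" paper: the uncontended path is a
single compare-and-swap in userspace, and only contended waiters enter
the kernel, where they are queued on the lock address.  The sketch
below uses made-up names (futex_lock/futex_unlock) and GCC-style
atomics, and leaves out error handling -- it is not actual backend
code.

/* lock states: 0 = free, 1 = locked, 2 = locked and waiters may exist */

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static int
futex(int *uaddr, int op, int val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void
futex_lock(int *lock)
{
    int     c = __sync_val_compare_and_swap(lock, 0, 1);

    if (c == 0)
        return;                 /* fast path: no kernel entry at all */

    /* slow path: mark "waiters present" and sleep in the kernel */
    do
    {
        if (c == 2 || __sync_val_compare_and_swap(lock, 1, 2) != 0)
            futex(lock, FUTEX_WAIT, 2); /* kernel queues us on this address */
    } while ((c = __sync_val_compare_and_swap(lock, 0, 2)) != 0);
}

static void
futex_unlock(int *lock)
{
    /* if anyone may be waiting, wake exactly one of them */
    if (__sync_fetch_and_sub(lock, 1) != 1)
    {
        *lock = 0;
        futex(lock, FUTEX_WAKE, 1);
    }
}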