Marko Kreen <marko@l-t.ee> writes:
> Hmm.  I guess this could be separated into 2 cases:
> 1. Light load - both lock owner and lock requester wont get
>    scheduled while busy (owner in critical section, waiter
>    spinning.)
> 2. Big load - either or both of them gets scheduled while busy.
>    (waiter is scheduled by OS or voluntarily by eg. calling select())

Don't forget that the coding rules for our spinlocks say that you
mustn't hold any such lock for more than a couple dozen instructions,
and certainly any kernel call while holding the lock is Right Out.
There is *no* case where the holder of a spinlock is going to
voluntarily give up the CPU.  The design intention was that the odds
of losing the CPU while holding a spinlock would be negligibly small,
simply because we don't hold it very long.

> About fast yielding, comment on sys_sched_yield() says:
>  * sys_sched_yield - yield the current processor to other threads.
>  *
>  * this function yields the current CPU by moving the calling thread
>  * to the expired array. If there are no other threads running on this
>  * CPU then this function will return.

Mph.  So that's pretty much exactly what I suspected...

I just had a thought: it seems that the reason we are seeing a
significant issue here is that on SMP machines, the cost of trading
exclusively-owned cache lines back and forth between processors is so
high that the TAS instructions (specifically the xchgb, in the x86
cases) represent a significant fraction of backend execution time all
by themselves.  (We know this is the case due to oprofile results, see
discussions from last April.)  What that means is that there's a fair
chance of a process losing its timeslice immediately after the xchgb.
Which is precisely the scenario we do not want, if the process
successfully acquired the spinlock by means of the xchgb.

We could ameliorate this if there were a way to acquire ownership of
the cache line without necessarily winning the spinlock.
I'm imagining that we insert a "dummy" locked instruction just ahead
of the xchgb, which touches the spinlock in such a way as to not
change its state.  (xchgb won't do for this, but maybe one of the
other lockable instructions will.)  We do the xchgb just after this
one.  The idea is that if we don't own the cache line, the first
instruction causes it to be faulted into the processor's cache, and if
our timeslice expires while that is happening, we lose the processor
without having acquired the spinlock.  This assumes that once we've
got the cache line, the xchgb that actually does the work can get
executed with not much extra time spent and only low probability of
someone else stealing the cache line back first.

The fact that cmpb isn't helping proves that getting the cache line in
a read-only fashion does *not* do enough to protect the xchgb in this
way.  But maybe another locking instruction would.

Comments?

			regards, tom lane
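
For concreteness, here is a minimal sketch of what such a TAS might look
like.  It is only an illustration under stated assumptions, not a tested
proposal: the choice of "lock; orb $0" as the dummy locked instruction is
an assumption (the message above does not name one), and the names
tas_with_dummy, s_unlock and tas_selfcheck are hypothetical.  On non-x86
targets the sketch falls back to GCC/Clang __atomic builtins, which may
not compile down to the same instruction pair.  Whether this actually
reduces the chance of being descheduled while holding the lock would have
to be measured.

```c
#include <assert.h>

typedef unsigned char slock_t;

/*
 * Test-and-set preceded by a "dummy" locked instruction.
 * Returns 0 if the lock was acquired, nonzero if it was already held.
 */
static inline int
tas_with_dummy(volatile slock_t *lock)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
	slock_t		res = 1;

	__asm__ __volatile__(
		/* ASSUMPTION: locked OR with 0 takes the cache line exclusively
		 * but leaves the lock byte unchanged */
		"	lock; orb	$0,%1	\n"
		/* the real test-and-set: swap 1 into the lock, fetch old value */
		"	lock; xchgb	%0,%1	\n"
:		"+q"(res), "+m"(*lock)
:
:		"memory", "cc");
	return (int) res;
#else
	/* Portable fallback: same logic, instruction selection is up to
	 * the compiler */
	(void) __atomic_fetch_or(lock, 0, __ATOMIC_SEQ_CST);
	return (int) __atomic_exchange_n(lock, 1, __ATOMIC_SEQ_CST);
#endif
}

/* Release the lock by storing 0 with release semantics. */
static inline void
s_unlock(volatile slock_t *lock)
{
	__atomic_store_n(lock, 0, __ATOMIC_RELEASE);
}

/* Single-threaded sanity check of the acquire/release protocol. */
void
tas_selfcheck(void)
{
	volatile slock_t lock = 0;

	assert(tas_with_dummy(&lock) == 0);	/* free lock: acquired */
	assert(tas_with_dummy(&lock) != 0);	/* held lock: refused */
	s_unlock(&lock);
	assert(tas_with_dummy(&lock) == 0);	/* free again: acquired */
}
```

The self-check only confirms that the dummy instruction does not disturb
the lock state; it says nothing about cache-line behavior or timeslice
effects, which is the part that would need oprofile-style measurement on
an SMP machine.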