I wrote:
> We could ameliorate this if there were a way to acquire ownership of the
> cache line without necessarily winning the spinlock.  I'm imagining
> that we insert a "dummy" locked instruction just ahead of the xchgb,
> which touches the spinlock in such a way as to not change its state.

I tried this, using this tas code:

    static __inline__ int
    tas(volatile slock_t *lock)
    {
        register slock_t _res = 1;
        register slock_t _dummy = 0;

        /* Use a locking test before trying to take the spinlock */
        /* xchg implies a LOCK prefix, so no need to say LOCK for it */
        __asm__ __volatile__(
            "   lock            \n"
            "   xaddb   %2,%1   \n"
            "   xchgb   %0,%1   \n"
    :       "+q"(_res), "+m"(*lock), "+q"(_dummy)
    :
    :       "memory", "cc");
        return (int) _res;
    }

At least on Opteron, it's a loser.  The previous best results (with
slock-no-cmpb and spin-delay patches) were

    1    31s
    2    42s
    4    51s
    8    100s

and with this instead of slock-no-cmpb,

    1    33s
    2    45s
    4    55s
    8    104s

The xadd may indeed be helping in terms of protecting the xchg from
end-of-timeslice --- the rate of select() delays is really tiny, one
every few seconds, which is better than I saw before.  But the extra
cost of the extra locked operation isn't getting repaid overall.
Feel free to try it on other hardware, but it doesn't look promising.

BTW, I also determined that on that 4-way Opteron box, the integer
modulo idea doesn't make any difference --- that is, spin-delay and
what Michael called spin-delay-2 are the same speed.  I think I had
tried the modulo before adding the variable spin delay, and it did
help in that configuration; but most likely, it was just helping by
stretching out the amount of time spent looping before entering the
kernel.  So we can drop that idea too.

			regards, tom lane
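
PS: for anyone who wants to experiment with the dummy-locked-operation idea on a platform without this inline assembly, here is a minimal sketch of the same semantics in C11 atomics.  The names sketch_slock_t and sketch_tas are hypothetical and are not part of the tas code above.  Note also that a compiler is free to lower the add-of-zero to something other than a locked xadd, so this illustrates the intended behavior only and implies nothing about cache-line ownership or timing on any particular hardware.

```c
#include <stdatomic.h>

/* Hypothetical lock type for this sketch; not the real slock_t. */
typedef atomic_uchar sketch_slock_t;

/*
 * Test-and-set preceded by a "dummy" atomic read-modify-write.
 * Returns 0 if the lock was free (and is now held by the caller),
 * nonzero if it was already held.
 */
static inline int
sketch_tas(sketch_slock_t *lock)
{
    /*
     * Dummy atomic add of zero: touches the lock word with a
     * read-modify-write but leaves its value unchanged.
     * Assumption: the compiler emits a real locked RMW here;
     * it is not required to.
     */
    (void) atomic_fetch_add_explicit(lock, 0, memory_order_relaxed);

    /* The actual test-and-set: store 1, return the previous value. */
    return (int) atomic_exchange_explicit(lock, 1, memory_order_acquire);
}

/* Release the lock by storing 0. */
static inline void
sketch_unlock(sketch_slock_t *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

As with tas(), a zero return means the caller got the lock; a nonzero return means it should spin or back off and retry.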