On 06/14/2013 07:57 AM, Paul E. McKenney wrote:
> On Fri, Jun 14, 2013 at 07:25:57AM +0800, Lai Jiangshan wrote:
>> On Thu, Jun 13, 2013 at 11:22 PM, Paul E. McKenney
>> <paul...@linux.vnet.ibm.com> wrote:
>>> On Thu, Jun 13, 2013 at 10:55:41AM +0800, Lai Jiangshan wrote:
>>>> On 06/12/2013 11:40 PM, Paul E. McKenney wrote:
>>>>> Breaking up locks is better than implementing high-contention locks,
>>>>> but if we must have high-contention locks, why not make them
>>>>> automatically switch between light-weight ticket locks at low
>>>>> contention and queued locks at high contention?  After all, this
>>>>> would remove the need for the developer to predict which locks will
>>>>> be highly contended.
>>>>>
>>>>> This commit allows ticket locks to automatically switch between pure
>>>>> ticketlock and queued-lock operation as needed.  If too many CPUs are
>>>>> spinning on a given ticket lock, a queue structure will be allocated
>>>>> and the lock will switch to queued-lock operation.  When the lock
>>>>> becomes free, it will switch back into ticketlock operation.  The
>>>>> low-order bit of the head counter is used to indicate that the lock
>>>>> is in queued mode, which forces an unconditional mismatch between the
>>>>> head and tail counters.  This approach means that the common-case
>>>>> code path under conditions of low contention is very nearly that of
>>>>> a plain ticket lock.
>>>>>
>>>>> A fixed number of queueing structures is statically allocated in an
>>>>> array.  The ticket-lock address is used to hash into an initial
>>>>> element, but if that element is already in use, the search moves to
>>>>> the next element.  If the entire array is already in use, the lock
>>>>> continues to spin in ticket mode.
>>>>>
>>>>> Signed-off-by: Paul E. McKenney <paul...@linux.vnet.ibm.com>
>>>>> [ paulmck: Eliminate duplicate code and update comments (Steven Rostedt). ]
>>>>> [ paulmck: Address Eric Dumazet review feedback. ]
>>>>> [ paulmck: Use Lai Jiangshan idea to eliminate smp_mb(). ]
>>>>> [ paulmck: Expand ->head_tkt from s32 to s64 (Waiman Long). ]
>>>>> [ paulmck: Move cpu_relax() to main spin loop (Steven Rostedt). ]
>>>>> [ paulmck: Reduce queue-switch contention (Waiman Long). ]
>>>>> [ paulmck: __TKT_SPIN_INC for __ticket_spin_trylock() (Steffen Persvold). ]
>>>>> [ paulmck: Type safety fixes (Steven Rostedt). ]
>>>>> [ paulmck: Pre-check cmpxchg() value (Waiman Long). ]
>>>>> [ paulmck: smp_mb() downgrade to smp_wmb() (Lai Jiangshan). ]
>>>>
>>>> Hi, Paul,
>>>>
>>>> I simplified the code and removed the search by encoding the index of
>>>> struct tkt_q_head into lock->tickets.head.
>>>>
>>>> A) lock->tickets.head (when in queued-lock mode):
>>>>
>>>>    ---------------------------------
>>>>    index of struct tkt_q_head |0|1|
>>>>    ---------------------------------
>>>
>>> Interesting approach!  It might reduce queued-mode overhead a bit in
>>> some cases, though I bet that in the common case the first queue
>>> element examined is the right one.  More on this below.
>>>
>>>> Bit 0 = 1 means the lock is queued, and bit 1 = 0 is reserved for
>>>> __ticket_spin_unlock(); thus __ticket_spin_unlock() will not change
>>>> the higher bits of lock->tickets.head.
>>>>
>>>> B) tqhp->head holds the real value of lock->tickets.head.  If the
>>>> last bit of tqhp->head is 1, that ticket was the head ticket when
>>>> queueing started.
>>>
>>> But don't you also need the xadd() in __ticket_spin_unlock() to become
>>> a cmpxchg() for this to work?  Or is your patch missing your changes to
>>> arch/x86/include/asm/spinlock.h?  Either way, this is likely to
>>> increase the no-contention overhead, which might be counterproductive.
>>> Wouldn't hurt to get measurements, though.
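For concreteness, here is a minimal user-space sketch of the head
encoding described in A) above.  The helper names, the two-bit shift,
and the 32-bit width are illustrative assumptions, not details taken
from the posted patch:

	#include <stdint.h>

	#define TKT_Q_QUEUED	0x1  /* bit 0: lock is in queued mode */
	#define TKT_Q_UNLOCK	0x2  /* bit 1: reserved for __ticket_spin_unlock() */

	/* Pack a queue index into the head counter: index in the high
	 * bits, bit 1 clear, bit 0 set to mark queued mode. */
	static inline uint32_t tkt_q_encode(uint32_t index)
	{
		return (index << 2) | TKT_Q_QUEUED;
	}

	/* Recover the queue index from a queued-mode head value. */
	static inline uint32_t tkt_q_decode(uint32_t head)
	{
		return head >> 2;
	}

	/* Nonzero if the head value marks the lock as queued. */
	static inline int tkt_q_is_queued(uint32_t head)
	{
		return head & TKT_Q_QUEUED;
	}

Because the index lives entirely above bit 1, any update confined to
bit 1 leaves both the queued flag and the index intact.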
>> No, we don't need to change __ticket_spin_unlock() in my idea.
>> Bit 1 in tickets.head is reserved for __ticket_spin_unlock();
>> __ticket_spin_unlock() only changes bit 1, so it will not change
>> the higher bits.  tkt_q_do_wake() will restore tickets.head.
>>
>> This approach avoids a cmpxchg in __ticket_spin_unlock().
>
> Ah, I did miss that.  But doesn't the adjustment in __ticket_spin_lock()
> need to be atomic in order to handle concurrent invocations of
> __ticket_spin_lock()?
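Lai's claim about the unlock path can be illustrated with a toy
user-space program.  It assumes the encoding sketched earlier and a
ticket increment of 2 when queuing is configured (in the spirit of
__TKT_SPIN_INC in the patch above); neither detail is taken verbatim
from Lai's code:

	#include <assert.h>
	#include <stdint.h>

	int main(void)
	{
		/* Queued mode, queue index 5: index in the high bits,
		 * bit 1 clear (awaiting an unlock), bit 0 set (queued). */
		uint32_t head = (5u << 2) | 0x1;

		/* What __ticket_spin_unlock()'s xadd would do: add the
		 * ticket increment of 2, which lands entirely in bit 1. */
		head += 2;

		assert(head & 0x1);		/* queued flag intact */
		assert((head >> 2) == 5);	/* index bits intact  */

		/* tkt_q_do_wake() would then store the real head value
		 * back, clearing bit 1 again before the next unlock. */
		return 0;
	}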
I don't understand: are we discussing __ticket_spin_unlock() only, or
does my suggestion hurt __ticket_spin_lock() as well?

> Either way, it would be good to see the performance effects of this.
>
>							Thanx, Paul