在 2017/2/10 上午4:53, Waiman Long 写道:
> On 02/07/2017 10:39 PM, Xinhui Pan wrote:
>>
>>
>> 2016-12-26 4:26 GMT+08:00 Waiman Long <[email protected]
>> <mailto:[email protected]>>:
>>
>> A number of cmpxchg calls in qspinlock_paravirt.h were replaced by more
>> relaxed versions to improve performance on architectures that use LL/SC.
>>
>> All the locking related cmpxchg's are replaced with the _acquire
>> variants:
>> - pv_queued_spin_steal_lock()
>> - trylock_clear_pending()
>>
>> The cmpxchg's related to hashing are replaced by either by the _release
>> or the _relaxed variants. See the inline comment for details.
>>
>> Signed-off-by: Waiman Long <[email protected]
>> <mailto:[email protected]>>
>>
>> v1->v2:
>> - Add comments in changelog and code for the rationale of the change.
>>
>> ---
>> kernel/locking/qspinlock_paravirt.h | 50
>> ++++++++++++++++++++++++-------------
>> 1 file changed, 33 insertions(+), 17 deletions(-)
>>
>>
>> @@ -323,8 +329,14 @@ static void pv_wait_node(struct mcs_spinlock *node,
>> struct mcs_spinlock *prev)
>> * If pv_kick_node() changed us to vcpu_hashed, retain
>> that
>> * value so that pv_wait_head_or_lock() knows to not
>> also try
>> * to hash this lock.
>> + *
>> + * The smp_store_mb() and control dependency above will
>> ensure
>> + * that state change won't happen before that.
>> Synchronizing
>> + * with pv_kick_node() wrt hashing by this waiter or by
>> the
>> + * lock holder is done solely by the state variable.
>> There is
>> + * no other ordering requirement.
>> */
>> - cmpxchg(&pn->state, vcpu_halted, vcpu_running);
>> + cmpxchg_relaxed(&pn->state, vcpu_halted, vcpu_running);
>>
>> /*
>> * If the locked flag is still not set after wakeup, it
>> is a
>> @@ -360,9 +372,12 @@ static void pv_kick_node(struct qspinlock *lock,
>> struct mcs_spinlock *node)
>> * pv_wait_node(). If OTOH this fails, the vCPU was running and
>> will
>> * observe its next->locked value and advance itself.
>> *
>> - * Matches with smp_store_mb() and cmpxchg() in pv_wait_node()
>> + * Matches with smp_store_mb() and cmpxchg_relaxed() in
>> pv_wait_node().
>> + * A release barrier is used here to ensure that node->locked is
>> + * always set before changing the state. See comment in
>> pv_wait_node().
>> */
>> - if (cmpxchg(&pn->state, vcpu_halted, vcpu_hashed) != vcpu_halted)
>> + if (cmpxchg_release(&pn->state, vcpu_halted, vcpu_hashed)
>> + != vcpu_halted)
>> return;
>>
>> hi, Waiman
>> We can't use _release here, a full barrier is needed.
>>
>> There is pv_kick_node vs pv_wait_head_or_lock
>>
>> [w] l->locked = _Q_SLOW_VAL //reordered here
>>
>> if (READ_ONCE(pn->state) == vcpu_hashed) //False.
>>
>> lp = (struct qspinlock **)1;
>>
>> [STORE] pn->state = vcpu_hashed lp = pv_hash(lock,
>> pn);
>> pv_hash() if
>> (xchg(&l->locked, _Q_SLOW_VAL) == 0) // fasle, not unhashed.
>>
>> Then the same lock has hashed twice but only unhashed once. So at last as
>> the hash table grows big, we hit RCU stall.
>>
>> I hit RCU stall when I run netperf benchmark
>>
>> thanks
>> xinhui
>>
>>
>> --
>> 1.8.3.1
>>
>>
> Yes, I know I am being too aggressive in this patch. I am going to tone it
> down a bit. I just don't have time to run a performance test on PPC system to
> verify the gain yet. I am planning to send an updated patch soon.
>
hi, All
I guess I have found the scenario that causes the RCU stall.
pv_wait_node
[L] pn->state // this load is reordered from cmxchg_release.
smp_store_mb(pn->state, vcpu_halted);
if (!READ_ONCE(node->locked))
arch_mcs_spin_unlock_contended(&next->locked);
pv_kick_node
[-L]cmpxchg_release(&pn->state, vcpu_halted, vcpu_hashed)
//cmpxchg_release fails, so pn->state keep as it is.
pv_wait(&pn->state, vcpu_halted);
//on PPC, It will not return until pn->state != vcpu_halted.
And when rcu stall hit, I fire an BUG(), and enter debug mode, it seems most
cpus are in pv_wait...
So the soltuion to solve this problems is simple, keep the cmpxchg as it is in
pv_kick_node, cmpxchg on ppc provides full barriers.
thanks
xinhui
> Cheers,
> Longman
>