On 07/13/2015 08:02 AM, Peter Zijlstra wrote:
> On Sat, Jul 11, 2015 at 04:36:52PM -0400, Waiman Long wrote:
> > @@ -181,9 +187,9 @@ static void pv_wait_node(struct mcs_spinlock *node)
> >  		pv_wait(&pn->state, vcpu_halted);
> >
> >  		/*
> > -		 * Reset the vCPU state to avoid unncessary CPU kicking
> > +		 * Reset the state except when vcpu_hashed is set.
> >  		 */
> > -		WRITE_ONCE(pn->state, vcpu_running);
> > +		cmpxchg(&pn->state, vcpu_halted, vcpu_running);
>
> Why? Suppose we did get advanced into the hashed state, and then get a
> (spurious) wakeup, this means we'll observe our ->locked == 1 condition
> and fall out of pv_wait_node().
>
> We'll then enter pv_wait_head(), which with your modification:
>
> > @@ -229,19 +244,42 @@ static void pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node)
> >  {
> >  	struct pv_node *pn = (struct pv_node *)node;
> >  	struct __qspinlock *l = (void *)lock;
> > -	struct qspinlock **lp = NULL;
> > +	struct qspinlock **lp;
> >  	int loop;
> >
> > +	/*
> > +	 * Initialize lp to a non-NULL value if it has already been in the
> > +	 * pv_hashed state so that pv_hash() won't be called again.
> > +	 */
> > +	lp = (READ_ONCE(pn->state) == vcpu_hashed) ? (struct qspinlock **)1
> > +						   : NULL;
> >  	for (;;) {
> > +		WRITE_ONCE(pn->state, vcpu_running);
>
> Will instantly and unconditionally write vcpu_running.

This code is kind of complicated. I am going to get rid of the current tri-state setup and switch to a separate sync variable for deferred kicking.
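
To make the sequence discussed above concrete, here is a minimal user-space sketch of it. It is not the kernel code: it uses C11 atomics in place of the kernel's cmpxchg(), READ_ONCE() and WRITE_ONCE(), and the struct pv_node_model type, the helper names, and the numeric values of the three states are all assumptions made for illustration. It only replays the single-threaded order of operations (hashed state, spurious wakeup, reset in pv_wait_node(), entry into pv_wait_head()).

```c
#include <assert.h>
#include <stdatomic.h>

/* State names come from the quoted patch; the numeric values are assumed. */
enum vcpu_state { vcpu_running = 0, vcpu_halted, vcpu_hashed };

/* Hypothetical stand-in for struct pv_node: only the state field is modeled. */
struct pv_node_model {
	_Atomic int state;
};

/* Hypothetical result record for the replay below. */
struct replay_result {
	int lp_nonnull;		/* 1 if the hashed state was seen on entry */
	int final_state;	/* state after the unconditional store */
};

/* Models the patched reset in pv_wait_node(): only halted -> running. */
static void reset_unless_hashed(struct pv_node_model *pn)
{
	int expected = vcpu_halted;

	/* Leaves the state untouched when it is not vcpu_halted. */
	atomic_compare_exchange_strong(&pn->state, &expected, vcpu_running);
}

/* Models the first statements of the patched pv_wait_head(). */
static struct replay_result enter_wait_head(struct pv_node_model *pn)
{
	struct replay_result r;

	/* Sample the state first, as the patch does when initializing lp. */
	r.lp_nonnull = (atomic_load(&pn->state) == vcpu_hashed);

	/* Then write vcpu_running unconditionally, as the loop body does. */
	atomic_store(&pn->state, vcpu_running);
	r.final_state = atomic_load(&pn->state);
	return r;
}

/* Replays: node already hashed, spurious wakeup, then both functions. */
static struct replay_result replay_spurious_wakeup(void)
{
	struct pv_node_model pn;

	atomic_init(&pn.state, vcpu_hashed);

	reset_unless_hashed(&pn);
	/* The compare-and-exchange preserved the hashed state. */
	assert(atomic_load(&pn.state) == vcpu_hashed);

	return enter_wait_head(&pn);
}
```

Calling replay_spurious_wakeup() in this model returns lp_nonnull == 1 and final_state == vcpu_running: the hashed state survives the conditional reset, but the node's state field is overwritten immediately afterwards, so only the local lp value still records it.
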
Cheers,
Longman

