On Thu, Apr 01, 2021 at 11:31:53AM -0400, Alex Kogan wrote:

> +/*
> + * cna_splice_tail -- splice the next node from the primary queue onto
> + * the secondary queue.
> + */
> +static void cna_splice_next(struct mcs_spinlock *node,
> +                         struct mcs_spinlock *next,
> +                         struct mcs_spinlock *nnext)

You forgot to update the comment when you renamed this thing; it still
says cna_splice_tail while the function is now cna_splice_next.

> +/*
> + * cna_order_queue - check whether the next waiter in the main queue is on
> + * the same NUMA node as the lock holder; if not, and it has a waiter behind
> + * it in the main queue, move the former onto the secondary queue.
> + */
> +static void cna_order_queue(struct mcs_spinlock *node)
> +{
> +     struct mcs_spinlock *next = READ_ONCE(node->next);
> +     struct cna_node *cn = (struct cna_node *)node;
> +     int numa_node, next_numa_node;
> +
> +     if (!next) {
> +             cn->partial_order = LOCAL_WAITER_NOT_FOUND;
> +             return;
> +     }
> +
> +     numa_node = cn->numa_node;
> +     next_numa_node = ((struct cna_node *)next)->numa_node;
> +
> +     if (next_numa_node != numa_node) {
> +             struct mcs_spinlock *nnext = READ_ONCE(next->next);
> +
> +             if (nnext) {
> +                     cna_splice_next(node, next, nnext);
> +                     next = nnext;
> +             }
> +             /*
> +              * Inherit NUMA node id of primary queue, to maintain the
> +              * preference even if the next waiter is on a different node.
> +              */
> +             ((struct cna_node *)next)->numa_node = numa_node;
> +     }
> +}

So the obvious change since last time I looked at this is that it now
only looks 1 entry ahead. Which makes sense I suppose.

I'm not really a fan of the 'partial_order' name combined with that
silly enum { LOCAL_WAITER_FOUND, LOCAL_WAITER_NOT_FOUND }. That's just
really bad naming all around. The enum is about having a waiter while
the variable is about partial order, that doesn't match at all.

If you rename the variable to 'has_waiter' and simply use 0,1 values,
things would be ever so much more readable. But I don't think that makes
sense, see below.

I'm also not sure about that whole numa_node thing, why would you
overwrite the numa node, and why at this point?

> +
> +/* Abuse the pv_wait_head_or_lock() hook to get some work done */
> +static __always_inline u32 cna_wait_head_or_lock(struct qspinlock *lock,
> +                                              struct mcs_spinlock *node)
> +{
> +     /*
> +      * Try and put the time otherwise spent spin waiting on
> +      * _Q_LOCKED_PENDING_MASK to use by sorting our lists.
> +      */
> +     cna_order_queue(node);
> +
> +     return 0; /* we lied; we didn't wait, go do so now */

So here we inspect one entry ahead and then quit. I can't remember, but
did we try something like:

        /*
         * Try and put the time otherwise spent spin waiting on
         * _Q_LOCKED_PENDING_MASK to use by sorting our lists.
         * Move one entry at a go until either the list is fully
         * sorted or we ran out of spin condition.
         */
        while (READ_ONCE(lock->val) & _Q_LOCKED_PENDING_MASK &&
               node->partial_order)
                cna_order_queue(node);

        return 0;

This will keep moving @next to the remote list until such a time that
we're forced to continue or @next is local.
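
To make the intended behaviour concrete, here is a rough user-space model
of that loop. It is only a sketch: struct waiter, struct remote_list,
order_one() and order_while_spinning() are made-up names, and the
spin_budget counter is an assumed stand-in for the
READ_ONCE(lock->val) & _Q_LOCKED_PENDING_MASK test. It keeps the patch's
rule of never moving the last entry of the main queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a queue node; not the kernel struct. */
struct waiter {
	int numa_node;
	struct waiter *next;
};

/* Secondary (remote) list, kept as plain head/tail pointers for simplicity. */
struct remote_list {
	struct waiter *head;
	struct waiter *tail;
};

/*
 * Look one entry ahead of @node. Returns 1 if a remote waiter was moved
 * (so it is worth looking again), 0 if @next is missing, local, or the
 * last entry in the main queue and therefore left in place.
 */
static int order_one(struct waiter *node, struct remote_list *remote)
{
	struct waiter *next = node->next;
	struct waiter *nnext;

	if (!next || next->numa_node == node->numa_node)
		return 0;

	nnext = next->next;
	if (!nnext)
		return 0;

	/* Unlink @next from the main queue and append it to the remote list. */
	node->next = nnext;
	next->next = NULL;
	if (remote->tail)
		remote->tail->next = next;
	else
		remote->head = next;
	remote->tail = next;
	return 1;
}

/*
 * @spin_budget models "the lock is still held or pending"; every look
 * ahead consumes one unit. Returns the number of waiters moved.
 */
static int order_while_spinning(struct waiter *node,
				struct remote_list *remote, int spin_budget)
{
	int moved = 0;

	assert(node != NULL && remote != NULL);
	while (spin_budget-- > 0 && order_one(node, remote))
		moved++;
	return moved;
}
```

With a main queue of remote, remote, local, remote behind the holder, a
large enough budget moves the first two entries to the remote list and
stops at the local one; a budget of 1 moves only the first.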

> +}
> +
> +static inline void cna_lock_handoff(struct mcs_spinlock *node,
> +                              struct mcs_spinlock *next)
> +{
> +     struct cna_node *cn = (struct cna_node *)node;
> +     u32 val = 1;
> +
> +     u32 partial_order = cn->partial_order;
> +
> +     if (partial_order == LOCAL_WAITER_NOT_FOUND)
> +             cna_order_queue(node);
> +

AFAICT this is where playing silly games with ->numa_node belongs; but
right along with that goes a comment that describes why any of that
makes sense.

I mean, if you leave your node, for any reason, why bother coming back
to it, why not accept it is a sign of the gods you're overdue for a
node-change?

Was the efficacy of this scheme tested?

> +     /*
> +      * We have a local waiter, either real or fake one;
> +      * reload @next in case it was changed by cna_order_queue().
> +      */
> +     next = node->next;
> +     if (node->locked > 1)
> +             val = node->locked;     /* preseve secondary queue */

IIRC we used to do:

        val |= node->locked;

Which is simpler for not having branches. Why change a good thing?
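
One thing worth checking before switching back: the two forms only produce
the same value when node->locked is 1 or already has bit 0 set. The sketch
below is plain user-space C with made-up helper names
(handoff_val_branch(), handoff_val_or()), not kernel code; whether the
encoded secondary-queue value has bit 0 set, or whether the consumer masks
it off, is an assumption to verify against the actual encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Form used in the patch: take node->locked wholesale when it is > 1. */
static uint32_t handoff_val_branch(uint32_t locked)
{
	uint32_t val = 1;

	assert(locked >= 1);	/* the lock holder's node is always "locked" */
	if (locked > 1)
		val = locked;	/* preserve secondary queue */
	return val;
}

/* Branchless form: start from 1 and OR in whatever node->locked holds. */
static uint32_t handoff_val_or(uint32_t locked)
{
	uint32_t val = 1;

	val |= locked;
	return val;
}
```

For locked == 1 or locked == 0x41 both helpers return the input unchanged;
for locked == 0x40 the branchy form returns 0x40 and the OR form returns
0x41, so the OR form is only a drop-in replacement if bit 0 carries no
other meaning in the encoded value.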

> +
> +     arch_mcs_lock_handoff(&next->locked, val);
> +}
