> It suffers the typical problems all those constructs do; namely it
> wrecks accountability.

That's "government thinking" ;-) - for most real users throughput is
more important than accountability. With the right API it ought also to
be compile-time switchable.
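
To make the compile-time switch concrete, here is a rough sketch of what
I have in mind (LOCK_WORK_MERGE, locked_call() and merge_call() are all
names I've made up for illustration, nothing that actually exists): with
the option off, the same call site degenerates into a plain
lock/run/unlock, so every CPU executes only its own work and you keep
the accountability.

#include <pthread.h>

#ifdef LOCK_WORK_MERGE
/* merge_call() stands in for whatever "hand my work to the current
 * lock holder" primitive the real API would expose. */
#define locked_call(lock, fn, arg)  merge_call(lock, fn, arg)
#else
/* Accountability-first build: every CPU runs only its own work
 * under a plain mutex. */
#define locked_call(lock, fn, arg)              \
    do {                                        \
        pthread_mutex_lock(lock);               \
        (fn)(arg);                              \
        pthread_mutex_unlock(lock);             \
    } while (0)
#endif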

> But here that is compounded by the fact that you inject other people's
> work into 'your' lock region, thereby bloating lock hold times. Worse,
> afaict (from a quick reading) there really isn't a bound on the amount
> of work you inject.

That should be relatively easy to fix, but with this kind of lock you
normally get the big wins from work that is only a short stretch of
executing code. The fairness you trade away, in the cases where it is
useful, should be tiny except under extreme load, where the
"accountability first" behaviour would be to fall over in a heap.

If your "lock" involves a lot of work then it probably should be a work
queue or not using this kind of locking.
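
For the bounding, a minimal userspace sketch of the idea (this is not
the posted patch; combine_lock, combine_execute and COMBINE_BUDGET are
invented names): waiters publish their short critical-section body on a
lock-free stack, and whoever holds the lock drains at most
COMBINE_BUDGET requests before unlocking, so lock hold times stay
bounded.

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <pthread.h>

#define COMBINE_BUDGET 8    /* max requests executed per lock hold */

struct request {
    struct request *next;
    void (*fn)(void *arg);      /* the short critical-section body */
    void *arg;
    _Atomic bool done;
};

struct combine_lock {
    pthread_mutex_t mutex;              /* the underlying lock */
    _Atomic(struct request *) pending;  /* lock-free stack of waiters' work */
};

static void push_request(struct combine_lock *l, struct request *r)
{
    r->next = atomic_load(&l->pending);
    while (!atomic_compare_exchange_weak(&l->pending, &r->next, r))
        ;
}

static void combine_execute(struct combine_lock *l,
                            void (*fn)(void *), void *arg)
{
    struct request req = { .fn = fn, .arg = arg };

    atomic_store(&req.done, false);
    push_request(l, &req);

    while (!atomic_load(&req.done)) {
        if (pthread_mutex_trylock(&l->mutex) != 0)
            continue;   /* spin; the current holder may run our request */

        /*
         * We hold the lock: drain up to COMBINE_BUDGET pending
         * requests while the shared data is hot in our cache,
         * then push any excess back for the next holder.
         */
        struct request *list = atomic_exchange(&l->pending, NULL);
        int budget = COMBINE_BUDGET;

        while (list && budget--) {
            struct request *r = list;

            list = r->next;
            r->fn(r->arg);
            atomic_store(&r->done, true);
        }
        while (list) {
            struct request *r = list;

            list = r->next;
            push_request(l, r);
        }
        pthread_mutex_unlock(&l->mutex);
    }
}

Anything over budget simply waits for the next holder, so no single
acquisition can be stretched indefinitely by other people's work.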

> And while its a cute collapse of an MCS lock and lockless list style
> work queue (MCS after all is a lockless list), saving a few cycles from
> the naive spinlock+llist implementation of the same thing, I really
> do not see enough justification for any of this.

I've only personally dealt with such locks in the embedded space but
there it was a lot more than a few cycles because you go from


        take lock
                                                spins
        pull things into cache
        do stuff
        cache lines go write/exclusive
        unlock

                                                take lock
                                                move all the cache
                                                do stuff
                                                etc

to

        take lock
                                                queue work
        pull things into cache
        do work 1
        cache lines go write/exclusive
        do work 2
        
        unlock
                                                done

and for the kind of stuff you apply those locks to, you got big
improvements. Even on crappy little embedded processors cache bouncing
hurts. Better still, work-merging locks like this tend to improve
throughput more as contention rises, unlike most other lock types.
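
To make that concrete with the illustrative combine_execute() sketch
from earlier (again invented names, not a real API): the shared data
only ever goes write/exclusive on whichever CPU is currently draining
the queue, instead of bouncing once per caller.

/* Illustrative only: uses the hypothetical combine_execute() from the
 * sketch above. lock.mutex must first be set up with
 * pthread_mutex_init(). */
struct hot_counter {
    struct combine_lock lock;
    long value;
};

static void do_increment(void *arg)
{
    struct hot_counter *c = arg;

    c->value++;     /* runs while the data is hot on the holder's CPU */
}

static void hot_counter_inc(struct hot_counter *c)
{
    combine_execute(&c->lock, do_increment, c);
}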

The claim in the original post is 3x performance, but it doesn't explain
performance doing what, which kernel locks were switched, or what
patches were used. I don't find the numbers hard to believe for a big,
big box, but I'd like to see the actual use case patches so they can be
benched with other workloads, and for latency and the like.

Alan