for those without much mwait experience, mwait is a kernel-only primitive (as per the instructions) that pauses the processor until a change has been made in some range of memory. the size is determined by probing the h/w, but think cacheline. so the discussion of locking is kernel specific as well.
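for a rough idea of the shape of it, here is a hypothetical monitor/mwait wait loop. this is a sketch only (gcc-style inline asm and the name mwaitfor are mine, not the plan 9 kernel code):

	/* sketch: wait until *addr changes from val, sleeping in mwait meanwhile */
	static void
	mwaitfor(volatile unsigned long *addr, unsigned long val)
	{
		while(*addr == val){
			/* arm the monitor on the line holding addr; ecx/edx are hints (0) */
			asm volatile("monitor" : : "a"(addr), "c"(0), "d"(0) : "memory");
			/* re-check: a store may have landed before the monitor was armed */
			if(*addr != val)
				break;
			/* halt until the monitored line is written (or an interrupt) */
			asm volatile("mwait" : : "a"(0), "c"(0));
		}
	}

the re-check between monitor and mwait matters, and mwait can also wake spuriously (e.g. on interrupts), hence the outer loop.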
> > On 17 Dec 2013, at 12:00, cinap_len...@felloff.net wrote:
> >
> > thats a surprising result. by dog pile lock you mean the runq spinlock no?
>
> I guess it depends on the HW, but I don't find that so surprising. You are
> looping sending messages to the coherency fabric, which gets congested as a
> result. I have seen that happen.

i assume you mean that there is contention on the cacheline holding the runq
lock?  i don't think there's classical congestion, as i believe cachelines not
involved in the mwait would experience no hold up.

> You should back off, but sleeping for a fixed time is not a good solution
> either.  Mwait is a perfect solution in this case, there is some latency,
> but you are in a bad place anyway and with mwait, performance does not
> degrade too much.

mwait() does improve things, and one would expect the latency to always be
better than spinning*.  but as it turns out, the current scheduler is pretty
hopeless in its locking anyway.  simply grabbing the lock with lock rather
than canlock makes more sense to me.  also, using ticket locks (see the 9atom
nix kernel) will provide automatic backoff within the lock.  ticket locks are
a poor solution in that they're not really scalable, but they will scale to
24 cpus much better than tas locks.  mcs locks or some other queueing-style
lock is clearly the long-term solution, but as charles points out, one would
really prefer to figure out a way to fit them to the lock api.  i have some
test code, but testing queueing locks in user space is ... interesting.  i
need a new approach.

- erik

* have you done tests on this?
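to make the ticket-lock point concrete, a minimal sketch (plain C with gcc
atomic builtins; this is not the 9atom code, and the names and the crude
delay loop are mine):

	typedef struct Ticketlock Ticketlock;
	struct Ticketlock {
		unsigned int next;	/* next ticket to hand out */
		unsigned int owner;	/* ticket currently holding the lock */
	};

	void
	tlock(Ticketlock *l)
	{
		unsigned int me, cur, i;

		me = __sync_fetch_and_add(&l->next, 1);	/* take a ticket */
		while((cur = *(volatile unsigned int*)&l->owner) != me){
			/* me-cur says how many holders must finish first, so each
			 * waiter can back off proportionally instead of hammering
			 * the cacheline -- the "automatic backoff" */
			for(i = 0; i < (me - cur)*100; i++)
				asm volatile("pause");
		}
		__sync_synchronize();	/* acquire barrier */
	}

	void
	tunlock(Ticketlock *l)
	{
		__sync_synchronize();	/* release barrier */
		l->owner++;		/* only the holder writes owner */
	}

waiters are served strictly in ticket order, and only the fetch-and-add on
next is a contended write.  mcs-style queue locks go further, giving each
waiter its own node to spin on, but the node has to be threaded through
lock/unlock, which is the api-fitting problem mentioned above.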