On Fri, Mar 6, 2015 at 11:20 AM, Davidlohr Bueso <d...@stgolabs.net> wrote:
>
> I obviously agree with all those points, however fyi most of the testing
> on rwsems I do includes scaling address space ops stressing the
> mmap_sem, which is a real world concern. So while it does include
> microbenchmarks, it is not guided by them.
So I agree that mmap_sem is problematic. We probably still end up holding it over many actual IO operations, for example.

The whole "FAULT_RETRY" thing should have helped a lot, in that hopefully at least a fair amount of the time we now end up waiting for the IO without holding the semaphore, but I bet many other cases remain.

And I also suspect that we could try to be even more aggressive, and allow some entirely unlocked cases. For example, long long ago we used to have a completely SMP-unsafe model where we would do things optimistically - doing IO without holding any locks, and then before we "committed" to it, we'd re-try. And I wonder if we might want to re-introduce that for the cases where we hit in caches and could use RCU.

IOW, I wonder if we could special-case the common non-IO fault-handling path something along the lines of:

 - look up the vma in the vma lookup cache
 - look up the page in the page cache
 - get the page table spinlock
 - re-check the vma now (it ends up being stable if it can't be torn down due to the page table spinlock)

because I suspect that page faults are the biggest users of that mmap_sem, and we could probably handle a fairly large common case (making it simpler by special-casing it and punting in any even _slightly_ complicated situations) without even getting the semaphore at all, since we have to serialize on the actual page table *anyway*.

Basically, to me, the whole "if a lock is so contended that we need to play locking games, then we should look at why we *use* the lock, rather than at the lock itself" is a religion.

Linus