On Tue, 6 Mar 2018 13:17:37 -0800 Yang Shi <yang....@linux.alibaba.com> wrote:
> >
> It just mitigates the hung task warning, can't resolve the mmap_sem
> scalability issue. Furthermore, waiting on pure uninterruptible state
> for reading /proc sounds unnecessary. It doesn't wait for I/O completion.

OK.

> >
> > Where the heck are we holding mmap_sem for so long? Can that be fixed?
>
> The mmap_sem is held for unmapping a large map which has every single
> page mapped. This is not an issue in real production code. Just found it
> by running vm-scalability on a machine with ~600GB memory.
>
> AFAIK, I don't see any easy fix for the mmap_sem scalability issue. I
> saw range locking patches (https://lwn.net/Articles/723648/) were
> floating around. But, it may not help too much in the case of a large
> map with every single page mapped.

Well, it sounds fairly simple to mitigate? Simplistically: don't unmap
600G in a single hit; do it 1G at a time, dropping mmap_sem each time.
A smarter version might only come up for air if there are mmap_sem
waiters and if it has already done some work. I don't think we have
any particular atomicity requirements when unmapping?