On Wed, 3 Feb 2016, Michal Hocko wrote:
> Hi,
> this thread went mostly quite. Are all the main concerns clarified?
> Are there any new concerns? Are there any objections to targeting
> this for the next merge window?

Sorry to say at this late date, but I do have one concern: hopefully you can tweak something somewhere, or point me to some tunable that I can adjust (I've not studied the patches, sorry).

This rework makes it impossible to run my tmpfs swapping loads: they're soon OOM-killed when they ran forever before, so swapping does not get the exercise on mmotm that it used to. (But I'm not so arrogant as to expect you to optimize for my load!)

Maybe it's just that I'm using tmpfs, and there's code that's conscious of file and anon, but doesn't cope properly with the awkward shmem case.

(Of course, tmpfs is and always has been a problem for OOM-killing, given that it takes up memory, but none is freed by killing processes: but although that is a tiresome problem, it's not what either of us is attacking here.)

Taking many of the irrelevancies out of my load, here's something you could try, first on v4.5-rc5 and then on mmotm.

Boot with mem=1G (or boot your usual way, and do something to occupy most of the memory: I think /proc/sys/vm/nr_hugepages provides a great way to gobble up most of the memory, though it's not how I've done it).

Make sure you have swap: 2G is more than enough. Copy the v4.5-rc5 kernel source tree into a tmpfs: size=2G is more than enough. make defconfig there, then make -j20.

On a v4.5-rc5 kernel that builds fine, on mmotm it is soon OOM-killed.

Except that you'll probably need to fiddle around with that j20, it's true for my laptop but not for my workstation. j20 just happens to be what I've had there for years, that I now see breaking down (I can lower to j6 to proceed, perhaps could go a bit higher, but it still doesn't exercise swap very much).

This OOM detection rework significantly lowers the number of jobs which can be run in parallel without being OOM-killed. Which would be welcome if it were choosing to abort in place of thrashing, but the system was far from thrashing: j20 took a few seconds more than j6, and even j30 didn't take 50% longer.

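For anyone wanting to script the recipe above, here is a rough sketch that only prints the commands instead of running them. The source path, the mount point and the function name repro_commands are hypothetical, chosen for illustration; the message does not say where the tree was copied from or mounted. It assumes the machine was already booted with mem=1G and has about 2G of swap enabled.

```python
import shlex


def repro_commands(src="/usr/src/linux-4.5-rc5", mnt="/mnt/tmpfs-build", jobs=20):
    """Return the shell commands for the recipe, in order, without running them.

    src and mnt are hypothetical paths; jobs is the make -j value, which
    will probably need adjusting per machine.
    """
    tree = f"{mnt}/linux"
    return [
        # A 2G tmpfs to hold the kernel source tree.
        f"mount -t tmpfs -o size=2G tmpfs {shlex.quote(mnt)}",
        # Copy the v4.5-rc5 source tree into the tmpfs.
        f"cp -a {shlex.quote(src)} {shlex.quote(tree)}",
        # Configure, then build with the requested parallelism.
        f"make -C {shlex.quote(tree)} defconfig",
        f"make -C {shlex.quote(tree)} -j{int(jobs)}",
    ]


if __name__ == "__main__":
    # Dry run: print the commands so they can be reviewed before use as root.
    for cmd in repro_commands():
        print(cmd)
```

Printing rather than executing is deliberate: mounting needs root, and the -j value is the knob to vary between runs on the two kernels.
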
(I have /proc/sys/vm/swappiness 100, if that matters.)

I hope there's an easy answer to this: thanks!

Hugh