Michal Hocko wrote:
> OK, that would suggest that the oom rework patches are not really
> related. They just moved from the livelock to a sleep which is good in
> general IMHO. We even know that it is most probably the IO that is the
> problem because we know that more than half of the reclaimable memory is
> either dirty or under writeback. That is where you should be looking.
> Why the IO is not making progress or such a slow progress.
>
A footnote.

Regarding this reproducer, the problem before the OOM detection rework
patches was "anybody can declare OOM and call out_of_memory(), but
out_of_memory() does nothing because there is a thread which has
TIF_MEMDIE." After the OOM detection rework patches, the problem is
"nobody can declare OOM and call out_of_memory(), although out_of_memory()
would do nothing anyway because there is a thread which has TIF_MEMDIE."

Dave Chinner wrote at http://lkml.kernel.org/r/20160211225929.GU14668@dastard :
> > Although there are memory allocating tasks passing gfp flags with
> > __GFP_KSWAPD_RECLAIM, kswapd is unable to make forward progress because
> > it is blocked at down() called from memory reclaim path. And since it is
> > legal to block kswapd from memory reclaim path (am I correct?), I think
> > we must not assume that current_is_kswapd() check will break the infinite
> > loop condition.
>
> Right, the threads that are blocked in writeback waiting on memory
> reclaim will be using GFP_NOFS to prevent recursion deadlocks, but
> that does not avoid the problem that kswapd can then get stuck
> on those locks, too. Hence there is no guarantee that kswapd can
> make reclaim progress if it does dirty page writeback...

Unless we address the issue Dave commented on, the OOM detection rework
patches add a new location of livelock (which is demonstrated by this
reproducer) in the memory allocator. It is unfortunate that we add a new
location of livelock while we are trying to solve a thrashing problem.
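
To make the before/after distinction concrete, here is a toy model of the two
situations. It is only a sketch and not kernel code: ToyState, its two flags,
and allocation_attempt() are hypothetical names, and the "rework" branch is my
simplified assumption of the behaviour described above (reclaim still looks
possible, so OOM is never declared).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ToyState:
    # Hypothetical flags for illustration; these are not kernel structures.
    memdie_thread_exists: bool    # some thread already has TIF_MEMDIE
    reclaim_looks_possible: bool  # dirty/writeback pages still look reclaimable


def out_of_memory(state: ToyState) -> bool:
    """Return True only if a new OOM victim would be selected."""
    # While a TIF_MEMDIE thread exists, no new victim is chosen.
    return not state.memdie_thread_exists


def allocation_attempt(state: ToyState, rework: bool) -> Tuple[bool, bool]:
    """Return (declared_oom, made_progress) for one pass of the retry loop."""
    if rework and state.reclaim_looks_possible:
        # Assumed model of the reworked detection: keep retrying reclaim,
        # so out_of_memory() is never reached.
        return (False, False)
    # Assumed model of the old behaviour: any allocating task may declare OOM.
    return (True, out_of_memory(state))


if __name__ == "__main__":
    stuck = ToyState(memdie_thread_exists=True, reclaim_looks_possible=True)
    print("before rework:", allocation_attempt(stuck, rework=False))
    print("after rework: ", allocation_attempt(stuck, rework=True))
```

In both cases the second element stays False: the place where the task spins
has moved, but the task still makes no forward progress.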