On Tue, Sep 22, 2020 at 8:16 AM Michal Hocko <mho...@suse.com> wrote:
>
> On Tue 22-09-20 06:37:02, Shakeel Butt wrote:
> [...]
> > > I would recommend to focus on tracking down the who is blocking the
> > > further progress.
> >
> > I was able to find the CPU next in line for the list_lock from the
> > dump. I don't think anyone is blocking the progress as such but more
> > like the spinlock in the irq context is starving the spinlock in the
> > process context. This is a high traffic machine and there are tens of
> > thousands of potential network ACKs on the queue.
>
> So there is a forward progress but it is too slow to have any reasonable
> progress in userspace?
Yes.

>
> > I talked about this problem with Johannes at LPC 2019 and I think we
> > talked about two potential solutions. First was to somehow give memory
> > reserves to oomd and second was in-kernel PSI based oom-killer. I am
> > not sure the first one will work in this situation but the second one
> > might help.
>
> Why does your oomd depend on memory allocation?
>

It does not, but I think my concern was about potential allocations during
syscalls. Anyway, what do you think of an in-kernel PSI-based oom-kill
trigger? I think Johannes had a prototype as well.
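
For reference, the userspace path today is the PSI trigger interface from
Documentation/accounting/psi.rst: a monitor like oomd arms a trigger on a
cgroup's memory.pressure (or the system-wide /proc/pressure/memory, as in
the sketch below) and poll()s it for POLLPRI. A minimal sketch along the
lines of the example in that document; the 150ms-stall-per-1s-window
threshold is purely illustrative, not something we actually run:

/*
 * Sketch of a userspace PSI memory-pressure monitor, following the
 * example in Documentation/accounting/psi.rst. Thresholds are
 * illustrative only.
 */
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* Wake up when "some" memory stall exceeds 150ms in a 1s window. */
	const char trig[] = "some 150000 1000000";
	struct pollfd fds;

	fds.fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
	if (fds.fd < 0) {
		fprintf(stderr, "open: %s\n", strerror(errno));
		return 1;
	}
	fds.events = POLLPRI;

	/* Writing the trigger string arms the monitor on this fd. */
	if (write(fds.fd, trig, strlen(trig) + 1) < 0) {
		fprintf(stderr, "write: %s\n", strerror(errno));
		return 1;
	}

	while (1) {
		if (poll(&fds, 1, -1) < 0) {
			fprintf(stderr, "poll: %s\n", strerror(errno));
			return 1;
		}
		if (fds.revents & POLLERR) {
			fprintf(stderr, "event source is gone\n");
			return 0;
		}
		if (fds.revents & POLLPRI) {
			/* This is where oomd would pick and kill a victim. */
			printf("memory pressure threshold breached\n");
		}
	}
	return 0;
}

The whole chain (poll() returning, reading stats, issuing the kill) has to
make forward progress in userspace, which is exactly what this livelock
prevents; an in-kernel trigger would short-circuit that dependency.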