On Fri 2018-04-27 19:22:45, Sergey Senozhatsky wrote:
> On (04/26/18 11:42), Petr Mladek wrote:
> [..]
> > Honestly, I do not believe that console drivers are like Scheherazade.
> > They are not able to make up long interesting stories. Let's say that
> > lockdep splat has more than 100 lines but it can happen only once.
> > Let's say that WARNs have about 40 lines. I somehow doubt that we
> > could ever see 10 different WARN calls from one con->write() call.
>
> The problem here is that it takes a human being with IQ to tell what's
> repetitive, what's useless and what's not.
>
> 	vprintk(...)
> 	{
> 		if (!__ratelimit())
> 			return;
> 	}
>
> has zero IQ to make such decisions.

You make it too complicated. Also it seems that you repeatedly hide
the fact that con->write() context is recursive. Just try to add
printk() into call_console_drivers() and see what happens.

IMHO, if con->write() wants to add more than 1000 (or 100 or whatever
sane limit) new lines then something is really wrong and we should
stop it. It is that simple.


> > > But we first need a real reason. Right now it looks to me like
> > > we have "a solution" to a problem which we have never witnessed.
> >
> > I am trying to find a "simple" and generic solution for the problem
> > reported by Tejun:
> [..]
> > 1. Console is IPMI emulated serial console. Super slow. Also
> >    netconsole is in use.
> > 2. System runs out of memory, OOM triggers.
> > 3. OOM handler is printing out OOM debug info.
> > 4. While trying to emit the messages for netconsole, the network stack
> >    / driver tries to allocate memory and then fail, which in turn
> >    triggers allocation failure or other warning messages. printk was
> >    already flushing, so the messages are queued on the ring.
> > 5. OOM handler keeps flushing but 4 repeats and the queue is never
> >    shrinking. Because OOM handler is trapped in printk flushing, it
> >    never manages to free memory and no one else can enter OOM path
> >    either, so the system is trapped in this state.
> > </paste>

IMHO, we do not need to chase down this particular problem. It was
already "solved" by the commit 400e22499dd92613821 ("mm: don't
warn about allocations which stall for too long").

It was just an example. I wanted to make con->write() generally
safe. I thought that the problem (recursion) was clear enough.


> Yes, and that's why I want to take a look at the logs/backtraces.

If you want more cases to analyze, fair enough. I do not have any
at hand. It is not an urgent issue for me and I am not going
to spend more time on this.

Best Regards,
Petr