On Sun, 12 Apr 2015, ZhangNeil wrote:

> > Sorry, this is just way too verbose.  This output is emitted to the kernel
> > log on oom kill and since we lack a notification mechanism on system oom,
> > the _only_ way for userspace to detect oom kills that have occurred is by
> > scraping the kernel log.  This is exactly what we do, and we have missed
> > oom kill events because they scroll from the ring buffer due to excessive
> > output such as this, which is why output was limited with the
> > show_free_areas() filter in the first place.  Just because oom kill output
> > is much less than it has been in the past, for precisely this reason,
> > doesn't mean we can make it excessive again.
> >
>
> Just like you said, OOM kill is much less than before, but we still need to
> analyze it when it happens on a mobile device. It can give more detailed
> info for us when debugging.
>

That would be true for a very large amount of data, and we simply can't make
oom killing more verbose in the kernel log because it is the _only_ mechanism
we have to determine that the kernel killed a user process and what that
process was.  You could make the same argument for dumping all of
/proc/slabinfo to discover slab leaks, which people have proposed before and
which has been nacked.  We simply can't make it more verbose, it's that easy.

> Besides OOM kill, we also can check the memory usages in runtime by echo 'm'
> to sysRq.
> It can help us to find out code defect sometimes, for example, we even found
> that the NR_FREE_CMA memory was not align with the total CMA pages in the
> free list showed by this patch.
>

Sysrq is an entirely different use case, and the natural response would be to
export this information for sysrq but not for oom kill.  In this case, though,
there is no compelling reason to dump it in the ring buffer in the first
place: it should be in procfs, where it can easily be read and parsed.
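
To illustrate the procfs route, here is a minimal userspace sketch of the NR_FREE_CMA consistency check mentioned above. It assumes a kernel that exposes per-migrate-type free counts in /proc/pagetypeinfo and an nr_free_cma counter in /proc/vmstat; the function names are hypothetical, and the parsing relies on the line layout those files commonly use, so treat it as a starting point rather than a definitive tool.

```python
import re

# Matches lines such as:
#   Node    0, zone   Normal, type          CMA      3      2      1
# (assumed layout of the "Free pages count per migrate type" section)
_FREE_LINE = re.compile(r"Node\s+\d+, zone\s+\S+, type\s+(\S+)\s+(.*)")


def free_pages_by_type(pagetypeinfo_text, migrate_type="CMA"):
    """Sum free pages of one migrate type over all nodes and zones.

    Column i of each matching line is the number of free blocks of
    order i, so each count is scaled by 2**i to get pages.
    """
    total = 0
    for line in pagetypeinfo_text.splitlines():
        m = _FREE_LINE.match(line)
        if m is None or m.group(1) != migrate_type:
            continue
        # Assumption: a count may be printed as ">N" when the kernel caps
        # it; stripping ">" then yields a lower bound, not an exact value.
        counts = [int(c.lstrip(">")) for c in m.group(2).split()]
        total += sum(n << order for order, n in enumerate(counts))
    return total


def vmstat_counter(vmstat_text, name="nr_free_cma"):
    """Return the named /proc/vmstat counter, or None if it is absent."""
    for line in vmstat_text.splitlines():
        key, _, value = line.partition(" ")
        if key == name:
            return int(value)
    return None
```

Feeding it the contents of /proc/pagetypeinfo and /proc/vmstat (the former may require root) and comparing the two results gives the same check without touching the ring buffer. The two files are not read atomically, so a small mismatch on a busy system does not by itself indicate a bug.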