On Thu, Jul 6, 2017 at 1:16 PM, Mel Gorman <mgor...@techsingularity.net> wrote:
>
> I'm still struggling to see how counters help when an agent that monitors
> for high CPU usage could be activated
>

I suspect Roman has the same problem set as us: the CPU usage is
either always high, or it is high and service-critical precisely when
something interesting is likely happening. We'd like to collect data on
200k machines and study the results statistically and over time,
broken down by kernel version, build config, hardware type, process
type, load pattern, etc. Even finding good candidate machines, at the
right time of day, to debug manually with ftrace is problematic.
Granted, we could be making better use of existing counters like
compact_fail. Ultimately the data leads to dealing with certain bad
actors, to different vm tunings, or to patches to mm.
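
For illustration, a minimal sketch of sampling the existing compaction
counters (compact_fail and friends) from /proc/vmstat so they can be
shipped off-box and compared across machines and over time. The function
names and the delta-between-samples approach are assumptions made for
this example, not something proposed in this thread.

```python
def parse_vmstat(text, prefix="compact_"):
    """Return {name: value} for vmstat lines whose name starts with prefix."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # /proc/vmstat lines have the form "<name> <value>"
        if len(parts) == 2 and parts[0].startswith(prefix):
            counters[parts[0]] = int(parts[1])
    return counters


def counter_deltas(before, after):
    """Per-counter change between two samples (counters are cumulative)."""
    return {name: after[name] - before.get(name, 0) for name in after}


def read_vmstat(path="/proc/vmstat"):
    """Sample the live counters; the path is the standard procfs location."""
    with open(path) as f:
        return parse_vmstat(f.read())
```

Taking read_vmstat() twice, some interval apart, and passing both
samples to counter_deltas() gives a per-interval rate that could be
aggregated by kernel version or hardware type.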
