----- On Sep 26, 2016, at 8:27 AM, Peter Zijlstra [email protected] wrote:
> On Fri, Sep 23, 2016 at 12:49:30PM -0400, Julien Desfossez wrote:
>> With this macro, we propose new versions of the sched_switch, sched_waking,
>> sched_process_fork and sched_pi_setprio tracepoint probes that contain more
>> scheduling information and get rid of the "prio" field. We also add the PI
>> information to these tracepoints, so if a process is currently boosted, we
>> show the name and PID of the top waiter. This allows to quickly see the
>> blocking chain even if some of the trace background is missing.
>
> Urgh.. bigger mess than ever :-(
>
> So I thought the initial idea was to provide a 'blocked-on' tracepoint,
> along with with the 'prio-changed' tracepoint, so you can reconstruct
> the entire PI chain.
>
> The only problem with that was initial state; when you start tracing (or
> miss the start of a trace) its hard (impossible) to know what the
> current state is.
>
> But now you send a patch-set that just adds a metric ton of tracepoints.
>
> This doesn't fix the current mess, it makes it worse :-(

There are actually four problems we try to tackle here with this
patchset:

1) Missing explicit priority change instrumentation

We're covering it by adding a new "sched_update_prio" callsite and
user-visible tracepoint.

2) Missing "blocked-on" information for PI

We're covering it by adding a new user-visible tracepoint to the
sched_pi_setprio callsite. The following fields provide the blocked-on
info:

  top_waiter_comm, top_waiter_pid

We chose to add it in a new user-visible tracepoint rather than the
current sched_pi_setprio so the new event would not expose the internal
"prio" task struct field, which is an internal implementation detail of
the scheduler AFAIU, and could go away eventually.

We could move those fields to the preexisting sched_pi_setprio event if
you prefer, but then we would have to keep the "oldprio" and "newprio"
fields forever.

I would not call this a "blocked-on" tracepoint, because it is specific
to PI. The general "blocking" concept implies blocking on a resource
(e.g. a waitqueue), and we only know which PID we were waiting for when
we are later awakened. In the PI case, we know which PID owns the
resource we are blocked on.

3) Missing deadline scheduler instrumentation

We understood that exposing "prio" really does not cover the deadline
scheduler, as is clearly pointed out in your patchset. We have added
deadline scheduler info to a new set of user-visible tracepoints, which
are connected to the pre-existing tracepoint callsites in the scheduler.
Source-code wise, we are therefore not "adding" scheduler tracepoints to
the fast path. We have named those alternative versions with a "_prio"
suffix.

4) Missing initial state

The sched_switch_prio tracepoint deals with the problem of missing
initial state: it's a tracepoint that occurs periodically, and we can
therefore get the initial state of a running thread when it is
scheduled.

We chose to present it as an alternative tracepoint from a user POV
because we did not want to bloat the current sched_switch event with
lots of extra fields when prio information is not needed.

Do you recommend that we bring those new extra fields into the
pre-existing tracepoints instead, even considering the extra bloat?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

