Re: [PATCH 1/2] perf sched: Introduce timehist command - v2
On 12/2/13, 2:33 AM, Pekka Enberg wrote:
> I'm seeing rather large "sch delay" numbers on Fedora 19's default
> kernel that seem like a data error of some sort:
>
>    4500.663117 [02]  fix_client[7216]    0.007  18446744073708.033    1.786

Does this address the large numbers for you?

David

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index abcb40a44151..ae71782064ac 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -1751,8 +1751,12 @@ static void timehist_update_runtime_stats(struct thread_runtime *r,
 		r->dt_between = 0;
 	if (tprev) {
 		r->dt_run = t - tprev;
-		if (r->ready_to_run)
-			r->dt_delay = tprev - r->ready_to_run;
+		if (r->ready_to_run) {
+			if (r->ready_to_run > tprev)
+				pr_debug("time travel: wakeup time for task > previous sched_switch event\n");
+			else
+				r->dt_delay = tprev - r->ready_to_run;
+		}
 
 		if (r->last_time > tprev)
 			pr_debug("time travel: last sched out time for task > previous sched_switch event\n");
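[Editor's aside: the wraparound Pekka saw is plain u64 underflow. `tprev - r->ready_to_run` is an unsigned subtraction, so a wakeup timestamp that lands after the previous sched_switch wraps to a value near 2^64 ns instead of going negative. A minimal standalone sketch of the before/after logic; the helper names and timestamps are invented for illustration, not functions from builtin-sched.c:]

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helpers only -- not perf code.  Timestamps are in
 * nanoseconds, as perf stores them internally. */

/* The original computation: unsigned subtraction silently wraps when
 * the wakeup timestamp is later than the previous sched_switch. */
uint64_t sched_delay_buggy(uint64_t tprev, uint64_t ready_to_run)
{
	return tprev - ready_to_run;	/* wraps to ~2^64 if ready_to_run > tprev */
}

/* The patched logic: detect "time travel" and leave the delay at 0,
 * mirroring the pr_debug() path in the fix above. */
uint64_t sched_delay_fixed(uint64_t tprev, uint64_t ready_to_run)
{
	if (ready_to_run > tprev)
		return 0;	/* unsynchronized clocks: no valid delay */
	return tprev - ready_to_run;
}
```

[Printed in timehist's msec.usec format, a wrapped u64 nanosecond value is on the order of 18446744073709.551 msec, which matches the magnitude of the bogus "sch delay" in the report.]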
Re: [PATCH 1/2] perf sched: Introduce timehist command - v2
On 12/2/13, 2:33 AM, Pekka Enberg wrote:
> On 12/02/2013 02:23 AM, David Ahern wrote:
>> 'perf sched timehist' provides an analysis of scheduling events.
>>
>> Example usage:
>>     perf sched record -- sleep 1
>>     perf sched timehist
>>
>> By default it shows the individual schedule events, including the time
>> between sched-in events for the task, the task scheduling delay (time
>> between wakeup and actually running) and run time for the task:
>>
>>            time    cpu  task name[tid/pid]    b/n time  sch delay  run time
>>  --------------  -----  --------------------  --------  ---------  --------
>>    79371.874569   [11]  gcc[31949]               0.014      0.000     1.148
>>    79371.874591   [10]  gcc[31951]               0.000      0.000     0.024
>>    79371.874603   [10]  migration/10[59]         3.350      0.004     0.011
>>    79371.874604   [11]  <idle>                   1.148      0.000     0.035
>>    79371.874723   [05]  <idle>                   0.016      0.000     1.383
>>    79371.874746   [05]  gcc[31949]               0.153      0.078     0.022
>
> What does "b/n time" mean?

Time between schedule-in events.

> I'm seeing rather large "sch delay" numbers on Fedora 19's default
> kernel that seem like a data error of some sort:
>
>    4500.663117 [02]  fix_client[7216]    0.007  18446744073708.033    1.786

Hmm, it means tprev < t_ready-to-run and both are non-zero; i.e., the wakeup
time is after the time of the schedule event prior to fix_client getting
scheduled. Is this a multi-socket box? I can add a sanity check and set it
to 0 if it happens.

> Scheduling delays don't seem to be part of "-s" summary output. Why?

Recently added feature to show the scheduling delay. At this point I have
too many features locally and need to get the basic command into perf before
adding more -- like stats for scheduling delay.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [PATCH 1/2] perf sched: Introduce timehist command - v2
On 12/2/13, 12:58 AM, Namhyung Kim wrote:
>>            time    cpu  task name[tid/pid]    b/n time  sch delay  run time
>>  --------------  -----  --------------------  --------  ---------  --------
>>    79371.874569   [11]  gcc[31949]               0.014      0.000     1.148
>>    79371.874591   [10]  gcc[31951]               0.000      0.000     0.024
>>    79371.874603   [10]  migration/10[59]         3.350      0.004     0.011
>>    79371.874604   [11]  <idle>                   1.148      0.000     0.035
>>    79371.874723   [05]  <idle>                   0.016      0.000     1.383
>>    79371.874746   [05]  gcc[31949]               0.153      0.078     0.022
>> ...
>>
>> Times are in msec.usec.
>
> Hmm.. I'm not sure this is right. It probably confuses users since
> timehist_time_str() still uses "sec.usec" format, and it looks unnatural
> to me to use "msec".
>
> Yeah, I see perf stat uses "msec.usec" for the result of clock events but
> AFAICT it also shows the unit explicitly. And perf stat -I uses "sec.nsec"
> format and perf script also uses "sec.usec" format, so there's little
> consistency here. I think this "msec.usec" format fits the scheduling
> events well, but in general the "sec.usec" format looks better IMHO.

Arnaldo / Ingo: any thoughts on the units here? sec.usec versus msec.usec?

David
Re: [PATCH 1/2] perf sched: Introduce timehist command - v2
On 12/02/2013 02:23 AM, David Ahern wrote:
> 'perf sched timehist' provides an analysis of scheduling events.
>
> Example usage:
>     perf sched record -- sleep 1
>     perf sched timehist
>
> By default it shows the individual schedule events, including the time
> between sched-in events for the task, the task scheduling delay (time
> between wakeup and actually running) and run time for the task:
>
>            time    cpu  task name[tid/pid]    b/n time  sch delay  run time
>  --------------  -----  --------------------  --------  ---------  --------
>    79371.874569   [11]  gcc[31949]               0.014      0.000     1.148
>    79371.874591   [10]  gcc[31951]               0.000      0.000     0.024
>    79371.874603   [10]  migration/10[59]         3.350      0.004     0.011
>    79371.874604   [11]  <idle>                   1.148      0.000     0.035
>    79371.874723   [05]  <idle>                   0.016      0.000     1.383
>    79371.874746   [05]  gcc[31949]               0.153      0.078     0.022

What does "b/n time" mean?

I'm seeing rather large "sch delay" numbers on Fedora 19's default kernel
that seem like a data error of some sort:

   4500.663117 [02]  fix_client[7216]    0.007  18446744073708.033    1.786

Scheduling delays don't seem to be part of "-s" summary output. Why?

Pekka
Re: [PATCH 1/2] perf sched: Introduce timehist command - v2
On Sun, 1 Dec 2013 17:23:14 -0700, David Ahern wrote:
> 'perf sched timehist' provides an analysis of scheduling events.
>
> Example usage:
>     perf sched record -- sleep 1
>     perf sched timehist
>
> By default it shows the individual schedule events, including the time
> between sched-in events for the task, the task scheduling delay (time
> between wakeup and actually running) and run time for the task:
>
>            time    cpu  task name[tid/pid]    b/n time  sch delay  run time
>  --------------  -----  --------------------  --------  ---------  --------
>    79371.874569   [11]  gcc[31949]               0.014      0.000     1.148
>    79371.874591   [10]  gcc[31951]               0.000      0.000     0.024
>    79371.874603   [10]  migration/10[59]         3.350      0.004     0.011
>    79371.874604   [11]  <idle>                   1.148      0.000     0.035
>    79371.874723   [05]  <idle>                   0.016      0.000     1.383
>    79371.874746   [05]  gcc[31949]               0.153      0.078     0.022
> ...
>
> Times are in msec.usec.

Hmm.. I'm not sure this is right. It probably confuses users since
timehist_time_str() still uses "sec.usec" format, and it looks unnatural to
me to use "msec".

Yeah, I see perf stat uses "msec.usec" for the result of clock events but
AFAICT it also shows the unit explicitly. And perf stat -I uses "sec.nsec"
format and perf script also uses "sec.usec" format, so there's little
consistency here. I think this "msec.usec" format fits the scheduling events
well, but in general the "sec.usec" format looks better IMHO.

Thanks,
Namhyung
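[Editor's aside: to make the units discussion concrete, here is a small sketch of the same u64 nanosecond value rendered both ways. The formatter names are invented for illustration; this is not perf's timehist_time_str():]

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatters, not perf code: render a u64 nanosecond
 * timestamp in the two styles under discussion. */

/* "sec.usec", e.g. 79371.874569 -- the style timehist_time_str() uses */
void fmt_sec_usec(uint64_t ns, char *buf, size_t sz)
{
	snprintf(buf, sz, "%llu.%06llu",
		 (unsigned long long)(ns / 1000000000ULL),
		 (unsigned long long)((ns % 1000000000ULL) / 1000));
}

/* "msec.usec", e.g. 1.148 -- the style v2 uses for the delta columns */
void fmt_msec_usec(uint64_t ns, char *buf, size_t sz)
{
	snprintf(buf, sz, "%llu.%03llu",
		 (unsigned long long)(ns / 1000000ULL),
		 (unsigned long long)((ns % 1000000ULL) / 1000));
}
```

[A 1.148 ms run time reads as "1.148" in msec.usec but "0.001148" in sec.usec, which is why the two styles are easy to confuse when the unit is not printed explicitly.]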
[PATCH 1/2] perf sched: Introduce timehist command - v2
'perf sched timehist' provides an analysis of scheduling events.

Example usage:
    perf sched record -- sleep 1
    perf sched timehist

By default it shows the individual schedule events, including the time
between sched-in events for the task, the task scheduling delay (time
between wakeup and actually running) and run time for the task:

           time    cpu  task name[tid/pid]    b/n time  sch delay  run time
 --------------  -----  --------------------  --------  ---------  --------
   79371.874569   [11]  gcc[31949]               0.014      0.000     1.148
   79371.874591   [10]  gcc[31951]               0.000      0.000     0.024
   79371.874603   [10]  migration/10[59]         3.350      0.004     0.011
   79371.874604   [11]  <idle>                   1.148      0.000     0.035
   79371.874723   [05]  <idle>                   0.016      0.000     1.383
   79371.874746   [05]  gcc[31949]               0.153      0.078     0.022
   ...

Times are in msec.usec.

If callchains were recorded they are appended to the line with a default
stack depth of 5:

   79371.874569 [11] gcc[31949]        0.14      0.00  0.001148  wait_for_completion_killable do_fork sys_vfork stub_vfork __vfork
   79371.874591 [10] gcc[31951]        0.00      0.00  0.24      __cond_resched _cond_resched wait_for_completion stop_one_cpu sched_exec
   79371.874603 [10] migration/10[59]  0.003350  0.04  0.11      smpboot_thread_fn kthread ret_from_fork
   79371.874604 [11] <idle>            0.001148  0.00  0.35      cpu_startup_entry start_secondary
   79371.874723 [05] <idle>            0.16      0.00  0.001383  cpu_startup_entry start_secondary
   79371.874746 [05] gcc[31949]        0.000153  0.78  0.22      do_wait sys_wait4 system_call_fastpath __GI___waitpid

--no-call-graph can be used to not show the callchains. --max-stack is used
to control the number of frames shown (default of 5). The -x/--excl options
can be used to collapse redundant callchains to get more relevant data on
screen.

Similar to perf-trace, -s and -S can be used to dump a statistical summary
without or with events (respectively). Statistics include min run time,
average run time and max run time. Stats are also shown for run time by cpu.

The cpu-visual option provides a visual aid for sched switches by cpu:
...
   79371.874569 [11] s  gcc[31949]        0.14      0.00  0.001148
   79371.874591 [10]  s gcc[31951]        0.00      0.00  0.24
   79371.874603 [10]  s migration/10[59]  0.003350  0.04  0.11
   79371.874604 [11] i  <idle>            0.001148  0.00  0.35
   79371.874723 [05]  i <idle>            0.16      0.00  0.001383
   79371.874746 [05] s  gcc[31949]        0.000153  0.78  0.22
   ...

Additional options:
  - -w option can be used to include a dump of wakeup events
  - -c comm to only display events and stats for the given comm (csv list)
  - -p/-t to only display events and stats for given pid/tid (csv list)

v2:
  - Addressed Namhyung's comments - changed times to be in msec.usec versus
    sec.usec to be consistent with other perf commands

Signed-off-by: David Ahern
Cc: Ingo Molnar
Cc: Frederic Weisbecker
Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Stephane Eranian
Cc: Pekka Enberg
---
 tools/perf/builtin-sched.c | 932 ++++++++++++++++++++++++++++++++++++++++++-
 tools/perf/perf.h          |   3 +
 2 files changed, 934 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 0f3c65518a2c..abcb40a44151 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -16,6 +16,9 @@
 #include "util/debug.h"
 
+#include "util/strlist.h"
+#include "util/intlist.h"
+#include "asm/bug.h"
 
 #include
 #include
@@ -169,8 +172,49 @@ struct perf_sched {
 	u64		 cpu_last_switched[MAX_CPUS];
 	struct rb_root	 atom_root, sorted_atom_root;
 	struct list_head sort_list, cmp_pid;
+
+	/* options for timehist command */
+	bool		summary;
+	bool		summary_only;
+	bool		show_callchain;
+	unsigned int	max_stack;
+	bool		show_cpu_visual;
+	bool		show_wakeups;
+	/* process and task id's of interest */
+	struct target	target;
+	struct intlist	*pid, *tid;
+};
+
+/* used in symbol filter */
+static const char	*excl_sym_list_str;
+static struct strlist	*excl_sym_list;
+
+/* per thread run time data */
+struct thread_runtime {
+	u64 last_time;    /* time of previous sched in/out event */
+	u64 dt_run;       /* run time */
+	u64 dt_between;   /* time between CPU access (off cpu) */
+	u64 dt_delay;     /* time between wakeup and sched-in */
+
+	u64 ready_to_run; /* time of wakeup */
+
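[Editor's aside: as a reading aid for the output columns above, a hedged sketch of how the three deltas relate to the raw sched event timestamps. The struct and function names are invented for illustration; the real accounting lives in struct thread_runtime and timehist_update_runtime_stats():]

```c
#include <assert.h>
#include <stdint.h>

/* Invented for illustration -- not the perf structures themselves.
 * All times are nanosecond timestamps from the sched tracepoints,
 * assuming wakeup <= sched-in <= sched-out ordering. */
struct slice {
	uint64_t prev_sched_out; /* when the task last left a CPU */
	uint64_t wakeup;         /* sched_wakeup for this slice */
	uint64_t sched_in;       /* sched_switch put the task on CPU */
	uint64_t sched_out;      /* sched_switch took it off again */
};

/* "b/n time": time off CPU between sched-in events (cf. dt_between) */
uint64_t between_time(const struct slice *s)
{
	return s->sched_in - s->prev_sched_out;
}

/* "sch delay": wakeup to actually running (cf. dt_delay) */
uint64_t sched_delay(const struct slice *s)
{
	return s->sched_in - s->wakeup;
}

/* "run time": the on-CPU portion of the slice (cf. dt_run) */
uint64_t run_time(const struct slice *s)
{
	return s->sched_out - s->sched_in;
}
```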