Re: [PATCH 1/2] perf sched: Introduce timehist command - v2

2013-12-02 Thread David Ahern

On 12/2/13, 2:33 AM, Pekka Enberg wrote:

I'm seeing rather large "sch delay" numbers on Fedora 19's default
kernel that seem like a data error of some sort:

 4500.663117 [02] fix_client[7216] 0.007
18446744073708.033 1.786



Does this address the large numbers for you?

David
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index abcb40a44151..ae71782064ac 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -1751,8 +1751,12 @@ static void timehist_update_runtime_stats(struct 
thread_runtime *r,
r->dt_between = 0;
if (tprev) {
r->dt_run = t - tprev;
-   if (r->ready_to_run)
-   r->dt_delay = tprev - r->ready_to_run;
+   if (r->ready_to_run) {
+   if (r->ready_to_run > tprev)
+   pr_debug("time travel: wakeup time for task > 
previous sched_switch event\n");
+   else
+   r->dt_delay = tprev - r->ready_to_run;
+   }
 
if (r->last_time > tprev)
pr_debug("time travel: last sched out time for task > 
previous sched_switch event\n");


Re: [PATCH 1/2] perf sched: Introduce timehist command - v2

2013-12-02 Thread David Ahern

On 12/2/13, 2:33 AM, Pekka Enberg wrote:

On 12/02/2013 02:23 AM, David Ahern wrote:

'perf sched timehist' provides an analysis of scheduling events.

Example usage:
 perf sched record -- sleep 1
 perf sched timehist

By default it shows the individual schedule events, including the time
between
sched-in events for the task, the task scheduling delay (time between
wakeup
and actually running) and run time for the task:

time cpu  task name[tid/pid]b/n time sch delay  run time
   -   - - -
79371.874569 [11] gcc[31949]   0.014 0.000 1.148
79371.874591 [10] gcc[31951]   0.000 0.000 0.024
79371.874603 [10] migration/10[59] 3.350 0.004 0.011
79371.874604 [11]1.148 0.000 0.035
79371.874723 [05]0.016 0.000 1.383
79371.874746 [05] gcc[31949]   0.153 0.078 0.022


What does "b/n time" mean?


Time between schedule-in events.



I'm seeing rather large "sch delay" numbers on Fedora 19's default
kernel that seem like a data error of some sort:

 4500.663117 [02] fix_client[7216] 0.007
18446744073708.033 1.786


hmm it means tprev < t_ready-to-run and both are non-zero. ie., 
wakeup time is after the time of the schedule event prior to fix_client 
getting scheduled. Is this a multi-socket box? I can add a sanity check 
and set to 0 if it happens.




Scheduling delays don't seem to be part of "-s" summary output. Why?


Recently added feature to show the scheduling delay. At this point I 
have too many features locally and need to get the basic command into 
perf before adding more -- like stats for scheduling delay.


David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] perf sched: Introduce timehist command - v2

2013-12-02 Thread David Ahern

On 12/2/13, 12:58 AM, Namhyung Kim wrote:

time cpu  task name[tid/pid]b/n time sch delay  run time
   -   - - -
79371.874569 [11] gcc[31949]   0.014 0.000 1.148
79371.874591 [10] gcc[31951]   0.000 0.000 0.024
79371.874603 [10] migration/10[59] 3.350 0.004 0.011
79371.874604 [11]1.148 0.000 0.035
79371.874723 [05]0.016 0.000 1.383
79371.874746 [05] gcc[31949]   0.153 0.078 0.022
...

Times are in msec.usec.


Hmm.. I'm not sure this is right.  It probably confuse users since
timehist_time_str() still uses "sec.usec" format and it looks not
natural for me to use "msec".

Yeah, I see perf stat uses "msec.usec" for result of clock events but
AFAICT it also shows the unit explicitly.  And perf stat -I uses
"sec.nsec" format and perf script also uses "sec.usec" format so there's
a little consistency here.

I think this "msec.usec" format fits well for the scheduling events but
in general "sec.usec" format looks better IMHO.


Arnaldo / Ingo: any thoughts on the units here? sec.usec versus msec.usec?

David

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] perf sched: Introduce timehist command - v2

2013-12-02 Thread Pekka Enberg

On 12/02/2013 02:23 AM, David Ahern wrote:

'perf sched timehist' provides an analysis of scheduling events.

Example usage:
 perf sched record -- sleep 1
 perf sched timehist

By default it shows the individual schedule events, including the time between
sched-in events for the task, the task scheduling delay (time between wakeup
and actually running) and run time for the task:

time cpu  task name[tid/pid]b/n time sch delay  run time
   -   - - -
79371.874569 [11] gcc[31949]   0.014 0.000 1.148
79371.874591 [10] gcc[31951]   0.000 0.000 0.024
79371.874603 [10] migration/10[59] 3.350 0.004 0.011
79371.874604 [11]1.148 0.000 0.035
79371.874723 [05]0.016 0.000 1.383
79371.874746 [05] gcc[31949]   0.153 0.078 0.022


What does "b/n time" mean?

I'm seeing rather large "sch delay" numbers on Fedora 19's default 
kernel that seem like a data error of some sort:


4500.663117 [02] fix_client[7216] 0.007 
18446744073708.033 1.786


Scheduling delays don't seem to be part of "-s" summary output. Why?

Pekka
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] perf sched: Introduce timehist command - v2

2013-12-01 Thread Namhyung Kim
On Sun,  1 Dec 2013 17:23:14 -0700, David Ahern wrote:
> 'perf sched timehist' provides an analysis of scheduling events.
>
> Example usage:
> perf sched record -- sleep 1
> perf sched timehist
>
> By default it shows the individual schedule events, including the time between
> sched-in events for the task, the task scheduling delay (time between wakeup
> and actually running) and run time for the task:
>
>time cpu  task name[tid/pid]b/n time sch delay  run time
>   -   - - -
>79371.874569 [11] gcc[31949]   0.014 0.000 1.148
>79371.874591 [10] gcc[31951]   0.000 0.000 0.024
>79371.874603 [10] migration/10[59] 3.350 0.004 0.011
>79371.874604 [11]1.148 0.000 0.035
>79371.874723 [05]0.016 0.000 1.383
>79371.874746 [05] gcc[31949]   0.153 0.078 0.022
> ...
>
> Times are in msec.usec.

Hmm.. I'm not sure this is right.  It probably confuse users since
timehist_time_str() still uses "sec.usec" format and it looks not
natural for me to use "msec".

Yeah, I see perf stat uses "msec.usec" for result of clock events but
AFAICT it also shows the unit explicitly.  And perf stat -I uses
"sec.nsec" format and perf script also uses "sec.usec" format so there's
a little consistency here.

I think this "msec.usec" format fits well for the scheduling events but
in general "sec.usec" format looks better IMHO.

Thanks,
Namhyung
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/2] perf sched: Introduce timehist command - v2

2013-12-01 Thread David Ahern
'perf sched timehist' provides an analysis of scheduling events.

Example usage:
perf sched record -- sleep 1
perf sched timehist

By default it shows the individual schedule events, including the time between
sched-in events for the task, the task scheduling delay (time between wakeup
and actually running) and run time for the task:

   time cpu  task name[tid/pid]b/n time sch delay  run time
  -   - - -
   79371.874569 [11] gcc[31949]   0.014 0.000 1.148
   79371.874591 [10] gcc[31951]   0.000 0.000 0.024
   79371.874603 [10] migration/10[59] 3.350 0.004 0.011
   79371.874604 [11]1.148 0.000 0.035
   79371.874723 [05]0.016 0.000 1.383
   79371.874746 [05] gcc[31949]   0.153 0.078 0.022
...

Times are in msec.usec.

If callchains were recorded they are appended to the line with a default stack 
depth of 5:

   79371.874569 [11] gcc[31949]  0.14  0.00  0.001148  
wait_for_completion_killable do_fork sys_vfork stub_vfork __vfork
   79371.874591 [10] gcc[31951]  0.00  0.00  0.24  
__cond_resched _cond_resched wait_for_completion stop_one_cpu sched_exec
   79371.874603 [10] migration/10[59]0.003350  0.04  0.11  
smpboot_thread_fn kthread ret_from_fork
   79371.874604 [11]   0.001148  0.00  0.35  
cpu_startup_entry start_secondary
   79371.874723 [05]   0.16  0.00  0.001383  
cpu_startup_entry start_secondary
   79371.874746 [05] gcc[31949]  0.000153  0.78  0.22  
do_wait sys_wait4 system_call_fastpath __GI___waitpid

--no-call-graph can be used to not show the callchains. --max-stack is used
to control the number of frames shown (default of 5). -x/--excl options can
be used to collapse redundant callchains to get more relevant data on screen.

Similar to perf-trace -s and -S can be used to dump a statistical summary
without or with events (respectively). Statistics include min run time,
average run time and max run time. Stats are also shown for run time by
cpu.

The cpu-visual option provides a visual aid for sched switches by cpu:
...
   79371.874569 [11]s  gcc[31949]  0.14  
0.00  0.001148
   79371.874591 [10]   s   gcc[31951]  0.00  
0.00  0.24
   79371.874603 [10]   s   migration/10[59]0.003350  
0.04  0.11
   79371.874604 [11]i0.001148  
0.00  0.35
   79371.874723 [05]  i  0.16  
0.00  0.001383
   79371.874746 [05]  sgcc[31949]  0.000153  
0.78  0.22
...

Additional options:
- -w option can be used to include a dump of wakeup events
- -c comm to only display events and stats for the given comm (csv list)
- -p/-t to only display events and stats for given pid/tid (csv list)

v2:
- Addressed Namhyung's comments
- changed times to be in msec.usec versus sec.usec to be consistent with
  other perf commands

Signed-off-by: David Ahern 
Cc: Ingo Molnar 
Cc: Frederic Weisbecker 
Cc: Peter Zijlstra 
Cc: Mike Galbraith 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Stephane Eranian 
Cc: Pekka Enberg 
---
 tools/perf/builtin-sched.c | 932 -
 tools/perf/perf.h  |   3 +
 2 files changed, 934 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 0f3c65518a2c..abcb40a44151 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -16,6 +16,9 @@
 
 #include "util/debug.h"
 
+#include "util/strlist.h"
+#include "util/intlist.h"
+#include "asm/bug.h"
 #include 
 #include 
 
@@ -169,8 +172,49 @@ struct perf_sched {
u64  cpu_last_switched[MAX_CPUS];
struct rb_root   atom_root, sorted_atom_root;
struct list_head sort_list, cmp_pid;
+
+   /* options for timehist command */
+   boolsummary;
+   boolsummary_only;
+   boolshow_callchain;
+   unsigned intmax_stack;
+   boolshow_cpu_visual;
+   boolshow_wakeups;
+   /* process and task id's of interest */
+   struct target target;
+   struct intlist  *pid, *tid;
+};
+
+/* used in symbol filter */
+static const char  *excl_sym_list_str;
+static struct strlist  *excl_sym_list;
+
+/* per thread run time data */
+struct thread_runtime {
+   u64 last_time;  /* time of previous sched in/out event */
+   u64 dt_run; /* run time */
+   u64 dt_between; /* time between CPU access (off cpu) */
+   u64 dt_delay;   /* time between wakeup and sched-in */
+   u64 ready_to_run;   /* time of wakeup */
+
+