Re: [RFC PATCH 1/5] tracing: Define new ftrace event "func_repeats"
On Mon, 8 Mar 2021 16:57:29 +0200 "Yordan Karadzhov (VMware)" wrote: > > If we had: > > > > <...>-37 [004] ...1 2022.303820: gc_worker > > <-process_one_work > > <...>-37 [004] ...1 2022.303820: ___might_sleep > > <-gc_worker > > <...>-37 [004] ...1 2022.303831: ___might_sleep > > <-gc_worker (last ts: 2022.303828 repeats: 127) > > <...>-37 [004] ...1 2022.303831: queue_delayed_work_on > > <-process_one_work > > > > We would know the last time __might_sleep was called. > > > > That is, not only should we save the ip and pip in the trace_func_repeats > > structure, but we should also be storing the last time stamp of the last > > function event that repeated. Otherwise the above looks like the last > > __might_sleep called above happened when the queue_delayed_work_on > > happened, where that may not be the case. > > If we store the last timestamp, this means we will need to use > additional 64b on the buffer, every time we record the "func_repeats" > event. This looks like an overkill to me. > Can we store only the duration of the repeats (the difference between > the timestamp)? This way we can use less memory at the price of having > one extra arithmetic operation. > Alternative approach can be to store only the least-significant bits of > the timestamp. > > What do you think? I like the way you're thinking ;-) Let's take a look at the current event sizes. The function event is defined as: name: function ID: 1 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:unsigned long ip; offset:8; size:8; signed:0; field:unsigned long parent_ip; offset:16; size:8; signed:0; Showing that it's total size is 24 bytes (offset:16 + size:8) and your current repeat event has: ID: 17 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:unsigned long ip; offset:8; size:8; signed:0; field:unsigned long pip;offset:16; size:8; signed:0; field:unsigned long count; offset:24; size:8; signed:0; Which is 32 bytes. Adding another 8 would make it 40. It's bigger than one function event, but still smaller than two, and adding just 4 bytes is still some size more than a single function event. Since this event is new, we could even do another trick (one that the preemptirq events do: ID: 434 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:s32 caller_offs; offset:8; size:4; signed:1; field:s32 parent_offs; offset:12; size:4; signed:1; print fmt: "caller=%pS parent=%pS", (void *)((unsigned long)(_stext) + REC->caller_offs), (void *)((unsigned long)(_stext) + REC->parent_offs) The (_stext) is defined in /proc/kallsyms: # grep stext /proc/kallsyms 8d00 T _stext And the offsets are off of that, which we could do the same thing here. field:s32 ip; offset:8; size:4; signed:1; field:s32 pip; offset:12; size:4; signed:1; field:s16 count;offset:20; size:2; signed:1; field:u16 top_ts; offset:22; size:2; signed:0; field:u32 bottom_ts;offset:24; size:4; signed:0; By putting count down to 2 bytes. Having more than 64K repeats is probably not going to happen, and if it does, we could inject this more than once ;-) And as all events must be 4 byte aligned, we could use 6 bytes for the offset, giving us: 2^40 bits instead of just 2^32 bits, that is, making the time go from 4 seconds (2^32 nanoseconds) to 18 minutes (2^40 nanoseconds). I would grab the time stamp after saving the event to make sure that it doesn't go backwards. It may be a little racy (if an interrupt comes in between), but it's still a "best effort" approach. -- Steve
Re: [RFC PATCH 1/5] tracing: Define new ftrace event "func_repeats"
On 4.03.21 г. 18:38, Steven Rostedt wrote: On Thu, 4 Mar 2021 11:01:37 +0200 "Yordan Karadzhov (VMware)" wrote: Thanks Yordan for doing this! I have some comments below. Hi Steven, Thank you very much for looking into this! Your suggestion makes perfect sense. I only have one clarifying question below. diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h index 4547ac59da61..8007f9b6417f 100644 --- a/kernel/trace/trace_entries.h +++ b/kernel/trace/trace_entries.h @@ -338,3 +338,19 @@ FTRACE_ENTRY(hwlat, hwlat_entry, __entry->nmi_total_ts, __entry->nmi_count) ); + +FTRACE_ENTRY(func_repeats, func_repeats_entry, + + TRACE_FUNC_REPEATS, + + F_STRUCT( + __field(unsigned long, ip ) + __field(unsigned long, pip ) + __field(unsigned long, count ) + ), + + F_printk(" %ps <-%ps\t(repeats:%lu)", +(void *)__entry->ip, +(void *)__entry->pip, +__entry->count) After playing with this a little, I realized that we should also store the last timestamp as well. I think that would be interesting information. <...>-37 [004] ...1 2022.303820: gc_worker <-process_one_work <...>-37 [004] ...1 2022.303820: ___might_sleep <-gc_worker <...>-37 [004] ...1 2022.303831: ___might_sleep <-gc_worker (repeats: 127) <...>-37 [004] ...1 2022.303831: queue_delayed_work_on <-process_one_work The above shows that __might_sleep() was called 128 times, but what I don't get from the above, is when that last call was made. You'll see that the timestamp for the repeat output is the same as the next function shown (queue_delayed_work_on()). But the timestamp for the last call to __might_sleep() is lost, and the repeat event ends up being written when it is detected that there are no more repeats. If we had: <...>-37 [004] ...1 2022.303820: gc_worker <-process_one_work <...>-37 [004] ...1 2022.303820: ___might_sleep <-gc_worker <...>-37 [004] ...1 2022.303831: ___might_sleep <-gc_worker (last ts: 2022.303828 repeats: 127) <...>-37 [004] ...1 2022.303831: queue_delayed_work_on <-process_one_work We would know the last time __might_sleep was called. That is, not only should we save the ip and pip in the trace_func_repeats structure, but we should also be storing the last time stamp of the last function event that repeated. Otherwise the above looks like the last __might_sleep called above happened when the queue_delayed_work_on happened, where that may not be the case. If we store the last timestamp, this means we will need to use additional 64b on the buffer, every time we record the "func_repeats" event. This looks like an overkill to me. Can we store only the duration of the repeats (the difference between the timestamp)? This way we can use less memory at the price of having one extra arithmetic operation. Alternative approach can be to store only the least-significant bits of the timestamp. What do you think? Best regards, Yordan -- Steve
Re: [RFC PATCH 1/5] tracing: Define new ftrace event "func_repeats"
On Thu, 4 Mar 2021 11:01:37 +0200 "Yordan Karadzhov (VMware)" wrote: Thanks Yordan for doing this! I have some comments below. > diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h > index 4547ac59da61..8007f9b6417f 100644 > --- a/kernel/trace/trace_entries.h > +++ b/kernel/trace/trace_entries.h > @@ -338,3 +338,19 @@ FTRACE_ENTRY(hwlat, hwlat_entry, >__entry->nmi_total_ts, >__entry->nmi_count) > ); > + > +FTRACE_ENTRY(func_repeats, func_repeats_entry, > + > + TRACE_FUNC_REPEATS, > + > + F_STRUCT( > + __field(unsigned long, ip ) > + __field(unsigned long, pip ) > + __field(unsigned long, count ) > + ), > + > + F_printk(" %ps <-%ps\t(repeats:%lu)", > + (void *)__entry->ip, > + (void *)__entry->pip, > + __entry->count) After playing with this a little, I realized that we should also store the last timestamp as well. I think that would be interesting information. <...>-37 [004] ...1 2022.303820: gc_worker <-process_one_work <...>-37 [004] ...1 2022.303820: ___might_sleep <-gc_worker <...>-37 [004] ...1 2022.303831: ___might_sleep <-gc_worker (repeats: 127) <...>-37 [004] ...1 2022.303831: queue_delayed_work_on <-process_one_work The above shows that __might_sleep() was called 128 times, but what I don't get from the above, is when that last call was made. You'll see that the timestamp for the repeat output is the same as the next function shown (queue_delayed_work_on()). But the timestamp for the last call to __might_sleep() is lost, and the repeat event ends up being written when it is detected that there are no more repeats. If we had: <...>-37 [004] ...1 2022.303820: gc_worker <-process_one_work <...>-37 [004] ...1 2022.303820: ___might_sleep <-gc_worker <...>-37 [004] ...1 2022.303831: ___might_sleep <-gc_worker (last ts: 2022.303828 repeats: 127) <...>-37 [004] ...1 2022.303831: queue_delayed_work_on <-process_one_work We would know the last time __might_sleep was called. That is, not only should we save the ip and pip in the trace_func_repeats structure, but we should also be storing the last time stamp of the last function event that repeated. Otherwise the above looks like the last __might_sleep called above happened when the queue_delayed_work_on happened, where that may not be the case. -- Steve
[RFC PATCH 1/5] tracing: Define new ftrace event "func_repeats"
The event aims to consolidate the function tracing record in the cases when a single function is called number of times consecutively. while (cond) do_func(); This may happen in various scenarios (busy waiting for example). The new ftrace event can be used to show repeated function events with a single event and save space on the ring buffer Signed-off-by: Yordan Karadzhov (VMware) --- kernel/trace/trace.h | 3 +++ kernel/trace/trace_entries.h | 16 + kernel/trace/trace_output.c | 44 3 files changed, 63 insertions(+) diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index a6446c03cfbc..2be4a56879de 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -45,6 +45,7 @@ enum trace_type { TRACE_BPUTS, TRACE_HWLAT, TRACE_RAW_DATA, + TRACE_FUNC_REPEATS, __TRACE_LAST_TYPE, }; @@ -441,6 +442,8 @@ extern void __ftrace_bad_type(void); TRACE_GRAPH_ENT); \ IF_ASSIGN(var, ent, struct ftrace_graph_ret_entry, \ TRACE_GRAPH_RET); \ + IF_ASSIGN(var, ent, struct func_repeats_entry, \ + TRACE_FUNC_REPEATS); \ __ftrace_bad_type();\ } while (0) diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h index 4547ac59da61..8007f9b6417f 100644 --- a/kernel/trace/trace_entries.h +++ b/kernel/trace/trace_entries.h @@ -338,3 +338,19 @@ FTRACE_ENTRY(hwlat, hwlat_entry, __entry->nmi_total_ts, __entry->nmi_count) ); + +FTRACE_ENTRY(func_repeats, func_repeats_entry, + + TRACE_FUNC_REPEATS, + + F_STRUCT( + __field(unsigned long, ip ) + __field(unsigned long, pip ) + __field(unsigned long, count ) + ), + + F_printk(" %ps <-%ps\t(repeats:%lu)", +(void *)__entry->ip, +(void *)__entry->pip, +__entry->count) +); diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c index 61255bad7e01..af6b066972e9 100644 --- a/kernel/trace/trace_output.c +++ b/kernel/trace/trace_output.c @@ -1373,6 +1373,49 @@ static struct trace_event trace_raw_data_event = { .funcs = _raw_data_funcs, }; +static enum print_line_t +trace_func_repeats_raw(struct trace_iterator *iter, int flags, +struct trace_event *event) +{ + struct func_repeats_entry *field; + struct trace_seq *s = >seq; + + trace_assign_type(field, iter->ent); + + trace_seq_printf(s, "%lu %lu %li\n", +field->pip, +field->ip, +field->count); + + return trace_handle_return(s); +} + +static enum print_line_t +trace_func_repeats_print(struct trace_iterator *iter, int flags, +struct trace_event *event) +{ + struct func_repeats_entry *field; + struct trace_seq *s = >seq; + + trace_assign_type(field, iter->ent); + + seq_print_ip_sym(s, field->ip, flags); + trace_seq_puts(s, " <-"); + seq_print_ip_sym(s, field->pip, flags); + trace_seq_printf(s, " (repeats: %li)\n", field->count); + + return trace_handle_return(s); +} + +static struct trace_event_functions trace_func_repeats_funcs = { + .trace = trace_func_repeats_print, + .raw= trace_func_repeats_raw, +}; + +static struct trace_event trace_func_repeats_event = { + .type = TRACE_FUNC_REPEATS, + .funcs = _func_repeats_funcs, +}; static struct trace_event *events[] __initdata = { _fn_event, @@ -1385,6 +1428,7 @@ static struct trace_event *events[] __initdata = { _print_event, _hwlat_event, _raw_data_event, + _func_repeats_event, NULL }; -- 2.25.1