On Mon, 8 Mar 2021 16:57:29 +0200
"Yordan Karadzhov (VMware)" <[email protected]> wrote:

> > If we had:
> > 
> >             <...>-37      [004] ...1  2022.303820: gc_worker 
> > <-process_one_work
> >             <...>-37      [004] ...1  2022.303820: ___might_sleep 
> > <-gc_worker
> >             <...>-37      [004] ...1  2022.303831: ___might_sleep 
> > <-gc_worker (last ts: 2022.303828 repeats: 127)
> >             <...>-37      [004] ...1  2022.303831: queue_delayed_work_on 
> > <-process_one_work
> > 
> > We would know the last time __might_sleep was called.
> > 
> > That is, not only should we save the ip and pip in the trace_func_repeats
> > structure, but we should also be storing the last time stamp of the last
> > function event that repeated. Otherwise the above looks like the last
> > __might_sleep called above happened when the queue_delayed_work_on
> > happened, where that may not be the case.  
> 
> If we store the last timestamp, this means we will need to use 
> additional 64b on the buffer, every time we record the "func_repeats" 
> event. This looks like an overkill to me.
> Can we store only the duration of the repeats (the difference between 
> the timestamp)? This way we can use less memory at the price of having 
> one extra arithmetic operation.
> Alternative approach can be to store only the least-significant bits of 
> the timestamp.
> 
> What do you think?

I like the way you're thinking ;-)

Let's take a look at the current event sizes.

The function event is defined as:

name: function
ID: 1
format:
        field:unsigned short common_type;       offset:0;       size:2; 
signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; 
signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; 
signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:unsigned long ip; offset:8;       size:8; signed:0;
        field:unsigned long parent_ip;  offset:16;      size:8; signed:0;

Showing that it's total size is 24 bytes (offset:16 + size:8)

and your current repeat event has:

ID: 17
format:
        field:unsigned short common_type;       offset:0;       size:2; 
signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; 
signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; 
signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:unsigned long ip; offset:8;       size:8; signed:0;
        field:unsigned long pip;        offset:16;      size:8; signed:0;
        field:unsigned long count;      offset:24;      size:8; signed:0;

Which is 32 bytes. Adding another 8 would make it 40. It's bigger than one
function event, but still smaller than two, and adding just 4 bytes is
still some size more than a single function event.

Since this event is new, we could even do another trick (one that the
preemptirq events do:

ID: 434
format:
        field:unsigned short common_type;       offset:0;       size:2; 
signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; 
signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; 
signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:s32 caller_offs;  offset:8;       size:4; signed:1;
        field:s32 parent_offs;  offset:12;      size:4; signed:1;

print fmt: "caller=%pS parent=%pS", (void *)((unsigned long)(_stext) + 
REC->caller_offs), (void *)((unsigned long)(_stext) + REC->parent_offs)

The (_stext) is defined in /proc/kallsyms:

# grep stext /proc/kallsyms 
ffffffff8d000000 T _stext

And the offsets are off of that, which we could do the same thing here.

        field:s32 ip;   offset:8;       size:4; signed:1;
        field:s32 pip;  offset:12;      size:4; signed:1;
        field:s16 count;        offset:20;      size:2; signed:1;
        field:u16 top_ts;       offset:22;      size:2; signed:0;
        field:u32 bottom_ts;    offset:24;      size:4; signed:0;

By putting count down to 2 bytes. Having more than 64K repeats is
probably not going to happen, and if it does, we could inject this more
than once ;-)

And as all events must be 4 byte aligned, we could use 6 bytes for the
offset, giving us: 2^40 bits instead of just 2^32 bits, that is, making the
time go from 4 seconds (2^32 nanoseconds) to 18 minutes (2^40 nanoseconds).

I would grab the time stamp after saving the event to make sure that it
doesn't go backwards. It may be a little racy (if an interrupt comes in
between), but it's still a "best effort" approach.

-- Steve

Reply via email to