Andy,

Pulling out the A option:

>> > A) Use auipc/jalr, only patch jalr to take us to a common
>> >    dispatcher/trampoline
>> >
>> >  | <func_trace_target_data_8B> # probably on a data cache-line != func 
>> > .text to avoid ping-pong
>> >  | ...
>> >  | func:
>> >  |   ...make sure ra isn't messed up...
>> >  |   aupic
>> >  |   nop <=> jalr # Text patch point -> common_dispatch
>> >  |   ACTUAL_FUNC
>> >  |
>> >  | common_dispatch:
>> >  |   load <func_trace_target_data_8B> based on ra
>> >  |   jalr
>> >  |   ...
>> >
>> > The auipc is never touched, and will be overhead. Also, we need a mv to
>> > store ra in a scratch register as well -- like Arm. We'll have two insn
>> > per-caller overhead for a disabled caller.
>
> My patch series takes a similar "in-function dispatch" approach. A
> difference is that the <func_trace_target_data_8B_per_function> is
> embedded within each function entry. I'd like to have it moved to a
> run-time allocated array to reduce total text size.

This is what arm64 has as well. It's a 8B + 1-2 dirt cheap movish like
instructions (save ra, prepare jump with auipc). I think that's a
reasonable overhead.

> Another difference is that my series changes the first instruction to
> "j ACTUAL_FUNC" for the "ftrace disable" case. As long as the
> architecture guarantees the atomicity of the first instruction, then
> we are safe. For example, we are safe if the first instruction could
> only be "mv tmp, ra" or "j ACTUAL_FUNC". And since the loaded address is
> always valid, we can fix "mv + jalr" down so we don't have to
> play with the exception handler trick. The guarantee from arch would
> require ziccif (in RVA22) though, but I think it is the same for us
> (unless with stop_machine). For ziccif, I would rather call that out
> during boot than blindly assume.

I'm maybe biased, but I'd prefer the A) over your version with the
unconditional jump. A) has the overhead of two, I'd say, free
instructions (again "Meten is Weten!" ;-)).

> However, one thing I am not very sure is: do we need a destination
> address in a "per-function" manner? It seems like most of the time the
> destination address can only be ftrace_call, or ftrace_regs_call. If
> the number of destination addresses is very few, then we could
> potentially reduce the size of
> <func_trace_target_data_8B_per_function>.

Yes, we do need a per-function manner. BPF, e.g., uses
dynamically/JIT:ed trampolines/targets.



Björn

Reply via email to