Andy Chiu <andy.c...@sifive.com> writes: > On Thu, Mar 21, 2024 at 4:48 PM Björn Töpel <bj...@kernel.org> wrote: >> >> Andy, >> >> Pulling out the A option: >> >> >> > A) Use auipc/jalr, only patch jalr to take us to a common >> >> > dispatcher/trampoline >> >> > >> >> > | <func_trace_target_data_8B> # probably on a data cache-line != func >> >> > .text to avoid ping-pong >> >> > | ... >> >> > | func: >> >> > | ...make sure ra isn't messed up... >> >> > | aupic >> >> > | nop <=> jalr # Text patch point -> common_dispatch >> >> > | ACTUAL_FUNC >> >> > | >> >> > | common_dispatch: >> >> > | load <func_trace_target_data_8B> based on ra >> >> > | jalr >> >> > | ... >> >> > >> >> > The auipc is never touched, and will be overhead. Also, we need a mv to >> >> > store ra in a scratch register as well -- like Arm. We'll have two insn >> >> > per-caller overhead for a disabled caller. >> > >> > My patch series takes a similar "in-function dispatch" approach. A >> > difference is that the <func_trace_target_data_8B_per_function> is >> > embedded within each function entry. I'd like to have it moved to a >> > run-time allocated array to reduce total text size. >> >> This is what arm64 has as well. It's a 8B + 1-2 dirt cheap movish like >> instructions (save ra, prepare jump with auipc). I think that's a >> reasonable overhead. >> >> > Another difference is that my series changes the first instruction to >> > "j ACTUAL_FUNC" for the "ftrace disable" case. As long as the >> > architecture guarantees the atomicity of the first instruction, then >> > we are safe. For example, we are safe if the first instruction could >> > only be "mv tmp, ra" or "j ACTUAL_FUNC". And since the loaded address is >> > always valid, we can fix "mv + jalr" down so we don't have to >> > play with the exception handler trick. The guarantee from arch would >> > require ziccif (in RVA22) though, but I think it is the same for us >> > (unless with stop_machine). For ziccif, I would rather call that out >> > during boot than blindly assume. >> >> I'm maybe biased, but I'd prefer the A) over your version with the >> unconditional jump. A) has the overhead of two, I'd say, free >> instructions (again "Meten is Weten!" ;-)). > > Yes, I'd also prefer A for less overall patch size. We can also > optimize the overhead with a direct jump if that makes sense. Though, > we need to sort out a way to map functions to corresponding > trampolines. A direct way I could image is CALL_OPS'ish patching > style, if the ftrace destination has to be patched in a per-function > manner. For example: > > <index-in-dispatch-list> > func_symbol: > auipc t0, common_dispatch:high <=> j actual_func: > jalr t0, common_dispatch:low(t0) > > common_dispatch: > load t1, index + dispatch-list > ld t1, 0(t1) > jr t1
Yup, exactly like that (but I'd put the acutal target ptr in there, instead of an additional indirection. Exactly what Mark does for arm64). When we enter the common_dispatch, the ptr <index-in-dispatch-list> would be -12(t0). As for patching auipc or jalr, I guess we need to measure what's best. My knee-jerk would be always auipc is better than jump -- but let's measure. ;-) Björn