Puranjay! Puranjay Mohan <puranja...@gmail.com> writes:
> This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on RISC-V. > This allows each ftrace callsite to provide an ftrace_ops to the common > ftrace trampoline, allowing each callsite to invoke distinct tracer > functions without the need to fall back to list processing or to > allocate custom trampolines for each callsite. This significantly speeds > up cases where multiple distinct trace functions are used and callsites > are mostly traced by a single tracer. > > The idea and most of the implementation is taken from the ARM64's > implementation of the same feature. The idea is to place a pointer to > the ftrace_ops as a literal at a fixed offset from the function entry > point, which can be recovered by the common ftrace trampoline. Not really a review, but some more background; Another rationale (on-top of the improved per-call performance!) for CALL_OPS was to use it to build ftrace direct call support (which BPF uses a lot!). Mark, please correct me if I'm lying here! On Arm64, CALL_OPS makes it possible to implement direct calls, while only patching one BL instruction -- nice! On RISC-V we cannot use use the same ideas as Arm64 straight off, because the range of jal (compare to BL) is simply too short (+/-1M). So, on RISC-V we need to use a full auipc/jal pair (the text patching story is another chapter, but let's leave that aside for now). Since we have to patch multiple instructions, the cmodx situation doesn't really improve with CALL_OPS. Let's say that we continue building on your patch and implement direct calls on CALL_OPS for RISC-V as well. >From Florent's commit message for direct calls: | There are a few cases to distinguish: | - If a direct call ops is the only one tracing a function: | - If the direct called trampoline is within the reach of a BL | instruction | -> the ftrace patchsite jumps to the trampoline | - Else | -> the ftrace patchsite jumps to the ftrace_caller trampoline which | reads the ops pointer in the patchsite and jumps to the direct | call address stored in the ops | - Else | -> the ftrace patchsite jumps to the ftrace_caller trampoline and its | ops literal points to ftrace_list_ops so it iterates over all | registered ftrace ops, including the direct call ops and calls its | call_direct_funcs handler which stores the direct called | trampoline's address in the ftrace_regs and the ftrace_caller | trampoline will return to that address instead of returning to the | traced function On RISC-V, where auipc/jalr is used, the direct called trampoline would always be reachable, and then first Else-clause would never be entered. This means the the performance for direct calls would be the same as the one we have today (i.e. no regression!). RISC-V does like x86 does (-ish) -- patch multiple instructions, long reach. Arm64 uses CALL_OPS and patch one instruction BL. Now, with this background in mind, compared to what we have today, CALL_OPS would give us (again assuming we're using it for direct calls): * Better performance for tracer per-call (faster ops lookup) GOOD * Larger text size (function alignment + extra nops) BAD * Same direct call performance NEUTRAL * Same complicated text patching required NEUTRAL It would be interesting to see how the per-call performance would improve on x86 with CALL_OPS! ;-) I'm trying to wrap my head if it makes sense to have it on RISC-V, given that we're a bit different from Arm64. Does the scale tip to the GOOD side? Oh, and we really need to see performance numbers on real HW! I have a VF2 that I could try this series on. Björn