Hey, On Tue, Apr 08, 2025 at 02:08:34AM +0800, Andy Chiu wrote: > From: Puranjay Mohan <[email protected]> > > This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on RISC-V. > This allows each ftrace callsite to provide an ftrace_ops to the common > ftrace trampoline, allowing each callsite to invoke distinct tracer > functions without the need to fall back to list processing or to > allocate custom trampolines for each callsite. This significantly speeds > up cases where multiple distinct trace functions are used and callsites > are mostly traced by a single tracer. > > The idea and most of the implementation is taken from the ARM64's > implementation of the same feature. The idea is to place a pointer to > the ftrace_ops as a literal at a fixed offset from the function entry > point, which can be recovered by the common ftrace trampoline. > > We use -fpatchable-function-entry to reserve 8 bytes above the function > entry by emitting 2 4 byte or 4 2 byte nops depending on the presence of > CONFIG_RISCV_ISA_C. These 8 bytes are patched at runtime with a pointer > to the associated ftrace_ops for that callsite. Functions are aligned to > 8 bytes to make sure that the accesses to this literal are atomic. > > This approach allows for directly invoking ftrace_ops::func even for > ftrace_ops which are dynamically-allocated (or part of a module), > without going via ftrace_ops_list_func. > > We've benchamrked this with the ftrace_ops sample module on Spacemit K1 > Jupiter: > > Without this patch: > > baseline (Linux rivos 6.14.0-09584-g7d06015d936c #3 SMP Sat Mar 29 > +-----------------------+-----------------+----------------------------+ > | Number of tracers | Total time (ns) | Per-call average time | > |-----------------------+-----------------+----------------------------| > | Relevant | Irrelevant | 100000 calls | Total (ns) | Overhead (ns) | > |----------+------------+-----------------+------------+---------------| > | 0 | 0 | 1357958 | 13 | - | > | 0 | 1 | 1302375 | 13 | - | > | 0 | 2 | 1302375 | 13 | - | > | 0 | 10 | 1379084 | 13 | - | > | 0 | 100 | 1302458 | 13 | - | > | 0 | 200 | 1302333 | 13 | - | > |----------+------------+-----------------+------------+---------------| > | 1 | 0 | 13677833 | 136 | 123 | > | 1 | 1 | 18500916 | 185 | 172 | > | 1 | 2 | 22856459 | 228 | 215 | > | 1 | 10 | 58824709 | 588 | 575 | > | 1 | 100 | 505141584 | 5051 | 5038 | > | 1 | 200 | 1580473126 | 15804 | 15791 | > |----------+------------+-----------------+------------+---------------| > | 1 | 0 | 13561000 | 135 | 122 | > | 2 | 0 | 19707292 | 197 | 184 | > | 10 | 0 | 67774750 | 677 | 664 | > | 100 | 0 | 714123125 | 7141 | 7128 | > | 200 | 0 | 1918065668 | 19180 | 19167 | > +----------+------------+-----------------+------------+---------------+ > > Note: per-call overhead is estimated relative to the baseline case with > 0 relevant tracers and 0 irrelevant tracers. > > With this patch: > > v4-rc4 (Linux rivos 6.14.0-09598-gd75747611c93 #4 SMP Sat Mar 29 > +-----------------------+-----------------+----------------------------+ > | Number of tracers | Total time (ns) | Per-call average time | > |-----------------------+-----------------+----------------------------| > | Relevant | Irrelevant | 100000 calls | Total (ns) | Overhead (ns) | > |----------+------------+-----------------+------------+---------------| > | 0 | 0 | 1459917 | 14 | - | > | 0 | 1 | 1408000 | 14 | - | > | 0 | 2 | 1383792 | 13 | - | > | 0 | 10 | 1430709 | 14 | - | > | 0 | 100 | 1383791 | 13 | - | > | 0 | 200 | 1383750 | 13 | - | > |----------+------------+-----------------+------------+---------------| > | 1 | 0 | 5238041 | 52 | 38 | > | 1 | 1 | 5228542 | 52 | 38 | > | 1 | 2 | 5325917 | 53 | 40 | > | 1 | 10 | 5299667 | 52 | 38 | > | 1 | 100 | 5245250 | 52 | 39 | > | 1 | 200 | 5238459 | 52 | 39 | > |----------+------------+-----------------+------------+---------------| > | 1 | 0 | 5239083 | 52 | 38 | > | 2 | 0 | 19449417 | 194 | 181 | > | 10 | 0 | 67718584 | 677 | 663 | > | 100 | 0 | 709840708 | 7098 | 7085 | > | 200 | 0 | 2203580626 | 22035 | 22022 | > +----------+------------+-----------------+------------+---------------+ > > Note: per-call overhead is estimated relative to the baseline case with > 0 relevant tracers and 0 irrelevant tracers. > > As can be seen from the above: > > a) Whenever there is a single relevant tracer function associated with a > tracee, the overhead of invoking the tracer is constant, and does not > scale with the number of tracers which are *not* associated with that > tracee. > > b) The overhead for a single relevant tracer has dropped to ~1/3 of the > overhead prior to this series (from 122ns to 38ns). This is largely > due to permitting calls to dynamically-allocated ftrace_ops without > going through ftrace_ops_list_func. > > Signed-off-by: Puranjay Mohan <[email protected]> > > [update kconfig, asm, refactor] > > Signed-off-by: Andy Chiu <[email protected]> > Tested-by: Björn Töpel <[email protected]>
I bisected a boot failure to this commit [c217157bcd1df ("riscv:
Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS")] yesterday, that appears
to be affecting all LLVM versions that I currently have installed. From
some initial testing of Kconfig options, it looks like the issue is
CFI_CLANG related because when I disable CFI_CLANG things work once
more. Since this option depends on !CFI_CLANG, but is def_bool y, I
modified Kconfig to force disable it at all times and tested
!DYNAMIC_FTRACE_WITH_CALL_OPS && !CFG_CLANG, which did boot.
I dunno anything about what's going on in this patch, but so little in
it relates to having DYNAMIC_FTRACE_WITH_CALL_OPS, that I was able to
figure out that the problem is -fpatchable-function-entry=8,4
FWIW, if anyone checks out this commit directly, you'll need to
cherry-pick commit e9d86b8e17e72 ("scripts: Do not strip .rela.dyn
section"), as the base of the branch that c217157bcd1df is on is
v6.15-rc3, which is in itself broken in turn by the issue fixed by
e9d86b8e17e72. Probably not someone anyone will do, but made for an
awful time trying to figure out what commit was at fault!
Cheers,
Conor.
signature.asc
Description: PGP signature
