On Mon, Feb 23, 2026 at 3:28 PM Conor Dooley <[email protected]> wrote:
>
> On Mon, Feb 23, 2026 at 03:18:17PM +0000, Puranjay Mohan wrote:
> > On Sat, Feb 21, 2026 at 12:15 PM Conor Dooley <[email protected]> wrote:
> > >
> > > Hey,
> > >
> > > On Tue, Apr 08, 2025 at 02:08:34AM +0800, Andy Chiu wrote:
> > > > From: Puranjay Mohan <[email protected]>
> > > >
> > > > This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on RISC-V.
> > > > This allows each ftrace callsite to provide an ftrace_ops to the common
> > > > ftrace trampoline, allowing each callsite to invoke distinct tracer
> > > > functions without the need to fall back to list processing or to
> > > > allocate custom trampolines for each callsite. This significantly speeds
> > > > up cases where multiple distinct trace functions are used and callsites
> > > > are mostly traced by a single tracer.
> > > >
> > > > The idea and most of the implementation is taken from the ARM64's
> > > > implementation of the same feature. The idea is to place a pointer to
> > > > the ftrace_ops as a literal at a fixed offset from the function entry
> > > > point, which can be recovered by the common ftrace trampoline.
> > > >
> > > > We use -fpatchable-function-entry to reserve 8 bytes above the function
> > > > entry by emitting 2 4 byte or 4 2 byte nops depending on the presence
> > > > of
> > > > CONFIG_RISCV_ISA_C. These 8 bytes are patched at runtime with a pointer
> > > > to the associated ftrace_ops for that callsite. Functions are aligned to
> > > > 8 bytes to make sure that the accesses to this literal are atomic.
> > > >
> > > > This approach allows for directly invoking ftrace_ops::func even for
> > > > ftrace_ops which are dynamically-allocated (or part of a module),
> > > > without going via ftrace_ops_list_func.
> > > >
> > > > We've benchamrked this with the ftrace_ops sample module on Spacemit K1
> > > > Jupiter:
> > > >
> > > > Without this patch:
> > > >
> > > > baseline (Linux rivos 6.14.0-09584-g7d06015d936c #3 SMP Sat Mar 29
> > > > +-----------------------+-----------------+----------------------------+
> > > > | Number of tracers | Total time (ns) | Per-call average time |
> > > > |-----------------------+-----------------+----------------------------|
> > > > | Relevant | Irrelevant | 100000 calls | Total (ns) | Overhead (ns) |
> > > > |----------+------------+-----------------+------------+---------------|
> > > > | 0 | 0 | 1357958 | 13 | - |
> > > > | 0 | 1 | 1302375 | 13 | - |
> > > > | 0 | 2 | 1302375 | 13 | - |
> > > > | 0 | 10 | 1379084 | 13 | - |
> > > > | 0 | 100 | 1302458 | 13 | - |
> > > > | 0 | 200 | 1302333 | 13 | - |
> > > > |----------+------------+-----------------+------------+---------------|
> > > > | 1 | 0 | 13677833 | 136 | 123 |
> > > > | 1 | 1 | 18500916 | 185 | 172 |
> > > > | 1 | 2 | 22856459 | 228 | 215 |
> > > > | 1 | 10 | 58824709 | 588 | 575 |
> > > > | 1 | 100 | 505141584 | 5051 | 5038 |
> > > > | 1 | 200 | 1580473126 | 15804 | 15791 |
> > > > |----------+------------+-----------------+------------+---------------|
> > > > | 1 | 0 | 13561000 | 135 | 122 |
> > > > | 2 | 0 | 19707292 | 197 | 184 |
> > > > | 10 | 0 | 67774750 | 677 | 664 |
> > > > | 100 | 0 | 714123125 | 7141 | 7128 |
> > > > | 200 | 0 | 1918065668 | 19180 | 19167 |
> > > > +----------+------------+-----------------+------------+---------------+
> > > >
> > > > Note: per-call overhead is estimated relative to the baseline case with
> > > > 0 relevant tracers and 0 irrelevant tracers.
> > > >
> > > > With this patch:
> > > >
> > > > v4-rc4 (Linux rivos 6.14.0-09598-gd75747611c93 #4 SMP Sat Mar 29
> > > > +-----------------------+-----------------+----------------------------+
> > > > | Number of tracers | Total time (ns) | Per-call average time |
> > > > |-----------------------+-----------------+----------------------------|
> > > > | Relevant | Irrelevant | 100000 calls | Total (ns) | Overhead (ns) |
> > > > |----------+------------+-----------------+------------+---------------|
> > > > | 0 | 0 | 1459917 | 14 | - |
> > > > | 0 | 1 | 1408000 | 14 | - |
> > > > | 0 | 2 | 1383792 | 13 | - |
> > > > | 0 | 10 | 1430709 | 14 | - |
> > > > | 0 | 100 | 1383791 | 13 | - |
> > > > | 0 | 200 | 1383750 | 13 | - |
> > > > |----------+------------+-----------------+------------+---------------|
> > > > | 1 | 0 | 5238041 | 52 | 38 |
> > > > | 1 | 1 | 5228542 | 52 | 38 |
> > > > | 1 | 2 | 5325917 | 53 | 40 |
> > > > | 1 | 10 | 5299667 | 52 | 38 |
> > > > | 1 | 100 | 5245250 | 52 | 39 |
> > > > | 1 | 200 | 5238459 | 52 | 39 |
> > > > |----------+------------+-----------------+------------+---------------|
> > > > | 1 | 0 | 5239083 | 52 | 38 |
> > > > | 2 | 0 | 19449417 | 194 | 181 |
> > > > | 10 | 0 | 67718584 | 677 | 663 |
> > > > | 100 | 0 | 709840708 | 7098 | 7085 |
> > > > | 200 | 0 | 2203580626 | 22035 | 22022 |
> > > > +----------+------------+-----------------+------------+---------------+
> > > >
> > > > Note: per-call overhead is estimated relative to the baseline case with
> > > > 0 relevant tracers and 0 irrelevant tracers.
> > > >
> > > > As can be seen from the above:
> > > >
> > > > a) Whenever there is a single relevant tracer function associated with
> > > > a
> > > > tracee, the overhead of invoking the tracer is constant, and does
> > > > not
> > > > scale with the number of tracers which are *not* associated with
> > > > that
> > > > tracee.
> > > >
> > > > b) The overhead for a single relevant tracer has dropped to ~1/3 of the
> > > > overhead prior to this series (from 122ns to 38ns). This is largely
> > > > due to permitting calls to dynamically-allocated ftrace_ops without
> > > > going through ftrace_ops_list_func.
> > > >
> > > > Signed-off-by: Puranjay Mohan <[email protected]>
> > > >
> > > > [update kconfig, asm, refactor]
> > > >
> > > > Signed-off-by: Andy Chiu <[email protected]>
> > > > Tested-by: Björn Töpel <[email protected]>
> > >
> > > I bisected a boot failure to this commit [c217157bcd1df ("riscv:
> > > Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS")] yesterday, that appears
> > > to be affecting all LLVM versions that I currently have installed. From
> > > some initial testing of Kconfig options, it looks like the issue is
> > > CFI_CLANG related because when I disable CFI_CLANG things work once
> > > more. Since this option depends on !CFI_CLANG, but is def_bool y, I
> > > modified Kconfig to force disable it at all times and tested
> > > !DYNAMIC_FTRACE_WITH_CALL_OPS && !CFG_CLANG, which did boot.
> > >
> > > I dunno anything about what's going on in this patch, but so little in
> > > it relates to having DYNAMIC_FTRACE_WITH_CALL_OPS, that I was able to
> > > figure out that the problem is -fpatchable-function-entry=8,4
> > >
> >
> > DYNAMIC_FTRACE_WITH_CALL_OPS can't work together with CFI_CLANG.
> >
> > arm64 has:
> >
> > select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS \
> > if (DYNAMIC_FTRACE_WITH_ARGS && !CFI && \
> > (CC_IS_CLANG || !CC_OPTIMIZE_FOR_SIZE))
> >
> > would need something similar for riscv if not already done.
>
>
> I think you've misunderstood my email. We already have:
>
> select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if (DYNAMIC_FTRACE_WITH_ARGS
> && !CFI)
>
> The problem is that the patch broke using CFI_CLANG, due to the
> fpatchable-function-entry change.
Yeah, sorry I did not see the patch,
the original one I sent had:
+ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS), y)
+ifeq ($(CONFIG_RISCV_ISA_C),y)
+ CC_FLAGS_FTRACE := -fpatchable-function-entry=8,4
+else
+ CC_FLAGS_FTRACE := -fpatchable-function-entry=4,2
+endif
+else
The basic Idea is that we can't put nops before the function entry
when using CFI_CLANG, because they both interfere with each other.
the fix should be something like:
-- >8 --
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 371da75a47f9..94100810a6a4 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -14,11 +14,19 @@ endif
ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
LDFLAGS_vmlinux += --no-relax
KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
+ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS),y)
ifeq ($(CONFIG_RISCV_ISA_C),y)
CC_FLAGS_FTRACE := -fpatchable-function-entry=8,4
else
CC_FLAGS_FTRACE := -fpatchable-function-entry=4,2
endif
+else
+ifeq ($(CONFIG_RISCV_ISA_C),y)
+ CC_FLAGS_FTRACE := -fpatchable-function-entry=4
+else
+ CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+endif
+endif
endif
ifeq ($(CONFIG_CMODEL_MEDLOW),y)
-- 8< --
Thanks,
Puranjay