[PATCH 2/2] MAINTAINERS: Update powerpc BPF JIT maintainers

2024-07-14 Thread Naveen N Rao
Hari Bathini has been updating and maintaining the powerpc BPF JIT for
a while now, and Christophe Leroy has been doing the same for 32-bit
powerpc. Add them as maintainers for the powerpc BPF JIT.

I am no longer actively looking into the powerpc BPF JIT. Change my role
to that of a reviewer so that I can help with the odd query.

Signed-off-by: Naveen N Rao 
---
 MAINTAINERS | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 05f14b67cd74..c7a931ee7a2e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3878,8 +3878,10 @@ S:   Odd Fixes
 F: drivers/net/ethernet/netronome/nfp/bpf/
 
 BPF JIT for POWERPC (32-BIT AND 64-BIT)
-M: Naveen N Rao 
 M: Michael Ellerman 
+M: Hari Bathini 
+M: Christophe Leroy 
+R: Naveen N Rao 
 L: b...@vger.kernel.org
 S: Supported
 F: arch/powerpc/net/
-- 
2.45.2



[PATCH 1/2] MAINTAINERS: Update email address of Naveen

2024-07-14 Thread Naveen N Rao
I have switched to using my @kernel.org id for my contributions. Update
MAINTAINERS and mailmap to reflect the same.

Cc: Naveen N. Rao 
Signed-off-by: Naveen N Rao 
---
 .mailmap| 2 ++
 MAINTAINERS | 6 +++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/.mailmap b/.mailmap
index 81ac1e17ac3c..289011ebca00 100644
--- a/.mailmap
+++ b/.mailmap
@@ -473,6 +473,8 @@ Nadia Yvette Chambers  William Lee 
Irwin III  
 Naoya Horiguchi  
 Nathan Chancellor  
+Naveen N Rao  
+Naveen N Rao  
 Neeraj Upadhyay  
 Neeraj Upadhyay  
 Neil Armstrong  
diff --git a/MAINTAINERS b/MAINTAINERS
index fa32e3c035c2..05f14b67cd74 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3878,7 +3878,7 @@ S:Odd Fixes
 F: drivers/net/ethernet/netronome/nfp/bpf/
 
 BPF JIT for POWERPC (32-BIT AND 64-BIT)
-M: Naveen N. Rao 
+M: Naveen N Rao 
 M: Michael Ellerman 
 L: b...@vger.kernel.org
 S: Supported
@@ -12332,7 +12332,7 @@ F:  mm/kmsan/
 F: scripts/Makefile.kmsan
 
 KPROBES
-M: Naveen N. Rao 
+M: Naveen N Rao 
 M: Anil S Keshavamurthy 
 M: "David S. Miller" 
 M: Masami Hiramatsu 
@@ -12708,7 +12708,7 @@ LINUX FOR POWERPC (32-BIT AND 64-BIT)
 M: Michael Ellerman 
 R: Nicholas Piggin 
 R: Christophe Leroy 
-R:     Naveen N. Rao 
+R:     Naveen N Rao 
 L: linuxppc-dev@lists.ozlabs.org
 S: Supported
 W: https://github.com/linuxppc/wiki/wiki

base-commit: 582b0e554593e530b1386eacafee6c412c5673cc
-- 
2.45.2



[RFC PATCH v4 02/17] powerpc/kprobes: Use ftrace to determine if a probe is at function entry

2024-07-14 Thread Naveen N Rao
Rather than hard-coding the offset into a function that is used to
determine if a kprobe is at function entry, use ftrace_location() to
determine the ftrace location within the function and treat all
instructions up to that offset as being at function entry.

For functions that cannot be traced, we fall back to using a fixed
offset of 8 (two instructions) to categorize a probe as being at
function entry for 64-bit ELFv2, unless we are using pcrel.
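
A condensed sketch of the resulting check (mirroring the hunk below;
the example offsets are illustrative and depend on the profiling
sequence the toolchain emits):

	static bool arch_kprobe_on_func_entry(unsigned long addr, unsigned long offset)
	{
		unsigned long ip = ftrace_location(addr);	/* 0 if addr is not traceable */

		if (ip)
			return offset <= (ip - addr);	/* e.g. offsets 0, 4, 8 if ip == addr + 8 */
		if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) && !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
			return offset <= 8;		/* fixed fallback: two instructions */
		return !offset;
	}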

Acked-by: Masami Hiramatsu (Google) 
Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/kprobes.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 14c5ddec3056..ca204f4f21c1 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -105,24 +105,22 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, 
unsigned int offset)
return addr;
 }
 
-static bool arch_kprobe_on_func_entry(unsigned long offset)
+static bool arch_kprobe_on_func_entry(unsigned long addr, unsigned long offset)
 {
-#ifdef CONFIG_PPC64_ELF_ABI_V2
-#ifdef CONFIG_KPROBES_ON_FTRACE
-   return offset <= 16;
-#else
-   return offset <= 8;
-#endif
-#else
+   unsigned long ip = ftrace_location(addr);
+
+   if (ip)
+   return offset <= (ip - addr);
+   if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) && 
!IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
+   return offset <= 8;
return !offset;
-#endif
 }
 
 /* XXX try and fold the magic of kprobe_lookup_name() in this */
 kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long 
offset,
 bool *on_func_entry)
 {
-   *on_func_entry = arch_kprobe_on_func_entry(offset);
+   *on_func_entry = arch_kprobe_on_func_entry(addr, offset);
return (kprobe_opcode_t *)(addr + offset);
 }
 
-- 
2.45.2



[RFC PATCH v4 01/17] powerpc/trace: Account for -fpatchable-function-entry support by toolchain

2024-07-14 Thread Naveen N Rao
So far, we have relied on the fact that gcc supports both
-mprofile-kernel and -fpatchable-function-entry, while clang supports
neither. Our Makefile only checks for CONFIG_MPROFILE_KERNEL to decide
which files to build. Clang has a feature request out [*] to implement
-fpatchable-function-entry, and is unlikely to support -mprofile-kernel.

Update our Makefile checks so that we pick up the correct files to build
once clang gains support for -fpatchable-function-entry.

[*] https://github.com/llvm/llvm-project/issues/57031

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/Makefile | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/trace/Makefile 
b/arch/powerpc/kernel/trace/Makefile
index 125f4ca588b9..d6c3885453bd 100644
--- a/arch/powerpc/kernel/trace/Makefile
+++ b/arch/powerpc/kernel/trace/Makefile
@@ -9,12 +9,15 @@ CFLAGS_REMOVE_ftrace.o = $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_ftrace_64_pg.o = $(CC_FLAGS_FTRACE)
 endif
 
-obj32-$(CONFIG_FUNCTION_TRACER)+= ftrace.o ftrace_entry.o
-ifdef CONFIG_MPROFILE_KERNEL
-obj64-$(CONFIG_FUNCTION_TRACER)+= ftrace.o ftrace_entry.o
+ifdef CONFIG_FUNCTION_TRACER
+obj32-y+= ftrace.o ftrace_entry.o
+ifeq ($(CONFIG_MPROFILE_KERNEL)$(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY),)
+obj64-y+= ftrace_64_pg.o 
ftrace_64_pg_entry.o
 else
-obj64-$(CONFIG_FUNCTION_TRACER)+= ftrace_64_pg.o 
ftrace_64_pg_entry.o
+obj64-y+= ftrace.o ftrace_entry.o
 endif
+endif
+
 obj-$(CONFIG_TRACING)  += trace_clock.o
 
 obj-$(CONFIG_PPC64)+= $(obj64-y)
-- 
2.45.2



[RFC PATCH v4 10/17] powerpc/ftrace: Add a postlink script to validate function tracer

2024-07-14 Thread Naveen N Rao
The function tracer on powerpc can only work with a vmlinux .text size
of up to ~64MB, since the powerpc branch instruction has a limited
relative branch range of +/-32MB. Today, this is only detected on kernel
boot when ftrace is initialized. Add a post-link script to check the
size of .text so that we can detect this at build time, and break the
build if necessary.

We add a dependency on !COMPILE_TEST for CONFIG_HAVE_FUNCTION_TRACER so
that allyesconfig and other test builds can continue to work without
enabling ftrace.
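
For reference (not part of the patch): a relative branch such as 'bl'
encodes a 24-bit signed word offset, giving a reach of +/-32MiB. The
~64MB limit follows from every function having to reach either
ftrace_caller() directly or the ftrace_tramp_text stub at the end of
kernel text. A minimal sketch of the range check, assuming byte offsets:

	/* illustrative only: 2^23 instructions x 4 bytes = 32MiB each way */
	static bool within_branch_range(long offset)
	{
		return offset >= -0x2000000L && offset < 0x2000000L;
	}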

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/Kconfig   |  2 +-
 arch/powerpc/Makefile.postlink |  8 ++
 arch/powerpc/tools/ftrace_check.sh | 45 ++
 3 files changed, 54 insertions(+), 1 deletion(-)
 create mode 100755 arch/powerpc/tools/ftrace_check.sh

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index f8891fbe7c16..68f0e7a5576f 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -244,7 +244,7 @@ config PPC
select HAVE_FUNCTION_DESCRIPTORSif PPC64_ELF_ABI_V1
select HAVE_FUNCTION_ERROR_INJECTION
select HAVE_FUNCTION_GRAPH_TRACER
-   select HAVE_FUNCTION_TRACER if PPC64 || (PPC32 && CC_IS_GCC)
+   select HAVE_FUNCTION_TRACER if !COMPILE_TEST && (PPC64 || 
(PPC32 && CC_IS_GCC))
select HAVE_GCC_PLUGINS if GCC_VERSION >= 50200   # 
plugin support on gcc <= 5.1 is buggy on PPC
select HAVE_GENERIC_VDSO
select HAVE_HARDLOCKUP_DETECTOR_ARCHif PPC_BOOK3S_64 && SMP
diff --git a/arch/powerpc/Makefile.postlink b/arch/powerpc/Makefile.postlink
index ae5a4256b03d..bb601be36173 100644
--- a/arch/powerpc/Makefile.postlink
+++ b/arch/powerpc/Makefile.postlink
@@ -24,6 +24,9 @@ else
$(CONFIG_SHELL) $(srctree)/arch/powerpc/tools/relocs_check.sh 
"$(OBJDUMP)" "$(NM)" "$@"
 endif
 
+quiet_cmd_ftrace_check = CHKFTRC $@
+  cmd_ftrace_check = $(CONFIG_SHELL) 
$(srctree)/arch/powerpc/tools/ftrace_check.sh "$(NM)" "$@"
+
 # `@true` prevents complaint when there is nothing to be done
 
 vmlinux: FORCE
@@ -34,6 +37,11 @@ endif
 ifdef CONFIG_RELOCATABLE
$(call if_changed,relocs_check)
 endif
+ifdef CONFIG_FUNCTION_TRACER
+ifndef CONFIG_PPC64_ELF_ABI_V1
+   $(call cmd,ftrace_check)
+endif
+endif
 
 clean:
rm -f .tmp_symbols.txt
diff --git a/arch/powerpc/tools/ftrace_check.sh 
b/arch/powerpc/tools/ftrace_check.sh
new file mode 100755
index ..33f2fd45e54d
--- /dev/null
+++ b/arch/powerpc/tools/ftrace_check.sh
@@ -0,0 +1,45 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+#
+# This script checks vmlinux to ensure that all functions can call 
ftrace_caller() either directly,
+# or through the stub, ftrace_tramp_text, at the end of kernel text.
+
+# Error out if any command fails
+set -e
+
+# Allow for verbose output
+if [ "$V" = "1" ]; then
+   set -x
+fi
+
+if [ $# -lt 2 ]; then
+   echo "$0 [path to nm] [path to vmlinux]" 1>&2
+   exit 1
+fi
+
+# Have Kbuild supply the path to nm so we handle cross compilation.
+nm="$1"
+vmlinux="$2"
+
+stext_addr=$($nm "$vmlinux" | grep -e " [TA] _stext$" | cut -d' ' -f1 | tr 
'[[:lower:]]' '[[:upper:]]')
+ftrace_caller_addr=$($nm "$vmlinux" | grep -e " T ftrace_caller$" | cut -d' ' 
-f1 | tr '[[:lower:]]' '[[:upper:]]')
+ftrace_tramp_addr=$($nm "$vmlinux" | grep -e " T ftrace_tramp_text$" | cut -d' 
' -f1 | tr '[[:lower:]]' '[[:upper:]]')
+
+ftrace_caller_offset=$(echo "ibase=16;$ftrace_caller_addr - $stext_addr" | bc)
+ftrace_tramp_offset=$(echo "ibase=16;$ftrace_tramp_addr - $ftrace_caller_addr" 
| bc)
+sz_32m=$(printf "%d" 0x2000000)
+sz_64m=$(printf "%d" 0x4000000)
+
+# ftrace_caller - _stext < 32M
+if [ $ftrace_caller_offset -ge $sz_32m ]; then
+   echo "ERROR: ftrace_caller (0x$ftrace_caller_addr) is beyond 32MiB of 
_stext" 1>&2
+   echo "ERROR: consider disabling CONFIG_FUNCTION_TRACER, or reducing the 
size of kernel text" 1>&2
+   exit 1
+fi
+
+# ftrace_tramp_text - ftrace_caller < 64M
+if [ $ftrace_tramp_offset -ge $sz_64m ]; then
+   echo "ERROR: kernel text extends beyond 64MiB from ftrace_caller" 1>&2
+   echo "ERROR: consider disabling CONFIG_FUNCTION_TRACER, or reducing the 
size of kernel text" 1>&2
+   exit 1
+fi
-- 
2.45.2



[RFC PATCH v4 09/17] powerpc64/bpf: Fold bpf_jit_emit_func_call_hlp() into bpf_jit_emit_func_call_rel()

2024-07-14 Thread Naveen N Rao
Commit 61688a82e047 ("powerpc/bpf: enable kfunc call") enhanced
bpf_jit_emit_func_call_hlp() to handle calls out to the module region,
where bpf progs are generated. The only difference now between
bpf_jit_emit_func_call_hlp() and bpf_jit_emit_func_call_rel() is in the
handling of the initial pass, where the target function address is not
known. Fold that logic into bpf_jit_emit_func_call_hlp() and rename it
to bpf_jit_emit_func_call_rel() to simplify the bpf function call JIT
code.

We don't actually need to load/restore the TOC across a call out to a
different kernel helper or to a different bpf program, since they all
work with the kernel TOC. We only need to do so when calling out to a
module function. So, guard the TOC load/restore with appropriate
conditions.
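
Condensed from the hunks below (a sketch, not the complete JIT code):
the kernel TOC is only reloaded into r2 after the indirect call when the
callee lives in a module:

	/* a callee in a module runs with its own TOC; restore ours afterwards */
	if (is_module_text_address(func_addr))
		EMIT(PPC_RAW_LD(_R2, _R13, offsetof(struct paca_struct, kernel_toc)));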

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/net/bpf_jit_comp64.c | 61 +--
 1 file changed, 17 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 2cbcdf93cc19..f3be024fc685 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -202,14 +202,22 @@ void bpf_jit_build_epilogue(u32 *image, struct 
codegen_context *ctx)
EMIT(PPC_RAW_BLR());
 }
 
-static int
-bpf_jit_emit_func_call_hlp(u32 *image, u32 *fimage, struct codegen_context 
*ctx, u64 func)
+int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context 
*ctx, u64 func)
 {
unsigned long func_addr = func ? ppc_function_entry((void *)func) : 0;
long reladdr;
 
-   if (WARN_ON_ONCE(!kernel_text_address(func_addr)))
-   return -EINVAL;
+   /* bpf to bpf call, func is not known in the initial pass. Emit 5 nops 
as a placeholder */
+   if (!func) {
+   for (int i = 0; i < 5; i++)
+   EMIT(PPC_RAW_NOP());
+   /* elfv1 needs an additional instruction to load addr from 
descriptor */
+   if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V1))
+   EMIT(PPC_RAW_NOP());
+   EMIT(PPC_RAW_MTCTR(_R12));
+   EMIT(PPC_RAW_BCTRL());
+   return 0;
+   }
 
 #ifdef CONFIG_PPC_KERNEL_PCREL
reladdr = func_addr - local_paca->kernelbase;
@@ -266,7 +274,8 @@ bpf_jit_emit_func_call_hlp(u32 *image, u32 *fimage, struct 
codegen_context *ctx,
 * We can clobber r2 since we get called through a
 * function pointer (so caller will save/restore r2).
 */
-   EMIT(PPC_RAW_LD(_R2, bpf_to_ppc(TMP_REG_2), 8));
+   if (is_module_text_address(func_addr))
+   EMIT(PPC_RAW_LD(_R2, bpf_to_ppc(TMP_REG_2), 8));
} else {
PPC_LI64(_R12, func);
EMIT(PPC_RAW_MTCTR(_R12));
@@ -276,46 +285,14 @@ bpf_jit_emit_func_call_hlp(u32 *image, u32 *fimage, 
struct codegen_context *ctx,
 * Load r2 with kernel TOC as kernel TOC is used if function 
address falls
 * within core kernel text.
 */
-   EMIT(PPC_RAW_LD(_R2, _R13, offsetof(struct paca_struct, 
kernel_toc)));
+   if (is_module_text_address(func_addr))
+   EMIT(PPC_RAW_LD(_R2, _R13, offsetof(struct paca_struct, 
kernel_toc)));
}
 #endif
 
return 0;
 }
 
-int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context 
*ctx, u64 func)
-{
-   unsigned int i, ctx_idx = ctx->idx;
-
-   if (WARN_ON_ONCE(func && is_module_text_address(func)))
-   return -EINVAL;
-
-   /* skip past descriptor if elf v1 */
-   func += FUNCTION_DESCR_SIZE;
-
-   /* Load function address into r12 */
-   PPC_LI64(_R12, func);
-
-   /* For bpf-to-bpf function calls, the callee's address is unknown
-* until the last extra pass. As seen above, we use PPC_LI64() to
-* load the callee's address, but this may optimize the number of
-* instructions required based on the nature of the address.
-*
-* Since we don't want the number of instructions emitted to increase,
-* we pad the optimized PPC_LI64() call with NOPs to guarantee that
-* we always have a five-instruction sequence, which is the maximum
-* that PPC_LI64() can emit.
-*/
-   if (!image)
-   for (i = ctx->idx - ctx_idx; i < 5; i++)
-   EMIT(PPC_RAW_NOP());
-
-   EMIT(PPC_RAW_MTCTR(_R12));
-   EMIT(PPC_RAW_BCTRL());
-
-   return 0;
-}
-
 static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 
out)
 {
/*
@@ -1102,11 +1079,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
u32 *fimage, struct code
if (ret < 0)
return ret;
 
-   

[RFC PATCH v4 08/17] powerpc/ftrace: Move ftrace stub used for init text before _einittext

2024-07-14 Thread Naveen N Rao
Move the ftrace stub used to cover inittext before _einittext so that it
is within kernel text, as seen through core_kernel_text(). This is
required for a subsequent change to ftrace.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/vmlinux.lds.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
b/arch/powerpc/kernel/vmlinux.lds.S
index f420df7888a7..0aef9959f2cd 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -267,14 +267,13 @@ SECTIONS
.init.text : AT(ADDR(.init.text) - LOAD_OFFSET) {
_sinittext = .;
INIT_TEXT
-
+   *(.tramp.ftrace.init);
/*
 *.init.text might be RO so we must ensure this section ends on
 * a page boundary.
 */
. = ALIGN(PAGE_SIZE);
_einittext = .;
-   *(.tramp.ftrace.init);
} :text
 
/* .exit.text is discarded at runtime, not link time,
-- 
2.45.2



[RFC PATCH v4 07/17] powerpc/ftrace: Skip instruction patching if the instructions are the same

2024-07-14 Thread Naveen N Rao
To simplify upcoming changes to ftrace, add a check to skip actual
instruction patching if the old and new instructions are the same. We
still validate that the instruction is what we expect, but don't
actually patch the same instruction again.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/ftrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index fe0546fbac8e..719517265d39 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -82,7 +82,7 @@ static inline int ftrace_modify_code(unsigned long ip, 
ppc_inst_t old, ppc_inst_
 {
int ret = ftrace_validate_inst(ip, old);
 
-   if (!ret)
+   if (!ret && !ppc_inst_equal(old, new))
ret = patch_instruction((u32 *)ip, new);
 
return ret;
-- 
2.45.2



[RFC PATCH v4 06/17] powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace

2024-07-14 Thread Naveen N Rao
The pointer to struct module is only relevant for ftrace records
belonging to kernel modules. Having this field in dyn_arch_ftrace wastes
memory for all ftrace records belonging to the kernel. Remove it in
favour of looking up the module from the ftrace record address, similar
to other architectures.
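
As a rough sense of scale (back-of-the-envelope only, using the ~44k
traceable functions of a ppc64le defconfig mentioned later in this
series):

	~44,000 ftrace records x 8 bytes (struct module *) ~= 344 KiB saved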

Reviewed-by: Nicholas Piggin 
Signed-off-by: Naveen N Rao 
---
 arch/powerpc/include/asm/ftrace.h|  1 -
 arch/powerpc/kernel/trace/ftrace.c   | 49 +
 arch/powerpc/kernel/trace/ftrace_64_pg.c | 69 ++--
 3 files changed, 56 insertions(+), 63 deletions(-)

diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 107fc5a48456..201f9d15430a 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -26,7 +26,6 @@ unsigned long prepare_ftrace_return(unsigned long parent, 
unsigned long ip,
 struct module;
 struct dyn_ftrace;
 struct dyn_arch_ftrace {
-   struct module *mod;
 };
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 8c3e523e4f96..fe0546fbac8e 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -106,28 +106,43 @@ static unsigned long find_ftrace_tramp(unsigned long ip)
return 0;
 }
 
+#ifdef CONFIG_MODULES
+static unsigned long ftrace_lookup_module_stub(unsigned long ip, unsigned long 
addr)
+{
+   struct module *mod = NULL;
+
+   preempt_disable();
+   mod = __module_text_address(ip);
+   preempt_enable();
+
+   if (!mod)
+   pr_err("No module loaded at addr=%lx\n", ip);
+
+   return (addr == (unsigned long)ftrace_caller ? mod->arch.tramp : 
mod->arch.tramp_regs);
+}
+#else
+static unsigned long ftrace_lookup_module_stub(unsigned long ip, unsigned long 
addr)
+{
+   return 0;
+}
+#endif
+
 static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, 
ppc_inst_t *call_inst)
 {
unsigned long ip = rec->ip;
unsigned long stub;
 
-   if (is_offset_in_branch_range(addr - ip)) {
+   if (is_offset_in_branch_range(addr - ip))
/* Within range */
stub = addr;
-#ifdef CONFIG_MODULES
-   } else if (rec->arch.mod) {
-   /* Module code would be going to one of the module stubs */
-   stub = (addr == (unsigned long)ftrace_caller ? 
rec->arch.mod->arch.tramp :
-  
rec->arch.mod->arch.tramp_regs);
-#endif
-   } else if (core_kernel_text(ip)) {
+   else if (core_kernel_text(ip))
/* We would be branching to one of our ftrace stubs */
stub = find_ftrace_tramp(ip);
-   if (!stub) {
-   pr_err("0x%lx: No ftrace stubs reachable\n", ip);
-   return -EINVAL;
-   }
-   } else {
+   else
+   stub = ftrace_lookup_module_stub(ip, addr);
+
+   if (!stub) {
+   pr_err("0x%lx: No ftrace stubs reachable\n", ip);
return -EINVAL;
}
 
@@ -262,14 +277,6 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace 
*rec)
if (ret)
return ret;
 
-   if (!core_kernel_text(ip)) {
-   if (!mod) {
-   pr_err("0x%lx: No module provided for non-kernel 
address\n", ip);
-   return -EFAULT;
-   }
-   rec->arch.mod = mod;
-   }
-
/* Nop-out the ftrace location */
new = ppc_inst(PPC_RAW_NOP());
addr = MCOUNT_ADDR;
diff --git a/arch/powerpc/kernel/trace/ftrace_64_pg.c 
b/arch/powerpc/kernel/trace/ftrace_64_pg.c
index 12fab1803bcf..8a551dfca3d0 100644
--- a/arch/powerpc/kernel/trace/ftrace_64_pg.c
+++ b/arch/powerpc/kernel/trace/ftrace_64_pg.c
@@ -116,6 +116,20 @@ static unsigned long find_bl_target(unsigned long ip, 
ppc_inst_t op)
 }
 
 #ifdef CONFIG_MODULES
+static struct module *ftrace_lookup_module(struct dyn_ftrace *rec)
+{
+   struct module *mod;
+
+   preempt_disable();
+   mod = __module_text_address(rec->ip);
+   preempt_enable();
+
+   if (!mod)
+   pr_err("No module loaded at addr=%lx\n", rec->ip);
+
+   return mod;
+}
+
 static int
 __ftrace_make_nop(struct module *mod,
  struct dyn_ftrace *rec, unsigned long addr)
@@ -124,6 +138,12 @@ __ftrace_make_nop(struct module *mod,
unsigned long ip = rec->ip;
ppc_inst_t op, pop;
 
+   if (!mod) {
+   mod = ftrace_lookup_module(rec);
+   if (!mod)
+   return -EINVAL;
+   }
+
/* read where this goes */
if (copy_inst_from_kernel_nofault(&op, (void *)ip)) {
pr_err("Fetching opcode failed.\n");
@@ -366,27 +386,6 @@ int ftrace_m

[RFC PATCH v4 05/17] powerpc/module_64: Convert #ifdef to IS_ENABLED()

2024-07-14 Thread Naveen N Rao
Minor refactor for converting #ifdef to IS_ENABLED().
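
This works because IS_ENABLED() expands to a constant 1 when the option
is set to 'y' or 'm' and to 0 otherwise, so the two stub reservations
can simply be summed, as in the hunk below:

	/* one stub for ftrace_caller, plus one for ftrace_regs_caller if enabled */
	relocs += IS_ENABLED(CONFIG_DYNAMIC_FTRACE) +
		  IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS);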

Reviewed-by: Nicholas Piggin 
Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/module_64.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index e9bab599d0c2..1db88409bd95 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -241,14 +241,8 @@ static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
}
}
 
-#ifdef CONFIG_DYNAMIC_FTRACE
-   /* make the trampoline to the ftrace_caller */
-   relocs++;
-#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
-   /* an additional one for ftrace_regs_caller */
-   relocs++;
-#endif
-#endif
+   /* stubs for ftrace_caller and ftrace_regs_caller */
+   relocs += IS_ENABLED(CONFIG_DYNAMIC_FTRACE) + 
IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS);
 
pr_debug("Looks like a total of %lu stubs, max\n", relocs);
return relocs * sizeof(struct ppc64_stub_entry);
-- 
2.45.2



[RFC PATCH v4 04/17] powerpc32/ftrace: Unify 32-bit and 64-bit ftrace entry code

2024-07-14 Thread Naveen N Rao
On 32-bit powerpc, gcc generates a three-instruction sequence for
function profiling:
	mflr	r0
	stw	r0, 4(r1)
	bl	_mcount

On kernel boot, the call to _mcount() is nop-ed out, to be patched back
in when ftrace is actually enabled. The 'stw' instruction therefore is
not necessary unless ftrace is enabled. Nop it out during ftrace init.

When ftrace is enabled, we want the 'stw' so that stack unwinding works
properly. Perform the store within the ftrace handler instead, similar
to 64-bit powerpc.

Reviewed-by: Nicholas Piggin 
Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/ftrace.c   | 6 --
 arch/powerpc/kernel/trace/ftrace_entry.S | 4 ++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 2ef504700e8d..8c3e523e4f96 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -240,8 +240,10 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace 
*rec)
} else if (IS_ENABLED(CONFIG_PPC32)) {
/* Expected sequence: 'mflr r0', 'stw r0,4(r1)', 'bl _mcount' */
ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
-   if (!ret)
-   ret = ftrace_validate_inst(ip - 4, 
ppc_inst(PPC_RAW_STW(_R0, _R1, 4)));
+   if (ret)
+   return ret;
+   ret = ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 
4)),
+ppc_inst(PPC_RAW_NOP()));
} else if (IS_ENABLED(CONFIG_MPROFILE_KERNEL)) {
/* Expected sequence: 'mflr r0', ['std r0,16(r1)'], 'bl 
_mcount' */
ret = ftrace_read_inst(ip - 4, &old);
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S 
b/arch/powerpc/kernel/trace/ftrace_entry.S
index 76dbe9fd2c0f..244a1c7bb1e8 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -33,6 +33,8 @@
  * and then arrange for the ftrace function to be called.
  */
 .macro ftrace_regs_entry allregs
+   /* Save the original return address in A's stack frame */
+   PPC_STL r0, LRSAVE(r1)
/* Create a minimal stack frame for representing B */
PPC_STLUr1, -STACK_FRAME_MIN_SIZE(r1)
 
@@ -44,8 +46,6 @@
SAVE_GPRS(3, 10, r1)
 
 #ifdef CONFIG_PPC64
-   /* Save the original return address in A's stack frame */
-   std r0, LRSAVE+SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE(r1)
/* Ok to continue? */
lbz r3, PACA_FTRACE_ENABLED(r13)
cmpdi   r3, 0
-- 
2.45.2



[RFC PATCH v4 03/17] powerpc64/ftrace: Nop out additional 'std' instruction emitted by gcc v5.x

2024-07-14 Thread Naveen N Rao
Gcc v5.x emits a 3-instruction sequence for -mprofile-kernel:
	mflr	r0
	std	r0, 16(r1)
	bl	_mcount

Gcc v6.x moved to a simpler 2-instruction sequence by removing the 'std'
instruction. The store saved the return address in the LR save area in
the caller stack frame for stack unwinding. However, with dynamic
ftrace, we no longer have a call to _mcount on kernel boot when ftrace
is not enabled. When ftrace is enabled, that store is performed within
ftrace_caller(). As such, the additional 'std' instruction is redundant.
Nop it out on kernel boot.

With this change, we now use the same 2-instruction profiling sequence
with both -mprofile-kernel and -fpatchable-function-entry on 64-bit
powerpc.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/ftrace.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index d8d6b4fd9a14..2ef504700e8d 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -246,8 +246,12 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace 
*rec)
/* Expected sequence: 'mflr r0', ['std r0,16(r1)'], 'bl 
_mcount' */
ret = ftrace_read_inst(ip - 4, &old);
if (!ret && !ppc_inst_equal(old, ppc_inst(PPC_RAW_MFLR(_R0 {
+   /* Gcc v5.x emit the additional 'std' instruction, gcc 
v6.x don't */
ret = ftrace_validate_inst(ip - 8, 
ppc_inst(PPC_RAW_MFLR(_R0)));
-   ret |= ftrace_validate_inst(ip - 4, 
ppc_inst(PPC_RAW_STD(_R0, _R1, 16)));
+   if (ret)
+   return ret;
+   ret = ftrace_modify_code(ip - 4, 
ppc_inst(PPC_RAW_STD(_R0, _R1, 16)),
+ppc_inst(PPC_RAW_NOP()));
}
} else {
return -EINVAL;
-- 
2.45.2



[RFC PATCH v4 17/17] powerpc64/bpf: Add support for bpf trampolines

2024-07-14 Thread Naveen N Rao
Add support for bpf_arch_text_poke() and arch_prepare_bpf_trampoline()
for 64-bit powerpc. While the code is generic, BPF trampolines are only
enabled on 64-bit powerpc. 32-bit powerpc will need testing and some
updates.

BPF trampolines adhere to the existing ftrace ABI utilizing a
two-instruction profiling sequence, as well as the newer ABI utilizing a
three-instruction profiling sequence enabling return with a 'blr'. The
trampoline code itself closely follows the x86 implementation.

The BPF prog JIT is extended to mimic the 64-bit powerpc approach for
ftrace, having a single nop at function entry, followed by the function
profiling sequence out-of-line and a separate long branch stub for calls
to trampolines that are out of range. A dummy_tramp is provided to
simplify synchronization, similar to arm64.

When attaching a bpf trampoline to a bpf prog, we can patch up to three
things:
- the nop at bpf prog entry to go to the out-of-line stub
- the instruction in the out-of-line stub to either call the bpf trampoline
directly, or to branch to the long_branch stub.
- the trampoline address before the long_branch stub.

We do not need any synchronization here since we always have a valid
branch target regardless of the order in which the above stores are
seen. dummy_tramp ensures that the long_branch stub goes to a valid
destination on other cpus, even when the branch to the long_branch stub
is seen before the updated trampoline address.

However, when detaching a bpf trampoline from a bpf prog, or if changing
the bpf trampoline address, we need synchronization to ensure that other
cpus can no longer branch into the older trampoline so that it can be
safely freed. bpf_tramp_image_put() uses rcu_tasks to ensure all cpus
make forward progress, but we still need to ensure that other cpus
execute isync (or some CSI) so that they don't go back into the
trampoline again.
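
Schematically, the three patch sites described above are (labels and
layout are illustrative only, not the exact JIT output):

	/*
	 * bpf_prog:
	 *	nop			<- (1) patched to branch to the out-of-line stub
	 *	...
	 * ool_stub:
	 *	<profiling sequence>
	 *	bl <target>		<- (2) patched to call the bpf trampoline directly,
	 *				       or to branch to the long_branch stub
	 * tramp_addr:
	 *	.quad <trampoline>	<- (3) trampoline address, placed just before
	 * long_branch:			       the long_branch stub that loads and uses it
	 *	...
	 */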

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/include/asm/ppc-opcode.h |  14 +
 arch/powerpc/net/bpf_jit.h|  12 +
 arch/powerpc/net/bpf_jit_comp.c   | 842 +-
 arch/powerpc/net/bpf_jit_comp32.c |   7 +-
 arch/powerpc/net/bpf_jit_comp64.c |   7 +-
 5 files changed, 879 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index b98a9e982c03..4312bcb913a4 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -587,12 +587,26 @@
 #define PPC_RAW_MTSPR(spr, d)  (0x7c0003a6 | ___PPC_RS(d) | 
__PPC_SPR(spr))
 #define PPC_RAW_EIEIO()(0x7c0006ac)
 
+/* bcl 20,31,$+4 */
+#define PPC_RAW_BCL4() (0x429f0005)
 #define PPC_RAW_BRANCH(offset) (0x4800 | PPC_LI(offset))
 #define PPC_RAW_BL(offset) (0x4801 | PPC_LI(offset))
 #define PPC_RAW_TW(t0, a, b)   (0x7c08 | ___PPC_RS(t0) | 
___PPC_RA(a) | ___PPC_RB(b))
 #define PPC_RAW_TRAP() PPC_RAW_TW(31, 0, 0)
 #define PPC_RAW_SETB(t, bfa)   (0x7c000100 | ___PPC_RT(t) | 
___PPC_RA((bfa) << 2))
 
+#ifdef CONFIG_PPC32
+#define PPC_RAW_STLPPC_RAW_STW
+#define PPC_RAW_STLU   PPC_RAW_STWU
+#define PPC_RAW_LL PPC_RAW_LWZ
+#define PPC_RAW_CMPLI  PPC_RAW_CMPWI
+#else
+#define PPC_RAW_STLPPC_RAW_STD
+#define PPC_RAW_STLU   PPC_RAW_STDU
+#define PPC_RAW_LL PPC_RAW_LD
+#define PPC_RAW_CMPLI  PPC_RAW_CMPDI
+#endif
+
 /* Deal with instructions that older assemblers aren't aware of */
 #definePPC_BCCTR_FLUSH stringify_in_c(.long 
PPC_INST_BCCTR_FLUSH)
 #definePPC_CP_ABORTstringify_in_c(.long PPC_RAW_CP_ABORT)
diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index cdea5dccaefe..2d04ce5a23da 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -12,6 +12,7 @@
 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_PPC64_ELF_ABI_V1
 #define FUNCTION_DESCR_SIZE24
@@ -21,6 +22,9 @@
 
 #define CTX_NIA(ctx) ((unsigned long)ctx->idx * 4)
 
+#define SZLsizeof(unsigned long)
+#define BPF_INSN_SAFETY64
+
 #define PLANT_INSTR(d, idx, instr)   \
do { if (d) { (d)[idx] = instr; } idx++; } while (0)
 #define EMIT(instr)PLANT_INSTR(image, ctx->idx, instr)
@@ -81,6 +85,13 @@
EMIT(PPC_RAW_ORI(d, d, (uintptr_t)(i) &   \
0x)); \
} } while (0)
+#define PPC_LI_ADDRPPC_LI64
+#define PPC64_LOAD_PACA()\
+   EMIT(PPC_RAW_LD(_R2, _R13, offsetof(struct paca_struct, kernel_toc)))
+#else
+#define PPC_LI64(d, i) BUILD_BUG()
+#define PPC_LI_ADDRPPC_LI32
+#define PPC64_LOAD_PACA() BUILD_BUG()
 #endif
 
 /*
@@ -165,6 +176,7 @@ int bpf_jit

[RFC PATCH v4 16/17] samples/ftrace: Add support for ftrace direct samples on powerpc

2024-07-14 Thread Naveen N Rao
Add powerpc 32-bit and 64-bit samples for ftrace direct. This serves to
show the sample instruction sequence to be used by ftrace direct calls
to adhere to the ftrace ABI.

On 64-bit powerpc, TOC setup requires some additional work.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/Kconfig|   2 +
 samples/ftrace/ftrace-direct-modify.c   |  85 +++-
 samples/ftrace/ftrace-direct-multi-modify.c | 101 +++-
 samples/ftrace/ftrace-direct-multi.c|  79 ++-
 samples/ftrace/ftrace-direct-too.c  |  83 +++-
 samples/ftrace/ftrace-direct.c  |  69 -
 6 files changed, 414 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 29aab3770415..f6ff44acf112 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -275,6 +275,8 @@ config PPC
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RELIABLE_STACKTRACE
select HAVE_RSEQ
+   select HAVE_SAMPLE_FTRACE_DIRECTif 
HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+   select HAVE_SAMPLE_FTRACE_DIRECT_MULTI  if 
HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
select HAVE_SETUP_PER_CPU_AREA  if PPC64
select HAVE_SOFTIRQ_ON_OWN_STACK
select HAVE_STACKPROTECTOR  if PPC32 && 
$(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
diff --git a/samples/ftrace/ftrace-direct-modify.c 
b/samples/ftrace/ftrace-direct-modify.c
index 81220390851a..cfea7a38befb 100644
--- a/samples/ftrace/ftrace-direct-modify.c
+++ b/samples/ftrace/ftrace-direct-modify.c
@@ -2,7 +2,7 @@
 #include 
 #include 
 #include 
-#ifndef CONFIG_ARM64
+#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
 #include 
 #endif
 
@@ -199,6 +199,89 @@ asm (
 
 #endif /* CONFIG_LOONGARCH */
 
+#ifdef CONFIG_PPC
+#include 
+
+#ifdef CONFIG_PPC64
+#define STACK_FRAME_SIZE 48
+#else
+#define STACK_FRAME_SIZE 24
+#endif
+
+#if defined(CONFIG_PPC64_ELF_ABI_V2) && !defined(CONFIG_PPC_KERNEL_PCREL)
+#define PPC64_TOC_SAVE_AND_UPDATE  \
+"  std 2, 24(1)\n" \
+"  bcl 20, 31, 1f\n"   \
+"   1: mflr12\n"   \
+"  ld  2, (99f - 1b)(12)\n"
+#define PPC64_TOC_RESTORE  \
+"  ld  2, 24(1)\n"
+#define PPC64_TOC  \
+"   99:.quad   .TOC.@tocbase\n"
+#else
+#define PPC64_TOC_SAVE_AND_UPDATE ""
+#define PPC64_TOC_RESTORE ""
+#define PPC64_TOC ""
+#endif
+
+#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+#define PPC_FTRACE_RESTORE_LR  \
+   PPC_LL" 0, "__stringify(PPC_LR_STKOFF)"(1)\n"   \
+"  mtlr0\n"
+#define PPC_FTRACE_RET \
+"  blr\n"
+#else
+#define PPC_FTRACE_RESTORE_LR  \
+   PPC_LL" 0, "__stringify(PPC_LR_STKOFF)"(1)\n"   \
+"  mtctr   0\n"
+#define PPC_FTRACE_RET \
+"  mtlr0\n"\
+"  bctr\n"
+#endif
+
+asm (
+"  .pushsection.text, \"ax\", @progbits\n"
+"  .type   my_tramp1, @function\n"
+"  .globl  my_tramp1\n"
+"   my_tramp1:\n"
+   PPC_STL"0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+   PPC_STLU"   1, -"__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"  mflr0\n"
+   PPC_STL"0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+   PPC_STLU"   1, -"__stringify(STACK_FRAME_SIZE)"(1)\n"
+   PPC64_TOC_SAVE_AND_UPDATE
+"  bl  my_direct_func1\n"
+   PPC64_TOC_RESTORE
+"  addi1, 1, "__stringify(STACK_FRAME_SIZE)"\n"
+   PPC_FTRACE_RESTORE_LR
+"  addi1, 1, "__stringify(STACK_FRAME_MIN_SIZE)"\n"
+   PPC_LL" 0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+   PPC_FTRACE_RET
+"  .size   my_tramp1, .-my_tramp1\n"
+
+"  .type   my_tramp2, @function\n"
+"  .globl  my_tramp2\n"
+"   my_tramp2:\n"
+   PPC_STL"0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+   PPC_STLU"   1, -"__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"  mflr0\n"
+   PPC_STL"0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+   PPC_STLU"   1, -"__stringify(STACK_FRAME_SIZE)"(1)\n"

[RFC PATCH v4 15/17] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS

2024-07-14 Thread Naveen N Rao
Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS similar to the arm64
implementation.

ftrace direct calls allow custom trampolines to be called into directly
from function ftrace call sites, bypassing the ftrace trampoline
completely. This functionality is currently utilized by BPF trampolines
to hook into kernel function entries.

Since we have a limited relative branch range, we support ftrace direct
calls through support for DYNAMIC_FTRACE_WITH_CALL_OPS. In this
approach, the ftrace trampoline is not entirely bypassed. Rather, it is
re-purposed into a stub that reads the direct_call field from the
associated ftrace_ops structure and branches into that, if it is not
NULL. For this, it is sufficient to ensure that the ftrace trampoline is
reachable from all traceable functions.

When multiple ftrace_ops are associated with a call site, we utilize a
callback to set pt_regs->orig_gpr3, which can then be tested on the
return path from the ftrace trampoline to branch into the direct caller.
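
The callback amounts to stashing the direct trampoline address in
orig_gpr3 (taken from the hunk below, shown here for reference):

	static inline void arch_ftrace_set_direct_caller(struct ftrace_regs *fregs,
							 unsigned long addr)
	{
		struct pt_regs *regs = &fregs->regs;

		/* non-zero orig_gpr3 tells ftrace_caller there is a direct caller */
		regs->orig_gpr3 = addr;
	}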

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/Kconfig |   1 +
 arch/powerpc/include/asm/ftrace.h|  15 +++
 arch/powerpc/kernel/asm-offsets.c|   3 +
 arch/powerpc/kernel/trace/ftrace.c   |   9 ++
 arch/powerpc/kernel/trace/ftrace_entry.S | 114 +--
 5 files changed, 113 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index cb6031d86dc9..29aab3770415 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -236,6 +236,7 @@ config PPC
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_ARGSif 
ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if PPC_FTRACE_OUT_OF_LINE || 
(PPC32 && ARCH_USING_PATCHABLE_FUNCTION_ENTRY)
+   select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS if 
HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
select HAVE_DYNAMIC_FTRACE_WITH_REGSif 
ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
select HAVE_EBPF_JIT
select HAVE_EFFICIENT_UNALIGNED_ACCESS
diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 267bd52fef21..2f1a6d25838d 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -150,6 +150,21 @@ extern unsigned int ftrace_ool_stub_text_end_count, 
ftrace_ool_stub_text_count,
 #endif
 void ftrace_free_init_tramp(void);
 unsigned long ftrace_call_adjust(unsigned long addr);
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+/*
+ * When an ftrace registered caller is tracing a function that is also set by a
+ * register_ftrace_direct() call, it needs to be differentiated in the
+ * ftrace_caller trampoline so that the direct call can be invoked after the
+ * other ftrace ops. To do this, place the direct caller in the orig_gpr3 field
+ * of pt_regs. This tells ftrace_caller that there's a direct caller.
+ */
+static inline void arch_ftrace_set_direct_caller(struct ftrace_regs *fregs, 
unsigned long addr)
+{
+   struct pt_regs *regs = &fregs->regs;
+   regs->orig_gpr3 = addr;
+}
+#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
 #else
 static inline void ftrace_free_init_tramp(void) { }
 static inline unsigned long ftrace_call_adjust(unsigned long addr) { return 
addr; }
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 60d1e388c2ba..dbd56264a8bc 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -680,6 +680,9 @@ int main(void)
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
OFFSET(FTRACE_OPS_FUNC, ftrace_ops, func);
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+   OFFSET(FTRACE_OPS_DIRECT_CALL, ftrace_ops, direct_call);
+#endif
 #endif
 
return 0;
diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 291c6c3d3a78..4316a7cfbdb8 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -150,6 +150,15 @@ static int ftrace_get_call_inst(struct dyn_ftrace *rec, 
unsigned long addr, ppc_
else
ip = rec->ip;
 
+   if (!is_offset_in_branch_range(addr - ip) && addr != FTRACE_ADDR && 
addr != FTRACE_REGS_ADDR) {
+   /* This can only happen with ftrace direct */
+   if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS)) {
+   pr_err("0x%lx (0x%lx): Unexpected target address 
0x%lx\n", ip, rec->ip, addr);
+   return -EINVAL;
+   }
+   addr = FTRACE_ADDR;
+   }
+
if (is_offset_in_branch_range(addr - ip))
/* Within range */
stub = addr;
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S 
b/arch/powerpc/kernel/trace/ftrace_entry.S
index c019380bdd6a..eea4cb3737a8 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b

[RFC PATCH v4 14/17] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS

2024-07-14 Thread Naveen N Rao
Implement support for DYNAMIC_FTRACE_WITH_CALL_OPS similar to the
arm64 implementation.

This works by patching-in a pointer to an associated ftrace_ops
structure before each traceable function. If multiple ftrace_ops are
associated with a call site, then a special ftrace_list_ops is used to
enable iterating over all the registered ftrace_ops. If no ftrace_ops
are associated with a call site, then a special ftrace_nop_ops structure
is used to render the ftrace call as a no-op. The ftrace trampoline can
then read the associated ftrace_ops for a call site by loading from an
offset from the LR, and branch directly to the associated function.

The primary advantage with this approach is that we don't have to
iterate over all the registered ftrace_ops for call sites that have a
single ftrace_ops registered. This is the equivalent of implementing
support for dynamic ftrace trampolines, which set up a special ftrace
trampoline for each registered ftrace_ops and have individual call sites
branch into those directly.

A secondary advantage is that this gives us a way to add support for
direct ftrace callers without having to resort to using stubs. The
address of the direct call trampoline can be loaded from the ftrace_ops
structure.

To support this, we reserve a nop before each function on 32-bit
powerpc. For 64-bit powerpc, two nops are reserved before each
out-of-line stub. During ftrace activation, we update this location with
the associated ftrace_ops pointer. Then, on ftrace entry, we load from
this location and call into ftrace_ops->func().

For 64-bit powerpc, we ensure that the out-of-line stub area is
doubleword aligned so that ftrace_ops address can be updated atomically.
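
Concretely, on 64-bit the ftrace_ops pointer sits immediately ahead of
the out-of-line stub instructions, doubleword aligned so it can be
patched atomically (annotated from the hunk below):

	struct ftrace_ool_stub {
	#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
		struct ftrace_ops *ftrace_op;	/* patched with the associated ftrace_ops */
	#endif
		u32 insn[4];			/* out-of-line profiling sequence */
	} __aligned(sizeof(unsigned long));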

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/Makefile  |  4 ++
 arch/powerpc/include/asm/ftrace.h  |  5 +-
 arch/powerpc/kernel/asm-offsets.c  |  4 ++
 arch/powerpc/kernel/trace/ftrace.c | 59 +-
 arch/powerpc/kernel/trace/ftrace_entry.S   | 36 ++---
 arch/powerpc/tools/ftrace-gen-ool-stubs.sh |  5 +-
 7 files changed, 102 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a4dff8624510..cb6031d86dc9 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -235,6 +235,7 @@ config PPC
select HAVE_DEBUG_STACKOVERFLOW
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_ARGSif 
ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
+   select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if PPC_FTRACE_OUT_OF_LINE || 
(PPC32 && ARCH_USING_PATCHABLE_FUNCTION_ENTRY)
select HAVE_DYNAMIC_FTRACE_WITH_REGSif 
ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
select HAVE_EBPF_JIT
select HAVE_EFFICIENT_UNALIGNED_ACCESS
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index c973e6cd1ae8..7dede0ec0163 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -158,8 +158,12 @@ KBUILD_CPPFLAGS+= -DCC_USING_PATCHABLE_FUNCTION_ENTRY
 ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
 CC_FLAGS_FTRACE := -fpatchable-function-entry=1
 else
+ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS # PPC32 only
+CC_FLAGS_FTRACE := -fpatchable-function-entry=3,1
+else
 CC_FLAGS_FTRACE := -fpatchable-function-entry=2
 endif
+endif
 else
 CC_FLAGS_FTRACE := -pg
 ifdef CONFIG_MPROFILE_KERNEL
diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index dc870824359c..267bd52fef21 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -138,8 +138,11 @@ static inline u8 this_cpu_get_ftrace_enabled(void) { 
return 1; }
 extern unsigned int ftrace_tramp_text[], ftrace_tramp_init[];
 #ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
 struct ftrace_ool_stub {
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+   struct ftrace_ops *ftrace_op;
+#endif
u32 insn[4];
-};
+} __aligned(sizeof(unsigned long));
 extern struct ftrace_ool_stub ftrace_ool_stub_text_end[], 
ftrace_ool_stub_text[],
  ftrace_ool_stub_inittext[];
 extern unsigned int ftrace_ool_stub_text_end_count, ftrace_ool_stub_text_count,
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 6854547d3164..60d1e388c2ba 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -678,5 +678,9 @@ int main(void)
DEFINE(FTRACE_OOL_STUB_SIZE, sizeof(struct ftrace_ool_stub));
 #endif
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+   OFFSET(FTRACE_OPS_FUNC, ftrace_ops, func);
+#endif
+
return 0;
 }
diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index b4de8b8cbe3a..291c6c3d3a78 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -38,8 +38,11 @@ unsigned long ftrace_call_adjust(unsigned long addr)
return

[RFC PATCH v4 13/17] powerpc64/ftrace: Support .text larger than 32MB with out-of-line stubs

2024-07-14 Thread Naveen N Rao
We are restricted to a .text size of ~32MB when using out-of-line
function profile sequence. Allow this to be extended up to the previous
limit of ~64MB by reserving space in the middle of .text.

A new config option CONFIG_PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE is
introduced to specify the number of function stubs that are reserved in
.text. On boot, ftrace utilizes stubs from this area first before using
the stub area at the end of .text.

A ppc64le defconfig has ~44k functions that can be traced. A more
conservative value of 32k functions is chosen as the default value of
PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE so that we do not allot more space
than necessary by default. If building a kernel that only has 32k
trace-able functions, we won't allot any more space at the end of .text
during the pass on vmlinux.o. Otherwise, only the remaining functions
get space for stubs at the end of .text. This default value should help
cover a .text size of ~48MB in total (including space reserved at the
end of .text which can cover up to 32MB), which should be sufficient for
most common builds. For a very small kernel build, this can be set to 0.
Or, this can be bumped up to a larger value to support vmlinux .text
size up to ~64MB.
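
As a rough sizing guide (back-of-the-envelope only; the per-stub size
comes from struct ftrace_ool_stub, which grows by a pointer when
DYNAMIC_FTRACE_WITH_CALL_OPS is enabled in a later patch):

	32768 stubs x 16 bytes = 512 KiB reserved within .text
	32768 stubs x 24 bytes = 768 KiB with the ftrace_ops pointer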

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/Kconfig   | 12 
 arch/powerpc/include/asm/ftrace.h  |  6 --
 arch/powerpc/kernel/trace/ftrace.c | 21 +
 arch/powerpc/kernel/trace/ftrace_entry.S   |  8 
 arch/powerpc/tools/Makefile|  2 +-
 arch/powerpc/tools/ftrace-gen-ool-stubs.sh | 11 +++
 6 files changed, 49 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index f50cfd15bb73..a4dff8624510 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -573,6 +573,18 @@ config PPC_FTRACE_OUT_OF_LINE
depends on PPC64
select ARCH_WANTS_PRE_LINK_VMLINUX
 
+config PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE
+   int "Number of ftrace out-of-line stubs to reserve within .text"
+   default 32768 if PPC_FTRACE_OUT_OF_LINE
+   default 0
+   help
+ Number of stubs to reserve for use by ftrace. This space is
+ reserved within .text, and is distinct from any additional space
+ added at the end of .text before the final vmlinux link. Set to
+ zero to have stubs only be generated at the end of vmlinux (only
+ if the size of vmlinux is less than 32MB). Set to a higher value
+ if building vmlinux larger than 48MB.
+
 config HOTPLUG_CPU
bool "Support for enabling/disabling CPUs"
depends on SMP && (PPC_PSERIES || \
diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 0589bb252de7..dc870824359c 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -140,8 +140,10 @@ extern unsigned int ftrace_tramp_text[], 
ftrace_tramp_init[];
 struct ftrace_ool_stub {
u32 insn[4];
 };
-extern struct ftrace_ool_stub ftrace_ool_stub_text_end[], 
ftrace_ool_stub_inittext[];
-extern unsigned int ftrace_ool_stub_text_end_count, 
ftrace_ool_stub_inittext_count;
+extern struct ftrace_ool_stub ftrace_ool_stub_text_end[], 
ftrace_ool_stub_text[],
+ ftrace_ool_stub_inittext[];
+extern unsigned int ftrace_ool_stub_text_end_count, ftrace_ool_stub_text_count,
+   ftrace_ool_stub_inittext_count;
 #endif
 void ftrace_free_init_tramp(void);
 unsigned long ftrace_call_adjust(unsigned long addr);
diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index c03336301bad..b4de8b8cbe3a 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -168,7 +168,7 @@ static int ftrace_get_call_inst(struct dyn_ftrace *rec, 
unsigned long addr, ppc_
 static int ftrace_init_ool_stub(struct module *mod, struct dyn_ftrace *rec)
 {
 #ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
-   static int ool_stub_text_end_index, ool_stub_inittext_index;
+   static int ool_stub_text_index, ool_stub_text_end_index, 
ool_stub_inittext_index;
int ret = 0, ool_stub_count, *ool_stub_index;
ppc_inst_t inst;
/*
@@ -191,9 +191,22 @@ static int ftrace_init_ool_stub(struct module *mod, struct 
dyn_ftrace *rec)
ool_stub_index = &ool_stub_inittext_index;
ool_stub_count = ftrace_ool_stub_inittext_count;
} else if (is_kernel_text(rec->ip)) {
-   ool_stub = ftrace_ool_stub_text_end;
-   ool_stub_index = &ool_stub_text_end_index;
-   ool_stub_count = ftrace_ool_stub_text_end_count;
+   /*
+* ftrace records are sorted, so we first use up the stub area 
within .text
+* (ftrace_ool_stub_text) before using the area at the end of 
.text
+* (ftrace_ool_stub_text_end), un

[RFC PATCH v4 12/17] powerpc64/ftrace: Move ftrace sequence out of line

2024-07-14 Thread Naveen N Rao
Function profile sequence on powerpc includes two instructions at the
beginning of each function:
	mflr	r0
	bl	ftrace_caller

The call to ftrace_caller() gets nop'ed out during kernel boot and is
patched in when ftrace is enabled.

Given the sequence, we cannot return from ftrace_caller with 'blr' as we
need to keep LR and r0 intact. This results in link stack (return
address predictor) imbalance when ftrace is enabled. To address that, we
would like to use a three instruction sequence:
	mflr	r0
	bl	ftrace_caller
	mtlr	r0

Furthermore, to support DYNAMIC_FTRACE_WITH_CALL_OPS, we need to
reserve two instruction slots before the function. This results in a
total of five instruction slots being reserved for ftrace use on each
function that is traced.

Move the function profile sequence out-of-line to minimize its impact.
To do this, we reserve a single nop at function entry using
-fpatchable-function-entry=1 and add a pass on vmlinux.o to determine
the total number of functions that can be traced. This is then used to
generate a .S file reserving the appropriate amount of space for use as
ftrace stubs, which is built and linked into vmlinux.

On bootup, the stub space is split into separate stubs per function and
populated with the proper instruction sequence. A pointer to the
associated stub is maintained in dyn_arch_ftrace.

For modules, space for ftrace stubs is reserved from the generic module
stub space.

This is restricted to and enabled by default only on 64-bit powerpc,
though there are some changes to accommodate 32-bit powerpc. This is
done so that 32-bit powerpc could choose to opt into this based on
further tests and benchmarks.

As an example, after this patch, kernel functions will have a single nop
at function entry:
	<kernel_clone>:
		addis	r2,r12,467
		addi	r2,r2,-16028
		nop
		mfocrf	r11,8
		...

When ftrace is enabled, the nop is converted to an unconditional branch
to the stub associated with that function:
	<kernel_clone>:
		addis	r2,r12,467
		addi	r2,r2,-16028
		b	ftrace_ool_stub_text_end+0x11b28
		mfocrf	r11,8
		...

The associated stub:
	<ftrace_ool_stub_text_end+0x11b28>:
		mflr	r0
		bl	ftrace_caller
		mtlr	r0
		b	kernel_clone+0xc
		...

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/Kconfig   |   5 +
 arch/powerpc/Makefile  |   4 +
 arch/powerpc/include/asm/ftrace.h  |  11 ++
 arch/powerpc/include/asm/module.h  |   5 +
 arch/powerpc/kernel/asm-offsets.c  |   4 +
 arch/powerpc/kernel/module_64.c|  58 +++-
 arch/powerpc/kernel/trace/ftrace.c | 157 +++--
 arch/powerpc/kernel/trace/ftrace_entry.S   | 116 +++
 arch/powerpc/tools/Makefile|  10 ++
 arch/powerpc/tools/ftrace-gen-ool-stubs.sh |  48 +++
 10 files changed, 381 insertions(+), 37 deletions(-)
 create mode 100644 arch/powerpc/tools/Makefile
 create mode 100755 arch/powerpc/tools/ftrace-gen-ool-stubs.sh

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 68f0e7a5576f..f50cfd15bb73 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -568,6 +568,11 @@ config ARCH_USING_PATCHABLE_FUNCTION_ENTRY
def_bool 
$(success,$(srctree)/arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh 
$(CC) -mlittle-endian) if PPC64 && CPU_LITTLE_ENDIAN
def_bool 
$(success,$(srctree)/arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh 
$(CC) -mbig-endian) if PPC64 && CPU_BIG_ENDIAN
 
+config PPC_FTRACE_OUT_OF_LINE
+   def_bool PPC64 && ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+   depends on PPC64
+   select ARCH_WANTS_PRE_LINK_VMLINUX
+
 config HOTPLUG_CPU
bool "Support for enabling/disabling CPUs"
depends on SMP && (PPC_PSERIES || \
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index bbfe4a1f06ef..c973e6cd1ae8 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -155,7 +155,11 @@ CC_FLAGS_NO_FPU:= $(call 
cc-option,-msoft-float)
 ifdef CONFIG_FUNCTION_TRACER
 ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
 KBUILD_CPPFLAGS+= -DCC_USING_PATCHABLE_FUNCTION_ENTRY
+ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+CC_FLAGS_FTRACE := -fpatchable-function-entry=1
+else
 CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+endif
 else
 CC_FLAGS_FTRACE := -pg
 ifdef CONFIG_MPROFILE_KERNEL
diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 201f9d15430a..0589bb252de7 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -26,6 +26,10 @@ unsigned long prepare_ftrace_return(unsigned long parent, 
unsigned long ip,
 struct module;
 struct dyn_ftrace;
 struct dyn_arch_ftrace {
+#ifdef CONFIG_PPC_FTRACE_OUT_OF_LINE
+   /* pointer to the associated out-of-line stub

[RFC PATCH v4 11/17] kbuild: Add generic hook for architectures to use before the final vmlinux link

2024-07-14 Thread Naveen N Rao
On powerpc, we would like to be able to make a pass on vmlinux.o and
generate a new object file to be linked into vmlinux. Add a generic pass
in Makefile.vmlinux that architectures can use for this purpose.

Architectures need to select CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX and must
provide arch/<arch>/tools/Makefile with a .arch.vmlinux.o target, which
will be invoked prior to the final vmlinux link step.

Signed-off-by: Naveen N Rao 
---
 arch/Kconfig |  6 ++
 scripts/Makefile.vmlinux |  8 
 scripts/link-vmlinux.sh  | 11 ---
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 975dd22a2dbd..ef868ff8156a 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1643,4 +1643,10 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
 config ARCH_NEED_CMPXCHG_1_EMU
bool
 
+config ARCH_WANTS_PRE_LINK_VMLINUX
+   def_bool n
+   help
+ An architecture can select this if it provides arch/<arch>/tools/Makefile
+ with .arch.vmlinux.o target to be linked into vmlinux.
+
 endmenu
diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
index 49946cb96844..6410e0be7f52 100644
--- a/scripts/Makefile.vmlinux
+++ b/scripts/Makefile.vmlinux
@@ -22,6 +22,14 @@ targets += .vmlinux.export.o
 vmlinux: .vmlinux.export.o
 endif
 
+ifdef CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX
+targets += .arch.vmlinux.o
+.arch.vmlinux.o: vmlinux.o FORCE
+   $(Q)$(MAKE) $(build)=arch/$(SRCARCH)/tools .arch.vmlinux.o
+
+vmlinux: .arch.vmlinux.o
+endif
+
 ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)
 
 # Final link of vmlinux with optional arch pass after final link
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 518c70b8db50..aafaed1412ea 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -122,7 +122,7 @@ gen_btf()
return 1
fi
 
-   vmlinux_link ${1}
+   vmlinux_link ${1} ${arch_vmlinux_o}
 
info "BTF" ${2}
LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} ${1}
@@ -178,7 +178,7 @@ kallsyms_step()
kallsymso=${kallsyms_vmlinux}.o
kallsyms_S=${kallsyms_vmlinux}.S
 
-   vmlinux_link ${kallsyms_vmlinux} "${kallsymso_prev}" 
${btf_vmlinux_bin_o}
+   vmlinux_link ${kallsyms_vmlinux} "${kallsymso_prev}" 
${btf_vmlinux_bin_o} ${arch_vmlinux_o}
mksysmap ${kallsyms_vmlinux} ${kallsyms_vmlinux}.syms
kallsyms ${kallsyms_vmlinux}.syms ${kallsyms_S}
 
@@ -223,6 +223,11 @@ fi
 
 ${MAKE} -f "${srctree}/scripts/Makefile.build" obj=init 
init/version-timestamp.o
 
+arch_vmlinux_o=""
+if is_enabled CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX; then
+   arch_vmlinux_o=.arch.vmlinux.o
+fi
+
 btf_vmlinux_bin_o=""
 if is_enabled CONFIG_DEBUG_INFO_BTF; then
btf_vmlinux_bin_o=.btf.vmlinux.bin.o
@@ -273,7 +278,7 @@ if is_enabled CONFIG_KALLSYMS; then
fi
 fi
 
-vmlinux_link vmlinux "${kallsymso}" ${btf_vmlinux_bin_o}
+vmlinux_link vmlinux "${kallsymso}" ${btf_vmlinux_bin_o} ${arch_vmlinux_o}
 
 # fill in BTF IDs
 if is_enabled CONFIG_DEBUG_INFO_BTF && is_enabled CONFIG_BPF; then
-- 
2.45.2



[RFC PATCH v4 00/17] powerpc: Core ftrace rework, support for ftrace direct and bpf trampolines

2024-07-14 Thread Naveen N Rao
This is v4 of the series posted here:
http://lkml.kernel.org/r/cover.1718908016.git.nav...@kernel.org

This series reworks core ftrace support on powerpc to have the function
profiling sequence moved out of line. This enables us to have a single
nop at kernel function entry, virtually eliminating the effect of the
function tracer when it is not enabled. The function profile sequence is
moved out of line and is allocated at one of two separate places
depending on a new config option.

For 64-bit powerpc, the function profiling sequence is also updated to 
include an additional instruction 'mtlr r0' after the usual 
two-instruction sequence to fix link stack imbalance (return address 
predictor) when ftrace is enabled. This showed an improvement of ~22% in 
null_syscall benchmark on a Power 10 system with ftrace enabled.
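
For reference, an illustrative sketch (not code from the series) of what a 
traced function and its out-of-line stub roughly look like under this 
scheme, written as inline asm in the style used elsewhere in the series. 
'my_func' and 'my_func_ool_stub' are made-up labels; ftrace_caller is the 
kernel's ftrace entry point, and the branches assume everything is within 
relative branch range:

asm (
"	.pushsection .text, \"ax\", @progbits			;"
"my_func:							;"
"	nop	/* patched to 'b my_func_ool_stub' when traced */	;"
"	/* ... function body ... */				;"
"	blr							;"
"my_func_ool_stub:						;"
"	mflr	0						;"
"	bl	ftrace_caller					;"
"	mtlr	0	/* new: keeps the link stack balanced */	;"
"	b	my_func + 4					;"
"	.popsection						;"
);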

Finally, support for ftrace direct calls is added based on support for
DYNAMIC_FTRACE_WITH_CALL_OPS. BPF Trampoline support is added atop this.

Support for ftrace direct calls is added for 32-bit powerpc. There is 
some code to enable bpf trampolines for 32-bit powerpc, but it is not 
complete and will need to be pursued separately.

This is marked RFC so that it can get more testing. Patches 1 to 10 
are independent of the rest of this series and can go in separately. The 
remaining patches depend on the series from Benjamin Gray adding support 
for patch_uint() and patch_ulong():
http://lkml.kernel.org/r/20240515024445.236364-1-bg...@linux.ibm.com


Changelog v4:
- Patches 1, 10 and 13 are new.
- Address review comments from Nick. Numerous changes throughout the 
  patch series.
- Extend support for ftrace ool to vmlinux text up to 64MB (patch 13).
- Address remaining TODOs in support for BPF Trampolines.
- Update synchronization when patching instructions during trampoline 
  attach/detach.


- Naveen


Naveen N Rao (17):
  powerpc/trace: Account for -fpatchable-function-entry support by
toolchain
  powerpc/kprobes: Use ftrace to determine if a probe is at function
entry
  powerpc64/ftrace: Nop out additional 'std' instruction emitted by gcc
v5.x
  powerpc32/ftrace: Unify 32-bit and 64-bit ftrace entry code
  powerpc/module_64: Convert #ifdef to IS_ENABLED()
  powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace
  powerpc/ftrace: Skip instruction patching if the instructions are the
same
  powerpc/ftrace: Move ftrace stub used for init text before _einittext
  powerpc64/bpf: Fold bpf_jit_emit_func_call_hlp() into
bpf_jit_emit_func_call_rel()
  powerpc/ftrace: Add a postlink script to validate function tracer
  kbuild: Add generic hook for architectures to use before the final
vmlinux link
  powerpc64/ftrace: Move ftrace sequence out of line
  powerpc64/ftrace: Support .text larger than 32MB with out-of-line
stubs
  powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS
  powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS
  samples/ftrace: Add support for ftrace direct samples on powerpc
  powerpc64/bpf: Add support for bpf trampolines

 arch/Kconfig|   6 +
 arch/powerpc/Kconfig|  23 +-
 arch/powerpc/Makefile   |   8 +
 arch/powerpc/Makefile.postlink  |   8 +
 arch/powerpc/include/asm/ftrace.h   |  32 +-
 arch/powerpc/include/asm/module.h   |   5 +
 arch/powerpc/include/asm/ppc-opcode.h   |  14 +
 arch/powerpc/kernel/asm-offsets.c   |  11 +
 arch/powerpc/kernel/kprobes.c   |  18 +-
 arch/powerpc/kernel/module_64.c |  66 +-
 arch/powerpc/kernel/trace/Makefile  |  11 +-
 arch/powerpc/kernel/trace/ftrace.c  | 295 ++-
 arch/powerpc/kernel/trace/ftrace_64_pg.c|  69 +-
 arch/powerpc/kernel/trace/ftrace_entry.S| 246 --
 arch/powerpc/kernel/vmlinux.lds.S   |   3 +-
 arch/powerpc/net/bpf_jit.h  |  12 +
 arch/powerpc/net/bpf_jit_comp.c | 842 +++-
 arch/powerpc/net/bpf_jit_comp32.c   |   7 +-
 arch/powerpc/net/bpf_jit_comp64.c   |  68 +-
 arch/powerpc/tools/Makefile |  10 +
 arch/powerpc/tools/ftrace-gen-ool-stubs.sh  |  52 ++
 arch/powerpc/tools/ftrace_check.sh  |  45 ++
 samples/ftrace/ftrace-direct-modify.c   |  85 +-
 samples/ftrace/ftrace-direct-multi-modify.c | 101 ++-
 samples/ftrace/ftrace-direct-multi.c|  79 +-
 samples/ftrace/ftrace-direct-too.c  |  83 +-
 samples/ftrace/ftrace-direct.c  |  69 +-
 scripts/Makefile.vmlinux|   8 +
 scripts/link-vmlinux.sh |  11 +-
 29 files changed, 2083 insertions(+), 204 deletions(-)
 create mode 100644 arch/powerpc/tools/Makefile
 create mode 100755 arch/powerpc/tools/ftrace-gen-ool-stubs.sh
 create mode 100755 arch/powerpc/tools/ftrace_check.sh


base-commit: 582b0e554593e530b1386eacafee6c412c5673cc
prerequisite-patch-id: a1d50e589288239d5a8b

Re: [RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines

2024-07-14 Thread Naveen N Rao

Hi Vishal,

Vishal Chourasia wrote:

On Fri, Jun 21, 2024 at 12:24:03AM +0530, Naveen N Rao wrote:

This is v3 of the patches posted here:
http://lkml.kernel.org/r/cover.1718008093.git.nav...@kernel.org

Since v2, I have addressed review comments from Steven and Masahiro 
along with a few fixes. Patches 7-11 are new in this series and add 
support for ftrace direct and bpf trampolines. 

This series depends on the patch series from Benjamin Gray adding 
support for patch_ulong():

http://lkml.kernel.org/r/20240515024445.236364-1-bg...@linux.ibm.com


- Naveen


Hello Naveen,

I've noticed an issue with `kstack()` in bpftrace [1] when using `kfunc` 
compared to `kprobe`. Despite trying all three modes specified in the 
documentation (bpftrace, perf, or raw), the stack isn't unwinding 
properly with `kfunc`. 


[1] 
https://github.com/bpftrace/bpftrace/blob/master/man/adoc/bpftrace.adoc#kstack


for mode in modes; do
run bpftrace with kfunc
disable cpu
kill bpftrace
run bpftrace with kprobe
enable cpu
kill bpftrace

# ./kprobe_vs_kfunc.sh
+ bpftrace -e 'kfunc:vmlinux:build_sched_domains {@[kstack(bpftrace), comm, 
tid]=count();}'
Attaching 1 probe...
+ chcpu -d 2-3
CPU 2 disabled
CPU 3 disabled
+ kill 35214

@[
bpf_prog_cfd8d6c8bb4898ce+972
, cpuhp/2, 33]: 1
@[
bpf_prog_cfd8d6c8bb4898ce+972
, cpuhp/3, 38]: 1


Yeah, this is because we don't capture the full register state with bpf 
trampolines, unlike with kprobes. The BPF stackmap relies on 
perf_arch_fetch_caller_regs() to create a dummy pt_regs for use by 
get_perf_callchain(). We end up with a NULL LR, and bpftrace (and most 
other userspace tools) stops showing the backtrace when it encounters a 
NULL entry. I recall fixing some tools to continue showing the backtrace 
in spite of a NULL entry, but I may be mis-remembering.
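
As an aside on why the stack gets cut short, here is a minimal sketch (plain 
C, made-up names) of how such tools typically walk the callchain they get 
from the kernel - a zero entry is usually treated as the end of the chain:

#include <stdio.h>

/*
 * Illustrative only: userspace consumers usually iterate the array of
 * instruction pointers returned for a kernel stack and stop at the first
 * zero entry, which is why the bpftrace output above shows a single frame.
 */
static void print_kernel_stack(const unsigned long *ips, unsigned int nr)
{
	for (unsigned int i = 0; i < nr; i++) {
		if (!ips[i])		/* NULL entry: treated as end of chain */
			break;
		printf("  %#lx\n", ips[i]);
	}
}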


Perhaps we should fix/change how the perf callchain is captured in the 
kernel. We filter out invalid entries, and capture an additional entry 
for perf since we can't be sure of our return address. We should revisit 
this and see if we can align with the usual expectation that a callchain 
does not contain a NULL entry. Something like this may help, but it needs 
more testing, especially on the perf side:


diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index 6b4434dd0ff3..9f67b764da92 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -83,12 +83,12 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx 
*entry, struct pt_regs *re
* We can't tell which of the first two addresses
* we get are valid, but we can filter out the
* obviously bogus ones here.  We replace them
-* with 0 rather than removing them entirely so
+* with -1 rather than removing them entirely so
* that userspace can tell which is which.
*/
   if ((level == 1 && next_ip == lr) ||
   (level <= 1 && !kernel_text_address(next_ip)))
-   next_ip = 0;
+   next_ip = -1;

   ++level;
}


- Naveen




Re: WARNING&Oops in v6.6.37 on ppc64lea - Trying to vfree() bad address (00000000453be747)

2024-07-09 Thread Naveen N Rao

Greg Kroah-Hartman wrote:

On Mon, Jul 08, 2024 at 11:16:48PM -0400, matoro wrote:

On 2024-07-05 16:34, Vitaly Chikunov wrote:
> Hi,
> 
> There is new WARNING and Oops on ppc64le in v6.6.37 when running LTP tests:

> bpf_prog01, bpf_prog02, bpf_prog04, bpf_prog05, prctl04. Logs excerpt
> below. I
> see there is 1 commit in v6.6.36..v6.6.37 with call to
> bpf_jit_binary_pack_finalize, backported from 5 patch mainline patchset:
> 
>   f99feda5684a powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]
> 




> 
> And so on. Temporary build/test log is at

> https://git.altlinux.org/tasks/352218/build/100/ppc64le/log
> 
> Other stable/longterm branches or other architectures does not exhibit this.
> 
> Thanks,


Hi all - this just took down a production server for me, on POWER9 bare
metal.  Not running tests, just booting normally, before services even came
up.  Had to perform manual restoration, reverting to 6.6.36 worked.  Also
running 64k kernel, unsure if it's better on 4k kernel.

In case it's helpful, here's the log from my boot:
https://dpaste.org/Gyxxg/raw


Ok, this isn't good, something went wrong with my backports here.  Let
me go revert them all and push out a new 6.6.y release right away.


I think the problem is that the series adding support for bpf prog_pack 
was partially backported. In particular, the below patches are missing 
from stable v6.6:

465cabc97b42 powerpc/code-patching: introduce patch_instructions()
033ffaf0af1f powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack
6efc1675acb8 powerpc/bpf: implement bpf_arch_text_copy

It should be sufficient to revert commit f99feda5684a ("powerpc/bpf: use 
bpf_jit_binary_pack_[alloc|finalize|free]") to allow the above to apply 
cleanly, followed by cherry-picking commit 90d862f370b6 ("powerpc/bpf: 
use bpf_jit_binary_pack_[alloc|finalize|free]") from upstream.


Alternatively, commit f99feda5684a ("powerpc/bpf: use 
bpf_jit_binary_pack_[alloc|finalize|free]") can simply be reverted on its 
own.



- Naveen



Re: [PATCH v3] PowerPC: Replace kretprobe with rethook

2024-07-09 Thread Naveen N Rao

Masami Hiramatsu wrote:

On Thu, 27 Jun 2024 09:21:01 -0400
Abhishek Dubey  wrote:


+/* rethook initializer */
+int __init arch_init_kprobes(void)
+{
+   return register_kprobe(&trampoline_p);
+}


No, please don't use arch_init_kprobes() for initializing rethook, since
rethook is used from fprobe too (at this moment).

If you want to make it rely on kprobes, you have to add a dependency
in powerpc's kconfig, e.g.

select HAVE_RETHOOK if KPROBES

But I don't recommend it.


Given that kretprobe has always worked this way on powerpc, I think this
is a fair tradeoff. We get to enable fprobes on powerpc only if kprobes
is also enabled.

Longer term, it would certainly be nice to get rid of that probe, and to
expand the trampoline to directly invoke the rethook callback.


Thanks,
Naveen


Re: [RFC PATCH v3 11/11] powerpc64/bpf: Add support for bpf trampolines

2024-07-01 Thread Naveen N Rao
On Mon, Jul 01, 2024 at 09:03:52PM GMT, Nicholas Piggin wrote:
> On Fri Jun 21, 2024 at 5:09 AM AEST, Naveen N Rao wrote:
> > Add support for bpf_arch_text_poke() and arch_prepare_bpf_trampoline()
> > for 64-bit powerpc.
> 
> What do BPF trampolines give you?

At a very basic level, they provide a way to attach bpf programs at 
function entry/exit - as an alternative to ftrace/kprobes - with lower 
overhead. Commit fec56f5890d9 ("bpf: Introduce BPF trampoline") has more 
details.
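
For anyone unfamiliar with the user-visible side, here is a minimal sketch 
of the kind of program that gets attached through such a trampoline, 
assuming the usual vmlinux.h/libbpf build flow; kernel_clone is just an 
example target, not something specific to this series:

// SPDX-License-Identifier: GPL-2.0
/* Minimal fentry program sketch: attached at function entry via a BPF
 * trampoline instead of a kprobe. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

SEC("fentry/kernel_clone")
int BPF_PROG(trace_kernel_clone, struct kernel_clone_args *args)
{
	bpf_printk("kernel_clone() entered");
	return 0;
}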

> 
> > BPF prog JIT is extended to mimic 64-bit powerpc approach for ftrace
> > having a single nop at function entry, followed by the function
> > profiling sequence out-of-line and a separate long branch stub for calls
> > to trampolines that are out of range. A dummy_tramp is provided to
> > simplify synchronization similar to arm64.
> 
> Synchronization - between BPF and ftrace interfaces?
> 
> > BPF Trampolines adhere to the existing ftrace ABI utilizing a
> > two-instruction profiling sequence, as well as the newer ABI utilizing a
> > three-instruction profiling sequence enabling return with a 'blr'. The
> > trampoline code itself closely follows x86 implementation.
> >
> > While the code is generic, BPF trampolines are only enabled on 64-bit
> > powerpc. 32-bit powerpc will need testing and some updates.
> >
> > Signed-off-by: Naveen N Rao 
> 
> Just a quick glance for now, and I don't know BPF code much.
> 
> > ---
> >  arch/powerpc/include/asm/ppc-opcode.h |  14 +
> >  arch/powerpc/net/bpf_jit.h|  11 +
> >  arch/powerpc/net/bpf_jit_comp.c   | 702 +-
> >  arch/powerpc/net/bpf_jit_comp32.c |   7 +-
> >  arch/powerpc/net/bpf_jit_comp64.c |   7 +-
> >  5 files changed, 738 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
> > b/arch/powerpc/include/asm/ppc-opcode.h
> > index 076ae60b4a55..9eaa2c5d9b73 100644
> > --- a/arch/powerpc/include/asm/ppc-opcode.h
> > +++ b/arch/powerpc/include/asm/ppc-opcode.h
> > @@ -585,12 +585,26 @@
> >  #define PPC_RAW_MTSPR(spr, d)  (0x7c0003a6 | ___PPC_RS(d) | 
> > __PPC_SPR(spr))
> >  #define PPC_RAW_EIEIO()(0x7c0006ac)
> >  
> > +/* bcl 20,31,$+4 */
> > +#define PPC_RAW_BCL()  (0x429f0005)
> 
> This is the special bcl form that gives the current address.
> Maybe call it PPC_RAW_BCL4()

Sure.

> 
> >  
> > +void dummy_tramp(void);
> > +
> > +asm (
> > +"  .pushsection .text, \"ax\", @progbits   ;"
> > +"  .global dummy_tramp ;"
> > +"  .type dummy_tramp, @function;"
> > +"dummy_tramp:  ;"
> > +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> > +"  blr ;"
> > +#else
> > +"  mflr11  ;"
> 
> Can you just drop this instruction? The caller will always
> have it in r11?

Indeed. Will add a comment and remove the instruction.

> 
> > +"  mtctr   11  ;"
> > +"  mtlr0   ;"
> > +"  bctr;"
> > +#endif
> > +"  .size dummy_tramp, .-dummy_tramp;"
> > +"  .popsection ;"
> > +);
> > +
> > +void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx)
> > +{
> > +   int ool_stub_idx, long_branch_stub_idx;
> > +
> > +   /*
> > +* Out-of-line stub:
> > +*  mflrr0
> > +*  [b|bl]  tramp
> > +*  mtlrr0 // only with CONFIG_FTRACE_PFE_OUT_OF_LINE
> > +*  b   bpf_func + 4
> > +*/
> > +   ool_stub_idx = ctx->idx;
> > +   EMIT(PPC_RAW_MFLR(_R0));
> > +   EMIT(PPC_RAW_NOP());
> > +   if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
> > +   EMIT(PPC_RAW_MTLR(_R0));
> > +   WARN_ON_ONCE(!is_offset_in_branch_range(4 - (long)ctx->idx * 4)); /* 
> > TODO */
> > +   EMIT(PPC_RAW_BRANCH(4 - (long)ctx->idx * 4));
> > +
> > +   /*
> > +* Long branch stub:
> > +*  .long   
> > +*  mflrr11
> > +*  bcl 20,31,$+4
> > +*  mflrr12
> > +*  ld  r12, -8-SZL(r12)
> > +*  mtctr   r12
> > +*  mtlrr11 // needed to retain ftrace ABI
> > +*  bctr
> > +*/
> 
> You could avoid clobbering LR on >= POWER9 with addpcis instruction. Or
> use a pcrel load with pcrel even. I guess that's something to do later.

Yes, much of BPF JIT could use a re-look to consider opportunities to 
emit prefix instructions.


Thanks,
Naveen



Re: [RFC PATCH v3 06/11] powerpc64/ftrace: Move ftrace sequence out of line

2024-07-01 Thread Naveen N Rao
On Mon, Jul 01, 2024 at 08:39:03PM GMT, Nicholas Piggin wrote:
> On Fri Jun 21, 2024 at 4:54 AM AEST, Naveen N Rao wrote:
> > Function profile sequence on powerpc includes two instructions at the
> > beginning of each function:
> > mflrr0
> > bl  ftrace_caller
> >
> > The call to ftrace_caller() gets nop'ed out during kernel boot and is
> > patched in when ftrace is enabled.
> >
> > Given the sequence, we cannot return from ftrace_caller with 'blr' as we
> > need to keep LR and r0 intact. This results in link stack imbalance when
> 
> (link stack is IBMese for "return address predictor", if that wasn't
> obvious)
> 
> > ftrace is enabled. To address that, we would like to use a three
> > instruction sequence:
> > mflrr0
> > bl  ftrace_caller
> > mtlrr0
> >
> > Further more, to support DYNAMIC_FTRACE_WITH_CALL_OPS, we need to
> > reserve two instruction slots before the function. This results in a
> > total of five instruction slots to be reserved for ftrace use on each
> > function that is traced.
> >
> > Move the function profile sequence out-of-line to minimize its impact.
> > To do this, we reserve a single nop at function entry using
> > -fpatchable-function-entry=1 and add a pass on vmlinux.o to determine
> 
> What's the need to do this on vmlinux.o rather than vmlinux? We have
> all function syms?

We want to be able to build and include another .o file to be linked 
into vmlinux. That file contains symbols (pfe_stub_text, et al.) that are 
used by vmlinux.o.

> 
> > the total number of functions that can be traced. This is then used to
> > generate a .S file reserving the appropriate amount of space for use as
> > ftrace stubs, which is built and linked into vmlinux.
> 
> An example instruction listing for the "after" case would be nice too.

Sure.

> 
> Is this all ftrace stubs in the one place? And how do you deal with
> kernel size exceeding the limit, if so?

Yes, all at the end. Ftrace init fails on bootup if text size exceeds 
branch range. I should really be putting in a post-link script to detect 
and break the build in that case.

> 
> >
> > On bootup, the stub space is split into separate stubs per function and
> > populated with the proper instruction sequence. A pointer to the
> > associated stub is maintained in dyn_arch_ftrace.
> >
> > For modules, space for ftrace stubs is reserved from the generic module
> > stub space.
> >
> > This is restricted to and enabled by default only on 64-bit powerpc.
> 
> This is cool.
> 
> [...]
> 
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -568,6 +568,11 @@ config ARCH_USING_PATCHABLE_FUNCTION_ENTRY
> > def_bool 
> > $(success,$(srctree)/arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh
> >  $(CC) -mlittle-endian) if PPC64 && CPU_LITTLE_ENDIAN
> > def_bool 
> > $(success,$(srctree)/arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh
> >  $(CC) -mbig-endian) if PPC64 && CPU_BIG_ENDIAN
> >  
> > +config FTRACE_PFE_OUT_OF_LINE
> > +   def_bool PPC64 && ARCH_USING_PATCHABLE_FUNCTION_ENTRY
> > +   depends on PPC64
> > +   select ARCH_WANTS_PRE_LINK_VMLINUX
> 
> This remains powerpc specific? Maybe add a PPC_ prefix to the config
> option?
> 
> Bikeshed - should PFE be expanded to be consistent with the ARCH_
> option?

I agree. PFE isn't immediately obvious. Now that I think about it, I'm not 
sure it really matters that this uses -fpatchable-function-entry. I'll 
call this PPC_FTRACE_SEQUENCE_OUT_OF_LINE. Suggestions welcome :)

> 
> [...]
> 
> > diff --git a/arch/powerpc/include/asm/ftrace.h 
> > b/arch/powerpc/include/asm/ftrace.h
> > index 201f9d15430a..9da1da0f87b4 100644
> > --- a/arch/powerpc/include/asm/ftrace.h
> > +++ b/arch/powerpc/include/asm/ftrace.h
> > @@ -26,6 +26,9 @@ unsigned long prepare_ftrace_return(unsigned long parent, 
> > unsigned long ip,
> >  struct module;
> >  struct dyn_ftrace;
> >  struct dyn_arch_ftrace {
> > +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> > +   unsigned long pfe_stub;
> > +#endif
> >  };
> 
> Ah, we put something else in here. This is the offset to the
> stub? Maybe call it pfe_stub_offset?

Ack.

> 
> [...]
> 
> > diff --git a/arch/powerpc/kernel/trace/ftrace.c 
> > b/arch/powerpc/kernel/trace/ftrace.c
> > index 2cff37b5fd2c..9f3c10307331 100644
> > --- a/arch/powerpc/kernel/trace/ftrace.c
> > +++ b/arch/powerpc/kernel/trace/ftrace.c
> > @@ -37,7 

Re: [RFC PATCH v3 04/11] powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace

2024-07-01 Thread Naveen N Rao
On Mon, Jul 01, 2024 at 07:27:55PM GMT, Nicholas Piggin wrote:
> On Fri Jun 21, 2024 at 4:54 AM AEST, Naveen N Rao wrote:
> > Pointer to struct module is only relevant for ftrace records belonging
> > to kernel modules. Having this field in dyn_arch_ftrace wastes memory
> > for all ftrace records belonging to the kernel. Remove the same in
> > favour of looking up the module from the ftrace record address, similar
> > to other architectures.
> 
> arm is the only one left that requires dyn_arch_ftrace after this.

Yes, but as you noticed, we add a different field in a subsequent patch 
in this series.

> 
> >
> > Signed-off-by: Naveen N Rao 
> > ---
> >  arch/powerpc/include/asm/ftrace.h|  1 -
> >  arch/powerpc/kernel/trace/ftrace.c   | 54 +++---
> >  arch/powerpc/kernel/trace/ftrace_64_pg.c | 73 +++-
> >  3 files changed, 65 insertions(+), 63 deletions(-)
> >
> 
> [snip]
> 
> > @@ -106,28 +106,48 @@ static unsigned long find_ftrace_tramp(unsigned long 
> > ip)
> > return 0;
> >  }
> >  
> > +#ifdef CONFIG_MODULES
> > +static unsigned long ftrace_lookup_module_stub(unsigned long ip, unsigned 
> > long addr)
> > +{
> > +   struct module *mod = NULL;
> > +
> > +   /*
> > +* NOTE: __module_text_address() must be called with preemption
> > +* disabled, but we can rely on ftrace_lock to ensure that 'mod'
> > +* retains its validity throughout the remainder of this code.
> > +*/
> > +   preempt_disable();
> > +   mod = __module_text_address(ip);
> > +   preempt_enable();
> 
> If 'mod' was guaranteed to exist before your patch, then it
> should do afterward too. But is it always ftrace_lock that
> protects it, or do dyn_ftrace entries pin a module in some
> cases?

We don't pin a module. It is the ftrace_lock acquired during 
delete_module() in ftrace_release_mod() that protects it.

You're right though. That comment is probably not necessary since there 
are no new users of this new function.

> 
> > @@ -555,7 +551,10 @@ __ftrace_modify_call(struct dyn_ftrace *rec, unsigned 
> > long old_addr,
> > ppc_inst_t op;
> > unsigned long ip = rec->ip;
> > unsigned long entry, ptr, tramp;
> > -   struct module *mod = rec->arch.mod;
> > +   struct module *mod = ftrace_lookup_module(rec);
> > +
> > +   if (!mod)
> > +   return -EINVAL;
> >  
> > /* If we never set up ftrace trampolines, then bail */
> > if (!mod->arch.tramp || !mod->arch.tramp_regs) {
> > @@ -668,14 +667,6 @@ int ftrace_modify_call(struct dyn_ftrace *rec, 
> > unsigned long old_addr,
> > return -EINVAL;
> > }
> >  
> > -   /*
> > -* Out of range jumps are called from modules.
> > -*/
> > -   if (!rec->arch.mod) {
> > -   pr_err("No module loaded\n");
> > -   return -EINVAL;
> > -   }
> > -
> 
> A couple of these conversions are not _exactly_ the same (lost
> the pr_err here), maybe that's deliberate because the messages
> don't look too useful.

Indeed. Most of the earlier ones being eliminated are in 
ftrace_init_nop(). The other ones get covered by the pr_err in 
ftrace_lookup_module()/ftrace_lookup_module_stub().

> 
> Looks okay though
> 
> Reviewed-by: Nicholas Piggin 


Thanks,
Naveen



Re: [RFC PATCH v3 02/11] powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code

2024-07-01 Thread Naveen N Rao
On Mon, Jul 01, 2024 at 06:57:12PM GMT, Nicholas Piggin wrote:
> On Fri Jun 21, 2024 at 4:54 AM AEST, Naveen N Rao wrote:
> > On 32-bit powerpc, gcc generates a three instruction sequence for
> > function profiling:
> > mflrr0
> > stw r0, 4(r1)
> > bl  _mcount
> >
> > On kernel boot, the call to _mcount() is nop-ed out, to be patched back
> > in when ftrace is actually enabled. The 'stw' instruction therefore is
> > not necessary unless ftrace is enabled. Nop it out during ftrace init.
> >
> > When ftrace is enabled, we want the 'stw' so that stack unwinding works
> > properly. Perform the same within the ftrace handler, similar to 64-bit
> > powerpc.
> >
> > For 64-bit powerpc, early versions of gcc used to emit a three
> > instruction sequence for function profiling (with -mprofile-kernel) with
> > a 'std' instruction to mimic the 'stw' above. Address that scenario also
> > by nop-ing out the 'std' instruction during ftrace init.
> 
> Cool! Could 32-bit use the 2-insn sequence as well if it had
> -mprofile-kernel, out of curiosity?

Yes! It actually already does with the previous change to add support 
for -fpatchable-function-entry. Commit 0f71dcfb4aef ("powerpc/ftrace: 
Add support for -fpatchable-function-entry") changelog describes this:

This changes the profiling instructions used on ppc32. The default -pg
option emits an additional 'stw' instruction after 'mflr r0' and before
the branch to _mcount 'bl _mcount'. This is very similar to the original
-mprofile-kernel implementation on ppc64le, where an additional 'std'
instruction was used to save LR to its save location in the caller's
    stackframe. Subsequently, this additional store was removed in later
compiler versions for performance reasons. The same reasons apply for
ppc32 so we only patch in a 'mflr r0'.


> 
> >
> > Signed-off-by: Naveen N Rao 
> > ---
> >  arch/powerpc/kernel/trace/ftrace.c   | 6 --
> >  arch/powerpc/kernel/trace/ftrace_entry.S | 4 ++--
> >  2 files changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/trace/ftrace.c 
> > b/arch/powerpc/kernel/trace/ftrace.c
> > index d8d6b4fd9a14..463bd7531dc8 100644
> > --- a/arch/powerpc/kernel/trace/ftrace.c
> > +++ b/arch/powerpc/kernel/trace/ftrace.c
> > @@ -241,13 +241,15 @@ int ftrace_init_nop(struct module *mod, struct 
> > dyn_ftrace *rec)
> > /* Expected sequence: 'mflr r0', 'stw r0,4(r1)', 'bl _mcount' */
> > ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
> > if (!ret)
> > -   ret = ftrace_validate_inst(ip - 4, 
> > ppc_inst(PPC_RAW_STW(_R0, _R1, 4)));
> > +   ret = ftrace_modify_code(ip - 4, 
> > ppc_inst(PPC_RAW_STW(_R0, _R1, 4)),
> > +ppc_inst(PPC_RAW_NOP()));
> > } else if (IS_ENABLED(CONFIG_MPROFILE_KERNEL)) {
> > /* Expected sequence: 'mflr r0', ['std r0,16(r1)'], 'bl 
> > _mcount' */
> > ret = ftrace_read_inst(ip - 4, &old);
> > if (!ret && !ppc_inst_equal(old, ppc_inst(PPC_RAW_MFLR(_R0 {
> > ret = ftrace_validate_inst(ip - 8, 
> > ppc_inst(PPC_RAW_MFLR(_R0)));
> > -   ret |= ftrace_validate_inst(ip - 4, 
> > ppc_inst(PPC_RAW_STD(_R0, _R1, 16)));
> > +   ret |= ftrace_modify_code(ip - 4, 
> > ppc_inst(PPC_RAW_STD(_R0, _R1, 16)),
> > + ppc_inst(PPC_RAW_NOP()));
> 
> So this is the old style path... Should you check the mflr validate
> result first? Also do you know what GCC version, roughly? Maybe we
> could have a comment here and eventually deprecate it.

Sure. This is gcc v5.5, for certain; gcc v6.3 doesn't seem to emit the 
additional 'std' instruction.

> 
> You could split this change into its own patch.

Indeed. I will do that.

> 
> > }
> > } else {
> > return -EINVAL;
> > diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S 
> > b/arch/powerpc/kernel/trace/ftrace_entry.S
> > index 76dbe9fd2c0f..244a1c7bb1e8 100644
> > --- a/arch/powerpc/kernel/trace/ftrace_entry.S
> > +++ b/arch/powerpc/kernel/trace/ftrace_entry.S
> > @@ -33,6 +33,8 @@
> >   * and then arrange for the ftrace function to be called.
> >   */
> >  .macro ftrace_regs_entry allregs
> > +   /* Save the original return address in A's stack frame */
> > +   PPC_STL r0, LRSAVE(r1)
> > /* Create a minimal stack frame for representing B */
> > PPC_STLUr1, -STACK_FRAME_MIN_SIZE(r1)
> >  
> > @@ -44,8 +46,6 @@
> > SAVE_GPRS(3, 10, r1)
> >  
> >  #ifdef CONFIG_PPC64
> > -   /* Save the original return address in A's stack frame */
> > -   std r0, LRSAVE+SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE(r1)
> > /* Ok to continue? */
> > lbz r3, PACA_FTRACE_ENABLED(r13)
> > cmpdi   r3, 0
> 
> That seems right to me.
> 
> Reviewed-by: Nicholas Piggin 

Thanks,
Naveen


Re: [RFC PATCH v3 01/11] powerpc/kprobes: Use ftrace to determine if a probe is at function entry

2024-07-01 Thread Naveen N Rao
Hi Nick,
Thanks for the reviews!

On Mon, Jul 01, 2024 at 06:40:50PM GMT, Nicholas Piggin wrote:
> On Fri Jun 21, 2024 at 4:54 AM AEST, Naveen N Rao wrote:
> > Rather than hard-coding the offset into a function to be used to
> > determine if a kprobe is at function entry, use ftrace_location() to
> > determine the ftrace location within the function and categorize all
> > instructions till that offset to be function entry.
> >
> > For functions that cannot be traced, we fall back to using a fixed
> > offset of 8 (two instructions) to categorize a probe as being at
> > function entry for 64-bit elfv2, unless we are using pcrel.
> >
> > Acked-by: Masami Hiramatsu (Google) 
> > Signed-off-by: Naveen N Rao 
> > ---
> >  arch/powerpc/kernel/kprobes.c | 18 --
> >  1 file changed, 8 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> > index 14c5ddec3056..ca204f4f21c1 100644
> > --- a/arch/powerpc/kernel/kprobes.c
> > +++ b/arch/powerpc/kernel/kprobes.c
> > @@ -105,24 +105,22 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, 
> > unsigned int offset)
> > return addr;
> >  }
> >  
> > -static bool arch_kprobe_on_func_entry(unsigned long offset)
> > +static bool arch_kprobe_on_func_entry(unsigned long addr, unsigned long 
> > offset)
> >  {
> > -#ifdef CONFIG_PPC64_ELF_ABI_V2
> > -#ifdef CONFIG_KPROBES_ON_FTRACE
> > -   return offset <= 16;
> > -#else
> > -   return offset <= 8;
> > -#endif
> > -#else
> > +   unsigned long ip = ftrace_location(addr);
> > +
> > +   if (ip)
> > +   return offset <= (ip - addr);
> > +   if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) && 
> > !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
> > +   return offset <= 8;
> 
> If it is PCREL, why not offset == 0 as well?

That's handled by the fallback code after the above line:
	return !offset;

That covers both pcrel and 32-bit powerpc.

Thanks,
Naveen




Re: [PATCH v3] PowerPC: Replace kretprobe with rethook

2024-07-01 Thread Naveen N Rao
mpoline, @function\n"
> - "__kretprobe_trampoline:\n"
> - "nop\n"
> - "blr\n"
> - ".size __kretprobe_trampoline, .-__kretprobe_trampoline\n");
> -
> -/*
> - * Called when the probe at kretprobe trampoline is hit
> - */
> -static int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
> -{
> - unsigned long orig_ret_address;
> -
> - orig_ret_address = __kretprobe_trampoline_handler(regs, NULL);
> - /*
> -  * We get here through one of two paths:
> -  * 1. by taking a trap -> kprobe_handler() -> here
> -  * 2. by optprobe branch -> optimized_callback() -> opt_pre_handler() 
> -> here
> -  *
> -  * When going back through (1), we need regs->nip to be setup properly
> -  * as it is used to determine the return address from the trap.
> -  * For (2), since nip is not honoured with optprobes, we instead setup
> -  * the link register properly so that the subsequent 'blr' in
> -  * __kretprobe_trampoline jumps back to the right instruction.
> -  *
> -  * For nip, we should set the address to the previous instruction since
> -  * we end up emulating it in kprobe_handler(), which increments the nip
> -  * again.
> -  */
> - regs_set_return_ip(regs, orig_ret_address - 4);
> - regs->link = orig_ret_address;
> -
> - return 0;
> -}
> -NOKPROBE_SYMBOL(trampoline_probe_handler);
> -
>  /*
>   * Called after single-stepping.  p->addr is the address of the
>   * instruction whose first byte has been replaced by the "breakpoint"
> @@ -539,19 +486,9 @@ int kprobe_fault_handler(struct pt_regs *regs, int 
> trapnr)
>  }
>  NOKPROBE_SYMBOL(kprobe_fault_handler);
>  
> -static struct kprobe trampoline_p = {
> - .addr = (kprobe_opcode_t *) &__kretprobe_trampoline,
> - .pre_handler = trampoline_probe_handler
> -};
> -
> -int __init arch_init_kprobes(void)
> -{
> - return register_kprobe(&trampoline_p);
> -}
> -
>  int arch_trampoline_kprobe(struct kprobe *p)
>  {
> - if (p->addr == (kprobe_opcode_t *)&__kretprobe_trampoline)
> + if (p->addr == (kprobe_opcode_t *)&arch_rethook_trampoline)
>   return 1;
>  
>   return 0;
> diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
> index 004fae2044a3..c0b351d61058 100644
> --- a/arch/powerpc/kernel/optprobes.c
> +++ b/arch/powerpc/kernel/optprobes.c
> @@ -56,7 +56,7 @@ static unsigned long can_optimize(struct kprobe *p)
>* has a 'nop' instruction, which can be emulated.
>* So further checks can be skipped.
>*/
> - if (p->addr == (kprobe_opcode_t *)&__kretprobe_trampoline)
> + if (p->addr == (kprobe_opcode_t *)&arch_rethook_trampoline)
>   return addr + sizeof(kprobe_opcode_t);
>  
>   /*
> diff --git a/arch/powerpc/kernel/rethook.c b/arch/powerpc/kernel/rethook.c
> new file mode 100644
> index ..d2453793ea5d
> --- /dev/null
> +++ b/arch/powerpc/kernel/rethook.c
> @@ -0,0 +1,77 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * PowerPC implementation of rethook. This depends on kprobes.
> + */
> +
> +#include 
> +#include 
> +
> +/*
> + * Function return trampoline:
> + * - init_kprobes() establishes a probepoint here
> + * - When the probed function returns, this probe
> + * causes the handlers to fire
> + */
> +asm(".global arch_rethook_trampoline\n"
> + ".type arch_rethook_trampoline, @function\n"
> + "arch_rethook_trampoline:\n"
> + "nop\n"
> +     "blr\n"
> + ".size arch_rethook_trampoline, .-arch_rethook_trampoline\n");
> +
> +/*
> + * Called when the probe at kretprobe trampoline is hit
> + */
> +static int trampoline_rethook_handler(struct kprobe *p, struct pt_regs *regs)
> +{
> + unsigned long orig_ret_address;
> +
> + orig_ret_address = rethook_trampoline_handler(regs, 0);
> + rethook_trampoline_handler(regs, orig_ret_address);
> + return 0;
> +}

I think we should pass in regs->gpr[1] to allow it to be verified 
against the stack pointer we saved on function entry. This can be 
simplified to:

static int trampoline_rethook_handler(struct kprobe *p, struct pt_regs *regs)
{
return !rethook_trampoline_handler(regs, regs->gpr[1]);
}

I have tested this patch with that change. So, with that change 
included:
Reviewed-by: Naveen N Rao 


> +NOKPROBE_SYMBOL(trampoline_rethook_handler);
> +
> +void arch_rethook_prepare(struct rethook_node *rh, struct pt_regs *regs, 
> bool mcou

[RFC PATCH v3 11/11] powerpc64/bpf: Add support for bpf trampolines

2024-06-20 Thread Naveen N Rao
Add support for bpf_arch_text_poke() and arch_prepare_bpf_trampoline()
for 64-bit powerpc.

BPF prog JIT is extended to mimic the 64-bit powerpc approach for ftrace
of having a single nop at function entry, followed by the function
profiling sequence out of line and a separate long branch stub for calls
to trampolines that are out of range. A dummy_tramp is provided to
simplify synchronization, similar to arm64.

BPF Trampolines adhere to the existing ftrace ABI utilizing a
two-instruction profiling sequence, as well as the newer ABI utilizing a
three-instruction profiling sequence enabling return with a 'blr'. The
trampoline code itself closely follows the x86 implementation.

While the code is generic, BPF trampolines are only enabled on 64-bit
powerpc. 32-bit powerpc will need testing and some updates.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/include/asm/ppc-opcode.h |  14 +
 arch/powerpc/net/bpf_jit.h|  11 +
 arch/powerpc/net/bpf_jit_comp.c   | 702 +-
 arch/powerpc/net/bpf_jit_comp32.c |   7 +-
 arch/powerpc/net/bpf_jit_comp64.c |   7 +-
 5 files changed, 738 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 076ae60b4a55..9eaa2c5d9b73 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -585,12 +585,26 @@
 #define PPC_RAW_MTSPR(spr, d)  (0x7c0003a6 | ___PPC_RS(d) | 
__PPC_SPR(spr))
 #define PPC_RAW_EIEIO()(0x7c0006ac)
 
+/* bcl 20,31,$+4 */
+#define PPC_RAW_BCL()  (0x429f0005)
 #define PPC_RAW_BRANCH(offset) (0x4800 | PPC_LI(offset))
 #define PPC_RAW_BL(offset) (0x4801 | PPC_LI(offset))
 #define PPC_RAW_TW(t0, a, b)   (0x7c08 | ___PPC_RS(t0) | 
___PPC_RA(a) | ___PPC_RB(b))
 #define PPC_RAW_TRAP() PPC_RAW_TW(31, 0, 0)
 #define PPC_RAW_SETB(t, bfa)   (0x7c000100 | ___PPC_RT(t) | 
___PPC_RA((bfa) << 2))
 
+#ifdef CONFIG_PPC32
+#define PPC_RAW_STLPPC_RAW_STW
+#define PPC_RAW_STLU   PPC_RAW_STWU
+#define PPC_RAW_LL PPC_RAW_LWZ
+#define PPC_RAW_CMPLI  PPC_RAW_CMPWI
+#else
+#define PPC_RAW_STLPPC_RAW_STD
+#define PPC_RAW_STLU   PPC_RAW_STDU
+#define PPC_RAW_LL PPC_RAW_LD
+#define PPC_RAW_CMPLI  PPC_RAW_CMPDI
+#endif
+
 /* Deal with instructions that older assemblers aren't aware of */
 #definePPC_BCCTR_FLUSH stringify_in_c(.long 
PPC_INST_BCCTR_FLUSH)
 #definePPC_CP_ABORTstringify_in_c(.long PPC_RAW_CP_ABORT)
diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index cdea5dccaefe..58cdfbfbef94 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -21,6 +21,9 @@
 
 #define CTX_NIA(ctx) ((unsigned long)ctx->idx * 4)
 
+#define SZLsizeof(unsigned long)
+#define BPF_INSN_SAFETY64
+
 #define PLANT_INSTR(d, idx, instr)   \
do { if (d) { (d)[idx] = instr; } idx++; } while (0)
 #define EMIT(instr)PLANT_INSTR(image, ctx->idx, instr)
@@ -81,6 +84,13 @@
EMIT(PPC_RAW_ORI(d, d, (uintptr_t)(i) &   \
0x)); \
} } while (0)
+#define PPC_LI_ADDRPPC_LI64
+#define PPC64_LOAD_PACA()\
+   EMIT(PPC_RAW_LD(_R2, _R13, offsetof(struct paca_struct, kernel_toc)))
+#else
+#define PPC_LI64   BUILD_BUG
+#define PPC_LI_ADDRPPC_LI32
+#define PPC64_LOAD_PACA() BUILD_BUG()
 #endif
 
 /*
@@ -165,6 +175,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 
*fimage, struct code
   u32 *addrs, int pass, bool extra_pass);
 void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
+void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx);
 void bpf_jit_realloc_regs(struct codegen_context *ctx);
 int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int 
tmp_reg, long exit_addr);
 
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 984655419da5..54df51ce54c8 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -22,11 +22,81 @@
 
 #include "bpf_jit.h"
 
+/* These offsets are from bpf prog end and stay the same across progs */
+static int bpf_jit_ool_stub, bpf_jit_long_branch_stub;
+
 static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
 {
memset32(area, BREAKPOINT_INSTRUCTION, size / 4);
 }
 
+void dummy_tramp(void);
+
+asm (
+"  .pushsection .text, \"ax\", @progbits   ;"
+"  .global dummy_tramp ;

[RFC PATCH v3 10/11] powerpc64/bpf: Fold bpf_jit_emit_func_call_hlp() into bpf_jit_emit_func_call_rel()

2024-06-20 Thread Naveen N Rao
Commit 61688a82e047 ("powerpc/bpf: enable kfunc call") enhanced
bpf_jit_emit_func_call_hlp() to handle calls out to the module region,
where bpf progs are generated. The only difference now between
bpf_jit_emit_func_call_hlp() and bpf_jit_emit_func_call_rel() is in the
handling of the initial pass, where the target function address is not
known. Fold that logic into bpf_jit_emit_func_call_hlp() and rename it to
bpf_jit_emit_func_call_rel() to simplify the bpf function call JIT code.

We don't actually need to load/restore TOC across a call out to a
different kernel helper or to a different bpf program since they all
work with the kernel TOC. We only need to do it if we have to call out
to a module function. So, guard TOC load/restore with appropriate
conditions.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/net/bpf_jit_comp64.c | 61 +--
 1 file changed, 17 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 7703dcf48be8..288ff32d676f 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -202,14 +202,22 @@ void bpf_jit_build_epilogue(u32 *image, struct 
codegen_context *ctx)
EMIT(PPC_RAW_BLR());
 }
 
-static int
-bpf_jit_emit_func_call_hlp(u32 *image, u32 *fimage, struct codegen_context 
*ctx, u64 func)
+int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context 
*ctx, u64 func)
 {
unsigned long func_addr = func ? ppc_function_entry((void *)func) : 0;
long reladdr;
 
-   if (WARN_ON_ONCE(!kernel_text_address(func_addr)))
-   return -EINVAL;
+   /* bpf to bpf call, func is not known in the initial pass. Emit 5 nops 
as a placeholder */
+   if (!func) {
+   for (int i = 0; i < 5; i++)
+   EMIT(PPC_RAW_NOP());
+   /* elfv1 needs an additional instruction to load addr from 
descriptor */
+   if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V1))
+   EMIT(PPC_RAW_NOP());
+   EMIT(PPC_RAW_MTCTR(_R12));
+   EMIT(PPC_RAW_BCTRL());
+   return 0;
+   }
 
 #ifdef CONFIG_PPC_KERNEL_PCREL
reladdr = func_addr - local_paca->kernelbase;
@@ -266,7 +274,8 @@ bpf_jit_emit_func_call_hlp(u32 *image, u32 *fimage, struct 
codegen_context *ctx,
 * We can clobber r2 since we get called through a
 * function pointer (so caller will save/restore r2).
 */
-   EMIT(PPC_RAW_LD(_R2, bpf_to_ppc(TMP_REG_2), 8));
+   if (is_module_text_address(func_addr))
+   EMIT(PPC_RAW_LD(_R2, bpf_to_ppc(TMP_REG_2), 8));
} else {
PPC_LI64(_R12, func);
EMIT(PPC_RAW_MTCTR(_R12));
@@ -276,46 +285,14 @@ bpf_jit_emit_func_call_hlp(u32 *image, u32 *fimage, 
struct codegen_context *ctx,
 * Load r2 with kernel TOC as kernel TOC is used if function 
address falls
 * within core kernel text.
 */
-   EMIT(PPC_RAW_LD(_R2, _R13, offsetof(struct paca_struct, 
kernel_toc)));
+   if (is_module_text_address(func_addr))
+   EMIT(PPC_RAW_LD(_R2, _R13, offsetof(struct paca_struct, 
kernel_toc)));
}
 #endif
 
return 0;
 }
 
-int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context 
*ctx, u64 func)
-{
-   unsigned int i, ctx_idx = ctx->idx;
-
-   if (WARN_ON_ONCE(func && is_module_text_address(func)))
-   return -EINVAL;
-
-   /* skip past descriptor if elf v1 */
-   func += FUNCTION_DESCR_SIZE;
-
-   /* Load function address into r12 */
-   PPC_LI64(_R12, func);
-
-   /* For bpf-to-bpf function calls, the callee's address is unknown
-* until the last extra pass. As seen above, we use PPC_LI64() to
-* load the callee's address, but this may optimize the number of
-* instructions required based on the nature of the address.
-*
-* Since we don't want the number of instructions emitted to increase,
-* we pad the optimized PPC_LI64() call with NOPs to guarantee that
-* we always have a five-instruction sequence, which is the maximum
-* that PPC_LI64() can emit.
-*/
-   if (!image)
-   for (i = ctx->idx - ctx_idx; i < 5; i++)
-   EMIT(PPC_RAW_NOP());
-
-   EMIT(PPC_RAW_MTCTR(_R12));
-   EMIT(PPC_RAW_BCTRL());
-
-   return 0;
-}
-
 static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 
out)
 {
/*
@@ -1047,11 +1024,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
u32 *fimage, struct code
if (ret < 0)
return ret;
 
-   

[RFC PATCH v3 09/11] samples/ftrace: Add support for ftrace direct samples on powerpc

2024-06-20 Thread Naveen N Rao
Add powerpc 32-bit and 64-bit samples for ftrace direct. This serves to
show the sample instruction sequence to be used by ftrace direct calls
to adhere to the ftrace ABI.

On 64-bit powerpc, TOC setup requires some additional work.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/Kconfig|   2 +
 samples/ftrace/ftrace-direct-modify.c   |  85 +++-
 samples/ftrace/ftrace-direct-multi-modify.c | 101 +++-
 samples/ftrace/ftrace-direct-multi.c|  79 ++-
 samples/ftrace/ftrace-direct-too.c  |  83 +++-
 samples/ftrace/ftrace-direct.c  |  69 -
 6 files changed, 414 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 96ae653bdcde..cf5780d6f7bf 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -275,6 +275,8 @@ config PPC
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RELIABLE_STACKTRACE
select HAVE_RSEQ
+   select HAVE_SAMPLE_FTRACE_DIRECTif 
HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+   select HAVE_SAMPLE_FTRACE_DIRECT_MULTI  if 
HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
select HAVE_SETUP_PER_CPU_AREA  if PPC64
select HAVE_SOFTIRQ_ON_OWN_STACK
select HAVE_STACKPROTECTOR  if PPC32 && 
$(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
diff --git a/samples/ftrace/ftrace-direct-modify.c 
b/samples/ftrace/ftrace-direct-modify.c
index 81220390851a..fa8996e251c8 100644
--- a/samples/ftrace/ftrace-direct-modify.c
+++ b/samples/ftrace/ftrace-direct-modify.c
@@ -2,7 +2,7 @@
 #include 
 #include 
 #include 
-#ifndef CONFIG_ARM64
+#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
 #include 
 #endif
 
@@ -199,6 +199,89 @@ asm (
 
 #endif /* CONFIG_LOONGARCH */
 
+#ifdef CONFIG_PPC
+#include 
+
+#ifdef CONFIG_PPC64
+#define STACK_FRAME_SIZE 48
+#else
+#define STACK_FRAME_SIZE 24
+#endif
+
+#if defined(CONFIG_PPC64_ELF_ABI_V2) && !defined(CONFIG_PPC_KERNEL_PCREL)
+#define PPC64_TOC_SAVE_AND_UPDATE  \
+"  std 2, 24(1)\n" \
+"  bcl 20, 31, 1f\n"   \
+"   1: mflr12\n"   \
+"  ld  2, (99f - 1b)(12)\n"
+#define PPC64_TOC_RESTORE  \
+"  ld  2, 24(1)\n"
+#define PPC64_TOC  \
+"   99:.quad   .TOC.@tocbase\n"
+#else
+#define PPC64_TOC_SAVE_AND_UPDATE ""
+#define PPC64_TOC_RESTORE ""
+#define PPC64_TOC ""
+#endif
+
+#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+#define PPC_FTRACE_RESTORE_LR  \
+   PPC_LL" 0, "__stringify(PPC_LR_STKOFF)"(1)\n"   \
+"  mtlr0\n"
+#define PPC_FTRACE_RET \
+"  blr\n"
+#else
+#define PPC_FTRACE_RESTORE_LR  \
+   PPC_LL" 0, "__stringify(PPC_LR_STKOFF)"(1)\n"   \
+"  mtctr   0\n"
+#define PPC_FTRACE_RET \
+"  mtlr0\n"\
+"  bctr\n"
+#endif
+
+asm (
+"  .pushsection.text, \"ax\", @progbits\n"
+"  .type   my_tramp1, @function\n"
+"  .globl  my_tramp1\n"
+"   my_tramp1:\n"
+   PPC_STL"0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+   PPC_STLU"   1, -"__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"  mflr0\n"
+   PPC_STL"0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+   PPC_STLU"   1, -"__stringify(STACK_FRAME_SIZE)"(1)\n"
+   PPC64_TOC_SAVE_AND_UPDATE
+"  bl  my_direct_func1\n"
+   PPC64_TOC_RESTORE
+"  addi1, 1, "__stringify(STACK_FRAME_SIZE)"\n"
+   PPC_FTRACE_RESTORE_LR
+"  addi1, 1, "__stringify(STACK_FRAME_MIN_SIZE)"\n"
+   PPC_LL" 0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+   PPC_FTRACE_RET
+"  .size   my_tramp1, .-my_tramp1\n"
+
+"  .type   my_tramp2, @function\n"
+"  .globl  my_tramp2\n"
+"   my_tramp2:\n"
+   PPC_STL"0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+   PPC_STLU"   1, -"__stringify(STACK_FRAME_MIN_SIZE)"(1)\n"
+"  mflr0\n"
+   PPC_STL"0, "__stringify(PPC_LR_STKOFF)"(1)\n"
+   PPC_STLU"   1, -"__stringify(STACK_FRAME_SIZE)"(1)\n"

[RFC PATCH v3 08/11] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS

2024-06-20 Thread Naveen N Rao
Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS similar to the arm64
implementation.

ftrace direct calls allow custom trampolines to be called into directly
from function ftrace call sites, bypassing the ftrace trampoline
completely. This functionality is currently utilized by BPF trampolines
to hook into kernel function entries.

Since we have a limited relative branch range, we support ftrace direct
calls through support for DYNAMIC_FTRACE_WITH_CALL_OPS. In this
approach, the ftrace trampoline is not entirely bypassed. Rather, it is
re-purposed into a stub that reads the direct_call field from the
associated ftrace_ops structure and branches into that, if it is not
NULL. For this, it is sufficient if we can ensure that the ftrace
trampoline is reachable from all traceable functions.

When multiple ftrace_ops are associated with a call site, we utilize a
callback to set pt_regs->orig_gpr3, which can then be tested on the
return path from the ftrace trampoline to branch into the direct caller.
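
Conceptually, the resulting dispatch amounts to something like the sketch 
below. This is illustrative C only - the real logic lives in the 
ftrace_caller asm changed by this patch, and ops->direct_call exists only 
with DYNAMIC_FTRACE_WITH_DIRECT_CALLS:

#include <linux/ftrace.h>
#include <linux/ptrace.h>

/* Illustrative only: roughly the decision made for a direct call site. */
static unsigned long ftrace_direct_target(struct ftrace_ops *ops, struct pt_regs *regs)
{
	if (ops->direct_call)		/* a single ftrace_ops owns this call site */
		return ops->direct_call;
	return regs->orig_gpr3;		/* set via arch_ftrace_set_direct_caller() */
}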

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/ftrace.h| 15 
 arch/powerpc/kernel/asm-offsets.c|  3 +
 arch/powerpc/kernel/trace/ftrace.c   |  9 +++
 arch/powerpc/kernel/trace/ftrace_entry.S | 99 ++--
 5 files changed, 105 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index fde64ad19de5..96ae653bdcde 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -236,6 +236,7 @@ config PPC
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_ARGSif 
ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if FTRACE_PFE_OUT_OF_LINE || 
(PPC32 && ARCH_USING_PATCHABLE_FUNCTION_ENTRY)
+   select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS if 
HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
select HAVE_DYNAMIC_FTRACE_WITH_REGSif 
ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
select HAVE_EBPF_JIT
select HAVE_EFFICIENT_UNALIGNED_ACCESS
diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 938cecf72eb1..fc0f25b10e86 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -147,6 +147,21 @@ extern unsigned long ftrace_pfe_stub_text_count, 
ftrace_pfe_stub_inittext_count;
 #endif
 void ftrace_free_init_tramp(void);
 unsigned long ftrace_call_adjust(unsigned long addr);
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+/*
+ * When an ftrace registered caller is tracing a function that is also set by a
+ * register_ftrace_direct() call, it needs to be differentiated in the
+ * ftrace_caller trampoline so that the direct call can be invoked after the
+ * other ftrace ops. To do this, place the direct caller in the orig_gpr3 field
+ * of pt_regs. This tells ftrace_caller that there's a direct caller.
+ */
+static inline void arch_ftrace_set_direct_caller(struct ftrace_regs *fregs, 
unsigned long addr)
+{
+   struct pt_regs *regs = &fregs->regs;
+   regs->orig_gpr3 = addr;
+}
+#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
 #else
 static inline void ftrace_free_init_tramp(void) { }
 static inline unsigned long ftrace_call_adjust(unsigned long addr) { return 
addr; }
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index a11ea5f4d86a..0b955dddeb28 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -680,6 +680,9 @@ int main(void)
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
OFFSET(FTRACE_OPS_FUNC, ftrace_ops, func);
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+   OFFSET(FTRACE_OPS_DIRECT_CALL, ftrace_ops, direct_call);
+#endif
 #endif
 
return 0;
diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 028548312c23..799612ee270f 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -153,6 +153,15 @@ static int ftrace_get_call_inst(struct dyn_ftrace *rec, 
unsigned long addr, ppc_
if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE))
ip = ftrace_get_pfe_stub(rec) + MCOUNT_INSN_SIZE; /* second 
instruction in stub */
 
+   if (!is_offset_in_branch_range(addr - ip) && addr != FTRACE_ADDR && 
addr != FTRACE_REGS_ADDR) {
+   /* This can only happen with ftrace direct */
+   if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS)) {
+   pr_err("0x%lx (0x%lx): Unexpected target address 
0x%lx\n", ip, rec->ip, addr);
+   return -EINVAL;
+   }
+   addr = FTRACE_ADDR;
+   }
+
if (is_offset_in_branch_range(addr - ip))
/* Within range */
stub = addr;
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S 
b/arch/powerpc/kernel/trace/ftrac

[RFC PATCH v3 01/11] powerpc/kprobes: Use ftrace to determine if a probe is at function entry

2024-06-20 Thread Naveen N Rao
Rather than hard-coding the offset into a function to be used to
determine if a kprobe is at function entry, use ftrace_location() to
determine the ftrace location within the function and categorize all
instructions till that offset to be function entry.

For functions that cannot be traced, we fall back to using a fixed
offset of 8 (two instructions) to categorize a probe as being at
function entry for 64-bit elfv2, unless we are using pcrel.

Acked-by: Masami Hiramatsu (Google) 
Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/kprobes.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 14c5ddec3056..ca204f4f21c1 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -105,24 +105,22 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, 
unsigned int offset)
return addr;
 }
 
-static bool arch_kprobe_on_func_entry(unsigned long offset)
+static bool arch_kprobe_on_func_entry(unsigned long addr, unsigned long offset)
 {
-#ifdef CONFIG_PPC64_ELF_ABI_V2
-#ifdef CONFIG_KPROBES_ON_FTRACE
-   return offset <= 16;
-#else
-   return offset <= 8;
-#endif
-#else
+   unsigned long ip = ftrace_location(addr);
+
+   if (ip)
+   return offset <= (ip - addr);
+   if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) && 
!IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
+   return offset <= 8;
return !offset;
-#endif
 }
 
 /* XXX try and fold the magic of kprobe_lookup_name() in this */
 kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long 
offset,
 bool *on_func_entry)
 {
-   *on_func_entry = arch_kprobe_on_func_entry(offset);
+   *on_func_entry = arch_kprobe_on_func_entry(addr, offset);
return (kprobe_opcode_t *)(addr + offset);
 }
 
-- 
2.45.2




[RFC PATCH v3 07/11] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS

2024-06-20 Thread Naveen N Rao
Implement support for DYNAMIC_FTRACE_WITH_CALL_OPS similar to the
arm64 implementation.

This works by patching in a pointer to an associated ftrace_ops
structure before each traceable function. If multiple ftrace_ops are
associated with a call site, then a special ftrace_list_ops is used to
enable iterating over all the registered ftrace_ops. If no ftrace_ops
are associated with a call site, then a special ftrace_nop_ops structure
is used to render the ftrace call as a no-op. The ftrace trampoline can
then read the associated ftrace_ops for a call site by loading from an
offset from the LR, and branch directly to the associated function.

The primary advantage with this approach is that we don't have to
iterate over all the registered ftrace_ops for call sites that have a
single ftrace_ops registered. This is the equivalent of implementing
support for dynamic ftrace trampolines, which set up a special ftrace
trampoline for each registered ftrace_ops and have individual call sites
branch into those directly.

A secondary advantage is that this gives us a way to add support for
direct ftrace callers without having to resort to using stubs. The
address of the direct call trampoline can be loaded from the ftrace_ops
structure.

To support this, we reserve a nop before each function on 32-bit
powerpc. For 64-bit powerpc, two nops are reserved before each
out-of-line stub. During ftrace activation, we update this location with
the associated ftrace_ops pointer. Then, on ftrace entry, we load from
this location and call into ftrace_ops->func().
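
Conceptually, the lookup the trampoline performs amounts to something like 
the following sketch (illustrative only; OPS_SLOT_OFFSET is a placeholder 
for the layout-dependent offset, not a value from this patch):

#include <linux/ftrace.h>

/* Illustrative only: the ftrace_ops pointer is patched in at a fixed offset
 * before the call site (or its out-of-line stub), so it can be recovered
 * from the return address the trampoline sees in LR. */
#define OPS_SLOT_OFFSET		16	/* placeholder: depends on the stub layout */

static struct ftrace_ops *ops_for_call_site(unsigned long lr)
{
	return *(struct ftrace_ops **)(lr - OPS_SLOT_OFFSET);
}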

For 64-bit powerpc, we ensure that the out-of-line stub area is
doubleword aligned so that ftrace_ops address can be updated atomically.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/Makefile  |  4 ++
 arch/powerpc/include/asm/ftrace.h  |  5 +-
 arch/powerpc/kernel/asm-offsets.c  |  4 ++
 arch/powerpc/kernel/trace/ftrace.c | 59 +-
 arch/powerpc/kernel/trace/ftrace_entry.S   | 34 ++---
 arch/powerpc/tools/gen-ftrace-pfe-stubs.sh |  5 +-
 7 files changed, 101 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index dd7efca2275a..fde64ad19de5 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -235,6 +235,7 @@ config PPC
select HAVE_DEBUG_STACKOVERFLOW
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_ARGSif 
ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
+   select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if FTRACE_PFE_OUT_OF_LINE || 
(PPC32 && ARCH_USING_PATCHABLE_FUNCTION_ENTRY)
select HAVE_DYNAMIC_FTRACE_WITH_REGSif 
ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
select HAVE_EBPF_JIT
select HAVE_EFFICIENT_UNALIGNED_ACCESS
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index bb920d48ec6e..c3e577dea137 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -158,8 +158,12 @@ KBUILD_CPPFLAGS+= -DCC_USING_PATCHABLE_FUNCTION_ENTRY
 ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
 CC_FLAGS_FTRACE := -fpatchable-function-entry=1
 else
+ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS # PPC32 only
+CC_FLAGS_FTRACE := -fpatchable-function-entry=3,1
+else
 CC_FLAGS_FTRACE := -fpatchable-function-entry=2
 endif
+endif
 else
 CC_FLAGS_FTRACE := -pg
 ifdef CONFIG_MPROFILE_KERNEL
diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 9da1da0f87b4..938cecf72eb1 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -137,8 +137,11 @@ static inline u8 this_cpu_get_ftrace_enabled(void) { 
return 1; }
 extern unsigned int ftrace_tramp_text[], ftrace_tramp_init[];
 #ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
 struct ftrace_pfe_stub {
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+   struct ftrace_ops *ftrace_op;
+#endif
u32 insn[4];
-};
+} __aligned(sizeof(unsigned long));
 extern struct ftrace_pfe_stub ftrace_pfe_stub_text[], 
ftrace_pfe_stub_inittext[];
 extern unsigned long ftrace_pfe_stub_text_count, 
ftrace_pfe_stub_inittext_count;
 #endif
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 5f1a411d714c..a11ea5f4d86a 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -678,5 +678,9 @@ int main(void)
DEFINE(FTRACE_PFE_STUB_SIZE, sizeof(struct ftrace_pfe_stub));
 #endif
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+   OFFSET(FTRACE_OPS_FUNC, ftrace_ops, func);
+#endif
+
return 0;
 }
diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 9f3c10307331..028548312c23 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -38,8 +38,11 @@ unsigned long ftrace_call_adjust(unsigned long addr)
return 0;
 
  

[RFC PATCH v3 06/11] powerpc64/ftrace: Move ftrace sequence out of line

2024-06-20 Thread Naveen N Rao
The function profile sequence on powerpc includes two instructions at the
beginning of each function:
mflr  r0
bl  ftrace_caller

The call to ftrace_caller() gets nop'ed out during kernel boot and is
patched in when ftrace is enabled.

Given the sequence, we cannot return from ftrace_caller with 'blr' as we
need to keep LR and r0 intact. This results in link stack imbalance when
ftrace is enabled. To address that, we would like to use a three
instruction sequence:
mflr  r0
bl  ftrace_caller
mtlr  r0

Furthermore, to support DYNAMIC_FTRACE_WITH_CALL_OPS, we need to
reserve two instruction slots before the function. This results in a
total of five instruction slots to be reserved for ftrace use on each
function that is traced.

Move the function profile sequence out-of-line to minimize its impact.
To do this, we reserve a single nop at function entry using
-fpatchable-function-entry=1 and add a pass on vmlinux.o to determine
the total number of functions that can be traced. This is then used to
generate a .S file reserving the appropriate amount of space for use as
ftrace stubs, which is built and linked into vmlinux.

On bootup, the stub space is split into separate stubs per function and
populated with the proper instruction sequence. A pointer to the
associated stub is maintained in dyn_arch_ftrace.

For modules, space for ftrace stubs is reserved from the generic module
stub space.

This is restricted to and enabled by default only on 64-bit powerpc.
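
As a rough sketch of what each populated stub ends up holding (illustrative
only: the real population code, and which site gets patched on enable/disable,
are in the diff below; the helper name pfe_stub_populate is made up, while the
PPC_RAW_* macros are the existing instruction helpers):

	#include <asm/ftrace.h>		/* struct ftrace_pfe_stub */
	#include <asm/ppc-opcode.h>	/* PPC_RAW_* instruction helpers */

	/*
	 * Illustrative sketch: one out-of-line stub per traced function,
	 * carrying the profile sequence so that only a single nop is needed
	 * at the function entry itself.
	 */
	static void __init pfe_stub_populate(struct ftrace_pfe_stub *stub, unsigned long func)
	{
		stub->insn[0] = PPC_RAW_MFLR(_R0);	/* mflr  r0 */
		stub->insn[1] = PPC_RAW_NOP();		/* placeholder for 'bl ftrace_caller' */
		stub->insn[2] = PPC_RAW_MTLR(_R0);	/* mtlr  r0 */
		/* insn[3]: unconditional branch back to func + 4 (not filled in here) */
	}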

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/Kconfig   |   5 +
 arch/powerpc/Makefile  |   4 +
 arch/powerpc/include/asm/ftrace.h  |  10 ++
 arch/powerpc/include/asm/module.h  |   5 +
 arch/powerpc/kernel/asm-offsets.c  |   4 +
 arch/powerpc/kernel/module_64.c|  58 +++-
 arch/powerpc/kernel/trace/ftrace.c | 147 +++--
 arch/powerpc/kernel/trace/ftrace_entry.S   |  99 ++
 arch/powerpc/kernel/vmlinux.lds.S  |   3 +-
 arch/powerpc/tools/Makefile|  10 ++
 arch/powerpc/tools/gen-ftrace-pfe-stubs.sh |  48 +++
 11 files changed, 355 insertions(+), 38 deletions(-)
 create mode 100644 arch/powerpc/tools/Makefile
 create mode 100755 arch/powerpc/tools/gen-ftrace-pfe-stubs.sh

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c88c6d46a5bc..dd7efca2275a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -568,6 +568,11 @@ config ARCH_USING_PATCHABLE_FUNCTION_ENTRY
def_bool 
$(success,$(srctree)/arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh 
$(CC) -mlittle-endian) if PPC64 && CPU_LITTLE_ENDIAN
def_bool 
$(success,$(srctree)/arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh 
$(CC) -mbig-endian) if PPC64 && CPU_BIG_ENDIAN
 
+config FTRACE_PFE_OUT_OF_LINE
+   def_bool PPC64 && ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+   depends on PPC64
+   select ARCH_WANTS_PRE_LINK_VMLINUX
+
 config HOTPLUG_CPU
bool "Support for enabling/disabling CPUs"
depends on SMP && (PPC_PSERIES || \
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index a8479c881cac..bb920d48ec6e 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -155,7 +155,11 @@ CC_FLAGS_NO_FPU:= $(call 
cc-option,-msoft-float)
 ifdef CONFIG_FUNCTION_TRACER
 ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
 KBUILD_CPPFLAGS+= -DCC_USING_PATCHABLE_FUNCTION_ENTRY
+ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+CC_FLAGS_FTRACE := -fpatchable-function-entry=1
+else
 CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+endif
 else
 CC_FLAGS_FTRACE := -pg
 ifdef CONFIG_MPROFILE_KERNEL
diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 201f9d15430a..9da1da0f87b4 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -26,6 +26,9 @@ unsigned long prepare_ftrace_return(unsigned long parent, 
unsigned long ip,
 struct module;
 struct dyn_ftrace;
 struct dyn_arch_ftrace {
+#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+   unsigned long pfe_stub;
+#endif
 };
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
@@ -132,6 +135,13 @@ static inline u8 this_cpu_get_ftrace_enabled(void) { 
return 1; }
 
 #ifdef CONFIG_FUNCTION_TRACER
 extern unsigned int ftrace_tramp_text[], ftrace_tramp_init[];
+#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+struct ftrace_pfe_stub {
+   u32 insn[4];
+};
+extern struct ftrace_pfe_stub ftrace_pfe_stub_text[], 
ftrace_pfe_stub_inittext[];
+extern unsigned long ftrace_pfe_stub_text_count, 
ftrace_pfe_stub_inittext_count;
+#endif
 void ftrace_free_init_tramp(void);
 unsigned long ftrace_call_adjust(unsigned long addr);
 #else
diff --git a/arch/powerpc/include/asm/module.h 
b/arch/powerpc/include/asm/module.h
index 300c777cc307..28dbd1ec5593 100644
--- a/arch/powerpc/include/asm/module.h
+++ b/ar

[RFC PATCH v3 05/11] kbuild: Add generic hook for architectures to use before the final vmlinux link

2024-06-20 Thread Naveen N Rao
On powerpc, we would like to be able to make a pass on vmlinux.o and
generate a new object file to be linked into vmlinux. Add a generic pass
in Makefile.vmlinux that architectures can use for this purpose.

Architectures need to select CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX and must
provide arch/<arch>/tools/Makefile with a .arch.vmlinux.o target, which
will be invoked prior to the final vmlinux link step.

Signed-off-by: Naveen N Rao 
---
 arch/Kconfig |  3 +++
 scripts/Makefile.vmlinux |  8 
 scripts/link-vmlinux.sh  | 11 ---
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 975dd22a2dbd..649f0903e7ef 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1643,4 +1643,7 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
 config ARCH_NEED_CMPXCHG_1_EMU
bool
 
+config ARCH_WANTS_PRE_LINK_VMLINUX
+   def_bool n
+
 endmenu
diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
index 49946cb96844..6410e0be7f52 100644
--- a/scripts/Makefile.vmlinux
+++ b/scripts/Makefile.vmlinux
@@ -22,6 +22,14 @@ targets += .vmlinux.export.o
 vmlinux: .vmlinux.export.o
 endif
 
+ifdef CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX
+targets += .arch.vmlinux.o
+.arch.vmlinux.o: vmlinux.o FORCE
+   $(Q)$(MAKE) $(build)=arch/$(SRCARCH)/tools .arch.vmlinux.o
+
+vmlinux: .arch.vmlinux.o
+endif
+
 ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)
 
 # Final link of vmlinux with optional arch pass after final link
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 518c70b8db50..aafaed1412ea 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -122,7 +122,7 @@ gen_btf()
return 1
fi
 
-   vmlinux_link ${1}
+   vmlinux_link ${1} ${arch_vmlinux_o}
 
info "BTF" ${2}
LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} ${1}
@@ -178,7 +178,7 @@ kallsyms_step()
kallsymso=${kallsyms_vmlinux}.o
kallsyms_S=${kallsyms_vmlinux}.S
 
-   vmlinux_link ${kallsyms_vmlinux} "${kallsymso_prev}" 
${btf_vmlinux_bin_o}
+   vmlinux_link ${kallsyms_vmlinux} "${kallsymso_prev}" 
${btf_vmlinux_bin_o} ${arch_vmlinux_o}
mksysmap ${kallsyms_vmlinux} ${kallsyms_vmlinux}.syms
kallsyms ${kallsyms_vmlinux}.syms ${kallsyms_S}
 
@@ -223,6 +223,11 @@ fi
 
 ${MAKE} -f "${srctree}/scripts/Makefile.build" obj=init 
init/version-timestamp.o
 
+arch_vmlinux_o=""
+if is_enabled CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX; then
+   arch_vmlinux_o=.arch.vmlinux.o
+fi
+
 btf_vmlinux_bin_o=""
 if is_enabled CONFIG_DEBUG_INFO_BTF; then
btf_vmlinux_bin_o=.btf.vmlinux.bin.o
@@ -273,7 +278,7 @@ if is_enabled CONFIG_KALLSYMS; then
fi
 fi
 
-vmlinux_link vmlinux "${kallsymso}" ${btf_vmlinux_bin_o}
+vmlinux_link vmlinux "${kallsymso}" ${btf_vmlinux_bin_o} ${arch_vmlinux_o}
 
 # fill in BTF IDs
 if is_enabled CONFIG_DEBUG_INFO_BTF && is_enabled CONFIG_BPF; then
-- 
2.45.2




[RFC PATCH v3 04/11] powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace

2024-06-20 Thread Naveen N Rao
Pointer to struct module is only relevant for ftrace records belonging
to kernel modules. Having this field in dyn_arch_ftrace wastes memory
for all ftrace records belonging to the kernel. Remove it in
favour of looking up the module from the ftrace record address, similar
to other architectures.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/include/asm/ftrace.h|  1 -
 arch/powerpc/kernel/trace/ftrace.c   | 54 +++---
 arch/powerpc/kernel/trace/ftrace_64_pg.c | 73 +++-
 3 files changed, 65 insertions(+), 63 deletions(-)

diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 107fc5a48456..201f9d15430a 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -26,7 +26,6 @@ unsigned long prepare_ftrace_return(unsigned long parent, 
unsigned long ip,
 struct module;
 struct dyn_ftrace;
 struct dyn_arch_ftrace {
-   struct module *mod;
 };
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 463bd7531dc8..2cff37b5fd2c 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -106,28 +106,48 @@ static unsigned long find_ftrace_tramp(unsigned long ip)
return 0;
 }
 
+#ifdef CONFIG_MODULES
+static unsigned long ftrace_lookup_module_stub(unsigned long ip, unsigned long 
addr)
+{
+   struct module *mod = NULL;
+
+   /*
+* NOTE: __module_text_address() must be called with preemption
+* disabled, but we can rely on ftrace_lock to ensure that 'mod'
+* retains its validity throughout the remainder of this code.
+*/
+   preempt_disable();
+   mod = __module_text_address(ip);
+   preempt_enable();
+
+   if (!mod)
+   pr_err("No module loaded at addr=%lx\n", ip);
+
+   return (addr == (unsigned long)ftrace_caller ? mod->arch.tramp : 
mod->arch.tramp_regs);
+}
+#else
+static unsigned long ftrace_lookup_module_stub(unsigned long ip, unsigned long 
addr)
+{
+   return 0;
+}
+#endif
+
 static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, 
ppc_inst_t *call_inst)
 {
unsigned long ip = rec->ip;
unsigned long stub;
 
-   if (is_offset_in_branch_range(addr - ip)) {
+   if (is_offset_in_branch_range(addr - ip))
/* Within range */
stub = addr;
-#ifdef CONFIG_MODULES
-   } else if (rec->arch.mod) {
-   /* Module code would be going to one of the module stubs */
-   stub = (addr == (unsigned long)ftrace_caller ? 
rec->arch.mod->arch.tramp :
-  
rec->arch.mod->arch.tramp_regs);
-#endif
-   } else if (core_kernel_text(ip)) {
+   else if (core_kernel_text(ip))
/* We would be branching to one of our ftrace stubs */
stub = find_ftrace_tramp(ip);
-   if (!stub) {
-   pr_err("0x%lx: No ftrace stubs reachable\n", ip);
-   return -EINVAL;
-   }
-   } else {
+   else
+   stub = ftrace_lookup_module_stub(ip, addr);
+
+   if (!stub) {
+   pr_err("0x%lx: No ftrace stubs reachable\n", ip);
return -EINVAL;
}
 
@@ -258,14 +278,6 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace 
*rec)
if (ret)
return ret;
 
-   if (!core_kernel_text(ip)) {
-   if (!mod) {
-   pr_err("0x%lx: No module provided for non-kernel 
address\n", ip);
-   return -EFAULT;
-   }
-   rec->arch.mod = mod;
-   }
-
/* Nop-out the ftrace location */
new = ppc_inst(PPC_RAW_NOP());
addr = MCOUNT_ADDR;
diff --git a/arch/powerpc/kernel/trace/ftrace_64_pg.c 
b/arch/powerpc/kernel/trace/ftrace_64_pg.c
index 12fab1803bcf..a563b9ffcc2b 100644
--- a/arch/powerpc/kernel/trace/ftrace_64_pg.c
+++ b/arch/powerpc/kernel/trace/ftrace_64_pg.c
@@ -116,6 +116,24 @@ static unsigned long find_bl_target(unsigned long ip, 
ppc_inst_t op)
 }
 
 #ifdef CONFIG_MODULES
+static struct module *ftrace_lookup_module(struct dyn_ftrace *rec)
+{
+   struct module *mod;
+   /*
+* NOTE: __module_text_address() must be called with preemption
+* disabled, but we can rely on ftrace_lock to ensure that 'mod'
+* retains its validity throughout the remainder of this code.
+*/
+   preempt_disable();
+   mod = __module_text_address(rec->ip);
+   preempt_enable();
+
+   if (!mod)
+   pr_err("No module loaded at addr=%lx\n", rec->ip);
+
+   return mod;
+}
+
 static int
 __ftrace_make_nop(struct module *mod,
  struct dyn_ftrace *rec, unsigned l

[RFC PATCH v3 03/11] powerpc/module_64: Convert #ifdef to IS_ENABLED()

2024-06-20 Thread Naveen N Rao
Minor refactor for converting #ifdef to IS_ENABLED().

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/module_64.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index e9bab599d0c2..c202be11683b 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -241,14 +241,13 @@ static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
}
}
 
-#ifdef CONFIG_DYNAMIC_FTRACE
/* make the trampoline to the ftrace_caller */
-   relocs++;
-#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+   if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE))
+   relocs++;
+
/* an additional one for ftrace_regs_caller */
-   relocs++;
-#endif
-#endif
+   if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS))
+   relocs++;
 
pr_debug("Looks like a total of %lu stubs, max\n", relocs);
return relocs * sizeof(struct ppc64_stub_entry);
-- 
2.45.2




[RFC PATCH v3 02/11] powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code

2024-06-20 Thread Naveen N Rao
On 32-bit powerpc, gcc generates a three instruction sequence for
function profiling:
mflr  r0
stw r0, 4(r1)
bl  _mcount

On kernel boot, the call to _mcount() is nop-ed out, to be patched back
in when ftrace is actually enabled. The 'stw' instruction therefore is
not necessary unless ftrace is enabled. Nop it out during ftrace init.

When ftrace is enabled, we want the 'stw' so that stack unwinding works
properly. Perform the same within the ftrace handler, similar to 64-bit
powerpc.

For 64-bit powerpc, early versions of gcc used to emit a three
instruction sequence for function profiling (with -mprofile-kernel) with
a 'std' instruction to mimic the 'stw' above. Address that scenario also
by nop-ing out the 'std' instruction during ftrace init.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/ftrace.c   | 6 --
 arch/powerpc/kernel/trace/ftrace_entry.S | 4 ++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index d8d6b4fd9a14..463bd7531dc8 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -241,13 +241,15 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace 
*rec)
/* Expected sequence: 'mflr r0', 'stw r0,4(r1)', 'bl _mcount' */
ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
if (!ret)
-   ret = ftrace_validate_inst(ip - 4, 
ppc_inst(PPC_RAW_STW(_R0, _R1, 4)));
+   ret = ftrace_modify_code(ip - 4, 
ppc_inst(PPC_RAW_STW(_R0, _R1, 4)),
+ppc_inst(PPC_RAW_NOP()));
} else if (IS_ENABLED(CONFIG_MPROFILE_KERNEL)) {
/* Expected sequence: 'mflr r0', ['std r0,16(r1)'], 'bl 
_mcount' */
ret = ftrace_read_inst(ip - 4, &old);
if (!ret && !ppc_inst_equal(old, ppc_inst(PPC_RAW_MFLR(_R0 {
ret = ftrace_validate_inst(ip - 8, 
ppc_inst(PPC_RAW_MFLR(_R0)));
-   ret |= ftrace_validate_inst(ip - 4, 
ppc_inst(PPC_RAW_STD(_R0, _R1, 16)));
+   ret |= ftrace_modify_code(ip - 4, 
ppc_inst(PPC_RAW_STD(_R0, _R1, 16)),
+ ppc_inst(PPC_RAW_NOP()));
}
} else {
return -EINVAL;
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S 
b/arch/powerpc/kernel/trace/ftrace_entry.S
index 76dbe9fd2c0f..244a1c7bb1e8 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -33,6 +33,8 @@
  * and then arrange for the ftrace function to be called.
  */
 .macro ftrace_regs_entry allregs
+   /* Save the original return address in A's stack frame */
+   PPC_STL r0, LRSAVE(r1)
/* Create a minimal stack frame for representing B */
PPC_STLUr1, -STACK_FRAME_MIN_SIZE(r1)
 
@@ -44,8 +46,6 @@
SAVE_GPRS(3, 10, r1)
 
 #ifdef CONFIG_PPC64
-   /* Save the original return address in A's stack frame */
-   std r0, LRSAVE+SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE(r1)
/* Ok to continue? */
lbz r3, PACA_FTRACE_ENABLED(r13)
cmpdi   r3, 0
-- 
2.45.2




[RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines

2024-06-20 Thread Naveen N Rao
This is v3 of the patches posted here:
http://lkml.kernel.org/r/cover.1718008093.git.nav...@kernel.org

Since v2, I have addressed review comments from Steven and Masahiro 
along with a few fixes. Patches 7-11 are new in this series and add 
support for ftrace direct and bpf trampolines. 

This series depends on the patch series from Benjamin Gray adding 
support for patch_ulong():
http://lkml.kernel.org/r/20240515024445.236364-1-bg...@linux.ibm.com


- Naveen


Naveen N Rao (11):
  powerpc/kprobes: Use ftrace to determine if a probe is at function
entry
  powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code
  powerpc/module_64: Convert #ifdef to IS_ENABLED()
  powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace
  kbuild: Add generic hook for architectures to use before the final
vmlinux link
  powerpc64/ftrace: Move ftrace sequence out of line
  powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS
  powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS
  samples/ftrace: Add support for ftrace direct samples on powerpc
  powerpc64/bpf: Fold bpf_jit_emit_func_call_hlp() into
bpf_jit_emit_func_call_rel()
  powerpc64/bpf: Add support for bpf trampolines

 arch/Kconfig|   3 +
 arch/powerpc/Kconfig|   9 +
 arch/powerpc/Makefile   |   8 +
 arch/powerpc/include/asm/ftrace.h   |  29 +-
 arch/powerpc/include/asm/module.h   |   5 +
 arch/powerpc/include/asm/ppc-opcode.h   |  14 +
 arch/powerpc/kernel/asm-offsets.c   |  11 +
 arch/powerpc/kernel/kprobes.c   |  18 +-
 arch/powerpc/kernel/module_64.c |  67 +-
 arch/powerpc/kernel/trace/ftrace.c  | 269 +++-
 arch/powerpc/kernel/trace/ftrace_64_pg.c|  73 +-
 arch/powerpc/kernel/trace/ftrace_entry.S| 210 --
 arch/powerpc/kernel/vmlinux.lds.S   |   3 +-
 arch/powerpc/net/bpf_jit.h  |  11 +
 arch/powerpc/net/bpf_jit_comp.c | 702 +++-
 arch/powerpc/net/bpf_jit_comp32.c   |   7 +-
 arch/powerpc/net/bpf_jit_comp64.c   |  68 +-
 arch/powerpc/tools/Makefile |  10 +
 arch/powerpc/tools/gen-ftrace-pfe-stubs.sh  |  49 ++
 samples/ftrace/ftrace-direct-modify.c   |  85 ++-
 samples/ftrace/ftrace-direct-multi-modify.c | 101 ++-
 samples/ftrace/ftrace-direct-multi.c|  79 ++-
 samples/ftrace/ftrace-direct-too.c  |  83 ++-
 samples/ftrace/ftrace-direct.c  |  69 +-
 scripts/Makefile.vmlinux|   8 +
 scripts/link-vmlinux.sh |  11 +-
 26 files changed, 1813 insertions(+), 189 deletions(-)
 create mode 100644 arch/powerpc/tools/Makefile
 create mode 100755 arch/powerpc/tools/gen-ftrace-pfe-stubs.sh


base-commit: e2b06d707dd067509cdc9ceba783c06fa6a551c2
prerequisite-patch-id: a1d50e589288239d5a8b1c1f354cd4737057c9d3
prerequisite-patch-id: da4142d56880861bd0ad7ad7087c9e2670a2ee54
prerequisite-patch-id: 609d292e054b2396b603890522a940fa0bdfb6d8
prerequisite-patch-id: 6f7213fb77b1260defbf43be0e47bff9c80054cc
prerequisite-patch-id: ad3b71bf071ae4ba1bee5b087e61a2055772a74f
-- 
2.45.2




Re: [PATCH v2] PowerPC: Replace kretprobe with rethook

2024-06-18 Thread Naveen N Rao
On Tue, Jun 18, 2024 at 06:43:06AM GMT, Masami Hiramatsu wrote:
> On Mon, 17 Jun 2024 18:28:07 +0530
> Naveen N Rao  wrote:
> 
> > Hi Abhishek,
> > 
> > On Mon, Jun 10, 2024 at 11:45:09AM GMT, Abhishek Dubey wrote:
> > > This is an adaptation of commit f3a112c0c40d ("x86,rethook,kprobes:
> > > Replace kretprobe with rethook on x86") to PowerPC.
> > > 
> > > Replaces the kretprobe code with rethook on Power. With this patch,
> > > kretprobe on Power uses the rethook instead of kretprobe specific
> > > trampoline code.
> > > 
> > > Reference to other archs:
> > > commit b57c2f124098 ("riscv: add riscv rethook implementation")
> > > commit 7b0a096436c2 ("LoongArch: Replace kretprobe with rethook")
> > > 
> > > Signed-off-by: Abhishek Dubey 
> > > ---
> > >  arch/powerpc/Kconfig |  1 +
> > >  arch/powerpc/kernel/Makefile |  1 +
> > >  arch/powerpc/kernel/kprobes.c| 65 +
> > >  arch/powerpc/kernel/optprobes.c  |  2 +-
> > >  arch/powerpc/kernel/rethook.c| 71 
> > >  arch/powerpc/kernel/stacktrace.c | 10 +++--
> > >  6 files changed, 81 insertions(+), 69 deletions(-)
> > >  create mode 100644 arch/powerpc/kernel/rethook.c
...
> > > +
> > > + return 0;
> > > +}
> > > +NOKPROBE_SYMBOL(trampoline_rethook_handler);
> > > +
> > > +void arch_rethook_prepare(struct rethook_node *rh, struct pt_regs *regs, 
> > > bool mcount)
> > > +{
> > > + rh->ret_addr = regs->link;
> > > + rh->frame = 0;
> > 
> > There is additional code to validate our assumption with a frame pointer 
> > set, so I think we should set this to regs->gpr[1].
> 
> Additonal note: If this sets regs->gpr[1], pass it to 
> rethook_trampoline_handler()
> too, so that it can find correct frame.
> 
> BTW, it seems powerpc does not use kretprobe/rethook shadow stack for
> stack unwinding yet, is that right?

Yes, you are right. That would be a good addition. I suppose we could 
add something in show_stack() to show the actual function name rather 
than the rethook trampoline. It can be a separate patch though.
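
(Purely as an illustration of that idea, a rough, untested sketch follows.
rethook_find_ret_addr() is the generic helper from <linux/rethook.h>; the
helper name, the arch_rethook_trampoline declaration, and how this would be
wired into show_stack()/arch_stack_walk() are assumptions, not part of the
patch under review.)

	#include <linux/rethook.h>
	#include <linux/sched.h>

	void arch_rethook_trampoline(void);	/* asm trampoline from this patch's rethook.c */

	/*
	 * Illustrative, untested sketch: while walking the stack, translate the
	 * rethook trampoline address back to the original return address saved
	 * on the rethook shadow stack, so the real caller gets printed.
	 * 'frame' is the value recorded in arch_rethook_prepare() (r1, per the
	 * discussion above).
	 */
	static unsigned long rethook_recover_ret_addr(struct task_struct *tsk, unsigned long ip,
						      unsigned long frame, struct llist_node **cur)
	{
		if (ip == (unsigned long)arch_rethook_trampoline)
			return rethook_find_ret_addr(tsk, frame, cur);
		return ip;
	}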


Thanks,
Naveen



Re: [PATCH v2] PowerPC: Replace kretprobe with rethook

2024-06-17 Thread Naveen N Rao
Hi Abhishek,

On Mon, Jun 10, 2024 at 11:45:09AM GMT, Abhishek Dubey wrote:
> This is an adaptation of commit f3a112c0c40d ("x86,rethook,kprobes:
> Replace kretprobe with rethook on x86") to PowerPC.
> 
> Replaces the kretprobe code with rethook on Power. With this patch,
> kretprobe on Power uses the rethook instead of kretprobe specific
> trampoline code.
> 
> Reference to other archs:
> commit b57c2f124098 ("riscv: add riscv rethook implementation")
> commit 7b0a096436c2 ("LoongArch: Replace kretprobe with rethook")
> 
> Signed-off-by: Abhishek Dubey 
> ---
>  arch/powerpc/Kconfig |  1 +
>  arch/powerpc/kernel/Makefile |  1 +
>  arch/powerpc/kernel/kprobes.c| 65 +
>  arch/powerpc/kernel/optprobes.c  |  2 +-
>  arch/powerpc/kernel/rethook.c| 71 
>  arch/powerpc/kernel/stacktrace.c | 10 +++--
>  6 files changed, 81 insertions(+), 69 deletions(-)
>  create mode 100644 arch/powerpc/kernel/rethook.c

Thanks for implementing this - it is looking good, but please find a few 
small suggestions below.

> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index c88c6d46a5bc..fa0b1ab3f935 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -270,6 +270,7 @@ config PPC
>   select HAVE_PERF_EVENTS_NMI if PPC64
>   select HAVE_PERF_REGS
>   select HAVE_PERF_USER_STACK_DUMP
> + select HAVE_RETHOOK
>   select HAVE_REGS_AND_STACK_ACCESS_API
>   select HAVE_RELIABLE_STACKTRACE
>   select HAVE_RSEQ
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 8585d03c02d3..7dd1b523b17f 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -140,6 +140,7 @@ obj-$(CONFIG_KPROBES) += kprobes.o
>  obj-$(CONFIG_OPTPROBES)  += optprobes.o optprobes_head.o
>  obj-$(CONFIG_KPROBES_ON_FTRACE)  += kprobes-ftrace.o
>  obj-$(CONFIG_UPROBES)+= uprobes.o
> +obj-$(CONFIG_RETHOOK)   += rethook.o
>  obj-$(CONFIG_PPC_UDBG_16550) += legacy_serial.o udbg_16550.o
>  obj-$(CONFIG_SWIOTLB)+= dma-swiotlb.o
>  obj-$(CONFIG_ARCH_HAS_DMA_SET_MASK) += dma-mask.o
> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index 14c5ddec3056..f8aa91bc3b17 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -228,16 +228,6 @@ static nokprobe_inline void set_current_kprobe(struct 
> kprobe *p, struct pt_regs
>   kcb->kprobe_saved_msr = regs->msr;
>  }
>  
> -void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs 
> *regs)
> -{
> - ri->ret_addr = (kprobe_opcode_t *)regs->link;
> - ri->fp = NULL;
> -
> - /* Replace the return addr with trampoline addr */
> - regs->link = (unsigned long)__kretprobe_trampoline;
> -}
> -NOKPROBE_SYMBOL(arch_prepare_kretprobe);
> -
>  static int try_to_emulate(struct kprobe *p, struct pt_regs *regs)
>  {
>   int ret;
> @@ -394,49 +384,6 @@ int kprobe_handler(struct pt_regs *regs)
>  }
>  NOKPROBE_SYMBOL(kprobe_handler);
>  
> -/*
> - * Function return probe trampoline:
> - *   - init_kprobes() establishes a probepoint here
> - *   - When the probed function returns, this probe
> - *   causes the handlers to fire
> - */
> -asm(".global __kretprobe_trampoline\n"
> - ".type __kretprobe_trampoline, @function\n"
> - "__kretprobe_trampoline:\n"
> - "nop\n"
> - "blr\n"
> - ".size __kretprobe_trampoline, .-__kretprobe_trampoline\n");
> -
> -/*
> - * Called when the probe at kretprobe trampoline is hit
> - */
> -static int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
> -{
> - unsigned long orig_ret_address;
> -
> - orig_ret_address = __kretprobe_trampoline_handler(regs, NULL);
> - /*
> -  * We get here through one of two paths:
> -  * 1. by taking a trap -> kprobe_handler() -> here
> -  * 2. by optprobe branch -> optimized_callback() -> opt_pre_handler() 
> -> here
> -  *
> -  * When going back through (1), we need regs->nip to be setup properly
> -  * as it is used to determine the return address from the trap.
> -  * For (2), since nip is not honoured with optprobes, we instead setup
> -  * the link register properly so that the subsequent 'blr' in
> -  * __kretprobe_trampoline jumps back to the right instruction.
> -  *
> -  * For nip, we should set the address to the previous instruction since
> -  * we end up emulating it in kprobe_handler(), which increments the nip
> -  * again.
> -  */
> - regs_set_return_ip(regs, orig_ret_address - 4);
> - regs->link = orig_ret_address;
> -
> - return 0;
> -}
> -NOKPROBE_SYMBOL(trampoline_probe_handler);
> -
>  /*
>   * Called after single-stepping.  p->addr is the address of the
>   * instruction whose first byte has been replaced by the "breakpoint"
> @@ -539,19 +486,9 @@ int kprobe_fault_handler(struct

Re: [RFC PATCH v2 4/5] kbuild: Add generic hook for architectures to use before the final vmlinux link

2024-06-12 Thread Naveen N Rao
On Tue, Jun 11, 2024 at 11:05:56PM GMT, Naveen N Rao wrote:
> On Tue, Jun 11, 2024 at 06:51:51AM GMT, Masahiro Yamada wrote:
> > On Tue, Jun 11, 2024 at 2:20 AM Naveen N Rao  wrote:
> > >
> > > On Mon, Jun 10, 2024 at 06:14:51PM GMT, Masahiro Yamada wrote:
> > > > On Mon, Jun 10, 2024 at 5:39 PM Naveen N Rao  wrote:
> > > > >
> > > > > +arch_vmlinux_o=""
> > > > > +if is_enabled CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX; then
> > > > > +   arch_vmlinux_o=.arch.vmlinux.o
> > > > > +   info "ARCH" ${arch_vmlinux_o}
> > > > > +   if ! ${srctree}/arch/${SRCARCH}/tools/vmlinux_o.sh 
> > > > > ${arch_vmlinux_o} ; then
> > > > > +   echo >&2 "Failed to generate ${arch_vmlinux_o}"
> > > > > +   echo >&2 "Try to disable 
> > > > > CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX"
> > > > > +   exit 1
> > > > > +   fi
> > > > > +fi
> > > >
> > > >
> > > >
> > > > This is wrong because scripts/link-vmlinux.sh is not triggered
> > > > even when source files under arch/powerpc/tools/ are changed.
> > > >
> > > > Presumably, scripts/Makefile.vmlinux will be the right place.
> > >
> > > Ah, yes. Something like this?
> > >
> > > diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
> > > index 49946cb96844..77d90b6ac53e 100644
> > > --- a/scripts/Makefile.vmlinux
> > > +++ b/scripts/Makefile.vmlinux
> > > @@ -22,6 +22,10 @@ targets += .vmlinux.export.o
> > >  vmlinux: .vmlinux.export.o
> > >  endif
> > >
> > > +ifdef CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX
> > > +vmlinux: $(srctree)/arch/$(SRCARCH)/tools/vmlinux_o.sh
> > > +endif
> > > +
> > >  ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)
> > >
> > >  # Final link of vmlinux with optional arch pass after final link
> > >
> > >
> > > Thanks,
> > > Naveen
> > >
> > 
> > 
> > 
> > No.
> > 
> > Something like below.
> > 
> > Then, you can do everything in Makefile, not a shell script.
> > 
> > 
> > 
> > ifdef CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX
> > vmlinux: .arch.vmlinux.o
> > 
> > .arch.vmlinux.o: FORCE
> > $(Q)$(MAKE) $(build)=arch/$(SRCARCH)/tools .arch.vmlinux.o
> > 
> > endif
> > 
> > 
> > 
> > I did not test it, though.
> 
> Thanks for the pointer. I will try and build on that.
> 
> Just to be completely sure, does the below incremental diff on top of the 
> existing patch capture your suggestion?
> 
> ---
> diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
> index 49946cb96844..04e088d7a1ca 100644
> --- a/scripts/Makefile.vmlinux
> +++ b/scripts/Makefile.vmlinux
> @@ -22,6 +22,13 @@ targets += .vmlinux.export.o
>  vmlinux: .vmlinux.export.o
>  endif
>  
> +ifdef CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX
> +vmlinux: .arch.vmlinux.o
> +
> +.arch.vmlinux.o: FORCE
> +$(Q)$(MAKE) $(build)=arch/$(SRCARCH)/tools .arch.vmlinux.o
> +endif
> +
>  ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)
>  
>  # Final link of vmlinux with optional arch pass after final link
> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> index 07f70e105d82..f1b705b8cdca 100755
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -227,12 +227,6 @@ ${MAKE} -f "${srctree}/scripts/Makefile.build" obj=init 
> init/version-timestamp.o
>  arch_vmlinux_o=""
>  if is_enabled CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX; then
> arch_vmlinux_o=.arch.vmlinux.o
> -   info "ARCH" ${arch_vmlinux_o}
> -   if ! ${srctree}/arch/${SRCARCH}/tools/vmlinux_o.sh ${arch_vmlinux_o} 
> ; then
> -   echo >&2 "Failed to generate ${arch_vmlinux_o}"
> -   echo >&2 "Try to disable CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX"
> -   exit 1
> -   fi
>  fi
>  
>  btf_vmlinux_bin_o=""

This is what I ended up with:

---
diff --git a/arch/Kconfig b/arch/Kconfig
index 975dd22a2dbd..649f0903e7ef 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1643,4 +1643,7 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
 config ARCH_NEED_CMPXCHG_1_EMU
bool

+config ARCH_WANTS_PRE_LINK_VMLINUX
+   def_bool n
+
 endmenu
diff --git a/scripts/Makefile.vmlinux b/scripts/Mak

Re: [RFC PATCH v2 4/5] kbuild: Add generic hook for architectures to use before the final vmlinux link

2024-06-11 Thread Naveen N Rao
On Tue, Jun 11, 2024 at 06:51:51AM GMT, Masahiro Yamada wrote:
> On Tue, Jun 11, 2024 at 2:20 AM Naveen N Rao  wrote:
> >
> > On Mon, Jun 10, 2024 at 06:14:51PM GMT, Masahiro Yamada wrote:
> > > On Mon, Jun 10, 2024 at 5:39 PM Naveen N Rao  wrote:
> > > >
> > > > On powerpc, we would like to be able to make a pass on vmlinux.o and
> > > > generate a new object file to be linked into vmlinux. Add a generic pass
> > > > in link-vmlinux.sh that architectures can use for this purpose.
> > > > Architectures need to select CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX and must
> > > > provide arch//tools/vmlinux_o.sh, which will be invoked prior to
> > > > the final vmlinux link step.
> > > >
> > > > Signed-off-by: Naveen N Rao 
> > > > ---
> > > >  arch/Kconfig|  3 +++
> > > >  scripts/link-vmlinux.sh | 18 +++---
> > > >  2 files changed, 18 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/arch/Kconfig b/arch/Kconfig
> > > > index 975dd22a2dbd..649f0903e7ef 100644
> > > > --- a/arch/Kconfig
> > > > +++ b/arch/Kconfig
> > > > @@ -1643,4 +1643,7 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
> > > >  config ARCH_NEED_CMPXCHG_1_EMU
> > > > bool
> > > >
> > > > +config ARCH_WANTS_PRE_LINK_VMLINUX
> > > > +   def_bool n
> > > > +
> > > >  endmenu
> > > > diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> > > > index 46ce5d04dbeb..07f70e105d82 100755
> > > > --- a/scripts/link-vmlinux.sh
> > > > +++ b/scripts/link-vmlinux.sh
> > ...
> > > >
> > > > +arch_vmlinux_o=""
> > > > +if is_enabled CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX; then
> > > > +   arch_vmlinux_o=.arch.vmlinux.o
> > > > +   info "ARCH" ${arch_vmlinux_o}
> > > > +   if ! ${srctree}/arch/${SRCARCH}/tools/vmlinux_o.sh 
> > > > ${arch_vmlinux_o} ; then
> > > > +   echo >&2 "Failed to generate ${arch_vmlinux_o}"
> > > > +   echo >&2 "Try to disable 
> > > > CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX"
> > > > +   exit 1
> > > > +   fi
> > > > +fi
> > >
> > >
> > >
> > > This is wrong because scripts/link-vmlinux.sh is not triggered
> > > even when source files under arch/powerpc/tools/ are changed.
> > >
> > > Presumably, scripts/Makefile.vmlinux will be the right place.
> >
> > Ah, yes. Something like this?
> >
> > diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
> > index 49946cb96844..77d90b6ac53e 100644
> > --- a/scripts/Makefile.vmlinux
> > +++ b/scripts/Makefile.vmlinux
> > @@ -22,6 +22,10 @@ targets += .vmlinux.export.o
> >  vmlinux: .vmlinux.export.o
> >  endif
> >
> > +ifdef CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX
> > +vmlinux: $(srctree)/arch/$(SRCARCH)/tools/vmlinux_o.sh
> > +endif
> > +
> >  ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)
> >
> >  # Final link of vmlinux with optional arch pass after final link
> >
> >
> > Thanks,
> > Naveen
> >
> 
> 
> 
> No.
> 
> Something like below.
> 
> Then, you can do everything in Makefile, not a shell script.
> 
> 
> 
> ifdef CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX
> vmlinux: .arch.vmlinux.o
> 
> .arch.vmlinux.o: FORCE
> $(Q)$(MAKE) $(build)=arch/$(SRCARCH)/tools .arch.vmlinux.o
> 
> endif
> 
> 
> 
> I did not test it, though.

Thanks for the pointer. I will try and build on that.

Just to be completely sure, does the below incremental diff on top of the 
existing patch capture your suggestion?

---
diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
index 49946cb96844..04e088d7a1ca 100644
--- a/scripts/Makefile.vmlinux
+++ b/scripts/Makefile.vmlinux
@@ -22,6 +22,13 @@ targets += .vmlinux.export.o
 vmlinux: .vmlinux.export.o
 endif
 
+ifdef CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX
+vmlinux: .arch.vmlinux.o
+
+.arch.vmlinux.o: FORCE
+$(Q)$(MAKE) $(build)=arch/$(SRCARCH)/tools .arch.vmlinux.o
+endif
+
 ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)
 
 # Final link of vmlinux with optional arch pass after final link
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 07f70e105d82..f1b705b8cdca 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -227,12 +227,6 @@ ${MAKE} -f "${srctree}/scripts/Makefile.build" obj=init 
init/version-timestamp.o
 arch_vmlinux_o=""
 if is_enabled CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX; then
arch_vmlinux_o=.arch.vmlinux.o
-   info "ARCH" ${arch_vmlinux_o}
-   if ! ${srctree}/arch/${SRCARCH}/tools/vmlinux_o.sh ${arch_vmlinux_o} ; 
then
-   echo >&2 "Failed to generate ${arch_vmlinux_o}"
-   echo >&2 "Try to disable CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX"
-   exit 1
-   fi
 fi
 
 btf_vmlinux_bin_o=""



Thanks,
Naveen


Re: [RFC PATCH v2 3/5] powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code

2024-06-11 Thread Naveen N Rao
On Mon, Jun 10, 2024 at 04:06:32PM GMT, Steven Rostedt wrote:
> On Mon, 10 Jun 2024 14:08:16 +0530
> Naveen N Rao  wrote:
> 
> > On 32-bit powerpc, gcc generates a three instruction sequence for
> > function profiling:
> > mflr  r0
> > stw r0, 4(r1)
> > bl  _mcount
> > 
> > On kernel boot, the call to _mcount() is nop-ed out, to be patched back
> > in when ftrace is actually enabled. The 'stw' instruction therefore is
> > not necessary unless ftrace is enabled. Nop it out during ftrace init.
> > 
> > When ftrace is enabled, we want the 'stw' so that stack unwinding works
> > properly. Perform the same within the ftrace handler, similar to 64-bit
> > powerpc.
> > 
> > For 64-bit powerpc, early versions of gcc used to emit a three
> > instruction sequence for function profiling (with -mprofile-kernel) with
> > a 'std' instruction to mimic the 'stw' above. Address that scenario also
> > by nop-ing out the 'std' instruction during ftrace init.
> > 
> > Signed-off-by: Naveen N Rao 
> 
> Isn't there still the race that there's a preemption between the:
> 
>   stw r0, 4(r1)
> and
>   bl  _mcount
> 
> And if this breaks stack unwinding, couldn't this cause an issue for live
> kernel patching?
> 
> I know it's very unlikely, but in theory, I think the race exists.

I *think* you are assuming that we will be patching back the 'stw' 
instruction here? So, there could be an issue if a cpu has executed the 
nop instead of 'stw' and then sees the call to _mcount().

But, we don't patch back the 'stw' instruction. That is instead done as 
part of ftrace_caller(), along with setting up an additional stack frame 
to ensure reliable stack unwinding. Commit 41a506ef71eb 
("powerpc/ftrace: Create a dummy stackframe to fix stack unwind") has 
more details.

The primary motivation for this patch is to address differences in the 
function profile sequence with various toolchains. Since commit 
0f71dcfb4aef ("powerpc/ftrace: Add support for 
-fpatchable-function-entry"), we use the same two-instruction profile 
sequence across 32-bit and 64-bit powerpc:
mflr  r0
bl  ftrace_caller

This has also been true on 64-bit powerpc with -mprofile-kernel, except 
the very early versions of gcc that supported that option (gcc v5).

On 32-bit powerpc, we used to use the three instruction sequence before 
support for -fpatchable-function-entry was introduced.

In this patch, we move all toolchain variants to use the two-instruction 
sequence for consistency.


Thanks,
Naveen


Re: [RFC PATCH v2 2/5] powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace

2024-06-11 Thread Naveen N Rao
On Mon, Jun 10, 2024 at 04:03:56PM GMT, Steven Rostedt wrote:
> On Mon, 10 Jun 2024 14:08:15 +0530
> Naveen N Rao  wrote:
> 
> > Pointer to struct module is only relevant for ftrace records belonging
> > to kernel modules. Having this field in dyn_arch_ftrace wastes memory
> > for all ftrace records belonging to the kernel. Remove the same in
> > favour of looking up the module from the ftrace record address, similar
> > to other architectures.
> > 
> > Signed-off-by: Naveen N Rao 
> > ---
> >  arch/powerpc/include/asm/ftrace.h|  1 -
> >  arch/powerpc/kernel/trace/ftrace.c   | 47 ++-
> >  arch/powerpc/kernel/trace/ftrace_64_pg.c | 73 +++-
> >  3 files changed, 64 insertions(+), 57 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/ftrace.h 
> > b/arch/powerpc/include/asm/ftrace.h
> > index 107fc5a48456..201f9d15430a 100644
> > --- a/arch/powerpc/include/asm/ftrace.h
> > +++ b/arch/powerpc/include/asm/ftrace.h
> > @@ -26,7 +26,6 @@ unsigned long prepare_ftrace_return(unsigned long parent, 
> > unsigned long ip,
> >  struct module;
> >  struct dyn_ftrace;
> >  struct dyn_arch_ftrace {
> > -   struct module *mod;
> >  };
> 
> Nice. I always hated that extra field.

It was your complaint a while back that prompted this change :)

Though I introduce a different pointer here in the next patch. /me 
ducks.

> 
> 
> >  
> >  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
> > diff --git a/arch/powerpc/kernel/trace/ftrace.c 
> > b/arch/powerpc/kernel/trace/ftrace.c
> > index d8d6b4fd9a14..041be965485e 100644
> > --- a/arch/powerpc/kernel/trace/ftrace.c
> > +++ b/arch/powerpc/kernel/trace/ftrace.c
> > @@ -106,20 +106,36 @@ static unsigned long find_ftrace_tramp(unsigned long 
> > ip)
> > return 0;
> >  }
> >  
> > +static struct module *ftrace_lookup_module(struct dyn_ftrace *rec)
> > +{
> > +   struct module *mod = NULL;
> > +
> > +#ifdef CONFIG_MODULES
> > +   /*
> > +* NOTE: __module_text_address() must be called with preemption
> > +* disabled, but we can rely on ftrace_lock to ensure that 'mod'
> > +* retains its validity throughout the remainder of this code.
> > +   */
> > +   preempt_disable();
> > +   mod = __module_text_address(rec->ip);
> > +   preempt_enable();
> > +
> > +   if (!mod)
> > +   pr_err("No module loaded at addr=%lx\n", rec->ip);
> > +#endif
> > +
> > +   return mod;
> > +}
> 
> It may look nicer to have:
> 
> #ifdef CONFIG_MODULES
> static struct module *ftrace_lookup_module(struct dyn_ftrace *rec)
> {
>   struct module *mod = NULL;
> 
>   /*
>* NOTE: __module_text_address() must be called with preemption
>* disabled, but we can rely on ftrace_lock to ensure that 'mod'
>* retains its validity throughout the remainder of this code.
>   */
>   preempt_disable();
>   mod = __module_text_address(rec->ip);
>   preempt_enable();
> 
>   if (!mod)
>   pr_err("No module loaded at addr=%lx\n", rec->ip);
> 
>   return mod;
> }
> #else
> static inline struct module *ftrace_lookup_module(struct dyn_ftrace *rec)
> {
>   return NULL;
> }
> #endif

I wrote this, and then I thought it would be simpler to do the version I 
posted. I will move back to this since it looks to be the preferred way.

> 
> > +
> >  static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long 
> > addr, ppc_inst_t *call_inst)
> >  {
> > unsigned long ip = rec->ip;
> > unsigned long stub;
> > +   struct module *mod;
> >  
> > if (is_offset_in_branch_range(addr - ip)) {
> > /* Within range */
> > stub = addr;
> > -#ifdef CONFIG_MODULES
> > -   } else if (rec->arch.mod) {
> > -   /* Module code would be going to one of the module stubs */
> > -   stub = (addr == (unsigned long)ftrace_caller ? 
> > rec->arch.mod->arch.tramp :
> > -  
> > rec->arch.mod->arch.tramp_regs);
> > -#endif
> > } else if (core_kernel_text(ip)) {
> > /* We would be branching to one of our ftrace stubs */
> > stub = find_ftrace_tramp(ip);
> > @@ -128,7 +144,16 @@ static int ftrace_get_call_inst(struct dyn_ftrace 
> > *rec, unsigned long addr, ppc_
> > return -EINVAL;
> > }
> >   

Re: [RFC PATCH v2 4/5] kbuild: Add generic hook for architectures to use before the final vmlinux link

2024-06-10 Thread Naveen N Rao
On Mon, Jun 10, 2024 at 06:14:51PM GMT, Masahiro Yamada wrote:
> On Mon, Jun 10, 2024 at 5:39 PM Naveen N Rao  wrote:
> >
> > On powerpc, we would like to be able to make a pass on vmlinux.o and
> > generate a new object file to be linked into vmlinux. Add a generic pass
> > in link-vmlinux.sh that architectures can use for this purpose.
> > Architectures need to select CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX and must
> > provide arch//tools/vmlinux_o.sh, which will be invoked prior to
> > the final vmlinux link step.
> >
> > Signed-off-by: Naveen N Rao 
> > ---
> >  arch/Kconfig|  3 +++
> >  scripts/link-vmlinux.sh | 18 +++---
> >  2 files changed, 18 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 975dd22a2dbd..649f0903e7ef 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -1643,4 +1643,7 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
> >  config ARCH_NEED_CMPXCHG_1_EMU
> > bool
> >
> > +config ARCH_WANTS_PRE_LINK_VMLINUX
> > +   def_bool n
> > +
> >  endmenu
> > diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> > index 46ce5d04dbeb..07f70e105d82 100755
> > --- a/scripts/link-vmlinux.sh
> > +++ b/scripts/link-vmlinux.sh
...
> >
> > +arch_vmlinux_o=""
> > +if is_enabled CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX; then
> > +   arch_vmlinux_o=.arch.vmlinux.o
> > +   info "ARCH" ${arch_vmlinux_o}
> > +   if ! ${srctree}/arch/${SRCARCH}/tools/vmlinux_o.sh 
> > ${arch_vmlinux_o} ; then
> > +   echo >&2 "Failed to generate ${arch_vmlinux_o}"
> > +   echo >&2 "Try to disable CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX"
> > +   exit 1
> > +   fi
> > +fi
> 
> 
> 
> This is wrong because scripts/link-vmlinux.sh is not triggered
> even when source files under arch/powerpc/tools/ are changed.
> 
> Presumably, scripts/Makefile.vmlinux will be the right place.

Ah, yes. Something like this?

diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
index 49946cb96844..77d90b6ac53e 100644
--- a/scripts/Makefile.vmlinux
+++ b/scripts/Makefile.vmlinux
@@ -22,6 +22,10 @@ targets += .vmlinux.export.o
 vmlinux: .vmlinux.export.o
 endif

+ifdef CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX
+vmlinux: $(srctree)/arch/$(SRCARCH)/tools/vmlinux_o.sh
+endif
+
 ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)

 # Final link of vmlinux with optional arch pass after final link


Thanks,
Naveen



[RFC PATCH v2 5/5] powerpc64/ftrace: Move ftrace sequence out of line

2024-06-10 Thread Naveen N Rao
The function profile sequence on powerpc includes two instructions at the
beginning of each function:
mflr  r0
bl  ftrace_caller

The call to ftrace_caller() gets nop'ed out during kernel boot and is
patched in when ftrace is enabled.

Given the sequence, we cannot return from ftrace_caller with 'blr' as we
need to keep LR and r0 intact. This results in link stack imbalance when
ftrace is enabled. To address that, we would like to use a three
instruction sequence:
mflr  r0
bl  ftrace_caller
mtlr  r0

Furthermore, to support DYNAMIC_FTRACE_WITH_CALL_OPS, we need to
reserve two instruction slots before the function. This results in a
total of five instruction slots to be reserved for ftrace use on each
function that is traced.

Move the function profile sequence out-of-line to minimize its impact.
To do this, we reserve a single nop at function entry using
-fpatchable-function-entry=1 and add a pass on vmlinux.o to determine
the total number of functions that can be traced. This is then used to
generate a .S file reserving the appropriate amount of space for use as
ftrace stubs, which is built and linked into vmlinux.

On bootup, the stub space is split into separate stubs per function and
populated with the proper instruction sequence. A pointer to the
associated stub is maintained in dyn_arch_ftrace.

For modules, space for ftrace stubs is reserved from the generic module
stub space.

This is restricted to and enabled by default only on 64-bit powerpc.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/Kconfig |   4 +
 arch/powerpc/Makefile|   4 +
 arch/powerpc/include/asm/ftrace.h|  10 ++
 arch/powerpc/include/asm/module.h|   5 +
 arch/powerpc/kernel/asm-offsets.c|   4 +
 arch/powerpc/kernel/module_64.c  |  67 +--
 arch/powerpc/kernel/trace/ftrace.c   | 145 +--
 arch/powerpc/kernel/trace/ftrace_entry.S |  71 ---
 arch/powerpc/kernel/vmlinux.lds.S|   3 +-
 arch/powerpc/tools/vmlinux_o.sh  |  47 
 10 files changed, 324 insertions(+), 36 deletions(-)
 create mode 100755 arch/powerpc/tools/vmlinux_o.sh

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c88c6d46a5bc..c393daeaf643 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -568,6 +568,10 @@ config ARCH_USING_PATCHABLE_FUNCTION_ENTRY
def_bool 
$(success,$(srctree)/arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh 
$(CC) -mlittle-endian) if PPC64 && CPU_LITTLE_ENDIAN
def_bool 
$(success,$(srctree)/arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh 
$(CC) -mbig-endian) if PPC64 && CPU_BIG_ENDIAN
 
+config FTRACE_PFE_OUT_OF_LINE
+   def_bool PPC64 && ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+   select ARCH_WANTS_PRE_LINK_VMLINUX
+
 config HOTPLUG_CPU
bool "Support for enabling/disabling CPUs"
depends on SMP && (PPC_PSERIES || \
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index a8479c881cac..bb920d48ec6e 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -155,7 +155,11 @@ CC_FLAGS_NO_FPU:= $(call 
cc-option,-msoft-float)
 ifdef CONFIG_FUNCTION_TRACER
 ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
 KBUILD_CPPFLAGS+= -DCC_USING_PATCHABLE_FUNCTION_ENTRY
+ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+CC_FLAGS_FTRACE := -fpatchable-function-entry=1
+else
 CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+endif
 else
 CC_FLAGS_FTRACE := -pg
 ifdef CONFIG_MPROFILE_KERNEL
diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 201f9d15430a..9da1da0f87b4 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -26,6 +26,9 @@ unsigned long prepare_ftrace_return(unsigned long parent, 
unsigned long ip,
 struct module;
 struct dyn_ftrace;
 struct dyn_arch_ftrace {
+#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+   unsigned long pfe_stub;
+#endif
 };
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
@@ -132,6 +135,13 @@ static inline u8 this_cpu_get_ftrace_enabled(void) { 
return 1; }
 
 #ifdef CONFIG_FUNCTION_TRACER
 extern unsigned int ftrace_tramp_text[], ftrace_tramp_init[];
+#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
+struct ftrace_pfe_stub {
+   u32 insn[4];
+};
+extern struct ftrace_pfe_stub ftrace_pfe_stub_text[], 
ftrace_pfe_stub_inittext[];
+extern unsigned long ftrace_pfe_stub_text_count, 
ftrace_pfe_stub_inittext_count;
+#endif
 void ftrace_free_init_tramp(void);
 unsigned long ftrace_call_adjust(unsigned long addr);
 #else
diff --git a/arch/powerpc/include/asm/module.h 
b/arch/powerpc/include/asm/module.h
index 300c777cc307..28dbd1ec5593 100644
--- a/arch/powerpc/include/asm/module.h
+++ b/arch/powerpc/include/asm/module.h
@@ -47,6 +47,11 @@ struct mod_arch_specific {
 #ifdef CONFIG_DYNAMIC_FTRACE
unsigned long tramp;
unsign

[RFC PATCH v2 1/5] powerpc/kprobes: Use ftrace to determine if a probe is at function entry

2024-06-10 Thread Naveen N Rao
Rather than hard-coding the offset into a function to be used to
determine if a kprobe is at function entry, use ftrace_location() to
determine the ftrace location within the function and categorize all
instructions till that offset to be function entry.

For functions that cannot be traced, we fall back to using a fixed
offset of 8 (two instructions) to categorize a probe as being at
function entry for 64-bit elfv2, unless we are using pcrel.

Acked-by: Masami Hiramatsu (Google) 
Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/kprobes.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 14c5ddec3056..ca204f4f21c1 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -105,24 +105,22 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, 
unsigned int offset)
return addr;
 }
 
-static bool arch_kprobe_on_func_entry(unsigned long offset)
+static bool arch_kprobe_on_func_entry(unsigned long addr, unsigned long offset)
 {
-#ifdef CONFIG_PPC64_ELF_ABI_V2
-#ifdef CONFIG_KPROBES_ON_FTRACE
-   return offset <= 16;
-#else
-   return offset <= 8;
-#endif
-#else
+   unsigned long ip = ftrace_location(addr);
+
+   if (ip)
+   return offset <= (ip - addr);
+   if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) && 
!IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
+   return offset <= 8;
return !offset;
-#endif
 }
 
 /* XXX try and fold the magic of kprobe_lookup_name() in this */
 kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long 
offset,
 bool *on_func_entry)
 {
-   *on_func_entry = arch_kprobe_on_func_entry(offset);
+   *on_func_entry = arch_kprobe_on_func_entry(addr, offset);
return (kprobe_opcode_t *)(addr + offset);
 }
 
-- 
2.45.2



[RFC PATCH v2 4/5] kbuild: Add generic hook for architectures to use before the final vmlinux link

2024-06-10 Thread Naveen N Rao
On powerpc, we would like to be able to make a pass on vmlinux.o and
generate a new object file to be linked into vmlinux. Add a generic pass
in link-vmlinux.sh that architectures can use for this purpose.
Architectures need to select CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX and must
provide arch/<arch>/tools/vmlinux_o.sh, which will be invoked prior to
the final vmlinux link step.

Signed-off-by: Naveen N Rao 
---
 arch/Kconfig|  3 +++
 scripts/link-vmlinux.sh | 18 +++---
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 975dd22a2dbd..649f0903e7ef 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1643,4 +1643,7 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
 config ARCH_NEED_CMPXCHG_1_EMU
bool
 
+config ARCH_WANTS_PRE_LINK_VMLINUX
+   def_bool n
+
 endmenu
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 46ce5d04dbeb..07f70e105d82 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -122,7 +122,7 @@ gen_btf()
return 1
fi
 
-   vmlinux_link ${1}
+   vmlinux_link ${1} ${arch_vmlinux_o}
 
info "BTF" ${2}
LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} ${1}
@@ -178,7 +178,7 @@ kallsyms_step()
kallsymso=${kallsyms_vmlinux}.o
kallsyms_S=${kallsyms_vmlinux}.S
 
-   vmlinux_link ${kallsyms_vmlinux} "${kallsymso_prev}" 
${btf_vmlinux_bin_o}
+   vmlinux_link ${kallsyms_vmlinux} "${kallsymso_prev}" 
${btf_vmlinux_bin_o} ${arch_vmlinux_o}
mksysmap ${kallsyms_vmlinux} ${kallsyms_vmlinux}.syms
kallsyms ${kallsyms_vmlinux}.syms ${kallsyms_S}
 
@@ -203,6 +203,7 @@ sorttable()
 
 cleanup()
 {
+   rm -f .arch.vmlinux.*
rm -f .btf.*
rm -f System.map
rm -f vmlinux
@@ -223,6 +224,17 @@ fi
 
 ${MAKE} -f "${srctree}/scripts/Makefile.build" obj=init 
init/version-timestamp.o
 
+arch_vmlinux_o=""
+if is_enabled CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX; then
+   arch_vmlinux_o=.arch.vmlinux.o
+   info "ARCH" ${arch_vmlinux_o}
+   if ! ${srctree}/arch/${SRCARCH}/tools/vmlinux_o.sh ${arch_vmlinux_o} ; 
then
+   echo >&2 "Failed to generate ${arch_vmlinux_o}"
+   echo >&2 "Try to disable CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX"
+   exit 1
+   fi
+fi
+
 btf_vmlinux_bin_o=""
 if is_enabled CONFIG_DEBUG_INFO_BTF; then
btf_vmlinux_bin_o=.btf.vmlinux.bin.o
@@ -273,7 +285,7 @@ if is_enabled CONFIG_KALLSYMS; then
fi
 fi
 
-vmlinux_link vmlinux "${kallsymso}" ${btf_vmlinux_bin_o}
+vmlinux_link vmlinux "${kallsymso}" ${btf_vmlinux_bin_o} ${arch_vmlinux_o}
 
 # fill in BTF IDs
 if is_enabled CONFIG_DEBUG_INFO_BTF && is_enabled CONFIG_BPF; then
-- 
2.45.2



[RFC PATCH v2 3/5] powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code

2024-06-10 Thread Naveen N Rao
On 32-bit powerpc, gcc generates a three instruction sequence for
function profiling:
mflr  r0
stw r0, 4(r1)
bl  _mcount

On kernel boot, the call to _mcount() is nop-ed out, to be patched back
in when ftrace is actually enabled. The 'stw' instruction therefore is
not necessary unless ftrace is enabled. Nop it out during ftrace init.

When ftrace is enabled, we want the 'stw' so that stack unwinding works
properly. Perform the same within the ftrace handler, similar to 64-bit
powerpc.

For 64-bit powerpc, early versions of gcc used to emit a three
instruction sequence for function profiling (with -mprofile-kernel) with
a 'std' instruction to mimic the 'stw' above. Address that scenario also
by nop-ing out the 'std' instruction during ftrace init.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/ftrace.c   | 6 --
 arch/powerpc/kernel/trace/ftrace_entry.S | 4 ++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 041be965485e..2e1667a578ff 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -266,13 +266,15 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace 
*rec)
/* Expected sequence: 'mflr r0', 'stw r0,4(r1)', 'bl _mcount' */
ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
if (!ret)
-   ret = ftrace_validate_inst(ip - 4, 
ppc_inst(PPC_RAW_STW(_R0, _R1, 4)));
+   ret = ftrace_modify_code(ip - 4, 
ppc_inst(PPC_RAW_STW(_R0, _R1, 4)),
+ppc_inst(PPC_RAW_NOP()));
} else if (IS_ENABLED(CONFIG_MPROFILE_KERNEL)) {
/* Expected sequence: 'mflr r0', ['std r0,16(r1)'], 'bl 
_mcount' */
ret = ftrace_read_inst(ip - 4, &old);
if (!ret && !ppc_inst_equal(old, ppc_inst(PPC_RAW_MFLR(_R0 {
ret = ftrace_validate_inst(ip - 8, 
ppc_inst(PPC_RAW_MFLR(_R0)));
-   ret |= ftrace_validate_inst(ip - 4, 
ppc_inst(PPC_RAW_STD(_R0, _R1, 16)));
+   ret |= ftrace_modify_code(ip - 4, 
ppc_inst(PPC_RAW_STD(_R0, _R1, 16)),
+ ppc_inst(PPC_RAW_NOP()));
}
} else {
return -EINVAL;
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S 
b/arch/powerpc/kernel/trace/ftrace_entry.S
index 76dbe9fd2c0f..244a1c7bb1e8 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -33,6 +33,8 @@
  * and then arrange for the ftrace function to be called.
  */
 .macro ftrace_regs_entry allregs
+   /* Save the original return address in A's stack frame */
+   PPC_STL r0, LRSAVE(r1)
/* Create a minimal stack frame for representing B */
PPC_STLUr1, -STACK_FRAME_MIN_SIZE(r1)
 
@@ -44,8 +46,6 @@
SAVE_GPRS(3, 10, r1)
 
 #ifdef CONFIG_PPC64
-   /* Save the original return address in A's stack frame */
-   std r0, LRSAVE+SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE(r1)
/* Ok to continue? */
lbz r3, PACA_FTRACE_ENABLED(r13)
cmpdi   r3, 0
-- 
2.45.2



[RFC PATCH v2 2/5] powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace

2024-06-10 Thread Naveen N Rao
Pointer to struct module is only relevant for ftrace records belonging
to kernel modules. Having this field in dyn_arch_ftrace wastes memory
for all ftrace records belonging to the kernel. Remove it in
favour of looking up the module from the ftrace record address, similar
to other architectures.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/include/asm/ftrace.h|  1 -
 arch/powerpc/kernel/trace/ftrace.c   | 47 ++-
 arch/powerpc/kernel/trace/ftrace_64_pg.c | 73 +++-
 3 files changed, 64 insertions(+), 57 deletions(-)

diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 107fc5a48456..201f9d15430a 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -26,7 +26,6 @@ unsigned long prepare_ftrace_return(unsigned long parent, 
unsigned long ip,
 struct module;
 struct dyn_ftrace;
 struct dyn_arch_ftrace {
-   struct module *mod;
 };
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index d8d6b4fd9a14..041be965485e 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -106,20 +106,36 @@ static unsigned long find_ftrace_tramp(unsigned long ip)
return 0;
 }
 
+static struct module *ftrace_lookup_module(struct dyn_ftrace *rec)
+{
+   struct module *mod = NULL;
+
+#ifdef CONFIG_MODULES
+   /*
+* NOTE: __module_text_address() must be called with preemption
+* disabled, but we can rely on ftrace_lock to ensure that 'mod'
+* retains its validity throughout the remainder of this code.
+   */
+   preempt_disable();
+   mod = __module_text_address(rec->ip);
+   preempt_enable();
+
+   if (!mod)
+   pr_err("No module loaded at addr=%lx\n", rec->ip);
+#endif
+
+   return mod;
+}
+
 static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, 
ppc_inst_t *call_inst)
 {
unsigned long ip = rec->ip;
unsigned long stub;
+   struct module *mod;
 
if (is_offset_in_branch_range(addr - ip)) {
/* Within range */
stub = addr;
-#ifdef CONFIG_MODULES
-   } else if (rec->arch.mod) {
-   /* Module code would be going to one of the module stubs */
-   stub = (addr == (unsigned long)ftrace_caller ? 
rec->arch.mod->arch.tramp :
-  
rec->arch.mod->arch.tramp_regs);
-#endif
} else if (core_kernel_text(ip)) {
/* We would be branching to one of our ftrace stubs */
stub = find_ftrace_tramp(ip);
@@ -128,7 +144,16 @@ static int ftrace_get_call_inst(struct dyn_ftrace *rec, 
unsigned long addr, ppc_
return -EINVAL;
}
} else {
-   return -EINVAL;
+   mod = ftrace_lookup_module(rec);
+   if (mod) {
+#ifdef CONFIG_MODULES
+   /* Module code would be going to one of the module 
stubs */
+   stub = (addr == (unsigned long)ftrace_caller ? 
mod->arch.tramp :
+  
mod->arch.tramp_regs);
+#endif
+   } else {
+   return -EINVAL;
+   }
}
 
*call_inst = ftrace_create_branch_inst(ip, stub, 1);
@@ -256,14 +281,6 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace 
*rec)
if (ret)
return ret;
 
-   if (!core_kernel_text(ip)) {
-   if (!mod) {
-   pr_err("0x%lx: No module provided for non-kernel 
address\n", ip);
-   return -EFAULT;
-   }
-   rec->arch.mod = mod;
-   }
-
/* Nop-out the ftrace location */
new = ppc_inst(PPC_RAW_NOP());
addr = MCOUNT_ADDR;
diff --git a/arch/powerpc/kernel/trace/ftrace_64_pg.c 
b/arch/powerpc/kernel/trace/ftrace_64_pg.c
index 12fab1803bcf..45bd8dcf9886 100644
--- a/arch/powerpc/kernel/trace/ftrace_64_pg.c
+++ b/arch/powerpc/kernel/trace/ftrace_64_pg.c
@@ -116,6 +116,24 @@ static unsigned long find_bl_target(unsigned long ip, 
ppc_inst_t op)
 }
 
 #ifdef CONFIG_MODULES
+static struct module *ftrace_lookup_module(struct dyn_ftrace *rec)
+{
+   struct module *mod;
+   /*
+* NOTE: __module_text_address() must be called with preemption
+* disabled, but we can rely on ftrace_lock to ensure that 'mod'
+* retains its validity throughout the remainder of this code.
+   */
+   preempt_disable();
+   mod = __module_text_address(rec->ip);
+   preempt_enable();
+
+   if (!mod)
+   pr_err("No module loaded at addr=%lx\n", rec->ip);
+
+   return mod;
+}
+
 static int
 __ftrace_make_nop(struct mod

[RFC PATCH v2 0/5] powerpc/ftrace: Move ftrace sequence out of line

2024-06-10 Thread Naveen N Rao
This is v2 of the series posted here:
http://lkml.kernel.org/r/cover.1702045299.git.nav...@kernel.org

Since the previous version, the primary change is that the entire ftrace
sequence is moved out of line, and this is now restricted to 64-bit
powerpc by default.
Patch 5 has the details.

I have dropped patches to enable DYNAMIC_FTRACE_WITH_CALL_OPS and ftrace 
direct support so that this approach can be finalized.

This series depends on Benjamin Gray's series adding support for 
patch_ulong():
http://lkml.kernel.org/r/20240515024445.236364-1-bg...@linux.ibm.com


Appreciate feedback on the approach.


Thanks,
Naveen



Naveen N Rao (5):
  powerpc/kprobes: Use ftrace to determine if a probe is at function
entry
  powerpc/ftrace: Remove pointer to struct module from dyn_arch_ftrace
  powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code
  kbuild: Add generic hook for architectures to use before the final
vmlinux link
  powerpc64/ftrace: Move ftrace sequence out of line

 arch/Kconfig |   3 +
 arch/powerpc/Kconfig |   4 +
 arch/powerpc/Makefile|   4 +
 arch/powerpc/include/asm/ftrace.h|  11 +-
 arch/powerpc/include/asm/module.h|   5 +
 arch/powerpc/kernel/asm-offsets.c|   4 +
 arch/powerpc/kernel/kprobes.c|  18 +--
 arch/powerpc/kernel/module_64.c  |  67 +++-
 arch/powerpc/kernel/trace/ftrace.c   | 196 ---
 arch/powerpc/kernel/trace/ftrace_64_pg.c |  73 -
 arch/powerpc/kernel/trace/ftrace_entry.S |  75 ++---
 arch/powerpc/kernel/vmlinux.lds.S|   3 +-
 arch/powerpc/tools/vmlinux_o.sh  |  47 ++
 scripts/link-vmlinux.sh  |  18 ++-
 14 files changed, 419 insertions(+), 109 deletions(-)
 create mode 100755 arch/powerpc/tools/vmlinux_o.sh


base-commit: 2c644f2847c188b4fa545e602e4a1d4db55e8c8d
prerequisite-patch-id: a1d50e589288239d5a8b1c1f354cd4737057c9d3
prerequisite-patch-id: da4142d56880861bd0ad7ad7087c9e2670a2ee54
prerequisite-patch-id: 609d292e054b2396b603890522a940fa0bdfb6d8
prerequisite-patch-id: 6f7213fb77b1260defbf43be0e47bff9c80054cc
prerequisite-patch-id: ad3b71bf071ae4ba1bee5b087e61a2055772a74f
-- 
2.45.2



Re: [PATCH v2] arch/powerpc: Remove unused cede related functions

2024-05-14 Thread Naveen N Rao
On Tue, May 14, 2024 at 06:54:55PM GMT, Gautam Menghani wrote:
> Remove extended_cede_processor() and its helpers as
> extended_cede_processor() has no callers since
> commit 48f6e7f6d948("powerpc/pseries: remove cede offline state for CPUs")
> 
> Signed-off-by: Gautam Menghani 
> ---
> v1 -> v2:
> 1. Remove helpers of extended_cede_processor()

Acked-by: Naveen N Rao 

> 
>  arch/powerpc/include/asm/plpar_wrappers.h | 28 ---
>  1 file changed, 28 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/plpar_wrappers.h 
> b/arch/powerpc/include/asm/plpar_wrappers.h
> index b3ee44a40c2f..71648c126970 100644
> --- a/arch/powerpc/include/asm/plpar_wrappers.h
> +++ b/arch/powerpc/include/asm/plpar_wrappers.h
> @@ -18,16 +18,6 @@ static inline long poll_pending(void)
>   return plpar_hcall_norets(H_POLL_PENDING);
>  }
>  
> -static inline u8 get_cede_latency_hint(void)
> -{
> - return get_lppaca()->cede_latency_hint;
> -}
> -
> -static inline void set_cede_latency_hint(u8 latency_hint)
> -{
> - get_lppaca()->cede_latency_hint = latency_hint;
> -}
> -
>  static inline long cede_processor(void)
>  {
>   /*
> @@ -37,24 +27,6 @@ static inline long cede_processor(void)
>   return plpar_hcall_norets_notrace(H_CEDE);
>  }
>  
> -static inline long extended_cede_processor(unsigned long latency_hint)
> -{
> - long rc;
> - u8 old_latency_hint = get_cede_latency_hint();
> -
> - set_cede_latency_hint(latency_hint);
> -
> - rc = cede_processor();
> -
> - /* Ensure that H_CEDE returns with IRQs on */
> - if (WARN_ON(IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG) && !(mfmsr() & 
> MSR_EE)))
> - __hard_irq_enable();
> -
> - set_cede_latency_hint(old_latency_hint);
> -
> - return rc;
> -}
> -
>  static inline long vpa_call(unsigned long flags, unsigned long cpu,
>   unsigned long vpa)
>  {
> -- 
> 2.45.0
> 


Re: [PATCH] arch/powerpc: Remove the definition of unused cede function

2024-05-14 Thread Naveen N Rao
On Tue, May 14, 2024 at 03:35:03PM GMT, Gautam Menghani wrote:
> Remove extended_cede_processor() definition as it has no callers since
> commit 48f6e7f6d948("powerpc/pseries: remove cede offline state for CPUs")

extended_cede_processor() was added in commit 69ddb57cbea0 
("powerpc/pseries: Add extended_cede_processor() helper function."), 
which also added [get|set]_cede_latency_hint(). Those can also be 
removed if extended_cede_processor() is no longer needed.

- Naveen

> 
> Signed-off-by: Gautam Menghani 
> ---
>  arch/powerpc/include/asm/plpar_wrappers.h | 18 --
>  1 file changed, 18 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/plpar_wrappers.h 
> b/arch/powerpc/include/asm/plpar_wrappers.h
> index b3ee44a40c2f..6431fa1e1cb1 100644
> --- a/arch/powerpc/include/asm/plpar_wrappers.h
> +++ b/arch/powerpc/include/asm/plpar_wrappers.h
> @@ -37,24 +37,6 @@ static inline long cede_processor(void)
>   return plpar_hcall_norets_notrace(H_CEDE);
>  }
>  
> -static inline long extended_cede_processor(unsigned long latency_hint)
> -{
> - long rc;
> - u8 old_latency_hint = get_cede_latency_hint();
> -
> - set_cede_latency_hint(latency_hint);
> -
> - rc = cede_processor();
> -
> - /* Ensure that H_CEDE returns with IRQs on */
> - if (WARN_ON(IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG) && !(mfmsr() & 
> MSR_EE)))
> - __hard_irq_enable();
> -
> - set_cede_latency_hint(old_latency_hint);
> -
> - return rc;
> -}
> -
>  static inline long vpa_call(unsigned long flags, unsigned long cpu,
>   unsigned long vpa)
>  {
> -- 
> 2.45.0
> 


Re: [PATCH v3 3/5] powerpc/64: Convert patch_instruction() to patch_u32()

2024-05-14 Thread Naveen N Rao
On Tue, May 14, 2024 at 04:39:30AM GMT, Christophe Leroy wrote:
> 
> 
> On 14/05/2024 at 04:59, Benjamin Gray wrote:
> > On Tue, 2024-04-23 at 15:09 +0530, Naveen N Rao wrote:
> >> On Mon, Mar 25, 2024 at 04:53:00PM +1100, Benjamin Gray wrote:
> >>> This use of patch_instruction() is working on 32 bit data, and can
> >>> fail
> >>> if the data looks like a prefixed instruction and the extra write
> >>> crosses a page boundary. Use patch_u32() to fix the write size.
> >>>
> >>> Fixes: 8734b41b3efe ("powerpc/module_64: Fix livepatching for RO
> >>> modules")
> >>> Link: https://lore.kernel.org/all/20230203004649.1f59dbd4@yea/
> >>> Signed-off-by: Benjamin Gray 
> >>>
> >>> ---
> >>>
> >>> v2: * Added the fixes tag, it seems appropriate even if the subject
> >>> does
> >>>    mention a more robust solution being required.
> >>>
> >>> patch_u64() should be more efficient, but judging from the bug
> >>> report
> >>> it doesn't seem like the data is doubleword aligned.
> >>
> >> Asking again, is that still the case? It looks like at least the
> >> first
> >> fix below can be converted to patch_u64().
> >>
> >> - Naveen
> > 
> > Sorry, I think I forgot this question last time. Reading the commit
> > descriptions you linked, I don't see any mention of "entry->funcdata
> > will always be doubleword aligned because XYZ". If the patch makes it
> > doubleword aligned anyway, I wouldn't be confident asserting all
> > callers will always do this without looking into it a lot more.

No worries. I was asking primarily to check if you had noticed a 
specific issue with alignment.

As Christophe mentions, the structure is aligned. It is primarily 
allocated in a separate stubs section for modules. Looking at it more 
closely though, I wonder if we need the below:

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index cccb1f78e058..0226d73a0007 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -428,8 +428,11 @@ int module_frob_arch_sections(Elf64_Ehdr *hdr,
 
/* Find .toc and .stubs sections, symtab and strtab */
for (i = 1; i < hdr->e_shnum; i++) {
-   if (strcmp(secstrings + sechdrs[i].sh_name, ".stubs") == 0)
+   if (strcmp(secstrings + sechdrs[i].sh_name, ".stubs") == 0) {
me->arch.stubs_section = i;
+   if (sechdrs[i].sh_addralign < 8)
+   sechdrs[i].sh_addralign = 8;
+   }
 #ifdef CONFIG_PPC_KERNEL_PCREL
else if (strcmp(secstrings + sechdrs[i].sh_name, 
".data..percpu") == 0)
me->arch.pcpu_section = i;

> > 
> > Perhaps a separate series could optimise it with appropriate
> > justification/assertions to catch bad alignment. But I think leaving it
> > out of this series is fine because the original works in words, so it's
> > not regressing anything.

That should be fine.

> 
> As far as I can see, the struct is 64 bits aligned by definition so 
> funcdata field is aligned too as there are just 8x u32 before it:
> 
> struct ppc64_stub_entry {
>   /*
>* 28 byte jump instruction sequence (7 instructions) that can
>* hold ppc64_stub_insns or stub_insns. Must be 8-byte aligned
>* with PCREL kernels that use prefix instructions in the stub.
>*/
>   u32 jump[7];
>   /* Used by ftrace to identify stubs */
>   u32 magic;
>   /* Data for the above code */
>   func_desc_t funcdata;
> } __aligned(8);
> 

Thanks,
Naveen



Re: [PATCH bpf v3] powerpc/bpf: enforce full ordering for ATOMIC operations with BPF_FETCH

2024-05-13 Thread Naveen N Rao
On Mon, May 13, 2024 at 10:02:48AM GMT, Puranjay Mohan wrote:
> The Linux Kernel Memory Model [1][2] requires RMW operations that have a
> return value to be fully ordered.
> 
> BPF atomic operations with BPF_FETCH (including BPF_XCHG and
> BPF_CMPXCHG) return a value back so they need to be JITed to fully
> ordered operations. POWERPC currently emits relaxed operations for
> these.
> 
> We can show this by running the following litmus-test:
> 
> PPC SB+atomic_add+fetch
> 
> {
> 0:r0=x;  (* dst reg assuming offset is 0 *)
> 0:r1=2;  (* src reg *)
> 0:r2=1;
> 0:r4=y;  (* P0 writes to this, P1 reads this *)
> 0:r5=z;  (* P1 writes to this, P0 reads this *)
> 0:r6=0;
> 
> 1:r2=1;
> 1:r4=y;
> 1:r5=z;
> }
> 
> P0  | P1;
> stw r2, 0(r4)   | stw  r2,0(r5) ;
> |   ;
> loop:lwarx  r3, r6, r0  |   ;
> mr  r8, r3  |   ;
> add r3, r3, r1  | sync  ;
> stwcx.  r3, r6, r0  |   ;
> bne loop|   ;
> mr  r1, r8  |   ;
> |   ;
> lwa r7, 0(r5)   | lwa  r7,0(r4) ;
> 
> ~exists(0:r7=0 /\ 1:r7=0)
> 
> Witnesses
> Positive: 9 Negative: 3
> Condition ~exists (0:r7=0 /\ 1:r7=0)
> Observation SB+atomic_add+fetch Sometimes 3 9
> 
> This test shows that the older store in P0 is reordered with a newer
> load to a different address. Although there is a RMW operation with
> fetch between them. Adding a sync before and after RMW fixes the issue:
> 
> Witnesses
> Positive: 9 Negative: 0
> Condition ~exists (0:r7=0 /\ 1:r7=0)
> Observation SB+atomic_add+fetch Never 0 9
> 
> [1] https://www.kernel.org/doc/Documentation/memory-barriers.txt
> [2] https://www.kernel.org/doc/Documentation/atomic_t.txt
> 
> Fixes: 65112709115f ("powerpc/bpf/64: add support for BPF_ATOMIC bitwise 
> operations")

As I noted in v2, I think that is the wrong commit. This fixes the below 
four commits in mainline:
Fixes: aea7ef8a82c0 ("powerpc/bpf/32: add support for BPF_ATOMIC bitwise 
operations")
Fixes: 2d9206b22743 ("powerpc/bpf/32: Add instructions for atomic_[cmp]xchg")
Fixes: dbe6e2456fb0 ("powerpc/bpf/64: add support for atomic fetch operations")
Fixes: 1e82dfaa7819 ("powerpc/bpf/64: Add instructions for atomic_[cmp]xchg")

> Signed-off-by: Puranjay Mohan 
> Acked-by: Paul E. McKenney 

Cc: sta...@vger.kernel.org # v6.0+

I have tested this with test_bpf and test_progs.
Reviewed-by: Naveen N Rao 


- Naveen



Re: [PATCH bpf v2] powerpc/bpf: enforce full ordering for ATOMIC operations with BPF_FETCH

2024-05-08 Thread Naveen N Rao
On Wed, May 08, 2024 at 11:54:04AM GMT, Puranjay Mohan wrote:
> The Linux Kernel Memory Model [1][2] requires RMW operations that have a
> return value to be fully ordered.
> 
> BPF atomic operations with BPF_FETCH (including BPF_XCHG and
> BPF_CMPXCHG) return a value back so they need to be JITed to fully
> ordered operations. POWERPC currently emits relaxed operations for
> these.
> 
> We can show this by running the following litmus-test:
> 
> PPC SB+atomic_add+fetch
> 
> {
> 0:r0=x;  (* dst reg assuming offset is 0 *)
> 0:r1=2;  (* src reg *)
> 0:r2=1;
> 0:r4=y;  (* P0 writes to this, P1 reads this *)
> 0:r5=z;  (* P1 writes to this, P0 reads this *)
> 0:r6=0;
> 
> 1:r2=1;
> 1:r4=y;
> 1:r5=z;
> }
> 
> P0  | P1;
> stw r2, 0(r4)   | stw  r2,0(r5) ;
> |   ;
> loop:lwarx  r3, r6, r0  |   ;
> mr  r8, r3  |   ;
> add r3, r3, r1  | sync  ;
> stwcx.  r3, r6, r0  |   ;
> bne loop|   ;
> mr  r1, r8  |   ;
> |   ;
> lwa r7, 0(r5)   | lwa  r7,0(r4) ;
> 
> ~exists(0:r7=0 /\ 1:r7=0)
> 
> Witnesses
> Positive: 9 Negative: 3
> Condition ~exists (0:r7=0 /\ 1:r7=0)
> Observation SB+atomic_add+fetch Sometimes 3 9
> 
> This test shows that the older store in P0 is reordered with a newer
> load to a different address. Although there is a RMW operation with
> fetch between them. Adding a sync before and after RMW fixes the issue:
> 
> Witnesses
> Positive: 9 Negative: 0
> Condition ~exists (0:r7=0 /\ 1:r7=0)
> Observation SB+atomic_add+fetch Never 0 9
> 
> [1] https://www.kernel.org/doc/Documentation/memory-barriers.txt
> [2] https://www.kernel.org/doc/Documentation/atomic_t.txt
> 
> Fixes: 65112709115f ("powerpc/bpf/64: add support for BPF_ATOMIC bitwise 
> operations")
> Signed-off-by: Puranjay Mohan 

Thanks for reporting and fixing this.

There are actually four commits that this fixes across ppc32/ppc64:
Fixes: aea7ef8a82c0 ("powerpc/bpf/32: add support for BPF_ATOMIC bitwise 
operations")
Fixes: 2d9206b22743 ("powerpc/bpf/32: Add instructions for atomic_[cmp]xchg")
Fixes: dbe6e2456fb0 ("powerpc/bpf/64: add support for atomic fetch operations")
Fixes: 1e82dfaa7819 ("powerpc/bpf/64: Add instructions for atomic_[cmp]xchg")

> ---
> Changes in v1 -> v2:
> v1: https://lore.kernel.org/all/20240507175439.119467-1-puran...@kernel.org/
> - Don't emit `sync` for non-SMP kernels as that adds unessential overhead.
> ---
>  arch/powerpc/net/bpf_jit_comp32.c | 12 
>  arch/powerpc/net/bpf_jit_comp64.c | 12 
>  2 files changed, 24 insertions(+)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
> b/arch/powerpc/net/bpf_jit_comp32.c
> index 2f39c50ca729..0318b83f2e6a 100644
> --- a/arch/powerpc/net/bpf_jit_comp32.c
> +++ b/arch/powerpc/net/bpf_jit_comp32.c
> @@ -853,6 +853,15 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
> u32 *fimage, struct code
>   /* Get offset into TMP_REG */
>   EMIT(PPC_RAW_LI(tmp_reg, off));
>   tmp_idx = ctx->idx * 4;
> + /*
> +  * Enforce full ordering for operations with BPF_FETCH 
> by emitting a 'sync'
> +  * before and after the operation.
> +  *
> +  * This is a requirement in the Linux Kernel Memory 
> Model.
> +  * See __cmpxchg_u64() in asm/cmpxchg.h as an example.
 ^^^
Nit...   u32

> +  */
> + if (imm & BPF_FETCH && IS_ENABLED(CONFIG_SMP))
> + EMIT(PPC_RAW_SYNC());

I think this block should go before the previous two instructions. We 
use tmp_idx as a label to retry the ll/sc sequence, so we will end up 
executing the 'sync' operation on a retry here.
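
Something along these lines, perhaps (just a sketch of the reordering,
reusing the macros from the patch), so that the branch back to tmp_idx only
repeats the ll/sc sequence and not the barrier:

	if (imm & BPF_FETCH && IS_ENABLED(CONFIG_SMP))
		EMIT(PPC_RAW_SYNC());
	/* Get offset into TMP_REG */
	EMIT(PPC_RAW_LI(tmp_reg, off));
	tmp_idx = ctx->idx * 4;
	/* load value from memory into r0 */
	EMIT(PPC_RAW_LWARX(_R0, tmp_reg, dst_reg, 0));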

>   /* load value from memory into r0 */
>   EMIT(PPC_RAW_LWARX(_R0, tmp_reg, dst_reg, 0));
>  
> @@ -905,6 +914,9 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
> u32 *fimage, struct code
>  
>   /* For the BPF_FETCH variant, get old data into src_reg 
> */
>   if (imm & BPF_FETCH) {
> + /* Emit 'sync' to enforce full ordering */
> + if (IS_ENABLED(CONFIG_SMP))
> + EMIT(PPC_RAW_SYNC());
>   EMIT(PPC_RAW_MR(ret_reg, ax_reg));
>   if (!fp->aux->verifier_zext)
>   EMIT(PPC_RAW_LI(ret_reg - 1, 0)); /* 
> higher 32-bit */
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
> b/arch/powerpc/net/bpf_jit_comp64.c
> index 79f23974a320..9a077f8acf7b 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> 

Re: [PATCH v4 2/2] powerpc/bpf: enable kfunc call

2024-05-07 Thread Naveen N Rao
On Thu, May 02, 2024 at 11:02:05PM GMT, Hari Bathini wrote:
> Currently, bpf jit code on powerpc assumes all the bpf functions and
> helpers to be part of core kernel text. This is false for kfunc case,
> as function addresses may not be part of core kernel text area. So,
> add support for addresses that are not within core kernel text area
> too, to enable kfunc support. Emit instructions based on whether the
> function address is within core kernel text address or not, to retain
> optimized instruction sequence where possible.
> 
> In case of PCREL, as a bpf function that is not within core kernel
> text area is likely to go out of range with relative addressing on
> kernel base, use PC relative addressing. If that goes out of range,
> load the full address with PPC_LI64().
> 
> With addresses that are not within core kernel text area supported,
> override bpf_jit_supports_kfunc_call() to enable kfunc support. Also,
> override bpf_jit_supports_far_kfunc_call() to enable 64-bit pointers,
> as an address offset can be more than 32-bit long on PPC64.
> 
> Signed-off-by: Hari Bathini 
> ---
> 
> * Changes in v4:
>   - Use either kernelbase or PC for relative addressing. Also, fallback
> to PPC_LI64(), if both are out of range.
>   - Update r2 with kernel TOC for elfv1 too as elfv1 also uses the
> optimization sequence, that expects r2 to be kernel TOC, when
> function address is within core kernel text.
> 
> * Changes in v3:
>   - Retained optimized instruction sequence when function address is
> a core kernel address as suggested by Naveen.
>   - Used unoptimized instruction sequence for PCREL addressing to
> avoid out of range errors for core kernel function addresses.
>   - Folded patch that adds support for kfunc calls with patch that
> enables/advertises this support as suggested by Naveen.
> 
> 
>  arch/powerpc/net/bpf_jit_comp.c   | 10 +
>  arch/powerpc/net/bpf_jit_comp64.c | 61 ++-
>  2 files changed, 61 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index 0f9a21783329..984655419da5 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -359,3 +359,13 @@ void bpf_jit_free(struct bpf_prog *fp)
>  
>   bpf_prog_unlock_free(fp);
>  }
> +
> +bool bpf_jit_supports_kfunc_call(void)
> +{
> + return true;
> +}
> +
> +bool bpf_jit_supports_far_kfunc_call(void)
> +{
> + return IS_ENABLED(CONFIG_PPC64);
> +}
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
> b/arch/powerpc/net/bpf_jit_comp64.c
> index 4de08e35e284..8afc14a4a125 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -208,17 +208,13 @@ bpf_jit_emit_func_call_hlp(u32 *image, u32 *fimage, 
> struct codegen_context *ctx,
>   unsigned long func_addr = func ? ppc_function_entry((void *)func) : 0;
>   long reladdr;
>  
> - if (WARN_ON_ONCE(!core_kernel_text(func_addr)))
> + if (WARN_ON_ONCE(!kernel_text_address(func_addr)))
>   return -EINVAL;
>  
> - if (IS_ENABLED(CONFIG_PPC_KERNEL_PCREL)) {
> - reladdr = func_addr - local_paca->kernelbase;
> +#ifdef CONFIG_PPC_KERNEL_PCREL

Would be good to retain use of IS_ENABLED().
Reviewed-by: Naveen N Rao 


- Naveen



Re: [PATCH v4 1/2] powerpc64/bpf: fix tail calls for PCREL addressing

2024-05-07 Thread Naveen N Rao
On Thu, May 02, 2024 at 11:02:04PM GMT, Hari Bathini wrote:
> With PCREL addressing, there is no kernel TOC. So, it is not setup in
> prologue when PCREL addressing is used. But the number of instructions
> to skip on a tail call was not adjusted accordingly. That resulted in
> not so obvious failures while using tailcalls. 'tailcalls' selftest
> crashed the system with the below call trace:
> 
>   bpf_test_run+0xe8/0x3cc (unreliable)
>   bpf_prog_test_run_skb+0x348/0x778
>   __sys_bpf+0xb04/0x2b00
>   sys_bpf+0x28/0x38
>   system_call_exception+0x168/0x340
>   system_call_vectored_common+0x15c/0x2ec
> 
> Also, as bpf programs are always module addresses and a bpf helper in
> general is a core kernel text address, using PC relative addressing
> often fails with "out of range of pcrel address" error. Switch to
> using kernel base for relative addressing to handle this better.
> 
> Fixes: 7e3a68be42e1 ("powerpc/64: vmlinux support building with PCREL 
> addresing")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Hari Bathini 
> ---
> 
> * Changes in v4:
>   - Fix out of range errors by switching to kernelbase instead of PC
> for relative addressing.
> 
> * Changes in v3:
>   - New patch to fix tailcall issues with PCREL addressing.
> 
> 
>  arch/powerpc/net/bpf_jit_comp64.c | 30 --
>  1 file changed, 16 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
> b/arch/powerpc/net/bpf_jit_comp64.c
> index 79f23974a320..4de08e35e284 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -202,7 +202,8 @@ void bpf_jit_build_epilogue(u32 *image, struct 
> codegen_context *ctx)
>   EMIT(PPC_RAW_BLR());
>  }
>  
> -static int bpf_jit_emit_func_call_hlp(u32 *image, struct codegen_context 
> *ctx, u64 func)
> +static int
> +bpf_jit_emit_func_call_hlp(u32 *image, u32 *fimage, struct codegen_context 
> *ctx, u64 func)
>  {
>   unsigned long func_addr = func ? ppc_function_entry((void *)func) : 0;
>   long reladdr;
> @@ -211,19 +212,20 @@ static int bpf_jit_emit_func_call_hlp(u32 *image, 
> struct codegen_context *ctx, u
>   return -EINVAL;
>  
>   if (IS_ENABLED(CONFIG_PPC_KERNEL_PCREL)) {
> - reladdr = func_addr - CTX_NIA(ctx);
> + reladdr = func_addr - local_paca->kernelbase;
>  
>   if (reladdr >= (long)SZ_8G || reladdr < -(long)SZ_8G) {
> - pr_err("eBPF: address of %ps out of range of pcrel 
> address.\n",
> - (void *)func);
> + pr_err("eBPF: address of %ps out of range of 34-bit 
> relative address.\n",
> +(void *)func);
>   return -ERANGE;
>   }
> - /* pla r12,addr */
> - EMIT(PPC_PREFIX_MLS | __PPC_PRFX_R(1) | IMM_H18(reladdr));
> - EMIT(PPC_INST_PADDI | ___PPC_RT(_R12) | IMM_L(reladdr));
> - EMIT(PPC_RAW_MTCTR(_R12));
> - EMIT(PPC_RAW_BCTR());
> -
> + EMIT(PPC_RAW_LD(_R12, _R13, offsetof(struct paca_struct, 
> kernelbase)));
> + /* Align for subsequent prefix instruction */
> + if (!IS_ALIGNED((unsigned long)fimage + CTX_NIA(ctx), 8))
> + EMIT(PPC_RAW_NOP());

We don't need the prefix instruction to be aligned to a doubleword 
boundary - it just shouldn't cross a 64-byte boundary. Since we know the 
exact address of the instruction here, we should be able to check for 
that case.
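
Something like the below, perhaps (sketch only; assumes instructions stay
4-byte aligned, so the only offset at which the 8-byte prefixed instruction
can straddle a 64-byte boundary is 60 mod 64):

	/* Pad only if the prefixed 'paddi' would cross a 64-byte boundary */
	if ((((unsigned long)fimage + CTX_NIA(ctx)) & 63) == 60)
		EMIT(PPC_RAW_NOP());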

> + /* paddi r12,r12,addr */
> + EMIT(PPC_PREFIX_MLS | __PPC_PRFX_R(0) | IMM_H18(reladdr));
> + EMIT(PPC_INST_PADDI | ___PPC_RT(_R12) | ___PPC_RA(_R12) | 
> IMM_L(reladdr));
>   } else {
>   reladdr = func_addr - kernel_toc_addr();
>   if (reladdr > 0x7FFF || reladdr < -(0x8000L)) {
> @@ -233,9 +235,9 @@ static int bpf_jit_emit_func_call_hlp(u32 *image, struct 
> codegen_context *ctx, u
>  
>   EMIT(PPC_RAW_ADDIS(_R12, _R2, PPC_HA(reladdr)));
>   EMIT(PPC_RAW_ADDI(_R12, _R12, PPC_LO(reladdr)));
> - EMIT(PPC_RAW_MTCTR(_R12));
> - EMIT(PPC_RAW_BCTRL());
>   }
> + EMIT(PPC_RAW_MTCTR(_R12));
> + EMIT(PPC_RAW_BCTRL());

This change shouldn't be necessary since these instructions are moved 
back into the conditional in the next patch.

Other than those minor comments:
Reviewed-by: Naveen N Rao 


- Naveen



Re: [PATCH v6] arch/powerpc/kvm: Add support for reading VPA counters for pseries guests

2024-05-06 Thread Naveen N Rao
> + for_each_possible_cpu(cpu) {
> + kvmhv_set_l2_counters_status(cpu, false);
> + }
> +}
> +
> +static void do_trace_nested_cs_time(struct kvm_vcpu *vcpu)
> +{
> + struct lppaca *lp = get_lppaca();
> + u64 l1_to_l2_ns, l2_to_l1_ns, l2_runtime_ns;
> +
> + l1_to_l2_ns = tb_to_ns(be64_to_cpu(lp->l1_to_l2_cs_tb));
> + l2_to_l1_ns = tb_to_ns(be64_to_cpu(lp->l2_to_l1_cs_tb));
> + l2_runtime_ns = tb_to_ns(be64_to_cpu(lp->l2_runtime_tb));
> + trace_kvmppc_vcpu_stats(vcpu, l1_to_l2_ns - local_paca->l1_to_l2_cs,
> +     l2_to_l1_ns - local_paca->l2_to_l1_cs,
> + l2_runtime_ns - 
> local_paca->l2_runtime_agg);

Depending on how the hypervisor works, if the vcpu was in l2 when the 
tracepoint is enabled, the counters may not be updated on exit and we 
may emit a trace with all values zero. If that is possible, it might be 
a good idea to only emit the trace if any of the counters are non-zero.
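
Something like this at the top of do_trace_nested_cs_time(), perhaps
(sketch):

	/* Skip the trace if the hypervisor hasn't filled in the counters yet */
	if (!l1_to_l2_ns && !l2_to_l1_ns && !l2_runtime_ns)
		return;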

Otherwise, this looks good to me.
Acked-by: Naveen N Rao 


- Naveen

> + local_paca->l1_to_l2_cs = l1_to_l2_ns;
> + local_paca->l2_to_l1_cs = l2_to_l1_ns;
> + local_paca->l2_runtime_agg = l2_runtime_ns;
> +}
> +
>  static int kvmhv_vcpu_entry_nestedv2(struct kvm_vcpu *vcpu, u64 time_limit,
>unsigned long lpcr, u64 *tb)
>  {
> @@ -4156,6 +4204,10 @@ static int kvmhv_vcpu_entry_nestedv2(struct kvm_vcpu 
> *vcpu, u64 time_limit,
>  
>   timer_rearm_host_dec(*tb);
>  
> + /* Record context switch and guest_run_time data */
> + if (kvmhv_get_l2_counters_status())
> + do_trace_nested_cs_time(vcpu);
> +
>   return trap;
>  }
>  
> diff --git a/arch/powerpc/kvm/trace_hv.h b/arch/powerpc/kvm/trace_hv.h
> index 8d57c8428531..dc118ab88f23 100644
> --- a/arch/powerpc/kvm/trace_hv.h
> +++ b/arch/powerpc/kvm/trace_hv.h
> @@ -238,6 +238,9 @@
>   {H_MULTI_THREADS_ACTIVE,"H_MULTI_THREADS_ACTIVE"}, \
>   {H_OUTSTANDING_COP_OPS, "H_OUTSTANDING_COP_OPS"}
>  
> +int kmvhv_counters_tracepoint_regfunc(void);
> +void kmvhv_counters_tracepoint_unregfunc(void);
> +
>  TRACE_EVENT(kvm_guest_enter,
>   TP_PROTO(struct kvm_vcpu *vcpu),
>   TP_ARGS(vcpu),
> @@ -512,6 +515,30 @@ TRACE_EVENT(kvmppc_run_vcpu_exit,
>   __entry->vcpu_id, __entry->exit, __entry->ret)
>  );
>  
> +TRACE_EVENT_FN(kvmppc_vcpu_stats,
> + TP_PROTO(struct kvm_vcpu *vcpu, u64 l1_to_l2_cs, u64 l2_to_l1_cs, u64 
> l2_runtime),
> +
> + TP_ARGS(vcpu, l1_to_l2_cs, l2_to_l1_cs, l2_runtime),
> +
> + TP_STRUCT__entry(
> + __field(int,vcpu_id)
> + __field(u64,l1_to_l2_cs)
> + __field(u64,l2_to_l1_cs)
> + __field(u64,l2_runtime)
> + ),
> +
> + TP_fast_assign(
> + __entry->vcpu_id  = vcpu->vcpu_id;
> + __entry->l1_to_l2_cs = l1_to_l2_cs;
> + __entry->l2_to_l1_cs = l2_to_l1_cs;
> + __entry->l2_runtime = l2_runtime;
> + ),
> +
> + TP_printk("VCPU %d: l1_to_l2_cs_time=%llu ns l2_to_l1_cs_time=%llu ns 
> l2_runtime=%llu ns",
> + __entry->vcpu_id,  __entry->l1_to_l2_cs,
> + __entry->l2_to_l1_cs, __entry->l2_runtime),
> + kmvhv_counters_tracepoint_regfunc, kmvhv_counters_tracepoint_unregfunc
> +);
>  #endif /* _TRACE_KVM_HV_H */
>  
>  /* This part must be outside protection */
> -- 
> 2.44.0
> 


Re: [PATCH v5 RESEND] arch/powerpc/kvm: Add support for reading VPA counters for pseries guests

2024-04-25 Thread Naveen N Rao
On Wed, Apr 24, 2024 at 11:08:38AM +0530, Gautam Menghani wrote:
> On Mon, Apr 22, 2024 at 09:15:02PM +0530, Naveen N Rao wrote:
> > On Tue, Apr 02, 2024 at 12:36:54PM +0530, Gautam Menghani wrote:
> > >  static int kvmhv_vcpu_entry_nestedv2(struct kvm_vcpu *vcpu, u64 
> > >  time_limit,
> > >unsigned long lpcr, u64 *tb)
> > >  {
> > > @@ -4130,6 +4161,11 @@ static int kvmhv_vcpu_entry_nestedv2(struct 
> > > kvm_vcpu *vcpu, u64 time_limit,
> > >   kvmppc_gse_put_u64(io->vcpu_run_input, KVMPPC_GSID_LPCR, lpcr);
> > >  
> > >   accumulate_time(vcpu, &vcpu->arch.in_guest);
> > > +
> > > + /* Enable the guest host context switch time tracking */
> > > + if (unlikely(trace_kvmppc_vcpu_exit_cs_time_enabled()))
> > > + kvmhv_set_l2_accumul(1);
> > > +
> > >   rc = plpar_guest_run_vcpu(0, vcpu->kvm->arch.lpid, vcpu->vcpu_id,
> > > &trap, &i);
> > >  
> > > @@ -4156,6 +4192,10 @@ static int kvmhv_vcpu_entry_nestedv2(struct 
> > > kvm_vcpu *vcpu, u64 time_limit,
> > >  
> > >   timer_rearm_host_dec(*tb);
> > >  
> > > + /* Record context switch and guest_run_time data */
> > > + if (kvmhv_get_l2_accumul())
> > > + do_trace_nested_cs_time(vcpu);
> > > +
> > >   return trap;
> > >  }
> > 
> > I'm assuming the counters in VPA are cumulative, since you are zero'ing 
> > them out on exit. If so, I think a better way to implement this is to 
> > use TRACE_EVENT_FN() and provide tracepoint registration and 
> > unregistration functions. You can then enable the counters once during 
> > registration and avoid repeated writes to the VPA area. With that, you 
> > also won't need to do anything before vcpu entry. If you maintain 
> > previous values, you can calculate the delta and emit the trace on vcpu 
> > exit. The values in VPA area can then serve as the cumulative values.
> > 
> 
> This approach will have a problem. The context switch times are reported
> in the L1 LPAR's CPU's VPA area. Consider the following scenario:
> 
> 1. L1 has 2 cpus, and L2 has 1 cpu
> 2. L2 runs on L1's cpu0 for a few seconds, and the counter values go to
> 1 million
> 3. We are maintaining a copy of values of VPA in separate variables, so
> those variables also have 1 million.
> 4. Now if L2's vcpu is migrated to another L1 cpu, that L1 cpu's VPA
> counters will start from 0, so if we try to get delta value, we will end
> up doing 0 - 1 million, which would be wrong.

I'm assuming you mean migrating the task. If we maintain the previous 
readings in paca, it should work I think.
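
Roughly something like the below on vcpu exit (sketch only; the tracepoint
and paca field names here are just illustrative):

	trace_kvmppc_vcpu_stats(vcpu, l1_to_l2_ns - local_paca->l1_to_l2_cs,
				l2_to_l1_ns - local_paca->l2_to_l1_cs,
				l2_runtime_ns - local_paca->l2_runtime_agg);
	local_paca->l1_to_l2_cs = l1_to_l2_ns;
	local_paca->l2_to_l1_cs = l2_to_l1_ns;
	local_paca->l2_runtime_agg = l2_runtime_ns;

Since both the VPA and the paca are per L1 cpu, the saved baseline always
matches the cpu whose counters are being read.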

> 
> The aggregation logic in this patch works as we zero out the VPA after
> every switch, and maintain aggregation in a vcpu->arch

Are the cumulative values of the VPA counters of no significance? We 
lose those with this approach. Not sure if we care.


- Naveen



Re: [PATCH v3 0/5] Add generic data patching functions

2024-04-23 Thread Naveen N Rao
On Mon, Mar 25, 2024 at 04:52:57PM +1100, Benjamin Gray wrote:
> Currently patch_instruction() bases the write length on the value being
> written. If the value looks like a prefixed instruction it writes 8 bytes,
> otherwise it writes 4 bytes. This makes it potentially buggy to use for
> writing arbitrary data, as if you want to write 4 bytes but it decides to
> write 8 bytes it may clobber the following memory or be unaligned and
> trigger an oops if it tries to cross a page boundary.
> 
> To solve this, this series pulls out the size parameter to the 'top' of
> the memory patching logic, and propagates it through the various functions.
> 
> The two sizes supported are int and long; this allows for patching
> instructions and pointers on both ppc32 and ppc64. On ppc32 these are the
> same size, so care is taken to only use the size parameter on static
> functions, so the compiler can optimise it out entirely. Unfortunately
> GCC trips over its own feet here and won't optimise in a way that is
> optimal for strict RWX (mpc85xx_smp_defconfig) and no RWX
> (pmac32_defconfig). More details in the v2 cover letter.
> 
> Changes from v2:
>   * Various changes noted on each patch
>   * Data patching now enforced to be aligned
>   * Restore page aligned flushing optimisation
> 
> Changes from v1:
>   * Addressed the v1 review actions
>   * Removed noinline (for now)
> 
> v2: 
> https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20231016050147.115686-1-bg...@linux.ibm.com/
> v1: 
> https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20230207015643.590684-1-bg...@linux.ibm.com/
> 
> Benjamin Gray (5):
>   powerpc/code-patching: Add generic memory patching
>   powerpc/code-patching: Add data patch alignment check
>   powerpc/64: Convert patch_instruction() to patch_u32()
>   powerpc/32: Convert patch_instruction() to patch_uint()
>   powerpc/code-patching: Add boot selftest for data patching
> 
>  arch/powerpc/include/asm/code-patching.h | 37 +
>  arch/powerpc/kernel/module_64.c  |  5 +-
>  arch/powerpc/kernel/static_call.c|  2 +-
>  arch/powerpc/lib/code-patching.c | 70 +++-
>  arch/powerpc/lib/test-code-patching.c| 36 
>  arch/powerpc/platforms/powermac/smp.c|  2 +-
>  6 files changed, 132 insertions(+), 20 deletions(-)

Apart from the minor comments, for this series:
Acked-by: Naveen N Rao 

Thanks for working on this.


- Naveen



Re: [PATCH v3 3/5] powerpc/64: Convert patch_instruction() to patch_u32()

2024-04-23 Thread Naveen N Rao
On Mon, Mar 25, 2024 at 04:53:00PM +1100, Benjamin Gray wrote:
> This use of patch_instruction() is working on 32 bit data, and can fail
> if the data looks like a prefixed instruction and the extra write
> crosses a page boundary. Use patch_u32() to fix the write size.
> 
> Fixes: 8734b41b3efe ("powerpc/module_64: Fix livepatching for RO modules")
> Link: https://lore.kernel.org/all/20230203004649.1f59dbd4@yea/
> Signed-off-by: Benjamin Gray 
> 
> ---
> 
> v2: * Added the fixes tag, it seems appropriate even if the subject does
>   mention a more robust solution being required.
> 
> patch_u64() should be more efficient, but judging from the bug report
> it doesn't seem like the data is doubleword aligned.

Asking again, is that still the case? It looks like at least the first 
fix below can be converted to patch_u64().

- Naveen

> ---
>  arch/powerpc/kernel/module_64.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
> index 7112adc597a8..e9bab599d0c2 100644
> --- a/arch/powerpc/kernel/module_64.c
> +++ b/arch/powerpc/kernel/module_64.c
> @@ -651,12 +651,11 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
>   // func_desc_t is 8 bytes if ABIv2, else 16 bytes
>   desc = func_desc(addr);
>   for (i = 0; i < sizeof(func_desc_t) / sizeof(u32); i++) {
> - if (patch_instruction(((u32 *)&entry->funcdata) + i,
> -   ppc_inst(((u32 *)(&desc))[i])))
> + if (patch_u32(((u32 *)&entry->funcdata) + i, ((u32 *)&desc)[i]))
>   return 0;
>   }
>  
> - if (patch_instruction(&entry->magic, ppc_inst(STUB_MAGIC)))
> + if (patch_u32(&entry->magic, STUB_MAGIC))
>   return 0;
>  
>   return 1;
> -- 
> 2.44.0
> 


Re: [PATCH v3 5/5] powerpc/code-patching: Add boot selftest for data patching

2024-04-23 Thread Naveen N Rao
On Mon, Mar 25, 2024 at 04:53:02PM +1100, Benjamin Gray wrote:
> Extend the code patching selftests with some basic coverage of the new
> data patching variants too.
> 
> Signed-off-by: Benjamin Gray 
> 
> ---
> 
> v3: * New in v3
> ---
>  arch/powerpc/lib/test-code-patching.c | 36 +++
>  1 file changed, 36 insertions(+)
> 
> diff --git a/arch/powerpc/lib/test-code-patching.c 
> b/arch/powerpc/lib/test-code-patching.c
> index c44823292f73..e96c48fcd4db 100644
> --- a/arch/powerpc/lib/test-code-patching.c
> +++ b/arch/powerpc/lib/test-code-patching.c
> @@ -347,6 +347,41 @@ static void __init test_prefixed_patching(void)
>   check(!memcmp(iptr, expected, sizeof(expected)));
>  }
>  
> +static void __init test_data_patching(void)
> +{
> + void *buf;
> + u32 *addr32;
> +
> + buf = vzalloc(PAGE_SIZE);
> + check(buf);
> + if (!buf)
> + return;
> +
> + addr32 = buf + 128;
> +
> + addr32[1] = 0xA0A1A2A3;
> + addr32[2] = 0xB0B1B2B3;
> +
> + patch_uint(&addr32[1], 0xC0C1C2C3);
> +
> + check(addr32[0] == 0);
> + check(addr32[1] == 0xC0C1C2C3);
> + check(addr32[2] == 0xB0B1B2B3);
> + check(addr32[3] == 0);
> +
> + patch_ulong(&addr32[1], 0xD0D1D2D3);
> +
> + check(addr32[0] == 0);
> + *(unsigned long *)(&addr32[1]) = 0xD0D1D2D3;

Should that have been a check() instead?
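
i.e., something like (sketch):

	check(*(unsigned long *)(&addr32[1]) == 0xD0D1D2D3);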

- Naveen

> +
> + if (!IS_ENABLED(CONFIG_PPC64))
> + check(addr32[2] == 0xB0B1B2B3);
> +
> + check(addr32[3] == 0);
> +
> + vfree(buf);
> +}
> +
>  static int __init test_code_patching(void)
>  {
>   pr_info("Running code patching self-tests ...\n");
> @@ -356,6 +391,7 @@ static int __init test_code_patching(void)
>   test_create_function_call();
>   test_translate_branch();
>   test_prefixed_patching();
> + test_data_patching();
>  
>   return 0;
>  }
> -- 
> 2.44.0
> 


Re: [PATCH v5 RESEND] arch/powerpc/kvm: Add support for reading VPA counters for pseries guests

2024-04-22 Thread Naveen N Rao
On Tue, Apr 02, 2024 at 12:36:54PM +0530, Gautam Menghani wrote:
> PAPR hypervisor has introduced three new counters in the VPA area of
> LPAR CPUs for KVM L2 guest (see [1] for terminology) observability - 2
> for context switches from host to guest and vice versa, and 1 counter
> for getting the total time spent inside the KVM guest. Add a tracepoint
> that enables reading the counters for use by ftrace/perf. Note that this
> tracepoint is only available for nestedv2 API (i.e, KVM on PowerVM).
> 
> Also maintain an aggregation of the context switch times in vcpu->arch.
> This will be useful in getting the aggregate times with a pmu driver
> which will be upstreamed in the near future.

It would be better to add code to maintain aggregate times as part of 
that pmu driver.

> 
> [1] Terminology:
> a. L1 refers to the VM (LPAR) booted on top of PAPR hypervisor
> b. L2 refers to the KVM guest booted on top of L1.
> 
> Signed-off-by: Vaibhav Jain 
> Signed-off-by: Gautam Menghani 
> ---
> v5 RESEND: 
> 1. Add the changelog
> 
> v4 -> v5:
> 1. Define helper functions for getting/setting the accumulation counter
> in L2's VPA
> 
> v3 -> v4:
> 1. After vcpu_run, check the VPA flag instead of checking for tracepoint
> being enabled for disabling the cs time accumulation.
> 
> v2 -> v3:
> 1. Move the counter disabling and zeroing code to a different function.
> 2. Move the get_lppaca() inside the tracepoint_enabled() branch.
> 3. Add the aggregation logic to maintain total context switch time.
> 
> v1 -> v2:
> 1. Fix the build error due to invalid struct member reference.
> 
>  arch/powerpc/include/asm/kvm_host.h |  5 
>  arch/powerpc/include/asm/lppaca.h   | 11 +---
>  arch/powerpc/kvm/book3s_hv.c| 40 +
>  arch/powerpc/kvm/trace_hv.h | 25 ++
>  4 files changed, 78 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h 
> b/arch/powerpc/include/asm/kvm_host.h
> index 8abac532146e..d953b32dd68a 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -847,6 +847,11 @@ struct kvm_vcpu_arch {
>   gpa_t nested_io_gpr;
>   /* For nested APIv2 guests*/
>   struct kvmhv_nestedv2_io nestedv2_io;
> +
> + /* Aggregate context switch and guest run time info (in ns) */
> + u64 l1_to_l2_cs_agg;
> + u64 l2_to_l1_cs_agg;
> + u64 l2_runtime_agg;

Can be dropped from this patch.

>  #endif
>  
>  #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
> diff --git a/arch/powerpc/include/asm/lppaca.h 
> b/arch/powerpc/include/asm/lppaca.h
> index 61ec2447dabf..bda6b86b9f13 100644
> --- a/arch/powerpc/include/asm/lppaca.h
> +++ b/arch/powerpc/include/asm/lppaca.h
> @@ -62,7 +62,8 @@ struct lppaca {
>   u8  donate_dedicated_cpu;   /* Donate dedicated CPU cycles */
>   u8  fpregs_in_use;
>   u8  pmcregs_in_use;
> - u8  reserved8[28];
> + u8  l2_accumul_cntrs_enable;  /* Enable usage of counters for KVM 
> guest */

A simpler name - l2_counters_enable or such?

> + u8  reserved8[27];
>   __be64  wait_state_cycles;  /* Wait cycles for this proc */
>   u8  reserved9[28];
>   __be16  slb_count;  /* # of SLBs to maintain */
> @@ -92,9 +93,13 @@ struct lppaca {
>   /* cacheline 4-5 */
>  
>   __be32  page_ins;   /* CMO Hint - # page ins by OS */
> - u8  reserved12[148];
> + u8  reserved12[28];
> + volatile __be64 l1_to_l2_cs_tb;
> + volatile __be64 l2_to_l1_cs_tb;
> + volatile __be64 l2_runtime_tb;
> + u8 reserved13[96];
>   volatile __be64 dtl_idx;/* Dispatch Trace Log head index */
> - u8  reserved13[96];
> + u8  reserved14[96];
>  } cacheline_aligned;
>  
>  #define lppaca_of(cpu)   (*paca_ptrs[cpu]->lppaca_ptr)
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 8e86eb577eb8..fea1c1429975 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -4108,6 +4108,37 @@ static void vcpu_vpa_increment_dispatch(struct 
> kvm_vcpu *vcpu)
>   }
>  }
>  
> +static inline int kvmhv_get_l2_accumul(void)
> +{
> + return get_lppaca()->l2_accumul_cntrs_enable;
> +}
> +
> +static inline void kvmhv_set_l2_accumul(int val)
   ^^^
   bool?

> +{
> + get_lppaca()->l2_accumul_cntrs_enable = val;
> +}
> +
> +static void do_trace_nested_cs_time(struct kvm_vcpu *vcpu)
> +{
> + struct lppaca *lp = get_lppaca();
> + u64 l1_to_l2_ns, l2_to_l1_ns, l2_runtime_ns;
> +
> + l1_to_l2_ns = tb_to_ns(be64_to_cpu(lp->l1_to_l2_cs_tb));
> + l2_to_l1_ns = tb_to_ns(be64_to_cpu(lp->l2_to_l1_cs_tb));
> + l2_runtime_ns = tb_to_ns(be64_to_cpu(lp->l2_runtime_tb));
> + trace_kvmppc_vcpu_exit_cs_time(vcpu, l1_to_l2_ns, l2_to_l1_ns,
> + l2_runtime_ns);
> + lp

Re: [PATCH v3 2/2] powerpc/bpf: enable kfunc call

2024-04-15 Thread Naveen N Rao
On Tue, Apr 02, 2024 at 04:28:06PM +0530, Hari Bathini wrote:
> Currently, bpf jit code on powerpc assumes all the bpf functions and
> helpers to be kernel text. This is false for kfunc case, as function
> addresses can be module addresses as well. So, ensure module addresses
> are supported to enable kfunc support.
> 
> Emit instructions based on whether the function address is kernel text
> address or module address to retain optimized instruction sequence for
> kernel text address case.
> 
> Also, as bpf programs are always module addresses and a bpf helper can
> be within kernel address as well, using relative addressing often fails
> with "out of range of pcrel address" error. Use unoptimized instruction
> sequence for both kernel and module addresses to work around this, when
> PCREL addressing is used.

I guess we need a fixes tag for this?
Fixes: 7e3a68be42e1 ("powerpc/64: vmlinux support building with PCREL 
addresing")

It would be good to separate this fix out into its own patch.

Also, I know I said we could use the generic PPC_LI64() for pcrel, but 
we may be able to use a more optimized sequence when calling bpf kernel 
helpers.  See stub_insns[] in module_64.c for an example where we load 
paca->kernelbase, then use a prefixed load instruction to populate the 
lower 34-bit value. For calls out to module area, we can use the generic 
PPC_LI64() macro only if it is outside range of a prefixed load 
instruction.
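
i.e., roughly (sketch; 'reladdr' here being the offset of the target from
the kernel base):

	EMIT(PPC_RAW_LD(_R12, _R13, offsetof(struct paca_struct, kernelbase)));
	/* paddi r12,r12,reladdr */
	EMIT(PPC_PREFIX_MLS | __PPC_PRFX_R(0) | IMM_H18(reladdr));
	EMIT(PPC_INST_PADDI | ___PPC_RT(_R12) | ___PPC_RA(_R12) | IMM_L(reladdr));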

> 
> With module addresses supported, override bpf_jit_supports_kfunc_call()
> to enable kfunc support. Since module address offsets can be more than
> 32-bit long on PPC64, override bpf_jit_supports_far_kfunc_call() to
> enable 64-bit pointers.
> 
> Signed-off-by: Hari Bathini 
> ---
> 
> * Changes in v3:
>   - Retained optimized instruction sequence when function address is
> a core kernel address as suggested by Naveen.
>   - Used unoptimized instruction sequence for PCREL addressing to
> avoid out of range errors for core kernel function addresses.
>   - Folded patch that adds support for kfunc calls with patch that
> enables/advertises this support as suggested by Naveen.
> 
> 
>  arch/powerpc/net/bpf_jit_comp.c   | 10 +++
>  arch/powerpc/net/bpf_jit_comp64.c | 48 ---
>  2 files changed, 42 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index 0f9a21783329..dc7ffafd7441 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -359,3 +359,13 @@ void bpf_jit_free(struct bpf_prog *fp)
>  
>   bpf_prog_unlock_free(fp);
>  }
> +
> +bool bpf_jit_supports_kfunc_call(void)
> +{
> + return true;
> +}
> +
> +bool bpf_jit_supports_far_kfunc_call(void)
> +{
> + return IS_ENABLED(CONFIG_PPC64) ? true : false;
> +}
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
> b/arch/powerpc/net/bpf_jit_comp64.c
> index 7f62ac4b4e65..ec3adf715c55 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -207,24 +207,14 @@ static int bpf_jit_emit_func_call_hlp(u32 *image, 
> struct codegen_context *ctx, u
>   unsigned long func_addr = func ? ppc_function_entry((void *)func) : 0;
>   long reladdr;
>  
> - if (WARN_ON_ONCE(!core_kernel_text(func_addr)))
> + /*
> +  * With the introduction of kfunc feature, BPF helpers can be part of 
> kernel as
> +  * well as module text address.
> +  */
> + if (WARN_ON_ONCE(!kernel_text_address(func_addr)))
>   return -EINVAL;
>  
> - if (IS_ENABLED(CONFIG_PPC_KERNEL_PCREL)) {
> - reladdr = func_addr - CTX_NIA(ctx);
> -
> - if (reladdr >= (long)SZ_8G || reladdr < -(long)SZ_8G) {
> - pr_err("eBPF: address of %ps out of range of pcrel 
> address.\n",
> - (void *)func);
> - return -ERANGE;
> - }
> - /* pla r12,addr */
> - EMIT(PPC_PREFIX_MLS | __PPC_PRFX_R(1) | IMM_H18(reladdr));
> - EMIT(PPC_INST_PADDI | ___PPC_RT(_R12) | IMM_L(reladdr));
> - EMIT(PPC_RAW_MTCTR(_R12));
> - EMIT(PPC_RAW_BCTR());
> -
> - } else {
> + if (core_kernel_text(func_addr) && 
> !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL)) {
>   reladdr = func_addr - kernel_toc_addr();
>   if (reladdr > 0x7FFF || reladdr < -(0x8000L)) {
>   pr_err("eBPF: address of %ps out of range of 
> kernel_toc.\n", (void *)func);
> @@ -235,6 +225,32 @@ static int bpf_jit_emit_func_call_hlp(u32 *image, struct 
> codegen_context *ctx, u
>   EMIT(PPC_RAW_ADDI(_R12, _R12, PPC_LO(reladdr)));
>   EMIT(PPC_RAW_MTCTR(_R12));
>   EMIT(PPC_RAW_BCTRL());
> + } else {
> + if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V1)) {
> + /* func points to the function descriptor */
> + PPC_LI64(bpf_to_ppc(TMP_REG_2

Re: [PATCH v2 2/2] powerpc/bpf: enable kfunc call

2024-02-15 Thread Naveen N Rao
On Tue, Feb 13, 2024 at 07:54:27AM +, Christophe Leroy wrote:
> 
> 
> On 01/02/2024 at 18:12, Hari Bathini wrote:
> > With module addresses supported, override bpf_jit_supports_kfunc_call()
> > to enable kfunc support. Module address offsets can be more than 32-bit
> > long, so override bpf_jit_supports_far_kfunc_call() to enable 64-bit
> > pointers.
> 
> What's the impact on PPC32 ? There are no 64-bit pointers on PPC32.

Looking at commit 1cf3bfc60f98 ("bpf: Support 64-bit pointers to 
kfuncs"), which added bpf_jit_supports_far_kfunc_call(), that does look 
to be very specific to platforms where module addresses are beyond the 
reach of an s32 offset. This is true for powerpc 64-bit, but shouldn't be 
needed for 
32-bit.
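
So on powerpc something like the below should be enough (sketch):

bool bpf_jit_supports_far_kfunc_call(void)
{
	/* Only 64-bit needs to advertise support for 64-bit kfunc addresses */
	return IS_ENABLED(CONFIG_PPC64);
}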

> 
> > 
> > Signed-off-by: Hari Bathini 
> > ---
> > 
> > * No changes since v1.
> > 
> > 
> >   arch/powerpc/net/bpf_jit_comp.c | 10 ++
> >   1 file changed, 10 insertions(+)
> > 
> > diff --git a/arch/powerpc/net/bpf_jit_comp.c 
> > b/arch/powerpc/net/bpf_jit_comp.c
> > index 7b4103b4c929..f896a4213696 100644
> > --- a/arch/powerpc/net/bpf_jit_comp.c
> > +++ b/arch/powerpc/net/bpf_jit_comp.c
> > @@ -359,3 +359,13 @@ void bpf_jit_free(struct bpf_prog *fp)
> >   
> > bpf_prog_unlock_free(fp);
> >   }
> > +
> > +bool bpf_jit_supports_kfunc_call(void)
> > +{
> > +   return true;
> > +}
> > +
> > +bool bpf_jit_supports_far_kfunc_call(void)
> > +{
> > +   return true;
> > +}

I am not sure there is value in keeping this as a separate patch since 
all support code for kfunc calls is introduced in an earlier patch.  
Consider folding this into the previous patch.

- Naveen


Re: [PATCH v2 1/2] powerpc/bpf: ensure module addresses are supported

2024-02-15 Thread Naveen N Rao
On Thu, Feb 01, 2024 at 10:42:48PM +0530, Hari Bathini wrote:
> Currently, bpf jit code on powerpc assumes all the bpf functions and
> helpers to be kernel text. This is false for kfunc case, as function
> addresses are mostly module addresses in that case. Ensure module
> addresses are supported to enable kfunc support.

I don't think that statement is entirely accurate. Our current 
assumptions are still correct in the sense that:
1. all BPF helpers are part of core kernel text, calls to which are 
generated in bpf_jit_emit_func_call_hlp().
2. bpf-to-bpf calls go out to module area where the JIT output of BPF 
subprogs are written to, handled by bpf_jit_emit_func_call_rel().

kfunc calls add another variant where BPF progs can call directly into a 
kernel function (similar to a BPF helper call), except for the fact that 
the said function could be in a kernel module.

> 
> Assume kernel text address for programs with no kfunc call to optimize
> instruction sequence in that case. Add a check to error out if this
> assumption ever changes in the future.
> 
> Signed-off-by: Hari Bathini 
> ---
> 
> Changes in v2:
> * Using bpf_prog_has_kfunc_call() to decide whether to use optimized
>   instruction sequence or not as suggested by Naveen.
> 
> 
>  arch/powerpc/net/bpf_jit.h|   5 +-
>  arch/powerpc/net/bpf_jit_comp.c   |   4 +-
>  arch/powerpc/net/bpf_jit_comp32.c |   8 ++-
>  arch/powerpc/net/bpf_jit_comp64.c | 109 --
>  4 files changed, 97 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
> index cdea5dccaefe..fc56ee0ee9c5 100644
> --- a/arch/powerpc/net/bpf_jit.h
> +++ b/arch/powerpc/net/bpf_jit.h
> @@ -160,10 +160,11 @@ static inline void bpf_clear_seen_register(struct 
> codegen_context *ctx, int i)
>  }
>  
>  void bpf_jit_init_reg_mapping(struct codegen_context *ctx);
> -int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct 
> codegen_context *ctx, u64 func);
> +int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct 
> codegen_context *ctx, u64 func,
> +bool has_kfunc_call);
>  int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct 
> codegen_context *ctx,
>  u32 *addrs, int pass, bool extra_pass);
> -void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
> +void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx, bool 
> has_kfunc_call);
>  void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
>  void bpf_jit_realloc_regs(struct codegen_context *ctx);
>  int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int 
> tmp_reg, long exit_addr);
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index 0f9a21783329..7b4103b4c929 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -163,7 +163,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>* update ctgtx.idx as it pretends to output instructions, then we can
>* calculate total size from idx.
>*/
> - bpf_jit_build_prologue(NULL, &cgctx);
> + bpf_jit_build_prologue(NULL, &cgctx, bpf_prog_has_kfunc_call(fp));
>   addrs[fp->len] = cgctx.idx * 4;
>   bpf_jit_build_epilogue(NULL, &cgctx);
>  
> @@ -192,7 +192,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>   /* Now build the prologue, body code & epilogue for real. */
>   cgctx.idx = 0;
>   cgctx.alt_exit_addr = 0;
> - bpf_jit_build_prologue(code_base, &cgctx);
> + bpf_jit_build_prologue(code_base, &cgctx, 
> bpf_prog_has_kfunc_call(fp));
>   if (bpf_jit_build_body(fp, code_base, fcode_base, &cgctx, 
> addrs, pass,
>  extra_pass)) {
>   bpf_arch_text_copy(&fhdr->size, &hdr->size, 
> sizeof(hdr->size));
> diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
> b/arch/powerpc/net/bpf_jit_comp32.c
> index 2f39c50ca729..447747e51a58 100644
> --- a/arch/powerpc/net/bpf_jit_comp32.c
> +++ b/arch/powerpc/net/bpf_jit_comp32.c
> @@ -123,7 +123,7 @@ void bpf_jit_realloc_regs(struct codegen_context *ctx)
>   }
>  }
>  
> -void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
> +void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx, bool 
> has_kfunc_call)
>  {
>   int i;
>  
> @@ -201,7 +201,8 @@ void bpf_jit_build_epilogue(u32 *image, struct 
> codegen_context *ctx)
>  }
>  
>  /* Relative offset needs to be calculated based on final image location */
> -int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct 
> codegen_context *ctx, u64 func)
> +int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct 
> codegen_context *ctx, u64 func,
> +bool has_kfunc_call)
>  {
>   s32 rel = (s32)func - (s32)(fimage + ctx->idx);
>  
> @@ -1054,7 +1055,8 @@ int bpf_j

[PATCH v2] powerpc/ftrace: Ignore ftrace locations in exit text sections

2024-02-13 Thread Naveen N Rao
Michael reported that we are seeing an ftrace bug on bootup when KASAN is
enabled and we are using -fpatchable-function-entry:

ftrace: allocating 47780 entries in 18 pages
ftrace-powerpc: 0xc20b3d5c: No module provided for non-kernel 
address
[ ftrace bug ]
ftrace faulted on modifying
[] 0xc20b3d5c
Initializing ftrace call sites
ftrace record flags: 0
 (0)
 expected tramp: c008cef4
[ cut here ]
WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:2180 ftrace_bug+0x3c0/0x424
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0-rc3-00120-g0f71dcfb4aef #860
Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 
0xf05 of:SLOF,HEAD hv:linux,kvm pSeries
NIP:  c03aa81c LR: c03aa818 CTR: 
REGS: c33cfab0 TRAP: 0700   Not tainted  
(6.5.0-rc3-00120-g0f71dcfb4aef)
MSR:  82021033   CR: 28028240  XER: 
CFAR: c02781a8 IRQMASK: 3
...
NIP [c03aa81c] ftrace_bug+0x3c0/0x424
LR [c03aa818] ftrace_bug+0x3bc/0x424
Call Trace:
 ftrace_bug+0x3bc/0x424 (unreliable)
 ftrace_process_locs+0x5f4/0x8a0
 ftrace_init+0xc0/0x1d0
 start_kernel+0x1d8/0x484

With CONFIG_FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY=y and
CONFIG_KASAN=y, the compiler emits nops in functions that it generates for
registering and unregistering global variables (unlike with -pg and
-mprofile-kernel where calls to _mcount() are not generated in those
functions). Those functions then end up in INIT_TEXT and EXIT_TEXT
respectively. We don't expect to see any profiled functions in
EXIT_TEXT, so ftrace_init_nop() assumes that all addresses that aren't
in the core kernel text belong to a module. Since these functions do
not match that criterion, we see the above bug.

Address this by having ftrace ignore all locations in the exit text
sections of vmlinux.

Fixes: 0f71dcfb4aef ("powerpc/ftrace: Add support for 
-fpatchable-function-entry")
Cc: sta...@vger.kernel.org
Reported-by: Michael Ellerman 
Signed-off-by: Naveen N Rao 
Reviewed-by: Benjamin Gray 
---
v2:
- Rename exit text section variable name to match other architectures
- Fix clang builds

I've collected Benjamin's Reviewed-by tag since those parts of the patch 
remain the same.

- Naveen

 arch/powerpc/include/asm/ftrace.h| 10 ++
 arch/powerpc/include/asm/sections.h  |  1 +
 arch/powerpc/kernel/trace/ftrace.c   | 12 
 arch/powerpc/kernel/trace/ftrace_64_pg.c |  5 +
 arch/powerpc/kernel/vmlinux.lds.S|  2 ++
 5 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 1ebd2ca97f12..107fc5a48456 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -20,14 +20,6 @@
 #ifndef __ASSEMBLY__
 extern void _mcount(void);
 
-static inline unsigned long ftrace_call_adjust(unsigned long addr)
-{
-   if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
-   addr += MCOUNT_INSN_SIZE;
-
-   return addr;
-}
-
 unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
unsigned long sp);
 
@@ -142,8 +134,10 @@ static inline u8 this_cpu_get_ftrace_enabled(void) { 
return 1; }
 #ifdef CONFIG_FUNCTION_TRACER
 extern unsigned int ftrace_tramp_text[], ftrace_tramp_init[];
 void ftrace_free_init_tramp(void);
+unsigned long ftrace_call_adjust(unsigned long addr);
 #else
 static inline void ftrace_free_init_tramp(void) { }
+static inline unsigned long ftrace_call_adjust(unsigned long addr) { return 
addr; }
 #endif
 #endif /* !__ASSEMBLY__ */
 
diff --git a/arch/powerpc/include/asm/sections.h 
b/arch/powerpc/include/asm/sections.h
index ea26665f82cf..f43f3a6b0051 100644
--- a/arch/powerpc/include/asm/sections.h
+++ b/arch/powerpc/include/asm/sections.h
@@ -14,6 +14,7 @@ typedef struct func_desc func_desc_t;
 
 extern char __head_end[];
 extern char __srwx_boundary[];
+extern char __exittext_begin[], __exittext_end[];
 
 /* Patch sites */
 extern s32 patch__call_flush_branch_caches1;
diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 82010629cf88..d8d6b4fd9a14 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -27,10 +27,22 @@
 #include 
 #include 
 #include 
+#include 
 
 #defineNUM_FTRACE_TRAMPS   2
 static unsigned long ftrace_tramps[NUM_FTRACE_TRAMPS];
 
+unsigned long ftrace_call_adjust(unsigned long addr)
+{
+   if (addr >= (unsigned long)__exittext_begin && addr < (unsigned 
long)__exittext_end)
+   return 0;
+
+   if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
+   addr += MCOUNT_INSN_SIZE;
+
+   return addr;
+}
+
 static ppc_inst_

Re: [PATCH] powerpc/ftrace: Ignore ftrace locations in exit text sections

2024-02-12 Thread Naveen N Rao
On Mon, Feb 12, 2024 at 07:31:03PM +, Christophe Leroy wrote:
> 
> 
> Le 09/02/2024 à 08:59, Naveen N Rao a écrit :
> > diff --git a/arch/powerpc/include/asm/sections.h 
> > b/arch/powerpc/include/asm/sections.h
> > index ea26665f82cf..d389dcecdb0b 100644
> > --- a/arch/powerpc/include/asm/sections.h
> > +++ b/arch/powerpc/include/asm/sections.h
> > @@ -14,6 +14,7 @@ typedef struct func_desc func_desc_t;
> >   
> >   extern char __head_end[];
> >   extern char __srwx_boundary[];
> > +extern char _sexittext[], _eexittext[];
> 
> Should we try to at least use the same symbols as others, or best try to 
> move this into include/asm-generic/sections.h, just like inittext ?

I used this name based on what is used for init text start and end in 
the generic code: _sinittext and _einittext.

> 
> $ git grep exittext
> arch/arm64/include/asm/sections.h:extern char __exittext_begin[], 
> __exittext_end[];

Arm64 also uses the non-standard __inittext_begin/__inittext_end, so it 
looks to be something very specific to arm64.

I do agree it would be good to refactor and unify names across 
architectures.
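
For illustration, a unified declaration could look something like the
below (just a sketch; where exactly it should live, asm-generic or
elsewhere, is the open question above):

	/* e.g. in include/asm-generic/sections.h, next to _sinittext/_einittext */
	extern char __exittext_begin[], __exittext_end[];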


- Naveen



[PATCH] powerpc/ftrace: Ignore ftrace locations in exit text sections

2024-02-09 Thread Naveen N Rao
Michael reported that we are seeing an ftrace bug on bootup when KASAN is
enabled and we are using -fpatchable-function-entry:

ftrace: allocating 47780 entries in 18 pages
ftrace-powerpc: 0xc20b3d5c: No module provided for non-kernel 
address
[ ftrace bug ]
ftrace faulted on modifying
[] 0xc20b3d5c
Initializing ftrace call sites
ftrace record flags: 0
 (0)
 expected tramp: c008cef4
[ cut here ]
WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:2180 ftrace_bug+0x3c0/0x424
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0-rc3-00120-g0f71dcfb4aef #860
Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 
0xf05 of:SLOF,HEAD hv:linux,kvm pSeries
NIP:  c03aa81c LR: c03aa818 CTR: 
REGS: c33cfab0 TRAP: 0700   Not tainted  
(6.5.0-rc3-00120-g0f71dcfb4aef)
MSR:  82021033   CR: 28028240  XER: 
CFAR: c02781a8 IRQMASK: 3
...
NIP [c03aa81c] ftrace_bug+0x3c0/0x424
LR [c03aa818] ftrace_bug+0x3bc/0x424
Call Trace:
 ftrace_bug+0x3bc/0x424 (unreliable)
 ftrace_process_locs+0x5f4/0x8a0
 ftrace_init+0xc0/0x1d0
 start_kernel+0x1d8/0x484

With CONFIG_FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY=y and
CONFIG_KASAN=y, the compiler emits nops in the functions that it
generates for registering and unregistering global variables (unlike
with -pg and -mprofile-kernel, where calls to _mcount() are not
generated in those functions). Those functions then end up in INIT_TEXT
and EXIT_TEXT respectively. We don't expect to see any profiled
functions in EXIT_TEXT, so ftrace_init_nop() assumes that all addresses
that aren't in the core kernel text belong to a module. Since these
functions do not match that criterion, we see the above bug.

Address this by having ftrace ignore all locations in the text exit
sections of vmlinux.

Fixes: 0f71dcfb4aef ("powerpc/ftrace: Add support for 
-fpatchable-function-entry")
Cc: sta...@vger.kernel.org
Reported-by: Michael Ellerman 
Signed-off-by: Naveen N Rao 
---
 arch/powerpc/include/asm/ftrace.h   |  9 +
 arch/powerpc/include/asm/sections.h |  1 +
 arch/powerpc/kernel/trace/ftrace.c  | 12 
 arch/powerpc/kernel/vmlinux.lds.S   |  2 ++
 4 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 1ebd2ca97f12..d6babd083202 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -20,14 +20,7 @@
 #ifndef __ASSEMBLY__
 extern void _mcount(void);
 
-static inline unsigned long ftrace_call_adjust(unsigned long addr)
-{
-   if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
-   addr += MCOUNT_INSN_SIZE;
-
-   return addr;
-}
-
+unsigned long ftrace_call_adjust(unsigned long addr);
 unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
unsigned long sp);
 
diff --git a/arch/powerpc/include/asm/sections.h 
b/arch/powerpc/include/asm/sections.h
index ea26665f82cf..d389dcecdb0b 100644
--- a/arch/powerpc/include/asm/sections.h
+++ b/arch/powerpc/include/asm/sections.h
@@ -14,6 +14,7 @@ typedef struct func_desc func_desc_t;
 
 extern char __head_end[];
 extern char __srwx_boundary[];
+extern char _sexittext[], _eexittext[];
 
 /* Patch sites */
 extern s32 patch__call_flush_branch_caches1;
diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 82010629cf88..b5efd8d7bc01 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -27,10 +27,22 @@
 #include 
 #include 
 #include 
+#include 
 
 #defineNUM_FTRACE_TRAMPS   2
 static unsigned long ftrace_tramps[NUM_FTRACE_TRAMPS];
 
+unsigned long ftrace_call_adjust(unsigned long addr)
+{
+   if (addr >= (unsigned long)_sexittext && addr < (unsigned 
long)_eexittext)
+   return 0;
+
+   if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
+   addr += MCOUNT_INSN_SIZE;
+
+   return addr;
+}
+
 static ppc_inst_t ftrace_create_branch_inst(unsigned long ip, unsigned long 
addr, int link)
 {
ppc_inst_t op;
diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
b/arch/powerpc/kernel/vmlinux.lds.S
index 1c5970df3233..9c376ae6857d 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -281,7 +281,9 @@ SECTIONS
 * to deal with references from __bug_table
 */
.exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) {
+   _sexittext = .;
EXIT_TEXT
+   _eexittext = .;
}
 
. = ALIGN(PAGE_SIZE);

base-commit: 4ef8376c466ae8b03e632dd8eca1e44315f7dd61
-- 
2.43.0



Re: Re: [PATCH v2 1/3] powerpc/code-patching: Add generic memory patching

2024-02-05 Thread Naveen N Rao
On Mon, Feb 05, 2024 at 01:30:46PM +1100, Benjamin Gray wrote:
> On Thu, 2023-11-30 at 15:55 +0530, Naveen N Rao wrote:
> > On Mon, Oct 16, 2023 at 04:01:45PM +1100, Benjamin Gray wrote:
> > > 
> > > diff --git a/arch/powerpc/include/asm/code-patching.h
> > > b/arch/powerpc/include/asm/code-patching.h
> > > index 3f881548fb61..7c6056bb1706 100644
> > > --- a/arch/powerpc/include/asm/code-patching.h
> > > +++ b/arch/powerpc/include/asm/code-patching.h
> > > @@ -75,6 +75,39 @@ int patch_branch(u32 *addr, unsigned long
> > > target, int flags);
> > >  int patch_instruction(u32 *addr, ppc_inst_t instr);
> > >  int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
> > >  
> > > +/*
> > > + * patch_uint() and patch_ulong() should only be called on
> > > addresses where the
> > > + * patch does not cross a cacheline, otherwise it may not be
> > > flushed properly
> > > + * and mixes of new and stale data may be observed. It cannot
> > > cross a page
> > > + * boundary, as only the target page is mapped as writable.
> > 
> > Should we enforce alignment requirements, especially for
> > patch_ulong() 
> > on 64-bit powerpc? I am not sure if there are use cases for unaligned
> > 64-bit writes. That should also ensure that the write doesn't cross a
> > cacheline.
> 
> Yeah, the current description is more just the technical restrictions,
> not an endorsement of usage. If the caller isn't working with aligned
> data, it seems unlikely it would still be cacheline aligned. The caller
> should break it into 32bit patches if this is the case.
> 
> By enforce, are you suggesting a WARN_ON in the code too?

No, just detecting that and returning an error code should be enough to
catch incorrect usage.
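
For illustration, a minimal sketch of the kind of check I have in mind
(the function body and the __patch_memory() helper below are assumptions
for the example, not the code from your series):

	int patch_ulong(void *addr, unsigned long val)
	{
		/*
		 * Reject unaligned targets; this also rules out patches
		 * that would cross a cacheline boundary.
		 */
		if (!IS_ALIGNED((unsigned long)addr, sizeof(unsigned long)))
			return -EINVAL;

		return __patch_memory(addr, val, sizeof(unsigned long));
	}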


- Naveen



[PATCH v2] powerpc/64: Set task pt_regs->link to the LR value on scv entry

2024-02-02 Thread Naveen N Rao
Nysal reported that userspace backtraces are missing in the offcputime
bcc tool. As an example:
$ sudo ./bcc/tools/offcputime.py -uU
Tracing off-CPU time (us) of user threads by user stack... Hit Ctrl-C to 
end.

^C
write
-python (9107)
8

write
-sudo (9105)
9

mmap
-python (9107)
16

clock_nanosleep
-multipathd (697)
3001604

The offcputime bcc tool attaches a bpf program to a kprobe on
finish_task_switch(), which is usually hit on a syscall from userspace.
With the switch to system call vectored, we started setting
pt_regs->link to zero. This is because system call vectored behaves like
a function call with LR pointing to the system call return address, and
with no modification to SRR0/SRR1. The LR value does indicate our next
instruction, so it is being saved as pt_regs->nip, and pt_regs->link is
being set to zero. This is not a problem by itself, but BPF uses perf
callchain infrastructure for capturing stack traces, and that stores LR
as the second entry in the stack trace. perf has code to cope with the
second entry being zero, and skips over it. However, generic userspace
unwinders assume that a zero entry indicates end of the stack trace,
resulting in a truncated userspace stack trace.
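
As an illustration of the unwinder behaviour (a generic sketch, not code
from bcc, perf or any particular unwinder):

	#include <stdint.h>
	#include <stdio.h>

	/* Walk a captured callchain; a zero entry is taken as end of trace. */
	static void print_user_stack(const uint64_t *ips, int nr_entries)
	{
		for (int i = 0; i < nr_entries; i++) {
			if (!ips[i])	/* a NULL second entry stops the walk here */
				break;
			printf("%#llx\n", (unsigned long long)ips[i]);
		}
	}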

Rather than fixing all userspace unwinders to ignore/skip past the
second entry, store the real LR value in pt_regs->link so that there
continues to be a valid, though duplicate entry in the stack trace.

With this change:
$ sudo ./bcc/tools/offcputime.py -uU
Tracing off-CPU time (us) of user threads by user stack... Hit Ctrl-C to 
end.

^C
write
write
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
PyObject_VectorcallMethod
[unknown]
[unknown]
PyObject_CallOneArg
PyFile_WriteObject
PyFile_WriteString
[unknown]
[unknown]
PyObject_Vectorcall
_PyEval_EvalFrameDefault
PyEval_EvalCode
[unknown]
[unknown]
[unknown]
_PyRun_SimpleFileObject
_PyRun_AnyFileObject
Py_RunMain
[unknown]
Py_BytesMain
[unknown]
__libc_start_main
-python (1293)
7

write
write
[unknown]
sudo_ev_loop_v1
sudo_ev_dispatch_v1
[unknown]
[unknown]
[unknown]
[unknown]
__libc_start_main
-sudo (1291)
7

syscall
syscall
bpf_open_perf_buffer_opts
[unknown]
[unknown]
[unknown]
[unknown]
_PyObject_MakeTpCall
PyObject_Vectorcall
_PyEval_EvalFrameDefault
PyEval_EvalCode
[unknown]
[unknown]
[unknown]
_PyRun_SimpleFileObject
_PyRun_AnyFileObject
Py_RunMain
[unknown]
Py_BytesMain
[unknown]
__libc_start_main
-python (1293)
11

clock_nanosleep
clock_nanosleep
nanosleep
sleep
[unknown]
[unknown]
__clone
-multipathd (698)
3001661

Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv 
instructions")
Cc: sta...@vger.kernel.org
Reported-by: Nysal Jan K.A 
Signed-off-by: Naveen N Rao 
---
v2: Update change log, re-order instructions storing into pt_regs->nip 
and pt_regs->link and add a comment to better describe the change. Also 
added a Fixes: tag.


 arch/powerpc/kernel/interrupt_64.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/interrupt_64.S 
b/arch/powerpc/kernel/interrupt_64.S
index bd863702d812..1ad059a9e2fe 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -52,7 +52,8 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
mr  r10,r1
ld  r1,PACAKSAVE(r13)
std r10,0(r1)
-   std r11,_NIP(r1)
+   std r11,_LINK(r1)
+   std r11,_NIP(r1)/* Saved LR is also the next instruction */
std r12,_MSR(r1)
std r0,GPR0(r1)
std r10,GPR1(r1)
@@ -70,7 +71,6 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
std r9,GPR13(r1)
SAVE_NVGPRS(r1)
std r11,_XER(r1)
-   std r11,_LINK(r1)
std r11,_CTR(r1)
 
li  r11,\trapnr

base-commit: 414e92af226ede4935509b0b5e041810c92e003f
-- 
2.43.0



Re: Re: [PATCH] powerpc/64: Set LR to a non-NULL value in task pt_regs on scv entry

2024-02-02 Thread Naveen N Rao
On Fri, Feb 02, 2024 at 01:02:39PM +1100, Michael Ellerman wrote:
> Segher Boessenkool  writes:
> > Hi!
> >
> > On Thu, Jan 25, 2024 at 05:12:28PM +0530, Naveen N Rao wrote:
> >> diff --git a/arch/powerpc/kernel/interrupt_64.S 
> >> b/arch/powerpc/kernel/interrupt_64.S
> >> index bd863702d812..5cf3758a19d3 100644
> >> --- a/arch/powerpc/kernel/interrupt_64.S
> >> +++ b/arch/powerpc/kernel/interrupt_64.S
> >> @@ -53,6 +53,7 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
> >>ld  r1,PACAKSAVE(r13)
> >>std r10,0(r1)
> >>std r11,_NIP(r1)
> >> +  std r11,_LINK(r1)
> >
> > Please add a comment here then, saying what the store is for?
> 
> Yeah a comment would be good. 
> 
> Also the r11 value comes from LR, so it's not that we're storing the NIP
> value into the LR slot, rather the value we store in NIP is from LR, see:
> 
> EXC_VIRT_BEGIN(system_call_vectored, 0x3000, 0x1000)
>   /* SCV 0 */
>   mr  r9,r13
>   GET_PACA(r13)
>   mflrr11
> ...
>   b   system_call_vectored_common
> 
> That's slightly pedantic, but I think it answers the question of why
> it's OK to use the same value for NIP & LR, or why we don't have to do
> mflr in system_call_vectored_common to get the actual LR value.

Thanks for clarifying that. I should have done a better job describing 
that in the commit log. I'll update that, add a comment here and send a 
v2.


- Naveen



[PATCH] powerpc/64: Set LR to a non-NULL value in task pt_regs on scv entry

2024-01-25 Thread Naveen N Rao
Nysal reported that userspace backtraces are missing in the offcputime
bcc tool. As an example:
$ sudo ./bcc/tools/offcputime.py -uU
Tracing off-CPU time (us) of user threads by user stack... Hit Ctrl-C to 
end.

^C
write
-python (9107)
8

write
-sudo (9105)
9

mmap
-python (9107)
16

clock_nanosleep
-multipathd (697)
3001604

The offcputime bcc tool attaches a bpf program to a kprobe on
finish_task_switch(), which is usually hit on a syscall from userspace.
With the switch to system call vectored, we zero out the LR value in
the user pt_regs on syscall entry. BPF uses the perf callchain
infrastructure for capturing stack traces, and this stores LR as the
second entry in the stack trace. Since this is NULL, userspace
unwinders assume that there are no further entries, resulting in a
truncated userspace stack trace.

Rather than fixing all userspace unwinders to ignore/skip past the
second entry, store NIP as LR so that there continues to be a valid,
though duplicate entry.

With this change:
$ sudo ./bcc/tools/offcputime.py -uU
Tracing off-CPU time (us) of user threads by user stack... Hit Ctrl-C to 
end.

^C
write
write
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
PyObject_VectorcallMethod
[unknown]
[unknown]
PyObject_CallOneArg
PyFile_WriteObject
PyFile_WriteString
[unknown]
[unknown]
PyObject_Vectorcall
_PyEval_EvalFrameDefault
PyEval_EvalCode
[unknown]
[unknown]
[unknown]
_PyRun_SimpleFileObject
_PyRun_AnyFileObject
Py_RunMain
[unknown]
Py_BytesMain
[unknown]
__libc_start_main
-python (1293)
7

write
write
[unknown]
sudo_ev_loop_v1
sudo_ev_dispatch_v1
[unknown]
[unknown]
[unknown]
[unknown]
__libc_start_main
-sudo (1291)
7

syscall
syscall
bpf_open_perf_buffer_opts
[unknown]
[unknown]
[unknown]
[unknown]
_PyObject_MakeTpCall
PyObject_Vectorcall
_PyEval_EvalFrameDefault
PyEval_EvalCode
[unknown]
[unknown]
[unknown]
_PyRun_SimpleFileObject
_PyRun_AnyFileObject
Py_RunMain
[unknown]
Py_BytesMain
[unknown]
__libc_start_main
-python (1293)
11

clock_nanosleep
clock_nanosleep
nanosleep
sleep
[unknown]
[unknown]
__clone
-multipathd (698)
3001661

Reported-by: Nysal Jan K.A 
Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/interrupt_64.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/interrupt_64.S 
b/arch/powerpc/kernel/interrupt_64.S
index bd863702d812..5cf3758a19d3 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -53,6 +53,7 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
ld  r1,PACAKSAVE(r13)
std r10,0(r1)
std r11,_NIP(r1)
+   std r11,_LINK(r1)
std r12,_MSR(r1)
std r0,GPR0(r1)
std r10,GPR1(r1)
@@ -70,7 +71,6 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
std r9,GPR13(r1)
SAVE_NVGPRS(r1)
std r11,_XER(r1)
-   std r11,_LINK(r1)
std r11,_CTR(r1)
 
li  r11,\trapnr

base-commit: 414e92af226ede4935509b0b5e041810c92e003f
-- 
2.43.0



[PATCH v2] powerpc/Makefile: Remove bits related to the previous use of -mcmodel=large

2024-01-10 Thread Naveen N Rao
All supported compilers today (gcc v5.1+ and clang v11+) have support for
-mcmodel=medium. As such, NO_MINIMAL_TOC is no longer being set. Remove
NO_MINIMAL_TOC as well as the fallback to -mminimal-toc.

Reviewed-by: Christophe Leroy 
Signed-off-by: Naveen N Rao 
---
v2: Drop the call to cc-option so we break the build if we ever use a 
compiler that does not support the medium code model.


 arch/powerpc/Makefile   | 6 +-
 arch/powerpc/kernel/Makefile| 3 ---
 arch/powerpc/lib/Makefile   | 2 --
 arch/powerpc/mm/Makefile| 2 --
 arch/powerpc/mm/book3s64/Makefile   | 2 --
 arch/powerpc/mm/nohash/Makefile | 2 --
 arch/powerpc/platforms/pseries/Makefile | 1 -
 arch/powerpc/sysdev/Makefile| 2 --
 arch/powerpc/xmon/Makefile  | 2 --
 9 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 051247027da0..bbe0f99b50e8 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -114,7 +114,6 @@ LDFLAGS_vmlinux := $(LDFLAGS_vmlinux-y)
 
 ifdef CONFIG_PPC64
 ifndef CONFIG_PPC_KERNEL_PCREL
-ifeq ($(call cc-option-yn,-mcmodel=medium),y)
# -mcmodel=medium breaks modules because it uses 32bit offsets from
# the TOC pointer to create pointers where possible. Pointers into the
# percpu data area are created by this method.
@@ -124,9 +123,6 @@ ifeq ($(call cc-option-yn,-mcmodel=medium),y)
# kernel percpu data space (starting with 0xc...). We need a full
# 64bit relocation for this to work, hence -mcmodel=large.
KBUILD_CFLAGS_MODULE += -mcmodel=large
-else
-   export NO_MINIMAL_TOC := -mno-minimal-toc
-endif
 endif
 endif
 
@@ -139,7 +135,7 @@ CFLAGS-$(CONFIG_PPC64)  += $(call cc-option,-mabi=elfv1)
 CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcall-aixdesc)
 endif
 endif
-CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcmodel=medium,$(call 
cc-option,-mminimal-toc))
+CFLAGS-$(CONFIG_PPC64) += -mcmodel=medium
 CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mno-pointers-to-nested-functions)
 CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mlong-double-128)
 
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 2919433be355..2b0567926259 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -3,9 +3,6 @@
 # Makefile for the linux kernel.
 #
 
-ifdef CONFIG_PPC64
-CFLAGS_prom_init.o += $(NO_MINIMAL_TOC)
-endif
 ifdef CONFIG_PPC32
 CFLAGS_prom_init.o  += -fPIC
 CFLAGS_btext.o += -fPIC
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 6eac63e79a89..50d88651d04f 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -3,8 +3,6 @@
 # Makefile for ppc-specific library files..
 #
 
-ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC)
-
 CFLAGS_code-patching.o += -fno-stack-protector
 CFLAGS_feature-fixups.o += -fno-stack-protector
 
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 503a6e249940..0fe2f085c05a 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -3,8 +3,6 @@
 # Makefile for the linux ppc-specific parts of the memory manager.
 #
 
-ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC)
-
 obj-y  := fault.o mem.o pgtable.o maccess.o pageattr.o 
\
   init_$(BITS).o pgtable_$(BITS).o \
   pgtable-frag.o ioremap.o ioremap_$(BITS).o \
diff --git a/arch/powerpc/mm/book3s64/Makefile 
b/arch/powerpc/mm/book3s64/Makefile
index cad2abc1730f..33af5795856a 100644
--- a/arch/powerpc/mm/book3s64/Makefile
+++ b/arch/powerpc/mm/book3s64/Makefile
@@ -1,7 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
 
-ccflags-y  := $(NO_MINIMAL_TOC)
-
 obj-y  += mmu_context.o pgtable.o trace.o
 ifdef CONFIG_PPC_64S_HASH_MMU
 CFLAGS_REMOVE_slb.o = $(CC_FLAGS_FTRACE)
diff --git a/arch/powerpc/mm/nohash/Makefile b/arch/powerpc/mm/nohash/Makefile
index f3894e79d5f7..b3f0498dd42f 100644
--- a/arch/powerpc/mm/nohash/Makefile
+++ b/arch/powerpc/mm/nohash/Makefile
@@ -1,7 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
 
-ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC)
-
 obj-y  += mmu_context.o tlb.o tlb_low.o kup.o
 obj-$(CONFIG_PPC_BOOK3E_64)+= tlb_low_64e.o book3e_pgtable.o
 obj-$(CONFIG_40x)  += 40x.o
diff --git a/arch/powerpc/platforms/pseries/Makefile 
b/arch/powerpc/platforms/pseries/Makefile
index f936962a2946..7bf506f6b8c8 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -1,5 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0
-ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC)
 ccflags-$(CONFIG_PPC_PSERIES_DEBUG)+= -DDEBUG
 
 obj-y  := lpar.o hvCall.o nvram.o reconfig.o \
diff --git a/arch/powerpc/sysdev/Makefile b/arch/powerpc/sysdev/Makefile
index 9cb1d029511a

Re: [PATCH] powerpc/Makefile: Remove bits related to the previous use of -mcmodel=large

2024-01-10 Thread Naveen N Rao
On Tue, Jan 09, 2024 at 12:39:36PM -0600, Segher Boessenkool wrote:
> On Tue, Jan 09, 2024 at 03:15:35PM +, Christophe Leroy wrote:
> > >   CFLAGS-$(CONFIG_PPC64)  += $(call cc-option,-mcall-aixdesc)
> > >   endif
> > >   endif
> > > -CFLAGS-$(CONFIG_PPC64)   += $(call cc-option,-mcmodel=medium,$(call 
> > > cc-option,-mminimal-toc))
> > > +CFLAGS-$(CONFIG_PPC64)   += $(call cc-option,-mcmodel=medium)
> > 
> > Should we still use $(call cc-option  here ?
> > As we only deal with medium model now, shouldn't we make it such that it 
> > fails in case the compiler doesn't support -mcmodel=medium ?

Yup, v2 on its way.

> 
> The -mcmodel= flag has been supported since 2010.  The kernel requires
> a GCC from 2015 or later (GCC 5.1 is the minimum).  -mcmodel=medium is
> (and always has been) the default, so it is always supported, yes.

Thanks for confirming!

- Naveen


[PATCH] powerpc/Makefile: Remove bits related to the previous use of -mcmodel=large

2024-01-09 Thread Naveen N Rao
All supported compilers today (gcc v5.1+ and clang v11+) have support for
-mcmodel=medium. As such, NO_MINIMAL_TOC is no longer being set. Remove
NO_MINIMAL_TOC as well as the fallback to -mminimal-toc.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/Makefile   | 6 +-
 arch/powerpc/kernel/Makefile| 3 ---
 arch/powerpc/lib/Makefile   | 2 --
 arch/powerpc/mm/Makefile| 2 --
 arch/powerpc/mm/book3s64/Makefile   | 2 --
 arch/powerpc/mm/nohash/Makefile | 2 --
 arch/powerpc/platforms/pseries/Makefile | 1 -
 arch/powerpc/sysdev/Makefile| 2 --
 arch/powerpc/xmon/Makefile  | 2 --
 9 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 051247027da0..a0eb0fb1aba8 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -114,7 +114,6 @@ LDFLAGS_vmlinux := $(LDFLAGS_vmlinux-y)
 
 ifdef CONFIG_PPC64
 ifndef CONFIG_PPC_KERNEL_PCREL
-ifeq ($(call cc-option-yn,-mcmodel=medium),y)
# -mcmodel=medium breaks modules because it uses 32bit offsets from
# the TOC pointer to create pointers where possible. Pointers into the
# percpu data area are created by this method.
@@ -124,9 +123,6 @@ ifeq ($(call cc-option-yn,-mcmodel=medium),y)
# kernel percpu data space (starting with 0xc...). We need a full
# 64bit relocation for this to work, hence -mcmodel=large.
KBUILD_CFLAGS_MODULE += -mcmodel=large
-else
-   export NO_MINIMAL_TOC := -mno-minimal-toc
-endif
 endif
 endif
 
@@ -139,7 +135,7 @@ CFLAGS-$(CONFIG_PPC64)  += $(call cc-option,-mabi=elfv1)
 CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcall-aixdesc)
 endif
 endif
-CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcmodel=medium,$(call 
cc-option,-mminimal-toc))
+CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcmodel=medium)
 CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mno-pointers-to-nested-functions)
 CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mlong-double-128)
 
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 2919433be355..2b0567926259 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -3,9 +3,6 @@
 # Makefile for the linux kernel.
 #
 
-ifdef CONFIG_PPC64
-CFLAGS_prom_init.o += $(NO_MINIMAL_TOC)
-endif
 ifdef CONFIG_PPC32
 CFLAGS_prom_init.o  += -fPIC
 CFLAGS_btext.o += -fPIC
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 6eac63e79a89..50d88651d04f 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -3,8 +3,6 @@
 # Makefile for ppc-specific library files..
 #
 
-ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC)
-
 CFLAGS_code-patching.o += -fno-stack-protector
 CFLAGS_feature-fixups.o += -fno-stack-protector
 
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 503a6e249940..0fe2f085c05a 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -3,8 +3,6 @@
 # Makefile for the linux ppc-specific parts of the memory manager.
 #
 
-ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC)
-
 obj-y  := fault.o mem.o pgtable.o maccess.o pageattr.o 
\
   init_$(BITS).o pgtable_$(BITS).o \
   pgtable-frag.o ioremap.o ioremap_$(BITS).o \
diff --git a/arch/powerpc/mm/book3s64/Makefile 
b/arch/powerpc/mm/book3s64/Makefile
index cad2abc1730f..33af5795856a 100644
--- a/arch/powerpc/mm/book3s64/Makefile
+++ b/arch/powerpc/mm/book3s64/Makefile
@@ -1,7 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
 
-ccflags-y  := $(NO_MINIMAL_TOC)
-
 obj-y  += mmu_context.o pgtable.o trace.o
 ifdef CONFIG_PPC_64S_HASH_MMU
 CFLAGS_REMOVE_slb.o = $(CC_FLAGS_FTRACE)
diff --git a/arch/powerpc/mm/nohash/Makefile b/arch/powerpc/mm/nohash/Makefile
index f3894e79d5f7..b3f0498dd42f 100644
--- a/arch/powerpc/mm/nohash/Makefile
+++ b/arch/powerpc/mm/nohash/Makefile
@@ -1,7 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
 
-ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC)
-
 obj-y  += mmu_context.o tlb.o tlb_low.o kup.o
 obj-$(CONFIG_PPC_BOOK3E_64)+= tlb_low_64e.o book3e_pgtable.o
 obj-$(CONFIG_40x)  += 40x.o
diff --git a/arch/powerpc/platforms/pseries/Makefile 
b/arch/powerpc/platforms/pseries/Makefile
index f936962a2946..7bf506f6b8c8 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -1,5 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0
-ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC)
 ccflags-$(CONFIG_PPC_PSERIES_DEBUG)+= -DDEBUG
 
 obj-y  := lpar.o hvCall.o nvram.o reconfig.o \
diff --git a/arch/powerpc/sysdev/Makefile b/arch/powerpc/sysdev/Makefile
index 9cb1d029511a..24a177d164f1 100644
--- a/arch/powerpc/sysdev/Makefile
+++ b/arch/powerpc/sysdev/Makefile
@@ -1,7 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0

Re: [RFC PATCH 6/9] powerpc/ftrace: Update and move function profile instructions out-of-line

2023-12-22 Thread Naveen N Rao
On Thu, Dec 21, 2023 at 10:46:08AM +, Christophe Leroy wrote:
> 
> 
> Le 08/12/2023 à 17:30, Naveen N Rao a écrit :
> > Function profile sequence on powerpc includes two instructions at the
> > beginning of each function:
> > 
> > mflrr0
> > bl  ftrace_caller
> > 
> > The call to ftrace_caller() gets nop'ed out during kernel boot and is
> > patched in when ftrace is enabled.
> > 
> > There are two issues with this:
> > 1. The 'mflr r0' instruction at the beginning of each function remains
> > even though ftrace is not being used.
> > 2. When ftrace is activated, we return from ftrace_caller() with a bctr
> > instruction to preserve r0 and LR, resulting in the link stack
> > becoming unbalanced.
> > 
> > To address (1), we have tried to nop'out the 'mflr r0' instruction when
> > nop'ing out the call to ftrace_caller() and restoring it when enabling
> > ftrace. But, that required additional synchronization slowing down
> > ftrace activation. It also left an additional nop instruction at the
> > beginning of each function and that wasn't desirable on 32-bit powerpc.
> > 
> > Instead of that, move the function profile sequence out-of-line leaving
> > a single nop at function entry. On ftrace activation, the nop is changed
> > to an unconditional branch to the out-of-line sequence that in turn
> > calls ftrace_caller(). This removes the need for complex synchronization
> > during ftrace activation and simplifies the code. More importantly, this
> > improves performance of the kernel when ftrace is not in use.
> > 
> > To address (2), change the ftrace trampoline to return with a 'blr'
> > instruction with the original return address in r0 intact. Then, an
> > additional 'mtlr r0' instruction in the function profile sequence can
> > move the correct return address back to LR.
> > 
> > With the above two changes, the function profile sequence now looks like
> > the following:
> > 
> >   [func:# GEP -- 64-bit powerpc, optional
> > addis   r2,r12,imm1
> > addir2,r2,imm2]
> >tramp:
> > mflrr0
> > bl  ftrace_caller
> > mtlrr0
> > b   func
> > nop
> > [nop]   # 64-bit powerpc only
> >func:# LEP
> > nop
> > 
> > On 32-bit powerpc, the ftrace mcount trampoline is now completely
> > outside the function. This is also the case on 64-bit powerpc for
> > functions that do not need a GEP. However, for functions that need a
> > GEP, the additional instructions are inserted between the GEP and the
> > LEP. Since we can only have a fixed number of instructions between GEP
> > and LEP, we choose to emit 6 instructions. Four of those instructions
> > are used for the function profile sequence and two instruction slots are
> > reserved for implementing support for DYNAMIC_FTRACE_WITH_CALL_OPS. On
> > 32-bit powerpc, we emit one additional nop for this purpose resulting in
> > a total of 5 nops before function entry.
> > 
> > To enable ftrace, the nop at function entry is changed to an
> > unconditional branch to 'tramp'. The call to ftrace_caller() may be
> > updated to ftrace_regs_caller() depending on the registered ftrace ops.
> > On 64-bit powerpc, we additionally change the instruction at 'tramp' to
> > 'mflr r0' from an unconditional branch back to func+4. This is so that
> > functions entered through the GEP can skip the function profile sequence
> > unless ftrace is enabled.
> > 
> > With the context_switch microbenchmark on a P9 machine, there is a
> > performance improvement of ~6% with this patch applied, going from 650k
> > context switches to 690k context switches without ftrace enabled. With
> > ftrace enabled, the performance was similar at 86k context switches.
> 
> Wondering how significant that context_switch micorbenchmark is.
> 
> I ran it on both mpc885 and mpc8321 and I'm a bit puzzled by some of the 
> results:
> # ./context_switch --no-fp
> Using threads with yield on cpus 0/0 touching FP:no altivec:no vector:no 
> vdso:no
> 
> On 885, I get the following results before and after your patch.
> 
> CONFIG_FTRACE not selected : 44,9k
> CONFIG_FTRACE selected, before : 32,8k
> CONFIG_FTRACE selected, after : 33,6k
> 
> All this is with CONFIG_INIT_STACK_ALL_ZERO which is the default. But 
> when I select CONFIG_INIT_STACK_NONE, the CONFIG_FTRACE not selected 
> result is only 34,4.
> 

Re: [PATCH 1/2] powerpc/bpf: ensure module addresses are supported

2023-12-22 Thread Naveen N Rao
On Wed, Dec 20, 2023 at 10:26:21PM +0530, Hari Bathini wrote:
> Currently, bpf jit code on powerpc assumes all the bpf functions and
> helpers to be kernel text. This is false for kfunc case, as function
> addresses are mostly module addresses in that case. Ensure module
> addresses are supported to enable kfunc support.
> 
> This effectively reverts commit feb6307289d8 ("powerpc64/bpf: Optimize
> instruction sequence used for function calls") and commit 43d636f8b4fd
> ("powerpc64/bpf elfv1: Do not load TOC before calling functions") that
> assumed only kernel text for bpf functions/helpers.
> 
> Also, commit b10cb163c4b3 ("powerpc64/bpf elfv2: Setup kernel TOC in
> r2 on entry") that paved the way for the commits mentioned above is
> reverted.

Instead of that, can we detect kfunc calls and use a separate set of
instructions just for those?

Unless unavoidable, I would prefer to retain the existing optimal
sequence using TOC for calls to bpf kernel helpers, since those are a 
lot more common than kfunc.
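
Something along these lines, perhaps (a rough sketch; the helper below is
illustrative, but the verifier does mark kfunc call sites with
BPF_PSEUDO_KFUNC_CALL in insn->src_reg):

	#include <linux/bpf.h>
	#include <linux/filter.h>

	/* true for kfunc call sites, false for calls to bpf kernel helpers */
	static bool bpf_insn_is_kfunc_call(const struct bpf_insn *insn)
	{
		return insn->code == (BPF_JMP | BPF_CALL) &&
		       insn->src_reg == BPF_PSEUDO_KFUNC_CALL;
	}

The JIT could then emit the longer indirect call sequence only for kfunc
call sites and keep the TOC-based sequence for regular helpers.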

- Naveen



Re: [PATCH 07/13] powerpc/kprobes: Unpoison instruction in kprobe struct

2023-12-14 Thread Naveen N Rao
On Thu, Dec 14, 2023 at 05:55:33AM +, Nicholas Miehlbradt wrote:
> KMSAN does not unpoison the ainsn field of a kprobe struct correctly.
> Manually unpoison it to prevent false positives.
> 
> Signed-off-by: Nicholas Miehlbradt 
> ---
>  arch/powerpc/kernel/kprobes.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index b20ee72e873a..1cbec54f2b6a 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
>  DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
> @@ -179,6 +180,7 @@ int arch_prepare_kprobe(struct kprobe *p)
>  
>   if (!ret) {
>   patch_instruction(p->ainsn.insn, insn);
> + kmsan_unpoison_memory(p->ainsn.insn, sizeof(kprobe_opcode_t));

kprobe_opcode_t is u32, but we could be probing a prefixed instruction.  
You can pass the instruction length through ppc_inst_len(insn).
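
Something like the below, perhaps (sketch only, against the hunk quoted
above):

	if (!ret) {
		patch_instruction(p->ainsn.insn, insn);
		/* unpoison the full instruction, which may be prefixed */
		kmsan_unpoison_memory(p->ainsn.insn, ppc_inst_len(insn));
	}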


- Naveen


Re: [PATCH] MAINTAINERS: powerpc: Add Aneesh & Naveen

2023-12-13 Thread Naveen N. Rao

Michael Ellerman wrote:

Aneesh and Naveen are helping out with some aspects of upstream
maintenance, add them as reviewers.

Signed-off-by: Michael Ellerman 
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)


Acked-by: Naveen N. Rao 

Thanks,
Naveen



diff --git a/MAINTAINERS b/MAINTAINERS
index ea790149af79..562d048863ee 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12240,6 +12240,8 @@ LINUX FOR POWERPC (32-BIT AND 64-BIT)
 M: Michael Ellerman 
 R: Nicholas Piggin 
 R: Christophe Leroy 
+R: Aneesh Kumar K.V 
+R: Naveen N. Rao 
 L: linuxppc-dev@lists.ozlabs.org
 S: Supported
 W: https://github.com/linuxppc/wiki/wiki
--
2.43.0




[RFC PATCH 1/9] powerpc/ftrace: Fix indentation in ftrace.h

2023-12-08 Thread Naveen N Rao
Replace seven spaces with a tab character to fix an indentation issue
reported by the kernel test robot.

Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202311221731.aluwtdim-...@intel.com/
Signed-off-by: Naveen N Rao 
---
 arch/powerpc/include/asm/ftrace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 9e5a39b6a311..1ebd2ca97f12 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -25,7 +25,7 @@ static inline unsigned long ftrace_call_adjust(unsigned long 
addr)
if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
addr += MCOUNT_INSN_SIZE;
 
-   return addr;
+   return addr;
 }
 
 unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
-- 
2.43.0



[RFC PATCH 9/9] samples/ftrace: Add support for ftrace direct samples on powerpc

2023-12-08 Thread Naveen N Rao
Add powerpc 32-bit and 64-bit samples for ftrace direct. These serve to
show the instruction sequence that ftrace direct call trampolines need
to use to adhere to the ftrace ABI.

On 64-bit powerpc, TOC setup requires some additional work.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/Kconfig|   2 +
 samples/ftrace/ftrace-direct-modify.c   |  94 -
 samples/ftrace/ftrace-direct-multi-modify.c | 110 +++-
 samples/ftrace/ftrace-direct-multi.c|  64 +++-
 samples/ftrace/ftrace-direct-too.c  |  72 -
 samples/ftrace/ftrace-direct.c  |  61 ++-
 6 files changed, 398 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 4fe04fdca33a..28de3a5f3e98 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -274,6 +274,8 @@ config PPC
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RELIABLE_STACKTRACE
select HAVE_RSEQ
+   select HAVE_SAMPLE_FTRACE_DIRECTif 
HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+   select HAVE_SAMPLE_FTRACE_DIRECT_MULTI  if 
HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
select HAVE_SETUP_PER_CPU_AREA  if PPC64
select HAVE_SOFTIRQ_ON_OWN_STACK
select HAVE_STACKPROTECTOR  if PPC32 && 
$(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
diff --git a/samples/ftrace/ftrace-direct-modify.c 
b/samples/ftrace/ftrace-direct-modify.c
index e2a6a69352df..bd985035b937 100644
--- a/samples/ftrace/ftrace-direct-modify.c
+++ b/samples/ftrace/ftrace-direct-modify.c
@@ -2,7 +2,7 @@
 #include 
 #include 
 #include 
-#ifndef CONFIG_ARM64
+#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
 #include 
 #endif
 
@@ -164,6 +164,98 @@ asm (
 
 #endif /* CONFIG_LOONGARCH */
 
+#ifdef CONFIG_PPC32
+
+asm (
+"  .pushsection.text, \"ax\", @progbits\n"
+"  .type   my_tramp1, @function\n"
+"  .globl  my_tramp1\n"
+"   my_tramp1:"
+"  stw 0, 4(1)\n"
+"  stwu1, -16(1)\n"
+"  mflr0\n"
+"  stw 0, 4(1)\n"
+"  stwu1, -16(1)\n"
+"  bl  my_direct_func1\n"
+"  lwz 0, 20(1)\n"
+"  mtlr0\n"
+"  addi1, 1, 32\n"
+"  lwz 0, 4(1)\n"
+"  blr\n"
+"  .size   my_tramp1, .-my_tramp1\n"
+
+"  .type   my_tramp2, @function\n"
+"  .globl  my_tramp2\n"
+"   my_tramp2:"
+"  stw 0, 4(1)\n"
+"  stwu1, -16(1)\n"
+"  mflr0\n"
+"  stw 0, 4(1)\n"
+"  stwu1, -16(1)\n"
+"  bl  my_direct_func2\n"
+"  lwz 0, 20(1)\n"
+"  mtlr0\n"
+"  addi1, 1, 32\n"
+"  lwz 0, 4(1)\n"
+"  blr\n"
+"  .size   my_tramp2, .-my_tramp2\n"
+"  .popsection\n"
+);
+
+#endif /* CONFIG_PPC32 */
+
+#ifdef CONFIG_PPC64
+
+asm (
+"  .pushsection.text, \"ax\", @progbits\n"
+"  .type   my_tramp1, @function\n"
+"  .globl  my_tramp1\n"
+"   my_tramp1:"
+"  std 0, 16(1)\n"
+"  stdu1, -32(1)\n"
+"  mflr0\n"
+"  std 0, 16(1)\n"
+"  stdu1, -32(1)\n"
+"  std 2, 24(1)\n"
+"  bcl 20, 31, 1f\n"
+"   1: mflr12\n"
+"  ld  2, (2f - 1b)(12)\n"
+"  bl  my_direct_func1\n"
+"  ld  2, 24(1)\n"
+"  ld  0, 48(1)\n"
+"  mtlr0\n"
+"  addi1, 1, 64\n"
+"  ld  0, 16(1)\n"
+"  blr\n"
+"   2: .quad   .TOC.@tocbase\n"
+"  .size   my_tramp1, .-my_tramp1\n"
+
+"  .type   my_tramp2, @function\n"
+"  .globl  my_tramp2\n"
+"   my_tramp2:"
+"  std 0, 16(1)\n"
+"  stdu1, -32(1)\n"
+"  mflr0\n"
+"  std 0, 16(1)\n"
+"  stdu1, -32(1)\n"
+"  std 2, 24(1)\n"
+"  bcl 20, 31, 1f\n"
+"   1: mflr12\n"
+"  ld  2, (2f - 1b)(12)\n"
+"  bl   

[RFC PATCH 8/9] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS

2023-12-08 Thread Naveen N Rao
Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS similar to the arm64
implementation.

ftrace direct calls allow custom trampolines to be called into directly
from function ftrace call sites, bypassing the ftrace trampoline
completely. This functionality is currently utilized by BPF trampolines
to hook into kernel function entries.

Since the relative branch range is limited, we support ftrace direct
calls by building on DYNAMIC_FTRACE_WITH_CALL_OPS. In this approach,
the ftrace trampoline is not entirely bypassed. Rather, it is
re-purposed into a stub that reads the direct_call field from the
associated ftrace_ops structure and branches into that, if it is not
NULL. For this, it is sufficient to ensure that the ftrace trampoline
is reachable from all traceable functions.

When multiple ftrace_ops are associated with a call site, we utilize a
call back to set pt_regs->orig_gpr3 that can then be tested on the
return path from the ftrace trampoline to branch into the direct caller.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/ftrace.h| 15 
 arch/powerpc/kernel/asm-offsets.c|  3 +
 arch/powerpc/kernel/trace/ftrace_entry.S | 99 ++--
 4 files changed, 93 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c8ecc9dcc914..4fe04fdca33a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -235,6 +235,7 @@ config PPC
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_ARGSif 
ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if 
ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+   select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS if 
HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
select HAVE_DYNAMIC_FTRACE_WITH_REGSif 
ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
select HAVE_EBPF_JIT
select HAVE_EFFICIENT_UNALIGNED_ACCESS
diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index d9b99781bea3..986c4fffb9ec 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -93,6 +93,21 @@ struct ftrace_ops;
 #define ftrace_graph_func ftrace_graph_func
 void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
   struct ftrace_ops *op, struct ftrace_regs *fregs);
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+/*
+ * When an ftrace registered caller is tracing a function that is also set by a
+ * register_ftrace_direct() call, it needs to be differentiated in the
+ * ftrace_caller trampoline so that the direct call can be invoked after the
+ * other ftrace ops. To do this, place the direct caller in the orig_gpr3 field
+ * of pt_regs. This tells ftrace_caller that there's a direct caller.
+ */
+static inline void arch_ftrace_set_direct_caller(struct ftrace_regs *fregs, 
unsigned long addr)
+{
+   struct pt_regs *regs = &fregs->regs;
+   regs->orig_gpr3 = addr;
+}
+#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
 #endif
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 8b8a39b57a9f..85da10726d98 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -678,6 +678,9 @@ int main(void)
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
OFFSET(FTRACE_OPS_FUNC, ftrace_ops, func);
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+   OFFSET(FTRACE_OPS_DIRECT_CALL, ftrace_ops, direct_call);
+#endif
 #endif
 
return 0;
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S 
b/arch/powerpc/kernel/trace/ftrace_entry.S
index 4d1220c2e32f..ab60395fc34b 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -33,14 +33,57 @@
  * and then arrange for the ftrace function to be called.
  */
 .macro ftrace_regs_entry allregs
-   /* Save the original return address in A's stack frame */
-   PPC_STL r0, LRSAVE(r1)
/* Create a minimal stack frame for representing B */
PPC_STLUr1, -STACK_FRAME_MIN_SIZE(r1)
 
/* Create our stack frame + pt_regs */
PPC_STLUr1,-SWITCH_FRAME_SIZE(r1)
 
+   .if \allregs == 1
+   SAVE_GPRS(11, 12, r1)
+   .endif
+
+   /* Get the _mcount() call site out of LR */
+   mflrr11
+
+#ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+   /*
+* This points after the bl at 'mtlr r0', but this sequence could be
+* outside the function. Move this to point just after the ftrace
+* location inside the function for proper unwind.
+*/
+   addir11, r11, FTRACE_MCOUNT_TRAMP_OFFSET - MCOUNT_INSN_SIZE
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+   /* Load the ftrace_op */
+   PPC_LL  r12, -SZL-MCOUNT_INSN_SIZE(r11)
+
+#ifdef CONFIG

[RFC PATCH 7/9] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS

2023-12-08 Thread Naveen N Rao
Implement support for DYNAMIC_FTRACE_WITH_CALL_OPS similar to the
arm64 implementation.

This works by patching in a pointer to an associated ftrace_ops
structure before each traceable function. If multiple ftrace_ops are
associated with a call site, then a special ftrace_list_ops is used to
enable iterating over all the registered ftrace_ops. If no ftrace_ops
are associated with a call site, then a special ftrace_nop_ops structure
is used to render the ftrace call as a no-op. The ftrace trampoline can
then read the associated ftrace_ops for a call site by loading from an
offset from the LR, and branch directly to the associated function.

The primary advantage with this approach is that we don't have to
iterate over all the registered ftrace_ops for call sites that have a
single ftrace_ops registered. This is the equivalent of implementing
support for dynamic ftrace trampolines, which set up a special ftrace
trampoline for each registered ftrace_ops and have individual call sites
branch into those directly.

A secondary advantage is that this gives us a way to add support for
direct ftrace callers without having to resort to using stubs. The
address of the direct call trampoline can be loaded from the ftrace_ops
structure.

To support this, we utilize the space between the existing function
profile sequence and the function entry. During ftrace activation, we
update this location with the associated ftrace_ops pointer. Then, on
ftrace entry, we load from this location and call into
ftrace_ops->func().

For 64-bit powerpc, we also select FUNCTION_ALIGNMENT_8B so that the
ftrace_ops pointer is double word aligned and can be updated atomically.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/Kconfig |  2 +
 arch/powerpc/kernel/asm-offsets.c|  4 ++
 arch/powerpc/kernel/trace/ftrace.c   | 58 
 arch/powerpc/kernel/trace/ftrace_entry.S | 39 +++-
 4 files changed, 91 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 318e5c1b7454..c8ecc9dcc914 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -190,6 +190,7 @@ config PPC
select EDAC_SUPPORT
select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY if 
ARCH_USING_PATCHABLE_FUNCTION_ENTRY
select FUNCTION_ALIGNMENT_4B
+   select FUNCTION_ALIGNMENT_8Bif PPC64 && 
DYNAMIC_FTRACE_WITH_CALL_OPS
select GENERIC_ATOMIC64 if PPC32
select GENERIC_CLOCKEVENTS_BROADCASTif SMP
select GENERIC_CMOS_UPDATE
@@ -233,6 +234,7 @@ config PPC
select HAVE_DEBUG_STACKOVERFLOW
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_ARGSif 
ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
+   select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if 
ARCH_USING_PATCHABLE_FUNCTION_ENTRY
select HAVE_DYNAMIC_FTRACE_WITH_REGSif 
ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
select HAVE_EBPF_JIT
select HAVE_EFFICIENT_UNALIGNED_ACCESS
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 9f14d95b8b32..8b8a39b57a9f 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -676,5 +676,9 @@ int main(void)
DEFINE(BPT_SIZE, BPT_SIZE);
 #endif
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+   OFFSET(FTRACE_OPS_FUNC, ftrace_ops, func);
+#endif
+
return 0;
 }
diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index d3b4949142a8..af84eabf7912 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -124,6 +124,41 @@ static int ftrace_get_call_inst(struct dyn_ftrace *rec, 
unsigned long addr, ppc_
return 0;
 }
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+static const struct ftrace_ops *powerpc_rec_get_ops(struct dyn_ftrace *rec)
+{
+   const struct ftrace_ops *ops = NULL;
+
+   if (rec->flags & FTRACE_FL_CALL_OPS_EN) {
+   ops = ftrace_find_unique_ops(rec);
+   WARN_ON_ONCE(!ops);
+   }
+
+   if (!ops)
+   ops = &ftrace_list_ops;
+
+   return ops;
+}
+
+static int ftrace_rec_set_ops(const struct dyn_ftrace *rec, const struct 
ftrace_ops *ops)
+{
+   return patch_ulong((void *)(rec->ip - sizeof(unsigned long)), (unsigned 
long)ops);
+}
+
+static int ftrace_rec_set_nop_ops(struct dyn_ftrace *rec)
+{
+   return ftrace_rec_set_ops(rec, &ftrace_nop_ops);
+}
+
+static int ftrace_rec_update_ops(struct dyn_ftrace *rec)
+{
+   return ftrace_rec_set_ops(rec, powerpc_rec_get_ops(rec));
+}
+#else
+static int ftrace_rec_set_nop_ops(struct dyn_ftrace *rec) { return 0; }
+static int ftrace_rec_update_ops(struct dyn_ftrace *rec) { return 0; }
+#endif
+
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
 int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, 
unsigned long

[RFC PATCH 6/9] powerpc/ftrace: Update and move function profile instructions out-of-line

2023-12-08 Thread Naveen N Rao
The function profile sequence on powerpc includes two instructions at
the beginning of each function:

mflrr0
bl  ftrace_caller

The call to ftrace_caller() gets nop'ed out during kernel boot and is
patched in when ftrace is enabled.

There are two issues with this:
1. The 'mflr r0' instruction at the beginning of each function remains
   even though ftrace is not being used.
2. When ftrace is activated, we return from ftrace_caller() with a bctr
   instruction to preserve r0 and LR, resulting in the link stack
   becoming unbalanced.

To address (1), we have tried to nop out the 'mflr r0' instruction when
nop'ing out the call to ftrace_caller() and to restore it when enabling
ftrace. But that required additional synchronization, slowing down
ftrace activation. It also left an additional nop instruction at the
beginning of each function, which wasn't desirable on 32-bit powerpc.

Instead of that, move the function profile sequence out-of-line leaving
a single nop at function entry. On ftrace activation, the nop is changed
to an unconditional branch to the out-of-line sequence that in turn
calls ftrace_caller(). This removes the need for complex synchronization
during ftrace activation and simplifies the code. More importantly, this
improves performance of the kernel when ftrace is not in use.

To address (2), change the ftrace trampoline to return with a 'blr'
instruction with the original return address in r0 intact. Then, an
additional 'mtlr r0' instruction in the function profile sequence can
move the correct return address back to LR.

With the above two changes, the function profile sequence now looks like
the following:

 [func: # GEP -- 64-bit powerpc, optional
addis   r2,r12,imm1
addir2,r2,imm2]
  tramp:
mflrr0
bl  ftrace_caller
mtlrr0
b   func
nop
[nop]   # 64-bit powerpc only
  func: # LEP
nop

On 32-bit powerpc, the ftrace mcount trampoline is now completely
outside the function. This is also the case on 64-bit powerpc for
functions that do not need a GEP. However, for functions that need a
GEP, the additional instructions are inserted between the GEP and the
LEP. Since we can only have a fixed number of instructions between GEP
and LEP, we choose to emit 6 instructions. Four of those instructions
are used for the function profile sequence and two instruction slots are
reserved for implementing support for DYNAMIC_FTRACE_WITH_CALL_OPS. On
32-bit powerpc, we emit one additional nop for this purpose resulting in
a total of 5 nops before function entry.

To enable ftrace, the nop at function entry is changed to an
unconditional branch to 'tramp'. The call to ftrace_caller() may be
updated to ftrace_regs_caller() depending on the registered ftrace ops.
On 64-bit powerpc, we additionally change the instruction at 'tramp' to
'mflr r0' from an unconditional branch back to func+4. This is so that
functions entered through the GEP can skip the function profile sequence
unless ftrace is enabled.

With the context_switch microbenchmark on a P9 machine, there is a
performance improvement of ~6% with this patch applied, going from 650k
context switches to 690k context switches without ftrace enabled. With
ftrace enabled, the performance was similar at 86k context switches.

The downside of this approach is the increase in vmlinux size,
especially on 32-bit powerpc. We now emit 3 additional instructions for
each function (excluding the one or two instructions for supporting
DYNAMIC_FTRACE_WITH_CALL_OPS). On 64-bit powerpc with the current
implementation of -fpatchable-function-entry though, this is not
avoidable since we are forced to emit 6 instructions between the GEP and
the LEP even if we are to only support DYNAMIC_FTRACE_WITH_CALL_OPS.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/Makefile|   6 +-
 arch/powerpc/include/asm/code-patching.h |  15 ++-
 arch/powerpc/include/asm/ftrace.h|  18 ++-
 arch/powerpc/kernel/kprobes.c|  51 +++-
 arch/powerpc/kernel/trace/ftrace.c   | 149 ++-
 arch/powerpc/kernel/trace/ftrace_entry.S |  54 ++--
 6 files changed, 246 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index f19dbaa1d541..91ef34be8eb9 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -145,7 +145,11 @@ CFLAGS-$(CONFIG_PPC32) += $(call 
cc-option,-mno-readonly-in-sdata)
 ifdef CONFIG_FUNCTION_TRACER
 ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
 KBUILD_CPPFLAGS+= -DCC_USING_PATCHABLE_FUNCTION_ENTRY
-CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+ifdef CONFIG_PPC32
+CC_FLAGS_FTRACE := -fpatchable-function-entry=6,5
+else
+CC_FLAGS_FTRACE := -fpatchable-function-entry=7,6
+endif
 else
 CC_FLAGS_FTRACE := -pg
 ifdef CONFIG_MPROFILE_KERNEL
diff --

[RFC PATCH 5/9] powerpc/kprobes: Use ftrace to determine if a probe is at function entry

2023-12-08 Thread Naveen N Rao
Rather than hard-coding the offset into a function to be used to
determine if a kprobe is at function entry, use ftrace_location() to
determine the ftrace location within the function and categorize all
instructions up to that offset as being at function entry.

For functions that cannot be traced, we fall back to using a fixed
offset of 8 (two instructions) to categorize a probe as being at
function entry for 64-bit elfv2.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/kprobes.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index b20ee72e873a..42665dfab59e 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -105,24 +105,22 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, 
unsigned int offset)
return addr;
 }
 
-static bool arch_kprobe_on_func_entry(unsigned long offset)
+static bool arch_kprobe_on_func_entry(unsigned long addr, unsigned long offset)
 {
-#ifdef CONFIG_PPC64_ELF_ABI_V2
-#ifdef CONFIG_KPROBES_ON_FTRACE
-   return offset <= 16;
-#else
-   return offset <= 8;
-#endif
-#else
+   unsigned long ip = ftrace_location(addr);
+
+   if (ip)
+   return offset <= (ip - addr);
+   if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2))
+   return offset <= 8;
return !offset;
-#endif
 }
 
 /* XXX try and fold the magic of kprobe_lookup_name() in this */
 kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long 
offset,
 bool *on_func_entry)
 {
-   *on_func_entry = arch_kprobe_on_func_entry(offset);
+   *on_func_entry = arch_kprobe_on_func_entry(addr, offset);
return (kprobe_opcode_t *)(addr + offset);
 }
 
-- 
2.43.0



[RFC PATCH 4/9] powerpc/Kconfig: Select FUNCTION_ALIGNMENT_4B

2023-12-08 Thread Naveen N Rao
From: Sathvika Vasireddy 

Commit d49a0626216b95 ("arch: Introduce CONFIG_FUNCTION_ALIGNMENT")
introduced a generic function-alignment infrastructure. Move to using
FUNCTION_ALIGNMENT_4B on powerpc, to use the same alignment as that of
the existing _GLOBAL macro.

Signed-off-by: Sathvika Vasireddy 
---
 arch/powerpc/Kconfig   | 1 +
 arch/powerpc/include/asm/linkage.h | 3 ---
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 6f105ee4f3cf..318e5c1b7454 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -189,6 +189,7 @@ config PPC
select EDAC_ATOMIC_SCRUB
select EDAC_SUPPORT
	select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY if ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+   select FUNCTION_ALIGNMENT_4B
select GENERIC_ATOMIC64 if PPC32
	select GENERIC_CLOCKEVENTS_BROADCAST if SMP
select GENERIC_CMOS_UPDATE
diff --git a/arch/powerpc/include/asm/linkage.h b/arch/powerpc/include/asm/linkage.h
index b88d1d2cf304..b71b9582e754 100644
--- a/arch/powerpc/include/asm/linkage.h
+++ b/arch/powerpc/include/asm/linkage.h
@@ -4,9 +4,6 @@
 
 #include 
 
-#define __ALIGN	.align 2
-#define __ALIGN_STR	".align 2"
-
 #ifdef CONFIG_PPC64_ELF_ABI_V1
 #define cond_syscall(x) \
asm ("\t.weak " #x "\n\t.set " #x ", sys_ni_syscall\n"  \
-- 
2.43.0



[RFC PATCH 3/9] powerpc/ftrace: Remove nops after the call to ftrace_stub

2023-12-08 Thread Naveen N Rao
ftrace_stub is within the same CU, so there is no need for a subsequent
nop instruction.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/ftrace_entry.S | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
index 17d1ed3d0b40..244a1c7bb1e8 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -162,7 +162,6 @@ _GLOBAL(ftrace_regs_caller)
 .globl ftrace_regs_call
 ftrace_regs_call:
bl  ftrace_stub
-   nop
ftrace_regs_exit 1
 
 _GLOBAL(ftrace_caller)
@@ -171,7 +170,6 @@ _GLOBAL(ftrace_caller)
 .globl ftrace_call
 ftrace_call:
bl  ftrace_stub
-   nop
ftrace_regs_exit 0
 
 _GLOBAL(ftrace_stub)
-- 
2.43.0



[RFC PATCH 2/9] powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code

2023-12-08 Thread Naveen N Rao
On 32-bit powerpc, gcc generates a three instruction sequence for
function profiling:
	mflr	r0
stw r0, 4(r1)
bl  _mcount

On kernel boot, the call to _mcount() is nop-ed out, to be patched back
in when ftrace is actually enabled. The 'stw' instruction therefore is
not necessary unless ftrace is enabled. Nop it out during ftrace init.

When ftrace is enabled, we want the 'stw' so that stack unwinding works
properly. Perform the same within the ftrace handler, similar to 64-bit
powerpc.

For 64-bit powerpc, early versions of gcc used to emit a three
instruction sequence for function profiling (with -mprofile-kernel) with
a 'std' instruction to mimic the 'stw' above. Address that scenario also
by nop-ing out the 'std' instruction during ftrace init.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/ftrace.c   | 6 --
 arch/powerpc/kernel/trace/ftrace_entry.S | 4 ++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 82010629cf88..2956196c98ff 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -229,13 +229,15 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
/* Expected sequence: 'mflr r0', 'stw r0,4(r1)', 'bl _mcount' */
ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
if (!ret)
-   ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)));
+   ret = ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)),
+				 ppc_inst(PPC_RAW_NOP()));
} else if (IS_ENABLED(CONFIG_MPROFILE_KERNEL)) {
/* Expected sequence: 'mflr r0', ['std r0,16(r1)'], 'bl _mcount' */
ret = ftrace_read_inst(ip - 4, &old);
if (!ret && !ppc_inst_equal(old, ppc_inst(PPC_RAW_MFLR(_R0)))) {
ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
-   ret |= ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)));
+   ret |= ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)),
+				  ppc_inst(PPC_RAW_NOP()));
}
} else {
return -EINVAL;
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
index 40677416d7b2..17d1ed3d0b40 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -33,6 +33,8 @@
  * and then arrange for the ftrace function to be called.
  */
 .macro ftrace_regs_entry allregs
+   /* Save the original return address in A's stack frame */
+   PPC_STL r0, LRSAVE(r1)
/* Create a minimal stack frame for representing B */
	PPC_STLU	r1, -STACK_FRAME_MIN_SIZE(r1)
 
@@ -44,8 +46,6 @@
SAVE_GPRS(3, 10, r1)
 
 #ifdef CONFIG_PPC64
-   /* Save the original return address in A's stack frame */
-   std r0, LRSAVE+SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE(r1)
/* Ok to continue? */
lbz r3, PACA_FTRACE_ENABLED(r13)
cmpdi   r3, 0
-- 
2.43.0



[RFC PATCH 0/9] powerpc: ftrace updates

2023-12-08 Thread Naveen N Rao
Early RFC.

This series attempts to address a couple of issues with the existing 
support for ftrace on powerpc, with a view towards improving performance 
when ftrace is not enabled. See patch 6 for more details.

Patches 7 and 8 implement support for ftrace direct calls, through 
adding support for DYNAMIC_FTRACE_WITH_CALL_OPS.

The first 5 patches are minor cleanups and updates, and can go in 
separately.

This series depends on Benjamin Gray's series adding support for 
patch_ulong().

I have lightly tested this patch set and it looks to be working well. As 
described in patch 6, the context_switch microbenchmark shows an improvement 
of ~6% with this series when ftrace is disabled. Performance when ftrace is
enabled drops, due to how DYNAMIC_FTRACE_WITH_CALL_OPS works and due 
to the support for direct calls. Some of that can hopefully be improved, if 
this approach is otherwise ok.

- Naveen



Naveen N Rao (8):
  powerpc/ftrace: Fix indentation in ftrace.h
  powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code
  powerpc/ftrace: Remove nops after the call to ftrace_stub
  powerpc/kprobes: Use ftrace to determine if a probe is at function
entry
  powerpc/ftrace: Update and move function profile instructions
out-of-line
  powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS
  powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS
  samples/ftrace: Add support for ftrace direct samples on powerpc

Sathvika Vasireddy (1):
  powerpc/Kconfig: Select FUNCTION_ALIGNMENT_4B

 arch/powerpc/Kconfig|   6 +
 arch/powerpc/Makefile   |   6 +-
 arch/powerpc/include/asm/code-patching.h|  15 +-
 arch/powerpc/include/asm/ftrace.h   |  35 ++-
 arch/powerpc/include/asm/linkage.h  |   3 -
 arch/powerpc/kernel/asm-offsets.c   |   7 +
 arch/powerpc/kernel/kprobes.c   |  69 +-
 arch/powerpc/kernel/trace/ftrace.c  | 231 
 arch/powerpc/kernel/trace/ftrace_entry.S| 182 +++
 samples/ftrace/ftrace-direct-modify.c   |  94 +++-
 samples/ftrace/ftrace-direct-multi-modify.c | 110 +-
 samples/ftrace/ftrace-direct-multi.c|  64 +-
 samples/ftrace/ftrace-direct-too.c  |  72 +-
 samples/ftrace/ftrace-direct.c  |  61 +-
 14 files changed, 845 insertions(+), 110 deletions(-)


base-commit: 9a15ae60f2c9707433b01e55815cd9142be102b2
prerequisite-patch-id: 38d3e705bf2e27cfa5e3ba369a6ded84ba6615c2
prerequisite-patch-id: 609d292e054b2396b603890522a940fa0bdfb6d8
prerequisite-patch-id: 6f7213fb77b1260defbf43be0e47bff9c80054cc
prerequisite-patch-id: f2328625ae2193c3c8e336b154b62030940cece8
-- 
2.43.0



Re: [PATCH v2 2/3] powerpc/64: Convert patch_instruction() to patch_u32()

2023-11-30 Thread Naveen N Rao
On Mon, Oct 16, 2023 at 04:01:46PM +1100, Benjamin Gray wrote:
> This use of patch_instruction() is working on 32 bit data, and can fail
> if the data looks like a prefixed instruction and the extra write
> crosses a page boundary. Use patch_u32() to fix the write size.
> 
> Fixes: 8734b41b3efe ("powerpc/module_64: Fix livepatching for RO modules")
> Link: https://lore.kernel.org/all/20230203004649.1f59dbd4@yea/
> Signed-off-by: Benjamin Gray 
> 
> ---
> 
> v2: * Added the fixes tag, it seems appropriate even if the subject does
>   mention a more robust solution being required.
> 
> patch_u64() should be more efficient, but judging from the bug report
> it doesn't seem like the data is doubleword aligned.

That doesn't look to be the case anymore due to commits 77e69ee7ce07 
("powerpc/64: modules support building with PCREL addresing") and 
7e3a68be42e1 ("powerpc/64: vmlinux support building with PCREL 
addresing")

- Naveen



Re: [PATCH v2 1/3] powerpc/code-patching: Add generic memory patching

2023-11-30 Thread Naveen N Rao
On Mon, Oct 16, 2023 at 04:01:45PM +1100, Benjamin Gray wrote:
> patch_instruction() is designed for patching instructions in otherwise
> readonly memory. Other consumers also sometimes need to patch readonly
> memory, so have abused patch_instruction() for arbitrary data patches.
> 
> This is a problem on ppc64 as patch_instruction() decides on the patch
> width using the 'instruction' opcode to see if it's a prefixed
> instruction. Data that triggers this can lead to larger writes, possibly
> crossing a page boundary and failing the write altogether.
> 
> Introduce patch_uint(), and patch_ulong(), with aliases patch_u32(), and
> patch_u64() (on ppc64) designed for aligned data patches. The patch
> size is now determined by the called function, and is passed as an
> additional parameter to generic internals.
> 
> While the instruction flushing is not required for data patches, the
> use cases for data patching (mainly module loading and static calls)
> are less performance sensitive than for instruction patching
> (ftrace activation).

That's debatable. While it is nice to be able to activate function 
tracing quickly, it is not necessarily a hot path. On the flip side, I 
do have a use case for data patching for ftrace activation :)

> So the instruction flushing remains unconditional
> in this patch.
> 
> ppc32 does not support prefixed instructions, so is unaffected by the
> original issue. Care is taken in not exposing the size parameter in the
> public (non-static) interface, so the compiler can const-propagate it
> away.
> 
> Signed-off-by: Benjamin Gray 
> 
> ---
> 
> v2: * Deduplicate patch_32() definition
> * Use u32 for val32
> * Remove noinline
> ---
>  arch/powerpc/include/asm/code-patching.h | 33 
>  arch/powerpc/lib/code-patching.c | 66 ++--
>  2 files changed, 83 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
> index 3f881548fb61..7c6056bb1706 100644
> --- a/arch/powerpc/include/asm/code-patching.h
> +++ b/arch/powerpc/include/asm/code-patching.h
> @@ -75,6 +75,39 @@ int patch_branch(u32 *addr, unsigned long target, int flags);
>  int patch_instruction(u32 *addr, ppc_inst_t instr);
>  int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
>  
> +/*
> + * patch_uint() and patch_ulong() should only be called on addresses where the
> + * patch does not cross a cacheline, otherwise it may not be flushed properly
> + * and mixes of new and stale data may be observed. It cannot cross a page
> + * boundary, as only the target page is mapped as writable.

Should we enforce alignment requirements, especially for patch_ulong() 
on 64-bit powerpc? I am not sure if there are use cases for unaligned 
64-bit writes. That should also ensure that the write doesn't cross a 
cacheline.
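
If we do want to enforce it, something along these lines at the top of the
64-bit patch_ulong()/__patch_memory() path could work (untested sketch,
assuming we would rather reject misaligned addresses than WARN):

	/* Reject misaligned addresses; this also keeps the write within a cacheline */
	if (!IS_ALIGNED((unsigned long)addr, sizeof(unsigned long)))
		return -EINVAL;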

> + *
> + * patch_instruction() and other instruction patchers automatically satisfy this
> + * requirement due to instruction alignment requirements.
> + */
> +
> +#ifdef CONFIG_PPC64
> +
> +int patch_uint(void *addr, unsigned int val);
> +int patch_ulong(void *addr, unsigned long val);
> +
> +#define patch_u64 patch_ulong
> +
> +#else
> +
> +static inline int patch_uint(u32 *addr, unsigned int val)

Is there a reason to use u32 * here?
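If it is just to mirror patch_instruction(), something like the below would
keep it consistent with the void * prototypes used on 64-bit (untested
sketch):

	static inline int patch_uint(void *addr, unsigned int val)
	{
		return patch_instruction(addr, ppc_inst(val));
	}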

> +{
> + return patch_instruction(addr, ppc_inst(val));
> +}
> +
> +static inline int patch_ulong(void *addr, unsigned long val)
> +{
> + return patch_instruction(addr, ppc_inst(val));
> +}
> +
> +#endif
> +
> +#define patch_u32 patch_uint
> +
>  static inline unsigned long patch_site_addr(s32 *site)
>  {
>   return (unsigned long)site + *site;
> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> index b00112d7ad46..60289332412f 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -20,15 +20,14 @@
>  #include 
>  #include 
>  
> -static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 *patch_addr)
> +static int __patch_memory(void *exec_addr, unsigned long val, void *patch_addr,
> +   bool is_dword)
>  {
> - if (!ppc_inst_prefixed(instr)) {
> - u32 val = ppc_inst_val(instr);
> + if (!IS_ENABLED(CONFIG_PPC64) || likely(!is_dword)) {
> + u32 val32 = val;

Would be good to add a comment indicating the need for this for BE.
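Something like the below, perhaps (just a sketch of the intent, and it
assumes the subsequent 4-byte store goes through the address of this
local):

	/*
	 * Copy into a u32 first: on BE, a 4-byte store taken directly from
	 * the address of the unsigned long would pick up the (zero) high
	 * word instead of the intended value.
	 */
	u32 val32 = val;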

- Naveen



[PATCH] powerpc/ftrace: Fix stack teardown in ftrace_no_trace

2023-11-29 Thread Naveen N Rao
Commit 41a506ef71eb ("powerpc/ftrace: Create a dummy stackframe to fix
stack unwind") added use of a new stack frame on ftrace entry to fix
stack unwind. However, the commit missed updating the offset used while
tearing down the ftrace stack when ftrace is disabled. Fix the same.

In addition, the commit missed saving the correct stack pointer in
pt_regs. Update the same.

Fixes: 41a506ef71eb ("powerpc/ftrace: Create a dummy stackframe to fix stack unwind")
Cc: sta...@vger.kernel.org
Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/ftrace_entry.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
index 90701885762c..40677416d7b2 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -62,7 +62,7 @@
.endif
 
/* Save previous stack pointer (r1) */
-   addi r8, r1, SWITCH_FRAME_SIZE
+   addi r8, r1, SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE
PPC_STL r8, GPR1(r1)
 
.if \allregs == 1
@@ -182,7 +182,7 @@ ftrace_no_trace:
mflr r3
mtctr   r3
REST_GPR(3, r1)
-   addi r1, r1, SWITCH_FRAME_SIZE
+   addi r1, r1, SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE
mtlr r0
bctr
 #endif

base-commit: 9a15ae60f2c9707433b01e55815cd9142be102b2
-- 
2.43.0



Re: [PATCH] powerpc/lib: Avoid array bounds warnings in vec ops

2023-11-24 Thread Naveen N Rao
On Thu, Nov 23, 2023 at 09:17:54AM -0600, Gustavo A. R. Silva wrote:
> 
> > > To be honest I don't know how paranoid we want to get, we could end up
> > > putting WARN's all over the kernel :)
> > > 
> > > In this case I guess if the size is too large we overflow the buffer on
> > > the kernel stack, so we should at least check the size.
> > > 
> > > But does it need a WARN? I'm not sure. If we had a case that was passing
> > > a out-of-bound size hopefully we would notice in testing? :)
> > 
> > You're right, a simpler check should suffice. I will send an updated
> > patch.
> 
> This[1] patch indeed also makes those -Wstringop-overflow warnings go away. :)
> 
> I'm not subscribed to the list but here are my
> 
> Reviewed-by: Gustavo A. R. Silva 
> Build-tested-by: Gustavo A. R. Silva 

Thanks for testing. I intended my patch to go atop Michael's patch since 
do_fp_load()/do_fp_store() also clamp down the size passed to 
do_byte_reverse(). While the use of min() isn't strictly necessary with 
the added check for 'size' at the beginning of the function, it doesn't 
hurt to have it and Michael's patch does have a better description for 
the change :)


- Naveen



[PATCH] powerpc/lib: Validate size for vector operations

2023-11-22 Thread Naveen N Rao
Some of the fp/vmx code in sstep.c assumes a certain maximum size for the
instructions being emulated. The size of those operations however is
determined separately in analyse_instr().

Add a check to validate the assumption on the maximum size of the
operations, so as to prevent any unintended kernel stack corruption.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/lib/sstep.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index a13f05cfc7db..5766180f5380 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -586,6 +586,8 @@ static int do_fp_load(struct instruction_op *op, unsigned long ea,
} u;
 
nb = GETSIZE(op->type);
+   if (nb > sizeof(u))
+   return -EINVAL;
if (!address_ok(regs, ea, nb))
return -EFAULT;
rn = op->reg;
@@ -636,6 +638,8 @@ static int do_fp_store(struct instruction_op *op, unsigned long ea,
} u;
 
nb = GETSIZE(op->type);
+   if (nb > sizeof(u))
+   return -EINVAL;
if (!address_ok(regs, ea, nb))
return -EFAULT;
rn = op->reg;
@@ -680,6 +684,9 @@ static nokprobe_inline int do_vec_load(int rn, unsigned long ea,
u8 b[sizeof(__vector128)];
} u = {};
 
+   if (size > sizeof(u))
+   return -EINVAL;
+
if (!address_ok(regs, ea & ~0xfUL, 16))
return -EFAULT;
/* align to multiple of size */
@@ -707,6 +714,9 @@ static nokprobe_inline int do_vec_store(int rn, unsigned long ea,
u8 b[sizeof(__vector128)];
} u;
 
+   if (size > sizeof(u))
+   return -EINVAL;
+
if (!address_ok(regs, ea & ~0xfUL, 16))
return -EFAULT;
/* align to multiple of size */

base-commit: 275f51172646ac48f0c4e690c72183084fd996d1
prerequisite-patch-id: ebc3edfe2b9fce7bdf27098c8631740153249b06
-- 
2.42.0



Re: [PATCH] powerpc/lib: Avoid array bounds warnings in vec ops

2023-11-22 Thread Naveen N Rao
On Wed, Nov 22, 2023 at 03:44:07PM +1100, Michael Ellerman wrote:
> Naveen N Rao  writes:
> > On Tue, Nov 21, 2023 at 10:54:36AM +1100, Michael Ellerman wrote:
> >> Building with GCC 13 (which has -array-bounds enabled) there are several
> >
> > Thanks, gcc13 indeed helps reproduce the warnings.
> 
> Actually that part is no longer true since 0da6e5fd6c37 ("gcc: disable
> '-Warray-bounds' for gcc-13 too").
> 
> >> warnings in sstep.c along the lines of:
> >> 
> >>   In function ‘do_byte_reverse’,
> >>   inlined from ‘do_vec_load’ at arch/powerpc/lib/sstep.c:691:3,
> >>   inlined from ‘emulate_loadstore’ at arch/powerpc/lib/sstep.c:3439:9:
> >>   arch/powerpc/lib/sstep.c:289:23: error: array subscript 2 is outside array bounds of ‘u8[16]’ {aka ‘unsigned char[16]’} [-Werror=array-bounds=]
> >> 289 | up[2] = byterev_8(up[1]);
> >> | ~~^~
> >>   arch/powerpc/lib/sstep.c: In function ‘emulate_loadstore’:
> >>   arch/powerpc/lib/sstep.c:681:11: note: at offset 16 into object ‘u’ of size 16
> >> 681 | } u = {};
> >> |   ^
> >> 
> >> do_byte_reverse() supports a size up to 32 bytes, but in these cases the
> >> caller is only passing a 16 byte buffer. In practice there is no bug,
> >> do_vec_load() is only called from the LOAD_VMX case in emulate_loadstore().
> >> That in turn is only reached when analyse_instr() recognises VMX ops,
> >> and in all cases the size is no greater than 16:
> >> 
> >>   $ git grep -w LOAD_VMX arch/powerpc/lib/sstep.c
> >>   arch/powerpc/lib/sstep.c:op->type = MKOP(LOAD_VMX, 0, 1);
> >>   arch/powerpc/lib/sstep.c:op->type = MKOP(LOAD_VMX, 0, 2);
> >>   arch/powerpc/lib/sstep.c:op->type = MKOP(LOAD_VMX, 0, 4);
> >>   arch/powerpc/lib/sstep.c:op->type = MKOP(LOAD_VMX, 0, 16);
> >> 
> >> Similarly for do_vec_store().
> >> 
> >> Although the warning is incorrect, the code would be safer if it clamped
> >> the size from the caller to the known size of the buffer. Do that using
> >> min_t().
> >
> > But, do_vec_load() and do_vec_store() assume that the maximum size is 16 
> > (the address_ok() check as an example). So, should we be considering a 
> > bigger hammer to help detect future incorrect use?
> 
> Yeah true.
> 
> To be honest I don't know how paranoid we want to get, we could end up
> putting WARN's all over the kernel :)
> 
> In this case I guess if the size is too large we overflow the buffer on
> the kernel stack, so we should at least check the size.
> 
> But does it need a WARN? I'm not sure. If we had a case that was passing
> a out-of-bound size hopefully we would notice in testing? :)

You're right, a simpler check should suffice. I will send an updated 
patch.

Thanks,
Naveen


