Re: [PATCH RFC] rethook: inline arch_rethook_trampoline_callback() in assembly code

2024-04-29 Thread Andrii Nakryiko
On Wed, Apr 24, 2024 at 5:02 PM Andrii Nakryiko  wrote:
>
> At the lowest level, rethook-based kretprobes on x86-64 architecture go
> through arch_rethook_trampoline(), a function manually written in
> assembly, which calls into a simple arch_rethook_trampoline_callback()
> function, written in C, and only doing a few straightforward field
> assignments, before calling further into rethook_trampoline_handler(),
> which handles kretprobe callbacks generically.
>
> Looking at the simplicity of arch_rethook_trampoline_callback(), it
> hardly seems worthwhile to spend an extra function call just to do 4 or
> 5 assignments. As such, this patch proposes to "inline"
> arch_rethook_trampoline_callback() into arch_rethook_trampoline() by
> manually implementing it in assembly.
>
> This has two motivations. First, we get a bit of runtime speed-up by
> avoiding a function call. Using BPF selftests' bench tool, we see
> 0.6%-0.8% throughput improvement for kretprobe/multi-kretprobe
> triggering code path:
>
> BEFORE (latest probes/for-next)
> ===
> kretprobe  :   10.455 ± 0.024M/s
> kretprobe-multi:   11.150 ± 0.012M/s
>
> AFTER (probes/for-next + this patch)
> ===
> kretprobe  :   10.540 ± 0.009M/s (+0.8%)
> kretprobe-multi:   11.219 ± 0.042M/s (+0.6%)
>
> Second, and no less importantly for some specialized use cases, this
> avoids unnecessarily "polluting" LBR records with an extra function call
> (recorded as a jump by CPU). This is the case for the retsnoop ([0])
> tool, which relies heavily on capturing LBR records to provide users with
> lots of insight into kernel internals.
>
> This RFC patch is only inlining this function for x86-64, but it's
> possible to do that for 32-bit x86 arch as well and then remove
> arch_rethook_trampoline_callback() implementation altogether. Please let
> me know if this change is acceptable and whether I should complete it
> with 32-bit "inlining" as well. Thanks!
>
>   [0] https://nakryiko.com/posts/retsnoop-intro/#peering-deep-into-functions-with-lbr
>
> Signed-off-by: Andrii Nakryiko 
> ---
>  arch/x86/kernel/asm-offsets_64.c |  4 
>  arch/x86/kernel/rethook.c| 37 +++-
>  2 files changed, 36 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
> index bb65371ea9df..5c444abc540c 100644
> --- a/arch/x86/kernel/asm-offsets_64.c
> +++ b/arch/x86/kernel/asm-offsets_64.c
> @@ -42,6 +42,10 @@ int main(void)
> ENTRY(r14);
> ENTRY(r15);
> ENTRY(flags);
> +   ENTRY(ip);
> +   ENTRY(cs);
> +   ENTRY(ss);
> +   ENTRY(orig_ax);
> BLANK();
>  #undef ENTRY
>
> diff --git a/arch/x86/kernel/rethook.c b/arch/x86/kernel/rethook.c
> index 8a1c0111ae79..3e1c01beebd1 100644
> --- a/arch/x86/kernel/rethook.c
> +++ b/arch/x86/kernel/rethook.c
> @@ -6,6 +6,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include "kprobes/common.h"
>
> @@ -34,10 +35,36 @@ asm(
> "   pushq %rsp\n"
> "   pushfq\n"
> SAVE_REGS_STRING
> -   "   movq %rsp, %rdi\n"
> -   "   call arch_rethook_trampoline_callback\n"
> +   "   movq %rsp, %rdi\n" /* $rdi points to regs */
> +   /* fixup registers */
> +   /* regs->cs = __KERNEL_CS; */
> +   "   movq $" __stringify(__KERNEL_CS) ", " __stringify(pt_regs_cs) "(%rdi)\n"
> +   /* regs->ip = (unsigned long)&arch_rethook_trampoline; */
> +   "   movq $arch_rethook_trampoline, " __stringify(pt_regs_ip) "(%rdi)\n"
> +   /* regs->orig_ax = ~0UL; */
> +   "   movq $0xffffffffffffffff, " __stringify(pt_regs_orig_ax) "(%rdi)\n"
> +   /* regs->sp += 2*sizeof(long); */
> +   "   addq $16, " __stringify(pt_regs_sp) "(%rdi)\n"
> +   /* 2nd arg is frame_pointer = (long *)(regs + 1); */
> +   "   lea " __stringify(PTREGS_SIZE) "(%rdi), %rsi\n"

BTW, all this __stringify() ugliness can be avoided if we move this
assembly into its own .S file, like lots of other assembly functions
in the arch/x86/kernel subdir. That has the additional benefit of
generating better line information in DWARF for those assembly
instructions. It's a lot more work, though, so before I do that, I'd
like to get confirmation that this change is acceptable in principle.
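For illustration only, a rough sketch of what the standalone .S variant
could look like, where the asm-offsets constants are usable directly
without __stringify() (file name, register save/restore, and unwind
annotations are assumptions, not part of this patch):

```
/* hypothetical arch/x86/kernel/rethook_trampoline.S fragment */
#include <linux/linkage.h>
#include <asm/asm-offsets.h>
#include <asm/segment.h>

SYM_CODE_START(arch_rethook_trampoline)
	/* ... push/save registers as in the inline-asm version ... */
	movq	%rsp, %rdi			/* %rdi points to regs */
	movq	$__KERNEL_CS, pt_regs_cs(%rdi)	/* regs->cs = __KERNEL_CS */
	movq	$arch_rethook_trampoline, pt_regs_ip(%rdi)
	movq	$-1, pt_regs_orig_ax(%rdi)	/* regs->orig_ax = ~0UL */
	addq	$16, pt_regs_sp(%rdi)		/* regs->sp += 2*sizeof(long) */
	lea	PTREGS_SIZE(%rdi), %rsi		/* frame_pointer = regs + 1 */
	call	rethook_trampoline_handler
	/* ... restore registers and return ... */
SYM_CODE_END(arch_rethook_trampoline)
```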

> +   /*
> +* The return address at 'frame_pointer' is recovered by
> +* arch_rethook_fixup_return(), which is called from
> +* rethook_trampoline_handler().
> +*/
> +   "   call rethook_trampoline_handler\n"
> +   /*
> +* Copy FLAGS to 'pt_regs::ss' so we can do RET right after POPF.
> +*
> +* We don't save/restore %rax below, because we ignore
> +* rethook_trampoline_handler result.
> +*
> +* *(unsigned long *)®s->ss = regs->flags;
> +*/
> +   "   mov " __stringify(pt_regs_flags) "(%rsp), %r

Re: [PATCH bpf-next] bpf: add support to read cpu_entry in bpf program

2024-04-29 Thread Yonghong Song



On 4/27/24 8:18 AM, Florian Lehner wrote:

Add a new field "cpu_entry" to bpf_perf_event_data, which can be read by
bpf programs attached to perf events. The value contains the CPU number
recorded when sample_type includes PERF_SAMPLE_CPU in the
perf_event_open() call.


You can use the bpf_cast_to_kern_ctx() kfunc, which casts the
'struct bpf_perf_event_data' ctx to 'struct bpf_perf_event_data_kern'.

struct bpf_perf_event_data_kern {
bpf_user_pt_regs_t *regs;
struct perf_sample_data *data;
struct perf_event *event;
};

You can then access bpf_perf_event_data_kern->data and read the
'cpu_entry' field from there.
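A minimal sketch of that approach on the BPF side (assuming a vmlinux.h
generated from a kernel whose 'struct perf_sample_data' has the
cpu_entry field, and that the running kernel exposes the
bpf_cast_to_kern_ctx kfunc; program and variable names are mine):

```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

/* kfunc resolved against kernel BTF at load time */
extern void *bpf_cast_to_kern_ctx(void *ctx) __ksym;

SEC("perf_event")
int read_sample_cpu(struct bpf_perf_event_data *ctx)
{
	struct bpf_perf_event_data_kern *kctx = bpf_cast_to_kern_ctx(ctx);
	struct perf_sample_data *data;
	u32 cpu;

	data = BPF_CORE_READ(kctx, data);
	/* cpu_entry is only populated if PERF_SAMPLE_CPU was requested */
	cpu = BPF_CORE_READ(data, cpu_entry, cpu);
	bpf_printk("sample on cpu %u", cpu);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```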



Signed-off-by: Florian Lehner 
---
  include/uapi/linux/bpf_perf_event.h   |  4 
  kernel/trace/bpf_trace.c  | 13 +
  tools/include/uapi/linux/bpf_perf_event.h |  4 
  3 files changed, 21 insertions(+)

diff --git a/include/uapi/linux/bpf_perf_event.h b/include/uapi/linux/bpf_perf_event.h
index eb1b9d21250c..4856b4396ece 100644
--- a/include/uapi/linux/bpf_perf_event.h
+++ b/include/uapi/linux/bpf_perf_event.h
@@ -14,6 +14,10 @@ struct bpf_perf_event_data {
bpf_user_pt_regs_t regs;
__u64 sample_period;
__u64 addr;
+   struct {
+   u32 cpu;
+   u32 reserved;
+   }   cpu_entry;
  };
  
  #endif /* _UAPI__LINUX_BPF_PERF_EVENT_H__ */

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index afb232b1d7c2..2b303221af5c 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2176,6 +2176,11 @@ static bool pe_prog_is_valid_access(int off, int size, enum bpf_access_type type
if (!bpf_ctx_narrow_access_ok(off, size, size_u64))
return false;
break;
+   case bpf_ctx_range(struct bpf_perf_event_data, cpu_entry):
+   bpf_ctx_record_field_size(info, size_u64);
+   if (!bpf_ctx_narrow_access_ok(off, size, size_u64))
+   return false;
+   break;
default:
if (size != sizeof(long))
return false;
@@ -2208,6 +2213,14 @@ static u32 pe_prog_convert_ctx_access(enum bpf_access_type type,
  bpf_target_off(struct perf_sample_data, addr, 8,
 target_size));
break;
+   case offsetof(struct bpf_perf_event_data, cpu_entry):
+   *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_perf_event_data_kern,
+  data), si->dst_reg, si->src_reg,
+ offsetof(struct bpf_perf_event_data_kern, data));
+   *insn++ = BPF_LDX_MEM(BPF_DW, si->dst_reg, si->dst_reg,
+ bpf_target_off(struct perf_sample_data, cpu_entry, 8,
+target_size));
+   break;
default:
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_perf_event_data_kern,
   regs), si->dst_reg, si->src_reg,
diff --git a/tools/include/uapi/linux/bpf_perf_event.h b/tools/include/uapi/linux/bpf_perf_event.h
index eb1b9d21250c..4856b4396ece 100644
--- a/tools/include/uapi/linux/bpf_perf_event.h
+++ b/tools/include/uapi/linux/bpf_perf_event.h
@@ -14,6 +14,10 @@ struct bpf_perf_event_data {
bpf_user_pt_regs_t regs;
__u64 sample_period;
__u64 addr;
+   struct {
+   u32 cpu;
+   u32 reserved;
+   }   cpu_entry;
  };
  
  #endif /* _UAPI__LINUX_BPF_PERF_EVENT_H__ */




Re: [PATCH 5.15,5.10,5.4,4.19 0/2] Fix warning when tracing with large filenames

2024-04-29 Thread Greg KH
On Wed, Apr 24, 2024 at 07:20:07PM -0300, Thadeu Lima de Souza Cascardo wrote:
> The warning described in the patch "tracing: Increase PERF_MAX_TRACE_SIZE to
> handle Sentinel1 and docker together" can be triggered with a perf probe on
> do_execve with a large path. As PATH_MAX is larger than PERF_MAX_TRACE_SIZE
> (2048 before the patch), the warning will trigger.
> 
> The fix was included in 5.16, so this backports it to 5.15 and earlier LTS
> kernels. Also included is a patch that better describes the attempted
> allocation size.

All now queued up, thanks.

greg k-h