nt using
"perf probe" tool.
Thank you,
>
> I don't know if "common" is the right question here, because it's a
> chicken-egg problem: no tracepoint, we give up; we have the
> tracepoint, it unlocks a range of new use cases (that require robust
> solution to make BPF programs exec-aware, and a tracepoint is the only
> option IMHO).
>
> Thanks,
> -- Marco
--
Masami Hiramatsu (Google)
On Tue, 9 Apr 2024 15:13:09 -0400
Steven Rostedt wrote:
> From: "Steven Rostedt (Google)"
>
> The "buffer_percent" logic that is used by the ring buffer splice code to
> only wake up the tasks when there's no data after the buffer is filled to
> the perc
rethook
> infrastructure) runtime throughput by 2%, according to BPF benchmarks ([0]).
>
> [0]
> https://lore.kernel.org/bpf/caef4bzauq2wkmjzdc9s0rbwa01bybgwhn6andxqshyia47p...@mail.gmail.com/
>
This looks good to me :)
Acked-by: Masami Hiramatsu (Google)
Thank you,
> Cc: S
claimed
> @@ -174,6 +175,7 @@ struct rethook_node *rethook_try_get(struct rethook *rh)
>*/
> if (unlikely(!rcu_is_watching()))
> return NULL;
> +#endif
>
> return (struct rethook_node *)objpool_pop(>pool);
> }
> --
> 2.43.0
>
--
Masami Hiramatsu (Google)
Hi Paul,
Thanks, both looks good to me.
Acked-by: Masami Hiramatsu (Google)
Let me update bootconfig/fixes.
On Mon, 8 Apr 2024 21:42:49 -0700
"Paul E. McKenney" wrote:
> Hello!
>
> This series removes redundant comments from /proc/bootconfig:
>
> 1.fs/proc:
On Tue, 9 Apr 2024 14:20:45 +0800
Zheng Yejian wrote:
> On 2024/4/8 20:41, Masami Hiramatsu (Google) wrote:
> > Hi Zheng,
> >
> > On Mon, 8 Apr 2024 16:34:03 +0800
> > Zheng Yejian wrote:
> >
> >> There is once warn in __arm_kprobe_ftrace() on:
> >sigreturn() exists only to allow the implementation of signal
> > handlers. It should never be
> >called directly. Details of the arguments (if any) passed to
> > sigreturn() vary depending on
> >the architecture.
> >
> > this is how sys_uretprobe() should be treated/documented.
>
> yes, will include man page patch in new version
And please follow Documentation/process/adding-syscalls.rst in new version,
then we can avoid repeating the same discussion :-)
Thank you!
>
> jirka
>
> >
> > sigreturn() can be "improved" too. Say, it could validate sigcontext->ip
> > and return -EINVAL if this addr is not valid. But why?
> >
> > Oleg.
> >
--
Masami Hiramatsu (Google)
900, Masami Hiramatsu wrote:
> > > > On Fri, 5 Apr 2024 10:23:24 +0900
> > > > Masami Hiramatsu (Google) wrote:
> > > >
> > > > > On Thu, 4 Apr 2024 10:43:14 -0700
> > > > > "Paul E. McKenney" wrote:
> >
if (WARN_ON_ONCE(tsk != current && task_is_running(tsk)))
> + if (tsk != current && task_is_running(tsk))
> return 0;
>
> do {
> --
> 2.34.1
>
--
Masami Hiramatsu (Google)
7,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
> goto out;
> }
>
> - /* Check if 'p' is probing a module. */
> - *probed_mod = __module_text_address((unsigned long) p->addr);
> + /* Get module refcount and reject __init functions for loaded modules.
> */
> if (*probed_mod) {
> /*
>* We must hold a refcount of the probed module while updating
> --
> 2.25.1
>
--
Masami Hiramatsu (Google)
).
I would like to clear that the abuse of this syscall will not possible to harm
the normal programs, and even if it is used by malicious code (e.g. injected by
stack overflow) it doesn't cause a problem. At least thsese points are cleared,
and documented. it is easier to push it as new Linux API.
Thank you,
>
> Thanks!
>
> Oleg.
>
>
--
Masami Hiramatsu (Google)
ry == ri->stack
0: |ret-addr| <- call pushed it
0: |ret-addr| <- sp at return trampoline
3: |r11 | <- regs->sp at syscall
2: |rcx |
1: |rax | <- ri->stack
0: |ret-addr|
(Note: The lower the line, the larger the address.)
Thus, we can check the stack address by (regs->sp + 16 == ri->stack).
Thank you,
--
Masami Hiramatsu (Google)
t_reserved(p->addr, p->addr) ||
> @@ -1581,7 +1587,6 @@ static int check_kprobe_address_safe(struct kprobe *p,
> }
>
> /* Check if 'p' is probing a module. */
/* Get module refcount and reject __init functions for loaded modules.
*/
> - *probed_mod = __module_text_address((unsigned long) p->addr);
> if (*probed_mod) {
> /*
>* We must hold a refcount of the probed module while updating
> --
> 2.25.1
>
Thank you,
--
Masami Hiramatsu (Google)
h Masami's tree, but the change makes
> > sense to me, given this is an expected condition.
> >
> > Acked-by: Andrii Nakryiko
>
> Masami, I assume you'll pick this up?
OK, anyway it will just return 0 if this situation happens, and caller will
get the trampoline address instead of correct return address in this case.
I think it does not do any unsafe things. So I agree removing it.
But I think the explanation is a bit confusing.
Thank you,
>
> Thanks,
> Daniel
--
Masami Hiramatsu (Google)
On Tue, 2 Apr 2024 22:21:00 -0700
Andrii Nakryiko wrote:
> On Tue, Apr 2, 2024 at 9:00 PM Andrii Nakryiko
> wrote:
> >
> > On Tue, Apr 2, 2024 at 5:52 PM Steven Rostedt wrote:
> > >
> > > On Wed, 3 Apr 2024 09:40:48 +0900
> > > Masami Hirama
> handlers. It should never be
>called directly. Details of the arguments (if any) passed to
> sigreturn() vary depending on
>the architecture.
>
> this is how sys_uretprobe() should be treated/documented.
OK.
>
> sigreturn() can be "improved" too. Say, it could validate sigcontext->ip
> and return -EINVAL if this addr is not valid. But why?
Because sigreturn() never returns, but sys_uretprobe() will return.
If it changes the regs->ip and directly returns to the original return address,
I think it is natural to send SIGILL.
Thank you,
--
Masami Hiramatsu (Google)
On Thu, 4 Apr 2024 21:25:41 -0700
"Paul E. McKenney" wrote:
> On Fri, Apr 05, 2024 at 11:57:45AM +0900, Masami Hiramatsu wrote:
> > On Fri, 5 Apr 2024 10:23:24 +0900
> > Masami Hiramatsu (Google) wrote:
> >
> > > On Thu, 4 Apr 2024 10:43:
On Fri, 5 Apr 2024 10:23:24 +0900
Masami Hiramatsu (Google) wrote:
> On Thu, 4 Apr 2024 10:43:14 -0700
> "Paul E. McKenney" wrote:
>
> > On Thu, Apr 04, 2024 at 08:55:22AM +0900, Masami Hiramatsu wrote:
> > > On Wed, 3 Apr 2024 12:16:28 -07
c894d4b ("fs/proc: Add boot loader arguments as comment to
> > > /proc/bootconfig")
> > > Signed-off-by: Zhenhua Huang
> > > Signed-off-by: Paul E. McKenney
> > > Cc: Masami Hiramatsu
> > > Cc:
> > > Cc:
> >
> > OOps, good
sp + expected_offset)
goto sigill;
expected_offset should be 16 (push * 3 - ret) on x64 if we ri->stack is the
regs->sp right after call.
Thank you,
--
Masami Hiramatsu (Google)
peration strictly.
> > > > > I concern that new system calls introduce vulnerabilities.
> > > > >
> > > >
> > > > As Oleg and Jiri mentioned, this syscall can't harm kernel or other
> > > > processes, only the process that is abusing the API. So any extra
> > > > checks that would slow down this approach is an unnecessary overhead
> > > > and complication that will never be useful in practice.
> > >
> > > I think at least it should check the caller address to ensure the
> > > address is in the trampoline.
> > > But anyway, uprobes itself can break the target process, so no one
> > > might care if this system call breaks the process now.
> >
> > If we already have an expected range of addresses, then I think it's
> > fine to do a quick unlikely() check. I'd be more concerned if we need
> > to do another lookup or search to just validate this. I'm sure Jiri
> > will figure it out.
>
> Oleg mentioned the trampoline address check could race with another
> thread's mremap call, however trap is using that check as well, it
> still seems like good idea to do it also in the uretprobe syscall
Yeah, and also, can we add a stack pointer check if the trampoline is
shared with other probe points?
Thank you,
>
> jirka
--
Masami Hiramatsu (Google)
ul in practice.
> >
> > I think at least it should check the caller address to ensure the
> > address is in the trampoline.
> > But anyway, uprobes itself can break the target process, so no one
> > might care if this system call breaks the process now.
>
> If we already have an expected range of addresses, then I think it's
> fine to do a quick unlikely() check. I'd be more concerned if we need
> to do another lookup or search to just validate this. I'm sure Jiri
> will figure it out.
Good.
>
> >
> > >
> > > Also note that sys_uretprobe is a kind of internal and unstable API
> > > and it is explicitly called out that its contract can change at any
> > > time and user space shouldn't rely on it. It's purely for the kernel's
> > > own usage.
> >
> > Is that OK to use a syscall as "internal" and "unstable" API?
>
> See above about rt_sigreturn. It seems like yes, for some highly
> specialized syscalls it is the case already.
OK, but as I said it is just for performance optimization, that is
a bit different from rt_sigreturn case.
Thank you,
> >
> > >
> > > So let's please keep it fast and simple.
> > >
> > >
> > > > Thank you,
> > > >
> > > >
> > > > >
> > > > > thanks,
> > > > > jirka
> > > > >
> > > > >
> > > > > >
> > >
> > > [...]
> >
> >
> > ([OT] If we can add syscall so casually, I would like to add sys_traceevent
> > for recording user space events :-) .)
>
> Have you proposed this upstream? :) I have no clue and no opinion about it...
>
> >
> > --
> > Masami Hiramatsu (Google)
--
Masami Hiramatsu (Google)
t should check the caller address to ensure the
address is in the trampoline.
But anyway, uprobes itself can break the target process, so no one
might care if this system call breaks the process now.
>
> Also note that sys_uretprobe is a kind of internal and unstable API
> and it is explicitly called out that its contract can change at any
> time and user space shouldn't rely on it. It's purely for the kernel's
> own usage.
Is that OK to use a syscall as "internal" and "unstable" API?
>
> So let's please keep it fast and simple.
>
>
> > Thank you,
> >
> >
> > >
> > > thanks,
> > > jirka
> > >
> > >
> > > >
>
> [...]
([OT] If we can add syscall so casually, I would like to add sys_traceevent
for recording user space events :-) .)
--
Masami Hiramatsu (Google)
> Signed-off-by: Paul E. McKenney
> Cc: Masami Hiramatsu
> Cc:
> Cc:
OOps, good catch! Let me pick it.
Acked-by: Masami Hiramatsu (Google)
Thank you!
>
> diff --git a/fs/proc/bootconfig.c b/fs/proc/bootconfig.c
> index 902b326e1e560..e5635a6b127b0 100644
>
__NR_uretprobe 462
> > > +__SYSCALL(__NR_uretprobe, sys_uretprobe)
> > > +
> > > #undef __NR_syscalls
> > > -#define __NR_syscalls 462
> > > +#define __NR_syscalls 463
> > >
> > > /*
> > > * 32 bit systems traditionally used different
> > > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> > > index 929e98c62965..90395b16bde0 100644
> > > --- a/kernel/events/uprobes.c
> > > +++ b/kernel/events/uprobes.c
> > > @@ -1474,11 +1474,20 @@ static int xol_add_vma(struct mm_struct *mm,
> > > struct xol_area *area)
> > > return ret;
> > > }
> > >
> > > +void * __weak arch_uprobe_trampoline(unsigned long *psize)
> > > +{
> > > + static uprobe_opcode_t insn = UPROBE_SWBP_INSN;
> > > +
> > > + *psize = UPROBE_SWBP_INSN_SIZE;
> > > + return
> > > +}
> > > +
> > > static struct xol_area *__create_xol_area(unsigned long vaddr)
> > > {
> > > struct mm_struct *mm = current->mm;
> > > - uprobe_opcode_t insn = UPROBE_SWBP_INSN;
> > > + unsigned long insns_size;
> > > struct xol_area *area;
> > > + void *insns;
> > >
> > > area = kmalloc(sizeof(*area), GFP_KERNEL);
> > > if (unlikely(!area))
> > > @@ -1502,7 +1511,8 @@ static struct xol_area *__create_xol_area(unsigned
> > > long vaddr)
> > > /* Reserve the 1st slot for get_trampoline_vaddr() */
> > > set_bit(0, area->bitmap);
> > > atomic_set(>slot_count, 1);
> > > - arch_uprobe_copy_ixol(area->pages[0], 0, , UPROBE_SWBP_INSN_SIZE);
> > > + insns = arch_uprobe_trampoline(_size);
> > > + arch_uprobe_copy_ixol(area->pages[0], 0, insns, insns_size);
> > >
> > > if (!xol_add_vma(mm, area))
> > > return area;
> > > @@ -2123,7 +2133,7 @@ static struct return_instance
> > > *find_next_ret_chain(struct return_instance *ri)
> > > return ri;
> > > }
> > >
> > > -static void handle_trampoline(struct pt_regs *regs)
> > > +void uprobe_handle_trampoline(struct pt_regs *regs)
> > > {
> > > struct uprobe_task *utask;
> > > struct return_instance *ri, *next;
> > > @@ -2188,7 +2198,7 @@ static void handle_swbp(struct pt_regs *regs)
> > >
> > > bp_vaddr = uprobe_get_swbp_addr(regs);
> > > if (bp_vaddr == get_trampoline_vaddr())
> > > - return handle_trampoline(regs);
> > > + return uprobe_handle_trampoline(regs);
> > >
> > > uprobe = find_active_uprobe(bp_vaddr, _swbp);
> > > if (!uprobe) {
> > > diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> > > index faad00cce269..be6195e0d078 100644
> > > --- a/kernel/sys_ni.c
> > > +++ b/kernel/sys_ni.c
> > > @@ -391,3 +391,5 @@ COND_SYSCALL(setuid16);
> > >
> > > /* restartable sequence */
> > > COND_SYSCALL(rseq);
> > > +
> > > +COND_SYSCALL(uretprobe);
> > > --
> > > 2.44.0
> > >
> >
> >
> > --
> > Masami Hiramatsu (Google)
--
Masami Hiramatsu (Google)
gt; +static __always_inline unsigned long
> > +ftrace_regs_get_return_value(const struct ftrace_regs *fregs)
> > +{
> > + return fregs->a0;
> > +}
> > +
> > +static __always_inline void
> > +ftrace_regs_set_return_value(struct ftrace_regs *fregs,
> > +unsigned long ret)
> > +{
> > + fregs->a0 = ret;
> > +}
> > +
> > +static __always_inline void
> > +ftrace_override_function_with_return(struct ftrace_regs *fregs)
> > +{
> > + fregs->epc = fregs->ra;
> > +}
>
> Style/nit: All above; Try to use the full 100 chars, and keep the
> function name return value on the same line for grepability.
>
>
> Björn
>
> [1]
> https://lore.kernel.org/all/170887410337.564249.6360118840946697039.stgit@devnote2/
--
Masami Hiramatsu (Google)
tomic_set(>slot_count, 1);
> - arch_uprobe_copy_ixol(area->pages[0], 0, , UPROBE_SWBP_INSN_SIZE);
> + insns = arch_uprobe_trampoline(_size);
> + arch_uprobe_copy_ixol(area->pages[0], 0, insns, insns_size);
>
> if (!xol_add_vma(mm, area))
> return area;
> @@ -2123,7 +2133,7 @@ static struct return_instance
> *find_next_ret_chain(struct return_instance *ri)
> return ri;
> }
>
> -static void handle_trampoline(struct pt_regs *regs)
> +void uprobe_handle_trampoline(struct pt_regs *regs)
> {
> struct uprobe_task *utask;
> struct return_instance *ri, *next;
> @@ -2188,7 +2198,7 @@ static void handle_swbp(struct pt_regs *regs)
>
> bp_vaddr = uprobe_get_swbp_addr(regs);
> if (bp_vaddr == get_trampoline_vaddr())
> - return handle_trampoline(regs);
> + return uprobe_handle_trampoline(regs);
>
> uprobe = find_active_uprobe(bp_vaddr, _swbp);
> if (!uprobe) {
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index faad00cce269..be6195e0d078 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -391,3 +391,5 @@ COND_SYSCALL(setuid16);
>
> /* restartable sequence */
> COND_SYSCALL(rseq);
> +
> +COND_SYSCALL(uretprobe);
> --
> 2.44.0
>
--
Masami Hiramatsu (Google)
:
> > >
> > > > On Mon, 1 Apr 2024 20:25:52 +0900
> > > > Masami Hiramatsu (Google) wrote:
> > > >
> > > > > > Masami,
> > > > > >
> > > > > > Are you OK with just keeping it set to N.
> &
On Mon, 1 Apr 2024 12:09:18 -0400
Steven Rostedt wrote:
> On Mon, 1 Apr 2024 20:25:52 +0900
> Masami Hiramatsu (Google) wrote:
>
> > > Masami,
> > >
> > > Are you OK with just keeping it set to N.
> >
> > OK, if it is only for the debugging,
Thanks!
>
> Masami,
>
> Are you OK with just keeping it set to N.
OK, if it is only for the debugging, I'm OK to set N this.
>
> We could have other options like PROVE_LOCKING enable it.
Agreed (but it should say this is a debug option)
Thank you,
>
> -- Steve
--
Masami Hiramatsu (Google)
gt; > > }
> > > > > > > > @@ -935,9 +935,9 @@ static void delete_uprobe(struct uprobe
> > > > > > > > *uprobe)
> > > > > > > > if (WARN_ON(!uprobe_is_active(uprobe)))
> > > > > > > > return;
> > > > &g
struct inode
> > > > > *inode,
> > > > > min = vaddr_to_offset(vma, start);
> > > > > max = min + (end - start) - 1;
> > > > >
> > > > > - spin_lock(_treelock);
> > > > > + read_lock(_treelock);
> > > > > n = find_node_in_range(inode, min, max);
> > > > > if (n) {
> > > > > for (t = n; t; t = rb_prev(t)) {
> > > > > @@ -1316,7 +1316,7 @@ static void build_probe_list(struct inode
> > > > > *inode,
> > > > > get_uprobe(u);
> > > > > }
> > > > > }
> > > > > - spin_unlock(_treelock);
> > > > > + read_unlock(_treelock);
> > > > > }
> > > > >
> > > > > /* @vma contains reference counter, not the probed instruction. */
> > > > > @@ -1407,9 +1407,9 @@ vma_has_uprobes(struct vm_area_struct *vma,
> > > > > unsigned long start, unsigned long e
> > > > > min = vaddr_to_offset(vma, start);
> > > > > max = min + (end - start) - 1;
> > > > >
> > > > > - spin_lock(_treelock);
> > > > > + read_lock(_treelock);
> > > > > n = find_node_in_range(inode, min, max);
> > > > > - spin_unlock(_treelock);
> > > > > + read_unlock(_treelock);
> > > > >
> > > > > return !!n;
> > > > > }
> > > > > --
> > > > > 2.43.0
> > > > >
> > > >
> > > >
> > > > --
> > > > Masami Hiramatsu (Google)
> >
> >
> > --
> > Masami Hiramatsu (Google)
--
Masami Hiramatsu (Google)
ution time is observed (28 -> 3.5
> > > microsecs).
> >
> > Looks good to me.
> >
> > Acked-by: Masami Hiramatsu (Google)
> >
> > BTW, how did you measure the overhead? I think spinlock overhead
> > will depend on how much lo
> > > This change has been tested against production workloads that exhibit
> > > significant contention on the spinlock and an almost order of magnitude
> > > reduction for mean uprobe execution time is observed (28 -> 3.5
> > > microsecs).
> >
> > Looks go
found that exposing CONFIG_ALLOC_EXECMEM
to user does not help, it should be an internal change. So hiding this change
from user is better choice. Then there is no reason to introduce the new
alloc_execmem, but just expand kprobe_alloc_insn_page() is reasonable.
Mark, can you send this series here, so that others can review/test it?
Thank you!
>
> Thanks,
> Mark.
--
Masami Hiramatsu (Google)
deed. We need module_state() function to avoid it.
> >
> > It is non-existent unless CONFIG_MODULES is set given how things are
> > flagged in include/linux/module.h.
>
> Hey, noticed kconfig issue.
>
> According to kconfig-language.txt:
>
> "select should be used with care. select will force a symbol to a value
> without visiting the dependencies."
>
> So the problem here lies in KPROBES config entry using select statement
> to pick ALLOC_EXECMEM. It will not take the depends on statement into
> account and thus will allow to select kprobes without any allocator in
> place.
OK, in that case "depend on" is good.
>
> So to address this I'd suggest to use depends on statement also for
> describing relation between KPROBES and ALLOC_EXECMEM. It does not make
> life worse than before for anyone because even with the current kernel
> you have to select MODULES before you can move forward with kprobes.
Yeah, since ALLOC_EXECMEM is enabled by default.
Thank you!
>
> BR, Jarkko
--
Masami Hiramatsu (Google)
e_module_callback(struct notifier_block *nb,
> unsigned long val, void *data)
> @@ -704,6 +709,7 @@ static struct notifier_block trace_kprobe_module_nb = {
> .notifier_call = trace_kprobe_module_callback,
> .priority = 1 /* Invoked after kprobe module callback */
> };
> +#endif /* CONFIG_MODULES */
As I similar to the above, let's move trace_kprobe_module_nb outside
of #ifdef.
>
> static int count_symbols(void *data, unsigned long unused)
> {
> @@ -1897,8 +1903,10 @@ static __init int init_kprobe_trace_early(void)
> if (ret)
> return ret;
>
> +#ifdef CONFIG_MODULES
> if (register_module_notifier(_kprobe_module_nb))
> return -EINVAL;
> +#endif /* CONFIG_MODULES */
And remove this #ifdef.
Thank you!
>
> return 0;
> }
> --
> 2.44.0
>
--
Masami Hiramatsu (Google)
mp; IS_ENABLED(CONFIG_MODULES))
err = register_module_notifier(_module_nb);
kprobes_initialized = (err == 0);
-
Thank you,
--
Masami Hiramatsu (Google)
his change has been tested against production workloads that exhibit
> significant contention on the spinlock and an almost order of magnitude
> reduction for mean uprobe execution time is observed (28 -> 3.5 microsecs).
Looks good to me.
Acked-by: Masami Hiramatsu (Google)
BTW, how d
checks to make sure RCU is on when ftrace callbacks recurse.
> +
> + This will add more overhead to all ftrace-based invocations.
... invocations, but keep it safe.
> +
> + If unsure, say N
If unsure, say Y
Thank you,
> +
> config RING_BUFFER_RECORD_RECURSION
> bool "Record functions that recurse in the ring buffer"
> depends on FTRACE_RECORD_RECURSION
> --
> 2.43.0
>
--
Masami Hiramatsu (Google)
et == -ENOENT) {
> pr_warn("This probe might be able to register after target
> module is loaded. Continue.\n");
> ret = 0;
> }
> @@ -670,6 +680,7 @@ static int register_trace_kprobe(struct trace_kprobe *tk)
> return ret;
> }
>
> +#ifdef CONFIG_MODULES
> /* Module notifier call back, checking event on the module */
> static int trace_kprobe_module_callback(struct notifier_block *nb,
> unsigned long val, void *data)
> @@ -704,6 +715,7 @@ static struct notifier_block trace_kprobe_module_nb = {
> .notifier_call = trace_kprobe_module_callback,
> .priority = 1 /* Invoked after kprobe module callback */
> };
> +#endif /* CONFIG_MODULES */
>
> static int count_symbols(void *data, unsigned long unused)
> {
> @@ -1897,8 +1909,10 @@ static __init int init_kprobe_trace_early(void)
> if (ret)
> return ret;
>
> +#ifdef CONFIG_MODULES
> if (register_module_notifier(_kprobe_module_nb))
> return -EINVAL;
> +#endif /* CONFIG_MODULES */
>
> return 0;
> }
> --
> 2.44.0
>
--
Masami Hiramatsu (Google)
"%s%s+0x0(+0x%zx(+0x%zx(%s))):string",
> +equal ? tmp : "", equal ? "=" : "",
> +offsetof(struct dentry, d_name.name),
> +offsetof(struct file, f_path.dentry),
> +equal ? equal + 1 : tmp);
> +
> + kfree(tmp);
> + if (ret >= bufsize - used)
> + goto nomem;
> + argv[i] = tmpbuf + used;
> + used += ret + 1;
> }
>
> *buf = tmpbuf;
> --
> 2.31.1
>
--
Masami Hiramatsu (Google)
le addresses.
>
Hi Ye,
Thanks for update! There is a nit comment but basically OK.
Acked-by: Masami Hiramatsu (Google)
Thank you,
> Diff v8 vs v7:
> 1. Add detail change log for patch[1-2];
>
> Diff v7 vs v6:
> 1. Squash [1/8] to [3/8] patches into 1 patch;
> 2. S
On Fri, 22 Mar 2024 00:07:59 +0900
Masami Hiramatsu (Google) wrote:
> > What would be really useful is if we had a way to expose BTF here.
> > Something like:
> >
> > "%pB::"
> >
> > The "%pB" would mean to look up the struct/field o
p the struct/field offsets and types via BTF,
> and create the appropriate command to find and print it.
Would you mean casing the pointer to ""?
>
> That would be much more flexible and useful as it would allow reading any
> field from any structure without having to use gdb.
>
> -- Steve
>
>
> > + kfree(tmp);
> > + if (ret >= bufsize - used)
> > + goto nomem;
> > + argv[i] = tmpbuf + used;
> > + used += ret + 1;
> > + }
> > + }
> > +
> > + *buf = tmpbuf;
> > + return 0;
> > +nomem:
> > + kfree(tmpbuf);
> > + return -ENOMEM;
> > +}
>
--
Masami Hiramatsu (Google)
dentry/file addresses.
Thanks for update! This series looks good to me.
Acked-by: Masami Hiramatsu (Google)
Let me pick this and test.
Thank you!
>
> Diff v7 vs v6:
> 1. Squash [1/8] to [3/8] patches into 1 patch;
> 2. Split readme_msg[] into each patch;
>
> Diff v6 vs v
On Wed, 20 Mar 2024 09:27:49 -0400
Steven Rostedt wrote:
> On Wed, 20 Mar 2024 17:10:38 +0900
> "Masami Hiramatsu (Google)" wrote:
>
> > From: Masami Hiramatsu (Google)
> >
> > Fix to initialize 'val' local variable with zero.
> > Dan reported that
@@ static int __trace_fprobe_create(int argc, const char
> *argv[])
> trace_probe_log_clear();
> kfree(new_argv);
> kfree(symbol);
> + kfree(dbuf);
> return ret;
>
> parse_error:
> --
> 2.31.1
>
--
Masami Hiramatsu (Google)
_cnt);
> + do_div(delta, (u32)bm_cnt);
> avg = delta;
>
> if (stddev > 0) {
> --
> 2.44.0
>
--
Masami Hiramatsu (Google)
race_kprobe.c | 6 ++
> kernel/trace/trace_probe.c| 63 +++
> kernel/trace/trace_probe.h| 2 +
> .../ftrace/test.d/dynevent/fprobe_args_vfs.tc | 40
> .../ftrace/test.d/kprobe/kprobe_args_vfs.tc | 43 +
> 8 files changed, 167 insertions(+), 3 deletions(-)
> create mode 100644
> tools/testing/selftests/ftrace/test.d/dynevent/fprobe_args_vfs.tc
> create mode 100644
> tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_vfs.tc
>
> --
> 2.31.1
>
--
Masami Hiramatsu (Google)
\[\\]\n"
> + "\t symstr, %pd/%pD, \\[\\]\n"
> #ifdef CONFIG_HIST_TRIGGERS
> "\tfield: ;\n"
> "\tstype: u8/u16/u32/u64, s8/s16/s32/s64, pid_t,\n"
> --
> 2.31.1
>
--
Masami Hiramatsu (Google)
On Sun, 17 Mar 2024 10:53:59 -0500
Jinghao Jia wrote:
>
>
> On 3/16/24 08:46, Masami Hiramatsu (Google) wrote:
> > On Thu, 14 Mar 2024 18:56:35 -0500
> > Jinghao Jia wrote:
> >
> >> On 3/14/24 10:17, Masami Hiramatsu (Google) wrote:
> >>> F
From: Masami Hiramatsu (Google)
Fix to initialize 'val' local variable with zero.
Dan reported that Smatch static code checker reports an error that a local
'val' variable needs to be initialized. Actually, the 'val' is expected to
be initialized by FETCH_OP_ARG in the same loop
On Tue, 19 Mar 2024 10:10:00 -0400
Steven Rostedt wrote:
> On Tue, 19 Mar 2024 10:19:09 +0300
> Dan Carpenter wrote:
>
> > Hello Masami Hiramatsu (Google),
> >
> > Commit 25f00e40ce79 ("tracing/probes: Support $argN in return probe
> > (kprobe and
st argument %pD with name"
> +echo 'p:testprobe vfs_read name=$arg1:%pD' > kprobe_events
> +echo 1 > events/kprobes/testprobe/enable
> +grep -q "1" events/kprobes/testprobe/enable
> +echo 0 > events/kprobes/testprobe/enable
> +grep "vfs_read" trace | grep -q "enable"
> +echo "" > kprobe_events
> +echo "" > trace
> +
> +: "Test argument %pD without name"
> +echo 'p:testprobe vfs_read $arg1:%pD' > kprobe_events
> +echo 1 > events/kprobes/testprobe/enable
> +grep -q "1" events/kprobes/testprobe/enable
> +echo 0 > events/kprobes/testprobe/enable
> +grep "vfs_read" trace | grep -q "enable"
> +echo "" > kprobe_events
> +echo "" > trace
> --
> 2.31.1
>
--
Masami Hiramatsu (Google)
gt; > uprobes: encapsulate preparation of uprobe args buffer
> > > uprobes: prepare uprobe args buffer lazily
> > > uprobes: add speculative lockless system-wide uprobe filter check
> > >
> > > kernel/trace/trace_uprobe.c | 103 +---
> > > 1 file changed, 59 insertions(+), 44 deletions(-)
> > >
> > > --
> > > 2.43.0
> > >
> >
> >
> > --
> > Masami Hiramatsu (Google)
--
Masami Hiramatsu (Google)
On Tue, 19 Mar 2024 13:20:57 +0900
Masami Hiramatsu (Google) wrote:
> Hi,
>
> On Mon, 18 Mar 2024 11:17:25 -0700
> Andrii Nakryiko wrote:
>
> > This patch set implements two speed ups for uprobe/uretprobe runtime
> > execution
> > path for some common scenar
stem-wide uprobe filter check
>
> kernel/trace/trace_uprobe.c | 103 +---
> 1 file changed, 59 insertions(+), 44 deletions(-)
>
> --
> 2.43.0
>
--
Masami Hiramatsu (Google)
From: Masami Hiramatsu (Google)
Read from an unsafe address with copy_from_kernel_nofault() in
arch_adjust_kprobe_addr() because this function is used before checking
the address is in text or not. Syzcaller bot found a bug and reported
the case if user specifies inaccessible data area
On Fri, 15 Mar 2024 00:12:30 +0900
"Masami Hiramatsu (Google)" wrote:
> From: Masami Hiramatsu (Google)
>
> Read from an unsafe address with copy_from_kernel_nofault() in
> arch_adjust_kprobe_addr() because this function is used before checking
> the address is in t
From: Masami Hiramatsu (Google)
Read from an unsafe address with copy_from_kernel_nofault() in
arch_adjust_kprobe_addr() because this function is used before checking
the address is in text or not. Syzcaller bot found a bug and reported
the case if user specifies inaccessible data area
On Tue, 12 Mar 2024 11:56:41 -0400
Steven Rostedt wrote:
> From: "Steven Rostedt (Google)"
>
> The rb_watermark_hit() checks if the amount of data in the ring buffer is
> above the percentage level passed in by the "full" variable. If it is, it
>
On Tue, 12 Mar 2024 09:19:21 -0400
Steven Rostedt wrote:
> From: "Steven Rostedt (Google)"
>
> The check for knowing if the poll should wait or not is basically the
> exact same logic as rb_watermark_hit(). The only difference is that
> rb_watermark_hit() al
On Tue, 12 Mar 2024 09:19:20 -0400
Steven Rostedt wrote:
> From: "Steven Rostedt (Google)"
>
> If a reader of the ring buffer is doing a poll, and waiting for the ring
> buffer to hit a specific watermark, there could be a case where it gets
> into an infinite ping
naming, execmem_alloc() seems reasonable to me? I have no strong
> > feelings at all, I'll just use that going forward unless somebody else
> > expresses an opinion.
>
> I am not good at naming things. No objection from me to "execmem_alloc".
Hm, it sounds good to me too. I think we should add a patch which just
rename the module_alloc/module_memfree with execmem_alloc/free first.
Thanks,
--
Masami Hiramatsu (Google)
{
> > int ret;
> >
> > @@ -1580,6 +1584,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
> > goto out;
> > }
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
>
> Plain #ifdef will do here and below. IS_ENABLED is for usage withing the
> code, like
>
> if (IS_ENABLED(CONFIG_MODULES))
> ;
>
> > /* Check if 'p' is probing a module. */
> > *probed_mod = __module_text_address((unsigned long) p->addr);
> > if (*probed_mod) {
>
> --
> Sincerely yours,
> Mike.
--
Masami Hiramatsu (Google)
@@ static struct notifier_block trace_kprobe_module_nb = {
> .notifier_call = trace_kprobe_module_callback,
> .priority = 1 /* Invoked after kprobe module callback */
> };
> +#endif /* IS_ENABLED(CONFIG_MODULES) */
>
> static int count_symbols(void *data, unsigned long unused)
> {
> @@ -1897,8 +1906,10 @@ static __init int init_kprobe_trace_early(void)
> if (ret)
> return ret;
>
> +#if IS_ENABLED(CONFIG_MODULES)
> if (register_module_notifier(_kprobe_module_nb))
> return -EINVAL;
> +#endif
>
> return 0;
> }
> --
> 2.43.0
>
--
Masami Hiramatsu (Google)
>
> If you're gonna split code up to move to another place, it'd be nice
> if you can add copyright headers as was done with the kernel/module.c
> split into kernel/module/*.c
>
> Can we start with some small basic stuff we can all agree on?
>
> [0] https://lwn.net/Articles/944857/
>
> Luis
--
Masami Hiramatsu (Google)
VM_FLUSH_RESET_PERMS | VM_DEFER_KMEMLEAK,
> + NUMA_NO_NODE, __builtin_return_address(0));
>
On Mon, 4 Mar 2024 22:34:33 -0500
Steven Rostedt wrote:
> From: "Steven Rostedt (Google)"
>
> Limit the max print event of trace_marker to just 4K string size. This must
> also be less than the amount that can be held by a trace_seq along with
> the text that i
On Thu, 29 Feb 2024 22:58:54 +0100
Jiri Olsa wrote:
> On Thu, Feb 29, 2024 at 08:22:47PM +0900, Masami Hiramatsu (Google) wrote:
> > From: Masami Hiramatsu (Google)
> >
> > Fix to allocate fprobe::entry_data_size buffer with rethook instances.
> > If fprobe doesn
On Thu, 29 Feb 2024 20:22:47 +0900
"Masami Hiramatsu (Google)" wrote:
> From: Masami Hiramatsu (Google)
>
> Fix to allocate fprobe::entry_data_size buffer with rethook instances.
> If fprobe doesn't allocate entry_data_size buffer for each rethook instance,
> fprob
From: Masami Hiramatsu (Google)
Fix to allocate fprobe::entry_data_size buffer with rethook instances.
If fprobe doesn't allocate entry_data_size buffer for each rethook instance,
fprobe entry handler can cause a buffer overrun when storing entry data in
entry handler.
Reported-by: Jiri Olsa
On Tue, 27 Feb 2024 12:57:06 -0500
Steven Rostedt wrote:
> From: "Steven Rostedt (Google)"
>
> The trace_marker write goes into the ring buffer. A test was added to
> write a string as big as the sub-buffer of the ring buffer to see if it
> would work. A sub-buff
On Mon, 26 Feb 2024 13:56:09 +0100
Jiri Olsa wrote:
> On Mon, Feb 26, 2024 at 12:20:43AM +0900, Masami Hiramatsu (Google) wrote:
> > From: Masami Hiramatsu (Google)
> >
> > Rewrite fprobe implementation on function-graph tracer.
> > Major API changes are:
&
ernel config in the attachment.
> I noticed that you are developing event tracing tprobe subsystem.
> May be you could explain such behavior?
> Thank you in advance!
>
> --
> Best regards
> Maksim Morskov
--
Masami Hiramatsu (Google)
function parameter descriptions.
>
> Fix one return values list so that it is formatted correctly when
> rendered for output.
>
> Spell "non-zero" with a hyphen in several places.
Looks good to me.
Acked-by: Masami Hiramatsu (Google)
Thanks!
>
> Signed-off-by: R
On Fri, 23 Feb 2024 16:25:19 -0500
Steven Rostedt wrote:
> From: "Steven Rostedt (Google)"
>
> The second parameter of __assign_rel_str() is no longer used. It can be
> removed.
>
> Note, the only real users of rel_string is user events. This code is just
>
On Fri, 23 Feb 2024 16:13:56 -0500
Steven Rostedt wrote:
> From: "Steven Rostedt (Google)"
>
> In preparation to remove the second parameter of __assign_str(), make sure
> it is really a duplicate of __string() by adding a WARN_ON_ONCE().
>
Looks good to me. So e
cked-by: Masami Hiramatsu (Google)
Thanks!
> Signed-off-by: Thorsten Blum
> ---
> kernel/trace/trace_benchmark.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/trace/trace_benchmark.c b/kernel/trace/trace_benchmark.c
> index 54d5fa35c90a
From: Masami Hiramatsu (Google)
Update fprobe documentation for the new fprobe on function-graph
tracer. This includes some bahvior changes and pt_regs to
ftrace_regs interface change.
Signed-off-by: Masami Hiramatsu (Google)
---
Changes in v2:
- Update @fregs parameter explanation
From: Masami Hiramatsu (Google)
This test case repeats define and undefine the fprobe dynamic event to
ensure that the fprobe does not cause any issue with such operations.
Signed-off-by: Masami Hiramatsu (Google)
---
.../test.d/dynevent/add_remove_fprobe_repeat.tc| 19
From: Masami Hiramatsu (Google)
Since the fprobe event does not support maxactive anymore, stop
testing the maxactive syntax error checking.
Signed-off-by: Masami Hiramatsu (Google)
---
.../ftrace/test.d/dynevent/fprobe_syntax_errors.tc |4 +---
1 file changed, 1 insertion(+), 3 deletions
From: Masami Hiramatsu (Google)
Remove depercated fprobe::nr_maxactive. This involves fprobe events to
rejects the maxactive number.
Signed-off-by: Masami Hiramatsu (Google)
---
Changes in v2:
- Newly added.
---
include/linux/fprobe.h |2 --
kernel/trace/trace_fprobe.c | 44
From: Masami Hiramatsu (Google)
Rewrite fprobe implementation on function-graph tracer.
Major API changes are:
- 'nr_maxactive' field is deprecated.
- This depends on CONFIG_DYNAMIC_FTRACE_WITH_ARGS or
!CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS, and
CONFIG_HAVE_FUNCTION_GRAPH_FREGS. So
From: Masami Hiramatsu (Google)
Add CONFIG_HAVE_FTRACE_GRAPH_FUNC kconfig in addition to ftrace_graph_func
macro check. This is for the other feature (e.g. FPROBE) which requires to
access ftrace_regs from fgraph_ops::entryfunc() can avoid compiling if
the fgraph can not pass the valid
From: Masami Hiramatsu (Google)
Enable kprobe_multi feature if CONFIG_FPROBE is enabled. The pt_regs is
converted from ftrace_regs by ftrace_partial_regs(), thus some registers
may always returns 0. But it should be enough for function entry (access
arguments) and exit (access return value
From: Masami Hiramatsu (Google)
Allow fprobe events to be enabled with CONFIG_DYNAMIC_FTRACE_WITH_ARGS.
With this change, fprobe events mostly use ftrace_regs instead of pt_regs.
Note that if the arch doesn't enable HAVE_PT_REGS_COMPAT_FTRACE_REGS,
fprobe events will not be able to be used from
From: Masami Hiramatsu (Google)
Add ftrace_fill_perf_regs() which should be compatible with the
perf_fetch_caller_regs(). In other words, the pt_regs returned from the
ftrace_fill_perf_regs() must satisfy 'user_mode(regs) == false' and can be
used for stack tracing.
Signed-off-by: Masami
From: Masami Hiramatsu (Google)
Add ftrace_partial_regs() which converts the ftrace_regs to pt_regs.
This is for the eBPF which needs this to keep the same pt_regs interface
to access registers.
Thus when replacing the pt_regs with ftrace_regs in fprobes (which is
used by kprobe_multi eBPF event
From: Masami Hiramatsu (Google)
Change the fprobe exit handler to use ftrace_regs structure instead of
pt_regs. This also introduce HAVE_PT_REGS_TO_FTRACE_REGS_CAST which means
the ftrace_regs's memory layout is equal to the pt_regs so that those are
able to cast. Fprobe introduces a new
From: Masami Hiramatsu (Google)
This allows fprobes to be available with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
instead of CONFIG_DYNAMIC_FTRACE_WITH_REGS, then we can enable fprobe
on arm64.
Signed-off-by: Masami Hiramatsu (Google)
Acked-by: Florent Revest
---
Changes in v6:
- Keep using
From: Masami Hiramatsu (Google)
Pass ftrace_regs to the fgraph_ops::retfunc(). If ftrace_regs is not
available, it passes a NULL instead. User callback function can access
some registers (including return address) via this ftrace_regs.
Signed-off-by: Masami Hiramatsu (Google)
---
Changes
From: Masami Hiramatsu (Google)
Use ftrace_regs instead of fgraph_ret_regs for tracing return value
on function_graph tracer because of simplifying the callback interface.
The CONFIG_HAVE_FUNCTION_GRAPH_RETVAL is also replaced by
CONFIG_HAVE_FUNCTION_GRAPH_FREGS.
Signed-off-by: Masami
From: Masami Hiramatsu (Google)
Pass ftrace_regs to the fgraph_ops::entryfunc(). If ftrace_regs is not
available, it passes a NULL instead. User callback function can access
some registers (including return address) via this ftrace_regs.
Signed-off-by: Masami Hiramatsu (Google)
---
Changes
From: Masami Hiramatsu (Google)
Add a selftest for multiple function graph tracer with storage on a same
function. In this case, the shadow stack entry will be shared among those
fgraph with different data storage. So this will ensure the fgraph will
not mixed those storage data.
Signed-off
From: Steven Rostedt (VMware)
Add boot up selftest that passes variables from a function entry to a
function exit, and make sure that they do get passed around.
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Masami Hiramatsu (Google)
---
Changes in v2:
- Add reserved size test
shadow
ret_stack and this then can be retrieved by fgraph_retrieve_data() called
by the corresponding retfunc().
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Masami Hiramatsu (Google)
---
Changes in v8:
- Avoid using DIV_ROUND_UP() in the hot path.
Changes in v3:
- Store
are)
Signed-off-by: Masami Hiramatsu (Google)
---
Changes in v2:
- Make description lines shorter than 76 chars.
---
include/linux/trace_recursion.h |7 ---
kernel/trace/trace.h |9 +
kernel/trace/trace_functions_graph.c | 10 ++
3 files chan
are)
Signed-off-by: Masami Hiramatsu (Google)
---
include/linux/trace_recursion.h | 29 -
kernel/trace/trace.h| 34 --
2 files changed, 32 insertions(+), 31 deletions(-)
diff --git a/include/linux/trace_recursion.h b/incl
201 - 300 of 1394 matches
Mail list logo