Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional
On Tue, 26 Mar 2024 09:16:33 -0700 Andrii Nakryiko wrote:

> > It's no different than lockdep. Test boxes should have it enabled, but
> > there's no reason to have this enabled in a production system.
> >
>
> I tend to agree with Steven here (which is why I sent this patch as it
> is), but I'm happy to do it as an opt-out, if Masami insists. Please
> do let me know if I need to send v2 or this one is actually the one
> we'll end up using. Thanks!

Masami,

Are you OK with just keeping it set to N?

We could have other options like PROVE_LOCKING enable it.

-- Steve
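For readers following the thread, the trade-off being debated (a debug
check that costs something on every traced call, compiled in only when
a config option asks for it) can be sketched in userspace C. This is a
minimal illustration, not the kernel code: VALIDATE_RCU_IS_WATCHING,
rcu_is_watching_stub(), warn_on_no_rcu() and trace_enter() are
hypothetical names standing in for the Kconfig symbol and helpers
discussed here, and unlikely() is redefined locally on the assumption
of a GCC/Clang compiler.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's unlikely() branch hint. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for rcu_is_watching(); callers can flip the flag. */
static bool fake_rcu_watching = true;

static inline bool rcu_is_watching_stub(void)
{
	return fake_rcu_watching;
}

/*
 * VALIDATE_RCU_IS_WATCHING is a hypothetical stand-in for the proposed
 * config option. Build with -DVALIDATE_RCU_IS_WATCHING to enable it.
 */
#ifdef VALIDATE_RCU_IS_WATCHING
/* Debug build: pay for the check on every call and warn on a violation. */
static inline bool warn_on_no_rcu(unsigned long ip)
{
	bool bad = !rcu_is_watching_stub();

	if (unlikely(bad))
		fprintf(stderr, "RCU not watching at ip=%#lx\n", ip);
	return bad;
}
#else
/* Default build: the check is compiled out and always reports "fine". */
static inline bool warn_on_no_rcu(unsigned long ip)
{
	(void)ip;
	return false;
}
#endif

/* Returns 0 if the traced callback may proceed, -1 if it should bail out. */
static int trace_enter(unsigned long ip)
{
	if (warn_on_no_rcu(ip))
		return -1;
	return 0;
}
```

Built without the define, trace_enter() always returns 0 and the check
costs nothing, which corresponds to the "default N" position. Built
with it, a call made while the stub reports "not watching" returns -1
and prints a warning, which is the behaviour a test box would want.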
Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional
On Mon, Mar 25, 2024 at 3:11 PM Steven Rostedt wrote:
>
> On Mon, 25 Mar 2024 11:38:48 +0900
> Masami Hiramatsu (Google) wrote:
>
> > On Fri, 22 Mar 2024 09:03:23 -0700
> > Andrii Nakryiko wrote:
> >
> > > Introduce CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING config option to
> > > control whether ftrace low-level code performs additional
> > > rcu_is_watching()-based validation logic in an attempt to catch noinstr
> > > violations.
> > >
> > > This check is expected to never be true in practice and would be best
> > > controlled with extra config to let users decide if they are willing to
> > > pay the price.
> >
> > Hmm, for me, it sounds like "WARN_ON(something) never be true in practice
> > so disable it by default". I think CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING
> > is OK, but tht should be set to Y by default. If you have already verified
> > that your system never make it true and you want to optimize your ftrace
> > path, you can manually set it to N at your own risk.
> >
>
> Really, it's for debugging. I would argue that it should *not* be default y.
> Peter added this to find all the locations that could be called where RCU
> is not watching. But the issue I have is that this is that it *does cause
> overhead* with function tracing.
>
> I believe we found pretty much all locations that were an issue, and we
> should now just make it an option for developers.
>
> It's no different than lockdep. Test boxes should have it enabled, but
> there's no reason to have this enabled in a production system.
>

I tend to agree with Steven here (which is why I sent this patch as it
is), but I'm happy to do it as an opt-out, if Masami insists. Please
do let me know if I need to send v2 or this one is actually the one
we'll end up using. Thanks!

> -- Steve
> >
> > >
> > > Cc: Steven Rostedt
> > > Cc: Masami Hiramatsu
> > > Cc: Paul E. McKenney
> > > Signed-off-by: Andrii Nakryiko
> > > ---
> > >  include/linux/trace_recursion.h | 2 +-
> > >  kernel/trace/Kconfig | 13 +
> > >  2 files changed, 14 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/linux/trace_recursion.h
> > > b/include/linux/trace_recursion.h
> > > index d48cd92d2364..24ea8ac049b4 100644
> > > --- a/include/linux/trace_recursion.h
> > > +++ b/include/linux/trace_recursion.h
> > > @@ -135,7 +135,7 @@ extern void ftrace_record_recursion(unsigned long ip,
> > > unsigned long parent_ip);
> > >  # define do_ftrace_record_recursion(ip, pip) do { } while (0)
> > >  #endif
> > >
> > > -#ifdef CONFIG_ARCH_WANTS_NO_INSTR
> > > +#ifdef CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING
> > >  # define trace_warn_on_no_rcu(ip) \
> > >  	({ \
> > >  		bool __ret = !rcu_is_watching();\
> >
> > BTW, maybe we can add "unlikely" in the next "if" line?
> >
> > > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> > > index 61c541c36596..19bce4e217d6 100644
> > > --- a/kernel/trace/Kconfig
> > > +++ b/kernel/trace/Kconfig
> > > @@ -974,6 +974,19 @@ config FTRACE_RECORD_RECURSION_SIZE
> > >  	  This file can be reset, but the limit can not change in
> > >  	  size at runtime.
> > >
> > > +config FTRACE_VALIDATE_RCU_IS_WATCHING
> > > +	bool "Validate RCU is on during ftrace recursion check"
> > > +	depends on FUNCTION_TRACER
> > > +	depends on ARCH_WANTS_NO_INSTR
> >
> > default y
> >
> > > +	help
> > > +	  All callbacks that attach to the function tracing have some sort
> > > +	  of protection against recursion. This option performs additional
> > > +	  checks to make sure RCU is on when ftrace callbacks recurse.
> > > +
> > > +	  This will add more overhead to all ftrace-based invocations.
> >
> > ... invocations, but keep it safe.
> >
> > > +
> > > +	  If unsure, say N
> >
> > If unsure, say Y
> >
> > Thank you,
> >
> > > +
> > >  config RING_BUFFER_RECORD_RECURSION
> > >  	bool "Record functions that recurse in the ring buffer"
> > >  	depends on FTRACE_RECORD_RECURSION
> > > --
> > > 2.43.0
> > >
> >
> >
>
Re: [PATCH net-next 0/3] trace: use TP_STORE_ADDRS macro
On Tue, Mar 26, 2024 at 9:18 PM Eric Dumazet wrote:
>
> On Tue, Mar 26, 2024 at 11:44 AM Jason Xing wrote:
>
> > Well, it's a pity that it seems that we are about to abandon this
> > method but it's not that friendly to the users who are unable to
> > deploy BPF...
>
> It is a pity these tracepoint patches are consuming a lot of reviewer
> time, just because
> some people 'can not deploy BPF'

Sure, not everyone can do this easily. The phenomenon still exists and
we cannot ignore it. Do you remember that about a month ago someone
submitted one patch introducing a new tracepoint and then I replied
to/asked you if it's necessary that we replace most of the tracepoints
with BPF? Now I realise and accept the fact...

I'll keep reviewing such patches and hope it can give you maintainers
a break. I don't mind taking some time to do it, after all it's not a
bad thing to help some people.

> > Well, I came up with more ideas about how to improve the
> > trace function in recent days. The motivation of doing this is that I
> > encountered some issues which could be traced/diagnosed by using trace
> > effortlessly without writing some bpftrace codes again and again. The
> > status of trace seems not active but many people are still using it, I
> > believe.
>
> 'Writing bpftrace codes again and again' is not a good reason to add
> maintenance costs
> to linux networking stack.

I'm just saying :)
Re: [PATCH net-next 0/3] trace: use TP_STORE_ADDRS macro
On Tue, Mar 26, 2024 at 11:44 AM Jason Xing wrote:

> Well, it's a pity that it seems that we are about to abandon this
> method but it's not that friendly to the users who are unable to
> deploy BPF...

It is a pity these tracepoint patches are consuming a lot of reviewer
time, just because some people 'can not deploy BPF'.

> Well, I came up with more ideas about how to improve the
> trace function in recent days. The motivation of doing this is that I
> encountered some issues which could be traced/diagnosed by using trace
> effortlessly without writing some bpftrace codes again and again. The
> status of trace seems not active but many people are still using it, I
> believe.

'Writing bpftrace codes again and again' is not a good reason to add
maintenance costs to the Linux networking stack.
Re: [PATCH net-next 0/3] trace: use TP_STORE_ADDRS macro
On Tue, 2024-03-26 at 18:43 +0800, Jason Xing wrote:
> On Tue, Mar 26, 2024 at 6:29 PM Paolo Abeni wrote:
> >
> > On Tue, 2024-03-26 at 12:14 +0800, Jason Xing wrote:
> > > On Mon, Mar 25, 2024 at 11:43 AM Jason Xing
> > > wrote:
> > > >
> > > > From: Jason Xing
> > > >
> > > > Using the macro for other tracepoints use to be more concise.
> > > > No functional change.
> > > >
> > > > Jason Xing (3):
> > > >   trace: move to TP_STORE_ADDRS related macro to net_probe_common.h
> > > >   trace: use TP_STORE_ADDRS() macro in inet_sk_error_report()
> > > >   trace: use TP_STORE_ADDRS() macro in inet_sock_set_state()
> > > >
> > > >  include/trace/events/net_probe_common.h | 29
> > > >  include/trace/events/sock.h | 35 -
> > >
> > > I just noticed that some trace files in include/trace directory (like
> > > net_probe_common.h, sock.h, skb.h, net.h, sock.h, udp.h, sctp.h,
> > > qdisc.h, neigh.h, napi.h, icmp.h, ...) are not owned by networking
> > > folks while some files (like tcp.h) have been maintained by specific
> > > maintainers/experts (like Eric) because they belong to one specific
> > > area. I wonder if we can get more networking guys involved in net
> > > tracing.
> > >
> > > I'm not sure if 1) we can put those files into the "NETWORKING
> > > [GENERAL]" category, or 2) we can create a new category to include
> > > them all.
> >
> > I think all the file you mentioned are not under networking because of
> > MAINTAINER file inaccuracy, and we could move there them accordingly.
>
> Yes, they are not under the networking category currently. So how
> could we move them? The MAINTAINER file doesn't have all the specific
> categories which are suitable for each of the trace files.

I think there is no need for other categories: adding explicit 'F:'
entries for such files in the NETWORKING [GENERAL] section should fit.

> > > I know people start using BPF to trace them all instead, but I can see
> > > some good advantages of those hooks implemented in the kernel, say:
> > > 1) help those machines which are not easy to use BPF tools.
> > > 2) insert the tracepoint in the middle of some functions which cannot
> > > be replaced by bpf kprobe.
> > > 3) if we have enough tracepoints, we can generate a timeline to
> > > know/detect which flow/skb spends unexpected time at which point.
> > > ...
> > > We can do many things in this area, I think :)
> > >
> > > What do you think about this, Jakub, Paolo, Eric ?
> >
> > I agree tracepoints are useful, but I think the general agreement is
> > that they are the 'old way', we should try to avoid their
> > proliferation.
>
> Well, it's a pity that it seems that we are about to abandon this
> method but it's not that friendly to the users who are unable to
> deploy BPF... Well, I came up with more ideas about how to improve the
> trace function in recent days. The motivation of doing this is that I
> encountered some issues which could be traced/diagnosed by using trace
> effortlessly without writing some bpftrace codes again and again. The
> status of trace seems not active but many people are still using it, I
> believe.

I don't think we should abandon it completely. My understanding is
that we should think carefully before adding new tracepoints and,
generally speaking, avoid adding 'too many' of them.

Cheers,

Paolo
Re: [PATCH net-next v2 3/3] tcp: add location into reset trace process
On Mon, 2024-03-25 at 14:28 +0800, Jason Xing wrote:
> From: Jason Xing
>
> In addition to knowing the 4-tuple of the flow which generates RST,
> the reason why it does so is very important because we have some
> cases where the RST should be sent and have no clue which one
> exactly.
>
> Adding location of reset process can help us more, like what
> trace_kfree_skb does.
>
> Signed-off-by: Jason Xing
> ---
>  include/trace/events/tcp.h | 14 ++
>  net/ipv4/tcp_ipv4.c | 2 +-
>  net/ipv4/tcp_output.c | 2 +-
>  net/ipv6/tcp_ipv6.c | 2 +-
>  4 files changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> index a13eb2147a02..8f6c1a07503c 100644
> --- a/include/trace/events/tcp.h
> +++ b/include/trace/events/tcp.h
> @@ -109,13 +109,17 @@ DEFINE_EVENT(tcp_event_sk_skb, tcp_retransmit_skb,
>   */
>  TRACE_EVENT(tcp_send_reset,
>
> -	TP_PROTO(const struct sock *sk, const struct sk_buff *skb),
> +	TP_PROTO(
> +		const struct sock *sk,
> +		const struct sk_buff *skb,
> +		void *location),

Very minor nit: the above lines should be aligned with the open
bracket.

No need to repost just for this, but let's wait for Eric's feedback.

Cheers,

Paolo
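The idea in the quoted patch, recording where a reset was triggered in
addition to which flow it belongs to, can be sketched in userspace C.
This is only an illustration under stated assumptions: the quoted hunk
does not show how the callers obtain the location, so using the
GCC/Clang builtin __builtin_return_address(0) is an assumption here,
and struct reset_event, trace_send_reset_stub() and send_reset_stub()
are hypothetical names, not kernel APIs.

```c
#include <stddef.h>

/* Hypothetical record of the last "reset" event; not a kernel structure. */
struct reset_event {
	const void *location;	/* code address that triggered the reset */
	unsigned int count;	/* number of events recorded so far */
};

static struct reset_event last_reset;

/* Hypothetical stand-in for a tracepoint that takes a location argument. */
static void trace_send_reset_stub(const void *location)
{
	last_reset.location = location;
	last_reset.count++;
}

/*
 * Hypothetical reset path. __builtin_return_address(0) is a GCC/Clang
 * builtin that yields the address this function will return to, that
 * is, a point inside its caller. noinline keeps that address meaningful.
 */
__attribute__((noinline))
static void send_reset_stub(void)
{
	trace_send_reset_stub(__builtin_return_address(0));
}
```

After one call to send_reset_stub(), last_reset.count is 1 and
last_reset.location holds a non-NULL code address that a symbolizer
could map back to the calling function.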
Re: [PATCH net-next 0/3] trace: use TP_STORE_ADDRS macro
Hello:

This series was applied to netdev/net-next.git (main)
by Paolo Abeni:

On Mon, 25 Mar 2024 11:43:44 +0800 you wrote:
> From: Jason Xing
>
> Using the macro for other tracepoints use to be more concise.
> No functional change.
>
> Jason Xing (3):
>   trace: move to TP_STORE_ADDRS related macro to net_probe_common.h
>   trace: use TP_STORE_ADDRS() macro in inet_sk_error_report()
>   trace: use TP_STORE_ADDRS() macro in inet_sock_set_state()
>
> [...]

Here is the summary with links:
  - [net-next,1/3] trace: move to TP_STORE_ADDRS related macro to net_probe_common.h
    https://git.kernel.org/netdev/net-next/c/b3af9045b482
  - [net-next,2/3] trace: use TP_STORE_ADDRS() macro in inet_sk_error_report()
    https://git.kernel.org/netdev/net-next/c/a24c855a5ef2
  - [net-next,3/3] trace: use TP_STORE_ADDRS() macro in inet_sock_set_state()
    https://git.kernel.org/netdev/net-next/c/646700ce23f4

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
Re: [PATCH net-next 0/3] trace: use TP_STORE_ADDRS macro
On Tue, Mar 26, 2024 at 6:29 PM Paolo Abeni wrote:
>
> On Tue, 2024-03-26 at 12:14 +0800, Jason Xing wrote:
> > On Mon, Mar 25, 2024 at 11:43 AM Jason Xing
> > wrote:
> > >
> > > From: Jason Xing
> > >
> > > Using the macro for other tracepoints use to be more concise.
> > > No functional change.
> > >
> > > Jason Xing (3):
> > >   trace: move to TP_STORE_ADDRS related macro to net_probe_common.h
> > >   trace: use TP_STORE_ADDRS() macro in inet_sk_error_report()
> > >   trace: use TP_STORE_ADDRS() macro in inet_sock_set_state()
> > >
> > >  include/trace/events/net_probe_common.h | 29
> > >  include/trace/events/sock.h | 35 -
> >
> > I just noticed that some trace files in include/trace directory (like
> > net_probe_common.h, sock.h, skb.h, net.h, sock.h, udp.h, sctp.h,
> > qdisc.h, neigh.h, napi.h, icmp.h, ...) are not owned by networking
> > folks while some files (like tcp.h) have been maintained by specific
> > maintainers/experts (like Eric) because they belong to one specific
> > area. I wonder if we can get more networking guys involved in net
> > tracing.
> >
> > I'm not sure if 1) we can put those files into the "NETWORKING
> > [GENERAL]" category, or 2) we can create a new category to include
> > them all.
>
> I think all the file you mentioned are not under networking because of
> MAINTAINER file inaccuracy, and we could move there them accordingly.

Yes, they are not under the networking category currently. So how
could we move them? The MAINTAINER file doesn't have all the specific
categories which are suitable for each of the trace files.

> >
> > I know people start using BPF to trace them all instead, but I can see
> > some good advantages of those hooks implemented in the kernel, say:
> > 1) help those machines which are not easy to use BPF tools.
> > 2) insert the tracepoint in the middle of some functions which cannot
> > be replaced by bpf kprobe.
> > 3) if we have enough tracepoints, we can generate a timeline to
> > know/detect which flow/skb spends unexpected time at which point.
> > ...
> > We can do many things in this area, I think :)
> >
> > What do you think about this, Jakub, Paolo, Eric ?
>
> I agree tracepoints are useful, but I think the general agreement is
> that they are the 'old way', we should try to avoid their
> proliferation.

Well, it's a pity that it seems that we are about to abandon this
method but it's not that friendly to the users who are unable to
deploy BPF... Well, I came up with more ideas about how to improve the
trace function in recent days. The motivation of doing this is that I
encountered some issues which could be traced/diagnosed by using trace
effortlessly without writing some bpftrace codes again and again. The
status of trace seems not active but many people are still using it, I
believe.

Thanks,
Jason

>
> Cheers,
>
> Paolo
>
Re: [PATCH net-next 0/3] trace: use TP_STORE_ADDRS macro
On Tue, 2024-03-26 at 12:14 +0800, Jason Xing wrote:
> On Mon, Mar 25, 2024 at 11:43 AM Jason Xing wrote:
> >
> > From: Jason Xing
> >
> > Using the macro for other tracepoints use to be more concise.
> > No functional change.
> >
> > Jason Xing (3):
> >   trace: move to TP_STORE_ADDRS related macro to net_probe_common.h
> >   trace: use TP_STORE_ADDRS() macro in inet_sk_error_report()
> >   trace: use TP_STORE_ADDRS() macro in inet_sock_set_state()
> >
> >  include/trace/events/net_probe_common.h | 29
> >  include/trace/events/sock.h | 35 -
>
> I just noticed that some trace files in include/trace directory (like
> net_probe_common.h, sock.h, skb.h, net.h, sock.h, udp.h, sctp.h,
> qdisc.h, neigh.h, napi.h, icmp.h, ...) are not owned by networking
> folks while some files (like tcp.h) have been maintained by specific
> maintainers/experts (like Eric) because they belong to one specific
> area. I wonder if we can get more networking guys involved in net
> tracing.
>
> I'm not sure if 1) we can put those files into the "NETWORKING
> [GENERAL]" category, or 2) we can create a new category to include
> them all.

I think all the files you mentioned are not under networking because
of MAINTAINERS file inaccuracy, and we could move them there
accordingly.

>
> I know people start using BPF to trace them all instead, but I can see
> some good advantages of those hooks implemented in the kernel, say:
> 1) help those machines which are not easy to use BPF tools.
> 2) insert the tracepoint in the middle of some functions which cannot
> be replaced by bpf kprobe.
> 3) if we have enough tracepoints, we can generate a timeline to
> know/detect which flow/skb spends unexpected time at which point.
> ...
> We can do many things in this area, I think :)
>
> What do you think about this, Jakub, Paolo, Eric ?

I agree tracepoints are useful, but I think the general agreement is
that they are the 'old way'; we should try to avoid their
proliferation.

Cheers,

Paolo