On Sun,  5 Aug 2018 20:44:37 -0700
"Joel Fernandes (Google)" <j...@joelfernandes.org> wrote:

> Commit f37755490fe9 ("tracepoints: Do not trace when cpu is offline")
> causes a problem for lockdep using tracepoint code. Once a CPU is
> offline, tracepoints donot get called, however this causes a big problem
> for lockdep probes that need to run so that IRQ annotations are marked
> correctly.
> 
> An issue is possible where while the CPU is going offline, an interrupt
> can come in and then a lockdep assert causes an annotation warning:
> 
> [  106.551354] IRQs not enabled as expected
> [  106.551785] WARNING: CPU: 1 PID: 0 at kernel/time/tick-sched.c:982
>                                          tick_nohz_idle_enter+0x99/0xb0
> [  106.552964] Modules linked in:
> [  106.553299] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W
> 
> We need tracepoints to run as late as possible. This commit tries to fix
> the issue by removing the cpu_online check in tracepoint code that was
> introduced in the mentioned commit, however now we run the risk of
> running dereferencing probes that aren't RCU protected, which gives an
> RCU warning like so on boot up:
> [    0.030159] x86: Booting SMP configuration:
> [    0.030169] .... node  #0, CPUs:      #1
> [    0.001000]
> [    0.001000] =============================
> [    0.001000] WARNING: suspicious RCU usage
> [    0.001000] 4.18.0-rc6+ #42 Not tainted
> [    0.001000] -----------------------------
> [    0.001000] ./include/trace/events/timer.h:38 suspicious
>                               rcu_dereference_check() usage!
> [    0.001000]
> [    0.001000] other info that might help us debug this:
> [    0.001000]
> [    0.001000]
> [    0.001000] RCU used illegally from offline CPU!
> [    0.001000] rcu_scheduler_active = 1, debug_locks = 1
> [    0.001000] no locks held by swapper/1/0.
> [    0.001000]
> 
> Any ideas on how we can fix this? Basically we need RCU to work here
> even after !cpu_online. I thought of just using SRCU for all tracepoints
> however that may mean we can't use tracepoints from NMI..
> 
> Tries-to-Fix: c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and
>                              unify their usage")
> Reported-by: Masami Hiramatsu <mhira...@kernel.org>
> Signed-off-by: Joel Fernandes (Google) <j...@joelfernandes.org>
> ---
>  include/linux/tracepoint.h | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index d9a084c72541..020885714a0f 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -365,19 +365,17 @@ extern void syscall_unregfunc(void);
>   * "void *__data, proto" as the callback prototype.
>   */
>  #define DECLARE_TRACE_NOARGS(name)                                   \
> -     __DECLARE_TRACE(name, void, ,                                   \
> -                     cpu_online(raw_smp_processor_id()),             \
> +     __DECLARE_TRACE(name, void, , 1,                                \
>                       void *__data, __data)
>  
>  #define DECLARE_TRACE(name, proto, args)                             \
> -     __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),              \
> -                     cpu_online(raw_smp_processor_id()),             \
> +     __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), 1,           \
>                       PARAMS(void *__data, proto),                    \
>                       PARAMS(__data, args))
>  
>  #define DECLARE_TRACE_CONDITION(name, proto, args, cond)             \
>       __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),              \
> -                     cpu_online(raw_smp_processor_id()) && (PARAMS(cond)), \
> +                     PARAMS(cond),                                   \
>                       PARAMS(void *__data, proto),                    \
>                       PARAMS(__data, args))
>  

Actually, I took a look at it too, and came up with this patch. Can you
test it out? The difference is that it doesn't stop the _rcuidle
version of the trace events to not be called by "cpu_online".

Which brings up another question. Can srcu work on cpu offlined CPUs?

-- Steve

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index d9a084c72541..4aba6c807d28 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -211,8 +211,11 @@ extern void syscall_unregfunc(void);
                        __DO_TRACE(&__tracepoint_##name,                \
                                TP_PROTO(data_proto),                   \
                                TP_ARGS(data_args),                     \
+                               cpu_online(raw_smp_processor_id()) &&   \
                                TP_CONDITION(cond), 0);                 \
-               if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {             \
+               if (IS_ENABLED(CONFIG_LOCKDEP) &&                       \
+                   cpu_online(raw_smp_processor_id()) &&               \
+                   (cond)) {                                           \
                        rcu_read_lock_sched_notrace();                  \
                        rcu_dereference_sched(__tracepoint_##name.funcs);\
                        rcu_read_unlock_sched_notrace();                \
@@ -365,19 +368,17 @@ extern void syscall_unregfunc(void);
  * "void *__data, proto" as the callback prototype.
  */
 #define DECLARE_TRACE_NOARGS(name)                                     \
-       __DECLARE_TRACE(name, void, ,                                   \
-                       cpu_online(raw_smp_processor_id()),             \
+       __DECLARE_TRACE(name, void, , 1,                                \
                        void *__data, __data)
 
 #define DECLARE_TRACE(name, proto, args)                               \
-       __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),              \
-                       cpu_online(raw_smp_processor_id()),             \
+       __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), 1,           \
                        PARAMS(void *__data, proto),                    \
                        PARAMS(__data, args))
 
 #define DECLARE_TRACE_CONDITION(name, proto, args, cond)               \
        __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),              \
-                       cpu_online(raw_smp_processor_id()) && (PARAMS(cond)), \
+                       PARAMS(cond),                                   \
                        PARAMS(void *__data, proto),                    \
                        PARAMS(__data, args))
 

Reply via email to