Checked with Jun: the scenario is more like this: rcu_start_this_gp() runs after ___swait_event() has checked the condition but before schedule():

    rcu_gp_kthread()
      -> swait_event_idle_exclusive()
        -> __swait_event_idle()
          -> ___swait_event()
            -> schedule()
                        <-- rcu_start_this_gp() runs in this window, and
                            rcu_gp_kthread_wake() skips the wakeup because
                            it is called from rcu_gp_kthread's context
then rcu_gp_kthread will sleep and can't wake up. Jun's patch can work around it; what are your ideas?

-----Original Message-----
From: Zhang, Jun
Sent: Tuesday, December 18, 2018 10:47 AM
To: He, Bo <bo...@intel.com>; paul...@linux.ibm.com
Cc: Steven Rostedt <rost...@goodmis.org>; linux-kernel@vger.kernel.org; j...@joshtriplett.org; mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Xiao, Jin <jin.x...@intel.com>; Zhang, Yanmin <yanmin.zh...@intel.com>; Bai, Jie A <jie.a....@intel.com>; Sun, Yi J <yi.j....@intel.com>; Chang, Junxiao <junxiao.ch...@intel.com>; Mei, Paul <paul....@intel.com>
Subject: RE: rcu_preempt caused oom

Hello, Paul

In softirq context, with current being rcu_preempt-10, rcu_gp_kthread_wake() doesn't wake up rcu_preempt. Maybe the next patch could fix it. Please help review.

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1..98f5b40 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1697,7 +1697,7 @@ static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
  */
 static void rcu_gp_kthread_wake(struct rcu_state *rsp)
 {
-	if (current == rsp->gp_kthread ||
+	if (((current == rsp->gp_kthread) && !in_softirq()) ||
 	    !READ_ONCE(rsp->gp_flags) || !rsp->gp_kthread)
 		return;

[44932.311439, 0][   rcu_preempt] rcu_preempt-10 [001] .n.. 44929.401037: rcu_grace_period: rcu_preempt 19063548 reqwait
......
[44932.311517, 0][   rcu_preempt] rcu_preempt-10 [001] d.s2 44929.402234: rcu_future_grace_period: rcu_preempt 19063548 19063552 0 0 3 Startleaf
[44932.311536, 0][   rcu_preempt] rcu_preempt-10 [001] d.s2 44929.402237: rcu_future_grace_period: rcu_preempt 19063548 19063552 0 0 3 Startedroot

-----Original Message-----
From: He, Bo
Sent: Tuesday, December 18, 2018 07:16
To: paul...@linux.ibm.com
Cc: Zhang, Jun <jun.zh...@intel.com>; Steven Rostedt <rost...@goodmis.org>; linux-kernel@vger.kernel.org; j...@joshtriplett.org; mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Xiao, Jin <jin.x...@intel.com>; Zhang, Yanmin <yanmin.zh...@intel.com>; Bai, Jie A <jie.a....@intel.com>; Sun, Yi J <yi.j....@intel.com>; Chang, Junxiao <junxiao.ch...@intel.com>; Mei, Paul <paul....@intel.com>
Subject: RE: rcu_preempt caused oom

Thanks for your comments. The issue does hit the panic with the "if (ret == 1)" change; the logs are enclosed.

-----Original Message-----
From: Paul E. McKenney <paul...@linux.ibm.com>
Sent: Monday, December 17, 2018 12:26 PM
To: He, Bo <bo...@intel.com>
Cc: Zhang, Jun <jun.zh...@intel.com>; Steven Rostedt <rost...@goodmis.org>; linux-kernel@vger.kernel.org; j...@joshtriplett.org; mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Xiao, Jin <jin.x...@intel.com>; Zhang, Yanmin <yanmin.zh...@intel.com>; Bai, Jie A <jie.a....@intel.com>; Sun, Yi J <yi.j....@intel.com>; Chang, Junxiao <junxiao.ch...@intel.com>; Mei, Paul <paul....@intel.com>
Subject: Re: rcu_preempt caused oom

On Mon, Dec 17, 2018 at 03:15:42AM +0000, He, Bo wrote:
> To double-confirm that the issue does not reproduce, we tried adding only
> the enclosed patch on top of the easily-reproduced build; the issue was
> not reproduced after 63 hours over the whole weekend on 16 boards (90
> hours in total).  So the current conclusion is that the debug patch has
> an extreme effect on the RCU issue.

This is not a surprise.  (Please see the end of this email for a replacement patch that won't suppress the bug.)
To see why this is not a surprise, let's take a closer look at your patch, in light of the comment header for wait_event_idle_timeout_exclusive():

 * Returns:
 * 0 if the @condition evaluated to %false after the @timeout elapsed,
 * 1 if the @condition evaluated to %true after the @timeout elapsed,
 * or the remaining jiffies (at least 1) if the @condition evaluated
 * to %true before the @timeout elapsed.

The situation we are seeing is that the RCU_GP_FLAG_INIT is set, but the rcu_preempt task does not wake up.  This would correspond to the second case above, that is, a return value of 1.

Looking now at your patch, with comments interspersed below:

------------------------------------------------------------------------

From e8b583aa685b3b4f304f72398a80461bba09389c Mon Sep 17 00:00:00 2001
From: "he, bo" <bo...@intel.com>
Date: Sun, 9 Dec 2018 18:11:33 +0800
Subject: [PATCH] rcu: detect the preempt_rcu hang for triage jing's board

Change-Id: I2ffceec2ae4847867753609e45c99afc66956003
Tracked-On:
Signed-off-by: he, bo <bo...@intel.com>
---
 kernel/rcu/tree.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 78c0cf2..d6de363 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2192,8 +2192,13 @@ static int __noreturn rcu_gp_kthread(void *arg)
 	int ret;
 	struct rcu_state *rsp = arg;
 	struct rcu_node *rnp = rcu_get_root(rsp);
+	pid_t rcu_preempt_pid;

 	rcu_bind_gp_kthread();
+	if(!strcmp(rsp->name, "rcu_preempt")) {
+		rcu_preempt_pid = rsp->gp_kthread->pid;
+	}
+
 	for (;;) {

 		/* Handle grace-period start. */
@@ -2202,8 +2207,19 @@ static int __noreturn rcu_gp_kthread(void *arg)
 				       READ_ONCE(rsp->gp_seq), TPS("reqwait"));
 		rsp->gp_state = RCU_GP_WAIT_GPS;
-		swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
-					    RCU_GP_FLAG_INIT);
+		if (current->pid != rcu_preempt_pid) {
+			swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+					    RCU_GP_FLAG_INIT);
+		} else {
+			ret = swait_event_idle_timeout_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+					    RCU_GP_FLAG_INIT, 2*HZ);
+
+			if(!ret) {

We get here if ret==0.  Therefore, the above "if" statement needs to instead be "if (ret == 1) {".  In addition, in order to get event traces dumped, we also need:

	rcu_ftrace_dump(DUMP_ALL);

+				show_rcu_gp_kthreads();
+				panic("hung_task: blocked in rcu_gp_kthread init");
+			}
+		}
+
 		rsp->gp_state = RCU_GP_DONE_GPS;
 		/* Locking provides needed memory barrier. */
 		if (rcu_gp_init(rsp))
--
2.7.4

------------------------------------------------------------------------

So, again, please change the "if(!ret) {" to "if (ret == 1) {", and please add "rcu_ftrace_dump(DUMP_ALL);" right after this "if" statement, as shown above.  With that change, I bet that you will again see failures.

> By comparison, swait_event_idle_timeout_exclusive() checks the condition
> 3 times, while swait_event_idle_exclusive() checks the condition 2 times.
>
> So today I will do another experiment, with only the change below:
> -		swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> -					    RCU_GP_FLAG_INIT);
> +		ret = swait_event_idle_timeout_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> +					    RCU_GP_FLAG_INIT, MAX_SCHEDULE_TIMEOUT);
> +
>
> Can you get some clues from the experiment?

Again, please instead make the changes that I called out above, with the replacement for your patch 0001 shown below.

							Thanx, Paul

PS.  I have been testing for quite some time, but am still unable to reproduce this.  So we must depend on you to reproduce it.
------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1369f7..86152af1a580 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2153,8 +2153,13 @@ static int __noreturn rcu_gp_kthread(void *arg)
 	int ret;
 	struct rcu_state *rsp = arg;
 	struct rcu_node *rnp = rcu_get_root(rsp);
+	pid_t rcu_preempt_pid;

 	rcu_bind_gp_kthread();
+	if(!strcmp(rsp->name, "rcu_preempt")) {
+		rcu_preempt_pid = rsp->gp_kthread->pid;
+	}
+
 	for (;;) {

 		/* Handle grace-period start. */
@@ -2163,8 +2168,20 @@ static int __noreturn rcu_gp_kthread(void *arg)
 				       READ_ONCE(rsp->gp_seq), TPS("reqwait"));
 		rsp->gp_state = RCU_GP_WAIT_GPS;
-		swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
-					    RCU_GP_FLAG_INIT);
+		if (current->pid != rcu_preempt_pid) {
+			swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+					    RCU_GP_FLAG_INIT);
+		} else {
+			ret = swait_event_idle_timeout_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+					    RCU_GP_FLAG_INIT, 2*HZ);
+
+			if (ret == 1) {
+				rcu_ftrace_dump(DUMP_ALL);
+				show_rcu_gp_kthreads();
+				panic("hung_task: blocked in rcu_gp_kthread init");
+			}
+		}
+
 		rsp->gp_state = RCU_GP_DONE_GPS;
 		/* Locking provides needed memory barrier. */
 		if (rcu_gp_init(rsp))