Checked with Jun: the scenario is more like this: rcu_start_this_gp() runs after ___swait_event() has checked the condition but before schedule():

    rcu_gp_kthread()
      -> swait_event_idle_exclusive()
        -> __swait_event_idle()
          -> ___swait_event()
            -> schedule()
                        <-- rcu_start_this_gp() runs in this window, and
                            rcu_gp_kthread_wake() skips the wakeup because
                            it is called from rcu_gp_kthread's context
then rcu_gp_kthread will sleep and can't wake up. Jun's patch can work around it; what are your ideas?

-----Original Message-----
From: Zhang, Jun
Sent: Tuesday, December 18, 2018 10:47 AM
To: He, Bo <bo...@intel.com>; paul...@linux.ibm.com
Cc: Steven Rostedt <rost...@goodmis.org>; linux-kernel@vger.kernel.org; j...@joshtriplett.org; mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Xiao, Jin <jin.x...@intel.com>; Zhang, Yanmin <yanmin.zh...@intel.com>; Bai, Jie A <jie.a....@intel.com>; Sun, Yi J <yi.j....@intel.com>; Chang, Junxiao <junxiao.ch...@intel.com>; Mei, Paul <paul....@intel.com>
Subject: RE: rcu_preempt caused oom

Hello, Paul

In softirq context, with current being rcu_preempt-10, rcu_gp_kthread_wake() doesn't wake up rcu_preempt. Maybe the next patch could fix it. Please help review.

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1..98f5b40 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1697,7 +1697,7 @@ static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
  */
 static void rcu_gp_kthread_wake(struct rcu_state *rsp)
 {
-	if (current == rsp->gp_kthread ||
+	if (((current == rsp->gp_kthread) && !in_softirq()) ||
 	    !READ_ONCE(rsp->gp_flags) || !rsp->gp_kthread)
 		return;

[44932.311439, 0][   rcu_preempt] rcu_preempt-10 [001] .n.. 44929.401037: rcu_grace_period: rcu_preempt 19063548 reqwait
......
[44932.311517, 0][   rcu_preempt] rcu_preempt-10 [001] d.s2 44929.402234: rcu_future_grace_period: rcu_preempt 19063548 19063552 0 0 3 Startleaf
[44932.311536, 0][   rcu_preempt] rcu_preempt-10 [001] d.s2 44929.402237: rcu_future_grace_period: rcu_preempt 19063548 19063552 0 0 3 Startedroot

-----Original Message-----
From: He, Bo
Sent: Tuesday, December 18, 2018 07:16
To: paul...@linux.ibm.com
Cc: Zhang, Jun <jun.zh...@intel.com>; Steven Rostedt <rost...@goodmis.org>; linux-kernel@vger.kernel.org; j...@joshtriplett.org; mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Xiao, Jin <jin.x...@intel.com>; Zhang, Yanmin <yanmin.zh...@intel.com>; Bai, Jie A <jie.a....@intel.com>; Sun, Yi J <yi.j....@intel.com>; Chang, Junxiao <junxiao.ch...@intel.com>; Mei, Paul <paul....@intel.com>
Subject: RE: rcu_preempt caused oom

Thanks for your comments. The issue does hit the panic with the "if (ret == 1)" change; the logs are enclosed.

-----Original Message-----
From: Paul E. McKenney <paul...@linux.ibm.com>
Sent: Monday, December 17, 2018 12:26 PM
To: He, Bo <bo...@intel.com>
Cc: Zhang, Jun <jun.zh...@intel.com>; Steven Rostedt <rost...@goodmis.org>; linux-kernel@vger.kernel.org; j...@joshtriplett.org; mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Xiao, Jin <jin.x...@intel.com>; Zhang, Yanmin <yanmin.zh...@intel.com>; Bai, Jie A <jie.a....@intel.com>; Sun, Yi J <yi.j....@intel.com>; Chang, Junxiao <junxiao.ch...@intel.com>; Mei, Paul <paul....@intel.com>
Subject: Re: rcu_preempt caused oom

On Mon, Dec 17, 2018 at 03:15:42AM +0000, He, Bo wrote:
> To double-confirm that the issue does not reproduce, we tried adding only
> the enclosed patch on top of the easily-reproduced build; the issue was
> not reproduced after 63 hours over the whole weekend on 16 boards (90
> hours in total).  So the current conclusion is that the debug patch has
> an extreme effect on the RCU issue.

This is not a surprise.  (Please see the end of this email for a replacement patch that won't suppress the bug.)
To see why this is not a surprise, let's take a closer look at your patch, in light of the comment header for wait_event_idle_timeout_exclusive():

 * Returns:
 * 0 if the @condition evaluated to %false after the @timeout elapsed,
 * 1 if the @condition evaluated to %true after the @timeout elapsed,
 * or the remaining jiffies (at least 1) if the @condition evaluated
 * to %true before the @timeout elapsed.

The situation we are seeing is that the RCU_GP_FLAG_INIT is set, but the rcu_preempt task does not wake up.  This would correspond to the second case above, that is, a return value of 1.

Looking now at your patch, with comments interspersed below:

------------------------------------------------------------------------

From e8b583aa685b3b4f304f72398a80461bba09389c Mon Sep 17 00:00:00 2001
From: "he, bo" <bo...@intel.com>
Date: Sun, 9 Dec 2018 18:11:33 +0800
Subject: [PATCH] rcu: detect the preempt_rcu hang for triage jing's board

Change-Id: I2ffceec2ae4847867753609e45c99afc66956003
Tracked-On:
Signed-off-by: he, bo <bo...@intel.com>
---
 kernel/rcu/tree.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 78c0cf2..d6de363 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2192,8 +2192,13 @@ static int __noreturn rcu_gp_kthread(void *arg)
 	int ret;
 	struct rcu_state *rsp = arg;
 	struct rcu_node *rnp = rcu_get_root(rsp);
+	pid_t rcu_preempt_pid;

 	rcu_bind_gp_kthread();
+	if(!strcmp(rsp->name, "rcu_preempt")) {
+		rcu_preempt_pid = rsp->gp_kthread->pid;
+	}
+
 	for (;;) {

 		/* Handle grace-period start. */
@@ -2202,8 +2207,19 @@ static int __noreturn rcu_gp_kthread(void *arg)
 				       READ_ONCE(rsp->gp_seq), TPS("reqwait"));
 		rsp->gp_state = RCU_GP_WAIT_GPS;
-		swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
-					    RCU_GP_FLAG_INIT);
+		if (current->pid != rcu_preempt_pid) {
+			swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+					    RCU_GP_FLAG_INIT);
+		} else {
+			ret = swait_event_idle_timeout_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+					    RCU_GP_FLAG_INIT, 2*HZ);
+
+			if(!ret) {

We get here if ret==0.  Therefore, the above "if" statement needs to instead be "if (ret == 1) {".  In addition, in order to get event traces dumped, we also need:

	rcu_ftrace_dump(DUMP_ALL);

+				show_rcu_gp_kthreads();
+				panic("hung_task: blocked in rcu_gp_kthread init");
+			}
+		}
+
 		rsp->gp_state = RCU_GP_DONE_GPS;
 		/* Locking provides needed memory barrier. */
 		if (rcu_gp_init(rsp))
--
2.7.4

------------------------------------------------------------------------

So, again, please change the "if(!ret) {" to "if (ret == 1) {", and please add "rcu_ftrace_dump(DUMP_ALL);" right after this "if" statement, as shown above.  With that change, I bet that you will again see failures.

> By comparison, swait_event_idle_timeout_exclusive() checks the condition
> 3 times, while swait_event_idle_exclusive() checks the condition 2 times.
>
> So today I will do another experiment, with only the change below:
> -		swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> -					    RCU_GP_FLAG_INIT);
> +		ret = swait_event_idle_timeout_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> +					    RCU_GP_FLAG_INIT, MAX_SCHEDULE_TIMEOUT);
> +
>
> Can you get some clues from the experiment?

Again, please instead make the changes that I called out above, with the replacement for your patch 0001 shown below.

							Thanx, Paul

PS.  I have been testing for quite some time, but am still unable to reproduce this.  So we must depend on you to reproduce it.
------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1369f7..86152af1a580 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2153,8 +2153,13 @@ static int __noreturn rcu_gp_kthread(void *arg)
 	int ret;
 	struct rcu_state *rsp = arg;
 	struct rcu_node *rnp = rcu_get_root(rsp);
+	pid_t rcu_preempt_pid;

 	rcu_bind_gp_kthread();
+	if(!strcmp(rsp->name, "rcu_preempt")) {
+		rcu_preempt_pid = rsp->gp_kthread->pid;
+	}
+
 	for (;;) {

 		/* Handle grace-period start. */
@@ -2163,8 +2168,20 @@ static int __noreturn rcu_gp_kthread(void *arg)
 				       READ_ONCE(rsp->gp_seq), TPS("reqwait"));
 		rsp->gp_state = RCU_GP_WAIT_GPS;
-		swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
-					    RCU_GP_FLAG_INIT);
+		if (current->pid != rcu_preempt_pid) {
+			swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+					    RCU_GP_FLAG_INIT);
+		} else {
+			ret = swait_event_idle_timeout_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+					    RCU_GP_FLAG_INIT, 2*HZ);
+
+			if (ret == 1) {
+				rcu_ftrace_dump(DUMP_ALL);
+				show_rcu_gp_kthreads();
+				panic("hung_task: blocked in rcu_gp_kthread init");
+			}
+		}
+
 		rsp->gp_state = RCU_GP_DONE_GPS;
 		/* Locking provides needed memory barrier. */
 		if (rcu_gp_init(rsp))