Re: rcu_preempt caused oom

2018-12-17 Thread Paul E. McKenney
On Tue, Dec 18, 2018 at 02:46:43AM +, Zhang, Jun wrote:
> Hello, paul
> 
> In softirq context, when current is rcu_preempt-10, rcu_gp_kthread_wake() doesn't
> wake up rcu_preempt.
> Maybe the next patch could fix it. Please help review.
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 0b760c1..98f5b40 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1697,7 +1697,7 @@ static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
>   */
>  static void rcu_gp_kthread_wake(struct rcu_state *rsp)
>  {
> -   if (current == rsp->gp_kthread ||
> +   if (((current == rsp->gp_kthread) && !in_softirq()) ||

Close, but not quite.  Please see below.

> !READ_ONCE(rsp->gp_flags) ||
> !rsp->gp_kthread)
> return;
> 
> [44932.311439, 0][ rcu_preempt]  rcu_preempt-10[001] .n.. 44929.401037: rcu_grace_period: rcu_preempt 19063548 reqwait
> ..
> [44932.311517, 0][ rcu_preempt]  rcu_preempt-10[001] d.s2 44929.402234: rcu_future_grace_period: rcu_preempt 19063548 19063552 0 0 3 Startleaf
> [44932.311536, 0][ rcu_preempt]  rcu_preempt-10[001] d.s2 44929.402237: rcu_future_grace_period: rcu_preempt 19063548 19063552 0 0 3 Startedroot

Good catch!  If the rcu_preempt kthread had just entered the function
swait_event_idle_exclusive(), which had just called __swait_event_idle()
which had just called ___swait_event(), which had just gotten done
checking the "condition", then yes, the rcu_preempt kthread could
sleep forever.  This is a very narrow race window, but that matches
your experience with its not happening often -- and my experience with
it not happening at all.

However, for this to happen, the wakeup must happen within a softirq
handler that executes upon return from an interrupt that interrupted
___swait_event() just after the "if (condition)".  For this, we don't want
in_softirq() but rather in_serving_softirq(), as shown in the patch below.
The patch you have above could result in spurious wakeups, as it is
checking for bottom halves being disabled, not just executing within a
softirq handler.  Which might be better than not having enough wakeups,
but let's please try for just the right number.  ;-)

So could you please instead test the patch below?

And if it works, could I please have your Signed-off-by so that I can
queue it?  My patch is quite clearly derived from yours, after all!
And you should get credit for finding the problem and arriving at an
approximate fix, after all.

Thanx, Paul



diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e9392a9d6291..b9205b40b621 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1722,7 +1722,7 @@ static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
  */
 static void rcu_gp_kthread_wake(struct rcu_state *rsp)
 {
-   if (current == rsp->gp_kthread ||
+   if ((current == rsp->gp_kthread && !in_serving_softirq()) ||
!READ_ONCE(rsp->gp_flags) ||
!rsp->gp_kthread)
return;



RE: rcu_preempt caused oom

2018-12-17 Thread He, Bo
check with Jun:
the scenario is more like:

rcu_gp_kthread --> swait_event_idle_exclusive --> __swait_event_idle --> ___swait_event --> schedule
	@@@ rcu_start_this_gp starts after ___swait_event checks the condition but before schedule
	@@@ rcu_gp_kthread_wake skips the wakeup because it runs in rcu_gp_kthread's own context

then rcu_gp_kthread will sleep and can't wake up.

Jun's patch can work around it; what are your ideas?


-Original Message-
From: He, Bo
Sent: Tuesday, December 18, 2018 07:16
Subject: RE: rcu_preempt caused oom

Thanks for your comments; with the "if (ret == 1)" change, the issue does panic. The logs are enclosed.

RE: rcu_preempt caused oom

2018-12-17 Thread Zhang, Jun
Hello, paul

In softirq context, when current is rcu_preempt-10, rcu_gp_kthread_wake() doesn't
wake up rcu_preempt.
Maybe the next patch could fix it. Please help review.

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1..98f5b40 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1697,7 +1697,7 @@ static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
  */
 static void rcu_gp_kthread_wake(struct rcu_state *rsp)
 {
-   if (current == rsp->gp_kthread ||
+   if (((current == rsp->gp_kthread) && !in_softirq()) ||
!READ_ONCE(rsp->gp_flags) ||
!rsp->gp_kthread)
return;

[44932.311439, 0][ rcu_preempt]  rcu_preempt-10[001] .n.. 44929.401037: rcu_grace_period: rcu_preempt 19063548 reqwait
..
[44932.311517, 0][ rcu_preempt]  rcu_preempt-10[001] d.s2 44929.402234: rcu_future_grace_period: rcu_preempt 19063548 19063552 0 0 3 Startleaf
[44932.311536, 0][ rcu_preempt]  rcu_preempt-10[001] d.s2 44929.402237: rcu_future_grace_period: rcu_preempt 19063548 19063552 0 0 3 Startedroot



Re: rcu_preempt caused oom

2018-12-16 Thread Paul E. McKenney
On Mon, Dec 17, 2018 at 03:15:42AM +, He, Bo wrote:
> to double-confirm: the issue did not reproduce after 90 hours. We tried adding
> only the enclosed patch on the easily-reproduced build, and the issue was not
> reproduced after 63 hours over the whole weekend on 16 boards.
> so the current conclusion is that the debug patch has an extreme effect on the rcu issue.

This is not a surprise.  (Please see the end of this email for a
replacement patch that won't suppress the bug.)

To see why this is not a surprise, let's take a closer look at your patch,
in light of the comment header for wait_event_idle_timeout_exclusive():

 * Returns:
 * 0 if the @condition evaluated to %false after the @timeout elapsed,
 * 1 if the @condition evaluated to %true after the @timeout elapsed,
 * or the remaining jiffies (at least 1) if the @condition evaluated
 * to %true before the @timeout elapsed.

The situation we are seeing is that the RCU_GP_FLAG_INIT is set, but
the rcu_preempt task does not wake up.  This would correspond to
the second case above, that is, a return value of 1.  Looking now
at your patch, with comments interspersed below:



>From e8b583aa685b3b4f304f72398a80461bba09389c Mon Sep 17 00:00:00 2001
From: "he, bo" 
Date: Sun, 9 Dec 2018 18:11:33 +0800
Subject: [PATCH] rcu: detect the preempt_rcu hang for triage jing's board

Change-Id: I2ffceec2ae4847867753609e45c99afc66956003
Tracked-On:
Signed-off-by: he, bo 
---
 kernel/rcu/tree.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 78c0cf2..d6de363 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2192,8 +2192,13 @@ static int __noreturn rcu_gp_kthread(void *arg)
int ret;
struct rcu_state *rsp = arg;
struct rcu_node *rnp = rcu_get_root(rsp);
+   pid_t rcu_preempt_pid;
 
rcu_bind_gp_kthread();
+   if(!strcmp(rsp->name, "rcu_preempt")) {
+   rcu_preempt_pid = rsp->gp_kthread->pid;
+   }
+
for (;;) {
 
/* Handle grace-period start. */
@@ -2202,8 +2207,19 @@ static int __noreturn rcu_gp_kthread(void *arg)
   READ_ONCE(rsp->gp_seq),
   TPS("reqwait"));
rsp->gp_state = RCU_GP_WAIT_GPS;
-		swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
-					   RCU_GP_FLAG_INIT);
+		if (current->pid != rcu_preempt_pid) {
+			swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+						   RCU_GP_FLAG_INIT);
+		} else {
+			ret = swait_event_idle_timeout_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+								 RCU_GP_FLAG_INIT, 2*HZ);
+
+			if(!ret) {

We get here if ret==0.  Therefore, the above "if" statement needs to
instead be "if (ret == 1) {".

In addition, in order to get event traces dumped, we also need:

rcu_ftrace_dump(DUMP_ALL);

+				show_rcu_gp_kthreads();
+				panic("hung_task: blocked in rcu_gp_kthread init");
+			}
+		}
+
rsp->gp_state = RCU_GP_DONE_GPS;
/* Locking provides needed memory barrier. */
if (rcu_gp_init(rsp))
-- 
2.7.4



So, again, please change the "if(!ret) {" to "if (ret == 1) {", and
please add "rcu_ftrace_dump(DUMP_ALL);" right after this "if" statement,
as shown above.

With that change, I bet that you will again see failures.

> By comparison, swait_event_idle_timeout_exclusive() checks the condition 3
> times, while swait_event_idle_exclusive() checks it 2 times.
> 
> so today I will do another experiment, with only this change:
> -		swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> -					   RCU_GP_FLAG_INIT);
> +		ret = swait_event_idle_timeout_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> +					   RCU_GP_FLAG_INIT, MAX_SCHEDULE_TIMEOUT);
> +
> 
> Can you get some clues from the experiment?

Again, please instead make the changes that I called out above, with
the replacement for your patch 0001 shown below.

Thanx, Paul

PS.  I have been testing for quite some time, but am still unable
 to reproduce this.  So we must depend on you to reproduce it.


RE: rcu_preempt caused oom

2018-12-16 Thread He, Bo
to double-confirm: the issue did not reproduce after 90 hours. We tried adding only
the enclosed patch on the easily-reproduced build, and the issue was not reproduced
after 63 hours over the whole weekend on 16 boards.
so the current conclusion is that the debug patch has an extreme effect on the rcu issue.

By comparison, swait_event_idle_timeout_exclusive() checks the condition 3 times,
while swait_event_idle_exclusive() checks it 2 times.

so today I will do another experiment, with only this change:
-		swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
-					   RCU_GP_FLAG_INIT);
+		ret = swait_event_idle_timeout_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+					   RCU_GP_FLAG_INIT, MAX_SCHEDULE_TIMEOUT);
+

Can you get some clues from the experiment?


Re: rcu_preempt caused oom

2018-12-13 Thread Paul E. McKenney
On Thu, Dec 13, 2018 at 09:10:12PM -0800, Paul E. McKenney wrote:
> On Fri, Dec 14, 2018 at 02:40:50AM +, He, Bo wrote:
> > another experiment we have done with the enclosed debug patch, and also 
> > have more rcu trace event enable but without CONFIG_RCU_BOOST config, we 
> > don't reproduce the issue after 90 Hours until now on 10 boards(the issue 
> > should reproduce on one night per previous experience).
> 
> That certainly supports the hypothesis that a wakeup is either not
> being sent or is being lost.  Your patch is great for debugging (thank
> you!), but the real solution of course needs to avoid the extra wakeups,
> especially on battery-powered systems.
> 
> One suggested change below, to get rid of potential false positives.
> 
> > the purposes are to capture the more rcu event trace close to the issue 
> > happen, because I check the __wait_rcu_gp is not always in running, so we 
> > think even it trigger the panic for 3s timeout, the issue is already 
> > happened before 3s.
> 
> Agreed, it would be really good to have trace information from the cause.
> In the case you sent yesterday, it would be good to have trace information
> from 308.256 seconds prior to the sysrq-v, for example, by collecting the
> same event traces you did a few days ago.  It would also be good to know
> whether the scheduler tick is providing interrupts, and if so, why
> rcu_check_gp_start_stall() isn't being invoked.  ;-)
> 
> If collecting this information with your setup is not feasible (for
> example, you might need a large trace buffer to capture five minutes
> of traces), please let me know and I can provide additional debug
> code.  Or you could add "rcu_ftrace_dump(DUMP_ALL);" just before the
> "show_rcu_gp_kthreads();" in your patch below.
> 
> > And Actually the rsp->gp_flags = 1, but RCU_GP_WAIT_GPS(1) ->state: 0x402, 
> > it means the kthread is not schedule for 300s but the RCU_GP_FLAG_INIT is 
> > set. What's your ideas? 
> 
> The most likely possibility is that my analysis below is confused and
> there really is some way that the code can set the RCU_GP_FLAG_INIT
> bit without later doing a wakeup.  The trace data above could help
> unconfuse me.
> 
>   Thanx, Paul
> 
> > -
> > -	swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> > -				   RCU_GP_FLAG_INIT);
> > +	if (current->pid != rcu_preempt_pid) {
> > +		swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> > +					   RCU_GP_FLAG_INIT);
> > +	} else {
> 
> wait_again:
> 
> > +		ret = swait_event_idle_timeout_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> > +					   RCU_GP_FLAG_INIT, 2*HZ);
> > +
> > +		if(!ret) {
> 
> This would avoid complaining if RCU was legitimately idle for a long time:

Let's try this again.  Unless I am confused (quite possible) your original
would panic if RCU was idle for more than two seconds.  What we instead
want is to panic if we time out, but end up with RCU_GP_FLAG_INIT set.

So something like this:

			if (ret == 1) {
				/* Timed out with RCU_GP_FLAG_INIT. */
				rcu_ftrace_dump(DUMP_ALL);
				show_rcu_gp_kthreads();
				panic("hung_task: blocked in rcu_gp_kthread init");
			} else if (!ret) {
				/* Timed out w/out RCU_GP_FLAG_INIT. */
				goto wait_again;
			}

Thanx, Paul

> > +			show_rcu_gp_kthreads();
> > +			panic("hung_task: blocked in rcu_gp_kthread init");
> > +		}
> > +	}

Re: rcu_preempt caused oom

2018-12-13 Thread Paul E. McKenney
On Fri, Dec 14, 2018 at 02:40:50AM +, He, Bo wrote:
> another experiment we have done with the enclosed debug patch, and also have 
> more rcu trace event enable but without CONFIG_RCU_BOOST config, we don't 
> reproduce the issue after 90 Hours until now on 10 boards(the issue should 
> reproduce on one night per previous experience).

That certainly supports the hypothesis that a wakeup is either not
being sent or is being lost.  Your patch is great for debugging (thank
you!), but the real solution of course needs to avoid the extra wakeups,
especially on battery-powered systems.

One suggested change below, to get rid of potential false positives.

> the purposes are to capture the more rcu event trace close to the issue 
> happen, because I check the __wait_rcu_gp is not always in running, so we 
> think even it trigger the panic for 3s timeout, the issue is already happened 
> before 3s.

Agreed, it would be really good to have trace information from the cause.
In the case you sent yesterday, it would be good to have trace information
from 308.256 seconds prior to the sysrq-v, for example, by collecting the
same event traces you did a few days ago.  It would also be good to know
whether the scheduler tick is providing interrupts, and if so, why
rcu_check_gp_start_stall() isn't being invoked.  ;-)

If collecting this information with your setup is not feasible (for
example, you might need a large trace buffer to capture five minutes
of traces), please let me know and I can provide additional debug
code.  Or you could add "rcu_ftrace_dump(DUMP_ALL);" just before the
"show_rcu_gp_kthreads();" in your patch below.

> And Actually the rsp->gp_flags = 1, but RCU_GP_WAIT_GPS(1) ->state: 0x402, it 
> means the kthread is not schedule for 300s but the RCU_GP_FLAG_INIT is set. 
> What's your ideas? 

The most likely possibility is that my analysis below is confused and
there really is some way that the code can set the RCU_GP_FLAG_INIT
bit without later doing a wakeup.  The trace data above could help
unconfuse me.

Thanx, Paul

> -
> -	swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> -				   RCU_GP_FLAG_INIT);
> +	if (current->pid != rcu_preempt_pid) {
> +		swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> +					   RCU_GP_FLAG_INIT);
> +	} else {

wait_again:

> +		ret = swait_event_idle_timeout_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> +					   RCU_GP_FLAG_INIT, 2*HZ);
> +
> +		if(!ret) {

This would avoid complaining if RCU was legitimately idle for a long time:

			if (!ret && !READ_ONCE(rsp->gp_flags)) {
				rcu_ftrace_dump(DUMP_ALL);
				show_rcu_gp_kthreads();
				panic("hung_task: blocked in rcu_gp_kthread init");
			} else if (!ret) {
				goto wait_again;
			}

> +			show_rcu_gp_kthreads();
> +			panic("hung_task: blocked in rcu_gp_kthread init");
> +		}
> +	}

RE: rcu_preempt caused oom

2018-12-13 Thread He, Bo
In another experiment, run with the enclosed debug patch and with more rcu trace
events enabled but without the CONFIG_RCU_BOOST config, we have not reproduced the
issue after 90 hours so far on 10 boards (per previous experience, the issue should
reproduce within one night).

The purpose is to capture more rcu event trace from close to when the issue happens:
I checked that __wait_rcu_gp is not always running, so we think that even though the
panic triggers on the 3s timeout, the issue had already happened before those 3 seconds.

And actually rsp->gp_flags = 1, but RCU_GP_WAIT_GPS(1) ->state: 0x402; that means
the kthread has not been scheduled for 300s even though RCU_GP_FLAG_INIT is set.
What are your ideas?
-
-	swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
-				   RCU_GP_FLAG_INIT);
+	if (current->pid != rcu_preempt_pid) {
+		swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+					   RCU_GP_FLAG_INIT);
+	} else {
+		ret = swait_event_idle_timeout_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+					   RCU_GP_FLAG_INIT, 2*HZ);
+
+		if(!ret) {
+			show_rcu_gp_kthreads();
+			panic("hung_task: blocked in rcu_gp_kthread init");
+		}
+	}

Re: rcu_preempt caused oom

2018-12-13 Thread Paul E. McKenney
On Fri, Dec 14, 2018 at 01:30:04AM +, He, Bo wrote:
> as you mentioned CONFIG_FAST_NO_HZ, do you mean CONFIG_RCU_FAST_NO_HZ? I 
> double checked there is no FAST_NO_HZ in .config:

Yes, you are correct, CONFIG_RCU_FAST_NO_HZ.  OK, you do not have it set,
which means several code paths can be ignored.  Also CONFIG_HZ=1000, so the
~300,000-jiffy delta corresponds to a 300-second delay.

Thanx, Paul

> Here is the grep from .config:
> egrep "HZ|RCU" .config
> CONFIG_NO_HZ_COMMON=y
> # CONFIG_HZ_PERIODIC is not set
> CONFIG_NO_HZ_IDLE=y
> # CONFIG_NO_HZ_FULL is not set
> CONFIG_NO_HZ=y
> # RCU Subsystem
> CONFIG_PREEMPT_RCU=y
> # CONFIG_RCU_EXPERT is not set
> CONFIG_SRCU=y
> CONFIG_TREE_SRCU=y
> CONFIG_TASKS_RCU=y
> CONFIG_RCU_STALL_COMMON=y
> CONFIG_RCU_NEED_SEGCBLIST=y
> # CONFIG_HZ_100 is not set
> # CONFIG_HZ_250 is not set
> # CONFIG_HZ_300 is not set
> CONFIG_HZ_1000=y
> CONFIG_HZ=1000
> # CONFIG_MACHZ_WDT is not set
> # RCU Debugging
> CONFIG_PROVE_RCU=y
> CONFIG_RCU_PERF_TEST=m
> CONFIG_RCU_TORTURE_TEST=m
> CONFIG_RCU_CPU_STALL_TIMEOUT=7
> CONFIG_RCU_TRACE=y
> CONFIG_RCU_EQS_DEBUG=y
> 
> -Original Message-
> From: Paul E. McKenney  
> Sent: Friday, December 14, 2018 2:12 AM
> To: He, Bo 
> Cc: Zhang, Jun ; Steven Rostedt ; 
> linux-kernel@vger.kernel.org; j...@joshtriplett.org; 
> mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Xiao, Jin 
> ; Zhang, Yanmin ; Bai, Jie A 
> ; Sun, Yi J 
> Subject: Re: rcu_preempt caused oom
> 
> On Thu, Dec 13, 2018 at 03:26:08PM +, He, Bo wrote:
> > one of the board reproduce the issue with the show_rcu_gp_kthreads(), I 
> > also enclosed the logs as attachment.
> > 
> > [17818.936032] rcu: rcu_preempt: wait state: RCU_GP_WAIT_GPS(1) ->state: 
> > 0x402 delta ->gp_activity 308257 ->gp_req_activity 308256 ->gp_wake_time 
> > 308258 ->gp_wake_seq   21808189 ->gp_seq 21808192 ->gp_seq_needed 
> > 21808196 ->gp_flags 0x1
> 
> This is quite helpful, thank you!
> 
> The "RCU lockdep checking is enabled" says that CONFIG_PROVE_RCU=y, which is 
> good.  The "RCU_GP_WAIT_GPS(1)" means that the rcu_preempt task is waiting 
> for a new grace-period request.  The "->state: 0x402" means that it is 
> sleeping, neither running nor in the process of waking up.
> The "delta ->gp_activity 308257 ->gp_req_activity 308256 ->gp_wake_time 
> 308258" means that it has been more than 300,000 jiffies since the 
> rcu_preempt task did anything or was requested to do anything.
> 
> The "->gp_wake_seq 21808189 ->gp_seq 21808192" says that the last attempt to 
> awaken the rcu_preempt task happened during the last grace period.
> The "->gp_seq_needed 21808196 ->gp_flags 0x1" nevertheless says that someone 
> requested a new grace period.  So if the rcu_preempt task were to wake up, it 
> would process the new grace period.  Note again also the ->gp_req_activity 
> 308256, which indicates that ->gp_flags was set more than 300,000 jiffies 
> ago, just after the last recorded activity of the rcu_preempt task.
> 
> But this is exactly the situation that rcu_check_gp_start_stall() is designed 
> to warn about (and does warn about for me when I comment out the wakeup 
> code).  So why is rcu_check_gp_start_stall() not being called?  Here are a 
> couple of possibilities:
> 
> 1.Because rcu_check_gp_start_stall() is only ever invoked from
>   RCU_SOFTIRQ, it is possible that softirqs are stalled for
>   whatever reason.
> 
> 2.Because RCU_SOFTIRQ is invoked primarily from the scheduler-clock
>   interrupt handler, it is possible that the scheduler tick has
>   somehow been disabled.  Traces from earlier runs showed a great
>   deal of RCU callbacks queued, which would have caused RCU to
>   refuse to allow the scheduler tick to be disabled, even if the
>   corresponding CPU was idle.
> 
> 3.You have CONFIG_FAST_NO_HZ=y (which you probably do, given
>   that you are building for a battery-powered device) and all of the
>   CPU's callbacks are lazy.  Except that your earlier traces showed
>   lots of non-lazy callbacks.  Besides, even if all callbacks were
>   lazy, there would still be a scheduling-clock interrupt every
>   six seconds, and there are quite a few six-second intervals
>   in a two-minute watchdog timeout.
> 
>   But if we cannot find the problem quickly, I will likely ask
>   you to try reproducing with CONFIG_FAST_NO_HZ=n.  This could
>   be thought of as bisecting the RCU code looking for the bug.
> 
> The first two of these 

RE: rcu_preempt caused oom

2018-12-13 Thread He, Bo
As you mentioned CONFIG_FAST_NO_HZ, do you mean CONFIG_RCU_FAST_NO_HZ? I
double-checked, and there is no FAST_NO_HZ in .config:

Here is the grep from .config:
egrep "HZ|RCU" .config
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
# RCU Subsystem
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
# CONFIG_MACHZ_WDT is not set
# RCU Debugging
CONFIG_PROVE_RCU=y
CONFIG_RCU_PERF_TEST=m
CONFIG_RCU_TORTURE_TEST=m
CONFIG_RCU_CPU_STALL_TIMEOUT=7
CONFIG_RCU_TRACE=y
CONFIG_RCU_EQS_DEBUG=y

-Original Message-
From: Paul E. McKenney  
Sent: Friday, December 14, 2018 2:12 AM
To: He, Bo 
Cc: Zhang, Jun ; Steven Rostedt ; 
linux-kernel@vger.kernel.org; j...@joshtriplett.org; 
mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Xiao, Jin 
; Zhang, Yanmin ; Bai, Jie A 
; Sun, Yi J 
Subject: Re: rcu_preempt caused oom

On Thu, Dec 13, 2018 at 03:26:08PM +, He, Bo wrote:
> one of the board reproduce the issue with the show_rcu_gp_kthreads(), I also 
> enclosed the logs as attachment.
> 
> [17818.936032] rcu: rcu_preempt: wait state: RCU_GP_WAIT_GPS(1) ->state: 
> 0x402 delta ->gp_activity 308257 ->gp_req_activity 308256 ->gp_wake_time 
> 308258 ->gp_wake_seq   21808189 ->gp_seq 21808192 ->gp_seq_needed 
> 21808196 ->gp_flags 0x1

This is quite helpful, thank you!

The "RCU lockdep checking is enabled" says that CONFIG_PROVE_RCU=y, which is 
good.  The "RCU_GP_WAIT_GPS(1)" means that the rcu_preempt task is waiting for 
a new grace-period request.  The "->state: 0x402" means that it is sleeping, 
neither running nor in the process of waking up.
The "delta ->gp_activity 308257 ->gp_req_activity 308256 ->gp_wake_time 308258" 
means that it has been more than 300,000 jiffies since the rcu_preempt task did 
anything or was requested to do anything.

The "->gp_wake_seq 21808189 ->gp_seq 21808192" says that the last attempt to 
awaken the rcu_preempt task happened during the last grace period.
The "->gp_seq_needed 21808196 ->gp_flags 0x1" nevertheless says that someone 
requested a new grace period.  So if the rcu_preempt task were to wake up, it 
would process the new grace period.  Note again also the ->gp_req_activity 
308256, which indicates that ->gp_flags was set more than 300,000 jiffies ago, 
just after the last recorded activity of the rcu_preempt task.

But this is exactly the situation that rcu_check_gp_start_stall() is designed 
to warn about (and does warn about for me when I comment out the wakeup code).  
So why is rcu_check_gp_start_stall() not being called?  Here are a couple of 
possibilities:

1.  Because rcu_check_gp_start_stall() is only ever invoked from
RCU_SOFTIRQ, it is possible that softirqs are stalled for
whatever reason.

2.  Because RCU_SOFTIRQ is invoked primarily from the scheduler-clock
interrupt handler, it is possible that the scheduler tick has
somehow been disabled.  Traces from earlier runs showed a great
deal of RCU callbacks queued, which would have caused RCU to
refuse to allow the scheduler tick to be disabled, even if the
corresponding CPU was idle.

3.  You have CONFIG_FAST_NO_HZ=y (which you probably do, given
that you are building for a battery-powered device) and all of the
CPU's callbacks are lazy.  Except that your earlier traces showed
lots of non-lazy callbacks.  Besides, even if all callbacks were
lazy, there would still be a scheduling-clock interrupt every
six seconds, and there are quite a few six-second intervals
in a two-minute watchdog timeout.

But if we cannot find the problem quickly, I will likely ask
you to try reproducing with CONFIG_FAST_NO_HZ=n.  This could
be thought of as bisecting the RCU code looking for the bug.

The first two of these seem unlikely given that the watchdog timer was still 
firing.  Still, I don't see how 300,000 jiffies elapsed with a grace period 
requested and not started otherwise.  Could you please check?
One way to do so would be to enable ftrace on rcu_check_callbacks(), 
__rcu_process_callbacks(), and rcu_check_gp_start_stall().  It might be 
necessary to no-inline rcu_check_gp_start_stall().  You might have better ways 
to collect this information.
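One possible way to wire that up with the function tracer is sketched below (a hypothetical tracefs session; the mount point, and the need to mark rcu_check_gp_start_stall() noinline first, are assumptions):

```shell
# Hypothetical tracefs session; assumes tracefs is mounted at the usual
# /sys/kernel/debug/tracing and the three functions are traceable.
cd /sys/kernel/debug/tracing
echo rcu_check_callbacks __rcu_process_callbacks rcu_check_gp_start_stall \
	> set_ftrace_filter
echo function > current_tracer
echo 1 > tracing_on
# ... reproduce the hang, then capture the buffer:
cat trace > /tmp/rcu-ftrace.log
```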

Without this information, the only workaround patch I can give you will degrade 
battery lifetime, which might not be what you want.

You do have a lockdep complaint early at boot.  Although I don't immediately 
see how this self-deadlock would affect RCU, please do get it fixed.  Sometimes

Re: rcu_preempt caused oom

2018-12-13 Thread Paul E. McKenney
[2.896134]  ? rcu_read_lock_sched_held+0x81/0x90
[2.896139]  ? __kmalloc+0x2a3/0x2e0
[2.896144]  ? media_pipeline_start+0x28/0x50
[2.896150]  ? __media_entity_enum_init+0x33/0x70
[2.896155]  ? csi2_has_route+0x18/0x20
[2.896160]  ? media_graph_walk_next.part.9+0xac/0x290
[2.896166]  __media_pipeline_start+0x15b/0x2f0
[2.896173]  ? rcu_read_lock_sched_held+0x81/0x90
[2.896179]  media_pipeline_start+0x33/0x50
[2.896186]  ipu_isys_video_prepare_streaming+0x1e0/0x610
[2.896191]  ? __lock_acquire+0x132e/0x1340
[2.896198]  ? __lock_acquire+0x2b5/0x1340
[2.896204]  ? lock_acquire+0x95/0x1a0
[2.896209]  ? start_streaming+0x5c/0x3a0
[2.896215]  ? start_streaming+0x5c/0x3a0
[2.896221]  ? __mutex_lock+0x391/0x9a0
[2.896226]  ? v4l_enable_media_source+0x2d/0x70
[2.896233]  ? find_held_lock+0x35/0xa0
[2.896238]  ? v4l_enable_media_source+0x57/0x70
[2.896245]  start_streaming+0x186/0x3a0
[2.896250]  ? __mutex_unlock_slowpath+0x58/0x2f0
[2.896257]  vb2_start_streaming+0x6d/0x130
[2.896262]  ? vb2_start_streaming+0x6d/0x130
[2.896267]  vb2_core_streamon+0x108/0x140
[2.896273]  vb2_streamon+0x29/0x50
[2.896278]  vb2_ioctl_streamon+0x42/0x50
[2.896284]  v4l_streamon+0x20/0x30
[2.896288]  __video_do_ioctl+0x1af/0x3c0
[2.896296]  ? __might_fault+0x85/0x90
[2.896302]  video_usercopy+0x27e/0x7e0
[2.896307]  ? copy_overflow+0x20/0x20
[2.896313]  ? find_held_lock+0x35/0xa0
[2.896319]  ? __might_fault+0x3e/0x90
[2.896325]  video_ioctl2+0x15/0x20
[2.896330]  v4l2_ioctl+0x49/0x50
[2.896335]  do_video_ioctl+0x93c/0x2360
[2.896343]  v4l2_compat_ioctl32+0x93/0xe0
[2.896349]  __ia32_compat_sys_ioctl+0x73a/0x1c90
[2.896354]  ? lockdep_hardirqs_on+0xef/0x180
[2.896359]  ? do_fast_syscall_32+0x3b/0x2d6
[2.896364]  do_fast_syscall_32+0x9a/0x2d6
[2.896370]  entry_SYSENTER_compat+0x6d/0x7c
[2.896377] RIP: 0023:0xf7e79b79
[2.896382] Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 0c 24 c3 8b 1c 
24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 
c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
[2.896387] RSP: 002b:f76816bc EFLAGS: 0292 ORIG_RAX: 
0036
[2.896393] RAX: ffda RBX: 000e RCX: 40045612
[2.896396] RDX: f768172c RSI: f7d42d9c RDI: f768172c
[2.896400] RBP: f7681708 R08:  R09: 
[2.896404] R10:  R11:  R12: 
[2.896408] R13:  R14:  R15: 



> [17818.936039] rcu: rcu_node 0:3 ->gp_seq 21808192 ->gp_seq_needed 
> 21808196
> [17818.936048] rcu: rcu_sched: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 
> delta ->gp_activity 101730 ->gp_req_activity 101732 ->gp_wake_time 101730 
> ->gp_wake_seq 1357 ->gp_seq 1360 ->gp_seq_needed 1360 ->gp_flags 0x0
>
> [17818.936056] rcu: rcu_bh: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 
> delta ->gp_activity 4312486108 ->gp_req_activity 4312486108 ->gp_wake_time 
> 4312486108 ->gp_wake_seq 0 ->gp_seq -1200 ->gp_seq_needed -1200 
> ->gp_flags 0x0
> 
> -Original Message-
> From: Paul E. McKenney  
> Sent: Thursday, December 13, 2018 12:40 PM
> To: Zhang, Jun 
> Cc: He, Bo ; Steven Rostedt ; 
> linux-kernel@vger.kernel.org; j...@joshtriplett.org; 
> mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Xiao, Jin 
> ; Zhang, Yanmin ; Bai, Jie A 
> ; Sun, Yi J 
> Subject: Re: rcu_preempt caused oom
> 
> On Thu, Dec 13, 2018 at 03:28:46AM +, Zhang, Jun wrote:
> > Ok, we will test it, thanks!
> 
> But please also try the sysrq-y with the earlier patch after a hang!
> 
>           Thanx, Paul
> 
> > -Original Message-
> > From: Paul E. McKenney [mailto:paul...@linux.ibm.com]
> > Sent: Thursday, December 13, 2018 10:43
> > To: Zhang, Jun 
> > Cc: He, Bo ; Steven Rostedt ; 
> > linux-kernel@vger.kernel.org; j...@joshtriplett.org; 
> > mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Xiao, Jin 
> > ; Zhang, Yanmin ; Bai, Jie 
> > A ; Sun, Yi J 
> > Subject: Re: rcu_preempt caused oom
> > 
> > On Thu, Dec 13, 2018 at 02:11:35AM +, Zhang, Jun wrote:
> > > Hello, Paul
> > > 
> > > I think the next patch is better.
> > > Because ULONG_CMP_GE could cause double write, which has risk that write 
> > > back old value.
> > > Pleas

Re: rcu_preempt caused oom

2018-12-12 Thread Paul E. McKenney
On Thu, Dec 13, 2018 at 03:28:46AM +, Zhang, Jun wrote:
> Ok, we will test it, thanks!

But please also try the sysrq-y with the earlier patch after a hang!

Thanx, Paul

> -Original Message-
> From: Paul E. McKenney [mailto:paul...@linux.ibm.com] 
> Sent: Thursday, December 13, 2018 10:43
> To: Zhang, Jun 
> Cc: He, Bo ; Steven Rostedt ; 
> linux-kernel@vger.kernel.org; j...@joshtriplett.org; 
> mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Xiao, Jin 
> ; Zhang, Yanmin ; Bai, Jie A 
> ; Sun, Yi J 
> Subject: Re: rcu_preempt caused oom
> 
> On Thu, Dec 13, 2018 at 02:11:35AM +, Zhang, Jun wrote:
> > Hello, Paul
> > 
> > I think the next patch is better.
> > Because ULONG_CMP_GE could cause double write, which has risk that write 
> > back old value.
> > Please help review.
> > I don't test it. If you agree, we will test it.
> 
> Just to make sure that I understand, you are worried about something like the 
> following, correct?
> 
> o __note_gp_changes() compares rnp->gp_seq_needed and rdp->gp_seq_needed
>   and finds them equal.
> 
> o At just this time something like rcu_start_this_gp() assigns a new
>   (larger) value to rdp->gp_seq_needed.
> 
> o Then __note_gp_changes() overwrites rdp->gp_seq_needed with the
>   old value.
> 
> This cannot happen because __note_gp_changes() runs with interrupts disabled 
> on the CPU corresponding to the rcu_data structure referenced by the rdp 
> pointer.  So there is no way for rcu_start_this_gp() to be invoked on the 
> same CPU during this "if" statement.
> 
> Of course, there could be bugs.  For example:
> 
> o __note_gp_changes() might be called on a different CPU than that
>   corresponding to rdp.  You can check this with something like:
> 
>   WARN_ON_ONCE(rdp->cpu != smp_processor_id());
> 
> o The same things could happen with rcu_start_this_gp(), and the
>   above WARN_ON_ONCE() would work there as well.
> 
> o rcutree_prepare_cpu() is a special case, but is irrelevant unless
>   you are doing CPU-hotplug operations.  (It can run on a CPU other
>   than rdp->cpu, but only at times when rdp->cpu is offline.)
> 
> o Interrupts might not really be disabled.
> 
> That said, your patch could reduce overhead slightly, given that the two 
> values will be equal much of the time.  So it might be worth testing just for 
> that reason.
> 
> So why not just test it anyway?  If it makes the bug go away, I will be 
> surprised, but it would not be the first surprise for me.  ;-)
> 
>   Thanx, Paul
> 
> > Thanks!
> > 
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 0b760c1..c00f34e 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1849,7 +1849,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, 
> > struct rcu_node *rnp,
> > zero_cpu_stall_ticks(rdp);
> > }
> > rdp->gp_seq = rnp->gp_seq;  /* Remember new grace-period state. */
> > -   if (ULONG_CMP_GE(rnp->gp_seq_needed, rdp->gp_seq_needed) || rdp->gpwrap)
> > +   if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
> > rdp->gp_seq_needed = rnp->gp_seq_needed;
> > WRITE_ONCE(rdp->gpwrap, false);
> > rcu_gpnum_ovf(rnp, rdp);
> > 
> > 
> > -Original Message-
> > From: Paul E. McKenney [mailto:paul...@linux.ibm.com]
> > Sent: Thursday, December 13, 2018 08:12
> > To: He, Bo 
> > Cc: Steven Rostedt ; 
> > linux-kernel@vger.kernel.org; j...@joshtriplett.org; 
> > mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun 
> > ; Xiao, Jin ; Zhang, Yanmin 
> > ; Bai, Jie A ; Sun, Yi J 
> > 
> > Subject: Re: rcu_preempt caused oom
> > 
> > On Wed, Dec 12, 2018 at 11:13:22PM +, He, Bo wrote:
> > > I don't see the rcutree.sysrq_rcu parameter in v4.19 kernel, I also 
> > > checked the latest kernel and the latest tag v4.20-rc6, not see the 
> > > sysrq_rcu.
> > > Please correct me if I have something wrong.
> > 
> > That would be because I sent you the wrong patch, apologies!  :-/
> > 
> > Please instead see the one below, which does add sysrq_rcu.
> > 
> > Thanx, Paul
> > 
> > > -Original Message-
> > > From: Paul E. McKenney 
> >

Re: rcu_preempt caused oom

2018-12-12 Thread Paul E. McKenney
On Thu, Dec 13, 2018 at 02:11:35AM +, Zhang, Jun wrote:
> Hello, Paul
> 
> I think the next patch is better.
> Because ULONG_CMP_GE could cause double write, which has risk that write back 
> old value.
> Please help review.
> I don't test it. If you agree, we will test it.

Just to make sure that I understand, you are worried about something like
the following, correct?

o   __note_gp_changes() compares rnp->gp_seq_needed and rdp->gp_seq_needed
and finds them equal.

o   At just this time something like rcu_start_this_gp() assigns a new
(larger) value to rdp->gp_seq_needed.

o   Then __note_gp_changes() overwrites rdp->gp_seq_needed with the
old value.

This cannot happen because __note_gp_changes() runs with interrupts
disabled on the CPU corresponding to the rcu_data structure referenced
by the rdp pointer.  So there is no way for rcu_start_this_gp() to be
invoked on the same CPU during this "if" statement.

Of course, there could be bugs.  For example:

o   __note_gp_changes() might be called on a different CPU than that
corresponding to rdp.  You can check this with something like:

WARN_ON_ONCE(rdp->cpu != smp_processor_id());

o   The same things could happen with rcu_start_this_gp(), and the
above WARN_ON_ONCE() would work there as well.

o   rcutree_prepare_cpu() is a special case, but is irrelevant unless
you are doing CPU-hotplug operations.  (It can run on a CPU other
than rdp->cpu, but only at times when rdp->cpu is offline.)

o   Interrupts might not really be disabled.

That said, your patch could reduce overhead slightly, given that the
two values will be equal much of the time.  So it might be worth testing
just for that reason.

So why not just test it anyway?  If it makes the bug go away, I will be
surprised, but it would not be the first surprise for me.  ;-)

Thanx, Paul

> Thanks!
> 
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 0b760c1..c00f34e 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1849,7 +1849,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, 
> struct rcu_node *rnp,
> zero_cpu_stall_ticks(rdp);
> }
> rdp->gp_seq = rnp->gp_seq;  /* Remember new grace-period state. */
> -   if (ULONG_CMP_GE(rnp->gp_seq_needed, rdp->gp_seq_needed) || rdp->gpwrap)
> +   if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
> rdp->gp_seq_needed = rnp->gp_seq_needed;
> WRITE_ONCE(rdp->gpwrap, false);
> rcu_gpnum_ovf(rnp, rdp);
> 
> 
> -Original Message-
> From: Paul E. McKenney [mailto:paul...@linux.ibm.com] 
> Sent: Thursday, December 13, 2018 08:12
> To: He, Bo 
> Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; 
> j...@joshtriplett.org; mathieu.desnoy...@efficios.com; 
> jiangshan...@gmail.com; Zhang, Jun ; Xiao, Jin 
> ; Zhang, Yanmin ; Bai, Jie A 
> ; Sun, Yi J 
> Subject: Re: rcu_preempt caused oom
> 
> On Wed, Dec 12, 2018 at 11:13:22PM +, He, Bo wrote:
> > I don't see the rcutree.sysrq_rcu parameter in v4.19 kernel, I also checked 
> > the latest kernel and the latest tag v4.20-rc6, not see the sysrq_rcu.
> > Please correct me if I have something wrong.
> 
> That would be because I sent you the wrong patch, apologies!  :-/
> 
> Please instead see the one below, which does add sysrq_rcu.
> 
>   Thanx, Paul
> 
> > -Original Message-
> > From: Paul E. McKenney 
> > Sent: Thursday, December 13, 2018 5:03 AM
> > To: He, Bo 
> > Cc: Steven Rostedt ; 
> > linux-kernel@vger.kernel.org; j...@joshtriplett.org; 
> > mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun 
> > ; Xiao, Jin ; Zhang, Yanmin 
> > ; Bai, Jie A 
> > Subject: Re: rcu_preempt caused oom
> > 
> > On Wed, Dec 12, 2018 at 07:42:24AM -0800, Paul E. McKenney wrote:
> > > On Wed, Dec 12, 2018 at 01:21:33PM +, He, Bo wrote:
> > > > we reproduce on two boards, but I still not see the 
> > > > show_rcu_gp_kthreads() dump logs, it seems the patch can't catch the 
> > > > scenario.
> > > > I double confirmed the CONFIG_PROVE_RCU=y is enabled in the config as 
> > > > it's extracted from the /proc/config.gz.
> > > 
> > > Strange.
> > > 
> > > Are the systems responsive to sysrq keys once failure occurs?  If 
> > > so, I will provide you a sysrq-R or some such to dump out the RCU state.
> > 
> > Or, as it turns out, sys

RE: rcu_preempt caused oom

2018-12-12 Thread Zhang, Jun
Hello, Paul

I think the following patch is better, because ULONG_CMP_GE could cause a
double write, which risks writing back an old value.
Please help review.
I have not tested it. If you agree, we will test it.
Thanks!


diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1..c00f34e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1849,7 +1849,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, 
struct rcu_node *rnp,
zero_cpu_stall_ticks(rdp);
}
rdp->gp_seq = rnp->gp_seq;  /* Remember new grace-period state. */
-   if (ULONG_CMP_GE(rnp->gp_seq_needed, rdp->gp_seq_needed) || rdp->gpwrap)
+   if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
rdp->gp_seq_needed = rnp->gp_seq_needed;
WRITE_ONCE(rdp->gpwrap, false);
rcu_gpnum_ovf(rnp, rdp);


-Original Message-
From: Paul E. McKenney [mailto:paul...@linux.ibm.com] 
Sent: Thursday, December 13, 2018 08:12
To: He, Bo 
Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; 
j...@joshtriplett.org; mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; 
Zhang, Jun ; Xiao, Jin ; Zhang, Yanmin 
; Bai, Jie A ; Sun, Yi J 

Subject: Re: rcu_preempt caused oom

On Wed, Dec 12, 2018 at 11:13:22PM +, He, Bo wrote:
> I don't see the rcutree.sysrq_rcu parameter in v4.19 kernel, I also checked 
> the latest kernel and the latest tag v4.20-rc6, not see the sysrq_rcu.
> Please correct me if I have something wrong.

That would be because I sent you the wrong patch, apologies!  :-/

Please instead see the one below, which does add sysrq_rcu.

Thanx, Paul

> -Original Message-
> From: Paul E. McKenney 
> Sent: Thursday, December 13, 2018 5:03 AM
> To: He, Bo 
> Cc: Steven Rostedt ; 
> linux-kernel@vger.kernel.org; j...@joshtriplett.org; 
> mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun 
> ; Xiao, Jin ; Zhang, Yanmin 
> ; Bai, Jie A 
> Subject: Re: rcu_preempt caused oom
> 
> On Wed, Dec 12, 2018 at 07:42:24AM -0800, Paul E. McKenney wrote:
> > On Wed, Dec 12, 2018 at 01:21:33PM +, He, Bo wrote:
> > > we reproduce on two boards, but I still not see the 
> > > show_rcu_gp_kthreads() dump logs, it seems the patch can't catch the 
> > > scenario.
> > > I double confirmed the CONFIG_PROVE_RCU=y is enabled in the config as 
> > > it's extracted from the /proc/config.gz.
> > 
> > Strange.
> > 
> > Are the systems responsive to sysrq keys once failure occurs?  If 
> > so, I will provide you a sysrq-R or some such to dump out the RCU state.
> 
> Or, as it turns out, sysrq-y if booting with rcutree.sysrq_rcu=1 using the 
> patch below.  Only lightly tested.



commit 04b6245c8458e8725f4169e62912c1fadfdf8141
Author: Paul E. McKenney 
Date:   Wed Dec 12 16:10:09 2018 -0800

rcu: Add sysrq rcu_node-dump capability

Backported from v4.21/v5.0

Life is hard if RCU manages to get stuck without triggering RCU CPU
stall warnings or triggering the rcu_check_gp_start_stall() checks
for failing to start a grace period.  This commit therefore adds a
boot-time-selectable sysrq key (commandeering "y") that allows manually
dumping Tree RCU state.  The new rcutree.sysrq_rcu kernel boot parameter
must be set for this sysrq to be available.

Signed-off-by: Paul E. McKenney 

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1369f7..e9392a9d6291 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -61,6 +61,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "tree.h"
 #include "rcu.h"
@@ -128,6 +129,9 @@ int num_rcu_lvl[] = NUM_RCU_LVL_INIT;
 int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
 /* panic() on RCU Stall sysctl. */
 int sysctl_panic_on_rcu_stall __read_mostly;
+/* Commandeer a sysrq key to dump RCU's tree. */
+static bool sysrq_rcu;
+module_param(sysrq_rcu, bool, 0444);
 
 /*
  * The rcu_scheduler_active variable is initialized to the value
@@ -662,6 +666,27 @@ void show_rcu_gp_kthreads(void)
 }
 EXPORT_SYMBOL_GPL(show_rcu_gp_kthreads);
 
+/* Dump grace-period-request information due to commandeered sysrq. */
+static void sysrq_show_rcu(int key)
+{
+   show_rcu_gp_kthreads();
+}
+
+static struct sysrq_key_op sysrq_rcudump_op = {
+   .handler = sysrq_show_rcu,
+   .help_msg = "show-rcu(y)",
+   .action_msg = "Show RCU tree",
+   .enable_mask = SYSRQ_ENABLE_DUMP,
+};
+
+static int __init rcu_sysrq_init(void)
+{
+   if (sysrq_rcu)
+   return register_sysrq_key('y', &sysrq_rcudump_op);
+   return 0;
+}
+early_initcall(rcu_sysrq_init);
+
 /*
  * Send along grace-period-related data for rcutorture diagnostics.
  */



Re: rcu_preempt caused oom

2018-12-12 Thread Paul E. McKenney
On Wed, Dec 12, 2018 at 11:13:22PM +, He, Bo wrote:
> I don't see the rcutree.sysrq_rcu parameter in v4.19 kernel, I also checked 
> the latest kernel and the latest tag v4.20-rc6, not see the sysrq_rcu.
> Please correct me if I have something wrong.

That would be because I sent you the wrong patch, apologies!  :-/

Please instead see the one below, which does add sysrq_rcu.

Thanx, Paul

> -Original Message-
> From: Paul E. McKenney  
> Sent: Thursday, December 13, 2018 5:03 AM
> To: He, Bo 
> Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; 
> j...@joshtriplett.org; mathieu.desnoy...@efficios.com; 
> jiangshan...@gmail.com; Zhang, Jun ; Xiao, Jin 
> ; Zhang, Yanmin ; Bai, Jie A 
> 
> Subject: Re: rcu_preempt caused oom
> 
> On Wed, Dec 12, 2018 at 07:42:24AM -0800, Paul E. McKenney wrote:
> > On Wed, Dec 12, 2018 at 01:21:33PM +, He, Bo wrote:
> > > we reproduce on two boards, but I still not see the 
> > > show_rcu_gp_kthreads() dump logs, it seems the patch can't catch the 
> > > scenario.
> > > I double confirmed the CONFIG_PROVE_RCU=y is enabled in the config as 
> > > it's extracted from the /proc/config.gz.
> > 
> > Strange.
> > 
> > Are the systems responsive to sysrq keys once failure occurs?  If so, 
> > I will provide you a sysrq-R or some such to dump out the RCU state.
> 
> Or, as it turns out, sysrq-y if booting with rcutree.sysrq_rcu=1 using the 
> patch below.  Only lightly tested.



commit 04b6245c8458e8725f4169e62912c1fadfdf8141
Author: Paul E. McKenney 
Date:   Wed Dec 12 16:10:09 2018 -0800

rcu: Add sysrq rcu_node-dump capability

Backported from v4.21/v5.0

Life is hard if RCU manages to get stuck without triggering RCU CPU
stall warnings or triggering the rcu_check_gp_start_stall() checks
for failing to start a grace period.  This commit therefore adds a
boot-time-selectable sysrq key (commandeering "y") that allows manually
dumping Tree RCU state.  The new rcutree.sysrq_rcu kernel boot parameter
must be set for this sysrq to be available.

Signed-off-by: Paul E. McKenney 

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1369f7..e9392a9d6291 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -61,6 +61,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "tree.h"
 #include "rcu.h"
@@ -128,6 +129,9 @@ int num_rcu_lvl[] = NUM_RCU_LVL_INIT;
 int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. 
*/
 /* panic() on RCU Stall sysctl. */
 int sysctl_panic_on_rcu_stall __read_mostly;
+/* Commandeer a sysrq key to dump RCU's tree. */
+static bool sysrq_rcu;
+module_param(sysrq_rcu, bool, 0444);
 
 /*
  * The rcu_scheduler_active variable is initialized to the value
@@ -662,6 +666,27 @@ void show_rcu_gp_kthreads(void)
 }
 EXPORT_SYMBOL_GPL(show_rcu_gp_kthreads);
 
+/* Dump grace-period-request information due to commandeered sysrq. */
+static void sysrq_show_rcu(int key)
+{
+   show_rcu_gp_kthreads();
+}
+
+static struct sysrq_key_op sysrq_rcudump_op = {
+   .handler = sysrq_show_rcu,
+   .help_msg = "show-rcu(y)",
+   .action_msg = "Show RCU tree",
+   .enable_mask = SYSRQ_ENABLE_DUMP,
+};
+
+static int __init rcu_sysrq_init(void)
+{
+   if (sysrq_rcu)
+   return register_sysrq_key('y', &sysrq_rcudump_op);
+   return 0;
+}
+early_initcall(rcu_sysrq_init);
+
 /*
  * Send along grace-period-related data for rcutorture diagnostics.
  */
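With the patch applied, the new key could be exercised roughly like this (a hypothetical session; it assumes the kernel was booted with rcutree.sysrq_rcu=1 and that sysrq is enabled):

```shell
# Hypothetical usage of the commandeered "y" sysrq key.
echo 1 > /proc/sys/kernel/sysrq   # make sure sysrq is enabled
echo y > /proc/sysrq-trigger      # invoke the new show-rcu(y) handler
dmesg | tail -n 40                # the rcu_node dump appears in the kernel log
```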



RE: rcu_preempt caused oom

2018-12-12 Thread He, Bo
I don't see the rcutree.sysrq_rcu parameter in the v4.19 kernel. I also checked
the latest kernel and the latest tag, v4.20-rc6, and do not see sysrq_rcu.
Please correct me if I have misunderstood something.

-Original Message-
From: Paul E. McKenney  
Sent: Thursday, December 13, 2018 5:03 AM
To: He, Bo 
Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; 
j...@joshtriplett.org; mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; 
Zhang, Jun ; Xiao, Jin ; Zhang, Yanmin 
; Bai, Jie A 
Subject: Re: rcu_preempt caused oom

On Wed, Dec 12, 2018 at 07:42:24AM -0800, Paul E. McKenney wrote:
> On Wed, Dec 12, 2018 at 01:21:33PM +, He, Bo wrote:
> > we reproduce on two boards, but I still not see the show_rcu_gp_kthreads() 
> > dump logs, it seems the patch can't catch the scenario.
> > I double confirmed the CONFIG_PROVE_RCU=y is enabled in the config as it's 
> > extracted from the /proc/config.gz.
> 
> Strange.
> 
> Are the systems responsive to sysrq keys once failure occurs?  If so, 
> I will provide you a sysrq-R or some such to dump out the RCU state.

Or, as it turns out, sysrq-y if booting with rcutree.sysrq_rcu=1 using the 
patch below.  Only lightly tested.

Thanx, Paul



commit adfc7dff659495a3433d5084256be59eee0ac6df
Author: Paul E. McKenney 
Date:   Mon Dec 10 16:33:59 2018 -0800

rcu: Improve diagnostics for failed RCU grace-period start

Backported from v4.21/v5.0

If a grace period fails to start (for example, because you commented
out the last two lines of rcu_accelerate_cbs_unlocked()), rcu_core()
will invoke rcu_check_gp_start_stall(), which will notice and complain.
However, this complaint is lacking crucial debugging information such
as when the last wakeup executed and what the value of ->gp_seq was at
that time.  This commit therefore removes the current pr_alert() from
rcu_check_gp_start_stall(), instead invoking show_rcu_gp_kthreads(),
which has been updated to print the needed information, which is collected
by rcu_gp_kthread_wake().

Signed-off-by: Paul E. McKenney 

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1369f7..4bcd8753e293 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -626,25 +626,57 @@ void rcu_sched_force_quiescent_state(void)
 }
 EXPORT_SYMBOL_GPL(rcu_sched_force_quiescent_state);
 
+/*
+ * Convert a ->gp_state value to a character string.
+ */
+static const char *gp_state_getname(short gs)
+{
+   if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
+   return "???";
+   return gp_state_names[gs];
+}
+
+/*
+ * Return the root node of the specified rcu_state structure.
+ */
+static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
+{
+   return &rsp->node[0];
+}
+
 /*
  * Show the state of the grace-period kthreads.
  */
 void show_rcu_gp_kthreads(void)
 {
int cpu;
+   unsigned long j;
+   unsigned long ja;
+   unsigned long jr;
+   unsigned long jw;
struct rcu_data *rdp;
struct rcu_node *rnp;
struct rcu_state *rsp;
 
+   j = jiffies;
for_each_rcu_flavor(rsp) {
-   pr_info("%s: wait state: %d ->state: %#lx\n",
-   rsp->name, rsp->gp_state, rsp->gp_kthread->state);
+   ja = j - READ_ONCE(rsp->gp_activity);
+   jr = j - READ_ONCE(rsp->gp_req_activity);
+   jw = j - READ_ONCE(rsp->gp_wake_time);
+   pr_info("%s: wait state: %s(%d) ->state: %#lx delta ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_flags %#x\n",
+   rsp->name, gp_state_getname(rsp->gp_state),
+   rsp->gp_state,
+   rsp->gp_kthread ? rsp->gp_kthread->state : 0x1L,
+   ja, jr, jw, (long)READ_ONCE(rsp->gp_wake_seq),
+   (long)READ_ONCE(rsp->gp_seq),
+   (long)READ_ONCE(rcu_get_root(rsp)->gp_seq_needed),
+   READ_ONCE(rsp->gp_flags));
rcu_for_each_node_breadth_first(rsp, rnp) {
if (ULONG_CMP_GE(rsp->gp_seq, rnp->gp_seq_needed))
continue;
-   pr_info("\trcu_node %d:%d ->gp_seq %lu ->gp_seq_needed %lu\n",
-   rnp->grplo, rnp->grphi, rnp->gp_seq,
-   rnp->gp_seq_needed);
+   pr_info("\trcu_node %d:%d ->gp_seq %ld ->gp_seq_needed %ld\n",
+   rnp->grplo, rnp->grphi, (long)rnp->gp_seq,
+   (long)rnp->gp_seq_needed);
if (!rcu_is_leaf_node(rnp))
continue;
for_each_leaf_node_possible_cpu(rnp, cpu) {
@@ -653,8 +685,8 @@ void show_rcu_gp_kthreads(void)
ULONG_CMP_GE(rsp->gp_seq,
 rdp->gp_seq_needed))
continue;
-   pr_info("\tcpu %d ->gp_seq_needed %lu\n",
-   cpu, rdp->gp_seq_needed);
+   pr_info("\tcpu %d ->gp_seq_needed %ld\n",
+   cpu, (long)rdp->gp_seq_needed);
}
}

Re: rcu_preempt caused oom

2018-12-12 Thread Paul E. McKenney
On Wed, Dec 12, 2018 at 07:42:24AM -0800, Paul E. McKenney wrote:
> On Wed, Dec 12, 2018 at 01:21:33PM +, He, Bo wrote:
> > we reproduced it on two boards, but I still do not see the
> > show_rcu_gp_kthreads() dump logs; it seems the patch can't catch the scenario.
> > I double-checked that CONFIG_PROVE_RCU=y is enabled in the config, as
> > extracted from /proc/config.gz.
> 
> Strange.
> 
> Are the systems responsive to sysrq keys once failure occurs?  If so, I will
> provide you a sysrq-R or some such to dump out the RCU state.

Or, as it turns out, sysrq-y if booting with rcutree.sysrq_rcu=1 using
the patch below.  Only lightly tested.

Thanx, Paul




Re: rcu_preempt caused oom

2018-12-12 Thread Paul E. McKenney
On Wed, Dec 12, 2018 at 01:21:33PM +, He, Bo wrote:
> we reproduced it on two boards, but I still do not see the
> show_rcu_gp_kthreads() dump logs; it seems the patch can't catch the scenario.
> I double-checked that CONFIG_PROVE_RCU=y is enabled in the config, as
> extracted from /proc/config.gz.

Strange.

Are the systems responsive to sysrq keys once failure occurs?  If so, I will
provide you a sysrq-R or some such to dump out the RCU state.

Thanx, Paul

> -Original Message-
> From: Paul E. McKenney  
> Sent: Wednesday, December 12, 2018 10:25 AM
> To: He, Bo 
> Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; 
> j...@joshtriplett.org; mathieu.desnoy...@efficios.com; 
> jiangshan...@gmail.com; Zhang, Jun ; Xiao, Jin 
> ; Zhang, Yanmin ; Bai, Jie A 
> 
> Subject: Re: rcu_preempt caused oom
> 
> On Wed, Dec 12, 2018 at 01:37:40AM +, He, Bo wrote:
> > We reproduced the hung_task panic with the patch "Improve diagnostics for
> > failed RCU grace-period start", but unfortunately, perhaps due to the
> > loglevel, show_rcu_gp_kthreads() doesn't print any logs; we will improve
> > the build and rerun the test to double-check.
> 
> Well, at least the diagnostics didn't prevent the problem from happening.  ;-)
> 
>   Thanx, Paul
> 
> > -Original Message-
> > From: Paul E. McKenney 
> > Sent: Tuesday, December 11, 2018 12:47 PM
> > To: He, Bo 
> > Cc: Steven Rostedt ; 
> > linux-kernel@vger.kernel.org; j...@joshtriplett.org; 
> > mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun 
> > ; Xiao, Jin ; Zhang, Yanmin 
> > ; Bai, Jie A 
> > Subject: Re: rcu_preempt caused oom
> > 
> > On Mon, Dec 10, 2018 at 04:38:38PM -0800, Paul E. McKenney wrote:
> > > On Mon, Dec 10, 2018 at 06:56:18AM +, He, Bo wrote:
> > > > Hi, 
> > > >We have started the test with CONFIG_PROVE_RCU=y, and also added a
> > > > 2s timeout to detect the preempt RCU hang; hopefully we can get more
> > > > useful logs tomorrow.
> > > >I have also enclosed the config and the debug patches for your review.
> > > 
> > > I instead suggest the (lightly tested) debug patch shown below, 
> > > which tracks wakeups of RCU's grace-period kthreads and dumps them 
> > > out if a given requested grace period fails to start.  Again, it is 
> > > necessary to build with CONFIG_PROVE_RCU=y, that is, with 
> > > CONFIG_PROVE_LOCKING=y.
> > 
> > Right.  This time without commenting out the wakeup as a test of the 
> > diagnostic.  :-/
> > 
> > Please use the patch below instead of the one that I sent in my previous 
> > email.
> > 
> > Thanx, Paul
> > 

Re: rcu_preempt caused oom

2018-12-11 Thread Paul E. McKenney
On Wed, Dec 12, 2018 at 01:37:40AM +, He, Bo wrote:
> We reproduced the hung_task panic with the patch "Improve diagnostics for
> failed RCU grace-period start", but unfortunately, perhaps due to the
> loglevel, show_rcu_gp_kthreads() doesn't print any logs; we will improve
> the build and rerun the test to double-check.

Well, at least the diagnostics didn't prevent the problem from happening.  ;-)

Thanx, Paul

> -Original Message-
> From: Paul E. McKenney  
> Sent: Tuesday, December 11, 2018 12:47 PM
> To: He, Bo 
> Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; 
> j...@joshtriplett.org; mathieu.desnoy...@efficios.com; 
> jiangshan...@gmail.com; Zhang, Jun ; Xiao, Jin 
> ; Zhang, Yanmin ; Bai, Jie A 
> 
> Subject: Re: rcu_preempt caused oom
> 
> On Mon, Dec 10, 2018 at 04:38:38PM -0800, Paul E. McKenney wrote:
> > On Mon, Dec 10, 2018 at 06:56:18AM +, He, Bo wrote:
> > > Hi, 
> > >We have started the test with CONFIG_PROVE_RCU=y, and also added a
> > > 2s timeout to detect the preempt RCU hang; hopefully we can get more
> > > useful logs tomorrow.
> > >I have also enclosed the config and the debug patches for your review.
> > 
> > I instead suggest the (lightly tested) debug patch shown below, which 
> > tracks wakeups of RCU's grace-period kthreads and dumps them out if a 
> > given requested grace period fails to start.  Again, it is necessary 
> > to build with CONFIG_PROVE_RCU=y, that is, with CONFIG_PROVE_LOCKING=y.
> 
> Right.  This time without commenting out the wakeup as a test of the 
> diagnostic.  :-/
> 
> Please use the patch below instead of the one that I sent in my previous 
> email.
> 
>   Thanx, Paul
> 
> 
> 

RE: rcu_preempt caused oom

2018-12-10 Thread He, Bo
Sure, we will pick up the new patch and rerun the test.

-Original Message-
From: Paul E. McKenney  
Sent: Tuesday, December 11, 2018 12:47 PM
To: He, Bo 
Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; 
j...@joshtriplett.org; mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; 
Zhang, Jun ; Xiao, Jin ; Zhang, Yanmin 
; Bai, Jie A 
Subject: Re: rcu_preempt caused oom

On Mon, Dec 10, 2018 at 04:38:38PM -0800, Paul E. McKenney wrote:
> On Mon, Dec 10, 2018 at 06:56:18AM +, He, Bo wrote:
> > Hi, 
> >We have started the test with CONFIG_PROVE_RCU=y, and also added a 2s
> > timeout to detect the preempt RCU hang; hopefully we can get more useful logs tomorrow.
> >I have also enclosed the config and the debug patches for your review.
> 
> I instead suggest the (lightly tested) debug patch shown below, which 
> tracks wakeups of RCU's grace-period kthreads and dumps them out if a 
> given requested grace period fails to start.  Again, it is necessary 
> to build with CONFIG_PROVE_RCU=y, that is, with CONFIG_PROVE_LOCKING=y.

Right.  This time without commenting out the wakeup as a test of the 
diagnostic.  :-/

Please use the patch below instead of the one that I sent in my previous email.

Thanx, Paul




Re: rcu_preempt caused oom

2018-12-10 Thread Paul E. McKenney
On Mon, Dec 10, 2018 at 04:38:38PM -0800, Paul E. McKenney wrote:
> On Mon, Dec 10, 2018 at 06:56:18AM +, He, Bo wrote:
> > Hi, 
> >We have started the test with CONFIG_PROVE_RCU=y, and also added a 2s
> > timeout to detect the preempt RCU hang; hopefully we can get more useful logs tomorrow.
> >I have also enclosed the config and the debug patches for your review.
> 
> I instead suggest the (lightly tested) debug patch shown below, which
> tracks wakeups of RCU's grace-period kthreads and dumps them out if a
> given requested grace period fails to start.  Again, it is necessary to
> build with CONFIG_PROVE_RCU=y, that is, with CONFIG_PROVE_LOCKING=y.

Right.  This time without commenting out the wakeup as a test of the
diagnostic.  :-/

Please use the patch below instead of the one that I sent in my
previous email.

Thanx, Paul




Re: rcu_preempt caused oom

2018-12-10 Thread Paul E. McKenney
On Mon, Dec 10, 2018 at 06:56:18AM +, He, Bo wrote:
> Hi, 
>We have started the test with CONFIG_PROVE_RCU=y, and also added a 2s timeout to
> detect the preempt RCU hang; hopefully we can get more useful logs tomorrow.
>I have also enclosed the config and the debug patches for your review.

I instead suggest the (lightly tested) debug patch shown below, which
tracks wakeups of RCU's grace-period kthreads and dumps them out if a
given requested grace period fails to start.  Again, it is necessary to
build with CONFIG_PROVE_RCU=y, that is, with CONFIG_PROVE_LOCKING=y.

Thanx, Paul



commit 2a3826f15adaf92d046c80e38d090ecff5403807
Author: Paul E. McKenney 
Date:   Mon Dec 10 16:33:59 2018 -0800

rcu: Improve diagnostics for failed RCU grace-period start

Backported from v4.21/v5.0

If a grace period fails to start (for example, because you commented
out the last two lines of rcu_accelerate_cbs_unlocked()), rcu_core()
will invoke rcu_check_gp_start_stall(), which will notice and complain.
However, this complaint is lacking crucial debugging information such
as when the last wakeup executed and what the value of ->gp_seq was at
that time.  This commit therefore removes the current pr_alert() from
rcu_check_gp_start_stall(), instead invoking show_rcu_gp_kthreads(),
which has been updated to print the needed information, which is collected
by rcu_gp_kthread_wake().

Signed-off-by: Paul E. McKenney 

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1369f7..7daaef57d905 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -626,25 +626,57 @@ void rcu_sched_force_quiescent_state(void)
 }
 EXPORT_SYMBOL_GPL(rcu_sched_force_quiescent_state);
 
+/*
+ * Convert a ->gp_state value to a character string.
+ */
+static const char *gp_state_getname(short gs)
+{
+   if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
+   return "???";
+   return gp_state_names[gs];
+}
+
+/*
+ * Return the root node of the specified rcu_state structure.
+ */
+static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
+{
+   return &rsp->node[0];
+}
+
 /*
  * Show the state of the grace-period kthreads.
  */
 void show_rcu_gp_kthreads(void)
 {
int cpu;
+   unsigned long j;
+   unsigned long ja;
+   unsigned long jr;
+   unsigned long jw;
struct rcu_data *rdp;
struct rcu_node *rnp;
struct rcu_state *rsp;
 
+   j = jiffies;
for_each_rcu_flavor(rsp) {
-   pr_info("%s: wait state: %d ->state: %#lx\n",
-   rsp->name, rsp->gp_state, rsp->gp_kthread->state);
+   ja = j - READ_ONCE(rsp->gp_activity);
+   jr = j - READ_ONCE(rsp->gp_req_activity);
+   jw = j - READ_ONCE(rsp->gp_wake_time);
+   pr_info("%s: wait state: %s(%d) ->state: %#lx delta ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_flags %#x\n",
+   rsp->name, gp_state_getname(rsp->gp_state),
+   rsp->gp_state,
+   rsp->gp_kthread ? rsp->gp_kthread->state : 0x1L,
+   ja, jr, jw, (long)READ_ONCE(rsp->gp_wake_seq),
+   (long)READ_ONCE(rsp->gp_seq),
+   (long)READ_ONCE(rcu_get_root(rsp)->gp_seq_needed),
+   READ_ONCE(rsp->gp_flags));
rcu_for_each_node_breadth_first(rsp, rnp) {
if (ULONG_CMP_GE(rsp->gp_seq, rnp->gp_seq_needed))
continue;
-   pr_info("\trcu_node %d:%d ->gp_seq %lu ->gp_seq_needed %lu\n",
-   rnp->grplo, rnp->grphi, rnp->gp_seq,
-   rnp->gp_seq_needed);
+   pr_info("\trcu_node %d:%d ->gp_seq %ld ->gp_seq_needed %ld\n",
+   rnp->grplo, rnp->grphi, (long)rnp->gp_seq,
+   (long)rnp->gp_seq_needed);
if (!rcu_is_leaf_node(rnp))
continue;
for_each_leaf_node_possible_cpu(rnp, cpu) {
@@ -653,8 +685,8 @@ void show_rcu_gp_kthreads(void)
ULONG_CMP_GE(rsp->gp_seq,
 rdp->gp_seq_needed))
continue;
-   pr_info("\tcpu %d ->gp_seq_needed %lu\n",
-   cpu, rdp->gp_seq_needed);
+   pr_info("\tcpu %d ->gp_seq_needed %ld\n",
+   cpu, (long)rdp->gp_seq_needed);
}
}
/* sched_show_task(rsp->gp_kthread); */
@@ -690,14 +722,6 @@ void rcutort

RE: rcu_preempt caused oom

2018-12-09 Thread He, Bo
Hi, 
   We have started the test with CONFIG_PROVE_RCU=y, and also added a 2s timeout to
detect the preempt RCU hang; hopefully we can get more useful logs tomorrow.
   I have also enclosed the config and the debug patches for your review.

-Original Message-
From: Paul E. McKenney  
Sent: Monday, December 10, 2018 3:56 AM
To: He, Bo 
Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; 
j...@joshtriplett.org; mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; 
Zhang, Jun ; Xiao, Jin ; Zhang, Yanmin 
; Bai, Jie A 
Subject: Re: rcu_preempt caused oom

On Fri, Dec 07, 2018 at 06:11:31AM -0800, Paul E. McKenney wrote:
> On Fri, Dec 07, 2018 at 08:25:09AM +, He, Bo wrote:
> > Bad news: the issue is still reproduced after a 61-hour monkey test on 1 of 6
> > boards with CONFIG_RCU_BOOST=y. The issue is not seen on kernel
> > 4.14; CONFIG_RCU_BOOST is also not enabled in our kernel 4.14 config.
> > The logs are enclosed.
> > 
> > > So the question becomes "Why is the grace-period kthread being awakened 
> > > so many times, but not actually waking up?"  
> > maybe it's not a scheduler issue; I have two suspects:
> > we can see tons of grace_period events for 117392312:
> > [219346.919405, 0] showmap-31232 [000] d..1 219346.136035: rcu_future_grace_period: rcu_preempt 117392312 117392316 0 0 3 Startleaf
> > [219346.919417, 0] showmap-31232 [000] d..1 219346.136036: rcu_future_grace_period: rcu_preempt 117392312 117392316 0 0 3 Prestarted
> > [219346.919429, 0] showmap-31232 [000] d..1 219346.136036: rcu_grace_period: rcu_preempt 117392312 AccWaitCB
> > 
> > "Startleaf" means starting the grace period; "Prestarted" means the grace
> > period is already started or other conditions blocked it. RCU_GP_FLAG_INIT
> > should follow "Startedroot", and then the kthread can be woken up.
> 
> Yes, when "Startleaf" is followed by "Prestarted", that means that we 
> reached an rcu_node structure that is already aware that the requested 
> grace period is needed.  Breaking down the relevant "if" statement in
> rcu_start_this_gp():
> 
>   if (ULONG_CMP_GE(rnp->gp_seq_needed, gp_seq_req) ||
>   // A.  GP already requested at this rcu_node
>   rcu_seq_started(&rnp->gp_seq, gp_seq_req) ||
>   // B.  The requested grace period already started
>   (rnp != rnp_start &&
>rcu_seq_state(rcu_seq_current(&rnp->gp_seq)))) {
>   // C.  Leaf rcu_node recorded request, and
>   // some grace period is in progress
> 
> A:In this case, the "Startedroot" should be taken care of by some
>   other thread, or one of B or C held earlier.
> 
> B:This cannot be the case, because your earlier trace showed that
>   the requested grace period had not started.
> 
> C:This cannot be the case because both traces above are on the
>   leaf rcu_node structure.  If it were the case, the currently
>   running grace period would notice the need for the requested
>   grace period when it ended, and would start the grace period
>   at that time.
> 
> So you are saying that your trace goes back far enough to capture the 
> very first "Startleaf" for this new grace period, and you don't ever 
> see a "Startedroot"?  This would be OK if the later "Startedleaf" 
> showed up at that point.  If you do have such a trace, could you 
> please send it to me (or post it somewhere and send me the URL)?
> 
> In any case, this code has bee reworked recently, so I will take a 
> closer look, which will take some time.  Please feel free to continue 
> to do so as well, of course!

Hmmm...  Could you please build with CONFIG_PROVE_RCU=y and run the original 
(for example, CONFIG_RCU_BOOST=n)?  I would expect this to trigger the warning 
in rcu_check_gp_start_stall().  Of course, if it does not trigger, that would 
be valuable information as well.

Thanx, Paul

> > I did an experiment to dump the backtrace; rcu_quiescent_state_report is
> > called in softirq context:
> > <idle>-0 [000] dNs2 24471.669280: rcu_quiescent_state_report: rcu_preempt 3562401 1>0 0 0 3 0
> > <idle>-0 [000] dNs2 24471.669293:
> >  => rcu_report_qs_rnp+0x1e2/0x2a0
> >  => rcu_process_callbacks+0x2f1/0x3c0
> >  => __do_softirq+0x12a/0x386
> >  => irq_exit+0xb1/0xc0
> >  => smp_apic_timer_interrupt+0xd4/0x1e0
> >  => apic_timer_interrupt+0xf/0x20
> >  => cpuidle_enter_state+0xb1

Re: rcu_preempt caused oom

2018-12-09 Thread Paul E. McKenney
rcu_accelerate_cbs, we can see AccWaitCB log, but can't see 
> > rcu_quiescent_state_report, mostly it's due to condition rnp->qsmask & mask 
> > blocked.
> > 
> > static void
> > rcu_report_qs_rdp(int cpu, struct rcu_state *rsp, struct rcu_data *rdp)
> > {
> > ...
> > if ((rnp->qsmask & mask) == 0) {
> > raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > } else {
> > rdp->core_needs_qs = false;
> > needwake = rcu_accelerate_cbs(rsp, rnp, rdp); 
> > rcu_report_qs_rnp(mask, rsp, rnp, rnp->gp_seq, flags);  
> > 
> >
> > 
> > if (needwake)
> > rcu_gp_kthread_wake(rsp);
> > }
> > }
> 
> This is a completely different code path.  The rcu_start_this_gp()
> function is trying to start a new grace period.  In contrast, this
> rcu_report_qs_rdp() function reports a quiescent state for a currently
> running grace period.  In your earlier trace, there was no currently
> running grace period, so rcu_report_qs_rdp() exiting early is expected
> behavior.
> 
>   Thanx, Paul
> 
> > -Original Message-
> > From: Paul E. McKenney  
> > Sent: Friday, December 7, 2018 1:38 AM
> > To: He, Bo 
> > Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; 
> > j...@joshtriplett.org; mathieu.desnoy...@efficios.com; 
> > jiangshan...@gmail.com; Zhang, Jun ; Xiao, Jin 
> > ; Zhang, Yanmin ; Bai, Jie A 
> > 
> > Subject: Re: rcu_preempt caused oom
> > 
> > On Thu, Dec 06, 2018 at 01:23:01PM +, He, Bo wrote:
> > > 1. The test is positive after setting the kthread priority to SCHED_FIFO
> > > without CONFIG_RCU_BOOST; the issue has not been reproduced so far.
> > > 2. Here is a previous log with ftrace_dump enabled, from which we can get
> > > 4 seconds of ftrace. The panic log was triggered with the enclosed debug
> > > patch, which replaced wait_for_completion(&rs_array[i].completion) with
> > > wait_for_completion_timeout(&rs_array[i].completion, 3*HZ) in
> > > __wait_rcu_gp(). The logs have lockdep enabled to dump the held locks and
> > > all tasks' backtraces.
> > 
> > Thank you for collecting this information!
> > 
> > (By the way, the usual downside of the priority increase is increased 
> > context-switch rate and thus CPU overhead.)
> > 
> > And all three grace-period kthreads are blocked apparently in their 
> > top-level loops (though inlining and all that).  There are quite a few 
> > preemptions ("72738.702815: rcu_preempt_task: rcu_preempt"), but they are 
> > all blocking the next grace period (29041008), not the current one 
> > (29041004).  And the "rcu_unlock_preempted_task" trace records flag the 
> > current grace-period sequence number as 29041004, which means that there is 
> > no grace period in progress, that is, RCU is idle.
> > 
> > Which explains why there is no RCU CPU stall warning -- after all, if there 
> > is no grace period in flight, it is not possible to stall that non-existent 
> > grace period.
> > 
> > That also could explain why increasing the priority of the grace-period 
> > kthreads gets things going again.  There have been a great number of 
> > requests for a new grace period (for example, "rcu_future_grace_period:
> > rcu_preempt 29041004 29041008 0 0 3 Startleaf"), so as soon as the 
> > grace-period kthread wakes up, a new grace period will start.
> > 
> > Except that the rcu_preempt task says "I" rather than "R", as you noted in 
> > an earlier email.
> > 
> > And there should have been multiple attempts to wake up the grace-period 
> > kthread, because there are lots of callbacks queued as in 136,045 of them 
> > ("rcu_callback: rcu_preempt rhp=66f735c9 func=file_free_rcu 
> > 2811/136045").  Which is of course why you are seeing the OOM.
> > 
> > So the question becomes "Why is the grace-period kthread being awakened so 
> > many times, but not actually waking up?"  In the past, there was a 
> > scheduler bug that could cause that, but that was -way- before the v4.19 
> > that you are running.  More recently, there have been timer-related 
> > problems, but those only happened while a grace period was active, and 
> > were also long before v4.19.
> > 
> > Hmmm...  One possibility is that you have somehow ma

Re: rcu_preempt caused oom

2018-12-07 Thread Paul E. McKenney
 rcu_gp_kthread_wake(rsp);
> }
> }

This is a completely different code path.  The rcu_start_this_gp()
function is trying to start a new grace period.  In contrast, this
rcu_report_qs_rdp() function reports a quiescent state for a currently
running grace period.  In your earlier trace, there was no currently
running grace period, so rcu_report_qs_rdp() exiting early is expected
behavior.

Thanx, Paul

> -Original Message-
> From: Paul E. McKenney  
> Sent: Friday, December 7, 2018 1:38 AM
> To: He, Bo 
> Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; 
> j...@joshtriplett.org; mathieu.desnoy...@efficios.com; 
> jiangshan...@gmail.com; Zhang, Jun ; Xiao, Jin 
> ; Zhang, Yanmin ; Bai, Jie A 
> 
> Subject: Re: rcu_preempt caused oom
> 
> On Thu, Dec 06, 2018 at 01:23:01PM +, He, Bo wrote:
> > 1. The test looks positive after setting the kthread priority to SCHED_FIFO 
> > without CONFIG_RCU_BOOST; the issue has not reproduced so far.
> > 2. Here is the previous log with ftrace_dump enabled; we can get 4 seconds of 
> > ftrace. The panic log was triggered with the enclosed debug patch, which replaced 
> > the wait_for_completion(&rs_array[i].completion) with 
> > wait_for_completion_timeout(&rs_array[i].completion, 3*HZ) in 
> > __wait_rcu_gp(). The logs enabled lockdep to dump the locks, and dumped 
> > all tasks' backtraces.
> 
> Thank you for collecting this information!
> 
> (By the way, the usual downside of the priority increase is increased 
> context-switch rate and thus CPU overhead.)
> 
> And all three grace-period kthreads are blocked apparently in their top-level 
> loops (though inlining and all that).  There are quite a few preemptions 
> ("72738.702815: rcu_preempt_task: rcu_preempt"), but they are all blocking 
> the next grace period (29041008), not the current one (29041004).  And the 
> "rcu_unlock_preempted_task" trace records flag the current grace-period 
> sequence number as 29041004, which means that there is no grace period in 
> progress, that is, RCU is idle.
> 
> Which explains why there is no RCU CPU stall warning -- after all, if there 
> is no grace period in flight, it is not possible to stall that non-existent 
> grace period.
> 
> That also could explain why increasing the priority of the grace-period 
> kthreads gets things going again.  There have been a great number of requests 
> for a new grace period (for example, "rcu_future_grace_period:
> rcu_preempt 29041004 29041008 0 0 3 Startleaf"), so as soon as the 
> grace-period kthread wakes up, a new grace period will start.
> 
> Except that the rcu_preempt task says "I" rather than "R", as you noted in an 
> earlier email.
> 
> And there should have been multiple attempts to wake up the grace-period 
> kthread, because there are lots of callbacks queued as in 136,045 of them 
> ("rcu_callback: rcu_preempt rhp=66f735c9 func=file_free_rcu 
> 2811/136045").  Which is of course why you are seeing the OOM.
> 
> So the question becomes "Why is the grace-period kthread being awakened so 
> many times, but not actually waking up?"  In the past, there was a scheduler 
> bug that could cause that, but that was -way- before the v4.19 that you are 
> running.  More recently, there have been timer-related problems, but those 
> only happened while a grace period was active, and were also long before 
> v4.19.
> 
> Hmmm...  One possibility is that you have somehow managed to invoke
> call_rcu() with interrupts disabled, which would in turn disable the extra 
> wakeups that RCU sends when it sees excessive numbers of callbacks.
> Except that in that case, boosting the priority wouldn't help.  Besides, the 
> scheduling-clock interrupt should also check for this, and should push things 
> forward if need be.
> 
> If RCU managed to put all of its callbacks into the RCU_NEXT_READY_TAIL 
> bucket on all CPUs, that would defeat the wakeup-if-no-grace-period checks 
> (RCU is supposed to have started the relevant grace period before putting 
> callbacks into that bucket).  But that cannot be the case here, because new 
> callbacks are being enqueued throughout, and these would then trigger RCU's 
> start-a-new-grace-period checks.
> 
> But it would be good to confirm that this is actually working like I would 
> expect it to.  Could you please add scheduler wakeup to your tracing, if 
> possible, only displaying those sent to the rcu_preempt task?
> 
>   Thanx, Paul
> 
> > -Original Message-
> > From: Paul E. McKenney 
> > Sent: 

Re: rcu_preempt caused oom

2018-12-06 Thread Paul E. McKenney
On Thu, Dec 06, 2018 at 01:23:01PM +, He, Bo wrote:
> 1. The test looks positive after setting the kthread priority to SCHED_FIFO without 
> CONFIG_RCU_BOOST; the issue has not reproduced so far.
> 2. Here is the previous log with ftrace_dump enabled; we can get 4 seconds of 
> ftrace. The panic log was triggered with the enclosed debug patch, which replaced 
> the wait_for_completion(&rs_array[i].completion) with 
> wait_for_completion_timeout(&rs_array[i].completion, 3*HZ) in 
> __wait_rcu_gp(). The logs enabled lockdep to dump the locks, and dumped all 
> tasks' backtraces.

Thank you for collecting this information!

(By the way, the usual downside of the priority increase is increased
context-switch rate and thus CPU overhead.)

And all three grace-period kthreads are blocked apparently in their
top-level loops (though inlining and all that).  There are quite a few
preemptions ("72738.702815: rcu_preempt_task: rcu_preempt"), but they
are all blocking the next grace period (29041008), not the current one
(29041004).  And the "rcu_unlock_preempted_task" trace records flag the
current grace-period sequence number as 29041004, which means that there
is no grace period in progress, that is, RCU is idle.

Which explains why there is no RCU CPU stall warning -- after all, if
there is no grace period in flight, it is not possible to stall that
non-existent grace period.

That also could explain why increasing the priority of the grace-period
kthreads gets things going again.  There have been a great number of
requests for a new grace period (for example, "rcu_future_grace_period:
rcu_preempt 29041004 29041008 0 0 3 Startleaf"), so as soon as the
grace-period kthread wakes up, a new grace period will start.

Except that the rcu_preempt task says "I" rather than "R", as you noted
in an earlier email.

And there should have been multiple attempts to wake up the grace-period
kthread, because there are lots of callbacks queued as in 136,045 of
them ("rcu_callback: rcu_preempt rhp=66f735c9 func=file_free_rcu
2811/136045").  Which is of course why you are seeing the OOM.

So the question becomes "Why is the grace-period kthread being awakened
so many times, but not actually waking up?"  In the past, there was a
scheduler bug that could cause that, but that was -way- before the v4.19
that you are running.  More recently, there have been timer-related
problems, but those only happened while a grace period was active,
and were also long before v4.19.

Hmmm...  One possibility is that you have somehow managed to invoke
call_rcu() with interrupts disabled, which would in turn disable the
extra wakeups that RCU sends when it sees excessive numbers of callbacks.
Except that in that case, boosting the priority wouldn't help.  Besides,
the scheduling-clock interrupt should also check for this, and should
push things forward if need be.

If RCU managed to put all of its callbacks into the RCU_NEXT_READY_TAIL
bucket on all CPUs, that would defeat the wakeup-if-no-grace-period
checks (RCU is supposed to have started the relevant grace period before
putting callbacks into that bucket).  But that cannot be the case here,
because new callbacks are being enqueued throughout, and these would
then trigger RCU's start-a-new-grace-period checks.

But it would be good to confirm that this is actually working like I would
expect it to.  Could you please add scheduler wakeup to your tracing,
if possible, only displaying those sent to the rcu_preempt task?
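One way to do that with ftrace is a trace-event filter on the wakee's comm (a sketch; it assumes tracefs is mounted at /sys/kernel/debug/tracing and that you run it as root):

```shell
cd /sys/kernel/debug/tracing
echo 0 > tracing_on
echo > trace
# Show only sched_wakeup events whose target task is the rcu_preempt kthread:
echo 'comm == "rcu_preempt"' > events/sched/sched_wakeup/filter
echo 1 > events/sched/sched_wakeup/enable
echo 1 > tracing_on
# ...reproduce the hang, then capture:
cat trace
```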

Thanx, Paul

> -Original Message-
> From: Paul E. McKenney  
> Sent: Thursday, December 6, 2018 1:45 AM
> To: He, Bo 
> Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; 
> j...@joshtriplett.org; mathieu.desnoy...@efficios.com; 
> jiangshan...@gmail.com; Zhang, Jun ; Xiao, Jin 
> ; Zhang, Yanmin ; Bai, Jie A 
> 
> Subject: Re: rcu_preempt caused oom
> 
> On Wed, Dec 05, 2018 at 08:42:54AM +, He, Bo wrote:
> > I double checked the .config, we don't enable CONFIG_NO_HZ_FULL .
> > Our previous logs can dump all the task backtrace, and kthread (the 
> > rcu_preempt, rcu_sched, and rcu_bh tasks) are all in "I" state not in "R" 
> > state; my understanding is that if it's the side-effect of causing RCU's 
> > kthreads to be run at SCHED_FIFO priority 1, the kthreads should be in R 
> > state.
> 
> Hmmm...  Well, the tasks could in theory be waiting on a blocking mutex.
> But in practice the grace-period kthreads wait on events, so that makes no 
> sense.
> 
> Is it possible for you to dump out the grace-period kthread's stack, for 
> example, with sysreq-t?  (Steve might know a better way to do this.)
> 
> > I will do more experiments and keep you updat

Re: rcu_preempt caused oom

2018-12-05 Thread Paul E. McKenney
On Wed, Dec 05, 2018 at 08:42:54AM +, He, Bo wrote:
> I double checked the .config, we don't enable CONFIG_NO_HZ_FULL .
> Our previous logs can dump all the task backtrace, and kthread (the 
> rcu_preempt, rcu_sched, and rcu_bh tasks) are all in "I" state not in "R" 
> state; my understanding is that if it's the side-effect of causing RCU's 
> kthreads to be run at SCHED_FIFO priority 1, the kthreads should be in R 
> state.

Hmmm...  Well, the tasks could in theory be waiting on a blocking mutex.
But in practice the grace-period kthreads wait on events, so that makes
no sense.

Is it possible for you to dump out the grace-period kthread's stack,
for example, with sysreq-t?  (Steve might know a better way to do this.)
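For reference, two common ways to capture that stack from a shell (the second assumes CONFIG_STACKTRACE so that /proc exposes per-task stack files):

```shell
# Dump all task stacks to the console/log (sysrq-t):
echo t > /proc/sysrq-trigger
# Or read just the grace-period kthread's stack:
cat /proc/"$(pgrep -x rcu_preempt)"/stack
```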

> I will do more experiments and keep you updated once we have more findings:
> 1. set the kthread priority to SCHED_FIFO without CONFIG_RCU_BOOST and see if 
> the issue can reproduce.

That sounds like a most excellent experiment!

> 2. check more ftrace to double confirm why there is no 
> trace_rcu_quiescent_state_report and most of the trace_rcu_grace_period are 
> in "AccWaitCB".

As noted earlier, to see something interesting, you will need to start
the ftrace before the grace period starts.  This would probably mean
having ftrace running before starting the test.  Starting the ftrace
after the hang commences is unlikely to produce useful information.
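Concretely, the rcu trace events discussed in this thread can be enabled before launching the test, e.g. (assumed tracefs paths):

```shell
cd /sys/kernel/debug/tracing
echo rcu:rcu_grace_period rcu:rcu_future_grace_period \
     rcu:rcu_quiescent_state_report > set_event
echo 1 > tracing_on
# Start the reproducer only after this, so the stalled grace
# period's start is captured in the trace buffer.
```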

Thanx, Paul

> -Original Message-
> From: Paul E. McKenney  
> Sent: Wednesday, December 5, 2018 3:50 AM
> To: He, Bo 
> Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; 
> j...@joshtriplett.org; mathieu.desnoy...@efficios.com; 
> jiangshan...@gmail.com; Zhang, Jun ; Xiao, Jin 
> ; Zhang, Yanmin ; Bai, Jie A 
> 
> Subject: Re: rcu_preempt caused oom
> 
> On Tue, Dec 04, 2018 at 07:50:04AM +, He, Bo wrote:
> > Hi, Paul:
> > the enclosed is the log that triggered the 120s hung_task_panic without other 
> > debug patches, the hung task is blocked at __wait_rcu_gp, it means the 
> > rcu_cpu_stall can't detect the scenario:
> > echo 1 > /proc/sys/kernel/panic_on_rcu_stall
> > echo 7 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
> 
> Not necessarily.  If there is an RCU CPU stall warning, blocking within
> __wait_rcu_gp() is expected behavior.  It is possible that the problem is 
> that although the grace period is completing as required, the callbacks are 
> not being invoked in a timely fashion.  And that could happen if you had 
> CONFIG_NO_HZ_FULL and a bunch of nohz_full CPUs, or, alternatively, callback 
> offloading enabled.  But I don't see these in your previous emails.  Another 
> possible cause is that the grace-period kthread is being delayed, so that the 
> grace period never starts.  This seems unlikely, but it is the only thing 
> thus far that matches the symptoms.
> 
> CONFIG_RCU_BOOST=y has the side-effect of causing RCU's kthreads to be run at 
> SCHED_FIFO priority 1, and that would help in the case where RCU's 
> grace-period kthread (the rcu_preempt, rcu_sched, and rcu_bh tasks, all of 
> which execute in the rcu_gp_kthread() function) was being starved of CPU time.
> 
> Does that sound likely?
> 
>   Thanx, Paul
> 
> > -Original Message-
> > From: Paul E. McKenney 
> > Sent: Monday, December 3, 2018 9:57 PM
> > To: He, Bo 
> > Cc: Steven Rostedt ; 
> > linux-kernel@vger.kernel.org; j...@joshtriplett.org; 
> > mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun 
> > ; Xiao, Jin ; Zhang, Yanmin 
> > 
> > Subject: Re: rcu_preempt caused oom
> > 
> > On Mon, Dec 03, 2018 at 07:44:03AM +, He, Bo wrote:
> > > Thanks, we have run the test for the whole weekend and did not reproduce the 
> > > issue, so we confirm that CONFIG_RCU_BOOST fixes the issue.
> > 
> > Very good, that is encouraging.  Perhaps I should think about making 
> > CONFIG_RCU_BOOST=y the default for CONFIG_PREEMPT in mainline, at least for 
> > architectures for which rt_mutexes are implemented.
> > 
> > > We have enabled the rcupdate.rcu_cpu_stall_timeout=7 and also set panic 
> > > on rcu stall and will see if we can see the panic, will keep you posted 
> > > with the test results.
> > > echo 1 > /proc/sys/kernel/panic_on_rcu_stall
> > 
> > Looking forward to seeing what is going on!  Of course, to reproduce, you 
> > will need to again build with CONFIG_RCU_BOOST=n.
> > 
> >         Thanx, Paul
> > 
> > > -Original Me

RE: rcu_preempt caused oom

2018-12-05 Thread He, Bo
I double checked the .config, we don't enable CONFIG_NO_HZ_FULL .
Our previous logs can dump all the task backtrace, and kthread (the 
rcu_preempt, rcu_sched, and rcu_bh tasks) are all in "I" state not in "R" 
state; my understanding is that if it's the side-effect of causing RCU's kthreads 
to be run at SCHED_FIFO priority 1, the kthreads should be in R state.

I will do more experiments and keep you updated once we have more findings:
1. set the kthread priority to SCHED_FIFO without CONFIG_RCU_BOOST and see if 
the issue can reproduce.
2. check more ftrace to double confirm why there is no 
trace_rcu_quiescent_state_report and most of the trace_rcu_grace_period are in 
"AccWaitCB".

-Original Message-
From: Paul E. McKenney  
Sent: Wednesday, December 5, 2018 3:50 AM
To: He, Bo 
Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; 
j...@joshtriplett.org; mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; 
Zhang, Jun ; Xiao, Jin ; Zhang, Yanmin 
; Bai, Jie A 
Subject: Re: rcu_preempt caused oom

On Tue, Dec 04, 2018 at 07:50:04AM +, He, Bo wrote:
> Hi, Paul:
> the enclosed is the log that triggered the 120s hung_task_panic without other debug 
> patches, the hung task is blocked at __wait_rcu_gp, it means the 
> rcu_cpu_stall can't detect the scenario:
> echo 1 > /proc/sys/kernel/panic_on_rcu_stall
> echo 7 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout

Not necessarily.  If there is an RCU CPU stall warning, blocking within
__wait_rcu_gp() is expected behavior.  It is possible that the problem is that 
although the grace period is completing as required, the callbacks are not 
being invoked in a timely fashion.  And that could happen if you had 
CONFIG_NO_HZ_FULL and a bunch of nohz_full CPUs, or, alternatively, callback 
offloading enabled.  But I don't see these in your previous emails.  Another 
possible cause is that the grace-period kthread is being delayed, so that the 
grace period never starts.  This seems unlikely, but it is the only thing thus 
far that matches the symptoms.

CONFIG_RCU_BOOST=y has the side-effect of causing RCU's kthreads to be run at 
SCHED_FIFO priority 1, and that would help in the case where RCU's grace-period 
kthread (the rcu_preempt, rcu_sched, and rcu_bh tasks, all of which execute in 
the rcu_gp_kthread() function) was being starved of CPU time.

Does that sound likely?

Thanx, Paul

> -Original Message-
> From: Paul E. McKenney 
> Sent: Monday, December 3, 2018 9:57 PM
> To: He, Bo 
> Cc: Steven Rostedt ; 
> linux-kernel@vger.kernel.org; j...@joshtriplett.org; 
> mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun 
> ; Xiao, Jin ; Zhang, Yanmin 
> 
> Subject: Re: rcu_preempt caused oom
> 
> On Mon, Dec 03, 2018 at 07:44:03AM +, He, Bo wrote:
> > Thanks, we have run the test for the whole weekend and did not reproduce the 
> > issue, so we confirm that CONFIG_RCU_BOOST fixes the issue.
> 
> Very good, that is encouraging.  Perhaps I should think about making 
> CONFIG_RCU_BOOST=y the default for CONFIG_PREEMPT in mainline, at least for 
> architectures for which rt_mutexes are implemented.
> 
> > We have enabled the rcupdate.rcu_cpu_stall_timeout=7 and also set panic on 
> > rcu stall and will see if we can see the panic, will keep you posted with 
> > the test results.
> > echo 1 > /proc/sys/kernel/panic_on_rcu_stall
> 
> Looking forward to seeing what is going on!  Of course, to reproduce, you 
> will need to again build with CONFIG_RCU_BOOST=n.
> 
>   Thanx, Paul
> 
> > -Original Message-
> > From: Paul E. McKenney 
> > Sent: Saturday, December 1, 2018 12:49 AM
> > To: He, Bo 
> > Cc: Steven Rostedt ; 
> > linux-kernel@vger.kernel.org; j...@joshtriplett.org; 
> > mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun 
> > ; Xiao, Jin ; Zhang, Yanmin 
> > 
> > Subject: Re: rcu_preempt caused oom
> > 
> > On Fri, Nov 30, 2018 at 03:18:58PM +, He, Bo wrote:
> > > Here is the kernel cmdline:
> > 
> > Thank you!
> > 
> > > Kernel command line: androidboot.acpio_idx=0
> > > androidboot.bootloader=efiwrapper-02_03-userdebug_kernelflinger-06
> > > _0
> > > 3- userdebug androidboot.diskbus=00.0 
> > > androidboot.verifiedbootstate=green
> > > androidboot.bootreason=power-on 
> > > androidboot.serialno=R1J56L6006a7bb
> > > g_ffs.iSerialNumber=R1J56L6006a7bb no_timer_check noxsaves 
> > > reboot_panic=p,w i915.hpd_sense_invert=0x7 mem=2G nokaslr nopti 
> > > ftrace_dump_on_oops trace_buf_size=1024K intel_iommu=off

Re: rcu_preempt caused oom

2018-12-04 Thread Paul E. McKenney
On Tue, Dec 04, 2018 at 07:50:04AM +, He, Bo wrote:
> Hi, Paul:
> the enclosed is the log that triggered the 120s hung_task_panic without other debug 
> patches, the hung task is blocked at __wait_rcu_gp, it means the 
> rcu_cpu_stall can't detect the scenario:
> echo 1 > /proc/sys/kernel/panic_on_rcu_stall
> echo 7 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout

Not necessarily.  If there is an RCU CPU stall warning, blocking within
__wait_rcu_gp() is expected behavior.  It is possible that the problem is
that although the grace period is completing as required, the callbacks
are not being invoked in a timely fashion.  And that could happen if you
had CONFIG_NO_HZ_FULL and a bunch of nohz_full CPUs, or, alternatively,
callback offloading enabled.  But I don't see these in your previous
emails.  Another possible cause is that the grace-period kthread is being
delayed, so that the grace period never starts.  This seems unlikely,
but it is the only thing thus far that matches the symptoms.

CONFIG_RCU_BOOST=y has the side-effect of causing RCU's kthreads to
be run at SCHED_FIFO priority 1, and that would help in the case where
RCU's grace-period kthread (the rcu_preempt, rcu_sched, and rcu_bh tasks,
all of which execute in the rcu_gp_kthread() function) was being starved
of CPU time.

Does that sound likely?

Thanx, Paul

> -Original Message-
> From: Paul E. McKenney  
> Sent: Monday, December 3, 2018 9:57 PM
> To: He, Bo 
> Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; 
> j...@joshtriplett.org; mathieu.desnoy...@efficios.com; 
> jiangshan...@gmail.com; Zhang, Jun ; Xiao, Jin 
> ; Zhang, Yanmin 
> Subject: Re: rcu_preempt caused oom
> 
> On Mon, Dec 03, 2018 at 07:44:03AM +, He, Bo wrote:
> > Thanks, we have run the test for the whole weekend and did not reproduce the 
> > issue, so we confirm that CONFIG_RCU_BOOST fixes the issue.
> 
> Very good, that is encouraging.  Perhaps I should think about making 
> CONFIG_RCU_BOOST=y the default for CONFIG_PREEMPT in mainline, at least for 
> architectures for which rt_mutexes are implemented.
> 
> > We have enabled the rcupdate.rcu_cpu_stall_timeout=7 and also set panic on 
> > rcu stall and will see if we can see the panic, will keep you posted with 
> > the test results.
> > echo 1 > /proc/sys/kernel/panic_on_rcu_stall
> 
> Looking forward to seeing what is going on!  Of course, to reproduce, you 
> will need to again build with CONFIG_RCU_BOOST=n.
> 
>   Thanx, Paul
> 
> > -Original Message-
> > From: Paul E. McKenney 
> > Sent: Saturday, December 1, 2018 12:49 AM
> > To: He, Bo 
> > Cc: Steven Rostedt ; 
> > linux-kernel@vger.kernel.org; j...@joshtriplett.org; 
> > mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun 
> > ; Xiao, Jin ; Zhang, Yanmin 
> > 
> > Subject: Re: rcu_preempt caused oom
> > 
> > On Fri, Nov 30, 2018 at 03:18:58PM +, He, Bo wrote:
> > > Here is the kernel cmdline:
> > 
> > Thank you!
> > 
> > > Kernel command line: androidboot.acpio_idx=0
> > > androidboot.bootloader=efiwrapper-02_03-userdebug_kernelflinger-06_0
> > > 3- userdebug androidboot.diskbus=00.0 
> > > androidboot.verifiedbootstate=green
> > > androidboot.bootreason=power-on androidboot.serialno=R1J56L6006a7bb
> > > g_ffs.iSerialNumber=R1J56L6006a7bb no_timer_check noxsaves 
> > > reboot_panic=p,w i915.hpd_sense_invert=0x7 mem=2G nokaslr nopti 
> > > ftrace_dump_on_oops trace_buf_size=1024K intel_iommu=off gpt
> > > loglevel=4 androidboot.hardware=gordon_peak 
> > > firmware_class.path=/vendor/firmware relative_sleep_states=1
> > > enforcing=0 androidboot.selinux=permissive cpu_init_udelay=10 
> > > androidboot.android_dt_dir=/sys/bus/platform/devices/ANDR0001:00/pro
> > > pe rties/android/ pstore.backend=ramoops memmap=0x140$0x5000
> > > ramoops.mem_address=0x5000 ramoops.mem_size=0x140
> > > ramoops.record_size=0x4000 ramoops.console_size=0x100
> > > ramoops.ftrace_size=0x1 ramoops.dump_oops=1 vga=current
> > > i915.modeset=1 drm.atomic=1 i915.nuclear_pageflip=1 
> > > drm.vblankoffdelay=
> > 
> > And no sign of any suppression of RCU CPU stall warnings.  Hmmm...
> > It does take more than 21 seconds to OOM?  Or do things happen faster than 
> > that?  If they do happen faster than that, then one approach would be to add 
> > something like this to the kernel command line:
> > 
> > rcupdate.rcu_cpu_stall_timeout=7
> > 
> > Th

RE: rcu_preempt caused oom

2018-12-03 Thread He, Bo
Hi, Paul:
the enclosed is the log that triggered the 120s hung_task_panic without other debug 
patches, the hung task is blocked at __wait_rcu_gp, it means the rcu_cpu_stall 
can't detect the scenario:
echo 1 > /proc/sys/kernel/panic_on_rcu_stall
echo 7 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout


-Original Message-
From: Paul E. McKenney  
Sent: Monday, December 3, 2018 9:57 PM
To: He, Bo 
Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; 
j...@joshtriplett.org; mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; 
Zhang, Jun ; Xiao, Jin ; Zhang, Yanmin 

Subject: Re: rcu_preempt caused oom

On Mon, Dec 03, 2018 at 07:44:03AM +, He, Bo wrote:
> Thanks, we have run the test for the whole weekend and did not reproduce the 
> issue, so we confirm that CONFIG_RCU_BOOST fixes the issue.

Very good, that is encouraging.  Perhaps I should think about making 
CONFIG_RCU_BOOST=y the default for CONFIG_PREEMPT in mainline, at least for 
architectures for which rt_mutexes are implemented.

> We have enabled the rcupdate.rcu_cpu_stall_timeout=7 and also set panic on 
> rcu stall and will see if we can see the panic, will keep you posted with 
> test results.
> echo 1 > /proc/sys/kernel/panic_on_rcu_stall

Looking forward to seeing what is going on!  Of course, to reproduce, you will 
need to again build with CONFIG_RCU_BOOST=n.

Thanx, Paul

> -Original Message-
> From: Paul E. McKenney 
> Sent: Saturday, December 1, 2018 12:49 AM
> To: He, Bo 
> Cc: Steven Rostedt ; 
> linux-kernel@vger.kernel.org; j...@joshtriplett.org; 
> mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun 
> ; Xiao, Jin ; Zhang, Yanmin 
> 
> Subject: Re: rcu_preempt caused oom
> 
> On Fri, Nov 30, 2018 at 03:18:58PM +, He, Bo wrote:
> > Here is the kernel cmdline:
> 
> Thank you!
> 
> > Kernel command line: androidboot.acpio_idx=0
> > androidboot.bootloader=efiwrapper-02_03-userdebug_kernelflinger-06_0
> > 3- userdebug androidboot.diskbus=00.0 
> > androidboot.verifiedbootstate=green
> > androidboot.bootreason=power-on androidboot.serialno=R1J56L6006a7bb
> > g_ffs.iSerialNumber=R1J56L6006a7bb no_timer_check noxsaves 
> > reboot_panic=p,w i915.hpd_sense_invert=0x7 mem=2G nokaslr nopti 
> > ftrace_dump_on_oops trace_buf_size=1024K intel_iommu=off gpt
> > loglevel=4 androidboot.hardware=gordon_peak 
> > firmware_class.path=/vendor/firmware relative_sleep_states=1
> > enforcing=0 androidboot.selinux=permissive cpu_init_udelay=10 
> > androidboot.android_dt_dir=/sys/bus/platform/devices/ANDR0001:00/pro
> > pe rties/android/ pstore.backend=ramoops memmap=0x140$0x5000
> > ramoops.mem_address=0x5000 ramoops.mem_size=0x140
> > ramoops.record_size=0x4000 ramoops.console_size=0x100
> > ramoops.ftrace_size=0x1 ramoops.dump_oops=1 vga=current
> > i915.modeset=1 drm.atomic=1 i915.nuclear_pageflip=1 
> > drm.vblankoffdelay=
> 
> And no sign of any suppression of RCU CPU stall warnings.  Hmmm...
> It does take more than 21 seconds to OOM?  Or do things happen faster than 
> that?  If they do happen faster than that, then one approach would be to add 
> something like this to the kernel command line:
> 
>   rcupdate.rcu_cpu_stall_timeout=7
> 
> This would set the stall timeout to seven seconds.  Note that timeouts less 
> than three seconds are silently interpreted as three seconds.
> 
>   Thanx, Paul
> 
> > -Original Message-
> > From: Steven Rostedt 
> > Sent: Friday, November 30, 2018 11:17 PM
> > To: Paul E. McKenney 
> > Cc: He, Bo ; linux-kernel@vger.kernel.org; 
> > j...@joshtriplett.org; mathieu.desnoy...@efficios.com; 
> > jiangshan...@gmail.com; Zhang, Jun ; Xiao, Jin 
> > ; Zhang, Yanmin 
> > Subject: Re: rcu_preempt caused oom
> > 
> > On Fri, 30 Nov 2018 06:43:17 -0800
> > "Paul E. McKenney"  wrote:
> > 
> > > Could you please send me your list of kernel boot parameters?  
> > > They usually appear near the start of your console output.
> > 
> > Or just: cat /proc/cmdline
> > 
> > -- Steve
> > 
> 



apanic_console
Description: apanic_console


Re: rcu_preempt caused oom

2018-12-03 Thread Paul E. McKenney
On Mon, Dec 03, 2018 at 07:44:03AM +, He, Bo wrote:
> Thanks, we have run the test for the whole weekend and did not reproduce the 
> issue, so we confirm that CONFIG_RCU_BOOST fixes the issue.

Very good, that is encouraging.  Perhaps I should think about making
CONFIG_RCU_BOOST=y the default for CONFIG_PREEMPT in mainline, at least
for architectures for which rt_mutexes are implemented.

> We have enabled the rcupdate.rcu_cpu_stall_timeout=7 and also set panic on 
> rcu stall and will see if we can see the panic, will keep you posted with 
> test results.
> echo 1 > /proc/sys/kernel/panic_on_rcu_stall

Looking forward to seeing what is going on!  Of course, to reproduce, you
will need to again build with CONFIG_RCU_BOOST=n.

Thanx, Paul

> -Original Message-
> From: Paul E. McKenney  
> Sent: Saturday, December 1, 2018 12:49 AM
> To: He, Bo 
> Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; 
> j...@joshtriplett.org; mathieu.desnoy...@efficios.com; 
> jiangshan...@gmail.com; Zhang, Jun ; Xiao, Jin 
> ; Zhang, Yanmin 
> Subject: Re: rcu_preempt caused oom
> 
> On Fri, Nov 30, 2018 at 03:18:58PM +, He, Bo wrote:
> > Here is the kernel cmdline:
> 
> Thank you!
> 
> > Kernel command line: androidboot.acpio_idx=0  
> > androidboot.bootloader=efiwrapper-02_03-userdebug_kernelflinger-06_03-
> > userdebug androidboot.diskbus=00.0 androidboot.verifiedbootstate=green 
> > androidboot.bootreason=power-on androidboot.serialno=R1J56L6006a7bb 
> > g_ffs.iSerialNumber=R1J56L6006a7bb no_timer_check noxsaves 
> > reboot_panic=p,w i915.hpd_sense_invert=0x7 mem=2G nokaslr nopti 
> > ftrace_dump_on_oops trace_buf_size=1024K intel_iommu=off gpt 
> > loglevel=4 androidboot.hardware=gordon_peak 
> > firmware_class.path=/vendor/firmware relative_sleep_states=1 
> > enforcing=0 androidboot.selinux=permissive cpu_init_udelay=10 
> > androidboot.android_dt_dir=/sys/bus/platform/devices/ANDR0001:00/prope
> > rties/android/ pstore.backend=ramoops memmap=0x140$0x5000 
> > ramoops.mem_address=0x5000 ramoops.mem_size=0x140 
> > ramoops.record_size=0x4000 ramoops.console_size=0x100 
> > ramoops.ftrace_size=0x1 ramoops.dump_oops=1 vga=current
> > i915.modeset=1 drm.atomic=1 i915.nuclear_pageflip=1 
> > drm.vblankoffdelay=
> 
> And no sign of any suppression of RCU CPU stall warnings.  Hmmm...
> It does take more than 21 seconds to OOM?  Or do things happen faster than 
> that?  If they do happen faster than that, then one approach would be to add 
> something like this to the kernel command line:
> 
>   rcupdate.rcu_cpu_stall_timeout=7
> 
> This would set the stall timeout to seven seconds.  Note that timeouts less 
> than three seconds are silently interpreted as three seconds.
> 
>   Thanx, Paul
> 
> > -Original Message-
> > From: Steven Rostedt 
> > Sent: Friday, November 30, 2018 11:17 PM
> > To: Paul E. McKenney 
> > Cc: He, Bo ; linux-kernel@vger.kernel.org; 
> > j...@joshtriplett.org; mathieu.desnoy...@efficios.com; 
> > jiangshan...@gmail.com; Zhang, Jun ; Xiao, Jin 
> > ; Zhang, Yanmin 
> > Subject: Re: rcu_preempt caused oom
> > 
> > On Fri, 30 Nov 2018 06:43:17 -0800
> > "Paul E. McKenney"  wrote:
> > 
> > > Could you please send me your list of kernel boot parameters?  They 
> > > usually appear near the start of your console output.
> > 
> > Or just: cat /proc/cmdline
> > 
> > -- Steve
> > 
> 



RE: rcu_preempt caused oom

2018-12-02 Thread He, Bo
Thanks, we have run the test for the whole weekend and did not reproduce the issue, 
so we confirm that CONFIG_RCU_BOOST fixes the issue.

We have enabled the rcupdate.rcu_cpu_stall_timeout=7 and also set panic on rcu 
stall and will see if we can see the panic, will keep you posted with the test 
results.
echo 1 > /proc/sys/kernel/panic_on_rcu_stall
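For reference, both stall-warning knobs mentioned in this thread can also be inspected and adjusted at runtime; a minimal sketch (root required; sysfs/procfs paths as in mainline 4.19, so verify them on your own tree):

```shell
# Panic the kernel when an RCU CPU stall is detected (runtime sysctl knob).
echo 1 > /proc/sys/kernel/panic_on_rcu_stall

# rcu_cpu_stall_timeout is a module parameter of rcupdate; when set on the
# kernel command line it is also visible (and adjustable) here at runtime.
cat /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout

# Values below three seconds are silently raised to three by the
# stall-detection code, as noted earlier in the thread.
echo 7 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
```

This avoids a reboot when experimenting with different timeout values.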

-Original Message-
From: Paul E. McKenney  
Sent: Saturday, December 1, 2018 12:49 AM
To: He, Bo 
Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; 
j...@joshtriplett.org; mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; 
Zhang, Jun ; Xiao, Jin ; Zhang, Yanmin 

Subject: Re: rcu_preempt caused oom

On Fri, Nov 30, 2018 at 03:18:58PM +, He, Bo wrote:
> Here is the kernel cmdline:

Thank you!

> Kernel command line: androidboot.acpio_idx=0  
> androidboot.bootloader=efiwrapper-02_03-userdebug_kernelflinger-06_03-
> userdebug androidboot.diskbus=00.0 androidboot.verifiedbootstate=green 
> androidboot.bootreason=power-on androidboot.serialno=R1J56L6006a7bb 
> g_ffs.iSerialNumber=R1J56L6006a7bb no_timer_check noxsaves 
> reboot_panic=p,w i915.hpd_sense_invert=0x7 mem=2G nokaslr nopti 
> ftrace_dump_on_oops trace_buf_size=1024K intel_iommu=off gpt 
> loglevel=4 androidboot.hardware=gordon_peak 
> firmware_class.path=/vendor/firmware relative_sleep_states=1 
> enforcing=0 androidboot.selinux=permissive cpu_init_udelay=10 
> androidboot.android_dt_dir=/sys/bus/platform/devices/ANDR0001:00/prope
> rties/android/ pstore.backend=ramoops memmap=0x140$0x5000 
> ramoops.mem_address=0x5000 ramoops.mem_size=0x140 
> ramoops.record_size=0x4000 ramoops.console_size=0x100 
> ramoops.ftrace_size=0x1 ramoops.dump_oops=1 vga=current
> i915.modeset=1 drm.atomic=1 i915.nuclear_pageflip=1 
> drm.vblankoffdelay=

And no sign of any suppression of RCU CPU stall warnings.  Hmmm...
Does it take more than 21 seconds to OOM?  Or do things happen faster than 
that?  If they do happen faster than that, then one approach would be to add 
something like this to the kernel command line:

rcupdate.rcu_cpu_stall_timeout=7

This would set the stall timeout to seven seconds.  Note that timeouts less 
than three seconds are silently interpreted as three seconds.

Thanx, Paul

> -Original Message-
> From: Steven Rostedt 
> Sent: Friday, November 30, 2018 11:17 PM
> To: Paul E. McKenney 
> Cc: He, Bo ; linux-kernel@vger.kernel.org; 
> j...@joshtriplett.org; mathieu.desnoy...@efficios.com; 
> jiangshan...@gmail.com; Zhang, Jun ; Xiao, Jin 
> ; Zhang, Yanmin 
> Subject: Re: rcu_preempt caused oom
> 
> On Fri, 30 Nov 2018 06:43:17 -0800
> "Paul E. McKenney"  wrote:
> 
> > Could you please send me your list of kernel boot parameters?  They 
> > usually appear near the start of your console output.
> 
> Or just: cat /proc/cmdline
> 
> -- Steve
> 



Re: rcu_preempt caused oom

2018-11-30 Thread Paul E. McKenney
On Fri, Nov 30, 2018 at 03:18:58PM +, He, Bo wrote:
> Here is the kernel cmdline:

Thank you!

> Kernel command line: androidboot.acpio_idx=0  
> androidboot.bootloader=efiwrapper-02_03-userdebug_kernelflinger-06_03-userdebug
>  androidboot.diskbus=00.0 androidboot.verifiedbootstate=green 
> androidboot.bootreason=power-on androidboot.serialno=R1J56L6006a7bb 
> g_ffs.iSerialNumber=R1J56L6006a7bb no_timer_check noxsaves reboot_panic=p,w 
> i915.hpd_sense_invert=0x7 mem=2G nokaslr nopti ftrace_dump_on_oops 
> trace_buf_size=1024K intel_iommu=off gpt loglevel=4 
> androidboot.hardware=gordon_peak firmware_class.path=/vendor/firmware 
> relative_sleep_states=1 enforcing=0 androidboot.selinux=permissive 
> cpu_init_udelay=10 
> androidboot.android_dt_dir=/sys/bus/platform/devices/ANDR0001:00/properties/android/
>  pstore.backend=ramoops memmap=0x140$0x5000 
> ramoops.mem_address=0x5000 ramoops.mem_size=0x140 
> ramoops.record_size=0x4000 ramoops.console_size=0x100 
> ramoops.ftrace_size=0x1 ramoops.dump_oops=1 vga=current 
> i915.modeset=1 drm.atomic=1 i915.nuclear_pageflip=1 drm.vblankoffdelay=

And no sign of any suppression of RCU CPU stall warnings.  Hmmm...
Does it take more than 21 seconds to OOM?  Or do things happen faster
than that?  If they do happen faster than that, then one approach would
be to add something like this to the kernel command line:

rcupdate.rcu_cpu_stall_timeout=7

This would set the stall timeout to seven seconds.  Note that timeouts
less than three seconds are silently interpreted as three seconds.

Thanx, Paul

> -Original Message-
> From: Steven Rostedt  
> Sent: Friday, November 30, 2018 11:17 PM
> To: Paul E. McKenney 
> Cc: He, Bo ; linux-kernel@vger.kernel.org; 
> j...@joshtriplett.org; mathieu.desnoy...@efficios.com; 
> jiangshan...@gmail.com; Zhang, Jun ; Xiao, Jin 
> ; Zhang, Yanmin 
> Subject: Re: rcu_preempt caused oom
> 
> On Fri, 30 Nov 2018 06:43:17 -0800
> "Paul E. McKenney"  wrote:
> 
> > Could you please send me your list of kernel boot parameters?  They 
> > usually appear near the start of your console output.
> 
> Or just: cat /proc/cmdline
> 
> -- Steve
> 



RE: rcu_preempt caused oom

2018-11-30 Thread He, Bo
Here is the kernel cmdline:

Kernel command line: androidboot.acpio_idx=0  
androidboot.bootloader=efiwrapper-02_03-userdebug_kernelflinger-06_03-userdebug 
androidboot.diskbus=00.0 androidboot.verifiedbootstate=green 
androidboot.bootreason=power-on androidboot.serialno=R1J56L6006a7bb 
g_ffs.iSerialNumber=R1J56L6006a7bb no_timer_check noxsaves reboot_panic=p,w 
i915.hpd_sense_invert=0x7 mem=2G nokaslr nopti ftrace_dump_on_oops 
trace_buf_size=1024K intel_iommu=off gpt loglevel=4 
androidboot.hardware=gordon_peak firmware_class.path=/vendor/firmware 
relative_sleep_states=1 enforcing=0 androidboot.selinux=permissive 
cpu_init_udelay=10 
androidboot.android_dt_dir=/sys/bus/platform/devices/ANDR0001:00/properties/android/
 pstore.backend=ramoops memmap=0x140$0x5000 
ramoops.mem_address=0x5000 ramoops.mem_size=0x140 
ramoops.record_size=0x4000 ramoops.console_size=0x100 
ramoops.ftrace_size=0x1 ramoops.dump_oops=1 vga=current 
i915.modeset=1 drm.atomic=1 i915.nuclear_pageflip=1 drm.vblankoffdelay=

-Original Message-
From: Steven Rostedt  
Sent: Friday, November 30, 2018 11:17 PM
To: Paul E. McKenney 
Cc: He, Bo ; linux-kernel@vger.kernel.org; 
j...@joshtriplett.org; mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; 
Zhang, Jun ; Xiao, Jin ; Zhang, Yanmin 

Subject: Re: rcu_preempt caused oom

On Fri, 30 Nov 2018 06:43:17 -0800
"Paul E. McKenney"  wrote:

> Could you please send me your list of kernel boot parameters?  They 
> usually appear near the start of your console output.

Or just: cat /proc/cmdline

-- Steve


Re: rcu_preempt caused oom

2018-11-30 Thread Steven Rostedt
On Fri, 30 Nov 2018 06:43:17 -0800
"Paul E. McKenney"  wrote:

> Could you please send me your list of kernel boot parameters?  They
> usually appear near the start of your console output.

Or just: cat /proc/cmdline

-- Steve


Re: rcu_preempt caused oom

2018-11-30 Thread Paul E. McKenney
On Fri, Nov 30, 2018 at 08:03:38AM +, He, Bo wrote:
> Thanks for your great suggestions.
> After enabling CONFIG_RCU_BOOST=y, we have not reproduced the issue so far; 
> we will keep the test running and update you with the results.
> 
> The enclosed is the kernel config, here is the config I grep with the RCU, we 
> don't enable the CONFIG_RCU_BOOST in our build.
> # RCU Subsystem
> CONFIG_PREEMPT_RCU=y
> # CONFIG_RCU_EXPERT is not set
> CONFIG_SRCU=y
> CONFIG_TREE_SRCU=y
> CONFIG_TASKS_RCU=y
> CONFIG_RCU_STALL_COMMON=y
> CONFIG_RCU_NEED_SEGCBLIST=y
> # RCU Debugging
> CONFIG_RCU_PERF_TEST=m
> CONFIG_RCU_TORTURE_TEST=m
> CONFIG_RCU_CPU_STALL_TIMEOUT=21
> CONFIG_RCU_TRACE=y
> CONFIG_RCU_EQS_DEBUG=y

Thank you!

What likely happened is that a low-priority RCU reader was preempted
indefinitely.  Though I would have expected an RCU CPU stall warning
in that case, so it might well be that something else is going on.
Could you please send me your list of kernel boot parameters?  They
usually appear near the start of your console output.

Thanx, Paul

> -Original Message-
> From: Paul E. McKenney  
> Sent: Thursday, November 29, 2018 10:27 PM
> To: He, Bo 
> Cc: linux-kernel@vger.kernel.org; j...@joshtriplett.org; rost...@goodmis.org; 
> mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun 
> ; Xiao, Jin ; Zhang, Yanmin 
> 
> Subject: Re: rcu_preempt caused oom
> 
> On Thu, Nov 29, 2018 at 05:06:47AM -0800, Paul E. McKenney wrote:
> > On Thu, Nov 29, 2018 at 08:49:35AM +, He, Bo wrote:
> > > Hi, 
> > >   we are testing kernel 4.19.0 on Android; after running a monkey 
> > > stress test for more than 24 hours, we see OOM on 1 of 10 2G memory 
> > > boards. The issue is not seen on the 4.14 kernel.
> > > We have done some debugging:
> > > 1. The OOM is due to filp allocations consuming too much memory: 300M on a 2G board.
> > > 2. With the 120s hung task detection, most tasks block at 
> > > __wait_rcu_gp: wait_for_completion(&rs_array[i].completion);
> 
Did you see any RCU CPU stall warnings?  Or have those been disabled?
> If they have been disabled, could you please rerun with them enabled?
> 
> > > [47571.863839] Kernel panic - not syncing: hung_task: blocked tasks
> > > [47571.875446] CPU: 1 PID: 13626 Comm: FinalizerDaemon Tainted: G U O  4.19.0-quilt-2e5dc0ac-gf3f313245eb6 #1
> > > [47571.887603] Call Trace:
> > > [47571.890547]  dump_stack+0x70/0xa5
> > > [47571.894456]  panic+0xe3/0x241
> > > [47571.897977]  ? wait_for_completion_timeout+0x72/0x1b0
> > > [47571.903830]  __wait_rcu_gp+0x17b/0x180
> > > [47571.908226]  synchronize_rcu.part.76+0x38/0x50
> > > [47571.913393]  ? __call_rcu.constprop.79+0x3a0/0x3a0
> > > [47571.918948]  ? __bpf_trace_rcu_invoke_callback+0x10/0x10
> > > [47571.925094]  synchronize_rcu+0x43/0x50
> > > [47571.929487]  evdev_detach_client+0x59/0x60
> > > [47571.934264]  evdev_release+0x4e/0xd0
> > > [47571.938464]  __fput+0xfa/0x1f0
> > > [47571.942072]  fput+0xe/0x10
> > > [47571.945683]  task_work_run+0x90/0xc0
> > > [47571.949884]  exit_to_usermode_loop+0x9f/0xb0
> > > [47571.954855]  do_syscall_64+0xfa/0x110
> > > [47571.959151]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> This is indeed a task waiting on synchronize_rcu().
> 
> > > 3. After enabling the RCU trace, we don't see any rcu_quiescent_state_report 
> > > trace for a long time, and we see rcu_callback events for rcu_preempt that 
> > > are never followed by rcu_invoke_callback.
> > > [47572.040668]  ps-12388   1d..1 47566097572us : rcu_grace_period: 
> > > rcu_preempt 23716088 AccWaitCB
> > > [47572.040707]  ps-12388   1d... 47566097621us : rcu_callback: 
> > > rcu_preempt rhp=783a728b func=file_free_rcu 4354/82824
> > > [47572.040734]  ps-12388   1d..1 47566097622us : 
> > > rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> > > [47572.040756]  ps-12388   1d..1 47566097623us : 
> > > rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> > > [47572.040778]  ps-12388   1d..1 47566097623us : rcu_grace_period: 
> > > rcu_preempt 23716088 AccWaitCB
> > > [47572.040802]  ps-12388   1d... 47566097674us : rcu_callback: 
> > > rcu_preempt rhp=42c76521 func=file_free_rcu 4354/82825
> > > [47572.040824]  ps-12388   1d..1 47566097676us : 
> > > rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> > > [47572.0

Re: rcu_preempt caused oom

2018-11-29 Thread Paul E. McKenney
On Thu, Nov 29, 2018 at 05:06:47AM -0800, Paul E. McKenney wrote:
> On Thu, Nov 29, 2018 at 08:49:35AM +, He, Bo wrote:
> > Hi, 
> >   we are testing kernel 4.19.0 on Android; after running a monkey stress 
> > test for more than 24 hours, we see OOM on 1 of 10 2G memory boards. The 
> > issue is not seen on the 4.14 kernel.
> > We have done some debugging:
> > 1. The OOM is due to filp allocations consuming too much memory: 300M on a 2G board.
> > 2. With the 120s hung task detection, most tasks block at 
> > __wait_rcu_gp: wait_for_completion(&rs_array[i].completion);

Did you see any RCU CPU stall warnings?  Or have those been disabled?
If they have been disabled, could you please rerun with them enabled?

> > [47571.863839] Kernel panic - not syncing: hung_task: blocked tasks
> > [47571.875446] CPU: 1 PID: 13626 Comm: FinalizerDaemon Tainted: G U 
> > O  4.19.0-quilt-2e5dc0ac-gf3f313245eb6 #1
> > [47571.887603] Call Trace:
> > [47571.890547]  dump_stack+0x70/0xa5
> > [47571.894456]  panic+0xe3/0x241
> > [47571.897977]  ? wait_for_completion_timeout+0x72/0x1b0
> > [47571.903830]  __wait_rcu_gp+0x17b/0x180
> > [47571.908226]  synchronize_rcu.part.76+0x38/0x50
> > [47571.913393]  ? __call_rcu.constprop.79+0x3a0/0x3a0
> > [47571.918948]  ? __bpf_trace_rcu_invoke_callback+0x10/0x10
> > [47571.925094]  synchronize_rcu+0x43/0x50
> > [47571.929487]  evdev_detach_client+0x59/0x60
> > [47571.934264]  evdev_release+0x4e/0xd0
> > [47571.938464]  __fput+0xfa/0x1f0
> > [47571.942072]  fput+0xe/0x10
> > [47571.945683]  task_work_run+0x90/0xc0
> > [47571.949884]  exit_to_usermode_loop+0x9f/0xb0
> > [47571.954855]  do_syscall_64+0xfa/0x110
> > [47571.959151]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

This is indeed a task waiting on synchronize_rcu().

> > 3. After enabling the RCU trace, we don't see any rcu_quiescent_state_report 
> > trace for a long time, and we see rcu_callback events for rcu_preempt that 
> > are never followed by rcu_invoke_callback.
> > [47572.040668]  ps-12388   1d..1 47566097572us : rcu_grace_period: 
> > rcu_preempt 23716088 AccWaitCB
> > [47572.040707]  ps-12388   1d... 47566097621us : rcu_callback: 
> > rcu_preempt rhp=783a728b func=file_free_rcu 4354/82824
> > [47572.040734]  ps-12388   1d..1 47566097622us : 
> > rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> > [47572.040756]  ps-12388   1d..1 47566097623us : 
> > rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> > [47572.040778]  ps-12388   1d..1 47566097623us : rcu_grace_period: 
> > rcu_preempt 23716088 AccWaitCB
> > [47572.040802]  ps-12388   1d... 47566097674us : rcu_callback: 
> > rcu_preempt rhp=42c76521 func=file_free_rcu 4354/82825
> > [47572.040824]  ps-12388   1d..1 47566097676us : 
> > rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> > [47572.040847]  ps-12388   1d..1 47566097676us : 
> > rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> > [47572.040868]  ps-12388   1d..1 47566097676us : rcu_grace_period: 
> > rcu_preempt 23716088 AccWaitCB
> > [47572.040895]  ps-12388   1d..1 47566097716us : rcu_callback: 
> > rcu_preempt rhp=5e40fde2 func=avc_node_free 4354/82826
> > [47572.040919]  ps-12388   1d..1 47566097735us : rcu_callback: 
> > rcu_preempt rhp=f80fe353 func=avc_node_free 4354/82827
> > [47572.040943]  ps-12388   1d..1 47566097758us : rcu_callback: 
> > rcu_preempt rhp=7486f400 func=avc_node_free 4354/82828
> > [47572.040967]  ps-12388   1d..1 47566097760us : rcu_callback: 
> > rcu_preempt rhp=b87872a8 func=avc_node_free 4354/82829
> > [47572.040990]  ps-12388   1d... 47566097789us : rcu_callback: 
> > rcu_preempt rhp=8c656343 func=file_free_rcu 4354/82830
> > [47572.041013]  ps-12388   1d..1 47566097790us : 
> > rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> > [47572.041036]  ps-12388   1d..1 47566097790us : 
> > rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> > [47572.041057]  ps-12388   1d..1 47566097791us : rcu_grace_period: 
> > rcu_preempt 23716088 AccWaitCB
> > [47572.041081]  ps-12388   1d... 47566097871us : rcu_callback: 
> > rcu_preempt rhp=7e6c898c func=file_free_rcu 4354/82831
> > [47572.041103]  ps-12388   1d..1 47566097872us : 
> > rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> > [47572.041126]  ps-12388   1d..1 47566097872us : 
> > rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> > [47572.041147]  ps-12388   1d..1 47566097873us : rcu_grace_period: 
> > rcu_preempt 23716088 AccWaitCB
> > [47572.041170]  ps-12388   1d... 47566097945us : rcu_callback: 
> > rcu_preempt rhp=32f4f174 func=file_free_rcu 4354/82832
> > [47572.041193]  ps-12388   1d..1 47566097946us : 
> > rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf

Callbacks are being queued and 

Re: rcu_preempt caused oom

2018-11-29 Thread Paul E. McKenney
On Thu, Nov 29, 2018 at 08:49:35AM +, He, Bo wrote:
> Hi, 
>   we are testing kernel 4.19.0 on Android; after running a monkey stress test 
> for more than 24 hours, we see OOM on 1 of 10 2G memory boards. The issue is 
> not seen on the 4.14 kernel.
> We have done some debugging:
> 1. The OOM is due to filp allocations consuming too much memory: 300M on a 2G board.
> 2. With the 120s hung task detection, most tasks block at 
> __wait_rcu_gp: wait_for_completion(&rs_array[i].completion);
> [47571.863839] Kernel panic - not syncing: hung_task: blocked tasks
> [47571.875446] CPU: 1 PID: 13626 Comm: FinalizerDaemon Tainted: G U O 
>  4.19.0-quilt-2e5dc0ac-gf3f313245eb6 #1
> [47571.887603] Call Trace:
> [47571.890547]  dump_stack+0x70/0xa5
> [47571.894456]  panic+0xe3/0x241
> [47571.897977]  ? wait_for_completion_timeout+0x72/0x1b0
> [47571.903830]  __wait_rcu_gp+0x17b/0x180
> [47571.908226]  synchronize_rcu.part.76+0x38/0x50
> [47571.913393]  ? __call_rcu.constprop.79+0x3a0/0x3a0
> [47571.918948]  ? __bpf_trace_rcu_invoke_callback+0x10/0x10
> [47571.925094]  synchronize_rcu+0x43/0x50
> [47571.929487]  evdev_detach_client+0x59/0x60
> [47571.934264]  evdev_release+0x4e/0xd0
> [47571.938464]  __fput+0xfa/0x1f0
> [47571.942072]  fput+0xe/0x10
> [47571.945683]  task_work_run+0x90/0xc0
> [47571.949884]  exit_to_usermode_loop+0x9f/0xb0
> [47571.954855]  do_syscall_64+0xfa/0x110
> [47571.959151]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 3. After enabling the RCU trace, we don't see any rcu_quiescent_state_report 
> trace for a long time, and we see rcu_callback events for rcu_preempt that are 
> never followed by rcu_invoke_callback.
> [47572.040668]  ps-12388   1d..1 47566097572us : rcu_grace_period: 
> rcu_preempt 23716088 AccWaitCB
> [47572.040707]  ps-12388   1d... 47566097621us : rcu_callback: 
> rcu_preempt rhp=783a728b func=file_free_rcu 4354/82824
> [47572.040734]  ps-12388   1d..1 47566097622us : rcu_future_grace_period: 
> rcu_preempt 23716088 23716092 0 0 3 Startleaf
> [47572.040756]  ps-12388   1d..1 47566097623us : rcu_future_grace_period: 
> rcu_preempt 23716088 23716092 0 0 3 Prestarted
> [47572.040778]  ps-12388   1d..1 47566097623us : rcu_grace_period: 
> rcu_preempt 23716088 AccWaitCB
> [47572.040802]  ps-12388   1d... 47566097674us : rcu_callback: 
> rcu_preempt rhp=42c76521 func=file_free_rcu 4354/82825
> [47572.040824]  ps-12388   1d..1 47566097676us : rcu_future_grace_period: 
> rcu_preempt 23716088 23716092 0 0 3 Startleaf
> [47572.040847]  ps-12388   1d..1 47566097676us : rcu_future_grace_period: 
> rcu_preempt 23716088 23716092 0 0 3 Prestarted
> [47572.040868]  ps-12388   1d..1 47566097676us : rcu_grace_period: 
> rcu_preempt 23716088 AccWaitCB
> [47572.040895]  ps-12388   1d..1 47566097716us : rcu_callback: 
> rcu_preempt rhp=5e40fde2 func=avc_node_free 4354/82826
> [47572.040919]  ps-12388   1d..1 47566097735us : rcu_callback: 
> rcu_preempt rhp=f80fe353 func=avc_node_free 4354/82827
> [47572.040943]  ps-12388   1d..1 47566097758us : rcu_callback: 
> rcu_preempt rhp=7486f400 func=avc_node_free 4354/82828
> [47572.040967]  ps-12388   1d..1 47566097760us : rcu_callback: 
> rcu_preempt rhp=b87872a8 func=avc_node_free 4354/82829
> [47572.040990]  ps-12388   1d... 47566097789us : rcu_callback: 
> rcu_preempt rhp=8c656343 func=file_free_rcu 4354/82830
> [47572.041013]  ps-12388   1d..1 47566097790us : rcu_future_grace_period: 
> rcu_preempt 23716088 23716092 0 0 3 Startleaf
> [47572.041036]  ps-12388   1d..1 47566097790us : rcu_future_grace_period: 
> rcu_preempt 23716088 23716092 0 0 3 Prestarted
> [47572.041057]  ps-12388   1d..1 47566097791us : rcu_grace_period: 
> rcu_preempt 23716088 AccWaitCB
> [47572.041081]  ps-12388   1d... 47566097871us : rcu_callback: 
> rcu_preempt rhp=7e6c898c func=file_free_rcu 4354/82831
> [47572.041103]  ps-12388   1d..1 47566097872us : rcu_future_grace_period: 
> rcu_preempt 23716088 23716092 0 0 3 Startleaf
> [47572.041126]  ps-12388   1d..1 47566097872us : rcu_future_grace_period: 
> rcu_preempt 23716088 23716092 0 0 3 Prestarted
> [47572.041147]  ps-12388   1d..1 47566097873us : rcu_grace_period: 
> rcu_preempt 23716088 AccWaitCB
> [47572.041170]  ps-12388   1d... 47566097945us : rcu_callback: 
> rcu_preempt rhp=32f4f174 func=file_free_rcu 4354/82832
> [47572.041193]  ps-12388   1d..1 47566097946us : rcu_future_grace_period: 
> rcu_preempt 23716088 23716092 0 0 3 Startleaf
> 
> Do you have any suggestions to debug the issue?

If you do not already have CONFIG_RCU_BOOST=y set, could you please
rebuild with that?

Could you also please send your .config file?

Thanx, Paul
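A quick way to verify the running kernel's RCU configuration, as requested above; a sketch under the assumption that CONFIG_IKCONFIG_PROC=y (otherwise check the build tree's .config, whose path below is only a placeholder):

```shell
# If the running kernel exposes its config, inspect it directly:
zcat /proc/config.gz | grep -E 'PREEMPT_RCU|RCU_BOOST|RCU_EXPERT'

# Otherwise, grep the build tree's .config (path is an assumption):
grep -E 'PREEMPT_RCU|RCU_BOOST|RCU_EXPERT' /path/to/kernel/.config

# Note: in 4.19 the RCU_BOOST option depends on PREEMPT_RCU and is only
# visible in menuconfig when CONFIG_RCU_EXPERT=y, which matches the
# "# CONFIG_RCU_EXPERT is not set" line in the reporter's config.
```

This explains why RCU_BOOST did not appear in the reporter's grep of the RCU options.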



rcu_preempt caused oom

2018-11-29 Thread He, Bo
Hi, 
  we are testing kernel 4.19.0 on Android; after running a monkey stress test 
for more than 24 hours, we see OOM on 1 of 10 2G memory boards. The issue is 
not seen on the 4.14 kernel.
We have done some debugging:
1. The OOM is due to filp allocations consuming too much memory: 300M on a 2G board.
2. With the 120s hung task detection, most tasks block at 
__wait_rcu_gp: wait_for_completion(&rs_array[i].completion);
[47571.863839] Kernel panic - not syncing: hung_task: blocked tasks
[47571.875446] CPU: 1 PID: 13626 Comm: FinalizerDaemon Tainted: G U O   
   4.19.0-quilt-2e5dc0ac-gf3f313245eb6 #1
[47571.887603] Call Trace:
[47571.890547]  dump_stack+0x70/0xa5
[47571.894456]  panic+0xe3/0x241
[47571.897977]  ? wait_for_completion_timeout+0x72/0x1b0
[47571.903830]  __wait_rcu_gp+0x17b/0x180
[47571.908226]  synchronize_rcu.part.76+0x38/0x50
[47571.913393]  ? __call_rcu.constprop.79+0x3a0/0x3a0
[47571.918948]  ? __bpf_trace_rcu_invoke_callback+0x10/0x10
[47571.925094]  synchronize_rcu+0x43/0x50
[47571.929487]  evdev_detach_client+0x59/0x60
[47571.934264]  evdev_release+0x4e/0xd0
[47571.938464]  __fput+0xfa/0x1f0
[47571.942072]  fput+0xe/0x10
[47571.945683]  task_work_run+0x90/0xc0
[47571.949884]  exit_to_usermode_loop+0x9f/0xb0
[47571.954855]  do_syscall_64+0xfa/0x110
[47571.959151]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
3. After enabling the RCU trace, we don't see any rcu_quiescent_state_report trace 
for a long time, and we see rcu_callback events for rcu_preempt that are never 
followed by rcu_invoke_callback.
[47572.040668]  ps-12388   1d..1 47566097572us : rcu_grace_period: 
rcu_preempt 23716088 AccWaitCB
[47572.040707]  ps-12388   1d... 47566097621us : rcu_callback: rcu_preempt 
rhp=783a728b func=file_free_rcu 4354/82824
[47572.040734]  ps-12388   1d..1 47566097622us : rcu_future_grace_period: 
rcu_preempt 23716088 23716092 0 0 3 Startleaf
[47572.040756]  ps-12388   1d..1 47566097623us : rcu_future_grace_period: 
rcu_preempt 23716088 23716092 0 0 3 Prestarted
[47572.040778]  ps-12388   1d..1 47566097623us : rcu_grace_period: 
rcu_preempt 23716088 AccWaitCB
[47572.040802]  ps-12388   1d... 47566097674us : rcu_callback: rcu_preempt 
rhp=42c76521 func=file_free_rcu 4354/82825
[47572.040824]  ps-12388   1d..1 47566097676us : rcu_future_grace_period: 
rcu_preempt 23716088 23716092 0 0 3 Startleaf
[47572.040847]  ps-12388   1d..1 47566097676us : rcu_future_grace_period: 
rcu_preempt 23716088 23716092 0 0 3 Prestarted
[47572.040868]  ps-12388   1d..1 47566097676us : rcu_grace_period: 
rcu_preempt 23716088 AccWaitCB
[47572.040895]  ps-12388   1d..1 47566097716us : rcu_callback: rcu_preempt 
rhp=5e40fde2 func=avc_node_free 4354/82826
[47572.040919]  ps-12388   1d..1 47566097735us : rcu_callback: rcu_preempt 
rhp=f80fe353 func=avc_node_free 4354/82827
[47572.040943]  ps-12388   1d..1 47566097758us : rcu_callback: rcu_preempt 
rhp=7486f400 func=avc_node_free 4354/82828
[47572.040967]  ps-12388   1d..1 47566097760us : rcu_callback: rcu_preempt 
rhp=b87872a8 func=avc_node_free 4354/82829
[47572.040990]  ps-12388   1d... 47566097789us : rcu_callback: rcu_preempt 
rhp=8c656343 func=file_free_rcu 4354/82830
[47572.041013]  ps-12388   1d..1 47566097790us : rcu_future_grace_period: 
rcu_preempt 23716088 23716092 0 0 3 Startleaf
[47572.041036]  ps-12388   1d..1 47566097790us : rcu_future_grace_period: 
rcu_preempt 23716088 23716092 0 0 3 Prestarted
[47572.041057]  ps-12388   1d..1 47566097791us : rcu_grace_period: 
rcu_preempt 23716088 AccWaitCB
[47572.041081]  ps-12388   1d... 47566097871us : rcu_callback: rcu_preempt 
rhp=7e6c898c func=file_free_rcu 4354/82831
[47572.041103]  ps-12388   1d..1 47566097872us : rcu_future_grace_period: 
rcu_preempt 23716088 23716092 0 0 3 Startleaf
[47572.041126]  ps-12388   1d..1 47566097872us : rcu_future_grace_period: 
rcu_preempt 23716088 23716092 0 0 3 Prestarted
[47572.041147]  ps-12388   1d..1 47566097873us : rcu_grace_period: 
rcu_preempt 23716088 AccWaitCB
[47572.041170]  ps-12388   1d... 47566097945us : rcu_callback: rcu_preempt 
rhp=32f4f174 func=file_free_rcu 4354/82832
[47572.041193]  ps-12388   1d..1 47566097946us : rcu_future_grace_period: 
rcu_preempt 23716088 23716092 0 0 3 Startleaf

Do you have any suggestions to debug the issue?
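For anyone reproducing this, the RCU trace events quoted above can be enabled at runtime through tracefs; a sketch assuming a kernel built with CONFIG_RCU_TRACE=y (as in the reporter's config) and the usual debugfs mount point, which may instead be /sys/kernel/tracing on some systems:

```shell
cd /sys/kernel/debug/tracing

# Enable the specific RCU tracepoints seen in the logs above; alternatively,
# "echo 1 > events/rcu/enable" turns on the whole rcu event subsystem.
echo 1 > events/rcu/rcu_grace_period/enable
echo 1 > events/rcu/rcu_future_grace_period/enable
echo 1 > events/rcu/rcu_callback/enable
echo 1 > events/rcu/rcu_invoke_callback/enable
echo 1 > events/rcu/rcu_quiescent_state_report/enable

# After reproducing the hang, save the trace buffer for analysis:
cat trace > /tmp/rcu-trace.txt
```

Comparing rcu_callback against rcu_invoke_callback in the saved buffer is what shows callbacks being queued but never invoked, i.e. the stalled grace period.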