Re: rcu_dyntick and suspicious RCU usage

2012-07-17 Thread Paul E. McKenney
On Fri, Jul 13, 2012 at 05:12:30PM +0800, Fengguang Wu wrote:
> On Thu, Jul 12, 2012 at 10:02:42AM -0700, Paul E. McKenney wrote:
> > On Fri, Jul 13, 2012 at 12:57:38AM +0800, Fengguang Wu wrote:
> > > On Thu, Jul 12, 2012 at 09:43:08AM -0700, Paul E. McKenney wrote:
> > > > On Wed, Jul 11, 2012 at 12:49:24AM -0700, Paul E. McKenney wrote:
> > > > > On Tue, Jul 10, 2012 at 12:47:00PM +0800, Fengguang Wu wrote:
> > > > > > Hi Paul,
> > > > > > 
> > > > > > Fortunately this bug is bisectable and the first bad commit is:
> > > > > > 
> > > > > > commit 9b2e4f1880b789be1f24f9684f7a54b90310b5c0
> > > > > > Author: Paul E. McKenney 
> > > > > > Date:   Fri Sep 30 12:10:22 2011 -0700
> > > > > > 
> > > > > > rcu: Track idleness independent of idle tasks
> > > > > 
> > > > > OK, there is a problem in TINY_RCU's handling of dyntick-idle: it
> > > > > traces while in idle.  The confusion on my part was that in TREE_RCU,
> > > > > the nesting and dyntick-idle indication are different, while in
> > > > > TINY_RCU they are one and the same.
> > > > > 
> > > > > Does the following patch help?
> > > > 
> > > > This one failed in my testing.  Please see the end for the fixed
> > > > version, with on small but important change.
> > > 
> > > It worked, thanks!
> > > 
> > > Tested-by: Fengguang Wu 
> > 
> > Very good!  (And please ignore my resend of the same patch.)
> > 
> > I will queue this.
> 
> Will you recommend it for -stable? It impacts 3.3/3.4. I tested it on
> 3.4 and it works fine. However for 3.3, the patch cannot apply cleanly.

If someone hits it in a real workload, I might consider it.  I currently
have it queued for 3.7.

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: rcu_dyntick and suspicious RCU usage

2012-07-17 Thread Paul E. McKenney
On Fri, Jul 13, 2012 at 05:12:30PM +0800, Fengguang Wu wrote:
 On Thu, Jul 12, 2012 at 10:02:42AM -0700, Paul E. McKenney wrote:
  On Fri, Jul 13, 2012 at 12:57:38AM +0800, Fengguang Wu wrote:
   On Thu, Jul 12, 2012 at 09:43:08AM -0700, Paul E. McKenney wrote:
On Wed, Jul 11, 2012 at 12:49:24AM -0700, Paul E. McKenney wrote:
 On Tue, Jul 10, 2012 at 12:47:00PM +0800, Fengguang Wu wrote:
  Hi Paul,
  
  Fortunately this bug is bisectable and the first bad commit is:
  
  commit 9b2e4f1880b789be1f24f9684f7a54b90310b5c0
  Author: Paul E. McKenney paul.mcken...@linaro.org
  Date:   Fri Sep 30 12:10:22 2011 -0700
  
  rcu: Track idleness independent of idle tasks
 
 OK, there is a problem in TINY_RCU's handling of dyntick-idle: it
 traces while in idle.  The confusion on my part was that in TREE_RCU,
 the nesting and dyntick-idle indication are different, while in
 TINY_RCU they are one and the same.
 
 Does the following patch help?

This one failed in my testing.  Please see the end for the fixed
version, with on small but important change.
   
   It worked, thanks!
   
   Tested-by: Fengguang Wu w...@linux.intel.com
  
  Very good!  (And please ignore my resend of the same patch.)
  
  I will queue this.
 
 Will you recommend it for -stable? It impacts 3.3/3.4. I tested it on
 3.4 and it works fine. However for 3.3, the patch cannot apply cleanly.

If someone hits it in a real workload, I might consider it.  I currently
have it queued for 3.7.

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: rcu_dyntick and suspicious RCU usage

2012-07-13 Thread Fengguang Wu
On Thu, Jul 12, 2012 at 10:02:42AM -0700, Paul E. McKenney wrote:
> On Fri, Jul 13, 2012 at 12:57:38AM +0800, Fengguang Wu wrote:
> > On Thu, Jul 12, 2012 at 09:43:08AM -0700, Paul E. McKenney wrote:
> > > On Wed, Jul 11, 2012 at 12:49:24AM -0700, Paul E. McKenney wrote:
> > > > On Tue, Jul 10, 2012 at 12:47:00PM +0800, Fengguang Wu wrote:
> > > > > Hi Paul,
> > > > > 
> > > > > Fortunately this bug is bisectable and the first bad commit is:
> > > > > 
> > > > > commit 9b2e4f1880b789be1f24f9684f7a54b90310b5c0
> > > > > Author: Paul E. McKenney 
> > > > > Date:   Fri Sep 30 12:10:22 2011 -0700
> > > > > 
> > > > > rcu: Track idleness independent of idle tasks
> > > > 
> > > > OK, there is a problem in TINY_RCU's handling of dyntick-idle: it
> > > > traces while in idle.  The confusion on my part was that in TREE_RCU,
> > > > the nesting and dyntick-idle indication are different, while in
> > > > TINY_RCU they are one and the same.
> > > > 
> > > > Does the following patch help?
> > > 
> > > This one failed in my testing.  Please see the end for the fixed
> > > version, with on small but important change.
> > 
> > It worked, thanks!
> > 
> > Tested-by: Fengguang Wu 
> 
> Very good!  (And please ignore my resend of the same patch.)
> 
> I will queue this.

Will you recommend it for -stable? It impacts 3.3/3.4. I tested it on
3.4 and it works fine. However for 3.3, the patch cannot apply cleanly.

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: rcu_dyntick and suspicious RCU usage

2012-07-13 Thread Fengguang Wu
On Thu, Jul 12, 2012 at 10:02:42AM -0700, Paul E. McKenney wrote:
 On Fri, Jul 13, 2012 at 12:57:38AM +0800, Fengguang Wu wrote:
  On Thu, Jul 12, 2012 at 09:43:08AM -0700, Paul E. McKenney wrote:
   On Wed, Jul 11, 2012 at 12:49:24AM -0700, Paul E. McKenney wrote:
On Tue, Jul 10, 2012 at 12:47:00PM +0800, Fengguang Wu wrote:
 Hi Paul,
 
 Fortunately this bug is bisectable and the first bad commit is:
 
 commit 9b2e4f1880b789be1f24f9684f7a54b90310b5c0
 Author: Paul E. McKenney paul.mcken...@linaro.org
 Date:   Fri Sep 30 12:10:22 2011 -0700
 
 rcu: Track idleness independent of idle tasks

OK, there is a problem in TINY_RCU's handling of dyntick-idle: it
traces while in idle.  The confusion on my part was that in TREE_RCU,
the nesting and dyntick-idle indication are different, while in
TINY_RCU they are one and the same.

Does the following patch help?
   
   This one failed in my testing.  Please see the end for the fixed
   version, with on small but important change.
  
  It worked, thanks!
  
  Tested-by: Fengguang Wu w...@linux.intel.com
 
 Very good!  (And please ignore my resend of the same patch.)
 
 I will queue this.

Will you recommend it for -stable? It impacts 3.3/3.4. I tested it on
3.4 and it works fine. However for 3.3, the patch cannot apply cleanly.

Thanks,
Fengguang
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: rcu_dyntick and suspicious RCU usage

2012-07-12 Thread Paul E. McKenney
On Fri, Jul 13, 2012 at 12:57:38AM +0800, Fengguang Wu wrote:
> On Thu, Jul 12, 2012 at 09:43:08AM -0700, Paul E. McKenney wrote:
> > On Wed, Jul 11, 2012 at 12:49:24AM -0700, Paul E. McKenney wrote:
> > > On Tue, Jul 10, 2012 at 12:47:00PM +0800, Fengguang Wu wrote:
> > > > Hi Paul,
> > > > 
> > > > Fortunately this bug is bisectable and the first bad commit is:
> > > > 
> > > > commit 9b2e4f1880b789be1f24f9684f7a54b90310b5c0
> > > > Author: Paul E. McKenney 
> > > > Date:   Fri Sep 30 12:10:22 2011 -0700
> > > > 
> > > > rcu: Track idleness independent of idle tasks
> > > 
> > > OK, there is a problem in TINY_RCU's handling of dyntick-idle: it
> > > traces while in idle.  The confusion on my part was that in TREE_RCU,
> > > the nesting and dyntick-idle indication are different, while in
> > > TINY_RCU they are one and the same.
> > > 
> > > Does the following patch help?
> > 
> > This one failed in my testing.  Please see the end for the fixed
> > version, with on small but important change.
> 
> It worked, thanks!
> 
> Tested-by: Fengguang Wu 

Very good!  (And please ignore my resend of the same patch.)

I will queue this.

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: rcu_dyntick and suspicious RCU usage

2012-07-12 Thread Paul E. McKenney
On Thu, Jul 12, 2012 at 10:11:30PM +0800, Fengguang Wu wrote:
> On Wed, Jul 11, 2012 at 08:55:59PM +0800, Fengguang Wu wrote:
> > On Wed, Jul 11, 2012 at 08:36:16PM +0800, Fengguang Wu wrote:
> > > On Wed, Jul 11, 2012 at 12:49:24AM -0700, Paul E. McKenney wrote:
> > > > On Tue, Jul 10, 2012 at 12:47:00PM +0800, Fengguang Wu wrote:
> > > > > Hi Paul,
> > > > > 
> > > > > Fortunately this bug is bisectable and the first bad commit is:
> > > > > 
> > > > > commit 9b2e4f1880b789be1f24f9684f7a54b90310b5c0
> > > > > Author: Paul E. McKenney 
> > > > > Date:   Fri Sep 30 12:10:22 2011 -0700
> > > > > 
> > > > > rcu: Track idleness independent of idle tasks
> > > > 
> > > > OK, there is a problem in TINY_RCU's handling of dyntick-idle: it
> > > > traces while in idle.  The confusion on my part was that in TREE_RCU,
> > > > the nesting and dyntick-idle indication are different, while in
> > > > TINY_RCU they are one and the same.
> > > > 
> > > > Does the following patch help?
> > > 
> > > Not exactly, but the error message is now changed to:
> > 
> > Then I think it may help to test the linux-next tree that contains
> > latest RCU fixes. And it somehow helped. There are no warnings in
> > the linux-next tree 1 or 2 days ago; there is a maybe unrelated warning
> > in today's linux-next tree.
> > 
> > [0.013241] Performance Events: 
> > [0.014222] [ cut here ]
> > [0.014975] WARNING: at /c/wfg/linux/kernel/workqueue.c:1217 
> > worker_enter_idle+0x2fa/0x37c()
> 
> That's bisected to one of Tejun's workqueue commit..
> 
> > Both trees freeze after this point:
> > 
> > modprobe: FATAL: Could not load 
> > /lib/modules/3.5.0-rc6-next-20120710+/modules.dep: No such file or directory
> > ^M
> > modprobe: FATAL: Could not load 
> > /lib/modules/3.5.0-rc6-next-20120710+/modules.dep: No such file or directory
> > ^M
> 
> And that's bisected to Al's vfs work..
> 
> > So I'd better to base the test on one of your RCU branches..
> > Which branch would your recommend?
> 
> And finally I tried the patch on top of rcu/next, and get the attached
> dmesg and warnings:
> 
> [1.776415] Testing tracer preemptirqsoff: PASSED
> [1.779383] Testing tracer wakeup: [2.091320] ftrace-test (15) used 
> greatest stack depth: 7284 bytes left
> PASSED
> [2.094844] Testing tracer wakeup_rt: [2.179052] [ cut 
> here ]
> [2.181087] WARNING: at /c/wfg/linux/kernel/rcutiny.c:92 
> rcu_idle_enter+0xf0/0x244() 
> [2.183023] Hardware name: Bochs
> 
> [2.183023] Modules linked in:
> [2.183023]
> [2.183023] Pid: 0, comm: swapper Not tainted 3.5.0-rc5+ #59
> [2.183023] Call Trace:
> [2.183023]  [] warn_slowpath_common+0xfc/0x13b
> [2.183023]  [] ? rcu_idle_enter+0xf0/0x244
> [2.183023]  [] ? rcu_idle_enter+0xf0/0x244
> [2.183023]  [] warn_slowpath_null+0x3e/0x4e
> [2.183023]  [] rcu_idle_enter+0xf0/0x244   
> [2.183023]  [] cpu_idle+0x159/0x220 
> [2.183023]  [] rest_init+0x2db/0x2ef
> [2.183023]  [] ? reciprocal_value+0x64/0x64
> [2.183023]  [] start_kernel+0x8eb/0x8ff
> [2.183023]  [] ? repair_env_string+0xd3/0xd3
> [2.183023]  [] i386_start_kernel+0xf8/0x10b 
> [2.183023] ---[ end trace 44593438a59a9533 ]---  
> [2.195679] [ cut here ]
> [2.196428] WARNING: at /c/wfg/linux/kernel/rcutiny.c:113 
> rcu_irq_exit+0x100/0x217()
> [2.197633] Hardware name: Bochs
> [2.198183] Modules linked in:
> [2.198695] Pid: 0, comm: swapper Tainted: GW3.5.0-rc5+ #59
> [2.199671] Call Trace:
> [2.199671]  [] warn_slowpath_common+0xfc/0x13b
> [2.199671]  [] ? rcu_irq_exit+0x100/0x217 
> [2.199671]  [] ? rcu_irq_exit+0x100/0x217
> [2.199671]  [] warn_slowpath_null+0x3e/0x4e
> [2.199671]  [] rcu_irq_exit+0x100/0x217
> [2.199671]  [] irq_exit+0x117/0x14d
> [2.199671]  [] do_IRQ+0xe5/0x106
> [2.199671]  [] ? trace_hardirqs_off_caller+0x97/0x167
> [2.199671]  [] common_interrupt+0x35/0x3c
> [2.199671]  [] ? trace_hardirqs_on_caller+0x1b8/0x38c
> [2.199671]  [] ? native_safe_halt+0x13/0x23 
>  
> 
> The same rcu/next HEAD w/o patch, the warning is:
> 
> [4.456093] Testing event rcu_dyntick: [4.458251] 
> [4.458941] ===
> [4.459352] [ INFO: suspicious RCU usage. ]
> [4.459352] 3.5.0-rc5+ #60 Not tainted
> [4.459352] ---
> [4.459352] /c/wfg/linux/include/trace/events/rcu.h:275 suspicious 
> rcu_dereference_check() usage!
> [4.459352] 
> [4.459352] other info that might help us debug this:
> [4.459352] 
> [4.459352] 
> [4.459352] RCU used illegally from idle CPU!
> [4.459352] rcu_scheduler_active = 1, debug_locks = 0
> [4.459352] RCU used illegally from extended quiescent state!
> [4.459352] no locks held by swapper/0.
> 

Re: rcu_dyntick and suspicious RCU usage

2012-07-12 Thread Fengguang Wu
On Thu, Jul 12, 2012 at 09:43:08AM -0700, Paul E. McKenney wrote:
> On Wed, Jul 11, 2012 at 12:49:24AM -0700, Paul E. McKenney wrote:
> > On Tue, Jul 10, 2012 at 12:47:00PM +0800, Fengguang Wu wrote:
> > > Hi Paul,
> > > 
> > > Fortunately this bug is bisectable and the first bad commit is:
> > > 
> > > commit 9b2e4f1880b789be1f24f9684f7a54b90310b5c0
> > > Author: Paul E. McKenney 
> > > Date:   Fri Sep 30 12:10:22 2011 -0700
> > > 
> > > rcu: Track idleness independent of idle tasks
> > 
> > OK, there is a problem in TINY_RCU's handling of dyntick-idle: it
> > traces while in idle.  The confusion on my part was that in TREE_RCU,
> > the nesting and dyntick-idle indication are different, while in
> > TINY_RCU they are one and the same.
> > 
> > Does the following patch help?
> 
> This one failed in my testing.  Please see the end for the fixed
> version, with on small but important change.

It worked, thanks!

Tested-by: Fengguang Wu 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: rcu_dyntick and suspicious RCU usage

2012-07-12 Thread Paul E. McKenney
On Wed, Jul 11, 2012 at 12:49:24AM -0700, Paul E. McKenney wrote:
> On Tue, Jul 10, 2012 at 12:47:00PM +0800, Fengguang Wu wrote:
> > Hi Paul,
> > 
> > Fortunately this bug is bisectable and the first bad commit is:
> > 
> > commit 9b2e4f1880b789be1f24f9684f7a54b90310b5c0
> > Author: Paul E. McKenney 
> > Date:   Fri Sep 30 12:10:22 2011 -0700
> > 
> > rcu: Track idleness independent of idle tasks
> 
> OK, there is a problem in TINY_RCU's handling of dyntick-idle: it
> traces while in idle.  The confusion on my part was that in TREE_RCU,
> the nesting and dyntick-idle indication are different, while in
> TINY_RCU they are one and the same.
> 
> Does the following patch help?

This one failed in my testing.  Please see the end for the fixed
version, with on small but important change.

Thanx, Paul

> 
> 
> rcu: Pull TINY_RCU dyntick-idle tracing into non-idle region
> 
> Because TINY_RCU's idle detection keys directly off of the nesting
> level, rather than from a separate variable as in TREE_RCU, the
> TINY_RCU dyntick-idle tracing on transition to idle must happen
> before the change to the nesting level.  This commit therefore makes
> this change by passing the desired new value (rather than the old value)
> of the nesting level in to rcu_idle_enter_common().
> 
> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Paul E. McKenney 
> 
> diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
> index 547b1fe..ca3741d 100644
> --- a/kernel/rcutiny.c
> +++ b/kernel/rcutiny.c
> @@ -56,24 +56,26 @@ static void __call_rcu(struct rcu_head *head,
>  static long long rcu_dynticks_nesting = DYNTICK_TASK_EXIT_IDLE;
>  
>  /* Common code for rcu_idle_enter() and rcu_irq_exit(), see 
> kernel/rcutree.c. */
> -static void rcu_idle_enter_common(long long oldval)
> +static void rcu_idle_enter_common(long long newval)
>  {
>   if (rcu_dynticks_nesting) {
>   RCU_TRACE(trace_rcu_dyntick("--=",
> - oldval, rcu_dynticks_nesting));
> + rcu_dynticks_nesting, newval));
>   return;
>   }
> - RCU_TRACE(trace_rcu_dyntick("Start", oldval, rcu_dynticks_nesting));
> + RCU_TRACE(trace_rcu_dyntick("Start", rcu_dynticks_nesting, newval));
>   if (!is_idle_task(current)) {
>   struct task_struct *idle = idle_task(smp_processor_id());
>  
>   RCU_TRACE(trace_rcu_dyntick("Error on entry: not idle task",
> - oldval, rcu_dynticks_nesting));
> + rcu_dynticks_nesting, newval));
>   ftrace_dump(DUMP_ALL);
>   WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
> current->pid, current->comm,
> idle->pid, idle->comm); /* must be idle task! */
>   }
> + barrier();
> + rcu_dynticks_nesting = newval;
>   rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
>  }
>  
> @@ -84,17 +86,16 @@ static void rcu_idle_enter_common(long long oldval)
>  void rcu_idle_enter(void)
>  {
>   unsigned long flags;
> - long long oldval;
> + long long newval;
>  
>   local_irq_save(flags);
> - oldval = rcu_dynticks_nesting;
>   WARN_ON_ONCE((rcu_dynticks_nesting & DYNTICK_TASK_NEST_MASK) == 0);
>   if ((rcu_dynticks_nesting & DYNTICK_TASK_NEST_MASK) ==
>   DYNTICK_TASK_NEST_VALUE)
> - rcu_dynticks_nesting = 0;
> + newval = 0;
>   else
> - rcu_dynticks_nesting  -= DYNTICK_TASK_NEST_VALUE;
> - rcu_idle_enter_common(oldval);
> + newval = rcu_dynticks_nesting - DYNTICK_TASK_NEST_VALUE;
> + rcu_idle_enter_common(newval);
>   local_irq_restore(flags);
>  }
>  EXPORT_SYMBOL_GPL(rcu_idle_enter);
> @@ -105,13 +106,12 @@ EXPORT_SYMBOL_GPL(rcu_idle_enter);
>  void rcu_irq_exit(void)
>  {
>   unsigned long flags;
> - long long oldval;
> + long long newval;
>  
>   local_irq_save(flags);
> - oldval = rcu_dynticks_nesting;
> - rcu_dynticks_nesting--;
> - WARN_ON_ONCE(rcu_dynticks_nesting < 0);
> - rcu_idle_enter_common(oldval);
> + newval = rcu_dynticks_nesting - 1;
> + WARN_ON_ONCE(newval < 0);
> + rcu_idle_enter_common(newval);
>   local_irq_restore(flags);
>  }



rcu: Pull TINY_RCU dyntick-idle tracing into non-idle region

Because TINY_RCU's idle detection keys directly off of the nesting
level, rather than from a separate variable as in TREE_RCU, the
TINY_RCU dyntick-idle tracing on transition to idle must happen
before the change to the nesting level.  This commit therefore makes
this change by passing the desired new value (rather than the old value)
of the nesting level in to 

Re: rcu_dyntick and suspicious RCU usage

2012-07-12 Thread Fengguang Wu
On Wed, Jul 11, 2012 at 08:55:59PM +0800, Fengguang Wu wrote:
> On Wed, Jul 11, 2012 at 08:36:16PM +0800, Fengguang Wu wrote:
> > On Wed, Jul 11, 2012 at 12:49:24AM -0700, Paul E. McKenney wrote:
> > > On Tue, Jul 10, 2012 at 12:47:00PM +0800, Fengguang Wu wrote:
> > > > Hi Paul,
> > > > 
> > > > Fortunately this bug is bisectable and the first bad commit is:
> > > > 
> > > > commit 9b2e4f1880b789be1f24f9684f7a54b90310b5c0
> > > > Author: Paul E. McKenney 
> > > > Date:   Fri Sep 30 12:10:22 2011 -0700
> > > > 
> > > > rcu: Track idleness independent of idle tasks
> > > 
> > > OK, there is a problem in TINY_RCU's handling of dyntick-idle: it
> > > traces while in idle.  The confusion on my part was that in TREE_RCU,
> > > the nesting and dyntick-idle indication are different, while in
> > > TINY_RCU they are one and the same.
> > > 
> > > Does the following patch help?
> > 
> > Not exactly, but the error message is now changed to:
> 
> Then I think it may help to test the linux-next tree that contains
> latest RCU fixes. And it somehow helped. There are no warnings in
> the linux-next tree 1 or 2 days ago; there is a maybe unrelated warning
> in today's linux-next tree.
> 
> [0.013241] Performance Events: 
> [0.014222] [ cut here ]
> [0.014975] WARNING: at /c/wfg/linux/kernel/workqueue.c:1217 
> worker_enter_idle+0x2fa/0x37c()

That's bisected to one of Tejun's workqueue commit..

> Both trees freeze after this point:
> 
> modprobe: FATAL: Could not load 
> /lib/modules/3.5.0-rc6-next-20120710+/modules.dep: No such file or directory
> ^M
> modprobe: FATAL: Could not load 
> /lib/modules/3.5.0-rc6-next-20120710+/modules.dep: No such file or directory
> ^M

And that's bisected to Al's vfs work..

> So I'd better to base the test on one of your RCU branches..
> Which branch would your recommend?

And finally I tried the patch on top of rcu/next, and get the attached
dmesg and warnings:

[1.776415] Testing tracer preemptirqsoff: PASSED
[1.779383] Testing tracer wakeup: [2.091320] ftrace-test (15) used 
greatest stack depth: 7284 bytes left
PASSED
[2.094844] Testing tracer wakeup_rt: [2.179052] [ cut here 
]
[2.181087] WARNING: at /c/wfg/linux/kernel/rcutiny.c:92 
rcu_idle_enter+0xf0/0x244() 
[2.183023] Hardware name: Bochs

[2.183023] Modules linked in:
[2.183023]
[2.183023] Pid: 0, comm: swapper Not tainted 3.5.0-rc5+ #59
[2.183023] Call Trace:
[2.183023]  [] warn_slowpath_common+0xfc/0x13b
[2.183023]  [] ? rcu_idle_enter+0xf0/0x244
[2.183023]  [] ? rcu_idle_enter+0xf0/0x244
[2.183023]  [] warn_slowpath_null+0x3e/0x4e
[2.183023]  [] rcu_idle_enter+0xf0/0x244   
[2.183023]  [] cpu_idle+0x159/0x220 
[2.183023]  [] rest_init+0x2db/0x2ef
[2.183023]  [] ? reciprocal_value+0x64/0x64
[2.183023]  [] start_kernel+0x8eb/0x8ff
[2.183023]  [] ? repair_env_string+0xd3/0xd3
[2.183023]  [] i386_start_kernel+0xf8/0x10b 
[2.183023] ---[ end trace 44593438a59a9533 ]---  
[2.195679] [ cut here ]
[2.196428] WARNING: at /c/wfg/linux/kernel/rcutiny.c:113 
rcu_irq_exit+0x100/0x217()
[2.197633] Hardware name: Bochs
[2.198183] Modules linked in:
[2.198695] Pid: 0, comm: swapper Tainted: GW3.5.0-rc5+ #59
[2.199671] Call Trace:
[2.199671]  [] warn_slowpath_common+0xfc/0x13b
[2.199671]  [] ? rcu_irq_exit+0x100/0x217 
[2.199671]  [] ? rcu_irq_exit+0x100/0x217
[2.199671]  [] warn_slowpath_null+0x3e/0x4e
[2.199671]  [] rcu_irq_exit+0x100/0x217
[2.199671]  [] irq_exit+0x117/0x14d
[2.199671]  [] do_IRQ+0xe5/0x106
[2.199671]  [] ? trace_hardirqs_off_caller+0x97/0x167
[2.199671]  [] common_interrupt+0x35/0x3c
[2.199671]  [] ? trace_hardirqs_on_caller+0x1b8/0x38c
[2.199671]  [] ? native_safe_halt+0x13/0x23   
   

The same rcu/next HEAD w/o patch, the warning is:

[4.456093] Testing event rcu_dyntick: [4.458251] 
[4.458941] ===
[4.459352] [ INFO: suspicious RCU usage. ]
[4.459352] 3.5.0-rc5+ #60 Not tainted
[4.459352] ---
[4.459352] /c/wfg/linux/include/trace/events/rcu.h:275 suspicious 
rcu_dereference_check() usage!
[4.459352] 
[4.459352] other info that might help us debug this:
[4.459352] 
[4.459352] 
[4.459352] RCU used illegally from idle CPU!
[4.459352] rcu_scheduler_active = 1, debug_locks = 0
[4.459352] RCU used illegally from extended quiescent state!
[4.459352] no locks held by swapper/0.
[4.459352] 
[4.459352] stack backtrace:
[4.459352] Pid: 0, comm: swapper Not tainted 3.5.0-rc5+ #60
[4.459352] Call Trace:

[4.459352] 
[4.459352]  [] lockdep_rcu_suspicious+0x1de/0x1f4
[4.459352]  [] trace_rcu_dyntick+0xd9/0x167
[4.459352]  [] 

Re: rcu_dyntick and suspicious RCU usage

2012-07-12 Thread Fengguang Wu
On Wed, Jul 11, 2012 at 08:55:59PM +0800, Fengguang Wu wrote:
 On Wed, Jul 11, 2012 at 08:36:16PM +0800, Fengguang Wu wrote:
  On Wed, Jul 11, 2012 at 12:49:24AM -0700, Paul E. McKenney wrote:
   On Tue, Jul 10, 2012 at 12:47:00PM +0800, Fengguang Wu wrote:
Hi Paul,

Fortunately this bug is bisectable and the first bad commit is:

commit 9b2e4f1880b789be1f24f9684f7a54b90310b5c0
Author: Paul E. McKenney paul.mcken...@linaro.org
Date:   Fri Sep 30 12:10:22 2011 -0700

rcu: Track idleness independent of idle tasks
   
   OK, there is a problem in TINY_RCU's handling of dyntick-idle: it
   traces while in idle.  The confusion on my part was that in TREE_RCU,
   the nesting and dyntick-idle indication are different, while in
   TINY_RCU they are one and the same.
   
   Does the following patch help?
  
  Not exactly, but the error message is now changed to:
 
 Then I think it may help to test the linux-next tree that contains
 latest RCU fixes. And it somehow helped. There are no warnings in
 the linux-next tree 1 or 2 days ago; there is a maybe unrelated warning
 in today's linux-next tree.
 
 [0.013241] Performance Events: 
 [0.014222] [ cut here ]
 [0.014975] WARNING: at /c/wfg/linux/kernel/workqueue.c:1217 
 worker_enter_idle+0x2fa/0x37c()

That's bisected to one of Tejun's workqueue commit..

 Both trees freeze after this point:
 
 modprobe: FATAL: Could not load 
 /lib/modules/3.5.0-rc6-next-20120710+/modules.dep: No such file or directory
 ^M
 modprobe: FATAL: Could not load 
 /lib/modules/3.5.0-rc6-next-20120710+/modules.dep: No such file or directory
 ^M

And that's bisected to Al's vfs work..

 So I'd better to base the test on one of your RCU branches..
 Which branch would your recommend?

And finally I tried the patch on top of rcu/next, and get the attached
dmesg and warnings:

[1.776415] Testing tracer preemptirqsoff: PASSED
[1.779383] Testing tracer wakeup: [2.091320] ftrace-test (15) used 
greatest stack depth: 7284 bytes left
PASSED
[2.094844] Testing tracer wakeup_rt: [2.179052] [ cut here 
]
[2.181087] WARNING: at /c/wfg/linux/kernel/rcutiny.c:92 
rcu_idle_enter+0xf0/0x244() 
[2.183023] Hardware name: Bochs

[2.183023] Modules linked in:
[2.183023]
[2.183023] Pid: 0, comm: swapper Not tainted 3.5.0-rc5+ #59
[2.183023] Call Trace:
[2.183023]  [c103704c] warn_slowpath_common+0xfc/0x13b
[2.183023]  [c1105d31] ? rcu_idle_enter+0xf0/0x244
[2.183023]  [c1105d31] ? rcu_idle_enter+0xf0/0x244
[2.183023]  [c10370c9] warn_slowpath_null+0x3e/0x4e
[2.183023]  [c1105d31] rcu_idle_enter+0xf0/0x244   
[2.183023]  [c100fbdb] cpu_idle+0x159/0x220 
[2.183023]  [c14165a3] rest_init+0x2db/0x2ef
[2.183023]  [c14162c8] ? reciprocal_value+0x64/0x64
[2.183023]  [c171d456] start_kernel+0x8eb/0x8ff
[2.183023]  [c171c62f] ? repair_env_string+0xd3/0xd3
[2.183023]  [c171c342] i386_start_kernel+0xf8/0x10b 
[2.183023] ---[ end trace 44593438a59a9533 ]---  
[2.195679] [ cut here ]
[2.196428] WARNING: at /c/wfg/linux/kernel/rcutiny.c:113 
rcu_irq_exit+0x100/0x217()
[2.197633] Hardware name: Bochs
[2.198183] Modules linked in:
[2.198695] Pid: 0, comm: swapper Tainted: GW3.5.0-rc5+ #59
[2.199671] Call Trace:
[2.199671]  [c103704c] warn_slowpath_common+0xfc/0x13b
[2.199671]  [c1105b2a] ? rcu_irq_exit+0x100/0x217 
[2.199671]  [c1105b2a] ? rcu_irq_exit+0x100/0x217
[2.199671]  [c10370c9] warn_slowpath_null+0x3e/0x4e
[2.199671]  [c1105b2a] rcu_irq_exit+0x100/0x217
[2.199671]  [c10469ae] irq_exit+0x117/0x14d
[2.199671]  [c1005d31] do_IRQ+0xe5/0x106
[2.199671]  [c10ba77e] ? trace_hardirqs_off_caller+0x97/0x167
[2.199671]  [c1437975] common_interrupt+0x35/0x3c
[2.199671]  [c10c00e0] ? trace_hardirqs_on_caller+0x1b8/0x38c
[2.199671]  [c1025c7b] ? native_safe_halt+0x13/0x23   
   

The same rcu/next HEAD w/o patch, the warning is:

[4.456093] Testing event rcu_dyntick: [4.458251] 
[4.458941] ===
[4.459352] [ INFO: suspicious RCU usage. ]
[4.459352] 3.5.0-rc5+ #60 Not tainted
[4.459352] ---
[4.459352] /c/wfg/linux/include/trace/events/rcu.h:275 suspicious 
rcu_dereference_check() usage!
[4.459352] 
[4.459352] other info that might help us debug this:
[4.459352] 
[4.459352] 
[4.459352] RCU used illegally from idle CPU!
[4.459352] rcu_scheduler_active = 1, debug_locks = 0
[4.459352] RCU used illegally from extended quiescent state!
[4.459352] no locks held by swapper/0.
[4.459352] 
[4.459352] stack backtrace:
[4.459352] Pid: 0, comm: swapper Not tainted 3.5.0-rc5+ #60
[4.459352] Call Trace:

[4.459352] 
[4.459352]  

Re: rcu_dyntick and suspicious RCU usage

2012-07-12 Thread Paul E. McKenney
On Wed, Jul 11, 2012 at 12:49:24AM -0700, Paul E. McKenney wrote:
 On Tue, Jul 10, 2012 at 12:47:00PM +0800, Fengguang Wu wrote:
  Hi Paul,
  
  Fortunately this bug is bisectable and the first bad commit is:
  
  commit 9b2e4f1880b789be1f24f9684f7a54b90310b5c0
  Author: Paul E. McKenney paul.mcken...@linaro.org
  Date:   Fri Sep 30 12:10:22 2011 -0700
  
  rcu: Track idleness independent of idle tasks
 
 OK, there is a problem in TINY_RCU's handling of dyntick-idle: it
 traces while in idle.  The confusion on my part was that in TREE_RCU,
 the nesting and dyntick-idle indication are different, while in
 TINY_RCU they are one and the same.
 
 Does the following patch help?

This one failed in my testing.  Please see the end for the fixed
version, with on small but important change.

Thanx, Paul

 
 
 rcu: Pull TINY_RCU dyntick-idle tracing into non-idle region
 
 Because TINY_RCU's idle detection keys directly off of the nesting
 level, rather than from a separate variable as in TREE_RCU, the
 TINY_RCU dyntick-idle tracing on transition to idle must happen
 before the change to the nesting level.  This commit therefore makes
 this change by passing the desired new value (rather than the old value)
 of the nesting level in to rcu_idle_enter_common().
 
 Signed-off-by: Paul E. McKenney paul.mcken...@linaro.org
 Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com
 
 diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
 index 547b1fe..ca3741d 100644
 --- a/kernel/rcutiny.c
 +++ b/kernel/rcutiny.c
 @@ -56,24 +56,26 @@ static void __call_rcu(struct rcu_head *head,
  static long long rcu_dynticks_nesting = DYNTICK_TASK_EXIT_IDLE;
  
  /* Common code for rcu_idle_enter() and rcu_irq_exit(), see 
 kernel/rcutree.c. */
 -static void rcu_idle_enter_common(long long oldval)
 +static void rcu_idle_enter_common(long long newval)
  {
   if (rcu_dynticks_nesting) {
   RCU_TRACE(trace_rcu_dyntick(--=,
 - oldval, rcu_dynticks_nesting));
 + rcu_dynticks_nesting, newval));
   return;
   }
 - RCU_TRACE(trace_rcu_dyntick(Start, oldval, rcu_dynticks_nesting));
 + RCU_TRACE(trace_rcu_dyntick(Start, rcu_dynticks_nesting, newval));
   if (!is_idle_task(current)) {
   struct task_struct *idle = idle_task(smp_processor_id());
  
   RCU_TRACE(trace_rcu_dyntick(Error on entry: not idle task,
 - oldval, rcu_dynticks_nesting));
 + rcu_dynticks_nesting, newval));
   ftrace_dump(DUMP_ALL);
   WARN_ONCE(1, Current pid: %d comm: %s / Idle pid: %d comm: %s,
 current-pid, current-comm,
 idle-pid, idle-comm); /* must be idle task! */
   }
 + barrier();
 + rcu_dynticks_nesting = newval;
   rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
  }
  
 @@ -84,17 +86,16 @@ static void rcu_idle_enter_common(long long oldval)
  void rcu_idle_enter(void)
  {
   unsigned long flags;
 - long long oldval;
 + long long newval;
  
   local_irq_save(flags);
 - oldval = rcu_dynticks_nesting;
   WARN_ON_ONCE((rcu_dynticks_nesting  DYNTICK_TASK_NEST_MASK) == 0);
   if ((rcu_dynticks_nesting  DYNTICK_TASK_NEST_MASK) ==
   DYNTICK_TASK_NEST_VALUE)
 - rcu_dynticks_nesting = 0;
 + newval = 0;
   else
 - rcu_dynticks_nesting  -= DYNTICK_TASK_NEST_VALUE;
 - rcu_idle_enter_common(oldval);
 + newval = rcu_dynticks_nesting - DYNTICK_TASK_NEST_VALUE;
 + rcu_idle_enter_common(newval);
   local_irq_restore(flags);
  }
  EXPORT_SYMBOL_GPL(rcu_idle_enter);
 @@ -105,13 +106,12 @@ EXPORT_SYMBOL_GPL(rcu_idle_enter);
  void rcu_irq_exit(void)
  {
   unsigned long flags;
 - long long oldval;
 + long long newval;
  
   local_irq_save(flags);
 - oldval = rcu_dynticks_nesting;
 - rcu_dynticks_nesting--;
 - WARN_ON_ONCE(rcu_dynticks_nesting  0);
 - rcu_idle_enter_common(oldval);
 + newval = rcu_dynticks_nesting - 1;
 + WARN_ON_ONCE(newval  0);
 + rcu_idle_enter_common(newval);
   local_irq_restore(flags);
  }



rcu: Pull TINY_RCU dyntick-idle tracing into non-idle region

Because TINY_RCU's idle detection keys directly off of the nesting
level, rather than from a separate variable as in TREE_RCU, the
TINY_RCU dyntick-idle tracing on transition to idle must happen
before the change to the nesting level.  This commit therefore makes
this change by passing the desired new value (rather than the old value)
of the nesting level in to rcu_idle_enter_common().

Signed-off-by: Paul E. McKenney 

Re: rcu_dyntick and suspicious RCU usage

2012-07-12 Thread Fengguang Wu
On Thu, Jul 12, 2012 at 09:43:08AM -0700, Paul E. McKenney wrote:
 On Wed, Jul 11, 2012 at 12:49:24AM -0700, Paul E. McKenney wrote:
  On Tue, Jul 10, 2012 at 12:47:00PM +0800, Fengguang Wu wrote:
   Hi Paul,
   
   Fortunately this bug is bisectable and the first bad commit is:
   
   commit 9b2e4f1880b789be1f24f9684f7a54b90310b5c0
   Author: Paul E. McKenney paul.mcken...@linaro.org
   Date:   Fri Sep 30 12:10:22 2011 -0700
   
   rcu: Track idleness independent of idle tasks
  
  OK, there is a problem in TINY_RCU's handling of dyntick-idle: it
  traces while in idle.  The confusion on my part was that in TREE_RCU,
  the nesting and dyntick-idle indication are different, while in
  TINY_RCU they are one and the same.
  
  Does the following patch help?
 
 This one failed in my testing.  Please see the end for the fixed
 version, with on small but important change.

It worked, thanks!

Tested-by: Fengguang Wu w...@linux.intel.com

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: rcu_dyntick and suspicious RCU usage

2012-07-12 Thread Paul E. McKenney
On Thu, Jul 12, 2012 at 10:11:30PM +0800, Fengguang Wu wrote:
 On Wed, Jul 11, 2012 at 08:55:59PM +0800, Fengguang Wu wrote:
  On Wed, Jul 11, 2012 at 08:36:16PM +0800, Fengguang Wu wrote:
   On Wed, Jul 11, 2012 at 12:49:24AM -0700, Paul E. McKenney wrote:
On Tue, Jul 10, 2012 at 12:47:00PM +0800, Fengguang Wu wrote:
 Hi Paul,
 
 Fortunately this bug is bisectable and the first bad commit is:
 
 commit 9b2e4f1880b789be1f24f9684f7a54b90310b5c0
 Author: Paul E. McKenney paul.mcken...@linaro.org
 Date:   Fri Sep 30 12:10:22 2011 -0700
 
 rcu: Track idleness independent of idle tasks

OK, there is a problem in TINY_RCU's handling of dyntick-idle: it
traces while in idle.  The confusion on my part was that in TREE_RCU,
the nesting and dyntick-idle indication are different, while in
TINY_RCU they are one and the same.

Does the following patch help?
   
   Not exactly, but the error message is now changed to:
  
  Then I think it may help to test the linux-next tree that contains
  latest RCU fixes. And it somehow helped. There are no warnings in
  the linux-next tree 1 or 2 days ago; there is a maybe unrelated warning
  in today's linux-next tree.
  
  [0.013241] Performance Events: 
  [0.014222] [ cut here ]
  [0.014975] WARNING: at /c/wfg/linux/kernel/workqueue.c:1217 
  worker_enter_idle+0x2fa/0x37c()
 
 That's bisected to one of Tejun's workqueue commit..
 
  Both trees freeze after this point:
  
  modprobe: FATAL: Could not load 
  /lib/modules/3.5.0-rc6-next-20120710+/modules.dep: No such file or directory
  ^M
  modprobe: FATAL: Could not load 
  /lib/modules/3.5.0-rc6-next-20120710+/modules.dep: No such file or directory
  ^M
 
 And that's bisected to Al's vfs work..
 
  So I'd better to base the test on one of your RCU branches..
  Which branch would your recommend?
 
 And finally I tried the patch on top of rcu/next, and get the attached
 dmesg and warnings:
 
 [1.776415] Testing tracer preemptirqsoff: PASSED
 [1.779383] Testing tracer wakeup: [2.091320] ftrace-test (15) used 
 greatest stack depth: 7284 bytes left
 PASSED
 [2.094844] Testing tracer wakeup_rt: [2.179052] [ cut 
 here ]
 [2.181087] WARNING: at /c/wfg/linux/kernel/rcutiny.c:92 
 rcu_idle_enter+0xf0/0x244() 
 [2.183023] Hardware name: Bochs
 
 [2.183023] Modules linked in:
 [2.183023]
 [2.183023] Pid: 0, comm: swapper Not tainted 3.5.0-rc5+ #59
 [2.183023] Call Trace:
 [2.183023]  [c103704c] warn_slowpath_common+0xfc/0x13b
 [2.183023]  [c1105d31] ? rcu_idle_enter+0xf0/0x244
 [2.183023]  [c1105d31] ? rcu_idle_enter+0xf0/0x244
 [2.183023]  [c10370c9] warn_slowpath_null+0x3e/0x4e
 [2.183023]  [c1105d31] rcu_idle_enter+0xf0/0x244   
 [2.183023]  [c100fbdb] cpu_idle+0x159/0x220 
 [2.183023]  [c14165a3] rest_init+0x2db/0x2ef
 [2.183023]  [c14162c8] ? reciprocal_value+0x64/0x64
 [2.183023]  [c171d456] start_kernel+0x8eb/0x8ff
 [2.183023]  [c171c62f] ? repair_env_string+0xd3/0xd3
 [2.183023]  [c171c342] i386_start_kernel+0xf8/0x10b 
 [2.183023] ---[ end trace 44593438a59a9533 ]---  
 [2.195679] [ cut here ]
 [2.196428] WARNING: at /c/wfg/linux/kernel/rcutiny.c:113 
 rcu_irq_exit+0x100/0x217()
 [2.197633] Hardware name: Bochs
 [2.198183] Modules linked in:
 [2.198695] Pid: 0, comm: swapper Tainted: GW3.5.0-rc5+ #59
 [2.199671] Call Trace:
 [2.199671]  [c103704c] warn_slowpath_common+0xfc/0x13b
 [2.199671]  [c1105b2a] ? rcu_irq_exit+0x100/0x217 
 [2.199671]  [c1105b2a] ? rcu_irq_exit+0x100/0x217
 [2.199671]  [c10370c9] warn_slowpath_null+0x3e/0x4e
 [2.199671]  [c1105b2a] rcu_irq_exit+0x100/0x217
 [2.199671]  [c10469ae] irq_exit+0x117/0x14d
 [2.199671]  [c1005d31] do_IRQ+0xe5/0x106
 [2.199671]  [c10ba77e] ? trace_hardirqs_off_caller+0x97/0x167
 [2.199671]  [c1437975] common_interrupt+0x35/0x3c
 [2.199671]  [c10c00e0] ? trace_hardirqs_on_caller+0x1b8/0x38c
 [2.199671]  [c1025c7b] ? native_safe_halt+0x13/0x23 
  
 
 The same rcu/next HEAD w/o patch, the warning is:
 
 [4.456093] Testing event rcu_dyntick: [4.458251] 
 [4.458941] ===
 [4.459352] [ INFO: suspicious RCU usage. ]
 [4.459352] 3.5.0-rc5+ #60 Not tainted
 [4.459352] ---
 [4.459352] /c/wfg/linux/include/trace/events/rcu.h:275 suspicious 
 rcu_dereference_check() usage!
 [4.459352] 
 [4.459352] other info that might help us debug this:
 [4.459352] 
 [4.459352] 
 [4.459352] RCU used illegally from idle CPU!
 [4.459352] rcu_scheduler_active = 1, debug_locks = 0
 [4.459352] RCU used illegally from extended quiescent state!
 [4.459352] no locks held by swapper/0.
 [   

Re: rcu_dyntick and suspicious RCU usage

2012-07-12 Thread Paul E. McKenney
On Fri, Jul 13, 2012 at 12:57:38AM +0800, Fengguang Wu wrote:
 On Thu, Jul 12, 2012 at 09:43:08AM -0700, Paul E. McKenney wrote:
  On Wed, Jul 11, 2012 at 12:49:24AM -0700, Paul E. McKenney wrote:
   On Tue, Jul 10, 2012 at 12:47:00PM +0800, Fengguang Wu wrote:
Hi Paul,

Fortunately this bug is bisectable and the first bad commit is:

commit 9b2e4f1880b789be1f24f9684f7a54b90310b5c0
Author: Paul E. McKenney paul.mcken...@linaro.org
Date:   Fri Sep 30 12:10:22 2011 -0700

rcu: Track idleness independent of idle tasks
   
   OK, there is a problem in TINY_RCU's handling of dyntick-idle: it
   traces while in idle.  The confusion on my part was that in TREE_RCU,
   the nesting and dyntick-idle indication are different, while in
   TINY_RCU they are one and the same.
   
   Does the following patch help?
  
  This one failed in my testing.  Please see the end for the fixed
  version, with on small but important change.
 
 It worked, thanks!
 
 Tested-by: Fengguang Wu w...@linux.intel.com

Very good!  (And please ignore my resend of the same patch.)

I will queue this.

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: rcu_dyntick and suspicious RCU usage

2012-07-11 Thread Fengguang Wu
On Wed, Jul 11, 2012 at 08:36:16PM +0800, Fengguang Wu wrote:
> On Wed, Jul 11, 2012 at 12:49:24AM -0700, Paul E. McKenney wrote:
> > On Tue, Jul 10, 2012 at 12:47:00PM +0800, Fengguang Wu wrote:
> > > Hi Paul,
> > > 
> > > Fortunately this bug is bisectable and the first bad commit is:
> > > 
> > > commit 9b2e4f1880b789be1f24f9684f7a54b90310b5c0
> > > Author: Paul E. McKenney 
> > > Date:   Fri Sep 30 12:10:22 2011 -0700
> > > 
> > > rcu: Track idleness independent of idle tasks
> > 
> > OK, there is a problem in TINY_RCU's handling of dyntick-idle: it
> > traces while in idle.  The confusion on my part was that in TREE_RCU,
> > the nesting and dyntick-idle indication are different, while in
> > TINY_RCU they are one and the same.
> > 
> > Does the following patch help?
> 
> Not exactly, but the error message is now changed to:

Then I think it may help to test the linux-next tree that contains
latest RCU fixes. And it somehow helped. There are no warnings in
the linux-next tree 1 or 2 days ago; there is a maybe unrelated warning
in today's linux-next tree.

[0.013241] Performance Events: 
[0.014222] [ cut here ]
[0.014975] WARNING: at /c/wfg/linux/kernel/workqueue.c:1217 
worker_enter_idle+0x2fa/0x37c()
[0.016000] Hardware name: Bochs
[0.016000] Modules linked in:
[0.016000] Pid: 1, comm: swapper Not tainted 3.5.0-rc6-next-20120710+ #58
[0.016000] Call Trace:
[0.016000]  [] warn_slowpath_common+0xfc/0x13b
[0.016000]  [] ? worker_enter_idle+0x2fa/0x37c
[0.016000]  [] ? worker_enter_idle+0x2fa/0x37c
[0.016000]  [] warn_slowpath_null+0x3e/0x4e
[0.016000]  [] worker_enter_idle+0x2fa/0x37c
[0.016000]  [] ? _raw_spin_lock_irq+0xc3/0xe5
[0.016000]  [] start_worker+0x2a/0x51
[0.016000]  [] init_workqueues+0x29f/0x656
[0.016000]  [] ? usermodehelper_init+0xac/0xac
[0.016000]  [] do_one_initcall+0xf7/0x272
[0.016000]  [] ? trace_preempt_on+0x1c/0x7c
[0.016000]  [] kernel_init+0x1a4/0x4da
[0.016000]  [] ? start_kernel+0x8ff/0x8ff
[0.016000]  [] kernel_thread_helper+0x6/0x10
[0.016000] ---[ end trace 50864a5de9c2f446 ]---
[0.020353] Testing tracer nop: PASSED

Both trees freeze after this point:

modprobe: FATAL: Could not load 
/lib/modules/3.5.0-rc6-next-20120710+/modules.dep: No such file or directory
^M
modprobe: FATAL: Could not load 
/lib/modules/3.5.0-rc6-next-20120710+/modules.dep: No such file or directory
^M
[   31.223314] CPA self-test:
[   31.224750]  4k 3069 large 61 gb 0 x 3130[c000-cfffc000] miss 0
[   31.239798]  4k 65533 large 0 gb 0 x 65533[c000-cfffc000] miss 0
[   31.249441]  4k 65533 large 0 gb 0 x 65533[c000-cfffc000] miss 0
[   31.250402] ok.

So I'd better to base the test on one of your RCU branches..
Which branch would your recommend?

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: rcu_dyntick and suspicious RCU usage

2012-07-11 Thread Fengguang Wu
On Wed, Jul 11, 2012 at 12:49:24AM -0700, Paul E. McKenney wrote:
> On Tue, Jul 10, 2012 at 12:47:00PM +0800, Fengguang Wu wrote:
> > Hi Paul,
> > 
> > Fortunately this bug is bisectable and the first bad commit is:
> > 
> > commit 9b2e4f1880b789be1f24f9684f7a54b90310b5c0
> > Author: Paul E. McKenney 
> > Date:   Fri Sep 30 12:10:22 2011 -0700
> > 
> > rcu: Track idleness independent of idle tasks
> 
> OK, there is a problem in TINY_RCU's handling of dyntick-idle: it
> traces while in idle.  The confusion on my part was that in TREE_RCU,
> the nesting and dyntick-idle indication are different, while in
> TINY_RCU they are one and the same.
> 
> Does the following patch help?

Not exactly, but the error message is now changed to:

[2.320411] Testing tracer wakeup: [2.632565] ftrace-test (15) used 
greatest stack depth: 7284 bytes left
PASSED
[2.634231] Testing tracer wakeup_rt: [2.716312] [ cut here 
]
[2.717208] WARNING: at /c/wfg/linux/kernel/rcutiny.c:92 
rcu_idle_enter+0xf0/0x244()
[2.718549] Hardware name: Bochs

[2.719136] Modules linked in:
[2.719754] 
[2.720022] Pid: 0, comm: swapper Not tainted 3.5.0-rc6+ #55
[2.720290] Call Trace:
[2.720290]  [] warn_slowpath_common+0xfc/0x13b
[2.720290]  [] ? rcu_idle_enter+0xf0/0x244
[2.720290]  [] ? rcu_idle_enter+0xf0/0x244
[2.720290]  [] warn_slowpath_null+0x3e/0x4e
[2.720290]  [] rcu_idle_enter+0xf0/0x244
[2.720290]  [] cpu_idle+0x159/0x220
[2.720290]  [] rest_init+0x2db/0x2ef
[2.720290]  [] ? reciprocal_value+0x64/0x64
[2.720290]  [] start_kernel+0x8eb/0x8ff
[2.720290]  [] ? repair_env_string+0xd3/0xd3
[2.720290]  [] i386_start_kernel+0xf8/0x10b
[2.720290] ---[ end trace 44593438a59a9533 ]---
[2.732372] [ cut here ]
[2.733235] WARNING: at /c/wfg/linux/kernel/rcutiny.c:113 
rcu_irq_exit+0x100/0x217()
[2.734613] Hardware name: Bochs
[2.735204] Modules linked in:
[2.735817] Pid: 0, comm: swapper Tainted: GW3.5.0-rc6+ #55
[2.736366] Call Trace:
[2.736366]  [] warn_slowpath_common+0xfc/0x13b
[2.736366]  [] ? rcu_irq_exit+0x100/0x217
[2.736366]  [] ? rcu_irq_exit+0x100/0x217
[2.736366]  [] warn_slowpath_null+0x3e/0x4e
[2.736366]  [] rcu_irq_exit+0x100/0x217
[2.736366]  [] irq_exit+0x117/0x14d
[2.736366]  [] do_IRQ+0xe5/0x106
[2.736366]  [] ? trace_hardirqs_off_caller+0x97/0x167
[2.736366]  [] ? local_bh_enable+0x3c/0x3c
[2.736366]  [] common_interrupt+0x35/0x3c
[2.736366]  [] ? local_bh_enable+0x3c/0x3c
[2.736366]  [] ? do_wait+0x453/0x4f5
[2.736366]  [] ? do_wait+0x453/0x4f5
[2.736366]  [] ? do_timer+0xc1e/0xc77
[2.736366]  [] ? arch_local_irq_enable+0x13/0x27
[2.736366]  [] __do_softirq+0x9d/0x4fd
[2.736366]  [] ? local_bh_enable+0x3c/0x3c
[2.736366][] ? irq_exit+0x92/0x14d
[2.736366]  [] ? do_IRQ+0xe5/0x106
[2.736366]  [] ? trace_hardirqs_off_caller+0x97/0x167
[2.736366]  [] ? common_interrupt+0x35/0x3c
[2.736366]  [] ? do_timer+0xc1e/0xc77
[2.736366]  [] ? native_safe_halt+0x13/0x23
[2.736366]  [] ? default_idle+0xac/0x14a
[2.736366]  [] ? cpu_idle+0x16d/0x220
[2.736366]  [] ? rest_init+0x2db/0x2ef
[2.736366]  [] ? reciprocal_value+0x64/0x64
[2.736366]  [] ? start_kernel+0x8eb/0x8ff
[2.736366]  [] ? repair_env_string+0xd3/0xd3
[2.736366]  [] ? i386_start_kernel+0xf8/0x10b
[2.736366] ---[ end trace 44593438a59a9534 ]---
[2.765933] [ cut here ]
[2.766811] WARNING: at /c/wfg/linux/kernel/rcutiny.c:149 
rcu_idle_exit+0xfa/0x266()
[2.768173] Hardware name: Bochs
[2.768927] Modules linked in:
[2.769514] Pid: 0, comm: swapper Tainted: GW3.5.0-rc6+ #55
[2.769920] Call Trace:
[2.769920]  [] warn_slowpath_common+0xfc/0x13b
[2.769920]  [] ? rcu_idle_exit+0xfa/0x266
[2.769920]  [] ? rcu_idle_exit+0xfa/0x266
[2.769920]  [] warn_slowpath_null+0x3e/0x4e
[2.769920]  [] rcu_idle_exit+0xfa/0x266
[2.769920]  [] cpu_idle+0x180/0x220
[2.769920]  [] rest_init+0x2db/0x2ef
[2.769920]  [] ? reciprocal_value+0x64/0x64
[2.769920]  [] start_kernel+0x8eb/0x8ff
[2.769920]  [] ? repair_env_string+0xd3/0xd3
[2.769920]  [] i386_start_kernel+0xf8/0x10b
[2.769920] ---[ end trace 44593438a59a9535 ]---
[2.988523] PASSED

The test is carried out on top of 2.5-rc6. The kvm command line is:

kvm -cpu kvm64 -enable-kvm \
-kernel $vmlinuz \
-append "$APPEND BOOT_IMAGE=$BOOT_IMAGE" \
$initrd \
-m $mem \
-smp 2 \
-net nic,vlan=0,macaddr=$mac,model=e1000  \
-net user,vlan=0,hostfwd=tcp::$port-:22 \
-boot order=nc \
-no-reboot \
-watchdog i6300esb \
$(qemu_drives) \

Re: rcu_dyntick and suspicious RCU usage

2012-07-11 Thread Paul E. McKenney
On Tue, Jul 10, 2012 at 12:47:00PM +0800, Fengguang Wu wrote:
> Hi Paul,
> 
> Fortunately this bug is bisectable and the first bad commit is:
> 
> commit 9b2e4f1880b789be1f24f9684f7a54b90310b5c0
> Author: Paul E. McKenney 
> Date:   Fri Sep 30 12:10:22 2011 -0700
> 
> rcu: Track idleness independent of idle tasks

OK, there is a problem in TINY_RCU's handling of dyntick-idle: it
traces while in idle.  The confusion on my part was that in TREE_RCU,
the nesting and dyntick-idle indication are different, while in
TINY_RCU they are one and the same.

Does the following patch help?

Thanx, Paul



rcu: Pull TINY_RCU dyntick-idle tracing into non-idle region

Because TINY_RCU's idle detection keys directly off of the nesting
level, rather than from a separate variable as in TREE_RCU, the
TINY_RCU dyntick-idle tracing on transition to idle must happen
before the change to the nesting level.  This commit therefore makes
this change by passing the desired new value (rather than the old value)
of the nesting level in to rcu_idle_enter_common().

Signed-off-by: Paul E. McKenney 
Signed-off-by: Paul E. McKenney 

diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
index 547b1fe..ca3741d 100644
--- a/kernel/rcutiny.c
+++ b/kernel/rcutiny.c
@@ -56,24 +56,26 @@ static void __call_rcu(struct rcu_head *head,
 static long long rcu_dynticks_nesting = DYNTICK_TASK_EXIT_IDLE;
 
 /* Common code for rcu_idle_enter() and rcu_irq_exit(), see kernel/rcutree.c. 
*/
-static void rcu_idle_enter_common(long long oldval)
+static void rcu_idle_enter_common(long long newval)
 {
if (rcu_dynticks_nesting) {
RCU_TRACE(trace_rcu_dyntick("--=",
-   oldval, rcu_dynticks_nesting));
+   rcu_dynticks_nesting, newval));
return;
}
-   RCU_TRACE(trace_rcu_dyntick("Start", oldval, rcu_dynticks_nesting));
+   RCU_TRACE(trace_rcu_dyntick("Start", rcu_dynticks_nesting, newval));
if (!is_idle_task(current)) {
struct task_struct *idle = idle_task(smp_processor_id());
 
RCU_TRACE(trace_rcu_dyntick("Error on entry: not idle task",
-   oldval, rcu_dynticks_nesting));
+   rcu_dynticks_nesting, newval));
ftrace_dump(DUMP_ALL);
WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
  current->pid, current->comm,
  idle->pid, idle->comm); /* must be idle task! */
}
+   barrier();
+   rcu_dynticks_nesting = newval;
rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
 }
 
@@ -84,17 +86,16 @@ static void rcu_idle_enter_common(long long oldval)
 void rcu_idle_enter(void)
 {
unsigned long flags;
-   long long oldval;
+   long long newval;
 
local_irq_save(flags);
-   oldval = rcu_dynticks_nesting;
WARN_ON_ONCE((rcu_dynticks_nesting & DYNTICK_TASK_NEST_MASK) == 0);
if ((rcu_dynticks_nesting & DYNTICK_TASK_NEST_MASK) ==
DYNTICK_TASK_NEST_VALUE)
-   rcu_dynticks_nesting = 0;
+   newval = 0;
else
-   rcu_dynticks_nesting  -= DYNTICK_TASK_NEST_VALUE;
-   rcu_idle_enter_common(oldval);
+   newval = rcu_dynticks_nesting - DYNTICK_TASK_NEST_VALUE;
+   rcu_idle_enter_common(newval);
local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(rcu_idle_enter);
@@ -105,13 +106,12 @@ EXPORT_SYMBOL_GPL(rcu_idle_enter);
 void rcu_irq_exit(void)
 {
unsigned long flags;
-   long long oldval;
+   long long newval;
 
local_irq_save(flags);
-   oldval = rcu_dynticks_nesting;
-   rcu_dynticks_nesting--;
-   WARN_ON_ONCE(rcu_dynticks_nesting < 0);
-   rcu_idle_enter_common(oldval);
+   newval = rcu_dynticks_nesting - 1;
+   WARN_ON_ONCE(newval < 0);
+   rcu_idle_enter_common(newval);
local_irq_restore(flags);
 }
 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: rcu_dyntick and suspicious RCU usage

2012-07-11 Thread Paul E. McKenney
On Tue, Jul 10, 2012 at 12:47:00PM +0800, Fengguang Wu wrote:
 Hi Paul,
 
 Fortunately this bug is bisectable and the first bad commit is:
 
 commit 9b2e4f1880b789be1f24f9684f7a54b90310b5c0
 Author: Paul E. McKenney paul.mcken...@linaro.org
 Date:   Fri Sep 30 12:10:22 2011 -0700
 
 rcu: Track idleness independent of idle tasks

OK, there is a problem in TINY_RCU's handling of dyntick-idle: it
traces while in idle.  The confusion on my part was that in TREE_RCU,
the nesting and dyntick-idle indication are different, while in
TINY_RCU they are one and the same.

Does the following patch help?

Thanx, Paul



rcu: Pull TINY_RCU dyntick-idle tracing into non-idle region

Because TINY_RCU's idle detection keys directly off of the nesting
level, rather than from a separate variable as in TREE_RCU, the
TINY_RCU dyntick-idle tracing on transition to idle must happen
before the change to the nesting level.  This commit therefore makes
this change by passing the desired new value (rather than the old value)
of the nesting level in to rcu_idle_enter_common().

Signed-off-by: Paul E. McKenney paul.mcken...@linaro.org
Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com

diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
index 547b1fe..ca3741d 100644
--- a/kernel/rcutiny.c
+++ b/kernel/rcutiny.c
@@ -56,24 +56,26 @@ static void __call_rcu(struct rcu_head *head,
 static long long rcu_dynticks_nesting = DYNTICK_TASK_EXIT_IDLE;
 
 /* Common code for rcu_idle_enter() and rcu_irq_exit(), see kernel/rcutree.c. 
*/
-static void rcu_idle_enter_common(long long oldval)
+static void rcu_idle_enter_common(long long newval)
 {
if (rcu_dynticks_nesting) {
RCU_TRACE(trace_rcu_dyntick(--=,
-   oldval, rcu_dynticks_nesting));
+   rcu_dynticks_nesting, newval));
return;
}
-   RCU_TRACE(trace_rcu_dyntick(Start, oldval, rcu_dynticks_nesting));
+   RCU_TRACE(trace_rcu_dyntick(Start, rcu_dynticks_nesting, newval));
if (!is_idle_task(current)) {
struct task_struct *idle = idle_task(smp_processor_id());
 
RCU_TRACE(trace_rcu_dyntick(Error on entry: not idle task,
-   oldval, rcu_dynticks_nesting));
+   rcu_dynticks_nesting, newval));
ftrace_dump(DUMP_ALL);
WARN_ONCE(1, Current pid: %d comm: %s / Idle pid: %d comm: %s,
  current-pid, current-comm,
  idle-pid, idle-comm); /* must be idle task! */
}
+   barrier();
+   rcu_dynticks_nesting = newval;
rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
 }
 
@@ -84,17 +86,16 @@ static void rcu_idle_enter_common(long long oldval)
 void rcu_idle_enter(void)
 {
unsigned long flags;
-   long long oldval;
+   long long newval;
 
local_irq_save(flags);
-   oldval = rcu_dynticks_nesting;
WARN_ON_ONCE((rcu_dynticks_nesting  DYNTICK_TASK_NEST_MASK) == 0);
if ((rcu_dynticks_nesting  DYNTICK_TASK_NEST_MASK) ==
DYNTICK_TASK_NEST_VALUE)
-   rcu_dynticks_nesting = 0;
+   newval = 0;
else
-   rcu_dynticks_nesting  -= DYNTICK_TASK_NEST_VALUE;
-   rcu_idle_enter_common(oldval);
+   newval = rcu_dynticks_nesting - DYNTICK_TASK_NEST_VALUE;
+   rcu_idle_enter_common(newval);
local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(rcu_idle_enter);
@@ -105,13 +106,12 @@ EXPORT_SYMBOL_GPL(rcu_idle_enter);
 void rcu_irq_exit(void)
 {
unsigned long flags;
-   long long oldval;
+   long long newval;
 
local_irq_save(flags);
-   oldval = rcu_dynticks_nesting;
-   rcu_dynticks_nesting--;
-   WARN_ON_ONCE(rcu_dynticks_nesting  0);
-   rcu_idle_enter_common(oldval);
+   newval = rcu_dynticks_nesting - 1;
+   WARN_ON_ONCE(newval  0);
+   rcu_idle_enter_common(newval);
local_irq_restore(flags);
 }
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: rcu_dyntick and suspicious RCU usage

2012-07-11 Thread Fengguang Wu
On Wed, Jul 11, 2012 at 12:49:24AM -0700, Paul E. McKenney wrote:
 On Tue, Jul 10, 2012 at 12:47:00PM +0800, Fengguang Wu wrote:
  Hi Paul,
  
  Fortunately this bug is bisectable and the first bad commit is:
  
  commit 9b2e4f1880b789be1f24f9684f7a54b90310b5c0
  Author: Paul E. McKenney paul.mcken...@linaro.org
  Date:   Fri Sep 30 12:10:22 2011 -0700
  
  rcu: Track idleness independent of idle tasks
 
 OK, there is a problem in TINY_RCU's handling of dyntick-idle: it
 traces while in idle.  The confusion on my part was that in TREE_RCU,
 the nesting and dyntick-idle indication are different, while in
 TINY_RCU they are one and the same.
 
 Does the following patch help?

Not exactly, but the error message is now changed to:

[2.320411] Testing tracer wakeup: [2.632565] ftrace-test (15) used 
greatest stack depth: 7284 bytes left
PASSED
[2.634231] Testing tracer wakeup_rt: [2.716312] [ cut here 
]
[2.717208] WARNING: at /c/wfg/linux/kernel/rcutiny.c:92 
rcu_idle_enter+0xf0/0x244()
[2.718549] Hardware name: Bochs

[2.719136] Modules linked in:
[2.719754] 
[2.720022] Pid: 0, comm: swapper Not tainted 3.5.0-rc6+ #55
[2.720290] Call Trace:
[2.720290]  [c103704c] warn_slowpath_common+0xfc/0x13b
[2.720290]  [c11057f5] ? rcu_idle_enter+0xf0/0x244
[2.720290]  [c11057f5] ? rcu_idle_enter+0xf0/0x244
[2.720290]  [c10370c9] warn_slowpath_null+0x3e/0x4e
[2.720290]  [c11057f5] rcu_idle_enter+0xf0/0x244
[2.720290]  [c100fbdb] cpu_idle+0x159/0x220
[2.720290]  [c141615f] rest_init+0x2db/0x2ef
[2.720290]  [c1415e84] ? reciprocal_value+0x64/0x64
[2.720290]  [c171d456] start_kernel+0x8eb/0x8ff
[2.720290]  [c171c62f] ? repair_env_string+0xd3/0xd3
[2.720290]  [c171c342] i386_start_kernel+0xf8/0x10b
[2.720290] ---[ end trace 44593438a59a9533 ]---
[2.732372] [ cut here ]
[2.733235] WARNING: at /c/wfg/linux/kernel/rcutiny.c:113 
rcu_irq_exit+0x100/0x217()
[2.734613] Hardware name: Bochs
[2.735204] Modules linked in:
[2.735817] Pid: 0, comm: swapper Tainted: GW3.5.0-rc6+ #55
[2.736366] Call Trace:
[2.736366]  [c103704c] warn_slowpath_common+0xfc/0x13b
[2.736366]  [c11055ee] ? rcu_irq_exit+0x100/0x217
[2.736366]  [c11055ee] ? rcu_irq_exit+0x100/0x217
[2.736366]  [c10370c9] warn_slowpath_null+0x3e/0x4e
[2.736366]  [c11055ee] rcu_irq_exit+0x100/0x217
[2.736366]  [c104697f] irq_exit+0x117/0x14d
[2.736366]  [c1005d31] do_IRQ+0xe5/0x106
[2.736366]  [c10b9fea] ? trace_hardirqs_off_caller+0x97/0x167
[2.736366]  [c1045eb2] ? local_bh_enable+0x3c/0x3c
[2.736366]  [c1437575] common_interrupt+0x35/0x3c
[2.736366]  [c1045eb2] ? local_bh_enable+0x3c/0x3c
[2.736366]  [c104007b] ? do_wait+0x453/0x4f5
[2.736366]  [c104007b] ? do_wait+0x453/0x4f5
[2.736366]  [c10b00e0] ? do_timer+0xc1e/0xc77
[2.736366]  [c1044284] ? arch_local_irq_enable+0x13/0x27
[2.736366]  [c1045f4f] __do_softirq+0x9d/0x4fd
[2.736366]  [c1045eb2] ? local_bh_enable+0x3c/0x3c
[2.736366]  IRQ  [c10468fa] ? irq_exit+0x92/0x14d
[2.736366]  [c1005d31] ? do_IRQ+0xe5/0x106
[2.736366]  [c10b9fea] ? trace_hardirqs_off_caller+0x97/0x167
[2.736366]  [c1437575] ? common_interrupt+0x35/0x3c
[2.736366]  [c10b00e0] ? do_timer+0xc1e/0xc77
[2.736366]  [c1025c7b] ? native_safe_halt+0x13/0x23
[2.736366]  [c100e52a] ? default_idle+0xac/0x14a
[2.736366]  [c100fbef] ? cpu_idle+0x16d/0x220
[2.736366]  [c141615f] ? rest_init+0x2db/0x2ef
[2.736366]  [c1415e84] ? reciprocal_value+0x64/0x64
[2.736366]  [c171d456] ? start_kernel+0x8eb/0x8ff
[2.736366]  [c171c62f] ? repair_env_string+0xd3/0xd3
[2.736366]  [c171c342] ? i386_start_kernel+0xf8/0x10b
[2.736366] ---[ end trace 44593438a59a9534 ]---
[2.765933] [ cut here ]
[2.766811] WARNING: at /c/wfg/linux/kernel/rcutiny.c:149 
rcu_idle_exit+0xfa/0x266()
[2.768173] Hardware name: Bochs
[2.768927] Modules linked in:
[2.769514] Pid: 0, comm: swapper Tainted: GW3.5.0-rc6+ #55
[2.769920] Call Trace:
[2.769920]  [c103704c] warn_slowpath_common+0xfc/0x13b
[2.769920]  [c11033be] ? rcu_idle_exit+0xfa/0x266
[2.769920]  [c11033be] ? rcu_idle_exit+0xfa/0x266
[2.769920]  [c10370c9] warn_slowpath_null+0x3e/0x4e
[2.769920]  [c11033be] rcu_idle_exit+0xfa/0x266
[2.769920]  [c100fc02] cpu_idle+0x180/0x220
[2.769920]  [c141615f] rest_init+0x2db/0x2ef
[2.769920]  [c1415e84] ? reciprocal_value+0x64/0x64
[2.769920]  [c171d456] start_kernel+0x8eb/0x8ff
[2.769920]  [c171c62f] ? repair_env_string+0xd3/0xd3
[2.769920]  [c171c342] i386_start_kernel+0xf8/0x10b
[2.769920] ---[ end trace 44593438a59a9535 ]---
[2.988523] PASSED

The test is carried out on top of 2.5-rc6. The kvm command line is:

kvm -cpu kvm64 -enable-kvm \

Re: rcu_dyntick and suspicious RCU usage

2012-07-11 Thread Fengguang Wu
On Wed, Jul 11, 2012 at 08:36:16PM +0800, Fengguang Wu wrote:
 On Wed, Jul 11, 2012 at 12:49:24AM -0700, Paul E. McKenney wrote:
  On Tue, Jul 10, 2012 at 12:47:00PM +0800, Fengguang Wu wrote:
   Hi Paul,
   
   Fortunately this bug is bisectable and the first bad commit is:
   
   commit 9b2e4f1880b789be1f24f9684f7a54b90310b5c0
   Author: Paul E. McKenney paul.mcken...@linaro.org
   Date:   Fri Sep 30 12:10:22 2011 -0700
   
   rcu: Track idleness independent of idle tasks
  
  OK, there is a problem in TINY_RCU's handling of dyntick-idle: it
  traces while in idle.  The confusion on my part was that in TREE_RCU,
  the nesting and dyntick-idle indication are different, while in
  TINY_RCU they are one and the same.
  
  Does the following patch help?
 
 Not exactly, but the error message is now changed to:

Then I think it may help to test the linux-next tree that contains
latest RCU fixes. And it somehow helped. There are no warnings in
the linux-next tree 1 or 2 days ago; there is a maybe unrelated warning
in today's linux-next tree.

[0.013241] Performance Events: 
[0.014222] [ cut here ]
[0.014975] WARNING: at /c/wfg/linux/kernel/workqueue.c:1217 
worker_enter_idle+0x2fa/0x37c()
[0.016000] Hardware name: Bochs
[0.016000] Modules linked in:
[0.016000] Pid: 1, comm: swapper Not tainted 3.5.0-rc6-next-20120710+ #58
[0.016000] Call Trace:
[0.016000]  [c10374bc] warn_slowpath_common+0xfc/0x13b
[0.016000]  [c106530f] ? worker_enter_idle+0x2fa/0x37c
[0.016000]  [c106530f] ? worker_enter_idle+0x2fa/0x37c
[0.016000]  [c1037539] warn_slowpath_null+0x3e/0x4e
[0.016000]  [c106530f] worker_enter_idle+0x2fa/0x37c
[0.016000]  [c143d177] ? _raw_spin_lock_irq+0xc3/0xe5
[0.016000]  [c10653bb] start_worker+0x2a/0x51
[0.016000]  [c174ec11] init_workqueues+0x29f/0x656
[0.016000]  [c174e972] ? usermodehelper_init+0xac/0xac
[0.016000]  [c1001145] do_one_initcall+0xf7/0x272
[0.016000]  [c1124cb6] ? trace_preempt_on+0x1c/0x7c
[0.016000]  [c173d60e] kernel_init+0x1a4/0x4da
[0.016000]  [c173d46a] ? start_kernel+0x8ff/0x8ff
[0.016000]  [c144a942] kernel_thread_helper+0x6/0x10
[0.016000] ---[ end trace 50864a5de9c2f446 ]---
[0.020353] Testing tracer nop: PASSED

Both trees freeze after this point:

modprobe: FATAL: Could not load 
/lib/modules/3.5.0-rc6-next-20120710+/modules.dep: No such file or directory
^M
modprobe: FATAL: Could not load 
/lib/modules/3.5.0-rc6-next-20120710+/modules.dep: No such file or directory
^M
[   31.223314] CPA self-test:
[   31.224750]  4k 3069 large 61 gb 0 x 3130[c000-cfffc000] miss 0
[   31.239798]  4k 65533 large 0 gb 0 x 65533[c000-cfffc000] miss 0
[   31.249441]  4k 65533 large 0 gb 0 x 65533[c000-cfffc000] miss 0
[   31.250402] ok.

So I'd better to base the test on one of your RCU branches..
Which branch would your recommend?

Thanks,
Fengguang
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: rcu_dyntick and suspicious RCU usage

2012-07-09 Thread Fengguang Wu
Hi Paul,

Fortunately this bug is bisectable and the first bad commit is:

commit 9b2e4f1880b789be1f24f9684f7a54b90310b5c0
Author: Paul E. McKenney 
Date:   Fri Sep 30 12:10:22 2011 -0700

rcu: Track idleness independent of idle tasks

However, at that commit, the error messages are a bit different:

[1.665656] WARNING: at /c/wfg/linux/kernel/rcutiny.c:67 
rcu_idle_enter_common+0x103/0x1b2()
[1.668811] WARNING: at /c/wfg/linux/kernel/rcutiny.c:112 
rcu_idle_exit_common+0xfa/0x18c()
[4.758670] WARNING: at /c/wfg/linux/kernel/trace/trace_events.c:1720 
event_trace_self_tests_init+0x87/0x11b()

Here is the first back trace:

[1.660165] Testing tracer function: 
[1.664825] [ cut here ]
[1.665656] WARNING: at /c/wfg/linux/kernel/rcutiny.c:67 
rcu_idle_enter_common+0x103/0x1b2()
[1.667006] Hardware name: Bochs
[1.667583] Modules linked in:
[1.668106] Pid: 0, comm: swapper Not tainted 3.2.0-rc5+ #41
[1.668811] Call Trace:
[1.668811]  [] ? printk+0x2b/0x3b
[1.668811]  [] warn_slowpath_common+0xfc/0x13b
[1.668811]  [] ? rcu_idle_enter_common+0x103/0x1b2
[1.668811]  [] ? rcu_idle_enter_common+0x103/0x1b2
[1.668811]  [] warn_slowpath_null+0x3e/0x4e
[1.668811]  [] rcu_idle_enter_common+0x103/0x1b2
[1.668811]  [] rcu_irq_exit+0x1f8/0x20f
[1.668811]  [] irq_exit+0x117/0x14d
[1.668811]  [] do_IRQ+0xe5/0x106
[1.668811]  [] ? trace_hardirqs_off_caller+0x97/0x167
[1.668811]  [] common_interrupt+0x35/0x3c
[1.668811]  [] ? print_shortest_lock_dependencies+0xec/0x292
[1.668811]  [] ? native_safe_halt+0x18/0x28
[1.668811]  [] default_idle+0xd7/0x1f0
[1.668811]  [] cpu_idle+0x1c4/0x2c8
[1.668811]  [] rest_init+0x1db/0x1ef
[1.668811]  [] ? reciprocal_value+0x64/0x64
[1.668811]  [] start_kernel+0x8e5/0x8f9
[1.668811]  [] ? loglevel+0x55/0x55
[1.668811]  [] ? reserve_ebda_region+0xb9/0xc9
[1.668811]  [] i386_start_kernel+0x13a/0x14f

Thanks,
Fengguang
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 3.2.0-rc5+ (wfg@bee) (gcc version 4.7.0 (Debian 
4.7.1-1) ) #41 PREEMPT Tue Jul 10 12:38:54 CST 2012
[0.00] KERNEL supported cpus:
[0.00]   AMD AuthenticAMD
[0.00]   Centaur CentaurHauls
[0.00]   Transmeta GenuineTMx86
[0.00]   Transmeta TransmetaCPU
[0.00] CPU: vendor_id 'GenuineIntel' unknown, using generic init.
[0.00] CPU: Your system may be unstable.
[0.00] BIOS-provided physical RAM map:
[0.00]  BIOS-e820:  - 0009f400 (usable)
[0.00]  BIOS-e820: 0009f400 - 000a (reserved)
[0.00]  BIOS-e820: 000f - 0010 (reserved)
[0.00]  BIOS-e820: 0010 - 0fffd000 (usable)
[0.00]  BIOS-e820: 0fffd000 - 1000 (reserved)
[0.00]  BIOS-e820: fffbc000 - 0001 (reserved)
[0.00] debug: ignoring loglevel setting.
[0.00] Notice: NX (Execute Disable) protection cannot be enabled: 
non-PAE kernel!
[0.00] DMI 2.4 present.
[0.00] DMI: Bochs Bochs, BIOS Bochs 01/01/2007
[0.00] e820 update range:  - 0001 (usable) 
==> (reserved)
[0.00] e820 remove range: 000a - 0010 (usable)
[0.00] last_pfn = 0xfffd max_arch_pfn = 0x10
[0.00] initial memory mapped : 0 - 0240
[0.00] Base memory trampoline at [c009e000] 9e000 size 4096
[0.00] init_memory_mapping: -0fffd000
[0.00]  00 - 40 page 4k
[0.00]  40 - 000fc0 page 2M
[0.00]  000fc0 - 000fffd000 page 4k
[0.00] kernel direct mapping tables up to fffd000 @ 23fb000-240
[0.00] log_buf_len: 8388608
[0.00] early log buf free: 129089(98%)
[0.00] RAMDISK: 0e73f000 - 0fff
[0.00] 0MB HIGHMEM available.
[0.00] 255MB LOWMEM available.
[0.00]   mapped low ram: 0 - 0fffd000
[0.00]   low ram: 0 - 0fffd000
[0.00] kvm-clock: Using msrs 12 and 11
[0.00] kvm-clock: cpu 0, msr 0:15ed781, boot clock
[0.00] Zone PFN ranges:
[0.00]   Normal   0x0010 -> 0xfffd
[0.00]   HighMem  empty
[0.00] Movable zone start PFN for each node
[0.00] early_node_map[2] active PFN ranges
[0.00] 0: 0x0010 -> 0x009f
[0.00] 0: 0x0100 -> 0xfffd
[0.00] On node 0 totalpages: 65420
[0.00] free_area_init_node: node 0, pgdat c16ac510, node_mem_map 
cdd3f200
[0.00]   Normal zone: 512 pages used for memmap
[0.00]   Normal zone: 0 pages reserved
[0.00]   Normal zone: 64908 pages, LIFO batch:15
[0.00] SFI: Simple Firmware Interface v0.81 

Re: rcu_dyntick and suspicious RCU usage

2012-07-09 Thread Fengguang Wu
Hi Paul,

Fortunately this bug is bisectable and the first bad commit is:

commit 9b2e4f1880b789be1f24f9684f7a54b90310b5c0
Author: Paul E. McKenney paul.mcken...@linaro.org
Date:   Fri Sep 30 12:10:22 2011 -0700

rcu: Track idleness independent of idle tasks

However, at that commit, the error messages are a bit different:

[1.665656] WARNING: at /c/wfg/linux/kernel/rcutiny.c:67 
rcu_idle_enter_common+0x103/0x1b2()
[1.668811] WARNING: at /c/wfg/linux/kernel/rcutiny.c:112 
rcu_idle_exit_common+0xfa/0x18c()
[4.758670] WARNING: at /c/wfg/linux/kernel/trace/trace_events.c:1720 
event_trace_self_tests_init+0x87/0x11b()

Here is the first back trace:

[1.660165] Testing tracer function: 
[1.664825] [ cut here ]
[1.665656] WARNING: at /c/wfg/linux/kernel/rcutiny.c:67 
rcu_idle_enter_common+0x103/0x1b2()
[1.667006] Hardware name: Bochs
[1.667583] Modules linked in:
[1.668106] Pid: 0, comm: swapper Not tainted 3.2.0-rc5+ #41
[1.668811] Call Trace:
[1.668811]  [c13edf12] ? printk+0x2b/0x3b
[1.668811]  [c104b892] warn_slowpath_common+0xfc/0x13b
[1.668811]  [c10f3c2c] ? rcu_idle_enter_common+0x103/0x1b2
[1.668811]  [c10f3c2c] ? rcu_idle_enter_common+0x103/0x1b2
[1.668811]  [c104b90f] warn_slowpath_null+0x3e/0x4e
[1.668811]  [c10f3c2c] rcu_idle_enter_common+0x103/0x1b2
[1.668811]  [c10f3ed3] rcu_irq_exit+0x1f8/0x20f
[1.668811]  [c1058ca4] irq_exit+0x117/0x14d
[1.668811]  [c1005c45] do_IRQ+0xe5/0x106
[1.668811]  [c109e6b8] ? trace_hardirqs_off_caller+0x97/0x167
[1.668811]  [c1408a75] common_interrupt+0x35/0x3c
[1.668811]  [c10a00e0] ? print_shortest_lock_dependencies+0xec/0x292
[1.668811]  [c1023e16] ? native_safe_halt+0x18/0x28
[1.668811]  [c100e10f] default_idle+0xd7/0x1f0
[1.668811]  [c1001eb3] cpu_idle+0x1c4/0x2c8
[1.668811]  [c13e9027] rest_init+0x1db/0x1ef
[1.668811]  [c13e8e4c] ? reciprocal_value+0x64/0x64
[1.668811]  [c16cc232] start_kernel+0x8e5/0x8f9
[1.668811]  [c16cb36b] ? loglevel+0x55/0x55
[1.668811]  [c16cb208] ? reserve_ebda_region+0xb9/0xc9
[1.668811]  [c16cb13a] i386_start_kernel+0x13a/0x14f

Thanks,
Fengguang
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 3.2.0-rc5+ (wfg@bee) (gcc version 4.7.0 (Debian 
4.7.1-1) ) #41 PREEMPT Tue Jul 10 12:38:54 CST 2012
[0.00] KERNEL supported cpus:
[0.00]   AMD AuthenticAMD
[0.00]   Centaur CentaurHauls
[0.00]   Transmeta GenuineTMx86
[0.00]   Transmeta TransmetaCPU
[0.00] CPU: vendor_id 'GenuineIntel' unknown, using generic init.
[0.00] CPU: Your system may be unstable.
[0.00] BIOS-provided physical RAM map:
[0.00]  BIOS-e820:  - 0009f400 (usable)
[0.00]  BIOS-e820: 0009f400 - 000a (reserved)
[0.00]  BIOS-e820: 000f - 0010 (reserved)
[0.00]  BIOS-e820: 0010 - 0fffd000 (usable)
[0.00]  BIOS-e820: 0fffd000 - 1000 (reserved)
[0.00]  BIOS-e820: fffbc000 - 0001 (reserved)
[0.00] debug: ignoring loglevel setting.
[0.00] Notice: NX (Execute Disable) protection cannot be enabled: 
non-PAE kernel!
[0.00] DMI 2.4 present.
[0.00] DMI: Bochs Bochs, BIOS Bochs 01/01/2007
[0.00] e820 update range:  - 0001 (usable) 
== (reserved)
[0.00] e820 remove range: 000a - 0010 (usable)
[0.00] last_pfn = 0xfffd max_arch_pfn = 0x10
[0.00] initial memory mapped : 0 - 0240
[0.00] Base memory trampoline at [c009e000] 9e000 size 4096
[0.00] init_memory_mapping: -0fffd000
[0.00]  00 - 40 page 4k
[0.00]  40 - 000fc0 page 2M
[0.00]  000fc0 - 000fffd000 page 4k
[0.00] kernel direct mapping tables up to fffd000 @ 23fb000-240
[0.00] log_buf_len: 8388608
[0.00] early log buf free: 129089(98%)
[0.00] RAMDISK: 0e73f000 - 0fff
[0.00] 0MB HIGHMEM available.
[0.00] 255MB LOWMEM available.
[0.00]   mapped low ram: 0 - 0fffd000
[0.00]   low ram: 0 - 0fffd000
[0.00] kvm-clock: Using msrs 12 and 11
[0.00] kvm-clock: cpu 0, msr 0:15ed781, boot clock
[0.00] Zone PFN ranges:
[0.00]   Normal   0x0010 - 0xfffd
[0.00]   HighMem  empty
[0.00] Movable zone start PFN for each node
[0.00] early_node_map[2] active PFN ranges
[0.00] 0: 0x0010 - 0x009f
[0.00] 0: 0x0100 - 0xfffd
[0.00] On node 0 totalpages: 65420
[0.00] free_area_init_node: node 0, pgdat c16ac510, node_mem_map 
cdd3f200
[0.00]   Normal zone: 512 pages