Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-27 Thread Paul E. McKenney
On Wed, Aug 27, 2014 at 12:43:01PM -0400, Pranith Kumar wrote:
> On Wed, Aug 27, 2014 at 12:21 PM, Paul E. McKenney wrote:
> > On Wed, Aug 27, 2014 at 10:13:50AM +0530, Amit Shah wrote:
> 
> >>
> >> Yes, this patch helps my case as well.
> >
> > Very good!!!
> >
> > Pranith, I can take this patch, but would you be willing to invert
> > the sense of ->nocb_leader_wake (e.g., call it ->nocb_leader_sleep or
> > some such)?  This field is only used in eight places in the source code.
> >
> > The idea is that inverting the sense of the field allows the normal C
> > initialization of zero to properly initialize this field, plus it gets
> > rid of a few lines of code.
> 
> Sure, that is indeed a good idea. I will send a new patch.

Very good, looking forward to seeing it!

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-27 Thread Pranith Kumar
On Wed, Aug 27, 2014 at 12:21 PM, Paul E. McKenney wrote:
> On Wed, Aug 27, 2014 at 10:13:50AM +0530, Amit Shah wrote:

>>
>> Yes, this patch helps my case as well.
>
> Very good!!!
>
> Pranith, I can take this patch, but would you be willing to invert
> the sense of ->nocb_leader_wake (e.g., call it ->nocb_leader_sleep or
> some such)?  This field is only used in eight places in the source code.
>
> The idea is that inverting the sense of the field allows the normal C
> initialization of zero to properly initialize this field, plus it gets
> rid of a few lines of code.
>

Sure, that is indeed a good idea. I will send a new patch.

-- 
Pranith


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-27 Thread Paul E. McKenney
On Wed, Aug 27, 2014 at 10:13:50AM +0530, Amit Shah wrote:
> On (Sat) 23 Aug 2014 [03:43:38], Pranith Kumar wrote:
> > On Fri, Aug 22, 2014 at 5:53 PM, Paul E. McKenney wrote:
> > >
> > > Hmmm...  Please try replacing the synchronize_rcu() in
> > > __sysrq_swap_key_ops() with (say) schedule_timeout_interruptible(HZ / 10).
> > > I bet that gets rid of the hang.  (And also introduces a low-probability
> > > bug, but should be OK for testing.)
> > >
> > > The other thing to try is to revert your patch that turned my event
> > > traces into printk()s, then put an ftrace_dump(DUMP_ALL); just after
> > > the synchronize_rcu() -- that might make it so that the ftrace data
> > > actually gets dumped out.
> > >
> > 
> > I was able to reproduce this error on my Ubuntu 14.04 machine. I think
> > I found the root cause of the problem after several kvm runs.
> > 
> > The problem is that earlier we were waiting on nocb_head and now we
> > are waiting on nocb_leader_wake.
> > 
> > So there are a lot of nocb callbacks which are enqueued before the
> > nocb thread is spawned. This sets up nocb_head to be non-null, because
> > of which the nocb kthread used to wake up immediately after sleeping.
> > 
> > Now that we have switched to nocb_leader_wake, this is not being set
> > when there are pending callbacks, unless the callbacks overflow the
> > qhimark. The pending callbacks were around 7000 when the boot hangs.
> > 
> > So setting the qhimark using the boot parameter rcutree.qhimark=5000
> > is one way to allow us to boot past the point by forcefully waking up
> > the nocb kthread. I am not sure this is fool-proof.
> > 
> > Another option is to start the nocb kthreads with nocb_leader_wake set,
> > so that they can handle any pending callbacks. The following patch also
> > allows us to boot properly.
> > 
> > Phew! Let me know if this makes any sense :)
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 00dc411..4c397aa 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -2386,6 +2386,9 @@ static int rcu_nocb_kthread(void *arg)
> >  	struct rcu_head **tail;
> >  	struct rcu_data *rdp = arg;
> > 
> > +	if (rdp->nocb_leader == rdp)
> > +		rdp->nocb_leader_wake = true;
> > +
> >  	/* Each pass through this loop invokes one batch of callbacks */
> >  	for (;;) {
> >  		/* Wait for callbacks. */
> 
> Yes, this patch helps my case as well.

Very good!!!

Pranith, I can take this patch, but would you be willing to invert
the sense of ->nocb_leader_wake (e.g., call it ->nocb_leader_sleep or
some such)?  This field is only used in eight places in the source code.

The idea is that inverting the sense of the field allows the normal C
initialization of zero to properly initialize this field, plus it gets
rid of a few lines of code.

Thanx, Paul



Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-26 Thread Amit Shah
On (Sat) 23 Aug 2014 [03:43:38], Pranith Kumar wrote:
> On Fri, Aug 22, 2014 at 5:53 PM, Paul E. McKenney wrote:
> >
> > Hmmm...  Please try replacing the synchronize_rcu() in
> > __sysrq_swap_key_ops() with (say) schedule_timeout_interruptible(HZ / 10).
> > I bet that gets rid of the hang.  (And also introduces a low-probability
> > bug, but should be OK for testing.)
> >
> > The other thing to try is to revert your patch that turned my event
> > traces into printk()s, then put an ftrace_dump(DUMP_ALL); just after
> > the synchronize_rcu() -- that might make it so that the ftrace data
> > actually gets dumped out.
> >
> 
> I was able to reproduce this error on my Ubuntu 14.04 machine. I think
> I found the root cause of the problem after several kvm runs.
> 
> The problem is that earlier we were waiting on nocb_head and now we
> are waiting on nocb_leader_wake.
> 
> So there are a lot of nocb callbacks which are enqueued before the
> nocb thread is spawned. This sets up nocb_head to be non-null, because
> of which the nocb kthread used to wake up immediately after sleeping.
> 
> Now that we have switched to nocb_leader_wake, this is not being set
> when there are pending callbacks, unless the callbacks overflow the
> qhimark. The pending callbacks were around 7000 when the boot hangs.
> 
> So setting the qhimark using the boot parameter rcutree.qhimark=5000
> is one way to allow us to boot past the point by forcefully waking up
> the nocb kthread. I am not sure this is fool-proof.
> 
> Another option is to start the nocb kthreads with nocb_leader_wake set,
> so that they can handle any pending callbacks. The following patch also
> allows us to boot properly.
> 
> Phew! Let me know if this makes any sense :)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 00dc411..4c397aa 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2386,6 +2386,9 @@ static int rcu_nocb_kthread(void *arg)
>  	struct rcu_head **tail;
>  	struct rcu_data *rdp = arg;
> 
> +	if (rdp->nocb_leader == rdp)
> +		rdp->nocb_leader_wake = true;
> +
>  	/* Each pass through this loop invokes one batch of callbacks */
>  	for (;;) {
>  		/* Wait for callbacks. */

Yes, this patch helps my case as well.

Thanks!

Amit


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-24 Thread Paul E. McKenney
On Sat, Aug 23, 2014 at 11:39:39PM -0400, Pranith Kumar wrote:
> On Sat, Aug 23, 2014 at 11:23 PM, Paul E. McKenney wrote:
> > On Sat, Aug 23, 2014 at 08:26:10PM -0400, Pranith Kumar wrote:
> >> On Sat, Aug 23, 2014 at 12:51 PM, Paul E. McKenney wrote:
> >>
> >> > It might well!  Another possibility is that the early_initcall function
> >> > doing the synchronize_rcu() is happening before the early_initcall
> >> > creating the RCU grace-period kthreads.
> >> >
> >> > Seems like we need to close both holes.  Let's see how your patch works
> >> > for Amit, and I am testing a patch for the possible early_initcall
> >> > ordering issue.
> >>
> >> I checked the init call which is calling synchronize_rcu():
> >> subsys_initcall(pm_sysrq_init); this is being called after
> >> early_initcall.
> >>
> >> The order of initcalls is early, core, postcore, arch, subsys, fs,
> >> device, late. So I guess that is ok.
> >>
> >> I wonder why it was not showing up in 12.04. I have a dual boot. Will
> >> test it out and see if I can find something.
> >
> > Me, I am wondering about 7,000 callbacks being registered during early
> > boot time.  ;-)
> 
> This is the backtrace for most of the callbacks:

Thank you for the info!

And that explains why acpi=off helped the people running 14.04.

Thanx, Paul

> [4.612103] ------------[ cut here ]------------
> [4.613340] WARNING: CPU: 0 PID: 0 at kernel/rcu/tree_plugin.h:2115 __call_rcu_nocb_enqueue+0x58/0x283()
> [4.615975] Modules linked in:
> [4.616000] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW 3.16.0+ #76
> [4.616000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> [4.616000]   81803c20 813c5213
> [4.616000]  81803c58 810388aa 8108aac8 88001f9cce40
> [4.616000]  88001f5b6c08  0286 81803c68
> [4.616000] Call Trace:
> [4.616000]  [] dump_stack+0x4e/0x7a
> [4.616000]  [] warn_slowpath_common+0x7f/0x98
> [4.616000]  [] ? __call_rcu_nocb_enqueue+0x58/0x283
> [4.616000]  [] warn_slowpath_null+0x1a/0x1c
> [4.616000]  [] __call_rcu_nocb_enqueue+0x58/0x283
> [4.616000]  [] ? unreferenced_object+0x4f/0x4f
> [4.616000]  [] __call_rcu+0xcd/0x32b
> [4.616000]  [] call_rcu+0x1b/0x1d
> [4.616000]  [] put_object+0x41/0x44
> [4.616000]  [] delete_object_full+0x29/0x2c
> [4.616000]  [] kmemleak_free+0x25/0x43
> [4.616000]  [] slab_free_hook+0x1d/0x63
> [4.616000]  [] kmem_cache_free+0x52/0x154
> [4.616000]  [] ? acpi_os_release_object+0xe/0x12
> [4.616000]  [] acpi_os_release_object+0xe/0x12
> [4.616000]  [] acpi_ps_free_op+0x25/0x27
> [4.616000]  [] acpi_ps_create_op+0x135/0x209
> [4.616000]  [] acpi_ps_parse_loop+0x1d3/0x575
> [4.616000]  [] acpi_ps_parse_aml+0xa0/0x277
> [4.616000]  [] acpi_ns_one_complete_parse+0xfc/0x11b
> [4.616000]  [] acpi_ns_parse_table+0x33/0x38
> [4.616000]  [] acpi_ns_load_table+0x4c/0x8b
> [4.616000]  [] acpi_load_tables+0x9d/0x15d
> [4.616000]  [] acpi_early_init+0x73/0xfe
> [4.616000]  [] start_kernel+0x3a9/0x40a
> [4.616000]  [] ? early_idt_handlers+0x120/0x120
> [4.616000]  [] x86_64_start_reservations+0x2a/0x2c
> [4.616000]  [] x86_64_start_kernel+0x13c/0x149
> [4.616000] ---[ end trace 8dbfee90ca96696c ]---
> 
> 
> -- 
> Pranith
> 



Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-23 Thread Pranith Kumar
On Sat, Aug 23, 2014 at 11:23 PM, Paul E. McKenney wrote:
> On Sat, Aug 23, 2014 at 08:26:10PM -0400, Pranith Kumar wrote:
>> On Sat, Aug 23, 2014 at 12:51 PM, Paul E. McKenney wrote:
>>
>> > It might well!  Another possibility is that the early_initcall function
>> > doing the synchronize_rcu() is happening before the early_initcall
>> > creating the RCU grace-period kthreads.
>> >
>> > Seems like we need to close both holes.  Let's see how your patch works
>> > for Amit, and I am testing a patch for the possible early_initcall
>> > ordering issue.
>>
>> I checked the init call which is calling synchronize_rcu():
>> subsys_initcall(pm_sysrq_init); this is being called after
>> early_initcall.
>>
>> The order of initcalls is early, core, postcore, arch, subsys, fs,
>> device, late. So I guess that is ok.
>>
>> I wonder why it was not showing up in 12.04. I have a dual boot. Will
>> test it out and see if I can find something.
>
> Me, I am wondering about 7,000 callbacks being registered during early
> boot time.  ;-)
>


This is the backtrace for most of the callbacks:

[4.612103] ------------[ cut here ]------------
[4.613340] WARNING: CPU: 0 PID: 0 at kernel/rcu/tree_plugin.h:2115 __call_rcu_nocb_enqueue+0x58/0x283()
[4.615975] Modules linked in:
[4.616000] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW 3.16.0+ #76
[4.616000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[4.616000]   81803c20 813c5213
[4.616000]  81803c58 810388aa 8108aac8 88001f9cce40
[4.616000]  88001f5b6c08  0286 81803c68
[4.616000] Call Trace:
[4.616000]  [] dump_stack+0x4e/0x7a
[4.616000]  [] warn_slowpath_common+0x7f/0x98
[4.616000]  [] ? __call_rcu_nocb_enqueue+0x58/0x283
[4.616000]  [] warn_slowpath_null+0x1a/0x1c
[4.616000]  [] __call_rcu_nocb_enqueue+0x58/0x283
[4.616000]  [] ? unreferenced_object+0x4f/0x4f
[4.616000]  [] __call_rcu+0xcd/0x32b
[4.616000]  [] call_rcu+0x1b/0x1d
[4.616000]  [] put_object+0x41/0x44
[4.616000]  [] delete_object_full+0x29/0x2c
[4.616000]  [] kmemleak_free+0x25/0x43
[4.616000]  [] slab_free_hook+0x1d/0x63
[4.616000]  [] kmem_cache_free+0x52/0x154
[4.616000]  [] ? acpi_os_release_object+0xe/0x12
[4.616000]  [] acpi_os_release_object+0xe/0x12
[4.616000]  [] acpi_ps_free_op+0x25/0x27
[4.616000]  [] acpi_ps_create_op+0x135/0x209
[4.616000]  [] acpi_ps_parse_loop+0x1d3/0x575
[4.616000]  [] acpi_ps_parse_aml+0xa0/0x277
[4.616000]  [] acpi_ns_one_complete_parse+0xfc/0x11b
[4.616000]  [] acpi_ns_parse_table+0x33/0x38
[4.616000]  [] acpi_ns_load_table+0x4c/0x8b
[4.616000]  [] acpi_load_tables+0x9d/0x15d
[4.616000]  [] acpi_early_init+0x73/0xfe
[4.616000]  [] start_kernel+0x3a9/0x40a
[4.616000]  [] ? early_idt_handlers+0x120/0x120
[4.616000]  [] x86_64_start_reservations+0x2a/0x2c
[4.616000]  [] x86_64_start_kernel+0x13c/0x149
[4.616000] ---[ end trace 8dbfee90ca96696c ]---


-- 
Pranith


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-23 Thread Paul E. McKenney
On Sat, Aug 23, 2014 at 08:26:10PM -0400, Pranith Kumar wrote:
> On Sat, Aug 23, 2014 at 12:51 PM, Paul E. McKenney wrote:
> 
> > It might well!  Another possibility is that the early_initcall function
> > doing the synchronize_rcu() is happening before the early_initcall
> > creating the RCU grace-period kthreads.
> >
> > Seems like we need to close both holes.  Let's see how your patch works
> > for Amit, and I am testing a patch for the possible early_initcall
> > ordering issue.
> 
> I checked the init call which is calling synchronize_rcu():
> subsys_initcall(pm_sysrq_init); this is being called after
> early_initcall.
> 
> The order of initcalls is early, core, postcore, arch, subsys, fs,
> device, late. So I guess that is ok.
> 
> I wonder why it was not showing up in 12.04. I have a dual boot. Will
> test it out and see if I can find something.

Me, I am wondering about 7,000 callbacks being registered during early
boot time.  ;-)

Thanx, Paul



Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-23 Thread Pranith Kumar
On Sat, Aug 23, 2014 at 12:51 PM, Paul E. McKenney wrote:

> It might well!  Another possibility is that the early_initcall function
> doing the synchronize_rcu() is happening before the early_initcall
> creating the RCU grace-period kthreads.
>
> Seems like we need to close both holes.  Let's see how your patch works
> for Amit, and I am testing a patch for the possible early_initcall
> ordering issue.

I checked the init call which is calling synchronize_rcu():
subsys_initcall(pm_sysrq_init); this is being called after
early_initcall.

The order of initcalls is early, core, postcore, arch, subsys, fs,
device, late. So I guess that is ok.

I wonder why it was not showing up in 12.04. I have a dual boot. Will
test it out and see if I can find something.

-- 
Pranith


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-23 Thread Paul E. McKenney
On Sat, Aug 23, 2014 at 03:43:38AM -0400, Pranith Kumar wrote:
> On Fri, Aug 22, 2014 at 5:53 PM, Paul E. McKenney wrote:
> >
> > Hmmm...  Please try replacing the synchronize_rcu() in
> > __sysrq_swap_key_ops() with (say) schedule_timeout_interruptible(HZ / 10).
> > I bet that gets rid of the hang.  (And also introduces a low-probability
> > bug, but should be OK for testing.)
> >
> > The other thing to try is to revert your patch that turned my event
> > traces into printk()s, then put an ftrace_dump(DUMP_ALL); just after
> > the synchronize_rcu() -- that might make it so that the ftrace data
> > actually gets dumped out.
> >
> 
> I was able to reproduce this error on my Ubuntu 14.04 machine. I think
> I found the root cause of the problem after several kvm runs.
> 
> The problem is that earlier we were waiting on nocb_head and now we
> are waiting on nocb_leader_wake.
> 
> So there are a lot of nocb callbacks which are enqueued before the
> nocb thread is spawned. This sets up nocb_head to be non-null, because
> of which the nocb kthread used to wake up immediately after sleeping.
> 
> Now that we have switched to nocb_leader_wake, this is not being set
> when there are pending callbacks, unless the callbacks overflow the
> qhimark. The pending callbacks were around 7000 when the boot hangs.
> 
> So setting the qhimark using the boot parameter rcutree.qhimark=5000
> is one way to allow us to boot past the point by forcefully waking up
> the nocb kthread. I am not sure this is fool-proof.

Unfortunately, not in all cases.  A small kernel for embedded use might
register only a few callbacks during boot, which could still result
in a hang.

> Another option is to start the nocb kthreads with nocb_leader_wake set,
> so that they can handle any pending callbacks. The following patch also
> allows us to boot properly.

This seems like a much better approach.

> Phew! Let me know if this makes any sense :)

It might well!  Another possibility is that the early_initcall function
doing the synchronize_rcu() is happening before the early_initcall
creating the RCU grace-period kthreads.

Seems like we need to close both holes.  Let's see how your patch works
for Amit, and I am testing a patch for the possible early_initcall
ordering issue.

> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 00dc411..4c397aa 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2386,6 +2386,9 @@ static int rcu_nocb_kthread(void *arg)
>  	struct rcu_head **tail;
>  	struct rcu_data *rdp = arg;
> 
> +	if (rdp->nocb_leader == rdp)
> +		rdp->nocb_leader_wake = true;
> +

Not that it matters all that much, but given that the followers don't
ever reference ->nocb_leader_wake, we should be able to set this flag
unconditionally.

Thanx, Paul

>  	/* Each pass through this loop invokes one batch of callbacks */
>  	for (;;) {
>  		/* Wait for callbacks. */
> 
> 
> -- 
> Pranith
> 



Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-23 Thread Pranith Kumar
On Fri, Aug 22, 2014 at 5:53 PM, Paul E. McKenney wrote:
>
> Hmmm...  Please try replacing the synchronize_rcu() in
> __sysrq_swap_key_ops() with (say) schedule_timeout_interruptible(HZ / 10).
> I bet that gets rid of the hang.  (And also introduces a low-probability
> bug, but should be OK for testing.)
>
> The other thing to try is to revert your patch that turned my event
> traces into printk()s, then put an ftrace_dump(DUMP_ALL); just after
> the synchronize_rcu() -- that might make it so that the ftrace data
> actually gets dumped out.
>

I was able to reproduce this error on my Ubuntu 14.04 machine. I think
I found the root cause of the problem after several kvm runs.

The problem is that earlier we were waiting on nocb_head and now we
are waiting on nocb_leader_wake.

So there are a lot of nocb callbacks which are enqueued before the
nocb thread is spawned. This sets up nocb_head to be non-null, because
of which the nocb kthread used to wake up immediately after sleeping.

Now that we have switched to nocb_leader_wake, this is not being set
when there are pending callbacks, unless the callbacks overflow the
qhimark. The pending callbacks were around 7000 when the boot hangs.

So setting the qhimark using the boot parameter rcutree.qhimark=5000
is one way to allow us to boot past the point by forcefully waking up
the nocb kthread. I am not sure this is fool-proof.

Another option is to start the nocb kthreads with nocb_leader_wake set,
so that they can handle any pending callbacks. The following patch also
allows us to boot properly.

Phew! Let me know if this makes any sense :)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 00dc411..4c397aa 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2386,6 +2386,9 @@ static int rcu_nocb_kthread(void *arg)
 	struct rcu_head **tail;
 	struct rcu_data *rdp = arg;
 
+	if (rdp->nocb_leader == rdp)
+		rdp->nocb_leader_wake = true;
+
 	/* Each pass through this loop invokes one batch of callbacks */
 	for (;;) {
 		/* Wait for callbacks. */


-- 
Pranith


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-22 Thread Paul E. McKenney
On Fri, Aug 22, 2014 at 02:53:44PM -0700, Paul E. McKenney wrote:
> On Fri, Aug 22, 2014 at 10:44:05PM +0530, Amit Shah wrote:
> > On (Fri) 22 Aug 2014 [07:48:19], Paul E. McKenney wrote:
> > > On Fri, Aug 22, 2014 at 06:26:49PM +0530, Amit Shah wrote:
> > > > On (Fri) 22 Aug 2014 [18:06:51], Amit Shah wrote:
> > > > > On (Fri) 22 Aug 2014 [17:54:53], Amit Shah wrote:
> > > > > > On (Mon) 18 Aug 2014 [21:01:49], Paul E. McKenney wrote:
> > > > > > 
> > > > > > > The odds are low over the next few days.  I am adding nastier rcutorture
> > > > > > > testing, however.  It would still be very good to get debug information
> > > > > > > from your setup.  One approach would be to convert the trace function
> > > > > > > calls into printk(), if that would help.
> > > > > > 
> > > > > > I added a few printks on the lines of the traces in cases where
> > > > > > rcu_nocb_poll was checked -- since that reproduces the hang.  Are the
> > > > > > following traces sufficient, or should I keep adding more printks?
> > > > > > 
> > > > > > In the case of rcu-trace-nopoll.txt, the messages stop after a while
> > > > > > (when the guest locks up hard).  That's when I kill the qemu process.
> > > > > 
> > > > > And this is bt from gdb when the endless 
> > > > > 
> > > > >   RCUDEBUG __call_rcu_nocb_enqueue 2146 rcu_preempt 0 WakeNot
> > > > > 
> > > > > messages are being spewed.
> > > > > 
> > > > > I can't time it, but hope it gives some indication along with the printks.
> > > > 
> > > > ... and after the system 'locks up', this is the state it's in:
> > > > 
> > > > ^C
> > > > Program received signal SIGINT, Interrupt.
> > > > native_safe_halt () at ./arch/x86/include/asm/irqflags.h:50
> > > > 50   }
> > > > (gdb) bt
> > > > #0  native_safe_halt () at ./arch/x86/include/asm/irqflags.h:50
> > > > #1  0x8100b9c1 in arch_safe_halt () at ./arch/x86/include/asm/paravirt.h:111
> > > > #2  default_idle () at arch/x86/kernel/process.c:311
> > > > #3  0x8100c107 in arch_cpu_idle () at arch/x86/kernel/process.c:302
> > > > #4  0x8106a25a in cpuidle_idle_call () at kernel/sched/idle.c:120
> > > > #5  cpu_idle_loop () at kernel/sched/idle.c:220
> > > > #6  cpu_startup_entry (state=) at kernel/sched/idle.c:268
> > > > #7  0x813e068b in rest_init () at init/main.c:418
> > > > #8  0x81a8cf5a in start_kernel () at init/main.c:680
> > > > #9  0x81a8c4ba in x86_64_start_reservations (real_mode_data=) at arch/x86/kernel/head64.c:193
> > > > #10 0x81a8c607 in x86_64_start_kernel (real_mode_data=0x13f90 ) at arch/x86/kernel/head64.c:182
> > > > #11 0x in ?? ()
> > > > 
> > > > 
> > > > Wondering why it's doing this.  Am stepping through
> > > > cpu_startup_entry() to see if I get any clues.
> > > 
> > > This looks to me like normal behavior in the x86 ACPI idle loop.
> > > My guess is that the lockup is caused by indefinite blocking, in
> > > which case we would expect all the CPUs to be in the idle loop.
> > 
> > Hm, found it:
> > 
> > The stall happens in do_initcalls().
> > 
> > pm_sysrq_init() is the function that causes the hang.  When I #if 0
> > the line
> > 
> > register_sysrq_key('o', &sysrq_poweroff_op);
> > 
> > in pm_sysrq_init(), the boot proceeds normally.
> 
> Yow!!!
> 
> > Now what this is, and what relation this has to rcu and that patch in
> > particular is next...
> 
> Hmmm...  Please try replacing the synchronize_rcu() in
> __sysrq_swap_key_ops() with (say) schedule_timeout_interruptible(HZ / 10).
> I bet that gets rid of the hang.  (And also introduces a low-probability
> bug, but should be OK for testing.)
> 
> The other thing to try is to revert your patch that turned my event
> traces into printk()s, then put an ftrace_dump(DUMP_ALL); just after
> the synchronize_rcu() -- that might make it so that the ftrace data
> actually gets dumped out.

And one other thing to try...

Put a printk at the beginning of rcu_spawn_gp_kthread(), which is in
kernel/rcu/tree.c.  If that printk does not appear before the call
to pm_sysrq_init(), that would be an important clue.

Thanx, Paul



Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-22 Thread Paul E. McKenney
On Fri, Aug 22, 2014 at 10:44:05PM +0530, Amit Shah wrote:
> On (Fri) 22 Aug 2014 [07:48:19], Paul E. McKenney wrote:
> > On Fri, Aug 22, 2014 at 06:26:49PM +0530, Amit Shah wrote:
> > > On (Fri) 22 Aug 2014 [18:06:51], Amit Shah wrote:
> > > > On (Fri) 22 Aug 2014 [17:54:53], Amit Shah wrote:
> > > > > On (Mon) 18 Aug 2014 [21:01:49], Paul E. McKenney wrote:
> > > > > 
> > > > > > The odds are low over the next few days.  I am adding nastier rcutorture
> > > > > > testing, however.  It would still be very good to get debug information
> > > > > > from your setup.  One approach would be to convert the trace function
> > > > > > calls into printk(), if that would help.
> > > > > 
> > > > > I added a few printks on the lines of the traces in cases where
> > > > > rcu_nocb_poll was checked -- since that reproduces the hang.  Are the
> > > > > following traces sufficient, or should I keep adding more printks?
> > > > > 
> > > > > In the case of rcu-trace-nopoll.txt, the messages stop after a while
> > > > > (when the guest locks up hard).  That's when I kill the qemu process.
> > > > 
> > > > And this is bt from gdb when the endless 
> > > > 
> > > >   RCUDEBUG __call_rcu_nocb_enqueue 2146 rcu_preempt 0 WakeNot
> > > > 
> > > > messages are being spewed.
> > > > 
> > > > I can't time it, but hope it gives some indication along with the printks.
> > > 
> > > ... and after the system 'locks up', this is the state it's in:
> > > 
> > > ^C
> > > Program received signal SIGINT, Interrupt.
> > > native_safe_halt () at ./arch/x86/include/asm/irqflags.h:50
> > > 50 }
> > > (gdb) bt
> > > #0  native_safe_halt () at ./arch/x86/include/asm/irqflags.h:50
> > > #1  0x8100b9c1 in arch_safe_halt () at ./arch/x86/include/asm/paravirt.h:111
> > > #2  default_idle () at arch/x86/kernel/process.c:311
> > > #3  0x8100c107 in arch_cpu_idle () at arch/x86/kernel/process.c:302
> > > #4  0x8106a25a in cpuidle_idle_call () at kernel/sched/idle.c:120
> > > #5  cpu_idle_loop () at kernel/sched/idle.c:220
> > > #6  cpu_startup_entry (state=) at kernel/sched/idle.c:268
> > > #7  0x813e068b in rest_init () at init/main.c:418
> > > #8  0x81a8cf5a in start_kernel () at init/main.c:680
> > > #9  0x81a8c4ba in x86_64_start_reservations (real_mode_data=) at arch/x86/kernel/head64.c:193
> > > #10 0x81a8c607 in x86_64_start_kernel (real_mode_data=0x13f90 ) at arch/x86/kernel/head64.c:182
> > > #11 0x in ?? ()
> > > 
> > > 
> > > Wondering why it's doing this.  Am stepping through
> > > cpu_startup_entry() to see if I get any clues.
> > 
> > This looks to me like normal behavior in the x86 ACPI idle loop.
> > My guess is that the lockup is caused by indefinite blocking, in
> > which case we would expect all the CPUs to be in the idle loop.
> 
> Hm, found it:
> 
> The stall happens in do_initcalls().
> 
> pm_sysrq_init() is the function that causes the hang.  When I #if 0
> the line
> 
> register_sysrq_key('o', &sysrq_poweroff_op);
> 
> in pm_sysrq_init(), the boot proceeds normally.

Yow!!!

> Now what this is, and what relation this has to rcu and that patch in
> particular is next...

Hmmm...  Please try replacing the synchronize_rcu() in
__sysrq_swap_key_ops() with (say) schedule_timeout_interruptible(HZ / 10).
I bet that gets rid of the hang.  (And also introduces a low-probability
bug, but should be OK for testing.)

The other thing to try is to revert your patch that turned my event
traces into printk()s, then put an ftrace_dump(DUMP_ALL); just after
the synchronize_rcu() -- that might make it so that the ftrace data
actually gets dumped out.

Thanx, Paul



Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-22 Thread Amit Shah
On (Fri) 22 Aug 2014 [22:44:05], Amit Shah wrote:
> Hm, found it:
> 
> The stall happens in do_initcalls().
> 
> pm_sysrq_init() is the function that causes the hang.  When I #if 0
> the line
> 
> register_sysrq_key('o', &sysrq_poweroff_op);
> 
> in pm_sysrq_init(), the boot proceeds normally.
> 
> Now what this is, and what relation this has to rcu and that patch in
> particular is next...

... and enabling the following debug options makes the bug disappear:

CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_SELFTEST=y
CONFIG_DEBUG_OBJECTS_FREE=y
CONFIG_DEBUG_OBJECTS_TIMERS=y
CONFIG_DEBUG_OBJECTS_WORK=y
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER=y
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1

Anyway, so it looks like a race somewhere in the schedule_work_on()
chain.  Not sure how to capture the debug messages there w/o disabling
these debug options.  I'll keep trying, though.


Amit


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-22 Thread Amit Shah
On (Fri) 22 Aug 2014 [07:48:19], Paul E. McKenney wrote:
> On Fri, Aug 22, 2014 at 06:26:49PM +0530, Amit Shah wrote:
> > On (Fri) 22 Aug 2014 [18:06:51], Amit Shah wrote:
> > > On (Fri) 22 Aug 2014 [17:54:53], Amit Shah wrote:
> > > > On (Mon) 18 Aug 2014 [21:01:49], Paul E. McKenney wrote:
> > > > 
> > > > > The odds are low over the next few days.  I am adding nastier rcutorture
> > > > > testing, however.  It would still be very good to get debug information
> > > > > from your setup.  One approach would be to convert the trace function
> > > > > calls into printk(), if that would help.
> > > > 
> > > > I added a few printks on the lines of the traces in cases where
> > > > rcu_nocb_poll was checked -- since that reproduces the hang.  Are the
> > > > following traces sufficient, or should I keep adding more printks?
> > > > 
> > > > In the case of rcu-trace-nopoll.txt, the messages stop after a while
> > > > (when the guest locks up hard).  That's when I kill the qemu process.
> > > 
> > > And this is bt from gdb when the endless 
> > > 
> > >   RCUDEBUG __call_rcu_nocb_enqueue 2146 rcu_preempt 0 WakeNot
> > > 
> > > messages are being spewed.
> > > 
> > > I can't time it, but hope it gives some indication along with the printks.
> > 
> > ... and after the system 'locks up', this is the state it's in:
> > 
> > ^C
> > Program received signal SIGINT, Interrupt.
> > native_safe_halt () at ./arch/x86/include/asm/irqflags.h:50
> > 50   }
> > (gdb) bt
> > #0  native_safe_halt () at ./arch/x86/include/asm/irqflags.h:50
> > #1  0x8100b9c1 in arch_safe_halt () at ./arch/x86/include/asm/paravirt.h:111
> > #2  default_idle () at arch/x86/kernel/process.c:311
> > #3  0x8100c107 in arch_cpu_idle () at arch/x86/kernel/process.c:302
> > #4  0x8106a25a in cpuidle_idle_call () at kernel/sched/idle.c:120
> > #5  cpu_idle_loop () at kernel/sched/idle.c:220
> > #6  cpu_startup_entry (state=) at kernel/sched/idle.c:268
> > #7  0x813e068b in rest_init () at init/main.c:418
> > #8  0x81a8cf5a in start_kernel () at init/main.c:680
> > #9  0x81a8c4ba in x86_64_start_reservations (real_mode_data=) at arch/x86/kernel/head64.c:193
> > #10 0x81a8c607 in x86_64_start_kernel (real_mode_data=0x13f90 ) at arch/x86/kernel/head64.c:182
> > #11 0x in ?? ()
> > 
> > 
> > Wondering why it's doing this.  Am stepping through
> > cpu_startup_entry() to see if I get any clues.
> 
> This looks to me like normal behavior in the x86 ACPI idle loop.
> My guess is that the lockup is caused by indefinite blocking, in
> which case we would expect all the CPUs to be in the idle loop.

Hm, found it:

The stall happens in do_initcalls().

pm_sysrq_init() is the function that causes the hang.  When I #if 0
the line

register_sysrq_key('o', &sysrq_poweroff_op);

in pm_sysrq_init(), the boot proceeds normally.

Now what this is, and what relation this has to rcu and that patch in
particular is next...


Amit


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-22 Thread Paul E. McKenney
On Fri, Aug 22, 2014 at 06:26:49PM +0530, Amit Shah wrote:
> On (Fri) 22 Aug 2014 [18:06:51], Amit Shah wrote:
> > On (Fri) 22 Aug 2014 [17:54:53], Amit Shah wrote:
> > > On (Mon) 18 Aug 2014 [21:01:49], Paul E. McKenney wrote:
> > > 
> > > > The odds are low over the next few days.  I am adding nastier rcutorture
> > > > testing, however.  It would still be very good to get debug information
> > > > from your setup.  One approach would be to convert the trace function
> > > > calls into printk(), if that would help.
> > > 
> > > I added a few printks on the lines of the traces in cases where
> > > rcu_nocb_poll was checked -- since that reproduces the hang.  Are the
> > > following traces sufficient, or should I keep adding more printks?
> > > 
> > > In the case of rcu-trace-nopoll.txt, the messages stop after a while
> > > (when the guest locks up hard).  That's when I kill the qemu process.
> > 
> > And this is bt from gdb when the endless 
> > 
> >   RCUDEBUG __call_rcu_nocb_enqueue 2146 rcu_preempt 0 WakeNot
> > 
> > messages are being spewed.
> > 
> > I can't time it, but hope it gives some indication along with the printks.
> 
> ... and after the system 'locks up', this is the state it's in:
> 
> ^C
> Program received signal SIGINT, Interrupt.
> native_safe_halt () at ./arch/x86/include/asm/irqflags.h:50
> 50 }
> (gdb) bt
> #0  native_safe_halt () at ./arch/x86/include/asm/irqflags.h:50
> #1  0x8100b9c1 in arch_safe_halt () at ./arch/x86/include/asm/paravirt.h:111
> #2  default_idle () at arch/x86/kernel/process.c:311
> #3  0x8100c107 in arch_cpu_idle () at arch/x86/kernel/process.c:302
> #4  0x8106a25a in cpuidle_idle_call () at kernel/sched/idle.c:120
> #5  cpu_idle_loop () at kernel/sched/idle.c:220
> #6  cpu_startup_entry (state=) at kernel/sched/idle.c:268
> #7  0x813e068b in rest_init () at init/main.c:418
> #8  0x81a8cf5a in start_kernel () at init/main.c:680
> #9  0x81a8c4ba in x86_64_start_reservations (real_mode_data=) at arch/x86/kernel/head64.c:193
> #10 0x81a8c607 in x86_64_start_kernel (real_mode_data=0x13f90 ) at arch/x86/kernel/head64.c:182
> #11 0x in ?? ()
> 
> 
> Wondering why it's doing this.  Am stepping through
> cpu_startup_entry() to see if I get any clues.

This looks to me like normal behavior in the x86 ACPI idle loop.
My guess is that the lockup is caused by indefinite blocking, in
which case we would expect all the CPUs to be in the idle loop.

Of course, this all assumes that your system is using ACPI for idle.
(Is it?)

Thanx, Paul



Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-22 Thread Amit Shah
On (Fri) 22 Aug 2014 [18:06:51], Amit Shah wrote:
> On (Fri) 22 Aug 2014 [17:54:53], Amit Shah wrote:
> > On (Mon) 18 Aug 2014 [21:01:49], Paul E. McKenney wrote:
> > 
> > > The odds are low over the next few days.  I am adding nastier rcutorture
> > > testing, however.  It would still be very good to get debug information
> > > from your setup.  One approach would be to convert the trace function
> > > calls into printk(), if that would help.
> > 
> > I added a few printks on the lines of the traces in cases where
> > rcu_nocb_poll was checked -- since that reproduces the hang.  Are the
> > following traces sufficient, or should I keep adding more printks?
> > 
> > In the case of rcu-trace-nopoll.txt, the messages stop after a while
> > (when the guest locks up hard).  That's when I kill the qemu process.
> 
> And this is bt from gdb when the endless 
> 
>   RCUDEBUG __call_rcu_nocb_enqueue 2146 rcu_preempt 0 WakeNot
> 
> messages are being spewed.
> 
> I can't time it, but hope it gives some indication along with the printks.

... and after the system 'locks up', this is the state it's in:

^C
Program received signal SIGINT, Interrupt.
native_safe_halt () at ./arch/x86/include/asm/irqflags.h:50
50   }
(gdb) bt
#0  native_safe_halt () at ./arch/x86/include/asm/irqflags.h:50
#1  0x8100b9c1 in arch_safe_halt () at ./arch/x86/include/asm/paravirt.h:111
#2  default_idle () at arch/x86/kernel/process.c:311
#3  0x8100c107 in arch_cpu_idle () at arch/x86/kernel/process.c:302
#4  0x8106a25a in cpuidle_idle_call () at kernel/sched/idle.c:120
#5  cpu_idle_loop () at kernel/sched/idle.c:220
#6  cpu_startup_entry (state=) at kernel/sched/idle.c:268
#7  0x813e068b in rest_init () at init/main.c:418
#8  0x81a8cf5a in start_kernel () at init/main.c:680
#9  0x81a8c4ba in x86_64_start_reservations (real_mode_data=) at arch/x86/kernel/head64.c:193
#10 0x81a8c607 in x86_64_start_kernel (real_mode_data=0x13f90 ) at arch/x86/kernel/head64.c:182
#11 0x in ?? ()


Wondering why it's doing this.  Am stepping through
cpu_startup_entry() to see if I get any clues.


Amit


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-22 Thread Amit Shah
On (Fri) 22 Aug 2014 [17:54:53], Amit Shah wrote:
> On (Mon) 18 Aug 2014 [21:01:49], Paul E. McKenney wrote:
> 
> > The odds are low over the next few days.  I am adding nastier rcutorture
> > testing, however.  It would still be very good to get debug information
> > from your setup.  One approach would be to convert the trace function
> > calls into printk(), if that would help.
> 
> I added a few printks on the lines of the traces in cases where
> rcu_nocb_poll was checked -- since that reproduces the hang.  Are the
> following traces sufficient, or should I keep adding more printks?
> 
> In the case of rcu-trace-nopoll.txt, the messages stop after a while
> (when the guest locks up hard).  That's when I kill the qemu process.

And this is bt from gdb when the endless 

  RCUDEBUG __call_rcu_nocb_enqueue 2146 rcu_preempt 0 WakeNot

messages are being spewed.

I can't time it, but hope it gives some indication along with the printks.

Program received signal SIGINT, Interrupt.
io_serial_out (p=0x82940780 , offset=, value=) at drivers/tty/serial/8250/8250_core.c:439
439   }
(gdb) bt
#0  io_serial_out (p=0x82940780 , offset=, value=) at drivers/tty/serial/8250/8250_core.c:439
#1  0x812b260a in serial_port_out (up=, offset=, value=) at include/linux/serial_core.h:214
#2  0x812b4781 in serial8250_console_putchar (port=0x82940780 , ch=111) at drivers/tty/serial/8250/8250_core.c:2990
#3  0x812af07d in uart_console_write (port=0x82940780 , s=0x828dd96a  "t\n8fff]\nes: 4KB 0, 2MB 0, 4MB 0, 1GB 0\n6K bss, 33036K reserved)\n2 17:46:45 IST 2014\n", count=60, putchar=0x812b4758 ) at drivers/tty/serial/serial_core.c:1747
#4  0x812b470c in serial8250_console_write (co=, s=, count=60) at drivers/tty/serial/8250/8250_core.c:3025
#5  0x8107f517 in call_console_drivers (level=, len=60, text=) at kernel/printk/printk.c:1421
#6  0x81080498 in console_unlock () at kernel/printk/printk.c:2244
#7  0x81080b39 in vprintk_emit (facility=, level=, dict=, dictlen=, fmt=, args=) at kernel/printk/printk.c:1786
#8  0x813e5235 in printk (fmt=) at kernel/printk/printk.c:1851
#9  0x8108e46b in __call_rcu_nocb_enqueue (rdp=0x88000fbcce00, rhp=, rhtp=, rhcount=, rhcount_lazy=, flags=) at kernel/rcu/tree_plugin.h:2144
#10 0x81091140 in __call_rcu_nocb (flags=, lazy=, rhp=, rdp=) at kernel/rcu/tree_plugin.h:2166
#11 __call_rcu (head=0x88000e6c5390, func=0x81131346 , rsp=0x818389c0 , cpu=, lazy=) at kernel/rcu/tree.c:2687
#12 0x81091673 in call_rcu (head=, func=) at kernel/rcu/tree_plugin.h:678
#13 0x81131756 in put_object (object=0x88000e6c5308) at mm/kmemleak.c:471
#14 0x81131b8c in delete_object_full (ptr=) at mm/kmemleak.c:641
#15 0x813e1782 in kmemleak_free (ptr=) at mm/kmemleak.c:944
#16 0x81128782 in kmemleak_free_recursive (flags=, ptr=) at include/linux/kmemleak.h:50
#17 slab_free_hook (s=0x82940780 , x=0x88000e991c68) at mm/slub.c:1265
#18 0x8112a725 in slab_free (addr=, x=, page=, s=) at mm/slub.c:2644
#19 kmem_cache_free (s=, x=0x88000e991c68) at mm/slub.c:2681
#20 0x8121d84c in ida_get_new_above (ida=0x82940780 , starting_id=, p_id=) at lib/idr.c:999
#21 0x8121dbe6 in ida_simple_get (ida=0x82940780 , start=1016, end=, gfp_mask=0) at lib/idr.c:1101
#22 0x81188f19 in __kernfs_new_node (root=, name=0x0 , mode=33060, flags=514) at fs/kernfs/dir.c:530
#23 0x81189e22 in kernfs_new_node (parent=0x88000e651000, name=, mode=33060, flags=) at fs/kernfs/dir.c:558
#24 0x8118b3a3 in __kernfs_create_file (parent=, name=, mode=, size=4096, ops=0x81424a80 , priv=, ns=0x0 , name_is_static=true, key=0x81bc3a20 <__key.17290>) at fs/kernfs/file.c:920
#25 0x8118bb6e in sysfs_add_file_mode_ns (parent=0x88000e651000, attr=0x88000e621358, is_bin=, mode=, ns=) at fs/sysfs/file.c:256
#26 0x8118c4c0 in create_files (update=, grp=, kobj=, parent=) at fs/sysfs/group.c:58
#27 internal_create_group (kobj=0x88000e67a1a8, update=, grp=0x88000e621298) at fs/sysfs/group.c:116
#28 0x8118c562 in sysfs_create_group (kobj=, grp=) at fs/sysfs/group.c:138
#29 0x81aa09e9 in kernel_add_sysfs_param (name_skip=, kparam=, name=) at kernel/params.c:783
#30 param_sysfs_builtin () at kernel/params.c:820
#31 param_sysfs_init () at kernel/params.c:940
#32 0x810003f4 in do_one_initcall (fn=0x81aa0886 ) at init/main.c:791
#33 0x81a8d08a in do_initcall_level (level=) at init/main.c:857
#34 do_initcalls () at init/main.c:865
#35 do_basic_setup () at init/main.c:884
#36 kernel_init_freeable () at init/main.c:1005
#37 0x813e084d in kernel_init (unused=) at init/main.c:935
#38 
#39 0x in irq_stack_union ()
#40 0x000

Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-18 Thread Paul E. McKenney
On Mon, Aug 18, 2014 at 11:23:45PM +0530, Amit Shah wrote:
> On (Fri) 15 Aug 2014 [08:04:05], Paul E. McKenney wrote:
> > On Fri, Aug 15, 2014 at 10:54:11AM +0530, Amit Shah wrote:
> > > On (Wed) 13 Aug 2014 [06:00:49], Paul E. McKenney wrote:
> > > > On Wed, Aug 13, 2014 at 11:14:39AM +0530, Amit Shah wrote:
> > > > > On (Tue) 12 Aug 2014 [14:41:51], Paul E. McKenney wrote:
> > > > > > On Tue, Aug 12, 2014 at 02:39:36PM -0700, Paul E. McKenney wrote:
> > > > > > > On Tue, Aug 12, 2014 at 09:06:21AM -0700, Paul E. McKenney wrote:
> > > > > > > > On Tue, Aug 12, 2014 at 11:03:21AM +0530, Amit Shah wrote:
> > > > > > > 
> > > > > > > [ . . . ]
> > > > > > > 
> > > > > > > > > I know of only virtio-console doing this (via userspace only,
> > > > > > > > > though).
> > > > > > > > 
> > > > > > > > As in userspace within the guest?  That would not work.  The userspace
> > > > > > > > that the qemu is running in might.  There is a way to extract ftrace info
> > > > > > > > from crash dumps, so one approach would be "sendkey alt-sysrq-c", then
> > > > > > > > pull the buffer from the resulting dump.  For all I know, there might also
> > > > > > > > be some script that uses the qemu "x" command to get at the ftrace buffer.
> > > > > > > > 
> > > > > > > > Again, I cannot reproduce this, and I have been through the code several
> > > > > > > > times over the past few days, and am not seeing it.  I could start
> > > > > > > > sending you random diagnostic patches, but it would be much better if
> > > > > > > > we could get the trace data from the failure.
> > > > > 
> > > > > I think the only recourse I now have is to dump the guest state from
> > > > > qemu, and attempt to find the ftrace buffers by poking pages and
> > > > > finding some ftrace-like struct... and then dumping the buffers.
> > > > 
> > > > The data exists in the qemu guest state, so it would be good to have
> > > > it one way or another.  My current (perhaps self-serving) guess is that
> > > > you have come up with a way to trick qemu into dropping IPIs.
> > > 
> > > I didn't get around to doing this yet; will get to it next week.
> > > 
> > > In the meantime, I tried this on RHEL6 (with RHEL6 qemu and gcc and
> > > seabios), and that exhibits the problem similarly with my .config.
> > 
> > And I am running my tests successfully on an x86_64 system running
> > Ubuntu 12.04.  Some testing on 14.04 seems to require booting with
> > acpi=off, leading to my perhaps self-serving guess above.
> 
> It looks like Ubuntu 12.04 has a choice of multiple kernels.  Which
> one are you running?

3.2.0-67-generic-pae and 3.13.0-30-generic for the host.

> Also, is there a chance you could try this on a RHEL6 box?

The odds are low over the next few days.  I am adding nastier rcutorture
testing, however.  It would still be very good to get debug information
from your setup.  One approach would be to convert the trace function
calls into printk(), if that would help.

Thanx, Paul



Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-18 Thread Amit Shah
On (Fri) 15 Aug 2014 [08:04:05], Paul E. McKenney wrote:
> On Fri, Aug 15, 2014 at 10:54:11AM +0530, Amit Shah wrote:
> > On (Wed) 13 Aug 2014 [06:00:49], Paul E. McKenney wrote:
> > > On Wed, Aug 13, 2014 at 11:14:39AM +0530, Amit Shah wrote:
> > > > On (Tue) 12 Aug 2014 [14:41:51], Paul E. McKenney wrote:
> > > > > On Tue, Aug 12, 2014 at 02:39:36PM -0700, Paul E. McKenney wrote:
> > > > > > On Tue, Aug 12, 2014 at 09:06:21AM -0700, Paul E. McKenney wrote:
> > > > > > > On Tue, Aug 12, 2014 at 11:03:21AM +0530, Amit Shah wrote:
> > > > > > 
> > > > > > [ . . . ]
> > > > > > 
> > > > > > > > I know of only virtio-console doing this (via userspace only,
> > > > > > > > though).
> > > > > > > 
> > > > > > > As in userspace within the guest?  That would not work.  The userspace
> > > > > > > that the qemu is running in might.  There is a way to extract ftrace info
> > > > > > > from crash dumps, so one approach would be "sendkey alt-sysrq-c", then
> > > > > > > pull the buffer from the resulting dump.  For all I know, there might also
> > > > > > > be some script that uses the qemu "x" command to get at the ftrace buffer.
> > > > > > > 
> > > > > > > Again, I cannot reproduce this, and I have been through the code several
> > > > > > > times over the past few days, and am not seeing it.  I could start
> > > > > > > sending you random diagnostic patches, but it would be much better if
> > > > > > > we could get the trace data from the failure.
> > > > 
> > > > I think the only recourse I now have is to dump the guest state from
> > > > qemu, and attempt to find the ftrace buffers by poking pages and
> > > > finding some ftrace-like struct... and then dumping the buffers.
> > > 
> > > The data exists in the qemu guest state, so it would be good to have
> > > it one way or another.  My current (perhaps self-serving) guess is that
> > > you have come up with a way to trick qemu into dropping IPIs.
> > 
> > I didn't get around to doing this yet; will get to it next week.
> > 
> > In the meantime, I tried this on RHEL6 (with RHEL6 qemu and gcc and
> > seabios), and that exhibits the problem similarly with my .config.
> 
> And I am running my tests successfully on an x86_64 system running
> Ubuntu 12.04.  Some testing on 14.04 seems to require booting with
> acpi=off, leading to my perhaps self-serving guess above.

It looks like Ubuntu 12.04 has a choice of multiple kernels.  Which
one are you running?

Also, is there a chance you could try this on a RHEL6 box?

Thanks,

Amit


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-15 Thread Paul E. McKenney
On Fri, Aug 15, 2014 at 10:54:11AM +0530, Amit Shah wrote:
> On (Wed) 13 Aug 2014 [06:00:49], Paul E. McKenney wrote:
> > On Wed, Aug 13, 2014 at 11:14:39AM +0530, Amit Shah wrote:
> > > On (Tue) 12 Aug 2014 [14:41:51], Paul E. McKenney wrote:
> > > > On Tue, Aug 12, 2014 at 02:39:36PM -0700, Paul E. McKenney wrote:
> > > > > On Tue, Aug 12, 2014 at 09:06:21AM -0700, Paul E. McKenney wrote:
> > > > > > On Tue, Aug 12, 2014 at 11:03:21AM +0530, Amit Shah wrote:
> > > > > 
> > > > > [ . . . ]
> > > > > 
> > > > > > > I know of only virtio-console doing this (via userspace only,
> > > > > > > though).
> > > > > > 
> > > > > > As in userspace within the guest?  That would not work.  The userspace
> > > > > > that the qemu is running in might.  There is a way to extract ftrace info
> > > > > > from crash dumps, so one approach would be "sendkey alt-sysrq-c", then
> > > > > > pull the buffer from the resulting dump.  For all I know, there might also
> > > > > > be some script that uses the qemu "x" command to get at the ftrace buffer.
> > > > > > 
> > > > > > Again, I cannot reproduce this, and I have been through the code several
> > > > > > times over the past few days, and am not seeing it.  I could start
> > > > > > sending you random diagnostic patches, but it would be much better if
> > > > > > we could get the trace data from the failure.
> > > 
> > > I think the only recourse I now have is to dump the guest state from
> > > qemu, and attempt to find the ftrace buffers by poking pages and
> > > finding some ftrace-like struct... and then dumping the buffers.
> > 
> > The data exists in the qemu guest state, so it would be good to have
> > it one way or another.  My current (perhaps self-serving) guess is that
> > you have come up with a way to trick qemu into dropping IPIs.
> 
> I didn't get around to doing this yet; will get to it next week.
> 
> In the meantime, I tried this on RHEL6 (with RHEL6 qemu and gcc and
> seabios), and that exhibits the problem similarly with my .config.

And I am running my tests successfully on an x86_64 system running
Ubuntu 12.04.  Some testing on 14.04 seems to require booting with
acpi=off, leading to my perhaps self-serving guess above.

> 
> 
> > > > +
> > > > return true;
> > > 
> > > I have return 1; here.
> > > 
> > > I'm on linux.git, c8d6637d0497d62093dbba0694c7b3a80b79bfe1.
> > 
> > I am working on top of my -rcu tree, which contains the fix from "1" to
> > "true" compared to current mainline.  So this will resolve itself, and
> > you should be OK fixing up conflict in either direction.
> 
> Yep, I did do that.  Just noted here that the hunk didn't directly
> apply.

Fair enough, thank you for letting me know.

Thanx, Paul



Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-14 Thread Amit Shah
On (Wed) 13 Aug 2014 [06:00:49], Paul E. McKenney wrote:
> On Wed, Aug 13, 2014 at 11:14:39AM +0530, Amit Shah wrote:
> > On (Tue) 12 Aug 2014 [14:41:51], Paul E. McKenney wrote:
> > > On Tue, Aug 12, 2014 at 02:39:36PM -0700, Paul E. McKenney wrote:
> > > > On Tue, Aug 12, 2014 at 09:06:21AM -0700, Paul E. McKenney wrote:
> > > > > On Tue, Aug 12, 2014 at 11:03:21AM +0530, Amit Shah wrote:
> > > > 
> > > > [ . . . ]
> > > > 
> > > > > > I know of only virtio-console doing this (via userspace only,
> > > > > > though).
> > > > > 
> > > > > As in userspace within the guest?  That would not work.  The userspace
> > > > > that the qemu is running in might.  There is a way to extract ftrace info
> > > > > from crash dumps, so one approach would be "sendkey alt-sysrq-c", then
> > > > > pull the buffer from the resulting dump.  For all I know, there might also
> > > > > be some script that uses the qemu "x" command to get at the ftrace buffer.
> > > > > 
> > > > > Again, I cannot reproduce this, and I have been through the code several
> > > > > times over the past few days, and am not seeing it.  I could start
> > > > > sending you random diagnostic patches, but it would be much better if
> > > > > we could get the trace data from the failure.
> > 
> > I think the only recourse I now have is to dump the guest state from
> > qemu, and attempt to find the ftrace buffers by poking pages and
> > finding some ftrace-like struct... and then dumping the buffers.
> 
> The data exists in the qemu guest state, so it would be good to have
> it one way or another.  My current (perhaps self-serving) guess is that
> you have come up with a way to trick qemu into dropping IPIs.

I didn't get around to doing this yet; will get to it next week.

In the meantime, I tried this on RHEL6 (with RHEL6 qemu and gcc and
seabios), and that exhibits the problem similarly with my .config.



> > > +
> > >   return true;
> > 
> > I have return 1; here.
> > 
> > I'm on linux.git, c8d6637d0497d62093dbba0694c7b3a80b79bfe1.
> 
> I am working on top of my -rcu tree, which contains the fix from "1" to
> "true" compared to current mainline.  So this will resolve itself, and
> you should be OK fixing up conflict in either direction.

Yep, I did do that.  Just noted here that the hunk didn't directly
apply.

Thanks,

Amit


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-13 Thread Paul E. McKenney
On Wed, Aug 13, 2014 at 06:00:49AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 13, 2014 at 11:14:39AM +0530, Amit Shah wrote:
> > On (Tue) 12 Aug 2014 [14:41:51], Paul E. McKenney wrote:
> > > On Tue, Aug 12, 2014 at 02:39:36PM -0700, Paul E. McKenney wrote:
> > > > On Tue, Aug 12, 2014 at 09:06:21AM -0700, Paul E. McKenney wrote:
> > > > > On Tue, Aug 12, 2014 at 11:03:21AM +0530, Amit Shah wrote:
> > > > 
> > > > [ . . . ]
> > > > 
> > > > > > I know of only virtio-console doing this (via userspace only,
> > > > > > though).
> > > > > 
> > > > > As in userspace within the guest?  That would not work.  The userspace
> > > > > that the qemu is running in might.  There is a way to extract ftrace info
> > > > > from crash dumps, so one approach would be "sendkey alt-sysrq-c", then
> > > > > pull the buffer from the resulting dump.  For all I know, there might also
> > > > > be some script that uses the qemu "x" command to get at the ftrace buffer.
> > > > > 
> > > > > Again, I cannot reproduce this, and I have been through the code several
> > > > > times over the past few days, and am not seeing it.  I could start
> > > > > sending you random diagnostic patches, but it would be much better if
> > > > > we could get the trace data from the failure.
> > 
> > I think the only recourse I now have is to dump the guest state from
> > qemu, and attempt to find the ftrace buffers by poking pages and
> > finding some ftrace-like struct... and then dumping the buffers.
> 
> The data exists in the qemu guest state, so it would be good to have
> it one way or another.  My current (perhaps self-serving) guess is that
> you have come up with a way to trick qemu into dropping IPIs.

Oh, and I wrote up my last inspection of the nocb code.  Please see below.

Thanx, Paul



Given that specifying rcu_nocb_poll avoids the hang, the natural focus
is on checks for rcu_nocb_poll:

o   __call_rcu_nocb_enqueue() skips the wakeup if rcu_nocb_poll,
which is legitimate because the rcuo kthreads do their own
wakeups in this case.

o   nocb_leader_wait() does wait_event_interruptible() on
my_rdp->nocb_leader_wake if !rcu_nocb_poll.  So one further
question is the handling of ->nocb_leader_wake.

The thing to check for is if ->nocb_leader_wake can get set to
true without a wakeup while the leader sleeps, as this would
clearly lead to a hang.  Checking each use:

o   wake_nocb_leader() tests ->nocb_leader_wake, and
if false, sets it and does a wakeup.  The set is
ordered after the test due to control dependencies.
Multiple followers might concurrently attempt to
wake their leader, and this can result in multiple
wakeups, which should be OK -- we only need one
wakeup, so more won't hurt.

Here, every time ->nocb_leader_wake is set, we
follow up with a wakeup, so this particular use
avoids the sleep-while-set problem.

It is also important to note that

o   nocb_leader_wait() waits for ->nocb_leader_wake, as
noted above.

o   nocb_leader_wait() checks for spurious wakeups, but
before sleeping again, it clears ->nocb_leader_wake,
does a memory barrier, and rescans the callback
queues, and sets ->nocb_leader_wake if any have
callbacks.  Either way, it goes to wait again.  If it
set ->nocb_leader_wake, then the wait won't wait,
as required.

The check for spurious wakeups also moves callbacks
to an intermediate list for the grace-period-wait
operation.  This ensures that we don't prematurely
invoke any callbacks that arrive while the grace period
is in progress.

o   If the wakeup was real, nocb_leader_wait() clears
->nocb_leader_wake, does a memory barrier, and moves
callbacks from the intermediate lists to the followers'
lists (including itself, as a leader is its own first
follower).  During this move, the leader checks for
new callbacks having arrived during the grace period,
and sets ->nocb_leader_wake if there are any, again
short-circuiting the following wake.

o   Note that nocb_leader_wait() never sets ->nocb_leader_wake
unless it has found callbacks waiting for it, and that
setting ->nocb_leader_wake short-circuits the next wait,
so that a wakeup is not required.
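The checklist above boils down to one invariant: the wake flag is never left set while the leader sleeps unless a wakeup follows. The sketch below is a userspace model of that handshake using POSIX mutexes and condition variables, not the kernel code. The mutex stands in for the control dependency and memory barrier the real code relies on, which also makes a separate rescan unnecessary here; the spurious-wakeup rescan and the intermediate grace-period list are not modeled. All names (struct nocb_model, run_nocb_model(), NFOLLOWERS, and so on) are hypothetical; only leader_wake mirrors ->nocb_leader_wake.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NFOLLOWERS   4		/* arbitrary, for the model only */
#define PER_FOLLOWER 10000	/* callbacks each follower enqueues */

/* Hypothetical model state; leader_wake mirrors ->nocb_leader_wake. */
struct nocb_model {
	pthread_mutex_t lock;
	pthread_cond_t wq;
	bool leader_wake;	/* leader has work to look at */
	long queued;		/* enqueued, not yet taken by the leader */
	long processed;		/* taken by the leader */
	long wakeups;		/* signals actually sent by followers */
};

static struct nocb_model M;

/* Follower: enqueue, then wake the leader only on a false->true
 * transition of leader_wake.  Every set is followed by a signal. */
static void *follower(void *arg)
{
	(void)arg;
	for (int i = 0; i < PER_FOLLOWER; i++) {
		pthread_mutex_lock(&M.lock);
		M.queued++;
		if (!M.leader_wake) {
			M.leader_wake = true;
			M.wakeups++;
			pthread_cond_signal(&M.wq);
		}
		pthread_mutex_unlock(&M.lock);
	}
	return NULL;
}

/* Leader: sleep until flagged, clear the flag, take the queued work.
 * The inner while loop also absorbs spurious wakeups. */
static void *leader(void *arg)
{
	const long total = (long)NFOLLOWERS * PER_FOLLOWER;
	bool done = false;

	(void)arg;
	while (!done) {
		pthread_mutex_lock(&M.lock);
		while (!M.leader_wake)
			pthread_cond_wait(&M.wq, &M.lock);
		M.leader_wake = false;	/* clear before taking the work */
		M.processed += M.queued;
		M.queued = 0;
		done = (M.processed == total);
		pthread_mutex_unlock(&M.lock);
		/* Callback processing would happen here, unlocked.  A
		 * follower enqueuing now sees leader_wake == false, sets
		 * it and signals.  The signal may find nobody waiting,
		 * but the flag persists, so the next wait returns at once. */
	}
	return NULL;
}

/* Runs the model once; returns the number of callbacks processed. */
long run_nocb_model(long *wakeups_out)
{
	pthread_t ldr, fol[NFOLLOWERS];

	pthread_mutex_init(&M.lock, NULL);
	pthread_cond_init(&M.wq, NULL);
	M.leader_wake = false;
	M.queued = M.processed = M.wakeups = 0;

	pthread_create(&ldr, NULL, leader, NULL);
	for (int i = 0; i < NFOLLOWERS; i++)
		pthread_create(&fol[i], NULL, follower, NULL);
	for (int i = 0; i < NFOLLOWERS; i++)
		pthread_join(fol[i], NULL);
	pthread_join(ldr, NULL);

	if (wakeups_out)
		*wakeups_out = M.wakeups;
	pthread_cond_destroy(&M.wq);
	pthread_mutex_destroy(&M.lock);
	return M.processed;
}
```

Built with -pthread, run_nocb_model(&w) should return NFOLLOWERS * PER_FOLLOWER (40000) with w somewhere between 1 and 40000, because a follower signals only when it flips the flag from false to true. As in the analysis above, redundant wakeups are tolerated; a missing one is not, and the invariant "queued work implies leader_wake" is what rules it out.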

Note also that every time nocb_leader_wait() clears

Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-13 Thread Paul E. McKenney
On Wed, Aug 13, 2014 at 11:14:39AM +0530, Amit Shah wrote:
> On (Tue) 12 Aug 2014 [14:41:51], Paul E. McKenney wrote:
> > On Tue, Aug 12, 2014 at 02:39:36PM -0700, Paul E. McKenney wrote:
> > > On Tue, Aug 12, 2014 at 09:06:21AM -0700, Paul E. McKenney wrote:
> > > > On Tue, Aug 12, 2014 at 11:03:21AM +0530, Amit Shah wrote:
> > > 
> > > [ . . . ]
> > > 
> > > > > I know of only virtio-console doing this (via userspace only,
> > > > > though).
> > > > 
> > > > As in userspace within the guest?  That would not work.  The userspace
> > > > that the qemu is running in might.  There is a way to extract ftrace info
> > > > from crash dumps, so one approach would be "sendkey alt-sysrq-c", then
> > > > pull the buffer from the resulting dump.  For all I know, there might also
> > > > be some script that uses the qemu "x" command to get at the ftrace buffer.
> > > > 
> > > > Again, I cannot reproduce this, and I have been through the code several
> > > > times over the past few days, and am not seeing it.  I could start
> > > > sending you random diagnostic patches, but it would be much better if
> > > > we could get the trace data from the failure.
> 
> I think the only recourse I now have is to dump the guest state from
> qemu, and attempt to find the ftrace buffers by poking pages and
> finding some ftrace-like struct... and then dumping the buffers.

The data exists in the qemu guest state, so it would be good to have
it one way or another.  My current (perhaps self-serving) guess is that
you have come up with a way to trick qemu into dropping IPIs.

> > > Hearing no objections, random patch #1.  The compiler could in theory
> > > cause trouble without this patch, so there is some possibility that
> > > it is a fix.
> > 
> > #2...  This would have been a problem without the earlier patch, but
> > who knows?  (#1 moved from theoretically possible but not on x86 to
> > maybe on x86 given a sufficiently malevolent compiler with the
> > patch that you located with bisection.)
> 
> I tried all 3 patches individually, and all 3 together, no success.

I am not at all surprised.  You would have to have an extremely malevolent
compiler for two of them to have any effect, and you would have to have
someone invoking call_rcu() with irqs disabled from idle for the other
to have any effect.  Which is why I missed seeing them the first three
times I reviewed this code over the past few days.

> My gcc is gcc-4.8.3-1.fc20.x86_64.  I'm using a fairly up-to-date Fedora
> 20 system on my laptop for these tests.
> 
> Curiously, patches 1 and 3 applied fine, but this one had a conflict.
> 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 1dc72f523c4a..1da605740e8d 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -2137,6 +2137,17 @@ static bool __call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *rhp,
> 
> I have this hunk at line 2161, and...
> 
> >  		trace_rcu_callback(rdp->rsp->name, rhp,
> >  				   -atomic_long_read(&rdp->nocb_q_count_lazy),
> >  				   -atomic_long_read(&rdp->nocb_q_count));
> > +
> > +	/*
> > +	 * If called from an extended quiescent state with interrupts
> > +	 * disabled, invoke the RCU core in order to allow the idle-entry
> > +	 * deferred-wakeup check to function.
> > +	 */
> > +	if (irqs_disabled_flags(flags) &&
> > +	    !rcu_is_watching() &&
> > +	    cpu_online(smp_processor_id()))
> > +		invoke_rcu_core();
> > +
> >  	return true;
> 
> I have return 1; here.
> 
> I'm on linux.git, c8d6637d0497d62093dbba0694c7b3a80b79bfe1.

I am working on top of my -rcu tree, which contains the fix from "1" to
"true" compared to current mainline.  So this will resolve itself, and
you should be OK fixing up the conflict in either direction.

Thanx, Paul


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-12 Thread Amit Shah
On (Tue) 12 Aug 2014 [14:41:51], Paul E. McKenney wrote:
> On Tue, Aug 12, 2014 at 02:39:36PM -0700, Paul E. McKenney wrote:
> > On Tue, Aug 12, 2014 at 09:06:21AM -0700, Paul E. McKenney wrote:
> > > On Tue, Aug 12, 2014 at 11:03:21AM +0530, Amit Shah wrote:
> > 
> > [ . . . ]
> > 
> > > > I know of only virtio-console doing this (via userspace only,
> > > > though).
> > > 
> > > As in userspace within the guest?  That would not work.  The userspace
> > > that the qemu is running in might.  There is a way to extract ftrace info
> > > from crash dumps, so one approach would be "sendkey alt-sysrq-c", then
> > > pull the buffer from the resulting dump.  For all I know, there might also
> > > be some script that uses the qemu "x" command to get at the ftrace buffer.
> > > 
> > > Again, I cannot reproduce this, and I have been through the code several
> > > times over the past few days, and am not seeing it.  I could start
> > > sending you random diagnostic patches, but it would be much better if
> > > we could get the trace data from the failure.

I think the only recourse I now have is to dump the guest state from
qemu, and attempt to find the ftrace buffers by poking pages and
finding some ftrace-like struct... and then dumping the buffers.

> > Hearing no objections, random patch #1.  The compiler could in theory
> > cause trouble without this patch, so there is some possibility that
> > it is a fix.
> 
> #2...  This would have been a problem without the earlier patch, but
> who knows?  (#1 moved from theoretically possible but not on x86 to
> maybe on x86 given a sufficiently malevolent compiler with the
> patch that you located with bisection.)

I tried all 3 patches individually, and all 3 together, no success.

My gcc is gcc-4.8.3-1.fc20.x86_64.  I'm using a fairly up-to-date Fedora
20 system on my laptop for these tests.

Curiously, patches 1 and 3 applied fine, but this one had a conflict.

> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 1dc72f523c4a..1da605740e8d 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2137,6 +2137,17 @@ static bool __call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *rhp,

I have this hunk at line 2161, and...

>  		trace_rcu_callback(rdp->rsp->name, rhp,
>  				   -atomic_long_read(&rdp->nocb_q_count_lazy),
>  				   -atomic_long_read(&rdp->nocb_q_count));
> +
> +	/*
> +	 * If called from an extended quiescent state with interrupts
> +	 * disabled, invoke the RCU core in order to allow the idle-entry
> +	 * deferred-wakeup check to function.
> +	 */
> +	if (irqs_disabled_flags(flags) &&
> +	    !rcu_is_watching() &&
> +	    cpu_online(smp_processor_id()))
> +		invoke_rcu_core();
> +
>  	return true;

I have return 1; here.

I'm on linux.git, c8d6637d0497d62093dbba0694c7b3a80b79bfe1.


Amit


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-12 Thread Paul E. McKenney
On Tue, Aug 12, 2014 at 02:41:51PM -0700, Paul E. McKenney wrote:
> On Tue, Aug 12, 2014 at 02:39:36PM -0700, Paul E. McKenney wrote:
> > On Tue, Aug 12, 2014 at 09:06:21AM -0700, Paul E. McKenney wrote:
> > > On Tue, Aug 12, 2014 at 11:03:21AM +0530, Amit Shah wrote:
> > 
> > [ . . . ]
> > 
> > > > I know of only virtio-console doing this (via userspace only,
> > > > though).
> > > 
> > > As in userspace within the guest?  That would not work.  The userspace
> > > that the qemu is running in might.  There is a way to extract ftrace info
> > > from crash dumps, so one approach would be "sendkey alt-sysrq-c", then
> > > pull the buffer from the resulting dump.  For all I know, there might also
> > > be some script that uses the qemu "x" command to get at the ftrace buffer.
> > > 
> > > Again, I cannot reproduce this, and I have been through the code several
> > > times over the past few days, and am not seeing it.  I could start
> > > sending you random diagnostic patches, but it would be much better if
> > > we could get the trace data from the failure.
> > 
> > Hearing no objections, random patch #1.  The compiler could in theory
> > cause trouble without this patch, so there is some possibility that
> > it is a fix.
> 
> #2...  This would have been a problem without the earlier patch, but
> who knows?  (#1 moved from theoretically possible but not on x86 to
> maybe on x86 given a sufficiently malevolent compiler with the
> patch that you located with bisection.)

#3...  This one is theoretically possible, but not on any system that has
a full barrier on lock acquisition.  This code did not exist before the
patch you located via bisection, but on the other hand, given that you
were running only two CPUs, it should be dead code anyway.  But who knows?

Thanx, Paul



diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 1da605740e8d..70bff565dab6 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2314,6 +2314,7 @@ wait_again:
 		atomic_long_add(rdp->nocb_gp_count, &rdp->nocb_follower_count);
 		atomic_long_add(rdp->nocb_gp_count_lazy,
 				&rdp->nocb_follower_count_lazy);
+		smp_mb__after_atomic(); /* Store *tail before wakeup. */
 		if (rdp != my_rdp && tail == &rdp->nocb_follower_head) {
 			/*
 			 * List was empty, wake up the follower.
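The `tail == &rdp->nocb_follower_head` test in this hunk relies on a tail-pointer list. If the old tail still pointed at the list head, the list was empty and the follower needs a wakeup. The new comment says the store through `*tail` must be ordered before that wakeup. What follows is a minimal single-node sketch of the technique in C11. It assumes the old tail is obtained with an atomic exchange, and it uses a seq_cst fence as a stand-in for `smp_mb__after_atomic()`. The `cb`/`cb_list` types and `cb_enqueue()` are hypothetical, and the consumer side is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node and list types; not the kernel's rcu_head/rcu_data. */
struct cb {
	_Atomic(struct cb *) next;
	int id;
};

struct cb_list {
	_Atomic(struct cb *) head;		/* models ->nocb_follower_head */
	_Atomic(_Atomic(struct cb *) *) tail;	/* models ->nocb_follower_tail */
};

static void cb_list_init(struct cb_list *l)
{
	atomic_init(&l->head, NULL);
	atomic_init(&l->tail, &l->head);	/* empty list: tail points at head */
}

/*
 * Append one node.  Returns true if the list was empty beforehand, in
 * which case the caller would wake the consumer.
 */
static bool cb_enqueue(struct cb_list *l, struct cb *c)
{
	_Atomic(struct cb *) *old_tail;

	assert(c != NULL);
	atomic_init(&c->next, NULL);		/* c is still private here */
	old_tail = atomic_exchange(&l->tail, &c->next);
	/* Until this store lands, a consumer walking from head stops early. */
	atomic_store_explicit(old_tail, c, memory_order_relaxed);
	/* Stand-in for smp_mb__after_atomic(): order the *tail store before any wakeup. */
	atomic_thread_fence(memory_order_seq_cst);
	return old_tail == &l->head;
}
```

Enqueuing two nodes into an empty list returns true for the first (wake the consumer) and false for the second. Afterward, `head` points at the first node and its `next` points at the second.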


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-12 Thread Paul E. McKenney
On Tue, Aug 12, 2014 at 02:39:36PM -0700, Paul E. McKenney wrote:
> On Tue, Aug 12, 2014 at 09:06:21AM -0700, Paul E. McKenney wrote:
> > On Tue, Aug 12, 2014 at 11:03:21AM +0530, Amit Shah wrote:
> 
> [ . . . ]
> 
> > > I know of only virtio-console doing this (via userspace only,
> > > though).
> > 
> > As in userspace within the guest?  That would not work.  The userspace
> > that the qemu is running in might.  There is a way to extract ftrace info
> > from crash dumps, so one approach would be "sendkey alt-sysrq-c", then
> > pull the buffer from the resulting dump.  For all I know, there might also
> > be some script that uses the qemu "x" command to get at the ftrace buffer.
> > 
> > Again, I cannot reproduce this, and I have been through the code several
> > times over the past few days, and am not seeing it.  I could start
> > sending you random diagnostic patches, but it would be much better if
> > we could get the trace data from the failure.
> 
> Hearing no objections, random patch #1.  The compiler could in theory
> cause trouble without this patch, so there is some possibility that
> it is a fix.

#2...  This would have been a problem without the earlier patch, but
who knows?  (#1 moved from theoretically possible but not on x86 to
maybe on x86 given a sufficiently malevolent compiler with the
patch that you located with bisection.)

Thanx, Paul



diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 1dc72f523c4a..1da605740e8d 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2137,6 +2137,17 @@ static bool __call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *rhp,
 		trace_rcu_callback(rdp->rsp->name, rhp,
 				   -atomic_long_read(&rdp->nocb_q_count_lazy),
 				   -atomic_long_read(&rdp->nocb_q_count));
+
+	/*
+	 * If called from an extended quiescent state with interrupts
+	 * disabled, invoke the RCU core in order to allow the idle-entry
+	 * deferred-wakeup check to function.
+	 */
+	if (irqs_disabled_flags(flags) &&
+	    !rcu_is_watching() &&
+	    cpu_online(smp_processor_id()))
+		invoke_rcu_core();
+
 	return true;
 }
 


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-12 Thread Paul E. McKenney
On Tue, Aug 12, 2014 at 09:06:21AM -0700, Paul E. McKenney wrote:
> On Tue, Aug 12, 2014 at 11:03:21AM +0530, Amit Shah wrote:

[ . . . ]

> > I know of only virtio-console doing this (via userspace only,
> > though).
> 
> As in userspace within the guest?  That would not work.  The userspace
> that the qemu is running in might.  There is a way to extract ftrace info
> from crash dumps, so one approach would be "sendkey alt-sysrq-c", then
> pull the buffer from the resulting dump.  For all I know, there might also
> be some script that uses the qemu "x" command to get at the ftrace buffer.
> 
> Again, I cannot reproduce this, and I have been through the code several
> times over the past few days, and am not seeing it.  I could start
> sending you random diagnostic patches, but it would be much better if
> we could get the trace data from the failure.

Hearing no objections, random patch #1.  The compiler could in theory
cause trouble without this patch, so there is some possibility that
it is a fix.

Thanx, Paul



diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index a2333d07f5d6..1dc72f523c4a 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2043,7 +2043,7 @@ static void wake_nocb_leader(struct rcu_data *rdp, bool force)
 	if (!ACCESS_ONCE(rdp_leader->nocb_kthread))
 		return;
 	if (!ACCESS_ONCE(rdp_leader->nocb_leader_wake) || force) {
-		/* Prior xchg orders against prior callback enqueue. */
+		/* Prior smp_mb__after_atomic() orders against prior enqueue. */
 		ACCESS_ONCE(rdp_leader->nocb_leader_wake) = true;
 		wake_up(&rdp_leader->nocb_wq);
 	}
@@ -2072,6 +2072,7 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp,
 	ACCESS_ONCE(*old_rhpp) = rhp;
 	atomic_long_add(rhcount, &rdp->nocb_q_count);
 	atomic_long_add(rhcount_lazy, &rdp->nocb_q_count_lazy);
+	smp_mb__after_atomic(); /* Store *old_rhpp before _wake test. */
 
 	/* If we are not being polled and there is a kthread, awaken it ... */
 	t = ACCESS_ONCE(rdp->nocb_kthread);
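The ordering this patch relies on is the classic store-buffering pattern. The enqueuer stores the callback and then tests the flag, while the leader clears the flag, does a memory barrier, and then checks for callbacks. With a full barrier on each side, at least one of them must see the other's store, so a callback cannot be stranded while the leader sleeps. The following is a userspace approximation using C11 atomics. It assumes `atomic_thread_fence(memory_order_seq_cst)` as a stand-in for the kernel's full barriers, which is an approximation of the kernel memory model and not an exact equivalent. The names `sb_model`, `sb_enqueue()`, and `sb_leader_drain()` are hypothetical, and the kernel's actual wakeup conditions are simplified away.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's data structures. */
struct sb_model {
	atomic_long queued;		/* models the callback list */
	atomic_bool leader_wake;	/* models ->nocb_leader_wake */
	atomic_long wakeups;		/* wakeups issued to the leader */
};

static void sb_model_init(struct sb_model *m)
{
	atomic_init(&m->queued, 0);
	atomic_init(&m->leader_wake, false);
	atomic_init(&m->wakeups, 0);
}

/* Enqueuer: publish the callback, full fence, then test the flag. */
static void sb_enqueue(struct sb_model *m)
{
	atomic_fetch_add_explicit(&m->queued, 1, memory_order_relaxed);
	/* Stand-in for smp_mb__after_atomic(): enqueue before the flag test. */
	atomic_thread_fence(memory_order_seq_cst);
	if (!atomic_load_explicit(&m->leader_wake, memory_order_relaxed)) {
		/* Racing enqueuers may both get here; an extra wakeup is harmless. */
		atomic_store_explicit(&m->leader_wake, true, memory_order_relaxed);
		atomic_fetch_add_explicit(&m->wakeups, 1, memory_order_relaxed);
	}
}

/* Leader: clear the flag, full fence, then grab whatever is queued. */
static long sb_leader_drain(struct sb_model *m)
{
	long n;

	atomic_store_explicit(&m->leader_wake, false, memory_order_relaxed);
	/* Stand-in for the leader's memory barrier: clear before the check. */
	atomic_thread_fence(memory_order_seq_cst);
	n = atomic_exchange_explicit(&m->queued, 0, memory_order_relaxed);
	assert(n >= 0);
	return n;
}
```

In a single-threaded run, two `sb_enqueue()` calls issue one wakeup, and `sb_leader_drain()` returns 2. The next `sb_enqueue()` then issues a second wakeup because the leader cleared the flag. The interesting case is concurrent. With both fences in place, one outcome is forbidden: the enqueuer still sees the flag set while the leader misses the new callback. That is the kind of lost wakeup this barrier is meant to rule out.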


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-12 Thread Paul E. McKenney
On Tue, Aug 12, 2014 at 10:57:26AM +0530, Amit Shah wrote:
> On (Mon) 11 Aug 2014 [13:34:21], Paul E. McKenney wrote:
> > On Tue, Aug 12, 2014 at 01:48:45AM +0530, Amit Shah wrote:

[ . . . ]

> > > > In addition "sendkey alt-sysrq-t" at the "(qemu)" prompt dumps all tasks'
> > > > stacks, which would also likely be useful information.
> > > 
> > > Nah, this doesn't work -- the guest's totally locked up.  I need a way
> > > to continuously dump buffers till the lockup happens, I suppose.
> > 
> > That is a bit surprising.  Is it possible that the system is OOMing
> > quickly due to grace periods not proceeding?  If so, maybe giving the
> > VM more memory would help.
> 
> I bumped it up to 1G and then 2G, same result.

OK.  Then I am back to making the system dump core.

Thanx, Paul


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-12 Thread Paul E. McKenney
On Tue, Aug 12, 2014 at 11:03:21AM +0530, Amit Shah wrote:
> On (Mon) 11 Aug 2014 [20:45:31], Paul E. McKenney wrote:

[ . . . ]

> > > That is a bit surprising.  Is it possible that the system is OOMing
> > > quickly due to grace periods not proceeding?  If so, maybe giving the
> > > VM more memory would help.
> > 
> > Oh, and it is necessary to build the kernel with CONFIG_RCU_TRACE=y
> > for the rcu_nocb_wake trace events to be enabled in the first place.
> > I am assuming that your kernel was built with CONFIG_MAGIC_SYSRQ=y.
> 
> Yes, it is :-)  I checked that, with the rcu_nocb_poll cmdline option, all the
> ftrace buffers do indeed get dumped to dmesg.

Good.  ;-)

> > If all of that is in place and no joy, is it possible to extract the
> > ftrace buffer from the running/hung guest?  It should be in there
> > somewhere!  ;-)
> 
> I know of only virtio-console doing this (via userspace only,
> though).

As in userspace within the guest?  That would not work.  The userspace
that the qemu is running in might.  There is a way to extract ftrace info
from crash dumps, so one approach would be "sendkey alt-sysrq-c", then
pull the buffer from the resulting dump.  For all I know, there might also
be some script that uses the qemu "x" command to get at the ftrace buffer.

Again, I cannot reproduce this, and I have been through the code several
times over the past few days, and am not seeing it.  I could start
sending you random diagnostic patches, but it would be much better if
we could get the trace data from the failure.

Thanx, Paul


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-11 Thread Amit Shah
On (Mon) 11 Aug 2014 [20:45:31], Paul E. McKenney wrote:
> On Mon, Aug 11, 2014 at 01:34:21PM -0700, Paul E. McKenney wrote:
> > On Tue, Aug 12, 2014 at 01:48:45AM +0530, Amit Shah wrote:
> > > On (Mon) 11 Aug 2014 [13:11:02], Paul E. McKenney wrote:
> > > > On Tue, Aug 12, 2014 at 01:11:26AM +0530, Amit Shah wrote:
> > > > > On (Mon) 11 Aug 2014 [09:28:07], Paul E. McKenney wrote:
> > > > > > On Mon, Aug 11, 2014 at 12:43:08PM +0530, Amit Shah wrote:
> > > > > > > On (Fri) 08 Aug 2014 [14:46:48], Paul E. McKenney wrote:
> > > > > > > > On Fri, Aug 08, 2014 at 02:43:47PM -0700, Paul E. McKenney wrote:
> > > > > > > > > On Sat, Aug 09, 2014 at 12:04:24AM +0530, Amit Shah wrote:
> > > > > > > > > > On (Fri) 08 Aug 2014 [11:18:35], Paul E. McKenney wrote:
> > > > > > > > > 
> > > > > > > > > [ . . . ]
> > > > > > > > > 
> > > > > > > > > > > Hmmm... What happens if you boot a7d7a143d0b4cb1914705884ca5c25e322dba693
> > > > > > > > > > > with the kernel parameter "acpi=off"?
> > > > > > > > > > 
> > > > > > > > > > That doesn't change anything - still hangs.
> > > > > > > > > > 
> > > > > > > > > > I intend to look at this more on Monday, though - turning in for
> > > > > > > > > > today.  In the meantime, if there's anything else you'd like me to
> > > > > > > > > > try, please let me know.
> > > > > > > > > 
> > > > > > > > > OK, given that I still cannot reproduce it, I do need your help with
> > > > > > > > > the diagnostics.  And so what sorts of diagnostics work for you in
> > > > > > > > > the hung state?  Are you able to dump ftrace buffers?
> > > > > > > > > 
> > > > > > > > > If you are able to dump ftrace buffers, please enable rcu:rcu_nocb_wake
> > > > > > > > > and send me the resulting trace.
> > > > > > > > 
> > > > > > > > And another random kernel boot parameter to try is rcu_nocb_poll.
> > > > > > > 
> > > > > > > Right, this gets the boot going again:
> > > > > > 
> > > > > > OK, that likely indicates a lost wakeup.  The event tracing enabled by
> > > > > > "rcu:rcu_nocb_wake" should help track those down.  Last time, it was qemu
> > > > > > losing the wakeups, but maybe it is RCU this time.  ;-)
> > > > > 
> > > > > The guest goes dead pretty early; is there a trick to enabling and
> > > > > getting these traces out of the guest that I don't know of that
> > > > > doesn't involve being booted into userspace?  I can perhaps try
> > > > > getting the trace output out from a virtio-serial channel; but even
> > > > > that driver isn't probed yet when the lockup happens.
> > > > 
> > > > First boot with the kernel parameter "trace_event=rcu:rcu_nocb_wake".
> > > > Then when the system hangs, do "sendkey alt-sysrq-z" at the "(qemu)"
> > > > prompt to dump the ftrace buffer.  This hopefully dumps the trace buffer
> > > > to dmesg.
> > > > 
> > > > In addition "sendkey alt-sysrq-t" at the "(qemu)" prompt dumps all tasks'
> > > > stacks, which would also likely be useful information.
> > > 
> > > Nah, this doesn't work -- the guest's totally locked up.  I need a way
> > > to continuously dump buffers till the lockup happens, I suppose.
> > 
> > That is a bit surprising.  Is it possible that the system is OOMing
> > quickly due to grace periods not proceeding?  If so, maybe giving the
> > VM more memory would help.
> 
> Oh, and it is necessary to build the kernel with CONFIG_RCU_TRACE=y
> for the rcu_nocb_wake trace events to be enabled in the first place.
> I am assuming that your kernel was built with CONFIG_MAGIC_SYSRQ=y.

Yes, it is :-)  I checked that, with the rcu_nocb_poll cmdline option, all the
ftrace buffers do indeed get dumped to dmesg.

> If all of that is in place and no joy, is it possible to extract the
> ftrace buffer from the running/hung guest?  It should be in there
> somewhere!  ;-)

I know of only virtio-console doing this (via userspace only,
though).

Amit


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-11 Thread Amit Shah
On (Mon) 11 Aug 2014 [13:34:21], Paul E. McKenney wrote:
> On Tue, Aug 12, 2014 at 01:48:45AM +0530, Amit Shah wrote:
> > On (Mon) 11 Aug 2014 [13:11:02], Paul E. McKenney wrote:
> > > On Tue, Aug 12, 2014 at 01:11:26AM +0530, Amit Shah wrote:
> > > > On (Mon) 11 Aug 2014 [09:28:07], Paul E. McKenney wrote:
> > > > > On Mon, Aug 11, 2014 at 12:43:08PM +0530, Amit Shah wrote:
> > > > > > On (Fri) 08 Aug 2014 [14:46:48], Paul E. McKenney wrote:
> > > > > > > On Fri, Aug 08, 2014 at 02:43:47PM -0700, Paul E. McKenney wrote:
> > > > > > > > On Sat, Aug 09, 2014 at 12:04:24AM +0530, Amit Shah wrote:
> > > > > > > > > On (Fri) 08 Aug 2014 [11:18:35], Paul E. McKenney wrote:
> > > > > > > > 
> > > > > > > > [ . . . ]
> > > > > > > > 
> > > > > > > > > > Hmmm... What happens if you boot a7d7a143d0b4cb1914705884ca5c25e322dba693
> > > > > > > > > > with the kernel parameter "acpi=off"?
> > > > > > > > > 
> > > > > > > > > That doesn't change anything - still hangs.
> > > > > > > > > 
> > > > > > > > > I intend to look at this more on Monday, though - turning in for
> > > > > > > > > today.  In the meantime, if there's anything else you'd like me to
> > > > > > > > > try, please let me know.
> > > > > > > > 
> > > > > > > > OK, given that I still cannot reproduce it, I do need your help with
> > > > > > > > the diagnostics.  And so what sorts of diagnostics work for you in
> > > > > > > > the hung state?  Are you able to dump ftrace buffers?
> > > > > > > > 
> > > > > > > > If you are able to dump ftrace buffers, please enable rcu:rcu_nocb_wake
> > > > > > > > and send me the resulting trace.
> > > > > > > 
> > > > > > > And another random kernel boot parameter to try is rcu_nocb_poll.
> > > > > > 
> > > > > > Right, this gets the boot going again:
> > > > > 
> > > > > OK, that likely indicates a lost wakeup.  The event tracing enabled by
> > > > > "rcu:rcu_nocb_wake" should help track those down.  Last time, it was qemu
> > > > > losing the wakeups, but maybe it is RCU this time.  ;-)
> > > > 
> > > > The guest goes dead pretty early; is there a trick to enabling and
> > > > getting these traces out of the guest that I don't know of that
> > > > doesn't involve being booted into userspace?  I can perhaps try
> > > > getting the trace output out from a virtio-serial channel; but even
> > > > that driver isn't probed yet when the lockup happens.
> > > 
> > > First boot with the kernel parameter "trace_event=rcu:rcu_nocb_wake".
> > > Then when the system hangs, do "sendkey alt-sysrq-z" at the "(qemu)"
> > > prompt to dump the ftrace buffer.  This hopefully dumps the trace buffer
> > > to dmesg.
> > > 
> > > In addition "sendkey alt-sysrq-t" at the "(qemu)" prompt dumps all tasks'
> > > stacks, which would also likely be useful information.
> > 
> > Nah, this doesn't work -- the guest's totally locked up.  I need a way
> > to continuously dump buffers till the lockup happens, I suppose.
> 
> That is a bit surprising.  Is it possible that the system is OOMing
> quickly due to grace periods not proceeding?  If so, maybe giving the
> VM more memory would help.

I bumped it up to 1G and then 2G, same result.

Amit


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-11 Thread Paul E. McKenney
On Mon, Aug 11, 2014 at 01:34:21PM -0700, Paul E. McKenney wrote:
> On Tue, Aug 12, 2014 at 01:48:45AM +0530, Amit Shah wrote:
> > On (Mon) 11 Aug 2014 [13:11:02], Paul E. McKenney wrote:
> > > On Tue, Aug 12, 2014 at 01:11:26AM +0530, Amit Shah wrote:
> > > > On (Mon) 11 Aug 2014 [09:28:07], Paul E. McKenney wrote:
> > > > > On Mon, Aug 11, 2014 at 12:43:08PM +0530, Amit Shah wrote:
> > > > > > On (Fri) 08 Aug 2014 [14:46:48], Paul E. McKenney wrote:
> > > > > > > On Fri, Aug 08, 2014 at 02:43:47PM -0700, Paul E. McKenney wrote:
> > > > > > > > On Sat, Aug 09, 2014 at 12:04:24AM +0530, Amit Shah wrote:
> > > > > > > > > On (Fri) 08 Aug 2014 [11:18:35], Paul E. McKenney wrote:
> > > > > > > > 
> > > > > > > > [ . . . ]
> > > > > > > > 
> > > > > > > > > > Hmmm... What happens if you boot a7d7a143d0b4cb1914705884ca5c25e322dba693
> > > > > > > > > > with the kernel parameter "acpi=off"?
> > > > > > > > > 
> > > > > > > > > That doesn't change anything - still hangs.
> > > > > > > > > 
> > > > > > > > > I intend to look at this more on Monday, though - turning in for
> > > > > > > > > today.  In the meantime, if there's anything else you'd like me to
> > > > > > > > > try, please let me know.
> > > > > > > > 
> > > > > > > > OK, given that I still cannot reproduce it, I do need your help with
> > > > > > > > the diagnostics.  And so what sorts of diagnostics work for you in
> > > > > > > > the hung state?  Are you able to dump ftrace buffers?
> > > > > > > > 
> > > > > > > > If you are able to dump ftrace buffers, please enable rcu:rcu_nocb_wake
> > > > > > > > and send me the resulting trace.
> > > > > > > 
> > > > > > > And another random kernel boot parameter to try is rcu_nocb_poll.
> > > > > > 
> > > > > > Right, this gets the boot going again:
> > > > > 
> > > > > OK, that likely indicates a lost wakeup.  The event tracing enabled by
> > > > > "rcu:rcu_nocb_wake" should help track those down.  Last time, it was qemu
> > > > > losing the wakeups, but maybe it is RCU this time.  ;-)
> > > > 
> > > > The guest goes dead pretty early; is there a trick to enabling and
> > > > getting these traces out of the guest that I don't know of that
> > > > doesn't involve being booted into userspace?  I can perhaps try
> > > > getting the trace output out from a virtio-serial channel; but even
> > > > that driver isn't probed yet when the lockup happens.
> > > 
> > > First boot with the kernel parameter "trace_event=rcu:rcu_nocb_wake".
> > > Then when the system hangs, do "sendkey alt-sysrq-z" at the "(qemu)"
> > > prompt to dump the ftrace buffer.  This hopefully dumps the trace buffer
> > > to dmesg.
> > > 
> > > In addition "sendkey alt-sysrq-t" at the "(qemu)" prompt dumps all tasks'
> > > stacks, which would also likely be useful information.
> > 
> > Nah, this doesn't work -- the guest's totally locked up.  I need a way
> > to continuously dump buffers till the lockup happens, I suppose.
> 
> That is a bit surprising.  Is it possible that the system is OOMing
> quickly due to grace periods not proceeding?  If so, maybe giving the
> VM more memory would help.

Oh, and it is necessary to build the kernel with CONFIG_RCU_TRACE=y
for the rcu_nocb_wake trace events to be enabled in the first place.
I am assuming that your kernel was built with CONFIG_MAGIC_SYSRQ=y.

If all of that is in place and no joy, is it possible to extract the
ftrace buffer from the running/hung guest?  It should be in there
somewhere!  ;-)

Thanx, Paul


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-11 Thread Paul E. McKenney
On Tue, Aug 12, 2014 at 01:48:45AM +0530, Amit Shah wrote:
> On (Mon) 11 Aug 2014 [13:11:02], Paul E. McKenney wrote:
> > On Tue, Aug 12, 2014 at 01:11:26AM +0530, Amit Shah wrote:
> > > On (Mon) 11 Aug 2014 [09:28:07], Paul E. McKenney wrote:
> > > > On Mon, Aug 11, 2014 at 12:43:08PM +0530, Amit Shah wrote:
> > > > > On (Fri) 08 Aug 2014 [14:46:48], Paul E. McKenney wrote:
> > > > > > On Fri, Aug 08, 2014 at 02:43:47PM -0700, Paul E. McKenney wrote:
> > > > > > > On Sat, Aug 09, 2014 at 12:04:24AM +0530, Amit Shah wrote:
> > > > > > > > On (Fri) 08 Aug 2014 [11:18:35], Paul E. McKenney wrote:
> > > > > > > 
> > > > > > > [ . . . ]
> > > > > > > 
> > > > > > > > > Hmmm... What happens if you boot a7d7a143d0b4cb1914705884ca5c25e322dba693
> > > > > > > > > with the kernel parameter "acpi=off"?
> > > > > > > > 
> > > > > > > > That doesn't change anything - still hangs.
> > > > > > > > 
> > > > > > > > I intend to look at this more on Monday, though - turning in for
> > > > > > > > today.  In the meantime, if there's anything else you'd like me to
> > > > > > > > try, please let me know.
> > > > > > > 
> > > > > > > OK, given that I still cannot reproduce it, I do need your help with
> > > > > > > the diagnostics.  And so what sorts of diagnostics work for you in
> > > > > > > the hung state?  Are you able to dump ftrace buffers?
> > > > > > > 
> > > > > > > If you are able to dump ftrace buffers, please enable rcu:rcu_nocb_wake
> > > > > > > and send me the resulting trace.
> > > > > > 
> > > > > > And another random kernel boot parameter to try is rcu_nocb_poll.
> > > > > 
> > > > > Right, this gets the boot going again:
> > > > 
> > > > OK, that likely indicates a lost wakeup.  The event tracing enabled by
> > > > "rcu:rcu_nocb_wake" should help track those down.  Last time, it was qemu
> > > > losing the wakeups, but maybe it is RCU this time.  ;-)
> > > 
> > > The guest goes dead pretty early; is there a trick to enabling and
> > > getting these traces out of the guest that I don't know of that
> > > doesn't involve being booted into userspace?  I can perhaps try
> > > getting the trace output out from a virtio-serial channel; but even
> > > that driver isn't probed yet when the lockup happens.
> > 
> > First boot with the kernel parameter "trace_event=rcu:rcu_nocb_wake".
> > Then when the system hangs, do "sendkey alt-sysrq-z" at the "(qemu)"
> > prompt to dump the ftrace buffer.  This hopefully dumps the trace buffer
> > to dmesg.
> > 
> > In addition "sendkey alt-sysrq-t" at the "(qemu)" prompt dumps all tasks'
> > stacks, which would also likely be useful information.
> 
> Nah, this doesn't work -- the guest's totally locked up.  I need a way
> to continuously dump buffers till the lockup happens, I suppose.

That is a bit surprising.  Is it possible that the system is OOMing
quickly due to grace periods not proceeding?  If so, maybe giving the
VM more memory would help.

Thanx, Paul


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-11 Thread Amit Shah
On (Mon) 11 Aug 2014 [13:11:02], Paul E. McKenney wrote:
> On Tue, Aug 12, 2014 at 01:11:26AM +0530, Amit Shah wrote:
> > On (Mon) 11 Aug 2014 [09:28:07], Paul E. McKenney wrote:
> > > On Mon, Aug 11, 2014 at 12:43:08PM +0530, Amit Shah wrote:
> > > > On (Fri) 08 Aug 2014 [14:46:48], Paul E. McKenney wrote:
> > > > > On Fri, Aug 08, 2014 at 02:43:47PM -0700, Paul E. McKenney wrote:
> > > > > > On Sat, Aug 09, 2014 at 12:04:24AM +0530, Amit Shah wrote:
> > > > > > > On (Fri) 08 Aug 2014 [11:18:35], Paul E. McKenney wrote:
> > > > > > 
> > > > > > [ . . . ]
> > > > > > 
> > > > > > > > Hmmm... What happens if you boot a7d7a143d0b4cb1914705884ca5c25e322dba693
> > > > > > > > with the kernel parameter "acpi=off"?
> > > > > > > 
> > > > > > > That doesn't change anything - still hangs.
> > > > > > > 
> > > > > > > I intend to look at this more on Monday, though - turning in for
> > > > > > > today.  In the meantime, if there's anything else you'd like me to
> > > > > > > try, please let me know.
> > > > > > 
> > > > > > OK, given that I still cannot reproduce it, I do need your help with
> > > > > > the diagnostics.  And so what sorts of diagnostics work for you in
> > > > > > the hung state?  Are you able to dump ftrace buffers?
> > > > > > 
> > > > > > If you are able to dump ftrace buffers, please enable rcu:rcu_nocb_wake
> > > > > > and send me the resulting trace.
> > > > > 
> > > > > And another random kernel boot parameter to try is rcu_nocb_poll.
> > > > 
> > > > Right, this gets the boot going again:
> > > 
> > > OK, that likely indicates a lost wakeup.  The event tracing enabled by
> > > "rcu:rcu_nocb_wake" should help track those down.  Last time, it was qemu
> > > losing the wakeups, but maybe it is RCU this time.  ;-)
> > 
> > The guest goes dead pretty early; is there a trick to enabling and
> > getting these traces out of the guest that I don't know of that
> > doesn't involve being booted into userspace?  I can perhaps try
> > getting the trace output out from a virtio-serial channel; but even
> > that driver isn't probed yet when the lockup happens.
> 
> First boot with the kernel parameter "trace_event=rcu:rcu_nocb_wake".
> Then when the system hangs, do "sendkey alt-sysrq-z" at the "(qemu)"
> prompt to dump the ftrace buffer.  This hopefully dumps the trace buffer
> to dmesg.
> 
> In addition "sendkey alt-sysrq-t" at the "(qemu)" prompt dumps all tasks'
> stacks, which would also likely be useful information.

Nah, this doesn't work -- the guest's totally locked up.  I need a way
to continuously dump buffers till the lockup happens, I suppose.

Amit


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-11 Thread Paul E. McKenney
On Tue, Aug 12, 2014 at 01:11:26AM +0530, Amit Shah wrote:
> On (Mon) 11 Aug 2014 [09:28:07], Paul E. McKenney wrote:
> > On Mon, Aug 11, 2014 at 12:43:08PM +0530, Amit Shah wrote:
> > > On (Fri) 08 Aug 2014 [14:46:48], Paul E. McKenney wrote:
> > > > On Fri, Aug 08, 2014 at 02:43:47PM -0700, Paul E. McKenney wrote:
> > > > > On Sat, Aug 09, 2014 at 12:04:24AM +0530, Amit Shah wrote:
> > > > > > On (Fri) 08 Aug 2014 [11:18:35], Paul E. McKenney wrote:
> > > > > 
> > > > > [ . . . ]
> > > > > 
> > > > > > > Hmmm... What happens if you boot a7d7a143d0b4cb1914705884ca5c25e322dba693
> > > > > > > with the kernel parameter "acpi=off"?
> > > > > > 
> > > > > > That doesn't change anything - still hangs.
> > > > > > 
> > > > > > I intend to look at this more on Monday, though - turning in for
> > > > > > today.  In the meantime, if there's anything else you'd like me to
> > > > > > try, please let me know.
> > > > > 
> > > > > OK, given that I still cannot reproduce it, I do need your help with
> > > > > the diagnostics.  And so what sorts of diagnostics work for you in
> > > > > the hung state?  Are you able to dump ftrace buffers?
> > > > > 
> > > > > If you are able to dump ftrace buffers, please enable rcu:rcu_nocb_wake
> > > > > and send me the resulting trace.
> > > > 
> > > > And another random kernel boot parameter to try is rcu_nocb_poll.
> > > 
> > > Right, this gets the boot going again:
> > 
> > OK, that likely indicates a lost wakeup.  The event tracing enabled by
> > "rcu:rcu_nocb_wake" should help track those down.  Last time, it was qemu
> > losing the wakeups, but maybe it is RCU this time.  ;-)
> 
> The guest goes dead pretty early; is there a trick to enabling and
> getting these traces out of the guest that I don't know of that
> doesn't involve being booted into userspace?  I can perhaps try
> getting the trace output out from a virtio-serial channel; but even
> that driver isn't probed yet when the lockup happens.

First boot with the kernel parameter "trace_event=rcu:rcu_nocb_wake".
Then when the system hangs, do "sendkey alt-sysrq-z" at the "(qemu)"
prompt to dump the ftrace buffer.  This hopefully dumps the trace buffer
to dmesg.

In addition "sendkey alt-sysrq-t" at the "(qemu)" prompt dumps all tasks'
stacks, which would also likely be useful information.

Thanx, Paul



Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-11 Thread Amit Shah
On (Mon) 11 Aug 2014 [09:28:07], Paul E. McKenney wrote:
> On Mon, Aug 11, 2014 at 12:43:08PM +0530, Amit Shah wrote:
> > On (Fri) 08 Aug 2014 [14:46:48], Paul E. McKenney wrote:
> > > On Fri, Aug 08, 2014 at 02:43:47PM -0700, Paul E. McKenney wrote:
> > > > On Sat, Aug 09, 2014 at 12:04:24AM +0530, Amit Shah wrote:
> > > > > On (Fri) 08 Aug 2014 [11:18:35], Paul E. McKenney wrote:
> > > > 
> > > > [ . . . ]
> > > > 
> > > > > > Hmmm... What happens if you boot a7d7a143d0b4cb1914705884ca5c25e322dba693
> > > > > > with the kernel parameter "acpi=off"?
> > > > > 
> > > > > That doesn't change anything - still hangs.
> > > > > 
> > > > > I intend to look at this more on Monday, though - turning in for
> > > > > today.  In the meantime, if there's anything else you'd like me to
> > > > > try, please let me know.
> > > > 
> > > > OK, given that I still cannot reproduce it, I do need your help with
> > > > the diagnostics.  And so what sorts of diagnostics work for you in
> > > > the hung state?  Are you able to dump ftrace buffers?
> > > > 
> > > > If you are able to dump ftrace buffers, please enable rcu:rcu_nocb_wake
> > > > and send me the resulting trace.
> > > 
> > > And another random kernel boot parameter to try is rcu_nocb_poll.
> > 
> > Right, this gets the boot going again:
> 
> OK, that likely indicates a lost wakeup.  The event tracing enabled by
> "rcu:rcu_nocb_wake" should help track those down.  Last time, it was qemu
> losing the wakeups, but maybe it is RCU this time.  ;-)

The guest goes dead pretty early; is there a trick to enabling and
getting these traces out of the guest that I don't know of that
doesn't involve being booted into userspace?  I can perhaps try
getting the trace output out from a virtio-serial channel; but even
that driver isn't probed yet when the lockup happens.

Amit


Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-11 Thread Paul E. McKenney
On Mon, Aug 11, 2014 at 12:43:08PM +0530, Amit Shah wrote:
> On (Fri) 08 Aug 2014 [14:46:48], Paul E. McKenney wrote:
> > On Fri, Aug 08, 2014 at 02:43:47PM -0700, Paul E. McKenney wrote:
> > > On Sat, Aug 09, 2014 at 12:04:24AM +0530, Amit Shah wrote:
> > > > On (Fri) 08 Aug 2014 [11:18:35], Paul E. McKenney wrote:
> > > 
> > > [ . . . ]
> > > 
> > > > > Hmmm... What happens if you boot a7d7a143d0b4cb1914705884ca5c25e322dba693
> > > > > with the kernel parameter "acpi=off"?
> > > > 
> > > > That doesn't change anything - still hangs.
> > > > 
> > > > I intend to look at this more on Monday, though - turning in for
> > > > today.  In the meantime, if there's anything else you'd like me to
> > > > try, please let me know.
> > > 
> > > OK, given that I still cannot reproduce it, I do need your help with
> > > the diagnostics.  And so what sorts of diagnostics work for you in
> > > the hung state?  Are you able to dump ftrace buffers?
> > > 
> > > If you are able to dump ftrace buffers, please enable rcu:rcu_nocb_wake
> > > and send me the resulting trace.
> > 
> > And another random kernel boot parameter to try is rcu_nocb_poll.
> 
> Right, this gets the boot going again:

OK, that likely indicates a lost wakeup.  The event tracing enabled by
"rcu:rcu_nocb_wake" should help track those down.  Last time, it was qemu
losing the wakeups, but maybe it is RCU this time.  ;-)

Thanx, Paul

> --- /var/tmp/rcu-bad.txt  2014-08-11 12:39:53.571306488 +0530
> +++ /var/tmp/rcu-good-nocb-poll.txt  2014-08-11 12:40:37.760432052 +0530
> @@ -1,7 +1,7 @@
> -$ qemu-kvm -m 512 -smp 1 -cpu host,+kvmclock,+x2apic -enable-kvm  -kernel ~/src/linux/arch/x86/boot/bzImage /guests/f11-auto.qcow2  -append 'root=/dev/sda2 acpi=off console=ttyS0 console=tty0'  -snapshot  -serial stdio
> +$ qemu-kvm -m 512 -smp 1 -cpu host,+kvmclock,+x2apic -enable-kvm  -kernel ~/src/linux/arch/x86/boot/bzImage /guests/f11-auto.qcow2  -append 'root=/dev/sda2 acpi=off console=ttyS0 console=tty0 rcu_nocb_poll'  -snapshot  -serial stdio
>  Initializing cgroup subsys cpu
>  Linux version 3.16.0+ (a...@grmbl.mre) (gcc version 4.8.3 20140624 (Red Hat 4.8.3-1) (GCC) ) #80 SMP PREEMPT Fri Aug 8 22:57:35 IST 2014
> -Command line: root=/dev/sda2 acpi=off console=ttyS0 console=tty0
> +Command line: root=/dev/sda2 acpi=off console=ttyS0 console=tty0 rcu_nocb_poll
>  e820: BIOS-provided physical RAM map:
>  BIOS-e820: [mem 0x-0x0009fbff] usable
>  BIOS-e820: [mem 0x0009fc00-0x0009] reserved
> @@ -47,14 +47,14 @@
>  KVM setup async PF for cpu 0
>  kvm-stealtime: cpu 0, msr 1fa0cbc0
>  Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 128903
> -Kernel command line: root=/dev/sda2 acpi=off console=ttyS0 console=tty0
> +Kernel command line: root=/dev/sda2 acpi=off console=ttyS0 console=tty0 rcu_nocb_poll
>  PID hash table entries: 2048 (order: 2, 16384 bytes)
>  Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
>  Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
>  xsave: enabled xstate_bv 0x7, cntxt size 0x340
>  AGP: Checking aperture...
>  AGP: No AGP bridge found
> -Memory: 485836K/523888K available (4029K kernel code, 727K rwdata, 2184K rodata, 2872K init, 14172K bss, 38052K reserved)
> +Memory: 485832K/523888K available (4029K kernel code, 727K rwdata, 2184K rodata, 2872K init, 14172K bss, 38056K reserved)
>  SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
>  Preemptible hierarchical RCU implementation.
>  RCU debugfs-based tracing is enabled.
> @@ -63,6 +63,7 @@
>    RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=1.
>    Offload RCU callbacks from all CPUs
>    Offload RCU callbacks from CPUs: 0.
> +  Poll for callbacks from no-CBs CPUs.
>  RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
>  NO_HZ: Full dynticks CPUs: 1-3.
>  NR_IRQS:4352 nr_irqs:256 16
> @@ -114,3 +115,118 @@
>  cpuidle: using governor ladder
>  cpuidle: using governor menu
>  PCI: Using configuration type 1 for base access
> +ACPI: Interpreter disabled.
> +vgaarb: loaded
> +SCSI subsystem initialized
> +pps_core: LinuxPPS API ver. 1 registered
> +pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti
> +PTP clock support registered
> +PCI: Probing PCI hardware
> +PCI host bridge to bus :00
> +pci_bus :00: root bus resource [io  0x-0x]
> +pci_bus :00: root bus resource [mem 0x-0xff]
> +pci_bus :00: No busn resource found for root bus, will use [bus 00-ff]
> +pci :00:01.1: legacy IDE quirk: reg 0x10: [io  0x01f0-0x01f7]
> +pci :00:01.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
> +pci :00:01.1: legacy IDE quirk: reg 0x18: [io  0x0170-0x0177]
> +pci :00:01.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
> +pci :00:01.3: quirk: [io  0xb000-0xb03f] claimed by PIIX4 ACPI
> +pci :00:01.3: quirk: [io  0xb100-

Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-11 Thread Amit Shah
On (Fri) 08 Aug 2014 [14:46:48], Paul E. McKenney wrote:
> On Fri, Aug 08, 2014 at 02:43:47PM -0700, Paul E. McKenney wrote:
> > On Sat, Aug 09, 2014 at 12:04:24AM +0530, Amit Shah wrote:
> > > On (Fri) 08 Aug 2014 [11:18:35], Paul E. McKenney wrote:
> > 
> > [ . . . ]
> > 
> > > > Hmmm... What happens if you boot a7d7a143d0b4cb1914705884ca5c25e322dba693
> > > > with the kernel parameter "acpi=off"?
> > > 
> > > That doesn't change anything - still hangs.
> > > 
> > > I intend to look at this more on Monday, though - turning in for
> > > today.  In the meantime, if there's anything else you'd like me to
> > > try, please let me know.
> > 
> > OK, given that I still cannot reproduce it, I do need your help with
> > the diagnostics.  And so what sorts of diagnostics work for you in
> > the hung state?  Are you able to dump ftrace buffers?
> > 
> > If you are able to dump ftrace buffers, please enable rcu:rcu_nocb_wake
> > and send me the resulting trace.
> 
> And another random kernel boot parameter to try is rcu_nocb_poll.

Right, this gets the boot going again:

--- /var/tmp/rcu-bad.txt  2014-08-11 12:39:53.571306488 +0530
+++ /var/tmp/rcu-good-nocb-poll.txt  2014-08-11 12:40:37.760432052 +0530
@@ -1,7 +1,7 @@
-$ qemu-kvm -m 512 -smp 1 -cpu host,+kvmclock,+x2apic -enable-kvm  -kernel ~/src/linux/arch/x86/boot/bzImage /guests/f11-auto.qcow2  -append 'root=/dev/sda2 acpi=off console=ttyS0 console=tty0'  -snapshot  -serial stdio
+$ qemu-kvm -m 512 -smp 1 -cpu host,+kvmclock,+x2apic -enable-kvm  -kernel ~/src/linux/arch/x86/boot/bzImage /guests/f11-auto.qcow2  -append 'root=/dev/sda2 acpi=off console=ttyS0 console=tty0 rcu_nocb_poll'  -snapshot  -serial stdio
 Initializing cgroup subsys cpu
 Linux version 3.16.0+ (a...@grmbl.mre) (gcc version 4.8.3 20140624 (Red Hat 4.8.3-1) (GCC) ) #80 SMP PREEMPT Fri Aug 8 22:57:35 IST 2014
-Command line: root=/dev/sda2 acpi=off console=ttyS0 console=tty0
+Command line: root=/dev/sda2 acpi=off console=ttyS0 console=tty0 rcu_nocb_poll
 e820: BIOS-provided physical RAM map:
 BIOS-e820: [mem 0x-0x0009fbff] usable
 BIOS-e820: [mem 0x0009fc00-0x0009] reserved
@@ -47,14 +47,14 @@
 KVM setup async PF for cpu 0
 kvm-stealtime: cpu 0, msr 1fa0cbc0
 Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 128903
-Kernel command line: root=/dev/sda2 acpi=off console=ttyS0 console=tty0
+Kernel command line: root=/dev/sda2 acpi=off console=ttyS0 console=tty0 rcu_nocb_poll
 PID hash table entries: 2048 (order: 2, 16384 bytes)
 Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
 Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
 xsave: enabled xstate_bv 0x7, cntxt size 0x340
 AGP: Checking aperture...
 AGP: No AGP bridge found
-Memory: 485836K/523888K available (4029K kernel code, 727K rwdata, 2184K rodata, 2872K init, 14172K bss, 38052K reserved)
+Memory: 485832K/523888K available (4029K kernel code, 727K rwdata, 2184K rodata, 2872K init, 14172K bss, 38056K reserved)
 SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
 Preemptible hierarchical RCU implementation.
 RCU debugfs-based tracing is enabled.
@@ -63,6 +63,7 @@
   RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=1.
   Offload RCU callbacks from all CPUs
   Offload RCU callbacks from CPUs: 0.
+  Poll for callbacks from no-CBs CPUs.
 RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
 NO_HZ: Full dynticks CPUs: 1-3.
 NR_IRQS:4352 nr_irqs:256 16
@@ -114,3 +115,118 @@
 cpuidle: using governor ladder
 cpuidle: using governor menu
 PCI: Using configuration type 1 for base access
+ACPI: Interpreter disabled.
+vgaarb: loaded
+SCSI subsystem initialized
+pps_core: LinuxPPS API ver. 1 registered
+pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti
+PTP clock support registered
+PCI: Probing PCI hardware
+PCI host bridge to bus :00
+pci_bus :00: root bus resource [io  0x-0x]
+pci_bus :00: root bus resource [mem 0x-0xff]
+pci_bus :00: No busn resource found for root bus, will use [bus 00-ff]
+pci :00:01.1: legacy IDE quirk: reg 0x10: [io  0x01f0-0x01f7]
+pci :00:01.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
+pci :00:01.1: legacy IDE quirk: reg 0x18: [io  0x0170-0x0177]
+pci :00:01.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
+pci :00:01.3: quirk: [io  0xb000-0xb03f] claimed by PIIX4 ACPI
+pci :00:01.3: quirk: [io  0xb100-0xb10f] claimed by PIIX4 SMB
+vgaarb: device added: PCI::00:02.0,decodes=io+mem,owns=io+mem,locks=none
+pci :00:01.0: PIIX/ICH IRQ router [8086:7000]
+Switched to clocksource kvm-clock
+pnp: PnP ACPI: disabled
+NET: Registered protocol family 2
+TCP established hash table entries: 4096 (order: 3, 32768 bytes)
+TCP bind hash table entries: 4096 (order: 6, 327680 bytes)
+TCP: Hash tables configured (established 4096 bind 4096)
+TCP: reno registered
+UDP hash table entries: 256 (order: 3, 49152 bytes)
+UDP-Lite

Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-08 Thread Paul E. McKenney
On Fri, Aug 08, 2014 at 02:43:47PM -0700, Paul E. McKenney wrote:
> On Sat, Aug 09, 2014 at 12:04:24AM +0530, Amit Shah wrote:
> > On (Fri) 08 Aug 2014 [11:18:35], Paul E. McKenney wrote:
> 
> [ . . . ]
> 
> > > Hmmm... What happens if you boot a7d7a143d0b4cb1914705884ca5c25e322dba693
> > > with the kernel parameter "acpi=off"?
> > 
> > That doesn't change anything - still hangs.
> > 
> > I intend to look at this more on Monday, though - turning in for
> > today.  In the meantime, if there's anything else you'd like me to
> > try, please let me know.
> 
> OK, given that I still cannot reproduce it, I do need your help with
> the diagnostics.  And so what sorts of diagnostics work for you in
> the hung state?  Are you able to dump ftrace buffers?
> 
> If you are able to dump ftrace buffers, please enable rcu:rcu_nocb_wake
> and send me the resulting trace.

And another random kernel boot parameter to try is rcu_nocb_poll.

Thanx, Paul



Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-08 Thread Paul E. McKenney
On Sat, Aug 09, 2014 at 12:04:24AM +0530, Amit Shah wrote:
> On (Fri) 08 Aug 2014 [11:18:35], Paul E. McKenney wrote:

[ . . . ]

> > Hmmm... What happens if you boot a7d7a143d0b4cb1914705884ca5c25e322dba693
> > with the kernel parameter "acpi=off"?
> 
> That doesn't change anything - still hangs.
> 
> I intend to look at this more on Monday, though - turning in for
> today.  In the meantime, if there's anything else you'd like me to
> try, please let me know.

OK, given that I still cannot reproduce it, I do need your help with
the diagnostics.  And so what sorts of diagnostics work for you in
the hung state?  Are you able to dump ftrace buffers?

If you are able to dump ftrace buffers, please enable rcu:rcu_nocb_wake
and send me the resulting trace.

Thanx, Paul



Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-08 Thread Amit Shah
On (Fri) 08 Aug 2014 [11:18:35], Paul E. McKenney wrote:
> On Fri, Aug 08, 2014 at 11:07:10PM +0530, Amit Shah wrote:
> > On (Fri) 08 Aug 2014 [09:25:02], Paul E. McKenney wrote:
> > > On Fri, Aug 08, 2014 at 02:10:56PM +0530, Amit Shah wrote:
> > > > On Friday 11 July 2014 07:05 PM, Paul E. McKenney wrote:
> > > > >From: "Paul E. McKenney" 
> > > > >
> > > > >An 80-CPU system with a context-switch-heavy workload can require so
> > > > >many NOCB kthread wakeups that the RCU grace-period kthreads spend several
> > > > >tens of percent of a CPU just awakening things.  This clearly will not
> > > > >scale well: If you add enough CPUs, the RCU grace-period kthreads would
> > > > >get behind, increasing grace-period latency.
> > > > >
> > > > >To avoid this problem, this commit divides the NOCB kthreads into leaders
> > > > >and followers, where the grace-period kthreads awaken the leaders each of
> > > > >whom in turn awakens its followers.  By default, the number of groups of
> > > > >kthreads is the square root of the number of CPUs, but this default may
> > > > >be overridden using the rcutree.rcu_nocb_leader_stride boot parameter.
> > > > >This reduces the number of wakeups done per grace period by the RCU
> > > > >grace-period kthread by the square root of the number of CPUs, but of
> > > > >course by shifting those wakeups to the leaders.  In addition, because
> > > > >the leaders do grace periods on behalf of their respective followers,
> > > > >the number of wakeups of the followers decreases by up to a factor of two.
> > > > >Instead of being awakened once when new callbacks arrive and again
> > > > >at the end of the grace period, the followers are awakened only at
> > > > >the end of the grace period.
> > > > >
> > > > >For a numerical example, in a 4096-CPU system, the grace-period kthread
> > > > >would awaken 64 leaders, each of which would awaken its 63 followers
> > > > >at the end of the grace period.  This compares favorably with the 79
> > > > >wakeups for the grace-period kthread on an 80-CPU system.
> > > > >
> > > > >Reported-by: Rik van Riel 
> > > > >Signed-off-by: Paul E. McKenney 
> > > > 
> > > > This patch causes KVM guest boot to not proceed after a while.
> > > > .config is attached, and boot messages are appended.  This commit
> > > > was pointed to by bisect, and reverting on current master (while
> > > > addressing a trivial conflict) makes the boot work again.
> > > > 
> > > > The qemu cmdline is
> > > > 
> > > > ./x86_64-softmmu/qemu-system-x86_64 -m 512 -smp 2 -cpu
> > > > host,+kvmclock,+x2apic -enable-kvm  -kernel
> > > > ~/src/linux/arch/x86/boot/bzImage /guests/f11-auto.qcow2  -append
> > > > 'root=/dev/sda2 console=ttyS0 console=tty0' -snapshot -serial stdio
> > > 
> > > I cannot reproduce this.  I am at commit a7d7a143d0b4c, in case that
> > > makes a difference.
> > 
> > Yea; I'm at that commit too.  And the version of qemu doesn't matter;
> > happens on F20's qemu-kvm-1.6.2-7.fc20.x86_64 as well as qemu.git
> > compiled locally.
> > 
> > > There are some things in your dmesg that look quite strange to me, though.
> > > 
> > > You have "--smp 2" above, but in your dmesg I see the following:
> > > 
> > >   [0.00] setup_percpu: NR_CPUS:4 nr_cpumask_bits:4
> > >   nr_cpu_ids:1 nr_node_ids:1
> > > 
> > > So your run somehow only has one CPU.  RCU agrees that there is only
> > > one CPU:
> > 
> > Yea; indeed.  There are MTRR warnings too; attaching the boot log of
> > failed run and diff to the successful run (rcu-good-notime.txt).
> 
> My qemu runs don't have those MTRR warnings, for whatever that is worth.
> 
> > The failed run is on commit a7d7a143d0b4cb1914705884ca5c25e322dba693
> > and the successful run has these reverted on top:
> > 
> > 187497fa5e9e9383820d33e48b87f8200a747c2a
> > b58cc46c5f6b57f1c814e374dbc47176e6b4938e
> > fbce7497ee5af800a1c350c73f3c3f103cb27a15
> 
> OK.  Strange set of commits.

The last one is the one that causes the failure; the above two are
just the context fixups needed for a clean revert of the last one.

> > That is rcu-bad-notime.txt.
> > 
> > >   [0.00] Preemptible hierarchical RCU implementation.
> > >   [0.00]  RCU debugfs-based tracing is enabled.
> > >   [0.00]  RCU lockdep checking is enabled.
> > >   [0.00]  Additional per-CPU info printed with stalls.
> > >   [0.00]  RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=1.
> > >   [0.00]  Offload RCU callbacks from all CPUs
> > >   [0.00]  Offload RCU callbacks from CPUs: 0.
> > >   [0.00] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
> > >   [0.00] NO_HZ: Full dynticks CPUs: 1-3.
> > > 
> > > But NO_HZ thinks that there are four.  This appears to be due to NO_HZ
> > > looking at the compile-time constants, and I doubt that this would cause
> > > a problem.  But if there really is a CPU 1 that RCU doesn't know about,
> > > and it q

Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-08 Thread Paul E. McKenney
On Fri, Aug 08, 2014 at 11:07:10PM +0530, Amit Shah wrote:
> On (Fri) 08 Aug 2014 [09:25:02], Paul E. McKenney wrote:
> > On Fri, Aug 08, 2014 at 02:10:56PM +0530, Amit Shah wrote:
> > > On Friday 11 July 2014 07:05 PM, Paul E. McKenney wrote:
> > > >From: "Paul E. McKenney" 
> > > >
> > > >An 80-CPU system with a context-switch-heavy workload can require so
> > > >many NOCB kthread wakeups that the RCU grace-period kthreads spend several
> > > >tens of percent of a CPU just awakening things.  This clearly will not
> > > >scale well: If you add enough CPUs, the RCU grace-period kthreads would
> > > >get behind, increasing grace-period latency.
> > > >
> > > >To avoid this problem, this commit divides the NOCB kthreads into leaders
> > > >and followers, where the grace-period kthreads awaken the leaders each of
> > > >whom in turn awakens its followers.  By default, the number of groups of
> > > >kthreads is the square root of the number of CPUs, but this default may
> > > >be overridden using the rcutree.rcu_nocb_leader_stride boot parameter.
> > > >This reduces the number of wakeups done per grace period by the RCU
> > > >grace-period kthread by the square root of the number of CPUs, but of
> > > >course by shifting those wakeups to the leaders.  In addition, because
> > > >the leaders do grace periods on behalf of their respective followers,
> > > >the number of wakeups of the followers decreases by up to a factor of two.
> > > >Instead of being awakened once when new callbacks arrive and again
> > > >at the end of the grace period, the followers are awakened only at
> > > >the end of the grace period.
> > > >
> > > >For a numerical example, in a 4096-CPU system, the grace-period kthread
> > > >would awaken 64 leaders, each of which would awaken its 63 followers
> > > >at the end of the grace period.  This compares favorably with the 79
> > > >wakeups for the grace-period kthread on an 80-CPU system.
> > > >
> > > >Reported-by: Rik van Riel 
> > > >Signed-off-by: Paul E. McKenney 
> > > 
> > > This patch causes KVM guest boot to not proceed after a while.
> > > .config is attached, and boot messages are appended.  This commit
> > > was pointed to by bisect, and reverting on current master (while
> > > addressing a trivial conflict) makes the boot work again.
> > > 
> > > The qemu cmdline is
> > > 
> > > ./x86_64-softmmu/qemu-system-x86_64 -m 512 -smp 2 -cpu
> > > host,+kvmclock,+x2apic -enable-kvm  -kernel
> > > ~/src/linux/arch/x86/boot/bzImage /guests/f11-auto.qcow2  -append
> > > 'root=/dev/sda2 console=ttyS0 console=tty0' -snapshot -serial stdio
> > 
> > I cannot reproduce this.  I am at commit a7d7a143d0b4c, in case that
> > makes a difference.
> 
> Yea; I'm at that commit too.  And the version of qemu doesn't matter;
> happens on F20's qemu-kvm-1.6.2-7.fc20.x86_64 as well as qemu.git
> compiled locally.
> 
> > There are some things in your dmesg that look quite strange to me, though.
> > 
> > You have "--smp 2" above, but in your dmesg I see the following:
> > 
> > [0.00] setup_percpu: NR_CPUS:4 nr_cpumask_bits:4
> > nr_cpu_ids:1 nr_node_ids:1
> > 
> > So your run somehow only has one CPU.  RCU agrees that there is only
> > one CPU:
> 
> Yea; indeed.  There are MTRR warnings too; attaching the boot log of
> failed run and diff to the successful run (rcu-good-notime.txt).

My qemu runs don't have those MTRR warnings, for whatever that is worth.

> The failed run is on commit a7d7a143d0b4cb1914705884ca5c25e322dba693
> and the successful run has these reverted on top:
> 
> 187497fa5e9e9383820d33e48b87f8200a747c2a
> b58cc46c5f6b57f1c814e374dbc47176e6b4938e
> fbce7497ee5af800a1c350c73f3c3f103cb27a15

OK.  Strange set of commits.

> That is rcu-bad-notime.txt.
> 
> > [0.00] Preemptible hierarchical RCU implementation.
> > [0.00]  RCU debugfs-based tracing is enabled.
> > [0.00]  RCU lockdep checking is enabled.
> > [0.00]  Additional per-CPU info printed with stalls.
> > [0.00]  RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=1.
> > [0.00]  Offload RCU callbacks from all CPUs
> > [0.00]  Offload RCU callbacks from CPUs: 0.
> > [0.00] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
> > [0.00] NO_HZ: Full dynticks CPUs: 1-3.
> > 
> > But NO_HZ thinks that there are four.  This appears to be due to NO_HZ
> > looking at the compile-time constants, and I doubt that this would cause
> > a problem.  But if there really is a CPU 1 that RCU doesn't know about,
> > and it queues a callback, that callback will never be invoked, and you
> > could easily see hangs.
> > 
> > Given that your .config says CONFIG_NR_CPUS=4 and your qemu says "--smp 2",
> > why does nr_cpu_ids think that there is only one CPU?  Are you running
> > this on a non-x86_64 CPU so that qemu only does UP or some such?
> 
> No; this is "Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz" on a ThinkP

Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-08 Thread Amit Shah
On (Fri) 08 Aug 2014 [09:25:02], Paul E. McKenney wrote:
> On Fri, Aug 08, 2014 at 02:10:56PM +0530, Amit Shah wrote:
> > On Friday 11 July 2014 07:05 PM, Paul E. McKenney wrote:
> > >From: "Paul E. McKenney" 
> > >
> > >An 80-CPU system with a context-switch-heavy workload can require so
> > >many NOCB kthread wakeups that the RCU grace-period kthreads spend several
> > >tens of percent of a CPU just awakening things.  This clearly will not
> > >scale well: If you add enough CPUs, the RCU grace-period kthreads would
> > >get behind, increasing grace-period latency.
> > >
> > >To avoid this problem, this commit divides the NOCB kthreads into leaders
> > >and followers, where the grace-period kthreads awaken the leaders each of
> > >whom in turn awakens its followers.  By default, the number of groups of
> > >kthreads is the square root of the number of CPUs, but this default may
> > >be overridden using the rcutree.rcu_nocb_leader_stride boot parameter.
> > >This reduces the number of wakeups done per grace period by the RCU
> > >grace-period kthread by the square root of the number of CPUs, but of
> > >course by shifting those wakeups to the leaders.  In addition, because
> > >the leaders do grace periods on behalf of their respective followers,
> > >the number of wakeups of the followers decreases by up to a factor of two.
> > >Instead of being awakened once when new callbacks arrive and again
> > >at the end of the grace period, the followers are awakened only at
> > >the end of the grace period.
> > >
> > >For a numerical example, in a 4096-CPU system, the grace-period kthread
> > >would awaken 64 leaders, each of which would awaken its 63 followers
> > >at the end of the grace period.  This compares favorably with the 79
> > >wakeups for the grace-period kthread on an 80-CPU system.
> > >
> > >Reported-by: Rik van Riel 
> > >Signed-off-by: Paul E. McKenney 
> > 
> > This patch causes KVM guest boot to not proceed after a while.
> > .config is attached, and boot messages are appended.  This commit
> > was pointed to by bisect, and reverting on current master (while
> > addressing a trivial conflict) makes the boot work again.
> > 
> > The qemu cmdline is
> > 
> > ./x86_64-softmmu/qemu-system-x86_64 -m 512 -smp 2 -cpu
> > host,+kvmclock,+x2apic -enable-kvm  -kernel
> > ~/src/linux/arch/x86/boot/bzImage /guests/f11-auto.qcow2  -append
> > 'root=/dev/sda2 console=ttyS0 console=tty0' -snapshot -serial stdio
> 
> I cannot reproduce this.  I am at commit a7d7a143d0b4c, in case that
> makes a difference.

Yea; I'm at that commit too.  And the version of qemu doesn't matter;
happens on F20's qemu-kvm-1.6.2-7.fc20.x86_64 as well as qemu.git
compiled locally.

> There are some things in your dmesg that look quite strange to me, though.
> 
> You have "--smp 2" above, but in your dmesg I see the following:
> 
>   [0.00] setup_percpu: NR_CPUS:4 nr_cpumask_bits:4 nr_cpu_ids:1 nr_node_ids:1
> 
> So your run somehow only has one CPU.  RCU agrees that there is only
> one CPU:

Yea; indeed.  There are MTRR warnings too; attaching the boot log of
failed run and diff to the successful run (rcu-good-notime.txt).

The failed run is on commit a7d7a143d0b4cb1914705884ca5c25e322dba693
and the successful run has these reverted on top:

187497fa5e9e9383820d33e48b87f8200a747c2a
b58cc46c5f6b57f1c814e374dbc47176e6b4938e
fbce7497ee5af800a1c350c73f3c3f103cb27a15

That is rcu-bad-notime.txt.

>   [0.00] Preemptible hierarchical RCU implementation.
>   [0.00]  RCU debugfs-based tracing is enabled.
>   [0.00]  RCU lockdep checking is enabled.
>   [0.00]  Additional per-CPU info printed with stalls.
>   [0.00]  RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=1.
>   [0.00]  Offload RCU callbacks from all CPUs
>   [0.00]  Offload RCU callbacks from CPUs: 0.
>   [0.00] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
>   [0.00] NO_HZ: Full dynticks CPUs: 1-3.
> 
> But NO_HZ thinks that there are four.  This appears to be due to NO_HZ
> looking at the compile-time constants, and I doubt that this would cause
> a problem.  But if there really is a CPU 1 that RCU doesn't know about,
> and it queues a callback, that callback will never be invoked, and you
> could easily see hangs.
> 
> Given that your .config says CONFIG_NR_CPUS=4 and your qemu says "--smp 2",
> why does nr_cpu_ids think that there is only one CPU?  Are you running
> this on a non-x86_64 CPU so that qemu only does UP or some such?

No; this is "Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz" on a ThinkPad
T420s.

In my attached boot logs, RCU does detect two cpus.  Here's the diff
between them.  I recompiled to remove the timing info so the diffs are
comparable:

$ diff -u /var/tmp/rcu-bad-notime.txt /var/tmp/rcu-good-notime.txt 
--- /var/tmp/rcu-bad-notime.txt	2014-08-08 22:49:37.207745682 +0530
+++ /var/tmp/rcu-good-notime.txt  

Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-08 Thread Paul E. McKenney
On Fri, Aug 08, 2014 at 02:10:56PM +0530, Amit Shah wrote:
> On Friday 11 July 2014 07:05 PM, Paul E. McKenney wrote:
> >From: "Paul E. McKenney" 
> >
> >An 80-CPU system with a context-switch-heavy workload can require so
> >many NOCB kthread wakeups that the RCU grace-period kthreads spend several
> >tens of percent of a CPU just awakening things.  This clearly will not
> >scale well: If you add enough CPUs, the RCU grace-period kthreads would
> >get behind, increasing grace-period latency.
> >
> >To avoid this problem, this commit divides the NOCB kthreads into leaders
> >and followers, where the grace-period kthreads awaken the leaders each of
> >whom in turn awakens its followers.  By default, the number of groups of
> >kthreads is the square root of the number of CPUs, but this default may
> >be overridden using the rcutree.rcu_nocb_leader_stride boot parameter.
> >This reduces the number of wakeups done per grace period by the RCU
> >grace-period kthread by the square root of the number of CPUs, but of
> >course by shifting those wakeups to the leaders.  In addition, because
> >the leaders do grace periods on behalf of their respective followers,
> >the number of wakeups of the followers decreases by up to a factor of two.
> >Instead of being awakened once when new callbacks arrive and again
> >at the end of the grace period, the followers are awakened only at
> >the end of the grace period.
> >
> >For a numerical example, in a 4096-CPU system, the grace-period kthread
> >would awaken 64 leaders, each of which would awaken its 63 followers
> >at the end of the grace period.  This compares favorably with the 79
> >wakeups for the grace-period kthread on an 80-CPU system.
> >
> >Reported-by: Rik van Riel 
> >Signed-off-by: Paul E. McKenney 
> 
> This patch causes KVM guest boot to not proceed after a while.
> .config is attached, and boot messages are appended.  This commit
> was pointed to by bisect, and reverting on current master (while
> addressing a trivial conflict) makes the boot work again.
> 
> The qemu cmdline is
> 
> ./x86_64-softmmu/qemu-system-x86_64 -m 512 -smp 2 -cpu
> host,+kvmclock,+x2apic -enable-kvm  -kernel
> ~/src/linux/arch/x86/boot/bzImage /guests/f11-auto.qcow2  -append
> 'root=/dev/sda2 console=ttyS0 console=tty0' -snapshot -serial stdio

I cannot reproduce this.  I am at commit a7d7a143d0b4c, in case that
makes a difference.

There are some things in your dmesg that look quite strange to me, though.

You have "--smp 2" above, but in your dmesg I see the following:

[0.00] setup_percpu: NR_CPUS:4 nr_cpumask_bits:4 nr_cpu_ids:1 nr_node_ids:1

So your run somehow only has one CPU.  RCU agrees that there is only
one CPU:

[0.00] Preemptible hierarchical RCU implementation.
[0.00]  RCU debugfs-based tracing is enabled.
[0.00]  RCU lockdep checking is enabled.
[0.00]  Additional per-CPU info printed with stalls.
[0.00]  RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=1.
[0.00]  Offload RCU callbacks from all CPUs
[0.00]  Offload RCU callbacks from CPUs: 0.
[0.00] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[0.00] NO_HZ: Full dynticks CPUs: 1-3.

But NO_HZ thinks that there are four.  This appears to be due to NO_HZ
looking at the compile-time constants, and I doubt that this would cause
a problem.  But if there really is a CPU 1 that RCU doesn't know about,
and it queues a callback, that callback will never be invoked, and you
could easily see hangs.

Given that your .config says CONFIG_NR_CPUS=4 and your qemu says "--smp 2",
why does nr_cpu_ids think that there is only one CPU?  Are you running
this on a non-x86_64 CPU so that qemu only does UP or some such?

The following is what I get (and what I would expect) with that setup:

[0.00] Hierarchical RCU implementation.
[0.00]  RCU debugfs-based tracing is enabled.
[0.00]  RCU lockdep checking is enabled.
[0.00]  Additional per-CPU info printed with stalls.
[0.00]  RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
[0.00]  Offload RCU callbacks from all CPUs
[0.00]  Offload RCU callbacks from CPUs: 0-1.
[0.00] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[0.00] NO_HZ: Full dynticks CPUs: 1-3.

So whatever did you do with CPU 1?  ;-)

Of course, if I tell qemu "--smp 1" instead of "--smp 2", then RCU thinks
that there is only one CPU:

[0.00] Hierarchical RCU implementation.
[0.00]  RCU debugfs-based tracing is enabled.
[0.00]  RCU lockdep checking is enabled.
[0.00]  Additional per-CPU info printed with stalls.
[0.00]  RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=1.
[0.00]  Offload RCU ca

Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-08 Thread Amit Shah

On Friday 11 July 2014 07:05 PM, Paul E. McKenney wrote:

From: "Paul E. McKenney" 

An 80-CPU system with a context-switch-heavy workload can require so
many NOCB kthread wakeups that the RCU grace-period kthreads spend several
tens of percent of a CPU just awakening things.  This clearly will not
scale well: If you add enough CPUs, the RCU grace-period kthreads would
get behind, increasing grace-period latency.

To avoid this problem, this commit divides the NOCB kthreads into leaders
and followers, where the grace-period kthreads awaken the leaders each of
whom in turn awakens its followers.  By default, the number of groups of
kthreads is the square root of the number of CPUs, but this default may
be overridden using the rcutree.rcu_nocb_leader_stride boot parameter.
This reduces the number of wakeups done per grace period by the RCU
grace-period kthread by the square root of the number of CPUs, but of
course by shifting those wakeups to the leaders.  In addition, because
the leaders do grace periods on behalf of their respective followers,
the number of wakeups of the followers decreases by up to a factor of two.
Instead of being awakened once when new callbacks arrive and again
at the end of the grace period, the followers are awakened only at
the end of the grace period.

For a numerical example, in a 4096-CPU system, the grace-period kthread
would awaken 64 leaders, each of which would awaken its 63 followers
at the end of the grace period.  This compares favorably with the 79
wakeups for the grace-period kthread on an 80-CPU system.

Reported-by: Rik van Riel 
Signed-off-by: Paul E. McKenney 


This patch causes KVM guest boot to not proceed after a while.  .config 
is attached, and boot messages are appended.  This commit was pointed 
to by bisect, and reverting on current master (while addressing a 
trivial conflict) makes the boot work again.


The qemu cmdline is

./x86_64-softmmu/qemu-system-x86_64 -m 512 -smp 2 -cpu 
host,+kvmclock,+x2apic -enable-kvm  -kernel 
~/src/linux/arch/x86/boot/bzImage /guests/f11-auto.qcow2  -append 
'root=/dev/sda2 console=ttyS0 console=tty0' -snapshot -serial stdio


Using qemu.git.

Rik suggested collecting qemu stack traces, here they are:

$ pgrep qemu
10587
$ cat /proc/10587/stack
[] poll_schedule_timeout+0x49/0x70
[] do_sys_poll+0x442/0x560
[] SyS_ppoll+0x1b3/0x1d0
[] system_call_fastpath+0x16/0x1b
[] 0x

$ cat /proc/10587/task/105
10587/ 10589/ 10590/ 10592/


$ cat /proc/10587/task/*/stack
[] poll_schedule_timeout+0x49/0x70
[] do_sys_poll+0x442/0x560
[] SyS_ppoll+0x1b3/0x1d0
[] system_call_fastpath+0x16/0x1b
[] 0x
[] kvm_vcpu_block+0x7d/0xd0 [kvm]
[] kvm_arch_vcpu_ioctl_run+0x11c/0x1180 [kvm]
[] kvm_vcpu_ioctl+0x2aa/0x5a0 [kvm]
[] do_vfs_ioctl+0x2e0/0x4a0
[] SyS_ioctl+0x81/0xa0
[] system_call_fastpath+0x16/0x1b
[] 0x
[] kvm_vcpu_block+0x7d/0xd0 [kvm]
[] kvm_arch_vcpu_ioctl_run+0x11c/0x1180 [kvm]
[] kvm_vcpu_ioctl+0x2aa/0x5a0 [kvm]
[] do_vfs_ioctl+0x2e0/0x4a0
[] SyS_ioctl+0x81/0xa0
[] system_call_fastpath+0x16/0x1b
[] 0x
[] poll_schedule_timeout+0x49/0x70
[] do_sys_poll+0x442/0x560
[] SyS_poll+0x74/0x110
[] system_call_fastpath+0x16/0x1b
[] 0x


[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 3.16.0-rc1+ (a...@grmbl.mre) (gcc version 4.8.3 20140624 (Red Hat 4.8.3-1) (GCC) ) #71 SMP PREEMPT Thu Aug 7 21:30:26 IST 2014

[0.00] Command line: root=/dev/sda2 console=ttyS0 console=tty0
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
[0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x1ffd] usable
[0.00] BIOS-e820: [mem 0x1ffe-0x1fff] reserved
[0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] reserved
[0.00] BIOS-e820: [mem 0xfffc-0x] reserved

[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.8 present.
[0.00] Hypervisor detected: KVM
[0.00] AGP: No AGP bridge found
[0.00] e820: last_pfn = 0x1ffe0 max_arch_pfn = 0x4
[0.00] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106

[0.00] init_memory_mapping: [mem 0x-0x000f]
[0.00] init_memory_mapping: [mem 0x1f80-0x1f9f]
[0.00] init_memory_mapping: [mem 0x1c00-0x1f7f]
[0.00] init_memory_mapping: [mem 0x0010-0x1bff]
[0.00] init_memory_mapping: [mem 0x1fa0-0x1ffd]
[0.00] RAMDISK: [mem 0x1fa2e000-0x1ffe]
[0.00] Allocated new RAMDISK: [mem 0x1f342000-0x1f903645]
[0.00] Move RAMDISK from [mem 0x1fa2e000-0x1ffef645] to [mem 0x1f342000-0x1f903645]

[0.00] ACPI: Early ta

[PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-07-11 Thread Paul E. McKenney
From: "Paul E. McKenney" 

An 80-CPU system with a context-switch-heavy workload can require so
many NOCB kthread wakeups that the RCU grace-period kthreads spend several
tens of percent of a CPU just awakening things.  This clearly will not
scale well: If you add enough CPUs, the RCU grace-period kthreads would
get behind, increasing grace-period latency.

To avoid this problem, this commit divides the NOCB kthreads into leaders
and followers, where the grace-period kthreads awaken the leaders each of
whom in turn awakens its followers.  By default, the number of groups of
kthreads is the square root of the number of CPUs, but this default may
be overridden using the rcutree.rcu_nocb_leader_stride boot parameter.
This reduces the number of wakeups done per grace period by the RCU
grace-period kthread by the square root of the number of CPUs, but of
course by shifting those wakeups to the leaders.  In addition, because
the leaders do grace periods on behalf of their respective followers,
the number of wakeups of the followers decreases by up to a factor of two.
Instead of being awakened once when new callbacks arrive and again
at the end of the grace period, the followers are awakened only at
the end of the grace period.

For a numerical example, in a 4096-CPU system, the grace-period kthread
would awaken 64 leaders, each of which would awaken its 63 followers
at the end of the grace period.  This compares favorably with the 79
wakeups for the grace-period kthread on an 80-CPU system.

Reported-by: Rik van Riel 
Signed-off-by: Paul E. McKenney 
---
 Documentation/kernel-parameters.txt |   7 +
 kernel/rcu/tree.h   |  28 +++-
 kernel/rcu/tree_plugin.h| 252 ++--
 3 files changed, 244 insertions(+), 43 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 910c3829f81d..770662c42c9f 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2802,6 +2802,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
quiescent states.  Units are jiffies, minimum
value is one, and maximum value is HZ.
 
+   rcutree.rcu_nocb_leader_stride= [KNL]
+   Set the number of NOCB kthreads in each group, which
+   defaults to the square root of the number of
+   CPUs.  Larger numbers reduce the wakeup overhead
+   on the RCU grace-period kthreads, but increase
+   that same overhead on each group's leader.
+
rcutree.qhimark= [KNL]
Set threshold of queued RCU callbacks beyond which
batch limiting is disabled.
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 0f69a79c5b7d..e996d1e53c84 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -334,11 +334,29 @@ struct rcu_data {
struct rcu_head **nocb_tail;
atomic_long_t nocb_q_count; /* # CBs waiting for kthread */
atomic_long_t nocb_q_count_lazy; /*  (approximate). */
+   struct rcu_head *nocb_follower_head; /* CBs ready to invoke. */
+   struct rcu_head **nocb_follower_tail;
+   atomic_long_t nocb_follower_count; /* # CBs ready to invoke. */
+   atomic_long_t nocb_follower_count_lazy; /*  (approximate). */
int nocb_p_count;   /* # CBs being invoked by kthread */
int nocb_p_count_lazy;  /*  (approximate). */
wait_queue_head_t nocb_wq;  /* For nocb kthreads to sleep on. */
struct task_struct *nocb_kthread;
bool nocb_defer_wakeup; /* Defer wakeup of nocb_kthread. */
+
+   /* The following fields are used by the leader, hence own cacheline. */
+   struct rcu_head *nocb_gp_head ____cacheline_internodealigned_in_smp;
+   /* CBs waiting for GP. */
+   struct rcu_head **nocb_gp_tail;
+   long nocb_gp_count;
+   long nocb_gp_count_lazy;
+   bool nocb_leader_wake;  /* Is the nocb leader thread awake? */
+   struct rcu_data *nocb_next_follower;
+   /* Next follower in wakeup chain. */
+
+   /* The following fields are used by the follower, hence new cacheline. */
+   struct rcu_data *nocb_leader ____cacheline_internodealigned_in_smp;
+   /* Leader CPU takes GP-end wakeups. */
 #endif /* #ifdef CONFIG_RCU_NOCB_CPU */
 
/* 8) RCU CPU stall data. */
@@ -587,8 +605,14 @@ static bool rcu_nohz_full_cpu(struct rcu_state *rsp);
 /* Sum up queue lengths for tracing. */
 static inline void rcu_nocb_q_lengths(struct rcu_data *rdp, long *ql, long *qll)
 {
-   *ql = atomic_long_read(&rdp->nocb_q_count) + rdp->nocb_p_count;
-   *qll = atomic_long_read(&rdp->nocb_q_count_lazy) + rdp->nocb_p_count_lazy;
+   *ql = atomic_long_read(&rdp->nocb_q_co