[PATCH 4.8 14/35] rcu: Fix soft lockup for rcu_nocb_kthread

2016-12-06 Thread Greg Kroah-Hartman
4.8-stable review patch.  If anyone has any objections, please let me know.

--

From: Ding Tianhong 

commit bedc1969150d480c462cdac320fa944b694a7162 upstream.

Carrying out the following steps results in a softlockup in the
RCU callback-offload (rcuo) kthreads:

1. Connect to ixgbevf, and set the speed to 10Gb/s.
2. Use ifconfig to bring the nic up and down repeatedly.

[  317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[  368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
[  368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  368.106005] task: 88057dd8a220 ti: 88057dd9c000 task.ti: 88057dd9c000
[  368.106005] RIP: 0010:[]  [] fib_table_lookup+0x14/0x390
[  368.106005] RSP: 0018:88061fc83ce8  EFLAGS: 0286
[  368.106005] RAX: 0001 RBX: 020155c0 RCX: 0001
[  368.106005] RDX: 88061fc83d50 RSI: 88061fc83d70 RDI: 880036d11a00
[  368.106005] RBP: 88061fc83d08 R08: 0001 R09: 
[  368.106005] R10: 880036d11a00 R11: 819e0900 R12: 88061fc83c58
[  368.106005] R13: 816154dd R14: 88061fc83d08 R15: 020155c0
[  368.106005] FS:  () GS:88061fc8() knlGS:
[  368.106005] CS:  0010 DS:  ES:  CR0: 80050033
[  368.106005] CR2: 7f8c2aee9c40 CR3: 00057b222000 CR4: 000407e0
[  368.106005] DR0:  DR1:  DR2: 
[  368.106005] DR3:  DR6: 0ff0 DR7: 0400
[  368.106005] Stack:
[  368.106005]  01c0 88057b766000 8802e380b000 88057af03e00
[  368.106005]  88061fc83dc0 815349a6 88061fc83d40 814ee146
[  368.106005]  8802e380af00 e380af00 819e0900 020155c001c0
[  368.106005] Call Trace:
[  368.106005]  
[  368.106005]
[  368.106005]  [] ip_route_input_noref+0x516/0xbd0
[  368.106005]  [] ? skb_release_data+0xd6/0x110
[  368.106005]  [] ? kfree_skb+0x3a/0xa0
[  368.106005]  [] ip_rcv_finish+0x29f/0x350
[  368.106005]  [] ip_rcv+0x234/0x380
[  368.106005]  [] __netif_receive_skb_core+0x676/0x870
[  368.106005]  [] __netif_receive_skb+0x18/0x60
[  368.106005]  [] process_backlog+0xae/0x180
[  368.106005]  [] net_rx_action+0x152/0x240
[  368.106005]  [] __do_softirq+0xef/0x280
[  368.106005]  [] call_softirq+0x1c/0x30
[  368.106005]  
[  368.106005]
[  368.106005]  [] do_softirq+0x65/0xa0
[  368.106005]  [] local_bh_enable+0x94/0xa0
[  368.106005]  [] rcu_nocb_kthread+0x232/0x370
[  368.106005]  [] ? wake_up_bit+0x30/0x30
[  368.106005]  [] ? rcu_start_gp+0x40/0x40
[  368.106005]  [] kthread+0xcf/0xe0
[  368.106005]  [] ? kthread_create_on_node+0x140/0x140
[  368.106005]  [] ret_from_fork+0x58/0x90
[  368.106005]  [] ? kthread_create_on_node+0x140/0x140

==cut here==

It turns out that the rcuos callback-offload kthread is busy processing
a very large quantity of RCU callbacks, and it is not relinquishing the
CPU while doing so.  This commit therefore adds a cond_resched_rcu_qs()
within the loop to allow other tasks to run.

Signed-off-by: Ding Tianhong 
[ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]
Signed-off-by: Paul E. McKenney 
Cc: Dhaval Giani 
Signed-off-by: Greg Kroah-Hartman 

---
 kernel/rcu/tree_plugin.h |    1 +
 1 file changed, 1 insertion(+)

--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2173,6 +2173,7 @@ static int rcu_nocb_kthread(void *arg)
 				cl++;
 			c++;
 			local_bh_enable();
+			cond_resched_rcu_qs();
 			list = next;
 		}
 		trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
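
For readers unfamiliar with the pattern, the following is a rough user-space
sketch of the idea behind the fix, not the kernel code itself.  It walks a
long callback list and offers to give up the CPU after every callback.  The
names cb_node, yield_point(), process_callbacks() and run_demo() are
hypothetical and exist only for this illustration.  POSIX sched_yield()
stands in for cond_resched_rcu_qs(); the kernel helper also informs RCU of a
quiescent state, which this sketch does not model.

```c
#include <sched.h>   /* sched_yield(), POSIX */
#include <stddef.h>

/* Hypothetical user-space stand-in for a queued RCU callback. */
struct cb_node {
	struct cb_node *next;
	void (*func)(struct cb_node *node);
};

/* Number of times the loop offered to give up the CPU. */
static unsigned long yield_points;

/* Number of callbacks that have actually run. */
static unsigned long invoked;

/* Assumed analogue of cond_resched_rcu_qs(): let other threads run. */
static void yield_point(void)
{
	yield_points++;
	sched_yield();
}

/*
 * Invoke every callback on the list, yielding after each one so that a
 * very long list cannot monopolize the CPU.  Returns the number of
 * callbacks invoked.
 */
static unsigned long process_callbacks(struct cb_node *list)
{
	unsigned long c = 0;

	while (list) {
		/* Fetch the successor first; the callback may reuse the node. */
		struct cb_node *next = list->next;

		list->func(list);
		c++;
		yield_point();
		list = next;
	}
	return c;
}

/* Trivial callback that only counts its own invocations. */
static void count_cb(struct cb_node *node)
{
	(void)node;
	invoked++;
}

/* Chain n nodes from caller-provided storage and process the list. */
static unsigned long run_demo(struct cb_node *nodes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		nodes[i].func = count_cb;
		nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : NULL;
	}
	return process_callbacks(n ? &nodes[0] : NULL);
}
```

Calling run_demo() on an array of 1000 nodes invokes 1000 callbacks and
passes through yield_point() once per callback.  Without that call the loop
would run to completion without ever offering the CPU to another thread,
which is the behaviour the soft-lockup report above describes.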



