Hi Paul,

I tried to debug this problem and found that the following change works well for both problem scenarios:

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 85c5a88..dbc14a7 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2172,7 +2172,7 @@ static int rcu_nocb_kthread(void *arg)
 			if (__rcu_reclaim(rdp->rsp->name, list))
 				cl++;
 			c++;
-			local_bh_enable();
+			_local_bh_enable();
 			cond_resched_rcu_qs();
 			list = next;
 		}

cond_resched_rcu_qs() will already process pending softirqs, so there is no need for local_bh_enable() to process them a second time here, and avoiding that second pass prevents the OOM when a flood of packets arrives. What do you think? Please give me some suggestions.

Thanks.
Ding
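(Editorial aside: for reference, the behavioral difference between the two
calls, sketched loosely from kernel/softirq.c of the 3.10 era -- details
vary by kernel version. local_bh_enable() runs any pending softirqs on the
spot, while _local_bh_enable() only drops the softirq count, and it warns
unless interrupts are already disabled, which is the WARN_ON_ONCE() that
comes up later in the thread.)

	/* Sketch only, loosely following kernel/softirq.c circa v3.10. */

	static void __local_bh_enable(unsigned int cnt)
	{
		/* Callers of _local_bh_enable() must have irqs disabled,
		 * otherwise this warning fires. */
		WARN_ON_ONCE(!irqs_disabled());
		sub_preempt_count(cnt);
	}

	/* Re-enable bh processing WITHOUT running pending softirqs. */
	void _local_bh_enable(void)
	{
		__local_bh_enable(SOFTIRQ_DISABLE_OFFSET);
	}

	/* Re-enable bh processing and run pending softirqs right here;
	 * in rcu_nocb_kthread() this is where NAPI polling can pull in
	 * yet more packets. */
	void local_bh_enable(void)
	{
		WARN_ON_ONCE(in_irq() || irqs_disabled());
		sub_preempt_count(SOFTIRQ_DISABLE_OFFSET - 1);
		if (unlikely(!in_interrupt() && local_softirq_pending()))
			do_softirq();
		dec_preempt_count();
	}

Under that reading, the proposed change trades running softirqs inline in
the rcuos kthread for deferring them to ksoftirqd.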
On 2016/11/21 9:28, Ding Tianhong wrote:
>
>
> On 2016/11/21 8:13, Paul E. McKenney wrote:
>> On Sat, Nov 19, 2016 at 12:22:09AM -0800, Paul E. McKenney wrote:
>>> On Sat, Nov 19, 2016 at 03:50:32PM +0800, Ding Tianhong wrote:
>>>>
>>>>
>>>> On 2016/11/18 21:01, Paul E. McKenney wrote:
>>>>> On Fri, Nov 18, 2016 at 08:40:09PM +0800, Ding Tianhong wrote:
>>>>>> Commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
>>>>>> introduces a new problem: when a flood of abnormal IP packets arrives,
>>>>>> it may cause OOM and break the kernel, like this:
>>>>>>
>>>>>> [ 79.441538] mlx4_en: eth5: Leaving promiscuous mode steering mode:2
>>>>>> [ 100.067032] ksoftirqd/0: page allocation failure: order:0, mode:0x120
>>>>>> [ 100.067038] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G OE ----V------- 3.10.0-327.28.3.28.x86_64 #1
>>>>>> [ 100.067039] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-20161018_184732-HGH1000003483 04/01/2014
>>>>>> [ 100.067041] 0000000000000120 00000000b080d798 ffff8802afd5b968 ffffffff81638cb9
>>>>>> [ 100.067045] ffff8802afd5b9f8 ffffffff81171380 0000000000000010 0000000000000000
>>>>>> [ 100.067048] ffff8802befd8000 00000000ffffffff 0000000000000001 00000000b080d798
>>>>>> [ 100.067050] Call Trace:
>>>>>> [ 100.067057] [<ffffffff81638cb9>] dump_stack+0x19/0x1b
>>>>>> [ 100.067062] [<ffffffff81171380>] warn_alloc_failed+0x110/0x180
>>>>>> [ 100.067066] [<ffffffff81175b16>] __alloc_pages_nodemask+0x9b6/0xba0
>>>>>> [ 100.067070] [<ffffffff8151e400>] ? skb_add_rx_frag+0x90/0xb0
>>>>>> [ 100.067075] [<ffffffff811b6fba>] alloc_pages_current+0xaa/0x170
>>>>>> [ 100.067080] [<ffffffffa06b9be0>] mlx4_alloc_pages.isra.24+0x40/0x170 [mlx4_en]
>>>>>> [ 100.067083] [<ffffffffa06b9dec>] mlx4_en_alloc_frags+0xdc/0x220 [mlx4_en]
>>>>>> [ 100.067086] [<ffffffff8152eeb8>] ? __netif_receive_skb+0x18/0x60
>>>>>> [ 100.067088] [<ffffffff8152ef40>] ? netif_receive_skb+0x40/0xc0
>>>>>> [ 100.067092] [<ffffffffa06bb521>] mlx4_en_process_rx_cq+0x5f1/0xec0 [mlx4_en]
>>>>>> [ 100.067095] [<ffffffff8131027d>] ? list_del+0xd/0x30
>>>>>> [ 100.067098] [<ffffffff8152c90f>] ? __napi_complete+0x1f/0x30
>>>>>> [ 100.067101] [<ffffffffa06bbeef>] mlx4_en_poll_rx_cq+0x9f/0x170 [mlx4_en]
>>>>>> [ 100.067103] [<ffffffff8152f372>] net_rx_action+0x152/0x240
>>>>>> [ 100.067107] [<ffffffff81084d1f>] __do_softirq+0xef/0x280
>>>>>> [ 100.067109] [<ffffffff81084ee0>] run_ksoftirqd+0x30/0x50
>>>>>> [ 100.067114] [<ffffffff810ae93f>] smpboot_thread_fn+0xff/0x1a0
>>>>>> [ 100.067117] [<ffffffff8163e269>] ? schedule+0x29/0x70
>>>>>> [ 100.067120] [<ffffffff810ae840>] ? lg_double_unlock+0x90/0x90
>>>>>> [ 100.067122] [<ffffffff810a5d4f>] kthread+0xcf/0xe0
>>>>>> [ 100.067124] [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140
>>>>>> [ 100.067127] [<ffffffff81649198>] ret_from_fork+0x58/0x90
>>>>>> [ 100.067129] [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140
>>>>>>
>>>>>> ================================cut here=====================================
>>>>>>
>>>>>> The reason is that the abnormal IP packets are received into the net
>>>>>> stack and finally dropped by dst_release(), which uses the rcuos
>>>>>> callback-offload kthread to free each packet, but cond_resched_rcu_qs()
>>>>>> calls do_softirq() and so receives more and more abnormal IP packets,
>>>>>> which are thrown into the RCU callbacks again later. The number of
>>>>>> received packets is much greater than the number of packets freed, so
>>>>>> memory is exhausted and we hit OOM. Therefore, not processing any
>>>>>> pending softirqs in the rcuos callback-offload kthread is a more
>>>>>> effective solution.
>>>>>
>>>>> OK, but we could still have softirqs processed by the grace-period kthread
>>>>> as a result of any number of other events.  So this change might reduce
>>>>> the probability of this problem, but it doesn't eliminate it.
>>>>>
>>>>> How huge are these huge IP packets?  Is the underlying problem that they
>>>>> are too large to use the memory-allocator fastpaths?
>>>>>
>>>>>							Thanx, Paul
>>>>>
>>>>
>>>> I use a 40G Mellanox NIC to receive packets, and the test engine can send
>>>> abnormal MAC packets and abnormal IP packets at full speed.
>>>>
>>>> The abnormal MAC packets are dropped at a low level and never reach the
>>>> net stack, but the abnormal IP packets trigger this problem: every packet
>>>> first looks like a new dst and is released later by dst_release() because
>>>> it is meaningless.
>>>>
>>>> dst_release->call_rcu(&dst->rcu_head, dst_destroy_rcu);
>>>>
>>>> So no packet is freed until the rcuos callback-offload kthread processes
>>>> it, and this becomes an infinite loop when packets keep flooding in,
>>>> because do_softirq() loads more and more packets onto the rcuos
>>>> processing kthread. I still could not find a better way to fix this.
>>>> By the way, it is hard to say the driver overuses the memory-allocator
>>>> fastpaths; there is no memory leak, and ixgbe may hit the same problem
>>>> too.
>>
>> And following up on my fastpath point -- from what I can see, one
>> big effect of the large invalid packets is that they push processing
>> off of a number of fastpaths.  If these packets could be rejected with
>> less per-packet processing, I bet that things would work much better.
>>
>>							Thanx, Paul
>
> Yes, and I found that the WARN_ON_ONCE(!irqs_disabled()) is triggered if
> _local_bh_enable() is used here, so I think we could ask Eric and David
> for help on how to reject the flood of packets.
>
> Thanks
> Ding
>
>>
>>> The overall effect of these two patches is to move from enabling bh
>>> (and processing recent softirqs) to enabling bh without processing
>>> recent softirqs.  Is this really the correct way to solve this problem?
>>> What about this solution is avoiding re-introducing the original
>>> softlockups?  Have you talked to the networking guys about this issue?
>>>
>>>							Thanx, Paul
>>>
>>>> Thanks.
>>>> Ding
>>>>
>>>>
>>>>>> Fixes: bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
>>>>>> Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
>>>>>> ---
>>>>>>  kernel/rcu/tree_plugin.h | 3 +--
>>>>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>>>>>> index 85c5a88..760c3b5 100644
>>>>>> --- a/kernel/rcu/tree_plugin.h
>>>>>> +++ b/kernel/rcu/tree_plugin.h
>>>>>> @@ -2172,8 +2172,7 @@ static int rcu_nocb_kthread(void *arg)
>>>>>>  			if (__rcu_reclaim(rdp->rsp->name, list))
>>>>>>  				cl++;
>>>>>>  			c++;
>>>>>> -			local_bh_enable();
>>>>>> -			cond_resched_rcu_qs();
>>>>>> +			_local_bh_enable();
>>>>>>  			list = next;
>>>>>>  		}
>>>>>>  		trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
>>>>>> --
>>>>>> 1.9.0
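(Editorial aside: for reference, the dst_release() path discussed above,
abridged from net/core/dst.c of the same era -- details vary by kernel
version. Each dropped packet whose dst is uncached queues one RCU callback,
and on a CONFIG_RCU_NOCB_CPU kernel those callbacks are invoked from the
rcuos kthread, which is why its callback list can grow faster than it
drains.)

	/* Abridged sketch of net/core/dst.c circa v3.10/v4.x. */

	void dst_release(struct dst_entry *dst)
	{
		if (dst) {
			int newrefcnt = atomic_dec_return(&dst->__refcnt);

			WARN_ON(newrefcnt < 0);
			/* One RCU callback per dropped packet: with nocb
			 * offload these pile up on the rcuos kthread. */
			if (unlikely(dst->flags & DST_NOCACHE) && !newrefcnt)
				call_rcu(&dst->rcu_head, dst_destroy_rcu);
		}
	}

	static void dst_destroy_rcu(struct rcu_head *head)
	{
		struct dst_entry *dst =
			container_of(head, struct dst_entry, rcu_head);

		dst = dst_destroy(dst);	/* finally frees the dst */
		if (dst)
			__dst_free(dst);
	}

Nothing on this path limits how fast callbacks are enqueued, so the enqueue
and reclaim rates can diverge, which matches the OOM signature in the log
above.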