Re: regression with napi/softirq ?
On Thu, Jul 18, 2019 at 4:08 PM Eric Dumazet wrote:
>
> On 7/18/19 2:55 PM, Sudip Mukherjee wrote:
> > Thanks Eric. But there is no improvement in delay between
> > softirq_raise and softirq_entry with this change.
> > But moving to a later kernel (linus master branch? ) like Thomas has
> > said in the other mail might be difficult atm. I can definitely
> > move to v4.14.133 if that helps. Thomas ?
>
> If you are tracking max latency then I guess you have to tweak
> SOFTIRQ_NOW_MASK to include NET_RX_SOFTIRQ.
>
> The patch I gave earlier would only lower the probability of events, not
> completely get rid of them.
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 0427a86743a46b7e1891f7b6c1ff585a8a1695f5..302046dd8d7e6740e466c422954f22565fe19e69 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -81,7 +81,7 @@ static void wakeup_softirqd(void)
>   * right now. Let ksoftirqd handle this at its own rate, to get fairness,
>   * unless we're doing some of the synchronous softirqs.
>   */
> -#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ))
> +#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ) | (1 << NET_RX_SOFTIRQ))
>  static bool ksoftirqd_running(unsigned long pending)
>  {
>         struct task_struct *tsk = __this_cpu_read(ksoftirqd);

Thanks Eric, this looks better than the hack that tglx gave. :) Though the
hack was good for testing.

But my original problem was a drop in network packets, and till now I was
thinking that the delay in processing the softirq was causing it. With the
hack tglx gave, the latency has decreased but my problem is still there.
So, I am looking into it again now.

Thanks again for the patch.

--
Regards
Sudip
Re: regression with napi/softirq ?
On 7/18/19 2:55 PM, Sudip Mukherjee wrote:
> Thanks Eric. But there is no improvement in delay between
> softirq_raise and softirq_entry with this change.
> But moving to a later kernel (linus master branch? ) like Thomas has
> said in the other mail might be difficult atm. I can definitely
> move to v4.14.133 if that helps. Thomas ?

If you are tracking max latency then I guess you have to tweak
SOFTIRQ_NOW_MASK to include NET_RX_SOFTIRQ.

The patch I gave earlier would only lower the probability of events, not
completely get rid of them.

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 0427a86743a46b7e1891f7b6c1ff585a8a1695f5..302046dd8d7e6740e466c422954f22565fe19e69 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -81,7 +81,7 @@ static void wakeup_softirqd(void)
  * right now. Let ksoftirqd handle this at its own rate, to get fairness,
  * unless we're doing some of the synchronous softirqs.
  */
-#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ))
+#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ) | (1 << NET_RX_SOFTIRQ))
 static bool ksoftirqd_running(unsigned long pending)
 {
        struct task_struct *tsk = __this_cpu_read(ksoftirqd);
Re: regression with napi/softirq ?
On Thu, Jul 18, 2019 at 12:42 PM Eric Dumazet wrote:
>
> On 7/18/19 1:18 PM, Sudip Mukherjee wrote:
> > Hi Eric,
> >
> > On Thu, Jul 18, 2019 at 7:58 AM Eric Dumazet wrote:
> >>
> >> On 7/17/19 11:52 PM, Thomas Gleixner wrote:
> >>> Sudip,
> >>>
> >>> On Wed, 17 Jul 2019, Sudip Mukherjee wrote:
> >>>> On Wed, Jul 17, 2019 at 9:53 PM Thomas Gleixner wrote:
> >>>>> You can hack ksoftirqd_running() to return always false to avoid this, but
> >>>>> that might cause application starvation and a huge packet buffer backlog
> >>>>> when the amount of incoming packets makes the CPU do nothing else than
> >>>>> softirq processing.
> >>>>
> >>>> I tried that now, it is better but still not as good as v3.8.
> >>>> Now I am getting 375.9usec as the maximum time between raising the softirq
> >>>> and it starting to execute, and packet drops are still there.
> >>>>
> >>>> And just a thought, do you think there should be a CONFIG_ option for
> >>>> this feature of ksoftirqd_running() so that it can be disabled if needed
> >>>> by users like us?
> >>>
> >>> If at all then a sysctl to allow runtime control.
> >>
> >> ksoftirqd might be spuriously scheduled from the tx path, when
> >> __qdisc_run() also reacts to need_resched().
> >>
> >> By raising NET_TX while we are processing NET_RX (say we send a TCP ACK packet
> >> in response to an incoming packet), we force __do_softirq() to perform
> >> another loop, but before doing another round it will also check need_resched()
> >> and eventually call wakeup_softirqd().
> >>
> >> I wonder if the following patch makes any difference.
> >>
> >> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> >> index 11c03cf4aa74b44663c74e0e3284140b0c75d9c4..ab736e974396394ae6ba409868aaea56a50ad57b 100644
> >> --- a/net/sched/sch_generic.c
> >> +++ b/net/sched/sch_generic.c
> >> @@ -377,6 +377,8 @@ void __qdisc_run(struct Qdisc *q)
> >>         int packets;
> >>
> >>         while (qdisc_restart(q, &packets)) {
> >> +               if (qdisc_is_empty(q))
> >> +                       break;
> >
> > unfortunately it is v4.14.55 and qdisc_is_empty() is not yet introduced.
> > And I can not backport 28cff537ef2e ("net: sched: add empty status
> > flag for NOLOCK qdisc") as TCQ_F_NOLOCK is not there. :(
>
> On old kernels, you can simply use
>
> static inline bool qdisc_is_empty(struct Qdisc *q)
> {
>         return !qdisc_qlen(q);
> }

Thanks Eric. But there is no improvement in delay between
softirq_raise and softirq_entry with this change.
But moving to a later kernel (linus master branch? ) like Thomas has
said in the other mail might be difficult atm. I can definitely
move to v4.14.133 if that helps. Thomas ?

--
Regards
Sudip
Re: regression with napi/softirq ?
On Thu, 18 Jul 2019, Sudip Mukherjee wrote:
> On Thu, Jul 18, 2019 at 7:58 AM Eric Dumazet wrote:
> > ksoftirqd might be spuriously scheduled from the tx path, when
> > __qdisc_run() also reacts to need_resched().
> >
> > By raising NET_TX while we are processing NET_RX (say we send a TCP ACK packet
> > in response to an incoming packet), we force __do_softirq() to perform
> > another loop, but before doing another round it will also check need_resched()
> > and eventually call wakeup_softirqd().
> >
> > I wonder if the following patch makes any difference.
> >
> > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> > index 11c03cf4aa74b44663c74e0e3284140b0c75d9c4..ab736e974396394ae6ba409868aaea56a50ad57b 100644
> > --- a/net/sched/sch_generic.c
> > +++ b/net/sched/sch_generic.c
> > @@ -377,6 +377,8 @@ void __qdisc_run(struct Qdisc *q)
> >         int packets;
> >
> >         while (qdisc_restart(q, &packets)) {
> > +               if (qdisc_is_empty(q))
> > +                       break;
>
> unfortunately it is v4.14.55 and qdisc_is_empty() is not yet introduced.
> And I can not backport 28cff537ef2e ("net: sched: add empty status
> flag for NOLOCK qdisc") as TCQ_F_NOLOCK is not there. :(

Then please run the test on a current kernel. Backports can be discussed
once the problem is understood.

Thanks,

	tglx
Re: regression with napi/softirq ?
On 7/18/19 1:18 PM, Sudip Mukherjee wrote:
> Hi Eric,
>
> On Thu, Jul 18, 2019 at 7:58 AM Eric Dumazet wrote:
>>
>> On 7/17/19 11:52 PM, Thomas Gleixner wrote:
>>> Sudip,
>>>
>>> On Wed, 17 Jul 2019, Sudip Mukherjee wrote:
>>>> On Wed, Jul 17, 2019 at 9:53 PM Thomas Gleixner wrote:
>>>>> You can hack ksoftirqd_running() to return always false to avoid this, but
>>>>> that might cause application starvation and a huge packet buffer backlog
>>>>> when the amount of incoming packets makes the CPU do nothing else than
>>>>> softirq processing.
>>>>
>>>> I tried that now, it is better but still not as good as v3.8.
>>>> Now I am getting 375.9usec as the maximum time between raising the softirq
>>>> and it starting to execute, and packet drops are still there.
>>>>
>>>> And just a thought, do you think there should be a CONFIG_ option for
>>>> this feature of ksoftirqd_running() so that it can be disabled if needed
>>>> by users like us?
>>>
>>> If at all then a sysctl to allow runtime control.
>>
>> ksoftirqd might be spuriously scheduled from the tx path, when
>> __qdisc_run() also reacts to need_resched().
>>
>> By raising NET_TX while we are processing NET_RX (say we send a TCP ACK packet
>> in response to an incoming packet), we force __do_softirq() to perform
>> another loop, but before doing another round it will also check need_resched()
>> and eventually call wakeup_softirqd().
>>
>> I wonder if the following patch makes any difference.
>>
>> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
>> index 11c03cf4aa74b44663c74e0e3284140b0c75d9c4..ab736e974396394ae6ba409868aaea56a50ad57b 100644
>> --- a/net/sched/sch_generic.c
>> +++ b/net/sched/sch_generic.c
>> @@ -377,6 +377,8 @@ void __qdisc_run(struct Qdisc *q)
>>         int packets;
>>
>>         while (qdisc_restart(q, &packets)) {
>> +               if (qdisc_is_empty(q))
>> +                       break;
>
> unfortunately it is v4.14.55 and qdisc_is_empty() is not yet introduced.
> And I can not backport 28cff537ef2e ("net: sched: add empty status
> flag for NOLOCK qdisc") as TCQ_F_NOLOCK is not there. :(

On old kernels, you can simply use

static inline bool qdisc_is_empty(struct Qdisc *q)
{
        return !qdisc_qlen(q);
}
Re: regression with napi/softirq ?
Hi Eric,

On Thu, Jul 18, 2019 at 7:58 AM Eric Dumazet wrote:
>
> On 7/17/19 11:52 PM, Thomas Gleixner wrote:
> > Sudip,
> >
> > On Wed, 17 Jul 2019, Sudip Mukherjee wrote:
> >> On Wed, Jul 17, 2019 at 9:53 PM Thomas Gleixner wrote:
> >>> You can hack ksoftirqd_running() to return always false to avoid this, but
> >>> that might cause application starvation and a huge packet buffer backlog
> >>> when the amount of incoming packets makes the CPU do nothing else than
> >>> softirq processing.
> >>
> >> I tried that now, it is better but still not as good as v3.8.
> >> Now I am getting 375.9usec as the maximum time between raising the softirq
> >> and it starting to execute, and packet drops are still there.
> >>
> >> And just a thought, do you think there should be a CONFIG_ option for
> >> this feature of ksoftirqd_running() so that it can be disabled if needed
> >> by users like us?
> >
> > If at all then a sysctl to allow runtime control.
>
> ksoftirqd might be spuriously scheduled from the tx path, when
> __qdisc_run() also reacts to need_resched().
>
> By raising NET_TX while we are processing NET_RX (say we send a TCP ACK packet
> in response to an incoming packet), we force __do_softirq() to perform
> another loop, but before doing another round it will also check need_resched()
> and eventually call wakeup_softirqd().
>
> I wonder if the following patch makes any difference.
>
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 11c03cf4aa74b44663c74e0e3284140b0c75d9c4..ab736e974396394ae6ba409868aaea56a50ad57b 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -377,6 +377,8 @@ void __qdisc_run(struct Qdisc *q)
>         int packets;
>
>         while (qdisc_restart(q, &packets)) {
> +               if (qdisc_is_empty(q))
> +                       break;

unfortunately it is v4.14.55 and qdisc_is_empty() is not yet introduced.
And I can not backport 28cff537ef2e ("net: sched: add empty status
flag for NOLOCK qdisc") as TCQ_F_NOLOCK is not there. :(

--
Regards
Sudip
Re: regression with napi/softirq ?
On 7/17/19 11:52 PM, Thomas Gleixner wrote:
> Sudip,
>
> On Wed, 17 Jul 2019, Sudip Mukherjee wrote:
>> On Wed, Jul 17, 2019 at 9:53 PM Thomas Gleixner wrote:
>>> You can hack ksoftirqd_running() to return always false to avoid this, but
>>> that might cause application starvation and a huge packet buffer backlog
>>> when the amount of incoming packets makes the CPU do nothing else than
>>> softirq processing.
>>
>> I tried that now, it is better but still not as good as v3.8.
>> Now I am getting 375.9usec as the maximum time between raising the softirq
>> and it starting to execute, and packet drops are still there.
>>
>> And just a thought, do you think there should be a CONFIG_ option for
>> this feature of ksoftirqd_running() so that it can be disabled if needed
>> by users like us?
>
> If at all then a sysctl to allow runtime control.
>
>> Can you please think of anything else that might have changed which I still
>> need to change to make the time comparable to v3.8..
>
> Something within that small range of:
>
>  63592 files changed, 13783320 insertions(+), 5155492 deletions(-)
>
> :)
>
> Seriously, that can be anything.
>
> Can you please test with Linus' head of tree and add some more
> instrumentation, so we can see what holds off softirqs from being
> processed. If the ksoftirqd enforcement is disabled, then the only reason
> can be a long lasting softirq disabled region. Tracing should tell.

ksoftirqd might be spuriously scheduled from the tx path, when
__qdisc_run() also reacts to need_resched().

By raising NET_TX while we are processing NET_RX (say we send a TCP ACK packet
in response to an incoming packet), we force __do_softirq() to perform
another loop, but before doing another round it will also check need_resched()
and eventually call wakeup_softirqd().

I wonder if the following patch makes any difference.

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 11c03cf4aa74b44663c74e0e3284140b0c75d9c4..ab736e974396394ae6ba409868aaea56a50ad57b 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -377,6 +377,8 @@ void __qdisc_run(struct Qdisc *q)
        int packets;

        while (qdisc_restart(q, &packets)) {
+               if (qdisc_is_empty(q))
+                       break;
                /*
                 * Ordered by possible occurrence: Postpone processing if
                 * 1. we've exceeded packet quota
Re: regression with napi/softirq ?
Sudip,

On Wed, 17 Jul 2019, Sudip Mukherjee wrote:
> On Wed, Jul 17, 2019 at 9:53 PM Thomas Gleixner wrote:
> > You can hack ksoftirqd_running() to return always false to avoid this, but
> > that might cause application starvation and a huge packet buffer backlog
> > when the amount of incoming packets makes the CPU do nothing else than
> > softirq processing.
>
> I tried that now, it is better but still not as good as v3.8.
> Now I am getting 375.9usec as the maximum time between raising the softirq
> and it starting to execute, and packet drops are still there.
>
> And just a thought, do you think there should be a CONFIG_ option for
> this feature of ksoftirqd_running() so that it can be disabled if needed
> by users like us?

If at all then a sysctl to allow runtime control.

> Can you please think of anything else that might have changed which I still
> need to change to make the time comparable to v3.8..

Something within that small range of:

 63592 files changed, 13783320 insertions(+), 5155492 deletions(-)

:)

Seriously, that can be anything.

Can you please test with Linus' head of tree and add some more
instrumentation, so we can see what holds off softirqs from being
processed. If the ksoftirqd enforcement is disabled, then the only reason
can be a long lasting softirq disabled region. Tracing should tell.

Thanks,

	tglx
Re: regression with napi/softirq ?
Hi,

On Wed, Jul 17, 2019 at 9:53 PM Thomas Gleixner wrote:
>
> On Wed, 17 Jul 2019, Sudip Mukherjee wrote:
> > I am using v4.14.55 on an Intel Atom based board and I am seeing network
> > packet drops frequently on wireshark logs. After lots of debugging it
> > seems that when this happens softirq is taking huge time to start after
> > it has been raised. This is a small snippet from ftrace:
> >
> >    <...>-2110    [001] dNH1 466.634916: irq_handler_entry: irq=126 name=eth0-TxRx-0
> >    <...>-2110    [001] dNH1 466.634917: softirq_raise: vec=3 [action=NET_RX]
> >    <...>-2110    [001] dNH1 466.634918: irq_handler_exit: irq=126 ret=handled
> >    ksoftirqd/1-15 [001] ..s. 466.635826: softirq_entry: vec=3 [action=NET_RX]
> >    ksoftirqd/1-15 [001] ..s. 466.635852: softirq_exit: vec=3 [action=NET_RX]
> >    ksoftirqd/1-15 [001] d.H. 466.635856: irq_handler_entry: irq=126 name=eth0-TxRx-0
> >    ksoftirqd/1-15 [001] d.H. 466.635857: softirq_raise: vec=3 [action=NET_RX]
> >    ksoftirqd/1-15 [001] d.H. 466.635858: irq_handler_exit: irq=126 ret=handled
> >    ksoftirqd/1-15 [001] ..s. 466.635860: softirq_entry: vec=3 [action=NET_RX]
> >    ksoftirqd/1-15 [001] ..s. 466.635863: softirq_exit: vec=3 [action=NET_RX]
> >
> > So, softirq was raised at 466.634917 but it started at 466.635826, almost
> > 909 usec after it was raised.
>
> This is a situation where the network softirq decided to delegate softirq
> processing to ksoftirqd. That happens when too much work is available while
> processing softirqs on return from interrupt.
>
> That means that softirq processing happens under scheduler control. So if
> there are other runnable tasks on the same CPU ksoftirqd can be delayed
> until their time slice expired. As a consequence ksoftirqd might not be
> able to catch up with the incoming packet flood and the NIC starts to drop.

Yes, and I see in the ftrace that there are many other userspace
processes getting scheduled in that time.

> You can hack ksoftirqd_running() to return always false to avoid this, but
> that might cause application starvation and a huge packet buffer backlog
> when the amount of incoming packets makes the CPU do nothing else than
> softirq processing.

I tried that now, it is better but still not as good as v3.8.
Now I am getting 375.9usec as the maximum time between raising the softirq
and it starting to execute, and packet drops are still there.

And just a thought, do you think there should be a CONFIG_ option for
this feature of ksoftirqd_running() so that it can be disabled if needed
by users like us?

Can you please think of anything else that might have changed which I still
need to change to make the time comparable to v3.8..

--
Regards
Sudip
Re: regression with napi/softirq ?
On Wed, 17 Jul 2019, Sudip Mukherjee wrote:
> I am using v4.14.55 on an Intel Atom based board and I am seeing network
> packet drops frequently on wireshark logs. After lots of debugging it
> seems that when this happens softirq is taking huge time to start after
> it has been raised. This is a small snippet from ftrace:
>
>    <...>-2110    [001] dNH1 466.634916: irq_handler_entry: irq=126 name=eth0-TxRx-0
>    <...>-2110    [001] dNH1 466.634917: softirq_raise: vec=3 [action=NET_RX]
>    <...>-2110    [001] dNH1 466.634918: irq_handler_exit: irq=126 ret=handled
>    ksoftirqd/1-15 [001] ..s. 466.635826: softirq_entry: vec=3 [action=NET_RX]
>    ksoftirqd/1-15 [001] ..s. 466.635852: softirq_exit: vec=3 [action=NET_RX]
>    ksoftirqd/1-15 [001] d.H. 466.635856: irq_handler_entry: irq=126 name=eth0-TxRx-0
>    ksoftirqd/1-15 [001] d.H. 466.635857: softirq_raise: vec=3 [action=NET_RX]
>    ksoftirqd/1-15 [001] d.H. 466.635858: irq_handler_exit: irq=126 ret=handled
>    ksoftirqd/1-15 [001] ..s. 466.635860: softirq_entry: vec=3 [action=NET_RX]
>    ksoftirqd/1-15 [001] ..s. 466.635863: softirq_exit: vec=3 [action=NET_RX]
>
> So, softirq was raised at 466.634917 but it started at 466.635826, almost
> 909 usec after it was raised.

This is a situation where the network softirq decided to delegate softirq
processing to ksoftirqd. That happens when too much work is available while
processing softirqs on return from interrupt.

That means that softirq processing happens under scheduler control. So if
there are other runnable tasks on the same CPU ksoftirqd can be delayed
until their time slice expired. As a consequence ksoftirqd might not be
able to catch up with the incoming packet flood and the NIC starts to drop.

You can hack ksoftirqd_running() to return always false to avoid this, but
that might cause application starvation and a huge packet buffer backlog
when the amount of incoming packets makes the CPU do nothing else than
softirq processing.

Thanks,

	tglx