Hi,

On which kernel version should I try the patch? I tried it on 5.9 and it doesn't build.
> On 18 Nov 2020, at 20:47, Rafael J. Wysocki <r...@rjwysocki.net> wrote:
>
> On Tuesday, November 17, 2020 7:31:29 PM CET Rafael J. Wysocki wrote:
>> On 11/16/2020 8:11 AM, Andrei Popa wrote:
>>> Hello,
>>>
>>> After an update from vmlinuz-4.15.0-106-generic to vmlinuz-5.4.0-37-generic
>>> we experience, on a number of servers, a very high number of
>>> rx_missed_errors and dropped packets only on the uplink 10G interface. We
>>> have another 10G downlink interface with no problems.
>>>
>>> The affected servers have the following mainboards:
>>> S5520HC ver E26045-455
>>> S5520UR ver E22554-751
>>> S5520UR ver E22554-753
>>> S5000VSA
>>>
>>> On other 30 servers with similar mainboards and/or configs there are no
>>> dropped packets with vmlinuz-5.4.0-37-generic.
>>>
>>> We’ve installed vanilla 4.16 and there were no dropped packets.
>>> Vanilla 4.17 had a very high number of dropped packets like the following:
>>>
>>> root@shaper:~# cat test
>>> #!/bin/bash
>>> while true
>>> do
>>> ethtool -S ens6f1|grep "missed_errors"
>>> ifconfig ens6f1|grep RX|grep dropped
>>> sleep 1
>>> done
>>>
>>> root@shaper:~# ./test
>>> rx_missed_errors: 2418845
>>> RX errors 0 dropped 2418888 overruns 0 frame 0
>>> rx_missed_errors: 2426175
>>> RX errors 0 dropped 2426218 overruns 0 frame 0
>>> rx_missed_errors: 2431910
>>> RX errors 0 dropped 2431953 overruns 0 frame 0
>>> rx_missed_errors: 2437266
>>> RX errors 0 dropped 2437309 overruns 0 frame 0
>>> rx_missed_errors: 2443305
>>> RX errors 0 dropped 2443348 overruns 0 frame 0
>>> rx_missed_errors: 2448357
>>> RX errors 0 dropped 2448400 overruns 0 frame 0
>>> rx_missed_errors: 2452539
>>> RX errors 0 dropped 2452582 overruns 0 frame 0
>>>
>>> We did a git bisect and we’ve found that the following commit generates the
>>> high number of dropped packets:
>>>
>>> Author: Rafael J. Wysocki <rafael.j.wyso...@intel.com>
>>> Date: Thu Apr 5 19:12:43 2018 +0200
>>>
>>> cpuidle: menu: Avoid selecting shallow states with stopped tick
>>>
>>> If the scheduler tick has been stopped already and the governor
>>> selects a shallow idle state, the CPU can spend a long time in that
>>> state if the selection is based on an inaccurate prediction of idle
>>> time. That effect turns out to be relevant, so it needs to be
>>> mitigated.
>>>
>>> To that end, modify the menu governor to discard the result of the
>>> idle time prediction if the tick is stopped and the predicted idle
>>> time is less than the tick period length, unless the tick timer is
>>> going to expire soon.
>>>
>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wyso...@intel.com>
>>> Acked-by: Peter Zijlstra (Intel) <pet...@infradead.org>
>>>
>>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>>> index 267982e471e0..1bfe03ceb236 100644
>>> --- a/drivers/cpuidle/governors/menu.c
>>> +++ b/drivers/cpuidle/governors/menu.c
>>> @@ -352,13 +352,28 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>>> */
>>> data->predicted_us = min(data->predicted_us, expected_interval);
>>> - /*
>>> - * Use the performance multiplier and the user-configurable
>>> - * latency_req to determine the maximum exit latency.
>>> - */
>>> - interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
>>> - if (latency_req > interactivity_req)
>>> - latency_req = interactivity_req;
>>
>> The tick_nohz_tick_stopped() check may be done after the above and it
>> may be reworked a bit.
>>
>> I'll send a test patch to you shortly.
>
> The patch is appended, but please note that it has been rebased by hand and
> not tested.
>
> Please let me know if it makes any difference.
>
> And in the future please avoid pasting the entire kernel config to your
> reports, that's problematic.
>
> ---
> drivers/cpuidle/governors/menu.c | 23 ++++++++++++-----------
> 1 file changed, 12 insertions(+), 11 deletions(-)
>
> Index: linux-pm/drivers/cpuidle/governors/menu.c
> ===================================================================
> --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> +++ linux-pm/drivers/cpuidle/governors/menu.c
> @@ -308,18 +308,18 @@ static int menu_select(struct cpuidle_dr
> get_typical_interval(data, predicted_us)) *
> NSEC_PER_USEC;
>
> - if (tick_nohz_tick_stopped()) {
> - /*
> - * If the tick is already stopped, the cost of possible short
> - * idle duration misprediction is much higher, because the CPU
> - * may be stuck in a shallow idle state for a long time as a
> - * result of it. In that case say we might mispredict and use
> - * the known time till the closest timer event for the idle
> - * state selection.
> - */
> - if (data->predicted_us < TICK_USEC)
> - data->predicted_us = min_t(unsigned int, TICK_USEC,
> - ktime_to_us(delta_next));
> + /*
> + * If the tick is already stopped, the cost of possible short idle
> + * duration misprediction is much higher, because the CPU may be stuck
> + * in a shallow idle state for a long time as a result of it. In that
> + * case, say we might mispredict and use the known time till the closest
> + * timer event for the idle state selection, unless that event is going
> + * to occur within the tick time frame (in which case the CPU will be
> + * woken up from whatever idle state it gets into soon enough anyway).
> + */
> + if (tick_nohz_tick_stopped() && data->predicted_us < TICK_USEC &&
> + delta_next >= TICK_NSEC) {
> + data->predicted_us = ktime_to_us(delta_next);
> } else {
> /*
> * Use the performance multiplier and the user-configurable