Hi,

Which kernel version should I apply the patch to? I tried it on 5.9 and it
doesn't build.

> On 18 Nov 2020, at 20:47, Rafael J. Wysocki <r...@rjwysocki.net> wrote:
> 
> On Tuesday, November 17, 2020 7:31:29 PM CET Rafael J. Wysocki wrote:
>> On 11/16/2020 8:11 AM, Andrei Popa wrote:
>>> Hello,
>>> 
>>> After an update from vmlinuz-4.15.0-106-generic to vmlinuz-5.4.0-37-generic 
>>> we experience, on a number of servers, a very high number of 
>>> rx_missed_errors and dropped packets only on the uplink 10G interface. We 
>>> have another 10G downlink interface with no problems.
>>> 
>>> The affected servers have the following mainboards:
>>> S5520HC ver E26045-455
>>> S5520UR ver E22554-751
>>> S5520UR ver E22554-753
>>> S5000VSA
>>> 
>>> On 30 other servers with similar mainboards and/or configs there are no 
>>> dropped packets with vmlinuz-5.4.0-37-generic.
>>> 
>>> We’ve installed vanilla 4.16 and there were no dropped packets.
>>> Vanilla 4.17 had a very high number of dropped packets like the following:
>>> 
>>> root@shaper:~# cat test
>>> #!/bin/bash
>>> while true
>>> do
>>> ethtool -S ens6f1|grep "missed_errors"
>>> ifconfig ens6f1|grep RX|grep dropped
>>> sleep 1
>>> done
>>> 
>>> root@shaper:~# ./test
>>>      rx_missed_errors: 2418845
>>>         RX errors 0  dropped 2418888  overruns 0  frame 0
>>>      rx_missed_errors: 2426175
>>>         RX errors 0  dropped 2426218  overruns 0  frame 0
>>>      rx_missed_errors: 2431910
>>>         RX errors 0  dropped 2431953  overruns 0  frame 0
>>>      rx_missed_errors: 2437266
>>>         RX errors 0  dropped 2437309  overruns 0  frame 0
>>>      rx_missed_errors: 2443305
>>>         RX errors 0  dropped 2443348  overruns 0  frame 0
>>>      rx_missed_errors: 2448357
>>>         RX errors 0  dropped 2448400  overruns 0  frame 0
>>>      rx_missed_errors: 2452539
>>>         RX errors 0  dropped 2452582  overruns 0  frame 0
>>> 
>>> We did a git bisect and we’ve found that the following commit generates the 
>>> high number of dropped packets:
>>> 
>>> Author: Rafael J. Wysocki <rafael.j.wyso...@intel.com>
>>> Date:   Thu Apr 5 19:12:43 2018 +0200
>>>     cpuidle: menu: Avoid selecting shallow states with stopped tick
>>>     If the scheduler tick has been stopped already and the governor
>>>     selects a shallow idle state, the CPU can spend a long time in that
>>>     state if the selection is based on an inaccurate prediction of idle
>>>     time.  That effect turns out to be relevant, so it needs to be
>>>     mitigated.
>>>     To that end, modify the menu governor to discard the result of the
>>>     idle time prediction if the tick is stopped and the predicted idle
>>>     time is less than the tick period length, unless the tick timer is
>>>     going to expire soon.
>>>     Signed-off-by: Rafael J. Wysocki <rafael.j.wyso...@intel.com>
>>>     Acked-by: Peter Zijlstra (Intel) <pet...@infradead.org>
>>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>>> index 267982e471e0..1bfe03ceb236 100644
>>> --- a/drivers/cpuidle/governors/menu.c
>>> +++ b/drivers/cpuidle/governors/menu.c
>>> @@ -352,13 +352,28 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>>>          */
>>>         data->predicted_us = min(data->predicted_us, expected_interval);
>>> -       /*
>>> -        * Use the performance multiplier and the user-configurable
>>> -        * latency_req to determine the maximum exit latency.
>>> -        */
>>> -       interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
>>> -       if (latency_req > interactivity_req)
>>> -               latency_req = interactivity_req;
>> 
>> The tick_nohz_tick_stopped() check may be done after the above and it 
>> may be reworked a bit.
>> 
>> I'll send a test patch to you shortly.
> 
> The patch is appended, but please note that it has been rebased by hand and
> not tested.
> 
> Please let me know if it makes any difference.
> 
> And in the future please avoid pasting the entire kernel config into your
> reports; that's problematic.
> 
> ---
> drivers/cpuidle/governors/menu.c |   23 ++++++++++++-----------
> 1 file changed, 12 insertions(+), 11 deletions(-)
> 
> Index: linux-pm/drivers/cpuidle/governors/menu.c
> ===================================================================
> --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> +++ linux-pm/drivers/cpuidle/governors/menu.c
> @@ -308,18 +308,18 @@ static int menu_select(struct cpuidle_dr
>                               get_typical_interval(data, predicted_us)) *
>                               NSEC_PER_USEC;
> 
> -     if (tick_nohz_tick_stopped()) {
> -             /*
> -              * If the tick is already stopped, the cost of possible short
> -              * idle duration misprediction is much higher, because the CPU
> -              * may be stuck in a shallow idle state for a long time as a
> -              * result of it.  In that case say we might mispredict and use
> -              * the known time till the closest timer event for the idle
> -              * state selection.
> -              */
> -             if (data->predicted_us < TICK_USEC)
> -                     data->predicted_us = min_t(unsigned int, TICK_USEC,
> -                                                ktime_to_us(delta_next));
> +     /*
> +      * If the tick is already stopped, the cost of possible short idle
> +      * duration misprediction is much higher, because the CPU may be stuck
> +      * in a shallow idle state for a long time as a result of it.  In that
> +      * case, say we might mispredict and use the known time till the closest
> +      * timer event for the idle state selection, unless that event is going
> +      * to occur within the tick time frame (in which case the CPU will be
> +      * woken up from whatever idle state it gets into soon enough anyway).
> +      */
> +     if (tick_nohz_tick_stopped() && data->predicted_us < TICK_USEC &&
> +         delta_next >= TICK_NSEC) {
> +             data->predicted_us = ktime_to_us(delta_next);
>       } else {
>               /*
>                * Use the performance multiplier and the user-configurable
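
For anyone following along, here is how I read the condition changed by the
test patch, written as a small userspace C model. It is only a sketch: the
model_* names and the HZ value of 250 are my own assumptions, none of this is
kernel code, and the real menu_select() does considerably more than this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed tick rate for illustration only; the kernel's HZ is a build option. */
#define MODEL_HZ        250u
#define MODEL_TICK_USEC (1000000u / MODEL_HZ)
#define MODEL_TICK_NSEC (1000000000ull / MODEL_HZ)

/*
 * Logic removed by the patch: with the tick stopped and a predicted idle
 * time shorter than one tick, use the time to the next timer event,
 * clamped to one tick period. (The kernel takes the minimum after a cast
 * to unsigned int; this model takes it in 64 bits.)
 */
static unsigned int model_old_prediction(bool tick_stopped,
                                         unsigned int predicted_us,
                                         uint64_t delta_next_ns)
{
    if (tick_stopped && predicted_us < MODEL_TICK_USEC) {
        uint64_t delta_next_us = delta_next_ns / 1000u;

        return delta_next_us < MODEL_TICK_USEC ?
               (unsigned int)delta_next_us : MODEL_TICK_USEC;
    }
    return predicted_us;
}

/*
 * Logic added by the patch: override the prediction only when the next
 * timer event is at least one tick period away, and then use the full
 * time to that event. Otherwise the prediction is left alone and the
 * caller falls through to the latency-limit branch.
 */
static unsigned int model_new_prediction(bool tick_stopped,
                                         unsigned int predicted_us,
                                         uint64_t delta_next_ns)
{
    if (tick_stopped && predicted_us < MODEL_TICK_USEC &&
        delta_next_ns >= MODEL_TICK_NSEC)
        return (unsigned int)(delta_next_ns / 1000u);

    return predicted_us;
}
```

With the assumed 4 ms tick, a stopped tick, a 100 us prediction and a timer
10 ms away, the old logic yields 4000 us and the new one 10000 us. With the
timer only 1 ms away, the old logic yields 1000 us, while the new one keeps
the 100 us prediction and leaves the decision to the latency-limit branch.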
