On 12/14/23 12:15, Kevin Traynor wrote:
> On 11/10/2023 20:08, Ilya Maximets wrote:
>> On 9/29/23 14:50, Kevin Traynor wrote:
>>> Extend 'pmd-sleep-max' so that individual PMD thread cores may have
>>> a specified max sleep request value.
>>>
>>> Existing behaviour is maintained.
>>>
>>> Any PMD thread core without a value will use the global value if
>>> set or default no sleep.
>>>
>>> To set PMD thread cores 8 and 9 to never request a load based
>>> sleep and all other PMD thread cores to be able to request a max
>>> sleep of 50 usecs:
>>>
>>> $ ovs-vsctl set open_vswitch .
>>> other_config:pmd-sleep-max=50,8:0,9:0
>>>
>>> To set PMD thread cores 10 and 11 to request a max sleep of 100
>>> usecs and all other PMD thread cores to never request a sleep:
>>>
>>> $ ovs-vsctl set open_vswitch .
>>> other_config:pmd-sleep-max=10:100,11:100
>>>
>>> 'pmd-sleep-show' is updated to show the max sleep value for each
>>> PMD thread.
>>>
>>> Signed-off-by: Kevin Traynor <ktray...@redhat.com> ---
>>
>> Hi, Kevin.  Thanks for the new version!
>>
> 
> Hi Ilya, thanks for reviewing.
> 
>> I see a very strange 1.5% preformance degradation in a V-to-V 
>> scenario with this patch applied.  Could you, please, check?
>>
> 
> I tested V-2-V and I don't see a drop. I only see <=0.5% difference 
> between all tests. Sometimes with the patch is better, sometimes 
> without, so I think all I see is test/measurement variance.
> 
> | Patch | sleep max | tput kpps| cyc/pkt | % tput drop | % cyc/pkt inc |
> |-------|-----------|----------|---------|-------------|---------------|
> | No    |         0 |     7263 |     358 |             |               |
> | No    |       100 |     7260 |     358 |             |               |
> | Yes   |         0 |     7293 |     357 | -0.41%      | -0.28%        |
> | Yes   |       100 |     7292 |     357 | -0.44%      | -0.28%        |
> 
> 
> I also tested V-2-V-2-OVSDROP and different combinations of single and 
> multiple pmds (single and multple rxqs on pmd) and see the same behaviour.

OK.  Thanks for checking!  I'll try it again with v6, my testing is
not very scientific, so can be an issue in my setup.

> 
>> Also, this patch needs a NEWS entry.
>>
> 
> Added in v6. Reworked other comments, replied to a few of them below. 
> (New TB seems to have mangled the text a bit)

Try checking these options (usual suspects):
  https://support.mozilla.org/bm/questions/1367660#answer-1482876

> 
> thanks,
> Kevin.
> 
>> Some other comments inline.
>>
>>> Documentation/topics/dpdk/pmd.rst |  34 +++- 
>>> lib/dpif-netdev-private-thread.h  |   3 + lib/dpif-netdev.c
>>> | 267 ++++++++++++++++++++++++--- tests/pmd.at
>>> | 292 ++++++++++++++++++++++++++++-- vswitchd/vswitch.xml
>>> |  32 +++- 5 files changed, 579 insertions(+), 49 deletions(-)
>>>
>>> diff --git a/Documentation/topics/dpdk/pmd.rst
>>> b/Documentation/topics/dpdk/pmd.rst index f43819be0..dd6ee46bd
>>> 100644 --- a/Documentation/topics/dpdk/pmd.rst +++
>>> b/Documentation/topics/dpdk/pmd.rst @@ -354,8 +354,4 @@ time not
>>> processing packets will be determined by the sleep and processor 
>>> wake-up times and should be tested with each system configuration.
>>>
>>> -The current configuration of the PMD load based sleeping can be
>>> shown with:: - -    $ ovs-appctl dpif-netdev/pmd-sleep-show - Sleep
>>> time statistics for 10 secs can be seen with::
>>>
>>> @@ -380,4 +376,34 @@ system configuration (e.g. enabling processor
>>> C-states) and workloads. rate.
>>>
>>> +Maximum sleep values can also be set for individual PMD threads
>>> using +key:value pairs in the form of core:max_sleep. Any PMD
>>> thread that has been +assigned a specified value will use that. Any
>>> PMD thread that does not have +a specified value will use the
>>> current global value. + +Specified values for individual PMD
>>> threads can be added or removed at +any time. + +For example, to
>>> set PMD threads on cores 8 and 9 to never request a load based 
>>> +sleep and all others PMD threads to be able to request a max sleep
>>> of +50 microseconds (us):: + +    $ ovs-vsctl set open_vswitch .
>>> other_config:pmd-sleep-max=50,8:0,9:0 + +The max sleep value for
>>> each PMD threads can be checked in the logs or with:: + +    $
>>> ovs-appctl dpif-netdev/pmd-sleep-show +    pmd thread numa_id 0
>>> core_id 8: +      max sleep:    0 us +    pmd thread numa_id 1
>>> core_id 9: +      max sleep:    0 us +    pmd thread numa_id 0
>>> core_id 10: +      max sleep:   50 us +    pmd thread numa_id 1
>>> core_id 11: +      max sleep:   50 us +    pmd thread numa_id 0
>>> core_id 12: +      max sleep:   50 us +    pmd thread numa_id 1
>>> core_id 13: +      max sleep:   50 us + .. _ovs-vswitchd(8): 
>>> http://openvswitch.org/support/dist-docs/ovs-vswitchd.8.html diff
>>> --git a/lib/dpif-netdev-private-thread.h
>>> b/lib/dpif-netdev-private-thread.h index 1ec3cd794..cb18e5def
>>> 100644 --- a/lib/dpif-netdev-private-thread.h +++
>>> b/lib/dpif-netdev-private-thread.h @@ -181,4 +181,7 @@ struct
>>> dp_netdev_pmd_thread { bool isolated;
>>>
>>> +    /* Max sleep request in microseconds.*/
>>
>> A space after a period.
>>
>>> +    atomic_uint64_t max_sleep; + /* Queue id used by this pmd
>>> thread to send packets on all netdevs if * XPS disabled for this
>>> netdev. All static_tx_qid's are unique and less diff --git
>>> a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 157694bcf..72ee53a02
>>> 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -180,4
>>> +180,9 @@ static struct odp_support dp_netdev_support = { #define
>>> PMD_SLEEP_INC_US 1
>>>
>>> +struct pmd_sleep { +    unsigned core_id; +    uint64_t
>>> max_sleep; +}; + struct dpcls { struct cmap_node node;      /*
>>> Within dp_netdev_pmd_thread.classifiers */ @@ -289,5 +294,5 @@
>>> struct dp_netdev { atomic_bool pmd_perf_metrics; /* Max load based
>>> sleep request. */
>>
>> We should mention that it is a default one.
>>
> 
> done
> 
>>> -    atomic_uint64_t pmd_max_sleep; +    uint64_t pmd_max_sleep;
>>
>> Maybe rename as well, but I'm not sure abouot that.
>>
> 
> Yeah, it probably makes it a bit more descriptive where it's used 
> elsewhere in the code. Renamed to pmd_max_sleep_default.
> 
>>> /* Enable the SMC cache from ovsdb config */ atomic_bool
>>> smc_enable_db; @@ -327,4 +332,7 @@ struct dp_netdev { char
>>> *pmd_cmask;
>>>
>>> +    /* PMD load based max sleep request user string. */ +    char
>>> *max_sleep_list; + uint64_t last_tnl_conf_seq;
>>>
>>> @@ -1429,4 +1437,17 @@ dpif_netdev_pmd_rebalance(struct
>>> unixctl_conn *conn, int argc, }
>>>
>>> +static void +pmd_info_show_sleep(struct ds *reply, unsigned
>>> core_id, int numa_id, +                    uint64_t pmd_max_sleep) 
>>> +{ +    if (core_id == NON_PMD_CORE_ID) { +        return; +    } +
>>> ds_put_format(reply, +                  "pmd thread numa_id %d
>>> core_id %d:\n" +                  "  max sleep: %4"PRIu64" us\n", +
>>> numa_id, core_id, pmd_max_sleep); +} + static void 
>>> dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const
>>> char *argv[], @@ -1443,8 +1464,7 @@ dpif_netdev_pmd_info(struct
>>> unixctl_conn *conn, int argc, const char *argv[], unsigned long
>>> long max_secs = (PMD_INTERVAL_LEN * PMD_INTERVAL_MAX) /
>>> INTERVAL_USEC_TO_SEC; -    uint64_t default_max_sleep = 0; +
>>> uint64_t max_sleep; bool show_header = true;
>>
>> Reverse x-mass tree.
>>
>>>
>>> - ovs_mutex_lock(&dp_netdev_mutex);
>>>
>>> @@ -1513,10 +1533,11 @@ dpif_netdev_pmd_info(struct unixctl_conn
>>> *conn, int argc, const char *argv[], } else if (type ==
>>> PMD_INFO_SLEEP_SHOW) { if (show_header) { -
>>> atomic_read_relaxed(&dp->pmd_max_sleep, &default_max_sleep); -
>>> ds_put_format(&reply, "Default max sleep: %4"PRIu64" us", -
>>> default_max_sleep); -                ds_put_cstr(&reply, "\n"); +
>>> ds_put_format(&reply, "Default max sleep: %4"PRIu64" us\n", +
>>> dp->pmd_max_sleep); show_header = false; } +
>>> atomic_read_relaxed(&pmd->max_sleep, &max_sleep); +
>>> pmd_info_show_sleep(&reply, pmd->core_id, pmd->numa_id, +
>>> max_sleep); } } @@ -1907,4 +1928,6 @@ create_dp_netdev(const char
>>> *name, const struct dpif_class *class, }
>>>
>>> +    dp->max_sleep_list = NULL; + dp->last_tnl_conf_seq =
>>> seq_read(tnl_conf_seq); *dpp = dp; @@ -2016,4 +2039,5 @@
>>> dp_netdev_free(struct dp_netdev *dp) dp_netdev_meter_destroy(dp);
>>>
>>> +    free(dp->max_sleep_list); free(dp->pmd_cmask); 
>>> free(CONST_CAST(char *, dp->name)); @@ -4844,4 +4868,206 @@
>>> set_pmd_auto_lb(struct dp_netdev *dp, bool state, bool always_log) 
>>> }
>>>
>>> +static int +parse_pmd_sleep_list(const char *max_sleep_list, +
>>> struct pmd_sleep **pmd_sleeps) +{ +    char *list, *copy, *key,
>>> *value; +    int num_vals = 0; + +    if (!max_sleep_list) { +
>>> return num_vals; +    } + +    list = copy =
>>> xstrdup(max_sleep_list); + +    while
>>> (ofputil_parse_key_value(&list, &key, &value)) { +        char
>>> *error = NULL; +        unsigned core; +        uint64_t temp,
>>> pmd_max_sleep; +        int i;
>>
>> Reverse x-mass tree;
>>
>>> + +        error = str_to_u64(key, &temp); +        if (error) { +
>>> free(error); +            continue; +        } + +        error =
>>> str_to_u64(value, &pmd_max_sleep); +        if (error) { +
>>> /* No value specified. key is dp default. */
>>
>> This is not fully correct.  str_to_u64() may fail for varius reasons.
>> The way to check that value is not specified is to check that it is a
>> zero-length string, i.e. value[0] == '\0'. If it is not, but
>> str_to_u64() fails, then it is a formatting error, i.e. user put
>> something weird there and we should not use that value.  Current code
>> will use the 'key' as a default sleep value in this case.  That
>> doesn't seem right. Maybe add a test case '50,1:qwe,2:0' and check
>> that defualt value is 50?
>>
> 
> good catch, thanks. It is definitely the wrong way to handle that input. 
> I've reworked this in v6 and added a couple of tests for it.
> 
>>> +            core = UINT_MAX; +            pmd_max_sleep = temp; +
>>> free(error); +        } else { +            /* Value specified. key
>>> is pmd core id.*/ +            if (temp >= UINT_MAX) { +
>>> continue; +            } +            core = (unsigned) temp; +
>>> } + +        /* Detect duplicate max sleep values. */ +        for
>>> (i = 0; i < num_vals; i++) { +            if
>>> ((*pmd_sleeps)[i].core_id == core) { +                break; +
>>> } +        } +        if (i == num_vals) { +            /* Not
>>> duplicate, add a new entry. */ +            *pmd_sleeps =
>>> xrealloc(*pmd_sleeps, +                                   (num_vals
>>> + 1) * sizeof **pmd_sleeps); +            num_vals++; +        } + 
>>> +        pmd_max_sleep = MIN(PMD_RCU_QUIESCE_INTERVAL,
>>> pmd_max_sleep); + +        (*pmd_sleeps)[i].core_id = core; +
>>> (*pmd_sleeps)[i].max_sleep = pmd_max_sleep; +    } + +
>>> free(copy); +    return num_vals; +} + +static void 
>>> +log_pmd_sleep(unsigned core_id, int numa_id, uint64_t
>>> pmd_max_sleep) +{ +    if (core_id == NON_PMD_CORE_ID) { +
>>> return; +    } +    VLOG_INFO("PMD thread on numa_id: %d, core id:
>>> %2d, " +              "max sleep: %4"PRIu64" us.", numa_id,
>>> core_id, pmd_max_sleep); +} + +static void 
>>> +pmd_init_max_sleep(struct dp_netdev *dp, struct
>>> dp_netdev_pmd_thread *pmd) +{ +    uint64_t max_sleep =
>>> dp->pmd_max_sleep; +    struct pmd_sleep *pmd_sleeps = NULL; +
>>> int num_vals; + +    num_vals =
>>> parse_pmd_sleep_list(dp->max_sleep_list, &pmd_sleeps); + +    /*
>>> Check if the user has set a specific value for this pmd. */ +
>>> for (int i = 0; i < num_vals; i++) { +        if
>>> (pmd_sleeps[i].core_id == pmd->core_id) { +            max_sleep =
>>> pmd_sleeps[i].max_sleep; +            break; +        } +    } +
>>> atomic_init(&pmd->max_sleep, max_sleep); +
>>> log_pmd_sleep(pmd->core_id, pmd->numa_id, max_sleep); +
>>> free(pmd_sleeps); +} + +static bool 
>>> +assign_sleep_values_to_pmds(struct dp_netdev *dp, int num_vals, +
>>> struct pmd_sleep *pmd_sleeps) +{ +    struct dp_netdev_pmd_thread
>>> *pmd; +    bool value_changed = false; + +    CMAP_FOR_EACH (pmd,
>>> node, &dp->poll_threads) { +        uint64_t new_max_sleep,
>>> cur_pmd_max_sleep; + +        if (pmd->core_id == NON_PMD_CORE_ID)
>>> { +            continue; +        } + +        /* Default to global
>>> value. */ +        new_max_sleep = dp->pmd_max_sleep; + +        /*
>>> Check for pmd specific value. */ +        for (int i = 0;  i <
>>> num_vals; i++) { +            if (pmd->core_id ==
>>> pmd_sleeps[i].core_id) { +                new_max_sleep =
>>> pmd_sleeps[i].max_sleep; +                break; +            } +
>>> } +        atomic_read_relaxed(&pmd->max_sleep,
>>> &cur_pmd_max_sleep); +        if (new_max_sleep !=
>>> cur_pmd_max_sleep) { +
>>> atomic_store_relaxed(&pmd->max_sleep, new_max_sleep); +
>>> value_changed = true; +        } +    } +    return value_changed; 
>>> +} + +static void +log_all_pmd_sleeps(struct dp_netdev *dp) +{ +
>>> struct dp_netdev_pmd_thread **pmd_list = NULL; +    struct
>>> dp_netdev_pmd_thread *pmd; +    size_t n; + +    VLOG_INFO("Default
>>> PMD thread max sleep: %4"PRIu64" us.", +
>>> dp->pmd_max_sleep); + +    sorted_poll_thread_list(dp, &pmd_list,
>>> &n); + +    for (size_t i = 0; i < n; i++) { +        uint64_t
>>> cur_pmd_max_sleep; + +        pmd = pmd_list[i]; +
>>> atomic_read_relaxed(&pmd->max_sleep, &cur_pmd_max_sleep); +
>>> log_pmd_sleep(pmd->core_id, pmd->numa_id, cur_pmd_max_sleep); +
>>> } +    free(pmd_list); +} + +static bool 
>>> +set_all_pmd_max_sleeps(struct dp_netdev *dp, const struct smap
>>> *config) +{ +    const char *max_sleep_list = smap_get(config,
>>> "pmd-sleep-max"); +    struct pmd_sleep *pmd_sleeps = NULL; +
>>> uint64_t default_max_sleep = 0; +    bool default_changed = false; 
>>> +    bool pmd_changed = false; +    uint64_t pmd_maxsleep; +    int
>>> num_vals = 0; + +    /* Check for deprecated 'pmd-maxsleep' value.
>>> */ +    pmd_maxsleep = smap_get_ullong(config, "pmd-maxsleep",
>>> UINT64_MAX); +    if (pmd_maxsleep != UINT64_MAX &&
>>> !max_sleep_list) { +        VLOG_WARN_ONCE("pmd-maxsleep is
>>> deprecated. " +                       "Please use pmd-sleep-max
>>> instead."); +        default_max_sleep = pmd_maxsleep; +    }
>>
>> The knob was deprecated in 3.2.  Can we just remove it now? (NEWS
>> entry will be needed.)
>>
> 
> I would be inclined to keep it for at least another release. So anyone 
> starting with 3.1 and then moving to 3.2 or 3.3 sees the deprecation at 
> that point. Then can remove in 3.4 (assuming no objections).

Sounds good to me.  3.3 is planned to become LTS, so it makes sense
to keep the option there and remove in 3.4 or later (but hopefully
before the next LTS).

> 
>>> + +    /* Check if there is no change in string or value. */ +
>>> if (!!dp->max_sleep_list == !!max_sleep_list) { +        if
>>> (max_sleep_list +            ?
>>> nullable_string_is_equal(max_sleep_list, dp->max_sleep_list) +
>>> : default_max_sleep == dp->pmd_max_sleep) { +            return
>>> false; +        } +    }
>>
>> If we remove the deprecated knob we may just compare strings here.
>>
>>> + +    /* Free existing string and copy new one (if any). */ +
>>> free(dp->max_sleep_list); +    dp->max_sleep_list =
>>> nullable_xstrdup(max_sleep_list); + +    if (max_sleep_list) { +
>>> num_vals = parse_pmd_sleep_list(max_sleep_list, &pmd_sleeps); + +
>>> /* Check if the user has set a global value. */ +        for (int i
>>> = 0; i < num_vals; i++) { +            if (pmd_sleeps[i].core_id ==
>>> UINT_MAX) { +                default_max_sleep =
>>> pmd_sleeps[i].max_sleep; +                break; +            } +
>>> } +    } + +    if (dp->pmd_max_sleep != default_max_sleep) { +
>>> dp->pmd_max_sleep = default_max_sleep; +        default_changed =
>>> true; +    } +    pmd_changed = assign_sleep_values_to_pmds(dp,
>>> num_vals, pmd_sleeps); + +    free(pmd_sleeps); +    return
>>> default_changed || pmd_changed; +} + /* Applies datapath
>>> configuration from the database. Some of the changes are * actually
>>> applied in dpif_netdev_run(). */ @@ -4861,5 +5087,4 @@
>>> dpif_netdev_set_config(struct dpif *dpif, const struct smap
>>> *other_config) uint8_t cur_rebalance_load; uint32_t rebalance_load,
>>> rebalance_improve; -    uint64_t  pmd_max_sleep,
>>> cur_pmd_max_sleep; bool log_autolb = false; enum
>>> sched_assignment_type pmd_rxq_assign_type; @@ -5012,24 +5237,10 @@
>>> dpif_netdev_set_config(struct dpif *dpif, const struct smap
>>> *other_config) set_pmd_auto_lb(dp, autolb_state, log_autolb);
>>>
>>> -    pmd_max_sleep = smap_get_ullong(other_config, "pmd-maxsleep",
>>> UINT64_MAX); -    if (pmd_max_sleep != UINT64_MAX) { -
>>> VLOG_WARN("pmd-maxsleep is deprecated. " -                  "Please
>>> use pmd-sleep-max instead."); -    } else { -        pmd_max_sleep
>>> = 0; +    bool sleep_changed = set_all_pmd_max_sleeps(dp,
>>> other_config); +    if (first_set_config || sleep_changed) { +
>>> log_all_pmd_sleeps(dp); }
>>>
>>> -    pmd_max_sleep = smap_get_ullong(other_config,
>>> "pmd-sleep-max", -
>>> pmd_max_sleep); -    pmd_max_sleep = MIN(PMD_RCU_QUIESCE_INTERVAL,
>>> pmd_max_sleep); -    atomic_read_relaxed(&dp->pmd_max_sleep,
>>> &cur_pmd_max_sleep); -    if (first_set_config || pmd_max_sleep !=
>>> cur_pmd_max_sleep) { -
>>> atomic_store_relaxed(&dp->pmd_max_sleep, pmd_max_sleep); -
>>> VLOG_INFO("PMD max sleep request is %"PRIu64" usecs.",
>>> pmd_max_sleep); -        VLOG_INFO("PMD load based sleeps are
>>> %s.", -                  pmd_max_sleep ? "enabled" : "disabled" ); 
>>> -    } - -    first_set_config  = false; +    first_set_config =
>>> false; return 0; } @@ -7060,5 +7271,5 @@ reload:
>>>
>>> atomic_read_relaxed(&pmd->dp->smc_enable_db,
>>> &pmd->ctx.smc_enable_db); -
>>> atomic_read_relaxed(&pmd->dp->pmd_max_sleep, &max_sleep); +
>>> atomic_read_relaxed(&pmd->max_sleep, &max_sleep);
>>>
>>> for (i = 0; i < poll_cnt; i++) { @@ -7647,4 +7858,6 @@
>>> dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct
>>> dp_netdev *dp, cmap_init(&pmd->tx_bonds);
>>>
>>> +    pmd_init_max_sleep(dp, pmd); + /* Initialize DPIF function
>>> pointer to the default configured version. */ 
>>> atomic_init(&pmd->netdev_input_func,
>>> dp_netdev_impl_get_default()); diff --git a/tests/pmd.at
>>> b/tests/pmd.at index 7bdaca9e7..205293982 100644 ---
>>> a/tests/pmd.at +++ b/tests/pmd.at @@ -61,12 +61,11 @@
>>> m4_define([CHECK_PMD_THREADS_CREATED], [ ])
>>>
>>> -dnl CHECK_DP_SLEEP_MAX([max_sleep], [enabled], [+line]) +dnl
>>> CHECK_DP_SLEEP_MAX([max_sleep], [+line]) dnl -dnl Checks correct
>>> pmd load based sleep is set for the datapath. +dnl Checks correct
>>> pmd load based sleep value  for the datapath.
>>
>> Double space.
>>
>> <snip>
>>
>> Best regards, Ilya Maximets.
>>
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to