On 29 Jun 2021, at 18:32, Van Haaren, Harry wrote:

>> -----Original Message-----
>> From: dev <ovs-dev-boun...@openvswitch.org> On Behalf Of Eelco Chaudron
>> Sent: Tuesday, June 29, 2021 1:38 PM
>> To: Amber, Kumar <kumar.am...@intel.com>
>> Cc: d...@openvswitch.org; i.maxim...@ovn.org
>> Subject: Re: [ovs-dev] [v4 03/12] dpif-netdev: Add study function to select
>> the best mfex function
>>
>> More comments below. FYI I’m only reviewing right now, no testing.
>
> Sure, thanks for reviews.
>
>> On 17 Jun 2021, at 18:27, Kumar Amber wrote:
>
> <snip patch commit and some code>
>
>>> +/* Allocate per-PMD-thread storage for study_stats. */
>>> +static inline struct study_stats *
>>> +get_study_stats(void)
>>> +{
>>> +    struct study_stats *stats = study_stats_get();
>>> +    if (OVS_UNLIKELY(!stats)) {
>>> +        stats = xzalloc(sizeof *stats);
>>> +       study_stats_set_unsafe(stats);
>>> +    }
>>> +    return stats;
>>> +}
>>> +
>>
>> Just got a mind-meld with the code, and realized that the function might be
>> different per PMD thread due to this auto mode (and the autovalidator mode
>> in the previous patch).
>>
>> This only strengthens the case that we need a way to see the currently
>> selected mode, and not per datapath, but per PMD per datapath!
>
> Study depends on the traffic pattern, so yes, you're correct that the result
> can differ per PMD thread. The study command was added after the community
> suggested the user experience would improve if the user doesn't have to
> provide an exact miniflow profile name.
>
> Study studies the traffic running on that PMD, compares all MFEX impls, and
> prints out the hits. It selects the _first_ implementation that surpasses
> the packet threshold.
>
> Users are free to use the more specific names of MFEX impls instead of "study"
> for fine-grained control over the MFEX impl in use, e.g.
>
> ovs-appctl dpif-netdev/miniflow-parser-set avx512_vbmi_ipv4_udp
>
>> Do we also need a way to set this per PMD?
>
> I don't feel there is real value here, but we could investigate adding an
> optional parameter to the command indicating a PMD thread IDX to set?
> We have access to "pmd->core_id" in our set() function, so limiting changes
> to a specific PMD thread can be done fairly easily... but is it really
> required?
>
> Perfect is the enemy of good... I'd prefer to focus on getting the existing
> code changes merged, and add additional (optional) parameters in the future
> if deemed useful in real-world testing?

See Flavio’s reply, as those were the same concerns I had.

>>> +uint32_t
>>> +mfex_study_traffic(struct dp_packet_batch *packets,
>>> +                   struct netdev_flow_key *keys,
>>> +                   uint32_t keys_size, odp_port_t in_port,
>>> +                   void *pmd_handle)
>>> +{
>>> +    uint32_t hitmask = 0;
>>> +    uint32_t mask = 0;
>>> +    struct dp_netdev_pmd_thread *pmd = pmd_handle;
>>> +    struct dpif_miniflow_extract_impl *miniflow_funcs;
>>> +    uint32_t impl_count = dpif_miniflow_extract_info_get(&miniflow_funcs);
>>> +    struct study_stats *stats = get_study_stats();
>>> +
>>> +    /* Run each available traffic-optimized miniflow_extract to collect
>>> +     * hitmasks; once enough packets have been hit, these are compared to
>>> +     * choose the best miniflow_extract version for that traffic. */
>>> +    for (int i = MFEX_IMPL_START_IDX; i < impl_count; i++) {
>>> +        if (miniflow_funcs[i].available) {
>>> +            hitmask = miniflow_funcs[i].extract_func(packets, keys,
>>> +                                                     keys_size, in_port,
>>> +                                                     pmd_handle);
>>> +            stats->impl_hitcount[i] += count_1bits(hitmask);
>>> +
>>> +            /* If traffic is not classified then we don't overwrite the
>>> +             * keys array in miniflow implementations, so it's safe to
>>> +             * create a mask for all those packets whose miniflow has
>>> +             * been created. */
>>> +            mask |= hitmask;
>>> +        }
>>> +    }
>>> +    stats->pkt_count += dp_packet_batch_size(packets);
>>> +
>>> +    /* Choose the best implementation after a minimum number of packets
>>> +     * has been processed. */
>>> +    if (stats->pkt_count >= MFEX_MAX_COUNT) {
>>> +        uint32_t best_func_index = MFEX_IMPL_START_IDX;
>>> +        uint32_t max_hits = 0;
>>> +        for (int i = MFEX_IMPL_START_IDX; i < impl_count; i++) {
>>> +            if (stats->impl_hitcount[i] > max_hits) {
>>> +                max_hits = stats->impl_hitcount[i];
>>> +                best_func_index = i;
>>> +            }
>>> +        }
>>> +
>>> +        if (max_hits >= MFEX_MIN_HIT_COUNT_FOR_USE) {
>>> +            /* Set the implementation to index with max_hits. */
>>> +            pmd->miniflow_extract_opt =
>>> +                        miniflow_funcs[best_func_index].extract_func;
>>> +            VLOG_INFO("MFEX study chose impl %s: (hits %d/%d pkts)\n",
>>> +                      miniflow_funcs[best_func_index].name, max_hits,
>>> +                      stats->pkt_count);
>>
>> We have no idea for which PMD the mode is selected; I guess we might need
>> to add this?
>>
>> Maybe we should report the numbers/hits for the other methods, as they
>> might be equal, and some might be faster in execution time?
>
> As above, the implementations are sorted in performance order. Performance
> here can be known by micro-benchmarks, and developers of such SIMD optimized
> code can be expected to know which impl is fastest.

I don’t think we can, as it’s not documented in the code, and someone can just
add their own implementation with no clue about the existing ones.

> In our current code, the avx512_vbmi_* impls are always before the avx512_*
> impls, as the VBMI instruction set allows a faster runtime.

I guess we need some documentation in the developer's section on how to add
processor-optimized functions, and how to benchmark them (and maybe some
benchmark data for the current implementations).
Also, someone could write a sloppy avx512_vbmi* function that might be slower
than an avx512_* one, right?

> If really desired, we could dump the whole results of the MFEX table for
> that PMD thread; however, I would expect the results to be noise, not
> signal.

What about dumping it as debug messages, so that if we would like to see it
(either for development or in production), we can still enable it?


> I'm happy to discuss, but a bit fearful of adding all sorts of fancy features
> that in reality are not going to be useful.
>
> <snip code changes till end of patch>
>
> Regards, -Harry

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
