from:"Flavio Leitner"

Re: [ovs-dev] [PATCH] rhel: use /run instead of /var/run

2021-07-12 Thread Flavio Leitner

On Wed, May 12, 2021 at 05:08:08PM +0200, Timothy Redaelli wrote:
> Systemd unit file generates warnings about PID file path since /var/run
> is a legacy path so just use /run instead of /var/run.
> 
> /var/run is a symlink to /run starting from RHEL7 (and any other distribution
> that uses systemd).
> 
> Reported-at: https://bugzilla.redhat.com/1952081
> Signed-off-by: Timothy Redaelli 
> ---

Reproduced on F34:
Jul 12 17:03:28 p50 systemd[1]:
/usr/lib/systemd/system/ovs-vswitchd.service:12: PIDFile= references
a path below legacy directory /var/run/, updating
/var/run/openvswitch/ovs-vswitchd.pid →
/run/openvswitch/ovs-vswitchd.pid; please update the unit file
accordingly.
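For illustration only: the warning above describes a fixed prefix mapping from /var/run/ to /run/. The sketch below applies that mapping to a path string. It is not systemd's implementation, and the helper name legacy_run_path_rewrite() is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite a path below the legacy /var/run/
 * directory to the same path below /run/, i.e. the mapping systemd
 * reports in its warning.  Other paths are copied unchanged.
 * Returns 0 on success, -1 if 'out' is too small. */
static int
legacy_run_path_rewrite(const char *path, char *out, size_t out_len)
{
    static const char legacy[] = "/var/run/";
    size_t legacy_len = sizeof legacy - 1;    /* strlen("/var/run/") */
    const char *prefix = "";
    const char *rest = path;

    if (!strncmp(path, legacy, legacy_len)) {
        prefix = "/run/";
        rest = path + legacy_len;
    }

    int n = snprintf(out, out_len, "%s%s", prefix, rest);
    return (n < 0 || (size_t) n >= out_len) ? -1 : 0;
}
```

With the PIDFile from the log, "/var/run/openvswitch/ovs-vswitchd.pid" becomes "/run/openvswitch/ovs-vswitchd.pid", which is what the patch writes into the unit file directly.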

Acked-by: Flavio Leitner 

Thanks Timothy,
fbl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [v4] dpif/dpcls: limit count subtable search info logs

2021-07-12 Thread Flavio Leitner

Hi Kumar,

The 0-day Robot reported an issue with the Signed-off-by tags.
For additional info, please check the link below and look for the
tag Co-authored-by:
https://github.com/openvswitch/ovs/blob/master/Documentation/internals/contributing/submitting-patches.rst#tags

Otherwise, the patch looks good to me.
Thanks,
fbl

On Mon, Jul 12, 2021 at 11:44:05AM +0530, kumar Amber wrote:
> From: Harry van Haaren 
> 
> This commit avoids many instances of "using subtable X for miniflow (x,y)"
> in the ovs-vswitchd log when using the DPCLS Autovalidator. This occurs
> when no specialized subtable is found, and the generic "_any" version of
> the avx512 subtable search implementation was used. This change logs the
> subtable usage once, avoiding duplicates.
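A minimal sketch of the log-once behavior described above, assuming a C11 compiler with <stdatomic.h>. Only the first caller wins the test-and-set and prints, which is why the new message says "and possibly others": later subtables that fall back to the generic path are not reported. This is not the OVS VLOG_INFO_ONCE macro, and the function name is hypothetical.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Set by the first caller; never cleared, so the message prints once
 * per process even if several threads race here. */
static atomic_flag subtable_msg_logged = ATOMIC_FLAG_INIT;

/* Hypothetical stand-in for the log-once call in the probe function.
 * Returns 1 if this call emitted the message, 0 otherwise. */
static int
log_generic_subtable_once(unsigned int u0_bits, unsigned int u1_bits)
{
    if (atomic_flag_test_and_set(&subtable_msg_logged)) {
        return 0;                   /* Already logged earlier. */
    }
    printf("Using non-specialized AVX512 lookup for subtable (%u,%u) "
           "and possibly others.\n", u0_bits, u1_bits);
    return 1;
}
```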
> 
> Signed-off-by: Harry van Haaren 
> Signed-off-by: kumar Amber 
> 
> ---
> v4:
> - add doc update from Flavio
> v3:
> - add comments from Flavio
> - add documentation update
> ---
>  Documentation/topics/dpdk/bridge.rst   | 34 ++
>  lib/dpif-netdev-lookup-avx512-gather.c |  4 +--
>  2 files changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
> index 0f70a0cad..374e03eb0 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -182,6 +182,40 @@ chosen, and the 2nd occurance of that priority is not used. Put in logical
>  terms, a subtable is chosen if its priority is greater than the previous
>  best candidate.
>  
> +Optimizing Specific Subtable Search
> +~~~
> +
> +During the packet classification, the datapath can use specialized
> +lookup tables to optimize the search. However, not all situations
> +are optimized. If you see a message like the following one in the OVS
> +logs, it means that there is no specialized implementation available
> +for the current networking traffic. In this case, OVS will continue
> +to process the traffic normally using a more generic lookup table.
> +
> +"Using non-specialized AVX512 lookup for subtable (4,1) and possibly others."
> +
> +(Note that the numbers 4 and 1 will likely be different in your logs)
> +
> +Additional specialized lookups can be added to OVS if the user
> +provides that log message along with the command output as shown
> +below to the OVS mailing list. Note that the numbers in the log
> +message ("subtable (X,Y)") need to match with the numbers in
> +the provided command output ("dp-extra-info:miniflow_bits(X,Y)").
> +
> +"ovs-appctl dpctl/dump-flows -m", which results in output like this:
> +
> +ufid:82770b5d-ca38-44ff-8283-74ba36bd1ca5, skb_priority(0/0),skb_mark(0/0)
> +,ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),
> +dp_hash(0/0),in_port(pcap0),packet_type(ns=0,id=0),eth(src=00:00:00:00:00:
> +00/00:00:00:00:00:00,dst=ff:ff:ff:ff:ff:ff/00:00:00:00:00:00),eth_type(
> +0x8100),vlan(vid=1,pcp=0),encap(eth_type(0x0800),ipv4(src=127.0.0.1/0.0.0.0
> +,dst=127.0.0.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=53/0,
> +dst=53/0)), packets:77072681, bytes:3545343326, used:0.000s, dp:ovs,
> +actions:vhostuserclient0, dp-extra-info:miniflow_bits(4,1)
> +
> +Please send an email to the OVS mailing list ovs-dev@openvswitch.org with
> +the output of the "dp-extra-info:miniflow_bits(4,1)" values.
> +
>  CPU ISA Testing and Validation
>  ~~
>  
> diff --git a/lib/dpif-netdev-lookup-avx512-gather.c b/lib/dpif-netdev-lookup-avx512-gather.c
> index bc359dc4a..ced846aa7 100644
> --- a/lib/dpif-netdev-lookup-avx512-gather.c
> +++ b/lib/dpif-netdev-lookup-avx512-gather.c
> @@ -411,8 +411,8 @@ dpcls_subtable_avx512_gather_probe(uint32_t u0_bits, uint32_t u1_bits)
>   */
>  if (!f && (u0_bits + u1_bits) < (NUM_U64_IN_ZMM_REG * 2)) {
>  f = dpcls_avx512_gather_mf_any;
> -VLOG_INFO("Using avx512_gather_mf_any for subtable (%d,%d)\n",
> -  u0_bits, u1_bits);
> +VLOG_INFO_ONCE("Using non-specialized AVX512 lookup for subtable"
> +   " (%d,%d) and possibly others.", u0_bits, u1_bits);
>  }
>  
>  return f;
> -- 
> 2.25.1
> 
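The documentation asks users to match the (X,Y) pair from the log message against "dp-extra-info:miniflow_bits(X,Y)" in the "ovs-appctl dpctl/dump-flows -m" output. A small sketch of that matching, assuming one flow per input string; parse_miniflow_bits() and matches_logged_subtable() are hypothetical helpers, not OVS functions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract X and Y from the
 * "miniflow_bits(X,Y)" field of one dump-flows line.
 * Returns 0 on success, -1 if the field is missing or malformed. */
static int
parse_miniflow_bits(const char *flow_line, unsigned int *u0, unsigned int *u1)
{
    static const char key[] = "miniflow_bits(";
    const char *p = strstr(flow_line, key);

    if (!p) {
        return -1;
    }
    if (sscanf(p + sizeof key - 1, "%u,%u)", u0, u1) != 2) {
        return -1;
    }
    return 0;
}

/* Returns 1 if the flow's miniflow bits equal the (x,y) pair from the
 * "subtable (x,y)" log message, 0 otherwise. */
static int
matches_logged_subtable(const char *flow_line, unsigned int x, unsigned int y)
{
    unsigned int u0, u1;

    return !parse_miniflow_bits(flow_line, &u0, &u1) && u0 == x && u1 == y;
}
```

With the sample flow from the patch, matches_logged_subtable(line, 4, 1) returns 1, so that flow is the one to include in the report.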

-- 
fbl

Re: [ovs-dev] [PATCH net-next] openvswitch: Introduce per-cpu upcall dispatch

2021-07-12 Thread Flavio Leitner

Hi Joe,

Maybe you can take a look...
Thanks,
fbl

On Thu, Jul 08, 2021 at 11:40:12AM -0300, Flavio Leitner wrote:
> 
> Hi Pravin,
> 
> Any thoughts on this patch? We are closing OVS 2.16, so it would
> be nice to know if it looks okay or needs changes, especially
> changes related to the userspace interface.
> 
> Thanks,
> fbl
> 
> On Wed, Jun 30, 2021 at 05:53:49AM -0400, Mark Gray wrote:
> > The Open vSwitch kernel module uses the upcall mechanism to send
> > packets from kernel space to user space when it misses in the kernel
> > space flow table. The upcall sends packets via a Netlink socket.
> > Currently, a Netlink socket is created for every vport. In this way,
> > there is a 1:1 mapping between a vport and a Netlink socket.
> > When a packet is received by a vport, if it needs to be sent to
> > user space, it is sent via the corresponding Netlink socket.
> > 
> > This mechanism, with various iterations of the corresponding user
> > space code, has seen some limitations and issues:
> > 
> > * On systems with a large number of vports, there is a correspondingly
> > large number of Netlink sockets which can limit scaling.
> > (https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
> > * Packet reordering on upcalls.
> > (https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
> > * A thundering herd issue.
> > (https://bugzilla.redhat.com/show_bug.cgi?id=183)
> > 
> > This patch introduces an alternative, feature-negotiated, upcall
> > mode using a per-cpu dispatch rather than a per-vport dispatch.
> > 
> > In this mode, the Netlink socket to be used for the upcall is
> > selected based on the CPU of the thread that is executing the upcall.
> > In this way, it resolves the issues above as:
> > 
> > a) The number of Netlink sockets scales with the number of CPUs
> > rather than the number of vports.
> > b) Ordering per-flow is maintained as packets are distributed to
> > CPUs based on mechanisms such as RSS and flows are distributed
> > to a single user space thread.
> > c) Packets from a flow can only wake up one user space thread.
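The per-CPU selection described above can be modeled in a few lines. This is a userspace sketch, not kernel code: struct upcall_pids and upcall_pid_for_cpu() are hypothetical names, and wrapping around with cpu_id % n_pids when userspace registers fewer PIDs than there are CPUs is an assumption of the sketch. Returning 0 follows the uapi convention quoted in the patch that a zero PID means no upcall is sent.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the per-CPU PID array: pids[i] is the Netlink
 * port that receives upcalls raised on CPU i. */
struct upcall_pids {
    size_t n_pids;
    const uint32_t *pids;
};

/* Returns the Netlink PID for an upcall executed on 'cpu_id', or 0
 * (no upcall) when nothing is configured.  The socket count depends
 * only on the number of CPUs, not on the number of vports. */
static uint32_t
upcall_pid_for_cpu(const struct upcall_pids *up, unsigned int cpu_id)
{
    if (!up || up->n_pids == 0) {
        return 0;
    }
    /* Assumed fallback: wrap around rather than drop the packet. */
    return up->pids[cpu_id % up->n_pids];
}
```

Because RSS keeps a flow on one CPU, every packet of that flow maps to the same PID, which is what preserves per-flow ordering in point b).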
> > 
> > The corresponding user space code can be found at:
> > https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/382618.html
> > 
> > Bugzilla: https://bugzilla.redhat.com/1844576
> > Signed-off-by: Mark Gray 
> > ---
> > 
> > Notes:
> > v1 - Reworked based on Flavio's comments:
> >  * Fixed handling of userspace action case
> >  * Renamed 'struct dp_portids'
> >  * Fixed handling of return from kmalloc()
> >  * Removed check for dispatch type from ovs_dp_get_upcall_portid()
> >- Reworked based on Dan's comments:
> >  * Fixed handling of return from kmalloc()
> >- Reworked based on Pravin's comments:
> >  * Fixed handling of userspace action case
> >- Added kfree() in destroy_dp_rcu() to cleanup netlink port ids
> > 
> >  include/uapi/linux/openvswitch.h |  8 
> >  net/openvswitch/actions.c|  6 ++-
> >  net/openvswitch/datapath.c   | 70 +++-
> >  net/openvswitch/datapath.h   | 20 +
> >  4 files changed, 101 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
> > index 8d16744edc31..6571b57b2268 100644
> > --- a/include/uapi/linux/openvswitch.h
> > +++ b/include/uapi/linux/openvswitch.h
> > @@ -70,6 +70,8 @@ enum ovs_datapath_cmd {
> >   * set on the datapath port (for OVS_ACTION_ATTR_MISS).  Only valid on
> >   * %OVS_DP_CMD_NEW requests. A value of zero indicates that upcalls should
> >   * not be sent.
> > + * OVS_DP_ATTR_PER_CPU_PIDS: Per-cpu array of PIDs for upcalls when
> > + * OVS_DP_F_DISPATCH_UPCALL_PER_CPU feature is set.
> >   * @OVS_DP_ATTR_STATS: Statistics about packets that have passed through the
> >   * datapath.  Always present in notifications.
> >   * @OVS_DP_ATTR_MEGAFLOW_STATS: Statistics about mega flow masks usage for the
> > @@ -87,6 +89,9 @@ enum ovs_datapath_attr {
> > OVS_DP_ATTR_USER_FEATURES,  /* OVS_DP_F_*  */
> > OVS_DP_ATTR_PAD,
> > OVS_DP_ATTR_MASKS_CACHE_SIZE,
> > +   OVS_DP_ATTR_PER_CPU_PIDS,   /* Netlink PIDS to receive upcalls in per-cpu
> > +* dispatch mode
> > +*/
> > __OVS_DP_ATTR_MAX
> >  };
> >  
> > @@ -127,6 +132,9 @@ struct ovs_vport_stats {
> >  /* Allow

Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract

2021-07-12 Thread Flavio Leitner

On Mon, Jul 12, 2021 at 02:22:46PM +0200, Eelco Chaudron wrote:
> See some comments below…
> 
> For this patch series, I’m only looking at the diff from v6..v9, not a full 
> review.
> I will do basic compilation and some tests at the end.
> 
> Cheers,
> 
> Eelco
> 
> 
> On 12 Jul 2021, at 7:51, kumar Amber wrote:
> 
> > From: Kumar Amber 
> >
> > This patch introduces the MFEX function pointers, which allow
> > the user to switch between different miniflow extract implementations
> > which are provided by the OVS based on optimized ISA CPU.
> >
> > The user can query the miniflow extract variants available
> > for that CPU with the following command:
> >
> > $ovs-appctl dpif-netdev/miniflow-parser-get
> >
> > Similarly, a user can set the miniflow implementation with the following
> > command:
> >
> > $ ovs-appctl dpif-netdev/miniflow-parser-set name
> >
> > This allows for more performance and flexibility to the user to choose
> > the miniflow implementation according to the needs.
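A minimal sketch of selecting a miniflow extract implementation by name and reading it once per batch, assuming C11 <stdatomic.h>. The relaxed load mirrors the atomic_read_relaxed() call in the avx512 hunk of this patch; the table, the mfex_func signature and the demo implementation are hypothetical. A NULL pointer stands for the scalar path, matching the `if (mfex_func)` check in the patch.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature: returns a hitmask with bit i set when
 * packet i of the batch was parsed by this implementation. */
typedef uint32_t (*mfex_func)(const void *pkts, size_t n_pkts);

/* Demo implementation that claims every packet in the batch. */
static uint32_t
mfex_all_hit(const void *pkts, size_t n_pkts)
{
    (void) pkts;
    return n_pkts >= 32 ? UINT32_MAX : (UINT32_C(1) << n_pkts) - 1;
}

struct mfex_impl {
    const char *name;
    mfex_func func;              /* NULL means "use the scalar path". */
};

static const struct mfex_impl mfex_impls[] = {
    { "scalar", NULL },
    { "all_hit_demo", mfex_all_hit },
};

/* Slot read by the datapath on every batch (zero-initialized: scalar). */
static _Atomic(mfex_func) mfex_opt;

/* Returns 0 on success, -1 for an unknown name (current choice kept). */
static int
mfex_set_by_name(const char *name)
{
    for (size_t i = 0; i < sizeof mfex_impls / sizeof mfex_impls[0]; i++) {
        if (!strcmp(mfex_impls[i].name, name)) {
            atomic_store_explicit(&mfex_opt, mfex_impls[i].func,
                                  memory_order_relaxed);
            return 0;
        }
    }
    return -1;
}

/* Per-batch use: a zero hitmask sends every packet to scalar extract. */
static uint32_t
mfex_run_batch(const void *pkts, size_t n_pkts)
{
    mfex_func f = atomic_load_explicit(&mfex_opt, memory_order_relaxed);

    return f ? f(pkts, n_pkts) : 0;
}
```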
> >
> > Signed-off-by: Kumar Amber 
> > Co-authored-by: Harry van Haaren 
> > Signed-off-by: Harry van Haaren 
> >
> > ---
> > v9:
> > - fix review comments from Flavio
> > v7:
> > - fix review comments(Eelco, Flavio)
> > v5:
> > - fix review comments(Ian, Flavio, Eelco)
> > - add enum to hold mfex indexes
> > - add new get and set implemenatations
> > - add Atomic set and get
> > ---
> > ---
> >  NEWS  |   1 +
> >  lib/automake.mk   |   2 +
> >  lib/dpif-netdev-avx512.c  |  31 +-
> >  lib/dpif-netdev-private-extract.c | 162 ++
> >  lib/dpif-netdev-private-extract.h | 111 
> >  lib/dpif-netdev-private-thread.h  |   8 ++
> >  lib/dpif-netdev.c | 105 +++
> >  7 files changed, 416 insertions(+), 4 deletions(-)
> >  create mode 100644 lib/dpif-netdev-private-extract.c
> >  create mode 100644 lib/dpif-netdev-private-extract.h
> >
> > diff --git a/NEWS b/NEWS
> > index 6cdccc715..b0f08e96d 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -32,6 +32,7 @@ Post-v2.15.0
> >   * Enable the AVX512 DPCLS implementation to use VPOPCNT instruction if the
> > CPU supports it. This enhances performance by using the native vpopcount
> > instructions, instead of the emulated version of vpopcount.
> > + * Add command line option to switch between MFEX function pointers.
> > - ovs-ctl:
> >   * New option '--no-record-hostname' to disable hostname configuration
> > in ovsdb on startup.
> > diff --git a/lib/automake.mk b/lib/automake.mk
> > index 3c9523c1a..53b8abc0f 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -118,6 +118,8 @@ lib_libopenvswitch_la_SOURCES = \
> > lib/dpif-netdev-private-dpcls.h \
> > lib/dpif-netdev-private-dpif.c \
> > lib/dpif-netdev-private-dpif.h \
> > +   lib/dpif-netdev-private-extract.c \
> > +   lib/dpif-netdev-private-extract.h \
> > lib/dpif-netdev-private-flow.h \
> > lib/dpif-netdev-private-thread.h \
> > lib/dpif-netdev-private.h \
> > diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
> > index 6f9aa8284..7772b7abf 100644
> > --- a/lib/dpif-netdev-avx512.c
> > +++ b/lib/dpif-netdev-avx512.c
> > @@ -149,6 +149,15 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
> >   * // do all processing (HWOL->MFEX->EMC->SMC)
> >   * }
> >   */
> > +
> > +/* Do a batch miniflow extract into keys. */
> > +uint32_t mf_mask = 0;
> > +miniflow_extract_func mfex_func;
> > +atomic_read_relaxed(&pmd->miniflow_extract_opt, &mfex_func);
> > +if (mfex_func) {
> > +mf_mask = mfex_func(packets, keys, batch_size, in_port, pmd);
> > +}
> > +
> >  uint32_t lookup_pkts_bitmask = (1ULL << batch_size) - 1;
> >  uint32_t iter = lookup_pkts_bitmask;
> >  while (iter) {
> > @@ -167,6 +176,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
> >  pkt_metadata_init(&packet->md, in_port);
> >
> >  struct dp_netdev_flow *f = NULL;
> > +struct netdev_flow_key *key = &keys[i];
> > +
> > +/* Check the miniflow mask to see if the packet was correctly
> > + * classified by vector mfex else do a scalar miniflow extract
> > + * for that packet.
> > + */
> > +bool mfex_hit = !!(mf_mask & (1 << i));
> >
> >  /* Check for a partial hardware offload match. */
> >  if (hwol_enabled) {
> > @@ -177,7 +193,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
> >  }
> >  if (f) {
> >  rules[i] = &f->cr;
> > -pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
> > +/* If AVX512 MFEX already classified the packet, use it. */
> > +if (mfex_hit) {
> > +pkt_meta[i].tcp_flags = miniflow_get_tcp_flags(&key->mf);
> > +} else {
> > +

Re: [ovs-dev] [v8 12/12] dpif-netdev: add mfex options to scalar dpif

2021-07-10 Thread Flavio Leitner

On Fri, Jul 09, 2021 at 05:36:02PM +0530, kumar Amber wrote:
> This commit adds the optimized MFEX options to be executed
> as part of the scalar DPIF.
> 
> Signed-off-by: kumar Amber 
> ---

Acked-by: Flavio Leitner 


Re: [ovs-dev] [v8 11/12] dpif-netdev/mfex: add more AVX512 traffic profiles

2021-07-10 Thread Flavio Leitner

On Fri, Jul 09, 2021 at 05:36:01PM +0530, kumar Amber wrote:
> From: Harry van Haaren 
> 
> This commit adds 3 new traffic profile implementations to the
> existing avx512 miniflow extract infrastructure. The profiles added are:
> - Ether()/IP()/TCP()
> - Ether()/Dot1Q()/IP()/UDP()
> - Ether()/Dot1Q()/IP()/TCP()
> 
> The design of the avx512 code here is for scalability to add more
> traffic profiles, as well as enabling CPU ISA. Note that an implementation
> is primarily adding static const data, which the compiler then specializes
> away when the profile specific function is declared below.
> 
> As a result, the code is relatively maintainable, and scalable for new
> traffic profiles as well as new ISA, and does not lower performance
> compared with manually written code for each profile/ISA.
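A sketch of the "profile as static const data" idea, assuming GCC or Clang for the always_inline attribute. Each profile is a const struct and each entry point is a one-line wrapper; because the generic matcher is inlined with a constant profile, the compiler can fold the offsets into constants. The field names and the reduced set of checked fields (VLAN TPID, ethertype, IP protocol) are illustrative, not the layout used in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical profile descriptor. */
struct mfex_profile {
    int has_vlan;           /* 1: expect an 802.1Q TPID (0x8100) at 12. */
    size_t ethertype_off;   /* Offset of the IPv4 ethertype (0x0800). */
    size_t ip_proto_off;    /* Offset of the IPv4 protocol byte. */
    uint8_t ip_proto;       /* 6 = TCP, 17 = UDP. */
};

/* Ether()/IP()/TCP(): IP header at 14, protocol at 14 + 9. */
static const struct mfex_profile prof_eth_ip_tcp = { 0, 12, 23, 6 };
/* Ether()/Dot1Q()/IP()/...: 4-byte tag shifts everything by 4. */
static const struct mfex_profile prof_eth_vlan_ip_udp = { 1, 16, 27, 17 };
static const struct mfex_profile prof_eth_vlan_ip_tcp = { 1, 16, 27, 6 };

static uint16_t
be16_at(const uint8_t *p, size_t off)
{
    return (uint16_t) ((p[off] << 8) | p[off + 1]);
}

/* Generic matcher, inlined into each wrapper with a constant 'prof'. */
static inline __attribute__((always_inline)) int
mfex_profile_match(const uint8_t *pkt, size_t len,
                   const struct mfex_profile *prof)
{
    if (len <= prof->ip_proto_off) {
        return 0;
    }
    if (prof->has_vlan && be16_at(pkt, 12) != 0x8100) {
        return 0;
    }
    return be16_at(pkt, prof->ethertype_off) == 0x0800
           && pkt[prof->ip_proto_off] == prof->ip_proto;
}

/* Adding a profile costs one const struct plus one wrapper. */
static int
match_eth_ip_tcp(const uint8_t *pkt, size_t len)
{
    return mfex_profile_match(pkt, len, &prof_eth_ip_tcp);
}

static int
match_eth_vlan_ip_udp(const uint8_t *pkt, size_t len)
{
    return mfex_profile_match(pkt, len, &prof_eth_vlan_ip_udp);
}

static int
match_eth_vlan_ip_tcp(const uint8_t *pkt, size_t len)
{
    return mfex_profile_match(pkt, len, &prof_eth_vlan_ip_tcp);
}
```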
> 
> Note that confidence in the correctness of each implementation is
> achieved through autovalidation, unit tests with known packets, and
> fuzz tested packets.
> 
> Signed-off-by: Harry van Haaren 
> Acked-by: Eelco Chaudron 
> ---

Acked-by: Flavio Leitner 


Re: [ovs-dev] [v8 10/12] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract

2021-07-10 Thread Flavio Leitner

Hi,

On Fri, Jul 09, 2021 at 05:36:00PM +0530, kumar Amber wrote:
> From: Harry van Haaren 
> 
> This commit adds AVX512 implementations of miniflow extract.
> By using the 64 bytes available in an AVX512 register, it is
> possible to convert a packet to a miniflow data-structure in
> a small number of instructions.
> 
> The implementation here probes for Ether()/IP()/UDP() traffic,
> and builds the appropriate miniflow data-structure for packets
> that match the probe.
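The probe can be modeled in scalar C as a masked compare over the first 64 bytes, the size of one AVX512 register. A sketch under stated assumptions: only the ethertype, the IPv4 version/IHL byte and the protocol byte are constrained, packets shorter than 64 bytes are simply rejected, and struct probe plus its helpers are hypothetical. The AVX512 version would replace the loop with vector AND and compare instructions over the whole block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PROBE_LEN 64    /* Bytes in one AVX512 (ZMM) register. */

/* A packet matches when (pkt[i] & mask[i]) == value[i] for all i. */
struct probe {
    uint8_t mask[PROBE_LEN];
    uint8_t value[PROBE_LEN];
};

static void
probe_init_eth_ipv4_udp(struct probe *p)
{
    memset(p, 0, sizeof *p);           /* Unconstrained bytes: mask 0. */
    p->mask[12] = 0xff; p->value[12] = 0x08;   /* Ethertype 0x0800. */
    p->mask[13] = 0xff; p->value[13] = 0x00;
    p->mask[14] = 0xff; p->value[14] = 0x45;   /* IPv4, IHL 5. */
    p->mask[23] = 0xff; p->value[23] = 17;     /* IP proto UDP. */
}

static int
probe_match(const struct probe *p, const uint8_t *pkt, size_t len)
{
    if (len < PROBE_LEN) {
        return 0;   /* Sketch limitation: needs a full 64-byte block. */
    }
    for (size_t i = 0; i < PROBE_LEN; i++) {
        if ((pkt[i] & p->mask[i]) != p->value[i]) {
            return 0;
        }
    }
    return 1;
}
```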
> 
> The implementation here is auto-validated by the miniflow
> extract autovalidator, hence its correctness can be easily
> tested and verified.
> 
> Note that this commit is designed to easily allow addition of new
> traffic profiles in a scalable way, without code duplication for
> each traffic profile.
> 
> Signed-off-by: Harry van Haaren 
> 
> ---
> v8:
> - include documentation on AVX512 MFEX as per Eelco's suggestion
> v7:
> - fix minor review sentences (Eelco)
> v5:
> - fix review comments(Ian, Flavio, Eelco)
> - inlcude assert for flow abi change
> - include assert for offset changes
> ---
> ---
>  lib/automake.mk   |   1 +
>  lib/dpif-netdev-extract-avx512.c  | 478 ++
>  lib/dpif-netdev-private-extract.c |  13 +
>  lib/dpif-netdev-private-extract.h |  25 +-
>  4 files changed, 515 insertions(+), 2 deletions(-)
>  create mode 100644 lib/dpif-netdev-extract-avx512.c
> 
> diff --git a/lib/automake.mk b/lib/automake.mk
> index f4f36325e..299f81939 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -39,6 +39,7 @@ lib_libopenvswitchavx512_la_CFLAGS = \
>   $(AM_CFLAGS)
>  lib_libopenvswitchavx512_la_SOURCES = \
>   lib/dpif-netdev-lookup-avx512-gather.c \
> + lib/dpif-netdev-extract-avx512.c \
>   lib/dpif-netdev-avx512.c
>  lib_libopenvswitchavx512_la_LDFLAGS = \
>   -static
> diff --git a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c
> new file mode 100644
> index 0..c06e53582
> --- /dev/null
> +++ b/lib/dpif-netdev-extract-avx512.c
> @@ -0,0 +1,478 @@
> +/*
> + * Copyright (c) 2021 Intel.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +/*
> + * AVX512 Miniflow Extract.
> + *
> + * This file contains optimized implementations of miniflow_extract()
> + * for specific common traffic patterns. The optimizations allow for
> + * quick probing of a specific packet type, and if a match with a specific
> + * type is found, a shuffle like procedure builds up the required miniflow.
> + *
> + * Process
> + * -
> + *
> + * The procedure is to classify the packet based on the traffic type
> + * using predefined bit-masks and arrange the packet header data using shuffle
> + * instructions to a pre-defined place as required by the miniflow.
> + * This eliminates the if-else ladder to identify the packet data and add data
> + * as per protocol which is present.
> + */
> +
> +#ifdef __x86_64__
> +/* Sparse cannot handle the AVX512 instructions. */
> +#if !defined(__CHECKER__)
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "flow.h"
> +#include "dpdk.h"
> +
> +#include "dpif-netdev-private-dpcls.h"
> +#include "dpif-netdev-private-extract.h"
> +#include "dpif-netdev-private-flow.h"
> +
> +/* AVX512-BW level permutex2var_epi8 emulation. */
> +static inline __m512i
> +__attribute__((target("avx512bw")))
> +_mm512_maskz_permutex2var_epi8_skx(__mmask64 k_mask,
> +   __m512i v_data_0,
> +   __m512i v_shuf_idxs,
> +   __m512i v_data_1)
> +{
> +/* Manipulate shuffle indexes for u16 size. */
> +__mmask64 k_mask_odd_lanes = 0x;
> +/* Clear away ODD lane bytes. Cannot be done above due to no u8 shift. */
> +__m512i v_shuf_idx_evn = _mm512_mask_blend_epi8(k_mask_odd_lanes,
> +v_shuf_idxs,
> +_mm512_setzero_si512());
> +v_shuf_idx_evn = _mm512_srli_epi16(v_shuf_idx_evn, 1);
> +
> +__m512i v_shuf_idx_odd = _mm512_srli_epi16(v_shuf_idxs, 9);
> +
> +/* Shuffle each half at 16-bit width. */
> +__m512i v_shuf1 = _mm512_permutex2var_epi16(v_data_0, v_shuf_idx_evn,
> +v_data_1);
> +__m512i v_shuf2 = _mm512_permutex2var_epi16(v_data_0, v_shuf_idx_odd,
> +

Re: [ovs-dev] [v8 08/12] dpif/stats: add miniflow extract opt hits counter

2021-07-10 Thread Flavio Leitner

On Fri, Jul 09, 2021 at 05:35:58PM +0530, kumar Amber wrote:
> From: Harry van Haaren 
> 
> This commit adds a new counter to be displayed to the user when
> requesting datapath packet statistics. It counts the number of
> packets that are parsed and a miniflow built up from it by the
> optimized miniflow extract parsers.
> 
> The ovs-appctl command "dpif-netdev/pmd-perf-show" now has an
> extra entry indicating if the optimized MFEX was hit:
> 
>   - MFEX Opt hits:6786432  (100.0 %)
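How per-batch hitmasks can feed an "MFEX Opt hits" counter and its percentage, as a sketch: struct mfex_stats and its helpers are hypothetical names, and __builtin_popcount is a GCC/Clang builtin.

```c
#include <stdint.h>

/* Hypothetical per-PMD counters. */
struct mfex_stats {
    uint64_t packets;         /* All packets seen. */
    uint64_t mfex_opt_hits;   /* Packets parsed by the optimized MFEX. */
};

/* Account one batch: bit i of 'hitmask' is set when packet i was
 * parsed by the optimized path.  Bits beyond 'n_pkts' are ignored. */
static void
mfex_stats_add_batch(struct mfex_stats *s, uint32_t hitmask,
                     unsigned int n_pkts)
{
    uint32_t valid = n_pkts >= 32 ? UINT32_MAX
                                  : (UINT32_C(1) << n_pkts) - 1;

    s->packets += n_pkts;
    s->mfex_opt_hits += (uint64_t) __builtin_popcount(hitmask & valid);
}

static double
mfex_stats_hit_pct(const struct mfex_stats *s)
{
    return s->packets
           ? 100.0 * (double) s->mfex_opt_hits / (double) s->packets
           : 0.0;
}
```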
> 
> Signed-off-by: Harry van Haaren 
> 
> ---

Acked-by: Flavio Leitner 


Re: [ovs-dev] [v8 07/12] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-07-10 Thread Flavio Leitner

On Fri, Jul 09, 2021 at 05:35:57PM +0530, kumar Amber wrote:
> From: Kumar Amber 
> 
> Tests:
>   6: OVS-DPDK - MFEX Autovalidator
>   7: OVS-DPDK - MFEX Autovalidator Fuzzy
> 
> Added a new directory to store the PCAP file used
> in the tests and a script to generate the fuzzy traffic
> type pcap to be used in fuzzy unit test.
> 
> Signed-off-by: Kumar Amber 
> 
> ---

Acked-by: Flavio Leitner 


Re: [ovs-dev] [v8 06/12] dpif-netdev: Add packet count and core id paramters for study

2021-07-10 Thread Flavio Leitner

Hi,

On Fri, Jul 09, 2021 at 05:35:56PM +0530, kumar Amber wrote:
> From: Kumar Amber 
> 
> This commit introduces additional command line paramter

Parameter.

> for mfex study function. If user provides additional packet out
> it is used in study to compare minimum packets which must be processed
> else a default value is choosen.
> Also introduces a third paramter for choosing a particular pmd core.
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3
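A sketch of parsing the argument order documented in the bridge.rst hunk of this patch, `[-pmd core_id] [name] [study_cnt]` (as in `-pmd 3 study 1024`), rather than the `study 500 3` form in the commit message above. The default count of 128 comes from the documentation text. This is not the OVS unixctl handler: struct mfex_set_args and parse_mfex_set_args() are hypothetical, and the sketch treats the name as required.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

struct mfex_set_args {
    int has_core;
    unsigned int core_id;
    const char *name;
    unsigned int study_cnt;     /* Defaults to 128. */
};

/* Strict decimal parse: rejects empty, signed, or trailing garbage. */
static int
parse_uint(const char *s, unsigned int *out)
{
    char *end;
    unsigned long v;

    if (*s < '0' || *s > '9') {
        return -1;
    }
    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno || *end != '\0' || v > UINT_MAX) {
        return -1;
    }
    *out = (unsigned int) v;
    return 0;
}

/* Returns 0 on success, -1 on a malformed command line. */
static int
parse_mfex_set_args(int argc, const char *argv[], struct mfex_set_args *a)
{
    int i = 0;

    a->has_core = 0;
    a->core_id = 0;
    a->name = NULL;
    a->study_cnt = 128;

    if (i < argc && !strcmp(argv[i], "-pmd")) {
        if (i + 1 >= argc || parse_uint(argv[i + 1], &a->core_id)) {
            return -1;          /* "-pmd" needs a numeric core id. */
        }
        a->has_core = 1;
        i += 2;
    }
    if (i >= argc) {
        return -1;              /* Implementation name missing. */
    }
    a->name = argv[i++];
    if (i < argc) {
        /* Only "study" uses the count; other names would ignore it. */
        if (parse_uint(argv[i], &a->study_cnt)) {
            return -1;
        }
        i++;
    }
    return i == argc ? 0 : -1;  /* Reject extra arguments. */
}
```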
> 
> Signed-off-by: Kumar Amber 
> 
> ---
> v7:
> - change the command paramters for core_id and study_pkt_cnt
> v5:
> - fix review comments(Ian, Flavio, Eelco)
> - introucde pmd core id parameter
> ---
> ---
>  Documentation/topics/dpdk/bridge.rst |  39 +++-
>  lib/dpif-netdev-extract-study.c  |  26 -
>  lib/dpif-netdev-private-extract.c|   2 +-
>  lib/dpif-netdev-private-extract.h|   9 ++
>  lib/dpif-netdev.c| 138 +--
>  5 files changed, 200 insertions(+), 14 deletions(-)
> 
> diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
> index 4db416ddd..8ed810f34 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -284,12 +284,47 @@ command also shows whether the CPU supports each implementation ::
>  
>  An implementation can be selected manually by the following command ::
>  
> -$ ovs-appctl dpif-netdev/miniflow-parser-set study
> +$ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] [name]
> + [study_cnt]
> +
> +The above command has two optional parameters study_cnt and core_id which can
> +be set optionally. The first parameter core_id to set a particular miniflow

The above command has two optional parameters: study_cnt and
core_id. The core_id sets a particular miniflow...

> +extract function to a specific pmd thread on the core. Third parameter
> +study_cnt is specific to study where how many packets needed to choose best
> +implementation can be provided.. In case of any other implementation other
> +than study third parameter [study_cnt] can pe provided with any arbitatry
> +number and is ignored.

Third parameter study_cnt, which is specific to study and ignored by
other implementations, means how many packets are needed to choose the
best implementation.

>  
>  Also user can select the study implementation which studies the traffic for

The user can select...

>  a specific number of packets by applying all available implementaions of
>  miniflow extract and than chooses the one with most optimal result for that
> -traffic pattern.
> +traffic pattern. A user can also provide an additional packet count parameter

The user can optionally provide a packet count [study_cnt]
parameter...

> +which is the minimum number of packets that OVS must study before choosing an
> +optimal implementation. If no packet count is provided, then the default value,
> +128 is chosen. Also, as there is no synchronization point between threads, one
> +PMD thread might still be running a previous round, and can now decide on
> +earlier data.
> +
> +The per packet count is a global value, and parallel `study()` executions with
> +differing packet counts will use the most recent count value provided by the user.
> +
> +Study can be selected with packet count by the following command ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024
> +
> +Study can be selected with packet count and explicit PMD selection
> +by the following command ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024
> +
> +In the above command the last parameter is the CORE ID of the PMD
> +thread and this can also be used to explicitly set the miniflow
> +extraction function pointer on different PMD threads.
> +
> +Scalar can be selected on core 3 by the following command where
> +study count can be put as any arbitary number or left blank::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar
>  
>  Miniflow Extract Validation
>  ~~~
> diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
> index f14464b2b..d29523db0 100644
> --- a/lib/dpif-netdev-extract-study.c
> +++ b/lib/dpif-netdev-extract-study.c
> @@ -28,7 +28,7 @@ VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);
>  /* Max count of packets to be compared. */
>  #define MFEX_MAX_COUNT (128)
>  
> -static uint32_t mfex_study_pkts_count = 0;
> +static uint32_t mfex_study_pkts_count = MFEX_MAX_COUNT;
>  
>  /* Struct to hold miniflow study stats. */
>  struct study_stats {
> @@ -51,6 +51,28 @@ mfex_study_get_study_stats_ptr(void)
>  return stats;
>  }
>  
> +uint32_t mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count,
> +const char *name)
> +{
> +struct dpif_miniflow_extract_impl *miniflow_funcs;
> +dpif_mfex_impl_info_get(&miniflow_funcs);
> +
> +/* If the packet count

Re: [ovs-dev] [v8 05/12] dpif-netdev: Add configure to enable autovalidator at build time.

2021-07-10 Thread Flavio Leitner

On Fri, Jul 09, 2021 at 05:35:55PM +0530, kumar Amber wrote:
> From: Kumar Amber 
> 
> This commit adds a new configure option that lets the user enable
> the autovalidator by default at build time, thus allowing the
> unit tests to run with it by default.
> 
>  $ ./configure --enable-mfex-default-autovalidator
> 
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
> 
> ---
> v7:
> - fix review commens(Eelco, Flavio)
> v5:
> - fix review comments(Ian, Flavio, Eelco)
> ---
> ---
>  Documentation/topics/dpdk/bridge.rst |  5 +
>  NEWS |  5 +++--
>  acinclude.m4 | 16 
>  configure.ac |  1 +
>  lib/dpif-netdev-private-extract.c|  9 -
>  5 files changed, 33 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
> index 7c618cf1f..4db416ddd 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -307,3 +307,8 @@ implementations provide the same results.
>  To set the Miniflow autovalidator, use this command ::
>  
>  $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
> +
> +A compile time option is available in order to test it with the OVS unit
> +test suite. Use the following configure option ::
> +
> +$ ./configure --enable-mfex-default-autovalidator
> diff --git a/NEWS b/NEWS
> index 95bf386e3..3addb8616 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -31,9 +31,11 @@ Post-v2.15.0
>   * Add command line option to switch between mfex function pointers.
>   * Add miniflow extract auto-validator function to compare different
> miniflow extract implementations against default implementation.
> -*  Add study function to miniflow function table which studies packet
> + * Add study function to miniflow function table which studies packet

I think I commented on this in the patch that introduced it.
Please fix it there instead.

> and automatically chooses the best miniflow implementation for that
> traffic.
> + * Add build time configure command to enable auto-validatior as default
> +   miniflow implementation at build time.
> - ovs-ctl:
>   * New option '--no-record-hostname' to disable hostname configuration
> in ovsdb on startup.
> @@ -50,7 +52,6 @@ Post-v2.15.0
>   * New option '--election-timer' to the 'create-cluster' command to set the
> leader election timer during cluster creation.
>  
> -
>  v2.15.0 - 15 Feb 2021
>  -
> - OVSDB:
> diff --git a/acinclude.m4 b/acinclude.m4
> index 343303447..5a48f0335 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -14,6 +14,22 @@
>  # See the License for the specific language governing permissions and
>  # limitations under the License.
>  
> +dnl Set OVS MFEX Autovalidator as default miniflow extract at compile time?
> +dnl This enables automatically running all unit tests with all MFEX
> +dnl implementations.
> +AC_DEFUN([OVS_CHECK_MFEX_AUTOVALIDATOR], [
> +  AC_ARG_ENABLE([mfex-default-autovalidator],
> +[AC_HELP_STRING([--enable-mfex-default-autovalidator], [Enable MFEX autovalidator as default miniflow_extract implementation.])],
> +[autovalidator=yes],[autovalidator=no])
> +  AC_MSG_CHECKING([whether MFEX Autovalidator is default implementation])
> +  if test "$autovalidator" != yes; then
> +AC_MSG_RESULT([no])
> +  else
> +OVS_CFLAGS="$OVS_CFLAGS -DMFEX_AUTOVALIDATOR_DEFAULT"
> +AC_MSG_RESULT([yes])
> +  fi
> +])
> +
>  dnl Set OVS DPCLS Autovalidator as default subtable search at compile time?
>  dnl This enables automatically running all unit tests with all DPCLS
>  dnl implementations.
> diff --git a/configure.ac b/configure.ac
> index e45685a6c..46c402892 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -186,6 +186,7 @@ OVS_ENABLE_SPARSE
>  OVS_CTAGS_IDENTIFIERS
>  OVS_CHECK_DPCLS_AUTOVALIDATOR
>  OVS_CHECK_DPIF_AVX512_DEFAULT
> +OVS_CHECK_MFEX_AUTOVALIDATOR
>  OVS_CHECK_BINUTILS_AVX512
>  
>  AC_ARG_VAR(KARCH, [Kernel Architecture String])
> diff --git a/lib/dpif-netdev-private-extract.c b/lib/dpif-netdev-private-extract.c
> index bb7c98f31..be8c69408 100644
> --- a/lib/dpif-netdev-private-extract.c
> +++ b/lib/dpif-netdev-private-extract.c
> @@ -80,13 +80,20 @@ dp_mfex_impl_get_default(void)
>  /* For the first call, this will be NULL. Compute the compile time default.
>   */
>  if (OVS_UNLIKELY(!default_mfex_func_set)) {
> +
> +#ifdef MFEX_AUTOVALIDATOR_DEFAULT
> +VLOG_INFO("Default miniflow Extract implementation %s",
> +  mfex_impls[MFEX_IMPL_AUTOVALIDATOR].name);
> +atomic_store_relaxed(mfex_func, (uintptr_t) mfex_impls
> + [MFEX_IMPL_AUTOVALIDATOR].extract_func);
> +#else
>  VLOG_INFO("Default MFEX implementation is %s.\n",
>

Re: [ovs-dev] [v8 03/12] dpif-netdev: Add study function to select the best mfex function

2021-07-10 Thread Flavio Leitner

Hi,

On Fri, Jul 09, 2021 at 05:35:53PM +0530, kumar Amber wrote:
> From: Kumar Amber 
> 
> The study function runs all the available implementations
> of miniflow_extract, chooses the one whose hitmask has the
> most hits and sets the mfex to that function.
> 
> Study can be run at runtime using the following command:
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-set study
> 
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
> 
> ---
> v7:
> - fix review comments(Eelco)
> v5:
> - fix review comments(Ian, Flavio, Eelco)
> - add Atomic set in study
> ---
> ---
>  NEWS  |   3 +
>  lib/automake.mk   |   1 +
>  lib/dpif-netdev-extract-study.c   | 143 ++
>  lib/dpif-netdev-private-extract.c |  17 
>  lib/dpif-netdev-private-extract.h |  20 +
>  5 files changed, 184 insertions(+)
>  create mode 100644 lib/dpif-netdev-extract-study.c
> 
> diff --git a/NEWS b/NEWS
> index e8b4e0405..95bf386e3 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -31,6 +31,9 @@ Post-v2.15.0
>   * Add command line option to switch between mfex function pointers.
>   * Add miniflow extract auto-validator function to compare different
> miniflow extract implementations against default implementation.
> +*  Add study function to miniflow function table which studies packet

I guess that is not indented correctly.

> +   and automatically chooses the best miniflow implementation for that
> +   traffic.
> - ovs-ctl:
>   * New option '--no-record-hostname' to disable hostname configuration
> in ovsdb on startup.
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 53b8abc0f..f4f36325e 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -107,6 +107,7 @@ lib_libopenvswitch_la_SOURCES = \
>   lib/dp-packet.h \
>   lib/dp-packet.c \
>   lib/dpdk.h \
> + lib/dpif-netdev-extract-study.c \
>   lib/dpif-netdev-lookup.h \
>   lib/dpif-netdev-lookup.c \
>   lib/dpif-netdev-lookup-autovalidator.c \
> diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
> new file mode 100644
> index 0..f14464b2b
> --- /dev/null
> +++ b/lib/dpif-netdev-extract-study.c
> @@ -0,0 +1,143 @@
> +/*
> + * Copyright (c) 2021 Intel.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "dpif-netdev-private-thread.h"
> +#include "openvswitch/vlog.h"
> +#include "ovs-thread.h"
> +
> +VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);
> +
> +/* Max count of packets to be compared. */
> +#define MFEX_MAX_COUNT (128)
> +
> +static uint32_t mfex_study_pkts_count = 0;
> +
> +/* Struct to hold miniflow study stats. */
> +struct study_stats {
> +uint32_t pkt_count;
> +uint32_t impl_hitcount[MFEX_IMPL_MAX];
> +};
> +
> +/* Define per thread data to hold the study stats. */
> +DEFINE_PER_THREAD_MALLOCED_DATA(struct study_stats *, study_stats);
> +
> +/* Allocate per thread PMD pointer space for study_stats. */
> +static inline struct study_stats *
> +mfex_study_get_study_stats_ptr(void)
> +{
> +struct study_stats *stats = study_stats_get();
> +if (OVS_UNLIKELY(!stats)) {
> +   stats = xzalloc(sizeof *stats);
> +   study_stats_set_unsafe(stats);
> +}
> +return stats;
> +}
> +
> +uint32_t
> +mfex_study_traffic(struct dp_packet_batch *packets,
> +   struct netdev_flow_key *keys,
> +   uint32_t keys_size, odp_port_t in_port,
> +   struct dp_netdev_pmd_thread *pmd_handle)
> +{
> +uint32_t hitmask = 0;
> +uint32_t mask = 0;
> +struct dp_netdev_pmd_thread *pmd = pmd_handle;
> +struct dpif_miniflow_extract_impl *miniflow_funcs;
> +struct study_stats *stats = mfex_study_get_study_stats_ptr();
> +uint32_t impl_count = dpif_mfex_impl_info_get(&miniflow_funcs);

This module has access to the enum, so MFEX_IMPL_MAX should be
enough. See more details in dpif_mfex_impl_info_get() below.
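To illustrate the point about MFEX_IMPL_MAX: because the enum is visible to the study module, the per-thread hit counters can be sized by it and the loop bounded by it, with no runtime count lookup. A minimal sketch, assuming a hypothetical three-entry enum, C11 _Thread_local storage in place of DEFINE_PER_THREAD_MALLOCED_DATA, and the GCC/Clang __builtin_popcount() builtin for counting hits:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical enum mirroring the MFEX index layout. The count is a
 * compile-time constant, so arrays can be sized by it directly. */
enum demo_impl { DEMO_SCALAR, DEMO_AVX512_A, DEMO_AVX512_B, DEMO_IMPL_MAX };

#define DEMO_STUDY_PKTS 128   /* Plays the role of MFEX_MAX_COUNT. */

struct demo_study_stats {
    uint32_t pkt_count;
    uint32_t impl_hitcount[DEMO_IMPL_MAX];
};

/* One stats block per thread, zero-initialized for every thread (C11). */
static _Thread_local struct demo_study_stats demo_stats;

/* Records one batch: hitmasks[j] is the hit bitmask implementation j
 * returned for a batch of batch_size packets. Returns the index of the
 * implementation with the most hits once enough packets were studied,
 * or -1 while the study is still running. */
static int
demo_study_batch(const uint32_t hitmasks[DEMO_IMPL_MAX], uint32_t batch_size)
{
    struct demo_study_stats *s = &demo_stats;

    for (int j = 0; j < DEMO_IMPL_MAX; j++) {
        s->impl_hitcount[j] += (uint32_t) __builtin_popcount(hitmasks[j]);
    }
    s->pkt_count += batch_size;

    if (s->pkt_count < DEMO_STUDY_PKTS) {
        return -1;
    }

    int best = 0;
    for (int j = 1; j < DEMO_IMPL_MAX; j++) {
        if (s->impl_hitcount[j] > s->impl_hitcount[best]) {
            best = j;
        }
    }
    memset(s, 0, sizeof *s);   /* Start a fresh study window. */
    return best;
}
```

In this sketch ties go to the lowest index; how the real study function breaks ties or resets its window may differ.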

> +
> +if (impl_count <= 0) {
> +return 0;
> +}

Not sure how that can happen, because there will always be at
least one implementation for scalar and another for study. If
that is not enough, we can use a build-time assert, because it
doesn't change at runtime.
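A build-time check along those lines could look like the following. OVS has its own BUILD_ASSERT_DECL() macro for this purpose; the sketch uses plain C11 static_assert with a hypothetical enum to stay self-contained:

```c
#include <assert.h>   /* static_assert (C11) */

/* Hypothetical index layout: the implementation count is a compile-time
 * constant, so "no implementations" can be rejected by the compiler
 * instead of being checked on every call. */
enum demo_mfex_impl_idx {
    DEMO_MFEX_IMPL_SCALAR,
    DEMO_MFEX_IMPL_STUDY,
    DEMO_MFEX_IMPL_MAX
};

static_assert(DEMO_MFEX_IMPL_MAX >= 2,
              "expected at least the scalar and study implementations");
```

If someone later removes an entry so that the count drops below two, the build fails instead of the datapath silently returning 0.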

> +
> +/* Run traffic optimized miniflow_extract to collect the hitmask
> + * to be compared after certain packets have

Re: [ovs-dev] [v8 04/12] docs/dpdk/bridge: add miniflow extract section.

2021-07-10 Thread Flavio Leitner

On Fri, Jul 09, 2021 at 05:35:54PM +0530, kumar Amber wrote:
> From: Kumar Amber 
> 
> This commit adds a section to the dpdk/bridge.rst netdev documentation,
> detailing the added miniflow functionality. The newly added commands are
> documented, and sample output is provided.
> 
> The use of auto-validator and special study function is also described
> in detail as well as running fuzzy tests.
> 
> Signed-off-by: Kumar Amber 
> Co-authored-by: Cian Ferriter 
> Signed-off-by: Cian Ferriter 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
> 
> ---

Acked-by: Flavio Leitner 


Re: [ovs-dev] [v8 02/12] dpif-netdev: Add auto validation function for miniflow extract

2021-07-10 Thread Flavio Leitner

On Fri, Jul 09, 2021 at 05:35:52PM +0530, kumar Amber wrote:
> From: Kumar Amber 
> 
> This patch introduced the auto-validation function which
> allows users to compare the batch of packets obtained from
> different miniflow implementations against the linear
> miniflow extract and return a hitmask.
> 
> The autovalidator function can be triggered at runtime using the
> following command:
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
> 
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
> 
> ---
> v6:
> -fix review comments(Eelco)
> v5:
> - fix review comments(Ian, Flavio, Eelco)
> - remove ovs assert and switch to default after a batch of packets
>   is processed
> - Atomic set and get introduced
> - fix raw_ctz for windows build
> ---
> ---
>  NEWS  |   2 +
>  lib/dpif-netdev-private-extract.c | 150 ++
>  lib/dpif-netdev-private-extract.h |  22 +
>  lib/dpif-netdev.c |   2 +-
>  4 files changed, 175 insertions(+), 1 deletion(-)
> 
> diff --git a/NEWS b/NEWS
> index 413ceec6c..e8b4e0405 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -29,6 +29,8 @@ Post-v2.15.0
> CPU supports it. This enhances performance by using the native vpopcount
> instructions, instead of the emulated version of vpopcount.
>   * Add command line option to switch between mfex function pointers.
> + * Add miniflow extract auto-validator function to compare different
> +   miniflow extract implementations against default implementation.
> - ovs-ctl:
>   * New option '--no-record-hostname' to disable hostname configuration
> in ovsdb on startup.
> diff --git a/lib/dpif-netdev-private-extract.c b/lib/dpif-netdev-private-extract.c
> index 1cf133736..2cc1bc9d5 100644
> --- a/lib/dpif-netdev-private-extract.c
> +++ b/lib/dpif-netdev-private-extract.c
> @@ -38,6 +38,11 @@ static miniflow_extract_func default_mfex_func = NULL;
>   */
>  static struct dpif_miniflow_extract_impl mfex_impls[] = {
>  
> +[MFEX_IMPL_AUTOVALIDATOR] = {
> +.probe = NULL,
> +.extract_func = dpif_miniflow_extract_autovalidator,
> +.name = "autovalidator", },
> +
>  [MFEX_IMPL_SCALAR] = {
>  .probe = NULL,
>  .extract_func = NULL,
> @@ -156,3 +161,148 @@ dp_mfex_impl_get_by_name(const char *name, miniflow_extract_func *out_func)
>  
>  return -ENOENT;
>  }
> +
> +uint32_t
> +dpif_miniflow_extract_autovalidator(struct dp_packet_batch *packets,
> +struct netdev_flow_key *keys,
> +uint32_t keys_size, odp_port_t in_port,
> +struct dp_netdev_pmd_thread *pmd_handle)
> +{
> +const size_t cnt = dp_packet_batch_size(packets);
> +uint16_t good_l2_5_ofs[NETDEV_MAX_BURST];
> +uint16_t good_l3_ofs[NETDEV_MAX_BURST];
> +uint16_t good_l4_ofs[NETDEV_MAX_BURST];
> +uint16_t good_l2_pad_size[NETDEV_MAX_BURST];
> +struct dp_packet *packet;
> +struct dp_netdev_pmd_thread *pmd = pmd_handle;
> +struct netdev_flow_key test_keys[NETDEV_MAX_BURST];
> +
> +if (keys_size < cnt) {
> +miniflow_extract_func default_func = NULL;
> +atomic_uintptr_t *pmd_func = (void *)&pmd->miniflow_extract_opt;
> +atomic_store_relaxed(pmd_func, (uintptr_t) default_func);
> +VLOG_ERR("Invalid key size supplied, Key_size: %d less than"
> + "batch_size:  %" PRIuSIZE"", keys_size, cnt);

Please mention that the autovalidator is disabled in this pmd too.

> +return 0;
> +}
> +
> +/* Run scalar miniflow_extract to get default result. */
> +DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
> +pkt_metadata_init(&packet->md, in_port);
> +miniflow_extract(packet, &keys[i].mf);
> +
> +/* Store known good metadata to compare with optimized metadata. */
> +good_l2_5_ofs[i] = packet->l2_5_ofs;
> +good_l3_ofs[i] = packet->l3_ofs;
> +good_l4_ofs[i] = packet->l4_ofs;
> +good_l2_pad_size[i] = packet->l2_pad_size;
> +}
> +
> +uint32_t batch_failed = 0;
> +/* Iterate through each version of miniflow implementations. */
> +for (int j = MFEX_IMPL_START_IDX; j < MFEX_IMPL_MAX; j++) {
> +if ((j < MFEX_IMPL_MAX) || (!mfex_impls[j].available)) {

With the fix for MFEX_IMPL_MAX in the previous patch and fixing
the define for MFEX_IMPL_START_IDX in the future patch, there is
no need to check if (j < MFEX_IMPL_MAX).
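Note that, as quoted, the condition uses ||, and j < MFEX_IMPL_MAX is always true inside the loop, so every implementation would be skipped. A minimal sketch of the intended loop, which only checks availability, assuming a hypothetical table in which each entry returns a single offset in place of the full miniflow and l2/l3/l4 offsets:

```c
#include <stdbool.h>
#include <stdint.h>

enum { DEMO_IMPL_START_IDX = 1, DEMO_IMPL_MAX = 4 };

struct demo_impl {
    bool available;
    /* Stand-in for miniflow extraction: returns one per-packet offset. */
    uint16_t (*extract)(uint16_t pkt);
};

/* Hypothetical implementations: one agrees with the reference, one not. */
static uint16_t demo_ok(uint16_t pkt) { return (uint16_t) (pkt + 14); }
static uint16_t demo_bad(uint16_t pkt) { return pkt; }

/* Counts how many available implementations disagree with the scalar
 * reference value "good" for one packet. Inside the loop
 * j < DEMO_IMPL_MAX always holds, so only availability is checked. */
static int
demo_validate(const struct demo_impl impls[DEMO_IMPL_MAX], uint16_t pkt,
              uint16_t good)
{
    int failed = 0;

    for (int j = DEMO_IMPL_START_IDX; j < DEMO_IMPL_MAX; j++) {
        if (!impls[j].available) {
            continue;
        }
        if (impls[j].extract(pkt) != good) {
            failed++;
        }
    }
    return failed;
}
```

With entry 1 set to demo_ok, entry 2 to demo_bad, and entry 3 marked unavailable, validating a packet whose reference offset is 14 reports exactly one failure.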



> +continue;
> +}
> +
> +/* Reset keys and offsets before each implementation. */
> +memset(test_keys, 0, keys_size * sizeof(struct netdev_flow_key));
> +DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
> +dp_packet_reset_offsets(packet);
> +}
> +/* Call optimized miniflow for each batch of packet. */
> +uint32_t hit_mask =

Re: [ovs-dev] [v8 01/12] dpif-netdev: Add command line and function pointer for miniflow extract

2021-07-10 Thread Flavio Leitner




Hi

On Fri, Jul 09, 2021 at 05:35:51PM +0530, kumar Amber wrote:
> From: Kumar Amber 
> 
> This patch introduces the mfex function pointers which allows

Let's use capital MFEX.

> the user to switch between different miniflow extract implementations
> which are provided by the OVS based on optimized ISA CPU.
> 
> The user can query for the available miniflow extract variants available
> for that CPU by following commands:
> 
> $ovs-appctl dpif-netdev/miniflow-parser-get
> 
> Similarly, a user can set the miniflow implementation by the following
> command :
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-set name
> 
> This allows for more performance and flexibility to the user to choose
> the miniflow implementation according to the needs.
> 
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
> 
> ---
> v7:
> - fix review comments(Eelco, Flavio)
> v5:
> - fix review comments(Ian, Flavio, Eelco)
> - add enum to hold mfex indexes
> - add new get and set implementations
> - add Atomic set and get
> ---
> ---
>  NEWS  |   1 +
>  lib/automake.mk   |   2 +
>  lib/dpif-netdev-avx512.c  |  32 +-
>  lib/dpif-netdev-private-extract.c | 158 ++
>  lib/dpif-netdev-private-extract.h | 107 
>  lib/dpif-netdev-private-thread.h  |   8 ++
>  lib/dpif-netdev.c | 105 
>  7 files changed, 409 insertions(+), 4 deletions(-)
>  create mode 100644 lib/dpif-netdev-private-extract.c
>  create mode 100644 lib/dpif-netdev-private-extract.h
> 
> diff --git a/NEWS b/NEWS
> index 269f647db..413ceec6c 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -28,6 +28,7 @@ Post-v2.15.0
>   * Enable the AVX512 DPCLS implementation to use VPOPCNT instruction if the
> CPU supports it. This enhances performance by using the native vpopcount
> instructions, instead of the emulated version of vpopcount.
> + * Add command line option to switch between mfex function pointers.

Capital MFEX.

> - ovs-ctl:
>   * New option '--no-record-hostname' to disable hostname configuration
> in ovsdb on startup.
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 3c9523c1a..53b8abc0f 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -118,6 +118,8 @@ lib_libopenvswitch_la_SOURCES = \
>   lib/dpif-netdev-private-dpcls.h \
>   lib/dpif-netdev-private-dpif.c \
>   lib/dpif-netdev-private-dpif.h \
> + lib/dpif-netdev-private-extract.c \
> + lib/dpif-netdev-private-extract.h \
>   lib/dpif-netdev-private-flow.h \
>   lib/dpif-netdev-private-thread.h \
>   lib/dpif-netdev-private.h \
> diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
> index 6f9aa8284..0c2345f90 100644
> --- a/lib/dpif-netdev-avx512.c
> +++ b/lib/dpif-netdev-avx512.c
> @@ -149,6 +149,16 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
>   * // do all processing (HWOL->MFEX->EMC->SMC)
>   * }
>   */
> +
> +/* Do a batch miniflow extract into keys. */
> +uint32_t mf_mask = 0;
> +miniflow_extract_func mfex_func;
> +atomic_read_relaxed(&pmd->miniflow_extract_opt, &mfex_func);
> +if (mfex_func) {
> +mf_mask = mfex_func(packets, keys, batch_size, in_port, pmd);
> +}
> +
> +/* Perform first packet interation. */

What does the above comment mean?

>  uint32_t lookup_pkts_bitmask = (1ULL << batch_size) - 1;
>  uint32_t iter = lookup_pkts_bitmask;
>  while (iter) {
> @@ -167,6 +177,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
>  pkt_metadata_init(&packet->md, in_port);
>  
>  struct dp_netdev_flow *f = NULL;
> +struct netdev_flow_key *key = &keys[i];
> +
> +/* Check the miniflow mask to see if the packet was correctly
> + * classified by vector mfex else do a scalar miniflow extract
> + * for that packet.
> + */
> +bool mfex_hit = (mf_mask & (1 << i));

Since this is a bit mask, please convert to boolean using double NOT:
bool mfex_hit = !!(mf_mask & (1 << i));
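A standalone illustration of the suggested idiom. Assigning to a C99 bool already normalizes any non-zero value to true, so !! mainly documents intent. The sketch also shifts an unsigned constant: with a plain int 1, i == 31 (reachable if batches hold up to 32 packets) would be an undefined signed shift. demo_mask_bit() is a hypothetical helper, not part of the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Tests bit i of a 32-bit hit mask. "!!" folds any non-zero result to 1;
 * UINT32_C(1) keeps the shift unsigned so i == 31 is well defined. */
static inline bool
demo_mask_bit(uint32_t mask, unsigned int i)
{
    return !!(mask & (UINT32_C(1) << i));
}
```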


>  /* Check for a partial hardware offload match. */
>  if (hwol_enabled) {
> @@ -177,7 +194,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
>  }
>  if (f) {
>  rules[i] = &f->cr;
> -pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
> +/* If AVX512 MFEX already classified the packet, use it. */
> +if (mfex_hit) {
> +pkt_meta[i].tcp_flags = miniflow_get_tcp_flags(&key->mf);
> +} else {
> +pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
> +}
> +
>  pkt_meta[i].bytes = dp_packet_size(packet);
>  phwol_hits++;
>  hwol_emc_smc_hitmask |= (1 << i);
> @@ -185,9 +208,10 @@

Re: [ovs-dev] [PATCH v2] netdev-linux: Ignore TSO packets when TSO is not enabled for userspace

2021-07-10 Thread Flavio Leitner

On Fri, Jul 09, 2021 at 10:08:39PM +0200, Ilya Maximets wrote:
> On 7/8/21 2:16 PM, Flavio Leitner wrote:
> > On Mon, Jul 05, 2021 at 07:57:41AM -0400, Eelco Chaudron wrote:
> >> When TSO is disabled from a userspace forwarding datapath perspective,
> >> but TSO has been wrongly enabled on the kernel side, log a warning
> >> message, and drop the packet. With the current implementation,
> >> OVS will crash.
> >>
> >> Fixes: 73858f9db ("netdev-linux: Prepend the std packet in the TSO packet")
> >> Signed-off-by: Eelco Chaudron 
> >> ---
> >> v2: Fixed rx->aux_bufs[i] to allow reuse
> >>
> >>  lib/netdev-linux.c |   20 +---
> >>  1 file changed, 17 insertions(+), 3 deletions(-)
> Thanks, Eelco and Flavio!  I extended the commit message a bit with more
> details why exactly this happens.  I also added a different Fixes tag,
> because the actual culprit for the issue is that commit 2109841b7984
> ("Use batch process recv for tap and raw socket in netdev datapath")
> dropped the (retval > size) check without providing an alternative while
> migrating from recvmsg to recvmmsg.  This resulted in construction of
> a malformed dp_packet with size larger than the allocated space.
> The crash due to NULL aux_bufs was introduced later by commit 73858f9db.
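The dropped (retval > size) check amounts to rejecting any received message whose reported length exceeds the buffer space provided for it. With recvmmsg(), the per-message length comes back in struct mmsghdr's msg_len, and on AF_PACKET sockets the MSG_TRUNC flag makes the kernel report the real packet length even when it did not fit (see recv(2)). A minimal sketch of that filter on plain arrays, with a hypothetical helper name and no actual socket I/O:

```c
#include <stddef.h>
#include <stdint.h>

/* lens[i] is the length reported for message i (msg_len in the real
 * recvmmsg() batch) and cap is the space that was provided for it.
 * Oversized entries are dropped instead of becoming packets whose size
 * exceeds their allocation. Kept lengths are compacted to the front of
 * lens[]; returns how many were kept. */
static size_t
demo_drop_oversized(uint32_t lens[], size_t n, uint32_t cap)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (lens[i] > cap) {
            continue;   /* Real code would log a rate-limited warning. */
        }
        lens[kept++] = lens[i];
    }
    return kept;
}
```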

Yup, thanks for improving the commit.

> Applied and backported down to 2.13.

That's great!
Thanks,
-- 
fbl

Re: [ovs-dev] [v15 06/10] dpif-netdev: Add a partial HWOL PMD statistic.

2021-07-09 Thread Flavio Leitner

left % │ right % │ source / instruction
       │         │ miniflow_push_be16(mf, dl_type, dl_type);
  0.19 │    1.91 │   mov    %r10w,0xc(%r13)
       │         │ miniflow_pad_to_64(mf, dl_type);
  0.42 │    0.32 │   add    $0x10,%r13
  0.26 │    5.62 │   mov    %r15w,-0x2(%r13)
       │         │ if (num_vlans > 0) {
       │         │   test   %rbp,%rbp
  1.42 │    0.11 │ ↓ je     2b5
       │         │ miniflow_push_words_32(mf, vlans, vlans, num_vlans
       │         │   lea    0x1(%rbp),%rcx
       │         │   mov    %r13,%rdi
       │         │   lea    0x40(%rsp),%rsi
       │         │   mov    %r9,0x28(%rsp)
       │         │   shr    %rcx
       │         │ flowmap_set():
       │         │ map_t n_bits_mask = (MAP_1 << n_bits) - 1;
       │         │   mov    $0x1,%r15d
       │         │   mov    %r8d,0x24(%rsp)
       │         │ miniflow_extract():
       │         │   lea    0x0(,%rbp,4),%rdx
       │         │ flowmap_set():
       │         │   shl    %cl,%r15
       │         │   mov    %r10d,0x20(%rsp)
       │         │   sub    $0x1,%r15
       │         │   mov    %r11,0x18(%rsp)


I don't see any relevant optimization difference in the code
above, but the "mov %r15w,-0x2(%r13)" on the right side accounts
for almost all the difference, though on the left side it seems
a bit more spread.

I applied the patch below and it helped to get to 12.7Mpps, so
almost at the same levels. I wonder if you see the same result.
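For context on the diff that follows: OVS_PREFETCH_WRITE() asks the CPU to bring the cache line that will later hold dl_type into cache ahead of the store. A minimal standalone sketch of the same idea using the GCC/Clang __builtin_prefetch() builtin directly; the buffer layout and demo_fill() are hypothetical, and whether a prefetch helps at all depends on the CPU and the workload:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout standing in for the miniflow buffer being filled. */
struct demo_flow_buf {
    uint64_t words[32];
};

/* Prefetches, for writing, the line that will later hold dl_type so the
 * store issued after metadata/L2 parsing is less likely to miss. The
 * builtin's second argument (1) means "prepare for a write", the third
 * (3) requests high temporal locality. It is only a hint and does not
 * fault on its own. */
static void
demo_fill(struct demo_flow_buf *buf, size_t dl_type_word, uint64_t dl_type)
{
    __builtin_prefetch(&buf->words[dl_type_word], 1, 3);

    /* ... metadata and L2 header parsing would happen here ... */

    buf->words[dl_type_word] = dl_type;
}
```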

diff --git a/lib/flow.c b/lib/flow.c
index 729d59b1b..4572e356b 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -746,6 +746,9 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
 uint8_t *ct_nw_proto_p = NULL;
 ovs_be16 ct_tp_src = 0, ct_tp_dst = 0;
 
+/* dltype will be updated later. */
+OVS_PREFETCH_WRITE(miniflow_pointer(mf, dl_type));
+
 /* Metadata. */
 if (flow_tnl_dst_is_set(&md->tunnel)) {
 miniflow_push_words(mf, tunnel, &md->tunnel,


fbl

On Thu, Jul 08, 2021 at 03:02:36PM +0100, Cian Ferriter wrote:
> It is possible for packets traversing the userspace datapath to match a
> flow before hitting on EMC by using a mark ID provided by a NIC. Add a
> PMD statistic for this hit.
> 
> Signed-off-by: Cian Ferriter 
> Acked-by: Flavio Leitner 
> 
> ---
> 
> Cc: Gaetan Rivet 
> Cc: Sriharsha Basavapatna 
> 
> v14:
> - Added Flavio's Acked-by tag.
> 
> v13:
> - Minor refactoring to address review comments.
> - Update manpages to reflect the new format of the pmd-perf-show
>   command.
> ---
>  NEWS| 2 ++
>  lib/dpif-netdev-avx512.c| 3 +++
>  lib/dpif-netdev-perf.c  | 3 +++
>  lib/dpif-netdev-perf.h  | 1 +
>  lib/dpif-netdev-unixctl.man | 1 +
>  lib/dpif-netdev.c

Re: [ovs-dev] [v15 05/10] dpif-netdev: Add command to get dpif implementations.

2021-07-09 Thread Flavio Leitner

On Thu, Jul 08, 2021 at 03:02:35PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren 
> 
> This commit adds a new command to retrieve the list of available
> DPIF implementations. This can be used by to check what implementations
> of the DPIF are available in any given OVS binary. It also returns which
> implementations are in use by the OVS PMD threads.
> 
> Usage:
>  $ ovs-appctl dpif-netdev/dpif-impl-get
> 
> Signed-off-by: Harry van Haaren 
> Co-authored-by: Cian Ferriter 
> Signed-off-by: Cian Ferriter 
> 
> ---

Acked-by: Flavio Leitner 


Re: [ovs-dev] [v15 04/10] dpif-netdev: Add command to switch dpif implementation.

2021-07-09 Thread Flavio Leitner

On Thu, Jul 08, 2021 at 03:02:34PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren 
> 
> This commit adds a new command to allow the user to switch
> the active DPIF implementation at runtime. A probe function
> is executed before switching the DPIF implementation, to ensure
> the CPU is capable of running the ISA required. For example, the
> below code will switch to the AVX512 enabled DPIF assuming
> that the runtime CPU is capable of running AVX512 instructions:
> 
>  $ ovs-appctl dpif-netdev/dpif-impl-set dpif_avx512
> 
> A new configuration flag is added to allow selection of the
> default DPIF. This is useful for running the unit-tests against
> the available DPIF implementations, without modifying each unit test.
> 
> The design of the testing & validation for ISA optimized DPIF
> implementations is based around the work already upstream for DPCLS.
> Note however that a DPCLS lookup has no state or side-effects, allowing
> the auto-validator implementation to perform multiple lookups and
> provide consistent statistic counters.
> 
> The DPIF component does have state, so running two implementations in
> parallel and comparing output is not a valid testing method, as there
> are changes in DPIF statistic counters (side effects). As a result, the
> DPIF is tested directly against the unit-tests.
> 
> Signed-off-by: Harry van Haaren 
> Co-authored-by: Cian Ferriter 
> Signed-off-by: Cian Ferriter 
> 
> ---

Acked-by: Flavio Leitner 
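The probe-before-switch flow described in the commit message can be sketched as follows. The table layout, the demo_* names, the dpif_scalar entry, and the errno conventions are assumptions for illustration, not the actual dpif-netdev code; the probe for dpif_avx512 is hard-wired to fail so the rejection path is visible:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical implementation table: each entry may carry a probe that
 * reports whether the running CPU supports the ISA the entry needs. */
struct demo_dpif_impl {
    const char *name;
    int (*probe)(void);   /* 0 if usable, negative errno otherwise;
                           * NULL means always usable. */
};

static int demo_probe_unsupported(void) { return -ENOTSUP; }

static const struct demo_dpif_impl demo_impls[] = {
    { "dpif_scalar", NULL },
    { "dpif_avx512", demo_probe_unsupported },   /* Pretend: no AVX512. */
};

/* Looks up an implementation by name and runs its probe before switching.
 * Returns the table index on success or a negative errno; on failure the
 * caller keeps the currently active implementation. */
static int
demo_dpif_impl_set(const char *name)
{
    for (size_t i = 0; i < sizeof demo_impls / sizeof demo_impls[0]; i++) {
        if (strcmp(demo_impls[i].name, name)) {
            continue;
        }
        if (demo_impls[i].probe) {
            int err = demo_impls[i].probe();
            if (err) {
                return err;
            }
        }
        return (int) i;
    }
    return -ENOENT;
}
```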


Re: [ovs-dev] [v3] dpif/dpcls: limit count subtable search info logs

2021-07-09 Thread Flavio Leitner

On Fri, Jul 09, 2021 at 01:43:08PM +, Van Haaren, Harry wrote:
> > -Original Message-
> 
> 
> > > +Optimizing Specific Subtable Search
> > > +~~~
> > > +
> > > +The DPCLS wildcarding engine can be "specialized" to handle specific subtable
> > > +searching even faster than the generic subtable search implementation.Not all
> > > +subtable searches are specialized, if you see a message like the following in
> > > +your OVS output, please run the below commands to inform OVS community of the
> > 
> > The documentation is supposed to help users to understand what is
> > happening and perhaps consider the next steps. The text above
> > requires deeper knowledge of how OVS internally works and provides
> > no indication of what is causing it or consequences like if it is
> > harmful or not. Also, it commits to add new specialized lookups
> > in the next release, which might not be realistic.
> > 
> > What do you think about the suggestion below based on your proposal?
> > 
> > 8<---
> > During the packet classification, the datapath can use specialized
> > lookup tables to optimize the search. However, not all situations
> > are optimized. If you see a message like the following one in the OVS
> > logs, it means that there is no specialized implementation available
> > for the current networking traffic. In this case, OVS will continue
> > to process the traffic normally using a more generic lookup table.
> > 
> > "Using non-specialized AVX512 lookup for subtable (4,1) and possibly others."
> > 
> > (Note that the numbers 4 and 1 will likely be different in your logs)
> > 
> > Additional specialized lookups can be added to OVS if the user
> > provides that log message along with the command output as shown
> > below to the OVS mailing list. Note that the numbers in the log
> > message ("subtable (X,Y)") need to match with the numbers in
> > the provided command output ("dp-extra-info:miniflow_bits(X,Y)").
> > 
> > "ovs-appctl dpctl/dump-flows -m", which results in output like this:
> > 
> > ufid:82770b5d-ca38-44ff-8283-74ba36bd1ca5, skb_priority(0/0),skb_mark(0/0)
> > ,ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),
> > dp_hash(0/0),in_port(pcap0),packet_type(ns=0,id=0),eth(src=00:00:00:00:00:
> > 00/00:00:00:00:00:00,dst=ff:ff:ff:ff:ff:ff/00:00:00:00:00:00),eth_type(
> > 0x8100),vlan(vid=1,pcp=0),encap(eth_type(0x0800),ipv4(src=127.0.0.1/0.0.0.0
> > ,dst=127.0.0.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=53/0,
> > dst=53/0)), packets:77072681, bytes:3545343326, used:0.000s, dp:ovs,
> > actions:vhostuserclient0, dp-extra-info:miniflow_bits(4,1)
> > 
> > 8<---
> 
> Good suggestion, I guess I'm too close to the code to even be able describe or
> document it from a users point of view!
> 
> I like the suggested changes, thanks. Respin or update-on-apply all fine with 
> me. -Harry

Due to this busy patching period, could you please respin?
It helps maintainers, the 0day bot can do usual checking, etc.

-- 
fbl

Re: [ovs-dev] [v3] dpif/dpcls: limit count subtable search info logs

2021-07-09 Thread Flavio Leitner



Hi,

Thanks for following up with this patch.
See my comments below.


On Thu, Jul 08, 2021 at 09:41:54PM +0530, kumar Amber wrote:
> From: Harry van Haaren 
> 
> This commit avoids many instances of "using subtable X for miniflow (x,y)"
> in the ovs-vswitchd log when using the DPCLS Autovalidator. This occurs
> when no specialized subtable is found, and the generic "_any" version of
> the avx512 subtable search implementation was used. This change logs the
> subtable usage once, avoiding duplicates.
> 
> Signed-off-by: Harry van Haaren 
> 
> ---
> v3:
> - add comments from Flavio
> - add documentation update
> ---
>  Documentation/topics/dpdk/bridge.rst   | 31 ++
>  lib/dpif-netdev-lookup-avx512-gather.c |  4 ++--
>  2 files changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
> index 0f70a0cad..e74c839b5 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -182,6 +182,37 @@ chosen, and the 2nd occurance of that priority is not used. Put in logical
>  terms, a subtable is chosen if its priority is greater than the previous
>  best candidate.
>  
> +Optimizing Specific Subtable Search
> +~~~
> +
> +The DPCLS wildcarding engine can be "specialized" to handle specific subtable
> +searching even faster than the generic subtable search implementation.Not all
> +subtable searches are specialized, if you see a message like the following in
> +your OVS output, please run the below commands to inform OVS community of the

The documentation is supposed to help users to understand what is
happening and perhaps consider the next steps. The text above
requires deeper knowledge of how OVS internally works and provides
no indication of what is causing it or consequences like if it is
harmful or not. Also, it commits to add new specialized lookups
in the next release, which might not be realistic.

What do you think about the suggestion below based on your proposal?

8<---
During the packet classification, the datapath can use specialized
lookup tables to optimize the search. However, not all situations
are optimized. If you see a message like the following one in the OVS
logs, it means that there is no specialized implementation available
for the current networking traffic. In this case, OVS will continue
to process the traffic normally using a more generic lookup table.

"Using non-specialized AVX512 lookup for subtable (4,1) and possibly others."

(Note that the numbers 4 and 1 will likely be different in your logs)

Additional specialized lookups can be added to OVS if the user
provides that log message along with the command output as shown
below to the OVS mailing list. Note that the numbers in the log
message ("subtable (X,Y)") need to match with the numbers in
the provided command output ("dp-extra-info:miniflow_bits(X,Y)").

"ovs-appctl dpctl/dump-flows -m", which results in output like this:

ufid:82770b5d-ca38-44ff-8283-74ba36bd1ca5, skb_priority(0/0),skb_mark(0/0)
,ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),
dp_hash(0/0),in_port(pcap0),packet_type(ns=0,id=0),eth(src=00:00:00:00:00:
00/00:00:00:00:00:00,dst=ff:ff:ff:ff:ff:ff/00:00:00:00:00:00),eth_type(
0x8100),vlan(vid=1,pcp=0),encap(eth_type(0x0800),ipv4(src=127.0.0.1/0.0.0.0
,dst=127.0.0.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=53/0,
dst=53/0)), packets:77072681, bytes:3545343326, used:0.000s, dp:ovs,
actions:vhostuserclient0, dp-extra-info:miniflow_bits(4,1)

8<---
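When gathering that information, the X,Y pair can be pulled out of a dump-flows line mechanically. A hypothetical helper (not part of OVS) that extracts it so it can be compared against the "subtable (X,Y)" numbers in the log message:

```c
#include <stdio.h>
#include <string.h>

/* Extracts X and Y from the "dp-extra-info:miniflow_bits(X,Y)" part of
 * one "ovs-appctl dpctl/dump-flows -m" output line. Returns 0 on success,
 * -1 if the field is absent or malformed. */
static int
demo_parse_miniflow_bits(const char *line, unsigned int *x, unsigned int *y)
{
    static const char tag[] = "miniflow_bits(";
    const char *p = strstr(line, tag);

    if (!p || sscanf(p + strlen(tag), "%u,%u", x, y) != 2) {
        return -1;
    }
    return 0;
}
```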

Thanks,
fbl

> +subtables in use in your deployment, and we can optimize these subtables for
> +you in the next release.
> +
> +"Using non-specialized AVX512 lookup for subtable (7,2) and possibly others."
> +
> +(Note that the numbers 7 and 2 will likely be different in your logs, these
> +are the numbers OVS community requires to specialize the subtable in your
> +deployment!)
> +
> +If you see this log message, please run the command
> +"ovs-appctl dpctl/dump-flows -m", which results in output like this:
> +
> +ufid:82770b5d-ca38-44ff-8283-74ba36bd1ca5, skb_priority(0/0),skb_mark(0/0)
> +,ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),
> +dp_hash(0/0),in_port(pcap0),packet_type(ns=0,id=0),eth(src=00:00:00:00:00:
> +00/00:00:00:00:00:00,dst=ff:ff:ff:ff:ff:ff/00:00:00:00:00:00),eth_type(
> +0x8100),vlan(vid=1,pcp=0),encap(eth_type(0x0800),ipv4(src=127.0.0.1/0.0.0.0
> +,dst=127.0.0.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=53/0,
> +dst=53/0)), packets:77072681, bytes:3545343326, used:0.000s, dp:ovs,
> +actions:vhostuserclient0, dp-extra-info:miniflow_bits(4,1)
> +
> +Please send an email to the OVS mailing list ovs-dev@openvswitch.org with
> +the output of the "dp-extra-info:miniflow_bits(7,2)" values.
> +
>  CPU ISA Testing and Validation
>

Re: [ovs-dev] [v6 01/11] dpif-netdev: Add command line and function pointer for miniflow extract

2021-07-09 Thread Flavio Leitner

On Fri, Jul 09, 2021 at 01:55:25PM +0200, Eelco Chaudron wrote:
> 
> 
> On 9 Jul 2021, at 13:16, Amber, Kumar wrote:
> 
> > Hi Eelco,
> >
> >> -Original Message-
> >> From: Eelco Chaudron 
> >> Sent: Friday, July 9, 2021 4:42 PM
> >> To: Amber, Kumar 
> >> Cc: Flavio Leitner ; Ferriter, Cian ;
> >> ovs-dev@openvswitch.org; i.maxim...@ovn.org
> >> Subject: Re: [ovs-dev] [v6 01/11] dpif-netdev: Add command line and function
> >> pointer for miniflow extract
> >>
> >>
> >>
> >> On 9 Jul 2021, at 13:10, Amber, Kumar wrote:
> >>
> >>> Hi Eelco,
> >>>
> >>>> -Original Message-
> >>>> From: Eelco Chaudron 
> >>>> Sent: Friday, July 9, 2021 4:36 PM
> >>>> To: Amber, Kumar 
> >>>> Cc: Flavio Leitner ; Ferriter, Cian
> >>>> ; ovs-dev@openvswitch.org;
> >>>> i.maxim...@ovn.org
> >>>> Subject: Re: [ovs-dev] [v6 01/11] dpif-netdev: Add command line and
> >>>> function pointer for miniflow extract
> >>>>
> >>>>
> >>>>
> >>>> On 8 Jul 2021, at 16:01, Amber, Kumar wrote:
> >>>>
> >>>>> Hi Flavio,
> >>>>>
> >>>>> Thanks for the Review
> >>>>> Replies are inline.
> >>>>>
> >>>>> 
> >>>>>
> >>>>>>> +miniflow_extract_func
> >>>>>>> +dp_mfex_impl_get_default(void)
> >>>>>>> +{
> >>>>>>> +/* For the first call, this will be NULL. Compute the compile time default.
> >>>>>>> + */
> >>>>>>> +if (!default_mfex_func) {
> >>>>>>> +
> >>>>>>> +VLOG_INFO("Default MFEX implementation is %s.\n",
> >>>>>>> +  mfex_impls[MFEX_IMPL_SCALAR].name);
> >>>>>>> +default_mfex_func = mfex_impls[MFEX_IMPL_SCALAR].extract_func;
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +return default_mfex_func;
> >>>>>>
> >>>>>> Eelco asked to use VLOG_INFO_ONCE to avoid flooding the log, which
> >>>>>> in the end will use a static variable. Perhaps it would be better
> >>>>>> to define a static boolean like:
> >>>>>>
> >>>>>> miniflow_extract_func
> >>>>>> dp_mfex_impl_get_default(void)
> >>>>>> {
> >>>>>>/* For the first call, this will be NULL. Compute the compile time default.
> >>>>>> */
> >>>>>>static bool default_mfex_func_set = false;
> >>>>>>
> >>>>>>if (OVS_UNLIKELY(!default_mfex_func_set)) {
> >>>>>>VLOG_INFO("Default MFEX implementation is %s.\n",
> >>>>>>  mfex_impls[MFEX_IMPL_SCALAR].name);
> >>>>>>// FIXME: Atomic set?
> >>>>>>default_mfex_func = mfex_impls[MFEX_IMPL_SCALAR].extract_func;
> >>>>>>default_mfex_func_set = true;
> >>>>>>}
> >>>>>>
> >>>>>>return default_mfex_func;
> >>>>>> }
> >>>>>>
> >>>>>
> >>>>> Sound good taking into v7.
> >>>>
> >>>> As you already sent out a v7, I guess you mean v8?
> >>>>
> >>>> Are you planning to send out a v8 after you incorporate all Flavio's
> >>>> comments? If so, I hold off on v7 and wait for v8.
> >>>>
> >>>
> >>> The changes are in v7 itself.
> >>
> >> Ok, Anyway, I’ll wait for Flavio to finish his review before I start on v7
> >> to avoid having to look at v8 again.
> >>
> >
> > You can review other patches in the series in the meanwhile 
> 
> But then I have again to review the v8, I have some other stuff to work on in 
> the meantime ;)
> 
> I’m assuming Flavio will look at the whole set. Flavio can you ping us when 
> you’re done.

I will jump straight to v7 and consider what has been asked in v6.
I think we can review v7 in parallel.

Thanks,
-- 
fbl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
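The static-flag suggestion quoted above, including its open "FIXME: Atomic set?", can be sketched with plain C11 atomics. This is a minimal sketch under stated assumptions, not OVS code: toy_extract_func, toy_scalar_extract() and toy_impl_get_default() are hypothetical names, and real OVS code should prefer its own helpers such as the VLOG_INFO_ONCE mentioned in the thread.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for miniflow_extract_func; not the OVS typedef. */
typedef int (*toy_extract_func)(int pkt);

static int toy_scalar_extract(int pkt) { return pkt; }

static _Atomic(toy_extract_func) toy_default_func = NULL;
static atomic_bool toy_default_logged = false;
static int toy_log_lines = 0;   /* Illustration only: counts log messages. */

toy_extract_func
toy_impl_get_default(void)
{
    toy_extract_func f = atomic_load_explicit(&toy_default_func,
                                              memory_order_acquire);
    if (f == NULL) {
        /* Every racing thread computes the same value, so a plain atomic
         * store is enough; no compare-and-swap is needed. */
        f = toy_scalar_extract;
        atomic_store_explicit(&toy_default_func, f, memory_order_release);
    }

    /* The cheap relaxed load skips the read-modify-write once logging is
     * done; atomic_exchange() lets exactly one thread win the right to log. */
    if (!atomic_load_explicit(&toy_default_logged, memory_order_relaxed)
        && !atomic_exchange(&toy_default_logged, true)) {
        printf("Default MFEX implementation is %s.\n", "scalar");
        toy_log_lines++;
    }
    return f;
}
```

Repeated calls return the same pointer and print the message once. The exchange is what closes the race the FIXME points at: with a plain static bool, two threads could both read false and both log.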

Re: [ovs-dev] [PATCH net-next] openvswitch: Introduce per-cpu upcall dispatch

2021-07-08 Thread Flavio Leitner



Hi Pravin,

Any thoughts on this patch? We are closing OVS 2.16, so it would
be nice to know if it looks okay or needs changes, especially
changes related to the userspace interface.

Thanks,
fbl

On Wed, Jun 30, 2021 at 05:53:49AM -0400, Mark Gray wrote:
> The Open vSwitch kernel module uses the upcall mechanism to send
> packets from kernel space to user space when it misses in the kernel
> space flow table. The upcall sends packets via a Netlink socket.
> Currently, a Netlink socket is created for every vport. In this way,
> there is a 1:1 mapping between a vport and a Netlink socket.
> When a packet is received by a vport, if it needs to be sent to
> user space, it is sent via the corresponding Netlink socket.
> 
> This mechanism, with various iterations of the corresponding user
> space code, has seen some limitations and issues:
> 
> * On systems with a large number of vports, there is a correspondingly
> large number of Netlink sockets which can limit scaling.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
> * Packet reordering on upcalls.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
> * A thundering herd issue.
> (https://bugzilla.redhat.com/show_bug.cgi?id=183)
> 
> This patch introduces an alternative, feature-negotiated, upcall
> mode using a per-cpu dispatch rather than a per-vport dispatch.
> 
> In this mode, the Netlink socket to be used for the upcall is
> selected based on the CPU of the thread that is executing the upcall.
> In this way, it resolves the issues above as:
> 
> a) The number of Netlink sockets scales with the number of CPUs
> rather than the number of vports.
> b) Ordering per-flow is maintained as packets are distributed to
> CPUs based on mechanisms such as RSS and flows are distributed
> to a single user space thread.
> c) Packets from a flow can only wake up one user space thread.
> 
> The corresponding user space code can be found at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/382618.html
> 
> Bugzilla: https://bugzilla.redhat.com/1844576
> Signed-off-by: Mark Gray 
> ---
> 
> Notes:
> v1 - Reworked based on Flavio's comments:
>  * Fixed handling of userspace action case
>  * Renamed 'struct dp_portids'
>  * Fixed handling of return from kmalloc()
>  * Removed check for dispatch type from ovs_dp_get_upcall_portid()
>- Reworked based on Dan's comments:
>  * Fixed handling of return from kmalloc()
>- Reworked based on Pravin's comments:
>  * Fixed handling of userspace action case
>- Added kfree() in destroy_dp_rcu() to cleanup netlink port ids
> 
>  include/uapi/linux/openvswitch.h |  8 
>  net/openvswitch/actions.c|  6 ++-
>  net/openvswitch/datapath.c   | 70 +++-
>  net/openvswitch/datapath.h   | 20 +
>  4 files changed, 101 insertions(+), 3 deletions(-)
> 
> diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
> index 8d16744edc31..6571b57b2268 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -70,6 +70,8 @@ enum ovs_datapath_cmd {
>   * set on the datapath port (for OVS_ACTION_ATTR_MISS).  Only valid on
>   * %OVS_DP_CMD_NEW requests. A value of zero indicates that upcalls should
>   * not be sent.
> + * OVS_DP_ATTR_PER_CPU_PIDS: Per-cpu array of PIDs for upcalls when
> + * OVS_DP_F_DISPATCH_UPCALL_PER_CPU feature is set.
>   * @OVS_DP_ATTR_STATS: Statistics about packets that have passed through the
>   * datapath.  Always present in notifications.
>   * @OVS_DP_ATTR_MEGAFLOW_STATS: Statistics about mega flow masks usage for the
> @@ -87,6 +89,9 @@ enum ovs_datapath_attr {
>   OVS_DP_ATTR_USER_FEATURES,  /* OVS_DP_F_*  */
>   OVS_DP_ATTR_PAD,
>   OVS_DP_ATTR_MASKS_CACHE_SIZE,
> + OVS_DP_ATTR_PER_CPU_PIDS,   /* Netlink PIDS to receive upcalls in per-cpu
> +  * dispatch mode
> +  */
>   __OVS_DP_ATTR_MAX
>  };
>  
> @@ -127,6 +132,9 @@ struct ovs_vport_stats {
>  /* Allow tc offload recirc sharing */
>  #define OVS_DP_F_TC_RECIRC_SHARING   (1 << 2)
>  
> +/* Allow per-cpu dispatch of upcalls */
> +#define OVS_DP_F_DISPATCH_UPCALL_PER_CPU (1 << 3)
> +
>  /* Fixed logical ports. */
>  #define OVSP_LOCAL  ((__u32)0)
>  
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index ef15d9eb4774..f79679746c62 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -924,7 +924,11 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
>   break;
>  
>   case OVS_USERSPACE_ATTR_PID:
> - upcall.portid = nla_get_u32(a);
> + if (dp->user_features & OVS_DP_F_DISPATCH_UPCALL_PER_CPU)
> + upcall.portid =
> +ovs_dp_get_upcall_portid(dp, 
>
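The output_userspace() hunk above chooses the upcall PID differently depending on a negotiated feature bit. A minimal sketch of that decision, assuming a simplified, hypothetical struct toy_datapath rather than the kernel's struct datapath. The flag values are copied from the openvswitch.h hunk; wrapping an out-of-range CPU number with % is an assumption of this sketch, not something the quoted patch shows.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag values as defined in the include/uapi/linux/openvswitch.h hunk. */
#define OVS_DP_F_TC_RECIRC_SHARING       (1 << 2)
#define OVS_DP_F_DISPATCH_UPCALL_PER_CPU (1 << 3)

/* Hypothetical, simplified datapath; not the kernel's struct datapath. */
struct toy_datapath {
    uint32_t user_features;     /* OVS_DP_F_* bits negotiated with userspace. */
    const uint32_t *cpu_pids;   /* One Netlink PID per CPU. */
    size_t n_cpu_pids;
};

/* Returns the Netlink PID that should receive an upcall.  'action_pid' is
 * the PID carried by the userspace action; 'cpu' is the CPU running it. */
uint32_t
toy_upcall_portid(const struct toy_datapath *dp, uint32_t action_pid,
                  unsigned int cpu)
{
    if ((dp->user_features & OVS_DP_F_DISPATCH_UPCALL_PER_CPU)
        && dp->n_cpu_pids > 0) {
        /* Per-CPU mode: the socket depends on the CPU, not the vport. */
        return dp->cpu_pids[cpu % dp->n_cpu_pids];
    }
    /* Per-vport mode: keep the PID from the action. */
    return action_pid;
}
```

Because the PID depends only on the CPU in per-CPU mode, the number of sockets tracks the CPU count rather than the vport count, which is point (a) of the commit message. Other feature bits, such as OVS_DP_F_TC_RECIRC_SHARING, leave the per-vport behaviour unchanged.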

Re: [ovs-dev] [v6 01/11] dpif-netdev: Add command line and function pointer for miniflow extract

2021-07-08 Thread Flavio Leitner



Hi,

Eelco did an extensive review on this one already, so I will
try to not repeat the same.

See below.

On Tue, Jul 06, 2021 at 02:11:40PM +0100, Cian Ferriter wrote:
> From: Kumar Amber 
> 
> This patch introduces MFEX function pointers, which allow the user to
> switch between the different miniflow extract implementations that OVS
> provides, each optimized for a particular CPU ISA.
> 
> The user can query the miniflow extract variants available for that CPU
> with the following command:
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-get
> 
> Similarly, a user can set the miniflow implementation with the following
> command:
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-set name
> 
> This gives the user more performance and the flexibility to choose the
> miniflow implementation according to their needs.
> 
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
> 
> ---
> 
> v5:
> - fix review comments(Ian, Flavio, Eelco)
> - add enum to hold mfex indexes
> - add new get and set implementations
> - add Atomic set and get
> ---
> ---
>  NEWS  |   1 +
>  lib/automake.mk   |   2 +
>  lib/dpif-netdev-avx512.c  |  32 +-
>  lib/dpif-netdev-private-extract.c | 159 ++
>  lib/dpif-netdev-private-extract.h | 105 
>  lib/dpif-netdev-private-thread.h  |   8 ++
>  lib/dpif-netdev.c | 127 +++-
>  7 files changed, 429 insertions(+), 5 deletions(-)
>  create mode 100644 lib/dpif-netdev-private-extract.c
>  create mode 100644 lib/dpif-netdev-private-extract.h
> 
> diff --git a/NEWS b/NEWS
> index be96fc57f..60db823c4 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -22,6 +22,7 @@ Post-v2.15.0
>   * Enable the AVX512 DPCLS implementation to use VPOPCNT instruction if the
> CPU supports it. This enhances performance by using the native vpopcount
> instructions, instead of the emulated version of vpopcount.
> + * Add command line option to switch between mfex function pointers.
> - ovs-ctl:
>   * New option '--no-record-hostname' to disable hostname configuration
> in ovsdb on startup.
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 49f42c2a3..6657b9ae5 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -118,6 +118,8 @@ lib_libopenvswitch_la_SOURCES = \
>   lib/dpif-netdev-private-dpcls.h \
>   lib/dpif-netdev-private-dpif.c \
>   lib/dpif-netdev-private-dpif.h \
> + lib/dpif-netdev-private-extract.c \
> + lib/dpif-netdev-private-extract.h \
>   lib/dpif-netdev-private-flow.h \
>   lib/dpif-netdev-private-hwol.h \
>   lib/dpif-netdev-private-thread.h \
> diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
> index 9a5189145..91fad92db 100644
> --- a/lib/dpif-netdev-avx512.c
> +++ b/lib/dpif-netdev-avx512.c
> @@ -149,6 +149,16 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
>   * // do all processing (HWOL->MFEX->EMC->SMC)
>   * }
>   */
> +
> +/* Do a batch miniflow extract into keys. */
> +uint32_t mf_mask = 0;
> +miniflow_extract_func mfex_func;
> +atomic_read_relaxed(&pmd->miniflow_extract_opt, &mfex_func);
> +if (mfex_func) {
> +mf_mask = mfex_func(packets, keys, batch_size, in_port, pmd);
> +}
> +
> +/* Perform first packet iteration. */
>  uint32_t lookup_pkts_bitmask = (1ULL << batch_size) - 1;
>  uint32_t iter = lookup_pkts_bitmask;
>  while (iter) {
> @@ -167,6 +177,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
>  pkt_metadata_init(&packet->md, in_port);
>  
>  struct dp_netdev_flow *f = NULL;
> +struct netdev_flow_key *key = &keys[i];
> +
> +/* Check the miniflow mask to see if the packet was correctly
> + * classified by vector mfex else do a scalar miniflow extract
> + * for that packet.
> + */
> +uint32_t mfex_hit = (mf_mask & (1 << i));
>  
>  /* Check for a partial hardware offload match. */
>  if (hwol_enabled) {
> @@ -177,7 +194,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
>  }
>  if (f) {
>  rules[i] = &f->cr;
> -pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
> +/* If AVX512 MFEX already classified the packet, use it. */
> +if (mfex_hit) {
> +pkt_meta[i].tcp_flags = miniflow_get_tcp_flags(&key->mf);
> +} else {
> +pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
> +}
> +
>  pkt_meta[i].bytes = dp_packet_size(packet);
>  phwol_hits++;
>  hwol_emc_smc_hitmask |= (1 << i);
> @@ -185,9 +208,10 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
>  }
>  }
>  
> -/* Do miniflow extract
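The AVX512 hunk above runs one batch miniflow extract, records per-packet success in mf_mask, and falls back to parse_tcp_flags() for packets whose bit is clear. A self-contained sketch of that bitmask pattern, in which the extractor and parser are hypothetical toys (the "vector" path here simply succeeds on even packet ids) and the 32-packet limit is an assumption of the sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_BURST 32   /* One mask bit per packet in a uint32_t. */

/* Hypothetical batch extractor: handles only even packet ids and sets bit i
 * of the returned mask for every packet it handled. */
static uint32_t
toy_batch_extract(const int *pkts, uint16_t *key_flags, size_t n)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < n; i++) {
        if (pkts[i] % 2 == 0) {
            key_flags[i] = (uint16_t) pkts[i];
            mask |= UINT32_C(1) << i;
        }
    }
    return mask;
}

/* Hypothetical scalar parser used as the fallback path. */
static uint16_t
toy_parse_tcp_flags(int pkt)
{
    return (uint16_t) (pkt + 1000);
}

/* Fills flags[i] for every packet, preferring the batch result.  Returns how
 * many packets needed the scalar fallback. */
size_t
toy_process_batch(const int *pkts, uint16_t *flags, size_t n)
{
    uint16_t key_flags[TOY_MAX_BURST] = {0};
    size_t fallbacks = 0;

    assert(n <= TOY_MAX_BURST);
    uint32_t mf_mask = toy_batch_extract(pkts, key_flags, n);

    for (size_t i = 0; i < n; i++) {
        /* Unsigned shift: 1 << 31 on a signed 32-bit int is undefined. */
        if (mf_mask & (UINT32_C(1) << i)) {
            flags[i] = key_flags[i];
        } else {
            flags[i] = toy_parse_tcp_flags(pkts[i]);
            fallbacks++;
        }
    }
    return fallbacks;
}
```

The quoted hunk tests (1 << i) on a plain int; if a batch can hold 32 packets, 1 << 31 overflows a signed 32-bit int, so an unsigned constant is the safer spelling.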

Re: [ovs-dev] [PATCH v2] netdev-linux: Ignore TSO packets when TSO is not enabled for userspace

2021-07-08 Thread Flavio Leitner



Hi Ian,

This one affects TSO, which I think you are interested in.

If you're okay with the patch, could you please merge it?

Thanks,
fbl

On Mon, Jul 05, 2021 at 07:57:41AM -0400, Eelco Chaudron wrote:
> When TSO is disabled from a userspace forwarding datapath perspective,
> but TSO has been wrongly enabled on the kernel side, log a warning
> message, and drop the packet. With the current implementation,
> OVS will crash.
> 
> Fixes: 73858f9db ("netdev-linux: Prepend the std packet in the TSO packet")
> Signed-off-by: Eelco Chaudron 
> ---
> v2: Fixed rx->aux_bufs[i] to allow reuse
> 
>  lib/netdev-linux.c |   20 +---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index 07ece0c7f..d5e693464 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -1292,14 +1292,28 @@ netdev_linux_batch_rxq_recv_sock(struct netdev_rxq_linux *rx, int mtu,
>  for (i = 0; i < retval; i++) {
>  struct dp_packet *pkt;
>  
> -if (mmsgs[i].msg_len < ETH_HEADER_LEN) {
> +if (mmsgs[i].msg_hdr.msg_flags & MSG_TRUNC
> +|| mmsgs[i].msg_len < ETH_HEADER_LEN) {
>  struct netdev *netdev_ = netdev_rxq_get_netdev(&rx->up);
>  struct netdev_linux *netdev = netdev_linux_cast(netdev_);
>  
> +/* The rx->aux_bufs[i] will be re-used next time. */
>  dp_packet_delete(buffers[i]);
>  netdev->rx_dropped += 1;
> -VLOG_WARN_RL(&rl, "%s: Dropped packet: less than ether hdr size",
> - netdev_get_name(netdev_));
> +if (mmsgs[i].msg_hdr.msg_flags & MSG_TRUNC) {
> +/* Data is truncated, so the packet is corrupted, and needs
> + * to be dropped. This can happen if TSO/GRO is enabled in
> + * the kernel, but not in userspace, i.e. there is no dp
> + * buffer to store the full packet. */
> +VLOG_WARN_RL(&rl,
> + "%s: Dropped packet: Too big. GRO/TSO enabled?",
> + netdev_get_name(netdev_));
> +} else {
> +VLOG_WARN_RL(&rl,
> + "%s: Dropped packet: less than ether hdr size",
> + netdev_get_name(netdev_));
> +}
> +
>  continue;
>  }
>  
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

-- 
fbl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
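The fix above relies on the kernel setting MSG_TRUNC in msg_flags when a datagram is larger than the receive buffer. A small, self-contained POSIX/Linux demonstration, using recvmsg() on an AF_UNIX datagram socket pair instead of the recvmmsg() batch used by netdev-linux.c (each struct mmsghdr carries the same msg_hdr.msg_flags field). datagram_was_truncated() is a hypothetical helper, not OVS code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Sends 'sent' bytes over a datagram socket pair and receives them into a
 * 'bufsize'-byte buffer.  Returns true if the kernel reported MSG_TRUNC,
 * false otherwise (including on error). */
bool
datagram_was_truncated(size_t sent, size_t bufsize)
{
    char payload[2048];
    char buf[2048];
    bool truncated = false;
    int sv[2];

    if (sent > sizeof payload || bufsize > sizeof buf
        || socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0) {
        return false;
    }
    memset(payload, 'x', sent);
    if (send(sv[0], payload, sent, 0) == (ssize_t) sent) {
        struct iovec iov = { .iov_base = buf, .iov_len = bufsize };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
        ssize_t n = recvmsg(sv[1], &msg, 0);

        /* The excess bytes are discarded; only msg_flags tells us so. */
        truncated = n >= 0 && (msg.msg_flags & MSG_TRUNC);
    }
    close(sv[0]);
    close(sv[1]);
    return truncated;
}
```

Without checking the flag, the caller only sees a completely filled buffer and cannot tell that the rest of the packet was thrown away, which is the GRO/TSO case the new warning describes.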

Re: [ovs-dev] [PATCH] bridge: Use correct (legacy) role names in database.

2021-07-07 Thread Flavio Leitner

On Wed, Jul 07, 2021 at 12:58:28PM -0700, Ben Pfaff wrote:
> On Wed, Jul 07, 2021 at 04:37:11PM -0300, Flavio Leitner wrote:
> > 
> > Hi,
> > 
> > On Tue, Jul 06, 2021 at 03:37:09PM -0700, Ben Pfaff wrote:
> > > The vswitchd database schema requires role names to be "master" or
> > > "slave", but this code tried to use "primary" and "secondary".
> > 
> > We have defined the constraints in the schema and we can't change
> > the schema because it would require external applications to do
> > the same change and would affect upgrades, correct?
> 
> Yes.

Thanks!
-- 
fbl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload

2021-07-07 Thread Flavio Leitner



Hi,

On Wed, Jul 07, 2021 at 04:53:14PM +0200, Ilya Maximets wrote:
> On 7/6/21 3:34 PM, Van Haaren, Harry wrote:
> >> -Original Message-
> >> From: Ilya Maximets 
> >> Sent: Thursday, July 1, 2021 11:32 AM
> >> To: Van Haaren, Harry ; Ilya Maximets
> >> Cc: Eli Britstein ; ovs dev ; Ivan Malov ; Majd Dibbiny ; Stokes, Ian ;
> >> Ferriter, Cian ; Ben Pfaff ; Balazs Nemeth ; Sriharsha Basavapatna
> >> Subject: Re: [ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload
> >>
> >> On 6/29/21 1:53 PM, Van Haaren, Harry wrote:
>  -Original Message-
>  From: Ilya Maximets 
>  Sent: Monday, June 28, 2021 3:33 PM
>  To: Van Haaren, Harry ; Ilya Maximets ; Sriharsha Basavapatna
>  Cc: Eli Britstein ; ovs dev ; Ivan Malov ; Majd Dibbiny ; Stokes, Ian ;
>  Ferriter, Cian ; Ben Pfaff ; Balazs Nemeth
>  Subject: Re: [ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload
> 
>  On 6/25/21 7:28 PM, Van Haaren, Harry wrote:
> >> -Original Message-
> >> From: dev  On Behalf Of Ilya Maximets
> >> Sent: Friday, June 25, 2021 4:26 PM
> >> To: Sriharsha Basavapatna ; Ilya Maximets
> >> Cc: Eli Britstein ; ovs dev ; Ivan Malov ; Majd Dibbiny
> >> Subject: Re: [ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload
> >
> > 
> >
>  That looks good to me.  So, I guess, Harsha, we're waiting for
>  your review/tests here.
> >>>
> >>> Thanks Ilya and Eli, looks good to me; I've also tested it and it 
> >>> works fine.
> >>> -Harsha
> >>
> >> Thanks, everyone.  Applied to master.
> >
> > Hi Ilya and OVS Community,
> >
> > There are open questions around this patchset, why has it been merged?
> >
> > Earlier today, new concerns were raised by Cian around the negative
> > performance impact of these code changes:
> > - https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384445.html
> >
> > Both you (Ilya) and Eli responded, and I was following the conversation.
> > Various code changes were suggested, and some may seem like they might
> > work; Eli mentioned some solutions might not work due to the hardware:
> > I was processing both your comments and input, and planning a technical
> > reply later today.
> > - suggestions: https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384446.html
> > - concerns around hw: https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384464.html
> 
>  Concerns not really about the hardware, but the API itself
>  that should be clarified a little bit to avoid confusion and
>  avoid incorrect changes like the one I suggested.
>  But this is a small enhancement that could be done on top.
> 
> >
> > Keep in mind that there are open performance issues to be worked out that
> > have not been resolved at this point in the conversation.
> 
>  The performance issue that can be worked out will be worked out
>  in a separate patch, v1 of which has already been on the mailing
>  list for some time, so it didn't make sense to re-validate
>  the whole series again due to this one pretty obvious change.
> 
> > There is no agreement on solutions, nor an agreement to ignore the
> > performance degradation, or to try to resolve this degradation later.
> 
>  Particular part of the packet restoration call seems hard
>  to avoid in a long term (I don't see a good solution for that),
>  but the short term solution might be implemented on top.
>  The part with multiple reads of recirc_id and checking if
>  offloading is enabled has a fix already (that needs a v2, but
>  anyway).
> 
> >
> > That these patches have been merged is inappropriate:
> > 1) Not enough time given for responses (11 am concerns raised, 5 pm merged
> > without resolution? (Irish timezone))
> 
>  I responded with suggestions and arguments against solutions
>  suggested in the report, Eli responded with rejection of one
>  one of my suggestions.  And it seems clear (for me) that
>  there is no good solution for this part at the moment.
>  Part of the performance could be won back, but the rest
>  seems to be inevitable.  As a short-term solution we can
>  guard the netdev_hw_miss_packet_recover() with experimental
>  API ifdef, but it will strike back anyway in the future.
> 
> > 2) Open question not addressed/resolved, resulting in a 6% known negative
> > performance impact being merged.
> 
>  I don't think it went unaddressed.
> >>>
> >>> Was code merged that resulted in a known regression of 6%?  Yes. Facts are facts.
> >>> I don't care for arguing over

Re: [ovs-dev] [PATCH] bridge: Use correct (legacy) role names in database.

2021-07-07 Thread Flavio Leitner



Hi,

On Tue, Jul 06, 2021 at 03:37:09PM -0700, Ben Pfaff wrote:
> The vswitchd database schema requires role names to be "master" or
> "slave", but this code tried to use "primary" and "secondary".

We have defined the constraints in the schema and we can't change
the schema because it would require external applications to do
the same change and would affect upgrades, correct?

Thanks
fbl

> Signed-off-by: Ben Pfaff 
> Reported-at: https://github.com/openvswitch/ovs-issues/issues/218
> Fixes: 807152a4ddfb ("Use primary/secondary, not master/slave, as names for OpenFlow roles.")
> ---
>  vswitchd/bridge.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
> index 0432d2abf0af..cb7c5cb769da 100644
> --- a/vswitchd/bridge.c
> +++ b/vswitchd/bridge.c
> @@ -3019,9 +3019,9 @@ ofp12_controller_role_to_str(enum ofp12_controller_role role)
>  case OFPCR12_ROLE_EQUAL:
>  return "other";
>  case OFPCR12_ROLE_PRIMARY:
> -return "primary";
> +return "master";
>  case OFPCR12_ROLE_SECONDARY:
> -return "secondary";
> +return "slave";
>  case OFPCR12_ROLE_NOCHANGE:
>  default:
>  return NULL;
> -- 
> 2.31.1
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

-- 
fbl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [v14 06/11] dpif-netdev: Add command to get dpif implementations.

2021-07-07 Thread Flavio Leitner



Hello,

Please find my comments below.

On Thu, Jul 01, 2021 at 04:06:14PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren 
> 
> This commit adds a new command to retrieve the list of available
> DPIF implementations. This can be used to check what implementations
> of the DPIF are available in any given OVS binary. It also returns which
> implementations are in use by the OVS PMD threads.
> 
> Usage:
>  $ ovs-appctl dpif-netdev/dpif-impl-get
> 
> Signed-off-by: Harry van Haaren 
> Co-authored-by: Cian Ferriter 
> Signed-off-by: Cian Ferriter 
> 
> ---
> 
> v14:
> - Rename command to dpif-impl-get.
> - Hide more of the dpif impl details from lib/dpif-netdev.c. Pass a
>   dynamic_string to return the dpif-impl-get CMD output.
> - Add information about which DPIF impl is currently in use by each PMD
>   thread.
> 
> v13:
> - Add NEWS item about DPIF get and set commands here rather than in a
>   later commit.
> - Add documentation items about DPIF set commands here rather than in a
>   later commit.
> ---
>  Documentation/topics/dpdk/bridge.rst |  8 +++
>  NEWS |  1 +
>  lib/dpif-netdev-private-dpif.c   | 33 
>  lib/dpif-netdev-private-dpif.h   |  8 +++
>  lib/dpif-netdev-unixctl.man  |  3 +++
>  lib/dpif-netdev.c| 30 +
>  6 files changed, 83 insertions(+)
> 
> diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
> index 06d1f943c..2d0850836 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -226,6 +226,14 @@ stats associated with the datapath.
>  Just like with the SIMD DPCLS feature above, SIMD can be applied to the DPIF to
>  improve performance.
>  
> +OVS provides multiple implementations of the DPIF. The available
> +implementations can be listed with the following command ::
> +
> +$ ovs-appctl dpif-netdev/dpif-impl-get
> +Available DPIF implementations:
> +  dpif_scalar (pmds: none)
> +  dpif_avx512 (pmds: 1,2,6,7)
> +
>  By default, dpif_scalar is used. The DPIF implementation can be selected by
>  name ::
>  
> diff --git a/NEWS b/NEWS
> index e23506225..cf0987a24 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -13,6 +13,7 @@ Post-v2.15.0
>   * Refactor lib/dpif-netdev.c to multiple header files.
>   * Add avx512 implementation of dpif which can process non recirculated
> packets. It supports partial HWOL, EMC, SMC and DPCLS lookups.
> + * Add commands to get and set the dpif implementations.
> - ovs-ctl:
>   * New option '--no-record-hostname' to disable hostname configuration
> in ovsdb on startup.
> diff --git a/lib/dpif-netdev-private-dpif.c b/lib/dpif-netdev-private-dpif.c
> index da3511f51..4eaefb291 100644
> --- a/lib/dpif-netdev-private-dpif.c
> +++ b/lib/dpif-netdev-private-dpif.c
> @@ -92,6 +92,39 @@ dp_netdev_impl_set_default_by_name(const char *name)
>  
>  }
>  
> +uint32_t
> +dp_netdev_impl_get(struct ds *reply, struct dp_netdev_pmd_thread **pmd_list,
> +   size_t n)
> +{
> +/* Add all dpif functions to reply string. */
> +ds_put_cstr(reply, "Available DPIF implementations:\n");
> +
> +for (uint32_t i = 0; i < ARRAY_SIZE(dpif_impls); i++) {
> +ds_put_format(reply, "  %s (pmds: ", dpif_impls[i].name);
> +
> +for (size_t j = 0; j < n; j++) {
> +struct dp_netdev_pmd_thread *pmd = pmd_list[j];
> +if (pmd->core_id == NON_PMD_CORE_ID) {
> +continue;
> +}
> +
> +if (pmd->netdev_input_func == dpif_impls[i].input_func) {
> +ds_put_format(reply, "%u,", pmd->core_id);
> +}
> +}
> +
> +ds_chomp(reply, ',');
> +
> +if (ds_last(reply) == ' ') {
> +ds_put_cstr(reply, "none");
> +}
> +
> +ds_put_cstr(reply, ")\n");
> +}
> +
> +return ARRAY_SIZE(dpif_impls);
> +}
> +
>  /* This function checks all available DPIF implementations, and selects the
>   * returns the function pointer to the one requested by "name".
>   */
> diff --git a/lib/dpif-netdev-private-dpif.h b/lib/dpif-netdev-private-dpif.h
> index 0e58153f4..d2c2cbaf4 100644
> --- a/lib/dpif-netdev-private-dpif.h
> +++ b/lib/dpif-netdev-private-dpif.h
> @@ -22,6 +22,7 @@
>  /* Forward declarations to avoid including files. */
>  struct dp_netdev_pmd_thread;
>  struct dp_packet_batch;
> +struct ds;
>  
>  /* Typedef for DPIF functions.
>   * Returns whether all packets were processed successfully.
> @@ -48,6 +49,13 @@ struct dpif_netdev_impl_info_t {
>  const char *name;
>  };
>  
> +/* This function returns all available implementations to the caller. The
> + * quantity of implementations is returned by the int return value.
> + */
> +uint32_t
> +dp_netdev_impl_get(struct ds *reply, struct dp_netdev_pmd_thread **pmd_list,
> +   size_t n);
> +
>  /* This function
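dp_netdev_impl_get() above builds each line by appending "<core>," for every matching PMD, chomping the trailing comma, and writing "none" when nothing followed "(pmds: ". The same algorithm over a plain char buffer, as a sketch: struct toy_pmd, TOY_NON_PMD_CORE_ID and toy_impl_get() are hypothetical stand-ins for the OVS types and the struct ds helpers.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sentinel standing in for NON_PMD_CORE_ID. */
#define TOY_NON_PMD_CORE_ID 0xFFFFFFFFu

struct toy_pmd {
    unsigned int core_id;
    int impl;               /* Index of the implementation this PMD uses. */
};

/* Appends formatted text to the NUL-terminated string in 'out' (capacity
 * 'size'), silently truncating if it does not fit. */
static void
toy_append(char *out, size_t size, const char *fmt, ...)
{
    size_t len = strlen(out);
    va_list args;

    if (len + 1 >= size) {
        return;
    }
    va_start(args, fmt);
    vsnprintf(out + len, size - len, fmt, args);
    va_end(args);
}

/* One line per implementation, listing the PMD cores that use it. */
void
toy_impl_get(char *out, size_t size, const char *const *names,
             size_t n_impls, const struct toy_pmd *pmds, size_t n_pmds)
{
    out[0] = '\0';
    toy_append(out, size, "Available DPIF implementations:\n");
    for (size_t i = 0; i < n_impls; i++) {
        toy_append(out, size, "  %s (pmds: ", names[i]);
        for (size_t j = 0; j < n_pmds; j++) {
            if (pmds[j].core_id != TOY_NON_PMD_CORE_ID
                && pmds[j].impl == (int) i) {
                toy_append(out, size, "%u,", pmds[j].core_id);
            }
        }
        size_t len = strlen(out);
        if (len > 0 && out[len - 1] == ',') {
            out[--len] = '\0';                 /* Like ds_chomp(reply, ','). */
        }
        if (len > 0 && out[len - 1] == ' ') {
            toy_append(out, size, "none");     /* No PMD uses this impl. */
        }
        toy_append(out, size, ")\n");
    }
}
```

With the PMD list from the bridge.rst hunk (cores 1, 2, 6 and 7 on AVX512, plus a non-PMD thread that is skipped), the output matches the documented "dpif_scalar (pmds: none)" and "dpif_avx512 (pmds: 1,2,6,7)" lines.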

Re: [ovs-dev] [PATCH v5 3/3] dpif-netlink: Introduce per-cpu upcall dispatch

2021-07-07 Thread Flavio Leitner

On Wed, Jul 07, 2021 at 04:43:21AM -0400, Mark Gray wrote:
> The Open vSwitch kernel module uses the upcall mechanism to send
> packets from kernel space to user space when it misses in the kernel
> space flow table. The upcall sends packets via a Netlink socket.
> Currently, a Netlink socket is created for every vport. In this way,
> there is a 1:1 mapping between a vport and a Netlink socket.
> When a packet is received by a vport, if it needs to be sent to
> user space, it is sent via the corresponding Netlink socket.
> 
> This mechanism, with various iterations of the corresponding user
> space code, has seen some limitations and issues:
> 
> * On systems with a large number of vports, there is correspondingly
> a large number of Netlink sockets which can limit scaling.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
> * Packet reordering on upcalls.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
> * A thundering herd issue.
> (https://bugzilla.redhat.com/show_bug.cgi?id=183)
> 
> This patch introduces an alternative, feature-negotiated, upcall
> mode using a per-cpu dispatch rather than a per-vport dispatch.
> 
> In this mode, the Netlink socket to be used for the upcall is
> selected based on the CPU of the thread that is executing the upcall.
> In this way, it resolves the issues above as:
> 
> a) The number of Netlink sockets scales with the number of CPUs
> rather than the number of vports.
> b) Ordering per-flow is maintained as packets are distributed to
> CPUs based on mechanisms such as RSS and flows are distributed
> to a single user space thread.
> c) Packets from a flow can only wake up one user space thread.
> 
> Reported-at: https://bugzilla.redhat.com/1844576
> Signed-off-by: Mark Gray 
> ---

Acked-by: Flavio Leitner 

Thanks Mark!
fbl

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [v14 05/11] dpif-netdev: Add command to switch dpif implementation.

2021-07-07 Thread Flavio Leitner

Hi,

Please see my comments below.

On Thu, Jul 01, 2021 at 04:06:13PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren 
> 
> This commit adds a new command to allow the user to switch
> the active DPIF implementation at runtime. A probe function
> is executed before switching the DPIF implementation, to ensure
> the CPU is capable of running the ISA required. For example, the
> below code will switch to the AVX512 enabled DPIF assuming
> that the runtime CPU is capable of running AVX512 instructions:
> 
>  $ ovs-appctl dpif-netdev/dpif-impl-set dpif_avx512
> 
> A new configuration flag is added to allow selection of the
> default DPIF. This is useful for running the unit-tests against
> the available DPIF implementations, without modifying each unit test.
> 
> The design of the testing & validation for ISA optimized DPIF
> implementations is based around the work already upstream for DPCLS.
> Note however that a DPCLS lookup has no state or side-effects, allowing
> the auto-validator implementation to perform multiple lookups and
> provide consistent statistic counters.
> 
> The DPIF component does have state, so running two implementations in
> parallel and comparing output is not a valid testing method, as there
> are changes in DPIF statistic counters (side effects). As a result, the
> DPIF is tested directly against the unit-tests.
> 
> Signed-off-by: Harry van Haaren 
> Co-authored-by: Cian Ferriter 
> Signed-off-by: Cian Ferriter 
> 
> ---
> 
> v14:
> - Change command name to dpif-impl-set
> - Fix the order of includes to what is laid out in the coding-style.rst
> - Use bool not int to capture return value of dpdk_get_cpu_has_isa()
> - Use an enum to index DPIF impls array.
> - Hide more of the dpif impl details from lib/dpif-netdev.c.
> - Fix comment on *dp_netdev_input_func() typedef.
> - Rename dp_netdev_input_func func to input_func.
> - Remove the datapath or dp argument from the dpif-impl-set CMD.
> - Set the DPIF function pointer atomically.
> 
> v13:
> - Add Docs items about the switch DPIF command here rather than in
>   later commit.
> - Document operation in manpages as well as rST.
> - Minor code refactoring to address review comments.
> ---
>  Documentation/topics/dpdk/bridge.rst |  34 
>  acinclude.m4 |  15 
>  configure.ac |   1 +
>  lib/automake.mk  |   1 +
>  lib/dpif-netdev-avx512.c |  14 +++
>  lib/dpif-netdev-private-dpif.c   | 122 +++
>  lib/dpif-netdev-private-dpif.h   |  47 +++
>  lib/dpif-netdev-private-thread.h |  10 ---
>  lib/dpif-netdev-unixctl.man  |   3 +
>  lib/dpif-netdev.c|  74 ++--
>  10 files changed, 306 insertions(+), 15 deletions(-)
>  create mode 100644 lib/dpif-netdev-private-dpif.c
> 
> diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
> index 526d5c959..06d1f943c 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -214,3 +214,37 @@ implementation ::
>  
>  Compile OVS in debug mode to have `ovs_assert` statements error out if
>  there is a mis-match in the DPCLS lookup implementation.
> +
> +Datapath Interface Performance
> +--
> +
> +The datapath interface (DPIF) or dp_netdev_input() is responsible for taking
> +packets through the major components of the userspace datapath; such as
> +miniflow_extract, EMC, SMC and DPCLS lookups, and a lot of the performance
> +stats associated with the datapath.
> +
> +Just like with the SIMD DPCLS feature above, SIMD can be applied to the DPIF to
> +improve performance.
> +
> +By default, dpif_scalar is used. The DPIF implementation can be selected by
> +name ::
> +
> +$ ovs-appctl dpif-netdev/dpif-impl-set dpif_avx512
> +DPIF implementation set to dpif_avx512.
> +
> +$ ovs-appctl dpif-netdev/dpif-impl-set dpif_scalar
> +DPIF implementation set to dpif_scalar.
> +
> +Running Unit Tests with AVX512 DPIF
> +~~~
> +
> +Since the AVX512 DPIF is disabled by default, a compile time option is
> +available in order to test it with the OVS unit test suite. When building with
> +a CPU that supports AVX512, use the following configure option ::
> +
> +$ ./configure --enable-dpif-default-avx512
> +
> +The following line should be seen in the configure output when the above option
> +is used ::
> +
> +checking whether DPIF AVX512 is default implementation... yes
> diff --git a/acinclude.m4 b/acinclude.m4
> index 15a54d636..5fbcd9872 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -30,6 +30,21 @@ AC_DEFUN([OVS_CHECK_DPCLS_AUTOVALIDATOR], [
>fi
>  ])
>  
> +dnl Set OVS DPIF default implementation at configure time for running the unit
> +dnl tests on the whole codebase without modifying tests per DPIF impl
> +AC_DEFUN([OVS_CHECK_DPIF_AVX512_DEFAULT], [
> +
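The dpif-impl-set command described above looks an implementation up by name, runs its probe so an unsupported ISA is refused, and only then swaps the function pointer atomically. A minimal sketch of that flow, in which every name is hypothetical and the AVX512 probe is hard-wired to fail so the behaviour is deterministic; real code would query CPU features at runtime.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef int (*toy_input_func)(int pkt);

static int toy_scalar_input(int pkt) { return pkt; }
static int toy_avx512_input(int pkt) { return pkt * 2; }

/* Hypothetical probe: pretend the CPU lacks AVX512. */
static bool toy_probe_avx512(void) { return false; }

struct toy_impl {
    const char *name;
    toy_input_func func;
    bool (*probe)(void);    /* NULL means the implementation always works. */
};

static const struct toy_impl toy_impls[] = {
    { "dpif_scalar", toy_scalar_input, NULL },
    { "dpif_avx512", toy_avx512_input, toy_probe_avx512 },
};

/* The active implementation: read on the fast path, written by the CLI. */
static _Atomic(toy_input_func) toy_active = toy_scalar_input;

toy_input_func
toy_impl_get_active(void)
{
    return atomic_load_explicit(&toy_active, memory_order_relaxed);
}

/* Returns 0 on success, -1 for an unknown name, -2 if the probe fails.
 * On failure the active implementation is left unchanged. */
int
toy_impl_set_by_name(const char *name)
{
    for (size_t i = 0; i < sizeof toy_impls / sizeof toy_impls[0]; i++) {
        if (strcmp(name, toy_impls[i].name) != 0) {
            continue;
        }
        if (toy_impls[i].probe && !toy_impls[i].probe()) {
            return -2;
        }
        atomic_store_explicit(&toy_active, toy_impls[i].func,
                              memory_order_relaxed);
        return 0;
    }
    return -1;
}
```

Relaxed ordering is enough in this sketch because the pointer publishes no other data, only the address of a function that exists for the program's whole lifetime.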

Re: [ovs-dev] [v14 04/11] dpif-avx512: Add ISA implementation of dpif.

2021-07-07 Thread Flavio Leitner

On Thu, Jul 01, 2021 at 04:06:12PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren 
> 
> This commit adds the AVX512 implementation of DPIF functionality,
> specifically the dp_netdev_input_outer_avx512 function. This function
> only handles outer packets (no recirculations), and is optimized to use the
> AVX512 ISA for packet batching and other DPIF work.
> 
> Sparse is not able to handle the AVX512 intrinsics, causing compile
> time failures, so it is disabled for this file.
> 
> Signed-off-by: Harry van Haaren 
> Co-authored-by: Cian Ferriter 
> Signed-off-by: Cian Ferriter 
> Co-authored-by: Kumar Amber 
> Signed-off-by: Kumar Amber 
> 
> ---

Thanks for addressing all the previous comments.

It seems that if we had used an emc_hit along with the existing
smc_hit, the code would be easier to read, instead of adding/removing
bitmasks from an emc_smc variable.
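A minimal sketch of the separate-mask layout suggested above, assuming one bit per packet in a batch of at most 32. The struct and function names are hypothetical, not taken from the patch. With one mask per cache, no bits ever need to be cleared, and the set of packets still needing a DPCLS lookup is a single expression.

```c
#include <assert.h>
#include <stdint.h>

#define BATCH_MAX 32   /* Assumed batch size limit: one bit per packet. */

/* Hypothetical per-batch lookup results, one bitmask per cache. */
struct batch_hits {
    uint32_t emc_hit;   /* Packets found in the exact match cache. */
    uint32_t smc_hit;   /* Packets that missed the EMC but hit the SMC. */
};

/* Bitmask with the low 'n_pkts' bits set (avoids the undefined 1 << 32). */
static inline uint32_t
batch_mask(unsigned int n_pkts)
{
    assert(n_pkts <= BATCH_MAX);
    return n_pkts == BATCH_MAX ? UINT32_MAX : (UINT32_C(1) << n_pkts) - 1;
}

/* Packets that still need a DPCLS lookup: those that neither cache hit. */
static inline uint32_t
dpcls_lookup_mask(const struct batch_hits *hits, unsigned int n_pkts)
{
    return batch_mask(n_pkts) & ~(hits->emc_hit | hits->smc_hit);
}
```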

Acked-by: Flavio Leitner 


Re: [ovs-dev] [v14 03/11] dpif-netdev: Add function pointer for netdev input.

2021-07-07 Thread Flavio Leitner

On Thu, Jul 01, 2021 at 04:06:11PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren 
> 
> This commit adds a function pointer to the pmd thread data structure,
> giving the pmd thread flexibility in its dpif-input function choice.
> This allows choosing of the implementation based on ISA capabilities
> of the runtime CPU, leading to optimizations and higher performance.
> 
> Signed-off-by: Harry van Haaren 
> Co-authored-by: Cian Ferriter 
> Signed-off-by: Cian Ferriter 
> 
> ---

Acked-by: Flavio Leitner 
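The idea in the quoted commit message, a per-PMD function pointer chosen from the CPU's ISA, can be sketched as follows. The stub types and function names are hypothetical, and the CPU capability is passed in as a plain flag rather than probed, to keep the sketch portable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real structures live in the dpif-netdev
 * private headers. */
struct pkt_batch_stub {
    int count;
};

struct pmd_thread_stub;
typedef int (*netdev_input_func)(struct pmd_thread_stub *pmd,
                                 struct pkt_batch_stub *batch);

struct pmd_thread_stub {
    unsigned int core_id;
    netdev_input_func netdev_input;   /* Chosen once, called per batch. */
};

static int
input_generic(struct pmd_thread_stub *pmd, struct pkt_batch_stub *batch)
{
    (void) pmd;
    return batch->count;   /* Pretend every packet was processed. */
}

static int
input_avx512(struct pmd_thread_stub *pmd, struct pkt_batch_stub *batch)
{
    (void) pmd;
    return batch->count;
}

/* 'cpu_has_avx512' would come from a runtime CPU feature probe. */
static void
pmd_init_input(struct pmd_thread_stub *pmd, bool cpu_has_avx512)
{
    assert(pmd);
    pmd->netdev_input = cpu_has_avx512 ? input_avx512 : input_generic;
}
```

Because the pointer is invoked once per batch rather than once per packet, the cost of the indirect call is shared across the packets in that batch.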


Re: [ovs-dev] [v14 02/11] dpif-netdev: Split HWOL out to own header file.

2021-07-06 Thread Flavio Leitner



Hi,

After the refactoring and rebasing, this patch doesn't seem
necessary anymore. I don't see value in keeping it.
Can we drop it? What do you think?

fbl


On Thu, Jul 01, 2021 at 04:06:10PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren 
> 
> This commit moves the datapath lookup functions required for
> hardware offload to a separate file. This allows other DPIF
> implementations to access the lookup functions, encouraging
> code reuse.
> 
> Signed-off-by: Harry van Haaren 
> 
> ---
> 
> Cc: Gaetan Rivet 
> Cc: Sriharsha Basavapatna 
> 
> v14:
> - Fix spelling mistake in commit message.
> 
> v13:
> - Minor code refactor to address review comments.
> ---
>  lib/automake.mk|  1 +
>  lib/dpif-netdev-private-hwol.h | 63 ++
>  lib/dpif-netdev-private.h  |  1 +
>  lib/dpif-netdev.c  | 38 ++--
>  4 files changed, 67 insertions(+), 36 deletions(-)
>  create mode 100644 lib/dpif-netdev-private-hwol.h
> 
> diff --git a/lib/automake.mk b/lib/automake.mk
> index fdba3c6c0..3a33cdd5c 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -115,6 +115,7 @@ lib_libopenvswitch_la_SOURCES = \
>   lib/dpif-netdev-private-dfc.h \
>   lib/dpif-netdev-private-dpcls.h \
>   lib/dpif-netdev-private-flow.h \
> + lib/dpif-netdev-private-hwol.h \
>   lib/dpif-netdev-private-thread.h \
>   lib/dpif-netdev-private.h \
>   lib/dpif-netdev-perf.c \
> diff --git a/lib/dpif-netdev-private-hwol.h b/lib/dpif-netdev-private-hwol.h
> new file mode 100644
> index 0..b93297a74
> --- /dev/null
> +++ b/lib/dpif-netdev-private-hwol.h
> @@ -0,0 +1,63 @@
> +/*
> + * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2015 Nicira, Inc.
> + * Copyright (c) 2021 Intel Corporation.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#ifndef DPIF_NETDEV_PRIVATE_HWOL_H
> +#define DPIF_NETDEV_PRIVATE_HWOL_H 1
> +
> +#include "dpif-netdev-private-flow.h"
> +
> +#define MAX_FLOW_MARK   (UINT32_MAX - 1)
> +#define INVALID_FLOW_MARK   0
> +/* Zero flow mark is used to indicate the HW to remove the mark. A packet
> + * marked with zero mark is received in SW without a mark at all, so it
> + * cannot be used as a valid mark.
> + */
> +
> +struct megaflow_to_mark_data {
> +const struct cmap_node node;
> +ovs_u128 mega_ufid;
> +uint32_t mark;
> +};
> +
> +struct flow_mark {
> +struct cmap megaflow_to_mark;
> +struct cmap mark_to_flow;
> +struct id_pool *pool;
> +};
> +
> +/* allocated in dpif-netdev.c */
> +extern struct flow_mark flow_mark;
> +
> +static inline struct dp_netdev_flow *
> +mark_to_flow_find(const struct dp_netdev_pmd_thread *pmd,
> +  const uint32_t mark)
> +{
> +struct dp_netdev_flow *flow;
> +
> +CMAP_FOR_EACH_WITH_HASH (flow, mark_node, hash_int(mark, 0),
> + &flow_mark.mark_to_flow) {
> +if (flow->mark == mark && flow->pmd_id == pmd->core_id &&
> +flow->dead == false) {
> +return flow;
> +}
> +}
> +
> +return NULL;
> +}
> +
> +
> +#endif /* dpif-netdev-private-hwol.h */
> diff --git a/lib/dpif-netdev-private.h b/lib/dpif-netdev-private.h
> index d7b6fd7ec..62e3616c1 100644
> --- a/lib/dpif-netdev-private.h
> +++ b/lib/dpif-netdev-private.h
> @@ -30,5 +30,6 @@
>  #include "dpif-netdev-private-dpcls.h"
>  #include "dpif-netdev-private-dfc.h"
>  #include "dpif-netdev-private-thread.h"
> +#include "dpif-netdev-private-hwol.h"
>  
>  #endif /* netdev-private.h */
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 2e29980c5..b9b10c6bb 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -18,6 +18,7 @@
>  #include "dpif-netdev.h"
>  #include "dpif-netdev-private.h"
>  #include "dpif-netdev-private-dfc.h"
> +#include "dpif-netdev-private-hwol.h"
>  
>  #include 
>  #include 
> @@ -1983,26 +1984,8 @@ dp_netdev_pmd_find_dpcls(struct dp_netdev_pmd_thread *pmd,
>  return cls;
>  }
>  
> -#define MAX_FLOW_MARK   (UINT32_MAX - 1)
> -#define INVALID_FLOW_MARK   0
> -/* Zero flow mark is used to indicate the HW to remove the mark. A packet
> - * marked with zero mark is received in SW without a mark at all, so it
> - * cannot be used as a valid mark.
> - */
>  
> -struct megaflow_to_mark_data {
> -const struct cmap_node node;
> -ovs_u128 mega_ufid;
> -uint32_t mark;
> -};
> -
> -struct flow_mark {
> -struct cmap
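The lookup performed by mark_to_flow_find() in the quoted header can be modeled with a small sketch. It uses a plain chained hash table with a hypothetical fixed bucket count in place of the RCU-safe cmap, and its names are illustrative. It keeps the same filter: the mark must match, the flow must belong to the requesting PMD, and the flow must not be dead. Including the PMD id in the filter presumably lets flows on different PMD threads share one mark.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FLOW_MARK     (UINT32_MAX - 1)
#define INVALID_FLOW_MARK 0
#define N_BUCKETS         64   /* Hypothetical fixed size. */

struct flow_stub {
    uint32_t mark;
    unsigned int pmd_id;      /* Core id of the owning PMD thread. */
    bool dead;                /* Set when the flow is being removed. */
    struct flow_stub *next;   /* Next flow in the same bucket. */
};

struct mark_table {
    struct flow_stub *buckets[N_BUCKETS];
};

/* Zero means "no mark" to the hardware, so it never identifies a flow. */
static bool
mark_is_valid(uint32_t mark)
{
    return mark != INVALID_FLOW_MARK && mark <= MAX_FLOW_MARK;
}

static size_t
mark_bucket(uint32_t mark)
{
    return mark % N_BUCKETS;
}

static void
mark_table_insert(struct mark_table *t, struct flow_stub *flow)
{
    assert(mark_is_valid(flow->mark));
    size_t b = mark_bucket(flow->mark);
    flow->next = t->buckets[b];
    t->buckets[b] = flow;
}

/* Returns the live flow with 'mark' owned by 'pmd_id', or NULL. */
static struct flow_stub *
mark_table_find(const struct mark_table *t, uint32_t mark,
                unsigned int pmd_id)
{
    for (struct flow_stub *f = t->buckets[mark_bucket(mark)]; f; f = f->next) {
        if (f->mark == mark && f->pmd_id == pmd_id && !f->dead) {
            return f;
        }
    }
    return NULL;
}
```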

Re: [ovs-dev] [v14 01/11] dpif-netdev: Refactor to multiple header files.

2021-07-06 Thread Flavio Leitner

On Thu, Jul 01, 2021 at 04:06:09PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren 
> 
> Split the very large file dpif-netdev.c and the datastructures
> it contains into multiple header files. Each header file is
> responsible for the datastructures of that component.
> 
> This logical split allows better reuse and modularity of the code,
> and reduces the very large file dpif-netdev.c to be more manageable.
> 
> Due to dependencies between components, it is not possible to
> move component in smaller granularities than this patch.
> 
> To explain the dependencies better, eg:
> 
> DPCLS has no deps (from dpif-netdev.c file)
> FLOW depends on DPCLS (struct dpcls_rule)
> DFC depends on DPCLS (netdev_flow_key) and FLOW (netdev_flow_key)
> THREAD depends on DFC (struct dfc_cache)
> 
> DFC_PROC depends on THREAD (struct pmd_thread)
> 
> DPCLS lookup.h/c require only DPCLS
> DPCLS implementations require only dpif-netdev-lookup.h.
> - This change was made in 2.12 release with function pointers
> - This commit only refactors the name to "private-dpcls.h"
> 
> netdev_flow_key_equal_mf() is renamed to emc_flow_key_equal_mf().
> 
> Rename functions specific to dpcls from netdev_* namespace to the
> dpcls_* namespace, as they are only used by dpcls code.
> 
> 'inline' is added to the dp_netdev_flow_hash() when it is moved
> definition to fix a compiler error.
> 
> One valid checkpatch issue with the use of the
> EMC_FOR_EACH_POS_WITH_HASH() macro was fixed.
> 
> Signed-off-by: Harry van Haaren 
> Co-authored-by: Cian Ferriter 
> Signed-off-by: Cian Ferriter 
> 
> ---

Acked-by: Flavio Leitner 
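The dependency chain in the quoted commit message ("FLOW depends on DPCLS (struct dpcls_rule)") comes from embedding one component's struct by value inside another. That requires the complete type, so the headers must be layered. Below is a minimal sketch of that pattern with hypothetical stub types and a local macro written in the style of OVS's CONTAINER_OF().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local macro in the style of OVS's CONTAINER_OF(). */
#define CONTAINER_OF(ptr, type, member) \
    ((type *) (void *) ((char *) (ptr) - offsetof(type, member)))

/* "DPCLS" layer: knows nothing about flows. */
struct dpcls_rule_stub {
    uint32_t mask_bits;
};

/* "FLOW" layer: embeds the rule by value, so this definition needs the
 * complete dpcls type; a forward declaration would not be enough. */
struct flow_stub {
    uint32_t mark;
    struct dpcls_rule_stub cr;
};

/* A classifier lookup hands back the embedded rule; the flow is recovered
 * from the rule's address. */
static struct flow_stub *
flow_from_rule(struct dpcls_rule_stub *rule)
{
    return rule ? CONTAINER_OF(rule, struct flow_stub, cr) : NULL;
}
```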


Re: [ovs-dev] [v14 01/11] dpif-netdev: Refactor to multiple header files.

2021-07-06 Thread Flavio Leitner

On Tue, Jul 06, 2021 at 04:20:59PM -0300, Flavio Leitner wrote:
> 
> Hi,
> 
> I was reviewing the patch while testing and I can consistently
> lose 1Mpps (or more) on a P2P scenario with this flow table:
> ovs-ofctl add-flow br0 in_port=dpdk0,actions=output:dpdk1 
> 
> TX: 14Mpps
> RX without patch: +12.6Mpps
> RX with patch: 11.67Mpps

FYI: the performance is consistently recovered with patch 03.
fbl

> CPU: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
> 
> Perf diff:
> # Event 'cycles'
> #
> # Baseline  Delta Abs  Shared Object  Symbol
> # ........  .........  .............  ..............................................
> #
>      8.32%     -2.64%  libc-2.28.so   [.] __memcmp_avx2_movbe
>                +2.04%  ovs-vswitchd   [.] dp_netdev_pmd_flush_output_packets.part.41
>     14.60%     -1.78%  ovs-vswitchd   [.] mlx5_rx_burst_vec
>      2.78%     +1.78%  ovs-vswitchd   [.] non_atomic_ullong_add
>     23.95%     +1.60%  ovs-vswitchd   [.] miniflow_extract
>      2.02%     +1.54%  ovs-vswitchd   [.] netdev_dpdk_rxq_recv
>     14.77%     -0.82%  ovs-vswitchd   [.] dp_netdev_input__
>      5.46%     +0.79%  ovs-vswitchd   [.] mlx5_tx_burst_none_empw
>      2.79%     -0.77%  ovs-vswitchd   [.] dp_netdev_pmd_flush_output_on_port
>      3.70%     -0.58%  ovs-vswitchd   [.] dp_execute_output_action
>      3.34%     -0.58%  ovs-vswitchd   [.] netdev_send
>      2.92%     +0.41%  ovs-vswitchd   [.] dp_netdev_process_rxq_port
>      0.36%     +0.39%  ovs-vswitchd   [.] netdev_dpdk_vhost_rxq_recv
>      4.25%     +0.38%  ovs-vswitchd   [.] mlx5_tx_handle_completion.isra.49
>      0.82%     +0.30%  ovs-vswitchd   [.] pmd_perf_end_iteration
>      1.53%     -0.12%  ovs-vswitchd   [.] netdev_dpdk_filter_packet_len
>      0.53%     -0.11%  ovs-vswitchd   [.] netdev_is_flow_api_enabled
>      0.54%     -0.10%  [vdso]         [.] 0x09c0
>      1.72%     +0.10%  ovs-vswitchd   [.] netdev_rxq_recv
>      0.61%     +0.09%  ovs-vswitchd   [.] pmd_thread_main
>      0.08%     +0.07%  ovs-vswitchd   [.] userspace_tso_enabled
>      0.45%     -0.07%  ovs-vswitchd   [.] memcmp@plt
>      0.22%     -0.05%  ovs-vswitchd   [.] dp_execute_cb
> 
> 
> 
> On Thu, Jul 01, 2021 at 04:06:09PM +0100, Cian Ferriter wrote:
> > From: Harry van Haaren 
> > 
> > Split the very large file dpif-netdev.c and the datastructures
> > it contains into multiple header files. Each header file is
> > responsible for the datastructures of that component.
> > 
> > This logical split allows better reuse and modularity of the code,
> > and reduces the very large file dpif-netdev.c to be more manageable.
> > 
> > Due to dependencies between components, it is not possible to
> > move component in smaller granularities than this patch.
> > 
> > To explain the dependencies better, eg:
> > 
> > DPCLS has no deps (from dpif-netdev.c file)
> > FLOW depends on DPCLS (struct dpcls_rule)
> > DFC depends on DPCLS (netdev_flow_key) and FLOW (netdev_flow_key)
> > THREAD depends on DFC (struct dfc_cache)
> > 
> > DFC_PROC depends on THREAD (struct pmd_thread)
> > 
> > DPCLS lookup.h/c require only DPCLS
> > DPCLS implementations require only dpif-netdev-lookup.h.
> > - This change was made in 2.12 release with function pointers
> > - This commit only refactors the name to "private-dpcls.h"
> > 
> > netdev_flow_key_equal_mf() is renamed to emc_flow_key_equal_mf().
> > 
> > Rename functions specific to dpcls from netdev_* namespace to the
> > dpcls_* namespace, as they are only used by dpcls code.
> > 
> > 'inline' is added to the dp_netdev_flow_hash() when it is moved
> > definition to fix a compiler error.
> > 
> > One valid checkpatch issue with the use of the
> > EMC_FOR_EACH_POS_WITH_HASH() macro was fixed.
> > 
> > Signed-off-by: Harry van Haaren 
> > Co-authored-by: Cian Ferriter 
> > Signed-off-by: Cian Ferriter 
> > 
> > ---
> > 
> > Cc: Gaetan Rivet 
> > Cc: Sriharsha Basavapatna 
> > 
> > v14:
> > - Make some functions in lib/dpif-netdev-private-dfc.c private as they
> >   aren't used in other files.
> > - Fix the order of includes to what is laid out in the coding-style.rst
> > 
> > v13:
> > - Add NEWS item in this commit rather than later.
> > - Add lib/dpif-netdev-private-dfc.c file and move non fast path dfc
> >   related functions there.
> > - Squash commit which renames funct

Re: [ovs-dev] [v14 01/11] dpif-netdev: Refactor to multiple header files.

2021-07-06 Thread Flavio Leitner



Hi,

I was reviewing the patch while testing and I can consistently
lose 1Mpps (or more) on a P2P scenario with this flow table:
ovs-ofctl add-flow br0 in_port=dpdk0,actions=output:dpdk1 

TX: 14Mpps
RX without patch: +12.6Mpps
RX with patch: 11.67Mpps
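For scale, here is what the RX numbers above work out to. The cycles-per-packet figures rest on an assumption the report does not state: that a single PMD core at the nominal 2.20 GHz carried all of the traffic. Because the baseline is given as "+12.6Mpps", the loss is a lower bound.

```latex
\begin{aligned}
\Delta_{\mathrm{RX}} &\ge 12.6 - 11.67 = 0.93\ \text{Mpps} \approx 7.4\,\%\ \text{of}\ 12.6\ \text{Mpps},\\
\frac{2.2\times 10^{9}\ \text{cycles/s}}{12.6\times 10^{6}\ \text{pkt/s}} &\approx 174.6\ \text{cycles/pkt},
\qquad
\frac{2.2\times 10^{9}\ \text{cycles/s}}{11.67\times 10^{6}\ \text{pkt/s}} \approx 188.5\ \text{cycles/pkt}.
\end{aligned}
```

Under that assumption, the patch costs roughly 14 or more extra cycles per packet.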

CPU: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz

Perf diff:
# Event 'cycles'
#
# Baseline  Delta Abs  Shared Object  Symbol
# ........  .........  .............  ..............................................
#
     8.32%     -2.64%  libc-2.28.so   [.] __memcmp_avx2_movbe
               +2.04%  ovs-vswitchd   [.] dp_netdev_pmd_flush_output_packets.part.41
    14.60%     -1.78%  ovs-vswitchd   [.] mlx5_rx_burst_vec
     2.78%     +1.78%  ovs-vswitchd   [.] non_atomic_ullong_add
    23.95%     +1.60%  ovs-vswitchd   [.] miniflow_extract
     2.02%     +1.54%  ovs-vswitchd   [.] netdev_dpdk_rxq_recv
    14.77%     -0.82%  ovs-vswitchd   [.] dp_netdev_input__
     5.46%     +0.79%  ovs-vswitchd   [.] mlx5_tx_burst_none_empw
     2.79%     -0.77%  ovs-vswitchd   [.] dp_netdev_pmd_flush_output_on_port
     3.70%     -0.58%  ovs-vswitchd   [.] dp_execute_output_action
     3.34%     -0.58%  ovs-vswitchd   [.] netdev_send
     2.92%     +0.41%  ovs-vswitchd   [.] dp_netdev_process_rxq_port
     0.36%     +0.39%  ovs-vswitchd   [.] netdev_dpdk_vhost_rxq_recv
     4.25%     +0.38%  ovs-vswitchd   [.] mlx5_tx_handle_completion.isra.49
     0.82%     +0.30%  ovs-vswitchd   [.] pmd_perf_end_iteration
     1.53%     -0.12%  ovs-vswitchd   [.] netdev_dpdk_filter_packet_len
     0.53%     -0.11%  ovs-vswitchd   [.] netdev_is_flow_api_enabled
     0.54%     -0.10%  [vdso]         [.] 0x09c0
     1.72%     +0.10%  ovs-vswitchd   [.] netdev_rxq_recv
     0.61%     +0.09%  ovs-vswitchd   [.] pmd_thread_main
     0.08%     +0.07%  ovs-vswitchd   [.] userspace_tso_enabled
     0.45%     -0.07%  ovs-vswitchd   [.] memcmp@plt
     0.22%     -0.05%  ovs-vswitchd   [.] dp_execute_cb



On Thu, Jul 01, 2021 at 04:06:09PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren 
> 
> Split the very large file dpif-netdev.c and the datastructures
> it contains into multiple header files. Each header file is
> responsible for the datastructures of that component.
> 
> This logical split allows better reuse and modularity of the code,
> and reduces the very large file dpif-netdev.c to be more manageable.
> 
> Due to dependencies between components, it is not possible to
> move component in smaller granularities than this patch.
> 
> To explain the dependencies better, eg:
> 
> DPCLS has no deps (from dpif-netdev.c file)
> FLOW depends on DPCLS (struct dpcls_rule)
> DFC depends on DPCLS (netdev_flow_key) and FLOW (netdev_flow_key)
> THREAD depends on DFC (struct dfc_cache)
> 
> DFC_PROC depends on THREAD (struct pmd_thread)
> 
> DPCLS lookup.h/c require only DPCLS
> DPCLS implementations require only dpif-netdev-lookup.h.
> - This change was made in 2.12 release with function pointers
> - This commit only refactors the name to "private-dpcls.h"
> 
> netdev_flow_key_equal_mf() is renamed to emc_flow_key_equal_mf().
> 
> Rename functions specific to dpcls from netdev_* namespace to the
> dpcls_* namespace, as they are only used by dpcls code.
> 
> 'inline' is added to the dp_netdev_flow_hash() when it is moved
> definition to fix a compiler error.
> 
> One valid checkpatch issue with the use of the
> EMC_FOR_EACH_POS_WITH_HASH() macro was fixed.
> 
> Signed-off-by: Harry van Haaren 
> Co-authored-by: Cian Ferriter 
> Signed-off-by: Cian Ferriter 
> 
> ---
> 
> Cc: Gaetan Rivet 
> Cc: Sriharsha Basavapatna 
> 
> v14:
> - Make some functions in lib/dpif-netdev-private-dfc.c private as they
>   aren't used in other files.
> - Fix the order of includes to what is laid out in the coding-style.rst
> 
> v13:
> - Add NEWS item in this commit rather than later.
> - Add lib/dpif-netdev-private-dfc.c file and move non fast path dfc
>   related functions there.
> - Squash commit which renames functions specific to dpcls from netdev_*
>   namespace to the dpcls_* namespace, as they are only used by dpcls
>   code into this commit.
> - Minor fixes from review comments.
> ---
>  NEWS   |   1 +
>  lib/automake.mk|   5 +
>  lib/dpif-netdev-lookup-autovalidator.c |   1 -
>  lib/dpif-netdev-lookup-avx512-gather.c |   1 -
>  lib/dpif-netdev-lookup-generic.c   |   1 -
>  lib/dpif-netdev-lookup.h   |   2 +-
>  lib/dpif-netdev-private-dfc.c  | 110 +
>  lib/dpif-netdev-private-dfc.h  | 164 
>  lib/dpif-netdev-private-dpcls.h| 128 ++
>  lib/dpif-netdev-private-flow.h | 163 
>  lib/dpif-netdev-private-thread.h   | 206 ++
>  lib/dpif-netdev-private.h  | 100 +
>  lib/dpif-netdev.c

Re: [ovs-dev] [PATCH 1/1] match: do not print "igmp" match keyword

2021-07-06 Thread Flavio Leitner

On Tue, Jul 06, 2021 at 03:27:41PM +0200, Adrian Moreno wrote:
> 
> 
> On 7/6/21 2:50 PM, Flavio Leitner wrote:
> > On Tue, Jul 06, 2021 at 08:25:59AM +0200, Adrian Moreno wrote:
> >>
> >>
> >> On 7/5/21 4:15 PM, Flavio Leitner wrote:
> >>>
> >>> Hi,
> >>>
> >>> On Wed, Jun 30, 2021 at 05:43:54PM +0200, Adrian Moreno wrote:
> >>>> The match keyword "igmp" is not supported in ofp-parse, which means
> >>>> that flow dumps cannot be restored. This patch prints the igmp match
> >>>> in the accepted format (ip,nw_proto=2) and adds a test.
> >>>
> >>> I raised concerns in the past about changing the output and breaking
> >>> scripts.  However, it seems that not removing the keyword also causes
> >>> issues, so I no longer oppose removing the igmp keyword.
> >>>
> >>> Acked-by: Flavio Leitner 
> >>>
> >>
> >> Thanks Flavio. Do you think this is an acceptable solution also for stable 
> >> branches?
> > 
> > My concern is that changing the output can potentially break
> > somebody else's script and that is really bad in a stable
> > release update.
> > 
> > BTW, this is a user-visible change, so I'd say that the patch
> > needs to highlight that in the NEWS file too.
> > 
> OK. I'll send another update, thanks.
> 
> > 
> >> If not, how about replacing the flows in ovs-save so that upgrades of
> >> stable branches work fine?
> > 
> > You mean fixing ovs-save in master or in stable branches?
> > 
> My proposal was:
> - changing the output + advertise in NEWS in master branch (and future
>   releases)
> - add a workaround in ovs-save in stable branches to ensure they can be
>   upgraded without big datapath impact
> 
> WDYT?

Sounds like a good plan to me.
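The stable-branch workaround discussed above amounts to a textual rewrite of saved flows before they are restored. The real change would live in ovs-save, which is a shell script, so the C function below only illustrates the transformation. Its name and token rules (a keyword delimited by the start or end of the string, ',' or ' ') are assumptions, and it handles only the bare "igmp" keyword.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool
is_delim(char c)
{
    return c == '\0' || c == ',' || c == ' ';
}

/* Copies 'in' to 'out', replacing every standalone "igmp" token with
 * "ip,nw_proto=2".  Returns false if 'out' is too small. */
static bool
rewrite_igmp_keyword(const char *in, char *out, size_t out_size)
{
    static const char kw[] = "igmp";
    static const char repl[] = "ip,nw_proto=2";
    const size_t kw_len = sizeof kw - 1;
    const size_t repl_len = sizeof repl - 1;
    size_t o = 0;

    assert(out_size > 0);
    for (size_t i = 0; in[i] != '\0';) {
        bool token_start = i == 0 || in[i - 1] == ',' || in[i - 1] == ' ';

        /* strncmp() stops at the terminator, so in[i + kw_len] is only
         * read after all kw_len characters have matched. */
        if (token_start && !strncmp(in + i, kw, kw_len)
            && is_delim(in[i + kw_len])) {
            if (o + repl_len >= out_size) {
                return false;
            }
            memcpy(out + o, repl, repl_len);
            o += repl_len;
            i += kw_len;
        } else {
            if (o + 1 >= out_size) {
                return false;
            }
            out[o++] = in[i++];
        }
    }
    out[o] = '\0';
    return true;
}
```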

Thank you,
-- 
fbl

Re: [ovs-dev] [PATCH 1/1] match: do not print "igmp" match keyword

2021-07-06 Thread Flavio Leitner

On Tue, Jul 06, 2021 at 08:25:59AM +0200, Adrian Moreno wrote:
> 
> 
> On 7/5/21 4:15 PM, Flavio Leitner wrote:
> > 
> > Hi,
> > 
> > On Wed, Jun 30, 2021 at 05:43:54PM +0200, Adrian Moreno wrote:
> >> The match keyword "igmp" is not supported in ofp-parse, which means
> >> that flow dumps cannot be restored. This patch prints the igmp match
> >> in the accepted format (ip,nw_proto=2) and adds a test.
> > 
> > I raised concerns in the past about changing the output and breaking
> > scripts.  However, it seems that not removing the keyword also causes
> > issues, so I no longer oppose removing the igmp keyword.
> > 
> > Acked-by: Flavio Leitner 
> > 
> 
> Thanks Flavio. Do you think this is an acceptable solution also for stable 
> branches?

My concern is that changing the output can potentially break
somebody else's script and that is really bad in a stable
release update.

BTW, this is a user-visible change, so I'd say that the patch
needs to highlight that in the NEWS file too.


> If not, how about replacing the flows in ovs-save so that upgrades of stable
> branches work fine?

You mean fixing ovs-save in master or in stable branches?

-- 
fbl

Re: [ovs-dev] [PATCH v4 3/3] dpif-netlink: Introduce per-cpu upcall dispatch

2021-07-06 Thread Flavio Leitner



Hi Mark,

David had some comments about the NEWS file, and I found an issue
on Windows below.

On Tue, Jul 06, 2021 at 05:31:11AM -0400, Mark Gray wrote:
> The Open vSwitch kernel module uses the upcall mechanism to send
> packets from kernel space to user space when it misses in the kernel
> space flow table. The upcall sends packets via a Netlink socket.
> Currently, a Netlink socket is created for every vport. In this way,
> there is a 1:1 mapping between a vport and a Netlink socket.
> When a packet is received by a vport, if it needs to be sent to
> user space, it is sent via the corresponding Netlink socket.
> 
> This mechanism, with various iterations of the corresponding user
> space code, has seen some limitations and issues:
> 
> * On systems with a large number of vports, there is correspondingly
> a large number of Netlink sockets which can limit scaling.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
> * Packet reordering on upcalls.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
> * A thundering herd issue.
> (https://bugzilla.redhat.com/show_bug.cgi?id=183)
> 
> This patch introduces an alternative, feature-negotiated, upcall
> mode using a per-cpu dispatch rather than a per-vport dispatch.
> 
> In this mode, the Netlink socket to be used for the upcall is
> selected based on the CPU of the thread that is executing the upcall.
> In this way, it resolves the issues above as:
> 
> a) The number of Netlink sockets scales with the number of CPUs
> rather than the number of vports.
> b) Ordering per-flow is maintained as packets are distributed to
> CPUs based on mechanisms such as RSS and flows are distributed
> to a single user space thread.
> c) Packets from a flow can only wake up one user space thread.
> 
> Reported-at: https://bugzilla.redhat.com/1844576
> Signed-off-by: Mark Gray 
> ---
> 
> Notes:
> v1 - Reworked based on Flavio's comments:
>  * change DISPATCH_MODE_PER_CPU() to inline function
>  * add `ovs-appctl` command to check dispatch mode for datapaths
>  * fixed issue with userspace actions (tested using `ovs-ofctl monitor br0 65534 -P nxt_packet_in`)
>  * update documentation as requested
> v2 - Reworked based on Flavio's comments:
>  * Used dpif_netlink_upcall_per_cpu() for check in dpif_netlink_set_handler_pids()
>  * Added macro for (ignored) Netlink PID
>  * Fixed indentation issue
>  * Added NEWS entry
>  * Added section to ovs-vswitchd.8 man page
> v4 - Reworked based on Flavio's comments:
>  * Cleaned up log message when dispatch mode is set
> 
>  NEWS  |   7 +-
>  .../linux/compat/include/linux/openvswitch.h  |   7 +
>  lib/automake.mk   |   1 +
>  lib/dpif-netdev.c |   1 +
>  lib/dpif-netlink-unixctl.man  |   6 +
>  lib/dpif-netlink.c| 464 --
>  lib/dpif-provider.h   |  32 +-
>  lib/dpif.c|  17 +
>  lib/dpif.h|   1 +
>  ofproto/ofproto-dpif-upcall.c |  51 +-
>  ofproto/ofproto.c |  12 -
>  vswitchd/ovs-vswitchd.8.in|   1 +
>  vswitchd/vswitch.xml  |  23 +-
>  13 files changed, 526 insertions(+), 97 deletions(-)
>  create mode 100644 lib/dpif-netlink-unixctl.man
> 
> diff --git a/NEWS b/NEWS
> index a2a2dcf95d7d..80b13e358685 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -29,7 +29,12 @@ Post-v2.15.0
> - ovsdb-tool:
>   * New option '--election-timer' to the 'create-cluster' command to set the
> leader election timer during cluster creation.
> -
> +   - Per-cpu upcall dispatching:
> + * ovs-vswitchd will configure the kernel module using per-cpu dispatch
> +   mode (if available). This changes the way upcalls are delivered to user
> +   space in order to resolve a number of issues with per-vport dispatch.
> +   The new debug appctl command `dpif-netlink/dispatch-mode`
> +   will return the current dispatch mode for each datapath.
>  
>  v2.15.0 - 15 Feb 2021
>  -
> diff --git a/datapath/linux/compat/include/linux/openvswitch.h b/datapath/linux/compat/include/linux/openvswitch.h
> index 875de20250ce..f29265df055e 100644
> --- a/datapath/linux/compat/include/linux/openvswitch.h
> +++ b/datapath/linux/compat/include/linux/openvswitch.h
> @@ -89,6 +89,8 @@ enum ovs_datapath_cmd {
>   * set on the datapath port (for OVS_ACTION_ATTR_MISS).  Only valid on
>   * %OVS_DP_CMD_NEW requests. A value of zero indicates that upcalls should
>   * not be sent.
> + * OVS_DP_ATTR_PER_CPU_PIDS: Per-cpu array of PIDs for upcalls when
> + * OVS_DP_F_DISPATCH_UPCALL_PER_CPU feature is set.
>   * @OVS_DP_ATTR_STATS: Statistics about packets that have passed through the
>   * datapath.

Re: [ovs-dev] [PATCH v3 3/3] dpif-netlink: Introduce per-cpu upcall dispatch

2021-07-05 Thread Flavio Leitner



Hi Mark,

On Mon, Jul 05, 2021 at 09:38:37AM -0400, Mark Gray wrote:
> The Open vSwitch kernel module uses the upcall mechanism to send
> packets from kernel space to user space when it misses in the kernel
> space flow table. The upcall sends packets via a Netlink socket.
> Currently, a Netlink socket is created for every vport. In this way,
> there is a 1:1 mapping between a vport and a Netlink socket.
> When a packet is received by a vport, if it needs to be sent to
> user space, it is sent via the corresponding Netlink socket.
> 
> This mechanism, with various iterations of the corresponding user
> space code, has seen some limitations and issues:
> 
> * On systems with a large number of vports, there is correspondingly
> a large number of Netlink sockets which can limit scaling.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
> * Packet reordering on upcalls.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
> * A thundering herd issue.
> (https://bugzilla.redhat.com/show_bug.cgi?id=183)
> 
> This patch introduces an alternative, feature-negotiated, upcall
> mode using a per-cpu dispatch rather than a per-vport dispatch.
> 
> In this mode, the Netlink socket to be used for the upcall is
> selected based on the CPU of the thread that is executing the upcall.
> In this way, it resolves the issues above as:
> 
> a) The number of Netlink sockets scales with the number of CPUs
> rather than the number of vports.
> b) Ordering per-flow is maintained as packets are distributed to
> CPUs based on mechanisms such as RSS and flows are distributed
> to a single user space thread.
> c) Packets from a flow can only wake up one user space thread.
> 
> Reported-at: https://bugzilla.redhat.com/1844576
> Signed-off-by: Mark Gray 
> ---
> 
> Notes:
> v1 - Reworked based on Flavio's comments:
>  * change DISPATCH_MODE_PER_CPU() to inline function
>  * add `ovs-appctl` command to check dispatch mode for datapaths
>  * fixed issue with userspace actions (tested using `ovs-ofctl monitor br0 65534 -P nxt_packet_in`)
>  * update documentation as requested
> v2 - Reworked based on Flavio's comments:
>  * Used dpif_netlink_upcall_per_cpu() for check in dpif_netlink_set_handler_pids()
>  * Added macro for (ignored) Netlink PID
>  * Fixed indentation issue
>  * Added NEWS entry
>  * Added section to ovs-vswitchd.8 man page
> 
>  NEWS  |   7 +-
>  .../linux/compat/include/linux/openvswitch.h  |   7 +
>  lib/automake.mk   |   1 +
>  lib/dpif-netdev.c |   1 +
>  lib/dpif-netlink-unixctl.man  |   6 +
>  lib/dpif-netlink.c| 460 --
>  lib/dpif-provider.h   |  32 +-
>  lib/dpif.c|  17 +
>  lib/dpif.h|   1 +
>  ofproto/ofproto-dpif-upcall.c |  51 +-
>  ofproto/ofproto.c |  12 -
>  vswitchd/ovs-vswitchd.8.in|   1 +
>  vswitchd/vswitch.xml  |  23 +-
>  13 files changed, 522 insertions(+), 97 deletions(-)
>  create mode 100644 lib/dpif-netlink-unixctl.man
> 
> diff --git a/NEWS b/NEWS
> index a2a2dcf95d7d..80b13e358685 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -29,7 +29,12 @@ Post-v2.15.0
> - ovsdb-tool:
>   * New option '--election-timer' to the 'create-cluster' command to set the
> leader election timer during cluster creation.
> -
> +   - Per-cpu upcall dispatching:
> + * ovs-vswitchd will configure the kernel module using per-cpu dispatch
> +   mode (if available). This changes the way upcalls are delivered to user
> +   space in order to resolve a number of issues with per-vport dispatch.
> +   The new debug appctl command `dpif-netlink/dispatch-mode`
> +   will return the current dispatch mode for each datapath.
>  
>  v2.15.0 - 15 Feb 2021
>  -
> diff --git a/datapath/linux/compat/include/linux/openvswitch.h b/datapath/linux/compat/include/linux/openvswitch.h
> index 875de20250ce..f29265df055e 100644
> --- a/datapath/linux/compat/include/linux/openvswitch.h
> +++ b/datapath/linux/compat/include/linux/openvswitch.h
> @@ -89,6 +89,8 @@ enum ovs_datapath_cmd {
>   * set on the datapath port (for OVS_ACTION_ATTR_MISS).  Only valid on
>   * %OVS_DP_CMD_NEW requests. A value of zero indicates that upcalls should
>   * not be sent.
> + * OVS_DP_ATTR_PER_CPU_PIDS: Per-cpu array of PIDs for upcalls when
> + * OVS_DP_F_DISPATCH_UPCALL_PER_CPU feature is set.
>   * @OVS_DP_ATTR_STATS: Statistics about packets that have passed through the
>   * datapath.  Always present in notifications.
>   * @OVS_DP_ATTR_MEGAFLOW_STATS: Statistics about mega flow masks usage for the
> @@ -105,6 +107,8 @@ enum ovs_datapath_attr {
>

Re: [ovs-dev] [PATCH net-next] openvswitch: Introduce per-cpu upcall dispatch

2021-07-05 Thread Flavio Leitner

On Wed, Jun 30, 2021 at 05:53:49AM -0400, Mark Gray wrote:
> The Open vSwitch kernel module uses the upcall mechanism to send
> packets from kernel space to user space when it misses in the kernel
> space flow table. The upcall sends packets via a Netlink socket.
> Currently, a Netlink socket is created for every vport. In this way,
> there is a 1:1 mapping between a vport and a Netlink socket.
> When a packet is received by a vport, if it needs to be sent to
> user space, it is sent via the corresponding Netlink socket.
> 
> This mechanism, with various iterations of the corresponding user
> space code, has seen some limitations and issues:
> 
> * On systems with a large number of vports, there is a correspondingly
> large number of Netlink sockets which can limit scaling.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
> * Packet reordering on upcalls.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
> * A thundering herd issue.
> (https://bugzilla.redhat.com/show_bug.cgi?id=183)
> 
> This patch introduces an alternative, feature-negotiated, upcall
> mode using a per-cpu dispatch rather than a per-vport dispatch.
> 
> In this mode, the Netlink socket to be used for the upcall is
> selected based on the CPU of the thread that is executing the upcall.
> In this way, it resolves the issues above as:
> 
> a) The number of Netlink sockets scales with the number of CPUs
> rather than the number of vports.
> b) Ordering per-flow is maintained as packets are distributed to
> CPUs based on mechanisms such as RSS and flows are distributed
> to a single user space thread.
> c) Packets from a flow can only wake up one user space thread.
> 
> The corresponding user space code can be found at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/382618.html
> 
> Bugzilla: https://bugzilla.redhat.com/1844576
> Signed-off-by: Mark Gray 
> ---

It looks good and works for me.
Acked-by: Flavio Leitner 

Thanks Mark!
fbl
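The per-CPU dispatch described in the quoted commit message selects the Netlink socket from the CPU that raises the upcall. Here is a minimal sketch of that selection, assuming a PID array like the one OVS_DP_ATTR_PER_CPU_PIDS carries. The struct, the fixed capacity, and the modulo fallback for fewer PIDs than CPUs are illustrative assumptions, not the kernel code. A given CPU always maps to the same socket, which is what keeps per-flow ordering when RSS pins a flow to one CPU.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PIDS 8   /* Hypothetical fixed capacity for the sketch. */

/* Hypothetical mirror of a per-CPU PID array: entry i is the Netlink
 * socket PID that receives upcalls raised on CPU i. */
struct upcall_pids {
    uint32_t n_ids;
    uint32_t ids[MAX_PIDS];
};

/* Picks the Netlink PID for an upcall raised on 'cpu'.  The modulo keeps
 * the index in range if fewer PIDs than CPUs were registered (an assumed
 * policy); 0 means no upcall is sent. */
static uint32_t
upcall_pid_for_cpu(const struct upcall_pids *p, unsigned int cpu)
{
    assert(p->n_ids <= MAX_PIDS);
    return p->n_ids ? p->ids[cpu % p->n_ids] : 0;
}
```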

Re: [ovs-dev] [PATCH v2] netdev-linux: Ignore TSO packets when TSO is not enabled for userspace

2021-07-05 Thread Flavio Leitner

On Mon, Jul 05, 2021 at 07:57:41AM -0400, Eelco Chaudron wrote:
> When TSO is disabled from a userspace forwarding datapath perspective,
> but TSO has been wrongly enabled on the kernel side, log a warning
> message, and drop the packet. With the current implementation,
> OVS will crash.
> 
> Fixes: 73858f9db ("netdev-linux: Prepend the std packet in the TSO packet")
> Signed-off-by: Eelco Chaudron 
> ---

Nice, thanks Eelco.

Acked-by: Flavio Leitner 
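The fix relies on the kernel reporting truncation through MSG_TRUNC in the message flags, which the patch checks as mmsgs[i].msg_hdr.msg_flags after recvmmsg(). The standalone sketch below shows the same check with a single recvmsg() on an AF_UNIX datagram socket pair, used here only as a convenient substitute for the packet socket. The helper name is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Receives one datagram into 'buf'.  Returns its length, or -1 on error or
 * if the datagram did not fit (MSG_TRUNC): the bytes that did arrive form
 * an incomplete packet and should be dropped, not processed. */
static ssize_t
recv_whole_datagram(int fd, void *buf, size_t size)
{
    struct iovec iov = { .iov_base = buf, .iov_len = size };
    struct msghdr msg;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    ssize_t n = recvmsg(fd, &msg, 0);
    if (n < 0 || (msg.msg_flags & MSG_TRUNC)) {
        return -1;
    }
    return n;
}
```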


Re: [ovs-dev] [PATCH 1/1] match: do not print "igmp" match keyword

2021-07-05 Thread Flavio Leitner

Hi,

On Wed, Jun 30, 2021 at 05:43:54PM +0200, Adrian Moreno wrote:
> The match keyword "igmp" is not supported in ofp-parse, which means
> that flow dumps cannot be restored. This patch prints the igmp match
> in the accepted format (ip,nw_proto=2) and adds a test.
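To illustrate the round-trip problem, here is a hypothetical C formatter that only emits prefixes the parser accepts. For IPv4 it uses the `tcp`, `udp` and `icmp` shorthands and falls back to `ip,nw_proto=N` otherwise, so IGMP (protocol 2) comes out as `ip,nw_proto=2`. The function names and the chosen set of shorthands are assumptions for the sketch, not OVS's match-formatting code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_ETH_TYPE_IP 0x0800   /* IPv4 EtherType */

/* Returns a shorthand that the parser accepts for IPv4 + 'nw_proto', or
 * NULL when there is none.  IGMP (protocol 2) deliberately has none. */
static const char *
ipv4_proto_shorthand(uint8_t nw_proto)
{
    switch (nw_proto) {
    case 1:  return "icmp";
    case 6:  return "tcp";
    case 17: return "udp";
    default: return NULL;
    }
}

/* Writes the L3/L4 part of a match into 'buf' in a form that can be parsed
 * back.  Returns what snprintf() returns. */
static int
format_l3_prefix(char *buf, size_t size, uint16_t dl_type, uint8_t nw_proto)
{
    const char *shorthand;

    if (dl_type != SKETCH_ETH_TYPE_IP) {
        return snprintf(buf, size, "dl_type=0x%04x", (unsigned int) dl_type);
    }
    shorthand = ipv4_proto_shorthand(nw_proto);
    if (shorthand) {
        return snprintf(buf, size, "%s", shorthand);
    }
    return snprintf(buf, size, "ip,nw_proto=%u", (unsigned int) nw_proto);
}
```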

I raised concerns in the past about changing the output and breaking
scripts. However, it seems that keeping the keyword also causes
issues, so I no longer oppose removing the igmp keyword.

Acked-by: Flavio Leitner 

Thanks,
fbl

Re: [ovs-dev] [PATCH] netdev-linux: Ignore TSO packets when TSO is not enabled for userspace

2021-07-03 Thread Flavio Leitner



Hi,

Thanks for fixing this bug!

On Fri, Jul 02, 2021 at 10:22:36AM -0400, Eelco Chaudron wrote:
> When TSO is disabled from a userspace forwarding datapath perspective,
> but TSO has been wrongly enabled on the kernel side, log a warning
> message, and drop the packet. With the current implementation,
> OVS will crash.
> 
> Fixes: 73858f9db ("netdev-linux: Prepend the std packet in the TSO packet")
> Signed-off-by: Eelco Chaudron 
> ---
>  lib/netdev-linux.c |   17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index 07ece0c7f..6dc98cae5 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -1303,6 +1303,23 @@ netdev_linux_batch_rxq_recv_sock(struct netdev_rxq_linux *rx, int mtu,
>  continue;
>  }
>  
> +if (mmsgs[i].msg_hdr.msg_flags & MSG_TRUNC) {
> +/* Data is truncated, so the packet is corrupted, and needs to be
> + * dropped. This can happen if TSO/GRO is enabled in the kernel,
> + * but not in userspace, i.e. there is no dp buffer to store the
> + * full packet. */
> +struct netdev *netdev_ = netdev_rxq_get_netdev(&rx->up);
> +struct netdev_linux *netdev = netdev_linux_cast(netdev_);
> +
> +dp_packet_delete(buffers[i]);
> +dp_packet_delete(rx->aux_bufs[i]);

It needs to set rx->aux_bufs[i] to NULL so that the caller can
allocate a new packet; otherwise the freed pointer will be re-used
next time.

Another option is to not free aux_bufs[i] at all and let it be
re-used next time (the same happens in the block above when
mmsgs[i].msg_len < ETH_HEADER_LEN).
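A minimal sketch of the first option, assuming a refill step that allocates only empty slots, which is what the comment implies the caller does. `struct pkt`, `drop_aux_buf()` and `refill_aux_bufs()` are hypothetical stand-ins for `dp_packet` and the rxq code, using plain malloc()/free().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dp_packet sized for the aux buffer. */
struct pkt {
    unsigned char data[2048];
};

/* Drops the aux buffer in slot 'i' after a truncated read.  Clearing the
 * slot is what lets the refill step allocate a fresh buffer instead of
 * handing out freed memory on the next receive. */
static void
drop_aux_buf(struct pkt *aux_bufs[], int i)
{
    free(aux_bufs[i]);
    aux_bufs[i] = NULL;
}

/* Allocates a buffer for every empty slot among the first 'n'; returns how
 * many were allocated.  Slots that still hold a buffer are reused as-is. */
static int
refill_aux_bufs(struct pkt *aux_bufs[], int n)
{
    int allocated = 0;

    for (int i = 0; i < n; i++) {
        if (!aux_bufs[i]) {
            aux_bufs[i] = malloc(sizeof *aux_bufs[i]);
            allocated += aux_bufs[i] != NULL;
        }
    }
    return allocated;
}
```

The other option (not freeing the buffer) corresponds to skipping drop_aux_buf() for that slot; refill_aux_bufs() then leaves it in place for reuse.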

fbl

> +netdev->rx_dropped += 1;
> +VLOG_WARN_RL(&rl,
> + "%s: Dropped packet: Too big. GRO/TSO enabled?",
> + netdev_get_name(netdev_));
> +continue;
> +}
> +
>  if (mmsgs[i].msg_len > std_len) {
>  /* Build a single linear TSO packet by prepending the data from
>   * std_len buffer to the aux_buf. */
> 
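The MSG_TRUNC check relies on standard socket behavior. When a datagram is larger than the buffer passed to recvmsg()/recvmmsg(), the kernel discards the excess and sets MSG_TRUNC in msg_flags. A small C demonstration follows. It uses an AF_UNIX datagram socketpair only because that needs no network setup, and `datagram_truncated()` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: sends 'sent_len' bytes as one datagram and reads it
 * back into a 'buf_len'-byte buffer.  Returns 1 if the kernel reported
 * MSG_TRUNC, 0 if the datagram fit, -1 on error. */
static int
datagram_truncated(size_t sent_len, size_t buf_len)
{
    char out[256], in[256];
    struct iovec iov;
    struct msghdr msg;
    int sv[2];
    int ret = -1;

    if (sent_len > sizeof out || buf_len > sizeof in) {
        return -1;
    }
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0) {
        return -1;
    }
    memset(out, 'x', sent_len);
    if (send(sv[0], out, sent_len, 0) == (ssize_t) sent_len) {
        iov.iov_base = in;
        iov.iov_len = buf_len;
        memset(&msg, 0, sizeof msg);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        if (recvmsg(sv[1], &msg, 0) >= 0) {
            /* Excess bytes are discarded; only the flag tells us. */
            ret = (msg.msg_flags & MSG_TRUNC) ? 1 : 0;
        }
    }
    close(sv[0]);
    close(sv[1]);
    return ret;
}
```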

Re: [ovs-dev] [PATCH 3/3] dpif-netlink: Introduce per-cpu upcall dispatch

2021-07-01 Thread Flavio Leitner



Hi Mark,

One more thing: this seems like a relevant change to mention in the
NEWS file.

Thanks,
fbl

On Wed, Jun 30, 2021 at 05:56:11AM -0400, Mark Gray wrote:
> The Open vSwitch kernel module uses the upcall mechanism to send
> packets from kernel space to user space when it misses in the kernel
> space flow table. The upcall sends packets via a Netlink socket.
> Currently, a Netlink socket is created for every vport. In this way,
> there is a 1:1 mapping between a vport and a Netlink socket.
> When a packet is received by a vport, if it needs to be sent to
> user space, it is sent via the corresponding Netlink socket.
> 
> This mechanism, with various iterations of the corresponding user
> space code, has seen some limitations and issues:
> 
> * On systems with a large number of vports, there is correspondingly
> a large number of Netlink sockets which can limit scaling.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
> * Packet reordering on upcalls.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
> * A thundering herd issue.
> (https://bugzilla.redhat.com/show_bug.cgi?id=183)
> 
> This patch introduces an alternative, feature-negotiated, upcall
> mode using a per-cpu dispatch rather than a per-vport dispatch.
> 
> In this mode, the Netlink socket to be used for the upcall is
> selected based on the CPU of the thread that is executing the upcall.
> In this way, it resolves the issues above as:
> 
> a) The number of Netlink sockets scales with the number of CPUs
> rather than the number of vports.
> b) Ordering per-flow is maintained as packets are distributed to
> CPUs based on mechanisms such as RSS and flows are distributed
> to a single user space thread.
> c) Packets from a flow can only wake up one user space thread.
> 
> Reported-at: https://bugzilla.redhat.com/1844576
> Signed-off-by: Mark Gray 
> ---
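Because the mode is feature-negotiated, userspace has to fall back to per-vport dispatch when the kernel does not support the new flag. Below is a hypothetical sketch of that decision. It assumes userspace compares the features it requested with the features the kernel reports back. Only the flag value is taken from the patch's openvswitch.h hunk; `negotiate_dispatch_mode()` is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Flag value as quoted in the patch's openvswitch.h hunk. */
#define OVS_DP_F_DISPATCH_UPCALL_PER_CPU (1 << 3)

enum dispatch_mode {
    DISPATCH_PER_VPORT,   /* one upcall socket per vport (legacy) */
    DISPATCH_PER_CPU,     /* one upcall socket per CPU */
};

/* Hypothetical: 'requested' holds the features userspace asked for and
 * 'granted' the features the kernel reported back.  Per-CPU dispatch is
 * used only when both sides agree on the flag. */
static enum dispatch_mode
negotiate_dispatch_mode(uint32_t requested, uint32_t granted)
{
    if (requested & granted & OVS_DP_F_DISPATCH_UPCALL_PER_CPU) {
        return DISPATCH_PER_CPU;
    }
    return DISPATCH_PER_VPORT;
}
```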
> 
> Notes:
> v1 - Reworked based on Flavio's comments:
>  * change DISPATCH_MODE_PER_CPU() to inline function
>  * add `ovs-appctl` command to check dispatch mode for datapaths
>  * fixed issue with userspace actions (tested using `ovs-ofctl monitor br0 65534 -P nxt_packet_in`)
>  * update documentation as requested
> 
>  .../linux/compat/include/linux/openvswitch.h  |   7 +
>  lib/dpif-netdev.c |   1 +
>  lib/dpif-netlink.c| 456 --
>  lib/dpif-provider.h   |  32 +-
>  lib/dpif.c|  17 +
>  lib/dpif.h|   1 +
>  ofproto/ofproto-dpif-upcall.c |  51 +-
>  ofproto/ofproto.c |  12 -
>  vswitchd/vswitch.xml  |  23 +-
>  9 files changed, 504 insertions(+), 96 deletions(-)
> 
> diff --git a/datapath/linux/compat/include/linux/openvswitch.h b/datapath/linux/compat/include/linux/openvswitch.h
> index 875de20250ce..f29265df055e 100644
> --- a/datapath/linux/compat/include/linux/openvswitch.h
> +++ b/datapath/linux/compat/include/linux/openvswitch.h
> @@ -89,6 +89,8 @@ enum ovs_datapath_cmd {
>   * set on the datapath port (for OVS_ACTION_ATTR_MISS).  Only valid on
>   * %OVS_DP_CMD_NEW requests. A value of zero indicates that upcalls should
>   * not be sent.
> + * OVS_DP_ATTR_PER_CPU_PIDS: Per-cpu array of PIDs for upcalls when
> + * OVS_DP_F_DISPATCH_UPCALL_PER_CPU feature is set.
>   * @OVS_DP_ATTR_STATS: Statistics about packets that have passed through the
>   * datapath.  Always present in notifications.
>   * @OVS_DP_ATTR_MEGAFLOW_STATS: Statistics about mega flow masks usage for the
> @@ -105,6 +107,8 @@ enum ovs_datapath_attr {
>   OVS_DP_ATTR_MEGAFLOW_STATS, /* struct ovs_dp_megaflow_stats */
>   OVS_DP_ATTR_USER_FEATURES,  /* OVS_DP_F_*  */
>   OVS_DP_ATTR_PAD,
> + OVS_DP_ATTR_PAD2,
> + OVS_DP_ATTR_PER_CPU_PIDS,   /* Netlink PIDS to receive upcalls */
>   __OVS_DP_ATTR_MAX
>  };
>  
> @@ -146,6 +150,9 @@ struct ovs_vport_stats {
>  /* Allow tc offload recirc sharing */
>  #define OVS_DP_F_TC_RECIRC_SHARING  (1 << 2)
>  
> +/* Allow per-cpu dispatch of upcalls */
> +#define OVS_DP_F_DISPATCH_UPCALL_PER_CPU (1 << 3)
> +
>  /* Fixed logical ports. */
>  #define OVSP_LOCAL  ((__u32)0)
>  
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index c5ab35d2a5a5..b2c2baadf4f3 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -8562,6 +8562,7 @@ const struct dpif_class dpif_netdev_class = {
>  dpif_netdev_operate,
>  NULL,   /* recv_set */
>  NULL,   /* handlers_set */
> +NULL,   /* number_handlers_required */
>  dpif_netdev_set_config,
>  dpif_netdev_queue_to_priority,
>  NULL,   /* recv */
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index f92905dd83fd..2399879aea3e 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -98,6

Re: [ovs-dev] [PATCH net-next] openvswitch: Introduce per-cpu upcall dispatch

2021-07-01 Thread Flavio Leitner



Hi Mark,

Thanks for addressing the comments. The patch looks good to me
and I plan to test it tomorrow.

fbl

On Wed, Jun 30, 2021 at 05:53:49AM -0400, Mark Gray wrote:
> The Open vSwitch kernel module uses the upcall mechanism to send
> packets from kernel space to user space when it misses in the kernel
> space flow table. The upcall sends packets via a Netlink socket.
> Currently, a Netlink socket is created for every vport. In this way,
> there is a 1:1 mapping between a vport and a Netlink socket.
> When a packet is received by a vport, if it needs to be sent to
> user space, it is sent via the corresponding Netlink socket.
> 
> This mechanism, with various iterations of the corresponding user
> space code, has seen some limitations and issues:
> 
> * On systems with a large number of vports, there is a correspondingly
> large number of Netlink sockets which can limit scaling.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
> * Packet reordering on upcalls.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
> * A thundering herd issue.
> (https://bugzilla.redhat.com/show_bug.cgi?id=183)
> 
> This patch introduces an alternative, feature-negotiated, upcall
> mode using a per-cpu dispatch rather than a per-vport dispatch.
> 
> In this mode, the Netlink socket to be used for the upcall is
> selected based on the CPU of the thread that is executing the upcall.
> In this way, it resolves the issues above as:
> 
> a) The number of Netlink sockets scales with the number of CPUs
> rather than the number of vports.
> b) Ordering per-flow is maintained as packets are distributed to
> CPUs based on mechanisms such as RSS and flows are distributed
> to a single user space thread.
> c) Packets from a flow can only wake up one user space thread.
> 
> The corresponding user space code can be found at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/382618.html
> 
> Bugzilla: https://bugzilla.redhat.com/1844576
> Signed-off-by: Mark Gray 
> ---
> 
> Notes:
> v1 - Reworked based on Flavio's comments:
>  * Fixed handling of userspace action case
>  * Renamed 'struct dp_portids'
>  * Fixed handling of return from kmalloc()
>  * Removed check for dispatch type from ovs_dp_get_upcall_portid()
>- Reworked based on Dan's comments:
>  * Fixed handling of return from kmalloc()
>- Reworked based on Pravin's comments:
>  * Fixed handling of userspace action case
>- Added kfree() in destroy_dp_rcu() to cleanup netlink port ids
> 
>  include/uapi/linux/openvswitch.h |  8 
>  net/openvswitch/actions.c|  6 ++-
>  net/openvswitch/datapath.c   | 70 +++-
>  net/openvswitch/datapath.h   | 20 +
>  4 files changed, 101 insertions(+), 3 deletions(-)
> 
> diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
> index 8d16744edc31..6571b57b2268 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -70,6 +70,8 @@ enum ovs_datapath_cmd {
>   * set on the datapath port (for OVS_ACTION_ATTR_MISS).  Only valid on
>   * %OVS_DP_CMD_NEW requests. A value of zero indicates that upcalls should
>   * not be sent.
> + * OVS_DP_ATTR_PER_CPU_PIDS: Per-cpu array of PIDs for upcalls when
> + * OVS_DP_F_DISPATCH_UPCALL_PER_CPU feature is set.
>   * @OVS_DP_ATTR_STATS: Statistics about packets that have passed through the
>   * datapath.  Always present in notifications.
>   * @OVS_DP_ATTR_MEGAFLOW_STATS: Statistics about mega flow masks usage for the
> @@ -87,6 +89,9 @@ enum ovs_datapath_attr {
>   OVS_DP_ATTR_USER_FEATURES,  /* OVS_DP_F_*  */
>   OVS_DP_ATTR_PAD,
>   OVS_DP_ATTR_MASKS_CACHE_SIZE,
> + OVS_DP_ATTR_PER_CPU_PIDS,   /* Netlink PIDS to receive upcalls in per-cpu
> +  * dispatch mode
> +  */
>   __OVS_DP_ATTR_MAX
>  };
>  
> @@ -127,6 +132,9 @@ struct ovs_vport_stats {
>  /* Allow tc offload recirc sharing */
>  #define OVS_DP_F_TC_RECIRC_SHARING   (1 << 2)
>  
> +/* Allow per-cpu dispatch of upcalls */
> +#define OVS_DP_F_DISPATCH_UPCALL_PER_CPU (1 << 3)
> +
>  /* Fixed logical ports. */
>  #define OVSP_LOCAL  ((__u32)0)
>  
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index ef15d9eb4774..f79679746c62 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -924,7 +924,11 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
>   break;
>  
>   case OVS_USERSPACE_ATTR_PID:
> - upcall.portid = nla_get_u32(a);
> + if (dp->user_features & OVS_DP_F_DISPATCH_UPCALL_PER_CPU)
> + upcall.portid =
> +ovs_dp_get_upcall_portid(dp, smp_processor_id());
> + else
> +
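The actions.c hunk above is cut off in the archive. Below is a hedged sketch of the behavior it appears to implement. In per-CPU mode, the userspace action ignores the PID carried in OVS_USERSPACE_ATTR_PID and uses the current CPU's upcall PID instead. Otherwise it keeps the attribute's PID; that else branch is an assumption based on the removed line. `userspace_action_portid()` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define OVS_DP_F_DISPATCH_UPCALL_PER_CPU (1 << 3)   /* value from the patch */

/* Picks the Netlink port id for an OVS_USERSPACE_ATTR_PID action.
 * 'attr_pid' is the PID from the action attribute; 'cpu_pid' is the PID
 * registered for the CPU currently executing the action. */
static uint32_t
userspace_action_portid(uint32_t user_features, uint32_t attr_pid,
                        uint32_t cpu_pid)
{
    if (user_features & OVS_DP_F_DISPATCH_UPCALL_PER_CPU) {
        return cpu_pid;
    }
    return attr_pid;
}
```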

Re: [ovs-dev] [PATCH 3/3] dpif-netlink: Introduce per-cpu upcall dispatch

2021-07-01 Thread Flavio Leitner



Hi Mark,

I've not tested this yet.
See some comments below.

On Wed, Jun 30, 2021 at 05:56:11AM -0400, Mark Gray wrote:
> The Open vSwitch kernel module uses the upcall mechanism to send
> packets from kernel space to user space when it misses in the kernel
> space flow table. The upcall sends packets via a Netlink socket.
> Currently, a Netlink socket is created for every vport. In this way,
> there is a 1:1 mapping between a vport and a Netlink socket.
> When a packet is received by a vport, if it needs to be sent to
> user space, it is sent via the corresponding Netlink socket.
> 
> This mechanism, with various iterations of the corresponding user
> space code, has seen some limitations and issues:
> 
> * On systems with a large number of vports, there is correspondingly
> a large number of Netlink sockets which can limit scaling.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
> * Packet reordering on upcalls.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
> * A thundering herd issue.
> (https://bugzilla.redhat.com/show_bug.cgi?id=183)
> 
> This patch introduces an alternative, feature-negotiated, upcall
> mode using a per-cpu dispatch rather than a per-vport dispatch.
> 
> In this mode, the Netlink socket to be used for the upcall is
> selected based on the CPU of the thread that is executing the upcall.
> In this way, it resolves the issues above as:
> 
> a) The number of Netlink sockets scales with the number of CPUs
> rather than the number of vports.
> b) Ordering per-flow is maintained as packets are distributed to
> CPUs based on mechanisms such as RSS and flows are distributed
> to a single user space thread.
> c) Packets from a flow can only wake up one user space thread.
> 
> Reported-at: https://bugzilla.redhat.com/1844576
> Signed-off-by: Mark Gray 
> ---
> 
> Notes:
> v1 - Reworked based on Flavio's comments:
>  * change DISPATCH_MODE_PER_CPU() to inline function
>  * add `ovs-appctl` command to check dispatch mode for datapaths
>  * fixed issue with userspace actions (tested using `ovs-ofctl monitor br0 65534 -P nxt_packet_in`)
>  * update documentation as requested
> 
>  .../linux/compat/include/linux/openvswitch.h  |   7 +
>  lib/dpif-netdev.c |   1 +
>  lib/dpif-netlink.c| 456 --
>  lib/dpif-provider.h   |  32 +-
>  lib/dpif.c|  17 +
>  lib/dpif.h|   1 +
>  ofproto/ofproto-dpif-upcall.c |  51 +-
>  ofproto/ofproto.c |  12 -
>  vswitchd/vswitch.xml  |  23 +-
>  9 files changed, 504 insertions(+), 96 deletions(-)
> 
> diff --git a/datapath/linux/compat/include/linux/openvswitch.h b/datapath/linux/compat/include/linux/openvswitch.h
> index 875de20250ce..f29265df055e 100644
> --- a/datapath/linux/compat/include/linux/openvswitch.h
> +++ b/datapath/linux/compat/include/linux/openvswitch.h
> @@ -89,6 +89,8 @@ enum ovs_datapath_cmd {
>   * set on the datapath port (for OVS_ACTION_ATTR_MISS).  Only valid on
>   * %OVS_DP_CMD_NEW requests. A value of zero indicates that upcalls should
>   * not be sent.
> + * OVS_DP_ATTR_PER_CPU_PIDS: Per-cpu array of PIDs for upcalls when
> + * OVS_DP_F_DISPATCH_UPCALL_PER_CPU feature is set.
>   * @OVS_DP_ATTR_STATS: Statistics about packets that have passed through the
>   * datapath.  Always present in notifications.
>   * @OVS_DP_ATTR_MEGAFLOW_STATS: Statistics about mega flow masks usage for the
> @@ -105,6 +107,8 @@ enum ovs_datapath_attr {
>   OVS_DP_ATTR_MEGAFLOW_STATS, /* struct ovs_dp_megaflow_stats */
>   OVS_DP_ATTR_USER_FEATURES,  /* OVS_DP_F_*  */
>   OVS_DP_ATTR_PAD,
> + OVS_DP_ATTR_PAD2,
> + OVS_DP_ATTR_PER_CPU_PIDS,   /* Netlink PIDS to receive upcalls */
>   __OVS_DP_ATTR_MAX
>  };
>  
> @@ -146,6 +150,9 @@ struct ovs_vport_stats {
>  /* Allow tc offload recirc sharing */
>  #define OVS_DP_F_TC_RECIRC_SHARING  (1 << 2)
>  
> +/* Allow per-cpu dispatch of upcalls */
> +#define OVS_DP_F_DISPATCH_UPCALL_PER_CPU (1 << 3)
> +
>  /* Fixed logical ports. */
>  #define OVSP_LOCAL  ((__u32)0)
>  
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index c5ab35d2a5a5..b2c2baadf4f3 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -8562,6 +8562,7 @@ const struct dpif_class dpif_netdev_class = {
>  dpif_netdev_operate,
>  NULL,   /* recv_set */
>  NULL,   /* handlers_set */
> +NULL,   /* number_handlers_required */
>  dpif_netdev_set_config,
>  dpif_netdev_queue_to_priority,
>  NULL,   /* recv */
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index f92905dd83fd..2399879aea3e 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -98,6 +98,8 @@ struct dpif_netlink_dp {
>

Re: [ovs-dev] [PATCH 2/3] dpif-netlink: fix report_loss() message

2021-07-01 Thread Flavio Leitner

On Wed, Jun 30, 2021 at 05:56:10AM -0400, Mark Gray wrote:
> Fixes: 1579cf677fcb ("dpif-linux: Implement the API functions to allow multiple handler threads read upcall.")
> Signed-off-by: Mark Gray 
> ---

Acked-by: Flavio Leitner 
Thanks Mark!
fbl

Re: [ovs-dev] [PATCH 1/3] ofproto: change type of n_handlers and n_revalidators

2021-07-01 Thread Flavio Leitner

On Wed, Jun 30, 2021 at 05:56:09AM -0400, Mark Gray wrote:
> 'n_handlers' and 'n_revalidators' are declared as type 'size_t'.
> However, dpif_handlers_set() requires parameter 'n_handlers' as
> type 'uint32_t'. This patch fixes this type mismatch.
> 
> Signed-off-by: Mark Gray 
> ---
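The patch avoids the mismatch by declaring the counters as uint32_t. Where a size_t value must still be passed to a uint32_t parameter, a checked narrowing avoids silent truncation. The following is a hypothetical sketch, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: narrow a size_t count to uint32_t only if it fits,
 * instead of relying on an implicit conversion that silently drops the
 * high bits on 64-bit builds. */
static bool
size_to_u32(size_t value, uint32_t *out)
{
    assert(out != NULL);
    if (value > UINT32_MAX) {
        return false;
    }
    *out = (uint32_t) value;
    return true;
}
```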

Acked-by: Flavio Leitner 

Thanks Mark!
fbl

Re: [ovs-dev] [PATCH] dpif/dpcls: limit count subtable search info logs

2021-07-01 Thread Flavio Leitner

On Thu, Jul 01, 2021 at 12:12:18PM +, Van Haaren, Harry wrote:
> > -Original Message-
> > From: Flavio Leitner 
> > Sent: Thursday, July 1, 2021 12:15 PM
> > To: Van Haaren, Harry 
> > Cc: Amber, Kumar ; d...@openvswitch.org;
> > i.maxim...@ovn.org
> > Subject: Re: [ovs-dev] [PATCH] dpif/dpcls: limit count subtable search info logs
> > 
> > On Thu, Jul 01, 2021 at 10:55:55AM +, Van Haaren, Harry wrote:
> > > > -Original Message-
> > > > From: Amber, Kumar 
> > > > Sent: Thursday, July 1, 2021 10:50 AM
> > > > To: Van Haaren, Harry 
> > > > Cc: d...@openvswitch.org; i.maxim...@ovn.org; Flavio Leitner
> > > > 
> > > > Subject: RE: [ovs-dev] [PATCH] dpif/dpcls: limit count subtable search info logs
> > > >
> > > > Hi Harry,
> > > >
> > > > Any Insights on the following 
> > > >
> > > > > -Original Message-
> > > > > From: Flavio Leitner 
> > > > > Sent: Wednesday, June 30, 2021 12:28 AM
> > > > > To: Amber, Kumar 
> > > > > Cc: d...@openvswitch.org; i.maxim...@ovn.org
> > > > > Subject: Re: [ovs-dev] [PATCH] dpif/dpcls: limit count subtable search info logs
> > > > >
> > > > > On Tue, Jun 29, 2021 at 10:19:41PM +0530, Kumar Amber wrote:
> > > > > > From: Harry van Haaren 
> > > > > >
> > > > > > This commit avoids many instances of "using subtable X for miniflow (x,y)"
> > > > > > in the ovs-vswitchd log when using the DPCLS Autovalidator. This
> > > > > > occurs when no specialized subtable is found, and the generic "_any"
> > > > > > version of the avx512 subtable search implementation was used. This
> > > > > > change logs the subtable usage once, avoiding duplicates.
> > > > > >
> > > > > > Signed-off-by: Harry van Haaren 
> > > > > > ---
> > > > > >  lib/dpif-netdev-lookup-avx512-gather.c | 2 +-
> > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/lib/dpif-netdev-lookup-avx512-gather.c b/lib/dpif-netdev-lookup-avx512-gather.c
> > > > > > index bc359dc4a..f1b44deb3 100644
> > > > > > --- a/lib/dpif-netdev-lookup-avx512-gather.c
> > > > > > +++ b/lib/dpif-netdev-lookup-avx512-gather.c
> > > > > > @@ -411,7 +411,7 @@ dpcls_subtable_avx512_gather_probe(uint32_t u0_bits, uint32_t u1_bits)
> > > > > >   */
> > > > > >  if (!f && (u0_bits + u1_bits) < (NUM_U64_IN_ZMM_REG * 2)) {
> > > > > >  f = dpcls_avx512_gather_mf_any;
> > > > > > -VLOG_INFO("Using avx512_gather_mf_any for subtable (%d,%d)\n",
> > > > > > +VLOG_INFO_ONCE("Using avx512_gather_mf_any for subtable
> > > > > > + (%d,%d)\n",
> > > > > >u0_bits, u1_bits);
> > > > >
> > > > > This will log only one time, but there are multiple subtables, so we
> > > > > won't see other subtable changing. If the subtable information is not
> > > > > relevant, then it shouldn't be in the msg.
> > >
> > > Note that this only logs the subtables that are not specialized.
> > > Basically, this if this log is seen in the wild, it tell us OVS community
> > > that the subtable fingerprint (u0_bits, u1_bits) combo should be specialized.
> > 
> > But then why you don't care about other fingerprints?
> 
> So we do care about other fingerprints. Ideally all non-specialized
> fingerprints will get logged. It’s a question of verbosity of logging vs
> available info.
> 
> 
> > > > > Also, the log only exists for *_mf_any, not for others specialized functions.
> > >
> > > Yes, intentionally. If the accelerated path is being chosen, why nag the user.
> > 
> > OK.
> > 
> > > If we're missing a specialized subtable, it would be good for OVS
> > > community to know, so we log it, but only log it once. We could consider a

Re: [ovs-dev] [v4 10/12] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract

2021-07-01 Thread Flavio Leitner

On Wed, Jun 30, 2021 at 04:54:22PM +, Van Haaren, Harry wrote:
> > -Original Message-
> > From: Eelco Chaudron 
> > Sent: Wednesday, June 30, 2021 3:35 PM
> > To: Van Haaren, Harry 
> > Cc: Amber, Kumar ; d...@openvswitch.org;
> > i.maxim...@ovn.org; Flavio Leitner ; Stokes, Ian
> > 
> > Subject: Re: [ovs-dev] [v4 10/12] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract
> > 
> > 
> > 
> > On 30 Jun 2021, at 15:30, Van Haaren, Harry wrote:
> > 
> > >> -Original Message-
> > >> From: Eelco Chaudron 
> > >> Sent: Wednesday, June 30, 2021 2:12 PM
> > >> To: Amber, Kumar ; Van Haaren, Harry
> > >> 
> > >> Cc: d...@openvswitch.org; i.maxim...@ovn.org; Flavio Leitner ;
> > >> Stokes, Ian 
> > >> Subject: Re: [ovs-dev] [v4 10/12] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract
> > >>
> > >> This patch was an interesting patch to review and being reminded about
> > >> endianness, and this site,
> > >> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm512_maskz_permutexvar_epi8=4315,
> > >> got me through it ;)
> > >
> > > Hah, yes the Intrinsics Guide is very useful for reading/investigating
> > > what/how instructions can do.
> > > Its... almost always open in a browser in some tab here! :)
> > >
> > >
> > >> Some comments below...
> > >>
> > >> //Eelco
> > >
> > > Thanks for review, I'll snip away large chunks of code to reduce verbosity.
> > >
> > > Regards, -Harry
> > >
> > >
> > >> On 17 Jun 2021, at 18:27, Kumar Amber wrote:
> > >>
> > >>> From: Harry van Haaren 
> > >
> > > 
> > >
> > >>> +/* AVX512-BW level permutex2var_epi8 emulation. */
> > >>> +static inline __m512i
> > >>> +__attribute__((target("avx512bw")))
> > >>
> > >> Are these targets universal enough for all supported compilers, if not
> > >> we might need to move them to individual macros in compile.h.
> > >
> > > Yes, these are the standard gcc/clang etc compiler -m  switches.
> > >
> > > Search for "-mavx512bw" on e.g. this GCC page, lists them all;
> > > https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
> > >
> > > If a compiler does not understand them, we will have to #ifdef that
> > > compiler out, as it just doesn't support the ISA.
> > 
> > Guess my concern is with the windows/Microsoft compiler, as I have no
> > windows setup, I can not verify this.
> 
> Me neither. Flavio you mentioned a windows compiler issue on the DPIF
> patchset, would you test compile here please?

I used DPIF v13 and MFEX v4, with a quick fix replacing __builtin_ctz()
with raw_ctz(), and it builds fine on AppVeyor:
https://ci.appveyor.com/project/fleitner/ovs/build/job/svobt835gl8wrm8q
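For context on the Windows fix: __builtin_ctz() is a GCC/Clang builtin that MSVC does not provide. That is why OVS code is expected to go through its own raw_ctz() wrapper. Below is a hypothetical portable equivalent. It uses _BitScanForward() from <intrin.h> on MSVC and a plain bit loop elsewhere; `portable_ctz32()` is illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#if defined(_MSC_VER) && !defined(__clang__)
#include <intrin.h>
#endif

/* Hypothetical portable count-trailing-zeros for a nonzero 32-bit value.
 * Like __builtin_ctz(), the result is undefined for 0, so that case is
 * rejected up front. */
static inline int
portable_ctz32(uint32_t n)
{
    assert(n != 0);
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_ctz(n);
#elif defined(_MSC_VER)
    unsigned long idx;
    _BitScanForward(&idx, n);
    return (int) idx;
#else
    int count = 0;
    while (!(n & 1)) {
        n >>= 1;
        count++;
    }
    return count;
#endif
}
```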

< snipped some other unrelated points >

Thanks,
-- 
fbl

Re: [ovs-dev] [PATCH] dpif/dpcls: limit count subtable search info logs

2021-07-01 Thread Flavio Leitner

On Thu, Jul 01, 2021 at 10:55:55AM +, Van Haaren, Harry wrote:
> > -Original Message-
> > From: Amber, Kumar 
> > Sent: Thursday, July 1, 2021 10:50 AM
> > To: Van Haaren, Harry 
> > Cc: d...@openvswitch.org; i.maxim...@ovn.org; Flavio Leitner
> > 
> > Subject: RE: [ovs-dev] [PATCH] dpif/dpcls: limit count subtable search info logs
> > 
> > Hi Harry,
> > 
> > Any Insights on the following 
> > 
> > > -Original Message-
> > > From: Flavio Leitner 
> > > Sent: Wednesday, June 30, 2021 12:28 AM
> > > To: Amber, Kumar 
> > > Cc: d...@openvswitch.org; i.maxim...@ovn.org
> > > Subject: Re: [ovs-dev] [PATCH] dpif/dpcls: limit count subtable search info logs
> > >
> > > On Tue, Jun 29, 2021 at 10:19:41PM +0530, Kumar Amber wrote:
> > > > From: Harry van Haaren 
> > > >
> > > > This commit avoids many instances of "using subtable X for miniflow (x,y)"
> > > > in the ovs-vswitchd log when using the DPCLS Autovalidator. This
> > > > occurs when no specialized subtable is found, and the generic "_any"
> > > > version of the avx512 subtable search implementation was used. This
> > > > change logs the subtable usage once, avoiding duplicates.
> > > >
> > > > Signed-off-by: Harry van Haaren 
> > > > ---
> > > >  lib/dpif-netdev-lookup-avx512-gather.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/lib/dpif-netdev-lookup-avx512-gather.c b/lib/dpif-netdev-lookup-avx512-gather.c
> > > > index bc359dc4a..f1b44deb3 100644
> > > > --- a/lib/dpif-netdev-lookup-avx512-gather.c
> > > > +++ b/lib/dpif-netdev-lookup-avx512-gather.c
> > > > @@ -411,7 +411,7 @@ dpcls_subtable_avx512_gather_probe(uint32_t u0_bits, uint32_t u1_bits)
> > > >   */
> > > >  if (!f && (u0_bits + u1_bits) < (NUM_U64_IN_ZMM_REG * 2)) {
> > > >  f = dpcls_avx512_gather_mf_any;
> > > > -VLOG_INFO("Using avx512_gather_mf_any for subtable (%d,%d)\n",
> > > > +VLOG_INFO_ONCE("Using avx512_gather_mf_any for subtable
> > > > + (%d,%d)\n",
> > > >u0_bits, u1_bits);
> > >
> > > This will log only one time, but there are multiple subtables, so we
> > > won't see other subtable changing. If the subtable information is not
> > > relevant, then it shouldn't be in the msg.
> 
> Note that this only logs the subtables that are not specialized.
> Basically, this if this log is seen in the wild, it tell us OVS community
> that the subtable fingerprint (u0_bits, u1_bits) combo should be specialized.

But then why don't you care about the other fingerprints?

> > > Also, the log only exists for *_mf_any, not for others specialized functions.
> 
> Yes, intentionally. If the accelerated path is being chosen, why nag the user.

OK.

> If we're missing a specialized subtable, it would be good for OVS community
> to know, so we log it, but only log it once. We could consider a rate-limited
> log instead of a "ONCE".

This comes back to whether or not to show the fingerprint.
If it matters, then not printing the others seems like a problem. If
what matters is notifying the user that one or more subtables are not
specialized, then a generic message without the fingerprint is enough
and causes less confusion.
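If the fingerprint is what matters, one middle ground between VLOG_INFO_ONCE and a rate-limited log is to log each distinct (u0_bits, u1_bits) pair once. Below is a hypothetical, single-threaded C sketch that uses printf() in place of VLOG. The fixed-size table and all names are assumptions, and code running in PMD threads would need locking or per-thread state.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_TRACKED_FPS 32   /* hypothetical cap on remembered fingerprints */

struct subtable_fp {
    uint32_t u0_bits;
    uint32_t u1_bits;
};

static struct subtable_fp seen_fps[MAX_TRACKED_FPS];
static int n_seen_fps;

/* Logs the first fallback to the generic lookup for each distinct
 * (u0_bits, u1_bits) fingerprint and returns true; returns false for
 * fingerprints already reported.  Once the table is full, new fingerprints
 * are logged every time.  Not thread-safe. */
static bool
log_unspecialized_once(uint32_t u0_bits, uint32_t u1_bits)
{
    for (int i = 0; i < n_seen_fps; i++) {
        if (seen_fps[i].u0_bits == u0_bits && seen_fps[i].u1_bits == u1_bits) {
            return false;
        }
    }
    if (n_seen_fps < MAX_TRACKED_FPS) {
        seen_fps[n_seen_fps].u0_bits = u0_bits;
        seen_fps[n_seen_fps].u1_bits = u1_bits;
        n_seen_fps++;
    }
    printf("Using generic lookup for subtable (%" PRIu32 ",%" PRIu32 ")\n",
           u0_bits, u1_bits);
    return true;
}
```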

 
> Note that DPCLS Autovalidator calls probe() for each subtable-lookup, as it
> wants to test all the different variants. ONCE avoids a lot of noise in this
> case, but I'm OK with a _RL version too.

Sure, I understand the motivation and I agree we should improve that.

> > > Do we need that information in runtime? Unless I am missing other callers,
> > > dpcls_subtable_get_best_impl() has a VLOG_DBG() logging all cases with the
> > > same information.
> 
> Yes, for debugging it is useful to see for any subtable, which is being
> selected. The above prints are INFO level, so OVS community can see what
> subtables could be specialized in future for better performance in those
> subtable lookups.
> 
> So the one open is to use VLOG with ONCE (as this patch does) or with a
> RateLimit, I'm OK with either approach.

Please help me understand how you use that log message.

-- 
fbl

Re: [ovs-dev] [v4 11/12] dpif-netdev/mfex: add more AVX512 traffic profiles

2021-06-30 Thread Flavio Leitner

Hi,

On Thu, Jun 17, 2021 at 09:57:53PM +0530, Kumar Amber wrote:
> From: Harry van Haaren 
> 
> This commit adds 3 new traffic profile implementations to the
> existing avx512 miniflow extract infrastructure. The profiles added are:
> - Ether()/IP()/TCP()
> - Ether()/Dot1Q()/IP()/UDP()
> - Ether()/Dot1Q()/IP()/TCP()
> 
> The design of the avx512 code here is for scalability to add more
> traffic profiles, as well as enabling CPU ISA. Note that an implementation
> is primarily adding static const data, which the compiler then specializes
> away when the profile specific function is declared below.
> 
> As a result, the code is relatively maintainable, and scalable for new
> traffic profiles as well as new ISA, and does not lower performance
> compared with manually written code for each profile/ISA.
> 
> Note that confidence in the correctness of each implementation is
> achieved through autovalidation, unit tests with known packets, and
> fuzz tested packets.
> 
> Signed-off-by: Harry van Haaren 
> 
> ---
> 
> Hi Readers,
> 
> If you have a traffic profile you'd like to see accelerated using
> avx512 code, please send me an email and we can collaborate on adding
> support for it!
> 
> Regards, -Harry
> ---
>  lib/dpif-netdev-extract-avx512.c  | 155 ++
>  lib/dpif-netdev-private-extract.c |  31 ++
>  lib/dpif-netdev-private-extract.h |   4 +
>  3 files changed, 190 insertions(+)
> 
> diff --git a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c
> index 1145ac8a9..0e0f6e295 100644
> --- a/lib/dpif-netdev-extract-avx512.c
> +++ b/lib/dpif-netdev-extract-avx512.c
> @@ -117,6 +117,13 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, __m512i idx, __m512i a)
>  
>  #define PATTERN_ETHERTYPE_MASK PATTERN_ETHERTYPE_GEN(0xFF, 0xFF)
>  #define PATTERN_ETHERTYPE_IPV4 PATTERN_ETHERTYPE_GEN(0x08, 0x00)
> +#define PATTERN_ETHERTYPE_DT1Q PATTERN_ETHERTYPE_GEN(0x81, 0x00)
> +
> +/* VLAN (Dot1Q) patterns and masks. */
> +#define PATTERN_DT1Q_MASK   \
> +  0x00, 0x00, 0xFF, 0xFF,
> +#define PATTERN_DT1Q_IPV4   \
> +  0x00, 0x00, 0x08, 0x00,
>  
>  /* Generator for checking IPv4 ver, ihl, and proto */
>  #define PATTERN_IPV4_GEN(VER_IHL, FLAG_OFF_B0, FLAG_OFF_B1, PROTO) \
> @@ -142,6 +149,29 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, __m512i idx, __m512i a)
>34, 35, 36, 37, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* UDP */ \
>NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */
>  
> +/* TCP shuffle: tcp_ctl bits require mask/processing, not included here. */
> +#define PATTERN_IPV4_TCP_SHUFFLE \
> +   0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, NU, NU, /* Ether */ \
> +  26, 27, 28, 29, 30, 31, 32, 33, NU, NU, NU, NU, 20, 15, 22, 23, /* IPv4 */ \
> +  NU, NU, NU, NU, NU, NU, NU, NU, 34, 35, 36, 37, NU, NU, NU, NU, /* TCP */ \
> +  NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */
> +
> +#define PATTERN_DT1Q_IPV4_UDP_SHUFFLE \
> +  /* Ether (2 blocks): Note that *VLAN* type is written here. */ \
> +  0,  1,  2,  3,  4,  5,  6,  7, 8,  9, 10, 11, 16, 17,  0,  0, \
> +  /* VLAN (1 block): Note that the *EtherHdr->Type* is written here. */ \
> +  12, 13, 14, 15, 0, 0, 0, 0, \
> +  30, 31, 32, 33, 34, 35, 36, 37, 0, 0, 0, 0, 24, 19, 26, 27, /* IPv4 */ \
> +  38, 39, 40, 41, NU, NU, NU, NU, /* UDP */
> +
> +#define PATTERN_DT1Q_IPV4_TCP_SHUFFLE \
> +  /* Ether (2 blocks): Note that *VLAN* type is written here. */ \
> +  0,  1,  2,  3,  4,  5,  6,  7, 8,  9, 10, 11, 16, 17,  0,  0, \
> +  /* VLAN (1 block): Note that the *EtherHdr->Type* is written here. */ \
> +  12, 13, 14, 15, 0, 0, 0, 0, \
> +  30, 31, 32, 33, 34, 35, 36, 37, 0, 0, 0, 0, 24, 19, 26, 27, /* IPv4 */ \
> +  NU, NU, NU, NU, NU, NU, NU, NU, 38, 39, 40, 41, NU, NU, NU, NU, /* TCP */ \
> +  NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */
>  
>  /* Generation of K-mask bitmask values, to zero out data in result. Note that
>   * these correspond 1:1 to the above "*_SHUFFLE" values, and bit used must be
> @@ -151,12 +181,22 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, __m512i idx, __m512i a)
>   * Note the ULL suffix allows shifting by 32 or more without integer overflow.
>   */
>  #define KMASK_ETHER 0x1FFFULL
> +#define KMASK_DT1Q  0x000FULL
>  #define KMASK_IPV4  0xF0FFULL
>  #define KMASK_UDP   0x000FULL
> +#define KMASK_TCP   0x0F00ULL
>  
>  #define PATTERN_IPV4_UDP_KMASK \
>  (KMASK_ETHER | (KMASK_IPV4 << 16) | (KMASK_UDP << 32))
>  
> +#define PATTERN_IPV4_TCP_KMASK \
> +
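
The K-mask comment in the patch says each mask bit keeps one byte of the shuffled 64-byte result, and that the ULL suffix keeps shifts by 32 or more well-defined. A small standalone check of that composition, reusing the KMASK values quoted above (the two helper functions are hypothetical, added only for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the quoted patch. */
#define KMASK_ETHER 0x1FFFULL
#define KMASK_IPV4  0xF0FFULL
#define KMASK_UDP   0x000FULL

/* Each 16-bit group of the 64-bit mask covers one 16-byte block of the
 * 64-byte shuffle result: block 0 is Ether, block 1 IPv4, block 2 UDP.
 * The ULL suffix makes the operands 64-bit, so "<< 32" is well-defined;
 * on a 32-bit int, shifting by 32 would be undefined behaviour. */
#define PATTERN_IPV4_UDP_KMASK \
    (KMASK_ETHER | (KMASK_IPV4 << 16) | (KMASK_UDP << 32))

/* Hypothetical helper: is byte lane 'lane' (0..63) kept by 'kmask'? */
static int
kmask_keeps_lane(uint64_t kmask, unsigned lane)
{
    return (int) ((kmask >> lane) & 1);
}

/* Hypothetical helper: number of result bytes kept by 'kmask'. */
static int
kmask_kept_bytes(uint64_t kmask)
{
    return __builtin_popcountll(kmask);   /* GCC/Clang builtin. */
}
```

Lanes whose bit is clear are the ones a maskz shuffle zeroes in the miniflow block.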

Re: [ovs-dev] [PATCH] dpif/dpcls: limit count subtable search info logs

2021-06-29 Thread Flavio Leitner

On Tue, Jun 29, 2021 at 10:19:41PM +0530, Kumar Amber wrote:
> From: Harry van Haaren 
> 
> This commit avoids many instances of "using subtable X for miniflow (x,y)"
> in the ovs-vswitchd log when using the DPCLS Autovalidator. This occurs
> when no specialized subtable is found, and the generic "_any" version of
> the avx512 subtable search implementation was used. This change logs the
> subtable usage once, avoiding duplicates.
> 
> Signed-off-by: Harry van Haaren 
> ---
>  lib/dpif-netdev-lookup-avx512-gather.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/dpif-netdev-lookup-avx512-gather.c 
> b/lib/dpif-netdev-lookup-avx512-gather.c
> index bc359dc4a..f1b44deb3 100644
> --- a/lib/dpif-netdev-lookup-avx512-gather.c
> +++ b/lib/dpif-netdev-lookup-avx512-gather.c
> @@ -411,7 +411,7 @@ dpcls_subtable_avx512_gather_probe(uint32_t u0_bits, 
> uint32_t u1_bits)
>   */
>  if (!f && (u0_bits + u1_bits) < (NUM_U64_IN_ZMM_REG * 2)) {
>  f = dpcls_avx512_gather_mf_any;
> -VLOG_INFO("Using avx512_gather_mf_any for subtable (%d,%d)\n",
> +VLOG_INFO_ONCE("Using avx512_gather_mf_any for subtable (%d,%d)\n",
>u0_bits, u1_bits);

This will log only one time, but there are multiple subtables, so we
won't see the other subtables changing. If the subtable information is
not relevant, then it shouldn't be in the msg.

Also, the log only exists for *_mf_any, not for the other specialized
functions.

Do we need that information at runtime? Unless I am missing
other callers, dpcls_subtable_get_best_impl() has a VLOG_DBG()
logging all cases with the same information. 
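
To make the trade-off concrete: a single global "once" flag, which is effectively what VLOG_INFO_ONCE gives, hides every subtable after the first, while deduplicating on the (u0_bits, u1_bits) pair still reports each distinct shape once. The standalone sketch below counts messages instead of calling the OVS vlog module; the function names and the fixed-size table are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SEEN 64   /* Hypothetical bound on remembered subtable shapes. */

static int global_logs;     /* Messages emitted by the global-once variant. */
static int per_shape_logs;  /* Messages emitted by the per-shape variant. */

/* Single "log once" flag: only the very first call ever logs, whatever
 * subtable shape it is for. */
static void
log_any_once_global(uint32_t u0_bits, uint32_t u1_bits)
{
    static bool logged;

    (void) u0_bits;
    (void) u1_bits;
    if (!logged) {
        logged = true;
        global_logs++;          /* Stand-in for the actual log call. */
    }
}

/* Logs once per distinct (u0_bits, u1_bits) pair.  Not thread-safe;
 * once the table is full, new shapes are no longer reported. */
static void
log_any_once_per_shape(uint32_t u0_bits, uint32_t u1_bits)
{
    static uint64_t seen[MAX_SEEN];
    static int n_seen;
    uint64_t key = ((uint64_t) u0_bits << 32) | u1_bits;

    for (int i = 0; i < n_seen; i++) {
        if (seen[i] == key) {
            return;             /* This shape was already reported. */
        }
    }
    if (n_seen == MAX_SEEN) {
        return;
    }
    seen[n_seen++] = key;
    per_shape_logs++;           /* Stand-in for the actual log call. */
}
```

If the VLOG_DBG() in dpcls_subtable_get_best_impl() really covers all cases, dropping the INFO message may be simpler than either variant.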

-- 
fbl

Re: [ovs-dev] [v4 07/12] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-06-29 Thread Flavio Leitner

On Tue, Jun 29, 2021 at 05:11:00PM +, Amber, Kumar wrote:
> Hi Eelco, Flavio,
> 
> Pls find my replies Inline
> 
> > -Original Message-
> > From: Flavio Leitner 
> > Sent: Tuesday, June 29, 2021 7:51 PM
> > To: Eelco Chaudron 
> > Cc: Amber, Kumar ; Van Haaren, Harry
> > ; d...@openvswitch.org; i.maxim...@ovn.org
> > Subject: Re: [ovs-dev] [v4 07/12] test/sytem-dpdk: Add unit test for mfex autovalidator
> > 
> > On Tue, Jun 29, 2021 at 03:50:22PM +0200, Eelco Chaudron wrote:
> > >
> > >
> > > On 28 Jun 2021, at 4:57, Flavio Leitner wrote:
> > >
> > > > Hi,
> > > >
> > > >
> > > > On Thu, Jun 17, 2021 at 09:57:49PM +0530, Kumar Amber wrote:
> > > >> Tests:
> > > >>   6: OVS-DPDK - MFEX Autovalidator
> > > >>   7: OVS-DPDK - MFEX Autovalidator Fuzzy
> > > >>
> > > >> Added a new directory to store the PCAP file used in the tests and
> > > >> a script to generate the fuzzy traffic type pcap to be used in
> > > >> fuzzy unit test.
> > > >
> > > >
> > > > I haven't tried this yet but am I right that these tests are going
> > > > to pass a pcap to send traffic in a busy loop for 5 seconds in the
> > > > first case and 20 seconds in the second case?
> > > >
> > > > I see that when autovalidator is set OVS will crash if one
> > > > implementation returns a different value, so I wonder why we need to
> > > > run for that long.
> > >
> > > I think we should remove the assert (already suggested by Harry), so
> > > it will not crash by accident if someone selects autovalidator in the
> > > field (and runs into an issue).
> > > Failure will then be detected by the ERROR log entries on shutdown.
> > 
> > That's true for the testsuite, but not in production as there is nothing to
> > disable that.
> > 
> > Perhaps if autovalidator detects an issue, it should log an ERROR level log to
> > report to testsuite, disable the failing mode and make sure OVS is either in
> > default or in another functional mode.
> 
> So I have done the following:
>   Removed the assert.
>   Allowed the autovalidator to run for all implementations and for a full batch.
>   Reported errors via VLOG_ERR.
>   Set the autovalidator to the default (scalar) when returning out in case of failure.

Sounds like a plan to me.
Is that okay with you Eelco?
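
The plan above (no assert, validate every implementation over the full batch, log an error, and fall back to the scalar default on a mismatch) could look roughly like the self-contained model below. It is not the OVS code: the `mfex_fn` signature, the integer "keys", and both implementations are hypothetical simplifications, with `fprintf()` standing in for VLOG_ERR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define BURST 32   /* Hypothetical batch size, cf. NETDEV_MAX_BURST. */

/* Hypothetical extractor: fills keys[i] for every packet it handles and
 * returns a bitmask with bit i set for each handled packet. */
typedef uint32_t (*mfex_fn)(const int *pkts, size_t cnt, int *keys);

static uint32_t
batch_mask(size_t cnt)
{
    return cnt >= 32 ? UINT32_MAX : (UINT32_C(1) << cnt) - 1;
}

/* Reference ("known good") implementation, playing the scalar role. */
static uint32_t
scalar_extract(const int *pkts, size_t cnt, int *keys)
{
    for (size_t i = 0; i < cnt; i++) {
        keys[i] = pkts[i] * 2;
    }
    return batch_mask(cnt);
}

/* Deliberately wrong "optimized" implementation: mishandles value 3. */
static uint32_t
buggy_extract(const int *pkts, size_t cnt, int *keys)
{
    for (size_t i = 0; i < cnt; i++) {
        keys[i] = pkts[i] * 2 + (pkts[i] == 3);
    }
    return batch_mask(cnt);
}

/* Implementation currently selected by the user. */
static mfex_fn active_impl = buggy_extract;

/* Compares 'impl' with the reference over the whole batch.  On a size
 * problem or any mismatch it logs, falls back to the reference, and
 * returns false, rather than asserting. */
static bool
autovalidate(mfex_fn impl, const int *pkts, size_t cnt)
{
    int good[BURST], test[BURST];
    bool ok = true;

    if (cnt > BURST) {
        fprintf(stderr, "Autovalidation: batch of %zu exceeds %d\n",
                cnt, BURST);
        active_impl = scalar_extract;
        return false;
    }
    scalar_extract(pkts, cnt, good);
    uint32_t hits = impl(pkts, cnt, test);

    /* Check every packet the implementation claims to have handled,
     * without stopping at the first failure. */
    for (size_t i = 0; i < cnt; i++) {
        if (((hits >> i) & 1) && good[i] != test[i]) {
            fprintf(stderr, "Autovalidation failed for pkt %zu\n", i);
            ok = false;
        }
    }
    if (!ok) {
        active_impl = scalar_extract;
    }
    return ok;
}
```

Replacing the `keys_size >= cnt` assert with an explicit guard keeps the out-of-bounds protection while letting a production system degrade to the default path.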


> > > I’m wondering if there is another way than a simple delay, as these tend to
> > > cause issues later on. Can we check packets processed or something?
> > 
> > Yeah, maybe we can pass all packets like 5x at least.
> 
> Sure, I will try to find something to do it more nicely.
> But just a thought: keeping it at 20 seconds allows for full stabilization and
> also thorough testing of stability, so keeping it may not be a bad idea.

The issue is that if every test decides to delay for several seconds,
the testsuite becomes impractical. Over time we have removed 'sleep'
calls; instead, we have functions to wait for a certain cmdline output,
or some event. Yes, there are still some left to be fixed.

Back to the point, maybe there is a signal of some sort we can get
that indicates the stability you're looking for. 
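
Waiting for an observable condition instead of sleeping for a fixed time amounts to a poll loop with a deadline. The OVS testsuite does this in Autotest/shell (for example with its OVS_WAIT_UNTIL macro); the C sketch below only illustrates the pattern, and the `wait_until()` helper and the packet-progress predicate are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Milliseconds from a monotonic clock. */
static int64_t
now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Polls 'cond' every 'step_ms' until it returns true or 'timeout_ms'
 * elapses.  A fast run finishes as soon as the condition holds; only a
 * failing run pays the full timeout. */
static bool
wait_until(bool (*cond)(void *), void *aux, int timeout_ms, int step_ms)
{
    int64_t deadline = now_ms() + timeout_ms;
    struct timespec step = {
        .tv_sec = step_ms / 1000,
        .tv_nsec = (long) (step_ms % 1000) * 1000000L,
    };

    for (;;) {
        if (cond(aux)) {
            return true;
        }
        if (now_ms() >= deadline) {
            return false;
        }
        nanosleep(&step, NULL);
    }
}

/* Hypothetical predicate: have at least 'target' packets (e.g. 5x the
 * pcap contents) been processed? */
struct pkt_progress {
    uint64_t processed;
    uint64_t target;
};

static bool
enough_packets(void *aux)
{
    struct pkt_progress *p = aux;

    p->processed += 100;   /* Stand-in for re-reading real statistics. */
    return p->processed >= p->target;
}
```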

-- 
fbl

Re: [ovs-dev] [v4 03/12] dpif-netdev: Add study function to select the best mfex function

2021-06-29 Thread Flavio Leitner

On Tue, Jun 29, 2021 at 04:32:05PM +, Van Haaren, Harry wrote:
> > -Original Message-
> > From: dev  On Behalf Of Eelco Chaudron
> > Sent: Tuesday, June 29, 2021 1:38 PM
> > To: Amber, Kumar 
> > Cc: d...@openvswitch.org; i.maxim...@ovn.org
> > Subject: Re: [ovs-dev] [v4 03/12] dpif-netdev: Add study function to select the best mfex function
> > 
> > More comments below. FYI I’m only reviewing right now, no testing.
> 
> Sure, thanks for reviews.
> 
> > On 17 Jun 2021, at 18:27, Kumar Amber wrote:
> 
> 
> 
> > > +/* Allocate per thread PMD pointer space for study_stats. */
> > > +static inline struct study_stats *
> > > +get_study_stats(void)
> > > +{
> > > +struct study_stats *stats = study_stats_get();
> > > +if (OVS_UNLIKELY(!stats)) {
> > > +   stats = xzalloc(sizeof *stats);
> > > +   study_stats_set_unsafe(stats);
> > > +}
> > > +return stats;
> > > +}
> > > +
> > 
> > Just got a mind-meld with the code, and realized that the function might be
> > different per PMD thread due to this auto mode (and autovalidator mode in the
> > previous patch).
> >
> > This makes it only stronger that we need a way to see the currently selected
> > mode, and not per datapath, but per PMD per datapath!
> 
> Study depends on the traffic pattern, so yes you're correct that it depends.
> The study command was added after community suggested user-experience
> would improve if the user doesn't have to provide an exact miniflow profile name.
>
> Study studies the traffic running on that PMD, compares all MFEX impls, and
> prints out hits. It selects the _first_ implementation that surpasses the
> threshold of packets.
> 
> Users are free to use the more specific names of MFEX impls instead of "study"
> for fine-grained control over the MFEX impl in use, e.g.
> 
> ovs-appctl dpif-netdev/miniflow-parser-set avx512_vbmi_ipv4_udp
> 
> > Do we also need a way to set this per PMD?
> 
> I don't feel there is real value here, but we could investigate adding an
> optional parameter to the command indicating a PMD thread IDX to set?
> We have access to "pmd->core_id" in our set() function, so limiting changes
> to a specific PMD thread can be done ~ easily... but is it really required?
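
Harry's description of "study" (per-PMD hit counters for every MFEX implementation, then pick the first one, in table order, that crosses a packet threshold) can be modelled roughly as follows. The struct, the threshold value, and the function names are hypothetical stand-ins, not the patch's `struct study_stats`.

```c
#include <assert.h>
#include <stdint.h>

#define N_IMPLS 3
#define STUDY_THRESHOLD 128   /* Hypothetical packet threshold. */

/* Hypothetical per-PMD study state (one instance per PMD thread). */
struct study_model {
    uint32_t hits[N_IMPLS];   /* Packets each implementation matched. */
};

/* Accumulates one batch: masks[j] has bit i set if implementation j
 * classified packet i of the batch. */
static void
study_account(struct study_model *s, const uint32_t masks[N_IMPLS])
{
    for (int j = 0; j < N_IMPLS; j++) {
        s->hits[j] += (uint32_t) __builtin_popcount(masks[j]);
    }
}

/* Returns the index of the first implementation, in table order, whose
 * hit count has reached the threshold, or -1 if none has yet. */
static int
study_select(const struct study_model *s)
{
    for (int j = 0; j < N_IMPLS; j++) {
        if (s->hits[j] >= STUDY_THRESHOLD) {
            return j;
        }
    }
    return -1;
}
```

Because each PMD keeps its own state, two PMDs seeing different traffic can settle on different implementations, which is the per-PMD visibility concern raised in this thread.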

I think the concern here (at least from my side) is that users can
set the algorithm globally or per DP, not per PMD. However, the
study can set different algorithms per PMD. For example, say that
'study' selects alg#1 for PMD#1 and alg#2 for PMD#2 in the
lab. Now we want to move to production and make that selection
static. How can we do that?

If we set study, how do we tell from the cmdline which algorithm
was chosen for each PMD? Another example of the same situation: if
we always start with 'study' and suddenly there is a traffic
processing difference, how can one check what is different in
the settings? The logs don't tell which PMD was affected.
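
One possible answer to the visibility question is to record, per PMD, which implementation is active and whether "study" chose it, so that a show-style command could print it and a lab result could later be pinned explicitly. This is a hypothetical sketch only; neither the structure nor such a command exists in this patch series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-PMD record of the active miniflow extract choice. */
struct pmd_mfex_choice {
    unsigned core_id;   /* PMD thread's core, cf. pmd->core_id. */
    const char *impl;   /* Name of the implementation in use. */
    int from_study;     /* 1 if picked by "study", 0 if set explicitly. */
};

/* Formats one line per PMD into 'buf' of size 'len'.  Returns the number
 * of characters written, or -1 if 'buf' is too small. */
static int
format_mfex_choices(const struct pmd_mfex_choice *c, size_t n,
                    char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0) {
        return -1;
    }
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(buf + used, len - used, "pmd %u: %s%s\n",
                         c[i].core_id, c[i].impl,
                         c[i].from_study ? " (study)" : "");
        if (w < 0 || (size_t) w >= len - used) {
            return -1;
        }
        used += (size_t) w;
    }
    return (int) used;
}
```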
 
> Perfect is the enemy of good... I'd prefer focus on getting existing code
> changes merged, and add additional (optional) parameters in future if deemed
> useful in real world testing?

True. Perhaps we have different use cases in mind. How do you expect
users to use this feature? Do you think production users will always
start with 'study'?

Thanks,
fbl

Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function for miniflow extract

2021-06-29 Thread Flavio Leitner


Hi,

On Tue, Jun 29, 2021 at 03:20:36PM +, Amber, Kumar wrote:
> Hi Flavio,
> 
> Replies inline.
> 
> > >
> > > Guess the above needs to be atomic.
> > >
> > > Removed based on Flavio comments.
> > 
> > I asked to initialize that using an API and Eelco is asking to set it 
> > atomically.
> > The requests are complementary, right?
> > 
> 
> Yes, true; sorry for the confusion. We have refactored the code a bit to use
> atomic set and get along with the API wherever applicable. Since on any
> failure here we want to fall back to scalar, we do not need the API to find
> the default implementation.

OK, no problem. Looking forward to the next version.

> > >
> > > + VLOG_ERR("failed to get miniflow extract function implementations\n");
> > >
> > > Capital F to be in sync with your other error messages?
> > >
> > > Removed based on Flavio comments.
> > 
> > Not sure if I got this. I mentioned that the '\n' is not needed at the end of
> > all VLOG_* calls. Eelco is asking to start with capital 'F'. So the requests
> > are complementary, unless with the refactor the message went away.
> > 
> > Just make sure to follow the logging style convention in OVS.
> 
> Sorry for the confusion; I have fixed all the VLOGs to follow this convention.

great!

fbl

> > 
> > fbl
> > 
> > 
> > 
> > >
> > > + return 0;
> > > + }
> > > + ovs_assert(keys_size >= cnt);
> > >
> > > I don’t think we should assert here. Just return an error like above, so
> > > in production, we get notified, and this implementation gets disabled.
> > >
> > > Actually we do else one would most likely be overwriting the assigned
> > > array space for keys and will hit a Seg fault at some point.
> > >
> > > And hence we would like to know at the compile time if this is the case.
> > >
> > > + struct netdev_flow_key test_keys[NETDEV_MAX_BURST];
> > > +
> > > + /* Run scalar miniflow_extract to get default result. */
> > > + DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
> > > + pkt_metadata_init(>md, in_port);
> > > + miniflow_extract(packet, [i].mf);
> > > +
> > > + /* Store known good metadata to compare with optimized metadata. */
> > > + good_l2_5_ofs[i] = packet->l2_5_ofs;
> > > + good_l3_ofs[i] = packet->l3_ofs;
> > > + good_l4_ofs[i] = packet->l4_ofs;
> > > + good_l2_pad_size[i] = packet->l2_pad_size;
> > > + }
> > > +
> > > + /* Iterate through each version of miniflow implementations. */
> > > + for (int j = MFEX_IMPL_START_IDX; j < ARRAY_SIZE(mfex_impls); j++) {
> > > + if (!mfex_impls[j].available) {
> > > + continue;
> > > + }
> > > +
> > > + /* Reset keys and offsets before each implementation. */
> > > + memset(test_keys, 0, keys_size * sizeof(struct netdev_flow_key));
> > > + DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
> > > + dp_packet_reset_offsets(packet);
> > > + }
> > > + /* Call optimized miniflow for each batch of packet. */
> > > + uint32_t hit_mask = mfex_impls[j].extract_func(packets, test_keys,
> > > + keys_size, in_port, pmd_handle);
> > > +
> > > + /* Do a miniflow compare for bits, blocks and offsets for all the
> > > + * classified packets in the hitmask marked by set bits. */
> > > + while (hit_mask) {
> > > + /* Index for the set bit. */
> > > + uint32_t i = __builtin_ctz(hit_mask);
> > > + /* Set the index in hitmask to Zero. */
> > > + hit_mask &= (hit_mask - 1);
> > > +
> > > + uint32_t failed = 0;
> > > +
> > > + /* Check miniflow bits are equal. */
> > > + if ((keys[i].mf.map.bits[0] != test_keys[i].mf.map.bits[0]) ||
> > > + (keys[i].mf.map.bits[1] != test_keys[i].mf.map.bits[1])) {
> > > + VLOG_ERR("Good 0x%llx 0x%llx\tTest 0x%llx 0x%llx\n",
> > > + keys[i].mf.map.bits[0], keys[i].mf.map.bits[1],
> > > + test_keys[i].mf.map.bits[0], test_keys[i].mf.map.bits[1]);
> > > + failed = 1;
> > > + }
> > > +
> > > + if (!miniflow_equal([i].mf, _keys[i].mf)) {
> > > + uint32_t block_cnt = miniflow_n_values([i].mf);
> > > + VLOG_ERR("Autovalidation blocks failed for %s pkt %d",
> > > + mfex_impls[j].name, i);
> > > + VLOG_ERR(" Good hexdump:\n");
> > > + uint64_t *good_block_ptr = (uint64_t *)[i].buf;
> > > + uint64_t *test_block_ptr = (uint64_t *)_keys[i].buf;
> > > + for (uint32_t b = 0; b < block_cnt; b++) {
> > > + VLOG_ERR(" %"PRIx64"\n", good_block_ptr[b]);
> > > + }
> > > + VLOG_ERR(" Test hexdump:\n");
> > > + for (uint32_t b = 0; b < block_cnt; b++) {
> > > + VLOG_ERR(" %"PRIx64"\n", test_block_ptr[b]);
> > > + }
> > > + failed = 1;
> > > + }
> > > +
> > > + if ((packets->packets[i]->l2_pad_size != good_l2_pad_size[i]) ||
> > > + (packets->packets[i]->l2_5_ofs != good_l2_5_ofs[i]) ||
> > > + (packets->packets[i]->l3_ofs != good_l3_ofs[i]) ||
> > > + (packets->packets[i]->l4_ofs != good_l4_ofs[i])) {
> > > + VLOG_ERR("Autovalidation packet offsets failed for %s pkt %d",
> > > + mfex_impls[j].name, i);
> > > + VLOG_ERR(" Good offsets: l2_pad_size %u, l2_5_ofs : %u"
> > > + " l3_ofs %u, l4_ofs %u\n",
> > > + good_l2_pad_size[i], good_l2_5_ofs[i], good_l3_ofs[i],
> > > + good_l4_ofs[i]);
> > > + VLOG_ERR(" Test offsets: l2_pad_size %u, l2_5_ofs : %u"
> > > + " l3_ofs %u, l4_ofs %u\n",
> > > +
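
The quoted autovalidator only visits the packets an implementation classified, by iterating over the set bits of `hit_mask`: `__builtin_ctz()` yields the index of the lowest set bit, and `hit_mask &= (hit_mask - 1)` clears exactly that bit. A standalone sketch of the idiom (the `collect_hit_indexes()` name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Writes the index of every set bit of 'hit_mask', lowest first, into
 * 'out' and returns how many there were.  Each iteration costs one
 * count-trailing-zeros and one AND, regardless of gaps between bits. */
static int
collect_hit_indexes(uint32_t hit_mask, uint32_t out[32])
{
    int n = 0;

    while (hit_mask) {
        /* ctz(0) is undefined, but the loop condition rules it out. */
        out[n++] = (uint32_t) __builtin_ctz(hit_mask);
        /* x & (x - 1) clears the lowest set bit of x. */
        hit_mask &= hit_mask - 1;
    }
    return n;
}
```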

Re: [ovs-dev] [v4 07/12] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-06-29 Thread Flavio Leitner

On Tue, Jun 29, 2021 at 03:50:22PM +0200, Eelco Chaudron wrote:
> 
> 
> On 28 Jun 2021, at 4:57, Flavio Leitner wrote:
> 
> > Hi,
> >
> >
> > On Thu, Jun 17, 2021 at 09:57:49PM +0530, Kumar Amber wrote:
> >> Tests:
> >>   6: OVS-DPDK - MFEX Autovalidator
> >>   7: OVS-DPDK - MFEX Autovalidator Fuzzy
> >>
> >> Added a new directory to store the PCAP file used
> >> in the tests and a script to generate the fuzzy traffic
> >> type pcap to be used in fuzzy unit test.
> >
> >
> > I haven't tried this yet but am I right that these tests are
> > going to pass a pcap to send traffic in a busy loop for 5
> > seconds in the first case and 20 seconds in the second case?
> >
> > I see that when autovalidator is set OVS will crash if one
> > implementation returns a different value, so I wonder why
> > we need to run for that long.
> 
> I think we should remove the assert (already suggested by Harry),
> so it will not crash by accident if someone selects autovalidator
> in the field (and runs into an issue).
> Failure will then be detected by the ERROR log entries on shutdown.

That's true for the testsuite, but not in production as there is
nothing to disable that.

Perhaps if autovalidator detects an issue, it should log an ERROR
level log to report to testsuite, disable the failing mode and make
sure OVS is either in default or in another functional mode.

> I’m wondering if there is another way than a simple delay, as these tend to 
> cause issues later on. Can we check packets processed or something?

Yeah, maybe we can pass all packets like 5x at least.

fbl


> 
> > It is storing a python tool in the pcap directory. I think the
> > fuzzy tool could be called 'mfex_fuzzy.py' and stay in tests/
> > with other similar testing tools.
> >
> > Also, I don't think the test environment sets OVS_DIR. The
> > 'tests/' is actually $srcdir, but I could be wrong here.
> >
> > BTW, scapy is not mandatory to build or test OVS, so if that
> > tool is not available, the test should be skipped and not fail.
> >
> > Thanks,
> > fbl



Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function for miniflow extract

2021-06-29 Thread Flavio Leitner



Hi,

On Tue, Jun 29, 2021 at 12:27:57PM +, Amber, Kumar wrote:
> Hi Eelco,
> 
> Thanks Again for reviews , Pls find my replies inline.
> 
> From: Eelco Chaudron 
> Sent: Tuesday, June 29, 2021 5:14 PM
> To: Van Haaren, Harry ; Amber, Kumar
> Cc: d...@openvswitch.org; i.maxim...@ovn.org; Stokes, Ian ; Flavio Leitner
> Subject: Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function for miniflow extract
> 
> 
> On 17 Jun 2021, at 18:27, Kumar Amber wrote:
> 
> This patch introduced the auto-validation function which
> allows users to compare the batch of packets obtained from
> different miniflow implementations against the linear
> miniflow extract and return a hitmask.
> 
> The autovalidator function can be triggered at runtime using the
> following command:
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
> 
> Signed-off-by: Kumar Amber 
> mailto:kumar.am...@intel.com>>
> Co-authored-by: Harry van Haaren 
> mailto:harry.van.haa...@intel.com>>
> Signed-off-by: Harry van Haaren 
> mailto:harry.van.haa...@intel.com>>
> ---
> lib/dpif-netdev-private-extract.c | 141 ++
> lib/dpif-netdev-private-extract.h | 15 
> lib/dpif-netdev.c | 2 +-
> 3 files changed, 157 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/dpif-netdev-private-extract.c b/lib/dpif-netdev-private-extract.c
> index fcc56ef26..0741c19f9 100644
> --- a/lib/dpif-netdev-private-extract.c
> +++ b/lib/dpif-netdev-private-extract.c
> @@ -32,6 +32,11 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev_extract);
> 
> /* Implementations of available extract options. */
> static struct dpif_miniflow_extract_impl mfex_impls[] = {
> + {
> + .probe = NULL,
> + .extract_func = dpif_miniflow_extract_autovalidator,
> + .name = "autovalidator",
> + },
> {
> .probe = NULL,
> .extract_func = NULL,
> @@ -84,3 +89,139 @@ dpif_miniflow_extract_info_get(struct dpif_miniflow_extract_impl **out_ptr)
> *out_ptr = mfex_impls;
> return ARRAY_SIZE(mfex_impls);
> }
> +
> +uint32_t
> +dpif_miniflow_extract_autovalidator(struct dp_packet_batch *packets,
> + struct netdev_flow_key *keys,
> + uint32_t keys_size, odp_port_t in_port,
> + void *pmd_handle)
> +{
> + const size_t cnt = dp_packet_batch_size(packets);
> + uint16_t good_l2_5_ofs[NETDEV_MAX_BURST];
> + uint16_t good_l3_ofs[NETDEV_MAX_BURST];
> + uint16_t good_l4_ofs[NETDEV_MAX_BURST];
> + uint16_t good_l2_pad_size[NETDEV_MAX_BURST];
> + struct dp_packet *packet;
> + struct dp_netdev_pmd_thread *pmd = pmd_handle;
> + struct dpif_miniflow_extract_impl *miniflow_funcs;
> +
> + int32_t mfunc_count = dpif_miniflow_extract_info_get(_funcs);
> + if (mfunc_count < 0) {
> 
> In theory 0 could not be returned, but just to cover the corner case can we 
> change this to include zero.
> 
> The  code has been adapted as per Flavio comments so will not be a concern.
> 
> + pmd->miniflow_extract_opt = NULL;
> 
> Guess the above needs to be atomic.
> 
> Removed based on Flavio comments.

I asked to initialize that using an API and Eelco is asking to
set it atomically. The requests are complementary, right?
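
Both requests can be met by routing every write of the per-PMD function pointer through one small setter that stores it atomically, so the PMD thread never reads a half-updated value. OVS has its own wrappers for this in lib/ovs-atomic.h; the sketch below uses C11 `<stdatomic.h>` instead, and the struct, typedef, and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extractor signature. */
typedef uint32_t (*mfex_func)(void *batch);

static uint32_t
scalar_fallback(void *batch)
{
    (void) batch;
    return 0;
}

static uint32_t
fast_path(void *batch)
{
    (void) batch;
    return 1;
}

/* Hypothetical stand-in for the PMD thread's MFEX slot. */
struct pmd_model {
    _Atomic(mfex_func) miniflow_extract_opt;
};

/* The single "API" through which the pointer changes; NULL means "no
 * optimized implementation, use the default (scalar) path". */
static void
pmd_mfex_set(struct pmd_model *pmd, mfex_func f)
{
    atomic_store_explicit(&pmd->miniflow_extract_opt, f,
                          memory_order_release);
}

/* Reader side, as the PMD thread would use it for each batch. */
static mfex_func
pmd_mfex_get(struct pmd_model *pmd)
{
    mfex_func f = atomic_load_explicit(&pmd->miniflow_extract_opt,
                                       memory_order_acquire);
    return f ? f : scalar_fallback;
}
```

Keeping the store behind one function also gives a single place to add logging or validation later.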

> 
> + VLOG_ERR("failed to get miniflow extract function implementations\n");
> 
> Capital F to be in sync with your other error messages?
> 
> Removed based on Flavio comments.

Not sure if I got this. I mentioned that the '\n' is not needed at
the end of all VLOG_* calls. Eelco is asking to start with capital
'F'. So the requests are complementary, unless with the refactor
the message went away.

Just make sure to follow the logging style convention in OVS.

fbl



> 
> + return 0;
> + }
> + ovs_assert(keys_size >= cnt);
> 
> I don’t think we should assert here. Just return an error like above, so in 
> production, we get notified, and this implementation gets disabled.
> 
> Actually we do else one would most likely be overwriting the assigned array 
> space for keys and will hit a Seg fault at some point.
> 
> And hence we would like to know at the compile time if this is the case.
> 
> + struct netdev_flow_key test_keys[NETDEV_MAX_BURST];
> +
> + /* Run scalar miniflow_extract to get default result. */
> + DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
> + pkt_metadata_init(>md, in_port);
> + miniflow_extract(packet, [i].mf);
> +
> + /* Store known good metadata to compare with optimized metadata. */
> + good_l2_5_ofs[i] = packet->l2_5_ofs;
> + good_l3_ofs[i] = packet->l3_ofs;
> + good_l4_ofs[i] = packet->l4_ofs;
> + good_l2_pad_size[i] = packet->l2_pad_size;
> + }
> +
> + /* Iterate through each version of miniflow implementations. */

Re: [ovs-dev] [v4 10/12] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract

2021-06-29 Thread Flavio Leitner



Hi,

On Thu, Jun 17, 2021 at 09:57:52PM +0530, Kumar Amber wrote:
> From: Harry van Haaren 
> 
> This commit adds AVX512 implementations of miniflow extract.
> By using the 64 bytes available in an AVX512 register, it is
> possible to convert a packet to a miniflow data-structure in
> a small number of instructions.
> 
> The implementation here probes for Ether()/IP()/UDP() traffic,
> and builds the appropriate miniflow data-structure for packets
> that match the probe.
> 
> The implementation here is auto-validated by the miniflow
> extract autovalidator, hence its correctness can be easily
> tested and verified.
> 
> Note that this commit is designed to easily allow addition of new
> traffic profiles in a scalable way, without code duplication for
> each traffic profile.
> 
> Signed-off-by: Harry van Haaren 
> ---
>  lib/automake.mk   |   1 +
>  lib/dpif-netdev-extract-avx512.c  | 416 ++
>  lib/dpif-netdev-private-extract.c |  15 ++
>  lib/dpif-netdev-private-extract.h |  19 ++
>  4 files changed, 451 insertions(+)
>  create mode 100644 lib/dpif-netdev-extract-avx512.c
> 
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 3080bb04a..2b95d6f92 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -39,6 +39,7 @@ lib_libopenvswitchavx512_la_CFLAGS = \
>   $(AM_CFLAGS)
>  lib_libopenvswitchavx512_la_SOURCES = \
>   lib/dpif-netdev-lookup-avx512-gather.c \
> + lib/dpif-netdev-extract-avx512.c \
>   lib/dpif-netdev-avx512.c
>  lib_libopenvswitchavx512_la_LDFLAGS = \
>   -static
> diff --git a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c
> new file mode 100644
> index 0..1145ac8a9
> --- /dev/null
> +++ b/lib/dpif-netdev-extract-avx512.c
> @@ -0,0 +1,416 @@
> +/*
> + * Copyright (c) 2021 Intel.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */


Since this is very specific to AVX512, can we have a more verbose
comment here explaining how it works? See the 'dpif, the DataPath
InterFace.' in dpif.h as an example.


> +
> +#ifdef __x86_64__
> +/* Sparse cannot handle the AVX512 instructions. */
> +#if !defined(__CHECKER__)
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "flow.h"
> +#include "dpdk.h"
> +
> +#include "dpif-netdev-private-dpcls.h"
> +#include "dpif-netdev-private-extract.h"
> +#include "dpif-netdev-private-flow.h"
> +
> +/* AVX512-BW level permutex2var_epi8 emulation. */
> +static inline __m512i
> +__attribute__((target("avx512bw")))
> +_mm512_maskz_permutex2var_epi8_skx(__mmask64 k_mask,
> +   __m512i v_data_0,
> +   __m512i v_shuf_idxs,
> +   __m512i v_data_1)
> +{
> +/* Manipulate shuffle indexes for u16 size. */
> +__mmask64 k_mask_odd_lanes = 0x;
> +/* clear away ODD lane bytes. Cannot be done above due to no u8 shift */
> +__m512i v_shuf_idx_evn = _mm512_mask_blend_epi8(k_mask_odd_lanes,
> +v_shuf_idxs, _mm512_setzero_si512());
> +v_shuf_idx_evn = _mm512_srli_epi16(v_shuf_idx_evn, 1);
> +
> +__m512i v_shuf_idx_odd = _mm512_srli_epi16(v_shuf_idxs, 9);
> +
> +/* Shuffle each half at 16-bit width */
> +__m512i v_shuf1 = _mm512_permutex2var_epi16(v_data_0, v_shuf_idx_evn,
> +v_data_1);
> +__m512i v_shuf2 = _mm512_permutex2var_epi16(v_data_0, v_shuf_idx_odd,
> +v_data_1);
> +
> +/* Find if the shuffle index was odd, via mask and compare */
> +uint16_t index_odd_mask = 0x1;
> +const __m512i v_index_mask_u16 = _mm512_set1_epi16(index_odd_mask);
> +
> +/* EVEN lanes, find if u8 index was odd,  result as u16 bitmask */
> +__m512i v_idx_even_masked = _mm512_and_si512(v_shuf_idxs,
> + v_index_mask_u16);
> +__mmask32 evn_rotate_mask = _mm512_cmpeq_epi16_mask(v_idx_even_masked,
> +v_index_mask_u16);
> +
> +/* ODD lanes, find if u8 index was odd, result as u16 bitmask */
> +__m512i v_shuf_idx_srli8 = _mm512_srli_epi16(v_shuf_idxs, 8);
> +__m512i v_idx_odd_masked = _mm512_and_si512(v_shuf_idx_srli8,
> +v_index_mask_u16);
> +__mmask32 odd_rotate_mask =
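
For reviewing the `_mm512_maskz_permutex2var_epi8_skx()` emulation above (a byte-granular two-table shuffle built from 16-bit shuffles, for CPUs with AVX512-BW but without VBMI), a plain-C reference model of what the VBMI intrinsic `_mm512_maskz_permutex2var_epi8(k, a, idx, b)` computes may help. Bits 5:0 of each index byte pick a byte, bit 6 picks the table (`a` when clear, `b` when set), and lanes whose mask bit is clear are zeroed. This models the intrinsic's documented semantics on byte arrays; it is not the patch's code, and the function name is hypothetical. Comparing it against the emulation on random inputs would be one way to test the emulation.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of _mm512_maskz_permutex2var_epi8(k, a, idx, b) on
 * 64-byte vectors.  Bit 7 of each index byte is ignored. */
static void
ref_maskz_permutex2var_epi8(uint64_t k, const uint8_t a[64],
                            const uint8_t idx[64], const uint8_t b[64],
                            uint8_t out[64])
{
    for (int i = 0; i < 64; i++) {
        if (!((k >> i) & 1)) {
            out[i] = 0;                       /* Lane masked off. */
            continue;
        }
        const uint8_t *src = (idx[i] & 0x40) ? b : a;
        out[i] = src[idx[i] & 0x3F];
    }
}
```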

Re: [ovs-dev] [v4 09/12] dpdk: add additional CPU ISA detection strings

2021-06-27 Thread Flavio Leitner

On Thu, Jun 17, 2021 at 09:57:51PM +0530, Kumar Amber wrote:
> From: Harry van Haaren 
> 
> This commit enables OVS to at runtime check for more detailed
> AVX512 capabilities, specifically Byte and Word (BW) extensions,
> and Vector Bit Manipulation Instructions (VBMI).
> 
> These instructions will be used in the CPU ISA optimized
> implementations of traffic profile aware miniflow extract.
> 
> Signed-off-by: Harry van Haaren 
> ---

Acked-by: Flavio Leitner 

fbl
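
Going by its subject line, this patch adds the new ISA strings to OVS's DPDK-side CPU detection. As a standalone illustration of runtime detection of the same extensions, GCC and Clang provide `__builtin_cpu_supports()`, which accepts feature names including "avx512f", "avx512bw" and "avx512vbmi". This is a swapped-in technique, not the mechanism the patch uses, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* True if the running CPU supports the AVX512 subsets a VBMI-based
 * miniflow extract would need.  Non-x86 builds report false. */
static bool
cpu_has_avx512_vbmi_path(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* Only required before constructors run; harmless otherwise. */
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f")
           && __builtin_cpu_supports("avx512bw")
           && __builtin_cpu_supports("avx512vbmi");
#else
    return false;
#endif
}
```

Checking at runtime rather than relying on compile-time flags matters because the binary may run on a different CPU than the one it was built on.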

Re: [ovs-dev] [v4 08/12] dpif/stats: add miniflow extract opt hits counter

2021-06-27 Thread Flavio Leitner

On Thu, Jun 17, 2021 at 09:57:50PM +0530, Kumar Amber wrote:
> From: Harry van Haaren 
> 
> This commit adds a new counter to be displayed to the user when
> requesting datapath packet statistics. It counts the number of
> packets that are parsed and a miniflow built up from it by the
> optimized miniflow extract parsers.
> 
> The ovs-appctl command "dpif-netdev/pmd-perf-show" now has an
> extra entry indicating if the optimized MFEX was hit:
> 
>   - MFEX Opt hits:6786432  (100.0 %)
> 
> Signed-off-by: Harry van Haaren 
> ---
>  lib/dpif-netdev-avx512.c |  2 ++
>  lib/dpif-netdev-perf.c   |  3 +++
>  lib/dpif-netdev-perf.h   |  1 +
>  lib/dpif-netdev.c| 14 +-
>  tests/pmd.at |  6 --


It looks like this is missing an update to lib/dpif-netdev-unixctl.man.


>  5 files changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
> index bb99b23ff..f55786f8c 100644
> --- a/lib/dpif-netdev-avx512.c
> +++ b/lib/dpif-netdev-avx512.c
> @@ -297,8 +297,10 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
>  }
>  
>  /* At this point we don't return error anymore, so commit stats here. */
> +uint32_t mfex_hit = __builtin_popcountll(mf_mask);
>  pmd_perf_update_counter(>perf_stats, PMD_STAT_RECV, batch_size);
>  pmd_perf_update_counter(>perf_stats, PMD_STAT_PHWOL_HIT, phwol_hits);
> +pmd_perf_update_counter(>perf_stats, PMD_STAT_MFEX_OPT_HIT, mfex_hit);
>  pmd_perf_update_counter(>perf_stats, PMD_STAT_EXACT_HIT, emc_hits);
>  pmd_perf_update_counter(>perf_stats, PMD_STAT_SMC_HIT, smc_hits);
>  pmd_perf_update_counter(>perf_stats, PMD_STAT_MASKED_HIT,
> diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
> index 7103a2d4d..d7676ea2b 100644
> --- a/lib/dpif-netdev-perf.c
> +++ b/lib/dpif-netdev-perf.c
> @@ -247,6 +247,7 @@ pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
>  "  Rx packets:%12"PRIu64"  (%.0f Kpps, %.0f cycles/pkt)\n"
>  "  Datapath passes:   %12"PRIu64"  (%.2f passes/pkt)\n"
>  "  - PHWOL hits:  %12"PRIu64"  (%5.1f %%)\n"
> +"  - MFEX Opt hits:   %12"PRIu64"  (%5.1f %%)\n"
>  "  - EMC hits:%12"PRIu64"  (%5.1f %%)\n"
>  "  - SMC hits:%12"PRIu64"  (%5.1f %%)\n"
>  "  - Megaflow hits:   %12"PRIu64"  (%5.1f %%, %.2f "
> @@ -258,6 +259,8 @@ pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
>  passes, rx_packets ? 1.0 * passes / rx_packets : 0,
>  stats[PMD_STAT_PHWOL_HIT],
>  100.0 * stats[PMD_STAT_PHWOL_HIT] / passes,
> +stats[PMD_STAT_MFEX_OPT_HIT],
> +100.0 * stats[PMD_STAT_MFEX_OPT_HIT] / passes,
>  stats[PMD_STAT_EXACT_HIT],
>  100.0 * stats[PMD_STAT_EXACT_HIT] / passes,
>  stats[PMD_STAT_SMC_HIT],
> diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
> index 8b1a52387..834c26260 100644
> --- a/lib/dpif-netdev-perf.h
> +++ b/lib/dpif-netdev-perf.h
> @@ -57,6 +57,7 @@ extern "C" {
>  
>  enum pmd_stat_type {
>  PMD_STAT_PHWOL_HIT, /* Packets that had a partial HWOL hit (phwol). */
> +PMD_STAT_MFEX_OPT_HIT,  /* Packets that had miniflow optimized match. */
>  PMD_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */
>  PMD_STAT_SMC_HIT,   /* Packets that had a sig match hit (SMC). */
>  PMD_STAT_MASKED_HIT,/* Packets that matched in the flow table. */
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 35c927d55..7a8f15415 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -660,6 +660,7 @@ pmd_info_show_stats(struct ds *reply,
>"  packet recirculations: %"PRIu64"\n"
>"  avg. datapath passes per packet: %.02f\n"
>"  phwol hits: %"PRIu64"\n"
> +  "  mfex opt hits: %"PRIu64"\n"
>"  emc hits: %"PRIu64"\n"
>"  smc hits: %"PRIu64"\n"
>"  megaflow hits: %"PRIu64"\n"
> @@ -669,10 +670,9 @@ pmd_info_show_stats(struct ds *reply,
>"  avg. packets per output batch: %.02f\n",
>total_packets, stats[PMD_STAT_RECIRC],
>passes_per_pkt, stats[PMD_STAT_PHWOL_HIT],
> -  stats[PMD_STAT_EXACT_HIT],
> -  stats[PMD_STAT_SMC_HIT],
> -  stats[PMD_STAT_MASKED_HIT], lookups_per_hit,
> -  stats[PMD_STAT_MISS], stats[PMD_STAT_LOST],
> +  stats[PMD_STAT_MFEX_OPT_HIT], stats[PMD_STAT_EXACT_HIT],
> +  stats[PMD_STAT_SMC_HIT], stats[PMD_STAT_MASKED_HIT],
> +  lookups_per_hit, stats[PMD_STAT_MISS], stats[PMD_STAT_LOST],
>packets_per_batch);
>  
>  if (total_cycles == 0) {
> @@ -6863,7 +6863,7 @@ dfc_processing(struct
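
The counter's arithmetic fits in a few lines: per batch, the quoted AVX512 path adds `__builtin_popcountll(mf_mask)` to PMD_STAT_MFEX_OPT_HIT, and pmd_perf_format_overall_stats() prints it as `100.0 * stats[PMD_STAT_MFEX_OPT_HIT] / passes`. The struct and function names below are hypothetical, and the zero-passes guard is the sketch's own addition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-PMD counters. */
struct mfex_counters {
    uint64_t passes;        /* Datapath passes, modelled as packets. */
    uint64_t mfex_opt_hit;  /* Packets parsed by the optimized MFEX. */
};

/* Accounts one batch: bit i of 'mf_mask' is set if packet i was handled
 * by the optimized miniflow extract. */
static void
count_batch(struct mfex_counters *c, uint32_t batch_size, uint64_t mf_mask)
{
    c->passes += batch_size;
    c->mfex_opt_hit += (uint64_t) __builtin_popcountll(mf_mask);
}

/* Share of passes that hit the optimized MFEX, in percent. */
static double
mfex_hit_pct(const struct mfex_counters *c)
{
    return c->passes ? 100.0 * c->mfex_opt_hit / c->passes : 0.0;
}
```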

Re: [ovs-dev] [v4 07/12] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-06-27 Thread Flavio Leitner

Hi,


On Thu, Jun 17, 2021 at 09:57:49PM +0530, Kumar Amber wrote:
> Tests:
>   6: OVS-DPDK - MFEX Autovalidator
>   7: OVS-DPDK - MFEX Autovalidator Fuzzy
> 
> Added a new directory to store the PCAP file used
> in the tests and a script to generate the fuzzy traffic
> type pcap to be used in fuzzy unit test.


I haven't tried this yet, but am I right that these tests feed a
pcap that sends traffic in a busy loop for 5 seconds in the first
case and 20 seconds in the second case?

I see that when the autovalidator is set, OVS will crash if one
implementation returns a different value, so I wonder why
we need to run for that long.

The patch stores a Python tool in the pcap directory. I think the
fuzzy tool could be called 'mfex_fuzzy.py' and live in tests/
with other similar testing tools.

Also, I don't think the test environment sets OVS_DIR. The
'tests/' directory is actually $srcdir, but I could be wrong here.

BTW, Scapy is not required to build or test OVS, so if it is
not available, the test should be skipped rather than fail.

Thanks,
fbl


> 
> Signed-off-by: Kumar Amber 
> ---
>  tests/automake.mk|   5 +
>  tests/pcap/fuzzy.py  |  32 ++
>  tests/pcap/mfex_test | Bin 0 -> 416 bytes
>  tests/system-dpdk.at |  46 +++
>  4 files changed, 83 insertions(+)
>  create mode 100755 tests/pcap/fuzzy.py
>  create mode 100644 tests/pcap/mfex_test
> 
> diff --git a/tests/automake.mk b/tests/automake.mk
> index 1a528aa39..532875971 100644
> --- a/tests/automake.mk
> +++ b/tests/automake.mk
> @@ -142,6 +142,11 @@ $(srcdir)/tests/fuzz-regression-list.at: tests/automake.mk
>   echo "TEST_FUZZ_REGRESSION([$$basename])"; \
>   done > $@.tmp && mv $@.tmp $@
>  
> +EXTRA_DIST += $(MFEX_AUTOVALIDATOR_TESTS)
> +MFEX_AUTOVALIDATOR_TESTS = \
> + tests/pcap/mfex_test \
> + tests/pcap/fuzzy.py
> +
>  OVSDB_CLUSTER_TESTSUITE_AT = \
>   tests/ovsdb-cluster-testsuite.at \
>   tests/ovsdb-execution.at \
> diff --git a/tests/pcap/fuzzy.py b/tests/pcap/fuzzy.py
> new file mode 100755
> index 0..a8051ba2b
> --- /dev/null
> +++ b/tests/pcap/fuzzy.py
> @@ -0,0 +1,32 @@
> +#!/usr/bin/python3
> +try:
> +   from scapy.all import *
> +except ModuleNotFoundError as err:
> +   print(err + ": Scapy")
> +import sys
> +import os
> +
> +path = os.environ['OVS_DIR'] + "/tests/pcap/fuzzy"
> +pktdump = PcapWriter(path, append=False, sync=True)
> +
> +for i in range(0, 2000):
> +
> +   # Generate random protocol bases, use a fuzz() over the combined packet for full fuzzing.
> +   eth = Ether(src=RandMAC(), dst=RandMAC())
> +   vlan = Dot1Q()
> +   ipv4 = IP(src=RandIP(), dst=RandIP())
> +   ipv6 = IPv6(src=RandIP6(), dst=RandIP6())
> +   udp = UDP()
> +   tcp = TCP()
> +
> +   # IPv4 packets with fuzzing
> +   pktdump.write(fuzz(eth/ipv4/udp))
> +   pktdump.write(fuzz(eth/ipv4/tcp))
> +   pktdump.write(fuzz(eth/vlan/ipv4/udp))
> +   pktdump.write(fuzz(eth/vlan/ipv4/tcp))
> +
> +# IPv6 packets with fuzzing
> +   pktdump.write(fuzz(eth/ipv6/udp))
> +   pktdump.write(fuzz(eth/ipv6/tcp))
> +   pktdump.write(fuzz(eth/vlan/ipv6/udp))
> +   pktdump.write(fuzz(eth/vlan/ipv6/tcp))
> \ No newline at end of file
> diff --git a/tests/pcap/mfex_test b/tests/pcap/mfex_test
> new file mode 100644
> index 
> ..1aac67b8d643ecb016c758cba4cc32212a80f52a
> GIT binary patch
> literal 416
> zcmca|c+)~A1{MYw`2U}Qff2}QK`M68ITRa|G@yFii5$Gfk6YL%z>@uY&}o|
> z2s4N<1VH2&7y^V87$)XGOtD~MV$cFgfG~zBGGJ2#YtF$ xK>KST_NTIwYriok6N4Vm)gX-Q@c^{cp<7_5LgK^UuU{2>VS0RZ!RQ+EIW
> 
> literal 0
> HcmV?d1
> 
> diff --git a/tests/system-dpdk.at b/tests/system-dpdk.at
> index 802895488..46eaea35a 100644
> --- a/tests/system-dpdk.at
> +++ b/tests/system-dpdk.at
> @@ -232,3 +232,49 @@ OVS_VSWITCHD_STOP(["\@does not exist. The Open vSwitch kernel module is probably
>  \@EAL: No free hugepages reported in hugepages-1048576kB@d"])
>  AT_CLEANUP
>  dnl --
> +
> +dnl --
> +dnl Add standard DPDK PHY port
> +AT_SETUP([OVS-DPDK - MFEX Autovalidator])
> +AT_KEYWORDS([dpdk])
> +
> +OVS_DPDK_START()
> +
> +dnl Add userspace bridge and attach it to OVS
> +AT_CHECK([ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev])
> +AT_CHECK([ovs-vsctl add-port br0 p1 -- set Interface p1 type=dpdk options:dpdk-devargs=net_pcap1,rx_pcap=$OVS_DIR/tests/pcap/mfex_test,infinite_rx=1], [], [stdout], [stderr])
> +AT_CHECK([ovs-vsctl show], [], [stdout])
> +
> +
> +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set autovalidator], [0], [dnl
> +Miniflow implementation set to autovalidator.
> +])
> +sleep 5
> +
> +dnl Clean up
> +AT_CHECK([ovs-vsctl del-port br0 p1], [], [stdout], [stderr])
> +AT_CLEANUP
> +dnl --
> +
> +dnl 
>

Re: [ovs-dev] [v4 06/12] dpif-netdev: Add additional packet count parameter for study function

2021-06-27 Thread Flavio Leitner



Hi,

On Thu, Jun 17, 2021 at 09:57:48PM +0530, Kumar Amber wrote:
> This commit introduces additonal command line paramter
    
Typos.

> for mfex study function. If user provides additional packet out
> it is used in study to compare minimum packets which must be processed
> else a default value is choosen.
> 
> $ OVS_DIR/utilities/ovs-appctl dpif-netdev/miniflow-parser-set study 500

There is no need to include "OVS_DIR/utilities/" as it depends
on each particular deployment.

> 
> Signed-off-by: Kumar Amber 
> ---
>  Documentation/topics/dpdk/bridge.rst |  8 ++-
>  lib/dpif-netdev-extract-study.c  | 15 +++-
>  lib/dpif-netdev-private-extract.h|  8 +++
>  lib/dpif-netdev.c| 34 +++-
>  4 files changed, 57 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
> index 1c78adc75..e7e91289a 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -288,7 +288,13 @@ An implementation can be selected manually by the following command ::
>  Also user can select the study implementation which studies the traffic for
>  a specific number of packets by applying all availbale implementaions of
>  miniflow extract and than chooses the one with most optimal result for that
> -traffic pattern.
> +traffic pattern. User can also provide additonal parameter as packet count
> +which is minimum packets which OVS must study before choosing optimal
> +implementation, If no packet count is provided than default value is choosen.
> +
> +Study can be selected with packet count by the following command ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024

Ian already commented here.


>  
>  Miniflow Extract Validation
>  ~~~
> diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
> index d063d040c..c48fb125e 100644
> --- a/lib/dpif-netdev-extract-study.c
> +++ b/lib/dpif-netdev-extract-study.c
> @@ -55,6 +55,19 @@ get_study_stats(void)
>  return stats;
>  }
>  
> +static uint32_t pkt_compare_count = 0;

If we had a convention as mentioned in patch #3, this could be
mfex_study_compare_count or maybe mfex_study_pkts_count.

> +
> +uint32_t mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count,
> +struct dpif_miniflow_extract_impl *opt)
> +{
> +if ((opt->extract_func == mfex_study_traffic) && (pkt_cmp_count != 0)) {
> +pkt_compare_count = pkt_cmp_count;
> +return 0;
> +}
> +pkt_compare_count = MFEX_MAX_COUNT;
> +return -EINVAL;

The documentation is not here, so I'm not sure whether it's missing
or whether a later patch will update at least the man page. My
suggestion would be to allow 0 to reset to the built-in default.

> +}
> +
>  uint32_t
>  mfex_study_traffic(struct dp_packet_batch *packets,
> struct netdev_flow_key *keys,
> @@ -87,7 +100,7 @@ mfex_study_traffic(struct dp_packet_batch *packets,
>  
>  /* Choose the best implementation after a minimum packets have been
>   * processed. */
> -if (stats->pkt_count >= MFEX_MAX_COUNT) {
> +if (stats->pkt_count >= pkt_compare_count) {
>  uint32_t best_func_index = MFEX_IMPL_START_IDX;
>  uint32_t max_hits = 0;
>  for (int i = MFEX_IMPL_START_IDX; i < impl_count; i++) {
> diff --git a/lib/dpif-netdev-private-extract.h b/lib/dpif-netdev-private-extract.h
> index d8a284db7..0ec74bef9 100644
> --- a/lib/dpif-netdev-private-extract.h
> +++ b/lib/dpif-netdev-private-extract.h
> @@ -127,5 +127,13 @@ dpif_miniflow_extract_get_default(void);
>   * overridden at runtime. */
>  void
>  dpif_miniflow_extract_set_default(miniflow_extract_func func);
> +/* Sets the packet count from user to the stats for use in
> + * study function to match against the classified packets to choose
> + * the optimal implementation.
> + * On error, returns EINVAL.
> + * On success, returns 0.
> + */
> +uint32_t mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count,
> +struct dpif_miniflow_extract_impl *opt);
>  
>  #endif /* DPIF_NETDEV_AVX512_EXTRACT */
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 716e0debf..35c927d55 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -1141,14 +1141,29 @@ dpif_miniflow_extract_impl_set(struct unixctl_conn *conn, int argc,
>  return;
>  }
>  new_func = opt->extract_func;
> -/* argv[2] is optional datapath instance. If no datapath name is provided.
> +
> +/* argv[2] is optional packet count, which user can provide along with
> + * study function to set the minimum packet that must be matched in order
> + * to choose the optimal function. */
> +uint32_t pkt_cmp_count = 0;
> +uint32_t study_ret;
> +if (argc == 3) {
> +char *err_str;
> +pkt_cmp_count =

Re: [ovs-dev] [v4 05/12] dpif-netdev: Add configure to enable autovalidator at build time.

2021-06-27 Thread Flavio Leitner




Hi,

I think Ian covered all the issues, and I suspect this patch will
change a bit due to comments on previous patches.

fbl

On Thu, Jun 17, 2021 at 09:57:47PM +0530, Kumar Amber wrote:
> This commit adds a new command to allow the user to enable
> autovalidatior by default at build time thus allowing for
> runnig unit test by default.
> 
>  $ ./configure --enable-mfex-default-autovalidator
> 
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
> ---
>  Documentation/topics/dpdk/bridge.rst |  5 +
>  NEWS | 12 +++-
>  acinclude.m4 | 16 
>  configure.ac |  1 +
>  lib/dpif-netdev-private-extract.c| 24 
>  lib/dpif-netdev-private-extract.h| 10 ++
>  lib/dpif-netdev.c|  7 +--
>  7 files changed, 72 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
> index b262b98f8..1c78adc75 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -307,6 +307,11 @@ To set the Miniflow autovalidator, use this command ::
>  
>  $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
>  
> +A compile time option is available in order to test it with the OVS unit
> +test suite. Use the following configure option ::
> +
> +$ ./configure --enable-mfex-default-autovalidator
> +
>  Unit Test Miniflow Extract
>  ++
>  
> diff --git a/NEWS b/NEWS
> index 63a485309..ed9f4d4c4 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -24,6 +24,17 @@ Post-v2.15.0
>   * An optimized miniflow extract (mfex) implementation is now available,
> which uses CPU SIMD ISA to parse specific traffic profiles 
> efficiently.
> Refer to the documentation for details on how to enable it at runtime.
> + * Cache results for CPU ISA checks, reduces overhead on repeated lookups.
> + * Add command line option to switch between mfex function pointers.
> + * Add miniflow extract auto-validator function to compare different
> +   miniflow extract implementations against default implementation.
> + * Add study function to miniflow function table which studies packet
> +   and automatically chooses the best miniflow implementation for that
> +   traffic.
> + * Add AVX512 based optimized miniflow extract function for traffic type
> +   IP/UDP.
> + * Add build time configure command to enable auto-validatior as default
> +   miniflow implementation at build time.
> - ovs-ctl:
>   * New option '--no-record-hostname' to disable hostname configuration
> in ovsdb on startup.
> @@ -35,7 +46,6 @@ Post-v2.15.0
>   * New option '--election-timer' to the 'create-cluster' command to set the
> leader election timer during cluster creation.
>  
> -
>  v2.15.0 - 15 Feb 2021
>  -
> - OVSDB:
> diff --git a/acinclude.m4 b/acinclude.m4
> index 5fbcd9872..e2704cfda 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -14,6 +14,22 @@
>  # See the License for the specific language governing permissions and
>  # limitations under the License.
>  
> +dnl Set OVS MFEX Autovalidator as default miniflow extract at compile time?
> +dnl This enables automatically running all unit tests with all MFEX
> +dnl implementations.
> +AC_DEFUN([OVS_CHECK_MFEX_AUTOVALIDATOR], [
> +  AC_ARG_ENABLE([mfex-default-autovalidator],
> +[AC_HELP_STRING([--enable-mfex-default-autovalidator], [Enable MFEX autovalidator as default miniflow_extract implementation.])],
> +[autovalidator=yes],[autovalidator=no])
> +  AC_MSG_CHECKING([whether MFEX Autovalidator is default implementation])
> +  if test "$autovalidator" != yes; then
> +AC_MSG_RESULT([no])
> +  else
> +OVS_CFLAGS="$OVS_CFLAGS -DMFEX_AUTOVALIDATOR_DEFAULT"
> +AC_MSG_RESULT([yes])
> +  fi
> +])
> +
>  dnl Set OVS DPCLS Autovalidator as default subtable search at compile time?
>  dnl This enables automatically running all unit tests with all DPCLS
>  dnl implementations.
> diff --git a/configure.ac b/configure.ac
> index e45685a6c..46c402892 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -186,6 +186,7 @@ OVS_ENABLE_SPARSE
>  OVS_CTAGS_IDENTIFIERS
>  OVS_CHECK_DPCLS_AUTOVALIDATOR
>  OVS_CHECK_DPIF_AVX512_DEFAULT
> +OVS_CHECK_MFEX_AUTOVALIDATOR
>  OVS_CHECK_BINUTILS_AVX512
>  
>  AC_ARG_VAR(KARCH, [Kernel Architecture String])
> diff --git a/lib/dpif-netdev-private-extract.c b/lib/dpif-netdev-private-extract.c
> index d86268a1d..2008e5ee5 100644
> --- a/lib/dpif-netdev-private-extract.c
> +++ b/lib/dpif-netdev-private-extract.c
> @@ -230,3 +230,27 @@ dpif_miniflow_extract_autovalidator(struct dp_packet_batch *packets,
>   */
>  return 0;
>  }
> +
> +/* Variable to hold the defaualt mfex implementation. */
> +static

Re: [ovs-dev] [v4 04/12] docs/dpdk/bridge: add miniflow extract section.

2021-06-27 Thread Flavio Leitner

On Thu, Jun 17, 2021 at 09:57:46PM +0530, Kumar Amber wrote:
> This commit adds a section to the dpdk/bridge.rst netdev documentation,
> detailing the added miniflow functionality. The newly added commands are
> documented, and sample output is provided.
> 
> The use of auto-validator and special study function is also described
> in detail as well as running fuzzy tests.

Usually we require the NEWS entry and documentation to be added in
the same patch that adds the feature, because they help document the
patch and keep the sources consistent when bisecting.

> 
> Signed-off-by: Kumar Amber 
> Co-authored-by: Cian Ferriter 
> Signed-off-by: Cian Ferriter 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
> ---
>  Documentation/topics/dpdk/bridge.rst | 105 +++
>  NEWS |   3 +
>  2 files changed, 108 insertions(+)
> 
> diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
> index f59e26cbe..b262b98f8 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -256,3 +256,108 @@ The following line should be seen in the configure output when the above option
>  is used ::
>  
>  checking whether DPIF AVX512 is default implementation... yes
> +
> +Miniflow Extract
> +
> +
> +Miniflow extract (MFEX) performs parsing of the raw packets and extracts the
> +important header information into a compressed miniflow. This miniflow is
> +composed of bits and blocks where the bits signify which blocks are set or
> +have values where as the blocks hold the metadata, ip, udp, vlan, etc. These
> +values are used by the datapath for switching decisions later.
> +
> +Most modern CPUs are have SIMD capabilities. These SIMD instructions are able

Have?

> +to process a vector rather than act on one single data. OVS provides multiple
> +implementations of miniflow extract. This allows the user to take advantage
> +of SIMD instructions like AVX512 to gain additional performance.
> +
> +A list of implementations can be obtained by the following command. The
> +command also shows whether the CPU supports each implementation ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-get
> +Available Optimized Miniflow Extracts:
> +  autovalidator (available: True)
> +  disable (available: True)
> +  study (available: True)
> +  avx512_ip_udp (available: True)
> +
> +An implementation can be selected manually by the following command ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set study
> +
> +Also user can select the study implementation which studies the traffic for
> +a specific number of packets by applying all availbale implementaions of

Typos already mentioned.

> +miniflow extract and than chooses the one with most optimal result for that
> +traffic pattern.
> +
> +Miniflow Extract Validation
> +~~~
> +
> +As multiple versions of miniflow extract can co-exist, each with different
> +CPU ISA optimizations, it is important to validate that they all give the
> +exact same results. To easily test all miniflow implementations, an
> +``autovalidator`` implementation of the miniflow exists. This implementation
> +runs all other available miniflow extract implementations, and verifies that
> +the results are identical.
> +
> +Running the OVS unit tests with the autovalidator enabled ensures all
> +implementations provide the same results.
> +
> +To set the Miniflow autovalidator, use this command ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
> +
> +Unit Test Miniflow Extract
> +++
> +
> +Unit test can also be used to test the workflow mentioned above by running
> +the following test-case in tests/system-dpdk.at ::
> +
> +make check-dpdk TESTSUITEFLAGS=6
> +6: OVS-DPDK - MFEX Autovalidator

This will change over time. Can we use -k  instead?

> +
> +The unit test uses mulitple traffic type to test the correctness of the
> +implementaions.
> +
> +Running Fuzzy test with Autovalidator
> ++
> +
> +Fuzzy tests can also be done on minfilow extract with the help of

Typo.

> +auto-validator and Scapy. The steps below describes the steps to
> +reproduce the setup with IP being fuzzed to generate packets.
> +
> +Scapy is used to create fuzzy IP packets and save them into a PCAP ::
> +
> +pkt = fuzz(Ether()/IP()/TCP())
> +
> +Set the miniflow extract to autovalidator using ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
> +
> +OVS is configured to receive the generated packets ::
> +
> +$ ovs-vsctl add-port br0 pcap0 -- \
> +set Interface pcap0 type=dpdk options:dpdk-devargs=net_pcap0
> +"rx_pcap=fuzzy.pcap"

This comes back to the point of adding the doc along with the test
because I can't see how the test works in this patch.


> +
>

Re: [ovs-dev] [v4 03/12] dpif-netdev: Add study function to select the best mfex function

2021-06-27 Thread Flavio Leitner



Hi,

On Thu, Jun 17, 2021 at 09:57:45PM +0530, Kumar Amber wrote:
> The study function runs all the available implementations
> of miniflow_extract and makes a choice whose hitmask has
> maximum hits and sets the mfex to that function.
> 
> Study can be run at runtime using the following command:
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-set study

Nice!
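The selection step described above can be sketched in plain C for
readers of the thread. This illustrates the idea under assumptions
and is not the patch: struct mfex_study_stats, MFEX_IMPLS_MAX and
mfex_study_pick_best() are hypothetical stand-ins, and index 0 is
assumed to hold the autovalidator, which is skipped via
MFEX_IMPL_START_IDX as in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define MFEX_IMPLS_MAX 8        /* Assumed upper bound on implementations. */
#define MFEX_IMPL_START_IDX 1   /* Assumed: index 0 is the autovalidator. */

struct mfex_study_stats {
    uint32_t pkt_count;                      /* Packets studied so far. */
    uint32_t impl_hitcount[MFEX_IMPLS_MAX];  /* Hits per implementation. */
};

/* Returns the index of the implementation with the most hits once at
 * least 'pkt_threshold' packets have been studied, or -1 if the study
 * should continue or no implementation reached 'min_hits'. */
int
mfex_study_pick_best(const struct mfex_study_stats *stats, int impl_count,
                     uint32_t pkt_threshold, uint32_t min_hits)
{
    assert(impl_count <= MFEX_IMPLS_MAX);
    if (stats->pkt_count < pkt_threshold) {
        return -1;
    }

    int best = -1;
    uint32_t max_hits = 0;
    for (int i = MFEX_IMPL_START_IDX; i < impl_count; i++) {
        if (stats->impl_hitcount[i] > max_hits) {
            max_hits = stats->impl_hitcount[i];
            best = i;
        }
    }
    return max_hits >= min_hits ? best : -1;
}
```

Returning -1 lets the caller keep studying (or stay on the scalar
path) when no implementation reaches the threshold, similar in spirit
to MFEX_MIN_HIT_COUNT_FOR_USE in the patch.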


> 
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
> ---
>  lib/automake.mk   |   1 +
>  lib/dpif-netdev-extract-study.c   | 119 ++
>  lib/dpif-netdev-private-extract.c |   5 ++
>  lib/dpif-netdev-private-extract.h |  14 +++-
>  4 files changed, 138 insertions(+), 1 deletion(-)
>  create mode 100644 lib/dpif-netdev-extract-study.c
> 
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 6657b9ae5..3080bb04a 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -114,6 +114,7 @@ lib_libopenvswitch_la_SOURCES = \
>   lib/dpif-netdev.c \
>   lib/dpif-netdev.h \
>   lib/dpif-netdev-private-dfc.c \
> + lib/dpif-netdev-extract-study.c \

Wrong order?

>   lib/dpif-netdev-private-dfc.h \
>   lib/dpif-netdev-private-dpcls.h \
>   lib/dpif-netdev-private-dpif.c \
> diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
> new file mode 100644
> index 0..d063d040c
> --- /dev/null
> +++ b/lib/dpif-netdev-extract-study.c
> @@ -0,0 +1,119 @@
> +/*
> + * Copyright (c) 2021 Intel.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "dpif-netdev-private-extract.h"
> +#include "dpif-netdev-private-thread.h"
> +#include "openvswitch/vlog.h"
> +#include "ovs-thread.h"
> +
> +VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);
> +
> +/* Max size of packets to be compared. */

Size or number?

> +#define MFEX_MAX_COUNT (128)
> +
> +/* This value is the threshold for the amount of packets that
> + * must hit on the optimized miniflow extract before it will be
> + * accepted and used in the datapath after the study phase. */
> +#define MFEX_MIN_HIT_COUNT_FOR_USE (MFEX_MAX_COUNT / 2)
> +
> +/* Struct to hold miniflow study stats. */
> +struct study_stats {
> +uint32_t pkt_count;
> +uint32_t impl_hitcount[MFEX_IMPLS_MAX_SIZE];
> +};
> +
> +/* Define per thread data to hold the study stats. */
> +DEFINE_PER_THREAD_MALLOCED_DATA(struct study_stats *, study_stats);
> +
> +/* Allocate per thread PMD pointer space for study_stats. */
> +static inline struct study_stats *
> +get_study_stats(void)

Please define a prefix for the names in this module, for
example mfex_study_, so that we have a convention.


> +{
> +struct study_stats *stats = study_stats_get();
> +if (OVS_UNLIKELY(!stats)) {
> +   stats = xzalloc(sizeof *stats);
> +   study_stats_set_unsafe(stats);
> +}
> +return stats;
> +}
> +
> +uint32_t
> +mfex_study_traffic(struct dp_packet_batch *packets,
> +   struct netdev_flow_key *keys,
> +   uint32_t keys_size, odp_port_t in_port,
> +   void *pmd_handle)
> +{
> +uint32_t hitmask = 0;
> +uint32_t mask = 0;
> +struct dp_netdev_pmd_thread *pmd = pmd_handle;
> +struct dpif_miniflow_extract_impl *miniflow_funcs;
> +uint32_t impl_count = dpif_miniflow_extract_info_get(&miniflow_funcs);
> +struct study_stats *stats = get_study_stats();
> +
> +/* Run traffic optimized miniflow_extract to collect the hitmask
> + * to be compared after certain packets have been hit to choose
> + * the best miniflow_extract version for that traffic. */
> +for (int i = MFEX_IMPL_START_IDX; i < impl_count; i++) {
> +if (miniflow_funcs[i].available) {
> +hitmask = miniflow_funcs[i].extract_func(packets, keys, keys_size,
> + in_port, pmd_handle);
> +stats->impl_hitcount[i] += count_1bits(hitmask);
> +
> +/* If traffic is not classified than we dont overwrite the keys
> + * array in minfiflow implementations so its safe to create a
> + * mask for all those packets whose miniflow have been created. */
> +mask |= hitmask;
> +}
> +}
> +stats->pkt_count += dp_packet_batch_size(packets);
> +
> +/* Choose the best implementation after a minimum packets have been
> + * processed.

Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function for miniflow extract

2021-06-27 Thread Flavio Leitner



Hi,

I haven't tested the patch set yet.
I left some comments in line.

On Thu, Jun 17, 2021 at 09:57:44PM +0530, Kumar Amber wrote:
> This patch introduced the auto-validation function which
> allows users to compare the batch of packets obtained from
> different miniflow implementations against the linear
> miniflow extract and return a hitmask.
> 
> The autovaidator function can be triggered at runtime using the
> following command:
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
> 
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
> ---
>  lib/dpif-netdev-private-extract.c | 141 ++
>  lib/dpif-netdev-private-extract.h |  15 
>  lib/dpif-netdev.c |   2 +-
>  3 files changed, 157 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/dpif-netdev-private-extract.c b/lib/dpif-netdev-private-extract.c
> index fcc56ef26..0741c19f9 100644
> --- a/lib/dpif-netdev-private-extract.c
> +++ b/lib/dpif-netdev-private-extract.c
> @@ -32,6 +32,11 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev_extract);
>  
>  /* Implementations of available extract options. */
>  static struct dpif_miniflow_extract_impl mfex_impls[] = {
> +   {
> +.probe = NULL,
> +.extract_func = dpif_miniflow_extract_autovalidator,
> +.name = "autovalidator",
> +},

Please define an enum for each entry. Also document that the
autovalidator must be the first entry, and point readers to the
comment on MFEX_IMPL_START_IDX for more details.

>  {
>  .probe = NULL,
>  .extract_func = NULL,
> @@ -84,3 +89,139 @@ dpif_miniflow_extract_info_get(struct dpif_miniflow_extract_impl **out_ptr)
>  *out_ptr = mfex_impls;
>  return ARRAY_SIZE(mfex_impls);
>  }
> +
> +uint32_t
> +dpif_miniflow_extract_autovalidator(struct dp_packet_batch *packets,
> +struct netdev_flow_key *keys,
> +uint32_t keys_size, odp_port_t in_port,
> +void *pmd_handle)
> +{
> +const size_t cnt = dp_packet_batch_size(packets);
> +uint16_t good_l2_5_ofs[NETDEV_MAX_BURST];
> +uint16_t good_l3_ofs[NETDEV_MAX_BURST];
> +uint16_t good_l4_ofs[NETDEV_MAX_BURST];
> +uint16_t good_l2_pad_size[NETDEV_MAX_BURST];
> +struct dp_packet *packet;
> +struct dp_netdev_pmd_thread *pmd = pmd_handle;
> +struct dpif_miniflow_extract_impl *miniflow_funcs;
> +
> +int32_t mfunc_count = dpif_miniflow_extract_info_get(&miniflow_funcs);
> +if (mfunc_count < 0) {
> +pmd->miniflow_extract_opt = NULL;
> +VLOG_ERR("failed to get miniflow extract function implementations\n");

No need to terminate the message with \n here or in the other VLOG_*() calls.

> +return 0;
> +}

> +ovs_assert(keys_size >= cnt);
> +struct netdev_flow_key test_keys[NETDEV_MAX_BURST];
> +
> +/* Run scalar miniflow_extract to get default result. */
> +DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
> +pkt_metadata_init(&packet->md, in_port);
> +miniflow_extract(packet, &keys[i].mf);
> +
> +/* Store known good metadata to compare with optimized metadata. */
> +good_l2_5_ofs[i] = packet->l2_5_ofs;
> +good_l3_ofs[i] = packet->l3_ofs;
> +good_l4_ofs[i] = packet->l4_ofs;
> +good_l2_pad_size[i] = packet->l2_pad_size;
> +}
> +
> +/* Iterate through each version of miniflow implementations. */
> +for (int j = MFEX_IMPL_START_IDX; j < ARRAY_SIZE(mfex_impls); j++) {
> +if (!mfex_impls[j].available) {
> +continue;
> +}
> +
> +/* Reset keys and offsets before each implementation. */
> +memset(test_keys, 0, keys_size * sizeof(struct netdev_flow_key));
> +DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
> +dp_packet_reset_offsets(packet);
> +}
> +/* Call optimized miniflow for each batch of packet. */
> +uint32_t hit_mask = mfex_impls[j].extract_func(packets, test_keys,
> +keys_size, in_port, pmd_handle);
> +
> +/* Do a miniflow compare for bits, blocks and offsets for all the
> + * classified packets in the hitmask marked by set bits. */
> +while (hit_mask) {
> +/* Index for the set bit. */
> +uint32_t i = __builtin_ctz(hit_mask);
> +/* Set the index in hitmask to Zero. */
> +hit_mask &= (hit_mask - 1);
> +
> +uint32_t failed = 0;
> +
> +/* Check miniflow bits are equal. */
> +if ((keys[i].mf.map.bits[0] != test_keys[i].mf.map.bits[0]) ||
> +(keys[i].mf.map.bits[1] != test_keys[i].mf.map.bits[1])) {
> +VLOG_ERR("Good 0x%llx 0x%llx\tTest 0x%llx 0x%llx\n",
> + keys[i].mf.map.bits[0], keys[i].mf.map.bits[1],
> + test_keys[i].mf.map.bits[0],
> +

Re: [ovs-dev] [v4 01/12] dpif-netdev: Add command line and function pointer for miniflow extract

2021-06-27 Thread Flavio Leitner



Hi,

I am reviewing this patch ignoring the things that were
already pointed out in other reviews in the ML.

On Thu, Jun 17, 2021 at 09:57:43PM +0530, Kumar Amber wrote:
> This patch introduces the mfex function pointers which allows
> the user to switch between different miniflow extract implementations
> which are provided by the OVS based on optimized ISA CPU.
> 
> The user can query for the available minflow extract variants available
> for that CPU by following commands:
> 
> $ovs-appctl dpif-netdev/miniflow-parser-get
> 
> Similarly an user can set the miniflow implementation by the following
> command :
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-set name
> 
> This allow for more performance and flexibility to the user to choose
> the miniflow implementation according to the needs.
> 
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
> ---
>  lib/automake.mk   |   2 +
>  lib/dpif-netdev-avx512.c  |  32 ++--
>  lib/dpif-netdev-private-extract.c |  86 
>  lib/dpif-netdev-private-extract.h |  94 ++
>  lib/dpif-netdev-private-thread.h  |   4 +
>  lib/dpif-netdev.c | 126 +-
>  6 files changed, 337 insertions(+), 7 deletions(-)
>  create mode 100644 lib/dpif-netdev-private-extract.c
>  create mode 100644 lib/dpif-netdev-private-extract.h
> 
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 49f42c2a3..6657b9ae5 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -118,6 +118,8 @@ lib_libopenvswitch_la_SOURCES = \
>   lib/dpif-netdev-private-dpcls.h \
>   lib/dpif-netdev-private-dpif.c \
>   lib/dpif-netdev-private-dpif.h \
> + lib/dpif-netdev-private-extract.c \
> + lib/dpif-netdev-private-extract.h \
>   lib/dpif-netdev-private-flow.h \
>   lib/dpif-netdev-private-hwol.h \
>   lib/dpif-netdev-private-thread.h \
> diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
> index f9b199637..bb99b23ff 100644
> --- a/lib/dpif-netdev-avx512.c
> +++ b/lib/dpif-netdev-avx512.c
> @@ -148,6 +148,15 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
>   * // do all processing (HWOL->MFEX->EMC->SMC)
>   * }
>   */
> +
> +/* Do a batch minfilow extract into keys. */
> +uint32_t mf_mask = 0;
> +if (pmd->miniflow_extract_opt) {
> +mf_mask = pmd->miniflow_extract_opt(packets, keys,
> +batch_size, in_port,
> +(void *) pmd);
> +}
> +/* Perform first packet interation */
>  uint32_t lookup_pkts_bitmask = (1ULL << batch_size) - 1;
>  uint32_t iter = lookup_pkts_bitmask;
>  while (iter) {
> @@ -159,6 +168,12 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
>  pkt_metadata_init(&packet->md, in_port);
>  
>  struct dp_netdev_flow *f = NULL;
> +struct netdev_flow_key *key = &keys[i];
> +
> +/* Check the minfiflow mask to see if the packet was correctly
> +* classifed by vector mfex else do a scalar miniflow extract
> +* for that packet. */
> +uint32_t mfex_hit = (mf_mask & (1 << i));

This is actually a bool.
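For example (the helper name is hypothetical), testing the bit with an
unsigned constant also avoids the undefined behavior of shifting 1
into the sign bit of an int when i == 31, which can happen with a
full batch of NETDEV_MAX_BURST (32) packets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if packet 'i' of the batch was classified by the
 * optimized miniflow extract, i.e. bit 'i' of 'mf_mask' is set. */
static inline bool
mfex_packet_hit(uint32_t mf_mask, unsigned int i)
{
    return (mf_mask & (UINT32_C(1) << i)) != 0;
}
```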

>  
>  /* Check for partial hardware offload mark. */
>  uint32_t mark;
> @@ -166,7 +181,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
>  f = mark_to_flow_find(pmd, mark);
>  if (f) {
>  rules[i] = &f->cr;
> -pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
> +/* If AVX512 MFEX already classified the packet, use it. */
> +if (mfex_hit) {
> +pkt_meta[i].tcp_flags = miniflow_get_tcp_flags(&key->mf);
> +} else {
> +pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
> +}
> +
>  pkt_meta[i].bytes = dp_packet_size(packet);
>  phwol_hits++;
>  hwol_emc_smc_hitmask |= (1 << i);
> @@ -174,11 +195,12 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
>  }
>  }
>  
> -/* Do miniflow extract into keys. */
> -struct netdev_flow_key *key = &keys[i];
> -miniflow_extract(packet, &key->mf);
> +if (!mfex_hit) {
> +/* Do a scalar miniflow extract into keys */
> +miniflow_extract(packet, &key->mf);
> +}
>  
> -/* Cache TCP and byte values for all packets. */
> +/* Cache TCP and byte values for all packets */
>  pkt_meta[i].bytes = dp_packet_size(packet);
>  pkt_meta[i].tcp_flags = miniflow_get_tcp_flags(&key->mf);
>  
> diff --git a/lib/dpif-netdev-private-extract.c b/lib/dpif-netdev-private-extract.c
> new file mode 100644
> index 0..fcc56ef26
> --- /dev/null
> +++ b/lib/dpif-netdev-private-extract.c
> @@ -0,0 +1,86 @@
> +/*
> + * Copyright (c)

Re: [ovs-dev] [v13 12/12] dpcls-avx512: Enable avx512 vector popcount instruction.

2021-06-24 Thread Flavio Leitner

On Thu, Jun 24, 2021 at 12:52:49PM +, Van Haaren, Harry wrote:
> > On Thu, Jun 24, 2021 at 11:07:59AM +, Van Haaren, Harry wrote:
> > > > On Thu, Jun 17, 2021 at 05:18:25PM +0100, Cian Ferriter wrote:
> > > > > From: Harry van Haaren 
> 
> > > I do like the idea of toolchain supporting ISA options a bit more, there is
> > > so much compute performance available that is not widely used today.
> > > Such an effort industry wide would be very beneficial to all for improving
> > > performance, but would be a pretty large undertaking too... outside the
> > > scope of this patchset! :)
> > 
> > Yeah, it is. I mean, if the toolchain is not ready yet and we think it is
> > worth the benefits, considering that most probably fewer people will
> > be able to contribute or maintain it, then I see no other way to solve
> > the issue.
> 
> So the toolchain is "ready" in that we have a path to enable CPU ISA, and
> see the benefits. We can dream about future toolchains, and how those might
> improve our workflow in future, but pragmatically the approach here is the
> best-known-method based on available tools today. DPDK uses the same
> techniques (function pointer, CPUID-based ISA check, and plug in the ISA if
> available).
> 
> Improving the toolchain would only solve the problem of allowing the compiler
> to use the CPU ISA. It does not solve the problem of the compiler not being
> able to understand the data movement & processing well enough to reason about
> it and auto-vectorize.

Yeah, the examples I found are straightforward uses of the ISA, as you said,
so I wasn't sure how much a compiler is able to help nowadays.
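The runtime-dispatch pattern described above (a function pointer, a CPUID-based ISA check, and plugging in the ISA-specific code if available) can be sketched as follows. This is a minimal illustration, not OVS or DPDK code: the `sum_*` functions are hypothetical, and it assumes GCC or Clang on x86-64 for `__builtin_cpu_supports()` and the `target` function attribute.

```c
#include <stdint.h>

/* Hypothetical signature shared by every implementation. */
typedef uint64_t (*sum_func)(const uint64_t *vals, uint32_t n);

/* Portable scalar implementation, always available. */
static uint64_t
sum_scalar(const uint64_t *vals, uint32_t n)
{
    uint64_t total = 0;
    for (uint32_t i = 0; i < n; i++) {
        total += vals[i];
    }
    return total;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
/* Only this function is compiled for AVX512F, so the rest of the binary
 * does not need -mavx512f. A real implementation would use intrinsics;
 * here the compiler is merely allowed to auto-vectorize the loop. */
static __attribute__((target("avx512f"))) uint64_t
sum_avx512(const uint64_t *vals, uint32_t n)
{
    uint64_t total = 0;
    for (uint32_t i = 0; i < n; i++) {
        total += vals[i];
    }
    return total;
}
#endif

/* Runtime CPUID-based check on the CPU actually running the code. */
static sum_func
sum_select(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) {
        return sum_avx512;
    }
#endif
    return sum_scalar;
}
```

The selection runs once and callers then go through the returned pointer, so the ISA check is not paid per call.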


> > Do you think improving the toolchain is a larger commitment than
> > manually improving applications? A quick look at gcc gave me the
> > impression that it does support at least some basic vector
> > optimization capabilities.
> 
> Yes - you raise a good point, "basic vector optimization capabilities" are present
> in various compilers (gcc and clang/llvm is what I test with). The matrix-multiply
> problem that is often used to showcase compiler auto-vectorization is an extremely
> well-bounded and simple task in terms of understanding the work to be done.
> 
> Our emails crossed paths; there's more detail here about matrix multiply &
> basic vectorization:
> https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384377.html

Exactly :) For sure we want OVS to run faster, but there needs to be a
line on how low-level we can go, because it's always a trade-off with
complexity. In this case the line was blurry, at least to me, because
I wasn't aware of how far the toolchain can help us.

Do you think these optimizations will be a problem with Windows or
BSDs? I haven't found an alternative to Cirrus which I used before
to build on BSD.


> > > I'll admit to being a bit of an ISA fan, but there are some magical instructions
> > > that can do stuff in 1x instruction that otherwise takes large amounts of
> > > shifts & loops. Did I hear somebody ask for examples..??
> > 
> > Out of curiosity, which tool are you using (if you are) to measure
> > the improvements at the cycle level? VTune?
> 
> I use the Linux Perf tooling for performance measurements, along with OVS's
> own per-packet cycle count reporting. Hardware performance counters (as used
> by Linux Perf and VTune) provide all the info that's required.
> 
> For those not measuring performance at the function/ASM level, run the
> following command and view the performance in your terminal: perf top -C  -b
> 
> Based on that, focus on the areas where lots of cycles are spent, and
> investigate alternative SIMD-based implementations for that same
> functionality, making use of the CPU ISA. That's the general workflow :)

Yup, I am familiar with most of those except VTune, so I wondered
if it provided more insight into the impact of AVX512 optimizations.

> For those particularly interested, I gave a "Measure Software Performance of Data Plane Applications"
> talk at DPDK Userspace in 2019 about the workflow/method:
> https://www.youtube.com/watch?v=ZmwOKR5JyPk

Great, thanks for sharing it.

> 
> 
> > > I'll stop promoting ISA here, but am happy to continue detailed discussions, or break out
> > > conversations about specific areas of compute in OVS if there's appetite for that! Feel free
> > > to email the OVS mailing list (with me on CC please :), or emailing me directly is OK too.
> > 
> > I am definitely learning more about it and I appreciated your
> > longer reply.
> 
> As you may notice, this is an area I'm passionate about. If there's specific interest,
> I can volunteer to try to cover a "measuring OVS's SW datapath performance" talk at a
> future OVS conference.

I'd say that interesting talks are always welcome! :)

One thing you may be interested in is increasing datapath
visibility with regard to performance. Today there are some
statistics, but maybe there could be

Re: [ovs-dev] [v13 05/12] dpif-netdev: Add command to switch dpif implementation.

2021-06-24 Thread Flavio Leitner

On Thu, Jun 24, 2021 at 11:53:34AM +, Ferriter, Cian wrote:
[...]
> > On Thu, Jun 17, 2021 at 05:18:18PM +0100, Cian Ferriter wrote:
> > > +
> > > +VLOG_DEFINE_THIS_MODULE(dpif_netdev_impl);
> > > +
> > > +/* Actual list of implementations goes here. */
> > > +static struct dpif_netdev_impl_info_t dpif_impls[] = {
> > > +/* The default scalar C code implementation. */
> > > +{ .func = dp_netdev_input,
> > 
> > The '.func' is too generic. Can we rename this to something that
> > relates to 'input'?
> > 
> 
> I'll rename to 'input_func'. Does that sound good to you?

Sounds better, indeed.

[..]
> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > > index 1f15af882..9c234ef3d 100644
> > > --- a/lib/dpif-netdev.c
> > > +++ b/lib/dpif-netdev.c
> > > @@ -479,8 +479,8 @@ static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd,
> > >const struct flow *flow,
> > >const struct nlattr *actions,
> > >size_t actions_len);
> > > -static int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
> > > -struct dp_packet_batch *, odp_port_t port_no);
> > > +int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
> > > +struct dp_packet_batch *, odp_port_t port_no);
> > 
> > 
> > All other functions around are static and this one is now part of
> > dpif-netdev-private-dpif.h which is included by
> > dpif-netdev-private-thread.h as part of dpif-netdev-private.h.
> > 
> > Perhaps fixing that header issue I reported in the first patch
> > helps to solve this too.
> > 
> 
> This function can't be static. We need it to be available in
> lib/dpif-netdev-private-dpif.c to register it as the DPIF function
> for the dpif_scalar dpif_impl. I hope that makes sense.

Sure, it can't be static. What I am saying is that it is declared in two
different places: one in dpif-netdev.c and another in the header
dpif-netdev-private-dpif.h, which is included as well.

Thanks,
-- 
fbl

Re: [ovs-dev] [v13 12/12] dpcls-avx512: Enable avx512 vector popcount instruction.

2021-06-24 Thread Flavio Leitner



Hi Harry,

On Thu, Jun 24, 2021 at 11:07:59AM +, Van Haaren, Harry wrote:
> > -Original Message-
> > From: dev  On Behalf Of Flavio Leitner
> > Sent: Thursday, June 24, 2021 4:57 AM
> > To: Ferriter, Cian 
> > Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org
> > Subject: Re: [ovs-dev] [v13 12/12] dpcls-avx512: Enable avx512 vector popcount instruction.
> > 
> > On Thu, Jun 17, 2021 at 05:18:25PM +0100, Cian Ferriter wrote:
> > > From: Harry van Haaren 
> > >
> > > This commit enables the AVX512-VPOPCNTDQ Vector Popcount
> > > instruction. This instruction is not available on every CPU
> > > that supports the AVX512-F Foundation ISA, hence it is enabled
> > > only when the additional VPOPCNTDQ ISA check is passed.
> > >
> > > The vector popcount instruction is used instead of the AVX512
> > > popcount emulation code present in the avx512 optimized DPCLS today.
> > > It provides higher performance in the SIMD miniflow processing
> > > as that requires the popcount to calculate the miniflow block indexes.
> > >
> > > Signed-off-by: Harry van Haaren 
> > 
> > Acked-by: Flavio Leitner 
> 
> Thanks for reviewing!
> 
> > This patch series implements low-level optimizations by manually
> > coding instructions. I wonder if gcc couldn't achieve some relevant
> > level of vectorized optimization with some refactoring and the right
> > compiler flags. I assume the answer is no, but I would appreciate
> > some enlightenment on the matter.
> 
> Unfortunately no... there is no magic solution here to have the toolchain
> provide fallbacks if the latest ISA is not available. You're 100% right, these
> are manually implemented versions of new ISA, implemented in "older"
> ISA, to allow usage of the functionality. In this case, Skylake grade "AVX512-F"
> is used to implement the Icelake grade "AVX512-VPOPCNTDQ" instruction:
> (https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm512_popcnt_epi64%2520=4368,4368)
> 
> I do like the idea of toolchain supporting ISA options a bit more, there is
> so much compute performance available that is not widely used today.
> Such an effort industry wide would be very beneficial to all for improving
> performance, but would be a pretty large undertaking too... outside the
> scope of this patchset! :)

Yeah, it is. I mean, if the toolchain is not ready yet and we think it is
worth the benefits, considering that most probably fewer people will
be able to contribute or maintain it, then I see no other way to solve
the issue.

Do you think improving the toolchain is a larger commitment than
manually improving applications? A quick look at gcc gave me the
impression that it does support at least some basic vector
optimization capabilities.


> I'll admit to being a bit of an ISA fan, but there are some magical instructions
> that can do stuff in 1x instruction that otherwise takes large amounts of
> shifts & loops. Did I hear somebody ask for examples..??

Out of curiosity, which tool are you using (if you are) to measure
the improvements at the cycle level? VTune?


> Miniflow bits processing with "BMI" (Bit Manipulation Instructions),
> introduced in the Haswell era:
> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#othertechs=BMI1,BMI2
> - Favorite instructions are pdep and pext (parallel bit deposit and parallel bit extract).
> - pdep is very useful for dense bitfield unpacking: instead of "load - shift - AND" per field, it can
>   unpack up to 8 bitfields in a u64 and align them to byte boundaries.
> - Its "opposite", pext, also exists, extracting sparse bits from an integer into a packed layout
>   (pext is used in DPCLS to pull sparse bits from the packet's miniflow into a linear packed layout,
>   allowing it to be processed in a single packed AVX512 register).
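To make the pext semantics concrete, here is a portable scalar model of what the instruction computes. It is only a reference sketch: `pext_u64_scalar()` is a hypothetical name, and optimized code would instead call the `_pext_u64()` intrinsic from `<immintrin.h>`, compiled with `-mbmi2` (a flag the avx512 library build in this series already passes).

```c
#include <stdint.h>

/* Scalar model of BMI2 pext: for each set bit of 'mask', from least to most
 * significant, copy the corresponding bit of 'src' into the next low-order
 * bit of the result, producing a densely packed value. */
static uint64_t
pext_u64_scalar(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    uint64_t out_bit = 1;

    while (mask) {
        uint64_t lowest = mask & (~mask + 1);   /* Isolate lowest set bit. */
        if (src & lowest) {
            result |= out_bit;
        }
        out_bit <<= 1;
        mask &= mask - 1;                       /* Clear lowest set bit. */
    }
    return result;
}
```

For example, `pext_u64_scalar(0xF0F0, 0x0FF0)` returns `0x0F`: the mask selects bits 4 to 11, and of those only bits 4 to 7 are set in `src`.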
> 
> Note that we're all benefiting from novel usage of the scalar "popcount" instruction too
> (introduced in SSE4.2, with CPUID flag POPCNT) since merging commit a0b36b392. It uses
> a bitmask & popcount approach to index into the miniflow, improving on the previous
> approach of counting and shifting bits to iterate miniflows.
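The bitmask & popcount indexing mentioned here can be illustrated with a simplified, hypothetical layout: one 64-bit map whose set bits mark which fields are present, and a packed array holding only those fields. OVS's real miniflow uses a multi-unit map, so treat this as a sketch of the idea rather than OVS code. The packed position of a field is simply the number of set bits below it, which is what the vector popcount computes for several lanes at once.

```c
#include <stdint.h>

/* Hypothetical sparse layout: bit i of 'map' set means field i is present,
 * and present fields are stored in 'values' in increasing bit order. */
struct sparse_flow {
    uint64_t map;
    uint64_t values[64];
};

/* Returns field 'idx' (0..63), or 0 if it is not present. The packed index
 * is the popcount of the map bits below 'idx'. */
static uint64_t
sparse_flow_get(const struct sparse_flow *f, unsigned int idx)
{
    uint64_t bit = UINT64_C(1) << idx;

    if (!(f->map & bit)) {
        return 0;
    }
    return f->values[__builtin_popcountll(f->map & (bit - 1))];
}
```

With `map = 0x222` (fields 1, 5 and 9 present), field 9 is stored at index `popcount(0x222 & 0x1FF) = 2`.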
> 
> There are likely multiple other places in OVS where we spend significant cycles
> on processing data in ways that can be accelerated significantly by using all available ISA.
> There is ongoing work in miniflow extract (MFEX) with AVX512 SIMD ISA, allowing parsing
> of multiple packet protocols at the same time (see here
> https://patchwork.ozlabs.org/project/openvswitch/list/?series=249470)

Re: [ovs-dev] [v13 04/12] dpif-avx512: Add ISA implementation of dpif.

2021-06-23 Thread Flavio Leitner

On Thu, Jun 17, 2021 at 05:18:17PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren 
> 
> This commit adds the AVX512 implementation of DPIF functionality,
> specifically the dp_netdev_input_outer_avx512 function. This function
> only handles outer (no re-circulations), and is optimized to use the
> AVX512 ISA for packet batching and other DPIF work.
> 
> Sparse is not able to handle the AVX512 intrinsics, causing compile
> time failures, so it is disabled for this file.
> 
> Signed-off-by: Harry van Haaren 
> Co-authored-by: Cian Ferriter 
> Signed-off-by: Cian Ferriter 
> Co-authored-by: Kumar Amber 
> Signed-off-by: Kumar Amber 
> 
> ---
> 
> v13:
> - Squash "Add HWOL support" commit into this commit.
> - Add NEWS item about this feature here rather than in a later commit.
> - Add #define NUM_U64_IN_ZMM_REG 8.
> - Add comment describing operation of while loop handling HWOL->EMC->SMC
>   lookups in dp_netdev_input_outer_avx512().
> - Add EMC and SMC batch insert functions for better handling of EMC and
>   SMC in AVX512 DPIF.
> - Minor code refactor to address review comments.
> ---
>  NEWS |   2 +
>  lib/automake.mk  |   5 +-
>  lib/dpif-netdev-avx512.c | 327 +++
>  lib/dpif-netdev-private-dfc.h|  25 +++
>  lib/dpif-netdev-private-dpif.h   |  32 +++
>  lib/dpif-netdev-private-thread.h |  11 +-
>  lib/dpif-netdev-private.h|  25 +++
>  lib/dpif-netdev.c| 103 --
>  8 files changed, 514 insertions(+), 16 deletions(-)
>  create mode 100644 lib/dpif-netdev-avx512.c
>  create mode 100644 lib/dpif-netdev-private-dpif.h
> 
> diff --git a/NEWS b/NEWS
> index 96b3a61c8..6a4a7b76d 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -10,6 +10,8 @@ Post-v2.15.0
>   * Auto load balancing of PMDs now partially supports cross-NUMA polling
> cases, e.g if all PMD threads are running on the same NUMA node.
>   * Refactor lib/dpif-netdev.c to multiple header files.
> + * Add avx512 implementation of dpif which can process non recirculated
> +   packets. It supports partial HWOL, EMC, SMC and DPCLS lookups.
> - ovs-ctl:
>   * New option '--no-record-hostname' to disable hostname configuration
> in ovsdb on startup.
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 3a33cdd5c..660cd07f0 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -33,11 +33,13 @@ lib_libopenvswitchavx512_la_CFLAGS = \
>   -mavx512f \
>   -mavx512bw \
>   -mavx512dq \
> + -mbmi \
>   -mbmi2 \
>   -fPIC \
>   $(AM_CFLAGS)
>  lib_libopenvswitchavx512_la_SOURCES = \
> - lib/dpif-netdev-lookup-avx512-gather.c
> + lib/dpif-netdev-lookup-avx512-gather.c \
> + lib/dpif-netdev-avx512.c
>  lib_libopenvswitchavx512_la_LDFLAGS = \
>   -static
>  endif
> @@ -114,6 +116,7 @@ lib_libopenvswitch_la_SOURCES = \
>   lib/dpif-netdev-private-dfc.c \
>   lib/dpif-netdev-private-dfc.h \
>   lib/dpif-netdev-private-dpcls.h \
> + lib/dpif-netdev-private-dpif.h \
>   lib/dpif-netdev-private-flow.h \
>   lib/dpif-netdev-private-hwol.h \
>   lib/dpif-netdev-private-thread.h \
> diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
> new file mode 100644
> index 0..0e55b0be2
> --- /dev/null
> +++ b/lib/dpif-netdev-avx512.c
> @@ -0,0 +1,327 @@
> +/*
> + * Copyright (c) 2021 Intel Corporation.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#ifdef __x86_64__
> +/* Sparse cannot handle the AVX512 instructions. */
> +#if !defined(__CHECKER__)
> +
> +#include 
> +
> +#include "dpif-netdev.h"
> +#include "dpif-netdev-perf.h"
> +
> +#include "dpif-netdev-private.h"
> +#include "dpif-netdev-private-dpcls.h"
> +#include "dpif-netdev-private-flow.h"
> +#include "dpif-netdev-private-thread.h"
> +#include "dpif-netdev-private-hwol.h"
> +
> +#include "dp-packet.h"
> +#include "netdev.h"
> +
> +#include "immintrin.h"
> +
> +/* Each AVX512 register (zmm register in assembly notation) can contain up to
> + * 512 bits, which is equivalent to 8 uint64_t variables. This is the maximum
> + * number of miniflow blocks that can be processed in a single pass of the
> + * AVX512 code at a time.
> + */
> +#define NUM_U64_IN_ZMM_REG (8)
> +
> +/* Structure to contain per-packet metadata that must be attributed to the
> + * dp netdev flow. This is unfortunate to have to track per packet, however
> + * it's

Re: [ovs-dev] [v13 12/12] dpcls-avx512: Enable avx512 vector popcount instruction.

2021-06-23 Thread Flavio Leitner

On Thu, Jun 17, 2021 at 05:18:25PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren 
> 
> This commit enables the AVX512-VPOPCNTDQ Vector Popcount
> instruction. This instruction is not available on every CPU
> that supports the AVX512-F Foundation ISA, hence it is enabled
> only when the additional VPOPCNTDQ ISA check is passed.
> 
> The vector popcount instruction is used instead of the AVX512
> popcount emulation code present in the avx512 optimized DPCLS today.
> It provides higher performance in the SIMD miniflow processing
> as that requires the popcount to calculate the miniflow block indexes.
> 
> Signed-off-by: Harry van Haaren 
> 
> ---

Acked-by: Flavio Leitner 

This patch series implements low-level optimizations by manually
coding instructions. I wonder if gcc couldn't achieve some relevant
level of vectorized optimization with some refactoring and the right
compiler flags. I assume the answer is no, but I would appreciate
some enlightenment on the matter.

Thanks,
fbl


Re: [ovs-dev] [v13 11/12] dpdk: Cache result of CPU ISA checks.

2021-06-23 Thread Flavio Leitner

On Thu, Jun 17, 2021 at 05:18:24PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren 
> 
> As a small optimization, this patch caches the result of a CPU ISA
> check from DPDK. Particularly in the case of running the DPCLS
> autovalidator (which repeatedly probes subtables), this reduces
> the number of CPU ISA lookups from the DPDK level.
> 
> By caching them at the OVS/dpdk.c level, the ISA checks remain
> runtime for the CPU where they are executed, but subsequent checks
> for the same ISA feature become much cheaper.
> 
> Signed-off-by: Harry van Haaren 
> Co-authored-by: Cian Ferriter 
> Signed-off-by: Cian Ferriter 
> ---

The current approach uses a static int8_t per CPU flag. Perhaps
using two static int8_t variables (one for DONE and another for AVAIL),
with one bit in each for every CPU flag, would result in allocating
fewer static variables.

Anyway, with only 2 or 3 CPU flags it makes no relevant difference now.
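A minimal sketch of the two-variable suggestion, assuming GCC or Clang on x86-64 for `__builtin_cpu_supports()`; the flag names and functions are hypothetical, not the patch's code. One byte records which flags were already probed (DONE) and another caches the results (AVAIL), one bit per flag. As written it is not thread-safe: AVAIL is written before DONE, but without atomics or barriers that order is not guaranteed to be visible to other threads.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag identifiers, one bit each. */
enum isa_flag {
    ISA_AVX512F  = 1 << 0,
    ISA_AVX512BW = 1 << 1,
    ISA_BMI2     = 1 << 2,
};

static uint8_t isa_done;    /* Bit set: flag already probed. */
static uint8_t isa_avail;   /* Bit set: flag available on this CPU. */

/* Uncached probe. __builtin_cpu_supports() needs a string literal, hence
 * the switch. */
static bool
isa_probe(enum isa_flag flag)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    switch (flag) {
    case ISA_AVX512F:
        return __builtin_cpu_supports("avx512f");
    case ISA_AVX512BW:
        return __builtin_cpu_supports("avx512bw");
    case ISA_BMI2:
        return __builtin_cpu_supports("bmi2");
    }
#else
    (void) flag;
#endif
    return false;
}

/* Cached check: each flag is probed at most once. */
static bool
isa_has(enum isa_flag flag)
{
    if (!(isa_done & flag)) {
        if (isa_probe(flag)) {
            isa_avail |= (uint8_t) flag;
        }
        isa_done |= (uint8_t) flag;
    }
    return (isa_avail & flag) != 0;
}
```

Adding a flag then costs one enum value and one `case`, rather than another pair of static variables.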

Acked-by: Flavio Leitner 


Re: [ovs-dev] [v13 09/12] dpif-netdev/dpcls-avx512: Enable 16 block processing.

2021-06-23 Thread Flavio Leitner

On Thu, Jun 17, 2021 at 05:18:22PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren 
> 
> This commit implements larger subtable searches in avx512. A limitation
> of the previous implementation was that up to 8 blocks of miniflow
> data could be matched on (so a subtable with 8 blocks was handled
> in avx, but 9 blocks or more would fall back to scalar/generic).
> This limitation is removed in this patch, where up to 16 blocks
> of subtable can be matched on.
> 
> From an implementation perspective, the key to enabling 16 blocks
> over 8 blocks was to do bitmask calculation up front, and then use
> the pre-calculated bitmasks for 2x passes of the "blocks gather"
> routine. The bitmasks need to be shifted for k-mask usage in the
> upper (8-15) block range, but it is relatively trivial. This also
> helps in case expanding to 24 blocks is desired in future.
> 
> The implementation of the 2nd iteration to handle > 8 blocks is
> behind a conditional branch which checks the total number of bits.
> This helps the specialized versions of the function that have a
> miniflow fingerprint of less-than-or-equal 8 blocks, as the code
> can be statically stripped out of those functions. Specialized
> functions that do require more than 8 blocks will have the branch
> removed and unconditionally execute the 2nd blocks gather routine.
> 
> Lastly, the _any() flavour will have the conditional branch, and
> the branch predictor may mispredict a bit, but per burst will
> likely get most packets correct (particularly towards the middle
> and end of a burst).
> 
> The code has been run with unit tests under autovalidation and
> passes all cases, and unit test coverage has been checked to
> ensure the 16 block code paths are executing.
> 
> Signed-off-by: Harry van Haaren 
> 
> ---

The changes look good to me. I also introduced errors in the
first 8 blocks and in the second 8 blocks, and both caused the
autovalidation to fail.
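The two-pass approach from the commit message (compute the block bitmask once up front, then run the 8-wide gather twice, shifting the mask down for blocks 8 to 15) can be modelled in scalar C. `gather8()` and `gather16()` are hypothetical names, and the loop in `gather8()` only stands in for a single k-masked AVX512 gather; this is a sketch, not the patch's code.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a k-masked 8-lane gather: lane j loads src[idx[j]] when
 * bit j of 'kmask' is set, otherwise it is zeroed. */
static void
gather8(const uint64_t *src, const uint32_t idx[8], uint8_t kmask,
        uint64_t out[8])
{
    for (int j = 0; j < 8; j++) {
        out[j] = ((kmask >> j) & 1) ? src[idx[j]] : 0;
    }
}

/* Gathers 'n_blocks' (0..16) blocks. The 16-bit mask is computed once; its
 * upper byte is shifted down for the second pass. */
static void
gather16(const uint64_t *src, const uint32_t idx[16], uint32_t n_blocks,
         uint64_t out[16])
{
    uint16_t kmask = n_blocks >= 16 ? UINT16_MAX
                                    : (uint16_t) ((1u << n_blocks) - 1);

    memset(out, 0, 16 * sizeof out[0]);
    gather8(src, idx, (uint8_t) (kmask & 0xFF), out);

    /* Second pass only for more than 8 blocks. When 'n_blocks' is a
     * compile-time constant, the compiler can drop this branch. */
    if (n_blocks > 8) {
        gather8(src, idx + 8, (uint8_t) (kmask >> 8), out + 8);
    }
}
```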

Acked-by: Flavio Leitner 


Re: [ovs-dev] [v13 08/12] dpif-netdev-unixctl.man: Document subtable-lookup-* CMDs

2021-06-23 Thread Flavio Leitner

On Thu, Jun 17, 2021 at 05:18:21PM +0100, Cian Ferriter wrote:
> Signed-off-by: Cian Ferriter 
> 
> ---

Acked-by: Flavio Leitner 


Re: [ovs-dev] [v13 07/12] dpif-netdev: Add a partial HWOL PMD statistic.

2021-06-23 Thread Flavio Leitner

On Thu, Jun 17, 2021 at 05:18:20PM +0100, Cian Ferriter wrote:
> It is possible for packets traversing the userspace datapath to match a
> flow before hitting the EMC by using a mark ID provided by a NIC. Add a
> PMD statistic for this hit.
> 
> Signed-off-by: Cian Ferriter 
> 
> ---

Acked-by: Flavio Leitner 


Re: [ovs-dev] [v13 10/12] dpif-netdev/dpcls: Specialize more subtable signatures.

2021-06-23 Thread Flavio Leitner

On Thu, Jun 17, 2021 at 05:18:23PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren 
> 
> This commit adds more subtables to be specialized. The traffic
> pattern here being matched is VXLAN traffic subtables, which commonly
> have (5,3), (9,1) and (9,4) subtable fingerprints.
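Specialization as described here (compile-time constant fingerprints such as (5,3), (9,1) and (9,4), with a generic `_any()` fallback for everything else) can be sketched with a macro that stamps out one function per fingerprint. All names are hypothetical, and the folding loop is placeholder work rather than the DPCLS lookup itself.

```c
#include <stdint.h>

/* Shared signature: 'u0'/'u1' are the bit counts of the two miniflow
 * units, i.e. the subtable fingerprint. */
typedef uint64_t (*lookup_func)(const uint64_t *blocks, uint32_t u0,
                                uint32_t u1);

/* Generic body; placeholder work that folds all blocks together. */
static inline uint64_t
lookup_impl(const uint64_t *blocks, uint32_t u0, uint32_t u1)
{
    uint64_t acc = 0;
    for (uint32_t i = 0; i < u0 + u1; i++) {
        acc ^= blocks[i];
    }
    return acc;
}

/* The _any() flavour: bit counts are only known at runtime. */
static uint64_t
lookup_any(const uint64_t *blocks, uint32_t u0, uint32_t u1)
{
    return lookup_impl(blocks, u0, u1);
}

/* Specialized flavours ignore the runtime counts and pass constants, so the
 * compiler can unroll the loop for that exact fingerprint. */
#define DECLARE_SPECIALIZED_LOOKUP(U0, U1)                               \
    static uint64_t                                                      \
    lookup_##U0##_##U1(const uint64_t *blocks, uint32_t u0, uint32_t u1) \
    {                                                                    \
        (void) u0;                                                       \
        (void) u1;                                                       \
        return lookup_impl(blocks, U0, U1);                              \
    }

DECLARE_SPECIALIZED_LOOKUP(5, 3)
DECLARE_SPECIALIZED_LOOKUP(9, 1)
DECLARE_SPECIALIZED_LOOKUP(9, 4)

/* Returns the specialized function for a known fingerprint, else _any(). */
static lookup_func
lookup_probe(uint32_t u0, uint32_t u1)
{
    if (u0 == 5 && u1 == 3) {
        return lookup_5_3;
    } else if (u0 == 9 && u1 == 1) {
        return lookup_9_1;
    } else if (u0 == 9 && u1 == 4) {
        return lookup_9_4;
    }
    return lookup_any;
}
```

Every specialization produces the same result as `lookup_any()` for its fingerprint; only the generated code differs.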
> 
> Signed-off-by: Harry van Haaren 
> 
> ---

Acked-by: Flavio Leitner 


Re: [ovs-dev] [v13 06/12] dpif-netdev: Add command to get dpif implementations.

2021-06-23 Thread Flavio Leitner

On Thu, Jun 17, 2021 at 05:18:19PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren 
> 
> This commit adds a new command to retrieve the list of available
> DPIF implementations. This can be used to check which implementations
> of the DPIF are available in any given OVS binary.
> 
> Usage:
>  $ ovs-appctl dpif-netdev/dpif-get

I didn't mention this in the dpif-set patch, but it would be great
to have a more targeted command name, like dpif-impl-{get,set}.

> 
> Signed-off-by: Harry van Haaren 
> 
> ---
> 
> v13:
> - Add NEWS item about DPIF get and set commands here rather than in a
>   later commit.
> - Add documentation items about DPIF set commands here rather than in a
>   later commit.
> ---
>  Documentation/topics/dpdk/bridge.rst |  8 
>  NEWS |  1 +
>  lib/dpif-netdev-private-dpif.c   |  8 
>  lib/dpif-netdev-private-dpif.h   |  6 ++
>  lib/dpif-netdev-unixctl.man  |  3 +++
>  lib/dpif-netdev.c| 24 
>  6 files changed, 50 insertions(+)
> 
> diff --git a/Documentation/topics/dpdk/bridge.rst 
> b/Documentation/topics/dpdk/bridge.rst
> index fafa8c821..f59e26cbe 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -226,6 +226,14 @@ stats associated with the datapath.
>  Just like with the SIMD DPCLS feature above, SIMD can be applied to the DPIF to
>  improve performance.
>  
> +OVS provides multiple implementations of the DPIF. The available
> +implementations can be listed with the following command ::
> +
> +$ ovs-appctl dpif-netdev/dpif-get
> +Available DPIF implementations:
> +  dpif_scalar
> +  dpif_avx512
> +
>  By default, dpif_scalar is used. The DPIF implementation can be selected by
>  name ::
>  
> diff --git a/NEWS b/NEWS
> index 6a4a7b76d..c47ab349e 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -12,6 +12,7 @@ Post-v2.15.0
>   * Refactor lib/dpif-netdev.c to multiple header files.
>   * Add avx512 implementation of dpif which can process non recirculated
> packets. It supports partial HWOL, EMC, SMC and DPCLS lookups.
> + * Add commands to get and set the dpif implementations.
> - ovs-ctl:
>   * New option '--no-record-hostname' to disable hostname configuration
> in ovsdb on startup.
> diff --git a/lib/dpif-netdev-private-dpif.c b/lib/dpif-netdev-private-dpif.c
> index d829a7ee5..3649e775d 100644
> --- a/lib/dpif-netdev-private-dpif.c
> +++ b/lib/dpif-netdev-private-dpif.c
> @@ -73,6 +73,14 @@ dp_netdev_impl_set_default(dp_netdev_input_func func)
>  default_dpif_func = func;
>  }
>  
> +uint32_t
> +dp_netdev_impl_get(const struct dpif_netdev_impl_info_t **out_impls)
> +{
> +ovs_assert(out_impls);
> +*out_impls = dpif_impls;
> +return ARRAY_SIZE(dpif_impls);
> +}
> +

This could receive a struct ds and fill it with the implementation details,
to keep the internal details in private-dpif.c.
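A rough sketch of this suggestion, assuming the implementation table stays private to dpif-netdev-private-dpif.c: instead of handing the array to the caller, a "get" helper writes the names into caller-provided output, and a "set" helper looks a name up and runs its ISA probe. A plain char buffer stands in for OVS's `struct ds`, the table field is named `input_func` as agreed in the 05/12 review, and every function name here is hypothetical. The probe assumes GCC or Clang on x86-64.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the DPIF input function and its ISA probe. */
typedef int (*input_func_t)(void *pmd);
typedef bool (*probe_func_t)(void);

static int input_scalar(void *pmd) { (void) pmd; return 0; }
static int input_avx512(void *pmd) { (void) pmd; return 1; }

static bool probe_none(void) { return true; }   /* Scalar runs anywhere. */

static bool
probe_avx512(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f");
#else
    return false;
#endif
}

/* Private table: only this file knows its layout. */
static const struct impl_info {
    input_func_t input_func;
    probe_func_t probe;
    const char *name;
} impls[] = {
    { input_scalar, probe_none,   "dpif_scalar" },
    { input_avx512, probe_avx512, "dpif_avx512" },
};

#define N_IMPLS (sizeof impls / sizeof impls[0])

/* "get": writes one name per line into 'buf' ('size' must be > 0), in
 * place of filling a 'struct ds'. Returns the number of implementations. */
static size_t
impl_list(char *buf, size_t size)
{
    size_t used = 0;

    buf[0] = '\0';
    for (size_t i = 0; i < N_IMPLS; i++) {
        int n = snprintf(buf + used, size - used, "  %s\n", impls[i].name);
        if (n < 0 || (size_t) n >= size - used) {
            break;              /* Output truncated; stop here. */
        }
        used += (size_t) n;
    }
    return N_IMPLS;
}

/* "set": returns the input function named 'name' if the running CPU passes
 * its probe, or NULL if the name is unknown or the ISA is unavailable. */
static input_func_t
impl_find(const char *name)
{
    for (size_t i = 0; i < N_IMPLS; i++) {
        if (!strcmp(impls[i].name, name)) {
            return impls[i].probe() ? impls[i].input_func : NULL;
        }
    }
    return NULL;
}
```

The unixctl handler would then only format a reply from the helper's output, without ever seeing `struct impl_info`.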


>  /* This function checks all available DPIF implementations, and selects the
>   * returns the function pointer to the one requested by "name".
>   */
> diff --git a/lib/dpif-netdev-private-dpif.h b/lib/dpif-netdev-private-dpif.h
> index a6db3c7f2..717e9e2f9 100644
> --- a/lib/dpif-netdev-private-dpif.h
> +++ b/lib/dpif-netdev-private-dpif.h
> @@ -48,6 +48,12 @@ struct dpif_netdev_impl_info_t {
>  const char *name;
>  };
>  
> +/* This function returns all available implementations to the caller. The
> + * quantity of implementations is returned by the int return value.
> + */
> +uint32_t
> +dp_netdev_impl_get(const struct dpif_netdev_impl_info_t **out_impls);
> +
>  /* This function checks all available DPIF implementations, and selects the
>   * returns the function pointer to the one requested by "name".
>   */
> diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man
> index b348940b0..534823879 100644
> --- a/lib/dpif-netdev-unixctl.man
> +++ b/lib/dpif-netdev-unixctl.man
> @@ -227,5 +227,8 @@ When this is the case, the above command prints the load-balancing information
>  of the bonds configured in datapath \fIdp\fR showing the interface associated
>  with each bucket (hash).
>  .
> +.IP "\fBdpif-netdev/dpif-get\fR
> +Lists the DPIF implementations that are available.
> +.
>  .IP "\fBdpif-netdev/dpif-set\fR \fIdpif_impl\fR"
>  Sets the DPIF to be used to \fIdpif_impl\fR. By default "dpif_scalar" is used.
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 9c234ef3d..59a44a848 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -991,6 +991,27 @@ dpif_netdev_subtable_lookup_set(struct unixctl_conn *conn, int argc,
>  ds_destroy();
>  }
>  
> +static void
> +dpif_netdev_impl_get(struct unixctl_conn *conn, int argc OVS_UNUSED,
> + const char *argv[] OVS_UNUSED, void *aux OVS_UNUSED)
> +{
> +const struct dpif_netdev_impl_info_t *dpif_impls;

then here you initialize 'reply', call dp_netdev_impl_get() and
reply if it

Re: [ovs-dev] [v13 05/12] dpif-netdev: Add command to switch dpif implementation.

2021-06-23 Thread Flavio Leitner

On Thu, Jun 17, 2021 at 05:18:18PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren 
> 
> This commit adds a new command to allow the user to switch
> the active DPIF implementation at runtime. A probe function
> is executed before switching the DPIF implementation, to ensure
> the CPU is capable of running the ISA required. For example, the
> below code will switch to the AVX512 enabled DPIF assuming
> that the runtime CPU is capable of running AVX512 instructions:
> 
>  $ ovs-appctl dpif-netdev/dpif-set dpif_avx512
> 
> A new configuration flag is added to allow selection of the
> default DPIF. This is useful for running the unit-tests against
> the available DPIF implementations, without modifying each unit test.
> 
> The design of the testing & validation for ISA optimized DPIF
> implementations is based around the work already upstream for DPCLS.
> Note however that a DPCLS lookup has no state or side-effects, allowing
> the auto-validator implementation to perform multiple lookups and
> provide consistent statistic counters.
> 
> The DPIF component does have state, so running two implementations in
> parallel and comparing output is not a valid testing method, as there
> are changes in DPIF statistic counters (side effects). As a result, the
> DPIF is tested directly against the unit-tests.
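The distinction drawn here (a stateless DPCLS lookup can be autovalidated by running every implementation on the same input, while a stateful DPIF cannot) can be illustrated with a toy example. The lookup functions are hypothetical placeholders, not OVS code; the counter shows how running a stateful stage once per implementation counts a single packet several times.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A stateless lookup: the same key always gives the same result. */
typedef uint32_t (*lookup_fn)(uint64_t key);

/* Hypothetical reference (scalar) and "optimized" implementations. */
static uint32_t
lookup_reference(uint64_t key)
{
    return (uint32_t) (key % 97);
}

static uint32_t
lookup_optimized(uint64_t key)
{
    return (uint32_t) (key - (key / 97) * 97);
}

/* Runs every implementation on the same key and compares each result with
 * the reference. Valid only because the lookups have no side effects. */
static bool
autovalidate(const lookup_fn *impls, size_t n_impls, uint64_t key)
{
    uint32_t expected = lookup_reference(key);

    for (size_t i = 0; i < n_impls; i++) {
        if (impls[i](key) != expected) {
            return false;
        }
    }
    return true;
}

/* A stateful stage updates a counter on every call, so validating it the
 * same way inflates the statistics. */
static uint64_t n_packets;

static uint32_t
process_stateful(uint64_t key)
{
    n_packets++;
    return lookup_reference(key);
}
```

Validating `process_stateful` twice for one packet leaves `n_packets` at 2, which is exactly the side effect that rules this method out for the DPIF.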
> 
> Signed-off-by: Harry van Haaren 
> Co-authored-by: Cian Ferriter 
> Signed-off-by: Cian Ferriter 
> 
> ---
> 
> v13:
> - Add Docs items about the switch DPIF command here rather than in
>   later commit.
> - Document operation in manpages as well as rST.
> - Minor code refactoring to address review comments.
> ---
>  Documentation/topics/dpdk/bridge.rst |  34 +
>  acinclude.m4 |  15 
>  configure.ac |   1 +
>  lib/automake.mk  |   1 +
>  lib/dpif-netdev-avx512.c |  14 
>  lib/dpif-netdev-private-dpif.c   | 103 +++
>  lib/dpif-netdev-private-dpif.h   |  49 -
>  lib/dpif-netdev-private-thread.h |  11 +--
>  lib/dpif-netdev-unixctl.man  |   3 +
>  lib/dpif-netdev.c|  89 +--
>  10 files changed, 304 insertions(+), 16 deletions(-)
>  create mode 100644 lib/dpif-netdev-private-dpif.c
> 
> diff --git a/Documentation/topics/dpdk/bridge.rst 
> b/Documentation/topics/dpdk/bridge.rst
> index 526d5c959..fafa8c821 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -214,3 +214,37 @@ implementation ::
>  
>  Compile OVS in debug mode to have `ovs_assert` statements error out if
>  there is a mis-match in the DPCLS lookup implementation.
> +
> +Datapath Interface Performance
> +--
> +
> +The datapath interface (DPIF) or dp_netdev_input() is responsible for taking
> +packets through the major components of the userspace datapath; such as
> +miniflow_extract, EMC, SMC and DPCLS lookups, and a lot of the performance
> +stats associated with the datapath.
> +
> +Just like with the SIMD DPCLS feature above, SIMD can be applied to the DPIF to
> +improve performance.
> +
> +By default, dpif_scalar is used. The DPIF implementation can be selected by
> +name ::
> +
> +$ ovs-appctl dpif-netdev/dpif-set dpif_avx512
> +DPIF implementation set to dpif_avx512.
> +
> +$ ovs-appctl dpif-netdev/dpif-set dpif_scalar
> +DPIF implementation set to dpif_scalar.
> +
> +Running Unit Tests with AVX512 DPIF
> +~~~
> +
> +Since the AVX512 DPIF is disabled by default, a compile time option is
> +available in order to test it with the OVS unit test suite. When building with
> +a CPU that supports AVX512, use the following configure option ::
> +
> +$ ./configure --enable-dpif-default-avx512
> +
> +The following line should be seen in the configure output when the above option
> +is used ::
> +
> +checking whether DPIF AVX512 is default implementation... yes
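The acinclude.m4 hunk in this patch only appends `-DDPIF_AVX512_DEFAULT` to `OVS_CFLAGS`; the C code then decides the default at compile time. A hypothetical sketch of such a selection follows. It also keeps a runtime CPU check so that a binary built with the option still falls back to the scalar DPIF on CPUs without AVX512; whether the patch itself does this is not shown in this excerpt, and all names are illustrative. The probe assumes GCC or Clang on x86-64.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two DPIF implementations. */
typedef const char *(*dpif_impl_fn)(void);

static const char *
dpif_scalar(void)
{
    return "dpif_scalar";
}

#ifdef DPIF_AVX512_DEFAULT
static const char *
dpif_avx512(void)
{
    return "dpif_avx512";
}

static bool
cpu_has_avx512f(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f");
#else
    return false;
#endif
}
#endif

/* Picks the default at build time, but still checks the running CPU. */
static dpif_impl_fn
dpif_default_impl(void)
{
#ifdef DPIF_AVX512_DEFAULT
    if (cpu_has_avx512f()) {
        return dpif_avx512;
    }
#endif
    return dpif_scalar;
}
```

Built without the option, `dpif_default_impl()` always returns `dpif_scalar`, matching the documented default.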
> diff --git a/acinclude.m4 b/acinclude.m4
> index 15a54d636..5fbcd9872 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -30,6 +30,21 @@ AC_DEFUN([OVS_CHECK_DPCLS_AUTOVALIDATOR], [
>fi
>  ])
>  
> +dnl Set OVS DPIF default implementation at configure time for running the unit
> +dnl tests on the whole codebase without modifying tests per DPIF impl
> +AC_DEFUN([OVS_CHECK_DPIF_AVX512_DEFAULT], [
> +  AC_ARG_ENABLE([dpif-default-avx512],
> +                [AC_HELP_STRING([--enable-dpif-default-avx512], [Enable DPIF AVX512 implementation as default.])],
> +                [dpifavx512=yes],[dpifavx512=no])
> +  AC_MSG_CHECKING([whether DPIF AVX512 is default implementation])
> +  if test "$dpifavx512" != yes; then
> +    AC_MSG_RESULT([no])
> +  else
> +    OVS_CFLAGS="$OVS_CFLAGS -DDPIF_AVX512_DEFAULT"
> +    AC_MSG_RESULT([yes])
> +  fi
> +])
> +
>  dnl OVS_ENABLE_WERROR
>  AC_DEFUN([OVS_ENABLE_WERROR],
>[AC_ARG_ENABLE(
> diff --git a/configure.ac
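As a rough illustration of how the -DDPIF_AVX512_DEFAULT flag added by the configure check above could steer the default, here is a self-contained C sketch of a name-indexed table of DPIF implementations with a compile-time default, mirroring the `dpif-set` name lookup described in the documentation hunk. The `dpif_impl` struct, `dpif_impl_find()` and the stub input functions are hypothetical stand-ins, not the actual OVS code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the real DPIF input functions; each returns
 * an id so that the selected implementation can be observed. */
typedef int (*dpif_input_func)(void);

static int dpif_scalar_input(void) { return 0; }
static int dpif_avx512_input(void) { return 1; }

struct dpif_impl {
    const char *name;        /* Name accepted by a "dpif-set" style command. */
    dpif_input_func func;
};

static const struct dpif_impl dpif_impls[] = {
    { "dpif_scalar", dpif_scalar_input },
    { "dpif_avx512", dpif_avx512_input },
};

/* Compile-time default, keyed on the macro that the configure option
 * appends to OVS_CFLAGS. */
#ifdef DPIF_AVX512_DEFAULT
#define DPIF_DEFAULT_NAME "dpif_avx512"
#else
#define DPIF_DEFAULT_NAME "dpif_scalar"
#endif

/* Returns the implementation registered under 'name', or NULL if none. */
static const struct dpif_impl *
dpif_impl_find(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof dpif_impls / sizeof dpif_impls[0]; i++) {
        if (!strcmp(dpif_impls[i].name, name)) {
            return &dpif_impls[i];
        }
    }
    return NULL;
}
```

A real implementation would also have to confirm at runtime that the CPU supports AVX512 before accepting `dpif_avx512`; that check is deliberately left out of this sketch.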

Re: [ovs-dev] [v13 08/12] dpif-netdev-unixctl.man: Document subtable-lookup-* CMDs

2021-06-22 Thread Flavio Leitner

On Tue, Jun 22, 2021 at 03:42:46PM +, Stokes, Ian wrote:
> > Hi Flavio,
> > 
> > Thanks for the review. My responses are inline.
> > 
> > Cian
> > 
> > > -----Original Message-----
> > > From: Flavio Leitner 
> > > Sent: Monday 21 June 2021 19:22
> > > To: Ferriter, Cian 
> > > Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org
> > > Subject: Re: [ovs-dev] [v13 08/12] dpif-netdev-unixctl.man: Document
> > subtable-lookup-* CMDs
> > >
> > >
> > > Hi,
> > >
> > > This commit could be submitted outside of this patch-set as a fix
> > > for commit 9ff7cabfd7 ("dpif-netdev: add subtable-lookup-prio-get
> > > command") and commit 3d018c3ea79d ("dpif-netdev: add subtable lookup
> > > prio set command.").
> > >
> > > This helps to get it merged sooner and reduce this patch-set size.
> > >
> > 
> > I'll remove this patch from the patchset and send to the mailing list
> > separately. I'll wait till the DPIF patchset has been merged to send this,
> > since I don't want there to be rebase conflicts (the DPIF patchset also
> > modifies this part of lib/dpif-netdev-unixctl.man).
> > 
> > I'll add the appropriate Fixes tags.
> 
> @Flavio, If there is an aim to reduce the overall patch number of
> the series then I would recommend the following patches be
> submitted separately also from this series as there is no
> dependency on them to enable DPIF with AVX512.
> 
> [v13 10/12] dpif-netdev/dpcls: Specialize more subtable signatures.
> [v13 11/12] dpdk: Cache result of CPU ISA checks.
> 
> 
> These two are quite small and I think could almost be applied now
> rather than as part of the series as they are modifying existing
> functionality (DPCLS AVX512 supported traffic types and DPDK flag
> caching).
> 
> Thoughts?

The idea is to get unrelated chunks merged sooner, if they make
sense of course, and then we have fewer patches to carry.

If the patches are going to arrive later, or if it causes more work
on follow-up patches, then I see no benefit.

It's a suggestion. I am happy to review either way.

Thanks,
fbl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [v13 04/12] dpif-avx512: Add ISA implementation of dpif.

2021-06-22 Thread Flavio Leitner



Hi Harry,

All good points. I made a suggestion and left it to the authors to
decide the best course of action. It was a suggestion to accommodate
everyone and to reduce the churn. That's all.

Anyway, my plan is to continue reviewing the patches and, as always,
I appreciate your support.

Thanks,
fbl


On Tue, Jun 22, 2021 at 11:10:32AM +, Van Haaren, Harry wrote:
> > -----Original Message-----
> > From: dev  On Behalf Of Flavio Leitner
> > Sent: Monday, June 21, 2021 5:39 PM
> > To: Ferriter, Cian 
> > Cc: ovs-dev@openvswitch.org; Amber, Kumar ;
> > i.maxim...@ovn.org
> > Subject: Re: [ovs-dev] [v13 04/12] dpif-avx512: Add ISA implementation of 
> > dpif.
> > 
> > On Mon, Jun 21, 2021 at 04:13:12PM +, Ferriter, Cian wrote:
> > > Hi Flavio,
> 
> Hi Flavio & All,
> 
> Responses inline below.
> 
> Regards, -Harry
> 
> 
> > > Thanks for the review. My responses are inline.
> > >
> > > Cian
> > >
> > > > -----Original Message-----
> > > > From: Flavio Leitner 
> > > > Sent: Sunday 20 June 2021 21:09
> > > > To: Ferriter, Cian 
> > > > Cc: ovs-dev@openvswitch.org; Amber, Kumar ;
> > i.maxim...@ovn.org
> > > > Subject: Re: [ovs-dev] [v13 04/12] dpif-avx512: Add ISA implementation 
> > > > of dpif.
> > > >
> > > >
> > > > Hi,
> > > >
> > > > I am still reviewing the patch, but I thought it worth discussing
> > > > a few items below.
> > > >
> > > > On Thu, Jun 17, 2021 at 05:18:17PM +0100, Cian Ferriter wrote:
> > > > > From: Harry van Haaren 
> > > > >
> > > > > This commit adds the AVX512 implementation of DPIF functionality,
> > > > > specifically the dp_netdev_input_outer_avx512 function. This function
> > > > > only handles outer (no re-circulations), and is optimized to use the
> > > > > AVX512 ISA for packet batching and other DPIF work.
> > > > >
> > > > > Sparse is not able to handle the AVX512 intrinsics, causing compile
> > > > > time failures, so it is disabled for this file.
> > > > >
> > > > > Signed-off-by: Harry van Haaren 
> > > > > Co-authored-by: Cian Ferriter 
> > > > > Signed-off-by: Cian Ferriter 
> > > > > Co-authored-by: Kumar Amber 
> > > > > Signed-off-by: Kumar Amber 
> > > > >
> > > > > ---
> 
> 
> 
> > > Good point. This can be cleaned up. I've included
> > > lib/dpif-netdev-private-hwol.h in lib/dpif-netdev-private.h and removed the
> > > headers included by lib/dpif-netdev-private.h from lib/dpif-netdev-avx512.c.
> > >
> > > I'll move the prototype for dpcls_lookup() too, it makes more sense if it's
> > > in lib/dpif-netdev-private-dpcls.h.
> > 
> > Before you spend time on it, please consider if the refactoring is
> > really required. I think refactoring the code usually is a nice
> > thing to do when the result is a clean interface
> 
> Refactoring code can be done for multiple reasons: cleaner interfaces are indeed
> a noble goal, as are avoiding code duplication and general tidying up.
> This refactoring is not a "nice to have"; it is required. Let me explain:
> 
> In this patchset as a whole, an ISA-optimized DPIF implementation is added.
> Before this refactor, all DPIF-related components (EMC, SMC, PartialHWOL,
> and DPIF structs like flow-stats, dp_netdev_flow, dp_netdev_pmd_thread, etc.)
> are defined and used only in a single .c file. There is no modularity, and
> there is no possibility to re-use any of those components outside the .c file
> where they are declared.
> 
> This patchset refactors those components into separate header files, allowing
> re-use outside the .c file they were previously limited to. This allows EMC
> and SMC to be re-used, and the ISA-optimized DPIF becomes viable thanks to
> that code reuse.
> 
> The result of the patches is a much more modular codebase, and it avoids
> much code duplication. The interface is kept as consistent as possible with
> the previous implementation. I agree the interface is not as clean as it
> could be, but this is the pragmatic approach to improve modularity and avoid
> code duplication.
> 
> 
> > but it seems that will conflict with some other patches being reviewed.
> 
> Yes, any code changes can cause rebase-conflicts. As you know, this is an
> unfortunate but unavoidable step in gene

Re: [ovs-dev] [v13 08/12] dpif-netdev-unixctl.man: Document subtable-lookup-* CMDs

2021-06-21 Thread Flavio Leitner



Hi,

This commit could be submitted outside of this patch-set as a fix
for commit 9ff7cabfd7 ("dpif-netdev: add subtable-lookup-prio-get
command") and commit 3d018c3ea79d ("dpif-netdev: add subtable lookup
prio set command.").

This helps to get it merged sooner and reduce this patch-set size.

Thanks for documenting it.
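The selection rule documented in the new man page text (the highest-priority lookup function is used for classification) can be pictured with a small C sketch. The `lookup_impl` struct and both helpers are hypothetical; the real OVS registry, its validation of priority values and its tie-breaking may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registry entry for one DPCLS subtable lookup function. */
struct lookup_impl {
    const char *name;
    unsigned int prio;   /* Higher value wins. */
};

/* Returns the entry with the highest priority, or NULL if 'n' is zero.
 * On a tie, the earlier entry is kept. */
static const struct lookup_impl *
lookup_impl_best(const struct lookup_impl *impls, size_t n)
{
    const struct lookup_impl *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!best || impls[i].prio > best->prio) {
            best = &impls[i];
        }
    }
    return best;
}

/* Sets the priority of the entry called 'name'.  Returns 0 on success or
 * -1 if no entry has that name. */
static int
lookup_impl_set_prio(struct lookup_impl *impls, size_t n,
                     const char *name, unsigned int prio)
{
    assert(name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(impls[i].name, name)) {
            impls[i].prio = prio;
            return 0;
        }
    }
    return -1;
}
```

Raising an entry's priority above the others is enough to make it the one chosen, which is the behaviour the prio-set text describes.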
fbl

On Thu, Jun 17, 2021 at 05:18:21PM +0100, Cian Ferriter wrote:
> Signed-off-by: Cian Ferriter 
> 
> ---
> 
> v13:
> - New commit to update manpages with more commands that are missing.
> ---
>  lib/dpif-netdev-unixctl.man | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man
> index 45a1bd669..d77f5d9a4 100644
> --- a/lib/dpif-netdev-unixctl.man
> +++ b/lib/dpif-netdev-unixctl.man
> @@ -228,6 +228,16 @@ When this is the case, the above command prints the load-balancing information
>  of the bonds configured in datapath \fIdp\fR showing the interface associated
>  with each bucket (hash).
>  .
> +.IP "\fBdpif-netdev/subtable-lookup-prio-get\fR"
> +Lists the DPCLS implementations or lookup functions that are available as well
> +as their priorities.
> +.
> +.IP "\fBdpif-netdev/subtable-lookup-prio-set\fR \fIlookup_function\fR \
> +\fIprio\fR"
> +Sets the priority of a lookup function by the name, \fIlookup_function\fR, and
> +the priority, \fIprio\fR, which should be a positive integer value. The highest
> +priority lookup function is used for classification.
> +.
>  .IP "\fBdpif-netdev/dpif-get\fR
>  Lists the DPIF implementations that are available.
>  .
> -- 
> 2.32.0
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

-- 
fbl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [v13 04/12] dpif-avx512: Add ISA implementation of dpif.

2021-06-21 Thread Flavio Leitner

On Mon, Jun 21, 2021 at 04:13:12PM +, Ferriter, Cian wrote:
> Hi Flavio,
> 
> Thanks for the review. My responses are inline.
> 
> Cian
> 
> > -----Original Message-----
> > From: Flavio Leitner 
> > Sent: Sunday 20 June 2021 21:09
> > To: Ferriter, Cian 
> > Cc: ovs-dev@openvswitch.org; Amber, Kumar ; 
> > i.maxim...@ovn.org
> > Subject: Re: [ovs-dev] [v13 04/12] dpif-avx512: Add ISA implementation of 
> > dpif.
> > 
> > 
> > Hi,
> > 
> > I am still reviewing the patch, but I thought it worth discussing
> > a few items below.
> > 
> > On Thu, Jun 17, 2021 at 05:18:17PM +0100, Cian Ferriter wrote:
> > > From: Harry van Haaren 
> > >
> > > This commit adds the AVX512 implementation of DPIF functionality,
> > > specifically the dp_netdev_input_outer_avx512 function. This function
> > > only handles outer (no re-circulations), and is optimized to use the
> > > AVX512 ISA for packet batching and other DPIF work.
> > >
> > > Sparse is not able to handle the AVX512 intrinsics, causing compile
> > > time failures, so it is disabled for this file.
> > >
> > > Signed-off-by: Harry van Haaren 
> > > Co-authored-by: Cian Ferriter 
> > > Signed-off-by: Cian Ferriter 
> > > Co-authored-by: Kumar Amber 
> > > Signed-off-by: Kumar Amber 
> > >
> > > ---
> > >
> > > v13:
> > > - Squash "Add HWOL support" commit into this commit.
> > > - Add NEWS item about this feature here rather than in a later commit.
> > > - Add #define NUM_U64_IN_ZMM_REG 8.
> > > - Add comment describing operation of while loop handling HWOL->EMC->SMC
> > >   lookups in dp_netdev_input_outer_avx512().
> > > - Add EMC and SMC batch insert functions for better handling of EMC and
> > >   SMC in AVX512 DPIF.
> > > - Minor code refactor to address review comments.
> > > ---
> > >  NEWS |   2 +
> > >  lib/automake.mk  |   5 +-
> > >  lib/dpif-netdev-avx512.c | 327 +++
> > >  lib/dpif-netdev-private-dfc.h|  25 +++
> > >  lib/dpif-netdev-private-dpif.h   |  32 +++
> > >  lib/dpif-netdev-private-thread.h |  11 +-
> > >  lib/dpif-netdev-private.h|  25 +++
> > >  lib/dpif-netdev.c| 103 --
> > >  8 files changed, 514 insertions(+), 16 deletions(-)
> > >  create mode 100644 lib/dpif-netdev-avx512.c
> > >  create mode 100644 lib/dpif-netdev-private-dpif.h
> > >
> > > diff --git a/NEWS b/NEWS
> > > index 96b3a61c8..6a4a7b76d 100644
> > > --- a/NEWS
> > > +++ b/NEWS
> > > @@ -10,6 +10,8 @@ Post-v2.15.0
> > >   * Auto load balancing of PMDs now partially supports cross-NUMA polling
> > > cases, e.g if all PMD threads are running on the same NUMA node.
> > >   * Refactor lib/dpif-netdev.c to multiple header files.
> > > + * Add avx512 implementation of dpif which can process non recirculated
> > > +   packets. It supports partial HWOL, EMC, SMC and DPCLS lookups.
> > > - ovs-ctl:
> > >   * New option '--no-record-hostname' to disable hostname configuration
> > > in ovsdb on startup.
> > > diff --git a/lib/automake.mk b/lib/automake.mk
> > > index 3a33cdd5c..660cd07f0 100644
> > > --- a/lib/automake.mk
> > > +++ b/lib/automake.mk
> > > @@ -33,11 +33,13 @@ lib_libopenvswitchavx512_la_CFLAGS = \
> > >   -mavx512f \
> > >   -mavx512bw \
> > >   -mavx512dq \
> > > + -mbmi \
> > >   -mbmi2 \
> > >   -fPIC \
> > >   $(AM_CFLAGS)
> > >  lib_libopenvswitchavx512_la_SOURCES = \
> > > - lib/dpif-netdev-lookup-avx512-gather.c
> > > + lib/dpif-netdev-lookup-avx512-gather.c \
> > > + lib/dpif-netdev-avx512.c
> > >  lib_libopenvswitchavx512_la_LDFLAGS = \
> > >   -static
> > >  endif
> > > @@ -114,6 +116,7 @@ lib_libopenvswitch_la_SOURCES = \
> > >   lib/dpif-netdev-private-dfc.c \
> > >   lib/dpif-netdev-private-dfc.h \
> > >   lib/dpif-netdev-private-dpcls.h \
> > > + lib/dpif-netdev-private-dpif.h \
> > >   lib/dpif-netdev-private-flow.h \
> > >   lib/dpif-netdev-private-hwol.h \
> > >   lib/dpif-netdev-private-thread.h \
> > > diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
> > > new file mode 100644
&

Re: [ovs-dev] [v13 04/12] dpif-avx512: Add ISA implementation of dpif.

2021-06-20 Thread Flavio Leitner



Hi,

I am still reviewing the patch, but I thought it worth discussing
a few items below.

On Thu, Jun 17, 2021 at 05:18:17PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren 
> 
> This commit adds the AVX512 implementation of DPIF functionality,
> specifically the dp_netdev_input_outer_avx512 function. This function
> only handles outer (no re-circulations), and is optimized to use the
> AVX512 ISA for packet batching and other DPIF work.
> 
> Sparse is not able to handle the AVX512 intrinsics, causing compile
> time failures, so it is disabled for this file.
> 
> Signed-off-by: Harry van Haaren 
> Co-authored-by: Cian Ferriter 
> Signed-off-by: Cian Ferriter 
> Co-authored-by: Kumar Amber 
> Signed-off-by: Kumar Amber 
> 
> ---
> 
> v13:
> - Squash "Add HWOL support" commit into this commit.
> - Add NEWS item about this feature here rather than in a later commit.
> - Add #define NUM_U64_IN_ZMM_REG 8.
> - Add comment describing operation of while loop handling HWOL->EMC->SMC
>   lookups in dp_netdev_input_outer_avx512().
> - Add EMC and SMC batch insert functions for better handling of EMC and
>   SMC in AVX512 DPIF.
> - Minor code refactor to address review comments.
> ---
>  NEWS |   2 +
>  lib/automake.mk  |   5 +-
>  lib/dpif-netdev-avx512.c | 327 +++
>  lib/dpif-netdev-private-dfc.h|  25 +++
>  lib/dpif-netdev-private-dpif.h   |  32 +++
>  lib/dpif-netdev-private-thread.h |  11 +-
>  lib/dpif-netdev-private.h|  25 +++
>  lib/dpif-netdev.c| 103 --
>  8 files changed, 514 insertions(+), 16 deletions(-)
>  create mode 100644 lib/dpif-netdev-avx512.c
>  create mode 100644 lib/dpif-netdev-private-dpif.h
> 
> diff --git a/NEWS b/NEWS
> index 96b3a61c8..6a4a7b76d 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -10,6 +10,8 @@ Post-v2.15.0
>   * Auto load balancing of PMDs now partially supports cross-NUMA polling
> cases, e.g if all PMD threads are running on the same NUMA node.
>   * Refactor lib/dpif-netdev.c to multiple header files.
> + * Add avx512 implementation of dpif which can process non recirculated
> +   packets. It supports partial HWOL, EMC, SMC and DPCLS lookups.
> - ovs-ctl:
>   * New option '--no-record-hostname' to disable hostname configuration
> in ovsdb on startup.
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 3a33cdd5c..660cd07f0 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -33,11 +33,13 @@ lib_libopenvswitchavx512_la_CFLAGS = \
>   -mavx512f \
>   -mavx512bw \
>   -mavx512dq \
> + -mbmi \
>   -mbmi2 \
>   -fPIC \
>   $(AM_CFLAGS)
>  lib_libopenvswitchavx512_la_SOURCES = \
> - lib/dpif-netdev-lookup-avx512-gather.c
> + lib/dpif-netdev-lookup-avx512-gather.c \
> + lib/dpif-netdev-avx512.c
>  lib_libopenvswitchavx512_la_LDFLAGS = \
>   -static
>  endif
> @@ -114,6 +116,7 @@ lib_libopenvswitch_la_SOURCES = \
>   lib/dpif-netdev-private-dfc.c \
>   lib/dpif-netdev-private-dfc.h \
>   lib/dpif-netdev-private-dpcls.h \
> + lib/dpif-netdev-private-dpif.h \
>   lib/dpif-netdev-private-flow.h \
>   lib/dpif-netdev-private-hwol.h \
>   lib/dpif-netdev-private-thread.h \
> diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
> new file mode 100644
> index 0..0e55b0be2
> --- /dev/null
> +++ b/lib/dpif-netdev-avx512.c
> @@ -0,0 +1,327 @@
> +/*
> + * Copyright (c) 2021 Intel Corporation.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#ifdef __x86_64__
> +/* Sparse cannot handle the AVX512 instructions. */
> +#if !defined(__CHECKER__)
> +
> +#include 
> +
> +#include "dpif-netdev.h"
> +#include "dpif-netdev-perf.h"
> +
> +#include "dpif-netdev-private.h"
> +#include "dpif-netdev-private-dpcls.h"
> +#include "dpif-netdev-private-flow.h"
> +#include "dpif-netdev-private-thread.h"
> +#include "dpif-netdev-private-hwol.h"

The -private.h already includes a few of the above, but
not all, so the interface doesn't seem to be well defined.
For example, in -private.h we have dpcls_lookup() while
other dpcls functions are in -private-dpcls.h. In this
case, the following would be enough:

#include "dpif-netdev-private.h"
#include "dpif-netdev-private-hwol.h"

But then I don't know why other headers are included in the
interface but not the -private-hwol.h.


> +
> +#include "dp-packet.h"
> +#include

Re: [ovs-dev] [v13 03/12] dpif-netdev: Add function pointer for netdev input.

2021-06-18 Thread Flavio Leitner



Hello,

On Thu, Jun 17, 2021 at 05:18:16PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren 
> 
> This commit adds a function pointer to the pmd thread data structure,
> giving the pmd thread flexibility in its dpif-input function choice.
> This allows choosing of the implementation based on ISA capabilities
> of the runtime CPU, leading to optimizations and higher performance.
> 
> Signed-off-by: Harry van Haaren 
> 
> ---
> 
> v13:
> - Minor code refactor to address review comments.
> ---
>  lib/dpif-netdev-private-thread.h | 13 +
>  lib/dpif-netdev.c|  7 ++-
>  2 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h
> index 5e5308b96..0d674ab83 100644
> --- a/lib/dpif-netdev-private-thread.h
> +++ b/lib/dpif-netdev-private-thread.h
> @@ -47,6 +47,13 @@ struct dp_netdev_pmd_thread_ctx {
>  uint32_t emc_insert_min;
>  };
>  
> +/* Forward declaration for typedef. */
> +struct dp_netdev_pmd_thread;
> +
> +typedef void (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd,
> + struct dp_packet_batch *packets,
> + odp_port_t port_no);
> +
>  /* PMD: Poll modes drivers.  PMD accesses devices via polling to eliminate
>   * the performance overhead of interrupt processing.  Therefore netdev can
>   * not implement rx-wait for these devices.  dpif-netdev needs to poll
> @@ -101,6 +108,12 @@ struct dp_netdev_pmd_thread {
>  /* Current context of the PMD thread. */
>  struct dp_netdev_pmd_thread_ctx ctx;
>  
> +/* Function pointer to call for dp_netdev_input() functionality. */
> +dp_netdev_input_func netdev_input_func;
> +
> +/* Pointer for per-DPIF implementation scratch space. */
> +void *netdev_input_func_userdata;
> +

I see you need to switch the input function and the patch looks fine, but
the default function doesn't require netdev_input_func_userdata. I think
it would be better to add that in the next patch where it is actually
required.
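To make the dispatch pattern in this patch concrete, here is a minimal, self-contained C sketch: a per-thread function pointer initialised to a scalar default and called once per batch. The types (`pmd_thread`, `packet_batch`) and the packet counters are simplified, hypothetical stand-ins for `dp_netdev_pmd_thread` and `dp_packet_batch`. In line with the comment above, the sketch carries no userdata pointer, since the default path does not need one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the OVS types. */
struct packet_batch { size_t count; };
struct pmd_thread;

typedef void (*netdev_input_func)(struct pmd_thread *pmd,
                                  struct packet_batch *packets,
                                  unsigned int port_no);

struct pmd_thread {
    netdev_input_func input_func;  /* Chosen per thread at configure time. */
    size_t scalar_pkts;            /* Counters only to observe dispatch. */
    size_t vector_pkts;
};

static void
input_scalar(struct pmd_thread *pmd, struct packet_batch *packets,
             unsigned int port_no)
{
    (void) port_no;
    pmd->scalar_pkts += packets->count;
}

static void
input_vector(struct pmd_thread *pmd, struct packet_batch *packets,
             unsigned int port_no)
{
    (void) port_no;
    pmd->vector_pkts += packets->count;
}

/* Plays the role of dp_netdev_configure_pmd(): start with the default. */
static void
pmd_configure(struct pmd_thread *pmd)
{
    pmd->input_func = input_scalar;
    pmd->scalar_pkts = pmd->vector_pkts = 0;
}

/* Plays the role of the call site in dp_netdev_process_rxq_port(). */
static void
pmd_process_batch(struct pmd_thread *pmd, struct packet_batch *batch,
                  unsigned int port_no)
{
    assert(pmd->input_func != NULL);
    pmd->input_func(pmd, batch, port_no);
}
```

Swapping implementations is then just a pointer store on the thread; the per-batch call site does not change.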

fbl

>  struct seq *reload_seq;
>  uint64_t last_reload_seq;
>  
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index e913f4efc..e6486417e 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -4231,8 +4231,9 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
>  }
>  }
>  }
> +
>  /* Process packet batch. */
> -dp_netdev_input(pmd, , port_no);
> +pmd->netdev_input_func(pmd, , port_no);
>  
>  /* Assign processing cycles to rx queue. */
>  cycles = cycle_timer_stop(>perf_stats, );
> @@ -6029,6 +6030,10 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
>  hmap_init(>tnl_port_cache);
>  hmap_init(>send_port_cache);
>  cmap_init(>tx_bonds);
> +
> +/* Initialize the DPIF function pointer to the default scalar version. */
> +pmd->netdev_input_func = dp_netdev_input;
> +
>  /* init the 'flow_cache' since there is no
>   * actual thread created for NON_PMD_CORE_ID. */
>  if (core_id == NON_PMD_CORE_ID) {
> -- 
> 2.32.0
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

-- 
fbl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [v13 02/12] dpif-netdev: Split HWOL out to own header file.

2021-06-18 Thread Flavio Leitner



Hello,

I have some comments inline.

On Thu, Jun 17, 2021 at 05:18:15PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren 
> 
> This commit moves the datapath lookup functions required for
> hardware offload to a seperate file. This allows other DPIF


Spelling error.

> implementations to access the lookup functions, encouraging
> code reuse.
> 
> Signed-off-by: Harry van Haaren 
> 
> ---
> 
> Cc: Gaetan Rivet 
> Cc: Sriharsha Basavapatna 
> 
> v13:
> - Minor code refactor to address review comments.
> ---
>  lib/automake.mk|  1 +
>  lib/dpif-netdev-private-hwol.h | 63 ++
>  lib/dpif-netdev.c  | 38 ++--
>  3 files changed, 66 insertions(+), 36 deletions(-)
>  create mode 100644 lib/dpif-netdev-private-hwol.h
> 
> diff --git a/lib/automake.mk b/lib/automake.mk
> index fdba3c6c0..3a33cdd5c 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -115,6 +115,7 @@ lib_libopenvswitch_la_SOURCES = \
>   lib/dpif-netdev-private-dfc.h \
>   lib/dpif-netdev-private-dpcls.h \
>   lib/dpif-netdev-private-flow.h \
> + lib/dpif-netdev-private-hwol.h \
>   lib/dpif-netdev-private-thread.h \
>   lib/dpif-netdev-private.h \
>   lib/dpif-netdev-perf.c \
> diff --git a/lib/dpif-netdev-private-hwol.h b/lib/dpif-netdev-private-hwol.h
> new file mode 100644
> index 0..b93297a74
> --- /dev/null
> +++ b/lib/dpif-netdev-private-hwol.h
> @@ -0,0 +1,63 @@
> +/*
> + * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2015 Nicira, Inc.
> + * Copyright (c) 2021 Intel Corporation.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#ifndef DPIF_NETDEV_PRIVATE_HWOL_H
> +#define DPIF_NETDEV_PRIVATE_HWOL_H 1
> +
> +#include "dpif-netdev-private-flow.h"
> +
> +#define MAX_FLOW_MARK   (UINT32_MAX - 1)
> +#define INVALID_FLOW_MARK   0
> +/* Zero flow mark is used to indicate the HW to remove the mark. A packet
> + * marked with zero mark is received in SW without a mark at all, so it
> + * cannot be used as a valid mark.
> + */
> +
> +struct megaflow_to_mark_data {
> +const struct cmap_node node;
> +ovs_u128 mega_ufid;
> +uint32_t mark;
> +};
> +
> +struct flow_mark {
> +struct cmap megaflow_to_mark;
> +struct cmap mark_to_flow;
> +struct id_pool *pool;
> +};
> +
> +/* allocated in dpif-netdev.c */
> +extern struct flow_mark flow_mark;
> +
> +static inline struct dp_netdev_flow *
> +mark_to_flow_find(const struct dp_netdev_pmd_thread *pmd,
> +  const uint32_t mark)
> +{
> +struct dp_netdev_flow *flow;
> +
> +CMAP_FOR_EACH_WITH_HASH (flow, mark_node, hash_int(mark, 0),
> + _mark.mark_to_flow) {
> +if (flow->mark == mark && flow->pmd_id == pmd->core_id &&
> +flow->dead == false) {
> +return flow;
> +}
> +}
> +
> +return NULL;
> +}

Wouldn't this be better in a separate .c file? Although struct
flow_mark is declared here, it is allocated in dpif-netdev.c, and
this header also carries a fairly large inline function. That seems
reason enough to start a .c file to me.
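One way to picture the suggested split is a header that only exports prototypes and a .c file that owns both the table and the lookup. The sketch below is a hypothetical simplification: `flow_stub` stands in for `dp_netdev_flow`, `mark_to_flow_add()` is an invented helper, and a flat array replaces the RCU-protected cmap, so it illustrates only the header/.c division, not the concurrency behaviour of the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ---- Would live in dpif-netdev-private-hwol.h (hypothetical layout) ---- */
struct flow_stub {            /* Simplified stand-in for dp_netdev_flow. */
    uint32_t mark;
    unsigned int pmd_id;
    bool dead;
};

/* Only prototypes are exported; the table itself stays in the .c file. */
struct flow_stub *mark_to_flow_find(unsigned int core_id, uint32_t mark);
void mark_to_flow_add(struct flow_stub *flow);

/* ---- Would live in a new dpif-netdev-private-hwol.c ---- */
#define MAX_STUB_FLOWS 16

static struct flow_stub *mark_to_flow[MAX_STUB_FLOWS];
static size_t n_mark_to_flow;

void
mark_to_flow_add(struct flow_stub *flow)
{
    assert(n_mark_to_flow < MAX_STUB_FLOWS);
    mark_to_flow[n_mark_to_flow++] = flow;
}

/* Same match rule as the inline version: mark, owning PMD and liveness. */
struct flow_stub *
mark_to_flow_find(unsigned int core_id, uint32_t mark)
{
    for (size_t i = 0; i < n_mark_to_flow; i++) {
        struct flow_stub *flow = mark_to_flow[i];

        if (flow->mark == mark && flow->pmd_id == core_id && !flow->dead) {
            return flow;
        }
    }
    return NULL;
}
```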

Thanks,
fbl

> +
> +
> +#endif /* dpif-netdev-private-hwol.h */
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index affeeacdc..e913f4efc 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -18,6 +18,7 @@
>  #include "dpif-netdev.h"
>  #include "dpif-netdev-private.h"
>  #include "dpif-netdev-private-dfc.h"
> +#include "dpif-netdev-private-hwol.h"
>  
>  #include 
>  #include 
> @@ -1953,26 +1954,8 @@ dp_netdev_pmd_find_dpcls(struct dp_netdev_pmd_thread *pmd,
>  return cls;
>  }
>  
> -#define MAX_FLOW_MARK   (UINT32_MAX - 1)
> -#define INVALID_FLOW_MARK   0
> -/* Zero flow mark is used to indicate the HW to remove the mark. A packet
> - * marked with zero mark is received in SW without a mark at all, so it
> - * cannot be used as a valid mark.
> - */
>  
> -struct megaflow_to_mark_data {
> -const struct cmap_node node;
> -ovs_u128 mega_ufid;
> -uint32_t mark;
> -};
> -
> -struct flow_mark {
> -struct cmap megaflow_to_mark;
> -struct cmap mark_to_flow;
> -struct id_pool *pool;
> -};
> -
> -static struct flow_mark flow_mark = {
> +struct flow_mark flow_mark = {
>  .megaflow_to_mark = CMAP_INITIALIZER,
>  .mark_to_flow = CMAP_INITIALIZER,
>  };
> @@ -2141,23 +2124,6 @@ flow_mark_flush(struct dp_netdev_pmd_thread *pmd)

Re: [ovs-dev] [v13 01/12] dpif-netdev: Refactor to multiple header files.

2021-06-18 Thread Flavio Leitner



Hello,

Some comments below.

On Thu, Jun 17, 2021 at 05:18:14PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren 
> 
> Split the very large file dpif-netdev.c and the datastructures
> it contains into multiple header files. Each header file is
> responsible for the datastructures of that component.
> 
> This logical split allows better reuse and modularity of the code,
> and makes the very large file dpif-netdev.c more manageable.
> 
> Due to dependencies between components, it is not possible to
> move components at a finer granularity than this patch.
> 
> To explain the dependencies better, eg:
> 
> DPCLS has no deps (from dpif-netdev.c file)
> FLOW depends on DPCLS (struct dpcls_rule)
> DFC depends on DPCLS (netdev_flow_key) and FLOW (netdev_flow_key)
> THREAD depends on DFC (struct dfc_cache)
> 
> DFC_PROC depends on THREAD (struct pmd_thread)
> 
> DPCLS lookup.h/c require only DPCLS
> DPCLS implementations require only dpif-netdev-lookup.h.
> - This change was made in 2.12 release with function pointers
> - This commit only refactors the name to "private-dpcls.h"
> 
> netdev_flow_key_equal_mf() is renamed to emc_flow_key_equal_mf().
> 
> Rename functions specific to dpcls from netdev_* namespace to the
> dpcls_* namespace, as they are only used by dpcls code.
> 
> 'inline' is added to dp_netdev_flow_hash() when its definition is
> moved, to fix a compiler error.
> 
> One valid checkpatch issue with the use of the
> EMC_FOR_EACH_POS_WITH_HASH() macro was fixed.
> 
> Signed-off-by: Harry van Haaren 
> Co-authored-by: Cian Ferriter 
> Signed-off-by: Cian Ferriter 
> 
> ---
> 
> Cc: Gaetan Rivet 
> Cc: Sriharsha Basavapatna 
> 
> v13:
> - Add NEWS item in this commit rather than later.
> - Add lib/dpif-netdev-private-dfc.c file and move non fast path dfc
>   related functions there.
> - Squash commit which renames functions specific to dpcls from netdev_*
>   namespace to the dpcls_* namespace, as they are only used by dpcls
>   code into this commit.
> - Minor fixes from review comments.
> ---
>  NEWS   |   1 +
>  lib/automake.mk|   5 +
>  lib/dpif-netdev-lookup-autovalidator.c |   1 -
>  lib/dpif-netdev-lookup-avx512-gather.c |   1 -
>  lib/dpif-netdev-lookup-generic.c   |   1 -
>  lib/dpif-netdev-lookup.h   |   2 +-
>  lib/dpif-netdev-private-dfc.c  | 110 +
>  lib/dpif-netdev-private-dfc.h  | 176 
>  lib/dpif-netdev-private-dpcls.h| 127 ++
>  lib/dpif-netdev-private-flow.h | 162 
>  lib/dpif-netdev-private-thread.h   | 206 ++
>  lib/dpif-netdev-private.h  | 100 +
>  lib/dpif-netdev.c  | 539 +
>  13 files changed, 811 insertions(+), 620 deletions(-)
>  create mode 100644 lib/dpif-netdev-private-dfc.c
>  create mode 100644 lib/dpif-netdev-private-dfc.h
>  create mode 100644 lib/dpif-netdev-private-dpcls.h
>  create mode 100644 lib/dpif-netdev-private-flow.h
>  create mode 100644 lib/dpif-netdev-private-thread.h
> 
> diff --git a/NEWS b/NEWS
> index ebba17b22..96b3a61c8 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -9,6 +9,7 @@ Post-v2.15.0
> - Userspace datapath:
>   * Auto load balancing of PMDs now partially supports cross-NUMA polling
> cases, e.g if all PMD threads are running on the same NUMA node.
> + * Refactor lib/dpif-netdev.c to multiple header files.
> - ovs-ctl:
>   * New option '--no-record-hostname' to disable hostname configuration
> in ovsdb on startup.
> diff --git a/lib/automake.mk b/lib/automake.mk
> index db9017591..fdba3c6c0 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -111,6 +111,11 @@ lib_libopenvswitch_la_SOURCES = \
>   lib/dpif-netdev-lookup-generic.c \
>   lib/dpif-netdev.c \
>   lib/dpif-netdev.h \
> + lib/dpif-netdev-private-dfc.c \
> + lib/dpif-netdev-private-dfc.h \
> + lib/dpif-netdev-private-dpcls.h \
> + lib/dpif-netdev-private-flow.h \
> + lib/dpif-netdev-private-thread.h \
>   lib/dpif-netdev-private.h \
>   lib/dpif-netdev-perf.c \
>   lib/dpif-netdev-perf.h \
> diff --git a/lib/dpif-netdev-lookup-autovalidator.c b/lib/dpif-netdev-lookup-autovalidator.c
> index 97b59fdd0..475e1ab1e 100644
> --- a/lib/dpif-netdev-lookup-autovalidator.c
> +++ b/lib/dpif-netdev-lookup-autovalidator.c
> @@ -17,7 +17,6 @@
>  #include 
>  #include "dpif-netdev.h"
>  #include "dpif-netdev-lookup.h"
> -#include "dpif-netdev-private.h"
>  #include "openvswitch/vlog.h"
>  
>  VLOG_DEFINE_THIS_MODULE(dpif_lookup_autovalidator);
> diff --git a/lib/dpif-netdev-lookup-avx512-gather.c b/lib/dpif-netdev-lookup-avx512-gather.c
> index 5e3634249..8fc1cdfa5 100644
> --- a/lib/dpif-netdev-lookup-avx512-gather.c
> +++ b/lib/dpif-netdev-lookup-avx512-gather.c
> @@ -21,7 +21,6 @@
>  
>  #include "dpif-netdev.h"
>  #include "dpif-netdev-lookup.h"
> -#include "dpif-netdev-private.h"
>  #include "cmap.h"

Re: [ovs-dev] [RFC 3/3] dpif-netlink: Introduce per-cpu upcall dispatch

2021-06-08 Thread Flavio Leitner



Hi Mark,

This looks good to me.

Since the new scheme doesn't allow users to change the number
of handlers, we must update ovs-vswitchd.conf.db(5) as well.

Some comments below.

On Fri, Apr 30, 2021 at 11:31:29AM -0400, Mark Gray wrote:
> The Open vSwitch kernel module uses the upcall mechanism to send
> packets from kernel space to user space when it misses in the kernel
> space flow table. The upcall sends packets via a Netlink socket.
> Currently, a Netlink socket is created for every vport. In this way,
> there is a 1:1 mapping between a vport and a Netlink socket.
> When a packet is received by a vport, if it needs to be sent to
> user space, it is sent via the corresponding Netlink socket.
> 
> This mechanism, with various iterations of the corresponding user
> space code, has seen some limitations and issues:
> 
> * On systems with a large number of vports, there is correspondingly
> a large number of Netlink sockets which can limit scaling.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
> * Packet reordering on upcalls.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
> * A thundering herd issue.
> (https://bugzilla.redhat.com/show_bug.cgi?id=183)
> 
> This patch introduces an alternative, feature-negotiated, upcall
> mode using a per-cpu dispatch rather than a per-vport dispatch.
> 
> In this mode, the Netlink socket to be used for the upcall is
> selected based on the CPU of the thread that is executing the upcall.
> In this way, it resolves the issues above as:
> 
> a) The number of Netlink sockets scales with the number of CPUs
> rather than the number of vports.
> b) Ordering per-flow is maintained as packets are distributed to
> CPUs based on mechanisms such as RSS and flows are distributed
> to a single user space thread.
> c) Packets from a flow can only wake up one user space thread.
> 
> Reported-at: https://bugzilla.redhat.com/1844576
> Signed-off-by: Mark Gray 
> ---
>  .../linux/compat/include/linux/openvswitch.h  |   7 +
>  lib/dpif-netdev.c |   1 +
>  lib/dpif-netlink.c| 405 +++---
>  lib/dpif-provider.h   |  10 +
>  lib/dpif.c|  17 +
>  lib/dpif.h|   1 +
>  ofproto/ofproto-dpif-upcall.c |  51 ++-
>  ofproto/ofproto.c |  12 -
>  8 files changed, 430 insertions(+), 74 deletions(-)
> 
> diff --git a/datapath/linux/compat/include/linux/openvswitch.h b/datapath/linux/compat/include/linux/openvswitch.h
> index 875de20250ce..f29265df055e 100644
> --- a/datapath/linux/compat/include/linux/openvswitch.h
> +++ b/datapath/linux/compat/include/linux/openvswitch.h
> @@ -89,6 +89,8 @@ enum ovs_datapath_cmd {
>   * set on the datapath port (for OVS_ACTION_ATTR_MISS).  Only valid on
>   * %OVS_DP_CMD_NEW requests. A value of zero indicates that upcalls should
>   * not be sent.
> + * OVS_DP_ATTR_PER_CPU_PIDS: Per-cpu array of PIDs for upcalls when
> + * OVS_DP_F_DISPATCH_UPCALL_PER_CPU feature is set.
>   * @OVS_DP_ATTR_STATS: Statistics about packets that have passed through the
>   * datapath.  Always present in notifications.
>   * @OVS_DP_ATTR_MEGAFLOW_STATS: Statistics about mega flow masks usage for the
> @@ -105,6 +107,8 @@ enum ovs_datapath_attr {
>   OVS_DP_ATTR_MEGAFLOW_STATS, /* struct ovs_dp_megaflow_stats */
>   OVS_DP_ATTR_USER_FEATURES,  /* OVS_DP_F_*  */
>   OVS_DP_ATTR_PAD,
> + OVS_DP_ATTR_PAD2,
> + OVS_DP_ATTR_PER_CPU_PIDS,   /* Netlink PIDS to receive upcalls */
>   __OVS_DP_ATTR_MAX
>  };
>  
> @@ -146,6 +150,9 @@ struct ovs_vport_stats {
>  /* Allow tc offload recirc sharing */
>  #define OVS_DP_F_TC_RECIRC_SHARING  (1 << 2)
>  
> +/* Allow per-cpu dispatch of upcalls */
> +#define OVS_DP_F_DISPATCH_UPCALL_PER_CPU (1 << 3)
> +
>  /* Fixed logical ports. */
>  #define OVSP_LOCAL  ((__u32)0)
>  
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 251788b04965..24e6911dd4ff 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -8488,6 +8488,7 @@ const struct dpif_class dpif_netdev_class = {
>  dpif_netdev_operate,
>  NULL,   /* recv_set */
>  NULL,   /* handlers_set */
> +NULL,   /* handlers_get */

The comment should read number_handlers_required.

>  dpif_netdev_set_config,
>  dpif_netdev_queue_to_priority,
>  NULL,   /* recv */
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index 2ded5fdd01b3..349897e70632 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -80,6 +80,9 @@ enum { MAX_PORTS = USHRT_MAX };
>  #define FLOW_DUMP_MAX_BATCH 50
>  #define OPERATE_MAX_OPS 50
>  
> +#define DISPATCH_MODE_PER_CPU(dpif) ((dpif)->user_features & \
> + OVS_DP_F_DISPATCH_UPCALL_PER_CPU)
> +

Perhaps a function like this:
static inline bool
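The suggestion above is cut off in this archive. As a minimal sketch only, an inline function replacing the DISPATCH_MODE_PER_CPU() macro could look like the following. The struct and function names are hypothetical stand-ins; only the flag value comes from the quoted openvswitch.h hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag value taken from the quoted openvswitch.h hunk. */
#define OVS_DP_F_DISPATCH_UPCALL_PER_CPU (1 << 3)

/* Hypothetical minimal stand-in for struct dpif_netlink. */
struct dpif_netlink_sketch {
    uint32_t user_features;   /* OVS_DP_F_* bits negotiated with the kernel. */
};

/* Hypothetical helper: returns true when upcalls are dispatched per CPU.
 * Unlike the macro, the argument type is checked by the compiler and the
 * result is a real bool rather than the masked integer value. */
static inline bool
dpif_netlink_per_cpu_dispatch(const struct dpif_netlink_sketch *dpif)
{
    assert(dpif != NULL);
    return (dpif->user_features & OVS_DP_F_DISPATCH_UPCALL_PER_CPU) != 0;
}
```

The main gain over the macro is type safety. A caller passing the wrong struct gets a compile error instead of silently reading some other `user_features` field.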

Re: [ovs-dev] [RFC 1/3] ofproto: change type of n_handlers and n_revalidators

2021-05-28 Thread Flavio Leitner

On Fri, Apr 30, 2021 at 11:31:27AM -0400, Mark Gray wrote:
> 'n_handlers' and 'n_revalidators' are declared as type 'size_t'.
> However, dpif_handlers_set() requires parameter 'n_handlers' as
> type 'uint32_t'. This patch fixes this type mismatch.

The change looks good, but I didn't understand the criteria used
to do the change. For example, at udpif_stop_threads() you changed
from 'size_t' to 'uint32_t', but variable 'i' is not required
to be of the same type (marked in line below). However, I could
find other similar cases left unchanged.
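To illustrate the point about the loop index: in a comparison such as `i < n_handlers`, the usual arithmetic conversions bring both operands to a common unsigned type at least as wide as `size_t`. A `size_t` index therefore works with a `uint32_t` bound. The mismatch the patch targets is the implicit narrowing when a `size_t` count is passed to a `uint32_t` parameter. In this small sketch, `handlers_set_sketch()` is a hypothetical stand-in for dpif_handlers_set().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a callee that, like dpif_handlers_set(),
 * takes the handler count as uint32_t.  A size_t argument is converted
 * implicitly, keeping only its low 32 bits. */
static uint32_t
handlers_set_sketch(uint32_t n_handlers)
{
    return n_handlers;
}

/* size_t loop index against a uint32_t bound: both operands end up in a
 * common unsigned type before comparing, so the index does not need to
 * have the same type as the bound. */
static size_t
count_handlers(uint32_t n_handlers)
{
    size_t count = 0;

    for (size_t i = 0; i < n_handlers; i++) {
        count++;
    }
    return count;
}
```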

fbl

> 
> Signed-off-by: Mark Gray 
> ---
>  ofproto/ofproto-dpif-upcall.c | 24 
>  ofproto/ofproto-dpif-upcall.h |  5 +++--
>  ofproto/ofproto-provider.h|  2 +-
>  ofproto/ofproto.c |  2 +-
>  4 files changed, 17 insertions(+), 16 deletions(-)
> 
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index ccf97266c0b9..88406fea1391 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -129,10 +129,10 @@ struct udpif {
>  struct dpif_backer *backer;/* Opaque dpif_backer pointer. */
>  
>  struct handler *handlers;  /* Upcall handlers. */
> -size_t n_handlers;
> +uint32_t n_handlers;
>  
>  struct revalidator *revalidators;  /* Flow revalidators. */
> -size_t n_revalidators;
> +uint32_t n_revalidators;
>  
>  struct latch exit_latch;   /* Tells child threads to exit. */
>  
> @@ -335,8 +335,8 @@ static int process_upcall(struct udpif *, struct upcall *,
>struct ofpbuf *odp_actions, struct flow_wildcards *);
>  static void handle_upcalls(struct udpif *, struct upcall *, size_t n_upcalls);
>  static void udpif_stop_threads(struct udpif *, bool delete_flows);
> -static void udpif_start_threads(struct udpif *, size_t n_handlers,
> -size_t n_revalidators);
> +static void udpif_start_threads(struct udpif *, uint32_t n_handlers,
> +uint32_t n_revalidators);
>  static void udpif_pause_revalidators(struct udpif *);
>  static void udpif_resume_revalidators(struct udpif *);
>  static void *udpif_upcall_handler(void *);
> @@ -522,7 +522,7 @@ static void
>  udpif_stop_threads(struct udpif *udpif, bool delete_flows)
>  {
>  if (udpif && (udpif->n_handlers != 0 || udpif->n_revalidators != 0)) {
> -size_t i;
> +uint32_t i;



>  
>  /* Tell the threads to exit. */
>  latch_set(&udpif->exit_latch);
> @@ -562,8 +562,8 @@ udpif_stop_threads(struct udpif *udpif, bool delete_flows)
>  
>  /* Starts the handler and revalidator threads. */
>  static void
> -udpif_start_threads(struct udpif *udpif, size_t n_handlers_,
> -size_t n_revalidators_)
> +udpif_start_threads(struct udpif *udpif, uint32_t n_handlers_,
> +uint32_t n_revalidators_)
>  {
>  if (udpif && n_handlers_ && n_revalidators_) {
>  /* Creating a thread can take a significant amount of time on some
> @@ -574,7 +574,7 @@ udpif_start_threads(struct udpif *udpif, size_t n_handlers_,
>  udpif->n_revalidators = n_revalidators_;
>  
>  udpif->handlers = xzalloc(udpif->n_handlers * sizeof *udpif->handlers);
> -for (size_t i = 0; i < udpif->n_handlers; i++) {
> +for (uint32_t i = 0; i < udpif->n_handlers; i++) {
>  struct handler *handler = &udpif->handlers[i];
>  
>  handler->udpif = udpif;
> @@ -632,8 +632,8 @@ udpif_resume_revalidators(struct udpif *udpif)
>   * datapath handle must have packet reception enabled before starting
>   * threads. */
>  void
> -udpif_set_threads(struct udpif *udpif, size_t n_handlers_,
> -  size_t n_revalidators_)
> +udpif_set_threads(struct udpif *udpif, uint32_t n_handlers_,
> +  uint32_t n_revalidators_)
>  {
>  ovs_assert(udpif);
>  ovs_assert(n_handlers_ && n_revalidators_);
> @@ -691,8 +691,8 @@ udpif_get_memory_usage(struct udpif *udpif, struct simap *usage)
>  void
>  udpif_flush(struct udpif *udpif)
>  {
> -size_t n_handlers_ = udpif->n_handlers;
> -size_t n_revalidators_ = udpif->n_revalidators;
> +uint32_t n_handlers_ = udpif->n_handlers;
> +uint32_t n_revalidators_ = udpif->n_revalidators;
>  
>  udpif_stop_threads(udpif, true);
>  dpif_flow_flush(udpif->dpif);
> diff --git a/ofproto/ofproto-dpif-upcall.h b/ofproto/ofproto-dpif-upcall.h
> index 693107ae56c1..b4dfed32046e 100644
> --- a/ofproto/ofproto-dpif-upcall.h
> +++ b/ofproto/ofproto-dpif-upcall.h
> @@ -16,6 +16,7 @@
>  #define OFPROTO_DPIF_UPCALL_H
>  
>  #include 
> +#include 
>  
>  struct dpif;
>  struct dpif_backer;
> @@ -31,8 +32,8 @@ struct simap;
>  void udpif_init(void);
>  struct udpif *udpif_create(struct dpif_backer *, struct dpif *);
>  void udpif_run(struct udpif *udpif);
> -void udpif_set_threads(struct udpif *, size_t n_handlers,
> -

Re: [ovs-dev] [RFC 2/3] dpif-netlink: fix report_loss() message

2021-05-28 Thread Flavio Leitner

On Fri, Apr 30, 2021 at 11:31:28AM -0400, Mark Gray wrote:
> Signed-off-by: Mark Gray 

This looks like a bug fix for this commit:
1579cf677fcb dpif-linux: Implement the API functions to allow multiple ...

If you agree, please add the Fixes: tag.
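The quoted fix appends a `%s` to the format string so that the detail text built earlier in report_loss() is actually printed. Here is a self-contained sketch of the same pattern using plain snprintf() instead of OVS's `struct ds` and VLOG_WARN(). The function name and the exact wording of the detail text are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring the fixed VLOG_WARN() call: 'detail'
 * plays the role of the dynamic string report_loss() builds.  Before
 * the fix the format had no trailing %s, so that text was dropped. */
static int
format_loss_msg(char *out, size_t out_size, const char *dp_name,
                unsigned int ch_idx, unsigned int handler_id,
                long long last_poll_ms_ago)
{
    char detail[64] = "";

    assert(out != NULL && out_size > 0);
    if (last_poll_ms_ago >= 0) {
        /* Assumed wording of the extra detail. */
        snprintf(detail, sizeof detail, " (last polled %lld ms ago)",
                 last_poll_ms_ago);
    }
    return snprintf(out, out_size,
                    "%s: lost packet on port channel %u of handler %u%s",
                    dp_name, ch_idx, handler_id, detail);
}
```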

fbl

> ---
>  lib/dpif-netlink.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index 50520f8c0687..2ded5fdd01b3 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -4662,7 +4662,7 @@ report_loss(struct dpif_netlink *dpif, struct dpif_channel *ch, uint32_t ch_idx,
>time_msec() - ch->last_poll);
>  }
>  
> -VLOG_WARN("%s: lost packet on port channel %u of handler %u",
> -  dpif_name(&dpif->dpif), ch_idx, handler_id);
> +VLOG_WARN("%s: lost packet on port channel %u of handler %u%s",
> +  dpif_name(&dpif->dpif), ch_idx, handler_id, ds_cstr());
>  ds_destroy();
>  }
> -- 
> 2.27.0
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

-- 
fbl

Re: [ovs-dev] [RFC net-next] openvswitch: Introduce per-cpu upcall dispatch

2021-05-28 Thread Flavio Leitner



Hi Mark,

I think this patch is going in the right direction but there
are some points that I think we should address. See below.

On Fri, Apr 30, 2021 at 11:33:25AM -0400, Mark Gray wrote:
> The Open vSwitch kernel module uses the upcall mechanism to send
> packets from kernel space to user space when it misses in the kernel
> space flow table. The upcall sends packets via a Netlink socket.
> Currently, a Netlink socket is created for every vport. In this way,
> there is a 1:1 mapping between a vport and a Netlink socket.
> When a packet is received by a vport, if it needs to be sent to
> user space, it is sent via the corresponding Netlink socket.
> 
> This mechanism, with various iterations of the corresponding user
> space code, has seen some limitations and issues:
> 
> * On systems with a large number of vports, there is a correspondingly
> large number of Netlink sockets which can limit scaling.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
> * Packet reordering on upcalls.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
> * A thundering herd issue.
> (https://bugzilla.redhat.com/show_bug.cgi?id=183)
> 
> This patch introduces an alternative, feature-negotiated, upcall
> mode using a per-cpu dispatch rather than a per-vport dispatch.
> 
> In this mode, the Netlink socket to be used for the upcall is
> selected based on the CPU of the thread that is executing the upcall.
> In this way, it resolves the issues above as:
> 
> a) The number of Netlink sockets scales with the number of CPUs
> rather than the number of vports.
> b) Ordering per-flow is maintained as packets are distributed to
> CPUs based on mechanisms such as RSS and flows are distributed
> to a single user space thread.
> c) Packets from a flow can only wake up one user space thread.
> 
> The corresponding user space code can be found at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/382618.html

Thanks for writing a nice commit description.
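The per-CPU selection described in the commit message can be sketched in user-space C as an array of Netlink PIDs indexed by the CPU that runs the upcall. This is a simplified model, not the kernel code. The struct and function names are hypothetical. Wrapping with a modulo when there are more CPUs than PIDs is an assumption about how a mismatch might be handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the per-datapath PID array that user space
 * would hand over via OVS_DP_ATTR_PER_CPU_PIDS. */
struct upcall_pids_sketch {
    const uint32_t *pids;   /* One Netlink socket PID per handler. */
    uint32_t n_pids;
};

/* Picks the socket for an upcall raised on 'cpu_id'.  Returns 0, meaning
 * "no socket", when no PIDs are configured. */
static uint32_t
upcall_portid_for_cpu(const struct upcall_pids_sketch *p, uint32_t cpu_id)
{
    assert(p != NULL);
    if (p->n_pids == 0) {
        return 0;
    }
    /* Assumed fallback: wrap around when CPUs outnumber PIDs. */
    return p->pids[cpu_id % p->n_pids];
}
```

The choice depends only on the CPU. Packets that RSS steers to one CPU therefore all reach the same socket and wake at most one handler thread. This is the ordering and thundering-herd argument of points b) and c) above.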

> 
> Bugzilla: https://bugzilla.redhat.com/1844576
> Signed-off-by: Mark Gray 
> ---
>  include/uapi/linux/openvswitch.h |  8 
>  net/openvswitch/datapath.c   | 70 +++-
>  net/openvswitch/datapath.h   | 18 
>  net/openvswitch/flow_netlink.c   |  4 --
>  4 files changed, 94 insertions(+), 6 deletions(-)
> 
> diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
> index 8d16744edc31..6571b57b2268 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -70,6 +70,8 @@ enum ovs_datapath_cmd {
>   * set on the datapath port (for OVS_ACTION_ATTR_MISS).  Only valid on
>   * %OVS_DP_CMD_NEW requests. A value of zero indicates that upcalls should
>   * not be sent.
> + * OVS_DP_ATTR_PER_CPU_PIDS: Per-cpu array of PIDs for upcalls when
> + * OVS_DP_F_DISPATCH_UPCALL_PER_CPU feature is set.
>   * @OVS_DP_ATTR_STATS: Statistics about packets that have passed through the
>   * datapath.  Always present in notifications.
>   * @OVS_DP_ATTR_MEGAFLOW_STATS: Statistics about mega flow masks usage for the
> @@ -87,6 +89,9 @@ enum ovs_datapath_attr {
>   OVS_DP_ATTR_USER_FEATURES,  /* OVS_DP_F_*  */
>   OVS_DP_ATTR_PAD,
>   OVS_DP_ATTR_MASKS_CACHE_SIZE,
> + OVS_DP_ATTR_PER_CPU_PIDS,   /* Netlink PIDS to receive upcalls in per-cpu
> +  * dispatch mode
> +  */
>   __OVS_DP_ATTR_MAX
>  };
>  
> @@ -127,6 +132,9 @@ struct ovs_vport_stats {
>  /* Allow tc offload recirc sharing */
>  #define OVS_DP_F_TC_RECIRC_SHARING   (1 << 2)
>  
> +/* Allow per-cpu dispatch of upcalls */
> +#define OVS_DP_F_DISPATCH_UPCALL_PER_CPU (1 << 3)
> +
>  /* Fixed logical ports. */
>  #define OVSP_LOCAL  ((__u32)0)
>  
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 9d6ef6cb9b26..98d54f41fdaa 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -121,6 +121,8 @@ int lockdep_ovsl_is_held(void)
>  #endif
>  
>  static struct vport *new_vport(const struct vport_parms *);
> +static u32 ovs_dp_get_upcall_portid(const struct datapath *, uint32_t);
> +static int ovs_dp_set_upcall_portids(struct datapath *, const struct nlattr *);
>  static int queue_gso_packets(struct datapath *dp, struct sk_buff *,
>const struct sw_flow_key *,
>const struct dp_upcall_info *,
> @@ -238,7 +240,12 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key)
>  
>   memset(&upcall, 0, sizeof(upcall));
>   upcall.cmd = OVS_PACKET_CMD_MISS;
> - upcall.portid = ovs_vport_find_upcall_portid(p, skb);
> +
> + if (dp->user_features & OVS_DP_F_DISPATCH_UPCALL_PER_CPU)
> + upcall.portid = ovs_dp_get_upcall_portid(dp, smp_processor_id());
> + else
> + upcall.portid = ovs_vport_find_upcall_portid(p,

Re: [ovs-dev] Moving of the primary #openvswitch channel to irc.libera.chat ?

2021-05-20 Thread Flavio Leitner

On Wed, May 19, 2021 at 01:22:58PM -0700, Ben Pfaff wrote:
> On Wed, May 19, 2021 at 10:03:57PM +0200, Ilya Maximets wrote:
> > Hi.
> > 
> > Taking into account some very unhealthy things that happened recently
> > with FreeNode network and resigning of lots of its stuff [1], we
> > probably need to discuss if Open vSwitch project wants to change the
> > IRC server for a primary #openvswitch channel.  User's data is the
> > main concern, IIUC, as it's unclear what the new management will
> > do with the network.
> > 
> > The main alternative now seems to be Libera.Chat [2], where most of
> > the former FreeNode staff went.
> > 
> > Some projects already announced [3][4] movement to Libera.Chat.  Others
> > are discussing the possibility [5].
> > 
> > So, I think it makes sense to discuss the future of the #openvswitch
> > channel too.  Any thoughts?
> > 
> > Will we have an OVN meeting on a different server tomorrow?
> 
> Thanks for sending such a detailed message (with footnotes!).
> I think I support the move.  I have already registered the #openvswitch
> channel on libera.chat (and copied the topic across).

Great! Thanks.


> I'm going to log in on both servers tomorrow but I suggest that we
> transition to libera.chat after tomorrow's meeting, since this is pretty
> short notice and I think that most of us (including me) are only light
> users of IRC.
> 
> Also, I suspect that the infrastructure for recording meetings has not
> yet moved to libera.chat.  (I did notice that the 'openstack' user just
> dropped out of #openvswitch.  I think that's the bot that did the
> meeting recordings.  So maybe that infrastructure is moving across right
> as I write.)

When things are ported, please change freenode's #openvswitch topic
to say that we are moving to the new network.

Usually the 'topic' is displayed when someone joins the channel, so
it could help users to find out about the change.

-- 
fbl

Re: [ovs-dev] [RFC 0/3] dpif-netlink: Introduce per-cpu upcall dispatching

2021-05-11 Thread Flavio Leitner

Hi,

On Fri, Apr 30, 2021 at 11:31:26AM -0400, Mark Gray wrote:
> This series proposes a new method of distributing upcalls
> to user space threads attempting to resolve a number of
> issues with the current method.
> 

I ran some tests with the old v10, current master, and this RFC,
including the kernel (based on 5.11.0), on a 28-core system.

The old v10 had the issue of not scaling up in case of a high
load of upcalls. The test sends a burst of UDP packets which
causes upcalls. The table below shows how many packets could
be sent without increasing the upcall loss counter.
          v10    master   rfc
packets   2k5    >55k     10k

So, the test reproduced the old v10 value. For master, the exact
value could not be determined due to a test limitation; it is at
least above 55k (last time, I think, it was 63k). The RFC patch
resulted in a better number than v10, even though the test should
be using only one thread, as with v10. I think that keeping the
CPU context could explain the difference.

Running the test with 8 parallel threads sending one burst of
UDP packets each resulted in the following table:
  Branch   missed   lost
  v10      52018    50288
  master   52022    0
  RFC      52021    0

Now the wake ups, one thread:
  Branch   wake   processing
  master   20+    16+
  RFC      3      1

Column wake: number of different threads receiving
   sched:sched_wakeup or irq:softirq_entry.
Column processing: number of CPUs with double-digit
   usage.

And 8 parallel threads:
  Branch   wake   processing
  master   20+    20+
  RFC      10     8+

The results show that this new patch-set addressed the main
thundering herd issue and the scalability issue I reported
during V10 review.

Unfortunately I can review the patches only next week.

Thanks,
fbl

Re: [ovs-dev] [PATCH v2] flow: Count and dump invalid IP packets.

2021-04-16 Thread Flavio Leitner

On Fri, Apr 16, 2021 at 02:06:31PM +0200, David Marchand wrote:
> Skipping further processing of invalid IP packets helps avoid crashes
> but it does not help to figure out if the malformed packets are still
> present on the network.
> 
> Add coverage counters for IPv4 and IPv6 sanity checks so that we know
> there are some invalid packets.
> 
> Dump such whole packets in debug mode.
> 
> Signed-off-by: David Marchand 
> Acked-by: Eelco Chaudron 
> ---

The patch looks good to me.

Generated log dumping the packet correctly:
2021-04-16T12:37:25.525Z|4|flow(handler21)|DBG|invalid packet for ipv6_sanity_check: port 1, size 86
0000  33 33 ff 00 00 02 7a d0-49 c1 c0 e9 86 dd 60 00
0010  00 00 00 21 3a ff fe 80-00 00 00 00 00 00 78 d0
0020  49 ff fe c1 c0 e9 ff 02-00 00 00 00 00 00 00 00
0030  00 01 ff 00 00 02 87 00-74 a2 00 00 00 00 fe 80
0040  00 00 00 00 00 00 00 00-00 00 00 00 00 02 01 01
0050  7a d0 49 c1 c0 e9

# ovs-appctl coverage/show | grep miniflow
miniflow_extract_ipv6_pkt_len_error   0.0/sec 0.000/sec 0.0011/sec   total: 4
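The dumped frame can be checked by hand. Its IPv6 payload-length field (bytes 18-19 of the frame) is 0x0021 = 33. The header therefore claims 40 + 33 = 73 bytes of IPv6 data, but only 86 - 14 = 72 bytes follow the Ethernet header, which is the length error counted above. The frame carries ICMPv6 (next header 0x3a), and the 32 bytes after the IPv6 header are one short of the 33 claimed. The sketch below replays the quoted ipv6_sanity_check() length checks on the logged bytes. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HEADER_LEN 14    /* Untagged Ethernet header. */
#define IPV6_HEADER_LEN 40   /* Fixed IPv6 header. */

/* The 86-byte frame from the debug log above. */
static const uint8_t frame[86] = {
    0x33, 0x33, 0xff, 0x00, 0x00, 0x02, 0x7a, 0xd0,
    0x49, 0xc1, 0xc0, 0xe9, 0x86, 0xdd, 0x60, 0x00,
    0x00, 0x00, 0x00, 0x21, 0x3a, 0xff, 0xfe, 0x80,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x78, 0xd0,
    0x49, 0xff, 0xfe, 0xc1, 0xc0, 0xe9, 0xff, 0x02,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x01, 0xff, 0x00, 0x00, 0x02, 0x87, 0x00,
    0x74, 0xa2, 0x00, 0x00, 0x00, 0x00, 0xfe, 0x80,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x01, 0x01,
    0x7a, 0xd0, 0x49, 0xc1, 0xc0, 0xe9,
};

/* Replays the length checks of the quoted ipv6_sanity_check();
 * 'nh' points at the IPv6 header and 'size' counts bytes from there. */
static bool
ipv6_len_check_sketch(const uint8_t *nh, size_t size)
{
    size_t plen;

    if (size < IPV6_HEADER_LEN) {
        return false;                          /* Too short. */
    }
    plen = ((size_t) nh[4] << 8) | nh[5];      /* Payload length field. */
    if (plen + IPV6_HEADER_LEN > size) {
        return false;                          /* Length error. */
    }
    if (size - (plen + IPV6_HEADER_LEN) > UINT16_MAX) {
        return false;                          /* Length error. */
    }
    return true;
}
```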

Acked-by: Flavio Leitner 


Re: [ovs-dev] [PATCH] flow: Count and dump invalid IP packets.

2021-04-15 Thread Flavio Leitner

On Thu, Apr 15, 2021 at 01:32:06PM +0200, Eelco Chaudron wrote:
> 
> 
> On 15 Apr 2021, at 13:12, Flavio Leitner wrote:
> 
> > Hi Eelco,
> > 
> > On Thu, Apr 15, 2021 at 10:07:24AM +0200, Eelco Chaudron wrote:
> > > 
> > > 
> > > On 14 Apr 2021, at 17:50, David Marchand wrote:
> > > 
> > > > Skipping further processing of invalid IP packets helps avoid
> > > > crashes
> > > > but it does not help to figure out if the malformed packets are
> > > > still
> > > > present on the network.
> > > > 
> > > > Add coverage counters for IPv4 and IPv6 sanity checks so that we
> > > > know
> > > > there are some invalid packets.
> > > > 
> > > > Dump such whole packets in debug mode.
> > > 
> > > Looks good to me, some small nits below.
> > > 
> > > > Signed-off-by: David Marchand 
> > > > ---
> > > >  lib/flow.c | 42 ++
> > > >  1 file changed, 42 insertions(+)
> > > > 
> > > > diff --git a/lib/flow.c b/lib/flow.c
> > > > index 729d59b1b3..2b55244190 100644
> > > > --- a/lib/flow.c
> > > > +++ b/lib/flow.c
> > > > @@ -44,8 +44,15 @@
> > > >  #include "openvswitch/nsh.h"
> > > >  #include "ovs-router.h"
> > > >  #include "lib/netdev-provider.h"
> > > > +#include "openvswitch/vlog.h"
> > > > +
> > > > +VLOG_DEFINE_THIS_MODULE(flow);
> > > > 
> > > >  COVERAGE_DEFINE(flow_extract);
> > > > +COVERAGE_DEFINE(ipv4_check_too_short);
> > > > +COVERAGE_DEFINE(ipv4_check_length_error);
> > > > +COVERAGE_DEFINE(ipv6_check_too_short);
> > > > +COVERAGE_DEFINE(ipv6_check_length_error);
> > > 
> > > The check keyword is a bit confusing to me, maybe something like
> > > ipv4_pkt_too_short, etc.?
> > 
> > David may have a different idea, but to me this works to pinpoint
> > the packet failure to ipv*_sanity_check(). Perhaps the name could
> > be better. However, a generic name that can be used in more places
> > would make it harder to pinpoint.
> 
> Guess you have to do something like grep “COVERAGE_DEFINE()” anyway to
> figure out what it does due to lack of documentation on any coverage counter
> :)

Perhaps naming the counters like this would help:
miniflow_extract_ipv*_len_error
miniflow_extract_ipv*_too_short

It still requires you to know OVS internals though.
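To illustrate the suggested naming, the quoted IPv4 checks can be sketched with plain counters standing in for COVERAGE_DEFINE()/COVERAGE_INC(). This is not the patch's code. The counter names follow the suggestion above, and the function working on raw header bytes is an assumption made to keep the sketch self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IP_HEADER_LEN 20   /* Minimum IPv4 header size, defined locally. */

/* Plain counters standing in for COVERAGE_DEFINE()/COVERAGE_INC(),
 * named after the suggestion above. */
static unsigned int miniflow_extract_ipv4_too_short;
static unsigned int miniflow_extract_ipv4_len_error;

/* Replays the quoted ipv4_sanity_check() tests on raw header bytes;
 * 'size' is the number of bytes available from the IP header onward. */
static bool
ipv4_sanity_check_sketch(const uint8_t *hdr, size_t size)
{
    size_t ip_len, tot_len;

    if (size < IP_HEADER_LEN) {
        miniflow_extract_ipv4_too_short++;
        return false;
    }
    ip_len = (size_t) (hdr[0] & 0x0f) * 4;        /* IHL, in 32-bit words. */
    if (ip_len < IP_HEADER_LEN || size < ip_len) {
        miniflow_extract_ipv4_len_error++;
        return false;
    }
    tot_len = ((size_t) hdr[2] << 8) | hdr[3];    /* Total length, big endian. */
    if (tot_len > size || ip_len > tot_len || size - tot_len > UINT16_MAX) {
        miniflow_extract_ipv4_len_error++;
        return false;
    }
    return true;
}
```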

fbl

> 
> Maybe Ilya has some preference?
> 
> > > >  COVERAGE_DEFINE(miniflow_malloc);
> > > > 
> > > >  /* U64 indices for segmented flow classification. */
> > > > @@ -645,17 +652,20 @@ ipv4_sanity_check(const struct ip_header *nh, size_t size,
> > > >  uint16_t tot_len;
> > > > 
> > > >  if (OVS_UNLIKELY(size < IP_HEADER_LEN)) {
> > > > +COVERAGE_INC(ipv4_check_too_short);
> > > >  return false;
> > > >  }
> > > >  ip_len = IP_IHL(nh->ip_ihl_ver) * 4;
> > > > 
> > > >  if (OVS_UNLIKELY(ip_len < IP_HEADER_LEN || size < ip_len)) {
> > > > +COVERAGE_INC(ipv4_check_length_error);
> > > >  return false;
> > > >  }
> > > > 
> > > >  tot_len = ntohs(nh->ip_tot_len);
> > > >  if (OVS_UNLIKELY(tot_len > size || ip_len > tot_len ||
> > > >  size - tot_len > UINT16_MAX)) {
> > > > +COVERAGE_INC(ipv4_check_length_error);
> > > >  return false;
> > > >  }
> > > > 
> > > > @@ -686,21 +696,41 @@ ipv6_sanity_check(const struct ovs_16aligned_ip6_hdr *nh, size_t size)
> > > >  uint16_t plen;
> > > > 
> > > >  if (OVS_UNLIKELY(size < sizeof *nh)) {
> > > > +COVERAGE_INC(ipv6_check_too_short);
> > > >  return false;
> > > >  }
> > > > 
> > > >  plen = ntohs(nh->ip6_plen);
> > > >  if (OVS_UNLIKELY(plen + IPV6_HEADER_LEN > size)) {
> > > > +COVERAGE_INC(ipv6_check_length_error);
> > > >  return false;
> > > >  }
> > > > 
> > > >  if (OVS_UNLIKELY(size - (plen + IPV6_HEADER_LEN) > UINT16_MAX)) {
> > > > +COVERAGE_INC(ipv6_ch

Re: [ovs-dev] [PATCH] flow: Count and dump invalid IP packets.

2021-04-15 Thread Flavio Leitner


Hi Eelco,

On Thu, Apr 15, 2021 at 10:07:24AM +0200, Eelco Chaudron wrote:
> 
> 
> On 14 Apr 2021, at 17:50, David Marchand wrote:
> 
> > Skipping further processing of invalid IP packets helps avoid crashes
> > but it does not help to figure out if the malformed packets are still
> > present on the network.
> > 
> > Add coverage counters for IPv4 and IPv6 sanity checks so that we know
> > there are some invalid packets.
> > 
> > Dump such whole packets in debug mode.
> 
> Looks good to me, some small nits below.
> 
> > Signed-off-by: David Marchand 
> > ---
> >  lib/flow.c | 42 ++
> >  1 file changed, 42 insertions(+)
> > 
> > diff --git a/lib/flow.c b/lib/flow.c
> > index 729d59b1b3..2b55244190 100644
> > --- a/lib/flow.c
> > +++ b/lib/flow.c
> > @@ -44,8 +44,15 @@
> >  #include "openvswitch/nsh.h"
> >  #include "ovs-router.h"
> >  #include "lib/netdev-provider.h"
> > +#include "openvswitch/vlog.h"
> > +
> > +VLOG_DEFINE_THIS_MODULE(flow);
> > 
> >  COVERAGE_DEFINE(flow_extract);
> > +COVERAGE_DEFINE(ipv4_check_too_short);
> > +COVERAGE_DEFINE(ipv4_check_length_error);
> > +COVERAGE_DEFINE(ipv6_check_too_short);
> > +COVERAGE_DEFINE(ipv6_check_length_error);
> 
> The check keyword is a bit confusing to me, maybe something like
> ipv4_pkt_too_short, etc.?

David may have a different idea, but to me this works to pinpoint
the packet failure to ipv*_sanity_check(). Perhaps the name could
be better. However, a generic name that can be used in more places
would make it harder to pinpoint.


> >  COVERAGE_DEFINE(miniflow_malloc);
> > 
> >  /* U64 indices for segmented flow classification. */
> > @@ -645,17 +652,20 @@ ipv4_sanity_check(const struct ip_header *nh, size_t size,
> >  uint16_t tot_len;
> > 
> >  if (OVS_UNLIKELY(size < IP_HEADER_LEN)) {
> > +COVERAGE_INC(ipv4_check_too_short);
> >  return false;
> >  }
> >  ip_len = IP_IHL(nh->ip_ihl_ver) * 4;
> > 
> >  if (OVS_UNLIKELY(ip_len < IP_HEADER_LEN || size < ip_len)) {
> > +COVERAGE_INC(ipv4_check_length_error);
> >  return false;
> >  }
> > 
> >  tot_len = ntohs(nh->ip_tot_len);
> >  if (OVS_UNLIKELY(tot_len > size || ip_len > tot_len ||
> >  size - tot_len > UINT16_MAX)) {
> > +COVERAGE_INC(ipv4_check_length_error);
> >  return false;
> >  }
> > 
> > @@ -686,21 +696,41 @@ ipv6_sanity_check(const struct ovs_16aligned_ip6_hdr *nh, size_t size)
> >  uint16_t plen;
> > 
> >  if (OVS_UNLIKELY(size < sizeof *nh)) {
> > +COVERAGE_INC(ipv6_check_too_short);
> >  return false;
> >  }
> > 
> >  plen = ntohs(nh->ip6_plen);
> >  if (OVS_UNLIKELY(plen + IPV6_HEADER_LEN > size)) {
> > +COVERAGE_INC(ipv6_check_length_error);
> >  return false;
> >  }
> > 
> >  if (OVS_UNLIKELY(size - (plen + IPV6_HEADER_LEN) > UINT16_MAX)) {
> > +COVERAGE_INC(ipv6_check_length_error);
> >  return false;
> >  }
> > 
> >  return true;
> >  }
> > 
> > +static void
> > +dump_invalid_packet(struct dp_packet *packet, const char *reason)
> > +{
> > +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
> > +struct ds ds = DS_EMPTY_INITIALIZER;
> > +size_t size;
> > +
> > +if (VLOG_DROP_DBG(&rl)) {
> > +return;
> > +}
> > +size = dp_packet_size(packet);
> > +ds_put_hex_dump(&ds, dp_packet_data(packet), size, 0, false);
> 
> Are we sure we need to dump the entire packet, or are we ok with let’s say
> the first 128 bytes?

For normal packets yes, that would be the case. But the packet might
be corrupted and this is only used when debugging is enabled, so having
a full dump is useful.
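Here is a sketch of a hex dump with an optional byte cap, such as the 128 bytes Eelco mentions. It uses the layout of the ds_put_hex_dump() output in the v2 log: an offset column, 16 bytes per line, and '-' after the eighth byte. It is not ds_put_hex_dump() itself: it omits the ASCII column, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Appends formatted text at 'out + *used', never writing past 'out_size'. */
static void
append(char *out, size_t out_size, size_t *used, const char *fmt, ...)
{
    va_list args;
    int n;

    if (*used >= out_size) {
        return;                         /* Buffer already full. */
    }
    va_start(args, fmt);
    n = vsnprintf(out + *used, out_size - *used, fmt, args);
    va_end(args);
    if (n > 0) {
        *used += (size_t) n;            /* May exceed out_size on truncation. */
    }
}

/* Hypothetical hex dump: 16 bytes per line, an offset column, and '-'
 * between the 8th and 9th byte.  'max_bytes' caps the dump (0 = whole
 * packet).  Unlike ds_put_hex_dump(), no ASCII column is printed. */
static void
hex_dump_sketch(char *out, size_t out_size, const unsigned char *data,
                size_t len, size_t max_bytes)
{
    size_t n = (max_bytes && max_bytes < len) ? max_bytes : len;
    size_t used = 0;

    assert(out_size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (i % 16 == 0) {
            /* Start a new line with the offset of its first byte. */
            append(out, out_size, &used, "%s%04zx  ", i ? "\n" : "", i);
        } else {
            append(out, out_size, &used, "%s", i % 16 == 8 ? "-" : " ");
        }
        append(out, out_size, &used, "%02x", data[i]);
    }
}
```

A cap keeps log lines short for well-formed traffic. With a corrupted packet, however, the bytes past the cap may be exactly the ones that matter, and the dump only runs with debug logging enabled.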

fbl

> 
> > +VLOG_DBG("invalid packet for %s: port %"PRIu32", size %"PRIuSIZE"\n%s",
> 
> Do we want to indent the next line a bit, so we know they are together,
> i.e., “PRIuSIZE”\n  %s”
> 
> > + reason, packet->md.in_port.odp_port, size, ds_cstr(&ds));
> > +ds_destroy(&ds);
> > +}
> > +
> >  /* Initializes 'dst' from 'packet' and 'md', taking the packet type into
> >   * account.  'dst' must have enough space for FLOW_U64S * 8 bytes.
> >   *
> > @@ -850,6 +880,9 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
> >  uint16_t tot_len;
> > 
> >  if (OVS_UNLIKELY(!ipv4_sanity_check(nh, size, &ip_len, &tot_len))) {
> > +if (OVS_UNLIKELY(VLOG_IS_DBG_ENABLED())) {
> > +dump_invalid_packet(packet, "ipv4_sanity_check");
> > +}
> >  goto out;
> >  }
> >  dp_packet_set_l2_pad_size(packet, size - tot_len);
> > @@ -880,6 +913,9 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
> >  uint16_t plen;
> > 
> >  if (OVS_UNLIKELY(!ipv6_sanity_check(nh, size))) {
> > +if (OVS_UNLIKELY(VLOG_IS_DBG_ENABLED())) {
> > +dump_invalid_packet(packet, "ipv6_sanity_check");
> > +

Re: [ovs-dev] [PATCH] ipsec: Fix race in system tests

2021-04-13 Thread Flavio Leitner

On Tue, Apr 13, 2021 at 01:06:40PM -0400, Mark Gray wrote:
> This patch fixes an issue where, depending on timing fluctuations,
> each node has not fully loaded all connections before the other
> node begins to establish a connection. In this failure case, the
> "ovs-monitor-ipsec" instance on the "left" node may `ipsec auto --start`
> a connection which then gets rejected by the "right" side. Almost
> simultaneously, the "right" side may initiate a connection that gets
> rejected by the "left" side. This can happen as, for all tunnels except
> for GRE, each node has two connections (an "in" connection and an "out"
> connection) that get added one after the other. If the "in" connection
> "starts" on both sides, the "out" connection from the other node
> may not be available causing the connection to fail. At this point,
> "Libreswan" will wait to retry the connection. In the interim, the
> OVS system test times out. This race manifests itself more frequently
> in a virtualized environment.
> 
> This patch resolves this issue by waiting for the "left" node to load
> all connections before starting the "right" side. This will cause
> the "left" side to fail to establish a connection with the "right"
> side (as the "right" side connections have not been loaded) but will
> cause the "right" side to succeed in establishing a connection, as all
> connections will have been loaded on the "left" side.
> 
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381857.html
> Fixes: 8fc62df8b135 ("ipsec: Introduce IPsec system tests for Libreswan.")
> Signed-off-by: Mark Gray 
> ---

Thanks for following up with a testsuite fix.

The patch survived 500 iterations of loop testing (-k ipsec).

Tested-by: Flavio Leitner 
Acked-by: Flavio Leitner 

fbl
