Re: [ovs-dev] [PATCH ovn] Honour options for solicited RA

2020-07-10 Thread Gabriele Cerami


> Replies to router solicitation follow a different flow than periodic RA.
> This flow currently does not honour the dnssl, rdnss and route_info
> options.
> 
> This patch modifies the flow to honour those options.
> 

I wanted to send out this patch because the implementation reached
basic functionality and the added tests establish correct expectations.

But the duplication between periodic RA and solicited RA is
too evident at this point to not address it.

In v2 I'll try to create common functions both in code and tests that can
be reused from both periodic and solicited flows for the same options.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] Honour options for solicited RA

2020-07-10 Thread Gabriele Cerami
Replies to router solicitation follow a different flow than periodic RA.
This flow currently does not honour the dnssl, rdnss and route_info
options.

This patch modifies the flow to honour those options.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1851788
Signed-off-by: Gabriele Cerami 
---
 lib/actions.c   | 145 
 lib/ovn-l7.h|   4 ++
 northd/ovn-northd.c |  18 ++
 tests/ovn.at|  68 +
 4 files changed, 223 insertions(+), 12 deletions(-)

diff --git a/lib/actions.c b/lib/actions.c
index e14907e3d..baa87517a 100644
--- a/lib/actions.c
+++ b/lib/actions.c
@@ -25,6 +25,7 @@
 #include "ovn-l7.h"
 #include "hash.h"
 #include "lib/packets.h"
+#include "lib/ovn-util.h"
 #include "nx-match.h"
 #include "openvswitch/dynamic-string.h"
 #include "openvswitch/hmap.h"
@@ -2671,6 +2672,18 @@ parse_put_nd_ra_opts(struct action_context *ctx, const 
struct expr_field *dst,
 case ND_OPT_MTU:
 ok = c->format == LEX_F_DECIMAL;
 break;
+
+case ND_OPT_RDNSS:
+ok = c->format == LEX_F_IPV6 && !c->masked;
+break;
+
+case ND_OPT_DNSSL:
+/* validation is left to the encoder */
+break;
+
+case ND_OPT_ROUTE_INFO_TYPE:
+/* validation is left to the encoder */
+break;
 }
 
 if (!ok) {
@@ -2775,6 +2788,138 @@ encode_put_nd_ra_option(const struct ovnact_gen_option 
*o,
sizeof(ovs_be32[4]));
 break;
 }
+
+case ND_OPT_DNSSL:
+{
+char *t0, *r0 = NULL, dnssl[255] = {};
+size_t size = sizeof(struct ovs_nd_dnssl);
+int i = 0;
+
+/* Multiple DNS Search List must be 'comma' separated
+ * (e.g. "a.b.c, d.e.f"). Domain names must be encoded
+ * as described in Section 3.1 of RFC1035.
+ * (e.g if dns list is a.b.c,www.ovn.org, it will be encoded as:
+ * 01 61 01 62 01 63 00 03 77 77 77 03 6f 76 63 03 6f 72 67 00
+ */
+for (t0 = strtok_r(c->string, ",", &r0); t0;
+ t0 = strtok_r(NULL, ",", &r0)) {
+char *t1, *r1 = NULL;
+
+if (size > sizeof(dnssl)) {
+/* too many dns options, truncate */
+break;
+} else {
+/* 1 byte label length at tge start, 1 byte 0 at the end */
+size += strlen(t0) + 2;
+}
+
+for (t1 = strtok_r(t0, ".", &r1); t1;
+ t1 = strtok_r(NULL, ".", &r1)) {
+dnssl[i++] = strlen(t1);
+memcpy(&dnssl[i], t1, strlen(t1));
+i += strlen(t1);
+}
+dnssl[i++] = 0;
+}
+size = ROUND_UP(size, 8);
+
+struct ovs_nd_dnssl *ra_dnssl =
+ofpbuf_put_uninit(ofpacts, sizeof *ra_dnssl);
+ra_dnssl->type = ND_OPT_DNSSL;
+ra_dnssl->len = size / 8;
+ra_dnssl->reserved = 0;
+/* Lifetime
+ * SHOULD be bounded as follows:
+ * MaxRtrAdvInterval <= Lifetime <= 2*MaxRtrAdvInterval.
+ */
+put_16aligned_be32(&ra_dnssl->lifetime, htonl(0x));
+ofpbuf_put(ofpacts, dnssl, size - sizeof(struct ovs_nd_dnssl));
+break;
+}
+
+case ND_OPT_RDNSS:
+{
+/* OVN supports only a single rdnss */
+int num = 1;
+/* with multiple dns support this will need to be filled
+ * by a strtok_r loop too */
+struct in6_addr dns[255] = {};
+dns[0] = c->value.ipv6;
+struct nd_rdnss_opt *ra_rdnss =
+ofpbuf_put_uninit(ofpacts, sizeof *ra_rdnss);
+size_t len = 2 * num + 1;
+
+ra_rdnss->type = ND_OPT_RDNSS;
+ra_rdnss->len = len;
+ra_rdnss->reserved = 0;
+put_16aligned_be32(&ra_rdnss->lifetime, htonl(0x));
+
+for (int i = 0; i < num; i++) {
+ofpbuf_put(ofpacts, &dns[i], sizeof(ovs_be32[4]));
+}
+break;
+}
+
+case ND_OPT_ROUTE_INFO_TYPE:
+{
+char *t0, *r0 = NULL;
+size_t size = 0;
+
+for (t0 = strtok_r(c->string, ",", &r0); t0;
+ t0 = strtok_r(NULL, ",", &r0)) {
+struct ovs_nd_route_info nd_rinfo;
+char *t1, *r1 = NULL;
+int index;
+
+nd_rinfo.type = ND_OPT_ROUTE_INFO_TYPE;
+nd_rinfo.route_lifetime = htonl(0x);
+
+for (t1 = strtok_r(t0, "-", &r1), index = 0; t1;
+ t1 = strtok_r(NULL, "-", &r1), index++) {
+
+switch (index) {
+case 0:
+if (!strcmp(t1, "HIGH")) {
+nd_rinfo.flags = IPV6_ND_RA_OPT_PRF_HIGH;
+} else if (!strcmp(t1, "LOW")) {
+nd_rinfo.flags = IPV6_ND_RA_OPT_PRF_LOW;
+} else {
+nd_rinfo.flags = IPV6_ND_RA_OPT_PRF_NORMAL;
+ 

Re: [ovs-dev] [PATCH v2 2/5] Enable VXLAN TSO for DPDK datapath

2020-07-10 Thread Flavio Leitner


Hi Yi,

This is not a full review, but netdev-dpdk.c is used by Window
and BSD as well, and there is a 'linux' function which seems
to be a copy of another existing one. Perhaps we can use just one?

This patch resets ol_flags from vhostuser ignoring what has
been set by rte_vhost_dequeue_burst(). What happens if a VM
turns off offloading? Also that it is always enabled while
userspace offloading is experimental and default to off.

Why do we need to set l2_len, l3_len and l4_len when receiving
from the VM? Those are not used by OVS and if the packet
changes during the pipeline execution, they will need to be
updated at the appropriate prepare function, which for dpdk is
netdev_dpdk_prep_hwol_packet().

Few more comments below. 

Thanks!
fbl

On Wed, Jul 01, 2020 at 05:15:30PM +0800, yang_y...@163.com wrote:
> From: Yi Yang 
> 
> Many NICs can support VXLAN TSO which can help
> improve across-compute-node VM-to-VM performance
> in case that MTU is set to 1500.
> 
> This patch allows dpdkvhostuserclient interface
> and veth/tap interface to leverage NICs' offload
> capability to maximize across-compute-node TCP
> performance, with it applied, OVS DPDK can reach
> linespeed for across-compute-node VM-to-VM TCP
> performance.
> 
> Signed-off-by: Yi Yang 
> ---
>  lib/dp-packet.h|  61 +
>  lib/netdev-dpdk.c  | 193 
> +
>  lib/netdev-linux.c |  20 ++
>  lib/netdev.c   |  14 ++--
>  4 files changed, 271 insertions(+), 17 deletions(-)
> 
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index 070d111..07af124 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -1034,6 +1034,67 @@ dp_packet_hwol_set_tcp_seg(struct dp_packet *b)
>  *dp_packet_ol_flags_ptr(b) |= DP_PACKET_OL_TX_TCP_SEG;
>  }
>  
> +#ifdef DPDK_NETDEV
> +/* Mark packet 'b' for VXLAN TCP segmentation offloading. */
> +static inline void
> +dp_packet_hwol_set_vxlan_tcp_seg(struct dp_packet *b)
> +{
> +b->mbuf.ol_flags |= PKT_TX_TUNNEL_VXLAN;
> +b->mbuf.l2_len += sizeof(struct udp_header) +
> +  sizeof(struct vxlanhdr);
> +b->mbuf.outer_l2_len = ETH_HEADER_LEN;
> +b->mbuf.outer_l3_len = IP_HEADER_LEN;

What about IPv6?


> +}
> +
> +/* Set l2_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l2_len(struct dp_packet *b, int l2_len)
> +{
> +b->mbuf.l2_len = l2_len;
> +}
> +
> +/* Set l3_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l3_len(struct dp_packet *b, int l3_len)
> +{
> +b->mbuf.l3_len = l3_len;
> +}
> +
> +/* Set l4_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l4_len(struct dp_packet *b, int l4_len)
> +{
> +b->mbuf.l4_len = l4_len;
> +}
> +#else
> +/* Mark packet 'b' for VXLAN TCP segmentation offloading. */
> +static inline void
> +dp_packet_hwol_set_vxlan_tcp_seg(struct dp_packet *b OVS_UNUSED)
> +{
> +}
> +
> +/* Set l2_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l2_len(struct dp_packet *b OVS_UNUSED,
> +  int l2_len OVS_UNUSED)
> +{
> +}
> +
> +/* Set l3_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l3_len(struct dp_packet *b OVS_UNUSED,
> +  int l3_len OVS_UNUSED)
> +{
> +}
> +
> +/* Set l4_len for the packet 'b' */
> +static inline void
> +dp_packet_hwol_set_l4_len(struct dp_packet *b OVS_UNUSED,
> +  int l4_len OVS_UNUSED)
> +{
> +}
> +#endif /* DPDK_NETDEV */
> +
>  static inline bool
>  dp_packet_ip_checksum_valid(const struct dp_packet *p)
>  {
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 44ebf96..bf5fa63 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -44,6 +44,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

We have network headers definitions in OVS and we should
give preference to them.


>  #include "cmap.h"
>  #include "coverage.h"
> @@ -87,6 +88,7 @@ COVERAGE_DEFINE(vhost_notification);
>  
>  #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
>  #define OVS_VPORT_DPDK "ovs_dpdk"
> +#define DPDK_RTE_HDR_OFFSET 1

Perhaps HDR_NEXT_OFFSET defined somewhere more generic
because it's neither RTE nor DPDK specific?

>  
>  /*
>   * need to reserve tons of extra space in the mbufs so we can align the
> @@ -405,6 +407,7 @@ enum dpdk_hw_ol_features {
>  NETDEV_RX_HW_SCATTER = 1 << 2,
>  NETDEV_TX_TSO_OFFLOAD = 1 << 3,
>  NETDEV_TX_SCTP_CHECKSUM_OFFLOAD = 1 << 4,
> +NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD = 1 << 5,
>  };
>  
>  /*
> @@ -988,6 +991,12 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int 
> n_rxq, int n_txq)
>  
>  if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) {
>  conf.txmode.offloads |= DPDK_TX_TSO_OFFLOAD_FLAGS;
> +/* Enable VXLAN TSO support if available */
> +if (dev->hw_ol_features & NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD) {
> +conf.txmode.offloads |= DEV_TX_OFFLOAD_VXLAN_TNL_TSO;
> +conf.txmode.of

Re: [ovs-dev] [PATCH v2 1/5] Fix dp_packet_set_size error for multi-seg mbuf

2020-07-10 Thread Flavio Leitner


Hi Yi,

Thanks for putting this patch-set together.

On Wed, Jul 01, 2020 at 05:15:29PM +0800, yang_y...@163.com wrote:
> From: Yi Yang 
> 
> For multi-seg mbuf, pkt_len isn't equal to data_len,
> data_len is data_len of the first seg, pkt_len is
> sum of data_len of all the segs, so for such packets,
> dp_packet_set_size shouldn't change data_len.
> 
> Signed-off-by: Yi Yang 
> ---
>  lib/dp-packet.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index 0430cca..070d111 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -575,7 +575,9 @@ dp_packet_set_size(struct dp_packet *b, uint32_t v)
>   * (and thus 'v') will always be <= UINT16_MAX; this means that there is 
> no
>   * loss of accuracy in assigning 'v' to 'data_len'.
>   */
> -b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
> +if (b->mbuf.nb_segs <= 1) {
> +b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
> +}
>  b->mbuf.pkt_len = v; /* Total length of all segments linked 
> to
>* this segment. */
>  }

Currently OVS doesn't support multi-seg mbuf, so although
this patch wouldn't break anything it doesn't sound correct
as it is.  It seems incomplete/limited as well.

I think at least the patch should add a comment explaining
why and when that is needed. 

Another thing is that this change alone has no users.  Usually
we do changes along with the first user. I am still reviewing
the following patches, but I suspect this change is for GRO/GSO,
so in my opinion it makes sense to be part of one of them.
Doing so helps to backtrack the reason for a specific change.

I think we should prioritize single mbuf as they carry less
data, so OVS has less time to process them. Therefore, it
seems appropriate to use OVS_LIKELY().

-- 
fbl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v6] Bareudp Tunnel Support

2020-07-10 Thread Ilya Maximets
On 6/29/20 3:31 PM, Martin Varghese wrote:
> From: Martin Varghese 
> 
> There are various L3 encapsulation standards using UDP being discussed to
> leverage the UDP based load balancing capability of different networks.
> MPLSoUDP (__ https://tools.ietf.org/html/rfc7510) is one among them.
> 
> The Bareudp tunnel provides a generic L3 encapsulation support for
> tunnelling different L3 protocols like MPLS, IP, NSH etc. inside a UDP
> tunnel.
> 
> An example to create bareudp device to tunnel MPLS traffic is
> given
> 
> $ ovs-vsctl add-port br_mpls udp_port -- set interface udp_port \
>  type=bareudp options:remote_ip=2.1.1.3
>  options:local_ip=2.1.1.2 \
>  options:payload_type=0x8847 options:dst_port=6635 \
>  options:packet_type="legacy_l3" \
>  ofport_request=$bareudp_egress_port
> 
> The bareudp device supports special handling for MPLS & IP as
> they can have multiple ethertypes. MPLS procotcol can have ethertypes
> ETH_P_MPLS_UC (unicast) & ETH_P_MPLS_MC (multicast). IP protocol can have
> ethertypes ETH_P_IP (v4) & ETH_P_IPV6 (v6).
> 
> The bareudp device to tunnel L3 traffic with multiple ethertypes
> (MPLS & IP) can be created by passing the L3 protocol name as string in
> the field payload_type. An example to create bareudp device to tunnel
> MPLS unicast & multicast traffic is given below.
> 
> $ ovs-vsctl add-port  br_mpls udp_port -- set interface
> udp_port \
> type=bareudp options:remote_ip=2.1.1.3
> options:local_ip=2.1.1.2 \
> options:payload_type=mpls options:dst_port=6635 \
> options:packet_type="legacy_l3"
> 
> Signed-off-by: Martin Varghese 
> ---
> Changes in v2:
> - Removed vport-bareudp module.
> 
> Changes in v3:
> - Added net-next upstream commit id and message to commit message.
> 
> Changes in v4:
> - Removed kernel datapath changes.
> 
> Changes in v5:
> - Fixed release notes errors.
> - Fixed coding errors in dpif-nelink-rtnl.c.
> 
> Changes in v6:
> - Added code to enable rx metadata collection in the kernel device.
> - Added version history.
> 
>  Documentation/automake.mk |  1 +
>  Documentation/faq/bareudp.rst | 62 
> +++
>  Documentation/faq/index.rst   |  1 +
>  Documentation/faq/releases.rst|  1 +
>  NEWS  |  4 ++
>  datapath/linux/compat/include/linux/openvswitch.h | 10 
>  lib/dpif-netlink-rtnl.c   | 55 
>  lib/dpif-netlink.c|  5 ++
>  lib/netdev-vport.c| 27 +-
>  lib/netdev.h  |  1 +
>  ofproto/ofproto-dpif-xlate.c  |  1 +
>  tests/system-layer3-tunnels.at| 47 +
>  12 files changed, 213 insertions(+), 2 deletions(-)
>  create mode 100644 Documentation/faq/bareudp.rst
> 
> diff --git a/Documentation/automake.mk b/Documentation/automake.mk
> index f85c432..ea3475f 100644
> --- a/Documentation/automake.mk
> +++ b/Documentation/automake.mk
> @@ -88,6 +88,7 @@ DOC_SOURCE = \
>   Documentation/faq/terminology.rst \
>   Documentation/faq/vlan.rst \
>   Documentation/faq/vxlan.rst \
> + Documentation/faq/bareudp.rst \
>   Documentation/internals/index.rst \
>   Documentation/internals/authors.rst \
>   Documentation/internals/bugs.rst \
> diff --git a/Documentation/faq/bareudp.rst b/Documentation/faq/bareudp.rst
> new file mode 100644
> index 000..9266daa
> --- /dev/null
> +++ b/Documentation/faq/bareudp.rst
> @@ -0,0 +1,62 @@
> +..
> +  Licensed under the Apache License, Version 2.0 (the "License"); you may
> +  not use this file except in compliance with the License. You may obtain
> +  a copy of the License at
> +
> +  http://www.apache.org/licenses/LICENSE-2.0
> +
> +  Unless required by applicable law or agreed to in writing, software
> +  distributed under the License is distributed on an "AS IS" BASIS, 
> WITHOUT
> +  WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See 
> the
> +  License for the specific language governing permissions and limitations
> +  under the License.
> +
> +  Convention for heading levels in Open vSwitch documentation:
> +
> +  ===  Heading 0 (reserved for the title in a document)
> +  ---  Heading 1
> +  ~~~  Heading 2
> +  +++  Heading 3
> +  '''  Heading 4
> +
> +  Avoid deeper levels because they do not render well.
> +
> +===
> +Bareudp
> +===
> +
> +Q: What is Bareudp?
> +
> +A: There are various L3 encapsulation standards using UDP being discussed
> +   to leverage the UDP based load balancing capability of different
> +   networks. MPLSoUDP (__ https://tools.ietf.org/html/rfc7510) 

Re: [ovs-dev] [PATCH v6 5/6] dpif-lookup: add avx512 gather implementation.

2020-07-10 Thread Van Haaren, Harry
> -Original Message-
> From: Stokes, Ian 
> Sent: Friday, July 10, 2020 4:51 PM
> To: Van Haaren, Harry ; ovs-dev@openvswitch.org
> Cc: i.maxim...@ovn.org; u9012...@gmail.com; fie...@redhat.com
> Subject: Re: [PATCH v6 5/6] dpif-lookup: add avx512 gather implementation.
> 
> 
> 
> On 7/2/2020 6:42 PM, Harry van Haaren wrote:
> > This commit adds an AVX-512 dpcls lookup implementation.
> > It uses the AVX-512 SIMD ISA to perform multiple miniflow
> > operations in parallel.
> >
> > To run this implementation, the "avx512f" and "bmi2" ISAs are
> > required. These ISA checks are performed at runtime while
> > probing the subtable implementation. If a CPU does not provide
> > both "avx512f" and "bmi2", then this code does not execute.
> >
> > The avx512 code is built as a seperate static library, with added
> > CFLAGS to enable the required ISA features. By building only this
> > static library with avx512 enabled, it is ensured that the main OVS
> > core library is *not* using avx512, and that OVS continues to run
> > as before on CPUs that do not support avx512.
> >
> > The approach taken in this implementation is to use the
> > gather instruction to access the packet miniflow, allowing
> > any miniflow blocks to be loaded into an AVX-512 register.
> > This maximises the usefulness of the register, and hence this
> > implementation handles any subtable with up to miniflow 8 bits.
> >
> > Note that specialization of these avx512 lookup routines
> > still provides performance value, as the hashing of the
> > resulting data is performed in scalar code, and compile-time
> > loop unrolling occurs when specialized to miniflow bits.
> >
> > This commit checks at configure time if the assembling in use
> > has a known bug in assembling AVX512 code. If this bug is present,
> > all AVX512 code is disabled. Checking the version string of the binutils
> > or assembler is not a good method to detect the issue, as backported fixes
> > would not be reflected.
> >
> > Signed-off-by: Harry van Haaren 
> 
> Thanks for this Harry,
> 
> I've spent some time testing this on both AVX512 enabled and non-enabled
> systems and can confirm the performance increase between scalar, generic
> and avx512 which is great too see.
> 
> One thing I noticed was that with the AVX512 system, there is a
> dependency on how the CFLAGS are passed when configuring and compiling
> OVS in order to enable AVX512 lookup.
> 
> In my testing passing the CFLAGS="CFLAGS="-g -Ofast -march=native" with
> configure seems to work fine and I could see the AVX512 lookup available.
> 
> However when testing with an older script and the same CFLAG was passed
> along with the make command instead of at configure, then AVX512  lookup
> would not be available.
> 
> Depending on how users configure and compile this may not be an issue
> but thought it worth flagging  as there does seem to be a dependency
> that could be missed at compilation (but from the configure logs all
> looked well).
> 
> At the minimum I think it might be worth documenting in the docs,
> possibly in the datapath performance section you add later in the series
> but also maybe in the compiler optimizations section.

Sure, can add command examples in those sections yes.

> > +/* Disabling AVX512 at compile time, as compile time requirements not 
> > met.
> > + * This could be due to a number of reasons:
> > + *  1) core OVS is not compiled with SSE4.2 instruction set.
> > + * The SSE42 instructions are required to use CRC32 ISA for high-
> > + * performance hashing. Consider ./configure of OVS with -msse42 
> > (or
> > + * newer) to enable CRC32 hashing and higher performance.
> > + *  2) The assembler in binutils versions 2.30 and 2.31 has bugs in 
> > AVX512
> > + * assembly. Compile time probes check for this assembler issue, 
> > and
> > + * disable the HAVE_LD_AVX512_GOOD check if an issue is detected.
> > + * Please upgrade binutils, or backport this binutils fix commit:
> > + * 2069ccaf8dc28ea699bd901fdd35d90613e4402a
> > + */
> 
> I wonder if the above info regarding what to check for would be useful
> in the documentation section?

Sure, will add some detail in the docs, and point to the source code for
technical details.

> >   int32_t
> > diff --git a/lib/dpif-netdev-lookup.h b/lib/dpif-netdev-lookup.h
> > index 61f44b9e8..bd72aa29b 100644
> > --- a/lib/dpif-netdev-lookup.h
> > +++ b/lib/dpif-netdev-lookup.h
> > @@ -42,6 +42,10 @@ dpcls_subtable_autovalidator_probe(uint32_t
> u0_bit_count,
> >   dpcls_subtable_lookup_func
> >   dpcls_subtable_generic_probe(uint32_t u0_bit_count, uint32_t 
> > u1_bit_count);
> >
> > +/* Probe function for AVX-512 gather implementation */
> > +dpcls_subtable_lookup_func
> > +dpcls_subtable_avx512_gather_probe(uint32_t u0_bit_cnt, uint32_t 
> > u1_bit_cnt);
> > +
> >
> >   /* Subtable registration and iteration helpers */
> >   struct dpcls_subtable_lookup_info_t {
> > dif

Re: [ovs-dev] [PATCH v6 1/6] dpif-netdev: implement subtable lookup validation.

2020-07-10 Thread Van Haaren, Harry
> -Original Message-
> From: Stokes, Ian 
> Sent: Thursday, July 9, 2020 2:21 PM
> To: Van Haaren, Harry ; ovs-dev@openvswitch.org
> Cc: i.maxim...@ovn.org; u9012...@gmail.com; fie...@redhat.com
> Subject: Re: [PATCH v6 1/6] dpif-netdev: implement subtable lookup validation.
> 
> 
> 
> On 7/2/2020 6:42 PM, Harry van Haaren wrote:
> > This commit refactors the existing dpif subtable function pointer
> > infrastructure, and implements an autovalidator component.
> >
> > The refactoring of the existing dpcls subtable lookup function
> > handling, making it more generic, and cleaning up how to enable
> > more implementations in future.
> >
> > In order to ensure all implementations provide identical results,
> > the autovalidator is added. The autovalidator itself implements
> > the subtable lookup function prototype, but internally iterates
> > over all other available implementations. The end result is that
> > testing of each implementation becomes automatic, when the auto-
> > validator implementation is selected.
> >
> > Signed-off-by: Harry van Haaren 
> 
> Thasnk for this Harry,
> 
> a few comments below, more for discussion.
> 
> Theres a few minor typos and style fixes required, nothing major that
> cannot be fixed on commit so no need for revision.

Fixed in v7.



> > +VLOG_DEFINE_THIS_MODULE(dpif_lookup_autovalidator);
> > +
> > +/* This file implements an automated validator for subtable search
> > + * implementations. It compares the results of the generic scalar search 
> > result
> > + * with ISA optimized implementations.
> > + *
> > + * Note the goal is *NOT* to test the *specialized* versions of subtables, 
> > as
> > + * the compiler performs the specialization - and we rely on the 
> > correctness of
> > + * the compiler to not break those specialized variantes.
> 
> So if we depend on the compiler for correctness, should we provide a
> list of known "Correct compilers" that this feature was tested with? I
> guess there could be a case where a compiler has a bug and as such may
> break the feature?

Compilers must be assumed to be correct. There is no value in attempting to 
provide
a list of compilers here - optimizing compilers do constant-propagation all the 
time,
and resulting generated code must be correct: the same rules apply here.


> Minor typo, variants, can be done on commit.

Fixed in v7.

> > + * The goal is to ensure identical results of the different 
> > implementations,
> > + * despite that the implementations may have different methods to get those
> > + * results.
> > + *
> > + * Example: AVX-512 ISA uses different instructions and algorithm to the 
> > scalar
> > + * implementation, however the results (rules[] output) must be the same.
> > + */
> > +
> > +dpcls_subtable_lookup_func
> > +dpcls_subtable_autovalidator_probe(uint32_t u0 OVS_UNUSED,
> > +   uint32_t u1 OVS_UNUSED);
> > +
> > +static uint32_t
> > +dpcls_subtable_autovalidator(struct dpcls_subtable *subtable,
> > + uint32_t keys_map,
> > + const struct netdev_flow_key *keys[],
> > + struct dpcls_rule **rules_good)
> > +{
> > +const uint32_t u0_bit_count = subtable->mf_bits_set_unit0;
> > +const uint32_t u1_bit_count = subtable->mf_bits_set_unit1;
> > +
> > +/* Scalar generic - the "known correct" version */
> Minor, for this and all other comments missing period. Fairly minor so
> can be fixed on commit.

Fixing along the way in v7.



> >   /* Generic lookup function that uses runtime provided mf bits for 
> > iterating. */
> > -uint32_t
> > +static uint32_t
> >   dpcls_subtable_lookup_generic(struct dpcls_subtable *subtable,
> > uint32_t keys_map,
> > const struct netdev_flow_key *keys[],
> > @@ -310,6 +311,10 @@ dpcls_subtable_generic_probe(uint32_t u0_bits,
> uint32_t u1_bits)
> >   if (f) {
> >   VLOG_DBG("Subtable using Generic Optimized for u0 %d, u1 %d\n",
> >u0_bits, u1_bits);
> > +} else {
> > +/* Always return the generic function */
> > +f = dpcls_subtable_lookup_generic;
> 
> I had always assumed that the generic lookup would be selected anyway if
> the optimized scalar lookup wasn't available prioir to this patch, maybe
> I missed something as regards why you have to sepcify it here?

As per your below comment, scalar must always return a valid pointer. 



> > -/* Probe for a specialized generic lookup function. */
> > -subtable->lookup_func = dpcls_subtable_generic_probe(unit0, unit1);
> > -
> > -/* If not set, assign generic lookup. Generic works for any miniflow. 
> > */
> > -if (!subtable->lookup_func) {
> > -subtable->lookup_func = dpcls_subtable_lookup_generic;
> > -}
> Ah, ignore the comment above, this is where the confusion was coming
> from on my part, generic as selected here previously.
> 
> 
> Completed te

Re: [ovs-dev] [PATCH v6 6/6] docs/dpdk/bridge: add datapath performance section.

2020-07-10 Thread Van Haaren, Harry
> -Original Message-
> From: Stokes, Ian 
> Sent: Friday, July 10, 2020 5:00 PM
> To: Van Haaren, Harry ; ovs-dev@openvswitch.org
> Cc: i.maxim...@ovn.org; u9012...@gmail.com; fie...@redhat.com
> Subject: Re: [PATCH v6 6/6] docs/dpdk/bridge: add datapath performance
> section.
> 
> 
> 
> On 7/2/2020 6:43 PM, Harry van Haaren wrote:
> > This commit adds a section to the dpdk/bridge.rst netdev documentation,
> > detailing the added DPCLS functionality. The newly added commands are
> > documented, and sample output is provided.
> >
> > Running the DPCLS autovalidator with unit tests by default is possible
> > through re-compiling the autovalidator to have the highest priority at
> > startup time. This avoids making changes to all tests, and enables
> > debug and CI builds to validate every lookup implementation with all
> > unit tests.
> >
> > Add NEWS updates for CPU ISA, dynamic subtables, and AVX512 lookup.
> >
> > Signed-off-by: Harry van Haaren 
> >
> 
> Hi Harry,
> What you have below looks good to me.
> 
> The only additional ideas that might be worth adding would be either
> validated compilers as mention in patch 1 f the series (maybe this is
> not needed, but reviewing the existing Compilation section for OVS
> already states a GCC version that was tested with OVS DPDK so at least 1
> known GCC version is provided).

As mentioned in reply to first patch, I don't see value in stating what 
compilers
work or don’t - we must just rely on compilers working. OVS can recommend
or state that it is tested with specific compilers - but that is an independent 
issue to this patchset.

> Noting the configure, make CFLAGS dependency might be of use too
> although again, depends on how people configure and compile OVS to date.

Examples commands and documentation added to remedy this.

> Lastly possibly adding a section on what to check if AVX512 lookup is
> not appearing might be useful also.

Added section on potential issues regarding binutils bug, and how to
remedy.

> BR
> Ian

Thanks for review, -Harry

> >
> > v5:
> > - Include NEWS item updates.
> >
> > v4:
> > - Fix typos (William Tu)
> > - Update get commands to use include "prio" as updated in v4
> > - Add section on enabling autovalidator by default for unit tests
> > ---
> >   Documentation/topics/dpdk/bridge.rst | 77 
> >   NEWS |  3 ++
> >   2 files changed, 80 insertions(+)
> >
> > diff --git a/Documentation/topics/dpdk/bridge.rst
> b/Documentation/topics/dpdk/bridge.rst
> > index f0ef42ecc..526d5c959 100644
> > --- a/Documentation/topics/dpdk/bridge.rst
> > +++ b/Documentation/topics/dpdk/bridge.rst
> > @@ -137,3 +137,80 @@ currently turned off by default.
> >   To turn on SMC::
> >
> >   $ ovs-vsctl --no-wait set Open_vSwitch . other_config:smc-enable=true
> > +
> > +Datapath Classifier Performance
> > +---
> > +
> > +The datapath classifier (dpcls) performs wildcard rule matching, a compute
> > +intensive process of matching a packet ``miniflow`` to a rule 
> > ``miniflow``. The
> > +code that does this compute work impacts datapath performance, and
> optimizing
> > +it can provide higher switching performance.
> > +
> > +Modern CPUs provide extensive SIMD instructions which can be used to get
> higher
> > +performance. The CPU OVS is being deployed on must be capable of running
> these
> > +SIMD instructions in order to take advantage of the performance benefits.
> > +In OVS v2.14 runtime CPU detection was introduced to enable identifying if
> > +these CPU ISA additions are available, and to allow the user to enable 
> > them.
> > +
> > +OVS provides multiple implementations of dpcls. The following command
> enables
> > +the user to check what implementations are available in a running instance 
> > ::
> > +
> > +$ ovs-appctl dpif-netdev/subtable-lookup-prio-get
> > +Available lookup functions (priority : name)
> > +0 : autovalidator
> > +1 : generic
> > +0 : avx512_gather
> > +
> > +To set the priority of a lookup function, run the ``prio-set`` command ::
> > +
> > +$ ovs-appctl dpif-netdev/subtable-lookup-prio-set avx512_gather 5
> > +Lookup priority change affected 1 dpcls ports and 1 subtables.
> > +
> > +The highest priority lookup function is used for classification, and the 
> > output
> > +above indicates that one subtable of one DPCLS port is has changed its 
> > lookup
> > +function due to the command being run. To verify the prioritization, re-run
> the
> > +get command, note the updated priority of the ``avx512_gather`` function ::
> > +
> > +$ ovs-appctl dpif-netdev/subtable-lookup-prio-get
> > +Available lookup functions (priority : name)
> > +0 : autovalidator
> > +1 : generic
> > +5 : avx512_gather
> > +
> > +If two lookup functions have the same priority, the first one in the list 
> > is
> > +chosen, and the 2nd occurance of that pr

Re: [ovs-dev] [PATCH v1 1/1] dpdk: Use DPDK 19.11.2 release.

2020-07-10 Thread Stokes, Ian




On 7/10/2020 4:55 PM, Kevin Traynor wrote:

On 06/07/2020 16:50, Ian Stokes wrote:

Modify travis linux build script to use DPDK 19.11.2 stable release and
update docs to reference 19.11.2 stable release.

Signed-off-by: Ian Stokes 
---
  .travis/linux-build.sh   | 2 +-
  Documentation/faq/releases.rst   | 2 +-
  Documentation/intro/install/dpdk.rst | 8 
  Documentation/topics/dpdk/vhost-user.rst | 6 +++---
  NEWS | 5 -
  5 files changed, 13 insertions(+), 10 deletions(-)



userspace-tso.rst says:

When the NIC performing the segmentation is using the i40e DPDK PMD, a fix
must be included in the DPDK build, otherwise TSO will not work. The fix can
be found on `DPDK patchwork`__.

__ https://patches.dpdk.org/patch/64136/

This fix is expected to be included in the 19.11.1 release. When OVS
migrates
to this DPDK release, this limitation can be removed.
---

The commit is in 19.11.2 (see below), so can this be removed now or you
want to keep as info for someone using 19.11.0 ? Either way the last
paragraph deserves a little update. Other than that lgtm.


commit 6b08d9b3331625a9b4c598d5520bc5fc27fce147
Author: Xiaoyun Li 
Date:   Thu Dec 26 14:45:44 2019 +0800

 net/i40e: fix Tx when TSO is enabled

 [ upstream commit 29b2ba82c4c94df1975d0cb9c5c23feef99cf6a3 ]



Ah, good catch, I think we can remove it as we're recommending a new 
version now that has been validated and contains the fix.


Cheers for the review.

Ian



$ git tag --contains 6b08d9b333162 | grep 19.11.2
v19.11.2


diff --git a/.travis/linux-build.sh b/.travis/linux-build.sh
index 02615a8ec..e0a065291 100755
--- a/.travis/linux-build.sh
+++ b/.travis/linux-build.sh
@@ -170,7 +170,7 @@ fi
  
  if [ "$DPDK" ] || [ "$DPDK_SHARED" ]; then

  if [ -z "$DPDK_VER" ]; then
-DPDK_VER="19.11"
+DPDK_VER="19.11.2"
  fi
  install_dpdk $DPDK_VER
  if [ "$CC" = "clang" ]; then
diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index e5cef3915..7c826f239 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -194,7 +194,7 @@ Q: What DPDK version does each Open vSwitch release work 
with?
  2.10.x   17.11.4
  2.11.x   18.11.6
  2.12.x   18.11.6
-2.13.x   19.11.0
+2.13.x   19.11.2
   ===
  
  Q: Are all the DPDK releases that OVS versions work with maintained?

diff --git a/Documentation/intro/install/dpdk.rst 
b/Documentation/intro/install/dpdk.rst
index dbf88ec43..90eaa8aa2 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -42,7 +42,7 @@ Build requirements
  In addition to the requirements described in :doc:`general`, building Open
  vSwitch with DPDK will require the following:
  
-- DPDK 19.11

+- DPDK 19.11.2
  
  - A `DPDK supported NIC`_
  
@@ -71,9 +71,9 @@ Install DPDK

  #. Download the `DPDK sources`_, extract the file and set ``DPDK_DIR``::
  
 $ cd /usr/src/

-   $ wget https://fast.dpdk.org/rel/dpdk-19.11.tar.xz
-   $ tar xf dpdk-19.11.tar.xz
-   $ export DPDK_DIR=/usr/src/dpdk-19.11
+   $ wget https://fast.dpdk.org/rel/dpdk-19.11.2.tar.xz
+   $ tar xf dpdk-19.11.2.tar.xz
+   $ export DPDK_DIR=/usr/src/dpdk-stable-19.11.2
 $ cd $DPDK_DIR
  
  #. (Optional) Configure DPDK as a shared library

diff --git a/Documentation/topics/dpdk/vhost-user.rst 
b/Documentation/topics/dpdk/vhost-user.rst
index c6c6fd8bd..4bc5aef59 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -392,9 +392,9 @@ To begin, instantiate a guest as described in 
:ref:`dpdk-vhost-user` or
  DPDK sources to VM and build DPDK::
  
  $ cd /root/dpdk/

-$ wget https://fast.dpdk.org/rel/dpdk-19.11.tar.xz
-$ tar xf dpdk-19.11.tar.xz
-$ export DPDK_DIR=/root/dpdk/dpdk-19.11
+$ wget https://fast.dpdk.org/rel/dpdk-19.11.2.tar.xz
+$ tar xf dpdk-19.11.2.tar.xz
+$ export DPDK_DIR=/root/dpdk/dpdk-stable-19.11.2
  $ export DPDK_TARGET=x86_64-native-linuxapp-gcc
  $ export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
  $ cd $DPDK_DIR
diff --git a/NEWS b/NEWS
index 0116b3ea0..162fbc991 100644
--- a/NEWS
+++ b/NEWS
@@ -23,7 +23,10 @@ Post-v2.13.0
 - Tunnels: TC Flower offload
   * Tunnel Local endpoint address masked match are supported.
   * Tunnel Romte endpoint address masked match are supported.
-
+   - DPDK:
+ * OVS validated with DPDK 19.11.2, due to the inclusion of fixes for
+   CVE-2020-10722, CVE-2020-10723, CVE-2020-10724, CVE-2020-10725 and
+   CVE-2020-10726, this DPDK version is strongly recommended to be used.
  
  v2.13.0 - 14 Feb 2020

  -




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1 1/1] dpdk: Use DPDK 19.11.2 release.

2020-07-10 Thread Kevin Traynor
On 10/07/2020 16:55, Kevin Traynor wrote:
> On 06/07/2020 16:50, Ian Stokes wrote:
>> Modify travis linux build script to use DPDK 19.11.2 stable release and
>> update docs to reference 19.11.2 stable release.
>>
>> Signed-off-by: Ian Stokes 
>> ---
>>  .travis/linux-build.sh   | 2 +-
>>  Documentation/faq/releases.rst   | 2 +-
>>  Documentation/intro/install/dpdk.rst | 8 
>>  Documentation/topics/dpdk/vhost-user.rst | 6 +++---
>>  NEWS | 5 -
>>  5 files changed, 13 insertions(+), 10 deletions(-)
>>
> 
> userspace-tso.rst says:
> 
> When the NIC performing the segmentation is using the i40e DPDK PMD, a fix
> must be included in the DPDK build, otherwise TSO will not work. The fix can
> be found on `DPDK patchwork`__.
> 
> __ https://patches.dpdk.org/patch/64136/
> 
> This fix is expected to be included in the 19.11.1 release. When OVS
> migrates
> to this DPDK release, this limitation can be removed.
> ---
> 
> The commit is in 19.11.2 (see below), so can this be removed now or you
> want to keep as info for someone using 19.11.0 ? Either way the last
> paragraph deserves a little update. Other than that lgtm.
> 
> 
> commit 6b08d9b3331625a9b4c598d5520bc5fc27fce147
> Author: Xiaoyun Li 
> Date:   Thu Dec 26 14:45:44 2019 +0800
> 
> net/i40e: fix Tx when TSO is enabled
> 
> [ upstream commit 29b2ba82c4c94df1975d0cb9c5c23feef99cf6a3 ]
> 
> 
> $ git tag --contains 6b08d9b333162 | grep 19.11.2
> v19.11.2
> 
>> diff --git a/.travis/linux-build.sh b/.travis/linux-build.sh
>> index 02615a8ec..e0a065291 100755
>> --- a/.travis/linux-build.sh
>> +++ b/.travis/linux-build.sh
>> @@ -170,7 +170,7 @@ fi
>>  
>>  if [ "$DPDK" ] || [ "$DPDK_SHARED" ]; then
>>  if [ -z "$DPDK_VER" ]; then
>> -DPDK_VER="19.11"
>> +DPDK_VER="19.11.2"
>>  fi
>>  install_dpdk $DPDK_VER
>>  if [ "$CC" = "clang" ]; then
>> diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
>> index e5cef3915..7c826f239 100644
>> --- a/Documentation/faq/releases.rst
>> +++ b/Documentation/faq/releases.rst
>> @@ -194,7 +194,7 @@ Q: What DPDK version does each Open vSwitch release work 
>> with?
>>  2.10.x   17.11.4
>>  2.11.x   18.11.6
>>  2.12.x   18.11.6
>> -2.13.x   19.11.0
>> +2.13.x   19.11.2
>>   ===
>>  
>>  Q: Are all the DPDK releases that OVS versions work with maintained?
>> diff --git a/Documentation/intro/install/dpdk.rst 
>> b/Documentation/intro/install/dpdk.rst
>> index dbf88ec43..90eaa8aa2 100644
>> --- a/Documentation/intro/install/dpdk.rst
>> +++ b/Documentation/intro/install/dpdk.rst
>> @@ -42,7 +42,7 @@ Build requirements
>>  In addition to the requirements described in :doc:`general`, building Open
>>  vSwitch with DPDK will require the following:
>>  
>> -- DPDK 19.11
>> +- DPDK 19.11.2
>>  
>>  - A `DPDK supported NIC`_
>>  
>> @@ -71,9 +71,9 @@ Install DPDK
>>  #. Download the `DPDK sources`_, extract the file and set ``DPDK_DIR``::
>>  
>> $ cd /usr/src/
>> -   $ wget https://fast.dpdk.org/rel/dpdk-19.11.tar.xz
>> -   $ tar xf dpdk-19.11.tar.xz
>> -   $ export DPDK_DIR=/usr/src/dpdk-19.11
>> +   $ wget https://fast.dpdk.org/rel/dpdk-19.11.2.tar.xz
>> +   $ tar xf dpdk-19.11.2.tar.xz
>> +   $ export DPDK_DIR=/usr/src/dpdk-stable-19.11.2
>> $ cd $DPDK_DIR
>>  
>>  #. (Optional) Configure DPDK as a shared library
>> diff --git a/Documentation/topics/dpdk/vhost-user.rst 
>> b/Documentation/topics/dpdk/vhost-user.rst
>> index c6c6fd8bd..4bc5aef59 100644
>> --- a/Documentation/topics/dpdk/vhost-user.rst
>> +++ b/Documentation/topics/dpdk/vhost-user.rst
>> @@ -392,9 +392,9 @@ To begin, instantiate a guest as described in 
>> :ref:`dpdk-vhost-user` or
>>  DPDK sources to VM and build DPDK::
>>  
>>  $ cd /root/dpdk/
>> -$ wget https://fast.dpdk.org/rel/dpdk-19.11.tar.xz
>> -$ tar xf dpdk-19.11.tar.xz
>> -$ export DPDK_DIR=/root/dpdk/dpdk-19.11
>> +$ wget https://fast.dpdk.org/rel/dpdk-19.11.2.tar.xz
>> +$ tar xf dpdk-19.11.2.tar.xz
>> +$ export DPDK_DIR=/root/dpdk/dpdk-stable-19.11.2
>>  $ export DPDK_TARGET=x86_64-native-linuxapp-gcc
>>  $ export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
>>  $ cd $DPDK_DIR
>> diff --git a/NEWS b/NEWS
>> index 0116b3ea0..162fbc991 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -23,7 +23,10 @@ Post-v2.13.0
>> - Tunnels: TC Flower offload
>>   * Tunnel Local endpoint address masked match are supported.
>>   * Tunnel Romte endpoint address masked match are supported.
>> -
>> +   - DPDK:
>> + * OVS validated with DPDK 19.11.2, due to the inclusion of fixes for
>> +   CVE-2020-10722, CVE-2020-10723, CVE-2020-10724, CVE-2020-10725 and
>> +   CVE-2020-10726, this DPDK version is strongly recommended to be used.
>>  

Just noticed there is already a DPDK section in the NEWS, I guess you
can add this text to that.

>>  v2.13.0 - 14 Fe

Re: [ovs-dev] [PATCH v6 6/6] docs/dpdk/bridge: add datapath performance section.

2020-07-10 Thread Stokes, Ian




On 7/2/2020 6:43 PM, Harry van Haaren wrote:

This commit adds a section to the dpdk/bridge.rst netdev documentation,
detailing the added DPCLS functionality. The newly added commands are
documented, and sample output is provided.

Running the DPCLS autovalidator with unit tests by default is possible
through re-compiling the autovalidator to have the highest priority at
startup time. This avoids making changes to all tests, and enables
debug and CI builds to validate every lookup implementation with all
unit tests.

Add NEWS updates for CPU ISA, dynamic subtables, and AVX512 lookup.

Signed-off-by: Harry van Haaren 



Hi Harry,
What you have below looks good to me.

The only additional ideas that might be worth adding would be either 
validated compilers as mention in patch 1 f the series (maybe this is 
not needed, but reviewing the existing Compilation section for OVS 
already states a GCC version that was tested with OVS DPDK so at least 1 
known GCC version is provided).


Noting the configure, make CFLAGS dependency might be of use too 
although again, depends on how people configure and compile OVS to date.


Lastly possibly adding a section on what to check if AVX512 lookup is 
not appearing might be useful also.


BR
Ian


---

v5:
- Include NEWS item updates.

v4:
- Fix typos (William Tu)
- Update get commands to use include "prio" as updated in v4
- Add section on enabling autovalidator by default for unit tests
---
  Documentation/topics/dpdk/bridge.rst | 77 
  NEWS |  3 ++
  2 files changed, 80 insertions(+)

diff --git a/Documentation/topics/dpdk/bridge.rst 
b/Documentation/topics/dpdk/bridge.rst
index f0ef42ecc..526d5c959 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -137,3 +137,80 @@ currently turned off by default.
  To turn on SMC::
  
  $ ovs-vsctl --no-wait set Open_vSwitch . other_config:smc-enable=true

+
+Datapath Classifier Performance
+---
+
+The datapath classifier (dpcls) performs wildcard rule matching, a compute
+intensive process of matching a packet ``miniflow`` to a rule ``miniflow``. The
+code that does this compute work impacts datapath performance, and optimizing
+it can provide higher switching performance.
+
+Modern CPUs provide extensive SIMD instructions which can be used to get higher
+performance. The CPU OVS is being deployed on must be capable of running these
+SIMD instructions in order to take advantage of the performance benefits.
+In OVS v2.14 runtime CPU detection was introduced to enable identifying if
+these CPU ISA additions are available, and to allow the user to enable them.
+
+OVS provides multiple implementations of dpcls. The following command enables
+the user to check what implementations are available in a running instance ::
+
+$ ovs-appctl dpif-netdev/subtable-lookup-prio-get
+Available lookup functions (priority : name)
+0 : autovalidator
+1 : generic
+0 : avx512_gather
+
+To set the priority of a lookup function, run the ``prio-set`` command ::
+
+$ ovs-appctl dpif-netdev/subtable-lookup-prio-set avx512_gather 5
+Lookup priority change affected 1 dpcls ports and 1 subtables.
+
+The highest priority lookup function is used for classification, and the output
+above indicates that one subtable of one DPCLS port is has changed its lookup
+function due to the command being run. To verify the prioritization, re-run the
+get command, note the updated priority of the ``avx512_gather`` function ::
+
+$ ovs-appctl dpif-netdev/subtable-lookup-prio-get
+Available lookup functions (priority : name)
+0 : autovalidator
+1 : generic
+5 : avx512_gather
+
+If two lookup functions have the same priority, the first one in the list is
+chosen, and the 2nd occurance of that priority is not used. Put in logical
+terms, a subtable is chosen if its priority is greater than the previous
+best candidate.
+
+CPU ISA Testing and Validation
+~~
+
+As multiple versions of DPCLS can co-exist, each with different CPU ISA
+optimizations, it is important to validate that they all give the exact same
+results. To easily test all DPCLS implementations, an ``autovalidator``
+implementation of the DPCLS exists. This implementation runs all other
+available DPCLS implementations, and verifies that the results are identical.
+
+Running the OVS unit tests with the autovalidator enabled ensures all
+implementations provide the same results. Note that the performance of the
+autovalidator is lower than all other implementations, as it tests the scalar
+implementation against itself, and against all other enabled DPCLS
+implementations.
+
+To adjust the DPCLS autovalidator priority, use this command ::
+
+$ ovs-appctl dpif-netdev/subtable-lookup-prio-set autovalidator 7
+
+Running Unit Tests with Autovalidator

Re: [ovs-dev] [PATCH 0/3] Remove duplicated includes

2020-07-10 Thread Gregory Rose




On 7/9/2020 5:57 PM, wangyunjian wrote:

From: Yunjian Wang 

This series include three patches for removing duplicated includes.

Yunjian Wang (3):
   lib: Remove duplicated includes
   ofproto: Remove duplicated includes
   datapath: Remove duplicated includes

  datapath/linux/compat/lisp.c | 1 -
  datapath/vport-stt.c | 1 -
  lib/netdev-native-tnl.c  | 1 -
  lib/tnl-ports.c  | 1 -
  ofproto/ofproto-dpif.h   | 1 -
  ofproto/tunnel.c | 2 --
  6 files changed, 7 deletions(-)



For the series...

LGTM.  Thanks for the cleanup!

Acked-by: Greg Rose 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1 1/1] dpdk: Use DPDK 19.11.2 release.

2020-07-10 Thread Kevin Traynor
On 06/07/2020 16:50, Ian Stokes wrote:
> Modify travis linux build script to use DPDK 19.11.2 stable release and
> update docs to reference 19.11.2 stable release.
> 
> Signed-off-by: Ian Stokes 
> ---
>  .travis/linux-build.sh   | 2 +-
>  Documentation/faq/releases.rst   | 2 +-
>  Documentation/intro/install/dpdk.rst | 8 
>  Documentation/topics/dpdk/vhost-user.rst | 6 +++---
>  NEWS | 5 -
>  5 files changed, 13 insertions(+), 10 deletions(-)
> 

userspace-tso.rst says:

When the NIC performing the segmentation is using the i40e DPDK PMD, a fix
must be included in the DPDK build, otherwise TSO will not work. The fix can
be found on `DPDK patchwork`__.

__ https://patches.dpdk.org/patch/64136/

This fix is expected to be included in the 19.11.1 release. When OVS
migrates
to this DPDK release, this limitation can be removed.
---

The commit is in 19.11.2 (see below), so can this be removed now or you
want to keep as info for someone using 19.11.0 ? Either way the last
paragraph deserves a little update. Other than that lgtm.


commit 6b08d9b3331625a9b4c598d5520bc5fc27fce147
Author: Xiaoyun Li 
Date:   Thu Dec 26 14:45:44 2019 +0800

net/i40e: fix Tx when TSO is enabled

[ upstream commit 29b2ba82c4c94df1975d0cb9c5c23feef99cf6a3 ]


$ git tag --contains 6b08d9b333162 | grep 19.11.2
v19.11.2

> diff --git a/.travis/linux-build.sh b/.travis/linux-build.sh
> index 02615a8ec..e0a065291 100755
> --- a/.travis/linux-build.sh
> +++ b/.travis/linux-build.sh
> @@ -170,7 +170,7 @@ fi
>  
>  if [ "$DPDK" ] || [ "$DPDK_SHARED" ]; then
>  if [ -z "$DPDK_VER" ]; then
> -DPDK_VER="19.11"
> +DPDK_VER="19.11.2"
>  fi
>  install_dpdk $DPDK_VER
>  if [ "$CC" = "clang" ]; then
> diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
> index e5cef3915..7c826f239 100644
> --- a/Documentation/faq/releases.rst
> +++ b/Documentation/faq/releases.rst
> @@ -194,7 +194,7 @@ Q: What DPDK version does each Open vSwitch release work 
> with?
>  2.10.x   17.11.4
>  2.11.x   18.11.6
>  2.12.x   18.11.6
> -2.13.x   19.11.0
> +2.13.x   19.11.2
>   ===
>  
>  Q: Are all the DPDK releases that OVS versions work with maintained?
> diff --git a/Documentation/intro/install/dpdk.rst 
> b/Documentation/intro/install/dpdk.rst
> index dbf88ec43..90eaa8aa2 100644
> --- a/Documentation/intro/install/dpdk.rst
> +++ b/Documentation/intro/install/dpdk.rst
> @@ -42,7 +42,7 @@ Build requirements
>  In addition to the requirements described in :doc:`general`, building Open
>  vSwitch with DPDK will require the following:
>  
> -- DPDK 19.11
> +- DPDK 19.11.2
>  
>  - A `DPDK supported NIC`_
>  
> @@ -71,9 +71,9 @@ Install DPDK
>  #. Download the `DPDK sources`_, extract the file and set ``DPDK_DIR``::
>  
> $ cd /usr/src/
> -   $ wget https://fast.dpdk.org/rel/dpdk-19.11.tar.xz
> -   $ tar xf dpdk-19.11.tar.xz
> -   $ export DPDK_DIR=/usr/src/dpdk-19.11
> +   $ wget https://fast.dpdk.org/rel/dpdk-19.11.2.tar.xz
> +   $ tar xf dpdk-19.11.2.tar.xz
> +   $ export DPDK_DIR=/usr/src/dpdk-stable-19.11.2
> $ cd $DPDK_DIR
>  
>  #. (Optional) Configure DPDK as a shared library
> diff --git a/Documentation/topics/dpdk/vhost-user.rst 
> b/Documentation/topics/dpdk/vhost-user.rst
> index c6c6fd8bd..4bc5aef59 100644
> --- a/Documentation/topics/dpdk/vhost-user.rst
> +++ b/Documentation/topics/dpdk/vhost-user.rst
> @@ -392,9 +392,9 @@ To begin, instantiate a guest as described in 
> :ref:`dpdk-vhost-user` or
>  DPDK sources to VM and build DPDK::
>  
>  $ cd /root/dpdk/
> -$ wget https://fast.dpdk.org/rel/dpdk-19.11.tar.xz
> -$ tar xf dpdk-19.11.tar.xz
> -$ export DPDK_DIR=/root/dpdk/dpdk-19.11
> +$ wget https://fast.dpdk.org/rel/dpdk-19.11.2.tar.xz
> +$ tar xf dpdk-19.11.2.tar.xz
> +$ export DPDK_DIR=/root/dpdk/dpdk-stable-19.11.2
>  $ export DPDK_TARGET=x86_64-native-linuxapp-gcc
>  $ export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
>  $ cd $DPDK_DIR
> diff --git a/NEWS b/NEWS
> index 0116b3ea0..162fbc991 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -23,7 +23,10 @@ Post-v2.13.0
> - Tunnels: TC Flower offload
>   * Tunnel Local endpoint address masked match are supported.
>   * Tunnel Romte endpoint address masked match are supported.
> -
> +   - DPDK:
> + * OVS validated with DPDK 19.11.2, due to the inclusion of fixes for
> +   CVE-2020-10722, CVE-2020-10723, CVE-2020-10724, CVE-2020-10725 and
> +   CVE-2020-10726, this DPDK version is strongly recommended to be used.
>  
>  v2.13.0 - 14 Feb 2020
>  -
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v6 5/6] dpif-lookup: add avx512 gather implementation.

2020-07-10 Thread Stokes, Ian




On 7/2/2020 6:42 PM, Harry van Haaren wrote:

This commit adds an AVX-512 dpcls lookup implementation.
It uses the AVX-512 SIMD ISA to perform multiple miniflow
operations in parallel.

To run this implementation, the "avx512f" and "bmi2" ISAs are
required. These ISA checks are performed at runtime while
probing the subtable implementation. If a CPU does not provide
both "avx512f" and "bmi2", then this code does not execute.

The avx512 code is built as a seperate static library, with added
CFLAGS to enable the required ISA features. By building only this
static library with avx512 enabled, it is ensured that the main OVS
core library is *not* using avx512, and that OVS continues to run
as before on CPUs that do not support avx512.

The approach taken in this implementation is to use the
gather instruction to access the packet miniflow, allowing
any miniflow blocks to be loaded into an AVX-512 register.
This maximises the usefulness of the register, and hence this
implementation handles any subtable with up to miniflow 8 bits.

Note that specialization of these avx512 lookup routines
still provides performance value, as the hashing of the
resulting data is performed in scalar code, and compile-time
loop unrolling occurs when specialized to miniflow bits.

This commit checks at configure time if the assembling in use
has a known bug in assembling AVX512 code. If this bug is present,
all AVX512 code is disabled. Checking the version string of the binutils
or assembler is not a good method to detect the issue, as backported fixes
would not be reflected.

Signed-off-by: Harry van Haaren 


Thanks for this Harry,

I've spent some time testing this on both AVX512 enabled and non-enabled 
systems and can confirm the performance increase between scalar, generic 
and avx512 which is great too see.


One thing I noticed was that with the AVX512 system, there is a 
dependency on how the CFLAGS are passed when configuring and compiling 
OVS in order to enable AVX512 lookup.


In my testing passing the CFLAGS="CFLAGS="-g -Ofast -march=native" with 
configure seems to work fine and I could see the AVX512 lookup available.


However when testing with an older script and the same CFLAG was passed 
along with the make command instead of at configure, then AVX512  lookup 
would not be available.


Depending on how users configure and compile this may not be an issue 
but thought it worth flagging  as there does seem to be a dependency 
that could be missed at compilation (but from the configure logs all 
looked well).


At the minimum I think it might be worth documenting in the docs, 
possibly in the datapath performance section you add later in the series 
but also maybe in the compiler optimizations section.


Few minor comments below.



---

v6:
- Remove binutils probe .o file once used (Travis/Ian/William)
- Fix compilation/linking of avx512 library on --enable-shared when
   doing a make install-recursive (Travis/William/Ian)
- Fix "as unrecognized option --64" warning (Travis/William/Ian)

v5:
- Fixed typo equivelent/equivalent (William Tu)
- Fixed incorrect argument type uint64_t* to void* (Travis/William Tu)
   (Note: no functional change here - its still the same address :)
- Cleanup #ifdefs registering avx512 subtable lookup func (William Tu)
- Merged commit 6/7 from previous v4 to build avx512 "right" in one go
- Use mkdir -p to create build-aux/ dir before binutils check (Travis)

v4:
- Remove TODO comment on prio-set command (was accidentally
   added to this commit in v3)
- Fixup v3 changlog to not include #warning comment (William Tu)
- Remove #define for debugging in lookup.h
- Fix builds on older gcc versions that don't support -mavx512f.
   Solution involves CC_CHECK and #ifdefs in code (OVS Robot, William Tu)

v3:
- Improve function name for _any subtable lookup
- Use "" include not <> for immintrin.h
- Add checks for SSE42 instructions in core OVS for CRC32 based hashing
   If not available, disable AVX512 lookup implementation as it requires
   uses CRC32 for hashing, and the hashing algorithm must match core OVS.
- Rework ovs_asserts() into function selection time check
- Add #define for magic number 8, number of u64 blocks in AVX512 register
- Add #if CHECKER around AVX code, sparse doesn't like checking it
- Simplify avx512 enabled building, fixes builds with --enable-shared
---
  configure.ac   |   3 +
  lib/automake.mk|  21 ++
  lib/dpif-netdev-lookup-avx512-gather.c | 265 +
  lib/dpif-netdev-lookup.c   |  20 ++
  lib/dpif-netdev-lookup.h   |   4 +
  m4/openvswitch.m4  |  30 +++
  6 files changed, 343 insertions(+)
  create mode 100644 lib/dpif-netdev-lookup-avx512-gather.c

diff --git a/configure.ac b/configure.ac
index 81893e56e..76d6de4e8 100644
--- a/configure.ac
+++ b/configure.ac
@@ -178,10 +178,13 @@ OVS_ENABLE_OPTION([-Wno-null-pointer-arithmetic])
  OVS_ENABLE_OPTION([-

Re: [ovs-dev] [PATCH] vswitchd: Pass MCL_ONFAULT to mlockall

2020-07-10 Thread Flavio Leitner
On Fri, Jul 10, 2020 at 12:53:13PM +0100, Ross Lagerwall wrote:
> mlockall locks thread stack pages into memory, even pages which have not
> yet been demand-paged.  As vswitchd has a lot of threads and the default
> stack size on x86_64 is 8 MiB, this consumes a lot of memory.  On two
> systems I looked at, vswitchd used ~150 MiB of RSS when idle after
> startup.
> 
> Use the new MCL_ONFAULT flag to only lock pages into memory once they
> have been demand-paged in. This still satisfies the requirement that
> vswitchd is not swapped out but frees up ~144 MiB of unswappable memory
> (18 threads x 8 MiB).  After this, vswitchd uses ~6 MiB when idle after
> startup.

The problem with this approach is that when using userspace datapath
those page faults can introduce scheduling points which introduces
jitter and impacts performance.

Alternatively you can try to reduce the default stack size as done
by this commit:

commit b82a90e266e1246fe2973db97c95df22558174ea
Author: Flavio Leitner 
Date:   Thu Feb 28 13:13:57 2019 -0300

rhel: limit stack size to 2M.


Thanks,
fbl

> 
> Signed-off-by: Ross Lagerwall 
> ---
>  vswitchd/ovs-vswitchd.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
> index 1e72b628b..e4d482521 100644
> --- a/vswitchd/ovs-vswitchd.c
> +++ b/vswitchd/ovs-vswitchd.c
> @@ -92,7 +92,10 @@ main(int argc, char *argv[])
>  
>  if (want_mlockall) {
>  #ifdef HAVE_MLOCKALL
> -if (mlockall(MCL_CURRENT | MCL_FUTURE)) {
> +#ifndef MCL_ONFAULT
> +#define MCL_ONFAULT 0
> +#endif
> +if (mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT)) {
>  VLOG_ERR("mlockall failed: %s", ovs_strerror(errno));
>  } else {
>  set_memory_locked();
> -- 
> 2.21.1
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

-- 
fbl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] vswitchd: Pass MCL_ONFAULT to mlockall

2020-07-10 Thread Gregory Rose



On 7/10/2020 4:53 AM, Ross Lagerwall wrote:

mlockall locks thread stack pages into memory, even pages which have not
yet been demand-paged.  As vswitchd has a lot of threads and the default
stack size on x86_64 is 8 MiB, this consumes a lot of memory.  On two
systems I looked at, vswitchd used ~150 MiB of RSS when idle after
startup.

Use the new MCL_ONFAULT flag to only lock pages into memory once they
have been demand-paged in. This still satisfies the requirement that
vswitchd is not swapped out but frees up ~144 MiB of unswappable memory
(18 threads x 8 MiB).  After this, vswitchd uses ~6 MiB when idle after
startup.

Signed-off-by: Ross Lagerwall 


Have you done any performance measurements?  This seems like it may
have some impact.

Thanks,

- Greg


---
  vswitchd/ovs-vswitchd.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
index 1e72b628b..e4d482521 100644
--- a/vswitchd/ovs-vswitchd.c
+++ b/vswitchd/ovs-vswitchd.c
@@ -92,7 +92,10 @@ main(int argc, char *argv[])
  
  if (want_mlockall) {

  #ifdef HAVE_MLOCKALL
-if (mlockall(MCL_CURRENT | MCL_FUTURE)) {
+#ifndef MCL_ONFAULT
+#define MCL_ONFAULT 0
+#endif
+if (mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT)) {
  VLOG_ERR("mlockall failed: %s", ovs_strerror(errno));
  } else {
  set_memory_locked();


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v12] AB bonding: Add "primary" interface concept

2020-07-10 Thread Flavio Leitner
On Thu, Jul 09, 2020 at 04:57:47PM -0700, Jeff Squyres via dev wrote:
> In AB bonding, if the current active slave becomes disabled, a
> replacement slave is arbitrarily picked from the remaining set of
> enabled slaves.  This commit adds the concept of a "primary" slave: an
> interface that will always be (or become) the current active slave if
> it is enabled.
> 
> The rationale for this functionality is to allow the designation of a
> preferred interface for a given bond.  For example:
> 
> 1. Bond is created with interfaces p1 (primary) and p2, both enabled.
> 2. p1 becomes the current active slave (because it was designated as
>the primary).
> 3. Later, p1 fails/becomes disabled.
> 4. p2 is chosen to become the current active slave.
> 5. Later, p1 becomes re-enabled.
> 6. p1 is chosen to become the current active slave (because it was
>designated as the primary)
> 
> Note that p1 becomes the active slave once it becomes re-enabled, even
> if nothing has happened to p2.
> 
> This "primary" concept exists in Linux kernel network interface
> bonding, but did not previously exist in OVS bonding.
> 
> Only one primary slave inteface is supported per bond, and is only
> supported for active/backup bonding.
> 
> The primary slave interface is designated via
> "other_config:bond-primary" when creating a bond.
> 
> Also, while adding tests for the "primary" concept, make a few small
> improvements to the non-primary AB bonding test.
> 
> Signed-off-by: Jeff Squyres 
> Reviewed-by: Aaron Conole 
> Tested-by: Greg Rose 
> Acked-by: Greg Rose 
> ---

Acked-by: Flavio Leitner 

Thanks!
fbl

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2] Fix the routing for external logical ports of bridged logical switches.

2020-07-10 Thread Numan Siddique
On Fri, Jul 10, 2020 at 4:41 PM Numan Siddique  wrote:

>
>
> On Fri, Jul 10, 2020 at 12:45 AM Ankur Sharma 
> wrote:
>
>> Hi Numan, Daniel,
>>
>> I have not looked at the patch yet. But replacing arp.sha with chassis
>> mac is not the correct approach from networking perspective.
>> Chassic mac is NOT meant to replace the IP-MAC binding of router port, it
>> is ONLY meant to ensure that for EW traffic a distributed router port mac
>> does not show on multiple TOR ports.
>> Both for NS and EW, ARP resolution for router port ip should be responded
>> with router port mac ONLY.
>>
>> I am trying to understand the use case and we can discuss an alternative
>> in this thread.
>> Can you share the repro steps, i can try the same and will try to come up
>> with an alternative.
>>
>>
> Hi Ankur,
>
> In this particular case, the originator of the traffic is from a logical
> port of type 'external'.
>
> One example of using external ports is for SRIOV VMs. The traffic from
> these VMs are not seen
> by the local ovn-controller. And we want to provide E-W routing and other
> OVN services like DHCP, DNS etc
> to these VMS.
>
> So one of the controller nodes (which can receive the traffic sent by
> these SRIOV VMs) binds these external ports
> and it responds to the ARP requests and does the routing for it.
>
> To reproduce the issue, can you please use own-fake-multi node setup from
> here ? -
> https://github.com/numansiddique/ovn-fake-multinode/tree/vlan_chassis_mac_issue
>
> The steps are:
> 1. Build OVN containers.
> ./ovn_cluster.sh build
>
>
Please note, before the 'start', you need to start openvswitch on the host.

Thanks
Numan


> 2. ./ovn_cluster.sh start
>
> Run
> 3. sudo ip netns exec sw0-ext1 ping -c3 20.0.0.3
> PING 20.0.0.3 (20.0.0.3) 56(84) bytes of data.
> 64 bytes from 20.0.0.3: icmp_seq=1 ttl=63 time=0.074 ms
> 64 bytes from 20.0.0.3: icmp_seq=1 ttl=63 time=0.086 ms (DUP!)
> 64 bytes from 20.0.0.3: icmp_seq=1 ttl=63 time=0.089 ms (DUP!)
> 64 bytes from 20.0.0.3: icmp_seq=2 ttl=63 time=0.105 ms
> 64 bytes from 20.0.0.3: icmp_seq=2 ttl=63 time=0.120 ms (DUP!)
> 64 bytes from 20.0.0.3: icmp_seq=2 ttl=63 time=0.124 ms (DUP!)
> 64 bytes from 20.0.0.3: icmp_seq=3 ttl=63 time=0.145 ms
>
> --- 20.0.0.3 ping statistics ---
> 3 packets transmitted, 3 received, +4 duplicates, 0% packet loss, time
> 2036ms
> rtt min/avg/max/mdev = 0.074/0.106/0.145/0.023 ms
>
> You will see a few DUP packets.
>
> $sudo ip netns exec sw0-ext1 ping -c3 10.0.0.1
> PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
> 64 bytes from 10.0.0.1: icmp_seq=1 ttl=254 time=0.298 ms
> 64 bytes from 10.0.0.1: icmp_seq=1 ttl=254 time=0.358 ms (DUP!)
> 64 bytes from 10.0.0.1: icmp_seq=1 ttl=254 time=0.384 ms (DUP!)
> 64 bytes from 10.0.0.1: icmp_seq=2 ttl=254 time=0.598 ms
> 64 bytes from 10.0.0.1: icmp_seq=2 ttl=254 time=0.594 ms (DUP!)
> 64 bytes from 10.0.0.1: icmp_seq=2 ttl=254 time=0.656 ms (DUP!)
> 64 bytes from 10.0.0.1: icmp_seq=3 ttl=254 time=0.715 ms
>
> --- 10.0.0.1 ping statistics ---
> 3 packets transmitted, 3 received, +4 duplicates, 0% packet loss, time
> 2088ms
> rtt min/avg/max/mdev = 0.298/0.514/0.715/0.152 ms
>
> In the setup, sw0-ext1 represents an external logical switch port. If you
> see the script here [1],
> sw0-ext1 is claimed by ovn-chassis-1 node.
>
> And when sw0-ext1 sends ARP request to 10.0.0.1, the arp request is
> handled by ovn-chassis-1
> and the reply has  - arp.sha = router mac  and eth.src = chassis mac of
> ovn-chassis-1.
>
> And hence sw0-ext1 sends ping packets with the destination mac of router
> port  IP - 10.0.0.1.
> And all the 3 nodes reply - ovn-chassis-1, ovn-chassis-2 and ovn-gw-1.
>
> I'm not sure if you have played with ovn-fake-multinode before. If you run
> "docker ps", you will see a docker
> container representing each chassis.
>
> Please do "docker exec -it ovn-central bash" and run a few
> ovn-nbctl/ovn-sbctl commands to know more.
>
> You can also see the script in [1] and reproduce the issue in your setup.
>
> I didn't find any other way to solve this issue. Also in normal situations
> where external ports are not used,
> any arp request to the router IP from bridge logical switch ports don't
> leave the chassis since the local
> ovn-controller itself replies. This is for tenant bridged VLAN logical
> switches. I guess for provider VLAN networks
> (which provide the N/S traffic, I guess the arp request for the router
> port can come from the physical network).
>
>
> [1] -
> https://github.com/numansiddique/ovn-fake-multinode/blob/vlan_chassis_mac_issue/ovn_cluster.sh#L501
>
>
> Thanks
> Numan
>
>
>
> Regards,
>> Ankur
>> 
>> From: num...@ovn.org 
>> Sent: Thursday, July 9, 2020 2:11 AM
>> To: d...@openvswitch.org 
>> Cc: Numan Siddique ; Daniel Alvarez ;
>> Ankur Sharma 
>> Subject: [PATCH ovn v2] Fix the routing for external logical ports of
>> bridged logical switches.
>>
>> From: Numan Siddique 
>>
>> Routing for external logical ports is broken if t

Re: [ovs-dev] [PATCH v3 1/4] Eliminate use of term "slave" in bond, LACP, and bundle contexts.

2020-07-10 Thread Alin Serdean
From: Ben Pfaff
Sent: Tuesday, July 7, 2020 7:29 PM
To: d...@openvswitch.org
Cc: Ben Pfaff
Subject: [ovs-dev] [PATCH v3 1/4] Eliminate use of term "slave" in bond, LACP, 
and bundle contexts.

Most of these changes should not change user-visible behavior.  One
place where they do is in "ovs-ofctl dump-flows", which will now output
"subs:..." inside "bundle" actions instead of "slaves:...".  I don't
expect this to cause real problems in most systems.  The old syntax
is still supported on input for backward compatibility.

Signed-off-by: Ben Pfaff 
---
As discussed offline instead of the term “sub-interface” maybe use “member”.

Leaving the link for the MSFT naming convention:
https://docs.microsoft.com/en-us/powershell/module/netlbfo/add-netlbfoteammember?view=win10-ps

--
Alin.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/3] netdev-offload-dpdk: Pass L4 proto-id to match in the L3 rte_flow_item

2020-07-10 Thread 0-day Robot
Bleep bloop.  Greetings Sriharsha Basavapatna, I am a robot and I have tried 
out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


git-am:
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch' to see the failed patch
Patch failed at 0001 netdev-offload-dpdk: Pass L4 proto-id to match in the L3 
rte_flow_item
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 3/4] conntrack: Rename "master" connection to "primary" connection.

2020-07-10 Thread Alin Serdean
From: Ben Pfaff
Sent: Tuesday, July 7, 2020 7:26 PM
To: d...@openvswitch.org
Cc: Ben Pfaff
Subject: [ovs-dev] [PATCH v3 3/4] conntrack: Rename "master" connection to 
"primary" connection.

Signed-off-by: Ben Pfaff 
---

As discussed offline, your suggestion of switching to “parent” connection 
instead of “primary”, sounds better
in my opinion.


--
Alin
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 2/4] Use primary/secondary, not master/slave, as names for OpenFlow roles.

2020-07-10 Thread Alin Serdean
From: Ben Pfaff
Sent: Tuesday, July 7, 2020 7:26 PM
To: d...@openvswitch.org
Cc: Ben Pfaff
Subject: [ovs-dev] [PATCH v3 2/4] Use primary/secondary, not master/slave, as 
names for OpenFlow roles.

Signed-off-by: Ben Pfaff 
---
Acked-by: Alin Gabriel Serdean aserd...@ovn.org
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 4/4] Eliminate "whitelist" and "blacklist" terms.

2020-07-10 Thread Alin Serdean
From: Ben Pfaff
Sent: Tuesday, July 7, 2020 7:25 PM
To: d...@openvswitch.org
Cc: Ben Pfaff
Subject: [ovs-dev] [PATCH v3 4/4] Eliminate "whitelist" and "blacklist" terms.

There is one remaining use under datapath.  That change should happen
upstream in Linux first according to our usual policy.

Signed-off-by: Ben Pfaff 
---

Acked-by: Alin Gabriel Serdean aserd...@ovn.org
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5 0/7] netdev datapath: Partial action offload

2020-07-10 Thread Ilya Maximets
On 7/9/20 8:47 AM, Sriharsha Basavapatna via dev wrote:
> Hi,
> 
> This patchset extends the "Partial HW acceleration" mode to offload a
> part of the action processing to HW, instead of offloading just lookup
> (MARK/RSS), for "vhost-user" ports. This is referred to as "Partial Action
> Offload". This mode does not require SRIOV/switchdev configuration. In this
> mode, forwarding (output) action is still performed by OVS-DPDK SW datapath.

Hi.  I like the idea of egress offloading.  It's interesting.
IIUC, HW will perform matching on egress packets and perform actions before
actually sending them. Is that right?

For the implementation I have a few concerns:

1. Why only vhost-user ports?  I mean, egress offloading could be done
   for any ingress port in case ingress port doesn't support full offloading.

   Moreover, you could have classification offloading on ingress and actions
   offloading on egress at the same time.  This might be useful, for example,
   if we have two diferent NICs that both supports offloading, but we have to
   send packets between them.  But, yes, that might be a way for further
   improvement.

   Regarding vhost-user, you're exposing too much of netdev internals to
   datapath layer by checking for a specific netdev type.  This is not
   a good thing to do.  Especially because egress offloading doesn't depend
   on a type of ingress interface.

2. I'm worried about other offload providers like tinux-tc that doesn't know
   anything that happens in dpif-netdev and will not work correctly if
   dpif-netdev will try to use egress offloading on it.
   I see that you're using netdev_dpdk_flow_api_supported() inside the
   dpif-netdev, but that is the violation of netdev-offload abstraction layer. 

   I think, some more generic extension of netdev-offload interface required,
   so all offload providers will be able to use this API.  I mean, egress
   offloading should be possible with TC.  You don't need to add support for
   this, but we should have a generic interface to utilize this support in
   the future.

   At least there should be something more generic like info.offload_direction
   of some enumeration type like
enum netdev_offload_direction {
NETDEV_OFFLOAD_INGRESS,
NETDEV_OFFLOAD_EGRESS,
};
   And each offload provider should decide by itself if it supports some type
   of offloading or not.
   netdev specific or specific to particular offload provider functions should
   not be used in general outside of their own modules.

3. Some new identifiers for flow dumps to distinguish classification and actions
   partial offloading needed.

4. Ingress partial actions offloading sounds interesting, but current
   implementation forces datapath assistance for such actions.  This will
   significantly impact kernel datapath as all the vlan operations will be
   sent back and forth between kernel and userspace.  And I'm not sure if this
   will even work correctly with current patches.


I didn't make a full review.  These are just quick comments from the 
architectural
point of view.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 3/3] tunnel: Set ECN mask bits only when it is matched in the IP header

2020-07-10 Thread Sriharsha Basavapatna via dev
IP_ECN_MASK is set unconditionally in the mask field for a
tunneled flow. Set this only when the ECN field is matched.

Fixes: abcd4402fec4 ("tunnel: Only un-wildcard the ECN bits for IP traffic")
Signed-off-by: Sriharsha Basavapatna 
---
 ofproto/tunnel.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/ofproto/tunnel.c b/ofproto/tunnel.c
index 03f0ab765..d0a65b430 100644
--- a/ofproto/tunnel.c
+++ b/ofproto/tunnel.c
@@ -455,13 +455,17 @@ tnl_port_send(const struct ofport_dpif *ofport, struct 
flow *flow,
 
 /* ECN fields are always inherited. */
 if (is_ip_any(flow)) {
-wc->masks.nw_tos |= IP_ECN_MASK;
-
 if (IP_ECN_is_ce(flow->nw_tos)) {
 flow->tunnel.ip_tos |= IP_ECN_ECT_0;
 } else {
 flow->tunnel.ip_tos |= flow->nw_tos & IP_ECN_MASK;
 }
+
+if (flow->tunnel.ip_tos & IP_ECN_MASK) {
+wc->masks.nw_tos |= IP_ECN_MASK;
+} else {
+wc->masks.nw_tos &= ~IP_ECN_MASK;
+}
 }
 
 flow->tunnel.flags &= ~(FLOW_TNL_F_MASK & ~FLOW_TNL_PUB_F_MASK);
-- 
2.25.0.rc2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 1/3] netdev-offload-dpdk: Set transfer attribute to zero for mark/rss offload

2020-07-10 Thread Sriharsha Basavapatna via dev
The offload layer doesn't initialize the 'transfer' attribute
for mark/rss offload (partial offload). It should be set to 0.

Fixes: 60e778c7533a ("netdev-offload-dpdk: Framework for actions offload.")
Signed-off-by: Sriharsha Basavapatna 
---
 lib/netdev-offload-dpdk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
index 26a75f0f2..4c652fd82 100644
--- a/lib/netdev-offload-dpdk.c
+++ b/lib/netdev-offload-dpdk.c
@@ -818,7 +818,8 @@ netdev_offload_dpdk_mark_rss(struct flow_patterns *patterns,
 .group = 0,
 .priority = 0,
 .ingress = 1,
-.egress = 0
+.egress = 0,
+.transfer = 0
 };
 struct rte_flow_error error;
 struct rte_flow *flow;
-- 
2.25.0.rc2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 2/3] netdev-offload-dpdk: Pass L4 proto-id to match in the L3 rte_flow_item

2020-07-10 Thread Sriharsha Basavapatna via dev
The offload layer clears the L4 protocol mask in the L3 item, when the
L4 item is passed for matching, as an optimization. This can be confusing
while parsing the headers in the PMD. Also, the datapath flow specifies
this field to be matched. This optimization is best left to the PMD.
This patch restores the code to pass the L4 protocol type in L3 match.

Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow")
Signed-off-by: Sriharsha Basavapatna 
---
 lib/netdev-offload-dpdk.c | 22 --
 1 file changed, 22 deletions(-)

diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
index 4c652fd82..165fd1f47 100644
--- a/lib/netdev-offload-dpdk.c
+++ b/lib/netdev-offload-dpdk.c
@@ -596,7 +596,6 @@ static int
 parse_flow_match(struct flow_patterns *patterns,
  const struct match *match)
 {
-uint8_t *next_proto_mask = NULL;
 uint8_t proto = 0;
 
 /* Eth */
@@ -667,7 +666,6 @@ parse_flow_match(struct flow_patterns *patterns,
 /* Save proto for L4 protocol setup. */
 proto = spec->hdr.next_proto_id &
 mask->hdr.next_proto_id;
-next_proto_mask = &mask->hdr.next_proto_id;
 }
 
 if (proto != IPPROTO_ICMP && proto != IPPROTO_UDP  &&
@@ -701,11 +699,6 @@ parse_flow_match(struct flow_patterns *patterns,
 mask->hdr.tcp_flags = ntohs(match->wc.masks.tcp_flags) & 0xff;
 
 add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_TCP, spec, mask);
-
-/* proto == TCP and ITEM_TYPE_TCP, thus no need for proto match. */
-if (next_proto_mask) {
-*next_proto_mask = 0;
-}
 } else if (proto == IPPROTO_UDP) {
 struct rte_flow_item_udp *spec, *mask;
 
@@ -719,11 +712,6 @@ parse_flow_match(struct flow_patterns *patterns,
 mask->hdr.dst_port = match->wc.masks.tp_dst;
 
 add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_UDP, spec, mask);
-
-/* proto == UDP and ITEM_TYPE_UDP, thus no need for proto match. */
-if (next_proto_mask) {
-*next_proto_mask = 0;
-}
 } else if (proto == IPPROTO_SCTP) {
 struct rte_flow_item_sctp *spec, *mask;
 
@@ -737,11 +725,6 @@ parse_flow_match(struct flow_patterns *patterns,
 mask->hdr.dst_port = match->wc.masks.tp_dst;
 
 add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_SCTP, spec, mask);
-
-/* proto == SCTP and ITEM_TYPE_SCTP, thus no need for proto match. */
-if (next_proto_mask) {
-*next_proto_mask = 0;
-}
 } else if (proto == IPPROTO_ICMP) {
 struct rte_flow_item_icmp *spec, *mask;
 
@@ -755,11 +738,6 @@ parse_flow_match(struct flow_patterns *patterns,
 mask->hdr.icmp_code = (uint8_t) ntohs(match->wc.masks.tp_dst);
 
 add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_ICMP, spec, mask);
-
-/* proto == ICMP and ITEM_TYPE_ICMP, thus no need for proto match. */
-if (next_proto_mask) {
-*next_proto_mask = 0;
-}
 }
 
 add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_END, NULL, NULL);
-- 
2.25.0.rc2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 0/3] netdev datapath offload: misc fixes

2020-07-10 Thread Sriharsha Basavapatna via dev
Hi,

This patchset fixes some issues found during netdev-offload-dpdk testing.

Patch-1: Initialize rte 'transfer' attribute for mark/rss offload.
Patch-2: Pass L4 protocol-id to match in the rte_flow_item.
Patch-3: Set IP_ECN_MASK only when the ECN field is matched.

Thanks,
-Harsha

**

v1:
- Created this patchset using patches 1 & 2, sent separately earlier.
  Please ignore the previous version of these patches.
- Patch-2: Updated "fixes:" tag with the right commit id.
- Added patch-3.

**

Sriharsha Basavapatna (3):
  netdev-offload-dpdk: Set transfer attribute to zero for mark/rss
offload
  netdev-offload-dpdk: Pass L4 proto-id to match in the L3 rte_flow_item
  tunnel: Set ECN mask bits only when it is matched in the IP header

 lib/netdev-offload-dpdk.c | 25 ++---
 ofproto/tunnel.c  |  8 ++--
 2 files changed, 8 insertions(+), 25 deletions(-)

-- 
2.25.0.rc2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] netdev-offload-dpdk: Pass L4 proto-id to match in the L3 rte_flow_item

2020-07-10 Thread Sriharsha Basavapatna via dev
On Fri, Jul 10, 2020 at 4:01 PM Sriharsha Basavapatna <
sriharsha.basavapa...@broadcom.com> wrote:

>
>
> On Sun, Jul 5, 2020 at 6:00 PM Eli Britstein  wrote:
>
>>
>> On 7/5/2020 2:48 PM, Sriharsha Basavapatna wrote:
>> > The offload layer clears the L4 protocol mask in the L3 item, when the
>> > L4 item is passed for matching, as an optimization. This can be
>> confusing
>> > while parsing the headers in the PMD. Also, the datapath flow specifies
>> > this field to be matched. This optimization is best left to the PMD.
>> > This patch restores the code to pass the L4 protocol type in L3 match.
>> >
>> > Fixes: 900fe00784ca ("netdev-offload-dpdk: Dynamically allocate pattern
>> items.")
>>
>> It's arguable if it's really a fix.
>
> It is better not to ignore a field that is specified to be matched by the
> datapath flow.
>
>
>> I don't see any further information
>> the PMD can use, but it's harmless anyway, so OK by me either with this
>> commit or without.
>
> If you insist it's a fix, this is the correct commit that did it in the
>> first place:
>>
>> e8a2b5bf92bb netdev-dpdk: implement flow offload with rte flow
>>
>
> Thanks, I'll update the "fixes" field in v2.
> -Harsha
>

I'll send v2 of this patch in a patchset with a couple of other fixes.
-Harsha


>
>> > Signed-off-by: Sriharsha Basavapatna <
>> sriharsha.basavapa...@broadcom.com>
>> > ---
>> >   lib/netdev-offload-dpdk.c | 22 --
>> >   1 file changed, 22 deletions(-)
>> >
>> > diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
>> > index 4c652fd82..165fd1f47 100644
>> > --- a/lib/netdev-offload-dpdk.c
>> > +++ b/lib/netdev-offload-dpdk.c
>> > @@ -596,7 +596,6 @@ static int
>> >   parse_flow_match(struct flow_patterns *patterns,
>> >const struct match *match)
>> >   {
>> > -uint8_t *next_proto_mask = NULL;
>> >   uint8_t proto = 0;
>> >
>> >   /* Eth */
>> > @@ -667,7 +666,6 @@ parse_flow_match(struct flow_patterns *patterns,
>> >   /* Save proto for L4 protocol setup. */
>> >   proto = spec->hdr.next_proto_id &
>> >   mask->hdr.next_proto_id;
>> > -next_proto_mask = &mask->hdr.next_proto_id;
>> >   }
>> >
>> >   if (proto != IPPROTO_ICMP && proto != IPPROTO_UDP  &&
>> > @@ -701,11 +699,6 @@ parse_flow_match(struct flow_patterns *patterns,
>> >   mask->hdr.tcp_flags = ntohs(match->wc.masks.tcp_flags) & 0xff;
>> >
>> >   add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_TCP, spec,
>> mask);
>> > -
>> > -/* proto == TCP and ITEM_TYPE_TCP, thus no need for proto
>> match. */
>> > -if (next_proto_mask) {
>> > -*next_proto_mask = 0;
>> > -}
>> >   } else if (proto == IPPROTO_UDP) {
>> >   struct rte_flow_item_udp *spec, *mask;
>> >
>> > @@ -719,11 +712,6 @@ parse_flow_match(struct flow_patterns *patterns,
>> >   mask->hdr.dst_port = match->wc.masks.tp_dst;
>> >
>> >   add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_UDP, spec,
>> mask);
>> > -
>> > -/* proto == UDP and ITEM_TYPE_UDP, thus no need for proto
>> match. */
>> > -if (next_proto_mask) {
>> > -*next_proto_mask = 0;
>> > -}
>> >   } else if (proto == IPPROTO_SCTP) {
>> >   struct rte_flow_item_sctp *spec, *mask;
>> >
>> > @@ -737,11 +725,6 @@ parse_flow_match(struct flow_patterns *patterns,
>> >   mask->hdr.dst_port = match->wc.masks.tp_dst;
>> >
>> >   add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_SCTP, spec,
>> mask);
>> > -
>> > -/* proto == SCTP and ITEM_TYPE_SCTP, thus no need for proto
>> match. */
>> > -if (next_proto_mask) {
>> > -*next_proto_mask = 0;
>> > -}
>> >   } else if (proto == IPPROTO_ICMP) {
>> >   struct rte_flow_item_icmp *spec, *mask;
>> >
>> > @@ -755,11 +738,6 @@ parse_flow_match(struct flow_patterns *patterns,
>> >   mask->hdr.icmp_code = (uint8_t) ntohs(match->wc.masks.tp_dst);
>> >
>> >   add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_ICMP, spec,
>> mask);
>> > -
>> > -/* proto == ICMP and ITEM_TYPE_ICMP, thus no need for proto
>> match. */
>> > -if (next_proto_mask) {
>> > -*next_proto_mask = 0;
>> > -}
>> >   }
>> >
>> >   add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_END, NULL, NULL);
>>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] netdev-offload-dpdk: Set transfer attribute to zero for mark/rss offload

2020-07-10 Thread Sriharsha Basavapatna via dev
Please ignore this patch, I'm resending it in a patchset.

Thanks,
-Harsha

On Mon, Jul 6, 2020 at 6:54 PM Sriharsha Basavapatna <
sriharsha.basavapa...@broadcom.com> wrote:

> A gentle reminder on this patch.
> Thanks,
> -Harsha
>
> On Mon, Jun 29, 2020 at 11:31 PM Sriharsha Basavapatna
>  wrote:
> >
> > The offload layer doesn't initialize the 'transfer' attribute
> > for mark/rss offload (partial offload). It should be set to 0.
> >
> > Fixes: 60e778c7533a ("netdev-offload-dpdk: Framework for actions
> offload.")
> > Signed-off-by: Sriharsha Basavapatna  >
> > ---
> >  lib/netdev-offload-dpdk.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> > index 26a75f0f2..4c652fd82 100644
> > --- a/lib/netdev-offload-dpdk.c
> > +++ b/lib/netdev-offload-dpdk.c
> > @@ -818,7 +818,8 @@ netdev_offload_dpdk_mark_rss(struct flow_patterns
> *patterns,
> >  .group = 0,
> >  .priority = 0,
> >  .ingress = 1,
> > -.egress = 0
> > +.egress = 0,
> > +.transfer = 0
> >  };
> >  struct rte_flow_error error;
> >  struct rte_flow *flow;
> > --
> > 2.25.0.rc2
> >
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] vswitchd: Pass MCL_ONFAULT to mlockall

2020-07-10 Thread Ross Lagerwall
mlockall locks thread stack pages into memory, even pages which have not
yet been demand-paged.  As vswitchd has a lot of threads and the default
stack size on x86_64 is 8 MiB, this consumes a lot of memory.  On two
systems I looked at, vswitchd used ~150 MiB of RSS when idle after
startup.

Use the new MCL_ONFAULT flag to only lock pages into memory once they
have been demand-paged in. This still satisfies the requirement that
vswitchd is not swapped out but frees up ~144 MiB of unswappable memory
(18 threads x 8 MiB).  After this, vswitchd uses ~6 MiB when idle after
startup.

Signed-off-by: Ross Lagerwall 
---
 vswitchd/ovs-vswitchd.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
index 1e72b628b..e4d482521 100644
--- a/vswitchd/ovs-vswitchd.c
+++ b/vswitchd/ovs-vswitchd.c
@@ -92,7 +92,10 @@ main(int argc, char *argv[])
 
 if (want_mlockall) {
 #ifdef HAVE_MLOCKALL
-if (mlockall(MCL_CURRENT | MCL_FUTURE)) {
+#ifndef MCL_ONFAULT
+#define MCL_ONFAULT 0
+#endif
+if (mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT)) {
 VLOG_ERR("mlockall failed: %s", ovs_strerror(errno));
 } else {
 set_memory_locked();
-- 
2.21.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v3] ovn-northd: Fix logical flows to limit ARP/NS broadcast domain.

2020-07-10 Thread Numan Siddique
On Fri, Jul 10, 2020 at 1:27 AM Dumitru Ceara  wrote:

> Logical flows that limit the ARP/NS broadcast domain on a logical switch
> should only match on ARP requests/NS for IPs that can actually be
> replied to on the connected router port (i.e., an IP on the same network
> is configured on the router port).
>
> Reported-by: Girish Moodalbail 
> Reported-at:
> https://mail.openvswitch.org/pipermail/ovs-discuss/2020-June/050287.html
> Fixes: 32f5ebb06226 ("ovn-northd: Limit ARP/ND broadcast domain whenever
> possible.")
> Signed-off-by: Dumitru Ceara 
>

Acked-by: Numan Siddique 

Thanks
Numan


> ---
> v3:
> - The first patch of the series was already applied.
> - Addressed Numan's comments:
>   - use precomputed ipv4/ipv6_netaddr network field instead of
> recomputing it.
>   - reworded comments to make them more clear.
> v2:
> - Changed the fix into a series, such that the memory leak fix can be
>   easily backported to stable branches.
> - Fixed the "Fixes" tag.
> ---
>  northd/ovn-northd.c | 162
> +++-
>  tests/ovn.at|  74 
>  2 files changed, 208 insertions(+), 28 deletions(-)
>
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index 1921982..d64b467 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -6091,6 +6091,42 @@ build_lrouter_groups(struct hmap *ports, struct
> ovs_list *lr_list)
>  }
>  }
>
> +/* Returns 'true' if the IPv4 'addr' is on the same subnet with one of the
> + * IPs configured on the router port.
> + */
> +static bool
> +lrouter_port_ipv4_reachable(const struct ovn_port *op, ovs_be32 addr)
> +{
> +for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) {
> +struct ipv4_netaddr *op_addr = &op->lrp_networks.ipv4_addrs[i];
> +
> +if ((addr & op_addr->mask) == op_addr->network) {
> +return true;
> +}
> +}
> +return false;
> +}
> +
> +/* Returns 'true' if the IPv6 'addr' is on the same subnet with one of the
> + * IPs configured on the router port.
> + */
> +static bool
> +lrouter_port_ipv6_reachable(const struct ovn_port *op,
> +const struct in6_addr *addr)
> +{
> +for (size_t i = 0; i < op->lrp_networks.n_ipv6_addrs; i++) {
> +struct ipv6_netaddr *op_addr = &op->lrp_networks.ipv6_addrs[i];
> +
> +struct in6_addr nat_addr6_masked =
> +ipv6_addr_bitand(addr, &op_addr->mask);
> +
> +if (ipv6_addr_equals(&nat_addr6_masked, &op_addr->network)) {
> +return true;
> +}
> +}
> +return false;
> +}
> +
>  /*
>   * Ingress table 19: Flows that flood self originated ARP/ND packets in
> the
>   * switching domain.
> @@ -6101,8 +6137,47 @@ build_lswitch_rport_arp_req_self_orig_flow(struct
> ovn_port *op,
> struct ovn_datapath *od,
> struct hmap *lflows)
>  {
> -struct ds match = DS_EMPTY_INITIALIZER;
> +struct sset all_eth_addrs = SSET_INITIALIZER(&all_eth_addrs);
>  struct ds eth_src = DS_EMPTY_INITIALIZER;
> +struct ds match = DS_EMPTY_INITIALIZER;
> +
> +sset_add(&all_eth_addrs, op->lrp_networks.ea_s);
> +
> +for (size_t i = 0; i < op->od->nbr->n_nat; i++) {
> +struct ovn_nat *nat_entry = &op->od->nat_entries[i];
> +const struct nbrec_nat *nat = nat_entry->nb;
> +
> +if (!nat_entry_is_valid(nat_entry)) {
> +continue;
> +}
> +
> +if (!strcmp(nat->type, "snat")) {
> +continue;
> +}
> +
> +if (!nat->external_mac) {
> +continue;
> +}
> +
> +/* Check if the ovn port has a network configured on which we
> could
> + * expect ARP requests/NS for the DNAT external_ip.
> + */
> +if (nat_entry_is_v6(nat_entry)) {
> +struct in6_addr *addr =
> &nat_entry->ext_addrs.ipv6_addrs[0].addr;
> +
> +if (!lrouter_port_ipv6_reachable(op, addr)) {
> +continue;
> +}
> +} else {
> +ovs_be32 addr = nat_entry->ext_addrs.ipv4_addrs[0].addr;
> +
> +if (!lrouter_port_ipv4_reachable(op, addr)) {
> +continue;
> +}
> +}
> +sset_add(&all_eth_addrs, nat->external_mac);
> +}
> +
>
>  /* Self originated (G)ARP requests/ND need to be flooded as usual.
>   * Determine that packets are self originated by also matching on
> @@ -6110,15 +6185,11 @@ build_lswitch_rport_arp_req_self_orig_flow(struct
> ovn_port *op,
>   * is a VLAN-backed network.
>   * Priority: 80.
>   */
> -ds_put_format(ð_src, "{ %s, ", op->lrp_networks.ea_s);
> -for (size_t i = 0; i < op->od->nbr->n_nat; i++) {
> -const struct nbrec_nat *nat = op->od->nbr->nat[i];
> -
> -if (!nat->external_mac) {
> -continue;
> -}
> +const char *eth_addr;
>
> -ds_put_format(ð_src, "%s, "

[ovs-dev] [PATCH] ovsdb-idl: Send "set_db_change_aware" before "monitor_cond_since".

2020-07-10 Thread Dumitru Ceara
For short lived IDL clients (e.g., ovn-sbctl) if the client sends
monitor_cond_since before set_db_change_aware, the client might close
the DB connection immediately after it received the reply for
monitor_cond_since and before the server has a chance to reply to
set_db_change_aware.

E.g., from the logs of the ovsdb-server:
2020-07-10T09:29:52.649Z|04479|jsonrpc|DBG|unix#72: received request,
method="monitor_cond_since", params=["OVN_Southbound",
["monid","OVN_Southbound"],{"SB_Global":[{"columns":["options"]}]},
"----"], id=2
2020-07-10T09:29:52.649Z|04480|jsonrpc|DBG|unix#72: send reply,
result=[false,"----",
{"SB_Global":{"6ad26b48-a742-4fe1-8671-3975e2146ce6":{"initial":
{"options":["map",[["mac_prefix","be:85:cb"],["svc_monitor_mac",
"52:58:b5:19:8c:40"]]]], id=2
2020-07-10T09:29:52.649Z|04482|jsonrpc|DBG|unix#72: received request,
method="set_db_change_aware", params=[true], id=3

<<< IDL client closes the connection here because it already got the
response to the monitor_cond_since request.

2020-07-10T09:29:59.023Z|04483|jsonrpc|DBG|unix#72: send reply, result={}, id=3
2020-07-10T09:29:59.023Z|04484|stream_fd|DBG|send: Broken pipe
2020-07-10T09:29:59.023Z|04485|jsonrpc|WARN|unix#72: send error: Broken pipe

While this is not a critical issue, it can be easily mitigated by changing
the IDL client to always send "set_db_change_aware" before
"monitor_cond_since". This way we ensure that a well behaving IDL client
doesn't close the connection too early, avoiding the error logs in
ovsdb-server.

This patch moves the code to send monitor_cond_since(data) from function
ovsdb_idl_check_server_db() to ovsdb_idl_process_response() as we can
transition to IDL_S_DATA_MONITOR_COND_SINCE_REQUESTED only upon
receiving a reply for monitor_cond(server).

CC: Ben Pfaff 
CC: Han Zhou 
CC: Ilya Maximets 
Reported-by: Girish Moodalbail 
Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-discuss/2020-July/050343.html
Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered 
databases.")
Signed-off-by: Dumitru Ceara 
---
 lib/ovsdb-idl.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
index ef3b97b..c6427f5 100644
--- a/lib/ovsdb-idl.c
+++ b/lib/ovsdb-idl.c
@@ -770,6 +770,10 @@ ovsdb_idl_process_response(struct ovsdb_idl *idl, struct 
jsonrpc_msg *msg)
  OVSDB_IDL_MM_MONITOR_COND);
 if (ovsdb_idl_check_server_db(idl)) {
 ovsdb_idl_send_db_change_aware(idl);
+ovsdb_idl_send_monitor_request(
+idl, &idl->data, OVSDB_IDL_MM_MONITOR_COND_SINCE);
+ovsdb_idl_transition(
+idl, IDL_S_DATA_MONITOR_COND_SINCE_REQUESTED);
 }
 } else {
 ovsdb_idl_send_schema_request(idl, &idl->data);
@@ -2057,9 +2061,6 @@ ovsdb_idl_check_server_db(struct ovsdb_idl *idl)
 if (idl->state == IDL_S_SERVER_MONITOR_COND_REQUESTED) {
 json_destroy(idl->data.schema);
 idl->data.schema = json_from_string(database->schema);
-ovsdb_idl_send_monitor_request(idl, &idl->data,
-   OVSDB_IDL_MM_MONITOR_COND_SINCE);
-ovsdb_idl_transition(idl, IDL_S_DATA_MONITOR_COND_SINCE_REQUESTED);
 }
 return true;
 }
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Treat as urgent

2020-07-10 Thread Mr . john obisheh
-- 
I have made series of efforts to reach you on telephone concerning
your funds which was discovered in a Dormant Account with our bank. I
discovered that your funds was floating in an account with a financial
firm (my branch) and the money could not be transferred because of
lack of proof for the source of the funds also the transfer was not
authorized by you since your name was in the document.I had to trace
the documents to find out that the funds original belongs to you and I
decided to contact you so that you can have a smooth transfer. I will
appreciate if you can offer me 5% of the total money so that I will
work with you concerning this issue.

The account where the fund is deposited is currently dormant and if
you accept the agreement of offering me 5% of the funds then we can
work and I will guard you on the step on how you can get the funds
release and you will transfer the funds from you sitting room via your
telephone because that is a more secured way of doing bank transfers
currently to avoid fraud.

I will be waiting for your call once you receive my proposal and I
promise you that the funds will be accessible under 6 days.
You can email me back with your telephone number so I can call you at
better time for deep conversation on how to move forward.

Thanks
Yours faithfully

Mr.john obisheh
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2] Fix the routing for external logical ports of bridged logical switches.

2020-07-10 Thread Numan Siddique
On Fri, Jul 10, 2020 at 12:45 AM Ankur Sharma 
wrote:

> Hi Numan, Daniel,
>
> I have not looked at the patch yet. But replacing arp.sha with chassis mac
> is not the correct approach from networking perspective.
> Chassic mac is NOT meant to replace the IP-MAC binding of router port, it
> is ONLY meant to ensure that for EW traffic a distributed router port mac
> does not show on multiple TOR ports.
> Both for NS and EW, ARP resolution for router port ip should be responded
> with router port mac ONLY.
>
> I am trying to understand the use case and we can discuss an alternative
> in this thread.
> Can you share the repro steps, i can try the same and will try to come up
> with an alternative.
>
>
Hi Ankur,

In this particular case, the originator of the traffic is from a logical
port of type 'external'.

One example of using external ports is for SRIOV VMs. The traffic from
these VMs are not seen
by the local ovn-controller. And we want to provide E-W routing and other
OVN services like DHCP, DNS etc
to these VMS.

So one of the controller nodes (which can receive the traffic sent by these
SRIOV VMs) binds these external ports
and it responds to the ARP requests and does the routing for it.

To reproduce the issue, can you please use own-fake-multi node setup from
here ? -
https://github.com/numansiddique/ovn-fake-multinode/tree/vlan_chassis_mac_issue

The steps are:
1. Build OVN containers.
./ovn_cluster.sh build

2. ./ovn_cluster.sh start

Run
3. sudo ip netns exec sw0-ext1 ping -c3 20.0.0.3
PING 20.0.0.3 (20.0.0.3) 56(84) bytes of data.
64 bytes from 20.0.0.3: icmp_seq=1 ttl=63 time=0.074 ms
64 bytes from 20.0.0.3: icmp_seq=1 ttl=63 time=0.086 ms (DUP!)
64 bytes from 20.0.0.3: icmp_seq=1 ttl=63 time=0.089 ms (DUP!)
64 bytes from 20.0.0.3: icmp_seq=2 ttl=63 time=0.105 ms
64 bytes from 20.0.0.3: icmp_seq=2 ttl=63 time=0.120 ms (DUP!)
64 bytes from 20.0.0.3: icmp_seq=2 ttl=63 time=0.124 ms (DUP!)
64 bytes from 20.0.0.3: icmp_seq=3 ttl=63 time=0.145 ms

--- 20.0.0.3 ping statistics ---
3 packets transmitted, 3 received, +4 duplicates, 0% packet loss, time
2036ms
rtt min/avg/max/mdev = 0.074/0.106/0.145/0.023 ms

You will see a few DUP packets.

$sudo ip netns exec sw0-ext1 ping -c3 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=254 time=0.298 ms
64 bytes from 10.0.0.1: icmp_seq=1 ttl=254 time=0.358 ms (DUP!)
64 bytes from 10.0.0.1: icmp_seq=1 ttl=254 time=0.384 ms (DUP!)
64 bytes from 10.0.0.1: icmp_seq=2 ttl=254 time=0.598 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=254 time=0.594 ms (DUP!)
64 bytes from 10.0.0.1: icmp_seq=2 ttl=254 time=0.656 ms (DUP!)
64 bytes from 10.0.0.1: icmp_seq=3 ttl=254 time=0.715 ms

--- 10.0.0.1 ping statistics ---
3 packets transmitted, 3 received, +4 duplicates, 0% packet loss, time
2088ms
rtt min/avg/max/mdev = 0.298/0.514/0.715/0.152 ms

In the setup, sw0-ext1 represents an external logical switch port. If you
see the script here [1],
sw0-ext1 is claimed by ovn-chassis-1 node.

And when sw0-ext1 sends ARP request to 10.0.0.1, the arp request is handled
by ovn-chassis-1
and the reply has  - arp.sha = router mac  and eth.src = chassis mac of
ovn-chassis-1.

And hence sw0-ext1 sends ping packets with the destination mac of router
port  IP - 10.0.0.1.
And all the 3 nodes reply - ovn-chassis-1, ovn-chassis-2 and ovn-gw-1.

I'm not sure if you have played with ovn-fake-multinode before. If you run
"docker ps", you will see a docker
container representing each chassis.

Please do "docker exec -it ovn-central bash" and run a few
ovn-nbctl/ovn-sbctl commands to know more.

You can also see the script in [1] and reproduce the issue in your setup.

I didn't find any other way to solve this issue. Also in normal situations
where external ports are not used,
any arp request to the router IP from bridge logical switch ports don't
leave the chassis since the local
ovn-controller itself replies. This is for tenant bridged VLAN logical
switches. I guess for provider VLAN networks
(which provide the N/S traffic, I guess the arp request for the router port
can come from the physical network).


[1] -
https://github.com/numansiddique/ovn-fake-multinode/blob/vlan_chassis_mac_issue/ovn_cluster.sh#L501


Thanks
Numan



Regards,
> Ankur
> 
> From: num...@ovn.org 
> Sent: Thursday, July 9, 2020 2:11 AM
> To: d...@openvswitch.org 
> Cc: Numan Siddique ; Daniel Alvarez ;
> Ankur Sharma 
> Subject: [PATCH ovn v2] Fix the routing for external logical ports of
> bridged logical switches.
>
> From: Numan Siddique 
>
> Routing for external logical ports is broken if these ports belonged
> to bridged logical switches (with localnet port) and
> 'ovn-chassis-mac-mappings'
> is configured. External logical ports are those which are external to OVN,
> but there is a logical port for it and it is claimed by one of the HA
> chassis.
> The claimed chassis provides routing and other native OVN serices like
> dhcp and dns.
>
> When t

Re: [ovs-dev] [PATCH] netdev-offload-dpdk: Pass L4 proto-id to match in the L3 rte_flow_item

2020-07-10 Thread Sriharsha Basavapatna via dev
On Sun, Jul 5, 2020 at 6:00 PM Eli Britstein  wrote:

>
> On 7/5/2020 2:48 PM, Sriharsha Basavapatna wrote:
> > The offload layer clears the L4 protocol mask in the L3 item, when the
> > L4 item is passed for matching, as an optimization. This can be confusing
> > while parsing the headers in the PMD. Also, the datapath flow specifies
> > this field to be matched. This optimization is best left to the PMD.
> > This patch restores the code to pass the L4 protocol type in L3 match.
> >
> > Fixes: 900fe00784ca ("netdev-offload-dpdk: Dynamically allocate pattern
> items.")
>
> It's arguable if it's really a fix.

It is better not to ignore a field that is specified to be matched by the
datapath flow.


> I don't see any further information
> the PMD can use, but it's harmless anyway, so OK by me either with this
> commit or without.

If you insist it's a fix, this is the correct commit that did it in the
> first place:
>
> e8a2b5bf92bb netdev-dpdk: implement flow offload with rte flow
>

Thanks, I'll update the "fixes" field in v2.
-Harsha

>
> > Signed-off-by: Sriharsha Basavapatna  >
> > ---
> >   lib/netdev-offload-dpdk.c | 22 --
> >   1 file changed, 22 deletions(-)
> >
> > diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> > index 4c652fd82..165fd1f47 100644
> > --- a/lib/netdev-offload-dpdk.c
> > +++ b/lib/netdev-offload-dpdk.c
> > @@ -596,7 +596,6 @@ static int
> >   parse_flow_match(struct flow_patterns *patterns,
> >const struct match *match)
> >   {
> > -uint8_t *next_proto_mask = NULL;
> >   uint8_t proto = 0;
> >
> >   /* Eth */
> > @@ -667,7 +666,6 @@ parse_flow_match(struct flow_patterns *patterns,
> >   /* Save proto for L4 protocol setup. */
> >   proto = spec->hdr.next_proto_id &
> >   mask->hdr.next_proto_id;
> > -next_proto_mask = &mask->hdr.next_proto_id;
> >   }
> >
> >   if (proto != IPPROTO_ICMP && proto != IPPROTO_UDP  &&
> > @@ -701,11 +699,6 @@ parse_flow_match(struct flow_patterns *patterns,
> >   mask->hdr.tcp_flags = ntohs(match->wc.masks.tcp_flags) & 0xff;
> >
> >   add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_TCP, spec, mask);
> > -
> > -/* proto == TCP and ITEM_TYPE_TCP, thus no need for proto
> match. */
> > -if (next_proto_mask) {
> > -*next_proto_mask = 0;
> > -}
> >   } else if (proto == IPPROTO_UDP) {
> >   struct rte_flow_item_udp *spec, *mask;
> >
> > @@ -719,11 +712,6 @@ parse_flow_match(struct flow_patterns *patterns,
> >   mask->hdr.dst_port = match->wc.masks.tp_dst;
> >
> >   add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_UDP, spec, mask);
> > -
> > -/* proto == UDP and ITEM_TYPE_UDP, thus no need for proto
> match. */
> > -if (next_proto_mask) {
> > -*next_proto_mask = 0;
> > -}
> >   } else if (proto == IPPROTO_SCTP) {
> >   struct rte_flow_item_sctp *spec, *mask;
> >
> > @@ -737,11 +725,6 @@ parse_flow_match(struct flow_patterns *patterns,
> >   mask->hdr.dst_port = match->wc.masks.tp_dst;
> >
> >   add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_SCTP, spec,
> mask);
> > -
> > -/* proto == SCTP and ITEM_TYPE_SCTP, thus no need for proto
> match. */
> > -if (next_proto_mask) {
> > -*next_proto_mask = 0;
> > -}
> >   } else if (proto == IPPROTO_ICMP) {
> >   struct rte_flow_item_icmp *spec, *mask;
> >
> > @@ -755,11 +738,6 @@ parse_flow_match(struct flow_patterns *patterns,
> >   mask->hdr.icmp_code = (uint8_t) ntohs(match->wc.masks.tp_dst);
> >
> >   add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_ICMP, spec,
> mask);
> > -
> > -/* proto == ICMP and ITEM_TYPE_ICMP, thus no need for proto
> match. */
> > -if (next_proto_mask) {
> > -*next_proto_mask = 0;
> > -}
> >   }
> >
> >   add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_END, NULL, NULL);
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] RFC DEMO of using parallel processing in OVN

2020-07-10 Thread anton . ivanov
Hi all,

This is a series of patches to demo the use in OVN of the parallel
processing of hashes patchset which I submitted to OVS.

The OVS patch series required for this patch can be found here: 

https://patchwork.ozlabs.org/project/openvswitch/patch/20200706083650.29443-2-anton.iva...@cambridgegreys.com/
https://patchwork.ozlabs.org/project/openvswitch/patch/20200706083650.29443-3-anton.iva...@cambridgegreys.com/

or here:

https://github.com/kot-begemot-uk/ovs/tree/parallel-submit-final

They improve the performance and stability of ovn in a scale test. 
I have used 64+ fake nodes in my scale testing. Your mileage may
vary depending on the performance of the systems used to run the tests.

The patchset covers most parts of ovn-northd which look amenable to
parallel processing. It may be possible to expand it to also cover
initial datapath hash generation and sb_only/nb_only lists, but that
does not look like it is worth it except for extremely large datasets.



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn RFC 4/5] Parallelise lswitch lflow generation where possible

2020-07-10 Thread anton . ivanov
From: Anton Ivanov 

Signed-off-by: Anton Ivanov 
---
 northd/ovn-northd.c | 1509 +--
 1 file changed, 880 insertions(+), 629 deletions(-)

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 53ea35de7..61aecb80e 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -4553,8 +4553,7 @@ has_stateful_acl(struct ovn_datapath *od)
 }
 
 static void
-build_lswitch_input_port_sec(struct hmap *ports, struct hmap *datapaths,
- struct hmap *lflows)
+build_lswitch_input_port_sec_op(struct ovn_port *op, struct hmap *lflows)
 {
 /* Logical switch ingress table 0: Ingress port security - L2
  *  (priority 50).
@@ -4563,68 +4562,64 @@ build_lswitch_input_port_sec(struct hmap *ports, struct 
hmap *datapaths,
  */
 struct ds actions = DS_EMPTY_INITIALIZER;
 struct ds match = DS_EMPTY_INITIALIZER;
-struct ovn_port *op;
 
-HMAP_FOR_EACH (op, key_node, ports) {
-if (!op->nbsp) {
-continue;
-}
+if (!op->nbsp) {
+return;
+}
 
-if (!lsp_is_enabled(op->nbsp)) {
-/* Drop packets from disabled logical ports (since logical flow
- * tables are default-drop). */
-continue;
-}
+if (!lsp_is_enabled(op->nbsp)) {
+/* Drop packets from disabled logical ports (since logical flow
+ * tables are default-drop). */
+return;
+}
 
-if (lsp_is_external(op->nbsp)) {
-continue;
-}
+if (lsp_is_external(op->nbsp)) {
+return;
+}
 
-ds_clear(&match);
-ds_clear(&actions);
-ds_put_format(&match, "inport == %s", op->json_key);
-build_port_security_l2("eth.src", op->ps_addrs, op->n_ps_addrs,
-   &match);
+ds_clear(&match);
+ds_clear(&actions);
+ds_put_format(&match, "inport == %s", op->json_key);
+build_port_security_l2("eth.src", op->ps_addrs, op->n_ps_addrs,
+   &match);
 
-const char *queue_id = smap_get(&op->sb->options, "qdisc_queue_id");
-if (queue_id) {
-ds_put_format(&actions, "set_queue(%s); ", queue_id);
-}
-ds_put_cstr(&actions, "next;");
-ovn_lflow_add_with_hint(lflows, op->od, S_SWITCH_IN_PORT_SEC_L2, 50,
-ds_cstr(&match), ds_cstr(&actions),
-&op->nbsp->header_);
+const char *queue_id = smap_get(&op->sb->options, "qdisc_queue_id");
+if (queue_id) {
+ds_put_format(&actions, "set_queue(%s); ", queue_id);
+}
+ds_put_cstr(&actions, "next;");
+ovn_lflow_add_with_hint(lflows, op->od, S_SWITCH_IN_PORT_SEC_L2, 50,
+ds_cstr(&match), ds_cstr(&actions),
+&op->nbsp->header_);
 
-if (op->nbsp->n_port_security) {
-build_port_security_ip(P_IN, op, lflows, &op->nbsp->header_);
-build_port_security_nd(op, lflows, &op->nbsp->header_);
-}
+if (op->nbsp->n_port_security) {
+build_port_security_ip(P_IN, op, lflows, &op->nbsp->header_);
+build_port_security_nd(op, lflows, &op->nbsp->header_);
 }
+ds_destroy(&match);
+ds_destroy(&actions);
+}
 
+static void
+build_lswitch_input_port_sec_od(struct ovn_datapath *od, struct hmap *lflows)
+{
 /* Ingress table 1 and 2: Port security - IP and ND, by default
  * goto next. (priority 0)
  */
-struct ovn_datapath *od;
-HMAP_FOR_EACH (od, key_node, datapaths) {
 if (!od->nbs) {
-continue;
+return;
 }
 
 ovn_lflow_add(lflows, od, S_SWITCH_IN_PORT_SEC_ND, 0, "1", "next;");
 ovn_lflow_add(lflows, od, S_SWITCH_IN_PORT_SEC_IP, 0, "1", "next;");
-}
-
-ds_destroy(&match);
-ds_destroy(&actions);
 }
 
 static void
-build_lswitch_output_port_sec(struct hmap *ports, struct hmap *datapaths,
+build_lswitch_output_port_sec_op(struct ovn_port *op,
   struct hmap *lflows)
 {
 struct ds actions = DS_EMPTY_INITIALIZER;
 struct ds match = DS_EMPTY_INITIALIZER;
-struct ovn_port *op;
 
 /* Egress table 8: Egress port security - IP (priorities 90 and 80)
  * if port security enabled.
@@ -4636,57 +4631,57 @@ build_lswitch_output_port_sec(struct hmap *ports, 
struct hmap *datapaths,
  * Priority 150 rules drop packets to disabled logical ports, so that
  * they don't even receive multicast or broadcast packets.
  */
-HMAP_FOR_EACH (op, key_node, ports) {
-if (!op->nbsp || lsp_is_external(op->nbsp)) {
-continue;
-}
+if (!op->nbsp || lsp_is_external(op->nbsp)) {
+return;
+}
 
-ds_clear(&actions);
-ds_clear(&match);
+ds_clear(&actions);
+ds_clear(&match);
 
-ds_put_format(&match, "outport == %s", op->json_key);
-if (lsp_is_enabled(op->nbsp)) {
-build_port_security_l

[ovs-dev] [PATCH ovn RFC 2/5] Rearrange steps in the lrouter flow generation by iterator

2020-07-10 Thread anton . ivanov
From: Anton Ivanov 

In order to run over steps in parallel we need to have the
per-datapath steps and per-port steps grouped separately
and iterated separately.

This should be OK as the actual results generated by
all the rules are ordered by their table and priority so
the fact that the OD rules are now not in-between the OP
rules should not make a difference for the overall result

Signed-off-by: Anton Ivanov 
---
 northd/ovn-northd.c | 163 +++-
 1 file changed, 9 insertions(+), 154 deletions(-)

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 7eb231e62..b41c523f2 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -10385,178 +10385,33 @@ build_lrouter_flows(struct hmap *datapaths, struct 
hmap *ports,
 struct hmap *lflows, struct shash *meter_groups,
 struct hmap *lbs)
 {
-/* This flow table structure is documented in ovn-northd(8), so please
- * update ovn-northd.8.xml if you change anything. */
-
-/* Logical router ingress table 0: Admission control framework. */
 struct ovn_datapath *od;
 HMAP_FOR_EACH (od, key_node, datapaths) {
 build_lrouter_flow_table_0_od(od, lflows);
-}
-
-/* Logical router ingress table 0: match (priority 50). */
-struct ovn_port *op;
-HMAP_FOR_EACH (op, key_node, ports) {
-build_lrouter_flow_table_0_op(op, lflows);
-}
-
-/* Logical router ingress table 1: LOOKUP_NEIGHBOR and
- * table 2: LEARN_NEIGHBOR. */
-HMAP_FOR_EACH (od, key_node, datapaths) {
 build_lrouter_flow_table_1_and_2_od(od, lflows);
-}
-
-HMAP_FOR_EACH (op, key_node, ports) {
-build_lrouter_flow_table_1_and_2_op(op, lflows);
-}
-
-/* Logical router ingress table 3: IP Input. */
-HMAP_FOR_EACH (od, key_node, datapaths) {
 build_lrouter_flow_table_3_od(od, lflows);
-}
-
-/* Logical router ingress table 3: IP Input for IPv4. */
-HMAP_FOR_EACH (op, key_node, ports) {
-build_lrouter_flow_table_3_op(op, lflows);
-}
-
-/* DHCPv6 reply handling */
-HMAP_FOR_EACH (op, key_node, ports) {
-build_lrouter_flow_DHCP_v6_op(op, lflows);
-}
-
-/* Logical router ingress table 1: IP Input for IPv6. */
-HMAP_FOR_EACH (op, key_node, ports) {
-build_lrouter_flow_inpit_for_v6_op(op, lflows);
-}
-
-/* NAT, Defrag and load balancing. */
-HMAP_FOR_EACH (od, key_node, datapaths) {
 build_lrouter_flow_nat_defrag_lb_od(od, lflows, meter_groups, lbs);
-}
-
-/* Logical router ingress table ND_RA_OPTIONS & ND_RA_RESPONSE: IPv6 Router
- * Adv (RA) options and response. */
-HMAP_FOR_EACH (op, key_node, ports) {
-build_lrouter_flow_ingress_ND_RA_op(op, lflows);
-}
-
-/* Logical router ingress table ND_RA_OPTIONS & ND_RA_RESPONSE: RS
- * responder, by default goto next. (priority 0)*/
-HMAP_FOR_EACH (od, key_node, datapaths) {
 build_lrouter_flow_ingress_ND_RA_od(od, lflows);
-}
-
-/* Logical router ingress table IP_ROUTING & IP_ROUTING_ECMP: IP Routing.
- *
- * A packet that arrives at this table is an IP packet that should be
- * routed to the address in 'ip[46].dst'.
- *
- * For regular routes without ECMP, table IP_ROUTING sets outport to the
- * correct output port, eth.src to the output port's MAC address, and
- * '[xx]reg0' to the next-hop IP address (leaving 'ip[46].dst', the
- * packet’s final destination, unchanged), and advances to the next 
table.
- *
- * For ECMP routes, i.e. multiple routes with same policy and prefix, table
- * IP_ROUTING remembers ECMP group id and selects a member id, and advances
- * to table IP_ROUTING_ECMP, which sets outport, eth.src and '[xx]reg0' for
- * the selected ECMP member.
- * */
-HMAP_FOR_EACH (op, key_node, ports) {
-build_lrouter_flow_ingress_ip_routing_ecmp(op, lflows);
-}
-
-/* Convert the static routes to flows. */
-HMAP_FOR_EACH (od, key_node, datapaths) {
 build_lrouter_flow_datapath_to_static_route(od, lflows, ports);
-}
-
-/* IP Multicast lookup. Here we set the output port, adjust TTL and
- * advance to next table (priority 500).
- */
-HMAP_FOR_EACH (od, key_node, datapaths) {
 build_lrouter_flow_multicast_lookup(od, lflows);
-}
-
-/* Logical router ingress table POLICY: Policy.
- *
- * A packet that arrives at this table is an IP packet that should be
- * permitted/denied/rerouted to the address in the rule's nexthop.
- * This table sets outport to the correct out_port,
- * eth.src to the output port's MAC address,
- * and '[xx]reg0' to the next-hop IP address (leaving
- * 'ip[46].dst', the packet’s final destination, unchanged), and
- * advances to the next table for ARP/ND resolution. */
-HMAP_FOR_EACH (od, key_node, datapaths) {
 build_lrouter_flow_ingress_policy(od, lfl

[ovs-dev] [PATCH ovn RFC 3/5] Parallelise lrouter_flow generation

2020-07-10 Thread anton . ivanov
From: Anton Ivanov 

Make the lrouter flow generation run in parallel.

Signed-off-by: Anton Ivanov 
---
 northd/ovn-northd.c | 268 +++-
 1 file changed, 241 insertions(+), 27 deletions(-)

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index b41c523f2..53ea35de7 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -47,6 +47,7 @@
 #include "unixctl.h"
 #include "util.h"
 #include "uuid.h"
+#include "fasthmap.h"
 #include "openvswitch/vlog.h"
 
 VLOG_DEFINE_THIS_MODULE(ovn_northd);
@@ -3964,7 +3965,7 @@ ovn_lflow_add_at(struct hmap *lflow_map, struct 
ovn_datapath *od,
 ovn_lflow_init(lflow, od, stage, priority,
xstrdup(match), xstrdup(actions),
ovn_lflow_hint(stage_hint), where);
-hmap_insert(lflow_map, &lflow->hmap_node, ovn_lflow_hash(lflow));
+hmap_insert_fast(lflow_map, &lflow->hmap_node, ovn_lflow_hash(lflow));
 }
 
 /* Adds a row with the specified contents to the Logical_Flow table. */
@@ -10380,43 +10381,250 @@ build_lrouter_egress_delivery(struct ovn_port *op, 
struct hmap *lflows)
 ds_destroy(&actions);
 ds_destroy(&match);
 }
+
+struct lrouter_flow_build_info {
+struct hmap *datapaths;
+struct hmap *ports;
+struct hmap *lflows;
+struct shash *meter_groups;
+struct hmap *lbs;
+};
+
+static void
+build_lrouter_flows_od(struct ovn_datapath *od,
+struct lrouter_flow_build_info *lfbi)
+{ 
+build_lrouter_flow_table_0_od(od, lfbi->lflows);
+build_lrouter_flow_table_1_and_2_od(od, lfbi->lflows);
+build_lrouter_flow_table_3_od(od, lfbi->lflows);
+build_lrouter_flow_nat_defrag_lb_od(od, lfbi->lflows, lfbi->meter_groups, 
lfbi->lbs);
+build_lrouter_flow_ingress_ND_RA_od(od, lfbi->lflows);
+build_lrouter_flow_datapath_to_static_route(od, lfbi->lflows, lfbi->ports);
+build_lrouter_flow_multicast_lookup(od, lfbi->lflows);
+build_lrouter_flow_ingress_policy(od, lfbi->lflows, lfbi->ports);
+build_lrouter_flow_dest_unreachable(od, lfbi->lflows);
+build_lrouter_arp_resolve_od(od, lfbi->lflows);
+build_lrouter_check_pck_len_od(od, lfbi->lflows, lfbi->ports);
+build_lrouter_gw_redirect_od(od, lfbi->lflows);
+build_lrouter_arp_request_od(od, lfbi->lflows);
+}
+
+static void
+build_lrouter_flows_op(struct ovn_port *op,
+struct lrouter_flow_build_info *lfbi)
+{
+build_lrouter_flow_table_0_op(op, lfbi->lflows);
+build_lrouter_flow_table_1_and_2_op(op, lfbi->lflows);
+build_lrouter_flow_table_3_op(op, lfbi->lflows);
+build_lrouter_flow_DHCP_v6_op(op, lfbi->lflows);
+build_lrouter_flow_inpit_for_v6_op(op, lfbi->lflows);
+build_lrouter_flow_ingress_ND_RA_op(op, lfbi->lflows);
+build_lrouter_flow_ingress_ip_routing_ecmp(op, lfbi->lflows);
+build_lrouter_arp_resolve_op(op, lfbi->lflows, lfbi->ports);
+build_lrouter_egress_delivery(op, lfbi->lflows);
+}
+
+struct lrouter_thread_od_pool {
+void (*od_helper_func)(struct ovn_datapath *od,
+struct lrouter_flow_build_info *lfbi);
+struct worker_pool *pool;
+};
+
+static void *build_lrouter_flows_od_thread(void *arg) {
+struct worker_control *control = (struct worker_control *) arg;
+struct lrouter_thread_od_pool *workload;
+struct lrouter_flow_build_info *lfbi;
+struct ovn_datapath *od;
+int bnum;
+
+
+while (!seize_fire()) {
+sem_wait(&control->fire);
+workload = (struct lrouter_thread_od_pool *) control->workload;
+lfbi = (struct lrouter_flow_build_info *) control->data;
+if (lfbi && workload) {
+for (bnum = control->id;
+bnum <= lfbi->datapaths->mask;
+bnum += workload->pool->size)
+{
+HMAP_FOR_EACH_IN_PARALLEL (
+od, key_node, bnum, lfbi->datapaths) {
+if (seize_fire()) {
+return NULL;
+}
+(workload->od_helper_func)(od, lfbi);
+}
+}
+atomic_store_relaxed(&control->finished, true);
+atomic_thread_fence(memory_order_release);
+}
+sem_post(control->done);
+}
+return NULL;
+}
+
+struct lrouter_thread_op_pool {
+void (*op_helper_func)(struct ovn_port *op,
+struct lrouter_flow_build_info *lfbi);
+struct worker_pool *pool;
+};
+
+static void *build_lrouter_flows_op_thread(void *arg) {
+struct worker_control *control = (struct worker_control *) arg;
+struct lrouter_thread_op_pool *workload;
+struct lrouter_flow_build_info *lfbi;
+struct ovn_port *op;
+int bnum;
+
+while (!seize_fire()) {
+sem_wait(&control->fire);
+workload = (struct lrouter_thread_op_pool *) control->workload;
+lfbi = (struct lrouter_flow_build_info *) control->data;
+if (lfbi && workload) {
+for (bnum = control->id;
+bnum <= lfbi->ports->m

[ovs-dev] [PATCH ovn RFC 5/5] Parallelised reconciliation of flows

2020-07-10 Thread anton . ivanov
From: Anton Ivanov 

Add single-thread-on-small-number/multi-thread-on-large-number
processing to the reconciliation of flows in ovn-northd.

Adjust the thresholds to more conservative values

Signed-off-by: Anton Ivanov 
---
 northd/ovn-northd.c | 212 +++-
 1 file changed, 189 insertions(+), 23 deletions(-)

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 61aecb80e..c8ce16708 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -4001,7 +4001,9 @@ static void
 ovn_lflow_destroy(struct hmap *lflows, struct ovn_lflow *lflow)
 {
 if (lflow) {
-hmap_remove(lflows, &lflow->hmap_node);
+if (lflows) {
+hmap_remove(lflows, &lflow->hmap_node);
+}
 free(lflow->match);
 free(lflow->actions);
 free(lflow->stage_hint);
@@ -6920,11 +6922,11 @@ build_lswitch_flows_step_1_op(struct ovn_port *op,
 build_lswitch_output_port_sec_op(op, lsi->lflows);
 }
 
-#define OD_CUTOFF 16
+#define OD_CUTOFF 1024
 /* This is probably still too low for ports, not sure if there is even
  * a point to run them in parallel and at what point it should kick in.
  */
-#define OP_CUTOFF 1
+#define OP_CUTOFF 32
 
 
 struct lswitch_thread_od_pool {
@@ -10697,6 +10699,9 @@ static void *build_lrouter_flows_od_thread(void *arg) {
 
 while (!seize_fire()) {
 sem_wait(&control->fire);
+if (seize_fire()) {
+return NULL;
+}
 workload = (struct lrouter_thread_od_pool *) control->workload;
 lfbi = (struct lrouter_flow_build_info *) control->data;
 if (lfbi && workload) {
@@ -10735,6 +10740,9 @@ static void *build_lrouter_flows_op_thread(void *arg) {
 
 while (!seize_fire()) {
 sem_wait(&control->fire);
+if (seize_fire()) {
+return NULL;
+}
 workload = (struct lrouter_thread_op_pool *) control->workload;
 lfbi = (struct lrouter_flow_build_info *) control->data;
 if (lfbi && workload) {
@@ -10874,7 +10882,124 @@ build_lrouter_flows(struct hmap *datapaths, struct 
hmap *ports,
 free(lfbi);
 }
 
-static ssize_t max_seen_lflow_size = 128;
+struct sbrec_result {
+struct ovs_list list_node;
+const struct sbrec_logical_flow *sbflow;
+struct ovn_lflow *lflow;
+ssize_t lflow_hash;
+};
+
+struct reconcile_info {
+struct northd_context *ctx;
+struct hmap *lflows;
+struct hmap *datapaths;
+struct ovs_list results;
+};
+
+struct lflow_reconciliation_pool {
+struct worker_pool *pool;
+};
+
+static void *reconciliation_thread(void *arg) {
+struct worker_control *control = (struct worker_control *) arg;
+struct lflow_reconciliation_pool *workload;
+struct reconcile_info *ri;
+struct sbrec_result *res;
+
+while (!seize_fire()) {
+sem_wait(&control->fire);
+if (seize_fire()) {
+return NULL;
+}
+workload = (struct lflow_reconciliation_pool *) control->workload;
+ri = (struct reconcile_info *) control->data;
+if (ri && workload) {
+/* Push changes to the Logical_Flow table to database. */
+const struct sbrec_logical_flow *sbflow;
+SBREC_LOGICAL_FLOW_PARALLEL_FOR_EACH(sbflow, ri->ctx->ovnsb_idl, 
control->id,  workload->pool->size) {
+struct ovn_datapath *od
+= ovn_datapath_from_sbrec(ri->datapaths, 
sbflow->logical_datapath);
+res = xmalloc(sizeof(struct sbrec_result));
+
+if (!od || ovn_datapath_is_stale(od)) {
+res->sbflow = sbflow;
+res->lflow = NULL;
+ovs_list_push_back(&ri->results, &res->list_node);
+continue;
+}
+
+enum ovn_datapath_type dp_type = od->nbs ? DP_SWITCH : 
DP_ROUTER;
+enum ovn_pipeline pipeline
+= !strcmp(sbflow->pipeline, "ingress") ? P_IN : P_OUT;
+struct ovn_lflow *lflow = ovn_lflow_find(
+ri->lflows, od, ovn_stage_build(dp_type, pipeline, 
sbflow->table_id),
+sbflow->priority, sbflow->match, sbflow->actions, 
sbflow->hash);
+if (lflow) {
+res->lflow = lflow;
+res->lflow_hash = lflow->hmap_node.hash;
+res->sbflow = sbflow;
+} else {
+res->sbflow = sbflow;
+res->lflow = NULL;
+}
+ovs_list_push_back(&ri->results, &res->list_node);
+}
+atomic_store_relaxed(&control->finished, true);
+atomic_thread_fence(memory_order_release);
+}
+sem_post(control->done);
+}
+return NULL;
+}
+
+static struct lflow_reconciliation_pool *reconcile_pool = NULL;
+
+static void init_reconciliation_pool(void) {
+
+int index;
+
+if (!reconcile_pool) {
+reconcile_pool =
+

Re: [ovs-dev] [PATCH v1 2/6] dpif-netdev: add tunnel_valid flag to skip ip/ipv6 address comparison

2020-07-10 Thread Yanqin Wei
Hi Ilya,

> >
> >>> ---
> >>
> >> Hi.
> >> First of all, thanks for working on performance improvements!
> > Thanks, I saw some slides where OVS was used to compare flow scalability
> with other projects. It inspired me to optimize this code.
> >
> >>
> >> However, this doesn't look as a clean patch.
> > There are some trade-off for legacy code.
> 
> What trade-offs?
In some function, the parameter is struct flow_tnl instead of 'pkt_metadata' or 
'flow'. In these function, tunnel_valid cannot be used for valid check. So some 
function signatures need be modified.
> 
> >>
> >> Why we need both pkt_metadata_datapath_init() and pkt_metadata_init() ?
> >> Why we can't just not initialize ip_dst and use tunnel_valid flag 
> >> everywhere?
> >
> > This patch wants to reduce the scope of modification( only for fastpath),
> because performance is not critical for slow path. So tunnel dst@ is set 
> before
> leaving fast path(upcall).
> > Another reason is 'flow_tnl' member is defined in both ' pkt_metadata' and
> 'flow'.  If tunnel_valid flag is introduced into 'flow', the layout and  
> legacy flow
> API also need to be modified.
> 
> I understand that you didn't want to touch anything beside the performance
> critical parts.  However, dp_packet_/pkt_ API is already heavily overloaded
> and having a few very similar functions that can or can not be used in some
> contexts makes things even more complicated.  It's hard to read and maintain.
> And it's prone to errors in case someone will try t modify datapath code.
> I'd prefer not t have different initialization functions and only have one 
> variant.
> This will also solve the issue that every other part of code uses tunneling
> metadata without checking 'tunnel_valid' flag.  This is actually a logical
> mistake.
OK, it makes sense. I'll check all the places where flow_tnl is used and update 
v2 for tunnel_valid checking. 

> And yes, 'tunnel_valid' flag really needs a comment inside the structure
> definition.
OK, will add comment in V2.
> 
> >
> >>
> >> Current version complicates code making it less readable and prone to
> errors.
> > Do you prefer to use tunnel_valid in both fast path and slow path? I could
> send v2 for this modification.
> >
> >> Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3] ovsdb-tool: Add a db consistency check to the ovsdb-tool check-cluster command

2020-07-10 Thread Federico Paolinelli
There are some occurrences where the database ends up in an inconsistent
state. This happened in ovn-k8s and is described in [0].
Here we are adding a supported way to check that a given db is consistent,
which is less error prone than checking the logs.

Tested against both a valid db and a corrupted db attached to the
above bug [1].

[0]: https://bugzilla.redhat.com/show_bug.cgi?id=1837953#c23
[1]: https://bugzilla.redhat.com/attachment.cgi?id=1697595

Signed-off-by: Federico Paolinelli 
Suggested-by: Dumitru Ceara 
---
 ovsdb/ovsdb-tool.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/ovsdb/ovsdb-tool.c b/ovsdb/ovsdb-tool.c
index 91662cab8..016a3ba28 100644
--- a/ovsdb/ovsdb-tool.c
+++ b/ovsdb/ovsdb-tool.c
@@ -1497,6 +1497,28 @@ do_check_cluster(struct ovs_cmdl_context *ctx)
 }
 }

+/* Check for db consistency:
+ * The serverid must be in the servers list.
+ */
+
+for (struct server *s = c.servers; s < &c.servers[c.n_servers]; s++) {
+struct shash *servers_obj = json_object(s->snap->servers);
+char *server_id = xasprintf(SID_FMT, SID_ARGS(&s->header.sid));
+bool found = false;
+const struct shash_node *node;
+
+SHASH_FOR_EACH (node, servers_obj) {
+if (!strncmp(server_id, node->name, SID_LEN)) {
+found = true;
+}
+}
+if (!found) {
+ovs_fatal(0, "%s: server %s not found in server list",
+  s->filename, server_id);
+}
+free(server_id);
+}
+
 /* Clean up. */

 for (size_t i = 0; i < c.n_servers; i++) {
-- 
2.26.2
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] ovsdb-tool: Add a db consistency check to the ovsdb-tool check-cluster command

2020-07-10 Thread Federico Paolinelli
On Thu, Jul 9, 2020 at 9:13 PM Dumitru Ceara  wrote:
>
> On 7/9/20 6:04 PM, Federico Paolinelli wrote:
> > There are some occurrences where the database ends up in an inconsistent
> > state. This happened in ovn-k8s and is described in
> > https://bugzilla.redhat.com/show_bug.cgi?id=1837953#c23.
> > Here we are adding a supported way to check that a given db is consistent,
> > which is less error prone than checking the logs.
> >
> > This was only tested against a valid database, as did not manage to get a
> > corrupted one.
>
> Hi Federico,
>
> The NB DB [0] on master-3 in the BZ [1] is an example of corrupted DB.
>

Ah, right. I forgot about it, thanks for taking care!

> I tested your patch with it and I get:
>
> ovsdb-tool check-cluster /tmp/kni1-vmaster-3-ovnnb_db.db
>
> ovsdb-tool: /tmp/kni1-vmaster-3-ovnnb_db.db: server d5db not found in
> server list
>
> Tested-by: Dumitru Ceara 
>
> I do have a few more minor comments on the patch itself.
>
> Thanks,
> Dumitru
>
> [0] https://bugzilla.redhat.com/attachment.cgi?id=1697595
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1837953#c23
>
> >
> > Signed-off-by: Federico Paolinelli 
> > Suggested-by: Dumitru Ceara 
> > ---
> >  ovsdb/ovsdb-tool.c | 21 +
> >  1 file changed, 21 insertions(+)
> >
> > diff --git a/ovsdb/ovsdb-tool.c b/ovsdb/ovsdb-tool.c
> > index 91662cab8..d5ada0c2d 100644
> > --- a/ovsdb/ovsdb-tool.c
> > +++ b/ovsdb/ovsdb-tool.c
> > @@ -1497,6 +1497,27 @@ do_check_cluster(struct ovs_cmdl_context *ctx)
> >  }
> >  }
> >
> > +/* Check for db consistency:
> > + * The serverid must be in the servers list
>
> Please add a '.' at the end of the sentence in the comment.
>
> > + */
> > +
> > +for (struct server *s = c.servers; s < &c.servers[c.n_servers]; s++) {
> > +struct shash *servers_obj = json_object(s->snap->servers);
> > +char *server_id = xasprintf(SID_FMT, SID_ARGS(&s->header.sid));
> > +bool found = false;
> > +const struct shash_node *node;
>
> Please add a blank line for readability.
>
> > +SHASH_FOR_EACH (node, servers_obj) {
> > +if (!strncmp(server_id, node->name, SID_LEN)) {
> > +found = true;
> > +}
> > +}
> > +if (!found) {
> > +ovs_fatal(0, "%s: server %s not found in server list",
> > +  s->filename, server_id);
>
> This should be indented such that the arguments on the second line are
> aligned right after the '(' above.
>
> > +}
> > +free(server_id);
> > +}
> > +
> >  /* Clean up. */
> >
> >  for (size_t i = 0; i < c.n_servers; i++) {
> >
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev