Re: [ovs-dev] [PATCH] dpif-netdev: calculate per numa variance

2022-12-09 Thread Cheng Li
On Fri, Dec 09, 2022 at 03:32:06PM +, Kevin Traynor wrote:
> On 04/12/2022 11:16, Cheng Li wrote:
> >Currently, pmd_rebalance_dry_run() calculate overall variance of
> >all pmds regardless of their numa location. The overall result may
> >hide un-balance in an individual numa.
> >
> 
> Hi. Thanks, the idea looks a good one. I didn't test yet, couple of
> comments on the code below.

Hi Kevin, thanks for the review.

> 
> >Considering the following case. Numa 0 is free because VMs on numa0
> >are not sending pkts, while numa 1 is busy. Within numa 1, pmds
> >workloads are not balanced. Obviously, moving 500 kpps workloads from
> >pmd 126 to pmd 61 will make numa1 much more balance. For numa1
> >the variance improment will be almost 100%, because after rebalance
> >each pmd in numa1 holds same workload(variance ~= 0). But the overall
> >variance improvement is only about 20%, which may not tigger auto_lb.
> >
> >```
> >numa_id   core_id  kpps
> >   030 0
> >   031 0
> >   094 0
> >   095 0
> >   1   126  1500
> >   1   127  1000
> >   163  1000
> >   162   500
> >```
> >
> >As auto_lb doesn't work if any coss_numa rxq exists, it means that
> 
> That's not fully true. It can run with cross numa in a very limited
> circumstances where there is only PMD active on 1 numa.
> 
> It doesn't diminish the idea here, but just best not write that
> blanket statement as it may confuse.

Makes sense, I will reword the description.

> 
> >auto_lb only balance rxq assignment within each numa. So it makes
> >more sense to calculate variance improvemnet per numa node.
> >
> >Signed-off-by: Cheng Li 
> >---
> >  lib/dpif-netdev.c | 90 
> > +--
> >  1 file changed, 47 insertions(+), 43 deletions(-)
> >
> 
> I think at least some docs would need to be updated to say it's
> variance per NUMA.

After going through the doc directory, I found two pleases:
1. Documentation/topics/dpdk/pmd.rst
2. NEWS

> 
> >diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >index 2c08a71..6a53f13 100644
> >--- a/lib/dpif-netdev.c
> >+++ b/lib/dpif-netdev.c
> >@@ -6076,39 +6076,33 @@ rxq_scheduling(struct dp_netdev *dp)
> >  static uint64_t variance(uint64_t a[], int n);
> >  static uint64_t
> >-sched_numa_list_variance(struct sched_numa_list *numa_list)
> >+sched_numa_variance(struct sched_numa *numa)
> >  {
> >-struct sched_numa *numa;
> >  uint64_t *percent_busy = NULL;
> >-unsigned total_pmds = 0;
> >  int n_proc = 0;
> >  uint64_t var;
> >-HMAP_FOR_EACH (numa, node, _list->numas) {
> >-total_pmds += numa->n_pmds;
> >-percent_busy = xrealloc(percent_busy,
> >-total_pmds * sizeof *percent_busy);
> >+percent_busy = xmalloc(numa->n_pmds * sizeof *percent_busy);
> >-for (unsigned i = 0; i < numa->n_pmds; i++) {
> >-struct sched_pmd *sched_pmd;
> >-uint64_t total_cycles = 0;
> >+for (unsigned i = 0; i < numa->n_pmds; i++) {
> >+struct sched_pmd *sched_pmd;
> >+uint64_t total_cycles = 0;
> >-sched_pmd = >pmds[i];
> >-/* Exclude isolated PMDs from variance calculations. */
> >-if (sched_pmd->isolated == true) {
> >-continue;
> >-}
> >-/* Get the total pmd cycles for an interval. */
> >-atomic_read_relaxed(_pmd->pmd->intrvl_cycles, 
> >_cycles);
> >+sched_pmd = >pmds[i];
> >+/* Exclude isolated PMDs from variance calculations. */
> >+if (sched_pmd->isolated == true) {
> >+continue;
> >+}
> >+/* Get the total pmd cycles for an interval. */
> >+atomic_read_relaxed(_pmd->pmd->intrvl_cycles, _cycles);
> >-if (total_cycles) {
> >-/* Estimate the cycles to cover all intervals. */
> >-total_cycles *= PMD_INTERVAL_MAX;
> >-percent_busy[n_proc++] = (sched_pmd->pmd_proc_cycles * 100)
> >- / total_cycles;
> >-} else {
> >-percent_busy[n_proc++] = 0;
> >-}
> >+if (total_cycles) {
> >+/* Estimate the cycles to cover all intervals. */
> >+total_cycles *= PMD_INTERVAL_MAX;
> >+percent_busy[n_proc++] = (sched_pmd->pmd_proc_cycles * 100)
> >+/ total_cycles;
> >+} else {
> >+percent_busy[n_proc++] = 0;
> >  }
> >  }
> >  var = variance(percent_busy, n_proc);
> >@@ -6182,6 +6176,7 @@ pmd_rebalance_dry_run(struct dp_netdev *dp)
> >  struct sched_numa_list numa_list_est;
> >  bool thresh_met = false;
> >  uint64_t current_var, estimate_var;
> >+struct sched_numa *numa_cur, *numa_est;
> >  uint64_t improvement = 0;
> >  

[ovs-dev] [PATCH 6/6] rhel: Enable AF_XDP by default in Fedora builds.

2022-12-09 Thread Ilya Maximets
All supported versions of Fedora do package libxdp and libbpf, so it
makes sense to enable AF_XDP support.

Control files for debian packaging are much less flexible, so its hard
to enable AF_XDP builds while not breaking builds for version of Ubuntu
and Debian that do not package libbpf or libxdp.

Signed-off-by: Ilya Maximets 
---
 rhel/openvswitch-fedora.spec.in | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index fbfcdcf63..7676eb737 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -26,8 +26,8 @@
 %bcond_without libcapng
 # To enable DPDK support, specify '--with dpdk' when building
 %bcond_with dpdk
-# To enable AF_XDP support, specify '--with afxdp' when building
-%bcond_with afxdp
+# To disable AF_XDP support, specify '--without afxdp' when building
+%bcond_without afxdp
 
 # If there is a need to automatically enable the package after installation,
 # specify the "--with autoenable"
-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 3/6] acinclude.m4: Build with AF_XDP support by default if possible.

2022-12-09 Thread Ilya Maximets
With this change we will try to detect all the netdev-afxdp
dependencies and enable AF_XDP support by default if they are
present at the build time.

Configuration script behaves in a following way:

 - ./configure --enable-afxdp

   Will check for AF_XDP dependencies and fail if they are
   not available.

 - ./configure --disable-afxdp

   Disables checking for AF_XDP.  Build will not support
   AF_XDP even if all dependencies are installed.

 - Just ./configure or ./configure --enable-afxdp=auto

   Will check for AF_XDP dependencies.  Will print a warning
   if they are not available, but will continue without AF_XDP
   support.  If dependencies are available in a system, this
   option is equal to --enable-afxdp, except that AF_XDP will
   not be enabled for libbpf >= 0.7 if libxdp is not available,
   to avoid deprecation warnings during the build.

'--disable-afxdp' added to the debian and fedora package builds
to keep predictable behavior.

Signed-off-by: Ilya Maximets 
---
 Documentation/intro/install/afxdp.rst |  3 +-
 NEWS  |  3 +
 acinclude.m4  | 89 ++-
 debian/rules  |  8 ++-
 rhel/openvswitch-fedora.spec.in   |  2 +
 5 files changed, 72 insertions(+), 33 deletions(-)

diff --git a/Documentation/intro/install/afxdp.rst 
b/Documentation/intro/install/afxdp.rst
index bfef49860..84f8aa5d7 100644
--- a/Documentation/intro/install/afxdp.rst
+++ b/Documentation/intro/install/afxdp.rst
@@ -30,8 +30,7 @@ This document describes how to build and install Open vSwitch 
using
 AF_XDP netdev.
 
 .. warning::
-  The AF_XDP support of Open vSwitch is considered 'experimental',
-  and it is not compiled in by default.
+  The AF_XDP support of Open vSwitch is considered 'experimental'.
 
 
 Introduction
diff --git a/NEWS b/NEWS
index 5d39c7d27..d2bbae591 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,9 @@ Post-v3.0.0
 
- AF_XDP:
  * Added support for building with libxdp and libbpf >= 0.7.
+ * Support for AF_XDP is now enabled by default if all dependencies are
+   available at the build time.  Use --disable-afxdp to disable.
+   Use --enable-afxdp to fail the build if dependencies are not present.
- ovs-appctl:
  * "ovs-appctl ofproto/trace" command can now display port names with the
"--names" option.
diff --git a/acinclude.m4 b/acinclude.m4
index aed01c967..8411c0e6c 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -253,39 +253,72 @@ dnl OVS_CHECK_LINUX_AF_XDP
 dnl
 dnl Check both Linux kernel AF_XDP and libbpf/libxdp support
 AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
-  AC_ARG_ENABLE([afxdp],
-[AS_HELP_STRING([--enable-afxdp], [Enable AF-XDP support])],
-[], [enable_afxdp=no])
+  AC_ARG_ENABLE(
+[afxdp],
+[AS_HELP_STRING([--disable-afxdp], [Disable AF-XDP support])],
+[case "${enableval}" in
+   (yes | no | auto) ;;
+   (*) AC_MSG_ERROR([bad value ${enableval} for --enable-afxdp]) ;;
+ esac],
+[enable_afxdp=auto])
+
   AC_MSG_CHECKING([whether AF_XDP is enabled])
-  if test "$enable_afxdp" != yes; then
+  if test "$enable_afxdp" == no; then
 AC_MSG_RESULT([no])
 AF_XDP_ENABLE=false
   else
-AC_MSG_RESULT([yes])
+AC_MSG_RESULT([$enable_afxdp])
 AF_XDP_ENABLE=true
-
-AC_CHECK_HEADER([bpf/libbpf.h], [],
-  [AC_MSG_ERROR([unable to find bpf/libbpf.h for AF_XDP support])])
-
-AC_CHECK_HEADER([linux/if_xdp.h], [],
-  [AC_MSG_ERROR([unable to find linux/if_xdp.h for AF_XDP support])])
-
-AC_CHECK_HEADER([xdp/xsk.h],
-  AC_DEFINE([HAVE_LIBXDP], [1], [xsk.h is supplied with libxdp]),
-  AC_CHECK_HEADER([bpf/xsk.h], [],
-[AC_MSG_ERROR([unable to find xsk.h for AF_XDP support])]))
-
-AC_CHECK_FUNCS([pthread_spin_lock], [],
-  [AC_MSG_ERROR([unable to find pthread_spin_lock for AF_XDP support])])
-
-OVS_FIND_DEPENDENCY([numa_alloc_onnode], [numa], [libnuma])
-OVS_FIND_DEPENDENCY([libbpf_strerror], [bpf], [libbpf])
-AC_SEARCH_LIBS([libxdp_strerror], [xdp])
-
-AC_CHECK_FUNCS([bpf_xdp_query_id bpf_xdp_detach])
-
-AC_DEFINE([HAVE_AF_XDP], [1],
-  [Define to 1 if AF_XDP support is available and enabled.])
+failed_dep=none
+dnl Saving libs to restore in case we will end up not building with AF_XDP.
+save_LIBS=$LIBS
+
+AC_CHECK_HEADER([bpf/libbpf.h], [], [failed_dep="bpf/libbpf.h"])
+
+if test "$failed_dep" = none; then
+  AC_CHECK_HEADER([linux/if_xdp.h], [], [failed_dep="linux/if_xdp.h"])
+fi
+
+if test "$failed_dep" = none; then
+  AC_CHECK_HEADER([xdp/xsk.h],
+AC_DEFINE([HAVE_LIBXDP], [1], [xsk.h is supplied with libxdp]),
+AC_CHECK_HEADER([bpf/xsk.h], [], [failed_dep="xsk.h"]))
+fi
+
+if test "$failed_dep" = none; then
+  AC_CHECK_FUNCS([pthread_spin_lock], [], [failed_dep="pthread_spin_lock"])
+fi
+
+if test "$failed_dep" = none; then
+  

[ovs-dev] [PATCH 5/6] Documentation/afxdp: Use packaged libbpf/libxdp for the build.

2022-12-09 Thread Ilya Maximets
Necessary bits was removed from the kernel's libbpf in 6.0 release,
so the instructions on how to build libbpf from kernel sources are
now incorrect.  Suggest to use libbpf and libxdp packaged by
distributions instead.

Signed-off-by: Ilya Maximets 
---
 Documentation/intro/install/afxdp.rst | 40 +++
 1 file changed, 10 insertions(+), 30 deletions(-)

diff --git a/Documentation/intro/install/afxdp.rst 
b/Documentation/intro/install/afxdp.rst
index 84f8aa5d7..51c24bf5b 100644
--- a/Documentation/intro/install/afxdp.rst
+++ b/Documentation/intro/install/afxdp.rst
@@ -87,7 +87,7 @@ Build requirements
 In addition to the requirements described in :doc:`general`, building Open
 vSwitch with AF_XDP will require the following:
 
-- libbpf from kernel source tree (kernel 5.0.0 or later)
+- ``libbpf`` and ``libxdp`` (if version of ``libbpf`` if higher than ``0.6``).
 
 - Linux kernel XDP support, with the following options (required)
 
@@ -124,40 +124,20 @@ vSwitch with AF_XDP will require the following:
 Installing
 --
 For OVS to use AF_XDP netdev, it has to be configured with LIBBPF support.
-First, clone a recent version of Linux bpf-next tree::
 
-  git clone git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
+First, install ``libbpf`` and ``libxdp``.  For example, on Fedora these
+libraries along with development headers can be obtained by installing
+``libbpf-devel`` and ``libxdp-devel`` packages.  For Ubuntu that will be
+``libbpf-dev`` package with additional ``libxdp-dev`` on Ubuntu 22.10
+or later.
 
-Second, go into the Linux source directory and build libbpf in the tools
-directory::
-
-  cd bpf-next/
-  cd tools/lib/bpf/
-  make && make install
-  make install_headers
-
-.. note::
-   Make sure xsk.h and bpf.h are installed in system's library path,
-   e.g. /usr/local/include/bpf/ or /usr/include/bpf/
-
-Make sure the libbpf.so is installed correctly::
-
-  ldconfig
-  ldconfig -p | grep libbpf
-
-.. note::
-   Check /etc/ld.so.conf if libbpf is installed but can not be found by
-   ldconfig.
-
-Third, ensure the standard OVS requirements are installed and
+Next, ensure the standard OVS requirements are installed and
 bootstrap/configure the package::
 
   ./boot.sh && ./configure --enable-afxdp
 
-.. note::
-   If you encounter "WARNING: bpf/libbpf.h: present but cannot be compiled",
-   check the Linux headers are in line with libbpf. For example, in Ubuntu,
-   check the installed linux-headers* and linux-libc-dev* dpkg.
+``--enable-afxdp`` here is optional, but it will ensure that all dependencies
+are available at the build time.
 
 Finally, build and install OVS::
 
@@ -181,7 +161,7 @@ If a test case fails, check the log at::
 
 Setup AF_XDP netdev
 ---
-Before running OVS with AF_XDP, make sure the libbpf, libelf, and libnuma are
+Before running OVS with AF_XDP, make sure the libbpf and libnuma are
 set-up right::
 
   ldd vswitchd/ovs-vswitchd
-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 4/6] github: Test AF_XDP build using libbpf instead of kernel sources.

2022-12-09 Thread Ilya Maximets
AF_XDP bits was removed from kernel's libbpf in 6.0.  libbpf
and libxdp are now primary way to build AF_XDP applications.
Most of modern distributions are already packaging some version
of libbpf, so it's better to test building with it instead
of building old unsupported kernel tree.

All the jobs will now try to build with AF_XDP support.  One
AF_XDP-specific job kept to fail explicitly if something is
wrong in the configuration process, others will just skip
building AF_XDP support in case of missing dependencies.

Ubuntu started packaging libxdp only in 22.10, so not using
it for now.

Signed-off-by: Ilya Maximets 
---
 .ci/linux-build.sh   | 22 ++
 .github/workflows/build-and-test.yml |  9 +++--
 2 files changed, 5 insertions(+), 26 deletions(-)

diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index 1f8896423..2904aadd2 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -76,22 +76,6 @@ function install_kernel()
 make net/bridge/
 fi
 
-if [ "$AFXDP" ]; then
-sudo make headers_install INSTALL_HDR_PATH=/usr
-pushd tools/lib/bpf/
-# Bulding with gcc because there are some issues in make files
-# that breaks building libbpf with clang on Travis.
-CC=gcc sudo make install
-CC=gcc sudo make install_headers
-sudo ldconfig
-popd
-# The Linux kernel defines __always_inline in stddef.h (283d7573), and
-# sys/cdefs.h tries to re-define it.  Older libc-dev package in xenial
-# doesn't have a fix for this issue.  Applying it manually.
-sudo sed -i '/^# define __always_inline .*/i # undef __always_inline' \
-/usr/include/x86_64-linux-gnu/sys/cdefs.h || true
-EXTRA_OPTS="${EXTRA_OPTS} --enable-afxdp"
-fi
 popd
 }
 
@@ -247,10 +231,8 @@ elif [ "$M32" ]; then
 export CC="$CC -m32"
 elif [ "$TRAVIS_ARCH" != "aarch64" ]; then
 EXTRA_OPTS="$EXTRA_OPTS --enable-sparse"
-if [ "$AFXDP" ]; then
-# netdev-afxdp uses memset for 64M for umem initialization.
-SPARSE_FLAGS="${SPARSE_FLAGS} -Wno-memcpy-max-count"
-fi
+# netdev-afxdp uses memset for 64M for umem initialization.
+SPARSE_FLAGS="${SPARSE_FLAGS} -Wno-memcpy-max-count"
 CFLAGS_FOR_OVS="${CFLAGS_FOR_OVS} ${SPARSE_FLAGS}"
 fi
 
diff --git a/.github/workflows/build-and-test.yml 
b/.github/workflows/build-and-test.yml
index e08d7b1ba..5f28493c7 100644
--- a/.github/workflows/build-and-test.yml
+++ b/.github/workflows/build-and-test.yml
@@ -8,8 +8,7 @@ jobs:
   dependencies: |
 automake libtool gcc bc libjemalloc2 libjemalloc-dev\
 libssl-dev llvm-dev libelf-dev libnuma-dev libpcap-dev  \
-ninja-build selinux-policy-dev
-  AFXDP:   ${{ matrix.afxdp }}
+ninja-build selinux-policy-dev libbpf-dev
   ASAN:${{ matrix.asan }}
   UBSAN:   ${{ matrix.ubsan }}
   CC:  ${{ matrix.compiler }}
@@ -65,11 +64,9 @@ jobs:
 libs: -ljemalloc
 
   - compiler: gcc
-afxdp:afxdp
-kernel:   5.3
+opts: --enable-afxdp
   - compiler: clang
-afxdp:afxdp
-kernel:   5.3
+opts: --enable-afxdp
 
   - compiler: gcc
 dpdk: dpdk
-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 2/6] netdev-afxdp: Allow building with libxdp and newer libbpf.

2022-12-09 Thread Ilya Maximets
AF_XDP functions was deprecated in libbpf 0.7 and moved to libxdp.
Functions bpf_get/set_link_xdp_id() was deprecated in libbpf 0.8
and replaced with bpf_xdp_query_id() and bpf_xdp_attach/detach().

Updating configuration and source code to accommodate above changes
and allow building OVS with AF_XDP support on newer systems:

 - Checking availability of the libxdp in a system by looking
   for a library providing libxdp_strerror().

 - Checking for xsk.h header provided by libxdp-dev[el] first,
   fall back to xsk.h from libbpf if not found.

 - Check for the NEED_WAKEUP feature replaced with direct checking
   in the source code if XDP_USE_NEED_WAKEUP is defined.

 - Checking availability of bpf_xdp_query_id and bpf_xdp_detach
   and using them instead of deprecated APIs.  Fall back to old
   functions if not found.

 - Dropped LIBBPF_LDADD variable as it makes library and function
   detection much harder without providing any actual benefits.
   AC_SEARCH_LIBS is used instead and it allows use of AC_CHECK_FUNCS.

 - Header includes moved around to files where they are actually used.

 - Removed libelf dependency as it is not really used.

With these changes it should be possible to build OVS with either:

 - libbpf built from the kernel sources (5.19 or older).
 - libbpf < 0.7 provided in distributions.
 - libxdp and libbpf >= 0.7 provided in newer distributions.

libxdp added as a build dependency for Fedora build since all
supported versions of Fedora are packaging this library.

Signed-off-by: Ilya Maximets 
---
 NEWS|  2 ++
 acinclude.m4| 21 +-
 lib/automake.mk |  1 -
 lib/libopenvswitch.pc.in|  2 +-
 lib/netdev-afxdp-pool.c |  2 ++
 lib/netdev-afxdp-pool.h |  5 -
 lib/netdev-afxdp.c  | 38 ++---
 rhel/openvswitch-fedora.spec.in |  2 +-
 8 files changed, 46 insertions(+), 27 deletions(-)

diff --git a/NEWS b/NEWS
index 265375e1c..5d39c7d27 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,7 @@
 Post-v3.0.0
 
+   - AF_XDP:
+ * Added support for building with libxdp and libbpf >= 0.7.
- ovs-appctl:
  * "ovs-appctl ofproto/trace" command can now display port names with the
"--names" option.
diff --git a/acinclude.m4 b/acinclude.m4
index aa9af5506..aed01c967 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -251,7 +251,7 @@ AC_DEFUN([OVS_FIND_DEPENDENCY], [
 
 dnl OVS_CHECK_LINUX_AF_XDP
 dnl
-dnl Check both Linux kernel AF_XDP and libbpf support
+dnl Check both Linux kernel AF_XDP and libbpf/libxdp support
 AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
   AC_ARG_ENABLE([afxdp],
 [AS_HELP_STRING([--enable-afxdp], [Enable AF-XDP support])],
@@ -270,23 +270,22 @@ AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
 AC_CHECK_HEADER([linux/if_xdp.h], [],
   [AC_MSG_ERROR([unable to find linux/if_xdp.h for AF_XDP support])])
 
-AC_CHECK_HEADER([bpf/xsk.h], [],
-  [AC_MSG_ERROR([unable to find bpf/xsk.h for AF_XDP support])])
+AC_CHECK_HEADER([xdp/xsk.h],
+  AC_DEFINE([HAVE_LIBXDP], [1], [xsk.h is supplied with libxdp]),
+  AC_CHECK_HEADER([bpf/xsk.h], [],
+[AC_MSG_ERROR([unable to find xsk.h for AF_XDP support])]))
 
 AC_CHECK_FUNCS([pthread_spin_lock], [],
   [AC_MSG_ERROR([unable to find pthread_spin_lock for AF_XDP support])])
 
 OVS_FIND_DEPENDENCY([numa_alloc_onnode], [numa], [libnuma])
+OVS_FIND_DEPENDENCY([libbpf_strerror], [bpf], [libbpf])
+AC_SEARCH_LIBS([libxdp_strerror], [xdp])
+
+AC_CHECK_FUNCS([bpf_xdp_query_id bpf_xdp_detach])
 
 AC_DEFINE([HAVE_AF_XDP], [1],
   [Define to 1 if AF_XDP support is available and enabled.])
-LIBBPF_LDADD=" -lbpf -lelf"
-AC_SUBST([LIBBPF_LDADD])
-
-AC_CHECK_DECL([xsk_ring_prod__needs_wakeup], [
-  AC_DEFINE([HAVE_XDP_NEED_WAKEUP], [1],
-[XDP need wakeup support detected in xsk.h.])
-], [], [[#include ]])
   fi
   AM_CONDITIONAL([HAVE_AF_XDP], test "$AF_XDP_ENABLE" = true)
 ])
@@ -357,7 +356,7 @@ AC_DEFUN([OVS_CHECK_DPDK], [
 ], [], [[#include ]])
 
 AC_CHECK_DECL([RTE_NET_AF_XDP], [
-  LIBBPF_LDADD="-lbpf"
+  OVS_FIND_DEPENDENCY([libbpf_strerror], [bpf], [libbpf])
 ], [], [[#include ]])
 
 AC_CHECK_DECL([RTE_LIBRTE_VHOST_NUMA], [
diff --git a/lib/automake.mk b/lib/automake.mk
index a0fabe38f..61bdc308f 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -9,7 +9,6 @@ lib_LTLIBRARIES += lib/libopenvswitch.la
 
 lib_libopenvswitch_la_LIBADD = $(SSL_LIBS)
 lib_libopenvswitch_la_LIBADD += $(CAPNG_LDADD)
-lib_libopenvswitch_la_LIBADD += $(LIBBPF_LDADD)
 
 
 if WIN32
diff --git a/lib/libopenvswitch.pc.in b/lib/libopenvswitch.pc.in
index 44fbb1f9f..a5f4d3947 100644
--- a/lib/libopenvswitch.pc.in
+++ b/lib/libopenvswitch.pc.in
@@ -7,5 +7,5 @@ Name: libopenvswitch
 Description: Open vSwitch library
 Version: @VERSION@
 Libs: -L${libdir} -lopenvswitch
-Libs.private: @LIBS@ @SSL_LIBS@ 

[ovs-dev] [PATCH 0/6] AF_XDP build fixes and enhancements.

2022-12-09 Thread Ilya Maximets
This patch set allows OVS to build with libxdp and newer libbpf.
It also enables AF_XDP support by default as long as all the
dependencies are available.

Ilya Maximets (6):
  ci: Fix overriding OPTS provided from the yml.
  netdev-afxdp: Allow building with libxdp and newer libbpf.
  acinclude.m4: Build with AF_XDP support by default if possible.
  github: Test AF_XDP build using libbpf instead of kernel sources.
  Documentation/afxdp: Use packaged libbpf/libxdp for the build.
  rhel: Enable AF_XDP by default in Fedora builds.

 .ci/linux-build.sh| 24 +--
 .github/workflows/build-and-test.yml  |  9 +--
 Documentation/intro/install/afxdp.rst | 43 
 NEWS  |  5 ++
 acinclude.m4  | 94 ++-
 debian/rules  |  8 ++-
 lib/automake.mk   |  1 -
 lib/libopenvswitch.pc.in  |  2 +-
 lib/netdev-afxdp-pool.c   |  2 +
 lib/netdev-afxdp-pool.h   |  5 --
 lib/netdev-afxdp.c| 38 ---
 rhel/openvswitch-fedora.spec.in   |  8 ++-
 12 files changed, 128 insertions(+), 111 deletions(-)

-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 1/6] ci: Fix overriding OPTS provided from the yml.

2022-12-09 Thread Ilya Maximets
For GCC builds we're overriding --disbale-ssl or --enable-shared
options set up in the GHA yml file.

Fix that by adding to EXTRA_OPTS instead.

Signed-off-by: Ilya Maximets 
---
 .ci/linux-build.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index 485109672..1f8896423 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -246,7 +246,7 @@ elif [ "$M32" ]; then
 # difference on 'configure' and 'make' stages.
 export CC="$CC -m32"
 elif [ "$TRAVIS_ARCH" != "aarch64" ]; then
-OPTS="--enable-sparse"
+EXTRA_OPTS="$EXTRA_OPTS --enable-sparse"
 if [ "$AFXDP" ]; then
 # netdev-afxdp uses memset for 64M for umem initialization.
 SPARSE_FLAGS="${SPARSE_FLAGS} -Wno-memcpy-max-count"
-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovs-dev v7 1/3] ofproto-dpif-upcall: fix push_dp_ops

2022-12-09 Thread Peng He
Patch v5 has statistics issues.

In order to solve this issue, we had a discussion.

below is the quote of the email.

”
After a second thought, I think maybe keeping INCONSISTENT just for the
modify error is a better option.

With current patch:
1.
the modify error case:
OPERATIONAL -> INCONSISTENT ->  EVICTING -> EVICTED
2.
the delete error case:
EVICTING -> EVICTED

Change both to INCONSISTENT:

the modify error case:
did not change.

the delete error case:
EVICTING -> INCONSISTENT -> EVICTED?

“

And we agree to take the second solution.



Eelco Chaudron  于2022年12月8日周四 18:54写道:

>
>
> On 27 Nov 2022, at 8:28, Peng He wrote:
>
> > push_dp_ops only handles delete ops errors but ignores the modify
> > ops results. It's better to handle all the dp operation errors in
> > a consistent way.
> >
> > We observe in the production environment that sometimes a megaflow
> > with wrong actions keep staying in datapath. The coverage command shows
> > revalidators have dumped several times, however the correct
> > actions are not set. This implies that the ukey's action does not
> > equal to the meagaflow's, i.e. revalidators think the underlying
> > megaflow's actions are correct however they are not.
> >
> > We also check the megaflow using the ofproto/trace command, and the
> > actions are not matched with the ones in the actual magaflow. By
> > performing a revalidator/purge command, the right actions are set.
> >
> > This patch prevents the inconsistency by considering modify failure
> > in revalidators.
> >
> > To note, we cannot perform two state transitions and change ukey_state
> > into UKEY_EVICTED directly here, because, if we do so, the
> > sweep will remove the ukey alone and leave dp flow alive. Later, the
> > dump will retrieve the dp flow and might even recover it. This will
> > contribute the stats of this dp flow twice.
> >
> > Signed-off-by: Peng He 
> > ---
> >  ofproto/ofproto-dpif-upcall.c | 34 +++---
> >  1 file changed, 23 insertions(+), 11 deletions(-)
> >
> > diff --git a/ofproto/ofproto-dpif-upcall.c
> b/ofproto/ofproto-dpif-upcall.c
> > index 7ad728adf..c2cefbeb8 100644
> > --- a/ofproto/ofproto-dpif-upcall.c
> > +++ b/ofproto/ofproto-dpif-upcall.c
> > @@ -2416,26 +2416,30 @@ push_dp_ops(struct udpif *udpif, struct ukey_op
> *ops, size_t n_ops)
> >
> >  for (i = 0; i < n_ops; i++) {
> >  struct ukey_op *op = [i];
> > -struct dpif_flow_stats *push, *stats, push_buf;
> > -
> > -stats = op->dop.flow_del.stats;
> > -push = _buf;
> > -
> > -if (op->dop.type != DPIF_OP_FLOW_DEL) {
> > -/* Only deleted flows need their stats pushed. */
> > -continue;
> > -}
> >
> >  if (op->dop.error) {
> > -/* flow_del error, 'stats' is unusable. */
> >  if (op->ukey) {
> >  ovs_mutex_lock(>ukey->mutex);
> > -transition_ukey(op->ukey, UKEY_EVICTED);
> > +if (op->dop.type == DPIF_OP_FLOW_DEL) {
> > +transition_ukey(op->ukey, UKEY_EVICTED);
> > +} else {
> > +transition_ukey(op->ukey, UKEY_EVICTING);
> > +}
> >  ovs_mutex_unlock(>ukey->mutex);
> >  }
> >  continue;
> >  }
> >
> > +if (op->dop.type != DPIF_OP_FLOW_DEL) {
> > +/* Only deleted flows need their stats pushed. */
> > +continue;
> > +}
> > +
> > +struct dpif_flow_stats *push, *stats, push_buf;
> > +
> > +stats = op->dop.flow_del.stats;
> > +push = _buf;
> > +
> >  if (op->ukey) {
> >  ovs_mutex_lock(>ukey->mutex);
> >  transition_ukey(op->ukey, UKEY_EVICTED);
> > @@ -2848,6 +2852,14 @@ revalidator_sweep__(struct revalidator
> *revalidator, bool purge)
> >  continue;
> >  }
> >  ukey_state = ukey->state;
> > +
> > +if (ukey_state == UKEY_EVICTING) {
> > +/* previous modify operation fails on this ukey and
> ukey_state
> > + * is set to UKEY_EVICTING, issue a delete operation on
> this
> > + * ukey.
> > + */
> > +delete_op_init(udpif, [n_ops++], ukey);
>
> How can we be sure this state here is only for failing to modify
> operations?
> This state is set in other places in the code also.
>

because revalidate_sweep is called after revaldiate. all the datapath ops
have already been
pushed, and if there are ukeys at UKEY_EVICTING, it's because previously
datapath ops
have errors. and if del ops fail, ukey is deleted. so only ukeys at
UKEY_EVICTING must
come from modify fail.


> Maybe we should keep the INCONSISTENT state to avoid this (see v5)?
>
> > +}
> >  if (ukey_state == UKEY_OPERATIONAL
> >  || (ukey_state == UKEY_VISIBLE && purge)) {
> >  struct recirc_refs recircs =
> 

Re: [ovs-dev] [PATCH ovn v2 3/4] ic: prevent advertising/learning multiple same routes

2022-12-09 Thread Vladislav Odintsov
Hi Dumitru,

please see answers inline.

Regards,
Vladislav Odintsov

> On 9 Dec 2022, at 17:37, Dumitru Ceara  wrote:
> 
> On 12/6/22 11:20, Vladislav Odintsov wrote:
>> Signed-off-by: Vladislav Odintsov > >
>> ---
> 
> Hi Vladislav,
> 
>> ic/ovn-ic.c | 83 +++--
>> ovn-ic-sb.ovsschema |  6 ++--
>> tests/ovn-ic.at | 60 
>> 3 files changed, 114 insertions(+), 35 deletions(-)
>> 
>> diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
>> index 9e2369fef..64307d8c4 100644
>> --- a/ic/ovn-ic.c
>> +++ b/ic/ovn-ic.c
>> @@ -884,10 +884,12 @@ ic_route_hash(const struct in6_addr *prefix, unsigned 
>> int plen,
>> static struct ic_route_info *
>> ic_route_find(struct hmap *routes, const struct in6_addr *prefix,
>>   unsigned int plen, const struct in6_addr *nexthop,
>> -  const char *origin, char *route_table)
>> +  const char *origin, const char *route_table, uint32_t hash)
>> {
>> struct ic_route_info *r;
>> -uint32_t hash = ic_route_hash(prefix, plen, nexthop, origin, 
>> route_table);
>> +if (!hash) {
>> +hash = ic_route_hash(prefix, plen, nexthop, origin, route_table);
>> +}
>> HMAP_FOR_EACH_WITH_HASH (r, node, hash, routes) {
>> if (ipv6_addr_equals(>prefix, prefix) &&
>> r->plen == plen &&
>> @@ -945,8 +947,8 @@ add_to_routes_learned(struct hmap *routes_learned,
>> }
>> const char *origin = smap_get_def(_route->options, "origin", "");
>> if (ic_route_find(routes_learned, , plen, , origin,
>> -  nb_route->route_table)) {
>> -/* Route is already added to learned in previous iteration. */
>> +  nb_route->route_table, 0)) {
>> +/* Route was added to learned on previous iteration. */
>> return true;
>> }
>> 
>> @@ -1093,10 +1095,42 @@ route_need_advertise(const char *policy,
>> }
>> 
>> static void
>> -add_to_routes_ad(struct hmap *routes_ad,
>> - const struct nbrec_logical_router_static_route *nb_route,
>> - const struct lport_addresses *nexthop_addresses,
>> - const struct smap *nb_options, const char *route_table)
>> +add_to_routes_ad(struct hmap *routes_ad, const struct in6_addr prefix,
>> + unsigned int plen, const struct in6_addr nexthop,
>> + const char *origin, const char *route_table,
>> + const struct nbrec_logical_router_port *nb_lrp,
>> + const struct nbrec_logical_router_static_route *nb_route)
>> +{
>> +if (route_table == NULL) {
>> +route_table = "";
>> +}
>> +
>> +uint hash = ic_route_hash(, plen, , origin, route_table);
>> +
>> +if (!ic_route_find(routes_ad, , plen, , origin, 
>> route_table,
>> +   hash)) {
>> +struct ic_route_info *ic_route = xzalloc(sizeof *ic_route);
>> +ic_route->prefix = prefix;
>> +ic_route->plen = plen;
>> +ic_route->nexthop = nexthop;
>> +ic_route->nb_route = nb_route;
>> +ic_route->origin = origin;
>> +ic_route->route_table = route_table;
>> +ic_route->nb_lrp = nb_lrp;
>> +hmap_insert(routes_ad, _route->node, hash);
>> +} else {
>> +VLOG_WARN("Duplicate route advertisement was suppressed! NB route "
>> +  "uuid: "UUID_FMT,
>> +  UUID_ARGS(_route->header_.uuid));
> 
> I think this should be rate limited.

Agree, will fix this in v3.

> 
>> +}
>> +}
>> +
>> +static void
>> +add_static_to_routes_ad(
>> +struct hmap *routes_ad,
>> +const struct nbrec_logical_router_static_route *nb_route,
>> +const struct lport_addresses *nexthop_addresses,
>> +const struct smap *nb_options, const char *route_table)
>> {
>> if (strcmp(route_table, nb_route->route_table)) {
>> if (VLOG_IS_DBG_ENABLED()) {
>> @@ -1145,16 +1179,8 @@ add_to_routes_ad(struct hmap *routes_ad,
>> ds_destroy();
>> }
>> 
>> -struct ic_route_info *ic_route = xzalloc(sizeof *ic_route);
>> -ic_route->prefix = prefix;
>> -ic_route->plen = plen;
>> -ic_route->nexthop = nexthop;
>> -ic_route->nb_route = nb_route;
>> -ic_route->origin = ROUTE_ORIGIN_STATIC;
>> -ic_route->route_table = nb_route->route_table;
>> -hmap_insert(routes_ad, _route->node,
>> -ic_route_hash(, plen, , ROUTE_ORIGIN_STATIC,
>> -  nb_route->route_table));
>> +add_to_routes_ad(routes_ad, prefix, plen, nexthop, ROUTE_ORIGIN_STATIC,
>> + nb_route->route_table, NULL, nb_route);
>> }
>> 
>> static void
>> @@ -1198,18 +1224,9 @@ add_network_to_routes_ad(struct hmap *routes_ad, 
>> const char *network,
>> ds_destroy();
>> }
>> 
>> -struct ic_route_info *ic_route = xzalloc(sizeof *ic_route);
>> -ic_route->prefix = prefix;
>> -ic_route->plen = plen;
>> -ic_route->nexthop 

Re: [ovs-dev] [PATCH v4 ovn] binding: add the capability to apply QoS for lsp

2022-12-09 Thread Numan Siddique
On Fri, Dec 9, 2022 at 9:43 AM Lorenzo Bianconi
 wrote:
>
> > On Sun, Dec 4, 2022 at 5:06 PM Lorenzo Bianconi
> >  wrote:
> > >
> > > Introduce the capability to apply QoS rules for logical switch ports
> > > claimed by ovn-controller. Rely on shash instead of sset for
> > > egress_ifaces.
> > >
> > > Acked-by: Mark Michelson 
> > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2129742
> > > Signed-off-by: Lorenzo Bianconi 
> >
> > Hi Lorenzo,
>
> Hi Numan,
>
> thx for the review :)
>
> >
> > Thanks for addressing the comments.
> >
> > I tested this version and compared with the present main.
> >
> > I see a few differences.
> >
> > With the main version,  when I configure qos params to a localnet
> > logical port,  ovn-controller creates a qdisc on the tunnel interface
> > which I guess is expected.
> > But I'm still not sure why we need to configure qdiscs on the tunnel
> > interface.  But that's a story.  Perhaps I need to see the original
> > QoS commits and understand why it was added.
> >
> > With your patch,  I don't see that happening.  Perhaps there is a bug
> > in setup_qos() function now that 'shash' is used to store the egress
> > ifaces with the logical port and
> > there is no logical port for the tunnel interfaces.
>
> uhm, I can look into this. Do you think we should keep this behaviour?

Frankly I don't know.  We need to find out why qos was configured in
the first place.

I do agree with Ilya's comment that it is odd to configure qos on the
tunnel interface
and that could impact the other traffic.


>
> >
> > Regarding the option - "ovn-egress-iface".  I suppose the expectation
> > from the CMS is to set the qos parameters in the logical port and also
> > set this option in the
> > ovs interface right ?  I don't really see a need for this.  I think if
> > CMS configures the qos parameters in the logical port options,
> > ovn-controller should configure the qos.
> > I think otherwise this would complicate the CMS implementation.  Thoughts ?
>
> ovn-egress-iface is mandatory since it is used in setup_qos() to look for the
> proper netdevice pointer to configure ovs qdisc (please note it is used to
> identify ovs interfaces and not ovn ones).
> I think it can be doable (not sure about the complexity) to avoid it for
> logical switch ports (since we have iface-id in them external_ids column of
> ovs interface table) but I do not think we can avoid to set "ovn-egress-iface"
> for localnet port since afaik there is no direct link between ovn localnet 
> port
> and ovs interface used to connect the underlay network, right? If so I guess
> it is better to keep it for both localnet and lsp ports. Do you agree?

I think it's ok to expect CMS to set "ovn-egress-iface" for the
physical nic connected to the
provider bridge.  But I don't think we should do the same for the
normal logical port VIFs.



>
> >
> >
> > Please see a few comments below
> >
> > > ---
> > > Changes since v3:
> > > - fix typo in new system-ovn test
> > > Changes since v2:
> > > - fix qos configuration restarting ovn-controller
> > > Changes since v1:
> > > - improve ovs interface lookup
> > > - improve system-tests
> > > ---
> > >  controller/binding.c| 155 ++--
> > >  controller/binding.h|   5 +-
> > >  controller/ovn-controller.c |  15 ++--
> > >  tests/system-ovn.at |  48 +++
> > >  4 files changed, 156 insertions(+), 67 deletions(-)
> > >
> > > diff --git a/controller/binding.c b/controller/binding.c
> > > index 5df62baef..53520263c 100644
> > > --- a/controller/binding.c
> > > +++ b/controller/binding.c
> > > @@ -115,6 +115,7 @@ struct qos_queue {
> > >  uint32_t min_rate;
> > >  uint32_t max_rate;
> > >  uint32_t burst;
> > > +char *port_name;
> > >  };
> > >
> > >  void
> > > @@ -147,25 +148,50 @@ static void update_lport_tracking(const struct 
> > > sbrec_port_binding *pb,
> > >struct hmap *tracked_dp_bindings,
> > >bool claimed);
> > >
> > > +static bool is_lport_vif(const struct sbrec_port_binding *pb);
> > > +
> > > +static struct qos_queue *
> > > +get_qos_map_entry(struct hmap *queue_map, const char *name)
> > > +{
> > > +struct qos_queue *qos_node;
> > > +HMAP_FOR_EACH (qos_node, node, queue_map) {
> > > +if (!strcmp(qos_node->port_name, name)) {
> > > +return qos_node;
> > > +}
> > > +}
> > > +
> > > +return NULL;
> > > +}
> > > +
> > >  static void
> > > -get_qos_params(const struct sbrec_port_binding *pb, struct hmap 
> > > *queue_map)
> > > +update_qos_params(const struct sbrec_port_binding *pb, struct hmap 
> > > *queue_map)
> > >  {
> > >  uint32_t min_rate = smap_get_int(>options, "qos_min_rate", 0);
> > >  uint32_t max_rate = smap_get_int(>options, "qos_max_rate", 0);
> > >  uint32_t burst = smap_get_int(>options, "qos_burst", 0);
> > >  uint32_t queue_id = smap_get_int(>options, 

Re: [ovs-dev] [PATCH] dpif-netdev: calculate per numa variance

2022-12-09 Thread Kevin Traynor

On 04/12/2022 11:16, Cheng Li wrote:

Currently, pmd_rebalance_dry_run() calculate overall variance of
all pmds regardless of their numa location. The overall result may
hide un-balance in an individual numa.



Hi. Thanks, the idea looks a good one. I didn't test yet, couple of 
comments on the code below.



Considering the following case. Numa 0 is free because VMs on numa0
are not sending pkts, while numa 1 is busy. Within numa 1, pmds
workloads are not balanced. Obviously, moving 500 kpps workloads from
pmd 126 to pmd 61 will make numa1 much more balance. For numa1
the variance improment will be almost 100%, because after rebalance
each pmd in numa1 holds same workload(variance ~= 0). But the overall
variance improvement is only about 20%, which may not tigger auto_lb.

```
numa_id   core_id  kpps
   030 0
   031 0
   094 0
   095 0
   1   126  1500
   1   127  1000
   163  1000
   162   500
```

As auto_lb doesn't work if any coss_numa rxq exists, it means that


That's not fully true. It can run with cross numa in a very limited 
circumstances where there is only PMD active on 1 numa.


It doesn't diminish the idea here, but just best not write that blanket 
statement as it may confuse.



auto_lb only balance rxq assignment within each numa. So it makes
more sense to calculate variance improvemnet per numa node.

Signed-off-by: Cheng Li 
---
  lib/dpif-netdev.c | 90 +--
  1 file changed, 47 insertions(+), 43 deletions(-)



I think at least some docs would need to be updated to say it's variance 
per NUMA.



diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 2c08a71..6a53f13 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -6076,39 +6076,33 @@ rxq_scheduling(struct dp_netdev *dp)
  static uint64_t variance(uint64_t a[], int n);
  
  static uint64_t

-sched_numa_list_variance(struct sched_numa_list *numa_list)
+sched_numa_variance(struct sched_numa *numa)
  {
-struct sched_numa *numa;
  uint64_t *percent_busy = NULL;
-unsigned total_pmds = 0;
  int n_proc = 0;
  uint64_t var;
  
-HMAP_FOR_EACH (numa, node, _list->numas) {

-total_pmds += numa->n_pmds;
-percent_busy = xrealloc(percent_busy,
-total_pmds * sizeof *percent_busy);
+percent_busy = xmalloc(numa->n_pmds * sizeof *percent_busy);
  
-for (unsigned i = 0; i < numa->n_pmds; i++) {

-struct sched_pmd *sched_pmd;
-uint64_t total_cycles = 0;
+for (unsigned i = 0; i < numa->n_pmds; i++) {
+struct sched_pmd *sched_pmd;
+uint64_t total_cycles = 0;
  
-sched_pmd = >pmds[i];

-/* Exclude isolated PMDs from variance calculations. */
-if (sched_pmd->isolated == true) {
-continue;
-}
-/* Get the total pmd cycles for an interval. */
-atomic_read_relaxed(_pmd->pmd->intrvl_cycles, _cycles);
+sched_pmd = >pmds[i];
+/* Exclude isolated PMDs from variance calculations. */
+if (sched_pmd->isolated == true) {
+continue;
+}
+/* Get the total pmd cycles for an interval. */
+atomic_read_relaxed(_pmd->pmd->intrvl_cycles, _cycles);
  
-if (total_cycles) {

-/* Estimate the cycles to cover all intervals. */
-total_cycles *= PMD_INTERVAL_MAX;
-percent_busy[n_proc++] = (sched_pmd->pmd_proc_cycles * 100)
- / total_cycles;
-} else {
-percent_busy[n_proc++] = 0;
-}
+if (total_cycles) {
+/* Estimate the cycles to cover all intervals. */
+total_cycles *= PMD_INTERVAL_MAX;
+percent_busy[n_proc++] = (sched_pmd->pmd_proc_cycles * 100)
+/ total_cycles;
+} else {
+percent_busy[n_proc++] = 0;
  }
  }
  var = variance(percent_busy, n_proc);
@@ -6182,6 +6176,7 @@ pmd_rebalance_dry_run(struct dp_netdev *dp)
  struct sched_numa_list numa_list_est;
  bool thresh_met = false;
  uint64_t current_var, estimate_var;
+struct sched_numa *numa_cur, *numa_est;
  uint64_t improvement = 0;
  
  VLOG_DBG("PMD auto load balance performing dry run.");

@@ -6200,24 +6195,33 @@ pmd_rebalance_dry_run(struct dp_netdev *dp)
  sched_numa_list_count(_list_est) == 1) {
  
  /* Calculate variances. */

-current_var = sched_numa_list_variance(_list_cur);
-estimate_var = sched_numa_list_variance(_list_est);
-
-if (estimate_var < current_var) {
- improvement = ((current_var - estimate_var) * 100) / current_var;
-}
-VLOG_DBG("Current variance %"PRIu64" Estimated variance 

Re: [ovs-dev] [PATCHv4 net-next 5/5] net: move the nat function to nf_nat_ovs for ovs and tc

2022-12-09 Thread Pablo Neira Ayuso
On Thu, Dec 08, 2022 at 11:56:12AM -0500, Xin Long wrote:
> There are two nat functions are nearly the same in both OVS and
> TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat().
> 
> This patch creates nf_nat_ovs.c under netfilter and moves them
> there then exports nf_ct_nat() so that it can be shared by both
> OVS and TC, and keeps the nat (type) check and nat flag update
> in OVS and TC's own place, as these parts are different between
> OVS and TC.
> 
> Note that in OVS nat function it was using skb->protocol to get
> the proto as it already skips vlans in key_extract(), while it
> doesn't in TC, and TC has to call skb_protocol() to get proto.
> So in nf_ct_nat_execute(), we keep using skb_protocol() which
> works for both OVS and TC contrack.
> 
> Signed-off-by: Xin Long 

Acked-by: Pablo Neira Ayuso 

Thanks for addressing my feedback.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpif-netdev: Load based PMD sleeping.

2022-12-09 Thread David Marchand
Hi Kevin,

Here are some comments on the stats side:

On Tue, Nov 29, 2022 at 3:01 PM Kevin Traynor  wrote:
> @@ -6964,5 +6999,19 @@ reload:
>   * There was no time updates on current iteration. */
>  pmd_thread_ctx_time_update(pmd);
> -tx_packets = dp_netdev_pmd_flush_output_packets(pmd, false);
> +tx_packets = dp_netdev_pmd_flush_output_packets(pmd,
> +   sleep_time ? true : 
> false);
> +}
> +
> +if (powersave) {
> +if (sleep_time) {
> +xnanosleep(sleep_time);
> +time_slept = sleep_time;

- time_slept is the requested time, and not the time that ovs actually slept.
PMD_PWR_SLEEP_TIME stat won't be accurate if xnanosleep() takes longer
than expected.
I would add a "timer" around xnanosleep() and store cycles internally
(and then translate back to us in the stats output).


- Both pmd idling (PMD_CYCLES_ITER_IDLE) and pmd processing
(PMD_CYCLES_ITER_BUSY) stats now include cycles spent sleeping.
Those stats are used for animating pmd stats ("avg processing cycles
per packet") and rxq stats ("overhead").
I think those "sleeping" cycles should be counted as idle so that
"busy" cycles do represent time spent processing the packets.


WDYT?

> +}
> +if (sleep_time < PMD_PWR_MAX_SLEEP) {
> +/* Increase potential sleep time for next iteration. */
> +sleep_time += PMD_PWR_INC;
> +} else {
> +max_sleep_hit = true;
> +}
>  }
>


-- 
David Marchand

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2 branch-22.12] CI: Remove .cirrus.yml

2022-12-09 Thread Dumitru Ceara
On 12/9/22 10:09, Ales Musil wrote:
> Because we have enabled Cirrus CI remove the
> old config from branches to prevent it from failing.
> 
> Signed-off-by: Ales Musil 
> ---
> It should be backported all the way down to 21.12.
> 
> v2: Add forgotten Signed-off-by.
> ---
>  .cirrus.yml | 31 ---
>  1 file changed, 31 deletions(-)
>  delete mode 100644 .cirrus.yml

Thanks for the patch but this breaks the build.  This is from the
ovsrobot run:

https://github.com/ovsrobot/ovn/actions/runs/3656074515/jobs/6178100493#step:13:3206

make[1]: *** No rule to make target '.cirrus.yml', needed by 'all-am'.
Stop.

Regards,
Dumitru

> 
> diff --git a/.cirrus.yml b/.cirrus.yml
> deleted file mode 100644
> index c71a0a249..0
> --- a/.cirrus.yml
> +++ /dev/null
> @@ -1,31 +0,0 @@
> -freebsd_build_task:
> -
> -  freebsd_instance:
> -matrix:
> -  image: freebsd-12-0-release-amd64
> -  image: freebsd-11-2-release-amd64
> -cpu: 4
> -memory: 8G
> -
> -  env:
> -DEPENDENCIES: automake libtool gmake gcc wget openssl
> -  python py27-six py27-sphinx py27-openssl
> -  python3 py36-six py36-openssl
> -matrix:
> -  COMPILER: gcc
> -  COMPILER: clang
> -
> -  prepare_script:
> -- pkg install -y ${DEPENDENCIES}
> -
> -  configure_script:
> -- ./boot.sh
> -- ./configure CC=${COMPILER} MAKE=gmake OVS_CFLAGS='-Wall' 
> --enable-Werror
> -  || { cat config.log; exit 1; }
> -
> -  build_script:
> -- gmake -j8
> -
> -  check_script:
> -- gmake -j8 check TESTSUITEFLAGS=-j8 RECHECK=yes
> -|| { cat ./tests/testsuite.log; exit 1; }

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2 4/4] ic: minor code improvements

2022-12-09 Thread Dumitru Ceara
On 12/6/22 11:20, Vladislav Odintsov wrote:
> 1. Remove excess nbrec_logical_router variable.
> 2. Remove excess call to add_static_to_routes_ad().
> 3. Remove double nexthop check in ic_route_find().
> 
> Signed-off-by: Vladislav Odintsov 
> ---

Acked-by: Dumitru Ceara 

Thanks!

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2 1/4] ic: remove orphan ovn interconnection routes

2022-12-09 Thread Dumitru Ceara
On 12/6/22 11:19, Vladislav Odintsov wrote:
> Before this patch if one deletes transit switch through which there were
> routes in ICSB:Route table, such routes were left forever in the DB.
> 
> Now we validate that each ICSB:Route has an appropriate transit switch.
> 
> Signed-off-by: Vladislav Odintsov 
> ---
>  ic/ovn-ic.c | 40 +++
>  tests/ovn-ic.at | 73 +
>  2 files changed, 113 insertions(+)
> 
> diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
> index e5c193d9d..d3bc08761 100644
> --- a/ic/ovn-ic.c
> +++ b/ic/ovn-ic.c
> @@ -71,6 +71,7 @@ struct ic_context {
>  struct ovsdb_idl_index *icsbrec_port_binding_by_az;
>  struct ovsdb_idl_index *icsbrec_port_binding_by_ts;
>  struct ovsdb_idl_index *icsbrec_port_binding_by_ts_az;
> +struct ovsdb_idl_index *icsbrec_route_by_az;
>  struct ovsdb_idl_index *icsbrec_route_by_ts;
>  struct ovsdb_idl_index *icsbrec_route_by_ts_az;
>  };
> @@ -1612,6 +1613,38 @@ advertise_lr_routes(struct ic_context *ctx,
>  hmap_destroy(_ad);
>  }
>  
> +static void
> +delete_orphan_ic_routes(struct ic_context *ctx,
> + const struct icsbrec_availability_zone *az)
> +{
> +const struct icsbrec_route *isb_route, *isb_route_key =
> +icsbrec_route_index_init_row(ctx->icsbrec_route_by_az);
> +icsbrec_route_index_set_availability_zone(isb_route_key, az);
> +
> +const struct icnbrec_transit_switch *t_sw, *t_sw_key;
> +
> +ICSBREC_ROUTE_FOR_EACH_EQUAL (isb_route, isb_route_key,
> +  ctx->icsbrec_route_by_az)
> +{
> +t_sw_key = icnbrec_transit_switch_index_init_row(
> +ctx->icnbrec_transit_switch_by_name);
> +icnbrec_transit_switch_index_set_name(t_sw_key,
> +isb_route->transit_switch);
> +t_sw = icnbrec_transit_switch_index_find(
> +ctx->icnbrec_transit_switch_by_name, t_sw_key);
> +icnbrec_transit_switch_index_destroy_row(t_sw_key);
> +
> +if (!t_sw) {
> +VLOG_WARN("Deleting orphan ICDB:Route: %s->%s (%s, rtb:%s, "
> +  "transit switch: %s)", isb_route->ip_prefix,
> +  isb_route->nexthop, isb_route->origin,
> +  isb_route->route_table, isb_route->transit_switch);

I guess this slipped through.  We agreed to change this to VLOG_INFO_RL
in v1.  With that addressed please feel free to add my ack to v3:

Acked-by: Dumitru Ceara 

Thanks!

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4 ovn] binding: add the capability to apply QoS for lsp

2022-12-09 Thread Lorenzo Bianconi
> On Sun, Dec 4, 2022 at 5:06 PM Lorenzo Bianconi
>  wrote:
> >
> > Introduce the capability to apply QoS rules for logical switch ports
> > claimed by ovn-controller. Rely on shash instead of sset for
> > egress_ifaces.
> >
> > Acked-by: Mark Michelson 
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2129742
> > Signed-off-by: Lorenzo Bianconi 
> 
> Hi Lorenzo,

Hi Numan,

thx for the review :)

> 
> Thanks for addressing the comments.
> 
> I tested this version and compared with the present main.
> 
> I see a few differences.
> 
> With the main version,  when I configure qos params to a localnet
> logical port,  ovn-controller creates a qdisc on the tunnel interface
> which I guess is expected.
> But I'm still not sure why we need to configure qdiscs on the tunnel
> interface.  But that's a story.  Perhaps I need to see the original
> QoS commits and understand why it was added.
> 
> With your patch,  I don't see that happening.  Perhaps there is a bug
> in setup_qos() function now that 'shash' is used to store the egress
> ifaces with the logical port and
> there is no logical port for the tunnel interfaces.

uhm, I can look into this. Do you think we should keep this behaviour?

> 
> Regarding the option - "ovn-egress-iface".  I suppose the expectation
> from the CMS is to set the qos parameters in the logical port and also
> set this option in the
> ovs interface right ?  I don't really see a need for this.  I think if
> CMS configures the qos parameters in the logical port options,
> ovn-controller should configure the qos.
> I think otherwise this would complicate the CMS implementation.  Thoughts ?

ovn-egress-iface is mandatory since it is used in setup_qos() to look for the
proper netdevice pointer to configure ovs qdisc (please note it is used to
identify ovs interfaces and not ovn ones).
I think it can be doable (not sure about the complexity) to avoid it for
logical switch ports (since we have iface-id in them external_ids column of
ovs interface table) but I do not think we can avoid to set "ovn-egress-iface"
for localnet port since afaik there is no direct link between ovn localnet port
and ovs interface used to connect the underlay network, right? If so I guess
it is better to keep it for both localnet and lsp ports. Do you agree?

> 
> 
> Please see a few comments below
> 
> > ---
> > Changes since v3:
> > - fix typo in new system-ovn test
> > Changes since v2:
> > - fix qos configuration restarting ovn-controller
> > Changes since v1:
> > - improve ovs interface lookup
> > - improve system-tests
> > ---
> >  controller/binding.c| 155 ++--
> >  controller/binding.h|   5 +-
> >  controller/ovn-controller.c |  15 ++--
> >  tests/system-ovn.at |  48 +++
> >  4 files changed, 156 insertions(+), 67 deletions(-)
> >
> > diff --git a/controller/binding.c b/controller/binding.c
> > index 5df62baef..53520263c 100644
> > --- a/controller/binding.c
> > +++ b/controller/binding.c
> > @@ -115,6 +115,7 @@ struct qos_queue {
> >  uint32_t min_rate;
> >  uint32_t max_rate;
> >  uint32_t burst;
> > +char *port_name;
> >  };
> >
> >  void
> > @@ -147,25 +148,50 @@ static void update_lport_tracking(const struct 
> > sbrec_port_binding *pb,
> >struct hmap *tracked_dp_bindings,
> >bool claimed);
> >
> > +static bool is_lport_vif(const struct sbrec_port_binding *pb);
> > +
> > +static struct qos_queue *
> > +get_qos_map_entry(struct hmap *queue_map, const char *name)
> > +{
> > +struct qos_queue *qos_node;
> > +HMAP_FOR_EACH (qos_node, node, queue_map) {
> > +if (!strcmp(qos_node->port_name, name)) {
> > +return qos_node;
> > +}
> > +}
> > +
> > +return NULL;
> > +}
> > +
> >  static void
> > -get_qos_params(const struct sbrec_port_binding *pb, struct hmap *queue_map)
> > +update_qos_params(const struct sbrec_port_binding *pb, struct hmap 
> > *queue_map)
> >  {
> >  uint32_t min_rate = smap_get_int(>options, "qos_min_rate", 0);
> >  uint32_t max_rate = smap_get_int(>options, "qos_max_rate", 0);
> >  uint32_t burst = smap_get_int(>options, "qos_burst", 0);
> >  uint32_t queue_id = smap_get_int(>options, "qdisc_queue_id", 0);
> >
> > +struct qos_queue *node = get_qos_map_entry(queue_map, 
> > pb->logical_port);
> > +
> >  if ((!min_rate && !max_rate && !burst) || !queue_id) {
> >  /* Qos is not configured for this port. */
> > +if (node) {
> > + hmap_remove(queue_map, >node);
> > + free(node->port_name);
> > + free(node);
> > +}
> >  return;
> >  }
> >
> > -struct qos_queue *node = xzalloc(sizeof *node);
> > -hmap_insert(queue_map, >node, hash_int(queue_id, 0));
> > +if (!node) {
> > +node = xzalloc(sizeof *node);
> > +hmap_insert(queue_map, >node, hash_int(queue_id, 0));
> > +

Re: [ovs-dev] [PATCH ovn v2 3/4] ic: prevent advertising/learning multiple same routes

2022-12-09 Thread Dumitru Ceara
On 12/6/22 11:20, Vladislav Odintsov wrote:
> Signed-off-by: Vladislav Odintsov 
> ---

Hi Vladislav,

>  ic/ovn-ic.c | 83 +++--
>  ovn-ic-sb.ovsschema |  6 ++--
>  tests/ovn-ic.at | 60 
>  3 files changed, 114 insertions(+), 35 deletions(-)
> 
> diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
> index 9e2369fef..64307d8c4 100644
> --- a/ic/ovn-ic.c
> +++ b/ic/ovn-ic.c
> @@ -884,10 +884,12 @@ ic_route_hash(const struct in6_addr *prefix, unsigned 
> int plen,
>  static struct ic_route_info *
>  ic_route_find(struct hmap *routes, const struct in6_addr *prefix,
>unsigned int plen, const struct in6_addr *nexthop,
> -  const char *origin, char *route_table)
> +  const char *origin, const char *route_table, uint32_t hash)
>  {
>  struct ic_route_info *r;
> -uint32_t hash = ic_route_hash(prefix, plen, nexthop, origin, 
> route_table);
> +if (!hash) {
> +hash = ic_route_hash(prefix, plen, nexthop, origin, route_table);
> +}
>  HMAP_FOR_EACH_WITH_HASH (r, node, hash, routes) {
>  if (ipv6_addr_equals(>prefix, prefix) &&
>  r->plen == plen &&
> @@ -945,8 +947,8 @@ add_to_routes_learned(struct hmap *routes_learned,
>  }
>  const char *origin = smap_get_def(_route->options, "origin", "");
>  if (ic_route_find(routes_learned, , plen, , origin,
> -  nb_route->route_table)) {
> -/* Route is already added to learned in previous iteration. */
> +  nb_route->route_table, 0)) {
> +/* Route was added to learned on previous iteration. */
>  return true;
>  }
>  
> @@ -1093,10 +1095,42 @@ route_need_advertise(const char *policy,
>  }
>  
>  static void
> -add_to_routes_ad(struct hmap *routes_ad,
> - const struct nbrec_logical_router_static_route *nb_route,
> - const struct lport_addresses *nexthop_addresses,
> - const struct smap *nb_options, const char *route_table)
> +add_to_routes_ad(struct hmap *routes_ad, const struct in6_addr prefix,
> + unsigned int plen, const struct in6_addr nexthop,
> + const char *origin, const char *route_table,
> + const struct nbrec_logical_router_port *nb_lrp,
> + const struct nbrec_logical_router_static_route *nb_route)
> +{
> +if (route_table == NULL) {
> +route_table = "";
> +}
> +
> +uint hash = ic_route_hash(, plen, , origin, route_table);
> +
> +if (!ic_route_find(routes_ad, , plen, , origin, 
> route_table,
> +   hash)) {
> +struct ic_route_info *ic_route = xzalloc(sizeof *ic_route);
> +ic_route->prefix = prefix;
> +ic_route->plen = plen;
> +ic_route->nexthop = nexthop;
> +ic_route->nb_route = nb_route;
> +ic_route->origin = origin;
> +ic_route->route_table = route_table;
> +ic_route->nb_lrp = nb_lrp;
> +hmap_insert(routes_ad, _route->node, hash);
> +} else {
> +VLOG_WARN("Duplicate route advertisement was suppressed! NB route "
> +  "uuid: "UUID_FMT,
> +  UUID_ARGS(_route->header_.uuid));

I think this should be rate limited.

> +}
> +}
> +
> +static void
> +add_static_to_routes_ad(
> +struct hmap *routes_ad,
> +const struct nbrec_logical_router_static_route *nb_route,
> +const struct lport_addresses *nexthop_addresses,
> +const struct smap *nb_options, const char *route_table)
>  {
>  if (strcmp(route_table, nb_route->route_table)) {
>  if (VLOG_IS_DBG_ENABLED()) {
> @@ -1145,16 +1179,8 @@ add_to_routes_ad(struct hmap *routes_ad,
>  ds_destroy();
>  }
>  
> -struct ic_route_info *ic_route = xzalloc(sizeof *ic_route);
> -ic_route->prefix = prefix;
> -ic_route->plen = plen;
> -ic_route->nexthop = nexthop;
> -ic_route->nb_route = nb_route;
> -ic_route->origin = ROUTE_ORIGIN_STATIC;
> -ic_route->route_table = nb_route->route_table;
> -hmap_insert(routes_ad, _route->node,
> -ic_route_hash(, plen, , ROUTE_ORIGIN_STATIC,
> -  nb_route->route_table));
> +add_to_routes_ad(routes_ad, prefix, plen, nexthop, ROUTE_ORIGIN_STATIC,
> + nb_route->route_table, NULL, nb_route);
>  }
>  
>  static void
> @@ -1198,18 +1224,9 @@ add_network_to_routes_ad(struct hmap *routes_ad, const 
> char *network,
>  ds_destroy();
>  }
>  
> -struct ic_route_info *ic_route = xzalloc(sizeof *ic_route);
> -ic_route->prefix = prefix;
> -ic_route->plen = plen;
> -ic_route->nexthop = nexthop;
> -ic_route->nb_lrp = nb_lrp;
> -ic_route->origin = ROUTE_ORIGIN_CONNECTED;
> -
>  /* directly-connected routes go to  route table */
> -ic_route->route_table = NULL;
> -hmap_insert(routes_ad, _route->node,
> -ic_route_hash(, 

Re: [ovs-dev] [PATCH ovn] actions: Clarify the NAT type for ovnact_ct_nat

2022-12-09 Thread Dumitru Ceara
On 11/30/22 10:47, Ales Musil wrote:
> The encode_ct_nat took a bool specifying if the NAT
> type is source or destination. However that doesn't work
> with ct_commit_nat which has the NAT type unspecified.
> Add enum that allows to differentiate between those
> to make it clearer which type of NAT should be applied.
> 
> Signed-off-by: Ales Musil 
> ---

Looks good to me, thanks for the improvement!

Acked-by: Dumitru Ceara 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2] docs: Extend upgrade documentation.

2022-12-09 Thread Dumitru Ceara
On 12/6/22 16:34, Frode Nordahl wrote:
> As uncovered during the OVSCON'22 open discussion forum on
> upgrades, there are some challenges in upgrading from older
> versions of OVN.
> 
> Document version requirements for performing a controller first
> rolling upgrade.
> 
> Add a section about how to perform a fail-safe upgrade for
> deployments that want to upgrade beyond a supported version span.
> 
> Reported-at: https://bugs.launchpad.net/bugs/1940043
> Signed-off-by: Frode Nordahl 
> ---

Acked-by: Dumitru Ceara 

Thanks again for improving the upgrade documentation!  It looks good to
me but let's see what others think about it too.

Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] ovn-trace: Support connecting to SB raft followers.

2022-12-09 Thread 0-day Robot
Bleep bloop.  Greetings Dumitru Ceara, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line lacks whitespace around operator
#75 FILE: utilities/ovn-trace.c:409:
  --no-leader-onlyaccept any cluster member, not just the leader\n\

Lines checked: 82, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] ovn-trace: Support connecting to SB raft followers.

2022-12-09 Thread Dumitru Ceara
Similar to other OVN utilities, ovn-trace now supports --no-leader-only
and --leader-only arguments.  The latter is default.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2152151
Reported-by: Patryk Diak 
Signed-off-by: Dumitru Ceara 
---
 utilities/ovn-trace.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/utilities/ovn-trace.c b/utilities/ovn-trace.c
index 79ed5a9af1..07ebac5e54 100644
--- a/utilities/ovn-trace.c
+++ b/utilities/ovn-trace.c
@@ -60,6 +60,9 @@ static char *unixctl_path;
 /* The southbound database. */
 static struct ovsdb_idl *ovnsb_idl;
 
+/* --leader-only, --no-leader-only: Only accept the leader in a cluster. */
+static int leader_only = true;
+
 /* --detailed: Show a detailed, table-by-table trace. */
 static bool detailed;
 
@@ -138,6 +141,7 @@ main(int argc, char *argv[])
  1, INT_MAX, ovntrace_trace, NULL);
 }
 ovnsb_idl = ovsdb_idl_create(db, _idl_class, true, false);
+ovsdb_idl_set_leader_only(ovnsb_idl, leader_only);
 
 bool already_read = false;
 for (;;) {
@@ -243,6 +247,8 @@ parse_options(int argc, char *argv[])
 {
 enum {
 OPT_DB = UCHAR_MAX + 1,
+OPT_LEADER_ONLY,
+OPT_NO_LEADER_ONLY,
 OPT_UNIXCTL,
 OPT_DETAILED,
 OPT_SUMMARY,
@@ -260,6 +266,8 @@ parse_options(int argc, char *argv[])
 };
 static const struct option long_options[] = {
 {"db", required_argument, NULL, OPT_DB},
+{"leader-only", no_argument, NULL, OPT_LEADER_ONLY},
+{"no-leader-only", no_argument, NULL, OPT_NO_LEADER_ONLY},
 {"unixctl", required_argument, NULL, OPT_UNIXCTL},
 {"detailed", no_argument, NULL, OPT_DETAILED},
 {"summary", no_argument, NULL, OPT_SUMMARY},
@@ -294,6 +302,14 @@ parse_options(int argc, char *argv[])
 db = optarg;
 break;
 
+case OPT_LEADER_ONLY:
+leader_only = true;
+break;
+
+case OPT_NO_LEADER_ONLY:
+leader_only = false;
+break;
+
 case OPT_UNIXCTL:
 unixctl_path = optarg;
 break;
@@ -390,6 +406,7 @@ Output style options:\n\
 Other options:\n\
   --db=DATABASE   connect to DATABASE\n\
   (default: %s)\n\
+  --no-leader-onlyaccept any cluster member, not just the leader\n\
   --ovs[=REMOTE]  obtain corresponding OpenFlow flows from REMOTE\n\
   (default: %s)\n\
   --unixctl=SOCKETset control socket name\n\
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] netdev-dpdk: make sure the mempool has enough mbufs for rxq

2022-12-09 Thread Kevin Traynor

On 07/12/2022 11:53, yangchang wrote:

For eth dev, dpdk X_rx_queue_start fuction will pre alloc mbuf
for rxq, if n_mbufs < n_rxq * rxq_size, dev will start error
becuse of it cannot allocate memory. If n_mbufs are not enough,
the mempool is not necessary to create.



On the good side, catching the issue here makes it more recoverable by 
the user by changing the num rxq or rxq size desc.


The issue is that it is based on the requested sizes, and there is 
fallback for them to move to lower sizes, so the requested size might 
not actually be what is used for setting up the rxqs. Meaning there 
could be a case where the mempool would have been big enough but we 
don't create it.


So I agree it's an improvement for most cases, but it's a backwards step 
for the (rare?) case where requested rxqs is too big and is adjusted 
downwards during dpdk init making the mempool big enough.


Some comments on the implementation below.


Signed-off-by: yangchang 
---
  lib/netdev-dpdk.c | 13 +
  1 file changed, 13 insertions(+)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 72e7a32..7e0a819 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -757,6 +757,19 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
  n_mbufs = dpdk_calculate_mbufs(dev, mtu);

  do {
+/* For eth dev, dpdk X_rx_queue_start fuction will pre alloc mbuf


function


+ * for rxq, if n_mbufs < n_rxq * rxq_size, dev will start error
+ * becuse of it cannot allocate memory.


because


+ */
+if ((!per_port_memory) && (dev->type == DPDK_DEV_ETH) &&
+(n_mbufs < dev->requested_n_rxq * dev->requested_rxq_size)) {


Some of the () are unnecessary and can be removed.


+VLOG_ERR("Port %s: a mempool of %u mbuffs is not enough "
+ "for %d Rx queues and %d queue size",


s/mbuffs/mbufs/

I think "Rx queues of %d queue size" would sound better


+ netdev_name, n_mbufs, dev->requested_n_rxq,
+ dev->requested_rxq_size);
+ break;


Breaking here leads to an incorrect log message:
netdev_dpdk|ERR|Failed to create mempool "�" with a request of 262144 mbufs

thanks,
Kevin.


+}
+
  /* Full DPDK memory pool name must be unique and cannot be
   * longer than RTE_MEMPOOL_NAMESIZE. Note that for the shared
   * mempool case this can result in one device using a mempool
--
1.8.3.1


yangch...@chinatelecom.cn
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCHv4 net-next 5/5] net: move the nat function to nf_nat_ovs for ovs and tc

2022-12-09 Thread Aaron Conole
Xin Long  writes:

> There are two nat functions are nearly the same in both OVS and
> TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat().
>
> This patch creates nf_nat_ovs.c under netfilter and moves them
> there then exports nf_ct_nat() so that it can be shared by both
> OVS and TC, and keeps the nat (type) check and nat flag update
> in OVS and TC's own place, as these parts are different between
> OVS and TC.
>
> Note that in OVS nat function it was using skb->protocol to get
> the proto as it already skips vlans in key_extract(), while it
> doesn't in TC, and TC has to call skb_protocol() to get proto.
> So in nf_ct_nat_execute(), we keep using skb_protocol() which
> works for both OVS and TC contrack.
>
> Signed-off-by: Xin Long 
> ---

Acked-by: Aaron Conole 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCHv4 net-next 3/5] openvswitch: return NF_DROP when fails to add nat ext in ovs_ct_nat

2022-12-09 Thread Aaron Conole
Xin Long  writes:

> When it fails to allocate nat ext, the packet should be dropped, like
> the memory allocation failures in other places in ovs_ct_nat().
>
> This patch changes to return NF_DROP when fails to add nat ext before
> doing NAT in ovs_ct_nat(), also it would keep consistent with tc
> action ct' processing in tcf_ct_act_nat().
>
> Signed-off-by: Xin Long 
> ---

LGTM.

Acked-by: Aaron Conole 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4 ovn] binding: add the capability to apply QoS for lsp

2022-12-09 Thread Ilya Maximets
On 12/9/22 04:18, Numan Siddique wrote:
> On Sun, Dec 4, 2022 at 5:06 PM Lorenzo Bianconi
>  wrote:
>>
>> Introduce the capability to apply QoS rules for logical switch ports
>> claimed by ovn-controller. Rely on shash instead of sset for
>> egress_ifaces.
>>
>> Acked-by: Mark Michelson 
>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2129742
>> Signed-off-by: Lorenzo Bianconi 
> 
> Hi Lorenzo,
> 
> Thanks for addressing the comments.
> 
> I tested this version and compared with the present main.
> 
> I see a few differences.
> 
> With the main version,  when I configure qos params to a localnet
> logical port,  ovn-controller creates a qdisc on the tunnel interface
> which I guess is expected.
> But I'm still not sure why we need to configure qdiscs on the tunnel
> interface.  But that's a story.  Perhaps I need to see the original
> QoS commits and understand why it was added.

I'm not sure how that is supposed to work, but adding qdiscs
to tunnel interfaces sounds wrong.  It will affect all the
traffic between nodes, not only particular ports, right?

> 
> With your patch,  I don't see that happening.  Perhaps there is a bug
> in setup_qos() function now that 'shash' is used to store the egress
> ifaces with the logical port and
> there is no logical port for the tunnel interfaces.
> 
> Regarding the option - "ovn-egress-iface".  I suppose the expectation
> from the CMS is to set the qos parameters in the logical port and also
> set this option in the
> ovs interface right ?  I don't really see a need for this.  I think if
> CMS configures the qos parameters in the logical port options,
> ovn-controller should configure the qos.
> I think otherwise this would complicate the CMS implementation.  Thoughts ?
> 
> 
> Please see a few comments below
> 
>> ---
>> Changes since v3:
>> - fix typo in new system-ovn test
>> Changes since v2:
>> - fix qos configuration restarting ovn-controller
>> Changes since v1:
>> - improve ovs interface lookup
>> - improve system-tests
>> ---
>>  controller/binding.c| 155 ++--
>>  controller/binding.h|   5 +-
>>  controller/ovn-controller.c |  15 ++--
>>  tests/system-ovn.at |  48 +++
>>  4 files changed, 156 insertions(+), 67 deletions(-)
>>
>> diff --git a/controller/binding.c b/controller/binding.c
>> index 5df62baef..53520263c 100644
>> --- a/controller/binding.c
>> +++ b/controller/binding.c
>> @@ -115,6 +115,7 @@ struct qos_queue {
>>  uint32_t min_rate;
>>  uint32_t max_rate;
>>  uint32_t burst;
>> +char *port_name;
>>  };
>>
>>  void
>> @@ -147,25 +148,50 @@ static void update_lport_tracking(const struct 
>> sbrec_port_binding *pb,
>>struct hmap *tracked_dp_bindings,
>>bool claimed);
>>
>> +static bool is_lport_vif(const struct sbrec_port_binding *pb);
>> +
>> +static struct qos_queue *
>> +get_qos_map_entry(struct hmap *queue_map, const char *name)
>> +{
>> +struct qos_queue *qos_node;
>> +HMAP_FOR_EACH (qos_node, node, queue_map) {
>> +if (!strcmp(qos_node->port_name, name)) {
>> +return qos_node;
>> +}
>> +}
>> +
>> +return NULL;
>> +}
>> +
>>  static void
>> -get_qos_params(const struct sbrec_port_binding *pb, struct hmap *queue_map)
>> +update_qos_params(const struct sbrec_port_binding *pb, struct hmap 
>> *queue_map)
>>  {
>>  uint32_t min_rate = smap_get_int(>options, "qos_min_rate", 0);
>>  uint32_t max_rate = smap_get_int(>options, "qos_max_rate", 0);
>>  uint32_t burst = smap_get_int(>options, "qos_burst", 0);
>>  uint32_t queue_id = smap_get_int(>options, "qdisc_queue_id", 0);
>>
>> +struct qos_queue *node = get_qos_map_entry(queue_map, pb->logical_port);
>> +
>>  if ((!min_rate && !max_rate && !burst) || !queue_id) {
>>  /* Qos is not configured for this port. */
>> +if (node) {
>> + hmap_remove(queue_map, >node);
>> + free(node->port_name);
>> + free(node);
>> +}
>>  return;
>>  }
>>
>> -struct qos_queue *node = xzalloc(sizeof *node);
>> -hmap_insert(queue_map, >node, hash_int(queue_id, 0));
>> +if (!node) {
>> +node = xzalloc(sizeof *node);
>> +hmap_insert(queue_map, >node, hash_int(queue_id, 0));
>> +node->port_name = xstrdup(pb->logical_port);
>> +}
>> +node->queue_id = queue_id;
>>  node->min_rate = min_rate;
>>  node->max_rate = max_rate;
>>  node->burst = burst;
>> -node->queue_id = queue_id;
>>  }
>>
>>  static const struct ovsrec_qos *
>> @@ -191,7 +217,7 @@ static bool
>>  set_noop_qos(struct ovsdb_idl_txn *ovs_idl_txn,
>>   const struct ovsrec_port_table *port_table,
>>   const struct ovsrec_qos_table *qos_table,
>> - struct sset *egress_ifaces)
>> + struct shash *egress_ifaces)
>>  {
>>  if (!ovs_idl_txn) {
>>  return false;
>> 

Re: [ovs-dev] [PATCH] [PATCH v9 net-next] net: openvswitch: Add support to count upcall packets

2022-12-09 Thread patchwork-bot+netdevbpf
Hello:

This patch was applied to netdev/net-next.git (master)
by David S. Miller :

On Tue, 6 Dec 2022 20:38:57 -0500 you wrote:
> Add support to count upall packets, when kmod of openvswitch
> upcall to count the number of packets for upcall succeed and
> failed, which is a better way to see how many packets upcalled
> on every interfaces.
> 
> Signed-off-by: wangchuanlei 
> 
> [...]

Here is the summary with links:
  - [v9,net-next] net: openvswitch: Add support to count upcall packets
https://git.kernel.org/netdev/net-next/c/1933ea365aa7

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn branch-22.12] CI: Remove .cirrus.yml

2022-12-09 Thread Ales Musil
On Fri, Dec 9, 2022 at 9:39 AM Ales Musil  wrote:

> Because we have enabled Cirrus CI remove the
> old config from branches to prevent it from failing.
> ---
> It should be backported all the way down to 21.12.
> ---
>  .cirrus.yml | 31 ---
>  1 file changed, 31 deletions(-)
>  delete mode 100644 .cirrus.yml
>
> diff --git a/.cirrus.yml b/.cirrus.yml
> deleted file mode 100644
> index c71a0a249..0
> --- a/.cirrus.yml
> +++ /dev/null
> @@ -1,31 +0,0 @@
> -freebsd_build_task:
> -
> -  freebsd_instance:
> -matrix:
> -  image: freebsd-12-0-release-amd64
> -  image: freebsd-11-2-release-amd64
> -cpu: 4
> -memory: 8G
> -
> -  env:
> -DEPENDENCIES: automake libtool gmake gcc wget openssl
> -  python py27-six py27-sphinx py27-openssl
> -  python3 py36-six py36-openssl
> -matrix:
> -  COMPILER: gcc
> -  COMPILER: clang
> -
> -  prepare_script:
> -- pkg install -y ${DEPENDENCIES}
> -
> -  configure_script:
> -- ./boot.sh
> -- ./configure CC=${COMPILER} MAKE=gmake OVS_CFLAGS='-Wall'
> --enable-Werror
> -  || { cat config.log; exit 1; }
> -
> -  build_script:
> -- gmake -j8
> -
> -  check_script:
> -- gmake -j8 check TESTSUITEFLAGS=-j8 RECHECK=yes
> -|| { cat ./tests/testsuite.log; exit 1; }
> --
> 2.38.1
>
>

Sorry for the noise, the v2 is properly signed.

Thanks,
Ales

-- 

Ales Musil

Senior Software Engineer - OVN Core

Red Hat EMEA 

amu...@redhat.comIM: amusil

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn v2 branch-22.12] CI: Remove .cirrus.yml

2022-12-09 Thread Ales Musil
Because we have enabled Cirrus CI remove the
old config from branches to prevent it from failing.

Signed-off-by: Ales Musil 
---
It should be backported all the way down to 21.12.

v2: Add forgotten Signed-off-by.
---
 .cirrus.yml | 31 ---
 1 file changed, 31 deletions(-)
 delete mode 100644 .cirrus.yml

diff --git a/.cirrus.yml b/.cirrus.yml
deleted file mode 100644
index c71a0a249..0
--- a/.cirrus.yml
+++ /dev/null
@@ -1,31 +0,0 @@
-freebsd_build_task:
-
-  freebsd_instance:
-matrix:
-  image: freebsd-12-0-release-amd64
-  image: freebsd-11-2-release-amd64
-cpu: 4
-memory: 8G
-
-  env:
-DEPENDENCIES: automake libtool gmake gcc wget openssl
-  python py27-six py27-sphinx py27-openssl
-  python3 py36-six py36-openssl
-matrix:
-  COMPILER: gcc
-  COMPILER: clang
-
-  prepare_script:
-- pkg install -y ${DEPENDENCIES}
-
-  configure_script:
-- ./boot.sh
-- ./configure CC=${COMPILER} MAKE=gmake OVS_CFLAGS='-Wall' --enable-Werror
-  || { cat config.log; exit 1; }
-
-  build_script:
-- gmake -j8
-
-  check_script:
-- gmake -j8 check TESTSUITEFLAGS=-j8 RECHECK=yes
-|| { cat ./tests/testsuite.log; exit 1; }
-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn branch-22.12] CI: Remove .cirrus.yml

2022-12-09 Thread 0-day Robot
Bleep bloop.  Greetings Ales Musil, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Author Ales Musil  needs to sign off.
Lines checked: 54, Warnings: 0, Errors: 1


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn branch-22.12] CI: Remove .cirrus.yml

2022-12-09 Thread Ales Musil
Because we have enabled Cirrus CI remove the
old config from branches to prevent it from failing.
---
It should be backported all the way down to 21.12.
---
 .cirrus.yml | 31 ---
 1 file changed, 31 deletions(-)
 delete mode 100644 .cirrus.yml

diff --git a/.cirrus.yml b/.cirrus.yml
deleted file mode 100644
index c71a0a249..0
--- a/.cirrus.yml
+++ /dev/null
@@ -1,31 +0,0 @@
-freebsd_build_task:
-
-  freebsd_instance:
-matrix:
-  image: freebsd-12-0-release-amd64
-  image: freebsd-11-2-release-amd64
-cpu: 4
-memory: 8G
-
-  env:
-DEPENDENCIES: automake libtool gmake gcc wget openssl
-  python py27-six py27-sphinx py27-openssl
-  python3 py36-six py36-openssl
-matrix:
-  COMPILER: gcc
-  COMPILER: clang
-
-  prepare_script:
-- pkg install -y ${DEPENDENCIES}
-
-  configure_script:
-- ./boot.sh
-- ./configure CC=${COMPILER} MAKE=gmake OVS_CFLAGS='-Wall' --enable-Werror
-  || { cat config.log; exit 1; }
-
-  build_script:
-- gmake -j8
-
-  check_script:
-- gmake -j8 check TESTSUITEFLAGS=-j8 RECHECK=yes
-|| { cat ./tests/testsuite.log; exit 1; }
-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev