Re: [ovs-dev] [PATCH ovn v10 6/6] ovn-northd-ddlog: New implementation of ovn-northd based on ddlog.

2021-02-17 Thread 0-day Robot
Bleep bloop.  Greetings Ben Pfaff, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Comment with 'xxx' marker
#3723 FILE: northd/ovn-northd-ddlog.c:166:
.cs = ovsdb_cs_create(database, 1 /* XXX */, _cs_ops, ctx),

WARNING: Comment with 'xxx' marker
#4511 FILE: northd/ovn-northd-ddlog.c:954:
 * XXX If the transaction we're sending to the database fails, then

WARNING: Line lacks whitespace around operator
WARNING: Line lacks whitespace around operator
WARNING: Line lacks whitespace around operator
#4564 FILE: northd/ovn-northd-ddlog.c:1007:
  --ovnnb-db=DATABASE   connect to ovn-nb database at DATABASE\n\

WARNING: Line lacks whitespace around operator
WARNING: Line lacks whitespace around operator
WARNING: Line lacks whitespace around operator
#4566 FILE: northd/ovn-northd-ddlog.c:1009:
  --ovnsb-db=DATABASE   connect to ovn-sb database at DATABASE\n\

WARNING: Line lacks whitespace around operator
WARNING: Line lacks whitespace around operator
#4568 FILE: northd/ovn-northd-ddlog.c:1011:
  --unixctl=SOCKET  override default control socket name\n\

WARNING: Line has trailing whitespace
#6343 FILE: northd/ovn_northd.dl:176:
var l3dgw_port = peer.and_then(|p| p.router.l3dgw_port),

WARNING: Line has non-spaces leading whitespace
WARNING: Line has trailing whitespace
#9125 FILE: northd/ovn_northd.dl:2958:
 

WARNING: Line has non-spaces leading whitespace
WARNING: Line has trailing whitespace
#12817 FILE: northd/ovn_northd.dl:6650:


WARNING: Line has non-spaces leading whitespace
WARNING: Line has trailing whitespace
#13786 FILE: northd/ovn_northd.dl:7619:


WARNING: Line has trailing whitespace
#13987 FILE: northd/ovn_northd.dl:7820:
}

WARNING: Line has non-spaces leading whitespace
WARNING: Line has trailing whitespace
#13988 FILE: northd/ovn_northd.dl:7821:
  

WARNING: Line has non-spaces leading whitespace
WARNING: Line has trailing whitespace
#13989 FILE: northd/ovn_northd.dl:7822:
 

Lines checked: 14606, Warnings: 22, Errors: 0


build:
-e 's,[@]pkgdatadir[@],/usr/local/share/ovn,g' \
-e 's,[@]sysconfdir[@],/usr/local/etc,g' \
-e 's,[@]bindir[@],/usr/local/bin,g' \
-e 's,[@]sbindir[@],/usr/local/sbin,g' \
-e 's,[@]abs_builddir[@],/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace,g' \
-e 's,[@]abs_top_srcdir[@],/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace,g' \
  > utilities/ovn-lib.tmp
mv utilities/ovn-lib.tmp utilities/ovn-lib
:;{ \
  echo '# Signature of the current package.' && \
  echo 'm4_define([AT_PACKAGE_NAME],  [ovn])' && \
  echo 'm4_define([AT_PACKAGE_TARNAME],   [ovn])' && \
  echo 'm4_define([AT_PACKAGE_VERSION],   [20.12.90])' && \
  echo 'm4_define([AT_PACKAGE_STRING],[ovn 20.12.90])' && \
  echo 'm4_define([AT_PACKAGE_BUGREPORT], [b...@openvswitch.org])'; \
} >'./package.m4'
/bin/sh /var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/build-aux/missing autom4te --language=autotest -I '.' -o tests/testsuite.tmp tests/testsuite.at
mv tests/testsuite.tmp tests/testsuite
/bin/sh /var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/build-aux/missing autom4te --language=autotest -I '.' -o tests/system-kmod-testsuite.tmp tests/system-kmod-testsuite.at
mv tests/system-kmod-testsuite.tmp tests/system-kmod-testsuite
/bin/sh /var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/build-aux/missing autom4te --language=autotest -I '.' -o tests/system-userspace-testsuite.tmp tests/system-userspace-testsuite.at
mv tests/system-userspace-testsuite.tmp tests/system-userspace-testsuite
make[1]: *** No rule to make target `northd/OVN_Northbound.dl', needed by `all-am'.  Stop.
make[1]: Leaving directory `/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace'
make: *** [all] Error 2


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot


Re: [ovs-dev] [PATCH ovn v10 1/6] Export `VLOG_WARN` and `VLOG_ERR` from libovn for use in ddlog

2021-02-17 Thread 0-day Robot
Bleep bloop.  Greetings Ben Pfaff, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Ben Pfaff 
Lines checked: 77, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot


[ovs-dev] [PATCH ovn v10 2/6] tests: Prepare for multiple northd types.

2021-02-17 Thread Ben Pfaff
The idea is to run each test twice, once with ovn-northd, once
with ovn-northd-ddlog.  To do that, we add a macro OVN_FOR_EACH_NORTHD
and bracket each test that uses ovn-northd in it.

Signed-off-by: Ben Pfaff 
Acked-by: Dumitru Ceara 
---
 tests/ovn-ic.at |  11 +-
 tests/ovn-macros.at |  96 +--
 tests/ovn-northd.at | 100 ---
 tests/ovn.at| 288 
 tests/ovs-macros.at |  20 +--
 tests/system-ovn.at | 124 ++-
 6 files changed, 528 insertions(+), 111 deletions(-)

diff --git a/tests/ovn-ic.at b/tests/ovn-ic.at
index 0638af401295..2a4fba031f36 100644
--- a/tests/ovn-ic.at
+++ b/tests/ovn-ic.at
@@ -1,4 +1,5 @@
 AT_BANNER([OVN Interconnection Controller])
+OVN_FOR_EACH_NORTHD([
 AT_SETUP([ovn-ic -- AZ register])
 
 ovn_init_ic_db
@@ -29,7 +30,9 @@ availability-zone az3
 OVN_CLEANUP_IC([az1], [az2])
 
 AT_CLEANUP
+])
 
+OVN_FOR_EACH_NORTHD([
 AT_SETUP([ovn-ic -- transit switch handling])
 
 ovn_init_ic_db
@@ -59,7 +62,9 @@ check_column ts2 nb:Logical_Switch name
 OVN_CLEANUP_IC([az1])
 
 AT_CLEANUP
+])
 
+OVN_FOR_EACH_NORTHD([
 AT_SETUP([ovn-ic -- gateway sync])
 
 ovn_init_ic_db
@@ -120,8 +125,9 @@ OVN_CLEANUP_SBOX(gw2)
 OVN_CLEANUP_IC([az1], [az2])
 
 AT_CLEANUP
+])
 
-
+OVN_FOR_EACH_NORTHD([
 AT_SETUP([ovn-ic -- port sync])
 
 ovn_init_ic_db
@@ -185,7 +191,9 @@ OVN_CLEANUP_SBOX(gw1)
 OVN_CLEANUP_IC([az1], [az2])
 
 AT_CLEANUP
+])
 
+OVN_FOR_EACH_NORTHD([
 AT_SETUP([ovn-ic -- route sync])
 
 ovn_init_ic_db
@@ -310,3 +318,4 @@ OVS_WAIT_WHILE([ovn_as az1 ovn-nbctl lr-route-list lr1 | grep learned | grep 10.
 OVN_CLEANUP_IC([az1], [az2])
 
 AT_CLEANUP
+])
diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
index 2ba29a960edc..a6b908741673 100644
--- a/tests/ovn-macros.at
+++ b/tests/ovn-macros.at
@@ -47,10 +47,12 @@ m4_define([OVN_CLEANUP],[
 OVS_APP_EXIT_AND_WAIT([ovsdb-server])
 
 as northd
-OVS_APP_EXIT_AND_WAIT([ovn-northd])
+OVS_APP_EXIT_AND_WAIT([[$NORTHD_TYPE]])
 
-as northd-backup
-OVS_APP_EXIT_AND_WAIT([ovn-northd])
+if test -d northd-backup; then
+as northd-backup
+OVS_APP_EXIT_AND_WAIT([[$NORTHD_TYPE]])
+fi
 
 OVN_CLEANUP_VSWITCH([main])
 ])
@@ -69,10 +71,12 @@ m4_define([OVN_CLEANUP_AZ],[
 OVS_APP_EXIT_AND_WAIT([ovsdb-server])
 
 as $1/northd
-OVS_APP_EXIT_AND_WAIT([ovn-northd])
+OVS_APP_EXIT_AND_WAIT([[$NORTHD_TYPE]])
 
-as $1/northd-backup
-OVS_APP_EXIT_AND_WAIT([ovn-northd])
+if test -d $1/northd-backup; then
+as $1/northd-backup
+OVS_APP_EXIT_AND_WAIT([[$NORTHD_TYPE]])
+fi
 
 as $1/ic
 OVS_APP_EXIT_AND_WAIT([ovn-ic])
@@ -134,15 +138,48 @@ ovn_init_ic_db () {
 ovn_init_db ovn-ic-sb
 }
 
-# ovn_start [AZ]
+# ovn_start_northd (primary|backup) [AZ]
+ovn_start_northd() {
+local priority=$1
+local AZ=$2
+local msg_prefix=${AZ:+$AZ: }
+local d_prefix=${AZ:+$AZ/}
+
+local suffix=
+case $priority in
+backup) suffix=-backup ;;
+esac
+
+local northd_args=
+case ${NORTHD_TYPE:=ovn-northd} in
+ovn-northd) ;;
+ovn-northd-ddlog) northd_args="--ddlog-record=${AZ:+$AZ/}replay$suffix.dat -v" ;;
+esac
+
+local name=${d_prefix}northd${suffix}
+echo "${prefix}starting $name"
+test -d "$ovs_base/$name" || mkdir "$ovs_base/$name"
+as $name start_daemon $NORTHD_TYPE $northd_args -vjsonrpc \
+   --ovnnb-db=$OVN_NB_DB --ovnsb-db=$OVN_SB_DB
+}
+
+# ovn_start [--no-backup-northd] [AZ]
 #
 # Creates and initializes ovn-sb and ovn-nb databases and starts their
 # ovsdb-server instance, sets appropriate environment variables so that
 # ovn-sbctl and ovn-nbctl use them by default, and starts ovn-northd running
 # against them.
 ovn_start () {
-if test -n "$1"; then
-mkdir "$ovs_base"/$1
+local backup_northd=:
+case $1 in
+--no-backup-northd) backup_northd=false; shift ;;
+esac
+local AZ=$1
+local msg_prefix=${AZ:+$AZ: }
+local d_prefix=${AZ:+$AZ/}
+
+if test -n "$AZ"; then
+mkdir "$ovs_base"/$AZ
 fi
 
 ovn_init_db ovn-sb $1; ovn-sbctl init
@@ -150,36 +187,19 @@ ovn_start () {
 if test -n "$1"; then
 ovn-nbctl set NB_Global . name=$1
 fi
-local ovn_sb_db=$OVN_SB_DB
-local ovn_nb_db=$OVN_NB_DB
 
-local as_d=northd
-if test -n "$1"; then
-as_d=$1/$as_d
+ovn_start_northd primary $AZ
+if $backup_northd; then
+ovn_start_northd backup $AZ
 fi
-echo "starting ovn-northd"
-mkdir "$ovs_base"/$as_d
-as $as_d start_daemon ovn-northd -v \
-   --ovnnb-db=$ovn_nb_db \
-   --ovnsb-db=$ovn_sb_db
 
-as_d=northd-backup
-if test -n "$1"; then
-as_d=$1/$as_d
-fi
-echo "starting backup ovn-northd"
-mkdir "$ovs_base"/$as_d
-as $as_d start_daemon ovn-northd -v \
-   --ovnnb-db=$ovn_nb_db \
-   --ovnsb-db=$ovn_sb_db
+if test -n "$AZ"; then

[ovs-dev] [PATCH ovn v10 0/6] Add DDlog implementation of ovn-northd

2021-02-17 Thread Ben Pfaff
Also available here:
https://github.com/blp/ovs-reviews/tree/ddlog19

v10:
  - Fix linking problem reported by Dumitru and others.
  - Update to ddlog 0.36.0 (thanks Leonid!).
  - Port OVN commits up to 858d1dd716db ("ofctrl: Fix the assert seen when 
flood removing flows.")
  - Fix IGMP and MLD snoop/querier/reply tests (thanks Dumitru!).
  - New commit "ovn-sb: Allow Multicast_Group to have empty set of ports."
  - New commit "ovn-northd: Simplify logic in build_bfd_table()."
  - New commit "tests: Wait for updates in "check BFD config propagation to 
BFD" test."

v9:
  - Add ddlog.stamp to BUILT_SOURCES, to fix the dependencies "for sure".
  - Rebase against ovn/master.
  - Port commit fdf295d5eb3a ("pinctrl: Honor
always_learn_from_arp_request for self created MAC_Bindings.")
  - Update to work with and require v2 of the ovsdb-cs library (see
https://mail.openvswitch.org/pipermail/ovs-dev/2020-December/378180.html).
  - All non-skipped tests pass, although some are still racy.
  - This is also available as the ddlog14 branch here:
https://github.com/blp/ovs-reviews/commits/ddlog14
  - Haven't done performance tests.  I aim to do that for v10.

v8:
  - Rebase against latest ovn master.
  - Port commit f9cab11d5fab ("Allow explicit setting of the SNAT zone
on a gateway router.")
  - Port commit f1119c125765 ("northd: Refactor load balancer vip parsing.")
  - Port commit 1dd27ea7aea4 ("Provide the option to pin
ovn-controller and ovn-northd to a specific version.").
  - Port commit 880dca99eaf7 ("northd: Enhance the implementation of
ACL log meters (pre-ddlog merge).")
  - Make ovn-northd-ddlog use the new ovsdb-cs (client sync) library
proposed for OVS
(https://mail.openvswitch.org/pipermail/ovs-dev/2020-November/377828.html).
In addition to eliminating duplicated code, this should make it
able to connect to a raft cluster.
  - Further improvements to test improvement patch (mostly added "check"s
and AS_BOX(...) to announce parts of a test)
  - Modernize some of the ddlog code a bit, e.g. by using the method-like
calls such as foo.get("key") in place of map_get(foo, "key").

v7:
  - Fixed "make dist" (thanks to Numan for reporting this.)
  - Fix probe interval setting (thanks to Numan again).
  - Small amount of improvement of OVN DDlog code (use lambdas in more places).
  - Fine-tuning of ovn-northd-ddlog code for jsonrpc session usage.
  - Fix newline issues in tests reported by Dumitru.
  - Add ack from Dumitru.

v6:
  - Applied and dropped patch for "test a == b".
  - Applied and dropped patch to allow more args to wait_for_row_count().
  - Applied and dropped ARP/ND broadcast test improvement patch.
  - Fix whitespace in patch to export ddlog_warn() and ddlog_err().

v5:
  - Drop patch for ACL log meters (this was included in v4.1 by mistake).
  - In addition to waiting for database, also wait for ports to come up.
  - New patch to support more arguments to wait_for_row_count that waiting
for ports indirectly needed.
  - Port commit c29221d9322a ("Allow VLAN traffic when LS:vlan-passthru=true").
  - Port commit 36a8745de859 ("Add support for DHCP options 28 (Broadcast 
Address)")
  - Add patch to improve "ARP/ND request broadcast limiting" test.

v4:
  - Fix dependencies for parallel build.
  - Fix spelling error in documentation.
  - Use --wait=sb, not --wait=hv, when no chassis are running.
  - Fixed IGMP and MLD tests by porting commit 9d2e8d32fb98
("ofctrl.c: Fix duplicated flow handling in I-P while merging
opposite changes."), which I had missed before.

Ben Pfaff (4):
  tests: Prepare for multiple northd types.
  tests: Wait for updates in "check BFD config propagation to BFD" test.
  ovn-northd: Simplify logic in build_bfd_table().
  ovn-sb: Allow Multicast_Group to have empty set of ports.

Leonid Ryzhyk (2):
  Export `VLOG_WARN` and `VLOG_ERR` from libovn for use in ddlog
  ovn-northd-ddlog: New implementation of ovn-northd based on ddlog.

 Documentation/automake.mk |2 +
 Documentation/intro/install/general.rst   |   31 +-
 Documentation/topics/debugging-ddlog.rst  |  280 +
 Documentation/topics/index.rst|1 +
 Documentation/tutorials/ddlog-new-feature.rst |  362 +
 Documentation/tutorials/index.rst |1 +
 NEWS  |6 +
 TODO.rst  |6 +
 acinclude.m4  |   43 +
 build-aux/automake.mk |1 +
 build-aux/libtool-deps|   35 +
 configure.ac  |5 +
 lib/ovn-util.c|   15 +
 lib/ovn-util.h|5 +
 m4/ovn.m4 |   16 +
 northd/.gitignore |4 +
 northd/automake.mk|  107 +
 northd/helpers.dl |   96 +

[ovs-dev] [PATCH ovn v10 5/6] ovn-sb: Allow Multicast_Group to have empty set of ports.

2021-02-17 Thread Ben Pfaff
I don't know a good reason to intentionally create an empty multicast
group, but disallowing empty multicast groups has an odd side effect:
you can delete all but one of the ports that a group contains and the
database will happily remove the ports from the group automatically
(because 'ports' is a set of weak references) but if you try to delete
the last port that the group contained, the database server will reject
the whole transaction.  That's really weird.  By allowing a multicast
group with no ports, we avoid this special case.
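
A rough sketch of the odd behaviour described above, using ovn-sbctl's
generic database commands (the UUIDs are placeholders, and in a real
deployment ovn-northd manages these rows; this only illustrates the
schema constraint):

    # 'ports' is a set of weak references, so deleting Port_Binding rows
    # silently drops them from the multicast group...
    ovn-sbctl destroy Port_Binding <port-uuid-1>
    ovn-sbctl destroy Port_Binding <port-uuid-2>
    # ...but deleting the group's last remaining port would leave 'ports'
    # empty, so with "min": 1 ovsdb-server rejects the whole transaction.
    ovn-sbctl destroy Port_Binding <port-uuid-3>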

Signed-off-by: Ben Pfaff 
---
 ovn-sb.ovsschema | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
index 0d20f0826646..07eb54b10dd4 100644
--- a/ovn-sb.ovsschema
+++ b/ovn-sb.ovsschema
@@ -1,7 +1,7 @@
 {
 "name": "OVN_Southbound",
-"version": "20.15.0",
-"cksum": "539683023 25965",
+"version": "20.16.0",
+"cksum": "3127541865 25965",
 "tables": {
 "SB_Global": {
 "columns": {
@@ -133,7 +133,7 @@
 "ports": {"type": {"key": {"type": "uuid",
"refTable": "Port_Binding",
"refType": "weak"},
-   "min": 1, "max": "unlimited"}}},
+   "min": 0, "max": "unlimited"}}},
 "indexes": [["datapath", "tunnel_key"],
 ["datapath", "name"]],
 "isRoot": true},
-- 
2.28.0



[ovs-dev] [PATCH ovn v10 3/6] tests: Wait for updates in "check BFD config propagation to BFD" test.

2021-02-17 Thread Ben Pfaff
Otherwise this test is racy because it assumes that northd finishes
its updates between the ovn-nbctl changes and the subsequent checks.

Also, simplify some series of "check_column" into single
"check_row_count" calls (that then become "wait_row_count").

Signed-off-by: Ben Pfaff 
---
 tests/ovn-northd.at | 44 ++--
 1 file changed, 14 insertions(+), 30 deletions(-)

diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index e4fceff037a8..1a941efe9b5c 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -2463,55 +2463,39 @@ ovn-nbctl create bfd logical_port=r0-sw2 dst_ip=192.168.20.2 status=down min_tx=
 ovn-nbctl create bfd logical_port=r0-sw3 dst_ip=192.168.30.2 status=down
 ovn-nbctl create bfd logical_port=r0-sw4 dst_ip=192.168.40.2 status=down min_tx=0 detect_mult=0
 
-check_column 10 bfd detect_mult logical_port=r0-sw1
-check_column "192.168.10.2" bfd dst_ip logical_port=r0-sw1
-check_column 250 bfd min_rx logical_port=r0-sw1
-check_column 250 bfd min_tx logical_port=r0-sw1
-check_column admin_down bfd status logical_port=r0-sw1
-
-check_column 20 bfd detect_mult logical_port=r0-sw2
-check_column "192.168.20.2" bfd dst_ip logical_port=r0-sw2
-check_column 500 bfd min_rx logical_port=r0-sw2
-check_column 500 bfd min_tx logical_port=r0-sw2
-check_column admin_down bfd status logical_port=r0-sw2
-
-check_column 5 bfd detect_mult logical_port=r0-sw3
-check_column "192.168.30.2" bfd dst_ip logical_port=r0-sw3
-check_column 1000 bfd min_rx logical_port=r0-sw3
-check_column 1000 bfd min_tx logical_port=r0-sw3
-check_column admin_down bfd status logical_port=r0-sw3
+wait_row_count bfd 1 logical_port=r0-sw1 detect_mult=10 dst_ip=192.168.10.2 \
+ min_rx=250 min_tx=250 status=admin_down
+wait_row_count bfd 1 logical_port=r0-sw2 detect_mult=20 dst_ip=192.168.20.2 \
+ min_rx=500 min_tx=500 status=admin_down
+wait_row_count bfd 1 logical_port=r0-sw3 detect_mult=5 dst_ip=192.168.30.2 \
+ min_rx=1000 min_tx=1000 status=admin_down
 
 uuid=$(fetch_column nb:bfd _uuid logical_port=r0-sw1)
-check ovn-nbctl set bfd $uuid min_tx=1000
-check ovn-nbctl set bfd $uuid min_rx=1000
-check ovn-nbctl set bfd $uuid detect_mult=100
+check ovn-nbctl set bfd $uuid min_tx=1000 min_rx=1000 detect_mult=100
 
 uuid_2=$(fetch_column nb:bfd _uuid logical_port=r0-sw2)
 check ovn-nbctl clear bfd $uuid_2 min_rx
-check_column 1000 bfd min_rx logical_port=r0-sw2
-
-check_column 1000 bfd min_tx logical_port=r0-sw1
-check_column 1000 bfd min_rx logical_port=r0-sw1
-check_column 100 bfd detect_mult logical_port=r0-sw1
+wait_row_count bfd 1 logical_port=r0-sw2 min_rx=1000
+wait_row_count bfd 1 logical_port=r0-sw1 min_rx=1000 min_tx=1000 detect_mult=100
 
 check ovn-nbctl --bfd=$uuid lr-route-add r0 100.0.0.0/8 192.168.10.2
-check_column down bfd status logical_port=r0-sw1
+wait_column down bfd status logical_port=r0-sw1
 AT_CHECK([ovn-nbctl lr-route-list r0 | grep 192.168.10.2 | grep -q bfd],[0])
 
 check ovn-nbctl --bfd lr-route-add r0 200.0.0.0/8 192.168.20.2
-check_column down bfd status logical_port=r0-sw2
+wait_column down bfd status logical_port=r0-sw2
 AT_CHECK([ovn-nbctl lr-route-list r0 | grep 192.168.20.2 | grep -q bfd],[0])
 
 check ovn-nbctl --bfd lr-route-add r0 240.0.0.0/8 192.168.50.2 r0-sw5
-check_column down bfd status logical_port=r0-sw5
+wait_column down bfd status logical_port=r0-sw5
 AT_CHECK([ovn-nbctl lr-route-list r0 | grep 192.168.50.2 | grep -q bfd],[0])
 
 route_uuid=$(fetch_column nb:logical_router_static_route _uuid ip_prefix="100.0.0.0/8")
 check ovn-nbctl clear logical_router_static_route $route_uuid bfd
-check_column admin_down bfd status logical_port=r0-sw1
+wait_column admin_down bfd status logical_port=r0-sw1
 
 ovn-nbctl destroy bfd $uuid
-check_row_count bfd 3
+wait_row_count bfd 3
 
 AT_CLEANUP
 ])
-- 
2.28.0



[ovs-dev] [PATCH ovn v10 1/6] Export `VLOG_WARN` and `VLOG_ERR` from libovn for use in ddlog

2021-02-17 Thread Ben Pfaff
From: Leonid Ryzhyk 

Export `ddlog_warn` and `ddlog_err` functions that are just wrappers
around `VLOG_WARN` and `VLOG_ERR`.  This is not ideal because the
functions are exported by `ovn_util.c` and the resulting log messages use
`ovn_util` as module name.  More importantly, these functions do not do
log rate limiting.

Signed-off-by: Leonid Ryzhyk 
Signed-off-by: Ben Pfaff 
Acked-by: Dumitru Ceara 
---
 TODO.rst   |  6 ++
 lib/ovn-util.c | 15 +++
 lib/ovn-util.h |  5 +
 3 files changed, 26 insertions(+)

diff --git a/TODO.rst b/TODO.rst
index c15815539f4f..ecfe62870fb0 100644
--- a/TODO.rst
+++ b/TODO.rst
@@ -153,6 +153,12 @@ OVN To-do List
   hashtable lookup in parse_port_group() which can be avoided when we are sure
   that the Southbound DB uses the new format.
 
+* ovn-northd-ddlog: Calls to warn() and err() from DDlog code would be
+  better refactored to use the Warning[] relation (and introduce an
+  Error[] relation once we want to issue some errors that way).  This
+  would be easier with some improvements in DDlog to more easily
+  output to multiple relations from a single production.
+
 * IP Multicast Relay
 
   * When connecting bridged logical switches (localnet) to logical routers
diff --git a/lib/ovn-util.c b/lib/ovn-util.c
index b6471063ef7e..a5e0ecd214b2 100644
--- a/lib/ovn-util.c
+++ b/lib/ovn-util.c
@@ -758,3 +758,18 @@ ovn_get_internal_version(void)
  sbrec_get_db_version(),
  N_OVNACTS, OVN_INTERNAL_MINOR_VER);
 }
+
+#ifdef DDLOG
+/* Callbacks used by the ddlog northd code to print warnings and errors. */
+void
+ddlog_warn(const char *msg)
+{
+VLOG_WARN("%s", msg);
+}
+
+void
+ddlog_err(const char *msg)
+{
+VLOG_ERR("%s", msg);
+}
+#endif
diff --git a/lib/ovn-util.h b/lib/ovn-util.h
index 08234611838a..55c7bf382a95 100644
--- a/lib/ovn-util.h
+++ b/lib/ovn-util.h
@@ -261,4 +261,9 @@ BUILD_ASSERT_DECL(SCTP_INIT_CHUNK_LEN == sizeof(struct sctp_init_chunk));
 /* See RFC 4960 Sections 3.3.7 and 8.5.1 for information on this flag. */
 #define SCTP_ABORT_CHUNK_FLAG_T (1 << 0)
 
+#ifdef DDLOG
+void ddlog_warn(const char *msg);
+void ddlog_err(const char *msg);
+#endif
+
 #endif
-- 
2.28.0



[ovs-dev] [PATCH ovn v10 4/6] ovn-northd: Simplify logic in build_bfd_table().

2021-02-17 Thread Ben Pfaff
The code here checked whether 'bfd_e' was null even though that wasn't
necessary given the control flow just above it.

Signed-off-by: Ben Pfaff 
---
 northd/ovn-northd.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 39d798782002..8c74c9fd66b9 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -7457,15 +7457,15 @@ build_bfd_table(struct northd_context *ctx, struct hmap *bfd_connections,
 int d_mult = nb_bt->n_detect_mult ? nb_bt->detect_mult[0]
   : BFD_DEF_DETECT_MULT;
 sbrec_bfd_set_detect_mult(sb_bt, d_mult);
-} else if (strcmp(bfd_e->sb_bt->status, nb_bt->status)) {
-if (!strcmp(nb_bt->status, "admin_down") ||
-!strcmp(bfd_e->sb_bt->status, "admin_down")) {
-sbrec_bfd_set_status(bfd_e->sb_bt, nb_bt->status);
-} else {
-nbrec_bfd_set_status(nb_bt, bfd_e->sb_bt->status);
+} else {
+if (strcmp(bfd_e->sb_bt->status, nb_bt->status)) {
+if (!strcmp(nb_bt->status, "admin_down") ||
+!strcmp(bfd_e->sb_bt->status, "admin_down")) {
+sbrec_bfd_set_status(bfd_e->sb_bt, nb_bt->status);
+} else {
+nbrec_bfd_set_status(nb_bt, bfd_e->sb_bt->status);
+}
 }
-}
-if (bfd_e) {
 build_bfd_update_sb_conf(nb_bt, bfd_e->sb_bt);
 
 hmap_remove(&sb_only, &bfd_e->hmap_node);
-- 
2.28.0



Re: [ovs-dev] [RFC ovn] northd: set max priority for automatic routes

2021-02-17 Thread Han Zhou
On Wed, Feb 17, 2021 at 2:36 PM Tim Rozet  wrote:
>
>
>
>
> On Tue, Feb 16, 2021 at 4:21 AM Han Zhou  wrote:
>>
>>
>>
>> On Wed, Feb 10, 2021 at 2:17 PM Lorenzo Bianconi <
lorenzo.bianc...@redhat.com> wrote:
>> >
>> > Increase priority for automatic routes (routes created assigning IP
>> > addresses to OVN logical router interfaces) in order to always prefer
them
>> > over static routes since the router has a direct link to the
destination
>> > address (possible use-case can be found here [0]).
>> >
>> > [0] https://bugzilla.redhat.com/show_bug.cgi?id=1891516
>> >
>>
>> Hi Lorenzo, Tim,
>>
>> While the patch may solve the problem in the bug report, I think there
is something more fundamental to be discussed. The problem is caused by
ovn-k8s's use of "src-ip" in static routes which overrides the
direct-connected route. I think the implementation of "src-ip" support in
the static route is somehow flawed. The priorities of the flows generated
by static routes are calculated according to the prefix length, so that
longest prefix routes are prioritised when there are multiple route
matches, which is correct when comparing matches among "dst-ip" routes or
among "src-ip" routes, but is not correct between "dst-ip" and "src-ip"
routes. Comparison of prefix length between these two types of static
routes doesn't make sense, because they match by different fields (src-ip
v.s. dst-ip). For example, when there are static routes:
>> 1. 192.168.0.0/24 via 100.64.0.1 src-ip
>> 2. 10.0.0.0/20 via 100.64.0.2 dst-ip
>>
>> In this example, a packet from 192.168.0.1 to 10.0.0.1 matches both
routes, but it is unreasonable to say it should follow the 1st route just
because it has longer prefix length. Instead, we should prioritize one type
over the other. It seems in physical router implementation policy based
routing always has higher priority than destination routing, so we should
probably enforce it in a similar way in OVN, i.e. set "src-ip" flows with
higher priority than all the "dst-ip" flows. In fact, the policy routing
table already supported this behavior because it is examined before the
static route table.
>>
>> Since the "src-ip" feature in the static route table is flawed, and can
be replaced by the policy routing table, I'd suggest to deprecate it. For
correctness, users (like ovn-k8s) should use the policy routing table
instead for the src-ip based routing rules. Users have full control of how
they want the packets to be routed. For the use case mentioned in the bug
report, it should have two rules in the policy routing table:
>>
>>
>> 100 ip.dst == 100.64.0.0/16 accept # for directly connected destination,
skip the src-ip rules
>> 90   ip.src == 10.244.0.0/24 nexthop 100.64.0.1 # src-ip rules
>>
>> Would this better satisfy the need of ovn-k8s?
>
>
> I believe this is correct. src-ip matching should be done in the policy
table so traditional dest based routing is handled in default routing
table. Need to go double check though.
>
>>
>> If the above is agreed, the priority change of directly connected routes
from this patch is irrelevant to the problem reported in the bug, because
policy routing rules are examined before the static routing table anyway,
so no matter how high the route priority is, it wouldn't matter. In
addition, it seems to me no real use cases would benefit from this change,
but I could be wrong and please correct me if so.
>>
> I disagree with this. Trying to override a directly connected route is
fundamentally wrong, which is why real routers specifically stop a user
from being able to do this. What if a user who had a router attached to
100.64.0.0/16, adds a /32 route for 100.64.0.1 via another
interface/subnet? That would take precedence over the directly attached
route in OVN iiuc and pretty much guarantee improper networking. Directly
connected routes should always take precedence, and therefore the default
route lflows that get installed should always have the highest possible
priority.
>

Hi Tim,

Thanks for your inputs. Here are my thoughts:

In the scenario we discussed, it is in fact the same output interfaces but
just different nexthops in the 10.64.0.0/16 subnet. In this case, adding a
more specific route with a specific nexthop on the directly connected
subnet doesn't seem to be violating any routing principle, right? Use the
topology in the bug report as an example (in the diagram of the bug report
I think there is a typo: 100.64.0.1 should be the DR's output port IP, and
100.64.0.2 & 100.64.0.3 belong to GR1 and GR2). 10.64.0.0/16 is directly
connected, but the adjacent L2 network doesn't have to have all the /16 IPs
directly connected. Some of the nodes can reside behind a L3 hop. For
example, if we know that 10.64.1.0/24 is behind a router with IP
10.64.255.1, we can still add a route: 10.64.1.0/24 via 10.64.255.1, which
should work with the current OVN implementation, while this patch would in
fact break it.
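
As a concrete sketch, such a route could be added with ovn-nbctl
(assuming a logical router named "DR"):

    ovn-nbctl lr-route-add DR 10.64.1.0/24 10.64.255.1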

On the other hand, what I wanted to emphasize is not that we want 

Re: [ovs-dev] [RFC ovn] northd: set max priority for automatic routes

2021-02-17 Thread Lorenzo Bianconi
> On Tue, Feb 16, 2021 at 4:21 AM Han Zhou  wrote:
>>
>>
>>
>> On Wed, Feb 10, 2021 at 2:17 PM Lorenzo Bianconi 
>>  wrote:
>> >
>> > Increase priority for automatic routes (routes created assigning IP
>> > addresses to OVN logical router interfaces) in order to always prefer them
>> > over static routes since the router has a direct link to the destination
>> > address (possible use-case can be found here [0]).
>> >
>> > [0] https://bugzilla.redhat.com/show_bug.cgi?id=1891516
>> >
>>
>> Hi Lorenzo, Tim,

Hi Han,

thx for the detailed reply :)

>>
>> While the patch may solve the problem in the bug report, I think there is 
>> something more fundamental to be discussed. The problem is caused by 
>> ovn-k8s's use of "src-ip" in static routes which overrides the 
>> direct-connected route. I think the implementation of "src-ip" support in 
>> the static route is somehow flawed. The priorities of the flows generated by 
>> static routes are calculated according to the prefix length, so that longest 
>> prefix routes are prioritised when there are multiple route matches, which 
>> is correct when comparing matches among "dst-ip" routes or among "src-ip" 
>> routes, but is not correct between "dst-ip" and "src-ip" routes. Comparison 
>> of prefix length between these two types of static routes doesn't make 
>> sense, because they match by different fields (src-ip v.s. dst-ip). For 
>> example, when there are static routes:
>> 1. 192.168.0.0/24 via 100.64.0.1 src-ip
>> 2. 10.0.0.0/20 via 100.64.0.2 dst-ip
>>
>> In this example, a packet from 192.168.0.1 to 10.0.0.1 matches both routes, 
>> but it is unreasonable to say it should follow the 1st route just because it 
>> has longer prefix length. Instead, we should prioritize one type over the 
>> other. It seems in physical router implementation policy based routing 
>> always has higher priority than destination routing, so we should probably 
>> enforce it in a similar way in OVN, i.e. set "src-ip" flows with higher 
>> priority than all the "dst-ip" flows. In fact, the policy routing table 
>> already supported this behavior because it is examined before the static 
>> route table.
>>
>> Since the "src-ip" feature in the static route table is flawed, and can be 
>> replaced by the policy routing table, I'd suggest to deprecate it. For 
>> correctness, users (like ovn-k8s) should use the policy routing table 
>> instead for the src-ip based routing rules. Users have full control of how 
>> they want the packets to be routed. For the use case mentioned in the bug 
>> report, it should have two rules in the policy routing table:
>>
>>
>> 100 ip.dst == 100.64.0.0/16 accept # for directly connected destination, 
>> skip the src-ip rules
>> 90   ip.src == 10.244.0.0/24 nexthop 100.64.0.1 # src-ip rules
>>
>> Would this better satisfy the need of ovn-k8s?
>
>
> I believe this is correct. src-ip matching should be done in the policy table 
> so traditional dest based routing is handled in default routing table. Need 
> to go double check though.

I agree on this point.

>
>>
>> If the above is agreed, the priority change of directly connected routes 
>> from this patch is irrelevant to the problem reported in the bug, because 
>> policy routing rules are examined before the static routing table anyway, so 
>> no matter how high the route priority is, it wouldn't matter. In addition, 
>> it seems to me no real use cases would benefit from this change, but I could 
>> be wrong and please correct me if so.
>>
> I disagree with this. Trying to override a directly connected route is 
> fundamentally wrong, which is why real routers specifically stop a user from 
> being able to do this. What if a user who had a router attached to 
> 100.64.0.0/16, adds a /32 route for 100.64.0.1 via another interface/subnet? 
> That would take precedence over the directly attached route in OVN iiuc and 
> pretty much guarantee improper networking. Directly connected routes should 
> always take precedence, and therefore the default route lflows that get 
> installed should always have the highest possible priority.

I do not have a strong opinion on it since I do not have any use cases
for overwriting an automatic route with a static one.
@Dumitru Ceara IIRC you mentioned a use case for it, right?

Regards,
Lorenzo

>
>>
>> Thanks,
>> Han
>>
>> > Signed-off-by: Lorenzo Bianconi 
>> > ---
>> >  northd/ovn-northd.c | 27 ---
>> >  1 file changed, 16 insertions(+), 11 deletions(-)
>> >
>> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
>> > index b2b5f6a1b..dc8706f2f 100644
>> > --- a/northd/ovn-northd.c
>> > +++ b/northd/ovn-northd.c
>> > @@ -7994,18 +7994,23 @@ build_route_prefix_s(const struct in6_addr 
>> > *prefix, unsigned int plen)
>> >
>> >  static void
>> >  build_route_match(const struct ovn_port *op_inport, const char *network_s,
>> > -  int plen, bool is_src_route, bool is_ipv4, struct ds 
>> > *match,
>> > -  

Re: [ovs-dev] [RFC ovn] northd: set max priority for automatic routes

2021-02-17 Thread Tim Rozet
On Tue, Feb 16, 2021 at 4:21 AM Han Zhou  wrote:

>
>
> On Wed, Feb 10, 2021 at 2:17 PM Lorenzo Bianconi <
> lorenzo.bianc...@redhat.com> wrote:
> >
> > Increase priority for automatic routes (routes created assigning IP
> > addresses to OVN logical router interfaces) in order to always prefer
> them
> > over static routes since the router has a direct link to the destination
> > address (possible use-case can be found here [0]).
> >
> > [0] https://bugzilla.redhat.com/show_bug.cgi?id=1891516
> >
>
> Hi Lorenzo, Tim,
>
> While the patch may solve the problem in the bug report, I think there is
> something more fundamental to be discussed. The problem is caused by
> ovn-k8s's use of "src-ip" in static routes which overrides the
> direct-connected route. I think the implementation of "src-ip" support in
> the static route is somehow flawed. The priorities of the flows generated
> by static routes are calculated according to the prefix length, so that
> longest prefix routes are prioritised when there are multiple route
> matches, which is correct when comparing matches among "dst-ip" routes or
> among "src-ip" routes, but is not correct between "dst-ip" and "src-ip"
> routes. Comparison of prefix length between these two types of static
> routes doesn't make sense, because they match by different fields (src-ip
> v.s. dst-ip). For example, when there are static routes:
> 1. 192.168.0.0/24 via 100.64.0.1 src-ip
> 2. 10.0.0.0/20 via 100.64.0.2 dst-ip
>
> In this example, a packet from 192.168.0.1 to 10.0.0.1 matches both
> routes, but it is unreasonable to say it should follow the 1st route just
> because it has longer prefix length. Instead, we should prioritize one type
> over the other. It seems in physical router implementation policy based
> routing always has higher priority than destination routing, so we should
> probably enforce it in a similar way in OVN, i.e. set "src-ip" flows with
> higher priority than all the "dst-ip" flows. In fact, the policy routing
> table already supported this behavior because it is examined before the
> static route table.
>
> Since the "src-ip" feature in the static route table is flawed, and can be
> replaced by the policy routing table, I'd suggest to deprecate it. For
> correctness, users (like ovn-k8s) should use the policy routing table
> instead for the src-ip based routing rules. Users have full control of how
> they want the packets to be routed. For the use case mentioned in the bug
> report, it should have two rules in the policy routing table:
>

> 100 ip.dst == 100.64.0.0/16 accept # for directly connected destination,
> skip the src-ip rules
> 90   ip.src == 10.244.0.0/24 nexthop 100.64.0.1 # src-ip rules
>
> Would this better satisfy the need of ovn-k8s?
>

I believe this is correct. src-ip matching should be done in the policy
table so traditional dest based routing is handled in default routing
table. Need to go double check though.
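
A rough sketch of those two policy-table rules with ovn-nbctl (assuming a
logical router named "DR"; OVN's policy verdicts are allow/drop/reroute, so
"allow" stands in for "accept" above):

    ovn-nbctl lr-policy-add DR 100 "ip4.dst == 100.64.0.0/16" allow
    ovn-nbctl lr-policy-add DR 90 "ip4.src == 10.244.0.0/24" reroute 100.64.0.1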


> If the above is agreed, the priority change of directly connected routes
> from this patch is irrelevant to the problem reported in the bug, because
> policy routing rules are examined before the static routing table anyway,
> so no matter how high the route priority is, it wouldn't matter. In
> addition, it seems to me no real use cases would benefit from this change,
> but I could be wrong and please correct me if so.
>
> I disagree with this. Trying to override a directly connected route is
fundamentally wrong, which is why real routers specifically stop a user
from being able to do this. What if a user who had a router attached to
100.64.0.0/16, adds a /32 route for 100.64.0.1 via another
interface/subnet? That would take precedence over the directly attached
route in OVN iiuc and pretty much guarantee improper networking. Directly
connected routes should always take precedence, and therefore the default
route lflows that get installed should always have the highest possible
priority.


> Thanks,
> Han
>
> > Signed-off-by: Lorenzo Bianconi 
> > ---
> >  northd/ovn-northd.c | 27 ---
> >  1 file changed, 16 insertions(+), 11 deletions(-)
> >
> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > index b2b5f6a1b..dc8706f2f 100644
> > --- a/northd/ovn-northd.c
> > +++ b/northd/ovn-northd.c
> > @@ -7994,18 +7994,23 @@ build_route_prefix_s(const struct in6_addr
> *prefix, unsigned int plen)
> >
> >  static void
> >  build_route_match(const struct ovn_port *op_inport, const char
> *network_s,
> > -  int plen, bool is_src_route, bool is_ipv4, struct ds
> *match,
> > -  uint16_t *priority)
> > +  int plen, bool is_src_route, bool is_ipv4, bool
> automatic,
> > +  struct ds *match, uint16_t *priority)
> >  {
> > +int prefix_len = plen;
> >  const char *dir;
> >  /* The priority here is calculated to implement longest-prefix-match
> > - * routing. */
> > + * routing. Automatic routes have max priority */
> > +

Re: [ovs-dev] [PATCH] github: Run clang test with AddressSanitizer enabled.

2021-02-17 Thread Ilya Maximets
On 1/27/21 12:03 PM, Ilya Maximets wrote:
> On 1/27/21 11:42 AM, Dumitru Ceara wrote:
>> On 1/22/21 8:04 PM, Ilya Maximets wrote:
>>> This commit is based on a similar one from OVN by Dumitru Ceara:
>>>    a429b24f7bf5 ("ci: Enable AddressSanitizer in Linux clang CI test runs.")
>>>
>>> It's useful to run testsuite with address sanitizer enabled to catch
>>> memory leaks and invalid memory accesses.  Skipping re-check if
>>> AddressSanitizer reports are present in the test run directory to
>>> not lose them.
>>>
>>> Right now OVS has no memory leaks detected on a testsuite run with -O1.
>>> With -O2 there are few false-positive leak reports in test-ovsdb
>>> application, so not using this optimization level for now.  For the
>>> same reason not enabling leak detection by default for everyone.
>>> Enabled only in CI.
>>>
>>> AddressSanitizer increases execution time for this job from ~12 to ~16
>>> minutes, but it looks like a reasonable sacrifice.
>>>
>>> Signed-off-by: Ilya Maximets 
>>> ---
>>
>> Looks good to me, thanks!
>>
>> Acked-by: Dumitru Ceara 
>>
> 
> Thanks, Dumitru.
> 
> I'll defer applying of this patch for now.  The 'bfd decay' test is unstable
> while running under AddressSanitizer.  It was always too time-sensitive.
> We need to look at this test and fix it or disable for this CI job.

This test didn't fail for me for the last couple of weeks, so maybe
the problem is not that frequent.  Taking that into account, I think
it's OK to apply this patch.

Applied to master.

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH] connmgr: Check nullptr inside ofmonitor_report()

2021-02-17 Thread Yifeng Sun
Thanks William and YiHung for review, I sent a new version.

On Tue, Feb 16, 2021 at 8:45 PM William Tu  wrote:

> On Tue, Feb 16, 2021 at 1:40 PM Yi-Hung Wei  wrote:
> >
> > On Tue, Feb 16, 2021 at 1:06 PM Yifeng Sun 
> wrote:
> > >
> > > ovs-vswitchd could crash under these circumstances:
> > > 1. When one bridge is being destroyed, ofproto_destroy() is called and
> > > connmgr pointer of its ofproto struct is nullified. This ofproto
> struct is
> > > deallocated through 'ovsrcu_postpone(ofproto_destroy_defer__, p);'.
> > > 2. Before RCU enters quiesce state to actually free this ofproto
> struct,
> > > revalidator thread calls udpif_revalidator(), which could handle
> > > a learn flow and calls ofproto_flow_mod_learn(), it later calls
> > > ofmonitor_report() and ofproto struct's connmgr pointer is accessed.
> > >
>
> LGTM, thanks. I guess this is hard to reproduce or create a test.
> Do we need to worry about other places that use 'ofproto->connmgr'?
> ex: there are a couple of places calling ofmonitor_flush(ofproto->connmgr);
>
> > > The crash stack trace is shown below:
> > >
> > > 0  ofmonitor_report (mgr=0x0, rule=rule@entry=0x7fa4ac067c30,
> event=event@entry=NXFME_ADDED,
> > > reason=reason@entry=OFPRR_IDLE_TIMEOUT, abbrev_ofconn=0x0,
> abbrev_xid=0, old_actions=old_actions@entry=0x0)
> > > at ofproto/connmgr.c:2160
> > > 1  0x7fa4d6803495 in add_flow_finish (ofproto=0x55d9075d4ab0,
> ofm=, req=req@entry=0x0)
> > > at ofproto/ofproto.c:5221
> > > 2  0x7fa4d68036af in modify_flows_finish (req=0x0,
> ofm=0x7fa4980753f0, ofproto=0x55d9075d4ab0)
> > > at ofproto/ofproto.c:5823
> > > 3  ofproto_flow_mod_finish (ofproto=0x55d9075d4ab0, 
> > > ofm=ofm@entry=0x7fa4980753f0,
> req=req@entry=0x0)
> > > at ofproto/ofproto.c:8088
> > > 4  0x7fa4d680372d in ofproto_flow_mod_learn_finish (ofm=ofm@entry
> =0x7fa4980753f0,
> > > orig_ofproto=orig_ofproto@entry=0x0) at ofproto/ofproto.c:5439
> > > 5  0x7fa4d68072f9 in ofproto_flow_mod_learn (ofm=0x7fa4980753f0,
> keep_ref=keep_ref@entry=true,
> > > limit=, below_limitp=below_limitp@entry=0x0) at
> ofproto/ofproto.c:5499
> > > 6  0x7fa4d6835d33 in xlate_push_stats_entry (entry=0x7fa498012448,
> stats=stats@entry=0x7fa4d2701a10,
> > > offloaded=offloaded@entry=false) at
> ofproto/ofproto-dpif-xlate-cache.c:127
> > > 7  0x7fa4d6835e3a in xlate_push_stats (xcache=,
> stats=stats@entry=0x7fa4d2701a10,
> > > offloaded=offloaded@entry=false) at
> ofproto/ofproto-dpif-xlate-cache.c:181
> > > 8  0x7fa4d6822046 in revalidate_ukey 
> > > (udpif=udpif@entry=0x55d90760b240,
> ukey=ukey@entry=0x7fa4b0191660,
> > > stats=stats@entry=0x7fa4d2705118, odp_actions=odp_actions@entry
> =0x7fa4d2701b50,
> > > reval_seq=reval_seq@entry=5655486242, 
> > > recircs=recircs@entry=0x7fa4d2701b40,
> offloaded=false)
> > > at ofproto/ofproto-dpif-upcall.c:2294
> > > 9  0x7fa4d6825aee in revalidate (revalidator=0x55d90769dd00) at
> ofproto/ofproto-dpif-upcall.c:2683
> > > 10 0x7fa4d6825cf3 in udpif_revalidator (arg=0x55d90769dd00) at
> ofproto/ofproto-dpif-upcall.c:936
> > > 11 0x7fa4d6259c9f in ovsthread_wrapper (aux_=) at
> lib/ovs-thread.c:423
> > > 12 0x7fa4d582cea5 in start_thread () from
> /usr/lib64/libpthread.so.0
> > > 13 0x7fa4d504b96d in clone () from /usr/lib64/libc.so.6
> > >
> > > At the time of crash, the involved ofproto was already deallocated:
> > >
> > > (gdb) print *ofproto
> > > $1 = ..., name = 0x55d907602820 "nsx-managed", ..., ports = {...,
> > > one = 0x0, mask = 63, n = 0}, ..., connmgr = 0x0, ...
> > >
> > > This patch fixes it.
> > >
> > > VMware-BZ: #2700626
> > > Signed-off-by: Yifeng Sun 
> > > ---
> >
> > LGTM.
> >
> > Acked-by: Yi-Hung Wei 
> > ___
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>


[ovs-dev] [PATCHv2] connmgr: Check nullptr inside ofmonitor_report()

2021-02-17 Thread Yifeng Sun
ovs-vswitchd could crash under these circumstances:
1. When one bridge is being destroyed, ofproto_destroy() is called and
connmgr pointer of its ofproto struct is nullified. This ofproto struct is
deallocated through 'ovsrcu_postpone(ofproto_destroy_defer__, p);'.
2. Before RCU enters quiesce state to actually free this ofproto struct,
revalidator thread calls udpif_revalidator(), which could handle
a learn flow and calls ofproto_flow_mod_learn(), it later calls
ofmonitor_report() and ofproto struct's connmgr pointer is accessed.

The crash stack trace is shown below:

0  ofmonitor_report (mgr=0x0, rule=rule@entry=0x7fa4ac067c30, 
event=event@entry=NXFME_ADDED,
reason=reason@entry=OFPRR_IDLE_TIMEOUT, abbrev_ofconn=0x0, abbrev_xid=0, 
old_actions=old_actions@entry=0x0)
at ofproto/connmgr.c:2160
1  0x7fa4d6803495 in add_flow_finish (ofproto=0x55d9075d4ab0, 
ofm=, req=req@entry=0x0)
at ofproto/ofproto.c:5221
2  0x7fa4d68036af in modify_flows_finish (req=0x0, ofm=0x7fa4980753f0, 
ofproto=0x55d9075d4ab0)
at ofproto/ofproto.c:5823
3  ofproto_flow_mod_finish (ofproto=0x55d9075d4ab0, 
ofm=ofm@entry=0x7fa4980753f0, req=req@entry=0x0)
at ofproto/ofproto.c:8088
4  0x7fa4d680372d in ofproto_flow_mod_learn_finish 
(ofm=ofm@entry=0x7fa4980753f0,
orig_ofproto=orig_ofproto@entry=0x0) at ofproto/ofproto.c:5439
5  0x7fa4d68072f9 in ofproto_flow_mod_learn (ofm=0x7fa4980753f0, 
keep_ref=keep_ref@entry=true,
limit=, below_limitp=below_limitp@entry=0x0) at 
ofproto/ofproto.c:5499
6  0x7fa4d6835d33 in xlate_push_stats_entry (entry=0x7fa498012448, 
stats=stats@entry=0x7fa4d2701a10,
offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:127
7  0x7fa4d6835e3a in xlate_push_stats (xcache=, 
stats=stats@entry=0x7fa4d2701a10,
offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:181
8  0x7fa4d6822046 in revalidate_ukey (udpif=udpif@entry=0x55d90760b240, 
ukey=ukey@entry=0x7fa4b0191660,
stats=stats@entry=0x7fa4d2705118, 
odp_actions=odp_actions@entry=0x7fa4d2701b50,
reval_seq=reval_seq@entry=5655486242, recircs=recircs@entry=0x7fa4d2701b40, 
offloaded=false)
at ofproto/ofproto-dpif-upcall.c:2294
9  0x7fa4d6825aee in revalidate (revalidator=0x55d90769dd00) at 
ofproto/ofproto-dpif-upcall.c:2683
10 0x7fa4d6825cf3 in udpif_revalidator (arg=0x55d90769dd00) at 
ofproto/ofproto-dpif-upcall.c:936
11 0x7fa4d6259c9f in ovsthread_wrapper (aux_=) at 
lib/ovs-thread.c:423
12 0x7fa4d582cea5 in start_thread () from /usr/lib64/libpthread.so.0
13 0x7fa4d504b96d in clone () from /usr/lib64/libc.so.6

At the time of crash, the involved ofproto was already deallocated:

(gdb) print *ofproto
$1 = ..., name = 0x55d907602820 "nsx-managed", ..., ports = {...,
one = 0x0, mask = 63, n = 0}, ..., connmgr = 0x0, ...

This patch fixes it.

VMware-BZ: #2700626
Signed-off-by: Yifeng Sun 
---
v1->v2: Add check for ofmonitor_flush, thanks William.

 ofproto/connmgr.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
index 9c5c633b4171..fa8f6cd0e83a 100644
--- a/ofproto/connmgr.c
+++ b/ofproto/connmgr.c
@@ -2140,7 +2140,7 @@ ofmonitor_report(struct connmgr *mgr, struct rule *rule,
  const struct rule_actions *old_actions)
 OVS_REQUIRES(ofproto_mutex)
 {
-if (rule_is_hidden(rule)) {
+if (!mgr || rule_is_hidden(rule)) {
 return;
 }
 
@@ -2244,6 +2244,10 @@ ofmonitor_flush(struct connmgr *mgr)
 {
 struct ofconn *ofconn;
 
+if (!mgr) {
+return;
+}
+
 LIST_FOR_EACH (ofconn, connmgr_node, &mgr->conns) {
 struct rconn_packet_counter *counter = ofconn->monitor_counter;
 
-- 
2.7.4



[ovs-dev] [PATCH v2 ovn] controller: introduce coverage_counters for ovn-controller incremental processing

2021-02-17 Thread Lorenzo Bianconi
To help in understanding system behaviour for debugging purposes,
introduce coverage counters for the run handlers of ovn-controller
I-P engine nodes. Moreover, add a global counter for engine aborts.

https://bugzilla.redhat.com/show_bug.cgi?id=1890902
Signed-off-by: Lorenzo Bianconi 
---
Changes since v1:
- drop handler counters and add global abort counter
- improve documentation and naming scheme
- introduce engine_set_node_updated utility macro
---
 controller/ovn-controller.c | 39 +++--
 lib/inc-proc-eng.c  |  5 +
 lib/inc-proc-eng.h  | 12 +++-
 3 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 4343650fc..42eed9ebd 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -69,6 +69,7 @@
 #include "stopwatch.h"
 #include "lib/inc-proc-eng.h"
 #include "hmapx.h"
+#include "coverage.h"
 
 VLOG_DEFINE_THIS_MODULE(main);
 
@@ -85,6 +86,18 @@ static unixctl_cb_func lflow_cache_flush_cmd;
 static unixctl_cb_func lflow_cache_show_stats_cmd;
 static unixctl_cb_func debug_delay_nb_cfg_report;
 
+/* Coverage counters for run handlers of OVN controller
+ * incremental processing nodes
+ */
+ENGINE_RUN_COVERAGE_DEFINE(flow_output);
+ENGINE_RUN_COVERAGE_DEFINE(runtime_data);
+ENGINE_RUN_COVERAGE_DEFINE(addr_sets);
+ENGINE_RUN_COVERAGE_DEFINE(port_groups);
+ENGINE_RUN_COVERAGE_DEFINE(ct_zones);
+ENGINE_RUN_COVERAGE_DEFINE(mff_ovn_geneve);
+ENGINE_RUN_COVERAGE_DEFINE(ofctrl_is_connected);
+ENGINE_RUN_COVERAGE_DEFINE(physical_flow_changes);
+
 #define DEFAULT_BRIDGE_NAME "br-int"
 #define DEFAULT_PROBE_INTERVAL_MSEC 5000
 #define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0
@@ -955,6 +968,10 @@ ctrl_register_ovs_idl(struct ovsdb_idl *ovs_idl)
 SB_NODE(dns, "dns") \
 SB_NODE(load_balancer, "load_balancer")
 
+#define SB_NODE(NAME, NAME_STR) ENGINE_RUN_COVERAGE_DEFINE(sb_##NAME);
+SB_NODES
+#undef SB_NODE
+
 enum sb_engine_node {
 #define SB_NODE(NAME, NAME_STR) SB_##NAME,
 SB_NODES
@@ -972,6 +989,10 @@ enum sb_engine_node {
 OVS_NODE(interface, "interface") \
 OVS_NODE(qos, "qos")
 
+#define OVS_NODE(NAME, NAME_STR) ENGINE_RUN_COVERAGE_DEFINE(ovs_##NAME);
+OVS_NODES
+#undef OVS_NODE
+
 enum ovs_engine_node {
 #define OVS_NODE(NAME, NAME_STR) OVS_##NAME,
 OVS_NODES
@@ -1011,7 +1032,7 @@ en_ofctrl_is_connected_run(struct engine_node *node, void *data)
 ofctrl_seqno_flush();
 binding_seqno_flush();
 }
-engine_set_node_state(node, EN_UPDATED);
+engine_set_node_updated(node, ofctrl_is_connected);
 return;
 }
 engine_set_node_state(node, EN_UNCHANGED);
@@ -1067,7 +1088,7 @@ en_addr_sets_run(struct engine_node *node, void *data)
 addr_sets_init(as_table, &as->addr_sets);
 
 as->change_tracked = false;
-engine_set_node_state(node, EN_UPDATED);
+engine_set_node_updated(node, addr_sets);
 }
 
 static bool
@@ -1147,7 +1168,7 @@ en_port_groups_run(struct engine_node *node, void *data)
 port_groups_init(pg_table, &pg->port_groups);
 
 pg->change_tracked = false;
-engine_set_node_state(node, EN_UPDATED);
+engine_set_node_updated(node, port_groups);
 }
 
 static bool
@@ -1482,7 +1503,7 @@ en_runtime_data_run(struct engine_node *node, void *data)
 
 binding_run(&b_ctx_in, &b_ctx_out);
 
-engine_set_node_state(node, EN_UPDATED);
+engine_set_node_updated(node, runtime_data);
 }
 
 static bool
@@ -1604,8 +1625,7 @@ en_ct_zones_run(struct engine_node *node, void *data)
 &ct_zones_data->current, ct_zones_data->bitmap,
 &ct_zones_data->pending, &rt_data->ct_updated_datapaths);
 
-
-engine_set_node_state(node, EN_UPDATED);
+engine_set_node_updated(node, ct_zones);
 }
 
 /* The data in the ct_zones node is always valid (i.e., no stale pointers). */
@@ -1639,7 +1659,7 @@ en_mff_ovn_geneve_run(struct engine_node *node, void *data)
 enum mf_field_id mff_ovn_geneve = ofctrl_get_mf_field_id();
 if (ed_mff_ovn_geneve->mff_ovn_geneve != mff_ovn_geneve) {
 ed_mff_ovn_geneve->mff_ovn_geneve = mff_ovn_geneve;
-engine_set_node_state(node, EN_UPDATED);
+engine_set_node_updated(node, mff_ovn_geneve);
 return;
 }
 engine_set_node_state(node, EN_UNCHANGED);
@@ -1714,7 +1734,7 @@ en_physical_flow_changes_run(struct engine_node *node, void *data)
 {
 struct ed_type_pfc_data *pfc_tdata = data;
 pfc_tdata->recompute_physical_flows = true;
-engine_set_node_state(node, EN_UPDATED);
+engine_set_node_updated(node, physical_flow_changes);
 }
 
 /* ct_zone changes are not handled incrementally but a handler is required
@@ -2034,8 +2054,7 @@ en_flow_output_run(struct engine_node *node, void *data)
 init_physical_ctx(node, rt_data, &p_ctx);
 
 physical_run(&p_ctx, &fo->flow_table);
-
-engine_set_node_state(node, EN_UPDATED);
+engine_set_node_updated(node, flow_output);
 }
 
 static 

Re: [ovs-dev] [PATCH] ofproto-dpif-upcall: Fix ukey leak on udpif destroy.

2021-02-17 Thread William Tu
On Wed, Feb 17, 2021 at 8:40 AM Ilya Maximets  wrote:
>
> On 1/18/21 5:12 PM, Ilya Maximets wrote:
> > Since commit 79eadafeb1b4 udpif_stop_threads() doesn't delete datapath
> > flows while called from udpif_destroy().  This means that ukeys are
> > not cleaned up either.  So, hash maps in udpif->ukeys[] might still
> > contain valid pointers to ukeys that should be destroyed before
> > destroying the hash map itself:
> >
> >   ==2783089==ERROR: LeakSanitizer: detected memory leaks
> >
> >   Direct leak of 1560 byte(s) in 1 object(s) allocated from:
> > # 0 0x7f8a57eae667 in __interceptor_malloc (/lib64/libasan.so.6+0xb0667)
> > # 1 0x8411f6 in xmalloc lib/util.c:138
> > # 2 0x4d8a52 in ukey_create__ ofproto/ofproto-dpif-upcall.c:1682
> > # 3 0x4d99e3 in ukey_create_from_upcall 
> > ofproto/ofproto-dpif-upcall.c:1751
> > # 4 0x4d517d in upcall_xlate ofproto/ofproto-dpif-upcall.c:1242
> > # 5 0x4d63d2 in process_upcall ofproto/ofproto-dpif-upcall.c:1414
> > # 6 0x4d29f3 in recv_upcalls ofproto/ofproto-dpif-upcall.c:833
> > # 7 0x4d1ee1 in udpif_upcall_handler ofproto/ofproto-dpif-upcall.c:750
> > # 8 0x795aa2 in ovsthread_wrapper lib/ovs-thread.c:383
> > # 9 0x7f8a57a59431 in start_thread (/lib64/libpthread.so.0+0x9431)
> >
> > Fixes: 79eadafeb1b4 ("ofproto: Do not delete datapath flows on exit by 
> > default.")
> > Reported-by: Dumitru Ceara 
> > Signed-off-by: Ilya Maximets 
> > ---

LGTM. LeakSanitizer is pretty useful.
Acked-by: William Tu 


Re: [ovs-dev] [PATCH v3] docs: Add instruction to set local_ip to ipsec tutorial

2021-02-17 Thread Mark Gray
On 17/02/2021 15:17, Balazs Nemeth wrote:
> Signed-off-by: Balazs Nemeth 
> ---
>  Documentation/tutorials/ipsec.rst | 21 -
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/tutorials/ipsec.rst 
> b/Documentation/tutorials/ipsec.rst
> index 3b3e42c59..b6cc1c3a8 100644
> --- a/Documentation/tutorials/ipsec.rst
> +++ b/Documentation/tutorials/ipsec.rst
> @@ -273,7 +273,26 @@ external IP is 1.1.1.1, and `host_2`'s external IP is 
> 2.2.2.2. Make sure
>  authenticated; otherwise, any certificate signed by the CA would be
>  accepted.
> 
> -3. Test IPsec tunnel.
> +3. Set the `local_ip` field in the Interface table (Optional)
> +
> +Make sure that the `local_ip` field in the Interface table is set to the
> +NIC used for egress traffic.
> +
> +On `host 1`::
> +
> +   # ovs-vsctl set Interface tun options:local_ip=$ip_1
> +
> +Similarly, on `host 2`::
> +
> +   # ovs-vsctl set Interface tun options:local_ip=$ip_2
> +
> +   .. note::
> +
> +It is not strictly necessary to set the `local_ip` field if your 
> system
> +only has one NIC or the default gateway interface is set to the NIC
> +used for egress traffic.
> +
> +4. Test IPsec tunnel.
> 
> Now you should have an IPsec GRE tunnel running between two hosts. To 
> verify
> it, in `host_1`::
> --
> 2.29.2
> 
Acked-by: Mark Gray 

I also ran make docs-check to check that it renders correctly - it does.
Thanks for the patch Balazs.




Re: [ovs-dev] [PATCH] Documentation: Fix DPDK qos example.

2021-02-17 Thread William Tu
>
> >>  Information Rate). High priority traffic is routed to queue 10,
> >> which marks
> >>  all traffic as CIR, i.e. Green. All low priority traffic, queue 20,
> >> is
> >>  marked as EIR, i.e. Yellow::
> >>
> >>  $ ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
> >>  --id=@myqos create qos type=trtcm-policer \
> >> -other-config:cir=52000 other-config:cbs=2048 \
> >> -other-config:eir=52000 other-config:ebs=2048  \
>
> 52000 is fine as our documentation states cir, eir are in bytes per
> second, minus the ethernet header.
> So (64-12) * 1000 = 52000

How come it's not minus 14-byte ethernet header?

the rest looks good to me. Thanks!
William


Re: [ovs-dev] [PATCH] Documentation: Fix DPDK qos example.

2021-02-17 Thread Eelco Chaudron



On 17 Feb 2021, at 18:17, William Tu wrote:

On Wed, Feb 17, 2021 at 8:21 AM Eelco Chaudron  
wrote:




On 17 Feb 2021, at 9:23, Eelco Chaudron wrote:


On 17 Feb 2021, at 4:39, William Tu wrote:


RFC4115 says "The CIR and EIR are both measured in bits/s."
Fix the example use case based on the description.
64-Byte packet * 8 * 1000 pps = 512000


Did you run some tests to verify the changes you made?


Fixes: e61bdffc2a98 ("netdev-dpdk: Add new DPDK RFC 4115 egress
policer")
Signed-off-by: William Tu 
---
 Documentation/topics/dpdk/qos.rst | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/Documentation/topics/dpdk/qos.rst
b/Documentation/topics/dpdk/qos.rst
index 103495415a9c..e9a51ab3a3f0 100644
--- a/Documentation/topics/dpdk/qos.rst
+++ b/Documentation/topics/dpdk/qos.rst
@@ -69,22 +69,22 @@ to prioritize certain traffic over others at a
port level.

 For example, the following configuration will limit the traffic 
rate

at a
 port level to a maximum of 2000 packets a second (64 bytes IPv4
packets).
-100pps as CIR (Committed Information Rate) and 1000pps as EIR
(Excess
+1000pps as CIR (Committed Information Rate) and 1000pps as EIR
(Excess


Ack this should be 1000


 Information Rate). High priority traffic is routed to queue 10,
which marks
 all traffic as CIR, i.e. Green. All low priority traffic, queue 
20,

is
 marked as EIR, i.e. Yellow::

 $ ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
 --id=@myqos create qos type=trtcm-policer \
-other-config:cir=52000 other-config:cbs=2048 \
-other-config:eir=52000 other-config:ebs=2048  \


52000 is fine as our documentation states cir, eir are in bytes per
second, minus the ethernet header.
So (64-12) * 1000 = 52000


+other-config:cir=512000 other-config:cbs=2048 \
+other-config:eir=512000 other-config:ebs=2048  \
 queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
  --id=@dpdk1Q10 create queue \
-  other-config:cir=4160 other-config:cbs=2048 \


This one should change to cir 2*52000 = 104000 also.


-  other-config:eir=0 other-config:ebs=0 -- \


This should stay unaltered.


+  other-config:cir=512000 other-config:cbs=2048 \
+  other-config:eir=512000 other-config:ebs=0 -- \


The eir should stay zero here


  --id=@dpdk1Q20 create queue \
other-config:cir=0 other-config:cbs=0 \
-   other-config:eir=4160 other-config:ebs=2048 \
+   other-config:eir=512000 other-config:ebs=2048 \


This should be 52000 also the trailing backslash can be removed.


 This configuration accomplishes that the high priority traffic has 
a

 guaranteed bandwidth egressing the ports at CIR (1000pps), but it
can also


I’ll re-run some of my tests if you have not done so with the new
config. Hopefully next week or the week after.


I found some time to re-test this, and with the new values it works 
as

expected.
To be clear this is the correct set:

$ ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
 --id=@myqos create qos type=trtcm-policer \
 other-config:cir=52000 other-config:cbs=2048 \
 other-config:eir=52000 other-config:ebs=2048  \
 queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
  --id=@dpdk1Q10 create queue \
   other-config:cir=104000 other-config:cbs=2048 \
   other-config:eir=0 other-config:ebs=0 -- \
  --id=@dpdk1Q20 create queue \
other-config:cir=0 other-config:cbs=0 \
other-config:eir=52000 other-config:ebs=2048


Hi Eelco,
Thanks!
One question, I thought for queue10, instead of "cir=104000", we 
should do

  --id=@dpdk1Q10 create queue \
   other-config:cir=52000 other-config:cbs=2048 \
   other-config:eir=52000 other-config:ebs=2048 -- \
because later on in the description, we said "
 This configuration accomplishes that the high priority traffic has a
  guaranteed bandwidth egressing the ports at CIR (1000pps), but it 
can also
  use the EIR, so a total of 2000pps at max. These additional 1000pps 
is
 shared with the low priority traffic. The low priority traffic can 
use at

 maximum 1000pps.
"


My configuration is fine: the queue 10 classifier will mark all traffic
as green, then when it comes to the port classifier, it will first eat
the port-level CIR, and if none is left it will eat into the EIR.


If you configure it your way, in theory, you could drop an EIR marked 
packet at the queue level when there is room in CIR but not in the EIR 
at port level.


Basically, if you look at the RFC, the queue-level classifier is done
color-blind, followed by the port-level processing, which is done color-aware.
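
As an illustration only -- this is not OVS or DPDK code, and every name in it
(struct meter, meter_blind, meter_aware, packet_allowed) is made up -- the
two-stage behavior described above can be sketched as a color-blind queue
meter followed by a color-aware port meter:

/* Sketch only: simplified token buckets, refill and locking omitted. */
#include <stdbool.h>
#include <stdint.h>

enum color { GREEN, YELLOW, RED };

struct meter {
    uint64_t c_tokens;   /* committed (CIR) bucket, in bytes */
    uint64_t e_tokens;   /* excess (EIR) bucket, in bytes */
};

/* Queue stage: color-blind, any previous color is ignored. */
static enum color
meter_blind(struct meter *m, uint32_t len)
{
    if (m->c_tokens >= len) {
        m->c_tokens -= len;
        return GREEN;
    } else if (m->e_tokens >= len) {
        m->e_tokens -= len;
        return YELLOW;
    }
    return RED;
}

/* Port stage: color-aware, a yellow packet may only consume EIR tokens. */
static enum color
meter_aware(struct meter *m, uint32_t len, enum color in)
{
    if (in == GREEN && m->c_tokens >= len) {
        m->c_tokens -= len;
        return GREEN;
    }
    if (in != RED && m->e_tokens >= len) {
        m->e_tokens -= len;
        return YELLOW;
    }
    return RED;
}

/* Per packet: queue meter first (blind), then port meter (aware). */
static bool
packet_allowed(struct meter *queue, struct meter *port, uint32_t len)
{
    return meter_aware(port, len, meter_blind(queue, len)) != RED;
}

With queue 10's committed bucket sized to cover the port CIR plus EIR
(2 * 52000 in the corrected set above), it never limits on its own, so every
packet it admits reaches the port stage marked green and can use the port
CIR first and then the port EIR, which is the behavior described above.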



William




Re: [ovs-dev] [PATCH] Documentation: Fix DPDK qos example.

2021-02-17 Thread William Tu
On Wed, Feb 17, 2021 at 8:21 AM Eelco Chaudron  wrote:
>
>
>
> On 17 Feb 2021, at 9:23, Eelco Chaudron wrote:
>
> > On 17 Feb 2021, at 4:39, William Tu wrote:
> >
> >> RFC4115 says "The CIR and EIR are both measured in bits/s."
> >> Fix the example use case based on the description.
> >> 64-Byte packet * 8 * 1000 pps = 512000
> >
> > Did you run some tests to verify the changes you made?
> >
> >> Fixes: e61bdffc2a98 ("netdev-dpdk: Add new DPDK RFC 4115 egress
> >> policer")
> >> Signed-off-by: William Tu 
> >> ---
> >>  Documentation/topics/dpdk/qos.rst | 12 ++--
> >>  1 file changed, 6 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/Documentation/topics/dpdk/qos.rst
> >> b/Documentation/topics/dpdk/qos.rst
> >> index 103495415a9c..e9a51ab3a3f0 100644
> >> --- a/Documentation/topics/dpdk/qos.rst
> >> +++ b/Documentation/topics/dpdk/qos.rst
> >> @@ -69,22 +69,22 @@ to prioritize certain traffic over others at a
> >> port level.
> >>
> >>  For example, the following configuration will limit the traffic rate
> >> at a
> >>  port level to a maximum of 2000 packets a second (64 bytes IPv4
> >> packets).
> >> -100pps as CIR (Committed Information Rate) and 1000pps as EIR
> >> (Excess
> >> +1000pps as CIR (Committed Information Rate) and 1000pps as EIR
> >> (Excess
>
> Ack this should be 1000
>
> >>  Information Rate). High priority traffic is routed to queue 10,
> >> which marks
> >>  all traffic as CIR, i.e. Green. All low priority traffic, queue 20,
> >> is
> >>  marked as EIR, i.e. Yellow::
> >>
> >>  $ ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
> >>  --id=@myqos create qos type=trtcm-policer \
> >> -other-config:cir=52000 other-config:cbs=2048 \
> >> -other-config:eir=52000 other-config:ebs=2048  \
>
> 52000 is fine as our documentation states cir, eir are in bytes per
> second, minus the ethernet header.
> So (64-12) * 1000 = 52000
>
> >> +other-config:cir=512000 other-config:cbs=2048 \
> >> +other-config:eir=512000 other-config:ebs=2048  \
> >>  queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
> >>   --id=@dpdk1Q10 create queue \
> >> -  other-config:cir=4160 other-config:cbs=2048 \
>
> This one should change to cir 2*52000 = 104000 also.
>
> >> -  other-config:eir=0 other-config:ebs=0 -- \
>
> This should stay unaltered.
>
> >> +  other-config:cir=512000 other-config:cbs=2048 \
> >> +  other-config:eir=512000 other-config:ebs=0 -- \
> >
> > The eir should stay zero here
> >
> >>   --id=@dpdk1Q20 create queue \
> >> other-config:cir=0 other-config:cbs=0 \
> >> -   other-config:eir=4160 other-config:ebs=2048 \
> >> +   other-config:eir=512000 other-config:ebs=2048 \
>
> This should be 52000 also the trailing backslash can be removed.
> >>
> >>  This configuration accomplishes that the high priority traffic has a
> >>  guaranteed bandwidth egressing the ports at CIR (1000pps), but it
> >> can also
> >
> > I’ll re-run some of my tests if you have not done so with the new
> > config. Hopefully next week or the week after.
>
> I found some time to re-test this, and with the new values it works as
> expected.
> To be clear this is the correct set:
>
> $ ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
>  --id=@myqos create qos type=trtcm-policer \
>  other-config:cir=52000 other-config:cbs=2048 \
>  other-config:eir=52000 other-config:ebs=2048  \
>  queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
>   --id=@dpdk1Q10 create queue \
>other-config:cir=104000 other-config:cbs=2048 \
>other-config:eir=0 other-config:ebs=0 -- \
>   --id=@dpdk1Q20 create queue \
> other-config:cir=0 other-config:cbs=0 \
> other-config:eir=52000 other-config:ebs=2048

Hi Eelco,
Thanks!
One question, I thought for queue10, instead of "cir=104000", we should do
  --id=@dpdk1Q10 create queue \
   other-config:cir=52000 other-config:cbs=2048 \
   other-config:eir=52000 other-config:ebs=2048 -- \
because later on in the description, we said "
 This configuration accomplishes that the high priority traffic has a
  guaranteed bandwidth egressing the ports at CIR (1000pps), but it can also
  use the EIR, so a total of 2000pps at max. These additional 1000pps is
 shared with the low priority traffic. The low priority traffic can use at
 maximum 1000pps.
"
William


[ovs-dev] [PATCH v1 8/9] conntrack: Do not log empty ct-sweep

2021-02-17 Thread Gaetan Rivet
Do not add noise to the DBG log for empty sweeps.
Only log time taken when some connections were cleaned.

Signed-off-by: Gaetan Rivet 
Reviewed-by: Eli Britstein 
---
 lib/conntrack.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 1b21b79bd..e042683aa 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1574,8 +1574,10 @@ ct_sweep(struct conntrack *ct, long long now, size_t 
limit)
 }
 
 out:
-VLOG_DBG("conntrack cleanup %"PRIuSIZE" entries in %lld msec", count,
- time_msec() - now);
+if (count > 0) {
+VLOG_DBG("conntrack cleanup %"PRIuSIZE" entries in %lld msec", count,
+ time_msec() - now);
+}
 return min_expiration;
 }
 
-- 
2.30.0



[ovs-dev] [PATCH v1 5/9] conntrack: Inverse conn and ct lock precedence

2021-02-17 Thread Gaetan Rivet
The lock priority order is for the global 'ct_lock' to be taken first
and then 'conn->lock'. This is an issue, as multiple operations on
connections are thus blocked between threads contending on the
global 'ct_lock'.

This was previously necessary due to how the expiration lists, timeout
policies and zone limits were managed. They are now using RCU-friendly
structures that allow concurrent readers. The mutual exclusion now only
needs to happen during writes.

This allows reducing the 'ct_lock' precedence, and to only take it
when writing the relevant structures. This will reduce contention on
'ct_lock', which impairs scalability when the connection tracker is
used by many threads.

Signed-off-by: Gaetan Rivet 
Reviewed-by: Eli Britstein 
---
 lib/conntrack-private.h | 16 +---
 lib/conntrack-tp.c  | 29 ++---
 lib/conntrack.c | 58 +
 3 files changed, 57 insertions(+), 46 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 7eb555092..e8ffb5bf9 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -140,6 +140,9 @@ struct conn {
 struct nat_action_info_t *nat_info;
 char *alg;
 struct conn *nat_conn; /* The NAT 'conn' context, if there is one. */
+atomic_flag reclaimed; /* False during the lifetime of the connection,
+* True as soon as a thread has started freeing
+* its memory. */
 
 /* Mutable data. */
 struct ovs_mutex lock; /* Guards all mutable fields. */
@@ -200,8 +203,8 @@ struct conntrack {
 };
 
 /* Lock acquisition order:
- *1. 'ct_lock'
- *2. 'conn->lock'
+ *1. 'conn->lock'
+ *2. 'ct_lock'
  *3. 'resources_lock'
  */
 
@@ -222,11 +225,18 @@ struct ct_l4_proto {
 };
 
 static inline void
-conn_expire_remove(struct conn_expire *exp)
+conn_expire_remove(struct conn_expire *exp, bool need_ct_lock)
+OVS_NO_THREAD_SAFETY_ANALYSIS
 {
 if (!atomic_flag_test_and_set(&exp->remove_once)
 && rculist_next(&exp->node)) {
+if (need_ct_lock) {
+ovs_mutex_lock(&exp->ct->ct_lock);
+}
 rculist_remove(&exp->node);
+if (need_ct_lock) {
+ovs_mutex_unlock(&exp->ct->ct_lock);
+}
 }
 }
 
diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c
index a376857ec..01ffa30df 100644
--- a/lib/conntrack-tp.c
+++ b/lib/conntrack-tp.c
@@ -238,6 +238,7 @@ tm_to_ct_dpif_tp(enum ct_timeout tm)
 
 static void
 conn_expire_init(struct conn *conn, struct conntrack *ct)
+OVS_REQUIRES(conn->lock)
 {
 struct conn_expire *exp = &conn->exp;
 
@@ -257,20 +258,22 @@ conn_expire_insert(struct conn *conn)
 {
 struct conn_expire *exp = &conn->exp;
 
-ovs_mutex_lock(&exp->ct->ct_lock);
 ovs_mutex_lock(&conn->lock);
 
+ovs_mutex_lock(&exp->ct->ct_lock);
 rculist_push_back(&exp->ct->exp_lists[exp->tm], &exp->node);
+ovs_mutex_unlock(&exp->ct->ct_lock);
+
 atomic_flag_clear(&exp->insert_once);
 atomic_flag_clear(&exp->remove_once);
 
 ovs_mutex_unlock(&conn->lock);
-ovs_mutex_unlock(&exp->ct->ct_lock);
 }
 
 static void
 conn_schedule_expiration(struct conntrack *ct, struct conn *conn,
  enum ct_timeout tm, long long now, uint32_t tp_value)
+OVS_REQUIRES(conn->lock)
 {
 conn_expire_init(conn, ct);
 conn->expiration = now + tp_value * 1000;
@@ -285,19 +288,12 @@ conn_update_expiration__(struct conntrack *ct, struct 
conn *conn,
  enum ct_timeout tm, long long now,
  uint32_t tp_value)
 OVS_REQUIRES(conn->lock)
+OVS_EXCLUDED(ct->ct_lock)
 {
-ovs_mutex_unlock(&conn->lock);
-
-ovs_mutex_lock(&ct->ct_lock);
-ovs_mutex_lock(&conn->lock);
 if (!conn->cleaned) {
-conn_expire_remove(&conn->exp);
+conn_expire_remove(&conn->exp, true);
 conn_schedule_expiration(ct, conn, tm, now, tp_value);
 }
-ovs_mutex_unlock(&conn->lock);
-ovs_mutex_unlock(&ct->ct_lock);
-
-ovs_mutex_lock(&conn->lock);
 }
 
 /* The conn entry lock must be held on entry and exit. */
@@ -309,20 +305,12 @@ conn_update_expiration(struct conntrack *ct, struct conn 
*conn,
 struct timeout_policy *tp;
 uint32_t val;
 
-ovs_mutex_unlock(&conn->lock);
-
-ovs_mutex_lock(&ct->ct_lock);
-ovs_mutex_lock(&conn->lock);
 tp = timeout_policy_lookup(ct, conn->tp_id);
 if (tp) {
 val = tp->policy.attrs[tm_to_ct_dpif_tp(tm)];
 } else {
 val = ct_dpif_netdev_tp_def[tm_to_ct_dpif_tp(tm)];
 }
-ovs_mutex_unlock(&conn->lock);
-ovs_mutex_unlock(&ct->ct_lock);
-
-ovs_mutex_lock(&conn->lock);
 VLOG_DBG_RL(&rl, "Update timeout %s zone=%u with policy id=%d "
 "val=%u sec.",
 ct_timeout_str[tm], conn->key.zone, conn->tp_id, val);
@@ -330,11 +318,10 @@ conn_update_expiration(struct conntrack *ct, struct conn 
*conn,
 conn_update_expiration__(ct, conn, tm, now, val);
 }
 
-/* ct_lock must be held. */
 void
 conn_init_expiration(struct conntrack *ct, struct conn *conn,
  enum 

[ovs-dev] [PATCH v1 6/9] conntrack: Do not schedule zero ms timers

2021-02-17 Thread Gaetan Rivet
When ct_sweep() is far behind on its work, the 'next_wake' returned can
be before the moment it started. When it happens, the thread schedules a
zero ms timer that is logged as an error.

Instead, mark the thread for immediate wake in the next poll_block().

Signed-off-by: Gaetan Rivet 
Reviewed-by: Eli Britstein 
---
 lib/conntrack.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 5aad64994..71f79a790 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1628,6 +1628,8 @@ clean_thread_main(void *f_)
 next_wake = conntrack_clean(ct, now);
 
 if (next_wake < now) {
+poll_immediate_wake();
+} else if (next_wake < now + CT_CLEAN_MIN_INTERVAL) {
 poll_timer_wait_until(now + CT_CLEAN_MIN_INTERVAL);
 } else {
 poll_timer_wait_until(MAX(next_wake, now + CT_CLEAN_INTERVAL));
-- 
2.30.0



[ovs-dev] [PATCH v1 7/9] conntrack: Do not rate limit ct-sweep

2021-02-17 Thread Gaetan Rivet
The current rate limit is set to allow other threads to update the
connections when applicable. This was valid when taking the 'ct_lock'
was needed with a global critical section.

Now that the size of the critical section for 'ct_lock' is reduced, it
is not necessary to rate limit calls to ct_sweep() anymore.

Signed-off-by: Gaetan Rivet 
Reviewed-by: Eli Britstein 
---
 lib/conntrack.c | 24 +++-
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 71f79a790..1b21b79bd 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1602,20 +1602,12 @@ conntrack_clean(struct conntrack *ct, long long now)
  * there is an actual connection that expires, or because a new connection
  * might be created with the minimum timeout).
  *
- * The logic below has two goals:
- *
- * - We want to reduce the number of wakeups and batch connection cleanup
- *   when the load is not very high.  CT_CLEAN_INTERVAL ensures that if we
- *   are coping with the current cleanup tasks, then we wait at least
- *   5 seconds to do further cleanup.
- *
- * - We don't want to keep the map locked too long, as we might prevent
- *   traffic from flowing.  CT_CLEAN_MIN_INTERVAL ensures that if cleanup is
- *   behind, there is at least some 200ms blocks of time when the map will be
- *   left alone, so the datapath can operate unhindered.
+ * We want to reduce the number of wakeups and batch connection cleanup
+ * when the load is not very high.  CT_CLEAN_INTERVAL ensures that if we
+ * are coping with the current cleanup tasks, then we wait at least
+ * 5 seconds to do further cleanup.
  */
 #define CT_CLEAN_INTERVAL 5000 /* 5 seconds */
-#define CT_CLEAN_MIN_INTERVAL 200  /* 0.2 seconds */
 
 static void *
 clean_thread_main(void *f_)
@@ -1627,12 +1619,10 @@ clean_thread_main(void *f_)
 long long now = time_msec();
 next_wake = conntrack_clean(ct, now);
 
-if (next_wake < now) {
-poll_immediate_wake();
-} else if (next_wake < now + CT_CLEAN_MIN_INTERVAL) {
-poll_timer_wait_until(now + CT_CLEAN_MIN_INTERVAL);
+if (next_wake > now) {
+poll_timer_wait_until(MIN(next_wake, now + CT_CLEAN_INTERVAL));
 } else {
-poll_timer_wait_until(MAX(next_wake, now + CT_CLEAN_INTERVAL));
+poll_immediate_wake();
 }
 latch_wait(&ct->clean_thread_exit);
 poll_block();
-- 
2.30.0



[ovs-dev] [PATCH v1 2/9] conntrack: Use a cmap to store zone limits

2021-02-17 Thread Gaetan Rivet
Change the data structure from hmap to cmap for zone limits.
As they are shared amongst multiple conntrack users, multiple
readers want to check the current zone limit state before progressing in
their processing. Using a CMAP allows doing lookups without taking the
global 'ct_lock', thus reducing contention.

Signed-off-by: Gaetan Rivet 
Reviewed-by: Eli Britstein 
---
 lib/conntrack-private.h |  2 +-
 lib/conntrack.c | 72 -
 lib/conntrack.h |  2 +-
 lib/dpif-netdev.c   |  5 +--
 4 files changed, 55 insertions(+), 26 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 4b6f9eae3..f2cbf657e 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -176,7 +176,7 @@ struct conntrack {
 struct ovs_mutex ct_lock; /* Protects 2 following fields. */
 struct cmap conns OVS_GUARDED;
 struct rculist exp_lists[N_CT_TM] OVS_GUARDED;
-struct hmap zone_limits OVS_GUARDED;
+struct cmap zone_limits OVS_GUARDED;
 struct hmap timeout_policies OVS_GUARDED;
 uint32_t hash_basis; /* Salt for hashing a connection key. */
 pthread_t clean_thread; /* Periodically cleans up connection tracker. */
diff --git a/lib/conntrack.c b/lib/conntrack.c
index ac12f9196..c218200dc 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -79,7 +79,7 @@ enum ct_alg_ctl_type {
 };
 
 struct zone_limit {
-struct hmap_node node;
+struct cmap_node node;
 struct conntrack_zone_limit czl;
 };
 
@@ -303,7 +303,7 @@ conntrack_init(void)
 for (unsigned i = 0; i < ARRAY_SIZE(ct->exp_lists); i++) {
 rculist_init(&ct->exp_lists[i]);
 }
-hmap_init(&ct->zone_limits);
+cmap_init(&ct->zone_limits);
 ct->zone_limit_seq = 0;
 timeout_policy_init(ct);
 ovs_mutex_unlock(&ct->ct_lock);
@@ -339,12 +339,25 @@ zone_key_hash(int32_t zone, uint32_t basis)
 }
 
 static struct zone_limit *
-zone_limit_lookup(struct conntrack *ct, int32_t zone)
+zone_limit_lookup_protected(struct conntrack *ct, int32_t zone)
 OVS_REQUIRES(ct->ct_lock)
 {
 uint32_t hash = zone_key_hash(zone, ct->hash_basis);
 struct zone_limit *zl;
-HMAP_FOR_EACH_IN_BUCKET (zl, node, hash, &ct->zone_limits) {
+CMAP_FOR_EACH_WITH_HASH_PROTECTED (zl, node, hash, &ct->zone_limits) {
+if (zl->czl.zone == zone) {
+return zl;
+}
+}
+return NULL;
+}
+
+static struct zone_limit *
+zone_limit_lookup(struct conntrack *ct, int32_t zone)
+{
+uint32_t hash = zone_key_hash(zone, ct->hash_basis);
+struct zone_limit *zl;
+CMAP_FOR_EACH_WITH_HASH (zl, node, hash, &ct->zone_limits) {
 if (zl->czl.zone == zone) {
 return zl;
 }
@@ -354,7 +367,6 @@ zone_limit_lookup(struct conntrack *ct, int32_t zone)
 
 static struct zone_limit *
 zone_limit_lookup_or_default(struct conntrack *ct, int32_t zone)
-OVS_REQUIRES(ct->ct_lock)
 {
 struct zone_limit *zl = zone_limit_lookup(ct, zone);
 return zl ? zl : zone_limit_lookup(ct, DEFAULT_ZONE);
@@ -363,13 +375,16 @@ zone_limit_lookup_or_default(struct conntrack *ct, 
int32_t zone)
 struct conntrack_zone_limit
 zone_limit_get(struct conntrack *ct, int32_t zone)
 {
-ovs_mutex_lock(&ct->ct_lock);
-struct conntrack_zone_limit czl = {DEFAULT_ZONE, 0, 0, 0};
+struct conntrack_zone_limit czl = {
+.zone = DEFAULT_ZONE,
+.limit = 0,
+.count = ATOMIC_COUNT_INIT(0),
+.zone_limit_seq = 0,
+};
 struct zone_limit *zl = zone_limit_lookup_or_default(ct, zone);
 if (zl) {
 czl = zl->czl;
 }
-ovs_mutex_unlock(&ct->ct_lock);
 return czl;
 }
 
@@ -377,13 +392,19 @@ static int
 zone_limit_create(struct conntrack *ct, int32_t zone, uint32_t limit)
 OVS_REQUIRES(ct->ct_lock)
 {
+struct zone_limit *zl = zone_limit_lookup_protected(ct, zone);
+
+if (zl) {
+return 0;
+}
+
 if (zone >= DEFAULT_ZONE && zone <= MAX_ZONE) {
-struct zone_limit *zl = xzalloc(sizeof *zl);
+zl = xzalloc(sizeof *zl);
 zl->czl.limit = limit;
 zl->czl.zone = zone;
 zl->czl.zone_limit_seq = ct->zone_limit_seq++;
 uint32_t hash = zone_key_hash(zone, ct->hash_basis);
-hmap_insert(&ct->zone_limits, &zl->node, hash);
+cmap_insert(&ct->zone_limits, &zl->node, hash);
 return 0;
 } else {
 return EINVAL;
@@ -394,13 +415,14 @@ int
 zone_limit_update(struct conntrack *ct, int32_t zone, uint32_t limit)
 {
 int err = 0;
-ovs_mutex_lock(&ct->ct_lock);
 struct zone_limit *zl = zone_limit_lookup(ct, zone);
 if (zl) {
 zl->czl.limit = limit;
 VLOG_INFO("Changed zone limit of %u for zone %d", limit, zone);
 } else {
+ovs_mutex_lock(&ct->ct_lock);
 err = zone_limit_create(ct, zone, limit);
+ovs_mutex_unlock(&ct->ct_lock);
 if (!err) {
 VLOG_INFO("Created zone limit of %u for zone %d", limit, zone);
 } else {
@@ -408,7 +430,6 @@ zone_limit_update(struct conntrack *ct, 

[ovs-dev] [PATCH v1 4/9] conntrack-tp: Use a cmap to store timeout policies

2021-02-17 Thread Gaetan Rivet
Multiple lookups are done to stored timeout policies, each time blocking
the global 'ct_lock'. This is usually not necessary and it should be
acceptable to get policy updates slightly delayed (by one RCU sync
at most). Using a CMAP reduces multiple lock taking and releasing in
the connection insertion path.

Signed-off-by: Gaetan Rivet 
Reviewed-by: Eli Britstein 
---
 lib/conntrack-private.h |  2 +-
 lib/conntrack-tp.c  | 54 +++--
 lib/conntrack.c |  9 ---
 lib/conntrack.h |  2 +-
 4 files changed, 38 insertions(+), 29 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index f2cbf657e..7eb555092 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -177,7 +177,7 @@ struct conntrack {
 struct cmap conns OVS_GUARDED;
 struct rculist exp_lists[N_CT_TM] OVS_GUARDED;
 struct cmap zone_limits OVS_GUARDED;
-struct hmap timeout_policies OVS_GUARDED;
+struct cmap timeout_policies OVS_GUARDED;
 uint32_t hash_basis; /* Salt for hashing a connection key. */
 pthread_t clean_thread; /* Periodically cleans up connection tracker. */
 struct latch clean_thread_exit; /* To destroy the 'clean_thread'. */
diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c
index 30ba4bda8..a376857ec 100644
--- a/lib/conntrack-tp.c
+++ b/lib/conntrack-tp.c
@@ -47,14 +47,15 @@ static unsigned int ct_dpif_netdev_tp_def[] = {
 };
 
 static struct timeout_policy *
-timeout_policy_lookup(struct conntrack *ct, int32_t tp_id)
+timeout_policy_lookup_protected(struct conntrack *ct, int32_t tp_id)
 OVS_REQUIRES(ct->ct_lock)
 {
 struct timeout_policy *tp;
 uint32_t hash;
 
 hash = hash_int(tp_id, ct->hash_basis);
-HMAP_FOR_EACH_IN_BUCKET (tp, node, hash, &ct->timeout_policies) {
+CMAP_FOR_EACH_WITH_HASH_PROTECTED (tp, node, hash,
+   &ct->timeout_policies) {
 if (tp->policy.id == tp_id) {
 return tp;
 }
@@ -62,20 +63,25 @@ timeout_policy_lookup(struct conntrack *ct, int32_t tp_id)
 return NULL;
 }
 
-struct timeout_policy *
-timeout_policy_get(struct conntrack *ct, int32_t tp_id)
+static struct timeout_policy *
+timeout_policy_lookup(struct conntrack *ct, int32_t tp_id)
 {
 struct timeout_policy *tp;
+uint32_t hash;
 
-ovs_mutex_lock(&ct->ct_lock);
-tp = timeout_policy_lookup(ct, tp_id);
-if (!tp) {
-ovs_mutex_unlock(&ct->ct_lock);
-return NULL;
+hash = hash_int(tp_id, ct->hash_basis);
+CMAP_FOR_EACH_WITH_HASH (tp, node, hash, &ct->timeout_policies) {
+if (tp->policy.id == tp_id) {
+return tp;
+}
 }
+return NULL;
+}
 
-ovs_mutex_unlock(&ct->ct_lock);
-return tp;
+struct timeout_policy *
+timeout_policy_get(struct conntrack *ct, int32_t tp_id)
+{
+return timeout_policy_lookup(ct, tp_id);
 }
 
 static void
@@ -125,27 +131,30 @@ timeout_policy_create(struct conntrack *ct,
 init_default_tp(tp, tp_id);
 update_existing_tp(tp, new_tp);
 hash = hash_int(tp_id, ct->hash_basis);
-hmap_insert(&ct->timeout_policies, &tp->node, hash);
+cmap_insert(&ct->timeout_policies, &tp->node, hash);
 }
 
 static void
 timeout_policy_clean(struct conntrack *ct, struct timeout_policy *tp)
 OVS_REQUIRES(ct->ct_lock)
 {
-hmap_remove(&ct->timeout_policies, &tp->node);
-free(tp);
+uint32_t hash = hash_int(tp->policy.id, ct->hash_basis);
+cmap_remove(&ct->timeout_policies, &tp->node, hash);
+ovsrcu_postpone(free, tp);
 }
 
 static int
-timeout_policy_delete__(struct conntrack *ct, uint32_t tp_id)
+timeout_policy_delete__(struct conntrack *ct, uint32_t tp_id,
+bool warn_on_error)
 OVS_REQUIRES(ct->ct_lock)
 {
+struct timeout_policy *tp;
 int err = 0;
-struct timeout_policy *tp = timeout_policy_lookup(ct, tp_id);
 
+tp = timeout_policy_lookup_protected(ct, tp_id);
 if (tp) {
 timeout_policy_clean(ct, tp);
-} else {
+} else if (warn_on_error) {
 VLOG_WARN_RL(&rl, "Failed to delete a non-existent timeout "
  "policy: id=%d", tp_id);
 err = ENOENT;
@@ -159,7 +168,7 @@ timeout_policy_delete(struct conntrack *ct, uint32_t tp_id)
 int err;
 
 ovs_mutex_lock(&ct->ct_lock);
-err = timeout_policy_delete__(ct, tp_id);
+err = timeout_policy_delete__(ct, tp_id, true);
 ovs_mutex_unlock(&ct->ct_lock);
 return err;
 }
@@ -170,7 +179,7 @@ timeout_policy_init(struct conntrack *ct)
 {
 struct timeout_policy tp;
 
-hmap_init(&ct->timeout_policies);
+cmap_init(&ct->timeout_policies);
 
 /* Create default timeout policy. */
 memset(&tp, 0, sizeof tp);
@@ -182,14 +191,11 @@ int
 timeout_policy_update(struct conntrack *ct,
   struct timeout_policy *new_tp)
 {
-int err = 0;
 uint32_t tp_id = new_tp->policy.id;
+int err = 0;
 
 ovs_mutex_lock(&ct->ct_lock);
-struct timeout_policy *tp = timeout_policy_lookup(ct, tp_id);
-if (tp) {
-err = 

[ovs-dev] [PATCH v1 1/9] conntrack: Use rcu-lists to store conn expirations

2021-02-17 Thread Gaetan Rivet
Change the connection expiration lists from ovs_list to rculist.
This is a pre-step towards reducing the granularity of 'ct_lock'.

Signed-off-by: Gaetan Rivet 
Reviewed-by: Eli Britstein 
---
 lib/conntrack-private.h | 76 +++--
 lib/conntrack-tp.c  | 60 +---
 lib/conntrack.c | 14 
 3 files changed, 105 insertions(+), 45 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index e8332bdba..4b6f9eae3 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -29,6 +29,7 @@
 #include "openvswitch/list.h"
 #include "openvswitch/types.h"
 #include "packets.h"
+#include "rculist.h"
 #include "unaligned.h"
 #include "dp-packet.h"
 
@@ -86,17 +87,55 @@ struct alg_exp_node {
 bool nat_rpl_dst;
 };
 
+/* Timeouts: all the possible timeout states passed to update_expiration()
+ * are listed here. The name will be prefix by CT_TM_ and the value is in
+ * milliseconds */
+#define CT_TIMEOUTS \
+CT_TIMEOUT(TCP_FIRST_PACKET) \
+CT_TIMEOUT(TCP_OPENING) \
+CT_TIMEOUT(TCP_ESTABLISHED) \
+CT_TIMEOUT(TCP_CLOSING) \
+CT_TIMEOUT(TCP_FIN_WAIT) \
+CT_TIMEOUT(TCP_CLOSED) \
+CT_TIMEOUT(OTHER_FIRST) \
+CT_TIMEOUT(OTHER_MULTIPLE) \
+CT_TIMEOUT(OTHER_BIDIR) \
+CT_TIMEOUT(ICMP_FIRST) \
+CT_TIMEOUT(ICMP_REPLY)
+
+enum ct_timeout {
+#define CT_TIMEOUT(NAME) CT_TM_##NAME,
+CT_TIMEOUTS
+#undef CT_TIMEOUT
+N_CT_TM
+};
+
 enum OVS_PACKED_ENUM ct_conn_type {
 CT_CONN_TYPE_DEFAULT,
 CT_CONN_TYPE_UN_NAT,
 };
 
+struct conn_expire {
+/* Set once when initializing the expiration node. */
+struct conntrack *ct;
+/* Timeout state of the connection.
+ * It follows the connection state updates.
+ */
+enum ct_timeout tm;
+/* Insert and remove the expiration node only once per RCU syncs.
+ * If multiple threads update the connection, its expiration should
+ * be removed only once and added only once to timeout lists.
+ */
+atomic_flag insert_once;
+atomic_flag remove_once;
+struct rculist node;
+};
+
 struct conn {
 /* Immutable data. */
 struct conn_key key;
 struct conn_key rev_key;
 struct conn_key parent_key; /* Only used for orig_tuple support. */
-struct ovs_list exp_node;
 struct cmap_node cm_node;
 struct nat_action_info_t *nat_info;
 char *alg;
@@ -104,6 +143,7 @@ struct conn {
 
 /* Mutable data. */
 struct ovs_mutex lock; /* Guards all mutable fields. */
+struct conn_expire exp;
 ovs_u128 label;
 long long expiration;
 uint32_t mark;
@@ -132,33 +172,10 @@ enum ct_update_res {
 CT_UPDATE_VALID_NEW,
 };
 
-/* Timeouts: all the possible timeout states passed to update_expiration()
- * are listed here. The name will be prefix by CT_TM_ and the value is in
- * milliseconds */
-#define CT_TIMEOUTS \
-CT_TIMEOUT(TCP_FIRST_PACKET) \
-CT_TIMEOUT(TCP_OPENING) \
-CT_TIMEOUT(TCP_ESTABLISHED) \
-CT_TIMEOUT(TCP_CLOSING) \
-CT_TIMEOUT(TCP_FIN_WAIT) \
-CT_TIMEOUT(TCP_CLOSED) \
-CT_TIMEOUT(OTHER_FIRST) \
-CT_TIMEOUT(OTHER_MULTIPLE) \
-CT_TIMEOUT(OTHER_BIDIR) \
-CT_TIMEOUT(ICMP_FIRST) \
-CT_TIMEOUT(ICMP_REPLY)
-
-enum ct_timeout {
-#define CT_TIMEOUT(NAME) CT_TM_##NAME,
-CT_TIMEOUTS
-#undef CT_TIMEOUT
-N_CT_TM
-};
-
 struct conntrack {
 struct ovs_mutex ct_lock; /* Protects 2 following fields. */
 struct cmap conns OVS_GUARDED;
-struct ovs_list exp_lists[N_CT_TM] OVS_GUARDED;
+struct rculist exp_lists[N_CT_TM] OVS_GUARDED;
 struct hmap zone_limits OVS_GUARDED;
 struct hmap timeout_policies OVS_GUARDED;
 uint32_t hash_basis; /* Salt for hashing a connection key. */
@@ -204,4 +221,13 @@ struct ct_l4_proto {
struct ct_dpif_protoinfo *);
 };
 
+static inline void
+conn_expire_remove(struct conn_expire *exp)
+{
+if (!atomic_flag_test_and_set(&exp->remove_once)
+&& rculist_next(&exp->node)) {
+rculist_remove(&exp->node);
+}
+}
+
 #endif /* conntrack-private.h */
diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c
index a586d3a8d..30ba4bda8 100644
--- a/lib/conntrack-tp.c
+++ b/lib/conntrack-tp.c
@@ -230,6 +230,50 @@ tm_to_ct_dpif_tp(enum ct_timeout tm)
 return CT_DPIF_TP_ATTR_MAX;
 }
 
+static void
+conn_expire_init(struct conn *conn, struct conntrack *ct)
+{
+struct conn_expire *exp = &conn->exp;
+
+if (exp->ct != NULL) {
+return;
+}
+
+exp->ct = ct;
+atomic_flag_clear(&exp->insert_once);
+atomic_flag_clear(&exp->remove_once);
+/* The expiration is initially unscheduled, flag it as 'removed'. */
+atomic_flag_test_and_set(&exp->remove_once);
+}
+
+static void
+conn_expire_insert(struct conn *conn)
+{
+struct conn_expire *exp = &conn->exp;
+
+ovs_mutex_lock(&exp->ct->ct_lock);
+ovs_mutex_lock(&conn->lock);
+
+rculist_push_back(&exp->ct->exp_lists[exp->tm], &exp->node);
+atomic_flag_clear(&exp->insert_once);
+atomic_flag_clear(&exp->remove_once);

[ovs-dev] [PATCH v1 9/9] conntrack: Use an atomic conn expiration value

2021-02-17 Thread Gaetan Rivet
A lock is taken during conn_lookup() to check whether a connection is
expired before returning it. This lock can have some contention.

Even though this lock ensures a consistent sequence of writes, it does
not imply a specific order. A ct_clean thread taking the lock first
could read a value that would be updated immediately after by a PMD
waiting on the same lock, just as well as the opposite order.

As such, the expiration time can be stale anytime it is read. In this
context, using an atomic will ensure the same write consistency while
keeping the same (lack of) guarantee for reads. Reading the atomic will
however be less costly than taking and releasing the lock.

Signed-off-by: Gaetan Rivet 
Reviewed-by: Eli Britstein 
---
 lib/conntrack-private.h |  2 +-
 lib/conntrack-tp.c  |  2 +-
 lib/conntrack.c | 25 +++--
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index e8ffb5bf9..7f3f8714a 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -148,7 +148,7 @@ struct conn {
 struct ovs_mutex lock; /* Guards all mutable fields. */
 struct conn_expire exp;
 ovs_u128 label;
-long long expiration;
+atomic_llong expiration;
 uint32_t mark;
 int seq_skew;
 
diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c
index 01ffa30df..a4ea19c1b 100644
--- a/lib/conntrack-tp.c
+++ b/lib/conntrack-tp.c
@@ -276,7 +276,7 @@ conn_schedule_expiration(struct conntrack *ct, struct conn 
*conn,
 OVS_REQUIRES(conn->lock)
 {
 conn_expire_init(conn, ct);
-conn->expiration = now + tp_value * 1000;
+atomic_store_relaxed(&conn->expiration, now + tp_value * 1000);
 conn->exp.tm = tm;
 if (!atomic_flag_test_and_set(&conn->exp.insert_once)) {
 ovsrcu_postpone(conn_expire_insert, conn);
diff --git a/lib/conntrack.c b/lib/conntrack.c
index e042683aa..193d550cb 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -99,6 +99,7 @@ static enum ct_update_res conn_update(struct conntrack *ct, 
struct conn *conn,
   struct dp_packet *pkt,
   struct conn_lookup_ctx *ctx,
   long long now);
+static long long int conn_expiration(const struct conn *);
 static bool conn_expired(struct conn *, long long now);
 static void set_mark(struct dp_packet *, struct conn *,
  uint32_t val, uint32_t mask);
@@ -1555,10 +1556,8 @@ ct_sweep(struct conntrack *ct, long long now, size_t 
limit)
 
 for (unsigned i = 0; i < N_CT_TM; i++) {
 RCULIST_FOR_EACH (conn, exp.node, &ct->exp_lists[i]) {
-ovs_mutex_lock(&conn->lock);
-if (now < conn->expiration || count >= limit) {
-min_expiration = MIN(min_expiration, conn->expiration);
-ovs_mutex_unlock(&conn->lock);
+if (!conn_expired(conn, now) || count >= limit) {
+min_expiration = MIN(min_expiration, conn_expiration(conn));
 if (count >= limit) {
 /* Do not check other lists. */
 COVERAGE_INC(conntrack_long_cleanup);
@@ -1566,7 +1565,6 @@ ct_sweep(struct conntrack *ct, long long now, size_t 
limit)
 }
 break;
 } else {
-ovs_mutex_unlock(&conn->lock);
 conn_clean(ct, conn);
 }
 count++;
@@ -2395,14 +2393,21 @@ conn_update(struct conntrack *ct, struct conn *conn, 
struct dp_packet *pkt,
 return update_res;
 }
 
+static long long int
+conn_expiration(const struct conn *conn)
+{
+long long int expiration;
+
+atomic_read_relaxed(&CONST_CAST(struct conn *, conn)->expiration,
+                    &expiration);
+return expiration;
+}
+
 static bool
 conn_expired(struct conn *conn, long long now)
 {
 if (conn->conn_type == CT_CONN_TYPE_DEFAULT) {
-ovs_mutex_lock(&conn->lock);
-bool expired = now >= conn->expiration ? true : false;
-ovs_mutex_unlock(&conn->lock);
-return expired;
+return now >= conn_expiration(conn);
 }
 return false;
 }
@@ -2545,7 +2550,7 @@ conn_to_ct_dpif_entry(const struct conn *conn, struct 
ct_dpif_entry *entry,
 entry->mark = conn->mark;
 memcpy(&entry->labels, &conn->label, sizeof entry->labels);
 
-long long expiration = conn->expiration - now;
+long long expiration = conn_expiration(conn) - now;
 
 struct ct_l4_proto *class = l4_protos[conn->key.nw_proto];
 if (class->conn_get_protoinfo) {
-- 
2.30.0



[ovs-dev] [PATCH v1 0/9] conntrack: improve multithread scalability

2021-02-17 Thread Gaetan Rivet
Conntracks are executed within the datapath. Locks along this path are crucial
and their critical section should be minimal. The global 'ct_lock' is necessary
before any action taken on connection states. This lock is needed for many
operations on the conntrack, slowing down the datapath.

The cleanup thread 'ct_clean' will take it to do its job. As it can hold it a
long time, the thread is limited in amount of connection cleaned per round,
and calls are rate-limited.

* Timeout policies locking is contrived to avoid deadlock.
  Any time a connection state is updated, the connection is unlocked in the
  middle of its update, 'ct_lock' is taken, then the connection is locked
  again; the reverse is done to unlock.

* Scalability is poor. The global ct_lock needs to be taken before applying
  any change to a conn object. This is backward: local changes to smaller
  objects should be independent, then the global lock should only be taken once
  the rest of the work is done, the goal being to have the smallest possible
  critical section.

It can be improved. Using RCU-friendly structures for connections, zone limits
and timeout policies, read-first workload is improved and the precedence of the
global 'ct_lock' and local 'conn->lock' can be inversed.
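
As a minimal sketch only -- this is not the actual OVS code, and 'struct ct',
'struct conn' and the two helper functions below are invented names -- the
lock-order change can be pictured like this:

#include <pthread.h>

struct ct   { pthread_mutex_t ct_lock; /* guards the shared lists */ };
struct conn { pthread_mutex_t lock;    /* guards one connection   */ };

/* Old order: global ct_lock first, held around the whole update. */
static void
conn_update_old(struct ct *ct, struct conn *conn)
{
    pthread_mutex_lock(&ct->ct_lock);
    pthread_mutex_lock(&conn->lock);
    /* ... update the connection state and its expiration list entry ... */
    pthread_mutex_unlock(&conn->lock);
    pthread_mutex_unlock(&ct->ct_lock);
}

/* New order: conn->lock first, ct_lock only around the shared-list write. */
static void
conn_update_new(struct ct *ct, struct conn *conn)
{
    pthread_mutex_lock(&conn->lock);
    /* ... update per-connection state ... */
    pthread_mutex_lock(&ct->ct_lock);
    /* ... requeue the connection on the shared expiration list ... */
    pthread_mutex_unlock(&ct->ct_lock);
    pthread_mutex_unlock(&conn->lock);
}

Readers of the zone limits, timeout policies and expiration lists go through
the RCU-friendly structures and no longer need 'ct_lock' at all, which is
where the scalability gain comes from.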

Running the conntrack benchmark we see these changes:
  ./tests/ovstest test-conntrack benchmark  300 32

code \ N      1     2     4     8
  Before   2310  2766  6117 19838  (ms)
   After   2072  2084  2653  4541  (ms)

One thread in the benchmark executes the task of a PMD, while the 'ct_clean' 
thread
runs in background as well.

Github actions: https://github.com/grivet/ovs/actions/runs/574446345

Gaetan Rivet (9):
  conntrack: Use rcu-lists to store conn expirations
  conntrack: Use a cmap to store zone limits
  conntrack: Init hash basis first at creation
  conntrack-tp: Use a cmap to store timeout policies
  conntrack: Inverse conn and ct lock precedence
  conntrack: Do not schedule zero ms timers
  conntrack: Do not rate limit ct-sweep
  conntrack: Do not log empty ct-sweep
  conntrack: Use an atomic conn expiration value

 lib/conntrack-private.h |  96 +--
 lib/conntrack-tp.c  | 137 +++---
 lib/conntrack.c | 206 +---
 lib/conntrack.h |   4 +-
 lib/dpif-netdev.c   |   5 +-
 5 files changed, 280 insertions(+), 168 deletions(-)

--
2.30.0



[ovs-dev] [PATCH v1 3/9] conntrack: Init hash basis first at creation

2021-02-17 Thread Gaetan Rivet
The 'hash_basis' field is used sometimes during sub-systems init
routine. It will be 0 by default before randomization. Sub-systems would
then init some nodes with incorrect hash values.

The timeout policies module is affected, making the default policy being
referenced using an incorrect hash value.

Fixes: 2078901a4c14 ("userspace: Add conntrack timeout policy support.")
Signed-off-by: Gaetan Rivet 
Reviewed-by: Eli Britstein 
---
 lib/conntrack.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index c218200dc..8062dcd6b 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -291,6 +291,11 @@ conntrack_init(void)
 static struct ovsthread_once setup_l4_once = OVSTHREAD_ONCE_INITIALIZER;
 struct conntrack *ct = xzalloc(sizeof *ct);
 
+/* This value can be used during init (e.g. timeout_policy_init()),
+ * set it first to ensure it is available.
+ */
+ct->hash_basis = random_uint32();
+
 ovs_rwlock_init(&ct->resources_lock);
 ovs_rwlock_wrlock(&ct->resources_lock);
 hmap_init(&ct->alg_expectations);
@@ -308,7 +313,6 @@ conntrack_init(void)
 timeout_policy_init(ct);
 ovs_mutex_unlock(&ct->ct_lock);
 
-ct->hash_basis = random_uint32();
 atomic_count_init(&ct->n_conn, 0);
 atomic_init(&ct->n_conn_limit, DEFAULT_N_CONN_LIMIT);
 atomic_init(&ct->tcp_seq_chk, true);
-- 
2.30.0



Re: [ovs-dev] [PATCH ovn v9 7/7] ovn-northd-ddlog: New implementation of ovn-northd based on ddlog.

2021-02-17 Thread Dumitru Ceara

On 2/17/21 2:49 AM, Ben Pfaff wrote:

On Fri, Dec 11, 2020 at 03:43:22PM +0100, Dumitru Ceara wrote:

diff --git a/tests/ovn.at b/tests/ovn.at
index 6e396895826f..2e4eaf7f12f7 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -16982,6 +16982,10 @@ AT_CLEANUP
  
  OVN_FOR_EACH_NORTHD([

  AT_SETUP([ovn -- IGMP snoop/querier/relay])
+
+dnl This test has problems with ovn-northd-ddlog.
+AT_SKIP_IF([test NORTHD_TYPE = ovn-northd-ddlog && test "$RUN_ANYWAY" != yes])


I might be wrong but I think the ddlog part still misses the logic from
this commit from the ovn-northd C implementation:

https://github.com/ovn-org/ovn/commit/97778ab3e422ac071faa67f9f477fd54977e9c04


Thanks very much, incorporating that fixed the IGMP and MLD
snoop/querier/relay tests.


No problem.



The interconnection test still fails, probably because I haven't
identified what's missing in that case yet.  Any insight is really
welcome.



I don't think there were any interconnection related changes that went 
into ovn-northd recently but maybe Han is the better person to ask.


Regards,
Dumitru



Re: [ovs-dev] [PATCH] ofproto-dpif-upcall: Fix ukey leak on udpif destroy.

2021-02-17 Thread Ilya Maximets
On 1/18/21 5:12 PM, Ilya Maximets wrote:
> Since commit 79eadafeb1b4 udpif_stop_threads() doesn't delete datapath
> flows while called from udpif_destroy().  This means that ukeys are
> not cleaned up either.  So, hash maps in udpif->ukeys[] might still
> contain valid pointers to ukeys that should be destroyed before
> destroying the hash map itself:
> 
>   ==2783089==ERROR: LeakSanitizer: detected memory leaks
> 
>   Direct leak of 1560 byte(s) in 1 object(s) allocated from:
> # 0 0x7f8a57eae667 in __interceptor_malloc (/lib64/libasan.so.6+0xb0667)
> # 1 0x8411f6 in xmalloc lib/util.c:138
> # 2 0x4d8a52 in ukey_create__ ofproto/ofproto-dpif-upcall.c:1682
> # 3 0x4d99e3 in ukey_create_from_upcall ofproto/ofproto-dpif-upcall.c:1751
> # 4 0x4d517d in upcall_xlate ofproto/ofproto-dpif-upcall.c:1242
> # 5 0x4d63d2 in process_upcall ofproto/ofproto-dpif-upcall.c:1414
> # 6 0x4d29f3 in recv_upcalls ofproto/ofproto-dpif-upcall.c:833
> # 7 0x4d1ee1 in udpif_upcall_handler ofproto/ofproto-dpif-upcall.c:750
> # 8 0x795aa2 in ovsthread_wrapper lib/ovs-thread.c:383
> # 9 0x7f8a57a59431 in start_thread (/lib64/libpthread.so.0+0x9431)
> 
> Fixes: 79eadafeb1b4 ("ofproto: Do not delete datapath flows on exit by 
> default.")
> Reported-by: Dumitru Ceara 
> Signed-off-by: Ilya Maximets 
> ---
>  ofproto/ofproto-dpif-upcall.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 5fae46adf..ccf97266c 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -491,6 +491,11 @@ udpif_destroy(struct udpif *udpif)
>  dpif_register_upcall_cb(udpif->dpif, NULL, udpif);
>  
>  for (int i = 0; i < N_UMAPS; i++) {
> +struct udpif_key *ukey;
> +
> +CMAP_FOR_EACH (ukey, cmap_node, &udpif->ukeys[i].cmap) {
> +ukey_delete__(ukey);
> +}
>  cmap_destroy(&udpif->ukeys[i].cmap);
>  ovs_mutex_destroy(&udpif->ukeys[i].mutex);
>  }
> 

William, if you have some time, could you, please, take a look at this patch?

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH ovn v9 7/7] ovn-northd-ddlog: New implementation of ovn-northd based on ddlog.

2021-02-17 Thread Dumitru Ceara

On 2/16/21 10:16 PM, Ben Pfaff wrote:

On Fri, Dec 11, 2020 at 03:43:22PM +0100, Dumitru Ceara wrote:

There are a few tests that fail (also northd-C tests) and I suspect it's
because the OVS IDL split patches [0] this commit depends on don't
include some of the recent fixes that went into the OVS IDL code, at
least the following:


I'm not sure whether the above comment was against an older form of the
idl-split patches that went into master, but to make sure I took a
careful look.


Yes, IIRC, back then the idl-split series that was under review didn't 
include the changes below.





- https://github.com/openvswitch/ovs/commit/97dbef6


I think that this is incorporated in the current OVS tree.  I see the
same code and comment in ovsdb-cs.c that was once in ovsdb-idl.c.


- https://github.com/openvswitch/ovs/commit/91a6a45
- https://github.com/openvswitch/ovs/commit/08e130a
- https://github.com/openvswitch/ovs/commit/f0d23f6


The above commits appear to me to be incorporated in the OVS tree
correctly.  Not much (any?) change was needed.  The idl-split didn't
have much effect on row tracking in general, because it's solidly on the
idl side of the split.



Now it's fine, thanks for double checking!



Re: [ovs-dev] [PATCH ovn] ofctrl: Do not link a desired flow twice.

2021-02-17 Thread Dumitru Ceara

On 2/17/21 5:24 PM, Mark Gray wrote:

On 16/02/2021 15:53, Dumitru Ceara wrote:

Depending on the logical flow matches, multiple SB flows can point to
the same desired flow.  If it happens that the desired flow conflicts
with a more restrictive (already installed) flow, then we shouldn't try
to add the desired flow multiple times to the list maintained for the
installed flow.

Fixes: 6f0b1e02d9ab ("ofctrl: Incremental processing for flow installation by 
tracking.")
Signed-off-by: Dumitru Ceara 
---
Note: I'm quite sure about the "Fixes" tag above, however, I couldn't
reproduce the issue without the following commit:
   dadae4f800cc ("ofctrl.c: Only merge actions for conjunctive flows.")
---
  controller/ofctrl.c |  3 +--
  tests/ovn.at| 19 +++
  2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/controller/ofctrl.c b/controller/ofctrl.c
index 9d62e12..f2bb93a 100644
--- a/controller/ofctrl.c
+++ b/controller/ofctrl.c
@@ -1975,8 +1975,7 @@ update_installed_flows_by_track(struct 
ovn_desired_flow_table *flow_table,
  /* The installed flow is installed for f, but f has change
   * tracked, so it must have been modified. */
   installed_flow_mod(&i->flow, &f->flow, msgs);
-ovn_flow_log(&i->flow, "updating installed (tracked)");


Why did you delete this log message?


Oops, completely by accident.
Thanks for spotting this!




-} else {
+} else if (!f->installed_flow) {
  /* Adding a new flow that conflicts with an existing installed
   * flow, so add it to the link.  If this flow becomes active,
   * e.g., it is less restrictive than the previous active flow
diff --git a/tests/ovn.at b/tests/ovn.at
index 55ea6d1..933958e 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -13955,6 +13955,25 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=45 
| ofctl_strip_all | \
   table=45, priority=1003,ip,metadata=0x1,nw_src=10.0.0.42 
actions=conjunction()
  ])
  
+# Add another ACL that overlaps with the existing less restrictive ones.

+check ovn-nbctl acl-add ls1 to-lport 3 'udp || ((ip4.src==10.0.0.1 || 
ip4.src==10.0.0.2) && (ip4.dst == 10.0.0.3 || ip4.dst == 10.0.0.4))' allow
+check ovn-nbctl --wait=hv sync
+


Please add a comment here it will help explain what is expected and
follows the pattern of other tests.


Sure, will do in v2.

Thanks for the review!

Regards,
Dumitru




+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=45 | ofctl_strip_all | \
+   grep "priority=1003" | \
+   sed 's/conjunction([[^)]]*)/conjunction()/g' | sort], [0], [dnl
+ table=45, priority=1003,conj_id=2,ip,metadata=0x1 actions=resubmit(,46)
+ table=45, priority=1003,conj_id=3,ip,metadata=0x1 actions=resubmit(,46)
+ table=45, priority=1003,conj_id=4,ip,metadata=0x1 actions=resubmit(,46)
+ table=45, priority=1003,ip,metadata=0x1,nw_dst=10.0.0.3 
actions=conjunction(),conjunction(),conjunction()
+ table=45, priority=1003,ip,metadata=0x1,nw_dst=10.0.0.4 
actions=conjunction(),conjunction(),conjunction()
+ table=45, priority=1003,ip,metadata=0x1,nw_src=10.0.0.1 actions=resubmit(,46)
+ table=45, priority=1003,ip,metadata=0x1,nw_src=10.0.0.2 
actions=conjunction(),conjunction()
+ table=45, priority=1003,ip,metadata=0x1,nw_src=10.0.0.42 actions=conjunction()
+ table=45, priority=1003,udp,metadata=0x1 actions=resubmit(,46)
+ table=45, priority=1003,udp6,metadata=0x1 actions=resubmit(,46)
+])
+
  OVN_CLEANUP([hv1])
  AT_CLEANUP
  







Re: [ovs-dev] [PATCH ovn] ofctrl: Do not link a desired flow twice.

2021-02-17 Thread Mark Gray
On 16/02/2021 15:53, Dumitru Ceara wrote:
> Depending on the logical flow matches, multiple SB flows can point to
> the same desired flow.  If it happens that the desired flow conflicts
> with a more restrictive (already installed) flow, then we shouldn't try
> to add the desired flow multiple times to the list maintained for the
> installed flow.
> 
> Fixes: 6f0b1e02d9ab ("ofctrl: Incremental processing for flow installation by 
> tracking.")
> Signed-off-by: Dumitru Ceara 
> ---
> Note: I'm quite sure about the "Fixes" tag above, however, I couldn't
> reproduce the issue without the following commit:
>   dadae4f800cc ("ofctrl.c: Only merge actions for conjunctive flows.")
> ---
>  controller/ofctrl.c |  3 +--
>  tests/ovn.at| 19 +++
>  2 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> index 9d62e12..f2bb93a 100644
> --- a/controller/ofctrl.c
> +++ b/controller/ofctrl.c
> @@ -1975,8 +1975,7 @@ update_installed_flows_by_track(struct 
> ovn_desired_flow_table *flow_table,
>  /* The installed flow is installed for f, but f has change
>   * tracked, so it must have been modified. */
>  installed_flow_mod(&i->flow, &f->flow, msgs);
> -ovn_flow_log(&i->flow, "updating installed (tracked)");

Why did you delete this log message?

> -} else {
> +} else if (!f->installed_flow) {
>  /* Adding a new flow that conflicts with an existing 
> installed
>   * flow, so add it to the link.  If this flow becomes active,
>   * e.g., it is less restrictive than the previous active flow
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 55ea6d1..933958e 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -13955,6 +13955,25 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int 
> table=45 | ofctl_strip_all | \
>   table=45, priority=1003,ip,metadata=0x1,nw_src=10.0.0.42 
> actions=conjunction()
>  ])
>  
> +# Add another ACL that overlaps with the existing less restrictive ones.
> +check ovn-nbctl acl-add ls1 to-lport 3 'udp || ((ip4.src==10.0.0.1 || 
> ip4.src==10.0.0.2) && (ip4.dst == 10.0.0.3 || ip4.dst == 10.0.0.4))' allow
> +check ovn-nbctl --wait=hv sync
> +

Please add a comment here it will help explain what is expected and
follows the pattern of other tests.

> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=45 | ofctl_strip_all | \
> +   grep "priority=1003" | \
> +   sed 's/conjunction([[^)]]*)/conjunction()/g' | sort], [0], [dnl
> + table=45, priority=1003,conj_id=2,ip,metadata=0x1 actions=resubmit(,46)
> + table=45, priority=1003,conj_id=3,ip,metadata=0x1 actions=resubmit(,46)
> + table=45, priority=1003,conj_id=4,ip,metadata=0x1 actions=resubmit(,46)
> + table=45, priority=1003,ip,metadata=0x1,nw_dst=10.0.0.3 
> actions=conjunction(),conjunction(),conjunction()
> + table=45, priority=1003,ip,metadata=0x1,nw_dst=10.0.0.4 
> actions=conjunction(),conjunction(),conjunction()
> + table=45, priority=1003,ip,metadata=0x1,nw_src=10.0.0.1 
> actions=resubmit(,46)
> + table=45, priority=1003,ip,metadata=0x1,nw_src=10.0.0.2 
> actions=conjunction(),conjunction()
> + table=45, priority=1003,ip,metadata=0x1,nw_src=10.0.0.42 
> actions=conjunction()
> + table=45, priority=1003,udp,metadata=0x1 actions=resubmit(,46)
> + table=45, priority=1003,udp6,metadata=0x1 actions=resubmit(,46)
> +])
> +
>  OVN_CLEANUP([hv1])
>  AT_CLEANUP
>  
> 



Re: [ovs-dev] [PATCH] Documentation: Fix DPDK qos example.

2021-02-17 Thread Eelco Chaudron



On 17 Feb 2021, at 9:23, Eelco Chaudron wrote:


On 17 Feb 2021, at 4:39, William Tu wrote:


RFC4115 says "The CIR and EIR are both measured in bits/s."
Fix the example use case based on the description.
64-Byte packet * 8 * 1000 pps = 512000


Did you run some tests to verify the changes you made?

Fixes: e61bdffc2a98 ("netdev-dpdk: Add new DPDK RFC 4115 egress 
policer")

Signed-off-by: William Tu 
---
 Documentation/topics/dpdk/qos.rst | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/Documentation/topics/dpdk/qos.rst 
b/Documentation/topics/dpdk/qos.rst

index 103495415a9c..e9a51ab3a3f0 100644
--- a/Documentation/topics/dpdk/qos.rst
+++ b/Documentation/topics/dpdk/qos.rst
@@ -69,22 +69,22 @@ to prioritize certain traffic over others at a 
port level.


 For example, the following configuration will limit the traffic rate 
at a
 port level to a maximum of 2000 packets a second (64 bytes IPv4 
packets).
-100pps as CIR (Committed Information Rate) and 1000pps as EIR 
(Excess
+1000pps as CIR (Committed Information Rate) and 1000pps as EIR 
(Excess


Ack this should be 1000

 Information Rate). High priority traffic is routed to queue 10, 
which marks
 all traffic as CIR, i.e. Green. All low priority traffic, queue 20, 
is

 marked as EIR, i.e. Yellow::

 $ ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
 --id=@myqos create qos type=trtcm-policer \
-other-config:cir=52000 other-config:cbs=2048 \
-other-config:eir=52000 other-config:ebs=2048  \


52000 is fine as our documentation states cir, eir are in bytes per 
second, minus the ethernet header.

So (64-12) * 1000 = 52000


+other-config:cir=512000 other-config:cbs=2048 \
+other-config:eir=512000 other-config:ebs=2048  \
 queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
  --id=@dpdk1Q10 create queue \
-  other-config:cir=4160 other-config:cbs=2048 \


This one should change to cir 2*52000 = 104000 also.


-  other-config:eir=0 other-config:ebs=0 -- \


This should stay unaltered.


+  other-config:cir=512000 other-config:cbs=2048 \
+  other-config:eir=512000 other-config:ebs=0 -- \


The eir should stay zero here


  --id=@dpdk1Q20 create queue \
other-config:cir=0 other-config:cbs=0 \
-   other-config:eir=4160 other-config:ebs=2048 \
+   other-config:eir=512000 other-config:ebs=2048 \


This should be 52000 also the trailing backslash can be removed.


 This configuration accomplishes that the high priority traffic has a
 guaranteed bandwidth egressing the ports at CIR (1000pps), but it 
can also


I’ll re-run some of my tests if you have not done so with the new 
config. Hopefully next week or the week after.


I found some time to re-test this, and with the new values it works as 
expected.

To be clear this is the correct set:

$ ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
--id=@myqos create qos type=trtcm-policer \
other-config:cir=52000 other-config:cbs=2048 \
other-config:eir=52000 other-config:ebs=2048  \
queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
 --id=@dpdk1Q10 create queue \
  other-config:cir=104000 other-config:cbs=2048 \
  other-config:eir=0 other-config:ebs=0 -- \
 --id=@dpdk1Q20 create queue \
   other-config:cir=0 other-config:cbs=0 \
   other-config:eir=52000 other-config:ebs=2048




Re: [ovs-dev] [PATCH] ofp-actions: Fix use-after-free while decoding RAW_ENCAP.

2021-02-17 Thread Ilya Maximets
On 2/17/21 2:47 AM, William Tu wrote:
> On Tue, Feb 16, 2021 at 2:27 PM Ilya Maximets  wrote:
>>
>> While decoding RAW_ENCAP action, decode_ed_prop() might re-allocate
>> ofpbuf if there is no enough space left.  However, function
>> 'decode_NXAST_RAW_ENCAP' continues to use old pointer to 'encap'
>> structure leading to write-after-free and incorrect decoding.
>>
>>   ==3549105==ERROR: AddressSanitizer: heap-use-after-free on address
>>   0x6060011a at pc 0x005f6cc6 bp 0x7ffc3a2d4410 sp 0x7ffc3a2d4408
>>   WRITE of size 2 at 0x6060011a thread T0
>> #0 0x5f6cc5 in decode_NXAST_RAW_ENCAP lib/ofp-actions.c:4461:20
>> #1 0x5f0551 in ofpact_decode ./lib/ofp-actions.inc2:4777:16
>> #2 0x5ed17c in ofpacts_decode lib/ofp-actions.c:7752:21
>> #3 0x5eba9a in ofpacts_pull_openflow_actions__ lib/ofp-actions.c:7791:13
>> #4 0x5eb9fc in ofpacts_pull_openflow_actions lib/ofp-actions.c:7835:12
>> #5 0x64bb8b in ofputil_decode_packet_out lib/ofp-packet.c:1113:17
>> #6 0x65b6f4 in ofp_print_packet_out lib/ofp-print.c:148:13
>> #7 0x659e3f in ofp_to_string__ lib/ofp-print.c:1029:16
>> #8 0x659b24 in ofp_to_string lib/ofp-print.c:1244:21
>> #9 0x65a28c in ofp_print lib/ofp-print.c:1288:28
>> #10 0x540d11 in ofctl_ofp_parse utilities/ovs-ofctl.c:2814:9
>> #11 0x564228 in ovs_cmdl_run_command__ lib/command-line.c:247:17
>> #12 0x56408a in ovs_cmdl_run_command lib/command-line.c:278:5
>> #13 0x5391ae in main utilities/ovs-ofctl.c:179:9
>> #14 0x7f6911ce9081 in __libc_start_main (/lib64/libc.so.6+0x27081)
>> #15 0x461fed in _start (utilities/ovs-ofctl+0x461fed)
>>
>> Fix that by getting a new pointer before using it.
>>
>> Credit to OSS-Fuzz.
>>
>> Fuzzer regression test will fail only with AddressSanitizer enabled.
>>
>> Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=27851
>> Fixes: f839892a206a ("OF support and translation of generic encap and decap")
>> Signed-off-by: Ilya Maximets 
> 
> 
> LGTM.
> Acked-by: William Tu 
> 

Thanks!
Applied to master and backported down to 2.11.

Patch doesn't apply cleanly to older branches, so I didn't backport it there.
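
For anyone reading the archive later, the bug class is worth spelling out: a
pointer computed into a growable buffer was kept across a call that can
reallocate (and therefore move) that buffer, and then written through.  Below
is a self-contained sketch of the pattern and of the kind of fix applied
here; the names and the buffer type are hypothetical, not the actual
ofp-actions.c / ofpbuf code:

    /* Self-contained sketch (hypothetical names, not the OVS code) of the
     * bug class fixed above: a pointer into a growable buffer is kept
     * across a call that may reallocate the buffer, then written through. */
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    struct buf {
        char *data;
        size_t size, capacity;
    };

    static void
    buf_grow(struct buf *b, size_t extra)
    {
        if (b->size + extra > b->capacity) {
            b->capacity = (b->size + extra) * 2;
            b->data = realloc(b->data, b->capacity);  /* may move the data;
                                                       * error handling elided */
        }
        memset(b->data + b->size, 0, extra);
        b->size += extra;
    }

    int
    main(void)
    {
        struct buf b = { .data = malloc(16), .size = 8, .capacity = 16 };
        size_t hdr_ofs = 0;
        uint16_t *n_props = (uint16_t *) (b.data + hdr_ofs);

        buf_grow(&b, 1024);               /* the buffer may have moved... */

        /* BUG: writing through the stale 'n_props' would be a
         * use-after-free.  The fix is the same idea as in the patch:
         * re-derive the pointer from the current base before using it. */
        n_props = (uint16_t *) (b.data + hdr_ofs);
        *n_props = 2;

        free(b.data);
        return 0;
    }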

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] conntrack: Add a test for IPv4 UDP zero checksum

2021-02-17 Thread Aaron Conole
William Tu  writes:

> Hi Aaron,
>
> Should we also consider the case where the udp checksum is 0xffff?
> I saw that in netdev_tnl_calc_udp_csum, we set it to 0xffff when a packet
> has udp checksum = 0.

I can add it.  This was added because we caught a PMD with broken firmware
that didn't support udp with checksum 0 and would flag it as a bad packet.
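
Background on the checksum question above: per RFC 768, a UDP checksum field
of zero on the wire means "no checksum was computed", so a sender whose
computed checksum happens to come out as zero transmits all-ones instead.  A
minimal illustrative sketch, with a hypothetical helper rather than the
actual netdev_tnl_calc_udp_csum() code:

    #include <stdint.h>
    #include <stdio.h>

    /* Illustration of the RFC 768 rule: a UDP checksum field of 0 means
     * "no checksum", so a computed 0 goes on the wire as 0xffff instead.
     * Hypothetical helper, not OVS's netdev_tnl_calc_udp_csum(). */
    static uint16_t
    udp_csum_for_wire(uint16_t computed)
    {
        return computed ? computed : 0xffff;
    }

    int
    main(void)
    {
        printf("0x%04x\n", udp_csum_for_wire(0));   /* prints 0xffff */
        return 0;
    }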

Thanks for the review!

> Thanks
> William
>
> On Wed, Feb 10, 2021 at 12:15 PM Flavio Leitner  wrote:
>>
>> On Wed, Feb 10, 2021 at 02:45:06PM -0500, Aaron Conole wrote:
>> > Flavio Leitner  writes:
>> >
>> > > On Wed, Feb 10, 2021 at 11:49:33AM -0500, Aaron Conole wrote:
>> > >> Recently, during some conntrack testing a bug was uncovered in a DPDK
>> > >> PMD, which doesn't support an IPv4 packet with a zero checksum value.
>> > >> In order to show that the connection tracking code in userspace
>> > >> supports IPv4 UDP with a zero checksum, add a test case to enforce
>> > >> this behavior.
>> > >>
>> > >> Reported-at: http://mails.dpdk.org/archives/dev/2021-January/198528.html
>> > >> Reported-by: Paolo Valerio 
>> > >> Signed-off-by: Aaron Conole 
>> > >> ---
>> > >>  tests/system-traffic.at | 43 +
>> > >>  1 file changed, 43 insertions(+)
>> > >>
>> > >> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>> > >> index fb5b9a36d2..4971ccc966 100644
>> > >> --- a/tests/system-traffic.at
>> > >> +++ b/tests/system-traffic.at
>> > >> @@ -5927,6 +5927,48 @@ ovs-appctl dpif/dump-flows br0
>> > >>  OVS_TRAFFIC_VSWITCHD_STOP
>> > >>  AT_CLEANUP
>> > >>
>> > >> +
>> > >> +AT_SETUP([conntrack - IPv4 UDP zero checksum])
>> > >> +dnl This tracks sending zero checksum packets for udp over ipv4
>> > >> +CHECK_CONNTRACK()
>> > >> +OVS_TRAFFIC_VSWITCHD_START()
>> > >> +OVS_CHECK_CT_CLEAR()
>> > >> +
>> > >> +ADD_NAMESPACES(at_ns0, at_ns1)
>> > >> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "f0:00:00:01:01:01")
>> > >> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", "f0:00:00:01:01:02")
>> > >> +dnl setup ct flows
>> > >> +AT_DATA([flows.txt], [dnl
>> > >> +table=0,priority=10  ip,udp,ct_state=-trk action=ct(zone=1,table=1)
>> > >> +table=0,priority=0   action=drop
>> > >> +table=1,priority=10  ct_state=-est+trk+new,ip,ct_zone=1,in_port=1 action=ct(commit,table=2)
>> > >> +table=1,priority=10  ct_state=+est-new+trk,ct_zone=1,in_port=1 action=resubmit(,2)
>> > >> +table=1,priority=0   action=drop
>> > >> +table=2,priority=10  ct_state=+trk+new,in_port=1 action=2
>> > >> +table=2,priority=10  ct_state=+trk+est action=2
>> > >> +])
>> > >> +
>> > >> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
>> > >> +
>> > >> +# sending udp pkt
>> > >> +NS_CHECK_EXEC([at_ns0], [$PYTHON3 $srcdir/sendpkt.py p0 f0 00 00 01
>> > >> 01 02 f0 00 00 01 01 01 08 00 45 00 00 28 00 01 00 00 40 11 64 c0 0a
>> > >> 01 01 01 0a 01 01 02 04 d2 04 d2 00 14 00 00 aa aa aa aa aa aa aa aa
>> > >> aa aa aa aa > /dev/null])
>> > >
>> > > That hex string translates to this packet which doesn't use cksum:
>> > > 12:57:40.065353 IP (tos 0x0, ttl 64, id 1, offset 0, flags [none],
>> > > proto UDP (17), length 40)
>> > > 10.1.1.1.search-agent > 10.1.1.2.search-agent: [no cksum] UDP,
>> > > length 12
>> >
>> > Yes, there should be no UDP checksum.  This is the point of this test.
>>
>> Sorry, I was confirming that the hex string does use a packet
>> as this patch says. It's good.
>>
>>
>> > >> +
>> > >> +sleep 1
>> > >
>> > > Can we use OVS_WAIT_UNTIL() or OVS_WAIT_WHILE() ?
>> >
>> > Good idea.
>> >
>> > >> +
>> > >> +dnl ensure CT picked up the packet
>> > >> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1)], [0], 
>> > >> [dnl
>> > >> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.1,sport=,dport=)
>> > >> +])
>> > >> +
>> > >> +AT_CHECK([ovs-ofctl dump-flows br0 | grep table=2, | 
>> > >> OFPROTO_CLEAR_DURATION_IDLE],
>> > >> + [0], [dnl
>> > >> + cookie=0x0, duration=, table=2, n_packets=1, n_bytes=54,
>> > >> idle_age=, priority=10,ct_state=+new+trk,in_port=1
>> > >> actions=output:2
>> > >> + cookie=0x0, duration=, table=2, n_packets=0, n_bytes=0,
>> > >> idle_age=, priority=10,ct_state=+est+trk actions=output:2
>> > >> +])
>> > >> +
>> > >> +OVS_TRAFFIC_VSWITCHD_STOP
>> > >> +AT_CLEANUP
>> > >> +
>> > >>  AT_SETUP([conntrack - Multiple ICMP traverse])
>> > >>  dnl This tracks sending ICMP packets via conntrack multiple times for 
>> > >> the
>> > >>  dnl same packet
>> > >> @@ -5971,6 +6013,7 @@ AT_CHECK([ovs-ofctl dump-flows br0 | grep
>> > >> table=2, | OFPROTO_CLEAR_DURATION_IDLE
>> > >>  OVS_TRAFFIC_VSWITCHD_STOP
>> > >>  AT_CLEANUP
>> > >>
>> > >> +
>> > >
>> > > There are mixed styles. Some use two lines, some use single line
>> > > and some use two lines with a dnl - or ^L.
>> > >
>> > > I guess we should stick to single line, but I don't have a
>> > > strong preference.
>> >
>> > Oops, this snuck in.  I will remove it, and we can do something there
>> > with a 

[ovs-dev] [PATCH v3] docs: Add instruction to set local_ip to ipsec tutorial

2021-02-17 Thread Balazs Nemeth
Signed-off-by: Balazs Nemeth 
---
 Documentation/tutorials/ipsec.rst | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/Documentation/tutorials/ipsec.rst b/Documentation/tutorials/ipsec.rst
index 3b3e42c59..b6cc1c3a8 100644
--- a/Documentation/tutorials/ipsec.rst
+++ b/Documentation/tutorials/ipsec.rst
@@ -273,7 +273,26 @@ external IP is 1.1.1.1, and `host_2`'s external IP is 2.2.2.2. Make sure
 authenticated; otherwise, any certificate signed by the CA would be
 accepted.

-3. Test IPsec tunnel.
+3. Set the `local_ip` field in the Interface table (Optional)
+
+Make sure that the `local_ip` field in the Interface table is set to the
+NIC used for egress traffic.
+
+On `host 1`::
+
+   # ovs-vsctl set Interface tun options:local_ip=$ip_1
+
+Similarly, on `host 2`::
+
+   # ovs-vsctl set Interface tun options:local_ip=$ip_2
+
+   .. note::
+
+It is not strictly necessary to set the `local_ip` field if your system
+only has one NIC or the default gateway interface is set to the NIC
+used for egress traffic.
+
+4. Test IPsec tunnel.

Now you should have an IPsec GRE tunnel running between two hosts. To verify
it, in `host_1`::
--
2.29.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] docs: Add instruction to set local_ip to ipsec tutorial

2021-02-17 Thread Balazs Nemeth
Signed-off-by: Balazs Nemeth 
---
 Documentation/tutorials/ipsec.rst | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/Documentation/tutorials/ipsec.rst b/Documentation/tutorials/ipsec.rst
index 3b3e42c59..6bcf3842e 100644
--- a/Documentation/tutorials/ipsec.rst
+++ b/Documentation/tutorials/ipsec.rst
@@ -144,6 +144,14 @@ external IP is 1.1.1.1, and `host_2`'s external IP is 2.2.2.2. Make sure
 options:remote_ip=$ip_1 \
 options:psk=swordfish

+  Set the "local_ip" field in the Interface table on host_1::
+
+  $ ovs-vsctl set Interface tun options:local_ip=$ip_1
+
+  Same on host_2::
+
+  $ ovs-vsctl set Interface tun options:local_ip=$ip_2
+
   .. note::

 Pre-shared key (PSK) based authentication is easy to set up but less
@@ -273,7 +281,26 @@ external IP is 1.1.1.1, and `host_2`'s external IP is 2.2.2.2. Make sure
 authenticated; otherwise, any certificate signed by the CA would be
 accepted.

-3. Test IPsec tunnel.
+3. Set the `local_ip` field in the Interface table (Optional)
+
+Make sure that the `local_ip` field in the Interface table is set to the
+NIC used for egress traffic.
+
+On `host 1`::
+
+   # ovs-vsctl set Interface tun options:local_ip=$ip_1
+
+Similarly, on `host 2`::
+
+   # ovs-vsctl set Interface tun options:local_ip=$ip_2
+
+   .. note::
+
+It is not strictly necessary to set the `local_ip` field if your system
+only has one NIC or the default gateway interface is set to the NIC
+used for egress traffic.
+
+4. Test IPsec tunnel.

Now you should have an IPsec GRE tunnel running between two hosts. To verify
it, in `host_1`::
--
2.29.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] northd: Provide the Gateway router option 'lb_force_snat_ip' to take router port ips.

2021-02-17 Thread Mark Gray
Thanks Numan, some suggestions below!

On 09/02/2021 18:44, num...@ovn.org wrote:
> From: Numan Siddique 
> 
> When a Gateway router is configured with a load balancer
> and it is also configured with options:lb_force_snat_ip=,
> OVN after load balancing the destination IP to one of the
> backend also does a NAT on the source ip with the
> lb_force_snat_ip if the packet is destined to a load balancer
> VIP.
> 
> There is a problem with the snat of source ip to 'lb_force_snat_ip'
> in one particular usecase.  When the packet enters the Gateway router
> from a provider logical switch destined to the load balancer VIP,
> then it is first load balanced to one of the backend and then
> the source ip is snatted to 'lb_force_snat_ip'.  If the chosen
> backend is reachable via the provider logical switch, then the
> packet is hairpinned back and it may hit the wire with
> the source ip 'lb_force_snat_ip'.  If 'lb_force_snat_ip' happens
> to be an OVN internal IP then the packet may be dropped.
> 
> This patch addresses this issue by providing the option to
> set the option - 'lb_force_snat_ip=router_ip'.  If 'router_ip'
> is set, then OVN will snat the load balanced packet to the
> router ip of the logical router port which is chosen as 'outport'
> in lr_in_ip_routing stage.

It almost feels like this should be the default behaviour?
> 
> Example.
> 
> If the gateway router is
> 
> ovn-nbctl show lr0
> router 68f20092-5563-44b8-9ccb-b11de3e3a66c (lr0)
> port lr0-sw0
> mac: "00:00:00:00:ff:01"
> networks: ["10.0.0.1/24"]
> port lr0-public
> mac: "00:00:20:20:12:13"
> networks: ["172.168.0.100/24"]
> 
> Then the below logical flows are added if 'lb_force_snat_ip'
> is configured to 'router_ip'.
> 
> table=1 (lr_out_snat), priority=110
>   match=(flags.force_snat_for_lb == 1 && ip4 && outport == "lr0-public"),
>   action=(ct_snat(172.168.0.100);)
> 
> table=1 (lr_out_snat), priority=110
>   match=(flags.force_snat_for_lb == 1 && ip4 && outport == "lr0-sw0"),
>   action=(ct_snat(10.0.0.1);)
> 
> For the above described scenario, the packet will have source ip as
> 172.168.0.100 which belongs to the provider logical switch CIDR.
> 
> Reported-by: Tim Rozet 
> Signed-off-by: Numan Siddique 
> ---
>  northd/ovn-northd.8.xml | 35 ++
>  northd/ovn-northd.c | 66 --
>  tests/ovn-northd.at | 79 +
>  3 files changed, 177 insertions(+), 3 deletions(-)
> 
> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> index 70065a36d9..27b28aff93 100644
> --- a/northd/ovn-northd.8.xml
> +++ b/northd/ovn-northd.8.xml

Should 'ovn-nb.xml' also be updated?

> @@ -3653,6 +3653,32 @@ nd_ns {
>flags.force_snat_for_dnat == 1  ip with an
>action ct_snat(B);.
>  
> +  
> +
> +  
> +
> +  If the Gateway router in the OVN Northbound database has been
> +  configured to force SNAT a packet (that has been previously
> +  load-balanced) using router IP (i.e  +  table="Logical_Router"/>:lb_force_snat_ip=router_ip), then for
> +  each logical router port P attached to the Gateway
> +  router, a priority-110 flow matches
> +  flags.force_snat_for_lb == 1  outport == 
> P
> +   with an action ct_snat(R);
> +  where R is the router port IP configured.

maybe rephrase to "is the IP configured on the router port."

> +  If R is an IPv4 address then the match will also
> +  include ip4 and if it is an IPv6 address, then the
> +  match will also include ip6.
> +
> +
> +
> +  If the logical router port P is configured with multiple
> +  IPv4 and multiple IPv6 addresses, only the first IPv4 and first 
> IPv6
> +  address is considered.

Should we log this condition?

> +
> +  
> +
> +  
>  
>If the Gateway router in the OVN Northbound database has been
>configured to force SNAT a packet (that has been previously
> @@ -3660,6 +3686,9 @@ nd_ns {
>flags.force_snat_for_lb == 1  ip with an
>action ct_snat(B);.
>  
> +  
> +
> +  
>  
>For each configuration in the OVN Northbound database, that asks
>to change the source IP address of a packet from an IP address of
> @@ -3673,14 +3702,18 @@ nd_ns {
>options, then the action would be ip4/6.src=
>(B).
>  
> +  
>  
> +  
>  
>If the NAT rule has allowed_ext_ips configured, then
>there is an additional match ip4.dst == allowed_ext_ips
>. Similarly, for IPV6, match would be ip6.dst ==
>allowed_ext_ips.
>  
> +  
>  
> +  
>  
>If the NAT rule has exempted_ext_ips set, then
>there is an additional flow configured at the priority + 1 of
> 

Re: [ovs-dev] [PATCH v1] ofproto-dpif-xlate: fix zone set from non frozen metadata field

2021-02-17 Thread Mark Gray
On 17/02/2021 10:40, 贺鹏 wrote:
> Hi,
> 
> Thanks for the review.
> 
Mark Gray  wrote on Wed, Feb 17, 2021 at 6:13 PM:
>>
>> I'm not too familiar with this code but I have some comments.
>>
>> On 15/02/2021 09:50, Peng He wrote:
>>> CT zone could be set from a field that is not included in frozen
>>> metedata. Consider the belowing cases which is normally used in
>>
>> Nits:
>>
>> s/metedata/metadata
>> s/belowing cases which is/cases below which are
> 
> sorry for the typo, will fix it in the next version
> 
>>
>>> OpenStack security group rules:
>>>
>>> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
>>> priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
>>>
>>> The zone is set from the first rule's ct action. These two rules will
>>> generate two megaflows: the first one uses zone=5 to query the CT module,
>>> the second one set zone from the first megaflow and commit to CT.
>>>
>>> The current implementation will generate a megaflow which does not use
>>> ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone 
>>> is
>>> set by an Imm not a field.
>>>
>>> Consider a situation that one changes the zone id (for example to 15)
>>> in the first rule however still keep the second rule unchanged. During
>>> this change, there is traffic hitting the two generated megaflows, the
>>> revalidator would revalidate all megaflows, however, the revalidator will
>>> not change the second megaflow, because zone=5 is recorded in the
>>> megaflow, so the xlate will still translate the commit action into zone=5,
>>> and the new traffic will still commit to CT as zone=5, not zone=15,
>>> resulting in traffic drops and other issues.
>>>
>>> Just like OVS set-field action, if the field X is set by Y, we should also
>>> mask Y as a match in the generated megaflow. An exception is that, if the 
>>> zone
>>> is set by the field that is included in the frozen state and this upcall is
>>> a resume of a thawed xlate, the masking can be skipped, as if one changes
>>> the previous rule, the generated recirc_id will be changed, and all 
>>> megaflows
>>> with the old recirc id will be invalid later, i.e. the revalidator will
>>> not reuse the old megaflows at all.
>>>
>>> Fixes: 07659514c3 ("Add support for connection tracking.")
>>> Reported-by: Sai Su 
>>> Signed-off-by: Peng He 
>>> ---
>>>  include/openvswitch/meta-flow.h |  1 +
>>>  lib/meta-flow.c | 13 +++
>>>  ofproto/ofproto-dpif-xlate.c| 15 +
>>>  tests/system-traffic.at | 39 +
>>>  4 files changed, 68 insertions(+)
>>>
>>> diff --git a/include/openvswitch/meta-flow.h 
>>> b/include/openvswitch/meta-flow.h
>>> index 95e52e358..045dce8f5 100644
>>> --- a/include/openvswitch/meta-flow.h
>>> +++ b/include/openvswitch/meta-flow.h
>>> @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
>>>const union mf_value *mask,
>>>struct flow *);
>>>  bool mf_is_tun_metadata(const struct mf_field *);
>>> +bool mf_is_frozen_metadata(const struct mf_field *);
>>>  bool mf_is_pipeline_field(const struct mf_field *);
>>>  bool mf_is_set(const struct mf_field *, const struct flow *);
>>>  void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
>>> diff --git a/lib/meta-flow.c b/lib/meta-flow.c
>>> index c808d205d..e03cd8d0c 100644
>>> --- a/lib/meta-flow.c
>>> +++ b/lib/meta-flow.c
>>> @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf)
>>> mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
>>>  }
>>>
>>> +bool
>>> +mf_is_frozen_metadata(const struct mf_field *mf)
>>> +{
>>> +if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
>>> +return true;
>>> +}
>>> +
>>> +if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
>>> +return true;
>>> +}
>>> +return false;
>>> +}
>>> +
>>>  bool
>>>  mf_is_pipeline_field(const struct mf_field *mf)
>>>  {
>>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>>> index 7108c8a30..5d1f029fd 100644
>>> --- a/ofproto/ofproto-dpif-xlate.c
>>> +++ b/ofproto/ofproto-dpif-xlate.c
>>> @@ -6212,6 +6212,21 @@ compose_conntrack_action(struct xlate_ctx *ctx, 
>>> struct ofpact_conntrack *ofc,
>>> >xin->flow, ctx->wc, zone);
>>>  }
>>>  }
>>> +
>>> +if (ofc->zone_src.field) {
>>> +if (ctx->xin->frozen_state) {
>>> +/* If the upcall is a resume of a recirculation, we only need 
>>> to
>>> + * unwildcard the fields that are not in the frozen_metadata, 
>>> as
>>> + * when the rules update, OVS will generate a new recirc_id,
>>> + * which will invalidate the megaflow with the old recirc_id.
>>> + */
>>> +if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
>>> +mf_mask_field(ofc->zone_src.field, ctx->wc);
>> Is this 

Re: [ovs-dev] [PATCH v1] ofproto-dpif-xlate: fix zone set from non frozen metadata field

2021-02-17 Thread 贺鹏
Hi,

Thanks for the review.

Mark Gray  wrote on Wed, Feb 17, 2021 at 6:13 PM:
>
> I'm not too familiar with this code but I have some comments.
>
> On 15/02/2021 09:50, Peng He wrote:
> > CT zone could be set from a field that is not included in frozen
> > metedata. Consider the belowing cases which is normally used in
>
> Nits:
>
> s/metedata/metadata
> s/belowing cases which is/cases below which are

sorry for the typo, will fix it in the next version

>
> > OpenStack security group rules:
> >
> > priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> > priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> >
> > The zone is set from the first rule's ct action. These two rules will
> > generate two megaflows: the first one uses zone=5 to query the CT module,
> > the second one set zone from the first megaflow and commit to CT.
> >
> > The current implementation will generate a megaflow which does not use
> > ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone 
> > is
> > set by an Imm not a field.
> >
> > Consider a situation that one changes the zone id (for example to 15)
> > in the first rule however still keep the second rule unchanged. During
> > this change, there is traffic hitting the two generated megaflows, the
> > revalidator would revalidate all megaflows, however, the revalidator will
> > not change the second megaflow, because zone=5 is recorded in the
> > megaflow, so the xlate will still translate the commit action into zone=5,
> > and the new traffic will still commit to CT as zone=5, not zone=15,
> > resulting in traffic drops and other issues.
> >
> > Just like OVS set-field action, if the field X is set by Y, we should also
> > mask Y as a match in the generated megaflow. An exception is that, if the 
> > zone
> > is set by the field that is included in the frozen state and this upcall is
> > a resume of a thawed xlate, the masking can be skipped, as if one changes
> > the previous rule, the generated recirc_id will be changed, and all 
> > megaflows
> > with the old recirc id will be invalid later, i.e. the revalidator will
> > not reuse the old megaflows at all.
> >
> > Fixes: 07659514c3 ("Add support for connection tracking.")
> > Reported-by: Sai Su 
> > Signed-off-by: Peng He 
> > ---
> >  include/openvswitch/meta-flow.h |  1 +
> >  lib/meta-flow.c | 13 +++
> >  ofproto/ofproto-dpif-xlate.c| 15 +
> >  tests/system-traffic.at | 39 +
> >  4 files changed, 68 insertions(+)
> >
> > diff --git a/include/openvswitch/meta-flow.h 
> > b/include/openvswitch/meta-flow.h
> > index 95e52e358..045dce8f5 100644
> > --- a/include/openvswitch/meta-flow.h
> > +++ b/include/openvswitch/meta-flow.h
> > @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
> >const union mf_value *mask,
> >struct flow *);
> >  bool mf_is_tun_metadata(const struct mf_field *);
> > +bool mf_is_frozen_metadata(const struct mf_field *);
> >  bool mf_is_pipeline_field(const struct mf_field *);
> >  bool mf_is_set(const struct mf_field *, const struct flow *);
> >  void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
> > diff --git a/lib/meta-flow.c b/lib/meta-flow.c
> > index c808d205d..e03cd8d0c 100644
> > --- a/lib/meta-flow.c
> > +++ b/lib/meta-flow.c
> > @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf)
> > mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
> >  }
> >
> > +bool
> > +mf_is_frozen_metadata(const struct mf_field *mf)
> > +{
> > +if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
> > +return true;
> > +}
> > +
> > +if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
> > +return true;
> > +}
> > +return false;
> > +}
> > +
> >  bool
> >  mf_is_pipeline_field(const struct mf_field *mf)
> >  {
> > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> > index 7108c8a30..5d1f029fd 100644
> > --- a/ofproto/ofproto-dpif-xlate.c
> > +++ b/ofproto/ofproto-dpif-xlate.c
> > @@ -6212,6 +6212,21 @@ compose_conntrack_action(struct xlate_ctx *ctx, 
> > struct ofpact_conntrack *ofc,
> > >xin->flow, ctx->wc, zone);
> >  }
> >  }
> > +
> > +if (ofc->zone_src.field) {
> > +if (ctx->xin->frozen_state) {
> > +/* If the upcall is a resume of a recirculation, we only need 
> > to
> > + * unwildcard the fields that are not in the frozen_metadata, 
> > as
> > + * when the rules update, OVS will generate a new recirc_id,
> > + * which will invalidate the megaflow with the old recirc_id.
> > + */
> > +if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> > +mf_mask_field(ofc->zone_src.field, ctx->wc);
> Is this the only field we should check and un-wildcard here. This 

Re: [ovs-dev] [PATCH v1] ofproto-dpif-xlate: fix zone set from non frozen metadata field

2021-02-17 Thread Mark Gray
I'm not too familiar with this code but I have some comments.

On 15/02/2021 09:50, Peng He wrote:
> CT zone could be set from a field that is not included in frozen
> metedata. Consider the belowing cases which is normally used in

Nits:

s/metedata/metadata
s/belowing cases which is/cases below which are

> OpenStack security group rules:
> 
> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> 
> The zone is set from the first rule's ct action. These two rules will
> generate two megaflows: the first one uses zone=5 to query the CT module,
> the second one set zone from the first megaflow and commit to CT.
> 
> The current implementation will generate a megaflow which does not use
> ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
> set by an Imm not a field.
> 
> Consider a situation that one changes the zone id (for example to 15)
> in the first rule however still keep the second rule unchanged. During
> this change, there is traffic hitting the two generated megaflows, the
> revalidator would revalidate all megaflows, however, the revalidator will
> not change the second megaflow, because zone=5 is recorded in the
> megaflow, so the xlate will still translate the commit action into zone=5,
> and the new traffic will still commit to CT as zone=5, not zone=15,
> resulting in traffic drops and other issues.
> 
> Just like OVS set-field action, if the field X is set by Y, we should also
> mask Y as a match in the generated megaflow. An exception is that, if the zone
> is set by the field that is included in the frozen state and this upcall is
> a resume of a thawed xlate, the masking can be skipped, as if one changes
> the previous rule, the generated recirc_id will be changed, and all megaflows
> with the old recirc id will be invalid later, i.e. the revalidator will
> not reuse the old megaflows at all.
> 
> Fixes: 07659514c3 ("Add support for connection tracking.")
> Reported-by: Sai Su 
> Signed-off-by: Peng He 
> ---
>  include/openvswitch/meta-flow.h |  1 +
>  lib/meta-flow.c | 13 +++
>  ofproto/ofproto-dpif-xlate.c| 15 +
>  tests/system-traffic.at | 39 +
>  4 files changed, 68 insertions(+)
> 
> diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
> index 95e52e358..045dce8f5 100644
> --- a/include/openvswitch/meta-flow.h
> +++ b/include/openvswitch/meta-flow.h
> @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
>const union mf_value *mask,
>struct flow *);
>  bool mf_is_tun_metadata(const struct mf_field *);
> +bool mf_is_frozen_metadata(const struct mf_field *);
>  bool mf_is_pipeline_field(const struct mf_field *);
>  bool mf_is_set(const struct mf_field *, const struct flow *);
>  void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
> diff --git a/lib/meta-flow.c b/lib/meta-flow.c
> index c808d205d..e03cd8d0c 100644
> --- a/lib/meta-flow.c
> +++ b/lib/meta-flow.c
> @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf)
> mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
>  }
>  
> +bool
> +mf_is_frozen_metadata(const struct mf_field *mf)
> +{
> +if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
> +return true;
> +}
> +
> +if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
> +return true;
> +}
> +return false;
> +}
> +
>  bool
>  mf_is_pipeline_field(const struct mf_field *mf)
>  {
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 7108c8a30..5d1f029fd 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -6212,6 +6212,21 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct 
> ofpact_conntrack *ofc,
> >xin->flow, ctx->wc, zone);
>  }
>  }
> +
> +if (ofc->zone_src.field) {
> +if (ctx->xin->frozen_state) {
> +/* If the upcall is a resume of a recirculation, we only need to
> + * unwildcard the fields that are not in the frozen_metadata, as
> + * when the rules update, OVS will generate a new recirc_id,
> + * which will invalidate the megaflow with the old recirc_id.
> + */
> +if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> +mf_mask_field(ofc->zone_src.field, ctx->wc);
Is this the only field we should check and un-wildcard here? This seems
like it would be applicable across other fields.
> +}
> +} else {
> +mf_mask_field(ofc->zone_src.field, ctx->wc);
> +}
> +}

Add a new line after the bracket

>  nl_msg_put_u16(ctx->odp_actions, OVS_CT_ATTR_ZONE, zone);
>  put_ct_mark(>xin->flow, ctx->odp_actions, ctx->wc);
>  

Re: [ovs-dev] [PATCH] Documentation: Fix DPDK qos example.

2021-02-17 Thread Eelco Chaudron



On 17 Feb 2021, at 4:39, William Tu wrote:


RFC4115 says "The CIR and EIR are both measured in bits/s."
Fix the example use case based on the description.
64-Byte packet * 8 * 1000 pps = 512000


Did you run some tests to verify the changes you made?

Fixes: e61bdffc2a98 ("netdev-dpdk: Add new DPDK RFC 4115 egress policer")

Signed-off-by: William Tu 
---
 Documentation/topics/dpdk/qos.rst | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/Documentation/topics/dpdk/qos.rst b/Documentation/topics/dpdk/qos.rst

index 103495415a9c..e9a51ab3a3f0 100644
--- a/Documentation/topics/dpdk/qos.rst
+++ b/Documentation/topics/dpdk/qos.rst
@@ -69,22 +69,22 @@ to prioritize certain traffic over others at a port level.


 For example, the following configuration will limit the traffic rate at a
 port level to a maximum of 2000 packets a second (64 bytes IPv4 packets).

-100pps as CIR (Committed Information Rate) and 1000pps as EIR (Excess
+1000pps as CIR (Committed Information Rate) and 1000pps as EIR (Excess
 Information Rate). High priority traffic is routed to queue 10, which marks
 all traffic as CIR, i.e. Green. All low priority traffic, queue 20, is

 marked as EIR, i.e. Yellow::

 $ ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
 --id=@myqos create qos type=trtcm-policer \
-other-config:cir=52000 other-config:cbs=2048 \
-other-config:eir=52000 other-config:ebs=2048  \
+other-config:cir=512000 other-config:cbs=2048 \
+other-config:eir=512000 other-config:ebs=2048  \
 queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
  --id=@dpdk1Q10 create queue \
-  other-config:cir=4160 other-config:cbs=2048 \
-  other-config:eir=0 other-config:ebs=0 -- \
+  other-config:cir=512000 other-config:cbs=2048 \
+  other-config:eir=512000 other-config:ebs=0 -- \


The eir should stay zero here


  --id=@dpdk1Q20 create queue \
other-config:cir=0 other-config:cbs=0 \
-   other-config:eir=4160 other-config:ebs=2048 \
+   other-config:eir=512000 other-config:ebs=2048 \

 This configuration accomplishes that the high priority traffic has a
 guaranteed bandwidth egressing the ports at CIR (1000pps), but it can also


I’ll re-run some of my tests if you have not done so with the new config.
Hopefully next week or the week after.


//Eelco

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev