Re: [ovs-dev] [PATCH ovn 2/3] northd: update stage-name if changed

2021-06-21 Thread Han Zhou
On Sat, Jun 19, 2021 at 2:51 AM Mark Gray  wrote:
>
> If a new table is added to a logical flow pipeline, the mapping between
> 'external_ids:stage-name' from the 'Logical_Flow' table in the
> 'OVN_Southbound' database and the 'stage' number may change for some
tables.
>
> If 'ovn-northd' is started against a populated Southbound database,
> 'external_ids:stage-name' will not be updated to reflect the new, correct
> name. This will cause the stage name to be incorrectly displayed by some
> tools and commands such as `ovn-sbctl dump-flows`.
>
> This commit, reconciles changes to the stage name as part of
build_lflows().
>

This is interesting. It means a flow F existed in stage S (named "foo") of
ovn-northd version V1, and in version V2, S becomes S + 1 ("foo"), but in
the V2's stage S ("bar") there happens to be an exactly same flow like F
existing, but the stage name shown in ovn-sbctl dump-flows will be "foo"
instead of "bar". Is this the scenario the patch is trying to fix?

If so, I think it is better not only fixing the "stage-name" but also other
external_ids, including "source" and "stage-hint", for the same reason.

Thanks,
Han

> Suggested-by: Ilya Maximets 
> Signed-off-by: Mark Gray 
> ---
>  northd/ovn-northd.c | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index aae7314c8977..d97ab4a5b39c 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -12412,6 +12412,13 @@ build_lflows(struct northd_context *ctx, struct
hmap *datapaths,
>  ovn_stage_build(dp_type, pipeline, sbflow->table_id),
>  sbflow->priority, sbflow->match, sbflow->actions,
sbflow->hash);
>  if (lflow) {
> +const char *stage_name = smap_get_def(>external_ids,
> +  "stage-name", "");
> +if (strcmp(stage_name, ovn_stage_to_str(lflow->stage))) {
> +sbrec_logical_flow_update_external_ids_setkey(sbflow,
> + "stage-name", ovn_stage_to_str(lflow->stage));
> +}
> +
>  /* This is a valid lflow.  Checking if the datapath group
needs
>   * updates. */
>  bool update_dp_group = false;
> @@ -14197,6 +14204,8 @@ main(int argc, char *argv[])
>  add_column_noalert(ovnsb_idl_loop.idl,
_logical_flow_col_priority);
>  add_column_noalert(ovnsb_idl_loop.idl,
_logical_flow_col_match);
>  add_column_noalert(ovnsb_idl_loop.idl,
_logical_flow_col_actions);
> +ovsdb_idl_add_column(ovnsb_idl_loop.idl,
> + _logical_flow_col_external_ids);
>
>  ovsdb_idl_add_table(ovnsb_idl_loop.idl,
>  _table_logical_dp_group);
> --
> 2.27.0
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 1/3] system-ovn.at: fix typo

2021-06-21 Thread Han Zhou
Acked-by: Han Zhou 

On Sat, Jun 19, 2021 at 2:51 AM Mark Gray  wrote:

> Signed-off-by: Mark Gray 
> ---
>  tests/system-ovn.at | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> index 1b8bb3803def..552fdae52665 100644
> --- a/tests/system-ovn.at
> +++ b/tests/system-ovn.at
> @@ -23,7 +23,7 @@ start_daemon ovn-controller
>
>  # Logical network:
>  # Two LRs - R1 and R2 that are connected to each other via LS "join"
> -# in 20.0.0.0/24 network. R1 has switchess foo (192.168.1.0/24) and
> +# in 20.0.0.0/24 network. R1 has switches foo (192.168.1.0/24) and
>  # bar (192.168.2.0/24) connected to it. R2 has alice (172.16.1.0/24)
> connected
>  # to it.  R2 is a gateway router on which we add NAT rules.
>  #
> --
> 2.27.0
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2] ovn.at: Fix test "virtual ports -- ovn-northd-ddlog".

2021-06-21 Thread Han Zhou
On Thu, Jun 17, 2021 at 12:59 PM Mark Michelson  wrote:
>
> On 6/14/21 2:44 PM, Ben Pfaff wrote:
> > On Fri, Jun 11, 2021 at 03:48:52PM -0700, Han Zhou wrote:
> >> The test case fails quite often for northd-ddlog because of the tunnel
> >> keys mismatch when comparing OpenFlow rules. Keys can change in
> >> different runs. This patch fixes it by extracting the expected keys
from
> >> SB DB before comparison instead of hardcoding.
> >>
> >> There are some other potential timing issues in this test and this
> >> patch fixes them as well by replacing AT_CHECK with OVS_WAIT_UNTIL.
> >>
> >> Signed-off-by: Han Zhou 
> >
> > Awesome!  Thank you.
> >
> >> -AT_CHECK([test x$(ovn-sbctl --bare --columns chassis find
port_binding \
> >> +OVS_WAIT_UNTIL([test x$(ovn-sbctl --bare --columns chassis find
port_binding \
> >>   logical_port=sw0-vir) = x], [0], [])
> >
> > I think the above can be better written:
> >  wait_row_count Port_Binding 0 logical_port=sw0-vir
>
> I don't think this is correct. The test is not attempting to wait for
> the Port_Binding record to be deleted. It's waiting for the chassis
> column in the Port_Binding to contain an empty string. I think
> wait_column() could work:
>
> wait_column "" Port_Binding chassis logical_port=sw0-vir
>
> (assuming that testing for an empty string works)
>

Thanks Ben and Mark! I used wait_column in v4:
https://patchwork.ozlabs.org/project/ovn/patch/20210622015529.2005615-1-hz...@ovn.org/

> >
> >
> >>   # Cleanup hv1-vif3.
> >>   as hv1
> >>   ovs-vsctl del-port hv1-vif3
> >>
> >> -AT_CHECK([test x$(ovn-sbctl --bare --columns chassis find
port_binding \
> >> +OVS_WAIT_UNTIL([test x$(ovn-sbctl --bare --columns chassis find
port_binding \
> >>   logical_port=sw0-vir) = x], [0], [])
> >
> > Ditto?
> >
> >> +sw0_dp_key=$(fetch_column Datapath_Binding tunnel_key
external_ids:name=sw0)
> >> +lr0_dp_key=$(fetch_column Datapath_Binding tunnel_key
external_ids:name=lr0)
> >> +lr0_public_dp_key=$(fetch_column Port_Binding tunnel_key
logical_port=lr0-public)
> >
> > I think that the above will retrieve tunnel keys in decimal...
> >
> >> +AT_CHECK_UNQUOTED([as $hv ovs-ofctl dump-flows br-int table=44 |
ofctl_strip_all | grep "priority=2000"], [0], [dnl
> >> + table=44, priority=2000,ip,metadata=0x$sw0_dp_key
actions=resubmit(,45)
> >> + table=44, priority=2000,ipv6,metadata=0x$sw0_dp_key
actions=resubmit(,45)
> >>   ])
> >
> > ...therefore I think that the above 0x should not be there.  (I guess
> > the test passes because the numbers in the test are all under 10.)
>
> Yeah, it should probably be fine for this test to not worry about this.
>
I changed it to hex format in v4.

Thanks,
Han
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn v4] ovn.at: Fix test "virtual ports -- ovn-northd-ddlog".

2021-06-21 Thread Han Zhou
The test case fails quite often for northd-ddlog because of the tunnel
keys mismatch when comparing OpenFlow rules. Keys can change in
different runs. This patch fixes it by extracting the expected keys from
SB DB before comparison instead of hardcoding.

There are some other potential timing issues in this test and this
patch fixes them as well by replacing AT_CHECK with wait_column.

Signed-off-by: Han Zhou 
---
v1 -> v2: addresses Mark Michelson's comments - adding more fixes to potential
  timing issues.
v2 -> v3: addresses Ben and Mark's comments:
- Using wait_column instead of OVS_WAIT_UNTIL.
- Using hex instead of decimal for the tunnel key match.
v3 -> v4: correcting a silly mistake of hex conversion in v3.

 tests/ovn.at | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/tests/ovn.at b/tests/ovn.at
index bc494fcad..edd9a2042 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -17311,27 +17311,29 @@ logical_port=sw0-vir) = x])
 as hv1
 ovs-vsctl set interface hv1-vif3 external-ids:iface-id=sw0-vir
 
-AT_CHECK([test x$(ovn-sbctl --bare --columns chassis find port_binding \
-logical_port=sw0-vir) = x], [0], [])
+wait_column "" Port_Binding chassis logical_port=sw0-vir
 
 # Cleanup hv1-vif3.
 as hv1
 ovs-vsctl del-port hv1-vif3
 
-AT_CHECK([test x$(ovn-sbctl --bare --columns chassis find port_binding \
-logical_port=sw0-vir) = x], [0], [])
+wait_column "" Port_Binding chassis logical_port=sw0-vir
 
 check_virtual_offlows_present() {
 hv=$1
 
-AT_CHECK([as $hv ovs-ofctl dump-flows br-int table=44 | ofctl_strip_all | 
grep "priority=2000"], [0], [dnl
- table=44, priority=2000,ip,metadata=0x1 actions=resubmit(,45)
- table=44, priority=2000,ipv6,metadata=0x1 actions=resubmit(,45)
+sw0_dp_key=$(printf "%x" $(fetch_column Datapath_Binding tunnel_key 
external_ids:name=sw0))
+lr0_dp_key=$(printf "%x" $(fetch_column Datapath_Binding tunnel_key 
external_ids:name=lr0))
+lr0_public_dp_key=$(printf "%x" $(fetch_column Port_Binding tunnel_key 
logical_port=lr0-public))
+
+AT_CHECK_UNQUOTED([as $hv ovs-ofctl dump-flows br-int table=44 | 
ofctl_strip_all | grep "priority=2000"], [0], [dnl
+ table=44, priority=2000,ip,metadata=0x$sw0_dp_key actions=resubmit(,45)
+ table=44, priority=2000,ipv6,metadata=0x$sw0_dp_key actions=resubmit(,45)
 ])
 
-AT_CHECK([as $hv ovs-ofctl dump-flows br-int table=11 | ofctl_strip_all | \
+AT_CHECK_UNQUOTED([as $hv ovs-ofctl dump-flows br-int table=11 | 
ofctl_strip_all | \
 grep "priority=92" | grep 172.168.0.50], [0], [dnl
- table=11, 
priority=92,arp,reg14=0x3,metadata=0x3,arp_tpa=172.168.0.50,arp_op=1 
actions=move:NXM_OF_ETH_SRC[[]]->NXM_OF_ETH_DST[[]],mod_dl_src:10:54:00:00:00:10,load:0x2->NXM_OF_ARP_OP[[]],move:NXM_NX_ARP_SHA[[]]->NXM_NX_ARP_THA[[]],load:0x10540010->NXM_NX_ARP_SHA[[]],push:NXM_OF_ARP_SPA[[]],push:NXM_OF_ARP_TPA[[]],pop:NXM_OF_ARP_SPA[[]],pop:NXM_OF_ARP_TPA[[]],move:NXM_NX_REG14[[]]->NXM_NX_REG15[[]],load:0x1->NXM_NX_REG10[[0]],resubmit(,37)
+ table=11, 
priority=92,arp,reg14=0x$lr0_public_dp_key,metadata=0x$lr0_dp_key,arp_tpa=172.168.0.50,arp_op=1
 
actions=move:NXM_OF_ETH_SRC[[]]->NXM_OF_ETH_DST[[]],mod_dl_src:10:54:00:00:00:10,load:0x2->NXM_OF_ARP_OP[[]],move:NXM_NX_ARP_SHA[[]]->NXM_NX_ARP_THA[[]],load:0x10540010->NXM_NX_ARP_SHA[[]],push:NXM_OF_ARP_SPA[[]],push:NXM_OF_ARP_TPA[[]],pop:NXM_OF_ARP_SPA[[]],pop:NXM_OF_ARP_TPA[[]],move:NXM_NX_REG14[[]]->NXM_NX_REG15[[]],load:0x1->NXM_NX_REG10[[0]],resubmit(,37)
 ])
 }
 
-- 
2.30.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v3] ovn.at: Fix test "virtual ports -- ovn-northd-ddlog".

2021-06-21 Thread Han Zhou
On Mon, Jun 21, 2021 at 6:37 PM Han Zhou  wrote:
>
> The test case fails quite often for northd-ddlog because of the tunnel
> keys mismatch when comparing OpenFlow rules. Keys can change in
> different runs. This patch fixes it by extracting the expected keys from
> SB DB before comparison instead of hardcoding.
>
> There are some other potential timing issues in this test and this
> patch fixes them as well by replacing AT_CHECK with wait_column.
>
> Signed-off-by: Han Zhou 
> ---
> v1 -> v2: addresses Mark Michelson's comments - adding more fixes to
potential
>   timing issues.
> v2 -> v3: addresses Ben and Mark's comments:
> - Using wait_column instead of OVS_WAIT_UNTIL.
> - Using hex instead of decimal for the tunnel key match.
>
>  tests/ovn.at | 20 +++-
>  1 file changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/tests/ovn.at b/tests/ovn.at
> index bc494fcad..914e07636 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -17311,27 +17311,29 @@ logical_port=sw0-vir) = x])
>  as hv1
>  ovs-vsctl set interface hv1-vif3 external-ids:iface-id=sw0-vir
>
> -AT_CHECK([test x$(ovn-sbctl --bare --columns chassis find port_binding \
> -logical_port=sw0-vir) = x], [0], [])
> +wait_column "" Port_Binding chassis logical_port=sw0-vir
>
>  # Cleanup hv1-vif3.
>  as hv1
>  ovs-vsctl del-port hv1-vif3
>
> -AT_CHECK([test x$(ovn-sbctl --bare --columns chassis find port_binding \
> -logical_port=sw0-vir) = x], [0], [])
> +wait_column "" Port_Binding chassis logical_port=sw0-vir
>
>  check_virtual_offlows_present() {
>  hv=$1
>
> -AT_CHECK([as $hv ovs-ofctl dump-flows br-int table=44 |
ofctl_strip_all | grep "priority=2000"], [0], [dnl
> - table=44, priority=2000,ip,metadata=0x1 actions=resubmit(,45)
> - table=44, priority=2000,ipv6,metadata=0x1 actions=resubmit(,45)
> +sw0_dp_key=$((16#$(fetch_column Datapath_Binding tunnel_key
external_ids:name=sw0)))
> +lr0_dp_key=$((16#$(fetch_column Datapath_Binding tunnel_key
external_ids:name=lr0)))
> +lr0_public_dp_key=$((16#$(fetch_column Port_Binding tunnel_key
logical_port=lr0-public)))

Sorry, this is not correct. Please ignore this version. I will send v4.

> +
> +AT_CHECK_UNQUOTED([as $hv ovs-ofctl dump-flows br-int table=44 |
ofctl_strip_all | grep "priority=2000"], [0], [dnl
> + table=44, priority=2000,ip,metadata=0x$sw0_dp_key actions=resubmit(,45)
> + table=44, priority=2000,ipv6,metadata=0x$sw0_dp_key
actions=resubmit(,45)
>  ])
>
> -AT_CHECK([as $hv ovs-ofctl dump-flows br-int table=11 |
ofctl_strip_all | \
> +AT_CHECK_UNQUOTED([as $hv ovs-ofctl dump-flows br-int table=11 |
ofctl_strip_all | \
>  grep "priority=92" | grep 172.168.0.50], [0], [dnl
> - table=11,
priority=92,arp,reg14=0x3,metadata=0x3,arp_tpa=172.168.0.50,arp_op=1
actions=move:NXM_OF_ETH_SRC[[]]->NXM_OF_ETH_DST[[]],mod_dl_src:10:54:00:00:00:10,load:0x2->NXM_OF_ARP_OP[[]],move:NXM_NX_ARP_SHA[[]]->NXM_NX_ARP_THA[[]],load:0x10540010->NXM_NX_ARP_SHA[[]],push:NXM_OF_ARP_SPA[[]],push:NXM_OF_ARP_TPA[[]],pop:NXM_OF_ARP_SPA[[]],pop:NXM_OF_ARP_TPA[[]],move:NXM_NX_REG14[[]]->NXM_NX_REG15[[]],load:0x1->NXM_NX_REG10[[0]],resubmit(,37)
> + table=11,
priority=92,arp,reg14=0x$lr0_public_dp_key,metadata=0x$lr0_dp_key,arp_tpa=172.168.0.50,arp_op=1
actions=move:NXM_OF_ETH_SRC[[]]->NXM_OF_ETH_DST[[]],mod_dl_src:10:54:00:00:00:10,load:0x2->NXM_OF_ARP_OP[[]],move:NXM_NX_ARP_SHA[[]]->NXM_NX_ARP_THA[[]],load:0x10540010->NXM_NX_ARP_SHA[[]],push:NXM_OF_ARP_SPA[[]],push:NXM_OF_ARP_TPA[[]],pop:NXM_OF_ARP_SPA[[]],pop:NXM_OF_ARP_TPA[[]],move:NXM_NX_REG14[[]]->NXM_NX_REG15[[]],load:0x1->NXM_NX_REG10[[0]],resubmit(,37)
>  ])
>  }
>
> --
> 2.30.2
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn v3] ovn.at: Fix test "virtual ports -- ovn-northd-ddlog".

2021-06-21 Thread Han Zhou
The test case fails quite often for northd-ddlog because of the tunnel
keys mismatch when comparing OpenFlow rules. Keys can change in
different runs. This patch fixes it by extracting the expected keys from
SB DB before comparison instead of hardcoding.

There are some other potential timing issues in this test and this
patch fixes them as well by replacing AT_CHECK with wait_column.

Signed-off-by: Han Zhou 
---
v1 -> v2: addresses Mark Michelson's comments - adding more fixes to potential
  timing issues.
v2 -> v3: addresses Ben and Mark's comments:
- Using wait_column instead of OVS_WAIT_UNTIL.
- Using hex instead of decimal for the tunnel key match.

 tests/ovn.at | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/tests/ovn.at b/tests/ovn.at
index bc494fcad..914e07636 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -17311,27 +17311,29 @@ logical_port=sw0-vir) = x])
 as hv1
 ovs-vsctl set interface hv1-vif3 external-ids:iface-id=sw0-vir
 
-AT_CHECK([test x$(ovn-sbctl --bare --columns chassis find port_binding \
-logical_port=sw0-vir) = x], [0], [])
+wait_column "" Port_Binding chassis logical_port=sw0-vir
 
 # Cleanup hv1-vif3.
 as hv1
 ovs-vsctl del-port hv1-vif3
 
-AT_CHECK([test x$(ovn-sbctl --bare --columns chassis find port_binding \
-logical_port=sw0-vir) = x], [0], [])
+wait_column "" Port_Binding chassis logical_port=sw0-vir
 
 check_virtual_offlows_present() {
 hv=$1
 
-AT_CHECK([as $hv ovs-ofctl dump-flows br-int table=44 | ofctl_strip_all | 
grep "priority=2000"], [0], [dnl
- table=44, priority=2000,ip,metadata=0x1 actions=resubmit(,45)
- table=44, priority=2000,ipv6,metadata=0x1 actions=resubmit(,45)
+sw0_dp_key=$((16#$(fetch_column Datapath_Binding tunnel_key 
external_ids:name=sw0)))
+lr0_dp_key=$((16#$(fetch_column Datapath_Binding tunnel_key 
external_ids:name=lr0)))
+lr0_public_dp_key=$((16#$(fetch_column Port_Binding tunnel_key 
logical_port=lr0-public)))
+
+AT_CHECK_UNQUOTED([as $hv ovs-ofctl dump-flows br-int table=44 | 
ofctl_strip_all | grep "priority=2000"], [0], [dnl
+ table=44, priority=2000,ip,metadata=0x$sw0_dp_key actions=resubmit(,45)
+ table=44, priority=2000,ipv6,metadata=0x$sw0_dp_key actions=resubmit(,45)
 ])
 
-AT_CHECK([as $hv ovs-ofctl dump-flows br-int table=11 | ofctl_strip_all | \
+AT_CHECK_UNQUOTED([as $hv ovs-ofctl dump-flows br-int table=11 | 
ofctl_strip_all | \
 grep "priority=92" | grep 172.168.0.50], [0], [dnl
- table=11, 
priority=92,arp,reg14=0x3,metadata=0x3,arp_tpa=172.168.0.50,arp_op=1 
actions=move:NXM_OF_ETH_SRC[[]]->NXM_OF_ETH_DST[[]],mod_dl_src:10:54:00:00:00:10,load:0x2->NXM_OF_ARP_OP[[]],move:NXM_NX_ARP_SHA[[]]->NXM_NX_ARP_THA[[]],load:0x10540010->NXM_NX_ARP_SHA[[]],push:NXM_OF_ARP_SPA[[]],push:NXM_OF_ARP_TPA[[]],pop:NXM_OF_ARP_SPA[[]],pop:NXM_OF_ARP_TPA[[]],move:NXM_NX_REG14[[]]->NXM_NX_REG15[[]],load:0x1->NXM_NX_REG10[[0]],resubmit(,37)
+ table=11, 
priority=92,arp,reg14=0x$lr0_public_dp_key,metadata=0x$lr0_dp_key,arp_tpa=172.168.0.50,arp_op=1
 
actions=move:NXM_OF_ETH_SRC[[]]->NXM_OF_ETH_DST[[]],mod_dl_src:10:54:00:00:00:10,load:0x2->NXM_OF_ARP_OP[[]],move:NXM_NX_ARP_SHA[[]]->NXM_NX_ARP_THA[[]],load:0x10540010->NXM_NX_ARP_SHA[[]],push:NXM_OF_ARP_SPA[[]],push:NXM_OF_ARP_TPA[[]],pop:NXM_OF_ARP_SPA[[]],pop:NXM_OF_ARP_TPA[[]],move:NXM_NX_REG14[[]]->NXM_NX_REG15[[]],load:0x1->NXM_NX_REG10[[0]],resubmit(,37)
 ])
 }
 
-- 
2.30.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 3/3] northd: add check_pkt_larger lflows for ingress traffic

2021-06-21 Thread Numan Siddique
On Fri, Jun 11, 2021 at 10:32 AM Lorenzo Bianconi
 wrote:
>
> Introduce check_pkt_larger action for ingress traffic
> entering the cluster from a distributed gw router port
> or from a gw router. This patch enables pMTU discovery
> for ingress traffic.
>
> Signed-off-by: Lorenzo Bianconi 


Hi Lorenzo,

Please update the ovn nb logical flow documentation.

This patch calls check_pkt_larger() in two places - one in
'lr_in_admission' and the other
in 'lr_in_chk_pkt_len'.   Instead of this I'd suggest to first
determine if check pkt larger is
required or not by storing 1 in a register.
And then you can call check_pkt_larger() if the register bit is set.

Thanks
Numan

> ---
>  northd/ovn-northd.c | 166 ++--
>  tests/ovn.at| 137 ++--
>  2 files changed, 231 insertions(+), 72 deletions(-)
>
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index 23367dbb0..a9b046898 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -9501,6 +9501,10 @@ build_adm_ctrl_flows_for_lrouter(
>  }
>  }
>
> +static void
> +build_check_pkt_len_action_string(struct ovn_port *op, int *pmtu,
> +  struct ds *actions);
> +
>  /* Logical router ingress Table 0: L2 Admission Control
>   * This table drops packets that the router shouldn’t see at all based
>   * on their Ethernet headers.
> @@ -9528,6 +9532,8 @@ build_adm_ctrl_flows_for_lrouter_port(
>   * the pipeline.
>   */
>  ds_clear(actions);
> +
> +build_check_pkt_len_action_string(op, NULL, actions);
>  ds_put_format(actions, REG_INPORT_ETH_ADDR " = %s; next;",
>op->lrp_networks.ea_s);
>
> @@ -10422,32 +10428,110 @@ build_arp_resolve_flows_for_lrouter_port(
>
>  }
>
> +static void
> +build_icmperr_pkt_big_flows(struct ovn_port *op, int mtu, struct hmap 
> *lflows,
> +struct ds *match, struct ds *actions,
> +enum ovn_stage stage)
> +{
> +if (op->lrp_networks.ipv4_addrs) {
> +ds_clear(match);
> +ds_put_format(match,
> +  "inport == %s && ip4 && "REGBIT_PKT_LARGER
> +  " && !"REGBIT_EGRESS_LOOPBACK, op->json_key);
> +
> +ds_clear(actions);
> +/* Set icmp4.frag_mtu to gw_mtu */
> +ds_put_format(actions,
> +"icmp4_error {"
> +REGBIT_EGRESS_LOOPBACK" = 1; "
> +REGBIT_PKT_LARGER" = 0; "
> +"eth.dst = %s; "
> +"ip4.dst = ip4.src; "
> +"ip4.src = %s; "
> +"ip.ttl = 255; "
> +"icmp4.type = 3; /* Destination Unreachable. */ "
> +"icmp4.code = 4; /* Frag Needed and DF was Set. */ "
> +"icmp4.frag_mtu = %d; "
> +"next(pipeline=ingress, table=%d); };",
> +op->lrp_networks.ea_s,
> +op->lrp_networks.ipv4_addrs[0].addr_s,
> +mtu, ovn_stage_get_table(S_ROUTER_IN_ADMISSION));
> +ovn_lflow_add_with_hint(lflows, op->od, stage, 150,
> +ds_cstr(match), ds_cstr(actions),
> +>nbrp->header_);
> +}
> +
> +if (op->lrp_networks.ipv6_addrs) {
> +ds_clear(match);
> +ds_put_format(match, "inport == %s && ip6 && "REGBIT_PKT_LARGER
> +  " && !"REGBIT_EGRESS_LOOPBACK, op->json_key);
> +
> +ds_clear(actions);
> +/* Set icmp6.frag_mtu to gw_mtu */
> +ds_put_format(actions,
> +"icmp6_error {"
> +REGBIT_EGRESS_LOOPBACK" = 1; "
> +REGBIT_PKT_LARGER" = 0; "
> +"eth.dst = %s; "
> +"ip6.dst = ip6.src; "
> +"ip6.src = %s; "
> +"ip.ttl = 255; "
> +"icmp6.type = 2; /* Packet Too Big. */ "
> +"icmp6.code = 0; "
> +"icmp6.frag_mtu = %d; "
> +"next(pipeline=ingress, table=%d); };",
> +op->lrp_networks.ea_s,
> +op->lrp_networks.ipv6_addrs[0].addr_s,
> +mtu, ovn_stage_get_table(S_ROUTER_IN_ADMISSION));
> +ovn_lflow_add_with_hint(lflows, op->od, stage, 150,
> +ds_cstr(match), ds_cstr(actions),
> +>nbrp->header_);
> +}
> +}
> +
> +static void
> +build_check_pkt_len_action_string(struct ovn_port *op, int *pmtu,
> +  struct ds *actions)
> +{
> +int gw_mtu = smap_get_int(>nbrp->options, "gateway_mtu", 0);
> +
> +if (gw_mtu > 0) {
> +/* Add the flows only if gateway_mtu is configured. */
> +ds_put_format(actions,
> +  REGBIT_PKT_LARGER" = check_pkt_larger(%d); ",
> +  gw_mtu + VLAN_ETH_HEADER_LEN);
> +}
> +if (pmtu) {
> +*pmtu = gw_mtu;
> +}
> +}
> +
>  static void
>  build_check_pkt_len_flows_for_lrp(struct ovn_port *op,
> 

Re: [ovs-dev] [PATCH v3 2/3] northd: enable check_pkt_larger for gw router

2021-06-21 Thread Numan Siddique
On Fri, Jun 11, 2021 at 10:32 AM Lorenzo Bianconi
 wrote:
>
> As it is already done for distributed gw router scenario, introduce
> check_pkt_larger logical flows for gw router use case.
>
> Signed-off-by: Lorenzo Bianconi 

Hi Lorenzo,

Thanks for the patches.  You forgot to put the "ovn" tag in the
subject line and the patches never reached the ovn patchwork :).

You've missed updating the documentation.

I updated your patch with the ddlog implementation here
https://github.com/numansiddique/ovn/commits/lorenzo_cpl.
Please update your patch with the ddlog changes.

I've few comments in patch 3.

Thanks
Numan

> ---
>  northd/ovn-northd.c | 31 +-
>  tests/ovn.at| 47 +
>  2 files changed, 69 insertions(+), 9 deletions(-)
>
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index 512ec4a32..23367dbb0 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -10534,17 +10534,30 @@ build_check_pkt_len_flows_for_lrouter(
>  struct hmap *ports,
>  struct ds *match, struct ds *actions)
>  {
> -if (od->nbr) {
> +if (!od->nbr) {
> +return;
> +}
>
> -/* Packets are allowed by default. */
> -ovn_lflow_add(lflows, od, S_ROUTER_IN_CHK_PKT_LEN, 0, "1",
> -  "next;");
> -ovn_lflow_add(lflows, od, S_ROUTER_IN_LARGER_PKTS, 0, "1",
> -  "next;");
> +/* Packets are allowed by default. */
> +ovn_lflow_add(lflows, od, S_ROUTER_IN_CHK_PKT_LEN, 0, "1",
> +  "next;");
> +ovn_lflow_add(lflows, od, S_ROUTER_IN_LARGER_PKTS, 0, "1",
> +  "next;");
>
> -if (od->l3dgw_port && od->l3redirect_port) {
> -build_check_pkt_len_flows_for_lrp(od->l3dgw_port, lflows,
> -  ports, match, actions);
> +if (od->l3dgw_port && od->l3redirect_port) {
> +/* gw router port */
> +build_check_pkt_len_flows_for_lrp(od->l3dgw_port, lflows,
> +  ports, match, actions);
> +} else if (smap_get(>nbr->options, "chassis")) {
> +for (size_t i = 0; i < od->nbr->n_ports; i++) {
> +/* gw router */
> +struct ovn_port *rp = ovn_port_find(ports,
> +od->nbr->ports[i]->name);
> +if (!rp) {
> +continue;
> +}
> +build_check_pkt_len_flows_for_lrp(rp, lflows, ports, match,
> +  actions);
>  }
>  }
>  }
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 11a85c457..5c3ed2633 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -16408,6 +16408,53 @@ for mtu in 100 500 118; do
>  test_ip6_packet_larger $mtu
>  done
>
> +ovn-nbctl lsp-del sw0-lr0
> +
> +ovn-nbctl lr-del lr0
> +ovn-nbctl create Logical_Router name=lr1 options:chassis="hv1"
> +ovn-nbctl lrp-add lr1 lr1-sw0 00:00:00:00:ff:01 10.0.0.1/24 1000::1/64
> +ovn-nbctl lsp-add sw0 sw0-lr1
> +ovn-nbctl lsp-set-type sw0-lr1 router
> +ovn-nbctl lsp-set-addresses sw0-lr1 router
> +ovn-nbctl lsp-set-options sw0-lr1 router-port=lr1-sw0
> +
> +ovn-nbctl lrp-add lr1 lr1-public 00:00:20:20:12:13 172.168.0.100/24 
> 2000::1/64
> +ovn-nbctl lsp-del public-lr0
> +ovn-nbctl lsp-add public public-lr1
> +ovn-nbctl lsp-set-type public-lr1 router
> +ovn-nbctl lsp-set-addresses public-lr1 router
> +ovn-nbctl lsp-set-options public-lr1 router-port=lr1-public
> +
> +ovn-nbctl lr-nat-add lr1 snat 172.168.0.100 10.0.0.0/24
> +ovn-nbctl lr-nat-add lr1 snat 2000::1 1000::/64
> +
> +dp_uuid=$(ovn-sbctl find datapath_binding | grep sw0 -B2 | grep _uuid | \
> +awk '{print $3}')
> +ovn-sbctl create MAC_Binding ip=172.168.0.3 datapath=$dp_uuid \
> +logical_port=lr1-public mac="00\:00\:00\:12\:af\:11"
> +
> +# Try different gateway mtus and send a 142-byte packet (corresponding
> +# to a 124-byte MTU).  If the MTU is less than 124, ovn-controller
> +# should send icmp host not reachable with pmtu set to $mtu.
> +for mtu in 100 500 118; do
> +AS_BOX([testing gw mtu $mtu])
> +check ovn-nbctl --wait=hv set logical_router_port lr1-public 
> options:gateway_mtu=$mtu
> +ovn-sbctl dump-flows > sbflows-gw-$mtu
> +AT_CAPTURE_FILE([sbflows-gw-$mtu])
> +
> +OVS_WAIT_FOR_OUTPUT([
> +as hv1 ovs-ofctl dump-flows br-int > br-int-gw-flows-$mtu
> +AT_CAPTURE_FILE([br-int-gw-flows-$mtu])
> +grep "check_pkt_larger($(expr $mtu + 18))" br-int-gw-flows-$mtu | wc 
> -l], [0], [1
> +])
> +
> +AS_BOX([testing gw mtu $mtu - IPv4])
> +test_ip_packet_larger $mtu
> +
> +AS_BOX([testing gw mtu $mtu - IPv6])
> +test_ip6_packet_larger $mtu
> +done
> +
>  OVN_CLEANUP([hv1])
>  AT_CLEANUP
>  ])
> --
> 2.31.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

Re: [ovs-dev] [PATCH] dpif-netdev: Fix crash when PACKET_OUT is metered

2021-06-21 Thread Ilya Maximets
On 6/19/21 4:11 AM, Tony van der Peet wrote:
> Hi all
> 
> I am in favour of a better fix, bandaids tend to come back and bite you in 
> the end anyway. So, will be happy to help with this effort, though I am 
> probably going to have to defer to the rest of you when it comes to actually 
> knowing what to do in this area.
> 

It's great to have a full correct solution, but unfortunately, I don't
think that we can come up with something like a full and correct
reference counting in a short period of time.  But we still need to fix
a crash that is pretty easy to trigger.  I'm also nervous that OVN is
using meters more and more (e.g. control plane protection patch set)
and they might actually hit this issue at the high load.  Another point
is that we need a fix that we can backport to LTS, and I don't think that
reference counting would be a small change that we can easily backport
without worrying about breaking something else.

All in all, I think that we need to come up with a "bandaid" solution
and work further on correct reference counting after that.

And we also need to create a unit test that will mimic packet-out that
I encountered in OVN tests, so we can test this kind of behavior in OVS.

For the reference, the packet-out generated by OVN controller had a few
set() actions and the resubmit() to a different table. And this
table had rules leading to packet output to a tunnel port, resulting
in a tunnel push + output datapath actions.

> Cheers
> Tony
> 
> From: Aaron Conole 
> Sent: Saturday, 19 June 2021 2:50 a.m.
> To: Ilya Maximets
> Cc: Ben Pfaff; Tony van der Peet; d...@openvswitch.org; Tony van der Peet
> Subject: Re: [ovs-dev] [PATCH] dpif-netdev: Fix crash when PACKET_OUT is 
> metered
> 
> Ilya Maximets  writes:
> 
>> On 6/17/21 11:59 PM, Ben Pfaff wrote:
>>> All these flags for stealing, allowing stealing, blah blah, are just
>>> ways to do some kind of dumb reference counting without actually have a
>>> reference count.  When it gets super complex like this, maybe
>>> introducing a reference count is the way to go.  It would be a bigger
>>> change, but perhaps more maintainable over time.
>>>
>>
>> Yes, I absolutely agree.  We just removed one of these hacky flags from
>> the struct dp_packet_batch and we need to get rid of the rest of them.
> 
> +1 from me - it's a bigger scoped change, but over-all, I think it's a
> better one, and makes the most sense.
> 
>> One thing that bothers me is that some parts of the code treats
>> 'should_steal=false' as "do not modify".  For example, I don't really
>> understand why these functions are carrying cutlen around and clones
>> the packet before truncating if 'should_steal=false'.
>>
>> Some action executors also have optimizations that allows to not clone
>> the packet before the fatal action if this action is the last one.
>>
>> So, in general, yes, we need to get rid of these flags and accurately
>> re-work all the packet paths.  And yes, this would be not a small
>> change.
>>
>> For now, I think, we need to find a less ugly as possible solution for
>> existing crash (maybe the one that I suggested), so we will be able to
>> backport it and continue working on correct reference counting.
>>
>> What do you think?  Tony?  Aaron?
> 
> That makes sense to me, and I hope we will actually work on the
> ref-count stuff.  I had started taking a look a few weeks back, but the
> idea of 'steal' is pretty ingrained throughout the code, so getting a
> ref-count semantic correct is a big effort.  As an example, the
> odp-execute area expects that each sub-action will have its own copy to
> modify (and this is documented with the 'should_steal' semantics).
> Taking that out will be a big rework in that area.  I think it makes
> the most sense, and we probably could implement something like COW to
> cover those cases where we really need to modify a packet without
> propagating those changes back up.
> 
>> Best regards, Ilya Maximets.
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] ovsdb: provide raft and command interfaces with priority

2021-06-21 Thread Ilya Maximets
On 6/11/21 5:42 PM, anton.iva...@cambridgegreys.com wrote:
> From: Anton Ivanov 
> 
> Set a soft time limit of "raft election timer"/2 on ovsdb
> processing.
> 
> This improves behaviour in large heavily loaded clusters.
> While it cannot fully eliminate spurious raft elections
> under heavy load, it significantly decreases their number.
> 
> Processing is (to the extent possible) restarted where it
> stopped on the previous iteration to ensure that sessions
> towards the tail of the session list are not starved.
> 
> Signed-off-by: Anton Ivanov 
> ---

Hey, Anton.  Thanks for the patch!
This is not a complete review, but a few things that I noticed.
See inline.

Best regards, Ilya Maximets.

>  ovsdb/jsonrpc-server.c | 80 +++---
>  ovsdb/jsonrpc-server.h |  2 +-
>  ovsdb/ovsdb-server.c   | 15 +++-
>  ovsdb/raft.c   |  5 +++
>  ovsdb/raft.h   |  3 ++
>  ovsdb/storage.c|  8 +
>  ovsdb/storage.h|  2 ++
>  7 files changed, 109 insertions(+), 6 deletions(-)
> 
> diff --git a/ovsdb/jsonrpc-server.c b/ovsdb/jsonrpc-server.c
> index 4e2dfc3d7..84e0f69b5 100644
> --- a/ovsdb/jsonrpc-server.c
> +++ b/ovsdb/jsonrpc-server.c
> @@ -60,7 +60,8 @@ static struct ovsdb_jsonrpc_session 
> *ovsdb_jsonrpc_session_create(
>  struct ovsdb_jsonrpc_remote *, struct jsonrpc_session *, bool);
>  static void ovsdb_jsonrpc_session_preremove_db(struct ovsdb_jsonrpc_remote *,
> struct ovsdb *);
> -static void ovsdb_jsonrpc_session_run_all(struct ovsdb_jsonrpc_remote *);
> +static void ovsdb_jsonrpc_session_run_all(struct ovsdb_jsonrpc_remote *,
> +  uint64_t limit);
>  static void ovsdb_jsonrpc_session_wait_all(struct ovsdb_jsonrpc_remote *);
>  static void ovsdb_jsonrpc_session_get_memory_usage_all(
>  const struct ovsdb_jsonrpc_remote *, struct simap *usage);
> @@ -128,6 +129,8 @@ struct ovsdb_jsonrpc_server {
>  bool read_only;/* This server is does not accept any
>transactions that can modify the database. 
> */
>  struct shash remotes;  /* Contains "struct ovsdb_jsonrpc_remote *"s. 
> */
> +struct ovsdb_jsonrpc_remote *skip_to;
> +bool yield_immediately;

'yield' doesn't seem to be a right word here.  Maybe 'wake_up' or something
similar?

Also, both fields above needs a comment.

OTOH, do we really need this filed?  I mean, if we didn't process some session,
shouldn't next session_wait() wake us up?

>  };
>  
>  /* A configured remote.  This is either a passive stream listener plus a list
> @@ -137,6 +140,7 @@ struct ovsdb_jsonrpc_remote {
>  struct ovsdb_jsonrpc_server *server;
>  struct pstream *listener;   /* Listener, if passive. */
>  struct ovs_list sessions;   /* List of "struct ovsdb_jsonrpc_session"s. 
> */
> +struct ovsdb_jsonrpc_session *skip_to;
>  uint8_t dscp;
>  bool read_only;
>  char *role;
> @@ -159,6 +163,8 @@ ovsdb_jsonrpc_server_create(bool read_only)
>  ovsdb_server_init(>up);
>  shash_init(>remotes);
>  server->read_only = read_only;
> +server->yield_immediately = false;
> +server->skip_to = NULL;
>  return server;
>  }
>  
> @@ -279,6 +285,7 @@ ovsdb_jsonrpc_server_add_remote(struct 
> ovsdb_jsonrpc_server *svr,
>  remote->dscp = options->dscp;
>  remote->read_only = options->read_only;
>  remote->role = nullable_xstrdup(options->role);
> +remote->skip_to = NULL;
>  shash_add(>remotes, name, remote);
>  
>  if (!listener) {
> @@ -378,12 +385,26 @@ ovsdb_jsonrpc_server_set_read_only(struct 
> ovsdb_jsonrpc_server *svr,
>  }
>  
>  void
> -ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server *svr)
> +ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server *svr, uint64_t limit)
>  {
>  struct shash_node *node;
> +uint64_t elapsed = 0, start_time = 0;
> +
> +if (limit) {
> +start_time = time_now();

Why this function uses time_now() while others are using time_msec() ?
time_now() returns seconds while 'limit' is in milliseconds.

> +}
> +
> +svr->yield_immediately = false;
>  
>  SHASH_FOR_EACH (node, >remotes) {
>  struct ovsdb_jsonrpc_remote *remote = node->data;
> +if (svr->skip_to) {
> +if (remote != svr->skip_to) {
> +continue;

What if 'skip_to' is already removed from the list?  We will, probably,
never process any remotes again.

Also, we didn't process first N remotes here and we're not setting
'yield_immediately'.  This is inconsistent, at least.  But, yes,
it's unclear if 'yield_immediately' needed at all.

> +} else {
> +svr->skip_to = NULL;
> +}
> +}
>  
>  if (remote->listener) {
>  struct stream *stream;
> @@ -403,7 +424,16 @@ ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server 
> *svr)
>  }
>  }
>  
> -

Re: [ovs-dev] [v13 08/12] dpif-netdev-unixctl.man: Document subtable-lookup-* CMDs

2021-06-21 Thread Flavio Leitner


Hi,

This commit could be submitted outside of this patch-set as fix
for commit 9ff7cabfd7 ("dpif-netdev: add subtable-lookup-prio-get
command") and commit 3d018c3ea79d ("dpif-netdev: add subtable lookup
prio set command.").

This helps to get it merged sooner and reduce this patch-set size.

Thanks for documenting it.
fbl

On Thu, Jun 17, 2021 at 05:18:21PM +0100, Cian Ferriter wrote:
> Signed-off-by: Cian Ferriter 
> 
> ---
> 
> v13:
> - New commit to update manpages with more commands that are missing.
> ---
>  lib/dpif-netdev-unixctl.man | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man
> index 45a1bd669..d77f5d9a4 100644
> --- a/lib/dpif-netdev-unixctl.man
> +++ b/lib/dpif-netdev-unixctl.man
> @@ -228,6 +228,16 @@ When this is the case, the above command prints the 
> load-balancing information
>  of the bonds configured in datapath \fIdp\fR showing the interface associated
>  with each bucket (hash).
>  .
> +.IP "\fBdpif-netdev/subtable-lookup-prio-get\fR"
> +Lists the DPCLS implementations or lookup functions that are available as 
> well
> +as their priorities.
> +.
> +.IP "\fBdpif-netdev/subtable-lookup-prio-set\fR \fIlookup_function\fR \
> +\fIprio\fR"
> +Sets the priority of a lookup function by the name, \fIlookup_function\fR, 
> and
> +the priority, \fIprio\fR, which should be a positive integer value. The 
> highest
> +priority lookup function is used for classification.
> +.
>  .IP "\fBdpif-netdev/dpif-get\fR
>  Lists the DPIF implementations that are available.
>  .
> -- 
> 2.32.0
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

-- 
fbl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] northd: do not centralized traffic for unclaimed virtual ports

2021-06-21 Thread Numan Siddique
On Fri, Jun 4, 2021 at 12:55 PM Lorenzo Bianconi
 wrote:
>
> Add a rule to drop traffic from a distributed NAT if the virtual
> port has not claimed yet becaused otherwise the traffic will be
> centralized misconfiguring the TOR switch.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1952961
> Signed-off-by: Lorenzo Bianconi 
> ---
>  northd/ovn-northd.c | 23 ++-
>  tests/ovn.at| 26 ++
>  2 files changed, 44 insertions(+), 5 deletions(-)


Hi Lorenzo,  this would require changes in ovn nb documentation
and the ddlog part is missing.

Thanks
Numan

>
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index 9652ce252..539b8f8b0 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -11666,6 +11666,7 @@ lrouter_check_nat_entry(struct ovn_datapath *od, 
> const struct nbrec_nat *nat,
>  static void
>  build_lrouter_nat_defrag_and_lb(struct ovn_datapath *od,
>  struct hmap *lflows,
> +struct hmap *ports,
>  struct shash *meter_groups,
>  struct hmap *lbs,
>  struct ds *match, struct ds *actions)
> @@ -11773,10 +11774,21 @@ build_lrouter_nat_defrag_and_lb(struct ovn_datapath 
> *od,
>  ds_clear(match);
>  ds_clear(actions);
>  ds_put_format(match,
> -  "ip%s.src == %s && outport == %s && "
> -  "is_chassis_resident(\"%s\")",
> +  "ip%s.src == %s && outport == %s",
>is_v6 ? "6" : "4", nat->logical_ip,
> -  od->l3dgw_port->json_key, nat->logical_port);
> +  od->l3dgw_port->json_key);
> +/* Add a rule to drop traffic from a distributed NAT if
> + * the virtual port has not claimed yet becaused otherwise
> + * the traffic will be centralized misconfiguring the TOR switch.
> + */
> +struct ovn_port *op = ovn_port_find(ports, nat->logical_port);
> +if (op && op->nbsp && !strcmp(op->nbsp->type, "virtual")) {
> +ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_GW_REDIRECT,
> +80, ds_cstr(match), "drop;",
> +>header_);
> +}
> +ds_put_format(match, " && is_chassis_resident(\"%s\")",
> +  nat->logical_port);
>  ds_put_format(actions, "eth.src = %s; %s = %s; next;",
>nat->external_mac,
>is_v6 ? REG_SRC_IPV6 : REG_SRC_IPV4,
> @@ -11935,8 +11947,9 @@ build_lswitch_and_lrouter_iterate_by_od(struct 
> ovn_datapath *od,
>  >actions);
>  build_misc_local_traffic_drop_flows_for_lrouter(od, lsi->lflows);
>  build_lrouter_arp_nd_for_datapath(od, lsi->lflows);
> -build_lrouter_nat_defrag_and_lb(od, lsi->lflows, lsi->meter_groups,
> -lsi->lbs, >match, >actions);
> +build_lrouter_nat_defrag_and_lb(od, lsi->lflows, lsi->ports,
> +lsi->meter_groups, lsi->lbs, >match,
> +>actions);
>  }
>
>  /* Helper function to combine all lflow generation which is iterated by port.
> diff --git a/tests/ovn.at b/tests/ovn.at
> index f26894ce4..7731c915e 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -17165,6 +17165,16 @@ send_arp_reply() {
>  as hv$hv ovs-appctl netdev-dummy/receive hv${hv}-vif$inport $request
>  }
>
> +send_icmp_packet() {
> +local inport=$1 hv=$2 eth_src=$3 eth_dst=$4 ipv4_src=$5 ipv4_dst=$6 
> ip_chksum=$7 data=$8
> +shift 8
> +
> +local ip_ttl=ff
> +local ip_len=001c
> +local 
> packet=${eth_dst}${eth_src}08004500${ip_len}4000${ip_ttl}01${ip_chksum}${ipv4_src}${ipv4_dst}${data}
> +as hv$hv ovs-appctl netdev-dummy/receive hv${hv}-vif$inport $packet
> +}
> +
>  net_add n1
>
>  sim_add hv1
> @@ -17377,6 +17387,22 @@ logical_port=sw0-vir) = x])
>  wait_row_count nb:Logical_Switch_Port 1 up=false name=sw0-vir
>
>  check ovn-nbctl --wait=hv sync
> +
> +# verify the traffic from virtual port is discarded if the port is not 
> claimed
> +AT_CHECK([grep lr_in_gw_redirect lr0-flows2 | grep "ip4.src == 10.0.0.10"], 
> [0], [dnl
> +  table=17(lr_in_gw_redirect  ), priority=100  , match=(ip4.src == 10.0.0.10 
> && outport == "lr0-public" && is_chassis_resident("sw0-vir")), 
> action=(eth.src = 10:54:00:00:00:10; reg1 = 172.168.0.50; next;)
> +  table=17(lr_in_gw_redirect  ), priority=80   , match=(ip4.src == 10.0.0.10 
> && outport == "lr0-public"), action=(drop;)
> +])
> +
> +eth_src=50540003
> +eth_dst=ff01
> +ip_src=$(ip_to_hex 10 0 0 10)
> +ip_dst=$(ip_to_hex 172 168 0 101)
> +send_icmp_packet 1 1 $eth_src $eth_dst $ip_src $ip_dst c4c9 
> 

Re: [ovs-dev] [v13 04/12] dpif-avx512: Add ISA implementation of dpif.

2021-06-21 Thread Flavio Leitner
On Mon, Jun 21, 2021 at 04:13:12PM +, Ferriter, Cian wrote:
> Hi Flavio,
> 
> Thanks for the review. My responses are inline.
> 
> Cian
> 
> > -Original Message-
> > From: Flavio Leitner 
> > Sent: Sunday 20 June 2021 21:09
> > To: Ferriter, Cian 
> > Cc: ovs-dev@openvswitch.org; Amber, Kumar ; 
> > i.maxim...@ovn.org
> > Subject: Re: [ovs-dev] [v13 04/12] dpif-avx512: Add ISA implementation of 
> > dpif.
> > 
> > 
> > Hi,
> > 
> > I am still reviewing the patch, but I thought worth to discuss
> > few items below.
> > 
> > On Thu, Jun 17, 2021 at 05:18:17PM +0100, Cian Ferriter wrote:
> > > From: Harry van Haaren 
> > >
> > > This commit adds the AVX512 implementation of DPIF functionality,
> > > specifically the dp_netdev_input_outer_avx512 function. This function
> > > only handles outer (no re-circulations), and is optimized to use the
> > > AVX512 ISA for packet batching and other DPIF work.
> > >
> > > Sparse is not able to handle the AVX512 intrinsics, causing compile
> > > time failures, so it is disabled for this file.
> > >
> > > Signed-off-by: Harry van Haaren 
> > > Co-authored-by: Cian Ferriter 
> > > Signed-off-by: Cian Ferriter 
> > > Co-authored-by: Kumar Amber 
> > > Signed-off-by: Kumar Amber 
> > >
> > > ---
> > >
> > > v13:
> > > - Squash "Add HWOL support" commit into this commit.
> > > - Add NEWS item about this feature here rather than in a later commit.
> > > - Add #define NUM_U64_IN_ZMM_REG 8.
> > > - Add comment describing operation of while loop handling HWOL->EMC->SMC
> > >   lookups in dp_netdev_input_outer_avx512().
> > > - Add EMC and SMC batch insert functions for better handling of EMC and
> > >   SMC in AVX512 DPIF.
> > > - Minor code refactor to address review comments.
> > > ---
> > >  NEWS |   2 +
> > >  lib/automake.mk  |   5 +-
> > >  lib/dpif-netdev-avx512.c | 327 +++
> > >  lib/dpif-netdev-private-dfc.h|  25 +++
> > >  lib/dpif-netdev-private-dpif.h   |  32 +++
> > >  lib/dpif-netdev-private-thread.h |  11 +-
> > >  lib/dpif-netdev-private.h|  25 +++
> > >  lib/dpif-netdev.c| 103 --
> > >  8 files changed, 514 insertions(+), 16 deletions(-)
> > >  create mode 100644 lib/dpif-netdev-avx512.c
> > >  create mode 100644 lib/dpif-netdev-private-dpif.h
> > >
> > > diff --git a/NEWS b/NEWS
> > > index 96b3a61c8..6a4a7b76d 100644
> > > --- a/NEWS
> > > +++ b/NEWS
> > > @@ -10,6 +10,8 @@ Post-v2.15.0
> > >   * Auto load balancing of PMDs now partially supports cross-NUMA 
> > > polling
> > > cases, e.g if all PMD threads are running on the same NUMA node.
> > >   * Refactor lib/dpif-netdev.c to multiple header files.
> > > + * Add avx512 implementation of dpif which can process non 
> > > recirculated
> > > +   packets. It supports partial HWOL, EMC, SMC and DPCLS lookups.
> > > - ovs-ctl:
> > >   * New option '--no-record-hostname' to disable hostname 
> > > configuration
> > > in ovsdb on startup.
> > > diff --git a/lib/automake.mk b/lib/automake.mk
> > > index 3a33cdd5c..660cd07f0 100644
> > > --- a/lib/automake.mk
> > > +++ b/lib/automake.mk
> > > @@ -33,11 +33,13 @@ lib_libopenvswitchavx512_la_CFLAGS = \
> > >   -mavx512f \
> > >   -mavx512bw \
> > >   -mavx512dq \
> > > + -mbmi \
> > >   -mbmi2 \
> > >   -fPIC \
> > >   $(AM_CFLAGS)
> > >  lib_libopenvswitchavx512_la_SOURCES = \
> > > - lib/dpif-netdev-lookup-avx512-gather.c
> > > + lib/dpif-netdev-lookup-avx512-gather.c \
> > > + lib/dpif-netdev-avx512.c
> > >  lib_libopenvswitchavx512_la_LDFLAGS = \
> > >   -static
> > >  endif
> > > @@ -114,6 +116,7 @@ lib_libopenvswitch_la_SOURCES = \
> > >   lib/dpif-netdev-private-dfc.c \
> > >   lib/dpif-netdev-private-dfc.h \
> > >   lib/dpif-netdev-private-dpcls.h \
> > > + lib/dpif-netdev-private-dpif.h \
> > >   lib/dpif-netdev-private-flow.h \
> > >   lib/dpif-netdev-private-hwol.h \
> > >   lib/dpif-netdev-private-thread.h \
> > > diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
> > > new file mode 100644
> > > index 0..0e55b0be2
> > > --- /dev/null
> > > +++ b/lib/dpif-netdev-avx512.c
> > > @@ -0,0 +1,327 @@
> > > +/*
> > > + * Copyright (c) 2021 Intel Corporation.
> > > + *
> > > + * Licensed under the Apache License, Version 2.0 (the "License");
> > > + * you may not use this file except in compliance with the License.
> > > + * You may obtain a copy of the License at:
> > > + *
> > > + * http://www.apache.org/licenses/LICENSE-2.0
> > > + *
> > > + * Unless required by applicable law or agreed to in writing, software
> > > + * distributed under the License is distributed on an "AS IS" BASIS,
> > > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
> > > implied.
> > > + * See the License for the specific language governing permissions and
> > > + * limitations under the License.
> > > + */
> > > +
> > > +#ifdef __x86_64__
> > > +/* Sparse cannot handle the 

Re: [ovs-dev] [PATCH v6 1/1] ovs-numa: Support non-contiguous numa nodes and offline CPU cores

2021-06-21 Thread Kevin Traynor
On 18/06/2021 00:43, David Wilder wrote:
> This change removes the assumption that numa nodes and cores are numbered
> contiguously in linux.  This change is required to support some Power
> systems.
> 
> A check has been added to verify that cores are online,
> offline cores result in non-contiguously numbered cores.
> 
> Dpdk EAL option generation is updated to work with non-contiguous numa nodes.
> These options can be seen in the ovs-vswitchd.log.  For example:
> a system containing only numa nodes 0 and 8 will generate the following:
> 
> EAL ARGS: ovs-vswitchd --socket-mem 1024,0,0,0,0,0,0,0,1024 \
>  --socket-limit 1024,0,0,0,0,0,0,0,1024 -l 0
> 
> Tests for pmd and dpif-netdev have been updated to validate non-contiguous
> numbered nodes.
> 

Hi David,

Just a minor comment below. Aside from that it LGTM, but it's review
based as i don't have a system with non-contiguous numas to test it on.
I don't see a regression in generating default EAL memory parameters on
my system with 2 contiguous numas.

checkpatch ok, unit tests ok, github actions ok.

Kevin.

> Signed-off-by: David Wilder 
> ---
>  lib/dpdk.c   | 57 +++--
>  lib/ovs-numa.c   | 51 
>  lib/ovs-numa.h   |  2 ++
>  tests/dpif-netdev.at |  2 +-
>  tests/pmd.at | 61 
>  5 files changed, 142 insertions(+), 31 deletions(-)
> 
> diff --git a/lib/dpdk.c b/lib/dpdk.c
> index 319540394..7f6f1d164 100644
> --- a/lib/dpdk.c
> +++ b/lib/dpdk.c
> @@ -129,22 +129,63 @@ construct_dpdk_options(const struct smap 
> *ovs_other_config, struct svec *args)
>  }
>  }
>  
> +static int
> +compare_numa_node_list(const void *a_, const void *b_)
> +{
> +size_t a = *(const size_t *) a_;
> +size_t b = *(const size_t *) b_;
> +

These can be ints (see comment below)

> +if (a < b) {
> +return -1;
> +}
> +if (a > b) {
> +return 1;
> +}
> +return 0;> +}
> +
>  static char *
>  construct_dpdk_socket_mem(void)
>  {
>  const char *def_value = "1024";
> -int numa, numa_nodes = ovs_numa_get_n_numas();
> +struct ovs_numa_dump *dump;
> +const struct ovs_numa_info_numa *node;
> +size_t k = 0, last_node = 0, n_numa_nodes, *numa_node_list;

numa_node_list[] will be assigned numa->numa_id, which is type int, so
it can be:
int *numa_node_list;

>  struct ds dpdk_socket_mem = DS_EMPTY_INITIALIZER;
>  
> -if (numa_nodes == 0 || numa_nodes == OVS_NUMA_UNSPEC) {
> -numa_nodes = 1;
> -}
> +/* Build a list of all numa nodes with at least one core */
> +dump = ovs_numa_dump_n_cores_per_numa(1);
> +n_numa_nodes = hmap_count(>numas);
> +numa_node_list = xcalloc(n_numa_nodes, sizeof numa_node_list);
>  

It should dereference numa_node_list for type,
numa_node_list = xcalloc(n_numa_nodes, sizeof *numa_node_list);

> -ds_put_cstr(_socket_mem, def_value);
> -for (numa = 1; numa < numa_nodes; ++numa) {
> -ds_put_format(_socket_mem, ",%s", def_value);
> +FOR_EACH_NUMA_ON_DUMP(node, dump) {
> +if (k >= n_numa_nodes) {
> +break;
> +}
> +numa_node_list[k++] = node->numa_id;
>  }
> -
> +qsort(numa_node_list, k, sizeof numa_node_list, compare_numa_node_list);
> +
> +for (size_t i = 0; i < n_numa_nodes; i++) {
> +while (numa_node_list[i] > last_node &&
> +   numa_node_list[i] != OVS_NUMA_UNSPEC &&
> +   numa_node_list[i] <= MAX_NUMA_NODES){
> +if (last_node == 0) {
> +ds_put_format(_socket_mem, "%s", "0");
> +} else {
> +ds_put_format(_socket_mem, ",%s", "0");
> +}
> +last_node++;
> +}
> +if (numa_node_list[i] == 0) {
> +ds_put_format(_socket_mem, "%s", def_value);
> +} else {
> +ds_put_format(_socket_mem, ",%s", def_value);
> +}
> +last_node++;
> +}
> +free(numa_node_list);
> +ovs_numa_dump_destroy(dump);
>  return ds_cstr(_socket_mem);
>  }
>  
> diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c
> index 6d0a68522..b825ecbdd 100644
> --- a/lib/ovs-numa.c
> +++ b/lib/ovs-numa.c
> @@ -42,21 +42,22 @@ VLOG_DEFINE_THIS_MODULE(ovs_numa);
>   * This module stores the affinity information of numa nodes and cpu cores.
>   * It also provides functions to bookkeep the pin of threads on cpu cores.
>   *
> - * It is assumed that the numa node ids and cpu core ids all start from 0 and
> - * range continuously.  So, for example, if 'ovs_numa_get_n_cores()' returns 
> N,
> - * user can assume core ids from 0 to N-1 are all valid and there is a
> - * 'struct cpu_core' for each id.
> + * It is assumed that the numa node ids and cpu core ids all start from 0.
> + * There is no guarantee that node and cpu ids are numbered consecutively
> + * (this is a change from earlier version of the code). So, 

Re: [ovs-dev] [v13 04/12] dpif-avx512: Add ISA implementation of dpif.

2021-06-21 Thread Ferriter, Cian
Hi Flavio,

Thanks for the review. My responses are inline.

Cian

> -Original Message-
> From: Flavio Leitner 
> Sent: Sunday 20 June 2021 21:09
> To: Ferriter, Cian 
> Cc: ovs-dev@openvswitch.org; Amber, Kumar ; 
> i.maxim...@ovn.org
> Subject: Re: [ovs-dev] [v13 04/12] dpif-avx512: Add ISA implementation of 
> dpif.
> 
> 
> Hi,
> 
> I am still reviewing the patch, but I thought worth to discuss
> few items below.
> 
> On Thu, Jun 17, 2021 at 05:18:17PM +0100, Cian Ferriter wrote:
> > From: Harry van Haaren 
> >
> > This commit adds the AVX512 implementation of DPIF functionality,
> > specifically the dp_netdev_input_outer_avx512 function. This function
> > only handles outer (no re-circulations), and is optimized to use the
> > AVX512 ISA for packet batching and other DPIF work.
> >
> > Sparse is not able to handle the AVX512 intrinsics, causing compile
> > time failures, so it is disabled for this file.
> >
> > Signed-off-by: Harry van Haaren 
> > Co-authored-by: Cian Ferriter 
> > Signed-off-by: Cian Ferriter 
> > Co-authored-by: Kumar Amber 
> > Signed-off-by: Kumar Amber 
> >
> > ---
> >
> > v13:
> > - Squash "Add HWOL support" commit into this commit.
> > - Add NEWS item about this feature here rather than in a later commit.
> > - Add #define NUM_U64_IN_ZMM_REG 8.
> > - Add comment describing operation of while loop handling HWOL->EMC->SMC
> >   lookups in dp_netdev_input_outer_avx512().
> > - Add EMC and SMC batch insert functions for better handling of EMC and
> >   SMC in AVX512 DPIF.
> > - Minor code refactor to address review comments.
> > ---
> >  NEWS |   2 +
> >  lib/automake.mk  |   5 +-
> >  lib/dpif-netdev-avx512.c | 327 +++
> >  lib/dpif-netdev-private-dfc.h|  25 +++
> >  lib/dpif-netdev-private-dpif.h   |  32 +++
> >  lib/dpif-netdev-private-thread.h |  11 +-
> >  lib/dpif-netdev-private.h|  25 +++
> >  lib/dpif-netdev.c| 103 --
> >  8 files changed, 514 insertions(+), 16 deletions(-)
> >  create mode 100644 lib/dpif-netdev-avx512.c
> >  create mode 100644 lib/dpif-netdev-private-dpif.h
> >
> > diff --git a/NEWS b/NEWS
> > index 96b3a61c8..6a4a7b76d 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -10,6 +10,8 @@ Post-v2.15.0
> >   * Auto load balancing of PMDs now partially supports cross-NUMA 
> > polling
> > cases, e.g if all PMD threads are running on the same NUMA node.
> >   * Refactor lib/dpif-netdev.c to multiple header files.
> > + * Add avx512 implementation of dpif which can process non recirculated
> > +   packets. It supports partial HWOL, EMC, SMC and DPCLS lookups.
> > - ovs-ctl:
> >   * New option '--no-record-hostname' to disable hostname configuration
> > in ovsdb on startup.
> > diff --git a/lib/automake.mk b/lib/automake.mk
> > index 3a33cdd5c..660cd07f0 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -33,11 +33,13 @@ lib_libopenvswitchavx512_la_CFLAGS = \
> > -mavx512f \
> > -mavx512bw \
> > -mavx512dq \
> > +   -mbmi \
> > -mbmi2 \
> > -fPIC \
> > $(AM_CFLAGS)
> >  lib_libopenvswitchavx512_la_SOURCES = \
> > -   lib/dpif-netdev-lookup-avx512-gather.c
> > +   lib/dpif-netdev-lookup-avx512-gather.c \
> > +   lib/dpif-netdev-avx512.c
> >  lib_libopenvswitchavx512_la_LDFLAGS = \
> > -static
> >  endif
> > @@ -114,6 +116,7 @@ lib_libopenvswitch_la_SOURCES = \
> > lib/dpif-netdev-private-dfc.c \
> > lib/dpif-netdev-private-dfc.h \
> > lib/dpif-netdev-private-dpcls.h \
> > +   lib/dpif-netdev-private-dpif.h \
> > lib/dpif-netdev-private-flow.h \
> > lib/dpif-netdev-private-hwol.h \
> > lib/dpif-netdev-private-thread.h \
> > diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
> > new file mode 100644
> > index 0..0e55b0be2
> > --- /dev/null
> > +++ b/lib/dpif-netdev-avx512.c
> > @@ -0,0 +1,327 @@
> > +/*
> > + * Copyright (c) 2021 Intel Corporation.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + * http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +
> > +#ifdef __x86_64__
> > +/* Sparse cannot handle the AVX512 instructions. */
> > +#if !defined(__CHECKER__)
> > +
> > +#include 
> > +
> > +#include "dpif-netdev.h"
> > +#include "dpif-netdev-perf.h"
> > +
> > +#include "dpif-netdev-private.h"
> > +#include "dpif-netdev-private-dpcls.h"
> > +#include "dpif-netdev-private-flow.h"
> > +#include 

Re: [ovs-dev] [v13 03/12] dpif-netdev: Add function pointer for netdev input.

2021-06-21 Thread Ferriter, Cian
Hi Flavio,

Thanks for the review. My responses are inline.

Cian

> -Original Message-
> From: Flavio Leitner 
> Sent: Friday 18 June 2021 13:53
> To: Ferriter, Cian 
> Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org
> Subject: Re: [ovs-dev] [v13 03/12] dpif-netdev: Add function pointer for 
> netdev input.
> 
> 
> Hello,
> 
> On Thu, Jun 17, 2021 at 05:18:16PM +0100, Cian Ferriter wrote:
> > From: Harry van Haaren 
> >
> > This commit adds a function pointer to the pmd thread data structure,
> > giving the pmd thread flexibility in its dpif-input function choice.
> > This allows choosing of the implementation based on ISA capabilities
> > of the runtime CPU, leading to optimizations and higher performance.
> >
> > Signed-off-by: Harry van Haaren 
> >
> > ---
> >
> > v13:
> > - Minor code refactor to address review comments.
> > ---
> >  lib/dpif-netdev-private-thread.h | 13 +
> >  lib/dpif-netdev.c|  7 ++-
> >  2 files changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/dpif-netdev-private-thread.h 
> > b/lib/dpif-netdev-private-thread.h
> > index 5e5308b96..0d674ab83 100644
> > --- a/lib/dpif-netdev-private-thread.h
> > +++ b/lib/dpif-netdev-private-thread.h
> > @@ -47,6 +47,13 @@ struct dp_netdev_pmd_thread_ctx {
> >  uint32_t emc_insert_min;
> >  };
> >
> > +/* Forward declaration for typedef. */
> > +struct dp_netdev_pmd_thread;
> > +
> > +typedef void (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd,
> > + struct dp_packet_batch *packets,
> > + odp_port_t port_no);
> > +
> >  /* PMD: Poll modes drivers.  PMD accesses devices via polling to eliminate
> >   * the performance overhead of interrupt processing.  Therefore netdev can
> >   * not implement rx-wait for these devices.  dpif-netdev needs to poll
> > @@ -101,6 +108,12 @@ struct dp_netdev_pmd_thread {
> >  /* Current context of the PMD thread. */
> >  struct dp_netdev_pmd_thread_ctx ctx;
> >
> > +/* Function pointer to call for dp_netdev_input() functionality. */
> > +dp_netdev_input_func netdev_input_func;
> > +
> > +/* Pointer for per-DPIF implementation scratch space. */
> > +void *netdev_input_func_userdata;
> > +
> 
> I see you need to switch the input function and the patch looks fine, but
> the default function doesn't require netdev_input_func_userdata. I think
> it would be better to add that in the next patch where it is actually
> required.
> 

Good point, I'll move netdev_input_func_userdata to the next patch.

> fbl
> 
> >  struct seq *reload_seq;
> >  uint64_t last_reload_seq;
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index e913f4efc..e6486417e 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -4231,8 +4231,9 @@ dp_netdev_process_rxq_port(struct 
> > dp_netdev_pmd_thread *pmd,
> >  }
> >  }
> >  }
> > +
> >  /* Process packet batch. */
> > -dp_netdev_input(pmd, , port_no);
> > +pmd->netdev_input_func(pmd, , port_no);
> >
> >  /* Assign processing cycles to rx queue. */
> >  cycles = cycle_timer_stop(>perf_stats, );
> > @@ -6029,6 +6030,10 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread 
> > *pmd, struct dp_netdev
> *dp,
> >  hmap_init(>tnl_port_cache);
> >  hmap_init(>send_port_cache);
> >  cmap_init(>tx_bonds);
> > +
> > +/* Initialize the DPIF function pointer to the default scalar version. 
> > */
> > +pmd->netdev_input_func = dp_netdev_input;
> > +
> >  /* init the 'flow_cache' since there is no
> >   * actual thread created for NON_PMD_CORE_ID. */
> >  if (core_id == NON_PMD_CORE_ID) {
> > --
> > 2.32.0
> >
> > ___
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
> --
> fbl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v13 02/12] dpif-netdev: Split HWOL out to own header file.

2021-06-21 Thread Ferriter, Cian
Hi Flavio,

Thanks for the review. My comments are inline,

Cian

> -Original Message-
> From: Flavio Leitner 
> Sent: Friday 18 June 2021 13:12
> To: Ferriter, Cian 
> Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org
> Subject: Re: [ovs-dev] [v13 02/12] dpif-netdev: Split HWOL out to own header 
> file.
> 
> 
> Hello,
> 
> I have some comments inline.
> 
> On Thu, Jun 17, 2021 at 05:18:15PM +0100, Cian Ferriter wrote:
> > From: Harry van Haaren 
> >
> > This commit moves the datapath lookup functions required for
> > hardware offload to a seperate file. This allows other DPIF
> 
> 
> Spelling error.
> 

Good spot, I'll fix this in the next version.

> > implementations to access the lookup functions, encouraging
> > code reuse.
> >
> > Signed-off-by: Harry van Haaren 
> >
> > ---
> >
> > Cc: Gaetan Rivet 
> > Cc: Sriharsha Basavapatna 
> >
> > v13:
> > - Minor code refactor to address review comments.
> > ---
> >  lib/automake.mk|  1 +
> >  lib/dpif-netdev-private-hwol.h | 63 ++
> >  lib/dpif-netdev.c  | 38 ++--
> >  3 files changed, 66 insertions(+), 36 deletions(-)
> >  create mode 100644 lib/dpif-netdev-private-hwol.h
> >
> > diff --git a/lib/automake.mk b/lib/automake.mk
> > index fdba3c6c0..3a33cdd5c 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -115,6 +115,7 @@ lib_libopenvswitch_la_SOURCES = \
> > lib/dpif-netdev-private-dfc.h \
> > lib/dpif-netdev-private-dpcls.h \
> > lib/dpif-netdev-private-flow.h \
> > +   lib/dpif-netdev-private-hwol.h \
> > lib/dpif-netdev-private-thread.h \
> > lib/dpif-netdev-private.h \
> > lib/dpif-netdev-perf.c \
> > diff --git a/lib/dpif-netdev-private-hwol.h b/lib/dpif-netdev-private-hwol.h
> > new file mode 100644
> > index 0..b93297a74
> > --- /dev/null
> > +++ b/lib/dpif-netdev-private-hwol.h
> > @@ -0,0 +1,63 @@
> > +/*
> > + * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2015 Nicira, Inc.
> > + * Copyright (c) 2021 Intel Corporation.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + * http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +
> > +#ifndef DPIF_NETDEV_PRIVATE_HWOL_H
> > +#define DPIF_NETDEV_PRIVATE_HWOL_H 1
> > +
> > +#include "dpif-netdev-private-flow.h"
> > +
> > +#define MAX_FLOW_MARK   (UINT32_MAX - 1)
> > +#define INVALID_FLOW_MARK   0
> > +/* Zero flow mark is used to indicate the HW to remove the mark. A packet
> > + * marked with zero mark is received in SW without a mark at all, so it
> > + * cannot be used as a valid mark.
> > + */
> > +
> > +struct megaflow_to_mark_data {
> > +const struct cmap_node node;
> > +ovs_u128 mega_ufid;
> > +uint32_t mark;
> > +};
> > +
> > +struct flow_mark {
> > +struct cmap megaflow_to_mark;
> > +struct cmap mark_to_flow;
> > +struct id_pool *pool;
> > +};
> > +
> > +/* allocated in dpif-netdev.c */
> > +extern struct flow_mark flow_mark;
> > +
> > +static inline struct dp_netdev_flow *
> > +mark_to_flow_find(const struct dp_netdev_pmd_thread *pmd,
> > +  const uint32_t mark)
> > +{
> > +struct dp_netdev_flow *flow;
> > +
> > +CMAP_FOR_EACH_WITH_HASH (flow, mark_node, hash_int(mark, 0),
> > + _mark.mark_to_flow) {
> > +if (flow->mark == mark && flow->pmd_id == pmd->core_id &&
> > +flow->dead == false) {
> > +return flow;
> > +}
> > +}
> > +
> > +return NULL;
> > +}
> 
> Wouldn't this be better in a separate .c file? Because although the
> structure flow_mark is here, it is allocated in dpif-netdev.c and
> we have a fairly large function inline. This seems enough to start
> a .c file to me.
> 

The mark_to_flow_find() function looks like a good inline candidate to us since 
we can avoid function call overhead per packet. So keeping it in a header file 
so it will be inlined seems like the best approach.

> Thanks,
> fbl
> 
> > +
> > +
> > +#endif /* dpif-netdev-private-hwol.h */
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index affeeacdc..e913f4efc 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -18,6 +18,7 @@
> >  #include "dpif-netdev.h"
> >  #include "dpif-netdev-private.h"
> >  #include "dpif-netdev-private-dfc.h"
> > +#include "dpif-netdev-private-hwol.h"
> >
> >  #include 
> >  #include 
> > @@ -1953,26 +1954,8 @@ dp_netdev_pmd_find_dpcls(struct 

Re: [ovs-dev] [v13 01/12] dpif-netdev: Refactor to multiple header files.

2021-06-21 Thread Ferriter, Cian
Hi Flavio,

Thanks for the review. My responses are inline.

Thanks,
Cian

> -Original Message-
> From: Flavio Leitner 
> Sent: Friday 18 June 2021 13:12
> To: Ferriter, Cian 
> Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org
> Subject: Re: [ovs-dev] [v13 01/12] dpif-netdev: Refactor to multiple header 
> files.
> 
> 
> Hello,
> 
> Some comments below.
> 
> On Thu, Jun 17, 2021 at 05:18:14PM +0100, Cian Ferriter wrote:
> > From: Harry van Haaren 
> >
> > Split the very large file dpif-netdev.c and the datastructures
> > it contains into multiple header files. Each header file is
> > responsible for the datastructures of that component.
> >
> > This logical split allows better reuse and modularity of the code,
> > and reduces the very large file dpif-netdev.c to be more managable.
> >
> > Due to dependencies between components, it is not possible to
> > move component in smaller granularities than this patch.
> >
> > To explain the dependencies better, eg:
> >
> > DPCLS has no deps (from dpif-netdev.c file)
> > FLOW depends on DPCLS (struct dpcls_rule)
> > DFC depends on DPCLS (netdev_flow_key) and FLOW (netdev_flow_key)
> > THREAD depends on DFC (struct dfc_cache)
> >
> > DFC_PROC depends on THREAD (struct pmd_thread)
> >
> > DPCLS lookup.h/c require only DPCLS
> > DPCLS implementations require only dpif-netdev-lookup.h.
> > - This change was made in 2.12 release with function pointers
> > - This commit only refactors the name to "private-dpcls.h"
> >
> > netdev_flow_key_equal_mf() is renamed to emc_flow_key_equal_mf().
> >
> > Rename functions specific to dpcls from netdev_* namespace to the
> > dpcls_* namespace, as they are only used by dpcls code.
> >
> > 'inline' is added to the dp_netdev_flow_hash() when it is moved
> > definition to fix a compiler error.
> >
> > One valid checkpatch issue with the use of the
> > EMC_FOR_EACH_POS_WITH_HASH() macro was fixed.
> >
> > Signed-off-by: Harry van Haaren 
> > Co-authored-by: Cian Ferriter 
> > Signed-off-by: Cian Ferriter 
> >
> > ---
> >
> > Cc: Gaetan Rivet 
> > Cc: Sriharsha Basavapatna 
> >
> > v13:
> > - Add NEWS item in this commit rather than later.
> > - Add lib/dpif-netdev-private-dfc.c file and move non fast path dfc
> >   related functions there.
> > - Squash commit which renames functions specific to dpcls from netdev_*
> >   namespace to the dpcls_* namespace, as they are only used by dpcls
> >   code into this commit.
> > - Minor fixes from review comments.
> > ---
> >  NEWS   |   1 +
> >  lib/automake.mk|   5 +
> >  lib/dpif-netdev-lookup-autovalidator.c |   1 -
> >  lib/dpif-netdev-lookup-avx512-gather.c |   1 -
> >  lib/dpif-netdev-lookup-generic.c   |   1 -
> >  lib/dpif-netdev-lookup.h   |   2 +-
> >  lib/dpif-netdev-private-dfc.c  | 110 +
> >  lib/dpif-netdev-private-dfc.h  | 176 
> >  lib/dpif-netdev-private-dpcls.h| 127 ++
> >  lib/dpif-netdev-private-flow.h | 162 
> >  lib/dpif-netdev-private-thread.h   | 206 ++
> >  lib/dpif-netdev-private.h  | 100 +
> >  lib/dpif-netdev.c  | 539 +
> >  13 files changed, 811 insertions(+), 620 deletions(-)
> >  create mode 100644 lib/dpif-netdev-private-dfc.c
> >  create mode 100644 lib/dpif-netdev-private-dfc.h
> >  create mode 100644 lib/dpif-netdev-private-dpcls.h
> >  create mode 100644 lib/dpif-netdev-private-flow.h
> >  create mode 100644 lib/dpif-netdev-private-thread.h
> >
> > diff --git a/NEWS b/NEWS
> > index ebba17b22..96b3a61c8 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -9,6 +9,7 @@ Post-v2.15.0
> > - Userspace datapath:
> >   * Auto load balancing of PMDs now partially supports cross-NUMA 
> > polling
> > cases, e.g if all PMD threads are running on the same NUMA node.
> > + * Refactor lib/dpif-netdev.c to multiple header files.
> > - ovs-ctl:
> >   * New option '--no-record-hostname' to disable hostname configuration
> > in ovsdb on startup.
> > diff --git a/lib/automake.mk b/lib/automake.mk
> > index db9017591..fdba3c6c0 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -111,6 +111,11 @@ lib_libopenvswitch_la_SOURCES = \
> > lib/dpif-netdev-lookup-generic.c \
> > lib/dpif-netdev.c \
> > lib/dpif-netdev.h \
> > +   lib/dpif-netdev-private-dfc.c \
> > +   lib/dpif-netdev-private-dfc.h \
> > +   lib/dpif-netdev-private-dpcls.h \
> > +   lib/dpif-netdev-private-flow.h \
> > +   lib/dpif-netdev-private-thread.h \
> > lib/dpif-netdev-private.h \
> > lib/dpif-netdev-perf.c \
> > lib/dpif-netdev-perf.h \
> > diff --git a/lib/dpif-netdev-lookup-autovalidator.c 
> > b/lib/dpif-netdev-lookup-autovalidator.c
> > index 97b59fdd0..475e1ab1e 100644
> > --- a/lib/dpif-netdev-lookup-autovalidator.c
> > +++ b/lib/dpif-netdev-lookup-autovalidator.c
> > @@ -17,7 +17,6 @@
> >  #include 
> >  #include "dpif-netdev.h"
> >  

Re: [ovs-dev] [PATCH v7 4/4] dpif-netdev: add all-zero SNAT to the advertised features of ct

2021-06-21 Thread Aaron Conole
Paolo Valerio  writes:

> Signed-off-by: Paolo Valerio 
> ---

Acked-by: Aaron Conole 

>  lib/dpif-netdev.c |   12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 05de57ed5..3f04bf6ec 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -8250,6 +8250,16 @@ dpif_netdev_ct_del_limits(struct dpif *dpif OVS_UNUSED,
>  return err;
>  }
>  
> +static int
> +dpif_netdev_ct_get_features(struct dpif *dpif OVS_UNUSED,
> +enum ct_features *features)
> +{
> +if (features != NULL) {
> +*features = CONNTRACK_F_ZERO_SNAT;
> +}
> +return 0;
> +}
> +
>  static int
>  dpif_netdev_ct_set_timeout_policy(struct dpif *dpif,
>const struct ct_dpif_timeout_policy 
> *dpif_tp)
> @@ -8515,7 +8525,7 @@ const struct dpif_class dpif_netdev_class = {
>  NULL,   /* ct_timeout_policy_dump_next */
>  NULL,   /* ct_timeout_policy_dump_done */
>  dpif_netdev_ct_get_timeout_policy_name,
> -NULL,   /* ct_get_features */
> +dpif_netdev_ct_get_features,
>  dpif_netdev_ipf_set_enabled,
>  dpif_netdev_ipf_set_min_frag,
>  dpif_netdev_ipf_set_max_nfrags,

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC net-next] openvswitch: add trace points

2021-06-21 Thread Aaron Conole
Dumitru Ceara  writes:

> On 5/27/21 9:15 PM, Aaron Conole wrote:
>> This makes openvswitch module use the event tracing framework
>> to log the upcall interface and action execution pipeline.  When
>> using openvswitch as the packet forwarding engine, some types of
>> debugging are made possible simply by using the ovs-vswitchd's
>> ofproto/trace command.  However, such a command has some
>> limitations:
>> 
>>   1. When trying to trace packets that go through the CT action,
>>  the state of the packet can't be determined, and probably
>>  would be potentially wrong.
>> 
>>   2. Deducing problem packets can sometimes be difficult as well
>>  even if many of the flows are known
>> 
>>   3. It's possible to use the openvswitch module even without
>>  the ovs-vswitchd (although, not common use).
>> 
>> Introduce the event tracing points here to make it possible for
>> working through these problems in kernel space.  The style is
>> copied from the mac80211 driver-trace / trace code for
>> consistency.
>> 
>> Signed-off-by: Aaron Conole 
>> ---
>
> Hi Aaron,

Hi Dumitru,

>>  net/openvswitch/Makefile|   3 +
>>  net/openvswitch/actions.c   |   4 +
>>  net/openvswitch/datapath.c  |   7 ++
>>  net/openvswitch/openvswitch_trace.c |  10 ++
>>  net/openvswitch/openvswitch_trace.h | 152 
>>  5 files changed, 176 insertions(+)
>>  create mode 100644 net/openvswitch/openvswitch_trace.c
>>  create mode 100644 net/openvswitch/openvswitch_trace.h
>> 
>> diff --git a/net/openvswitch/Makefile b/net/openvswitch/Makefile
>> index 41109c326f3a..28982630bef3 100644
>> --- a/net/openvswitch/Makefile
>> +++ b/net/openvswitch/Makefile
>> @@ -13,6 +13,7 @@ openvswitch-y := \
>>  flow_netlink.o \
>>  flow_table.o \
>>  meter.o \
>> +openvswitch_trace.o \
>>  vport.o \
>>  vport-internal_dev.o \
>>  vport-netdev.o
>> @@ -24,3 +25,5 @@ endif
>>  obj-$(CONFIG_OPENVSWITCH_VXLAN)+= vport-vxlan.o
>>  obj-$(CONFIG_OPENVSWITCH_GENEVE)+= vport-geneve.o
>>  obj-$(CONFIG_OPENVSWITCH_GRE)   += vport-gre.o
>> +
>> +CFLAGS_openvswitch_trace.o = -I$(src)
>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>> index d858ea580e43..62285453ca79 100644
>> --- a/net/openvswitch/actions.c
>> +++ b/net/openvswitch/actions.c
>> @@ -30,6 +30,7 @@
>>  #include "conntrack.h"
>>  #include "vport.h"
>>  #include "flow_netlink.h"
>> +#include "openvswitch_trace.h"
>>  
>>  struct deferred_action {
>>  struct sk_buff *skb;
>> @@ -1242,6 +1243,9 @@ static int do_execute_actions(struct datapath *dp, 
>> struct sk_buff *skb,
>>   a = nla_next(a, )) {
>>  int err = 0;
>>  
>> +if (trace_openvswitch_probe_action_enabled())
>> +trace_openvswitch_probe_action(dp, skb, key, a, rem);
>> +
>>  switch (nla_type(a)) {
>>  case OVS_ACTION_ATTR_OUTPUT: {
>>  int port = nla_get_u32(a);
>> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
>> index 9d6ef6cb9b26..63f19a6ed472 100644
>> --- a/net/openvswitch/datapath.c
>> +++ b/net/openvswitch/datapath.c
>> @@ -43,6 +43,7 @@
>>  #include "flow_table.h"
>>  #include "flow_netlink.h"
>>  #include "meter.h"
>> +#include "openvswitch_trace.h"
>>  #include "vport-internal_dev.h"
>>  #include "vport-netdev.h"
>>  
>> @@ -275,6 +276,12 @@ int ovs_dp_upcall(struct datapath *dp, struct sk_buff 
>> *skb,
>>  struct dp_stats_percpu *stats;
>>  int err;
>>  
>> +if (trace_openvswitch_probe_userspace_enabled()) {
>> +struct sw_flow_key ukey = *key;
>> +
>> +trace_openvswitch_probe_userspace(dp, skb, , upcall_info);
>> +}
>> +
>>  if (upcall_info->portid == 0) {
>>  err = -ENOTCONN;
>>  goto err;
>> diff --git a/net/openvswitch/openvswitch_trace.c 
>> b/net/openvswitch/openvswitch_trace.c
>> new file mode 100644
>> index ..62c5f7d6f023
>> --- /dev/null
>> +++ b/net/openvswitch/openvswitch_trace.c
>> @@ -0,0 +1,10 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/* bug in tracepoint.h, it should include this */
>> +#include 
>> +
>> +/* sparse isn't too happy with all macros... */
>> +#ifndef __CHECKER__
>> +#define CREATE_TRACE_POINTS
>> +#include "openvswitch_trace.h"
>> +
>> +#endif
>> diff --git a/net/openvswitch/openvswitch_trace.h 
>> b/net/openvswitch/openvswitch_trace.h
>> new file mode 100644
>> index ..1b350306f622
>> --- /dev/null
>> +++ b/net/openvswitch/openvswitch_trace.h
>> @@ -0,0 +1,152 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#undef TRACE_SYSTEM
>> +#define TRACE_SYSTEM openvswitch
>> +
>> +#if !defined(_TRACE_OPENVSWITCH_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#define _TRACE_OPENVSWITCH_H
>> +
>> +#include 
>> +
>> +#include "datapath.h"
>> +
>> +TRACE_EVENT(openvswitch_probe_action,
>> +
>> +TP_PROTO(struct datapath *dp, struct sk_buff *skb,
>> + struct sw_flow_key *key, 

Re: [ovs-dev] [RFC net-next] openvswitch: add trace points

2021-06-21 Thread Dumitru Ceara
On 5/27/21 9:15 PM, Aaron Conole wrote:
> This makes openvswitch module use the event tracing framework
> to log the upcall interface and action execution pipeline.  When
> using openvswitch as the packet forwarding engine, some types of
> debugging are made possible simply by using the ovs-vswitchd's
> ofproto/trace command.  However, such a command has some
> limitations:
> 
>   1. When trying to trace packets that go through the CT action,
>  the state of the packet can't be determined, and probably
>  would be potentially wrong.
> 
>   2. Deducing problem packets can sometimes be difficult as well
>  even if many of the flows are known
> 
>   3. It's possible to use the openvswitch module even without
>  the ovs-vswitchd (although, not common use).
> 
> Introduce the event tracing points here to make it possible for
> working through these problems in kernel space.  The style is
> copied from the mac80211 driver-trace / trace code for
> consistency.
> 
> Signed-off-by: Aaron Conole 
> ---

Hi Aaron,

>  net/openvswitch/Makefile|   3 +
>  net/openvswitch/actions.c   |   4 +
>  net/openvswitch/datapath.c  |   7 ++
>  net/openvswitch/openvswitch_trace.c |  10 ++
>  net/openvswitch/openvswitch_trace.h | 152 
>  5 files changed, 176 insertions(+)
>  create mode 100644 net/openvswitch/openvswitch_trace.c
>  create mode 100644 net/openvswitch/openvswitch_trace.h
> 
> diff --git a/net/openvswitch/Makefile b/net/openvswitch/Makefile
> index 41109c326f3a..28982630bef3 100644
> --- a/net/openvswitch/Makefile
> +++ b/net/openvswitch/Makefile
> @@ -13,6 +13,7 @@ openvswitch-y := \
>   flow_netlink.o \
>   flow_table.o \
>   meter.o \
> + openvswitch_trace.o \
>   vport.o \
>   vport-internal_dev.o \
>   vport-netdev.o
> @@ -24,3 +25,5 @@ endif
>  obj-$(CONFIG_OPENVSWITCH_VXLAN)+= vport-vxlan.o
>  obj-$(CONFIG_OPENVSWITCH_GENEVE)+= vport-geneve.o
>  obj-$(CONFIG_OPENVSWITCH_GRE)+= vport-gre.o
> +
> +CFLAGS_openvswitch_trace.o = -I$(src)
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index d858ea580e43..62285453ca79 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -30,6 +30,7 @@
>  #include "conntrack.h"
>  #include "vport.h"
>  #include "flow_netlink.h"
> +#include "openvswitch_trace.h"
>  
>  struct deferred_action {
>   struct sk_buff *skb;
> @@ -1242,6 +1243,9 @@ static int do_execute_actions(struct datapath *dp, 
> struct sk_buff *skb,
>a = nla_next(a, )) {
>   int err = 0;
>  
> + if (trace_openvswitch_probe_action_enabled())
> + trace_openvswitch_probe_action(dp, skb, key, a, rem);
> +
>   switch (nla_type(a)) {
>   case OVS_ACTION_ATTR_OUTPUT: {
>   int port = nla_get_u32(a);
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 9d6ef6cb9b26..63f19a6ed472 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -43,6 +43,7 @@
>  #include "flow_table.h"
>  #include "flow_netlink.h"
>  #include "meter.h"
> +#include "openvswitch_trace.h"
>  #include "vport-internal_dev.h"
>  #include "vport-netdev.h"
>  
> @@ -275,6 +276,12 @@ int ovs_dp_upcall(struct datapath *dp, struct sk_buff 
> *skb,
>   struct dp_stats_percpu *stats;
>   int err;
>  
> + if (trace_openvswitch_probe_userspace_enabled()) {
> + struct sw_flow_key ukey = *key;
> +
> + trace_openvswitch_probe_userspace(dp, skb, , upcall_info);
> + }
> +
>   if (upcall_info->portid == 0) {
>   err = -ENOTCONN;
>   goto err;
> diff --git a/net/openvswitch/openvswitch_trace.c 
> b/net/openvswitch/openvswitch_trace.c
> new file mode 100644
> index ..62c5f7d6f023
> --- /dev/null
> +++ b/net/openvswitch/openvswitch_trace.c
> @@ -0,0 +1,10 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* bug in tracepoint.h, it should include this */
> +#include 
> +
> +/* sparse isn't too happy with all macros... */
> +#ifndef __CHECKER__
> +#define CREATE_TRACE_POINTS
> +#include "openvswitch_trace.h"
> +
> +#endif
> diff --git a/net/openvswitch/openvswitch_trace.h 
> b/net/openvswitch/openvswitch_trace.h
> new file mode 100644
> index ..1b350306f622
> --- /dev/null
> +++ b/net/openvswitch/openvswitch_trace.h
> @@ -0,0 +1,152 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM openvswitch
> +
> +#if !defined(_TRACE_OPENVSWITCH_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_OPENVSWITCH_H
> +
> +#include 
> +
> +#include "datapath.h"
> +
> +TRACE_EVENT(openvswitch_probe_action,
> +
> + TP_PROTO(struct datapath *dp, struct sk_buff *skb,
> +  struct sw_flow_key *key, const struct nlattr *a, int rem),
> +
> + TP_ARGS(dp, skb, key, a, rem),
> +
> + TP_STRUCT__entry(
> + __field(void *, 

Re: [ovs-dev] [PATCH v7 3/4] conntrack: handle SNAT with all-zero IP address

2021-06-21 Thread 0-day Robot
Bleep bloop.  Greetings Paolo Valerio, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


git-am:
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch' to see the failed patch
Patch failed at 0001 conntrack: handle SNAT with all-zero IP address
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [External] : [PATCH ovn v3] ovn-northd.c: Add proxy ARP support to OVN

2021-06-21 Thread Brendan Doyle

Hi Numan,

I know things are busy but did you get a chance to look at this. I made 
the change to use

extract_ip_addresses() as requested.

Brendan

On 18/06/2021 14:53, Brendan Doyle wrote:

From 634fd88b726700b30cb76203ca45f1e9c041368a Mon Sep 17 00:00:00 2001
From: Brendan Doyle 
Date: Fri, 28 May 2021 10:01:17 -0700
Subject: [PATCH ovn] ovn-northd.c: Add proxy ARP support to OVN

This patch provides the ability to configure proxy ARP IPs on a Logical
Switch Router port. The IPs are added as Options for router ports. This
provides a useful feature where traffic for a service must be sent to an
address in a logical network address space, but the service is provided
in a different network. For example an NFS service is provide to Logical
networks at an address in their Logical network space, but the NFS
server resides in a physical network. A Logical switch Router port can
be configured to respond to ARP requests sent to the service "Logical
address", the Logical Router/Gateway can then be configured to forward
the traffic to the underlay/physical network.

Signed-off-by: Brendan Doyle 
---
 northd/ovn-northd.c |  46 +++
 ovn-nb.xml  |   9 +
 tests/ovn.at    | 103 


 3 files changed, 158 insertions(+)

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 0e5092a..2bc6f28 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -6943,6 +6943,7 @@ build_lswitch_arp_nd_responder_known_ips(struct 
ovn_port *op,

  struct ds *match)
 {
 if (op->nbsp) {
+    const char *arp_proxy;
 if (!strcmp(op->nbsp->type, "virtual")) {
 /* Handle
  *  - GARPs for virtual ip which belongs to a logical port
@@ -7096,6 +7097,51 @@ build_lswitch_arp_nd_responder_known_ips(struct 
ovn_port *op,

 }
 }
 }
+
+    /*
+ * Add responses for ARP proxies.
+ */
+    arp_proxy = smap_get(>nbsp->options,"arp_proxy");
+    if (arp_proxy && op->peer) {
+    struct lport_addresses proxy_arp_addrs;
+    int i = 0;
+
+    if (extract_ip_addresses(arp_proxy, _arp_addrs)) {
+    /*
+ * Match rule on all proxy ARP IPs.
+ */
+    ds_clear(match);
+    ds_put_cstr(match, "arp.op == 1 && (");
+
+    for (i = 0; i < proxy_arp_addrs.n_ipv4_addrs; i++) {
+    if (i > 0) {
+    ds_put_cstr(match, " || ");
+    }
+    ds_put_format(match, "arp.tpa == %s",
+    proxy_arp_addrs.ipv4_addrs[i].addr_s);
+    }
+
+    ds_put_cstr(match, ")");
+    destroy_lport_addresses(_arp_addrs);
+
+    ds_clear(actions);
+    ds_put_format(actions,
+    "eth.dst = eth.src; "
+    "eth.src = %s; "
+    "arp.op = 2; /* ARP reply */ "
+    "arp.tha = arp.sha; "
+    "arp.sha = %s; "
+    "arp.tpa <-> arp.spa; "
+    "outport = inport; "
+    "flags.loopback = 1; "
+    "output;",
+    op->peer->lrp_networks.ea_s,
+    op->peer->lrp_networks.ea_s);
+
+    ovn_lflow_add_with_hint(lflows, op->od, 
S_SWITCH_IN_ARP_ND_RSP,
+    50, ds_cstr(match), ds_cstr(actions), 
>nbsp->header_);

+    }
+    }
 }
 }

diff --git a/ovn-nb.xml b/ovn-nb.xml
index 02fd216..e4a8114 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -848,6 +848,15 @@
 
   
 
+
+    
+  Optional. A list IPv4 addresses that this
+  logical switch router port will reply to ARP 
requests.

+  Example: 169.254.239.254 169.254.239.2. The
+  's logical router 
should
+  have a route to forward packets sent to configured proxy 
ARP IPs to

+  an appropriate destination.
+    
   

   
diff --git a/tests/ovn.at b/tests/ovn.at
index 2c3c36d..ffbb594 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -26527,3 +26527,106 @@ AT_CHECK([test $(ovn-appctl -t 
ovn-controller coverage/read-counter lflow_run) =

 OVN_CLEANUP([hv1])
 AT_CLEANUP
 ])
+
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([ovn -- proxy-arp: 1 HVs, 1 LSs, 1 lport/LS, 1 LR])
+AT_KEYWORDS([proxy-arp])
+ovn_start
+
+# Logical network:
+# One LR - lr1 has switch ls1 (192.16.1.0/24) connected to it,
+# and and one HV with IP 192.16.1.6.
+
+ovn-nbctl lr-add lr1
+ovn-nbctl ls-add ls1
+
+# Connect ls1 to lr1
+ovn-nbctl lrp-add lr1 ls1 00:00:00:01:02:f1 192.16.1.1/24
+ovn-nbctl lsp-add ls1 rp-ls1 -- set Logical_Switch_Port rp-ls1 \
+    type=router options:router-port=ls1 addresses=\"00:00:00:01:02:f1\"
+
+# Create logical port ls1-lp1 in ls1
+ovn-nbctl lsp-add ls1 ls1-lp1 \
+-- lsp-set-addresses 

[ovs-dev] [PATCH v7 4/4] dpif-netdev: add all-zero SNAT to the advertised features of ct

2021-06-21 Thread Paolo Valerio
Signed-off-by: Paolo Valerio 
---
 lib/dpif-netdev.c |   12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 05de57ed5..3f04bf6ec 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -8250,6 +8250,16 @@ dpif_netdev_ct_del_limits(struct dpif *dpif OVS_UNUSED,
 return err;
 }
 
+static int
+dpif_netdev_ct_get_features(struct dpif *dpif OVS_UNUSED,
+enum ct_features *features)
+{
+if (features != NULL) {
+*features = CONNTRACK_F_ZERO_SNAT;
+}
+return 0;
+}
+
 static int
 dpif_netdev_ct_set_timeout_policy(struct dpif *dpif,
   const struct ct_dpif_timeout_policy *dpif_tp)
@@ -8515,7 +8525,7 @@ const struct dpif_class dpif_netdev_class = {
 NULL,   /* ct_timeout_policy_dump_next */
 NULL,   /* ct_timeout_policy_dump_done */
 dpif_netdev_ct_get_timeout_policy_name,
-NULL,   /* ct_get_features */
+dpif_netdev_ct_get_features,
 dpif_netdev_ipf_set_enabled,
 dpif_netdev_ipf_set_min_frag,
 dpif_netdev_ipf_set_max_nfrags,

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v7 3/4] conntrack: handle SNAT with all-zero IP address

2021-06-21 Thread Paolo Valerio
this patch introduces for the userspace datapath the handling
of rules like the following:

ct(commit,nat(src=0.0.0.0),...)

Kernel datapath already handle this case that is particularly
handy in scenarios like the following:

Given A: 10.1.1.1, B: 192.168.2.100, C: 10.1.1.2

A opens a connection toward B on port 80 selecting as source port 1.
B's IP gets dnat'ed to C's IP (10.1.1.1:1 -> 192.168.2.100:80).

This will result in:

tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=1,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=1),protoinfo=(state=ESTABLISHED)

A now tries to establish another connection with C using source port
1, this time using C's IP address (10.1.1.1:1 -> 10.1.1.2:80).

This second connection, if processed by conntrack with no SNAT/DNAT
involved, collides with the reverse tuple of the first connection,
so the entry for this valid connection doesn't get created.

With this commit, and adding a SNAT rule with 0.0.0.0 for
10.1.1.1:1 -> 10.1.1.2:80 will allow to create the conn entry:

tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=10001),protoinfo=(state=ESTABLISHED)
tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=1,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=1),protoinfo=(state=ESTABLISHED)

The issue exists even in the opposite case (with A trying to connect
to C using B's IP after establishing a direct connection from A to C).

This commit refactors the relevant function in a way that both of the
previously mentioned cases are handled as well.

Suggested-by: Eelco Chaudron 
Signed-off-by: Paolo Valerio 
Acked-by: Gaetan Rivet 
Acked-by: Aaron Conole 
---
 NEWS |3 
 lib/conntrack-private.h  |   33 
 lib/conntrack.c  |  335 --
 lib/ovs-actions.xml  |3 
 tests/system-userspace-macros.at |8 -
 5 files changed, 251 insertions(+), 131 deletions(-)

diff --git a/NEWS b/NEWS
index ebba17b22..ca6f52522 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,9 @@ Post-v2.15.0
- Userspace datapath:
  * Auto load balancing of PMDs now partially supports cross-NUMA polling
cases, e.g if all PMD threads are running on the same NUMA node.
+ * Add all-zero IP SNAT handling to conntrack. In case of collision,
+   using ct(src=0.0.0.0), the source port will be replaced with another
+   non-colliding port in the ephemeral range (1024, 65535).
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index e8332bdba..cc2fb045d 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -148,6 +148,39 @@ enum ct_update_res {
 CT_TIMEOUT(ICMP_FIRST) \
 CT_TIMEOUT(ICMP_REPLY)
 
+#define NAT_ACTION_SNAT_ALL (NAT_ACTION_SRC | NAT_ACTION_SRC_PORT)
+#define NAT_ACTION_DNAT_ALL (NAT_ACTION_DST | NAT_ACTION_DST_PORT)
+
+enum ct_ephemeral_range {
+MIN_NAT_EPHEMERAL_PORT = 1024,
+MAX_NAT_EPHEMERAL_PORT = 65535
+};
+
+#define IN_RANGE(curr, min, max) \
+(curr >= min && curr <= max)
+
+#define NEXT_PORT_IN_RANGE(curr, min, max) \
+(curr = (!IN_RANGE(curr, min, max) || curr == max) ? min : curr + 1)
+
+/* if the current port is out of range increase the attempts by
+ * one so that in the worst case scenario the current out of
+ * range port plus all the in-range ports get tested.
+ * Note that curr can be an out of range port only in case of
+ * source port (SNAT with port range unspecified or DNAT),
+ * furthermore the source port in the packet has to be less than
+ * MIN_NAT_EPHEMERAL_PORT. */
+#define N_PORT_ATTEMPTS(curr, min, max) \
+((!IN_RANGE(curr, min, max)) ? (max - min) + 2 : (max - min) + 1)
+
+/* loose in-range check, the first curr port can be any port out of
+ * the range. */
+#define FOR_EACH_PORT_IN_RANGE__(curr, min, max, INAME) \
+for (uint16_t INAME = N_PORT_ATTEMPTS(curr, min, max); \
+INAME > 0; INAME--, NEXT_PORT_IN_RANGE(curr, min, max))
+
+#define FOR_EACH_PORT_IN_RANGE(curr, min, max) \
+FOR_EACH_PORT_IN_RANGE__(curr, min, max, OVS_JOIN(idx, __COUNTER__))
+
 enum ct_timeout {
 #define CT_TIMEOUT(NAME) CT_TM_##NAME,
 CT_TIMEOUTS
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 7e8b16a3e..f49382adb 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -108,8 +108,8 @@ static void set_label(struct dp_packet *, struct conn *,
 static void *clean_thread_main(void *f_);
 
 static bool
-nat_select_range_tuple(struct conntrack *ct, const struct conn *conn,
-   struct conn *nat_conn);
+nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn,
+ struct conn *nat_conn);
 
 static uint8_t
 reverse_icmp_type(uint8_t type);
@@ -728,11 +728,11 @@ pat_packet(struct dp_packet *pkt, const struct conn *conn)
 }
 } else if 

[ovs-dev] [PATCH v7 2/4] util.h: add token concatenation macro with argument expansion

2021-06-21 Thread Paolo Valerio
this macro is handy when it comes paste two tokens when one or both
are macros.

Rename CURSOR_JOIN() to OVS_JOIN() and move it to util.h so that it can
be reused.

Signed-off-by: Paolo Valerio 
Acked-by: Gaetan Rivet 
Acked-by: Aaron Conole 
---
 lib/cmap.h |5 +
 lib/util.h |7 +++
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/lib/cmap.h b/lib/cmap.h
index d9db3c915..c502d2311 100644
--- a/lib/cmap.h
+++ b/lib/cmap.h
@@ -245,9 +245,6 @@ void cmap_cursor_advance(struct cmap_cursor *);
 
 /* Generate a unique name for the cursor with the __COUNTER__ macro to
  * allow nesting of CMAP_FOR_EACH loops. */
-#define CURSOR_JOIN2(x,y) x##y
-#define CURSOR_JOIN(x, y) CURSOR_JOIN2(x,y)
-
 #define CMAP_FOR_EACH__(NODE, MEMBER, CMAP, CURSOR_NAME)   \
 for (struct cmap_cursor CURSOR_NAME = cmap_cursor_start(CMAP); \
  CMAP_CURSOR_FOR_EACH__(NODE, _NAME, MEMBER);   \
@@ -255,7 +252,7 @@ void cmap_cursor_advance(struct cmap_cursor *);
 
 #define CMAP_FOR_EACH(NODE, MEMBER, CMAP) \
   CMAP_FOR_EACH__(NODE, MEMBER, CMAP, \
-CURSOR_JOIN(cursor_, __COUNTER__))
+OVS_JOIN(cursor_, __COUNTER__))
 
 static inline struct cmap_node *cmap_first(const struct cmap *);
 
diff --git a/lib/util.h b/lib/util.h
index 1fe8ef32b..aea19d45f 100644
--- a/lib/util.h
+++ b/lib/util.h
@@ -105,6 +105,13 @@ ovs_prefetch_range(const void *start, size_t size)
 
 #define OVS_NOT_REACHED() abort()
 
+/* Joins two token expanding the arguments if they are macros.
+ *
+ * For token concatenation the circumlocution is needed for the
+ * expansion. */
+#define OVS_JOIN2(X, Y) X##Y
+#define OVS_JOIN(X, Y) OVS_JOIN2(X, Y)
+
 /* Use "%"PRIuSIZE to format size_t with printf(). */
 #ifdef _WIN32
 #define PRIdSIZE "Id"

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v7 1/4] conntrack: handle already natted packets

2021-06-21 Thread Paolo Valerio
when a packet gets dnatted and then recirculated, it could be possible
that it matches another rule that performs another nat action.
The kernel datapath handles this situation turning to a no-op the
second nat action, so natting only once the packet.  In the userspace
datapath instead, when the ct action gets executed, an initial lookup
of the translated packet fails to retrieve the connection related to
the packet, leading to the creation of a new entry in ct for the src
nat action with a subsequent failure of the connection establishment.

with the following flows:

table=0,priority=30,in_port=1,ip,nw_dst=192.168.2.100,actions=ct(commit,nat(dst=10.1.1.2:80),table=1)
table=0,priority=20,in_port=2,ip,actions=ct(nat,table=1)
table=0,priority=10,ip,actions=resubmit(,2)
table=0,priority=10,arp,actions=NORMAL
table=0,priority=0,actions=drop
table=1,priority=5,ip,actions=ct(commit,nat(src=10.1.1.240),table=2)
table=2,in_port=ovs-l0,actions=2
table=2,in_port=ovs-r0,actions=1

establishing a connection from 10.1.1.1 to 192.168.2.100 the outcome is:

tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=4000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.240,sport=80,dport=4000),protoinfo=(state=ESTABLISHED)
tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=4000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=4000),protoinfo=(state=ESTABLISHED)

with this patch applied the outcome is:

tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=4000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=4000),protoinfo=(state=ESTABLISHED)

The patch performs, for already natted packets, a lookup of the
reverse key in order to retrieve the related entry, it also adds a
test case that besides testing the scenario ensures that the other ct
actions are executed.

Reported-by: Dumitru Ceara 
Signed-off-by: Paolo Valerio 
---
 lib/conntrack.c |   30 +-
 tests/system-traffic.at |   35 +++
 2 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 99198a601..7e8b16a3e 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1281,6 +1281,33 @@ process_one_fast(uint16_t zone, const uint32_t *setmark,
 }
 }
 
+static void
+initial_conn_lookup(struct conntrack *ct, struct conn_lookup_ctx *ctx,
+ long long now, bool natted)
+{
+bool found;
+
+if (natted) {
+/* if the packet has been already natted (e.g. a previous
+ * action took place), retrieve it performing a lookup of its
+ * reverse key. */
+conn_key_reverse(>key);
+}
+
+found = conn_key_lookup(ct, >key, ctx->hash,
+now, >conn, >reply);
+if (natted) {
+if (OVS_LIKELY(found)) {
+ctx->reply = !ctx->reply;
+ctx->key = ctx->reply ? ctx->conn->rev_key : ctx->conn->key;
+ctx->hash = conn_key_hash(>key, ct->hash_basis);
+} else {
+/* in case of failure restore the initial key. */
+conn_key_reverse(>key);
+}
+}
+}
+
 static void
 process_one(struct conntrack *ct, struct dp_packet *pkt,
 struct conn_lookup_ctx *ctx, uint16_t zone,
@@ -1296,7 +1323,8 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
 }
 
 bool create_new_conn = false;
-conn_key_lookup(ct, >key, ctx->hash, now, >conn, >reply);
+initial_conn_lookup(ct, ctx, now, !!(pkt->md.ct_state &
+ (CS_SRC_NAT | CS_DST_NAT)));
 struct conn *conn = ctx->conn;
 
 /* Delete found entry if in wrong direction. 'force' implies commit. */
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 6a15d08c2..f400cfabc 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -4588,6 +4588,41 @@ 
tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([conntrack - DNAT with additional SNAT])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
+NS_CHECK_EXEC([at_ns0], [ip route add 172.1.1.0/24 via 10.1.1.2])
+
+OVS_START_L7([at_ns1], [http])
+
+AT_DATA([flows.txt], [dnl
+table=0,priority=30,in_port=1,ip,nw_dst=172.1.1.2,actions=ct(commit,nat(dst=10.1.1.2:80),table=1)
+table=0,priority=20,in_port=2,ip,actions=ct(nat),1
+table=0,priority=10,arp,actions=NORMAL
+table=0,priority=1,actions=drop
+dnl Be sure all ct() actions but src nat are executed
+table=1,ip,actions=ct(commit,nat(src=10.1.1.240),exec(set_field:0xac->ct_mark,set_field:0xac->ct_label),table=2)
+table=2,in_port=1,ip,ct_mark=0xac,ct_label=0xac,actions=2
+])
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+NS_CHECK_EXEC([at_ns0], [wget http://172.1.1.2:8080 -t 5 -T 1 
--retry-connrefused -v -o wget0.log])
+
+dnl - make sure only dst nat has been performed
+AT_CHECK([ovs-appctl dpctl/dump-conntrack |  FORMAT_CT(10.1.1.240)], 

[ovs-dev] [PATCH v7 0/4] conntrack: add all-zero SNAT

2021-06-21 Thread Paolo Valerio
Two more patches has been introduced.

1/4 is a prereq for the series because it fixes an issue that prevents
OVN to use all-zero snat due to the way it builds the pipeline.
The patch 2/4 has been introduced in v6 as a prereq of 3/4.
All the versions up to v6 were about patch 3/4.

{2,3}/4 are untouched, so the acks are kept.

v7: fixed the multiple ct(commit,[dst|src]) issue spotted by Dumitru
and adds the ct_get_features callback for userspace datapath
advertising the availability of all-zero snat.
v6: moved CURSOR_JOIN from cmap.h to util.h and renamed it as OVS_JOIN
reworked a little FOR_EACH_PORT_IN_RANGE() so that two arguments have
been removed (one has been masked and the macro uses patch 1/2 to
generate a unique index name based on __COUNTER__).
INIT_ATT() has become N_PORT_ATTEMPTS().
Moved all macros from conntrack.h to conntrack-private.h
Changed ovs-actions.xml and system-userspace-macros.at according to the
new version of [1].
v5: added an entry to NEWS, updated ovs-actions.xml removing
the kernel only exception, improved the range handling in
case the packet source port is out of the ephemeral range
(for SNAT without port range and DNAT actions), expanded
some comment.
v4: no code changes, just restored some removed new line.
v3: replaced NULL with all-zero in the commit message.
v2: enabled NULL SNAT self-test also for userspace.

Note for the maintainers:
{3,4}/4 depend on:

https://patchwork.ozlabs.org/project/openvswitch/patch/162331699885.2208579.16546865084041166731.stgit@ebuild/

Paolo Valerio (4):
  conntrack: handle already natted packets
  util.h: add token concatenation macro with argument expansion
  conntrack: handle SNAT with all-zero IP address
  dpif-netdev: add all-zero SNAT to the advertised features of ct


 NEWS |   3 +
 lib/cmap.h   |   5 +-
 lib/conntrack-private.h  |  33 +++
 lib/conntrack.c  | 335 +++
 lib/dpif-netdev.c|  12 +-
 lib/ovs-actions.xml  |   3 +-
 lib/util.h   |   7 +
 tests/system-userspace-macros.at |   8 +-
 8 files changed, 270 insertions(+), 136 deletions(-)

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2 5/5] ovn-controller: Fix incremental processing for logical port references.

2021-06-21 Thread Dumitru Ceara
On 6/21/21 8:53 AM, Han Zhou wrote:
> is_cr_cond_present only checks if is_chassis_resident() exists (which would
> reference a logical port). lflow_ref_lookup() checks for all the logical
> port references (not just by is_chassis_resident()).
> For lflows that reference logical ports, we can cache the expr but not the
> match. I wondered if this could impact performance but from my earlier
> tests there was no impact, but maybe there are scenarios that may have
> performance impact but I didn't test.

Makes sense, thanks for the clarification!

> I also noticed that I forgot to remove the variable is_cr_cond_present
> which is not used any more. I removed it in v3.

I'll have a look at v3 this week.

Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v3 2/2] lflow-cache: Auto trim when cache size is reduced significantly.

2021-06-21 Thread Dumitru Ceara
On 6/20/21 9:49 AM, Mark Gray wrote:
>> Instead, we now automatically trim memory every time the lflow cache
>> utilization decreases by 50%, with a threshold of at least
>> lflow-cache-trim-limit (1) elements in the cache.
>>
>> The percentage of the high watermark under which memory is trimmed
>> automatically as well as the lflow-cache minimum number of elements
>> above which trimming is performed are configurable through OVS external
>> IDs.
>>
>> [0] https://sourceware.org/bugzilla/show_bug.cgi?id=14827
>>
>> Reported-at: https://bugzilla.redhat.com/1967882
>> Signed-off-by: Dumitru Ceara 
> There's a small typo below in the test code. Please *don't* update
> unless you need to do a v4 as it wouldnt be worth the effort otherwise.
> 
> Acked-by: Mark D. Gray 
> 

Thanks for the review!  I assume maintainers can fix the typo when the
patch is pushed if there are no other issues.

Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] ofproto-dpif: fix issue with non-reversible actions on a patch ports

2021-06-21 Thread Eelco Chaudron
For patch ports, the is_last_action value is not propagated and is
always set to true. This causes non-reversible actions to modify the
packet, and the original content is not preserved when processing
the remaining actions.

This patch propagates the is_last_action flag for patch port related
actions. In addition, it also fixes a general last action propagation
to the individual actions.

Fixed check_pkt_larger as last action, as it is a valid case for the
drop action, so it should not be skipped.

Fixes: feee58b95 ("ofproto-dpif-xlate: Keep track of the last action")
Fixes: 5b34f8fc3 ("Add a new OVS action check_pkt_larger")
Signed-off-by: Eelco Chaudron 

---
v2: Fixed additional last action propagation to individual actions.
Fixed problem with the check_pkt_larger now that that the correct
last action state is passed.

 ofproto/ofproto-dpif-xlate.c |   27 ---
 tests/ofproto-dpif.at|   28 
 2 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 7108c8a30..7b35f6a32 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -460,7 +460,7 @@ static void xlate_commit_actions(struct xlate_ctx *ctx);
 
 static void
 patch_port_output(struct xlate_ctx *ctx, const struct xport *in_dev,
-  struct xport *out_dev);
+  struct xport *out_dev, bool is_last_action);
 
 static void
 ctx_trigger_freeze(struct xlate_ctx *ctx)
@@ -3598,7 +3598,7 @@ propagate_tunnel_data_to_flow(struct xlate_ctx *ctx, 
struct eth_addr dmac,
 static int
 native_tunnel_output(struct xlate_ctx *ctx, const struct xport *xport,
  const struct flow *flow, odp_port_t tunnel_odp_port,
- bool truncate)
+ bool truncate, bool is_last_action)
 {
 struct netdev_tnl_build_header_params tnl_params;
 struct ovs_action_push_tnl tnl_push_data;
@@ -3728,7 +3728,7 @@ native_tunnel_output(struct xlate_ctx *ctx, const struct 
xport *xport,
 entry->tunnel_hdr.hdr_size = tnl_push_data.header_len;
 entry->tunnel_hdr.operation = ADD;
 
-patch_port_output(ctx, xport, out_dev);
+patch_port_output(ctx, xport, out_dev, is_last_action);
 
 /* Similar to the stats update in revalidation, the x_cache entries
  * are populated by the previous translation are used to update the
@@ -3822,7 +3822,7 @@ xlate_flow_is_protected(const struct xlate_ctx *ctx, 
const struct flow *flow, co
  */
 static void
 patch_port_output(struct xlate_ctx *ctx, const struct xport *in_dev,
-  struct xport *out_dev)
+  struct xport *out_dev, bool is_last_action)
 {
 struct flow *flow = >xin->flow;
 struct flow old_flow = ctx->xin->flow;
@@ -3864,8 +3864,9 @@ patch_port_output(struct xlate_ctx *ctx, const struct 
xport *in_dev,
 if (!process_special(ctx, out_dev) && may_receive(out_dev, ctx)) {
 if (xport_stp_forward_state(out_dev) &&
 xport_rstp_forward_state(out_dev)) {
+
 xlate_table_action(ctx, flow->in_port.ofp_port, 0, true, true,
-   false, true, clone_xlate_actions);
+   false, is_last_action, clone_xlate_actions);
 if (!ctx->freezing) {
 xlate_action_set(ctx);
 }
@@ -3880,7 +3881,7 @@ patch_port_output(struct xlate_ctx *ctx, const struct 
xport *in_dev,
 mirror_mask_t old_mirrors2 = ctx->mirrors;
 
 xlate_table_action(ctx, flow->in_port.ofp_port, 0, true, true,
-   false, true, clone_xlate_actions);
+   false, is_last_action, clone_xlate_actions);
 ctx->mirrors = old_mirrors2;
 ctx->base_flow = old_base_flow;
 ctx->odp_actions->size = old_size;
@@ -4107,7 +4108,7 @@ terminate_native_tunnel(struct xlate_ctx *ctx, struct 
flow *flow,
 static void
 compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,
 const struct xlate_bond_recirc *xr, bool check_stp,
-bool is_last_action OVS_UNUSED, bool truncate)
+bool is_last_action, bool truncate)
 {
 const struct xport *xport = get_ofp_port(ctx->xbridge, ofp_port);
 struct flow_wildcards *wc = ctx->wc;
@@ -4144,7 +4145,7 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t 
ofp_port,
if (truncate) {
xlate_report_error(ctx, "Cannot truncate output to patch port");
}
-   patch_port_output(ctx, xport, xport->peer);
+   patch_port_output(ctx, xport, xport->peer, is_last_action);
return;
 }
 
@@ -4239,7 +4240,8 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t 
ofp_port,
xr->recirc_id);
 } else if (is_native_tunnel) {
 /* Output to native tunnel port. */
-   

[ovs-dev] [PATCH] ofproto-dpif-xlate: Optimize select group performance.

2021-06-21 Thread Surya Rudra via dev
In a new OVS use case with a 3rd party SDN controller,
we observe two new typical patterns that are handled
sub-optimally:
1. Degenerate select groups with a single bucket.
For such select groups the overhead of computing a dp_hash
and recirculating the packet in the datapath to look up the
bucket is superfluous.

2. Select groups with 2 equally weighted buckets
The minimum lookup table size is set to 16 slots (4 bit hash)
even in this scenario where two slots would suffice (1 bit hash).
This implies an unnecessary 8-fold increase of needed
recirculation megaflow entries in the datapath. In select group
use cases where the number of megaflows is already high(i.e.10-100K)
an additional factor 8 can push the OVS megaflow cache over the edge.

Solution:
The patch contains two changes:
1. If the select group has only a single bucket, skip dp_hash
calculation an recirculation entirely and execute the single
bucket's actions directly.
2. Do not artificially increase the size of the dp_hash lookup
table to 16 slots if the bucket configuration allows an accurate
representation with 2, 4 or 8 slots. This minimizes the number of
recirculated megaflows.

Testing:
Updated existing unit testcases and added new testcase with one
bucket in select group.

Signed-off-by: Surya Rudra 
---
 ofproto/ofproto-dpif-xlate.c |  5 +
 ofproto/ofproto-dpif.c   |  3 +--
 tests/ofproto-dpif.at| 43 +++-
 3 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index a6f4ea334..5c41357f1 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -4612,6 +4612,11 @@ pick_hash_fields_select_group(struct xlate_ctx *ctx, 
struct group_dpif *group)
 static struct ofputil_bucket *
 pick_dp_hash_select_group(struct xlate_ctx *ctx, struct group_dpif *group)
 {
+/* Skip dp_hash recirculation in the special case of a single bucket */
+if (group->up.n_buckets == 1) {
+return group_first_live_bucket(ctx, group, 0);
+}
+
 uint32_t dp_hash = ctx->xin->flow.dp_hash;
 
 /* dp_hash value 0 is special since it means that the dp_hash has not been
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 47db9bb57..3bd2136d8 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -5080,8 +5080,7 @@ group_setup_dp_hash_table(struct group_dpif *group, 
size_t max_hash)
  min_weight, total_weight);
 
 uint64_t min_slots = DIV_ROUND_UP(total_weight, min_weight);
-uint64_t min_slots2 = ROUND_UP_POW2(min_slots);
-uint64_t n_hash = MAX(16, min_slots2);
+uint64_t n_hash = ROUND_UP_POW2(min_slots);
 if (n_hash > MAX_SELECT_GROUP_HASH_VALUES ||
 (max_hash != 0 && n_hash > max_hash)) {
 VLOG_DBG("  Too many hash values required: %"PRIu64, n_hash);
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 31064ed95..e41c44d57 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -607,10 +607,31 @@ AT_CHECK([tail -1 stdout], [0],
 OVS_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([ofproto-dpif - select group with one bucket in action list])
+OVS_VSWITCHD_START
+add_of_ports br0 1 10
+AT_CHECK([ovs-ofctl -O OpenFlow12 add-group br0 
'group_id=1234,type=select,bucket=weight:10,set_field:192.168.3.90->ip_src,output:10'])
+AT_CHECK([ovs-ofctl -O OpenFlow12 add-flow br0 'ip 
actions=group:1234,output:10'])
+AT_CHECK([ovs-appctl ofproto/trace br0 
'in_port=1,dl_src=50:54:00:00:00:05,dl_dst=50:54:00:00:00:07,dl_type=0x0800,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_proto=1,nw_tos=0,nw_ttl=128,icmp_type=8,icmp_code=0'],
 [0], [stdout])
+AT_CHECK([tail -10 stdout | sed 's/dp_hash=.*\/0xf/dp_hash=0x\/0xf/'], [0],
+  [group:1234
+ -> using bucket 0
+bucket 0
+set_field:192.168.3.90->ip_src
+output:10
+output:10
+
+Final flow: unchanged
+Megaflow: recirc_id=0,eth,ip,in_port=1,nw_src=192.168.0.1,nw_frag=no
+Datapath actions: set(ipv4(src=192.168.3.90)),10,set(ipv4(src=192.168.0.1)),10
+])
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([ofproto-dpif - group actions have no effect afterwards])
 OVS_VSWITCHD_START
 add_of_ports br0 1 10
-AT_CHECK([ovs-ofctl -O OpenFlow12 add-group br0 
'group_id=1234,type=select,bucket=set_field:192.168.3.90->ip_src,output:10'])
+AT_CHECK([ovs-ofctl -O OpenFlow12 add-group br0 
'group_id=1234,type=select,bucket=weight:10,set_field:192.168.3.90->ip_src,output:10,bucket=set_field:192.168.3.90->ip_src,output:10'])
 AT_CHECK([ovs-ofctl -O OpenFlow12 add-flow br0 'ip 
actions=group:1234,output:10'])
 
 for d in 0 1 2 3; do
@@ -690,10 +711,10 @@ AT_CHECK([grep -A6 "Constructing select group 1234" 
ovs-vswitchd.log | sed 's/^.
 ofproto_dpif|DBG|Constructing select group 1234
 ofproto_dpif|DBG|No selection method specified. Trying dp_hash.
 ofproto_dpif|DBG|  Minimum weight: 1, total weight: 2
-ofproto_dpif|DBG|  Using 16 hash values:
-ofproto_dpif|DBG|  Bucket 0: weight=1, 

Re: [ovs-dev] [PATCH v4 1/2] Encap & Decap actions for MPLS packet type.

2021-06-21 Thread Eelco Chaudron


On 19 Jun 2021, at 8:06, Martin Varghese wrote:

> On Thu, Apr 08, 2021 at 03:31:24PM +0200, Eelco Chaudron wrote:
>>
>>
>> On 8 Apr 2021, at 14:05, Martin Varghese wrote:
>>
>>> On Wed, Apr 07, 2021 at 03:49:07PM +, Jan Scheurich wrote:
 Hi Martin,

 I guess you are aware of the original design document we wrote for
 generic encap/decap and NSH support:
 https://docs.google.com/document/d/1oWMYUH8sjZJzWa72o2q9kU0N6pNE-rwZcLH3-kbbDR8/edit#

 It is no longer 100% aligned with the final implementation in OVS
 but still a good reference for understanding the design principles
 behind the implementation and some specifics for Ethernet and NSH
 encap/decap use cases.

 Please find some more answers/comments below.

 BR, Jan

> -Original Message-
> From: Martin Varghese 
> Sent: Wednesday, 7 April, 2021 10:43
> To: Jan Scheurich 
> Cc: Eelco Chaudron ; d...@openvswitch.org;
> pshe...@ovn.org; martin.vargh...@nokia.com
> Subject: Re: [PATCH v4 1/2] Encap & Decap actions for MPLS
> packet type.
>
> On Tue, Apr 06, 2021 at 09:00:16AM +, Jan Scheurich wrote:
>> Hi,
>>
>> Thanks for the heads up. The interaction with MPLS push/pop
>> is a use case
> that was likely not tested during the NSH and generic
> encap/decap design. It's
> complex code and a long time ago. I'm willing to help, but I
> will need some
> time to go back and have a look.
>>
>> It would definitely help, if you could provide a minimal
>> example for
> reproducing the problem.
>>
>
> Hi Jan ,
>
> Thanks for your help.
>
> I was trying to implement ENCAP/DECAP support for MPLS.
>
> The programming of datapath flow for the below  userspace rule
> fails as there
> is set(eth() action between pop_mpls and recirc ovs-ofctl -O
> OpenFlow13 add-
> flow br_mpls2 "in_port=$egress_port,dl_type=0x8847
> actions=decap(),decap(packet_type(ns=0,type=0),goto_table:1
>
> 2021-04-05T05:46:49.192Z|00068|dpif(handler51)|WARN|system@ovs-
> system: failed to put[create] (Invalid argument)
> ufid:1dddb0ba-27fe-44ea-
> 9a99-5815764b4b9c
> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(6),skb_mark(0/0),ct_state
> (0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:01/00:
> 00:00:00:00:00,dst=00:00:00:00:00:02/00:00:00:00:00:00),eth_type(0x8847)
> ,mpls(label=2/0x0,tc=0/0,ttl=64/0x0,bos=1/1),
> actions:pop_eth,pop_mpls(eth_type=0x6558),set(eth()),recirc(0x45)
>

 Conceptually, what should happen in this scenario is that, after the
 second decap(packet_type(ns=0,type=0) action, OVS processes the
 unchanged inner packet as packet type PT_ETH, i.e. as L2 Ethernet
 frame. Overwriting the existing Ethernet header with zero values
 through set(eth()) is clearly incorrect. That is a logical error
 inside the ofproto-dpif-xlate module (see below).

 I believe the netdev userspace datapath would still have accepted
 the incorrect datapath flow. I have too little experience with the
 kernel datapath to explain why that rejects the datapath flow as
 invalid.

 Unlike in the Ethernet and NSH cases, the MPLS header does not
 contain any indication about the inner packet type. That is why the
 packet_type must be provided by the SDN controller as part of the
 decap() action.  And the ofproto-dpif-xlate module must consider the
 specified inner packet type when continuing the translation. In the
 general case, a decap() action should trigger recirculation for
 reparsing of the inner packet, so the new packet type must be set
 before recirculation. (Exceptions to the general recirculation rule
 are those where OVS has already parsed further into the packet and
 ofproto can modify the flow on the fly: decap(Ethernet) and possibly
 decap(MPLS) for all but the last bottom of stack label).

 I have had a look at your new code for encap/decap of MPLS headers,
 but I must admit I cannot fully judge in how far re-using the
 existing translation functions for MPLS label stacks written for the
 legacy push/pop_mpls case (i.e. manipulating a label stack between
 the L2 and the L3 headers of a PT_ETH Packet) are possible to re-use
 in the new context.

 BTW: Do you support multiple MPLS label encap or decap actions with
 your patch? Have you tested that?

>>> Yes, I will add those tests too.
>>
>> Maybe you could add some tests the same way NSH does, by sending a packet
>> and verifying the modified content. The current test does encap than decap,
>> so if both go wrong, or are skipped we are not actually testing anything.
>>
 I am uncertain about the handling of the ethertype of the
 decapsulated inner packet. In the design base, the ethertype that is
 set in the 

Re: [ovs-dev] [PATCH ovn v2 5/5] ovn-controller: Fix incremental processing for logical port references.

2021-06-21 Thread Han Zhou
On Fri, Jun 18, 2021 at 8:52 AM Dumitru Ceara  wrote:
>
> On 6/11/21 9:35 PM, Han Zhou wrote:
> > If a lflow has an lport name in the match, but when the lflow is
> > processed the port-binding is not seen by ovn-controller, the
> > corresponding openflow will not be created. Later if the port-binding is
> > created/monitored by ovn-controller, the lflow is not reprocessed
> > because the lflow didn't change and ovn-controller doesn't know that the
> > port-binding affects the lflow. This patch fixes the problem by tracking
> > the references when parsing the lflow, even if the port-binding is not
> > found when the lflow is firstly parsed. A test case is also added to
> > cover the scenario.
> >
> > Signed-off-by: Han Zhou 
> > ---
> >  controller/lflow.c  | 64 +++--
> >  controller/lflow.h  |  3 ++
> >  controller/ovn-controller.c | 17 --
> >  tests/ovn.at| 47 +++
> >  4 files changed, 113 insertions(+), 18 deletions(-)
> >
> > diff --git a/controller/lflow.c b/controller/lflow.c
> > index 34eca135a..7ae0ed15e 100644
> > --- a/controller/lflow.c
> > +++ b/controller/lflow.c
> > @@ -61,6 +61,7 @@ struct lookup_port_aux {
> >
> >  struct condition_aux {
> >  struct ovsdb_idl_index *sbrec_port_binding_by_name;
> > +const struct sbrec_datapath_binding *dp;
> >  const struct sbrec_chassis *chassis;
> >  const struct sset *active_tunnels;
> >  const struct sbrec_logical_flow *lflow;
> > @@ -98,6 +99,12 @@ lookup_port_cb(const void *aux_, const char
*port_name, unsigned int *portp)
> >
> >  const struct lookup_port_aux *aux = aux_;
> >
> > +/* Store the name that used to lookup the lport to lflow
reference, so that
> > + * in the future when the lport's port binding changes, the
logical flow
> > + * that references this lport can be reprocessed. */
> > +lflow_resource_add(aux->lfrr, REF_TYPE_PORTBINDING, port_name,
> > +   >lflow->header_.uuid);
> > +
> >  const struct sbrec_port_binding *pb
> >  = lport_lookup_by_name(aux->sbrec_port_binding_by_name,
port_name);
> >  if (pb && pb->datapath == aux->dp) {
> > @@ -149,19 +156,18 @@ is_chassis_resident_cb(const void *c_aux_, const
char *port_name)
> >  {
> >  const struct condition_aux *c_aux = c_aux_;
> >
> > +/* Store the port name that used to lookup the lport to lflow
reference, so
> > + * that in the future when the lport's port-binding changes the
logical
> > + * flow that references this lport can be reprocessed. */
> > +lflow_resource_add(c_aux->lfrr, REF_TYPE_PORTBINDING, port_name,
> > +   _aux->lflow->header_.uuid);
> > +
> >  const struct sbrec_port_binding *pb
> >  = lport_lookup_by_name(c_aux->sbrec_port_binding_by_name,
port_name);
> >  if (!pb) {
> >  return false;
> >  }
> >
> > -/* Store the port_name to lflow reference. */
> > -int64_t dp_id = pb->datapath->tunnel_key;
> > -char buf[16];
> > -get_unique_lport_key(dp_id, pb->tunnel_key, buf, sizeof(buf));
> > -lflow_resource_add(c_aux->lfrr, REF_TYPE_PORTBINDING, buf,
> > -   _aux->lflow->header_.uuid);
> > -
> >  if (strcmp(pb->type, "chassisredirect")) {
> >  /* for non-chassisredirect ports */
> >  return pb->chassis && pb->chassis == c_aux->chassis;
> > @@ -623,8 +629,6 @@ add_matches_to_flow_table(const struct
sbrec_logical_flow *lflow,
> >  int64_t dp_id = dp->tunnel_key;
> >  char buf[16];
> >  get_unique_lport_key(dp_id, port_id, buf, sizeof(buf));
> > -lflow_resource_add(l_ctx_out->lfrr,
REF_TYPE_PORTBINDING, buf,
> > -   >header_.uuid);
> >  if (!sset_contains(l_ctx_in->local_lport_ids, buf)) {
> >  VLOG_DBG("lflow "UUID_FMT
> >   " port %s in match is not local, skip",
> > @@ -788,6 +792,7 @@ consider_logical_flow__(const struct
sbrec_logical_flow *lflow,
> >  };
> >  struct condition_aux cond_aux = {
> >  .sbrec_port_binding_by_name =
l_ctx_in->sbrec_port_binding_by_name,
> > +.dp = dp,
> >  .chassis = l_ctx_in->chassis,
> >  .active_tunnels = l_ctx_in->active_tunnels,
> >  .lflow = lflow,
> > @@ -883,7 +888,9 @@ consider_logical_flow__(const struct
sbrec_logical_flow *lflow,
> >
> >  /* Cache new entry if caching is enabled. */
> >  if (lflow_cache_is_enabled(l_ctx_out->lflow_cache)) {
> > -if (cached_expr && !is_cr_cond_present) {
> > +if (cached_expr
> > +&& !lflow_ref_lookup(_ctx_out->lfrr->lflow_ref_table,
> > + >header_.uuid)) {
>
> Why is "!is_cr_cond_present" not OK here?  It seems to me that the
> result is the same except that lflow_ref_lookup() will do more work.

is_cr_cond_present 

Re: [ovs-dev] [PATCH ovn v2 3/5] ovn-northd: Remove lflow_add_unique.

2021-06-21 Thread Han Zhou
On Fri, Jun 18, 2021 at 12:12 PM Dumitru Ceara  wrote:
>
> On 6/18/21 9:10 PM, Han Zhou wrote:
> > On Fri, Jun 18, 2021 at 11:57 AM Dumitru Ceara 
wrote:
> >>
> >> On 6/18/21 5:51 PM, Dumitru Ceara wrote:
> >>> On 6/11/21 9:35 PM, Han Zhou wrote:
>  This patch removes the workaround when adding multicast group related
>  lflows, because the multicast group dependency problem is fixed in
>  ovn-controller in the previous commit.
> 
>  This patch also removes the UniqueFlow/AnnotatedFlow usage in northd
>  DDlog implementation for the same reason.
> 
>  Signed-off-by: Han Zhou 
>  ---
>   northd/ovn-northd.c  |  89 ++---
>   northd/ovn_northd.dl | 233
+++
>   tests/ovn-northd.at  |   2 +-
>   3 files changed, 137 insertions(+), 187 deletions(-)
> 
>  diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
>  index 005c1fc86..411b14adf 100644
>  --- a/northd/ovn-northd.c
>  +++ b/northd/ovn-northd.c
>  @@ -3663,9 +3663,6 @@ build_ports(struct northd_context *ctx,
>   sset_destroy(_ha_chassis_grps);
>   }
> >>>
> >>> [...]
> >>>
>  @@ -6447,9 +6425,8 @@
> > build_lswitch_rport_arp_req_self_orig_flow(struct ovn_port *op,
> 
>   ds_put_format(, "eth.src == %s && (arp.op == 1 || nd_ns)",
> ds_cstr(_src));
>  -ovn_lflow_add_unique(lflows, od, S_SWITCH_IN_L2_LKUP, priority,
>  - ds_cstr(),
>  - "outport = \""MC_FLOOD_L2"\"; output;");
>  +ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, priority,
> > ds_cstr(),
>  +  "outport = \""MC_FLOOD_L2"\"; output;");
> 
>   sset_destroy(_eth_addrs);
>   ds_destroy(_src);
>  @@ -6502,7 +6479,7 @@ build_lswitch_rport_arp_req_flow_for_ip(struct
> > sset *ips,
>   ds_put_format(, "clone {outport = %s; output; }; "
>   "outport = \""MC_FLOOD_L2"\";
> > output;",
> patch_op->json_key);
>  -ovn_lflow_add_unique_with_hint(lflows, od,
> > S_SWITCH_IN_L2_LKUP,
>  +ovn_lflow_add_with_hint(lflows, od, S_SWITCH_IN_L2_LKUP,
>  priority, ds_cstr(),
>  ds_cstr(),
stage_hint);
> >>>
> >>> Nit: indentation.
> >>>
> >>> Otherwise:
> >>>
> >>> Acked-by: Dumitru Ceara 
> >>>
> >>
> >> Actually, I'm not so sure about the ack anymore, with this applied some
> >> ovn-northd-ddlog tests fail, including:
> >>
> >> 772: ovn -- logical gatapath groups -- ovn-northd-ddlog -- dp-groups=no
> >> FAILED (ovn-macros.at:413)
> >>
> >> Regards,
> >> Dumitru
> >>
> > Thanks Dumitru for the review and test. All the test cases have passed
on
> > the earlier master branch, but after Numan's I-P patches for split
> > logical/physical flow_output I think I need to rebase this patch
series. I
> > didn't get time to do that yet, but I will do it in the next couple of
days
> > and re-test.
> >
>
> Looks like you need the following incremental for northd-ddlog.
>
> Regards,
> Dumitru
>
> diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
> index 1323b935e..fb3840ac5 100644
> --- a/northd/ovn_northd.dl
> +++ b/northd/ovn_northd.dl
> @@ -1625,6 +1625,12 @@ UseLogicalDatapathGroups[false] :-
>  Unit(),
>  not nb in nb::NB_Global().
>
> +relation AnnotatedFlow(f: Flow, shared: bool)
> +AnnotatedFlow(f, b) :- UseLogicalDatapathGroups[b], Flow[f].
> +
> +relation UniqueFlow[Flow]
> +AnnotatedFlow(f, false) :- UniqueFlow[f].
> +
>  relation AggregatedFlow (
>  logical_datapaths: Set,
>  stage: Stage,
> @@ -1639,8 +1645,15 @@ AggregatedFlow(.logical_datapaths = g.to_set(),
> .__match = __match,
> .actions = actions,
> .external_ids = external_ids) :-
> -Flow(logical_datapath, stage, priority, __match, actions,
external_ids),
> +AnnotatedFlow(Flow{logical_datapath, stage, priority, __match,
actions, external_ids}, true),
>  var g = logical_datapath.group_by((stage, priority, __match,
actions, external_ids)).
> +AggregatedFlow(.logical_datapaths = set_singleton(logical_datapath),
> +   .stage = stage,
> +   .priority = priority,
> +   .__match = __match,
> +   .actions = actions,
> +   .external_ids = external_ids) :-
> +AnnotatedFlow(Flow{logical_datapath, stage, priority, __match,
actions, external_ids}, false).
>
>  for (f in AggregatedFlow()) {
>  var pipeline = if (f.stage.pipeline == Ingress) "ingress" else
"egress" in
>

Thanks Dumitru. I see the problem. I must have missed rerunning the ddlog
test after this v2 patch and for whatever reason I believed that I ran it
successfully, really embarrassing.

So in v2 when removing the AnnotatedFlow I made a mistake and left
UseLogicalDatapathGroups 

[ovs-dev] [PATCH ovn v3 1/3] ovn-northd: Remove lflow_add_unique.

2021-06-21 Thread Han Zhou
This patch removes the workaround when adding multicast group related
lflows, because the multicast group dependency problem is fixed in
ovn-controller in the previous commit.

This patch also removes the UniqueFlow/AnnotatedFlow usage in northd
DDlog implementation for the same reason.

Signed-off-by: Han Zhou 
---
 northd/ovn-northd.c  |  93 +++--
 northd/ovn_northd.dl | 232 ---
 tests/ovn-northd.at  |   2 +-
 3 files changed, 143 insertions(+), 184 deletions(-)

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 75484c5cd..ae799cda9 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -3688,9 +3688,6 @@ build_ports(struct northd_context *ctx,
 sset_destroy(_ha_chassis_grps);
 }
 
-/* XXX: The 'ovn_lflow_add_unique*()' functions should be used for logical
- *  flows using a multicast group.
- *  See the comment on 'ovn_lflow_add_unique()' for details. */
 struct multicast_group {
 const char *name;
 uint16_t key;   /* OVN_MIN_MULTICAST...OVN_MAX_MULTICAST. */
@@ -4107,7 +4104,7 @@ static struct hashrow_locks lflow_locks;
  * Version to use when locking is required.
  */
 static void
-do_ovn_lflow_add(struct hmap *lflow_map, bool shared, struct ovn_datapath *od,
+do_ovn_lflow_add(struct hmap *lflow_map, struct ovn_datapath *od,
  uint32_t hash, enum ovn_stage stage, uint16_t priority,
  const char *match, const char *actions,
  const struct ovsdb_idl_row *stage_hint,
@@ -4117,7 +4114,7 @@ do_ovn_lflow_add(struct hmap *lflow_map, bool shared, 
struct ovn_datapath *od,
 struct ovn_lflow *old_lflow;
 struct ovn_lflow *lflow;
 
-if (shared && use_logical_dp_groups) {
+if (use_logical_dp_groups) {
 old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority, match,
actions, hash);
 if (old_lflow) {
@@ -4141,7 +4138,7 @@ do_ovn_lflow_add(struct hmap *lflow_map, bool shared, 
struct ovn_datapath *od,
 static void
 ovn_lflow_add_at(struct hmap *lflow_map, struct ovn_datapath *od,
  enum ovn_stage stage, uint16_t priority,
- const char *match, const char *actions, bool shared,
+ const char *match, const char *actions,
  const struct ovsdb_idl_row *stage_hint, const char *where)
 {
 ovs_assert(ovn_stage_to_datapath_type(stage) == ovn_datapath_get_type(od));
@@ -4155,11 +4152,11 @@ ovn_lflow_add_at(struct hmap *lflow_map, struct 
ovn_datapath *od,
 
 if (use_logical_dp_groups && use_parallel_build) {
 lock_hash_row(_locks, hash);
-do_ovn_lflow_add(lflow_map, shared, od, hash, stage, priority, match,
+do_ovn_lflow_add(lflow_map, od, hash, stage, priority, match,
  actions, stage_hint, where);
 unlock_hash_row(_locks, hash);
 } else {
-do_ovn_lflow_add(lflow_map, shared, od, hash, stage, priority, match,
+do_ovn_lflow_add(lflow_map, od, hash, stage, priority, match,
  actions, stage_hint, where);
 }
 }
@@ -4167,30 +4164,11 @@ ovn_lflow_add_at(struct hmap *lflow_map, struct 
ovn_datapath *od,
 /* Adds a row with the specified contents to the Logical_Flow table. */
 #define ovn_lflow_add_with_hint(LFLOW_MAP, OD, STAGE, PRIORITY, MATCH, \
 ACTIONS, STAGE_HINT) \
-ovn_lflow_add_at(LFLOW_MAP, OD, STAGE, PRIORITY, MATCH, ACTIONS, true, \
+ovn_lflow_add_at(LFLOW_MAP, OD, STAGE, PRIORITY, MATCH, ACTIONS, \
  STAGE_HINT, OVS_SOURCE_LOCATOR)
 
 #define ovn_lflow_add(LFLOW_MAP, OD, STAGE, PRIORITY, MATCH, ACTIONS) \
-ovn_lflow_add_at(LFLOW_MAP, OD, STAGE, PRIORITY, MATCH, ACTIONS, true, \
- NULL, OVS_SOURCE_LOCATOR)
-
-/* Adds a row with the specified contents to the Logical_Flow table.
- * Combining of this logical flow with already existing ones, e.g., by using
- * Logical Datapath Groups, is forbidden.
- *
- * XXX: ovn-controller assumes that a logical flow using multicast group always
- *  comes after or in the same database update with the corresponding
- *  multicast group.  That will not be the case with datapath groups.
- *  For this reason, the 'ovn_lflow_add_unique*()' functions should be used
- *  for such logical flows.
- */
-#define ovn_lflow_add_unique_with_hint(LFLOW_MAP, OD, STAGE, PRIORITY, MATCH, \
-   ACTIONS, STAGE_HINT) \
-ovn_lflow_add_at(LFLOW_MAP, OD, STAGE, PRIORITY, MATCH, ACTIONS, false, \
- STAGE_HINT, OVS_SOURCE_LOCATOR)
-
-#define ovn_lflow_add_unique(LFLOW_MAP, OD, STAGE, PRIORITY, MATCH, ACTIONS) \
-ovn_lflow_add_at(LFLOW_MAP, OD, STAGE, PRIORITY, MATCH, ACTIONS, false, \
+ovn_lflow_add_at(LFLOW_MAP, OD, STAGE, PRIORITY, MATCH, ACTIONS, \
  NULL, OVS_SOURCE_LOCATOR)
 
 static struct ovn_lflow *
@@ -6461,9 +6439,8 @@ 

[ovs-dev] [PATCH ovn v3 3/3] ovn-controller: Fix incremental processing for logical port references.

2021-06-21 Thread Han Zhou
If a lflow has an lport name in the match, but when the lflow is
processed the port-binding is not seen by ovn-controller, the
corresponding openflow will not be created. Later if the port-binding is
created/monitored by ovn-controller, the lflow is not reprocessed
because the lflow didn't change and ovn-controller doesn't know that the
port-binding affects the lflow. This patch fixes the problem by tracking
the references when parsing the lflow, even if the port-binding is not
found when the lflow is firstly parsed. A test case is also added to
cover the scenario.

Signed-off-by: Han Zhou 
---
 controller/lflow.c  | 63 ++---
 controller/lflow.h  |  3 ++
 controller/ovn-controller.c | 35 -
 include/ovn/expr.h  |  2 +-
 lib/expr.c  | 14 +++--
 tests/ovn.at| 47 +++
 tests/test-ovn.c|  4 +--
 utilities/ovn-trace.c   |  2 +-
 8 files changed, 132 insertions(+), 38 deletions(-)

diff --git a/controller/lflow.c b/controller/lflow.c
index 34eca135a..b7699a309 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -61,6 +61,7 @@ struct lookup_port_aux {
 
 struct condition_aux {
 struct ovsdb_idl_index *sbrec_port_binding_by_name;
+const struct sbrec_datapath_binding *dp;
 const struct sbrec_chassis *chassis;
 const struct sset *active_tunnels;
 const struct sbrec_logical_flow *lflow;
@@ -98,6 +99,12 @@ lookup_port_cb(const void *aux_, const char *port_name, 
unsigned int *portp)
 
 const struct lookup_port_aux *aux = aux_;
 
+/* Store the name that used to lookup the lport to lflow reference, so that
+ * in the future when the lport's port binding changes, the logical flow
+ * that references this lport can be reprocessed. */
+lflow_resource_add(aux->lfrr, REF_TYPE_PORTBINDING, port_name,
+   >lflow->header_.uuid);
+
 const struct sbrec_port_binding *pb
 = lport_lookup_by_name(aux->sbrec_port_binding_by_name, port_name);
 if (pb && pb->datapath == aux->dp) {
@@ -149,19 +156,18 @@ is_chassis_resident_cb(const void *c_aux_, const char 
*port_name)
 {
 const struct condition_aux *c_aux = c_aux_;
 
+/* Store the port name that used to lookup the lport to lflow reference, so
+ * that in the future when the lport's port-binding changes the logical
+ * flow that references this lport can be reprocessed. */
+lflow_resource_add(c_aux->lfrr, REF_TYPE_PORTBINDING, port_name,
+   _aux->lflow->header_.uuid);
+
 const struct sbrec_port_binding *pb
 = lport_lookup_by_name(c_aux->sbrec_port_binding_by_name, port_name);
 if (!pb) {
 return false;
 }
 
-/* Store the port_name to lflow reference. */
-int64_t dp_id = pb->datapath->tunnel_key;
-char buf[16];
-get_unique_lport_key(dp_id, pb->tunnel_key, buf, sizeof(buf));
-lflow_resource_add(c_aux->lfrr, REF_TYPE_PORTBINDING, buf,
-   _aux->lflow->header_.uuid);
-
 if (strcmp(pb->type, "chassisredirect")) {
 /* for non-chassisredirect ports */
 return pb->chassis && pb->chassis == c_aux->chassis;
@@ -623,8 +629,6 @@ add_matches_to_flow_table(const struct sbrec_logical_flow 
*lflow,
 int64_t dp_id = dp->tunnel_key;
 char buf[16];
 get_unique_lport_key(dp_id, port_id, buf, sizeof(buf));
-lflow_resource_add(l_ctx_out->lfrr, REF_TYPE_PORTBINDING, buf,
-   >header_.uuid);
 if (!sset_contains(l_ctx_in->local_lport_ids, buf)) {
 VLOG_DBG("lflow "UUID_FMT
  " port %s in match is not local, skip",
@@ -788,6 +792,7 @@ consider_logical_flow__(const struct sbrec_logical_flow 
*lflow,
 };
 struct condition_aux cond_aux = {
 .sbrec_port_binding_by_name = l_ctx_in->sbrec_port_binding_by_name,
+.dp = dp,
 .chassis = l_ctx_in->chassis,
 .active_tunnels = l_ctx_in->active_tunnels,
 .lflow = lflow,
@@ -805,7 +810,6 @@ consider_logical_flow__(const struct sbrec_logical_flow 
*lflow,
 struct hmap *matches = NULL;
 size_t matches_size = 0;
 
-bool is_cr_cond_present = false;
 bool pg_addr_set_ref = false;
 uint32_t n_conjs = 0;
 
@@ -843,8 +847,8 @@ consider_logical_flow__(const struct sbrec_logical_flow 
*lflow,
 case LCACHE_T_NONE:
 case LCACHE_T_CONJ_ID:
 case LCACHE_T_EXPR:
-expr = expr_evaluate_condition(expr, is_chassis_resident_cb, _aux,
-   _cr_cond_present);
+expr = expr_evaluate_condition(expr, is_chassis_resident_cb,
+   _aux);
 expr = expr_normalize(expr);
 break;
 case LCACHE_T_MATCHES:
@@ -883,7 +887,9 @@ consider_logical_flow__(const struct sbrec_logical_flow 
*lflow,
 
 /* Cache new 

[ovs-dev] [PATCH ovn v3 0/3] Fix ovn-controller I-P for multicast groups and lport changes

2021-06-21 Thread Han Zhou
The series fixes incremental processing for missing dependency handling for
multicast group and logical port binding changes when computing logical flows.
It also removes the workaround in northd that was required due to the missing
dependency handling. In addition, the fix also allows us to monitor all DPGs as
an optimization, so it is also included in the series.

v1 -> v2: Addresses comments from Dumitru.
v2 -> v3: Merged patch 1/5 and 1/5 of v2, revised the patch 3/5 (now 1/3) and
  5/5 (now 3/3) addressing Dumitru's comments.
- patch 1/3: fixes northd-ddlog when DP groups is disabled.
- patch 3/3: removes the unused variable/parameter
  is_cr_cond_present, and simplify a function returning.

Han Zhou (3):
  ovn-northd: Remove lflow_add_unique.
  ovn-controller: Always monitor all logical datapath groups.
  ovn-controller: Fix incremental processing for logical port
references.

 controller/lflow.c  |  64 +++---
 controller/lflow.h  |   3 +
 controller/ovn-controller.c |  44 +--
 northd/ovn-northd.c |  93 ++-
 northd/ovn_northd.dl| 232 +---
 tests/ovn-northd.at |   2 +-
 tests/ovn.at|  47 
 7 files changed, 275 insertions(+), 210 deletions(-)

-- 
2.30.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn v3 2/3] ovn-controller: Always monitor all logical datapath groups.

2021-06-21 Thread Han Zhou
Always monitor all logical datapath groups. Otherwise, DPG updates may
be received *after* the lflows using it are seen by ovn-controller.
Since the number of DPGs are relatively small, we monitor all DPGs to
avoid the unnecessarily extra control plane round trip for the lflows to
be processed.

Acked-by: Dumitru Ceara 
Signed-off-by: Han Zhou 
---
 controller/ovn-controller.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 1cfe4b713..c28fd6f2d 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -192,10 +192,15 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
 struct ovsdb_idl_condition igmp = OVSDB_IDL_CONDITION_INIT();
 struct ovsdb_idl_condition chprv = OVSDB_IDL_CONDITION_INIT();
 
+/* Always monitor all logical datapath groups. Otherwise, DPG updates may
+ * be received *after* the lflows using it are seen by ovn-controller.
+ * Since the number of DPGs are relatively small, we monitor all DPGs to
+ * avoid the unnecessarily extra wake-ups of ovn-controller. */
+ovsdb_idl_condition_add_clause_true();
+
 if (monitor_all) {
 ovsdb_idl_condition_add_clause_true();
 ovsdb_idl_condition_add_clause_true();
-ovsdb_idl_condition_add_clause_true();
 ovsdb_idl_condition_add_clause_true();
 ovsdb_idl_condition_add_clause_true();
 ovsdb_idl_condition_add_clause_true();
@@ -259,8 +264,6 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
 sbrec_port_binding_add_clause_datapath(, OVSDB_F_EQ, uuid);
 sbrec_logical_flow_add_clause_logical_datapath(, OVSDB_F_EQ,
uuid);
-sbrec_logical_dp_group_add_clause_datapaths(
-, OVSDB_F_INCLUDES, , 1);
 sbrec_mac_binding_add_clause_datapath(, OVSDB_F_EQ, uuid);
 sbrec_multicast_group_add_clause_datapath(, OVSDB_F_EQ, uuid);
 sbrec_dns_add_clause_datapaths(, OVSDB_F_INCLUDES, , 1);
-- 
2.30.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev