[ovs-dev] [BUG][revalidator] ovs crash and could NOT fix again after set request_mtu

2024-05-30 Thread Simon Jones
Hi all,

I'm using ovs-dpdk(ovs:2.17.1, dpdk:21.11.1).
Now I found a BUG that ovs crash and could NOT fix again after set
request_mtu.

1. How to reproduce and my Analysis:
```
# start ovs and add bridge and port and openflow

[root@bogon ~]# ovs-vsctl show
0444869c-dc4d-462f-8caf-074ecbab1a55
Bridge br-int
datapath_type: netdev
Port p0
Interface p0
type: dpdk
options: {dpdk-devargs=":c1:00.0"}
Port br-int
Interface br-int
type: internal
Bridge br-phy
datapath_type: netdev
Port pf1vf0
Interface pf1vf0
type: dpdk
options: {dpdk-devargs=":c1:00.1,representor=[0]"}
Port pf1vf1
Interface pf1vf1
type: dpdk
options: {dpdk-devargs=":c1:00.1,representor=[1]"}
Port br-phy
Interface br-phy
type: internal
Port pf1vf3
Interface pf1vf3
type: dpdk
options: {dpdk-devargs=":c1:00.1,representor=[3]"}
Port pf1vf2
Interface pf1vf2
type: dpdk
options: {dpdk-devargs=":c1:00.1,representor=[2]"}
ovs_version: "2.17.2"

[root@bogon ~]# ovs-ofctl dump-flows br-int
 cookie=0x0, duration=60216.364s, table=0, n_packets=16923639262,
n_bytes=984712027272, priority=0 actions=NORMAL

 865084 root  10 -10  522.9g   1.6g  42808 S  17.3   0.6 175:48.23
revalidator53
 865123 root  10 -10  522.9g   1.6g  42808 S  17.3   0.6 175:00.43
revalidator92
 865158 root  10 -10  522.9g   1.6g  42808 S  17.3   0.6 175:58.49
revalidator127
 865171 root  10 -10  522.9g   1.6g  42808 S  17.3   0.6 176:29.69
revalidator140
 865058 root  10 -10  522.9g   1.6g  42808 S  16.9   0.6 176:58.03
revalidator27
 865091 root  10 -10  522.9g   1.6g  42808 S  16.9   0.6 175:41.81
revalidator60
 865111 root  10 -10  522.9g   1.6g  42808 S  16.9   0.6 176:05.97
revalidator80
 865113 root  10 -10  522.9g   1.6g  42808 S  16.9   0.6 177:09.64
revalidator82
 865130 root  10 -10  522.9g   1.6g  42808 S  16.9   0.6 176:16.27
revalidator99
 865155 root  10 -10  522.9g   1.6g  42808 S  16.9   0.6 176:11.22
revalidator124
 865097 root  10 -10  522.9g   1.6g  42808 S  16.6   0.6 177:00.22
revalidator66
 865110 root  10 -10  522.9g   1.6g  42808 S  16.6   0.6 175:16.52
revalidator79
 865149 root  10 -10  522.9g   1.6g  42808 S  16.6   0.6 176:00.84
revalidator118
 865151 root  10 -10  522.9g   1.6g  42808 S  16.6   0.6 176:29.06
revalidator120
 865057 root  10 -10  522.9g   1.6g  42808 S  16.3   0.6 178:03.60
revalidator26
 865070 root  10 -10  522.9g   1.6g  42808 S  16.3   0.6 176:17.63
revalidator39
 865112 root  10 -10  522.9g   1.6g  42808 S  16.3   0.6 175:35.65
revalidator81
 865083 root  10 -10  522.9g   1.6g  42808 S  15.9   0.6 176:21.53
revalidator52
 865124 root  10 -10  522.9g   1.6g  42808 S  15.9   0.6 175:31.27
revalidator93
 865127 root  10 -10  522.9g   1.6g  42808 S  15.9   0.6 176:59.65
revalidator96
 865147 root  10 -10  522.9g   1.6g  42808 S  15.9   0.6 176:51.85
revalidator116
 865164 root  10 -10  522.9g   1.6g  42808 S  15.9   0.6 177:34.16
revalidator133
 865051 root  10 -10  522.9g   1.6g  42808 S  15.6   0.6 175:27.68
revalidator20
 865066 root  10 -10  522.9g   1.6g  42808 S  15.6   0.6 175:54.05
revalidator35
 865087 root  10 -10  522.9g   1.6g  42808 S  15.6   0.6 175:38.54
revalidator56
 865100 root  10 -10  522.9g   1.6g  42808 S  15.6   0.6 177:12.42
revalidator69
 865118 root  10 -10  522.9g   1.6g  42808 S  15.6   0.6 176:02.57
revalidator87
 865121 root  10 -10  522.9g   1.6g  42808 S  15.6   0.6 176:06.20
revalidator90
 865132 root  10 -10  522.9g   1.6g  42808 S  15.6   0.6 177:24.71
revalidator101
 865148 root  10 -10  522.9g   1.6g  42808 S  15.6   0.6 179:07.53
revalidator117
 865162 root  10 -10  522.9g   1.6g  42808 S  15.6   0.6 177:18.34
revalidator131
 865047 root  10 -10  522.9g   1.6g  42808 S  15.3   0.6 176:30.75
revalidator16
 865080 root  10 -10  522.9g   1.6g  42808 S  15.3   0.6 175:36.41
revalidator49
 865117 root  10 -10  522.9g   1.6g  42808 S  15.3   0.6 176:03.18
revalidator86
 865125 root  10 -10  522.9g   1.6g  42808 S  15.3   0.6 177:15.42
revalidator94
 865122 root  10 -10  522.9g   1.6g  42808 S  15.0   0.6 176:45.37
revalidator91
 865065 root  10 -10  522.9g   1.6g  42808 S  14.6   0.6 176:49.66
revalidator34
 865116 root  10 -10  522.9g   1.6g  42808 S  14.6   0.6 174:57.67
revalidator85
 865161 root  10 -10  522.9g   1.6g  42808 S  14.6   0.6 175:10.52
revalidator130
 865133 root  10 -10  522.9g   1.6g  42808 S  14.3   0.6 174:49.83
revalidator102
 865016 root  10 -10  522.9g   1.6g  42808 S   0.0   0.6   1:27.68
ovs-vswitchd
 865017 root  10 -10  522.9g   1.6g  42808 S   0.0 

Re: [ovs-dev] [PATCH v5] route-table: Add support for v4 via v6 route.

2024-05-30 Thread Ilya Maximets
On 5/30/24 20:17, Ilya Maximets wrote:
> On 5/30/24 01:27, William Tu wrote:
>> Add route-table support for ipv4 dst via ipv6. One use case is BGP
>> unnumbered, a mechanism that establishes peering sessions without the
>> need to explicitly configure IPv4 addresses on the interfaces involved
>> in the peering. Without using IPv4 address assignments, it uses
>> link-local IPv6 addresses of the directly connected neighbors for
>> peering purposes. For example, BGP might install the following route:
>> $ ip route get 100.87.18.3
>> 100.87.18.3 via inet6 fe80::920a:84ff:fe9e:9570 \
>> dev br-phy src 100.87.18.6
>>
>> Note that the v6 addr fe80::920a:84ff:fe9e:9570 is not being used in
>> the packet header, but only used for lookup the out dev br-phy.
>> Currently OVS can only support either all-ipv4 or all-ipv6, the patch
>> adds support for such use case.
>>
>> Reported-at: 
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2024-January/052908.html
>> Acked-by: Simon Horman 
>> Signed-off-by: William Tu 
>> ---
>> v5: fix minor CI failure
>> v4: feedback from Ilya
>> - add route del test case, wrap around test width
>> - not set neighbor cache manually
>> - on br-phy, use /32 on address in steead of /24
>> compare v3 and v4
>> https://github.com/williamtu/ovs/compare/router..router-v4
>> v3: add vxlan test, remove rfc
>> v2: fix CI error
>> ---
> 
> 
> 
>> diff --git a/tests/tunnel-push-pop.at b/tests/tunnel-push-pop.at
>> index 508737c53ec6..7266f0990570 100644
>> --- a/tests/tunnel-push-pop.at
>> +++ b/tests/tunnel-push-pop.at
>> @@ -196,6 +196,69 @@ OVS_WAIT_UNTIL([test `ovs-pcap p0.pcap | grep 
>> 100022eb00012237 | wc -l`
>>  OVS_VSWITCHD_STOP
>>  AT_CLEANUP
>>  
>> +AT_SETUP([tunnel_push_pop - v4 via v6 route])
>> +
>> +OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy 
>> ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
>> +AT_CHECK([ovs-vsctl add-br int-br -- set bridge int-br 
>> datapath_type=dummy], [0])
>> +AT_CHECK([ovs-vsctl add-port int-br t1 -- set Interface t1 type=vxlan \
>> +   options:remote_ip=1.1.2.92 options:key=123 
>> ofport_request=1\
>> +   ], [0])
>> +
>> +AT_CHECK([ovs-appctl dpif/show], [0], [dnl
>> +dummy@ovs-dummy: hit:0 missed:0
>> +  br0:
>> +br0 65534/100: (dummy-internal)
>> +p0 1/1: (dummy)
>> +  int-br:
>> +int-br 65534/2: (dummy-internal)
>> +t1 1/4789: (vxlan: key=123, remote_ip=1.1.2.92)
>> +])
>> +
>> +AT_CHECK([ovs-ofctl add-flow br0 action=normal])
>> +
>> +dnl Setup dummy interface IP addresses.
>> +AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0 1.1.2.88/32], [0], [OK
>> +])
>> +AT_CHECK([ovs-appctl netdev-dummy/ip6addr br0 2001:cafe::88/64], [0], [OK
>> +])
>> +dnl Add a static v4 via v6 route
>> +AT_CHECK([ovs-appctl ovs/route/add 1.1.2.92/32 br0 2001:cafe::10 
>> src=1.1.2.89], [0], [OK
>> +])
>> +
>> +AT_CHECK([ovs-appctl ovs/route/show | grep br0 | sort], [0], [dnl
>> +Cached: 1.1.2.88/32 dev br0 SRC 1.1.2.88 local
>> +Cached: 2001:cafe::/64 dev br0 SRC 2001:cafe::88 local
>> +User: 1.1.2.92/32 dev br0 GW 2001:cafe::10 SRC 1.1.2.89
>> +])
>> +
>> +dnl Check ARP Snoop
>> +AT_CHECK([ovs-appctl netdev-dummy/receive p0 'recirc_id(0),in_port(100),dnl
>> +eth(src=f8:bc:12:44:34:b6,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),dnl
>> +arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6,tha=00:00:00:00:00:00)'])
> 
> This is still not a correct test, we would never receive an ARP
> from an IPv6-only network.  This must be an IPv6 NA packet instead.
> 
> All in all, I'd expect the following test to work without modifications
> (unless I mistyped something):
> 
> ---
> AT_SETUP([tunnel_push_pop - v4 via v6 route])
> 
> OVS_VSWITCHD_START(
> [add-port br0 p0 \
>  -- set Interface p0 type=dummy ofport_request=1 \
>  other-config:hwaddr=aa:55:aa:55:00:00])
> AT_CHECK([ovs-appctl vlog/set dpif_netdev:dbg])
> AT_CHECK([ovs-vsctl add-br int-br -- set bridge int-br datapath_type=dummy])
> AT_CHECK([ovs-vsctl add-port int-br t2 \
>   -- set Interface t2 type=geneve \
>   options:remote_ip=1.1.2.92 \
>   options:key=123 ofport_request=2])
> 
> dnl Setup IP addresses.
> AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0 1.1.2.88/32], [0], [OK
> ])
> AT_CHECK([ovs-appctl netdev-dummy/ip6addr br0 2001:cafe::88/64], [0], [OK
> ])
> dnl Adding a static v4 via v6 route.
> AT_CHECK([ovs-appctl ovs/route/add 1.1.2.92/32 br0 2001:cafe::10 
> src=1.1.2.88], [0], [OK
> ])
> 
> dnl Checking that a local route for added IP was successfully installed.
> AT_CHECK([ovs-appctl ovs/route/show | grep br0 | sort], [0], [dnl
> Cached: 1.1.2.88/32 dev br0 SRC 1.1.2.88 local
> Cached: 2001:cafe::/64 dev br0 SRC 2001:cafe::88 local
> User: 1.1.2.92/32 dev br0 GW 2001:cafe::10 SRC 1.1.2.88
> ])
> 
> AT_CHECK([ovs-ofctl add-flow br0 action=normal])
> AT_CHECK([ovs-ofctl add-flow int-br action=normal])
> 
> AT_CHECK([

Re: [ovs-dev] [PATCH v5] route-table: Add support for v4 via v6 route.

2024-05-30 Thread Ilya Maximets
On 5/30/24 01:27, William Tu wrote:
> Add route-table support for ipv4 dst via ipv6. One use case is BGP
> unnumbered, a mechanism that establishes peering sessions without the
> need to explicitly configure IPv4 addresses on the interfaces involved
> in the peering. Without using IPv4 address assignments, it uses
> link-local IPv6 addresses of the directly connected neighbors for
> peering purposes. For example, BGP might install the following route:
> $ ip route get 100.87.18.3
> 100.87.18.3 via inet6 fe80::920a:84ff:fe9e:9570 \
> dev br-phy src 100.87.18.6
> 
> Note that the v6 addr fe80::920a:84ff:fe9e:9570 is not being used in
> the packet header, but only used for lookup the out dev br-phy.
> Currently OVS can only support either all-ipv4 or all-ipv6, the patch
> adds support for such use case.
> 
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2024-January/052908.html
> Acked-by: Simon Horman 
> Signed-off-by: William Tu 
> ---
> v5: fix minor CI failure
> v4: feedback from Ilya
> - add route del test case, wrap around test width
> - not set neighbor cache manually
> - on br-phy, use /32 on address in steead of /24
> compare v3 and v4
> https://github.com/williamtu/ovs/compare/router..router-v4
> v3: add vxlan test, remove rfc
> v2: fix CI error
> ---



> diff --git a/tests/tunnel-push-pop.at b/tests/tunnel-push-pop.at
> index 508737c53ec6..7266f0990570 100644
> --- a/tests/tunnel-push-pop.at
> +++ b/tests/tunnel-push-pop.at
> @@ -196,6 +196,69 @@ OVS_WAIT_UNTIL([test `ovs-pcap p0.pcap | grep 
> 100022eb00012237 | wc -l`
>  OVS_VSWITCHD_STOP
>  AT_CLEANUP
>  
> +AT_SETUP([tunnel_push_pop - v4 via v6 route])
> +
> +OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy 
> ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
> +AT_CHECK([ovs-vsctl add-br int-br -- set bridge int-br datapath_type=dummy], 
> [0])
> +AT_CHECK([ovs-vsctl add-port int-br t1 -- set Interface t1 type=vxlan \
> +   options:remote_ip=1.1.2.92 options:key=123 
> ofport_request=1\
> +   ], [0])
> +
> +AT_CHECK([ovs-appctl dpif/show], [0], [dnl
> +dummy@ovs-dummy: hit:0 missed:0
> +  br0:
> +br0 65534/100: (dummy-internal)
> +p0 1/1: (dummy)
> +  int-br:
> +int-br 65534/2: (dummy-internal)
> +t1 1/4789: (vxlan: key=123, remote_ip=1.1.2.92)
> +])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 action=normal])
> +
> +dnl Setup dummy interface IP addresses.
> +AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0 1.1.2.88/32], [0], [OK
> +])
> +AT_CHECK([ovs-appctl netdev-dummy/ip6addr br0 2001:cafe::88/64], [0], [OK
> +])
> +dnl Add a static v4 via v6 route
> +AT_CHECK([ovs-appctl ovs/route/add 1.1.2.92/32 br0 2001:cafe::10 
> src=1.1.2.89], [0], [OK
> +])
> +
> +AT_CHECK([ovs-appctl ovs/route/show | grep br0 | sort], [0], [dnl
> +Cached: 1.1.2.88/32 dev br0 SRC 1.1.2.88 local
> +Cached: 2001:cafe::/64 dev br0 SRC 2001:cafe::88 local
> +User: 1.1.2.92/32 dev br0 GW 2001:cafe::10 SRC 1.1.2.89
> +])
> +
> +dnl Check ARP Snoop
> +AT_CHECK([ovs-appctl netdev-dummy/receive p0 'recirc_id(0),in_port(100),dnl
> +eth(src=f8:bc:12:44:34:b6,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),dnl
> +arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6,tha=00:00:00:00:00:00)'])

This is still not a correct test, we would never receive an ARP
from an IPv6-only network.  This must be an IPv6 NA packet instead.

All in all, I'd expect the following test to work without modifications
(unless I mistyped something):

---
AT_SETUP([tunnel_push_pop - v4 via v6 route])

OVS_VSWITCHD_START(
[add-port br0 p0 \
 -- set Interface p0 type=dummy ofport_request=1 \
 other-config:hwaddr=aa:55:aa:55:00:00])
AT_CHECK([ovs-appctl vlog/set dpif_netdev:dbg])
AT_CHECK([ovs-vsctl add-br int-br -- set bridge int-br datapath_type=dummy])
AT_CHECK([ovs-vsctl add-port int-br t2 \
  -- set Interface t2 type=geneve \
  options:remote_ip=1.1.2.92 \
  options:key=123 ofport_request=2])

dnl Setup IP addresses.
AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0 1.1.2.88/32], [0], [OK
])
AT_CHECK([ovs-appctl netdev-dummy/ip6addr br0 2001:cafe::88/64], [0], [OK
])
dnl Adding a static v4 via v6 route.
AT_CHECK([ovs-appctl ovs/route/add 1.1.2.92/32 br0 2001:cafe::10 src=1.1.2.88], 
[0], [OK
])

dnl Checking that a local route for added IP was successfully installed.
AT_CHECK([ovs-appctl ovs/route/show | grep br0 | sort], [0], [dnl
Cached: 1.1.2.88/32 dev br0 SRC 1.1.2.88 local
Cached: 2001:cafe::/64 dev br0 SRC 2001:cafe::88 local
User: 1.1.2.92/32 dev br0 GW 2001:cafe::10 SRC 1.1.2.88
])

AT_CHECK([ovs-ofctl add-flow br0 action=normal])
AT_CHECK([ovs-ofctl add-flow int-br action=normal])

AT_CHECK([ovs-vsctl -- set Interface p0 options:tx_pcap=p0.pcap])

dnl Check that v4-over-v6 route is used in the trace and that a tunnel neighbor
dnl lookup miss generates ND and not an ARP.
AT_CHECK([ovs-appctl ofproto/

Re: [ovs-dev] [PATCH v2 branch-2.17] dpdk: Use DPDK 21.11.7 release for OVS 2.17.

2024-05-30 Thread Kevin Traynor
On 30/05/2024 13:13, Eelco Chaudron wrote:
> 
> 
> On 28 May 2024, at 11:25, Kevin Traynor wrote:
> 
>> Update the CI and docs to use DPDK 21.11.7.
>>
>> Signed-off-by: Kevin Traynor 
> 
> Thanks Kevin, changes look good to me.
> 
> Acked-by: Eelco Chaudron 
> 

Thanks Ilya and Eelco. Applied and pushed to the relevant branches.

Kevin.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4] odp-execute: Fix AVX checksum calculation.

2024-05-30 Thread Eelco Chaudron



On 24 May 2024, at 11:20, Emma Finn wrote:

> The AVX implementation for calcualting checksums was not
> handling carry-over addition correctly in some cases.
> This patch adds an additional shuffle to add 16-bit padding to
> the final part of the calculation to handle such cases. This
> commit also adds a unit test to check the checksum carry-bits
> issue with actions autovalidator enabled.
>
> Signed-off-by: Emma Finn 
> Reported-by: Eelco Chaudron 

Thanks Emma and others for the feedback on the patch. It has been applied 
upstream.

Cheers,

Eelco

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/1] ofproto-dpif-upcall: Avoid stale ukeys leaks.

2024-05-30 Thread Eelco Chaudron


On 23 May 2024, at 12:46, Roi Dayan via dev wrote:

> It is observed in some environments that there are much more ukeys than
> actual DP flows. For example:
>
> $ ovs-appctl upcall/show
> system@ovs-system:
> flows : (current 7) (avg 6) (max 117) (limit 2125)
> offloaded flows : 525
> dump duration : 1063ms
> ufid enabled : true
>
> 23: (keys 3612)
> 24: (keys 3625)
> 25: (keys 3485)
>
> The revalidator threads are busy revalidating the stale ukeys leading to
> high CPU and long dump duration.
>
> There are some possible situations that may result in stale ukeys that
> have no corresponding DP flows.
>
> In revalidator, push_dp_ops() doesn't check error if the op type is not
> DEL. It is possible that a PUT(MODIFY) fails, especially for tc offload
> case, where the old flow is deleted first and then the new one is
> created. If the creation fails, the ukey will be stale (no corresponding
> DP flow). This patch adds a warning in such case.

Not sure I understand, this behavior should be captured by the 
UKEY_INCONSISTENT state.

> Another possible scenario is in handle_upcalls() if a PUT operation did
> not succeed and op->error attribute was not set correctly it can lead to
> stale ukey in operational state.


Guess we need to figure out these cases, as these are the root cause of your 
problem.

>
> This patch adds checks in the sweep phase for such ukeys and move them
> to DELETE so that they can be cleared eventually.
>
> Co-authored-by: Han Zhou 
> Signed-off-by: Han Zhou 
> Signed-off-by: Roi Dayan 
> ---
>  ofproto/ofproto-dpif-upcall.c | 15 +++
>  1 file changed, 15 insertions(+)
>
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 83609ec62b63..e9520ebdf910 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -57,6 +57,7 @@ COVERAGE_DEFINE(dumped_inconsistent_flow);
>  COVERAGE_DEFINE(dumped_new_flow);
>  COVERAGE_DEFINE(handler_duplicate_upcall);
>  COVERAGE_DEFINE(revalidate_missed_dp_flow);
> +COVERAGE_DEFINE(revalidate_missed_dp_flow_del);
>  COVERAGE_DEFINE(ukey_dp_change);
>  COVERAGE_DEFINE(ukey_invalid_stat_reset);
>  COVERAGE_DEFINE(ukey_replace_contention);
> @@ -278,6 +279,7 @@ enum flow_del_reason {
>  FDR_BAD_ODP_FIT,/* Bad ODP flow fit. */
>  FDR_FLOW_IDLE,  /* Flow idle timeout. */
>  FDR_FLOW_LIMIT, /* Kill all flows condition reached. */
> +FDR_FLOW_STALE, /* Flow stale detected. */

Please also update the associated script, see above comment.

>  FDR_FLOW_WILDCARDED,/* Flow needs a narrower wildcard mask. */
>  FDR_NO_OFPROTO, /* Bridge not found. */
>  FDR_PURGE,  /* User requested flow deletion. */
> @@ -2557,6 +2559,10 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops, 
> size_t n_ops)


The (op->dop.error) condition below will never be reached, as it’s explicitly 
checked above. So the error message will never happen. The condition you 
explain is already covered by UKEY_INCONSISTENT. Not sure if we need a log 
message for this either at warn level, maybe debug, especially as you say it 
can happen often.

>  if (op->dop.type != DPIF_OP_FLOW_DEL) {
>  /* Only deleted flows need their stats pushed. */
> +if (op->dop.error) {
> +VLOG_WARN_RL(&rl, "push_dp_ops: error %d in op type %d, ukey 
> "
> + "%p", op->dop.error, op->dop.type, op->ukey);
> +}
>  continue;
>  }
>
> @@ -3027,6 +3033,15 @@ revalidator_sweep__(struct revalidator *revalidator, 
> bool purge)
>  del_reason = purge ? FDR_PURGE : FDR_UPDATE_FAIL;
>  } else if (!seq_mismatch) {
>  result = UKEY_KEEP;
> +} else if (!ukey->stats.used &&
> +   udpif_flow_time_delta(udpif, ukey) * 1000 >
> +   ofproto_max_idle) {

In theory, this is a missed dp flow, i.e. the same as the else {} case below.
Maybe we should change revalidate_ukey() to have a test for this?


Maybe something like this (not tested):

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index e9520ebdf..c34978fc9 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -279,7 +279,7 @@ enum flow_del_reason {
 FDR_BAD_ODP_FIT,/* Bad ODP flow fit. */
 FDR_FLOW_IDLE,  /* Flow idle timeout. */
 FDR_FLOW_LIMIT, /* Kill all flows condition reached. */
-FDR_FLOW_STALE, /* Flow stale detected. */
+FDR_FLOW_NO_STATS_IDLE, /* Flow idled out without receiving statistics. */
 FDR_FLOW_WILDCARDED,/* Flow needs a narrower wildcard mask. */
 FDR_NO_OFPROTO, /* Bridge not found. */
 FDR_PURGE,  /* User requested flow deletion. */
@@ -2452,7 +2452,14 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key 
*ukey,
 log_unexpected_stats_jum

Re: [ovs-dev] [PATCH v2 ovn] northd: Add bfd and bfd_consumer nodes to I-P engine.

2024-05-30 Thread Han Zhou
On Wed, May 29, 2024 at 5:26 AM Lorenzo Bianconi <
lorenzo.bianc...@redhat.com> wrote:
>
> Introduce bfd and bfd_consumer nodes to northd I-P engine to track bfd
> connections and northd static_route/policy_route changes.
>
> Reported-at: https://issues.redhat.com/browse/FDP-600
> Signed-off-by: Lorenzo Bianconi 
> ---
> Changes since v1:
> - add incremental processing logic for bfd_consumer node to avoid a full
>   recompute

Thanks Lorenzo for the updates. Please see my comments below.

> ---
>  northd/en-lflow.c|  23 +-
>  northd/en-northd.c   |  97 +
>  northd/en-northd.h   |   8 +
>  northd/inc-proc-northd.c |  23 +-
>  northd/northd.c  | 458 +--
>  northd/northd.h  |  57 -
>  6 files changed, 526 insertions(+), 140 deletions(-)
>
> diff --git a/northd/en-lflow.c b/northd/en-lflow.c
> index c4b927fb8..b8ce8fb96 100644
> --- a/northd/en-lflow.c
> +++ b/northd/en-lflow.c
> @@ -41,6 +41,9 @@ lflow_get_input_data(struct engine_node *node,
>   struct lflow_input *lflow_input)
>  {
>  struct northd_data *northd_data = engine_get_input_data("northd",
node);
> +struct bfd_data *bfd_data = engine_get_input_data("bfd", node);
> +struct bfd_consumer_data *bfd_consumer_data =
> +engine_get_input_data("bfd_consumer", node);
>  struct port_group_data *pg_data =
>  engine_get_input_data("port_group", node);
>  struct sync_meters_data *sync_meters_data =
> @@ -50,10 +53,6 @@ lflow_get_input_data(struct engine_node *node,
>  struct ed_type_ls_stateful *ls_stateful_data =
>  engine_get_input_data("ls_stateful", node);
>
> -lflow_input->nbrec_bfd_table =
> -EN_OVSDB_GET(engine_get_input("NB_bfd", node));
> -lflow_input->sbrec_bfd_table =
> -EN_OVSDB_GET(engine_get_input("SB_bfd", node));
>  lflow_input->sbrec_logical_flow_table =
>  EN_OVSDB_GET(engine_get_input("SB_logical_flow", node));
>  lflow_input->sbrec_multicast_group_table =
> @@ -78,7 +77,10 @@ lflow_get_input_data(struct engine_node *node,
>  lflow_input->meter_groups = &sync_meters_data->meter_groups;
>  lflow_input->lb_datapaths_map = &northd_data->lb_datapaths_map;
>  lflow_input->svc_monitor_map = &northd_data->svc_monitor_map;
> -lflow_input->bfd_connections = NULL;
> +lflow_input->bfd_connections = &bfd_data->bfd_connections;
> +lflow_input->parsed_routes = &bfd_consumer_data->parsed_routes;
> +lflow_input->route_tables = &bfd_consumer_data->route_tables;
> +lflow_input->route_policies = &bfd_consumer_data->route_policies;
>
>  struct ed_type_global_config *global_config =
>  engine_get_input_data("global_config", node);
> @@ -95,25 +97,14 @@ void en_lflow_run(struct engine_node *node, void
*data)
>  struct lflow_input lflow_input;
>  lflow_get_input_data(node, &lflow_input);
>
> -struct hmap bfd_connections = HMAP_INITIALIZER(&bfd_connections);
> -lflow_input.bfd_connections = &bfd_connections;
> -
>  stopwatch_start(BUILD_LFLOWS_STOPWATCH_NAME, time_msec());
>
>  struct lflow_data *lflow_data = data;
>  lflow_table_clear(lflow_data->lflow_table);
>  lflow_reset_northd_refs(&lflow_input);
>
> -build_bfd_table(eng_ctx->ovnsb_idl_txn,
> -lflow_input.nbrec_bfd_table,
> -lflow_input.sbrec_bfd_table,
> -lflow_input.lr_ports,
> -&bfd_connections);
>  build_lflows(eng_ctx->ovnsb_idl_txn, &lflow_input,
>   lflow_data->lflow_table);
> -bfd_cleanup_connections(lflow_input.nbrec_bfd_table,
> -&bfd_connections);
> -hmap_destroy(&bfd_connections);
>  stopwatch_stop(BUILD_LFLOWS_STOPWATCH_NAME, time_msec());
>
>  engine_set_node_state(node, EN_UPDATED);
> diff --git a/northd/en-northd.c b/northd/en-northd.c
> index 4479b4aff..c9ff152c9 100644
> --- a/northd/en-northd.c
> +++ b/northd/en-northd.c
> @@ -236,6 +236,71 @@ northd_global_config_handler(struct engine_node
*node, void *data OVS_UNUSED)
>  return true;
>  }
>
> +void
> +en_bfd_consumer_run(struct engine_node *node, void *data)
> +{
> +
> +struct northd_data *northd_data = engine_get_input_data("northd",
node);
> +struct bfd_data *bfd_data = engine_get_input_data("bfd", node);
> +const struct nbrec_bfd_table *nbrec_bfd_table =
> +EN_OVSDB_GET(engine_get_input("NB_bfd", node));
> +struct bfd_consumer_data *bfd_consumer_data = data;
> +enum engine_node_state state = EN_UNCHANGED;
> +
> +struct ovn_datapath *od;
> +HMAP_FOR_EACH (od, key_node, &northd_data->lr_datapaths.datapaths) {
> +for (int i = 0; i < od->nbr->n_ports; i++) {
> +const char *route_table_name =
> +smap_get(&od->nbr->ports[i]->options, "route_table");
> +get_route_table_id(&bfd_consumer_data->route_tables,
> +   route_table_na

Re: [ovs-dev] [PATCH ovn 5/5] controller: Handle postponed ports release.

2024-05-30 Thread Numan Siddique
On Tue, Apr 23, 2024 at 7:54 AM Xavier Simonart  wrote:
>
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/405107.html
> Suggested-by: Priyankar Jain 
>
> Signed-off-by: Xavier Simonart 
> ---
>  controller/binding.c | 12 +-
>  tests/ovn.at | 57 
>  2 files changed, 68 insertions(+), 1 deletion(-)
>
> diff --git a/controller/binding.c b/controller/binding.c
> index 0bef5dc42..c37066cbe 100644
> --- a/controller/binding.c
> +++ b/controller/binding.c
> @@ -1317,6 +1317,14 @@ lport_maybe_postpone(const char *port_name, long long 
> int now,
>  return true;
>  }
>
> +static bool
> +is_postponed_port(const char *port_name)
> +{
> +struct claimed_port *cp =
> +(struct claimed_port *) sset_find(&_postponed_ports, port_name);
> +return !!cp;
> +}

This is wrong.  '_postponed_ports' is an sset and not shash and
sset_find returns 'struct sset_node'.

You can use sset_contains i.,e

static bool
is_postponed_port(const char *port_name)
{
return sset_contains(&_postponed_ports, port_name);
}



With the above addressed:

Acked-by: Numan Siddique 

Numan


> +
>  /* Returns false if lport is not claimed due to 'sb_readonly'.
>   * Returns true otherwise.
>   */
> @@ -1491,7 +1499,8 @@ is_binding_lport_this_chassis(struct binding_lport 
> *b_lport,
>  {
>  return (b_lport && b_lport->pb && chassis &&
>  (b_lport->pb->chassis == chassis
> - || is_additional_chassis(b_lport->pb, chassis)));
> + || is_additional_chassis(b_lport->pb, chassis)
> + || is_postponed_port(b_lport->pb->logical_port)));
>  }
>
>  /* Returns 'true' if the 'lbinding' has binding lports of type LP_CONTAINER,
> @@ -1593,6 +1602,7 @@ consider_vif_lport_(const struct sbrec_port_binding *pb,
>  }
>
>  if (!lbinding_set || !can_bind) {
> +remove_related_lport(pb, b_ctx_out);
>  return release_lport(pb, b_ctx_in->chassis_rec,
>   !b_ctx_in->ovnsb_idl_txn,
>   b_ctx_out->tracked_dp_bindings,
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 74c5bccc0..b0aba2207 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -37838,3 +37838,60 @@ OVN_CLEANUP([hv1])
>
>  AT_CLEANUP
>  ])
> +
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([Deleting vif while controller fight for port claim])
> +ovn_start
> +
> +ovn-nbctl ls-add ls0
> +ovn-nbctl lsp-add ls0 lsp0
> +ovn-nbctl lsp-add ls0 lsp1
> +
> +net_add n1
> +for i in 1 2; do
> +sim_add hv$i
> +as hv$i
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.$i
> +done
> +
> +check ovn-nbctl --wait=hv sync
> +hv1_uuid=$(ovn-sbctl --bare --columns _uuid list chassis hv1)
> +hv2_uuid=$(ovn-sbctl --bare --columns _uuid list chassis hv2)
> +
> +as hv1 ovs-vsctl -- add-port br-int vif1 -- set Interface vif1 
> external-ids:iface-id=lsp1
> +as hv1 ovs-vsctl -- add-port br-int vif -- set Interface vif 
> external-ids:iface-id=lsp0
> +wait_for_ports_up
> +
> +# Delete vif => store flows w/ only vif1, and no vif
> +as hv1 ovs-vsctl -- del-port br-int vif
> +check ovn-nbctl --wait=hv sync
> +DUMP_FLOWS([hv1], [oflows1])
> +sleep_controller hv1
> +as hv2 ovs-vsctl -- add-port br-int vif -- set Interface vif 
> external-ids:iface-id=lsp0
> +as hv1 ovs-vsctl -- add-port br-int vif -- set Interface vif 
> external-ids:iface-id=lsp0
> +
> +OVS_WAIT_UNTIL([
> +chassis=$(ovn-sbctl --bare --columns chassis list port_binding lsp0)
> +test "$chassis" = $hv2_uuid
> +])
> +
> +sleep_sb
> +wake_up_controller hv1
> +sleep_controller hv2
> +
> +as hv1 ovs-vsctl -- del-port br-int vif
> +wake_up_sb
> +wake_up_controller hv2
> +
> +as hv2 ovs-vsctl -- del-port br-int vif
> +check ovn-nbctl --wait=hv sync
> +
> +DUMP_FLOWS([hv1], [oflows2])
> +
> +check diff oflows1 oflows2
> +OVN_CLEANUP([hv1],[hv2])
> +
> +AT_CLEANUP
> +])
> +
> --
> 2.31.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 4/5] controller: Handle postponed ports claims.

2024-05-30 Thread Numan Siddique
On Tue, Apr 23, 2024 at 7:54 AM Xavier Simonart  wrote:
>
> When a port was claimed by two chassis, both chassis were fighting for the 
> port;
> a claim was postponed if the port was claimed recently.
> However, there were two issues:
> - Not all the flows were properly removed when the remote chassis was claiming
>   the port.
> - A port was not properly released if claim was postponed when the 
> port_binding
>   was deleted.
>
> There were also multiple cases causing ovn-controller to always wake-up
> immediately as a port was not removed from postponed_port list.
> - Changing port type to localport while port claim was postponed.
> - Deleting parent of a container port while parent was postponed.
>
> Signed-off-by: Xavier Simonart 


Acked-by: Numan Siddique 

Numan

> ---
>  controller/binding.c | 29 +++--
>  tests/ovn.at | 10 ++
>  2 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/controller/binding.c b/controller/binding.c
> index 1499ceae1..0bef5dc42 100644
> --- a/controller/binding.c
> +++ b/controller/binding.c
> @@ -1270,6 +1270,18 @@ update_port_additional_encap_if_needed(
>  return true;
>  }
>
> +static bool
> +is_requested_additional_chassis(const struct sbrec_port_binding *pb,
> +  const struct sbrec_chassis *chassis_rec)
> +{
> +for (size_t i = 0; i < pb->n_requested_additional_chassis; i++) {
> +if (pb->requested_additional_chassis[i] == chassis_rec) {
> +return true;
> +}
> +}
> +return false;
> +}
> +
>  bool
>  is_additional_chassis(const struct sbrec_port_binding *pb,
>const struct sbrec_chassis *chassis_rec)
> @@ -1587,6 +1599,15 @@ consider_vif_lport_(const struct sbrec_port_binding 
> *pb,
>   b_ctx_out->if_mgr);
>  }
>  }
> +if (pb->chassis != b_ctx_in->chassis_rec
> +&& !is_requested_additional_chassis(pb, b_ctx_in->chassis_rec)
> +&& if_status_is_port_claimed(b_ctx_out->if_mgr,
> + pb->logical_port)) {
> +update_lport_tracking(pb, b_ctx_out->tracked_dp_bindings, false);
> +if_status_mgr_remove_ovn_installed(b_ctx_out->if_mgr,
> +   b_lport->lbinding->iface->name,
> +   &b_lport->lbinding->iface->header_.uuid);
> +}
>
>  return true;
>  }
> @@ -1787,7 +1808,8 @@ consider_localport(const struct sbrec_port_binding *pb,
>  struct shash *local_bindings = &b_ctx_out->lbinding_data->bindings;
>  struct local_binding *lbinding = local_binding_find(local_bindings,
>  pb->logical_port);
> -
> +/* Make sure there is no previous postponed port claim */
> +sset_find_and_delete(b_ctx_out->postponed_ports, pb->logical_port);
>  if (!lbinding) {
>  return true;
>  }
> @@ -2754,11 +2776,14 @@ handle_deleted_vif_lport(const struct 
> sbrec_port_binding *pb,
>   binding_lport_delete(binding_lports, b_lport);
>  }
>
> -if (bound && lbinding && lport_type == LP_VIF) {
> +if ((lbinding && lport_type == LP_VIF) &&
> +(bound || sset_find_and_delete(b_ctx_out->postponed_ports,
> +   pb->logical_port))) {
>  /* We need to release the container/virtual binding lports (if any) 
> if
>   * deleted 'pb' type is LP_VIF. */
>  struct binding_lport *c_lport;
>  LIST_FOR_EACH (c_lport, list_node, &lbinding->binding_lports) {
> +sset_find_and_delete(b_ctx_out->postponed_ports, c_lport->name);
>  remove_local_lports(c_lport->pb->logical_port, b_ctx_out);
>  if (!release_binding_lport(b_ctx_in->chassis_rec, c_lport,
> !b_ctx_in->ovnsb_idl_txn,
> diff --git a/tests/ovn.at b/tests/ovn.at
> index b68678472..74c5bccc0 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -16413,6 +16413,10 @@ ovn_start
>
>  ovn-nbctl ls-add ls0
>  ovn-nbctl lsp-add ls0 lsp0
> +ovn-nbctl lsp-add ls0 lsp1
> +ovn-nbctl lsp-add ls0 lsp2
> +check ovn-nbctl lsp-add ls0 lsp-cont1 lsp0 1
> +
>
>  net_add n1
>  for i in 1 2; do
> @@ -16426,6 +16430,8 @@ for i in 1 2; do
>  as hv$i
>  ovs-vsctl -- add-port br-int vif \
>-- set Interface vif external-ids:iface-id=lsp0
> +ovs-vsctl -- add-port br-int vif$i \
> +  -- set Interface vif$i external-ids:iface-id=lsp$i
>  done
>
>  # give controllers some time to fight for the port binding
> @@ -16443,6 +16449,10 @@ max_claims=20
>  AT_CHECK([test "${hv1_claims}" -le "${max_claims}"], [0], [])
>  AT_CHECK([test "${hv2_claims}" -le "${max_claims}"], [0], [])
>
> +check ovn-nbctl --wait=hv lsp-del lsp0
> +CHECK_FLOWS_AFTER_RECOMPUTE([hv1], [hv1])
> +CHECK_FLOWS_AFTER_RECOMPUTE([hv2], [hv2])
> +
>  OVN_CLEANUP([hv1],[hv2])
>
>  AT_CLEANUP
> --
> 2.31.1
>
> ___

Re: [ovs-dev] [PATCH ovn 3/5] controller: Fix deletion of container parent port.

2024-05-30 Thread Numan Siddique
On Tue, Apr 23, 2024 at 7:54 AM Xavier Simonart  wrote:
>
> Flows were not properly removed when parent port of a container port
> was deleted.
>
> Signed-off-by: Xavier Simonart 

Acked-by: Numan Siddique 

Numan

> ---
>  controller/binding.c  |  1 +
>  controller/physical.c |  3 ++-
>  tests/ovn.at  | 34 ++
>  3 files changed, 37 insertions(+), 1 deletion(-)
>
> diff --git a/controller/binding.c b/controller/binding.c
> index 3b36ed881..1499ceae1 100644
> --- a/controller/binding.c
> +++ b/controller/binding.c
> @@ -2759,6 +2759,7 @@ handle_deleted_vif_lport(const struct 
> sbrec_port_binding *pb,
>   * deleted 'pb' type is LP_VIF. */
>  struct binding_lport *c_lport;
>  LIST_FOR_EACH (c_lport, list_node, &lbinding->binding_lports) {
> +remove_local_lports(c_lport->pb->logical_port, b_ctx_out);
>  if (!release_binding_lport(b_ctx_in->chassis_rec, c_lport,
> !b_ctx_in->ovnsb_idl_txn,
> b_ctx_out)) {
> diff --git a/controller/physical.c b/controller/physical.c
> index 7ee308694..98f7dbab2 100644
> --- a/controller/physical.c
> +++ b/controller/physical.c
> @@ -1026,7 +1026,8 @@ put_local_common_flows(uint32_t dp_key,
>  put_resubmit(OFTABLE_LOG_TO_PHY, ofpacts_p);
>  put_stack(MFF_IN_PORT, ofpact_put_STACK_POP(ofpacts_p));
>  ofctrl_check_and_add_flow_metered(flow_table, OFTABLE_SAVE_INPORT, 
> 100,
> -  0, &match, ofpacts_p, hc_uuid,
> +  parent_pb->header_.uuid.parts[0],
> +  &match, ofpacts_p, 
> &pb->header_.uuid,
>NX_CTLR_NO_METER, NULL, false);
>  }
>  }
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 3c888aaf5..b68678472 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -37794,3 +37794,37 @@ OVS_WAIT_FOR_OUTPUT([as hv1 ovs-ofctl dump-flows 
> br-int table=0 |grep priority=1
>  OVN_CLEANUP([hv1])
>  AT_CLEANUP
>  ])
> +
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([Delete parent of container port])
> +ovn_start
> +
> +ovn-nbctl ls-add ls0
> +ovn-nbctl lsp-add ls0 lsp0
> +
> +# Add a second logical port, so that deleting lsp0 does not result in 
> deleting
> +# the last port of the datapath.
> +ovn-nbctl lsp-add ls0 lsp1
> +
> +check ovn-nbctl lsp-add ls0 lsp-cont1 lsp0 1
> +
> +net_add n1
> +
> +sim_add hv1
> +as hv1
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.1
> +
> +as hv1
> +ovs-vsctl -- add-port br-int vif \
> +  -- set Interface vif external-ids:iface-id=lsp0
> +ovs-vsctl -- add-port br-int vif1 \
> +  -- set Interface vif1 external-ids:iface-id=lsp1
> +
> +check ovn-nbctl --wait=hv lsp-del lsp0
> +CHECK_FLOWS_AFTER_RECOMPUTE([hv1], [hv1])
> +
> +OVN_CLEANUP([hv1])
> +
> +AT_CLEANUP
> +])
> --
> 2.31.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 2/5] controller: Nonvif related lports handling.

2024-05-30 Thread Numan Siddique
On Tue, Apr 23, 2024 at 7:54 AM Xavier Simonart  wrote:
>
> This patches fixes flows not properly deleted in scenarios similar to:
> foo1 (hv1) - foo - R1 - join - R2 (chassis = hv2) - alice - alice1 (hv2).
>
> When R2 is deleted, alice_R2 changed from l3gateway to patch, and, on hv1,
> alice_R2 was added to related ports.
> When R2 was added back (together with R2-alice and R2-join), alice_R2 changed
> back from patch to l3gateway and alice_R2 remained in related_lports on hv1.
>
> This is now fixed: a l3_gateway port is not a related_port if not for our 
> chassis.
>
> A test case has been modified to highlight the error, but also to
> make the test easier to read (e.g avoid using same name for a port and
> for a switch).
>
> Signed-off-by: Xavier Simonart 

Acked-by: Numan Siddique 

Numan

> ---
>  controller/binding.c | 12 
>  tests/ovn.at | 12 
>  2 files changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/controller/binding.c b/controller/binding.c
> index c9658cb2a..3b36ed881 100644
> --- a/controller/binding.c
> +++ b/controller/binding.c
> @@ -1824,7 +1824,7 @@ consider_localport(const struct sbrec_port_binding *pb,
>   */
>  static bool
>  consider_nonvif_lport_(const struct sbrec_port_binding *pb,
> -   bool our_chassis,
> +   bool our_chassis, bool is_ha_chassis,
> struct binding_ctx_in *b_ctx_in,
> struct binding_ctx_out *b_ctx_out)
>  {
> @@ -1844,6 +1844,9 @@ consider_nonvif_lport_(const struct sbrec_port_binding 
> *pb,
> b_ctx_out->if_mgr,
> b_ctx_out->postponed_ports);
>  }
> +if (!is_ha_chassis) {
> +remove_related_lport(pb, b_ctx_out);
> +}
>
>  if (pb->chassis == b_ctx_in->chassis_rec
>  || is_additional_chassis(pb, b_ctx_in->chassis_rec)
> @@ -1867,7 +1870,7 @@ consider_l2gw_lport(const struct sbrec_port_binding *pb,
>  bool our_chassis = chassis_id && !strcmp(chassis_id,
>   b_ctx_in->chassis_rec->name);
>
> -return consider_nonvif_lport_(pb, our_chassis, b_ctx_in, b_ctx_out);
> +return consider_nonvif_lport_(pb, our_chassis, false, b_ctx_in, 
> b_ctx_out);
>  }
>
>  static bool
> @@ -1879,7 +1882,7 @@ consider_l3gw_lport(const struct sbrec_port_binding *pb,
>  bool our_chassis = chassis_id && !strcmp(chassis_id,
>   b_ctx_in->chassis_rec->name);
>
> -return consider_nonvif_lport_(pb, our_chassis, b_ctx_in, b_ctx_out);
> +return consider_nonvif_lport_(pb, our_chassis, false, b_ctx_in, 
> b_ctx_out);
>  }
>
>  static void
> @@ -1942,7 +1945,8 @@ consider_ha_lport(const struct sbrec_port_binding *pb,
>  update_related_lport(pb, b_ctx_out);
>  }
>
> -return consider_nonvif_lport_(pb, our_chassis, b_ctx_in, b_ctx_out);
> +return consider_nonvif_lport_(pb, our_chassis, is_ha_chassis, b_ctx_in,
> +  b_ctx_out);
>  }
>
>  static bool
> diff --git a/tests/ovn.at b/tests/ovn.at
> index f974cbb15..3c888aaf5 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -7781,9 +7781,9 @@ ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port 
> rp-foo \
>  type=router options:router-port=foo addresses=\"00:00:01:01:02:03\"
>
>  # Connect alice to R2
> -ovn-nbctl lrp-add R2 alice 00:00:02:01:02:03 172.16.1.1/24
> -ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
> -type=router options:router-port=alice addresses=\"00:00:02:01:02:03\"
> +ovn-nbctl lrp-add R2 R2-alice 00:00:02:01:02:03 172.16.1.1/24
> +ovn-nbctl lsp-add alice alice-R2 -- set Logical_Switch_Port alice-R2 \
> +type=router options:router-port=R2-alice addresses=\"00:00:02:01:02:03\"
>
>  # Connect R1 to join
>  ovn-nbctl lrp-add R1 R1_join 00:00:04:01:02:03 20.0.0.1/24
> @@ -7871,11 +7871,13 @@ echo ""
>  echo $expected > expected
>  OVN_CHECK_PACKETS([hv2/vif1-tx.pcap], [expected])
>
> +CHECK_FLOWS_AFTER_RECOMPUTE([hv1], [hv1])
> +
>  # Delete the router and re-create it. Things should work as before.
>  ovn-nbctl  lr-del R2
>  ovn-nbctl create Logical_Router name=R2 options:chassis="hv2"
>  # Connect alice to R2
> -ovn-nbctl lrp-add R2 alice 00:00:02:01:02:03 172.16.1.1/24
> +ovn-nbctl lrp-add R2 R2-alice 00:00:02:01:02:03 172.16.1.1/24
>  # Connect R2 to join
>  ovn-nbctl lrp-add R2 R2_join 00:00:04:01:02:04 20.0.0.2/24
>
> @@ -7887,6 +7889,8 @@ R2 static_routes @lrt
>  wait_for_ports_up
>  check ovn-nbctl --wait=hv sync
>
> +CHECK_FLOWS_AFTER_RECOMPUTE([hv1], [hv1])
> +
>  # Send the packet again.
>  as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>
> --
> 2.31.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitc

Re: [ovs-dev] [PATCH ovn 1/5] controller: Fix iface-id-ver handling.

2024-05-30 Thread Numan Siddique
On Tue, Apr 23, 2024 at 7:53 AM Xavier Simonart  wrote:
>
> If iface-id-ver was wrong and modified to a correct value,
> the port was correctly claimed, but the flows were not installed
> by I+P.
>
> Signed-off-by: Xavier Simonart 

Acked-by: Numan Siddique 

Numan

> ---
>  controller/binding.c | 14 ++
>  tests/ovn.at |  2 ++
>  2 files changed, 16 insertions(+)
>
> diff --git a/controller/binding.c b/controller/binding.c
> index 8ac2ce3e2..c9658cb2a 100644
> --- a/controller/binding.c
> +++ b/controller/binding.c
> @@ -757,6 +757,8 @@ update_active_pb_ras_pd(const struct sbrec_port_binding 
> *pb,
>  }
>  }
>
> +static bool is_ext_id_changed(const struct smap *a, const struct smap *b,
> +  const char *key);
>  static struct local_binding *local_binding_create(
>  const char *name, const struct ovsrec_interface *);
>  static void local_binding_add(struct shash *local_bindings,
> @@ -2311,6 +2313,18 @@ consider_iface_claim(const struct ovsrec_interface 
> *iface_rec,
>  return true;
>  }
>
> +/* Check if iface-id-ver just becomes correct */
> +struct smap *external_ids_old =
> +shash_find_data(b_ctx_in->iface_table_external_ids_old,
> +iface_rec->name);
> +
> +if (external_ids_old &&
> +is_ext_id_changed(&iface_rec->external_ids,
> +  external_ids_old,
> +  "iface-id-ver")) {
> +b_ctx_out->local_lports_changed = true;
> +}
> +
>  /* If multiple bindings to the same port, remove the "old" binding.
>   * This ensures that change tracking is correct.
>   */
> diff --git a/tests/ovn.at b/tests/ovn.at
> index e81cd4f45..f974cbb15 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -33239,6 +33239,8 @@ check as hv1 ovs-vsctl set interface vif11 
> external_ids:iface-id-ver=foo
>
>  wait_for_ports_up sw0-port1
>
> +CHECK_FLOWS_AFTER_RECOMPUTE([hv1], [hv1])
> +
>  AT_CHECK([as hv1 ovn-appctl -t ovn-controller debug/dump-local-bindings], 
> [0], [dnl
>  Local bindings:
>  name: [[sw0-port1]], OVS interface name : [[vif11]], num binding lports : 
> [[1]]
> --
> 2.31.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] northd: Fix logical router load-balancer nat rules when using DGP.

2024-05-30 Thread Roberto Bartzen Acosta
Em qui., 2 de mai. de 2024 às 10:11, Numan Siddique 
escreveu:

> On Fri, Mar 22, 2024 at 7:51 AM Roberto Bartzen Acosta
>  wrote:
> >
> > Hi Mark,
> >
> > Thanks for your feedback.
> >
> > Em seg., 18 de mar. de 2024 às 11:53, Mark Michelson <
> mmich...@redhat.com>
> > escreveu:
> >
> > > Hi Roberto,
> > >
> > > I have some concerns about this patch. Let's use the test case you
> added
> > > as an example network. Let's bind the vms and DGPs to hypervisors:
> > >
> > > * vm1 and lr1-ts1 are bound to hypervisor hv1
> > > * vm2 and lr1-ts2 are bound to hypervisor hv2
> > >
> > > Now imagine a packet arrives on lr1-ts1. The packet gets load balanced
> > > and sent to vm2. In this case, since lr1-ts1 is on hv1, the
> ct_lb_mark()
> > > action runs there, creating a conntrack entry on hv1. However, the
> > > packet eventually is tunneled to hv2 so that it can be output to vm2.
> > >
> > > Now vm2 replies to the packet. Is there anything that ensures that the
> > > reply from vm2 gets sent to hv1 for the egress pipeline of lr1? If so,
> > > then this should work fine since the packet will be unDNATted as
> > > expected on hv1. However, if the packet runs all of lr1's pipelines on
> > > hv2, then there is no conntrack entry present, and the attempt to
> unDNAT
> > > will not succeed. The packet will either be dropped because of invalid
> > > CT, or the packet will have an incorrect source IP and port. Either
> way,
> > > this isn't what is desired.
> > >
> >
> > yep, you're right! that makes sense.
> > If the packet comes back with a chassis that does not have the related LB
> > contrack entry corresponding to the initial request, this will trigger a
> > miss in the pipeline.
> >
> > I tried to disable ct tracking for lb by configuring the parameter on the
> > router, but I still wasn't successful. E.g.:
> > options:lb_force_snat_ip="172.16.200.201"
> >
> > Even changing the lb_force snat ip on the router, or creating a SNAT
> > "stateless" rule, I still see this action in the output pipeline in the
> > highest priority table (table=1).
> >
> >   table=1 (lr_out_undnat  ), priority=120  , match=(ip4 && ((ip4.src
> ==
> > 10.195.185.2 && tcp.src == 8000) || (ip4.src == 10.195.185.3 && tcp.src
> ==
> > 8000) || (ip4.src == 10.195.185.4 && tcp.src == 8000) || (ip4.src ==
> > 10.195.185.5 && tcp.src == 8000) || (ip4.src == 10.195.185.6 && tcp.src
> ==
> > 8000)) && outport == "incus-net40-lr-lrp-01" &&
> > is_chassis_resident("cr-incus-net40-lr-lrp-01")),
> action=(ct_dnat_in_czone;)
> >   table=1 (lr_out_undnat  ), priority=120  , match=(ip4 && ((ip4.src
> ==
> > 10.195.185.2 && tcp.src == 8000) || (ip4.src == 10.195.185.3 && tcp.src
> ==
> > 8000) || (ip4.src == 10.195.185.4 && tcp.src == 8000) || (ip4.src ==
> > 10.195.185.5 && tcp.src == 8000) || (ip4.src == 10.195.185.6 && tcp.src
> ==
> > 8000)) && outport == "incus-net40-lr-lrp-02" &&
> > is_chassis_resident("cr-incus-net40-lr-lrp-02")),
> action=(ct_dnat_in_czone;)
> >   table=1 (lr_out_undnat  ), priority=120  , match=(ip4 && ((ip4.src
> ==
> > 10.195.185.2 && tcp.src == 8000) || (ip4.src == 10.195.185.3 && tcp.src
> ==
> > 8000) || (ip4.src == 10.195.185.4 && tcp.src == 8000) || (ip4.src ==
> > 10.195.185.5 && tcp.src == 8000) || (ip4.src == 10.195.185.6 && tcp.src
> ==
> > 8000)) && outport == "incus-net40-lr-lrp-03" &&
> > is_chassis_resident("cr-incus-net40-lr-lrp-03")),
> action=(ct_dnat_in_czone;)
> >   table=1 (lr_out_undnat  ), priority=120  , match=(ip4 && ((ip4.src
> ==
> > 10.195.185.2 && tcp.src == 8000) || (ip4.src == 10.195.185.3 && tcp.src
> ==
> > 8000) || (ip4.src == 10.195.185.4 && tcp.src == 8000) || (ip4.src ==
> > 10.195.185.5 && tcp.src == 8000) || (ip4.src == 10.195.185.6 && tcp.src
> ==
> > 8000)) && outport == "incus-net40-lr-lrp-04" &&
> > is_chassis_resident("cr-incus-net40-lr-lrp-04")),
> action=(ct_dnat_in_czone;)
> >   table=1 (lr_out_undnat  ), priority=120  , match=(ip4 && ((ip4.src
> ==
> > 10.195.185.2 && tcp.src == 8000) || (ip4.src == 10.195.185.3 && tcp.src
> ==
> > 8000) || (ip4.src == 10.195.185.4 && tcp.src == 8000) || (ip4.src ==
> > 10.195.185.5 && tcp.src == 8000) || (ip4.src == 10.195.185.6 && tcp.src
> ==
> > 8000)) && outport == "incus-net40-lr-lrp-ext" &&
> > is_chassis_resident("cr-incus-net40-lr-lrp-ext")),
> > action=(ct_dnat_in_czone;)
> >
> >
> > Considering that the return when using multiple DGPs is probably
> associated
> > with ECMP, any idea on how to change the rules so that the LB output
> rules
> > are 'stateless' (not ct dependent) in this case with multiple DGPs? I
> > imagine this solves the problem and guarantees a return for any chassis.
>
> I think the only way to solve your problem is to add NAT support for a
> router having multiple DGPs.
> I'm not sure how easy that is or if it is even possible to support.
> We need to explore this.
>


Hi Numan, thanks for your feedback.

I'm not convinced that adding stateful NAT support to a distributed port
would work and/or be a g

Re: [ovs-dev] [v4] odp-execute: Fix AVX checksum calculation.

2024-05-30 Thread Eelco Chaudron


On 30 May 2024, at 15:28, Eelco Chaudron wrote:

> On 30 May 2024, at 14:46, Finn, Emma wrote:
>
>>> -Original Message-
>>> From: Eelco Chaudron 
>>> Sent: Wednesday, May 29, 2024 3:23 PM
>>> To: Finn, Emma 
>>> Cc: Ilya Maximets ; ovs-dev@openvswitch.org; Van
>>> Haaren, Harry 
>>> Subject: Re: [v4] odp-execute: Fix AVX checksum calculation.
>>>
>>>
>>>
>>> On 29 May 2024, at 14:51, Ilya Maximets wrote:
>>>
 On 5/29/24 11:01, Eelco Chaudron wrote:
>
>
> On 28 May 2024, at 16:49, Ilya Maximets wrote:
>
>> On 5/28/24 14:36, Eelco Chaudron wrote:
>>>
>>>
>>> On 24 May 2024, at 11:20, Emma Finn wrote:
>>>
 The AVX implementation for calcualting checksums was not handling
 carry-over addition correctly in some cases.
 This patch adds an additional shuffle to add 16-bit padding to the
 final part of the calculation to handle such cases. This commit
 also adds a unit test to check the checksum carry-bits issue with
 actions autovalidator enabled.
>>>
>>> Hi Emma,
>>>
>>> I made the small changes, and did some more testing before I committed.
>>> However, there are more failures in the same area with or without your 
>>> patch.
>>> I’m holding of committing this patch as it might be related.
>>>
>>
>> Hi Eelco,
>>
>> These tests are unrelated to this patch so I think we should go ahead and 
>> merge this.
>
> Ok, I’ll go ahead and apply it later today.
>
>>> The failing tests are (on latest main branch):
>>>
>>> 1064: ofproto - implicit mask of ipv6 proto with HOPOPT field FAILED
>>> (ofproto.at:6668)
>>
>> I investigated this test and the SIMD implementation isn't handling traffic 
>> class field correctly. I'm on PTO for the next week but I will make a fix 
>> for this once I'm back.
>
> Thanks!
>
>>> 2615: nsh - triangle PTAP bridge setup with NSH over vxlan-gpe FAILED
>>> (nsh.at:816)
>>>
>> For this one it looks like the scalar is expecting an ipv4 checksum of 0x000 
>> and the SIMD implementation has calculated an ipv4 checksum of 0xDF77.
>> This is more a logic question whether or not the checksum should be 
>> calculated for this? Thoughts?
>
> I need to look at the tests, but if it’s a UDP packet, and the original UDP 
> checksum was 0, it should stay zero.


In addition, any idea why these tests do not fail in Intel’s upstream unit 
tests? Do they use different hardware? Copied in Michael, maybe he knows more 
about the setup/tests.

//Eelco

>>> Here are some details:
>>>
>>> 2024-05-29T14:18:53.923Z|00119|odp_execute_impl|ERR|Autovalidation
>>> of avx512 failed. Details:
>>> Packet: 0
>>> Action : set(ipv6(tclass=0x2/0x3))
>>> Good hex:
>>>   50 54 00 00 00 0c 50 54-00 00 00 0b 86 dd 60 20
>>> 0010  00 00 00 48 01 40 20 01-0d b8 00 00 00 00 00 00
>>> 0020  00 00 00 00 00 01 01 11-0d b8 00 00 00 00 00 00
>>> 0030  00 00 00 00 00 05 00 00-1b fc 00 00 00 00 00 01
>>> 0040  02 03 04 05 06 07 08 09-0a 0b 0c 0d 0e 0f 10 11
>>> 0050  12 13 14 15 16 17 18 19-1a 1b 1c 1d 1e 1f 20 21
>>> 0060  22 23 24 25 26 27 28 29-2a 2b 2c 2d 2e 2f 30 31
>>> 0070  32 33 34 35 36 37 38 39-3a 3b 3c 3d 3e 3f Test hex:
>>>   50 54 00 00 00 0c 50 54-00 00 00 0b 86 dd 60 00
>>> 0010  00 00 00 48 01 40 20 01-0d b8 00 00 00 00 00 00
>>> 0020  00 00 00 00 00 01 01 11-0d b8 00 00 00 00 00 00
>>> 0030  00 00 00 00 00 05 00 00-1b fc 00 00 00 00 00 01
>>> 0040  02 03 04 05 06 07 08 09-0a 0b 0c 0d 0e 0f 10 11
>>> 0050  12 13 14 15 16 17 18 19-1a 1b 1c 1d 1e 1f 20 21
>>> 0060  22 23 24 25 26 27 28 29-2a 2b 2c 2d 2e 2f 30 31
>>> 0070  32 33 34 35 36 37 38 39-3a 3b 3c 3d 3e 3f 2024-05-
>>> 29T14:18:53.926Z|00120|unixctl|DBG|received request netdev-
>>> dummy/receive["p1","in_port(1),eth(src=50:54:00:00:00:0b,dst=50:54:00:0
>>> 0:00:0c),eth_type(0x86dd),ipv6(src=2001:db8::1,dst=111:db8::6,proto=1,tcl
>>> ass=0,hlimit=64,frag=no),icmpv6(type=0,code=8)"], id=0 2024-05-
>>> 29T14:18:53.926Z|00121|unixctl|DBG|replying with success, id=0: ""
>>> 2024-05-29T14:18:53.926Z|00122|odp_execute_impl|ERR|Autovalidation
>>> of avx512 failed. Details:
>>> Packet: 0
>>> Action : set(ipv6(tclass=0x40/0xfc))
>>> Good hex:
>>>   50 54 00 00 00 0c 50 54-00 00 00 0b 86 dd 64 00
>>> 0010  00 00 00 48 01 40 20 01-0d b8 00 00 00 00 00 00
>>> 0020  00 00 00 00 00 01 01 11-0d b8 00 00 00 00 00 00
>>> 0030  00 00 00 00 00 06 00 00-1b fc 00 00 00 00 00 01
>>> 0040  02 03 04 05 06 07 08 09-0a 0b 0c 0d 0e 0f 10 11
>>> 0050  12 13 14 15 16 17 18 19-1a 1b 1c 1d 1e 1f 20 21
>>> 0060  22 23 24 25 26 27 28 29-2a 2b 2c 2d 2e 2f 30 31
>>> 0070  32 33 34 35 36 37 38 39-3a 3b 3c 3d 3e 3f Test hex:
>>>   50 54 00 00 00 0c 50 54-00 00 00 0b 86 dd 60 00
>>> 0010  00 00 00 48 01 40 20 01-0d b8 00 00 00 00 00 00
>>> 0020  00 00 00 00 00 01 01 11-0d b8 00 00 00 00 00 00
>>> 0030  00 00 00 00 00 06 00 00-1b fc 00 00 00 00 00 01
>>> 0040  02 03 04 05 06 07 08 09-

Re: [ovs-dev] [v4] odp-execute: Fix AVX checksum calculation.

2024-05-30 Thread Eelco Chaudron


On 30 May 2024, at 14:46, Finn, Emma wrote:

>> -Original Message-
>> From: Eelco Chaudron 
>> Sent: Wednesday, May 29, 2024 3:23 PM
>> To: Finn, Emma 
>> Cc: Ilya Maximets ; ovs-dev@openvswitch.org; Van
>> Haaren, Harry 
>> Subject: Re: [v4] odp-execute: Fix AVX checksum calculation.
>>
>>
>>
>> On 29 May 2024, at 14:51, Ilya Maximets wrote:
>>
>>> On 5/29/24 11:01, Eelco Chaudron wrote:


 On 28 May 2024, at 16:49, Ilya Maximets wrote:

> On 5/28/24 14:36, Eelco Chaudron wrote:
>>
>>
>> On 24 May 2024, at 11:20, Emma Finn wrote:
>>
>>> The AVX implementation for calcualting checksums was not handling
>>> carry-over addition correctly in some cases.
>>> This patch adds an additional shuffle to add 16-bit padding to the
>>> final part of the calculation to handle such cases. This commit
>>> also adds a unit test to check the checksum carry-bits issue with
>>> actions autovalidator enabled.
>>
>> Hi Emma,
>>
>> I made the small changes, and did some more testing before I committed.
>> However, there are more failures in the same area with or without your patch.
>> I’m holding of committing this patch as it might be related.
>>
>
> Hi Eelco,
>
> These tests are unrelated to this patch so I think we should go ahead and 
> merge this.

Ok, I’ll go ahead and apply it later today.

>> The failing tests are (on latest main branch):
>>
>> 1064: ofproto - implicit mask of ipv6 proto with HOPOPT field FAILED
>> (ofproto.at:6668)
>
> I investigated this test and the SIMD implementation isn't handling traffic 
> class field correctly. I'm on PTO for the next week but I will make a fix for 
> this once I'm back.

Thanks!

>> 2615: nsh - triangle PTAP bridge setup with NSH over vxlan-gpe FAILED
>> (nsh.at:816)
>>
> For this one it looks like the scalar is expecting an ipv4 checksum of 0x000 
> and the SIMD implementation has calculated an ipv4 checksum of 0xDF77.
> This is more a logic question whether or not the checksum should be 
> calculated for this? Thoughts?

I need to look at the tests, but if it’s a UDP packet, and the original UDP 
checksum was 0, it should stay zero.

>> Here are some details:
>>
>> 2024-05-29T14:18:53.923Z|00119|odp_execute_impl|ERR|Autovalidation
>> of avx512 failed. Details:
>> Packet: 0
>> Action : set(ipv6(tclass=0x2/0x3))
>> Good hex:
>>   50 54 00 00 00 0c 50 54-00 00 00 0b 86 dd 60 20
>> 0010  00 00 00 48 01 40 20 01-0d b8 00 00 00 00 00 00
>> 0020  00 00 00 00 00 01 01 11-0d b8 00 00 00 00 00 00
>> 0030  00 00 00 00 00 05 00 00-1b fc 00 00 00 00 00 01
>> 0040  02 03 04 05 06 07 08 09-0a 0b 0c 0d 0e 0f 10 11
>> 0050  12 13 14 15 16 17 18 19-1a 1b 1c 1d 1e 1f 20 21
>> 0060  22 23 24 25 26 27 28 29-2a 2b 2c 2d 2e 2f 30 31
>> 0070  32 33 34 35 36 37 38 39-3a 3b 3c 3d 3e 3f Test hex:
>>   50 54 00 00 00 0c 50 54-00 00 00 0b 86 dd 60 00
>> 0010  00 00 00 48 01 40 20 01-0d b8 00 00 00 00 00 00
>> 0020  00 00 00 00 00 01 01 11-0d b8 00 00 00 00 00 00
>> 0030  00 00 00 00 00 05 00 00-1b fc 00 00 00 00 00 01
>> 0040  02 03 04 05 06 07 08 09-0a 0b 0c 0d 0e 0f 10 11
>> 0050  12 13 14 15 16 17 18 19-1a 1b 1c 1d 1e 1f 20 21
>> 0060  22 23 24 25 26 27 28 29-2a 2b 2c 2d 2e 2f 30 31
>> 0070  32 33 34 35 36 37 38 39-3a 3b 3c 3d 3e 3f 2024-05-
>> 29T14:18:53.926Z|00120|unixctl|DBG|received request netdev-
>> dummy/receive["p1","in_port(1),eth(src=50:54:00:00:00:0b,dst=50:54:00:0
>> 0:00:0c),eth_type(0x86dd),ipv6(src=2001:db8::1,dst=111:db8::6,proto=1,tcl
>> ass=0,hlimit=64,frag=no),icmpv6(type=0,code=8)"], id=0 2024-05-
>> 29T14:18:53.926Z|00121|unixctl|DBG|replying with success, id=0: ""
>> 2024-05-29T14:18:53.926Z|00122|odp_execute_impl|ERR|Autovalidation
>> of avx512 failed. Details:
>> Packet: 0
>> Action : set(ipv6(tclass=0x40/0xfc))
>> Good hex:
>>   50 54 00 00 00 0c 50 54-00 00 00 0b 86 dd 64 00
>> 0010  00 00 00 48 01 40 20 01-0d b8 00 00 00 00 00 00
>> 0020  00 00 00 00 00 01 01 11-0d b8 00 00 00 00 00 00
>> 0030  00 00 00 00 00 06 00 00-1b fc 00 00 00 00 00 01
>> 0040  02 03 04 05 06 07 08 09-0a 0b 0c 0d 0e 0f 10 11
>> 0050  12 13 14 15 16 17 18 19-1a 1b 1c 1d 1e 1f 20 21
>> 0060  22 23 24 25 26 27 28 29-2a 2b 2c 2d 2e 2f 30 31
>> 0070  32 33 34 35 36 37 38 39-3a 3b 3c 3d 3e 3f Test hex:
>>   50 54 00 00 00 0c 50 54-00 00 00 0b 86 dd 60 00
>> 0010  00 00 00 48 01 40 20 01-0d b8 00 00 00 00 00 00
>> 0020  00 00 00 00 00 01 01 11-0d b8 00 00 00 00 00 00
>> 0030  00 00 00 00 00 06 00 00-1b fc 00 00 00 00 00 01
>> 0040  02 03 04 05 06 07 08 09-0a 0b 0c 0d 0e 0f 10 11
>> 0050  12 13 14 15 16 17 18 19-1a 1b 1c 1d 1e 1f 20 21
>> 0060  22 23 24 25 26 27 28 29-2a 2b 2c 2d 2e 2f 30 31
>> 0070  32 33 34 35 36 37 38 39-3a 3b 3c 3d 3e 3f
>>
>> And
>>
>> 2024-05-29T14:18:54.503Z|00659|odp_execute_impl|ERR|Autovalidation
>> of avx512 failed. Details:
>> Packet: 0
>> Action : set(ipv4(src=30.0.0.1

[ovs-dev] [PATCH v4 6/6] netdev-dpdk: Refactor tunnel checksum offloading.

2024-05-30 Thread David Marchand
All informations required for checksum offloading can be deducted by
already tracked dp_packet l3_ofs, l4_ofs, inner_l3_ofs and inner_l4_ofs
fields.
Remove DPDK specific l[2-4]_len from generic OVS code.

netdev-dpdk code then fills mbuf specifics step by step:
- outer_l2_len and outer_l3_len are needed for tunneling (and below
  features),
- l2_len and l3_len are needed for IP and L4 checksum (and below features),
- l4_len and tso_segsz are needed when doing TSO,

Signed-off-by: David Marchand 
---
 lib/dp-packet.h | 37 --
 lib/netdev-dpdk.c   | 35 ++---
 lib/netdev-native-tnl.c | 50 +
 3 files changed, 27 insertions(+), 95 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 3622764c47..a75b1c5cdb 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -604,25 +604,6 @@ dp_packet_get_nd_payload(const struct dp_packet *b)
 }
 
 #ifdef DPDK_NETDEV
-static inline void
-dp_packet_set_l2_len(struct dp_packet *b, size_t l2_len)
-{
-b->mbuf.l2_len = l2_len;
-}
-
-static inline void
-dp_packet_set_l3_len(struct dp_packet *b, size_t l3_len)
-{
-b->mbuf.l3_len = l3_len;
-}
-
-static inline void
-dp_packet_set_l4_len(struct dp_packet *b, size_t l4_len)
-{
-b->mbuf.l4_len = l4_len;
-}
-
-
 static inline uint64_t *
 dp_packet_ol_flags_ptr(const struct dp_packet *b)
 {
@@ -642,24 +623,6 @@ dp_packet_flow_mark_ptr(const struct dp_packet *b)
 }
 
 #else
-static inline void
-dp_packet_set_l2_len(struct dp_packet *b OVS_UNUSED, size_t l2_len OVS_UNUSED)
-{
-/* There is no implementation. */
-}
-
-static inline void
-dp_packet_set_l3_len(struct dp_packet *b OVS_UNUSED, size_t l3_len OVS_UNUSED)
-{
-/* There is no implementation. */
-}
-
-static inline void
-dp_packet_set_l4_len(struct dp_packet *b OVS_UNUSED, size_t l4_len OVS_UNUSED)
-{
-/* There is no implementation. */
-}
-
 static inline uint32_t *
 dp_packet_ol_flags_ptr(const struct dp_packet *b)
 {
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index bda3fa94b6..0fa37d5145 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2583,6 +2583,9 @@ static bool
 netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf)
 {
 struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf);
+void *l2;
+void *l3;
+void *l4;
 
 const uint64_t all_inner_requests = (RTE_MBUF_F_TX_IP_CKSUM |
  RTE_MBUF_F_TX_L4_MASK |
@@ -2612,11 +2615,6 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
struct rte_mbuf *mbuf)
 return true;
 }
 
-ovs_assert(dp_packet_l4(pkt));
-
-/* If packet is vxlan or geneve tunnel packet, calculate outer
- * l2 len and outer l3 len. Inner l2/l3/l4 len are calculated
- * before. */
 const uint64_t tunnel_type = mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK;
 if (OVS_UNLIKELY(tunnel_type &&
  tunnel_type != RTE_MBUF_F_TX_TUNNEL_GENEVE &&
@@ -2634,6 +2632,11 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
struct rte_mbuf *mbuf)
  (char *) dp_packet_eth(pkt);
 mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) -
  (char *) dp_packet_l3(pkt);
+
+/* Inner L2 length must account for the tunnel header length. */
+l2 = dp_packet_l4(pkt);
+l3 = dp_packet_inner_l3(pkt);
+l4 = dp_packet_inner_l4(pkt);
 } else {
 /* If no outer offloading is requested, clear outer marks. */
 mbuf->ol_flags &= ~all_outer_marks;
@@ -2641,8 +2644,9 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
struct rte_mbuf *mbuf)
 mbuf->outer_l3_len = 0;
 
 /* Skip outer headers. */
-mbuf->l2_len += (char *) dp_packet_l4(pkt) -
-(char *) dp_packet_eth(pkt);
+l2 = dp_packet_eth(pkt);
+l3 = dp_packet_inner_l3(pkt);
+l4 = dp_packet_inner_l4(pkt);
 }
 } else {
 if (tunnel_type) {
@@ -2662,22 +2666,27 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
struct rte_mbuf *mbuf)
 }
 mbuf->outer_l2_len = 0;
 mbuf->outer_l3_len = 0;
-mbuf->l2_len = (char *) dp_packet_l3(pkt) -
-   (char *) dp_packet_eth(pkt);
-mbuf->l3_len = (char *) dp_packet_l4(pkt) -
-   (char *) dp_packet_l3(pkt);
+
+l2 = dp_packet_eth(pkt);
+l3 = dp_packet_l3(pkt);
+l4 = dp_packet_l4(pkt);
 }
 
+ovs_assert(l4);
+
+mbuf->l2_len = (char *) l3 - (char *) l2;
+mbuf->l3_len = (char *) l4 - (char *) l3;
+
 if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-struct tcp_header *th = dp_packet_l4(pkt);
+struct tcp_header *th = l4;
 uint16_t link_tso_segsz;
 int hdr_len;
 
+mbuf->l4_len = TCP_OFFSET(th->tcp_ctl) * 4;
  

[ovs-dev] [PATCH v4 5/6] netdev-dpdk: Use guest TSO segmentation size hint.

2024-05-30 Thread David Marchand
In a typical setup like:
guest A <-virtio-> OVS A <-vxlan-> OVS B <-virtio-> guest B

TSO packets from guest A are segmented against the OVS A physical port
mtu adjusted by the vxlan tunnel header size, regardless of guest A
interface mtu.

As an example, let's say guest A and guest B mtu are set to 1500 bytes.
OVS A and OVS B physical ports mtu are set to 1600 bytes.
Guest A will request TCP segmentation for 1448 bytes segments.
On the other hand, OVS A will request 1498 bytes segments to the HW.
This results in OVS B dropping packets because decapsulated packets
are larger than the vhost-user port (serving guest B) mtu.

2024-04-17T14:13:01.239Z|2|netdev_dpdk(pmd-c03/id:7)|WARN|vhost0:
Too big size 1564 max_packet_len 1518

vhost-user ports expose a guest mtu by filling mbuf->tso_segsz.
Use it as a hint.

This may result in segments (on the wire) slightly shorter than the
optimal size.

Reported-at: https://github.com/openvswitch/ovs-issues/issues/321
Signed-off-by: David Marchand 
---
Changes since v3:
- removed check on mbuf->tso_segsz == 0,

---
 lib/netdev-dpdk.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 0dfd685467..bda3fa94b6 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2670,14 +2670,19 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
struct rte_mbuf *mbuf)
 
 if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
 struct tcp_header *th = dp_packet_l4(pkt);
+uint16_t link_tso_segsz;
 int hdr_len;
 
 if (tunnel_type) {
-mbuf->tso_segsz = dev->mtu - mbuf->l2_len - mbuf->l3_len -
-  mbuf->l4_len - mbuf->outer_l3_len;
+link_tso_segsz = dev->mtu - mbuf->l2_len - mbuf->l3_len -
+ mbuf->l4_len - mbuf->outer_l3_len;
 } else {
 mbuf->l4_len = TCP_OFFSET(th->tcp_ctl) * 4;
-mbuf->tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len;
+link_tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len;
+}
+
+if (mbuf->tso_segsz > link_tso_segsz) {
+mbuf->tso_segsz = link_tso_segsz;
 }
 
 hdr_len = mbuf->l2_len + mbuf->l3_len + mbuf->l4_len;
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4 4/6] netdev-dpdk: Refactor TSO request code.

2024-05-30 Thread David Marchand
Every L3, L4 checksum offload or TSO requires a (outer) L3 length to be
provided.
This length is computed via dp_packet_l4(pkt) that is always set when
such offloads are requested in OVS.
Getting a th == NULL is a bug in OVS, so an assert() is more appropriate.

Besides, filling l4_len and tso_segsz only matters to TSO, so there is
no need to set it for other L4 checksum offloading requests.

Signed-off-by: David Marchand 
---
Changes since v3:
- reworded commitlog,

---
 lib/netdev-dpdk.c | 36 +++-
 1 file changed, 11 insertions(+), 25 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 0c624d5d38..0dfd685467 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2583,7 +2583,6 @@ static bool
 netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf)
 {
 struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf);
-struct tcp_header *th;
 
 const uint64_t all_inner_requests = (RTE_MBUF_F_TX_IP_CKSUM |
  RTE_MBUF_F_TX_L4_MASK |
@@ -2613,6 +2612,8 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
struct rte_mbuf *mbuf)
 return true;
 }
 
+ovs_assert(dp_packet_l4(pkt));
+
 /* If packet is vxlan or geneve tunnel packet, calculate outer
  * l2 len and outer l3 len. Inner l2/l3/l4 len are calculated
  * before. */
@@ -2666,22 +2667,10 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
struct rte_mbuf *mbuf)
 mbuf->l3_len = (char *) dp_packet_l4(pkt) -
(char *) dp_packet_l3(pkt);
 }
-th = dp_packet_l4(pkt);
 
 if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-if (!th) {
-VLOG_WARN_RL(&rl, "%s: TCP Segmentation without L4 header"
- " pkt len: %"PRIu32"", dev->up.name, mbuf->pkt_len);
-return false;
-}
-}
-
-if ((mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK) == RTE_MBUF_F_TX_TCP_CKSUM) {
-if (!th) {
-VLOG_WARN_RL(&rl, "%s: TCP offloading without L4 header"
- " pkt len: %"PRIu32"", dev->up.name, mbuf->pkt_len);
-return false;
-}
+struct tcp_header *th = dp_packet_l4(pkt);
+int hdr_len;
 
 if (tunnel_type) {
 mbuf->tso_segsz = dev->mtu - mbuf->l2_len - mbuf->l3_len -
@@ -2691,16 +2680,13 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
struct rte_mbuf *mbuf)
 mbuf->tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len;
 }
 
-if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-int hdr_len = mbuf->l2_len + mbuf->l3_len + mbuf->l4_len;
-if (OVS_UNLIKELY((hdr_len +
-  mbuf->tso_segsz) > dev->max_packet_len)) {
-VLOG_WARN_RL(&rl, "%s: Oversized TSO packet. hdr: %"PRIu32", "
- "gso: %"PRIu32", max len: %"PRIu32"",
- dev->up.name, hdr_len, mbuf->tso_segsz,
- dev->max_packet_len);
-return false;
-}
+hdr_len = mbuf->l2_len + mbuf->l3_len + mbuf->l4_len;
+if (OVS_UNLIKELY((hdr_len + mbuf->tso_segsz) > dev->max_packet_len)) {
+VLOG_WARN_RL(&rl, "%s: Oversized TSO packet. hdr: %"PRIu32", "
+ "gso: %"PRIu32", max len: %"PRIu32"",
+ dev->up.name, hdr_len, mbuf->tso_segsz,
+ dev->max_packet_len);
+return false;
 }
 }
 
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4 3/6] netdev-dpdk: Fix inner checksum when outer is not supported.

2024-05-30 Thread David Marchand
If outer checksum is not supported and OVS already set L3/L4 outer
checksums in the packet, no outer mark should be left in ol_flags
(as it confuses some driver, like net/ixgbe).

l2_len must be adjusted to account for the tunnel header.

Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
Signed-off-by: David Marchand 
Acked-by: Kevin Traynor 
---
 lib/netdev-dpdk.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 7c910cac8e..0c624d5d38 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2628,10 +2628,21 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
struct rte_mbuf *mbuf)
 }
 
 if (tunnel_type && (mbuf->ol_flags & all_inner_requests)) {
-mbuf->outer_l2_len = (char *) dp_packet_l3(pkt) -
- (char *) dp_packet_eth(pkt);
-mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) -
- (char *) dp_packet_l3(pkt);
+if (mbuf->ol_flags & all_outer_requests) {
+mbuf->outer_l2_len = (char *) dp_packet_l3(pkt) -
+ (char *) dp_packet_eth(pkt);
+mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) -
+ (char *) dp_packet_l3(pkt);
+} else {
+/* If no outer offloading is requested, clear outer marks. */
+mbuf->ol_flags &= ~all_outer_marks;
+mbuf->outer_l2_len = 0;
+mbuf->outer_l3_len = 0;
+
+/* Skip outer headers. */
+mbuf->l2_len += (char *) dp_packet_l4(pkt) -
+(char *) dp_packet_eth(pkt);
+}
 } else {
 if (tunnel_type) {
 /* No inner offload is requested, fallback to non tunnel
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4 2/6] netdev-dpdk: Disable outer UDP checksum for net/iavf.

2024-05-30 Thread David Marchand
Same as the commit 6f93d8e62f13 ("netdev-dpdk: Disable outer UDP checksum
offload for ice/i40e driver."), disable outer UDP checksum and related
offloads for net/iavf.

Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
Signed-off-by: David Marchand 
Acked-by: Kevin Traynor 
---
Note:
- DPDK (in progress) fixes can be found at:
  https://patchwork.dpdk.org/project/dpdk/list/?series=31780&state=*

---
 lib/netdev-dpdk.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index e15b491ed5..7c910cac8e 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1355,12 +1355,14 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
 }
 
 if (!strcmp(info.driver_name, "net_ice")
-|| !strcmp(info.driver_name, "net_i40e")) {
+|| !strcmp(info.driver_name, "net_i40e")
+|| !strcmp(info.driver_name, "net_iavf")) {
 /* FIXME: Driver advertises the capability but doesn't seem
  * to actually support it correctly.  Can remove this once
  * the driver is fixed on DPDK side. */
 VLOG_INFO("%s: disabled Tx outer udp checksum offloads for a "
-  "net/ice or net/i40e port.", netdev_get_name(&dev->up));
+  "net/ice, net/i40e or net/iavf port.",
+  netdev_get_name(&dev->up));
 info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM;
 info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO;
 info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO;
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4 1/6] netdev-dpdk: Fallback to non tunnel checksum offloading.

2024-05-30 Thread David Marchand
The outer checksum offloading API in DPDK is ambiguous and was
implemented by Intel folks in their drivers with the assumption that
any outer offloading always goes with an inner offloading request.

With net/i40e and net/ice drivers, in the case of encapsulating a ARP
packet in a vxlan tunnel (which results in requesting outer ip checksum
with a tunnel context but no inner offloading request), a Tx failure is
triggered, associated with a port MDD event.
2024-03-27T16:02:07.084Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR:
MDD event

To avoid this situation, if no checksum or segmentation offloading is
requested on the inner part of a packet, fallback to "normal" (non outer)
offloading request.

Reported-at: https://github.com/openvswitch/ovs-issues/issues/321
Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
Fixes: f81d782c1906 ("netdev-native-tnl: Mark all vxlan/geneve packets as 
tunneled.")
Signed-off-by: David Marchand 
Acked-by: Kevin Traynor 
---
Changes since v2:
- kept offloads disabled for net/i40e and net/ice as this patch does not
  fix outer udp checksum (a DPDK fix is required),
- updated commitlog with details to reproduce the issue,
- adjusted indent,

Changes since v1:
- reset inner marks before converting outer requests,
- fixed some coding style,

---
 lib/netdev-dpdk.c | 71 +++
 1 file changed, 41 insertions(+), 30 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 7b84c858e9..e15b491ed5 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2583,16 +2583,18 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
struct rte_mbuf *mbuf)
 struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf);
 struct tcp_header *th;
 
-const uint64_t all_requests = (RTE_MBUF_F_TX_IP_CKSUM |
-   RTE_MBUF_F_TX_L4_MASK  |
-   RTE_MBUF_F_TX_OUTER_IP_CKSUM  |
-   RTE_MBUF_F_TX_OUTER_UDP_CKSUM |
-   RTE_MBUF_F_TX_TCP_SEG);
-const uint64_t all_marks = (RTE_MBUF_F_TX_IPV4 |
-RTE_MBUF_F_TX_IPV6 |
-RTE_MBUF_F_TX_OUTER_IPV4 |
-RTE_MBUF_F_TX_OUTER_IPV6 |
-RTE_MBUF_F_TX_TUNNEL_MASK);
+const uint64_t all_inner_requests = (RTE_MBUF_F_TX_IP_CKSUM |
+ RTE_MBUF_F_TX_L4_MASK |
+ RTE_MBUF_F_TX_TCP_SEG);
+const uint64_t all_outer_requests = (RTE_MBUF_F_TX_OUTER_IP_CKSUM |
+ RTE_MBUF_F_TX_OUTER_UDP_CKSUM);
+const uint64_t all_requests = all_inner_requests | all_outer_requests;
+const uint64_t all_inner_marks = (RTE_MBUF_F_TX_IPV4 |
+  RTE_MBUF_F_TX_IPV6);
+const uint64_t all_outer_marks = (RTE_MBUF_F_TX_OUTER_IPV4 |
+  RTE_MBUF_F_TX_OUTER_IPV6 |
+  RTE_MBUF_F_TX_TUNNEL_MASK);
+const uint64_t all_marks = all_inner_marks | all_outer_marks;
 
 if (!(mbuf->ol_flags & all_requests)) {
 /* No offloads requested, no marks should be set. */
@@ -2613,34 +2615,43 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
struct rte_mbuf *mbuf)
  * l2 len and outer l3 len. Inner l2/l3/l4 len are calculated
  * before. */
 const uint64_t tunnel_type = mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK;
-if (tunnel_type == RTE_MBUF_F_TX_TUNNEL_GENEVE ||
-tunnel_type == RTE_MBUF_F_TX_TUNNEL_VXLAN) {
-mbuf->outer_l2_len = (char *) dp_packet_l3(pkt) -
- (char *) dp_packet_eth(pkt);
-mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) -
- (char *) dp_packet_l3(pkt);
-
-/* If neither inner checksums nor TSO is requested, inner marks
- * should not be set. */
-if (!(mbuf->ol_flags & (RTE_MBUF_F_TX_IP_CKSUM |
-RTE_MBUF_F_TX_L4_MASK  |
-RTE_MBUF_F_TX_TCP_SEG))) {
-mbuf->ol_flags &= ~(RTE_MBUF_F_TX_IPV4 |
-RTE_MBUF_F_TX_IPV6);
-}
-} else if (OVS_UNLIKELY(tunnel_type)) {
+if (OVS_UNLIKELY(tunnel_type &&
+ tunnel_type != RTE_MBUF_F_TX_TUNNEL_GENEVE &&
+ tunnel_type != RTE_MBUF_F_TX_TUNNEL_VXLAN)) {
 VLOG_WARN_RL(&rl, "%s: Unexpected tunnel type: %#"PRIx64,
  netdev_get_name(&dev->up), tunnel_type);
 netdev_dpdk_mbuf_dump(netdev_get_name(&dev->up),
   "Packet with unexpected tunnel type", mbuf);
 return false;
+}
+
+if (tunnel_type && (mbuf->ol_flags & all_inner_requests)) {
+mbuf->outer_l2_len = (char *) dp_packet_l3(pkt) -
+ (char *) dp_packet_eth(pkt);
+   

Re: [ovs-dev] [v4] odp-execute: Fix AVX checksum calculation.

2024-05-30 Thread Finn, Emma
> -Original Message-
> From: Eelco Chaudron 
> Sent: Wednesday, May 29, 2024 3:23 PM
> To: Finn, Emma 
> Cc: Ilya Maximets ; ovs-dev@openvswitch.org; Van
> Haaren, Harry 
> Subject: Re: [v4] odp-execute: Fix AVX checksum calculation.
> 
> 
> 
> On 29 May 2024, at 14:51, Ilya Maximets wrote:
> 
> > On 5/29/24 11:01, Eelco Chaudron wrote:
> >>
> >>
> >> On 28 May 2024, at 16:49, Ilya Maximets wrote:
> >>
> >>> On 5/28/24 14:36, Eelco Chaudron wrote:
> 
> 
>  On 24 May 2024, at 11:20, Emma Finn wrote:
> 
> > The AVX implementation for calcualting checksums was not handling
> > carry-over addition correctly in some cases.
> > This patch adds an additional shuffle to add 16-bit padding to the
> > final part of the calculation to handle such cases. This commit
> > also adds a unit test to check the checksum carry-bits issue with
> > actions autovalidator enabled.
> 
> Hi Emma,
> 
> I made the small changes, and did some more testing before I committed.
> However, there are more failures in the same area with or without your patch.
> I’m holding of committing this patch as it might be related.
> 

Hi Eelco, 

These tests are unrelated to this patch so I think we should go ahead and merge 
this.

> The failing tests are (on latest main branch):
> 
> 1064: ofproto - implicit mask of ipv6 proto with HOPOPT field FAILED
> (ofproto.at:6668)

I investigated this test and the SIMD implementation isn't handling traffic 
class field correctly. I'm on PTO for the next week but I will make a fix for 
this once I'm back. 

> 2615: nsh - triangle PTAP bridge setup with NSH over vxlan-gpe FAILED
> (nsh.at:816)
> 
For this one it looks like the scalar is expecting an ipv4 checksum of 0x000 
and the SIMD implementation has calculated an ipv4 checksum of 0xDF77. 
This is more a logic question whether or not the checksum should be calculated 
for this? Thoughts?
 
Thanks, 
Emma 
> 
> Here are some details:
> 
> 2024-05-29T14:18:53.923Z|00119|odp_execute_impl|ERR|Autovalidation
> of avx512 failed. Details:
> Packet: 0
> Action : set(ipv6(tclass=0x2/0x3))
> Good hex:
>   50 54 00 00 00 0c 50 54-00 00 00 0b 86 dd 60 20
> 0010  00 00 00 48 01 40 20 01-0d b8 00 00 00 00 00 00
> 0020  00 00 00 00 00 01 01 11-0d b8 00 00 00 00 00 00
> 0030  00 00 00 00 00 05 00 00-1b fc 00 00 00 00 00 01
> 0040  02 03 04 05 06 07 08 09-0a 0b 0c 0d 0e 0f 10 11
> 0050  12 13 14 15 16 17 18 19-1a 1b 1c 1d 1e 1f 20 21
> 0060  22 23 24 25 26 27 28 29-2a 2b 2c 2d 2e 2f 30 31
> 0070  32 33 34 35 36 37 38 39-3a 3b 3c 3d 3e 3f Test hex:
>   50 54 00 00 00 0c 50 54-00 00 00 0b 86 dd 60 00
> 0010  00 00 00 48 01 40 20 01-0d b8 00 00 00 00 00 00
> 0020  00 00 00 00 00 01 01 11-0d b8 00 00 00 00 00 00
> 0030  00 00 00 00 00 05 00 00-1b fc 00 00 00 00 00 01
> 0040  02 03 04 05 06 07 08 09-0a 0b 0c 0d 0e 0f 10 11
> 0050  12 13 14 15 16 17 18 19-1a 1b 1c 1d 1e 1f 20 21
> 0060  22 23 24 25 26 27 28 29-2a 2b 2c 2d 2e 2f 30 31
> 0070  32 33 34 35 36 37 38 39-3a 3b 3c 3d 3e 3f 2024-05-
> 29T14:18:53.926Z|00120|unixctl|DBG|received request netdev-
> dummy/receive["p1","in_port(1),eth(src=50:54:00:00:00:0b,dst=50:54:00:0
> 0:00:0c),eth_type(0x86dd),ipv6(src=2001:db8::1,dst=111:db8::6,proto=1,tcl
> ass=0,hlimit=64,frag=no),icmpv6(type=0,code=8)"], id=0 2024-05-
> 29T14:18:53.926Z|00121|unixctl|DBG|replying with success, id=0: ""
> 2024-05-29T14:18:53.926Z|00122|odp_execute_impl|ERR|Autovalidation
> of avx512 failed. Details:
> Packet: 0
> Action : set(ipv6(tclass=0x40/0xfc))
> Good hex:
>   50 54 00 00 00 0c 50 54-00 00 00 0b 86 dd 64 00
> 0010  00 00 00 48 01 40 20 01-0d b8 00 00 00 00 00 00
> 0020  00 00 00 00 00 01 01 11-0d b8 00 00 00 00 00 00
> 0030  00 00 00 00 00 06 00 00-1b fc 00 00 00 00 00 01
> 0040  02 03 04 05 06 07 08 09-0a 0b 0c 0d 0e 0f 10 11
> 0050  12 13 14 15 16 17 18 19-1a 1b 1c 1d 1e 1f 20 21
> 0060  22 23 24 25 26 27 28 29-2a 2b 2c 2d 2e 2f 30 31
> 0070  32 33 34 35 36 37 38 39-3a 3b 3c 3d 3e 3f Test hex:
>   50 54 00 00 00 0c 50 54-00 00 00 0b 86 dd 60 00
> 0010  00 00 00 48 01 40 20 01-0d b8 00 00 00 00 00 00
> 0020  00 00 00 00 00 01 01 11-0d b8 00 00 00 00 00 00
> 0030  00 00 00 00 00 06 00 00-1b fc 00 00 00 00 00 01
> 0040  02 03 04 05 06 07 08 09-0a 0b 0c 0d 0e 0f 10 11
> 0050  12 13 14 15 16 17 18 19-1a 1b 1c 1d 1e 1f 20 21
> 0060  22 23 24 25 26 27 28 29-2a 2b 2c 2d 2e 2f 30 31
> 0070  32 33 34 35 36 37 38 39-3a 3b 3c 3d 3e 3f
> 
> And
> 
> 2024-05-29T14:18:54.503Z|00659|odp_execute_impl|ERR|Autovalidation
> of avx512 failed. Details:
> Packet: 0
> Action : set(ipv4(src=30.0.0.1,dst=30.0.0.3))
> Good hex:
>   aa 55 00 00 00 03 aa 55-00 00 00 01 08 00 45 00
> 0010  00 90 00 00 40 00 40 11-00 00 1e 00 00 01 1e 00
> 0020  00 03 e8 20 12 b5 00 7c-00 00 0c 00 00 04 00 00
> 0030  00 00 0f c6 01 01 00 30-00 ff 00 00 00 00 00 00
> 0040  00 00 0

Re: [ovs-dev] [PATCH v2] dpdk: Use DPDK 23.11.1 release.

2024-05-30 Thread Eelco Chaudron



On 28 May 2024, at 11:25, Kevin Traynor wrote:

> Update the CI and docs to use DPDK 23.11.1.
>
> Signed-off-by: Kevin Traynor 

Thanks Kevin, changes look good to me.

Acked-by: Eelco Chaudron 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 branch-3.3] dpdk: Use DPDK 23.11.1 release for OVS 3.3.

2024-05-30 Thread Eelco Chaudron



On 28 May 2024, at 11:25, Kevin Traynor wrote:

> Update the CI and docs to use DPDK 23.11.1.
>
> Signed-off-by: Kevin Traynor 

Thanks Kevin, changes look good to me.

Acked-by: Eelco Chaudron 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 branch-3.2] dpdk: Use DPDK 22.11.5 release for OVS 3.2.

2024-05-30 Thread Eelco Chaudron



On 28 May 2024, at 11:25, Kevin Traynor wrote:

> Update the CI and docs to use DPDK 22.11.5.
>
> Signed-off-by: Kevin Traynor 

Thanks Kevin, changes look good to me.

Acked-by: Eelco Chaudron 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 branch-3.1] dpdk: Use DPDK 22.11.5 release for OVS 3.1.

2024-05-30 Thread Eelco Chaudron



On 28 May 2024, at 11:25, Kevin Traynor wrote:

> Update the CI and docs to use DPDK 22.11.5.
>
> Signed-off-by: Kevin Traynor 

Thanks Kevin, changes look good to me.

Acked-by: Eelco Chaudron 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 branch-3.0] dpdk: Use DPDK 21.11.7 release for OVS 3.0.

2024-05-30 Thread Eelco Chaudron



On 28 May 2024, at 11:25, Kevin Traynor wrote:

> Update the CI and docs to use DPDK 21.11.7.
>
> Signed-off-by: Kevin Traynor 

Thanks Kevin, changes look good to me.

Acked-by: Eelco Chaudron 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 branch-2.17] dpdk: Use DPDK 21.11.7 release for OVS 2.17.

2024-05-30 Thread Eelco Chaudron



On 28 May 2024, at 11:25, Kevin Traynor wrote:

> Update the CI and docs to use DPDK 21.11.7.
>
> Signed-off-by: Kevin Traynor 

Thanks Kevin, changes look good to me.

Acked-by: Eelco Chaudron 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 4/8] sflow: Use uint32_t instead of time_t for tick handling in the poller.

2024-05-30 Thread Eelco Chaudron



On 29 May 2024, at 12:53, Eelco Chaudron wrote:

> The sFlow library uses a uint32_t to configure timeout ticks, but
> stores this value as a time_t. Although this doesn't cause functional
> issues, it wastes space and confuses Coverity, potentially indicating
> a Y2K38 problem when storing uint32_t values in time_t. This patch
> updates the internal data structures to use uint32_t variables.
>
> Fixes: c72e245a0e2c ("Add InMon's sFlow Agent library to the build system.")
> Acked-by: Mike Pattrick 
> Signed-off-by: Eelco Chaudron 
> ---
>  lib/sflow_api.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/lib/sflow_api.h b/lib/sflow_api.h
> index eb23e2acd..f4bfa5ead 100644
> --- a/lib/sflow_api.h
> +++ b/lib/sflow_api.h
> @@ -148,7 +148,7 @@ typedef struct _SFLPoller {
>  /* MIB fields */
>  SFLDataSource_instance dsi;
>  u_int32_t sFlowCpReceiver;
> -time_t sFlowCpInterval;
> +u_int32_t sFlowCpInterval;
>  /* public fields */
>  struct _SFLAgent *agent; /* pointer to my agent */
>  void *magic; /* ptr to pass back in getCountersFn() */
> @@ -156,7 +156,7 @@ typedef struct _SFLPoller {
>  u_int32_t bridgePort; /* port number local to bridge */
>  /* private fields */
>  SFLReceiver *myReceiver;
> -time_t countersCountdown;
> +u_int32_t countersCountdown;
>  u_int32_t countersSampleSeqNo;
>  } SFLPoller;
>
> -- 
> 2.44.0

Recheck-request: github-robot

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 5/8] sflow: Fix check for disabled receive time.

2024-05-30 Thread Eelco Chaudron



On 29 May 2024, at 12:53, Eelco Chaudron wrote:

> Changed sFlowRcvrTimeout to a uint32_t to avoid time_t warnings
> reported by Coverity. A uint32_t is more than large enough as
> this is a (seconds) tick counter and OVS is not even using this.
>
> Fixes: c72e245a0e2c ("Add InMon's sFlow Agent library to the build system.")
> Acked-by: Ilya Maximets 
> Signed-off-by: Eelco Chaudron 
> --
> Note that this checkpatch reports an 'Improper whitespace
> around control block' error on this patch + some warnings.
> But I did not want to change the code style in this entire file.
> ---
>  lib/sflow_api.h  | 6 +++---
>  lib/sflow_receiver.c | 7 ---
>  ofproto/ofproto-dpif-sflow.c | 2 +-
>  3 files changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/lib/sflow_api.h b/lib/sflow_api.h
> index f4bfa5ead..b884a6a7d 100644
> --- a/lib/sflow_api.h
> +++ b/lib/sflow_api.h
> @@ -97,7 +97,7 @@ typedef struct _SFLReceiver {
>  struct _SFLReceiver *nxt;
>  /* MIB fields */
>  char *sFlowRcvrOwner;
> -time_t sFlowRcvrTimeout;
> +u_int32_t sFlowRcvrTimeout;
>  u_int32_t sFlowRcvrMaximumDatagramSize;
>  SFLAddress sFlowRcvrAddress;
>  u_int32_t sFlowRcvrPort;
> @@ -251,8 +251,8 @@ SFLSampler *sfl_agent_getSamplerByIfIndex(SFLAgent 
> *agent, u_int32_t ifIndex);
>  /* receiver */
>  char *  sfl_receiver_get_sFlowRcvrOwner(SFLReceiver *receiver);
>  voidsfl_receiver_set_sFlowRcvrOwner(SFLReceiver *receiver, char 
> *sFlowRcvrOwner);
> -time_t  sfl_receiver_get_sFlowRcvrTimeout(SFLReceiver *receiver);
> -voidsfl_receiver_set_sFlowRcvrTimeout(SFLReceiver *receiver, time_t 
> sFlowRcvrTimeout);
> +u_int32_t   sfl_receiver_get_sFlowRcvrTimeout(SFLReceiver *receiver);
> +voidsfl_receiver_set_sFlowRcvrTimeout(SFLReceiver *receiver, 
> u_int32_t sFlowRcvrTimeout);
>  u_int32_t   sfl_receiver_get_sFlowRcvrMaximumDatagramSize(SFLReceiver 
> *receiver);
>  voidsfl_receiver_set_sFlowRcvrMaximumDatagramSize(SFLReceiver 
> *receiver, u_int32_t sFlowRcvrMaximumDatagramSize);
>  SFLAddress *sfl_receiver_get_sFlowRcvrAddress(SFLReceiver *receiver);
> diff --git a/lib/sflow_receiver.c b/lib/sflow_receiver.c
> index 4162518e3..3c5aec897 100644
> --- a/lib/sflow_receiver.c
> +++ b/lib/sflow_receiver.c
> @@ -102,10 +102,10 @@ void sfl_receiver_set_sFlowRcvrOwner(SFLReceiver 
> *receiver, char *sFlowRcvrOwner
>   reset(receiver);
>  }
>  }
> -time_t sfl_receiver_get_sFlowRcvrTimeout(SFLReceiver *receiver) {
> +u_int32_t sfl_receiver_get_sFlowRcvrTimeout(SFLReceiver *receiver) {
>  return receiver->sFlowRcvrTimeout;
>  }
> -void sfl_receiver_set_sFlowRcvrTimeout(SFLReceiver *receiver, time_t 
> sFlowRcvrTimeout) {
> +void sfl_receiver_set_sFlowRcvrTimeout(SFLReceiver *receiver, u_int32_t 
> sFlowRcvrTimeout) {
>  receiver->sFlowRcvrTimeout =sFlowRcvrTimeout;
>  }
>  u_int32_t sfl_receiver_get_sFlowRcvrMaximumDatagramSize(SFLReceiver 
> *receiver) {
> @@ -146,7 +146,8 @@ void sfl_receiver_tick(SFLReceiver *receiver)
>  // if there are any samples to send, flush them now
>  if(receiver->sampleCollector.numSamples > 0) sendSample(receiver);
>  // check the timeout
> -if(receiver->sFlowRcvrTimeout && (u_int32_t)receiver->sFlowRcvrTimeout 
> != 0x) {
> +if(receiver->sFlowRcvrTimeout
> +   && receiver->sFlowRcvrTimeout != UINT32_MAX) {
>   // count down one tick and reset if we reach 0
>   if(--receiver->sFlowRcvrTimeout == 0) reset(receiver);
>  }
> diff --git a/ofproto/ofproto-dpif-sflow.c b/ofproto/ofproto-dpif-sflow.c
> index 4a68e9b94..80405b68a 100644
> --- a/ofproto/ofproto-dpif-sflow.c
> +++ b/ofproto/ofproto-dpif-sflow.c
> @@ -808,7 +808,7 @@ dpif_sflow_set_options(struct dpif_sflow *ds,
>
>  receiver = sfl_agent_addReceiver(ds->sflow_agent);
>  sfl_receiver_set_sFlowRcvrOwner(receiver, "Open vSwitch sFlow");
> -sfl_receiver_set_sFlowRcvrTimeout(receiver, 0x);
> +sfl_receiver_set_sFlowRcvrTimeout(receiver, UINT32_MAX);
>
>  /* Set the sampling_rate down in the datapath. */
>  ds->probability = MAX(1, UINT32_MAX / ds->options->sampling_rate);
> -- 
> 2.44.0

Recheck-request: github-robot

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 8/8] netdev-linux: Fix uninitialized gso_type case.

2024-05-30 Thread Eelco Chaudron



On 29 May 2024, at 12:53, Eelco Chaudron wrote:

> This patch fixes an uninitialized gso_type case in
> netdev_linux_prepend_vnet_hdr() by returning an error.
>
> Fixes: 3337e6d91c5b ("userspace: Enable L4 checksum offloading by default.")
> Signed-off-by: Eelco Chaudron 
> ---
>  lib/netdev-linux.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index eb0c5c624..dc67e1268 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -7167,6 +7167,10 @@ netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int 
> mtu)
>  vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
>  } else if (dp_packet_hwol_tx_ipv6(b)) {
>  vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
> +} else {
> +VLOG_ERR_RL(&rl, "Unknown gso_type for TSO packet. Flags: 
> %"PRIx64,
> +(uint64_t) *dp_packet_ol_flags_ptr(b));
> +return EINVAL;
>  }
>  } else {
>  vnet->hdr_len = 0;
> -- 
> 2.44.0

Recheck-request: github-robot

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev