Re: [ovs-dev] [PATCH 0/2] tests: Replace wget with curl.
On Tue, Oct 29, 2024 at 2:33 PM Eelco Chaudron wrote: > > Eelco Chaudron (2): > system-traffic: Replace wget with curl for negative and ftp tests. > system-traffic: Standardize by replacing all wget instances with curl. > > tests/system-tap.at | 3 +- > tests/system-traffic.at | 249 +++- > 2 files changed, 172 insertions(+), 80 deletions(-) > Overall, it lgtm (just a nit in the second patch). -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 0/2] tests: Replace wget with curl.
On Wed, Oct 30, 2024 at 3:45 PM Eelco Chaudron wrote: > On 30 Oct 2024, at 15:34, David Marchand wrote: > > > On Tue, Oct 29, 2024 at 2:33 PM Eelco Chaudron wrote: > >> > >> Eelco Chaudron (2): > >> system-traffic: Replace wget with curl for negative and ftp tests. > >> system-traffic: Standardize by replacing all wget instances with curl. > >> > >> tests/system-tap.at | 3 +- > >> tests/system-traffic.at | 249 +++- > >> 2 files changed, 172 insertions(+), 80 deletions(-) > >> > > > > Overall, it lgtm (just a nit in the second patch). > > Thanks for the review. I can fix the nit on commit. Let me know if you want > to add your ack/review-by. Ah yes, you can add on both patches. Acked-by: David Marchand -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 2/2] system-traffic: Standardize by replacing all wget instances with curl.
On Tue, Oct 29, 2024 at 2:34 PM Eelco Chaudron wrote: > @@ -7892,7 +7933,10 @@ NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -W 2 > fc00::240 | FORMAT_PING], [0] > > dnl Should work with the virtual IP address through NAT > OVS_START_L7([at_ns1], [http6]) > -NS_CHECK_EXEC([at_ns0], [wget http://[[fc00::240]] -t 5 -T 1 > --retry-connrefused -v -o wget0.log]) > +NS_CHECK_EXEC([at_ns0], > + [curl http://[[fc00::240]] --retry 5 --max-time 1 --retry-connrefused \ > +-v >curl0.log 2>&1]) > + > Nit: no need for extra line here. > AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fc00::1)], [0], [dnl > > icmpv6,orig=(src=fc00::1,dst=fc00::240,id=,type=128,code=0),reply=(src=fc00::2,dst=fc00::1,id=,type=129,code=0),zone=1 -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 2/2] ci: Update GitHub actions runner from Ubuntu 22.04 to 24.04.
On Fri, Oct 25, 2024 at 10:02 PM Ilya Maximets wrote: > > On 10/25/24 20:25, Kevin Traynor wrote: > > On 17/10/2024 14:09, Eelco Chaudron wrote: > >> This patch upgrades the Ubuntu runner to the latest LTS version. > >> It also adds required packages > > What requires the libsystemd-dev ? It's a shame if we have to link with it. > (I believe I asked that question somewhere..., but I don't remember where > or what was the answer.) It seems to be pulled from the DPDK pkg-config (which depends on libpcap, itself depending on libsystem-dev when statically linking). # pkg-config --static --libs libpcap -lpcap -ldbus-1 -lsystemd -libverbs -lbnxt_re-rdmav34 -lcxgb4-rdmav34 -lefa -lerdma-rdmav34 -lhns-rdmav34 -lirdma-rdmav34 -lmana -lmlx4 -lmlx5 -lmthca-rdmav34 -locrdma-rdmav34 -lqedr-rdmav34 -lvmw_pvrdma-rdmav34 -lhfi1verbs-rdmav34 -lipathverbs-rdmav34 -lrxe-rdmav34 -lsiw-rdmav34 -libverbs -lpthread -lnl-route-3 -lnl-3 -lpthread But, we don't need net/pcap anymore since commit 98ee21ef6364 ("system-dpdk: Use dummy-pmd port for packet injection."). I sent a patch: https://patchwork.ozlabs.org/project/openvswitch/patch/20241028102343.2544097-1-david.march...@redhat.com/ Btw, I don't think libbpf-dev is needed anymore (for DPDK compilation at least), as long as libxdp-dev is installed. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] ci: Remove dependency on libpcap.
Since commit 98ee21ef6364 ("system-dpdk: Use dummy-pmd port for ..."), DPDK unit tests don't require net/pcap driver anymore, and we can remove this dependency when building DPDK and OVS in GHA. Fixes: 98ee21ef6364 ("system-dpdk: Use dummy-pmd port for packet injection.") Signed-off-by: David Marchand --- .ci/dpdk-build.sh| 2 +- .github/workflows/build-and-test.yml | 9 - 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/.ci/dpdk-build.sh b/.ci/dpdk-build.sh index e1b8e3ccbb..991b272ea1 100755 --- a/.ci/dpdk-build.sh +++ b/.ci/dpdk-build.sh @@ -40,7 +40,7 @@ function build_dpdk() # any DPDK driver. # check-dpdk unit tests requires testpmd and some net/ driver. DPDK_OPTS="$DPDK_OPTS -Denable_apps=test-pmd" -enable_drivers="net/null,net/af_xdp,net/tap,net/virtio,net/pcap" +enable_drivers="net/null,net/af_xdp,net/tap,net/virtio" DPDK_OPTS="$DPDK_OPTS -Denable_drivers=$enable_drivers" # OVS depends on the vhost library (and its dependencies). # net/tap depends on the gso library. diff --git a/.github/workflows/build-and-test.yml b/.github/workflows/build-and-test.yml index 6ed39ca806..4a4e3eea39 100644 --- a/.github/workflows/build-and-test.yml +++ b/.github/workflows/build-and-test.yml @@ -8,7 +8,7 @@ env: jobs: build-dpdk: env: - dependencies: gcc libbpf-dev libnuma-dev libpcap-dev ninja-build pkgconf + dependencies: gcc libbpf-dev libnuma-dev ninja-build pkgconf CC: gcc DPDK_GIT: https://dpdk.org/git/dpdk-stable DPDK_VER: 23.11.2 @@ -79,8 +79,7 @@ jobs: env: dependencies: | automake libtool gcc bc libjemalloc2 libjemalloc-dev libssl-dev \ -llvm-dev libnuma-dev libpcap-dev selinux-policy-dev libbpf-dev \ -lftp libreswan +llvm-dev libnuma-dev selinux-policy-dev libbpf-dev lftp libreswan CC: ${{ matrix.compiler }} DPDK:${{ matrix.dpdk }} DPDK_SHARED: ${{ matrix.dpdk_shared }} @@ -279,8 +278,8 @@ jobs: needs: build-dpdk env: dependencies: | -automake bc clang-tools libbpf-dev libnuma-dev libpcap-dev \ -libunbound-dev libunwind-dev libssl-dev libtool llvm-dev +automake bc clang-tools libbpf-dev libnuma-dev libunbound-dev \ +libunwind-dev libssl-dev libtool llvm-dev CC: clang DPDK: dpdk CLANG_ANALYZE: true -- 2.46.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] github: Remove ASLR entropy workaround.
On Fri, Oct 25, 2024 at 10:41 PM Ilya Maximets wrote: > > The issue should be fixed by now. > > Signed-off-by: Ilya Maximets I was looking into this for dpdk GHA during last week and I came to the same conclusion. CI seems to agree. Acked-by: David Marchand -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [dpdk-latest] netdev-offload-dpdk: Fix build with v24.11-rc1.
On Mon, Oct 21, 2024 at 11:31 AM Ilya Maximets wrote: > > Just to share for others, there is one CI fail which is a bit > > unintuitive at first glance. clang analyze is failing. The fail is for > > the OVS base branch. So it is OVS *without* this patch with DPDK main > > branch HEAD. That is expected as this patch is the one to fix the issue. > > I guess, this patch is also a good opportunity to think how to make > clang-analyzer job not fail on DPDK upgrades as it seems that we'll > have a CI failure when we try to move the main branch to 24.11. It comes back to any build error due to an env issue (imagine GHA Ubuntu images get broken for building OVS) / external source. The simpler is to skip this test (iow report success) on build failure of the reference. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [dpdk-latest] netdev-offload-dpdk: Fix build with v24.11-rc1.
Following introduction of a IPv6 address structure and its use in the rte_ipv6_hdr struct referenced by rte_flow IPv6 object, adjust code manipulating IPv6 rules in the offload code. Link: https://git.dpdk.org/dpdk/commit/?id=89b5642d0d45 Signed-off-by: David Marchand --- lib/netdev-offload-dpdk.c | 30 +++--- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c index 1a6e100ffb..b820aa60e3 100644 --- a/lib/netdev-offload-dpdk.c +++ b/lib/netdev-offload-dpdk.c @@ -530,15 +530,15 @@ dump_flow_pattern(struct ds *s, if (!ipv6_mask) { ipv6_mask = &rte_flow_item_ipv6_mask; } -memcpy(&addr, ipv6_spec->hdr.src_addr, sizeof addr); -memcpy(&mask, ipv6_mask->hdr.src_addr, sizeof mask); +memcpy(&addr, &ipv6_spec->hdr.src_addr, sizeof addr); +memcpy(&mask, &ipv6_mask->hdr.src_addr, sizeof mask); ipv6_string_mapped(addr_str, &addr); ipv6_string_mapped(mask_str, &mask); DUMP_PATTERN_ITEM(mask, false, "src", "%s", addr_str, mask_str, ""); -memcpy(&addr, ipv6_spec->hdr.dst_addr, sizeof addr); -memcpy(&mask, ipv6_mask->hdr.dst_addr, sizeof mask); +memcpy(&addr, &ipv6_spec->hdr.dst_addr, sizeof addr); +memcpy(&mask, &ipv6_mask->hdr.dst_addr, sizeof mask); ipv6_string_mapped(addr_str, &addr); ipv6_string_mapped(mask_str, &mask); DUMP_PATTERN_ITEM(mask, false, "dst", "%s", @@ -695,10 +695,10 @@ dump_vxlan_encap(struct ds *s, const struct rte_flow_item *items) struct in6_addr addr; ds_put_cstr(s, "ip-src "); -memcpy(&addr, ipv6->hdr.src_addr, sizeof addr); +memcpy(&addr, &ipv6->hdr.src_addr, sizeof addr); ipv6_format_mapped(&addr, s); ds_put_cstr(s, " ip-dst "); -memcpy(&addr, ipv6->hdr.dst_addr, sizeof addr); +memcpy(&addr, &ipv6->hdr.dst_addr, sizeof addr); ipv6_format_mapped(&addr, s); ds_put_cstr(s, " "); } @@ -834,7 +834,7 @@ dump_flow_action(struct ds *s, struct ds *s_extra, struct in6_addr addr; ds_put_cstr(s, "ipv6_addr "); -memcpy(&addr, set_ipv6->ipv6_addr, sizeof addr); +memcpy(&addr, &set_ipv6->ipv6_addr, sizeof addr); ipv6_format_addr(&addr, s); ds_put_cstr(s, " "); } @@ -1210,18 +1210,18 @@ parse_tnl_ip_match(struct flow_patterns *patterns, spec->hdr.hop_limits = match->flow.tunnel.ip_ttl; spec->hdr.vtc_flow = htonl((uint32_t) match->flow.tunnel.ip_tos << RTE_IPV6_HDR_TC_SHIFT); -memcpy(spec->hdr.src_addr, &match->flow.tunnel.ipv6_src, +memcpy(&spec->hdr.src_addr, &match->flow.tunnel.ipv6_src, sizeof spec->hdr.src_addr); -memcpy(spec->hdr.dst_addr, &match->flow.tunnel.ipv6_dst, +memcpy(&spec->hdr.dst_addr, &match->flow.tunnel.ipv6_dst, sizeof spec->hdr.dst_addr); mask->hdr.proto = UINT8_MAX; mask->hdr.hop_limits = match->wc.masks.tunnel.ip_ttl; mask->hdr.vtc_flow = htonl((uint32_t) match->wc.masks.tunnel.ip_tos << RTE_IPV6_HDR_TC_SHIFT); -memcpy(mask->hdr.src_addr, &match->wc.masks.tunnel.ipv6_src, +memcpy(&mask->hdr.src_addr, &match->wc.masks.tunnel.ipv6_src, sizeof mask->hdr.src_addr); -memcpy(mask->hdr.dst_addr, &match->wc.masks.tunnel.ipv6_dst, +memcpy(&mask->hdr.dst_addr, &match->wc.masks.tunnel.ipv6_dst, sizeof mask->hdr.dst_addr); consumed_masks->tunnel.ip_tos = 0; @@ -1532,9 +1532,9 @@ parse_flow_match(struct netdev *netdev, spec->hdr.hop_limits = match->flow.nw_ttl; spec->hdr.vtc_flow = htonl((uint32_t) match->flow.nw_tos << RTE_IPV6_HDR_TC_SHIFT); -memcpy(spec->hdr.src_addr, &match->flow.ipv6_src, +memcpy(&spec->hdr.src_addr, &match->flow.ipv6_src, sizeof spec->hdr.src_addr); -memcpy(spec->hdr.dst_addr, &match->flow.ipv6_dst, +memcpy(&spec->hdr.dst_addr, &match->flow.ipv6_dst, sizeof spec->hdr.dst_addr); if ((match->wc.masks.nw_frag & FLOW_NW_FRAG_ANY) && (match->flow.nw_frag & FLOW_NW_FRAG_ANY)) { @@ -1545,9 +1545,9 @@
Re: [ovs-dev] [PATCH v1] netdev-dpdk: Added xstat statistics interface.
On Thu, Sep 19, 2024 at 6:47 AM Jun Wang wrote: > > On Wed, Sep 18, 2024 at 4:01 AM Jun Wang wrote: > > > Hi David, > > > As you mentioned, I also found that the results obtained by > > > netdev_dpdk_get_custom_stats are filtered. However, I noticed that in > > > DPDK, different net drivers handle xstats commands in various ways. > > > If we apply a unified filtering process, it may result in not being > > > able to retrieve the desired results for different net drivers. > > > Therefore, I believe adding a new interface might be a suitable > > > solution. For example, the xstats of the ixgbe/i40e/txgbe driver. > > > > Well, the problem is that drivers came up with all the stats they > > could get from the hw with no unification. > > This is why I was requesting some example of the stats you need, to > > see if DPDK can align drivers, and then OVS filter can be extended. I see no repy, so I understand that you have no explicit requirement for a stat not reported in OVS. > > > > On the other hand, for *debugging*, you may query those vendor > > specific stats via the DPDK telemetry socket. > > I tried using DPDK telemetry with the X710 network card, and it was > indeed able to retrieve all xstat statistics. However, there seems to > be an issue with retrieving xstat data from the Wangxun network card > using the txgbe driver. > This might be an issue with the specific driver. [...] > wangxun txgbe driver: > --> /ethdev/xstats,0 > { > "/ethdev/xstats": { >"rx_good_packets": 28224, >"tx_good_packets": 110, >"rx_good_bytes": 3048375, >"tx_good_bytes": 13640, >"rx_missed_errors": 0, >"rx_errors": 0, >"tx_errors": 0, >"rx_mbuf_allocation_errors": 0 > } > } Definitely a driver issue. I did not look at the txgbe driver code, but such issue is often around the count of reported xstats (in xstats_get_names() or xstats_get() op). Please open a bz at bugs.dpdk.org and assign it to a txgbe driver maintainer. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v1] netdev-dpdk: Added xstat statistics interface.
On Wed, Sep 18, 2024 at 4:01 AM Jun Wang wrote: > > >On Mon, Sep 16, 2024 at 9:46 AM Eelco Chaudron wrote: > > > On 14 Sep 2024, at 7:26, Jun Wang wrote: > > > > > > > For many network cards, xstat statistics cannot be queried in the > > > > ovs interface. A new interface is added to retrieve all xstat > > > > information of the network card. > > > Hi Jun, > > > > > > Isn’t this already handled with netdev_dpdk_get_custom_stats()? I think > > > you can see them with ‘ovs-vsctl get Interface statistics. > > > > > > If this is not the case this is what we should enhance. > > > > xstats are currently filtered to only save per queue stats and some > > common counters. > > > static bool > > is_queue_stat(const char *s) > > { > > uint16_t tmp; > > > > return (s[0] == 'r' || s[0] == 't') && > > (ovs_scan(s + 1, "x_q%"SCNu16"_packets", &tmp) || > > ovs_scan(s + 1, "x_q%"SCNu16"_bytes", &tmp)); > > } > > > > static void > > netdev_dpdk_configure_xstats(struct netdev_dpdk *dev) > > OVS_REQUIRES(dev->mutex) > > { > > ... > > > > /* For custom stats, we filter out everything except per > > rxq/txq basic > > * stats, and dropped, error and management counters. */ > > if (is_queue_stat(name) || > > string_ends_with(name, "_errors") || > > strstr(name, "_management_") || > > string_ends_with(name, "_dropped")) { > > > > dev->rte_xstats_ids[dev->rte_xstats_ids_size] = id; > > dev->rte_xstats_ids_size++; > > } > > > > > > Could you explain which stats you are missing? > > Hi David, > As you mentioned, I also found that the results obtained by > netdev_dpdk_get_custom_stats are filtered. However, I noticed that in > DPDK, different net drivers handle xstats commands in various ways. > If we apply a unified filtering process, it may result in not being > able to retrieve the desired results for different net drivers. > Therefore, I believe adding a new interface might be a suitable > solution. For example, the xstats of the ixgbe/i40e/txgbe driver. Well, the problem is that drivers came up with all the stats they could get from the hw with no unification. This is why I was requesting some example of the stats you need, to see if DPDK can align drivers, and then OVS filter can be extended. On the other hand, for *debugging*, you may query those vendor specific stats via the DPDK telemetry socket. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v1] netdev-dpdk: Disable outer udp checksum offload for txgbe driver.
On Sat, Sep 14, 2024 at 5:06 AM Jun Wang wrote: > > Fixing the issue of incorrect outer UDP checksum in packets sent by > the wangxun network card (driver is txgbe), we disabled > RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM. I don't have the hw, so I'll trust you on the topic :-). > > Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.") > Reported-by: Jun Wang > Nit: no need for an extra line here. > Signed-off-by: Jun Wang Acked-by: David Marchand -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] netdev-offload-dpdk: Change flow offload failure log level.
On Fri, Sep 13, 2024 at 2:25 PM Kevin Traynor wrote: > > Previously when a flow was attempted to be offloaded, if it > could not be offloaded and did not return an actions error, > a warning was logged. > > The reason there was an exception for an actions error was to allow > for failure for full offload of a flow because it will fallback to > partial offload. There are some issues with this approach to logging. > > Some NICs do not specify an actions error, because other config in > the NIC may be checked first. e.g. In the case of Mellanox CX-5, > there can be different types of offload configured, so an unspecified > error may be returned. > > More generally, enabling hw-offload is best effort per datapath/NIC/flow > as full and partial offload support in NICs is variable and there is > always fallback to software. > > So there is likely to be repeated logging about offloading of flows > failing. With this in mind, change the log level to debug. > > The status of the offload can still be seen with below command: > $ ovs-appctl dpctl/dump-flows -m > ... offloaded:partial ... > > Also, remove some duplicated rate limiting and tidy-up the succeed > and failure logs. > > Signed-off-by: Kevin Traynor > > --- > v2: combined logs on failure and added confirmation of result on > succeed. > --- > lib/netdev-offload-dpdk.c | 23 +-- > 1 file changed, 9 insertions(+), 14 deletions(-) > > diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c > index 623005b1c..342292d23 100644 > --- a/lib/netdev-offload-dpdk.c > +++ b/lib/netdev-offload-dpdk.c > @@ -920,22 +920,17 @@ netdev_offload_dpdk_flow_create(struct netdev *netdev, > dump_flow(&s, &s_extra, attr, flow_patterns, flow_actions); > extra_str = ds_cstr(&s_extra); > -VLOG_DBG_RL(&rl, "%s: rte_flow 0x%"PRIxPTR" %s flow create %d > %s", > -netdev_get_name(netdev), (intptr_t) flow, extra_str, > -netdev_dpdk_get_port_id(netdev), ds_cstr(&s)); > +VLOG_DBG("%s: rte_flow creation succeeded: rte_flow 0x%"PRIxPTR" > " > + "%s flow create %d %s", netdev_get_name(netdev), > + (intptr_t) flow, extra_str, > + netdev_dpdk_get_port_id(netdev), ds_cstr(&s)); > } > } else { > -enum vlog_level level = VLL_WARN; > - > -if (error->type == RTE_FLOW_ERROR_TYPE_ACTION) { > -level = VLL_DBG; > -} > -VLOG_RL(&rl, level, "%s: rte_flow creation failed: %d (%s).", > -netdev_get_name(netdev), error->type, error->message); > -if (!vlog_should_drop(&this_module, level, &rl)) { > +if (!VLOG_DROP_DBG(&rl)) { > dump_flow(&s, &s_extra, attr, flow_patterns, flow_actions); > extra_str = ds_cstr(&s_extra); > -VLOG_RL(&rl, level, "%s: Failed flow: %s flow create %d %s", > -netdev_get_name(netdev), extra_str, > -netdev_dpdk_get_port_id(netdev), ds_cstr(&s)); > +VLOG_DBG("%s: rte_flow creation failed [%d (%s)]: " > + "%s flow create %d %s", netdev_get_name(netdev), > + error->type, error->message,extra_str, Nit: I think an extra space is missing before extra_str. > + netdev_dpdk_get_port_id(netdev), ds_cstr(&s)); > } > } > -- > 2.46.0 > Reviewed-by: David Marchand -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v1] netdev-dpdk: Added xstat statistics interface.
On Mon, Sep 16, 2024 at 9:46 AM Eelco Chaudron wrote: > On 14 Sep 2024, at 7:26, Jun Wang wrote: > > > For many network cards, xstat statistics cannot be queried in the > > ovs interface. A new interface is added to retrieve all xstat > > information of the network card. > Hi Jun, > > Isn’t this already handled with netdev_dpdk_get_custom_stats()? I think you > can see them with ‘ovs-vsctl get Interface statistics. > > If this is not the case this is what we should enhance. xstats are currently filtered to only save per queue stats and some common counters. static bool is_queue_stat(const char *s) { uint16_t tmp; return (s[0] == 'r' || s[0] == 't') && (ovs_scan(s + 1, "x_q%"SCNu16"_packets", &tmp) || ovs_scan(s + 1, "x_q%"SCNu16"_bytes", &tmp)); } static void netdev_dpdk_configure_xstats(struct netdev_dpdk *dev) OVS_REQUIRES(dev->mutex) { ... /* For custom stats, we filter out everything except per rxq/txq basic * stats, and dropped, error and management counters. */ if (is_queue_stat(name) || string_ends_with(name, "_errors") || strstr(name, "_management_") || string_ends_with(name, "_dropped")) { dev->rte_xstats_ids[dev->rte_xstats_ids_size] = id; dev->rte_xstats_ids_size++; } Could you explain which stats you are missing? -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] netdev-offload-dpdk: Change log level on flow offload failure.
On Tue, Sep 10, 2024 at 5:11 PM Kevin Traynor wrote: > -enum vlog_level level = VLL_WARN; > - > -if (error->type == RTE_FLOW_ERROR_TYPE_ACTION) { > -level = VLL_DBG; > -} > -VLOG_RL(&rl, level, "%s: rte_flow creation failed: %d (%s).", > -netdev_get_name(netdev), error->type, error->message); > -if (!vlog_should_drop(&this_module, level, &rl)) { > +if (!VLOG_DROP_DBG(&rl)) { > +VLOG_DBG("%s: rte_flow creation failed: %d (%s).", > + netdev_get_name(netdev), error->type, error->message); > dump_flow(&s, &s_extra, attr, flow_patterns, flow_actions); > extra_str = ds_cstr(&s_extra); > -VLOG_RL(&rl, level, "%s: Failed flow: %s flow create %d %s", > -netdev_get_name(netdev), extra_str, > -netdev_dpdk_get_port_id(netdev), ds_cstr(&s)); > +VLOG_DBG("%s: Failed flow: %s flow create %d %s", > + netdev_get_name(netdev), extra_str, > + netdev_dpdk_get_port_id(netdev), ds_cstr(&s)); I wonder why we had two log messages so far. In any case, wrt ratelimit, with this change above, I would say we should log only one message. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] netdev: Fix full duplex capability for 25G ports.
As OVS does not know of the 25G speed, matching on a list of known speed for deducing full duplex capability is broken. Invert the test and assume full duplex is the default. Suggested-by: Adrian Moreno Signed-off-by: David Marchand --- lib/netdev.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/lib/netdev.c b/lib/netdev.c index f2d921ed63..bdacfe5c49 100644 --- a/lib/netdev.c +++ b/lib/netdev.c @@ -1285,14 +1285,14 @@ netdev_features_to_bps(enum netdev_features features, : default_bps); } -/* Returns true if any of the NETDEV_F_* bits that indicate a full-duplex link - * are set in 'features', otherwise false. */ +/* Returns true if any of the NETDEV_F_* bits that indicate a half-duplex link + * are unset in 'features', otherwise false. */ bool netdev_features_is_full_duplex(enum netdev_features features) { -return (features & (NETDEV_F_10MB_FD | NETDEV_F_100MB_FD | NETDEV_F_1GB_FD -| NETDEV_F_10GB_FD | NETDEV_F_40GB_FD -| NETDEV_F_100GB_FD | NETDEV_F_1TB_FD)) != 0; + +return (features & (NETDEV_F_10MB_HD | NETDEV_F_100MB_HD +| NETDEV_F_1GB_HD)) == 0; } /* Set the features advertised by 'netdev' to 'advertise'. Returns 0 if -- 2.45.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] netdev-linux: Do not offload IP checksum.
Offloading checksum with a virtio_net_hdr header works only for L4. Remove false claim that IP checksum offloading is supported. As a consequence, L3 checksum can be assumed to be correct when preparing a L4 offload request. Fixes: 3337e6d91c5b ("userspace: Enable L4 checksum offloading by default.") Signed-off-by: David Marchand --- Note: this issue was first noticed while natting IP packets received from a DPDK port (reporting that the packet IP checksum was correct) and sending those packets to a tap netdev. --- lib/netdev-linux.c | 16 +++- 1 file changed, 3 insertions(+), 13 deletions(-) diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index c316238cd5..765a07e355 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -1061,8 +1061,7 @@ netdev_linux_construct_tap(struct netdev *netdev_) if (tap_supports_vnet_hdr && ioctl(netdev->tap_fd, TUNSETOFFLOAD, oflags) == 0) { -netdev_->ol_flags |= (NETDEV_TX_OFFLOAD_IPV4_CKSUM - | NETDEV_TX_OFFLOAD_TCP_CKSUM +netdev_->ol_flags |= (NETDEV_TX_OFFLOAD_TCP_CKSUM | NETDEV_TX_OFFLOAD_UDP_CKSUM); if (userspace_tso_enabled()) { @@ -2510,13 +2509,11 @@ netdev_linux_set_ol(struct netdev *netdev_) char *string; uint32_t value; } t_list[] = { -{"tx-checksum-ipv4", NETDEV_TX_OFFLOAD_IPV4_CKSUM | - NETDEV_TX_OFFLOAD_TCP_CKSUM | +{"tx-checksum-ipv4", NETDEV_TX_OFFLOAD_TCP_CKSUM | NETDEV_TX_OFFLOAD_UDP_CKSUM}, {"tx-checksum-ipv6", NETDEV_TX_OFFLOAD_TCP_CKSUM | NETDEV_TX_OFFLOAD_UDP_CKSUM}, -{"tx-checksum-ip-generic", NETDEV_TX_OFFLOAD_IPV4_CKSUM | - NETDEV_TX_OFFLOAD_TCP_CKSUM | +{"tx-checksum-ip-generic", NETDEV_TX_OFFLOAD_TCP_CKSUM | NETDEV_TX_OFFLOAD_UDP_CKSUM}, {"tx-checksum-sctp", NETDEV_TX_OFFLOAD_SCTP_CKSUM}, {"tx-tcp-segmentation", NETDEV_TX_OFFLOAD_TCP_TSO}, @@ -7203,13 +7200,6 @@ netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu) /* The packet has good L4 checksum. No need to validate again. */ vnet->csum_start = vnet->csum_offset = (OVS_FORCE __virtio16) 0; vnet->flags = VIRTIO_NET_HDR_F_DATA_VALID; - -/* It is possible that L4 is good but the IPv4 checksum isn't - * complete. For example in the case of UDP encapsulation of an ARP - * packet where the UDP checksum is 0. */ -if (dp_packet_hwol_l3_csum_ipv4_ol(b)) { -dp_packet_ip_set_header_csum(b, false); -} } else if (dp_packet_hwol_tx_l4_checksum(b)) { /* The csum calculation is offloaded. */ if (dp_packet_hwol_l4_is_tcp(b)) { -- 2.45.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v5 3/3] netdev-dpdk: Re-enable VXLAN/Geneve offload for Intel cards.
On Fri, Jul 12, 2024 at 11:30 PM Mike Pattrick wrote: > > Previously the OVS support for checksum/TSO offloading didn't work well > with some network cards that supported VXLAN/Geneve tunnel TSO but not > outer UDP checksums. Now support for this configuration is improved and > we no longer need to disable the VXLAN/Geneve TSO flags from intel > hardware support flags. Not sure it is clear to anyone reading this commitlog. Maybe explain that restoring those capabilities makes it possible to do hw tso over tunnel, when tunnel csum is disabled? > > The modification to outer UDP offload is still required pending a future > DPDK release. Indeed, when the dpdk fixes are in, we can re-enable those capabilities (well, to be more precise, we can stop filtering broken capabilities) on those nics. > > Suggested-by: David Marchand > Signed-off-by: Mike Pattrick Reviewed-by: David Marchand -- David Marchand (going off for some time) ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v5 2/3] Userspace: Software fallback for UDP encapsulated TCP segmentation.
On Fri, Jul 12, 2024 at 11:30 PM Mike Pattrick wrote: > > When sending packets that are flagged as requiring segmentation to an > interface that does not support this feature, send the packet to the TSO > software fallback instead of dropping it. > > Signed-off-by: Mike Pattrick > --- > v2: > - Fixed udp tunnel length > - Added test that UDP headers are correct > - Split inner and outer ip_id into different counters > - Set tunnel flags in reset_tcp_seg > > v3: > - Changed logic of netdev_send() to account for NICs that support > tunnel offload but not checksum offload > - Adjusted udp tunnel header length during software tso > > v4: > - Moved a bugfix into its own patch > - Fixed an indentation issue > - Changed the scope of ip_hdr > > v5: > - Change order of tso fallback check > - Added a unit test > Signed-off-by: Mike Pattrick I double checked the datapath code and it looks good to me (no difference with the branch containing my suggestion for the fallback check, and this branch was tested with E810, mlx5 and ixgbe nics). I did not go in depth for the added unit test (which is skipped in the CI) but I was ok with the previous version. Reviewed-by: David Marchand -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v5 1/3] userspace: Adjust segment size on encapsulation.
On Fri, Jul 12, 2024 at 11:30 PM Mike Pattrick wrote: > > When prepending a tunnel header to a packet marked for segmentation, we > need to adjust the segment size. Failure to do so can result in packets > that are larger than the intended MTU post segmentation. > > Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.") > Signed-off-by: Mike Pattrick I see no difference with v4, so just repeating: Reviewed-by: David Marchand -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] dpif-netlink-rtnl: Fix netdev leak in out-of-tree tunnels probe.
On Fri, Jul 12, 2024 at 5:03 PM Eric Garver wrote: > > IIRC, the out-of-tree modules have been removed since 3.0. I think this > probe could be removed entirely. I was wondering too if this code should be dropped, but at least from a backport pov, this fix seems to make sense. > > At any rate, giving my ACK for this patch. > > Acked-by: Eric Garver Thanks. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] dpif-netlink-rtnl: Fix netdev leak in out-of-tree tunnels probe.
Caught by code review, calling netdev_open works in pair of netdev_close when no reference to a netdev must be kept. Fixes: 921c370a9df5 ("dpif-netlink: Probe for out-of-tree tunnels, decides used interface") Signed-off-by: David Marchand --- lib/dpif-netlink-rtnl.c | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/dpif-netlink-rtnl.c b/lib/dpif-netlink-rtnl.c index 5788294ae0..f7035333e6 100644 --- a/lib/dpif-netlink-rtnl.c +++ b/lib/dpif-netlink-rtnl.c @@ -566,6 +566,7 @@ dpif_netlink_rtnl_probe_oot_tunnels(void) tnl_cfg = netdev_get_tunnel_config(netdev); if (!tnl_cfg) { +netdev_close(netdev); return true; } -- 2.45.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] dpctl: Netdev leak causes stale VHU & reaching limit.
On Thu, Jul 11, 2024 at 10:18 PM Vipul Ashri via dev wrote: > > > While running a test with a continous VM creation/deletion using an > orchestration script with-in cloud environment. In parallel we have > some monitoring script calling ovs-appctl dpctl/show stats commands > every minute. > > During VHU port delete, one of netdev references were not reduced to > 0 as show_dpif call has not given-up the reference back or doing bad > cleanup. This pending deference preventing VHU deletion sequence, this > is found to be one of corner case inside dpctl code which results in > leaking up netdev which ultimately results in stale VHU entry. After > fixing this problematic cleanup, issue is not seen. > > Fixes: fceef2095222 ("dpctl: add ovs-appctl dpctl/* commands to talk to > dpif-netdev") > Signed-off-by: Vipul Ashri The issue seems generic as any netdev for which a call to netdev_get_stats failed would trigger the same leak. So I would update the commit title with something more generic like: dpctl: Fix netdev reference leak in "show" command. This can probably be changed when applying. In any case, the fix lgtm. Reviewed-by: David Marchand -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v4 2/3] Userspace: Software fallback for UDP encapsulated TCP segmentation.
On Mon, Jul 8, 2024 at 4:22 PM Mike Pattrick wrote: > On Mon, Jul 8, 2024 at 8:23 AM David Marchand > wrote: > > A final note, I suspect all those checks negatively impact non tso > > packets processing when OVS tso is enabled. > > > > Would it be feasible to mark that tso (or a tx offload) has been > > requested at the "batch" level? > > This is more an optimisation... maybe in the future? > > I've been thinking about changing the trunc member to a flags bit > array lately in other contexts as well. It would be useful to mark if > a batch has fragmented packets, if there is mixed memory between > mbuf's and other types of packet source memory, there should be other > uses as well. Having batch flags would remove a few instances of > iterating over the whole batch. > > I think this is a seperate patch though. Yes. TSO is still marked experimental and we can wait for such optimisations. Thanks Mike. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v4 3/3] netdev-dpdk: Re-enable VXLAN/Geneve offload for Intel cards.
On Fri, Jul 5, 2024 at 10:44 PM Mike Pattrick wrote: > > Previously the OVS support for checksum/TSO offloading didn't work well > with some network cards that supported VXLAN/Geneve tunnel TSO but not > outer UDP checksums. Now support for this configuration is improved and > we no longer need to disable the VXLAN/Geneve TSO flags from intel > hardware support flags. > > The modification to outer UDP offload is still required pending a future > DPDK release. > > Suggested-by: David Marchand > Signed-off-by: Mike Pattrick Reviewed-by: David Marchand (for the record, I tested with E810 PF and VF) Thanks Mike! -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v4 2/3] Userspace: Software fallback for UDP encapsulated TCP segmentation.
On Fri, Jul 5, 2024 at 10:44 PM Mike Pattrick wrote: > diff --git a/lib/netdev.c b/lib/netdev.c > index f2d921ed6..866dbf3b7 100644 > --- a/lib/netdev.c > +++ b/lib/netdev.c > @@ -69,8 +69,6 @@ COVERAGE_DEFINE(netdev_received); > COVERAGE_DEFINE(netdev_sent); > COVERAGE_DEFINE(netdev_add_router); > COVERAGE_DEFINE(netdev_get_stats); > -COVERAGE_DEFINE(netdev_vxlan_tso_drops); > -COVERAGE_DEFINE(netdev_geneve_tso_drops); > COVERAGE_DEFINE(netdev_push_header_drops); > COVERAGE_DEFINE(netdev_soft_seg_good); > COVERAGE_DEFINE(netdev_soft_seg_drops); > @@ -910,28 +908,30 @@ netdev_send(struct netdev *netdev, int qid, struct > dp_packet_batch *batch, > struct dp_packet *packet; > int error; > > -if (userspace_tso_enabled() && > -!(netdev_flags & NETDEV_TX_OFFLOAD_TCP_TSO)) { > -DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { > -if (dp_packet_hwol_is_tso(packet)) { > -if (dp_packet_hwol_is_tunnel_vxlan(packet) > -&& !(netdev_flags & NETDEV_TX_VXLAN_TNL_TSO)) { > -VLOG_WARN_RL(&rl, "%s: No VXLAN TSO support", > - netdev_get_name(netdev)); > -COVERAGE_INC(netdev_vxlan_tso_drops); > -dp_packet_delete_batch(batch, true); > -return false; > +if (userspace_tso_enabled()) { > +if (!(netdev_flags & NETDEV_TX_OFFLOAD_TCP_TSO)) { > +DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { > +if (dp_packet_hwol_is_tso(packet)) { > +return netdev_send_tso(netdev, qid, batch, > concurrent_txq); > } > - > -if (dp_packet_hwol_is_tunnel_geneve(packet) > -&& !(netdev_flags & NETDEV_TX_GENEVE_TNL_TSO)) { > -VLOG_WARN_RL(&rl, "%s: No GENEVE TSO support", > - netdev_get_name(netdev)); > -COVERAGE_INC(netdev_geneve_tso_drops); > -dp_packet_delete_batch(batch, true); > -return false; > +} > +} else if (!(netdev_flags & NETDEV_TX_OFFLOAD_OUTER_UDP_CKSUM)) { Just checking this feature is not enough at this point. Imagine a driver that has TSO support, but neither outer udp checksum nor any tunnel tso support (the one I have in mind is net/ixgbe). IIUC, a batch of (tunneled) tso packets with no request for outer udp csum would not be segmented in sw (and it would probably be dropped by hw). I think this block should be moved after checking the presence of *tnl_tso flags. > +DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { > +if (dp_packet_hwol_is_tso(packet) && > +(dp_packet_hwol_is_tunnel_vxlan(packet) || > + dp_packet_hwol_is_tunnel_geneve(packet)) && > +dp_packet_hwol_is_outer_udp_cksum(packet)) { > +return netdev_send_tso(netdev, qid, batch, > concurrent_txq); > +} > +} > +} else if (!(netdev_flags & (NETDEV_TX_VXLAN_TNL_TSO | > + NETDEV_TX_GENEVE_TNL_TSO))) { > +DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { > +if (dp_packet_hwol_is_tso(packet) && > +(dp_packet_hwol_is_tunnel_vxlan(packet) || > + dp_packet_hwol_is_tunnel_geneve(packet))) { Note: you can imagine a netdev supporting geneve tso, but not vxlan tso. A batch containing only tso requests through a geneve tunnel would end up being segmented in sw. DPDK drivers either support both or none. And only those drivers expose these netdev features in OVS. So I don't think it is a real problem and I am ok with this hunk. > +return netdev_send_tso(netdev, qid, batch, > concurrent_txq); > } > -return netdev_send_tso(netdev, qid, batch, concurrent_txq); > } > } > } A final note, I suspect all those checks negatively impact non tso packets processing when OVS tso is enabled. Would it be feasible to mark that tso (or a tx offload) has been requested at the "batch" level? This is more an optimisation... maybe in the future? Overall the patch lgtm. I'll be off soon, so with the issue I mentionned for net/ixgbe fixed, feel free to add my review tag. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v4 1/3] userspace: Adjust segment size on encapsulation.
On Fri, Jul 5, 2024 at 10:44 PM Mike Pattrick wrote: > > When prepending a tunnel header to a packet marked for segmentation, we > need to adjust the segment size. Failure to do so can result in packets > that are larger than the intended MTU post segmentation. > > Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.") > Signed-off-by: Mike Pattrick Reviewed-by: David Marchand -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [ovs-dev, v4, 3/3] netdev-dpdk: Re-enable VXLAN/Geneve offload for Intel cards.
Hello, On Mon, Jul 8, 2024 at 11:07 AM junwan...@cestc.cn wrote: > > Hi Mike, > > I noticed that the DPDK main version and the v24.07-rc1 version already > > include the changes for net/ice's outer UDP checksum offload. > > Can we now enable net/ice instead of disabling it? OVS main branch uses DPDK 23.11 LTS. Re-enabling this offload for Intel drivers will have to wait until the fixes are part of a released 23.11 LTS (which is expected in August, after OVS 3.4 is released). -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3] Userspace: Software fallback for UDP encapsulated TCP segmentation.
On Thu, Jul 4, 2024 at 7:55 AM Mike Pattrick wrote: > > When sending packets that are flagged as requiring segmentation to an > interface that doens't support this feature, send the packet to the TSO does not* > software fallback instead of dropping it. > > Signed-off-by: Mike Pattrick I did not test all the combinations yet (busy on dpdk). In any case, I don't expect much difference in behavior since v2. I see you intend to send a new revision, here are some small nits. > > --- > v2: > - Fixed udp tunnel length > - Added test that UDP headers are correct > - Split inner and outer ip_id into different counters > - Set tunnel flags in reset_tcp_seg > > v3: > - Changed logic of netdev_send() to account for NICs that support > tunnel offload but not checksum offload > - Adjusted udp tunnel header length during software tso > --- > lib/dp-packet-gso.c | 87 + > lib/dp-packet.h | 34 > lib/netdev-native-tnl.c | 8 > lib/netdev.c| 44 ++--- > tests/system-traffic.at | 87 + > 5 files changed, 221 insertions(+), 39 deletions(-) > > diff --git a/lib/dp-packet-gso.c b/lib/dp-packet-gso.c > index 847685ad9..4ac16b7d5 100644 > --- a/lib/dp-packet-gso.c > +++ b/lib/dp-packet-gso.c [snip] > @@ -105,20 +115,37 @@ dp_packet_gso(struct dp_packet *p, struct > dp_packet_batch **batches) > return false; > } > > -tcp_hdr = dp_packet_l4(p); > -tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl); > -tcp_seq = ntohl(get_16aligned_be32(&tcp_hdr->tcp_seq)); > -hdr_len = ((char *) dp_packet_l4(p) - (char *) dp_packet_eth(p)) > - + tcp_offset * 4; > -ip_id = 0; > -if (dp_packet_hwol_is_ipv4(p)) { > +if (dp_packet_hwol_is_tunnel_vxlan(p) || > +dp_packet_hwol_is_tunnel_geneve(p)) { Nit: indent seems wrong ^^. > +outer_ipv4 = dp_packet_hwol_is_outer_ipv4(p); > +tcp_hdr = dp_packet_inner_l4(p); > +ip_hdr = dp_packet_inner_l3(p); Nit: previously, ip_hdr (which is declared as struct ip_header *) was initialised only for ipv4. Now with this patch, it is set regardless of what is present in dp_packet_inner_l3. The only use for this ip_hdr is for extracting ip_id, so I would move ip_hdr setting in the branch where it matters. @@ -119,22 +119,22 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch **batches) dp_packet_hwol_is_tunnel_geneve(p)) { outer_ipv4 = dp_packet_hwol_is_outer_ipv4(p); tcp_hdr = dp_packet_inner_l4(p); -ip_hdr = dp_packet_inner_l3(p); tnl = true; if (outer_ipv4) { outer_ip_id = ntohs(((struct ip_header *) dp_packet_l3(p))->ip_id); } if (dp_packet_hwol_is_ipv4(p)) { +ip_hdr = dp_packet_inner_l3(p); inner_ip_id = ntohs(ip_hdr->ip_id); } } else { outer_ipv4 = dp_packet_hwol_is_ipv4(p); tcp_hdr = dp_packet_l4(p); -ip_hdr = dp_packet_l3(p); tnl = false; if (outer_ipv4) { +ip_hdr = dp_packet_l3(p); outer_ip_id = ntohs(ip_hdr->ip_id); } } > +tnl = true; > + > +if (outer_ipv4) { > +outer_ip_id = ntohs(((struct ip_header *) > dp_packet_l3(p))->ip_id); > +} > +if (dp_packet_hwol_is_ipv4(p)) { > +inner_ip_id = ntohs(ip_hdr->ip_id); > +} > +} else { > +outer_ipv4 = dp_packet_hwol_is_ipv4(p); > +tcp_hdr = dp_packet_l4(p); > ip_hdr = dp_packet_l3(p); > -ip_id = ntohs(ip_hdr->ip_id); > +tnl = false; > + > +if (outer_ipv4) { > +outer_ip_id = ntohs(ip_hdr->ip_id); > +} > } > [snip] Once this patch is in, we can re-enable RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO / RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO for Intel drivers. Outer udp checksum is broken for them (until we have a new 23.11.x release with driver fixes) but, from my previous tests, the tunnel tso feature works fine with net/ice at least, when (tunnel) csum is not requested. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn v2 2/2] tests: Skip memory error in DPDK tests.
On Thu, Jun 27, 2024 at 10:38 AM Ales Musil wrote: > > Add skip for memory error produced by DPDK on ARM with 2MB hugepages > which isn't harmful [0]. > > [0] http://mails.dpdk.org/archives/dev/2024-June/296764.html > > Suggested-by: David Marchand > Signed-off-by: Ales Musil I would mention ARM in the commit title (maybe it can be tweaked when applying), but otherwise the patch lgtm. Reviewed-by: David Marchand -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn v2 1/2] ci: Save some DPDK compilation time.
On Thu, Jun 27, 2024 at 10:38 AM Ales Musil wrote: > > Add more options that will shave some compilation time off DPDK by > skipping unused parts of the code. > > Suggested-by: David Marchand > Signed-off-by: Ales Musil Reviewed-by: David Marchand -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] Userspace: Software fallback for UDP encapsulated TCP segmentation.
On Thu, Jun 27, 2024 at 4:04 PM David Marchand wrote: > @@ -926,6 +926,17 @@ netdev_send(struct netdev *netdev, int qid, > struct dp_packet_batch *batch, > return netdev_send_tso(netdev, qid, batch, > concurrent_txq); > } > } > +} else if (!(netdev_flags & NETDEV_TX_OFFLOAD_OUTER_UDP_CKSUM)) { > +DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { > +if (!dp_packet_hwol_is_tso(packet) && > +!dp_packet_hwol_is_tunnel_vxlan(packet) && > +!dp_packet_hwol_is_tunnel_geneve(packet)) { > +continue; Erm, less buggy: +if (!dp_packet_hwol_is_tso(packet) || +(!dp_packet_hwol_is_tunnel_vxlan(packet) && + !dp_packet_hwol_is_tunnel_geneve(packet))) { +continue; +} > +} > +if (dp_packet_hwol_is_outer_udp_cksum(packet)) { > +return netdev_send_tso(netdev, qid, batch, > concurrent_txq); > +} > +} > } > } -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] Userspace: Software fallback for UDP encapsulated TCP segmentation.
On Fri, Jun 21, 2024 at 5:07 PM David Marchand wrote: > net/mlx5 is claiming (geneve|vxlan)_tso_offload while not supporting > outer udp checksum. This may be a set of capabilities we can use if, for example, no udp checksum is requested. How about adding: # git diff diff --git a/lib/netdev.c b/lib/netdev.c index 1d59bbe5d..429c86a22 100644 --- a/lib/netdev.c +++ b/lib/netdev.c @@ -926,6 +926,17 @@ netdev_send(struct netdev *netdev, int qid, struct dp_packet_batch *batch, return netdev_send_tso(netdev, qid, batch, concurrent_txq); } } +} else if (!(netdev_flags & NETDEV_TX_OFFLOAD_OUTER_UDP_CKSUM)) { +DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { +if (!dp_packet_hwol_is_tso(packet) && +!dp_packet_hwol_is_tunnel_vxlan(packet) && +!dp_packet_hwol_is_tunnel_geneve(packet)) { +continue; +} +if (dp_packet_hwol_is_outer_udp_cksum(packet)) { +return netdev_send_tso(netdev, qid, batch, concurrent_txq); +} +} } } Here are some numbers on mlx5, with the udp length fix I mentionned previously + this hunk above: # Sw GSO when csum enabled - ovs-vsctl del-port tunnel0 -- add-port br-int tunnel0 -- set interface tunnel0 type=vxlan options:remote_ip=172.31.3.1 options:key=0 options:csum=true Connecting to host 172.31.2.2, port 5201 Reverse mode, remote host 172.31.2.2 is sending [ 5] local 172.31.2.1 port 58748 connected to 172.31.2.2 port 5201 [ ID] Interval Transfer Bitrate [ 5] 0.00-1.00 sec 580 MBytes 4.87 Gbits/sec - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-1.04 sec 582 MBytes 4.70 Gbits/sec0 sender [ 5] 0.00-1.00 sec 580 MBytes 4.87 Gbits/sec receiver iperf Done. # Hw GSO when csum disabled - ovs-vsctl del-port tunnel0 -- add-port br-int tunnel0 -- set interface tunnel0 type=vxlan options:remote_ip=172.31.3.1 options:key=0 options:csum=false Connecting to host 172.31.2.2, port 5201 Reverse mode, remote host 172.31.2.2 is sending [ 5] local 172.31.2.1 port 39080 connected to 172.31.2.2 port 5201 [ ID] Interval Transfer Bitrate [ 5] 0.00-1.00 sec 1.50 GBytes 12.9 Gbits/sec - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-1.04 sec 1.50 GBytes 12.4 Gbits/sec 213 sender [ 5] 0.00-1.00 sec 1.50 GBytes 12.9 Gbits/sec receiver iperf Done. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] ci: Save some compilation for DPDK and skip error on ARM.
On Thu, Jun 27, 2024 at 10:16 AM Ales Musil wrote: > > Add more options that will shave some compilation time of DPDK by > removing unused parts of the code. Also add skip for memory error > produced by DPDK on ARM with 2MB hugepages which isn't harmful [0]. > > [0] http://mails.dpdk.org/archives/dev/2024-June/296764.html > > Suggested-by: David Marchand > Signed-off-by: Ales Musil I would split this in two patches (compilation changes in one patch, error log in another) but overall the changes lgtm. One nit below. > --- > tests/system-dpdk-macros.at | 3 ++- > utilities/containers/prepare.sh | 10 ++ > 2 files changed, 8 insertions(+), 5 deletions(-) > > diff --git a/tests/system-dpdk-macros.at b/tests/system-dpdk-macros.at > index 440908af7..39d19f366 100644 > --- a/tests/system-dpdk-macros.at > +++ b/tests/system-dpdk-macros.at > @@ -77,7 +77,8 @@ $1";/EAL: No \(available\|free\) .*hugepages reported/d > /netdev_linux.*obtaining netdev stats via vport failed/d > /qdisc_create_multiq(): Could not add multiq qdisc (2): No such file or > directory/d > /tap_mp_req_on_rxtx(): Failed to send start req to secondary/d > -/tap_nl_dump_ext_ack(): Specified qdisc kind is unknown/d"]) > +/tap_nl_dump_ext_ack(): Specified qdisc kind is unknown/d > +/EAL: eal_memalloc_alloc_seg_bulk(): couldn't find suitable memseg_list/d"]) Nit: insert this "alphabetically", close to other EAL: ignored messages. > AT_CHECK([:; $2]) > ]) > -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] Userspace: Software fallback for UDP encapsulated TCP segmentation.
Some additional comments. On Thu, Jun 13, 2024 at 4:54 AM Mike Pattrick wrote: > @@ -89,14 +96,19 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch > **batches) > static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); > struct dp_packet_batch *curr_batch = *batches; > struct tcp_header *tcp_hdr; > +struct udp_header *tnl_hdr; > struct ip_header *ip_hdr; > +uint16_t inner_ip_id = 0; > +uint16_t outer_ip_id = 0; > struct dp_packet *seg; > +const char *data_pos; > uint16_t tcp_offset; > uint16_t tso_segsz; > uint32_t tcp_seq; > -uint16_t ip_id; > +bool outer_ipv4; > int hdr_len; > int seg_len; > +bool tnl; > > tso_segsz = dp_packet_get_tso_segsz(p); > if (!tso_segsz) { > @@ -105,20 +117,38 @@ dp_packet_gso(struct dp_packet *p, struct > dp_packet_batch **batches) > return false; > } > > -tcp_hdr = dp_packet_l4(p); > -tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl); > -tcp_seq = ntohl(get_16aligned_be32(&tcp_hdr->tcp_seq)); > -hdr_len = ((char *) dp_packet_l4(p) - (char *) dp_packet_eth(p)) > - + tcp_offset * 4; > -ip_id = 0; > -if (dp_packet_hwol_is_ipv4(p)) { > +if (dp_packet_hwol_is_tunnel_vxlan(p) || > +dp_packet_hwol_is_tunnel_geneve(p)) { > +data_pos = dp_packet_get_inner_tcp_payload(p); > +outer_ipv4 = dp_packet_hwol_is_outer_ipv4(p); > +tcp_hdr = dp_packet_inner_l4(p); > +ip_hdr = dp_packet_inner_l3(p); > +tnl = true; > + > +if (outer_ipv4) { > +outer_ip_id = ntohs(((struct ip_header *) > dp_packet_l3(p))->ip_id); > +} > +if (dp_packet_hwol_is_ipv4(p)) { > +inner_ip_id = ntohs(ip_hdr->ip_id); > +} > +} else { > +data_pos = dp_packet_get_tcp_payload(p); > +outer_ipv4 = dp_packet_hwol_is_ipv4(p); > +tcp_hdr = dp_packet_l4(p); > ip_hdr = dp_packet_l3(p); > -ip_id = ntohs(ip_hdr->ip_id); > +tnl = false; > + > +if (outer_ipv4) { > +outer_ip_id = ntohs(ip_hdr->ip_id); > +} > } > > +tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl); > +tcp_seq = ntohl(get_16aligned_be32(&tcp_hdr->tcp_seq)); > +hdr_len = ((char *) tcp_hdr - (char *) dp_packet_eth(p)) > + + tcp_offset * 4; > const char *data_tail = (char *) dp_packet_tail(p) > - dp_packet_l2_pad_size(p); > -const char *data_pos = dp_packet_get_tcp_payload(p); Not a strong opinion but data_pos init could be kept here. const char *data_pos = (char *) tcp_hdr + tcp_offset * 4; > int n_segs = dp_packet_gso_nr_segs(p); > > for (int i = 0; i < n_segs; i++) { > @@ -130,14 +160,35 @@ dp_packet_gso(struct dp_packet *p, struct > dp_packet_batch **batches) > seg = dp_packet_gso_seg_new(p, hdr_len, data_pos, seg_len); > data_pos += seg_len; > > +if (tnl) { > +/* Update tunnel inner L3 header. */ > +if (dp_packet_hwol_is_ipv4(seg)) { > +ip_hdr = dp_packet_inner_l3(seg); > +ip_hdr->ip_tot_len = htons(sizeof *ip_hdr + > + dp_packet_inner_l4_size(seg)); ip_hdr->ip_tot_len = htons(dp_packet_inner_l3_size(seg)); is easier to read (and more consistent with the ipv6 block that follows). > +ip_hdr->ip_id = htons(inner_ip_id); > +ip_hdr->ip_csum = 0; > +inner_ip_id++; > +} else { > +struct ovs_16aligned_ip6_hdr *ip6_hdr; > + > +ip6_hdr = dp_packet_inner_l3(seg); > +ip6_hdr->ip6_ctlun.ip6_un1.ip6_un1_plen > += htons(dp_packet_inner_l3_size(seg) - sizeof *ip6_hdr); > +} > +} > + > /* Update L3 header. */ > -if (dp_packet_hwol_is_ipv4(seg)) { > +if (outer_ipv4) { > +uint16_t ip_tot_len; > ip_hdr = dp_packet_l3(seg); > -ip_hdr->ip_tot_len = htons(sizeof *ip_hdr + > - dp_packet_l4_size(seg)); > -ip_hdr->ip_id = htons(ip_id); > +tnl_hdr = dp_packet_l4(seg); > +ip_tot_len = sizeof *ip_hdr + dp_packet_l4_size(seg); ip_hdr->ip_tot_len = htons(dp_packet_l3_size(seg)); > + ip_hdr->ip_tot_len = htons(ip_tot_len); > +ip_hdr->ip_id = htons(outer_ip_id); > ip_hdr->ip_csum = 0; > -ip_id++; > +tnl_hdr->udp_len = htons(ip_tot_len - IP_HEADER_LEN); > +outer_ip_id++; > } else { > struct ovs_16aligned_ip6_hdr *ip6_hdr = dp_packet_l3(seg); > -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] Userspace: Software fallback for UDP encapsulated TCP segmentation.
13,16 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) netdev_get_name(&dev->up)); } -if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO) { +if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM && +info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO) { dev->hw_ol_features |= NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD; } else { VLOG_WARN("%s: Tx Vxlan tunnel TSO offload is not supported.", netdev_get_name(&dev->up)); } -if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO) { +if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM && +info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO) { dev->hw_ol_features |= NETDEV_TX_GENEVE_TNL_TSO_OFFLOAD; } else { VLOG_WARN("%s: Tx Geneve tunnel TSO offload is not supported.", > +DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { > +if (!dp_packet_hwol_is_tso(packet)) { > +continue; > +} > +if (dp_packet_hwol_is_tunnel_vxlan(packet) || > +dp_packet_hwol_is_tunnel_geneve(packet)) { > +return netdev_send_tso(netdev, qid, batch, > concurrent_txq); > } > -return netdev_send_tso(netdev, qid, batch, concurrent_txq); > } > } > } -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v4] netdev-dpdk: Use LSC interrupt mode.
Querying link status may get delayed for an undeterministic (long) time with mlx5 ports. This is a consequence of the mlx5 driver calling ethtool kernel API and getting stuck on the kernel RTNL lock while some other operation is in progress under this lock. One impact for long link status query is that it is called under the bond lock taken in write mode periodically in bond_run(). In parallel, datapath threads may block requesting to read bonding related info (like for example in bond_check_admissibility()). The LSC interrupt mode is available with many DPDK drivers and is used by default with testpmd. It seems safe enough to switch on this feature by default in OVS. We keep the per interface option to disable this feature in case of an unforeseen bug. Signed-off-by: David Marchand Reviewed-by: Robin Jarry Acked-by: Mike Pattrick --- Changes since v3: - updated logging in case of error, Changes since v2: - fixed typo in NEWS, Changes since v1: - (early) fail when interrupt lsc is requested by user but not supported by the driver, - otherwise, log a debug message if user did not request interrupt mode, --- Documentation/topics/dpdk/phy.rst | 4 ++-- NEWS | 3 +++ lib/netdev-dpdk.c | 13 - vswitchd/vswitch.xml | 8 4 files changed, 21 insertions(+), 7 deletions(-) diff --git a/Documentation/topics/dpdk/phy.rst b/Documentation/topics/dpdk/phy.rst index efd168cba8..eefc25613d 100644 --- a/Documentation/topics/dpdk/phy.rst +++ b/Documentation/topics/dpdk/phy.rst @@ -546,8 +546,8 @@ the firmware every time to fulfil this request. Note that not all PMD drivers support LSC interrupts. -The default configuration is polling mode. To set interrupt mode, option -``dpdk-lsc-interrupt`` has to be set to ``true``. +The default configuration is interrupt mode. To set polling mode, option +``dpdk-lsc-interrupt`` has to be set to ``false``. Command to set interrupt mode for a specific interface:: $ ovs-vsctl set interface options:dpdk-lsc-interrupt=true diff --git a/NEWS b/NEWS index 5ae0108d55..d05f2d0f89 100644 --- a/NEWS +++ b/NEWS @@ -9,6 +9,9 @@ Post-v3.3.0 https://github.com/openvswitch/ovs.git - DPDK: * OVS validated with DPDK 23.11.1. + * Link status changes are now handled via interrupt mode if the DPDK + driver supports it. It is possible to revert to polling mode by setting + per interface 'options:dpdk-lsc-interrupt' to 'false'. v3.3.0 - 16 Feb 2024 diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 0fa37d5145..c42144b091 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2397,7 +2397,18 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, } } -lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false); +lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true); +if (lsc_interrupt_mode && !(*info.dev_flags & RTE_ETH_DEV_INTR_LSC)) { +if (smap_get(args, "dpdk-lsc-interrupt")) { +VLOG_WARN_BUF(errp, "'%s': link status interrupt is not " + "supported.", netdev_get_name(netdev)); +err = EINVAL; +goto out; +} +VLOG_DBG("interface '%s': not enabling link status interrupt.", + netdev_get_name(netdev)); +lsc_interrupt_mode = false; +} if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) { dev->requested_lsc_interrupt_mode = lsc_interrupt_mode; netdev_request_reconfigure(netdev); diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 8a1b607d71..e3afb78a4e 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -4647,12 +4647,12 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \ - Set this value to true to configure interrupt mode for - Link State Change (LSC) detection instead of poll mode for the DPDK - interface. + Set this value to false to configure poll mode for + Link State Change (LSC) detection instead of interrupt mode for the + DPDK interface. - If this value is not set, poll mode is configured. + If this value is not set, interrupt mode is configured. This parameter has an effect only on netdev dpdk interfaces. -- 2.45.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3] netdev-dpdk: Use LSC interrupt mode.
On Mon, Jun 17, 2024 at 10:11 AM Ilya Maximets wrote: > > On 6/17/24 09:46, David Marchand wrote: > > On Fri, Jun 14, 2024 at 6:22 PM Ilya Maximets wrote: > >>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > >>> index 0fa37d5145..a260bc8485 100644 > >>> --- a/lib/netdev-dpdk.c > >>> +++ b/lib/netdev-dpdk.c > >>> @@ -2397,7 +2397,18 @@ netdev_dpdk_set_config(struct netdev *netdev, > >>> const struct smap *args, > >>> } > >>> } > >>> > >>> -lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", > >>> false); > >>> +lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true); > >>> +if (lsc_interrupt_mode && !(*info.dev_flags & RTE_ETH_DEV_INTR_LSC)) > >>> { > >>> +if (smap_get(args, "dpdk-lsc-interrupt")) { > >>> +VLOG_ERR("interface '%s': link status interrupt is not > >>> supported.", > >>> + netdev_get_name(netdev)); > >> > >> Since we're exiting with an error set, the message should be buffered > >> into errp instead, so it can be visible in the database record and > >> returned as a result of the ovs-vsctl. > >> > >> Also, we're using WARN level for all other configuration issues, so we > >> should do that here as well. ERR is usually some sort of internal error. > >> And we're usually just using "%s: ..." and not "interface '%s': ...". > > > > Ok for ERR vs WARN. > > > > For the rest, well, I copied the logs right before. > > > > vf_mac = smap_get(args, "dpdk-vf-mac"); > > if (vf_mac) { > > struct eth_addr mac; > > > > if (!dpdk_port_is_representor(dev)) { > > VLOG_WARN("'%s' is trying to set the VF MAC '%s' " > > "but 'options:dpdk-vf-mac' is only supported for " > > "VF representors.", > > netdev_get_name(netdev), vf_mac); > > } else if (!eth_addr_from_string(vf_mac, &mac)) { > > VLOG_WARN("interface '%s': cannot parse VF MAC '%s'.", > > netdev_get_name(netdev), vf_mac); > > } else if (eth_addr_is_multicast(mac)) { > > VLOG_WARN("interface '%s': cannot set VF MAC to multicast " > > "address '%s'.", netdev_get_name(netdev), vf_mac); > > } else if (!eth_addr_equals(dev->requested_hwaddr, mac)) { > > dev->requested_hwaddr = mac; > > netdev_request_reconfigure(netdev); > > } > > } > > > > lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true); > > > > > > So I'll fix the dpdk-vf-mac stuff (and double check the rest of this > > function), then go with your suggestion for this added log of mine. > > > > > > We must not initialize errp if we do not fail with error, otherwise we leak > the memory. VF mac code does not fail the configuration, so we only log the > warning. All the paths that fail should set errp instead. > Talk about obvious... I'll fix my stuff and leave the rest untouched. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3] netdev-dpdk: Use LSC interrupt mode.
On Fri, Jun 14, 2024 at 6:22 PM Ilya Maximets wrote: > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > > index 0fa37d5145..a260bc8485 100644 > > --- a/lib/netdev-dpdk.c > > +++ b/lib/netdev-dpdk.c > > @@ -2397,7 +2397,18 @@ netdev_dpdk_set_config(struct netdev *netdev, const > > struct smap *args, > > } > > } > > > > -lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false); > > +lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true); > > +if (lsc_interrupt_mode && !(*info.dev_flags & RTE_ETH_DEV_INTR_LSC)) { > > +if (smap_get(args, "dpdk-lsc-interrupt")) { > > +VLOG_ERR("interface '%s': link status interrupt is not > > supported.", > > + netdev_get_name(netdev)); > > Since we're exiting with an error set, the message should be buffered > into errp instead, so it can be visible in the database record and > returned as a result of the ovs-vsctl. > > Also, we're using WARN level for all other configuration issues, so we > should do that here as well. ERR is usually some sort of internal error. > And we're usually just using "%s: ..." and not "interface '%s': ...". Ok for ERR vs WARN. For the rest, well, I copied the logs right before. vf_mac = smap_get(args, "dpdk-vf-mac"); if (vf_mac) { struct eth_addr mac; if (!dpdk_port_is_representor(dev)) { VLOG_WARN("'%s' is trying to set the VF MAC '%s' " "but 'options:dpdk-vf-mac' is only supported for " "VF representors.", netdev_get_name(netdev), vf_mac); } else if (!eth_addr_from_string(vf_mac, &mac)) { VLOG_WARN("interface '%s': cannot parse VF MAC '%s'.", netdev_get_name(netdev), vf_mac); } else if (eth_addr_is_multicast(mac)) { VLOG_WARN("interface '%s': cannot set VF MAC to multicast " "address '%s'.", netdev_get_name(netdev), vf_mac); } else if (!eth_addr_equals(dev->requested_hwaddr, mac)) { dev->requested_hwaddr = mac; netdev_request_reconfigure(netdev); } } lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true); So I'll fix the dpdk-vf-mac stuff (and double check the rest of this function), then go with your suggestion for this added log of mine. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] netdev-dpdk: Use LSC interrupt mode.
On Fri, Jun 14, 2024 at 10:17 AM Robin Jarry wrote: > > David Marchand, Jun 14, 2024 at 08:48: > > Querying link status may get delayed for an undeterministic (long) time > > with mlx5 ports. This is a consequence of the mlx5 driver calling ethtool > > kernel API and getting stuck on the kernel RTNL lock while some other > > operation is in progress under this lock. > > > > One impact for long link status query is that it is called under the bond > > lock taken in write mode periodically in bond_run(). > > In parallel, datapath threads may block requesting to read bonding related > > info (like for example in bond_check_admissibility()). > > > > The LSC interrupt mode is available with many DPDK drivers and is used by > > default with testpmd. > > > > It seems safe enough to switch on this feature by default in OVS. > > We keep the per interface option to disable this feature in case of an > > unforeseen bug. > > > > Signed-off-by: David Marchand > > --- > > Changes since v1: > > - (early) fail when interrupt lsc is requested by user but not supported > > by the driver, > > - otherwise, log a debug message if user did not request interrupt mode, > > This looks good to me. Is there a chance that this could be backported > to the 3.1 branch? Well, this changes a default behavior, so it may not be welcome in a stable branch. On the other hand, this helps resolving (not always trivial) random packet drops, so it is worth considering. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] netdev-dpdk: Use LSC interrupt mode.
On Fri, Jun 14, 2024 at 10:40 AM Robin Jarry wrote: > > David Marchand, Jun 14, 2024 at 08:48: > > diff --git a/NEWS b/NEWS > > index 5ae0108d55..1e19beb793 100644 > > --- a/NEWS > > +++ b/NEWS > > @@ -9,6 +9,9 @@ Post-v3.3.0 > > https://github.com/openvswitch/ovs.git > > - DPDK: > > * OVS validated with DPDK 23.11.1. > > + * Link status changes are now handled via interrupt mode if the DPDK > > + driver supports it. It is possible to revert to polling mode by > > setting > > + per interface 'dpdk-lsc-interrupt' other_config to 'false'. > > Nit pick: this setting is options:dpdk-lsc-interrupt not in > other_config. > Oh, indeed.. fixed in v3. Thanks Robin. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3] netdev-dpdk: Use LSC interrupt mode.
Querying link status may get delayed for an undeterministic (long) time with mlx5 ports. This is a consequence of the mlx5 driver calling ethtool kernel API and getting stuck on the kernel RTNL lock while some other operation is in progress under this lock. One impact for long link status query is that it is called under the bond lock taken in write mode periodically in bond_run(). In parallel, datapath threads may block requesting to read bonding related info (like for example in bond_check_admissibility()). The LSC interrupt mode is available with many DPDK drivers and is used by default with testpmd. It seems safe enough to switch on this feature by default in OVS. We keep the per interface option to disable this feature in case of an unforeseen bug. Signed-off-by: David Marchand Reviewed-by: Robin Jarry Acked-by: Mike Pattrick --- Changes since v2: - fixed typo in NEWS, Changes since v1: - (early) fail when interrupt lsc is requested by user but not supported by the driver, - otherwise, log a debug message if user did not request interrupt mode, --- Documentation/topics/dpdk/phy.rst | 4 ++-- NEWS | 3 +++ lib/netdev-dpdk.c | 13 - vswitchd/vswitch.xml | 8 4 files changed, 21 insertions(+), 7 deletions(-) diff --git a/Documentation/topics/dpdk/phy.rst b/Documentation/topics/dpdk/phy.rst index efd168cba8..eefc25613d 100644 --- a/Documentation/topics/dpdk/phy.rst +++ b/Documentation/topics/dpdk/phy.rst @@ -546,8 +546,8 @@ the firmware every time to fulfil this request. Note that not all PMD drivers support LSC interrupts. -The default configuration is polling mode. To set interrupt mode, option -``dpdk-lsc-interrupt`` has to be set to ``true``. +The default configuration is interrupt mode. To set polling mode, option +``dpdk-lsc-interrupt`` has to be set to ``false``. Command to set interrupt mode for a specific interface:: $ ovs-vsctl set interface options:dpdk-lsc-interrupt=true diff --git a/NEWS b/NEWS index 5ae0108d55..d05f2d0f89 100644 --- a/NEWS +++ b/NEWS @@ -9,6 +9,9 @@ Post-v3.3.0 https://github.com/openvswitch/ovs.git - DPDK: * OVS validated with DPDK 23.11.1. + * Link status changes are now handled via interrupt mode if the DPDK + driver supports it. It is possible to revert to polling mode by setting + per interface 'options:dpdk-lsc-interrupt' to 'false'. v3.3.0 - 16 Feb 2024 diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 0fa37d5145..a260bc8485 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2397,7 +2397,18 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, } } -lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false); +lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true); +if (lsc_interrupt_mode && !(*info.dev_flags & RTE_ETH_DEV_INTR_LSC)) { +if (smap_get(args, "dpdk-lsc-interrupt")) { +VLOG_ERR("interface '%s': link status interrupt is not supported.", + netdev_get_name(netdev)); +err = EINVAL; +goto out; +} +VLOG_DBG("interface '%s': not enabling link status interrupt.", + netdev_get_name(netdev)); +lsc_interrupt_mode = false; +} if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) { dev->requested_lsc_interrupt_mode = lsc_interrupt_mode; netdev_request_reconfigure(netdev); diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 8a1b607d71..e3afb78a4e 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -4647,12 +4647,12 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \ - Set this value to true to configure interrupt mode for - Link State Change (LSC) detection instead of poll mode for the DPDK - interface. + Set this value to false to configure poll mode for + Link State Change (LSC) detection instead of interrupt mode for the + DPDK interface. - If this value is not set, poll mode is configured. + If this value is not set, interrupt mode is configured. This parameter has an effect only on netdev dpdk interfaces. -- 2.45.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v2] netdev-dpdk: Use LSC interrupt mode.
Querying link status may get delayed for an undeterministic (long) time with mlx5 ports. This is a consequence of the mlx5 driver calling ethtool kernel API and getting stuck on the kernel RTNL lock while some other operation is in progress under this lock. One impact for long link status query is that it is called under the bond lock taken in write mode periodically in bond_run(). In parallel, datapath threads may block requesting to read bonding related info (like for example in bond_check_admissibility()). The LSC interrupt mode is available with many DPDK drivers and is used by default with testpmd. It seems safe enough to switch on this feature by default in OVS. We keep the per interface option to disable this feature in case of an unforeseen bug. Signed-off-by: David Marchand --- Changes since v1: - (early) fail when interrupt lsc is requested by user but not supported by the driver, - otherwise, log a debug message if user did not request interrupt mode, --- Documentation/topics/dpdk/phy.rst | 4 ++-- NEWS | 3 +++ lib/netdev-dpdk.c | 13 - vswitchd/vswitch.xml | 8 4 files changed, 21 insertions(+), 7 deletions(-) diff --git a/Documentation/topics/dpdk/phy.rst b/Documentation/topics/dpdk/phy.rst index efd168cba8..eefc25613d 100644 --- a/Documentation/topics/dpdk/phy.rst +++ b/Documentation/topics/dpdk/phy.rst @@ -546,8 +546,8 @@ the firmware every time to fulfil this request. Note that not all PMD drivers support LSC interrupts. -The default configuration is polling mode. To set interrupt mode, option -``dpdk-lsc-interrupt`` has to be set to ``true``. +The default configuration is interrupt mode. To set polling mode, option +``dpdk-lsc-interrupt`` has to be set to ``false``. Command to set interrupt mode for a specific interface:: $ ovs-vsctl set interface options:dpdk-lsc-interrupt=true diff --git a/NEWS b/NEWS index 5ae0108d55..1e19beb793 100644 --- a/NEWS +++ b/NEWS @@ -9,6 +9,9 @@ Post-v3.3.0 https://github.com/openvswitch/ovs.git - DPDK: * OVS validated with DPDK 23.11.1. + * Link status changes are now handled via interrupt mode if the DPDK + driver supports it. It is possible to revert to polling mode by setting + per interface 'dpdk-lsc-interrupt' other_config to 'false'. v3.3.0 - 16 Feb 2024 diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 0fa37d5145..a260bc8485 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2397,7 +2397,18 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, } } -lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false); +lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true); +if (lsc_interrupt_mode && !(*info.dev_flags & RTE_ETH_DEV_INTR_LSC)) { +if (smap_get(args, "dpdk-lsc-interrupt")) { +VLOG_ERR("interface '%s': link status interrupt is not supported.", + netdev_get_name(netdev)); +err = EINVAL; +goto out; +} +VLOG_DBG("interface '%s': not enabling link status interrupt.", + netdev_get_name(netdev)); +lsc_interrupt_mode = false; +} if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) { dev->requested_lsc_interrupt_mode = lsc_interrupt_mode; netdev_request_reconfigure(netdev); diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 8a1b607d71..e3afb78a4e 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -4647,12 +4647,12 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \ - Set this value to true to configure interrupt mode for - Link State Change (LSC) detection instead of poll mode for the DPDK - interface. + Set this value to false to configure poll mode for + Link State Change (LSC) detection instead of interrupt mode for the + DPDK interface. - If this value is not set, poll mode is configured. + If this value is not set, interrupt mode is configured. This parameter has an effect only on netdev dpdk interfaces. -- 2.44.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] netdev-dpdk: Use LSC interrupt mode.
On Thu, Jun 13, 2024 at 4:32 PM Mike Pattrick wrote: > > On Thu, Jun 13, 2024 at 5:59 AM David Marchand > wrote: > > > > Querying link status may get delayed for an undeterministic (long) time > > with mlx5 ports. This is a consequence of the mlx5 driver calling ethtool > > kernel API and getting stuck on the kernel RTNL lock while some other > > operation is in progress under this lock. > > > > One impact for long link status query is that it is called under the bond > > lock taken in write mode periodically in bond_run(). > > In parallel, datapath threads may block requesting to read bonding related > > info (like for example in bond_check_admissibility()). > > > > The LSC interrupt mode is available with many DPDK drivers and is used by > > default with testpmd. > > > > It seems safe enough to switch on this feature by default in OVS. > > We keep the per interface option to disable this feature in case of an > > unforeseen bug. > > > > Signed-off-by: David Marchand > > --- > > Documentation/topics/dpdk/phy.rst | 4 ++-- > > NEWS | 3 +++ > > lib/netdev-dpdk.c | 7 ++- > > vswitchd/vswitch.xml | 8 > > 4 files changed, 15 insertions(+), 7 deletions(-) > > > > diff --git a/Documentation/topics/dpdk/phy.rst > > b/Documentation/topics/dpdk/phy.rst > > index efd168cba8..eefc25613d 100644 > > --- a/Documentation/topics/dpdk/phy.rst > > +++ b/Documentation/topics/dpdk/phy.rst > > @@ -546,8 +546,8 @@ the firmware every time to fulfil this request. > > > > Note that not all PMD drivers support LSC interrupts. > > > > -The default configuration is polling mode. To set interrupt mode, option > > -``dpdk-lsc-interrupt`` has to be set to ``true``. > > +The default configuration is interrupt mode. To set polling mode, option > > +``dpdk-lsc-interrupt`` has to be set to ``false``. > > > > Command to set interrupt mode for a specific interface:: > > $ ovs-vsctl set interface options:dpdk-lsc-interrupt=true > > diff --git a/NEWS b/NEWS > > index 5ae0108d55..1e19beb793 100644 > > --- a/NEWS > > +++ b/NEWS > > @@ -9,6 +9,9 @@ Post-v3.3.0 > > https://github.com/openvswitch/ovs.git > > - DPDK: > > * OVS validated with DPDK 23.11.1. > > + * Link status changes are now handled via interrupt mode if the DPDK > > + driver supports it. It is possible to revert to polling mode by > > setting > > + per interface 'dpdk-lsc-interrupt' other_config to 'false'. > > > > > > v3.3.0 - 16 Feb 2024 > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > > index 0fa37d5145..833d852319 100644 > > --- a/lib/netdev-dpdk.c > > +++ b/lib/netdev-dpdk.c > > @@ -2397,7 +2397,12 @@ netdev_dpdk_set_config(struct netdev *netdev, const > > struct smap *args, > > } > > } > > > > -lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false); > > +lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true); > > +if (lsc_interrupt_mode && !(*info.dev_flags & RTE_ETH_DEV_INTR_LSC)) { > > +VLOG_INFO("interface '%s': link status interrupt is not > > supported.", > > + netdev_get_name(netdev)); > > +lsc_interrupt_mode = false; > > +} > > I did a quick grep of the dpdk source tree and noticed the following > drivers had this define: > > bnxt dpaa dpaa2 ena failsafe hns3 mlx4 mlx5 netvsc tap vhost virtio For "normal" PCI drivers, the flag to look for is RTE_PCI_DRV_INTR_LSC $ git grep -A1 RTE_PCI_DRV_INTR_LSC lib/ethdev/ lib/ethdev/ethdev_pci.h:if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC) lib/ethdev/ethdev_pci.h- eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_LSC; > > This may have been an invalid methodology, but I'm noticing some > drivers for very common network cards are missing. Is there a risk > here of starting to print these info messages all the time for a large > number of users who never set this feature in the first place? $ for dir in drivers/net/*; do git grep -q -E 'RTE_PCI_DRV_INTR_LSC|RTE_ETH_DEV_INTR_LSC' $dir && echo "OK for "$dir || echo " Missing for "$dir; done | grep -c OK 34 $ for dir in drivers/net/*; do git grep -q -E 'RTE_PCI_DRV_INTR_LSC|RTE_ETH_DEV_INTR_LSC' $dir && echo "OK for "$dir || echo " Missing for "$dir; done | grep -c Missing 25 The majority of net drivers has this feature. > > It may be preferable to continue defaulting to true, but only print > the error if the flag was explicitly set. I had considered this option, but I went with the simpler approach :-). Ok since you thought of it too, let me update for v2. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] netdev-dpdk: Use LSC interrupt mode.
Querying link status may get delayed for an undeterministic (long) time with mlx5 ports. This is a consequence of the mlx5 driver calling ethtool kernel API and getting stuck on the kernel RTNL lock while some other operation is in progress under this lock. One impact for long link status query is that it is called under the bond lock taken in write mode periodically in bond_run(). In parallel, datapath threads may block requesting to read bonding related info (like for example in bond_check_admissibility()). The LSC interrupt mode is available with many DPDK drivers and is used by default with testpmd. It seems safe enough to switch on this feature by default in OVS. We keep the per interface option to disable this feature in case of an unforeseen bug. Signed-off-by: David Marchand --- Documentation/topics/dpdk/phy.rst | 4 ++-- NEWS | 3 +++ lib/netdev-dpdk.c | 7 ++- vswitchd/vswitch.xml | 8 4 files changed, 15 insertions(+), 7 deletions(-) diff --git a/Documentation/topics/dpdk/phy.rst b/Documentation/topics/dpdk/phy.rst index efd168cba8..eefc25613d 100644 --- a/Documentation/topics/dpdk/phy.rst +++ b/Documentation/topics/dpdk/phy.rst @@ -546,8 +546,8 @@ the firmware every time to fulfil this request. Note that not all PMD drivers support LSC interrupts. -The default configuration is polling mode. To set interrupt mode, option -``dpdk-lsc-interrupt`` has to be set to ``true``. +The default configuration is interrupt mode. To set polling mode, option +``dpdk-lsc-interrupt`` has to be set to ``false``. Command to set interrupt mode for a specific interface:: $ ovs-vsctl set interface options:dpdk-lsc-interrupt=true diff --git a/NEWS b/NEWS index 5ae0108d55..1e19beb793 100644 --- a/NEWS +++ b/NEWS @@ -9,6 +9,9 @@ Post-v3.3.0 https://github.com/openvswitch/ovs.git - DPDK: * OVS validated with DPDK 23.11.1. + * Link status changes are now handled via interrupt mode if the DPDK + driver supports it. It is possible to revert to polling mode by setting + per interface 'dpdk-lsc-interrupt' other_config to 'false'. v3.3.0 - 16 Feb 2024 diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 0fa37d5145..833d852319 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2397,7 +2397,12 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, } } -lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false); +lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true); +if (lsc_interrupt_mode && !(*info.dev_flags & RTE_ETH_DEV_INTR_LSC)) { +VLOG_INFO("interface '%s': link status interrupt is not supported.", + netdev_get_name(netdev)); +lsc_interrupt_mode = false; +} if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) { dev->requested_lsc_interrupt_mode = lsc_interrupt_mode; netdev_request_reconfigure(netdev); diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 8a1b607d71..e3afb78a4e 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -4647,12 +4647,12 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \ - Set this value to true to configure interrupt mode for - Link State Change (LSC) detection instead of poll mode for the DPDK - interface. + Set this value to false to configure poll mode for + Link State Change (LSC) detection instead of interrupt mode for the + DPDK interface. - If this value is not set, poll mode is configured. + If this value is not set, interrupt mode is configured. This parameter has an effect only on netdev dpdk interfaces. -- 2.44.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 1/1] netdev-dpdk: Check pending reset when adding device.
On Wed, Jun 12, 2024 at 4:33 PM Kevin Traynor wrote: > > When a device reset interrupt event (RTE_ETH_EVENT_INTR_RESET) > is detected for a DPDK device added to OVS, a device reset is > performed. > > If a device reset interrupt event is detected for a device before > it is added to OVS, device reset is not called. > > If that device is later attempted to be added to OVS, it may fail > while being configured if it is still pending a reset as pending > reset is not checked when adding a device. > > A simple way to force a reset event from the ice driver for an > iavf device is to set the mac address after binding iavf dev to > vfio but before adding to OVS. (note: should not be set like this > in normal case). e.g. > > $ echo 2 > /sys/class/net/ens3f0/device/sriov_numvfs > $ ./devbind.py -b vfio-pci :d8:01.1 > $ ip link set ens3f0 vf 1 mac 26:ab:e6:6f:79:4d > $ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk \ > options:dpdk-devargs=:d8:01.1 > > |dpdk|ERR|Port1 dev_configure = -1 > |netdev_dpdk|WARN|Interface dpdk0 eth_dev setup error > Operation not permitted > |netdev_dpdk|ERR|Interface dpdk0(rxq:1 txq:5 lsc interrupt mode:false) > configure error: Operation not permitted > |dpif_netdev|ERR|Failed to set interface dpdk0 new configuration > > Add a check if there was any previous device reset interrupt events > when a device is added to OVS. If there was, perform the reset > before continuing with the rest of the configuration. > > netdev_dpdk_pending_reset[] already tracks device reset interrupt > events for all devices, so it can be reused to check if there is a > reset needed during configuration of newly added devices. By extending > it's usage, dev->reset_needed is no longer needed. > Maybe add a Fixes: and I think we should backport this. > Signed-off-by: Kevin Traynor > --- > lib/netdev-dpdk.c | 20 > 1 file changed, 12 insertions(+), 8 deletions(-) > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > index 0fa37d514..9ceb0d972 100644 > --- a/lib/netdev-dpdk.c > +++ b/lib/netdev-dpdk.c > @@ -465,7 +465,6 @@ struct netdev_dpdk { > /* If true, rte_eth_dev_start() was successfully called */ > bool started; > -bool reset_needed; > -/* 1 pad byte here. */ > struct eth_addr hwaddr; > +/* 2 pad byte here. */ Nit: bytes? > int mtu; > int socket_id; > @@ -1532,5 +1531,4 @@ common_construct(struct netdev *netdev, dpdk_port_t > port_no, > dev->attached = false; > dev->started = false; > -dev->reset_needed = false; > > ovsrcu_init(&dev->qos_conf, NULL); > @@ -2155,5 +2153,4 @@ netdev_dpdk_run(const struct netdev_class *netdev_class > OVS_UNUSED) > continue; > } > -atomic_store_relaxed(&netdev_dpdk_pending_reset[port_id], false); > > ovs_mutex_lock(&dpdk_mutex); > @@ -2161,5 +2158,4 @@ netdev_dpdk_run(const struct netdev_class *netdev_class > OVS_UNUSED) > if (dev) { > ovs_mutex_lock(&dev->mutex); > -dev->reset_needed = true; > netdev_request_reconfigure(&dev->up); > VLOG_DBG_RL(&rl, "%s: Device reset requested.", > @@ -6073,4 +6069,5 @@ netdev_dpdk_reconfigure(struct netdev *netdev) > { > struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > +bool pending_reset; > bool try_rx_steer; > int err = 0; > @@ -6084,4 +6081,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev) > } > > +atomic_read_relaxed(&netdev_dpdk_pending_reset[dev->port_id], > +&pending_reset); > + > if (netdev->n_txq == dev->requested_n_txq > && netdev->n_rxq == dev->requested_n_rxq > @@ -6093,5 +6093,5 @@ netdev_dpdk_reconfigure(struct netdev *netdev) > && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr) > && dev->socket_id == dev->requested_socket_id > -&& dev->started && !dev->reset_needed) { > +&& dev->started && !pending_reset) { > /* Reconfiguration is unnecessary */ > > @@ -6102,8 +6102,12 @@ retry: > dpdk_rx_steer_unconfigure(dev); > > -if (dev->reset_needed) { > +if (pending_reset) { > + /* > + * Set false before reset to avoid missing a new reset interrupt > event > + * in a race with event callback. > + */ > +atomic_store_relaxed(&netdev_dpdk_pending_reset[dev->port_id], > false); > rte_eth_dev_reset(dev->port_id); > if_notifier_manual_report(); > -dev->reset_needed = false; > } else { > rte_eth_dev_stop(dev->port_id); The patch looks good to me. Reviewed-by: David Marchand -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] dpdk: Check other_config:dpdk-extra for '--lcores'.
Hello Kevin, On Wed, Jun 12, 2024 at 7:09 PM Kevin Traynor wrote: > > Currently dpdk lcore args for DPDK EAL init can be generated or it > can be written directly by the user through dpdk-extra. > > If dpdk-extra does not contain '-l' or '-c', a '-l' argument with > a core list for DPDK EAL init will be generated. > > The '--lcores' argument should also be checked, as if it is used in > dpdk-extra, a '-l' is still generated and that causes DPDK EAL init > to fail: > > |9|dpdk|INFO|EAL ARGS: ovs-vswitchd --lcores 0@18 --in-memory -l 0. > |00012|dpdk|ERR|EAL: Option -l is ignored, because (--lcore) is set! Not important for this patch, but there is a typo in the DPDK error message, should be --lcores. Can you send a fix on DPDK side? > > Add check for '--lcores' in dpdk-extra config and don't add '-l' if > it is detected: > > |9|dpdk|INFO|EAL ARGS: ovs-vswitchd --lcores 0@8 --in-memory. > > Fixes: 0a0f39df1d5a ("netdev-dpdk: Add support for DPDK 16.07") I am not clear why you point at this commit. --lcores were introduced in DPDK v2.0.0. In 0a0f39df1d5a (which swaps DPDK from 16.04 to 16.07), only -c is handled so passing additional -l and --lcores via dpdk-extra would already be a problem. I would point at: Fixes: 543342a41cbc ("DPDK: add support for v2.0.0") > Signed-off-by: Kevin Traynor With the Fixes: tag sorted out, you can add: Reviewed-by: David Marchand -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH 2/2] ci: Restore vhost-user unit tests in check-dpdk.
Following a rework in the DPDK cache, the PATH variable is incorrectly set, resulting in dpdk-testpmd not being available. Because of this, vhost-user unit tests were skipped in GHA runs. Fixes: 8893e24d9d09 ("dpdk: Update to use v23.11.") Signed-off-by: David Marchand --- .ci/linux-build.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh index bf9d6241d5..702feeb3bb 100755 --- a/.ci/linux-build.sh +++ b/.ci/linux-build.sh @@ -25,7 +25,7 @@ function install_dpdk() export PKG_CONFIG_PATH=$DPDK_LIB/pkgconfig/:$PKG_CONFIG_PATH # Expose dpdk binaries. -export PATH=$(pwd)/dpdk-dir/build/bin:$PATH +export PATH=$(pwd)/dpdk-dir/bin:$PATH if [ ! -f "${VERSION_FILE}" ]; then echo "Could not find DPDK in $DPDK_INSTALL_DIR" -- 2.44.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH 1/2] system-dpdk: Fix socket conflict when starting testpmd.
The DPDK telemetry library tries to connect to existing socket files so that it knows whether it can take over them. As was reported by Christian, following a fix in DPDK that got backported in v23.11.1, vhost-user unit tests that have both OVS and testpmd running at the same time reveal a conflict over the telemetry socket. This conflict shows up as an error message in OVS logs which makes those tests fail in the CI: 2024-06-06T13:03:38.351Z|1|dpdk|ERR|TELEMETRY: Socket write base info to client failed The EAL file-prefix option affects both the directory where DPDK stores running files (like the telemetry socket) and how files backing hugepages are named (when in non --in-memory mode). Configure (again) this prefix so that testpmd runs in a dedicated directory. Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2024-June/414545.html Fixes: c488f28a0eaf ("system-dpdk: Don't require hugetlbfs.") Signed-off-by: David Marchand --- tests/system-dpdk-macros.at | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/system-dpdk-macros.at b/tests/system-dpdk-macros.at index 7cf9bac170..f8ba766739 100644 --- a/tests/system-dpdk-macros.at +++ b/tests/system-dpdk-macros.at @@ -102,7 +102,7 @@ m4_define([OVS_DPDK_CHECK_TESTPMD], m4_define([OVS_DPDK_START_TESTPMD], [AT_CHECK([lscpu], [], [stdout]) AT_CHECK([cat stdout | grep "NUMA node(s)" | awk '{c=1; while (c++<$(3)) {printf "512,"}; print "512"}' > NUMA_NODE]) - eal_options="$DPDK_EAL_OPTIONS --in-memory --socket-mem="$(cat NUMA_NODE)" --single-file-segments --no-pci" + eal_options="$DPDK_EAL_OPTIONS --in-memory --socket-mem="$(cat NUMA_NODE)" --single-file-segments --no-pci --file-prefix testpmd" options="$1" test "$options" != "${options%% -- *}" || options="$options -- " eal_options="$eal_options ${options%% -- *}" -- 2.44.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] system-dpdk: Tolerate new warnings 23.11.1/24.03.
On Thu, Jun 6, 2024 at 2:41 PM Christian Ehrhardt wrote: > On Thu, Jun 6, 2024 at 2:25 PM David Marchand > wrote: > > On Thu, Jun 6, 2024 at 9:33 AM wrote: > > > > > > From: Christian Ehrhardt > > > > > > DPDK fixed counting of telemetry clients in 24.03 [1] which was also > > > backported to 23.11.1 [2]. Due to that the dpdk related openvswitch > > > tests now fail in the following cases: > > > 4: OVS-DPDK - ping vhost-user ports FAILED (system-dpdk.at:148) > > > 5: OVS-DPDK - ping vhost-user-client ports FAILED (system-dpdk.at:224) > > > 16: OVS-DPDK - MTU increase vport port FAILED (system-dpdk.at:609) > > > 17: OVS-DPDK - MTU decrease vport port FAILED (system-dpdk.at:651) > > > 20: OVS-DPDK - MTU upper bound vport port FAILED (system-dpdk.at:767) > > > 21: OVS-DPDK - MTU lower bound vport port FAILED (system-dpdk.at:809) > > > > > > This is due to a new error being logged under these conditions: > > > dpdk|ERR|TELEMETRY: Socket write base info to client failed > > > > I had some trouble understanding what was happening here because this > > error log should be triggered on connection to the telemetry socket. > > > > So I expected this error log affect any dpdk unit tests, but only the > > tests with vhost-user were affected... > > Attaching gdb, I could confirm something was indeed connecting to the > > telemetry socket and I realised it was due to telemetry running in the > > testpmd dpdk instance that tries to be smart and connect/quit on > > existing sockets. > > > > So we can waive this error log but I think a better fix would be to > > avoid this conflict between testpmd and ovs (regardless of telemetry). > > Something like: > > > > diff --git a/tests/system-dpdk-macros.at b/tests/system-dpdk-macros.at > > index 7cf9bac170..f8ba766739 100644 > > --- a/tests/system-dpdk-macros.at > > +++ b/tests/system-dpdk-macros.at > > @@ -102,7 +102,7 @@ m4_define([OVS_DPDK_CHECK_TESTPMD], > > m4_define([OVS_DPDK_START_TESTPMD], > >[AT_CHECK([lscpu], [], [stdout]) > > AT_CHECK([cat stdout | grep "NUMA node(s)" | awk '{c=1; while > > (c++<$(3)) {printf "512,"}; print "512"}' > NUMA_NODE]) > > - eal_options="$DPDK_EAL_OPTIONS --in-memory --socket-mem="$(cat > > NUMA_NODE)" --single-file-segments --no-pci" > > + eal_options="$DPDK_EAL_OPTIONS --in-memory --socket-mem="$(cat > > NUMA_NODE)" --single-file-segments --no-pci --file-prefix testpmd" > > options="$1" > > test "$options" != "${options%% -- *}" || options="$options -- " > > eal_options="$eal_options ${options%% -- *}" > > I'm happy with that approach as well, > feel free to submit for real and scrap mine. Ok, I'll post a series as we also have a problem with GHA (not) running DPDK tests. Btw, on the DPDK side, I sent a patch to lower this telemetry log message to debug level. https://patchwork.dpdk.org/project/dpdk/patch/20240606122654.2889214-1-david.march...@redhat.com/ -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] system-dpdk: Tolerate new warnings 23.11.1/24.03.
On Thu, Jun 6, 2024 at 9:33 AM wrote: > > From: Christian Ehrhardt > > DPDK fixed counting of telemetry clients in 24.03 [1] which was also > backported to 23.11.1 [2]. Due to that the dpdk related openvswitch > tests now fail in the following cases: > 4: OVS-DPDK - ping vhost-user ports FAILED (system-dpdk.at:148) > 5: OVS-DPDK - ping vhost-user-client ports FAILED (system-dpdk.at:224) > 16: OVS-DPDK - MTU increase vport port FAILED (system-dpdk.at:609) > 17: OVS-DPDK - MTU decrease vport port FAILED (system-dpdk.at:651) > 20: OVS-DPDK - MTU upper bound vport port FAILED (system-dpdk.at:767) > 21: OVS-DPDK - MTU lower bound vport port FAILED (system-dpdk.at:809) > > This is due to a new error being logged under these conditions: > dpdk|ERR|TELEMETRY: Socket write base info to client failed I had some trouble understanding what was happening here because this error log should be triggered on connection to the telemetry socket. So I expected this error log affect any dpdk unit tests, but only the tests with vhost-user were affected... Attaching gdb, I could confirm something was indeed connecting to the telemetry socket and I realised it was due to telemetry running in the testpmd dpdk instance that tries to be smart and connect/quit on existing sockets. So we can waive this error log but I think a better fix would be to avoid this conflict between testpmd and ovs (regardless of telemetry). Something like: diff --git a/tests/system-dpdk-macros.at b/tests/system-dpdk-macros.at index 7cf9bac170..f8ba766739 100644 --- a/tests/system-dpdk-macros.at +++ b/tests/system-dpdk-macros.at @@ -102,7 +102,7 @@ m4_define([OVS_DPDK_CHECK_TESTPMD], m4_define([OVS_DPDK_START_TESTPMD], [AT_CHECK([lscpu], [], [stdout]) AT_CHECK([cat stdout | grep "NUMA node(s)" | awk '{c=1; while (c++<$(3)) {printf "512,"}; print "512"}' > NUMA_NODE]) - eal_options="$DPDK_EAL_OPTIONS --in-memory --socket-mem="$(cat NUMA_NODE)" --single-file-segments --no-pci" + eal_options="$DPDK_EAL_OPTIONS --in-memory --socket-mem="$(cat NUMA_NODE)" --single-file-segments --no-pci --file-prefix testpmd" options="$1" test "$options" != "${options%% -- *}" || options="$options -- " eal_options="$eal_options ${options%% -- *}" Opinions? -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] Userspace: Software fallback for UDP encapsulated TCP segmentation.
On Wed, Feb 21, 2024 at 5:09 AM Mike Pattrick wrote: > @@ -105,20 +115,35 @@ dp_packet_gso(struct dp_packet *p, struct > dp_packet_batch **batches) > return false; > } > > -tcp_hdr = dp_packet_l4(p); > -tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl); > -tcp_seq = ntohl(get_16aligned_be32(&tcp_hdr->tcp_seq)); > -hdr_len = ((char *) dp_packet_l4(p) - (char *) dp_packet_eth(p)) > - + tcp_offset * 4; > -ip_id = 0; > -if (dp_packet_hwol_is_ipv4(p)) { > +if (dp_packet_hwol_is_tunnel_vxlan(p) || > +dp_packet_hwol_is_tunnel_geneve(p)) { > +data_pos = dp_packet_get_inner_tcp_payload(p); > +outer_ipv4 = dp_packet_hwol_is_outer_ipv4(p); > +tcp_hdr = dp_packet_inner_l4(p); > +ip_hdr = dp_packet_inner_l3(p); > +tnl = true; > +if (outer_ipv4) { > +ip_id = ntohs(((struct ip_header *) dp_packet_l3(p))->ip_id); > +} else if (dp_packet_hwol_is_ipv4(p)) { > +ip_id = ntohs(ip_hdr->ip_id); > +} > +} else { > +data_pos = dp_packet_get_tcp_payload(p); > +outer_ipv4 = dp_packet_hwol_is_ipv4(p); > +tcp_hdr = dp_packet_l4(p); > ip_hdr = dp_packet_l3(p); > -ip_id = ntohs(ip_hdr->ip_id); > +tnl = false; > +if (outer_ipv4) { > +ip_id = ntohs(ip_hdr->ip_id); > +} > } > > +tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl); > +tcp_seq = ntohl(get_16aligned_be32(&tcp_hdr->tcp_seq)); > +hdr_len = ((char *) tcp_hdr - (char *) dp_packet_eth(p)) > + + tcp_offset * 4; > const char *data_tail = (char *) dp_packet_tail(p) > - dp_packet_l2_pad_size(p); > -const char *data_pos = dp_packet_get_tcp_payload(p); > int n_segs = dp_packet_gso_nr_segs(p); > > for (int i = 0; i < n_segs; i++) { > @@ -130,8 +155,26 @@ dp_packet_gso(struct dp_packet *p, struct > dp_packet_batch **batches) > seg = dp_packet_gso_seg_new(p, hdr_len, data_pos, seg_len); > data_pos += seg_len; > > +if (tnl) { > +/* Update tunnel L3 header. */ > +if (dp_packet_hwol_is_ipv4(seg)) { > +ip_hdr = dp_packet_inner_l3(seg); > +ip_hdr->ip_tot_len = htons(sizeof *ip_hdr + > + dp_packet_inner_l4_size(seg)); > +ip_hdr->ip_id = htons(ip_id); > +ip_hdr->ip_csum = 0; > +ip_id++; Hum, it seems with this change, we are tying outer and inner (in the ipv4 in ipv4 case) ip id together. I am unclear what the Linux kernel does or what is acceptable, but I prefer to mention. I also noticed ip_id is incremented a second time later in this loop which I find suspicious. I wonder if OVS should instead increment outer and inner ip ids (copied from original packet data) by 'i'. Like ip_hdr->ip_id = htons(ntohs(ip_hdr->ip_id) + i) ? And adjusting the outer udp seems missing (length and checksum if needed), which is probably what Ilya meant. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v4 6/6] netdev-dpdk: Refactor tunnel checksum offloading.
All informations required for checksum offloading can be deducted by already tracked dp_packet l3_ofs, l4_ofs, inner_l3_ofs and inner_l4_ofs fields. Remove DPDK specific l[2-4]_len from generic OVS code. netdev-dpdk code then fills mbuf specifics step by step: - outer_l2_len and outer_l3_len are needed for tunneling (and below features), - l2_len and l3_len are needed for IP and L4 checksum (and below features), - l4_len and tso_segsz are needed when doing TSO, Signed-off-by: David Marchand --- lib/dp-packet.h | 37 -- lib/netdev-dpdk.c | 35 ++--- lib/netdev-native-tnl.c | 50 + 3 files changed, 27 insertions(+), 95 deletions(-) diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 3622764c47..a75b1c5cdb 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -604,25 +604,6 @@ dp_packet_get_nd_payload(const struct dp_packet *b) } #ifdef DPDK_NETDEV -static inline void -dp_packet_set_l2_len(struct dp_packet *b, size_t l2_len) -{ -b->mbuf.l2_len = l2_len; -} - -static inline void -dp_packet_set_l3_len(struct dp_packet *b, size_t l3_len) -{ -b->mbuf.l3_len = l3_len; -} - -static inline void -dp_packet_set_l4_len(struct dp_packet *b, size_t l4_len) -{ -b->mbuf.l4_len = l4_len; -} - - static inline uint64_t * dp_packet_ol_flags_ptr(const struct dp_packet *b) { @@ -642,24 +623,6 @@ dp_packet_flow_mark_ptr(const struct dp_packet *b) } #else -static inline void -dp_packet_set_l2_len(struct dp_packet *b OVS_UNUSED, size_t l2_len OVS_UNUSED) -{ -/* There is no implementation. */ -} - -static inline void -dp_packet_set_l3_len(struct dp_packet *b OVS_UNUSED, size_t l3_len OVS_UNUSED) -{ -/* There is no implementation. */ -} - -static inline void -dp_packet_set_l4_len(struct dp_packet *b OVS_UNUSED, size_t l4_len OVS_UNUSED) -{ -/* There is no implementation. */ -} - static inline uint32_t * dp_packet_ol_flags_ptr(const struct dp_packet *b) { diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index bda3fa94b6..0fa37d5145 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2583,6 +2583,9 @@ static bool netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) { struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf); +void *l2; +void *l3; +void *l4; const uint64_t all_inner_requests = (RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_L4_MASK | @@ -2612,11 +2615,6 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) return true; } -ovs_assert(dp_packet_l4(pkt)); - -/* If packet is vxlan or geneve tunnel packet, calculate outer - * l2 len and outer l3 len. Inner l2/l3/l4 len are calculated - * before. */ const uint64_t tunnel_type = mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK; if (OVS_UNLIKELY(tunnel_type && tunnel_type != RTE_MBUF_F_TX_TUNNEL_GENEVE && @@ -2634,6 +2632,11 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) (char *) dp_packet_eth(pkt); mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) - (char *) dp_packet_l3(pkt); + +/* Inner L2 length must account for the tunnel header length. */ +l2 = dp_packet_l4(pkt); +l3 = dp_packet_inner_l3(pkt); +l4 = dp_packet_inner_l4(pkt); } else { /* If no outer offloading is requested, clear outer marks. */ mbuf->ol_flags &= ~all_outer_marks; @@ -2641,8 +2644,9 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) mbuf->outer_l3_len = 0; /* Skip outer headers. */ -mbuf->l2_len += (char *) dp_packet_l4(pkt) - -(char *) dp_packet_eth(pkt); +l2 = dp_packet_eth(pkt); +l3 = dp_packet_inner_l3(pkt); +l4 = dp_packet_inner_l4(pkt); } } else { if (tunnel_type) { @@ -2662,22 +2666,27 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) } mbuf->outer_l2_len = 0; mbuf->outer_l3_len = 0; -mbuf->l2_len = (char *) dp_packet_l3(pkt) - - (char *) dp_packet_eth(pkt); -mbuf->l3_len = (char *) dp_packet_l4(pkt) - - (char *) dp_packet_l3(pkt); + +l2 = dp_packet_eth(pkt); +l3 = dp_packet_l3(pkt); +l4 = dp_packet_l4(pkt); } +ovs_assert(l4); + +mbuf->l2_len = (char *) l3 - (char *) l2; +mbuf->l3_len = (char *) l4 - (char *) l3; + if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) { -struct tcp_header *th = dp_packet_l4(pkt); +struct tcp_header *th = l4; uint16_t link_tso_segsz;
[ovs-dev] [PATCH v4 5/6] netdev-dpdk: Use guest TSO segmentation size hint.
In a typical setup like: guest A <-virtio-> OVS A <-vxlan-> OVS B <-virtio-> guest B TSO packets from guest A are segmented against the OVS A physical port mtu adjusted by the vxlan tunnel header size, regardless of guest A interface mtu. As an example, let's say guest A and guest B mtu are set to 1500 bytes. OVS A and OVS B physical ports mtu are set to 1600 bytes. Guest A will request TCP segmentation for 1448 bytes segments. On the other hand, OVS A will request 1498 bytes segments to the HW. This results in OVS B dropping packets because decapsulated packets are larger than the vhost-user port (serving guest B) mtu. 2024-04-17T14:13:01.239Z|2|netdev_dpdk(pmd-c03/id:7)|WARN|vhost0: Too big size 1564 max_packet_len 1518 vhost-user ports expose a guest mtu by filling mbuf->tso_segsz. Use it as a hint. This may result in segments (on the wire) slightly shorter than the optimal size. Reported-at: https://github.com/openvswitch/ovs-issues/issues/321 Signed-off-by: David Marchand --- Changes since v3: - removed check on mbuf->tso_segsz == 0, --- lib/netdev-dpdk.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 0dfd685467..bda3fa94b6 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2670,14 +2670,19 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) { struct tcp_header *th = dp_packet_l4(pkt); +uint16_t link_tso_segsz; int hdr_len; if (tunnel_type) { -mbuf->tso_segsz = dev->mtu - mbuf->l2_len - mbuf->l3_len - - mbuf->l4_len - mbuf->outer_l3_len; +link_tso_segsz = dev->mtu - mbuf->l2_len - mbuf->l3_len - + mbuf->l4_len - mbuf->outer_l3_len; } else { mbuf->l4_len = TCP_OFFSET(th->tcp_ctl) * 4; -mbuf->tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len; +link_tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len; +} + +if (mbuf->tso_segsz > link_tso_segsz) { +mbuf->tso_segsz = link_tso_segsz; } hdr_len = mbuf->l2_len + mbuf->l3_len + mbuf->l4_len; -- 2.44.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v4 4/6] netdev-dpdk: Refactor TSO request code.
Every L3, L4 checksum offload or TSO requires a (outer) L3 length to be provided. This length is computed via dp_packet_l4(pkt) that is always set when such offloads are requested in OVS. Getting a th == NULL is a bug in OVS, so an assert() is more appropriate. Besides, filling l4_len and tso_segsz only matters to TSO, so there is no need to set it for other L4 checksum offloading requests. Signed-off-by: David Marchand --- Changes since v3: - reworded commitlog, --- lib/netdev-dpdk.c | 36 +++- 1 file changed, 11 insertions(+), 25 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 0c624d5d38..0dfd685467 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2583,7 +2583,6 @@ static bool netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) { struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf); -struct tcp_header *th; const uint64_t all_inner_requests = (RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_L4_MASK | @@ -2613,6 +2612,8 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) return true; } +ovs_assert(dp_packet_l4(pkt)); + /* If packet is vxlan or geneve tunnel packet, calculate outer * l2 len and outer l3 len. Inner l2/l3/l4 len are calculated * before. */ @@ -2666,22 +2667,10 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) mbuf->l3_len = (char *) dp_packet_l4(pkt) - (char *) dp_packet_l3(pkt); } -th = dp_packet_l4(pkt); if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) { -if (!th) { -VLOG_WARN_RL(&rl, "%s: TCP Segmentation without L4 header" - " pkt len: %"PRIu32"", dev->up.name, mbuf->pkt_len); -return false; -} -} - -if ((mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK) == RTE_MBUF_F_TX_TCP_CKSUM) { -if (!th) { -VLOG_WARN_RL(&rl, "%s: TCP offloading without L4 header" - " pkt len: %"PRIu32"", dev->up.name, mbuf->pkt_len); -return false; -} +struct tcp_header *th = dp_packet_l4(pkt); +int hdr_len; if (tunnel_type) { mbuf->tso_segsz = dev->mtu - mbuf->l2_len - mbuf->l3_len - @@ -2691,16 +2680,13 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) mbuf->tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len; } -if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) { -int hdr_len = mbuf->l2_len + mbuf->l3_len + mbuf->l4_len; -if (OVS_UNLIKELY((hdr_len + - mbuf->tso_segsz) > dev->max_packet_len)) { -VLOG_WARN_RL(&rl, "%s: Oversized TSO packet. hdr: %"PRIu32", " - "gso: %"PRIu32", max len: %"PRIu32"", - dev->up.name, hdr_len, mbuf->tso_segsz, - dev->max_packet_len); -return false; -} +hdr_len = mbuf->l2_len + mbuf->l3_len + mbuf->l4_len; +if (OVS_UNLIKELY((hdr_len + mbuf->tso_segsz) > dev->max_packet_len)) { +VLOG_WARN_RL(&rl, "%s: Oversized TSO packet. hdr: %"PRIu32", " + "gso: %"PRIu32", max len: %"PRIu32"", + dev->up.name, hdr_len, mbuf->tso_segsz, + dev->max_packet_len); +return false; } } -- 2.44.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v4 3/6] netdev-dpdk: Fix inner checksum when outer is not supported.
If outer checksum is not supported and OVS already set L3/L4 outer checksums in the packet, no outer mark should be left in ol_flags (as it confuses some driver, like net/ixgbe). l2_len must be adjusted to account for the tunnel header. Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.") Signed-off-by: David Marchand Acked-by: Kevin Traynor --- lib/netdev-dpdk.c | 19 +++ 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 7c910cac8e..0c624d5d38 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2628,10 +2628,21 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) } if (tunnel_type && (mbuf->ol_flags & all_inner_requests)) { -mbuf->outer_l2_len = (char *) dp_packet_l3(pkt) - - (char *) dp_packet_eth(pkt); -mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) - - (char *) dp_packet_l3(pkt); +if (mbuf->ol_flags & all_outer_requests) { +mbuf->outer_l2_len = (char *) dp_packet_l3(pkt) - + (char *) dp_packet_eth(pkt); +mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) - + (char *) dp_packet_l3(pkt); +} else { +/* If no outer offloading is requested, clear outer marks. */ +mbuf->ol_flags &= ~all_outer_marks; +mbuf->outer_l2_len = 0; +mbuf->outer_l3_len = 0; + +/* Skip outer headers. */ +mbuf->l2_len += (char *) dp_packet_l4(pkt) - +(char *) dp_packet_eth(pkt); +} } else { if (tunnel_type) { /* No inner offload is requested, fallback to non tunnel -- 2.44.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v4 2/6] netdev-dpdk: Disable outer UDP checksum for net/iavf.
Same as the commit 6f93d8e62f13 ("netdev-dpdk: Disable outer UDP checksum offload for ice/i40e driver."), disable outer UDP checksum and related offloads for net/iavf. Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.") Signed-off-by: David Marchand Acked-by: Kevin Traynor --- Note: - DPDK (in progress) fixes can be found at: https://patchwork.dpdk.org/project/dpdk/list/?series=31780&state=* --- lib/netdev-dpdk.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index e15b491ed5..7c910cac8e 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -1355,12 +1355,14 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) } if (!strcmp(info.driver_name, "net_ice") -|| !strcmp(info.driver_name, "net_i40e")) { +|| !strcmp(info.driver_name, "net_i40e") +|| !strcmp(info.driver_name, "net_iavf")) { /* FIXME: Driver advertises the capability but doesn't seem * to actually support it correctly. Can remove this once * the driver is fixed on DPDK side. */ VLOG_INFO("%s: disabled Tx outer udp checksum offloads for a " - "net/ice or net/i40e port.", netdev_get_name(&dev->up)); + "net/ice, net/i40e or net/iavf port.", + netdev_get_name(&dev->up)); info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM; info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO; info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO; -- 2.44.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v4 1/6] netdev-dpdk: Fallback to non tunnel checksum offloading.
The outer checksum offloading API in DPDK is ambiguous and was implemented by Intel folks in their drivers with the assumption that any outer offloading always goes with an inner offloading request. With net/i40e and net/ice drivers, in the case of encapsulating a ARP packet in a vxlan tunnel (which results in requesting outer ip checksum with a tunnel context but no inner offloading request), a Tx failure is triggered, associated with a port MDD event. 2024-03-27T16:02:07.084Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event To avoid this situation, if no checksum or segmentation offloading is requested on the inner part of a packet, fallback to "normal" (non outer) offloading request. Reported-at: https://github.com/openvswitch/ovs-issues/issues/321 Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.") Fixes: f81d782c1906 ("netdev-native-tnl: Mark all vxlan/geneve packets as tunneled.") Signed-off-by: David Marchand Acked-by: Kevin Traynor --- Changes since v2: - kept offloads disabled for net/i40e and net/ice as this patch does not fix outer udp checksum (a DPDK fix is required), - updated commitlog with details to reproduce the issue, - adjusted indent, Changes since v1: - reset inner marks before converting outer requests, - fixed some coding style, --- lib/netdev-dpdk.c | 71 +++ 1 file changed, 41 insertions(+), 30 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 7b84c858e9..e15b491ed5 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2583,16 +2583,18 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf); struct tcp_header *th; -const uint64_t all_requests = (RTE_MBUF_F_TX_IP_CKSUM | - RTE_MBUF_F_TX_L4_MASK | - RTE_MBUF_F_TX_OUTER_IP_CKSUM | - RTE_MBUF_F_TX_OUTER_UDP_CKSUM | - RTE_MBUF_F_TX_TCP_SEG); -const uint64_t all_marks = (RTE_MBUF_F_TX_IPV4 | -RTE_MBUF_F_TX_IPV6 | -RTE_MBUF_F_TX_OUTER_IPV4 | -RTE_MBUF_F_TX_OUTER_IPV6 | -RTE_MBUF_F_TX_TUNNEL_MASK); +const uint64_t all_inner_requests = (RTE_MBUF_F_TX_IP_CKSUM | + RTE_MBUF_F_TX_L4_MASK | + RTE_MBUF_F_TX_TCP_SEG); +const uint64_t all_outer_requests = (RTE_MBUF_F_TX_OUTER_IP_CKSUM | + RTE_MBUF_F_TX_OUTER_UDP_CKSUM); +const uint64_t all_requests = all_inner_requests | all_outer_requests; +const uint64_t all_inner_marks = (RTE_MBUF_F_TX_IPV4 | + RTE_MBUF_F_TX_IPV6); +const uint64_t all_outer_marks = (RTE_MBUF_F_TX_OUTER_IPV4 | + RTE_MBUF_F_TX_OUTER_IPV6 | + RTE_MBUF_F_TX_TUNNEL_MASK); +const uint64_t all_marks = all_inner_marks | all_outer_marks; if (!(mbuf->ol_flags & all_requests)) { /* No offloads requested, no marks should be set. */ @@ -2613,34 +2615,43 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) * l2 len and outer l3 len. Inner l2/l3/l4 len are calculated * before. */ const uint64_t tunnel_type = mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK; -if (tunnel_type == RTE_MBUF_F_TX_TUNNEL_GENEVE || -tunnel_type == RTE_MBUF_F_TX_TUNNEL_VXLAN) { -mbuf->outer_l2_len = (char *) dp_packet_l3(pkt) - - (char *) dp_packet_eth(pkt); -mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) - - (char *) dp_packet_l3(pkt); - -/* If neither inner checksums nor TSO is requested, inner marks - * should not be set. */ -if (!(mbuf->ol_flags & (RTE_MBUF_F_TX_IP_CKSUM | -RTE_MBUF_F_TX_L4_MASK | -RTE_MBUF_F_TX_TCP_SEG))) { -mbuf->ol_flags &= ~(RTE_MBUF_F_TX_IPV4 | -RTE_MBUF_F_TX_IPV6); -} -} else if (OVS_UNLIKELY(tunnel_type)) { +if (OVS_UNLIKELY(tunnel_type && + tunnel_type != RTE_MBUF_F_TX_TUNNEL_GENEVE && + tunnel_type != RTE_MBUF_F_TX_TUNNEL_VXLAN)) { VLOG_WARN_RL(&rl, "%s: Unexpected tunnel type: %#"PRIx64, netdev_get_name(&dev->up), tunnel_type); netdev_dpdk_mbuf_dump(netdev_get_name(&dev->up), "Packet with unexpected tunnel type", mbuf); return false; +} + +if (tunnel_type && (mbuf->ol_flags & all_i
Re: [ovs-dev] [PATCH v3 6/6] netdev-dpdk: Refactor tunnel checksum offloading.
On Wed, May 15, 2024 at 2:11 PM Kevin Traynor wrote: > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > > index 1dad2ef833..31dd6f1d5a 100644 > > --- a/lib/netdev-dpdk.c > > +++ b/lib/netdev-dpdk.c > > @@ -2584,6 +2584,9 @@ static bool > > netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf > > *mbuf) > > { > > struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf); > > +void *l2; > > +void *l3; > > +void *l4; > > > > const uint64_t all_inner_requests = (RTE_MBUF_F_TX_IP_CKSUM | > > RTE_MBUF_F_TX_L4_MASK | > > @@ -2613,11 +2616,6 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk > > *dev, struct rte_mbuf *mbuf) > > return true; > > } > > > > -ovs_assert(dp_packet_l4(pkt)); > > - > > -/* If packet is vxlan or geneve tunnel packet, calculate outer > > - * l2 len and outer l3 len. Inner l2/l3/l4 len are calculated > > - * before. */ > > const uint64_t tunnel_type = mbuf->ol_flags & > > RTE_MBUF_F_TX_TUNNEL_MASK; > > if (OVS_UNLIKELY(tunnel_type && > > tunnel_type != RTE_MBUF_F_TX_TUNNEL_GENEVE && > > @@ -2635,6 +2633,11 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk > > *dev, struct rte_mbuf *mbuf) > > (char *) dp_packet_eth(pkt); > > mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) - > > (char *) dp_packet_l3(pkt); > > + > > > +/* Inner L2 length must account for the tunnel header length. > > */ > > +l2 = dp_packet_l4(pkt); > > Code looks ok to me, but it's tricky and the L2 settings with inner > requests are a bit unintuitive without a notepad and thinking from the > driver perspective backwards. Not sure there is much can be done to > mitigate that here, other than the comment you added. Unfortunately, I don't have a better idea. It was already unintuitive before this patch, but to make it worse, the logic was split across lib/netdev-dpdk.c and lib/netdev-native-tnl.c. Like for example this comment in dp_packet_tnl_ol_process(), which is DPDK specific. /* Attention please, tunnel inner l2 len is consist of udp header * len and tunnel header len and inner l2 len. */ > > Did you manage to test to confirm they're working as expected ? In general, I tested the series with CX6, E810 and ixgbe, with ipv4 traffic, ipv4 traffic tunneled in ipv4/ipv6 vxlan and ipv4 traffic tunneled in ipv4/ipv6 geneve. But I am not sure I covered every possible combinations. Specifically for this case you point at (outer + inner offloads), I tested CX6 with IPv4/IPv6 VxLAN and Geneve (for which I have traces in my bash history). With E810, I remember testing the same with the DPDK fixes, but I don't have a trace of it. I'll double check before sending a next revision. > > > +l3 = dp_packet_inner_l3(pkt); > > +l4 = dp_packet_inner_l4(pkt); > > see below > > > } else { > > /* If no outer offloading is requested, clear outer marks. */ > > mbuf->ol_flags &= ~all_outer_marks; > > @@ -2642,8 +2645,9 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, > > struct rte_mbuf *mbuf) > > mbuf->outer_l3_len = 0; > > > > /* Skip outer headers. */ > > -mbuf->l2_len += (char *) dp_packet_l4(pkt) - > > -(char *) dp_packet_eth(pkt); > > +l2 = dp_packet_eth(pkt); > > > +l3 = dp_packet_inner_l3(pkt); > > +l4 = dp_packet_inner_l4(pkt); > > You could move these outside the inner (pardon the pun) if else, but I > could understand if you prefer to set l2/l3/l4 together for better > readability ? Well, as you noted, this code is not trivial. I preferred to have all 3 pointers grouped, with a comment relating to the group. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 5/6] netdev-dpdk: Use guest TSO segmentation size hint.
On Wed, May 15, 2024 at 2:09 PM Kevin Traynor wrote: > > On 19/04/2024 15:06, David Marchand wrote: > > In a typical setup like: > > guest A <-virtio-> OVS A <-vxlan-> OVS B <-virtio-> guest B > > > > TSO packets from guest A are segmented against the OVS A physical port > > mtu adjusted by the vxlan tunnel header size, regardless of guest A > > interface mtu. > > > > As an example, let's say guest A and guest B mtu are set to 1500 bytes. > > OVS A and OVS B physical ports mtu are set to 1600 bytes. > > Guest A will request TCP segmentation for 1448 bytes segments. > > On the other hand, OVS A will request 1498 bytes segments to the HW. > > This results in OVS B dropping packets because decapsulated packets > > are larger than the vhost-user port (serving guest B) mtu. > > > > 2024-04-17T14:13:01.239Z|2|netdev_dpdk(pmd-c03/id:7)|WARN|vhost0: > > Too big size 1564 max_packet_len 1518 > > > > vhost-user ports expose a guest mtu by filling mbuf->tso_segsz. > > Use it as a hint. > > > > This may result in segments (on the wire) slightly shorter than the > > optimal size. > > > > Reported-at: https://github.com/openvswitch/ovs-issues/issues/321 > > Signed-off-by: David Marchand > > --- > > Note: > > As we trust the guest with this change, should we put a lower limit on > > mbuf->tso_segsz? > > > > There are some checks I looked at (e.g [0]), but it could be checked > here for an earlier drop i suppose...additional comment below > > [0] > https://git.dpdk.org/dpdk/tree/drivers/net/ice/ice_rxtx.c#n3754 I guess you meant https://git.dpdk.org/dpdk/tree/drivers/net/ice/ice_rxtx.c#n3818 And same in v23.11, there are checks at the prepare stage: https://git.dpdk.org/dpdk-stable/tree/drivers/net/ice/ice_rxtx.c?h=23.11#n3681 I had forgotten about those checks. There is no limit exposed per driver from DPDK, so the simpler for OVS is to trust them. > > > --- > > lib/netdev-dpdk.c | 11 --- > > 1 file changed, 8 insertions(+), 3 deletions(-) > > > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > > index 661269e4b6..1dad2ef833 100644 > > --- a/lib/netdev-dpdk.c > > +++ b/lib/netdev-dpdk.c > > @@ -2671,14 +2671,19 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk > > *dev, struct rte_mbuf *mbuf) > > > > if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) { > > struct tcp_header *th = dp_packet_l4(pkt); > > +uint16_t link_tso_segsz; > > int hdr_len; > > > > if (tunnel_type) { > > -mbuf->tso_segsz = dev->mtu - mbuf->l2_len - mbuf->l3_len - > > - mbuf->l4_len - mbuf->outer_l3_len; > > +link_tso_segsz = dev->mtu - mbuf->l2_len - mbuf->l3_len - > > + mbuf->l4_len - mbuf->outer_l3_len; > > } else { > > mbuf->l4_len = TCP_OFFSET(th->tcp_ctl) * 4; > > -mbuf->tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len; > > +link_tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len; > > +} > > + > > +if (!mbuf->tso_segsz || mbuf->tso_segsz > link_tso_segsz) { > > It seems like something is not right if the flag is set but tso_segsz is > 0. It is set by vhost lib from gso_size, but I don't see a validation > there either. At the time I added a check on the 0 value, I thought there was a case where RTE_MBUF_F_TX_TCP_SEG could be set with no segsz value. But as you mention, all setters of this flag (either in vhost or in OVS) set a segsz too. So with segsz always set, combined with the drivers check, OVS probably does not need any check on tso_segsz. I intend to remove this check in a next revision. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 4/6] netdev-dpdk: Refactor TSO request code.
Hello Kevin, Thanks for reviewing. On Fri, May 10, 2024 at 11:50 PM Kevin Traynor wrote: > > On 19/04/2024 15:06, David Marchand wrote: > > Replace check on th == NULL with an assert() because dp_packet_l4(pkt) > > is priorly used to compute (outer) L3 length. > > > > Besides, filling l4_len and tso_segsz only matters to TSO, so there is > > no need to check for other L4 checksum offloading requests. > > > > Signed-off-by: David Marchand > > --- > > lib/netdev-dpdk.c | 36 +++- > > 1 file changed, 11 insertions(+), 25 deletions(-) > > > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > > index 8b6a3ed189..661269e4b6 100644 > > --- a/lib/netdev-dpdk.c > > +++ b/lib/netdev-dpdk.c > > @@ -2584,7 +2584,6 @@ static bool > > netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf > > *mbuf) > > { > > struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf); > > -struct tcp_header *th; > > > > const uint64_t all_inner_requests = (RTE_MBUF_F_TX_IP_CKSUM | > > RTE_MBUF_F_TX_L4_MASK | > > @@ -2614,6 +2613,8 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, > > struct rte_mbuf *mbuf) > > return true; > > } > > > > +ovs_assert(dp_packet_l4(pkt)); > > I'm not clear why you want to change this from a warning/return > fail/drop to an assert ? From this point in the function, there is at least one request for checksum offloading pending. Any L3 (or higher) checksum requested by OVS means that the packet has been parsed/composed as either IP or IPv6 and packet->l4_ofs was set to point after the l3 header (with miniflow_extract / *_compose() helpers). So getting a NULL pointer for l4 here indicates a bug in OVS. An assert seems better than a warn/return that probably nobody notice(d). Did I miss a case where l4_ofs can be unset? > > Nit: should this be in the previous patch instead ? and I see it is > removed in a later patch. It is not supposed to be removed in the series. The last patch moves it later in the function. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] ci: Set platform parameter when building DPDK.
This change has no impact, since -Dmachine=default gets converted by DPDK into -Dplatform=generic (since v21.08, see the link to DPDK commit below). Yet, switch to explicitly setting -Dplatform and avoid the following warning: 2024-04-18T14:50:16.8001092Z config/meson.build:113: WARNING: The "machine" option is deprecated. Please use "cpu_instruction_set" instead. While at it, solve another warning and call explicitly meson setup. 2024-04-18T14:50:17.0770596Z WARNING: Running the setup command as `meson [options]` instead of `meson setup [options]` is ambiguous and deprecated. Link: https://git.dpdk.org/dpdk/commit/?id=bf66003b51ec Signed-off-by: David Marchand --- .ci/dpdk-build.sh | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/.ci/dpdk-build.sh b/.ci/dpdk-build.sh index 23f3166a54..e1b8e3ccbb 100755 --- a/.ci/dpdk-build.sh +++ b/.ci/dpdk-build.sh @@ -25,9 +25,9 @@ function build_dpdk() pushd dpdk-src fi -# Switching to 'default' machine to make the dpdk cache usable on +# Switching to 'generic' platform to make the dpdk cache usable on # different CPUs. We can't be sure that all CI machines are exactly same. -DPDK_OPTS="$DPDK_OPTS -Dmachine=default" +DPDK_OPTS="$DPDK_OPTS -Dplatform=generic" # Disable building DPDK unit tests. Not needed for OVS build or tests. DPDK_OPTS="$DPDK_OPTS -Dtests=false" @@ -49,7 +49,7 @@ function build_dpdk() # Install DPDK using prefix. DPDK_OPTS="$DPDK_OPTS --prefix=$DPDK_INSTALL_DIR" -meson $DPDK_OPTS build +meson setup $DPDK_OPTS build ninja -C build ninja -C build install popd -- 2.44.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 6/6] netdev-dpdk: Refactor tunnel checksum offloading.
All informations required for checksum offloading can be deduced by already tracked dp_packet l3_ofs, l4_ofs, inner_l3_ofs and inner_l4_ofs fields. Remove DPDK specific l[2-4]_len from generic OVS code. netdev-dpdk code then fills mbuf specifics step by step: - outer_l2_len and outer_l3_len are needed for tunneling (and below features), - l2_len and l3_len are needed for IP and L4 checksum (and below features), - l4_len and tso_segsz are needed when doing TSO, Signed-off-by: David Marchand --- lib/dp-packet.h | 37 -- lib/netdev-dpdk.c | 35 ++--- lib/netdev-native-tnl.c | 50 + 3 files changed, 27 insertions(+), 95 deletions(-) diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 3622764c47..a75b1c5cdb 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -604,25 +604,6 @@ dp_packet_get_nd_payload(const struct dp_packet *b) } #ifdef DPDK_NETDEV -static inline void -dp_packet_set_l2_len(struct dp_packet *b, size_t l2_len) -{ -b->mbuf.l2_len = l2_len; -} - -static inline void -dp_packet_set_l3_len(struct dp_packet *b, size_t l3_len) -{ -b->mbuf.l3_len = l3_len; -} - -static inline void -dp_packet_set_l4_len(struct dp_packet *b, size_t l4_len) -{ -b->mbuf.l4_len = l4_len; -} - - static inline uint64_t * dp_packet_ol_flags_ptr(const struct dp_packet *b) { @@ -642,24 +623,6 @@ dp_packet_flow_mark_ptr(const struct dp_packet *b) } #else -static inline void -dp_packet_set_l2_len(struct dp_packet *b OVS_UNUSED, size_t l2_len OVS_UNUSED) -{ -/* There is no implementation. */ -} - -static inline void -dp_packet_set_l3_len(struct dp_packet *b OVS_UNUSED, size_t l3_len OVS_UNUSED) -{ -/* There is no implementation. */ -} - -static inline void -dp_packet_set_l4_len(struct dp_packet *b OVS_UNUSED, size_t l4_len OVS_UNUSED) -{ -/* There is no implementation. */ -} - static inline uint32_t * dp_packet_ol_flags_ptr(const struct dp_packet *b) { diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 1dad2ef833..31dd6f1d5a 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2584,6 +2584,9 @@ static bool netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) { struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf); +void *l2; +void *l3; +void *l4; const uint64_t all_inner_requests = (RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_L4_MASK | @@ -2613,11 +2616,6 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) return true; } -ovs_assert(dp_packet_l4(pkt)); - -/* If packet is vxlan or geneve tunnel packet, calculate outer - * l2 len and outer l3 len. Inner l2/l3/l4 len are calculated - * before. */ const uint64_t tunnel_type = mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK; if (OVS_UNLIKELY(tunnel_type && tunnel_type != RTE_MBUF_F_TX_TUNNEL_GENEVE && @@ -2635,6 +2633,11 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) (char *) dp_packet_eth(pkt); mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) - (char *) dp_packet_l3(pkt); + +/* Inner L2 length must account for the tunnel header length. */ +l2 = dp_packet_l4(pkt); +l3 = dp_packet_inner_l3(pkt); +l4 = dp_packet_inner_l4(pkt); } else { /* If no outer offloading is requested, clear outer marks. */ mbuf->ol_flags &= ~all_outer_marks; @@ -2642,8 +2645,9 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) mbuf->outer_l3_len = 0; /* Skip outer headers. */ -mbuf->l2_len += (char *) dp_packet_l4(pkt) - -(char *) dp_packet_eth(pkt); +l2 = dp_packet_eth(pkt); +l3 = dp_packet_inner_l3(pkt); +l4 = dp_packet_inner_l4(pkt); } } else { if (tunnel_type) { @@ -2663,22 +2667,27 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) } mbuf->outer_l2_len = 0; mbuf->outer_l3_len = 0; -mbuf->l2_len = (char *) dp_packet_l3(pkt) - - (char *) dp_packet_eth(pkt); -mbuf->l3_len = (char *) dp_packet_l4(pkt) - - (char *) dp_packet_l3(pkt); + +l2 = dp_packet_eth(pkt); +l3 = dp_packet_l3(pkt); +l4 = dp_packet_l4(pkt); } +ovs_assert(l4); + +mbuf->l2_len = (char *) l3 - (char *) l2; +mbuf->l3_len = (char *) l4 - (char *) l3; + if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) { -struct tcp_header *th = dp_packet_l4(pkt); +struct tcp_header *th = l4; uint16_t link_tso_segsz;
[ovs-dev] [PATCH v3 5/6] netdev-dpdk: Use guest TSO segmentation size hint.
In a typical setup like: guest A <-virtio-> OVS A <-vxlan-> OVS B <-virtio-> guest B TSO packets from guest A are segmented against the OVS A physical port mtu adjusted by the vxlan tunnel header size, regardless of guest A interface mtu. As an example, let's say guest A and guest B mtu are set to 1500 bytes. OVS A and OVS B physical ports mtu are set to 1600 bytes. Guest A will request TCP segmentation for 1448 bytes segments. On the other hand, OVS A will request 1498 bytes segments to the HW. This results in OVS B dropping packets because decapsulated packets are larger than the vhost-user port (serving guest B) mtu. 2024-04-17T14:13:01.239Z|2|netdev_dpdk(pmd-c03/id:7)|WARN|vhost0: Too big size 1564 max_packet_len 1518 vhost-user ports expose a guest mtu by filling mbuf->tso_segsz. Use it as a hint. This may result in segments (on the wire) slightly shorter than the optimal size. Reported-at: https://github.com/openvswitch/ovs-issues/issues/321 Signed-off-by: David Marchand --- Note: As we trust the guest with this change, should we put a lower limit on mbuf->tso_segsz? --- lib/netdev-dpdk.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 661269e4b6..1dad2ef833 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2671,14 +2671,19 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) { struct tcp_header *th = dp_packet_l4(pkt); +uint16_t link_tso_segsz; int hdr_len; if (tunnel_type) { -mbuf->tso_segsz = dev->mtu - mbuf->l2_len - mbuf->l3_len - - mbuf->l4_len - mbuf->outer_l3_len; +link_tso_segsz = dev->mtu - mbuf->l2_len - mbuf->l3_len - + mbuf->l4_len - mbuf->outer_l3_len; } else { mbuf->l4_len = TCP_OFFSET(th->tcp_ctl) * 4; -mbuf->tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len; +link_tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len; +} + +if (!mbuf->tso_segsz || mbuf->tso_segsz > link_tso_segsz) { +mbuf->tso_segsz = link_tso_segsz; } hdr_len = mbuf->l2_len + mbuf->l3_len + mbuf->l4_len; -- 2.44.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 4/6] netdev-dpdk: Refactor TSO request code.
Replace check on th == NULL with an assert() because dp_packet_l4(pkt) is priorly used to compute (outer) L3 length. Besides, filling l4_len and tso_segsz only matters to TSO, so there is no need to check for other L4 checksum offloading requests. Signed-off-by: David Marchand --- lib/netdev-dpdk.c | 36 +++- 1 file changed, 11 insertions(+), 25 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 8b6a3ed189..661269e4b6 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2584,7 +2584,6 @@ static bool netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) { struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf); -struct tcp_header *th; const uint64_t all_inner_requests = (RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_L4_MASK | @@ -2614,6 +2613,8 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) return true; } +ovs_assert(dp_packet_l4(pkt)); + /* If packet is vxlan or geneve tunnel packet, calculate outer * l2 len and outer l3 len. Inner l2/l3/l4 len are calculated * before. */ @@ -2667,22 +2668,10 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) mbuf->l3_len = (char *) dp_packet_l4(pkt) - (char *) dp_packet_l3(pkt); } -th = dp_packet_l4(pkt); if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) { -if (!th) { -VLOG_WARN_RL(&rl, "%s: TCP Segmentation without L4 header" - " pkt len: %"PRIu32"", dev->up.name, mbuf->pkt_len); -return false; -} -} - -if ((mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK) == RTE_MBUF_F_TX_TCP_CKSUM) { -if (!th) { -VLOG_WARN_RL(&rl, "%s: TCP offloading without L4 header" - " pkt len: %"PRIu32"", dev->up.name, mbuf->pkt_len); -return false; -} +struct tcp_header *th = dp_packet_l4(pkt); +int hdr_len; if (tunnel_type) { mbuf->tso_segsz = dev->mtu - mbuf->l2_len - mbuf->l3_len - @@ -2692,16 +2681,13 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) mbuf->tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len; } -if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) { -int hdr_len = mbuf->l2_len + mbuf->l3_len + mbuf->l4_len; -if (OVS_UNLIKELY((hdr_len + - mbuf->tso_segsz) > dev->max_packet_len)) { -VLOG_WARN_RL(&rl, "%s: Oversized TSO packet. hdr: %"PRIu32", " - "gso: %"PRIu32", max len: %"PRIu32"", - dev->up.name, hdr_len, mbuf->tso_segsz, - dev->max_packet_len); -return false; -} +hdr_len = mbuf->l2_len + mbuf->l3_len + mbuf->l4_len; +if (OVS_UNLIKELY((hdr_len + mbuf->tso_segsz) > dev->max_packet_len)) { +VLOG_WARN_RL(&rl, "%s: Oversized TSO packet. hdr: %"PRIu32", " + "gso: %"PRIu32", max len: %"PRIu32"", + dev->up.name, hdr_len, mbuf->tso_segsz, + dev->max_packet_len); +return false; } } -- 2.44.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 3/6] netdev-dpdk: Fix inner checksum when outer is not supported.
If outer checksum is not supported and OVS already set L3/L4 outer checksums in the packet, no outer mark should be left in ol_flags (as it confuses some driver, like net/ixgbe). l2_len must be adjusted to account for the tunnel header. Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.") Signed-off-by: David Marchand --- lib/netdev-dpdk.c | 19 +++ 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index f732716141..8b6a3ed189 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2629,10 +2629,21 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) } if (tunnel_type && (mbuf->ol_flags & all_inner_requests)) { -mbuf->outer_l2_len = (char *) dp_packet_l3(pkt) - - (char *) dp_packet_eth(pkt); -mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) - - (char *) dp_packet_l3(pkt); +if (mbuf->ol_flags & all_outer_requests) { +mbuf->outer_l2_len = (char *) dp_packet_l3(pkt) - + (char *) dp_packet_eth(pkt); +mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) - + (char *) dp_packet_l3(pkt); +} else { +/* If no outer offloading is requested, clear outer marks. */ +mbuf->ol_flags &= ~all_outer_marks; +mbuf->outer_l2_len = 0; +mbuf->outer_l3_len = 0; + +/* Skip outer headers. */ +mbuf->l2_len += (char *) dp_packet_l4(pkt) - +(char *) dp_packet_eth(pkt); +} } else { if (tunnel_type) { /* No inner offload is requested, fallback to non tunnel -- 2.44.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 1/6] netdev-dpdk: Fallback to non tunnel checksum offloading.
The outer checksum offloading API in DPDK is ambiguous and was implemented by Intel folks in their drivers with the assumption that any outer offloading always goes with an inner offloading request. With net/i40e and net/ice drivers, in the case of encapsulating a ARP packet in a vxlan tunnel (which results in requesting outer ip checksum with a tunnel context but no inner offloading request), a Tx failure is triggered, associated with a port MDD event. 2024-03-27T16:02:07.084Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event To avoid this situation, if no checksum or segmentation offloading is requested on the inner part of a packet, fallback to "normal" (non outer) offloading request. Reported-at: https://github.com/openvswitch/ovs-issues/issues/321 Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.") Fixes: f81d782c1906 ("netdev-native-tnl: Mark all vxlan/geneve packets as tunneled.") Signed-off-by: David Marchand --- Changes since v2: - kept offloads disabled for net/i40e and net/ice as this patch does not fix outer udp checksum (a DPDK fix is required), - updated commitlog with details to reproduce the issue, - adjusted indent, Changes since v1: - reset inner marks before converting outer requests, - fixed some coding style, --- lib/netdev-dpdk.c | 71 +++ 1 file changed, 41 insertions(+), 30 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 2111f77681..7e109903c0 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2584,16 +2584,18 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf); struct tcp_header *th; -const uint64_t all_requests = (RTE_MBUF_F_TX_IP_CKSUM | - RTE_MBUF_F_TX_L4_MASK | - RTE_MBUF_F_TX_OUTER_IP_CKSUM | - RTE_MBUF_F_TX_OUTER_UDP_CKSUM | - RTE_MBUF_F_TX_TCP_SEG); -const uint64_t all_marks = (RTE_MBUF_F_TX_IPV4 | -RTE_MBUF_F_TX_IPV6 | -RTE_MBUF_F_TX_OUTER_IPV4 | -RTE_MBUF_F_TX_OUTER_IPV6 | -RTE_MBUF_F_TX_TUNNEL_MASK); +const uint64_t all_inner_requests = (RTE_MBUF_F_TX_IP_CKSUM | + RTE_MBUF_F_TX_L4_MASK | + RTE_MBUF_F_TX_TCP_SEG); +const uint64_t all_outer_requests = (RTE_MBUF_F_TX_OUTER_IP_CKSUM | + RTE_MBUF_F_TX_OUTER_UDP_CKSUM); +const uint64_t all_requests = all_inner_requests | all_outer_requests; +const uint64_t all_inner_marks = (RTE_MBUF_F_TX_IPV4 | + RTE_MBUF_F_TX_IPV6); +const uint64_t all_outer_marks = (RTE_MBUF_F_TX_OUTER_IPV4 | + RTE_MBUF_F_TX_OUTER_IPV6 | + RTE_MBUF_F_TX_TUNNEL_MASK); +const uint64_t all_marks = all_inner_marks | all_outer_marks; if (!(mbuf->ol_flags & all_requests)) { /* No offloads requested, no marks should be set. */ @@ -2614,34 +2616,43 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) * l2 len and outer l3 len. Inner l2/l3/l4 len are calculated * before. */ const uint64_t tunnel_type = mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK; -if (tunnel_type == RTE_MBUF_F_TX_TUNNEL_GENEVE || -tunnel_type == RTE_MBUF_F_TX_TUNNEL_VXLAN) { -mbuf->outer_l2_len = (char *) dp_packet_l3(pkt) - - (char *) dp_packet_eth(pkt); -mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) - - (char *) dp_packet_l3(pkt); - -/* If neither inner checksums nor TSO is requested, inner marks - * should not be set. */ -if (!(mbuf->ol_flags & (RTE_MBUF_F_TX_IP_CKSUM | -RTE_MBUF_F_TX_L4_MASK | -RTE_MBUF_F_TX_TCP_SEG))) { -mbuf->ol_flags &= ~(RTE_MBUF_F_TX_IPV4 | -RTE_MBUF_F_TX_IPV6); -} -} else if (OVS_UNLIKELY(tunnel_type)) { +if (OVS_UNLIKELY(tunnel_type && + tunnel_type != RTE_MBUF_F_TX_TUNNEL_GENEVE && + tunnel_type != RTE_MBUF_F_TX_TUNNEL_VXLAN)) { VLOG_WARN_RL(&rl, "%s: Unexpected tunnel type: %#"PRIx64, netdev_get_name(&dev->up), tunnel_type); netdev_dpdk_mbuf_dump(netdev_get_name(&dev->up), "Packet with unexpected tunnel type", mbuf); return false; +} + +if (tunnel_type && (mbuf->ol_flags & all_inner_requests)) { +mbu
[ovs-dev] [PATCH v3 2/6] netdev-dpdk: Disable outer UDP checksum for net/iavf.
Same as the commit 6f93d8e62f13 ("netdev-dpdk: Disable outer UDP checksum offload for ice/i40e driver."), disable outer UDP checksum and related offloads for net/iavf. Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.") Signed-off-by: David Marchand --- Note: - DPDK (in progress) fixes can be found at: https://patchwork.dpdk.org/project/dpdk/list/?series=31780&state=* --- lib/netdev-dpdk.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 7e109903c0..f732716141 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -1355,12 +1355,14 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) } if (!strcmp(info.driver_name, "net_ice") -|| !strcmp(info.driver_name, "net_i40e")) { +|| !strcmp(info.driver_name, "net_i40e") +|| !strcmp(info.driver_name, "net_iavf")) { /* FIXME: Driver advertises the capability but doesn't seem * to actually support it correctly. Can remove this once * the driver is fixed on DPDK side. */ VLOG_INFO("%s: disabled Tx outer udp checksum offloads for a " - "net/ice or net/i40e port.", netdev_get_name(&dev->up)); + "net/ice, net/i40e or net/iavf port.", + netdev_get_name(&dev->up)); info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM; info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO; info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO; -- 2.44.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v1] nedev-dpdk: Fix config with dpdk net_bonding offloads.
Hello, On Fri, Apr 12, 2024 at 8:30 AM Jun Wang wrote: > > If it's a DPDK net_bonding, it may cause > offload-related configurations to take effect, > leading to offload failure. I did not look at the patch for now. What is the interest of using a net/bonding DPDK port when there is native support of bonding in OVS? I am not familiar with OVN setups so maybe I am missing something on this side. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] netdev-dpdk: Fallback to non tunnel offloading API.
On Fri, Apr 5, 2024 at 3:00 PM Ilya Maximets wrote: > >> > >>> Basically, resolving a neighbor with ARP inside tunnels is broken on > >>> Intel nics (even without re-enabling outer udp checksum). > >>> This can be observed with the following debug patch at the end of > >>> netdev_dpdk_prep_hwol_packet(): > >>> > >>> +char buf[256]; > >>> +if (rte_get_tx_ol_flag_list(mbuf->ol_flags, buf, sizeof(buf)) != 0) > >>> +buf[0] = '\0'; > >>> +VLOG_WARN("len=%u, ol_flags=%s, outer_l2_len=%u, outer_l3_len=%u, > >>> l2_len=%u, l3_len=%u, l4_len=%u, tso_segsz=%u", mbuf->pkt_len, buf, > >>> mbuf->outer_l2_len, mbuf->outer_l3_len, mbuf->l2_len, mbuf->l3_len, > >>> mbuf->l4_len, mbuf->tso_segsz); > >>> > >>> Then doing a "arping" inside the tunnel triggers: > >>> 2024-04-03T16:05:40.920Z|00014|netdev_dpdk(pmd-c03/id:8)|WARN|len=96, > >>> ol_flags=RTE_MBUF_F_TX_L4_NO_CKSUM RTE_MBUF_F_TX_OUTER_IP_CKSUM > >>> RTE_MBUF_F_TX_OUTER_IPV4 RTE_MBUF_F_TX_TUNNEL_VXLAN , outer_l2_len=18, > >>> outer_l3_len=20, l2_len=0, l3_len=0, l4_len=0, tso_segsz=0 > > The fact that l2_len and l3_len are not set here looks like an OVS > bug though, as AFAIU, these should always be set if any Tx offload > is requested. The commit that introduces such Tx offloads requests is: f81d782c19 - netdev-native-tnl: Mark all vxlan/geneve packets as tunneled. (7 weeks ago) -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] netdev-dpdk: Fallback to non tunnel offloading API.
On Wed, Apr 3, 2024 at 8:13 PM Ilya Maximets wrote: > > - This patch fixes some misusage of the DPDK API. > > Hmm, I understand that the driver does something funny when it gets > outer flags set without any inner flags, but how is that a misuse > of the DPDK API? Could you point me to the API docs that say that > inner flags must always be set in this case or that setting only > outer offloads is not allowed? Setting the tunnel type (which is set along outer checksum in OVS) is described as: /** * Bits 45:48 used for the tunnel type. * The tunnel type must be specified for TSO or checksum on the inner part * of tunnel packets. * These flags can be used with RTE_MBUF_F_TX_TCP_SEG for TSO, or * RTE_MBUF_F_TX_xxx_CKSUM. * The mbuf fields for inner and outer header lengths are required: * outer_l2_len, outer_l3_len, l2_len, l3_len, l4_len and tso_segsz for TSO. */ #define RTE_MBUF_F_TX_TUNNEL_VXLAN (0x1ULL << 45) #define RTE_MBUF_F_TX_TUNNEL_GRE (0x2ULL << 45) #define RTE_MBUF_F_TX_TUNNEL_IPIP(0x3ULL << 45) #define RTE_MBUF_F_TX_TUNNEL_GENEVE (0x4ULL << 45) /** TX packet with MPLS-in-UDP RFC 7510 header. */ #define RTE_MBUF_F_TX_TUNNEL_MPLSINUDP (0x5ULL << 45) #define RTE_MBUF_F_TX_TUNNEL_VXLAN_GPE (0x6ULL << 45) #define RTE_MBUF_F_TX_TUNNEL_GTP (0x7ULL << 45) #define RTE_MBUF_F_TX_TUNNEL_ESP (0x8ULL << 45) It is not specified what to expect it neither TSO nor inner checksum is requested. In a same way, it is not described what to expect if outer API is called with no inner offload. Adding Ferruh and Thomas who may have one opinion. > > I agree that it seems safer to just downgrade all outer flags to > inner ones on OVS side, when no inner offloads are requested, I'm > just not sure I agree that it's an API misuse. Especially since > non-Intel cards seem to work fine. I suppose you mean mlx5. Has it been tested on other nics? > > > Basically, resolving a neighbor with ARP inside tunnels is broken on > > Intel nics (even without re-enabling outer udp checksum). > > This can be observed with the following debug patch at the end of > > netdev_dpdk_prep_hwol_packet(): > > > > +char buf[256]; > > +if (rte_get_tx_ol_flag_list(mbuf->ol_flags, buf, sizeof(buf)) != 0) > > +buf[0] = '\0'; > > +VLOG_WARN("len=%u, ol_flags=%s, outer_l2_len=%u, outer_l3_len=%u, > > l2_len=%u, l3_len=%u, l4_len=%u, tso_segsz=%u", mbuf->pkt_len, buf, > > mbuf->outer_l2_len, mbuf->outer_l3_len, mbuf->l2_len, mbuf->l3_len, > > mbuf->l4_len, mbuf->tso_segsz); > > > > Then doing a "arping" inside the tunnel triggers: > > 2024-04-03T16:05:40.920Z|00014|netdev_dpdk(pmd-c03/id:8)|WARN|len=96, > > ol_flags=RTE_MBUF_F_TX_L4_NO_CKSUM RTE_MBUF_F_TX_OUTER_IP_CKSUM > > RTE_MBUF_F_TX_OUTER_IPV4 RTE_MBUF_F_TX_TUNNEL_VXLAN , outer_l2_len=18, > > outer_l3_len=20, l2_len=0, l3_len=0, l4_len=0, tso_segsz=0 > > 2024-04-03T16:05:40.920Z|00012|dpdk|WARN|ice_interrupt_handler(): > > OICR: MDD event > > > > We need this fix in OVS regardless of the outer udp checksum issue. > > I'll respin this fix in a new series, without touching UDP checksum capa. > > > > > > - It does seem that X710 nics have no support for outer udp checksum > > (according to its datasheet). Some X722 version may have support for > > it, but net/i40e does not configure the Tx descriptor accordingly. > > On the other hand, E810 ones seem fine (according to its datasheet). > > > > After more debugging, I managed to get outer udp checksum working. > > I understand the DPDK rte_net_intel_cksum_flags_prepare() helper does > > not set the pseudo header checksum in the outer udp header. > > I proposed a fix in the dpdk bz. > > > > Waiting for the fix on DPDK side... it is still possible to add the > > missing bits in OVS (see the branch I pointed at in the OVS issue). > > Since this feature never worked with ice in OVS and it is experimental, > I tend to think that we should just disable it for ice as well until > DPDK is fixed. > > A little too many fixes for that thing we have already and this one will > involve some extra driver-specific logic that we don't have any automated > tests for. I don't mind waiting for the DPDK fix before re-enabling outer udp and other offloads. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] netdev-dpdk: Fallback to non tunnel offloading API.
On Thu, Mar 28, 2024 at 10:16 AM David Marchand wrote: > > The outer checksum offloading API in DPDK is ambiguous and was > added by Intel folks with the assumption that any outer offloading > always goes with an inner offloading request. > > With net/i40e and net/ice drivers, requesting outer ip checksum with a > tunnel context but no inner offloading request triggers a Tx failure > associated with a port MDD event. > 2024-03-27T16:02:07.084Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR: > MDD event > > To avoid this situation, if no checksum or segmentation offloading is > requested on the inner part of a packet, fallback to "normal" (non outer) > offloading request. > And outer offloading can be re-enabled for net/i40e and netice. > > Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.") > Signed-off-by: David Marchand > --- > Changes since v1: > - reset inner marks before converting outer requests, > - fixed some coding style, > > --- > lib/netdev-dpdk.c | 83 --- > 1 file changed, 43 insertions(+), 40 deletions(-) > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > index 2111f77681..ae43594a3d 100644 > --- a/lib/netdev-dpdk.c > +++ b/lib/netdev-dpdk.c > @@ -1354,18 +1354,6 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) > info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_TCP_CKSUM; > } > > -if (!strcmp(info.driver_name, "net_ice") > -|| !strcmp(info.driver_name, "net_i40e")) { > -/* FIXME: Driver advertises the capability but doesn't seem > - * to actually support it correctly. Can remove this once > - * the driver is fixed on DPDK side. */ > -VLOG_INFO("%s: disabled Tx outer udp checksum offloads for a " > - "net/ice or net/i40e port.", netdev_get_name(&dev->up)); > -info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM; > -info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO; > -info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO; > -} > - A few comments after spending some time on the topic. - This patch fixes some misusage of the DPDK API. Basically, resolving a neighbor with ARP inside tunnels is broken on Intel nics (even without re-enabling outer udp checksum). This can be observed with the following debug patch at the end of netdev_dpdk_prep_hwol_packet(): +char buf[256]; +if (rte_get_tx_ol_flag_list(mbuf->ol_flags, buf, sizeof(buf)) != 0) +buf[0] = '\0'; +VLOG_WARN("len=%u, ol_flags=%s, outer_l2_len=%u, outer_l3_len=%u, l2_len=%u, l3_len=%u, l4_len=%u, tso_segsz=%u", mbuf->pkt_len, buf, mbuf->outer_l2_len, mbuf->outer_l3_len, mbuf->l2_len, mbuf->l3_len, mbuf->l4_len, mbuf->tso_segsz); Then doing a "arping" inside the tunnel triggers: 2024-04-03T16:05:40.920Z|00014|netdev_dpdk(pmd-c03/id:8)|WARN|len=96, ol_flags=RTE_MBUF_F_TX_L4_NO_CKSUM RTE_MBUF_F_TX_OUTER_IP_CKSUM RTE_MBUF_F_TX_OUTER_IPV4 RTE_MBUF_F_TX_TUNNEL_VXLAN , outer_l2_len=18, outer_l3_len=20, l2_len=0, l3_len=0, l4_len=0, tso_segsz=0 2024-04-03T16:05:40.920Z|00012|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event We need this fix in OVS regardless of the outer udp checksum issue. I'll respin this fix in a new series, without touching UDP checksum capa. - It does seem that X710 nics have no support for outer udp checksum (according to its datasheet). Some X722 version may have support for it, but net/i40e does not configure the Tx descriptor accordingly. On the other hand, E810 ones seem fine (according to its datasheet). After more debugging, I managed to get outer udp checksum working. I understand the DPDK rte_net_intel_cksum_flags_prepare() helper does not set the pseudo header checksum in the outer udp header. I proposed a fix in the dpdk bz. Waiting for the fix on DPDK side... it is still possible to add the missing bits in OVS (see the branch I pointed at in the OVS issue). - About the workaround (disabling outer udp checksum for net/ice and net/i40e), the net/iavf is subject to the same bugs. So we should disable outer udp checksum too for this driver. However, I am not sure the iavf driver (can?) differentiates which PF / hw is used underneath. So we may have no solution but to always disable this type of offloading in OVS for net/iavf. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] Request for Source File Location: 'ovs-vswitchd.conf.db' MAN Page
Hello, On Thu, Mar 28, 2024 at 11:13 AM Farhan Tariq wrote: > > I'm adding a new feature to OvS and need to update the 'ovs-vswitchd.conf.db' > MAN page. However, I'm unable to locate the source file for this MAN page to > add the necessary information about the new feature. > > Could you please provide the exact file path or location for the > 'ovs-vswitchd.conf.db' MAN page? You are probably looking for vswitchd/vswitch.xml. ovsdb man page gets generated from vswitchd/vswitch.xml (and vswitchd/vswitch.ovsschema) content. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v2] netdev-dpdk: Fallback to non tunnel offloading API.
The outer checksum offloading API in DPDK is ambiguous and was added by Intel folks with the assumption that any outer offloading always goes with an inner offloading request. With net/i40e and net/ice drivers, requesting outer ip checksum with a tunnel context but no inner offloading request triggers a Tx failure associated with a port MDD event. 2024-03-27T16:02:07.084Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event To avoid this situation, if no checksum or segmentation offloading is requested on the inner part of a packet, fallback to "normal" (non outer) offloading request. And outer offloading can be re-enabled for net/i40e and netice. Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.") Signed-off-by: David Marchand --- Changes since v1: - reset inner marks before converting outer requests, - fixed some coding style, --- lib/netdev-dpdk.c | 83 --- 1 file changed, 43 insertions(+), 40 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 2111f77681..ae43594a3d 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -1354,18 +1354,6 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_TCP_CKSUM; } -if (!strcmp(info.driver_name, "net_ice") -|| !strcmp(info.driver_name, "net_i40e")) { -/* FIXME: Driver advertises the capability but doesn't seem - * to actually support it correctly. Can remove this once - * the driver is fixed on DPDK side. */ -VLOG_INFO("%s: disabled Tx outer udp checksum offloads for a " - "net/ice or net/i40e port.", netdev_get_name(&dev->up)); -info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM; -info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO; -info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO; -} - if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_IPV4_CKSUM) { dev->hw_ol_features |= NETDEV_TX_IPV4_CKSUM_OFFLOAD; } else { @@ -2584,16 +2572,18 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf); struct tcp_header *th; -const uint64_t all_requests = (RTE_MBUF_F_TX_IP_CKSUM | - RTE_MBUF_F_TX_L4_MASK | - RTE_MBUF_F_TX_OUTER_IP_CKSUM | - RTE_MBUF_F_TX_OUTER_UDP_CKSUM | - RTE_MBUF_F_TX_TCP_SEG); -const uint64_t all_marks = (RTE_MBUF_F_TX_IPV4 | -RTE_MBUF_F_TX_IPV6 | -RTE_MBUF_F_TX_OUTER_IPV4 | -RTE_MBUF_F_TX_OUTER_IPV6 | -RTE_MBUF_F_TX_TUNNEL_MASK); +const uint64_t all_inner_requests = (RTE_MBUF_F_TX_IP_CKSUM | + RTE_MBUF_F_TX_L4_MASK | + RTE_MBUF_F_TX_TCP_SEG); +const uint64_t all_outer_requests = (RTE_MBUF_F_TX_OUTER_IP_CKSUM | + RTE_MBUF_F_TX_OUTER_UDP_CKSUM); +const uint64_t all_requests = all_inner_requests | all_outer_requests; +const uint64_t all_inner_marks = (RTE_MBUF_F_TX_IPV4 | + RTE_MBUF_F_TX_IPV6); +const uint64_t all_outer_marks = (RTE_MBUF_F_TX_OUTER_IPV4 | + RTE_MBUF_F_TX_OUTER_IPV6 | + RTE_MBUF_F_TX_TUNNEL_MASK); +const uint64_t all_marks = all_inner_marks | all_outer_marks; if (!(mbuf->ol_flags & all_requests)) { /* No offloads requested, no marks should be set. */ @@ -2610,32 +2600,45 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) return true; } +const uint64_t tunnel_type = mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK; +if (OVS_UNLIKELY(tunnel_type + && tunnel_type != RTE_MBUF_F_TX_TUNNEL_GENEVE + && tunnel_type != RTE_MBUF_F_TX_TUNNEL_VXLAN)) { +VLOG_WARN_RL(&rl, "%s: Unexpected tunnel type: %#"PRIx64, + netdev_get_name(&dev->up), tunnel_type); +netdev_dpdk_mbuf_dump(netdev_get_name(&dev->up), + "Packet with unexpected tunnel type", mbuf); +return false; +} + /* If packet is vxlan or geneve tunnel packet, calculate outer * l2 len and outer l3 len. Inner l2/l3/l4 len are calculated * before. */ -const uint64_t tunnel_type = mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK; -if (tunnel_type == RTE_MBUF_F_TX_TUNNEL_GENEVE || -tunnel_type == RTE_MBUF_F_TX_TUNNEL_VXLAN) { +if
Re: [ovs-dev] [PATCH] netdev-dpdk: Fallback to non tunnel offloading API.
On Thu, Mar 28, 2024 at 5:40 AM junwan...@cestc.cn wrote: > > I validated this modification on my x710 network card, but I found that > the outer UDP checksum of the transmitted packets is incorrect, leading > to communication abnormalities. I think it's necessary to disable the outer > UDP checksum because although the capability reported by DPDK > indicates support, in reality, the hardware doesn't actually support > offloading, > resulting in outer UDP checksum errors. > > tx_geneve_tso_offload="false", tx_ip_csum_offload="true", > tx_out_ip_csum_offload="true", > tx_out_udp_csum_offload="true", tx_sctp_csum_offload="true", > tx_tcp_csum_offload="true", > tx_tcp_seg_offload="false", tx_udp_csum_offload="true", > tx_vxlan_tso_offload="false" Well, good timing, thanks for the report. I was testing ipv6 in ipv4 (which seemed to work) and I realised something is wrong at the outer -> inner conversion by looking at the ol_flags in my debug prints. Now, trying the opposite (ipv4 in ipv6), I think I reproduce your issue with a E810 nic: 04:50:46.211854 50:7c:6f:3c:0c:26 > 50:7c:6f:3c:10:5a, ethertype IPv6 (0x86dd), length 168: (hlim 64, next-header UDP (17) payload length: 114) 2001:4e48::2.39854 > 2001::1.geneve: [bad udp cksum 0x89ad -> 0xfadd!] Geneve, Flags [none], vni 0x0, proto TEB (0x6558) 52:54:00:00:11:01 > 4e:a9:1d:ce:85:4a, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 9408, offset 0, flags [DF], proto ICMP (1), length 84) 172.31.22.2 > 172.31.22.1: ICMP echo request, id 1442, seq 9, length 64 Please have a try with the v2 (I'll post soon). If it still fails, can you provide a reproducer (ideally without OVN to reduce the scope)? -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] netdev-dpdk: Fallback to non tunnel offloading API.
On Wed, Mar 27, 2024 at 5:51 PM David Marchand wrote: > /* If packet is vxlan or geneve tunnel packet, calculate outer > * l2 len and outer l3 len. Inner l2/l3/l4 len are calculated > * before. */ > -const uint64_t tunnel_type = mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK; > -if (tunnel_type == RTE_MBUF_F_TX_TUNNEL_GENEVE || > -tunnel_type == RTE_MBUF_F_TX_TUNNEL_VXLAN) { > +if ((tunnel_type == RTE_MBUF_F_TX_TUNNEL_GENEVE || > + tunnel_type == RTE_MBUF_F_TX_TUNNEL_VXLAN) && > +mbuf->ol_flags & all_inner_requests) { > + > mbuf->outer_l2_len = (char *) dp_packet_l3(pkt) - > (char *) dp_packet_eth(pkt); > mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) - > (char *) dp_packet_l3(pkt); > +} else { > +if (OVS_UNLIKELY(!(mbuf->ol_flags & all_inner_requests))) { > +/* If no inner offloading is requesting, fallback to non > tunneling > + * checksum offloads. */ Inner marks must be reset before converting outer marks. Otherwise, this results (with IPv4 traffic encapsulated in IPv6 geneve tunnel) in such a ol_flags combination: RTE_MBUF_F_RX_RSS_HASH RTE_MBUF_F_TX_UDP_CKSUM RTE_MBUF_F_TX_IP_CKSUM RTE_MBUF_F_TX_IPV4 RTE_MBUF_F_TX_IPV6 v2 in preparation. > > -/* If neither inner checksums nor TSO is requested, inner marks > - * should not be set. */ > -if (!(mbuf->ol_flags & (RTE_MBUF_F_TX_IP_CKSUM | > -RTE_MBUF_F_TX_L4_MASK | > -RTE_MBUF_F_TX_TCP_SEG))) { > -mbuf->ol_flags &= ~(RTE_MBUF_F_TX_IPV4 | > -RTE_MBUF_F_TX_IPV6); > +if (mbuf->ol_flags & RTE_MBUF_F_TX_OUTER_IP_CKSUM) { > +mbuf->ol_flags |= RTE_MBUF_F_TX_IP_CKSUM; > +mbuf->ol_flags |= RTE_MBUF_F_TX_IPV4; > +} > +if (mbuf->ol_flags & RTE_MBUF_F_TX_OUTER_UDP_CKSUM) { > +mbuf->ol_flags |= RTE_MBUF_F_TX_UDP_CKSUM; > +mbuf->ol_flags |= (mbuf->ol_flags & > RTE_MBUF_F_TX_OUTER_IPV4) ? > + RTE_MBUF_F_TX_IPV4 : RTE_MBUF_F_TX_IPV6; > +} > +mbuf->ol_flags &= ~(all_outer_requests | all_outer_marks); -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] netdev-dpdk: Fallback to non tunnel offloading API.
The outer checksum offloading API in DPDK is ambiguous and was added by Intel folks with the assumption that any outer offloading always goes with an inner offloading request. With net/i40e and net/ice drivers, requesting outer ip checksum with a tunnel context but no inner offloading request triggers a Tx failure associated with a port MDD event. 2024-03-27T16:02:07.084Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event To avoid this situation, if no checksum or segmentation offloading is requested on the inner part of a packet, fallback to "normal" (non outer) offloading request. And outer offloading can be re-enabled for net/i40e and netice. Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.") Signed-off-by: David Marchand --- lib/netdev-dpdk.c | 84 +++ 1 file changed, 42 insertions(+), 42 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 2111f77681..939817474c 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -1354,18 +1354,6 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_TCP_CKSUM; } -if (!strcmp(info.driver_name, "net_ice") -|| !strcmp(info.driver_name, "net_i40e")) { -/* FIXME: Driver advertises the capability but doesn't seem - * to actually support it correctly. Can remove this once - * the driver is fixed on DPDK side. */ -VLOG_INFO("%s: disabled Tx outer udp checksum offloads for a " - "net/ice or net/i40e port.", netdev_get_name(&dev->up)); -info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM; -info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO; -info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO; -} - if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_IPV4_CKSUM) { dev->hw_ol_features |= NETDEV_TX_IPV4_CKSUM_OFFLOAD; } else { @@ -2584,20 +2572,20 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf); struct tcp_header *th; -const uint64_t all_requests = (RTE_MBUF_F_TX_IP_CKSUM | - RTE_MBUF_F_TX_L4_MASK | - RTE_MBUF_F_TX_OUTER_IP_CKSUM | - RTE_MBUF_F_TX_OUTER_UDP_CKSUM | - RTE_MBUF_F_TX_TCP_SEG); -const uint64_t all_marks = (RTE_MBUF_F_TX_IPV4 | -RTE_MBUF_F_TX_IPV6 | -RTE_MBUF_F_TX_OUTER_IPV4 | -RTE_MBUF_F_TX_OUTER_IPV6 | -RTE_MBUF_F_TX_TUNNEL_MASK); - -if (!(mbuf->ol_flags & all_requests)) { +const uint64_t all_inner_requests = (RTE_MBUF_F_TX_IP_CKSUM | + RTE_MBUF_F_TX_L4_MASK | + RTE_MBUF_F_TX_TCP_SEG); +const uint64_t all_outer_requests = (RTE_MBUF_F_TX_OUTER_IP_CKSUM | + RTE_MBUF_F_TX_OUTER_UDP_CKSUM); +const uint64_t all_inner_marks = (RTE_MBUF_F_TX_IPV4 | + RTE_MBUF_F_TX_IPV6); +const uint64_t all_outer_marks = (RTE_MBUF_F_TX_OUTER_IPV4 | + RTE_MBUF_F_TX_OUTER_IPV6 | + RTE_MBUF_F_TX_TUNNEL_MASK); + +if (!(mbuf->ol_flags & (all_inner_requests | all_outer_requests))) { /* No offloads requested, no marks should be set. */ -mbuf->ol_flags &= ~all_marks; +mbuf->ol_flags &= ~(all_inner_marks | all_outer_marks); uint64_t unexpected = mbuf->ol_flags & RTE_MBUF_F_TX_OFFLOAD_MASK; if (OVS_UNLIKELY(unexpected)) { @@ -2610,32 +2598,44 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) return true; } +const uint64_t tunnel_type = mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK; +if (OVS_UNLIKELY(tunnel_type + && tunnel_type != RTE_MBUF_F_TX_TUNNEL_GENEVE + && tunnel_type != RTE_MBUF_F_TX_TUNNEL_VXLAN)) { +VLOG_WARN_RL(&rl, "%s: Unexpected tunnel type: %#"PRIx64, + netdev_get_name(&dev->up), tunnel_type); +netdev_dpdk_mbuf_dump(netdev_get_name(&dev->up), + "Packet with unexpected tunnel type", mbuf); +return false; +} + /* If packet is vxlan or geneve tunnel packet, calculate outer * l2 len and outer l3 len. Inner l2/l3/l4 len are calculated * before. */ -const uint64_t tunnel_type = mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK; -if (tunnel_type == RTE_MBUF_F_T
Re: [ovs-dev] [PATCH v2 2/4] netdev-linux: Favour inner packet for multi-encapsulated TSO.
On Thu, Feb 15, 2024 at 7:03 AM Mike Pattrick wrote: > I've made a branch where we properly account for outer and inner > checksums, and it passes the tests mostly, except for afxdp. > > For afxdp we crash in dp_packet_prealloc_headroom(). netdev-afxdp has > a hardcoded OVS_XDP_HEADROOM=128 bytes and the multiple layers of > tunneling exceeds that. I ran a test where I set this to 256 and the > test passes, but that seems like a non-ideal solution. We probably > shouldn't abort() in dp_packet_resize(), as it could be possible to > accidentally run into this. This is exactly the point I wanted to stress with DPDK dp-packets. The reason behind was to check this old patch of mine: https://patchwork.ozlabs.org/project/openvswitch/patch/20220318153339.31083-1-david.march...@redhat.com/ DPDK dp-packets data are supposed to be located at RTE_PKTMBUF_HEADROOM == 128 bytes, on rx. But I uncovered recently that we won't hit this headroom limit with net/af_xdp backing netdev-dpdk ports... The net/af_xdp driver tries to be smart and avoid copies by using the unaligned chunk af_xdp feature. https://git.dpdk.org/dpdk/commit/?id=d8a210774e1d4c295fd93b983538da0d15312edd A consequence is that this driver places received data with a 384 bytes headroom (RTE_PKTMBUF_HEADROOM + XDP_PACKET_HEADROOM). Which then defeats my unit test... This placement of data looks incorrect to me, from the DPDK mbuf API "spirit". Applications expect a RTE_PKTMBUF_HEADROOM headroom, and they size their buffers accordingly. This extra headroom would mean applications need to account for this peculiarity when using this driver... I will need to spend more time on this, but not now. > > Dropping the packet is probably preferable IMO, but that is also a > very large change, as none of the calling functions have return codes > themselves and some of the 2rd degree call backs don't either, so many > functions will need to change. Or extend dp_packet_resize() for af_xdp dp-packet. The tricky part is that the dp-packet is part of a umem buffer. If we make a af_xdp dp-packet points at a different malloc'd data buffer, we need to distinguish for this case when freeing this dp-packet. I can put this on my todolist. > > You can see the branch here: https://github.com/mkp-rh/ovs/tree/multitun > And the test run here: https://github.com/mkp-rh/ovs/actions/runs/7911539363 > > I'll clean up this a bit and address some of the other things > mentioned, like the incorrect Fixes tag. We don't need to fix all issues, the main point is the inner checksum issue, as it is something that got broken in 3.3. If we strip the 3rd layer of tunnel from my unit test, it would be enough to reproduce without hitting af_xdp headroom limit. Or do you think we can extend an existing test? At least, fixes should be isolated from the new features like one introduced in patch 1 of this series. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 2/4] netdev-linux: Favour inner packet for multi-encapsulated TSO.
Hello Mike, On Mon, Feb 12, 2024 at 8:50 PM Mike Pattrick wrote: > > Previously if an OVS configuration nested multiple layers of UDP tunnels > like VXLAN or GENEVE on top of each other through netdev-linux > interfaces, the vnet header would be incorrectly set to the outermost > UDP tunnel layer instead of the intermediary tunnel layer. > > This resulted in the middle UDP tunnel not checksum offloading properly. > > Fixes: 3337e6d91c5b ("userspace: Enable L4 checksum offloading by default.") > Reported-by: David Marchand > Signed-off-by: Mike Pattrick I have some trouble relating this patch to the issue I faced :-). Could you detail a test that shows the issue you fix here? After applying (only this patch), I still reproduce an issue with inner checksums. As I reported this issue to you offlist, let me put the details in public here. I wrote a system-traffic.at unit test that stacks 3 vxlan tunnels (separate topic, but for the context, my goal was to stress DPDK dp-packets wrt headroom). If I try this unit test before commit 084c8087292c ("userspace: Support VXLAN and GENEVE TSO."), I have no issue. The topology is as follows: ## # # at_ns0. init_net # . # at_vxlan1 (10.1.1.1/24) . br0 (10.1.1.100/24) # (remote 172.31.1.100) . | # . at_vxlan0 # . (remote 172.31.1.1) # . # at_vxlan3 (172.31.1.1/24) . br-underlay0 (172.31.1.100/24) # (remote 172.31.2.100) . | # . at_vxlan2 # . (remote 172.31.2.1) # . # at_vxlan5 (172.31.2.1/24) . br-underlay1 (172.31.2.100/24) # (remote 172.31.3.100) . | # . at_vxlan4 # . (remote 172.31.3.1) # . # p0 (172.31.3.1/24). br-underlay2 (172.31.3.100/24) # | . | # \-.-ovs-p0 # ## (gmail will probably bust this copy/paste, so putting a link to the actual test: https://github.com/david-marchand/ovs/commit/manyvxlan~2#diff-45a77f85f9679bc66ac97300392c0d5d9f5c53264fa8a82d735a553246e71faeR400) With this setup, I try to ping, from at_ns0 netns, the ip address of the br tap iface plugged with the other side of each tunnel: - Most outter level, no encapsulation, all good: 16:24:51.590966 a6:0a:bf:e2:f3:f2 > 82:cf:78:de:ed:46, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 63550, offset 0, flags [DF], proto ICMP (1), length 84) 172.31.3.1 > 172.31.3.100: ICMP echo request, id 26707, seq 1, length 64 16:24:51.591084 82:cf:78:de:ed:46 > a6:0a:bf:e2:f3:f2, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 28720, offset 0, flags [none], proto ICMP (1), length 84) 172.31.3.100 > 172.31.3.1: ICMP echo reply, id 26707, seq 1, length 64 - One tunnel encap all good: 16:24:54.140629 a6:0a:bf:e2:f3:f2 > 82:cf:78:de:ed:46, ethertype IPv4 (0x0800), length 148: (tos 0x0, ttl 64, id 61052, offset 0, flags [none], proto UDP (17), length 134) 172.31.3.1.36831 > 172.31.3.100.vxlan: [udp sum ok] VXLAN, flags [I] (0x08), vni 0 1e:db:ec:e5:28:6d > 9a:39:be:e8:18:4b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 54399, offset 0, flags [DF], proto ICMP (1), length 84) 172.31.2.1 > 172.31.2.100: ICMP echo request, id 51488, seq 1, length 64 16:24:54.140772 82:cf:78:de:ed:46 > a6:0a:bf:e2:f3:f2, ethertype IPv4 (0x0800), length 148: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 134) 172.31.3.100.39912 > 172.31.3.1.vxlan: [no cksum] VXLAN, flags [I] (0x08), vni 0 9a:39:be:e8:18:4b > 1e:db:ec:e5:28:6d, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 29701, offset 0, flags [none], proto ICMP (1), length 84) 172.31.2.100 > 172.31.2.1: ICMP echo reply, id 51488, seq 1, length 64 - Two tunnels encap: 16:24:58.578900 a6:0a:bf:e2:f3:f2 > 82:cf:78:de:ed:46, ethertype IPv4 (0x0800), length 142: (tos 0x0, ttl 64, id 61719, offset 0, flags [none], proto UDP (17), length 128) 172.31.3.1.50673 > 172.31.3.100.vxlan: [udp sum ok] VXLAN, flags [I] (0x08), vni 0 1e:db:ec:e5:28:6d > 9a:39:be:e8:18:4b, ethertype IPv4 (0x0800), length 92: (tos 0x0, ttl 64, id 35175, offset 0, flags [none], proto UDP (17), length 78) 172.31.2.1.44060 > 172.31.2.100.vxlan: [udp sum ok] VXLAN, flags [I] (0x08), vni 1 62:53:3f:82:da:56 > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 172.31.1.100 tell 172.31.1.1, length 28 16:24:58.579021 82:cf:78:de:ed:46 > a6:0a:bf:e2:f3:f2, ethertype IPv4 (0x0800), length 142: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 128) 172.31.3.100.56325 > 172.31.3.1.vxlan: [n
Re: [ovs-dev] [PATCH v4 3/4] dp-packet: Include inner offsets in adjustments and checks.
On Mon, Feb 12, 2024 at 7:54 AM Mike Pattrick wrote: > > Include inner offsets in functions where l3 and l4 offsets are either > modified or checked. > > Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.") > Signed-off-by: Mike Pattrick Reviewed-by: David Marchand -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v4 2/4] bfd: Set proper offsets and flags in BFD packets.
On Mon, Feb 12, 2024 at 7:53 AM Mike Pattrick wrote: > > Previously the BFD packet creation code did not appropriately set > offsets or flags. This contributed to issues involving encapsulation and > the TSO code. > > The transition to using standard functions also means some other > metadata like packet_type are set appropriately. > > Fixes: ccc096898c46 ("bfd: Implement Bidirectional Forwarding Detection.") > Signed-off-by: Mike Pattrick Reviewed-by: David Marchand -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 1/2] dpif-netdev: Do not create handler threads.
On Tue, Feb 6, 2024 at 3:47 PM Eelco Chaudron wrote: > On 6 Feb 2024, at 15:17, David Marchand wrote: > > > On Tue, Feb 6, 2024 at 2:31 PM Eelco Chaudron wrote: > >> > >> Avoid unnecessary thread creation as no upcalls are generated, > >> resulting in idle threads waiting for process termination. > >> > >> This optimization significantly reduces memory usage, cutting it > >> by half on a 128 CPU/thread system during testing, with the number > >> of threads reduced from 95 to 0. > >> > >> Signed-off-by: Eelco Chaudron > > > > I find it weird that the dpif layer reports an information on how the > > ofproto-dpif layer behaves. > > The handler threads are something ofproto-dpif is responsible for. > > The upcall receiving loop is something the ofproto-dpif owns. > > Why should the dpif layer tells how many handlers are needed? > > > > > > I would have seen a different change, where the dpif layer exports a > > capability, like dpif_can_recv() { return !!dpif->dpif_class->recv; }. > > ofproto-dpif would then deduce there is no handler to start at all. > > That was my first idea also, but then I found there is already an API call to > the dpif layer where it can tell the user (ofproto in this case) how many > threads it needs to function correctly. Here is the API definition: > > 369 /* Queries 'dpif' to see if a certain number of handlers are > required by > 370 * the implementation. > 371 * > 372 * If a certain number of handlers are required, returns 'true' and > sets > 373 * 'n_handlers' to that number of handler threads. > 374 * > 375 * If not, returns 'false'. > 376 */ > 377 bool (*number_handlers_required)(struct dpif *dpif, uint32_t > *n_handlers); > > I guess the ‘If a certain number of handlers are required, returns 'true’’ > part fits here, as we need 0. The fact that it exists does not convince me on its validity :-). I must be missing something. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 1/2] dpif-netdev: Do not create handler threads.
On Tue, Feb 6, 2024 at 2:31 PM Eelco Chaudron wrote: > > Avoid unnecessary thread creation as no upcalls are generated, > resulting in idle threads waiting for process termination. > > This optimization significantly reduces memory usage, cutting it > by half on a 128 CPU/thread system during testing, with the number > of threads reduced from 95 to 0. > > Signed-off-by: Eelco Chaudron I find it weird that the dpif layer reports an information on how the ofproto-dpif layer behaves. The handler threads are something ofproto-dpif is responsible for. The upcall receiving loop is something the ofproto-dpif owns. Why should the dpif layer tells how many handlers are needed? I would have seen a different change, where the dpif layer exports a capability, like dpif_can_recv() { return !!dpif->dpif_class->recv; }. ofproto-dpif would then deduce there is no handler to start at all. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 4/4] ofproto-dpif-monitor: Remove unneeded calls to clear packets.
On Tue, Jan 30, 2024 at 11:15 PM Mike Pattrick wrote: > > Currently the monitor will call dp_packet_clear() on the dp_packet that > is shared amongst BFD, LLDP, and CFM. However, all of these packets are > created with eth_compose(), which already calls dp_packet_clear(). > > Signed-off-by: Mike Pattrick Reviewed-by: David Marchand -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 3/4] dp-packet: Include inner offsets in adjustments and checks.
offsetof(struct dp_packet, l4_ofs)); > > +BUILD_ASSERT_DECL(offsetof(struct dp_packet, l4_ofs) + > + MEMBER_SIZEOF(struct dp_packet, l4_ofs) == > + offsetof(struct dp_packet, inner_l3_ofs)); > + > +BUILD_ASSERT_DECL(offsetof(struct dp_packet, inner_l3_ofs) + > + MEMBER_SIZEOF(struct dp_packet, inner_l3_ofs) == > + offsetof(struct dp_packet, inner_l4_ofs)); > + > /* The below build assert makes sure it's safe to read/write 128-bits > starting > * at the l2_pad_size location. */ > BUILD_ASSERT_DECL(sizeof(struct dp_packet) - > @@ -125,7 +134,7 @@ avx512_dp_packet_resize_l2(struct dp_packet *b, int > resize_by_bytes) > /* Each lane represents 16 bits in a 12-bit register. In this case the > * first three 16-bit values, which will map to the l2_5_ofs, l3_ofs and > * l4_ofs fields. */ > -const uint8_t k_lanes = 0b1110; > +const uint8_t k_lanes = 0b10; > > /* Set all 16-bit words in the 128-bits v_offset register to the value we > * need to add/substract from the l2_5_ofs, l3_ofs, and l4_ofs fields. */ Touching this part scares me. I think some comments are wrong, and otherwise I hope Intel CI will be enough to check nothing gets broken here :-). -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 2/4] bfd: Set proper offsets and flags in BFD packets.
On Tue, Jan 30, 2024 at 11:15 PM Mike Pattrick wrote: > > Previously the BFD packet creation code did not appropriately set > offsets or flags. This contributed to issues involving encapsulation and > the TSO code. I noted that apart from fixing the offsets / flags used to checksum offloading, this patch also fixes the packet_type used by other dp_packet helpers. I see nothing fixed on that later topic though. > > Fixes: ccc096898c46 ("bfd: Implement Bidirectional Forwarding Detection.") > Signed-off-by: Mike Pattrick Reviewed-by: David Marchand -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 1/4] dp-packet: Validate correct offset for L4 inner size.
On Tue, Jan 30, 2024 at 11:15 PM Mike Pattrick wrote: > > This patch fixes the correctness of dp_packet_inner_l4_size() when > checking for the existence of an inner L4 header. Previously it checked > for the outer L4 header. > > This function is currently only used when a packet is already flagged > for tunneling, so an incorrect determination isn't possible as long as > the flags of the packet are correct. > > Fixes: 85bcbbed839a ("userspace: Enable tunnel tests with TSO.") > Signed-off-by: Mike Pattrick Reviewed-by: David Marchand -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] system-common-macros: Fix conntrack matching.
On Fri, Jan 19, 2024 at 1:49 PM Ilya Maximets wrote: > > On 1/18/24 14:00, David Marchand wrote: > > Seen in GHA recently. > > Unit tests are checking conntracks relating to a destination ip address > > but the FORMAT_CT macro is not strict enough and would match unrelated > > conntracks too. > > > > Example: > > 148. system-traffic.at:6432: testing conntrack - DNAT with > > additional SNAT ... > > [...] > > ./system-traffic.at:6460: ovs-appctl dpctl/dump-conntrack | > > grep "dst=10.1.1.1" | > > sed -e 's/port=[0-9]*/port=/g' > > -e 's/id=[0-9]*/id=/g' > > -e 's/state=[0-9_A-Z]*/state=/g' | sort | uniq > > [...] > > @@ -1,2 +1,7 @@ > > tcp,orig=(src=10.1.1.1,dst=172.1.1.2,sport=,... > > +tcp,...,reply=(src=13.107.42.16,dst=10.1.1.10,sport=,... > > +tcp,...,reply=(src=168.63.129.16,dst=10.1.1.10,sport=,... > > +tcp,...,reply=(src=20.242.161.191,dst=10.1.1.10,sport=,... > > +tcp,orig=(src=13.107.42.16,dst=10.1.1.10,sport=,... > > +tcp,orig=(src=20.242.161.191,dst=10.1.1.10,sport=,... > > > > Fixes: 07659514c3c1 ("Add support for connection tracking.") > > Signed-off-by: David Marchand > > --- > > tests/system-common-macros.at | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/tests/system-common-macros.at b/tests/system-common-macros.at > > index 01ebe364ee..07be29f673 100644 > > --- a/tests/system-common-macros.at > > +++ b/tests/system-common-macros.at > > @@ -256,7 +256,7 @@ m4_define([STRIP_MONITOR_CSUM], [grep "csum:" | sed > > 's/csum:.*/csum: /']) > > # and limit the output to the rows containing 'ip-addr'. > > # > > m4_define([FORMAT_CT], > > -[[grep "dst=$1" | sed -e 's/port=[0-9]*/port=/g' -e > > 's/id=[0-9]*/id=/g' -e 's/state=[0-9_A-Z]*/state=/g' | > > sort | uniq]]) > > +[[grep "dst=$1\>" | sed -e 's/port=[0-9]*/port=/g' -e > > 's/id=[0-9]*/id=/g' -e 's/state=[0-9_A-Z]*/state=/g' | > > sort | uniq]]) > > > > # NETNS_DAEMONIZE([namespace], [command], [pidfile]) > > # > > I remembered why the macro is loose. We wanted to be able > to match on "subnets" by supplying only part of the address. > > There was at least one test that used this functionality. > Eelco removed it though here: > > https://github.com/openvswitch/ovs/commit/a80883f7682158c7a6955360ee852e8279f748e9 > > Did you check if have any more instances of such tests? I did not. > They can be tricky to find, as we can supply 10.1.1.2 in order > to match 10.1.1.240, for example. Ok, you can discard my patch. Thanks. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] system-common-macros: Fix conntrack matching.
On Fri, Jan 19, 2024 at 1:20 PM Simon Horman wrote: > > On Thu, Jan 18, 2024 at 02:00:18PM +0100, David Marchand wrote: > > Seen in GHA recently. > > Unit tests are checking conntracks relating to a destination ip address > > but the FORMAT_CT macro is not strict enough and would match unrelated > > conntracks too. > > > > Example: > > 148. system-traffic.at:6432: testing conntrack - DNAT with > > additional SNAT ... > > [...] > > ./system-traffic.at:6460: ovs-appctl dpctl/dump-conntrack | > > grep "dst=10.1.1.1" | > > sed -e 's/port=[0-9]*/port=/g' > > -e 's/id=[0-9]*/id=/g' > > -e 's/state=[0-9_A-Z]*/state=/g' | sort | uniq > > [...] > > @@ -1,2 +1,7 @@ > > tcp,orig=(src=10.1.1.1,dst=172.1.1.2,sport=,... > > +tcp,...,reply=(src=13.107.42.16,dst=10.1.1.10,sport=,... > > +tcp,...,reply=(src=168.63.129.16,dst=10.1.1.10,sport=,... > > +tcp,...,reply=(src=20.242.161.191,dst=10.1.1.10,sport=,... > > +tcp,orig=(src=13.107.42.16,dst=10.1.1.10,sport=,... > > +tcp,orig=(src=20.242.161.191,dst=10.1.1.10,sport=,... > > > > Fixes: 07659514c3c1 ("Add support for connection tracking.") > > Signed-off-by: David Marchand > > --- > > tests/system-common-macros.at | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/tests/system-common-macros.at b/tests/system-common-macros.at > > index 01ebe364ee..07be29f673 100644 > > --- a/tests/system-common-macros.at > > +++ b/tests/system-common-macros.at > > @@ -256,7 +256,7 @@ m4_define([STRIP_MONITOR_CSUM], [grep "csum:" | sed > > 's/csum:.*/csum: /']) > > # and limit the output to the rows containing 'ip-addr'. > > # > > m4_define([FORMAT_CT], > > -[[grep "dst=$1" | sed -e 's/port=[0-9]*/port=/g' -e > > 's/id=[0-9]*/id=/g' -e 's/state=[0-9_A-Z]*/state=/g' | > > sort | uniq]]) > > +[[grep "dst=$1\>" | sed -e 's/port=[0-9]*/port=/g' -e > > 's/id=[0-9]*/id=/g' -e 's/state=[0-9_A-Z]*/state=/g' | > > sort | uniq]]) > > > > # NETNS_DAEMONIZE([namespace], [command], [pidfile]) > > # > > Sorry, I feel I mist be missing something very obvious, but > I'm unsure why the match is on "dst=$1\>". I would have thought > the match would be "dst=$1," instead. \> matches the end of a word. Using , as a delimiter works too in this case. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3] netdev-dpdk: Trigger port reconfiguration in main thread for resets.
When OVS (main thread) configures a DPDK netdev, it holds a netdev_dpdk mutex lock. As part of this configure operation, the net/iavf driver (used with i40e VF devices) triggers a queue count change. The PF entity (serviced by a kernel PF driver for example) handles this change and requests back that the VF driver resets the VF device. The driver then completes the VF reset operation on its side and waits for completion of the iavf-event thread responsible for handling various VF device events. On the other hand, handling of the VF reset request in this iavf-event thread results in notifying the application with a port reset request (RTE_ETH_EVENT_INTR_RESET). The OVS reset callback tries to take a hold of the same netdev_dpdk mutex and blocks the iavf-event thread. As a resut, the net/iavf driver (still running on OVS main thread) is unable to complete as it is waiting for iavf-event to complete. To break from this situation, the OVS reset callback now won't take a netdev_dpdk mutex. Instead, the port reset request is stored in a simple RTE_ETH_MAXPORTS array associated to a seq object. This is enough to let the VF driver complete this port initialisation. The OVS main thread later handles the port reset request. More details in the DPDK upstream bz as this issue appeared following a change in DPDK. Link: https://bugs.dpdk.org/show_bug.cgi?id=1337 Signed-off-by: David Marchand --- Changes since v2: - fixed build with clang, - fixed indentation, - updated NEWS, Changes since v1: - converted to atomic accesses on netdev_dpdk_pending_reset[], --- NEWS | 7 - lib/netdev-dpdk.c | 76 +-- 2 files changed, 61 insertions(+), 22 deletions(-) diff --git a/NEWS b/NEWS index 2153b48053..a6617546c6 100644 --- a/NEWS +++ b/NEWS @@ -54,13 +54,6 @@ v3.3.0 - xx xxx - Support for multicast snooping to show the protocol responsible for adding/updating the entry. -Known issues: - - DPDK: v23.11 has a change in behavior in handling i40e VF devices. This - may block and prevent OVS from adding such devices as ports in a netdev - datapath bridge. - For the details, see https://bugs.dpdk.org/show_bug.cgi?id=1337 which - describes the issue first detected in the 21.11 LTS branch. - v3.2.0 - 17 Aug 2023 diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index fb26825ff8..45f61930d4 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -58,6 +58,7 @@ #include "openvswitch/match.h" #include "openvswitch/ofp-parse.h" #include "openvswitch/ofp-print.h" +#include "openvswitch/poll-loop.h" #include "openvswitch/shash.h" #include "openvswitch/vlog.h" #include "ovs-numa.h" @@ -2101,32 +2102,73 @@ netdev_dpdk_process_devargs(struct netdev_dpdk *dev, return new_port_id; } +static struct seq *netdev_dpdk_reset_seq; +static uint64_t netdev_dpdk_last_reset_seq; +static atomic_bool netdev_dpdk_pending_reset[RTE_MAX_ETHPORTS]; + +static void +netdev_dpdk_wait(const struct netdev_class *netdev_class OVS_UNUSED) +{ +uint64_t last_reset_seq = seq_read(netdev_dpdk_reset_seq); + +if (netdev_dpdk_last_reset_seq == last_reset_seq) { +seq_wait(netdev_dpdk_reset_seq, netdev_dpdk_last_reset_seq); +} else { +poll_immediate_wake(); +} +} + +static void +netdev_dpdk_run(const struct netdev_class *netdev_class OVS_UNUSED) +{ +uint64_t reset_seq = seq_read(netdev_dpdk_reset_seq); + +if (reset_seq != netdev_dpdk_last_reset_seq) { +dpdk_port_t port_id; + +netdev_dpdk_last_reset_seq = reset_seq; + +for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) { +struct netdev_dpdk *dev; +bool pending_reset; + +atomic_read_relaxed(&netdev_dpdk_pending_reset[port_id], +&pending_reset); +if (!pending_reset) { +continue; +} +atomic_store_relaxed(&netdev_dpdk_pending_reset[port_id], false); + +ovs_mutex_lock(&dpdk_mutex); +dev = netdev_dpdk_lookup_by_port_id(port_id); +if (dev) { +ovs_mutex_lock(&dev->mutex); +dev->reset_needed = true; +netdev_request_reconfigure(&dev->up); +VLOG_DBG_RL(&rl, "%s: Device reset requested.", +netdev_get_name(&dev->up)); +ovs_mutex_unlock(&dev->mutex); +} +ovs_mutex_unlock(&dpdk_mutex); +} +} +} + static int dpdk_eth_event_callback(dpdk_port_t port_id, enum rte_eth_event_type type, void *param OVS_UNUSED, void *ret_param OVS_UNUSED) { -struct netdev_dpdk *dev; - switch ((int) type) { case RTE_ETH_EVENT_INTR_RESET: -ovs_mutex_lock(&dpdk_mutex);
Re: [ovs-dev] [PATCH v2] netdev-dpdk: Trigger port reconfiguration in main thread for resets.
On Thu, Jan 18, 2024 at 4:43 PM Ilya Maximets wrote: > > On 1/18/24 15:41, David Marchand wrote: > > When OVS (main thread) configures a DPDK netdev, it holds a netdev_dpdk > > mutex lock. > > As part of this configure operation, the net/iavf driver (used with i40e > > VF devices) triggers a queue count change. The PF entity (serviced by a > > kernel PF driver for example) handles this change and requests back that > > the VF driver resets the VF device. The driver then completes the VF reset > > operation on its side and waits for completion of the iavf-event thread > > responsible for handling various VF device events. > > > > On the other hand, handling of the VF reset request in this iavf-event > > thread results in notifying the application with a port reset request > > (RTE_ETH_EVENT_INTR_RESET). The OVS reset callback tries to take a hold > > of the same netdev_dpdk mutex and blocks the iavf-event thread. > > > > As a resut, the net/iavf driver (still running on OVS main thread) is > > unable to complete as it is waiting for iavf-event to complete. > > > > To break from this situation, the OVS reset callback now won't take a > > netdev_dpdk mutex. Instead, the port reset request is stored in a simple > > RTE_ETH_MAXPORTS array associated to a seq object. > > This is enough to let the VF driver complete this port initialisation. > > The OVS main thread later handles the port reset request. > > > > More details in the DPDK upstream bz as this issue appeared following a > > change in DPDK. > > > > Link: https://bugs.dpdk.org/show_bug.cgi?id=1337 > > Signed-off-by: David Marchand > > --- > > Changes since v1: > > - converted to atomic accesses on netdev_dpdk_pending_reset[], > > > > > > --- > > lib/netdev-dpdk.c | 76 +-- > > 1 file changed, 61 insertions(+), 15 deletions(-) > > > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > > index fb26825ff8..6b15e4c03a 100644 > > --- a/lib/netdev-dpdk.c > > +++ b/lib/netdev-dpdk.c > > @@ -58,6 +58,7 @@ > > #include "openvswitch/match.h" > > #include "openvswitch/ofp-parse.h" > > #include "openvswitch/ofp-print.h" > > +#include "openvswitch/poll-loop.h" > > #include "openvswitch/shash.h" > > #include "openvswitch/vlog.h" > > #include "ovs-numa.h" > > @@ -2101,32 +2102,73 @@ netdev_dpdk_process_devargs(struct netdev_dpdk *dev, > > return new_port_id; > > } > > > > +static struct seq *netdev_dpdk_reset_seq; > > +static uint64_t netdev_dpdk_last_reset_seq; > > +static bool netdev_dpdk_pending_reset[RTE_MAX_ETHPORTS]; > > Should be an atomic_bool, I suppose. Yes, already fixed. > > > + > > +static void > > +netdev_dpdk_wait(const struct netdev_class *netdev_class OVS_UNUSED) > > +{ > > +uint64_t last_reset_seq = seq_read(netdev_dpdk_reset_seq); > > + > > +if (netdev_dpdk_last_reset_seq == last_reset_seq) { > > +seq_wait(netdev_dpdk_reset_seq, netdev_dpdk_last_reset_seq); > > +} else { > > +poll_immediate_wake(); > > +} > > +} > > + > > +static void > > +netdev_dpdk_run(const struct netdev_class *netdev_class OVS_UNUSED) > > +{ > > +uint64_t reset_seq = seq_read(netdev_dpdk_reset_seq); > > + > > +if (reset_seq != netdev_dpdk_last_reset_seq) { > > +dpdk_port_t port_id; > > + > > +netdev_dpdk_last_reset_seq = reset_seq; > > + > > +for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) { > > +struct netdev_dpdk *dev; > > +bool pending_reset; > > + > > +atomic_read_relaxed(&netdev_dpdk_pending_reset[port_id], > > + &pending_reset); > > Indentation. Indeed. > > > +if (!pending_reset) { > > +continue; > > +} > > +atomic_store_relaxed(&netdev_dpdk_pending_reset[port_id], > > false); > > + > > +ovs_mutex_lock(&dpdk_mutex); > > +dev = netdev_dpdk_lookup_by_port_id(port_id); > > +if (dev) { > > +ovs_mutex_lock(&dev->mutex); > > +dev->reset_needed = true; > > +netdev_request_reconfigure(&dev->up); > > +VLOG_DBG_RL(&rl, "%s: Device reset requested.", > > +netd
Re: [ovs-dev] [PATCH v2] netdev-dpdk: Trigger port reconfiguration in main thread for resets.
On Thu, Jan 18, 2024 at 3:41 PM David Marchand wrote: > Changes since v1: > - converted to atomic accesses on netdev_dpdk_pending_reset[], I did not wait for the clang results during my tests... cooking v3. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v2] netdev-dpdk: Trigger port reconfiguration in main thread for resets.
When OVS (main thread) configures a DPDK netdev, it holds a netdev_dpdk mutex lock. As part of this configure operation, the net/iavf driver (used with i40e VF devices) triggers a queue count change. The PF entity (serviced by a kernel PF driver for example) handles this change and requests back that the VF driver resets the VF device. The driver then completes the VF reset operation on its side and waits for completion of the iavf-event thread responsible for handling various VF device events. On the other hand, handling of the VF reset request in this iavf-event thread results in notifying the application with a port reset request (RTE_ETH_EVENT_INTR_RESET). The OVS reset callback tries to take a hold of the same netdev_dpdk mutex and blocks the iavf-event thread. As a resut, the net/iavf driver (still running on OVS main thread) is unable to complete as it is waiting for iavf-event to complete. To break from this situation, the OVS reset callback now won't take a netdev_dpdk mutex. Instead, the port reset request is stored in a simple RTE_ETH_MAXPORTS array associated to a seq object. This is enough to let the VF driver complete this port initialisation. The OVS main thread later handles the port reset request. More details in the DPDK upstream bz as this issue appeared following a change in DPDK. Link: https://bugs.dpdk.org/show_bug.cgi?id=1337 Signed-off-by: David Marchand --- Changes since v1: - converted to atomic accesses on netdev_dpdk_pending_reset[], --- lib/netdev-dpdk.c | 76 +-- 1 file changed, 61 insertions(+), 15 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index fb26825ff8..6b15e4c03a 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -58,6 +58,7 @@ #include "openvswitch/match.h" #include "openvswitch/ofp-parse.h" #include "openvswitch/ofp-print.h" +#include "openvswitch/poll-loop.h" #include "openvswitch/shash.h" #include "openvswitch/vlog.h" #include "ovs-numa.h" @@ -2101,32 +2102,73 @@ netdev_dpdk_process_devargs(struct netdev_dpdk *dev, return new_port_id; } +static struct seq *netdev_dpdk_reset_seq; +static uint64_t netdev_dpdk_last_reset_seq; +static bool netdev_dpdk_pending_reset[RTE_MAX_ETHPORTS]; + +static void +netdev_dpdk_wait(const struct netdev_class *netdev_class OVS_UNUSED) +{ +uint64_t last_reset_seq = seq_read(netdev_dpdk_reset_seq); + +if (netdev_dpdk_last_reset_seq == last_reset_seq) { +seq_wait(netdev_dpdk_reset_seq, netdev_dpdk_last_reset_seq); +} else { +poll_immediate_wake(); +} +} + +static void +netdev_dpdk_run(const struct netdev_class *netdev_class OVS_UNUSED) +{ +uint64_t reset_seq = seq_read(netdev_dpdk_reset_seq); + +if (reset_seq != netdev_dpdk_last_reset_seq) { +dpdk_port_t port_id; + +netdev_dpdk_last_reset_seq = reset_seq; + +for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) { +struct netdev_dpdk *dev; +bool pending_reset; + +atomic_read_relaxed(&netdev_dpdk_pending_reset[port_id], + &pending_reset); +if (!pending_reset) { +continue; +} +atomic_store_relaxed(&netdev_dpdk_pending_reset[port_id], false); + +ovs_mutex_lock(&dpdk_mutex); +dev = netdev_dpdk_lookup_by_port_id(port_id); +if (dev) { +ovs_mutex_lock(&dev->mutex); +dev->reset_needed = true; +netdev_request_reconfigure(&dev->up); +VLOG_DBG_RL(&rl, "%s: Device reset requested.", +netdev_get_name(&dev->up)); +ovs_mutex_unlock(&dev->mutex); +} +ovs_mutex_unlock(&dpdk_mutex); +} +} +} + static int dpdk_eth_event_callback(dpdk_port_t port_id, enum rte_eth_event_type type, void *param OVS_UNUSED, void *ret_param OVS_UNUSED) { -struct netdev_dpdk *dev; - switch ((int) type) { case RTE_ETH_EVENT_INTR_RESET: -ovs_mutex_lock(&dpdk_mutex); -dev = netdev_dpdk_lookup_by_port_id(port_id); -if (dev) { -ovs_mutex_lock(&dev->mutex); -dev->reset_needed = true; -netdev_request_reconfigure(&dev->up); -VLOG_DBG_RL(&rl, "%s: Device reset requested.", -netdev_get_name(&dev->up)); -ovs_mutex_unlock(&dev->mutex); -} -ovs_mutex_unlock(&dpdk_mutex); +atomic_store_relaxed(&netdev_dpdk_pending_reset[port_id], true); +seq_change(netdev_dpdk_reset_seq); break; default: /* Ignore all other types. */ break; - } - return 0; +} +
Re: [ovs-dev] [PATCH] netdev-dpdk: Trigger port reconfiguration in main thread for resets.
On Thu, Jan 18, 2024 at 2:23 PM Ilya Maximets wrote: > > On 1/18/24 14:16, David Marchand wrote: > > When OVS (main thread) configures a DPDK netdev, it holds a netdev_dpdk > > mutex lock. > > As part of this configure operation, the net/iavf driver (used with i40e > > VF devices) triggers a queue count change. The PF entity (serviced by a > > kernel PF driver for example) handles this change and requests back that > > the VF driver resets the VF device. The driver then completes the VF reset > > operation on its side and waits for completion of the iavf-event thread > > responsible for handling various VF device events. > > > > On the other hand, handling of the VF reset request in this iavf-event > > thread results in notifying the application with a port reset request > > (RTE_ETH_EVENT_INTR_RESET). The OVS reset callback tries to take a hold > > of the same netdev_dpdk mutex and blocks the iavf-event thread. > > > > As a resut, the net/iavf driver (still running on OVS main thread) is > > unable to complete as it is waiting for iavf-event to complete. > > > > To break from this situation, the OVS reset callback now won't take a > > netdev_dpdk mutex. Instead, the port reset request is stored in a simple > > RTE_ETH_MAXPORTS array associated to a seq object. > > This is enough to let the VF driver complete this port initialisation. > > The OVS main thread later handles the port reset request. > > > > Link: https://bugs.dpdk.org/show_bug.cgi?id=1337 > > Signed-off-by: David Marchand > > --- > > lib/netdev-dpdk.c | 73 +-- > > 1 file changed, 58 insertions(+), 15 deletions(-) > > > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > > index fb26825ff8..528850971a 100644 > > --- a/lib/netdev-dpdk.c > > +++ b/lib/netdev-dpdk.c > > @@ -58,6 +58,7 @@ > > #include "openvswitch/match.h" > > #include "openvswitch/ofp-parse.h" > > #include "openvswitch/ofp-print.h" > > +#include "openvswitch/poll-loop.h" > > #include "openvswitch/shash.h" > > #include "openvswitch/vlog.h" > > #include "ovs-numa.h" > > @@ -2101,32 +2102,70 @@ netdev_dpdk_process_devargs(struct netdev_dpdk *dev, > > return new_port_id; > > } > > > > +static struct seq *netdev_dpdk_reset_seq; > > +static uint64_t netdev_dpdk_last_reset_seq; > > +static bool netdev_dpdk_pending_reset[RTE_MAX_ETHPORTS]; > > + > > +static void > > +netdev_dpdk_wait(const struct netdev_class *netdev_class OVS_UNUSED) > > +{ > > +uint64_t last_reset_seq = seq_read(netdev_dpdk_reset_seq); > > + > > +if (netdev_dpdk_last_reset_seq == last_reset_seq) { > > +seq_wait(netdev_dpdk_reset_seq, netdev_dpdk_last_reset_seq); > > +} else { > > +poll_immediate_wake(); > > +} > > +} > > + > > +static void > > +netdev_dpdk_run(const struct netdev_class *netdev_class OVS_UNUSED) > > +{ > > +uint64_t reset_seq = seq_read(netdev_dpdk_reset_seq); > > + > > +if (reset_seq != netdev_dpdk_last_reset_seq) { > > +dpdk_port_t port_id; > > + > > +netdev_dpdk_last_reset_seq = reset_seq; > > + > > +for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) { > > +struct netdev_dpdk *dev; > > + > > +if (!netdev_dpdk_pending_reset[port_id]) { > > +continue; > > +} > > +netdev_dpdk_pending_reset[port_id] = false; > > + > > +ovs_mutex_lock(&dpdk_mutex); > > +dev = netdev_dpdk_lookup_by_port_id(port_id); > > +if (dev) { > > +ovs_mutex_lock(&dev->mutex); > > +dev->reset_needed = true; > > +netdev_request_reconfigure(&dev->up); > > +VLOG_DBG_RL(&rl, "%s: Device reset requested.", > > +netdev_get_name(&dev->up)); > > +ovs_mutex_unlock(&dev->mutex); > > +} > > +ovs_mutex_unlock(&dpdk_mutex); > > +} > > +} > > +} > > + > > static int > > dpdk_eth_event_callback(dpdk_port_t port_id, enum rte_eth_event_type type, > > void *param OVS_UNUSED, void *ret_param OVS_UNUSED) > > Can we rely on this callback to always be called from a main thread? > Otherwise, we should use atomics for an array or a separate lock. I assumed it was not necessary after reading seq.h. * Thread-safety * = * * Fully thread safe. seq_change() synchronizes with seq_read() and * seq_wait() on the same variable in release-acquire fashion. That * is, all effects of the memory accesses performed by a thread prior * to seq_change() are visible to the threads returning from * seq_read() or seq_wait() observing that change. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] netdev-dpdk: Trigger port reconfiguration in main thread for resets.
When OVS (main thread) configures a DPDK netdev, it holds a netdev_dpdk mutex lock. As part of this configure operation, the net/iavf driver (used with i40e VF devices) triggers a queue count change. The PF entity (serviced by a kernel PF driver for example) handles this change and requests back that the VF driver resets the VF device. The driver then completes the VF reset operation on its side and waits for completion of the iavf-event thread responsible for handling various VF device events. On the other hand, handling of the VF reset request in this iavf-event thread results in notifying the application with a port reset request (RTE_ETH_EVENT_INTR_RESET). The OVS reset callback tries to take a hold of the same netdev_dpdk mutex and blocks the iavf-event thread. As a resut, the net/iavf driver (still running on OVS main thread) is unable to complete as it is waiting for iavf-event to complete. To break from this situation, the OVS reset callback now won't take a netdev_dpdk mutex. Instead, the port reset request is stored in a simple RTE_ETH_MAXPORTS array associated to a seq object. This is enough to let the VF driver complete this port initialisation. The OVS main thread later handles the port reset request. Link: https://bugs.dpdk.org/show_bug.cgi?id=1337 Signed-off-by: David Marchand --- lib/netdev-dpdk.c | 73 +-- 1 file changed, 58 insertions(+), 15 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index fb26825ff8..528850971a 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -58,6 +58,7 @@ #include "openvswitch/match.h" #include "openvswitch/ofp-parse.h" #include "openvswitch/ofp-print.h" +#include "openvswitch/poll-loop.h" #include "openvswitch/shash.h" #include "openvswitch/vlog.h" #include "ovs-numa.h" @@ -2101,32 +2102,70 @@ netdev_dpdk_process_devargs(struct netdev_dpdk *dev, return new_port_id; } +static struct seq *netdev_dpdk_reset_seq; +static uint64_t netdev_dpdk_last_reset_seq; +static bool netdev_dpdk_pending_reset[RTE_MAX_ETHPORTS]; + +static void +netdev_dpdk_wait(const struct netdev_class *netdev_class OVS_UNUSED) +{ +uint64_t last_reset_seq = seq_read(netdev_dpdk_reset_seq); + +if (netdev_dpdk_last_reset_seq == last_reset_seq) { +seq_wait(netdev_dpdk_reset_seq, netdev_dpdk_last_reset_seq); +} else { +poll_immediate_wake(); +} +} + +static void +netdev_dpdk_run(const struct netdev_class *netdev_class OVS_UNUSED) +{ +uint64_t reset_seq = seq_read(netdev_dpdk_reset_seq); + +if (reset_seq != netdev_dpdk_last_reset_seq) { +dpdk_port_t port_id; + +netdev_dpdk_last_reset_seq = reset_seq; + +for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) { +struct netdev_dpdk *dev; + +if (!netdev_dpdk_pending_reset[port_id]) { +continue; +} +netdev_dpdk_pending_reset[port_id] = false; + +ovs_mutex_lock(&dpdk_mutex); +dev = netdev_dpdk_lookup_by_port_id(port_id); +if (dev) { +ovs_mutex_lock(&dev->mutex); +dev->reset_needed = true; +netdev_request_reconfigure(&dev->up); +VLOG_DBG_RL(&rl, "%s: Device reset requested.", +netdev_get_name(&dev->up)); +ovs_mutex_unlock(&dev->mutex); +} +ovs_mutex_unlock(&dpdk_mutex); +} +} +} + static int dpdk_eth_event_callback(dpdk_port_t port_id, enum rte_eth_event_type type, void *param OVS_UNUSED, void *ret_param OVS_UNUSED) { -struct netdev_dpdk *dev; - switch ((int) type) { case RTE_ETH_EVENT_INTR_RESET: -ovs_mutex_lock(&dpdk_mutex); -dev = netdev_dpdk_lookup_by_port_id(port_id); -if (dev) { -ovs_mutex_lock(&dev->mutex); -dev->reset_needed = true; -netdev_request_reconfigure(&dev->up); -VLOG_DBG_RL(&rl, "%s: Device reset requested.", -netdev_get_name(&dev->up)); -ovs_mutex_unlock(&dev->mutex); -} -ovs_mutex_unlock(&dpdk_mutex); +netdev_dpdk_pending_reset[port_id] = true; +seq_change(netdev_dpdk_reset_seq); break; default: /* Ignore all other types. */ break; - } - return 0; +} +return 0; } static void @@ -5001,6 +5040,8 @@ netdev_dpdk_class_init(void) "[netdev]", 0, 1, netdev_dpdk_get_mempool_info, NULL); +netdev_dpdk_reset_seq = seq_create(); +netdev_dpdk_last_reset_seq = seq_read(netdev_dpdk_reset_seq); ret = rte_eth_dev_callback_register(RTE_ETH_ALL,
[ovs-dev] [PATCH] system-common-macros: Fix conntrack matching.
Seen in GHA recently. Unit tests are checking conntracks relating to a destination ip address but the FORMAT_CT macro is not strict enough and would match unrelated conntracks too. Example: 148. system-traffic.at:6432: testing conntrack - DNAT with additional SNAT ... [...] ./system-traffic.at:6460: ovs-appctl dpctl/dump-conntrack | grep "dst=10.1.1.1" | sed -e 's/port=[0-9]*/port=/g' -e 's/id=[0-9]*/id=/g' -e 's/state=[0-9_A-Z]*/state=/g' | sort | uniq [...] @@ -1,2 +1,7 @@ tcp,orig=(src=10.1.1.1,dst=172.1.1.2,sport=,... +tcp,...,reply=(src=13.107.42.16,dst=10.1.1.10,sport=,... +tcp,...,reply=(src=168.63.129.16,dst=10.1.1.10,sport=,... +tcp,...,reply=(src=20.242.161.191,dst=10.1.1.10,sport=,... +tcp,orig=(src=13.107.42.16,dst=10.1.1.10,sport=,... +tcp,orig=(src=20.242.161.191,dst=10.1.1.10,sport=,... Fixes: 07659514c3c1 ("Add support for connection tracking.") Signed-off-by: David Marchand --- tests/system-common-macros.at | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/system-common-macros.at b/tests/system-common-macros.at index 01ebe364ee..07be29f673 100644 --- a/tests/system-common-macros.at +++ b/tests/system-common-macros.at @@ -256,7 +256,7 @@ m4_define([STRIP_MONITOR_CSUM], [grep "csum:" | sed 's/csum:.*/csum: /']) # and limit the output to the rows containing 'ip-addr'. # m4_define([FORMAT_CT], -[[grep "dst=$1" | sed -e 's/port=[0-9]*/port=/g' -e 's/id=[0-9]*/id=/g' -e 's/state=[0-9_A-Z]*/state=/g' | sort | uniq]]) +[[grep "dst=$1\>" | sed -e 's/port=[0-9]*/port=/g' -e 's/id=[0-9]*/id=/g' -e 's/state=[0-9_A-Z]*/state=/g' | sort | uniq]]) # NETNS_DAEMONIZE([namespace], [command], [pidfile]) # -- 2.43.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] dpdk: Update to use v23.11.
Hello, On Mon, Jan 15, 2024 at 12:58 PM Ilya Maximets wrote: > OK. I think today we have no real choice but to go with the option 1. > We'll need a NEWS entry for that in the patch. I'll make sure to include > a variant of it in the release announce in February if nothing changes > until then. > > But I think we should still pursue the option 2 in case the solution will > be found before the final release in February. > > Though if there will be no conclusion on the long term problem until autumn, > we should go with 3 and move 24.11 adoption to summer of 2025. And follow > that strategy going forward, as the current approach is not sustainable. > > > > > David, let us know if you agree ? If so, maybe you can send a new > > version of the patch with the added documentation. I can help with docs > > or discussing further. > > David, could you, please, add a note in the NEWS file and send a new version > of the patch? I just sent a v2. I'll continue investigating the iavf issue. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v2] dpdk: Update to use v23.11.
This commit adds support for DPDK v23.11. It updates the CI script and documentation and includes the following changes coming from the dpdk-latest branch: - sparse: Add some compiler intrinsics for DPDK build. https://patchwork.ozlabs.org/project/openvswitch/list/?series=371129&state=* - ci: Cache DPDK installed libraries only. - ci: Reduce optional libraries in DPDK. https://patchwork.ozlabs.org/project/openvswitch/list/?series=383367&state=* - system-dpdk: Ignore net/ice error log about QinQ offloading. https://patchwork.ozlabs.org/project/openvswitch/list/?series=385259&state=* There is a known issue with i40e VF devices where OVS main thread may block when adding such devices as dpif-netdev dpdk ports. Signed-off-by: David Marchand --- .ci/dpdk-build.sh| 28 +++- .ci/linux-build.sh | 9 .github/workflows/build-and-test.yml | 4 ++-- Documentation/faq/releases.rst | 2 +- Documentation/intro/install/dpdk.rst | 16 +++--- Documentation/topics/dpdk/phy.rst| 12 +- Documentation/topics/dpdk/vdev.rst | 2 +- Documentation/topics/dpdk/vhost-user.rst | 2 +- Documentation/topics/testing.rst | 2 +- Documentation/topics/userspace-tso.rst | 2 +- NEWS | 9 debian/control.in| 2 +- include/sparse/automake.mk | 1 + include/sparse/ia32intrin.h | 23 +++ rhel/openvswitch-fedora.spec.in | 2 +- tests/system-dpdk-macros.at | 1 + 16 files changed, 80 insertions(+), 37 deletions(-) create mode 100644 include/sparse/ia32intrin.h diff --git a/.ci/dpdk-build.sh b/.ci/dpdk-build.sh index d4c178ee0d..23f3166a54 100755 --- a/.ci/dpdk-build.sh +++ b/.ci/dpdk-build.sh @@ -5,25 +5,27 @@ set -x function build_dpdk() { -local VERSION_FILE="dpdk-dir/cached-version" local DPDK_VER=$1 local DPDK_OPTS="" +local DPDK_INSTALL_DIR="$(pwd)/dpdk-dir" +local VERSION_FILE="$DPDK_INSTALL_DIR/cached-version" -rm -rf dpdk-dir +rm -rf dpdk-src +rm -rf $DPDK_INSTALL_DIR if [ "${DPDK_VER##refs/*/}" != "${DPDK_VER}" ]; then -git clone --single-branch $DPDK_GIT dpdk-dir -b "${DPDK_VER##refs/*/}" -pushd dpdk-dir +git clone --single-branch $DPDK_GIT dpdk-src -b "${DPDK_VER##refs/*/}" +pushd dpdk-src git log -1 --oneline else wget https://fast.dpdk.org/rel/dpdk-$1.tar.xz tar xvf dpdk-$1.tar.xz > /dev/null DIR_NAME=$(tar -tf dpdk-$1.tar.xz | head -1 | cut -f1 -d"/") -mv ${DIR_NAME} dpdk-dir -pushd dpdk-dir +mv ${DIR_NAME} dpdk-src +pushd dpdk-src fi -# Switching to 'default' machine to make dpdk-dir cache usable on +# Switching to 'default' machine to make the dpdk cache usable on # different CPUs. We can't be sure that all CI machines are exactly same. DPDK_OPTS="$DPDK_OPTS -Dmachine=default" @@ -40,16 +42,22 @@ function build_dpdk() DPDK_OPTS="$DPDK_OPTS -Denable_apps=test-pmd" enable_drivers="net/null,net/af_xdp,net/tap,net/virtio,net/pcap" DPDK_OPTS="$DPDK_OPTS -Denable_drivers=$enable_drivers" +# OVS depends on the vhost library (and its dependencies). +# net/tap depends on the gso library. +DPDK_OPTS="$DPDK_OPTS -Denable_libs=cryptodev,dmadev,gso,vhost" # Install DPDK using prefix. -DPDK_OPTS="$DPDK_OPTS --prefix=$(pwd)/build" +DPDK_OPTS="$DPDK_OPTS --prefix=$DPDK_INSTALL_DIR" meson $DPDK_OPTS build ninja -C build ninja -C build install - -echo "Installed DPDK in $(pwd)" popd + +# Remove examples sources. +rm -rf $DPDK_INSTALL_DIR/share/dpdk/examples + +echo "Installed DPDK in $DPDK_INSTALL_DIR" echo "${DPDK_VER}" > ${VERSION_FILE} } diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh index 90581c10b7..cf1462a0c4 100755 --- a/.ci/linux-build.sh +++ b/.ci/linux-build.sh @@ -10,8 +10,9 @@ JOBS=${JOBS:-"-j4"} function install_dpdk() { -local VERSION_FILE="dpdk-dir/cached-version" -local DPDK_LIB=$(pwd)/dpdk-dir/build/lib/x86_64-linux-gnu +local DPDK_INSTALL_DIR="$(pwd)/dpdk-dir" +local VERSION_FILE="${DPDK_INSTALL_DIR}/cached-version" +local DPDK_LIB=${DPDK_INSTALL_DIR}/lib/x86_64-linux-gnu if [ "$DPDK_SHARED" ]; then EXTRA_OPTS="$EXTRA_OPTS --with-dpdk=shared" @@ -27,13 +28,13 @@ function install_dpdk() export PATH=$(pwd)/dpdk-dir/build/bin:$PATH if [ ! -f "${VERSION_FILE}" ]; then -echo "Could not find DPDK in $(pwd)/dpdk-dir&qu
[ovs-dev] [PATCH v6] system-dpdk: Test with mlx5 devices.
The DPDK unit test only runs if vfio or igb_uio kernel modules are loaded: on systems with only mlx5, this test is always skipped. Besides, the test tries to grab the first device listed by dpdk-devbind.py, regardless of the PCI device status regarding kmod binding. Remove dependency on this DPDK script and use a minimal script that reads PCI sysfs. This script is not perfect, as one can imagine PCI devices bound to vfio-pci for virtual machines. Plus, this script only tries to take over vfio-pci devices. mlx5 devices can't be taken over blindly as it could mean losing connectivity to the machine if the netdev was in use for this system. For those two reasons, add a new environment variable DPDK_PCI_ADDR for testers to select the PCI device of their liking. For consistency and grep, the temporary file PCI_ADDR is renamed to DPDK_PCI_ADDR. Reviewed-by: Maxime Coquelin Acked-by: Eelco Chaudron Signed-off-by: David Marchand --- Changes since v5: - rebased, - moved the script to the python scripts list in automake.mk, - bumped copyright date, Changes since v4: - separated from the original series, - rebased, - dropped mlx5 devices from the discovery script, - documented DPDK_PCI_ADDR env variable, Changes since v3: - fixed nit from Maxime, Changes since v2: - sorted logs alphabetically, --- Documentation/topics/testing.rst | 11 ++--- tests/automake.mk| 1 + tests/system-dpdk-find-device.py | 39 tests/system-dpdk-macros.at | 10 ++-- tests/system-dpdk.at | 14 ++-- 5 files changed, 57 insertions(+), 18 deletions(-) create mode 100755 tests/system-dpdk-find-device.py diff --git a/Documentation/topics/testing.rst b/Documentation/topics/testing.rst index 5f6940b84d..fb9b3e77b1 100644 --- a/Documentation/topics/testing.rst +++ b/Documentation/topics/testing.rst @@ -343,15 +343,20 @@ To see a list of all the available tests, run:: These tests support a `DPDK supported NIC`_. The tests operate on a wider set of environments, for instance, when a virtual port is used. -They do require proper DPDK variables (``DPDK_DIR`` and ``DPDK_BUILD``). Moreover you need to have root privileges to load the required modules and to bind -the NIC to the DPDK-compatible driver. +a PCI device to the DPDK-compatible driver. .. _DPDK supported NIC: https://core.dpdk.org/supported/#nics +The phy test will skip if no suitable PCI device is found. +It is possible to select which PCI device is used for this test by setting the +DPDK_PCI_ADDR environment variable, which is especially useful when testing +with a mlx5 device:: + +# DPDK_PCI_ADDR=:82:00.0 make check-dpdk + All tests are skipped if no hugepages are configured. User must look into the DPDK manual to figure out how to `Configure hugepages`_. -The phy test will skip if no compatible physical device is available. .. _Configure hugepages: https://doc.dpdk.org/guides-22.11/linux_gsg/sys_reqs.html diff --git a/tests/automake.mk b/tests/automake.mk index 2ae0aeecaf..10c9fbb01f 100644 --- a/tests/automake.mk +++ b/tests/automake.mk @@ -520,6 +520,7 @@ CHECK_PYFILES = \ tests/flowgen.py \ tests/genpkts.py \ tests/ovsdb-monitor-sort.py \ + tests/system-dpdk-find-device.py \ tests/test-daemon.py \ tests/test-dpparse.py \ tests/test-json.py \ diff --git a/tests/system-dpdk-find-device.py b/tests/system-dpdk-find-device.py new file mode 100755 index 00..ced74e7f31 --- /dev/null +++ b/tests/system-dpdk-find-device.py @@ -0,0 +1,39 @@ +#!/usr/bin/env python3 +# Copyright (c) 2024 Red Hat, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at: +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +from pathlib import Path +import os +import sys + +# The tester might want to select a PCI device, if so, trust it. +if 'DPDK_PCI_ADDR' in os.environ: +print(os.environ['DPDK_PCI_ADDR']) +sys.exit(0) + +for device in sorted(Path('/sys/bus/pci/devices').iterdir()): +class_path = device / 'class' +# Only consider Network class devices +if class_path.read_text().strip() != '0x02': +continue +kmod_path = device / 'driver' / 'module' +kmod_name = kmod_path.resolve().name +# Only care about devices bound to vfio_pci or igb_uio. +if kmod_name not in ['vfio_pci', 'igb_uio']: +continue +print(
[ovs-dev] [PATCH v5 2/2] tests: Move MFEX tests to dpif-netdev.
The MFEX code and tests do not depend on DPDK anymore. We can move the unit tests to dpif-netdev. Reviewed-by: Maxime Coquelin Acked-by: Eelco Chaudron Acked-by: Kumar Amber Signed-off-by: David Marchand --- Changes since v4: - rebased, - added dummy numa configuration, Changes since v3: - removed documentation update, --- tests/dpif-netdev.at | 165 tests/system-dpdk.at | 197 --- 2 files changed, 165 insertions(+), 197 deletions(-) diff --git a/tests/dpif-netdev.at b/tests/dpif-netdev.at index d0359b5eab..c9474af0ad 100644 --- a/tests/dpif-netdev.at +++ b/tests/dpif-netdev.at @@ -852,3 +852,168 @@ OVS_VSWITCHD_STOP(["dnl /.*failed to put.*$/d /.*failed to flow_del.*$/d"]) AT_CLEANUP + +AT_SETUP([dpif-netdev - MFEX Autovalidator]) +AT_SKIP_IF([! $PYTHON3 -c "import scapy"], [], []) +AT_SKIP_IF([! $PYTHON3 $srcdir/genpkts.py 2000 > packets]) +OVS_VSWITCHD_START( + [add-port br0 p1 \ + -- set Interface p1 type=dummy-pmd], [], [], [--dummy-numa="0,0,0,0,1,1,1,1"]) + +AT_SKIP_IF([! ovs-appctl dpif-netdev/miniflow-parser-get | sed 1,4d | grep "True"], [], [dnl +]) + +AT_CHECK([ovs-appctl dpif-netdev/dpif-impl-set dpif_avx512], [0], [dnl +DPIF implementation set to dpif_avx512. +]) + +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set autovalidator], [0], [dnl +Miniflow extract implementation set to autovalidator. +]) + +cat packets | while read line; do + AT_CHECK([ovs-appctl netdev-dummy/receive p1 $line], [0], [ignore]) +done + +OVS_WAIT_UNTIL([test `ovs-vsctl get interface p1 statistics | grep -oP 'rx_packets=\s*\K\d+'` -ge 16000]) + +OVS_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([dpif-netdev - MFEX Autovalidator Fuzzy]) +AT_SKIP_IF([! $PYTHON3 -c "import scapy"], [], []) +AT_SKIP_IF([! $PYTHON3 $srcdir/genpkts.py 2000 fuzzy > packets]) +OVS_VSWITCHD_START( + [add-port br0 p1 \ + -- set Interface p1 type=dummy-pmd], [], [], [--dummy-numa="0,0,0,0,1,1,1,1"]) + +AT_SKIP_IF([! ovs-appctl dpif-netdev/miniflow-parser-get | sed 1,4d | grep "True"], [], [dnl +]) + +AT_CHECK([ovs-appctl dpif-netdev/dpif-impl-set dpif_avx512], [0], [dnl +DPIF implementation set to dpif_avx512. +]) + +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set autovalidator], [0], [dnl +Miniflow extract implementation set to autovalidator. +]) + +cat packets | while read line; do + AT_CHECK([ovs-appctl netdev-dummy/receive p1 $line], [0], [ignore]) +done + +OVS_WAIT_UNTIL([test `ovs-vsctl get interface p1 statistics | grep -oP 'rx_packets=\s*\K\d+'` -ge 16000]) + +OVS_VSWITCHD_STOP(["dnl +/upcall: datapath reached the dynamic limit of .* flows./d"]) +AT_CLEANUP + +AT_SETUP([dpif-netdev - MFEX Configuration]) +OVS_VSWITCHD_START( + [set Open_vSwitch . other_config:pmd-cpu-mask=0x1 \ + -- add-port br0 p1 \ + -- set Interface p1 type=dummy-pmd], [], [], [--dummy-numa="0,0,0,0,1,1,1,1"]) + +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set scalar 1], [2], +[], [dnl +Error: unknown argument 1. +ovs-appctl: ovs-vswitchd: server returned an error +]) + +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set -pmd 6 study 300 xyz], [2], +[], [dnl +Error: invalid study_pkt_cnt value: xyz. +ovs-appctl: ovs-vswitchd: server returned an error +]) + +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set scalar abcd], [2], +[], [dnl +Error: unknown argument abcd. +ovs-appctl: ovs-vswitchd: server returned an error +]) + +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set -pmd 0 scalar abcd], [2], +[], [dnl +Error: unknown argument abcd. +ovs-appctl: ovs-vswitchd: server returned an error +]) + +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set -pmd], [2], +[], [dnl +Error: -pmd option requires a thread id argument. +ovs-appctl: ovs-vswitchd: server returned an error +]) + +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set tudy abcd], [2], +[], [dnl +Error: unknown argument abcd. +ovs-appctl: ovs-vswitchd: server returned an error +]) + +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set -pmd 7 study abcd], [2], +[], [dnl +Error: invalid study_pkt_cnt value: abcd. +ovs-appctl: ovs-vswitchd: server returned an error +]) + +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set -pmd 0 study], [0], [dnl +Miniflow extract implementation set to study, on pmd thread 0, studying 128 packets. +]) + +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set -pmd 0 study 512], [0], [dnl +Miniflow extract implementation set to study, on pmd thread 0, studying 512 packets. +]) + +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set study 512], [0], [dnl +Miniflow extract implementation set to study, studying 512 packets. +]) + +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set study], [0], [dnl +Miniflow extract implementation set to study, studying 128 packets. +]) + +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set -pmd 0 aut
[ovs-dev] [PATCH v5 1/2] system-dpdk: Use dummy-pmd port for packet injection.
net_pcap is not always available in DPDK (like, in a dev environment when you forgot to install the libpcap-devel). On the other hand, OVS already has its own way to inject packets into a bridge. Let's make use of it. The generating script outputs a bulk of 8 packets per line (to save some cpu spent calling ovs-appctl). Suggested-by: Ilya Maximets Reviewed-by: Maxime Coquelin Acked-by: Eelco Chaudron Signed-off-by: David Marchand --- Changes since v4: - rebased, - updated genpkts.py so it outputs 8 packets per line and have netdev-dummy/receive called for 8 packets at a time (this little optimisation seems enough to avoid hitting OVS_CTL_TIMEOUT), - stopped calling genpkts.py asynchronuously, Changes since v3: - dropped documentation update following rebase, - fixed regression in fuzzy packets generation, Changes since v2: - updated documentation, - cleaned tests/automake.mk, - fixed shebang in python script, - added missing check for scapy availability, Changes since v1: - renamed generator script, - decreased packet count for fuzzy test, - simplified wait expression for packet count, --- tests/automake.mk | 6 +--- tests/{mfex_fuzzy.py => genpkts.py} | 56 ++--- tests/system-dpdk.at| 24 - 3 files changed, 43 insertions(+), 43 deletions(-) rename tests/{mfex_fuzzy.py => genpkts.py} (66%) diff --git a/tests/automake.mk b/tests/automake.mk index f8a925012d..2ae0aeecaf 100644 --- a/tests/automake.mk +++ b/tests/automake.mk @@ -146,10 +146,6 @@ $(srcdir)/tests/fuzz-regression-list.at: tests/automake.mk echo "TEST_FUZZ_REGRESSION([$$basename])"; \ done > $@.tmp && mv $@.tmp $@ -EXTRA_DIST += $(MFEX_AUTOVALIDATOR_TESTS) -MFEX_AUTOVALIDATOR_TESTS = \ - tests/mfex_fuzzy.py - OVSDB_CLUSTER_TESTSUITE_AT = \ tests/ovsdb-cluster-testsuite.at \ tests/ovsdb-execution.at \ @@ -522,7 +518,7 @@ tests_test_type_props_SOURCES = tests/test-type-props.c CHECK_PYFILES = \ tests/appctl.py \ tests/flowgen.py \ - tests/mfex_fuzzy.py \ + tests/genpkts.py \ tests/ovsdb-monitor-sort.py \ tests/test-daemon.py \ tests/test-dpparse.py \ diff --git a/tests/mfex_fuzzy.py b/tests/genpkts.py similarity index 66% rename from tests/mfex_fuzzy.py rename to tests/genpkts.py index 50b9870641..3354e116d0 100755 --- a/tests/mfex_fuzzy.py +++ b/tests/genpkts.py @@ -17,51 +17,44 @@ except ModuleNotFoundError: from scapy.all import RandMAC, RandIP, PcapWriter, RandIP6, RandShort, fuzz from scapy.all import IPv6, Dot1Q, IP, Ether, UDP, TCP, random -# Path for the pcap file location. -path = str(sys.argv[1]) # The number of packets generated will be size * 8. -size = int(sys.argv[2]) +size = int(sys.argv[1]) # Traffic option is used to choose between fuzzy or simple packet type. -if len(sys.argv) > 3: -traffic_opt = str(sys.argv[3]) +if len(sys.argv) > 2: +traffic_opt = str(sys.argv[2]) else: traffic_opt = "" -pktdump = PcapWriter(path, append=False, sync=True) - -pkt = [] - for i in range(0, size): +pkt = [] + if traffic_opt == "fuzzy": eth = Ether(src=RandMAC(), dst=RandMAC()) vlan = Dot1Q() -udp = UDP(dport=RandShort(), sport=RandShort()) ipv4 = IP(src=RandIP(), dst=RandIP(), len=random.randint(0, 100)) ipv6 = IPv6(src=RandIP6(), dst=RandIP6(), plen=random.randint(0, 100)) +udp = UDP(dport=RandShort(), sport=RandShort()) tcp = TCP(dport=RandShort(), sport=RandShort(), flags='S', dataofs=random.randint(0, 15)) # IPv4 packets with fuzzing -pkt.append(fuzz(eth / ipv4 / udp)) -pkt.append(fuzz(eth / ipv4 / tcp)) -pkt.append(fuzz(eth / vlan / ipv4 / udp)) -pkt.append(fuzz(eth / vlan / ipv4 / tcp)) +pkt.append(fuzz(eth / ipv4 / udp).build().hex()) +pkt.append(fuzz(eth / ipv4 / tcp).build().hex()) +pkt.append(fuzz(eth / vlan / ipv4 / udp).build().hex()) +pkt.append(fuzz(eth / vlan / ipv4 / tcp).build().hex()) # IPv6 packets with fuzzing -pkt.append(fuzz(eth / ipv6 / udp)) -pkt.append(fuzz(eth / ipv6 / tcp)) -pkt.append(fuzz(eth / vlan / ipv6 / udp)) -pkt.append(fuzz(eth / vlan / ipv6 / tcp)) +pkt.append(fuzz(eth / ipv6 / udp).build().hex()) +pkt.append(fuzz(eth / ipv6 / tcp).build().hex()) +pkt.append(fuzz(eth / vlan / ipv6 / udp).build().hex()) +pkt.append(fuzz(eth / vlan / ipv6 / tcp).build().hex()) else: mac_addr_src = "52:54:00:FF:FF:{:02X}".format(i % 0xff) mac_addr_dst = "80:FF:FF:FF:FF:{:02X}".format(i % 0xff) -src_port = 200 + (i % 20) -dst_port = 1000 + (i % 20) eth = Ether(src=mac_addr_src, dst=mac_addr_dst) vlan = Dot1Q(vlan=(i % 10)) -udp = UDP(dp