On Thu, Mar 28, 2024 at 10:16 AM David Marchand <david.march...@redhat.com> wrote: > > The outer checksum offloading API in DPDK is ambiguous and was > added by Intel folks with the assumption that any outer offloading > always goes with an inner offloading request. > > With net/i40e and net/ice drivers, requesting outer ip checksum with a > tunnel context but no inner offloading request triggers a Tx failure > associated with a port MDD event. > 2024-03-27T16:02:07.084Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR: > MDD event > > To avoid this situation, if no checksum or segmentation offloading is > requested on the inner part of a packet, fallback to "normal" (non outer) > offloading request. > And outer offloading can be re-enabled for net/i40e and netice. > > Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.") > Signed-off-by: David Marchand <david.march...@redhat.com> > --- > Changes since v1: > - reset inner marks before converting outer requests, > - fixed some coding style, > > --- > lib/netdev-dpdk.c | 83 ++++++++++++++++++++++++----------------------- > 1 file changed, 43 insertions(+), 40 deletions(-) > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > index 2111f77681..ae43594a3d 100644 > --- a/lib/netdev-dpdk.c > +++ b/lib/netdev-dpdk.c > @@ -1354,18 +1354,6 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) > info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_TCP_CKSUM; > } > > - if (!strcmp(info.driver_name, "net_ice") > - || !strcmp(info.driver_name, "net_i40e")) { > - /* FIXME: Driver advertises the capability but doesn't seem > - * to actually support it correctly. Can remove this once > - * the driver is fixed on DPDK side. */ > - VLOG_INFO("%s: disabled Tx outer udp checksum offloads for a " > - "net/ice or net/i40e port.", netdev_get_name(&dev->up)); > - info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM; > - info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO; > - info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO; > - } > -
A few comments after spending some time on the topic. - This patch fixes some misusage of the DPDK API. Basically, resolving a neighbor with ARP inside tunnels is broken on Intel nics (even without re-enabling outer udp checksum). This can be observed with the following debug patch at the end of netdev_dpdk_prep_hwol_packet(): + char buf[256]; + if (rte_get_tx_ol_flag_list(mbuf->ol_flags, buf, sizeof(buf)) != 0) + buf[0] = '\0'; + VLOG_WARN("len=%u, ol_flags=%s, outer_l2_len=%u, outer_l3_len=%u, l2_len=%u, l3_len=%u, l4_len=%u, tso_segsz=%u", mbuf->pkt_len, buf, mbuf->outer_l2_len, mbuf->outer_l3_len, mbuf->l2_len, mbuf->l3_len, mbuf->l4_len, mbuf->tso_segsz); Then doing a "arping" inside the tunnel triggers: 2024-04-03T16:05:40.920Z|00014|netdev_dpdk(pmd-c03/id:8)|WARN|len=96, ol_flags=RTE_MBUF_F_TX_L4_NO_CKSUM RTE_MBUF_F_TX_OUTER_IP_CKSUM RTE_MBUF_F_TX_OUTER_IPV4 RTE_MBUF_F_TX_TUNNEL_VXLAN , outer_l2_len=18, outer_l3_len=20, l2_len=0, l3_len=0, l4_len=0, tso_segsz=0 2024-04-03T16:05:40.920Z|00012|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event We need this fix in OVS regardless of the outer udp checksum issue. I'll respin this fix in a new series, without touching UDP checksum capa. - It does seem that X710 nics have no support for outer udp checksum (according to its datasheet). Some X722 version may have support for it, but net/i40e does not configure the Tx descriptor accordingly. On the other hand, E810 ones seem fine (according to its datasheet). After more debugging, I managed to get outer udp checksum working. I understand the DPDK rte_net_intel_cksum_flags_prepare() helper does not set the pseudo header checksum in the outer udp header. I proposed a fix in the dpdk bz. Waiting for the fix on DPDK side... it is still possible to add the missing bits in OVS (see the branch I pointed at in the OVS issue). - About the workaround (disabling outer udp checksum for net/ice and net/i40e), the net/iavf is subject to the same bugs. So we should disable outer udp checksum too for this driver. However, I am not sure the iavf driver (can?) differentiates which PF / hw is used underneath. So we may have no solution but to always disable this type of offloading in OVS for net/iavf. -- David Marchand _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev