On 4/3/24 19:40, David Marchand wrote:
> On Thu, Mar 28, 2024 at 10:16 AM David Marchand
> <david.march...@redhat.com> wrote:
>>
>> The outer checksum offloading API in DPDK is ambiguous and was
>> added by Intel folks with the assumption that any outer offloading
>> always goes with an inner offloading request.
>>
>> With net/i40e and net/ice drivers, requesting outer ip checksum with a
>> tunnel context but no inner offloading request triggers a Tx failure
>> associated with a port MDD event.
>> 2024-03-27T16:02:07.084Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR:
>>         MDD event
>>
>> To avoid this situation, if no checksum or segmentation offloading is
>> requested on the inner part of a packet, fallback to "normal" (non outer)
>> offloading request.
>> And outer offloading can be re-enabled for net/i40e and netice.
>>
>> Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
>> Signed-off-by: David Marchand <david.march...@redhat.com>
>> ---
>> Changes since v1:
>> - reset inner marks before converting outer requests,
>> - fixed some coding style,
>>
>> ---
>>  lib/netdev-dpdk.c | 83 ++++++++++++++++++++++++-----------------------
>>  1 file changed, 43 insertions(+), 40 deletions(-)
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index 2111f77681..ae43594a3d 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -1354,18 +1354,6 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
>>          info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_TCP_CKSUM;
>>      }
>>
>> -    if (!strcmp(info.driver_name, "net_ice")
>> -        || !strcmp(info.driver_name, "net_i40e")) {
>> -        /* FIXME: Driver advertises the capability but doesn't seem
>> -         * to actually support it correctly.  Can remove this once
>> -         * the driver is fixed on DPDK side. */
>> -        VLOG_INFO("%s: disabled Tx outer udp checksum offloads for a "
>> -                  "net/ice or net/i40e port.", netdev_get_name(&dev->up));
>> -        info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM;
>> -        info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO;
>> -        info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO;
>> -    }
>> -
> 
> A few comments after spending some time on the topic.

Hi, David.  Thanks for working on the issue.  I appreciate this.

> - This patch fixes some misusage of the DPDK API.

Hmm, I understand that the driver does something funny when it gets
outer flags set without any inner flags, but how is that a misuse
of the DPDK API?  Could you point me to the API docs that say that
inner flags must always be set in this case or that setting only
outer offloads is not allowed?

I agree that it seems safer to just downgrade all outer flags to
inner ones on OVS side, when no inner offloads are requested, I'm
just not sure I agree that it's an API misuse.  Especially since
non-Intel cards seem to work fine.

> Basically, resolving a neighbor with ARP inside tunnels is broken on
> Intel nics (even without re-enabling outer udp checksum).
> This can be observed with the following debug patch at the end of
> netdev_dpdk_prep_hwol_packet():
> 
> +    char buf[256];
> +    if (rte_get_tx_ol_flag_list(mbuf->ol_flags, buf, sizeof(buf)) != 0)
> +        buf[0] = '\0';
> +    VLOG_WARN("len=%u, ol_flags=%s, outer_l2_len=%u, outer_l3_len=%u,
> l2_len=%u, l3_len=%u, l4_len=%u, tso_segsz=%u", mbuf->pkt_len, buf,
> mbuf->outer_l2_len, mbuf->outer_l3_len, mbuf->l2_len, mbuf->l3_len,
> mbuf->l4_len, mbuf->tso_segsz);
> 
> Then doing a "arping" inside the tunnel triggers:
> 2024-04-03T16:05:40.920Z|00014|netdev_dpdk(pmd-c03/id:8)|WARN|len=96,
> ol_flags=RTE_MBUF_F_TX_L4_NO_CKSUM RTE_MBUF_F_TX_OUTER_IP_CKSUM
> RTE_MBUF_F_TX_OUTER_IPV4 RTE_MBUF_F_TX_TUNNEL_VXLAN , outer_l2_len=18,
> outer_l3_len=20, l2_len=0, l3_len=0, l4_len=0, tso_segsz=0
> 2024-04-03T16:05:40.920Z|00012|dpdk|WARN|ice_interrupt_handler():
> OICR: MDD event
> 
> We need this fix in OVS regardless of the outer udp checksum issue.
> I'll respin this fix in a new series, without touching UDP checksum capa.
> 
> 
> - It does seem that X710 nics have no support for outer udp checksum
> (according to its datasheet). Some X722 version may have support for
> it, but net/i40e does not configure the Tx descriptor accordingly.
> On the other hand, E810 ones seem fine (according to its datasheet).
> 
> After more debugging, I managed to get outer udp checksum working.
> I understand the DPDK rte_net_intel_cksum_flags_prepare() helper does
> not set the pseudo header checksum in the outer udp header.
> I proposed a fix in the dpdk bz.
> 
> Waiting for the fix on DPDK side... it is still possible to add the
> missing bits in OVS (see the branch I pointed at in the OVS issue).

Since this feature never worked with ice in OVS and it is experimental,
I tend to think that we should just disable it for ice as well until
DPDK is fixed.

A little too many fixes for that thing we have already and this one will
involve some extra driver-specific logic that we don't have any automated
tests for.

> 
> 
> - About the workaround (disabling outer udp checksum for net/ice and
> net/i40e), the net/iavf is subject to the same bugs.
> So we should disable outer udp checksum too for this driver.
> 
> However, I am not sure the iavf driver (can?) differentiates which PF
> / hw is used underneath.
> So we may have no solution but to always disable this type of
> offloading in OVS for net/iavf.

IAVF should just not advertise support if the underlying hardware
doesn't support it.  There should be a way for a driver to know the
exact card it runs on.  But we can add it to the list of broken
DPDK drivers in OVS until it's fixed, sure.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to