On 6/23/23 19:33, David Marchand wrote:
> On Fri, Jun 23, 2023 at 5:14 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
>> On 6/23/23 15:16, David Marchand wrote:
>>> On Wed, Jun 21, 2023 at 12:13 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>>>
>>>> On 6/14/23 21:03, Mike Pattrick wrote:
>>>>> The netdev receiving packets is supposed to provide the flags
>>>>> indicating if the L4 checksum was verified and it is OK or BAD,
>>>>> otherwise the stack will check when appropriate by software.
>>>>>
>>>>> If the packet comes with good checksum, then postpone the
>>>>> checksum calculation to the egress device if needed.
>>>>>
>>>>> When encapsulate a packet with that flag, set the checksum
>>>>> of the inner L4 header since that is not yet supported.
>>>>>
>>>>> Calculate the L4 checksum when the packet is going to be sent
>>>>> over a device that doesn't support the feature.
>>>>>
>>>>> Linux tap devices allows enabling L3 and L4 offload, so this
>>>>> patch enables the feature. However, Linux socket interface
>>>>> remains disabled because the API doesn't allow enabling
>>>>> those two features without enabling TSO too.
>>>>>
>>>>> Signed-off-by: Flavio Leitner <f...@sysclose.org>
>>>>> Co-authored-by: Flavio Leitner <f...@sysclose.org>
>>>>> Signed-off-by: Mike Pattrick <m...@redhat.com>
>>>>> ---
>>>>>  Since v9:
>>>>>   - Extended miniflow_extract changes into avx512 code
>>>>>   - Formatting changes
>>>>>   - Note that we cannot currently enable checksum offloading in
>>>>>     CONFIGURE_VETH_OFFLOADS for check-system-userspace as
>>>>>     netdev-linux.c currently only parses the vnet header if TSO
>>>>>     is enabled.
>>>>>  Since v10:
>>>>>   - No change
>>>>>  Since v11:
>>>>>   - Added AVX512 IPv6 checksum offload support.
>>>>>   - Improved error messages and logging.
>>>>>  Since v12:
>>>>>   - Added missing mutex annotations
>>>>>  Since v13:
>>>>>   - Added TUNGETFEATURES check in netdev-linux
>>>>>  Since v14:
>>>>>   - Only check TUNGETFEATURES once
>>>>>   - Respect FLOW_TNL_F_CSUM flag
>>>>> ---
>>>>>  lib/conntrack.c                  |  15 +-
>>>>>  lib/dp-packet.c                  |  25 +++
>>>>>  lib/dp-packet.h                  |  78 +++++++++-
>>>>>  lib/dpif-netdev-extract-avx512.c |  62 +++++++-
>>>>>  lib/flow.c                       |  23 +++
>>>>>  lib/netdev-dpdk.c                | 172 +++++++++++++++------
>>>>>  lib/netdev-linux.c               | 258 ++++++++++++++++++++++---------
>>>>>  lib/netdev-native-tnl.c          |  32 +---
>>>>>  lib/netdev.c                     |  46 ++----
>>>>>  lib/odp-execute-avx512.c         |  88 +++++++----
>>>>>  lib/packets.c                    | 175 ++++++++++++++++-----
>>>>>  lib/packets.h                    |   3 +
>>>>>  12 files changed, 710 insertions(+), 267 deletions(-)
>>>>
>>>> <snip>
>>>>
>>>>> @@ -5443,22 +5513,22 @@ netdev_dpdk_vhost_client_reconfigure(struct 
>>>>> netdev *netdev)
>>>>>          }
>>>>>
>>>>>          if (userspace_tso_enabled()) {
>>>>> -            netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO;
>>>>> -            netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_CKSUM;
>>>>> -            netdev->ol_flags |= NETDEV_TX_OFFLOAD_UDP_CKSUM;
>>>>> -            netdev->ol_flags |= NETDEV_TX_OFFLOAD_SCTP_CKSUM;
>>>>> -            netdev->ol_flags |= NETDEV_TX_OFFLOAD_IPV4_CKSUM;
>>>>> -            vhost_unsup_flags = 1ULL << VIRTIO_NET_F_HOST_ECN
>>>>> -                                | 1ULL << VIRTIO_NET_F_HOST_UFO;
>>>>> +            virtio_unsup_features = 1ULL << VIRTIO_NET_F_HOST_ECN
>>>>> +                                    | 1ULL << VIRTIO_NET_F_HOST_UFO;
>>>>> +            VLOG_DBG("%s: TSO enabled on vhost port",
>>>>> +                     netdev_get_name(&dev->up));
>>>>>          } else {
>>>>> -            /* This disables checksum offloading and all the features
>>>>> -             * that depends on it (TSO, UFO, ECN) according to virtio
>>>>> -             * specification. */
>>>>> -            vhost_unsup_flags = 1ULL << VIRTIO_NET_F_CSUM;
>>>>> +            /* Advertise checksum offloading to the guest, but explicitly
>>>>> +             * disable TSO and friends.
>>>>> +             * NOTE: we can't disable HOST_ECN which may have been 
>>>>> wrongly
>>>>> +             * negotiated by a running guest. */
>>>>> +            virtio_unsup_features = 1ULL << VIRTIO_NET_F_HOST_TSO4
>>>>> +                                    | 1ULL << VIRTIO_NET_F_HOST_TSO6
>>>>> +                                    | 1ULL << VIRTIO_NET_F_HOST_UFO;
>>>>
>>>> Hold on a second... Why exactly we can disable UFO, but can't disable ECN ?
>>>>
>>>> Previously, this branch of code was disabling VIRTIO_NET_F_CSUM, so neither
>>>> ECN or UFO should be negotiated, right?
>>>>
>>>> Or is it possible to have ECN negotiated without VIRTIO_NET_F_HOST_CSUM 
>>>> enabled?
>>>> In that case, why the same is not true for UFO as well?
>>>>
>>>> What do I miss here?
>>>
>>> Sorry, long mail in hope I am not missing anything:
>>>
>>>
>>> Pasting the v1.2 virtio spec
>>> https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#x1-2210001:
>>> VIRTIO_NET_F_HOST_TSO4 Requires VIRTIO_NET_F_CSUM.
>>> VIRTIO_NET_F_HOST_TSO6 Requires VIRTIO_NET_F_CSUM.
>>> VIRTIO_NET_F_HOST_ECN Requires VIRTIO_NET_F_HOST_TSO4 or 
>>> VIRTIO_NET_F_HOST_TSO6.
>>> VIRTIO_NET_F_HOST_UFO Requires VIRTIO_NET_F_CSUM.
>>>
>>> So I agree, one would expect that none of the HOST_TSO*/ECN/UFO
>>> features are successfully negotiated in the absence of
>>> VIRTIO_NET_F_CSUM.
>>>
>>> But, in practice for some released OVS versions, ECN does get
>>> negotiated without VIRTIO_NET_F_CSUM.
>>
>> Yeah, that is no good.
>>
>>> It is a bug in vhost-user/ovs that do not invalidate ECN.
>>
>> It might have been better if vhost-user library didn't enable any
>> features that require support on the application side, unless
>> application asked for them.  Currently the application needs to
>> know all the features that the library "supports" and disable them.
>>
>> Though I'm not sure if that can actually be changed at this point.
> 
> Maxime and Chenbo (vhost-user maintainers) on dpdk side will have to
> make sure such new features are opt-in, not opt-out.
> But for existing ones, that's too late: the issue were are looking at
> is a good example.
> 
> 
>>
>>>
>>> I started ovs with tso disabled (before the recent changes), then
>>> looking at a vm first boot up:
>>> # git ll | head -1
>>> 22df63c38 - (HEAD) Documentation: Document netdev offload. (8 days
>>> ago) <Mike Pattrick>
>>>
>>> Qemu requests virtio features supported by vhost-user:
>>> Thread 17 "vhost-events" hit Breakpoint 1, vhost_user_get_features
>>> (pdev=0x7f6263dfc638, msg=0x7f6263dfc640, main_fd=78) at
>>> ../lib/vhost/vhost_user.c:325
>>> 325    {
>>> Missing separate debuginfos, use: yum debuginfo-install
>>> libevent-2.1.8-5.el8.x86_64 libibverbs-44.0-2.el8.1.x86_64
>>> unbound-libs-1.16.2-2.el8.x86_64
>>> (gdb) n
>>> 327        uint64_t features = 0;
>>> (gdb) n
>>> 329        if (validate_msg_fds(msg, 0) != 0)
>>> (gdb) n
>>> 332        rte_vhost_driver_get_features(dev->ifname, &features);
>>> (gdb) n
>>> 334        msg->payload.u64 = features;
>>> (gdb) p features
>>> $1 = 57921677258
>>>
>>> Which translates to:
>>> ./features.sh 57921677258
>>> VIRTIO_NET_F_GUEST_CSUM
>>> VIRTIO_NET_F_MTU
>>> VIRTIO_NET_F_GSO
>>> VIRTIO_NET_F_GUEST_TSO4
>>> VIRTIO_NET_F_GUEST_TSO6
>>> VIRTIO_NET_F_GUEST_ECN
>>> VIRTIO_NET_F_GUEST_UFO
>>> VIRTIO_NET_F_HOST_ECN
>>> VIRTIO_NET_F_MRG_RXBUF
>>> VIRTIO_NET_F_CTRL_VQ
>>> VIRTIO_NET_F_CTRL_RX
>>> VIRTIO_NET_F_GUEST_ANNOUNCE
>>> VIRTIO_NET_F_MQ
>>> Unknown feature bit: 26
>>> VIRTIO_F_ANY_LAYOUT
>>> VIRTIO_RING_F_INDIRECT_DESC
>>> VIRTIO_RING_F_EVENT_IDX
>>> ! VHOST_USER_F_PROTOCOL_FEATURES
>>> VIRTIO_F_VERSION_1
>>> VIRTIO_F_RING_PACKED
>>> VIRTIO_F_IN_ORDER
>>>
>>> ECN is incorrectly exposed.
>>> qemu could filter this incorrect feature, but unfortunately, later:
>>
>> Do you know which check in QEMU code triggers a disconnection?
> 
> I did not check in detail, but the log I get is:
> 2023-06-23T11:43:31.475416Z qemu-kvm: failed to init vhost_net for queue 0
> vhost lacks feature mask 6145 for backend
> 
> Which is likely from:
> https://gitlab.com/qemu-project/qemu/-/blob/master/hw/net/vhost_net.c#L223

Thanks for the pointer.

> 
> qemu saved acked features, and expects them to be available on reconnect.
> 
> 
>>> As a conclusion, we may have to handle both incorrect setting of UFO
>>> and ECN :-(.
>>
>> But we can't really do that from the OVS side, can we?
>> DPDK will clear the bit even if we enable it...
> 
> Hum, I did not test it yet, but I was thinking of calling
> rte_vhost_driver_enable_features.
> Now that I look at the code, I think it works, since the clearing
> happens in rte_vhost_driver_register().

Yeah, it should work.
I think, we should just always disable every F_HOST_* and F_CSUM right
after rte_vhost_driver_register(), and then enable what we need. Might
be easier to follow the code this way.  It's hard to always think in
negatives.

> 
> 
> Another thought.. so far, CSUM was serving as a gate for HOST_UFO,
> HOST_TSO etc...
> 
> Leaving ECN exposed without TSO is dirty, but the specification states
> that ECN depends on TSO features, so OVS can "hide" behind this
> assumption and expect no application sends ECN offloading requests.
> 
> On the other hand, if OVS leaves UFO exposed and CSUM is now the
> default, an application may (rightfully?) expect UFO works...
> And I may have to support a two stage retry for my "recovery patch"
> (not sure I am explaining clearly enough..).

Yeah, we can't really enable checksum offload if UFO is negotiated.
The checksum is not negotiated in these old VMs, but IIUC, if the
guest driver will be re-initialized, it may pick up new features.

Currently possible configurations:

1. CSUM + TSO + ___ + ___   OVS 3.1   with userspace-tso=true
2. ____ + TSO + ECN + ___   OVS 3.1   with userspace-tso=false
3. ____ + ___ + ECN + UFO   OVS 2.11

The "___" means not negotiated feature.
Only in case 1 we can actually receive packets with offloading
configured, because it's the only case where a base prerequisite
CSUM is negotiated.

Is that correct or am I missing some case?

Configurations that we can allow (we can allow any subset of them):

a. CSUM + TSO + ___ + ___
b. CSUM + ___ + ECN + ___
c. ____ + TSO + ECN + UFO

So, we can only safely upgrade/migrate:

1 --> a
2 --> c
3 --> c

"Safe" configurations:

a. CSUM + TSO + ___ + ___
c. ____ + TSO + ECN + UFO

Configurations we actually want:

x. CSUM + TSO + ___ + ___
y. CSUM + ___ + ___ + ___


Logic can be:

1. Try 'a/x' if userspace-tso=true
2. Try 'y'   if userspace-tso=false
3. Try 'c'   if above failed

As far as I understand we can advertise new features, but we can't remove
already acked ones.  So, the configuration 'c' is a catch-all for broken
cases.  'y' will only be possible for new/restarted VMs.

Does that make sense?

best regards, Ilya Maximets.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to