Hello,

On Mon, Jun 2, 2025 at 3:23 PM Ilya Maximets <[email protected]> wrote:
>
> On 5/22/25 1:13 PM, Ilya Maximets wrote:
> > On 3/13/25 1:43 PM, David Marchand wrote:
> >> Following a bug report about vhost-user port traffic with a vxlan tunnel
> >> being dropped, I relooked at the offloading code to see how we could
> >> switch to the non legacy offload flags mode in the vhost-user libary.
> >>
> >> As I dug into this, and with Ilya recent comment (on switching to an
> >> inner offload API) in mind, I wrote this series.
> >>
> >> The result seems promising in terms of code.
> >> I did no extensive performance testing yet, but I would expect either
> >> similar performance (or maybe better performance for non offloaded
> >> traffic, as a number of branches are removed in the flow extraction
> >> code).
> >>
> >>
> >> Mike recently posted a patch for removing the csum_start/csum_offset
> >> netdev-linux for which I also had a patch, but since it had some
> >> implications on the handling of inner checksums, I ended up including my
> >> version for now.
> >>
> >>
> >> There is still one extra step that is left for a future revision:
> >> eliminate last if (tunnel) in miniflow_extract().
> >>
> >>
> >> Comments?
> >
> > Thanks, David.
> >
> > I think, this is a right approach for moving forward, should be
> > much easier to understand and maintain in the long term.
> >
> > For now I applied patches 1 though 6 and the patch 8.  I made a
> > few small adjustments listed below as per review comments from
> > Mike and my own while looking through the set.
> > Since there are mainly bug fixes and test improvements, I also
> > backported them down to 3.3.  Having all these tests on LTS will
> > also allow us to be slightly more confident when backporting
> > checksum offload fixes in the future.
> >
> > Patch 7 is also a bug fix and I agree that we should be handling
> > offload flags properly in the simple match case, I just want to
> > do a few more additional tests manually, including a performance
> > check, before applying this one.
> >
> > List of adjustments:
> > Patch 3: Re-worded a commit message.
> > Patch 4: Added coverage counter checks in the test as they were
> >          missing in a couple 'Flag as Rx good.' cases.
> > Patch 6: Renamed coverage counters removing the 'netdev_' part.
> >          This module is generally named 'native_tnl' in the logs,
> >          we should be using the same shorter name for the counters.
> >
> > Will follow up on reviews for the rest of the set next week.
>
> I replied to the patch 9 conversation and I don't have major comments for
> other patches in the set, so it may be a good idea to respin and continue
> from there.
>
> I did some performance testing with a basic v2v setup and it's a bit of a
> roller coaster throughout the set.  virtio-user ports in this setup do not
> have any offloads enabled and a pure v2v performance first goes noticeably
> higher after patches 9 and 10, but then it goes down and ends up lower than
> the original by about 1-2% at the end of the set.

I see something similar:
- origin/main, 5223225
- patch 9,     5230925
- patch 10,    5234560
- patch 11,    5302217
- patch 12,    5162640
- patch 13,    5106118
- patch 14,    5170704
- patch 15,    5163440
- patch 16,    5151656
- patch 17,    5076095
- patch 18,    5112686
- patch 19,    5260415


>
> I can get back most of the performance gains seen at patch 11 by moving the
> following part to the case where L4 checksums are enabled:
>
>             /* There is no support in virtio net to offload IPv4 csum,
>              * but the vhost library handles IPv4 csum offloading fine. */
>             dev->hw_ol_features |= NETDEV_TX_IPV4_CKSUM_OFFLOAD;
>
> And then wrapping the netdev_dpdk_prep_hwol_batch() call with the
> if (!netdev->ol_flags) check.

I even see better number with this change:
- no ipv4,     5363635


>
> I'm not sure if having ip checksum offload has a lot of sense when L4
> checksums are not supported.

Now that we have a good software fallback, it seems we don't need to
fake support from OVS pov.

[snipped the hack]

> It also improved performance quite a bit.
> I guess, there might be an optimization of accumulating offload flags
> per batch and then avoid calling the offload preparation function per
> packet if none required.

I am a bit skeptical.
Accumulating and just checking the value as a whole would still get
unneeded calls to the prepare helper for "resolved" packets
(non-partial and non-tso packets).
Maybe I did not understand the suggestion.

For now, I'll send a n+1 with the no ip csum suggestion.
Let's continue in the new thread.


-- 
David Marchand

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to