Hello, On Mon, Jun 2, 2025 at 3:23 PM Ilya Maximets <[email protected]> wrote: > > On 5/22/25 1:13 PM, Ilya Maximets wrote: > > On 3/13/25 1:43 PM, David Marchand wrote: > >> Following a bug report about vhost-user port traffic with a vxlan tunnel > >> being dropped, I relooked at the offloading code to see how we could > >> switch to the non legacy offload flags mode in the vhost-user libary. > >> > >> As I dug into this, and with Ilya recent comment (on switching to an > >> inner offload API) in mind, I wrote this series. > >> > >> The result seems promising in terms of code. > >> I did no extensive performance testing yet, but I would expect either > >> similar performance (or maybe better performance for non offloaded > >> traffic, as a number of branches are removed in the flow extraction > >> code). > >> > >> > >> Mike recently posted a patch for removing the csum_start/csum_offset > >> netdev-linux for which I also had a patch, but since it had some > >> implications on the handling of inner checksums, I ended up including my > >> version for now. > >> > >> > >> There is still one extra step that is left for a future revision: > >> eliminate last if (tunnel) in miniflow_extract(). > >> > >> > >> Comments? > > > > Thanks, David. > > > > I think, this is a right approach for moving forward, should be > > much easier to understand and maintain in the long term. > > > > For now I applied patches 1 though 6 and the patch 8. I made a > > few small adjustments listed below as per review comments from > > Mike and my own while looking through the set. > > Since there are mainly bug fixes and test improvements, I also > > backported them down to 3.3. Having all these tests on LTS will > > also allow us to be slightly more confident when backporting > > checksum offload fixes in the future. > > > > Patch 7 is also a bug fix and I agree that we should be handling > > offload flags properly in the simple match case, I just want to > > do a few more additional tests manually, including a performance > > check, before applying this one. > > > > List of adjustments: > > Patch 3: Re-worded a commit message. > > Patch 4: Added coverage counter checks in the test as they were > > missing in a couple 'Flag as Rx good.' cases. > > Patch 6: Renamed coverage counters removing the 'netdev_' part. > > This module is generally named 'native_tnl' in the logs, > > we should be using the same shorter name for the counters. > > > > Will follow up on reviews for the rest of the set next week. > > I replied to the patch 9 conversation and I don't have major comments for > other patches in the set, so it may be a good idea to respin and continue > from there. > > I did some performance testing with a basic v2v setup and it's a bit of a > roller coaster throughout the set. virtio-user ports in this setup do not > have any offloads enabled and a pure v2v performance first goes noticeably > higher after patches 9 and 10, but then it goes down and ends up lower than > the original by about 1-2% at the end of the set.
I see something similar: - origin/main, 5223225 - patch 9, 5230925 - patch 10, 5234560 - patch 11, 5302217 - patch 12, 5162640 - patch 13, 5106118 - patch 14, 5170704 - patch 15, 5163440 - patch 16, 5151656 - patch 17, 5076095 - patch 18, 5112686 - patch 19, 5260415 > > I can get back most of the performance gains seen at patch 11 by moving the > following part to the case where L4 checksums are enabled: > > /* There is no support in virtio net to offload IPv4 csum, > * but the vhost library handles IPv4 csum offloading fine. */ > dev->hw_ol_features |= NETDEV_TX_IPV4_CKSUM_OFFLOAD; > > And then wrapping the netdev_dpdk_prep_hwol_batch() call with the > if (!netdev->ol_flags) check. I even see better number with this change: - no ipv4, 5363635 > > I'm not sure if having ip checksum offload has a lot of sense when L4 > checksums are not supported. Now that we have a good software fallback, it seems we don't need to fake support from OVS pov. [snipped the hack] > It also improved performance quite a bit. > I guess, there might be an optimization of accumulating offload flags > per batch and then avoid calling the offload preparation function per > packet if none required. I am a bit skeptical. Accumulating and just checking the value as a whole would still get unneeded calls to the prepare helper for "resolved" packets (non-partial and non-tso packets). Maybe I did not understand the suggestion. For now, I'll send a n+1 with the no ip csum suggestion. Let's continue in the new thread. -- David Marchand _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
