On 2/10/25 14:30, Ilya Maximets wrote:
> On 2/7/25 06:46, Mike Pattrick wrote:
>> On Thu, Feb 6, 2025 at 9:15 AM David Marchand <[email protected]> wrote:
>>>
>>> Hello,
>>>
>>> On Wed, Feb 5, 2025 at 1:55 PM Ilya Maximets <[email protected]> wrote:
>>>>
>>>> On 1/23/25 16:56, David Marchand wrote:
>>>>> Rather than drop all pending Tx offloads on recirculation,
>>>>> preserve inner offloads (and mark the packet with outer Tx offloads)
>>>>> when parsing the packet again.
>>>>>
>>>>> Fixes: c6538b443984 ("dpif-netdev: Fix crash due to tunnel offloading on recirculation.")
>>>>> Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
>>>>> Signed-off-by: David Marchand <[email protected]>
>>>>> ---
>>>>> Changes since v1:
>>>>> - rebased,
>>>>> - dropped API change on miniflow_extract(), rely on tunnel offloading
>>>>>   flag presence instead,
>>>>> - introduced dp_packet_reset_outer_offsets,
>>>>>
>>>>> ---
>>>>>  lib/dp-packet.h   | 23 +++++++++++------------
>>>>>  lib/dpif-netdev.c | 27 ---------------------------
>>>>>  lib/flow.c        | 34 ++++++++++++++++++++++++++++------
>>>>>  3 files changed, 39 insertions(+), 45 deletions(-)
>>>>
>>>> Hi, David.  Thanks for the patch!
>>>>
>>>> Did you run some performance tests with this change?  It touches the very
>>>> core of packet parsing, so we need to check how that impacts normal V2V or
>>>> PVP scenarios even without tunneling.
>>>
>>> I would be surprised if those added branches add much to the already good
>>> number of branches in miniflow_extract.
>>> Though I can understand a concern about decreased performance.
>>>
>>> I did a "simple" test with testpmd as a tgen and simple port0 -> port1
>>> and port1 -> port0 OpenFlow rules.
>>> 1 pmd thread per port on an isolated cpu, no thread sibling.
>>>
>>> I used the current main branch:
>>> 481bc0979 - (HEAD, origin/main, origin/HEAD) route-table: Allow
>>> parsing routes without nexthop. (7 days ago) <Martin Kalcok>
>>>
>>> Unexpectedly, I see a slight improvement (I repeated builds,
>>> configuration and tests a few times).
>>
>> Hello David,
>>
>> I also did a few performance tests.  In all tests below I generated
>> traffic in a VM with iperf3, transited a netdev datapath OVS, and
>> egressed through an i40e network card.  All tests were repeated 10
>> times and I restarted OVS in between some tests.
>>
>> First I tested TSO plus tunnel encapsulation with a VXLAN tunnel.
>>
>> Without patch:
>> Mean: 6.09 Gbps
>> Stdev: 0.098
>>
>> With patch:
>> Mean: 6.20 Gbps
>> Stdev: 0.097
>>
>> From this it's clear that in the tunnel + TSO case there is a noticeable
>> improvement!
>>
>> Next I tested just a straight path from the VM, through OVS, to the NIC.
>>
>> Without patch:
>> Mean: 16.81 Gbps
>> Stdev: 0.86
>>
>> With patch:
>> Mean: 17.68 Gbps
>> Stdev: 0.91
>>
>> Again we see the small but paradoxical performance improvement with
>> the patch.  There weren't a lot of samples overall, but I ran a t-test
>> and found a p value of 0.045, suggesting significance.
>>
>> Cheers,
>> M
>>
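As a quick sanity check on the significance claim above, the reported means and standard deviations are enough to recompute a two-sample (Welch-style) t statistic. The sketch below is a standalone illustration only, not OVS code; it assumes 10 samples per series, as stated above, and that a plain two-sample t-test was used:

---
/* Standalone illustration (not OVS code): t statistic recomputed from the
 * summary numbers quoted above for the VM -> OVS -> NIC case. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double mean_before = 16.81, stdev_before = 0.86;  /* Without patch, Gbps. */
    double mean_after  = 17.68, stdev_after  = 0.91;  /* With patch, Gbps. */
    int n = 10;                                       /* Runs per series. */

    /* Standard error of the difference of means, then the t statistic. */
    double se = sqrt(stdev_before * stdev_before / n
                     + stdev_after * stdev_after / n);
    double t = (mean_after - mean_before) / se;

    printf("t = %.2f\n", t);   /* ~2.2 */
    return 0;
}
---

With t around 2.2 and roughly 18 degrees of freedom, a two-sided t-table lookup gives a p value of about 0.04, which is in line with the 0.045 quoted above.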
>>>
>>> - testpmd (txonly) mlx5 -> mlx5    OVS mlx5 <-> mlx5    testpmd (mac)
>>> * Before patch:
>>> flow-dump from pmd on cpu core: 6
>>> ufid:5ba3b6ab-7595-4904-aeb3-410ec10f0f84,
>>> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(dpdk1),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=04:3f:72:b2:c0:91/00:00:00:00:00:00,dst=04:3f:72:b2:c0:90/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
>>> packets:100320113, bytes:6420487232, used:0.000s, dp:ovs,
>>> actions:dpdk0, dp-extra-info:miniflow_bits(4,1)
>>> flow-dump from pmd on cpu core: 4
>>> ufid:3627c676-e0f9-4293-b86b-6824c35f9a6c,
>>> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(dpdk0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=04:3f:72:b2:c0:90/00:00:00:00:00:00,dst=02:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
>>> packets:106807423, bytes:6835675072, used:0.000s, dp:ovs,
>>> actions:dpdk1, dp-extra-info:miniflow_bits(4,1)
>>>
>>> Rx-pps: 11367442    Rx-bps: 5820130688
>>> Tx-pps: 11367439    Tx-bps: 5820128800
>>>
>>> * After patch:
>>> flow-dump from pmd on cpu core: 6
>>> ufid:41a51bc1-f6cb-4810-8372-4a9254a1db52,
>>> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(dpdk1),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=04:3f:72:b2:c0:91/00:00:00:00:00:00,dst=04:3f:72:b2:c0:90/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
>>> packets:32408002, bytes:2074112128, used:0.000s, dp:ovs,
>>> actions:dpdk0, dp-extra-info:miniflow_bits(4,1)
>>> flow-dump from pmd on cpu core: 4
>>> ufid:115e4654-1e01-467b-9360-de75eb1e872b,
>>> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(dpdk0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=04:3f:72:b2:c0:90/00:00:00:00:00:00,dst=02:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
>>> packets:37689559, bytes:2412131776, used:0.000s, dp:ovs,
>>> actions:dpdk1, dp-extra-info:miniflow_bits(4,1)
>>>
>>> Rx-pps: 12084135    Rx-bps: 6187077192
>>> Tx-pps: 12084135    Tx-bps: 6187077192
>>>
>>>
>>> - testpmd (txonly) virtio-user -> vhost-user    OVS vhost-user -> virtio-user    testpmd (mac)
>>> * Before patch:
>>> flow-dump from pmd on cpu core: 6
>>> ufid:79248354-3697-4d2e-9d70-cc4df5602ff9,
>>> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(vhost1),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=00:11:22:33:44:56/00:00:00:00:00:00,dst=00:11:22:33:44:55/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
>>> packets:23402111, bytes:1497735104, used:0.000s, dp:ovs,
>>> actions:vhost0, dp-extra-info:miniflow_bits(4,1)
>>> flow-dump from pmd on cpu core: 4
>>> ufid:ca8974b4-2c7e-49c1-bdc6-5d90638997b6,
>>> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(vhost0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=00:11:22:33:44:55/00:00:00:00:00:00,dst=00:11:22:33:44:66/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
>>> packets:23402655, bytes:1497769920, used:0.001s, dp:ovs,
>>> actions:vhost1, dp-extra-info:miniflow_bits(4,1)
>>>
>>> Rx-pps: 6022487    Rx-bps: 3083513840
>>> Tx-pps: 6022487    Tx-bps: 3083513840
>>>
>>> * After patch:
>>> flow-dump from pmd on cpu core: 6
>>> ufid:c2bac91a-d8a6-4a96-9d56-aee133d1f047,
>>> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(vhost1),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=00:11:22:33:44:56/00:00:00:00:00:00,dst=00:11:22:33:44:55/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
>>> packets:53921535, bytes:3450978240, used:0.000s, dp:ovs,
>>> actions:vhost0, dp-extra-info:miniflow_bits(4,1)
>>> flow-dump from pmd on cpu core: 4
>>> ufid:c4989fca-2662-4645-8291-8971c00b7cb4,
>>> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(vhost0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=00:11:22:33:44:55/00:00:00:00:00:00,dst=00:11:22:33:44:66/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
>>> packets:53921887, bytes:3451000768, used:0.000s, dp:ovs,
>>> actions:vhost1, dp-extra-info:miniflow_bits(4,1)
>>>
>>> Rx-pps: 6042410    Rx-bps: 3093714208
>>> Tx-pps: 6042407    Tx-bps: 3093712616
>>>
>
> Hi, Mike and David.
>
> Thanks for the test results, but I don't think they are relevant, at least
> David's.  The datapath flows show no matches on eth addresses, and that
> suggests that the simple match is in use.  And miniflow_extract is not called
> in this case, so the test doesn't really check the changes.  The variance in
> the test results is also concerning, as nothing should have changed in the
> datapath, but the performance changes for some reason.
>
> Mike, what OpenFlow rules are you using in your setup?
>
> On my end, I did my own set of runs with and without this patch and I see
> about a 0.82% performance degradation for a V2V scenario with a NORMAL
> OpenFlow rule and no real difference with simple match, which is expected.
> My numbers are:
>
>              NORMAL                Simple match
>          patch      main        patch      main
>         7420.0    7481.6       8333.1    8333.9      These numbers are in Kpps.
>
>        -0.82 %                -0.009 %
>
> The numbers are averages over 14 alternating runs of each type, so the results
> should be statistically significant.  The fact that there is no difference in
> the simple match case also suggests that the difference with NORMAL is real.
>
> Could you re-check your tests?
>
> For reference, my configuration is:
>
> ---
> ./configure CFLAGS="-msse4.2 -g -Ofast -march=native" --with-dpdk=static CC=gcc
>
> ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x14
> ovs-vsctl add-br ovsbr -- set bridge ovsbr datapath_type=netdev
> ovs-vsctl add-port ovsbr vhost0 \
>     -- set Interface vhost0 type=dpdkvhostuserclient \
>        options:vhost-server-path=/tmp/vhost0
> ovs-vsctl set Interface vhost0 other_config:pmd-rxq-affinity=0:2,1:2
> ovs-vsctl add-port ovsbr vhost1 \
>     -- set Interface vhost1 type=dpdkvhostuserclient \
>        options:vhost-server-path=/tmp/vhost1
> ovs-vsctl set Interface vhost1 other_config:pmd-rxq-affinity=0:4,1:4
>
> ovs-vsctl set Open_vSwitch . other_config:dpdk-extra='--no-pci --single-file-segments'
> ovs-vsctl set Open_vSwitch . other_config:dpdk-init=try
>
> ovs-ofctl del-flows ovsbr
> ovs-ofctl add-flow ovsbr actions=NORMAL
>
> ./build-24.11/bin/dpdk-testpmd -l 12,14 -n 4 --socket-mem=1024,0 --no-pci \
>     --vdev="net_virtio_user,path=/tmp/vhost1,server=1,mac=E6:49:42:EC:67:3C,in_order=1" \
>     --in-memory --single-file-segments -- \
>     --burst=32 --txd=2048 --rxd=2048 --rxq=1 --txq=1 --nb-cores=1 \
>     --eth-peer=0,5A:90:B6:77:22:F8 --forward-mode=txonly --stats-period=5
>
> ./build-24.11/bin/dpdk-testpmd -l 8,10 -n 4 --socket-mem=1024,0 --no-pci \
>     --vdev="net_virtio_user,path=/tmp/vhost0,server=1,mac=5A:90:B6:77:22:F8,in_order=1" \
>     --in-memory --single-file-segments -- \
>     --burst=32 --txd=2048 --rxd=2048 --rxq=1 --txq=1 --nb-cores=1 \
>     --eth-peer=0,E6:49:42:EC:67:3C --forward-mode=mac --stats-period=5
> ---
>
> Best regards, Ilya Maximets.
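For completeness, the relative differences quoted in the table above follow directly from the averaged rates. The snippet below is a standalone illustration only (not OVS code; the variable names are just for this example):

---
/* Standalone illustration: percentage differences recomputed from the
 * averaged rates (Kpps) quoted in the table above. */
#include <stdio.h>

int main(void)
{
    double normal_patch = 7420.0, normal_main = 7481.6;
    double simple_patch = 8333.1, simple_main = 8333.9;

    /* Roughly -0.82 % for NORMAL... */
    printf("NORMAL:       %+.2f %%\n",
           (normal_patch - normal_main) / normal_main * 100.0);
    /* ...and about -0.01 % for simple match, i.e. within noise. */
    printf("Simple match: %+.2f %%\n",
           (simple_patch - simple_main) / simple_main * 100.0);
    return 0;
}
---

The NORMAL case loses roughly 0.82 % while the simple match case is essentially flat, which supports the point that the cost only shows up when miniflow_extract actually runs.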
