On Mon, Jun 08, 2026 at 07:12:14PM +0200, Paolo Abeni wrote:
> On 6/8/26 12:41 PM, Fiona Ebner wrote:
> > Am 05.06.26 um 4:54 PM schrieb Paolo Abeni:
> >> On 6/5/26 4:02 PM, Fiona Ebner wrote:
> >>> Am 09.11.25 um 4:10 PM schrieb Michael S. Tsirkin:
> >>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >>>> index 17ed0ef919..3b85560f6f 100644
> >>>> --- a/hw/net/virtio-net.c
> >>>> +++ b/hw/net/virtio-net.c
> >>>> @@ -4299,19 +4299,19 @@ static const Property virtio_net_properties[] = {
> >>>> VIRTIO_DEFINE_PROP_FEATURE("host_tunnel", VirtIONet,
> >>>> host_features_ex,
> >>>> VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO,
> >>>> - false),
> >>>> + true),
> >>> It seems that the host_tunnel setting can cause issues when VXLAN
> >>> traffic originating in a guest goes over a physical NIC which does not
> >>> support the feature. We received several reports about the issue
> >>> [0][1][2][3] and were able to reproduce it. Turning off the
> >>> 'host_tunnel' property in the commandline for the VirtIO net device
> >>> makes TCP traffic work. The network configuration from our reproducer
> >>> setup is as follows:
> >>>
> >>> guest A (iperf3 -c)                  guest B (iperf3 -s)
> >>> vxlan using vNIC as underlay         vxlan using vNIC as underlay
> >>> virtualized NIC exposed to guest     virtualized NIC exposed to guest
> >>> ---guest boundary---                 ---guest boundary---
> >>> tap device connected to bridge       tap device connected to bridge
> >>> bridge with physical NIC as port     bridge with physical NIC as port
> >>> physical NIC <---host boundary--->   physical NIC
> >>>
> >>> Bridge configuration:
> >>> iface vmbr0 inet static
> >>>         address 10.48.0.109/20
> >>>         gateway 10.48.0.1
> >>>         bridge-ports nic3
> >>>         bridge-stp off
> >>>         bridge-fd 0
> >>>         bridge-vlan-aware yes
> >>>         bridge-vids 2-4094
> >>>
> >>> VXLAN created with:
> >>> ip link add vxlan0 type vxlan id 100 remote X dstport 4789 dev eth1
> >>> where eth1 is the virtualized NIC exposed to the guest
> >>>
> >>> The physical NIC does not have the feature:
> >>> Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme
> >>> BCM5719 Gigabit Ethernet PCIe [14e4:1657] (rev 01)
> >>> tx-udp_tnl-segmentation: off [fixed]
> >>> tx-udp_tnl-csum-segmentation: off [fixed]
> >>>
> >>> Using a physical NIC which does have the feature works:
> >>> Ethernet controller [0200]: Broadcom Inc. and subsidiaries BCM57504
> >>> NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb Ethernet [14e4:1751] (rev 11)
> >>> tx-udp_tnl-segmentation: on
> >>> tx-udp_tnl-csum-segmentation: on
> >>>
> >>> Host kernel:
> >>> Proxmox VE with 7.0.2-6-pve
> >>>
> >>> Guest kernel:
> >>> Alpine with 6.18.34-0-lts
> >>>
> >>> QEMU commandline for the vNIC:
> >>>   -netdev 'type=tap,id=net2,ifname=tap103i2,script=/usr/libexec/qemu-server/pve-bridge,downscript=/usr/libexec/qemu-server/pve-bridgedown,vhost=on' \
> >>>   -device 'virtio-net-pci,mac=BC:24:11:78:C3:3B,netdev=net2,bus=pci.0,addr=0x14,id=net2,rx_queue_size=1024,tx_queue_size=256,host_mtu=1500' \
> >>>
> >>> We can see that QEMU sets the features for the tap interface via ioctl()
> >>> and the host kernel allows it:
> >>> tx-udp_tnl-segmentation: on
> >>> tx-udp_tnl-csum-segmentation: on
> >>>
> >>> As far as we understand, in the problematic scenario, nothing is ever
> >>> filling in the checksums for the inner TCP packets, meaning the outer
> >>> UDP checksum ends up being wrong on the target side. Is the host kernel
> >>> responsible for doing that before passing the packet to the physical NIC
> >>> (without the feature)? Or who would be?
> >>>
> >>> Turning off host_tunnel_csum without turning off host_tunnel does not
> >>> help.
> >>>
> >>> Interestingly, turning off the features for the working physical NIC
> >>> does not make it break:
> >>> tx-udp_tnl-segmentation: off
> >>> tx-udp_tnl-csum-segmentation: off
> >>> Could it be that the NIC just always fills in the inner TCP checksums
> >>> regardless of that setting?
> >>>
> >>> On the other hand, running
> >>> localhost:~# ethtool -K eth2 tx-checksum-ip-generic off
> >>> Actual changes:
> >>> tx-checksum-ip-generic: off
> >>> tx-tcp-segmentation: off [not requested]
> >>> tx-tcp-ecn-segmentation: off [not requested]
> >>> tx-tcp6-segmentation: off [not requested]
> >>> tx-udp-segmentation: off [not requested]
> >>> inside the guests makes it work for the physical NIC without the
> >>> tx-udp_tnl* features.
> >>>
> >>> I wanted to ask whether this configuration is expected to be
> >>> unsupported and whether the management layer is expected to turn off
> >>> the feature on the commandline if the traffic might go over a physical
> >>> NIC without the feature, or whether this could be a kernel or NIC bug
> >>> that should be investigated further. In the former case, should the
> >>> option really be turned on by default with new machine versions?
> >>
> >> Thank you for the detailed report. The configuration you describe is
> >> supported and expected to work. The fact that different results are
> >> obtained on top of a NIC with:
> >>
> >> [1] tx-udp_tnl-segmentation: off [fixed]
> >>
> >> with a similar setup on top of a NIC with:
> >>
> >> [2] tx-udp_tnl-segmentation: off
> >>
> >> is indeed strange/unexpected, as the two scenarios are indistinguishable
> >> from the stack perspective. I suspect the issue is NIC driver dependent.
> >>
> >> I understand [1] is using the tg3 driver and [2] bnxt, both running
> >> Linux 7.0.2, am I correct?
> >
> > Yes.
> >
> >> If you disable csum offloading on the tg3 NIC, does that impact the
> >> results?
> >
> > Yes, doing
> >
> > root@tamy3:~# ethtool -K nic3 tx-checksum-ipv4 off
> > Actual changes:
> > tx-checksum-ipv4: off
> > tx-tcp-segmentation: off [not requested]
> > tx-tcp-ecn-segmentation: off [not requested]
> >
> > on both hosts makes it work.
> >
> >> If you have such data handy, could you please share pcap captures from
> >> both ends? A link to some accessible URL would be better than sending a
> >> lot of data to the ML, I think.
> >
> > I captured the following while the problem is present with
> > tcpdump -i foo udp port 4789 -w bar.pcap
> > on the host interfaces (tap, bridge and physical NIC) just to be sure.
> > Looking at it with tcpdump -envvvr, within the same host, only the
> > timestamps change. Between the hosts, the UDP checksums do change, but
> > the inner TCP checksums do not. So I suppose the NIC fills in the UDP
> > checksum based on the still-wrong data, since the UDP checksum would
> > already be correct if the TCP checksums were fixed up?
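The second guess matches how local checksum offload (LCO, cf. Linux's lco_csum()) works in general: once the inner checksum is finished, the inner segment sums (in one's complement) to the complement of its pseudo-header sum, so an outer UDP checksum can be computed up front without reading the payload. That outer value is only right if something later completes the inner checksum. The sketch below checks that identity on a made-up TCP segment. Whether the guest's VXLAN device computed its outer checksum this way is an assumption the captures do not confirm, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071: add the big-endian 16-bit words of buf to a running
 * one's-complement sum; an odd trailing byte is zero-padded. */
static uint32_t csum_add(uint32_t sum, const uint8_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	return sum;
}

/* Fold the carries back into the low 16 bits. */
static uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Hypothetical helper: finish the inner L4 checksum in place. On entry
 * the check field holds the folded pseudo-header sum (partial state),
 * so summing the whole segment and complementing gives the final value. */
static void finish_inner(uint8_t *seg, size_t len, size_t check_off)
{
	uint16_t c;

	assert(check_off + 2 <= len);
	c = (uint16_t)~csum_fold(csum_add(0, seg, len));
	seg[check_off] = (uint8_t)(c >> 8);
	seg[check_off + 1] = (uint8_t)c;
}

/* LCO: a segment with a finished checksum sums (one's complement) to
 * ~pseudo, which is what an outer UDP checksum can assume without
 * reading the segment at all. */
static uint16_t lco_segment_sum(uint16_t inner_pseudo)
{
	return (uint16_t)~inner_pseudo;
}

/* Receiver-side check of the inner segment against its pseudo-header sum. */
static int inner_csum_ok(const uint8_t *seg, size_t len, uint16_t inner_pseudo)
{
	return csum_fold(csum_add(inner_pseudo, seg, len)) == 0xffff;
}
```

With a 20-byte TCP header plus the payload "hi!\n" and an assumed pseudo-header sum of 0x1234, inner_csum_ok() fails while the field is still in its partial state (what the receiving guest would see), finish_inner() writes 0xe1bf, and the finished segment then sums to 0xedcb, which is exactly ~0x1234.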
> >
> > For the NIC with the tx-udp_tnl features, the inner TCP checksums do get
> > corrected and the UDP checksum stays the same. I did not include
> > captures for this.
> >
> > IPs for the guest running iperf -s (on host tamy2)
> > 10.48.6.81 for the virtualized NIC
> > 10.0.123.102 for the VXLAN
> >
> > IPs for the guest running iperf -c (on host tamy3):
> > 10.48.6.101 for the virtualized NIC
> > 10.0.123.103 for the VXLAN
> >
> > The captures are short, so I take the liberty to just provide them directly:
> >
> > [I] febner@enia ~> tar cf pcap.tar tamy*.pcap
> > [I] febner@enia ~> xz pcap.tar
> > [I] febner@enia ~> base64 -w 70 pcap.tar.xz
> > /Td6WFoAAATm1rRGBMDXBYCgASEBFgAAAAAAAKQwOkPgT/8Cz10AOhhJ/551cIJN23SQMX
> > Q2Us4cGiof2bxOS4FK4DxejNh+76NiWIpdIfOxrB5urac3FT0mPKMbUreSY+04/NhofcgS
> > Zz41D6t/Xp+VkPxNYx7Xsp3xz4xUCsVuK205jz6G/NAY0bJ0+UrJuCkP0G5VBtn88hJstD
> > 7qlaT7qcBLECseOO1OfqsLezxasbm5p614IL18cqAVMCMWucr/Kh2Oqth26v7zI4SVEJC/
> > YSEgaOhfjCbQZSi85BEw9/NSZO6IqoyNLrEiPUPgXTWH63NssG+4RMuBswrkgN5Wld70B1
> > mROOCwKbo9b9oXI4DumGHqgCV5jdAxzITpEjMQpvDKh6NvM5L/8v1cPiGjLFSL2JesZ0F5
> > dTbstymv1q4eN+9f3ng+4AXCvDzaziYMwtGwxYyptK5qDI2oGsCIGwFDpP/ZEw7NYI9EMM
> > G2+SDG6D8bKgKWl9Mi6EJcqSMVKFR1P1Z/P3XJ/9sWOMJug1IVYZGIJmtXXM3+roqOEGMF
> > tco/LMUJHgdmfkitfuZ5tN1+0EVE0/f4GQiUpdidjqfZ2m9jL0svcGXUd5D3LN0tbh5vmP
> > KzXQNtMQiMY6Fj7gbzDbOQGGW/L3/34B5YV+pWEpzhAbeTI9KL0ZF3vJ0OESlL9OMhrqgl
> > WX23bxek2h9eG15eO9cderaoCOFb8NEKIjC+UTh2Ir7/ZFfDvlXeGB/3jXM8OTmWmJSr5b
> > CrAvBQ4xvow3hwKq2Fbyu7aU6KycVpo03a+59LqxPyRfc3qRXcoUnp8MTi2YUk+kfYR6mI
> > S9AE/5xYFzb7I40RUBPUm0OCzguzk9qlIcab3lnTFnrMWa+Cj9AMIkWEEf0tMzw0v9+17u
> > VJg/8tWMad3d9Jc5Z6B9kOukzGvgVEWoq4z9snb/k6u2sBVY36q2iI1cmSPrI+UcF2GtSA
> > Qs6bt/T/c1Xi2r0Up+tRDrIE9O2aNAAAAKAWpkIVfsvrAAHzBYCgAQAA6om1scRn+wIAAA
> > AABFla
>
> Thanks for the data. The bug is in the virtio_net driver: non-GSO
> packets requiring an inner transport header csum are passed unmodified
> from the guest to the host, and the H/W NIC then has to compute the
> inner transport header csum for the encapsulated packet.
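On the virtio side, such a packet carries VIRTIO_NET_HDR_F_NEEDS_CSUM plus csum_start/csum_offset in its virtio-net header, and whoever completes it is expected to sum from csum_start to the end of the frame and store the result at csum_start + csum_offset. A sketch of the offsets for the reproducer's VXLAN-over-IPv4 TCP traffic, assuming no VLAN tags and no IP options; the struct is a local mirror of the header layout (not the one from <linux/virtio_net.h>) and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the basic virtio-net header (struct virtio_net_hdr in
 * the VIRTIO spec). Fields are kept in host byte order for illustration;
 * a modern (VIRTIO 1.x) device expects them little-endian. */
struct vnet_hdr_sketch {
	uint8_t flags;
	uint8_t gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;  /* offset from the start of the frame */
	uint16_t csum_offset; /* offset of the csum field from csum_start */
};
static_assert(sizeof(struct vnet_hdr_sketch) == 10, "virtio_net_hdr is 10 bytes");

#define HDR_F_NEEDS_CSUM 1 /* value of VIRTIO_NET_HDR_F_NEEDS_CSUM */
#define HDR_GSO_NONE     0 /* value of VIRTIO_NET_HDR_GSO_NONE */

enum {
	L_ETH = 14,    /* Ethernet header, no VLAN tag (assumed) */
	L_IPV4 = 20,   /* IPv4 header without options (assumed) */
	L_UDP = 8,
	L_VXLAN = 8,
	TCP_CHECK = 16 /* offset of the checksum field in the TCP header */
};

/* Hypothetical: header for a single (non-GSO) TCP segment in VXLAN over
 * IPv4 whose inner TCP checksum is left to be completed later. */
static struct vnet_hdr_sketch vxlan_tcp_partial_hdr(void)
{
	struct vnet_hdr_sketch h = {0};
	uint16_t outer = L_ETH + L_IPV4 + L_UDP + L_VXLAN;

	h.flags = HDR_F_NEEDS_CSUM;
	h.gso_type = HDR_GSO_NONE;
	h.csum_start = outer + L_ETH + L_IPV4; /* start of the inner TCP header */
	h.csum_offset = TCP_CHECK;
	return h;
}
```

This gives csum_start = 84 and csum_offset = 16, so the checksum left for the host side sits at byte 100 of the frame, inside the inner TCP header, whereas the outer UDP checksum field is at byte 40.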
>
> Could you please try the attached kernel patch? You will need to update
> the kernel inside the guest.
Paolo, was there supposed to be a patch here?
> > Do you have any tips on where to start looking in the kernel? What is the
> > expected place where the TCP checksums are corrected if the NIC does not
> > have the tx-udp_tnl features?
>
> Hopefully no more investigation is needed. FTR, for non-GSO packets, the
> inner transport header csum should be computed inside the guest before
> transmitting on the virtio net device.
>
> For GSO packets, that csum should be computed inside the host, just
> before transmitting on the H/W NIC (if the latter does not support the
> relevant offload).
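For the non-GSO case, this amounts to the guest finishing the inner checksum in software, much like the kernel's skb_checksum_help() does, rather than leaving it to the device. A minimal sketch, assuming the csum_start/csum_offset meaning from the virtio-net header and a checksum field pre-seeded with the folded pseudo-header sum; finish_partial_csum() and its helpers are hypothetical and are not the patch mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071: add the big-endian 16-bit words of buf to a running
 * one's-complement sum; an odd trailing byte is zero-padded. */
static uint32_t csum_add(uint32_t sum, const uint8_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	return sum;
}

/* Fold the carries back into the low 16 bits. */
static uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Hypothetical guest-side fallback: complete the checksum that a
 * NEEDS_CSUM header describes, before the frame reaches virtio-net.
 * The field at csum_start + csum_offset must already hold the folded
 * pseudo-header sum, as it does for a partially checksummed packet. */
static void finish_partial_csum(uint8_t *frame, size_t len,
				size_t csum_start, size_t csum_offset)
{
	size_t field = csum_start + csum_offset;
	uint16_t c;

	assert(field + 2 <= len);
	c = (uint16_t)~csum_fold(csum_add(0, frame + csum_start,
					  len - csum_start));
	if (c == 0)
		c = 0xffff; /* zero means "no checksum" for UDP; send 0xffff instead */
	frame[field] = (uint8_t)(c >> 8);
	frame[field + 1] = (uint8_t)c;
}
```

Only the bytes from csum_start onward are read, so the outer headers in front of the inner TCP segment do not affect the result.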
>
> Thanks,
>
> Paolo