Hi,
On 6/5/26 4:02 PM, Fiona Ebner wrote:
> Am 09.11.25 um 4:10 PM schrieb Michael S. Tsirkin:
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index 17ed0ef919..3b85560f6f 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -4299,19 +4299,19 @@ static const Property virtio_net_properties[] = {
>> VIRTIO_DEFINE_PROP_FEATURE("host_tunnel", VirtIONet,
>> host_features_ex,
>> VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO,
>> - false),
>> + true),
> it seems that the host_tunnel setting can cause issues when VXLAN
> traffic originating in a guest goes over a physical NIC which does not
> support the feature. We received several reports about the issue
> [0][1][2][3] and were able to reproduce it. Turning off the
> 'host_tunnel' property in the commandline for the VirtIO net device
> makes TCP traffic work. The network configuration from our reproducer
> setup is as follows:
>
> guest A (iperf3 -c)                  guest B (iperf3 -s)
> vxlan using vNIC as underlay         vxlan using vNIC as underlay
> virtualized NIC exposed to guest     virtualized NIC exposed to guest
> ---guest boundary---                 ---guest boundary---
> tap device connected to bridge       tap device connected to bridge
> bridge with physical NIC as port     bridge with physical NIC as port
> physical NIC <---host boundary--->   physical NIC
>
> Bridge configuration:
> iface vmbr0 inet static
> address 10.48.0.109/20
> gateway 10.48.0.1
> bridge-ports nic3
> bridge-stp off
> bridge-fd 0
> bridge-vlan-aware yes
> bridge-vids 2-4094
>
> VXLAN created with:
> ip link add vxlan0 type vxlan id 100 remote X dstport 4789 dev eth1
> where eth1 is the virtualized NIC exposed to the guest
>
> The physical NIC does not have the feature:
> Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme
> BCM5719 Gigabit Ethernet PCIe [14e4:1657] (rev 01)
> tx-udp_tnl-segmentation: off [fixed]
> tx-udp_tnl-csum-segmentation: off [fixed]
>
> Using a physical NIC which does have the feature works:
> Ethernet controller [0200]: Broadcom Inc. and subsidiaries BCM57504
> NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb Ethernet [14e4:1751] (rev 11)
> tx-udp_tnl-segmentation: on
> tx-udp_tnl-csum-segmentation: on
>
> Host kernel:
> Proxmox VE with 7.0.2-6-pve
>
> Guest kernel:
> Alpine with 6.18.34-0-lts
>
> QEMU commandline for the vNIC:
>> -netdev
>> 'type=tap,id=net2,ifname=tap103i2,script=/usr/libexec/qemu-server/pve-bridge,downscript=/usr/libexec/qemu-server/pve-bridgedown,vhost=on'
>> \
>> -device
>> 'virtio-net-pci,mac=BC:24:11:78:C3:3B,netdev=net2,bus=pci.0,addr=0x14,id=net2,rx_queue_size=1024,tx_queue_size=256,host_mtu=1500'
>> \
>
> We can see that QEMU sets the features for the tap interface via ioctl()
> and the host kernel allows it:
> tx-udp_tnl-segmentation: on
> tx-udp_tnl-csum-segmentation: on
>
> As far as we understand, in the problematic scenario, nothing is ever
> filling in the checksums for the inner TCP packets, meaning the outer
> UDP checksum ends up being wrong on the target side. Is the host kernel
> responsible for doing that before passing the packet to the physical NIC
> (without the feature)? Or who would be?
>
> Turning off host_tunnel_csum without turning off host_tunnel does not help.
>
> Interestingly, turning off the features for the working physical NIC
> does not make it break:
> tx-udp_tnl-segmentation: off
> tx-udp_tnl-csum-segmentation: off
> Could it be that the NIC just always fills in the inner TCP checksums
> regardless of that setting?
>
> On the other hand, running
> localhost:~# ethtool -K eth2 tx-checksum-ip-generic off
> Actual changes:
> tx-checksum-ip-generic: off
> tx-tcp-segmentation: off [not requested]
> tx-tcp-ecn-segmentation: off [not requested]
> tx-tcp6-segmentation: off [not requested]
> tx-udp-segmentation: off [not requested]
> inside the guests makes it work for the physical NIC without the
> tx-udp_tnl* features.
>
> I wanted to ask if this configuration is expected to be unsupported and
> if the management is expected to turn off the feature on the commandline
> if the traffic might go over a physical NIC without the feature. Or if
> this could be a kernel or NIC bug that should be investigated further?
> In the former case, should the option really be turned on by default
> with new machine versions?
Thank you for the detailed report. The configuration you describe is
supported and expected to work. It is indeed unexpected that the results
differ between a NIC with:
[1] tx-udp_tnl-segmentation: off [fixed]
and a similar setup on top of a NIC with:
[2] tx-udp_tnl-segmentation: off
since the two scenarios are indistinguishable from the stack's
perspective. I suspect the issue is NIC driver dependent.
I understand that [1] uses the tg3 driver and [2] bnxt, both running Linux
7.0.2. Is that correct?
Does disabling checksum offloading on the tg3 NIC change the results?
If you have them handy, could you please share pcap captures from
both ends? Links to an accessible URL would be better than sending a
lot of data to the ML, I think.
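In case it helps while looking at the captures, here is a minimal Python
sketch for checking an inner TCP checksum by hand. It is only an
illustration, not code from QEMU or the kernel: the helper names
(internet_checksum, tcp_checksum_ok) are made up for this example, and it
assumes IPv4 and that the inner source/destination addresses and the raw
TCP segment (header plus payload) have already been extracted from the
decapsulated VXLAN packet. Keep in mind that a capture taken on the sender
before the packet reaches the NIC will normally show unfilled checksums
when TX checksum offload is enabled, so the check is mostly meaningful on
the receiving end.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over 16-bit big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum_ok(src_ip: bytes, dst_ip: bytes, tcp_segment: bytes) -> bool:
    """Return True if the checksum stored in an IPv4 TCP segment is valid.

    src_ip and dst_ip are the 4-byte inner IPv4 addresses; tcp_segment is
    the TCP header plus payload, with the checksum field left as captured.
    """
    # IPv4 pseudo-header: src, dst, zero byte, protocol 6 (TCP), TCP length
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 6, len(tcp_segment))
    # Summing over a segment with a correct checksum field yields zero
    return internet_checksum(pseudo + tcp_segment) == 0
```

If tcp_checksum_ok() returns False for the inner segments seen on the
receiver, that would be consistent with nobody filling in the inner
checksum along the path.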
Thanks,
Paolo