On 6/8/26 10:25 AM, Johan Thomsen wrote:
> Hello,
> 
> I am seeing what looks like a kernel bug in the Geneve/OVS/vhost
> transmit path on a Talos Linux node running Kube-ovn with Geneve
> overlay and KubeVirt VM traffic.
> 
> Environment:
> 
> Kernel: 6.18.33-talos
> Distro: Talos v1.13.3
> 
> Compiler/config:
> 
> CONFIG_CC_VERSION_TEXT="clang version 22.1.2"
> CONFIG_CC_IS_CLANG=y
> CONFIG_LTO=y
> CONFIG_LTO_CLANG=y
> CONFIG_LTO_CLANG_THIN=y
> CONFIG_FORTIFY_SOURCE=y
> 
> Hardware: HPE ProLiant DL325 Gen11, AMD EPYC
> 
> NIC driver: bnxt_en
> 
> Workload/network:
> 
> Kube-OVN, Geneve overlay
> Open vSwitch datapath
> KubeVirt/QEMU VM traffic via vhost/tap
> 
> Relevant console output:
> 
> [  648.742603] memcpy: detected buffer overflow: 104 byte write of
> buffer size 96
> [  648.749907] WARNING: CPU: 61 PID: 27020 at
> lib/string_helpers.c:1036 __fortify_report+0x45/0x60
> [  648.758689] Modules linked in: dm_round_robin dm_multipath lpfc
> nvmet_fc nvmet intel_rapl_msr intel_rapl_common ahci nvme_auth bnxt_en
> nvme hpilo hkdf libahci sp5100_tco watchdog k10temp
> [  648.775429] CPU: 61 UID: 107 PID: 27020 Comm: vhost-27002 Not
> tainted 6.18.29-talos #1 PREEMPT(none)
> [  648.784735] Hardware name: HPE ProLiant DL325 Gen11/ProLiant DL325
> Gen11, BIOS 2.84 11/05/2025
> [  648.890478]  skb_tunnel_info_unclone+0x179/0x190
> [  648.895152]  geneve_xmit+0x7fe/0xe00
> [  648.907240]  dev_hard_start_xmit+0xa7/0x1f0
> [  648.911479]  __dev_queue_xmit+0x864/0xf40
> [  648.919688]  do_execute_actions+0x9b9/0x1be0
> [  648.927727]  ovs_execute_actions+0x58/0x170
> [  648.931960]  ovs_dp_process_packet+0xb1/0x1c0
> [  648.936370]  ovs_vport_receive+0x90/0x100
> [  648.940428]  netdev_frame_hook+0x146/0x1a0
> [  648.954093]  __netif_receive_skb+0x3f/0x160
> [  648.958324]  process_backlog+0x10c/0x210
> [  648.962295]  __napi_poll+0x2f/0x190
> [  648.965832]  net_rx_action+0x2e3/0x500
> [  648.969632]  handle_softirqs+0xe7/0x310
> [  648.985387]  tun_get_user+0x137e/0x1510
> [  649.005878]  handle_tx+0x41f/0xd30
> [  649.029014]  vhost_run_work_list+0x52/0x90
> [  649.033162]  vhost_task_fn+0xc2/0x140
> [  649.064145] ---[ end trace 0000000000000000 ]---
> [  649.068820] ------------[ cut here ]------------
> [  649.073489] kernel BUG at lib/string_helpers.c:1043!
> 
> I don't know whether this is a real overflow or a FORTIFY false-positive.

Looks like a false-positive from the __counted_by fortification.

I'd guess something like this would fit it:

diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 1fc2fb03ce3f9..e51c3795da474 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -164,6 +164,7 @@ static inline struct metadata_dst *tun_dst_unclone(struct 
sk_buff *skb)
        if (!new_md)
                return ERR_PTR(-ENOMEM);
 
+       new_md->u.tun_info.options_len = md_size;
        memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
               sizeof(struct ip_tunnel_info) + md_size);
 #ifdef CONFIG_DST_CACHE
---

Johan, could you try this in your setup?

The memory was actually allocated for the options, but the structure
is zeroed out on allocation, so the __counted_by check doesn't work
properly for the initial initialization copy.

But the operation in the diff above is kind of pointless as the memcpy
itself will copy the value again.  So, I'm not sure if that's the right
solution here.

Alternative might be to revert the kmalloc_flex back to the simple
kmalloc in metadata_dst_alloc.

CC: Kees

Best regards, Ilya Maximets.

> 
> I cannot reproduce the issue on Talos v1.12.X which uses a gcc built
> kernel, whereas the affected kernel is built with clang. Don't know
> whether this is relevant here.
> 
> I am currently trying to make a reliable reproducer, but I can almost
> always trigger the issue when iperf stressing the VM-network.
> 
> Please let me know if this should go to a more specific
> maintainer/list or further info is needed. I am able to test candidate
> patches if provided.
> 
> Downstream bug reports:
> https://github.com/siderolabs/talos/issues/13440
> https://github.com/kubeovn/kube-ovn/issues/6767
> 
> Thanks,
> Johan

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to