Hello,
In a high-traffic scenario, when modifying the bond-rebalance-interval 
configuration for an OVS-DPDK bond interface, 
we observed that OVS-DPDK generated USERSPACE_INVALID_PORT_DROP errors.

Our analysis shows that running the command

    ovs-vsctl set port dpdk_tun_port other_config:bond-rebalance-interval=1000

triggers the following sequence, which ultimately leads to the
USERSPACE_INVALID_PORT_DROP errors:

1. The main thread executes memset(bond->hash, 0, hash_len).
Call stack:
#0 bond_entry_reset (bond=0x4c64bc0) at ofproto/bond.c:1852
#1 0x0000000001a2a238 in bond_reconfigure (bond=0x4c64bc0, s=0x7fff6d1dec10) at ofproto/bond.c:514
#2 0x0000000001a4e253 in bundle_set (ofproto_=0x4c21110, aux=0x4c39d90, s=0x7fff6d1deb90) at ofproto/ofproto-dpif.c:3484
#3 0x0000000001a31b27 in ofproto_bundle_register (ofproto=0x4c21110, aux=0x4c39d90, s=0x7fff6d1deb90) at ofproto/ofproto.c:1430
#4 0x0000000001a1c80e in port_configure (port=0x4c39d90) at vswitchd/bridge.c:1384
#5 0x0000000001a1b7b3 in bridge_reconfigure (ovs_cfg=0x4bb37c0) at vswitchd/bridge.c:1005
#6 0x0000000001a223e7 in bridge_run () at vswitchd/bridge.c:3423
#7 0x0000000001a27b9e in main (argc=11, argv=0x7fff6d1def38) at vswitchd/ovs-vswitchd.c:129

2. The main thread then executes member_map[i] = OFPP_NONE.
Call stack:
#0 bond_add_lb_output_buckets (bond=0x37220f0) at ofproto/bond.c:2135
#1 0x0000000001a29b4f in update_recirc_rules__ (bond=0x37220f0) at ofproto/bond.c:356
#2 0x0000000001a29ebe in update_recirc_rules (bond=0x37220f0) at ofproto/bond.c:426
#3 0x0000000001a2a262 in bond_reconfigure (bond=0x37220f0, s=0x7fffffffe230) at ofproto/bond.c:520
#4 0x0000000001a4e292 in bundle_set (ofproto_=0x366afa0, aux=0x3713290, s=0x7fffffffe1b0) at ofproto/ofproto-dpif.c:3484
#5 0x0000000001a31b66 in ofproto_bundle_register (ofproto=0x366afa0, aux=0x3713290, s=0x7fffffffe1b0) at ofproto/ofproto.c:1430
#6 0x0000000001a1c80e in port_configure (port=0x3713290) at vswitchd/bridge.c:1384
#7 0x0000000001a1b7b3 in bridge_reconfigure (ovs_cfg=0x3660180) at vswitchd/bridge.c:1005
#8 0x0000000001a223b7 in bridge_run () at vswitchd/bridge.c:3422
#9 0x0000000001a27b92 in main (argc=1, argv=0x7fffffffe558)

3. The PMD thread, while sending packets, finds port_no=0xffffffff (4294967295).
Call stack:
#0  dp_execute_output_action (pmd=0x7fff68731010, packets_=0x7fff53ff8f50, should_steal=true, port_no=4294967295) at lib/dpif-netdev.c:9273
#1  0x0000000001acaf6d in dp_execute_lb_output_action (pmd=0x7fff68731010, packets_=0x7fff53ff9ca0, should_steal=true, bond=1) at lib/dpif-netdev.c:9350
#2  0x0000000001acb0b6 in dp_execute_cb (aux_=0x7fff53ff9b30, packets_=0x7fff53ff9ca0, a=0x7fff4800f074, should_steal=true) at lib/dpif-netdev.c:9379
#3  0x0000000001b526b5 in odp_execute_actions (dp=0x7fff53ff9b30, batch=0x7fff53ff9ca0, steal=true, actions=0x7fff4800f074, actions_len=8, dp_execute_action=0x1acafc0 <dp_execute_cb>) at lib/odp-execute.c:1016
#4  0x0000000001acbc8e in dp_netdev_execute_actions (pmd=0x7fff68731010, packets=0x7fff53ff9ca0, should_steal=true, flow=0x7fff4800ea70, actions=0x7fff4800f074, actions_len=8) at lib/dpif-netdev.c:9698
#5  0x0000000001ac8133 in packet_batch_per_flow_execute (batch=0x7fff53ff9c90, pmd=0x7fff68731010) at lib/dpif-netdev.c:8338
#6  0x0000000001aca3ad in dp_netdev_input__ (pmd=0x7fff68731010, packets=0x7fff53ffbdf0, md_is_valid=false, port_no=4) at lib/dpif-netdev.c:9055
#7  0x0000000001aca3ff in dp_netdev_input (pmd=0x7fff68731010, packets=0x7fff53ffbdf0, port_no=4) at lib/dpif-netdev.c:9064
#8  0x0000000001ac0da2 in dp_netdev_process_rxq_port (pmd=0x7fff68731010, rxq=0x3720220, port_no=4) at lib/dpif-netdev.c:5690
#9  0x0000000001ac566a in pmd_thread_main (f_=0x7fff68731010) at lib/dpif-netdev.c:7334
#10 0x0000000001bc4b1b in ovsthread_wrapper (aux_=0x3711920) at lib/ovs-thread.c:422
#11 0x00007ffff76f4802 in start_thread () from /lib64/libc.so.6
#12 0x00007ffff7694314 in clone () from /lib64/libc.so.6
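
To make the three steps concrete, here is a minimal single-threaded model of
the sequence. It is only a sketch and not the OVS code: the names N_BUCKETS,
PORT_NONE, reset_buckets, build_member_map, lookup_port and demo_reset_race
are hypothetical, the bucket count of 256 is an assumption of the model, and
the real code runs steps 1 and 2 in the main thread and step 3 in a PMD
thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N_BUCKETS 256          /* Assumed bucket count for this model. */
#define PORT_NONE UINT32_MAX   /* Models the invalid port 0xffffffff. */

struct member { uint32_t port; };
struct bucket { struct member *member; };

/* Step 1: models the memset() seen in bond_entry_reset(). Every bucket
 * loses its member assignment. */
static void reset_buckets(struct bucket *hash)
{
    memset(hash, 0, N_BUCKETS * sizeof *hash);
}

/* Step 2: models building the bucket-to-port map. A bucket without a
 * member is published as PORT_NONE. */
static void build_member_map(const struct bucket *hash, uint32_t *map)
{
    for (int i = 0; i < N_BUCKETS; i++) {
        map[i] = hash[i].member ? hash[i].member->port : PORT_NONE;
    }
}

/* Step 3: models the PMD-side lookup of the egress port for a flow. */
static uint32_t lookup_port(const uint32_t *map, uint32_t flow_hash)
{
    return map[flow_hash % N_BUCKETS];
}

/* Runs steps 1 to 3 in order and returns the port the lookup resolves to
 * once the freshly reset map has been published. */
uint32_t demo_reset_race(void)
{
    struct member m1 = { 1 }, m2 = { 2 };
    struct bucket hash[N_BUCKETS];
    uint32_t map[N_BUCKETS];

    /* Start from a balanced bond: buckets alternate between two members. */
    for (int i = 0; i < N_BUCKETS; i++) {
        hash[i].member = (i % 2) ? &m2 : &m1;
    }
    build_member_map(hash, map);
    assert(lookup_port(map, 0x1234) != PORT_NONE);  /* Valid before reset. */

    reset_buckets(hash);            /* Step 1. */
    build_member_map(hash, map);    /* Step 2. */
    return lookup_port(map, 0x1234);    /* Step 3. */
}
```

In this model demo_reset_race() returns PORT_NONE: every lookup made between
the reset and the next assignment of members to buckets resolves to the
invalid port, which matches the port_no=4294967295 seen in frame #0 above.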

The main issue arises from a race between the main thread and the PMD thread
when they operate on pmd->tx_bonds: while the main thread is reconfiguring the
bond, the PMD thread can temporarily resolve the egress interface to
0xffffffff (an invalid value).
What solutions does the community propose to address this problem?

We are running OVS 2.17.5 (LTS).