Hi all,

I'm using OVS-DPDK (ovs: 2.17.1, dpdk: 21.11.1). I found a bug: ovs-vswitchd crashes after a burst of mtu_request changes on a DPDK port, and afterwards the port cannot be recovered without restarting ovs-vswitchd.
1. How to reproduce, and my analysis:

```
# Start OVS, then add the bridges, ports and OpenFlow rules.
[root@bogon ~]# ovs-vsctl show
0444869c-dc4d-462f-8caf-074ecbab1a55
    Bridge br-int
        datapath_type: netdev
        Port p0
            Interface p0
                type: dpdk
                options: {dpdk-devargs="0000:c1:00.0"}
        Port br-int
            Interface br-int
                type: internal
    Bridge br-phy
        datapath_type: netdev
        Port pf1vf0
            Interface pf1vf0
                type: dpdk
                options: {dpdk-devargs="0000:c1:00.1,representor=[0]"}
        Port pf1vf1
            Interface pf1vf1
                type: dpdk
                options: {dpdk-devargs="0000:c1:00.1,representor=[1]"}
        Port br-phy
            Interface br-phy
                type: internal
        Port pf1vf3
            Interface pf1vf3
                type: dpdk
                options: {dpdk-devargs="0000:c1:00.1,representor=[3]"}
        Port pf1vf2
            Interface pf1vf2
                type: dpdk
                options: {dpdk-devargs="0000:c1:00.1,representor=[2]"}
    ovs_version: "2.17.2"

[root@bogon ~]# ovs-ofctl dump-flows br-int
 cookie=0x0, duration=60216.364s, table=0, n_packets=16923639262, n_bytes=984712027272, priority=0 actions=NORMAL

# Threads of ovs-vswitchd:
865084 root  10 -10  522.9g   1.6g  42808 S  17.3  0.6 175:48.23 revalidator53
865123 root  10 -10  522.9g   1.6g  42808 S  17.3  0.6 175:00.43 revalidator92
865158 root  10 -10  522.9g   1.6g  42808 S  17.3  0.6 175:58.49 revalidator127
865171 root  10 -10  522.9g   1.6g  42808 S  17.3  0.6 176:29.69 revalidator140
865058 root  10 -10  522.9g   1.6g  42808 S  16.9  0.6 176:58.03 revalidator27
865091 root  10 -10  522.9g   1.6g  42808 S  16.9  0.6 175:41.81 revalidator60
865111 root  10 -10  522.9g   1.6g  42808 S  16.9  0.6 176:05.97 revalidator80
865113 root  10 -10  522.9g   1.6g  42808 S  16.9  0.6 177:09.64 revalidator82
865130 root  10 -10  522.9g   1.6g  42808 S  16.9  0.6 176:16.27 revalidator99
865155 root  10 -10  522.9g   1.6g  42808 S  16.9  0.6 176:11.22 revalidator124
865097 root  10 -10  522.9g   1.6g  42808 S  16.6  0.6 177:00.22 revalidator66
865110 root  10 -10  522.9g   1.6g  42808 S  16.6  0.6 175:16.52 revalidator79
865149 root  10 -10  522.9g   1.6g  42808 S  16.6  0.6 176:00.84 revalidator118
865151 root  10 -10  522.9g   1.6g  42808 S  16.6  0.6 176:29.06 revalidator120
865057 root  10 -10  522.9g   1.6g  42808 S  16.3  0.6 178:03.60 revalidator26
865070 root  10 -10  522.9g   1.6g  42808 S  16.3  0.6 176:17.63 revalidator39
865112 root  10 -10  522.9g   1.6g  42808 S  16.3  0.6 175:35.65 revalidator81
865083 root  10 -10  522.9g   1.6g  42808 S  15.9  0.6 176:21.53 revalidator52
865124 root  10 -10  522.9g   1.6g  42808 S  15.9  0.6 175:31.27 revalidator93
865127 root  10 -10  522.9g   1.6g  42808 S  15.9  0.6 176:59.65 revalidator96
865147 root  10 -10  522.9g   1.6g  42808 S  15.9  0.6 176:51.85 revalidator116
865164 root  10 -10  522.9g   1.6g  42808 S  15.9  0.6 177:34.16 revalidator133
865051 root  10 -10  522.9g   1.6g  42808 S  15.6  0.6 175:27.68 revalidator20
865066 root  10 -10  522.9g   1.6g  42808 S  15.6  0.6 175:54.05 revalidator35
865087 root  10 -10  522.9g   1.6g  42808 S  15.6  0.6 175:38.54 revalidator56
865100 root  10 -10  522.9g   1.6g  42808 S  15.6  0.6 177:12.42 revalidator69
865118 root  10 -10  522.9g   1.6g  42808 S  15.6  0.6 176:02.57 revalidator87
865121 root  10 -10  522.9g   1.6g  42808 S  15.6  0.6 176:06.20 revalidator90
865132 root  10 -10  522.9g   1.6g  42808 S  15.6  0.6 177:24.71 revalidator101
865148 root  10 -10  522.9g   1.6g  42808 S  15.6  0.6 179:07.53 revalidator117
865162 root  10 -10  522.9g   1.6g  42808 S  15.6  0.6 177:18.34 revalidator131
865047 root  10 -10  522.9g   1.6g  42808 S  15.3  0.6 176:30.75 revalidator16
865080 root  10 -10  522.9g   1.6g  42808 S  15.3  0.6 175:36.41 revalidator49
865117 root  10 -10  522.9g   1.6g  42808 S  15.3  0.6 176:03.18 revalidator86
865125 root  10 -10  522.9g   1.6g  42808 S  15.3  0.6 177:15.42 revalidator94
865122 root  10 -10  522.9g   1.6g  42808 S  15.0  0.6 176:45.37 revalidator91
865065 root  10 -10  522.9g   1.6g  42808 S  14.6  0.6 176:49.66 revalidator34
865116 root  10 -10  522.9g   1.6g  42808 S  14.6  0.6 174:57.67 revalidator85
865161 root  10 -10  522.9g   1.6g  42808 S  14.6  0.6 175:10.52 revalidator130
865133 root  10 -10  522.9g   1.6g  42808 S  14.3  0.6 174:49.83 revalidator102
865016 root  10 -10  522.9g   1.6g  42808 S   0.0  0.6   1:27.68 ovs-vswitchd
865017 root  10 -10  522.9g   1.6g  42808 S   0.0  0.6   0:14.57 eal-intr-thread
865020 root  10 -10  522.9g   1.6g  42808 S   0.0  0.6   0:00.00 bond_cmd_parse_
865021 root  10 -10  522.9g   1.6g  42808 S   0.0  0.6   0:00.00 telemetry-v2
865022 root  10 -10  522.9g   1.6g  42808 S   0.0  0.6   0:00.65 dpdk_watchdog1
865023 root  10 -10  522.9g   1.6g  42808 S   0.0  0.6   0:10.16 urcu2
865025 root  10 -10  522.9g   1.6g  42808 S   0.0  0.6   0:36.14 ct_clean3
865026 root  10 -10  522.9g   1.6g  42808 S   0.0  0.6   0:00.04 ipf_clean4
865027 root  10 -10  522.9g   1.6g  42808 S   0.0  0.6   0:12.28 hw_offload5
865028 root  10 -10  522.9g   1.6g  42808 S   0.0  0.6   0:00.00 pmd-c106/id:6
865030 root  10 -10  522.9g   1.6g  42808 S   0.0  0.6   0:00.00 pmd-c88/id:8
865031 root  10 -10  522.9g   1.6g  42808 S   0.0  0.6   0:00.00 pmd-c21/id:9
865032 root  10 -10  522.9g   1.6g  42808 S   0.0  0.6   0:00.00 pmd-c78/id:10
865033 root  10 -10  522.9g   1.6g  42808 S   0.0  0.6   0:00.00 pmd-c124/id:11
865035 root  10 -10  522.9g   1.6g  42808 S   0.0  0.6   0:00.00 pmd-c96/id:13

# Note: with only one revalidator thread, the bug does not occur.
# So could this be a race condition between revalidator threads?

# Run these commands:
ovs-vsctl set interface p0 mtu_request=3000
ovs-vsctl set interface p0 mtu_request=1000
ovs-vsctl set interface p0 mtu_request=2000
ovs-vsctl set interface p0 mtu_request=3100
ovs-vsctl set interface p0 mtu_request=200
ovs-vsctl set interface p0 mtu_request=300
ovs-vsctl set interface p0 mtu_request=500
ovs-vsctl set interface p0 mtu_request=3000
ovs-vsctl set interface p0 mtu_request=1500
ovs-vsctl set interface p0 mtu_request=1300
ovs-vsctl set interface p0 mtu_request=1200
ovs-vsctl set interface p0 mtu_request=800
ovs-vsctl set interface p0 mtu_request=4000
ovs-vsctl set interface p0 mtu_request=5000
ovs-vsctl set interface p0 mtu_request=600
ovs-vsctl set interface p0 mtu_request=2400
ovs-vsctl set interface p0 mtu_request=4800

# Note: the bug may occur when all of these commands are entered at once
# (for example, pasted as one batch). If I enter them one by one and wait
# a while after each command, the bug does NOT occur.
# This again suggests a race condition between revalidator threads.

# The bug occurs:
2024-05-24T10:29:54.061Z|00001|fatal_signal(revalidator111)|WARN|terminating with signal 15 (Terminated)
# 1st: ovs-vswitchd restarts. I think this is because there are not enough hugepages?

2024-05-24T11:03:48.154Z|00858|netdev_dpdk|WARN|'p0' is trying to use device '0000:c1:00.0' which is already in use by 'p0'
2024-05-24T11:03:48.154Z|00859|netdev|WARN|p0: could not set configuration (Address already in use)
2024-05-24T11:03:48.154Z|00860|dpdk|ERR|Invalid port_id=512
# 2nd: after the restart, these messages are logged many times.
# Is this caused by a race condition between revalidator threads,
# where one thread adds p0 and another thread adds p0 again?

# The key point: this state cannot be recovered with commands such as
# `ovs-vsctl del-port br-int p0` or `ovs-vsctl set interface p0 mtu_request=1500`.
# Only restarting ovs-vswitchd recovers it.
```

2. My questions:

```
- Is this a known bug that has already been fixed? If so, in which commit?
- How can this bug be resolved?
```

Thanks~

----
Simon Jones
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
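
P.S. For anyone who wants to try the reproduction, the burst of mtu_request changes above can be sketched as a small script. This is only a convenience wrapper around the same ovs-vsctl commands listed in the report. The function name `run_repro` and the `IFACE` and `DRY_RUN` variables are made up for illustration, and by default the script only prints the commands rather than running them.

```shell
#!/bin/sh
# MTU values from the report, in the same order they were entered.
MTUS="3000 1000 2000 3100 200 300 500 3000 1500 1300 1200 800 4000 5000 600 2400 4800"

# Interface to change; p0 is the DPDK port from the report.
IFACE="${IFACE:-p0}"

# run_repro (hypothetical helper): issue every mtu_request change back to
# back, with no delay in between, to mimic entering the commands at once.
# With DRY_RUN=1 (the default) the commands are only printed.
run_repro() {
    for mtu in $MTUS; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ovs-vsctl set interface $IFACE mtu_request=$mtu"
        else
            ovs-vsctl set interface "$IFACE" mtu_request="$mtu"
        fi
    done
}

run_repro
```

Setting `DRY_RUN=0` before running the script issues the commands for real. Adding a `sleep` inside the loop would correspond to the "one by one, with a wait" case, where the bug was not observed.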