On 5/31/24 04:00, Simon Jones wrote:
> Hi all,
>
> I'm using ovs-dpdk(ovs:2.17.1, dpdk:21.11.1).
> Now I found a BUG that ovs crash and could NOT fix again after set
> request_mtu.
>
> 1. How to reproduce and my Analysis:
> ```
> # start ovs and add bridge and port and openflow
>
> [root@bogon ~]# ovs-vsctl show
> 0444869c-dc4d-462f-8caf-074ecbab1a55
>     Bridge br-int
>         datapath_type: netdev
>         Port p0
>             Interface p0
>                 type: dpdk
>                 options: {dpdk-devargs="0000:c1:00.0"}
>         Port br-int
>             Interface br-int
>                 type: internal
>     Bridge br-phy
>         datapath_type: netdev
>         Port pf1vf0
>             Interface pf1vf0
>                 type: dpdk
>                 options: {dpdk-devargs="0000:c1:00.1,representor=[0]"}
>         Port pf1vf1
>             Interface pf1vf1
>                 type: dpdk
>                 options: {dpdk-devargs="0000:c1:00.1,representor=[1]"}
>         Port br-phy
>             Interface br-phy
>                 type: internal
>         Port pf1vf3
>             Interface pf1vf3
>                 type: dpdk
>                 options: {dpdk-devargs="0000:c1:00.1,representor=[3]"}
>         Port pf1vf2
>             Interface pf1vf2
>                 type: dpdk
>                 options: {dpdk-devargs="0000:c1:00.1,representor=[2]"}
>     ovs_version: "2.17.2"
>
> [root@bogon ~]# ovs-ofctl dump-flows br-int
>  cookie=0x0, duration=60216.364s, table=0, n_packets=16923639262,
> n_bytes=984712027272, priority=0 actions=NORMAL
>
> 865084 root 10 -10 522.9g 1.6g 42808 S 17.3 0.6 175:48.23 revalidator53
> 865123 root 10 -10 522.9g 1.6g 42808 S 17.3 0.6 175:00.43 revalidator92
> 865158 root 10 -10 522.9g 1.6g 42808 S 17.3 0.6 175:58.49 revalidator127
> 865171 root 10 -10 522.9g 1.6g 42808 S 17.3 0.6 176:29.69 revalidator140
> 865058 root 10 -10 522.9g 1.6g 42808 S 16.9 0.6 176:58.03 revalidator27
> 865091 root 10 -10 522.9g 1.6g 42808 S 16.9 0.6 175:41.81 revalidator60
> 865111 root 10 -10 522.9g 1.6g 42808 S 16.9 0.6 176:05.97 revalidator80
> 865113 root 10 -10 522.9g 1.6g 42808 S 16.9 0.6 177:09.64 revalidator82
> 865130 root 10 -10 522.9g 1.6g 42808 S 16.9 0.6 176:16.27 revalidator99
> 865155 root 10 -10 522.9g 1.6g 42808 S 16.9 0.6 176:11.22 revalidator124
> 865097 root 10 -10 522.9g 1.6g 42808 S 16.6 0.6 177:00.22 revalidator66
> 865110 root 10 -10 522.9g 1.6g 42808 S 16.6 0.6 175:16.52 revalidator79
> 865149 root 10 -10 522.9g 1.6g 42808 S 16.6 0.6 176:00.84 revalidator118
> 865151 root 10 -10 522.9g 1.6g 42808 S 16.6 0.6 176:29.06 revalidator120
> 865057 root 10 -10 522.9g 1.6g 42808 S 16.3 0.6 178:03.60 revalidator26
> 865070 root 10 -10 522.9g 1.6g 42808 S 16.3 0.6 176:17.63 revalidator39
> 865112 root 10 -10 522.9g 1.6g 42808 S 16.3 0.6 175:35.65 revalidator81
> 865083 root 10 -10 522.9g 1.6g 42808 S 15.9 0.6 176:21.53 revalidator52
> 865124 root 10 -10 522.9g 1.6g 42808 S 15.9 0.6 175:31.27 revalidator93
> 865127 root 10 -10 522.9g 1.6g 42808 S 15.9 0.6 176:59.65 revalidator96
> 865147 root 10 -10 522.9g 1.6g 42808 S 15.9 0.6 176:51.85 revalidator116
> 865164 root 10 -10 522.9g 1.6g 42808 S 15.9 0.6 177:34.16 revalidator133
> 865051 root 10 -10 522.9g 1.6g 42808 S 15.6 0.6 175:27.68 revalidator20
> 865066 root 10 -10 522.9g 1.6g 42808 S 15.6 0.6 175:54.05 revalidator35
> 865087 root 10 -10 522.9g 1.6g 42808 S 15.6 0.6 175:38.54 revalidator56
> 865100 root 10 -10 522.9g 1.6g 42808 S 15.6 0.6 177:12.42 revalidator69
> 865118 root 10 -10 522.9g 1.6g 42808 S 15.6 0.6 176:02.57 revalidator87
> 865121 root 10 -10 522.9g 1.6g 42808 S 15.6 0.6 176:06.20 revalidator90
> 865132 root 10 -10 522.9g 1.6g 42808 S 15.6 0.6 177:24.71 revalidator101
> 865148 root 10 -10 522.9g 1.6g 42808 S 15.6 0.6 179:07.53 revalidator117
> 865162 root 10 -10 522.9g 1.6g 42808 S 15.6 0.6 177:18.34 revalidator131
> 865047 root 10 -10 522.9g 1.6g 42808 S 15.3 0.6 176:30.75 revalidator16
> 865080 root 10 -10 522.9g 1.6g 42808 S 15.3 0.6 175:36.41 revalidator49
> 865117 root 10 -10 522.9g 1.6g 42808 S 15.3 0.6 176:03.18 revalidator86
> 865125 root 10 -10 522.9g 1.6g 42808 S 15.3 0.6 177:15.42 revalidator94
> 865122 root 10 -10 522.9g 1.6g 42808 S 15.0 0.6 176:45.37 revalidator91
> 865065 root 10 -10 522.9g 1.6g 42808 S 14.6 0.6 176:49.66 revalidator34
> 865116 root 10 -10 522.9g 1.6g 42808 S 14.6 0.6 174:57.67 revalidator85
> 865161 root 10 -10 522.9g 1.6g 42808 S 14.6 0.6 175:10.52 revalidator130
> 865133 root 10 -10 522.9g 1.6g 42808 S 14.3 0.6 174:49.83 revalidator102
> 865016 root 10 -10 522.9g 1.6g 42808 S  0.0 0.6   1:27.68 ovs-vswitchd
> 865017 root 10 -10 522.9g 1.6g 42808 S  0.0 0.6   0:14.57 eal-intr-thread
> 865020 root 10 -10 522.9g 1.6g 42808 S  0.0 0.6   0:00.00 bond_cmd_parse_
> 865021 root 10 -10 522.9g 1.6g 42808 S  0.0 0.6   0:00.00 telemetry-v2
> 865022 root 10 -10 522.9g 1.6g 42808 S  0.0 0.6   0:00.65 dpdk_watchdog1
> 865023 root 10 -10 522.9g 1.6g 42808 S  0.0 0.6   0:10.16 urcu2
> 865025 root 10 -10 522.9g 1.6g 42808 S  0.0 0.6   0:36.14 ct_clean3
> 865026 root 10 -10 522.9g 1.6g 42808 S  0.0 0.6   0:00.04 ipf_clean4
> 865027 root 10 -10 522.9g 1.6g 42808 S  0.0 0.6   0:12.28 hw_offload5
> 865028 root 10 -10 522.9g 1.6g 42808 S  0.0 0.6   0:00.00 pmd-c106/id:6
> 865030 root 10 -10 522.9g 1.6g 42808 S  0.0 0.6   0:00.00 pmd-c88/id:8
> 865031 root 10 -10 522.9g 1.6g 42808 S  0.0 0.6   0:00.00 pmd-c21/id:9
> 865032 root 10 -10 522.9g 1.6g 42808 S  0.0 0.6   0:00.00 pmd-c78/id:10
> 865033 root 10 -10 522.9g 1.6g 42808 S  0.0 0.6   0:00.00 pmd-c124/id:11
> 865035 root 10 -10 522.9g 1.6g 42808 S  0.0 0.6   0:00.00 pmd-c96/id:13
>
> Notice here, I found that if only one revalidator, there is no BUG.
> So maybe thread race-condition of revalidator?
>
> # type these commands
>
> ovs-vsctl set interface p0 mtu_request=3000
> ovs-vsctl set interface p0 mtu_request=1000
> ovs-vsctl set interface p0 mtu_request=2000
> ovs-vsctl set interface p0 mtu_request=3100
> ovs-vsctl set interface p0 mtu_request=200
> ovs-vsctl set interface p0 mtu_request=300
> ovs-vsctl set interface p0 mtu_request=500
> ovs-vsctl set interface p0 mtu_request=3000
> ovs-vsctl set interface p0 mtu_request=1500
> ovs-vsctl set interface p0 mtu_request=1300
> ovs-vsctl set interface p0 mtu_request=1200
> ovs-vsctl set interface p0 mtu_request=800
> ovs-vsctl set interface p0 mtu_request=4000
> ovs-vsctl set interface p0 mtu_request=5000
> ovs-vsctl set interface p0 mtu_request=600
> ovs-vsctl set interface p0 mtu_request=2400
> ovs-vsctl set interface p0 mtu_request=4800
>
> Notice, type these commands at one time, the BUG may happen.
> But if type commands one by one, which type one command and wait for a
> time, the BUG will NOT happen.
> So maybe thread race-condition revalidator?
>
> # BUG happen
>
> 2024-05-24T10:29:54.061Z|00001|fatal_signal(revalidator111)|WARN|terminating
> with signal 15 (Terminated)

This is not a crash or a bug.  Signal 15 is SIGTERM.  It was sent by some
other process to ask OVS to terminate itself.  You need to find the process
that sends it.  If you're running OVS inside a container, the usual suspect
is container termination: container runtimes usually send SIGTERM to the
processes inside before stopping the container.

> # 1st, ovs-vswitch restart, I think this is because hugepage is not enough?
> 2024-05-24T11:03:48.154Z|00858|netdev_dpdk|WARN|'p0' is trying to use
> device '0000:c1:00.0' which is already in use by 'p0'

This looks strange, I'm not sure how that can happen.

> 2024-05-24T11:03:48.154Z|00859|netdev|WARN|p0: could not set configuration
> (Address already in use)
> 2024-05-24T11:03:48.154Z|00860|dpdk|ERR|Invalid port_id=512
> # 2nd, after restart, lots of this log.
> # Is this caused by thread race-condition of revalidator? Which one thread
> add p0, but another add p0 again?

Port additions happen in a single thread, so there should be no race.

>
> But the key is, this condition could not recover by such as `ovs-vsctl
> del-port br-int p0` or `ovs-vsctl set interface p0 mtu_request=1500`.
> Only restart ovs-vswitch could recover.
> ```
>
> 2. My question
> ```
> - Is this a BUG which has already been resolved? If it is, which commit?
> - How to resolve this BUG?
> ```
>
> Thanks~
>
> ----
> Simon Jones
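
On the SIGTERM point above: a first step toward finding the sender is to pull
every termination record out of the ovs-vswitchd log, so the timestamps can be
compared with container runtime or service manager events.  Below is a rough
sketch, not an OVS tool.  The function name find_fatal_signals is made up for
illustration, and it assumes the `time|seq|module(thread)|level|message` line
layout seen in the log excerpt quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of OVS): list "terminating with signal"
# records from one or more ovs-vswitchd log files, or from stdin.
# Output columns: <timestamp> <thread> <signal number>
# Assumes '|'-separated fields: time|seq|module(thread)|level|message
find_fatal_signals() {
    awk -F'|' '
        $3 ~ /^fatal_signal/ && $5 ~ /terminating with signal/ {
            # The thread name sits in parentheses after the module name;
            # fall back to "main" when no parentheses are present.
            thread = "main"
            if (match($3, /\(.*\)/))
                thread = substr($3, RSTART + 1, RLENGTH - 2)
            # The signal number is the word right after "signal".
            n = split($5, words, " ")
            for (i = 1; i < n; i++)
                if (words[i] == "signal") {
                    print $1, thread, words[i + 1]
                    break
                }
        }
    ' "$@"
}
```

For example, `find_fatal_signals /var/log/openvswitch/ovs-vswitchd.log` (the
log path is an assumption, it depends on how OVS was installed).  This only
tells you when the signal arrived and which signal it was, not who sent it;
for that, check what the container runtime or init system was doing at the
reported time.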