On 5/31/24 04:00, Simon Jones wrote:
> Hi all,
> 
> I'm using OVS-DPDK (OVS 2.17.1, DPDK 21.11.1).
> I found a bug: OVS crashes after mtu_request is set repeatedly and does
> not recover.
> 
> 1. How to reproduce, and my analysis:
> ```
> # start ovs and add bridge and port and openflow
> 
> [root@bogon ~]# ovs-vsctl show
> 0444869c-dc4d-462f-8caf-074ecbab1a55
>     Bridge br-int
>         datapath_type: netdev
>         Port p0
>             Interface p0
>                 type: dpdk
>                 options: {dpdk-devargs="0000:c1:00.0"}
>         Port br-int
>             Interface br-int
>                 type: internal
>     Bridge br-phy
>         datapath_type: netdev
>         Port pf1vf0
>             Interface pf1vf0
>                 type: dpdk
>                 options: {dpdk-devargs="0000:c1:00.1,representor=[0]"}
>         Port pf1vf1
>             Interface pf1vf1
>                 type: dpdk
>                 options: {dpdk-devargs="0000:c1:00.1,representor=[1]"}
>         Port br-phy
>             Interface br-phy
>                 type: internal
>         Port pf1vf3
>             Interface pf1vf3
>                 type: dpdk
>                 options: {dpdk-devargs="0000:c1:00.1,representor=[3]"}
>         Port pf1vf2
>             Interface pf1vf2
>                 type: dpdk
>                 options: {dpdk-devargs="0000:c1:00.1,representor=[2]"}
>     ovs_version: "2.17.2"
> 
> [root@bogon ~]# ovs-ofctl dump-flows br-int
>  cookie=0x0, duration=60216.364s, table=0, n_packets=16923639262, n_bytes=984712027272, priority=0 actions=NORMAL
> 
>  865084 root      10 -10  522.9g   1.6g  42808 S  17.3   0.6 175:48.23 revalidator53
>  865123 root      10 -10  522.9g   1.6g  42808 S  17.3   0.6 175:00.43 revalidator92
>  865158 root      10 -10  522.9g   1.6g  42808 S  17.3   0.6 175:58.49 revalidator127
>  865171 root      10 -10  522.9g   1.6g  42808 S  17.3   0.6 176:29.69 revalidator140
>  865058 root      10 -10  522.9g   1.6g  42808 S  16.9   0.6 176:58.03 revalidator27
>  865091 root      10 -10  522.9g   1.6g  42808 S  16.9   0.6 175:41.81 revalidator60
>  865111 root      10 -10  522.9g   1.6g  42808 S  16.9   0.6 176:05.97 revalidator80
>  865113 root      10 -10  522.9g   1.6g  42808 S  16.9   0.6 177:09.64 revalidator82
>  865130 root      10 -10  522.9g   1.6g  42808 S  16.9   0.6 176:16.27 revalidator99
>  865155 root      10 -10  522.9g   1.6g  42808 S  16.9   0.6 176:11.22 revalidator124
>  865097 root      10 -10  522.9g   1.6g  42808 S  16.6   0.6 177:00.22 revalidator66
>  865110 root      10 -10  522.9g   1.6g  42808 S  16.6   0.6 175:16.52 revalidator79
>  865149 root      10 -10  522.9g   1.6g  42808 S  16.6   0.6 176:00.84 revalidator118
>  865151 root      10 -10  522.9g   1.6g  42808 S  16.6   0.6 176:29.06 revalidator120
>  865057 root      10 -10  522.9g   1.6g  42808 S  16.3   0.6 178:03.60 revalidator26
>  865070 root      10 -10  522.9g   1.6g  42808 S  16.3   0.6 176:17.63 revalidator39
>  865112 root      10 -10  522.9g   1.6g  42808 S  16.3   0.6 175:35.65 revalidator81
>  865083 root      10 -10  522.9g   1.6g  42808 S  15.9   0.6 176:21.53 revalidator52
>  865124 root      10 -10  522.9g   1.6g  42808 S  15.9   0.6 175:31.27 revalidator93
>  865127 root      10 -10  522.9g   1.6g  42808 S  15.9   0.6 176:59.65 revalidator96
>  865147 root      10 -10  522.9g   1.6g  42808 S  15.9   0.6 176:51.85 revalidator116
>  865164 root      10 -10  522.9g   1.6g  42808 S  15.9   0.6 177:34.16 revalidator133
>  865051 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 175:27.68 revalidator20
>  865066 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 175:54.05 revalidator35
>  865087 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 175:38.54 revalidator56
>  865100 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 177:12.42 revalidator69
>  865118 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 176:02.57 revalidator87
>  865121 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 176:06.20 revalidator90
>  865132 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 177:24.71 revalidator101
>  865148 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 179:07.53 revalidator117
>  865162 root      10 -10  522.9g   1.6g  42808 S  15.6   0.6 177:18.34 revalidator131
>  865047 root      10 -10  522.9g   1.6g  42808 S  15.3   0.6 176:30.75 revalidator16
>  865080 root      10 -10  522.9g   1.6g  42808 S  15.3   0.6 175:36.41 revalidator49
>  865117 root      10 -10  522.9g   1.6g  42808 S  15.3   0.6 176:03.18 revalidator86
>  865125 root      10 -10  522.9g   1.6g  42808 S  15.3   0.6 177:15.42 revalidator94
>  865122 root      10 -10  522.9g   1.6g  42808 S  15.0   0.6 176:45.37 revalidator91
>  865065 root      10 -10  522.9g   1.6g  42808 S  14.6   0.6 176:49.66 revalidator34
>  865116 root      10 -10  522.9g   1.6g  42808 S  14.6   0.6 174:57.67 revalidator85
>  865161 root      10 -10  522.9g   1.6g  42808 S  14.6   0.6 175:10.52 revalidator130
>  865133 root      10 -10  522.9g   1.6g  42808 S  14.3   0.6 174:49.83 revalidator102
>  865016 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   1:27.68 ovs-vswitchd
>  865017 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:14.57 eal-intr-thread
>  865020 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00 bond_cmd_parse_
>  865021 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00 telemetry-v2
>  865022 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.65 dpdk_watchdog1
>  865023 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:10.16 urcu2
>  865025 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:36.14 ct_clean3
>  865026 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.04 ipf_clean4
>  865027 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:12.28 hw_offload5
>  865028 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00 pmd-c106/id:6
>  865030 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00 pmd-c88/id:8
>  865031 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00 pmd-c21/id:9
>  865032 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00 pmd-c78/id:10
>  865033 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00 pmd-c124/id:11
>  865035 root      10 -10  522.9g   1.6g  42808 S   0.0   0.6   0:00.00 pmd-c96/id:13
> 
> Note: I found that with only one revalidator thread, the bug does not occur.
> So maybe this is a race condition between revalidator threads?
> 
> # type these commands
> 
> ovs-vsctl set interface p0 mtu_request=3000
> ovs-vsctl set interface p0 mtu_request=1000
> ovs-vsctl set interface p0 mtu_request=2000
> ovs-vsctl set interface p0 mtu_request=3100
> ovs-vsctl set interface p0 mtu_request=200
> ovs-vsctl set interface p0 mtu_request=300
> ovs-vsctl set interface p0 mtu_request=500
> ovs-vsctl set interface p0 mtu_request=3000
> ovs-vsctl set interface p0 mtu_request=1500
> ovs-vsctl set interface p0 mtu_request=1300
> ovs-vsctl set interface p0 mtu_request=1200
> ovs-vsctl set interface p0 mtu_request=800
> ovs-vsctl set interface p0 mtu_request=4000
> ovs-vsctl set interface p0 mtu_request=5000
> ovs-vsctl set interface p0 mtu_request=600
> ovs-vsctl set interface p0 mtu_request=2400
> ovs-vsctl set interface p0 mtu_request=4800
> 
> Note: the bug may occur when these commands are issued all at once.
> If they are run one at a time, with a pause after each command, the bug
> does NOT occur.
> So maybe this is a race condition between revalidator threads?
> 
> # the bug occurs
> 
> 2024-05-24T10:29:54.061Z|00001|fatal_signal(revalidator111)|WARN|terminating with signal 15 (Terminated)

This is not a crash or a bug.  Signal 15 is SIGTERM.  It was sent by some
other process to ask OVS to terminate itself.  You need to find the process
that sends it.

If you're running OVS inside a container, the usual suspect is container
termination.  Container runtimes usually send SIGTERM to the processes
inside before stopping the container.
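
One possible way to track down the sender, as a rough sketch rather than a
verified recipe for this setup: decode the signal number from the log, then
let the Linux audit subsystem record kill() calls that deliver signal 15.
This assumes auditd is installed, an x86-64 host, and root access.  The
function names and the `ovs_sigterm` key are made up for illustration.

```shell
#!/bin/sh
# signal_name: translate the signal number from the OVS log into its name.
signal_name() {
    kill -l "$1"
}

# watch_sigterm (hypothetical helper, needs root and auditd): record every
# kill() syscall whose second argument (a1) is 15.  Signals sent through
# other syscalls, e.g. tgkill(), would need additional rules.
watch_sigterm() {
    auditctl -a always,exit -F arch=b64 -S kill -F a1=15 -k ovs_sigterm
}

# show_senders (hypothetical helper): list the recorded events, including
# the PID and command of the sending process.
show_senders() {
    ausearch -k ovs_sigterm -i
}

# Only the harmless lookup runs here; the audit helpers are just defined.
signal_name 15
```

After calling `watch_sigterm` and reproducing the termination,
`show_senders` should show which process issued the signal.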

> # 1st: ovs-vswitchd restarts. I think this is because there are not enough hugepages?
> 2024-05-24T11:03:48.154Z|00858|netdev_dpdk|WARN|'p0' is trying to use device '0000:c1:00.0' which is already in use by 'p0'

This looks strange, I'm not sure how that can happen.

> 2024-05-24T11:03:48.154Z|00859|netdev|WARN|p0: could not set configuration (Address already in use)
> 2024-05-24T11:03:48.154Z|00860|dpdk|ERR|Invalid port_id=512
> # 2nd: after the restart, these log messages repeat many times.
> # Is this caused by a race condition between revalidator threads, where
> # one thread adds p0 and another adds p0 again?

Port additions happen in a single thread, so there should be no race.

> 
> But the key point is that this condition cannot be recovered with, for
> example, `ovs-vsctl del-port br-int p0` or
> `ovs-vsctl set interface p0 mtu_request=1500`.
> Only restarting ovs-vswitchd recovers it.
> ```
> 
> 2. My question
> ```
> - Is this a known bug that has already been fixed? If so, in which commit?
> - How can this bug be resolved?
> ```
> 
> Thanks~
> 
> ----
> Simon Jones

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
