Hello,
we have a really strange issue: ovs-vswitchd gets stuck in ovs_rcu
waiting for a revalidator thread to quiesce, reporting the following to
the log:

2019-10-14T05:11:18.049Z|00416|ovs_rcu|WARN|blocked 1000 ms waiting for
revalidator518 to quiesce
2019-10-14T05:11:19.049Z|00417|ovs_rcu|WARN|blocked 2000 ms waiting for
revalidator518 to quiesce
2019-10-14T05:11:21.049Z|00418|ovs_rcu|WARN|blocked 4000 ms waiting for
revalidator518 to quiesce
2019-10-14T05:11:21.372Z|00001|ovs_rcu(urcu4)|WARN|blocked 1001 ms
waiting for revalidator518 to quiesce
2019-10-14T05:11:22.371Z|00002|ovs_rcu(urcu4)|WARN|blocked 2000 ms
waiting for revalidator518 to quiesce
2019-10-14T05:11:24.371Z|00003|ovs_rcu(urcu4)|WARN|blocked 4000 ms
waiting for revalidator518 to quiesce
2019-10-14T05:11:25.049Z|00419|ovs_rcu|WARN|blocked 8000 ms waiting for
revalidator518 to quiesce
2019-10-14T05:11:28.371Z|00004|ovs_rcu(urcu4)|WARN|blocked 8000 ms
waiting for revalidator518 to quiesce
2019-10-14T05:11:33.050Z|00420|ovs_rcu|WARN|blocked 16000 ms waiting for
revalidator518 to quiesce
2019-10-14T05:11:36.372Z|00005|ovs_rcu(urcu4)|WARN|blocked 16000 ms
waiting for revalidator518 to quiesce
2019-10-14T05:11:49.049Z|00421|ovs_rcu|WARN|blocked 32000 ms waiting for
revalidator518 to quiesce
2019-10-14T05:11:52.371Z|00006|ovs_rcu(urcu4)|WARN|blocked 32000 ms
waiting for revalidator518 to quiesce
2019-10-14T05:12:21.050Z|00422|ovs_rcu|WARN|blocked 64000 ms waiting for
revalidator518 to quiesce
2019-10-14T05:12:24.371Z|00007|ovs_rcu(urcu4)|WARN|blocked 64000 ms
waiting for revalidator518 to quiesce
2019-10-14T05:13:25.049Z|00423|ovs_rcu|WARN|blocked 128000 ms waiting
for revalidator518 to quiesce
2019-10-14T05:13:28.371Z|00008|ovs_rcu(urcu4)|WARN|blocked 128000 ms
waiting for revalidator518 to quiesce
2019-10-14T05:15:33.050Z|00424|ovs_rcu|WARN|blocked 256000 ms waiting
for revalidator518 to quiesce
2019-10-14T05:15:36.372Z|00009|ovs_rcu(urcu4)|WARN|blocked 256001 ms
waiting for revalidator518 to quiesce
2019-10-14T05:19:49.051Z|00425|ovs_rcu|WARN|blocked 512002 ms waiting
for revalidator518 to quiesce
2019-10-14T05:19:52.372Z|00010|ovs_rcu(urcu4)|WARN|blocked 512001 ms
waiting for revalidator518 to quiesce
2019-10-14T05:28:21.050Z|00426|ovs_rcu|WARN|blocked 1024000 ms waiting
for revalidator518 to quiesce
2019-10-14T05:28:24.372Z|00011|ovs_rcu(urcu4)|WARN|blocked 1024001 ms
waiting for revalidator518 to quiesce
2019-10-14T05:45:25.050Z|00427|ovs_rcu|WARN|blocked 2048000 ms waiting
for revalidator518 to quiesce
2019-10-14T05:45:28.373Z|00012|ovs_rcu(urcu4)|WARN|blocked 2048002 ms
waiting for revalidator518 to quiesce
2019-10-14T06:19:33.053Z|00428|ovs_rcu|WARN|blocked 4096004 ms waiting
for revalidator518 to quiesce
2019-10-14T06:19:36.372Z|00013|ovs_rcu(urcu4)|WARN|blocked 4096001 ms
waiting for revalidator518 to quiesce

As a result, all calls to ovs-vswitchd get blocked; for example,
ovs-ofctl dump-flows br-int just hangs.

We have tried to debug this issue and noticed that running strace
against the process or producing a core dump magically unsticks
ovs-vswitchd, and it starts working again immediately:

gcore `pidof ovs-vswitchd`

The core dump size is always 3.7 GB.
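
For anyone who wants per-thread backtraces from such a core, something
like this should work (a minimal sketch; the binary path and the core
file name are assumptions based on a standard Ubuntu package install):

# Assumes debug symbols are available and the daemon binary lives at
# /usr/sbin/ovs-vswitchd; gcore names its output core.<pid> by default.
gdb -batch -ex "thread apply all bt" /usr/sbin/ovs-vswitchd core.<pid>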

Any ideas what causes this, and how we can mitigate it? We have this
gcore workaround that we can apply at runtime as soon as our monitoring
detects a frozen ovs-vswitchd (a rough sketch is below), but we need a
definitive solution, as this is clearly a bug.
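
For reference, the workaround looks roughly like this (a minimal sketch
of that kind of watchdog; the 10-second timeout, the 30-second check
interval and the output path here are assumptions, not exactly what we
run):

#!/bin/sh
# Minimal watchdog sketch: treat ovs-vswitchd as frozen when a trivial
# ovs-ofctl call stops returning, then force a core dump with gcore,
# which in our case unblocks the daemon again.  Timeout, interval and
# output path are placeholder assumptions.
while true; do
    if ! timeout 10 ovs-ofctl dump-flows br-int >/dev/null 2>&1; then
        pid=$(pidof ovs-vswitchd)
        [ -n "$pid" ] && gcore -o /var/tmp/ovs-vswitchd "$pid"
    fi
    sleep 30
done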

OS version: Ubuntu Bionic
Kernel version: 5.0.0-31-generic (Ubuntu HWE kernel; the issue was also
present on a 4.x kernel)
Open vSwitch version: 2.12.0 (compiled from the latest Ubuntu source;
the previous 2.11.x version also showed the same issue)

Thanks
Zdenek Janda
cloudinfrastack