Right, that does seem likely; I will look there first.
What do you think of the case where the thread is "main"? I have
examples of this one as well, but have not been able to figure out so
far what could cause it.
...
ovs-vswitchd.log.1.1.1.1:2016-01-23T01:47:19.026Z|00016|ovs_rcu(urcu2)|WARN|blocked 32768000 ms waiting for main to quiesce
ovs-vswitchd.log.1.1.1.1:2016-01-23T10:53:27.026Z|00017|ovs_rcu(urcu2)|WARN|blocked 65536000 ms waiting for main to quiesce
ovs-vswitchd.log.1.1.1.1:2016-01-24T05:05:43.026Z|00018|ovs_rcu(urcu2)|WARN|blocked 131072000 ms waiting for main to quiesce
ovs-vswitchd.log.1.1.1.1:2016-01-24T18:24:40.826Z|00001|ovs_rcu(urcu1)|WARN|blocked 1092 ms waiting for main to quiesce
ovs-vswitchd.log.1.1.1.1:2016-01-24T18:24:41.805Z|00002|ovs_rcu(urcu1)|WARN|blocked 2072 ms waiting for main to quiesce
...
Could it be connected with the deletion of a netdev port?
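My mental model of the problem, in case it helps, is the toy pthread
program below. It is not the real lib/ovs-rcu.c (names and numbers are
made up); it only illustrates the quiesce contract: an RCU waiter cannot
make progress until every registered thread has announced a quiescent
point, so a single thread that never quiesces blocks it forever and
produces exactly this kind of escalating warning.

/* Toy model only -- not the real lib/ovs-rcu.c. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static bool main_quiesced;          /* Stand-in for "main" reaching a quiescent point. */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

static void *
urcu_thread(void *arg)
{
    long long blocked_ms = 1000;    /* Reported interval doubles, as in the ovs_rcu log. */
    (void) arg;

    for (;;) {
        bool ok;

        pthread_mutex_lock(&mtx);
        ok = main_quiesced;
        pthread_mutex_unlock(&mtx);
        if (ok) {
            break;                  /* Grace period over: deferred frees would be safe now. */
        }
        printf("blocked %lld ms waiting for main to quiesce\n", blocked_ms);
        blocked_ms *= 2;
        sleep(1);
    }
    return NULL;
}

int
main(void)
{
    pthread_t t;

    pthread_create(&t, NULL, urcu_thread, NULL);

    /* If main blocks here indefinitely (hung ioctl, mutex held by a stuck
     * thread, ...) instead of returning to its poll loop where it would
     * quiesce, the warnings above never stop -- which matches what the
     * logs above show. */
    pause();

    pthread_join(t, NULL);          /* Never reached in the broken case. */
    return 0;
}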
Regards,
Patrik
On 01/25/2016 07:50 PM, Ben Pfaff wrote:
On Mon, Jan 25, 2016 at 03:09:09PM +0100, Patrik Andersson R wrote:
during robustness testing, where VMs are booted and deleted with nova
boot/delete in rather rapid succession, VMs get stuck in the spawning
state after a few test cycles. Presumably this is because OVS stops
responding to port additions and deletions, or rather because responses
to these requests become painfully slow. Other requests towards the
vswitchd also fail to complete in any reasonable time frame;
ovs-appctl vlog/set is one example.
The only conclusion I can draw at the moment is that some thread (I have
observed main and dpdk_watchdog3) blocks the ovsrcu_synchronize()
operation for an "infinite" time, and there is no fall-back to get out
of this. To recover, the minimum intervention seems to be a restart of
the openvswitch-switch service, but that seems to cause other issues
longer term.
When this happens, the following can be observed in the vswitchd log:
2016-01-24T20:36:14.601Z|02742|ovs_rcu(vhost_thread2)|WARN|blocked 1000 ms waiting for dpdk_watchdog3 to quiesce
This looks like a bug somewhere in the DPDK code. The watchdog code is
really simple:
static void *
dpdk_watchdog(void *dummy OVS_UNUSED)
{
    struct netdev_dpdk *dev;

    pthread_detach(pthread_self());

    for (;;) {
        ovs_mutex_lock(&dpdk_mutex);
        LIST_FOR_EACH (dev, list_node, &dpdk_list) {
            ovs_mutex_lock(&dev->mutex);
            check_link_status(dev);
            ovs_mutex_unlock(&dev->mutex);
        }
        ovs_mutex_unlock(&dpdk_mutex);
        xsleep(DPDK_PORT_WATCHDOG_INTERVAL);
    }

    return NULL;
}
Although it looks at first glance like it doesn't quiesce, xsleep() does
that internally, so I guess check_link_status() must be hanging.
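For reference, xsleep() does roughly this (a sketch from memory, not
verbatim from lib/timeval.c), which is why the sleeping watchdog itself
counts as quiesced:

void
xsleep(unsigned int seconds)
{
    ovsrcu_quiesce_start();     /* The thread is quiescent for the whole sleep... */
    sleep(seconds);
    ovsrcu_quiesce_end();       /* ...and becomes active again when it wakes up. */
}

So if the thread is stuck inside check_link_status(), it never reaches
the next xsleep(), never quiesces, and everything waiting in
ovsrcu_synchronize() backs up behind it.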