Daniele, I think that you said in our meeting today that there was some
sort of bug that falsely blames a thread. Can you explain further?
On Mon, Jan 25, 2016 at 09:29:52PM +0100, Patrik Andersson R wrote:
> Right, that is quite likely. I will look there first.
>
> What do you think of the case where the thread is "main"? I have
> examples of this as well, but so far I have not been able to figure out
> what would cause it.
>
> ...
> ovs-vswitchd.log.1.1.1.1:2016-01-23T01:47:19.026Z|00016|ovs_rcu(urcu2)|WARN|blocked 32768000 ms waiting for main to quiesce
> ovs-vswitchd.log.1.1.1.1:2016-01-23T10:53:27.026Z|00017|ovs_rcu(urcu2)|WARN|blocked 65536000 ms waiting for main to quiesce
> ovs-vswitchd.log.1.1.1.1:2016-01-24T05:05:43.026Z|00018|ovs_rcu(urcu2)|WARN|blocked 131072000 ms waiting for main to quiesce
> ovs-vswitchd.log.1.1.1.1:2016-01-24T18:24:40.826Z|00001|ovs_rcu(urcu1)|WARN|blocked 1092 ms waiting for main to quiesce
> ovs-vswitchd.log.1.1.1.1:2016-01-24T18:24:41.805Z|00002|ovs_rcu(urcu1)|WARN|blocked 2072 ms waiting for main to quiesce
> ...
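The intervals in the excerpt above roughly double from one warning to the
next (1092 ms, 2072 ms, ... up to 131072000 ms), which is consistent with a
waiter that blocks until every registered thread has quiesced and doubles
its warning threshold after each report. The following is only a rough
sketch of that pattern, not the actual lib/ovs-rcu.c code; now_msec() and
all_threads_quiesced_since() are hypothetical stand-ins for the real seqno
bookkeeping:

    /* Rough sketch only (not the actual lib/ovs-rcu.c code): block until
     * every registered thread has quiesced, doubling the warning interval
     * after each "blocked N ms" message. */
    #include <inttypes.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Hypothetical stand-ins for the real clock and seqno bookkeeping. */
    extern uint64_t now_msec(void);
    extern bool all_threads_quiesced_since(uint64_t target_seqno,
                                           const char **stalled_thread);

    static void
    synchronize_sketch(uint64_t target_seqno)
    {
        uint64_t start = now_msec();
        uint64_t warn_at = 1000;            /* First warning after ~1 s. */
        const char *stalled;

        while (!all_threads_quiesced_since(target_seqno, &stalled)) {
            uint64_t elapsed = now_msec() - start;

            if (elapsed >= warn_at) {
                fprintf(stderr,
                        "blocked %"PRIu64" ms waiting for %s to quiesce\n",
                        elapsed, stalled);
                warn_at *= 2;               /* 1000, 2000, 4000, ... ms. */
            }
            usleep(10 * 1000);              /* Check again shortly. */
        }
    }

Note that a loop like this has no timeout or give-up path: the only way out
is for the stalled thread to quiesce, which matches the "no fallback"
observation below.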
>
> Could it be in connection with a deletion of a netdev port?
>
> Regards,
>
> Patrik
>
>
> On 01/25/2016 07:50 PM, Ben Pfaff wrote:
> >On Mon, Jan 25, 2016 at 03:09:09PM +0100, Patrik Andersson R wrote:
> >>During robustness testing, where VMs are booted and deleted using nova
> >>boot/delete in rather rapid succession, VMs get stuck in the spawning
> >>state after a few test cycles. Presumably this is because OVS no longer
> >>responds to port additions and deletions, or rather because responses to
> >>these requests become painfully slow. Other requests to ovs-vswitchd also
> >>fail to complete in any reasonable time frame; ovs-appctl vlog/set is one
> >>example.
> >>
> >>The only conclusion I can draw at the moment is that some thread (I've
> >>observed main and dpdk_watchdog3) blocks the ovsrcu_synchronize()
> >>operation for an "infinite" time, and there is no fallback to get out of
> >>this. The minimum recovery action seems to be a restart of the
> >>openvswitch-switch service, but that appears to cause other issues longer
> >>term.
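For reference, "quiesce" here means that each long-running OVS thread
periodically declares a quiescent state through the ovs-rcu API
(ovsrcu_quiesce(), or ovsrcu_quiesce_start()/ovsrcu_quiesce_end() around a
long block), and ovsrcu_synchronize() waits until every registered thread
has done so. Below is a minimal sketch of that pattern, assuming
hypothetical do_one_iteration() and wait_for_work() helpers:

    /* Minimal sketch of the quiescing pattern a long-running OVS thread is
     * expected to follow (see lib/ovs-rcu.h).  do_one_iteration() and
     * wait_for_work() are hypothetical stand-ins for the thread's real
     * work. */
    #include "ovs-rcu.h"

    extern void do_one_iteration(void);
    extern void wait_for_work(void);    /* May block for a long time. */

    static void *
    worker_main(void *arg)
    {
        (void) arg;

        for (;;) {
            do_one_iteration();

            /* Declare a quiescent state between iterations so that
             * ovsrcu_synchronize() callers are not left waiting on us. */
            ovsrcu_quiesce();

            /* Bracket a potentially long block explicitly; without this,
             * a hang in wait_for_work() stalls every ovsrcu_synchronize()
             * caller. */
            ovsrcu_quiesce_start();
            wait_for_work();
            ovsrcu_quiesce_end();
        }
        return NULL;
    }

A thread that stops reaching one of these calls, for example because it is
stuck in a blocking call while not quiescent, keeps ovsrcu_synchronize()
waiting indefinitely, which is exactly the lack of a fallback described
above.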
> >>
> >>When this happens, the following can be observed in the ovs-vswitchd log:
> >>
> >>2016-01-24T20:36:14.601Z|02742|ovs_rcu(vhost_thread2)|WARN|blocked 1000 ms waiting for dpdk_watchdog3 to quiesce
> >This looks like a bug somewhere in the DPDK code. The watchdog code is
> >really simple:
> >
> > static void *
> > dpdk_watchdog(void *dummy OVS_UNUSED)
> > {
> >     struct netdev_dpdk *dev;
> >
> >     pthread_detach(pthread_self());
> >
> >     for (;;) {
> >         ovs_mutex_lock(&dpdk_mutex);
> >         LIST_FOR_EACH (dev, list_node, &dpdk_list) {
> >             ovs_mutex_lock(&dev->mutex);
> >             check_link_status(dev);
> >             ovs_mutex_unlock(&dev->mutex);
> >         }
> >         ovs_mutex_unlock(&dpdk_mutex);
> >         xsleep(DPDK_PORT_WATCHDOG_INTERVAL);
> >     }
> >
> >     return NULL;
> > }
> >
> >Although it looks at first glance like it doesn't quiesce, xsleep() does
> >that internally, so I guess check_link_status() must be hanging.
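To spell out the xsleep() point: the sleep itself is bracketed by quiesce
markers, so dpdk_watchdog is quiescent for almost its entire loop, and the
only stretch in which it cannot quiesce is while it holds dpdk_mutex and
runs check_link_status(). A simplified sketch of that behavior (the real
xsleep() lives in lib/timeval.c and differs in detail):

    /* Simplified sketch of the behavior described above: the sleep is
     * bracketed by quiesce markers, so the watchdog only fails to quiesce
     * if check_link_status() itself hangs. */
    #include <unistd.h>
    #include "ovs-rcu.h"

    static void
    xsleep_sketch(unsigned int seconds)
    {
        ovsrcu_quiesce_start();     /* RCU may make progress while asleep... */
        sleep(seconds);
        ovsrcu_quiesce_end();       /* ...and we re-register on wakeup. */
    }

If check_link_status() really is hanging, the thread is also still holding
dpdk_mutex at that point, so port additions and deletions on other threads
would stall as well, which would fit the slow port operations and the
netdev port deletion question above.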
>