Daniele, I think that you said in our meeting today that there was some
sort of bug that falsely blames a thread. Can you explain further?
On Mon, Jan 25, 2016 at 09:29:52PM +0100, Patrik Andersson R wrote:
> Right, that is quite likely. I will look there first.
>
> What do you think of the case where the thread is "main"? I have
> examples of this as well, but so far I have not been able to figure out
> what would cause it.
>
> ...
> ovs-vswitchd.log.1.1.1.1:2016-01-23T01:47:19.026Z|00016|ovs_rcu(urcu2)|WARN|blocked 32768000 ms waiting for main to quiesce
> ovs-vswitchd.log.1.1.1.1:2016-01-23T10:53:27.026Z|00017|ovs_rcu(urcu2)|WARN|blocked 65536000 ms waiting for main to quiesce
> ovs-vswitchd.log.1.1.1.1:2016-01-24T05:05:43.026Z|00018|ovs_rcu(urcu2)|WARN|blocked 131072000 ms waiting for main to quiesce
> ovs-vswitchd.log.1.1.1.1:2016-01-24T18:24:40.826Z|00001|ovs_rcu(urcu1)|WARN|blocked 1092 ms waiting for main to quiesce
> ovs-vswitchd.log.1.1.1.1:2016-01-24T18:24:41.805Z|00002|ovs_rcu(urcu1)|WARN|blocked 2072 ms waiting for main to quiesce
> ...
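The intervals in the excerpt above roughly double from one warning to the
next (1092 ms, 2072 ms, ... up to 131072000 ms), which is consistent with a
waiter that blocks until every registered thread has quiesced and doubles
its warning threshold after each report. The following is only a rough
sketch of that pattern, not the actual lib/ovs-rcu.c code; now_msec() and
all_threads_quiesced_since() are hypothetical stand-ins for the real seqno
bookkeeping:

    /* Rough sketch only (not the actual lib/ovs-rcu.c code): block until
     * every registered thread has quiesced, doubling the warning interval
     * after each "blocked N ms" message. */
    #include <inttypes.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Hypothetical stand-ins for the real clock and seqno bookkeeping. */
    extern uint64_t now_msec(void);
    extern bool all_threads_quiesced_since(uint64_t target_seqno,
                                           const char **stalled_thread);

    static void
    synchronize_sketch(uint64_t target_seqno)
    {
        uint64_t start = now_msec();
        uint64_t warn_at = 1000;            /* First warning after ~1 s. */
        const char *stalled;

        while (!all_threads_quiesced_since(target_seqno, &stalled)) {
            uint64_t elapsed = now_msec() - start;

            if (elapsed >= warn_at) {
                fprintf(stderr,
                        "blocked %"PRIu64" ms waiting for %s to quiesce\n",
                        elapsed, stalled);
                warn_at *= 2;               /* 1000, 2000, 4000, ... ms. */
            }
            usleep(10 * 1000);              /* Check again shortly. */
        }
    }

Note that a loop like this has no timeout or give-up path: the only way out
is for the stalled thread to quiesce, which matches the "no fallback"
observation below.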
>
> Could it be in connection with a deletion of a netdev port?
>
> Regards,
>
> Patrik
>
>
> On 01/25/2016 07:50 PM, Ben Pfaff wrote:
> >On Mon, Jan 25, 2016 at 03:09:09PM +0100, Patrik Andersson R wrote:
> >>During robustness testing, where VMs are booted and deleted using nova
> >>boot/delete in rather rapid succession, VMs get stuck in the spawning
> >>state after a few test cycles. Presumably this is because OVS no longer
> >>responds to port additions and deletions, or rather because responses to
> >>these requests become painfully slow. Other requests to ovs-vswitchd also
> >>fail to complete in any reasonable time frame; ovs-appctl vlog/set is one
> >>example.
> >>
> >>The only conclusion I can draw at the moment is that some thread (I've
> >>observed main and dpdk_watchdog3) blocks the ovsrcu_synchronize()
> >>operation for an "infinite" time, and there is no fallback to get out of
> >>this. The minimum recovery action seems to be a restart of the
> >>openvswitch-switch service, but that appears to cause other issues longer
> >>term.
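For reference, "quiesce" here means that each long-running OVS thread
periodically declares a quiescent state through the ovs-rcu API
(ovsrcu_quiesce(), or ovsrcu_quiesce_start()/ovsrcu_quiesce_end() around a
long block), and ovsrcu_synchronize() waits until every registered thread
has done so. Below is a minimal sketch of that pattern, assuming
hypothetical do_one_iteration() and wait_for_work() helpers:

    /* Minimal sketch of the quiescing pattern a long-running OVS thread is
     * expected to follow (see lib/ovs-rcu.h).  do_one_iteration() and
     * wait_for_work() are hypothetical stand-ins for the thread's real
     * work. */
    #include "ovs-rcu.h"

    extern void do_one_iteration(void);
    extern void wait_for_work(void);    /* May block for a long time. */

    static void *
    worker_main(void *arg)
    {
        (void) arg;

        for (;;) {
            do_one_iteration();

            /* Declare a quiescent state between iterations so that
             * ovsrcu_synchronize() callers are not left waiting on us. */
            ovsrcu_quiesce();

            /* Bracket a potentially long block explicitly; without this,
             * a hang in wait_for_work() stalls every ovsrcu_synchronize()
             * caller. */
            ovsrcu_quiesce_start();
            wait_for_work();
            ovsrcu_quiesce_end();
        }
        return NULL;
    }

A thread that stops reaching one of these calls, for example because it is
stuck in a blocking call while not quiescent, keeps ovsrcu_synchronize()
waiting indefinitely, which is exactly the lack of a fallback described
above.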
> >>
> >>When this happens, the following can be observed in the ovs-vswitchd log:
> >>
> >>2016-01-24T20:36:14.601Z|02742|ovs_rcu(vhost_thread2)|WARN|blocked 1000 ms waiting for dpdk_watchdog3 to quiesce
> >This looks like a bug somewhere in the DPDK code. The watchdog code is
> >really simple:
> >
> > static void *
> > dpdk_watchdog(void *dummy OVS_UNUSED)
> > {
> >     struct netdev_dpdk *dev;
> >
> >     pthread_detach(pthread_self());
> >
> >     for (;;) {
> >         ovs_mutex_lock(&dpdk_mutex);
> >         LIST_FOR_EACH (dev, list_node, &dpdk_list) {
> >             ovs_mutex_lock(&dev->mutex);
> >             check_link_status(dev);
> >             ovs_mutex_unlock(&dev->mutex);
> >         }
> >         ovs_mutex_unlock(&dpdk_mutex);
> >         xsleep(DPDK_PORT_WATCHDOG_INTERVAL);
> >     }
> >
> >     return NULL;
> > }
> >
> >Although it looks at first glance like it doesn't quiesce, xsleep() does
> >that internally, so I guess check_link_status() must be hanging.
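To spell out the xsleep() point: the sleep itself is bracketed by quiesce
markers, so dpdk_watchdog is quiescent for almost its entire loop, and the
only stretch in which it cannot quiesce is while it holds dpdk_mutex and
runs check_link_status(). A simplified sketch of that behavior (the real
xsleep() lives in lib/timeval.c and differs in detail):

    /* Simplified sketch of the behavior described above: the sleep is
     * bracketed by quiesce markers, so the watchdog only fails to quiesce
     * if check_link_status() itself hangs. */
    #include <unistd.h>
    #include "ovs-rcu.h"

    static void
    xsleep_sketch(unsigned int seconds)
    {
        ovsrcu_quiesce_start();     /* RCU may make progress while asleep... */
        sleep(seconds);
        ovsrcu_quiesce_end();       /* ...and we re-register on wakeup. */
    }

If check_link_status() really is hanging, the thread is also still holding
dpdk_mutex at that point, so port additions and deletions on other threads
would stall as well, which would fit the slow port operations and the
netdev port deletion question above.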
>